Rule Learner should discover rules that classify robots as friendly or unfriendly.
With Rule Learner, we usually represent the problem in two Excel tables. The first one describes all known instances of the problem and in this case it looks as follows:
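The two Excel tables themselves are not reproduced here, but a minimal Python sketch can convey their shape. The attribute names and values below are invented placeholders, not the actual columns from the article's spreadsheet:

```python
# Hypothetical training table: one row per known robot, last field is the
# classification. The real attributes come from the Excel glossary, which
# is not shown in the article; these are illustrative stand-ins.
robots = [
    # (head_shape, is_smiling, holding,   classification)
    ("round",  True,  "flag",    "FRIENDLY"),
    ("square", False, "sword",   "UNFRIENDLY"),
    ("round",  False, "balloon", "FRIENDLY"),
    ("square", True,  "sword",   "UNFRIENDLY"),
]

# The glossary table describes each attribute and its allowed values.
glossary = {
    "head_shape": {"type": "categorical", "values": ["round", "square"]},
    "is_smiling": {"type": "boolean"},
    "holding":    {"type": "categorical", "values": ["flag", "sword", "balloon"]},
}
```

The key point is the separation of concerns: the instances table holds the examples, while the glossary gives the learner the vocabulary it may use in generated rule conditions.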
The description of all attributes used in the above table goes into another Excel table (the glossary):
That’s it! After placing these two tables in the standard Rule Learner project, we just need to double-click the provided file “learn.bat” in Windows Explorer. The automatically generated rules are placed in another Excel file, “GeneratedRules.xls”, and look as below:
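As a rough sketch of what “learn.bat” does behind the scenes, the snippet below induces a decision tree from a toy table and prints it back as readable if-then rules. Rule Learner relies on WEKA’s C4.5; here scikit-learn’s CART tree stands in for it, and the robot data is invented for illustration:

```python
# Sketch only: scikit-learn's CART is a stand-in for WEKA's C4.5, and the
# encoded features below (is_smiling, holding_sword) are invented.
from sklearn.tree import DecisionTreeClassifier, export_text

X = [[1, 0], [1, 1], [0, 1], [0, 0], [1, 0], [0, 1]]
y = ["FRIENDLY", "UNFRIENDLY", "UNFRIENDLY",
     "FRIENDLY", "FRIENDLY", "UNFRIENDLY"]

tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# The learned tree, printed as nested if-then conditions, is essentially
# what Rule Learner writes into GeneratedRules.xls.
print(export_text(tree, feature_names=["is_smiling", "holding_sword"]))
```

On this toy data the tree needs a single condition (whether the robot holds a sword), which is why the generated rule sets in such examples tend to be short and readable.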
When I ran a rule engine, these rules correctly classified all 12 robots. It seems like the problem is successfully solved. Not so fast!
Having applied machine learning algorithms to real-world problems for years, I know that automatically learned rules usually suffer from over-fitting, and it is better to assume they are far from perfect. This is especially true for small training sets like this one. No wonder the applied ML algorithm, C4.5, produced these statistical metrics:
After applying 10-fold cross-validation, the learner warns us that the actual quality of these rules could be quite bad when applied to new instances. So, I decided to add 4 more robots at the end of the table of training instances:
When I applied the same rules to classify these 16 robots, I received the following results:
All 4 new robots were incorrectly classified with old rules! My skepticism was justified.
Continuing Learning. So, now that I have 16 robots (training instances with known classifications), why not learn new rules? To do this I just clicked “learn.bat” again. The following rules were quickly generated:
This time Rule Learner produced 9 rules instead of the previous 3, but they are still quite readable and look intuitive. Applied to the classification of the 16 robots, these rules produced the following results:
While all new robots were classified correctly, robot #7 was not, and here is why:
This robot satisfied the conditions of the second rule and was classified as FRIENDLY instead of UNFRIENDLY.
Should we even try to generate perfect rules that are guaranteed to give the correct classification on all instances in the training set? The authors of the ML system WEKA (whose implementations of ML algorithms Rule Learner currently uses) put it this way: “You would rather generate ‘sensible’ rules that avoid over-fitting the training set and thereby stand a better chance of performing well on new instances.”
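The trade-off behind “sensible” rules can be shown concretely: constraining the learner (here via scikit-learn’s `min_samples_leaf`, as a hedged stand-in for C4.5’s pruning) gives up a perfect fit on the training set in exchange for fewer, simpler rules. The data is invented for illustration:

```python
# Invented 16-instance set where one label is deliberately noisy. The
# unconstrained tree memorizes the noise; the constrained tree cannot
# isolate a single instance and so yields fewer, more general rules.
from sklearn.tree import DecisionTreeClassifier

X = [[i % 2, (i // 2) % 2, i % 3] for i in range(16)]
y = ["UNFRIENDLY" if x[0] else "FRIENDLY" for x in X]
y[6] = "UNFRIENDLY"  # the noisy label

exact  = DecisionTreeClassifier(random_state=0).fit(X, y)
pruned = DecisionTreeClassifier(min_samples_leaf=4,
                                random_state=0).fit(X, y)

# The exact tree fits the training set perfectly but needs more leaves
# (i.e., more rules); the pruned tree stays smaller.
print(exact.get_n_leaves(), pruned.get_n_leaves())
```

The extra leaves in the exact tree exist only to carve out the noisy instance, and such leaves are precisely the ones least likely to hold on new robots.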
This simple example demonstrates the need to create a learning environment in which our decision-making application is always in a process of continuing learning!
Ever-Learning Decisioning Systems. That’s why Rule Learner supports “Ever-Learning Decisioning”, depicted in the following picture:
You can read more about it here. In many practical cases it is the right architectural approach for a decisioning system, covering two different worlds: the Analytical world (where ML algorithms are used) and the Operational world (where a decision engine utilizes the latest business rules produced in the analytical world).
You may download Rule Learner (it’s completely free!) and try to run this example, add more robots, and generate more rules. You may be interested to see that even this simple problem is split into 2 projects: RobotsAnalytical and RobotsOperational. The rules generated in RobotsAnalytical are saved into RobotsOperational! I didn’t create a special Rule Trainer for this simple case, but you will find another simple project called “Credits”. It contains 1,000 samples of debtors that should be classified as “good” or “bad”, and a rule trainer is used to select different subsets of the credit data.