WEKA: Pentaho Data Mining

In most cases, today's Business Intelligence solutions are either good or cheap, rarely both at the same time. That makes the case of Pentaho Data Mining, another strong component of Pentaho's open source Business Intelligence suite, all the more surprising. While more and more company managers successfully implement analysis, reporting, and dashboard solutions in their IT systems, many of them still struggle with data mining. Unfortunately, data mining is exactly what takes Business Intelligence to a completely new level.

With lots of numbers, spreadsheets full of values, charts, diagrams, and histograms, it is rarely easy to discover trends or specific dependencies early enough to influence them. Prediction is hard, too, because a trend usually has to grow significantly before anyone notices it. Pentaho Data Mining, like any other data mining solution, helps discover these trends before they become noticeable to a human. The idea is to analyze very large volumes of data quickly and search for trends that are just taking shape. The most meaningful advantage is that the Pentaho Data Mining tool highlights trends long before they become "traditionally" noticeable, which makes it genuinely valuable in today's Business Intelligence.
Which contractor to choose? Which product to buy? Sometimes it is truly difficult to decide, and these are exactly the cases an efficient data mining solution can help with.
Who, besides the system, would be able to find out that contractors of a certain type tend to delay payments and that, therefore, there is a significant risk the one we want to start working with will do the same? No one.

Pentaho Data Mining stands out for its tight integration with other Business Intelligence tools, which makes it even more "intelligent". Numbers are no longer just numbers; there is a story behind them that tells users more about the highlighted trend. With full support from data integration, analysis, dashboards, and reporting, Pentaho Data Mining is a solution truly worth considering.

Data mining with Pentaho in practice

The idea of data mining isn't complex, but doing it "manually" can be difficult and time-consuming, and success still isn't guaranteed. With the Pentaho Data Mining tools, data mining begins with choosing a model. There are numerous options to choose from: segmentation, decision trees, neural networks, random forests, clustering, and many others, and the efficiency of the whole process can depend on this choice. Then the data is added. Next, the chosen model has to be fitted to sample data. This is a crucial step, and there are two ways to do it: it can always be done automatically, following the most common procedures and parameters, and in some cases it can also be done manually. Either way, the fitted parameters need testing. A good practice is to verify the model on data from the "future", that is, from a later period than the sample, and check whether its output roughly matches what actually happened next. If needed, the model can then be refined so that it fits the data as well as possible.

Finally comes data mining in the form most people understand it: with the data loaded and the model fitted and refined, it is time to deliver the output. How the results look is up to the Pentaho Data Mining user, and there are always several options to choose from (alerts in other applications, graphical illustrations, etc.).
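
The fit-then-verify steps above can be sketched in plain Java. This is not Pentaho's or Weka's API; it is a minimal, self-contained illustration, assuming a hypothetical set of contractor records (year, average days late on past invoices, whether the next payment was delayed) in the spirit of the contractor example earlier. The model is a one-level decision stump whose only parameter, a threshold, is chosen automatically by minimizing training errors on older records; it is then checked against the most recent year, which plays the role of "data from the future". All class, method, and field names are made up for this sketch.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical sketch: fit a decision stump on older data, then verify it on newer data. */
public class StumpWorkflow {

    /** One hypothetical record: year, average days late so far, and whether the next payment was delayed. */
    record Row(int year, double avgDaysLate, boolean delayedPayment) {}

    /** The whole model: predict "delayed" when avgDaysLate >= threshold. */
    record Stump(double threshold) {
        boolean predict(Row r) {
            return r.avgDaysLate() >= threshold;
        }
    }

    static List<Row> sampleData() {
        return List.of(
            new Row(2019, 2, false), new Row(2019, 5, false), new Row(2019, 21, true),
            new Row(2020, 3, false), new Row(2020, 18, true), new Row(2020, 30, true),
            new Row(2021, 4, false), new Row(2021, 25, true), new Row(2021, 12, true));
    }

    static List<Row> yearsBefore(List<Row> rows, int cutoff) {
        return rows.stream().filter(r -> r.year() < cutoff).collect(Collectors.toList());
    }

    static List<Row> yearsFrom(List<Row> rows, int cutoff) {
        return rows.stream().filter(r -> r.year() >= cutoff).collect(Collectors.toList());
    }

    /** Number of rows where the stump's prediction disagrees with the recorded outcome. */
    static int errors(Stump s, List<Row> rows) {
        int e = 0;
        for (Row r : rows) {
            if (s.predict(r) != r.delayedPayment()) e++;
        }
        return e;
    }

    /** Automatic fitting: try each observed value as the threshold and keep the one with the fewest errors. */
    static Stump fit(List<Row> train) {
        double best = Double.POSITIVE_INFINITY; // fallback model: never predict "delayed"
        int bestErrors = errors(new Stump(best), train);
        for (Row candidate : train) {
            int e = errors(new Stump(candidate.avgDaysLate()), train);
            if (e < bestErrors) {
                bestErrors = e;
                best = candidate.avgDaysLate();
            }
        }
        return new Stump(best);
    }

    public static void main(String[] args) {
        List<Row> data = sampleData();
        int cutoff = 2021; // 2019-2020 are used for fitting, 2021 stands in for "the future"
        Stump stump = fit(yearsBefore(data, cutoff));
        List<Row> future = yearsFrom(data, cutoff);
        System.out.println("threshold = " + stump.threshold());
        System.out.println("errors on later data = " + errors(stump, future) + " of " + future.size());
    }
}
```

With these sample rows the fitted threshold is 18 days, and one of the three 2021 records (12 days late on average, yet delayed) is misclassified. Surfacing that kind of miss before the model is put to use is the point of the verification step, and it is what the later refinement would then address.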

Pentaho Data Mining features

Among many others, the main Pentaho Data Mining features are:

  • a powerful engine that performs well even with very large data volumes
  • numerous and diverse learning algorithms originating from Weka (principal component analysis, random forests, decision trees, neural networks, segmentation, clustering, and more)
  • simplified and accelerated data integration
  • automated data transformation (from almost any other format to the one Pentaho Data Mining requires)
  • two ways of applying algorithms (directly to a dataset or from Java code)
  • various methods for output presentation
  • a variety of filters for data analysis
  • PMML (Predictive Model Markup Language) support
  • graphical user interfaces
  • efficient uncovering of hidden relationships and patterns
  • reuse of already discovered patterns in future data mining
  • embedding of insights into other applications (so patterns can be displayed whenever they could be useful, not only when someone checks for them)
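
Clustering and segmentation, two of the algorithm families listed above, can likewise be illustrated without Pentaho itself. The sketch below is plain k-means (Lloyd's algorithm) written from scratch in Java; it is not Weka's implementation and does not use its API. The customer figures (orders per year, average order value) and the starting centroids are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical sketch of k-means clustering used for customer segmentation. */
public class KMeansSketch {

    static double squaredDistance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Returns the cluster index of each point; initial centroids are copied, not modified. */
    static int[] kMeans(double[][] points, double[][] initialCentroids, int maxIterations) {
        int k = initialCentroids.length;
        int dims = points[0].length;
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) centroids[c] = initialCentroids[c].clone();
        int[] assignment = new int[points.length];
        Arrays.fill(assignment, -1); // no point is assigned yet

        for (int iter = 0; iter < maxIterations; iter++) {
            // Assignment step: each point joins its nearest centroid.
            boolean changed = false;
            for (int p = 0; p < points.length; p++) {
                int best = 0;
                for (int c = 1; c < k; c++) {
                    if (squaredDistance(points[p], centroids[c]) < squaredDistance(points[p], centroids[best])) {
                        best = c;
                    }
                }
                if (assignment[p] != best) {
                    assignment[p] = best;
                    changed = true;
                }
            }
            if (!changed) break; // converged: no point switched clusters

            // Update step: move each centroid to the mean of its members.
            double[][] sums = new double[k][dims];
            int[] counts = new int[k];
            for (int p = 0; p < points.length; p++) {
                counts[assignment[p]]++;
                for (int d = 0; d < dims; d++) sums[assignment[p]][d] += points[p][d];
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) continue; // an empty cluster keeps its previous centroid
                for (int d = 0; d < dims; d++) centroids[c][d] = sums[c][d] / counts[c];
            }
        }
        return assignment;
    }

    public static void main(String[] args) {
        double[][] customers = {
            {1, 10}, {2, 12}, {1.5, 11},   // few, small orders
            {20, 100}, {22, 95}, {19, 105} // many, large orders
        };
        double[][] seeds = {customers[0], customers[3]}; // hypothetical starting centroids
        System.out.println(Arrays.toString(kMeans(customers, seeds, 100)));
    }
}
```

Running it prints `[0, 0, 0, 1, 1, 1]`: the first three customers form one segment and the last three another. In real use the features would usually be scaled first (here the order value dominates the distance), and the starting centroids would be chosen more carefully than by hand.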

Pentaho Data Mining resources

  • http://www.pentaho.com/download/asset_container.php?durl=pentaho_data_mining.pdf&furl=540000 - a short overview of Pentaho Data Mining. It is neither a proper data sheet nor full technical documentation, but it is surely worth a look. All the important information is included, though almost no specific data is given: don't look for technical details, parameters, or requirements, as they are not there. What is useful and interesting, on the other hand, are the plain explanations: each listed feature is described and explained in a few sentences, which makes it easier to understand. There is also a table illustrating the differences between the Community Edition and the Enterprise Edition of Pentaho solutions.
  • http://forums.pentaho.com/forumdisplay.php?81-Pentaho-Data-Mining-WEKA - since Pentaho solutions are distributed as open source, support from other users is often the most useful kind. It's therefore worth visiting the official Pentaho Data Mining forum, where the threads are devoted strictly to this solution. Users aren't the only ones posting there, either: it is also where getting an answer from Pentaho specialists is easiest and fastest.
  • http://www.openwebapplications.com/open-source-apps/pentaho-bi - this page says only a few words about the Pentaho Data Mining tool itself, as the website covers open source business applications in general. It does, however, offer a fairly broad description of the Pentaho Business Intelligence suite, which everybody, including Pentaho Data Mining users, can benefit from. It is always good to know the related applications.
  • http://news.cnet.com/Pentaho-buys-open-source-data-mining-project/2100-7344_3-6117340.html - that's how it all began. Before it became Pentaho Data Mining, it was a separate open source project called Weka. This article describes how the whole thing started, gives a few words about Weka's history, and covers the acquisition itself in some detail. It is simply "historical background" for those who want to know what they are working with.