Red Hat recently released Drools 5 (and 5.1 is not far away), so we thought we’d do a little performance comparison of the tried and true Drools 4 and the new and fancy Drools 5. It’s also a good test run for our new benchmarking framework and data collection systems.
Platform
The new benchmarks are being run on an Amazon virtual machine (small instance), with the following hardware, OS and JVM.
| Attribute | Value |
|---|---|
| Server Type | Amazon 32-bit virtual machine (EC2) |
| Server Memory | 1.7 GB |
| OS | SunOS 5.11 |
| Java Version | 1.6.0_13 |
| JVM | Java HotSpot(TM) Client VM (build 11.3-b02, mixed mode) |
| JVM Memory (min) | -Xms1024m |
| JVM Memory (max) | -Xmx1024m |
Process
In order to get a good sense of how the two versions scale, we ran a series of tests with varying data sizes. We aim for 250 data points for each graph (these results are abbreviated due to the amount of time required to collect data). For example, the Banking benchmark is run from 5,000 through to 125,000 in steps of 5,000 (the number represents the number of cashflows). For each data size we run 250 iterations and average the results to obtain a single data point. Averaging smooths out the fluctuations found in the initial few runs. For anyone wishing to repeat these tests, the rules and object model are available for download.
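The sweep described above can be sketched roughly as follows. This is a minimal illustration only, not the framework used for these results: the class `SweepSketch`, the `DataPoint` type and the `timedRun` callback are all hypothetical names, and the code targets a current JDK (8 or later) rather than the Java 6 VM listed in the platform table.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntToLongFunction;

// Hypothetical sketch of the sweep-and-average process; not the real harness.
class SweepSketch {

    /** One plotted point: a data size and the mean elapsed time over all iterations. */
    static final class DataPoint {
        final int dataSize;
        final double meanNanos;

        DataPoint(int dataSize, double meanNanos) {
            this.dataSize = dataSize;
            this.meanNanos = meanNanos;
        }
    }

    /**
     * For each data size in [from, to] (inclusive, in increments of step),
     * calls timedRun `iterations` times and averages the elapsed times it returns.
     */
    static List<DataPoint> sweep(int from, int to, int step, int iterations,
                                 IntToLongFunction timedRun) {
        List<DataPoint> points = new ArrayList<>();
        for (int size = from; size <= to; size += step) {
            long totalNanos = 0;
            for (int i = 0; i < iterations; i++) {
                totalNanos += timedRun.applyAsLong(size);
            }
            points.add(new DataPoint(size, (double) totalNanos / iterations));
        }
        return points;
    }

    public static void main(String[] args) {
        // Stand-in workload (an assumption): a real run would load `size`
        // cashflows, fire the rules, and return the elapsed nanoseconds.
        List<DataPoint> points = sweep(5_000, 125_000, 5_000, 250, size -> {
            long start = System.nanoTime();
            long[] scratch = new long[size];
            for (int i = 0; i < size; i++) {
                scratch[i] = i;
            }
            return System.nanoTime() - start;
        });
        for (DataPoint p : points) {
            System.out.println(p.dataSize + "," + p.meanNanos);
        }
    }
}
```

Averaging inside the sweep keeps one number per data size, which is what ends up on the graph. A harness that also wanted to discard warm-up runs would need to keep the individual timings instead.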
The Banking benchmark aggregates credit and debit cashflows into accounting periods and determines the account balance at the end of each period after all the cashflows have been applied. Unlike Manners, in which many activations are placed on the agenda and one is selected to fire, Banking forces the engine to fire all activations. This is a very good stress test for the engine.
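To make the calculation concrete, here is the same aggregation written as plain Java rather than as rules. It is only a sketch of the arithmetic: the benchmark itself expresses this logic in Drools rules, and the `Cashflow` fields, the integer period index and the use of cents are assumptions made for illustration, not the downloadable object model.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical plain-Java version of the Banking calculation (not the DRL).
class BankingSketch {

    enum Type { CREDIT, DEBIT }

    /** A single cashflow assigned to an accounting period (field names are assumed). */
    static final class Cashflow {
        final int period;
        final Type type;
        final long amountCents;

        Cashflow(int period, Type type, long amountCents) {
            this.period = period;
            this.type = type;
            this.amountCents = amountCents;
        }
    }

    /** Returns the closing balance of each period, keyed by period in ascending order. */
    static Map<Integer, Long> closingBalances(long openingCents, List<Cashflow> flows) {
        // Step 1: net all credits and debits within each accounting period.
        TreeMap<Integer, Long> netByPeriod = new TreeMap<>();
        for (Cashflow cf : flows) {
            long signed = (cf.type == Type.CREDIT) ? cf.amountCents : -cf.amountCents;
            netByPeriod.merge(cf.period, signed, Long::sum);
        }
        // Step 2: carry the running balance forward from one period to the next.
        TreeMap<Integer, Long> closing = new TreeMap<>();
        long balance = openingCents;
        for (Map.Entry<Integer, Long> e : netByPeriod.entrySet()) {
            balance += e.getValue();
            closing.put(e.getKey(), balance);
        }
        return closing;
    }

    public static void main(String[] args) {
        List<Cashflow> flows = new ArrayList<>();
        flows.add(new Cashflow(1, Type.CREDIT, 500));
        flows.add(new Cashflow(1, Type.DEBIT, 200));
        flows.add(new Cashflow(2, Type.DEBIT, 300));
        System.out.println(closingBalances(1_000, flows));
    }
}
```

In the rules version every cashflow produces an activation that must fire, which is why the benchmark stresses the engine in a way this single loop does not.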
Data Collection
The last time we ran benchmarks, back in 2007, we were using an old test framework that was an unholy combination of shell scripts, CSV files and gnuplot. This time we’ve got a shiny new JSR94x-based framework that lets us capture additional data and events, such as memory usage. The memory graphs below are simple snapshots taken using MBeans before and after the rules are fired.
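A before-and-after heap snapshot of this kind can be taken with the standard `java.lang.management` MBeans. The sketch below shows one way to do it; the class `HeapSnapshotSketch` and the `fireRules` callback are hypothetical, and the framework's own collection code may differ.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

// Hypothetical sketch of a pre-run / post-run heap snapshot using the platform MemoryMXBean.
class HeapSnapshotSketch {

    /** Heap bytes currently in use, as reported by the JVM's memory MBean. */
    static long usedHeapBytes() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        return memory.getHeapMemoryUsage().getUsed();
    }

    /**
     * Takes one snapshot before and one after running the given work.
     * Returns {preRunBytes, postRunBytes}.
     */
    static long[] measure(Runnable fireRules) {
        long before = usedHeapBytes();
        fireRules.run();
        long after = usedHeapBytes();
        return new long[] { before, after };
    }

    public static void main(String[] args) {
        // Stand-in for firing the rules (an assumption): allocate some garbage.
        long[] snapshot = measure(() -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 100_000; i++) {
                sb.append(i);
            }
        });
        System.out.println("pre-run bytes:  " + snapshot[0]);
        System.out.println("post-run bytes: " + snapshot[1]);
    }
}
```

Because the garbage collector can run at any point between the two readings, numbers taken this way are snapshots rather than precise measurements of what the engine retains.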
Rule Firing Time
We’ll start with the one everybody wants to see: speed. This graph records rule firing times for each data size, up to 125,000. Previously, and in our formal report, the data size topped out at 1,000,000, but at 250 iterations per data size the tests take significantly longer to run. Despite the abbreviated data set, a clear trend can be observed, with Drools/5 outperforming Drools/4.
Data Loading Time
This graph shows the time taken to load the facts into working memory, which can be a significant factor in batch use cases. Banking is a good example, as some of our customers use rules engines for Basel II compliance, which requires a lot of reconciliation across various books. From the graph it’s clear, even with the limited data run, that Drools/5 is significantly faster.
Pre-Run Memory Used v Post-Run Memory Used – Drools/4
Measuring memory usage with Java applications is notoriously problematic. Here we simply capture the amount of heap memory used prior to rule execution (after facts are inserted into working memory), and heap memory used after rule execution is completed. We ran these tests with the -server option, and it appears that memory usage stabilizes around the 95,000 mark and is essentially flat after that.
Pre-Run Memory Used v Post-Run Memory Used – Drools/5
Here is the corresponding chart for Drools/5. It’s interesting to note such different shapes to the graphs. The figures have been double checked, and they’re correct. It’s the same code capturing the statistics in both cases, so clearly something fundamental has changed between the two versions.
Pre-Run Memory Used
Here’s an overlay of the two engines, where the different memory usage characteristics are more striking. At the ‘flat line’, once things have stabilized, it appears that Drools/4 uses slightly less memory than Drools/5.
Post-Run Memory Used
The same graph, but with post-run memory snapshots. The two engines look nearly identical here, but the graph is included for completeness.
Bottom Line
Making broad statements about the suitability of a rules engine based on a single benchmark would be foolhardy. However, given these results, the preliminary results from the remaining benchmarks and our own experiences with Drools/5, anyone running Drools/4 should feel confident that Drools/5 will perform as well as or better than Drools/4.






Nice post Steve
Steve et al:
Just a quick observation. We did the bank benchmark problem some time ago (last year?) and my only hesitation is that it does not seem to be a “proper” benchmark. Meaning, that while it can have a plethora of rules and/or data, it is generated and the rules are quite simple. Ergo, there is no forward chaining, backward chaining, extensive use of Rete, etc.
THAT being said, it is probably as good or better than Miss Manners BUT I would not put it on a level with the rest of the UT benchmarks such as Waltz-50 or WaltzDB-16/200. Waltz, and the others, are quite complex and require a full-blown rule engine to do them. The banking problem could, probably, theoretically, be done with a DTable approach.
We definitely agree that benchmarks are a preliminary performance measurement only and that any real project should develop its own benchmark using its own rules and data to see how important each facet of a benchmark is. Most “business” applications don’t even need Rete nor forward chaining nor backward chaining. However, those things should never be a concern of the “business” user, only of the IT or rule engineer.
I really DO like the presentations here and probably will mimic them on some of the benchmarks that I am running as well. I didn’t think that there would be that much difference between Drools 4.0.4/7 and Drools 5.0.0/1. I think that benchmarks and performance will be discussed in much detail at October Rules Fest, http://www.OctoberRulesFest.Org, by Forgy, Riley and others. See you there!
SDG
jco