I noticed in this article about Cassandra 1.2 that they have added the concept of vnodes, which allow you to have multiple nodes on a piece of hardware. This is pretty much the same as Oracle NoSQL Database's capability to place multiple Rep Nodes per Storage Node using the Capacity parameter. In general, the recommended starting point in configuring multiple Replication Nodes per Storage Node is one Rep Node per spindle or IO Channel.
The article also talks about Atomic Batching, which has been available in Oracle NoSQL Database since R1 through the various oracle.kv.KVStore.execute() methods. This capability allows an application to batch multiple operations against multiple records with the same major key in one atomic operation (transaction). Our users have all said that this is an important capability.
It's official: we've shipped Oracle NoSQL Database R2.
Of course there's a press release, but if you want to cut to the chase, the major features this release brings are:
- Elasticity - the ability to dynamically add more storage nodes and have the system rebalance the data onto the nodes without interrupting operations.
- Large Object Support - the ability to store large objects without materializing those objects in the NoSQL Database (there's a stream API to them).
- Avro Schema Support - Data records can be stored using Avro as the schema.
- Oracle Database External Table Support - A NoSQL Database can act as an Oracle Database External Table.
- SNMP and JMX Support
- A C Language API
There are both an open-source Community Edition (CE) licensed under aGPLv3, and an Enterprise Edition (EE) licensed under a standard Oracle EE license. This is the first release where the EE has additional features and functionality.
Congratulations to the team for a fine effort.
In an earlier post I noted that Berkeley DB Java Edition cleaner performance had improved significantly in release 5.x. From an Oracle NoSQL Database point of view, this is important because Berkeley DB Java Edition is the core storage engine for Oracle NoSQL Database.
Many contemporary NoSQL Databases utilize log based (i.e. append-only) storage systems and it is well-understood that these architectures also require a "cleaning" or "compaction" mechanism (effectively a garbage collector) to free up unused space. 10 years ago when we set out to write a new Berkeley DB storage architecture for the BDB Java Edition ("JE") we knew that the corresponding compaction mechanism would take years to perfect. "Cleaning", or GC, is a hard problem to solve and it has taken all of those years of experience, bug fixes, tuning exercises, user deployment, and user feedback to bring it to the mature point it is at today. Reports like Vinoth Chandar's where he observes a 20x improvement validate the maturity of JE's cleaner.
Cleaner performance has a direct impact on predictability and throughput in Oracle NoSQL Database. A cleaner that is too aggressive will consume too many resources and negatively affect system throughput. A cleaner that is not aggressive enough will allow the disk storage to become inefficient over time. It has to
- Work well out of the box, and
- Needs to be configurable so that customers can tune it for their specific workloads and requirements.
The JE Cleaner has been field tested in production for many years managing instances with hundreds of GBs to TBs of data. The maturity of the cleaner and the entire underlying JE storage system is one of the key advantages that Oracle NoSQL Database brings to the table -- we haven't had to reinvent the wheel.
Berkeley DB Java Edition 5.x has significant performance improvements. One user noted that they are seeing a 20x improvement in cleaner performance.