Backblaze has two interesting blog posts about enterprise vs consumer drives. Their conclusions are that drives fail either when they're young, or when they're old, but not when they're in mid-life. They also found no real difference failure rates between the two classes. http://blog.backblaze.com/2013/11/12/how-long-do-disk-drives-last/ http://blog.backblaze.com/2013/12/04/enterprise-drive-reliability/
NoSQL - Data Center Centric Application Enablement
AUGUST 6 WEBINAR
About the Webinar
The growth of Datacenter infrastructure is trending out of bounds, along with the pace in user activity and data generation in this digital era. However, the nature of the typical application deployment within the data center is changing to accommodate new business needs. Those changes introduce complexities in application deployment architecture and design, which cascade into requirements for a new generation of database technology (NoSQL) destined to ease that complexity. This webcast will discuss the modern data centers data centric application, the complexities that must be dealt with and common architectures found to describe and prescribe new data center aware services. Well look at the practical issues in implementation and overview current state of art in NoSQL database technology solving the problems of data center awareness in application development.
NOTE! All attendees will be entered to win a guest pass to the NoSQL Now! 2013 Conference & Expo.
About the Speaker Robert Greene, Oracle NoSQL Product Management
Robert GreeneRobert Greene is a principle product manager / strategist for Oracle’s NoSQL Database technology. Prior to Oracle he was the V.P. Technology for a NoSQL Database company, Versant Corporation, where he set the strategy for alignment with Big Data technology trends resulting in the acquisition of the company by Actian Corp in 2012. Robert has been an active member of both commercial and open source initiatives in the NoSQL and Object Relational Mapping spaces for the past 18 years, developing software, leading project teams, authoring articles and presenting at major conferences on these topics. In his previous life, Robert was an electronic engineer developing first generation wireless, spread spectrum based security systems.
MorphoTrak: "Storing billions of images in a hybrid relational and NoSQL database using Oracle Active Data Guard and Oracle NoSQL Database"Database
Contest - Show off your NoSQL, win an iPad!!
A partner of ours, OpalSoft Inc, is running a contest to select the Coolest “Oracle NoSQL Database Application". It is simple to enter the contest, just go to http://www.nosqlcontest.com/ and submit information about an application you've built on the Oracle NoSQL Database. If you haven't built one already, you still have time, so go ahead and download create an application or integrate with some cool open source project then submit your entry. You have until July, 8th 2013 to complete your submission. The chosen winner will receive a new iPad with one of those retina displays, perfect for hanging out by the pool this summer! For complete details, visit the contest website.
I noticed in this article about Cassandra 1.2 that they have added the concept of vnodes, which allow you to have multiple nodes on a piece of hardware. This is pretty much the same as Oracle NoSQL Database's capability to place multiple Rep Nodes per Storage Node using the Capacity parameter. In general, the recommended starting point in configuring multiple Replication Nodes per Storage Node is one Rep Node per spindle or IO Channel.
The article also talks about Atomic Batching, which has been available in Oracle NoSQL Database since R1 through the various oracle.kv.KVStore.execute() methods. This capability allows an application to batch multiple operations against multiple records with the same major key in one atomic operation (transaction). Our users have all said that this is an important capability.
It's official: we've shipped Oracle NoSQL Database R2.
Of course there's a press release, but if you want to cut to the chase, the major features this release brings are:
- Elasticity - the ability to dynamically add more storage nodes and have the system rebalance the data onto the nodes without interrupting operations.
- Large Object Support - the ability to store large objects without materializing those objects in the NoSQL Database (there's a stream API to them).
- Avro Schema Support - Data records can be stored using Avro as the schema.
- Oracle Database External Table Support - A NoSQL Database can act as an Oracle Database External Table.
- SNMP and JMX Support
- A C Language API
There are both an open-source Community Edition (CE) licensed under aGPLv3, and an Enterprise Edition (EE) licensed under a standard Oracle EE license. This is the first release where the EE has additional features and functionality.
Congratulations to the team for a fine effort.
In an earlier post I noted that Berkeley DB Java Edition cleaner performance had improved significantly in release 5.x. From an Oracle NoSQL Database point of view, this is important because Berkeley DB Java Edition is the core storage engine for Oracle NoSQL Database.
Many contemporary NoSQL Databases utilize log based (i.e. append-only) storage systems and it is well-understood that these architectures also require a "cleaning" or "compaction" mechanism (effectively a garbage collector) to free up unused space. 10 years ago when we set out to write a new Berkeley DB storage architecture for the BDB Java Edition ("JE") we knew that the corresponding compaction mechanism would take years to perfect. "Cleaning", or GC, is a hard problem to solve and it has taken all of those years of experience, bug fixes, tuning exercises, user deployment, and user feedback to bring it to the mature point it is at today. Reports like Vinoth Chandar's where he observes a 20x improvement validate the maturity of JE's cleaner.
Cleaner performance has a direct impact on predictability and throughput in Oracle NoSQL Database. A cleaner that is too aggressive will consume too many resources and negatively affect system throughput. A cleaner that is not aggressive enough will allow the disk storage to become inefficient over time. It has to
- Work well out of the box, and
- Needs to be configurable so that customers can tune it for their specific workloads and requirements.
The JE Cleaner has been field tested in production for many years managing instances with hundreds of GBs to TBs of data. The maturity of the cleaner and the entire underlying JE storage system is one of the key advantages that Oracle NoSQL Database brings to the table -- we haven't had to reinvent the wheel.
Berkeley DB Java Edition 5.x has significant performance improvements. One user noted that they are seeing a 20x improvement in cleaner performance.
Here's an interesting article on YCSB benchmarks run on Cassandra, HBase, MongoDB, and Riak. Compare these to the Oracle NoSQL Database YCSB performance test results.Oracle NoSQL Database Performance Tests Oracle NoSQL Database Exceeds 1 Million Mixed YCSB Ops/Sec
Here's an Oracle NoSQL Database customer success story for Passoker, an online betting house.
There are a lot of great points made in the Solutions section, but as a developer the one I like the most is this one:
- Eliminated daily maintenance related to single-node points-of-failure by moving to Oracle NoSQL Database, which is designed to be resilient and hands-off, thus minimizing IT support costs
Blueprints is a collection of interfaces, implementations, ouplementations, and test suites for the property graph data model. Blueprints is analogous to the JDBC, but for graph databases. As such, it provides a common set of interfaces to allow developers to plug-and-play their graph database backend. Moreover, software written atop Blueprints works over all Blueprints-enabled graph databases. Within the TinkerPop software stack, Blueprints serves as the foundational technology for:
We ran a set of YCSB performance tests on Oracle NoSQL Database using SSD cards and Intel Xeon
E5-2690 CPUs with the goal of achieving 1M mixed ops/sec on a 95% read / 5% update workload. We used the standard YCSB parameters: 13 byte keys and 1KB data size (1,102 bytes after serialization). The maximum database size was 2 billion records, or approximately 2 TB of data. We sized the shards to ensure that this was not an "in-memory" test (i.e. the data portion of the B-Trees did not fit into memory). All updates were durable and used the "simple majority" replica ack policy, effectively 'committing to the network'. All read operations used the Consistency.NONE_REQUIRED parameter allowing reads to be performed on any replica.
In the past we have achieved 100K ops/sec using SSD cards on a single shard cluster (replication factor 3) so for this test we used 10 shards on 15 Storage Nodes with each SN carrying 2 Rep Nodes and each RN assigned to its own SSD card. After correcting a scaling problem in YCSB, we blew past the 1M ops/sec mark with 8 shards and proceeded to hit 1.2M ops/sec with 10 shards.Hardware Configuration
We used 15 servers, each configured with two 335 GB SSD cards. We did not have homogeneous CPUs across all 15 servers available to us so 12 of the 15 were Xeon E5-2690, 2.9 GHz, 2 sockets, 32 threads, 193 GB RAM, and the other 3 were Xeon E5-2680, 2.7 GHz, 2 sockets, 32 threads, 193 GB RAM. There might have been some upside in having all 15 machines configured with the faster CPU, but since CPU was not the limiting factor we don't believe the improvement would be significant.
The client machines were Xeon X5670, 2.93 GHz, 2 sockets, 24 threads, 96 GB RAM. Although the clients had 96 GB of RAM, neither the NoSQL Database or YCSB clients require anywhere near that amount of memory and the test could have just easily been run with much less.
Networking was all 10GigE.
We made three modifications to the YCSB benchmark. The first was to allow the test to accommodate more than 2 billion records (effectively int's vs long's). To keep the key size constant, we changed the code to use base 32 for the user ids.
The second change involved to the way we run the YCSB client in order to make the test itself horizontally scalable.The basic problem has to do with the way the YCSB test creates its Zipfian distribution of keys which is intended to model "real" loads by generating clusters of key collisions. Unfortunately, the percentage of collisions on the most contentious keys remains the same even as the number of keys in the database increases. As we scale up the load, the number of collisions on those keys increases as well, eventually exceeding the capacity of the single server used for a given key.This is not a workload that is realistic or amenable to horizontal scaling. YCSB does provide alternate key distribution algorithms so this is not a shortcoming of YCSB in general.
We decided that a better model would be for the
key collisions to be limited to a given YCSB client process. That way,
as additional YCSB client processes (i.e. additional load) are added, they each maintain the same number
of collisions they encounter themselves, but do not increase the number
of collisions on a single key in the entire store. We added client processes proportionally to the number of records in the database (and therefore the number of shards).
This change to the use of YCSB better models a use case where new groups of users are likely to access either just their own entries, or entries within their own subgroups, rather than all users showing the same interest in a single global collection of keys. If an application finds every user having the same likelihood of wanting to modify a single global key, that application has no real hope of getting horizontal scaling.
Finally, we used read/modify/write (also known as "Compare And Set") style updates during the mixed phase. This uses
versioned operations to make sure that no updates are lost. This mode
of operation provides better application behavior than the way we
have typically run YCSB in the past, and is only practical at scale because we eliminated
the shared key collision hotspots.It is also a more realistic testing scenario. To reiterate, all updates used a simple majority replica ack policy making them durable.
In the table below, the "KVS Size" column is the number of records with the number of shards and the replication factor. Hence, the first row indicates 400m total records in the NoSQL Database (KV Store), 2 shards, and a replication factor of 3. The "Clients" column indicates the number of YCSB client processes. "Threads" is the number of threads per process with the total number of threads. Hence, 90 threads per YCSB process for a total of 360 threads. The client processes were distributed across 10 client machines.
We ran some benchmarks using FusionIO ioDrive2 SSD drives and Oracle NoSQL Database. FusionIO has published a whitepaper with the results of the benchmarks.
"Results of testing showed that using an ioDrive2 for data delivered nearly 30 times more operations per second than a 300GB 10k SAS disk on a 90 percent read and 10 percent write workload and nearly eight times more operations per second on a 50 percent read and 50 percent write workload. Equally impressive, an ioDrive2 reduced latency over 700 percent (seven times) on inserts in a 90 percent read and 10 percent write workload and over 5800 percent (58 times) on reads in a 50 percent read and 50 percent write workload."
Here's a short paper called De-mystifying "Eventual-Consistency" In Distributed Systems by Ashok Joshi.
Recently, there’s been a lot of talk about the notion of eventual consistency, mostly in the context of NoSQL databases and “Big Data”. This short article explains the notion of consistency, and also how it is relevant for building NoSQL applications.
Our colleagues at Cisco gave us access to their Unified Computing and Servers (UCS) labs for some Oracle NoSQL Database performance testing. Specifically, they let us use a dozen C210 servers for hosting the Oracle NoSQL Database Rep Nodes and a handful of C200 servers for driving load.
The C210 machines were configured with 96GB RAM, dual Xeon X5670 CPUs (2.93 GHz), and 16 x 7200 rpm SAS drives. The drives were configured into two sets of 8 drives, each in a RAID-0 array using the hardware controller, and then combined into one large RAID-0 volume using the OS. The OS was Linux 2.6.32-130.el6.x86_64.
Cisco 10GigE switches were used to connect all the machines (Rep Nodes and load drivers).
We used the Yahoo! Cloud System Benchmark
as the client for the tests. Our keysize was 13 bytes and the datasize
1108 bytes (that's how our serialization turned out for 1K of data).
We ran two phases: a load, and a 50/50 read/update benchmark. Because
YCSB only supports a Java integer's worth of records (2.1 billion), we
created 400 million records per NoSQL Database Rep Group. The "KVS
size" column shows the total number of records in the K/V Store followed
by the number of rep groups and replication factor in ()'s. For
example, "400m(1x3)" means 400m total records in a K/V Store consisting
of 1 Rep Group with a Replication Factor of 3 (3 Replication Nodes
The clients ran on the C200 nodes, which were configured with dual X5670 Xeon CPUs and 96GB of memory, although really only the CPU speed matters on that side of the equation since they were not memory or IO bound. Typically, we ran with 90 client threads per YCSB client process. In the table below, the total number of client processes is shown in the "Clients" column, and at 90 threads/client (in general), the total client threads is shown in the "Total Client Threads" column.
The Oracle NoSQL Database Rep Node cache sizes were configured such that the B+Tree Internal Nodes fit into memory, but the leaf nodes (the data) did not. Specifically, we configured them with 32GB of JVM heap and 22GB of cache. Therefore, the 50/50 Read/Update results are showing a single I/O per YCSB operation. The Durability was the NoSQL Database recommended (and default) value of no_sync, simple_majority, no_sync. The Consistency that we used for the 50/50 read/update test was Consistency.NONE.
Insert Avg Latency (ms)
95% Latency (ms)
99% Latency (ms)
400m(1x3) 3 90 15,139 26,498 3.3 5 7 1200m(3x3) 3 270 16,738 71,684 3.6 7
11 1600m(4x3) 4 360 17,053 94,441 3.7 7 18
50/50 Read/Update Results
Avg Read Latency
95% Read Latency
99% Read Latency
Avg Update Latency 95% Update Latency 99% Update Latency millions
ops/sec ms ms ms ms ms ms 400m(1x3)
30 5,595 4.8 13 50 5.6 13 52 1200m(3x3)
270 17,097 4.0 13 53 5.7 15 57 1600m(4x3)
360 24,893 4.0 12 43 5.3 14 51
The results demonstrate excellent scalability, throughput, and latency of Oracle NoSQL Database.
I want to say "thank you" to my colleagues at Cisco for sharing their extremely capable hardware, lab, and staff with us for these tests.
Inspired by some other "Getting started in 5 minutes" guides, we now have a Quick Start Guide for Oracle NoSQL Database. kvlite, the single process Oracle NoSQL Database, makes it incredibly easy to get up and running. I have to say the standard disclaimer: kvlite is only meant for kicking the tires on the API. It is not meant for any kind of performance evaluation or production use.
Install Oracle NoSQL Database
- Download the tar.gz file from http://www.oracle.com/technetwork/database/nosqldb/downloads/index.html.
gunzip and untar the
.tar.gzpackage (or unzip if you downloaded the
.zippackage). Oracle NoSQL Database version 1.2.116 Community Edition is used in this example.
$ gunzip kv-ce-1.2.116.tar.gz $ tar xvf kv-ce-1.2.116.tar kv-1.2.116/ kv-1.2.116/bin/ kv-1.2.116/bin/kvctl kv-1.2.116/bin/run-kvlite.sh kv-1.2.116/doc/ ... kv-1.2.116/lib/servlet-api-2.5.jar kv-1.2.116/lib/kvclient-1.2.116.jar kv-1.2.116/lib/kvstore-1.2.116.jar kv-1.2.116/LICENSE.txt kv-1.2.116/README.txt $
cd into the kv-1.2.116 directory to start the NoSQL Database server.
$ cd kv-1.2.116 $ java -jar lib/kvstore-1.2.116.jar kvlite Created new kvlite store with args: -root ./kvroot -store kvstore -host myhost -port 5000 -admin 5001
In a second shell, cd into the kv-1.2.116 directory and ping your KV
Lite to test that it's alive.
$ cd kv-1.2.116 $ java -jar lib/kvstore-1.2.116.jar ping -port 5000 -host myhost Pinging components of store kvstore based upon topology sequence #14 kvstore comprises 10 partitions and 1 Storage Nodes Storage Node [sn1] on myhost:5000 Datacenter: KVLite [dc1] Status: RUNNING Ver: 11gR188.8.131.52 Rep Node [rg1-rn1] Status: RUNNING,MASTER at sequence number: 31 haPort: 5011
Compile and run the Hello World example. This opens the Oracle NoSQL
Database and writes a single record.
$ javac -cp examples:lib/kvclient-1.2.116.jar examples/hello/HelloBigDataWorld.java $ java -cp examples:lib/kvclient-1.2.116.jar hello.HelloBigDataWorld Hello Big Data World! $
Peruse the Hello World example code and expand it to experiment more
with the Oracle NoSQL Database API.
Open the doc landing page (either locally
From there, the Getting Starting Guide
will introduce you to the NoSQL Database API. The Oracle NoSQL
Database Administrator's Guide
will help you understand how to plan and deploy a larger installation.
Remember, KVLite should only be used to become familiar with the NoSQL Database API. Any serious evaluation of the system should be done with a multi-process, multi-node configuration.
To install a standard, multi-node system, you need to repeat the
instructions above on how to unpack the package on any nodes that do
not yet have the software accessible. Then follow a few additional
steps, described in the Admin Guide
Installation chapter. Be sure to run ntp on each node in the system.
If you want to get started with a multi-node installation right away,
here's a sample script for creating a 3 node configuration on a set of
nodes named compute01, compute02, compute03.
You can execute it using
the NoSQL Database
configure "mystore" plan -execute deploy-datacenter BurlDC Burlington plan -execute deploy-sn 1 compute01 5000 Compute01StorageNode plan -execute deploy-admin 1 5001 addpool mySNPool joinpool mySNPool 1 plan -execute deploy-sn 1 compute02 5000 Compute02StorageNode joinpool mySNPool 2 plan -execute deploy-sn 1 compute03 5000 Compute03StorageNode joinpool mySNPool 3 plan -execute deploy-store mySNPool 3 100 show plans show topology quit
You can access the Adminstrative Console at http://compute01:5001/ at
any time after the
plan-execute deploy-admincommand to view the status of your store.
To evaluate performance, you will want to be sure to set JVM and cache
size parameters to values appropriate for your target
hosts. See Planning Your
Installation for information on how to determine those values. The
following commands are sample parameters for target machines that have
more than 32GB of memory. These commands would be invoked after the
configure "mystore" command.
set policy "javaMiscParams=-server -d64 -XX:+UseCompressedOops -XX:+AlwaysPreTouch -Xms32000m -Xmx32000m -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/tmp/gc-kv.log" set policy "cacheSize=22423814540"
You can ask questions, or make comments on the Oracle NoSQL Database OTN forum.
Oracle NoSQL Database release 1.2.123, both Community Edition (new) and Enterprise Edition, are now available for download on OTN:
In addition to some minor bug fixes, a performance improvement to the snapshot function, and deprecation of kvctl (see the changelog for details), the Community Edition is now available. The CE package includes source code and the license for CE is aGPLv3. The license for the EE remains the same as before (standard OTN license).
Here are my slides from my HPTS. There are some slides with performance figures starting at slide 28.
Oracle NoSQL Database's simple K/V pair model utilizes a B+Tree on each node to index by the key of each record. Is a Key-Value store useful with only primary key indexing? Absolutely.
- Click stream logs - indexed by timestamp or ipaddr
- User Profiles - indexed by UID
- Sensor/Stats/Network capture - indexed by timestamp
- Mobile device backup services - indexed by device id or user id
- Personalization - index these by user id and then do further look-up within the user id by sub key as needed
- Authentication services - indexed by user id
In an unstructured or semi-structured environment, primary-key indexing is very often sufficient. Further, consider the case of Map/Reduce post-processing of NoSQL Database data in any of the above scenarios. During the M/R steps, secondary indices, sometimes ad-hoc, are effectively generated on-the-fly.