Bloom filters
Date: Sun, 9 Dec 2012 19:53:00 +0000 (UTC)
Message-ID: <pan.2012.12.09.19.53.01_at_gmail.com>
I have first heard of Bloom filters when I started reading about Exadata. Bloom filtering is a probabilistic algorithm which for the given set can very quickly decide whether an item at hand is a member or not. The algorithm can produce false postives, but not false negatives.
http://en.wikipedia.org/wiki/Bloom_filter
The main use in the database, or so I thought, was for Exadata to search the blocks requested by the DB query. However, a recent discussion on the Metalink community forum has produced the following:
The Oracle database makes use of Bloom filters in the following 4 situations:
- To reduce data communication between slave processes in parallel joins: mostly in RAC
- To implement join-filter pruning: in partition pruning, the optimizer analyzes FROM and WHERE clauses in SQL statements to eliminate unneeded partitions when building the partition access list
- To support result caches: when you run a query, Oracle will first see if the results of that query have already been computed and cached by some session or user, and if so, it will retrieve the answer from the server result cache instead of gathering all of the database blocks
- To filter members in different cells in Exadata: Exadata performs joins between large tables and small lookup tables, a very common scenario for data warehouses with star schemas. This is implemented using Bloom filters as to determine whether a row is a member of the desired result set.
You can identify a bloom pruning in a plan when you see :BF0000 in the Pstart and Pstop columns of the execution plan and PART JOIN FILTER CREATE in the operations column:
Two hints can be used: px_join_filter and no_px_join_filter; and there are a couple of views for monitoring Bloom filters:
- v$sql_join_filter: information about Bloom filters; number of rows filtered out and probed by Bloom filters
- v$pq_tqstat: check reduced communication due to usage of Bloom filters The Oracle database makes use of Bloom filters in the following 4 situations:
- To reduce data communication between slave processes in parallel joins: mostly in RAC
- To implement join-filter pruning: in partition pruning, the optimizer analyzes FROM and WHERE clauses in SQL statements to eliminate unneeded partitions when building the partition access list
- To support result caches: when you run a query, Oracle will first see if the results of that query have already been computed and cached by some session or user, and if so, it will retrieve the answer from the server result cache instead of gathering all of the database blocks
- To filter members in different cells in Exadata: Exadata performs joins between large tables and small lookup tables, a very common scenario for data warehouses with star schemas. This is implemented using Bloom filters as to determine whether a row is a member of the desired result set.
You can identify a bloom pruning in a plan when you see :BF0000 in the Pstart and Pstop columns of the execution plan and PART JOIN FILTER CREATE in the operations column:
Two hints can be used: px_join_filter and no_px_join_filter; and there are a couple of views for monitoring Bloom filters:
- v$sql_join_filter: information about Bloom filters; number of rows filtered out and probed by Bloom filters
- v$pq_tqstat: check reduced communication due to usage of Bloom filters
The author is an Oracle employee, so I searched the Metalink. The search has produced the document 912330.1 which, while less detailed than the post above still confirms it. Of course, as with all new features, there are "features":
Bug 12637294: BLOOM PARTITION PRUNING MAY CAUSE DEADLOCK ORA-60
Funny thing about this particular "feature" is that it is marked as fixed in release 12.1.
There are also some good blog articles, like this one:
http://perfexpert2.wordpress.com/2011/05/21/undocumented-parameter/
-- Mladen Gogala The Oracle Whisperer http://mgogala.byethost5.comReceived on Sun Dec 09 2012 - 20:53:00 CET