Bloom filters

From: Mladen Gogala <gogala.mladen_at_gmail.com>
Date: Sun, 9 Dec 2012 19:53:00 +0000 (UTC)
Message-ID: <pan.2012.12.09.19.53.01_at_gmail.com>

I have first heard of Bloom filters when I started reading about Exadata. Bloom filtering is a probabilistic algorithm which for the given set can very quickly decide whether an item at hand is a member or not. The algorithm can produce false postives, but not false negatives.

http://en.wikipedia.org/wiki/Bloom_filter

The main use in the database, or so I thought, was for Exadata to search the blocks requested by the DB query. However, a recent discussion on the Metalink community forum has produced the following:

The Oracle database makes use of Bloom filters in the following 4 situations:

To reduce data communication between slave processes in parallel joins: mostly in RAC
To implement join-filter pruning: in partition pruning, the optimizer analyzes FROM and WHERE clauses in SQL statements to eliminate unneeded partitions when building the partition access list
To support result caches: when you run a query, Oracle will first see if the results of that query have already been computed and cached by some session or user, and if so, it will retrieve the answer from the server result cache instead of gathering all of the database blocks
To filter members in different cells in Exadata: Exadata performs joins between large tables and small lookup tables, a very common scenario for data warehouses with star schemas. This is implemented using Bloom filters as to determine whether a row is a member of the desired result set.

You can identify a bloom pruning in a plan when you see :BF0000 in the Pstart and Pstop columns of the execution plan and PART JOIN FILTER CREATE in the operations column:

Two hints can be used: px_join_filter and no_px_join_filter; and there are a couple of views for monitoring Bloom filters:

v$sql_join_filter: information about Bloom filters; number of rows filtered out and probed by Bloom filters
v$pq_tqstat: check reduced communication due to usage of Bloom filters The Oracle database makes use of Bloom filters in the following 4 situations:
To reduce data communication between slave processes in parallel joins: mostly in RAC
To implement join-filter pruning: in partition pruning, the optimizer analyzes FROM and WHERE clauses in SQL statements to eliminate unneeded partitions when building the partition access list
To support result caches: when you run a query, Oracle will first see if the results of that query have already been computed and cached by some session or user, and if so, it will retrieve the answer from the server result cache instead of gathering all of the database blocks
To filter members in different cells in Exadata: Exadata performs joins between large tables and small lookup tables, a very common scenario for data warehouses with star schemas. This is implemented using Bloom filters as to determine whether a row is a member of the desired result set.

You can identify a bloom pruning in a plan when you see :BF0000 in the Pstart and Pstop columns of the execution plan and PART JOIN FILTER CREATE in the operations column:

Two hints can be used: px_join_filter and no_px_join_filter; and there are a couple of views for monitoring Bloom filters:

v$sql_join_filter: information about Bloom filters; number of rows filtered out and probed by Bloom filters
v$pq_tqstat: check reduced communication due to usage of Bloom filters

The author is an Oracle employee, so I searched the Metalink. The search has produced the document 912330.1 which, while less detailed than the post above still confirms it. Of course, as with all new features, there are "features":

Bug 12637294: BLOOM PARTITION PRUNING MAY CAUSE DEADLOCK ORA-60

Funny thing about this particular "feature" is that it is marked as fixed in release 12.1.

There are also some good blog articles, like this one:

http://perfexpert2.wordpress.com/2011/05/21/undocumented-parameter/

-- 
Mladen Gogala
The Oracle Whisperer
http://mgogala.byethost5.com

Received on Sun Dec 09 2012 - 20:53:00 CET

This message: [ Message body ]
Next message: Mladen Gogala: "Re: Bloom filters"
Previous message: Mladen Gogala: "Re: RMAN error: ORA-01031 Insufficient privileges"
Next in thread: joel garry: "Re: Bloom filters"
Reply: joel garry: "Re: Bloom filters"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ]

Original text of this message