Re: split block (torn page) problem

From: <allan.robertson_at_emc.com>
Date: Mon, 12 Dec 2011 06:53:51 -0500
Message-ID: <D55067F5A11D3B459D64A66C3038C0D70AB6FA280B_at_MX35A.corp.emc.com>



Laimis

The Oracle double checksum method relied on "smart" storage systems that were aware of which data blocks stored inside the array held Oracle database pages.

The idea is that when a host write arrives at the storage destined for a region known to hold Oracle data, the array runs an additional "checksum re-verification". It recognizes that the incoming 8KB block write is an Oracle DB page and validates it against the Oracle checksum located at bytes 24 through 32 within that 8KB. This is over and above the regular SCSI data block CRC transmission checksumming, etc.

If corruption is detected in the 8KB "Oracle DB page" that has been received, the storage system is supposed to immediately return a SCSI IO error to the host, rather than corrupt the previously stored data.
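A rough sketch of that array-side behaviour, for illustration only. The checksum routine here is a placeholder (CRC32 from zlib), not Oracle's actual algorithm, and reading "bytes 24 through 32" as the half-open range 24..32 is my assumption. The names ToyArray, ScsiIoError, stamp and toy_checksum are made up for the example.

```python
import zlib

PAGE_SIZE = 8192
# Field position as described in the message; the half-open reading is an assumption.
CSUM_START, CSUM_END = 24, 32


class ScsiIoError(Exception):
    """Stands in for the SCSI IO error returned to the host."""


def toy_checksum(page: bytes) -> bytes:
    # Placeholder checksum, NOT Oracle's algorithm: CRC32 over the page
    # with the checksum field itself zeroed out.
    width = CSUM_END - CSUM_START
    body = page[:CSUM_START] + bytes(width) + page[CSUM_END:]
    return zlib.crc32(body).to_bytes(width, "big")


def stamp(page: bytes) -> bytes:
    # Host side: write the checksum into the page before sending it.
    if len(page) != PAGE_SIZE:
        raise ValueError("expected an 8KB page")
    return page[:CSUM_START] + toy_checksum(page) + page[CSUM_END:]


class ToyArray:
    """Hypothetical array that re-verifies pages on a known Oracle region."""

    def __init__(self):
        self.blocks = {}

    def write(self, lba: int, page: bytes) -> None:
        # Re-verify before committing, so a bad page never replaces good data.
        if len(page) != PAGE_SIZE or page[CSUM_START:CSUM_END] != toy_checksum(page):
            raise ScsiIoError(f"checksum mismatch on write to LBA {lba}")
        self.blocks[lba] = page
```

The point of the design is the ordering: the check happens before the stored copy is touched, so a rejected write leaves the old page intact.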

Now, however, Oracle has been working with a number of partners, e.g. EMC and Emulex, to drive a new end-to-end data integrity standard through the T10 standards body.

With this new standard, each component that is T10 PI compliant (T10 PI was formerly called T10 DIF) observes an expanded standard in the SCSI IO block structure. This ensures that the component not only checks the received data for correctness, but also passes the standard data integrity checking information along to the next physical device inside the SCSI request packet. Each component, starting from the host's HBA port, through the SAN switches, the array front side, the array backend ports, the physical drives, etc., should enforce the check along the way.

Instead of having to worry about which block being written is "relevant Oracle data", inside the storage we are dealing with a standard SCSI write request that directs us specifically to take additional checking action on the data block received, per the T10 standard definition, and to report any errors back in the manner specified by the standard. The devices just need to be T10 PI (DIF) compliant. They do not have to worry about distinguishing between Oracle data and something else.
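To make the per-hop checking a bit more concrete, here is a small sketch. The details are from my recollection of T10 PI Type 1 rather than from anything above: each 512-byte sector carries 8 bytes of protection information, made up of a 2-byte guard tag (CRC-16 with polynomial 0x8BB7), a 2-byte application tag and a 4-byte reference tag holding the low 32 bits of the LBA. The function names and the way the path is modelled are purely illustrative.

```python
import struct

SECTOR = 512


def crc16_t10dif(data: bytes) -> int:
    # Bitwise CRC-16 with polynomial 0x8BB7, initial value 0, no reflection.
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x8BB7) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def make_pi(sector: bytes, lba: int, app_tag: int = 0) -> bytes:
    # Build the 8-byte tuple: guard tag, application tag, reference tag.
    return struct.pack(">HHI", crc16_t10dif(sector), app_tag, lba & 0xFFFFFFFF)


def check_pi(sector: bytes, pi: bytes, lba: int) -> bool:
    # Guard tag catches corrupted data; reference tag catches misdirected IO.
    guard, _app_tag, ref = struct.unpack(">HHI", pi)
    return guard == crc16_t10dif(sector) and ref == (lba & 0xFFFFFFFF)


def send_through(path, sector: bytes, pi: bytes, lba: int) -> bytes:
    # path is a list of (component name, link function); the link function
    # models the transport into that component and may damage the data.
    for name, link in path:
        sector = link(sector)
        if not check_pi(sector, pi, lba):
            raise IOError(f"{name}: protection information mismatch")
    return sector
```

Note that no component in the path needs to know what the sector contains. The same check applies to an Oracle page or to anything else, and the first component to see a mismatch is the one that reports it.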

EMC's Yaron Dar, who wrote the techbook you quoted, "Oracle Databases on EMC Symmetrix Storage Systems", presented at OpenWorld this year covering T10 PI. A copy of the OOW slides can be found here, with the info on T10 PI from slide 33 onwards:

https://oracleus.wingateweb.com/published/oracleus2011/sessions/33580/S33580_1542630.pdf

Hope that this helps.

Allan
Principal Solutions Engineer,
Enterprise Applications,
Strategic Solutions Engineering
EMC Solutions Group (ESG)

--
http://www.freelists.org/webpage/oracle-l
Received on Mon Dec 12 2011 - 05:53:51 CST
