Home » SQL & PL/SQL » SQL & PL/SQL » Partitioning strategy for over time transactions
Partitioning strategy for over time transactions [message #304048] Mon, 03 March 2008 21:28 Go to next message
kman01
Messages: 12
Registered: January 2008
Junior Member
Hi Gurus,

Im developing a data warehouse solution for a client where the transaction they deal with doesn't take place in a single moment like a financial transaction. Instead their transactions take place over a period of time ranging from a few minutes to a few days. Essentially the transaction has a start and end date.

The client has many millions of records in the FACT table (close to a billion).

Im looking for a strategy to partition these records in a way that is usable in their normal queries.

The queries that we use are similar to the "Any Interact" clause used in spatial queries. So if the dates requested have any interaction with the start and end date of the transaction then we want those records.

I hope you understand this question. Im sure this issue has been faced by many people but not myself (Until Now)

Thankyou

Kingsley Callender
AirServices Australia.


[Updated on: Mon, 03 March 2008 22:14]

Report message to a moderator

Re: Partitioning strategy for over time transactions [message #304101 is a reply to message #304048] Tue, 04 March 2008 01:07 Go to previous messageGo to next message
coleing
Messages: 213
Registered: February 2008
Senior Member
Depepnds on the date ranges in the queries really.

You should collect all the queries against that table over a time period, and then make your decision.

Partitioning against the dates, might degrade performance against other queries.

Are the queries mainly based on financial tax year, calendar year, calendar month, tax month etc...

You want to create partitions that allow the maximum number of queries to be resolved within a single partition. You want to use as many LOCAL indexes as possible. Each tiem you create a global index, it will need to build against a billion or more rows.

Also think about archiving strategy. Do you ever need to archive rows off. Partitions can facilitate that also.

Sorry to be so vague, but there is no generic solution to partitioning. It really depends ont he queries being run.
Re: Partitioning strategy for over time transactions [message #304294 is a reply to message #304048] Tue, 04 March 2008 15:37 Go to previous messageGo to next message
kman01
Messages: 12
Registered: January 2008
Junior Member
Its a datawarehouse thus the queries are always date based (well at least 95%)

I.E. give me a set of data between date1 and date2. The queries we write always have to have three "or" clauses to ensure we get all information as the transactions also take place over a period

An example (Psuedo Coded)

SELECT *
FROM ALL_FLIGHTS
WHERE 
     (FLIGHT_START_DATE < :REPORT_END_PERIOD
      AND
      FLIGHT_START_DATE > :REPORT_START_PERIOD)
  OR 
     (FLIGHT_END_DATE < :REPORT_END_PERIOD
      AND
      FLIGHT_END_DATE > :REPORT_START_PERIOD)
  OR 
     (FLIGHT_START_DATE < :REPORT_START_PERIOD
      AND
      FLIGHT_END_DATE > :REPORT_END_PERIOD)

[Updated on: Tue, 04 March 2008 15:38]

Report message to a moderator

Re: Partitioning strategy for over time transactions [message #304303 is a reply to message #304048] Tue, 04 March 2008 16:37 Go to previous messageGo to next message
coleing
Messages: 213
Registered: February 2008
Senior Member
Ok, ill re-arrange my wording, my point is the same:-

If your reports are generally the same as you have said:-

You need to accomodate the most common query.

ie: :REPORT_START_DATE and :REPORT_END_DATE

IF they are generaly for a year, for a mnth, across many years etc...

This is what will determine your partition date ranges.

Ideally, you want the maximum number of queries to query a single partition as possible.

So partitioning by month, when most queries are for a whole year is inneficient, as all queries will be reading 12 partitions.

Same the opposite way around. You could be querying a whole years worth of data every time you run a query for a few days.

As a side point, if you have a 1 billion row fact table, why are the report queries not looking at time dimensions instead of looking at the whole fact table?
Re: Partitioning strategy for over time transactions [message #304308 is a reply to message #304303] Tue, 04 March 2008 17:38 Go to previous messageGo to next message
kman01
Messages: 12
Registered: January 2008
Junior Member
The reports are based mostly on monthly some more some less but a guesstimate would be monthly.

Which column\value would you partition on Start Date or End Date or is there some magic bullet hidden in the supplied packages that deals with date ranges.

If i partition via START_DATE the query based on the start_date would know what partition the record lives in but since i also have to include end date it will end up going through them all at a minimum a global scan of an index (bypassing partitioning).

I hope you see my issue here. Its not the amount of time between partitioning but instead what to base the partitions on.

Maybe im looking for some magic on the query side like "Does date range A interact with date range B" this would make it easy to partition as id do it via the "Date Range" column/datatype/function what ever they provide?.

Im suprised oracle do not provide a mechanism for this. Ive dealt with spatial information and the concept of "ANY INTERACT" works well there so why not for date ranges?

I hope this clears the issue of the time period between partitions i can play with this an optimise as needed.

The unique thing with this is that the transactions occur over a period of time not at a single point like a normal transaction based system.


Thankyou

Kingsley

Re: Partitioning strategy for over time transactions [message #304323 is a reply to message #304308] Tue, 04 March 2008 20:47 Go to previous messageGo to next message
rleishman
Messages: 3724
Registered: October 2005
Location: Melbourne, Australia
Senior Member
Firstly, if you can make an assumption that :REPORT_END_PERIOD > :REPORT_START_PERIOD and a similar assumption that FLIGHT_END_DATE > FLIGHT_START_DATE, then I think your sample query can be condensed:
SELECT *
FROM ALL_FLIGHTS
WHERE 
     (FLIGHT_START_DATE < :REPORT_END_PERIOD
      AND
      FLIGHT_END_DATE > :REPORT_START_PERIOD)


So what we have is two unbounded range scans on different columns.

This is a nasty situation and goes beyond partitioning. See this article for more issues on range-keyed tables.

There is simply no good way to partition range-keyed tables (to the advantage of range queries) unless you can impose a limited maximum duration on the range.

If - for example - you could limit the duration of a flight to no more than 2 days, and had column constraints on the table that enforced:
- FLIGHT_START_DATE < FLIGHT_END_DATE
- FLIGHT_START_DATE > FLIGHT_END_DATE - 2
- FLIGHT_END_DATE < FLIGHT_START_DATE + 2

Then by transitive logic, the above (condensed) SQL would become:
SELECT *
FROM ALL_FLIGHTS
WHERE 
      FLIGHT_START_DATE < :REPORT_END_PERIOD
AND   FLIGHT_END_DATE - 2 < :REPORT_END_PERIOD
AND   FLIGHT_END_DATE > :REPORT_START_PERIOD
AND   FLIGHT_START_DATE + 2 > :REPORT_START_PERIOD


further arithmetic logic gives us:

SELECT *
FROM ALL_FLIGHTS
WHERE 
      FLIGHT_START_DATE < :REPORT_END_PERIOD
AND   FLIGHT_START_DATE > :REPORT_START_PERIOD - 2
AND   FLIGHT_END_DATE > :REPORT_START_PERIOD
AND   FLIGHT_END_DATE < :REPORT_END_PERIOD + 2


which is of course:

SELECT *
FROM ALL_FLIGHTS
WHERE 
      FLIGHT_START_DATE BETWEEN :REPORT_START_PERIOD - 2 AND :REPORT_END_PERIOD
AND   FLIGHT_END_DATE BETWEEN :REPORT_START_PERIOD :REPORT_END_PERIOD + 2


This would allow you to partition on either date to advantage.

Only problem is that Oracle will probably not perform all of these transitions for you. There is some transitive logic built into the CBO, but I suspect it works only for = predicates. It certainly doesn't do arithmetic transitions. You would have to supply that final SQL as-is.
Re: Partitioning strategy for over time transactions [message #304328 is a reply to message #304323] Tue, 04 March 2008 21:45 Go to previous message
kman01
Messages: 12
Registered: January 2008
Junior Member
rleishman wrote on Tue, 04 March 2008 20:47


SELECT *
FROM ALL_FLIGHTS
WHERE 
     (FLIGHT_START_DATE < :REPORT_END_PERIOD
      AND
      FLIGHT_END_DATE > :REPORT_START_PERIOD)





Nice work! i like that. didnt look at it this way.

I think we will put a MAX/MIN identifier in the DIM_DATE table this way based on the date given we will be able to narrow the query down to an ID range. Everything that we record that flys has a limited time in the air.

The ID is based on the Departure Date and a few other bits. We will also partition by the ID.

This was pretty well the conclusion we came to without deliving into the world of mathematics and trying to figure out some complex way of storing date ranges and then comparing them. I think Oracle could have done this for us already! Maybe 12G.

Thankyou for your time and effort on this issue.

Kingsley



Previous Topic: insert append over dblink
Next Topic: Dynamic Variable Declaration how to do ? (merged)
Goto Forum:
  


Current Time: Sat Dec 10 20:15:09 CST 2016

Total time taken to generate the page: 0.09727 seconds