Oracle FAQ Your Portal to the Oracle Knowledge Grid
HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US
 

Home -> Community -> Usenet -> comp.databases.theory -> Re: One-to-many relationship, non-equality join criteria. Can't come up with a decent solution

Re: One-to-many relationship, non-equality join criteria. Can't come up with a decent solution

From: John Gilson <jag_at_acm.org>
Date: Mon, 03 Mar 2003 20:12:04 GMT
Message-ID: <oGO8a.52200$Mh3.18654926@twister.nyc.rr.com>


"nippur" <dberge_at_mx2.redestb.es> wrote in message news:ee93444d.0303031009.12e94730_at_posting.google.com...
> Hi,
>
> I can't seem to come up with an efficient solution for the
> following problem:
>
> BACKGROUND
> ----------
>
> I have data taken from a driving monitor and data from a journey
> log, it consists of information such as vehicle and driver ids, date
> and time of start and end of a trip, and other things. The goal is to
> relate them both, but there are the following complications:
>
> 1.- start/end times in the driving monitor and journey log don't
> match exactly.
>
> 2.- there isn't a one-to-one correspondence between driving monitor
> and journey log records, as the monitor usually resets itself after a
> short period (e.g., a rest stop in the middle of a journey), and then
> starts a new record, but that still comes up as one entry in the
> journey log (we'll ignore the case where one driving monitor record
> comes under two log entries, for simplicity).
>
> so the idea is to match a driving monitor record with the latest
> journey started just before the DM's start time (ignoring for the
> moment the fact that the log timestamp could be later than the
> monitor's)

SELECT M.vehicle_id, M.odometer, L1.journey_id FROM DrivingMonitor AS M

            INNER JOIN
            JourneyLog AS L1
            ON M.vehicle_id = L1.vehicle_id AND
                   L1.dt_start <= M.dt_start
            LEFT OUTER JOIN
            JourneyLog AS L2
            ON L2.vehicle_id = L1.vehicle_id AND
                   L2.dt_start > L1.dt_start AND
                   L2.dt_start <= M.dt_start
WHERE L2.vehicle_id IS NULL

Given the described characteristics of the data, this might help you with your more general problem:

SELECT M.vehicle_id, M.odometer, L1.journey_id FROM DrivingMonitor AS M

            INNER JOIN
            JourneyLog AS L
            ON M.vehicle_id = L.vehicle_id AND
                   (M.dt_start BETWEEN L.dt_start AND L.dt_end OR
                    L.dt_start BETWEEN M.dt_start AND M.dt_end)

Regards,
jag

> Graphically, a typical case could be represented a bit like this:
>
> Journey Log: |-----------------------------------------------|
> (A)
>
> Driving Monitor: |------------| |------| |------|
> |-------------|
> (1) (2) (3) (4)
>
> The horizontal axis represents time, the lines represent JL or DM
> entries, the letters/numbers in parentheses would be their respective
> IDs. In this case the idea is to end a with a relation where (A)
> links to (1), (2), (3), & (4), and nothing else.
>
>
> THE DATA
> --------
>
> Here's a sketch of my schema:
>
> dr_monitor
> ==========
> vehicle_id (pk)
> odometer (pk)
> dt_start (idx)
> ...
>
> j_log
> =====
> journey_id (pk)
> vehicle_id
> dt_start (idx)
> ...
>
> relation
> ========
> vehicle_id (pk / fk ref. dr_monitor.vehicle_id)
> odometer (pk / fk ref. dr_monitor.odometer)
> journey_id (pk / fk ref. j_log.journey_id)
>
> I tried something like:
>
> INSERT INTO relation (vehicle_id, odometer, journey_id)
> SELECT m.vehicle_id, odometer, journey_id
> FROM dr_monitor m, j_log l
> WHERE m.vehicle_id = l.vehicle_id
> AND l.dt_start = (SELECT MAX(dt_start) FROM j_log WHERE
> j_log.dt_start <= dr_monitor.dt_start);
>
> but a) the RDBMS I will be using (MySQL) doesn't support subqueries,
> and b) from testing in MS-Access with a reduced data set, it took
> painfully long.
>
> then I also tried a procedural approach:
>
> ------ START PSEUDO-CODE ------
>
> $dataset1 = do_query("SELECT vehicle_id, odometer, dt_start
> FROM dr_monitor;");
>
> foreach($row in $dataset1)
> {
> $dataset2 = do_query("SELECT journey_id
> FROM j_log
> WHERE vehicle_id = $row['vehicle_id']
> AND dt_start <= $row['dt_start']
> ORDER BY dt_start DESC
>
> $results[] = fetch_row($dataset2);
> }
>
> // Run INSERT query with the data from $results
>
> ------ END PSEUDO-CODE ------
>
> However, it didn't help much execution time-wise (there 26000
> journey logs and over 200000 driving monitor records).
>
> Has anybody come across a similar case before? I've run out of
> ideas for the moment, so I would be interested in any ideas,
> suggestions, comments from you people.
>
> TIA.
>
> Regards,
> Diego Berge.
> --
> NOTE: e-mail address is not valid. Please reply to newsgroup.
> (a Google search will come up with valid e-mail addresses if you need
> to contact me privately)
Received on Mon Mar 03 2003 - 14:12:04 CST

Original text of this message

HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US