One-to-many relationship, non-equality join criteria. Can't come up with a decent solution
Date: 3 Mar 2003 10:09:30 -0800
Message-ID: <ee93444d.0303031009.12e94730_at_posting.google.com>
Hi,
I can't seem to come up with an efficient solution for the
following problem:
BACKGROUND
I have data taken from a driving monitor and data from a journey
log, it consists of information such as vehicle and driver ids, date
and time of start and end of a trip, and other things. The goal is to
relate them both, but there are the following complications:
1.- start/end times in the driving monitor and journey log don't
match exactly.
2.- there isn't a one-to-one correspondence between driving monitor
and journey log records, as the monitor usually resets itself after a
short period (e.g., a rest stop in the middle of a journey), and then
starts a new record, but that still comes up as one entry in the
journey log (we'll ignore the case where one driving monitor record
comes under two log entries, for simplicity).
so the idea is to match a driving monitor record with the latest
journey started just before the DM's start time (ignoring for the
moment the fact that the log timestamp could be later than the
Graphically, a typical case could be represented a bit like this:
Journey Log: |-----------------------------------------------| (A)
Driving Monitor: |------------| |------| |------| |-------------|
(1) (2) (3) (4)
The horizontal axis represents time, the lines represent JL or DM entries, the letters/numbers in parentheses would be their respective IDs. In this case the idea is to end a with a relation where (A) links to (1), (2), (3), & (4), and nothing else.
THE DATA
Here's a sketch of my schema:
dr_monitor
vehicle_id (pk) odometer (pk) dt_start (idx)
...
j_log
journey_id (pk)
vehicle_id
dt_start (idx)
...
relation
vehicle_id (pk / fk ref. dr_monitor.vehicle_id) odometer (pk / fk ref. dr_monitor.odometer) journey_id (pk / fk ref. j_log.journey_id)
I tried something like:
INSERT INTO relation (vehicle_id, odometer, journey_id)
SELECT m.vehicle_id, odometer, journey_id
FROM dr_monitor m, j_log l
WHERE m.vehicle_id = l.vehicle_id
AND l.dt_start = (SELECT MAX(dt_start) FROM j_log WHERE j_log.dt_start <= dr_monitor.dt_start);
but a) the RDBMS I will be using (MySQL) doesn't support subqueries, and b) from testing in MS-Access with a reduced data set, it took painfully long.
then I also tried a procedural approach:
- START PSEUDO-CODE ------
FROM dr_monitor;");
foreach($row in $dataset1)
{
$dataset2 = do_query("SELECT journey_id
FROM j_log WHERE vehicle_id = $row['vehicle_id'] AND dt_start <= $row['dt_start'] ORDER BY dt_start DESC
$results[] = fetch_row($dataset2);
}
// Run INSERT query with the data from $results
- END PSEUDO-CODE ------
Has anybody come across a similar case before? I've run out of ideas for the moment, so I would be interested in any ideas, suggestions, comments from you people.
TIA.
Regards,
Diego Berge.
-- NOTE: e-mail address is not valid. Please reply to newsgroup. (a Google search will come up with valid e-mail addresses if you need to contact me privately)Received on Mon Mar 03 2003 - 19:09:30 CET