Path: news.easynews.com!newsfeed1.easynews.com!easynews.com!easynews!news-west.rr.com!news-server.columbus.rr.com!cyclone.rdc-nyc.rr.com!news-out.nyc.rr.com!twister.nyc.rr.com.POSTED!not-for-mail Reply-To: "John Gilson" From: "John Gilson" Newsgroups: comp.databases.theory References: Subject: Re: One-to-many relationship, non-equality join criteria. Can't come up with a decent solution Lines: 149 X-Priority: 3 X-MSMail-Priority: Normal X-Newsreader: Microsoft Outlook Express 6.00.2800.1106 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1106 Message-ID: Date: Mon, 03 Mar 2003 20:12:04 GMT NNTP-Posting-Host: 66.65.26.174 X-Complaints-To: abuse@rr.com X-Trace: twister.nyc.rr.com 1046722324 66.65.26.174 (Mon, 03 Mar 2003 15:12:04 EST) NNTP-Posting-Date: Mon, 03 Mar 2003 15:12:04 EST Organization: Road Runner - NYC Xref: newsfeed1.easynews.com comp.databases.theory:25135 X-Received-Date: Mon, 03 Mar 2003 13:15:58 MST (news.easynews.com) "nippur" wrote in message news:ee93444d.0303031009.12e94730@posting.google.com... > Hi, > > I can't seem to come up with an efficient solution for the > following problem: > > BACKGROUND > ---------- > > I have data taken from a driving monitor and data from a journey > log, it consists of information such as vehicle and driver ids, date > and time of start and end of a trip, and other things. The goal is to > relate them both, but there are the following complications: > > 1.- start/end times in the driving monitor and journey log don't > match exactly. > > 2.- there isn't a one-to-one correspondence between driving monitor > and journey log records, as the monitor usually resets itself after a > short period (e.g., a rest stop in the middle of a journey), and then > starts a new record, but that still comes up as one entry in the > journey log (we'll ignore the case where one driving monitor record > comes under two log entries, for simplicity). > > so the idea is to match a driving monitor record with the latest > journey started just before the DM's start time (ignoring for the > moment the fact that the log timestamp could be later than the > monitor's) SELECT M.vehicle_id, M.odometer, L1.journey_id FROM DrivingMonitor AS M INNER JOIN JourneyLog AS L1 ON M.vehicle_id = L1.vehicle_id AND L1.dt_start <= M.dt_start LEFT OUTER JOIN JourneyLog AS L2 ON L2.vehicle_id = L1.vehicle_id AND L2.dt_start > L1.dt_start AND L2.dt_start <= M.dt_start WHERE L2.vehicle_id IS NULL Given the described characteristics of the data, this might help you with your more general problem: SELECT M.vehicle_id, M.odometer, L1.journey_id FROM DrivingMonitor AS M INNER JOIN JourneyLog AS L ON M.vehicle_id = L.vehicle_id AND (M.dt_start BETWEEN L.dt_start AND L.dt_end OR L.dt_start BETWEEN M.dt_start AND M.dt_end) Regards, jag > Graphically, a typical case could be represented a bit like this: > > Journey Log: |-----------------------------------------------| > (A) > > Driving Monitor: |------------| |------| |------| > |-------------| > (1) (2) (3) (4) > > The horizontal axis represents time, the lines represent JL or DM > entries, the letters/numbers in parentheses would be their respective > IDs. In this case the idea is to end a with a relation where (A) > links to (1), (2), (3), & (4), and nothing else. > > > THE DATA > -------- > > Here's a sketch of my schema: > > dr_monitor > ========== > vehicle_id (pk) > odometer (pk) > dt_start (idx) > ... > > j_log > ===== > journey_id (pk) > vehicle_id > dt_start (idx) > ... > > relation > ======== > vehicle_id (pk / fk ref. dr_monitor.vehicle_id) > odometer (pk / fk ref. dr_monitor.odometer) > journey_id (pk / fk ref. j_log.journey_id) > > I tried something like: > > INSERT INTO relation (vehicle_id, odometer, journey_id) > SELECT m.vehicle_id, odometer, journey_id > FROM dr_monitor m, j_log l > WHERE m.vehicle_id = l.vehicle_id > AND l.dt_start = (SELECT MAX(dt_start) FROM j_log WHERE > j_log.dt_start <= dr_monitor.dt_start); > > but a) the RDBMS I will be using (MySQL) doesn't support subqueries, > and b) from testing in MS-Access with a reduced data set, it took > painfully long. > > then I also tried a procedural approach: > > ------ START PSEUDO-CODE ------ > > $dataset1 = do_query("SELECT vehicle_id, odometer, dt_start > FROM dr_monitor;"); > > foreach($row in $dataset1) > { > $dataset2 = do_query("SELECT journey_id > FROM j_log > WHERE vehicle_id = $row['vehicle_id'] > AND dt_start <= $row['dt_start'] > ORDER BY dt_start DESC > > $results[] = fetch_row($dataset2); > } > > // Run INSERT query with the data from $results > > ------ END PSEUDO-CODE ------ > > However, it didn't help much execution time-wise (there 26000 > journey logs and over 200000 driving monitor records). > > Has anybody come across a similar case before? I've run out of > ideas for the moment, so I would be interested in any ideas, > suggestions, comments from you people. > > TIA. > > Regards, > Diego Berge. > -- > NOTE: e-mail address is not valid. Please reply to newsgroup. > (a Google search will come up with valid e-mail addresses if you need > to contact me privately)