Home » SQL & PL/SQL » SQL & PL/SQL » SQL problem - variation on top-n-of-group. DDL and DML supplied. (Oracle - any version this millenium, also MySQL and PostgreSQL.)
SQL problem - variation on top-n-of-group. DDL and DML supplied. [message #653343] Sat, 02 July 2016 20:34 Go to next message
dragam
Messages: 31
Registered: May 2012
Location: Ireland
Member



Hi all,

I have an SQL problem and I'm stumped.


I have the following data (see DDL and DML at end of post)

"ex_id"; "c_id"; "guid"; "supplier"; "ts"
-------- ------- -------- ----------- ---------------------
42; 4; "aaaa"; 61; "2016-07-01 22:05:01"
34; 1; "xxxx"; 50; "2016-07-01 22:05:01"
33; 1; "xxxx"; 54; "2016-07-01 22:05:01"
35; 1; "xxxx"; 50; "2016-07-15 22:05:01"
36; 1; "xxxx"; 50; "2016-07-15 22:05:01"
37; 2; "yyyy"; 78; "2016-07-01 22:05:01"
38; 2; "yyyy"; 79; "2016-07-01 22:05:01"
39; 2; "yyyy"; 71; "2016-07-17 22:05:01"
40; 2; "yyyy"; 74; "2016-07-17 22:05:01"
41; 3; "zzzz"; 60; "2016-07-01 22:05:01"


and the resulting table - i.e. the records I want to retain in the table are:


"ex_id"; "c_id"; "guid"; "supplier"; "ts"
-------- ------- -------- ----------- ---------------------
42; 4; "aaaa"; 61; "2016-07-01 22:05:01"
35; 1; "xxxx"; 50; "2016-07-15 22:05:01"
36; 1; "xxxx"; 50; "2016-07-15 22:05:01"
39; 2; "yyyy"; 71; "2016-07-17 22:05:01"
40; 2; "yyyy"; 74; "2016-07-17 22:05:01"
41; 3; "zzzz"; 60; "2016-07-01 22:05:01"



i.e. Take the combination of c_id, guid and date which is the last, no matter
how many records there are with a given combination of same. The (guid) 'aaaa'
and the 'zzzz' records obviously stay since there's only 1 of each.

But the records with 'xxxx' and 'yyyy' have to be "pruned" - only keep those
with the latest date where c_id matches and guid matches.

1) This query will also have to work with MySQL, so CTEs and Analytic functions
are out - good old fashioned SQL only.

2) Ideally, I would like two queries, one under the assumption that ex_id is
sequential with time and the other that ex_id is not necessarily in sync with
the timestamp.

3) This appears to be a "top-n-per-group" problem of some sort - I would
appreciate an explanation of the query, not just the raw query itself - I'm
trying to learn rather than just a bald answer. References, URLs &c appreciated.

Just to show that I'm not totally lazy Smile, I have formulated what I think is
the first step, but I'm stuck. It would be easy enough to do this with CTEs but,
as I've said, the query also has to work with MySQL.

If you want to throw in solutions based on CTEs and/or SET operators and/or
Analytic functions, for pedagological purposes, great, but I do need a solution
that will also work with MySQL.



SELECT t1.c_id AS theid1, t1.guid AS theguid1, MAX(t1.ts) AS mydate1 FROM ex t1
INNER JOIN ex t2
ON t1.c_id = t2.c_id AND t1.guid = t2.guid
GROUP BY t1.c_id, t1.guid
ORDER BY t1.guid, MAX(t1.ts)

"theid1"; "theguid1"; "mydate1"
--------- ----------- ---------------------
4; "aaaa"; "2016-07-01 22:05:01"
1; "xxxx"; "2016-07-15 22:05:01"
2; "yyyy"; "2016-07-17 22:05:01"
3; "zzzz"; "2016-07-01 22:05:01"



Now, these are the records that **shouldn't** be deleted - without
the supplier data - which also has to be retained - i.e. I need to
retain more records than this - but this should be a start.


I just can't figure out how to formulate the statement to delete the
other records.

Maybe with SET operators, if MySQL had those?

Anyway, all and any input appreciated. If any further info. is required, please
let me know.


============= DDL and DML for the table ===============

CREATE SEQUENCE ex_seq
START WITH 1
INCREMENT BY 1
CACHE 100;



CREATE TABLE ex
(
ex_id number,
c_id number,
guid varchar(32),
supplier number,
ts timestamp,
PRIMARY KEY (ex_id)
);



insert into ex (ex_id, c_id,, c_id, guid, supplier, ts) VALUES (ex_seq.nextval, 1, 'xxxx', 54, '2016-07-01 22:05:01');
insert into ex (ex_id, c_id,, c_id, guid, supplier, ts) VALUES (ex_seq.nextval, 1, 'xxxx', 53, '2016-07-01 22:05:01');
insert into ex (ex_id, c_id,, c_id, guid, supplier, ts) VALUES (ex_seq.nextval, 1, 'xxxx', 51, '2016-07-15 22:05:01');
insert into ex (ex_id, c_id,, c_id, guid, supplier, ts) VALUES (ex_seq.nextval, 1, 'xxxx', 50, '2016-07-15 22:05:01');


insert into ex (ex_id, c_id,, c_id, guid, supplier, ts) VALUES (ex_seq.nextval, 2, 'yyyy', 78, '2016-07-01 22:05:01');
insert into ex (ex_id, c_id,, c_id, guid, supplier, ts) VALUES (ex_seq.nextval, 2, 'yyyy', 79, '2016-07-01 22:05:01');
insert into ex (ex_id, c_id,, c_id, guid, supplier, ts) VALUES (ex_seq.nextval, 2, 'yyyy', 71, '2016-07-17 22:05:01');
insert into ex (ex_id, c_id,, c_id, guid, supplier, ts) VALUES (ex_seq.nextval, 2, 'yyyy', 74, '2016-07-17 22:05:01');

insert into ex (ex_id, c_id,, c_id, guid, supplier, ts) VALUES (ex_seq.nextval, 3, 'zzzz', 60, '2016-07-01 22:05:01');
insert into ex (ex_id, c_id,, c_id, guid, supplier, ts) VALUES (ex_seq.nextval, 4, 'aaaa', 61, '2016-07-01 22:05:01');

[Updated on: Sat, 02 July 2016 21:01]

Report message to a moderator

Re: SQL problem - variation on top-n-of-group. DDL and DML supplied. [message #653344 is a reply to message #653343] Sat, 02 July 2016 21:31 Go to previous messageGo to next message
Barbara Boehmer
Messages: 9090
Registered: November 2002
Location: California, USA
Senior Member
Your insert statements are invalid. The number of columns to be inserted into and the number of values to insert do not match and the timestamps do not have formats. Also, the data does not match the other things that you posted. So, I have ignored the inserts and used the other things that you posted. It is not reasonable to expect the same SQL statements to work on Oracle and MySQL. The following is the simplest SQL that runs on Oracle that I can think of that gets the desired results. Hopefully, that will be similar enough to what you need for MySQL that you can figure out what to do for that as well. What it does is select the c_id and guid and the maximum ts for each of those unique combinations, then deletes all rows where the c_id and guid and ts are not in that list of unique combinations. I have broken down the steps for better understanding, but all you need is the delete statement.

-- starting data:
SCOTT@orcl_12.1.0.2.0> SELECT * FROM ex
  2  /

     EX_ID       C_ID GUID         SUPPLIER TS
---------- ---------- ---------- ---------- --------------------
        42          4 aaaa               61 2016-07-01 22:05:01
        34          1 xxxx               50 2016-07-01 22:05:01
        33          1 xxxx               54 2016-07-01 22:05:01
        35          1 xxxx               50 2016-07-15 22:05:01
        36          1 xxxx               50 2016-07-15 22:05:01
        37          2 yyyy               78 2016-07-01 22:05:01
        38          2 yyyy               79 2016-07-01 22:05:01
        39          2 yyyy               71 2016-07-17 22:05:01
        40          2 yyyy               74 2016-07-17 22:05:01
        41          3 zzzz               60 2016-07-01 22:05:01

10 rows selected.

-- criteria for rows that you want to keep:
SCOTT@orcl_12.1.0.2.0> SELECT c_id, guid, MAX(ts) ts
  2  FROM   ex
  3  GROUP  BY c_id, guid
  4  /

      C_ID GUID       TS
---------- ---------- --------------------
         4 aaaa       2016-07-01 22:05:01
         1 xxxx       2016-07-15 22:05:01
         2 yyyy       2016-07-17 22:05:01
         3 zzzz       2016-07-01 22:05:01

4 rows selected.

-- rows you want to keep:
SCOTT@orcl_12.1.0.2.0> SELECT *
  2  FROM   ex
  3  WHERE  (c_id, guid, ts) IN
  4  	    (SELECT c_id, guid, MAX(ts)
  5  	     FROM   ex
  6  	     GROUP  BY c_id, guid)
  7  /

     EX_ID       C_ID GUID         SUPPLIER TS
---------- ---------- ---------- ---------- --------------------
        42          4 aaaa               61 2016-07-01 22:05:01
        35          1 xxxx               50 2016-07-15 22:05:01
        36          1 xxxx               50 2016-07-15 22:05:01
        39          2 yyyy               71 2016-07-17 22:05:01
        40          2 yyyy               74 2016-07-17 22:05:01
        41          3 zzzz               60 2016-07-01 22:05:01

6 rows selected.

-- rows you do not want to keep:
SCOTT@orcl_12.1.0.2.0> SELECT *
  2  FROM   ex
  3  WHERE  (c_id, guid, ts) NOT IN
  4  	    (SELECT c_id, guid, MAX(ts)
  5  	     FROM   ex
  6  	     GROUP  BY c_id, guid)
  7  /

     EX_ID       C_ID GUID         SUPPLIER TS
---------- ---------- ---------- ---------- --------------------
        34          1 xxxx               50 2016-07-01 22:05:01
        33          1 xxxx               54 2016-07-01 22:05:01
        37          2 yyyy               78 2016-07-01 22:05:01
        38          2 yyyy               79 2016-07-01 22:05:01

4 rows selected.

-- delete rows you do not want to keep:
SCOTT@orcl_12.1.0.2.0> DELETE FROM ex
  2  WHERE  (c_id, guid, ts) NOT IN
  3  	    (SELECT c_id, guid, MAX(ts)
  4  	     FROM   ex
  5  	     GROUP  BY c_id, guid)
  6  /

4 rows deleted.

-- results:
SCOTT@orcl_12.1.0.2.0> SELECT * FROM ex
  2  /

     EX_ID       C_ID GUID         SUPPLIER TS
---------- ---------- ---------- ---------- --------------------
        42          4 aaaa               61 2016-07-01 22:05:01
        35          1 xxxx               50 2016-07-15 22:05:01
        36          1 xxxx               50 2016-07-15 22:05:01
        39          2 yyyy               71 2016-07-17 22:05:01
        40          2 yyyy               74 2016-07-17 22:05:01
        41          3 zzzz               60 2016-07-01 22:05:01

6 rows selected.

Once you are satisfied with the results, then:

SCOTT@orcl_12.1.0.2.0> COMMIT
  2  /

Commit complete.

edit:
I see that while I was testing and posting, you edited your post and fixed the mismatched number of columns, but they still lack timestamp formats and don't match your other posted data.

[Updated on: Sat, 02 July 2016 21:33]

Report message to a moderator

Re: SQL problem - variation on top-n-of-group. DDL and DML supplied. [message #653403 is a reply to message #653344] Mon, 04 July 2016 14:38 Go to previous messageGo to next message
dragam
Messages: 31
Registered: May 2012
Location: Ireland
Member


Hi Barbara,

Thanks for your input. My humble apologies for the SNAFU over
my DDL and DML - SQLFiddle wasn't available and I winged it Sad

However, all is not lost - because of this, I discovered the
livesql.oracle.com site - where one can test snippets like the
above and ensure that they're working correctly.

Even though the interface leaves a **lot** to be desired, at
least it will ensure that this error (at least on my part Smile )
will not happen again.

Needless to say, your logical and simple query won't work on
MySQL - but that's an issue for other sites.

Thanks again and rgs,


Ellen.

Re: SQL problem - variation on top-n-of-group. DDL and DML supplied. [message #653404 is a reply to message #653403] Mon, 04 July 2016 14:56 Go to previous messageGo to next message
BlackSwan
Messages: 26766
Registered: January 2009
Location: SoCal
Senior Member
Ellen,

You should install XE on your PC to making code testing & development easier.
Re: SQL problem - variation on top-n-of-group. DDL and DML supplied. [message #653408 is a reply to message #653404] Mon, 04 July 2016 16:49 Go to previous messageGo to next message
dragam
Messages: 31
Registered: May 2012
Location: Ireland
Member
BlackSwan wrote on Mon, 04 July 2016 21:56
Ellen,

You should install XE on your PC to making code testing & development easier.



Yes - but I'm travelling and my trusty old laptop doesn't have sufficient
swap space for XE - this machine is a bit like an old dog that one
doesn't want to put down! Smile

Ellen.
Re: SQL problem - variation on top-n-of-group. DDL and DML supplied. [message #653409 is a reply to message #653403] Mon, 04 July 2016 17:20 Go to previous message
Barbara Boehmer
Messages: 9090
Registered: November 2002
Location: California, USA
Senior Member
MySQL:

mysql> select * from mysql.ex;
+-------+------+------+----------+------------+
| ex_id | c_id | guid | supplier | ts         |
+-------+------+------+----------+------------+
|    42 |    4 | aaaa |       61 | 2016-07-01 |
|    34 |    1 | xxxx |       50 | 2016-07-01 |
|    33 |    1 | xxxx |       54 | 2016-07-01 |
|    35 |    1 | xxxx |       50 | 2016-07-15 |
|    36 |    1 | xxxx |       50 | 2016-07-15 |
|    37 |    2 | yyyy |       78 | 2016-07-01 |
|    38 |    2 | yyyy |       79 | 2016-07-01 |
|    39 |    2 | yyyy |       71 | 2016-07-17 |
|    40 |    2 | yyyy |       74 | 2016-07-17 |
|    41 |    3 | zzzz |       60 | 2016-07-01 |
+-------+------+------+----------+------------+
10 rows in set (0.00 sec)

mysql> delete t1.*
    -> from   mysql.ex t1
    -> left join
    ->        (select c_id, guid, max(ts) ts
    ->         from   mysql.ex
    ->         group  by c_id, guid) t2
    -> on     t1.c_id = t2.c_id and t1.guid = t2.guid and t1.ts = t2.ts
    -> where  t2.c_id is null;
Query OK, 4 rows affected (0.05 sec)

mysql> select * from mysql.ex;
+-------+------+------+----------+------------+
| ex_id | c_id | guid | supplier | ts         |
+-------+------+------+----------+------------+
|    42 |    4 | aaaa |       61 | 2016-07-01 |
|    35 |    1 | xxxx |       50 | 2016-07-15 |
|    36 |    1 | xxxx |       50 | 2016-07-15 |
|    39 |    2 | yyyy |       71 | 2016-07-17 |
|    40 |    2 | yyyy |       74 | 2016-07-17 |
|    41 |    3 | zzzz |       60 | 2016-07-01 |
+-------+------+------+----------+------------+
6 rows in set (0.00 sec)

mysql>

Also, we do have a MySQL sub-forum on these forums:

http://www.orafaq.com/forum/f/59/

[Updated on: Mon, 04 July 2016 17:29]

Report message to a moderator

Previous Topic: String Manipulations
Next Topic: query doubt
Goto Forum:
  


Current Time: Fri Apr 26 16:16:02 CDT 2024