Home » SQL & PL/SQL » SQL & PL/SQL » Alternative approach for a query run programmatically. (12)
Alternative approach for a query run programmatically. [message #683574] Mon, 01 February 2021 18:13 Go to next message
ritmo2k
Messages: 11
Registered: October 2020
Junior Member
I have a query that returns a large data set from a table where each row is unique.
This has worked well for years to stream and consume in code. However, an update
is pending that requires associating it with a few other tables which contain one
to zero or more records.

For example:
CREATE TABLE "Users" (
	"id" INT,
	"FirstName" VARCHAR2(255),
	"LastName" VARCHAR2(255) NOT NULL,
	constraint USERS_PK PRIMARY KEY ("id")
);

CREATE TABLE "UserData" (
	"id" INT NOT NULL,
	"user_id" INT NOT NULL,
	"ColA" INT NOT NULL,
	"ColB" INT NOT NULL,
	constraint USERDATA_PK PRIMARY KEY ("id")
);
ALTER TABLE "UserData" ADD CONSTRAINT "UserData_fk0" FOREIGN KEY ("user_id") REFERENCES "Users"("id");
As the UserData table contains from 0 to many rows for each account, this
makes a regular join difficult. I also do not want to consume all the related tables
initially and hold them in memory.

One approach may be to use an ordered union and select null for each column from
the other tables I want, and also select a fixed value that is unique to each table
so as I stream it, I can parse it.

For example:
WITH cte AS (
    SELECT 'a' "tb", id, FirstName, LastName, NULL ColA, NULL ColB FROM Users
    UNION ALL
    SELECT 'b' "tb", user_id id, NULL FirstName, NULL LastName, ColA, ColB FROM UserData
)
SEELCT * FROM cte
ORDER BY tb,id
Does a more elegant facility exist with Oracle to help me with streaming the
related data from these tables without consuming all of the 1 to many related
tables first?

Re: Alternative approach for a query run programmatically. [message #683576 is a reply to message #683574] Tue, 02 February 2021 01:06 Go to previous messageGo to next message
John Watson
Messages: 8535
Registered: January 2010
Location: Global Village
Senior Member
Why can you not use an outer join?

Possibly the problem is your use of terms "streaming", "consuming", and "hold in memory". What do you mean by these?
Re: Alternative approach for a query run programmatically. [message #683577 is a reply to message #683576] Tue, 02 February 2021 08:32 Go to previous messageGo to next message
ritmo2k
Messages: 11
Registered: October 2020
Junior Member
Hi,
I have a few other tables to join, as they are also one to many, therefore it becomes ambiguous as to what rows are or are not duplicates.

For example, the primary table will have unique rows, so repeating values are understood to be duplicates. But the second and third table may have 5 and 8 rows for a given record in the primary table.

Using an outer join would make it ambiguous as to what the actual row count is for the final result set for a given record.


One approach is to read all the data in the related tables initially and build a map, then read the data from the primary set and lookup any related data. This application does not hold all the data in memory by design, as it requires data, it sends it off to another consuming application which keeps its memory usage low. Reading all the data at once undermines that.
Re: Alternative approach for a query run programmatically. [message #683578 is a reply to message #683577] Tue, 02 February 2021 11:50 Go to previous messageGo to next message
Solomon Yakobson
Messages: 3048
Registered: January 2010
Location: Connecticut, USA
Senior Member
ritmo2k wrote on Tue, 02 February 2021 09:32

Using an outer join would make it ambiguous as to what the actual row count is for the final result set for a given record.
What do you want to count? Just number of users? If so, COUNT(DISTINCT "Users"."id") will give you that. Need count of row for each id? COUNT(*) OVER(PARTITION BY Users"."id") will give you that.

SY.
Re: Alternative approach for a query run programmatically. [message #683580 is a reply to message #683578] Wed, 03 February 2021 13:05 Go to previous messageGo to next message
ritmo2k
Messages: 11
Registered: October 2020
Junior Member
Hi,
I am not trying to count any quantity of rows.

Exploring the join approach, how can I include a column in the join that indicates what table the result set originates from?
Re: Alternative approach for a query run programmatically. [message #683581 is a reply to message #683580] Wed, 03 February 2021 13:30 Go to previous messageGo to next message
John Watson
Messages: 8535
Registered: January 2010
Location: Global Village
Senior Member
Quote:
how can I include a column in the join that indicates what table the result set originates from?
This question suggests that you may not be familiar with the concepts behind relational engineering. A row generated by joining two or more rows does not come from one table, it comes from two tables. Looking back at your other posts there are more indications that you are not familiar with some underlying concepts. Your use of "unique" and "duplicate" is rather odd, for example.

If you could describe what you need to do with these rows (this "consumption" process) it might help.
Re: Alternative approach for a query run programmatically. [message #683582 is a reply to message #683581] Wed, 03 February 2021 14:19 Go to previous messageGo to next message
ritmo2k
Messages: 11
Registered: October 2020
Junior Member
Hi,
Sorry for the confusion, please let me know if the following helps:

The data is consumed in an application, the primary table contains rows that are distinct, for each row of this primary table there may be from zero to many related rows from 4 other tables.

If you perform a typical left join on the primary table with each of the other four tables, your result set (per primary record) starts increasing based on the number of rows in each of the related tables.

For example, if the first table contains 2 records for a given row in the primary table, and the second table contains 3 records for the same row in the primary table, it becomes ambiguous to differentiate within the result set.

Using a join is only unambiguous when the tables involved are one to one. What I am looking for is a strategy to differentiate the data when consuming it programmatically within code.

Thanks for the patience.
Re: Alternative approach for a query run programmatically. [message #683584 is a reply to message #683582] Thu, 04 February 2021 01:37 Go to previous messageGo to next message
John Watson
Messages: 8535
Registered: January 2010
Location: Global Village
Senior Member
Sorry, I do not understand. What do you mean be "ambiguous"? Your use of "distinct" and "primary" is also a bit odd, an entity relationship diagram might help. And again, if you would describe what you are going to do with these rows it would help: what is the "consumption" process? ( It sounds more like "mastication" at the moment Smile )
Re: Alternative approach for a query run programmatically. [message #683586 is a reply to message #683584] Thu, 04 February 2021 06:18 Go to previous messageGo to next message
ritmo2k
Messages: 11
Registered: October 2020
Junior Member
Hi John,
No need to apologize, I will try to be more clear. As I don't have quick access to get a test schema created, I'll use sqlite as an example:

CREATE TABLE Users
(
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    FirstName TEXT NOT NULL,
    LastName TEXT NOT NULL
);
INSERT INTO Users VALUES(1,'John','Doe');
INSERT INTO Users VALUES(2,'Jim','Berry');
INSERT INTO Users VALUES(3,'Bob','Moore');
CREATE TABLE UserDataA
(
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    user_id INTEGER NOT NULL,
    type TEXT NOT NULL,
    value TEXT NOT NULL,
    FOREIGN KEY(user_id) REFERENCES Users(id)
);
INSERT INTO UserDataA VALUES(1,1,'role','FIN');
INSERT INTO UserDataA VALUES(2,1,'role','DEV');
INSERT INTO UserDataA VALUES(3,3,'role','SHP');
CREATE TABLE UserDataB
(
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    user_id INTEGER NOT NULL,
    type TEXT NOT NULL,
    value TEXT NOT NULL,
    FOREIGN KEY(user_id) REFERENCES Users(id)
);
INSERT INTO UserDataB VALUES(1,1,'attr','a');
INSERT INTO UserDataB VALUES(2,1,'attr','a');
INSERT INTO UserDataB VALUES(3,1,'attr','b');
I am reading this data in an application, from within compiled code. I do not want to read individual tables entirely, I would prefer to read one statement and process it as it arrives. Given the above criteria, we can see the following points:


  • The Users table is single valued, only row relates to one record.
  • The UserDataA and UserDataB tables are multi valued, from none to unlimited row relate to one User record.
I would then need to assemble the data programmatically as map of the following:


  • Column name to column value for data in the Users table.
  • Column name to column values for data in the UserDataA and UserDataB tables.
In the above data set, this yields:
record 0:
    FirstName: John
    LastName: Doe
    role: [FIN, DEV]
    attr: [a, a, b]

record 1:
    FirstName: Jim
    LastName: Berry
    role: []
    attr: []

record 2:
    FirstName: Bob
    LastName: Moore
    role: [SHP]
    attr: []
However, if you simply left join this data, you cannot identify the actual count of values due to the repetition:
SELECT Users.FirstName,Users.LastName,UserDataA.type,UserDataA.value,UserDataB.type,UserDataB.value
FROM Users
LEFT JOIN UserDataA on Users.id = UserDataA.user_id
LEFT JOIN UserDataB on Users.id = UserDataB.user_id
ORDER BY Users.FirstName,Users.LastName,UserDataA.type,UserDataA.value,UserDataB.type,UserDataB.value
which produces the first table in the image.

As a result, one approach is to create a select for each table that returns a column for the combined data set, and a column that indicates what table the data originates from and then perform a union all:
SELECT 'a' AS "tbl", id, FirstName, LastName, NULL AS UserDataAtype, NULL AS UserDataAvalue, NULL AS UserDataBtype, NULL AS UserDataBvalue
FROM Users
UNION ALL
SELECT 'b' AS "tbl", user_id AS "id", NULL, NULL, type, value, NULL, NULL
FROM UserDataA
JOIN Users ON UserDataA.user_id = Users.id
UNION ALL
SELECT 'c' AS "tbl", user_id AS "id", NULL, NULL, NULL, NULL, type, value
FROM UserDataB
JOIN Users ON UserDataB.user_id = Users.id
ORDER BY id,tbl;
which produces the second table in the image that can be read and differentiated within application code.

My question does Oracle provide a facility that accomplishes what I have done in the union all more elegantly?
  • Attachment: samples.png
    (Size: 51.87KB, Downloaded 76 times)

[Updated on: Thu, 04 February 2021 06:20]

Report message to a moderator

Re: Alternative approach for a query run programmatically. [message #683587 is a reply to message #683586] Thu, 04 February 2021 06:35 Go to previous messageGo to next message
John Watson
Messages: 8535
Registered: January 2010
Location: Global Village
Senior Member
Do you mean something like this,
orclz> select firstname,lastname,
  2  (select listagg(value,',') from userdataa where userdataa.user_id=users.id),
  3  (select listagg(value,',') from userdatab where userdatab.user_id=users.id)
  4  from users;

FIRSTNAME  LASTNAME
---------- ----------
(SELECTLISTAGG(VALUE,',')FROMUSERDATAAWHEREUSERDATAA.USER_ID=USERS.ID)
---------------------------------------------------------------------------------------------------------------------------------------
(SELECTLISTAGG(VALUE,',')FROMUSERDATABWHEREUSERDATAB.USER_ID=USERS.ID)
---------------------------------------------------------------------------------------------------------------------------------------
John       Doe
FIN,DEV
a,a,b

Jim        Berry



Bob        Moore
SHP



orclz>
Though I am still at a loss to understand what you intend to do with this. What is the "consumption" you are going to do? For example, do you have some process that demands a particular format?

Re: Alternative approach for a query run programmatically. [message #683588 is a reply to message #683587] Thu, 04 February 2021 07:36 Go to previous messageGo to next message
ritmo2k
Messages: 11
Registered: October 2020
Junior Member
Hi,
That is certainly getting me closer (compared to the awful UNION ALL I am using now).

The reference to consuming it aims to describe that I am reading it within application code, such as C, or CPP. I open a cursor and loop over the entire result set. The data must remain as a non-jagged array of tabular values. It looks like your sqlplus session inserts a line break before each LISTAGG select just for visualization.

What you have looks very promising, how can I transpose the LISTAGG values to appear under the user row, but also include an indicator to identify what table the data originates from?
Re: Alternative approach for a query run programmatically. [message #683589 is a reply to message #683588] Thu, 04 February 2021 10:14 Go to previous messageGo to next message
Michel Cadot
Messages: 67817
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator
It's just formatting:
SQL> break on res skip 1
SQL> With
  2    roles as (
  3      select user_id,
  4             listagg(value,', ') within group (order by value) roles
  5      from userdataa
  6      group by user_id
  7    ),
  8    attrs as (
  9      select user_id,
 10             listagg(value,', ') within group (order by value) attrs
 11      from userdatab
 12      group by user_id
 13    )
 14  select 'record '||to_char(row_number() over (order by null)-1)||':
 15      FirstName: '||firstname||'
 16      Lastname: '||lastname||'
 17      role: ['||roles||']
 18      attr: ['||attrs||']' res
 19  from Users u
 20       left outer join roles r on r.user_id = u.id
 21       left outer join attrs a on a.user_id = u.id
 22  /
RES
------------------------------------------------------------------------------
record 0:
    FirstName: John
    Lastname: Doe
    role: [DEV, FIN]
    attr: [a, a, b]

record 1:
    FirstName: Bob
    Lastname: Moore
    role: [SHP]
    attr: []

record 2:
    FirstName: Jim
    Lastname: Berry
    role: []
    attr: []


3 rows selected.
Re: Alternative approach for a query run programmatically. [message #683592 is a reply to message #683589] Thu, 04 February 2021 10:55 Go to previous messageGo to next message
John Watson
Messages: 8535
Registered: January 2010
Location: Global Village
Senior Member
As ever, MC writes SQL that is a zillion times better than mine Smile
Re: Alternative approach for a query run programmatically. [message #683593 is a reply to message #683592] Thu, 04 February 2021 11:08 Go to previous messageGo to next message
ritmo2k
Messages: 11
Registered: October 2020
Junior Member
Hah, thank you all so much for the patience.

I am trying to interpret this to yield the non-jagged, tabular result that matches the second table in the image.
Re: Alternative approach for a query run programmatically. [message #683594 is a reply to message #683592] Thu, 04 February 2021 11:38 Go to previous message
Michel Cadot
Messages: 67817
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator

John Watson wrote on Thu, 04 February 2021 17:55
As ever, MC writes SQL that is a zillion times better than mine Smile
You've done the main part sorting out the requirements. Smile

Previous Topic: ROW_NUMBER() not working properly?
Next Topic: old hash value is used to identify same sql and Plan hash value is to identify sql plans?
Goto Forum:
  


Current Time: Fri Apr 23 04:35:39 CDT 2021