Home » RDBMS Server » Server Utilities » loading data from twitter with sqlldr (sqlloader )
loading data from twitter with sqlldr [message #667129] Fri, 08 December 2017 12:18 Go to next message
rocioperezn
Messages: 2
Registered: December 2017
Junior Member
Hello all,

I have some problems loading a twitter data from a csv, with some multiline records. I have 4 columns id_tweet,text,created_at, username .
When text is enclosed by " the register is multiline or has comas inside, I show you 3 examples, 1st enclosed by " with comas inside the text, 2nd enclosed by " and multiline, 3rd no enclosed by coma no multiline:

793418980342063104,"@realDonaldTrump I am also a $ donor, i can not give much in the way of $ but i do.(poor) My advice is more valuable if you have used it.",2016-11-01T11:46:40.000Z, ReynardGosso
793418981415583745,"@FranklinFoer
Thanks @realDonaldTrump you’re doing a great Job helping to elect @HillaryClinton !
",2016-11-01T11:46:40.000Z,759714473556840448,false,wabroussard1,,,0,-1,1,0
793418985458929664,This is the 5 year old Nazi German-American @realDonaldTrump that wants 2 be president. No Americans elect adults.,2016-11-01T11:46:41.000Z, AnthonyNnani

I´m using sqldr but when I use

load data
infile 'tweetv1.csv' "str '\r\n'"
append continueif last != ','
into table TWEETS
fields terminated by ','
OPTIONALLY ENCLOSED BY '"' AND '"'
trailing nullcols
( id_TWEET CHAR(4000),
text CHAR(4000),
created_at CHAR(4000),
userid CHAR(4000)
)
It doesn´t recognize when it's a new logical record.

Any advice on how to do it?

Thanks a lot,
Rocio
Re: loading data from twitter with sqlldr [message #667131 is a reply to message #667129] Fri, 08 December 2017 22:45 Go to previous messageGo to next message
Barbara Boehmer
Messages: 8805
Registered: November 2002
Location: California, USA
Senior Member
You need to have something in your data that can be used to either identify when a record begins or when a record ends. You have used

continueif last != ','

to tell it that the record is continued on the next line if the last character on the current line is not a comma, but there are no commas at the ends of your records. In the following example, I have instead used

continueif next preserve(1) != '7'

to tell it that the record is continued on the next line if the next character on the next line is not a 7 and, if it is a 7, then that begins the next record and to preserve that 7 as part of the first field. This, obviously will only work if every id_tweet begins with a 7. If not, then you will need to obtain your data in a different format, in order to ensure being able to load all records, finding some way to either add an end of record character or beginning of record character.

SCOTT@orcl_12.1.0.2.0> host type tweetv1.csv
793418980342063104,"@realDonaldTrump I am also a $ donor, i can not give much in the way of $ but i do.(poor) My advice is more valuable if you have used it.",2016-11-01T11:46:40.000Z, ReynardGosso
793418981415583745,"@FranklinFoer
Thanks @realDonaldTrump youΓÇÖre doing a great Job helping to elect @HillaryClinton !
",2016-11-01T11:46:40.000Z,759714473556840448,false,wabroussard1,,,0,-1,1,0
793418985458929664,This is the 5 year old Nazi German-American @realDonaldTrump that wants 2 be president. No Americans elect adults.,2016-11-01T11:46:41.000Z, AnthonyNnani

SCOTT@orcl_12.1.0.2.0> host type test.ctl
load data
infile 'tweetv1.csv' "str '\r\n'"
append continueif next preserve(1) != '7'
into table TWEETS
fields terminated by ','
OPTIONALLY ENCLOSED BY '"' AND '"'
trailing nullcols
( id_TWEET    CHAR(4000)
, text        CHAR(4000)
, created_at  CHAR(4000)
, userid      CHAR(4000) )

SCOTT@orcl_12.1.0.2.0> create table tweets
  2    (id_tweet    varchar2(4000),
  3     text        varchar2(4000),
  4     created_at  varchar2(4000),
  5     userid      varchar2(4000))
  6  /

Table created.

SCOTT@orcl_12.1.0.2.0> host sqlldr scott/tiger control=test.ctl log=test.log

SQL*Loader: Release 12.1.0.2.0 - Production on Fri Dec 8 20:38:26 2017

Copyright (c) 1982, 2014, Oracle and/or its affiliates.  All rights reserved.

Path used:      Conventional
Commit point reached - logical record count 2
Commit point reached - logical record count 3

Table TWEETS:
  3 Rows successfully loaded.

Check the log file:
  test.log
for more information about the load.

SCOTT@orcl_12.1.0.2.0> column id_tweet    format a18
SCOTT@orcl_12.1.0.2.0> column text        format a60 word_wrapped
SCOTT@orcl_12.1.0.2.0> column created_at  format a24
SCOTT@orcl_12.1.0.2.0> column userid      format a13 word_wrapped
SCOTT@orcl_12.1.0.2.0> select * from tweets
  2  /

ID_TWEET           TEXT                                                         CREATED_AT               USERID
------------------ ------------------------------------------------------------ ------------------------ -------------
793418980342063104 @realDonaldTrump I am also a $ donor, i can not give much in 2016-11-01T11:46:40.000Z ReynardGosso
                   the way of $ but i do.(poor) My advice is more valuable if
                   you have used it.

793418981415583745 @FranklinFoerThanks @realDonaldTrump youΓÇÖre doing a great  2016-11-01T11:46:40.000Z 7597144735568
                   Job helping to elect @HillaryClinton !                                                40448

793418985458929664 This is the 5 year old Nazi German-American @realDonaldTrump 2016-11-01T11:46:41.000Z AnthonyNnani
                   that wants 2 be president. No Americans elect adults.


3 rows selected.

Re: loading data from twitter with sqlldr [message #667133 is a reply to message #667129] Sat, 09 December 2017 08:24 Go to previous messageGo to next message
EdStevens
Messages: 902
Registered: September 2013
Senior Member
https://community.oracle.com/message/14629790#14629790
Re: loading data from twitter with sqlldr [message #667134 is a reply to message #667131] Sat, 09 December 2017 14:53 Go to previous message
rocioperezn
Messages: 2
Registered: December 2017
Junior Member
Thanks a lot !! Really useful , it works perfectly !!!

Regards,
Rocío.
Previous Topic: oracle impdp command issue
Next Topic: ORA-00984: column not allowed here. date. sql loader.
Goto Forum:
  


Current Time: Fri Feb 23 03:30:10 CST 2018

Total time taken to generate the page: 0.01820 seconds