Re: MySQL import of XML

From: Luuk <luuk_at_invalid.lan>
Date: Sat, 18 Jul 2015 20:38:44 +0200
Message-ID: <55aa9d34$0$2862$e4fe514c_at_news.xs4all.nl>


On 18-7-2015 19:47, Bjarne Jensen wrote:
> I have converted a pdf to XML - see truncated version:
>
> <?xml version="1.0" encoding="UTF-8"?>
> <!DOCTYPE pdf2xml SYSTEM "pdf2xml.dtd">
> <pdf2xml producer="poppler" version="0.24.1">
> <page number="1" >
> <text top="836" >DatNum</text>
> <text top="836" > 1/53</text>
> <text top="47" >Manual:</text>
> <text top="59" >Code:</text>
> <text top="71" >Revision No:</text>
> <text top="104" >CHAPTER</text>
> <text top="104" >CODE</text>
> <text top="104" >PAGE</text>
>
> ... zipped ...
>
> <text top="800" >01SEP14</text>
> <text top="812" > GENERAL</text>
> </page>
> <page number="2" >
> <text top="47" >CHAPTER</text>
> <text top="47" >CODE</text>
> ... zipped ...
> <text top="251">03JUN15</text>
> <text top="251">CHG</text>
> </page>
> </pdf2xml>
>
> I need the value/content of
> - page number
> - top number
> - text.
>
> so...:
>
> mysql> CREATE TABLE clist ( number INT, top INT, text VARCHAR(40) );
> Query OK, 0 rows affected (0,06 sec)
>
> Then I picked this up from
> http://stackoverflow.com/questions/5491056
>
> mysql> LOAD XML LOCAL INFILE '/home/me/xml/CheckList_MOD_9.xml'
> INTO TABLE clist(number, top, text);
> Query OK, 0 rows affected (0,01 sec)
> Records: 0 Deleted: 0 Skipped: 0 Warnings: 0
>
> No error messages but also no imported data.
>
> What am I doing wrong?
>

I did something like that some time ago, let me give you what i did...

wget "http://feeds.feedburner.com/pfc" -O $tmp -a $log

mysql test <<EOQ
LOAD DATA INFILE

     '$tmp'
     INTO TABLE
         pfc
         CHARACTER SET 'utf8'
         LINES STARTING BY '<item>' TERMINATED BY '</item>'
         (_at_tmp)
         SET
           title      = ExtractValue(_at_tmp, '//title'),
           link       = ExtractValue(_at_tmp, '//link'),
           guid       = ExtractValue(_at_tmp, '//guid'),
           pubDate    = ExtractValue(_at_tmp, '//pubDate'),
           content    = ExtractValue(_at_tmp, '//content:encoded')
               ;

EOQ after this, i cen see the history on updatesof i.e. Notepad++ ;)

[root_at_test]> select title, min(pubDate) from pfc where title like '%Notepad++%' group by title;

+-----------------------------+---------------------------------+
| title                       | min(pubDate)                    |
+-----------------------------+---------------------------------+
| [UPDATE] Notepad++ v6.6     | Fri, 02 May 2014 10:36:53 -0400 |
| [UPDATE] Notepad++ v6.6.1   | Fri, 02 May 2014 10:36:53 -0400 |
| [UPDATE] Notepad++ v6.6.2   | Fri, 09 May 2014 13:10:28 -0400 |
| [UPDATE] Notepad++ v6.6.3   | Mon, 19 May 2014 13:30:24 -0400 |
| [UPDATE] Notepad++ v6.6.4   | Thu, 05 Jun 2014 12:50:24 -0400 |
| [UPDATE] Notepad++ v6.6.6   | Fri, 13 Jun 2014 16:42:37 -0400 |
| [UPDATE] Notepad++ v6.6.7   | Wed, 25 Jun 2014 07:21:05 -0400 |
| [UPDATE] Notepad++ v6.6.8   | Mon, 28 Jul 2014 12:26:53 -0400 |
| [UPDATE] Notepad++ v6.6.9   | Mon, 08 Sep 2014 13:02:45 -0400 |
| [UPDATE] Notepad++ v6.7     | Mon, 15 Dec 2014 14:47:13 -0500 |
| [UPDATE] Notepad++ v6.7.1   | Mon, 15 Dec 2014 14:47:13 -0500 |
| [UPDATE] Notepad++ v6.7.2   | Sun, 28 Dec 2014 06:13:28 -0500 |
| [UPDATE] Notepad++ v6.7.3   | Sun, 28 Dec 2014 06:13:28 -0500 |
| [UPDATE] Notepad++ v6.7.4   | Sat, 10 Jan 2015 19:33:54 -0500 |
| [UPDATE] Notepad++ v6.7.5   | Wed, 11 Mar 2015 08:21:28 -0400 |
| [UPDATE] Notepad++ v6.7.6   | Wed, 15 Apr 2015 14:57:17 -0400 |
| [UPDATE] Notepad++ v6.7.7   | Wed, 15 Apr 2015 14:57:17 -0400 |
| [UPDATE] Notepad++ v6.7.8   | Sun, 17 May 2015 02:07:44 -0400 |
| [UPDATE] Notepad++ v6.7.8.1 | Sun, 17 May 2015 02:07:44 -0400 |
| [UPDATE] Notepad++ v6.7.8.2 | Sun, 24 May 2015 04:00:05 -0400 |
| [UPDATE] Notepad++ v6.7.9   | Fri, 12 Jun 2015 14:08:39 -0400 |
| [UPDATE] Notepad++ v6.7.9.1 | Mon, 22 Jun 2015 15:09:56 -0400 |
| [UPDATE] Notepad++ v6.7.9.2 | Mon, 22 Jun 2015 15:09:56 -0400 |
+-----------------------------+---------------------------------+
23 rows in set (6.31 sec) Received on Sat Jul 18 2015 - 20:38:44 CEST

Original text of this message