Re: MYSQL Error 2013 load infile 15mln rec 6gb CSV

From: Mladen Gogala <gogala.mladen_at_gmail.com>
Date: Sat, 25 Oct 2014 20:11:03 +0000 (UTC)
Message-ID: <pan.2014.10.25.20.11.03_at_gmail.com>


On Sat, 25 Oct 2014 14:41:20 +0100, Eric wrote:

> On 2014-10-24, Mladen Gogala <gogala.mladen_at_gmail.com> wrote:

>> On Mon, 20 Oct 2014 23:11:45 +0100, Eric wrote:
>>
>>
>>> idiot: someone who disagrees with you on a single point?
>>
>> Nope. Idiot is someone who can say that hints are not necessary 
because
>> Postgres is not for profit.

>
> I think what they said was that hints are unnecessary, and that they do
> not need to implement them in spite of that because Postgres is not for
> profit.

What is necessary and what is not is decided by the users of the database software. And overwhelming majority of those is shying away from Postgres. The designers are not the people to make the statement of what is necessary and what is not. Users are. When designers tell users what is necessary, that is condescending and rude. Postgres designers would do well to study the case of GNOME 3.

> If they are right about the first,

But they are not right nor they can be right. Designers do not get to decide what is necessary and what is needed and what is not because they usually do not use the product every day.

> the second is not idiotic,
> nor is it if they are convinced about the first even if it eventually
> turns out to be wrong. Because they are not for profit, they are not
> trying to compete with Oracle or anything else, and can look at things
> on a longer timescale.

They're not trying to compete with Oracle??? Is that a joke? Why create relational database software if your intention is to not compete with the existing database software? What are they actually trying to do? And what "longer timescale" are we talking about? Talking about the longer timescale, even I wouldn't preclude using PostgreSQL, of course it implements hints.

>
> Note: I have nothing to do with the Postgres team, I have never read a
> single line of the source code,

I have hosted a Postgres meetup in New York: http://www.meetup.com/postgresql-3/messages/11836673/

I have also read some source code and had a basic implementation of hints. I know what I am talking about. Bruce Momjian was also there. I hoped that I would be able to persuade him to accept hints, but no such luck. When I saw the situation as it is, I gave up on Postgres.

> I have a single instance of Postgres on
> my home server which I ignore as long as it runs because something else
> that I do "need" uses it. I'm just trying to start from a more
> open-minded position that you seem ever to have had, at least since you
> discovered there were no hints.

Open mindedness isn't a virtue in itself, it's a frequently misused word which was used to justify drug use as a way to achieve an expanded state of consciousness. Aldous Huxley has written a novel called "The Doors of Perception" about it, after which a legendary rock band was named. Open mindedness was a part of a heavy propaganda barrage during the age of so called "counter culture".
The net result of the "open mindedness" was a plague of heroin use which has devastated so many families that I consider it a real national tragedy. I am usually suspicious when someone tries to argue something by "open mindedness" instead of the real arguments. I prefer logic.

>

>>> As for needing hints, how often, on any forum, is an OP told to step
>>> above their actual question and state the goal behind it. Are you sure
>>> that you are not just like them, assuming that the way to do something
>>> in product X must exist in the same form in product Y.
>>
>> Nope. The hints are basically means to override the optimizer decision 
>> and require the execution plan to go differently than the optimizer 
has
>> originally calculated. Insisting that optimizer decision should never be
>> overridden means telling your users that you think that a computer 
>> program is smarter than them, when it comes to their data.

>
> And what component of any other commercial software product do you need
> to override? How many of those give you a way to do it?

Nice try of changing subject. We are talking about relational databases and overriding the optimizer decisions. The only database which doesn't allow that is - Postgres. Please, stick to the topic.

>

>> State of
>> affairs in technology is simply not there yet. Optimizer is right in the
>> 99% of the cases, but for the remaining 1% percent, hints are needed, >> period. PostgreSQL is the only major database which doesn't have hints.
>> And the idiot I quoted before is trying to make a technical inadequacy,
>> something that PostgreSQL is missing, as opposed to all other major >> databases, into a phylosophical sticky point. The attitude is even more
>> curious because there already are methods to change the execution plans,

>> only the scope is session, not a single SQL. Unrelated to that, one of 
>> the main people designing Postgres is Bruce Momjian who works for the 
>> company which sells a commercial version of Postgres which supports 
>> hints. One of the replies I received on the mailing lists was "if you 
>> want hints, you will have to pay for them". Funny, isn't it?

>
> So a commercial organisation made a commercial decision. Are you
> suggesting that the Postgres team is being used as a tool of that
> organisation? Then where is the project fork that would inevitably
> follow?

EnterpriseDB is a commercial fork of PostgreSQL. And I believe that EnterpriseDB is steering the decisions in such a direction that one would necessarily need to pay for the license if they want usable partitioning or hints. Having those features in the freeware version would undermine the viability of already embattled database vendor. Also, inadequate tuning mechanisms sell consulting. Of course I don't have any proof, but I can ask you for some open-mindedness, just like you asked me.

>

>> Of course
>> I will pay, only not to EnterpriseDB or other commercial versions of 
>> Postgres. If I have to pay for hints, it will be to the makers of 
Oracle,
>> DB2 or MS SQL Server, who also sell databases which support hints. The 
>> makers of those databases do not have problems with hints.
>> One of the points in the article by our former pastry baker is that 
>> "commercial databases have hints because DBAs want them". True, if you 
>> want me to use your software, you better make sure that it fulfills my 
>> needs. And that is why Postgres is remaining an obscure solution which 
I
>> do not recommend to anyone, unless its developers change attitude and 
>> implement hints and some other things that users need.

>
>>> OK, I use hints because they are the best way (or the only way) to
>>> achieve a desired situation in Oracle, but I am quite prepared to
>>> believe that they are not a universal answer.
>>
>> Your religion is not my problem.

>
> Oh really!!! The verb "to believe" and its various derived words may be
> used frequently in connection with any or all religions, but they are
not
> restricted to that usage (even "faith" is not restricted in this way),
> and for you to assume any such meaning on my part is actually rather
> insulting.

The verb "to believe" means to hold some statements to be true, despite of not having evidence to prove it. When somebody says "I believe you", that means "I think you are telling the truth, although I cannot check it". From that to religion is not a big step. All one needs to do is to throw hands toward the sky and say "hasa diga eebowai".

>

>> There are people who are prepared to 
>> believe that they will go to heaven and be greeted by 69.98 virgins if 
>> they blow themselves up in a bus full of infidels. Dealing with the 
>> occasional optimizer hiccups is my problem. And I need hints in order 
to
>> be able to do that. In other words, when I have SQL that needs to use 
>> specific index or access path, I must be able to tell that to the 
>> optimizer. If I cannot do that with the PostgreSQL, I will not use it, 
>> period.

>

>> As for the "universal answer", it has already been found: "42". The only
>> problem is what is the question. Hints are not the answer, hints are 
>> means of controlling the optimizer.

>
> Universal answer to a particular and already mentioned question,
> thankyou. And of course there isn't one, but there may be a reasonable
> candidate other than the one you are so insistent on using.

There may be a reasonable candidate? Speaking in long terms? Let's cut the BS: Postgres is the only database which doesn't allow users to change optimizer decision. Period. Let's stick to the provable and observable facts: hints are useful and used by many developers and administrators alike. Hints are definitely useful. Postgres doesn't want hints.

>

>> The question is whether users should 
>> have the ability to control the optimizer or should not have that 
>> ability. And the users themselves answered that question by 
>> overwhelmingly flocking to the databases which offer them such 
control,
>> which is everything else except PostgreSQL.

>
> See my question above about other software.
>>
>> I am just helping Postgres to remain obscure by constantly pointing out
>> the attitude of developers who think that users do not need to have 
>> control of their optimizer. Such a condescending and rude attitude is 
>> hard to defend. It amounts to telling the users that they are too 
stupid
>> to control the optimizer. I didn't accept that message too well

>
> I don't see the message like that (obviously).

Of course you don't see it. You believe. The message is loud and clear: users are not smart enough to second guess the optimizer.

>

>> and, 
>> judging by the size of Postgres conferences, neither have the vast 
>> majority of the other database users. I do believe in evolution. 
Ingres
>> has lost to Oracle and Postgres is on its way to lose to Oracle too. 

>
> Evolution (in the biological sense) is not the march of progress, it is
> the march of increasing complexity, some examples of which survive for
> much longer than others due to various contingent circumstances. Often
> the survivors would have seemed unlikely when they first appeared. I
> suspect that this applies to software "evolution" as well.

Actually, evolution is survival of the fittest in the given circumstances. And the "fittest" in case of software means the software that best corresponds to the criteria of the users.

>
> Unfortunately people are more comfortable with the familiar, and are too
> inclined to see everything as "either/or", so the most popular "answer"
> is not necessarily the best.

Now, this is a typical BS lament of the human nature, which is generalized to the galaxy far, far away from the original discussion. The issue is very simple:

I have a query that is not performing well and a nervous VP staring over my shoulder and asking "is it done yet"? At that point, re-arranging the data model is out of the question. All I need to do is to fix the query, because that's what the viability of web application depends upon. If there are no hints, I am SOL, nothing I can do. Marketing needs to engage in expensive damage control, competitors will point at the company web application flop and there will be definite business damage. It may even be a resume generating event. For my own safety, I don't use Postgres and I tell everybody not use it. Of course there may be better ways. On USS Enterprise, they have replicators and transporters, there is no need for competition. We all need just to hold our hands and sing "kumbaya", with the obvious exception of the Ebola patients. Unfortunately, in the world as it is, competition and last minute problems are facts of life. So is the need for the hints, despite what developers tell you. Developers do not get to decide what users need and what they don't need. Users do decide that. The job of developers is develop what the users need, not to tell them what they need. GNOME 3 developers tried that. So did Ubuntu Unity developers. Result: XFCE is the most popular desktop environment on Linux, bar none. PostgreSQL developers do not want to implement hints, despite users clamoring for them. Result: MySQL is the most popular open source database, bar none.

-- 
Mladen Gogala
The Oracle Whisperer
http://mgogala.byethost5.com
Received on Sat Oct 25 2014 - 22:11:03 CEST

Original text of this message