Re: More benchmark bullshit, and Linux luser mating calls... (was Re: Linux betas NT in TPC testing, running Oracle8

From: The Ghost In The Machine <ewill_at_lexi.athghost7038suus.net>
Date: 28 Apr 1999 03:24:55 GMT
Message-ID: <slrn7ic76d.4rk.ewill_at_lexi.athghost7038suus.net>


Followups trimmed way back.

On Tue, 27 Apr 1999 17:33:44 GMT,
Anthony Ord <nws_at_rollingthunder.demon.co.uk> wrote:
>On 26 Apr 1999 11:45:06 -0600, Craig Kelley
><ink_at_inconnu.isu.edu> wrote:
>
>>ewill_at_lexi.athghost7038suus.net (The Ghost In The Machine) writes:
>>
>>> >>Oh, linux has _plenty_ of flaws, just fewer than WindowsNT for example.
>>> >
>>> >Would you care to elaborate? And be specific.
>>>
>>> I can at least state that Linux has a nasty flaw -- but it's mostly
>>> an internal one, and may not have much applicability (it is a big
>>> C program, after all).
>>>
>>> It uses pointers in its structures. Naughty naughty! :-)
>>>
>>> (I've been corrupted by a famous person who gives out C++ lectures around
>>> here whose name completely escapes me at the moment (grr) --
>>> "Pointers are bad". Mind you, it's not clear whether we want
>>> to have C++ and STL [*] in the Linux kernel... :-) )
>>
>>How can pointers in structures be naughty?
>
>Can someone point out how to do linked lists and binary
>trees without them?

Heh.

Well, there is that issue, but there are many ways of representing smart pointers in C++. One of the ones that I personally like is a representation in which the location of the pointer is subtracted from what it points to, yielding a difference -- not unlike that used in many branch instructions. With some care (very easily done in C++), a relative pointer can be shoved into an mmap()ed [*] area without fear of losing its ... um ... relevance.

*watches everyone cringe at the bad pun*

Or, one can use an offset from a known location, say, a large contiguous buffer allocated (or, again mmap()ed) for the occasion. Both have problems; in particular, one pays a performance penalty (it takes time to subtract or add) every time the pointer is manipulated, and there has to be an int big enough to hold the pointer -- an issue for 64-bit architectures, and the soon-to-be- developed 128-bit architectures ("long long long"?). However, if all the code uses one particular pointer type, one can do some neat stuff, like start where it left off in an mmap()ed area; that could be real interesting for InstantOn(c)(r)(tm) -- just map the file, find the relevant data the app needs (which hopefully is near the top of the file), and go. (If the app is multithreaded, it will have to be careful about checkpointing, though.)

This is not to say we should use such techniques in Linux, considering that we're not trying to solve a persistence and/or relocation problem, as far as I know (although this InstantOn thing sounds like it could be *real* interesting -- think checkpointed swap file!). But there is one big issue, that I personally don't like.

Basically, in some old code, there are a lot of structures linked together in a list, and each of these structures has a forward pointer (declared to be a pointer to the same structure).

    struct thingy {

        struct thingy * next_thingy;
        ...

    };

    struct thingy * all_thingys;

This has three problems:

  • it's a singly linked list, which is simple to scan, but ugly to maintain,
  • it's a replicated singly linked list, which is even uglier; one has to write the code to append to the list N times, where N is the number of structure types; similarly for delete, insert, remove, scan, etc.,
  • it's difficult to determine whether a variable is being used to hold onto a collection, or to walk through an existing collection, without at least a comment nearby.

One would ideally replace the singly linked with a doubly linked list, or even perhaps a dynamic array, a btree, or extensible hash table, depending on the needs of the algorithm, but that would mean dragging the pointers outside of the structure, and changing a lot of for(;;) loops to use STL iterators. Still, the idea of solving one problem many times has never appealed to me, and hopefully if we do a lot of list-walking in Linux (I doubt that there's that much), then we can centralize the list (and the walking) problem in one module, or at least one set of .h files (since Linux is C, not C++).

(One very nice feature of STL is that one *can* change the collection type without rewriting code everywhere. If one wants to use a list today, and a vector tomorrow, then one can change a single typedef and recompile. The performance characteristics will change, but the source code won't need to.)

I will also mention the Amiga; now *there* was an OO operating system before OO became so hip and trendy (it was BCPL and C based, not C++). It has the problem that the list node was part of the structure, though -- but, apart from that, it had nice operations on lists that simply just worked; one didn't have to rewrite "append to end of list" code hundreds of times; one merely used Insert() or Remove(). (It also had a prioritized list, but that IMO is rather a poorly-scalable construct, as it takes O(N) time to insert an element into the list.) All this in a package that fit into 4 disks and 256K of ROM (which later got beefed up to 512K, I think).

This is why pointers can be naughty. I would think, though, that Linux is not quite as naughty in this regard as some other operating systems...especially since Linux doesn't seem to make any dumb assumptions about pointer size, unlike (apparently) a certain other operating system made by a certain other company which doesn't like more than a certain amount of token competition... :-)

[.sigsnip]

[*] mmap() is a system call that takes a chunk of virtual memory

    and associates it with a regular file, making the regular file     appear as though it were a swap file, in a way. Instead of     using read() or write() calls on the file, one merely takes     the returned memory location and uses it as a pointer,     or adds an offset to it and fetches memory, much like a     buffer created with malloc()/calloc() or operator new.

    Both Unix and VMS had this call (although in VMS it was     named something else; I forget precisely what).



ewill_at_aimnet.com Received on Wed Apr 28 1999 - 05:24:55 CEST

Original text of this message