Processes vs Threads

From: Andrew Francis <andrew_at_gmvt4.concordia.ca>
Date: 6 Nov 1994 05:13:43 GMT
Message-ID: <39hom7$fso_at_newsflash.concordia.ca>


In article <783740754snz_at_sambusys.demon.co.uk> psb_at_sambusys.demon.co.uk writes:

PB>Threads!  I hate them.  They're so inelegant that they're bound not
PB>to last.  I've investigated them and I'm confident I can ignore
PB>them.  My thoughts:

PB>Unless you have kernel support for threads a la Sun then each threaded
PB>process has its own (hidden) scheduler thread.  So the OS scheduler
PB>schedules the threaded process which splits its activity between the
PB>threads using its thread scheduler.  So the ``fine control'' over
PB>scheduling often claimed as a benefit of threads is a fantasy:  The
PB>thread scheduler has no control over how often the threaded process
PB>is itself scheduled by the OS.  The task of the thread scheduler can
PB>be compared to that of a person balancing a pole on his head while
PB>another randomly turns the light on and off.

  I must state that I am far from being an expert on LWPs. That said, I do not get your arguments. Threads within a process normally share that process's time slice and run at its priority level. You seem to believe that this is bad, and that a thread scheduler should be able to run a thread within a process at a higher priority than the process itself. As far as I am concerned, thread schedulers cannot and should not be controlling the Unix scheduler. Rather, the system administrator should give an application a high priority to begin with; the thread scheduler inherits it. I believe this is the way bound real-time threads work in Solaris: the process gets a high priority to begin with. It may be true that a thread scheduler (Sun LWP is FCFS) is akin to a person balancing a pole on his head while someone is turning the light on and off. However, this does not matter, because the person is blindfolded and does not notice the flicker of the light. A fundamental principle of multi-tasking operating systems is that scheduling is transparent to the application.

(Perhaps thread scheduling works differently on operating systems specifically designed for real-time applications? However, I suspect their kernels are very simple.)

PB>On the other hand if you don't have kernel support for threads then
PB>thread context switches can occur without intervention of the kernel
PB>(because each threaded process has its own scheduler) thus making
PB>context switching cheap.  Perhaps that is the benefit of threads? 
PB>No: any system call still requires the process to be suspended while
PB>the kernel copes with the request.

   I do not believe that you understand the purpose of lightweight processes. A thread and a process are not synonymous. Threads exist to facilitate, among other things, the coding of applications that internally have several "threads" of execution operating more or less independently of one another but needing, once in a while, to communicate amongst themselves. Unfortunately, Unix traditionally had one paradigm for representing this type of concurrency: fork(). Perhaps you should be comparing threads to actors, or to something like a C++ task?

PB>Whatever the benefit of threads (and it is my contention that there
PB>is none) they are an abomination to program.  Just look at the calls
PB>in the pthread library!

   I have seen the POSIX thread API and have used the Sun LWP thread library. I like the Sun LWP API because it has msg_send() and msg_recv() calls, which I feel are a very straightforward way to communicate between threads. Encapsulating data within a thread and having threads communicate only through messages also helps avoid the need for monitors, semaphores, and condition variables. Chances are I would get higher throughput if I did use those mechanisms, but the code would get more complex. That said, the message-passing model can represent concurrency perfectly well. Threads are also good for applications that do much I/O: one uses threads and avoids asynchronous I/O, whose coding logic tends to be messy and/or inefficient.

PB> I'm glad Oracle use separate processes rather than threads.

   As a programmer, why should you give a shit what concurrency mechanisms the database uses? That is the concern of the people who designed the database. That said, I once installed a database called Postgres. I had to reconfigure the kernel in order to add more semaphores and shared memory. I thought this sucked, and threads would probably make that database less of a hog without the hassle of rebuilding the system.

PB>But I would like one simple extension to C or (are you listening, Oracle?)
PB>to Pro*C that would let a variable be declared as ``share'' so that both
PB>processes would have access to the same memory location after a fork().
PB>I know it can be implemented using shared memory and I've written a little
PB>library to do it for me 'cos I seem to do it so often but I wish the
PB>language did it for me. 

   Shared memory! And you complain about threads? If you have reader and writer processes, now you not only have to create the shared memory, you also have to use semaphores. After coding a server using fork() and message queues, and coding a server using threads, I will take threads any day of the week.

   As for language support: I believe that Ada and Modula-3 have tasks and coroutines. C++ and Smalltalk have task libraries as well. However, tasks don't block and are non-preemptive (not that Sun LWP threads are preemptive).

PB>Then the last excuse for the existence of threads - sharing the same
PB>address space - disappears and threads could certainly go hang themselves.

   Why do you hate threads so much? Did you poorly code a mission critical application that malfunctioned and cost your company mucho $$$?

PB>Once upon a time a Unix fork() was expensive.  And lite processes
PB>seemed a good idea.  But not anymore.  fork() takes no longer than
PB>pThreadCreate() (or whatever it is called - I can no longer remember.)

   Unix fork() is still expensive. From looking at vfork(), it seems to exist solely to make exec'ing less wasteful.

PB>Of course, MS Windows and VMS programmers love threads - they don't
PB>have (a proper or a cheap) fork().

   Again, how can fork() be cheap if it makes a copy of the process? Perhaps shared libraries help, but I don't know by how much. That said, I see most contemporary operating systems gaining thread support. Then again, we should be developing high-level concurrency constructs so that the average programmer does not have to know about the underlying process mechanisms.

PB>Why do threads exist in the Unix environment if they are as useless as I
PB>say?

   Threads are useful. Perhaps the problem is that guys like you do not understand them, much less how to use them.

PB>Because then the likes of Transarc can provide portability between Unix, 
PB>VMS and Windows via compatible threading libraries.  Finally we have the 
PB>reason for the existence of threads.  And that's why I'll ignore them: 
PB>For the same reason I ignore VMS and MS Windows.

   I don't think that programmers using DECThreads or Windows NT are losing any sleep because you do not want to use threads.

--Andrew Received on Sun Nov 06 1994 - 06:13:43 CET