Re: approaches for embedding a data language in a general purpose language

From: David Cressey <dcressey_at_verizon.net>
Date: Tue, 10 Oct 2006 12:03:48 GMT
Message-ID: <E_LWg.957$P92.491_at_trndny02>


"Marshall" <marshall.spight_at_gmail.com> wrote in message news:1160406756.523915.48510_at_m73g2000cwd.googlegroups.com...
> Hello all,
>
> There are various different approaches one can take for embedding
> a domain specific language into a general purpose programming language.
> Common examples are regular expression libraries inside languages
> that don't directly support regular expressions, and, directly to our
> purpose, SQL inside Java or C/C++.
>
> The three main approaches I can think of are:
> 1) a library that accepts text written in the language as string
> parameters
> ex.: JDBC, ODBC
>
> 2) Code generation
> ex.: Hibernate, any one of ~1000 O/R mappers
>
> 3) Direct embedding using a preprocessor
> ex.: SQL-J, embedded SQL (for C) etc.
>
> I've used the first two extensively, but never the third one. I've got
> the nagging suspicion that it's the one I would like the best. (Of
> course one must immediately suspect grass-is-greener syndrome here.)
>
> An issue is that general purpose languages typically need to know
> the types of things up front, and that means that in the code
> generation approach, it's necessary to regenerate the code
> every time a query with a new result set type is needed. That's
> a bit inconvenient, and means that your modification will necessarily
> be far away from the point in the code that's motivating it.
>
> One thing particularly pernicious about the code generation approach
> is that it really pushes the programmer in the direction of
> row-at-a-time thinking. This leads to horrific performance.
>
> Anyway, I'd be interested in a discussion of the merits and deficits
> of the various approaches, and particularly if anyone has anything
> to say about 3). I can't help but feel there must be a better way
> than what I've been doing.
>
This is an issue I can speak to. DEC Rdb/VMS, my DBMS of choice back in the 80s and 90s, had a preprocessor for RDML embedded in languages like COBOL or FORTRAN, and also in Pascal. The language may have had a different name when used with Pascal than it did with COBOL or FORTRAN.

RDML, when contrasted with SQL, did encourage record-at-a-time processing. Given the time frame, 3GL programmers (including me) were slowly evolving away from record-at-a-time thinking. The engineers who designed RDML may have been on an evolution of their own, or they may have been trying to offer programmers a tool we could live with.

But I don't think embedded source necessitates record-at-a-time thinking to the extent that you suggest. It depends on the two languages. Consider the following: suppose we have embedded in a language like Pascal or C the following functionality only: DECLARE CURSOR, OPEN, FETCH, and CLOSE. It's possible, in our DECLARE CURSOR usage, to "think relationally", and come up with a complex and subtle query that makes good use of the power of relational operators to get exactly the data we really need. We can then embed the FETCH in a local loop that fetches result rows and disposes of them.
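
The DECLARE CURSOR / OPEN / FETCH / CLOSE pattern described above can be sketched roughly as follows. This is only an illustration, not RDML or real embedded SQL: it assumes Python's standard sqlite3 module as a stand-in for a preprocessor, and the tables (customers, orders), the sample rows, and the function names are all hypothetical. The point is that the query does the relational work in one statement, and the loop merely consumes the result rows.

```python
import sqlite3

# Hypothetical schema and data, invented purely for illustration.
SETUP = """
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (customer_id INTEGER, amount INTEGER);
INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO orders VALUES (1, 50), (1, 70), (2, 30), (3, 100);
"""

# Plays the role of DECLARE CURSOR: the join, grouping and filtering
# are all expressed relationally, in a single query.
QUERY = """
SELECT c.name, SUM(o.amount) AS total
FROM customers c JOIN orders o ON o.customer_id = c.id
GROUP BY c.id, c.name
HAVING SUM(o.amount) >= ?
ORDER BY total DESC
"""

def top_customers(conn, min_total):
    cur = conn.cursor()
    cur.execute(QUERY, (min_total,))   # roughly OPEN
    results = []
    while True:
        row = cur.fetchone()           # roughly FETCH
        if row is None:                # no more result rows
            break
        results.append(row)            # "dispose of" the row
    cur.close()                        # roughly CLOSE
    return results

def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    try:
        return top_customers(conn, 100)
    finally:
        conn.close()

if __name__ == "__main__":
    for name, total in demo():
        print(name, total)
```

The loop here is row-at-a-time only in how it receives results; nothing in it issues a further query per row, which is where the performance trouble usually comes from.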

It's almost certainly possible to do better than that, but I submit that it's not necessary to do worse.

Anyway, if you'd like to hear more of my ramblings on the subject, I'm ready.

Received on Tue Oct 10 2006 - 14:03:48 CEST
