Re: Why all the max length constraints?

From: dawn <dawnwolthuis_at_gmail.com>
Date: 27 May 2006 20:31:31 -0700
Message-ID: <1148787091.347602.244980_at_j55g2000cwa.googlegroups.com>


J M Davitt wrote:
> dawn wrote:
> > J M Davitt wrote:
> >
> >>dawn wrote:
> >>
> >>>[OK, here is my next "stupid question" as I cut a path in my study of
> >>>the RM. Those teachers who just want to tell this student how ignorant
> >>>she is are welcome to sit this out as I really am hoping to
> >>>understand.]
> >>
> >>[Quiz time: what are scalars, tuples, and relations?]
>
> It's a quiz. These are fundamental terms in the relational model.

Yes, I know. Why do you think I could not define them? I've provided definitions, likely for each of these, for the glossary at one time or another, along with defs for "functions," which are relations.

I have read quite a bit about the RM (including TTM and almost all of Date's 8e). My questions are typically about topics not covered in such places and are all about the application and usefulness of the theory as it relates to databases. They stem from having believed that the RM was the way to go and then seeing how much more bang for the buck at least one non-RM database model provided for a company. So I'm trying to square what I've learned in the real world with the theory that has driven the database industry for the past couple of decades. There is still a gap, but I'm understanding a bit more about where the implementations of the RM have taken the industry a bit astray from what I have seen as best practices (2VL, LVAs, and variable-length variables, for example).

> > You define scalar and I'll define tuples and relations.
> > btw, I think my def of relation is already in the cdt dictionary.
> >
> >
> >>>In SQL-DBMS's, like VSAM (and other indexed sequential files before
> >>>them) a lot of attributes are specified with max length constraints.
> >>
> >>I don't mean to pick nits, but I don't grok "VSAM" and SQL-DBMS's.
> >
> >
> > It is not a nit pick for you to point out things you don't fully
> > understand.
>
> Okay, let's be explicit: what do SQL and VSAM have to do with each
> other?

Developers using each include max lengths in the metadata, giving them both that good old computer-card feel where you list the attributes in some order (recognizing that order works differently for cards and VSAM than for SQL-DBMS's) with their lengths (and types) when defining your schema. It almost makes me want to add a Hollerith code, length of 1, of course.

> >>IIRC, VSAM provided for an OCCURS-like construction in record layouts -
> >>but all that meant was that you could have a variable number of fields,
> >>all of fixed width, up to some specified maximum.
> >
> >
> > Indexed sequential files were a natural progression from cards. It is
> > easy to see how fixed lengths arose out of working with card decks. I
> > wondering if the concept was simply never dropped or had a reason for
> > continuing.
>
> Fine. But off-point. Isn't it true that the variable in VSAM means
> a range of fixed-width fields?

There can be an OCCURS clause (in COBOL) for cardinality > 1, and there can also be fields of arity > 1, so we lost both of those features with SQL. Otherwise it was just defining a virtual card, which could be more than 80 columns once it wasn't being read from or written to an actual card. The following is from memory and I haven't coded COBOL since '88, but it was something like:

01  PERSON.
    03  PERSON-ID          PIC X(9).
    03  NAME.
        05  FIRST-NAME     PIC X(15).
        05  LAST-NAME      PIC X(20).
    03  GENDER             PIC X.
    03  AGE                PIC 999.

Which might become:

CREATE TABLE PERSON (
   PERSON_ID  CHAR(9),
   FIRST_NAME CHAR(15),
   LAST_NAME  CHAR(20),
   GENDER     CHAR(1),
   AGE        NUMERIC(3)
);

The changes for the SQL table were the removal of attributes with higher cardinality or arity than 1 and the idea that the order was not important. Otherwise, these look rather similar, and unlike languages where you need not define a max length for every variable. IBM played a major role with both too, which might also be irrelevant.
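
For contrast, here is a sketch of what the same mapping could look like if lengths were declared only where the business actually has one. This is just an illustration on my part: it assumes a SQL dialect with an unbounded character type (PostgreSQL's TEXT is one such), and the column names are simply carried over from above.

CREATE TABLE PERSON (
   PERSON_ID  CHAR(9),    -- assuming the identifier really is fixed at nine characters
   FIRST_NAME TEXT,       -- no conceptual max length, so none is declared
   LAST_NAME  TEXT,
   GENDER     CHAR(1),    -- genuinely one character
   AGE        NUMERIC(3)  -- ages really are at most three digits
);

The only lengths left are the ones that correspond to actual business rules; everything else is left for the DBMS to manage however it sees fit.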

> >>MicroData, Pick,
> >>Progress, &c, mean very different things when they describe things as
> >>variable.
> >
> >
> > The first two should mean the same if you are referring to MicroData
> > Reality. There are some good stories on the splitting of ways between
> > Dick Pick and Don Fuller, but yes, I'm sure that "variable" is used for
> > different things.
>
> And doesn't their use of the term variable differ from the use of the
> term in VSAM? And don't you want to clear this up before moving
> forward? After all, you tossed VSAM and "all data as variable in
> length" into the discussion about fixed-width attributes. (Or, more
> precisely, you used the phrase "max length constraints."

I'm not sure what is not clear. I used the term "max length constraints" so that it was clear what I meant by variable length (the opposite case, where there are no such maximum length constraints on the data values). Sorry, I'm not understanding your question. Did this clear it up?

> >>>While there are some attributes where this constraint is related to a
> >>>conceptual constraint (from the analysis phase), these lengths are
> >>>often introduced for the logical model or implemenation in the DBMS.
> >>>
> >>>In other words, when mapping from the conceptual (analysis) to the
> >>>logical (design) data models (pick the terms you like best for these),
> >>>these constraints are designed for many attributes that have no such
> >>>conceptual/business limits (if implemented with a paper system, there
> >>>would be no such limit, for example).
> >>>
> >>>Is there something about the RM that would prompt all (or most?)
> >>>existing implementations (however flawed) to drive developers to add in
> >>>these constraints for performance, space saving, or other reasons?
> >>
> >>No.
> >>
> >>
> >>>I
> >>>realize there can be variable length attributes, but specifying a max
> >>>field length still seems to be the norm (is that still the case?)
> >>
> >>Yes.
> >>
> >>
> >>>As many of you know, I work with database management systems that treat
> >>>all data as variable in length, while one might specify a length for
> >>>display purposes.
> >>
> >>I can't imagine that it's useful for 'Smith, Joseph' and 'Smith, John'
> >>to appear as identical values when, say, displayed in a field of eight
> >>characters.
> >
> >
> > Obviously not if truncated. The default with Pick would be to wrap in
> > any display. This is usually good, but has its own issues too, of
> > course. For every solution...
> >
> >
> >> I also work with products where all data are of variable
> >>length. (There is a maximum, but it's huge.) PITA. This mis-feature
> >>accounts for a fair number of support calls.
> >
> >
> > What is the best mitigation strategy in your case? If you know that an
> > attribute must have no more than two characters, what do you do?
>
> Restrict it to one or two characters, obviously. Wouldn't you?

Yup, so what's the problem that becomes a PITA, and what makes it difficult to correct or mitigate so as to avoid the support calls?

> >
> >>>Thanks for any insights into database attribute length constraints,
> >>>their purpose (is it related to output with fixed fonts, database
> >>>performance, size or what?), and any correlation to implementations
> >>>(again, however flawed) of the RM,
> >>
> >>The issue has nothing to do with the relational model.
> >
> >
> > I know nothing of how dbms products are designed internally. I have
> > this idea that possibly the fact that everything in XML, Pick, and
> > elsewhere is ordered gives them more of a likelihood of permitting
> > variable length and handling it well since you can tag-delimit values.
> >
> >
> > Is there possibly something about how RDBMS's would need to allocate
> > space for columns or something that might make it advantageous for
> > there to be a max length? The other thought I had was that working
> > with sets and joins might mean that implementing with everything
> > variable length could be problematic for allocation of space somehow,
> > but I (obviously) don't know.
>
> But, you know, there is always a "max length." It may be a large
> value, but it's there.

Yes, understood. My question is about the logical data model (pick your favorite name for it) -- the definition of the schema to the computer. Why give a max length to an attribute that doesn't conceptually require such? Why can't the DBMS handle that for you (efficiently, of course)?

> IIRC, 32K is common in the products I cited
> in my earlier post. Can you verify that?
>
> >>if there is such. Could a vendor
> >>
> >>>write an implementation of the RM where length constraints are as rare
> >>>as they are in the conceptual model without introducing performance or
> >>>any other issues?
>
> There are already many products that can do that. But adding the
> phrase "without introducing performance or other issues" is naive.
> You should know better; everything has a cost.

I'm referring to run-time issues. Of course there would be a cost to change a product. At run-time, fewer constraints could be a performance boost. I can't think of any reason why MV systems would be faster if there were max length constraints in the DBMS, for example. Your answer makes me think there is something about the way RM implementations (or approximations thereof) operate that is enhanced by knowing how many card columns are required, thereby prompting developers to specify this physical design and limit the implementation unnecessarily (in ways the conceptual model does not).

> > Is your answer to this question "yes"? If so, why hasn't anyone done
> > so? It seems it would help with the goal of decoupling the physical
> > from the logical. If there is no conceptual reason for a limit,
> > couldn't the dbms take care of such physical issues?
>
> This is not what others are referring to when they mention separation
> between logical and physical design.

I know ;-)

> >
> > I'm looking right now at all of the aspects of a conceptual design that
> > need to be adjusted in what I was calling the logical (data) design
> > (the one implemented by means of a person or program interfacing with
> > the DBMS) and how such designs are done differently using the RM (or an
> > implementation thereof, however flawed) or MV.
> >
>
> That's a step down the wrong path. How could any subsequent design
> effort change the conceptual model? It's either complete and correct
> -- or it isn't.

No, no, I'm mapping the conceptual model for the purpose of implementation. The conceptual model, the business requirements, might not have a limit on the size of a color attribute, for example, but then in the "implementation model," or what I and others have called the "logical data model," we add in a length of, say, 12 characters.
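
To make that mapping concrete, here is a hypothetical sketch (the ITEM table and its columns are mine, not from any real schema, and TEXT again assumes a dialect with an unbounded character type). The two declarations are alternatives for the same table, so only one of them would actually be run:

-- (a) Typical logical design: an arbitrary 12-character cap the business never asked for.
CREATE TABLE ITEM (
   ITEM_ID CHAR(9),
   COLOR   VARCHAR(12)
);

-- (b) Alternative: no artificial cap; the DBMS handles storage of the value however it likes.
CREATE TABLE ITEM (
   ITEM_ID CHAR(9),
   COLOR   TEXT
);

The conceptual model only says "an item has a color"; the 12 in (a) is a physical-flavored decision smuggled into the logical design, which is exactly the gap I'm asking about.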

Did that help clarify? --dawn