HOW PROGRAMMING LANGUAGES DIFFER:
                 A CASE STUDY OF SPL, PASCAL, AND C
                       by Eugene Volokh, VESOFT
           Presented at 1987 SCRUG Conference, Pasadena, CA
       Presented at 1987 INTEREX Conference, Las Vegas, NV, USA
           Published by The HP CHRONICLE, May 1987-May 1988.

ABSTRACT: The HP3000's wunderkind sets out to study Pascal, C and SPL
for  the HP mini  in a set of articles,  using real-life examples and
plenty of tips on how to code for optimum efficiency in each language.
First in the series:  ground rules  for the comparison  and  a look at
control structures. (The HP CHRONICLE, May 1987)

INTRODUCTION

   Programmers  get  passionate about programming  languages. We spend
most  of  our  time hacking code,  exploiting the language's features,
being bitten by its silly restrictions. There are dozens of languages,
and  each  one has its fanatical  adherents and its ardent detractors.
Some  like APL, some like FORTH, LISP, C, PASCAL; some might even like
COBOL or FORTRAN, perish the thought.

   In particular, a lot of fuss has recently arisen about SPL, PASCAL,
and  C.  All  three  of them are  considered good "system programming"
(whatever  that is) languages, and  naturally people argue about which
one is the best.

   HP's  Spectrum  project has come out in  favor of PASCAL -- all new
MPE/XL  code  will  be written in PASCAL, and  HP won't even provide a
native  mode  SPL compiler. On the other  hand, HP's also getting more
and more into UNIX, which is coded almost entirely in C. Especially between C
and PASCAL adherents there seems to be something like a "holy war"; it
becomes not just a matter of advantages and disadvantages, but of Good
and  Evil, Right and Wrong. Strict type  checking is Good, some say --
loose  type checking is Evil; pointers  are Wrong -- array indexing is
Right. The battle-lines are drawn and the knights are sharpening their
swords.

   But,  some ask -- what's the big  deal? After all, it's an axiom of
computer  science  that all you need is an  IF and a GOTO, and you can
code anything you like. Theoretically speaking, C, SPL, and PASCAL are
all equivalent; practically, is there that much of a difference?

   In  other words, is it just esthetics or prejudice that animate the
ardent  fans  of  C,  PASCAL,  or SPL, or  are there real, substantive
differences between the languages -- cases in which using one language
rather  than another will make your life substantially easier? Are the
main differences between, say, C and PASCAL that PASCAL uses BEGIN and
END  and  C uses "{" and "}"? That  C's assignment operator is "=" and
PASCAL's is ":="?

   The  goal of this paper is to answer just this question. I will try
to analyze each of the main areas where SPL, C, and PASCAL differ, and
point  out  those differences using  actual programming examples. I'll
try  not  to  emphasize  vague, general statements,  like "PASCAL does
strict  type checking", or subjective opinions, like "C is too hard to
read";  rather,  I  want to use SPECIFIC  EXAMPLES which can help make
clear  the  exact  influence of strict or  loose type checking on your
programming tasks.


                          RULES OF EVIDENCE

   Saying that I'll "compare SPL, PASCAL, and C" isn't really saying a
whole  lot.  How  will  I  compare  them? What criteria  will I use to
compare  them?  Will  I  compare how easy it is  to read them or write
them?  Will  I  compare what programming habits  they instill in their
users? Which versions of these languages will I compare?

   To  do  this, and to do this in  as useful a fashion as possible, I
set myself some rules:

   *  I  resolved  to try to show the  differences by use of examples,
     preferably  as  real-life  as  possible. The emphasis  here is on
     CONCRETE  SPECIFICS, not on general statements such as "C is less
     readable" or "PASCAL is more restrictive".

   *  I  decided  not to go into  questions of efficiency. Compiling a
     certain  construct  using  one  implementation of  a compiler may
     generate  fast  code,  whereas  a  different  implementation  may
     generate slow code. Sure, the FOR loop in PASCAL/3000 may be less
     efficient  than in SPL or in CCS's C/3000, but who knows how fast
     it'll be under PASCAL/XL?

     For  this  reason,  I  don't wax too  poetic about the efficiency
     advantages  of features such as C's  "X++" (which increments X by
     1)  --  a modern optimizing compiler  is quite likely to generate
     equally  fast code for "X:=X+1", automatically seeing that it's a
     simple  increment-by-one (even the  15-year-old SPL/3000 compiler
     does this).

     The only time I'll mention efficiency is when some feature
     is  INHERENTLY more or less efficient than another (at least on a
     conventional machine architecture); for instance, passing a large
     array BY VALUE will almost certainly be slower than passing it BY
     REFERENCE,  since by-value passing would  require copying all the
     array data.

     Even   in   these   cases,   I   try  to  play  down  performance
     considerations;  if  you're  concerned  about speed  (as well you
     should be), do your own performance measurements for the features
     and compiler implementations that you know you care about.

   *  I  resolved -- for space reasons if for  no other -- not to be a
     textbook  for  SPL, PASCAL, or C. Some  of the things I say apply
     equally well to almost all programming languages, and I hope that
     they will be understandable even to people who've never seen SPL,
     PASCAL, or C.

     For  other  things,  I  rely  on the relative  readability of the
     languages and their similarity to one another. I hope that if you
     know any one of SPL, PASCAL, or C, you will be able to
     understand the examples written in the other languages.

     However,  it may be wise for you  to have manuals for these three
     languages  --  either  their  HP3000  implementations  or general
     standards  -- at hand, in case  some of the examples should prove
     too arcane.

   *  As you can tell by the size  of this paper, I also decided to be
     as thorough as practical in my comparisons, and ESPECIALLY in the
     evidence backing up my comparisons.

     One  of the main reasons I wrote this paper is that I hadn't seen
     much  OBJECTIVE  discussion comparing C and  PASCAL; I wanted not
     just  to present my conclusions -- which might as easily be based
     on  prejudice as on fact -- but also the reasons why I arrived at
     them, so that you could decide for yourself.

     So  as  not to burden you with  having to read all 200-odd pages,
     though,  I've summarized my conclusions in the "SUMMARY" chapter.
     You  might  want to have a look  there first, and then perhaps go
     back to the main body of the paper to see the supporting evidence
     of the points I made.
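
   To make the by-value versus by-reference point from the efficiency
rule above concrete, here is a minimal sketch in C. The names
BIG_ARRAY, SUM_BY_VALUE, and SUM_BY_REFERENCE are invented for
illustration, and the array is wrapped in a struct only because C does
not pass bare arrays by value. It shows where the copying happens, not
how much time it costs on any particular compiler.

```c
#include <stddef.h>

#define BIG 1000

/* C passes bare arrays by reference, so the array is wrapped in a
   struct to force by-value passing for the sake of the comparison. */
struct big_array { int data[BIG]; };

/* BY VALUE: the caller copies all BIG elements before the call. */
long sum_by_value (struct big_array a)
{
    long total = 0;
    size_t i;
    for (i = 0; i < BIG; i++)
        total += a.data[i];
    return total;
}

/* BY REFERENCE: only a pointer is passed; nothing is copied. */
long sum_by_reference (const struct big_array *a)
{
    long total = 0;
    size_t i;
    for (i = 0; i < BIG; i++)
        total += a->data[i];
    return total;
}
```

Both functions return the same answer; the difference is only in how
much data has to be moved to make the call.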


                    WHAT ARE C AND PASCAL, ANYWAY?

   If  you  think about it, SPL is  a very unusual language indeed. To
the  best of my knowledge, there is exactly one SPL compiler available
anywhere,  on any computer (eventually, the independent SPLash! may be
available  on  Spectrum,  but  that is another story).  I can say "SPL
supports  this"  or  "SPL  can't  do that"  and, excepting differences
between  one chronological version of SPL  and the next, be absolutely
precise  and  objectively  verifiable.  SPL  can be  said to "support"
something  only  because  there  is  only one SPL  compiler that we're
talking about.

   To  say  "PASCAL  can  do  X" is a  chancy proposition indeed. ANSI
Standard  PASCAL  doesn't  support  variable-length strings,  but most
modern  PASCAL implementations, including HP PASCAL, have some sort of
string  mechanism.  What about HP's new  PASCAL/XL, reputed to be even
more powerful still? Similarly, with C, there are the old "Kernighan &
Ritchie"  C, the proposed new ANSI standard  C, whatever it is that HP
uses on the Spectrum, AND whatever you use on the 3000, which might be
CCS's C compiler or Tymlabs' C.

   On  the one hand, I contemplated  comparing standard C and standard
PASCAL.  This  is  easier  for  me,  and  it  also makes  sense from a
portability  point  of  view  (if  you want it  to be portable, you're
better off using the standard, anyway).

   On  the other hand, portability is  fine and dandy, but most people
aren't  going  to  be porting their software  any further than from an
MPE/XL  machine to an MPE/V machine and  back. As long as you stick to
HP3000s, you have the full power of so-called "HP PASCAL", an extended
superset  of  PASCAL that's supported on  3000s, 1000s, 9000s, and the
rest; it's hardly fair (or practical) to ignore this in a comparison.

   Finally,   what  about  PASCAL/XL?  It'll  have  even  more  useful
features,  but  they may not be ported  back to the MPE/V machines, at
least for a while. Should I then compare PASCAL/XL and C/XL -- a
representative contest for the XL machines, but not necessarily for
MPE/V machines, and certainly not if you really want to port your
software onto other machines?

   This  is  all,  incidentally,  aggravated  by  the  fact  that HP's
extensions  to  PASCAL are more substantial  than its extensions to C;
thus,  comparing  the  "standards"  is  likely  to  put  PASCAL  in  a
relatively  worse  light  than comparing "supersets"  (not to say that
PASCAL is worse than C in either case).

   Faced  with  all  this,  I've  decided  to compare  everything with
everything else. There are actually 7 different compilers I discuss at
one time or another:

   * SPL.
     There's only one, thank God.

   * Standard PASCAL.
     This  is  the original ANSI Standard,  on which all other PASCALs
     are  based.  This  is  also very similar to  Level 0 ISO Standard
     PASCAL (see next item).

   * Level 1 ISO Standard PASCAL.
     This  standard,  put out in the  early 1980's, supports so-called
     CONFORMANT  ARRAY  parameters (see the  DATA STRUCTURES chapter).
     The  same standard document defined "Level 0 ISO Standard PASCAL"
     to   be   much  like  classic  "Standard  PASCAL",  i.e.  without
     conformant  arrays.  Compiler  writers  were given  the choice of
     which  one to implement, and it isn't obvious how popular Level 1
     ISO  Standard  will be. When I say  "Standard PASCAL", I mean the
     original  standard, which is almost identical  to the ISO Level 0
     Standard.

   * PASCAL/3000.
     This is HP's implementation of PASCAL on the pre-Spectrum HP3000.
     Although the Spectrum machines will also be called 3000's, when I
     say  PASCAL/3000 I mean the  pre-Spectrum version. PASCAL/3000 is
     itself  a superset of HP Pascal,  which is also implemented by HP
     on  HP  1000s  and  HP  9000s.  PASCAL/3000 is a  superset of the
     original Standard PASCAL, not the ISO Level 1 Standard.

   * PASCAL/XL.
     This  is  HP's  implementation  of  PASCAL on  the Spectrum. It's
     essentially  a  superset of both PASCAL/3000  and the ISO Level 1
     Standard.

   * Kernighan & Ritchie (K&R) C.
     This  is the C described by Brian Kernighan and Dennis Ritchie in
     their  now-classic  book "The C  Programming Language" (which, in
     fact,  is usually called "Kernighan and Ritchie"). Although never
     an  official standard, it is  quite representative of most modern
     C's.  In  fact,  for  practical  purposes, it can  be said that a
     program  written  in  K  &  R  C  is portable to  virtually any C
     implementation  (assuming you avoid those  things that K&R itself
     describes as implementation-dependent).

   * Draft ANSI Standard C.
     ANSI is now working on codifying a standard of C, which will have
     some  (but not very many) improvements over K&R. My reference for
     this  was Harbison & Steele's book "C: A Reference Manual", which
     also discusses various other implementations of C. Although Draft
     ANSI  Standard  C  is  Standard,  it  is also Draft.  Some of the
     features  described in it are  implemented virtually nowhere, and
     it's not clear how many of them C/XL will include.

   Matters  are  further  complicated,  of  course, by the  lack of an
HP-provided C compiler on the pre-Spectrum HP3000. The compiler I used
to  research  this  paper  is  CCS Inc.'s C/3000  compiler, which is a
superset of K&R C and a subset of Draft ANSI Standard C. The most
conspicuous Draft Standard feature that CCS C/3000 lacks is Function
Prototypes -- an understandable lack, since virtually no other C
compilers have them, either.

   Whenever  any  difference  exists  between  any of the  PASCAL or C
versions,  I try to point it out. Which versions you compare are up to
you:

   * You can compare Standard PASCAL and K&R C.
     If it isn't in these general standards that everybody implements,
     you're unlikely to get much portability.

   * You can compare PASCAL/XL and Draft ANSI Standard C.
     These are the compilers that will most likely be available on the
     Spectrum.

   * You can compare PASCAL/3000 and Draft ANSI Standard or K&R C.
     Even  though you might not usually care about porting to, say, an
     IBM  or a VAX, you may very seriously care about porting from the
     pre-Spectrum  to the Spectrum and  vice versa. HP hasn't promised
     to  port  PASCAL/XL back to the  pre-Spectrums, so PASCAL/3000 is
     probably the lowest common denominator.

SPL  is  nice.  At  least  until  SPLash!'s  promised Native  Mode SPL
compiler  comes  out,  there's only one SPL  compiler to compare with.
This makes me very happy.


                        ARE C, PASCAL, AND SPL
                      FUNDAMENTALLY DIFFERENT OR
                         FUNDAMENTALLY ALIKE?

   In my opinion, they are definitely FUNDAMENTALLY ALIKE. In the rest
of the paper, I'll tell you all about their differences, but those are
EXCEPTIONS to their fundamental similarity.

   Why  do  I  think so? Well, virtually  every important construct in
any of the three languages has an almost exact parallel in the
other two (the only exception being, perhaps, record structures, which
SPL doesn't have).

   *  All  three languages emphasize writing your  program as a set of
     re-usable,  parameterized  procedures  or  functions  (which, for
     instance, COBOL 74 and most BASICs do not);

   *  All three languages share virtually the same rich set of control
     structures (which neither FORTRAN/IV nor BASIC/3000 possesses).

   *  The languages may on the surface LOOK somewhat different (PASCAL
     and  C certainly do), but remember  that the ESSENCE is virtually
     identical  --  PASCAL may say "BEGIN" and  "END" where C says "{"
     and "}", but that's hardly a SUBSTANTIVE difference.

   Despite  all  the  differences  which  I'll  spend all  these pages
describing  --  and  I  think many of the  differences are indeed very
important  ones -- I still think that  SPL, PASCAL, and C are about as
close to each other as languages get.


              SO, WHICH IS BETTER -- C, PASCAL, OR SPL?

   You  think  I'm  going  to answer that? With  all my pretensions to
objectivity,  and dozens of angry language fanatics ready to berate me
for choosing the "wrong one"?

   The  main purpose of this paper is  to show you all the differences
and  let  you  decide  for  yourselves;  after all, there  are so many
parameters  (how portable do you want the  code to be? how much do you
care  about  error  checking?)  that  are  involved  in  this  sort of
decision.

   The  closest  I  come to actually saying which  is better is in the
"SUMMARY" chapter (at the very end of the paper); there I explain what
I  think the major drawbacks and advantages of each language are. Look
there,  but remember -- only you can decide which language is best for
your purposes.


                   TECHNICAL NOTE ABOUT C EXAMPLES

   In  case  you  didn't  know,  C  differentiates between  upper- and
lower-case.  The  variables "file" and "FILE"  are quite different, as
are  "file",  "File", and "fILE". (In SPL  and PASCAL, of course, case
differences are irrelevant; all of the just-given names would refer to
the same variable.)
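
   A minimal sketch of what this means in practice (the function name
CASE_DEMO and the values are invented for illustration; I use "fILE"
and "File" but avoid the all-caps "FILE", since the standard I/O header
already uses that name for a type):

```c
/* Three distinct variables whose names differ only in case. */
int case_demo (void)
{
    int file = 1;
    int File = 20;
    int fILE = 300;

    file = file + 1;   /* changes "file" only; File and fILE are untouched */
    return file + File + fILE;   /* 2 + 20 + 300 */
}
```

In SPL or PASCAL the three declarations would collide, since all three
spellings name the same variable.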

   In  fact,  in  C  programs the majority of  all objects -- reserved
words,  procedure  names,  variables,  etc.  --  are  lower-case.  The
reserved  words ("if", "while", "for", "int", etc.) are required to be
lower-case  by  the  standard;  theoretically,  you can  name all your
variables  and  procedures  in upper-case, but  most C programmers use
lower-case  for them, too (although  they can sometimes use upper-case
variable names as well, perhaps to indicate their own defined types or
#define macros).

   This  is  why  all  the  examples  of C programs  in this paper are
written  in lower-case. The one exception to this is when I refer to a
C  object -- a variable, a procedure, or a reserved word -- within the
text of a paragraph. Then, I'll often capitalize it to set it off from
the rest of the paper, to wit:

   proc (i, j)
   int i, j;
   {
   if (i == j)
     ...
   }

   The procedure PROC takes two parameters, I and J.
   The IF statement makes sure that they're equal, ....

   The  fact  that  I refer to them in  upper-case in the text doesn't
mean  that  you should actually use upper-case  names. I just do it to
make the text more readable.

   Another  example  of  how a little lie  can help reveal the greater
truth...


                           ACKNOWLEDGMENTS

   I'd  like to thank the following people for their great help in the
writing of this paper:

   *  CCS, Inc., authors of CCS  C/3000, a C compiler for pre-Spectrum
     HP3000s.  All the research and testing of the C examples given in
     this   paper   was   done  using  their  excellent  compiler.  In
     particular, I'd also like to thank Tim Chase, who gave me a great
     deal of help on some of the details of the C language.

   *  Steve Hoogheem of the HP Migration Center, who served as liaison
     between  me and the PASCAL/XL lab in answering my questions about
     PASCAL/XL.

   *  Mr. Tom Plum (of Plum Hall, Cardiff, NJ), a recognized C expert
     and member of the Draft ANSI Standard C committee, who was kind
     enough to answer many of the questions that I had about the Draft
     Standard.

   *  Dennis Mitrzyk, of Hewlett-Packard, who helped me obtain much of
     my  PASCAL/XL information, and who was also kind enough to review
     this paper.

   *  Joseph Brothers, David Greer (of  Robelle), Dave Lange and Roger
     Morsch  (of State Farm Insurance), and Mark Wallace (of Robinson,
     Wallace,  and  Company),  all  of  whom  reviewed  the  paper and
     provided a lot of useful input and corrections.


                          CONTROL STRUCTURES

   GOTOs,  some  say,  are  Considered  Harmful. Perhaps  they are and
perhaps  they are not. But the major reason for the control structures
that  PASCAL  and  C  provide  (as opposed to,  say, FORTRAN IV, which
doesn't)  is not that they replace GOTOs, but rather that they replace
them  with  something  more  convenient.  If given  the choice between
saying

   IF FNUM = 0 THEN
     PRINTERROR
   ELSE
     BEGIN
     READFILE;
     FCLOSE (FNUM, 0, 0);
     END;

and

   IF FNUM <> 0 THEN GOTO 10;
     PRINTERROR;
     GOTO 20;
  10:
     READFILE;
     FCLOSE (FNUM, 0, 0);
  20:

then  I would choose the former. IF-THEN-ELSE is a common construct in
all  of  the algorithms we write, and  it's easier for both the writer
and  the reader to have a language construct that directly corresponds
to it.
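
   The same comparison can be written out in C as a small runnable
sketch. PRINTERROR, READFILE and FCLOSE are replaced here by
hypothetical result codes, so that the two versions can be checked
against each other; the function names are mine, not part of any
library.

```c
/* Hypothetical stand-ins for PRINTERROR and READFILE/FCLOSE: each
   path simply reports which way the code went. */
enum { TOOK_ERROR_PATH = 1, TOOK_READ_PATH = 2 };

/* Structured version: the two alternatives are visible at a glance. */
int with_if_else (int fnum)
{
    int result;
    if (fnum == 0)
        result = TOOK_ERROR_PATH;
    else
        result = TOOK_READ_PATH;
    return result;
}

/* GOTO version: same logic, but the reader has to trace the labels. */
int with_goto (int fnum)
{
    int result;
    if (fnum != 0) goto do_read;
    result = TOOK_ERROR_PATH;
    goto done;
do_read:
    result = TOOK_READ_PATH;
done:
    return result;
}
```

The two functions return the same result for every FNUM; the only
thing the IF-THEN-ELSE buys is that you can see that without tracing
jumps.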

   C and PASCAL share some of the fundamental control structures. Both
have


   * IF-THEN-ELSEs. They look slightly different:

       IF FNUM=0 THEN        { PASCAL }
         PRINTERROR
       ELSE
         BEGIN
         READFILE;
         FCLOSE (FNUM, 0, 0);
         END;

     and

       if (fnum==0)          /* C */
         printerror ();      /* note the semicolon */
       else
         {
         readfile ();
         fclose (fnum, 0, 0);
         }

     but  I hardly think the  difference very substantial. There'll be
     some who forever curse C for using lower-case or PASCAL for using
     such  L-O-N-G reserved words, like "BEGIN"  and "END"; I can live
     with either.


   * WHILE-DOs, although again there are some minor differences:

       WHILE GETREC (FNUM, REC) DO
         PRINTREC (REC);

     vs.

       while (getrec (fnum, rec))
         printrec (rec);

   * DO-UNTILs:

       REPEAT
         GETREC (FNUM, REC);
         PRINTREC (REC);
       UNTIL
         NOMORERECS (FNUM);

     and

       do
         {
         getrec (fnum, rec);
         printrec (rec);
         }
       while
         (!nomorerecs (fnum));    /* "!" means "NOT" */

     Note that PASCAL has a REPEAT-UNTIL and C has a DO-WHILE, so the
     sense of the test is reversed (hence the "!"). Big difference.


   *  And, finally, C's and  PASCAL's procedure support is comparable,
     as well.
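
   One property that the test-at-the-bottom loop has in both languages
is that its body always runs at least once. A small sketch in C (the
digit-counting task and the function names are invented for
illustration):

```c
/* do-while tests AFTER the body, so the body runs at least once:
   the number 0 is correctly reported as having one digit. */
int digits_do_while (unsigned n)
{
    int count = 0;
    do {
        count = count + 1;
        n = n / 10;
    } while (n != 0);      /* PASCAL: UNTIL N = 0 */
    return count;
}

/* while tests BEFORE the body, so for n == 0 the body never runs. */
int digits_while (unsigned n)
{
    int count = 0;
    while (n != 0) {
        count = count + 1;
        n = n / 10;
    }
    return count;
}
```

For any non-zero argument the two agree; they differ only for 0, where
the WHILE version never enters the loop.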

The  interesting  things,  of  course,  are the points  at which C and
PASCAL differ. There are some, and for those of us who thought that
IF-THEN-ELSE  and  WHILE-DO are all the  control structures we'll ever
need, the differences can be quite surprising.


         THE "WHILE" LOOP AND ITS LIMITATIONS; THE "FOR" LOOP

   It is, indeed, true that all iterative constructs can be emulated
with  the WHILE-DO loop. On the other hand, why do the work if someone
else can do it for you?

   The  PASCAL FOR loop -- a child  of FORTRAN's DO -- is actually not
that hard to emulate:

   FOR I:=1 TO 9 DO
     WRITELN (I);

is identical, of course, to

   I:=1;
   WHILE I<=9 DO
     BEGIN
     WRITELN (I);
     I:=I+1;
     END;

Not  such  a  vast savings, but, still,  the FOR loop definitely looks
nicer.

   Unfortunately,  for  all  the savings that the  FOR loop gives you,
I've  found  that  it's  not as useful as  one might, at first glance,
believe.  This is because it ALWAYS  loops through all the values from
the  start to the limit. How often do you need to do that, rather than
loop  until  EITHER a limit is reached  OR another condition is found?
String  searching, for instance -- you want to loop until the index is
at  the  end of the string OR  you've found what you're searching for.
Always looping until the end is wasteful and inconvenient.

   Looking  through my MPEX source code, incidentally, I find 53 WHILE
loops  and  8  FOR loops. In my RL, the  numbers are 170 WHILEs and 38
FORs (at least 6 of these FORs should have been WHILEs if I weren't so
lazy).  (How's  that  for  an  argument -- I don't  use it, ERGO it is
useless.  I'm rather proud of it.)  In any case, though, my experience
has been that

   * THE PURE "FOR" LOOP -- A LOOP THAT ALWAYS GOES ON UNTIL THE LIMIT
     HAS  BEEN  REACHED  --  IS  NOT  AS COMMON AS  ONE MIGHT THINK IN
     BUSINESS AND SYSTEM PROGRAMS (although scientific and engineering
     applications,  which often handle matrices and such, use pure FOR
     loops more often). MORE OFTEN YOU WANT TO ALSO SPECIFY AN "UNTIL"
     CONDITION WHICH WILL ALSO TERMINATE THE LOOP.

   What I wanted, then, was simple -- a loop that looked like

   FOR I:=START TO END UNTIL CONDITION DO

For instance,

   FOR I:=1 TO STRLEN(S) UNTIL S[I]=C DO;

or

   FOR I:=1 TO STRLEN(S) WHILE S[I]=' ' DO;

What I got -- and I'm not sure if I'm sorry I asked or not -- is the C
FOR loop:

   for (i=0;  i<strlen(s) && s[i]!=c;   i=i+1)   /* C indexes from 0 */
     ;

The  C FOR loop -- like most  things in C, accomplished with a minimum
of letters and a maximum of special characters -- looks like this:

   for (initcode; testcode; inccode)
     statement;

It is functionally identical to

   initcode;
   while (testcode)
     {
     statement;
     inccode;
     }

In  other  words,  this is a sort of  "build-your-own" FOR loop -- YOU
specify the initialization, the termination test, and the "STEP". This
is   actually  quite  useful  for  loops  that  don't  involve  simple
incrementing, such as stepping through a linked list:

   for (ptr=listhead;   ptr!=NULL;   ptr=ptr->next)
     fondle (ptr);

The  above loop, of course, fondles  every element of the linked list,
something  quite  analogous to what an  ordinary PASCAL FOR loop would
do, but with a different kind of "stepping" action.
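
   For anyone who wants to try the linked-list loop, here is a
self-contained sketch. The NODE structure and the SUM_LIST function are
my own inventions for the purpose; summing the values simply stands in
for whatever you'd really do to each element.

```c
#include <stddef.h>

struct node
{
    int value;
    struct node *next;
};

/* Walks the list with a FOR loop whose "step" is "follow the link"
   rather than "add one"; returns the sum of the values visited. */
int sum_list (const struct node *listhead)
{
    const struct node *ptr;
    int total = 0;

    for (ptr = listhead; ptr != NULL; ptr = ptr->next)
        total = total + ptr->value;
    return total;
}
```

An empty list (a null LISTHEAD) falls out naturally: the test fails
the first time and the body never executes.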

   The standard PASCAL loop, of course, can easily be emulated --

   for (i=start;   i<=limit;   i=i+1)
     statements;
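
   To make the FOR/WHILE correspondence concrete, here is a sketch of
the character search written both ways. The function names are
invented, and since C strings are indexed from 0 the loops start at 0
rather than 1.

```c
#include <string.h>

/* FOR form: returns the index of the first occurrence of c in s,
   or strlen(s) if c does not occur. */
size_t find_with_for (const char *s, char c)
{
    size_t i;
    for (i = 0; i < strlen (s) && s[i] != c; i = i + 1)
        ;
    return i;
}

/* WHILE form: the same three pieces, moved to where the
   "functionally identical" expansion puts them. */
size_t find_with_while (const char *s, char c)
{
    size_t i;
    i = 0;
    while (i < strlen (s) && s[i] != c)
    {
        i = i + 1;
    }
    return i;
}
```

One place where the hand expansion is not exact: a CONTINUE inside a C
FOR loop still executes the increment clause, whereas in the expanded
WHILE loop it would jump past it.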

   I'm  sure it would be fair to conclude that C's FOR loop is clearly
more  powerful than PASCAL's. On the other  hand, a WHILE loop is more
powerful  than  a  FOR loop, too; and, a  GOTO is the most powerful of
them  all  (heresy!).  The  reason  a  PASCAL FOR loop  -- or for that
matter,  a  C FOR loop -- is good  is because simply by looking at it,
you can clearly see that it is a WHILE loop of a particular kind, with
clearly evident starting, terminating, and stepping operations.

   The  major argument that may be made against C's for loop is simply
one of clarity. Possible reasons include:

   *  The loop variable has to be  repeated four (or three, if you use
     "i++" instead of "i=i+1") times.

   *  The  semicolons,  adequate to delimit the  three clauses for the
     compiler,  may not sufficiently delimit them to a human reader --
     it  may  not  be  instantly  obvious where one  clause starts and
     another ends.

   *  Also,  the  very  use of semicolons  instead of control keywords
     (like  "TO")  may  be  irritating; in a way,  it's like having to
     write

       FOR I,1,100

     instead of

       FOR I:=1 TO 100

     If  you think the first version  isn't any worse than the second,
     you  shouldn't mind C; some, however, find "FOR I,1,100" slightly
     less clear than "FOR I:=1 TO 100".

   for (i=1; i<=10; i=i+1)         FOR I:=1 TO 10 DO

       or, alternatively

   for (i=1; i<=10; i++)           FOR I:=1 TO 10 DO

Which  do you prefer? Frankly, for  me, the PASCAL version is somewhat
clearer,  although  I'm not prepared to say  that the clarity is worth
the  cost in power. On the other hand, many a C programmer doesn't see
any  advantage in the PASCAL style,  and perhaps there isn't any. Some
of the C/PASCAL differences, I'm afraid, boil down to simply this.


     THE WHILE LOOP AND ITS LIMITATIONS -- AN INTERESTING PROBLEM

   Consider the following simple task -- you want to read a file until
you  get a record whose first character  is a "*"; for each record you
read,  you want to execute some  statements. Your PASCAL program might
look like this:

   READLN (F, REC);
   WHILE REC[1]<>'*' DO
     BEGIN
     PROCESS_RECORD_A (REC);
     PROCESS_RECORD_B (REC);
     PROCESS_RECORD_C (REC);
     READLN (F, REC);
     END;

All  well and good? But, wait a minute  -- we had to repeat the READLN
statement  a second time at the end of the WHILE loop. "Lazy bum," you
might  reply.  "Can't handle typing an extra  line." Well, what if, in
order  to  get  the  record, we had to do  more than just a READLN? We
might need to, say, call FCONTROL before doing the READLN, and perhaps
have  a  more complicated loop test. Our  program might end up looking
like:

   FCONTROL (FNUM(F), EXTENDED_READ, DUMMY);
   FCONTROL (FNUM(F), SET_TIMEOUT, TIMEOUT);
   READLN (F, REC);
   GETFIELD (REC, 3, FIELD3);
   WHILE FIELD3<>'*' DO
     BEGIN
     PROCESS_RECORD_A (REC);
     PROCESS_RECORD_B (REC);
     PROCESS_RECORD_C (REC);
     FCONTROL (FNUM(F), EXTENDED_READ, DUMMY);
     FCONTROL (FNUM(F), SET_TIMEOUT, TIMEOUT);
     READLN (F, REC);
     GETFIELD (REC, 3, FIELD3);
     END;

This  is not a happy-looking program. We had to duplicate a good chunk
of code, with all the resultant perils of such a duplication; the code
was harder to write, it's now harder to read, and when we maintain it,
we're  liable to change one of the occurrences of the code and not the
other.

   Workarounds, of course, exist. We can say

   REPEAT
     FCONTROL (FNUM(F), EXTENDED_READ, DUMMY);
     FCONTROL (FNUM(F), SET_TIMEOUT, TIMEOUT);
     READLN (F, REC);
     GETFIELD (REC, 3, FIELD3);
     IF FIELD3 <> '*' THEN
       BEGIN
       PROCESS_RECORD_A (REC);
       PROCESS_RECORD_B (REC);
       PROCESS_RECORD_C (REC);
       END;
   UNTIL
     FIELD3 = '*';

although  this  is  also rather messy -- we've  had to repeat the loop
termination  condition,  and  the resulting code  is really a WHILE-DO
loop masquerading as a REPEAT-UNTIL.

   Some might reply that what we ought to do is to move the FCONTROLs,
READLN,  and  GETFIELD into a separate  function that returns just the
value  of FIELD3, or perhaps even the loop test (FIELD3 <> '*'). Then,
the loop would look like:

   WHILE FCONTROLS_READLN_AND_GETFIELD_CHECK_STAR (FNUM, REC) DO
     BEGIN
     PROCESS_RECORD_A (REC);
     PROCESS_RECORD_B (REC);
     PROCESS_RECORD_C (REC);
     END;

This,  indeed, does look nice -- but are we to be expected to create a
new procedure every time a control structure doesn't work like we want
it  to? I like procedures just as much as the next man; in fact, I'm a
lot  more  prone  to pull code out into  procedures than others are (I
like  my procedures to be twenty lines or shorter). On the other hand,
what   if  someone  said  that  you  couldn't  use  BEGIN  ..  END  in
IF/THEN/ELSE  statements  -- if you want to  do more than one thing in
the THEN clause, you have to write a procedure?

   C  has  some  advantage  here.  With C's "comma"  operator, you can
combine  any  number  of  statements  (with some  restrictions) into a
single  expression, whose result is the last value. Thus, what you can
do is something like this:

   while ((fcontrol (fnum(f), extended_read, dummy),
           fcontrol (fnum(f), set_timeout, timeout),
           fgets (rec, 80, f),
           getfield (rec, 3, &field3),
           field3!='*'))
     {
     process_record_a (rec);
     process_record_b (rec);
     process_record_c (rec);
     }

Whether  this is better or not,  you decide. The "comma" construct can
be very confusing. In "while ((...))", the outside parentheses are
the WHILE's; the inner pair is the comma construct's; and all others
belong to internal expressions and function calls. Additionally, you
have to keep track of which commas belong to the function calls and
which delimit the comma construct's elements.
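
   Here is a cut-down version of the comma technique that can actually
be compiled and run. Everything in it is a hypothetical stand-in: the
SOURCE structure plays the part of the file, NEXT_RECORD plays the part
of the FCONTROLs, the read and the GETFIELD, and counting plays the
part of the PROCESS_RECORD calls. Like the original example, it assumes
that a record starting with "*" is always present.

```c
#include <stddef.h>

/* Hypothetical stand-in for the file: an array of records plus a
   position. */
struct source
{
    const char **records;
    size_t pos;
};

/* Returns the current record and advances to the next one. */
const char *next_record (struct source *src)
{
    return src->records[src->pos++];
}

/* Counts the records that precede the first one starting with '*'.
   The comma operator evaluates its left operand, discards the result,
   then evaluates its right operand, whose value is what WHILE tests. */
int count_until_star (struct source *src)
{
    const char *rec;
    int processed = 0;

    while ((rec = next_record (src),
            rec[0] != '*'))
        processed = processed + 1;   /* stands in for PROCESS_RECORD_x */
    return processed;
}
```

The fetch is written only once, which was the goal; whether the result
is easier to read than the duplicated version is another matter.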

   What  is  that  slithering  underfoot? Could it  be the serpent? He
proposes this:

   WHILE TRUE DO
     BEGIN
     FCONTROL (FNUM(F), EXTENDED_READ, DUMMY);
     FCONTROL (FNUM(F), SET_TIMEOUT, TIMEOUT);
     READLN (F, REC);
     GETFIELD (REC, 3, FIELD3);
     IF FIELD3='*' THEN GOTO 99;
     PROCESS_RECORD_A (REC);
     PROCESS_RECORD_B (REC);
     PROCESS_RECORD_C (REC);
     END;
  99:

"Sssimple and ssstraightforward, madam.  Won't you have a bite?" Shame
on  you!  Still, it's not obvious that  the old faithful "GOTO" isn't,
relatively  speaking,  a  reasonable solution. C  has its own variant,
that lets us get away without using the "forbidden word":

   while (TRUE)
     {
     fcontrol (fnum(f), extended_read, dummy);
     fcontrol (fnum(f), set_timeout, timeout);
     fgets (rec, 80, f);
     getfield (rec, 3, &field3);
     if (field3=='*') break;
     process_record_a (rec);
     process_record_b (rec);
     process_record_c (rec);
     }

C's  "BREAK"  construct  gets  you  out  of the  construct that you're
immediately  in,  be  it  a  WHILE  loop  (as in this  case), a SWITCH
statement  (in  which it is vital), a FOR,  or a DO. If you believe in
the evil of GOTOs, you probably won't much like BREAKs; again, though,
I  ask -- is the above example any  less muddled than the other ones I
showed?
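
   For comparison, here is the same shape of loop reduced to something
that can be run as it stands. The array VALUES is a hypothetical
stand-in for the file, a negative value stands in for the "*" record,
and adding stands in for the processing; the sketch assumes that a
negative value is always present to stop the loop.

```c
/* Sums the values up to, but not including, the first negative one. */
int sum_until_negative (const int *values)
{
    int i = 0;
    int total = 0;
    int current;

    while (1)                    /* i.e. while (TRUE) */
    {
        current = values[i];     /* "get the record": any number of steps */
        i = i + 1;
        if (current < 0) break;  /* the single loop test, in the middle */
        total = total + current; /* "process the record" */
    }
    return total;
}
```

Neither the fetch nor the test is repeated, at the price of a loop
header that says nothing about when the loop ends.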

   Incidentally,  the best approach that I've seen so far comes from a
certain  awful, barbarian language called  FORTH (OK, all you FORTHies
--  meet  me  in  the  alley  after the talk and  we can have it out).
Translated into civilized terms, the loop looked something like this:

   DO
     FCONTROL (FNUM(F), EXTENDED_READ, DUMMY);
     FCONTROL (FNUM(F), SET_TIMEOUT, TIMEOUT);
     READLN (F, REC);
     GETFIELD (REC, 3, FIELD3);
   WHILE FIELD3<>'*'
     PROCESS_RECORD_A (REC);
     PROCESS_RECORD_B (REC);
     PROCESS_RECORD_C (REC);
   ENDDO;

This  so-called  "loop-and-a-half"  solves  what  I  think is  the key
problem,  present  in so many WHILE loops  -- that the condition often
takes  more than a single expression  to calculate. Well, in any case,
neither SPL, PASCAL, nor C has such a construct, so that's that.


       BREAK, CONTINUE, AND RETURN -- PERFECTION OR PERVERSION?

   As  I  mentioned  briefly in the last  section, C has three control
structures  that  PASCAL  does  not,  and  some say  should not. These
structures,  Comrade,  are Ideologically Suspect.  A Dangerous Heresy.
Still, they're there, and ought to be briefly discussed.

   * BREAK -- exits the immediately enclosing loop (WHILE, DO, or FOR)
     or  a  SWITCH  statement.  Essentially  a  GOTO to  the statement
     immediately following the loop.

   *  CONTINUE  --  goes  to  the "next iteration"  of the immediately
     enclosing loop (WHILE, DO, or FOR).

   *  RETURN  --  exits  the current  procedure. "RETURN <expression>"
     exits  the current procedure, returning the value of <expression>
     as the procedure's result.

   * Of course, GOTO, the old faithful.
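
   As a quick illustration of the first three, here is a minimal
sketch that uses each of them once. The function name and its job are
made up for the example; only the control flow matters.

```c
/* Returns the sum of the odd values in A[0..N-1] that come before the
   first zero, or -1 if N is not positive. */
int sum_odds_before_zero (const int *a, int n)
{
   int i, sum = 0;

   if (n <= 0)
      return -1;          /* RETURN: leave the function at once */
   for (i = 0; i < n; i++)
      {
      if (a[i] == 0)
         break;           /* BREAK: go to the statement after the loop */
      if (a[i] % 2 == 0)
         continue;        /* CONTINUE: go on to the i++ and next test */
      sum += a[i];
      }
   return sum;
}
```

For the values 1, 2, 3, 0, 5 the loop adds 1, skips 2, adds 3, and
leaves at the 0 without ever seeing the 5.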

   Now,  as  you  may  or  may not recall, a  while ago there was much
argument  made against GOTOs. Instead of GOTOs, it was said, you ought
to   use   only   IF-THEN-ELSEs   and   WHILE-DOs.  CASEs,  FORs,  and
REPEAT-UNTILs,  being  just variants of  the other control structures,
were  all  right;  but  GOTOs  were  condemned,  on several  very good
grounds:

   *  First of all, with GOTOs, the "shape" of a procedure stops being
     evident. If you don't use GOTOs, each procedure and block of code
     will  have only one ENTRY and only  one EXIT. This means that you
     can assume that control will flow from the beginning to the end,
     with iterations and departures that are clearly defined and
     whose conditions are always evident.

   *  If  you avoid GOTOs, then for  any statement, you can tell under
     what  conditions  it  will  be  executed  just by  looking at the
     control structures within which it is enclosed.

These  concerns,  I  would  say,  may  apply  equally well  to BREAKs,
CONTINUEs, and RETURNs.

   Personally,  I must confess, I don't use  GOTOs. I don't know if it
is  the  appeal  of  reason, the lesson of  experience, or fear for my
immortal  soul.  About  five years ago I  resolved to stop using them;
except  for  "long  jumps"  (which I'll talk about  more later), I use
GOTOs  in 1 procedure of MPEX's 40  procedures, and in 2 procedures of
my  RL's 350 (both of the uses  of "GOTO" are as "RETURN" statements).
However, I must say that in many cases the temptation does seem great.

   Consider,  for  a  moment,  the following case. We  need to write a
procedure  that opens a file, reads some records, writes some records,
and  closes  the  file.  In case any of  the file operations fails, we
should  immediately  close  the  file  and  not do  anything else. The
"GOTO-less" solution:

   munch_file (f)
   char f[40];
   {
   int fnum;

   fnum = fopen (f, 1);
   if (error == 0)             /* let's say ERROR is an error code */
      {
      freaddir (fnum, buffer, 128, rec_a);
      if (error == 0)
         {
         munch_record_one_way (buffer);
         fwritedir (fnum, buffer, 128, rec_a);
         if (error == 0)
            {
            freaddir (fnum, buffer, 128, rec_b);
            if (error == 0)
               {
               munch_record_another_way (buffer);
               fwritedir (fnum, buffer, 128, rec_b);
               if (error == 0)
                  some_more_stuff;
               }
            }
         }
      }
   fclose (fnum, 0, 0);
   }

Or, using "GOTO":

   munch_file (f)
   char f[40];
   {
   int fnum;
   #define check_error    if (error != 0) goto done

   fnum = fopen (f, 1);
   if (error == 0)
      {
      freaddir (fnum, buffer, 128, rec_a);
      check_error;
      munch_record_one_way (buffer);
      fwritedir (fnum, buffer, 128, rec_a);
      check_error;
      freaddir (fnum, buffer, 128, rec_b);
      check_error;
      munch_record_another_way (buffer);
      fwritedir (fnum, buffer, 128, rec_b);
      check_error;
      some_more_stuff;
      }

   done:
   fclose (fnum, 0, 0);
   }

Is the latter way really worse? I'm not so sure. Also, I can't see any
way to rewrite this example without GOTOs that isn't just as cumbersome
as the first case.

   Similar  examples  can  be  found  for  BREAK  and RETURN.  If, for
instance,  I  wasn't required to close the  file, I'd just do a RETURN
instead  of doing the "GOTO DONE"; if  I had to loop through the file,
my code might look something like:

   framastatify (f)
   char f[40];
   {
   int fnum;

   fnum = fopen (f, 1);
   if (error == 0)
      {
      while (TRUE)
         {
         fread (fnum, rec1, 128);
         if (error != 0) break;
         if (frob_a (rec1) == failed)
            break;
         fupdate (fnum, rec1, 128);
         if (error != 0) break;
         freadlabel (fnum, rec1, 128, 0);
         if (error != 0) break;
         if (twiddle_label (rec1) == failed)
            break;
         fwritelabel (fnum, rec1, 128, 0);
         if (error != 0) break;
         fspace (fnum, 20);
         if (error != 0) break;
         }
      fclose (fnum, 0, 0);
      }
   }

Just IMAGINE all those IFs you'd need to nest if you avoided BREAK!

   CONTINUE,  on the other hand, is  a vile heresy. Everybody who uses
CONTINUE should be burned at the stake.

   To  summarize, "C Notes, A Guide  to the C Programming Language" by
C.T. Zahn (Yourdon 1979) says:

   "In practice, BREAK is needed rarely, CONTINUE never, and GOTO even
    less  often  than that...   It also is  good style to minimize the
    number  of  RETURN  statements;  exactly  one  at  the end  of the
    function is best of all for readability."

On the other hand, I say

   "If this be treason, make the most of it!"

Especially   if   your  procedures  are  short  enough  and  otherwise
well-written  enough, I think that you can well make the judgment that
even  with  the introduction of GOTOs, the  control flow will still be
clear enough.

   Just don't tell anyone I told you to do it.


                  LONG JUMPS -- PROBLEM AND SOLUTION

   Modern structured programming encourages FACTORING. Your algorithm,
it  says, should be broken up into small procedures, small enough that
each one can be easily understood and digested by anybody reading it.

   I'm  quite  fond  of  factoring myself, and you'll  find most of my
procedures  to  be  about 20-odd lines long or  shorter. I try to make
each procedure a "black box", with a well-defined, atomic function and
no  unobvious  side effects. Naturally, with  procedures this small, I
often  end  up  going  several levels of procedure  calls deep to do a
relatively simple task.

   For  instance, I might have a procedure called ALTFILE that takes a
file  name  and a string of keywords indicating  how the file is to be
altered:

   * ALTFILE calls PARSE_KEYWORDS to parse the keyword string;

   *  PARSE_KEYWORDS  separates  the string  into individual keywords,
     calling PROCESS_KEYWORD for each one;

   * PROCESS_KEYWORD figures out what keyword is being referenced, and
     calls   a   parsing   routine   --   PARSE_INTEGER,   PARSE_DATE,
     PARSE_INT_ARRAY,  etc. -- depending on the  type of the value the
     user specified;

   *  PARSE_INT_ARRAY takes a list of integer values delimited by, say
     ":"s, and calls PARSE_INTEGER for each one.

   *  PARSE_INTEGER  converts  the  text string  containing an integer
     value into a number and returns the numeric value.

Not  a  far-fetched  example,  you  must  agree;  in fact,  many of my
programs  (e.g.  MPEX's  %ALTFILE  parser) nest even  deeper. Now, the
question  arises -- what if PARSE_INTEGER  realizes that the value the
user specified isn't a valid number after all?

   The solution seems clear -- PARSE_INTEGER, in addition to returning
the integer's value, also returns a true/false flag indicating whether
or  not  the  value was actually valid.  PARSE_INTEGER returns this to
PARSE_INT_ARRAY;  now,  PARSE_INT_ARRAY  realizes  that  its parameter
isn't  a valid integer array --  it must also return a success/failure
flag  to  PROCESS_KEYWORD;  PROCESS_KEYWORD  must  pass it  back up to
PARSE_KEYWORDS;  PARSE_KEYWORDS should return  it to ALTFILE; finally,
ALTFILE informs its caller that the operation failed.

   Let's  look  at  a particular specimen of  one of these procedures;
say,  the  portion  that  handles  the keyword FOOBAR,  which the user
should specify in conjunction with an integer array, a string, and two
dates:

   ...
   IF KEYWORD="FOOBAR" THEN
     BEGIN
     GET_SUBPARM (0, PARM_STRING);
     IF PARSE_INT_ARRAY (PARM_STRING, SP0_VALUE) = FALSE THEN
       PARSE_KEYWORD:=FALSE
     ELSE
       BEGIN
       GET_SUBPARM (1, PARM_STRING);
       IF PARSE_STRING (PARM_STRING, SP1_VALUE) = FALSE THEN
         PARSE_KEYWORD:=FALSE
       ELSE
         BEGIN
         GET_SUBPARM (2, PARM_STRING);
         IF PARSE_DATE (PARM_STRING, SP2_VALUE) = FALSE THEN
           PARSE_KEYWORD:=FALSE
         ELSE
           BEGIN
           GET_SUBPARM (3, PARM_STRING);
           PARSE_KEYWORD:=
             PARSE_DATE (PARM_STRING, SP3_VALUE);
           END;
         END;
       END;
     END;
   ...

Of  course,  the  same  sort  of  thing  has  to be  repeated in every
procedure  in  the  calling  sequence;  the moment an  error return is
detected from one of the called procedures, the other calls have to be
skipped,  and  the  error  condition  should be passed  back up to the
caller.

   Error  handling,  of  course,  is important business,  and it would
hardly be appropriate to crash and burn just because the user inputs a
bad  value (users input bad values all the time). Still, all this work
just to catch the error condition?

   What we really want to do in this case is to

   * HAVE WHOEVER DETECTS THE ERROR CONDITION AUTOMATICALLY RETURN ALL
     THE WAY TO THE TOP OF THE CALLING SEQUENCE.

In other words, the error finder might have code that looks like:

   NUM:=BINARY (STR, LEN);
   IF CCODE<>CCE THEN                  { an error detected? }
     SIGNAL_ERROR;                     { return to the top! }

The  procedure we want to return to would indicate its desire to catch
these errors by saying something like:

   ON ERROR DO
     { the code to be activated when the error is detected };
   RESULT:=ALTFILE (FILE, KEYWORDS);

Finally,   the   intermediate  procedures  can  now  be  the  soul  of
simplicity:

   ...
   IF KEYWORD="FOOBAR" THEN
     BEGIN
     GET_SUBPARM (0, PARM_STRING);
     PARSE_INT_ARRAY (PARM_STRING, SP0_VALUE);
     GET_SUBPARM (1, PARM_STRING);
     PARSE_STRING (PARM_STRING, SP1_VALUE);
     GET_SUBPARM (2, PARM_STRING);
     PARSE_DATE (PARM_STRING, SP2_VALUE);
     GET_SUBPARM (3, PARM_STRING);
     PARSE_DATE (PARM_STRING, SP3_VALUE);
     END;
   ...

Thus, the three components of this scheme:

   * The code that finds the error -- it "SIGNALS THE ERROR";

   *  The code that should be branched  to in case of error is somehow
     indicated,  at compile time or run  time (but before the error is
     actually signaled).

   *  Finally, the intermediate code  knows nothing about the possible
     error condition. It's automatically exited by the error signaling
     mechanism.

For  want of a better name, I'll call this concept a "Long Jump". It's
also  been called a "non-local GOTO", a "throw", a "signal raise", and
other  unsavory  things, but "Long Jump" --  which happens to be the C
name for it -- sounds more romantic.


           LONG JUMPS, CONTINUED -- SOLUTIONS AND PROBLEMS

   I've  indicated the need -- or at least, I think it's a need -- and
a  possible  prototype solution. There  are several implementations of
this already extant, each with its own little quirks and problems.


                     PASCAL -- STANDARD AND /3000

   The  only  mechanism  Standard  PASCAL and PASCAL/3000  give you to
solve  our problem is the GOTO. In  PASCAL, you're allowed to GOTO out
of  a  procedure  or function; however, you  can only branch INTO the
main body of the program or from a nested procedure into the procedure
that contains it. In other words, if you have

   PROCEDURE P;
     PROCEDURE INSIDE_P;   { nested in P }
     BEGIN
     ...
     END;
   BEGIN
   ...
   END;

   PROCEDURE Q;
   BEGIN
   ...
   P;
   ...
   END;

then  you can branch from INSIDE_P into P, but you can't branch from P
into Q, even though Q calls P.

   Even if this restriction weren't present, the GOTO to a fixed label
still  wouldn't  be  the  right  answer -- what  if our PARSE_KEYWORDS
procedure  is called from two places? Surely we wouldn't want an error
condition  to  cause  a  branch  to  the same location  in both cases!
Besides,  if  we  want  to compile PARSE_KEYWORDS  separately from its
caller,  we'd  have  to  allow  "global label  variables". In reality,
PASCAL can't do these "long jumps".


                                 SPL

   SPL  has a different and rather  better facility. In SPL, you can't
branch  from one procedure into another; however, you CAN pass a label
as a parameter to a procedure. Thus, you could write:

   PROCEDURE PARSE'INT'ARRAY (PARM, RESULT, ERR'LABEL);
   BYTE ARRAY PARM;
   INTEGER ARRAY RESULT;
   LABEL ERR'LABEL;
   BEGIN
   ...
   IF << test for error condition >> THEN
     GOTO ERR'LABEL;
   ...
   END;

Then, you might call this from within PROCESS'KEYWORD by saying

   PROCEDURE PROCESS'KEYWORD (KEYWORD'AND'PARM, ERR'LABEL);
   BYTE ARRAY KEYWORD'AND'PARM;
   LABEL ERR'LABEL;
   BEGIN
   ...
   IF KEYWORD="FOOBAR" THEN
     BEGIN
     GET'SUBPARM (0, PARM'STRING);
     PARSE'INT'ARRAY (PARM'STRING, SP0'VALUE, ERR'LABEL);
     ...
     END;
   ...
   END;

When  you  call  PARSE'INT'ARRAY,  you  pass it the  label to which it
should return in case of error -- in this case, also called ERR'LABEL,
which  was  also  passed  to  this  procedure.  Finally,  the  topmost
procedure -- ALTFILE -- might say:

   RESULT:=ALTFILE (FILENAME, KEYWORDS, GOT'ERROR);
   ...
   GOT'ERROR:
     << report to the user that an error occurred >>

The  key  point  here  is  that  each procedure  doesn't really return
directly to the top; rather, it returns to the error label that it was
passed  by its caller. Since that may  well be the label passed by the
caller's  caller, and so on, you get a sort of "daisy chain" effect by
which  you  can  easily  exit  ten  levels  of procedures  in one GOTO
statement.

   At  this  point,  I think it's quite  important to mention a SEVERE
PROBLEM  of  these  "long  jumps"  that  I  think  any  implementation
mechanism has to be able to address:

   *  THE  VERY ESSENCE OF A LONG JUMP  IS THAT IT BYPASSES SEVERAL OF
     THE  PROCEDURES  IN  THE CALLING SEQUENCE.  A PROCEDURE (say, our
     PROCESS_KEYWORD) CALLS ANOTHER PROCEDURE, EXPECTING THE CALLEE TO
     RETURN, BUT THE CALLEE NEVER DOES!

   Imagine  for a moment that PROCESS_KEYWORD opened a file, intending
to  close it at the end of the operation; after the long jump branches
out  of  it,  the file will remain open.  Any other kind of cleanup --
resetting  temporarily  changed  global variables,  releasing acquired
resources  --  that a procedure expects to  do at the end might remain
undone because the procedure will be branched out of.

   Similarly,  what  if a procedure EXPECTS  another procedure that it
calls  to detect an error condition? What  is a fatal error under some
circumstances  may be quite normal under others; for instance, say you
have  a procedure that reads data from  a file and signals an error if
the  file couldn't be opened -- in some cases, you may expect the file
to be unopenable, and have a set of defaults you want to use instead.

   By using the convenience of long jumps, you lose the certainty that
every  procedure  has complete control over  its execution, and can be
sure that any procedure it calls will always return.
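
   The hazard is easy to reproduce. The sketch below uses C's SETJMP
and LONGJMP library routines (discussed in the section on C further
on); every name in it is hypothetical, and a simple counter stands in
for an open file. It is meant only to show the skipped cleanup, not
to recommend the style.

```c
#include <setjmp.h>

static jmp_buf top_level;        /* hypothetical global jump buffer */
static int open_resources = 0;   /* counts "files" currently open   */

static void detect_error (int bad)
{
   if (bad)
      longjmp (top_level, 1);    /* never returns to its caller */
}

static void middle (int bad)
{
   open_resources++;             /* "open the file"                   */
   detect_error (bad);
   open_resources--;             /* "close the file": the long jump   */
                                 /* skips this line entirely          */
}

/* Runs MIDDLE once and returns the number of resources left open. */
int leaked_after_call (int bad)
{
   open_resources = 0;
   if (setjmp (top_level) == 0)
      middle (bad);
   return open_resources;
}
```

With BAD equal to 0 the function reports nothing left open; with BAD
non-zero, the "file" opened in MIDDLE is still open when control
arrives back at the top.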

   The  advantage of SPL's approach is that you could call a procedure
passing   to   it   any   error  label  you  want  to.  For  instance,
PROCESS'KEYWORD might look like:

   PROCEDURE PROCESS'KEYWORD (KEYWORD'AND'PARM, ERR'LABEL);
   BYTE ARRAY KEYWORD'AND'PARM;
   LABEL ERR'LABEL;
   BEGIN
   INTEGER FNUM;
   FNUM:=FOPEN (KEY'INFO'FILE, 1);
   ...
   IF KEYWORD="FOOBAR" THEN
     BEGIN
     GET'SUBPARM (0, PARM'STRING);
     PARSE'INT'ARRAY (PARM'STRING, SP0'VALUE, CLOSE'FILE);
     ...
     END;
   ...
   RETURN;   << if we finished normally, just return >>
   CLOSE'FILE:   << branch here in case of error >>
   FCLOSE (FNUM, 0, 0);
   GOTO ERR'LABEL;
   END;

Because  you have complete control over each branch, you don't HAVE to
pass the procedure you call the same error label that you were passed;
if  you want to do some cleanup, you can just pass the label that does
the cleanup, and THEN returns to your own error label.

   Thus,  with SPL's label parameter system,  you get the best of both
worlds:

   *  If  you pass an "error label"  to a procedure, the procedure may
     choose to return normally or to return to the error label.

   *  Since you can pass the same label to a procedure you call as the
     one  that  you yourself were passed, a  single GOTO to that label
     can conceivably exit any number of levels of procedures.

   *  On the other hand, if you want  to do some cleanup in case of an
     error,  you  can just pass a different  label, one that points to
     the cleanup code.

   * Finally -- if you want to -- you can actually pass several labels
     to  a  procedure,  allowing  it  to  return  to  a  different one
     depending on what error condition it finds. A bit extravagant for
     my blood, but maybe I'm just too stodgy.

   The only problems that this system has are:

   *  You  have  to  pass  the  label  to  any  procedure  that  might
     conceivably  want  to  participate  in a long  jump -- either the
     procedure  that initially detects the error or any one that wants
     to  pass  it on. This may often  mean that virtually every one of
     your procedures will have to have this error label parameter. Not
     a very unpleasant problem, but a bit of a bother nonetheless.

   *  Similarly, there are some  procedures whose parameters you can't
     dictate; for instance, control-Y trap procedures (ones in which a
     long  jump  to the control-Y handling code  may often be just the
     thing  you  want  to  do).  Other  trap  procedures  (arithmetic,
     library,  and system) are just like this, too, as are those which
     are   themselves   passed  as  "procedure  parameters"  to  other
     procedures  and  whose  parameters  are  dictated by  those other
     procedures (got that?).

Besides these minor problems, though, SPL's long jump support is quite
reasonably done.


                       PROPOSED ANSI STANDARD C

   C's  "GOTO" doesn't allow any branch  from one function to another;
neither does C provide label parameters like SPL does. Long jumps in C
are  accomplished with a different mechanism, involving the SETJMP and
LONGJMP built-in procedures.

   SETJMP  is a procedure to which you pass a record structure (of the
predefined  type "jmp_buf"). When you first  call it, it saves all the
vital  statistics  of the program --  the current instruction pointer,
the  current  top-of-stack address, etc. --  in this record structure.
Then,  when  the  same record structure is  passed to LONGJMP, LONGJMP
uses  this  information  to restore the  instruction pointer and stack
pointer  to be exactly what they were at SETJMP time. Thus, control is
passed back to the SETJMP location, wherever it may be.

   A typical application of this might be:

   jmp_buf error_trapper;

   proc()
   {
   ...
   if (setjmp(error_trapper) != 0)
      /* do error processing */;
   else
      {
      result = altfile (filename, keywords);
      ...
      }
   ...
   }

   ...

   int parse_integer (str)
   char str[];
   {
   ...
   if (bad_value)
      longjmp (error_trapper, 1);
   ...
   }

   One thing I didn't mention at first, as you see, was the "IF
(SETJMP(ERROR_TRAPPER)  != 0)". Well, since the LONGJMP jumps DIRECTLY
to  the instruction following the SETJMP, we  have to have some way of
distinguishing  the  first  time  it  is executed  (after a legitimate
SETJMP) and the next time (after the LONGJMP which transferred control
back to it). The initial SETJMP, you see, returns a 0; a LONGJMP takes
its  second  parameter  (in  this  case,  a 1), and  returns it as the
"result" of SETJMP.

   Thus,  when  the  IF statement is first  executed, the value of the
"(SETJMP  ... != 0)" will be FALSE, and the ALTFILE will be done; when
the IF is executed a second time, the value will be TRUE, and the
error processing will be performed.
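
   A complete, compilable version of the same idea may make the "two
returns" easier to see. This is only a sketch: PARSE_DIGITS is a
made-up stand-in for PARSE_INTEGER, and PARSE_OR_MINUS_ONE plays the
part of PROC.

```c
#include <setjmp.h>

static jmp_buf error_trapper;

/* Hypothetical stand-in for PARSE_INTEGER: accepts only a non-empty
   string of decimal digits and long-jumps on anything else. */
static int parse_digits (const char *s)
{
   int value = 0;

   if (*s == '\0')
      longjmp (error_trapper, 1);
   for (; *s != '\0'; s++)
      {
      if (*s < '0' || *s > '9')
         longjmp (error_trapper, 1);   /* setjmp "returns" 1 */
      value = value * 10 + (*s - '0');
      }
   return value;
}

/* Returns the parsed value, or -1 if PARSE_DIGITS signalled an error. */
int parse_or_minus_one (const char *s)
{
   if (setjmp (error_trapper) != 0)
      return -1;               /* reached only by way of longjmp   */
   return parse_digits (s);    /* the direct call to setjmp gave 0 */
}
```

One detail worth knowing: a LONGJMP whose second parameter is 0 makes
SETJMP return 1 instead, so the error branch can't be mistaken for the
initial call.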

   Note the distinctive features of the SETJMP/LONGJMP construct:

   *  The  "jump buffer" -- set by SETJMP  and used by LONGJMP -- need
     not  be  passed  as  a parameter to each  procedure that needs it
     (although  it  could  be).  Typically,  it's  stored as  a global
     variable (which the SPL error label parameter couldn't be).

   *  You still have control over procedures  you call; if you want to
     trap  their jump yourself (either to  do some cleanup or treat it
     as a normal condition), you can just do your own SETJMP using the
     same buffer that they'll LONGJMP with.

   *  On the other hand, if you want to do some cleanup and then continue
     the LONGJMP process -- propagate it back up to the original error
     trapper,  in this case PROC -- you have to do more work. You must
     save  the  original  jump  buffer in a  temporary variable before
     doing  the  SETJMP, and restore it  before continuing the LONGJMP
     (or   simply   returning   from  the  procedure).  For  instance,
     PROCESS_KEYWORD might look like this:

     process_keyword (keyword_and_parm)
     char keyword_and_parm[];
     {
     jmp_buf save_error_trapper; /* declare our temporary save buffer */
     int fnum;
     fnum = fopen (key_info_file, 1);
     /* jmp_buf is an array type, so it is copied rather than assigned */
     memcpy (save_error_trapper, error_trapper, sizeof (jmp_buf));
     if (setjmp (error_trapper) != 0)
        /* Must be an error condition */
        {
        fclose (fnum, 0, 0);
        memcpy (error_trapper, save_error_trapper, sizeof (jmp_buf));
        longjmp (error_trapper, 1);
        }
     ...
     if (strcmp (keyword, "foobar") == 0)
       {
       get_subparm (0, parm_string);
       parse_int_array (parm_string, sp0_value);
       ...
       }
     ...
     fclose (fnum, 0, 0);
     memcpy (error_trapper, save_error_trapper,
             sizeof (jmp_buf));          /* restore for future use */
     }

   Frankly  speaking,  if you ask me -- and  even if you don't -- this
doesn't  look  very  clean. I'd like to  see some way of automatically
"stacking"  SETJMPs so that the system would  do the saving of the old
jump  buffer  for you; also, I'd prefer not  to have to type that ugly
"IF  (SETJMP  ...  != 0)" kludge. On the  other hand, this can be made
quite  palatable-looking  with  a  few  macros,  and it's  better than
nothing (or is it?).
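
   For anyone who wants to try the save-and-restore dance, here is a
self-contained sketch of it. The assumptions are mine: a counter
(FILES_OPEN) stands in for the file, the BAD flag forces the error,
and the jump buffer is copied with MEMCPY because "jmp_buf" is an
array type. Copying a jump buffer this way works on the common
implementations I know of, but the C standard makes no promise about
it.

```c
#include <setjmp.h>
#include <string.h>

static jmp_buf error_trapper;    /* the global buffer, as in the text */
static int files_open = 0;       /* stands in for the open file       */

static void parse_int_array (int bad)
{
   if (bad)
      longjmp (error_trapper, 1);
}

static void process_keyword (int bad)
{
   jmp_buf save_error_trapper;

   files_open++;                               /* the "FOPEN"        */
   memcpy (save_error_trapper, error_trapper, sizeof (jmp_buf));
   if (setjmp (error_trapper) != 0)
      {
      files_open--;                            /* cleanup            */
      memcpy (error_trapper, save_error_trapper, sizeof (jmp_buf));
      longjmp (error_trapper, 1);              /* continue upward    */
      }
   parse_int_array (bad);
   files_open--;                               /* normal "FCLOSE"    */
   memcpy (error_trapper, save_error_trapper, sizeof (jmp_buf));
}

/* Plays the part of PROC.  Returns 1 if an error reached the top
   level, 0 otherwise; *LEFT_OPEN gets the count of files still open. */
int run_top_level (int bad, int *left_open)
{
   files_open = 0;
   if (setjmp (error_trapper) != 0)
      {
      *left_open = files_open;
      return 1;
      }
   process_keyword (bad);
   *left_open = files_open;
   return 0;
}
```

Either way the file count comes back to zero: on the normal path
through the ordinary close, on the error path through the cleanup
branch before the jump is passed on.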


                    PASCAL/XL AND THE TRY..RECOVER

   The  authors  of PASCAL/XL -- perhaps  because they were faced with
the  non-trivial  task  of  building  a language that  MPE/XL could be
profitably  written in -- must have given this subject a great deal of
thought.  And, fortunately, they've come up with  what I think to be a
very powerful construct.

   TRY
     statement1;
     statement2;
     ...
     statementN;
   RECOVER
     recoverycode;

The behavior here is

   *  EXECUTE statement1 THROUGH statementN. IF ANY PASCAL ERROR (e.g.
     giving a bad numeric value to a READLN) OR A CALL TO THE BUILT-IN
     "ESCAPE"  PROCEDURE  OCCURS  WITHIN THESE  STATEMENTS, CONTROL IS
     TRANSFERRED  TO  recoverycode,  AND  AFTER THAT  TO THE STATEMENT
     FOLLOWING TRY..RECOVER.

   This,  as  you  see,  allows  you  to  put a  TRY..RECOVER into the
top-level  procedure (in our case, PROC or ALTFILE) and an ESCAPE call
in  any  of the called procedures  (e.g. PARSE_INTEGER) that detects a
fatal error.

   The  best  part,  though,  is  that  any  procedure  that  wants to
establish  some  sort  of  "cleanup"  code can do  this trivially! For
instance, our PROCESS_KEYWORD might say:

   PROCEDURE PROCESS_KEYWORD (VAR KEYWORD_AND_PARM: STRING);
   VAR FNUM: INTEGER;
       SAVE_ESCAPECODE: INTEGER;
   BEGIN
   FNUM:=FOPEN (KEY_INFO_FILE, 1);
   TRY
     ...
     IF KEYWORD="FOOBAR" THEN
       BEGIN
       GET_SUBPARM (0, PARM_STRING);
       PARSE_INT_ARRAY (PARM_STRING, SP0_VALUE);
       END;
     ...
     FCLOSE (FNUM, 0, 0);
   RECOVER
     BEGIN
     SAVE_ESCAPECODE:=ESCAPECODE;
     FCLOSE (FNUM, 0, 0);
     ESCAPE (SAVE_ESCAPECODE);
     END;
   END;

If any error occurs in the code between TRY and RECOVER, the BEGIN/END
in  the RECOVER part is triggered. This is now free to close the file,
or  do  whatever  else it needs to, and  then "pass the error down" by
calling ESCAPE again.

   This ESCAPE -- since it's no longer between this TRY and RECOVER --
will  activate  the previously defined TRY/RECOVER  block (say, in the
PARSE_KEYWORDS  procedure)  which might do more  cleanup and then call
ESCAPE  again.  Eventually,  the error will  percolate to the top-most
TRY/RECOVER,  which  will  just  do some work and  not call ESCAPE any
more, continuing with the rest of the program.

   In  other words, "TRY .. RECOVER"s  can be nested. In the following
piece of code

   TRY
     A;
     TRY
       B;
       TRY
         C;
       RECOVER
         R1;
       D;
     RECOVER
       R2;
     E;
   RECOVER
     R3;

   * An error or ESCAPE in C will cause a branch to R1.

   *  An error/ESCAPE in B or D will, of course, branch to R2 (since B
     and  D are outside the innermost  TRY .. RECOVER R1). However, an
     error/ESCAPE in R1 will also cause a branch to R2! That's because
     R1  is  also  out  of the area of effect  of the innermost TRY ..
     RECOVER.

     In  other words, the "recovery  handler" R1 is only "established"
     between  the  innermost TRY and the  innermost RECOVER; when it's
     actually  "triggered",  it's  disestablished,  and  the  recovery
     handler that was previously in effect is re-established.

   * By this token, an error/ESCAPE in A, E, or R2 will branch to R3.

   *  And, finally, an error in R3  -- or anywhere else outside of the
     TRY  .. RECOVER -- will actually  abort the program with an error
     message.

   As  you see, then, all is for the best in this best of all possible
worlds.  We can do long jumps "up  the stack" to the RECOVER code, but
each  intervening procedure can also easily set up "cleanup code" that
needs to be executed before the long jump can continue.

   Several notes:

   *  First  of  all, remember that the  RECOVER statement is executed
     ONLY  in case of an error or an ESCAPE. If the statements between
     TRY  and RECOVER finish normally, any "cleanup" code you may have
     inside  the  RECOVER will NOT be  executed. That's why our sample
     program  has  two FCLOSEs -- one for  the normal case and one for
     the cleanup case.

   *  Note  also that the ESCAPE call  can take a parameter (just like
     C's  LONGJMP).  This parameter is then  available as the variable
     ESCAPECODE  in the RECOVER handler, and  is used to indicate what
     kind of error or ESCAPE happened.

     A  RECOVER handler might, for instance, be used to avoid an abort
     caused  by  an  expected  error condition (e.g.  file I/O error);
     however,  if  it  sees  that  ESCAPECODE  indicates  some  other,
     unexpected,  error  condition, it might  terminate or call ESCAPE
     again,  hoping that some "higher-level"  RECOVER block can handle
     the error.

   * Finally, if a RECOVER block wants to continue the long jump after
     doing  its cleanup work, it often needs to pass the ESCAPECODE up
     as  well  (unless,  of  course, the  higher-level RECOVER handler
     won't  use  the ESCAPECODE). Unfortunately,  the PASCAL/XL manual
     explicitly tells us:

       -  "It is wise to assign  the result of the ESCAPECODE function
         to  a  local  variable immediately upon  entering the RECOVER
         part  of  a  TRY-RECOVER  construct,  because the  system can
         change that value later in the RECOVER part."

     This  is too bad; it would have  been nice to have TRY .. RECOVER
     do  this  saving for you automatically,  saving you the burden of
     having  to  declare  and  set an extra  local variable. Still, we
     oughtn't look a gift horse in the mouth.

   Note,  incidentally, how C's #define macro facility can come to our
aid  if we want to implement this same  construct in C. All we need is
three #defines:

   int escapecode;
   int jump_stack_ptr = -1;
   jmp_buf jump_stack[100];    /* the stack used to do nesting */
   #define TRY     if (setjmp(jump_stack[++jump_stack_ptr])==0) {
   #define RECOVER jump_stack_ptr--; } else
   #define ESCAPE(parm)   \
           { \
           escapecode = (parm); \
           longjmp(jump_stack[jump_stack_ptr--], 1); \
           }

This would allow us to say:

   TRY
     code;
   RECOVER
     errorhandler;

and

   ESCAPE(value);

just  like  we could in PASCAL/XL! Note  how we've added this entirely
new  control structure without any changes  to the compiler -- nothing
more complicated than a few #defines! (Many thanks to Tim Chase of CCS
for showing me how to do this!)
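
   To check that these macros really do nest the way PASCAL/XL's
TRY..RECOVER does, here is a small test harness built around the
A/B/C/D/E skeleton from the previous section. The macros are repeated
so that the sketch compiles on its own; MARK, FAIL_IF and RUN_NESTED
are hypothetical helpers that record which pieces ran and inject a
failure at a chosen point.

```c
#include <setjmp.h>

int escapecode;
int jump_stack_ptr = -1;
jmp_buf jump_stack[100];    /* the stack used to do nesting */

#define TRY     if (setjmp(jump_stack[++jump_stack_ptr])==0) {
#define RECOVER jump_stack_ptr--; } else
#define ESCAPE(parm)   \
        { \
        escapecode = (parm); \
        longjmp(jump_stack[jump_stack_ptr--], 1); \
        }

static char trace[16];      /* letters of the pieces that ran */
static int trace_len;

static void mark (char c)
{
   trace[trace_len++] = c;
   trace[trace_len] = '\0';
}

static void fail_if (int cond, int code)
{
   if (cond)
      ESCAPE (code);
}

/* Runs the nested skeleton with an ESCAPE injected in "B" (WHERE is
   'B'), in "C" (WHERE is 'C'), or nowhere (any other value).  The
   result lists the pieces executed; '1', '2', '3' are R1, R2, R3. */
const char *run_nested (char where)
{
   trace_len = 0;
   trace[0] = '\0';
   TRY
      mark ('A');
      TRY
         mark ('B');
         fail_if (where == 'B', 2);
         TRY
            mark ('C');
            fail_if (where == 'C', 3);
         RECOVER
            mark ('1');
         mark ('D');
      RECOVER
         mark ('2');
      mark ('E');
   RECOVER
      mark ('3');
   return trace;
}
```

Two cautions apply to this scheme. Leaving a TRY part by RETURN,
BREAK or GOTO skips the pop in RECOVER and unbalances the jump stack;
and a local variable that is changed inside the TRY part and examined
in the RECOVER part should be declared "volatile", since otherwise
its value after the LONGJMP is not guaranteed.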


                          NESTED PROCEDURES

   An  interesting feature of PASCAL is its ability to have procedures
nested within other procedures. In other words, I could say:

   PROCEDURE PARSE_THING (VAR THING: STRING);
   VAR CURR_PTR, CURR_DELIMS: INTEGER;
       QUOTED: BOOLEAN;
       ...

     PROCEDURE PARSE_CURR_ELEMENT (...);
     BEGIN
     ...
     END;

   BEGIN
   ...
   PARSE_CURR_ELEMENT (...);
   ...
   END;

PARSE_CURR_ELEMENT  here is just like  a local variable of PARSE_THING
--  it's a local procedure. It's callable only from within PARSE_THING
and not from any other procedure in the program. More importantly,

   *  THE  NESTED  PROCEDURE  (PARSE_CURR_ELEMENT)  CAN ACCESS  ALL OF
     PARSE_THING'S LOCAL VARIABLES.

This is a significant consideration. If PARSE_CURR_ELEMENT didn't need
to  access  PARSE_THING's  local  variables,  not  only could  it be a
different  (non-nested)  procedure, but it probably  should be. When a
procedure is entirely self-contained, it's usually a good idea to make
it accessible to as many callers as possible.

   On  the other hand, what if PARSE_CURR_ELEMENT needs to interrogate
CURR_PTR  to find out where we are in parsing the thing; or look at or
modify  CURR_DELIMS  or  QUOTED or whatever  other local variables are
relevant to the operation?

   We  don't  want  to have to pass all  these values as parameters --
there could be dozens of them.

   We  don't want to make them  global variables, since they're really
only  relevant  to  PARSE_THING  -- why make  them accessible by other
procedures  that  have  no business messing  with them? (Incidentally,
making the variables global will also prevent PARSE_THING from calling
itself recursively.)

   But,   on   the   other   hand,   we  certainly  DO  want  to  have
PARSE_CURR_ELEMENT  be a procedure -- after all, we might need to call
it  many times from within PARSE_THING; surely we don't want to repeat
the code every time!

   Thus,  the  main  advantage of nested procedures  is not just that,
like  local  variables,  they  can  only be accessed  by the "nester".
Rather,  the  advantage  is the fact that  they can share the nester's
local  variables,  which  are often quite relevant  to what the nested
procedure is supposed to do.

   Another  substantial  benefit  comes  when  you pass  procedures as
parameters  to  other  procedures.  A good example of  this might be a
report writer procedure:

   TYPE LINE_TYPE = PACKED ARRAY [1..256] OF CHAR;
   PROCEDURE PRINT_LINE (VAR LINE: LINE_TYPE;
                         LINE_LEN: INTEGER;
                         PROCEDURE PAGE_HEADER (PAGENUM: INTEGER);
                         PROCEDURE PAGE_FOOTER (PAGENUM: INTEGER));

This  procedure  takes the line to be  output and its length, but also
takes  two procedures -- one that will be called in case a page header
should be printed and one in case a page footer should be printed. The
utility  of  this is obvious -- it gives  the user the power to define
his own header and footer format.

   Now, let's say we have the following procedure:

   PROCEDURE PRINT_CUST_REPORT (VAR CATEGORY: INTEGER);
   VAR CURRENT_COUNTRY: PACKED ARRAY [1..40] OF CHAR;
   ...
   BEGIN
   ...
   PRINT_LINE (OUT_LINE, OUT_LINE_LEN,
               MY_PAGE_HEAD_PROC, MY_PAGE_FOOT_PROC);
   ...
   END;

PRINT_LINE   will   output   OUT_LINE   and,   in   some  cases,  call
MY_PAGE_HEAD_PROC or MY_PAGE_FOOT_PROC. Now, it makes sense for you to
want  these  procedures  to print, say, the  current value of CATEGORY
and, perhaps, CURRENT_COUNTRY.

   In  C  and  SPL,  which  have  no  nested  procedures,  both
MY_PAGE_HEAD_PROC  and  MY_PAGE_FOOT_PROC  would  have to  be separate
procedures   which   have   no  access  to  PRINT_CUST_REPORT's  local
variables.

   The  variables  would  either  have  to  be global  (which is quite
undesirable)  or would somehow have to  be passed to PRINT_LINE, which
in turn would pass them to the MY_PAGE_xxx_PROC procedures.

   This  would  be  quite  cumbersome, since  in PRINT_CUST_REPORT the
header and footer procedures need to be passed an integer and a PACKED
ARRAY  OF  CHAR, whereas in some  other application of PRINT_LINE they
would have to be passed, say, three floats and a record structure.

   In   PASCAL,   on   the  other  hand,  both  MY_PAGE_HEAD_PROC  and
MY_PAGE_FOOT_PROC can be nested within PRINT_CUST_REPORT and thus have
access  to  CATEGORY  and  CURRENT_COUNTRY  (and  all the  other local
variables   of   the   PRINT_CUST_REPORT  procedure).  Another  useful
application for nested procedures.
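
   For  comparison, the usual C workaround for the problem described
above  is to give PRINT_LINE one extra untyped "context" pointer, which
it  hands  back  to the header and footer routines without looking at
it.  What  follows  is  only a sketch: print_line, cust_context, the
two  my_page_xxx  routines, and the three-line page are all names and
numbers I made up for the illustration.

```c
#include <stdio.h>

#define LINES_PER_PAGE 3   /* tiny page, just for the illustration */

/* A header or footer routine gets the page number plus an opaque
   pointer that print_line passes through without examining it. */
typedef void (*page_proc)(int page, void *context);

static int lines_on_page = 0;
static int pagenum = 1;

void print_line(const char *line, page_proc header, page_proc footer,
                void *context)
{
    if (lines_on_page == 0)
        header(pagenum, context);
    printf("%s\n", line);
    if (++lines_on_page == LINES_PER_PAGE) {
        footer(pagenum, context);
        lines_on_page = 0;
        pagenum++;
    }
}

/* Everything print_cust_report wants its header and footer to see. */
struct cust_context {
    int category;
    char current_country[41];
    int headers_printed;   /* counts calls, so the effect is visible */
};

void my_page_head(int page, void *context)
{
    struct cust_context *c = context;
    printf("Page %d  Category %d  Country %s\n",
           page, c->category, c->current_country);
    c->headers_printed++;
}

void my_page_foot(int page, void *context)
{
    (void)context;         /* this footer needs only the page number */
    printf("-- end of page %d --\n", page);
}
```

The  caller  fills  in a struct cust_context and passes its address as
the  last  argument of every print_line call. It works, but note that
the  compiler  cannot  check that the pointer really points to what the
header routine expects.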


   C,  as I mentioned, has no nested  procedure support at all. On the
other  hand,  it  does  have  #DEFINEs (see the section on DEFINES),
which allow you to define text substitutions that can often do the job
of  a nested procedure, especially if  it's a small one. For instance,
you can say:

   #define foo(x,y) \
   { \
   int a, b;         /* variables local to THIS DEFINE */ \
   a = x + parm1;    /* access a variable local to the procedure */ \
   b = y * parm2;    /* (the nesting procedure) */ \
   x = a + b; \
   y = a * b; \
   }

As  you  can  see,  C's  support for "block-local"  variables -- local
variables  that are local not just to the procedure, but rather to the
"{"/"}"  block in which they're defined -- allows you to have #DEFINEs
that are almost as powerful as real procedures.
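
   To  make  the  mechanics concrete, here is a small sketch of such a
#DEFINE  being  used  inside a procedure. The procedure name "nester"
and  the  variables p and q are made up for the illustration; parm1 and
parm2  are  simply whatever variables of those names are visible at the
point of use.

```c
/* foo is expanded in place, so parm1 and parm2 refer to whatever
   variables of those names are visible where foo is used. */
#define foo(x, y)                                          \
    {                                                      \
        int a, b;           /* local to this expansion */  \
        a = (x) + parm1;    /* the nester's variable */    \
        b = (y) * parm2;                                   \
        (x) = a + b;                                       \
        (y) = a * b;                                       \
    }

int nester(int parm1, int parm2)
{
    int p = 1, q = 2;
    foo(p, q)   /* no semicolon needed: the expansion is a { } block */
    return p + q;
}
```

One  caveat:  since this is pure text substitution, a caller that has
its  own  variables  called  a or b and passes them to foo will find
them hidden by the block-local a and b.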

   SPL  allows you to have "SUBROUTINE"s nested within procedures, but
subject to some rather stringent restrictions:

   * The subroutines can have no local variables of their own. This is
     a  pretty  severe  problem,  since  it means that  all your local
     variables  have  to  be declared in  the nesting procedure, which
     increases  the  likelihood of errors and  also prohibits you from
     calling  the subroutine recursively (which you would otherwise be
     able to do).

   *  The  subroutines  can  not be passed  as procedure parameters to
     other procedures (only procedures can be -- try parsing that!).

   *  Furthermore, this nesting capability goes to only one level; you
     can  nest SUBROUTINEs in PROCEDUREs,  but you can't nest anything
     within  SUBROUTINEs.  In PASCAL, procedures  can be nested within
     each  other  to an arbitrary number  of levels. Frankly speaking,
     I'm  hard  put  to  think  of  an  application  for triply-nested
     procedures.

   Practically,  you'll  have to decide  for yourself whether PASCAL's
nested procedure support -- and C's lack of it -- is important to you.
I  brought  this issue up to a C  partisan, and she replied that she's
simply  never  run  into a case where  nested procedures were all that
important.  Upon thinking about this, I  found myself forced to agree,
at least partially:

   * #DEFINEs can do much of the job that nested procedures are needed
     for;

   *  Many  procedures  should NOT be nested, but rather be made
     self-contained  and made available to  the world at large (rather
     than just to a particular procedure).

   *  If the reason you don't want to declare your variables as global
     is that you want to "hide" them from other procedures, you can do
     this  in C by making them "static". This will make them available
     only to the procedures in the file in which they're defined. This
     allows  you  to  share  data between procedures  (which you might
     otherwise  have wanted to nest  within each other) without making
     the data readable and modifiable by everybody.

   *  On  the  other hand, there's no denying  that there are cases in
     which  PASCAL's nested procedures are quite a bit superior to any
     C  or SPL alternative. For  instance, a recursive procedure might
     well  not be able to use  the "static global variable" approach I
     just mentioned.
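
   The  "static"  approach  from the list above might look roughly like
this  in C. It is a sketch only: the parsing rule (comma-separated
elements,  with  commas  inside  double quotes not counting) is one I
invented to give the two procedures something to share.

```c
/* Visible only inside this source file; other files cannot name them. */
static const char *curr_ptr;
static int quoted;

/* The helper that would have been a nested procedure in PASCAL.
   Advances curr_ptr past one element and returns its length. */
static int parse_curr_element(void)
{
    int len = 0;
    while (*curr_ptr != '\0' && (quoted || *curr_ptr != ',')) {
        if (*curr_ptr == '"')
            quoted = !quoted;
        curr_ptr++;
        len++;
    }
    if (*curr_ptr == ',')
        curr_ptr++;            /* step over the delimiter */
    return len;
}

/* Returns the number of elements found in "thing". */
int parse_thing(const char *thing)
{
    int count = 0;
    curr_ptr = thing;
    quoted = 0;
    while (*curr_ptr != '\0') {
        parse_curr_element();
        count++;
    }
    return count;
}
```

Since  there  is only one copy of curr_ptr and quoted, a parse_thing
that  called  itself  recursively would trample its own state, which is
exactly the limitation mentioned in the last item above.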


                              DATA TYPES

   The  difference  most  often cited between PASCAL  and C is the way
that  they treat data types. PASCAL is often considered a "strict type
checking"  language and C a "loose type checking" language, and that's
true enough. However, the effects of this philosophical difference are
subtler and more pervasive than at first glance appears.

   What  are  data  types?  Data types can be  seen in the earliest of
languages, from FORTRAN and COBOL onwards. When you declare a variable
to  be  a  certain  data  type,  you  give certain  information to the
compiler -- information that the compiler must have to produce correct
code. Historically, this information has included:


   *  What the various operators of  the language MEAN when applied to
     the  variable.  "+", for instance, isn't  just "addition" -- when
     you add two integers, it's integer addition, and when you add two
     reals,  it's  real  addition. Two  entirely different operations,
     with  entirely different machine  language opcodes and (possibly)
     different  effects  on  the  system  state. Similarly,  a FORTRAN
     "DISPLAY X" means:

       - If X is a string, print it verbatim;

       - If X is an integer, print its ASCII representation;

       -  If  X  is  a real, print its  ASCII representation, but in a
         different format and with a different conversion mechanism.


   *  How much SPACE is to be allocated for the variable. "Array of 20
     integers" is a type, too, one from which the compiler can exactly
     deduce  how  much memory (20 words) needs  to be allocated to fit
     this data.

If  you look at SPL (and,  incidentally, FORTRAN and other languages),
you'll  find  that  all  of  its type declarations  essentially aim at
serving  these  two  functions. However, in recent  times, a few other
functions have been ascribed to type declarations:

   *  Using type declarations, the compiler can DETECT ERRORS that you
     may  make.  The  compiler  can't,  of course, figure  out if your
     program  does  "the  right thing" since it  doesn't know what the
     right  thing  is;  however, it can see  if there are any internal
     inconsistencies in your program.

     For instance, if you're multiplying two strings, the compiler can
     tag  that  as  an obvious error; similarly,  if you pass a string
     parameter to a procedure that expects an integer (or vice versa),
     a  good  compiler  will find this and save  you a lot of run-time
     debugging. The more elaborate and precise the type specifications
     you give, the more error checking the compiler can do.

     Error  checking can also be provided at run time, where code that
     knows  what size arrays are, for instance, can make sure that you
     don't inadvertently index outside them. PASCAL's "subrange types"
     do  this sort of thing, too,  allowing you to declare what values
     (e.g.  "0  to  100") a variable may  take and triggering an error
     when you try assigning it an invalid value.

   *   Furthermore,   with   a  type  declaration,  the  compiler  can
     SAVE WORK for you by automatically defining special tools for the
     given type.

     The  classic  example  of  this  is  the  record structure  -- by
     declaring  the structure, you're automatically  defining a set of
     "operators"  (one for each field of the structure) that allow you
     to  easily access the structure.  Similarly, enumerated types can
     save  you  the  burden  of  having to  manually allocate distinct
     values  for each of the  elements in the enumeration (admittedly,
     not a very large burden).

     Some  fancy  compilers  can  even  automatically  define  "print"
     operations for each record structure, so that you can easily dump
     it  in  a legible format to the  terminal without having to print
     each element individually.

   *  Good  type  handling provisions can  INSULATE YOUR PROGRAMS FROM
     CHANGES  IN YOUR DATA'S INTERNAL REPRESENTATION. For instance, if
     the  compiler allows you to refer to a field of a record as, say,
     "CUSTREC.NAME"  instead  of  "CUSTREC(20)",  then you  can easily
     reformat  the insides of the  record (adding new fields, changing
     field  sizes,  etc.)  without  having  to change  all places that
     reference this record.

     Similarly,  if  your language allows  functions to return records
     and  arrays as well as scalars, you can easily change the type of
     your,  say,  disc  addresses  from  a 2-word double  integer to a
     10-word  array  of integers. In SPL,  for instance, such a change
     would  require rewriting all procedures  that want to return such
     objects or to take them as "by-value" parameters. Even changing a
     value from an "integer" to a "double integer" in SPL will require
     you to change a great deal of code.
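
   As  a  rough  illustration of the last point, here is how the disc
address  case  might  look  in C, assuming a compiler that supports
structure  assignment  and  structure-valued functions. The names
disc_addr,  make_addr,  next_addr, and same_addr are hypothetical;
only the typedef knows how an address is actually stored.

```c
/* The one place that knows how a disc address is represented.
   Replacing "long sector" with, say, an array of 10 integers would
   mean changing only this typedef and the three routines below. */
typedef struct {
    long sector;
} disc_addr;

disc_addr make_addr(long sector)
{
    disc_addr a;
    a.sector = sector;
    return a;                    /* returned by value */
}

disc_addr next_addr(disc_addr a) /* taken and returned by value */
{
    a.sector += 1;
    return a;
}

int same_addr(disc_addr a, disc_addr b)
{
    return a.sector == b.sector;
}

/* Caller code never mentions the representation. */
int demo(void)
{
    disc_addr first = make_addr(100);
    disc_addr second = next_addr(first);
    return !same_addr(first, second) &&
           same_addr(second, next_addr(first));
}
```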


   The  reason  I've given this list is  that SPL, PASCAL, and C place
different  weights on each of these  points, and this makes for rather
substantial differences in the way you use these languages.

   Now, away from the generalities and on to concrete examples.


                          RECORD STRUCTURES

   Consider  for  a  moment  an entry in your  "employee" data set. It
could  be a file label; it could  be a Process Control Block entry; it
could  be any chunk of memory  that contains various fields of various
data types.

   A  typical  layout  of  this employee entry  (or employee "record")
might be:

   Words 0 through 14 - The employee name (a 30-character string);
   Words 15-19 - Social security number (10-character string);
   Words 20-21 - Birthday (a double integer, just to be interesting);
   Words 22-23 - Monthly salary (a real number).

A  simple record. It's 24 words long, but it's not really an "array of
24  words";  logically  speaking, to you and  me, it's a collection of
four  objects, each of a different  type, each starting at a different
(but constant) offset within the record.

   How  do  we declare a variable to  hold this record? In FORTRAN and
SPL, it's easy:

   INTEGER ARRAY EMPREC(0:23);
     or
   INTEGER EMPREC(24)

Short  and sweet. The compiler's happy --  it knows that it's an array
of integers, which means you can extract an arbitrary element from it,
and  pass  it  to  a  procedure  (like DBGET), which  will receive its
address  as  an  integer  pointer.  This  defines to  the compiler the
MEANING  of the "indexing" and "pass to procedure" operations that can
be  done  on  EMPREC.  Also, the compiler knows  that 24 words must be
allocated for this array, as a global or local variable.

   The compiler is happy, but are you? First of all, how are you going
to access the various elements of this record structure? Are you going
to say

   EMPREC(20)

when  you mean the employee's birthday  (actually, since it's a double
integer, you couldn't even do that)?

   What  about error checking? Since all the compiler knows about this
is that it's an integer array, it'll be happy as punch to allow you to
put  it anywhere an integer array can go. Would you like to pass it as
the  database  name to DBGET instead of  as the buffer variable? Fine.
Would  you like to view it as a 4 by 5 matrix and multiply it by, say,
the department record? The computer will gladly oblige.

   Finally,  consider the burden this places  on you whenever you want
to  change  the layout of EMPREC -- say,  to increase the name from 30
characters  to  40.  You'll  have to change  all your "EMPREC(20)"s to
"EMPREC(25)",  all your "INTEGER ARRAY EMPREC(0:23)" to "INTEGER ARRAY
EMPREC (0:28)". And, of course, if you forget one or the other -- why,
the compiler will be happy to extract the first two words of the social
security number and treat them as the employee's birthday!

   Of  course,  you're  not  going to do this.  You will certainly not
refer  to  all  the elements of the  record structure by their numeric
array  indices (although it so happens that most of HP's MPE code does
exactly  this). Rather, you'll say (of course, in SPL, you can also do
the same thing with DEFINEs):

   EQUATE SIZE'EMPREC = 24;
   BYTE ARRAY EMP'NAME          (*) = EMPREC(0);
   BYTE ARRAY EMP'SSN           (*) = EMPREC(15);
   DOUBLE ARRAY EMP'BIRTHDATE   (*) = EMPREC(20);
   REAL ARRAY EMP'SALARY        (*) = EMPREC(22);
   [Note: The fact that we define, say, EMP'BIRTHDATE and
   EMP'SALARY as arrays isn't a problem.  If we say EMP'SALARY
   with no subscript, it'll refer to the 0th element of this
   "array", which is exactly what we want it to do.]

   FORTRAN  is  similar  (you'd  use  an EQUIVALENCE); COBOL  is a bit
simpler,  allowing  you  to  say (remembering that  COBOL doesn't have
REALs):

   01 EMPREC.
      05 NAME          PIC X(30).
      05 SSN           PIC X(10).
      05 BIRTHDATE     PIC S9(9) COMP.
      05 SALARY        PIC S9(5)V9(2) COMP-3.

As  you  see,  COBOL at least has  the advantage that it automatically
calculates  the  indexes  of  each  subfield  for  you. This  is nice,
especially  when  you  change  the structure,  reshuffling, inserting,
deleting,  or resizing fields. On the other hand, I wouldn't call this
a  very  substantial  feature, especially since  sometimes you WANT to
manually  specify the field offsets  (whenever the record structure is
not under your control, like, say, an MPE file label).

   To  summarize,  this "EQUIVALENCE"ing approach  that's available in
SPL,  FORTRAN, and COBOL saves you from the very substantial bother of
having to hardcode the offsets of all the subfields into your program.
This is certainly a good thing; however, PASCAL and C go substantially
beyond this.

   The  most serious problem with  what I'll call the "EQUIVALENCE"ing
approach  is a rather subtle one, one  that I didn't realize until I'd
used it for some time.

   The  definitions  we  saw  above  --  in SPL, FORTRAN,  or COBOL --
defined  several variables as subfields  of another variable. EMP'NAME
and  EMP'SSN are subfields of EMPREC. What  if we need to declare this
EMPREC twice -- say, in two different procedures?

   Clearly  we  don't want to have to  repeat the EQUIVALENCEs in each
procedure.  Yet what choice do we have? We might, for instance, set up
each  of  the subfields as a DEFINE  instead of an equivalence, making
the DEFINEs available in all the procedures that reference EMPREC:

   DEFINE EMP'NAME          = EMPREC(0) #;
   DEFINE EMP'SSN           = EMPREC(15) #;
   DEFINE EMP'BIRTHDATE     = EMPREC(20) #;
   DEFINE EMP'SALARY        = EMPREC(22) #;

but then, since DEFINEs are merely text substitutions and EMPREC is an
integer  array, each EMP'xxx will also  be an integer array. We'd have
to say

   BYTE ARRAY EMPREC'B(*)=EMPREC;
   DOUBLE ARRAY EMPREC'D(*)=EMPREC;
   REAL ARRAY EMPREC'R(*)=EMPREC;

in each procedure that defines an EMPREC array, and a

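   DEFINE EMP'NAME          = EMPREC'B(0) #;
   DEFINE EMP'SSN           = EMPREC'B(30) #;
   DEFINE EMP'BIRTHDATE     = EMPREC'D(10) #;
   DEFINE EMP'SALARY        = EMPREC'R(11) #;

at  the  beginning  of  the program (each index now counts elements of
the  array's  own type -- bytes, double words, or reals -- rather than
words). Still, we'd  have had to have the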
defines  of the BYTE ARRAY, DOUBLE ARRAY, and REAL ARRAY repeated once
for  each  declaration  of  EMPREC;  and, what if we  want to call the
record  something  else,  like  have  two  records called  EMPREC1 and
EMPREC2?

   *  THE PROBLEM WITH DEFINING SUBFIELDS  OF A RECORD STRUCTURE USING
     THE  "EQUIVALENCING" APPROACH IS THAT IT DEFINES THE SUBFIELDS OF
     ONLY ONE RECORD STRUCTURE VARIABLE.

     WHAT  WE WANT IS TO DEFINE A GENERALIZED "TEMPLATE" ONCE AND THEN
     APPLY THIS TEMPLATE TO EACH RECORD STRUCTURE VARIABLE WE USE.

In other words, we want to be able to say

   DEFINE'TYPE EMPLOYEE'REC (SIZE 24)
     BEGIN
     BYTE ARRAY NAME          (*) = RECORD(0);
     BYTE ARRAY SSN           (*) = RECORD(15);
     DOUBLE ARRAY BIRTHDATE   (*) = RECORD(20);
     REAL ARRAY SALARY        (*) = RECORD(22);
     END;

and then declare any particular employee record buffer by saying:

   EMPLOYEE'REC EMPREC1;
   EMPLOYEE'REC EMPREC2;

Then,  we  could  extract each individual subfield  of the record like
this:

   NEW'SALARY := EMPREC1.SALARY * 1.1;

The point here is that

   * IN ADDITION TO NOT HAVING TO EXPLICITLY SPECIFY THE OFFSET OF THE
     SUBFIELD  OF THE RECORD (like having  to say RECORD(22), an awful
     thing  to  do),  WE  CAN  NOW  DEFINE  THE  LAYOUT OF  THE RECORD
     STRUCTURE  ONCE,  REGARDLESS  OF  HOW  MANY  VARIABLES  WITH THAT
     STRUCTURE WE WANT TO DECLARE.

Do you see how nicely this dovetails with the "INSULATING YOUR PROGRAM
FROM  CHANGING  INTERNAL REPRESENTATION" principle  we gave above? The
record structure layout is defined in EXACTLY ONE PLACE in the program
file.  We can have a hundred different  variables of this type -- none
of  them  will have to specify the physical  size of the buffer or the
offsets  of the subfields. Each one will merely refer back to the type
declaration.

   Also,  we've now announced EMPREC1 to  the compiler as being of the
special  "EMPLOYEE'REC"  type. It's no longer  a simple INTEGER ARRAY,
just  like  any  other  integer  array.  Conceivably, if  we declare a
procedure to be

   PROCEDURE PUT'EMPLOYEE (DBNAME, EMPREC, FRAMASTAT);
   INTEGER ARRAY DBNAME;
   EMPLOYEE'REC EMPREC;
   INTEGER FRAMASTAT;
   ...

the compiler can warn us that

   EMPLOYEE'REC EMPREC;
   INTEGER ARRAY DBNAME;
   INTEGER FOOBAR;
   ...
   PUT'EMPLOYEE (EMPREC, DBNAME, FOOBAR);

is  an invalid call -- it sees  that an object of type EMPLOYEE'REC is
being  passed  in  place of an INTEGER ARRAY,  and an INTEGER ARRAY is
being passed in place of an EMPLOYEE'REC. Without this error checking,
you'd  have  to  find this problem yourself  at run-time, a distinctly
more difficult task.


                  RECORD STRUCTURES IN PASCAL AND C

   What I just gave is the rationale for record structures, mostly for
the  benefit of SPL programmers who  haven't used PASCAL and C before.
Of  course,  the  only  reason I gave it is  that PASCAL and C do have
record  structure support, remarkably similar  support at that. Here's
the way you declare a structure data type in PASCAL:

   { "PACKED ARRAY OF CHAR"s are PASCAL strings }
   TYPE EMP_RECORD = RECORD
                     NAME: PACKED ARRAY [1..30] OF CHAR;
                     SSN: PACKED ARRAY [1..10] OF CHAR;
                     BIRTHDATE: INTEGER;  { really a double integer }
                     SALARY: REAL;
                     END;
   ...
   VAR
     EMPREC: EMP_RECORD;   { declare a variable called "EMPREC" }

And in C:

   typedef
     struct {char name[30];
             char ssn[10];
             long int birthdate;
             float salary;
            }
     emp_record;
   ...
   emp_record emprec;   /* declare a variable called "emprec" */

You  can  see  the  minor differences -- the  type names are different
("float"  instead  of "REAL", "long int"  to mean double integer); the
type name comes at the end of the "typedef"; the newly defined type is
used in a "statement" all its own rather than as part of a VAR statement;
and,   of  course,  everything's  written  in  those  CUTE  lower-case
characters.  In  essence,  of  course,  the constructs  are absolutely
identical.

   The use is identical, as well:

   NEW_SALARY := EMPREC.SALARY * 1.1;
   new_salary = emprec.salary * 1.1;
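
   Here  is  a slightly fuller C sketch of the same idea, showing that
the  compiler,  not the programmer, works out the field offsets, and
that  one  procedure  serves  any variable of the type. The function
raised_salary  is  my own invention, and I've used "int32_t" instead
of  "long int" only so that the birthdate has a known 32-bit size on
whatever machine you try this on.

```c
#include <stddef.h>   /* offsetof */
#include <stdint.h>   /* int32_t */

typedef struct {
    char    name[30];
    char    ssn[10];
    int32_t birthdate;    /* fixed 32-bit width, for the illustration */
    float   salary;
} emp_record;

/* Works on any emp_record; no field offset is ever written out.
   "e->salary" means "the salary field of the record e points to". */
float raised_salary(const emp_record *e, float factor)
{
    return e->salary * factor;
}

/* The compiler computes the offsets from the declaration. */
size_t ssn_offset(void)
{
    return offsetof(emp_record, ssn);
}
```

If  the  name  field  grows  from 30 to 40 characters, only the typedef
changes; raised_salary and every caller are merely recompiled.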

Incidentally,  if we didn't want to define a new type, but rather just
wanted  to  define  one  variable of a given  structure, we could have
said:

   VAR EMPREC: RECORD
               NAME: PACKED ARRAY [1..30] OF CHAR;
               SSN: PACKED ARRAY [1..10] OF CHAR;
               BIRTHDATE: INTEGER;  { really a double integer }
               SALARY: REAL;
               END;

   struct {char name[30];
           char ssn[10];
           long int birthdate;
           float salary;
          }
     emprec;

Note  how the type declaration is very much like the original variable
declaration.

   So,  declaring  and using record structures  is identical in PASCAL
and C. However, there's a VERY BIG DIFFERENCE between PASCAL and C.

   *  In  PASCAL, strict type checking is  more than just a good idea,
     it's the LAW.

     If  a  function  parameter is declared  as type EMP_RECORD,  any
     function  call to it must pass an object of that type. Even if it
     passes  a  record structure that's defined  with exactly the same
     fields  but  with  a  different  type  name  (admittedly  a  rare
     occurrence), the compiler will cough.

     Any structure parameter must be of EXACTLY THE RIGHT TYPE.


   *  Many  C  programmers view strict type checking  much as you or I
     might  view,  say, the Gestapo or the  KGB. Kernighan & Ritchie C
     compilers DO NOT do type checking.

     In  fact, in Kernighan & Ritchie C, you can pass a string where a
     real  number is expected, and the  compiler won't say a word! (On
     the other hand, your program is unlikely to work right.)


I could fault C for this, treating C's lack of type checking much as I
do,  say,  SPL's  lack of an easy I/O  facility. The trouble is that C
programmers  don't  think  that  lack of type checking  is a bug; they
think  it's  a  feature. The problem is  philosophical -- what are the
benefits of type checking and do they outweigh the drawbacks?


      TYPE CHECKING -- ORIGINAL STANDARD PASCAL AND PASCAL/3000

   Earlier  in the paper I brought  up a certain point. Compilers that
know  the type of variables can, I  said, check your code to make sure
that you're not using types inconsistently.

   For  instance,  if  you use a character when  you should be using a
real  number, that's an "obvious error" and  the compiler can do you a
favor  by  complaining  at  compile-time.  Similarly,  if you  pass an
employee  record  to a procedure that  expects a database name, that's
also an error, and should also be reported.

   Now,  this  principle  is  in many ways at  the heart of the PASCAL
language.  And,  certainly, everyone will agree  that it would be good
for the compiler to find errors in your program rather than making you
do it yourself. The question is --

   IS A COMPILER WISE ENOUGH TO DETERMINE WHAT IS AN ERROR AND WHAT IS
   NOT?

   For instance, say you write

   VAR CH: INTEGER;
   IF ('a'<=CH) AND (CH<='z') THEN
     CH:=CH-'a'+'A';

Utterly  awful!  We have what -- to PASCAL,  at least -- are no fewer than 4
type  inconsistencies; we're comparing an  integer against a character
two  times,  and  then  we're  adding  and subtracting  characters and
integers! Obviously an error.

   Actually,  of  course,  this  code takes CH, which  it assumes is a
character's  ASCII  code,  and  upshifts it. If it  finds that CH is a
lower  case character, it shifts it  into the upper case character set
by subtracting 'a' and adding 'A'.

   Some  might complain that this code  is not portable (it won't, for
instance,  work  on  EBCDIC  machines),  but that's  not relevant. The
programmer  has a perfect right to assume that the code will run on an
ASCII machine; you mustn't ram portability down his throat. Sometimes,
it's  very useful to be able to, say, treat characters as integers and
vice versa.

   Now,  before anybody accuses me of  slandering PASCAL, I must point
out  that  the  solution  is  readily available. PASCAL  can convert a
character  to an integer using the "ORD" function, and an integer to a
character using "CHR"; our code could easily be re-written:

   VAR CH: INTEGER;
   IF (ORD('a')<=CH) AND (CH<=ORD('z')) THEN
     CH:=CH-ORD('a')+ORD('A');

The  important  point  here  is  not  whether  or not  you can upshift
characters; the important fact is that:

   *  SOMETIMES  A  PROGRAMMER MAY CONSCIOUSLY WANT  TO DO THINGS THAT
     MIGHT USUALLY BE VIEWED AS TYPE INCOMPATIBILITIES.
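
   For  comparison,  C  treats  character constants as small integers,
so  the  upshift code goes through without any conversion functions.
A minimal sketch, making the same ASCII assumption as above (the name
"upshift" is mine):

```c
/* Assumes a character set in which 'a'..'z' are contiguous, as in
   ASCII; like the PASCAL version, it is not correct on EBCDIC. */
int upshift(int ch)
{
    if ('a' <= ch && ch <= 'z')
        ch = ch - 'a' + 'A';
    return ch;
}
```

(The  standard C library's toupper, declared in <ctype.h>, does the
same job without assuming a particular character set.)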

   Consider, for a moment, the following application:

   * You want to write a procedure that adds a record to the database.
     Unlike  DBPUT,  this one should just  take the database name, the
     dataset  name,  and  the  buffer,  and do all  the error checking
     itself.

Sounds simple, no? You write:

   TYPE TDATABASE = PACKED ARRAY [1..30] OF CHAR;
        TDATASET = PACKED ARRAY [1..16] OF CHAR;
        TRECORD = ???;
   ...
   PROCEDURE PUT_REC (VAR DB: TDATABASE;
                      S: TDATASET;
                      VAR REC: TRECORD);
   BEGIN
   ...
   END;

BUT HOW DO YOU DEFINE "TRECORD"?

   Remember  why I said that type  checking is such a wonderful thing.
After  all, if a procedure expects a "customer record" and you pass it
an "employee record", you want the compiler to complain.

   But what if the procedure expects ANY kind of record? What if it'll
be  perfectly  HAPPY  to  take  an employee record,  a sales record, a
database name, or a 10 x 10 real matrix? How should the compiler react
then?

   Unfortunately,  PASCAL,  with all its  sophisticated type checking,
falls  flat  on  its  face  (this is true of  both Standard PASCAL and
PASCAL/3000).

   At  this point, in the interest  of fairness (and for the practical
use  of  those  who  HAVE to do this sort  of thing in PASCAL), I must
point  out  that  PASCAL  does have a  mechanism for supporting record
structures  of  different  types.  The  trick  is to  use a degenerate
variation  of  the  record  structure called the  "tagless variant" or
"union"  structure. It's quite similar  to EQUIVALENCE in FORTRAN, but
even uglier.

   To put it briefly, you have to say the following:

   TYPE TANY_RECORD =
        RECORD
          CASE 1..5 OF
            1: (EMP_CASE: TEMPLOYEE_RECORD);
            2: (CUST_CASE: TCUSTOMER_RECORD);
            3: (VENDOR_CASE: TVENDOR_RECORD);
            4: (INV_CASE: TINVOICE_RECORD);
            5: (DEPT_CASE: TDEPARTMENT_RECORD);
        END;

This defines the type "TANY_RECORD" to be a record structure which can
be looked at in one of FIVE different ways:

   *   As  having  one  field  called  "EMP_CASE"  which  is  of  type
     "TEMPLOYEE_RECORD".

   *  As  having  one  field  called  "CUST_CASE"  which  is  of  type
     "TCUSTOMER_RECORD".

   *  Or,  as  having  one field called  "VENDOR_CASE", "INV_CASE", or
     "DEPT_CASE",     which     is     of    type    "TVENDOR_RECORD",
     "TINVOICE_RECORD", or "TDEPARTMENT_RECORD", respectively. You get
     the idea.

If  you  declare a variable of  type "TANY_RECORD", it'll be allocated
with enough room for the largest of the component datatypes. Then, you
can  make  the variable "look" like any  one of these records by using
the appropriate subfield:

   VAR R: TANY_RECORD;
   ...
   WRITELN (R.EMP_CASE.NAME);   { views R as an employee record }
   WRITELN (R.DEPT_CASE.DEPTHEAD);  { views R as a dept record }
   WRITELN (R.INV_CASE.AMOUNT);   { views R as an invoice record }

In  other  words,  an  object  of  type  TANY_RECORD is  actually five
different record structures "equivalenced" together; which one you get
depends on which ".xxx_CASE" subfield you use.
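
   C  has  the  same  construct  under the name "union". The sketch
below  mirrors three of the five cases; the record layouts and the
set_dept_head routine are invented purely for the illustration.

```c
#include <string.h>

typedef struct { char name[30]; float salary; } employee_record;
typedef struct { char deptname[20]; char depthead[30]; } department_record;
typedef struct { long number; double amount; } invoice_record;

/* One chunk of storage, big enough for the largest member, which
   can be viewed as any one of the three record types. */
typedef union {
    employee_record   emp_case;
    department_record dept_case;
    invoice_record    inv_case;
} any_record;

/* Views r as a department record and stores the head's name,
   truncating it if it does not fit. */
void set_dept_head(any_record *r, const char *head)
{
    size_t max = sizeof r->dept_case.depthead - 1;
    strncpy(r->dept_case.depthead, head, max);
    r->dept_case.depthead[max] = '\0';
}
```

As  in  PASCAL,  nothing records which view was stored last; reading
emp_case after writing dept_case just reinterprets the same bytes.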

   Got  all  that?  Now,  here's  how you define  and call the PUT_REC
procedure:

   PROCEDURE PUT_REC (VAR DB: TDATABASE;
                      S: TDATASET;
                      VAR REC: TANY_RECORD);
   BEGIN
   ...
   END;
   ...
   { now, all dataset records you need to pass must be declared to }
   { be of type TANY_RECORD. }
   READLN (R.EMP_CASE.NAME, R.EMP_CASE.SSN);
   R.EMP_CASE.BIRTHDATE := 022968;
   R.EMP_CASE.SALARY := MINIMUM_WAGE - 1.00;
   PUT_REC (MY_DB, EMP_DATASET, R);

You  must  declare ALL YOUR DATASET RECORDS  to be of type TANY_RECORD
(wasting  space  if,  say,  TDEPARTMENT_RECORD  is  10 bytes  long and
TINVOICE_RECORD  is  200 bytes long); you must  refer to them with the
appropriate  ".xxx_CASE" subfield; then, you must pass the TANY_RECORD
to  PUT_REC.  (Alternatively, you may have one "working area" record of
type  TANY_RECORD  and  move the record you  want into the appropriate
subfield of this "working area" record before calling PUT_REC.)

   As  you  may  have guessed, I think this  is a very poor workaround
indeed:

   * You need to specify in the TANY_RECORD declaration every possible
     type that you'll ever want to pass to PUT_REC;

   *  You have to declare any record you want to pass to PUT_REC to be
     of type TANY_RECORD, even if it wastes space.

   *  If  you  don't want to use a  "working area" record, you have to
     refer to all your records as "R.EMP_CASE" or "R.DEPT_CASE" rather
     than  just defining R as the appropriate type and referring to it
     just as "R".

   * If you do use a "working area" record, to wit:

       VAR WORK_RECORD: TANY_RECORD;
           EMP_REC: TEMPLOYEE_RECORD;
       ...
       READLN (EMP_REC.NAME, EMP_REC.SSN);
       EMP_REC.BIRTHDATE := 022968;
       EMP_REC.SALARY := MINIMUM_WAGE - 1.00;
       WORK_RECORD.EMP_CASE := EMP_REC;
       PUT_REC (MY_DB, EMP_DATASET, WORK_RECORD);

     then  you  have  to  move your data into  it before every PUT_REC
     call, which is both ugly and inefficient.

And  why?  All  because  PASCAL isn't flexible enough  to allow you to
declare a parameter to be of "any type".
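
   For  contrast, this is roughly what an "any type" parameter looks
like  in  C: the caller passes the record's address and its size. The
sketch  below  is  not a real database interface; put_rec here just
copies  the  record  into  a static buffer (last_rec, my stand-in for
the  database),  and  MAX_REC is an arbitrary limit I picked.

```c
#include <stddef.h>
#include <string.h>

#define MAX_REC 256
static unsigned char last_rec[MAX_REC];   /* stand-in for the database */
static size_t last_len;

/* Accepts a record of ANY type: the caller supplies address and size.
   Returns 0 on success, -1 if the record does not fit. */
int put_rec(const char *db, const char *set, const void *rec, size_t len)
{
    (void)db;                     /* unused in this stand-in */
    (void)set;
    if (len > MAX_REC)
        return -1;
    memcpy(last_rec, rec, len);
    last_len = len;
    return 0;
}
```

A  call  looks  like  put_rec (my_db, emp_dataset, &emprec, sizeof
emprec),  with  no  TANY_RECORD  in  sight. The price, of course, is
that  the  compiler will just as happily accept the address of a 10 x
10 real matrix.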

   A  couple  more  examples  of  cases where strict  type checking is
utterly lethal may be in order:

   *  Say that you want to write  a procedure that compares two PACKED
     ARRAY  OF  CHARs  (in Standard PASCAL, these  are the only way of
     representing   strings).  You  must  define  the  types  of  your
     parameters, INCLUDING THE PARAMETER LENGTHS! In other words,

       TYPE TPAC = PACKED ARRAY [1..256] OF CHAR;
       VAR P1: PACKED ARRAY [1..80] OF CHAR;
           P2: PACKED ARRAY [1..80] OF CHAR;
       ...
       FUNCTION STRCOMPARE (VAR X1: TPAC; VAR X2: TPAC): BOOLEAN;
       BEGIN
       ...
       END;
       ...
       IF STRCOMPARE (P1, P2) THEN ...

     is  ILLEGAL. P1, you see, is an 80-character string, which is not
     compatible  with the function parameter, which is a 256-character
     string.

   *  Say that you want to write  a procedure like WRITELN, which will
     format  data of various types. WRITELN  may not be sufficient for
     your  needs  --  you  might  need  to  be able  to output numbers
     zerofilled or in octal, you might want to provide for page breaks
     and  line  wraparound,  etc.  Surely you should  be allowed to do
     this!

     Well,  first  of  all,  you  can't  have  a  variable  number  of
     parameters.  But,  even  if you're willing to  have a maximum of,
     say, 10 parameters and pad the list with 0s, your parameters must
     all be of fixed types!

     Thus,  even if your design calls for some kind of "format string"
     that'll  tell  your  WRITELN-replacement what the  actual type of
     each  parameter is, you can't do anything. You must either have a
     procedure  for each possible type  combination (one to output two
     integers  and  a  string,  one to output a  real, an integer, and
     three  strings,  etc.),  or  have  the procedure  only output one
     entity at a time. This way, you'll have to write:

       PRINTS ('THE RESULT WAS ');
       PRINTI (ACTUAL);
       PRINTS (' OUT OF A MAXIMUM ');
       PRINTI (MAXIMUM);
       PRINTS (', WHICH WAS ');
       PRINTR (ACTUAL/MAXIMUM*100);
       PRINTS ('%');
       PRINTLN;

     instead of

       PRINTF ('THE RESULT WAS %d OUT OF A MAXIMUM %d, WHICH WAS %f%%',
               ACTUAL, MAXIMUM, ACTUAL/MAXIMUM*100);

   *  Finally  --  although  it should be obvious  by now -- you can't
     write,  say,  a matrix inversion function  that takes any kind of
     matrix.  You  could  write a 2x2 inverter,  a 3x3 inverter, a 4x4
     inverter,  and  so  on. You could also  write a matrix multiplier
     that  multiplies  2x2s  by 2x2s, another that  does 2x2s by 2x3s,
     another  2x2s  by 2x4s, another 3x2s by  2x2s, .... Just think of
     the job security you'll have!

   For  fairness's  sake,  I must admit that  this problem is SLIGHTLY
mitigated in PASCAL/3000.

   PASCAL/3000  has  a "STRING" data type,  which is a variable-length
string  (as  opposed to PACKED ARRAY OF  CHAR, which is a fixed-length
string).   In   other   words,  PASCAL/3000  STRINGs  are  essentially
(internally)  record structures, containing an  integer -- the current
string length -- and a PACKED ARRAY OF CHAR -- the string data.

   When HP implemented this, they were good enough to make all STRINGs
--  regardless of their maximum sizes -- "assignment-compatible" with
each other. This means that you can say:

   VAR STR1: STRING[80];
       STR2: STRING[256];
   ...
   STR1:=STR2;

and also

   TYPE TSTR256 = STRING[256];
   VAR S: STRING[80];
   ...
   FUNCTION FIRST_NON_BLANK (PARM: TSTR256): INTEGER;
   BEGIN
   ...
   END;
   ...
   I := FIRST_NON_BLANK (S);

Since  STRING[80]s  (strings with maximum  length 80) and STRING[256]s
(strings  with maximum length 256) are assignment-compatible, you may
both  directly assign them (STR1:=STR2) and pass one by value in place
of another (FIRST_NON_BLANK(S)).

   Although  "assignment  compatibility"  allows  by-value  passing, a
variable  passed by reference still has to be of exactly the same type
as the formal parameter specified in the procedure's header. Thus,

   TYPE TSTR256 = STRING[256];
   VAR S: STRING[80];
   ...
   FUNCTION FIRST_NON_BLANK (VAR PARM: TSTR256): INTEGER;
   BEGIN
   ...
   END;
   ...
   I := FIRST_NON_BLANK (S);

is  still  illegal, since STRING[80]s can't  be passed to by-reference
(VAR)  parameters  of type STRING[256].  Fortunately, PASCAL/3000 also
lets you say:

   FUNCTION FIRST_NON_BLANK (VAR PARM: STRING): INTEGER;

Specifying  a type of "STRING"  rather than "STRING[maxlength]" allows
you to pass any string in place of the parameter.

   This  only works for STRING parameters.  It doesn't work for PACKED
ARRAYs  OF CHAR; it doesn't work  for other array structures; it isn't
supported by Standard PASCAL. However, for the specific case of string
manipulation,  you  can get around some  of PASCAL's onerous parameter
type checking restrictions.

   Remember  also  that this is strictly an HP PASCAL (PASCAL/3000 and
PASCAL/XL)  feature,  and  cannot  be  relied on  in any other PASCAL
compiler.


                TYPE CHECKING -- KERNIGHAN & RITCHIE C

   Where  PASCAL insists on checking all  parameters for an exact type
match,  original  -- Kernighan & Ritchie  -- C takes the diametrically
opposite view.

   Classic  C  checks  NOTHING. It does not  check parameter types; it
does  not even check the number of parameters. All data in C is passed
"by  value", which means that the  value of the expression you specify
is pushed onto the parameter stack for the called procedure to use; if
you want to pass a variable "by reference" -- pushing its pointer onto
the  stack  -- you have to use the  "&" operator to get the variable's
address, to wit:

   myproc (&result, parm1, parm2);

If  you  omit  the  "&",  or specify it when  you shouldn't -- well, C
doesn't check for this, either.
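
   To make the difference concrete, here is a minimal sketch. The two
helper names are hypothetical, and the definitions use the newer
prototype-style headers only so that a current compiler will accept
them; the by-value and by-reference behavior is the same in K&R C.

```c
/* Hypothetical helper: receives a COPY of the caller's value. */
void set_by_value(int result)
{
    result = 42;        /* changes only the local copy */
}

/* Hypothetical helper: receives the ADDRESS of the caller's variable. */
void set_by_reference(int *result)
{
    *result = 42;       /* changes the caller's variable */
}
```

After "set_by_value (a)" the caller's A is unchanged; after
"set_by_reference (&b)" the caller's B contains 42.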

   Much  can  be  said about the philosophical  reasons that C is this
way;  many labels, from "flexibility" to "cussedness", can be attached
to  it.  The fact of the matter, though,  is that K&R C -- which means
many,  if  not  most,  of  today's C compilers --  doesn't do any type
checking.

   The  effects of this, of course, are the opposite of the effects of
PASCAL's strong type checking:

   *  You have almost complete flexibility in what types you pass to a
     procedure.  In two different calls, the same parameter can be one
     of two entirely different record structures; one of two character
     or  integer  arrays  of entirely different  lengths (C doesn't do
     run-time bounds checking, anyway); a real in one call, an integer
     in another, and a pointer in a third.

     Practically, virtually all of the examples I showed in the PASCAL
     chapter can thus be implemented in C. For instance,

       int strcompare(s1,s2,len)
       char *s1, *s2;
       int len;
       {
       int i;
       i = 0;
       while ((i < len) && (s1[i] == s2[i]))
         i = i+1;
       return (i == len);  /* 1 if the first LEN characters match */
       }

     will  merrily  compare two character  arrays, no questions asked.
     You  can  pass arrays of any size, and  it'll do the job. You can
     pass  integers,  reals, integer arrays,  whatever; of course, the
     code  isn't  likely  to  work,  but, hey, it's  a free country --
     nobody'll stop you.

   *  In most implementations of K&R C,  you're even allowed to pass a
     different  number  of  parameters than the  function was declared
     with.  Though  this is not guaranteed  portable, most C compilers
     make  sure  that if, say, your  procedure's formal parameters are
     "a", "b", and "c" (all integers) and you actually pass the values
     "1"  and "2", then A will be set to 1, B to 2, and C will contain
     garbage (that's "C" the variable, not "C" the language).

     This  is good because it allows you to write procedures that take
     a  variable  number  of parameters; as long as  you have a way of
     finding  out  how many parameters were  actually passed (e.g. the
     PRINTF   format   string),   your   procedure   can  handle  them
     accordingly.

   *  On the other hand, say you make a mistake in a procedure call --
     you  pass  a  real  instead  of an integer, a  value instead of a
     pointer,  or  perhaps  even omit a  parameter. The compiler won't
     check  this; the only way you'll find the error is by running the
     program, and even then the erroneous results may first appear far
     away from the real error.

     Some  C compilers (especially on UNIX) come with a program called
     LINT  that can check for this  error and others, but that's often
     not  enough.  First of all, your programmers  have to run LINT as
     well  as  C  for  each program, which  slows down the compilation
     pass;  more  importantly,  since  LINT  is  in  no  way  part  of
     standard C, many C compilers don't have it.

     VAX/VMS C, for instance, doesn't come with LINT; neither does the
     CCS C that's available on the HP3000.

   *  Similarly,  even  things  that  seem like they  ought to work --
     passing  an  integer  in  place of a real  and expecting it to be
     reasonably converted -- will fail badly. Thus,

       sqrt(100)

     won't  work  if  SQRT  expects  a  real; C won't  realize that an
     integer-to-real conversion is required, and will thus pass 100 as
     an integer, which is a different thing altogether.

     A  similar  problem  occurs  on computers (like  the HP3000) that
     represent  byte  pointers  (type  "char *")  and integer pointers
     (type  "int  *"  and  other  pointer types)  differently. Since C
     doesn't  know  which  type of pointer  a procedure expects, it'll
     never  do conversions. If you call a procedure like FGETINFO that
     expects  byte pointers and pass it  an integer pointer, you'll be
     in trouble (unless you manually cast the pointer yourself).

     Incidentally,   for   ease   of   using   real  numbers,  C  will
     automatically convert all "single-precision real" (called "float"
     in C) arguments to "double-precision real" ("double") in function
     calls.  This  makes  sure  that  if  SQRT  expects  a  "double",
     passing it a "float" won't confuse it.

   *  On  the  other  hand  (how  many  hands  am  I up  to now?), C's
     conversion  woes  -- requirements of  passing "float"s instead of
     "int"s,  "char  *"s  instead  of "int *"s, etc.  -- are easier to
     solve  than  in  PASCAL.  Since C allows you  to easily convert a
     value  from  one  datatype to another  (using the so-called "type
     casts"), you could say

       my_proc ((float) 100, (char *) &int_value);

     and  thus  pass a "float" and a  "char *" to "my_proc". In PASCAL
     you   couldn't   do   things  this  easily.  The  compiler  might
     automatically translate an integer to a float for you; but, if it
     expects  a  character  value  and  all you've got  is an integer,
     there's no easy way for you to tell it "just pass this integer as
     a byte address, I know what I'm doing."
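
   The variable-parameter trick mentioned above (a format string that
tells the procedure what was actually passed) can be sketched as
follows. Note the assumptions: this version leans on the <stdarg.h>
macros from the newer ANSI C library rather than on whatever stack
layout a particular K&R compiler happens to use, and SUM_BY_FORMAT
and its one-letter format codes are made up for the illustration.

```c
#include <stdarg.h>

/* Hypothetical function: sums the arguments described by "format".
   'i' means the next argument is an int, 'd' means a double.
   Nothing checks that the caller's arguments match the string. */
double sum_by_format(const char *format, ...)
{
    va_list args;
    double total = 0.0;
    const char *p;

    va_start(args, format);
    for (p = format; *p != '\0'; p++) {
        if (*p == 'i')
            total += va_arg(args, int);
        else if (*p == 'd')
            total += va_arg(args, double);
    }
    va_end(args);
    return total;
}
```

A call like "sum_by_format ("idi", 1, 2.5, 3)" adds an int, a double,
and another int. If the format string and the actual arguments
disagree, the compiler will not notice, which is exactly the
flexibility and the danger described above.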

   Thus,  K&R C is flexible enough to  do all that Standard PASCAL can
not. If this is necessary to you -- and I can easily understand why it
would  be; Standard PASCAL's restrictions are very substantial -- then
you'll  have  to  live  with C's lack of  error checking. On the other
hand,  if flexibility is of less than  critical value, you have to ask
yourself  whether  or  not you want the  extra level of compiler error
checking that PASCAL can provide you.

   My  personal experience, incidentally, has been that compiler error
checking of parameters is very nice, but not absolutely necessary. I'd
love  to  have  the  compiler  find  my bugs for me,  but I can muddle
through  without it. PASCAL's  restrictions, though, are substantially
more  grave. More than inconveniences,  they can make certain problems
almost impossible to solve.


                        DRAFT ANSI STANDARD C

   Time,  it  is  said,  heals  all  wounds; perhaps it  can also heal
wounded  computer languages. God knows, FORTRAN 77 isn't the greatest,
but it sure is better than FORTRAN IV.

   The  framers  of  the  new  Draft  ANSI Standard  C have apparently
thought  about  some  of the problems that  C has, especially the ones
with  function  call  parameter checking and  conversion. The solution
seems to be quite good, letting you impose either strict or loose type
checking  --  whichever you prefer --  for each procedure or procedure
parameter.  Remember, though, that the standard is still only a Draft,
so any given C compiler you might want to use may well not support it.


   In Draft Standard C, you can do one of two things:

   *  You can call a procedure the same old way that you would in K&R
     C.  No  type  checking, no automatic  conversion, no nothin'. You
     might declare its result type, to wit:

       extern float sqrt();

     (Remember,  you'd have to do that anyway in K&R C; otherwise, the
     compiler  will  treat SQRT's result as  an integer.) But no other
     declarations are required, and no checking will be done.


   *  Alternatively, you can declare a FUNCTION PROTOTYPE. This can be
     done  either for an external function  or for one you're defining
     --  the  prototype  is  very much like  PASCAL's procedure header
     declaration. A sample might be:

       extern int ASCII (int val, int base, char *buffer);

     or simply

       extern int ASCII (int, int, char *);

     [Note  that  the  parameter  NAMES, as opposed  to TYPES, are not
     necessary in a prototype for an EXTERNAL function. For a function
     that  you're  actually  defining,  the  names are  necessary; the
     declarations  in  the  prototype  are  used in place  of the type
     declarations   that  you'd  normally  specify  for  the  function
     parameters.]

     This  function  prototype  tells  the  compiler enough  about the
     function  parameters  for  it  to be able  to do appropriate type
     checking  and  conversion.  One of the reasons  K&R C couldn't do
     that is precisely because of the lack of this information.

Consider  the  cases where this would come  in handy. We might declare
SQRT as

   extern float sqrt (float);

and then a call like

   sqrt (100)

would  automatically be taken to mean "sqrt ((float) 100)", i.e. "sqrt
(100.0)". Similarly,

   sqrt (100, 200)

or

   sqrt ()

would  cause a compiler error or warning, since now the compiler KNOWS
that SQRT takes exactly one parameter.
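
   A small sketch of the same effect, using a made-up function HALVE
(declared with "double" rather than "float", and not the library
SQRT) so that the example stands on its own:

```c
/* Hypothetical function with a prototype: the compiler knows that
   "x" is a double. */
double halve(double x);

double halve(double x)
{
    return x / 2.0;
}

/* The int constant 100 is converted to 100.0 before the call,
   because the prototype above is in scope. */
double halve_of_hundred(void)
{
    return halve(100);
}
```

With the prototype visible, "halve (100)" behaves like "halve
(100.0)"; without it, a K&R-style call would have passed the integer
unconverted.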

   In general, say that you have a function declared as

   extern int f(formaltype);   /* or non-extern, for that matter */

This  simply  means  that "f" is a function  that returns an "int" and
takes  one  parameter  of  type "formaltype". Now,  say that your code
looks like:

   actualtype x;
   ...
   i = f(x);

Is  this  kind  of  call  valid or not? Of  course, it depends on what
"formaltype" and "actualtype" are:

   *  If  both  FORMALTYPE  and ACTUALTYPE are  numbers -- integers or
     floats,  short,  long,  or  whatever  --  then X  is converted to
     FORMALTYPE before the call. This is what lets us say

        sqrt(100)

     when "sqrt" is declared to take a parameter of type "float".

     (The  same goes the other way -- if "mod" is declared to take two
     "int"s,  then "mod(10.5,3.2)" would  be converted to "mod(10,3)",
     although  the  compiler might print a  warning message to caution
     you that a truncation is taking place.)


   *  If  FORMALTYPE  is  a  pointer  --  which  is  the case  for all
     "by-reference"  parameters,  since  that's how we  pass things by
     reference  in C -- then ACTUALTYPE must be EXACTLY the same type.
     In other words, if we say:

        int copystring (char *src, char *dest)

     then in the call

        char x;
        int y;
        ...
        copystring (x, &y);

     BOTH  parameters will cause an error message. The first parameter
     will  be  a "CHAR" passed where a  "CHAR *" is expected, which is
     illegal -- a good way of checking for attempts to pass parameters
     by  value  where by-reference was  expected. The second parameter
     will  be an "INT *" passed where a "CHAR *" is expected, which is
     also  illegal, since although both are pointers, they don't point
     to the same type of thing.

   *  If  ACTUALTYPE  is  a  pointer,  then FORMALTYPE must  also be a
     pointer  of  EXACTLY  the  same  type. Again, this  is useful for
     catching attempts to pass parameters "by reference" to procedures
     that  expect  them "by value", and also attempts to pass a pointer
     to the wrong type of object.

   *  If  either ACTUALTYPE or FORMALTYPE is  a pointer of the special
     type  "void  *",  then the other one may  be any type of pointer.
     This is very useful when we want a parameter to be a BY-REFERENCE
     parameter  of some arbitrary type (similar to PASCAL/XL's ANYVAR,
     for  which  see  below). Thus, if we  want to write our "put_rec"
     procedure  that'll  put  any  type  of  record  structure  into a
     database, we'd say:

        put_rec (char *dbname, char *dbset, void *rec)

     Then, we could say:

        typedef struct {...} sales_rec_type;
        typedef struct {...} emp_rec_type;
        ...
        sales_rec_type srec;
        emp_rec_type erec;
        ...
        put_rec (mydb, sales_set, &srec);
        ...
        put_rec (mydb, emp_set, &erec);

     Both  of  the  PUT_REC  calls  are  valid since  both "&srec" and
     "&erec"  (and, for that matter, any  other pointer) can be passed
     in place of a "void *" parameter. If we'd declared "put_rec" as:

        put_rec (char *dbname, char *dbset, sales_rec_type *rec)

     then  the  "put_rec  (mydb,  emp_set,  &erec)" call  would NOT be
     legal, since "&erec" is NOT compatible with "sales_rec_type *".

     Note  that  on  some machines -- including  the HP3000 -- integer
     pointers and character pointers are NOT represented the same way.
     However, it's always safe to pass either a "char *" or an "int *"
     in  place  of  a  parameter that's declared as  a "void *". The C
     compiler  will always do the  appropriate conversion; thus, if we
     declare the ASCII intrinsic as

        extern int ASCII (int, int, void *);

     then both of the calls below:

        char *cptr;
        int *iptr;
        ...
        i = ASCII (num, 10, cptr);
        ...
        i = ASCII (num, 10, iptr);

     will  be valid (assuming that a  "void *" is actually represented
     as  a byte pointer, which is what the ASCII intrinsic wants). You
     can  thus  think  of  "void  *"  as the "most  general type"; any
     pointer can be successfully passed to a "void *".

   * Note that although you CAN'T pass, say, a "char *" to a parameter
     of type "int *", C will ignore the SIZE of the array whose pointer
     is being passed. In other words, a function such as

        extern strlen (char *s);

     may  be  passed a pointer to a string  of any size -- both of the
     following calls:

        char s1[80], s2[256];
        ...
        i = strlen (s1);
        i = strlen (s2);

     are  valid.  Remember  that  C  makes  no  distinction  between a
     "pointer  to  an  80-byte  array"  and  a "pointer  to a 256-byte
     array";  similarly, it makes no distinction between an array like
     "s1" and a "pointer to a character" (see below).

   *  An interesting exception to the  above rules is that the integer
     constant  0  can  be  passed  to  ANY pointer  parameter. This is
     because  a pointer with value 0  is conventionally used to mean a
     "null pointer".

     This  is quite useful in some applications, but can often prevent
     the compiler from detecting some errors. If I say:

        extern PRINT (int *buffer, int len, int cctl);
        ...
        PRINT (0, -10, current_cctl);

     this  won't, of course, print a "0"; rather, it'll pass PRINT the
     integer  pointer "0", which will point  to God knows what in your
     stack.  Not  a  very serious problem, but  something you ought to
     keep in mind.

   * Unlike Standard PASCAL, not only can you entirely waive parameter
     checking  for a procedure (just omit the prototype!), but you can
     also  explicitly CAST an actual parameter whenever you want it to
     match  the  type of a formal parameter.  In other words, say that
     you declare two structure types:

        typedef struct {...} rec_a;
        typedef struct {...} rec_b;
        rec_a ra;   /* declare a variable of type "rec_a" */
        rec_b rb;   /* declare a variable of type "rec_b" */

     and then write a function

        process_record_a (int x, int y, rec_a *r)
        {
        ...
        }

     If you then say

        process_record_a (10, 20, &rb);

     then  the compiler will (quite  properly) print an error message,
     since  you were trying to pass a  "pointer to rec_b" instead of a
     "pointer  to  rec_a". If you really want  to do this, though, all
     you need to do is say:

        process_record_a (10, 20, (rec_a *) &rb);

     manually  CASTING the pointer "&rb" to  be of type "rec_a *", and
     the compiler won't mind.

   *  Finally,  let  me also point out that,  like everywhere in C, an
     "array  of T" and a "pointer  to T" are mutually interchangeable.
     In other words, if you say:

        extern int string_compare (char *s1, char *s2);

     and then call it as:

        char str1[80], str2[256];
        ...
        if (string_compare (str1, str2)) ...

     the  compiler  won't  mind. To it a "char  *" and a "char []" are
     really one and the same type.

     Somewhat  (but  not  exactly) similarly --  perhaps I should say,
     similarly but differently -- the NAME OF A FUNCTION can be passed
     to  a  parameter  that  is expecting a POINTER  TO A FUNCTION. In
     other words, if you write a procedure

        int do_function_on_array_elems (int (*f)(), int *a, int len);

     (which  takes  a pointer to a function,  a pointer to an integer,
     and an integer), and then call it as:

        do_function_on_array_elems (myfunc, xarray, num_xs);

     the  compiler won't complain (assuming, of course, that MYFUNC is
     really a function and not, say, an integer or a pointer).
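
   A complete, if tiny, version of this idea might look like the
sketch below. The names APPLY_TO_ELEMS and TWICE are hypothetical,
and the function parameter is written with a full prototype, "int
(*f)(int)", so that the compiler can check the call through the
pointer as well.

```c
/* Applies f to each of the len elements of a, in place.
   "int (*f)(int)" is a pointer to a function that takes an int
   and returns an int. */
void apply_to_elems(int (*f)(int), int *a, int len)
{
    int i;
    for (i = 0; i < len; i++)
        a[i] = f(a[i]);
}

/* Hypothetical function to pass in. */
int twice(int x)
{
    return 2 * x;
}
```

Both "apply_to_elems (twice, xs, 3)" and "apply_to_elems (&twice, xs,
3)" are accepted; the bare function name is taken to mean its
address.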


   To  summarize, then, Draft Proposed ANSI  Standard C lets you check
function  parameters  almost  as  precisely  as  Standard  PASCAL. The
differences are:


   *  You  can  ENTIRELY  INHIBIT PARAMETER CHECKING  for all function
     parameters by just omitting the function prototype.


   *  You can declare a parameter to BE A BY-REFERENCE PARAMETER OF AN
     ARBITRARY TYPE by declaring it to be of type "void *". You can do
     this  while still enforcing tight type checking for all the other
     parameters.


   *  In addition to overriding type  checking on a PROCEDURE basis or
     PROCEDURE PARAMETER basis, you can also override type checking on
     a  particular call by simply casting  the actual parameter to the
     formal parameter's datatype.

   * Unlike PASCAL, C will never check the SIZE of an array parameter;
     only its TYPE.
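
   One practical consequence of the "void *" approach is that the
called procedure receives only an address, so the caller usually has
to pass the record's size as well. A minimal sketch, in which the
DATASET buffer, the two record layouts, and this three-parameter
PUT_REC are all hypothetical stand-ins rather than any real database
interface:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a dataset: a small byte buffer. */
typedef struct {
    unsigned char bytes[64];
    size_t used;
} dataset;

typedef struct { int id; double amount; } sales_rec_type;
typedef struct { int id; char name[16]; } emp_rec_type;

/* Accepts a pointer to ANY record type. C does not pass the
   record's size along with a "void *", so the caller supplies it.
   Returns 0 on success, -1 if the record doesn't fit. */
int put_rec(dataset *set, const void *rec, size_t size)
{
    if (size > sizeof set->bytes)
        return -1;
    memcpy(set->bytes, rec, size);
    set->used = size;
    return 0;
}
```

Calls such as "put_rec (&d, &srec, sizeof srec)" and "put_rec (&d,
&erec, sizeof erec)" both pass the prototype's checks, while the
first parameter is still checked strictly.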


     STANDARD "LEVEL 1" PASCAL TYPE CHECKING -- CONFORMANT ARRAYS

   If  you recall, one of the  PASCAL features I most complained about
was  the  inability  to pass arrays of different sizes to the same
procedures.  This  essentially  prevents you from  writing any sort of
general array handling routine, including:

   *  For  PACKED  ARRAYs  OF  CHAR  --  the way  that Standard PASCAL
     represents  strings -- you can't write things like blank trimming
     routines,  string  searches, or anything  that's intended to take
     PACKED ARRAYs OF CHAR of different sizes.

   *  For  other arrays, the problem is  exactly the same -- you can't
     write  matrix handling routines that work with arbitrary sizes of
     arrays, e.g. matrix addition, multiplication, division, etc.

This  wasn't  the  only  type  checking  problem (others  included the
inability  to  pass  various  record  types to  database I/O routines,
etc.), but it was a major one.

   The ISO Pascal Standard, released in the early 80's, addresses this
problem.  A  new feature called "conformant  arrays" has been defined;
PASCAL  compilers are encouraged, but not required, to implement it. A
compiler is said to

   * "Comply at level 0" if it does not support conformant arrays;

   * "Comply at level 1" if it does support them.

You  see  the problem -- who knows  just how many new PASCAL compilers
will  include  this feature? It is a  fact that most compilers written
before the ISO Standard do NOT include it.

   PASCAL/3000,  for  instance,  does  not have it;  PASCAL/XL, on the
other hand, does.

   What are "conformant arrays"? To put it simply, they are

   *  FUNCTION PARAMETERS that are defined to be ARRAYS OF ELEMENTS OF
     A  GIVEN  TYPE,  but  whose bounds are  NOT defined. Instead, the
     compiler  makes  sure  that  the ACTUAL BOUNDS  of whatever array
     parameter is ACTUALLY passed are made known to the procedure.

An example:

   FUNCTION FIRST_NON_BLANK
            (VAR STR: PACKED ARRAY [LOWBOUND..HIGHBOUND: INTEGER]
                      OF CHAR): INTEGER;
   VAR I: INTEGER;
   BEGIN
   I:=LOWBOUND;
   WHILE (I<HIGHBOUND) AND (STR[I]=' ') DO
     I:=I+1;
   FIRST_NON_BLANK:=I;
   END;

This  procedure  is intended to find the  index of the first non-blank
character  of  STR. Note how it declares  STR: Instead of specifying a
constant  lower  and  upper  bound in the PACKED  ARRAY [x..y] OF CHAR
declaration, it specifies TWO VARIABLES.

   When   the   procedure   is   entered,  the  variable  LOWBOUND  is
automatically  set  to  the  lower  bound  of whatever  array the user
actually passed, and HIGHBOUND is set to the upper bound of the array.

   In other words, if we say:

   VAR MYSTR: PACKED ARRAY [1..80] OF CHAR;
   ...
   I:=FIRST_NON_BLANK (MYSTR);

then,  in FIRST_NON_BLANK, the variable LOWBOUND  will be set to 1 and
HIGHBOUND  will  be  set  to  80.  Instead  of just  passing the MYSTR
parameter, PASCAL actually passes "behind your back" 1 and 80 as well.
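
   In C terms, what happens "behind your back" looks roughly like the
sketch below, with the bounds written out as ordinary parameters.
This is only an illustration of the idea, not a description of how
any particular PASCAL compiler implements it; it also tests "i <=
high", so an all-blank array yields HIGH+1 (a choice made for this
sketch).

```c
/* C sketch of a conformant-array call: the bounds travel with the
   array as ordinary parameters. "str" is indexed from 0 here, so
   element "low" of the PASCAL array is str[0].
   Returns the PASCAL-style index of the first non-blank character,
   or high + 1 if the array is entirely blank. */
int first_non_blank(const char *str, int low, int high)
{
    int i = low;
    while (i <= high && str[i - low] == ' ')
        i = i + 1;
    return i;
}
```

The PASCAL call "FIRST_NON_BLANK (MYSTR)" thus corresponds to
something like "first_non_blank (mystr, 1, 80)"; with a conformant
array, the caller writes none of the extra parameters.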

   The way I see it, this is a very good solution, even better in some
ways  than C's (in which you can always pass an array of any arbitrary
size):

   *  You're no longer restricted (like you are in Standard PASCAL) to
     a fixed size for your array parameters.

   * When you pass an array to a conformant array parameter, you don't
     have  to manually specify the size of the array; the array bounds
     are  automatically  passed  for you. If I  were to write the same
     procedure in C, I'd have to say

       int first_non_blank (str, maxlen)
       char str[];
       int maxlen;
       ...

     and  then  manually pass it both the  string and the size that it
     was  allocated with; otherwise, the  procedure won't know when to
     stop  searching  (assuming  you  don't use the  convention that a
     string is terminated by a null or some such terminator).

   *  Since  the  compiler  itself  knows  what  the  conformant array
     parameter's bounds are (it doesn't know the actual values, but it
     does  know  what  variables  contain  the  values),  it  can emit
     appropriate run-time bounds checking code. This can automatically
     catch  some  errors at run-time, which is  good if you like heavy
     compiler-generated error checking.

   *  Conformant arrays are even better for two-dimensional arrays. To
     index  into a two-dimensional array the compiler must, of course,
     know  the number of columns in the array (assuming it's stored in
     row-major  order, as C and PASCAL 2-D arrays are). In C, you must
     either declare the number of columns as a constant, e.g.

       matrix_invert (m)
       float m[][100];

     or  declare  the  parameter  as  a 1-D array,  pass the number of
     columns  as  a  parameter, and then do  your own 2-D indexing, to
     wit:

       matrix_invert (m, numcols)
       float m[];
       int numcols;
       ...
       element = m[row*numcols+col];  /* instead of M[ROW,COL] */
       ...

     In ISO Level 1 PASCAL, you just declare the procedure as:

       PROCEDURE MATRIX_INVERT
                 (M: ARRAY [MINROW..MAXROW: INTEGER;
                            MINCOL..MAXCOL: INTEGER] OF REAL);

     Then you automatically know the bounds  of the array AND can also
     do  normal array indexing (M[ROW,COL]),  since the compiler knows
     the number of columns, too.
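
   For comparison, here is the second C workaround (a 1-D array plus
a column count) carried through for a complete routine. MATRIX_ADD
is a hypothetical example chosen because it is short; the caller
must pass both dimensions, and every element reference spells out
the row-major arithmetic by hand.

```c
/* Hypothetical example: adds two matrices stored in row-major order
   as flat arrays. The caller passes the dimensions, and the
   indexing arithmetic is done by hand. */
void matrix_add(const float *a, const float *b, float *sum,
                int numrows, int numcols)
{
    int row, col;
    for (row = 0; row < numrows; row++)
        for (col = 0; col < numcols; col++)
            sum[row * numcols + col] =
                a[row * numcols + col] + b[row * numcols + col];
}
```

A conformant array parameter makes all of this bookkeeping the
compiler's job.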


This,  it  seems, is how original  Standard PASCAL should have worked,
and  I'm glad that the standards  people have established it. The only
problems are:


   * This  is, of course, somewhat less efficient than not passing the
     bounds  or  just passing, say, the upper bound (like you would in
     C).


   *  Remember  that  this  only fixes the case  where we want to pass
     differently  sized  arrays  to  a  procedure. If  we want to pass
     different  TYPES  (like  in  our  PUT_REC  procedure  that should
     accept  one  of several database record types), conformant arrays
     won't help us.

   *  Most  importantly, MANY PASCAL  COMPILERS MIGHT NOT SUPPORT THIS
     WONDERFUL  FEATURE.  In particular,  PASCAL/3000 DOES NOT SUPPORT
     CONFORMANT ARRAYS.


                       PASCAL/XL TYPE CHECKING

   PASCAL/XL  obeys all of PASCAL's type checking rules, but gives you
a number of ways to work around them:

   *  PASCAL/XL  supports  the  CONFORMANT  ARRAYS that  I just talked
     about.

   * PASCAL/XL allows you to specify a parameter as "ANYVAR", e.g.

       PROCEDURE PUT_REC (VAR DB: TDATABASE;
                          S: TDATASET;
                          ANYVAR REC: TDBRECORD);

     What  this  means to PASCAL is that,  when PUT_REC is called, the
     third parameter (REC) will NOT be checked. Inside PUT_REC, you'll
     be  able to refer to this parameter  as REC, and to PUT_REC it'll
     have  the type TDBRECORD; however, the CALLER need not declare it
     as TDBRECORD. For instance,

       VAR SALES_REC: TSALES_REC;
           EMP_REC: TEMP_REC;
       ...
       PUT_REC (MY_DB, SALES_DATASET, SALES_REC);
       ...
       PUT_REC (MY_DB, EMP_DATASET, EMP_REC);

     will  do  EXACTLY what we want it  to -- it'll pass SALES_REC and
     EMP_REC  to our PUT_REC procedure without complaining about their
     data types.

     As  I  said,  PUT_REC  itself  will view the  REC parameter as an
     object of type TDBRECORD. However, PUT_REC can say

       SIZEOF(REC)

     and  determine  the  TRUE  size of the  actual parameter that was
     passed  in place of REC. This can be very useful if PUT_REC needs
     to  do  an  FWRITE or some such operation  that needs to know the
     size of the thing being manipulated.

     The  way  this is done, of course,  is by PASCAL/XL's passing the
     size  of the actual parameter as well as the parameter's address.
     Incidentally,  you  can  turn  this off for  efficiency's sake if
     you're not going to use this SIZEOF construct.

   *  PASCAL/XL  allows  you  to  do TYPE COERCION --  you can take an
     object  of  an arbitrary type and view  it as any other type. For
     instance,  you can take a generic  "ARRAY OF INTEGER" and view it
     as  a record type, or take an  INTEGER parameter and view it as a
     REAL. A possible application might be:

       TYPE COMPLEX = RECORD REAL_PART, IMAG_PART: REAL; END;
            INT_ARRAY = ARRAY [1..2] OF INTEGER;
       ...
       PROCEDURE WRITE_VALUE (T: INTEGER; ANYVAR V: INT_ARRAY);
       BEGIN
       IF T=1 THEN WRITELN (V[1])
       ELSE IF T=2 THEN WRITELN (REAL(V))
       ELSE IF T=3 THEN WRITELN (BOOLEAN(V))
       ELSE IF T=4 THEN WRITELN (COMPLEX(V).REAL_PART,
                                 COMPLEX(V).IMAG_PART);
       END;

     As  you  see,  this  procedure  takes a type  indicator (T) and a
     variable  of  any  type V. Then, depending on  the value of T, it
     VIEWS  V as an integer, a float, a boolean, or a record structure
     of type COMPLEX. All we need to do is say

       typename(value)

     and  it returns an object with  EXACTLY THE SAME DATA as "value",
     but viewed by the compiler as being of type "typename". Note that
     this  means that "REAL(10)" won't return  10.0 (which is what a C
     "(float)  10"  type  cast  would  do);  rather, it'll  return the
     floating point number the MACHINE REPRESENTATION of which is 10.

     Some  other  example applications for  this very useful construct
     are:

       -  You can now have a pointer variable that can be set to point
         to  an object of an arbitrary  type; this allows you to write
         things like generic linked list handling procedures that work
         regardless  of what type of  object the linked list contains.
         More about this under ANYPTR below.

       -  You  may  write a generic bit  extract procedure that can be
         used  for  extracting bits from  characters, integers, reals,
         etc. You'd declare it as:

           FUNCTION GETBITS (VAL, STARTBIT, LEN: INTEGER): INTEGER;
           ...

         and call it using

           I:=GETBITS (INTEGER(3.0*EXP(X)), 10, 6);

         or

           I:=GETBITS (INTEGER(MYSTRING[I]), 5, 1);

         or  whatever.  Note  that  you  couldn't do  this with ANYVAR
         parameters since ANYVAR parameters are by-reference, and thus
         can't be passed constants or expressions.

   *  PASCAL/XL -- just like PASCAL/3000 -- makes STRING parameters of
     any  size  compatible  with  each  other.  Thus,  you can  pass a
     STRING[20]  to a procedure that's  defined to take a STRING[256];
     or,  if  you're  passing  the  string by REFERENCE,  you can just
     declare   the   formal  parameter  as  "STRING",  which  will  be
     compatible with any string type.

   * PASCAL/XL has a new type called "ANYPTR"; declaring a variable to
     be an ANYPTR makes it "assignment-compatible" with any other
     pointer type, which means that the variable can easily be made
     to point to objects of different types. This, coupled with the
     "type coercion" operation mentioned above, makes manipulating,
     say, linked lists of different data structures much easier.
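
   For readers who think in C, the difference between a type
coercion and a C type cast can be sketched as below. This is only an
illustration, not PASCAL/XL code: the function names are made up for
the example, and it assumes a machine where "float" and "int32_t" are
both 32 bits wide.

```c
#include <stdint.h>
#include <string.h>

/* Value conversion: what a C cast does. The integer 10 becomes 10.0. */
static float cast_value(int32_t i)
{
    return (float) i;
}

/* Coercion in the PASCAL/XL sense: the same bits, viewed as a float.
   Assumes float and int32_t are both 32 bits wide. */
static float coerce_bits(int32_t i)
{
    float f;
    memcpy(&f, &i, sizeof f);
    return f;
}

/* The reverse view: the integer whose bits the float holds. */
static int32_t bits_of(float f)
{
    int32_t i;
    memcpy(&i, &f, sizeof i);
    return i;
}
```

   Here cast_value(10) is 10.0, while coerce_bits(10) is whatever
floating point number happens to share the integer's bit pattern (on
an IEEE 754 machine, a very small number near zero); bits_of turns
it back into 10.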

   Needless  to  say, use of any of  these constructs can get you into
trouble  precisely  because  of the additional  freedom they give you.
Converting  a chunk of data from one  record data type to another only
makes  sense  if  you  know  exactly what you're  doing; if you don't,
you're likely to end up with garbage.
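
   The WRITE_VALUE procedure above shows both the freedom and the
risk. Here is a rough C rendering of the same tag-and-view idea; the
names (format_value, complex_pair, the VIEW_ tags) are hypothetical,
the boolean case is left out, and a 32-bit "float" is assumed. If the
caller passes the wrong tag, nothing stops the procedure from
formatting garbage.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct complex_pair { float real_part, imag_part; };

enum view_tag { VIEW_INT = 1, VIEW_REAL = 2, VIEW_COMPLEX = 4 };

/* Format the bytes at v as the type that tag t names.
   Returns snprintf's result, or -1 for an unknown tag. */
static int format_value(int t, const void *v, char *out, size_t outlen)
{
    int32_t i;
    float f;
    struct complex_pair c;

    switch (t) {
    case VIEW_INT:
        memcpy(&i, v, sizeof i);
        return snprintf(out, outlen, "%ld", (long) i);
    case VIEW_REAL:
        memcpy(&f, v, sizeof f);
        return snprintf(out, outlen, "%g", f);
    case VIEW_COMPLEX:
        memcpy(&c, v, sizeof c);
        return snprintf(out, outlen, "%g %g", c.real_part, c.imag_part);
    default:
        return -1;
    }
}
```

   Calling format_value with VIEW_INT on an integer holding 42 gives
"42"; calling it with VIEW_REAL on that same integer gives some
unrelated floating point number, because the procedure believes the
tag, not the data.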

   However,  often  there  are  cases  where you  NEED this additional
freedom, and in those cases, PASCAL/XL really comes through. As a
rule, its type checking is as stringent and thorough as Standard
PASCAL's, but it lets you waive the type checking fairly easily
whenever you need to.
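
   One case where you need the freedom is the generic linked list
mentioned under ANYPTR. A minimal sketch in C, using ANSI C's
"void *" as a rough analogue of an ANYPTR (the names node, list_push
and list_length are made up for this example):

```c
#include <stddef.h>

/* A list node that can point at an item of any type. */
struct node {
    void        *item;   /* untyped pointer, roughly like an ANYPTR */
    struct node *next;
};

/* Link a caller-supplied node onto the front of the list. */
static struct node *list_push(struct node *head, struct node *n, void *item)
{
    n->item = item;
    n->next = head;
    return n;
}

/* Count the nodes; this works whatever the items are. */
static size_t list_length(const struct node *head)
{
    size_t count = 0;
    for (; head != NULL; head = head->next)
        count++;
    return count;
}
```

   The list procedures never look inside the items, so one set of
procedures serves every list. The caller, who knows what each node
really holds, casts "item" back to the right pointer type before
using it, and is on his own if he gets that wrong.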


                        ENUMERATED DATA TYPES

   If  you recall, before I started talking about type checking, I was
describing  RECORD  STRUCTURES,  a  new  data  type that  PASCAL and C
support.  My  mind,  you  see,  works  like a stack  -- sometimes I'll
interrupt  what  I'm  doi