HOW PROGRAMMING LANGUAGES DIFFER:
                 A CASE STUDY OF SPL, PASCAL, AND C
                       by Eugene Volokh, VESOFT
           Presented at 1987 SCRUG Conference, Pasadena, CA
       Presented at 1987 INTEREX Conference, Las Vegas, NV, USA
           Published by The HP CHRONICLE, May 1987-May 1988.

ABSTRACT: The HP3000's wunderkind sets out to study Pascal, C and SPL
for  the HP mini  in a set of articles,  using real-life examples and
plenty of tips on how to code for optimum efficiency in each language.
First in the series:  ground rules  for the comparison  and  a look at
control structures. (The HP CHRONICLE, May 1987)

INTRODUCTION

   Programmers  get  passionate about programming  languages. We spend
most  of  our  time hacking code,  exploiting the language's features,
being bitten by its silly restrictions. There are dozens of languages,
and  each  one has its fanatical  adherents and its ardent detractors.
Some  like APL, some like FORTH, LISP, C, PASCAL; some might even like
COBOL or FORTRAN, perish the thought.

   In particular, a lot of fuss has recently arisen about SPL, PASCAL,
and  C.  All  three  of them are  considered good "system programming"
(whatever  that is) languages, and  naturally people argue about which
one is the best.

   HP's  Spectrum  project has come out in  favor of PASCAL -- all new
MPE/XL  code  will  be written in PASCAL, and  HP won't even provide a
native  mode  SPL compiler. On the other  hand, HP's also getting more
and more into UNIX, which is coded entirely in C. Especially between C
and PASCAL adherents there seems to be something like a "holy war"; it
becomes not just a matter of advantages and disadvantages, but of Good
and  Evil, Right and Wrong. Strict type  checking is Good, some say --
loose  type checking is Evil; pointers  are Wrong -- array indexing is
Right. The battle-lines are drawn and the knights are sharpening their
swords.

   But,  some ask -- what's the big  deal? After all, it's an axiom of
computer  science  that all you need is an  IF and a GOTO, and you can
code anything you like. Theoretically speaking, C, SPL, and PASCAL are
all equivalent; practically, is there that much of a difference?

   In other words, is it just esthetics or prejudice that animates the
ardent fans of C, PASCAL, or SPL, or are there real, substantive
differences between the languages -- cases in which using one language
rather  than another will make your life substantially easier? Are the
main differences between, say, C and PASCAL that PASCAL uses BEGIN and
END  and  C uses "{" and "}"? That  C's assignment operator is "=" and
PASCAL's is ":="?

   The  goal of this paper is to answer just this question. I will try
to analyze each of the main areas where SPL, C, and PASCAL differ, and
point  out  those differences using  actual programming examples. I'll
try  not  to  emphasize  vague, general statements,  like "PASCAL does
strict  type checking", or subjective opinions, like "C is too hard to
read";  rather,  I  want to use SPECIFIC  EXAMPLES which can help make
clear  the  exact  influence of strict or  loose type checking on your
programming tasks.


                          RULES OF EVIDENCE

   Saying that I'll "compare SPL, PASCAL, and C" isn't really saying a
whole  lot.  How  will  I  compare  them? What criteria  will I use to
compare  them?  Will  I  compare how easy it is  to read them or write
them?  Will  I  compare what programming habits  they instill in their
users? Which versions of these languages will I compare?

   To  do  this, and to do this in  as useful a fashion as possible, I
set myself some rules:

   *  I  resolved  to try to show the  differences by use of examples,
     preferably  as  real-life  as  possible. The emphasis  here is on
     CONCRETE  SPECIFICS, not on general statements such as "C is less
     readable" or "PASCAL is more restrictive".

   *  I  decided  not to go into  questions of efficiency. Compiling a
     certain  construct  using  one  implementation of  a compiler may
     generate  fast  code,  whereas  a  different  implementation  may
     generate slow code. Sure, the FOR loop in PASCAL/3000 may be less
     efficient  than in SPL or in CCS's C/3000, but who knows how fast
     it'll be under PASCAL/XL?

     For  this  reason,  I  don't wax too  poetic about the efficiency
     advantages  of features such as C's  "X++" (which increments X by
     1)  --  a modern optimizing compiler  is quite likely to generate
     equally  fast code for "X:=X+1", automatically seeing that it's a
     simple  increment-by-one (even the  15-year-old SPL/3000 compiler
     does this).

     The  only times when I'll mention efficiency is when some feature
     is  INHERENTLY more or less efficient than another (at least on a
     conventional machine architecture); for instance, passing a large
     array BY VALUE will almost certainly be slower than passing it BY
     REFERENCE,  since by-value passing would  require copying all the
     array data.

     Even   in   these   cases,   I   try  to  play  down  performance
     considerations;  if  you're  concerned  about speed  (as well you
     should be), do your own performance measurements for the features
     and compiler implementations that you know you care about.

   *  I  resolved -- for space reasons if for  no other -- not to be a
     textbook  for  SPL, PASCAL, or C. Some  of the things I say apply
     equally well to almost all programming languages, and I hope that
     they will be understandable even to people who've never seen SPL,
     PASCAL, or C.

     For  other  things,  I  rely  on the relative  readability of the
     languages and their similarity to one another. I hope that if you
     know  any  one  of  SPL,  PASCAL,  or  C,  you should  be able to
     understand the examples written in the other languages.

     However,  it may be wise for you  to have manuals for these three
     languages  --  either  their  HP3000  implementations  or general
     standards  -- at hand, in case  some of the examples should prove
     too arcane.

   *  As you can tell by the size  of this paper, I also decided to be
     as thorough as practical in my comparisons, and ESPECIALLY in the
     evidence backing up my comparisons.

     One  of the main reasons I wrote this paper is that I hadn't seen
     much  OBJECTIVE  discussion comparing C and  PASCAL; I wanted not
     just  to present my conclusions -- which might as easily be based
     on  prejudice as on fact -- but also the reasons why I arrived at
     them, so that you could decide for yourself.

     So  as  not to burden you with  having to read all 200-odd pages,
     though,  I've summarized my conclusions in the "SUMMARY" chapter.
     You  might  want to have a look  there first, and then perhaps go
     back to the main body of the paper to see the supporting evidence
     of the points I made.


                    WHAT ARE C AND PASCAL, ANYWAY?

   If  you  think about it, SPL is  a very unusual language indeed. To
the  best of my knowledge, there is exactly one SPL compiler available
anywhere,  on any computer (eventually, the independent SPLash! may be
available  on  Spectrum,  but  that is another story).  I can say "SPL
supports  this"  or  "SPL  can't  do that"  and, excepting differences
between  one chronological version of SPL  and the next, be absolutely
precise  and  objectively  verifiable.  SPL  can be  said to "support"
something  only  because  there  is  only one SPL  compiler that we're
talking about.

   To  say  "PASCAL  can  do  X" is a  chancy proposition indeed. ANSI
Standard  PASCAL  doesn't  support  variable-length strings,  but most
modern  PASCAL implementations, including HP PASCAL, have some sort of
string  mechanism.  What about HP's new  PASCAL/XL, reputed to be even
more powerful still? Similarly, with C, there are the old "Kernighan &
Ritchie"  C, the proposed new ANSI standard  C, whatever it is that HP
uses on the Spectrum, AND whatever you use on the 3000, which might be
CCS's C compiler or Tymlabs' C.

   On  the one hand, I contemplated  comparing standard C and standard
PASCAL.  This  is  easier  for  me,  and  it  also makes  sense from a
portability  point  of  view  (if  you want it  to be portable, you're
better off using the standard, anyway).

   On  the other hand, portability is  fine and dandy, but most people
aren't  going  to  be porting their software  any further than from an
MPE/XL  machine to an MPE/V machine and  back. As long as you stick to
HP3000s, you have the full power of so-called "HP PASCAL", an extended
superset  of  PASCAL that's supported on  3000s, 1000s, 9000s, and the
rest; it's hardly fair (or practical) to ignore this in a comparison.

   Finally,   what  about  PASCAL/XL?  It'll  have  even  more  useful
features,  but  they may not be ported  back to the MPE/V machines, at
least for a while. Should I then compare PASCAL/XL and C/XL, a
representative contest for the XL machines, but not necessarily for
MPE V machines, and certainly not if you really want to port your
software onto other machines?

   This  is  all,  incidentally,  aggravated  by  the  fact  that HP's
extensions  to  PASCAL are more substantial  than its extensions to C;
thus,  comparing  the  "standards"  is  likely  to  put  PASCAL  in  a
relatively  worse  light  than comparing "supersets"  (not to say that
PASCAL is worse than C in either case).

   Faced  with  all  this,  I've  decided  to compare  everything with
everything else. There are actually 7 different compilers I discuss at
one time or another:

   * SPL.
     There's only one, thank God.

   * Standard PASCAL.
     This  is  the original ANSI Standard,  on which all other PASCALs
     are  based.  This  is  also very similar to  Level 0 ISO Standard
     PASCAL (see next item).

   * Level 1 ISO Standard PASCAL.
     This  standard,  put out in the  early 1980's, supports so-called
     CONFORMANT  ARRAY  parameters (see the  DATA STRUCTURES chapter).
     The  same standard document defined "Level 0 ISO Standard PASCAL"
     to   be   much  like  classic  "Standard  PASCAL",  i.e.  without
     conformant  arrays.  Compiler  writers  were given  the choice of
     which  one to implement, and it isn't obvious how popular Level 1
     ISO  Standard  will be. When I say  "Standard PASCAL", I mean the
     original  standard, which is almost identical  to the ISO Level 0
     Standard.

   * PASCAL/3000.
     This is HP's implementation of PASCAL on the pre-Spectrum HP3000.
     Although the Spectrum machines will also be called 3000's, when I
     say  PASCAL/3000 I mean the  pre-Spectrum version. PASCAL/3000 is
     itself  a superset of HP Pascal,  which is also implemented by HP
     on  HP  1000s  and  HP  9000s.  PASCAL/3000 is a  superset of the
     original Standard PASCAL, not the ISO Level 1 Standard.

   * PASCAL/XL.
     This  is  HP's  implementation  of  PASCAL on  the Spectrum. It's
     essentially  a  superset of both PASCAL/3000  and the ISO Level 1
     Standard.

   * Kernighan & Ritchie (K&R) C.
     This  is the C described by Brian Kernighan and Dennis Ritchie in
     their  now-classic  book "The C  Programming Language" (which, in
     fact,  is usually called "Kernighan and Ritchie"). Although never
     an  official standard, it is  quite representative of most modern
     C's.  In  fact,  for  practical  purposes, it can  be said that a
     program  written  in  K  &  R  C  is portable to  virtually any C
     implementation  (assuming you avoid those  things that K&R itself
     describes as implementation-dependent).

   * Draft ANSI Standard C.
     ANSI is now working on codifying a standard of C, which will have
     some  (but not very many) improvements over K&R. My reference for
     this  was Harbison & Steele's book "C: A Reference Manual", which
     also discusses various other implementations of C. Although Draft
     ANSI  Standard  C  is  Standard,  it  is also Draft.  Some of the
     features  described in it are  implemented virtually nowhere, and
it's not clear how many of them C/XL will include.

   Matters  are  further  complicated,  of  course, by the  lack of an
HP-provided C compiler on the pre-Spectrum HP3000. The compiler I used
to  research  this  paper  is  CCS Inc.'s C/3000  compiler, which is a
superset of K&R C and a subset of Draft ANSI Standard C. The most
conspicuous  Draft Standard feature that  CCS C/3000 lacks is Function
Prototypes  --  an  understandable  lack  since virtually  all other C
compilers don't have them, either.

   Whenever  any  difference  exists  between  any of the  PASCAL or C
versions, I try to point it out. Which versions you compare is up to
you:

   * You can compare Standard PASCAL and K&R C.
     If it isn't in these general standards that everybody implements,
     you're unlikely to get much portability.

   * You can compare PASCAL/XL and Draft ANSI Standard C.
     These are the compilers that will most likely be available on the
     Spectrum.

   * You can compare PASCAL/3000 and Draft ANSI Standard or K&R C.
     Even  though you might not usually care about porting to, say, an
     IBM  or a VAX, you may very seriously care about porting from the
     pre-Spectrum  to the Spectrum and  vice versa. HP hasn't promised
     to  port  PASCAL/XL back to the  pre-Spectrums, so PASCAL/3000 is
     probably the lowest common denominator.

SPL  is  nice.  At  least  until  SPLash!'s  promised Native  Mode SPL
compiler  comes  out,  there's only one SPL  compiler to compare with.
This makes me very happy.


                        ARE C, PASCAL, AND SPL
                      FUNDAMENTALLY DIFFERENT OR
                         FUNDAMENTALLY ALIKE?

   In my opinion, they are definitely FUNDAMENTALLY ALIKE. In the rest
of the paper, I'll tell you all about their differences, but those are
EXCEPTIONS in their fundamental similarity.

   Why  do  I  think so? Well, virtually  every important construct in
any of the three languages has an almost exact parallel in the
other two (the only exception being, perhaps, record structures, which
SPL doesn't have).

   *  All  three languages emphasize writing your  program as a set of
     re-usable,  parameterized  procedures  or  functions  (which, for
     instance, COBOL 74 and most BASICs do not);

   *  All three languages share virtually the same rich set of control
     structures (which neither FORTRAN/IV nor BASIC/3000 possesses).

   *  The languages may on the surface LOOK somewhat different (PASCAL
     and  C certainly do), but remember  that the ESSENCE is virtually
     identical  --  PASCAL may say "BEGIN" and  "END" where C says "{"
     and "}", but that's hardly a SUBSTANTIVE difference.

   Despite  all  the  differences  which  I'll  spend all  these pages
describing  --  and  I  think many of the  differences are indeed very
important  ones -- I still think that  SPL, PASCAL, and C are about as
close to each other as languages get.


              SO, WHICH IS BETTER -- C, PASCAL, OR SPL?

   You  think  I'm  going  to answer that? With  all my pretensions to
objectivity,  and dozens of angry language fanatics ready to berate me
for choosing the "wrong one"?

   The  main purpose of this paper is  to show you all the differences
and  let  you  decide  for  yourselves;  after all, there  are so many
parameters  (how portable do you want the  code to be? how much do you
care  about  error  checking?)  that  are  involved  in  this  sort of
decision.

   The  closest  I  come to actually saying which  is better is in the
"SUMMARY" chapter (at the very end of the paper); there I explain what
I  think the major drawbacks and advantages of each language are. Look
there,  but remember -- only you can decide which language is best for
your purposes.


                   TECHNICAL NOTE ABOUT C EXAMPLES

   In  case  you  didn't  know,  C  differentiates between  upper- and
lower-case.  The  variables "file" and "FILE"  are quite different, as
are  "file",  "File", and "fILE". (In SPL  and PASCAL, of course, case
differences are irrelevant; all of the just-given names would refer to
the same variable.)
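
   A tiny fragment of my own -- legal C, if not exactly wise C -- shows
what this means in practice:

   int file, File, FILE;   /* three DIFFERENT variables */

   file = 1;
   File = 2;
   FILE = 3;               /* file and File still hold 1 and 2 */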

   In  fact,  in  C  programs the majority of  all objects -- reserved
words,  procedure  names,  variables,  etc.  --  are  lower-case.  The
reserved  words ("if", "while", "for", "int", etc.) are required to be
lower-case  by  the  standard;  theoretically,  you can  name all your
variables  and  procedures  in upper-case, but  most C programmers use
lower-case  for them, too (although  they can sometimes use upper-case
variable names as well, perhaps to indicate their own defined types or
#define macros).

   This  is  why  all  the  examples  of C programs  in this paper are
written  in lower-case. The one exception to this is when I refer to a
C  object -- a variable, a procedure, or a reserved word -- within the
text of a paragraph. Then, I'll often capitalize it to set it off from
the rest of the paper, to wit:

   proc (i, j)
   int i, j;
   {
   if (i == j)
     ...
   }

   The procedure PROC takes two parameters, I and J.
   The IF statement checks whether they're equal, ....

   The  fact  that  I refer to them in  upper-case in the text doesn't
mean  that  you should actually use upper-case  names. I just do it to
make the text more readable.

   Another  example  of  how a little lie  can help reveal the greater
truth...


                           ACKNOWLEDGMENTS

   I'd  like to thank the following people for their great help in the
writing of this paper:

   *  CCS, Inc., authors of CCS  C/3000, a C compiler for pre-Spectrum
     HP3000s.  All the research and testing of the C examples given in
     this   paper   was   done  using  their  excellent  compiler.  In
     particular, I'd also like to thank Tim Chase, who gave me a great
     deal of help on some of the details of the C language.

   *  Steve Hoogheem of the HP Migration Center, who served as liaison
      between me and the PASCAL/XL lab in answering my questions about
      PASCAL/XL.

   *  Mr. Tom Plum (of Plum Hall, Cardiff, NJ), a recognized C expert
      and member of the Draft ANSI Standard C committee, who was kind
      enough to answer many of the questions that I had about the
      Draft Standard.

   *  Dennis Mitrzyk, of Hewlett-Packard, who helped me obtain much of
     my  PASCAL/XL information, and who was also kind enough to review
     this paper.

   *  Joseph Brothers, David Greer (of  Robelle), Dave Lange and Roger
     Morsch  (of State Farm Insurance), and Mark Wallace (of Robinson,
     Wallace,  and  Company),  all  of  whom  reviewed  the  paper and
     provided a lot of useful input and corrections.


                          CONTROL STRUCTURES

   GOTOs,  some  say,  are  Considered  Harmful. Perhaps  they are and
perhaps  they are not. But the major reason for the control structures
that  PASCAL  and  C  provide  (as opposed to,  say, FORTRAN IV, which
doesn't)  is not that they replace GOTOs, but rather that they replace
them  with  something  more  convenient.  If given  the choice between
saying

   IF FNUM = 0 THEN
     PRINTERROR
   ELSE
     BEGIN
     READFILE;
     FCLOSE (FNUM, 0, 0);
     END;

and

   IF FNUM <> 0 THEN GOTO 10;
     PRINTERROR;
     GOTO 20;
  10:
     READFILE;
     FCLOSE (FNUM, 0, 0);
  20:

then  I would choose the former. IF-THEN-ELSE is a common construct in
all  of  the algorithms we write, and  it's easier for both the writer
and  the reader to have a language construct that directly corresponds
to it.

   C and PASCAL share some of the fundamental control structures. Both
have


   * IF-THEN-ELSEs. They look slightly different:

       IF FNUM=0 THEN        { PASCAL }
         PRINTERROR
       ELSE
         BEGIN
         READFILE;
         FCLOSE (FNUM, 0, 0);
         END;

     and

       if (fnum==0)          /* C */
         printerror;         /* note the semicolon */
       else
         {
         readfile;
         fclose (fnum, 0, 0);
         }

     but  I hardly think the  difference very substantial. There'll be
     some who forever curse C for using lower-case or PASCAL for using
     such  L-O-N-G reserved words, like "BEGIN"  and "END"; I can live
     with either.


   * WHILE-DOs, although again there are some minor differences

       WHILE GETREC (FNUM, RECORD) DO
         PRINTREC (RECORD);

     vs.

       while (getrec (fnum, record))
         printrec (record);

   * DO-UNTILs:

       REPEAT
         GETREC (FNUM, RECORD);
         PRINTREC (RECORD);
       UNTIL
         NOMORERECS (FNUM);

     and

       do
         {
         getrec (fnum, record);
         printrec (record);
         }
       while
         (!nomorerecs (fnum));    /* "!" means "NOT" */

     Note  that  PASCAL  has  a  DO-UNTIL  and  C has  a DO-WHILE. Big
     difference.


   *  And, finally, C's and  PASCAL's procedure support is comparable,
     as well.

The  interesting  things,  of  course,  are the points  at which C and
PASCAL differ. There are some, and for those of us who thought that
IF-THEN-ELSE  and  WHILE-DO are all the  control structures we'll ever
need, the differences can be quite surprising.


         THE "WHILE" LOOP AND ITS LIMITATIONS; THE "FOR" LOOP

   It is, indeed, true that all iterative constructs can be emulated
with  the WHILE-DO loop. On the other hand, why do the work if someone
else can do it for you?

   The  PASCAL FOR loop -- a child  of FORTRAN's DO -- is actually not
that hard to emulate:

   FOR I:=1 TO 9 DO
     WRITELN (I);

is identical, of course, to

   I:=1;
   WHILE I<=9 DO
     BEGIN
     WRITELN (I);
     I:=I+1;
     END;

Not  such  a  vast savings, but, still,  the FOR loop definitely looks
nicer.

   Unfortunately,  for  all  the savings that the  FOR loop gives you,
I've  found  that  it's  not as useful as  one might, at first glance,
believe.  This is because it ALWAYS  loops through all the values from
the  start to the limit. How often do you need to do that, rather than
loop  until  EITHER a limit is reached  OR another condition is found?
String  searching, for instance -- you want to loop until the index is
at  the  end of the string OR  you've found what you're searching for.
Always looping until the end is wasteful and inconvenient.

   Looking  through my MPEX source code, incidentally, I find 53 WHILE
loops  and  8  FOR loops. In my RL, the  numbers are 170 WHILEs and 38
FORs (at least 6 of these FORs should have been WHILEs if I weren't so
lazy).  (How's  that  for  an  argument -- I don't  use it, ERGO it is
useless.  I'm rather proud of it.)  In any case, though, my experience
has been that

   * THE PURE "FOR" LOOP -- A LOOP THAT ALWAYS GOES ON UNTIL THE LIMIT
     HAS  BEEN  REACHED  --  IS  NOT  AS COMMON AS  ONE MIGHT THINK IN
     BUSINESS AND SYSTEM PROGRAMS (although scientific and engineering
     applications,  which often handle matrices and such, use pure FOR
     loops more often). MORE OFTEN YOU WANT TO ALSO SPECIFY AN "UNTIL"
     CONDITION WHICH WILL ALSO TERMINATE THE LOOP.

   What I wanted, then, was simple -- a loop that looked like

   FOR I:=START TO END UNTIL CONDITION DO

For instance,

   FOR I:=1 TO STRLEN(S) UNTIL S[I]=C DO;

or

   FOR I:=1 TO STRLEN(S) WHILE S[I]=' ' DO;

What I got -- and I'm not sure if I'm sorry I asked or not -- is the C
FOR loop:

   for (i=1;  i<=strlen(s) && s[i]!=c;   i=i+1)
     ;

The  C FOR loop -- like most  things in C, accomplished with a minimum
of letters and a maximum of special characters -- looks like this:

   for (initcode; testcode; inccode)
     statement;

It is functionally identical to

   initcode;
   while (testcode)
     {
     statement;
     inccode;
     }

In  other  words,  this is a sort of  "build-your-own" FOR loop -- YOU
specify the initialization, the termination test, and the "STEP". This
is   actually  quite  useful  for  loops  that  don't  involve  simple
incrementing, such as stepping through a linked list:

   for (ptr=listhead;   ptr!=NULL;   ptr=ptr->next)
     fondle (ptr);

The  above loop, of course, fondles  every element of the linked list,
something  quite  analogous to what an  ordinary PASCAL FOR loop would
do, but with a different kind of "stepping" action.
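
   To make that fragment self-contained, here's a guess at the sort of
declarations it assumes (the type NODE, the field NEXT, and FONDLE
itself are, of course, all hypothetical names of mine):

   #include <stdio.h>         /* for NULL */

   struct node
      {
      int data;
      struct node *next;      /* NULL marks the end of the list */
      };

   fondle_them_all (listhead)
   struct node *listhead;
   {
   struct node *ptr;

   for (ptr = listhead; ptr != NULL; ptr = ptr->next)
      fondle (ptr);           /* whatever "fondling" might involve */
   }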

   The standard PASCAL loop, of course, can easily be emulated --

   for (i=start;   i<=limit;   i=i+1)
     statements;

   I'm  sure it would be fair to conclude that C's FOR loop is clearly
more  powerful than PASCAL's. On the other  hand, a WHILE loop is more
powerful  than  a  FOR loop, too; and, a  GOTO is the most powerful of
them  all  (heresy!).  The  reason  a  PASCAL FOR loop  -- or for that
matter,  a  C FOR loop -- is good  is because simply by looking at it,
you can clearly see that it is a WHILE loop of a particular kind, with
clearly evident starting, terminating, and stepping operations.

   The  major argument that may be made against C's for loop is simply
one of clarity. Possible reasons include:

   *  The loop variable has to be  repeated four (or three, if you use
     "i++" instead of "i=i+1") times.

   *  The  semicolons,  adequate to delimit the  three clauses for the
     compiler,  may not sufficiently delimit them to a human reader --
     it  may  not  be  instantly  obvious where one  clause starts and
     another ends.

   *  Also,  the  very  use of semicolons  instead of control keywords
     (like  "TO")  may  be  irritating; in a way,  it's like having to
     write

       FOR I,1,100

     instead of

       FOR I:=1 TO 100

     If  you think the first version  isn't any worse than the second,
     you  shouldn't mind C; some, however, find "FOR I,1,100" slightly
     less clear than "FOR I:=1 TO 100".

   for (i=1; i<=10; i=i+1)         FOR I:=1 TO 10 DO

       or, alternatively

   for (i=1; i<=10; i++)           FOR I:=1 TO 10 DO

Which  do you prefer? Frankly, for  me, the PASCAL version is somewhat
clearer,  although  I'm not prepared to say  that the clarity is worth
the  cost in power. On the other hand, many a C programmer doesn't see
any  advantage in the PASCAL style,  and perhaps there isn't any. Some
of the C/PASCAL differences, I'm afraid, boil down to simply this.


     THE WHILE LOOP AND ITS LIMITATIONS -- AN INTERESTING PROBLEM

   Consider the following simple task -- you want to read a file until
you  get a record whose first character  is a "*"; for each record you
read,  you want to execute some  statements. Your PASCAL program might
look like this:

   READLN (F, REC);
   WHILE REC[1]<>'*' DO
     BEGIN
     PROCESS_RECORD_A (REC);
     PROCESS_RECORD_B (REC);
     PROCESS_RECORD_C (REC);
     READLN (F, REC);
     END;

All  well and good? But, wait a minute  -- we had to repeat the READLN
statement  a second time at the end of the WHILE loop. "Lazy bum," you
might  reply.  "Can't handle typing an extra  line." Well, what if, in
order  to  get  the  record, we had to do  more than just a READLN? We
might need to, say, call FCONTROL before doing the READLN, and perhaps
have  a  more complicated loop test. Our  program might end up looking
like:

   FCONTROL (FNUM(F), EXTENDED_READ, DUMMY);
   FCONTROL (FNUM(F), SET_TIMOUT, TIMEOUT);
   READLN (F, REC);
   GETFIELD (REC, 3, FIELD3);
   WHILE FIELD3<>'*' DO
     BEGIN
     PROCESS_RECORD_A (REC);
     PROCESS_RECORD_B (REC);
     PROCESS_RECORD_C (REC);
     FCONTROL (FNUM(F), EXTENDED_READ, DUMMY);
     FCONTROL (FNUM(F), SET_TIMOUT, TIMEOUT);
     READLN (F, REC);
     GETFIELD (REC, 3, FIELD3);
     END;

This  is not a happy-looking program. We had to duplicate a good chunk
of code, with all the resultant perils of such a duplication; the code
was harder to write, it's now harder to read, and when we maintain it,
we're  liable to change one of the occurrences of the code and not the
other.

   Workarounds, of course, exist. We can say

   REPEAT
     FCONTROL (FNUM(F), EXTENDED_READ, DUMMY);
     FCONTROL (FNUM(F), SET_TIMOUT, TIMEOUT);
     READLN (F, REC);
     GETFIELD (REC, 3, FIELD3);
     IF FIELD3 <> '*' THEN
       BEGIN
       PROCESS_RECORD_A (REC);
       PROCESS_RECORD_B (REC);
       PROCESS_RECORD_C (REC);
       END;
   UNTIL
     FIELD3 = '*';

although  this  is  also rather messy -- we've  had to repeat the loop
termination  condition,  and  the resulting code  is really a WHILE-DO
loop masquerading as a REPEAT-UNTIL.

   Some might reply that what we ought to do is to move the FCONTROLs,
READLN,  and  GETFIELD into a separate  function that returns just the
value  of FIELD3, or perhaps even the loop test (FIELD3 <> '*'). Then,
the loop would look like:

   WHILE FCONTROLS_READLN_AND_GETFIELD_CHECK_STAR (FNUM, REC) DO
     BEGIN
     PROCESS_RECORD_A (REC);
     PROCESS_RECORD_B (REC);
     PROCESS_RECORD_C (REC);
     END;

This,  indeed, does look nice -- but are we to be expected to create a
new procedure every time a control structure doesn't work like we want
it  to? I like procedures just as much as the next man; in fact, I'm a
lot  more  prone  to pull code out into  procedures than others are (I
like  my procedures to be twenty lines or shorter). On the other hand,
what   if  someone  said  that  you  couldn't  use  BEGIN  ..  END  in
IF/THEN/ELSE  statements  -- if you want to  do more than one thing in
the THEN clause, you have to write a procedure?

   C  has  some  advantage  here.  With C's "comma"  operator, you can
combine  any  number  of  statements  (with some  restrictions) into a
single  expression, whose result is the last value. Thus, what you can
do is something like this:

   while ((fcontrol (fnum(f), extended_read, dummy),
           fcontrol (fnum(f), set_timeout, timeout),
           gets (f, rec, 80),
           getfield (rec, 3, field3),
           field3 != '*'))
     {
     process_record_a (rec);
     process_record_b (rec);
     process_record_c (rec);
     };

Whether this is better or not, you decide. The "comma" construct can
be very confusing. In "while ((...))", the outside parentheses are the
WHILE's; the inner pair is the comma construct's; and all others
belong to internal expressions and function calls. Additionally, you
have to keep track of which commas belong to the function calls and
which delimit the comma construct's elements.
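
   Stripped of the WHILE, the comma operator itself is simple enough:
each expression is evaluated left to right, and the value of the whole
parenthesized group is the value of the last one. A made-up fragment:

   int i, x;

   i = 3;
   x = (i = i + 1, i * 10);   /* i becomes 4; x gets the LAST value, 40 */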

   What  is  that  slithering  underfoot? Could it  be the serpent? He
proposes this:

   WHILE TRUE DO
     BEGIN
     FCONTROL (FNUM(F), EXTENDED_READ, DUMMY);
     FCONTROL (FNUM(F), SET_TIMOUT, TIMEOUT);
     READLN (F, REC);
     GETFIELD (REC, 3, FIELD3);
      IF FIELD3='*' THEN GOTO 99;
     PROCESS_RECORD_A (REC);
     PROCESS_RECORD_B (REC);
     PROCESS_RECORD_C (REC);
     END;
  99:

"Sssimple and ssstraightforward, madam.  Won't you have a bite?" Shame
on  you!  Still, it's not obvious that  the old faithful "GOTO" isn't,
relatively  speaking,  a  reasonable solution. C  has its own variant,
that lets us get away without using the "forbidden word":

   while (TRUE)
     {
     fcontrol (fnum(f), extended_read, dummy);
     fcontrol (fnum(f), set_timeout, timeout);
     gets (f, rec, 80);
     getfield (rec, 3, field3);
      if (field3 == '*') break;
     process_record_a (rec);
     process_record_b (rec);
     process_record_c (rec);
     };

C's  "BREAK"  construct  gets  you  out  of the  construct that you're
immediately  in,  be  it  a  WHILE  loop  (as in this  case), a SWITCH
statement  (in  which it is vital), a FOR,  or a DO. If you believe in
the evil of GOTOs, you probably won't much like BREAKs; again, though,
I  ask -- is the above example any  less muddled than the other ones I
showed?
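
   Since SWITCH hasn't actually appeared in this paper yet, here's a
made-up fragment (CODE and the HANDLE_xxx procedures are hypothetical)
showing why BREAK is vital there -- leave one out and control simply
"falls through" into the next CASE:

   switch (code)
      {
      case 1:
         handle_one ();
         break;            /* without this, we'd fall into case 2 */
      case 2:
         handle_two ();
         break;
      default:
         handle_other ();
      }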

   Incidentally,  the best approach that I've seen so far comes from a
certain  awful, barbarian language called  FORTH (OK, all you FORTHies
--  meet  me  in  the  alley  after the talk and  we can have it out).
Translated into civilized terms, the loop looked something like this:

   DO
     FCONTROL (FNUM(F), EXTENDED_READ, DUMMY);
     FCONTROL (FNUM(F), SET_TIMOUT, TIMEOUT);
     READLN (F, REC);
     GETFIELD (REC, 3, FIELD3);
   WHILE FIELD3<>'*'
     PROCESS_RECORD_A (REC);
     PROCESS_RECORD_B (REC);
     PROCESS_RECORD_C (REC);
   ENDDO;

This  so-called  "loop-and-a-half"  solves  what  I  think is  the key
problem,  present  in so many WHILE loops  -- that the condition often
takes  more than a single expression  to calculate. Well, in any case,
neither SPL, PASCAL, nor C has such a construct, so that's that.


       BREAK, CONTINUE, AND RETURN -- PERFECTION OR PERVERSION?

   As  I  mentioned  briefly in the last  section, C has three control
structures  that  PASCAL  does  not,  and  some say  should not. These
structures,  Comrade,  are Ideologically Suspect.  A Dangerous Heresy.
Still, they're there, and ought to be briefly discussed.

   * BREAK -- exits the immediately enclosing loop (WHILE, DO, or FOR)
     or  a  SWITCH  statement.  Essentially  a  GOTO to  the statement
     immediately following the loop.

   *  CONTINUE -- goes to the "next iteration" of the immediately
      enclosing loop (WHILE, DO, or FOR); a short sketch of CONTINUE
      and RETURN follows this list.

   *  RETURN  --  exits  the current  procedure. "RETURN <expression>"
     exits  the current procedure, returning the value of <expression>
     as the procedure's result.

   * Of course, GOTO, the old faithful.
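
   A made-up sketch of the middle two (SUM_POSITIVE, which adds up the
positive entries of an array and gives up entirely when it hits a
zero, is purely hypothetical):

   sum_positive (a, n)
   int a[], n;
   {
   int i, sum;

   sum = 0;
   for (i = 0; i < n; i = i + 1)
      {
      if (a[i] == 0)
         return (sum);     /* RETURN: leave the procedure, with a result */
      if (a[i] < 0)
         continue;         /* CONTINUE: go on to the next iteration */
      sum = sum + a[i];
      }
   return (sum);
   }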

   Now,  as  you  may  or  may not recall, a  while ago there was much
argument  made against GOTOs. Instead of GOTOs, it was said, you ought
to   use   only   IF-THEN-ELSEs   and   WHILE-DOs.  CASEs,  FORs,  and
REPEAT-UNTILs,  being  just variants of  the other control structures,
were  all  right;  but  GOTOs  were  condemned,  on several  very good
grounds:

   *  First of all, with GOTOs, the "shape" of a procedure stops being
     evident. If you don't use GOTOs, each procedure and block of code
     will  have only one ENTRY and only  one EXIT. This means that you
     can  always  assume  that  control  will  always  flow  from  the
     beginning  to  the  end, with iterations  and departures that are
     always  clearly  defined and the conditions  for which are always
     evident.

   *  If  you avoid GOTOs, then for  any statement, you can tell under
     what  conditions  it  will  be  executed  just by  looking at the
     control structures within which it is enclosed.

These  concerns,  I  would  say,  may  apply  equally well  to BREAKs,
CONTINUEs, and RETURNs.

   Personally,  I must confess, I don't use  GOTOs. I don't know if it
is  the  appeal  of  reason, the lesson of  experience, or fear for my
immortal  soul.  About  five years ago I  resolved to stop using them;
except  for  "long  jumps"  (which I'll talk about  more later), I use
GOTOs  in 1 procedure of MPEX's 40  procedures, and in 2 procedures of
my  RL's 350 (both of the uses  of "GOTO" are as "RETURN" statements).
However, I must say that in many cases the temptation does seem great.

   Consider,  for  a  moment,  the following case. We  need to write a
procedure  that opens a file, reads some records, writes some records,
and  closes  the  file.  In case any of  the file operations fails, we
should  immediately  close  the  file  and  not do  anything else. The
"GOTO-less" solution:

   munch_file (f)
   char f[40];
   {
   int fnum;

   fnum = fopen (f, 1);
   if (error == 0)             /* let's say ERROR is an error code */
      {
      freaddir (fnum, buffer, 128, rec_a);
      if (error == 0)
         {
         munch_record_one_way (buffer);
         fwritedir (fnum, buffer, 128, rec_a);
         if (error == 0)
            {
            freaddir (fnum, buffer, 128, rec_b);
            if (error == 0)
               {
               munch_record_another_way (buffer);
               fwritedir (fnum, buffer, 128, rec_b);
               if (error == 0)
                  some_more_stuff;
               }
            }
         }
      }
   fclose (fnum, 0, 0);
   }

Or, using "GOTO":

   munch_file (f)
   char f[40];
   {
   int fnum;
   #define check_error    if (error != 0) goto done

   fnum = fopen (f, 1);
   if (error == 0)
      {
      freaddir (fnum, buffer, 128, rec_a);
      check_error;
      munch_record_one_way (buffer);
      fwritedir (fnum, buffer, 128, rec_a);
      check_error;
      freaddir (fnum, buffer, 128, rec_b);
      check_error;
      munch_record_another_way (buffer);
      fwritedir (fnum, buffer, 128, rec_b);
      check_error;
      some_more_stuff;
      }

   done:
   fclose (fnum, 0, 0);
   }

Is the latter way really worse? I'm not so sure. Also, I can't see any
way  in which I can rewrite  this example without GOTOs without making
it as cumbersome as the first case.

   Similar  examples  can  be  found  for  BREAK  and RETURN.  If, for
instance,  I  wasn't required to close the  file, I'd just do a RETURN
instead  of doing the "GOTO DONE"; if  I had to loop through the file,
my code might look something like:

   framastatify (f)
   char f[40];
   {
   int fnum;

   fnum = fopen (f, 1);
   if (error == 0)
      {
      while (TRUE)
         {
         fread (fnum, rec1, 128);
         if (error != 0) break;
         if (frob_a (rec1) == failed)
            break;
         fupdate (fnum, rec1, 128);
         if (error != 0) break;
         freadlabel (fnum, rec1, 128, 0);
         if (error != 0) break;
         if (twiddle_label (rec1) == failed)
            break;
         fwritelabel (fnum, rec1, 128, 0);
         if (error != 0) break;
         fspace (fnum, 20);
         if (error != 0) break;
         }
      fclose (fnum, 0, 0);
      }
   }

Just IMAGINE all those IFs you'd need to nest if you avoided BREAK!

   CONTINUE,  on the other hand, is  a vile heresy. Everybody who uses
CONTINUE should be burned at the stake.

   To  summarize, "C Notes, A Guide  to the C Programming Language" by
C.T. Zahn (Yourdon 1979) says:

   "In practice, BREAK is needed rarely, CONTINUE never, and GOTO even
    less  often  than that...   It also is  good style to minimize the
    number  of  RETURN  statements;  exactly  one  at  the end  of the
    function is best of all for readability."

On the other hand, I say

   "If this be treason, make the most of it!"

Especially   if   your  procedures  are  short  enough  and  otherwise
well-written  enough, I think that you can well make the judgment that
even  with  the introduction of GOTOs, the  control flow will still be
clear enough.

   Just don't tell anyone I told you to do it.


                  LONG JUMPS -- PROBLEM AND SOLUTION

   Modern structured programming encourages FACTORING. Your algorithm,
it  says, should be broken up into small procedures, small enough that
each one can be easily understood and digested by anybody reading it.

   I'm  quite  fond  of  factoring myself, and you'll  find most of my
procedures  to  be  about 20-odd lines long or  shorter. I try to make
each procedure a "black box", with a well-defined, atomic function and
no  unobvious  side effects. Naturally, with  procedures this small, I
often  end  up  going  several levels of procedure  calls deep to do a
relatively simple task.

   For  instance, I might have a procedure called ALTFILE that takes a
file  name  and a string of keywords indicating  how the file is to be
altered:

   * ALTFILE calls PARSE_KEYWORDS to parse the keyword string;

   *  PARSE_KEYWORDS  separates  the string  into individual keywords,
     calling PROCESS_KEYWORD for each one;

   * PROCESS_KEYWORD figures out what keyword is being referenced, and
     calls   a   parsing   routine   --   PARSE_INTEGER,   PARSE_DATE,
     PARSE_INT_ARRAY,  etc. -- depending on the  type of the value the
     user specified;

   *  PARSE_INT_ARRAY takes a list of integer values delimited by, say
     ":"s, and calls PARSE_INTEGER for each one.

   *  PARSE_INTEGER  converts  the  text string  containing an integer
     value into a number and returns the numeric value.

Not  a  far-fetched  example,  you  must  agree;  in fact,  many of my
programs  (e.g.  MPEX's  %ALTFILE  parser) nest even  deeper. Now, the
question  arises -- what if PARSE_INTEGER  realizes that the value the
user specified isn't a valid number after all?

   The solution seems clear -- PARSE_INTEGER, in addition to returning
the integer's value, also returns a true/false flag indicating whether
or  not  the  value was actually valid.  PARSE_INTEGER returns this to
PARSE_INT_ARRAY;  now,  PARSE_INT_ARRAY  realizes  that  its parameter
isn't  a valid integer array --  it must also return a success/failure
flag  to  PROCESS_KEYWORD;  PROCESS_KEYWORD  must  pass it  back up to
PARSE_KEYWORDS;  PARSE_KEYWORDS should return  it to ALTFILE; finally,
ALTFILE informs its caller that the operation failed.

   Let's  look  at  a particular specimen of  one of these procedures;
say,  the  portion  that  handles  the keyword FOOBAR,  which the user
should specify in conjunction with an integer array, a string, and two
dates:

   ...
   IF KEYWORD="FOOBAR" THEN
     BEGIN
     GET_SUBPARM (0, PARM_STRING);
      IF PARSE_INT_ARRAY (PARM_STRING, SP0_VALUE) = FALSE THEN
        PROCESS_KEYWORD:=FALSE
      ELSE
        BEGIN
        GET_SUBPARM (1, PARM_STRING);
        IF PARSE_STRING (PARM_STRING, SP1_VALUE) = FALSE THEN
          PROCESS_KEYWORD:=FALSE
        ELSE
          BEGIN
          GET_SUBPARM (2, PARM_STRING);
          IF PARSE_DATE (PARM_STRING, SP2_VALUE) = FALSE THEN
            PROCESS_KEYWORD:=FALSE
          ELSE
            BEGIN
            GET_SUBPARM (3, PARM_STRING);
            PROCESS_KEYWORD:=
              PARSE_DATE (PARM_STRING, SP3_VALUE);
           END;
         END;
       END;
     END;
   ...

Of  course,  the  same  sort  of  thing  has  to be  repeated in every
procedure  in  the  calling  sequence;  the moment an  error return is
detected from one of the called procedures, the other calls have to be
skipped,  and  the  error  condition  should be passed  back up to the
caller.

   Error  handling,  of  course,  is important business,  and it would
hardly be appropriate to crash and burn just because the user inputs a
bad  value (users input bad values all the time). Still, all this work
just to catch the error condition?

   What we really want to do in this case is to

   * HAVE WHOEVER DETECTS THE ERROR CONDITION AUTOMATICALLY RETURN ALL
     THE WAY TO THE TOP OF THE CALLING SEQUENCE.

In other words, the error finder might have code that looks like:

   NUM:=BINARY (STR, LEN);
   IF CCODE<>CCE THEN                  { an error detected? }
     SIGNAL_ERROR;                     { return to the top! }

The  procedure we want to return to would indicate its desire to catch
these errors by saying something like:

   ON ERROR DO
     { the code to be activated when the error is detected };
   RESULT:=ALTFILE (FILE, KEYWORDS);

Finally,   the   intermediate  procedures  can  now  be  the  soul  of
simplicity:

   ...
   IF KEYWORD="FOOBAR" THEN
     BEGIN
     GET_SUBPARM (0, PARM_STRING);
     PARSE_INT_ARRAY (PARM_STRING, SP0_VALUE);
     GET_SUBPARM (1, PARM_STRING);
     PARSE_STRING (PARM_STRING, SP1_VALUE);
     GET_SUBPARM (2, PARM_STRING);
     PARSE_DATE (PARM_STRING, SP2_VALUE);
     GET_SUBPARM (3, PARM_STRING);
     PARSE_DATE (PARM_STRING, SP3_VALUE);
     END;
   ...

Thus, the three components of this scheme:

   * The code that finds the error -- it "SIGNALS THE ERROR";

   *  The code that should be branched  to in case of error is somehow
     indicated,  at compile time or run  time (but before the error is
     actually signaled).

   *  Finally, the intermediate code  knows nothing about the possible
     error condition. It's automatically exited by the error signaling
     mechanism.

For  want of a better name, I'll call this concept a "Long Jump". It's
also  been called a "non-local GOTO", a "throw", a "signal raise", and
other  unsavory  things, but "Long Jump" --  which happens to be the C
name for it -- sounds more romantic.


           LONG JUMPS, CONTINUED -- SOLUTIONS AND PROBLEMS

   I've  indicated the need -- or at least, I think it's a need -- and
a  possible  prototype solution. There  are several implementations of
this already extant, each with its own little quirks and problems.


                     PASCAL -- STANDARD AND /3000

   The  only  mechanism  Standard  PASCAL and PASCAL/3000  give you to
solve  our problem is the GOTO. In  PASCAL, you're allowed to GOTO out
of  a  procedure  or function; however, you  can only branch INTO the
main body of the program or from a nested procedure into the procedure
that contains it. In other words, if you have

   PROCEDURE P;
     PROCEDURE INSIDE_P;   { nested in P }
     BEGIN
     ...
     END;
   BEGIN
   ...
   END;

   PROCEDURE Q;
   BEGIN
   ...
   P;
   ...
   END;

then  you can branch from INSIDE_P into P, but you can't branch from P
into Q, even though Q calls P.

   Even if this restriction weren't present, the GOTO to a fixed label
still  wouldn't  be  the  right  answer -- what  if our PARSE_KEYWORDS
procedure  is called from two places? Surely we wouldn't want an error
condition  to  cause  a  branch  to  the same location  in both cases!
Besides,  if  we  want  to compile PARSE_KEYWORDS  separately from its
caller,  we'd  have  to  allow  "global label  variables". In reality,
PASCAL can't do these "long jumps".


                                 SPL

   SPL  has a different and rather  better facility. In SPL, you can't
branch  from one procedure into another; however, you CAN pass a label
as a parameter to a procedure. Thus, you could write:

   PROCEDURE PARSE'INT'ARRAY (PARM, RESULT, ERR'LABEL);
   BYTE ARRAY PARM;
   INTEGER ARRAY RESULT;
   LABEL ERR'LABEL;
   BEGIN
   ...
   IF << test for error condition >> THEN
     GOTO ERR'LABEL;
   ...
   END;

Then, you might call this from within PROCESS'KEYWORD by saying

   PROCEDURE PROCESS'KEYWORD (KEYWORD'AND'PARM, ERR'LABEL);
   BYTE ARRAY KEYWORD'AND'PARM;
   LABEL ERR'LABEL;
   BEGIN
   ...
   IF KEYWORD="FOOBAR" THEN
     BEGIN
     GET'SUBPARM (0, PARM'STRING);
     PARSE'INT'ARRAY (PARM'STRING, SP0'VALUE, ERR'LABEL);
     ...
     END;
   ...
   END;

When  you  call  PARSE'INT'ARRAY,  you  pass it the  label to which it
should return in case of error -- in this case, also called ERR'LABEL,
which  was  also  passed  to  this  procedure.  Finally,  the  topmost
procedure -- ALTFILE -- might say:

   RESULT:=ALTFILE (FILENAME, KEYWORDS, GOT'ERROR);
   ...
   GOT'ERROR:
     << report to the user that an error occurred >>

The  key  point  here  is  that  each procedure  doesn't really return
directly to the top; rather, it returns to the error label that it was
passed  by its caller. Since that may  well be the label passed by the
caller's  caller, and so on, you get a sort of "daisy chain" effect by
which  you  can  easily  exit  ten  levels  of procedures  in one GOTO
statement.

   At  this  point,  I think it's quite  important to mention a SEVERE
PROBLEM  of  these  "long  jumps"  that  I  think  any  implementation
mechanism has to be able to address:

   *  THE  VERY ESSENCE OF A LONG JUMP  IS THAT IT BYPASSES SEVERAL OF
     THE  PROCEDURES  IN  THE CALLING SEQUENCE.  A PROCEDURE (say, our
     PROCESS_KEYWORD) CALLS ANOTHER PROCEDURE, EXPECTING THE CALLEE TO
     RETURN, BUT THE CALLEE NEVER DOES!

   Imagine  for a moment that PROCESS_KEYWORD opened a file, intending
to  close it at the end of the operation; after the long jump branches
out  of  it,  the file will remain open.  Any other kind of cleanup --
resetting  temporarily  changed  global variables,  releasing acquired
resources  --  that a procedure expects to  do at the end might remain
undone because the procedure will be branched out of.

   Similarly,  what  if a procedure EXPECTS  another procedure that it
calls  to detect an error condition? What  is a fatal error under some
circumstances  may be quite normal under others; for instance, say you
have  a procedure that reads data from  a file and signals an error if
the  file couldn't be opened -- in some cases, you may expect the file
to be unopenable, and have a set of defaults you want to use instead.

   By using the convenience of long jumps, you lose the certainty that
every  procedure  has complete control over  its execution, and can be
sure that any procedure it calls will always return.

   The  advantage of SPL's approach is that you could call a procedure
passing   to   it   any   error  label  you  want  to.  For  instance,
PROCESS'KEYWORD might look like:

   PROCEDURE PROCESS'KEYWORD (KEYWORD'AND'PARM, ERR'LABEL);
   BYTE ARRAY KEYWORD'AND'PARM;
   LABEL ERR'LABEL;
   BEGIN
   INTEGER FNUM;
   FNUM:=FOPEN (KEY'INFO'FILE, 1);
   ...
   IF KEYWORD="FOOBAR" THEN
     BEGIN
     GET'SUBPARM (0, PARM'STRING);
     PARSE'INT'ARRAY (PARM'STRING, SP0'VALUE, CLOSE'FILE);
     ...
     END;
   ...
   RETURN;   << if we finished normally, just return >>
   CLOSE'FILE:   << branch here in case of error >>
   FCLOSE (FNUM, 0, 0);
   GOTO ERR'LABEL;
   END;

Because  you have complete control over each branch, you don't HAVE to
pass the procedure you call the same error label that you were passed;
if  you want to do some cleanup, you can just pass the label that does
the cleanup, and THEN returns to your own error label.

   Thus,  with SPL's label parameter system,  you get the best of both
worlds:

   *  If  you pass an "error label"  to a procedure, the procedure may
     choose to return normally or to return to the error label.

   *  Since you can pass the same label to a procedure you call as the
     one  that  you yourself were passed, a  single GOTO to that label
     can conceivably exit any number of levels of procedures.

   *  On the other hand, if you want  to do some cleanup in case of an
     error,  you  can just pass a different  label, one that points to
     the cleanup code.

   * Finally -- if you want to -- you can actually pass several labels
     to  a  procedure,  allowing  it  to  return  to  a  different one
     depending on what error condition it finds. A bit extravagant for
     my blood, but maybe I'm just too stodgy.

   The only problems that this system has are:

   *  You  have  to  pass  the  label  to  any  procedure  that  might
     conceivably  want  to  participate  in a long  jump -- either the
     procedure  that initially detects the error or any one that wants
     to  pass  it on. This may often  mean that virtually every one of
     your procedures will have to have this error label parameter. Not
     a very unpleasant problem, but a bit of a bother nonetheless.

   *  Similarly, there are some  procedures whose parameters you can't
     dictate; for instance, control-Y trap procedures (ones in which a
     long  jump  to the control-Y handling code  may often be just the
     thing  you  want  to  do).  Other  trap  procedures  (arithmetic,
     library,  and system) are just like this, too, as are those which
     are   themselves   passed  as  "procedure  parameters"  to  other
     procedures  and  whose  parameters  are  dictated by  those other
     procedures (got that?).

Besides these minor problems, though, SPL's long jump support is quite
reasonably done.


                       PROPOSED ANSI STANDARD C

   C's  "GOTO" doesn't allow any branch  from one function to another;
neither does C provide label parameters like SPL does. Long jumps in C
are  accomplished with a different mechanism, involving the SETJMP and
LONGJMP built-in procedures.

   SETJMP  is a procedure to which you pass a record structure (of the
predefined  type "jmp_buf"). When you first  call it, it saves all the
vital  statistics  of the program --  the current instruction pointer,
the  current  top-of-stack address, etc. --  in this record structure.
Then,  when  the  same record structure is  passed to LONGJMP, LONGJMP
uses  this  information  to restore the  instruction pointer and stack
pointer  to be exactly what they were at SETJMP time. Thus, control is
passed back to the SETJMP location, wherever it may be.

   A typical application of this might be:

   jmp_buf error_trapper;

   proc()
   {
   ...
   if (setjmp(error_trapper) != 0)
      /* do error processing */;
   else
      {
      result = altfile (filename, keywords);
      ...
      }
   ...
   }

   ...

   int parse_integer (str)
   char str[];
   {
   ...
   if (bad_value)
      longjmp (error_trapper, 1);
   ...
   }

   One thing I didn't mention at first, as you see, was the "IF
(SETJMP(ERROR_TRAPPER) != 0)". Well, since the LONGJMP jumps DIRECTLY
to  the instruction following the SETJMP, we  have to have some way of
distinguishing  the  first  time  it  is executed  (after a legitimate
SETJMP) and the next time (after the LONGJMP which transferred control
back to it). The initial SETJMP, you see, returns a 0; a LONGJMP takes
its  second  parameter  (in  this  case,  a 1), and  returns it as the
"result" of SETJMP.

   Thus,  when  the  IF statement is first  executed, the value of the
"(SETJMP  ... != 0)" will be FALSE, and the ALTFILE will be done; when
the IF is executed a second time, the value will be TRUE, and the
error processing will be performed.
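
   If you'd like to see this in action, here is a complete toy program
of my own devising -- nothing to do with ALTFILE -- that shows the two
different "returns" from SETJMP:

   #include <stdio.h>
   #include <setjmp.h>

   jmp_buf error_trapper;

   blow_up ()
   {
   printf ("about to longjmp\n");
   longjmp (error_trapper, 17);    /* SETJMP will now "return" 17 */
   }

   main ()
   {
   int code;

   code = setjmp (error_trapper);  /* returns 0 the first time through */
   if (code != 0)
      printf ("back from longjmp, code = %d\n", code);
   else
      blow_up ();
   }

Run it and you should see "about to longjmp" followed by "back from
longjmp, code = 17".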

   Note the distinctive features of the SETJMP/LONGJMP construct:

   *  The  "jump buffer" -- set by SETJMP  and used by LONGJMP -- need
     not  be  passed  as  a parameter to each  procedure that needs it
     (although  it  could  be).  Typically,  it's  stored as  a global
     variable (which the SPL error label parameter couldn't be).

   *  You still have control over procedures  you call; if you want to
     trap  their jump yourself (either to  do some cleanup or treat it
     as a normal condition), you can just do your own SETJMP using the
     same buffer that they'll LONGJMP with.

   *  On the other hand, if you want do some cleanup and then continue
     the LONGJMP process -- propagate it back up to the original error
     trapper,  in this case PROC -- you have to do more work. You must
     save  the  original  jump  buffer in a  temporary variable before
     doing  the  SETJMP, and restore it  before continuing the LONGJMP
     (or   simply   returning   from  the  procedure).  For  instance,
     PROCESS_KEYWORD might look like this:

     process_keyword (keyword_and_parm)
     char keyword_and_parm[];
     {
      jmp_buf save_error_trapper;   /* declare our temporary save buffer */
     int fnum;
     fnum = fopen (key_info_file, 1);
     save_error_trapper = error_trapper;
     if (setjmp (error_trapper) != 0)
        /* Must be an error condition */
        {
        fclose (fnum, 0, 0);
        error_trapper = save_error_trapper;
        longjmp (error_trapper, 1);
        }
     ...
      if (strcmp (keyword, "foobar") == 0)
       {
       get_subparm (0, parm_string);
       parse_int_array (parm_string, sp0_value);
       ...
       }
     ...
     fclose (fnum, 0, 0);
     error_trapper = save_error_trapper; /* restore for future use */
     }

   Frankly  speaking,  if you ask me -- and  even if you don't -- this
doesn't  look  very  clean. I'd like to  see some way of automatically
"stacking"  SETJMPs so that the system would  do the saving of the old
jump  buffer  for you; also, I'd prefer not  to have to type that ugly
"IF  (SETJMP  ...  != 0)" kludge. On the  other hand, this can be made
quite  palatable-looking  with  a  few  macros,  and it's  better than
nothing (or is it?).
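
   What such macros might look like is anybody's guess; one trivial
sketch of my own (certainly nothing that any C library actually
provides) just hides the kludge behind friendlier names -- the earlier
example, rewritten:

   #include <setjmp.h>

   jmp_buf error_trapper;                   /* the global jump buffer */

   #define ON_ERROR      if (setjmp (error_trapper) != 0)
   #define SIGNAL_ERROR  longjmp (error_trapper, 1)

   /* the error finder (e.g. PARSE_INTEGER) would just say SIGNAL_ERROR; */

   proc()
   {
   ...
   ON_ERROR
      /* do error processing */;
   else
      {
      result = altfile (filename, keywords);
      ...
      }
   ...
   }

This prettifies the syntax but does nothing about the stacking -- a
procedure that wants its own cleanup code must still save and restore
the jump buffer by hand.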


                    PASCAL/XL AND THE TRY..RECOVER

   The  authors  of PASCAL/XL -- perhaps  because they were faced with
the  non-trivial  task  of  building  a language that  MPE/XL could be
profitably  written in -- must have given this subject a great deal of
thought.  And, fortunately, they've come up with  what I think to be a
very powerful construct.

   TRY
     statement1;
     statement2;
     ...
     statementN;
   RECOVER
     recoverycode;

The behavior here is

   *  EXECUTE statement1 THROUGH statementN. IF ANY PASCAL ERROR (e.g.
     giving a bad numeric value to a READLN) OR A CALL TO THE BUILT-IN
     "ESCAPE"  PROCEDURE  OCCURS  WITHIN THESE  STATEMENTS, CONTROL IS
     TRANSFERRED  TO  recoverycode,  AND  AFTER THAT  TO THE STATEMENT
     FOLLOWING TRY..RECOVER.

   This,  as  you  see,  allows  you  to  put a  TRY..RECOVER into the
top-level  procedure (in our case, PROC or ALTFILE) and an ESCAPE call
in  any  of the called procedures  (e.g. PARSE_INTEGER) that detects a
fatal error.

   The  best  part,  though,  is  that  any  procedure  that  wants to
establish  some  sort  of  "cleanup"  code can do  this trivially! For
instance, our PROCESS_KEYWORD might say:

   PROCEDURE PROCESS_KEYWORD (VAR KEYWORD_AND_PARM: STRING);
   VAR FNUM: INTEGER;
       SAVE_ESCAPECODE: INTEGER;
   BEGIN
   FNUM:=FOPEN (KEY_INFO_FILE, 1);
   TRY
     ...
     IF KEYWORD="FOOBAR" THEN
       BEGIN
       GET_SUBPARM (0, PARM_STRING);
       PARSE_INT_ARRAY (PARM_STRING, SP0_VALUE);
       END;
     ...
     FCLOSE (FNUM, 0, 0);
   RECOVER
     BEGIN
     SAVE_ESCAPECODE:=ESCAPECODE;
     FCLOSE (FNUM, 0, 0);
     ESCAPE (SAVE_ESCAPECODE);
     END;
   END;

If any error occurs in the code between TRY and RECOVER, the BEGIN/END
in  the RECOVER part is triggered. This is now free to close the file,
or  do  whatever  else it needs to, and  then "pass the error down" by
calling ESCAPE again.

   This ESCAPE -- since it's no longer between this TRY and RECOVER --
will  activate  the previously defined TRY/RECOVER  block (say, in the
PARSE_KEYWORDS  procedure)  which might do more  cleanup and then call
ESCAPE  again.  Eventually,  the error will  percolate to the top-most
TRY/RECOVER,  which  will  just  do some work and  not call ESCAPE any
more, continuing with the rest of the program.

   In  other words, "TRY .. RECOVER"s  can be nested. In the following
piece of code

   TRY
     A;
     TRY
       B;
       TRY
         C;
       RECOVER
         R1;
       D;
     RECOVER
       R2;
     E;
   RECOVER
     R3;

   * An error or ESCAPE in C will cause a branch to R1.

   *  An error/ESCAPE in B or D will, of course, branch to R2 (since B
     and  D are outside the innermost  TRY .. RECOVER R1). However, an
     error/ESCAPE in R1 will also cause a branch to R2! That's because
     R1  is  also  out  of the area of effect  of the innermost TRY ..
     RECOVER.

     In  other words, the "recovery  handler" R1 is only "established"
     between  the  innermost TRY and the  innermost RECOVER; when it's
     actually  "triggered",  it's  disestablished,  and  the  recovery
     handler that was previously in effect is re-established.

   * By the same token, an error/ESCAPE in A, E, or R2 will branch to R3.

   *  And, finally, an error in R3  -- or anywhere else outside of the
     TRY  .. RECOVER -- will actually  abort the program with an error
     message.

   As  you see, then, all is for the best in this best of all possible
worlds.  We can do long jumps "up  the stack" to the RECOVER code, but
each  intervening procedure can also easily set up "cleanup code" that
needs to be executed before the long jump can continue.

   Several notes:

   *  First  of  all, remember that the  RECOVER statement is executed
     ONLY  in case of an error or an ESCAPE. If the statements between
     TRY  and RECOVER finish normally, any "cleanup" code you may have
     inside  the  RECOVER will NOT be  executed. That's why our sample
     program  has  two FCLOSEs -- one for  the normal case and one for
     the cleanup case.

   *  Note  also that the ESCAPE call  can take a parameter (just like
     C's  LONGJMP).  This parameter is then  available as the variable
     ESCAPECODE  in the RECOVER handler, and  is used to indicate what
     kind of error or ESCAPE happened.

     A  RECOVER handler might, for instance, be used to avoid an abort
     caused  by  an  expected  error condition (e.g.  file I/O error);
     however,  if  it  sees  that  ESCAPECODE  indicates  some  other,
     unexpected,  error  condition, it might  terminate or call ESCAPE
     again,  hoping that some "higher-level"  RECOVER block can handle
     the error.

   * Finally, if a RECOVER block wants to continue the long jump after
     doing  its cleanup work, it often needs to pass the ESCAPECODE up
     as  well  (unless,  of  course, the  higher-level RECOVER handler
     won't  use  the ESCAPECODE). Unfortunately,  the PASCAL/XL manual
     explicitly tells us:

       -  "It is wise to assign  the result of the ESCAPECODE function
         to  a  local  variable immediately upon  entering the RECOVER
         part  of  a  TRY-RECOVER  construct,  because the  system can
         change that value later in the RECOVER part."

     This  is too bad; it would have  been nice to have TRY .. RECOVER
     do  this  saving for you automatically,  saving you the burden of
     having  to  declare  and  set an extra  local variable. Still, we
     oughtn't look a gift horse in the mouth.

   Note,  incidentally, how C's #define macro facility can come to our
aid  if we want to implement this same  construct in C. All we need is
three #defines:

   int escapecode;
   int jump_stack_ptr = -1;
   jmp_buf jump_stack[100];    /* the stack used to do nesting */
   #define TRY     if (setjmp(jump_stack[++jump_stack_ptr])==0) {
   #define RECOVER jump_stack_ptr--; } else
   #define ESCAPE(parm)   \
           { \
           escapecode = parm; \
            longjmp(jump_stack[jump_stack_ptr--], 1); \
            }

This would allow us to say:

   TRY
     code;
   RECOVER
     errorhandler;

and

   ESCAPE(value);

just  like  we could in PASCAL/XL! Note  how we've added this entirely
new  control structure without any changes  to the compiler -- nothing
more complicated than a few #defines! (Many thanks to Tim Chase of CCS
for showing me how to do this!)
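
   Just  to make the parallel concrete, here is roughly what the cleanup
code in our earlier PROCESS_KEYWORD might look like under these #defines
(only  a  sketch -- it assumes the  macros and global variables declared
above, plus the same hypothetical file-handling calls we used before):

   process_keyword (keyword_and_parm)
   char keyword_and_parm[];
   {
   int fnum;
   fnum = fopen (key_info_file, 1);
   TRY
     {
     ...                    /* any ESCAPE() in here jumps below */
     fclose (fnum, 0, 0);   /* the normal-case close */
     }
   RECOVER
     {
     fclose (fnum, 0, 0);   /* the cleanup-case close */
     ESCAPE (escapecode);   /* pass the error on up, PASCAL/XL-style */
     }
   }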


                          NESTED PROCEDURES

   An  interesting feature of PASCAL is its ability to have procedures
nested within other procedures. In other words, I could say:

   PROCEDURE PARSE_THING (VAR THING: STRING);
   VAR CURR_PTR, CURR_DELIMS: INTEGER;
       QUOTED: BOOLEAN;
       ...

     PROCEDURE PARSE_CURR_ELEMENT (...);
     BEGIN
     ...
     END;

   BEGIN
   ...
   PARSE_CURR_ELEMENT (...);
   ...
   END;

PARSE_CURR_ELEMENT  here is just like  a local variable of PARSE_THING
--  it's a local procedure. It's callable only from within PARSE_THING
and not from any other procedure in the program. More importantly,

   *  THE  NESTED  PROCEDURE  (PARSE_CURR_ELEMENT)  CAN ACCESS  ALL OF
     PARSE_THING'S LOCAL VARIABLES.

This is a significant consideration. If PARSE_CURR_ELEMENT didn't need
to  access  PARSE_THING's  local  variables,  not  only could  it be a
different  (non-nested)  procedure, but it probably  should be. When a
procedure is entirely self-contained, it's usually a good idea to make
it accessible to as many potential callers as possible.

   On  the other hand, what if PARSE_CURR_ELEMENT needs to interrogate
CURR_PTR  to find out where we are in parsing the thing; or look at or
modify  CURR_DELIMS  or  QUOTED or whatever  other local variables are
relevant to the operation?

   We  don't  want  to have to pass all  these values as parameters --
there could be dozens of them.

   We  don't want to make them  global variables, since they're really
only  relevant  to  PARSE_THING  -- why make  them accessible by other
procedures  that  have  no business messing  with them? (Incidentally,
making the variables global will also prevent PARSE_THING from calling
itself recursively.)

   But,   on   the   other   hand,   we  certainly  DO  want  to  have
PARSE_CURR_ELEMENT  be a procedure -- after all, we might need to call
it  many times from within PARSE_THING; surely we don't want to repeat
the code every time!

   Thus,  the  main  advantage of nested procedures  is not just that,
like  local  variables,  they  can  only be accessed  by the "nester".
Rather,  the  advantage  is the fact that  they can share the nester's
local  variables,  which  are often quite relevant  to what the nested
procedure is supposed to do.

   Another  substantial  benefit  comes  when  you pass  procedures as
parameters  to  other  procedures.  A good example of  this might be a
report writer procedure:

   TYPE LINE_TYPE = PACKED ARRAY [1..256] OF CHAR;
   PROCEDURE PRINT_LINE (VAR LINE: LINE_TYPE;
                         LINE_LEN: INTEGER;
                         PROCEDURE PAGE_HEADER (PAGENUM: INTEGER);
                         PROCEDURE PAGE_FOOTER (PAGENUM: INTEGER));

This  procedure  takes the line to be  output and its length, but also
takes  two procedures -- one that will be called in case a page header
should be printed and one in case a page footer should be printed. The
utility  of  this is obvious -- it gives  the user the power to define
his own header and footer format.

   Now, let's say we have the following procedure:

   PROCEDURE PRINT_CUST_REPORT (VAR CATEGORY: INTEGER);
   VAR CURRENT_COUNTRY: PACKED ARRAY [1..40] OF CHAR;
   ...
   BEGIN
   ...
   PRINT_LINE (OUT_LINE, OUT_LINE_LEN,
               MY_PAGE_HEAD_PROC, MY_PAGE_FOOT_PROC);
   ...
   END;

PRINT_LINE   will   output   OUT_LINE   and,   in   some  cases,  call
MY_PAGE_HEAD_PROC or MY_PAGE_FOOT_PROC. Now, it makes sense for you to
want  these  procedures  to print, say, the  current value of CATEGORY
and, perhaps, CURRENT_COUNTRY.

   In C and SPL, which have no nested procedures, both
MY_PAGE_HEAD_PROC  and  MY_PAGE_FOOT_PROC  would  have to  be separate
procedures   which   have   no  access  to  PRINT_CUST_REPORT's  local
variables.

   The  variables  would  either  have  to  be global  (which is quite
undesirable)  or would somehow have to  be passed to PRINT_LINE, which
in turn would pass them to the MY_PAGE_xxx_PROC procedures.

   This  would  be  quite  cumbersome, since  in PRINT_CUST_REPORT the
header and footer procedures need to be passed an integer and a PACKED
ARRAY  OF  CHAR, whereas in some  other application of PRINT_LINE they
would have to be passed, say, three floats and a record structure.

   In   PASCAL,   on   the  other  hand,  both  MY_PAGE_HEAD_PROC  and
MY_PAGE_FOOT_PROC can be nested within PRINT_CUST_REPORT and thus have
access to CATEGORY and CURRENT_COUNTRY (and all the other local
variables of the PRINT_CUST_REPORT procedure) -- another useful
application for nested procedures.


   C, as I mentioned, has no nested procedure support at all. On the
other hand, it does have #DEFINEs, which allow you to define text
substitutions that can often do the job of a nested procedure (see the
section on DEFINES), especially if it's a small one. For instance, you
can say:

   #define foo(x,y) \
   { \
   int a, b;          /* variables local to THIS DEFINE */ \
   a = x + parm1;     /* access a variable local to the procedure */ \
   b = y * parm2;     /* (the nesting procedure) */ \
   x = a + b; \
   y = a * b; \
   }

As  you  can  see,  C's  support for "block-local"  variables -- local
variables  that are local not just to the procedure, but rather to the
"{"/"}"  block in which they're defined -- allows you to have #DEFINEs
that are almost as powerful as real procedures.
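
   For  instance, the "nesting procedure" might look something like this
(MUNGE,  WIDTH, and HEIGHT are hypothetical names, made up just for this
illustration):

   munge (width, height)
   int width, height;
   {
   int parm1, parm2;      /* locals of the "nesting" procedure... */
   parm1 = 10;
   parm2 = 20;
   foo (width, height);   /* ...which the expanded #define text uses */
   foo (parm1, parm2);    /* and we can "call" it as often as we like */
   }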

   SPL  allows you to have "SUBROUTINE"s nested within procedures, but
subject to some rather stringent restrictions:

   * The subroutines can have no local variables of their own. This is
     a  pretty  severe  problem,  since  it means that  all your local
     variables  have  to  be declared in  the nesting procedure, which
     increases  the  likelihood of errors and  also prohibits you from
     calling  the subroutine recursively (which you would otherwise be
     able to do).

   *  The  subroutines  can  not be passed  as procedure parameters to
     other procedures (only procedures can be -- try parsing that!).

   *  Furthermore, this nesting capability goes to only one level; you
     can  nest SUBROUTINEs in PROCEDUREs,  but you can't nest anything
     within  SUBROUTINEs.  In PASCAL, procedures  can be nested within
     each  other  to an arbitrary number  of levels. Frankly speaking,
     I'm  hard  put  to  think  of  an  application  for triply-nested
     procedures.

   Practically,  you'll  have to decide  for yourself whether PASCAL's
nested procedure support -- and C's lack of it -- is important to you.
I  brought  this issue up to a C  partisan, and she replied that she's
simply  never  run  into a case where  nested procedures were all that
important.  Upon thinking about this, I  found myself forced to agree,
at least partially:

   * #DEFINEs can do much of the job that nested procedures are needed
     for;

   *  Most  procedures should often NOT be  nested, but rather be made
     self-contained  and made available to  the world at large (rather
     than just to a particular procedure).

   *  If the reason you don't want to declare your variables as global
     is that you want to "hide" them from other procedures, you can do
     this  in C by making them "static". This will make them available
     only to the procedures in the file in which they're defined. This
      allows you to share data between procedures (which you might
      otherwise have wanted to nest within each other) without making
      the data readable and modifiable by everybody. (There's a short
      sketch of this right after this list of bullets.)

   *  On  the  other hand, there's no denying  that there are cases in
     which  PASCAL's nested procedures are quite a bit superior to any
     C  or SPL alternative. For  instance, a recursive procedure might
     well  not be able to use  the "static global variable" approach I
     just mentioned.
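
   Here's  a  minimal  sketch of the "static"  approach -- the names are
just the PARSE_THING example from the nested procedure discussion above,
recast in C:

   /* one source file, say the one that implements parse_thing */
   static int curr_ptr;       /* shared by the procedures below,  */
   static int quoted;         /* but invisible to any other file  */

   parse_thing (thing)
   char thing[];
   {
   curr_ptr = 0;
   quoted = 0;
   ...
   parse_curr_element (thing);
   ...
   }

   parse_curr_element (thing)
   char thing[];
   {
   ...
   curr_ptr = curr_ptr + 1;   /* no parameter passing needed */
   ...
   }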


                              DATA TYPES

   The  difference  most  often cited between PASCAL  and C is the way
that  they treat data types. PASCAL is often considered a "strict type
checking"  language and C a "loose type checking" language, and that's
true enough. However, the effects of this philosophical difference are
subtler and more pervasive than they appear at first glance.

   What  are  data  types?  Data types can be  seen in the earliest of
languages, from FORTRAN and COBOL onwards. When you declare a variable
to  be  a  certain  data  type,  you  give certain  information to the
compiler -- information that the compiler must have to produce correct
code. Historically, this information has included:


   *  What the various operators of  the language MEAN when applied to
     the  variable.  "+", for instance, isn't  just "addition" -- when
     you add two integers, it's integer addition, and when you add two
     reals,  it's  real  addition. Two  entirely different operations,
     with  entirely different machine  language opcodes and (possibly)
     different  effects  on  the  system  state. Similarly,  a FORTRAN
     "DISPLAY X" means:

       - If X is a string, print it verbatim;

       - If X is an integer, print its ASCII representation;

       -  If  X  is  a real, print its  ASCII representation, but in a
         different format and with a different conversion mechanism.


   *  How much SPACE is to be allocated for the variable. "Array of 20
     integers" is a type, too, one from which the compiler can exactly
     deduce  how  much memory (20 words) needs  to be allocated to fit
     this data.

If  you look at SPL (and,  incidentally, FORTRAN and other languages),
you'll  find  that  all  of  its type declarations  essentially aim at
serving  these  two  functions. However, in recent  times, a few other
functions have been ascribed to type declarations:

   *  Using type declarations, the compiler can DETECT ERRORS that you
     may  make.  The  compiler  can't,  of course, figure  out if your
     program  does  "the  right thing" since it  doesn't know what the
     right  thing  is;  however, it can see  if there are any internal
     inconsistencies in your program.

     For instance, if you're multiplying two strings, the compiler can
     tag  that  as  an obvious error; similarly,  if you pass a string
     parameter to a procedure that expects an integer (or vice versa),
     a  good  compiler  will find this and save  you a lot of run-time
     debugging. The more elaborate and precise the type specifications
     you give, the more error checking the compiler can do.

     Error  checking can also be provided at run time, where code that
     knows  what size arrays are, for instance, can make sure that you
     don't inadvertently index outside them. PASCAL's "subrange types"
     do  this sort of thing, too,  allowing you to declare what values
     (e.g.  "0  to  100") a variable may  take and triggering an error
     when you try assigning it an invalid value.

   *  Furthermore, with a type declaration, the compiler can
     automatically SAVE WORK for you by defining special tools for the
     given type.

     The  classic  example  of  this  is  the  record structure  -- by
     declaring  the structure, you're automatically  defining a set of
     "operators"  (one for each field of the structure) that allow you
     to  easily access the structure.  Similarly, enumerated types can
     save  you  the  burden  of  having to  manually allocate distinct
     values  for each of the  elements in the enumeration (admittedly,
     not a very large burden).

     Some  fancy  compilers  can  even  automatically  define  "print"
     operations for each record structure, so that you can easily dump
     it  in  a legible format to the  terminal without having to print
     each element individually.

   *  Good  type  handling provisions can  INSULATE YOUR PROGRAMS FROM
     CHANGES  IN YOUR DATA'S INTERNAL REPRESENTATION. For instance, if
     the  compiler allows you to refer to a field of a record as, say,
     "CUSTREC.NAME"  instead  of  "CUSTREC(20)",  then you  can easily
     reformat  the insides of the  record (adding new fields, changing
     field  sizes,  etc.)  without  having  to change  all places that
     reference this record.

     Similarly,  if  your language allows  functions to return records
     and  arrays as well as scalars, you can easily change the type of
     your,  say,  disc  addresses  from  a 2-word double  integer to a
     10-word  array  of integers. In SPL,  for instance, such a change
     would  require rewriting all procedures  that want to return such
     objects or to take them as "by-value" parameters. Even changing a
     value from an "integer" to a "double integer" in SPL will require
     you to change a great deal of code.


   The  reason  I've given this list is  that SPL, PASCAL, and C place
different  weights on each of these  points, and this makes for rather
substantial differences in the way you use these languages.

    Now, away from the generalities and on to concrete examples.


                          RECORD STRUCTURES

   Consider  for  a  moment  an entry in your  "employee" data set. It
could  be a file label; it could  be a Process Control Block entry; it
could  be any chunk of memory  that contains various fields of various
data types.

   A  typical  layout  of  this employee entry  (or employee "record")
might be:

   Words 0 through 14 - The employee name (a 30-character string);
   Words 15-19 - Social security number (10-character string);
   Words 20-21 - Birthday (a double integer, just to be interesting);
   Words 22-23 - Monthly salary (a real number).

A  simple record. It's 24 words long, but it's not really an "array of
24  words";  logically  speaking, to you and  me, it's a collection of
four  objects, each of a different  type, each starting at a different
(but constant) offset within the record.

   How  do  we declare a variable to  hold this record? In FORTRAN and
SPL, it's easy:

   INTEGER ARRAY EMPREC(0:23);
     or
   INTEGER EMPREC(24)

Short  and sweet. The compiler's happy --  it knows that it's an array
of integers, which means you can extract an arbitrary element from it,
and  pass  it  to  a  procedure  (like DBGET), which  will receive its
address  as  an  integer  pointer.  This  defines to  the compiler the
MEANING  of the "indexing" and "pass to procedure" operations that can
be  done  on  EMPREC.  Also, the compiler knows  that 24 words must be
allocated for this array, as a global or local variable.

   The compiler is happy, but are you? First of all, how are you going
to access the various elements of this record structure? Are you going
to say

   EMPREC(20)

when  you mean the employee's birthday  (actually, since it's a double
integer, you couldn't even do that)?

   What  about error checking? Since all the compiler knows about this
is that it's an integer array, it'll be happy as punch to allow you to
put  it anywhere an integer array can go. Would you like to pass it as
the  database  name to DBGET instead of  as the buffer variable? Fine.
Would  you like to view it as a 4 by 5 matrix and multiply it by, say,
the department record? The computer will gladly oblige.

   Finally,  consider the burden this places  on you whenever you want
to  change  the layout of EMPREC -- say,  to increase the name from 30
characters  to  40.  You'll  have to change  all your "EMPREC(20)"s to
"EMPREC(25)",  all your "INTEGER ARRAY EMPREC(0:23)" to "INTEGER ARRAY
EMPREC (0:28)". And, of course, if you forget one or the other -- why,
the  compiler  will  be  happy  to extract the 4th  word of the social
security number and treat it as the employee's birthday!

   Of  course,  you're  not  going to do this.  You will certainly not
refer  to  all  the elements of the  record structure by their numeric
array  indices (although it so happens that most of HP's MPE code does
exactly  this). Rather, you'll say (of course, in SPL, you can also do
the same thing with DEFINEs):

   EQUATE SIZE'EMPREC = 24;
   BYTE ARRAY EMP'NAME          (*) = EMPREC(0);
   BYTE ARRAY EMP'SSN           (*) = EMPREC(15);
   DOUBLE ARRAY EMP'BIRTHDATE   (*) = EMPREC(20);
   REAL ARRAY EMP'SALARY        (*) = EMPREC(22);
   [Note: The fact that we define, say, EMP'BIRTHDATE and
   EMP'SALARY as arrays isn't a problem.  If we say EMP'SALARY
   with no subscript, it'll refer to the 0th element of this
   "array", which is exactly what we want it to do.]

   FORTRAN is similar (you'd use an EQUIVALENCE); COBOL is a bit
simpler, allowing you to say (remembering that COBOL doesn't have
REALs):

   01 EMPREC.
      05 NAME          PIC X(30).
      05 SSN           PIC X(10).
      05 BIRTHDATE     PIC S9(9) COMP.
      05 SALARY        PIC S9(5)V9(2) COMP-3.

As  you  see,  COBOL at least has  the advantage that it automatically
calculates  the  indexes  of  each  subfield  for  you. This  is nice,
especially  when  you  change  the structure,  reshuffling, inserting,
deleting,  or resizing fields. On the other hand, I wouldn't call this
a  very  substantial  feature, especially since  sometimes you WANT to
manually  specify the field offsets  (whenever the record structure is
not under your control, like, say, an MPE file label).

   To  summarize,  this "EQUIVALENCE"ing approach  that's available in
SPL,  FORTRAN, and COBOL saves you from the very substantial bother of
having to hardcode the offsets of all the subfields into your program.
This is certainly a good thing; however, PASCAL and C go substantially
beyond this.

   The  most serious problem with  what I'll call the "EQUIVALENCE"ing
approach  is a rather subtle one, one  that I didn't realize until I'd
used it for some time.

   The  definitions  we  saw  above  --  in SPL, FORTRAN,  or COBOL --
defined  several variables as subfields  of another variable. EMP'NAME
and  EMP'SSN are subfields of EMPREC. What  if we need to declare this
EMPREC twice -- say, in two different procedures?

   Clearly  we  don't want to have to  repeat the EQUIVALENCEs in each
procedure.  Yet what choice do we have? We might, for instance, set up
each  of  the subfields as a DEFINE  instead of an equivalence, making
the DEFINEs available in all the procedures that reference EMPREC:

   DEFINE EMP'NAME          = EMPREC(0) #;
   DEFINE EMP'SSN           = EMPREC(15) #;
   DEFINE EMP'BIRTHDATE     = EMPREC(20) #;
   DEFINE EMP'SALARY        = EMPREC(22) #;

but then, since DEFINEs are merely text substitutions and EMPREC is an
integer  array, each EMP'xxx will also  be an integer array. We'd have
to say

   BYTE ARRAY EMPREC'B(*)=EMPREC;
   DOUBLE ARRAY EMPREC'D(*)=EMPREC;
   REAL ARRAY EMPREC'R(*)=EMPREC;

in each procedure that defines an EMPREC array, and a

   DEFINE EMP'NAME          = EMPREC'B(0) #;
   DEFINE EMP'SSN           = EMPREC'B(15) #;
   DEFINE EMP'BIRTHDATE     = EMPREC'D(20) #;
   DEFINE EMP'SALARY        = EMPREC'R(22) #;

at the beginning of the program. Still, we'd have to repeat the BYTE
ARRAY, DOUBLE ARRAY, and REAL ARRAY declarations once for each
declaration of EMPREC; and, what if we want to call the
record  something  else,  like  have  two  records called  EMPREC1 and
EMPREC2?

   *  THE PROBLEM WITH DEFINING SUBFIELDS  OF A RECORD STRUCTURE USING
     THE  "EQUIVALENCING" APPROACH IS THAT IT DEFINES THE SUBFIELDS OF
     ONLY ONE RECORD STRUCTURE VARIABLE.

     WHAT  WE WANT IS TO DEFINE A GENERALIZED "TEMPLATE" ONCE AND THEN
     APPLY THIS TEMPLATE TO EACH RECORD STRUCTURE VARIABLE WE USE.

In other words, we want to be able to say

   DEFINE'TYPE EMPLOYEE'REC (SIZE 24)
     BEGIN
     BYTE ARRAY NAME          (*) = RECORD(0);
     BYTE ARRAY SSN           (*) = RECORD(15);
     DOUBLE ARRAY BIRTHDATE   (*) = RECORD(20);
     REAL ARRAY SALARY        (*) = RECORD(22);
     END;

and then declare any particular employee record buffer by saying:

   EMPLOYEE'REC EMPREC1;
   EMPLOYEE'REC EMPREC2;

Then,  we  could  extract each individual subfield  of the record like
this:

   NEW'SALARY := EMPREC1.SALARY * 1.1;

The point here is that

   * IN ADDITION TO NOT HAVING TO EXPLICITLY SPECIFY THE OFFSET OF THE
     SUBFIELD  OF THE RECORD (like having  to say RECORD(22), an awful
     thing  to  do),  WE  CAN  NOW  DEFINE  THE  LAYOUT OF  THE RECORD
     STRUCTURE  ONCE,  REGARDLESS  OF  HOW  MANY  VARIABLES  WITH THAT
     STRUCTURE WE WANT TO DECLARE.

Do you see how nicely this dovetails with the "INSULATING YOUR PROGRAM
FROM  CHANGING  INTERNAL REPRESENTATION" principle  we gave above? The
record structure layout is defined in EXACTLY ONE PLACE in the program
file.  We can have a hundred different  variables of this type -- none
of  them  will have to specify the physical  size of the buffer or the
offsets  of the subfields. Each one will merely refer back to the type
declaration.

   Also,  we've now announced EMPREC1 to  the compiler as being of the
special  "EMPLOYEE'REC"  type. It's no longer  a simple INTEGER ARRAY,
just  like  any  other  integer  array.  Conceivably, if  we declare a
procedure to be

   PROCEDURE PUT'EMPLOYEE (DBNAME, EMPREC, FRAMASTAT);
   INTEGER ARRAY DBNAME;
   EMPLOYEE'REC EMPREC;
   INTEGER FRAMASTAT;
   ...

the compiler can warn us that

   EMPLOYEE'REC EMPREC;
   INTEGER ARRAY DBNAME;
   INTEGER FOOBAR;
   ...
   PUT'EMPLOYEE (EMPREC, DBNAME, FOOBAR);

is  an invalid call -- it sees  that an object of type EMPLOYEE'REC is
being  passed  in  place of an INTEGER ARRAY,  and an INTEGER ARRAY is
being passed in place of an EMPLOYEE'REC. Without this error checking,
you'd  have  to  find this problem yourself  at run-time, a distinctly
more difficult task.


                  RECORD STRUCTURES IN PASCAL AND C

   What I just gave is the rationale for record structures, mostly for
the  benefit of SPL programmers who  haven't used PASCAL and C before.
Of  course,  the  only  reason I gave it is  that PASCAL and C do have
record  structure support, remarkably similar  support at that. Here's
the way you declare a structure data type in PASCAL:

   { "PACKED ARRAY OF CHAR"s are PASCAL strings }
   TYPE EMP_RECORD = RECORD
                     NAME: PACKED ARRAY [1..30] OF CHAR;
                     SSN: PACKED ARRAY [1..10] OF CHAR;
                     BIRTHDATE: INTEGER;  { really a double integer }
                     SALARY: REAL;
                     END;
   ...
   VAR
     EMPREC: EMP_RECORD;   { declare a variable called "EMPREC" }

And in C:

   typedef
     struct {char name[30];
             char ssn[10];
             long int birthdate;
             float salary;
            }
     emp_record;
   ...
   emp_record emprec;   /* declare a variable called "emprec" */

You  can  see  the  minor differences -- the  type names are different
("float"  instead  of "REAL", "long int"  to mean double integer); the
type name comes at the end of the "typedef"; the newly defined type is
used a "statement" all its own rather than as part of a VAR statement;
and,   of  course,  everything's  written  in  those  CUTE  lower-case
characters.  In  essence,  of  course,  the constructs  are absolutely
identical.

   The use is identical, as well:

   NEW_SALARY := EMPREC.SALARY * 1.1;
   new_salary = emprec.salary * 1.1;

Incidentally,  if we didn't want to define a new type, but rather just
wanted  to  define  one  variable of a given  structure, we could have
said:

   VAR EMPREC: RECORD
               NAME: PACKED ARRAY [1..30] OF CHAR;
               SSN: PACKED ARRAY [1..10] OF CHAR;
               BIRTHDATE: INTEGER;  { really a double integer }
               SALARY: REAL;
               END;

   struct {char name[30];
           char ssn[10];
           long int birthdate;
           float salary;
          }
     emprec;

Note  how the type declaration is very much like the original variable
declaration.

   So,  declaring  and using record structures  is identical in PASCAL
and C. However, there's a VERY BIG DIFFERENCE between PASCAL and C.

   *  In  PASCAL, strict type checking is  more than just a good idea,
     it's the LAW.

     If  a  function  parameter is declared  as type EMPLOYEE_REC, any
     function  call to it must pass an object of that type. Even if it
     passes  a  record structure that's defined  with exactly the same
     fields  but  with  a  different  type  name  (admittedly  a  rare
     occurrence), the compiler will cough.

     Any structure parameter must be of EXACTLY THE RIGHT TYPE.


   *  Many  C  programmers view strict type checking  much as you or I
     might  view,  say, the Gestapo or the  KGB. Kernighan & Ritchie C
     compilers DO NOT do type checking.

     In  fact, in Kernighan & Ritchie C, you can pass a string where a
     real  number is expected, and the  compiler won't say a word! (On
     the other hand, your program is unlikely to work right.)


I could fault C for this, treating C's lack of type checking much as I
do,  say,  SPL's  lack of an easy I/O  facility. The trouble is that C
programmers  don't  think  that  lack of type checking  is a bug; they
think  it's  a  feature. The problem is  philosophical -- what are the
benefits of type checking and do they outweigh the drawbacks?


      TYPE CHECKING -- ORIGINAL STANDARD PASCAL AND PASCAL/3000

   Earlier  in the paper I brought  up a certain point. Compilers that
know  the type of variables can, I  said, check your code to make sure
that you're not using types inconsistently.

   For  instance,  if  you use a character when  you should be using a
real  number, that's an "obvious error" and  the compiler can do you a
favor  by  complaining  at  compile-time.  Similarly,  if you  pass an
employee  record  to a procedure that  expects a database name, that's
also an error, and should also be reported.

   Now,  this  principle  is  in many ways at  the heart of the PASCAL
language.  And,  certainly, everyone will agree  that it would be good
for the compiler to find errors in your program rather than making you
do it yourself. The question is --

   IS A COMPILER WISE ENOUGH TO DETERMINE WHAT IS AN ERROR AND WHAT IS
   NOT?

   For instance, say you write

   VAR CH: INTEGER;
   IF ('a'<=CH) AND (CH<='z') THEN
     CH:=CH-'a'+'A';

Utterly  awful!  We have what -- to PASCAL,  at least -- is at least 4
type  inconsistencies; we're comparing an  integer against a character
two  times,  and  then  we're  adding  and subtracting  characters and
integers! Obviously an error.

   Actually,  of  course,  this  code takes CH, which  it assumes is a
character's  ASCII  code,  and  upshifts it. If it  finds that CH is a
lower  case character, it shifts it  into the upper case character set
by subtracting 'a' and adding 'A'.

   Some  might complain that this code  is not portable (it won't, for
instance,  work  on  EBCDIC  machines),  but that's  not relevant. The
programmer  has a perfect right to assume that the code will run on an
ASCII machine; you mustn't ram portability down his throat. Sometimes,
it's  very useful to be able to, say, treat characters as integers and
vice versa.

   Now,  before anybody accuses me of  slandering PASCAL, I must point
out  that  the  solution  is  readily available. Pascal  can convert a
character  to an integer using the "ORD" function, and an integer to a
character using "CHR"; our code could easily be re-written:

   VAR CH: INTEGER;
   IF (ORD('a')<=CH) AND (CH<=ORD('z')) THEN
     CH:=CH-ORD('a')+ORD('A');

The  important  point  here  is  not  whether  or not  you can upshift
characters; the important fact is that:

   *  SOMETIMES  A  PROGRAMMER MAY CONSCIOUSLY WANT  TO DO THINGS THAT
     MIGHT USUALLY BE VIEWED AS TYPE INCOMPATIBILITIES.

   Consider, for a moment, the following application:

   * You want to write a procedure that adds a record to the database.
     Unlike  DBPUT,  this one should just  take the database name, the
     dataset  name,  and  the  buffer,  and do all  the error checking
     itself.

Sounds simple, no? You write:

   TYPE TDATABASE = PACKED ARRAY [1..30] OF CHAR;
        TDATASET = PACKED ARRAY [1..16] OF CHAR;
        TRECORD = ???;
   ...
   PROCEDURE PUT_REC (VAR DB: TDATABASE;
                      S: TDATASET;
                      VAR REC: TRECORD);
   BEGIN
   ...
   END;

BUT HOW DO YOU DEFINE "TRECORD"?

   Remember  why I said that type  checking is such a wonderful thing.
After  all, if a procedure expects a "customer record" and you pass it
an "employee record", you want the compiler to complain.

   But what if the procedure expects ANY kind of record? What if it'll
be  perfectly  HAPPY  to  take  an employee record,  a sales record, a
database name, or a 10 x 10 real matrix? How should the compiler react
then?

   Unfortunately,  PASCAL,  with all its  sophisticated type checking,
falls  flat  on  its  face  (this is true of  both Standard PASCAL and
PASCAL/3000).

   At  this point, in the interest  of fairness (and for the practical
use  of  those  who  HAVE to do this sort  of thing in PASCAL), I must
point  out  that  PASCAL  does have a  mechanism for supporting record
structures  of  different  types.  The  trick  is to  use a degenerate
variation  of  the  record  structure called the  "tagless variant" or
"union"  structure. It's quite similar  to EQUIVALENCE in FORTRAN, but
even uglier.

   To put it briefly, you have to say the following:

   TYPE TANY_RECORD =
        RECORD
          CASE 1..5 OF
            1: (EMP_CASE: TEMPLOYEE_RECORD);
            2: (CUST_CASE: TCUSTOMER_RECORD);
            3: (VENDOR_CASE: TVENDOR_RECORD);
            4: (INV_CASE: TINVOICE_RECORD);
            5: (DEPT_CASE: TDEPARTMENT_RECORD);
        END;

This defines the type "TANY_RECORD" to be a record structure which can
be looked at in one of FIVE different ways:

   *   As  having  one  field  called  "EMP_CASE"  which  is  of  type
     "TEMPLOYEE_RECORD".

   *  As  having  one  field  called  "CUST_CASE"  which  is  of  type
     "TCUSTOMER_RECORD".

   *  Or,  as  having  one field called  "VENDOR_CASE", "INV_CASE", or
     "DEPT_CASE",     which     is     of    type    "TVENDOR_RECORD",
     "TINVOICE_RECORD", or "TDEPARTMENT_RECORD", respectively. You get
     the idea.

If  you  declare a variable of  type "TANY_RECORD", it'll be allocated
with enough room for the largest of the component datatypes. Then, you
can  make  the variable "look" like any  one of these records by using
the appropriate subfield:

   VAR R: TANY_RECORD;
   ...
   WRITELN (R.EMP_CASE.NAME);   { views R as an employee record }
   WRITELN (R.DEPT_CASE.DEPTHEAD);  { views R as a dept record }
   WRITELN (R.INV_CASE.AMOUNT);   { views R as an invoice record }

In  other  words,  an  object  of  type  TANY_RECORD is  actually five
different record structures "equivalenced" together; which one you get
depends on which ".xxx_CASE" subfield you use.

   Got  all  that?  Now,  here's  how you define  and call the PUT_REC
procedure:

   PROCEDURE PUT_REC (VAR DB: TDATABASE;
                      S: TDATASET;
                      VAR REC: TANY_RECORD);
   BEGIN
   ...
   END;
   ...
   { now, all dataset records you need to pass must be declared to }
   { be of type TANY_RECORD. }
   READLN (R.EMP_CASE.NAME, R.EMP_CASE.SSN);
   R.EMP_CASE.BIRTHDATE := 022968;
   R.EMP_CASE.SALARY := MINIMUM_WAGE - 1.00;
   PUT_REC (MY_DB, EMP_DATASET, R);

You  must  declare ALL YOUR DATASET RECORDS  to be of type TANY_RECORD
(wasting  space  if,  say,  TDEPARTMENT_RECORD  is  10 bytes  long and
TINVOICE_RECORD  is  200 bytes long); you must  refer to them with the
appropriate  ".xxx_CASE" subfield; then, you must pass the TANY_RECORD
to  PUT_REC.  (Alternately, you may have  one "working area" record of
type  TANY_RECORD  and  move the record you  want into the appropriate
subfield of this "working area" record before calling PUT_REC.)

   As  you  may  have guessed, I think this  is a very poor workaround
indeed:

   * You need to specify in the TANY_RECORD declaration every possible
     type that you'll ever want to pass to PUT_REC;

   *  You have to declare any record you want to pass to PUT_REC to be
     of type TANY_RECORD, even if it wastes space.

   *  If  you  don't want to use a  "working area" record, you have to
     refer to all your records as "R.EMP_CASE" or "R.DEPT_CASE" rather
     than  just defining R as the appropriate type and referring to it
     just as "R".

   * If you do use a "working area" record, to wit:

       VAR WORK_RECORD: TANY_RECORD;
           EMP_REC: TEMPLOYEE_RECORD;
       ...
       READLN (EMP_REC.NAME, EMP_REC.SSN);
       EMP_REC.BIRTHDATE := 022968;
       EMP_REC.SALARY := MINIMUM_WAGE - 1.00;
       WORK_RECORD.EMP_CASE := EMP_REC;
       PUT_REC (MY_DB, EMP_DATASET, WORK_RECORD);

     then  you  have  to  move your data into  it before every PUT_REC
     call, which is both ugly and inefficient.

And  why?  All  because  PASCAL isn't flexible enough  to allow you to
declare a parameter to be of "any type".

   A  couple  more  examples  of  cases where strict  type checking is
utterly lethal may be in order:

   *  Say that you want to write  a procedure that compares two PACKED
     ARRAY  OF  CHARs  (in Standard PASCAL, these  are the only way of
     representing   strings).  You  must  define  the  types  of  your
     parameters, INCLUDING THE PARAMETER LENGTHS! In other words,

       TYPE TPAC = PACKED ARRAY [1..256] OF CHAR;
       VAR P1: PACKED ARRAY [1..80] OF CHAR;
           P2: PACKED ARRAY [1..80] OF CHAR;
       ...
       FUNCTION STRCOMPARE (VAR X1: TPAC; VAR X2: TPAC): BOOLEAN;
       BEGIN
       ...
       END;
       ...
       IF STRCOMPARE (P1, P2) THEN ...

     is  ILLEGAL. P1, you see, is an 80-character string, which is not
     compatible  with the function parameter, which is a 256-character
     string.

   *  Say that you want to write  a procedure like WRITELN, which will
     format  data of various types. WRITELN  may not be sufficient for
     your  needs  --  you  might  need  to  be able  to output numbers
     zerofilled or in octal, you might want to provide for page breaks
     and  line  wraparound,  etc.  Surely you should  be allowed to do
     this!

     Well,  first  of  all,  you  can't  have  a  variable  number  of
     parameters.  But,  even  if you're willing to  have a maximum of,
     say, 10 parameters and pad the list with 0s, your parameters must
     all be of fixed types!

     Thus,  even if your design calls for some kind of "format string"
     that'll  tell  your  WRITELN-replacement what the  actual type of
     each  parameter is, you can't do anything. You must either have a
     procedure  for each possible type  combination (one to output two
     integers  and  a  string,  one to output a  real, an integer, and
     three  strings,  etc.),  or  have  the procedure  only output one
     entity at a time. This way, you'll have to write:

       PRINTS ('THE RESULT WAS ');
       PRINTI (ACTUAL);
       PRINTS (' OUT OF A MAXIMUM ');
       PRINTI (MAXIMUM);
       PRINTS (', WHICH WAS ');
       PRINTR (ACTUAL/MAXIMUM*100);
       PRINTS ('%');
       PRINTLN;

     instead of

     PRINTF ('THE RESULT WAS %d OUT OF A MAXIMUM %d, WHICH WAS %f',
             ACTUAL, MAXIMUM, ACTUAL/MAXIMUM*100);

   *  Finally  --  although  it should be obvious  by now -- you can't
     write,  say,  a matrix inversion function  that takes any kind of
     matrix.  You  could  write a 2x2 inverter,  a 3x3 inverter, a 4x4
     inverter,  and  so  on. You could also  write a matrix multiplier
     that  multiplies  2x2s  by 2x2s, another that  does 2x2s by 2x3s,
     another  2x2s  by 2x4s, another 3x2s by  2x2s, .... Just think of
     the job security you'll have!

   For  fairness's  sake,  I must admit that  this problem is SLIGHTLY
mitigated in PASCAL/3000.

   PASCAL/3000  has  a "STRING" data type,  which is a variable-length
string  (as  opposed to PACKED ARRAY OF  CHAR, which is a fixed-length
string).   In   other   words,  PASCAL/3000  STRINGs  are  essentially
(internally)  record structures, containing an  integer -- the current
string length -- and a PACKED ARRAY OF CHAR -- the string data.

   When HP implemented this, they were good enough to make all STRINGs
--  regardless of their maximum sizes -- "assignment-compatible" with
each other. This means that you can say:

   VAR STR1: STRING[80];
       STR2: STRING[256];
   ...
   STR1:=STR2;

and also

   TYPE TSTR256 = STRING[256];
   VAR S: STRING[80];
   ...
   FUNCTION FIRST_NON_BLANK (PARM: TSTR256): INTEGER;
   BEGIN
   ...
   END;
   ...
   I := FIRST_NON_BLANK (S);

Since  STRING[80]s  (strings with maximum  length 80) and STRING[256]s
(strings  with maximum length 256) are assignment-compatible, you may
both  directly assign them (STR1:=STR2) and pass one by value in place
of another (PROC(S)).

   Although  "assignment  compatibility"  allows  by-value  passing, a
variable  passed by reference still has to be of exactly the same type
as the formal parameter specified in the procedure's header. Thus,

   TYPE TSTR256 = STRING[256];
   VAR S: STRING[80];
   ...
   FUNCTION FIRST_NON_BLANK (VAR PARM: TSTR256): INTEGER;
   BEGIN
   ...
   END;
   ...
   I := FIRST_NON_BLANK (S);

is  still  illegal, since STRING[80]s can't  be passed to by-reference
(VAR)  parameters  of type STRING[256].  Fortunately, PASCAL/3000 also
lets you say:

   FUNCTION FIRST_NON_BLANK (VAR PARM: STRING): INTEGER;

Specifying  a type of "STRING"  rather than "STRING[maxlength]" allows
you to pass any string in place of the parameter.

   This  only works for STRING parameters.  It doesn't work for PACKED
ARRAYs  OF CHAR; it doesn't work  for other array structures; it isn't
supported by Standard PASCAL. However, for the specific case of string
manipulation,  you  can get around some  of PASCAL's onerous parameter
type checking restrictions.

   Remember also that this is strictly a PASCAL/3000 (and PASCAL/XL)
feature, and can not be relied on in any other PASCAL
compiler.


                TYPE CHECKING -- KERNIGHAN & RITCHIE C

   Where  PASCAL insists on checking all  parameters for an exact type
match,  original  -- Kernighan & Ritchie  -- C takes the diametrically
opposite view.

   Classic  C  checks  NOTHING. It does not  check parameter types; it
does  not even check the number of parameters. All data in C is passed
"by  value", which means that the  value of the expression you specify
is pushed onto the parameter stack for the called procedure to use; if
you want to pass a variable "by reference" -- pushing its pointer onto
the  stack  -- you have to use the  "&" operator to get the variable's
address, to wit:

   myproc (&result, parm1, parm2);

If  you  omit  the  "&",  or specify it when  you shouldn't -- well, C
doesn't check for this, either.

   Much  can  be  said about the philosophical  reasons that C is this
way;  many labels, from "flexibility"  to "cussedness" can be attached
to  it.  The fact of the matter, though,  is that K&R C -- which means
many,  if  not  most,  of  today's C compilers --  doesn't do any type
checking.

   The  effects of this, of course, are the opposite of the effects of
PASCAL's strong type checking:

   *  You have almost complete flexibility in what types you pass to a
     procedure.  In two different calls, the same parameter can be one
     of two entirely different record structures; one of two character
     or  integer  arrays  of entirely different  lengths (C doesn't do
     run-time bounds checking, anyway); a real in one call, an integer
     in another, and a pointer in a third.

     Practically, virtually all of the examples I showed in the PASCAL
     chapter can thus be implemented in C. For instance,

        int strcompare(s1,s2,len)
        char *s1, *s2;
        int len;
        {
        int i;
        i = 0;
        while ((i < len) && (s1[i] == s2[i]))
          i = i+1;
        return (i == len);   /* true if all "len" characters matched */
        }

     will  merrily  compare two character  arrays, no questions asked.
     You  can  pass arrays of any size, and  it'll do the job. You can
     pass  integers,  reals, integer arrays,  whatever; of course, the
     code  isn't  likely  to  work,  but, hey, it's  a free country --
     nobody'll stop you.

   *  In most implementations of K&R C,  you're even allowed to pass a
     different  number  of  parameters than the  function was declared
     with.  Though  this is not guaranteed  portable, most C compilers
     make  sure  that if, say, your  procedure's formal parameters are
     "a", "b", and "c" (all integers) and you actually pass the values
     "1"  and "2", then A will be set to 1, B to 2, and C will contain
     garbage (that's "C" the variable, not "C" the language).

      This is good because it allows you to write procedures that take
      a variable number of parameters; as long as you have a way of
      finding out how many parameters were actually passed (e.g. the
      PRINTF format string), your procedure can handle them
      accordingly. (There's a short sketch of this right after these
      bullets.)

   *  On the other hand, say you make a mistake in a procedure call --
     you  pass  a  real  instead  of an integer, a  value instead of a
     pointer,  or  perhaps  even omit a  parameter. The compiler won't
     check  this; the only way you'll find the error is by running the
     program, and even then the erroneous results may first appear far
     away from the real error.

     Some  C compilers (especially on UNIX) come with a program called
     LINT  that can check for this  error and others, but that's often
     not  enough.  First of all, your programmers  have to run LINT as
     well  as  C  for  each program, which  slows down the compilation
      pass; more importantly, since LINT is in no way part of standard C,
     many C compilers don't have it.

     VAX/VMS C, for instance, doesn't come with LINT; neither does the
     CCS C that's available on the HP3000.

   *  Similarly,  even  things  that  seem like they  ought to work --
     passing  an  integer  in  place of a real  and expecting it to be
     reasonably converted -- will fail badly. Thus,

       sqrt(100)

     won't  work  if  SQRT  expects  a  real; C won't  realize that an
     integer-to-real conversion is required, and will thus pass 100 as
     an integer, which is a different thing altogether.

     A  similar  problem  occurs  on computers (like  the HP3000) that
     represent  byte  pointers  (type  "char *")  and integer pointers
     (type  "int  *"  and  other  pointer types)  differently. Since C
     doesn't  know  which  type of pointer  a procedure expects, it'll
     never  do conversions. If you call a procedure like FGETINFO that
     expects  byte pointers and pass it  an integer pointer, you'll be
     in trouble (unless you manually cast the pointer yourself).

      Incidentally, for ease of using real numbers, C will
      automatically convert all "single-precision real" (called "float"
      in C) arguments to "double-precision real" (called "double") in
      function calls. This makes sure that if SQRT expects a "double",
      passing it a "float" won't confuse it.

   *  On  the  other  hand  (how  many  hands  am  I up  to now?), C's
     conversion  woes  -- requirements of  passing "float"s instead of
     "int"s,  "char  *"s  instead  of "int *"s, etc.  -- are easier to
     solve  than  in  PASCAL.  Since C allows you  to easily convert a
     value  from  one  datatype to another  (using the so-called "type
     casts"), you could say

       my_proc ((float) 100, (char *) &int_value);

     and  thus  pass a "float" and a  "char *" to "my_proc". In PASCAL
     you   couldn't   do   things  this  easily.  The  compiler  might
     automatically translate an integer to a float for you; but, if it
     expects  a  character  value  and  all you've got  is an integer,
     there's no easy way for you to tell it "just pass this integer as
     a byte address, I know what I'm doing."
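
   To  make the variable-number-of-parameters point from a few bullets
back  concrete,  here's a minimal sketch. It isn't guaranteed by K&R C
--  it merely relies on the "extra formals get garbage" behavior most
compilers provide -- and SUM and its parameters are made-up names:

   int sum (n, a, b, c)        /* "n" says how many values follow */
   int n, a, b, c;
   {
   int total;
   total = a;
   if (n >= 2) total = total + b;
   if (n >= 3) total = total + c;  /* c is only touched if it was passed */
   return (total);
   }
   ...
   i = sum (2, 10, 20);        /* c gets garbage, but is never looked at */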

   Thus,  K&R C is flexible enough to  do all that Standard PASCAL can
not. If this is necessary to you -- and I can easily understand why it
would  be; Standard PASCAL's restrictions are very substantial -- then
you'll  have  to  live  with C's lack of  error checking. On the other
hand,  if flexibility is of less than  critical value, you have to ask
yourself  whether  or  not you want the  extra level of compiler error
checking that PASCAL can provide you.

   My  personal experience, incidentally, has been that compiler error
checking of parameters is very nice, but not absolutely necessary. I'd
love  to  have  the  compiler  find  my bugs for me,  but I can muddle
through  without it. PASCAL's  restrictions, though, are substantially
more  grave. More than inconveniences,  they can make certain problems
almost impossible to solve.


                        DRAFT ANSI STANDARD C

   Time,  it  is  said,  heals  all  wounds; perhaps it  can also heal
wounded  computer languages. God knows, FORTRAN 77 isn't the greatest,
but it sure is better than FORTRAN IV.

   The  framers  of  the  new  Draft  ANSI Standard  C have apparently
thought  about  some  of the problems that  C has, especially the ones
with  function  call  parameter checking and  conversion. The solution
seems to be quite good, letting you impose either strict or loose type
checking  --  whichever you prefer --  for each procedure or procedure
parameter. Remember, though, the standard is still only Draft, so it's
not  unlikely  that  any given C compiler you  might want to use won't
have it.


   In Draft Standard C, you can do one of two things:

   *  You can call a procedure the same old way that you'd do in K & R
     C.  No  type  checking, no automatic  conversion, no nothin'. You
     might declare its result type, to wit:

       extern float sqrt();

     (Remember,  you'd have to do that anyway in K&R C; otherwise, the
     compiler  will  treat SQRT's result as  an integer.) But no other
     declarations are required, and no checking will be done.


   *  Alternatively, you can declare a FUNCTION PROTOTYPE. This can be
     done  either for an external function  or for one you're defining
     --  the  prototype  is  very much like  PASCAL's procedure header
     declaration. A sample might be:

       extern int ASCII (int val, int base, char *buffer);

     or simply

       extern int ASCII (int, int, char *);

     [Note  that  the  parameter  NAMES, as opposed  to TYPES, are not
     necessary in a prototype for an EXTERNAL function. For a function
     that  you're  actually  defining,  the  names are  necessary; the
     declarations  in  the  prototype  are  used in place  of the type
     declarations   that  you'd  normally  specify  for  the  function
     parameters.]

     This  function  prototype  tells  the  compiler enough  about the
     function  parameters  for  it  to be able  to do appropriate type
     checking  and  conversion.  One of the reasons  K&R C couldn't do
     that is precisely because of the lack of this information.

Consider  the  cases where this would come  in handy. We might declare
SQRT as

   extern float sqrt (float);

and then a call like

   sqrt (100)

would  automatically be taken to mean "sqrt ((float) 100)", i.e. "sqrt
(100.0)". Similarly,

   sqrt (100, 200)

or

   sqrt ()

would  cause a compiler error or warning, since now the compiler KNOWS
that SQRT takes exactly one parameter.

   In general, say that you have a function declared as

   extern int f(formaltype);   /* or non-extern, for that matter */

This  simply  means  that "f" is a function  that returns an "int" and
takes  one  parameter  of  type "formaltype". Now,  say that your code
looks like:

   actualtype x;
   ...
   i = f(x);

Is  this  kind  of  call  valid or not? Of  course, it depends on what
"formaltype" and "actualtype" are:

   *  If both FORMALTYPE and ACTUALTYPE are numbers -- integers or
     floats, short, long, or whatever -- then X is converted to
     FORMALTYPE before the call. This is what lets us say

        sqrt(100)

     when "sqrt" is declared to take a parameter of type "real".

     (The  same goes the other way -- if "mod" is declared to take two
     "int"s,  then "mod(10.5,3.2)" would  be converted to "mod(10,3)",
     although  the  compiler might print a  warning message to caution
     you that a truncation is taking place.)


   *  If  FORMALTYPE  is  a  pointer  --  which  is  the case  for all
     "by-reference"  parameters,  since  that's how we  pass things by
     reference  in C -- then ACTUALTYPE must be EXACTLY the same type.
     In other words, if we say:

        int copystring (char *src, char *dest)

     then in the call

        char x;
        int y;
        ...
        copystring (x, &y);

     BOTH  parameters will cause an error message. The first parameter
     will  be  a "CHAR" passed where a  "CHAR *" is expected, which is
     illegal -- a good way of checking for attempts to pass parameters
     by  value  where by-reference was  expected. The second parameter
     will  be an "INT *" passed where a "CHAR *" is expected, which is
     also  illegal, since although both are pointers, they don't point
     to the same type of thing.

   *  If  ACTUALTYPE  is  a  pointer,  then FORMALTYPE must  also be a
     pointer  of  EXACTLY  the  same  type. Again, this  is useful for
     catching attempts to pass "by-reference" calls to procedures that
     expect "by-value" parameters, and also attempts to pass a pointer
     to the wrong type of object.

   *  If  either ACTUALTYPE or FORMALTYPE is  a pointer of the special
     type  "void  *",  then the other one may  be any type of pointer.
     This is very useful when we want a parameter to be a BY-REFERENCE
     parameter  of some arbitrary type (similar to PASCAL/XL's ANYVAR,
     for  which  see  below). Thus, if we  want to write our "put_rec"
     procedure  that'll  put  any  type  of  record  structure  into a
     database, we'd say:

        put_rec (char *dbname, char *dbset, void *rec)

     Then, we could say:

        typedef struct {...} sales_rec_type;
        typedef struct {...} emp_rec_type;
        ...
        sales_rec_type srec;
        emp_rec_type erec;
        ...
        put_rec (mydb, sales_set, &srec);
        ...
        put_rec (mydb, emp_set, &erec);

     Both  of  the  PUT_REC  calls  are  valid since  both "&srec" and
     "&erec"  (and, for that matter, any  other pointer) can be passed
     in place of a "void *" parameter. If we'd declared "put_rec" as:

        put_rec (char *dbname, char *dbset, sales_rec_type *rec)

     then  the  "put_rec  (mydb,  emp_set,  &erec)" call  would NOT be
      legal, since "&erec" is NOT compatible with "sales_rec_type *".

     Note  that  on  some machines -- including  the HP3000 -- integer
     pointers and character pointers are NOT represented the same way.
     However, it's always safe to pass either a "char *" or an "int *"
     in  place  of  a  parameter that's declared as  a "void *". The C
     compiler  will always do the  appropriate conversion; thus, if we
     declare the ASCII intrinsic as

        extern int ASCII (int, int, void *);

     then both of the calls below:

        char *cptr;
        int *iptr;
        ...
        i = ASCII (num, 10, cptr);
        ...
        i = ASCII (num, 10, iptr);

     will  be valid (assuming that a  "void *" is actually represented
     as  a byte pointer, which is what the ASCII intrinsic wants). You
     can  thus  think  of  "void  *"  as the "most  general type"; any
     pointer can be successfully passed to a "void *".

   * Note that although you CAN'T pass, say, a "char *" to a parameter
      of type "int *", C will ignore the SIZE of the array that the
      passed pointer points to. In other words, a function such as

        extern strlen (char *s);

     may  be  passed a pointer to a string  of any size -- both of the
     following calls:

        char s1[80], s2[256];
        ...
        i = strlen (s1);
        i = strlen (s2);

     are  valid.  Remember  that  C  makes  no  distinction  between a
     "pointer  to  an  80-byte  array"  and  a "pointer  to a 256-byte
     array";  similarly, it makes no distinction between an array like
     "s1" and a "pointer to a character" (see below).

   *  An interesting exception to the  above rules is that the integer
     constant  0  can  be  passed  to  ANY pointer  parameter. This is
     because  a pointer with value 0  is conventionally used to mean a
     "null pointer".

     This  is quite useful in some applications, but can often prevent
     the compiler from detecting some errors. If I say:

        extern PRINT (int *buffer, int len, int cctl);
        ...
        PRINT (0, -10, current_cctl);

     this  won't, of course, print a "0"; rather, it'll pass PRINT the
     integer  pointer "0", which will point  to God knows what in your
     stack.  Not  a  very serious problem, but  something you ought to
     keep in mind.

   * Unlike Standard PASCAL, not only can you entirely waive parameter
     checking  for a procedure (just omit the prototype!), but you can
     also  explicitly CAST an actual parameter whenever you want it to
     match  the  type of a formal parameter.  In other words, say that
     you declare two structure types:

        typedef struct {...} rec_a;
        typedef struct {...} rec_b;
        rec_a ra;   /* declare a variable of type "rec_a" */
        rec_b rb;   /* declare a variable of type "rec_b" */

     and then write a function

        process_record_a (int x, int y, rec_a *r)
        {
        ...
        }

     If you then say

        process_record_a (10, 20, &rb);

     then  the compiler will (quite  properly) print an error message,
     since  you were trying to pass a  "pointer to rec_b" instead of a
     "pointer  to  rec_a". If you really want  to do this, though, all
     you need to do is say:

        process_record_a (10, 20, (rec_a *) &rb);

     manually  CASTING the pointer "&rb" to  be of type "rec_a *", and
     the compiler won't mind.

   *  Finally,  let  me also point out that,  like everywhere in C, an
     "array  of T" and a "pointer  to T" are mutually interchangeable.
     In other words, if you say:

        extern int string_compare (char *s1, char *s2);

     and then call it as:

        char str1[80], str2[256];
        ...
        if (string_compare (str1, str2)) ...

     the  compiler  won't  mind. To it a "char  *" and a "char []" are
     really one and the same type.

     Somewhat  (but  not  exactly) similarly --  perhaps I should say,
     similarly but differently -- the NAME OF A FUNCTION can be passed
     to  a  parameter  that  is expecting a POINTER  TO A FUNCTION. In
     other words, if you write a procedure

         int do_function_on_array_elems (int (*f)(), int *a, int len);

     (which  takes  a pointer to a function,  a pointer to an integer,
     and an integer), and then call it as:

        do_function_on_array_elems (myfunc, xarray, num_xs);

     the  compiler won't complain (assuming, of course, that MYFUNC is
     really a function and not, say, an integer or a pointer).


   To  summarize, then, Draft Proposed ANSI  Standard C lets you check
function  parameters  almost  as  precisely  as  Standard  PASCAL. The
differences are:


   *  You  can  ENTIRELY  INHIBIT PARAMETER CHECKING  for all function
     parameters by just omitting the function prototype.


   *  You can declare a parameter to BE A BY-REFERENCE PARAMETER OF AN
     ARBITRARY TYPE by declaring it to be of type "void *". You can do
     this  while still enforcing tight type checking for all the other
     parameters.


   *  In addition to overriding type  checking on a PROCEDURE BASIS or
     PROCEDURE PARAMETER basis, you can also override type checking on
     a  particular call by simply casting  the actual parameter to the
     formal parameter's datatype.

   * Unlike PASCAL, C will never check the SIZE of an array parameter;
     only its TYPE.


     STANDARD "LEVEL 1" PASCAL TYPE CHECKING -- CONFORMANT ARRAYS

   If  you recall, one of the  PASCAL features I most complained about
was  the  inability  to  pass  arrays of different  sizes to different
procedures.  This  essentially  prevents you from  writing any sort of
general array handling routine, including:

   *  For  PACKED  ARRAYs  OF  CHAR  --  the way  that Standard PASCAL
     represents  strings -- you can't write things like blank trimming
     routines,  string  searches, or anything  that's intended to take
     PACKED ARRAYs OF CHAR of different sizes.

   *  For  other arrays, the problem is  exactly the same -- you can't
     write  matrix handling routines that work with arbitrary sizes of
      arrays, e.g. matrix addition, multiplication, division, etc.

This  wasn't  the  only  type  checking  problem (others  included the
inability  to  pass  various  record  types to  database I/O routines,
etc.), but it was a major one.

   The ISO Pascal Standard, released in the early 80's, addresses this
problem.  A  new feature called "conformant  arrays" has been defined;
PASCAL  compilers are encouraged, but not required, to implement it. A
compiler is said to

   * "Comply at level 0" if it does not support conformant arrays;

   * "Comply at level 1" if it does support them.

You  see  the problem -- who knows  just how many new PASCAL compilers
will  include  this feature? It is a  fact that most compilers written
before the ISO Standard do NOT include it.

   PASCAL/3000,  for  instance,  does  not have it;  PASCAL/XL, on the
other hand, does.

   What are "conformant arrays"? To put it simply, they are

   *  FUNCTION PARAMETERS that are defined to be ARRAYS OF ELEMENTS OF
     A  GIVEN  TYPE,  but  whose bounds are  NOT defined. Instead, the
     compiler  makes  sure  that  the ACTUAL BOUNDS  of whatever array
     parameter is ACTUALLY passed are made known to the procedure.

An example:

   FUNCTION FIRST_NON_BLANK
            (VAR STR: PACKED ARRAY [LOWBOUND..HIGHBOUND: INTEGER]
                      OF CHAR): INTEGER;
   VAR I: INTEGER;
   BEGIN
   I:=LOWBOUND;
   WHILE (I<HIGHBOUND) AND (STR[I]=' ') DO
     I:=I+1;
   FIRST_NON_BLANK:=I;
   END;

This  procedure  is intended to find the  index of the first non-blank
character  of  STR. Note how it declares  STR: Instead of specifying a
constant  lower  and  upper  bound in the PACKED  ARRAY [x..y] OF CHAR
declaration, it specifies TWO VARIABLES.

   When   the   procedure   is   entered,  the  variable  LOWBOUND  is
automatically  set  to  the  lower  bound  of whatever  array the user
actually passed, and HIGHBOUND is set to the upper bound of the array.

   In other words, if we say:

   VAR MYSTR: PACKED ARRAY [1..80] OF CHAR;
   ...
   I:=FIRST_NON_BLANK (MYSTR);

then,  in FIRST_NON_BLANK, the variable LOWBOUND  will be set to 1 and
HIGHBOUND  will  be  set  to  80.  Instead  of just  passing the MYSTR
parameter, PASCAL actually passes "behind your back" 1 and 80 as well.

   The way I see it, this is a very good solution, even better in some
ways  than C's (in which you can always pass an array of any arbitrary
size):

   *  You're no longer restricted (like you are in Standard PASCAL) to
     a fixed size for your array parameters.

   * When you pass an array to a conformant array parameter, you don't
     have  to manually specify the size of the array; the array bounds
     are  automatically  passed  for you. If I  were to write the same
     procedure in C, I'd have to say

       int first_non_blank (str, maxlen)
       char str[];
       int maxlen;
       ...

     and  then  manually pass it both the  string and the size that it
     was  allocated with; otherwise, the  procedure won't know when to
     stop  searching  (assuming  you  don't use the  convention that a
     string is terminated by a null or some such terminator).

   *  Since  the  compiler  itself  knows  what  the  conformant array
     parameter's bounds are (it doesn't know the actual values, but it
     does  know  what  variables  contain  the  values),  it  can emit
     appropriate run-time bounds checking code. This can automatically
     catch  some  errors at run-time, which is  good if you like heavy
     compiler-generated error checking.

   *  Conformant arrays are even better for two-dimensional arrays. To
     index  into a two-dimensional array the compiler must, of course,
     know  the number of columns in the array (assuming it's stored in
     row-major  order, as C and PASCAL 2-D arrays are). In C, you must
     either declare the number of columns as a constant, e.g.

       matrix_invert (m)
       float m[][100];

     or  declare  the  parameter  as  a 1-D array,  pass the number of
     columns  as  a  parameter, and then do  your own 2-D indexing, to
     wit:

       matrix_invert (m, numcols)
       float m[];
       int numcols;
       ...
       element = m[row*numcols+col];  /* instead of M[ROW,COL] */
       ...

    In ISO Level 1 PASCAL, you just declare the procedure as:

       PROCEDURE MATRIX_INVERT (M: ARRAY [MINROW..MAXROW,
                                          MINCOL..MAXCOL] OF REAL);

    Then  you automatically know the bounds  of the array AND can also
    do  normal  array indexing (M[ROW,COL]),  since the compiler knows
    the number of columns, too.
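
     For instance, here's a minimal Level 1 sketch of a routine that
     works on a matrix of any size (the function name is made up, and
     the bounds are written in the fuller form, with their types
     spelled out):

        FUNCTION MATRIX_TOTAL (VAR M: ARRAY [MINROW..MAXROW: INTEGER;
                                             MINCOL..MAXCOL: INTEGER]
                                       OF REAL): REAL;
        { adds up every element of M, whatever M's actual bounds are }
        VAR ROW, COL: INTEGER;
            TOTAL: REAL;
        BEGIN
        TOTAL:=0.0;
        FOR ROW:=MINROW TO MAXROW DO
          FOR COL:=MINCOL TO MAXCOL DO
            TOTAL:=TOTAL+M[ROW,COL];
        MATRIX_TOTAL:=TOTAL;
        END;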


This,  it  seems, is how original  Standard PASCAL should have worked,
and  I'm glad that the standards  people have established it. The only
problems are:


    * This is, of course, somewhat less efficient than not passing the
      bounds  or just passing, say, the upper bound (like you would in
      C).


    *  Remember  that  this only fixes the case  where we want to pass
      differently  sized  arrays  to  a procedure. If  we want to pass
      different  TYPES  (like  in  our  PUT_REC procedure  that should
      accept  one of several database record types), conformant arrays
      won't help us.

    *  Most importantly, MANY PASCAL  COMPILERS MIGHT NOT SUPPORT THIS
      WONDERFUL  FEATURE. In particular,  PASCAL/3000 DOES NOT SUPPORT
      CONFORMANT ARRAYS.


                       PASCAL/XL TYPE CHECKING

   PASCAL/XL  obeys all of PASCAL's type checking rules, but gives you
a number of ways to work around them:

   *  PASCAL/XL  supports  the  CONFORMANT  ARRAYS that  I just talked
     about.

   * PASCAL/XL allows you to specify a variable as "ANYVAR", e.g.

       PROCEDURE PUT_REC (VAR DB: TDATABASE;
                          S: TDATASET;
                          ANYVAR REC: TDBRECORD);

     What  this  means to PASCAL is that,  when PUT_REC is called, the
     third parameter (REC) will NOT be checked. Inside PUT_REC, you'll
     be  able to refer to this parameter  as REC, and to PUT_REC it'll
     have  the type TDBRECORD; however, the CALLER need not declare it
     as TDBRECORD. For instance,

       VAR SALES_REC: TSALES_REC;
           EMP_REC: TEMP_REC;
       ...
       PUT_REC (MY_DB, SALES_DATASET, SALES_REC);
       ...
       PUT_REC (MY_DB, EMP_DATASET, EMP_REC);

     will  do  EXACTLY what we want it  to -- it'll pass SALES_REC and
     EMP_REC  to our PUT_REC procedure without complaining about their
     data types.

     As  I  said,  PUT_REC  itself  will view the  REC parameter as an
     object of type TDBRECORD. However, PUT_REC can say

       SIZEOF(REC)

     and  determine  the  TRUE  size of the  actual parameter that was
     passed  in place of REC. This can be very useful if PUT_REC needs
     to  do  an  FWRITE or some such operation  that needs to know the
     size of the thing being manipulated.

     The  way  this is done, of course,  is by PASCAL/XL's passing the
     size  of the actual parameter as well as the parameter's address.
     Incidentally,  you  can  turn  this off for  efficiency's sake if
     you're not going to use this SIZEOF construct.

   *  PASCAL/XL  allows  you  to  do TYPE COERCION --  you can take an
     object  of  an arbitrary type and view  it as any other type. For
     instance,  you can take a generic  "ARRAY OF INTEGER" and view it
     as  a record type, or take an  INTEGER parameter and view it as a
     FLOAT. A possible application might be:

       TYPE COMPLEX = RECORD REAL_PART, IMAG_PART: REAL; END;
            INT_ARRAY = ARRAY [1..2] OF INTEGER;
       ...
       PROCEDURE WRITE_VALUE (T: INTEGER; ANYVAR V: INT_ARRAY);
       BEGIN
       IF T=1 THEN WRITELN (V[1])
       ELSE IF T=2 THEN WRITELN (FLOAT(V))
       ELSE IF T=3 THEN WRITELN (BOOLEAN(V))
       ELSE IF T=4 THEN WRITELN (COMPLEX(V).REAL_PART,
                                 COMPLEX(V).IMAG_PART);
       END;

     As  you  see,  this  procedure  takes a type  indicator (T) and a
     variable  of  any  type V. Then, depending on  the value of T, it
     VIEWS  V as an integer, a float, a boolean, or a record structure
     of type COMPLEX. All we need to do is say

       typename(value)

     and  it returns an object with  EXACTLY THE SAME DATA as "value",
     but viewed by the compiler as being of type "typename". Note that
     this  means that "REAL(10)" won't return  10.0 (which is what a C
     "(float)  10"  type  cast  would  do);  rather, it'll  return the
     floating point number the MACHINE REPRESENTATION of which is 10.

     Some  other  example applications for  this very useful construct
     are:

       -  You can now have a pointer variable that can be set to point
         to  an object of an arbitrary  type; this allows you to write
         things like generic linked list handling procedures that work
         regardless  of what type of  object the linked list contains.
          More about this when I get to ANYPTR, below.

       -  You  may  write a generic bit  extract procedure that can be
         used  for  extracting bits from  characters, integers, reals,
         etc. You'd declare it as:

           FUNCTION GETBITS (VAL, STARTBIT, LEN: INTEGER): INTEGER;
           ...

         and call it using

           I:=GETBITS (INTEGER(3.0*EXP(X)), 10, 6);

         or

           I:=GETBITS (INTEGER(MYSTRING[I]), 5, 1);

         or  whatever.  Note  that  you  couldn't do  this with ANYVAR
         parameters since ANYVAR parameters are by-reference, and thus
         can't be passed constants or expressions.

   *  PASCAL/XL -- just like PASCAL/3000 -- makes STRING parameters of
     any  size  compatible  with  each  other.  Thus,  you can  pass a
     STRING[20]  to a procedure that's  defined to take a STRING[256];
     or,  if  you're  passing  the  string by REFERENCE,  you can just
     declare   the   formal  parameter  as  "STRING",  which  will  be
     compatible with any string type.

   * PASCAL/XL has a new type called "ANYPTR"; declaring a variable to
     be  an  ANYPTR  makes  it "assignment-compatible"  with any other
     pointer  type, which means that that  variable can be easily made
      to point to objects of different types. This, coupled with the
      "type coercion" operation mentioned above, makes manipulating,
      say, linked lists of different data structures much easier.
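
   A minimal sketch of how ANYPTR and type coercion work together (the
names here are made up, and I'm assuming that the "typename(value)"
coercion form described above applies to pointer types as well):

      TYPE PINT = ^INTEGER;
           PREAL = ^REAL;
      VAR P: ANYPTR;
          IP: PINT;
          RP: PREAL;
          I: INTEGER;
      ...
      NEW (IP);
      IP^:=42;
      P:=IP;            { legal -- ANYPTR accepts any pointer type }
      NEW (RP);
      P:=RP;            { and so is this }
      ...
      P:=IP;
      I:=PINT(P)^;      { coerce P back to a typed pointer when you }
                        { know what it really points at             }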

   Needless  to  say, use of any of  these constructs can get you into
trouble  precisely  because  of the additional  freedom they give you.
Converting  a chunk of data from one  record data type to another only
makes  sense  if  you  know  exactly what you're  doing; if you don't,
you're likely to end up with garbage.

   However,  often  there  are  cases  where you  NEED this additional
freedom,  and  in  those  cases, PASCAL/XL really  comes through. As a
rule,  its  type  checking  is  as stringent and  thorough as Standard
PASCAL's,  but  it  allows  you  to  relatively easily  waive the type
checking whenever you need to.


                        ENUMERATED DATA TYPES

   If  you recall, before I started talking about type checking, I was
describing  RECORD  STRUCTURES,  a  new  data  type that  PASCAL and C
support.  My  mind,  you  see,  works  like a stack  -- sometimes I'll
interrupt  what  I'm  doing  and  go  off  on a  digression (sometimes
relevant,  sometimes not); then, I'll just POP the stack, and I'm back
where I was before.

   So,  I'm  popping  the stack and continuing  with the discussion of
"new"  data types -- data types that  C and/or PASCAL support, but SPL
does not.

   Say  you want to call the FCLOSE intrinsic. You pass to it the file
number  of  the  file  to  be  closed  and  you  also pass  the file's
DISPOSITION.  This disposition is a  numeric constant, indicating what
the system is to do with the file being closed:

   FCLOSE (FNUM, 0, 0);   means just close the file;
   FCLOSE (FNUM, 1, 0);   means SAVE the file as a permanent file;
   FCLOSE (FNUM, 2, 0);   means save the file as a TEMPORARY file;
   FCLOSE (FNUM, 3, 0);   means save the file as a temporary file,
                          but if it's a tape file, DON'T REWIND;
   FCLOSE (FNUM, 4, 0);   means DELETE the file being closed;

[we'll ignore for now the "squeeze" disposition and the fairly useless
third  parameter.]  Now,  naturally,  today's  enlightened  programmer
doesn't  want to specify the disposition  as a numeric constant -- how
many people will understand what's going on if they see a

   FCLOSE (FNUM, 4, 0);

in the middle of the program? Instead, we'd define some constants --

   EQUATE DISP'NONE = 0,
          DISP'SAVE = 1,
          DISP'TEMP = 2,
          DISP'NOREWIND = 3,
          DISP'PURGE = 4;

Now, we can say

   FCLOSE (FNUM, DISP'PURGE, 0);

Don't you like this better? I knew you would.

   As  you  see,  in  this  case,  an  integer is being  used not as a
QUANTITATIVE  measure  (how  large  a  file  is,  how many  seconds an
operation  took, etc.), but rather as a sort of FLAG. This flag has no
mnemonic  relationship to its numeric value; the numeric value is just
a  way  of  encoding  the operation we're  talking about (save, purge,
etc.).

   This  sort  of  application  actually occurs  very frequently. Some
examples might include:

   *  FFILEINFO  item codes, which indicate  what information is to be
     retrieved (4 = record size, 8 = filecode, 18 = creator id, etc.).

   *  CREATEPROCESS  item  numbers,  which indicate  what parameter is
     being passed (6 = maxdata, 8 = $STDIN, 11 = INFO=, etc.).

   *  FOPEN foptions bits -- 1 = old permanent, 2 = old temporary, 4 =
     ASCII  file,  64 = variable record length  file, 256 = CCTL file,
     etc.; same for aoptions bits.

   *  And  many  other  cases;  each  system  table  you look  at, for
     instance,  will  contain at least two or  three of these sorts of
     encodings.

   As  I  mentioned,  SPL's  solution to this sort  of problem is just
declaring  constants  (using  EQUATE). Similarly, in  PASCAL you could
easily say:

   CONST DISP_NONE = 0;
         DISP_SAVE = 1;
         DISP_TEMP = 2;
         DISP_NOREWIND = 3;
         DISP_PURGE = 4;

and in C, you could code:

   #define disp_none 0
   #define disp_save 1
   #define disp_temp 2
   #define disp_norewind 3
   #define disp_purge 4

Nice  and readable; the constant  declaration creates the link between
the  symbolic  name and the real numeric  value -- after this, you can
use the symbolic name wherever you need to.

   Enumerated  data  types  are  just  like  this, only  different. In
PASCAL, you could say

   TYPE FCLOSE_DISP_TYPE = (DISP_NONE, DISP_SAVE, DISP_TEMP,
                            DISP_NOREWIND, DISP_PURGE);

Instead of just defining five constants with values 0, 1, 2, 3, and 4,
this  declaration defines a new  DATA TYPE called FCLOSE_DISP_TYPE and
five OBJECTS of this type -- DISP_NONE, DISP_SAVE, etc. If you declare
the FCLOSE intrinsic as

   PROCEDURE FCLOSE (FNUM: INTEGER; DISP: FCLOSE_DISP_TYPE;
                     SEC: INTEGER);
   EXTERNAL;

you'll now be able to say

   FCLOSE (FNUM, DISP_PURGE, 0);

    The  key difference between an  "ENUMERATED TYPE" declaration and
the ordinary constant definitions is that the objects of the data type
can't be used as integers. For instance, saying this:

   VAR DISP: FCLOSE_DISP_TYPE;
   ...
   DISP:=1;

is an error, and you certainly can't say:

   DISP:=DISP_SAVE*DISP_PURGE;

   In  fact, if you've declared FCLOSE as was shown above, then PASCAL
will  even check the DISP parameter to make sure you're really passing
a DISP_xxx; if you accidentally pass something else, the compiler will
catch this and complain.
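
   For instance, a small sketch using the declarations above:

      FCLOSE (FNUM, DISP_PURGE, 0);   { fine }
      FCLOSE (FNUM, 4, 0);            { ERROR -- 4 is an INTEGER, }
                                      { not a FCLOSE_DISP_TYPE    }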

   As  you see, the advantage of  enumerated types is type checking (a
field  which  PASCAL,  in  general,  is rather  compulsive about). The
disadvantage is this:

   *  How are you certain that  when you declared the enumerated type,
     DISP_SAVE  actually  corresponded  to  1 and DISP_PURGE  to 4? In
     other  words, when you pass a  disposition to FCLOSE, PASCAL must
     pass  it  as  some  integer value -- if you  had declared it as a
     constant,  you'd KNOW the value; with an enumerated type, how are
     you sure?

Well,  although Standard PASCAL doesn't define what the "ACTUAL" value
of an enumerated type object is, most PASCALs -- including PASCAL/3000
and  PASCAL/XL -- number the objects from  0 up, in the order given in
the   enumerated   type   declaration.   This   is   what   lets   our
FCLOSE_DISP_TYPE  type work; the way that PASCAL allocates the numeric
values of the DISP_xxx objects is exactly the way we want it to.

   On  the  other  hand,  say that I want  to define file system error
numbers  (which  FCHECK might return). These  are also special numeric
codes  that we'd like to access using symbolic names, but they are NOT
sequentially ordered. For instance, you might want to declare

   CONST FERR_EOF = 0;
         FERR_NO_FILE = 52;
         FERR_SECURITY = 93;
         FERR_DUP_FILE = 100;

How  can you declare this as an enumerated data type? Well, you can't,
unless you're willing to declare 51 "dummy items" between FERR_EOF and
FERR_NO_FILE  so  that  FERR_NO_FILE  will  fall  on  52.  In general,
wherever  there  are  "holes" in the  sequence, enumerated types can't
really be used.

   Now,  this  is  not  a  complaint against enumerated  types per se.
Enumerated  types are great as long as  YOU DON'T CARE WHAT THE VALUES
OF  THE ENUMERATED TYPE OBJECTS ARE; if the type is used solely within
your  programs, you won't have any problems. The trouble comes in when
you  try to use enumerated types for objects whose values are dictated
externally.

   To summarize,

   * Enumerated types are very similar to constant declarations.

   * Enumerated types' big advantages are:

     -  The compiler does type checking for them, making sure that you
       don't  accidentally use, say, an  FOPEN foptions mode where you
       ought to use an FCLOSE disposition.

     -  You  don't  have  to  manually assign a  numeric value to each
       enumerated type object.

   *  Enumerated  types  are great if they're  defined and used solely
     within  your  program,  where  you  don't  care  what  values the
     compiler assigns to each object.

   *  If  you're  using  enumerated  types to  represent objects whose
     actual  value is important --  say FCLOSE dispositions, FFILEINFO
     item numbers, file system errors -- you may have troubles. If the
     actual  values are numbered sequentially starting with 0, you can
     use  an enumerated type to represent  these values; if they don't
     start  with  0  or  are  not sequential, you  can't really use an
     enumerated type.

   *  In general, even if the  values ARE numbered sequentially from 0
     (like  FCLOSE dispositions, FFILEINFO item numbers, CREATEPROCESS
     item  numbers,  etc.) you might want  to use constants instead of
     enumerated  types. This is because the numeric assignments aren't
     easily   visible   in   enumerated   type  declarations;  if  you
     accidentally  omit  a  possible  value (e.g. declare  the type as
     (NONE, SAVE, TEMP, PURGE), omitting NOREWIND), it won't be at all
     obvious that PURGE now has the wrong value.
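
   To see the hazard, consider this small sketch (ORD is the standard
PASCAL function that returns an object's ordinal value):

      PROGRAM SHOW_THE_HAZARD (OUTPUT);
      TYPE FCLOSE_DISP_TYPE = (DISP_NONE, DISP_SAVE, DISP_TEMP,
                               DISP_PURGE);  { oops -- DISP_NOREWIND }
                                             { was left out          }
      BEGIN
      WRITELN (ORD(DISP_PURGE));   { prints 3, not the 4 FCLOSE wants }
      END.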


                      ENUMERATED DATA TYPES IN C

   Classic  K&R  C  did not support enumerated  types; as we saw, this
probably  wasn't  such a big disadvantage,  since enumerated types are
just a fancy form of constant declarations.

   Draft  ANSI Standard C -- and, in  fact, most modern Cs -- supports
enumerated types; you can say

   typedef enum { disp_none, disp_save, disp_temp,
                  disp_norewind, disp_purge } fclose_disp_type;

which  will  define  the  type  FCLOSE_DISP_TYPE  just  like  PASCAL's
enumerated  type  declaration  will.  In  fact, the  numeric values of
DISP_NONE,  DISP_SAVE, etc. will even be assigned the same way as they
would be with PASCAL.

   The  trouble  is  this:  what  was  the  major advantage  of PASCAL
enumerated types over constants?

   Well,  once  the  PASCAL  compiler  knew that a  variable was of an
enumerated type, it could do appropriate type checking. But C isn't a
strong type checking language! To C, any object of any enumerated type
is viewed exactly as an integer would be viewed.

   Thus, the above declaration is EXACTLY the same as:

   #define disp_none 0
   #define disp_save 1
   #define disp_temp 2
   #define disp_norewind 3
   #define disp_purge 4

If you say

   fclose_disp_type disp;

(thus declaring DISP to be an object of FCLOSE_DISP_TYPE), you can now
code

   disp = disp_save;

but you could also (if you wanted to) say

   disp = 1;

or

   disp = (i+123)/7;

   One advantage that C has, though, is that (unlike PASCAL) it allows
you  to  explicitly specify the numeric values  of each element in the
enumeration, to wit:

   typedef enum { ferr_eof = 0, ferr_no_file = 52,
                  ferr_security = 93, ferr_dup_file = 100 }
           file_error_type;

The  DEFAULT  sequence, you see, is from  0 counting up by 1; however,
you can override it with any initializations you want.

   In  other words, in C, enumerated  type declarations are truly just
another way of defining integer constants. The above declaration is in
fact identical to

   #define ferr_eof 0
   #define ferr_no_file 52
   #define ferr_security 93
   #define ferr_dup_file 100


   To summarize:

   * Enumerated data types in PASCAL =
       Just like ordinary constants +
       Type checking.

   * Enumerated data types in Draft ANSI Standard C =
       Enumerated data types in PASCAL -
       Type checking.

   * Ergo,
     Enumerated data types in Draft ANSI Standard C =
       Just like ordinary constants!

See how easy things become if you use a little mathematics?


                       SUBRANGE TYPES IN PASCAL

   Another  new category of data type that PASCAL has is the so-called
subrange  type.  It is in some  ways the quintessential PASCAL feature
because  it  really  performs  NO  NEW  FUNCTION  except  for allowing
additional compiler type checking.

   In PASCAL, you can declare a variable thus:

   VAR V: 1..100;

This  means that V is defined to always be between 1 and 100. It is an
error for it to be outside of these bounds, and the PASCAL compiler may
generate code to check for this (PASCAL/3000 certainly does).

   Now,  fortunately,  the  type  checking  on  this  isn't  quite  as
stringent as on other types. In other words, if you declare:

   TYPE RANGE1 = 1..10000;
        SMALL_RANGE = 100..199;
   VAR SM: SMALL_RANGE;
   ...
   PROCEDURE P (NUM: RANGE1);

then you can still call

   P (SM);

even though SMALL_RANGE and RANGE1 are not the same type.


   On the other hand, if NUM is a BY-REFERENCE (VAR) parameter, i.e.

   PROCEDURE P (VAR NUM: RANGE1);

then saying

   P (SM);

will be an error! Any by-reference parameter MUST be an IDENTICAL type
(i.e.  either  the  same type or one  defined as identical, i.e. "TYPE
NEWTYPE  =  OLDTYPE"). Different subranges of  the same type (even two
differently-named  and separately-defined types  whose definitions are
identical!) are FORBIDDEN.

   If the full implications of this haven't sunk in yet, consider this
procedure:

   TYPE TPAC256 = PACKED ARRAY [1..256] OF CHAR;
   PROCEDURE COUNT_CHARS (VAR S: TPAC256;
                          VAR NUM_BLANKS: INTEGER;
                          VAR NUM_ALPHA: INTEGER;
                          VAR NUM_NUMERIC: INTEGER;
                          VAR NUM_SPECIALS: INTEGER);

This  one's simple -- it goes through a string S and counts the number
of  blanks,  alphabetic  characters,  numeric  characters,  and  other
"special"  characters;  all  the  counts  are returned  as integer VAR
parameters.

   The  variables that we pass  as NUM_BLANKS, NUM_ALPHA, NUM_NUMERIC,
and NUM_SPECIALS can NOT be declared as subrange types! If we say:

   VAR NBLANKS, NALPHA, NNUMERIC, NSPECIALS: 1..256;
   ...
   COUNT_CHARS (S, NBLANKS, NALPHA, NNUMERIC, NSPECIALS);

the compiler WON'T just check the variables after the COUNT_CHARS call
to  make  sure  that COUNT_CHARS didn't set  them to the wrong values;
rather, THE COMPILER WILL PRINT AN ERROR MESSAGE!

   If you still insist on using subrange types for this sort of thing,
you  get into the ridiculous circumstance in which YOU NEED A SEPARATE
COUNT_CHARS  PROCEDURE  FOR  EACH  POSSIBLE  TYPE  COMBINATION  OF THE
NUM_BLANKS, NUM_ALPHA, NUM_NUMERIC, AND NUM_SPECIALS PARAMETERS!

   This  is  why  I'm skeptical of the  utility of subrange variables.
It's  great for the compiler to be  able to do run-time error checking
and  warn me of any errors in  my program; however, I can never really
pass "by reference" subrange variables to any general-purpose routine!

   On  the  one  hand,  we  are  told that it's great  to have lots of
general-purpose  utility procedures that can be  called by a number of
other  procedures in a number of  possible circumstances; on the other
hand, we're prevented from doing this by too-stringent type checking!

   Thus, to summarize:

   *  Subrange  types are theoretically useful as  a way of giving the
     compiler more information with which to do run-time checking.

   *  However, their utility is SERIOUSLY COMPROMISED by the fact that
     you  can't, for instance, pass a  subrange type by reference to a
     procedure that expects an INTEGER (or vice versa).

     This  is especially damaging if you  like to (and you should like
     to)   write  general-purpose  procedures  --  your  only  serious
     alternative  there  is  to  declare any  by-reference parameters'
     types  to  be INTEGER and make sure  that all the variables you'd
     ever want to pass to such a procedure are type INTEGER too.

   SPL  and  C,  not  being  very strict  type-checkers, don't support
subranges.  In light of all I've said, this doesn't seem to be such an
awful lack.

   Finally,  one  more  important  comment about subrange  types. As I
mention  in the "BIT OPERATORS" section  of this paper, subrange types
(in   PACKED   RECORDs   and  PACKED  ARRAYs)  are  PASCAL/3000's  and
PASCAL/XL's mechanism for accessing bit fields.

   This  is  NOT endorsed or supported by  the PASCAL Standard, but it
turns  out to be one of the most useful applications of subrange types
in PASCAL/3000 and PASCAL/XL.


                           DATA ABSTRACTION

   When  I  was  converting  our MPEX/3000 product to  run on both the
pre-Spectrum   and  Spectrum  machines,  I  had  to  overcome  several
problems.

   One  was, of course, that some (although not all) of the privileged
procedures  and  operations  that  I  did  had  to  be  done  somewhat
differently on MPE/XL.

   My conversion here was helped by the concept of "code isolation" --
rather  than  putting  various calls to, say,  DIRECSCAN or FLABIO all
over my program, I isolated them in individual procedures. Then, all I
had  to  do  was  replace those "wrapping" procedures,  and all of the
programs that called them didn't have to be changed.

   Another  problem was that some of  the tables (like the file label,
directory  entries,  ODD, JIT, etc.), though  similar in principle and
containing  much  the  same  fields,  had different  offsets for those
fields.

   Here  I was helped by the  fact that I never explicitly referenced,
say,  the  filecode  field  of  the  file label  by saying "FLAB(26)".
Instead, I had an $INCLUDE file that DEFINEd the token "FLAB'FCODE" to
be  "FLAB(26)"  --  all  I had to do was  change the $INCLUDE file and
again the rest of my programs didn't need to be changed.

   One  area,  though,  that  gave  me more trouble  than I would have
expected was the changing size of some fields.

   Not the changing meaning -- the file label still contained a record
size field and a block size field, and the directory still contained a
file  label  address -- but rather the  changing SIZE. The record size
and  the  block  size  were now 2 words rather  than 1; the file label
address was 10 words instead of 2.

   Consider a few of my procedures:

   DOUBLE PROCEDURE ADDRESS'FNUM (FNUM);
   VALUE FNUM;
   INTEGER FNUM;
   << Given a file number, returns its file label's disc address. >>

   DOUBLE PROCEDURE ADDRESS'NAME (FILENAME);
   BYTE ARRAY FILENAME;
   << Given a filename, returns its file label's disc address. >>

   PROCEDURE FLABREAD (ADDR, FLABEL);
   VALUE ADDR;
   DOUBLE ADDR;
   ARRAY FLABEL;
   << Given a file label's disc address, reads the file label. >>

   The  plan  here  is  that  FLABREAD  is the master  file label read
procedure, to which we pass a disc address. We can either say

   FLABREAD (ODD'FLAB'ADDR, FLABEL);   << if we have the address >>

or

   FLABREAD (ADDRESS'FNUM(IN'FNUM), FLABEL);

or

   FLABREAD (ADDRESS'NAME(PROG'FILENAME), FLABEL);

Convenient, readable, general. What's wrong with it?

   This  mechanism was quite acceptable for MPE/III, MPE/IV, and MPE/V
because  then  the  disc  address  was a double  integer. In MPE/XL it
changed  to a 10-word array. Any places that explicitly refer to it as
a DOUBLE must be changed to call it an INTEGER ARRAY.

   "Data  abstraction"  refers  to exactly this  concern. Don't call a
disc  address  "a  double integer". Rather call  it "an object of type
DISC_ADDRESS". In PASCAL terms, don't say:

   PROCEDURE FLABREAD (ADDR: INTEGER; VAR F: FLABEL);

Say

   TYPE DISC_ADDRESS = INTEGER;   { double integer }
   ...
   PROCEDURE FLABREAD (ADDR: DISC_ADDRESS; VAR F: FLABEL);

Then,  when you need to change to MPE/XL, all you need to do is change
the TYPE declaration to

   TYPE DISC_ADDRESS = ARRAY [1..10] OF SHORTINT;   { 10 words }

and  you're home free. Of course,  you'll doubtless have to change the
IMPLEMENTATION  of  FLABREAD (if the disc  address format has changed,
probably the way of accessing it has, too); however, you won't have to
touch any of the CALLERS of FLABREAD.

   So   that's   the  first  component  of  data  abstraction  --  the
responsibility  of  the programmer for declaring  objects not with the
type  they  happen  to  have  -- say, INTEGER --  but rather with some
"abstract type" (DISC_ADDRESS) that is defined elsewhere as INTEGER.

   The  second  component  of  data abstraction, though,  is much less
obvious. Say that you said

   { in PASCAL }
   TYPE DISC_ADDRESS = ARRAY [1..10] OF SHORTINT;
   ...
   FUNCTION ADDRESS_FNUM (FNUM: INTEGER): DISC_ADDRESS;

   { in C }
   typedef int disc_addr[10];
   ...
   disc_addr address_fnum (fnum);
   int fnum;

   { in SPL }
   DEFINE DISC'ADDRESS = INTEGER ARRAY #;
   DISC'ADDRESS ADDRESS'FNUM (FNUM);
   VALUE FNUM;
   INTEGER FNUM;

All  of  these operations would make sense  -- instead of returning an
integer,  ADDRESS'FNUM  is  to return an  object of type DISC_ADDRESS,
which happens to be an integer array.

   The  trouble here is that in neither  Standard PASCAL nor C nor SPL
can a procedure return an integer array!

   Thus,  "hiding"  the  type  of an object from  most of the object's
users  is  very  nice,  but ONLY IF THE  COMPILER PERMITS IT TO REMAIN
HIDDEN. In another example, in SPL, saying

   FOR I:=1 UNTIL RECSIZE DO

is  only  legal  if  RECSIZE  is  an  integer. If RECSIZE  is a double
integer,  all  the  data  abstraction in the world  will do us no good
because the SPL compiler itself will reject the above FOR loop.

   To  be  truly able to have "data abstraction"  -- to be able to not
care  about an object's underlying representation type -- the compiler
must treat all the possible types as equally as possible.

   Considering  again the case of the  disc address, there's no way we
can have an SPL procedure return anything that can represent a 10-word
value. We'd have to write ADDRESS'FNUM as

   PROCEDURE ADDRESS'FNUM (FNUM, RETURN'VALUE);
   VALUE FNUM;
   INTEGER FNUM;
   INTEGER ARRAY RETURN'VALUE;

and then call it as:

   DOUBLE TEMP'DISC'ADDR;
   ...
   ADDRESS'FNUM (FNUM, TEMP'DISC'ADDR);
   FLABREAD (TEMP'DISC'ADDR, FLAB);

instead of simply

   FLABREAD (ADDRESS'FNUM(FNUM), FLAB);

This  is, of course, less convenient, which  is why I kept the address
as  a double integer instead of an integer array -- and got stuck when
I converted to Spectrum.

   In Standard PASCAL, as I said, I couldn't have a function returning
an  integer array. PASCAL/3000, though,  lifts this restriction -- you
can now say:

   TYPE DISC_ADDRESS = ARRAY [1..10] OF SHORTINT;
   ...
   FUNCTION ADDRESS_FNUM (FNUM: INTEGER): DISC_ADDRESS;
   ...
   FLABREAD (ADDRESS_FNUM(FNUM), FLAB);

Since  the function is allowed to return an integer array, we can keep
the  same  interface  regardless  of whether DISC_ADDRESS  is a double
integer  or  an array. Of course, the  efficiency of the code won't be
quite  the  same;  similarly,  the  internals  of  ADDRESS_FNUM  would
doubtless  be somewhat different. However, the callers of ADDRESS_FNUM
wouldn't  have  to change a whit despite  the change in the underlying
definition of the DISC_ADDRESS type.

   In C (K&R or Draft Standard), functions can't return arrays,
either. However, Draft Standard C functions (and most latter-day
compilers, though not the original K&R definition) can return
structures, and a structure might well contain only one element -- an
array. Thus, we could say

   typedef struct {int x[10];} disc_address;
   ...
   disc_address address_fnum(fnum)
   int fnum;
   ...
   flabread (address_fnum(fnum), flab);

Of  course,  it  isn't quite as convenient  to manipulate an object of
type DISC_ADDRESS as it would be if it were a simple array (instead of
"discaddr[3]=ldev",  we have to say "discaddr.x[3]=ldev"), but this is
a reasonable alternative.

   Again,  note how we can easily switch the underlying representation
of  DISC_ADDRESS  to,  say,  a  double  integer,  or a  long float, or
whatever, without changing the fundamental structure of the procedures
that use DISC_ADDRESSes.

   Similarly,  compare  the  SPL  treatment  of  INTEGERs  and DOUBLEs
against  the  PASCAL  treatment  of  SHORTINTs  (1-word  integers) vs.
INTEGERs (2-word integers) or the C treatment of "short int"s (1-word)
vs. "long int"s (2-word). In SPL, INTEGERs and DOUBLEs are mutually
INCOMPATIBLE -- you can't say:

   DOUBLE D;
   INTEGER I;
   D:=I+D;

In PASCAL, though,

   TYPE SHORTINT = -32768..32767;   { this is built into PASCAL/XL }
   VAR S: SHORTINT;
       I: INTEGER;
   I:=I+S;

will work, as will

   short int s;
   long int i;
   i = i + s;

in C.

   A  similar thing, incidentally, can be  said about SPL, PASCAL, and
C's  handling of real numbers. SPL's  REAL and LONG (double precision)
types   are   incompatible;   in  PASCAL  and  C  dialects  where  two
floating-point  types  are  provided  (remember,  neither  language is
OBLIGATED   to   provide  more  than  one  floating-point  type),  the
floating-point types are always mutually compatible.

   What  this means, of course, is that it's quite easy in PASCAL or C
to  change the type of a variable from "short integer" or "short real"
to  "long  integer" or "long real", or  vice versa; in SPL, it's quite
difficult, since we'll have to put in a lot of manual type conversions
to make sure everything stays consistent.
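
   To tie this back to the data abstraction discussion, a minimal
PASCAL sketch (the type and variable names are made up; SHORTINT is
the 1-word integer type mentioned above):

      TYPE REC_SIZE_TYPE = SHORTINT;    { one word today... }
      { TYPE REC_SIZE_TYPE = INTEGER; } { ...two words tomorrow }
      VAR RECSIZE: REC_SIZE_TYPE;
          TOTAL: INTEGER;
      ...
      TOTAL:=TOTAL+RECSIZE;   { legal with either declaration }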

   To  summarize,  then,  the  differences  in  the  way  the  various
languages handle data types:


[Note: "STD PAS" refers to both Standard PASCAL and the ISO Level 1
Standard.]

                                            STD  PAS/ PAS/ K&R  STD
                                       SPL  PAS  3000 XL   C    C

CAN A FUNCTION RETURN ANY OBJECT?
  CAN IT RETURN AN ARRAY?              NO   NO   YES  YES  NO   NO
  CAN IT RETURN A RECORD?              NO   NO   YES  YES  NO   YES

CAN A FUNCTION OR A PROCEDURE
  HAVE ANY OBJECT AS A "BY-VALUE" PARM?
  CAN IT HAVE A BY-VALUE ARRAY?        NO   YES  YES  YES  NO   NO
  CAN IT HAVE A BY-VALUE RECORD?       NO   YES  YES  YES  YES  YES

DOES AN ASSIGNMENT STATEMENT COPY
  ANY TYPE OF OBJECT?
  CAN IT COPY AN ARRAY?                NO   YES  YES  YES  NO   NO
  CAN IT COPY A RECORD?                NO   YES  YES  YES  NO   YES

CAN YOU MIX, SAY, "INTEGER"S AND
  "DOUBLE"S IN AN EXPRESSION?          NO   YES  YES  YES  YES  YES

CAN YOU MIX, SAY, "REAL"S AND
  "LONGREAL"S IN AN EXPRESSION?        NO   YES  YES  YES  YES  YES

   The  more similar the treatment of  various types, the easier it is
to achieve data abstraction -- and thus to insulate a program from the
underlying representation that a particular type might have.


                         I/O IN PASCAL AND C

   You  can't  write  a program without I/O  -- that's obvious enough.
Even  minimally  sophisticated  programs, especially  system programs,
need  to be able to do many I/O-related things. This doesn't just mean
reading and writing; it means direct I/O (by record number rather than
serial),  building  new  files,  opening  old  files,  deleting files,
checking to see if files exist, and so on.

   Of  course, here we run into the classic problem of portability vs.
functionality.  Nowhere  do operating systems vary  more than in their
file  systems and the modes of I/O that they support; implementing I/O
in a portable programming language can be a nightmare for the language
designers.

   PASCAL  and  C  I/O  are substantially different  in many respects.
Standard  PASCAL and PASCAL/3000 I/O  are different too, and PASCAL/XL
adds  a  couple  more interesting quirks. And,  of course, Kernighan &
Ritchie  C and Draft Standard C have their differences as well -- what
fun!

   Before  I  go  further,  some ground rules  have to be established.
There  are  two  ways  to  talk  about I/O (or any  other feature of a
language):

   *  We  can  discuss the BUILT-IN I/O  mechanisms; in PASCAL's case,
     this  includes  WRITE,  READ, WRITELN, READLN,  GET, PUT, and the
     like  --  in  C's it includes  "fopen", "fclose", "getc", "putc",
     "printf", "scanf", etc.

   *  We  can  discuss how EXTENSIBLE the  I/O mechanism is. Since I/O
     systems differ on all machines, no standard portable language can
     include  all  the  features that are  available on all computers.
     Thus,  the  question arises -- how  easily can we use additional,
     machine-related   features,   together   with  the  standard  I/O
     facility?

     In  other  words,  do  we have to choose  "all standard" vs. "all
     native  mode"  or  can we, say, open  a file using our particular
     computer's  I/O  system  and  then  read it  using the language's
     facility?

   This, I believe, is an important distinction. It's true that PASCAL
and  C are "extensible" languages -- as long as a hook is available to
the   machine-specific   system   procedures   (e.g.   INTRINSICs   in
PASCAL/3000),  we  can  use the host's I/O  system (e.g. FOPEN, FREAD,
FWRITE, FCLOSE). But what's the point of re-inventing the wheel?

   We'd  like  the  built-in I/O system to  satisfy most of our needs,
both for portability's sake and convenience's sake. On the other hand,
we  know  that  some  machine-dependent features won't  be included in
either the standard or even the particular machine implementation.

   How do you expect, for instance, to have PASCAL/3000 know about RIO
files?  You  have  to  have  some  means  of accessing  the native I/O
procedures  (e.g. HP's FOPEN, FCLOSE, etc.), but more than that -- you
have  to  be  able  to  use a maximum of  the language's I/O mechanism
combined  with  the  necessary minimum of  the host's non-portable I/O
system.

   In  other  words,  you  shouldn't  be  forced to  either use RESET,
READLN, and WRITELN or FOPEN, FREAD, FWRITE, and FCLOSE, but not both.
For  instance, you ought to be able to  call FOPEN to open a file in a
special  mode  but  then  use  READLN  and  WRITELN  against  it;  or,
conversely,  open the file using RESET or  REWRITE and then be able to
call built-in procedures like FGETINFO or FREADLABEL against it.

   This  will  be  both easier to write and  more portable -- when you
port   the   program,   you'll   only   have   to   change  the  small
system-dependent part.


                           STANDARD PASCAL

   I  have  a  theory  about  SPL. I believe that  the main reason why
SPL/3000  isn't  more  popular in the HP3000  community is not that it
has,  say, an ASSEMBLE statement or a TOS construct. Nobody HAS to use
ASSEMBLEs or TOSes.

   Rather,  the  problem was that you CAN'T  DO SIMPLE I/O IN SPL. You
want  to write a program that adds two numbers? The addition statement
is simple:

   INTEGER NUM1, NUM2, RESULT;
   RESULT:=NUM1+NUM2;

Ah, but the I/O!

   INTRINSIC ASCII, BINARY, PRINT, READX;
   INTEGER ARRAY BUFFER'L(0:39);
   BYTE ARRAY BUFFER(*)=BUFFER'L;
   INTEGER LEN;
   LEN:=READX (BUFFER'L, -80);
   NUM1:=BINARY (BUFFER, LEN);
   LEN:=READX (BUFFER'L, -80);
   NUM2:=BINARY (BUFFER, LEN);
   ...
   LEN:=ASCII (RESULT, 10, BUFFER);
   PRINT (BUFFER'L, -LEN, 0);

And this is without prompting the user, or printing any string
constants at all! How can a beginning programmer get anything DONE
this way? For that matter, think of the trouble that even an EXPERT
has to go through to do anything useful!

   Note that in SPL, you have complete FLEXIBILITY -- you can call any
intrinsic,  open a file in any mode, do I/O with any carriage control.
But,  since  you  have  no  built-in  I/O interface to  make all these
features  easy  to  use, you have to go through  a lot of effort to do
what  you  need to do. Like life  itself -- everything is possible but
nothing is easy.

   PASCAL -- having originally been designed as a teaching language --
naturally placed a premium on quick "start-up" time. Terminal I/O, for
instance, of either strings or numbers, isn't difficult; READ, READLN,
WRITE,  and WRITELN can do  appropriate formatting. File I/O, however,
is  quite a bit less flexible, and even terminal I/O lacks some rather
valuable features.

   Consider the set of Standard PASCAL I/O operators:

   * READ and READLN can be used to read data from a file.

   * WRITE and WRITELN can be used to write data to a file.

   * PAGE can be used to trigger a form feed.

   *  RESET and REWRITE "open" files for reading or writing; which
      system filename corresponds to a particular PASCAL file is left
      up to the implementation.

   *  GET,  PUT, and file buffer variables  allow you to work with the
     file  in  a slightly different way than  READ and WRITE; we won't
     discuss  these  much  in  this  section, since for  the most part
     they're quite similar to READ and WRITE.

   * EOLN allows you to detect an end-of-line condition in text input.

   *  EOF allows you to determine whether or not the NEXT read against
     a  file  will  get  an  end-of-file. This is  very nice, since it
     allows you to say:

       WHILE NOT EOF(F) DO
         BEGIN
         READLN (F, X, Y, Z);
         ...
         END;

      as opposed to, say, the SPL solution, with which you have to
      code the read in two places:

       FREAD (F, REC, 128);
       WHILE = DO
         BEGIN
         ...
         FREAD (F, REC, 128);
         END;
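
   Fleshing the EOF example out into a complete (if minimal) Standard
PASCAL program that reads integers from the standard input until
end-of-file and prints their total (the program name is made up):

      PROGRAM SUMUP (INPUT, OUTPUT);
      VAR X, TOTAL: INTEGER;
      BEGIN
      TOTAL:=0;
      WHILE NOT EOF DO
        BEGIN
        READLN (X);
        TOTAL:=TOTAL+X;
        END;
      WRITELN ('Total = ', TOTAL);
      END.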

   This is what PASCAL has -- what doesn't it have?

   *  There is no standard way of  telling PASCAL to output a "prompt"
     --  a string not followed by a carriage return/line feed. A vital
     operation,  I'm  sure  you'll agree, and  surely any computer can
     support it -- why doesn't Standard PASCAL include it?

   *  There  is  no  standard  way of accessing  a file using "direct-
     access"  --  reading  or  writing  by  record  number  instead of
     serially  (like  FREADDIR  and  FWRITEDIR  do).  Even  FORTRAN IV
     supports this (READ (10@RECNUM))!

   * There is no standard way of indicating exactly what file you want
      to open. Most PASCALs associate some default system filename
      with each file declared in the program (e.g. the PASCAL file
      "EMPFILE" may be associated with the MPE filename "EMPFILE"); but
      what if you don't know the filename at compile time?

     Portability, incidentally, isn't a concern here. There are plenty
     of  very  portable  programs that require this  feature -- say, a
     simple file copier, a text editor, etc.

   *  Standard PASCAL allows you to open a file for read access or for
     write  access.  You  can't  open  a  file  for  APPEND  access or
     INPUT/OUTPUT  access,  both very common  requirements. Again, why
     not? Almost every operating system supports these access modes!

   * Of course, no provision is made for other, equally portable
     and  equally  important  features  like closing  a file, deleting
     files, creating files, checking if a file exists, not to mention,
     say, renaming a file.

   *  No  standard  mechanism  exists  for  detecting  errors  in file
     operations.  If,  say, a file open  (RESET or REWRITE) fails, the
     program is typically aborted by the PASCAL run-time library. What
     about graceful recovery?

     How  would  you  like,  say,  a  command-driven file-manipulation
     program  that  aborted  with  a  compiler  library  error message
     whenever you gave it a bad filename?

   *  The  lack of error handling is  particularly grave in READs from
     text  files. It's great that PASCAL  will parse the numeric input
     for  you,  but what if the user  enters an invalid number? Surely
     you don't just want the entire program to abort!

   *  WRITELN and READLN are rather simple-minded. No provision exists
     for  left-  vs.  right-justification,  octal  or  hex  output  of
     numbers,  mandatory  sign specification (i.e. print  a "+" if the
     number  is positive, rather than printing  no sign at all), and a
     number of other useful things.

   I  find  this  to  be  a  rather  substantial set  of inadequacies,
especially if we want to use PASCAL as a system programming language.

   Now,  all those problems are a property of Standard PASCAL. I'll be
the  first  to  admit  that virtually all  PASCAL implementations work
around  at least some of these things  (after all, if they didn't, the
language wouldn't be usable).

   However,   remember   the  advantages  of  STANDARDS.  Some  PASCAL
compilers  might call the prompt function PROMPT and others might just
use  WRITE;  some  might  have an APPEND procedure  to open a file for
append  access  and  others might have this as  an option to a general
OPEN procedure.

   A  general language like PASCAL is great for writing portable code,
and  surely  there's nothing non-portable about  wanting to prompt the
user  or  append  to  a  file! But,  the more implementation-dependent
features we have to use, the more portability we'll lose.


                             PASCAL/3000

   The  designers  of  PASCAL/3000  knew  about Standard  PASCAL's I/O
deficiencies, and they introduced a number of features to correct them
(several of which are combined in a short sketch after this list):

   *  PROMPT  has been added -- this  is just like WRITELN, but prints
     its stuff without a carriage return/line feed.

   * READDIR, WRITEDIR, and SEEK do direct I/O; they are equivalent to
     FREADDIR, FWRITEDIR, and FPOINT.

   *  RESET and REWRITE allow you to  specify the filename of the file
     to be opened, for input or output access, respectively.

   *  OPEN  allows you to open a  file for input/output access; APPEND
     lets you open for appending.

   * CLOSE lets you close a file; procedures like LINEPOS, MAXPOS, and
     POSITION  let you find out various information about an open file
     (a  very small subset of FGETINFO). CLOSE has an option that lets
     you purge the open file or save it as temporary.

   *  FNUM allows you to get the  file number of any open PASCAL file,
     thus  letting you call any  file system intrinsic (like FGETINFO,
     FRENAME,  etc.)  on  a  PASCAL  file.  This is a  MAJOR and VITAL
      flexibility feature, because otherwise you would have to do your
      I/O on a particular file using either ONLY the PASCAL I/O system
      or only the MPE I/O system, but never both.

   *  Finally,  a very intricate  and hard-to-use mechanism (XLIBTRAP)
     has   been   implemented   to   catch   either   I/O   errors  or
     string-to-number  conversion  errors. To use it,  you have to use
     XLIBTRAP,   the   low-level  WADDRESS  procedure,  and  a  global
     variable;  look  at  the  example  in the HP  Pascal manual under
     TRAPPING RUN-TIME ERRORS (pages 10-21 through 10-23 in the OCT 83
     issue  of the manual) to convince yourself that there's GOT to be
     a better way.
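
   Here, for what it's worth, is a rough sketch of how several of
these features combine; the program itself is made up, and I'm
assuming the RESET-with-a-filename, PROMPT, and CLOSE forms just
described:

      PROGRAM COUNT_LINES (INPUT, OUTPUT);
      VAR F: TEXT;
          FNAME: STRING[36];
          NLINES: INTEGER;
      BEGIN
      PROMPT ('File to count: ');   { no carriage return/line feed }
      READLN (FNAME);
      RESET (F, FNAME);             { open an arbitrary file by name; }
                                    { a bad name still aborts us --   }
                                    { more on error trapping below    }
      NLINES:=0;
      WHILE NOT EOF(F) DO
        BEGIN
        READLN (F);                 { skip over one line }
        NLINES:=NLINES+1;
        END;
      CLOSE (F);
      WRITELN (FNAME, ' has ', NLINES:1, ' lines.');
      END.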

This  of  course makes things a lot  more bearable. Still, some things
remain unresolved:

   *  PASCAL/3000  allows  you  to  open  a  file  for  input, output,
     input/output,  and append access. It  also allows you to indicate
     CCTL vs. NOCCTL and SHARED vs. EXCLUSIVE. This is very nice, but,
     of  course,  MPE allows ten times this  many options -- how about
     opening  temporary  files,  opening  files  for  OUTKEEP  access,
      specifying record size, file limit, ASCII vs. BINARY, etc.?

     You can use FNUM to go from a PASCAL file variable to an MPE file
     number  and thus use MPE intrinsics  on a PASCAL-opened file. You
     ought  to be able to do the converse of this -- open a file using
     FOPEN  and then use PASCAL features  (like READLN and WRITELN) on
     this  file.  You  can't -- if you need  to open, say, a temporary
     file,  you'd  have  to FOPEN it and then  use your own FREADs and
     FWRITEs.

     (Actually,  you could use a :FILE equate issued using the COMMAND
     intrinsic;  this,  however,  is  much  more  cumbersome,  doesn't
     support  all  the FOPEN intrinsic options,  and prevents you from
     allowing the user to issue his own file equations for the file.)

   * Error trapping, as I said, is still very hard to do.

   The  first  two,  I think, are the  most serious problems. Since so
much   systems   programming  involves  file  handling,  flexible  and
resilient file system operations are, I believe, a MUST for any system
programming  language.  PASCAL/3000  is  a  lot  better  than Standard
PASCAL, but it still has some flaws.


                              PASCAL/XL

   PASCAL/XL's I/O is even better than PASCAL/3000's. PASCAL/XL adds a
couple of features that make its I/O capability almost complete:

   * The ASSOCIATE built-in procedure:

       ASSOCIATE (pascalfile, fnum);

     Very  simple.  Makes the given PASCAL  FILE variable point to the
     file  indicated  by FNUM. Thus, you  can call FOPEN with whatever
     options  your  heart  desires,  and then use  all of PASCAL's I/O
     facilities against that file. Such a deal!

   * TRY .. RECOVER. This construct is described in more detail in the
      "CONTROL STRUCTURES" section of this paper -- and a very powerful
      construct it is -- but for file I/O it lets you do this:

        REPEAT
          ERROR:=FALSE;                    { reset it on each try }
          PROMPT ('Enter filename: ');
          READLN (FILENAME);
          TRY
            OPEN (F, FILENAME);            { F is the PASCAL file }
          RECOVER
            ERROR:=TRUE;     { will branch here in case of error }
        UNTIL
          NOT ERROR;

     You  just wrap a "TRY" and  a "RECOVER" around the file operation
     that  might get an error, and  the statement after "RECOVER" will
     get branched into in case of error (instead of having the program
     abort). Similarly, you can say:

        REPEAT
          ERROR:=FALSE;                    { again, reset it each time }
          PROMPT ('Enter an integer: ');
          TRY
            READLN (I);
          RECOVER
            ERROR:=TRUE;
        UNTIL
          NOT ERROR;

     Still  not  QUITE as easy as I'd like  it to be, but a lot better
     than before.

   The only trouble is that -- at least for you and me and the rest of
the  HP3000 user community -- PASCAL/XL  is still a "future" language;
we  can't really tell how good it is until we've hacked at it for some
time  and have seen all the  implications of the various new features.
Still,  the  PASCAL/XL I/O system seems  to be an eminently reasonable
and usable creature.


                        KERNIGHAN & RITCHIE C

   The original "Kernighan & Ritchie" book, which for practical
purposes was the original C "standard", has a chapter describing the C
I/O library. Its first sentence was "input and output facilities are
not part of the C language", which, while technically true, proved
practically incorrect. By the very act of inclusion of the I/O chapter
into  the  K & R C book, this  I/O library became as "standard" as the
rest  of  the  C  language  described therein -- which  is to say, not
entirely   standard,   but   nonetheless  surprisingly  compatible  on
virtually all modern machines.

   Note  that the same can not be  said of the next chapter, "THE UNIX
SYSTEM  INTERFACE", and I won't consider  the features listed there as
part of standard C.

   The  list  of standard C I/O  features differs from standard PASCAL
I/O features:

   *  The  C I/O facility emphasizes what  are known in PASCAL as TEXT
     files  --  files  that  are  viewed as streams  of characters. In
     PASCAL  you  can  declare  a  "FILE OF RECORD"  or "FILE OF ARRAY
     [0..127]  OF  INTEGER"  and read an  entire record or 128-element
     array   at   a   time.  In  C  you'd  have  to  read  this  array
     character-by-character.

     Note  that  from  a performance point of view,  this may not be a
      problem, since virtually all C implementations buffer their I/O
      in rather big chunks -- 256 single-byte reads in C shouldn't be
      much slower than a single 128-word read. Still, this kind of
      character-read loop is more cumbersome than one would like.

   *  C  provides  the  GETC  and PUTC primitives to  read and write a
     character at a time.

     C  also  provides  an  UNGETC  primitive  that "ungets"  the last
     character  you've read, effectively moving  the file pointer back
     by one byte and assuring that the next character you'll read will
     be the same one you've just read.

      This is surprisingly useful, especially for parsing (see the
      sketch at the end of this list).

   *  C  provides  FSCANF  and FPRINTF to do  formatted I/O. These are
     rather  more powerful than PASCAL's READLN and WRITELN -- see the
     FORMATTED I/O: C vs. PASCAL section below.

   *  FGETS  and FPUTS read an entire line  at a time. Nothing much --
     just like PASCAL's READLN and WRITELN of strings.

   *  FOPEN  (not  to  be confused with the  MPE intrinsic of the same
     name!) lets you open an arbitrary file for read, write, or append
     access.  Unlike Standard PASCAL, C allows you to specify the name
     of the file you want to open. FCLOSE closes an open file.

   *  End of file is indicated by a special return value (EOF) from
      GETC and the other input functions. In PASCAL, of course, the
      special EOF function returns you the end of file indication, and
      all attempts to read at an end of file cause a program abort.
      Each method has its advantages.

   * Records in a file are delimited not by a special "line delimiter"
     as  in  PASCAL,  but  rather  by  the  ordinary  ASCII  character
     "NEWLINE".  The exact ASCII value of this character is left up to
     the  compiler's  discretion -- it's usually  a LINEFEED (10), but
     sometimes  a  CARRIAGE  RETURN  (13); however,  this character is
     always available in C as '\n', so you can say something like:

       if (getc(stdin) == '\n')
         /* do end of line processing */

   *  If  you  want  to  skip  to  the next line in  a file (or on the
     terminal),  you have to output a newline character. Thus, instead
     of

       WRITELN ("HELLO, WORLD!")

     you'd say

        fprintf (stdout, "hi, wld!\n");
         /* no C programmer would actually SPELL OUT "hello" */
         /* or "world". */

     This  implies that just by omitting the "\n", you can prevent the
     skip-to-the-next-line. Thus,

       printf ("name? ");  /* same as 'fprintf (stdout,"name? ");' */
       scanf ("%s", &name);

     will  presumably prompt the user with  "name? " and request input
     on the same line. In most PASCALs (including PASCAL/3000), if you
     do a WRITE followed by a READ, the prompt won't actually come out
     until a subsequent WRITELN -- the WRITE output will be "buffered"
     until  "flushed" by a WRITELN. As best  I can tell, no C standard
     would  actually  prevent this behavior in  a C compiler; however,
     most  C  compilers  do  the  right  thing  and flush  any pending
     terminal output before doing terminal input.

   * Error handling is different from PASCAL's:

     -  If FOPEN can't open a file,  it returns a special value to the
       program  (unlike PASCAL, which aborts the program). The program
       can  then check for this condition and handle it appropriately.
       I  like  this,  even  if  there's  no standard way  to find out
       exactly what kind of error occurred.

      -  FSCANF behaves differently from PASCAL READLN. If you use
        FSCANF to read an integer and it sees a letter or some special
        character, it won't print an error; rather, it'll just
        consider that that character has delimited the read of the
        integer. The conversion simply stops (if no digits were seen
        at all, the variable is left untouched), and the file pointer
        points to the non-numeric character that stopped the read.
        Then, you can use GETC to make sure that the character was
        really a newline or a blank or whatever it is that you
        expected; or, you can check the result of FSCANF (which will
        be set to the number of items actually read) to see if all
        the items that you were asking for were really given. This is
        a lot better than PASCAL's approach of just aborting and
        giving the program no chance to recover gracefully.

     -  Error  conditions  for the other functions  (except for end of
       file on GETC) are not defined by K&R.
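
   Just to make the GETC/UNGETC point concrete, here's a quick sketch
(mine, not anything out of K&R) of the kind of hand-rolled parser that
UNGETC makes easy -- read an unsigned integer, then put back whatever
character stopped the scan so the rest of the program can look at it:

   #include <stdio.h>

   int read_uint (valp)
   int *valp;
   {
   /* Reads an unsigned decimal integer from STDIN into *VALP, then
      "ungets" the character that ended it so the caller can examine
      it.  Returns the number of digits read (0 = no number there). */
   int c, ndigits;
   ndigits = 0;
   *valp = 0;
   while ((c = getc (stdin)) >= '0' && c <= '9')
     {
     *valp = *valp * 10 + (c - '0');
     ndigits = ndigits + 1;
     }
   if (c != EOF)
     ungetc (c, stdin);     /* put the delimiter back */
   return ndigits;
   }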

Seeing  how  I tore apart Standard  PASCAL's I/O facility earlier, you
might expect the following complaints from me about C's I/O:

   *  As  I  mentioned,  K&R C can't gracefully  handle reads of, say,
     records or large arrays from files. It emphasizes flexible-format
     text files rather than fixed-format binary files.

   *  You  can't read or write a  record at a particular record number
     (direct access); you can only access the file serially.

   * You can't open a file for input/output access.

   * No delete/create/check-if-file-exists support.

How sad!


                        DRAFT ANSI STANDARD C

   Draft ANSI Standard C has expanded the standard C I/O library quite
dramatically.  A  number of useful (and  often confusing) features now
exist:

   * Input/output file opens are supported.

   *  I/O  in  units of more than one  character is allowed; FREAD and
     FWRITE  (again,  no  relationship to the  MPE intrinsics) let you
     easily read or write structures and arrays from/to files.

   * Direct I/O is provided using the FSEEK procedure, which positions
      the file pointer in a file. Then you can use any of the
      read/write mechanisms (GETC, PUTC, FSCANF, FPRINTF, FREAD,
      FWRITE, etc.) to do the I/O at the new location (see the sketch
      at the end of this list).

   *  Error  handling  is  more concrete. Presumably,  none of the new
     services  are  allowed  to  abort  in  case of  error; the FERROR
     procedure  returns  to  you  the  error status  (combination of a
     has-error-occurred-flag  and  some  kind  of error  number) of an
     operation.

   * REMOVE and RENAME, which delete and rename files, are provided.

   * Various other new features of various utility and arcaneness have
     been added.
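
   To make the direct-I/O point concrete, here's a quick sketch of a
"read record number N" routine built out of FSEEK and FREAD; the name
READ_RECORD, the RECSIZE constant, and the return conventions are my
own inventions, not the Draft's:

   #include <stdio.h>

   #define RECSIZE 128    /* whatever your fixed record length is */

   int read_record (f, recnum, buffer)
   FILE *f;
   long recnum;
   char buffer[];
   {
   /* Reads record number RECNUM (counting from 0) of the open file F
      into BUFFER.  Returns 1 if it worked, 0 if it didn't; the caller
      can use FERROR(f) to tell a real error from plain end of file. */
   if (fseek (f, recnum * RECSIZE, 0) != 0)   /* 0 = from start of file */
     return 0;
   if (fread (buffer, RECSIZE, 1, f) != 1)
     return 0;
   return 1;
   }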

   These new features all look quite nice, and seem to satisfy me as
thoroughly as -- or more so than -- PASCAL/XL. Note, however, that the
only two languages
that  I'm happy with are ones that barely exist and in which I've done
virtually no serious programming.

   This  may say something about my  character; it also says something
about the pitfalls of comparing "new-improved-we'll have them for you
Real  Soon  Now"  languages. Both PASCAL/XL and  Draft Standard C SEEM
nice, but who knows how and whether they'll actually work?

   One  other  thing that I ought to point  out: as you recall, in the
discussion  of  PASCAL/3000 and PASCAL/XL I  sang the praises of FNUM,
which  returns  the system file number of  a PASCAL file variable, and
ASSOCIATE,  which  initializes  a  file  variable to point  to a given
system file number. The reason for this was to allow you to mix PASCAL
and native file system I/O.

   Naturally,  Draft  Standard  C, being a  portable standard, doesn't
discuss these features; however, I wouldn't like to use any particular
implementation  of  C  that  doesn't support  FNUM- and ASSOCIATE-like
operations.  I  hope  that HP's C/XL provides  them; I know that CCS's
C/3000 provides both.


                     FORMATTED I/O: C vs. PASCAL


   In the previous discussion, we talked about the I/O operations that
C  and  PASCAL  allow.  Two  of  the most useful  ones, of course, are
formatted write and formatted read -- this is what allows you to input
and output numbers (so hard to do using SPL).

   Standard  PASCAL  lets you input (READ,  READLN) and output (WRITE,
WRITELN)  characters, strings, integers, reals, and booleans. A sample
call might look like:

   WRITELN (STRING_VALUE, INT_VALUE:10, REAL_VALUE:10:2);
             { write a string, an integer right-justified in a
              10-character field, and a real number in a 10-character
              field with 2 characters after the decimal }

or

   READLN (STR, INT, REALNUM);

These procedures allow you to

   * Read entities delimited by blanks.

   * Write values right-justified in a fixed-format field.

   *  Write  values "free-format", i.e. in  as many characters as they
     need (this is done by setting the "width" parameter in WRITELN to
     a  size smaller than that needed  to fit the entire number; e.g.,
      WRITELN (I:1)).

   *  Write  real numbers in either  exponential format (comparable to
     FORTRAN's Ew.d) or fraction format (Fw.d).

   *  Output  booleans  as the strings  "TRUE" or "FALSE"; PASCAL/3000
     expands  this  to  allow  WRITEing  any variable  belonging to an
     enumerated  type as its symbolic equivalent. Thus, if COLOR is of
     type  (MAUVE, PUCE, AQUA) and has  value PUCE, it'll be output as
     "PUCE",  rather  than,  say,  1, which might  happen to be PUCE's
     integer representation.

This  is quite a nice set of  functions, but quite obviously there are
some important features missing:

   *  The ability to output data  to a program string variable, rather
     than a file. PASCAL/3000 has this feature (STRREAD and STRWRITE).

   * Output in hex or octal, vital for a system programming language.

   * Left-justified as well as right-justified output.

   * Money format ("123,456,789").

   *  As mentioned before, some way  of reading numbers without having
     the  program abort in case the number is invalid (AS I SAID, THIS
      IS VERY IMPORTANT!).

Less important but desirable features include:

   *  Padding  with  zeroes (e.g. printing  123 as "00123"; especially
     important in octal and hex).

   * Always printing the sign, even if the number is positive.

   The  most important failing of PASCAL's  READ, WRITE, et al. is, in
fact, one of the less obvious ones:

   *  If  you're  dissatisfied with the way  READ and WRITE work, IT'S
     VERY DIFFICULT FOR YOU TO WRITE YOUR OWN.

Think  about  it  --  say  you  wanted to add  a "money format" output
facility. You'd like to write a procedure called MYWRITELN that's just
like  WRITELN,  but  allows  its  caller  to  somehow  specify  that a
particular  type REAL parameter is to  be output in money format. What
could you do?

   Remember:

   *  In Standard PASCAL and PASCAL/3000 you can't have your functions
     have a variable number of parameters.

   *  In Standard PASCAL and PASCAL/3000 you can't have your functions
     take parameters of flexible data types.

   * Almost incidentally to all this, READ, WRITE, READLN, and WRITELN
     are  the  only  "procedures" that allow  you to specify auxiliary
     parameters like field width and fraction digits using a ":".

You  see, PASCAL documentation calls  READ, READLN, WRITE, and WRITELN
"procedures", but they're not like the procedures that we mere mortals
can write. If we want to write our "money-format output" procedure, we
have to give it exactly one data parameter of type REAL and a couple
of parameters indicating the field width and fraction digits.

   A typical call to this might look like:

   WRITE ('COMMISSIONS ARE ');
   WRITEMONEY (COMMISSIONS, 15, 2);
   WRITE (' OUT OF A TOTAL OF ');
   WRITEMONEY (TOTAL, 15, 2);
   WRITELN;

Instead  of being able to stick this  all into one WRITELN, we have to
have  a  special procedure that takes exactly  one value to be output,
making us write five lines rather than one.

[In PASCAL/3000, we can avoid this by having a FMTMONEY function that
returns a string instead of outputting it, and then writing the call
as "WRITELN ('COMMISSIONS ARE ', FMTMONEY(COMMISSIONS, 15, 2), ...)";
however, this is both fairly inefficient and quite non-portable, since
Standard PASCAL doesn't allow functions to return string results.]

   Note that this all applies only to Standard PASCAL and PASCAL/3000.
PASCAL/XL's  winning  new  features  might  very well  extinguish this
particular problem.

   So much for PASCAL. How about C?

   Standard  C's WRITELN is called PRINTF  (or FPRINTF, if you want to
print  to a file rather than the terminal); READLN is called SCANF (or
FSCANF, to read from a file). Examples might be:

   printf ("%s %10d %10.2f", string_value, int_value, real_value);

or

   scanf ("%s %d %f", &string_value, &int_value, &real_value);
         /* The "&"s are needed to indicate that the address of
            the variable is to be passed, not the actual value. */

Note  how  both  PRINTF's  and  SCANF's first  parameters are "control
strings"   that   indicate   the   format  of  the  input  or  output.
Incidentally, they also tell PRINTF and SCANF how many parameters they
are to expect and what the type of each parameter will be. If you make
an  error in the control strings,  beware! You'll get VERY interesting
results.

In any event, PRINTF's and SCANF's features include:

   * Output of integers, in decimal, octal, or hex.

   *  Output  of  reals, in exponential or  fractional format. You can
     also  output  a  real  number  using  so-called  "general" format
     (similar to FORTRAN's Gw.d), which uses exponential or fractional
     format, whichever is shorter.

   *  Free-format output, left-justification, and right-justification.
     In  other  words, 10 can be output as  "10", "   10", or "10   ".
     All three of these formats are useful in different applications.

   * Zero-padding (e.g. outputting 10 as "00010").

   *  Bad  input data (strings where  numbers are expected, etc.) does
      not generate an error; to detect it, you check the result of
      SCANF (the number of items successfully read), or read the next
      character after the SCANF is done (say, using GETC) and see if
      it's the terminator you expected (e.g. a blank or a newline) or
      some other character that might have been viewed as a numeric
      terminator. A sketch of this appears at the end of this list.

      This is somewhat cumbersome, but in the long run more flexible;
      it is certainly much better than having your entire program
      abort whenever the user types bad data.

   *  Standard  C  allows you to use SSCANF  to read from a string and
     SPRINTF to output to a string.
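
   Here, sketched out, is the SCANF-result-checking idiom mentioned a
couple of items back; the prompt text and the retry policy are my own,
not anything prescribed by the standard:

   #include <stdio.h>

   main ()
   {
   /* Keep prompting until the user types something SCANF will accept
      as an integer; a bad line is thrown away and the user is asked
      again.  Compare this with a PASCAL READLN, which would simply
      abort the program on bad input. */
   int i, items, c;
   for (;;)
     {
     printf ("Enter an integer: ");
     items = scanf ("%d", &i);
     if (items == 1)
       break;                     /* got a good number */
     while ((c = getchar ()) != '\n' && c != EOF)
       ;                          /* discard the offending line */
     if (c == EOF)
       break;                     /* give up at end of file */
     printf ("That wasn't a number -- try again.\n");
     }
   }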

This  is, overall, a richer feature  set than PASCAL's. Note, however,
some problems:

   * Still no monetary output facility.

   *  Still  no  "always  print a sign character  even if the number's
     positive" feature (again, Draft Standard corrects this).

   *  Unlike PASCAL, printing a boolean value will just print a 0 or 1
     (since C doesn't have a separate boolean type). What's more, even
     an  enumerated  type  value  will just be  printed as its numeric
     equivalent (since C views variables of enumerated types as simple
     integer constants).

In my opinion, these things are all pretty bearable; but the important
thing is that in C you CAN define your own PRINTF- and SCANF-like
procedures.

   You  can have parameters of varying types; even variable numbers of
parameters  are  supported  by  the Draft  Standard (most non-Standard
compilers  give you some such feature,  too). Thus, you can write your
"myprintf" procedure, and call it using

   myprintf ("comms are %15.2m of tot %15.2m", commissions, total);
            /* assuming you've defined "%X.Ym" to be your
               "money-format" format specifier. */

Of course, nobody says it'll be easy to write this MYPRINTF procedure,
especially  if  you'll want to emulate  the standard PRINTF directives
(which  can  be done by just calling  SPRINTF); the important thing is
that  you  CAN  write  a procedure like  MYPRINTF, whereas in Standard
PASCAL and PASCAL/3000, you can't.
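
   For the curious, here's a bare-bones sketch of what such a MYPRINTF
might look like, using the Draft Standard's <stdarg.h> variable-
argument mechanism (so it's written in the Draft's declaration style
rather than K&R's). It understands only %d, %s, and a home-grown %m
that prints a long number of cents as dollars-and-cents -- no field
widths, no commas, and the %m directive is purely my invention -- just
enough to show that the thing can be done:

   #include <stdio.h>
   #include <stdarg.h>

   void myprintf (char *fmt, ...)
   {
   /* %d = int, %s = string, %m = long number of cents printed as
      dollars.cents (positive amounts only); anything else is copied
      through unchanged. */
   va_list ap;
   long cents;
   va_start (ap, fmt);
   for (; *fmt != '\0'; fmt++)
     {
     if (*fmt != '%' || *(fmt + 1) == '\0')
       {
       putchar (*fmt);
       continue;
       }
     switch (*++fmt)
       {
       case 'd': printf ("%d", va_arg (ap, int));
                 break;
       case 's': printf ("%s", va_arg (ap, char *));
                 break;
       case 'm': cents = va_arg (ap, long);
                 printf ("%ld.%02ld", cents / 100, cents % 100);
                 break;
       default:  putchar (*fmt);
                 break;
       }
     }
   va_end (ap);
   }

A call like MYPRINTF ("commissions are %m\n", 1234567L) would then
print "commissions are 12345.67".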


                      SUMMARY OF I/O FACILITIES

[Since  SPL  relies  solely  on  the  HP System Intrinsics  to do I/O,
numeric  formatting,  etc.,  I  don't  include it  in this comparison.
Believe  me  --  with  SPL I/O, everything is  possible but nothing is
easy.]


                                       STD  PAS/ PAS/ K&R  STD
                                       PAS  3000 XL   C    C

OPEN ARBITRARY FILE GIVEN NAME         NO   YES  YES  YES  YES

OPEN FILE FOR APPEND ACCESS            NO   YES  YES  YES  YES

OPEN FILE FOR INPUT/OUTPUT ACCESS      NO   YES  YES  NO   YES

CLOSE A FILE                           NO   YES  YES  YES  YES

READ, WRITE FILES SERIALLY             YES  YES  YES  YES  YES

READ, WRITE FILES BY RECORD NUMBER     NO   YES  YES  NO   YES

DETECT AND HANDLE FILE ERRORS          NO   NO+  YES  YES  YES

FORMAT NUMBERS FOR OUTPUT              YES  YES  YES  YES+ YES+

INPUT NUMBERS                          YES  YES  YES  YES  YES
  INPUT ERROR DETECTION?               NO   NO+  YES  YES  YES

OUTPUT A STRING WITH NO NEW-LINE       NO   YES  YES  YES  YES

USE A PASCAL-OPENED FILE FOR NATIVE    N/A  YES  YES  N/A  N/A
  FILE OPERATIONS

USE A "NATIVELY" OPENED FILE FOR       N/A  NO   YES  N/A  N/A
  PASCAL FILE OPERATIONS

WRITE YOUR OWN WRITELN/PRINTF-LIKE     NO   NO   YES- YES- YES
  FUNCTION (WITH VARIOUS PARAMETER
  TYPES AND NUMBERS OF PARAMETERS)

LEGEND:  YES = Implemented.
         YES+ = Implemented in a really nice and useful way.
         YES- = Implemented, but there's some ugliness involved.
         NO = Not implemented.
         NO+ = I can't fairly say that it's simply "not implemented",
               but believe me, it's soooo ugly...
         N/A = Not applicable.


                  STRINGS IN STANDARD PASCAL AND SPL

   Much,  if  not most, of the data  we keep on computers is character
data  --  filenames,  user  names,  application  data, text  files. Of
course,  it's  imperative  that  any  programming language  we use can
represent and manipulate this sort of data.

   Standard  PASCAL's  mechanism  for  storing strings  is the "PACKED
ARRAY  OF  CHAR". PACKED here is simply  a convention used to indicate
that  there should be one character stored per byte, not one per word.
I've never seen anyone use an unpacked ARRAY OF CHAR.

   If  you  think about it, SPL and C  use PACKED ARRAY OF CHARs, too.
All  a PACKED ARRAY [1..100] OF CHAR  means is "an array of 100 bytes,
each of which is individually addressable". Practically speaking,

   VAR X: PACKED ARRAY [1..256] OF CHAR;   { PASCAL }
   BYTE ARRAY X(0:255);                    << SPL >>
   char x[256];                            /* C */

are all identical -- and fairly reasonable -- ways of storing a string
that's between 0 and 256 characters long. Still, despite this identity
of  representation,  I claim that Standard  PASCAL has severe problems
with string processing.

   Support  for  strings involves much more than  just having a way of
representing  them. The important thing for strings -- as for any data
type  -- is the OPERATORS THAT  ARE DEFINED to manipulate them. What's
the   use  of  having  strings  if  you  can't  extract  a  substring?
Concatenate  them? Find a character within a string? It's by the level
of this sort of support that a language's string facility is measured.

   SPL, for instance, has several useful operators that help in string
manipulation:

   * You can say

       MOVE STR1(OFFSET1):=STR2(OFFSET2),(LENGTH);

     to  move one substring of a  string into another. PASCAL can only
     move  the  entire thing in one  shot (STR1:=STR2), or examine/set
     one character at a time (STR1[I]:=STR2[I]).


   * You can say

       IF STR1(OFFSET1)=STR2(OFFSET2),(LENGTH) THEN ...

     to compare two substrings. You can also compare for <, >, <=, >=,
     and   <>,   as   well  as  compare  against  constants  (e.g.  IF
     STR1(X)="FOO").

   * You can say

       MOVE STR1(OFFSET1):=STR2(OFFSET2) WHILE ANS;

     that  will  copy  substrings WHILE the  character being copied is
     Alphabetic  or Numeric (upShifting in  the process). You can copy
     only  WHILE  AN  (no  upshifting),  WHILE  AS  (while alphabetic,
     upshifting),  or  WHILE N (while numeric).  You can also find out
     how  many characters were so copied  (i.e. at what point the copy
     stopped).

   * You can say

       I:= SCAN STR(OFFSET) UNTIL "xy";

     assigning  to I the index of the first character in the substring
     which is either equal to "x" or to "y"; you can also say

       I:= SCAN STR(OFFSET) WHILE "b";

     which  will  assign to I the index  of the first character in the
     substring that is NOT equal to "b" (for more details, see the SPL
     manual).  Note  that you can NOT say  "SCAN until you either find
     this  character  OR you've gone through  80 characters", which is
     very desirable if you know that the maximum length of your string
     is, say, 80.

   * You can say

       P (STR(OFFSET));

     calling  the  procedure  P and passing to  it all of STR starting
     with  offset  OFFSET.  In  PASCAL,  you can only  pass the entire
     string,  not  this sort of substring.  Note, however, that in SPL
     you can't pass BYTE ARRAYs by value -- only by reference.

   *  On  the  other  hand, reading or writing  strings is rather more
     difficult  than  one  would  like.  Since  the  PRINT  and  READX
     intrinsics  take "logical arrays" rather  than "byte arrays", you
     in general have to say:

       LOGICAL ARRAY BUFFER'L(0:127);
       BYTE ARRAY BUFFER(*)=BUFFER'L;
       INTEGER IN'LEN;
       ...
       IN'LEN:=READX (BUFFER'L, -256);
       MOVE STR(OFFSET):=BUFFER,(IN'LEN);
       ...
       MOVE BUFFER:=STR(OFFSET),(LEN);
       PRINT (BUFFER'L, -LEN, 0);

These  features  are  a  part  of  the SPL language;  you can use them
without  writing any procedures of your  own. Furthermore, if you want
to,  say, write a procedure that finds the first occurrence of STR1 in
STR2, you can just say:

   INTEGER PROCEDURE FIND'STRING (STR1, LEN1, STR2, LEN2);
   VALUE LEN1, LEN2;
   BYTE ARRAY STR1, STR2;
   INTEGER LEN1, LEN2;
   ...

and  implement it yourself. It may not  be easy (actually, it is), but
it's certainly possible.

   These  are the string-handling features  that SPL supports, and you
may  consider them sufficient or not. PASCAL supports a different, and
somewhat smaller set of features:

   *   You  can  copy  entire  strings,  or  examine  and  set  single
     characters:

       STR1:=STR2;

     or

       STR1[I]:=STR2[J];

     You  can't copy substrings without writing  your own FOR loop, or
     having  a  special  temporary array and  calling the little-known
     PACK and UNPACK procedures.

   *  You can input and output a  string using READLN and WRITELN. You
     can output the first N characters of a string by saying

       WRITELN (STR:N);

     but you again can't output an arbitrary substring of STR.

   * You can pass a string to a procedure; you can't pass a substring.
     On  the other hand, you can pass a  string by value as well as by
     reference.

As  you  see,  this  set of operators is  in some respects richer than
SPL's  (I/O)  and  in  others poorer  (substrings, comparisons, SCANs,
etc.). But the WORST problem with PASCAL's string handling is:

   * YOU CAN'T WRITE YOUR OWN GENERAL STRING HANDLERS!

   Strange  for  a language that emphasizes  breaking things down into
little,  general-purpose procedures, eh? But  if you've read the "DATA
STRUCTURES, TYPE CHECKING" chapter of this paper, you'll know why:

   * YOU CAN'T DECLARE A PROCEDURE TO TAKE A GENERAL STRING!

You can write

   TYPE PAC256 = PACKED ARRAY [1..256] OF CHAR;
   ...
   FUNCTION STRCOMPARE (S1, S2: PAC256): INTEGER;

but  THE ONLY STRINGS YOU CAN PASS TO THIS PROCEDURE ARE THOSE OF TYPE
PAC256!  What  if  you  have  a string that's at  most 8 bytes long, a
PACKED  ARRAY  [1..8] OF CHAR? No dice!  You have to either declare it
with a maximum length of 256 bytes (thus wasting 97% of the space!) OR
write  one  STRCOMPARE procedure for every  possible combination of S1
and S2 maximum lengths.

   This  means that not only do you  start with a somewhat poor set of
string  handling  primitives,  but  you'll  have  a very  hard time of
implementing  your own, unless you're willing to have all your strings
be  of  the  same maximum length! In my  opinion, this is a very, very
unpleasant circumstance.


             STRING HANDLING IN PASCAL/3000 AND PASCAL/XL

   A   better  string  handling  system  is  one  of  the  conspicuous
improvements that HP put into PASCAL/3000.

   The  first new feature that you'd  notice in PASCAL/3000 strings is
that A PASCAL/3000 STRING CONTAINS MORE THAN JUST CHARACTERS. When you
say

   VAR S: STRING[256];

you're  allocating  more  than  just a PACKED  ARRAY [1..256] OF CHAR.
You're essentially creating a record structure:

   VAR S: RECORD
          LEN: -32768..32767;   { 2 bytes in /3000; 4 bytes in /XL }
          DATA: PACKED ARRAY [1..256] OF CHAR;
          END;

Now  S  isn't  REALLY  a  PASCAL  RECORD -- you  can't just access its
subfields  using "S.LEN" and "S.DATA". But  internally, it is a record
structure,   in  that  it  contains  both  of  these  pieces  of  data
independently. When you say

   S:='FOOBAR';

then  not only will PASCAL move "FOOBAR" to the data portion, but also
set  the  length  portion  to 6. The LEN  subfield contains the actual
current  length  of  the  data;  there  may  be  room  for  up  to 256
characters,  but  in this case it indicates  that the actual length is
only 6 characters.

   A  brief  aside:  Obviously,  it's quite important  to somehow keep
track  of the current string length. For a fixed-length thing like the
8-character account name, we may not need it, but if we're doing, say,
text  editing,  we want to know the actual  length of the line. Let me
point  out,  though,  that  keeping  a  separate  length field  is not
imperative  for this; C uses a  NULL character as a string terminator,
and  many  SPL  programmers  do  similar things. In  other words, just
because  PASCAL/3000  represents  strings  this way,  don't think that
that's the only way of doing it...

   Back  to  the  PASCAL approach. The  representational change is the
most  obvious difference in PASCAL/3000; but,  as we saw earlier, it's
the  OPERATORS rather than the REPRESENTATION  that really make a data
type. PASCAL/3000 provides a pretty rich set, including especially:

   * You can extract and manipulate arbitrary substrings using STR:

        STR(X,10,7)

     returns  a  string containing the 7  characters starting from the
      10th character. The result of the STR function can be used
     anywhere  a  "real string" could be used;  however, it can not be
     assigned to or passed as a by-reference parameter.

   * You can concatenate two strings using "+":

        S:='PURGE '+STR(FILENAME,1,10)+',TEMP';

   *  You  can find out a string's  length using the STRLEN procedure.
     This  is somewhat more convenient than  in SPL; SPL strings don't
     have  a  separate "length" field, so  most SPL programmers end up
     terminating  their  string  data with  some distinctive character
     (often  a  carriage  return, %15). Thus,  an SPL programmer would
     have to say

        I:= SCAN STR UNTIL [8/%15,8/%15];

     to  scan  through  the  string  looking for a  carriage return (a
     relatively, though not very, slow process). The PASCAL programmer
     would say

        I:= STRLEN(STR);


   *  You  can copy substrings using  STRMOVE. (STRMOVE also works for
     PACKED ARRAY OF CHARs.)


   *  You  can  easily edit a string  using STRDELETE and STRINSERT to
     delete/insert characters anywhere in the string.


   *  You can find the first occurrence of one string in another using
     STRPOS.


   *  You  can  strip  leading and trailing  blanks using STRLTRIM and
     STRRTRIM.  Stripping  trailing  blanks  is a  particularly useful
     operation.


   *  You can also do READLNs  and WRITELNs into strings using STRREAD
     and  STRWRITE; this means that you can easily convert a string to
     an integer and vice versa.


   *  You  can have functions that return  strings (many of the above,
      including STR and STRRTRIM, are examples). Standard PASCAL
      doesn't allow functions to return structured types, including
      arrays.

More importantly, you can now write a procedure

   PROCEDURE STRREPLACE (VAR STR, FROMSTR, TOSTR: STRING);
   { Changes all occurrences of FROMSTR to TOSTR in STR. }

that  will  work  for  a  string  of  ANY maximum  length, because you
declared  the parameters to be of type STRING, rather than STRING[256]
or  some  such  fixed  length. (Note, however,  that only BY-REFERENCE
string parameters can be declared to be of type STRING.)

   One fairly serious problem, though, that still afflicts PASCAL/3000
strings  (PASCAL/XL  fixes  this)  is  the  inability  to  dynamically
allocate a string of a size that is not known until run-time. For more
discussion of this, look at the "POINTERS" chapter of this paper.


                         STRING HANDLING IN C

   The  designers of C, of course, faced the same sorts of problems as
the  designers  of  PASCAL/3000, but at least  in the area of strings,
they attacked them in a somewhat different way.

   As  a  matter of convention, C strings  -- kept as simple arrays of
characters -- are terminated by a NULL ('\0', ASCII 0) character. You
can, of course, have a

   char x[10];

array,  none of whose characters is a null, but all that means is that
you'll  get  screwy  results  if you pass it  to a string manipulation
procedure. When you say

   strcpy (x, "testing");

the  C  compiler  will  pass  to  STRCPY  the  address of  the array X
(remember  that  in  C saying "arrayname" gets  you the address of the
array)  and the address of the  8-character string which contains "t",
"e", "s", "t", "i", "n", "g", and NULL. Then STRCPY -- just because of
the  way  it's written, and because this is  the useful thing to do --
will  copy all the characters from  the second string ("testing") into
the first string (X), up to and including the terminating NULL.
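
   In fact, the terminating-NULL convention is what lets routines like
STRCPY be so short; here is a sketch of how such routines might be
written (this is the classic idiom, mine, not HP's or anybody's actual
library source), along with a STRLEN-style length counter:

   char *my_strcpy (to, from)
   char *to, *from;
   {
   /* Copies FROM into TO, up to and including the terminating NULL;
      returns TO, just as the real STRCPY does. */
   char *start = to;
   while ((*to++ = *from++) != '\0')
     ;
   return start;
   }

   int my_strlen (s)
   char *s;
   {
   /* Counts characters up to (not including) the terminating NULL. */
   int n;
   for (n = 0; *s != '\0'; s++)
     n = n + 1;
   return n;
   }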

   So  here you see a  fundamental representational difference between
PASCAL/3000 (and PASCAL/XL) and C strings:

   *  PASCAL/3000 keeps the current length of the string as a separate
     field.

   *  C  keeps  it  implicitly, having a  null character terminate the
     string's actual data.

The PASCAL/3000 approach clearly has some advantages:

   *  Determining  the  string length is much  faster -- you need only
     extract the first 2 bytes of the string array, and you've got it.
     In  C, you'd need to scan through each character until you find a
     null.

   *  PASCAL/3000 strings can contain any character. C strings may not
     contain a null, since that would be viewed as a terminator.

In  practice,  though, the second issue  (strings that need to contain
nulls)  doesn't  arise, and the first issue  isn't as important as one
would  think.  In fact, there are  some compensating advantages to C's
approach, but I'll discuss them a bit later.

   Given  what  we  know about the  different representational format,
what about the defined operations?

   Kernighan  & Ritchie is rather  cavalier about this vital question,
and  merely  alludes  to the "standard I/O  library", which is said to
contain various string manipulation functions. Thus, it's not unlikely
that  there'll  be  some  non-trivial  differences  between  various C
implementations in this area (although there'll also be a good deal of
similarity).

   Therefore, I'll have to compare PASCAL/3000 and Draft ANSI Standard
C;  keep  in  mind that the C functions  might not be available on all
compilers.

   Let's  consider a (possibly) typical  application. We need to write
two procedures:

   *  One that takes a file name (MPEX), group name (PUB), and account
     name  (VESOFT),  and  makes them into  a fully-qualified filename
     (MPEX.PUB.VESOFT).

   *  Another  that  does  the  opposite  --  takes  a fully-qualified
     filename  and  splits  it  into  its  file part,  group part, and
     account part.

Here's what they'd look like, in PASCAL:

   PROGRAM PROG (INPUT, OUTPUT);
   TYPE TSTR256 = STRING[256];
        TSTR8 = STRING[8];
   VAR FILENAME, GROUP, ACCT: STRING[8];

   FUNCTION FNAME_FORMAT (FILENAME, GROUP, ACCT: TSTR8): TSTR256;
   BEGIN
   FNAME_FORMAT := STRRTRIM(FILENAME) + '.' + STRRTRIM(GROUP) + '.' +
                   STRRTRIM(ACCT);
   END;

   PROCEDURE FNAME_PARSE (QUALIFIED: TSTR256;
                          VAR FILENAME, GROUP, ACCT: STRING);
   VAR START_GROUP, START_ACCT: INTEGER;
   BEGIN
   START_GROUP := STRPOS (QUALIFIED, '.') + 1;
   START_ACCT := STRPOS (STR (QUALIFIED, START_GROUP,
                              STRLEN(QUALIFIED)-START_GROUP-1), '.')
                 + START_GROUP;
   FILENAME := STR (QUALIFIED, 1, START_GROUP - 2);
   GROUP := STR (QUALIFIED, START_GROUP, START_ACCT-START_GROUP-1);
   ACCT := STR (QUALIFIED, START_ACCT,
                STRLEN (QUALIFIED) - START_ACCT + 1);
   END;

   BEGIN
   WRITELN (FNAME_FORMAT ('MPEX    ', 'PUB     ', 'VESOFT  '));
   FNAME_PARSE ('MPEX.PUB.VESOFT', FILENAME, GROUP, ACCT);
   WRITELN (FILENAME, ',', GROUP, ',', ACCT, ';');
   END.

and in C:

   #include <stdio.h>
   #include <string.h>

   char *strrtrim (s)
   char s[];
   {
   /* Strips trailing blanks from S; also returns S as the result. */
   int i;
   for (i = strlen(s);   (i>0) && (s[i-1]==' ');   i = i-1)
     ;
   s[i] = '\0';
   return s;
   }

   char *fname_format (filename, group, acct, qual)
   char filename[], group[], acct[], qual[];
   {
   qual[0] = '\0';
   strcat (qual, strrtrim (filename));
   strcat (qual, ".");
   strcat (qual, strrtrim (group));
   strcat (qual, ".");
   strcat (qual, strrtrim (acct));
   return qual;
   }

   fname_parse (qual, filename, group, acct)
   char qual[], filename[], group[], acct[];
   {
   char *start_group, *start_acct;
   start_group = strchr (qual, '.') + 1;
   start_acct = strchr (start_group, '.') + 1;
   strncpy (filename, qual, start_group - qual - 1);
   filename[start_group - qual - 1] = '\0';
   strncpy (group, start_group, start_acct - start_group - 1);
   group[start_acct - start_group - 1] = '\0';
   strcpy (acct, start_acct);
   }

   main ()
   {
   char qual[256], filename[8], group[8], acct[8];
   printf ("%s\n",
           fname_format ("sl      ", "pub     ", "sys     ", qual));
   fname_parse ("mpex.pub.vesoft", filename, group, acct);
   printf ("%s,%s,%s;\n", filename, group, acct);
   }

   What  are the differences between these  two, besides the fact that
one  is upper-case and one is lower-case? Let's examine these programs
a piece at a time.

   The  FNAME_FORMAT  procedure,  which merges the  three "file parts"
into a fully-qualified filename in PASCAL looks like this:

   FUNCTION FNAME_FORMAT (FILENAME, GROUP, ACCT: TSTR8): TSTR256;
   BEGIN
   FNAME_FORMAT := STRRTRIM(FILENAME) + '.' + STRRTRIM(GROUP) + '.' +
                   STRRTRIM(ACCT);
   END;

As  you  see,  we're  taking  full  advantage  here  of the  fact that
PASCAL/3000 lets us say "A + B" to concatenate two strings. In C, this
is quite a bit more difficult:

   char *fname_format (filename, group, acct, qual)
   char filename[], group[], acct[], qual[];
   {
   qual[0] = '\0';
   strcat (qual, strrtrim (filename));
   strcat (qual, ".");
   strcat (qual, strrtrim (group));
   strcat (qual, ".");
   strcat (qual, strrtrim (acct));
   return qual;
   }

Instead  of  just  saying "A + B", we  must say "STRCAT (A, B)", which
MODIFIES  THE STRING A (appending the string B to it) rather than just
returning  a newly-constructed string. In fact, STRCAT does return the
address of the modified A, so we could conceivably say:

   strcat (strcat (strcat (strcat (strcat (qual,
                                           strrtrim(filename)),
                                   "."),
                           strrtrim (group)),
                   "."),
           strrtrim (acct));

but  for obvious reasons we don't. This  is one advantage of having an
operator  like "+" instead of a function like "strcat" -- it makes the
program look quite a bit cleaner, especially if we have to nest it.

   Note  also that the PASCAL string manipulators are quite willing to
"create" a new string, like + and STR do. The C string package, on the
other  hand,  can  only modify parameters that  are passed to it (like
"strcat" does to its first parameter).

   This is an artifact of the fact that C functions can't return
arrays (just as Standard PASCAL functions can't, though PASCAL/3000
functions can).
[Actually,  some  C  compilers,  including Draft  ANSI Standard, allow
functions  to  return  structures,  so we could  have a structure that
"contains"  only  one  subfield, which is an array  -- but this is not
usually done.]
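
   Here's a sketch of that structure trick, just to show what it looks
like; the type and routine names are mine, and, as I said, this is
legal in Draft Standard C but not something you'll see very often:

   #include <string.h>

   struct str256 { char data[256]; };

   struct str256 concat (a, b)
   char a[], b[];
   {
   /* Returns a brand-new "string" containing A followed by B, the way
      PASCAL/3000's "+" does, by wrapping the character array in a
      structure and returning the whole structure by value.  (Assumes
      the result fits in 256 bytes.) */
   struct str256 result;
   strcpy (result.data, a);
   strcat (result.data, b);
   return result;
   }

A caller would declare a STRUCT STR256 variable and simply assign the
function's result to it -- structure assignment, like structure
return, is part of the Draft Standard.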

   While we were writing FNAME_FORMAT, we needed some way of stripping
trailing  blanks  from  the  file  name, group name,  and account name
strings.  In  PASCAL,  this  was accomplished by  calling the STRRTRIM
function; in C, no such function exists, so we had to write our own:

   char *strrtrim (s)
   char s[];
   {
   /* Strips trailing blanks from S; also returns S as the result. */
   int i;
   for (i = strlen(s);   (i>0) && (s[i-1]==' ');   i = i-1)
     ;
   s[i] = '\0';
   return s;
   }

What  we  do  is  quite simple -- we find  the end of the string using
STRLEN  and then step back until we find a non-blank; then, we set the
first  of  the  trailing  blanks  to  a  '\0',  which  is  the  string
terminator.

   What  this  means,  among other things, is  that even if you aren't
satisfied  with  C's  string library, or if  you're using a C compiler
that doesn't come with a string library, you can write all your string
handling primitives quite easily. I'd guess that all of the Draft ANSI
Standard  C  string-handling  routines  (except  perhaps  the  numeric
formatting/parsing   ones,  like  "sprintf"  and  "sscanf")  could  be
implemented in 200 lines or less.
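
   For instance, here's a sketch of a PASCAL/3000-STRPOS-like routine;
the name and the 1-relative result convention are borrowed from
PASCAL/3000, and the code is my own, not part of any library:

   #include <string.h>

   int strpos (haystack, needle)
   char haystack[], needle[];
   {
   /* Returns the position (1 = first character) of the first
      occurrence of NEEDLE in HAYSTACK, or 0 if it isn't there. */
   int i, len;
   len = strlen (needle);
   for (i = 0; haystack[i] != '\0'; i = i + 1)
     if (strncmp (&haystack[i], needle, len) == 0)
       return i + 1;
   return 0;
   }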

   On  the other hand, of course, you'd  rather not have to write even
that much yourself.

   Continuing through our sample programs, we get to FNAME_PARSE. It's
a  more complicated procedure -- we have  to find the locations of the
two  dots  and  then  extract  the  three substrings  that lie before,
between, and after the dots. In PASCAL, this would be:

   PROCEDURE FNAME_PARSE (QUALIFIED: TSTR256;
                          VAR FILENAME, GROUP, ACCT: STRING);
   VAR START_GROUP, START_ACCT: INTEGER;
   BEGIN
   START_GROUP := STRPOS (QUALIFIED, '.') + 1;
   START_ACCT := STRPOS (STR (QUALIFIED, START_GROUP,
                              STRLEN(QUALIFIED)-START_GROUP-1), '.')
                 + START_GROUP;
   FILENAME := STR (QUALIFIED, 1, START_GROUP - 2);
   GROUP := STR (QUALIFIED, START_GROUP, START_ACCT-START_GROUP-1);
   ACCT := STR (QUALIFIED, START_ACCT,
                STRLEN (QUALIFIED) - START_ACCT + 1);
   END;

and in C:

   fname_parse (qual, filename, group, acct)
   char qual[], filename[], group[], acct[];
   {
   char *start_group, *start_acct;
   start_group = strchr (qual, '.') + 1;
   start_acct = strchr (start_group, '.') + 1;
   strncpy (filename, qual, start_group - qual - 1);
   filename[start_group - qual - 1] = '\0';
   strncpy (group, start_group, start_acct - start_group - 1);
   group[start_acct - start_group - 1] = '\0';
   strcpy (acct, start_acct);
   }

Again, there are both similarities and differences:

   *  Both  PASCAL  and  Draft  Standard C have  functions that find a
     character inside a string (STRPOS in PASCAL, "strchr" in C).

   * PASCAL's STRPOS returns the INDEX of the character in the string,
     but C "strchr" returns a POINTER to the character.

   *  PASCAL has a STR function that returns a substring, the room for
     which  is  allocated  on the stack. In C,  on the other hand, one
     would  usually  use  a  pointer  and  address  directly  into the
     original string. That's why we say:

        start_group = strchr (qual, '.') + 1;
        start_acct = strchr (start_group, '.') + 1;

     instead of

        START_GROUP := STRPOS (QUALIFIED, '.') + 1;
        START_ACCT := STRPOS (STR (QUALIFIED, START_GROUP,
                         STRLEN(QUALIFIED)-START_GROUP-1), '.')
                      + START_GROUP;

     As  you see, we just passed  START_GROUP (which is a pointer into
     the  string  QUAL)  directly  to  STRCHR,  rather than  having to
     specially  extract  a  substring, which would  probably have been
     inefficient  as  well  as  being somewhat  more cumbersome. Note,
     though,  that  when  we  pass  START_GROUP, we don't  pass a true
     substring  in  the sense of "L  characters starting at offset S";
      rather, STRCHR sees all of QUAL starting at the location pointed
     to by START_GROUP.

   *  Although  the  C routine manipulates pointers  more than it does
     offsets, we can say

       p + 1

     to refer to "a pointer that points 1 character after P", or

       p - q

     to  refer to "the number of characters between the pointers P and
     Q".

   Finally,  the  calling  sequences  to the two  procedures are quite
similar:

   WRITELN (FNAME_FORMAT ('MPEX    ', 'PUB     ', 'VESOFT  '));
   FNAME_PARSE ('MPEX.PUB.VESOFT', FILENAME, GROUP, ACCT);
   WRITELN (FILENAME, ',', GROUP, ',', ACCT, ';');

   printf ("%s\n",
           fname_format ("sl      ", "pub     ", "sys     ", qual));
   fname_parse ("mpex.pub.vesoft", filename, group, acct);
   printf ("%s,%s,%s;\n", filename, group, acct);

Note  that  in both PASCAL and C,  you can pass constant strings (e.g.
"MPEX.PUB.VESOFT")  to  procedures  --  a major  improvement over SPL,
which can't do this.

   I've  already  mentioned  some  of  the  built-in  string  handling
functions that Draft Standard C provides; here's a full list:

   *  STRCPY and STRNCPY copy one  string into another (one copies the
     entire  string,  the  other copies either the  entire string or a
     given  number  of  characters,  whichever  is  smaller).  They're
     comparable   to   PASCAL/3000's  string  assignment  and  STRMOVE
     functions.  The  "N"  procedures -- STRNCPY,  STRNCAT, STRNCMP --
     coupled with C's ability to extract "instant substrings" ("&x[3]"
     is  the address of the substring  of X starting with character 3)
     are  intended to compensate for the  lack of a substring function
     like PASCAL/3000's STR.

   *  STRCAT and STRNCAT concatenate one  string to another; again the
     "N" version (STRNCAT) will append not an entire string but rather
     up  to some number of characters  of it. These functions are most
     similar  to  PASCAL/3000's  STRAPPEND, but can do  the job of "+"
     with a bit of extra difficulty (as we saw above).

   *  STRCMP and STRNCMP compare  two strings, much like PASCAL/3000's
     ordinary  relational  operators (<, >, <=,  >=, =, <>) applied to
     strings.

   *  STRLEN  returns the length of  a string (just like PASCAL/3000's
     STRLEN).

   *  STRCHR  and  STRRCHR  search for the  first and last occurrence,
     respectively,  of a character in a string. STRSTR finds the first
     occurrence  of  one  string within another.  STRSTR is the direct
     equivalent  of PASCAL/3000's STRPOS (which  therefore can do what
      STRCHR does, too). STRRCHR has no direct PASCAL/3000 equivalent.

   *  STRCSPN  searches  for  the first occurrence of  ONE OF A SET OF
      CHARACTERS within a string. 'strcspn (x, "abc")' will find the
     first occurrence of an "a", "b", OR "c" in the string X.

     STRSPN  searches for the first character that is NOT ONE OF A SET
     OF  CHARACTERS  --  'strspn  (x,  "  0.")' will  skip all leading
     spaces,  zeroes, and dots in X, and return the index of the first
     character that is neither a space, zero, nor dot.

     These functions have no real parallels in PASCAL/3000.

   * STRTOK is a pretty complicated-looking routine that allows you to
      break up a string into "tokens" separated by delimiters. It has
      no parallel in PASCAL/3000 (see the sketch at the end of this
      list).

   *  SPRINTF and SSCANF are the equivalents of PASCAL/3000's STRWRITE
     and  STRREAD  --  they  let  you do formatted  I/O using a string
     instead of a file.

   *  PASCAL/3000 procedures that don't have  a direct equivalent in C
     include: STRDELETE and STRINSERT (which can be emulated with some
     difficulty  using  STRNCPY);  STRLTRIM  and STRRTRIM,  which trim
      leading/trailing blanks; and STRRPT, which returns a string
     containing a given number of repetitions of some other string.
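
   Since STRTOK looks so complicated in the abstract, here's a quick
sketch of it in action -- breaking a qualified file name apart at the
dots, roughly what FNAME_PARSE did by hand. Note that STRTOK writes
NULLs into the string you give it, which is why the constant is copied
into a local array first:

   #include <stdio.h>
   #include <string.h>

   main ()
   {
   char qual[256];
   char *part;
   strcpy (qual, "mpex.pub.vesoft");
   for (part = strtok (qual, ".");
        part != NULL;
        part = strtok (NULL, "."))
     printf ("%s\n", part);      /* prints "mpex", "pub", "vesoft" */
   }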

   I   intentionally  gave  this  list  AFTER  the  example  comparing
FNAME_PARSE and FNAME_FORMAT in PASCAL and C. As we saw with STRRTRIM,
PASCAL/3000's   standard   string  handling  routines  can  be  easily
implemented  in C, and I'm sure  that C's string handling routines can
be easily implemented in PASCAL/3000.

   The  important  thing,  I  believe,  is  not  the exact  set of the
built-in   string-handling   procedures   but   rather   the  ease  of
extensibility  (which is good in both  PASCAL/3000 and C, but very bad
in  Standard  PASCAL)  and  the  general  "style"  of  string-handling
programming   (which,  as  you  can  see,  is  somewhat  different  in
PASCAL/3000 and C).

   If  you  prefer  C's  pointer  and  null-terminated strings  -- or,
conversely,  if you prefer PASCAL/3000's  "+" operator and the ability
of  functions to return string results -- I'm sure that you'll have no
problems implementing whatever primitives you need in either language.


               SEPARATE COMPILATION -- STANDARD PASCAL

   As  I'm sure you can imagine, the  MPE/XL source code is NOT stored
in  one  source  file.  Neither,  for that matter,  is my MPEX/3000 or
SECURITY/3000,  or  virtually any serious program.  Not only do I, for
instance, heavily use $INCLUDE files, I often compile various portions
of  my  program  into various USLs, RLs,  and SLs, and then eventually
link  them together at compile and :PREP time. Obviously, this sort of
thing  is imperative when your programs  get into thousands or tens of
thousands of lines.

   Now, what you may not be aware of is that

   * STANDARD PASCAL DEMANDS THAT YOUR ENTIRE PROGRAM BE COMPILED FROM
     A SINGLE SOURCE FILE.

That's  right.  In the orthodox Standard,  you can't have your program
call  SL  procedures; you can't have it  call RL procedures; you can't
have  it call any procedures OTHER THAN  THE ONES THAT WERE DEFINED IN
THE  SAME SOURCE FILE. Believe it or not, this is true -- and it's one
of the major problems with Standard PASCAL.


   Let's  say that you want to keep  all of your utility procedures --
ones  that might be useful to many  different programs -- in an RL (or
SL).  This  way,  you  won't have to copy  them all into each program,
which  would  be  a  maintenance  headache,  and  would slow  down the
compiles  substantially  (my utility RL is  13,000 lines long; MPEX is
3,000 lines).

   Standard  PASCAL has a problem with  this. Say that it encounters a
statement such as:

   I:=MIN(J,80);

If it had previously seen a function definition such as:

   FUNCTION MIN (X, Y: INTEGER): INTEGER;
   BEGIN
   IF X<Y THEN MIN:=X ELSE MIN:=Y;
   END;

then  it would realize that MIN is a function -- a function that takes
two  by-value  integer parameters and returns  an integer -- and would
generate  code  accordingly. But what if MIN  isn't in the same source
file? How does PASCAL know what to do?

   Now,  PASCAL  might  conceivably  be  able to decide  that MIN is a
function  -- after all, it couldn't  be anything else. Still, what are
the  function's  parameters?  Is  J,  for  instance,  a by-value  or a
by-reference  (VAR) parameter? PASCAL must know, because it would have
to  generate  different  code  in these cases.  Are its parameters and
return  value  32-bit integers or 16-bit  integers? Again, PASCAL must
know.  Does the function really have two integer parameters, or is the
programmer  making a mistake? PASCAL wants to do type checking, but it
has no information to check against.

   Essentially, what we have here is a "knowledge crisis":

   *  WHEN YOU TRY TO CALL A  PROCEDURE THAT ISN'T DEFINED IN THE SAME
     SOURCE  FILE,  PASCAL  DOESN'T  HAVE  ENOUGH KNOWLEDGE  ABOUT THE
     PROCEDURE  TO  GENERATE  CORRECT CODE FOR  THE CALL. FURTHERMORE,
     PASCAL's  TYPE-CHECKING DESIRES ARE FRUSTRATED  BY THIS VERY SAME
     THING.


    HOW HP PASCAL, PASCAL/XL, AND SPL HANDLE SEPARATE COMPILATION

   Now  this, of course, is by no means a new problem. Other languages
have  to call separately compiled procedures, too, and they've managed
to work out some solutions. In HP's COBOL/68, for instance, if you say

   CALL "MIN" USING X, 80.

the compiler will assume that the function has two parameters, X and
80,  and that both of them are passed by reference, as word addresses.
If the parameters are passed by value, or as byte addresses, or if the
procedure  returns  a  value -- why, that's  just your tough luck. You
can't  call  this  procedure  from  COBOL/68.  COBOL/68 compatibility,
incidentally,  is  the  reason  why  all  the  IMAGE  intrinsics  have
by-reference, word-addressed parameters.

   FORTRAN/3000 adopted a similar but somewhat more flexible approach.
In FORTRAN, saying

   CALL MIN (X, 80)

will  also  make  FORTRAN assume that MIN  has two parameters, each of
them  by reference. However, if X is of a character type, FORTRAN will
pass  it as a byte address, not a  word address; if X is an integer or
any  other  non-character  object,  FORTRAN  will  pass  it as  a word
address. This gives the user more flexibility.

   Furthermore, FORTRAN/3000 allows you to say:

   CALL MIN (\X\, 80)

to  tell the compiler that a particular parameter -- in this case X --
should  be  passed BY VALUE rather  than by reference. Furthermore, if
you want MIN to be a function, you can say

   I = MIN (\X\, \80\)

from  which FORTRAN will deduce that MIN returns a result. The type of
the  result,  incidentally,  is assumed to be  an integer by FORTRAN's
default  type  conventions (anything starting with  an M, or any other
character  between I and N, is an integer). If you want to declare MIN
to be real, you can simply say:

   REAL MIN
   ...
   I = MIN (\X\, \80\)

   Thus,  we can see four possible components in a compiler's decision
about how a procedure is being called:

   *  STANDARD  ASSUMPTIONS.  Both COBOL68/3000  and FORTRAN/3000, for
     instance, ASSUME that all parameters are by reference.

   *  ASSUMPTIONS  DERIVED  FROM  A NORMAL CALLING  SEQUENCE. How many
     parameters  does a procedure have? Both COBOL68 and FORTRAN guess
     this from the number of parameters the user specified. Similarly,
     FORTRAN  determines  whether or not a  procedure returns a result
     and  also  the  word-/byte-addressing of the  parameters from the
     details of the particular call.

   *  CALLING  SEQUENCE  MECHANISMS  BY WHICH A  USER CAN OVERRIDE THE
     COMPILER'S  ASSUMPTIONS.  In FORTRAN, the  backslashes (\) around
     by-value  parameters allow a user to override the assumption that
     the  parameters are to be passed by reference. Similarly, in HP's
     COBOL/74 (COBOLII/3000), you can say

       CALL "MIN" USING @X, \10\.

     which  indicates that X is to be  passed as a byte address and 10
     is  to  be  passed  by  value. Of course, if  MIN expects X to be
     by-value,  too, this call won't give you the right result -- it's
     your responsibility to specify the correct calling sequence.

   * ONE-TIME DECLARATIONS THAT THE USER CAN SPECIFY. If a user says

       REAL MIN

     and  then uses MIN as a function, the compiler will automatically
     know that MIN returns a real result, regardless of the context in
     which it is used.

Different  compilers,  as  you  see,  use different ones  of the above
methods,  and  use them in different cases.  COBOL/68, as I said, only
uses  default assumptions and information that  it can derive from the
standard  calling  sequence;  FORTRAN/3000 uses all  four of the above
methods  to determine different things about  how a procedure is to be
called.

   HP  PASCAL, PASCAL/XL, and SPL all  take exactly the same approach.
They

   *  REQUIRE A USER TO DECLARE  EVERYTHING THAT THE COMPILER NEEDS TO
     KNOW ABOUT THE PROCEDURE CALLING SEQUENCE.

Unlike  COBOL/3000  or  FORTRAN/3000,  they  don't make  any "educated
guesses";  but,  on  the other hand, they  let you specify the calling
sequence  in  exact detail, thus allowing  you to call procedures that
wouldn't be easily callable from either COBOL or FORTRAN.

   In  fact,  SPL, HP PASCAL, and PASCAL/XL  demand that you copy into
your  program the PROCEDURE HEADER  of every separately-compiled (also
known  as  "external")  procedure that you call.  For instance, if you
declared your SPL procedure MIN as

   INTEGER PROCEDURE MIN (X, Y);
   VALUE X, Y;
   INTEGER X, Y;
   BEGIN
   IF X<Y THEN MIN:=X ELSE MIN:=Y;
   END;

then  an SPL program that wants to  call MIN as an external would have
to say

   INTEGER PROCEDURE MIN (X, Y);
   VALUE X, Y;
   INTEGER X, Y;
   OPTION EXTERNAL;

The  "OPTION  EXTERNAL;" indicates that  the compiler shouldn't expect
the  actual body of MIN to go  here; rather, the procedure itself will
be linked into the program later on.

   Similarly, if you want to call the PASCAL/3000 procedure

   FUNCTION MIN (X, Y: INTEGER): INTEGER;
   BEGIN
   IF X<Y THEN MIN:=X ELSE MIN:=Y;
   END;

from another PASCAL/3000 program, you'd have to say:

   FUNCTION MIN (X, Y: INTEGER): INTEGER;
   EXTERNAL;

Here,  just  the  word  "EXTERNAL;"  tells PASCAL that  this is only a
declaration  of  the  calling  sequence; but, armed  with this calling
sequence, PASCAL can both

   * Generate the correct code, and

   * Check the parameters you specify to make sure that they're really
     what the procedure expects.

In other words, armed with these declarations, both SPL and PASCAL can
make  sure  that  you specify the right  number of parameters, and (in
PASCAL more than in SPL) that they are of the right types.

   For  instance,  if  MIN was declared  with by-reference rather than
by-value  parameters,  the  compiler  would  BOTH be sure  to pass the
address  rather than the value AND  would make sure that you're really
passing  a  variable and not a  constant or expression. Finally, since
the  external  declaration is an exact  copy of the actual procedure's
header,  you're  sure  that  you  can  call ANY  PASCAL procedure from
another program, even if it has procedural/functional parameters and
other arcane stuff.


                    SEPARATE COMPILATION IN K&R C

   Where  PASCAL/3000,  PASCAL/XL,  and  SPL  all  follow  the  strict
"declare  everything, assume nothing" approach,  Kernighan & Ritchie C
does  almost  the  exact opposite. Its solution  is actually much like
FORTRAN's, only more general and more demanding on the programmer.

   C will:

   *  Pass ALL parameters by value -- this isn't an assumption, it's a
     requirement.

   * Deduce the number of parameters the called procedure has from the
     procedure call.

   * Deduce the type of each parameter -- integer, float, or structure
     -- from the procedure call.

   *  Allow  you to declare the procedure's  result type; if you don't
     declare it, most compilers simply assume that it returns an "int"
     (though some signal an error instead).

What's  more,  these are K&R C's assumptions  EVEN IF THE PROCEDURE IS
DECLARED  IN  THE  SAME  FILE  (i.e. not separately  compiled). If you
declare MIN as

   int min(x,y)
   int x,y;
   {
   if (x < y) return (x);
   else return (y);
   }

and then call it as

   r = min(17.0,34.0);

then  the C compiler will merrily pass 17.0 and 34.0 as floating-point
numbers,  although  it  "knows"  that  MIN expects  integers. In other
words,  the  C  compiler  will  neither  print  an  error  message nor
automatically  convert  the  reals  into integers; it'll  pass them as
reals,  leaving  MIN  to treat their  binary representations as binary
representations of integers.

   In  fact,  C won't even "KNOW" that  MIN has two parameters; if you
pass  three,  it'll try to do it  and let you suffer the consequences.
The  only  thing that C remembers about MIN  is the same thing that it
would allow you to declare about MIN if MIN were an external procedure
-- that MIN's result type is "int" (or whatever else you might declare
it as).

   Since  external function call characteristics  are thus pretty much
the  same  in K&R C as internal  call characteristics, I describe them
elsewhere  (primarily  in  the "C TYPE CHECKING"  section of the "DATA
STRUCTURES"  chapter).  However,  I'll  mention  a  few  of  the  most
important points here:

   * To refer to an external procedure, all you need to do is say:

       extern <type> <proc>();

     The  EXTERN indicates that <proc>  is defined elsewhere; the "()"
     indicates  that <proc> is a  procedure; <type> indicates the type
     of the object returned by <proc>. An example might be:

       extern float sqrt();

     which  declares  SQRT to be an  external procedure that returns a
     FLOAT.

   *  If you want to pass a  parameter by reference, you actually pass
     its  address  by  value.  In  other  words,  you'd  declare  your
     procedure to be

       int swap_ints (x, y)
       int *x, *y;
       {
       int temp;
       temp = *x;
       *x = *y;
       *y = temp;
       }

      --  a procedure that takes  two BY-VALUE parameters which happen
      to be pointers. Then, to call it, you'd say

       swap_ints (&foo, &bar);

     passing as parameters not FOO and BAR, but rather the expressions
     "&FOO" and "&BAR", which are the addresses of FOO and BAR. If you
     omit an "&" (i.e. say "FOO" instead of "&FOO"), the compiler will
     neither  warn  you  nor  do  what  seems to be  "the right thing"
     (extract  the address automatically);  rather, it'll happily pass
     the  value  of  FOO,  which  SWAP_INTS  will treat  as an address
     (boom!).


   *  Similarly,  you must be meticulously  careful with the number of
     parameters  you try to pass and the  type of each parameter; as I
     said before, if you say

        sqrt (100)

     and "sqrt" expects a floating point number, C won't automatically
     do  the conversion for you (because  it doesn't know just what it
     is that SQRT expects).

      The  only exceptions are the  various "short" types, which are
      automatically  converted  to  "int"s,  and  "float"s,  which are
      automatically converted to "double"s (a sketch of the practical
      consequences follows this list).
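
   Here's  a minimal sketch of what this means in practice. I'm assuming
the  usual C library convention that "sqrt" really takes and returns a
"double"; the calls themselves are just illustrations:

   extern double sqrt();       /* the result type is all we can declare */

   double r;
   r = sqrt(100.0);            /* right -- we pass a floating-point
                                  constant, so no conversion is needed */
   r = sqrt((double) 100);     /* also right -- the cast does the
                                  conversion explicitly */
   r = sqrt(100);              /* wrong -- the integer 100's bit pattern
                                  is passed unchanged, and sqrt misreads
                                  it as a (garbage) double */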

   To  summarize,  K&R  C  saves  you having to  include the procedure
header  of every external procedure you want to call; all you need to
declare  is  the procedure's result type (and then only when it isn't
"int").

   On  the flip side, it can't check for what you don't specify, so it
will  neither check for your errors  nor automatically do the kinds of
conversions  (e.g.  automatically  take the address  of a by-reference
parameter) that SPL and PASCAL programmers take for granted.


            DRAFT ANSI STANDARD C AND SEPARATE COMPILATION

   Draft ANSI Standard C allows you to say

   extern float sqrt();

or

   extern int min();

just like you would in standard Kernighan & Ritchie C. One new feature
that  it  provides,  though,  is  the  ability to  declare a "function
prototype" which specifies the types of the parameters that the called
procedure  expects, thus letting the C  compiler do some type checking
and automatic conversion.

   I  discuss this facility in some  detail in the "DATA STRUCTURES --
TYPE CHECKING" sections of this manual, but I'll talk about it briefly
here. In Draft ANSI Standard C, you can say

   extern float sqrt(float);

or

   extern int min(int,int);

thus  declaring  the  number and types  of the procedures' parameters.
This  imposes  some  type  checking  that's  almost  as  stringent  as
PASCAL's;  the major differences between it and PASCAL's type checking
are:

   *  The  sizes  of  array  parameters are not checked.  This -- as I
     mention  in  the  DATA  STRUCTURES chapter --  is actually a good
     thing.


   * You can entirely prevent parameter checking for a given procedure
     by  simply  using the old ("extern  float sqrt()") standard K&R C
     mechanism instead of the new method.

   * You can declare a parameter to be a "generic by-reference object"
     by declaring its type to be "void *", to wit

        extern int put_rec (char *, char *, void *);

     where  the  third parameter is declared to  be a "void *" and can
     thus  have  any  type of array or  record structure (or any other
     by-reference object) passed in its place.

   * Finally, you can use C's "type cast" mechanism to force an object
     to  the  expected  data type if for  some reason the object isn't
     declared the same way.

   I  happen to like this new Draft Standard C approach; it allows you
to implement strict type checking, but waive it whenever appropriate.
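
   As  a  quick sketch of what this buys you (the declarations and the
variables here are my own illustrations, not from any real library; the
"put_rec" declaration is the one shown above):

   extern double sqrt(double);                  /* a full prototype */
   extern int put_rec (char *, char *, void *);

   int n = 100;
   int buf[128];
   double r;

   r = sqrt(n);                  /* OK -- n is quietly converted to a
                                    double for us */
   r = sqrt((double) n);         /* a cast makes the conversion explicit
                                    when you want to force a type */
   put_rec ("MYFILE", "MYGROUP", buf);  /* OK -- any pointer is accepted
                                           where "void *" was declared */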


         MORE ABOUT SEPARATE COMPILATION -- GLOBAL VARIABLES

   What  we  talked  about  above  is how the  compiler can know about
procedures  that  are declared in a  different source file. What about
variables?  What  if  we want some procedures in  the RL to share some
global variables with procedures in our main program?

   In  recent  times,  global  variables  have fallen  somewhat out of
favor,  and for good reason. A procedure  is a lot clearer if its only
inputs  and  outputs  are  its parameters; you  needn't be afraid that
calling

   FOO (A, B);

will  actually  change  some  global  variable I or  J that isn't even
mentioned  in the procedure call. I  myself often ran into cases where
an  apparently  innocent procedure call did  something I didn't expect
because  it modified a global variable. As a rule, it's much better to
pass whatever information the called procedure needs to look at or set
as a procedure parameter.

   For   this   reason,  many  of  today's  programming  textbooks  --
especially  PASCAL and C textbooks -- counsel us to have as few global
variables as possible.

   Unfortunately, though global variables are usually "bad programming
style",  they are often necessary. For instance, I have several global
variables  in my MPEX/3000 product and  the supporting routines I keep
in my RL:

   * CONY'HIT, a global variable that my control-Y trap procedure sets
     whenever  control-Y  is  hit. Since its  "caller" is actually the
     system,  the  only  way  it  can communicate with  the rest of my
     program is by using a global variable.

   *  DEBUGGING. If this variable is TRUE, many of my procedures print
     useful  debugging  information (e.g. information  on every file I
     open,  parsing  info,  etc.).  If  I were to  pass DEBUGGING as a
     procedure  parameter, virtually every one  of my procedures would
     have  to  have  this extra parameter, either  to use itself or to
     pass to other procedures that it calls.

   * VERSION, an 8-byte array that's set to the current version number
     of my program. Whenever one of my low-level routines detects some
     kind  of  logic error within my program  (e.g. I'm trying to read
     from  a  non-existent data segment), it  prints an error message,
     prints  the contents of VERSION, and  aborts. That way, if a user
     gets an abort like this and sends a PSCREEN to us, we'll see what
     version of the program he was running. Again, it would be a great
     burden  to  pass  this  variable  as  a  parameter  to all  my RL
     procedures.

   All  I'm saying is that there  are cases where global variables are
necessary  and  desirable,  and  it's  important  that  a  programming
language -- especially a systems programming language -- support them.
Now, how do PASCAL and C do them?


                      GLOBAL VARIABLES IN PASCAL

   Since  Standard PASCAL can't handle separate compilation anyway, it
certainly  has no provisions for "cross-source file" global variables,
i.e.  global  variables shared by  separately compiled procedures. You
can declare normal global variables, to wit

   PROGRAM GLOBBER;
   { The variables between a PROGRAM statement and the first }
   { PROCEDURE statement are global. }
   VAR DEBUGGING: BOOLEAN;
       VERSION: PACKED ARRAY [1..8] OF CHAR;
   ...
   PROCEDURE FOO;
   VAR X: INTEGER;   { this variable is local }
   ...

but  these variables will only be  known within this source file; even
if  you  can somehow call  an external, separately-compiled procedure,
there's no way for it to know about these global variables.

   PASCAL/3000 and PASCAL/XL, of course, had to face this problem just
like  they  had  to  face the problem  of calling external procedures.
Their solution was this:

   *  One  of the source files (the  MAIN BODY) should have a $GLOBAL$
     control  card  at  the  very  beginning. Then, ALL  of its global
     variables  become  "knowable" by any other  source files that are
     compiled separately.

   *  Any  of  the  other  source files that wants  to access a global
     variable  must declare the variable as  global within it; it must
     also have a $EXTERNAL$ control card at the very beginning.

   *  All the global variables defined in any of the $EXTERNAL$ source
     files must also be defined in the $GLOBAL$ source file.

In other words, our main body might say

   $GLOBAL$
   PROGRAM GLOBBER;
   { These global variables are now accessible by all the }
   { separately compiled procedures. }
   VAR DEBUGGING: BOOLEAN;
       VERSION: PACKED ARRAY [1..8] OF CHAR;
   ...

Two other separately-compiled files might read:

   $EXTERNAL$
   $SUBPROGRAM$   { only subroutines here, no main body }
   PROGRAM PROCS_1;
   VAR DEBUGGING: BOOLEAN;   { want to access this global var }
   ...

   $EXTERNAL$
   PROGRAM PROCS_2;
                { want to access this global var }
   VAR VERSION: PACKED ARRAY [1..8] OF CHAR;
   ...

So, you see, the main body file DECLARES all the global variables that
are  to  be shared among all  of the separately-compiled entities; all
other  files  can essentially IMPORT none, some,  or all of the global
variables that were so declared.

   If, however, some procedures in PROCS_1 and PROCS_2 wanted to share
some  global variables, or even some procedures in a single file (e.g.
PROCS_1) wanted to share a global variable between them, this variable
would also have to be declared in the main body.

   Also note that, unlike the external procedure declaration, which is
implemented  rather  similarly  in  many  versions  of  PASCAL,  this
$GLOBAL$/$EXTERNAL$ mechanism is a distinctly unusual construct that is
unlikely to be compatible with any other PASCAL.


                       GLOBAL VARIABLES IN SPL

   Global  variable declaration and use is much  the same in SPL as in
PASCAL.  The  "main body" program -- the  one that'll become the outer
block  of the program file --  should declare all the global variables
used anywhere within the program. This would look something like:

   BEGIN
   GLOBAL LOGICAL DEBUGGING:=FALSE;
   GLOBAL BYTE ARRAY VERSION(0:7):="0.1     ";
   ...
   END;

[Note  that  unlike PASCAL, you can  initialize the variables when you
declare them; this isn't overwhelmingly important, but does save a bit
of typing.]

   Then,  each  procedure in a separately-compiled  file that wants to
"import" any of these global variables would say:

   ...
   PROCEDURE ABORT'PROG;
   BEGIN
   EXTERNAL BYTE ARRAY VERSION(*);
   ...
   END;
   ...

Note  one  difference  between  SPL  and PASCAL --  in PASCAL, all the
imported  global  variables  would be listed at  the top of the source
file;  in  SPL, they'd be given  inside each referencing procedure. On
the  one  hand,  the  SPL  method means more typing;  on the other, it
"localizes"  the importations to only those procedures that explicitly
request them, thus making it clearer who is accessing global variables
and who isn't. In any case, this isn't much of a difference.

   Note once again an interesting feature, present also in PASCAL: any
global  variables  used  anywhere  in the  various separately-compiled
sources  have to be declared in the  main body (an exception in SPL is
OWN variables; more about them later).

   This  can  be  quite  a  burden;  say  you  have  an RL  file whose
procedures  use a bunch of global variables, often just to communicate
with  each  other  (for  instance,  a  bunch  of  resource  handling
routines  might need to share a  "table of currently locked resources"
data structure).

   Then,  any program that calls these  procedures -- whether it needs
to  access  the  global variables or not --  would have to declare the
variables  as GLOBAL. Not a good thing, especially if you like to view
your  RL procedures as "black boxes" whose internal details should not
be cared about by their callers.


                        GLOBAL VARIABLES IN C

   Both  K&R  C  and  Draft  Standard  C  were designed  with separate
compilation  in  mind; thus, they have  standard provisions for global
variable declaration.

   In general, you're required to do two things:

   * In one of your separately compiled source files, you must declare
     the  variable  as  a  simple  global variable (i.e.  outside of a
     procedure declaration), e.g.

       int debugging = 0;
       char version[8] = "0.1    ";

     Note  that  unlike  PASCAL and SPL, C  doesn't require that these
     declarations  occur in any particular source file, or even in the
     same source file. You could, for instance, put these declarations
     in  your  RL procedure source file --  then, if your main program
     doesn't  want to change these variables,  it need never know that
     they exist.

   *  Any  source file that wants to  access a global variable that it
     didn't itself declare must define the variable as "extern":

       extern int debugging;
       extern char version[];

     The EXTERN declaration may occur either at the top level (outside
     a  procedure), in which case the  variable will be visible to all
     procedures  that are subsequently  defined; alternatively, it may
     occur  inside  a  procedure,  in which case  the variable will be
     known only to that procedure.


   As  I  mentioned,  the advantage of this  sort of mechanism is that
there  is no central place that is obligated to declare all the global
variables  that  are  used  in  any of the  separately compiled source
files. Rather, each variable may be declared and "extern"ed only where
it needs to be used.

   This  apparently  unimportant feature can be  useful in many cases.
Say  you have two procedures, "alloc_dl_space" and "dealloc_dl_space",
that  allocate  space in the DL- area of  the stack. They need to keep
track  of certain data, for instance, a list of all the free chunks of
memory. You can't just say:

   alloc_dl_space()
   {
   int *first_free_chunk;
   ...
   }

Not only will FIRST_FREE_CHUNK not be visible to DEALLOC_DL_SPACE, but
even  a subsequent call to ALLOC_DL_SPACE won't "know" this variable's
value,  since  any  procedure-local  variable is thrown  away when the
procedure exits. In C, however, you can say:

   int *first_free_chunk = 0;

   alloc_dl_space()
   {
   ...
   }

   dealloc_dl_space()
   {
   ...
   }

Now,  both ALLOC_DL_SPACE and DEALLOC_DL_SPACE can see and modify
the  value  of  FIRST_FREE_CHUNK;  also,  since  it  is  no  longer  a
procedure-local variable, the value will be preserved between calls to
these two procedures. Equally importantly,

   * NOBODY WHO MIGHT CALL THESE TWO PROCEDURES WILL HAVE TO KNOW THAT
     THEY USE THIS GLOBAL VARIABLE.

Contrast  with the PASCAL and SPL approach, where each global variable
(and  its  type and size) would have to  be known to the main program.
This  both puts a burden on the programmer and adds the risk that this
variable   --   really   the   property   of  the  ALLOC_DL_SPACE  and
DEALLOC_DL_SPACE  procedures  --  will somehow be  changed by the main
program that has to declare it.

   Note  that  this  is  the one case where  the ability to initialize
variables  (which  C  has, SPL has to  some extent, and PASCAL doesn't
have) becomes really necessary. Since the variable is not known to the
main  body,  who's  going to initialize  it? The initialization clause
("int *first_free_chunk = 0") will do it.

   Incidentally,  SPL has a similar feature in its ability to have OWN
variables.  These are also "static" variables  -- i.e. ones that don't
go  away when the procedure is exited -- but are only known within one
procedure.  Thus, this is somewhat less  useful than C's approach, but
better  than PASCAL's, where all variables are either global (and thus
have  to be declared in the main  body) or non-static (i.e. get thrown
away whenever the procedure is exited).
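
   In  fairness,  C  also  has a direct  counterpart to SPL's OWN
variables:  a  variable declared "static"  inside a procedure keeps its
value  from one call to the next, but is invisible outside that one
procedure. A minimal sketch (the procedure and variable names are made
up):

   int next_sequence_number()
   {
   static int seq = 0;     /* initialized once; survives between calls */
   seq = seq + 1;
   return (seq);
   }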


 PASCAL/XL'S MODULES -- A NEW (IMPROVED?) SEPARATE COMPILATION METHOD

   PASCAL/3000,  PASCAL/XL,  SPL,  and  both species of  C provide for
separate compilation. Essentially, any file may declare a procedure or
variable as "external", tell the compiler some things about it, and be
able to reference this procedure or variable. This sort of solution is
certainly  necessary, and we can certainly live with it. But it is not
without its problems.


   I  keep  about 300 procedures in  my Relocatable Library (RL) file;
these  are  general-purpose  routines that I call  from all my various
programs. Say that I write a program in SPL or PASCAL. In order for my
program  to be able to call any RL procedure, the program must include
an  EXTERNAL declaration for the procedure, complete with declarations
of  all  the  parameters. Even in C, I'd  have to declare at least the
procedure's result type.

   Now,  none  of  my  programs  actually  calls  all  300  of  the RL
procedures;  most  call  only  about  10  or 20, though  some can call
hundreds.  Even if the program only  calls 20 RL procedures, having to
type  all these EXTERNAL declarations can impose a substantial burden.
Twenty  full  procedure  headers,  each  one of which  must be exactly
correct,  or else the program may very  well fail in very strange ways
at run-time.

   My  solution to this problem was to  create one $INCLUDE file -- C,
SPL,  and PASCAL/3000 all have $INCLUDE-like commands -- that contains
the external declarations of each procedure in the RL. Then, every one
of  my programs $INCLUDEs this file, thus declaring as external all of
the  RL  procedures,  and  making  any  one of them  callable from the
program.

   Thus, if my source file looks like:

   BEGIN

   INTEGER PROCEDURE MIN (I, J);
   VALUE I, J;
   INTEGER I, J;
   BEGIN
   IF I<J THEN MIN:=I ELSE MIN:=J;
   END;

   INTEGER PROCEDURE MAX (I, J);
   VALUE I, J;
   INTEGER I, J;
   BEGIN
   IF I>J THEN MAX:=I ELSE MAX:=J;
   END;

   ...
   END;

then my $INCLUDE file would look like:

   INTEGER PROCEDURE MIN (I, J);
   VALUE I, J;
   INTEGER I, J;
   OPTION EXTERNAL;

   INTEGER PROCEDURE MAX (I, J);
   VALUE I, J;
   INTEGER I, J;
   OPTION EXTERNAL;

   ...

As  you see, one OPTION EXTERNAL  declaration for each procedure. Now,
instead  of manually declaring each such  procedure in every file that
calls  it, I can just $INCLUDE the entire "external declaration" file,
and  have access to ALL the procedures in my RL. Similarly, I may have
a  separate  $INCLUDE file for various  global constants, DEFINEs, and
variable declarations.

   Still,  the problem is obvious --  the procedure headers have to be
written  at least twice, once where  the procedure is actually defined
and  at least once where the procedure is declared EXTERNAL. These two
definitions then have to be kept in sync, and any discrepancy may have
unpleasant consequences. The same goes for global variables, too.
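
   In  C, by the way, there is a partial remedy for this keeping-in-sync
problem:  put the external declarations into one header file and include
it  (with C's "#include", its $INCLUDE-like command) not only in every
caller  but  also in the very file  that defines the procedures. At
least  with Draft Standard prototypes, the compiler will then complain
if  a  definition  drifts  away from its  declaration. A sketch, with
made-up file names:

   /* minmax.h -- the shared declarations */
   extern int min(int, int);
   extern int max(int, int);

   /* minmax.c -- the definitions; #include'ing our own header lets the
      compiler check the definitions against the declarations */
   #include "minmax.h"

   int min(int x, int y)
   {
   if (x < y) return (x);
   else return (y);
   }

   /* caller.c -- every caller just includes the same header */
   #include "minmax.h"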

   Do you feel sorry for me yet? Well, if you don't, consider this. If
I write my procedures in PASCAL, then doubtless many of the parameters
will   be   of   specially-defined  types  --  records,  enumerations,
subranges, etc.

   My  EXTERNAL  declarations  must have EXACTLY  THE SAME TYPES! This
means that not only does each external procedure need to be defined in
the  external  declarations, but so must any  type that is used in any
procedure  header. The same goes for any constant used in defining the
type (e.g. MAX_NAME_CHARS in the type 1..MAX_NAME_CHARS).

   In  other  words,  all of the "externally  visible entities" of the
file  containing  the  utility  procedures must be  duplicated in each
caller  of the procedures (either directly  or using a $INCLUDE). This
includes  the  externally-visible  procedures,  the externally-visible
variables,  types, and constants. They  have to be maintained together
with  the procedure definitions themselves; any change in the external
appearance of the file must be reflected in the copy.

   Thus, to summarize, if I have this source file:

   CONST MAX_NAME_CHARS = 8;
   TYPE FLAB_REC = RECORD ... END;
        NAME = PACKED ARRAY [1..MAX_NAME_CHARS] OF CHAR;
   ...
   PROCEDURE FLAB_READ (N: NAME; VAR F: FLAB_REC);
   BEGIN
   ...
   END;
   ...
   PROCEDURE FLAB_WRITE (N: NAME; VAR F: FLAB_REC);
   BEGIN
   ...
   END;
   ...
   FUNCTION FLAB_CALC_SECTORS (F: FLAB_REC): INTEGER;
   BEGIN
   ...
   END;

then at the very least I have to have an $INCLUDE file that looks like
this:

   CONST MAX_NAME_CHARS = 8;
   TYPE FLAB_REC = RECORD ... END;
        NAME = PACKED ARRAY [1..MAX_NAME_CHARS] OF CHAR;

   PROCEDURE FLAB_READ (N: NAME; VAR F: FLAB_REC);
   EXTERNAL;

   PROCEDURE FLAB_WRITE (N: NAME; VAR F: FLAB_REC);
   EXTERNAL;

   FUNCTION FLAB_CALC_SECTORS (F: FLAB_REC): INTEGER;
   EXTERNAL;

   So  much for the tale of woe.  I bet you're thinking: "Now Eugene's
going  to  tell us that PASCAL/XL solves  all these problems." Well, I
wish I could, but I can't.


   PASCAL/XL's MODULEs look something like the following:

   $MLIBRARY 'FLABLIB'$
   MODULE FLAB_HANDLING;

   $SEARCH 'PRIMLIB, STRLIB'$
   IMPORT PRIMITIVES, STRING_FUNCTIONS;

   EXPORT
     CONST FSERR_EOF = 0;
           FSERR_NO_FILE = 52;
           FSERR_DUP_FILE = 100;
     TYPE FLAB_TYPE = RECORD
                       FILE_NAME: PACKED ARRAY [1..8] OF CHAR;
                      GROUP: PACKED ARRAY [1..8] OF CHAR;
                      ...
                      END;
          DISC_ADDR_TYPE = ARRAY [1..10] OF INTEGER;
     FUNCTION FLABREAD (DISC_ADDR: DISC_ADDR_TYPE): FLAB_TYPE;
     PROCEDURE FLABWRITE (DISC_ADDR: DISC_ADDR_TYPE; F: FLAB_TYPE);

   IMPLEMENT

     CONST FSERR_EOF = 0;
           FSERR_NO_FILE = 52;
           FSERR_DUP_FILE = 100;
           MAX_FILES_PER_NODE = 16;   { not exported }
     TYPE FLAB_TYPE = RECORD
                       FILE_NAME: PACKED ARRAY [1..8] OF CHAR;
                      GROUP: PACKED ARRAY [1..8] OF CHAR;
                      ...
                      END;
          DISC_ADDR_TYPE = ARRAY [1..10] OF INTEGER;
          DISC_ID = 1..256;   { not exported }

     FUNCTION ADDRCHECK (DISC_ADDR: DISC_ADDR_TYPE): BOOLEAN;
     BEGIN
     { the implementation -- ADDRCHECK is not exported }
     END;

     FUNCTION FLABREAD (DISC_ADDR: DISC_ADDR_TYPE): FLAB_TYPE;
     BEGIN
     { the actual implementation of the function... }
     END;

     PROCEDURE FLABWRITE (DISC_ADDR: DISC_ADDR_TYPE; F: FLAB_TYPE);
     BEGIN
     { the actual implementation of the function... }
     END;

   END;

OK, let's look at this one piece at a time:

   *   First,  we  say  "$MLIBRARY  'FLABLIB'$"  followed  by  "MODULE
     FLAB_HANDLING". These tell the compiler that this file

        DEFINES   A   MODULE   CALLED   "FLAB_HANDLING"   INSIDE   THE
        SPECIALLY-FORMATTED MODULE LIBRARY "FLABLIB".

     All  of  the  information about the  "external interface" of this
     module -- i.e. everything that is specified in the EXPORT section
     -- will be stored into this module library.

   *  Then  we  say  "$SEARCH 'PRIMLIB, STRLIB'$"  followed by "IMPORT
     PRIMITIVES,  STRING_FUNCTIONS;". This essentially "brings in" the
     external interface of the modules PRIMITIVES and STRING_FUNCTIONS
     that  are stored in the module  library files PRIMLIB and STRLIB.
     Practically speaking, this is exactly the same as if

        ALL  THE  TEXT  SPECIFIED  IN  THE  "EXPORT"  SECTION  OF  THE
        "PRIMITIVES"   AND  "STRING_FUNCTIONS"  MODULES  WAS  INCLUDED
        DIRECTLY INTO THE FILE, WITH "EXTERNAL;" KEYWORDS PLACED RIGHT
        AFTER THE FUNCTION DEFINITIONS.

     Now  the  compiler  "knows"  about  all the  TYPEs, CONSTs, VARs,
     PROCEDUREs,  and FUNCTIONs that are defined by the PRIMITIVES and
     STRING_FUNCTIONS  modules, and will let  you use them from within
     the module that's being currently defined (FLAB_HANDLING).

   *  The  EXPORT  section  defines  the "external  interface" of this
     module.  We've  already  explained  what  this  really  means  --
     whenever  you IMPORT a module, the  result is exactly the same as
     if  you  had  copied in the EXPORT  section of the module (except
     that  "EXTERNAL;"  keywords  are automatically put  after all the
     functions).

     Any  CONSTants,  VARiables, TYPEs, PROCEDUREs,  or FUNCTIONs that
     you define in this module but want other modules to use should be
     declared in the EXPORT section.


   *  Finally, the IMPLEMENT section is the actual source code of your
     file.    It   has   to   include   all   the   declarations   and
     procedure/function  definitions (EVEN THE  ONES ALREADY MENTIONED
     IN THE "EXPORT" SECTION!).

     In  fact,  it's  just like an ordinary  PROGRAM, except that it's
     missing the PROGRAM statement.

   Thus,  if you think about it, you could have -- instead of defining
a MODULE --

   * Written the IMPLEMENT section as an ordinary program;

   *  Put  the EXPORT information into a  separate file (we'll call it
     the "external declarations $INCLUDE file");

   *  And,  instead  of the IMPORTs, just  used the $INCLUDE$ compiler
     command  to include the "external declarations $INCLUDE files" of
     all of the IMPORTed modules.

Really, this is ALL THERE IS TO A MODULE. Its only advantages are:

   *  You  can keep the EXPORT declarations  and IMPLEMENT part in the
     same  file, so that when you change  the definition of one of the
     external objects, you can easily change the EXPORT declaration in
     the same file.

   * The compiler will check to make sure that the EXPORT declarations
     are  exactly  the same as  the actual implementation declarations
     (i.e.  that you didn't define the  procedure one way in an EXPORT
     and another way in the IMPLEMENT section).

   *  Finally,  for  honesty's  sake, I ought to  point out that you'd
     actually  need two $INCLUDE files per module if you wanted to use
     them  instead  of  MODULEs. You'd need one  $INCLUDE file for the
     TYPE/CONST/VAR    declarations    and    one    file    for   the
     PROCEDURE/FUNCTION  EXTERNAL declarations --  this is because ALL
     of  the  TYPE/CONST/VAR  declarations for  all $INCLUDE$d modules
     would have to go before ALL the EXTERNAL declarations.

   The  major flaw -- not compared with $INCLUDE$s but rather compared
to what they could have so easily done! -- is obvious:

   * WHY FORCE US TO DUPLICATE ALL THE EXTERNAL OBJECT DECLARATIONS IN
     THE "EXPORT" AND "IMPLEMENT" SECTIONS?

Why  should I define the hundred-odd  subfields of a file label twice?
Why  should I specify the parameter list of all my external procedures
twice?

   To me it seems simple -- just have all the declarations go into the
IMPLEMENT section, and then let me say:

   EXPORT FSERR_EOF, FSERR_NO_FILE, FSERR_DUP_FILE,
          FLAB_TYPE, DISC_ADDR_TYPE,
          FLABREAD, FLABWRITE;

How  simple!  Think  of all the effort  and possibility for error that
we'd avoid. Even better -- I'd like to be able to say either

   EXPORT_ALL;

to  indicate  that  ALL  the  things  I define should  be exported, or
perhaps  specify a $EXPORT$ flag near  every definition that I want to
export.  That way, if I have a file with 20 constants, 5 variables, 10
data  types, and 30 procedures, I  wouldn't have to enumerate them all
in an EXPORT statement at the top.

   As  things actually stand, not only do I have to enumerate all of
them, but I have to DUPLICATE ALL THEIR DEFINITIONS! Why?

   Another  quirk  that  you  may or may  not have noticed: obviously,
since I specify a module name AND a module library filename, there can
be  more  than  one module per library. And  yet, I define a different
library  file  for each of the  modules FLAB_HANDLING, PRIMITIVES, and
STRING_FUNCTIONS. Why would I do a silly thing like that?

   Well,  a little paragraph hidden in my PASCAL/XL Programmer's Guide
says:  "A  module  can  not import another module  from its own target
library;  that  is,  the compiler options MLIBRARY  and SEARCH can not
specify the same library."

   Seems  innocent,  eh?  But  that means that any  time a module must
import  another  module,  the  imported module must  be in a different
library  file! The only modules that CAN  be stored in the same module
library file are ones that do NOT import one another.

   Well,  it  makes  perfect  sense  for FLAB_HANDLING to  want to use
PRIMITIVES  and STRING_FUNCTIONS -- after all, why do I define modules
if  not  to  be  able to IMPORT them into  as many places as possible?
Similarly, STRING_FUNCTIONS will probably want to use PRIMITIVES. What
we get is a rather paradoxical situation in which

   THE  ONLY  MODULES  THAT CAN BE PUT  INTO THE *SAME* MODULE LIBRARY
   FILES ARE ONES THAT HAVE *NOTHING TO DO WITH EACH OTHER*!

Of course, that's a bit of hyperbole, but you get my point -- you make
modules  so you can IMPORT them into one another, but the more you use
IMPORT  the  less able you are to  have several related modules in the
same library.


                     STANDARD LANGUAGE OPERATORS

   Everybody  knows operators; all languages have  them. +, -, *, / --
you  pass them parameters and they return to you results. Imagine what
would happen if you couldn't multiply two numbers!

   PASCAL's  set  of operators is, by  the standards of most computer
languages   (like  BASIC,  FORTRAN,  and  COBOL),  about  average.  It
includes:

   monadic +       monadic -
   +               -               *               /
   DIV             MOD
   NOT             AND             OR
   <               <=              =               <>
   >               >=
   IN (the set operator)
   ^ (pointer dereferencing)
   [ (array subscripting)

These include:

   * The "monadic" operators, which take one parameter.

   * The "dyadic" operators, which take two parameters.

   *  The arithmetic operators +, -, *, /, DIV, and MOD. These operate
     on integer or real parameters.

   * The logical operators NOT, AND, and OR. They work on booleans.

   *  The  relational  operators  <,  <=, =, <>, >,  and >=. They take
     numbers and return booleans.

   *  The SET operators IN, +, -, and  *. They work on sets; note that
     +,  -,  and * mean quite  different (though conceptually similar)
     things for sets than they do for integers -- + is set union, - is
     set difference, and * is set intersection.

   *  Some  other  things  that you may not  think of as operators but
     which  most  assuredly are. ^ and [, it  seems to me, are just as
     much operators as anything else.

   Now,  it takes no feats of analysis to construct this kind of list;
I  just  looked in the PASCAL manual. The  C manual reveals to me that
the  C operator set is -- at least  in terms of number of operators --
quite a bit richer:

   C operator                PASCAL equivalent
   ----------                -----------------
   monadic +, monadic -      Same
   +, -, *, /                Same
   %                         MOD
   <, >, <=, >=, ==, !=      Same; == is "equal", != is "not equal"
   !, &&, ||                 NOT, AND, OR (respectively)
   ~, &, |, ^                BITWISE NOT, AND, OR, and EXCLUSIVE OR
   <<, >>                    BITWISE LOGICAL SHIFTS (LEFT and RIGHT)
   monadic *, e.g. *X        pointer^
   monadic &, e.g. &X        None -- returns address of a variable
   ++X, --X                  None -- increments or decrements X by 1
                             (usually) and also returns the new value
                             (simultaneously changing X)
   X++, X--                  None -- increments or decrements X by 1
                             (usually) but returns the value that X
                             had BEFORE the increment/decrement!
   X=Y                       := assignment, but can be used in an
                             expression (e.g. "X = 2+(Y=3)+Z").
   X+=Y, X-=Y, X*=Y, X/=Y,   None -- X+=Y means the same as
     X&=Y, X|=Y, X^=Y,       "X=X+Y", and so on for the other ops
     X%=Y, X<<=Y, X>>=Y
   X?Y:Z                     None -- an IF/THEN/ELSE that can be
                             used within an expression.
                             "(X<Y)?X:Y" returns the minimum of
                             X and Y (IF X<Y THEN X ELSE Y).
   (X,Y,Z)                   None -- executes the expressions X, Y,
                             and Z, but only returns the result of Z!
   sizeof X                  Returns the size, in bytes, of its
                             operand expression.

   This  is the rich operator set that many C programmers are so proud
of,  and deride PASCAL and other languages for not possessing. Indeed,
several new categories of operators do exist. But are they really that
useful?  Or can they be easily  (and, perhaps, more readably) emulated
with conventional, more familiar operators?
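
   Just  to  make the question concrete,  here are a few of C's special
operators  next to their plain, "conventional" spellings (the variable
names are, of course, made up):

   int x = 10, y = 3, z, minimum;

   x += y;                       /* same as:  x = x + y;                 */
   minimum = (x < y) ? x : y;    /* same as:  if (x < y) minimum = x;
                                              else       minimum = y;    */
   z = x++;                      /* same as:  z = x;  x = x + 1;         */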


                           BIT MANIPULATION

   PASCAL  prides  itself  on being a High  Level language. That's not
just  high level as in "high-level" -- it's High Level, with a capital
H  and  a  capital  L.  In PASCAL, much care  is taken to insulate the
programmer  from  the  underlying physical structure  of the data. You
don't  need  to  know how many bytes there are  in a word, or how many
words  there  are in your data structure;  you don't even need to know
that  there  is  such  a  thing  as  a  "bit"  and that it  is what is
manipulated deep down inside the computer.

   Unfortunately,  we do not live in a  High Level world. If I want to
write  a  PASCAL  program  that  can access, say,  file labels, or PCB
entries,  or log file records, I need  to be able to access individual
bit fields.

   This  may  mean  that  the  system was badly  designed in the first
place; perhaps too much attention was paid to saving space; perhaps the
operating  system should have been written  in PASCAL, too, so I could
just  use  the same record structure  definitions to access the system
tables that the system itself uses.

   For  better  or  worse, there are plenty of  cases where we need to
manipulate bit fields:

   * In FOPEN, FGETINFO, and WHO calls;

   * In accessing data in system tables and system log files;

   * In writing one's own packed decimal manipulation routines;

   *  In  compressing  one's own data structures  to save space, by no
     means an unworthy goal;

   *  And  many other cases, but  primarily systems programming rather
     than application programming.

   Thus,  although bit fields are admittedly not something you usually
use  quite  as  often  as,  say,  strings  or  integers, they  must be
supported by any system programming language.

   How do PASCAL, SPL, and C support bit fields?


TYPICAL OPERATIONS ON BITS

   There  are really three main classes  of operations that you'd want
to perform with bits:

   * BITWISE OPERATIONS ON ENTIRE NUMBERS. These view a quantity -- an
     8-bit,  a 16-bit, a 32-bit, or whatever  -- as just a sequence of
     bits.  When you do a "bitwise not"  of a number, each of the bits
     in  it  is negated; a "bitwise and"  of two numbers ands together
     each  bit  in the two numbers -- bit  0 of the result becomes the
     "logical  and"  of  bit  0  of the first number  and bit 0 of the
     second;  bit  1 of the result becomes the  and of the two bits 1;
     and so on.

   *  BIT  SHIFTS. Bit shifts also view a  number as a bit sequence. A
     bit shift just takes all of the bits and "moves" them some number
     to  the left or to the right. For instance, if you shift 101 left
     by 2 bits, this takes the bit pattern

       00001100101

     and makes it into

       00110010100

     which is 404.

   * BIT EXTRACTS/DEPOSITS. Bit extracts take a particular sequence of
     bits  --  say  "3 bits starting from bit  #10" -- and extract the
     value  stored  at  those  bits;  in other words,  they view a bit
     string  as  an integer. Bit deposits allow  you to set the 3 bits
     starting at bit #10 to some value. In SPL, for instance,

       RECFORMAT:=FOPTIONS.(8:2)

     and

       FOPTIONS.(8:2):=RECFORMAT

     extract and deposit bit fields.

   Since  the  computer's  lowest-level  data  type  is  the  bit, bit
operations  can  usually  be  performed quite  easily and efficiently.
Virtually  all computers have the  BITWISE OPERATIONS and SHIFTS built
in  as instructions, and some (like the  HP3000, but not, say, the VAX
or  the Motorola 68000) have BIT EXTRACT/DEPOSIT instructions as well.
An interesting fact is that although bitwise operations and shifts are
relatively  hard and inefficient to  emulate in software, bit extracts
and deposits are relatively easy. For instance,

   X.(10:3)   --  extract 3 bits starting from bit #10
                  (assuming 16-bit words, least significant bit=#15)

is the same as

   X shifted left by 10 bits and shifted right by 13 bits

or

   X shifted right by 3 bits and ANDed with 7.

In general,

   I.(START:COUNT)  =  I  shifted  left by START  and shifted right by
                     (#bits/word) - COUNT.
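
   For  concreteness,  here's what that formula looks like in C -- just
a  sketch,  assuming 16-bit unsigned  words and SPL's bit numbering
(bit #0 is the most significant bit):

   /* extract COUNT bits starting at bit START, i.e. X.(START:COUNT) */
   unsigned bit_extract (x, start, count)
   unsigned x;
   int start, count;
   {
   return ((x >> (16 - start - count)) & ((1 << count) - 1));
   }

   /* deposit VALUE into the COUNT bits of X starting at bit START,
      i.e. X.(START:COUNT):=VALUE, returning the updated word */
   unsigned bit_deposit (x, start, count, value)
   unsigned x, value;
   int start, count;
   {
   unsigned mask = ((1 << count) - 1) << (16 - start - count);
   return ((x & ~mask) | ((value << (16 - start - count)) & mask));
   }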

   So,  just because these three types  of operations are available in
certain  languages, it does not follow that all of them are necessary.
Some  can be easily (or not so easily) emulated using the others, but,
more  importantly, it may well be the  case that some just aren't very
frequently useful. Bit extracts, for instance, are something that I've
frequently  found  myself  doing;  bitwise  operations  and especially
shifts are (at least for me) far rarer things.


                                 SPL

   SPL  supports all three types of  bit operations, and supports them
all in a big way.

   *  Single-word  unsigned  quantities  (LOGICALs)  can  be  bit-wise
     negated  (NOT), ANDed (LAND), inclusive ORed (LOR), and exclusive
     ORed (XOR).

   * Single-word quantities can have arbitrary CONSTANT bit substrings
     extracted   and   deposited.   In   other   words,  you  can  say
     "FOPTIONS.(10:3)", but you CAN'T say "CAPMASK.(CAPNUM:1)".

   *  Single-, double-, triple-, and  quadruple-word quantities can be
     shifted  in a number of ways --  see the SPL Reference Manual for
     details.

   * Some other, less important features, like "Bit concatenation" are
     also supported; see the SPL Reference Manual if you're curious.

   Bit  extractions are, naturally, vital in  SPL. All sorts of things
--  FOPEN  foptions  and aoptions, WHO  capability masks, system table
fields, etc. -- contain various bit fields, most of which do not cross
word  boundaries  (hence it's sufficient to  have bit fields of single
words  only)  and  have constant offsets and  lengths (hence it's only
necessary to have constant bit field parameters).

   Other operations are less frequent. Checking my 13,000-line RL file
--  which, of course, is completely  representative of any SPL program
that anybody's ever written -- I find that I use shift operations in a
very distinct set of cases:

   * Converting byte to word addresses and vice versa.

   *  Extracting  variable  bit  fields  (e.g. if I  have a WHO-format
     capability mask and a variable capability bit number).

   * Extracting bit fields of doubles (from WHO masks, disc addresses,
     and other double-word entities).

   * Rarely, quick multiplies and divides by powers of two.

   * Constructing integers and doubles out of bytes and integers.

   Note  that the majority of  these cases are actually "work-arounds"
caused  by compiler problems. If SPL had a C-like "cast" facility with
which I could convert a byte address to a word address and vice versa,
I  wouldn't  need  to  do  an ugly and unreadable  shift; if SPL's bit
extracts  were more powerful, I could use them for variable fields and
doubles;  if SPL's optimizer were better, I could always do multiplies
and divides and count on SPL to do the work.

Shifts, then, in my opinion are a classic case of something that might
not be needed in a "perfect" system; however, as we see, they can come
in  handy  to  avoid the imperfections that are  bound to exist in any
language.

   Similarly,  I  find  that I almost don't  use bitwise operations at
all;  the only cases I do use them are those where I need to implement
double-word bit fields and variable bit fields.


                                  C

   C's  approach  to bit operations is rather  like that of SPL. Since
most computers have bit manipulation instructions, C has bit operators
built  in to the language. Actually, C  does not have bit extracts and
deposits,  but it does have shifts  and bitwise operators, which (as I
mentioned) can be used to emulate bit extractions.

   However,  C  has  another mechanism to handle  bit fields, which is
both more and less usable than SPL's.

   In C, structures can have fields that are explicitly declared to be
a  certain  number of bits long.  For instance, consider the following
definition:

   typedef struct {unsigned:2;       /* Bits .(0:2); unused */
                   unsigned file_type:2;         /* .(2:2) */
                   unsigned ksam:1;              /* .(4:1) */
                   unsigned disallow_fileeq:1;   /* .(5:1) */
                   unsigned labelled_tape:1;     /* .(6:1) */
                   unsigned cctl:1;              /* .(7:1) */
                   unsigned record_format:2;     /* .(8:2) */
                   unsigned default_desig:3;     /* .(10:3) */
                   unsigned ascii:1;             /* .(13:1) */
                   unsigned domain:2;            /* .(14:2) */
                  } foptions_type;

This  defines  a data type called  "foptions_type" as a structure with
several subfields. However, each subfield occupies a certain number of
bits,  and because of certain guarantees made by the C compiler (which
differ,  by the way, among different Cs on different machines) we know
which bits they are. Thus, if we say:

   foptions_type foptions;
   ...
   fgetinfo (fnum,,foptions);
   if (foptions.cctl == 1) ...

A  lot  clearer,  you'll agree, than  saying "FOPTIONS.(7:1)". And the
effect, of course, is exactly the same.

   On the other hand, there are certain restrictions to this bit field
extraction mechanism:

   *  Just like in SPL, you can't  extract bit fields whose offset and
     length  are  not constant. To do this,  you have to use the shift
     operators or bitwise operations. For instance, instead of writing

       CAPMASK.(BIT:2) (which you couldn't do in SPL anyway),

     you'd say:

       (capmask << bit) >> 14

     or, perhaps,

        (capmask >> (14-bit)) & 3    (3 = a binary 11)

     You  have  to  do similar (but uglier)  stuff to set variable bit
     fields.

   *  Another  difference -- in which C falls  short of SPL -- is that
     the  "structure  subfield" approach of  bit extraction only works
     for  getting  bits  from  variables  that  were declared  to be a
     certain  type.  Say you want to  extract bits (10:2) of something
     that  was declared as an "int",  or, perhaps, an expression (like
     "foptions | 4", which ORs foptions with 4, thus setting the ASCII
     bit). You can't say

        (foptions | 4).record_format

     or even

        ((foptions_type) (foptions | 4)).record_format

     (since  you  can't cast something to  a structure type). Granted,
     it's  very  rare  that  you'd  want  to extract a  subfield of an
     expression,  and if you're trying to extract a subfield of a type
     "int"  variable,  this might mean that  you declared the variable
     wrong.  Still, the point here is that SPL's bit extract mechanism
     is more flexible (though much less readable).

   *  Finally,  a  philosophical  issue.  Yes,  you  can use  C record
     structures  to extract bit fields; but what  if you want to use a
     particular  subfield only once? Do you  need to declare a special
     record  structure  just for extracting the  field? Wouldn't it be
     easier  to do like you can in  SPL, specifying the bit offset and
     length   directly,   without  encumbering  yourself  with  a  new
     datatype?

     This  becomes an even more serious  problem with PASCAL, in which
     you can't even do bit shifts to emulate impromptu bit extraction.
     The  key  point  here  is  that  for  QUICK  AND  DIRTY, one-shot
     operations,  declaring a structure in order to be able to use bit
      subfields  may be more cumbersome than you'd like. Again, the old
     trade-off of "Good Programming Style" versus ease of writing.

   Thus,  C  supports  bitwise operations and  shifts (although not as
many  shift  operators  as SPL does; on the  other hand, many of SPL's
shift  operators  are  of  doubtful utility). It  can also emulate bit
field  manipulations using shifts, but, more importantly, can make bit
manipulations  a  lot easier and more  readable using record structure
bit  subfields.  On  the  other hand, SPL's  ".(X:Y)" operator has the
advantage  of  being  usable  on  an ad-hoc basis,  without needing to
declare a special record structure.


                                PASCAL

   As  I  mentioned  before,  PASCAL,  perhaps more  for philosophical
reasons than anything else, does not explicitly support bits. "PASCAL:
An  Introduction to Methodical Programming" (W.  Findlay & D. A. Watt)
--  the  book from which I learned  PASCAL, and one that describes all
the  Standard  PASCAL  features -- doesn't even  mention "bits" in its
index.

   Of  course, the need for bit fields was recognized quite early, and
a fairly common consensus developed.

   PACKED  RECORDs  are defined in Standard  PASCAL as structures that
the  compiler may -- at its option --  store more compactly, even at
the cost of slower access. Many PASCAL compilers use PACKED RECORDs as
vehicles for
implementing bit subfields, much like C does.

   For  instance, say that you want to declare an "foptions" type much
like the one I showed for C. In PASCAL, it would be:

   TYPE FOPTIONS_TYPE = PACKED RECORD
                        DUMMY: 0..3;             { .(0:2) }
                        FILE_TYPE: 0..3;         { .(2:2) }
                        KSAM: 0..1;              { .(4:1) }
                        DISALLOW_FILEEQ: 0..1;   { .(5:1) }
                        LABELLED_TAPE: 0..1;     { .(6:1) }
                        CCTL: 0..1;              { .(7:1) }
                        RECORD_FORMAT: 0..3;     { .(8:2) }
                        DEFAULT_DESIG: 0..7;     { .(10:3) }
                        ASCII: 0..1;             { .(13:1) }
                        DOMAIN: 0..3;            { .(14:2) }
                        END;

   Note  the most obvious feature here  (which you may consider either
ingenious or utterly laughable, depending on your prejudices). Instead
of  specifying the number of bits you're using explicitly, you specify
a range from 0 to

   2 ^ NUMBITS - 1

The  compiler  then decides that the smallest  number of bits it could
use  to  represent this is NUMBITS,  and allocates that many. Remember
that  PASCAL is very reluctant to  let the programmer "see" any aspect
of  the internal representation of its variable; therefore, it prefers
that bit fields be thus declared implicitly rather than explicitly.

   Now,  you  can  access the bit fields  of an FOPTIONS_TYPE variable
just like you would in C:

   VAR FOPTIONS: FOPTIONS_TYPE;
   ...
   FGETINFO (FNUM, , FOPTIONS);
   IF FOPTIONS.CCTL=1 THEN ...

If  you  don't  like  having to declare the  bit fields with all those
powers of 2 (quick -- how many bits is "0..8191"?), you can just issue
the following declarations (usually in an $INCLUDE$ file):

   TYPE BITS_1 = 0..1;
        BITS_2 = 0..3;
        BITS_3 = 0..7;
        ...
        BITS_13 = 0..8191;
        BITS_14 = 0..16383;
        BITS_15 = 0..32767;

and  then  declare  each subfield of, say,  FOPTIONS_TYPE, as being of
type BITS_1 or BITS_3 or whatever.

   So,  the  high-level  means  of  accessing bit  subfields exists in
PASCAL  just as it does in C.  What about the low-level means? What if
you  want to do a shift or a bitwise AND? Or, more concretely, what if
you  want  to  extract,  say,  a  bit field that  starts at a variable
offset?

   Fortunately,  HP PASCAL (unlike SPL and C) provides a nice built-in
mechanism  for  handling  variable-offset  bit  fields.  Consider  our
classic  example,  a  32-bit  "capability mask" (of  the type that WHO
returns). We want to be able to retrieve, say, the CAPNUMth bit of the
capability mask. In PASCAL, we say:

   TYPE CAPABILITY_MASK_TYPE = PACKED ARRAY [0..31] OF 0..1;
   VAR CAPABILITY_MASK: CAPABILITY_MASK_TYPE;
   ...
   I:=CAPABILITY_MASK[BITNUM];

Simple! Because "PACKED" in PASCAL can apply to any kind of structure,
HP  PASCAL  (and  many other PASCAL compilers)  allow PACKED ARRAYs of
subranges  that can fit in less than 1 byte (e.g. 0..1, 0..3, etc.) to
become arrays of bit fields.

   Thus,   "CAPABILITY_MASK[BITNUM]"  extracts  the  BITNUMth  bit  of
CAPABILITY_MASK  simply because, to HP PASCAL, the BITNUMth bit is the
BITNUMth  element  of  an  array  of  32  bits. We could  do the same,
incidentally, to an array of 48 bits, 64 bits, etc. On the other hand,
if  we want to retrieve more than  one bit, we can see the limitations
of the PASCAL approach:

   *  There  is  no  simple way, for instance,  to retrieve a variable
     number  of  bits  from  an  integer. Since we  have neither a bit
     extract  nor  a  bit shift operator, we  can't use them; our only
     alternatives  are using division and modulo by powers of 2 (quite
     complicated  and  very inefficient) or using  the PACK and UNPACK
     built-in procedures in a rather esoteric way (equally complicated
     and maybe more inefficient).

   *  We can't even, in general, extract,  say, 2 bits from a variable
     bit  offset!  We can only do this if  we know that the bit offset
     will  be  a  multiple  of 2 -- in that  case, we can use a PACKED
     ARRAY   OF   0..3.  Similar  problems,  of  course,  happen  with
     extracting   3-bit  fields,  4-bit  fields,  etc.  from  variable
     boundaries (although we can, for instance, extract nybbles from a
     packed  decimal  number,  because  they  always start  on a 4-bit
     boundary).

   In  light  of  all this, it should be  obvious that, say, shifts or
bitwise  operations  are  well-nigh  impossible  to do  efficiently or
conveniently  in PASCAL (although in HP PASCAL, bitwise operations can
be craftily and kludgily emulated using sets).

   What  we  see  here is, I believe,  a common PASCAL syndrome. Those
things  that the language supports --  to wit, fixed-offset bit fields
and  arrays of single-bit fields --  it often supports quite well; you
can  use  these  features very readably and  efficiently. On the other
hand, anything that the language designers didn't think to give you --
like shifts and bitwise operators -- you have NO WAY of accessing.

   If  the  compiler  is too dumb to implement  "X*8" as the much more
efficient "X left shifted by 3 bits", too bad; in SPL and C you can do
this  yourself, but in PASCAL you  can't. If the compiler doesn't have
variable-offset  bit  field  support,  SPL  and C let  you do it using
shifts; in PASCAL, it would be very difficult and very inefficient.

   Thus, if you find PASCAL's bit handling mechanisms sufficient -- as
is  quite probable, since the major  features are there -- then you'll
have  no problems. On the other hand, there's a very distinct limit on
what  you can do in PASCAL, and PASCAL doesn't have the flexibility to
let you work around it easily.


                              PASCAL/XL

   Just  a  brief comment about PASCAL/XL  -- in PASCAL/XL, instead of
using  PACKED RECORD and PACKED ARRAY  for bit field support, you must
use  CRUNCHED RECORDs and CRUNCHED ARRAYs.  Remember this when you try
to  write  code  that'll  run both in PASCAL/XL  and normal HP PASCAL.
Remember this and weep.


                     INCREMENT AND DECREMENT IN C

   Another  feature  of C worth mentioning  is the variable increment/
decrement set of operators. This is what allows C programmers to say

   char a[80], b[80];
   char *pa, *pb;
   pa = &a[0];
   pb = &b[0];
   while (*pa != '\0')
     *(pb++) = *(pa++);

This, of course, is either one of the most elegant pieces of code ever
written, or one of the most unreadable. Or both.

   There are four operators like this in C:

   *  ++ PREFIX, i.e. "++X". This increments X by 1 (or sometimes 2 or
     4,  if  it's a pointer -- more  about this later) and returns X's
     new value. In other words, if you say

       int x, y;
       x = 10;
       y = 1 + (++x) + 2;

     then X will be set to 11 and Y will be 14 (1 + 11 + 2).

   *  ++  POSTFIX, i.e. "X++". This increments X  by 1 (or 2 or 4) and
     returns X's OLD value, the one it had before the increment. Thus,

       int x, y;
       x = 10;
       y = 1 + (x++) + 2;

     then  X will be set to 11, but Y will  be 13 (1 + 10 + 2). In the
     calculation of Y, the old value of X (10) was used.

   * -- PREFIX ("--X"), just like ++, but decrements.


   * -- POSTFIX ("X--"), just like --, but decrements.


   Note  that ++ and -- don't  always increment/decrement by 1. If you
pass them a POINTER, the pointer will be incremented or decremented by
the  SIZE  OF THE OBJECT BEING POINTED TO.  Thus, if "int" is a 4-byte
integer,

   int a[10];
   int *ap;
   ap = &a[0];
   ++ap;

will increment AP by 4 bytes (or 2 words, depending on how the pointer
is  represented internally). In other words, "++" and "--" of pointers
actually increment or decrement by one element; if AP used to point to
element 0 of A, now it points to element 1.
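
   A  tiny  sketch, just to drive the point home (assuming "int" is 4
bytes on this particular machine):

   int a[10];
   int *ap;

   ap = &a[0];
   ++ap;            /* ap now points to a[1] -- 4 bytes along  */
   ap = ap + 3;     /* ordinary pointer arithmetic scales the
                       same way: ap now points to a[4]         */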

   Now,  one  reason why I mention these  operators is that they are a
non-trivial  difference  between  C  and  PASCAL,  and  I have  to say
something about them just so you'll think I'm thorough.

   Another  reason, though, is that  beyond the seemingly simple (and,
in case of the postfix operators, counterintuitive) definition lurks a
fairly  powerful construct that can be  quite useful in many cases. On
the other hand, some say -- and not without reason -- that using these
kinds  of  operators  makes  code  much  more  difficult  to  read and
understand.

   One  of  the original reasons that  these operators were introduced
was  that  some  of  the  computers  that  C was  first implemented on
supported these operations in hardware. Modern computers, like the VAX
and  the Motorola 68000, for instance, have special "addressing modes"
on  each  instruction  that  allow  you  to  store something  into the
location  pointed  to  by  a register and  then increment the register
(like  postfix  ++)  or  decrement  the register and  then store (like
prefix --).

   A reasonable compiler, though, can know enough to translate, say,

   X:=X-1;

into  "decrement  X";  even the 15-year-old SPL  compiler can do this.
Today's reason for the increment/decrement operators -- besides saving
poor programmers' weary fingers -- is that in many cases they can very
directly represent what you're trying to do.

   A  classic case, for instance, is stack processing. Say you want to
implement  your  own stack data structure.  The primary operations you
need  are  to PUSH a value onto the stack  and to POP a value. In SPL,
you  might have a pointer PTR that  points to the top cell, and define
two procedures,

   PROCEDURE STACK'PUSH (V);
   VALUE V;
   INTEGER V;
   BEGIN
   @PTR:=@PTR+1;
   PTR:=V;
   END;

   INTEGER PROCEDURE STACK'POP;
   BEGIN
   STACK'POP:=PTR;
   @PTR:=@PTR-1;
   END;

In C, you can have PTR point to one cell AFTER the top cell, and say:

   *(ptr++) = v;          /* to push V onto the stack */
   v = *(--ptr);          /* to pop a value from the stack */

   For  stacks (and queues and  other data structures), post-increment
and   pre-decrement   are   EXACTLY   what  you  need.  Of  course,  a
full-function stack package would have to have many more features, but
many  of  them  can  profitably use  post-increment/ pre-decrement and
other nice C features.
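
   To make that concrete, here is a minimal sketch of such a stack in C
-- the names are mine, and I've added a crude overflow check, so take it
as an illustration of the idiom rather than as a finished package:

   #define STACKSIZE 100

   int stack[STACKSIZE];
   int *ptr = &stack[0];            /* points to the first FREE cell */

   int push(int v)
   {
      if (ptr >= &stack[STACKSIZE])
         return 0;                  /* stack full -- push refused      */
      *(ptr++) = v;                 /* store, then advance             */
      return 1;
   }

   int pop(void)
   {
      return *(--ptr);              /* back up, then fetch the old top */
   }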

   Other  applications  are,  for  instance, the case  I showed as an
example earlier:

   char a[80], b[80];
   char *pa, *pb;
   pa = &a[0];
   pb = &b[0];
   while (*pa != '\0')
     *(pb++) = *(pa++);

What  this actually does (isn't it obvious?) is copy the string stored
in  the  array  A  to  the  array  B.  PA is a  pointer to the current
character  in A; PB is a pointer  to the current character in B. Since
all C strings are terminated by a null character ('\0'), the loop goes
through  A,  incrementing  the pointers and  copying characters at the
same time!

   Similarly, you can say...

   while (*(pa++) == ' ');

which will step PA past any leading blanks (strictly speaking, it leaves
PA pointing one character PAST the first non-blank); or,

   while (*(pa++) == *(pb++));

which  will increment PA and PB while the characters they point to are
equal -- very useful for a string comparison routine.
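
   To show the kind of routine I mean -- this is a hedged sketch, with a
name and calling convention of my own choosing -- here is an equality
test built on the same post-increment idiom; note that a real routine
also has to watch for the END of the strings, not just for the first
mismatch:

   int str_equal(char *pa, char *pb)     /* 1 if equal, 0 if not */
   {
      while (*pa != '\0')
         if (*(pa++) != *(pb++))
            return 0;                    /* mismatch before A ended      */
      return *pb == '\0';                /* A ended; equal only if B did */
   }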


   In  a way, these features of C are rather like FOR loops that never
execute  when the starting value is  greater than the ending value. In
PASCAL, for instance, you can say

   FOR INDEX:=CURRCOLUMN+1 TO ENDCOLUMN DO
     ...

and know that if CURRCOLUMN = ENDCOLUMN, the loop won't be executed at
all  (which happens to be exactly  what you want). In classic FORTRAN,
though,

   DO 10 INDEX=CURRCOLUMN+1,ENDCOLUMN
     ...

will  always  execute  the  loop at least once,  even if CURRCOLUMN is
equal to ENDCOLUMN; if you don't want this, you have to have an IF ...
GOTO    around    the    loop.    The   point   here   is   that   the
"post/pre-increment/decrement"  features are one  of those things that
"just  happen to come in handy" in  a surprising number of cases. Just
by  looking  at  them, you wouldn't think  that they're so useful, but
there are a lot of applications where they are just the thing.

   Fine,  you've  heard  the  "pro". ++ and -- let  you write a lot of
elegant  one-liners for handling stacks, strings, queues and the like.
Now, the con:

   while (*pa != '\0')
     {
     *pb = *pa;
     pb = pb + 1;
     pa = pa + 1;
     }

   while (*pa == ' ')
     pa = pa + 1;

   while (*pa == *pb)
     {
     pa = pa + 1;
     pb = pb + 1;
     }

What  are these? Well, these are the  C loops that do exactly what the
above   post/pre-increment   examples   do,   but  using  conventional
operators.  Are  they more or less readable  than the ++ mechanisms we
saw?  Let  us for the moment ignore  performance; even if the compiler
doesn't  optimize  all  these  cases (and, to  be fair, many compilers
won't),  performance  isn't  everything. DO ++  AND -- CONSTRUCTS MAKE
YOUR CODE MORE OR LESS READABLE?

   Now,  I don't have any opinions on this matter; I just tell you the
two  sides of the issue and let  you decide. I am completely objective
(if  you believe that, I've got some waterfront property in Kansas you
could have real cheap...). Readability isn't a black-and-white sort of
thing;  everybody has his own standards.  What do you think? Are these
"two-in-one" programming constructs elegant or ugly?


                              ?: AND (,)

   Let's say that you want to call the READX intrinsic, reading either
LEN  words or 128 words, whichever is  less. You know that your buffer
only  has room for 128 words, and you don't want to overflow it if LEN
is  too large; normally, though, if LEN<128, you want to read only LEN
words.

   In PASCAL, you'd have to write this:

   IF LEN<128 THEN
     ACTUAL_LEN := READX (BUFFER, LEN)
   ELSE
     ACTUAL_LEN := READX (BUFFER, 128);

In C, however, you can instead say:

   actual_len = readx (buffer, (len<128)?len:128);

(Think  of  all the keystrokes you save!)  What this actually means is
that the second parameter to READX is the expression

   (LEN<128) ? LEN : 128

This is a "ternary" operator -- an operator with three parameters:

   *  the  first  parameter (before the "?")  is a boolean expression,
     called the "test" -- in this case "LEN<128";

   *  the second parameter (between the  "?" and ":") is an expression
     called the "then clause", in this case "LEN";

   * the third parameter (after the ":") is the "else clause", in this
     case "128".

The  behavior is quite simple -- if  the "test" is TRUE, this operator
returns  the  value  of  the "then clause"; if  the test is FALSE, the
operator returns the value of the "else" clause. It's just like an
IF/THEN/ELSE statement, except that it yields a value within an
expression instead of just executing some statements.

   The  advantage should be clear. There are many cases where you need
to  do one of two things, almost  exactly identical except for one key
parameter.  If the two tasks need a different statement in one case or
another  (e.g.  a  call  to  READ  instead  of  READX),  you'd  use an
IF/THEN/ELSE;  if  they  need  a  different expression  as a parameter
inside a statement, you'd use a ?: construct.

   The  trouble  is, again, one of  readability. Consider this example
(taken as an example of the "right way" of using ?: from "C: A
Reference Manual" by Harbison & Steele):

   return (x > 0) ? 1 : (x < 0) ? -1 : 0;

OK,  quick,  what  does this do? Why, it  determines the "signum" of a
number,  of course! +1 if the number is positive, -1 if it's negative,
0 if it's zero. Which is more readable -- the above or

   if (x > 0) return 1;
   else if (x < 0) return -1;
   else return 0;

Again, up to you to decide -- some would say that ?: is better, others
would  side with the IF/THEN/ELSE. On the  other hand, in this case, I
think there is a substantive thing to be said against "? :":

   *  ?:  IS  MORE THAN JUST AN OPERATOR;  IT'S A CONTROL STRUCTURE IN
     THAT  IT INFLUENCES THE FLOW OF  THE PROGRAM. ESPECIALLY WHEN ?:S
     ARE  NESTED (e.g. if you're testing  two conditions and do one of
     four  things  based  on  the result), THE  FACT THAT THIS CONTROL
     STRUCTURE  IS  DELIMITED BY TWO  SPECIAL CHARACTERS (rather than,
     say, IF, THEN, or ELSE) CAN MAKE THE PROGRAM DIFFICULT TO READ.

In  other  words,  since "?" and ":"  are just two special characters,
like  many of the other special characters that occur in C statements,
you  can often have a hard time  finding out where the test starts and
where  it ends, where one THEN or ELSE clause ends, and so on. This is
especially the case when you write code like

   a = (x>0) ? ((y>0)?(x*y):(-x*y)) : ((y>0)?(-x+y):(error_trap()));

In  which  ?:s are nested within each  other. Of course, you might say
that this code is badly written; perhaps it should be:

   a = (x>0)
         ? ((y>0) ? (x*y) : (-x*y))
         : ((y>0) ? (-x+y) : (error_trap()));

But then, why not just write it as

   if (x>0 && y>0)  a = x*y;
   else if (x>0)  a = -x*y;
   else if (y>0)  a = -x+y;
   else  a = error_trap();

In  any case, this is mostly a  matter of personal preferences. I'm in
favor  of  using  ?:  in  #define's (where it is  necessary -- see the
chapter on them, and on "(,)" below), such as

   #define min(a,b) (((a)<(b)) ? (a) : (b))
   #define abs(a)   (((a)<0) ? -(a) : (a))

On  the  other  hand, I try to avoid ?:s  in normal code in almost all
cases. I prefer IF/THEN/ELSE statements instead.
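
   Incidentally, all those parentheses in the #define's above aren't
decoration. Here's a hedged illustration (the "bad_min" name is mine) of
what can quietly go wrong without them:

   #define bad_min(a,b)  a < b ? a : b
   #define min(a,b)      (((a) < (b)) ? (a) : (b))

   /* "2 * bad_min(x, y)" expands to "2 * x < y ? x : y", which C parses
      as "(2 * x < y) ? x : y" -- a comparison, not twice the minimum.
      "2 * min(x, y)" keeps its parentheses and does what you expect.   */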


                                 (,)

   Just like ?: is equivalent to an IF/THEN/ELSE,

   (x,y,z)

is essentially equivalent to

   {            /* begin */
   x;
   y;
   z;
   }            /* end */

The  difference, of course, is that  "(x,y,z)" returns the value of Z.
For instance, say that we're looping through a file. We want to read a
bunch of records until we get an EOF (a condition which, presumably, we
can detect by calling the "get_ccode" procedure). We might write:

   while ((len=fread(fnum,data,128), get_ccode()==2))
     ...

Instead  of a simple expression, our  loop test consists of two parts,
which  are both evaluated (in the given order!) to determine the value
--  every time we do the loop test, we first call FREAD and then check
the result of GET_CCODE. This, of course, is identical to

   len = fread(fnum,data,128);
   while (get_ccode()==2)
     {
     ...
     len = fread(fnum,data,128);
     }

but using the "(x,y)" construct, we don't have to write the FREAD call
twice.

   A  more common use of this is in  FOR loops. The three parts of the
FOR  loop  --  the  loop counter initialization,  the loop termination
test,   and   the  loop  counter  increment  --  must  all  be  single
expressions.  Using  the "," operator, you  can fit several operations
into one expression, to wit:

   for (pa=&a[0], pb=&b[0]; *pa!='\0'; pa++, pb++)
     *pb=*pa;

This, of course, copies the string A (without the trailing zero) to
the  string  B. The "," operator isn't  used here for its value (which
would  be  the value of "&b[0]" in  the initialization portion and the
new  value  of  "pb"  in  the  increment portion); it's  just used for
combining several expressions in a context where only one is allowed.

   The  major  power  of both the ?: operator  and the "," operator is
manifested   in   #define's.   Statements   can  only  appear  at  the
"top-level", separated by semicolons; expressions, however, can appear
either inside statements or in place of statements. Thus,

   #define min(a,b) if (a<b) a; else b;

won't work, because it will make

   x=min(y,z)

expand into

   x=if (y<z) y;
   else z;

which is quite illegal. On the other hand,

   #define min(a,b) (((a)<(b)) ? (a) : (b))

will translate

   x=min(y,z)

into

   x=(((y)<(z)) ? (y) : (z));

which will do the right thing.

   Similarly, say you have a record structure

   typedef struct {float re; float im;} complex;

Then, you can have a #define

   #define cbuild(z,rpart,ipart) (z.re=(rpart), z.im=(ipart), z)

When  you  say  "CBUILD(Z,1.0,3.0)",  this  sets  the variable  Z's RE
subfield  to 1.0, its IM subfield to 3.0, and returns the value Z. You
can use this in cases like

   c = csqrt (cbuild(z,1.0,3.0));

If you didn't have the "," operator, you couldn't write a #define like
this. You could say:

   #define cbuild(z,rpart,ipart) {z.re=rpart; z.im=ipart;}

but  then  it wouldn't be usable in  an expression because C statement
blocks can't be parts of expressions.

   Again,  my personal attitude towards the "," operator is similar to
my  opinion about ?:. It is necessary for #DEFINEs but best avoided in
normal  code, the one exception being  FOR loops, where it's used more
as a separator than to return a result. That's one reporter's opinion.


              THE "COMPOUND ASSIGNMENTS" (+=, -=, ETC.);
    ALSO, SOME MORE GENERAL COMMENTS ON EFFICIENCY AND READABILITY

   Finally,  C  has one other set  of interesting operators. These are
the  "compound  assignments",  which  perform  an operation  and do an
assignment at the same time. A possible example might be:

   x[index(7)+3] += inc;

which is, of course, identical to

   x[index(7)+3] = x[index(7)+3] + inc;

Similarly,

   (*foo).frob |= 0x100;

means the same thing as:

   (*foo).frob = (*foo).frob | 0x100;

(and happens to set bit 8 -- the 0x100 bit -- of "(*foo).frob").
[Note:  The above examples aren't actually EXACTLY the same because of
considerations  pertaining  to  "double evaluation"  of the expression
being  assigned  to; however, this isn't  usually very relevant, and I
won't discuss it here.]
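
   For the curious, here is a hedged two-line sketch of what that
"double evaluation" business refers to, using the hypothetical "index"
call from the example above:

   x[index(7)+3] = x[index(7)+3] + inc;   /* calls index() twice */
   x[index(7)+3] += inc;                  /* calls index() once  */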

   Now if I wanted to, I could stop here. Obviously "a op= b" means the
same as "a = a op b" (where "op" is pretty much any dyadic operator) --
now you know it and can make up your own minds about it.

   But,  what  the  hell  -- I'm a naturally  garrulous kind of guy. I
could   run   on   for   pages   about   these  operators,  and  their
psychoscientific  motivations! In fact, I think  I might do just that,
because  I think that there's something of deeper significance to them
than just a few saved keystrokes.

   To  put it simply, there are several "statements" that the presence
of  these operators -- "+=", "-=",  "++", "--", etc. -- makes. Whether
you  take  the  "pro"  side  or the "con" on  them will influence your
opinion on the utility of these operators:

   *  EFFICIENCY.  Saying  "X  +=  Y"  or "X++" will  let the compiler
     generate more efficient code than just "X = X+Y" or "X = X+1".

     -  PRO:  The  compiler  "knows"  that  we're just  incrementing a
       variable  (rather than doing an arbitrary add) and can generate
       the  more  efficient instructions that  most computers have for
       this special case.

     - CON: All -- or almost all -- modern compilers can easily deduce
       this  information  even  from  a  "X = X+Y" or  "X = X+1". Even
       SPL/3000, which is 15 years old, will generate an "INCREMENT BY
       ONE" instruction if you say "X := X+1".

     -  MORE  PRO  (COUNTER-CON?): It's true  that most compilers will
       automatically  generate fast code for increments, bit extracts,
       etc.  However, every compiler will  have SOME flaw somewhere --
       perhaps  it  won't  recognize  one  particular  case  and  will
       generate  inefficient code. Special operators that the compiler
       ALWAYS  translates efficiently can allow you to write efficient
       code even if you're stuck with a silly compiler implementation.

   * READABILITY. Saying "X += Y" is more readable than "X = X+Y".

     - PRO: Consider one of the examples above:

         x[index(7)+3] = x[index(7)+3] + inc;

       Here,  we're  incrementing "x[index(7)+3]" --  but how does the
       person  reading  the  program know that? He  has to look at the
       fairly complex expressions on both sides of the assignment, and
       make  sure that they're identical! Similarly, when he's writing
       the  program, it's quite easy for him  to make a mistake -- say
       "x[index(3)+7]"  instead of "x[index(7)+3]" on  one side of the
       assignment,  and probably never see  it because he "knows" that
       it's just a simple increment. Saying

         x[index(7)+3] += inc;

       is  actually  MORE readable, since you  don't have to duplicate
       any code and thus introduce additional opportunity for error.

     -  CON:  "x[index(7)+3] += inc". Can you  read that? I can't read
       that! The more special characters and operators a language has,
       the  harder  it  is  to  read. Everybody's USED  to simple ":="
       assignments, present in ALGOL, FORTRAN, PASCAL, SPL, ADA, etc.;
       when  we  introduce  a whole new bevy  of operators, people are
       likely  to  misunderstand them, or at  least have to take extra
       time and effort while reading the program.

   *  FLEXIBILITY. OK, so you don't  like these operators -- don't use
     them!

     -  PRO:  Hey,  this  is  a free country! Look  at the entire rich
       operator  set  and  use  only those that  you find pleasant; at
       least in C, you have the choice.

     -  CON:  I  might  not  be  forced  to WRITE  programs with these
       operators  in them, but I may well  be forced to READ them; 70%
       of  a  program's lifetime is spent  in maintenance, and I don't
       want my programmers to write in a language that ENCOURAGES them
       to  write unreadably! A language  should be restrictive as well
       as  flexible  --  it  should  prevent wayward  programmers from
       writing unreadable constructs like:

         x += (x++) + f(x,y) + (y++);  /* can you understand this? */

     - PRO again: Authoritarian fascist!

     - CON again: Undisciplined hippie!

   OK,  break  it  up,  boys.  I  think  that  the  above  issues  are
particularly   involved   in   evaluating   C's   rich   (but  perhaps
"undisciplined")  operator  set, and, to  some extent, the differences
between  PASCAL  and  C in general. I won't  pretend to tell you which
attitude is correct -- I don't know myself. I just want to lay more of
the cards out on the table.


PASSING VARIABLE NUMBERS OF PARAMETERS TO PROCEDURES -- SPL

   One feature that SPL has is so-called "OPTION VARIABLE" procedures.
This is a procedure that looks like this:

   PROCEDURE V (A, B, C);
   VALUE A;
   INTEGER A;
   BYTE ARRAY B, C;
   OPTION VARIABLE;
   BEGIN
   ...
   INTEGER VAR'MASK = Q-4;
   ...
   IF VAR'MASK.(14:1)=1 THEN    << was parameter B passed? >>
     ...
   END;

What does this mean? This means that when we say:

   V (1);

or

   V (,,BUFF);

or even simply

   V;

the  SPL  compiler will NOT complain that  we didn't specify the three
parameters  that  V  expects.  Rather,  it will  pass those parameters
you've  specified,  pass  GARBAGE  in  place of  the parameters you've
omitted,  and will set the "Q-4" location  in your stack to a bit mask
indicating exactly which parameters were specified and which were not.
As you see, we've declared the variable VAR'MASK to reside at this Q-4
location, and can now say

   IF VAR'MASK.(x:1)=1 THEN

to  check  whether or not the parameter  indicated by "x" was actually
specified. "x" has to be the bit number associated with the parameter,
counting  from 15 (the last parameter) down.  Thus, to check if C (the
last parameter) was specified, we'd say

   IF VAR'MASK.(15:1)=1 THEN

To check for B (the second-to-last), we'd say

   IF VAR'MASK.(14:1)=1 THEN

To check for A (the third-to-last), we'd say

   IF VAR'MASK.(13:1)=1 THEN

   Note the twin advantages of being able to omit parameters:

   * It can make the procedure call a lot easier to write or read; the
     FOPEN  intrinsic has 13 parameters, all of them necessary for one
     thing or another. Do you want to have to say:

        MOVE DISC'DEV:="DISC ";
        FNUM:=FOPEN (FILENAME, 1, %420, 128, DISC'DEV, DUMMY,
                     0, 2, 1, 1023D, 8, 1, 0);

     or wouldn't you rather just type

        FNUM:=FOPEN (FILENAME, 1, %420);

     and  let  all  the other parameters  automatically default to the
     right values? It's easier to write AND gives less opportunity for
     error  (did  you  notice  that I  accidentally specified blocking
     factor  2 and 1 buffer instead of the default, which is the other
     way around?).

   *  Furthermore, the very act of  omitting or specifying a parameter
     carries  INFORMATION.  For instance, FOPENing  a file with DEV=LP
     and  the forms message parameter  OMITTED is quite different than
     passing  any forms message. The very  fact that the forms message
     wasn't  specified  tells  the  file system  something. Similarly,
     omitting  the  blocking factor in an  FOPEN makes the file system
     calculate  an  "optimal" (actually it  isn't) blocking factor for
     the file.

   Many  examples  --  FOPEN,  FGETINFO, FCHECK, etc.  -- can be given
where  not  having OPTION VARIABLE would  make calling the procedure a
substantial burden.

   While  we talk about the  advantages of OPTION VARIABLE procedures,
let's  note also some of the problems with the way they're implemented
in SPL:

   *  If you declare a procedure to  be OPTION VARIABLE, then SPL will
     let  its caller omit ANY  parameter. Usually, some parameters are
     optional, while others (e.g. the file number in an FGETINFO call)
     are  required,  and  you'd  like  the  compiler  to  enforce this
     requirement.

     Otherwise,  you'd  either have to rely on  the user (always a bad
     idea),  or check the presence of  each of the required parameters
     yourself  at  run-time  (possible  but  somewhat  cumbersome  and
     inefficient).

   *  As  you  saw,  checking to see whether  a parameter was actually
     passed is not an easy job. Instead of saying

        IF HAVE(FILENAME) THEN

     you have to say

        INTEGER VAR'MASK = Q-4;
        ...
        IF VAR'MASK.(3:1)=1 THEN

     knowing  (as of course you do)  that FILENAME is the 13th-to-last
     procedure parameter and is thus indicated by VAR'MASK.(3:1).

   *  Often,  a user's omission of a  parameter simply means that some
     default  value should be assumed. Why  not have the compiler take
     care  of this case for you instead  of making you do it yourself?
     For instance, if you were writing FOPEN, wouldn't you rather say:

        INTEGER PROCEDURE FOPEN (FILE, FOPT, AOPT, RECSZ, DEV, ...);
        VALUE FOPT, AOPT, RECSZ, ...;
        BYTE ARRAY FILE (DEFAULT ""), DEV (DEFAULT "DISC ");
        INTEGER FOPT (DEFAULT 0), AOPT (DEFAULT 0),
                RECSZ (DEFAULT 128);
        ...
        OPTION VARIABLE;

     instead of

        INTEGER PROCEDURE FOPEN (FILE, FOPT, AOPT, RECSZ, DEV, ...);
        VALUE FOPT, AOPT, RECSZ, ...;
        BYTE ARRAY FILE (DEFAULT ""), DEV (DEFAULT "DISC ");
        INTEGER FOPT (DEFAULT 0), AOPT (DEFAULT 0),
                RECSZ (DEFAULT 128);
        ...
        OPTION VARIABLE;
        BEGIN
        INTEGER VAR'MASK=Q-4;
        ...
        IF VAR'MASK.(3:1)=0 THEN @FILE:=@DEFAULT'FILE;
        IF VAR'MASK.(4:1)=0 THEN FOPT:=0;
        IF VAR'MASK.(5:1)=0 THEN AOPT:=0;
        IF VAR'MASK.(6:1)=0 THEN RECSZ:=128;
        IF VAR'MASK.(7:1)=0 THEN @DEV:=@DISC'DEVICE;
        ...
        END;

     Not only is that easier on the author of FOPEN, but it could
     also  be more efficient at run-time  -- instead of having a whole
     bunch  of run-time bit extracts and checks, the code generated by
     a call such as:

        FNUM:=FOPEN (TMPFILE);

     might  actually have all the default  values built in to it (just
     as   if  the  user  had  explicitly  specified  them),  saving  a
     non-trivial amount of time.

   *  Finally, another interesting concern. Say that I want to write a
     procedure  that's "plug-compatible" with  the FOPEN intrinsic. In
     my  MPEX/3000, for instance, I  have a SUPER'FOPEN procedure that
     checks a global "debugging" flag, prints all of its parameters if
     the  flag  is true, and then  calls FOPEN. SUPER'FOPEN also calls
     the  ZSIZE  intrinsic  to make sure that  FOPEN has as much stack
     space  as  possible  to  work  with;  it  might  also  detect and
     specially handle certain error conditions, and so on.

     In  other  words,  what  I  want  to  have is  an OPTION VARIABLE
     procedure  that  does  some  things  and  then passes  all of its
     parameters to another OPTION VARIABLE procedure:

        INTEGER PROCEDURE SUPER'FOPEN (FILE, FOPT, AOPT, RECSZ, ...);
        ...
        OPTION VARIABLE;
        BEGIN
        ...
        SUPER'FOPEN := FOPEN (FILE, FOPT, AOPT, RECSZ, ...);
        ...
        END;

     The  trouble here is that in my FOPEN call I want to OMIT ALL THE
     PARAMETERS  THAT WERE OMITTED IN THE SUPER'FOPEN CALL and SPECIFY
     ONLY  THOSE  PARAMETERS  THAT  WERE SPECIFIED  IN THE SUPER'FOPEN
     CALL.  In other words, in this case I DON'T KNOW WHICH PARAMETERS
     I WANT TO OMIT UNTIL RUN-TIME. If I just say:

        SUPER'FOPEN := FOPEN (FILE, FOPT, AOPT, RECSZ, ...);

     passing  all  thirteen parameters, FOPEN  will think that they're
     all  the ones I want, whereas many of them are garbage. I want to
     say

        INTEGER VAR'MASK = Q-4;
        ...
        SUPER'FOPEN := VARCALL FOPEN, VAR'MASK (FILE, FOPT,
                                                AOPT, RECSZ, ...);

     somehow telling the compiler: "this isn't an ordinary call, where
     you  should  figure out which parameters  are specified and which
     aren't;  rather,  pass to FOPEN the  very same VAR'MASK parameter
     that I myself was given".


       PASSING VARIABLE NUMBERS OF PARAMETERS TO PROCEDURES --
      STANDARD PASCAL, PASCAL/3000, ISO LEVEL 1 STANDARD PASCAL

   Standard  PASCAL,  PASCAL/3000, and ISO Level  1 Standard PASCAL do
not  allow  you to pass variable  numbers of parameters to procedures.
Enough said?

   Well,  maybe  not.  As  I've  mentioned before, the  mere fact that
language  X  has a feature that language  Y does not doesn't mean that
language  X  is  better than language Y.  This isn't a basketball game
where  you  get  2 points for each feature,  and 3 for each one that's
really  far  out.  Maybe  PASCAL  has  a  point -- do  you really need
procedures with variable numbers of parameters?

   Well, the first thing you notice about, say, the CREATE, FOPEN, and
FGETINFO  intrinsics  --  conspicuous  users  of  the  OPTION VARIABLE
features -- is that they aren't very extensible.

   Sure,  FGETINFO has 20 parameters, and you can specify any and omit
any  (except  the  file  number); but what if  a new file parameter is
introduced?  Since there are all these  thousands of programs that use
the old FGETINFO, we can't just add a 21st parameter, since that would
make them all incompatible.

   This,  in  fact,  is  why  the  FFILEINFO intrinsic  was created --
FFILEINFO  takes  a  file number and five  pairs of "item numbers" and
"item  buffers".  Each item number is a  code indicating what piece of
information  ought  to  be  returned  about  a file. Thus,  up to five
different  pieces of information can be returned by a single FFILEINFO
call.  If  you  need more than five (which  is unlikely), you can call
FFILEINFO twice or however many times is necessary. A typical call can
thus look like:

   FFILEINFO (FNUM, 8 << item number for "filecode" >>, CODE,
                    18 << item number for "creator id" >>, CREATOR);

instead of the FGETINFO call, which would be:

   FGETINFO (FNUM,,,,,,,,CODE,,,,,,,,,,CREATOR);

Note another advantage of the FFILEINFO approach -- you no longer have
to  "count  commas"  to make sure that your  parameter is in the right
place; the item number (which you've presumably declared as a symbolic
constant) indicates what the item you want to get is.

   So, instead of the 20-parameter OPTION VARIABLE FGETINFO intrinsic,
we  have FFILEINFO. But FFILEINFO  is still OPTION VARIABLE! Remember,
FFILEINFO  takes up to five item number/item value pairs; in this case
we entered only two. Of course, we could have said:

   FFILEINFO (FNUM, 8 << item number for "filecode" >>, CODE,
                    18 << item number for "creator id" >>, CREATOR,
                    0, DUMMY, 0, DUMMY, 0, DUMMY);

but  who'd  want  to? Similarly, FFILEINFO might  have been defined to
return  only one piece of data at a time (and thus always have exactly
three  parameters),  but  again that's not  very good. Every FFILEINFO
call  has  some  fixed overhead to it  (for instance, finding the File
Control  Block  from  the file number FNUM);  why repeat it more often
than you have to?

   Another   example   arises  in  the  CREATEPROCESS  intrinsic.  The
CREATEPROCESS  intrinsic  was  introduced when some  new parameters --
;STDLIST,  ;STDIN,  and  ;INFO  --  had  to  be  added  to  the CREATE
intrinsic.

   The  CREATE  intrinsic,  although  OPTION  VARIABLE,  was initially
defined  to  have 10 parameters. This  means that any compiled program
that  uses  the CREATE intrinsic expects it  to have 10 parameters; if
you  added three parameters to the  CREATE intrinsic in the system SL,
all the old programs would stop working.

   An additional wrinkle is that CREATEPROCESS isn't just a "get me
some information" intrinsic like FFILEINFO -- it actually
starts  a  new process. We can't  just say "pass five process-creation
parameters  at  a time; if you need to  pass more, just call it twice"
(like  we  did for FFILEINFO). All the  parameters need to be known to
the CREATEPROCESS intrinsic at once.

   The  CREATEPROCESS  intrinsic,  although  OPTION  VARIABLE, doesn't
really need to be. You can just view it as a five-parameter procedure:

   CREATEPROCESS (error, pin, program, itemnumbers, items);

The   itemnumbers   array   contains  the  item  numbers  of  all  the
process-creation  parameters; the items  array contains the parameters
themselves  (either  the  values  or  the addresses). Thus,  to do the
equivalent of an old

   CREATE (PROGRAM, ENTRY'NAME, PIN, PARM, 1, , , MAXDATA);

(which  would  create  a  process with entry  ENTRY'NAME, ;PARM= PARM,
;MAXDATA= MAXDATA, and "load flags" 1), we'd say

   INTEGER ARRAY ITEM'NUMS(0:4);
   INTEGER ARRAY ITEMS(0:4);
   ...
   << Item 1 = entry name, 2 = parm, 3 = load flags, 6 = maxdata; >>
   << 0 terminates the list. >>
   << We probably want to have EQUATEs for these "magic numbers". >>
   MOVE ITEM'NUMS:=(1, 2, 3, 6, 0);
   ITEMS(0):=@ENTRY'NAME;   << the address of the entry name >>
   ITEMS(1):=PARM;          << ;PARM= >>
   ITEMS(2):=1;             << load flags >>
   ITEMS(3):=MAXDATA;       << ;MAXDATA= >>
   CREATEPROCESS (ERR, PIN, PROGRAM, ITEM'NUMS, ITEMS);

As  you see, the non-OPTION VARIABLE  approach may be more extensible,
but it certainly isn't easier to write or read.

   Finally,  let me point out  that OPTION VARIABLE procedures, though
not easily extensible when you have COMPILED CODE that calls them, are
quite easy to extend when you have SOURCE CODE.

   If  you have your own OPTION VARIABLE procedure MYPROC, then adding
a  new  parameter to it is a piece  of cake; in fact, it's easier than
adding  a new parameter to a  non-OPTION VARIABLE procedure (for which
you'd  have to change all the calls to pass an extra dummy parameter).
All  you  need  to  do  to  extend an OPTION  VARIABLE procedure is to
recompile both it and all its callers, so that the newly-compiled code
will appropriately reflect the new parameters of the called procedure.


       PASSING VARIABLE NUMBERS OF PARAMETERS TO PROCEDURES --
                        KERNIGHAN & RITCHIE C

   One  thing  you  may  have noticed about C is  that two of the most
important  functions  in  C  --  "printf",  which  outputs  data  in a
formatted  manner,  and  "scanf",  which inputs data  -- have variable
numbers of parameters. An example of a call to "printf" might be:

   printf ("Max = %d, min = %d, avg = %d\n", max, min, average);

This  call happens to take 4 parameters -- the format string (in which
the  "%d"s indicate where the rest of the parameters are to be put in)
and  three  integers  (max, min, and average).  Other calls might take
only  one  parameter  (a constant formatted  string, e.g. 'printf ("Hi
there!\n")') or two or twenty.

   Now,  PASCAL's input/output "procedures"  (READ, READLN, WRITE, and
WRITELN)  also  take  a variable number  of parameters. Unfortunately,
PASCAL  isn't being quite honest when it just calls them "procedures";
they  can get away with things that ordinary procedures can't, such as
taking  parameters  of  varying  types,  taking  a variable  number of
parameters,  and  even  having  special parameters  prefixed with ":"s
(e.g. "WRITELN(X:10, Y:7:4)").

   C,   however,  is  serious  when  it  calls  "printf"  and  "scanf"
procedures.  Their  source is kept somewhere  in some C library source
files; if you don't like them, you can rewrite them yourself, or write
your own procedures that take variable numbers of parameters.

   Let's  say that we want to do just  that. Let's say that on our way
to  work,  we fall down and knock ourselves  on the head. When we wake
up,  we  find that we've inexplicably fallen  in love with FORTRAN and
want  to  make  our "printf" format strings  look exactly like FORTRAN
FORMAT statements. (For a slightly more plausible example, say that we
want  to add some new directives, such  as "%m" for outputting data in
monetary  format,  with  ","s  between  each  thousand  --  standard C
"printf" doesn't allow this.)

   Well, what we really want to do is write a "writef" procedure:

   writef (fmtstring, ???)
   char fmtstring[];
   ???;
   {
   ???
   }

Now  the good news is that -- unlike  PASCAL -- C won't get upset when
we call this procedure as:

   writef ("I5,X,F7.2", inum, fnum);

on one line, and as

   writef ("I4,X,S,X,I3", i1, s, i2);

on  the next; C never checks the number or types of parameters anyway.
(Note  that C allows us to omit parameters at the END of the parameter
list;  unlike SPL, it doesn't let us  omit them from the MIDDLE of the
parameter  list.)  The  question  is  -- how do  we write the "writef"
procedure?  The  caller might be able to  specify a variable number of
parameters,  but  how  will  "writef"  itself be able  to access these
parameters?

   This  is  where  the  trouble with having  "OPTION VARIABLE"-type C
routines  comes  in. There's no  universal, system-independent way for
"writef"  and any such procedure to  find out how many parameters were
actually passed, or access those parameters that were passed.

   Different  compilers  have  different conventions for  this sort of
thing.  Many  compilers  assure  you  that if the  user passes, say, 3
parameters to a procedure that expects 10 parameters, then the first 3
procedure  parameters will have the right values -- it's just that the
remaining  7  will  be  set  to garbage. In this  case, we could write
"writef" as:

   writef (fmtstring, p1, p2, p3, p4, p5, p6, p7, p8, p9, p10)
   char fmtstring[];
   int p1, p2, p3, p4, p5, p6, p7, p8, p9, p10;

Then if we call "writef" using:

   writef ("I5,X,F7.2", inum, fnum);

the  procedure can look at the format string -- which it knows will be
passed  as "fmtstring" -- determine the  number of parameters that the
format string expects (in this case, 2, one for the I5 and one for the
F7.2), and then look only at "p1" and "p2", not at "p3" through "p10",
which are known to be garbage.
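
   To make that concrete, here is a hedged and deliberately NON-PORTABLE
sketch of such a "writef" -- the names, the four-parameter limit, and
the simple-minded format scanning (every "I" consumes one integer) are
mine, and the whole thing leans on the compiler convention just
described:

   #include <stdio.h>

   writef (fmtstring, p1, p2, p3, p4)
   char fmtstring[];
   int p1, p2, p3, p4;
   {
   int parms[4], nparms, i;
   parms[0] = p1;  parms[1] = p2;  parms[2] = p3;  parms[3] = p4;
   nparms = 0;
   for (i = 0; fmtstring[i] != '\0'; i++)
      if (fmtstring[i] == 'I')
         nparms = nparms + 1;          /* one integer per "I" descriptor */
   for (i = 0; i < nparms && i < 4; i++)
      printf ("%d ", parms[i]);        /* only the first nparms are real */
   printf ("\n");
   }

A call like 'writef ("I3,X,I5", i1, i2)' would then print the values of
I1 and I2 and never look at the garbage in P3 and P4.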

   Some  other  compiler  might  always assure you  that the number of
parameters passed to a procedure would be kept in some register, which
could  then be accessed using an  assembly routine. On the other hand,
it  might  say  that  if  10 parameters were expected  but only 3 were
passed,  the actually passed parameters would be accessible as P8, P9,
and P10 instead of P1, P2, and P3. Then, WRITEF would have to call the
assembly  routine  to  determine  the number of  passed parameters and
would  then have to realize that  since only 3 parameters were passed,
their data is stored in P8, P9, and P10.

   As you see, there are two components here:

   *  Knowing  the  number  of  parameters passed  (here determined by
     looking at FMTSTRING).

   *  Being  able  to  determine the value of  each parameter that was
     passed  (here  assured  by  knowing that any  parameters that are
     passed  will  become  the  first, second, etc.  parameters of the
     procedure).

Somehow  --  by  some  compiler guarantee, or  by a calling convention
(e.g.  the  number  of  parameters  is indicated in  FMTSTRING, or the
parameter list is terminated by -1), or by some assembly routine -- we
need to be able to do both of the above things.

   Finally,  let  me  point  out  one other factor.  When we declare a
procedure as

   writef (fmtstring, p1, p2, p3, p4, p5, p6, p7, p8, p9, p10)

we  really  don't want to access the  last 10 parameters as P1 through
P10; we want to be able to view them all as elements of one big array,
so we could say something like:

   for (i = 0; i < 10; i = i + 1)
     process_parm (p[i]);

instead of having to say

   process_parm (p1);
   process_parm (p2);
   process_parm (p3);
   process_parm (p4);
   process_parm (p5);
   process_parm (p6);
   process_parm (p7);
   process_parm (p8);
   process_parm (p9);
   process_parm (p10);

The  same,  incidentally,  arises with SPL -- we'd  like to be able to
access   SPL  OPTION  VARIABLE  parameters  as  elements  of  one  big
"parameters  array",  too.  In  SPL,  it turns out, we  can do that by
saying:

   PROCEDURE FOO (P0, P1, P2, P3, P4, P5, P6, P7, P8, P9);
   VALUE P0, P1, P2, P3, P4, P5, P6, P7, P8, P9;
   INTEGER P0, P1, P2, P3, P4, P5, P6, P7, P8, P9;
   OPTION VARIABLE;
   BEGIN
   INTEGER ARRAY PARMS(*)=P0;
   ...
   END;

The  PARMS array here is defined to  start at the location occupied by
the  by-value parameter P0; it so  happens that the way parameters are
allocated  on the HP3000, PARMS(3) would  be equal to P3, and PARMS(7)
would be equal to P7. Similarly, in C you can say:

   foo (p0, p1, p2, p3, p4, p5, p6, p7, p8, p9)
   int p0, p1, p2, p3, p4, p5, p6, p7, p8, p9;
   {
   int *parms;
   parms = &p0;
   ...
   x = parms[i];   /* meaning parameter #I */
   ...
   }

and  this will work on those C compilers which allocate the parameters
the  appropriate way on the stack. As  you see, the conclusion here --
just  like  in  the  general question of  writing OPTION VARIABLE-like
parameters -- is:

   * It's probably doable on any particular C implementation, but it's
     certainly not portable.

   Thus, to summarize, C's support for procedures with variable numbers
of parameters is:

   *  Unlike PASCAL, C syntax allows you to specify a different number
     of  parameters in a call than the procedure actually has: you can
     call a 10-parameter procedure using "P (1, 2, 3)".

   * Unlike SPL, you can't omit any parameters in the middle of a call
     -- "P (1,, 3,,, 6)" is illegal.

   *  Although  the  CALL  is  legal  and portable, there's  no way to
     portably  write  a  procedure  that EXPECTS a  variable number of
     parameters.

   * On the other hand, on most C compilers, there will be SOME way of
     writing  an  OPTION  VARIABLE-type procedure, although  as I said
     it's likely to be rather different from compiler to compiler.

   *  Finally -- something that I haven't mentioned yet but that is of
     much  relevance  --  parameters  of different types  may occupy a
     different  amount of space on the call stack. If you pass a "long
     float"  to  a  procedure  that's expecting  "int" parameters, the
     "long  float"  will  end up occupying  two parameters. This means
     that  the  procedure  must  know  when  its parameters  are "long
     float"s  (like  "writef"  can  know  by looking  at the FMTSTRING
     parameter), and kludge accordingly.


       PASSING VARIABLE NUMBERS OF PARAMETERS TO PROCEDURES --
                        DRAFT ANSI STANDARD C

   Draft  ANSI Standard C has a provision for passing variable numbers
of parameters. Like many good things, it's at the same time useful and
confusing. Let's have a look at it.

   Calling  an  OPTION VARIABLE-type procedure in  Draft Standard C is
quite similar to the way you'd do it in K&R C:

   writef ("I5,X,F7.2", inum, fnum);

The  one  difference is that the compiler  might (or might not) DEMAND
that  you establish a function prototype (see "DATA STRUCTURES -- TYPE
CHECKING")  to  declare that this function  takes a variable number of
parameters. The prototype for WRITEF would probably be:

   extern int writef (char *, ...);

The  "char  *" says that there is  one REQUIRED parameter, a character
array;  the  "..."  --  literally, three "."s, one  after the other --
means that there is a variable number of parameters after this.

   Defining  a procedure that can take a variable number of parameters
is trickier. Here's an example:

   #include <stdarg.h>   /* defines va_list, va_start, va_arg, va_end */

   writef (char *fmtstring, ...)
   {
   va_list arg_ptr;
   va_start (arg_ptr, fmtstring);
   ...
   while (!done)
     {
     ...
     if (current_fmtstring_descriptor_is_I)
        output_integer (va_arg (arg_ptr, int));
     else if (current_fmtstring_descriptor_is_F)
        output_float (va_arg (arg_ptr, double));
     else if (current_fmtstring_descriptor_is_S)
        output_string (va_arg (arg_ptr, char *));
     ...
     }
   ...
   va_end (arg_ptr);
   }

Consider the components of this declaration one at a time:

   *  The "..." in the header  indicates that besides the one required
     parameter,  this  procedure  takes an unknown  number of optional
     parameters.

   *  The  "va_list arg_ptr" declares a  variable called "arg_ptr", of
     type "va_list" (which is defined in the special #INCLUDE file
     <stdarg.h> that comes with the C compiler).

   * The "va_start (arg_ptr, fmtstring)" call initializes "arg_ptr" to
     point  to  the  first  variable parameter --  the one immediately
     after  "fmtstring".  "va_start" must be  passed the last required
     parameter  (in this case, "fmtstring");  among other things, this
     means that every procedure must take AT LEAST ONE fixed parameter
     -- you can't have all the parameters be optional.

   *  The procedure then (presumably) goes through FMTSTRING and finds
     out what the types of the parameters are expected to be. As it
     determines  that  the current format descriptor  is, say, "I", or
     "F",  or  "S", it "picks up" the  next parameter. It does this by
     saying

        va_arg (arg_ptr, <type>)

     The "arg_ptr" is the same variable that was declared using
     "va_list arg_ptr"; the <type> indicates which type of object we
     want to get (in our case, an "int", a "double" -- C promotes
     "float" arguments to "double" in calls like this -- or a "char *",
     depending on which format descriptor we're on).

   * Finally, at the end, we call

        va_end (arg_ptr);

     to do whatever stack cleanup needs to be done.

This  method  is  guaranteed  (heh,  heh)  to  be portable  across all
implementations  of  the Draft ANSI Standard.  (Again, note that since
the  Standard is only Draft,  many existing implementations might have
no such facility or a slightly different one.) Note its advantages and
disadvantages:

   *  You can now portably access the optional parameters, and even
     easily access them as elements of an array by making a loop that
     calls VA_ARG and sticks the results into a local array (there's a
     sketch of this right after this list).

   *  You  can  specify  that  some  of  the procedure  parameters are
     required,  thus letting the compiler check that every call to the
     procedure contains at least those parameters.

   *  On the other hand, there's still  no way of figuring out exactly
     how many parameters were passed to you -- you have to rely on the
     user's  telling  you  this,  either  as an  explicit parameter or
     implicitly  (such as using a format  string from which the number
     of parameters can be deduced).

   *  You  can't  have  a  procedure  where all of  the parameters are
     optional.

   *  You still can't have a procedure where a parameter in the MIDDLE
     of a parameter list can be omitted (e.g. "P (1,,3,,,6)").

   *  Accessing parameters that are simply optional is somewhat harder
     than  in SPL, since you have to get them using VA_ARG rather than
     referring to them by name, to wit:

        create (char *progname, ...)
        {
        va_list ap;
        char *entryname;
        int *pin, param, flags, stack, dl, maxdata, pri, rank;
        va_start (ap, progname);
        entryname = va_arg (ap, char *);
        pin = va_arg (ap, int *);
        param = va_arg (ap, int);
        flags = va_arg (ap, int);
        stack = va_arg (ap, int);
        dl = va_arg (ap, int);
        maxdata = va_arg (ap, int);
        pri = va_arg (ap, int);
        rank = va_arg (ap, int);
        ...
        }

     As   you  see,  you  have  to  specially  extract  each  optional
     parameter, rather than just being able to access it directly like
     you can in SPL.
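
   Here, as promised, is a hedged sketch of that "local array" idea --
the names are mine, and the caller supplies the count explicitly, since
(as noted above) Standard C gives the procedure no way of discovering it
on its own:

   #include <stdarg.h>

   void process_ints (int count, ...)
   {
   int parms[32];
   int i;
   va_list ap;
   va_start (ap, count);
   for (i = 0; i < count && i < 32; i++)
      parms[i] = va_arg (ap, int);   /* parms[0..count-1] now hold them */
   va_end (ap);
   /* ... from here on, treat parms[] as one big "parameters array" ... */
   }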


  PASSING VARIABLE NUMBERS OF PARAMETERS TO PROCEDURES -- PASCAL/XL

   PASCAL/XL's  support for procedures  with optional parameters seems
to be really nice.

   One mechanism that PASCAL/XL provides is "OPTION DEFAULT_PARMS".

   PROCEDURE P (A, B, C, D, E, F: INTEGER)
             OPTION DEFAULT_PARMS (A:=11, C:=22, E:=55, F:=66);
   BEGIN
   ...
   END;

This tells PASCAL/XL several things:

   *  In any call to P, the first  (A), third (C), fifth (E), or sixth
     (F) parameters may (or may not) be omitted. Thus, we can say:

        P (, 22, ,44);    { omitting A, C, E, and F }
        P (11, 22, 33, 44);   { omitting only E and F }
        P (, 22, 33, 44, , 66);   { omitting A and E }

     or   any   such   combination.  Only  the  parameters  without  a
     DEFAULT_PARMS declaration -- B and D -- must be specified.

   * If any of A, C, E, and F are omitted, then when P tries to access
     them, it will get their default values instead. To the procedure,
     the  parameter  will  look exactly as if  it was specified as the
     default value.

   *  HOWEVER,  the  procedure  can  (if  it wants to)  determine if a
     parameter was ACTUALLY passed by saying something like

        IF HAVEOPTVARPARM(C) THEN
          { C was actually passed }
        ELSE
          { we're using C's default value };

     "HAVEOPTVARPARM(X)"  simply  returns  TRUE  if  parameter  X  was
     actually  passed  to the procedure, and  FALSE if parameter X was
     not passed and X's value was simply defaulted.

Thus, we get the best of both worlds:

   * You can specify a default value, so if the procedure wants to, it
     can just see the parameter value as the default.

   *  If  you  need  to,  you can still find  out if the parameter was
     REALLY specified.

   *  Since the compiler knows which parameters are optional and which
     are  required, it can make sure that the required ones are really
     specified  (unlike  SPL,  in  which  any parameters  of an OPTION
     VARIABLE procedure may be omitted without an error).

   Now, interestingly enough, PASCAL/XL also has a different mechanism
to achieve a similar goal. You can also say

   PROCEDURE P (A, B, C, D, E)
             OPTION EXTENSIBLE 3;

What  this  means is that all parameters after  the first 3 -- in this
case,  D  and  E  -- are optional. You  can actually combine this with
DEFAULT_PARMS  to set default values for these "extension" parameters,
or  even  set default values for  the "non-extension" parameters, thus
making them optional, too.

   Practically speaking, saying

   PROCEDURE P (A, B, C, D, E: INTEGER)
             OPTION DEFAULT_PARMS (D:=NIL, E:=NIL);

will  achieve  pretty  much  the  same  goal  (making  both  D  and  E
extensible).  The  advantage  of  EXTENSIBLE parameters  is that their
implementation   allows  you  to  add  new  parameters  to  an  OPTION
EXTENSIBLE  procedure WITHOUT HAVING TO RE-COMPILE ANY OF ITS CALLERS!
Thus,  if  HP had written the CREATE  intrinsic in PASCAL/XL, it could
have said:

   PROCEDURE CREATE (VAR PROGRAM: STRING;
                     VAR ENTRY: STRING;
                     VAR PIN: INTEGER;
                     PARM, FLAGS, STACK, DL, MAXDATA,
                       PRI, RANK: INTEGER)
                 OPTION EXTENSIBLE 3
                        DEFAULT_PARMS (ENTRY:="");

This  would  have  made  PROGRAM  and  PIN required and  all the other
parameters optional -- ENTRY because of the DEFAULT_PARMS and the rest
because  of  the  OPTION EXTENSIBLE. Then, if  HP wanted to add STDIN,
STDLIST,   and  INFO  parameters,  it  could  have  just  changed  the
definition of CREATE to:

   PROCEDURE CREATE (VAR PROGRAM: STRING;
                     VAR ENTRY: STRING;
                     VAR PIN: INTEGER;
                     PARM, FLAGS, STACK, DL, MAXDATA,
                       PRI, RANK: INTEGER;
                     VAR STDIN, STDLIST, INFO: STRING)
                 OPTION EXTENSIBLE 3
                        DEFAULT_PARMS (ENTRY:="");

Then,  ALL  OF  THE  PROGRAMS THAT WERE COMPILED  REFERRING TO THE OLD
CREATE  WOULD  STILL  WORK!  You  can add new  parameters to an OPTION
EXTENSIBLE   procedure   without   causing  any  incompatibility  with
previously compiled callers.


                               SUMMARY

   Thus,  to  summarize  the  differences  in  the  ways  the  various
compilers  handle  optional  parameters  and procedures  with variable
numbers  of parameters: ["STD PAS" includes Standard PASCAL, ISO Level
1 Standard, and PASCAL/3000]

                                            STD  PAS  K&R  STD
                                       SPL  PAS  /XL  C    C

CAN YOU HAVE A PROCEDURE               YES  NO   YES  YES  YES
  WITH OPTIONAL PARAMETERS?

IS SUCH A PROCEDURE                    N/A       N/A  NO   YES
  DEFINITION PORTABLE?

CAN YOU OMIT PARAMETERS IN             YES       YES  NO   NO
  THE MIDDLE OF A CALL?

CAN THE FIRST PARAMETER OF A           YES       YES  YES  NO
  FUNCTION BE OPTIONAL?

CAN YOU DETECT IF A PARAMETER          YES       YES  NO   NO
  WAS REALLY PASSED OR NOT?

CAN YOU SPECIFY SOME                   NO        YES  NO   YES
  PARAMETERS AS REQUIRED?

CAN YOU SPECIFY DEFAULT VALUES         NO        YES  NO   NO
  FOR OPTIONAL PARAMETERS?

CAN YOU ADD NEW PARAMETERS             NO        YES  NO   NO
  WITHOUT RE-COMPILING ALL CALLERS?

CAN YOU ACCESS PARAMETERS "BY          YES       NO   YES  YES
  NUMBER" AS WELL AS "BY NAME"?
  [This refers to the example we showed
   where we wanted to reference, say, the
   last 10 parameters as elements of an
   array rather than as P1, P2, P3, ..., P10]


                   PROCEDURE AND FUNCTION VARIABLES

   Say that you write a B-Tree handling package. B-Trees, as you know,
are the kind of data structure that KSAM is built on; they allow you to
easily find records either by key or in sequential order. Thus, if you
store  your  data  in  a B-Tree, you can,  for instance, find a record
whose  key starts with "JON", even though you don't know the exact key
value.

   Now,  you're  a sophisticated programmer, and  you know how to deal
with  this  sort of thing. You define  a record structure type called,
say,  BTREE_HEADER_TYPE, that contains the  various pointers that your
B-Tree   handling  procedures  need,  and  then  write  the  following
procedures:

   PROCEDURE BTREE_CREATE (VAR B: BTREE_HEADER_TYPE);
   ...
   PROCEDURE BTREE_ADD (VAR B: BTREE_HEADER_TYPE;
                        K: KEY_TYPE; REC: RECORD_TYPE);
   ...
   FUNCTION BTREE_FIND (VAR B: BTREE_HEADER_TYPE; K: KEY_TYPE):
                       RECORD_TYPE;
   ...

You  get  the drift -- you have all  these routines, to which you pass
the  appropriate  data,  and  between  them,  they  process  the  data
structure. No problem.

   Now,  we  said  that  the B-tree allows you  to retrieve records in
"sorted  order".  Sorted  how?  If the key is  a string, you'd want it
sorted  alphabetically;  however, what if the key  is an integer? Or a
floating  point number? Comparing two strings is a different operation
from comparing integers or floating point numbers.

   Now,  you  can  write a different set  of routines for B-Trees with
string  keys,  B-Trees with integer keys,  B-Trees with packed decimal
keys,  etc. You can, but of course  you wouldn't want to duplicate the
code.  Assume  for  a  moment  that  you can get  around PASCAL's type
checking  so that you can pass  an arbitrary-type key to the BTREE_ADD
and  BTREE_FIND routines; how do you  make sure that the procedures do
the appropriate comparisons for the various types?

   Well, one possibility is this:

   *  Have  a  field  in  the BTREE_HEADER_TYPE  data structure called
     "COMPARISON_TYPE".

   * Have BTREE_CREATE take a parameter indicating the comparison type
     (string,  integer,  float, packed, etc.) needed;  it can then put
     this type into the COMPARISON_TYPE field.

   *   Each  BTREE_ADD  and  BTREE_FIND  call  will  interrogate  this
     COMPARISON_TYPE  field,  and  do the  appropriate comparison; for
     instance,

   PROCEDURE BTREE_ADD (VAR B: BTREE_HEADER_TYPE;
                        K: KEY_TYPE; REC: RECORD_TYPE);
   BEGIN
   ...
   IF B.COMPARISON_TYPE=STRING_COMPARE THEN
     COMP_RESULT:=STRCOMPARE (K, CURRENT_KEY)
   ELSE IF B.COMPARISON_TYPE=INT_COMPARE THEN
     COMP_RESULT:=INTCOMPARE (K, CURRENT_KEY)
   ELSE IF B.COMPARISON_TYPE=FLOAT_COMPARE THEN
     COMP_RESULT:=FLOATCOMPARE (K, CURRENT_KEY)
   ELSE IF B.COMPARISON_TYPE=PACKED_COMPARE THEN
     COMP_RESULT:=PACKEDCOMPARE (K, CURRENT_KEY);
   ...
   END;

Depending  on  the  COMPARISON_TYPE field value,  BTREE_ADD can do the
right thing.

   The trouble with this approach, though, is quite obvious. What if we
(like  KSAM) support more than just these  four types? What if we need
to  add zoned decimal support -- will we have to change BTREE_ADD (and
BTREE_FIND  and whatever other procedures do this)? What if we need to
add  an  EBCDIC  collating  sequence?  We  want  to  allow  the B-Tree
package's  USER  to define his own  comparison types without having to
change the source code of the package itself.

   In other words, we don't just want to let the user pass us a value,
like  a  record structure or an integer. We  want to let a user pass a
PIECE  OF  CODE,  in this case the code  that would do the comparison.
Then,  instead  of having a big IF  (or CASE) statement, our BTREE_ADD
and  BTREE_FIND procedures can simply call the code that was passed to
them to do the comparison.

   PASCAL, of course, has a facility for doing this (as do SPL and C).
PASCAL  lets  you  declare  a  parameter  to be of  type PROCEDURE (or
FUNCTION),  and then call that parameter. A good example of this might
be the following procedure:

   FUNCTION NUMERICAL_INTEGRATION (FUNCTION F(PARM:REAL): REAL;
                                   START, FINISH, INCREMENT: REAL):
                                   REAL;
   VAR X, TOTAL: REAL;
   BEGIN
   X:=START;
   TOTAL:=0.0;
   WHILE X<FINISH DO
     BEGIN
     TOTAL:=TOTAL+F(X)*INCREMENT;
     X:=X+INCREMENT;
     END;
   NUMERICAL_INTEGRATION:=TOTAL;
   END;

(And you thought you'd never have to see this sort of thing again once
you  finished college!) This procedure takes a function as a parameter
(a  function  that  itself  takes one parameter),  and then calls that
function  several  times.  The NUMERICAL_INTEGRATION  procedure itself
might be called thus:

   X:=NUMERICAL_INTEGRATION (SQRT, 0.0, 10.0, 0.01);

This,  as you see, passes it the  procedure "SQRT" as a parameter. The
same sort of thing, incidentally, can easily be done in SPL:

   REAL PROCEDURE NUMERICAL'INTEGRATION (F, START, FINISH, INC);
   VALUE START, FINISH, INC;
   REAL PROCEDURE F;
   REAL START, FINISH, INC;
   ...

or in C:

   float numerical_integration (f, start, finish, inc)
   float (*f)();
   float start, finish, inc;
   ...

And,  of  course,  this  is  the  very  sort of thing  that we'd do to
implement our BTREE_ADD and BTREE_FIND:

   PROCEDURE BTREE_ADD (VAR B: BTREE_HEADER_TYPE;
                        K: KEY_TYPE; REC: RECORD_TYPE;
                        FUNCTION COMP_ROUTINE (K1, K2: KEY_TYPE):
                          BOOLEAN);
   ...
   FUNCTION BTREE_FIND (VAR B: BTREE_HEADER_TYPE; K: KEY_TYPE;
                        FUNCTION COMP_ROUTINE (K1, K2: KEY_TYPE):
                          BOOLEAN):
                       RECORD_TYPE;
   ...

These declarations, as you see, indicate that both of these procedures
expect a parameter that is itself a function (which takes two keys and
returns a boolean). A typical call might thus be:

   BTREE_ADD (BHEADER, K1, R1, STRCOMPARE);

or

   R:=BTREE_FIND (BHEADER, K, MY_OWN_EBCDIC_COMPARE_ROUTINE);

or whatever else the user wants to do.

   Now,  this paper purports to be a comparison between PASCAL, C, and
SPL,  but so far we've only discussed  a feature that exists -- and is
virtually identical -- in all three languages. What's the difference?

   Well,  note  that  we  demanded  that the user  pass the comparison
routine   in  every  call  to  one  of  the  BTREE_ADD  or  BTREE_FIND
procedures.  In turn, if a procedure called by BTREE_ADD or BTREE_FIND
needs  to  call  the comparison routine,  then BTREE_ADD or BTREE_FIND
must  pass  that  procedure  the  comparison  routine,  too.  This  is
cumbersome  and  also  error-prone  --  what  if  the user  passes one
procedure for BTREE_ADD and another for BTREE_FIND?

   The  logical  solution  seems  to be to pass  the procedure once to
BTREE_CREATE, i.e.

   BTREE_CREATE (BHEADER, STRCOMPARE);

and  then  have  the address of the  procedure stored somewhere in the
BHEADER  record  structure.  Then,  when  BTREE_ADD  needed  to  do  a
comparison, it would say something like:

   COMP_RESULT:=BHEADER.COMPARE_ROUTINE (K, CURRENT_KEY);

This  makes  more  sense.  After  all, even KSAM  only requires you to
specify  the  key  comparison  sequence at file  creation time, not on
every intrinsic call.

   Examples of this sort of thing are plentiful:

   *  Trap routines (just like in  MPE). MPE's XCONTRAP intrinsic, for
     instance,  expects  you  to  pass  it  a  procedure  (actually, a
     procedure's   address);  it  saves  this  address  in  a  special
     location, and then when control-Y is hit, calls this procedure.

     Similarly,  let's say you're writing a package for packed decimal
     arithmetic. You want to have a procedure called

        PROCEDURE SET_PACKED_TRAP (PROCEDURE TRAP_ROUTINE);

     which  will  set  some  global  variable  to the  value passed as
     TRAP_ROUTINE.  Then, whenever your procedure detects some kind of
     packed  decimal arithmetic error, it'll call whatever routine was
     set up as the trap routine. That way, the user will be able to do
     what  he  pleases;  he  can  set  the  trap routine  to abort the
     program, to print an error message, to return a default result --
     whatever.

   *  Say  that you have various procedures  that do certain things --
     build  temporary  files,  set up locks, buffer  I/O, etc. -- that
     require  special  processing  when  the program  is terminated. A
     classic  example  of  this is buffering your  output to a file in
     order to do fewer physical I/Os. If the program somehow dies, you
     want to be able to flush all the buffered data to the file rather
     than having it get lost.

     What  you'd  like to do is have  a procedure called ONEXIT, which
     would  take  a single procedure parameter.  Then, if your process
     dies,  the  system  would  know  enough  to call  this procedure,
     letting  it do whatever cleanup you want. For instance, you might
     say

       ONEXIT (FLUSH_BUFFERS);

     to   tell  the  system  to  call  FLUSH_BUFFERS  if  the  program
     terminates for any reason; you might also want to say

       ONEXIT (RELEASE_SIRS);

     so  that  the  system  will  release  any  SIRs  (System Internal
     Resources)  that you may have acquired.  In fact, you want ONEXIT
     to keep what is essentially an array of procedures:

       VAR ONEXIT_PROCS: ARRAY [1..100] OF PROCEDURE;

     Then, the system terminate routine will say:

        FOR I:=1 TO NUM_ONEXIT_PROCS DO ONEXIT_PROCS[I]();   { call
         the Ith procedure }

   *  The file system, for instance, has  to keep track of a number of
     different  file  types  --  standard  files, message  files, KSAM
     files, remote files, and so on. Although they all look like files
     to  the  user,  the various routines that  read them, write them,
     close them, etc. are quite different. A possible design for, say,
     the file control block might be:

       RECORD
       FILENAME: PACKED ARRAY [1..36] OF CHAR;
       FILE_READ_ROUTINE: PROCEDURE (...);
       FILE_WRITE_ROUTINE: PROCEDURE (...);
       FILE_CLOSE_ROUTINE: PROCEDURE (...);
       ...
       END;

     Then, the FREAD intrinsic might simply say

       FCB.FILE_READ_ROUTINE (FNUM, BUFFER, LENGTH, ...);

     thus calling the file read routine pointed to by the file control
     block   (this   might   be   one  of  FREAD_STANDARD,  FREAD_MSG,
     FREAD_KSAM,  etc.) -- this field would have been set by the FOPEN
     call.
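
      To  make this last example concrete, here's a hedged C sketch of
      the  idea  --  a  file  control  block  that carries its own I/O
      routines  as  function pointers (the field and routine names are
      my own, not MPE's actual ones):

        typedef struct {
                char filename[36];
                int (*read_routine)();    /* set by FOPEN to one of     */
                int (*write_routine)();   /* fread_standard, fread_msg, */
                int (*close_routine)();   /* fread_ksam, and so on      */
                } fcb_type;

        /* The FREAD-like intrinsic just calls through the pointer: */
        fread_any (fcb, buffer, length)
        fcb_type *fcb;
        char buffer[];
        int length;
        {
        return ((*fcb->read_routine) (fcb, buffer, length));
        }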

   These  are  just  examples to convince you  that it makes sense not
just  to  be able to pass procedures  and functions as parameters, but
also have variables that "contain" procedures and functions, or rather
pointers to them.

   This  is  where  the  three  languages differ.  Standard PASCAL and
PASCAL/3000 have no such feature. There simply is no way of either

   *  declaring  a  variable  to be of type  "pointer to a function or
     procedure";

   * setting a variable to point to a particular function/procedure;

   * or calling a procedure/function pointed to by a variable;

None  of the three above examples can be implemented in Standard or HP
PASCAL.

   C, on the other hand, does support this feature. In C we might say:

   typedef struct {...
                   int (*comp_proc)();
                   ...
                   } btree_header_type;

thus  declaring  comp_proc to be a field  pointing to a procedure that
returns an integer. Then, BTREE_OPEN might read like:

   btree_open (b, proc)
   btree_header_type *b;    /* by reference, so the assignment "sticks" */
   int (*proc)();
   {
   ...
   b->comp_proc = proc;
   ...
   }

and BTREE_ADD might say

   btree_add (b, k, rec)
   btree_header_type b;
   int k[], rec[];
   {
   ...
   comp_result = (*b.comp_proc) (k, current_key);
   ...
   }

As you see, we simply use "(*b.comp_proc)" -- "the thing pointed to by
the  COMP_PROC  subfield of record B" in  place of a procedure name; C
will  then  call  this  procedure,  passing  it  the parameters  K and
CURRENT_KEY (of course, doing no parameter checking).

   Similarly,  our  ONEXIT  routine  (which, by the way,  I think is a
singularly useful sort of procedure, and one that Draft Standard C has
defined in its Standard Library) might look like this:

   int (*exit_routines[100])();
   int num_exit_routines = 0;

   onexit (proc)
   int (*proc)();
   {
   exit_routines[num_exit_routines] = proc;
   num_exit_routines = num_exit_routines + 1;
   }

   terminate ()
   {
   int i;
   ...
   for (i=0; i<num_exit_routines; i++)
     (*exit_routines[i]) ();    /* call the Ith exit routine */
   ...
   }

Clean and simple.

   SPL's  solution to this problem is somewhat dirtier. SPL can do it,
because  SPL -- with its TOSes and  ASSEMBLEs -- can do anything; but,
it can't do it very cleanly.

   In  SPL,  what you'd do is  save the procedure's address (actually,
its  plabel, but for our purposes that's the same thing) in an integer
variable. Then, you'd use an ASSEMBLE statement to call the procedure.

   INTEGER ARRAY EXIT'ROUTINES(0:99);
   INTEGER NUM'EXIT'ROUTINES:=0;

   PROCEDURE ONEXIT (PROC);
   PROCEDURE PROC;
   BEGIN
   EXIT'ROUTINES(NUM'EXIT'ROUTINES):=@PROC;
   NUM'EXIT'ROUTINES:=NUM'EXIT'ROUTINES+1;
   END;

   PROCEDURE TERMINATE;
   BEGIN
   INTEGER I;
   ...
   FOR I:=0 UNTIL NUM'EXIT'ROUTINES-1 DO
     BEGIN
     TOS:=EXIT'ROUTINES(I);
     ASSEMBLE (PCAL 0);   << call the routine whose addr is on TOS >>
     END;
   ...
   END;

If  you had to pass and/or receive parameters from this procedure, the
code would be even uglier. To pass, for instance, the integer arrays K
and  CURRENT'KEY  and to receive a result  to be put into COMP'RESULT,
you'd have to say:

   TOS:=0;   << room for the result >>
   TOS:=@K;
   TOS:=@CURRENT'KEY;
   TOS:=COMP'ROUTINE'PLABEL;  << the plabel of the routine to call >>
   ASSEMBLE (PCAL 0);
   COMP'RESULT:=TOS;   << get the return value >>

Ugly,  but  possible  -- more than can be  said for Standard PASCAL or
PASCAL/3000.

   PASCAL/XL,  on the other hand, has  a solution rather comparable to
that  of  C's  --  better,  if  you  generally prefer PASCAL  to C. In
PASCAL/XL,  you could define a "procedural" or "functional" data type,
to wit:

   TYPE EXIT_PROC = PROCEDURE;  << no parms, no result >>
        COMPARE_PROC = FUNCTION (K1, K2: KEY_TYPE): BOOLEAN;

The declaration is much like what you'd put into a procedure header if
you want a parameter to be a procedure or function; however, this kind
of  type allows your variables to be procedure/function pointers, too.
Thus, you'd write ONEXIT as:

   VAR EXIT_ROUTINES: ARRAY [1..100] OF EXIT_PROC;
       NUM_EXIT_ROUTINES: 0..100;

   PROCEDURE ONEXIT (P: EXIT_PROC);
   BEGIN
   NUM_EXIT_ROUTINES:=NUM_EXIT_ROUTINES+1;
   EXIT_ROUTINES[NUM_EXIT_ROUTINES]:=P;
   END;

   PROCEDURE TERMINATE;
   VAR I: 0..100;
   BEGIN
   ...
   FOR I:=1 TO NUM_EXIT_ROUTINES DO
     CALL (EXIT_ROUTINES[I]);  { CALL is a special construct }
   ...
   END;

Similarly, our BTREE_ADD procedure would be:

   PROCEDURE BTREE_ADD (VAR B: BTREE_HEADER_TYPE;
                        K: KEY_TYPE; REC: RECORD_TYPE);
   ...
   BEGIN
   ...
   COMP_RESULT:=FCALL (B.COMP_ROUTINE, K, CURRENT_KEY);
   ...
   END;

As  you  see,  "CALL  (proc,  parm1, parm2, ...)"  calls the procedure
pointed to by the variable "proc", passing to it the given parameters.
Similarly, "FCALL (proc, parm1, parm2, ...)" calls a function.

   To summarize, then:

   *  Procedure  and  function  variables --  though apparently rather
     obscure -- can actually be very useful.

   * Standard PASCAL and PASCAL/3000 support procedures and functions
     as parameters, but not as variables; this is rather inadequate.

   *  SPL  supports  procedure/function  variables,  but  in  a rather
     "dirty"  fashion,  which  is  clumsy  and  uses  many  TOSes  and
     ASSEMBLEs.

   * C and PASCAL/XL have very clean support for this nifty feature.


                              C #DEFINEs

   One  thing in which C may be quite  a bit superior to PASCAL is the
#define  construct.  It  can have serious  performance advantages, and
also avoid duplication of code in cases where ordinary procedures just
don't do the job.

   The  #define  is  a  simple  macro  facility. References  to it get
expanded  into C code that is compiled in place of its invocation.  In
other words, saying

   #define square(x) ((x)*(x))
   ...
   printf ("%d %d\n", a, square(a));

is identical to

   printf ("%d %d\n", a, ((a)*(a)));

Other useful defines may include

   #define min(x,y) (((x)<(y)) ? (x) : (y))
   #define max(x,y) (((x)>(y)) ? (x) : (y))
   #define push(val,stackptr) *(stackptr++) = (val)

and  so on. Note that this is in  the same spirit as SPL DEFINEs -- in
fact,  if  you don't have any parameters,  C #define's and SPL DEFINEs
are  one and the same --  but allows parameterization, which increases
the power immeasurably.

   One  question  that  instantly comes to mind  is: how are #define's
better than FUNCTIONs?

   *  First of all, on any computer, there is an overhead in PROCEDURE
     and FUNCTION calls. For instance, an HP PASCAL program running on
     a  Mighty  Mouse  took  about 140 milliseconds  to execute 10,000
     calls  to a parameter-less PROCEDURE,  and about 250 milliseconds
     to do 10,000 calls to a PROCEDURE with 3 parameters.

     Of  course, this isn't a very large amount of time, and certainly
     isn't  bad  enough to convince me  to stop writing procedures and
     repeat portions of code several times in my program. Still, it is
     enough  to give one pause in cases where performance is critical;
     procedure calls are frequent enough that the total overhead piles
     up.

     #DEFINEs  allow  us  to avoid code repetition  without any of the
     overhead  of  procedure  calls.  For small,  very frequently used
     procedures, they can be a very good solution.

   *  A #DEFINE can replace  anything, including declarations, control
     structures,  etc.  For  instance, say that you  think the C "for"
     loop  is  too  complicated,  and  you'd  like to be  able to do a
     PASCAL-like "FOR". You could say

        #define loop(var,start,limit) \
        for ((var) = (start);  (var) <= (limit);  (var)=(var)+1)

      Then,

        loop (x, 1, 100)
          printf ("%d %d %d\n", x, x*x, x*x*x);

      would mean the same thing as

        for (x = 1; x <= 100; x=x+1)
          printf ("%d %d %d\n", x, x*x, x*x*x);

      This  use of #DEFINE, however, is more than just a sop to people
      who  are  dissatisfied  with C terminology and  want to make it
      look  like  PASCAL. The fact that  #DEFINEs directly expand into
      source code rather than just calling functions can be used to:

        -  Have  operations  that  work with  arbitrary datatypes. For
          instance,  our  "min"  define  will  work  equally  well for
          "int"s,  for  "float"s, for "long"s,  etc., since it expands
          into a "<" comparison, which works for all those types.

        -  Define  objects that can be stored  into as well as fetched
          from.  For  instance,  if for some reason  you keep all your
          arrays in 1-dimensional format, you can say

            #define element(array,rowsize,row,col) \
                    array[rowsize*row+col]
            ...
            x = element (a, numcols, rnum, cnum);
            ...
            element (a, numcols, rnum, cnum) = x;

          Since  you  can't  assign  anything to a  function call, you
          couldn't do this if ELEMENT were a function; but, since it's
          a  macro, this ends up being a simple assignment to an array
          element.

        -   Have   #define's  that  define  procedures.  Consider  the
          following mysterious creature:

            #define defvectorop(funcname,type,op) \
            funcname(vect1,vect2,rvect,len) \
            type vect1[], vect2[], rvect[]; \
            int len; \
            { \
            int i; \
            for (i = 0; i<len; i++) \
                rvect[i] = vect1[i] op vect2[i]; \
            }

          This  #define  allows  you to easily  define procedures of a
          certain format -- to wit, those that operate element-wise on
          two  arrays (of a given type) to generate a third array. For
          instance,

            defvectorop(intmult,int,*)
            defvectorop(intadd,int,+)
            defvectorop(floatadd,float,+)

          will  define  three  functions that,  respectively, multiply
          vectors of integer, add vectors of integers, and add vectors
          of floats. A call to

            intmult(x1,x2,y,10);

          will  set elements y[0] through  y[9] to x1[0]*x2[0] through
          x1[9]*x2[9].

        -  Finally,  using some even weirder  constructs, you can deal
          with  "families" of variables which  are identified by their
          similar  names. For instance, say your convention is that if
          your  "queue"  data  structure is stored  in an array called
          "x",  then the header pointer is stored in a variable called
          "x_head" and the tail pointer is stored in a variable called
          "x_tail". A typical macro might look like

            #define queueprint(queuename) \
            printf ("Head = %d, Tail = %d\n", \
                    queuename ## _head, queuename ## _tail); \
            print_array_data (queuename);

          Note that with the special "##" macro operator, we can have

            queueprint (myqueue)

          expand into

            printf ("Head = %d, Tail = %d\n",
                    myqueue_head, myqueue_tail);
            print_array_data (myqueue);

          thus  deriving  the  head  and tail variable  names from the
          queue  variable name -- clearly something we can't do with a
          procedure.

   To  summarize, the primary advantages  of #define's are performance
and   the   additional   flexibility   that  comes  with  direct  text
substitution.
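
   One  caveat  is  worth  a  quick  illustration  here,  since the INLINE
discussion  below  comes  back  to  it:  because a #define is nothing but
text  substitution,  a  macro  argument may get evaluated more than once.
A  small  sketch  of  my  own (the function "f" is just something contrived
to make the effect visible):

   #define min(x,y) (((x)<(y)) ? (x) : (y))

   int calls = 0;

   int f (b)                /* a function with a visible side effect */
   int b;
   {
   calls = calls + 1;
   return (b * 2);
   }

   main ()
   {
   int a, smallest;
   a = 100;
   smallest = min (a, f(3));  /* expands so that f(3) is called twice */
   printf ("%d %d\n", smallest, calls);  /* prints "6 2", not "6 1" */
   }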


                     PASCAL/XL INLINE PROCEDURES

   SPL,  of course, has DEFINEs, but they don't support parameters and
thus  are severely limited. Standard PASCAL and HP PASCAL have nothing
like DEFINEs or #define's, either for performance's or functionality's
sake.  PASCAL/XL,  however,  has  a rather  interesting feature called
"INLINE".

   In HP PASCAL, you can write a procedure like this:

   FUNCTION MIN (X, Y: INTEGER): INTEGER
     OPTION INLINE;
   BEGIN
   IF X<Y THEN MIN:=X ELSE MIN:=Y;
   END;

What  does  the "OPTION INLINE" keyword  do? It commands the compiler:
whenever  a MIN is seen, physically  INCLUDE the code of the procedure
instead of simply compiling a procedure call instruction.

   This is, of course, done for performance's sake -- to save the time
it  would  take  to  do  that  procedure  call. This  is thus somewhat
comparable  to  C's #define's. It isn't as  flexible -- MIN is still a
procedure, with fixed parameter types, and so on -- but can be as fast
(or almost as fast).

   Actually,  I  wouldn't be surprised if  INLINE procedure calls were
still  somewhat slower than #define's (although they don't have to be,
if the compiler is really smart). On the other hand, INLINE procedures
have some advantages:

   *  Since the compiler treats them  like true procedures, it'll make
     sure that

       MIN(A,F(B))

     won't evaluate F(B) twice (like our C #define would).

   *  Since  these are real procedures,  we're no longer restricted by
     the  rules  about  what can and can't  go into an expression. For
     instance, the procedure

       FUNCTION FIND_NON_SPACE (X: STRING): INTEGER
         OPTION INLINE;
       VAR I: INTEGER;
       BEGIN
       I:=0;
        WHILE (I<STRLEN(X)) AND (X[I]=' ') DO I:=I+1;
       FIND_NON_SPACE:=I;
       END;

     can't  be written as a C #define,  since C does not allow "while"
     loops   inside  expressions.  Similarly,  we  can  declare  local
     variables and so forth.

   In  short,  while PASCAL/XL INLINE procedures  are in some respects
not  quite  as flexible as C #define's,  they might be as efficient or
almost  as efficient, and even more flexible in their own way. Whether
or  not  they really work depends on how  good a job PASCAL/XL does of
optimization. If, for instance, it expands

   A:=MIN(B,C)

into

   TEMPPARM1:=B;
   TEMPPARM2:=C;
   IF TEMPPARM1<TEMPPARM2 THEN
     RESULT:=TEMPPARM1
   ELSE
     RESULT:=TEMPPARM2;
   A:=RESULT;

then  this  won't be a big savings. If,  on the other hand, it's smart
enough to generate

   IF B<C THEN
     A:=B
   ELSE
     A:=C;

then, of course, it'll be every bit as efficient as the C #define.

   Remember, though:

   * INLINE procedures can only be used in the same kind of context in
     which an ordinary procedure is used; i.e., you can't define a new
     type of looping construct, declaration, etc.

   *  INLINE  procedures  are  available  only  in  PASCAL/XL,  not in
     Standard PASCAL or even PASCAL/3000.

   *  You rely (like you always  do) on the compiler's intelligence in
     generating  efficient  code.  When  you  have  MIN  defined  as a
     #define,  you  KNOW  that the compiler  will generate EXACTLY the
     same code for

        min(x,y)

     and

        (x<y) ? x : y

     This  will  probably  be  one test, two  branches, and some stack
     pushes.  On  the  other  hand, a call to  an INLINE MIN procedure
     might  do  exactly  the  same thing -- or,  it might also build a
     stack  frame,  allocate  local variables, etc.,  taking almost as
     much time as an ordinary non-INLINE call.


                        POINTERS: WHAT AND WHY

   One major feature that C emphasizes more than PASCAL is support for
POINTERS.  These  creatures  --  available in SPL as  well as C -- are
often  very  powerful  mechanisms, but they have  been also accused of
making  programs  very  difficult to read.  I can't really objectively
comment  on  the readability aspect, but some discussion of pointers,
their advantages and disadvantages, is in order.


            APPLICATION #1: DYNAMICALLY ALLOCATED STORAGE

   If  you  declare  some  variable in SPL, C,  or PASCAL, what you're
really  declaring  is  a  chunk  of  storage. If you  declare a global
variable,  the  storage  is  allocated  when  you run  the program and
deallocated when the program is done; if you declare a local variable,
the  storage  is allocated when you  enter a procedure and deallocated
when the procedure is exited.

   What  if  you  want  to  declare  storage  that  is  allocated  and
deallocated in some other way?

   For  instance,  MPEX  (or, for that matter,  MPE) needs to read all
your  UDC  files  and  keep a table indicating  which UDC commands are
defined  in  which file and at which  record number. This table can be
any  size  from  0  bytes  (no UDCs) to thousands  of bytes. How do we
allocate it?

   The trouble is that we don't know how large the UDC dictionary will
be, so we can't really declare it either as a local or global variable
(in  SPL, local arrays can be of variable size, but the UDC dictionary
has to be global anyway). What we need to be able to do is DYNAMICALLY
ALLOCATE  IT  in  the  READ_UDC_FILES  procedure  -- somehow  tell the
computer  "I  need X (a variable) bytes of  storage now, and I want to
view it as an object of type so-and-so (say, an array of records)".

   Now, there are two issues involved here:

   * WE NEED A MECHANISM FOR DYNAMICALLY ALLOCATING STORAGE.

   * WE NEED A WAY OF REFERRING TO THIS STORAGE ONCE IT'S ALLOCATED.

The  need for a dynamic allocation procedure (e.g. PASCAL's NEW, SPL's
DLSIZE,  or  C's  CALLOC)  is  obvious; but, the  need for a reference
mechanism  is equally important! After all, we can't very well declare
our UDC dictionary as

   VAR UDC_DICT: ARRAY [1..x] OF RECORD ...;

Our  very  point  is that we don't know the  size of the array, and we
DON'T  WANT  THE  COMPILER  TO  ALLOCATE IT FOR US,  which is what the
compiler has to do if it sees an array declaration.

   What  we have to do is to  declare UDC_DICT as a POINTER. A pointer
is an object that can be accessed in one of two modes:

   *  In  one  mode,  it looks EXACTLY like an  object of a given type
     (say, an array, a record, a string, etc.). It can be assigned to,
     it  can have its fields extracted,  etc. If UDC_DICT is a pointer
     to an ARRAY of RECORDs, we could say

        UDC_DICT^[UDCNUM].NAME:=UDC_NAME_STRING;

     and assign something to the NAME subfield of the UDCNUMth element
     of this ARRAY of RECORDs.

   *  In  another  mode,  it  is essentially an  ADDRESS, which can be
     changed to make the pointer point to (theoretically) an arbitrary
     location in memory. When we say

        NEW (UDC_DICT);

     we  don't  really  pass an ARRAY of  RECORDs to NEW (remember, no
     such  array  has been allocated yet);  rather, we pass a variable
     that  will  be  set by NEW to a  MEMORY ADDRESS, the address of a
     newly-allocated array of records that can later be accessed using
     "UDC_DICT^".

This  two-fold  nature  is  the key aspect of  pointers -- they can be
viewed  as  ordinary  pieces  of  data,  OR they can  be viewed as the
addresses  of data, and thus changed to "point" to arbitrary locations
in memory.

   The  reason  why  I  gave  this  definition  in  the  context  of a
discussion  on "dynamic memory allocation" is that with dynamic memory
allocation, pointers are NECESSARY. If all your data is kept in global
and  local variables, you might never  need to use pointers, since all
the  data can be accessed by  directly referring to the variable name.
On  the  other  hand,  if you use things like  NEW or CALLOC, you must
refer to the dynamically allocated data using pointers.
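
   To  tie  this  back to the UDC dictionary: in C, the dictionary gets
declared  as  a POINTER, and the storage gets allocated only when we've
found  out  how  big  it  has to be. A hedged sketch (the record layout
and the names here are my own guesses, not MPEX's actual code):

   typedef struct {
           char udc_name[16];       /* hypothetical layout */
           int  file_number;
           int  record_number;
           } udc_entry;

   udc_entry *udc_dict;      /* a pointer, not a fixed-size array */
   int num_udcs;             /* set by the pass that scans the UDC files */

   allocate_udc_dict ()
   {
   udc_dict = (udc_entry *) calloc (num_udcs, sizeof(udc_entry));
   }

   /* From then on, udc_dict[i] is used just like an array element, */
   /* e.g.  udc_dict[i].file_number = fnum;                         */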

   Let's  take  another  example:  We're  building  a Multi-Processing
Executive system. This'll be an eXtra Large variety, by the way, so we
might  call  it  MPE/XL  for  short.  This  system  will have  lots of
PROCESSES; each process has to have a lot of data kept about it.

   It makes a lot of sense for us to declare a special type of record:

   TYPE PROCESS_INFO_REC =
          RECORD
          PROGRAM_NAME: PACKED ARRAY [1..80] OF CHAR;
          CURRENT_PRIORITY: INTEGER;
          TOTAL_MEMORY_USED: INTEGER;
          FATHER_PROCESS: ???;
          SON_PROCESSES: ARRAY [1..100] OF ???;
          END;
                      { Now, declare a pointer to this type }
        PROCESS_PTR = ^PROCESS_INFO_REC;

[Let's  not talk about whether or not  this is a good design.] Now, we
can have a procedure to create a new process:

   FUNCTION CREATE_PROCESS (PROGNAME: PROGNAME_TYPE): PROCESS_PTR;
   VAR NEW_PROC: PROCESS_PTR;
   BEGIN
   NEW (NEW_PROC);
   NEW_PROC^.PROGRAM_NAME:=PROGNAME;
   NEW_PROC^.TOTAL_MEMORY_USED:=1234;
   ...
   CREATE_PROCESS:=NEW_PROC;
   END;

Note  what this procedure returns -- it returns a POINTER to the newly
allocated  PROCESS  INFORMATION  RECORD  (remember,  NEW_PROC  is  the
pointer,  NEW_PROC^  is  the  record).  Why  does it  return a pointer
instead of the record itself?

   Remember  that each process information  record has to indicate who
the  process's  father is and who the process's sons are. To do
this,  we  have  to  have some kind of  "unique process identifier" --
well,   what  better  identifier  than  the  POINTER  TO  THE  PROCESS
INFORMATION RECORD?

   Thus, our record really looks like this:

   TYPE PROCESS_INFO_REC =
          RECORD
          PROGRAM_NAME: PACKED ARRAY [1..80] OF CHAR;
          CURRENT_PRIORITY: INTEGER;
          TOTAL_MEMORY_USED: INTEGER;
          FATHER_PROCESS: ^PROCESS_INFO_REC;
          SON_PROCESSES: ARRAY [1..100] OF ^PROCESS_INFO_REC;
          END;

When we create a new process, we can just say:

   NEW_PROC_INFO_PTR^.FATHER_PROCESS:=CURR_PROC_INFO_PTR;

All  of our dynamically allocated process information records are thus
DIRECTLY  LINKED to each other using pointers. To find out a process's
grandfather, for instance, we can just say

   FUNCTION GRANDFATHER (PROC_INFO_PTR: PROCESS_PTR): PROCESS_PTR;
   BEGIN
   GRANDFATHER:=PROC_INFO_PTR^.FATHER_PROCESS^.FATHER_PROCESS;
   END;

   Now of course, pointers aren't the only way to "point" to data. If,
for  instance, all our Process  Information Records were not allocated
dynamically, but rather taken out of some global array:

   VAR PROCESS_INFO_REC_POOL: ARRAY [1..256] OF PROCESS_INFO_REC;

then  we  could  just  use  indices  into this pool  as unique process
identifiers  (in  fact, that's what PINs in  MPE/V are -- indices into
the  PCB,  an array of records that's  kept in a system data segment).
But for true dynamically allocated data (allocated using PASCAL's NEW,
C's CALLOC, or SPL's DLSIZE), pointers are the way to go.
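
   For  what it's worth, the same self-referential trick reads much the
same  way  in  C;  here's  a  hedged sketch paralleling the PASCAL above
(none of this, of course, is HP's actual code):

   typedef struct process_info {
           char program_name[80];
           int  current_priority;
           int  total_memory_used;
           struct process_info *father_process;
           struct process_info *son_processes[100];
           } process_info;

   process_info *create_process (progname)
   char progname[];
   {
   process_info *new_proc;
   new_proc = (process_info *) calloc (1, sizeof(process_info));
   strcpy (new_proc->program_name, progname);
   new_proc->total_memory_used = 1234;
   return (new_proc);
   }

   /* A process's grandfather is then simply      */
   /*    p->father_process->father_process        */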


                  POINTERS BEYOND DYNAMIC ALLOCATION

   Another  reason  why I first introduced  pointers in the context of
dynamic  allocation  and  NEW  is  that in PASCAL,  that's all you can
really use pointers for.

   In  other  words,  NEW  makes a pointer  point to a newly-allocated
chunk  of  data;  but  there's  NO  WAY TO MAKE A  POINTER POINT TO AN
EXISTING  GLOBAL OR LOCAL VARIABLE (OR ARRAY ELEMENT). PASCAL's theory
was  that  any  global  or local variables can  and should be accessed
without   pointers  (since  presumably  we  know  where  they  are  at
compile-time).

   Following  up  on the UDC dictionary example I talked about earlier,
let me explain a bit about the workings of the UDC parser and executor
that I have in MPEX and SECURITY.

   Both  MPEX/3000  and  SECURITY/3000's  VEMENU  need  to be  able to
execute  UDCs.  Unfortunately,  MPE's COMMAND  intrinsic can't execute
UDCs on my behalf (it can't even execute PREP or RUN!). Thus, I had to
do my own UDC handling.

   * The first step in handling UDCs is finding out what UDC files the
     user  has  set  up.  To  do  this  I  look  in the  directory and
     COMMAND.PUB.SYS,  which indicate all of the user's UDC files, and
     then I FOPEN each one of these files.

   *  Next, I have to find out what UDCs these files contain and where
     they  contain  them.  I read the files  and generate a record for
     each  UDC  I find; this record contains  the UDC's name, the file
     number  of  the  file where I found it,  and the record number at
     which  I  found  it.  (This  is  where  the dynamic  allocation I
     discussed  earlier fits in -- I'd like to be able to allocate the
     UDC  dictionary dynamically rather than just  keep it around as a
     fixed-size global variable.)

   * Finally, when the time actually comes to execute a UDC,

     - I parse the UDC invocation (e.g. "COBOL85 AP010S,AP010U");

     -  After  finding the UDC name (COBOL85), I  look it up in my UDC
       dictionary  to  find  out  where  and  in  which UDC  file it's
       defined;

     -  I  read  the  header of the UDC from  the UDC file -- it looks
       something like "COBOL85 SRC,OBJ=$NEWPASS,LIST=$STDLIST";

     -  I  determine  the  values  of all the  UDC parameters from the
       header  and the invocation -- SRC is AP010S, OBJ is AP010U, and
       LIST is $STDLIST (the default);

     -  I  then  read the UDC, substituting  all the UDC parameters in
       each line and then executing it.

Not  a trivial process, but a necessary  one. The reason I bring it up
is  that one aspect of the processing -- determining the UDC parameter
values  and  then substituting them into the  UDC commands -- is quite
well-tailored to the use of POINTERS.

   In  order to determine the values of all the UDC parameters, I have
to  parse  the  UDC  invocation ("COBOL85 AP010S,AP010U")  and the UDC
header   ("COBOL85   SRC,OBJ=$NEWPASS,LIST=$STDLIST").  From  the  UDC
invocation  I determine the values of the specified parameters (AP010S
and  AP010U); from the UDC header I get the parameter names (SRC, OBJ,
and  LIST),  and the default values  ($NEWPASS and $STDLIST). (For the
purposes  of this discussion, I'm ignoring keyworded UDC invocation --
if  you don't know what I mean by this, that's good; it's not relevant
here.)

   Thus, my parsing has essentially generated three tables:

   Parameter Names    Values Given     Default Values
   SRC                AP010S           none
   OBJ                AP010U           $NEWPASS
   LIST               none             $STDLIST

Two  of these tables -- the Values  Given and Default Values -- I have
to  merge  into  one  table,  the Actual Values table.  If a value was
given, it becomes the Actual Value; if it wasn't, the default value is
used.

   Let's  look at the kind of data  structure we'd use to do this sort
of thing:

   TYPE STRING_TYPE = PACKED ARRAY [1..80] OF CHAR;
   VAR PARM_NAMES: ARRAY [1..MAX_PARMS] OF STRING_TYPE;
       VALUES_GIVEN: ARRAY [1..MAX_PARMS] OF STRING_TYPE;
       DEFAULT_VALUES: ARRAY [1..MAX_PARMS] OF STRING_TYPE;
       ACTUAL_VALUES: ARRAY [1..MAX_PARMS] OF STRING_TYPE;

As  you see, we've declared four  arrays of strings. Each one contains
up to MAX_PARMS (say, 16) strings, one for each UDC parameter.

   Of  course,  if  we  do it this way, we'll  be using 3*16*80 = 3840
bytes.  Since  actually each parameter could be  up to 256 bytes long,
we'd  actually need more like 12,000 bytes to fit all our data! What a
waste,  especially,  since  all of these values  were derived from two
strings  -- the UDC invocation and UDC  header -- each of which was at
most 256 bytes.

   In other words,

   *  All  the  elements of VALUES_GIVEN are  simply substrings of the
     UDC_INVOCATION array;

   *  All the elements of PARM_NAMES and DEFAULT_VALUES are substrings
     of the UDC_HEADER array.

Why  actually  copy these substrings out? It  takes a lot of space and
more  than  a little time -- instead,  let's just keep the INDICES and
LENGTHS of the substrings in their original arrays:

   VAR PARM_NAME_INDICES: ARRAY [1..MAX_PARMS] OF INTEGER;
       PARM_NAME_LENGTHS: ARRAY [1..MAX_PARMS] OF INTEGER;
       GIVEN_VALUE_INDICES: ARRAY [1..MAX_PARMS] OF INTEGER;
       GIVEN_VALUE_LENGTHS: ARRAY [1..MAX_PARMS] OF INTEGER;
       DEFAULT_VALUE_INDICES: ARRAY [1..MAX_PARMS] OF INTEGER;
       DEFAULT_VALUE_LENGTHS: ARRAY [1..MAX_PARMS] OF INTEGER;

   Note that, so far, this is a classical PASCAL solution; if you want
to  "point"  to  data that's in your  program's variables (rather than
dynamically allocated using NEW), you just keep indices instead of the
actual data. Then, you can say

   SUBSTR(UDC_HEADER,PARM_NAME_INDICES[PNUM],PARM_NAME_LENGTHS[PNUM])

and  thus  refer  to the PNUMth PARM_NAME  (assuming your PASCAL has a
SUBSTR function); similarly, you can use

   SUBSTR(UDC_HEADER,DEFAULT_VALUE_INDICES[PNUM],
                     DEFAULT_VALUE_LENGTHS[PNUM])

and

   SUBSTR(UDC_INVOCATION,GIVEN_VALUE_INDICES[PNUM],
                         GIVEN_VALUE_LENGTHS[PNUM])

Remember,  you KNOW where the substrings came from anyway, so with the
indices  and  the lengths you can  always "reconstitute" them whenever
you like instead of having to keep them around in separate arrays.

   However,  think  about  the  ACTUAL_VALUE table.  This contains the
ACTUAL VALUES of the UDC parameters, which might have come either from
the  DEFAULT  VALUES on the UDC header or  the GIVEN VALUES on the UDC
invocation.  How can you represent the actual values without having to
copy each one out into a separate string?

   You  see, you can't just keep the index of the actual value around,
since in this case, you're not sure WHAT string this is an index into.
You'd have to have a special array of flags:

   VAR ACTUAL_VALUE_FROM: ARRAY [1..MAX_PARMS] OF (HEADER,INVOCATION);
   VAR ACTUAL_VALUE_INDICES: ARRAY [1..MAX_PARMS] OF INTEGER;
   VAR ACTUAL_VALUE_LENGTHS: ARRAY [1..MAX_PARMS] OF INTEGER;

and then use it like this:

   PROCEDURE PRINT_ACTUAL_VALUES (NUM_PARMS: INTEGER);
   VAR PNUM: 1..MAX_PARMS;
   BEGIN
   FOR PNUM:=1 TO NUM_PARMS DO
     BEGIN
     WRITE ('PARAMETER NUMBER ', PNUM:3, ' IS: ');
      IF ACTUAL_VALUE_FROM[PNUM]=HEADER THEN
       WRITELN (SUBSTR(UDC_HEADER,ACTUAL_VALUE_INDICES[PNUM],
                                  ACTUAL_VALUE_LENGTHS[PNUM]))
     ELSE
       WRITELN (SUBSTR(UDC_INVOCATION,ACTUAL_VALUE_INDICES[PNUM],
                                      ACTUAL_VALUE_LENGTHS[PNUM]));
     END;
   END;

The point I'm trying to make here is that:

   *  THERE ARE CASES IN WHICH YOU  WANT TO HAVE A VARIABLE "POINTING"
     INTO  ONE OF SEVERAL ARRAYS -- OR, IN GENERAL, ONE OF A NUMBER OF
     POSSIBLE  LOCATIONS. TO DO THIS IN PASCAL, YOU HAVE TO KEEP TRACK
     OF *BOTH* WHICH LOCATION IT'S POINTING TO AND WHAT ITS INDEX INTO
     THAT LOCATION IS. THEN, TO REFERENCE IT, YOU'LL NEED AN "IF" OR A
     "CASE".


   Imagine,  though, that in PASCAL you were  able to set a pointer to
the  address of a global or procedure local variable (and, presumably,
have some way of using that pointer as a string). Then, you could have

   VAR PARM_NAMES: ARRAY [1..MAX_PARMS] OF ^STRING;
       PARM_NAME_LENGTHS: ARRAY [1..MAX_PARMS] OF INTEGER;
       GIVEN_VALUES: ARRAY [1..MAX_PARMS] OF ^STRING;
       GIVEN_VALUE_LENGTHS: ARRAY [1..MAX_PARMS] OF INTEGER;
       DEFAULT_VALUES: ARRAY [1..MAX_PARMS] OF ^STRING;
       DEFAULT_VALUE_LENGTHS: ARRAY [1..MAX_PARMS] OF INTEGER;
       ACTUAL_VALUES: ARRAY [1..MAX_PARMS] OF ^STRING;
       ACTUAL_VALUE_LENGTHS: ARRAY [1..MAX_PARMS] OF INTEGER;

Not only will you be able to say

   SUBSTR(PARM_NAMES[PNUM]^,1,PARM_NAME_LENGTHS[PNUM])

instead of

   SUBSTR(UDC_HEADER,PARM_NAME_INDICES[PNUM],PARM_NAME_LENGTHS[PNUM])

thus  being  able  to  forget where the  PARM_NAMES, GIVEN_VALUES, and
DEFAULT_VALUES  arrays happened to be derived from, but you could also
have  ACTUAL_VALUES  point  into EITHER the  header or the invocation,
thus reducing our "PRINT ACTUAL VALUES" procedure to:

   PROCEDURE PRINT_ACTUAL_VALUES (NUM_PARMS: INTEGER);
   VAR PNUM: 1..MAX_PARMS;
   BEGIN
   FOR PNUM:=1 TO NUM_PARMS DO
     BEGIN
     WRITE ('PARAMETER NUMBER ', PNUM:3, ' IS: ');
     WRITELN (SUBSTR(ACTUAL_VALUES[PNUM]^,1,
                     ACTUAL_VALUE_LENGTHS[PNUM]));
     END;
   END;


   Unfortunately, in PASCAL you can't do this, because there is no way
to  fill  the  various arrays of pointers  (ACTUAL_VALUES et al.) with
data   --   there's  no  way  of  determining  the  pointer  to,  say,
UDC_INVOCATION[33] or UDC_HEADER[2].

   What  I've  been trying to convince you of is that this is a non-trivial
lack  and there are cases in which it's  desirable to be able to set a
pointer  to point to any object in your data space, and then work from
that pointer rather than, say, indexing into an array.
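
   In  C,  as we'll see shortly, the "&" operator makes filling in such
pointer  arrays  trivial.  Here's  a hedged sketch of the ACTUAL_VALUES
idea (the helper routine and its parameters are my own invention):

   #define MAX_PARMS 16

   char udc_header[256], udc_invocation[256];

   char *actual_values[MAX_PARMS];      /* pointers INTO either buffer */
   int actual_value_lengths[MAX_PARMS];

   set_actual_value (pnum, was_given, index, len)
   int pnum, was_given, index, len;
   {
   if (was_given)
      actual_values[pnum] = &udc_invocation[index];   /* given value   */
   else
      actual_values[pnum] = &udc_header[index];       /* default value */
   actual_value_lengths[pnum] = len;
   }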


                      OTHER POINTER APPLICATIONS

   There  are other cases in which many  C and SPL users use pointers.
In these cases, a PASCAL user can quite as readily use an index into a
string or an array; it's hard to tell which solution is better.

   For   instance,   consider  these  three  procedures:  [Note:  I've
intentionally  avoided  using  certain  language features  like PASCAL
sets,  SPL three-way <=, some automatic type conversions, etc. to make
the examples as similar as possible]

   TYPE PAC256 = PACKED ARRAY [0..255] OF CHAR;
   PROCEDURE UPSHIFT_WORD (VAR S: PAC256);
   { Upshifts all the letters in S until a non-alphabetic }
   { character is reached; expects there to be at least one }
   { special character somewhere in S to act as a terminator. }
   VAR I: INTEGER;
   BEGIN
   I:=0;
   WHILE ('A'<=S[I]) AND (S[I]<='Z')   OR   ('a'<=S[I]) AND (S[I]<='z') DO
     BEGIN
      IF ('a'<=S[I]) AND (S[I]<='z') THEN
       S[I]:=CHR(ORD(S[I])-32);   { upshift character }
     I:=I+1;
     END;
   END;

   PROCEDURE UPSHIFT'WORD (S);
   BYTE ARRAY S;
   << Upshifts all the letters in S until a non-alphabetic >>
   << character is reached; expects there to be at least one >>
   << special character somewhere in S to act as a terminator. >>
   BEGIN
   BYTE POINTER P;
   @P:=@S;
   WHILE "A"<=P AND P<="Z"   OR   "a"<=P AND P<="z" DO
     BEGIN
     IF "a"<=P AND P<="z" THEN
       P:=BYTE(INTEGER(P)-32);
     @P:=@P(1);
     END;
   END;

   upshift_word (s)
   char s[];
   {
   char *p;
   p = &s[0];
   while ('A'<=*p && *p<='Z'   ||   'a'<=*p && *p<='z')
     {
     if ('a'<=*p && *p<='z')
       *p = (char) (*p - 32);   /* upshift character */
     p = &p[1];
     }
   }

The   first  example  is  PASCAL,  using  indices  into  an  array  of
characters;  the  next  two  are SPL and  C, using character pointers.
Which is better?

   * In PASCAL, since you're using indices into an array whose size is
     known,  the compiler can do run-time bounds checking to make sure
     that  the  index is valid; also,  every "S[I]" reference makes it
     clear where you're getting the data from.

   * In SPL and C, instead of "S[I]" you just say "P" (in SPL) or "*P"
     (in C). This is, incidentally, probably faster than PASCAL, since
     it  would  typically  require  just an  indirect memory reference
     instead of an indirect reference with indexing (unless you have a
     very smart compiler).

   * Some people think that "S[I]" is more readable; others don't like
     to  always  repeat  the  index (especially if it's something like
     "MY_STRING[MY_INDEX]")  and prefer "P" or  "*P". This is where it
     gets quite subjective; you've got to decide for yourself.


                MORE ON DYNAMICALLY ALLOCATING MEMORY

   As  you  recall,  we  started  our  discussion of  pointers with an
example  involving dynamic memory allocation.  This is really good for
me,  since  I have things to say  about dynamic memory allocation, and
I'd  have a hard time sneaking them into any other chapter. Thus, with
this  tenuous  connection  established,  let's  talk  some  more about
dynamic memory allocation.

   I  use "dynamic memory allocation"  to mean allocating memory other
than  what's automatically allocated for you  in the form of GLOBAL or
PROCEDURE  LOCAL variables. You typically  use this mechanism when you
don't know at compile-time how much memory you'll need.

   I've  already  given  some  examples  of  uses  of  dynamic  memory
allocation:

   *  Allocating  a "UDC command dictionary" that  could be 0 bytes or
     10,000 bytes.

   *  Allocating "process information records", which might themselves
     be  rather  small,  but of which there might  be either none or a
     thousand.

   *  Implementing  commands  like  MPE's :SETJCW, with  which you can
     define any number of objects at the user's command.

   Let's  consider the last of these examples -- you're writing a command-
driven program, in which the user might use a "SETVARIABLE" command to
define a new variable and give it a value (say, an integer).

   Naturally,  you have some top-level prompt-and-input routine, which
then  sends  the  user-supplied  command  to the  parser (called, say,
PARSE'COMMAND):  The parser identifies this  as a SETVARIABLE command,
and calls this procedure:

   PROCEDURE SETVARIABLE (VAR'NAME, VAR'LEN, INIT'VALUE);
   VALUE VAR'LEN;
   VALUE INIT'VALUE;
   BYTE ARRAY VAR'NAME;
   INTEGER VAR'LEN;
   INTEGER INIT'VALUE;
   BEGIN
   ...
   END;

Now  SETVARIABLE  has  all  the  data  already at its  disposal in its
parameters; however, it has to SAVE all this information somewhere, so
that  it'll  stay around long after the SETVARIABLE procedure -- and even
the  PARSE'COMMAND  procedure  --  is  exited.  Presumably  there  is  a
FINDVARIABLE  procedure  somewhere that, given the variable name, will
extract the value that was put into it using SETVARIABLE.

   Where  should  SETVARIABLE  put  the data --  the variable name and
initial value? Well, we could have a global array:

   BYTE ARRAY VARIABLES(0:4095);

This  gives  us  up  to  4096  bytes  of  room  for  our "user-defined
variables", names, data, and all.

   Clearly,  though, this solution is both inefficient and inflexible.
What  if the user doesn't define any variables? We've wasted 4K bytes.
What if the user defines too many variables? He won't be able to.

   What  we want to do, thus, is  to have SETVARIABLE request from the
system a chunk of memory containing VAR'LEN+3 bytes -- VAR'LEN for the
name,  1  for the name length, and 2  for the variable value. Then, we
can  keep  the  addresses of all these  chunks somewhere (perhaps in a
linked  list), and FINDVARIABLE can then  just go through these chunks
to find the variable it's looking for.
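
   Before  looking  at  how  each language hands out such chunks, here's
a  hedged  C  sketch  of  the  FINDVARIABLE  side of things (the record
layout  roughly  anticipates  the  C  example below; the "next" link and
the routine itself are my own additions):

   typedef struct var_chunk {
           struct var_chunk *next;   /* link to the next chunk */
           int  current_value;
           int  name_len;
           char name[1];             /* really name_len characters long */
           } var_chunk;

   var_chunk *first_var = 0;         /* head of the chain of chunks */

   var_chunk *findvariable (name, name_len)
   char name[];
   int name_len;
   {
   var_chunk *v;
   for (v = first_var; v != 0; v = v->next)
      if (v->name_len == name_len
            && strncmp (v->name, name, name_len) == 0)
         return (v);
   return (0);                       /* no such variable */
   }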

   *  In SPL, the only easy way of dynamically allocating memory is by
     calling  DLSIZE.  DLSIZE  will  get space from  the "DL-DB" area;
     unfortunately,  it'll only get it in  128-word chunks (if you ask
      for  a 2-word chunk it'll give you 128 words). Also, if we ever
     see a DELETEVAR command, there's no easy way of "returning space"
     to the system (always a difficult task).

   *  In PASCAL, we'd use the NEW procedure. You pass to NEW a pointer
     to  a given data type, and it will set that pointer to point to a
     newly allocated variable of that data type. Thus, you'd say:

       TYPE NAME_TYPE = PACKED ARRAY [1..80] OF CHAR;
            VAR_REC = RECORD
                      CURRENT_VALUE: INTEGER;
                      NAME_LEN: INTEGER;
                      NAME: NAME_TYPE;
                      END;
            VAR_REC_PTR = ^VAR_REC;
       ...
       FUNCTION SETVARIABLE (VAR NAME: NAME_TYPE;
                             NAME_LEN: INTEGER;
                             INIT_VALUE: INTEGER): VAR_REC_PTR;
       VAR RESULT: VAR_REC_PTR;
       BEGIN
       NEW (RESULT);
       RESULT^.CURRENT_VALUE:=INIT_VALUE;
       RESULT^.NAME_LEN:=NAME_LEN;
       RESULT^.NAME:=NAME;
       SETVARIABLE:=RESULT;
       END;

     Simple, eh?

   *  In  C, you'd do almost the same  thing. I say almost because the
     only  real  difference  is  that  C's  equivalent of  NEW, called
     CALLOC,  takes the number of elements to allocate (in our case 1,
     since  this is just a record and not an array of records) and the
     size of each element.

        typedef struct {int current_value;
                        int name_len;
                        char name[0];} var_rec;
        typedef var_rec *var_rec_ptr;
        ...
        var_rec_ptr setvariable (name, name_len, init_value)
        char name[];
        int name_len, init_value;
        {
        var_rec_ptr result;
                 /* Allocate an object and cast it to a VAR_REC pointer. */
        result = (var_rec_ptr) calloc (1, sizeof(var_rec) + name_len);
        result->current_value = init_value;
        result->name_len = name_len;
        strcpy (result->name, name);
        return (result);
        }

   Compare  PASCAL and C; ignore the small differences like the STRCPY
call  (it  just  copies  one  string  into  another).  The  important
difference  is in the CALLOC and NEW  calls, and it's an important one
indeed!

   *  IN  PASCAL,  A  CALL TO "NEW" ALLOCATES A  NEW OBJECT OF A GIVEN
     DATATYPE.  THE  SIZE  OF  THE  OBJECT IS UNIQUELY  DEFINED BY THE
     DATATYPE. WHAT ABOUT STRINGS???

NEW  is just great for allocating fixed-size objects, like the Process
Information   Records   we   talked  about  earlier.  But  what  about
variable-length things?

   When  we  defined  the  VAR_REC data type, we  defined NAME to be a
PACKED ARRAY [1..80] OF CHAR. This, however, isn't quite precise. What
this  means  to  us is that NAME can be  UP TO 80 characters long. But
when  NEW  allocates  new  objects  of  type  VAR_REC, it  will ALWAYS
allocate  them with room for 80 characters in NAME! Never mind that we
know  how long NAME should REALLY be -- we have no way of telling this
to NEW.

   C's  CALLOC, on the other hand, allows  us to specify the number of
bytes we need to allocate. The disadvantage of this is that we need to
figure out this number; this, however, isn't hard -- we just say

   sizeof (<datatype>)

e.g.

   sizeof (var_rec)

[Note  that VAR_REC was cunningly defined to  have the NAME field be 0
characters  long  --  since C never does  bounds checking anyway, this
won't cause any problems, but will make SIZEOF return the size of only
the fixed-length portion of VAR_REC.]

The  great advantage of CALLOC is  that for variable-length objects we
can  EXACTLY  indicate how much space is  to be allocated. Since space
savings  is one of the major  reasons we do dynamic memory allocation,
this  advantage of CALLOC -- or, more properly, disadvantage of NEW --
becomes very serious indeed. Not only does it waste space, but it also
impairs  flexibility,  since  in trying to save  space we restrict the
maximum  variable name length to 80  bytes, when we should really make
it virtually unlimited.

   Thus,  to summarize, C and  PASCAL both have relatively easy-to-use
dynamic   memory   allocation  mechanisms  (as  well  as  deallocation
mechanisms,  called DISPOSE in PASCAL and  CFREE in C). They both work
very  well  for allocating fixed-length  objects (or, parenthetically,
so-called  "variant records" which are  really variable-length in that
they  can have one of several  distinct formats). However, if you want
to  allocate  variable-length  objects  --  e.g.  strings  or  records
containing  strings  -- PASCAL CAN'T DO  IT WITHOUT WASTING INORDINATE
AMOUNTS OF MEMORY!


        DYNAMIC MEMORY ALLOCATION -- PASCAL/3000 AND PASCAL/XL

   Naturally,  I'm  not the first one to  notice this kind of problem.
PASCAL/XL  has a rather nice solution to  it (I only wish the Standard
PASCAL authors thought of it):

   P_GETHEAP (PTR, NUM_BYTES, ALIGNMENT, OK);

Using this built-in procedure, you can set PTR (which can be a pointer
of  any type) to point to  a newly-allocated chunk of memory NUM_BYTES
long.  ALIGNMENT  indicates  how to physically align  this chunk (on a
byte,   half-word,  word,  double-word,  or  page  boundary),  and  OK
indicates  whether  or not this request  succeeded (another thing that
Standard PASCAL NEW doesn't give you). The counterpart to DISPOSE here
is

   P_RTNHEAP (PTR, NUM_BYTES, ALIGNMENT, OK);

   These procedures seem every bit as good as CALLOC and CFREE -- they
let you allocate EXACTLY as much space as you need.

   PASCAL/3000  does not have the  P_GETHEAP and P_RTNHEAP procedures;
however,  hidden away in Appendix F of the PASCAL/3000 manual there is
a subsection called "PASCAL Support Library" (did you see this section
when  you  read  the  manual?).  Here, with the strong implication that
these  procedures  are  to  be  used from OTHER  languages rather than
PASCAL, are documented two procedures:

   GETHEAP (PTR, NUM_BYTES, OK);

and

   RTNHEAP (PTR, NUM_BYTES, OK);

   It  appears that these two procedures do pretty much the same thing
as  P_GETHEAP  and  P_RTNHEAP -- they allocate  an arbitrary amount of
space, and return a pointer to it in the variable PTR, which can be of
any type.

   Again,  I'm  not sure whether they were  even INTENDED to be called
from  PASCAL  or  only from other languages;  however, it appears that
they ought to work from PASCAL, too.


                        PASCAL/XL AND POINTERS

   As I mentioned earlier, Standard PASCAL allows pointers in one case
and  one case alone -- pointers to dynamically allocated (NEWed) data.
There's   no   way  to  make  a  pointer  to  point  to  a  global  or
procedure-local  variable.  PASCAL/3000  shares  this lack; it does let
you get the address of an arbitrary variable
(by calling WADDRESS, e.g. "WADDRESS(X)"), but it doesn't allow you to
do  the  inverse -- go from the address  (which is an integer) back to
the value.

   SPL and C, of course, allow you to do both. In SPL, you can say

   INTEGER POINTER IP;
   INTEGER I, J;
   @IP:=@I;    << set IP to the address of I >>
   J:=IP+1;    << get the value pointed to by IP >>

In C, you'd write

   int *ip;
   int i, j;
   ip = &i;        /* set ip to the address of i */
   j = *ip + 1;    /* get the value pointed to by ip */

Note  that C provides you two operators -- "&" to get the address of a
variable, and "*" to get the value stored at a particular address.

   PASCAL/XL,  in  essence, allows you to do  much the same thing. Its
ADDR  operator  determines  the  address of an  arbitrary variable and
returns it as a pointer. Thus, you can say:

   VAR IP: ^INTEGER;
       I, J: INTEGER;
   IP:=ADDR(I);
   J:=IP^+1;

A  very simple addition, but it allows  you to do virtually all of the
pointer  manipulation described earlier in the section -- you can have
pointers that point to one of several local arrays, pointers that step
through  a  string,  etc. To revive an  example from before, PASCAL/XL
lets you write:

   TYPE PAC256 = PACKED ARRAY [0..255] OF CHAR;
   PROCEDURE UPSHIFT_WORD (VAR S: PAC256);
   { Upshifts all the letters in S until a non-alphabetic }
   { character is reached; expects there to be at least one }
   { special character somewhere in S to act as a terminator. }
   VAR P: ^CHAR;
   BEGIN
   P:=ADDR(S);
   WHILE ('A'<=P^) AND (P^<='Z')   OR   ('a'<=P^) AND (P^<='z') DO
     BEGIN
      IF ('a'<=P^) AND (P^<='z') THEN
       P^:=CHR(ORD(P^)-32);   { upshift character }
     P:=ADDTOPOINTER(P,SIZEOF(CHAR));
     END;
   END;

Compare this with the corresponding C code:

   upshift_word (s)
   char s[];
   {
   char *p;
   p = &s[0];
   while ('A'<=*p && *p<='Z'   ||   'a'<=*p && *p<='z')
     {
     if ('a'<=*p && *p<='z')
       *p = (char) ((int)*p - 32);
     p = &p[1];
     }
   }

As  I  mentioned  before, one can legitimately  say that you should be
indexing into the string (e.g. S[I]) rather than using a pointer -- in
fact,  since  you  can't  use pointers to  local variables in Standard
PASCAL,  you'd  have  to use indexing. On  the other hand, many people
prefer using pointers and, as you see, PASCAL/XL allows you to do this
as easily as in C.


               SPL AND ITS LOW-LEVEL ACCESS MECHANISMS

   SPL,  being  a language designed explicitly  for the HP3000 and for
nitty-gritty  systems  programming,  has  a lot  of "low-level" access
mechanisms. These include:

   *  The ability to execute  any arbitrary machine instruction (using
     ASSEMBLE).

   *  The  ability  to  push things onto and  pop things off the stack
     (using TOS).

   *  The  ability  to  examine (PUSH), set  (SET), and reference data
     relative to various system registers (Q, DL, DB, S, X, etc.).

Standard C and PASCAL, naturally, do not have such mechanisms; neither
does   PASCAL/XL.   CCS,  Inc.'s  C/3000  does  have  "ASSEMBLE"-  and
"TOS"-like  constructs  (although their ASSEMBLE  is more difficult to
use than SPL's); however, it's by no means certain that C/XL will have
them.

   Now,  if we were simply counting  features, things would be simple.
We'd  credit SPL with 3 new statements (ASSEMBLE, PUSH, and SET) and 5
new  addressing  modes (TOS, X  register, DB-relative, Q-relative, and
S-relative), and that'd be that. Score: SPL 37, PASCAL 22, C 31.

   Of course, not every feature is worth as much as any other feature.
Many  people  complain  that it's BAD for  SPL to have these features;
that  whatever  performance  advantages  you can get  aren't worth the
additional  complexity and opacity; that, in general, PASCAL and C are
better off without them.

   Now  this may end up being a moot point, especially if C/XL ends up
not  having  ASSEMBLE  and  similar constructs. On  the other hand, it
might be nice to consider any cases there may be where such constructs
are really necessary -- if only for old times' sake.


           THE ARGUMENTS AGAINST ASSEMBLE, TOS, AND FRIENDS

   Before  we go further, let me outline the arguments -- most of them
perfectly  valid  --  that  have  been  made against  SPL's (and other
languages') low-level constructs:

   *  If  you're  using  low-level constructs  for performance's sake,
     you're  wasting  your  time.  Most compilers these  days are good
     enough  that they generate very efficient  code, and you can't do
     much  better using assembly. On the other hand, assembly is much,
     much  harder  to write, read, and  maintain than high-level code.
     It's just not worth it.

   *  If you're using low-level constructs for functionality, all that
     means  is  that  the  system  isn't  providing  you  with  enough
     fundamental  primitives that you could  use in place of assembly.
     For instance, the old trick of going through the stack markers, Q
     register, etc. to get your ;PARM= and ;INFO= -- there should have
     been an intrinsic to do that in the first place.

   * If the language has low-level programming constructs, people will
     use   them  out  of  thoughtlessness  or  a  misguided  sense  of
     efficiency, and thus produce awful, impossible to maintain, code.
     Languages  with  ASSEMBLE,  TOS, and the like,  are like a loaded
     gun,  an open invitation for anybody to shoot himself in the foot
     (or worse).

   * Finally, the more sophisticated (and efficient) the compiler, the
     more likely that it CAN'T let you do low-level stuff. How can you
     use  register-manipulation code if you  don't know what registers
     the  compiler  uses for itself (and it  can use different ones in
     different  cases)?  How  can you get things  off the stack if you
     don't  know whether the compiler is  keeping them on the stack or
     in  registers?  How  can you trace back  the stack markers if the
     compiler may do special call instructions or place code inline at
     its own discretion?

   The  first of these arguments, in my  opinion, is on the whole very
sound.  Very rarely do I find it desirable to use low-level constructs
for  efficiency's  sake.  Compared to programmer  and maintainer time,
computer  time is cheap. On the  other hand, when efficiency is really
very important -- and it's often the case that 5% of the code uses 95%
of the CPU time -- using low-level constructs for performance's sake may
be quite necessary.

   The  fourth  argument -- that a  smart modern compiler can't assure
you  about  the state of the world and  thus can't let you muck around
with it -- is very potent as well. In SPL, you can always say

   TOS:=123;
   << now, execute an instruction that expects something on TOS >>

What  if  on Spectrum you have an  instruction that expects a value in
register 10? You can't just say

   R10:=123;
   << execute the instruction >>

What  if  the compiler stored one of  your local variables in R10? How
does it know that it has to save it before your operation? Will saving
it  damage  the  machine state (e.g. condition  codes) enough that the
operation  won't  work  anyway? The classic example  of this in SPL is
condition codes -- saying

   FNUMS'ARRAY(I):=FOPEN (...);
   IF <> THEN
     ...

The  very  act  of  saving  the result of FOPEN  in the Ith element of
FNUMS'ARRAY  resets the condition code, thus  making the IF <> THEN do
something  entirely  unexpected!  Now,  an  SPL  user  can  know which
operations  change  the  condition code, and  which don't (maybe), and
thus avoid this sort of error -- but what about a PASCAL/XL user? Will
HP  be  obligated to tell all the  users which registers and condition
codes each operation modifies?

   The  third  of the arguments has some  merit, too. In fact, all you
need  to  do is to look at a  certain operating system provided by the
manufacturer  of  a  certain  large business minicomputer,  to see how
dangerous  TOSes and ASSEMBLEs are. I've  seen pieces of code that are
utterly  impossible to understand, where  something is pushed onto the
stack  only to be popped 60 lines and 10 GOTOs later -- ugh! There are
some language constructs that are just plain DANGEROUS.

   It's  the  second argument -- that all  the cases where you need to
use low-level code shouldn't have existed in the first place -- that I
don't  quite buy. What SHOULD be and what IS are two different things.
I personally wish that every case where I needed to use low-level code
was  already  implemented  for me by a  nice, readable, easy-to-use HP
intrinsic. Unfortunately, that's not always the case, and I shudder to
think what would have happened if I DIDN'T have a way of doing all the
dirty assembler stuff myself.

   Thus,  the  point here is: every system  SHOULD provide you all you
want,  and it SHOULD optimize your code well (perhaps even better than
you  could  do  it yourself using  assembler). Unfortunately, it often
DOESN'T,  and  you  need  some  way of getting around  it to do things
yourself.


         EXAMPLES OF THINGS YOU NEED LOW-LEVEL OPERATIONS FOR

                       SETTING CONDITION CODES

   For  better or worse, HP decided  that its intrinsics return part of
their  result  as  the  "condition code". This  value, actually 2 bits
in  the  STATUS  register,  can  be  set  to  the so-called
"less  than",  "greater  than",  and  "equal" values; to  see what the
current condition code value is, you can say in SPL:

   IF <> THEN     << or <, >, <=, >=, or = >>

or, in FORTRAN:

   IF (.CC.) 10, 20, 30      << go to 10 on <, 20 on =, 30 on > >>

(A similar mechanism exists in COBOL II.)

   Now,  have you ever wondered how  HP's intrinsics actually SET this
condition code? You can't just say:

   CONDITION'CODE:=<;

to set it to "less than".  What can you do?

   Now,  one  can  say -- and quite correctly  -- that it's not a good
thing  for a procedure to return  condition codes. Condition codes are
volatile  things; they're changed by almost every machine instruction;
for instance, as I mentioned before,

   FNUMS(I):=FOPEN (...);
   IF <> THEN
     ...

won't  do what you expect, since  the instructions that index into the
FNUMS  array  reset the condition code. Thus,  if you have the choice,
you  ought to return data as, say,  the procedure's return value, or a
by-reference parameter.

   Still,  sometimes  it's necessary to return  a condition code. Say,
for instance, that you have a program that's been written to use FREAD
calls,  and you decide to change it to call your own procedure called,
say,  MYREAD.  MYREAD  might,  for instance, do MR  NOBUF I/O to speed
things  up, or whatever -- the important  thing is that you want it to
be "plug compatible" with FREAD. You just want to be able to say

   /CHANGE "FREAD","MYREAD",ALL

and not worry about changing anything else.

   Well,  in C or PASCAL, you'd be USC (that's Up Some Creek). In SPL,
though,  you  can  do it. You have to  know that the condition code is
kept  in the STATUS register, a copy of  which is in turn kept in your
procedure's  STACK  MARKER at location Q-1. When  you do a return from
your  procedure  to  the caller, the EXIT  instruction sets the status
register  to the value stored in the stack marker. Thus, you just need
to  set the condition code bits in  Q-1 (something you can't do in any
language besides SPL) before returning from the procedure:

   INTEGER PROCEDURE MYREAD (FNUM, BUFFER, LEN);
   VALUE FNUM, LEN;
   INTEGER FNUM, LEN;
   ARRAY BUFFER;
   BEGIN
   INTEGER STATUS'WORD = Q-1;
   DEFINE CONDITION'CODE = STATUS'WORD.(6:2) #;
   EQUATE CCG=0, CCL=1, CCE=2;  << possible CONDITION'CODE values >>
   ...
   CONDITION'CODE:=CCE;
   END;

Relatively clean, as you see, but not doable without "low-level" access
(in this case, Q-relative addressing).

   Incidentally,  don't think this is  just a speculative example that
doesn't  happen in real life. I usually avoid using condition codes in
my RL (most of my procedures return a logical value indicating success
or   failure),   but   I   have   several  just  like  this  --  FREAD
plug-compatible replacements.

   Also,  I've sometimes had to write SL procedures that exactly mimic
intrinsics  like  READX,  FREAD,  FOPEN,  etc.  so that I  can patch a
program  file  to  call  this  procedure instead of  the HP intrinsic.
(VESOFT's  hook mechanism, which implements RUN, UDCs, super REDO, and
MPEX commands from within programs like EDITOR, QUERY, etc. works like
this.)

   One  can say that HP should  have provided a SET'CCODE intrinsic in
the  first place to do this; my only response is that it didn't, and I
have to somehow get my job done in spite of it.


                         SYSTEM TABLE ACCESS

   System  table  access  is  another thing that  I like to do  in a
high-level fashion, with as few ASSEMBLEs, TOSes, EXCHANGEDBs, etc. as
possible; unfortunately, it sometimes requires low-level access.

   The classic example is accessing system data segments, e.g.

   TOS:=@BUFFER;
   TOS:=DSEG'NUMBER;
   TOS:=OFFSET;
   TOS:=COUNT;
   ASSEMBLE (MFDS 4);   << Move From Data Segment >>

Originally,  in  SPL,  this  was the ONLY way  to access a system data
segment  (like the PCB, JMAT, your JIT, etc.). I didn't have an OPTION
--  do  it this way or some other,  somewhat slower way; it was either
use  TOS and ASSEMBLE (which I was scared to death of) or not do it at
all.

   Now, of course, SPL has the MOVEX statement, with which I can say

   MOVEX (BUFFER):=(DSEG'NUMBER,OFFSET),(COUNT);

to  do  exactly  the  same  thing  without  any  unsightly  ASSEMBLEs.
Remember,  though,  that  this construct is a  recent addition to SPL;
when  I  started hacking on the HP3000 in  1979, it wasn't there, so I
had to do without it.

   Another  example of system table access is access to the PXGLOB, a
table  that  lives  in  the  DL-  negative  area  of your  stack. Most
languages  can't  access  this  area, and even SPL  doesn't give you a
direct  way of getting to it; but,  with the PUSH statement, it can be
done:

   INTEGER ARRAY PXGLOB(0:11);
   INTEGER POINTER DL'PTR;
   ...
   PUSH (DL);
   @DL'PTR:=TOS;
   MOVE PXGLOB:=DL'PTR(-DL'PTR(-1)),(12);

We  use SPL's ability to set a pointer to point to any location in the
stack  (in this case, the location pointed to by the DL register), and
then  index  from  there. Again, there's no  way of doing this without
using PUSH and TOS.


              OTHER APPLICATIONS OF LOW-LEVEL CONSTRUCTS

   Some  other  cases  where  ASSEMBLE, TOS, etc.  are necessary to do
things:

   *  If  you look in the "CONTROL  STRUCTURES" section of this paper,
     you'll  see  a  PASCAL/XL  construct  called TRY  .. RECOVER. For
     reasons  that I explain in that section, I think that it's a very
     useful  construct, and I've implemented it in SPL for the benefit
     of my SPL programs.

     Note that in any other language, I couldn't do this; only in SPL,
     with  its  register  access  and  especially  the  ability  to do
     Q-relative  addressing to access stack markers, could I implement
     this entirely new control structure.

   *  Whenever you write an SPL OPTION VARIABLE procedure, you need to
     be able to access its "option variable bit mask", which indicates
     which  parameters  were  passed  and  which  were  omitted.  This
     information  is  stored at Q-4 (and  also sometimes at Q-5); with
     SPL's  Q-relative addressing, you can access it. Again, maybe SPL
     should  have  a  built-in  construct  that lets you  find out the
     presence/absence  of  an  OPTION VARIABLE  parameter; however, it
     does not.

   *  To  determine your run-time ;PARM= or  ;INFO= value, you need to
     look  at  your  "Qinitial"-relative  locations  -4,  -5,  and -6.
     Qinitial refers to the initial value of the Q register; to get to
     it, you have to go through all your stack markers, which requires
     Q-relative  addressing. HP's new GETINFO  intrinsic does this for
     you;  it  was released in 1987, whereas  the HP3000 was first put
     out in 1972.

   *  HP's  LOADPROC intrinsic dynamically loads  a procedure from the
     system SL and returns you the procedure's plabel. How do you call
     the  loaded  procedure? You have to  push all the parameters onto
     the stack and then do an ASSEMBLE, to wit:

        TOS:=0;   << room for the return value >>
        TOS:=@BUFF;   << parameter #1 >>
        TOS:=I+7;     << parameter #2 >>
        TOS:=PLABEL;  << the plabel returned by LOADPROC >>
        ASSEMBLE (PCAL 0);
        RESULT:=TOS;  << collect the return value >>

   *  Say  that,  in the middle of executing  a procedure, you need to
     allocate  X words of space. You can try allocating it in your DL-
     area,  but  then you'll have a  hard time deallocating it (unless
     you want to write your own free space management package). If you
     need  this  space  only  until the end of  the procedure, you can
     simply say:

        INTEGER S0 = S-0;   << S0 now refers to the top of stack >>
        INTEGER POINTER NEWLY'ALLOCATED;
        ...
        @NEWLY'ALLOCATED:=@S0+1;
        TOS:=X;   << the amount of space to allocate >>
        ASSEMBLE (ADDS 0);  << allocate the space on the stack >>

     NEWLY'ALLOCATED  now  points  to  the X words  of newly allocated
     stack  space. Exiting the procedure will deallocate the space, as
     will saying

        TOS:=@NEWLY'ALLOCATED-1;
        SET (S);

   *  XCONTRAP sets up a procedure as a control-Y trap procedure; when
     the user hits control-Y, the procedure is called. However, if the
     control-Y  is  hit  at  certain  times (say, in  the middle of an
     intrinsic call), there'll be some junk left on the stack that the
     trap  procedure will have to pop. The way the SPL manual suggests
     you do this is by saying:

        PROCEDURE TRAP'PROC;
        BEGIN
        INTEGER SDEC=Q+1;   << indicates the amount of junk to pop >>
        ...
        TOS:=%31400+SDEC;   << build an EXIT instruction! >>
        ASSEMBLE (XEQ 0);   << execute the value in TOS! >>
        END;

     This  is,  of  course,  incredibly  ugly  code  --  you  build an
     instruction  on  top of the stack and  then execute it! -- and HP
     should  certainly have designed its control-Y trap mechanism some
     other  way. On the other hand, I  can't do anything about it; I'm
     stuck with it, and I have to have some way of dealing with it.


                  CONCLUSION: HOW BAD IS LOW-LEVEL?

   As  you  see,  for  all  the  bad things that  have been said about
ASSEMBLEs and TOSes, sometimes they are necessary to get things done.

   Almost by definition, every case in which they are necessary indicates
something wrong with the operating system. In every case shown above,
I  SHOULDN'T  have  to  stoop to ASSEMBLEs et  al.; there SHOULD be HP
intrinsics  to set the condition code, get the ;PARM= and ;INFO=, move
things to/from data segments, access DL-negative area, allocate things
on the stack, do TRY .. RECOVER, etc.

   And,  as you see, many of the  problems I discussed above HAVE been
fixed -- in new versions of MPE, of SPL, of PASCAL/XL. The root of the
problem, though, remains the same:

   * HP WILL NEVER THINK OF EVERYTHING.

The  users'  needs  will  always  outstrip HP's  clairvoyance. The big
advantage  of SPL was that it gave you the tools to satisfy almost any
need you had (at a substantial cost in blood, sweat, toil, and tears).

   I  only hope that on Spectrum, HP has some mechanism -- an ASSEMBLE
construct in PASCAL/XL or C/XL, or, perhaps, a separate assembler that
is  accessible  to  the users -- with  which Spectrum users can attack
problems that HP hasn't thought of.


              THE STANDARD LIBRARIES IN DRAFT STANDARD C

   One  of C's features that its fans  are justifiably proud of is the
tendency  of  many  C  compilers  to  provide lots  of useful built-in
"library  functions",  which  do  things  like  I/O,  string handling,
mathematical  functions  (exp,  log,  etc.), and more.  In addition to
getting "the language" itself, C proponents say, you also get a lot of
nice functionality that you COULD have implemented yourself, but would
rather not have to.

   Now,  Standard PASCAL has some such  functions (mostly in the field
of  I/O  and mathematics); PASCAL/3000 and  PASCAL/XL add more (mostly
strings  and more I/O); SPL, being fixed  to the HP3000, relies on the
HP3000 System Intrinsics.

   Kernighan  &  Ritchie  C,  to  be  honest, is  actually INFERIOR to
Standard  PASCAL  insofar  as  built-in  functions  go --  although it
defines  a  standard  set  of  I/O  functions,  it doesn't  define any
standard  mathematical  functions, nor does  it define standard string
handling functions (which Standard PASCAL doesn't, either).

   However,  many  C  compilers  quickly  evolved  their  own  sets of
supported  library  functions,  and  the  Draft  Proposed  C  Standard
enumerates   and   standardizes   them   all.  Remember,  though,  the
considerations involved in relying on the Draft Proposed C Standard:

   *  On  the one hand, since the Standard  is new and not even finalized
      yet,  most existing compilers are likely  to differ from it in quite
      a  few  respects. In fact, it'll  probably be years before most C
      compilers fully conform to the new Standard.

   *  On  the  other  hand,  the  Draft  Standard is  not created from
     scratch.  All or most of the  functionality that it sets down has
     already  been  implemented  in  one  or  more  of the  existing C
     compilers.  In particular, at least  string handling packages and
     mathematical functions are available in virtually all C compilers
     (although not necessarily entirely standardized).

   The question of these sorts of built-in support functions is not an
earth-shaking  one; almost by definition  of C, any function described
here  can  be  implemented  by  any  C  programmer, and  most of these
functions are probably ordinary C-written functions that just happened
to have been provided by the compiler writer.

   However,  I  think  that it is somewhat  important to mention these
functions simply because although you CAN write them, you'd rather not
write  anything you don't have to. Any time the standard is thoughtful
enough to provide date and time support (how many thousands of various
personal  implementations of date handlers are  there? and how many of
them  actually  work?)  or built-in binary  search or sort mechanisms,
that's something to be thankful for.


          INTERESTING FUNCTIONS PROVIDED BY DRAFT STANDARD C

* RAND, a random number generator. Quite simple to implement yourself.

*  ATEXIT, which allows you to specify  one or more functions that are
  to be called when the program terminates normally. These may release
  various resources, flush buffers to disc, etc.

  Actually,  this is a very useful  construct, one that I think should
  be  available  in  any  language on any  operating system. The major
  problem  here  is that you'd like the  ATEXIT functions to be called
  whenever  the program terminates, whether normally or abnormally. (A
  usage sketch appears after this list.)

*  BSEARCH, which does a binary search  of a sorted in-memory array. A
  nice  thing,  especially  since  it  often  isn't  provided  by  the
  underlying  operating system (for instance,  there's no intrinsic to
  do this on the HP3000). Note, however, that this is quite limited in
  application,  since  you usually want to  search files or databases,
  not just simple arrays.

*  QSORT,  which  sorts  an  in-memory array. Again,  rather nice, but
  limited  because it only works on  in-memory arrays and not on files
  or  databases.  (QSORT  and  BSEARCH  both appear  in the sketches
  after this list.)

*  MEMCPY  and  MEMMOVE,  which can copy  arrays very fast (presumably
  faster  than  a  normal  FOR  loop).  This  is comparable  with, but
  different from, PASCAL/XL's MOVE_FAST, MOVE_L_TO_R, and MOVE_R_TO_L.

  Note  that  this  is  somewhat  different in spirit  from the string
  handling  functions like STRCPY and STRNCPY  -- this is intended for
  arbitrary   arrays,   and  doesn't  care  about,  say,  '\0'  string
  terminators.

* MEMCHR finds a character in an array; MEMSET sets all elements of an
  array  to a given character. Again,  note the emphasis here on speed
  (if  the  computer  supports  special  fast-search  and  fast-set
  instructions, as the HP3000 does, these functions ought to use them)
  and  on  working  with  ARRAYS  rather  than STRINGS (neither function
  cares about '\0' string terminators).

* Built-in DATE and TIME handling functions:

  - Return the current date and time.

  -  Convert  the  internal  date/time  representation to  a structure
    containing  year,  month,  day, hour, minute,  and second; convert
    backwards, too.

  - Compute the difference between two dates and times.

  -  Convert  an  internal  time  into  a text string  of an arbitrary
    user-specified format. You can, for instance, say,

      strftime (s, 80, "%A, %d %B %Y;  %I:%M:%S %p", &time);

    and  the  string  S  (whose maximum length was  given as 80) would
    contain a representation of "time" as, for instance:

      Thursday, 29 February 1968;  04:50:33 PM

    The  third  parameter to STRFTIME is  a format string; "%A" stands
    for the full weekday name, "%d" for the day of the month, "%B" for
    the  full  month name, etc. As you  can see, this is a non-trivial
    feature, one that many operating systems (e.g. MPE) don't provide,
    and  one that you'd rather not have to implement yourself. (These
    date and time calls appear in the sketches after this list.)

*  Character  handling  functions,  such as "isalpha"  (is a character
  alphabetic  or not?), "isdigit" (is it a digit?), "toupper" (convert
  a character to uppercase), etc.

*  If you care about these sorts  of things, Draft Standard C provides
  for  "native  language"  support  (called  "localization"  in  the C
  standard).  This  means  that  "isalpha",  string  comparisons,  the
  time-handling  functions,  etc.  are  defined to  return whatever is
  appropriate for the local language and character set, be it English,
  Dutch, Czech, or Swahili (well, maybe not Swahili).
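
   To  make these descriptions a bit  more concrete, here are a few
small  sketches of the library calls in use. They are mine, not taken
from  the  Draft Standard document, and the data, messages, and function
names in them are made up purely for illustration. First, RAND and
ATEXIT:

   #include <stdio.h>
   #include <stdlib.h>

   static void farewell (void)
   {
      /* called automatically when the program terminates normally */
      printf ("flushing buffers, releasing resources...\n");
   }

   int main (void)
   {
      atexit (farewell);      /* register the exit-time function        */

      printf ("a random number: %d\n", rand ());   /* 0 .. RAND_MAX     */

      return 0;       /* normal termination -- "farewell" gets called   */
   }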
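
   Next, QSORT, BSEARCH, and MEMSET, working on an ordinary in-memory
integer array:

   #include <stdio.h>
   #include <stdlib.h>
   #include <string.h>

   /* one comparison function serves both qsort and bsearch */
   static int compare_ints (const void *a, const void *b)
   {
      int x = *(const int *) a, y = *(const int *) b;
      return (x > y) - (x < y);
   }

   int main (void)
   {
      int  table[5] = { 44, 7, 109, 3, 86 };
      int  key = 86;
      int *found;
      char buffer[80];

      qsort (table, 5, sizeof (int), compare_ints);   /* sort in place  */

      found = bsearch (&key, table, 5, sizeof (int), compare_ints);
      if (found != NULL)
         printf ("found %d in the sorted table\n", *found);

      memset (buffer, ' ', 80);       /* blank-fill the buffer, fast    */

      return 0;
   }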
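
   And  finally the date/time and character handling calls, plus
SETLOCALE  for the localization support  mentioned above (the output,
naturally, depends on when and where you run it):

   #include <stdio.h>
   #include <time.h>
   #include <ctype.h>
   #include <locale.h>

   int main (void)
   {
      time_t     now;
      struct tm *parts;
      char       text[80];
      int        c = 'q';

      setlocale (LC_ALL, "");     /* use the local language conventions */

      time (&now);                /* the current date and time          */
      parts = localtime (&now);   /* broken into year, month, day, ...  */

      strftime (text, 80, "%A, %d %B %Y;  %I:%M:%S %p", parts);
      printf ("%s\n", text);

      if (isalpha (c))
         printf ("%c upshifts to %c\n", c, toupper (c));

      return 0;
   }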


                               SUMMARY

   If  you have not yet guessed, I  am by nature a loquacious man. For
every  issue I've raised, I've  spent pages providing examples, giving
arguments, discussing various points of view.

   This  was  all  intentional;  rather  than  just presenting  my own
opinions,  I  wanted to give as many of  the facts as possible and let
you  come  to your own conclusions. However,  this resulted in a paper
that  was  200-odd  pages  long  -- not, I  would conjecture, the most
exciting and titillating 200 pages that were ever written.

   In  this  section  I want to present a  summary of what I think the
various  merits  and  demerits  of SPL, PASCAL, and  C are. All of the
things  I mention are discussed in more detail elsewhere in the paper,
so if you want clarification or evidence, you'll be able to find it. I
hope,  though,  that these lists themselves  might put all the various
arguments and alternatives in better perspective.

   Remember,  however,  as  you  read  this:  if  it  all  sounds
opinionated  and  subjective,  the  evidence  is  elsewhere  in  the
paper, should you want to read it!


                         THE TROUBLE WITH SPL

   [This  section includes all those things that make SPL hard to work
in. This isn't just "features that exist in other languages but not in
SPL" -- these are what might be considered drawbacks (serious or not),
things  that you're likely to run into and regret while programming in
SPL.]

   *  SPL  IS COMPLETELY NON-PORTABLE. There  is no HP-supplied Native
     Mode  SPL  on  Spectrum, and certainly not  on any other machine.
     (Note:  Software Research Northwest intends to have a Native Mode
     SPL compiler released by MAY 1987.)

   *  SPL's  I/O  FACILITY  FRANKLY  STINKS. Outputting  and inputting
     either  strings  or numbers is a  very difficult proposition -- I
     think  this  is  the  major  reason  why more  HP3000 programmers
     haven't learned SPL.

   *  SPL HAS NO RECORD STRUCTURES. This  is a severe problem, but not
     fatal  --  there  are  workarounds,  though none of  them is very
     clean. See the "DATA STRUCTURES" section for more details.


                   THE TROUBLE WITH STANDARD PASCAL

   * STANDARD PASCAL's PROCEDURE PARAMETER TYPE CHECKING IS MURDEROUS:

     -  YOU  CAN'T  WRITE  A  GENERIC  STRING  PACKAGE  OR  A  GENERIC
       MATRIX-HANDLING  PACKAGE because the  same procedure can't take
       parameters of varying sizes! That's right -- you either have to
       have  all  your strings be 256-byte  arrays (or some such fixed
       size),  or have a different procedure  for each size of string!
       Try  writing a general matrix multiplication routine; it's even
       more fun.

     -  YOU CAN'T WRITE A GENERIC ROUTINE THAT HANDLES DIFFERENT TYPES
       OF  RECORD STRUCTURES OR ARRAYS FOR  ARGUMENTS. Say you want to
       write  a procedure that, say, does a DBPUT and aborts nicely if
       you  get an error; or does a DBGET; or does anything that might
       cause  it  to  want  to take a parameter  that's "AN ARRAY OR A
       RECORD OF ANY TYPE". You can't do it! You must have a different
       procedure for each type!

     -  YOU CAN'T WRITE A  WRITELN-LIKE PROCEDURE THAT TAKES INTEGERS,
       STRINGS,  OR  FLOATS  (perhaps  to  format  them  all  in  some
       interesting way).

   *  IN STANDARD PASCAL, YOUR PROGRAM AND ALL THE PROCEDURES IT CALLS
     MUST  BE  IN  THE  SAME FILE! That's right  -- if your program is
     20,000  lines, it must all be in one  file, and all of it must be
     compiled together.

   *  STANDARD PASCAL HAS NO  BUILT-IN STRING HANDLING FACILITIES, AND
     NO  MECHANISM  FOR  YOU  TO  IMPLEMENT THEM. Not  only are simple
     things  like  string  comparison, copying, etc.  (which are built
     into  SPL)  missing;  you  can't  write  generic  string handling
     routines of your own unless all your strings have the same length
     and occupy the same amount of space (see above)!

   * STANDARD PASCAL's I/O FACILITY IS ABYSMAL.

      -  YOU CAN'T WRITE A STRING  WITHOUT CAUSING THE OUTPUT DEVICE TO
        GO  TO  A  NEW LINE (i.e. you  can't just "prompt the user for
        input" and have the cursor remain on the same line).

     - YOU CAN'T OPEN A FILE FOR APPEND OR INPUT/OUTPUT ACCESS.

     -  YOU CAN'T OPEN A FILE WITH A GIVEN NAME. So you want to prompt
       the  user  for a filename and open  that file? Tough cookies --
       Standard PASCAL has no way of letting you do that.

     -  IF  YOU PROMPT THE USER FOR  NUMERIC INPUT, THERE'S NO WAY FOR
       YOUR PROGRAM TO CHECK IF HE TYPED CORRECT DATA. Say you ask him
       for  a  number  and  he types "FOO";  what happens? The program
       aborts!  It doesn't return an error  condition to let you print
        an error and recover gracefully -- it just aborts!

      -  SIMILARLY, IF YOU TRY TO OPEN  A FILE AND IT DOESN'T EXIST (or
        some   such  file  system  error  occurs  on  any  file  system
        operation),  YOU  DON'T  GET  AN  ERROR  CODE  BACK --  YOU GET
        ABORTED!  What a loss!

      -  YOU  CAN'T  DO  "DIRECT ACCESS" --  READ OR WRITE A PARTICULAR
        RECORD  GIVEN  ITS RECORD NUMBER. Think  about it -- how can you
        write  any  kind of disc-based data  structure (like a KSAM- or
        IMAGE-like  file)  without  some direct  access facility? Even
        FORTRAN IV has it!

   *  OTHER, LESS PAINFUL, BUT  STILL SIGNIFICANT LIMITATIONS INCLUDE:
     (These  are  things which you can  certainly live without, unlike
     some  of  the  problems  above,  which  can  be  extremely grave.
     However,  although  you  can  live  without them,  they are still
     desirable,  and  in  Standard  PASCAL  --  partly because  of its
     restrictive  type  checking  --  you CAN'T emulate  them with any
     degree of ease. Their lack, incidentally, is felt particularly in
     writing large system programming applications.)

     -  STANDARD  PASCAL  DOESN'T ALLOW YOU  TO DYNAMICALLY ALLOCATE A
       STRING  OF  A  GIVEN  SIZE  (where  the  size  is not  known at
       compile-time).  PASCAL  talks  much  about its  NEW and DISPOSE
       functions, which dynamically allocate space at run-time.

       These  functions  are  certainly  very useful, and  are in fact
       essential  to  many systems  programming applications. However,
       say you want to allocate an array of X elements, where X is not
       known  at compile-time -- YOU CAN'T!  You can allocate an array
       of,  say,  1024 elements, forbidding X  to be greater than 1024
       and  wasting space if X is less than 1024; you CAN'T simply say
       "give me X bytes (or words) of memory".

     -  STANDARD  PASCAL  HAS  NO  REASONABLE  MECHANISM  FOR DIRECTLY
       EXITING OUT OF SEVERAL LAYERS OF PROCEDURE CALLS. Say that your
       lowest-level  parsing  routine  detects  a syntax  error in the
       user's  input  and  wants  to  return  control directly  to the
       command  input  loop  (the  larger  and  more  complicated your
       application,  the  more  common  it  is  that  you  want  to do
       something like this).

       "Un-structured"  as  this  may seem, it  can be quite essential
       (see  the  "CONTROL  STRUCTURES  --  LONG JUMPS"  chapter), and
       Standard  PASCAL provides only very  shabby facilities of doing
       this.

      -  STANDARD  PASCAL HAS NO WAY OF HAVING VARIABLES THAT POINT TO
        FUNCTIONS  AND PROCEDURES. Strange as  this may seem, variables
        that  point to procedures/functions can  be VERY useful -- see
       the  chapter  on  "PROCEDURE  AND FUNCTION  VARIABLES" for full
       details.  Interestingly, even Standard  PASCAL recognizes their
       utility by implementing PARAMETERS that point to procedures and
       functions,  but  it  doesn't  go all the  way and let arbitrary
       VARIABLES do it.

   If  you respond that many PASCALs fix many of these drawbacks, I'll
agree  --  BUT  WHAT HAPPENS TO PORTABILITY?  If you use PASCAL/3000's
string handling package (a pretty nice one, too), how are you going to
port  your  program to, say, a PC  implementation that has a different
string  handling  package?  What's more, some  implementations -- like
PASCAL/3000  itself -- don't solve many of the most important problems
listed above!


                THE TROUBLE WITH KERNIGHAN & RITCHIE C

   *  WHERE STANDARD PASCAL's PROCEDURE PARAMETER TYPE CHECKING IS TOO
     RESTRICTIVE, K&R C's IS NON-EXISTENT! If you write a procedure

        p (x1, x2, x3)
        int x1;
        char *x2;
        int *x3;

     and call it by saying

        p (13.0, i+j, 77, "foo")

     the  compiler won't utter a peep. Not only won't it automatically
     convert  13.0  to  an integer -- it  won't print an error message
     about that, OR that "I+J" is not a character array, OR that 77 is
     not an integer pointer (which probably means that P expects X3 to
     be a by-reference parameter and you passed a by-value parameter),
      OR EVEN THAT YOU PASSED THE WRONG NUMBER OF PARAMETERS! (See the
      sketch  after  this list for how a  Draft Standard C prototype
      would catch these mistakes.)

   *  WHILE NOT AS BAD AS PASCAL's,  K&R C's I/O FACILITIES STILL HAVE
     SOME MAJOR LACKS. Most serious are:

     - NO DIRECT ACCESS (read record #X).

     - NO INPUT/OUTPUT ACCESS.

   *  K&R  C, THOUGH FAIRLY STANDARD,  HAS NO STANDARD STRING PACKAGE.
     Unlike  in  Standard  PASCAL, though, it's  fairly easy to write,
     since  you  CAN write a C procedure  that takes, say, a string of
     arbitrary length.

   *  C  IS UGLY AS SIN. At  least that's what some PASCAL programmers
     say;  C programmers obviously disagree. People complain that C is
     just  plain UGLY and thus  (subjectively) difficult to read; they
     talk about everything from the "{" and "}" that C uses instead of
     "BEGIN" and "END" to C's somewhat arcane operators, like "+=" and
     "--".  I'm  not  saying  that  this  is  either  TRUE  or  FALSE;
     unfortunately, it's much too subjective to discuss in this paper.

     However, don't be surprised if you decide that on all the merits,
     C  is  superior but you can't stand  writing with all these funny
     special  characters;  or, that PASCAL is  the best, but it's much
     too verbose for you!
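
   To  make  the  parameter  checking  point concrete,  here is a sketch
--  mine,  with  made-up  names and a dummy  body -- of the same
procedure  declared  with a Draft Standard  C prototype; with the
prototype  in  scope, the bad call  shown above becomes a compile-time
complaint instead of a silent disaster:

   void p (int x1, char *x2, int *x3);   /* prototype: parameter types
                                            and count are now checked  */

   int main (void)
   {
      int  n = 77;
      char name[] = "foo";

      p (13, name, &n);          /* a call that matches the prototype   */

      /* The call from the text above,

            p (13.0, i + j, 77, "foo");

         would now draw compile-time complaints: too many arguments,
         and the second and third arguments are not the pointers the
         prototype asks for.  A lone 13.0 passed for x1, on the other
         hand, would simply be converted to the int that p expects.    */

      return 0;
   }

   void p (int x1, char *x2, int *x3)
   {
      *x3 = x1;         /* dummy body, just to keep the sketch whole    */
      (void) x2;
   }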


  IS ISO LEVEL 1 STANDARD PASCAL ANY BETTER THAN THE ANSI STANDARD?

   *  THE ONLY DIFFERENCE BETWEEN ISO  LEVEL 1 STANDARD PASCAL AND THE
     ANSI  STANDARD (what I call simply  "Standard PASCAL") IS THAT IT
     ALLOWS  YOU  TO  WRITE  PROCEDURES  THAT TAKE  ARRAYS OF VARIABLE
     SIZES.  This  eliminates  one  of the worst  problems in Standard
     PASCAL -- that you can't write a generic string handling package,
     or  a matrix multiplication routine, etc.; however, all the other
     problems  (lack  of  separate  compilation, bad  I/O, etc.) still
     remain.

   *  NOTE  THAT  IT'S  NOT  CLEAR HOW MANY  NEW PASCAL COMPILERS WILL
     FOLLOW  THE ISO LEVEL 1 STANDARD. The ISO Standard document makes
      it  clear  that  implementing  this feature  is optional (without
      it,   a   compiler  will  conform  only  to  the  "ISO  Level  0
      Standard"); PASCAL/3000 doesn't implement it, but PASCAL/XL does.


         IS PASCAL/3000 ANY BETTER THAN ANSI STANDARD PASCAL?

   * PASCAL/3000 supports:

     - A PRETTY GOOD STRING PACKAGE.

     - IMPROVED, though still somewhat difficult, I/O.

     - THE ABILITY TO COMPILE A PROGRAM IN SEVERAL PIECES.

     -  THE  ABILITY  TO  WRITE  A PROCEDURE OR  FUNCTION THAT TAKES A
       STRING (BUT NOT ANY OTHER KIND OF ARRAY) OF VARIABLE SIZE.

   * The remaining problems still include:

     -  PARAMETER  TYPE CHECKING STILL WAY  TOO TIGHT. You still can't
       write a procedure that takes an integer array of arbitrary size
       or  an  arbitrary  array/record  (say,  to do  DBGETs or DBPUTs
       with);  you  still  can't  write, say,  a matrix multiplication
       routine (just as an example).

     - I/O STILL HAS PROBLEMS:

       *  IT'S  STILL  VERY DIFFICULT (not  impossible, but still very
         painful) TO TRAP ERROR CONDITIONS, SUCH AS FILE SYSTEM ERRORS
         OR INCORRECT NUMERIC INPUT.

       *  YOU CAN'T OPEN A FILE USING  "FOPEN" AND THEN USE THE PASCAL
         I/O  SYSTEM  WITH  IT.  This means that any  time you need to
         specify  a  feature that PASCAL's OPEN  doesn't have (such as
         "open   temporary   file",  "build  a  new  file  with  given
         parameters",  "open  a file on the  line printer", etc.), you
         can't  just call FOPEN and  then use PASCAL's I/O facilities.
         You  either have to do  all FOPEN/FWRITE/FCLOSEs, or you have
         to  issue  a  :FILE  equation,  which is  difficult and still
         doesn't give you all the features you want.

     -  THE  "LESS IMPORTANT BUT  STILL SUBSTANTIAL" LIMITATIONS STILL
       EXIST -- it's hard to allocate variable-size strings, you can't
       immediately  exit several levels of nesting, and you can't have
       variables that point to functions or procedures.

   *  REMEMBER -- YOU CAN'T COUNT  ON "A PARTICULAR IMPLEMENTATION" TO
     SAVE   YOU  HERE!  If  you  could  live  with  Standard  PASCAL's
     restrictions  by knowing that, say, string handling or a good I/O
     facility   would   surely   be   implemented  by  any  particular
     implementation,    remember:    PASCAL/3000   is   a   particular
     implementation!  If you run into  a restriction with PASCAL/3000,
     that's  it; you either have to work  around it or use a different
     language.


           IS PASCAL/XL ANY BETTER THAN THE ANSI STANDARD?

   Surprisingly,  yes.  ALL  OF  THE  MAJOR PROBLEMS I  POINTED OUT IN
STANDARD  PASCAL SEEM TO HAVE BEEN  FIXED IN PASCAL/XL! The only words
of caution are:

   *  IT  MAY  BE  GREAT,  BUT  IT'S  NOT  PORTABLE  --  NOT  EVEN  TO
     PRE-SPECTRUMS!  HP  still  hasn't announced when  (if ever) it'll
     implement  all of PASCAL/XL's great  features on the pre-Spectrum
      machines.  As  long  as  it doesn't, you'll  have to either avoid
      using all of PASCAL/XL's wonderful improvements, or be stuck with
      code that won't run on pre-Spectrum 3000's!

   *  BE SKEPTICAL. "New implementations" always look great, precisely
     because  we haven't had the chance to really use them. For all we
      know,  the  compiler  may  be  riddled  with bugs, or  it might be
     excruciatingly  slow  in  compiling  your  program,  or  it might
     generate  awfully  slow  code!  Even  more  likely, there  may be
     serious design flaws that make programming difficult -- it's just
     that  we  won't  notice  them  until  we've programmed  in it for
     several months! As I said, BE SKEPTICAL.


     IS DRAFT ANSI STANDARD C BETTER THAN KERNIGHAN & RITCHIE C?

   Again,  it seems it might be!  It's standardized the I/O and string
handling  facilities (and they're pretty good  ones at that), AND it's
implemented some nice-looking parameter checking. Still, beware:

   *  BEING A "DRAFT STANDARD", IT  MIGHT BE YEARS (OR DECADES) BEFORE
     ALL  OR MOST C COMPILERS HAVE ALL OF ITS FEATURES. Note, however,
     that  most  modern C compilers already  include some of the Draft
     Standard's  new  features, except for  the strengthened parameter
     checking, which is still relatively rare.

   * IF YOU THOUGHT KERNIGHAN & RITCHIE C WAS UGLY, YOU'LL STILL THINK
     THIS  ABOUT DRAFT STANDARD C. I don't want to imply that K&R C IS
     ugly   --  it's  just  that  many  old  SPL,  PASCAL,  and  ALGOL
     programmers  think so. It may not be objectively demonstrable, or
     even  objectively discussible; however,  that's the reaction I've
     seen  in some (more than a few!) people. All I can say is this --
     if you suffer from it, the Draft Standard still won't help you.


      NICE FEATURES THAT SOME LANGUAGES DON'T HAVE AND OTHERS DO

   The  "PROBLEMS  WITH"  sections  discussed  things that  could make
programming in SPL, PASCAL, or C a miserable experience. It emphasized
some  things that were show-stoppers and  others that simply frayed on
the nerves; one thing it conspicuously EXCLUDED were the good features
that you could live without, but would rather have. The following is a
summary  of all these, plus some of the things we've already mentioned
above.

[Legend: "STD PAS" = Standard PASCAL or ISO Level 1 Standard;
         "STD C" = Draft Proposed ANSI Standard;
         "YES" = good implementation of this feature;
         "YES+" = excellent or particularly nice implementation;
         "YES-" = OK, so they've got it, but it's rather ugly;
         "NO" = no;
         "HNO" = Hell, no!;
         "---" = Major loss!  No support of REALLY IMPORTANT feature]


                                       STD  PAS/ PAS/ K&R  STD
                                       PAS  3000 XL   C    C    SPL

RECORD STRUCTURES                      YES  YES  YES  YES  YES  NO

STRINGS                                ---  YES+ YES+ YES- YES+ YES

ENUMERATED DATA TYPES                  YES  YES  YES  NO   YES- NO
  (see "DATA STRUCTURES")

SUBRANGE TYPES                         YES  YES  YES  NO   NO   NO
  (see "DATA STRUCTURES"; may not
   be all that useful)

OPTIONAL PARAMETER/VARIABLE NUMBER     NO   NO   YES+ YES- YES  YES
  OF PARAMETERS SUPPORT
  (like SPL "OPTION VARIABLE")

NUMERIC FORMATTING/INPUT               YES- YES- YES  YES+ YES+ YES-

FILE I/O                               YES- YES  YES+ YES- YES  YES
  (see "FILE I/O" chapter for more)

BIT ACCESS                             NO   YES  YES  YES  YES  YES+
  (see "OPERATORS")

POINTER SUPPORT                        NO   NO   YES  YES  YES  YES

THE ABILITY TO WRITE PROCEDURE-LIKE    NO   NO   YES  YES  YES  NO
  CONSTRUCTS THAT ARE COMPILED
  "IN-LINE", FOR MAXIMUM EFFICIENCY
  PLUS MAXIMUM MAINTAINABILITY

LOW-LEVEL ACCESS                       HNO  HNO  HNO  NO   NO   YES
  (ASSEMBLEs, TOS, registers --
   often useless, sometimes vital!)


               REALLY NICE FEATURES TO PAY ATTENTION TO

   Just some interesting things, mostly implemented in only one of the
three languages. I just wanted to draw your attention to them, because
they can be quite nice:

   *  PASCAL/XL'S TRY/RECOVER CONSTRUCT. A really nifty contraption --
     see the "CONTROL STRUCTURES" chapter for more info.

   *  C's "FOR" LOOP. You might think  it's ugly, but it's quite a bit
      more  powerful  --  in  some  very  useful ways --  than SPL's or
      PASCAL's  looping  constructs  (the sketch after  this list shows
      one example).

   *  C's "#define" MACRO FACILITY. I wish  that PASCAL and SPL had it
     too; it lets you do procedure-like things without the overhead of
     a  procedure  call  AND  without the  maintainability problems of
      writing  the code in-line. ALSO, it  lets you add interesting new
      constructs  to  the  language  (like  defining  your  own looping
      constructs, etc.); see the sketch at the end of this list.

   *  SPL's LOW-LEVEL SYSTEM ACCESS. Although you'd rather not have to
     worry  about registers, TOSs, ASSEMBLEs, etc., sometimes you need
      to be able to manipulate them -- SPL lets you do it.
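
   As  a small illustration of the  FOR-loop and "#define" points above,
here  is  a sketch (mine, with made-up  names) that builds a new looping
construct  out of C's FOR loop,  defines a procedure-like macro, and
uses  the  loop's initialize/test/advance parts  for something other
than simple counting:

   #include <stdio.h>

   /* a home-made looping construct, built out of C's for loop */
   #define DO_TIMES(i, n)   for ((i) = 0; (i) < (n); (i)++)

   /* a procedure-like macro: no call overhead, yet only one place
      to maintain the code */
   #define SQUARE(x)   ((x) * (x))

   int main (void)
   {
      int   i;
      char *s = "hello";

      DO_TIMES (i, 3)
         printf ("pass %d, square %d\n", i, SQUARE (i));

      /* the for loop's initialize/test/advance parts need not be a
         simple count -- here they walk a string to its terminator */
      for (i = 0; s[i] != '\0'; i++)
         ;
      printf ("\"%s\" is %d characters long\n", s, i);

      return 0;
   }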
