HOW PROGRAMMING LANGUAGES DIFFER:
A CASE STUDY OF SPL, PASCAL, AND C
by Eugene Volokh, VESOFT
Presented at 1987 SCRUG Conference, Pasadena, CA
Presented at 1987 INTEREX Conference, Las Vegas, NV, USA
Published by The HP CHRONICLE, May 1987-May 1988.
ABSTRACT: The HP3000's wunderkind sets out to study Pascal, C and SPL
for the HP mini in a set of articles, using real-life examples and
plenty of tips on how to code for optimum efficiency in each language.
First in the series: ground rules for the comparison and a look at
control structures. (The HP CHRONICLE, May 1987)
INTRODUCTION
Programmers get passionate about programming languages. We spend
most of our time hacking code, exploiting the language's features,
being bitten by its silly restrictions. There are dozens of languages,
and each one has its fanatical adherents and its ardent detractors.
Some like APL, some like FORTH, LISP, C, PASCAL; some might even like
COBOL or FORTRAN, perish the thought.
In particular, a lot of fuss has recently arisen about SPL, PASCAL,
and C. All three of them are considered good "system programming"
(whatever that is) languages, and naturally people argue about which
one is the best.
HP's Spectrum project has come out in favor of PASCAL -- all new
MPE/XL code will be written in PASCAL, and HP won't even provide a
native mode SPL compiler. On the other hand, HP's also getting more
and more into UNIX, which is coded entirely in C. Especially between C
and PASCAL adherents there seems to be something like a "holy war"; it
becomes not just a matter of advantages and disadvantages, but of Good
and Evil, Right and Wrong. Strict type checking is Good, some say --
loose type checking is Evil; pointers are Wrong -- array indexing is
Right. The battle-lines are drawn and the knights are sharpening their
swords.
But, some ask -- what's the big deal? After all, it's an axiom of
computer science that all you need is an IF and a GOTO, and you can
code anything you like. Theoretically speaking, C, SPL, and PASCAL are
all equivalent; practically, is there that much of a difference?
In other words, is it just esthetics or prejudice that animates the
ardent fans of C, PASCAL, or SPL, or are there real, substantive
differences between the languages -- cases in which using one language
rather than another will make your life substantially easier? Are the
main differences between, say, C and PASCAL that PASCAL uses BEGIN and
END and C uses "{" and "}"? That C's assignment operator is "=" and
PASCAL's is ":="?
The goal of this paper is to answer just this question. I will try
to analyze each of the main areas where SPL, C, and PASCAL differ, and
point out those differences using actual programming examples. I'll
try not to emphasize vague, general statements, like "PASCAL does
strict type checking", or subjective opinions, like "C is too hard to
read"; rather, I want to use SPECIFIC EXAMPLES which can help make
clear the exact influence of strict or loose type checking on your
programming tasks.
RULES OF EVIDENCE
Saying that I'll "compare SPL, PASCAL, and C" isn't really saying a
whole lot. How will I compare them? What criteria will I use to
compare them? Will I compare how easy it is to read them or write
them? Will I compare what programming habits they instill in their
users? Which versions of these languages will I compare?
To do this, and to do this in as useful a fashion as possible, I
set myself some rules:
* I resolved to try to show the differences by use of examples,
preferably as real-life as possible. The emphasis here is on
CONCRETE SPECIFICS, not on general statements such as "C is less
readable" or "PASCAL is more restrictive".
* I decided not to go into questions of efficiency. Compiling a
certain construct using one implementation of a compiler may
generate fast code, whereas a different implementation may
generate slow code. Sure, the FOR loop in PASCAL/3000 may be less
efficient than in SPL or in CCS's C/3000, but who knows how fast
it'll be under PASCAL/XL?
For this reason, I don't wax too poetic about the efficiency
advantages of features such as C's "X++" (which increments X by
1) -- a modern optimizing compiler is quite likely to generate
equally fast code for "X:=X+1", automatically seeing that it's a
simple increment-by-one (even the 15-year-old SPL/3000 compiler
does this).
The only time I'll mention efficiency is when some feature
is INHERENTLY more or less efficient than another (at least on a
conventional machine architecture); for instance, passing a large
array BY VALUE will almost certainly be slower than passing it BY
REFERENCE, since by-value passing would require copying all the
array data.
Even in these cases, I try to play down performance
considerations; if you're concerned about speed (as well you
should be), do your own performance measurements for the features
and compiler implementations that you know you care about.
* I resolved -- for space reasons if for no other -- not to be a
textbook for SPL, PASCAL, or C. Some of the things I say apply
equally well to almost all programming languages, and I hope that
they will be understandable even to people who've never seen SPL,
PASCAL, or C.
For other things, I rely on the relative readability of the
languages and their similarity to one another. I hope that if you
know any one of SPL, PASCAL, or C, you should be able to
understand the examples written in the other languages.
However, it may be wise for you to have manuals for these three
languages -- either their HP3000 implementations or general
standards -- at hand, in case some of the examples should prove
too arcane.
* As you can tell by the size of this paper, I also decided to be
as thorough as practical in my comparisons, and ESPECIALLY in the
evidence backing up my comparisons.
One of the main reasons I wrote this paper is that I hadn't seen
much OBJECTIVE discussion comparing C and PASCAL; I wanted not
just to present my conclusions -- which might as easily be based
on prejudice as on fact -- but also the reasons why I arrived at
them, so that you could decide for yourself.
So as not to burden you with having to read all 200-odd pages,
though, I've summarized my conclusions in the "SUMMARY" chapter.
You might want to have a look there first, and then perhaps go
back to the main body of the paper to see the supporting evidence
of the points I made.
WHAT ARE C AND PASCAL, ANYWAY?
If you think about it, SPL is a very unusual language indeed. To
the best of my knowledge, there is exactly one SPL compiler available
anywhere, on any computer (eventually, the independent SPLash! may be
available on Spectrum, but that is another story). I can say "SPL
supports this" or "SPL can't do that" and, excepting differences
between one chronological version of SPL and the next, be absolutely
precise and objectively verifiable. SPL can be said to "support"
something only because there is only one SPL compiler that we're
talking about.
To say "PASCAL can do X" is a chancy proposition indeed. ANSI
Standard PASCAL doesn't support variable-length strings, but most
modern PASCAL implementations, including HP PASCAL, have some sort of
string mechanism. What about HP's new PASCAL/XL, reputed to be even
more powerful still? Similarly, with C, there are the old "Kernighan &
Ritchie" C, the proposed new ANSI standard C, whatever it is that HP
uses on the Spectrum, AND whatever you use on the 3000, which might be
CCS's C compiler or Tymlabs' C.
On the one hand, I contemplated comparing standard C and standard
PASCAL. This is easier for me, and it also makes sense from a
portability point of view (if you want it to be portable, you're
better off using the standard, anyway).
On the other hand, portability is fine and dandy, but most people
aren't going to be porting their software any further than from an
MPE/XL machine to an MPE/V machine and back. As long as you stick to
HP3000s, you have the full power of so-called "HP PASCAL", an extended
superset of PASCAL that's supported on 3000s, 1000s, 9000s, and the
rest; it's hardly fair (or practical) to ignore this in a comparison.
Finally, what about PASCAL/XL? It'll have even more useful
features, but they may not be ported back to the MPE/V machines, at
least for a while. Should I then compare PASCAL/XL and C/XL, a
representative contest for the XL machines, but not necessarily for
MPE/V machines, and certainly not if you really want to port your
software onto other machines?
This is all, incidentally, aggravated by the fact that HP's
extensions to PASCAL are more substantial than its extensions to C;
thus, comparing the "standards" is likely to put PASCAL in a
relatively worse light than comparing "supersets" (not to say that
PASCAL is worse than C in either case).
Faced with all this, I've decided to compare everything with
everything else. There are actually 7 different compilers I discuss at
one time or another:
* SPL.
There's only one, thank God.
* Standard PASCAL.
This is the original ANSI Standard, on which all other PASCALs
are based. This is also very similar to Level 0 ISO Standard
PASCAL (see next item).
* Level 1 ISO Standard PASCAL.
This standard, put out in the early 1980's, supports so-called
CONFORMANT ARRAY parameters (see the DATA STRUCTURES chapter).
The same standard document defined "Level 0 ISO Standard PASCAL"
to be much like classic "Standard PASCAL", i.e. without
conformant arrays. Compiler writers were given the choice of
which one to implement, and it isn't obvious how popular Level 1
ISO Standard will be. When I say "Standard PASCAL", I mean the
original standard, which is almost identical to the ISO Level 0
Standard.
* PASCAL/3000.
This is HP's implementation of PASCAL on the pre-Spectrum HP3000.
Although the Spectrum machines will also be called 3000's, when I
say PASCAL/3000 I mean the pre-Spectrum version. PASCAL/3000 is
itself a superset of HP Pascal, which is also implemented by HP
on HP 1000s and HP 9000s. PASCAL/3000 is a superset of the
original Standard PASCAL, not the ISO Level 1 Standard.
* PASCAL/XL.
This is HP's implementation of PASCAL on the Spectrum. It's
essentially a superset of both PASCAL/3000 and the ISO Level 1
Standard.
* Kernighan & Ritchie (K&R) C.
This is the C described by Brian Kernighan and Dennis Ritchie in
their now-classic book "The C Programming Language" (which, in
fact, is usually called "Kernighan and Ritchie"). Although never
an official standard, it is quite representative of most modern
C's. In fact, for practical purposes, it can be said that a
program written in K & R C is portable to virtually any C
implementation (assuming you avoid those things that K&R itself
describes as implementation-dependent).
* Draft ANSI Standard C.
ANSI is now working on codifying a standard of C, which will have
some (but not very many) improvements over K&R. My reference for
this was Harbison & Steele's book "C: A Reference Manual", which
also discusses various other implementations of C. Although Draft
ANSI Standard C is Standard, it is also Draft. Some of the
features described in it are implemented virtually nowhere, and
it's not clear how many of them C/XL will include.
Matters are further complicated, of course, by the lack of an
HP-provided C compiler on the pre-Spectrum HP3000. The compiler I used
to research this paper is CCS Inc.'s C/3000 compiler, which is a
superset of K&R C and a subset of Draft ANSI Standard C. The most
conspicuous Draft Standard feature that CCS C/3000 lacks is Function
Prototypes -- an understandable lack since virtually no other C
compiler has them, either.
Whenever any difference exists between any of the PASCAL or C
versions, I try to point it out. Which versions you compare are up to
you:
* You can compare Standard PASCAL and K&R C.
If it isn't in these general standards that everybody implements,
you're unlikely to get much portability.
* You can compare PASCAL/XL and Draft ANSI Standard C.
These are the compilers that will most likely be available on the
Spectrum.
* You can compare PASCAL/3000 and Draft ANSI Standard or K&R C.
Even though you might not usually care about porting to, say, an
IBM or a VAX, you may very seriously care about porting from the
pre-Spectrum to the Spectrum and vice versa. HP hasn't promised
to port PASCAL/XL back to the pre-Spectrums, so PASCAL/3000 is
probably the lowest common denominator.
SPL is nice. At least until SPLash!'s promised Native Mode SPL
compiler comes out, there's only one SPL compiler to compare with.
This makes me very happy.
ARE C, PASCAL, AND SPL
FUNDAMENTALLY DIFFERENT OR
FUNDAMENTALLY ALIKE?
In my opinion, they are definitely FUNDAMENTALLY ALIKE. In the rest
of the paper, I'll tell you all about their differences, but those are
EXCEPTIONS to their fundamental similarity.
Why do I think so? Well, virtually every important construct in
any of the three languages has an almost exact parallel in the
other two (the only exception being, perhaps, record structures, which
SPL doesn't have).
* All three languages emphasize writing your program as a set of
re-usable, parameterized procedures or functions (which, for
instance, COBOL 74 and most BASICs do not);
* All three languages share virtually the same rich set of control
structures (which neither FORTRAN/IV nor BASIC/3000 possesses).
* The languages may on the surface LOOK somewhat different (PASCAL
and C certainly do), but remember that the ESSENCE is virtually
identical -- PASCAL may say "BEGIN" and "END" where C says "{"
and "}", but that's hardly a SUBSTANTIVE difference.
Despite all the differences which I'll spend all these pages
describing -- and I think many of the differences are indeed very
important ones -- I still think that SPL, PASCAL, and C are about as
close to each other as languages get.
SO, WHICH IS BETTER -- C, PASCAL, OR SPL?
You think I'm going to answer that? With all my pretensions to
objectivity, and dozens of angry language fanatics ready to berate me
for choosing the "wrong one"?
The main purpose of this paper is to show you all the differences
and let you decide for yourselves; after all, there are so many
parameters (how portable do you want the code to be? how much do you
care about error checking?) that are involved in this sort of
decision.
The closest I come to actually saying which is better is in the
"SUMMARY" chapter (at the very end of the paper); there I explain what
I think the major drawbacks and advantages of each language are. Look
there, but remember -- only you can decide which language is best for
your purposes.
TECHNICAL NOTE ABOUT C EXAMPLES
In case you didn't know, C differentiates between upper- and
lower-case. The variables "file" and "FILE" are quite different, as
are "file", "File", and "fILE". (In SPL and PASCAL, of course, case
differences are irrelevant; all of the just-given names would refer to
the same variable.)
In fact, in C programs the majority of all objects -- reserved
words, procedure names, variables, etc. -- are lower-case. The
reserved words ("if", "while", "for", "int", etc.) are required to be
lower-case by the standard; theoretically, you can name all your
variables and procedures in upper-case, but most C programmers use
lower-case for them, too (although they can sometimes use upper-case
variable names as well, perhaps to indicate their own defined types or
#define macros).
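For readers who want to see the case rule in action, here is a minimal
sketch (the function name case_demo is made up for illustration). It
declares three variables whose names differ only in case and shows
that C keeps them apart. The name "FILE" is avoided on purpose, since
the standard <stdio.h> header already defines it as a type.

```c
#include <assert.h>

/* Three distinct variables: C treats names that differ only in
   case as different identifiers. */
int case_demo(void)
{
    int file = 1;
    int File = 2;
    int fILE = 3;

    /* separate objects holding separate values */
    assert(file != File && File != fILE);
    return file * 100 + File * 10 + fILE;
}
```

Calling case_demo() combines the three values into one number, which
would be impossible if the names referred to a single variable.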
This is why all the examples of C programs in this paper are
written in lower-case. The one exception to this is when I refer to a
C object -- a variable, a procedure, or a reserved word -- within the
text of a paragraph. Then, I'll often capitalize it to set it off from
the rest of the paper, to wit:
proc (i, j)
int i, j;
{
if (i == j)
...
}
The procedure PROC takes two parameters, I and J.
The IF statement makes sure that they're equal, ....
The fact that I refer to them in upper-case in the text doesn't
mean that you should actually use upper-case names. I just do it to
make the text more readable.
Another example of how a little lie can help reveal the greater
truth...
ACKNOWLEDGMENTS
I'd like to thank the following people for their great help in the
writing of this paper:
* CCS, Inc., authors of CCS C/3000, a C compiler for pre-Spectrum
HP3000s. All the research and testing of the C examples given in
this paper was done using their excellent compiler. In
particular, I'd also like to thank Tim Chase, who gave me a great
deal of help on some of the details of the C language.
* Steve Hoogheem of the HP Migration Center, who served as liaison
between me and the PASCAL/XL lab in answering my questions about
PASCAL/XL.
* Mr. Tom Plum (of Plum Hall, Cardiff, NJ), a
recognized C expert and member of the Draft ANSI Standard C
committee, who was kind enough to answer many of the questions
that I had about the Draft Standard.
* Dennis Mitrzyk, of Hewlett-Packard, who helped me obtain much of
my PASCAL/XL information, and who was also kind enough to review
this paper.
* Joseph Brothers, David Greer (of Robelle), Dave Lange and Roger
Morsch (of State Farm Insurance), and Mark Wallace (of Robinson,
Wallace, and Company), all of whom reviewed the paper and
provided a lot of useful input and corrections.
CONTROL STRUCTURES
GOTOs, some say, are Considered Harmful. Perhaps they are and
perhaps they are not. But the major reason for the control structures
that PASCAL and C provide (as opposed to, say, FORTRAN IV, which
doesn't) is not that they replace GOTOs, but rather that they replace
them with something more convenient. If given the choice between
saying
IF FNUM = 0 THEN
PRINTERROR
ELSE
BEGIN
READFILE;
FCLOSE (FNUM, 0, 0);
END;
and
IF FNUM <> 0 THEN GOTO 10;
PRINTERROR;
GOTO 20;
10:
READFILE;
FCLOSE (FNUM, 0, 0);
20:
then I would choose the former. IF-THEN-ELSE is a common construct in
all of the algorithms we write, and it's easier for both the writer
and the reader to have a language construct that directly corresponds
to it.
C and PASCAL share some of the fundamental control structures. Both
have
* IF-THEN-ELSEs. They look slightly different:
IF FNUM=0 THEN { PASCAL }
PRINTERROR
ELSE
BEGIN
READFILE;
FCLOSE (FNUM, 0, 0);
END;
and
if (fnum==0) /* C */
printerror (); /* note the semicolon */
else
{
readfile ();
fclose (fnum, 0, 0);
}
but I hardly think the difference very substantial. There'll be
some who forever curse C for using lower-case or PASCAL for using
such L-O-N-G reserved words, like "BEGIN" and "END"; I can live
with either.
* WHILE-DOs, although again there are some minor differences:
WHILE GETREC (FNUM, RECORD) DO
PRINTREC (RECORD);
vs.
while (getrec (fnum, record))
printrec (record);
* DO-UNTILs:
REPEAT
GETREC (FNUM, RECORD);
PRINTREC (RECORD);
UNTIL
NOMORERECS (FNUM);
and
do
{
getrec (fnum, record);
printrec (record);
}
while
(!nomorerecs (fnum)); /* "!" means "NOT" */
Note that PASCAL has a DO-UNTIL and C has a DO-WHILE. Big
difference.
* And, finally, C's and PASCAL's procedure support is comparable,
as well.
The interesting things, of course, are the points at which C and
PASCAL differ. There are some, and for those of us who thought that
IF-THEN-ELSE and WHILE-DO are all the control structures we'll ever
need, the differences can be quite surprising.
THE "WHILE" LOOP AND ITS LIMITATIONS; THE "FOR" LOOP
It is, indeed, true that all iterative constructs can be emulated
with the WHILE-DO loop. On the other hand, why do the work if someone
else can do it for you?
The PASCAL FOR loop -- a child of FORTRAN's DO -- is actually not
that hard to emulate:
FOR I:=1 TO 9 DO
WRITELN (I);
is identical, of course, to
I:=1;
WHILE I<=9 DO
BEGIN
WRITELN (I);
I:=I+1;
END;
Not such a vast savings, but, still, the FOR loop definitely looks
nicer.
Unfortunately, for all the savings that the FOR loop gives you,
I've found that it's not as useful as one might, at first glance,
believe. This is because it ALWAYS loops through all the values from
the start to the limit. How often do you need to do that, rather than
loop until EITHER a limit is reached OR another condition is found?
String searching, for instance -- you want to loop until the index is
at the end of the string OR you've found what you're searching for.
Always looping until the end is wasteful and inconvenient.
Looking through my MPEX source code, incidentally, I find 53 WHILE
loops and 8 FOR loops. In my RL, the numbers are 170 WHILEs and 38
FORs (at least 6 of these FORs should have been WHILEs if I weren't so
lazy). (How's that for an argument -- I don't use it, ERGO it is
useless. I'm rather proud of it.) In any case, though, my experience
has been that
* THE PURE "FOR" LOOP -- A LOOP THAT ALWAYS GOES ON UNTIL THE LIMIT
HAS BEEN REACHED -- IS NOT AS COMMON AS ONE MIGHT THINK IN
BUSINESS AND SYSTEM PROGRAMS (although scientific and engineering
applications, which often handle matrices and such, use pure FOR
loops more often). MORE OFTEN YOU WANT TO ALSO SPECIFY AN "UNTIL"
CONDITION WHICH WILL ALSO TERMINATE THE LOOP.
What I wanted, then, was simple -- a loop that looked like
FOR I:=START TO END UNTIL CONDITION DO
For instance,
FOR I:=1 TO STRLEN(S) UNTIL S[I]=C DO;
or
FOR I:=1 TO STRLEN(S) WHILE S[I]=' ' DO;
What I got -- and I'm not sure if I'm sorry I asked or not -- is the C
FOR loop:
for (i=1; i<=strlen(s) && s[i]!=c; i=i+1)
;
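As a runnable illustration of this kind of search loop, here is a
minimal sketch. The name find_char is hypothetical, and the sketch
assumes C's usual zero-based string indexing, so it starts at 0 and
tests "i < len" rather than mirroring the 1-based loop shown above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns the zero-based index of the first occurrence of c in s,
   or strlen(s) if c does not occur. find_char is a hypothetical name. */
size_t find_char(const char *s, char c)
{
    size_t i;
    size_t len;

    assert(s != NULL);
    len = strlen(s);
    /* Stop at the end of the string OR at the first match. */
    for (i = 0; i < len && s[i] != c; i = i + 1)
        ;
    return i;
}
```

The loop body is empty; all the work is done by the test clause, which
combines the limit check with the "until" condition.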
The C FOR loop -- like most things in C, accomplished with a minimum
of letters and a maximum of special characters -- looks like this:
for (initcode; testcode; inccode)
statement;
It is functionally identical to
initcode;
while (testcode)
{
statement;
inccode;
}
In other words, this is a sort of "build-your-own" FOR loop -- YOU
specify the initialization, the termination test, and the "STEP". This
is actually quite useful for loops that don't involve simple
incrementing, such as stepping through a linked list:
for (ptr=listhead; ptr!=NULL; ptr=ptr->next)
fondle (ptr);
The above loop, of course, fondles every element of the linked list,
something quite analogous to what an ordinary PASCAL FOR loop would
do, but with a different kind of "stepping" action.
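A self-contained version of such a list walk might look like the
sketch below. The node layout and the names push_front and sum_list
are assumptions made up for the illustration; the "stepping" action is
simply following the NEXT pointer until the end-of-list marker.

```c
#include <assert.h>
#include <stddef.h>

struct node {
    int value;
    struct node *next;
};

/* Links elem in front of head and returns the new head. */
struct node *push_front(struct node *head, struct node *elem)
{
    assert(elem != NULL);
    elem->next = head;
    return elem;
}

/* Adds up every element, stepping from node to node until the
   NULL end-of-list marker is reached. */
int sum_list(const struct node *listhead)
{
    const struct node *ptr;
    int total = 0;

    for (ptr = listhead; ptr != NULL; ptr = ptr->next)
        total += ptr->value;
    return total;
}
```

An empty list (a NULL head) falls out naturally: the test clause fails
on the first pass and the body never runs.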
The standard PASCAL loop, of course, can easily be emulated --
for (i=start; i<=limit; i=i+1)
statements;
I'm sure it would be fair to conclude that C's FOR loop is clearly
more powerful than PASCAL's. On the other hand, a WHILE loop is more
powerful than a FOR loop, too; and, a GOTO is the most powerful of
them all (heresy!). The reason a PASCAL FOR loop -- or for that
matter, a C FOR loop -- is good is because simply by looking at it,
you can clearly see that it is a WHILE loop of a particular kind, with
clearly evident starting, terminating, and stepping operations.
The major argument that may be made against C's for loop is simply
one of clarity. Possible reasons include:
* The loop variable has to be repeated four (or three, if you use
"i++" instead of "i=i+1") times.
* The semicolons, adequate to delimit the three clauses for the
compiler, may not sufficiently delimit them to a human reader --
it may not be instantly obvious where one clause starts and
another ends.
* Also, the very use of semicolons instead of control keywords
(like "TO") may be irritating; in a way, it's like having to
write
FOR I,1,100
instead of
FOR I:=1 TO 100
If you think the first version isn't any worse than the second,
you shouldn't mind C; some, however, find "FOR I,1,100" slightly
less clear than "FOR I:=1 TO 100".
Compare
for (i=1; i<=10; i=i+1)     vs.     FOR I:=1 TO 10 DO
or, alternatively
for (i=1; i<=10; i++)       vs.     FOR I:=1 TO 10 DO
Which do you prefer? Frankly, for me, the PASCAL version is somewhat
clearer, although I'm not prepared to say that the clarity is worth
the cost in power. On the other hand, many a C programmer doesn't see
any advantage in the PASCAL style, and perhaps there isn't any. Some
of the C/PASCAL differences, I'm afraid, boil down to simply this.
THE WHILE LOOP AND ITS LIMITATIONS -- AN INTERESTING PROBLEM
Consider the following simple task -- you want to read a file until
you get a record whose first character is a "*"; for each record you
read, you want to execute some statements. Your PASCAL program might
look like this:
READLN (F, REC);
WHILE REC[1]<>'*' DO
BEGIN
PROCESS_RECORD_A (REC);
PROCESS_RECORD_B (REC);
PROCESS_RECORD_C (REC);
READLN (F, REC);
END;
All well and good? But, wait a minute -- we had to repeat the READLN
statement a second time at the end of the WHILE loop. "Lazy bum," you
might reply. "Can't handle typing an extra line." Well, what if, in
order to get the record, we had to do more than just a READLN? We
might need to, say, call FCONTROL before doing the READLN, and perhaps
have a more complicated loop test. Our program might end up looking
like:
FCONTROL (FNUM(F), EXTENDED_READ, DUMMY);
FCONTROL (FNUM(F), SET_TIMOUT, TIMEOUT);
READLN (F, REC);
GETFIELD (REC, 3, FIELD3);
WHILE FIELD3<>'*' DO
BEGIN
PROCESS_RECORD_A (REC);
PROCESS_RECORD_B (REC);
PROCESS_RECORD_C (REC);
FCONTROL (FNUM(F), EXTENDED_READ, DUMMY);
FCONTROL (FNUM(F), SET_TIMOUT, TIMEOUT);
READLN (F, REC);
GETFIELD (REC, 3, FIELD3);
END;
This is not a happy-looking program. We had to duplicate a good chunk
of code, with all the resultant perils of such a duplication; the code
was harder to write, it's now harder to read, and when we maintain it,
we're liable to change one of the occurrences of the code and not the
other.
Workarounds, of course, exist. We can say
REPEAT
FCONTROL (FNUM(F), EXTENDED_READ, DUMMY);
FCONTROL (FNUM(F), SET_TIMOUT, TIMEOUT);
READLN (F, REC);
GETFIELD (REC, 3, FIELD3);
IF FIELD3 <> '*' THEN
BEGIN
PROCESS_RECORD_A (REC);
PROCESS_RECORD_B (REC);
PROCESS_RECORD_C (REC);
END;
UNTIL
FIELD3 = '*';
although this is also rather messy -- we've had to repeat the loop
termination condition, and the resulting code is really a WHILE-DO
loop masquerading as a REPEAT-UNTIL.
Some might reply that what we ought to do is to move the FCONTROLs,
READLN, and GETFIELD into a separate function that returns just the
value of FIELD3, or perhaps even the loop test (FIELD3 <> '*'). Then,
the loop would look like:
WHILE FCONTROLS_READLN_AND_GETFIELD_CHECK_STAR (FNUM, REC) DO
BEGIN
PROCESS_RECORD_A (REC);
PROCESS_RECORD_B (REC);
PROCESS_RECORD_C (REC);
END;
This, indeed, does look nice -- but are we to be expected to create a
new procedure every time a control structure doesn't work like we want
it to? I like procedures just as much as the next man; in fact, I'm a
lot more prone to pull code out into procedures than others are (I
like my procedures to be twenty lines or shorter). On the other hand,
what if someone said that you couldn't use BEGIN .. END in
IF/THEN/ELSE statements -- if you want to do more than one thing in
the THEN clause, you have to write a procedure?
C has some advantage here. With C's "comma" operator, you can
combine any number of expressions (including assignments and function
calls, but not statements such as IF or WHILE) into a single
expression, whose value is that of the last one. Thus, what you can
do is something like this:
while ((fcontrol (fnum(f), extended_read, dummy),
fcontrol (fnum(f), set_timeout, timeout),
gets (f, rec, 80),
getfield (rec, 3, field3),
field3!='*'))
{
process_record_a (rec);
process_record_b (rec);
process_record_c (rec);
}
Whether this is better or not, you decide. The "comma" construct can
be very confusing. In "while ((...))", the outside parentheses are
the WHILE's; the inner pair is the comma construct's; and all others
belong to internal expressions and function calls. Additionally, you
have to keep track of which commas belong to the function calls and
which delimit the comma construct's elements.
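To experiment with the comma construct without any HP3000 intrinsics,
one could use a sketch along these lines. Everything here is
hypothetical: struct source and next_record merely stand in for the
FCONTROL / read / GETFIELD sequence, and the "process" step is reduced
to counting.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record source standing in for the
   FCONTROL / read / GETFIELD steps. */
struct source {
    const char *const *recs;   /* array of records */
    size_t count;              /* number of records */
    size_t next;               /* index of the next record to hand out */
};

/* Returns the next record, or "*" once the source is exhausted. */
const char *next_record(struct source *src)
{
    assert(src != NULL);
    return src->next < src->count ? src->recs[src->next++] : "*";
}

/* Counts the records seen before the first one starting with '*'. */
int count_until_star(struct source *src)
{
    const char *rec;
    int processed = 0;

    /* Comma operator: fetch first, then test. The value of the whole
       parenthesized expression is that of the test on the right. */
    while ((rec = next_record(src), rec[0] != '*'))
        processed = processed + 1;
    return processed;
}
```

The fetch is written once, inside the loop condition, so there is no
duplicated read before the loop and at the bottom of its body.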
What is that slithering underfoot? Could it be the serpent? He
proposes this:
WHILE TRUE DO
BEGIN
FCONTROL (FNUM(F), EXTENDED_READ, DUMMY);
FCONTROL (FNUM(F), SET_TIMOUT, TIMEOUT);
READLN (F, REC);
GETFIELD (REC, 3, FIELD3);
IF FIELD3='*' THEN GOTO 99;
PROCESS_RECORD_A (REC);
PROCESS_RECORD_B (REC);
PROCESS_RECORD_C (REC);
END;
99:
"Sssimple and ssstraightforward, madam. Won't you have a bite?" Shame
on you! Still, it's not obvious that the old faithful "GOTO" isn't,
relatively speaking, a reasonable solution. C has its own variant,
that lets us get away without using the "forbidden word":
while (TRUE)
{
fcontrol (fnum(f), extended_read, dummy);
fcontrol (fnum(f), set_timeout, timeout);
gets (f, rec, 80);
getfield (rec, 3, field3);
if (field3=='*') break;
process_record_a (rec);
process_record_b (rec);
process_record_c (rec);
}
C's "BREAK" construct gets you out of the construct that you're
immediately in, be it a WHILE loop (as in this case), a SWITCH
statement (in which it is vital), a FOR, or a DO. If you believe in
the evil of GOTOs, you probably won't much like BREAKs; again, though,
I ask -- is the above example any less muddled than the other ones I
showed?
Incidentally, the best approach that I've seen so far comes from a
certain awful, barbarian language called FORTH (OK, all you FORTHies
-- meet me in the alley after the talk and we can have it out).
Translated into civilized terms, the loop looked something like this:
DO
FCONTROL (FNUM(F), EXTENDED_READ, DUMMY);
FCONTROL (FNUM(F), SET_TIMOUT, TIMEOUT);
READLN (F, REC);
GETFIELD (REC, 3, FIELD3);
WHILE FIELD3<>'*'
PROCESS_RECORD_A (REC);
PROCESS_RECORD_B (REC);
PROCESS_RECORD_C (REC);
ENDDO;
This so-called "loop-and-a-half" solves what I think is the key
problem, present in so many WHILE loops -- that the condition often
takes more than a single expression to calculate. Well, in any case,
none of SPL, PASCAL, or C has such a construct, so that's that.
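The closest one can get in C is probably an endless loop with the test
placed in the middle. The sketch below is only an approximation of the
loop-and-a-half shape, under made-up assumptions: the records come
from an in-memory array that must contain a record starting with "*",
and total_length_before_star is a hypothetical name.

```c
#include <assert.h>
#include <string.h>

/* Adds up the lengths of the records that precede the first record
   starting with '*'. The array must contain such a record. */
size_t total_length_before_star(const char *const *recs)
{
    size_t i = 0;
    size_t total = 0;
    const char *rec;

    assert(recs != NULL);
    for (;;)
    {
        /* first half: obtain the record (may take several statements) */
        rec = recs[i];
        i = i + 1;

        if (rec[0] == '*')      /* the WHILE test, sitting in the middle */
            break;

        /* second half: process the record */
        total = total + strlen(rec);
    }
    return total;
}
```

The "obtain" code appears once and the exit condition appears once,
which is the property the loop-and-a-half is after.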
BREAK, CONTINUE, AND RETURN -- PERFECTION OR PERVERSION?
As I mentioned briefly in the last section, C has three control
structures that PASCAL does not, and some say should not. These
structures, Comrade, are Ideologically Suspect. A Dangerous Heresy.
Still, they're there, and ought to be briefly discussed.
* BREAK -- exits the immediately enclosing loop (WHILE, DO, or FOR)
or a SWITCH statement. Essentially a GOTO to the statement
immediately following the loop.
* CONTINUE -- goes to the "next iteration" of the immediately
enclosing loop (WHILE, DO, or FOR).
* RETURN -- exits the current procedure. "RETURN <expression>"
exits the current procedure, returning the value of <expression>
as the procedure's result.
* Of course, GOTO, the old faithful.
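The three exits listed above can be seen side by side in one small
function. This is a sketch only; the function name, the LIMIT
parameter, and the -1 error value are assumptions invented for the
example.

```c
#include <assert.h>
#include <stddef.h>

/* Sums the positive entries of values[0..n-1], stopping at the first
   zero. Negative entries are skipped. If any entry exceeds limit,
   the function gives up and returns -1. */
int sum_positive_until_zero(const int *values, size_t n, int limit)
{
    size_t i;
    int total = 0;

    assert(values != NULL);
    for (i = 0; i < n; i++)
    {
        if (values[i] == 0)
            break;          /* BREAK: leave the loop */
        if (values[i] < 0)
            continue;       /* CONTINUE: next iteration (i++ still runs) */
        if (values[i] > limit)
            return -1;      /* RETURN: leave the procedure with a result */
        total += values[i];
    }
    return total;
}
```

Note that CONTINUE in a FOR loop still executes the increment clause
before the test is evaluated again.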
Now, as you may or may not recall, a while ago there was much
argument made against GOTOs. Instead of GOTOs, it was said, you ought
to use only IF-THEN-ELSEs and WHILE-DOs. CASEs, FORs, and
REPEAT-UNTILs, being just variants of the other control structures,
were all right; but GOTOs were condemned, on several very good
grounds:
* First of all, with GOTOs, the "shape" of a procedure stops being
evident. If you don't use GOTOs, each procedure and block of code
will have only one ENTRY and only one EXIT. This means that you
can always assume that control will always flow from the
beginning to the end, with iterations and departures that are
always clearly defined and the conditions for which are always
evident.
* If you avoid GOTOs, then for any statement, you can tell under
what conditions it will be executed just by looking at the
control structures within which it is enclosed.
These concerns, I would say, may apply equally well to BREAKs,
CONTINUEs, and RETURNs.
Personally, I must confess, I don't use GOTOs. I don't know if it
is the appeal of reason, the lesson of experience, or fear for my
immortal soul. About five years ago I resolved to stop using them;
except for "long jumps" (which I'll talk about more later), I use
GOTOs in 1 procedure of MPEX's 40 procedures, and in 2 procedures of
my RL's 350 (both of the uses of "GOTO" are as "RETURN" statements).
However, I must say that in many cases the temptation does seem great.
Consider, for a moment, the following case. We need to write a
procedure that opens a file, reads some records, writes some records,
and closes the file. In case any of the file operations fails, we
should immediately close the file and not do anything else. The
"GOTO-less" solution:
munch_file (f)
char f[40];
{
int fnum;
fnum = fopen (f, 1);
if (error == 0) /* let's say ERROR is an error code */
{
freaddir (fnum, buffer, 128, rec_a);
if (error == 0)
{
munch_record_one_way (buffer);
fwritedir (fnum, buffer, 128, rec_a);
if (error == 0)
{
freaddir (fnum, buffer, 128, rec_b);
if (error == 0)
{
munch_record_another_way (buffer);
fwritedir (fnum, buffer, 128, rec_b);
if (error == 0)
some_more_stuff;
}
}
}
}
fclose (fnum, 0, 0);
}
Or, using "GOTO":
munch_file (f)
char f[40];
{
int fnum;
#define check_error if (error != 0) goto done
fnum = fopen (f, 1);
if (error == 0)
{
freaddir (fnum, buffer, 128, rec_a);
check_error;
munch_record_one_way (buffer);
fwritedir (fnum, buffer, 128, rec_a);
check_error;
freaddir (fnum, buffer, 128, rec_b);
check_error;
munch_record_another_way (buffer);
fwritedir (fnum, buffer, 128, rec_b);
check_error;
some_more_stuff;
}
done:
fclose (fnum, 0, 0);
}
Is the latter way really worse? I'm not so sure. Also, I can't see any
way in which I can rewrite this example without GOTOs without making
it as cumbersome as the first case.
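For anyone who wants to try the GOTO shape on a system without the MPE file intrinsics, here is a minimal sketch using the standard C stdio library instead. The name munch_first_record, the 16-byte record, and the 'X' marker are hypothetical stand-ins of mine, not part of the example above; the only point is that every failure funnels to one label that closes the file.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: rewrites the first 16-byte record of a file in place.
   Returns 0 on success, -1 on any failure. */
int munch_first_record(const char *path)
{
    char buffer[16];
    int status = -1;              /* assume failure until every step succeeds */
    FILE *fp;

    assert(path != NULL);
    fp = fopen(path, "r+b");
    if (fp == NULL)
        return -1;                /* nothing to clean up yet */

    if (fread(buffer, sizeof buffer, 1, fp) != 1) goto done;
    buffer[0] = 'X';              /* stand-in for munch_record_one_way */
    if (fseek(fp, 0L, SEEK_SET) != 0) goto done;
    if (fwrite(buffer, sizeof buffer, 1, fp) != 1) goto done;
    status = 0;

done:
    if (fclose(fp) != 0)          /* the single exit: always close the file */
        status = -1;
    return status;
}
```

Note that the early return before the file is opened is itself one of the "extra exits" the purists object to; it is kept here because there is nothing to clean up at that point.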
Similar examples can be found for BREAK and RETURN. If, for
instance, I wasn't required to close the file, I'd just do a RETURN
instead of doing the "GOTO DONE"; if I had to loop through the file,
my code might look something like:
framastatify (f)
char f[40];
{
int fnum;
fnum = fopen (f, 1);
if (error == 0)
{
while (TRUE)
{
fread (fnum, rec1, 128);
if (error != 0) break;
if (frob_a (rec1) == failed)
break;
fupdate (fnum, rec1, 128);
if (error != 0) break;
freadlabel (fnum, rec1, 128, 0);
if (error != 0) break;
if (twiddle_label (rec1) == failed)
break;
fwritelabel (fnum, rec1, 128, 0);
if (error != 0) break;
fspace (fnum, 20);
if (error != 0) break;
}
fclose (fnum, 0, 0);
}
}
Just IMAGINE all those IFs you'd need to nest if you avoided BREAK!
CONTINUE, on the other hand, is a vile heresy. Everybody who uses
CONTINUE should be burned at the stake.
To summarize, "C Notes, A Guide to the C Programming Language" by
C.T. Zahn (Yourdon 1979) says:
"In practice, BREAK is needed rarely, CONTINUE never, and GOTO even
less often than that... It also is good style to minimize the
number of RETURN statements; exactly one at the end of the
function is best of all for readability."
On the other hand, I say
"If this be treason, make the most of it!"
Especially if your procedures are short enough and otherwise
well-written enough, I think that you can well make the judgment that
even with the introduction of GOTOs, the control flow will still be
clear enough.
Just don't tell anyone I told you to do it.
LONG JUMPS -- PROBLEM AND SOLUTION
Modern structured programming encourages FACTORING. Your algorithm,
it says, should be broken up into small procedures, small enough that
each one can be easily understood and digested by anybody reading it.
I'm quite fond of factoring myself, and you'll find most of my
procedures to be about 20-odd lines long or shorter. I try to make
each procedure a "black box", with a well-defined, atomic function and
no unobvious side effects. Naturally, with procedures this small, I
often end up going several levels of procedure calls deep to do a
relatively simple task.
For instance, I might have a procedure called ALTFILE that takes a
file name and a string of keywords indicating how the file is to be
altered:
* ALTFILE calls PARSE_KEYWORDS to parse the keyword string;
* PARSE_KEYWORDS separates the string into individual keywords,
calling PROCESS_KEYWORD for each one;
* PROCESS_KEYWORD figures out what keyword is being referenced, and
calls a parsing routine -- PARSE_INTEGER, PARSE_DATE,
PARSE_INT_ARRAY, etc. -- depending on the type of the value the
user specified;
* PARSE_INT_ARRAY takes a list of integer values delimited by, say
":"s, and calls PARSE_INTEGER for each one.
* PARSE_INTEGER converts the text string containing an integer
value into a number and returns the numeric value.
Not a far-fetched example, you must agree; in fact, many of my
programs (e.g. MPEX's %ALTFILE parser) nest even deeper. Now, the
question arises -- what if PARSE_INTEGER realizes that the value the
user specified isn't a valid number after all?
The solution seems clear -- PARSE_INTEGER, in addition to returning
the integer's value, also returns a true/false flag indicating whether
or not the value was actually valid. PARSE_INTEGER returns this to
PARSE_INT_ARRAY; now, PARSE_INT_ARRAY realizes that its parameter
isn't a valid integer array -- it must also return a success/failure
flag to PROCESS_KEYWORD; PROCESS_KEYWORD must pass it back up to
PARSE_KEYWORDS; PARSE_KEYWORDS should return it to ALTFILE; finally,
ALTFILE informs its caller that the operation failed.
Let's look at a particular specimen of one of these procedures;
say, the portion that handles the keyword FOOBAR, which the user
should specify in conjunction with an integer array, a string, and two
dates:
...
IF KEYWORD="FOOBAR" THEN
BEGIN
GET_SUBPARM (0, PARM_STRING);
IF PARSE_INT_ARRAY (PARM_STRING, SP0_VALUE) = FALSE THEN
PARSE_KEYWORD:=FALSE
ELSE
BEGIN
GET_SUBPARM (1, PARM_STRING);
IF PARSE_STRING (PARM_STRING, SP1_VALUE) = FALSE THEN
PARSE_KEYWORD:=FALSE
ELSE
BEGIN
GET_SUBPARM (2, PARM_STRING);
IF PARSE_DATE (PARM_STRING, SP2_VALUE) = FALSE THEN
PARSE_KEYWORD:=FALSE
ELSE
BEGIN
GET_SUBPARM (3, PARM_STRING);
PARSE_KEYWORD:=
PARSE_DATE (PARM_STRING, SP3_VALUE);
END;
END;
END;
END;
...
Of course, the same sort of thing has to be repeated in every
procedure in the calling sequence; the moment an error return is
detected from one of the called procedures, the other calls have to be
skipped, and the error condition should be passed back up to the
caller.
Error handling, of course, is important business, and it would
hardly be appropriate to crash and burn just because the user inputs a
bad value (users input bad values all the time). Still, all this work
just to catch the error condition?
What we really want to do in this case is to
* HAVE WHOEVER DETECTS THE ERROR CONDITION AUTOMATICALLY RETURN ALL
THE WAY TO THE TOP OF THE CALLING SEQUENCE.
In other words, the error finder might have code that looks like:
NUM:=BINARY (STR, LEN);
IF CCODE<>CCE THEN { an error detected? }
SIGNAL_ERROR; { return to the top! }
The procedure we want to return to would indicate its desire to catch
these errors by saying something like:
ON ERROR DO
{ the code to be activated when the error is detected };
RESULT:=ALTFILE (FILE, KEYWORDS);
Finally, the intermediate procedures can now be the soul of
simplicity:
...
IF KEYWORD="FOOBAR" THEN
BEGIN
GET_SUBPARM (0, PARM_STRING);
PARSE_INT_ARRAY (PARM_STRING, SP0_VALUE);
GET_SUBPARM (1, PARM_STRING);
PARSE_STRING (PARM_STRING, SP1_VALUE);
GET_SUBPARM (2, PARM_STRING);
PARSE_DATE (PARM_STRING, SP2_VALUE);
GET_SUBPARM (3, PARM_STRING);
PARSE_DATE (PARM_STRING, SP3_VALUE);
END;
...
Thus, the three components of this scheme:
* The code that finds the error -- it "SIGNALS THE ERROR";
* The code that should be branched to in case of error is somehow
indicated, at compile time or run time (but before the error is
actually signaled).
* Finally, the intermediate code knows nothing about the possible
error condition. It's automatically exited by the error signaling
mechanism.
For want of a better name, I'll call this concept a "Long Jump". It's
also been called a "non-local GOTO", a "throw", a "signal raise", and
other unsavory things, but "Long Jump" -- which happens to be the C
name for it -- sounds more romantic.
LONG JUMPS, CONTINUED -- SOLUTIONS AND PROBLEMS
I've indicated the need -- or at least, I think it's a need -- and
a possible prototype solution. There are several implementations of
this already extant, each with its own little quirks and problems.
PASCAL -- STANDARD AND /3000
The only mechanism Standard PASCAL and PASCAL/3000 give you to
solve our problem is the GOTO. In PASCAL, you're allowed to GOTO out
of a procedure or function; however, you can only branch INTO the
main body of the program or from a nested procedure into the procedure
that contains it. In other words, if you have
PROCEDURE P;
PROCEDURE INSIDE_P; { nested in P }
BEGIN
...
END;
BEGIN
...
END;
PROCEDURE Q;
BEGIN
...
P;
...
END;
then you can branch from INSIDE_P into P, but you can't branch from P
into Q, even though Q calls P.
Even if this restriction weren't present, the GOTO to a fixed label
still wouldn't be the right answer -- what if our PARSE_KEYWORDS
procedure is called from two places? Surely we wouldn't want an error
condition to cause a branch to the same location in both cases!
Besides, if we want to compile PARSE_KEYWORDS separately from its
caller, we'd have to allow "global label variables". In reality,
PASCAL can't do these "long jumps".
SPL
SPL has a different and rather better facility. In SPL, you can't
branch from one procedure into another; however, you CAN pass a label
as a parameter to a procedure. Thus, you could write:
PROCEDURE PARSE'INT'ARRAY (PARM, RESULT, ERR'LABEL);
BYTE ARRAY PARM;
INTEGER ARRAY RESULT;
LABEL ERR'LABEL;
BEGIN
...
IF << test for error condition >> THEN
GOTO ERR'LABEL;
...
END;
Then, you might call this from within PROCESS'KEYWORD by saying
PROCEDURE PROCESS'KEYWORD (KEYWORD'AND'PARM, ERR'LABEL);
BYTE ARRAY KEYWORD'AND'PARM;
LABEL ERR'LABEL;
BEGIN
...
IF KEYWORD="FOOBAR" THEN
BEGIN
GET'SUBPARM (0, PARM'STRING);
PARSE'INT'ARRAY (PARM'STRING, SP0'VALUE, ERR'LABEL);
...
END;
...
END;
When you call PARSE'INT'ARRAY, you pass it the label to which it
should return in case of error -- in this case, also called ERR'LABEL,
which was also passed to this procedure. Finally, the topmost
procedure -- ALTFILE -- might say:
RESULT:=ALTFILE (FILENAME, KEYWORDS, GOT'ERROR);
...
GOT'ERROR:
<< report to the user that an error occurred >>
The key point here is that each procedure doesn't really return
directly to the top; rather, it returns to the error label that it was
passed by its caller. Since that may well be the label passed by the
caller's caller, and so on, you get a sort of "daisy chain" effect by
which you can easily exit ten levels of procedures in one GOTO
statement.
At this point, I think it's quite important to mention a SEVERE
PROBLEM of these "long jumps" that I think any implementation
mechanism has to be able to address:
* THE VERY ESSENCE OF A LONG JUMP IS THAT IT BYPASSES SEVERAL OF
THE PROCEDURES IN THE CALLING SEQUENCE. A PROCEDURE (say, our
PROCESS_KEYWORD) CALLS ANOTHER PROCEDURE, EXPECTING THE CALLEE TO
RETURN, BUT THE CALLEE NEVER DOES!
Imagine for a moment that PROCESS_KEYWORD opened a file, intending
to close it at the end of the operation; after the long jump branches
out of it, the file will remain open. Any other kind of cleanup --
resetting temporarily changed global variables, releasing acquired
resources -- that a procedure expects to do at the end might remain
undone because the procedure will be branched out of.
Similarly, what if a procedure EXPECTS another procedure that it
calls to detect an error condition? What is a fatal error under some
circumstances may be quite normal under others; for instance, say you
have a procedure that reads data from a file and signals an error if
the file couldn't be opened -- in some cases, you may expect the file
to be unopenable, and have a set of defaults you want to use instead.
By using the convenience of long jumps, you lose the certainty that
every procedure has complete control over its execution, and can be
sure that any procedure it calls will always return.
The advantage of SPL's approach is that you could call a procedure
passing to it any error label you want to. For instance,
PROCESS'KEYWORD might look like:
PROCEDURE PROCESS'KEYWORD (KEYWORD'AND'PARM, ERR'LABEL);
BYTE ARRAY KEYWORD'AND'PARM;
LABEL ERR'LABEL;
BEGIN
INTEGER FNUM;
FNUM:=FOPEN (KEY'INFO'FILE, 1);
...
IF KEYWORD="FOOBAR" THEN
BEGIN
GET'SUBPARM (0, PARM'STRING);
PARSE'INT'ARRAY (PARM'STRING, SP0'VALUE, CLOSE'FILE);
...
END;
...
FCLOSE (FNUM, 0, 0);
RETURN; << if we finished normally, close the file and return >>
CLOSE'FILE: << branch here in case of error >>
FCLOSE (FNUM, 0, 0);
GOTO ERR'LABEL;
END;
Because you have complete control over each branch, you don't HAVE to
pass the procedure you call the same error label that you were passed;
if you want to do some cleanup, you can just pass the label that does
the cleanup, and THEN returns to your own error label.
Thus, with SPL's label parameter system, you get the best of both
worlds:
* If you pass an "error label" to a procedure, the procedure may
choose to return normally or to return to the error label.
* Since you can pass the same label to a procedure you call as the
one that you yourself were passed, a single GOTO to that label
can conceivably exit any number of levels of procedures.
* On the other hand, if you want to do some cleanup in case of an
error, you can just pass a different label, one that points to
the cleanup code.
* Finally -- if you want to -- you can actually pass several labels
to a procedure, allowing it to return to a different one
depending on what error condition it finds. A bit extravagant for
my blood, but maybe I'm just too stodgy.
The only problems that this system has are:
* You have to pass the label to any procedure that might
conceivably want to participate in a long jump -- either the
procedure that initially detects the error or any one that wants
to pass it on. This may often mean that virtually every one of
your procedures will have to have this error label parameter. Not
a very unpleasant problem, but a bit of a bother nonetheless.
* Similarly, there are some procedures whose parameters you can't
dictate; for instance, control-Y trap procedures (ones in which a
long jump to the control-Y handling code may often be just the
thing you want to do). Other trap procedures (arithmetic,
library, and system) are just like this, too, as are those which
are themselves passed as "procedure parameters" to other
procedures and whose parameters are dictated by those other
procedures (got that?).
Besides these minor problems, though, SPL's long jump support is quite
reasonably done.
PROPOSED ANSI STANDARD C
C's "GOTO" doesn't allow any branch from one function to another;
neither does C provide label parameters like SPL does. Long jumps in C
are accomplished with a different mechanism, involving the SETJMP and
LONGJMP built-in procedures.
SETJMP is a procedure to which you pass a record structure (of the
predefined type "jmp_buf"). When you first call it, it saves all the
vital statistics of the program -- the current instruction pointer,
the current top-of-stack address, etc. -- in this record structure.
Then, when the same record structure is passed to LONGJMP, LONGJMP
uses this information to restore the instruction pointer and stack
pointer to be exactly what they were at SETJMP time. Thus, control is
passed back to the SETJMP location, wherever it may be.
A typical application of this might be:
jmp_buf error_trapper;
proc()
{
...
if (setjmp(error_trapper) != 0)
/* do error processing */;
else
{
result = altfile (filename, keywords);
...
}
...
}
...
int parse_integer (str)
char str[];
{
...
if (bad_value)
longjmp (error_trapper, 1);
...
}
One thing I didn't, as you see, mention at first was the "IF
(SETJMP(ERROR_TRAPPER) != 0)". Well, since the LONGJMP jumps DIRECTLY
to the instruction following the SETJMP, we have to have some way of
distinguishing the first time it is executed (after a legitimate
SETJMP) and the next time (after the LONGJMP which transferred control
back to it). The initial SETJMP, you see, returns a 0; a LONGJMP takes
its second parameter (in this case, a 1), and returns it as the
"result" of SETJMP.
Thus, when the IF statement is first executed, the value of the
"(SETJMP ... != 0)" will be FALSE, and the ALTFILE will be done; when
the IF is executed a second time, the value will be TRUE, and the
error processing will be performed.
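To see the two passes through the IF in action, here is a small self-contained sketch in standard C. The functions try_add and add_two are hypothetical names invented for this illustration (ALTFILE's real parsing is far more involved); parse_integer uses the standard strtol routine rather than anything HP-specific.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdlib.h>

static jmp_buf error_trapper;     /* global jump buffer, as in the text */

/* Lowest level: converts a decimal string, long-jumping out on bad input. */
static int parse_integer(const char *str)
{
    char *end;
    long value = strtol(str, &end, 10);
    if (end == str || *end != '\0')
        longjmp(error_trapper, 1);    /* never returns to the caller */
    return (int) value;
}

/* Intermediate level: knows nothing about the error path. */
static int add_two(const char *a, const char *b)
{
    return parse_integer(a) + parse_integer(b);
}

/* Top level: returns 0 and stores the sum, or -1 if any parse failed. */
int try_add(const char *a, const char *b, int *sum)
{
    assert(sum != NULL);
    if (setjmp(error_trapper) != 0)
        return -1;                    /* second pass: arrived via longjmp */
    *sum = add_two(a, b);             /* first pass: setjmp returned 0 */
    return 0;
}
```

Calling try_add with "2" and "40" stores 42 and returns 0; calling it with "2" and "4x" returns -1 and leaves the caller's sum untouched, because add_two is abandoned in mid-expression.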
Note the distinctive features of the SETJMP/LONGJMP construct:
* The "jump buffer" -- set by SETJMP and used by LONGJMP -- need
not be passed as a parameter to each procedure that needs it
(although it could be). Typically, it's stored as a global
variable (which the SPL error label parameter couldn't be).
* You still have control over procedures you call; if you want to
trap their jump yourself (either to do some cleanup or treat it
as a normal condition), you can just do your own SETJMP using the
same buffer that they'll LONGJMP with.
* On the other hand, if you want to do some cleanup and then continue
the LONGJMP process -- propagate it back up to the original error
trapper, in this case PROC -- you have to do more work. You must
save the original jump buffer in a temporary variable before
doing the SETJMP, and restore it before continuing the LONGJMP
(or simply returning from the procedure). For instance,
PROCESS_KEYWORD might look like this:
process_keyword (keyword_and_parm)
char keyword_and_parm[];
{
jmp_buf save_error_trapper; /* declare our temporary save buffer */
int fnum;
fnum = fopen (key_info_file, 1);
/* jmp_buf is an array type, so copy it with memcpy (from <string.h>) */
memcpy (save_error_trapper, error_trapper, sizeof (jmp_buf));
if (setjmp (error_trapper) != 0)
/* Must be an error condition */
{
fclose (fnum, 0, 0);
memcpy (error_trapper, save_error_trapper, sizeof (jmp_buf));
longjmp (error_trapper, 1);
}
...
if (strcmp (keyword, "foobar") == 0)
{
get_subparm (0, parm_string);
parse_int_array (parm_string, sp0_value);
...
}
...
fclose (fnum, 0, 0);
/* restore for future use */
memcpy (error_trapper, save_error_trapper, sizeof (jmp_buf));
}
Frankly speaking, if you ask me -- and even if you don't -- this
doesn't look very clean. I'd like to see some way of automatically
"stacking" SETJMPs so that the system would do the saving of the old
jump buffer for you; also, I'd prefer not to have to type that ugly
"IF (SETJMP ... != 0)" kludge. On the other hand, this can be made
quite palatable-looking with a few macros, and it's better than
nothing (or is it?).
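Here is the save-and-restore dance boiled down to something you can compile and run. It is only a sketch: the counter open_resources stands in for the open file, the names middle, run, and fail_deep_inside are hypothetical, and it assumes that a jmp_buf can be copied bytewise with memcpy, which is common practice but not something the C standard promises.

```c
#include <assert.h>
#include <setjmp.h>
#include <string.h>

static jmp_buf error_trapper;     /* the handler currently in effect */
static int open_resources = 0;    /* stand-in for an open file */

static void fail_deep_inside(void)
{
    longjmp(error_trapper, 1);
}

/* Acquires a resource, traps any long jump to release it, then re-raises. */
static void middle(int should_fail)
{
    jmp_buf saved;

    open_resources++;
    memcpy(saved, error_trapper, sizeof(jmp_buf));    /* save caller's handler */
    if (setjmp(error_trapper) != 0) {
        open_resources--;                             /* cleanup on error */
        memcpy(error_trapper, saved, sizeof(jmp_buf));
        longjmp(error_trapper, 1);                    /* continue the jump */
    }
    if (should_fail)
        fail_deep_inside();
    open_resources--;                                 /* cleanup, normal path */
    memcpy(error_trapper, saved, sizeof(jmp_buf));    /* restore for caller */
}

/* Top level: returns 0 normally, -1 if a long jump reached it. */
int run(int should_fail)
{
    if (setjmp(error_trapper) != 0)
        return -1;
    middle(should_fail);
    return 0;
}
```

Either way, open_resources is back to zero when run returns, which is the whole purpose of the exercise; the price is three memcpy calls and a cleanup that has to be written twice.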
PASCAL/XL AND THE TRY..RECOVER
The authors of PASCAL/XL -- perhaps because they were faced with
the non-trivial task of building a language that MPE/XL could be
profitably written in -- must have given this subject a great deal of
thought. And, fortunately, they've come up with what I think to be a
very powerful construct.
TRY
statement1;
statement2;
...
statementN;
RECOVER
recoverycode;
The behavior here is
* EXECUTE statement1 THROUGH statementN. IF ANY PASCAL ERROR (e.g.
giving a bad numeric value to a READLN) OR A CALL TO THE BUILT-IN
"ESCAPE" PROCEDURE OCCURS WITHIN THESE STATEMENTS, CONTROL IS
TRANSFERRED TO recoverycode, AND AFTER THAT TO THE STATEMENT
FOLLOWING TRY..RECOVER.
This, as you see, allows you to put a TRY..RECOVER into the
top-level procedure (in our case, PROC or ALTFILE) and an ESCAPE call
in any of the called procedures (e.g. PARSE_INTEGER) that detects a
fatal error.
The best part, though, is that any procedure that wants to
establish some sort of "cleanup" code can do this trivially! For
instance, our PROCESS_KEYWORD might say:
PROCEDURE PROCESS_KEYWORD (VAR KEYWORD_AND_PARM: STRING);
VAR FNUM: INTEGER;
SAVE_ESCAPECODE: INTEGER;
BEGIN
FNUM:=FOPEN (KEY_INFO_FILE, 1);
TRY
...
IF KEYWORD="FOOBAR" THEN
BEGIN
GET_SUBPARM (0, PARM_STRING);
PARSE_INT_ARRAY (PARM_STRING, SP0_VALUE);
END;
...
FCLOSE (FNUM, 0, 0);
RECOVER
BEGIN
SAVE_ESCAPECODE:=ESCAPECODE;
FCLOSE (FNUM, 0, 0);
ESCAPE (SAVE_ESCAPECODE);
END;
END;
If any error occurs in the code between TRY and RECOVER, the BEGIN/END
in the RECOVER part is triggered. This is now free to close the file,
or do whatever else it needs to, and then "pass the error down" by
calling ESCAPE again.
This ESCAPE -- since it's no longer between this TRY and RECOVER --
will activate the previously defined TRY/RECOVER block (say, in the
PARSE_KEYWORDS procedure) which might do more cleanup and then call
ESCAPE again. Eventually, the error will percolate to the top-most
TRY/RECOVER, which will just do some work and not call ESCAPE any
more, continuing with the rest of the program.
In other words, "TRY .. RECOVER"s can be nested. In the following
piece of code
TRY
A;
TRY
B;
TRY
C;
RECOVER
R1;
D;
RECOVER
R2;
E;
RECOVER
R3;
* An error or ESCAPE in C will cause a branch to R1.
* An error/ESCAPE in B or D will, of course, branch to R2 (since B
and D are outside the innermost TRY .. RECOVER R1). However, an
error/ESCAPE in R1 will also cause a branch to R2! That's because
R1 is also out of the area of effect of the innermost TRY ..
RECOVER.
In other words, the "recovery handler" R1 is only "established"
between the innermost TRY and the innermost RECOVER; when it's
actually "triggered", it's disestablished, and the recovery
handler that was previously in effect is re-established.
* By this token, an error/ESCAPE in A, E, or R2 will branch to R3.
* And, finally, an error in R3 -- or anywhere else outside of the
TRY .. RECOVER -- will actually abort the program with an error
message.
As you see, then, all is for the best in this best of all possible
worlds. We can do long jumps "up the stack" to the RECOVER code, but
each intervening procedure can also easily set up "cleanup code" that
needs to be executed before the long jump can continue.
Several notes:
* First of all, remember that the RECOVER statement is executed
ONLY in case of an error or an ESCAPE. If the statements between
TRY and RECOVER finish normally, any "cleanup" code you may have
inside the RECOVER will NOT be executed. That's why our sample
program has two FCLOSEs -- one for the normal case and one for
the cleanup case.
* Note also that the ESCAPE call can take a parameter (just like
C's LONGJMP). This parameter is then available as the variable
ESCAPECODE in the RECOVER handler, and is used to indicate what
kind of error or ESCAPE happened.
A RECOVER handler might, for instance, be used to avoid an abort
caused by an expected error condition (e.g. file I/O error);
however, if it sees that ESCAPECODE indicates some other,
unexpected, error condition, it might terminate or call ESCAPE
again, hoping that some "higher-level" RECOVER block can handle
the error.
* Finally, if a RECOVER block wants to continue the long jump after
doing its cleanup work, it often needs to pass the ESCAPECODE up
as well (unless, of course, the higher-level RECOVER handler
won't use the ESCAPECODE). Unfortunately, the PASCAL/XL manual
explicitly tells us:
- "It is wise to assign the result of the ESCAPECODE function
to a local variable immediately upon entering the RECOVER
part of a TRY-RECOVER construct, because the system can
change that value later in the RECOVER part."
This is too bad; it would have been nice to have TRY .. RECOVER
do this saving for you automatically, saving you the burden of
having to declare and set an extra local variable. Still, we
oughtn't look a gift horse in the mouth.
Note, incidentally, how C's #define macro facility can come to our
aid if we want to implement this same construct in C. All we need is
three #defines:
int escapecode;
int jump_stack_ptr = -1;
jmp_buf jump_stack[100]; /* the stack used to do nesting */
#define TRY if (setjmp(jump_stack[++jump_stack_ptr])==0) {
#define RECOVER jump_stack_ptr--; } else
#define ESCAPE(parm) \
{ \
escapecode = parm; \
longjmp(jump_stack[jump_stack_ptr--], 1); \
}
This would allow us to say:
TRY
code;
RECOVER
errorhandler;
and
ESCAPE(value);
just like we could in PASCAL/XL! Note how we've added this entirely
new control structure without any changes to the compiler -- nothing
more complicated than a few #defines! (Many thanks to Tim Chase of CCS
for showing me how to do this!)
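To convince yourself that these macros really nest the way PASCAL/XL's construct does, here is a small runnable sketch built on them. The procedures parse_digit, middle, and digit_value and the cleanups counter are hypothetical, made up for the test. Two limitations are worth stating loudly: you must never RETURN or BREAK out of the TRY part (the stack pointer would not be popped), and nothing checks for more than 100 nested TRYs.

```c
#include <assert.h>
#include <setjmp.h>

static int escapecode;
static int jump_stack_ptr = -1;
static jmp_buf jump_stack[100];   /* the stack used to do nesting */

#define TRY      if (setjmp(jump_stack[++jump_stack_ptr]) == 0) {
#define RECOVER  jump_stack_ptr--; } else
#define ESCAPE(parm)                                  \
    {                                                 \
        escapecode = (parm);                          \
        longjmp(jump_stack[jump_stack_ptr--], 1);     \
    }

static int cleanups = 0;          /* counts how often cleanup code ran */

/* Lowest level: escapes with the offending character as the code. */
static void parse_digit(char c, int *out)
{
    if (c < '0' || c > '9')
        ESCAPE(c);
    *out = c - '0';
}

/* Intermediate level: does its cleanup, then passes the escape along. */
static void middle(char c, int *out)
{
    TRY
        parse_digit(c, out);
    RECOVER
    {
        int saved = escapecode;   /* save it first, as the manual advises */
        cleanups++;
        ESCAPE(saved);
    }
}

/* Top level: returns 0 on success or the escape code on failure. */
int digit_value(char c, int *out)
{
    int status = 0;
    TRY
        middle(c, out);
    RECOVER
        status = escapecode;
    return status;
}
```

With a good digit, both TRY blocks finish normally and the stack pointer returns to -1; with a bad one, the escape lands first in middle's RECOVER and then in digit_value's, exactly the R1-then-R2 percolation described above.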
NESTED PROCEDURES
An interesting feature of PASCAL is its ability to have procedures
nested within other procedures. In other words, I could say:
PROCEDURE PARSE_THING (VAR THING: STRING);
VAR CURR_PTR, CURR_DELIMS: INTEGER;
QUOTED: BOOLEAN;
...
PROCEDURE PARSE_CURR_ELEMENT (...);
BEGIN
...
END;
BEGIN
...
PARSE_CURR_ELEMENT (...);
...
END;
PARSE_CURR_ELEMENT here is just like a local variable of PARSE_THING
-- it's a local procedure. It's callable only from within PARSE_THING
and not from any other procedure in the program. More importantly,
* THE NESTED PROCEDURE (PARSE_CURR_ELEMENT) CAN ACCESS ALL OF
PARSE_THING'S LOCAL VARIABLES.
This is a significant consideration. If PARSE_CURR_ELEMENT didn't need
to access PARSE_THING's local variables, not only could it be a
different (non-nested) procedure, but it probably should be. When a
procedure is entirely self-contained, it's usually a good idea to make
it accessible to as many callers as possible.
On the other hand, what if PARSE_CURR_ELEMENT needs to interrogate
CURR_PTR to find out where we are in parsing the thing; or look at or
modify CURR_DELIMS or QUOTED or whatever other local variables are
relevant to the operation?
We don't want to have to pass all these values as parameters --
there could be dozens of them.
We don't want to make them global variables, since they're really
only relevant to PARSE_THING -- why make them accessible by other
procedures that have no business messing with them? (Incidentally,
making the variables global will also prevent PARSE_THING from calling
itself recursively.)
But, on the other hand, we certainly DO want to have
PARSE_CURR_ELEMENT be a procedure -- after all, we might need to call
it many times from within PARSE_THING; surely we don't want to repeat
the code every time!
Thus, the main advantage of nested procedures is not just that,
like local variables, they can only be accessed by the "nester".
Rather, the advantage is the fact that they can share the nester's
local variables, which are often quite relevant to what the nested
procedure is supposed to do.
Another substantial benefit comes when you pass procedures as
parameters to other procedures. A good example of this might be a
report writer procedure:
TYPE LINE_TYPE = PACKED ARRAY [1..256] OF CHAR;
PROCEDURE PRINT_LINE (VAR LINE: LINE_TYPE;
LINE_LEN: INTEGER;
PROCEDURE PAGE_HEADER (PAGENUM: INTEGER);
PROCEDURE PAGE_FOOTER (PAGENUM: INTEGER));
This procedure takes the line to be output and its length, but also
takes two procedures -- one that will be called in case a page header
should be printed and one in case a page footer should be printed. The
utility of this is obvious -- it gives the user the power to define
his own header and footer format.
Now, let's say we have the following procedure:
PROCEDURE PRINT_CUST_REPORT (VAR CATEGORY: INTEGER);
VAR CURRENT_COUNTRY: PACKED ARRAY [1..40] OF CHAR;
...
BEGIN
...
PRINT_LINE (OUT_LINE, OUT_LINE_LEN,
MY_PAGE_HEAD_PROC, MY_PAGE_FOOT_PROC);
...
END;
PRINT_LINE will output OUT_LINE and, in some cases, call
MY_PAGE_HEAD_PROC or MY_PAGE_FOOT_PROC. Now, it makes sense for you to
want these procedures to print, say, the current value of CATEGORY
and, perhaps, CURRENT_COUNTRY.
In C and SPL, which have no nested procedures, both
MY_PAGE_HEAD_PROC and MY_PAGE_FOOT_PROC would have to be separate
procedures which have no access to PRINT_CUST_REPORT's local
variables.
The variables would either have to be global (which is quite
undesirable) or would somehow have to be passed to PRINT_LINE, which
in turn would pass them to the MY_PAGE_xxx_PROC procedures.
This would be quite cumbersome, since in PRINT_CUST_REPORT the
header and footer procedures need to be passed an integer and a PACKED
ARRAY OF CHAR, whereas in some other application of PRINT_LINE they
would have to be passed, say, three floats and a record structure.
In PASCAL, on the other hand, both MY_PAGE_HEAD_PROC and
MY_PAGE_FOOT_PROC can be nested within PRINT_CUST_REPORT and thus have
access to CATEGORY and CURRENT_COUNTRY (and all the other local
variables of the PRINT_CUST_REPORT procedure). Another useful
application for nested procedures.
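For what it's worth, one common C workaround for the "different callers need different data" problem is to have the report writer accept one untyped pointer and hand it back, untouched, to the header procedure. This is a technique I am swapping in for illustration, not something from the PASCAL example; the type page_proc, the structure cust_report_state, and the simplified print_line (header only, called on every line) are all hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical callback type: the page number plus an opaque pointer
   that print_line passes through without looking at it. */
typedef void (*page_proc)(int pagenum, void *context);

static char last_header[80];      /* kept so the result can be inspected */

void print_line(const char *line, page_proc page_header, void *context)
{
    assert(line != NULL && page_header != NULL);
    page_header(1, context);      /* a real version would call this only at page breaks */
    puts(line);
}

/* The "locals" that a nested PASCAL procedure would see directly. */
struct cust_report_state {
    int category;
    char current_country[41];
};

static void my_page_head_proc(int pagenum, void *context)
{
    const struct cust_report_state *s = context;
    snprintf(last_header, sizeof last_header, "Page %d  category %d  %s",
             pagenum, s->category, s->current_country);
    puts(last_header);
}

void print_cust_report(int category)
{
    struct cust_report_state state;
    state.category = category;
    strcpy(state.current_country, "USA");
    print_line("first detail line", my_page_head_proc, &state);
}
```

It works, and print_line never needs to know what is inside the structure, but the compiler can no longer check that the header procedure and its caller agree on the type behind the pointer. The nested PASCAL procedure gets the same effect with full type checking.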
C, as I mentioned, has no nested procedure support at all. On the
other hand, it does have #DEFINEs, which allow you to define text
substitutions that can often do the job (see the section on DEFINES)
of a nested procedure, especially if it's a small one. For instance,
you can say:
#define foo(x,y) \
{ \
int a, b; /* variables local to THIS DEFINE */ \
a = x + parm1; /* access a variable local to the procedure */ \
b = y * parm2; /* (the nesting procedure) */ \
x = a + b; \
y = a * b; \
}
As you can see, C's support for "block-local" variables -- local
variables that are local not just to the procedure, but rather to the
"{"/"}" block in which they're defined -- allows you to have #DEFINEs
that are almost as powerful as real procedures.
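A compilable version of the same idea, with a hypothetical caller named combine supplying parm1 and parm2, might look like this. Note that each comment sits before the backslash, since the backslash has to be the last character on a continued line; I have also parenthesized the macro arguments, which the sketch above does not.

```c
#include <assert.h>

#define foo(x,y)                                                  \
    {                                                             \
        int a, b;             /* local to this expansion */       \
        a = (x) + parm1;      /* parm1 belongs to the caller */   \
        b = (y) * parm2;      /* so does parm2 */                 \
        (x) = a + b;                                              \
        (y) = a * b;                                              \
    }

/* Hypothetical "nesting procedure": its parameters are visible to foo. */
int combine(int parm1, int parm2)
{
    int p = 2, q = 3;
    assert(parm2 != 0);
    foo(p, q);
    return p + q;
}
```

With parm1 = 10 and parm2 = 4, the macro computes a = 12 and b = 12, so p becomes 24, q becomes 144, and combine returns 168. The catch is that a caller whose own variables happen to be named a or b cannot safely pass them to foo, since the block-local declarations would hide them.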
SPL allows you to have "SUBROUTINE"s nested within procedures, but
subject to some rather stringent restrictions:
* The subroutines can have no local variables of their own. This is
a pretty severe problem, since it means that all your local
variables have to be declared in the nesting procedure, which
increases the likelihood of errors and also prohibits you from
calling the subroutine recursively (which you would otherwise be
able to do).
* The subroutines can not be passed as procedure parameters to
other procedures (only procedures can be -- try parsing that!).
* Furthermore, this nesting capability goes to only one level; you
can nest SUBROUTINEs in PROCEDUREs, but you can't nest anything
within SUBROUTINEs. In PASCAL, procedures can be nested within
each other to an arbitrary number of levels. Frankly speaking,
I'm hard put to think of an application for triply-nested
procedures.
Practically, you'll have to decide for yourself whether PASCAL's
nested procedure support -- and C's lack of it -- is important to you.
I brought this issue up to a C partisan, and she replied that she's
simply never run into a case where nested procedures were all that
important. Upon thinking about this, I found myself forced to agree,
at least partially:
* #DEFINEs can do much of the job that nested procedures are needed
for;
* Many procedures should NOT be nested, but rather be made
self-contained and made available to the world at large (rather
than just to a particular procedure).
* If the reason you don't want to declare your variables as global
is that you want to "hide" them from other procedures, you can do
this in C by making them "static". This will make them available
only to the procedures in the file in which they're defined. This
allows you to share data between procedures (which you might
otherwise have wanted to nest within each other) without making
the data readable and modifiable by everybody.
* On the other hand, there's no denying that there are cases in
which PASCAL's nested procedures are quite a bit superior to any
C or SPL alternative. For instance, a recursive procedure might
well not be able to use the "static global variable" approach I
just mentioned.
DATA TYPES
The difference most often cited between PASCAL and C is the way
that they treat data types. PASCAL is often considered a "strict type
checking" language and C a "loose type checking" language, and that's
true enough. However, the effects of this philosophical difference are
subtler and more pervasive than at first glance appears.
What are data types? Data types can be seen in the earliest of
languages, from FORTRAN and COBOL onwards. When you declare a variable
to be a certain data type, you give certain information to the
compiler -- information that the compiler must have to produce correct
code. Historically, this information has included:
* What the various operators of the language MEAN when applied to
the variable. "+", for instance, isn't just "addition" -- when
you add two integers, it's integer addition, and when you add two
reals, it's real addition. Two entirely different operations,
with entirely different machine language opcodes and (possibly)
different effects on the system state. Similarly, a FORTRAN
"DISPLAY X" means:
- If X is a string, print it verbatim;
- If X is an integer, print its ASCII representation;
- If X is a real, print its ASCII representation, but in a
different format and with a different conversion mechanism.
* How much SPACE is to be allocated for the variable. "Array of 20
integers" is a type, too, one from which the compiler can exactly
deduce how much memory (20 words) needs to be allocated to fit
this data.
If you look at SPL (and, incidentally, FORTRAN and other languages),
you'll find that all of its type declarations essentially aim at
serving these two functions. However, in recent times, a few other
functions have been ascribed to type declarations:
* Using type declarations, the compiler can DETECT ERRORS that you
may make. The compiler can't, of course, figure out if your
program does "the right thing" since it doesn't know what the
right thing is; however, it can see if there are any internal
inconsistencies in your program.
For instance, if you're multiplying two strings, the compiler can
tag that as an obvious error; similarly, if you pass a string
parameter to a procedure that expects an integer (or vice versa),
a good compiler will find this and save you a lot of run-time
debugging. The more elaborate and precise the type specifications
you give, the more error checking the compiler can do.
Error checking can also be provided at run time, where code that
knows what size arrays are, for instance, can make sure that you
don't inadvertently index outside them. PASCAL's "subrange types"
do this sort of thing, too, allowing you to declare what values
(e.g. "0 to 100") a variable may take and triggering an error
when you try assigning it an invalid value.
* Furthermore, with a type declaration, the compiler can SAVE WORK
for you by automatically defining special tools for the given
type.
The classic example of this is the record structure -- by
declaring the structure, you're automatically defining a set of
"operators" (one for each field of the structure) that allow you
to easily access the structure. Similarly, enumerated types can
save you the burden of having to manually allocate distinct
values for each of the elements in the enumeration (admittedly,
not a very large burden).
Some fancy compilers can even automatically define "print"
operations for each record structure, so that you can easily dump
it in a legible format to the terminal without having to print
each element individually.
* Good type handling provisions can INSULATE YOUR PROGRAMS FROM
CHANGES IN YOUR DATA'S INTERNAL REPRESENTATION. For instance, if
the compiler allows you to refer to a field of a record as, say,
"CUSTREC.NAME" instead of "CUSTREC(20)", then you can easily
reformat the insides of the record (adding new fields, changing
field sizes, etc.) without having to change all places that
reference this record.
Similarly, if your language allows functions to return records
and arrays as well as scalars, you can easily change the type of
your, say, disc addresses from a 2-word double integer to a
10-word array of integers. In SPL, for instance, such a change
would require rewriting all procedures that want to return such
objects or to take them as "by-value" parameters. Even changing a
value from an "integer" to a "double integer" in SPL will require
you to change a great deal of code.
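As a rough illustration of the last point, here is a C sketch that assumes a compiler which lets structures be passed and returned by value (not every early C compiler did). The type disc_addr and both function names are hypothetical. The callers never look inside the structure, so its contents could later grow from one field to a 10-word array without any caller changing.

```c
/* Hypothetical disc address type; today it holds a single number. */
typedef struct {
    long sector;
} disc_addr;

/* Takes a disc address by value and returns a new one by value. */
disc_addr next_addr(disc_addr a)
{
    a.sector = a.sector + 1;
    return a;
}

/* The only other routine that knows what is inside a disc_addr. */
long addr_as_number(disc_addr a)
{
    return a.sector;
}
```

Code that merely stores, copies, passes, and returns disc_addr values is insulated from the representation; only these two routines would need rewriting.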
The reason I've given this list is that SPL, PASCAL, and C place
different weights on each of these points, and this makes for rather
substantial differences in the way you use these languages.
Now, away from the generalities and on to concrete examples.
RECORD STRUCTURES
Consider for a moment an entry in your "employee" data set. It
could be a file label; it could be a Process Control Block entry; it
could be any chunk of memory that contains various fields of various
data types.
A typical layout of this employee entry (or employee "record")
might be:
Words 0 through 14 - The employee name (a 30-character string);
Words 15-19 - Social security number (10-character string);
Words 20-21 - Birthday (a double integer, just to be interesting);
Words 22-23 - Monthly salary (a real number).
A simple record. It's 24 words long, but it's not really an "array of
24 words"; logically speaking, to you and me, it's a collection of
four objects, each of a different type, each starting at a different
(but constant) offset within the record.
How do we declare a variable to hold this record? In SPL and
FORTRAN, it's easy:
INTEGER ARRAY EMPREC(0:23);
or
INTEGER EMPREC(24)
Short and sweet. The compiler's happy -- it knows that it's an array
of integers, which means you can extract an arbitrary element from it,
and pass it to a procedure (like DBGET), which will receive its
address as an integer pointer. This defines to the compiler the
MEANING of the "indexing" and "pass to procedure" operations that can
be done on EMPREC. Also, the compiler knows that 24 words must be
allocated for this array, as a global or local variable.
The compiler is happy, but are you? First of all, how are you going
to access the various elements of this record structure? Are you going
to say
EMPREC(20)
when you mean the employee's birthday (actually, since it's a double
integer, you couldn't even do that)?
What about error checking? Since all the compiler knows about this
is that it's an integer array, it'll be happy as punch to allow you to
put it anywhere an integer array can go. Would you like to pass it as
the database name to DBGET instead of as the buffer variable? Fine.
Would you like to view it as a 4 by 6 matrix and multiply it by, say,
the department record? The computer will gladly oblige.
Finally, consider the burden this places on you whenever you want
to change the layout of EMPREC -- say, to increase the name from 30
characters to 40. You'll have to change all your "EMPREC(20)"s to
"EMPREC(25)", all your "INTEGER ARRAY EMPREC(0:23)" to "INTEGER ARRAY
EMPREC (0:28)". And, of course, if you forget one or the other -- why,
the compiler will be happy to extract the first two words of the social
security number and treat them as the employee's birthday!
Of course, you're not going to do this. You will certainly not
refer to all the elements of the record structure by their numeric
array indices (although it so happens that most of HP's MPE code does
exactly this). Rather, you'll say (of course, in SPL, you can also do
the same thing with DEFINEs):
EQUATE SIZE'EMPREC = 24;
BYTE ARRAY EMP'NAME (*) = EMPREC(0);
BYTE ARRAY EMP'SSN (*) = EMPREC(15);
DOUBLE ARRAY EMP'BIRTHDATE (*) = EMPREC(20);
REAL ARRAY EMP'SALARY (*) = EMPREC(22);
[Note: The fact that we define, say, EMP'BIRTHDATE and
EMP'SALARY as arrays isn't a problem. If we say EMP'SALARY
with no subscript, it'll refer to the 0th element of this
"array", which is exactly what we want it to do.]
FORTRAN is similar (you'd use an EQUIVALENCE); COBOL is a bit
simpler, allowing you to say (remembering that COBOL doesn't have
REALs):
01 EMPREC.
05 NAME PIC X(30).
05 SSN PIC X(10).
05 BIRTHDATE PIC S9(9) COMP.
05 SALARY PIC S9(5)V9(2) COMP-3.
As you see, COBOL at least has the advantage that it automatically
calculates the indexes of each subfield for you. This is nice,
especially when you change the structure, reshuffling, inserting,
deleting, or resizing fields. On the other hand, I wouldn't call this
a very substantial feature, especially since sometimes you WANT to
manually specify the field offsets (whenever the record structure is
not under your control, like, say, an MPE file label).
To summarize, this "EQUIVALENCE"ing approach that's available in
SPL, FORTRAN, and COBOL saves you from the very substantial bother of
having to hardcode the offsets of all the subfields into your program.
This is certainly a good thing; however, PASCAL and C go substantially
beyond this.
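To make the word-offset bookkeeping concrete, here is a rough C sketch of the same "name the offsets once" idea applied to a raw buffer of 16-bit words. The constant and function names are mine, and I'm assuming the double integer is stored high-order word first; treat it as an illustration of the approach, not as a translation of any of the declarations above.

```c
#include <stdint.h>

/* Word offsets of each field, mirroring the layout given earlier. */
enum {
    EMP_NAME_WORD      = 0,   /* words 0-14  */
    EMP_SSN_WORD       = 15,  /* words 15-19 */
    EMP_BIRTHDATE_WORD = 20,  /* words 20-21 */
    EMP_SALARY_WORD    = 22,  /* words 22-23 */
    EMP_SIZE_WORDS     = 24
};

/* Store a 32-bit birthday into two 16-bit words, high word first. */
void set_birthdate(uint16_t emprec[], uint32_t value)
{
    emprec[EMP_BIRTHDATE_WORD]     = (uint16_t)(value >> 16);
    emprec[EMP_BIRTHDATE_WORD + 1] = (uint16_t)(value & 0xFFFFu);
}

/* Reassemble the 32-bit birthday from its two words. */
uint32_t get_birthdate(const uint16_t emprec[])
{
    return ((uint32_t)emprec[EMP_BIRTHDATE_WORD] << 16)
         | emprec[EMP_BIRTHDATE_WORD + 1];
}
```

If the name grows by five words, only the enum changes. Notice, though, that the compiler still sees nothing but an array of words.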
The most serious problem with what I'll call the "EQUIVALENCE"ing
approach is a rather subtle one, one that I didn't realize until I'd
used it for some time.
The definitions we saw above -- in SPL, FORTRAN, or COBOL --
defined several variables as subfields of another variable. EMP'NAME
and EMP'SSN are subfields of EMPREC. What if we need to declare this
EMPREC twice -- say, in two different procedures?
Clearly we don't want to have to repeat the EQUIVALENCEs in each
procedure. Yet what choice do we have? We might, for instance, set up
each of the subfields as a DEFINE instead of an equivalence, making
the DEFINEs available in all the procedures that reference EMPREC:
DEFINE EMP'NAME = EMPREC(0) #;
DEFINE EMP'SSN = EMPREC(15) #;
DEFINE EMP'BIRTHDATE = EMPREC(20) #;
DEFINE EMP'SALARY = EMPREC(22) #;
but then, since DEFINEs are merely text substitutions and EMPREC is an
integer array, each EMP'xxx will also be an integer array. We'd have
to say
BYTE ARRAY EMPREC'B(*)=EMPREC;
DOUBLE ARRAY EMPREC'D(*)=EMPREC;
REAL ARRAY EMPREC'R(*)=EMPREC;
in each procedure that defines an EMPREC array, and a
DEFINE EMP'NAME = EMPREC'B(0) #;
DEFINE EMP'SSN = EMPREC'B(30) #;
DEFINE EMP'BIRTHDATE = EMPREC'D(10) #;
DEFINE EMP'SALARY = EMPREC'R(11) #;
at the beginning of the program (note that each index is now counted in
units of its own array's elements: bytes for EMPREC'B, two-word
elements for EMPREC'D and EMPREC'R). Still, we'd have to repeat the
BYTE ARRAY, DOUBLE ARRAY, and REAL ARRAY declarations once for each
declaration of EMPREC; and what if we want to call the record
something else, say, to have two records called EMPREC1 and EMPREC2?
* THE PROBLEM WITH DEFINING SUBFIELDS OF A RECORD STRUCTURE USING
THE "EQUIVALENCING" APPROACH IS THAT IT DEFINES THE SUBFIELDS OF
ONLY ONE RECORD STRUCTURE VARIABLE.
WHAT WE WANT IS TO DEFINE A GENERALIZED "TEMPLATE" ONCE AND THEN
APPLY THIS TEMPLATE TO EACH RECORD STRUCTURE VARIABLE WE USE.
In other words, we want to be able to say
DEFINE'TYPE EMPLOYEE'REC (SIZE 24)
BEGIN
BYTE ARRAY NAME (*) = RECORD(0);
BYTE ARRAY SSN (*) = RECORD(15);
DOUBLE ARRAY BIRTHDATE (*) = RECORD(20);
REAL ARRAY SALARY (*) = RECORD(22);
END;
and then declare any particular employee record buffer by saying:
EMPLOYEE'REC EMPREC1;
EMPLOYEE'REC EMPREC2;
Then, we could extract each individual subfield of the record like
this:
NEW'SALARY := EMPREC1.SALARY * 1.1;
The point here is that
* IN ADDITION TO NOT HAVING TO EXPLICITLY SPECIFY THE OFFSET OF THE
SUBFIELD OF THE RECORD (like having to say RECORD(22), an awful
thing to do), WE CAN NOW DEFINE THE LAYOUT OF THE RECORD
STRUCTURE ONCE, REGARDLESS OF HOW MANY VARIABLES WITH THAT
STRUCTURE WE WANT TO DECLARE.
Do you see how nicely this dovetails with the "INSULATING YOUR PROGRAM
FROM CHANGING INTERNAL REPRESENTATION" principle we gave above? The
record structure layout is defined in EXACTLY ONE PLACE in the program
file. We can have a hundred different variables of this type -- none
of them will have to specify the physical size of the buffer or the
offsets of the subfields. Each one will merely refer back to the type
declaration.
Also, we've now announced EMPREC1 to the compiler as being of the
special "EMPLOYEE'REC" type. It's no longer a simple INTEGER ARRAY,
just like any other integer array. Conceivably, if we declare a
procedure to be
PROCEDURE PUT'EMPLOYEE (DBNAME, EMPREC, FRAMASTAT);
INTEGER ARRAY DBNAME;
EMPLOYEE'REC EMPREC;
INTEGER FRAMASTAT;
...
the compiler can warn us that
EMPLOYEE'REC EMPREC;
INTEGER ARRAY DBNAME;
INTEGER FOOBAR;
...
PUT'EMPLOYEE (EMPREC, DBNAME, FOOBAR);
is an invalid call -- it sees that an object of type EMPLOYEE'REC is
being passed in place of an INTEGER ARRAY, and an INTEGER ARRAY is
being passed in place of an EMPLOYEE'REC. Without this error checking,
you'd have to find this problem yourself at run-time, a distinctly
more difficult task.
RECORD STRUCTURES IN PASCAL AND C
What I just gave is the rationale for record structures, mostly for
the benefit of SPL programmers who haven't used PASCAL and C before.
Of course, the only reason I gave it is that PASCAL and C do have
record structure support, remarkably similar support at that. Here's
the way you declare a structure data type in PASCAL:
{ "PACKED ARRAY OF CHAR"s are PASCAL strings }
TYPE EMP_RECORD = RECORD
NAME: PACKED ARRAY [1..30] OF CHAR;
SSN: PACKED ARRAY [1..10] OF CHAR;
BIRTHDATE: INTEGER; { really a double integer }
SALARY: REAL;
END;
...
VAR
EMPREC: EMP_RECORD; { declare a variable called "EMPREC" }
And in C:
typedef
struct {char name[30];
char ssn[10];
long int birthdate;
float salary;
}
emp_record;
...
emp_record emprec; /* declare a variable called "emprec" */
You can see the minor differences -- the type names are different
("float" instead of "REAL", "long int" to mean double integer); the
type name comes at the end of the "typedef"; the newly defined type is
used as a "statement" all its own rather than as part of a VAR statement;
and, of course, everything's written in those CUTE lower-case
characters. In essence, of course, the constructs are absolutely
identical.
The use is identical, as well:
NEW_SALARY := EMPREC.SALARY * 1.1;
new_salary = emprec.salary * 1.1;
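To show the "template" at work on more than one variable, here is a small C sketch built on the typedef above. The function raised_salary is a hypothetical helper of mine, and I pass the record by address, which is the usual C habit.

```c
/* The record layout is written down exactly once... */
typedef struct {
    char  name[30];
    char  ssn[10];
    long  birthdate;
    float salary;
} emp_record;

/* ...and this routine works on ANY variable of that type.  The
   compiler supplies the offset of "salary" for us. */
float raised_salary(const emp_record *e, float factor)
{
    return e->salary * factor;
}
```

Declaring "emp_record emprec1, emprec2;" and calling raised_salary(&emprec1, 1.1) or raised_salary(&emprec2, 1.1) needs no per-variable field definitions at all.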
Incidentally, if we didn't want to define a new type, but rather just
wanted to define one variable of a given structure, we could have
said:
VAR EMPREC: RECORD
NAME: PACKED ARRAY [1..30] OF CHAR;
SSN: PACKED ARRAY [1..10] OF CHAR;
BIRTHDATE: INTEGER; { really a double integer }
SALARY: REAL;
END;
struct {char name[30];
char ssn[10];
long int birthdate;
float salary;
}
emprec;
Note how the type declaration is very much like the original variable
declaration.
So, declaring and using record structures is identical in PASCAL
and C. However, there's a VERY BIG DIFFERENCE between PASCAL and C.
* In PASCAL, strict type checking is more than just a good idea,
it's the LAW.
If a function parameter is declared as type EMP_RECORD, any
function call to it must pass an object of that type. Even if it
passes a record structure that's defined with exactly the same
fields but with a different type name (admittedly a rare
occurrence), the compiler will cough.
Any structure parameter must be of EXACTLY THE RIGHT TYPE.
* Many C programmers view strict type checking much as you or I
might view, say, the Gestapo or the KGB. Kernighan & Ritchie C
compilers DO NOT do type checking.
In fact, in Kernighan & Ritchie C, you can pass a string where a
real number is expected, and the compiler won't say a word! (On
the other hand, your program is unlikely to work right.)
I could fault C for this, treating C's lack of type checking much as I
do, say, SPL's lack of an easy I/O facility. The trouble is that C
programmers don't think that lack of type checking is a bug; they
think it's a feature. The problem is philosophical -- what are the
benefits of type checking and do they outweigh the drawbacks?
TYPE CHECKING -- ORIGINAL STANDARD PASCAL AND PASCAL/3000
Earlier in the paper I brought up a certain point. Compilers that
know the type of variables can, I said, check your code to make sure
that you're not using types inconsistently.
For instance, if you use a character when you should be using a
real number, that's an "obvious error" and the compiler can do you a
favor by complaining at compile-time. Similarly, if you pass an
employee record to a procedure that expects a database name, that's
also an error, and should also be reported.
Now, this principle is in many ways at the heart of the PASCAL
language. And, certainly, everyone will agree that it would be good
for the compiler to find errors in your program rather than making you
do it yourself. The question is --
IS A COMPILER WISE ENOUGH TO DETERMINE WHAT IS AN ERROR AND WHAT IS
NOT?
For instance, say you write
VAR CH: INTEGER;
IF ('a'<=CH) AND (CH<='z') THEN
CH:=CH-'a'+'A';
Utterly awful! We have what -- to PASCAL, at least -- is at least 4
type inconsistencies; we're comparing an integer against a character
two times, and then we're adding and subtracting characters and
integers! Obviously an error.
Actually, of course, this code takes CH, which it assumes is a
character's ASCII code, and upshifts it. If it finds that CH is a
lower case character, it shifts it into the upper case character set
by subtracting 'a' and adding 'A'.
Some might complain that this code is not portable (it won't, for
instance, work on EBCDIC machines), but that's not relevant. The
programmer has a perfect right to assume that the code will run on an
ASCII machine; you mustn't ram portability down his throat. Sometimes,
it's very useful to be able to, say, treat characters as integers and
vice versa.
Now, before anybody accuses me of slandering PASCAL, I must point
out that the solution is readily available. Pascal can convert a
character to an integer using the "ORD" function, and an integer to a
character using "CHR"; our code could easily be re-written:
VAR CH: INTEGER;
IF (ORD('a')<=CH) AND (CH<=ORD('z')) THEN
CH:=CH-ORD('a')+ORD('A');
The important point here is not whether or not you can upshift
characters; the important fact is that:
* SOMETIMES A PROGRAMMER MAY CONSCIOUSLY WANT TO DO THINGS THAT
MIGHT USUALLY BE VIEWED AS TYPE INCOMPATIBILITIES.
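For comparison, here is a minimal sketch of the same upshift in C, where character constants are simply small integers and so no ORD or CHR is needed. The function name is mine, and like the PASCAL version it assumes an ASCII machine.

```c
/* Upshift a character code held in an int (ASCII assumed). */
int upshift(int ch)
{
    if ('a' <= ch && ch <= 'z')
        ch = ch - 'a' + 'A';   /* move into the upper-case range */
    return ch;
}
```

Anything that isn't a lower-case letter comes back unchanged.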
Consider, for a moment, the following application:
* You want to write a procedure that adds a record to the database.
Unlike DBPUT, this one should just take the database name, the
dataset name, and the buffer, and do all the error checking
itself.
Sounds simple, no? You write:
TYPE TDATABASE = PACKED ARRAY [1..30] OF CHAR;
TDATASET = PACKED ARRAY [1..16] OF CHAR;
TRECORD = ???;
...
PROCEDURE PUT_REC (VAR DB: TDATABASE;
S: TDATASET;
VAR REC: TRECORD);
BEGIN
...
END;
BUT HOW DO YOU DEFINE "TRECORD"?
Remember why I said that type checking is such a wonderful thing.
After all, if a procedure expects a "customer record" and you pass it
an "employee record", you want the compiler to complain.
But what if the procedure expects ANY kind of record? What if it'll
be perfectly HAPPY to take an employee record, a sales record, a
database name, or a 10 x 10 real matrix? How should the compiler react
then?
Unfortunately, PASCAL, with all its sophisticated type checking,
falls flat on its face (this is true of both Standard PASCAL and
PASCAL/3000).
At this point, in the interest of fairness (and for the practical
use of those who HAVE to do this sort of thing in PASCAL), I must
point out that PASCAL does have a mechanism for supporting record
structures of different types. The trick is to use a degenerate
variation of the record structure called the "tagless variant" or
"union" structure. It's quite similar to EQUIVALENCE in FORTRAN, but
even uglier.
To put it briefly, you have to say the following:
TYPE TANY_RECORD =
RECORD
CASE 1..5 OF
1: (EMP_CASE: TEMPLOYEE_RECORD);
2: (CUST_CASE: TCUSTOMER_RECORD);
3: (VENDOR_CASE: TVENDOR_RECORD);
4: (INV_CASE: TINVOICE_RECORD);
5: (DEPT_CASE: TDEPARTMENT_RECORD);
END;
This defines the type "TANY_RECORD" to be a record structure which can
be looked at in one of FIVE different ways:
* As having one field called "EMP_CASE" which is of type
"TEMPLOYEE_RECORD".
* As having one field called "CUST_CASE" which is of type
"TCUSTOMER_RECORD".
* Or, as having one field called "VENDOR_CASE", "INV_CASE", or
"DEPT_CASE", which is of type "TVENDOR_RECORD",
"TINVOICE_RECORD", or "TDEPARTMENT_RECORD", respectively. You get
the idea.
If you declare a variable of type "TANY_RECORD", it'll be allocated
with enough room for the largest of the component datatypes. Then, you
can make the variable "look" like any one of these records by using
the appropriate subfield:
VAR R: TANY_RECORD;
...
WRITELN (R.EMP_CASE.NAME); { views R as an employee record }
WRITELN (R.DEPT_CASE.DEPTHEAD); { views R as a dept record }
WRITELN (R.INV_CASE.AMOUNT); { views R as an invoice record }
In other words, an object of type TANY_RECORD is actually five
different record structures "equivalenced" together; which one you get
depends on which ".xxx_CASE" subfield you use.
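C's counterpart of the tagless variant is the "union". The sketch below is mine and is trimmed to two member types; the field layouts (in particular everything in department_record other than the department head) are assumptions made up for illustration.

```c
typedef struct {
    char  name[30];
    char  ssn[10];
    long  birthdate;
    float salary;
} employee_record;

typedef struct {
    char deptname[20];   /* hypothetical field */
    char depthead[30];
} department_record;

/* One chunk of storage, big enough for the largest member, that can
   be viewed as either kind of record. */
typedef union {
    employee_record   emp_case;
    department_record dept_case;
} any_record;

/* Views the storage as an employee record. */
float salary_of(const any_record *r)
{
    return r->emp_case.salary;
}
```

As in PASCAL, nothing records which view is currently valid; that is up to the programmer.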
Got all that? Now, here's how you define and call the PUT_REC
procedure:
PROCEDURE PUT_REC (VAR DB: TDATABASE;
S: TDATASET;
VAR REC: TANY_RECORD);
BEGIN
...
END;
...
{ now, all dataset records you need to pass must be declared to }
{ be of type TANY_RECORD. }
READLN (R.EMP_CASE.NAME, R.EMP_CASE.SSN);
R.EMP_CASE.BIRTHDATE := 022968;
R.EMP_CASE.SALARY := MINIMUM_WAGE - 1.00;
PUT_REC (MY_DB, EMP_DATASET, R);
You must declare ALL YOUR DATASET RECORDS to be of type TANY_RECORD
(wasting space if, say, TDEPARTMENT_RECORD is 10 bytes long and
TINVOICE_RECORD is 200 bytes long); you must refer to them with the
appropriate ".xxx_CASE" subfield; then, you must pass the TANY_RECORD
to PUT_REC. (Alternately, you may have one "working area" record of
type TANY_RECORD and move the record you want into the appropriate
subfield of this "working area" record before calling PUT_REC.)
As you may have guessed, I think this is a very poor workaround
indeed:
* You need to specify in the TANY_RECORD declaration every possible
type that you'll ever want to pass to PUT_REC;
* You have to declare any record you want to pass to PUT_REC to be
of type TANY_RECORD, even if it wastes space.
* If you don't want to use a "working area" record, you have to
refer to all your records as "R.EMP_CASE" or "R.DEPT_CASE" rather
than just defining R as the appropriate type and referring to it
just as "R".
* If you do use a "working area" record, to wit:
VAR WORK_RECORD: TANY_RECORD;
EMP_REC: TEMPLOYEE_RECORD;
...
READLN (EMP_REC.NAME, EMP_REC.SSN);
EMP_REC.BIRTHDATE := 022968;
EMP_REC.SALARY := MINIMUM_WAGE - 1.00;
WORK_RECORD.EMP_CASE := EMP_REC;
PUT_REC (MY_DB, EMP_DATASET, WORK_RECORD);
then you have to move your data into it before every PUT_REC
call, which is both ugly and inefficient.
And why? All because PASCAL isn't flexible enough to allow you to
declare a parameter to be of "any type".
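For contrast, here is roughly what an "any type" record parameter looks like in C. This is a hedged sketch, not a real IMAGE interface: put_rec, its parameter list, and the stand-in buffer are all hypothetical, and the "void *" type comes from the newer C dialects (a K&R compiler would use "char *" instead).

```c
#include <string.h>

#define MAX_REC_BYTES 256

/* Stand-in for the data set: holds the last record "added". */
static unsigned char stored_record[MAX_REC_BYTES];

/*
 * Takes a record of ANY type: all the procedure sees is an address
 * and a byte count.  Returns the number of bytes stored, or -1 if the
 * record is too big for the stand-in buffer.
 */
long put_rec(const char *db, const char *dataset,
             const void *rec, size_t rec_len)
{
    (void)db;        /* a real version would hand these to DBPUT */
    (void)dataset;
    if (rec_len > MAX_REC_BYTES)
        return -1;
    memcpy(stored_record, rec, rec_len);
    return (long)rec_len;
}
```

A caller writes put_rec(my_db, emp_dataset, &emp_rec, sizeof emp_rec) for an employee record and the same thing for any other record type. The price is that the compiler will accept the address of absolutely anything.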
A couple more examples of cases where strict type checking is
utterly lethal may be in order:
* Say that you want to write a procedure that compares two PACKED
ARRAY OF CHARs (in Standard PASCAL, these are the only way of
representing strings). You must define the types of your
parameters, INCLUDING THE PARAMETER LENGTHS! In other words,
TYPE TPAC = PACKED ARRAY [1..256] OF CHAR;
VAR P1: PACKED ARRAY [1..80] OF CHAR;
P2: PACKED ARRAY [1..80] OF CHAR;
...
FUNCTION STRCOMPARE (VAR X1: TPAC; VAR X2: TPAC): BOOLEAN;
BEGIN
...
END;
...
IF STRCOMPARE (P1, P2) THEN ...
is ILLEGAL. P1, you see, is an 80-character string, which is not
compatible with the function parameter, which is a 256-character
string.
* Say that you want to write a procedure like WRITELN, which will
format data of various types. WRITELN may not be sufficient for
your needs -- you might need to be able to output numbers
zerofilled or in octal, you might want to provide for page breaks
and line wraparound, etc. Surely you should be allowed to do
this!
Well, first of all, you can't have a variable number of
parameters. But, even if you're willing to have a maximum of,
say, 10 parameters and pad the list with 0s, your parameters must
all be of fixed types!
Thus, even if your design calls for some kind of "format string"
that'll tell your WRITELN-replacement what the actual type of
each parameter is, you can't do anything. You must either have a
procedure for each possible type combination (one to output two
integers and a string, one to output a real, an integer, and
three strings, etc.), or have the procedure only output one
entity at a time. This way, you'll have to write:
PRINTS ('THE RESULT WAS ');
PRINTI (ACTUAL);
PRINTS (' OUT OF A MAXIMUM ');
PRINTI (MAXIMUM);
PRINTS (', WHICH WAS ');
PRINTR (ACTUAL/MAXIMUM*100);
PRINTS ('%');
PRINTLN;
instead of
PRINTF ('THE RESULT WAS %d OUT OF A MAXIMUM %d, WHICH WAS %f%%',
ACTUAL, MAXIMUM, ACTUAL/MAXIMUM*100);
* Finally -- although it should be obvious by now -- you can't
write, say, a matrix inversion function that takes any kind of
matrix. You could write a 2x2 inverter, a 3x3 inverter, a 4x4
inverter, and so on. You could also write a matrix multiplier
that multiplies 2x2s by 2x2s, another that does 2x2s by 2x3s,
another 2x2s by 2x4s, another 3x2s by 2x2s, .... Just think of
the job security you'll have!
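To show what the matrix case looks like when the dimensions are ordinary parameters rather than part of the type, here is a small C sketch. The function name and the choice to store each matrix row by row in a flat array are my assumptions.

```c
/*
 * c = a * b, where a is n-by-m, b is m-by-p and c is n-by-p.
 * Each matrix is a flat array stored row by row, so one routine
 * serves every size.
 */
void mat_mul(const double *a, const double *b, double *c,
             int n, int m, int p)
{
    int i, j, k;

    for (i = 0; i < n; i++)
        for (j = 0; j < p; j++) {
            double sum = 0.0;
            for (k = 0; k < m; k++)
                sum += a[i * m + k] * b[k * p + j];
            c[i * p + j] = sum;
        }
}
```

The same routine multiplies 2x2s by 2x2s, 2x3s by 3x2s, and so on; nothing checks that the dimensions passed match the arrays passed.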
For fairness's sake, I must admit that this problem is SLIGHTLY
mitigated in PASCAL/3000.
PASCAL/3000 has a "STRING" data type, which is a variable-length
string (as opposed to PACKED ARRAY OF CHAR, which is a fixed-length
string). In other words, PASCAL/3000 STRINGs are essentially
(internally) record structures, containing an integer -- the current
string length -- and a PACKED ARRAY OF CHAR -- the string data.
When HP implemented this, they were good enough to make all STRINGs
-- regardless of their maximum sizes -- "assignment-compatible" with
each other. This means that you can say:
VAR STR1: STRING[80];
STR2: STRING[256];
...
STR1:=STR2;
and also
TYPE TSTR256 = STRING[256];
VAR S: STRING[80];
...
FUNCTION FIRST_NON_BLANK (PARM: TSTR256): INTEGER;
BEGIN
...
END;
...
I := FIRST_NON_BLANK (S);
Since STRING[80]s (strings with maximum length 80) and STRING[256]s
(strings with maximum length 256) are assignment-compatible, you may
both directly assign them (STR1:=STR2) and pass one by value in place
of another (PROC(S)).
Although "assignment compatibility" allows by-value passing, a
variable passed by reference still has to be of exactly the same type
as the formal parameter specified in the procedure's header. Thus,
TYPE TSTR256 = STRING[256];
VAR S: STRING[80];
...
FUNCTION FIRST_NON_BLANK (VAR PARM: TSTR256): INTEGER;
BEGIN
...
END;
...
I := FIRST_NON_BLANK (S);
is still illegal, since STRING[80]s can't be passed to by-reference
(VAR) parameters of type STRING[256]. Fortunately, PASCAL/3000 also
lets you say:
FUNCTION FIRST_NON_BLANK (VAR PARM: STRING): INTEGER;
Specifying a type of "STRING" rather than "STRING[maxlength]" allows
you to pass any string in place of the parameter.
This only works for STRING parameters. It doesn't work for PACKED
ARRAYs OF CHAR; it doesn't work for other array structures; it isn't
supported by Standard PASCAL. However, for the specific case of string
manipulation, you can get around some of PASCAL's onerous parameter
type checking restrictions.
Remember also that this is strictly an HP PASCAL (PASCAL/3000 and
PASCAL/XL) feature, and cannot be relied on in any other PASCAL
compiler.
TYPE CHECKING -- KERNIGHAN & RITCHIE C
Where PASCAL insists on checking all parameters for an exact type
match, original -- Kernighan & Ritchie -- C takes the diametrically
opposite view.
Classic C checks NOTHING. It does not check parameter types; it
does not even check the number of parameters. All data in C is passed
"by value", which means that the value of the expression you specify
is pushed onto the parameter stack for the called procedure to use; if
you want to pass a variable "by reference" -- pushing its pointer onto
the stack -- you have to use the "&" operator to get the variable's
address, to wit:
myproc (&result, parm1, parm2);
If you omit the "&", or specify it when you shouldn't -- well, C
doesn't check for this, either.
Much can be said about the philosophical reasons that C is this
way; many labels, from "flexibility" to "cussedness" can be attached
to it. The fact of the matter, though, is that K&R C -- which means
many, if not most, of today's C compilers -- doesn't do any type
checking.
The effects of this, of course, are the opposite of the effects of
PASCAL's strong type checking:
* You have almost complete flexibility in what types you pass to a
procedure. In two different calls, the same parameter can be one
of two entirely different record structures; one of two character
or integer arrays of entirely different lengths (C doesn't do
run-time bounds checking, anyway); a real in one call, an integer
in another, and a pointer in a third.
Practically, virtually all of the examples I showed in the PASCAL
chapter can thus be implemented in C. For instance,
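int strcompare(s1,s2,len)
char *s1, *s2;
int len;
{
int i;
i = 0;
/* skip over the leading characters that match */
while ((i < len) && (s1[i] == s2[i]))
  i = i+1;
/* true (1) only if all LEN characters matched */
return (i == len);
}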
will merrily compare two character arrays, no questions asked.
You can pass arrays of any size, and it'll do the job. You can
pass integers, reals, integer arrays, whatever; of course, the
code isn't likely to work, but, hey, it's a free country --
nobody'll stop you.
* In most implementations of K&R C, you're even allowed to pass a
different number of parameters than the function was declared
with. Though this is not guaranteed portable, most C compilers
make sure that if, say, your procedure's formal parameters are
"a", "b", and "c" (all integers) and you actually pass the values
"1" and "2", then A will be set to 1, B to 2, and C will contain
garbage (that's "C" the variable, not "C" the language).
This is good because it allows you to write procedures that take
a variable number of parameters; as long as you have a way of
finding out how many parameters were actually passed (e.g. the
PRINTF format string), your procedure can handle them
accordingly.
* On the other hand, say you make a mistake in a procedure call --
you pass a real instead of an integer, a value instead of a
pointer, or perhaps even omit a parameter. The compiler won't
check this; the only way you'll find the error is by running the
program, and even then the erroneous results may first appear far
away from the real error.
Some C compilers (especially on UNIX) come with a program called
LINT that can check for this error and others, but that's often
not enough. First of all, your programmers have to run LINT as
well as C for each program, which slows down the compilation
pass; more importantly, since LINT is in no way part of standard C,
many C compilers don't have it.
VAX/VMS C, for instance, doesn't come with LINT; neither does the
CCS C that's available on the HP3000.
* Similarly, even things that seem like they ought to work --
passing an integer in place of a real and expecting it to be
reasonably converted -- will fail badly. Thus,
sqrt(100)
won't work if SQRT expects a real; C won't realize that an
integer-to-real conversion is required, and will thus pass 100 as
an integer, which is a different thing altogether.
A similar problem occurs on computers (like the HP3000) that
represent byte pointers (type "char *") and integer pointers
(type "int *" and other pointer types) differently. Since C
doesn't know which type of pointer a procedure expects, it'll
never do conversions. If you call a procedure like FGETINFO that
expects byte pointers and pass it an integer pointer, you'll be
in trouble (unless you manually cast the pointer yourself).
Incidentally, for ease of using real numbers, C will
automatically convert all "single-precision real" (called "float"
in C) arguments to "double-precision real" ("double") in function
calls. This makes sure that if SQRT expects a "double", passing it
a "float" won't confuse it.
* On the other hand (how many hands am I up to now?), C's
conversion woes -- requirements of passing "float"s instead of
"int"s, "char *"s instead of "int *"s, etc. -- are easier to
solve than in PASCAL. Since C allows you to easily convert a
value from one datatype to another (using the so-called "type
casts"), you could say
my_proc ((float) 100, (char *) &int_value);
and thus pass a "float" and a "char *" to "my_proc". In PASCAL
you couldn't do things this easily. The compiler might
automatically translate an integer to a float for you; but, if it
expects a character value and all you've got is an integer,
there's no easy way for you to tell it "just pass this integer as
a byte address, I know what I'm doing."
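The variable-parameter-count point in the list above can be sketched as follows. I'm using the <stdarg.h> macros, which are a later, portable way of doing what K&R-era code did by relying on the stack layout; the function sum_args and its one-letter type codes are hypothetical. As with PRINTF, a string tells the procedure how many parameters there are and what type each one is.

```c
#include <stdarg.h>

/*
 * Adds up its arguments.  Each character of "types" describes one
 * argument: 'i' for an int, 'r' for a real (double).  Any other
 * character is ignored and consumes no argument.
 */
double sum_args(const char *types, ...)
{
    va_list ap;
    double total = 0.0;

    va_start(ap, types);
    for (; *types != '\0'; types++) {
        if (*types == 'i')
            total += va_arg(ap, int);
        else if (*types == 'r')
            total += va_arg(ap, double);
    }
    va_end(ap);
    return total;
}
```

A call such as sum_args("iri", 1, 2.5, 3) works; a call whose type string doesn't match the values actually passed compiles just as happily and misbehaves at run time, which is exactly the trade-off described above.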
Thus, K&R C is flexible enough to do all that Standard PASCAL can
not. If this is necessary to you -- and I can easily understand why it
would be; Standard PASCAL's restrictions are very substantial -- then
you'll have to live with C's lack of error checking. On the other
hand, if flexibility is of less than critical value, you have to ask
yourself whether or not you want the extra level of compiler error
checking that PASCAL can provide you.
My personal experience, incidentally, has been that compiler error
checking of parameters is very nice, but not absolutely necessary. I'd
love to have the compiler find my bugs for me, but I can muddle
through without it. PASCAL's restrictions, though, are substantially
more grave. More than inconveniences, they can make certain problems
almost impossible to solve.
DRAFT ANSI STANDARD C
Time, it is said, heals all wounds; perhaps it can also heal
wounded computer languages. God knows, FORTRAN 77 isn't the greatest,
but it sure is better than FORTRAN IV.
The framers of the new Draft ANSI Standard C have apparently
thought about some of the problems that C has, especially the ones
with function call parameter checking and conversion. The solution
seems to be quite good, letting you impose either strict or loose type
checking -- whichever you prefer -- for each procedure or procedure
parameter. Remember, though, the standard is still only a Draft, so it's
quite possible that any given C compiler you might want to use won't
support it yet.
In Draft Standard C, you can do one of two things:
* You can call a procedure the same old way that you'd do in K&R
C. No type checking, no automatic conversion, no nothin'. You
might declare its result type, to wit:
extern float sqrt();
(Remember, you'd have to do that anyway in K&R C; otherwise, the
compiler will treat SQRT's result as an integer.) But no other
declarations are required, and no checking will be done.
* Alternatively, you can declare a FUNCTION PROTOTYPE. This can be
done either for an external function or for one you're defining
-- the prototype is very much like PASCAL's procedure header
declaration. A sample might be:
extern int ASCII (int val, int base, char *buffer);
or simply
extern int ASCII (int, int, char *);
[Note that the parameter NAMES, as opposed to TYPES, are not
necessary in a prototype for an EXTERNAL function. For a function
that you're actually defining, the names are necessary; the
declarations in the prototype are used in place of the type
declarations that you'd normally specify for the function
parameters.]
This function prototype tells the compiler enough about the
function parameters for it to be able to do appropriate type
checking and conversion. One of the reasons K&R C couldn't do
that is precisely because of the lack of this information.
Consider the cases where this would come in handy. We might declare
SQRT as
extern float sqrt (float);
and then a call like
sqrt (100)
would automatically be taken to mean "sqrt ((float) 100)", i.e. "sqrt
(100.0)". Similarly,
sqrt (100, 200)
or
sqrt ()
would cause a compiler error or warning, since now the compiler KNOWS
that SQRT takes exactly one parameter.
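The conversion can be seen in a small sketch. The function "half" is
a hypothetical stand-in for SQRT (chosen so as not to clash with the
library's own "sqrt"), and "demo" is just a made-up caller:

```c
/* Prototype: tells the compiler that half() takes exactly one float. */
extern float half (float);

float half (float x)
{
    return x / 2;
}

float demo (void)
{
    /* The integer constant 100 is converted to 100.0 before the
       call, just as if we had written half ((float) 100). */
    return half (100);
}
```

With the prototype in scope, "half (100, 200)" or "half ()" should be
rejected at compile time; without it, a K&R-style compiler would
accept both silently.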
In general, say that you have a function declared as
extern int f(formaltype); /* or non-extern, for that matter */
This simply means that "f" is a function that returns an "int" and
takes one parameter of type "formaltype". Now, say that your code
looks like:
actualtype x;
...
i = f(x);
Is this kind of call valid or not? Of course, it depends on what
"formaltype" and "actualtype" are:
* If both FORMALTYPE and ACTUALTYPE are numbers -- integers or
floats, short, long, or whatever -- then X is converted to
FORMALTYPE before the call. This is what lets us say
sqrt(100)
when "sqrt" is declared to take a parameter of type "float".
(The same goes the other way -- if "mod" is declared to take two
"int"s, then "mod(10.5,3.2)" would be converted to "mod(10,3)",
although the compiler might print a warning message to caution
you that a truncation is taking place.)
* If FORMALTYPE is a pointer -- which is the case for all
"by-reference" parameters, since that's how we pass things by
reference in C -- then ACTUALTYPE must be EXACTLY the same type.
In other words, if we say:
int copystring (char *src, char *dest)
then in the call
char x;
int y;
...
copystring (x, &y);
BOTH parameters will cause an error message. The first parameter
will be a "CHAR" passed where a "CHAR *" is expected, which is
illegal -- a good way of checking for attempts to pass parameters
by value where by-reference was expected. The second parameter
will be an "INT *" passed where a "CHAR *" is expected, which is
also illegal, since although both are pointers, they don't point
to the same type of thing.
* If ACTUALTYPE is a pointer, then FORMALTYPE must also be a
pointer of EXACTLY the same type. Again, this is useful for
catching attempts to pass "by-reference" arguments to procedures
that expect "by-value" parameters, and also attempts to pass a
pointer to the wrong type of object.
* If either ACTUALTYPE or FORMALTYPE is a pointer of the special
type "void *", then the other one may be any type of pointer.
This is very useful when we want a parameter to be a BY-REFERENCE
parameter of some arbitrary type (similar to PASCAL/XL's ANYVAR,
for which see below). Thus, if we want to write our "put_rec"
procedure that'll put any type of record structure into a
database, we'd say:
put_rec (char *dbname, char *dbset, void *rec)
Then, we could say:
typedef struct {...} sales_rec_type;
typedef struct {...} emp_rec_type;
...
sales_rec_type srec;
emp_rec_type erec;
...
put_rec (mydb, sales_set, &srec);
...
put_rec (mydb, emp_set, &erec);
Both of the PUT_REC calls are valid since both "&srec" and
"&erec" (and, for that matter, any other pointer) can be passed
in place of a "void *" parameter. If we'd declared "put_rec" as:
put_rec (char *dbname, char *dbset, sales_rec_type *rec)
then the "put_rec (mydb, emp_set, &erec)" call would NOT be
legal, since "&erec" is NOT compatible with "sales_rec_type *".
Note that on some machines -- including the HP3000 -- integer
pointers and character pointers are NOT represented the same way.
However, it's always safe to pass either a "char *" or an "int *"
in place of a parameter that's declared as a "void *". The C
compiler will always do the appropriate conversion; thus, if we
declare the ASCII intrinsic as
extern int ASCII (int, int, void *);
then both of the calls below:
char *cptr;
int *iptr;
...
i = ASCII (num, 10, cptr);
...
i = ASCII (num, 10, iptr);
will be valid (assuming that a "void *" is actually represented
as a byte pointer, which is what the ASCII intrinsic wants). You
can thus think of "void *" as the "most general type"; any
pointer can be successfully passed to a "void *".
* Note that although you CAN'T pass, say, a "char *" to a parameter
of type "int *", C will ignore the SIZE of the array whose address
is being passed. In other words, a function such as
extern strlen (char *s);
may be passed a pointer to a string of any size -- both of the
following calls:
char s1[80], s2[256];
...
i = strlen (s1);
i = strlen (s2);
are valid. Remember that C makes no distinction between a
"pointer to an 80-byte array" and a "pointer to a 256-byte
array"; similarly, it makes no distinction between an array like
"s1" and a "pointer to a character" (see below).
* An interesting exception to the above rules is that the integer
constant 0 can be passed to ANY pointer parameter. This is
because a pointer with value 0 is conventionally used to mean a
"null pointer".
This is quite useful in some applications, but can often prevent
the compiler from detecting some errors. If I say:
extern PRINT (int *buffer, int len, int cctl);
...
PRINT (0, -10, current_cctl);
this won't, of course, print a "0"; rather, it'll pass PRINT the
integer pointer "0", which will point to God knows what in your
stack. Not a very serious problem, but something you ought to
keep in mind.
* Unlike Standard PASCAL, not only can you entirely waive parameter
checking for a procedure (just omit the prototype!), but you can
also explicitly CAST an actual parameter whenever you want it to
match the type of a formal parameter. In other words, say that
you declare two structure types:
typedef struct {...} rec_a;
typedef struct {...} rec_b;
rec_a ra; /* declare a variable of type "rec_a" */
rec_b rb; /* declare a variable of type "rec_b" */
and then write a function
process_record_a (int x, int y, rec_a *r)
{
...
}
If you then say
process_record_a (10, 20, &rb);
then the compiler will (quite properly) print an error message,
since you were trying to pass a "pointer to rec_b" instead of a
"pointer to rec_a". If you really want to do this, though, all
you need to do is say:
process_record_a (10, 20, (rec_a *) &rb);
manually CASTING the pointer "&rb" to be of type "rec_a *", and
the compiler won't mind.
* Finally, let me also point out that, as in most places in C, an
"array of T" and a "pointer to T" are mutually interchangeable.
In other words, if you say:
extern int string_compare (char *s1, char *s2);
and then call it as:
char str1[80], str2[256];
...
if (string_compare (str1, str2)) ...
the compiler won't mind. To it a "char *" and a "char []" are
really one and the same type.
Somewhat (but not exactly) similarly -- perhaps I should say,
similarly but differently -- the NAME OF A FUNCTION can be passed
to a parameter that is expecting a POINTER TO A FUNCTION. In
other words, if you write a procedure
int do_function_on_array_elems (int (*f)(), int *a, int len);
(which takes a pointer to a function, a pointer to an integer,
and an integer), and then call it as:
do_function_on_array_elems (myfunc, xarray, num_xs);
the compiler won't complain (assuming, of course, that MYFUNC is
really a function and not, say, an integer or a pointer).
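A minimal sketch of the last two points, assuming the function being
passed takes and returns an "int". The body of
"do_function_on_array_elems" and the helpers "twice" and "demo_sum"
are hypothetical, invented for this illustration:

```c
/* Applies f to each of the len elements of a, in place.
   Returns the number of elements processed. */
int do_function_on_array_elems (int (*f)(int), int *a, int len)
{
    int i;
    for (i = 0; i < len; i++)
        a[i] = f (a[i]);
    return len;
}

/* Hypothetical function whose NAME we pass as a parameter. */
int twice (int x)
{
    return 2 * x;
}

/* The array name and the function name are both passed without
   any "&"; the compiler treats them as pointers. */
int demo_sum (void)
{
    int xarray[3] = {1, 2, 3};
    do_function_on_array_elems (twice, xarray, 3);
    return xarray[0] + xarray[1] + xarray[2];
}
```

Passing something that is not a function of the declared shape (say,
"xarray" in place of "twice") should draw a complaint from a compiler
that honors the prototype.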
To summarize, then, Draft Proposed ANSI Standard C lets you check
function parameters almost as precisely as Standard PASCAL. The
differences are:
* You can ENTIRELY INHIBIT PARAMETER CHECKING for all function
parameters by just omitting the function prototype.
* You can declare a parameter to BE A BY-REFERENCE PARAMETER OF AN
ARBITRARY TYPE by declaring it to be of type "void *". You can do
this while still enforcing tight type checking for all the other
parameters.
* In addition to overriding type checking on a PROCEDURE BASIS or
PROCEDURE PARAMETER basis, you can also override type checking on
a particular call by simply casting the actual parameter to the
formal parameter's datatype.
* Unlike PASCAL, C will never check the SIZE of an array parameter;
only its TYPE.
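The mix of tight and loose checking might look like this in practice.
This is only a sketch: "put_rec" is cut down to two parameters, and
the "len" parameter, the "last_written" buffer, the record layouts,
and "sales_id" are all assumptions made for the example:

```c
#include <string.h>

typedef struct { int id; float amount; } sales_rec_type;
typedef struct { int id; char name[20]; } emp_rec_type;

static char last_written[64];   /* stands in for the database */

/* Loose: "void *" accepts a pointer to ANY record type, so the
   caller has to say how big the record is. */
int put_rec (void *rec, int len)
{
    if (len < 0 || len > (int) sizeof last_written)
        return -1;
    memcpy (last_written, rec, (size_t) len);
    return len;
}

/* Tight: only a "sales_rec_type *" is accepted here. */
int sales_id (sales_rec_type *rec)
{
    return rec->id;
}
```

Given "sales_rec_type srec" and "emp_rec_type erec", both
"put_rec (&srec, sizeof srec)" and "put_rec (&erec, sizeof erec)"
compile cleanly, while "sales_id (&erec)" should be flagged. Writing
"sales_id ((sales_rec_type *) &erec)" silences the compiler; whether
the result means anything is then entirely your problem.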
STANDARD "LEVEL 1" PASCAL TYPE CHECKING -- CONFORMANT ARRAYS
If you recall, one of the PASCAL features I most complained about
was the inability to pass arrays of different sizes to different
procedures. This essentially prevents you from writing any sort of
general array handling routine, including:
* For PACKED ARRAYs OF CHAR -- the way that Standard PASCAL
represents strings -- you can't write things like blank trimming
routines, string searches, or anything that's intended to take
PACKED ARRAYs OF CHAR of different sizes.
* For other arrays, the problem is exactly the same -- you can't
write matrix handling routines that work with arbitrary sizes of
arrays, e.g. matrix addition, multiplication, division, etc.
This wasn't the only type checking problem (others included the
inability to pass various record types to database I/O routines,
etc.), but it was a major one.
The ISO Pascal Standard, released in the early 80's, addresses this
problem. A new feature called "conformant arrays" has been defined;
PASCAL compilers are encouraged, but not required, to implement it. A
compiler is said to
* "Comply at level 0" if it does not support conformant arrays;
* "Comply at level 1" if it does support them.
You see the problem -- who knows just how many new PASCAL compilers
will include this feature? It is a fact that most compilers written
before the ISO Standard do NOT include it.
PASCAL/3000, for instance, does not have it; PASCAL/XL, on the
other hand, does.
What are "conformant arrays"? To put it simply, they are
* FUNCTION PARAMETERS that are defined to be ARRAYS OF ELEMENTS OF
A GIVEN TYPE, but whose bounds are NOT defined. Instead, the
compiler makes sure that the ACTUAL BOUNDS of whatever array
parameter is ACTUALLY passed are made known to the procedure.
An example:
FUNCTION FIRST_NON_BLANK
(VAR STR: PACKED ARRAY [LOWBOUND..HIGHBOUND: INTEGER]
OF CHAR): INTEGER;
VAR I: INTEGER;
BEGIN
I:=LOWBOUND;
{ AND binds tighter than < and =, so the parentheses are needed }
WHILE (I<HIGHBOUND) AND (STR[I]=' ') DO
I:=I+1;
FIRST_NON_BLANK:=I;
END;
This procedure is intended to find the index of the first non-blank
character of STR. Note how it declares STR: Instead of specifying a
constant lower and upper bound in the PACKED ARRAY [x..y] OF CHAR
declaration, it specifies TWO VARIABLES.
When the procedure is entered, the variable LOWBOUND is
automatically set to the lower bound of whatever array the user
actually passed, and HIGHBOUND is set to the upper bound of the array.
In other words, if we say:
VAR MYSTR: PACKED ARRAY [1..80] OF CHAR;
...
I:=FIRST_NON_BLANK (MYSTR);
then, in FIRST_NON_BLANK, the variable LOWBOUND will be set to 1 and
HIGHBOUND will be set to 80. Instead of just passing the MYSTR
parameter, PASCAL actually passes "behind your back" 1 and 80 as well.
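One way to picture what the compiler is doing on your behalf is to
write the hidden parameters out by hand. The C sketch below is only
an analogy, not what any PASCAL compiler actually generates; it
assumes the caller's indexes run from "lowbound" to "highbound":

```c
/* The bounds that Level 1 PASCAL passes implicitly appear here as
   explicit parameters. C arrays start at 0, so the PASCAL index i
   corresponds to str[i - lowbound]. */
int first_non_blank (const char *str, int lowbound, int highbound)
{
    int i = lowbound;
    while (i < highbound && str[i - lowbound] == ' ')
        i = i + 1;
    return i;
}
```

A call equivalent to the PASCAL one above would be
"first_non_blank (mystr, 1, 80)", with the 1 and the 80 supplied by
the programmer instead of by the compiler.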
The way I see it, this is a very good solution, even better in some
ways than C's (in which you can always pass an array of any arbitrary
size):
* You're no longer restricted (like you are in Standard PASCAL) to
a fixed size for your array parameters.
* When you pass an array to a conformant array parameter, you don't
have to manually specify the size of the array; the array bounds
are automatically passed for you. If I were to write the same
procedure in C, I'd have to say
int first_non_blank (str, maxlen)
char str[];
int maxlen;
...
and then manually pass it both the string and the size that it
was allocated with; otherwise, the procedure won't know when to
stop searching (assuming you don't use the convention that a
string is terminated by a null or some such terminator).
* Since the compiler itself knows what the conformant array
parameter's bounds are (it doesn't know the actual values, but it
does know what variables contain the values), it can emit
appropriate run-time bounds checking code. This can automatically
catch some errors at run-time, which is good if you like heavy
compiler-generated error checking.
* Conformant arrays are even better for two-dimensional arrays. To
index into a two-dimensional array the compiler must, of course,
know the number of columns in the array (assuming it's stored in
row-major order, as C and PASCAL 2-D arrays are). In C, you must
either declare the number of columns as a constant, e.g.
matrix_invert (m)
float m[][100];
or declare the parameter as a 1-D array, pass the number of
columns as a parameter, and then do your own 2-D indexing, to
wit:
matrix_invert (m, numcols)
float m[];
int numcols;
...
element = m[row*numcols+col]; /* instead of M[ROW,COL] */
...
In ISO Level 1 PASCAL, you just declare the procedure as:
PROCEDURE MATRIX_INVERT (M: ARRAY [MINROW..MAXROW: INTEGER;
MINCOL..MAXCOL: INTEGER] OF REAL);
Then you automatically know the bounds of the array AND can also
do normal array indexing (M[ROW,COL]), since the compiler knows
the number of columns, too.
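For comparison, the do-it-yourself 2-D indexing that C requires can
be sketched as follows. The helper names "matrix_elem" and
"matrix_trace" are hypothetical; the only thing taken from the text
above is the "row*numcols+col" formula:

```c
/* Returns element [row][col] of a matrix stored row by row in a
   one-dimensional array that has numcols columns. */
float matrix_elem (const float m[], int numcols, int row, int col)
{
    return m[row * numcols + col];
}

/* Sums the main diagonal of an n-by-n matrix passed as a flat
   array; the caller must supply n, since C won't. */
float matrix_trace (const float m[], int n)
{
    float sum = 0;
    int i;
    for (i = 0; i < n; i++)
        sum += matrix_elem (m, n, i, i);
    return sum;
}
```

Note that nothing stops a caller from passing the wrong "numcols";
the function will then quietly index the wrong elements, which is
exactly the kind of error the conformant array bounds would catch.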
This, it seems, is how original Standard PASCAL should have worked,
and I'm glad that the standards people have established it. The only
problems are:
* This is, of course, somewhat less efficient than not passing the
bounds or just passing, say, the upper bound (like you would in
C).
* Remember that this only fixes the case where we want to pass
differently sized arrays to a procedure. If we want to pass
different TYPES (like in our PUT_REC procedure that should
accept one of several database record types), conformant arrays
won't help us.
* Most importantly, MANY PASCAL COMPILERS MIGHT NOT SUPPORT THIS
WONDERFUL FEATURE. In particular, PASCAL/3000 DOES NOT SUPPORT
CONFORMANT ARRAYS.
PASCAL/XL TYPE CHECKING
PASCAL/XL obeys all of PASCAL's type checking rules, but gives you
a number of ways to work around them:
* PASCAL/XL supports the CONFORMANT ARRAYS that I just talked
about.
* PASCAL/XL allows you to specify a parameter as "ANYVAR", e.g.
PROCEDURE PUT_REC (VAR DB: TDATABASE;
S: TDATASET;
ANYVAR REC: TDBRECORD);
What this means to PASCAL is that, when PUT_REC is called, the
third parameter (REC) will NOT be checked. Inside PUT_REC, you'll
be able to refer to this parameter as REC, and to PUT_REC it'll
have the type TDBRECORD; however, the CALLER need not declare it
as TDBRECORD. For instance,
VAR SALES_REC: TSALES_REC;
EMP_REC: TEMP_REC;
...
PUT_REC (MY_DB, SALES_DATASET, SALES_REC);
...
PUT_REC (MY_DB, EMP_DATASET, EMP_REC);
will do EXACTLY what we want it to -- it'll pass SALES_REC and
EMP_REC to our PUT_REC procedure without complaining about their
data types.
As I said, PUT_REC itself will view the REC parameter as an
object of type TDBRECORD. However, PUT_REC can say
SIZEOF(REC)
and determine the TRUE size of the actual parameter that was
passed in place of REC. This can be very useful if PUT_REC needs
to do an FWRITE or some such operation that needs to know the
size of the thing being manipulated.
The way this is done, of course, is by PASCAL/XL's passing the
size of the actual parameter as well as the parameter's address.
Incidentally, you can turn this off for efficiency's sake if
you're not going to use this SIZEOF construct.
* PASCAL/XL allows you to do TYPE COERCION -- you can take an
object of an arbitrary type and view it as any other type. For
instance, you can take a generic "ARRAY OF INTEGER" and view it
as a record type, or take an INTEGER parameter and view it as a
REAL. A possible application might be:
TYPE COMPLEX = RECORD REAL_PART, IMAG_PART: REAL; END;
INT_ARRAY = ARRAY [1..2] OF INTEGER;
...
PROCEDURE WRITE_VALUE (T: INTEGER; ANYVAR V: INT_ARRAY);
BEGIN
IF T=1 THEN WRITELN (V[1])
ELSE IF T=2 THEN WRITELN (REAL(V))
ELSE IF T=3 THEN WRITELN (BOOLEAN(V))
ELSE IF T=4 THEN WRITELN (COMPLEX(V).REAL_PART,
COMPLEX(V).IMAG_PART);
END;
As you see, this procedure takes a type indicator (T) and a
variable of any type V. Then, depending on the value of T, it
VIEWS V as an integer, a float, a boolean, or a record structure
of type COMPLEX. All we need to do is say
typename(value)
and it returns an object with EXACTLY THE SAME DATA as "value",
but viewed by the compiler as being of type "typename". Note that
this means that "REAL(10)" won't return 10.0 (which is what a C
"(float) 10" type cast would do); rather, it'll return the
floating point number the MACHINE REPRESENTATION of which is 10.
Some other example applications for this very useful construct
are:
- You can now have a pointer variable that can be set to point
to an object of an arbitrary type; this allows you to write
things like generic linked list handling procedures that work
regardless of what type of object the linked list contains.
More on this under ANYPTR below.
- You may write a generic bit extract procedure that can be
used for extracting bits from characters, integers, reals,
etc. You'd declare it as:
FUNCTION GETBITS (VAL, STARTBIT, LEN: INTEGER): INTEGER;
...
and call it using
I:=GETBITS (INTEGER(3.0*EXP(X)), 10, 6);
or
I:=GETBITS (INTEGER(MYSTRING[I]), 5, 1);
or whatever. Note that you couldn't do this with ANYVAR
parameters since ANYVAR parameters are by-reference, and thus
can't be passed constants or expressions.
* PASCAL/XL -- just like PASCAL/3000 -- makes STRING parameters of
any size compatible with each other. Thus, you can pass a
STRING[20] to a procedure that's defined to take a STRING[256];
or, if you're passing the string by REFERENCE, you can just
declare the formal parameter as "STRING", which will be
compatible with any string type.
* PASCAL/XL has a new type called "ANYPTR"; declaring a variable to
be an ANYPTR makes it "assignment-compatible" with any other
pointer type, which means that that variable can be easily made
to point to objects of different types. This, coupled with the
"type coercion" operation mentioned above, makes manipulating,
say, linked lists of different data structures much easier.
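The difference between a C type cast and PASCAL/XL's type coercion is
easy to get wrong, so here is a C sketch of both. The function names
are hypothetical, and the second function assumes that "int" and
"float" occupy the same number of bytes, which C does not promise:

```c
#include <string.h>

/* C type cast: converts the VALUE, so 10 becomes 10.0. */
float cast_to_float (int n)
{
    return (float) n;
}

/* PASCAL/XL-style coercion: keeps the BITS and merely relabels
   them as a float. Returns 0 if the two types differ in size. */
float coerce_to_float (int n)
{
    float f = 0;
    if (sizeof f == sizeof n)
        memcpy (&f, &n, sizeof f);
    return f;
}
```

On a machine with 32-bit IEEE floats, "coerce_to_float (10)" yields a
tiny number nowhere near 10.0, which is the behavior the text
describes for "REAL(10)".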
Needless to say, use of any of these constructs can get you into
trouble precisely because of the additional freedom they give you.
Converting a chunk of data from one record data type to another only
makes sense if you know exactly what you're doing; if you don't,
you're likely to end up with garbage.
However, often there are cases where you NEED this additional
freedom, and in those cases, PASCAL/XL really comes through. As a
rule, its type checking is as stringent and thorough as Standard
PASCAL's, but it allows you to relatively easily waive the type
checking whenever you need to.
ENUMERATED DATA TYPES
If you recall, before I started talking about type checking, I was
describing RECORD STRUCTURES, a new data type that PASCAL and C
support. My mind, you see, works like a stack -- sometimes I'll
interrupt what I'm doi