AUTOMATED TESTING -- WHY AND HOW
                       by Eugene Volokh, VESOFT
        Presented at 1990 INTEREX Conference, Boston, MA, USA
              Published by INTERACT Magazine, Dec 1990.


Everyone  knows  how  important  testing is, and,  with luck, everyone
actually does test the software that they release. But do they really?
Can  they?  Even  a  simple program often  has many different possible
behaviors,  some of which only take  place in rather unusual (and hard
to  duplicate)  circumstances.  Even  if  every possible  behavior was
tested  when  the program was first released  to the users, what about
the  second release, or even a "minor" modification? The feature being
modified  will probably be re-tested,  but what about other, seemingly
unrelated,  features  that  may have been  inadvertently broken by the
modification?  Will  every unusual test case  from the first release's
testing  be  remembered,  much  less  retried,  for  the  new release,
especially  if  retrying  the test would require  a lot of preliminary
work (e.g. adding appropriate test records to the database)?

This  problem  arose for us several years  ago, when we found that our
software  was  getting  so complicated that  testing everything before
release  was  a  real  chore, and a good many  bugs (some of them very
obvious)  were getting out into the field. What's more, I found that I
was  actually  afraid  to add new features,  concerned that they might
break  the rest of the software. It  was this last problem that really
drove  home to me the importance of  making it possible to quickly and
easily test all the features of all our products.


AUTOMATED TESTING

The  principle of automated testing is  that there is a program (which
could  be a job stream) that runs the program being tested, feeding it
the  proper input, and checking the output against the output that was
expected.  Once  the  test suite is written,  no human intervention is
needed,  either to run the program or to look to see if it worked; the
test  suite  does  all  that,  and somehow indicates  (say, by a :TELL
message  and  a  results  file)  whether  the program's  output was as
expected.  We, for instance, have over two hundred test suites, all of
which  can  be  run  overnight by executing  one job stream submission
command;  after  they run, another command  can show which test suites
succeeded and which failed.

These test suites can help in many ways:

   * As discussed above, the test suites should always be run before a
     new  version is released, no matter how trivial the modifications
     to the program.

   *   If   the   software   is  internally  different  for  different
     environments  (e.g.  MPE/V vs. MPE/XL), but  should have the same
     external  behavior,  the  test  suites  should  be  run  on  both
     environments.

   *  As you're making serious changes to the software, you might want
     to  run  the test suites even before  the release, since they can
     tell you what still needs to be fixed.

   *  If you have the discipline to --  believe it or not -- write the
     test  suite before you've written your  program, you can even use
     the test suite to do the initial testing of your code. After all,
     you'd  have to initially test the  code anyway; you might as well
     use  your  test suites to do that  initial testing as well as all
     subsequent tests.

Note  also  that the test suites not only  run the program, but set up
the  proper environment for the program;  this might mean filling up a
test database, building necessary files, etc.


WRITING TEST SUITES

Let's  switch  for  a moment to a  concrete example -- a date-handling
package,  something that, unfortunately, many people have had to write
on  their  own,  from  scratch.  Say that one of  the routines in your
package  is DATEADD, which adds a given  number of days to a date, and
returns  the new date. Here's the code that you might write to test it
(the dates are represented as YYYYMMDD 32-bit integers):

   IF DATEADD (19901031, 7) <> 19901107 THEN
     BEGIN
     WRITELN ('Error: DATEADD (19901031, 7) <> 19901107');
     GOT_ERROR:=TRUE;
     END;
   IF DATEADD (19901220, 20) <> 19910109 THEN
     BEGIN
     WRITELN ('Error: DATEADD (19901220, 20) <> 19910109');
     GOT_ERROR:=TRUE;
     END;
   ...

As you see, the code calls DATEADD several times, and each time checks
the result against the expected result; if the result is incorrect, it
prints  an  error  message  and sets GOT_ERROR to  TRUE. After all the
tests  are done, the program can check if GOT_ERROR is TRUE, and if it
is, say, build a special "got error" file, or write an error record to
some special log file. This way, the test suites can be truly
automatic -- you can run many test suites in the background, and after
they're done, find out if all went well by just checking one file, not
looking through many large spool files for error messages.
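
For instance, the very end of the test program might look something
like this (a sketch only -- it assumes a Pascal whose REWRITE accepts a
file name, and a VAR ERRFILE: TEXT declaration somewhere above; the
TESTERR file name anticipates the convention suggested near the end of
this paper):

   (* summarize the results so a job can check one file, not a listing *)
   IF GOT_ERROR THEN
     BEGIN
     REWRITE (ERRFILE, 'TESTERR');      (* build the "got error" file *)
     WRITELN (ERRFILE, 'DATEADD test suite found errors');
     END
   ELSE
     WRITELN ('All DATEADD tests passed.');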

The  first thing that you might notice  is that the DATEADD test suite
can  easily grow to be much  larger than the DATEADD procedure itself!
No  doubt  about  it  --  writing  test  suites  is  a  very expensive
proposition.   Our  test  suites  for  MPEX/3000,  SECURITY/3000,  and
VEAUDIT/3000  take  up  almost  30,000 lines,  not counting supporting
files  and  supporting  code in the actual  programs; the total source
code of our products is less than 100,000 lines. Often, writing a test
suite  for  a  feature  takes  as  long or almost  as long as actually
implementing the feature. Sometimes, instead of being reluctant to add
a  new  feature for fear of breaking  something, I am now reluctant to
add  a new feature because I don't want to bother writing a test suite
for it.

Fortunately,  the  often  dramatic  costs  of writing  test suites are
recouped  not just by the decrease in  the number of bugs, but also by
the  fact that test suites, once written,  save a lot of testing time.
It's much easier for someone to run an already-written test suite than
to execute by hand even a fraction of the tests included in the suite,
especially if they require complicated set-up. Since a typical program
will actually have to be tested several times before it finally works,
the  costs of writing a test suite  (assuming that it's written at the
same  time  as  the code, or even earlier)  can be recouped before the
program is ever released.

Also,  test suites tend to have longer  lives than code. A program can
be dramatically changed -- even re-written in another language -- and,
assuming  that it was intended to behave  the same as before, the test
suite will work every bit as well. Once the substantial up-front costs
of  writing  test  suites  have  been  paid, the pay-offs  can be very
substantial.

But  even  though we should be willing  to invest time and effort into
writing test suites, there's no reason to invest more than we have to.
In  fact,  precisely  because test suites at  first glance seem like a
luxury, and people are thus not very willing to work on them, creating
test  suites  should  be  as easy as possible. What  can we do to make
writing test suites simpler and more efficient?

One  goal that I try to shoot for is to make it as easy as possible to
add  new test cases, even if this  means doing some additional work up
front. I try to make every new test case fit, if possible, on one
line. The reason is quite simple: I want to have as little
disincentive  as  possible  to add new test  cases. A really fine test
suite  would  have  tests for many  different situations, including as
many obscure boundary conditions and exceptions as possible; also, any
time a new bug is found, a test should be added to the test suite that
would  have  caught  the  bug,  just  in  case the  bug re-surfaces (a
remarkably  frequent  event).  If  we  grit  our teeth  and write some
convenient  testing  tools  up  front,  we can make  it much easier to
create a full test suite.

Here, for instance, is one example:

   PROCEDURE TESTDATEADD (DATE, NUMDAYS, EXPECTEDRESULT: INTEGER);
   BEGIN
   IF DATEADD (DATE, NUMDAYS) <> EXPECTEDRESULT THEN
     BEGIN
     WRITELN ('Error: DATEADD (', DATE, ', ', NUMDAYS, ') <> ',
              EXPECTEDRESULT);
     GOT_ERROR:=TRUE;
     END;
   END;
   ...
   TESTDATEADD (19901031, 10, 19901110);
   TESTDATEADD (19901220, 20, 19910109);
   TESTDATEADD (19920301, -2, 19920228);
   ...

By  this  model,  each  procedure  that  you  test  would have  a test
procedure  like  this  one written for it; then  the main body of your
test program would just be calls to these test procedures.

This  is  especially  useful for procedures  that require some special
processing before or after being called; for instance, they might have
reference parameters that need to be put into variables before they're
passed,   record   structure   parameters   to   be  filled,  multiple
by-reference  output  parameters that all need  to be compared against
expected values, and so on.
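
For example, a hypothetical DATESPLIT procedure -- one that returns the
year, month, and day through three by-reference parameters (DATESPLIT
itself is just an assumption for illustration) -- might get a test
procedure like this:

   PROCEDURE TESTDATESPLIT (DATE, EXPYEAR, EXPMONTH, EXPDAY: INTEGER);
   VAR YEAR, MONTH, DAY: INTEGER;
   BEGIN
   DATESPLIT (DATE, YEAR, MONTH, DAY);
   IF (YEAR<>EXPYEAR) OR (MONTH<>EXPMONTH) OR (DAY<>EXPDAY) THEN
     BEGIN
     WRITELN ('Error: DATESPLIT (', DATE, ') returned ',
              YEAR, ' ', MONTH, ' ', DAY);
     GOT_ERROR:=TRUE;
     END;
   END;
   ...
   TESTDATESPLIT (19901031, 1990, 10, 31);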

You  can make up other, even  more general-purpose testing tools, such
as the following procedure:

   PROCEDURE MUSTBE (TAG: stringtype; RESULT, EXPECTEDRESULT: INTEGER);
   BEGIN
   IF RESULT<>EXPECTEDRESULT THEN
     BEGIN
     WRITELN ('Error: ', TAG, ': ', RESULT, ' <> ', EXPECTEDRESULT);
     (* error handling code *)
     END;
   END;

This  procedure  can be used to check  the result of any function that
returns an integer value, e.g.

   MUSTBE ('DATEADD #1', DATEADD (19901031, 10), 19901110);
   MUSTBE ('DATEADD #2', DATEADD (19901220, 20), 19910109);
   MUSTBE ('DATEADD #3', DATEADD (19920301, -2), 19920228);

Other,  similar,  procedures  might be written  to help test functions
that return other types (REALs, STRINGs, etc.). On the other hand, for
functions  that can't easily be called  in one statement (because they
take  by-reference or specially-formatted  parameters), you might want
to consider writing a special test procedure.
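
A string version, for instance, might look like this (a sketch; it
assumes the same stringtype used in MUSTBE above, and the DATEFMT call
in the usage line is just a made-up example of a string-returning
function):

   PROCEDURE MUSTBESTR (TAG: stringtype;
                        RESULT, EXPECTEDRESULT: stringtype);
   BEGIN
   IF RESULT<>EXPECTEDRESULT THEN
     BEGIN
     WRITELN ('Error: ', TAG, ': "', RESULT, '" <> "',
              EXPECTEDRESULT, '"');
     GOT_ERROR:=TRUE;
     END;
   END;
   ...
   MUSTBESTR ('DATEFMT #1', DATEFMT (19901031), 'OCT 31, 1990');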

Finally,  one other alternative (which I personally prefer) is writing
a  special  "shell"  program  that  asks  for  a  procedure  name, its
parameters,  and the expected result,  calls the procedure, and checks
the result:

   PROGRAM TESTSHELL ...
   ...
   READLN (PROCNAME, P1, P2, EXPECTEDRESULT);
   WHILE PROCNAME<>'EXIT' DO
     BEGIN
     IF PROCNAME='DATEADD' THEN RESULT:=DATEADD (P1, P2)
     ELSE IF PROCNAME='DATEDIFF' THEN RESULT:=DATEDIFF (P1, P2)
     ELSE IF PROCNAME='DATEYEAR' THEN RESULT:=DATEYEAR (P1)
     ...
     IF RESULT <> EXPECTEDRESULT THEN
       ... output error ...
     READLN (PROCNAME, P1, P2, EXPECTEDRESULT);
     END;
   ...

This  way, your actual test suite could  be a job stream, to which you
can  add  as many test cases as you like  -- one line per test case --
without having to recompile anything:

   !JOB TESTDATE, ...
   !RUN TESTSHEL
   DATEADD 19901031 10 19901110
   DATEADD 19920228 2  19920301
   DATEYEAR 19920228 0 1992
   ...
   !EOJ

Whenever  you  make  a  change to your procedures,  you just rerun the
TESTDATE  job,  and  you'll  either  find  some bugs  or be reasonably
confident  (though, of course, never 100% confident) that the software
works.


TESTING PROGRAMS THAT DO I/O

It's  rather  easy  to  test  a  procedure  whose only  inputs are its
parameters and whose only output is its result (or even a by-reference
parameter). The more places a program derives its input from, or sends
its output to, the harder it becomes to test.

Let's  take a simple I/O program, one which reads a file, reformats it
in some way, and writes the result to another file. Obviously, to test
it, we should fill up the input file, run the program, and compare the
output  file against the expected output  file. As we discussed in the
previous  section, it would be nice if  we could build a program -- it
might  be a 3GL or 4GL program, or even an MPE or MPEX command file --
that  takes as parameters the input data and the expected output data,
so that we can easily add new test cases.

A first try on this might be a job stream like the following:

   :PURGE TESTIN
   :FCOPY FROM;TO=TESTIN;NEW
   LINE ONE
   LINE TWO
   LINE THREE

   :FILE MYPROGI=TESTIN
   :PURGE TESTOUT
   :FILE MYPROGO=TESTOUT
   :RUN MYPROG

   :PURGE TESTCOMP
   :FCOPY FROM;TO=TESTCOMP;NEW
   PROCESSED LINE A
   PROCESSED LINE B

   :SETJCW JCW=0
   :CONTINUE
   :FCOPY FROM=TESTCOMP;TO=TESTOUT;COMPARE=1
   :IF JCW<>0 THEN
   :  handle error
   :ENDIF

or, if the commands are put into a separate command file or UDC,

   :TESTCMDS
   LINE ONE
   LINE TWO
   LINE THREE
   :EOD
   PROCESSED LINE A
   PROCESSED LINE B
   :EOD

(the  data  would  go  as input to the  :FCOPY commands in the command
file).

Note  how the :FILE equations come  in handy to redirect the program's
input and output files. Not only does this avoid the need to overwrite
the  production  input and output files, but  it makes it possible for
several  test  suites  which test programs that  normally use the same
files  (e.g.  this  program,  the program that  created this program's
input  file, and the one that reads  this one's output file) to run at
once.  If  for  some reason your programs  don't allow :FILE equations
(e.g.  they issue their own :FILE  equations to refer to these files),
try  to  change  them  so they do, or at  least so they have a special
"test"  mode that will read :FILE-equatable  files. Note also that the
job  stream  regenerates the input and  comparison files every time it
runs.  I recommend this, since then each job stream would be a more or
less  self-contained  unit (if it uses a  special command file that no
other  test job uses, I suggest that  you build even this command file
inside the test job). A self-contained job is easier to move or
maintain, and is less
likely to suffer from "software rot" (a condition that causes software
that's  been  left  on  the  shelf  too long to  stop working, largely
because some outside things that it depends on have changed).

Back  to our example. One problem with  it is that :FCOPY ;COMPARE= is
rather  finicky  about the files it's  comparing -- for instance, they
must  both  have  exactly the same record  size. TESTCOMP, built by an
:FCOPY  FROM;TO=TESTCOMP  would normally have the  same record size as
the  job  input  device,  so  you might need a  :FILE equation to work
around this.

A more serious problem is that :FCOPY FROM;TO= can only be easily used
for  creating  files  that  contain  ASCII  data. What if  some of the
columns of the file need to contain binary data?

Here is where I think you ought to grit your teeth and write a special
program  (unless, of course, you have a 4GL that can do this for you).
Yes, I know that it seems like a pain to write code that will never be
run in production and is only needed to test other code, but this
rather  simple program could, if designed  right, prove to be a highly
reusable building block.

The  program would first prompt for some  sort of "layout" of the file
--  a  list of the starting column  numbers, lengths, and datatypes of
each  field in the file. Then, it  would prompt for each record in the
file,  specified  as  a list of fields,  separated by, say, commas; it
would  format  the fields into the file  record, and write it into the
file. Thus, you'd say:

   :RUN BLDFILE
   S,1,8, I,9,2, S,11,10, P,21,8  << string, integer, string, packed >>
   SMITH, 100, XYZZY, 1234567
   JONES, 55, PLUGH, 554927
   ...

Once  you write this program, incidentally, you might find that it has
other  uses,  say,  to  do manual testing of  your program once you've
already  found that it has a bug and are trying to isolate it. And, of
course,  if you make it general enough,  it should be usable in all of
your test suites.

Also,  your  input file had to have  been created by some program, and
your output file must be intended as input to some other program;
there's  nothing  that  says that you can't  run those programs in the
test  job  to  create  the input file from  data you've input and then
format the output file into readable text. The problems come in if the
other programs are too hard to run in batch (e.g. require block mode),
or  if you'd like to be able  to test each program separately from the
others,  perhaps  because  you want to see  how your program reacts to
illegal data in its input file, data that shouldn't normally appear in
the input generated by the other program.

What if your program reads and writes an IMAGE database? This is in
some ways simpler to test and in other ways more difficult. You can
use QUERY to fill the input sets and to create output (using >LIST or
>REPORT) that will be usable by :FCOPY ;COMPARE=. Be sure, though, that
you
sort any master sets that you dump using >REPORT -- since the order of
the  entries in the master set depends on the hashing algorithm, which
depends on the capacity, unsorted output will make the test suite find
an "error" every time you change the capacity.

However,  the  setup  of  the IMAGE database might  also be a bit more
cumbersome,  largely since you probably want  to have your own special
test  database  built  by the job (for  the reasons discussed above --
independence  from the production data,  from other test suites' data,
and  self-containedness). You might want to create a simple program or
command  file that takes a schema  file, lowers the dataset capacities
on  it,  runs  DBSCHEMA, and then does  a DBUTIL,CREATE -- you'll find
that a lot of your test suites can use it.


ADJUSTING FOR ENVIRONMENT INFORMATION

Our   pass-input-and-compare-output-against-expected-result   strategy
works just fine if the same input is always supposed to yield the same
output, but what if the output can vary? The most common variables are
based   on  current  date  and  time  --  reports  that  contain  this
information   in   headers,   output   files   that  have  each  value
date-stamped,   a   date-handling   procedure   that  returns  today's
day-of-week,  and so on. Another related problem is with programs that
check  whether they're being run online  or in batch, and do different
things  in these cases -- how can your batch test suite make sure that
the online features work properly?

What  we really have here is a different sort of input, input not from
a  file or a database, but from the system clock or the WHO intrinsic.
There  are a few ways of handling  this; for example, instead of doing
an FCOPY ;COMPARE=, which demands exact matches, you can have your own
comparison program that lets you specify that some particular field --
e.g.,  the  date  on  a  report header -- will  not get compared. Even
better,  your comparison program can let you specify that a particular
field  should  be  equal  to,  say,  the current year,  month, or day,
calculated at the time the comparison program runs.
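
Here is a minimal sketch of such a comparison routine, in which a "?"
in the expected line matches any character in the actual output (the
fixed 80-column PAC80 type is an assumption, chosen just to keep the
sketch simple):

   TYPE PAC80 = PACKED ARRAY [1..80] OF CHAR;

   FUNCTION LINESMATCH (ACTUAL, EXPECTED: PAC80): BOOLEAN;
   (* TRUE if the lines match, treating each "?" in EXPECTED as a  *)
   (* wildcard that matches whatever character ACTUAL has there.   *)
   VAR I: INTEGER;
       OK: BOOLEAN;
   BEGIN
   OK:=TRUE;
   FOR I:=1 TO 80 DO
     IF (EXPECTED[I]<>'?') AND (EXPECTED[I]<>ACTUAL[I]) THEN
       OK:=FALSE;
   LINESMATCH:=OK;
   END;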

A more flexible approach still, however -- and one necessary for
things like pretending you're online rather than in batch -- is to
redirect this "input" from the environment, just as you redirected
input from files and databases using :FILE equations.

Now how are you going to do this redirection? Believe it or not, after
having  the  gall to ask you to write  test suites that are as long as
your  source  code,  I'm  suggesting that you  change your programs to
accommodate  testing  requirements.  Instead  of  calling  CALENDAR or
DATELINE,  for  instance  -- or using  whatever language construct may
give you this information -- you might write your own procedure:

   FUNCTION MYCALENDAR: SHORTINT;
   VAR JCWVALUE, STATUS: SHORTINT;
   BEGIN
   (* FINDJCW looks the JCW up by name; a non-zero STATUS means the  *)
   (* JCW doesn't exist, which we treat the same as a value of zero. *)
   FINDJCW ('PRETENDCALENDAR ', JCWVALUE, STATUS);
   IF (STATUS<>0) OR (JCWVALUE=0) THEN
     MYCALENDAR:=CALENDAR
   ELSE
     MYCALENDAR:=JCWVALUE;
   END;

This  way,  your program would normally get  the current date from the
CALENDAR intrinsic, but when the PRETENDCALENDAR JCW is non-zero, will
use  that value instead. You might, for efficiency's sake, want to get
the  JCW value only once, and then save it somewhere; for ease of use,
you   might  want  to  look  at  the  PRETENDYEAR,  PRETENDMONTH,  and
PRETENDDAY  JCWs,  and  assemble  the CALENDAR-format  value from them
(possibly using the date-handling package that we so thoroughly tested
a few pages ago).
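
That variant might look something like the following (a sketch only;
GETJCW and DAYOFYEAR are made-up helpers, the former wrapping FINDJCW
and returning 0 if the JCW doesn't exist, the latter coming from the
date-handling package):

   (* CALENDAR's format is year-of-century in the upper 7 bits and  *)
   (* day-of-year in the lower 9 bits, hence the "* 512".           *)
   IF GETJCW ('PRETENDYEAR ') = 0 THEN
     MYCALENDAR:=CALENDAR
   ELSE
     MYCALENDAR:=(GETJCW ('PRETENDYEAR ') - 1900) * 512 +
                 DAYOFYEAR (GETJCW ('PRETENDYEAR '),
                            GETJCW ('PRETENDMONTH '),
                            GETJCW ('PRETENDDAY '));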

A  similar procedure might be written to determine whether the program
is  running  online or in batch --  it'll check the PRETENDONLINE JCW,
and if it doesn't exist or is set to some default value, will call the
WHO  intrinsic. If your program does different things depending on the
user's  capabilities  or  logon  id,  you  might want  to have similar
procedures  for  them, too (wrapped around  the WHO intrinsic call) --
although  it's  possible  for  your test suite  to actually be several
jobs,  each  of  which logs on under  a different user, with different
capabilities,  it  may be more convenient for  you if one job can make
itself  look  like each one of these users  in turn. In fact, it might
even  be convenient for your own  manual debugging (say, when you want
to  duplicate the program's behavior as a  particular user id, or on a
particular  date,  but  don't  want  to re-logon or  change the system
clock).

Of  course, the drawback to this  approach is that you're not actually
testing  the  program  as it really behaves,  but rather as it behaves
with  the testing flag set; the  code you're executing in testing mode
is somewhat different from what is normally executed, and if, say, there's
a  bug  in  the  CALENDAR call or the WHO  call, your test suite won't
catch  it, since in testing mode  the intrinsics aren't called at all.
Unfortunately, this seems to be a necessary evil; the only solution is
to  minimize the amount of code  whose execution depends on whether or
not you're in testing mode.

One  thing  that you might do -- if you  want to be really fancy -- is
create  a  library  of  procedures called CALENDAR,  CLOCK, WHO, etc.,
which  would,  depending  on  some testing flag,  either call the real
CALENDAR,  CLOCK, or WHO, or return "pretend" values; you can then put
these  procedures  into an RL, SL, or XL,  and not have to change your
source  file. Once you debug your  library procedures, you should have
more confidence that your testing in test mode actually simulates how
the program will really behave in production. One thing that you
may have to do, however, is intercept not just the intrinsics that you
call  directly,  but  also  whatever  procedures  might  be  called by
language  constructs  (like  COBOL's  facility  for  returning today's
date).


WHAT TEST CASES SHOULD YOU USE?

So  far, we've talked a lot about how to write tools that make it easy
for you to add test cases to your test suites, but not much about what
your  test  cases  should  be.  Say  that  you're  testing  a  DATEADD
procedure, one that returns a date that is X days after date Y. (Let's
assume  that X could be negative -- X = -5 means a date that is 5 days
before  date  Y.)  What  test  cases should you  use? Think about this
before reading the answers!

Well, it seems to me that there are quite a few:

  * Add days so that it stays in the same month (e.g. 1990/05/10+7).

  * Add days so that it changes months (e.g. 1990/05/10+30).

  * Add days so that it changes years (e.g. 1990/05/10+300).

  *  Add days so that it changes months or years over February 28th in
    a non-leap year (e.g. 1990/02/10+30).

  *  Add days so that it changes months or years over February 29th in
    a leap year (e.g. 1992/02/10+30).

  *  Handle years that are divisible by  100 but not by 400 (like 1900
    or 2100), which are not leap years (did you know this?).

  * Add 0 days.

  * Add days so that it goes outside of your accepted date range (e.g.
    beyond 1999, or whatever other date is your limit).

  * Add to an invalid date -- one with an invalid year, month, or day.

  * All the above, but with subtracting days.

Wow!  That's  a lot of work. But, you'll  have to admit, all the above
are  things  that  you  really  should  test  for (unless  they're not
relevant to your particular implementation, e.g. if your date range
doesn't  extend to 1900 or 2100,  or if you've consciously decided not
to check for certain error conditions), manually if not automatically;
it's  especially important to test  for "boundary conditions" (did you
know  that, in the DATELINE intrinsic, the next day after DEC 31, 1999
is  JAN 1, 19:0?), for cases  that require special handling (like leap
year), and for proper handling of errors.
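
Expressed as one-line calls to the TESTDATEADD procedure from earlier
(using the same YYYYMMDD representation, with the expected results
worked out by hand from the examples above), the first several of
these cases might look like:

   TESTDATEADD (19900510,   7, 19900517);  (* stays in the same month *)
   TESTDATEADD (19900510,  30, 19900609);  (* changes months *)
   TESTDATEADD (19900510, 300, 19910306);  (* changes years *)
   TESTDATEADD (19900210,  30, 19900312);  (* over Feb 28th, non-leap year *)
   TESTDATEADD (19920210,  30, 19920311);  (* over Feb 29th, leap year *)
   TESTDATEADD (19900510,   0, 19900510);  (* add 0 days *)
   TESTDATEADD (19900510,  -7, 19900503);  (* subtract days *)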

These  are  the obvious tests -- tests  for bugs that you expect might
happen.  As  other bugs come up, however,  you ought to add test cases
that  would have caught these bugs:  firstly, you'll have to test your
fix anyway, and if you add the test case before implementing the fix, it
won't  cost  you anything extra; secondly, the  same bug (or a similar
one) may come up later, but this time will get caught.

Still,  there's  no  need  to get extreme  about things; shortcuts are
still  possible.  Say, for instance, that  DATEADD works by converting
the  date  into  a  "century  date"  format  (number  of days  since a
particular  base date), adding the number of days, and then converting
back  into a year/month/day format -- if  you're sure that this is all
it  does, you might just have one test case (preferably one that seems
to  exercise  as  much of the internal logic  as possible, such as one
that  changes  months  and  years).  Of  course,  you'd still  have to
properly test the date conversion routines.

In  general,  what  you  test  should  depend on how  your code works.
Whenever  you  know  your  code  treats  two different  types of input
differently,  you  should  test both. If you're  fairly certain that a
single  test will test many features, you  can just use that one test;
if,  for  instance,  you know that testing  one module or routine will
also  adequately  test  the module or routines  that it calls, you can
make  do  with just testing the top-level  one. However, try to resist
this  temptation;  firstly,  the  top-level  module  probably  doesn't
exercise all the functions of the bottom-level one, and secondly, it's
very  convenient  to have a test suite  for the bottom-level module --
that  way, if you're making substantial changes to your system and you
know   the  top-level  module  is  broken,  you  can  still  test  the
bottom-level one independently.

Finally, an obvious point, but one that is often neglected -- it's
better  to test a little than not at all. If you find something that's
hard  to test in all possible ways, test it in at least a few; if, for
instance,  its results are hard to automatically verify, at least make
sure  that they're in the right  format, or even that they're returned
at  all (i.e. that the program  doesn't just abort). There's really no
90-10  rule  in  testing -- 10% of the  effort won't get you much more
than 10% of the benefit -- but you can at least avoid some of the more
obvious  (and  more  embarrassing) bugs. Then,  once the groundwork is
laid,  you might try to get back to it periodically, adding a new test
case here or there. Don't let perfectionism get in the way of doing at
least something.


VERIFYING DATA STRUCTURES

Most  sufficiently  complicated data structures  -- anything from your
data stored in an IMAGE database to your own linked lists, hash files,
or  B-trees,  if  you  write  such  things  yourself --  have internal
consistency  requirements.  Certain fields in  your databases may only
contain  particular  values;  other  fields  must  have  corresponding
records  in  other  datasets  or  in other databases.  If any of these
internal consistency requirements are not met, you know you have a bug
somewhere.

You can get a lot of benefit out of writing a verification routine for
each such data structure that checks it for internal consistency. This
is  somewhat different from the test suites we discussed before, which
check  for  the validity of the ultimate  results, but it can still be
very  useful,  since internal inconsistency  must have, by definition,
been  caused  by a program error, and  is likely to eventually lead to
incorrect  results (incorrect results that  your test suites might not
otherwise check for).

You  should  call  this  verification routine at the  end of each test
suite to verify the consistency of the structures (again, usually data
in  the database) that the program  being tested built; you might even
run  it after each step in the test suite, to isolate exactly where an
error  might be sneaking in. You may also want to run the verification
routine  against  your production database every  night, to check your
programs  as they run in the real  world, not just the controlled test
environment;  and, you can run it  whenever you suspect that something
may be wrong (either in testing or in production), to figure out if an
internal inconsistency might be causing it.

The  verification routine shouldn't be hard to write; if you can do it
using  a 4GL or some other tool (like Robelle's fast SUPRTOOL -- speed
is  important, since you want to make it as quick and easy as possible
to verify your data), all the better. Simply put, check all the fields
for  which at least one possible value would be invalid, whether it is
because  it's not one of a list of allowable values for this field, or
because  it's  out  of  range, or because  it's inconsistent with some
other   values   in   this   record,  or  some  other  values  in  the
dataset/database. Possible checks include:

   * Flag fields may contain only certain allowable values.

   *  Numbers, like salaries or prices,  must be within certain ranges
     (e.g. non-negative, below a certain amount, etc.).

   * Dates must be valid (valid year, month, day).

   * Strings must at least not include non-alphanumeric characters.

   *  Some  fields  must  have  corresponding  entries in  a different
     dataset  or  database  (do a DBGET mode  7, for instance, to make
     sure they're there).

   *  Some  fields  are  calculated from other  fields, and must match
     (e.g.  a total price field in an  invoice that must be the sum of
     the price fields in the line items).

Not only can this check for bugs in your programs, but it can also check
for invalid production data that your programs might not have detected
(e.g.  garbage  characters  in string fields,  bad states, state codes
that  don't  match  phone numbers, etc.). And,  again, if written as a
QUERY  >XEQ file, or as a 4GL  program, it can be very easily created,
and used over and over again.
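
For data structures that QUERY or a 4GL can't conveniently reach, the
same checks can be coded directly. Here's a minimal sketch, assuming a
hypothetical customer record layout; the field names, the legal values,
and the VALIDDATE function (presumably from the date package) are all
just illustrations:

   TYPE CUSTREC = RECORD
          STATUSFLAG: CHAR;      (* must be 'A', 'I', or 'D' *)
          BALANCE:    INTEGER;   (* in cents; must be >= 0   *)
          LASTPAID:   INTEGER;   (* YYYYMMDD date, 0 = never *)
        END;

   PROCEDURE VERIFYCUST (C: CUSTREC; RECNUM: INTEGER);
   BEGIN
   IF NOT (C.STATUSFLAG IN ['A', 'I', 'D']) THEN
     BEGIN
     WRITELN ('Error: record ', RECNUM, ': bad STATUSFLAG');
     GOT_ERROR:=TRUE;
     END;
   IF C.BALANCE < 0 THEN
     BEGIN
     WRITELN ('Error: record ', RECNUM, ': negative BALANCE');
     GOT_ERROR:=TRUE;
     END;
   IF (C.LASTPAID<>0) AND NOT VALIDDATE (C.LASTPAID) THEN
     BEGIN
     WRITELN ('Error: record ', RECNUM, ': invalid LASTPAID date');
     GOT_ERROR:=TRUE;
     END;
   END;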


TESTING COMMAND-DRIVEN AND CHARACTER-MODE INTERACTIVE PROGRAMS

As  we discussed before, the key to successful automated testing is to
have  the  proper  tools  that  make  adding  test cases  easy. One in
particular  -- which takes some work to construct but can make writing
test suites much simpler -- is very much worth discussing.

This  test-bench lets you run another  program under its control, with
the son program's $STDIN and $STDLIST redirected to message files. The
test-bench  can let you specify input to be passed to the son program,
and  the  expected  output  that  the son program  should display; for
instance, a typical test suite might look like:

   :RUN TESTBENC

   RUN SONPROG          << command to start son process >>
   I CALC 10+20         << command for son to execute >>
   O 30                 << expected result >>
   DOIT                 << tells test-bench to compare the results >>
   I SQUARES 3          << command for son to execute >>
   O 1                  << expected result >>
   O 4                  << expected result >>
   O 9                  << expected result >>
   DOIT                 << tells test-bench to compare the results >>
   ...

What are the advantages of this approach?

   *  It  lets you specify the expected  output right after the input;
     this  makes  the  test  suite much easier to  write (and read and
     maintain)  than if you had to specify  all the input up front and
     all the expected output (from all the commands) at the end.

   *  It lets you compare the output and the expected output much more
     flexibly  than  a simple :FCOPY ;COMPARE=  would; you can specify
     special  commands that indicate, say,  that the output needn't be
     exactly  as you specified, but might include some variations here
     and there (e.g. date- or environment-dependent information).

   *  It tells you exactly which commands got errors, rather than just
     telling you an error was found.

Note,  however,  that all this test-bench does  is feed input into the
son  process'  $STDIN  and  read output from  its $STDLIST; what about
other  output,  say  output  to  files,  databases,  JCWs,  or  MPE/XL
variables, and input from the same places?

Fortunately,  output  to  one of those places  can easily be converted
into  output  to  $STDLIST  simply  by executing an  MPE command, like
:PRINT (or :FCOPY on MPE/V), :SHOWJCW, :RUN QUERY, etc. If our program
can  not only check the output of the  son process and feed input to a
son  process, but execute MPE commands and check their output and feed
them input, our problems will be solved.

Let's  say  your program is supposed to build  a file, and you want to
make  sure  that  the  file is built with  the right structure and the
right contents. Then, your test suite might look like this:

   :RUN TESTBENC

   RUN SONPROG          << command to start son process >>
   I BUILDFILE XXX      << command to test >>
   O File was built.    << expected output to $STDLIST >>
   DOIT
   MPE :LISTF XXX,2     << MPE command to execute >>
   O ...
   O XXX  123  32W ...  << expected :LISTF output >>
   O ...
   DOIT
   MPE :PRINT XXX       << MPE command to execute >>
   O ...                << expected :PRINT xxx output >>
   DOIT

How  does this program work? Well, as  we said before, it runs the son
program  with  $STDIN  and  $STDLIST redirected to  message files; "I"
commands write stuff to the input message file, "O" commands write the
expected  output  records  to  a  special  temporary file,  and "DOIT"
commands  read  the  output  message  file and compare  it against the
O-command temporary file.
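
In rough outline, DOIT might look like this (a sketch only: EXPECTED
is the temporary file of O-records, SONOUT is the son's redirected
$STDLIST message file, TESTLOG is the test-bench's results file, and
LINESMATCH is the forgiving comparison sketched earlier; it also
ignores, for the moment, the question of knowing when the son's output
is done, which comes up next):

   PROCEDURE DOIT;
   VAR EXPLINE, OUTLINE: PAC80;
   BEGIN
   RESET (EXPECTED);
   WHILE NOT EOF (EXPECTED) DO
     BEGIN
     READLN (EXPECTED, EXPLINE);
     READLN (SONOUT, OUTLINE);   (* read the son's next output line *)
     IF NOT LINESMATCH (OUTLINE, EXPLINE) THEN
       BEGIN
       WRITELN (TESTLOG, 'Error: got      "', OUTLINE, '"');
       WRITELN (TESTLOG, '       expected "', EXPLINE, '"');
       GOT_ERROR:=TRUE;
       END;
     END;
   REWRITE (EXPECTED);   (* empty it for the next batch of O-records *)
   END;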

One  problem is how "DOIT" will recognize that the son program is done
with  its  output  and  has  issued  another input prompt.  If the son
program  always  uses the same prompt (or  one of a few prompts), DOIT
can check for this; if, however, the son program's prompt isn't easily
distinguishable  from  normal output, you should  make the son program
print  a special line (e.g. "***INPUT***") before doing any input when
it's in "testing mode"; so long as this happens before any input, DOIT
can recognize these lines and realize that the output is done.
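
The special line itself is easy to arrange in the son program's own
input routine -- something like this sketch, where TESTING_MODE is
assumed to be a flag set from a JCW at start-up:

   PROCEDURE MYREADLN (VAR S: stringtype);
   BEGIN
   IF TESTING_MODE THEN
     WRITELN ('***INPUT***');   (* tells DOIT the output is complete *)
   READLN (S);
   END;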

What  about  the MPE commands that can  be used to "convert" output to
files, databases, etc. into output to $STDLIST? When we test MPEX (the
test-bench  I'm describing is essentially the  one that we use to test
all  of our software), this is no problem, since we can just pass MPEX
an  MPE  command as input, and MPEX will  execute it. You might do the
same  yourself  -- make sure that  the program you're testing executes
MPE  commands -- or you can have  TESTBENC have two son processes, one
the  program being tested, and the other a simple program that prompts
for an MPE command and executes it. The only other problem that you'll
face  here  is executing :FCOPY or :RUN  commands on MPE/V (where they
can't  be done with the COMMAND intrinsic); however, if you're an MPEX
customer,  you can actually use MPEX as this MPE-command-executing son
process -- MPEX can execute :FCOPYs, :PRINTs, :RUNs, etc.

This  test-bench $STDIN-and-$STDLIST-redirection solution, it seems to
me,  would  work  quite  well  for  any command-driven  or interactive
character-mode  programs.  If  you want to use  it to test procedures,
you'll  have  to  write a simple shell  program that prompts for input
parameters, calls the procedure, and prints the output parameters, and
run  this  program  as  a  son  of the  test-bench. Testing block-mode
programs,  I  suspect,  would be much more  difficult; I'll have a few
words about it later, but it's still an unsolved problem as far as I'm
concerned.

Of course, the more complicated your test-bench is, the more important
it  is  to  write  a test suite for it!  (A bug in the test-bench that
keeps  it from properly checking  things could be almost unnoticeable,
since it will falsely tell you that your test suite ran fine.) We test
our  test-bench  by feeding a lot of  test commands, some of which are
calculated  to  produce  errors  and others to  succeed, and check the
results of these operations (not using the test-bench itself, of
course) to see if they're as expected.


AUTOMATIC TEST SUITE GENERATION

No  matter  how  sophisticated  your test-bench, you'll  still have to
write  your  test  cases.  For simple  one-line-input, one-line-output
operations,  there's little that you need  to do beyond specifying the
input  and the expected output; however, for things that require a lot
of set-up, or are actually conversations, with many output prompts and
many inputs, you'd like a better way.

One idea that some automated testing people like is having you run the
program  once,  specifying all the right  inputs, and making sure that
all  the  outputs  are correct; these inputs  and outputs will then be
saved,  ready to be "re-played" by the test-bench, which will resubmit
exactly  the same inputs, and expect  to get exactly the same outputs.
In  effect,  then,  the  test  suite  will  check that  all subsequent
executions  of the program behave exactly  the same way as the initial
one  (which  was  presumably  correct).  The  way you'd do  this is by
modifying the test-bench program we discussed above (what? you mean to
say  you haven't written it yet?)  to have a special "data-collection"
mode  that will accept user inputs,  pass them to the program, collect
the  outputs,  and create a file that can  later be used by the normal
mode of the test-bench.

Now,  there  are  a few problems with this,  which lead me to conclude
that,  even if this data-collect feature  is used, the test suite that
it  generates must be easy to modify. Firstly, the user will doubtless
make errors while entering the original inputs; since the data-collect
feature  doesn't  know what's a user error  and what should be part of
the  test  suite,  there  needs  to  be some way  of editing out these
errors.  (Technically,  you  need not do this,  since exactly the same
inputs  should yield exactly the same error outputs in the future, but
if you don't edit them out, the test suite will be very unreadable and
unmaintainable.)  Secondly,  the future output  probably won't exactly
match  the  current output -- dates  and other environment information
(version  numbers, etc.) will doubtless change. There needs to be some
way  to  edit the generated test suite  to replace the expected output
with  some sort of "wildcard" characters that tell the test-bench that
any output would be acceptable in this case.

However,  taking  into  account  that  some editing will  be needed, a
data-collect feature can be quite convenient for testing features that
involve complicated I/O sequences.


A SAMPLE TEST ENVIRONMENT

Besides  having  the right test tools and  the right test suites, it's
important  to  internally  set  up  your  test  suites  and  your test
environment  so  that it is as easy as  possible to run all your tests
and  check  whether  or  not they succeeded. Here  are a few tips that
we've found handy ourselves:

   *  Have one test suite for each  major feature, not one big one for
     your  entire system. When you're  working on a particular feature
     and  want to see if it works, you'll probably want to re-run only
     that test suite after each change, and re-run all the test suites
     only at the very end.

   * As we mentioned before, have each test suite as self-contained as
     possible,  but  also try to have each  test case within each test
     suite  be relatively self-contained. The more a test case depends
     on  the results of the test cases that preceded it, the harder it
     will  be  for  you  to  fully  understand what the  state of your
     internal  files, databases, etc. is at  the time the test case is
     executed,  and  the  harder  it  will be to  maintain it, or even
     understand  why  the  "expected results" you have  for it in your
     test  suite  are really what should  be expected. Of course, some
     test  cases must not be self-contained precisely because you want
     to  make sure that they work  properly when done together, rather
     than separately.

   *  Have  each  test suite run in its  own group, with all the files
     needed  by  the  program being tested redirected  to the files in
     that  group.  The  first  thing that the test  suite should do is
     purge  the group (if it's logged on to it, this will merely purge
     all  the  files); this way, you'll be  sure that this test run is
     not influenced by previous runs of the same test suite, and since
     the  test suite runs in its own  group, it will not be influenced
     by concurrently-running other test suites.

   *  All the actual test suites and permanent support files should be
     in  their  own group, separate from the  groups in which the test
     suites run; this way, the test suites will be able to purge their
     own  groups, as discussed above. If  the test suite files are all
     in  a particular fileset (e.g.  "T@.TEST"), they can be submitted
     in MPEX using a %REPEAT/%STREAM/%FORFILES construct.

   *  Each test suite should signal  that it completed successfully by
     building  a file called TESTOK, and  that it failed by building a
     file  called TESTERR (or, even better, TESTE###, where ### stands
     for   the   number   of   errors   discovered).  Then,  a  :LISTF
     TESTERR.@.TESTACCT,6 will show you which jobs had errors in them.
     In  case  you're  afraid that a job  might abort without building
     either  a TESTOK or TESTERR file,  you can build the TESTERR file
     at  the  very beginning and only purge it  at the end if all went
     well.

   *  Finally, if you use a  test-bench program, the test-bench should
     send  all its output, especially an  indication of all the errors
     (what  the input was, what the  output was, and what the expected
     output  was),  to  a  disc  file called, say,  TESTLOG, which can
     easily  be read, and will remain on  the system even if the spool
     file is deleted.

Thus, the configuration we use in VESOFT is:

   C@.TEST.VESOFTD -- command files used by the test suites.

   M@.TEST.VESOFTD -- MPEX test suites.

   S@.TEST.VESOFTD -- SECURITY test suites.

   A@.TEST.VESOFTD -- VEAUDIT test suites.

   TESTPROD.TEST.VESOFTD  --  a  command  file that  purges the VETEST
     account and %STREAMs M@.TEST+S@.TEST+A@.TEST.

   @.MALTFILE.VETEST -- group used by test suite MALTFILE.TEST.VESOFTD.

   @.MBATCH.VETEST -- group used by test suite MBATCH.TEST.VESOFTD.

   ...


TESTING SEEMINGLY HARD-TO-TEST PROGRAMS

Some  things  are  easier  to  test  than others;  procedure calls are
simplest,   command-driven   programs   are   rather  straightforward,
"conversational"  character-mode  programs  are  a  bit  harder.  Much
depends  on how easy it is to feed the program input and intercept the
program's  output; for example, if a program does input from tape, you
might redirect it by a :FILE equation to a disc file, but how will you
test  the  code in the program that tries  to handle tape errors? If a
program  is supposed to submit a job, how can you tell whether the job
was properly submitted?

There  are  several  general  tricks  that you can  use to solve these
problems,  though  these are more examples  of ingenious solutions for
you  to  emulate,  not  specific  instructions  that should  always be
followed:

   *  Inputs:  Have  ways to "fake"  hard-to-trigger input conditions,
     like  bad tapes, control-Y, I/O errors, etc. For instance, have a
     "***BAD  TAPE***" record in a file  be interpreted by the program
     as a tape error; if the program expects a tape to contain several
     things separated by EOF markers (which can't normally be emulated
     by disc files), have it treat an "***EOF***" record as an EOF
     marker (a short sketch of the tape-error check appears right
     after this list).

     Again, this has the same problem as the PRETENDDATE/PRETENDONLINE
     features that we suggested above -- what you'll really be testing
     is  not the actual execution of the program, but the execution of
     the  program in testing mode. However, though problems with, say,
     the  actual condition code check that detects the tape error will
     not  be found, all the other  aspects of tape error handling will
     be properly tested.

   *  Outputs:  Find  commands or programs that  can convert an output
     that  is hard to test for into one  that is easy to test for; for
     instance,  if your program is supposed to  do a :DOWN, test it by
     doing  a :SHOWDEV afterwards to see if it is DOWNed or has a DOWN
     pending. If the program is supposed to do an :ABORTJOB, PAUSE for
     some  time  (since an :ABORTJOB may  not immediately take effect)
     and  then do a :SHOWJOB of that  job number to make sure that the
     job  no  longer exists. If your program  is supposed to shut down
     the system, you're out of luck...

   *  More  outputs:  But  maybe you're not out  of luck in the system
     shut-down  case,  after  all;  analogously  to  the  "fake input"
     suggestion  above,  you  might have your program  check to see if
     it's  in  testing mode, and if it  is, print a message instead of
     shutting   down   the   system   (or   doing   something  equally
     uncheckable-for).  Again, this won't make  sure that the ultimate
     operation  is done properly (since it  won't be done in this case
     at   all),  but  at  least  it'll  make  sure  that  all  of  the
     preliminaries will be handled correctly.

   *  General:  Find  ways  of  taking  care  of  timing  windows; for
     instance, if your program submits a job, a simple :SHOWJOB in the
     test  suite won't be a proper check (since a small job might have
     finished  by  the time the :SHOWJOB is  done), and having the job
     build a file or leave some such permanent file won't work either,
     since the job might still not have started up. Instead, your test
     suite  might build an empty message  file and then make sure that
     the job writes a record to this message file (possibly by setting
     up  a logon UDC for that user). Your test suite can then read the
     message  file, waiting until a record is written to it, no matter
     when the job actually gets around to executing.

     Message  files  are also quite useful when  the test suite has to
     check  something  at  a  particular  point  in the  son program's
     execution,  and if it checks it too early or too late the results
     will  not be quite right. This  is particularly so when your code
     is   supposed   to   properly   handle  concurrent  access  in  a
     non-standard  way  (i.e.  not  by  simply using  FLOCK/FUNLOCK or
     DBLOCK/DBUNLOCK).  You might want to have your programs, when run
     in testing mode, try to read records from a message file at
     critical  points,  which  will let you  control when each program
     will hit a particular piece of code.
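
As promised in the "Inputs" bullet above, here is what the fake tape
error check might look like inside the program's tape-reading routine
(a sketch only: PAC80, TESTING_MODE, and the routine itself are
assumptions, and only the first few characters of the marker are
checked, just to keep the sketch short):

   FUNCTION FAKETAPEERROR (BUF: PAC80): BOOLEAN;
   (* TRUE if we're in testing mode and the record just read is the *)
   (* magic "***BAD TAPE***" marker.                                *)
   BEGIN
   FAKETAPEERROR := TESTING_MODE AND
                    (BUF[1]='*') AND (BUF[2]='*') AND (BUF[3]='*') AND
                    (BUF[4]='B');
   END;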

Again,  these  are  some sample solutions to  some (though by no means
all) testing problems. The $65,536 question of testing, however, still
remains:  How  do you test VPLUS  block-mode applications? Some of the
above  tricks  might  be  usable  --  instead  of  calling  the  VPLUS
intrinsics,  call  procedures  that, in testing  mode, will do normal,
unformatted terminal I/O (i.e. the input fields are to be input simply
as a data string, with all the fields run together), which can then be
run  under test-bench control. Unfortunately, it seems that this would
leave  too  much out of the testing  (for instance, the correctness of
the VPLUS calls themselves, and the correctness of any edits specified
in  the  VPLUS  forms),  and  the  test  suites  would  also  be quite
unreadable and unwritable. Someone might do something to intercept the
terminal  I/O  from  within  VPLUS  itself,  but  that's  getting  too
complicated for me. Any ideas?


CONCLUSION

To sum up, a few testing maxims:

   *  Automate  testing  --  both  the  input and the  checking of the
     output.

   *  Write test suites before or  while you're writing the program --
     that way, you can use them to do even the initial testing.

   *  Figure  out  the testing tools that you  need and don't skimp in
     building them; they can save you a lot of effort.

   * Make it as easy as possible to add new test cases (try to make it
     one test case per line), even if it means extra work up front.

   *  Have your test cases be in  job streams, not in source files, so
     that you can add new ones without recompiling.

   *  Change  your  programs  so that they  can "pretend" that today's
     date,  the  batch/online flag, your  logon information, and such,
     are  something  other than what they really  are. Do the same for
     hard-to-reproduce conditions, like I/O errors, control-Y, etc.

   *  Think about your code and come  up with test cases that exercise
     as much of it as possible; as new bugs arise, add test cases that
     would have caught them.

   *  Write  verification  routines  for  all  your  complicated  data
     structures,  especially  including  the  data  in your  files and
     databases.

   *  If feasible, write some sort  of test-bench program in which you
     can test the behavior of other programs by feeding them input and
     checking their output.

   * Think creatively about testing features that at first glance seem
     difficult  to check the results of.  Use message files to control
     timing problems.

   *  Be  prepared  to  spend a lot of  time and effort (and therefore
     money)  on  automated  testing,  but  expect  to save  a lot more
     effort, and come out with far fewer bugs, if you do it right.
