THE TRUTH ABOUT MPE/XL DISC FILES

by Eugene Volokh, VESOFT
Presented at 1989 INTEREX Conference, San Francisco, CA, USA
(BEST PAPER AWARD for an Outstanding Paper Presentation)
Published by INTERACT Magazine, Sep 1989.

ABSTRACT

Several years ago, I wrote a paper called "The Truth About Disc Files". In it, I tried to describe some aspects of files that, I felt, inquiring minds wanted to know -- things like extent considerations, blocking, file locking, etc. Some of those things remained the same under MPE/XL's file system, but many have substantially changed; this paper will try to describe some of the key differences and similarities between the MPE/XL and MPE/V file systems, and explain some of their practical implications.

HOW FILES ARE STORED -- EXTENT CONSIDERATIONS

One of the key limitations of MPE/V was that you had to know a file's maximum size when you were building the file. If you guessed too low, your programs would eventually abort with an END OF FILE error; if you guessed too high, you could waste a lot of disc space.

Actually, technically speaking, you didn't have to know a file's true maximum size; you could always build the file with a very large file limit, e.g.

   :BUILD MYFILE;DISC=1000000

and you'd be rather certain never to overflow it. The trouble, of course, is that the file's disc space was not allocated on a simple as-needed basis, but rather one extent at a time. Since a file by default had a maximum of 8 extents, the above :BUILD would build a file that was split into up to 8 chunks of contiguous disc space; the very act of building the file would allocate one chunk of this space, which would occupy 1,000,000 / 8 = 125,000 (contiguous!) sectors. (Remember that a sector is 256 bytes -- we used to say 128 words, but not any more; since a word means 2 bytes on Classics but 4 bytes on Spectrums, I will try to use "word" as infrequently as possible in this paper.)
Even if you said

   :BUILD MYFILE;DISC=1000000,32

to build the file with up to 32 extents, the file would initially be allocated with over 31,000 sectors. In other words, it wasn't so much selecting the right file limit that was the problem, but rather that selecting a file limit that was too high would cause prohibitive consumption of disc space.

This may seem somewhat nitpicky, but it is actually quite relevant to MPE/XL. MPE/XL also requires you to specify the maximum size for a file. However -- and this is a big "however" -- it lets you specify a very large file limit without using huge quantities of disc space. If in MPE/XL you said

   :BUILD MYFILE;DISC=1000000

the file would be allocated 2,048 sectors at a time, even if this would require it to have almost 500 extents when full. Thus you get the best of both worlds -- the file can grow to up to 1,000,000 records but will never have more than 2,047 sectors of wasted disc space. You'll find that MPE/XL often builds files (e.g. XL's built by LINKEDIT's -BUILDXL command) that have file limits of 4,096,000 -- more than they'll ever need, but what does it matter? In fact, the highest "maximum maximum file size" -- i.e., the highest (file limit * record size) value that you can have -- is 8,388,607 sectors.

One of the reasons why MPE/V had the 32-extent limit was that the disc addresses of all the extents were kept in the 256-byte file label. Since each disc address was 4 bytes, and a little bit less than 128 bytes were required for other file information (e.g. file name, creator id, file code, etc.), that left room for only about 32 extent pointers. MPE/XL didn't make the same mistake of keeping extent pointers in a single fixed-size array. Instead, each file has a linked list of "extent descriptor blocks", each of which has twenty 12-byte entries that point to the extents of the file.
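To make the arithmetic concrete, here is a small sketch (my own illustration, not anything MPE/XL runs) of how many 2,048-sector extents the :BUILD above would need when full, given that a default record is one 256-byte sector:

```python
import math

SECTOR = 256   # bytes per sector
CHUNK = 2048   # sectors allocated at a time for very large files

def extents_when_full(flimit, recordsize=SECTOR):
    """Number of 2,048-sector extents a full DISC=flimit file would need."""
    total_sectors = math.ceil(flimit * recordsize / SECTOR)
    return math.ceil(total_sectors / CHUNK)

# :BUILD MYFILE;DISC=1000000 -> 1,000,000 sectors of data
print(extents_when_full(1000000))   # -> 489, i.e. "almost 500 extents"
```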
Thus, a file on MPE/XL will have:

* a file label, which points to
* an extent descriptor block which contains the disc addresses of 20 extents and also points to
* a second extent descriptor block which contains the disc addresses of 12 extents.

Granted, this is 3 sectors (as opposed to the 1 sector, which is all that is needed for the file label on MPE/V), but think of the flexibility -- new extents can be added to the file with no difficulty. To avoid possible performance problems with access to files that have many extents, MPE/XL builds a special "extent descriptor B-tree" whenever you open a file; that way, it can very quickly find the address of an extent even in a many-hundred-extent file.

All right, you've said:

   :BUILD MYFILE;DISC=1000000

and now you do a :LISTF MYFILE,2. What do you get?

   FILENAME  CODE  ------------LOGICAL RECORD----------  ------SPACE-----
                    SIZE  TYP        EOF      LIMIT R/B  SECTORS #X MX

   MYFILE           128W  FB           0    1000000   1        0  0  *

There are, obviously, three unusual things in this picture:

* The number of sectors allocated to the file is 0. If you recall, on MPE/V even an empty file always has at least one sector allocated to it, and usually more. This is because on MPE/V, file labels were kept as the first sector of the first extent of the file, so each file always had to have at least one extent allocated to it. In MPE/XL, file labels are kept separately from the file data (in a special portion of the disc called the "file label table" -- extent descriptor blocks are also kept there), so no data extents are allocated and thus 0 sectors are actually allocated for the data. Of course, the file label still takes up 1 sector of space, but that doesn't get budgeted to the file in MPE/XL.

* The number of extents allocated to the file is 0. See above paragraph.

* The maximum number of extents for the file is "*".
This means that the file was built without a maximum number of extents (though, as we'll explain later, even if it were built with a maximum number of extents, this number still wouldn't really be a maximum!). For extra credit, try to guess what an "*" in the "# of extents" (as opposed to "maximum extents") column means; we'll discuss that later.

Now, say that we write one record into this file and then do a :LISTF. We'd see:

   FILENAME  CODE  ------------LOGICAL RECORD----------  ------SPACE-----
                    SIZE  TYP        EOF      LIMIT R/B  SECTORS #X MX

   MYFILE           128W  FB           1    1000000   1     2048  1  *

One 2,048-sector extent has been allocated to this file -- it will be enough for the first 2,048 records to be written into MYFILE. When we try to write the 2,049th record, another 2,048-sector extent will be allocated, and so on. As we mentioned before, this means that no more than 2,047 sectors in this file will ever be wasted (allocated but unused).

This is quite nice, but if each such file wastes an average of 1,000 sectors (an average between those that waste 0 and those that waste all 2,047) and you have 2,000 such files, we're talking about 500 megabytes of wasted space -- about the size of one disc drive. Looking at it this way, saying that "at most 2,047 sectors can be wasted" is small comfort.

It would have been nice if we could build the file indicating what we'd like its extent size to be; we might then build our files with huge file limits but tell MPE/XL that they are to be allocated only, say, 512 or 256 sectors at a time. Unfortunately, this is (to the best of my knowledge) not possible. MPE/XL determines the size of the extents that it will try to allocate for a file using a (rather bizarre) formula based on the maximum number of sectors the file can ever contain:

   MAXSECT = (userlabels*256 + recordsize*flimit) / (16*256) * 16
   (i.e. the total number of sectors in the file's data portion, rounded up
   to the next multiple of 16 sectors)

   DEFAULTEXTENTSIZE =
      IF MAXSECT <= 127   THEN MAXSECT rounded up to the next multiple of 16
      IF MAXSECT <= 255   THEN MAXSECT rounded down to the next multiple of 16
      IF MAXSECT <= 4095  THEN 128 sectors
      IF MAXSECT <= 65535 THEN (MAXSECT/32) rounded down to the next multiple of 16
      IF MAXSECT >= 65536 THEN 2048 sectors

Don't ask me why this is the case -- it just is. (It certainly makes some sense for extent size to vary as a function of the file size, but I'm not sure why it varies exactly in this unusual way.) Note, however, that one other great new feature of MPE/XL is that if it can't find extents of the size that it wants (e.g. there is no empty chunk of 2,048 sectors), it will just allocate smaller extents. MPE/XL will not report an "OUT OF DISC SPACE" condition unless there really isn't enough disc space left (beyond what is reserved for virtual (transient) memory). Disc fragmentation does not appear to harm things, except that it may make files be built with smaller extents.

So, this all shows (among other things) that any files with a maximum of 65,536 or more sectors will have space allocated for them in 2,048-sector chunks. If you have one of those files, how do you save the 1,000 sectors or so that will, on the average, be wasted? On MPE/V, you could always "squeeze" the file (FCLOSE it with disposition 8 or MPEX %ALTFILE;SQUEEZE), which would set the file's file limit to be equal to its end of file and thus save the wasted space. Unfortunately, it would also prevent any more records from being added to the file (unless you rebuild it or %ALTFILE;FLIMIT= it). MPE/XL has a very nice alternative to this that we call "trimming". It lets you tell MPE (using a new FCLOSE disposition) to deallocate any unused space allocated to the file without changing the file's file limit.
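The extent-size formula can be sketched in code; this is my own transcription of it (and "userlabels" is my reading of the user-label term in the formula), not anything MPE/XL itself runs:

```python
def default_extent_size(flimit, recordsize, userlabels=0):
    """Transcription of the extent-size formula: recordsize in bytes,
    result in 256-byte sectors."""
    total_bytes = userlabels * 256 + recordsize * flimit
    # total sectors of data, rounded up to the next multiple of 16
    maxsect = -(-total_bytes // (16 * 256)) * 16
    if maxsect <= 127:
        return -(-maxsect // 16) * 16        # round up to a multiple of 16
    if maxsect <= 255:
        return (maxsect // 16) * 16          # round down
    if maxsect <= 4095:
        return 128
    if maxsect <= 65535:
        return (maxsect // 32 // 16) * 16    # round down
    return 2048

# :BUILD MYFILE;DISC=1000000 (256-byte records): MAXSECT = 1,000,000
print(default_extent_size(1000000, 256))   # -> 2048
```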
In other words, this will save space without making files any less expandable; an operation such as MPEX's %ALTFILE @.@.@;XLTRIM can save you hundreds of thousands of sectors (it did for us, and we only have a 925LX with two disc drives). Actually, before doing this we called PICS and asked whether there were any files that should not be trimmed, and they told us that they didn't know of any; I then did the trim of all the files in the system and nothing seemed to fail. Be warned, though -- it's certainly possible that some program (probably a heavily privileged one, since normal programs have no way of knowing whether a file has been trimmed or not) doesn't like its files trimmed.

How does this work? Well, say that you have a file with five 2,048-sector extents, the last of which contains only 537 actual sectors of data. When you tell MPE to trim the file, it will deallocate the last 1,504 sectors of the last extent, leaving it with only 544 sectors. (544 = 537 rounded up to the next highest multiple of 16; files are always allocated in multiples of 16 sectors.) Now, the file has four 2,048-sector extents and a 544-sector extent. If you start adding more records to it, more 2,048-sector extents will be allocated to it; you may then again want to trim the file.

What makes this whole process work is that MPE/XL allows you to have extents of different sizes. If, as in MPE/V, all extents (except for the very last of the possible extents) had to be the same size, you wouldn't be able to throw away the unused space, because that would leave the last allocated extent with a different extent size. MPE/XL, however, is not bothered by this -- I've often seen files with many different sizes for many different extents.
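The trimming example above can be sketched as follows (my own illustration of the arithmetic, not the actual MPE/XL implementation):

```python
def trim_last_extent(extent_sizes, data_sectors_in_last):
    """XLTRIM arithmetic: shrink the last extent to its in-use sectors,
    rounded up to the 16-sector allocation granularity; return the new
    extent list and the number of sectors deallocated."""
    kept = -(-data_sectors_in_last // 16) * 16   # round up to multiple of 16
    freed = extent_sizes[-1] - kept
    return extent_sizes[:-1] + [kept], freed

# Five 2,048-sector extents, 537 data sectors in the last one:
extents, freed = trim_last_extent([2048] * 5, 537)
print(extents)   # -> [2048, 2048, 2048, 2048, 544]
print(freed)     # -> 1504 sectors deallocated
```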
The "extent descriptor blocks" that I mentioned earlier actually contain several pieces of information for each extent:

* the disc number on which the extent resides (actually, the volume table index);
* the starting sector address of the extent;
* the size of the extent, in sectors;
* and the sector number of the first sector of data (relative to the start of the file) that resides in this extent.

Actually, with a structure this flexible, it's even possible for records #0 to #999 to be located in the second extent of a file and records #1000 to #1999 to be located in the first extent! (Of course, when you read the file, the records will come out in the right order, but internally, inside the extent descriptor blocks, the extent information will be kept out of order.)

What are the disadvantages of trimming files? To the best of my knowledge, there are very few. Trimming files will increase disc space fragmentation, but it's not clear to me that this is a problem on MPE/XL, especially since MPE/XL seems to handle correctly situations where it can't find extents of the size that it wants (if necessary, it just allocates more smaller ones). Trimming files does cause the file to have more extents, which may have a slightly adverse effect on performance. The file system usually tries to read 64 or more sectors (16,384 bytes) at a time, so if a file has all 16-sector extents (the smallest size possible), you will lose the advantages of these large I/Os, since you'll never have 64 contiguous sectors. However, if a file has, say, 512-sector extents rather than 2,048-sector extents, this should cause minimal performance penalties (if any at all). On the other hand, if you feel that a trimmed file isn't giving you the performance you'd like, you can copy its data into a new copy of the file, and all the extents will then be the same size (whatever the extent-size algorithm we showed above dictates).
You can even build the new file with a higher file limit (to increase the extent size) and then trim the file to save as much space from the last extent as possible. MPEX users can do this by saying %ALTFILE filename;FLIMIT=4096000;XLTRIM -- this will rebuild the file to have all its extents be 2,048 sectors each except for the last one, which will only be as large as necessary. This will give you the maximum disc space savings as well as the maximum possible extent sizes (if that's what you want). (Note that files with EOF = FLIMIT do not require trimming; the MPE/XL file system automatically allocates the last extent to be just large enough to fit all the records up to the FLIMIT, even if this makes the extent smaller than the other extents.) In light of all this, why does MPE/XL still support the maximum number of extents parameter of the FOPEN intrinsic and the :BUILD command? Well, compatibility is one reason. There are programs out there that might, for instance, do an FGETINFO or FFILEINFO of a file's maximum number of extents -- they should be able to get the value specified when the file was :BUILDed (:BUILt?). In fact, MPE goes so far as to return 8 as the maximum number of extents when you do an FFILEINFO for a file that was built without a maximum number of extents -- all this just to make sure that the program will get a value (though an incorrect one) that it will be able to handle. Another reason involves moving files from MPE/V to MPE/XL and back. If you move a file from MPE/V to MPE/XL, it will have the same "maximum extents" value that it did on MPE/V (even if it will be ignored by MPE/XL). Then, if you move the file back to an MPE/V system, it will have the same maximum-extents value that it originally had. If the MPE/XL file had no maximum extents, MPE/V will select a "reasonable" value for this (based on the number of sectors the file uses). 
Finally, the maximum number of extents is used (though in a very strange way) when you build a file specifying both the maximum number of extents and the number of extents to initially allocate. For instance, say that you enter

   :BUILD MYFILE;DISC=100000,32,4

What do you suppose will happen? Here's what a :LISTF of the file will show:

   FILENAME  CODE  ------------LOGICAL RECORD----------  ------SPACE-----
                    SIZE  TYP        EOF      LIMIT R/B  SECTORS #X MX

   MYFILE           128W  FB           0     100000   1    12512  1 32

The file was built with 1 extent of 12,512 sectors! MPE/XL decided that what you wanted is a file with 4/32nds (= one eighth) of its space allocated, so it built you one like that, although with all that space allocated as one contiguous extent. From then on, the file will be allocated in normally-sized (in this case, 2,048-sector) chunks. Eventually, the file will need more than 32 extents (this file would, if full, need more than 40), and MPE/XL will just blithely ignore the maximum extents value and allocate as many as it needs. Thus, you'll often see files with more extents than the maximum (rather perplexing when you first see them). Incidentally, to answer the question we asked earlier: if a file has 100 or more extents, MPE/XL will show an "*" in the number of extents column of a :LISTF ,2 listing.

Finally, remember that IMAGE databases must still be fully allocated when they are built -- I believe that IMAGE does this not because of any file system limitation, but rather for data consistency's sake; it doesn't want to run out of disc space in the middle of a DBPUT. I have, however, heard a hot rumor that a future version of TurboIMAGE/XL will allow a detail dataset to be expanded when it runs out of space (so that you can initially allocate it with less space than it will eventually need); but (or so the rumor says), each dataset can only be expanded once in its life -- once it's expanded, it better not overflow again! Seems bizarre, but that's what I've heard. Believe it or not.
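The initial allocation can be reproduced with a small sketch; this is my reading of the behavior just described (one contiguous extent holding init/max of the file's maximum space, rounded up to a 16-sector multiple), not a documented MPE/XL algorithm:

```python
def initial_allocation(flimit, recordsize, max_extents, init_extents):
    """Sectors initially allocated by :BUILD file;DISC=flimit,max,init
    (my inference from the :LISTF example in the text)."""
    # maximum sectors of data, rounded up to a multiple of 16
    maxsect = -(-(recordsize * flimit) // (16 * 256)) * 16
    # init/max of that space, rounded up to a multiple of 16
    return -(-(maxsect * init_extents // max_extents) // 16) * 16

# :BUILD MYFILE;DISC=100000,32,4 with 256-byte records:
print(initial_allocation(100000, 256, 32, 4))   # -> 12512
```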
HOW FILES ARE STORED -- BLOCKING CONSIDERATIONS

In discussions of MPE/V, much was said about blocks, blocking factors, and their effects on speed and disc space. Just when we had all taken the time and trouble to learn all of their intricacies, MPE/XL has made them (almost) completely irrelevant.

In MPE/XL, all physical disc I/O is done in multiples of one page, which is 4,096 bytes. This is not to be confused with the pages that are 2,048 bytes -- yes, that's right, there are two kinds of pages, each of a different size, and both of which are called pages. One, which is 2,048 bytes long, is the unit in which the hardware sees the world; the other, which is 4,096 bytes long, is the unit used by the operating system from the memory manager on up (including the file system). In any event, physical I/O is done some number of 4,096-byte pages at a time (it's good that it's a 4,096-byte page, because you want to do I/Os in fairly large chunks). Since each 4,096-byte page consists of sixteen 256-byte sectors, a file is always allocated, read, and written in multiples of 16 sectors at a time.

Remember that the whole point of MPE/V blocks was that a block was the unit of file transfer for this particular file. This may have made sense in the very earliest HP3000s, which often had as few as 128 kilobytes of memory, and which couldn't afford huge chunks of this for file buffering. However, since on MPE/XL file transfer is always done 4,096 bytes at a time, the concept of a block becomes irrelevant. Each page has as many records in it as will fit; there are no inter-record gaps (as there used to be on MPE/V when the block size was not a multiple of one sector).
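Since records are packed into 4,096-byte pages with no gaps, the page and byte offset at which any fixed-length record starts is simple arithmetic; here is a sketch (my own illustration):

```python
PAGE = 4096  # bytes per file-system page

def record_start(recnum, recsize):
    """(page, byte offset) at which fixed-length record #recnum starts;
    records are packed with no gaps, so they may straddle page boundaries."""
    byte = recnum * recsize
    return byte // PAGE, byte % PAGE

# With 1,000-byte records, record #4 (the fifth record) starts at
# page 0, offset 4000 -- only 96 bytes fit before the page boundary.
print(record_start(4, 1000))   # -> (0, 4000)
print(record_start(8, 1000))   # -> (1, 3904)
```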
In fact, records can even straddle pages -- if your file's records are 1,000 bytes long, then

* the first 4,096-byte page will have 4 full records and the first 96 bytes of the fifth record;
* the second 4,096-byte page will have the last 904 (1,000-96) bytes of the fifth record, the next 3 full records, and the first 192 bytes of the ninth record;
* and so on.

There's never any space wasted in a page (except, of course, in the allocated-but-not-written portion of the last extent) -- not because of bad blocking factors, and not even because of records with odd record lengths. If you build an ASCII file with 1-byte records, exactly 4,096 of them will fit into each 4,096-byte page.

A curious thing, incidentally, is the lengths to which MPE/XL must go to make this efficient, reasonable, straightforward system compatible with MPE/V's baroque and inefficient mechanisms. If you read an odd-record-length file MR NOBUF, MPE/XL will actually insert padding bytes at the end of each record to be compatible with MPE/V; when you do an MPE/XL :STORE;TRANSPORT of a file whose blocks (in MPE/V) wouldn't be multiples of 256 bytes, MPE/XL will also insert padding at the end of each block to correspond to MPE/V's inefficient end-of-block padding.

The blocking factor of a file is, like the maximum number of extents of a file, specifiable but largely ignored. It's relevant only for compatibility, for transporting files to MPE/V machines, and for NOBUF file accesses, in which a program written for MPE/V would expect to get data in units of MPE/V blocks.

OTHER DISC SPACE CONSIDERATIONS

So, if MPE/XL can build large files without wasting space, and do file blocking more efficiently, and trim wasted space without changing file limits, one question remains: Why does it use so much disc space? There is one philosophical explanation (it's somebody or other's Law, but I forgot whose): "The amount of disc space required will increase until it meets and exceeds the amount of disc space available".
This is actually not just a facetious statement; as disc space use algorithms become more efficient and disc space becomes more plentiful, people will take advantage of this by building more and more files that are larger and larger. You'll get more bang for a buck's worth of disc space, but eventually you will exhaust it all the same. There are, however, a few more pragmatic explanations:

* The operating system uses much more disc space than on MPE/V. The groups MPEXL.SYS and PUB.SYS use 570,000 sectors (150 megabytes!) on our 925LX -- PUB.SYS on our MICRO/3000 uses only 100,000 sectors.

* Code -- programs and SLs -- uses a lot more space than it did before (this is actually a big part of the reason why the operating system uses more disc space). Why is this the case? Well, remember that all this "Reduced Instruction Set" business means that it takes several RISC instructions to do the job of one Classic instruction. Thus, a program of 10,000 16-bit Classic instructions might be replaced by one of 50,000 32-bit RISC instructions -- a ten-fold increase. This is true of Native Mode code and of OCTCOMPed code. Compatibility Mode code still takes the same amount of space as it did under MPE/V.

* Although trimming files is possible, to the best of my knowledge, few things in MPE do it routinely. Compatibility mode USLs, it seems, are pretty substantial culprits (using far more space than they would if trimmed), and other files should probably be periodically trimmed, too.

Thus, our recommendations for saving disc space would be:

* Purge old unused files. This #1 space-saving feature from MPE/V days is still as important as ever on MPE/XL (and will probably be for a long time to come). Discs inevitably get filled up with junk -- data that the owner no longer uses, no longer wants, and has probably already forgotten about; not only does it waste disc space, but it also makes your full backups take more time and more tapes.
If you periodically archive and then purge all the files (except, say, IMAGE datasets) that haven't been accessed in 120 days, you will save a lot of disc space with minimal user complaints.

* Trim files (e.g. using MPEX's %ALTFILE @.@.@;XLTRIM) periodically. As I mentioned before, trimming seems to be safe for all files in the system.

* Remember that native mode and OCTCOMPed program files are now big disc space hogs -- multiple unneeded copies of programs (which used to be rather harmless on MPE/V) may now substantially contribute to your disc space problems.

MAPPED FILES

Mapped files have been heralded (and correctly so) as a powerful and valuable new feature of MPE/XL. They have been discussed in a number of places, including chapter 11 of HP's "Accessing Files Programmer's Guide", and also, coincidentally, chapter 11 of SRN, Inc.'s excellent "Beyond RISC!" book. (I heartily recommend Beyond RISC! to anybody who's at all interested in Spectrums -- call SRN at 206-463-3030.) At the RISC of beating a dead horse, I'd like to go over some of the key points of mapped files in this paper, too.

First of all, a mapped file is actually not a type of file but rather a type of file access. Almost any file can be opened as a mapped file; once your program opens a file with the mapping option, it will be able to access the file as if it were an array in its own data area. Instead of accessing a file using FREAD and FWRITE (or the equivalent language constructs, such as PASCAL's READLN or WRITELN), you'll be able to access the data of the file just as you'd access any array (or record structure). MPE/XL will, behind your back, realize that this isn't a normal array but is rather a mapped file; whenever you access a piece of this array, MPE/XL will, if necessary, go out to disc to get the appropriate data. (This is actually true for your stack, data segments, etc., as well, but it's especially important for mapped files.)
(Not very important note: Actually, the file system opens all files with mapped access for its own internal purposes; however, when I talk about "mapped file access", I refer to file access that is mapped from the user's point of view.)

Let's look at what might be the perfect application for mapped files: keeping a large array of data that must survive from one execution of a program to another. Say that you have a large number of payroll codes (numbered, say, from 0 to 99), each of which has various attributes (such as code name, pay scale, tax identifier, etc.) that your program must know about. Your program has to look payroll codes up in this file and extract the relevant data. Without mapped files, here's what your program might look like (in PASCAL):

   TYPE PAYROLL_CODE_REC = CRUNCHED RECORD
           CODE_NAME: PACKED ARRAY [1..20] OF CHAR;
           PAY_SCALE: INTEGER;
           TAX_ID: INTEGER;
           ...
        END;
   VAR PC_REC: PAYROLL_CODE_REC;
   ...
   FNUM:=FOPEN (DATAFILE, 1 (* old file *));
   ...
   FREADDIR (FNUM, PC_REC, SIZEOF(PC_REC), PCODE);
   IF PC_REC.TAX_ID=... THEN ...

With mapped files, you'd say:

   TYPE PAYROLL_CODE_REC = CRUNCHED RECORD
           CODE_NAME: PACKED ARRAY [1..20] OF CHAR;
           PAY_SCALE: INTEGER;
           TAX_ID: INTEGER;
           ...
        END;
        PAYROLL_CODE_REC_ARRAY = ARRAY [0..99] OF PAYROLL_CODE_REC;
   VAR PC_FILE_PTR: ^PAYROLL_CODE_REC_ARRAY;
   ...
   DOMAIN:=1 (* old file *);
   HPFOPEN (FNUM, STATUS, 2, FILENAME, 3, DOMAIN, 18, PC_FILE_PTR);
   ...
   IF PC_FILE_PTR^[PCODE].TAX_ID=... THEN ...

Instead of doing an FREADDIR (or using PASCAL's READDIR statement), we directly access the file as if it were an array. The HPFOPEN call (more about its unusual calling sequence later) indicates that the file is to be opened mapped and that the pointer PC_FILE_PTR is set to point to its data; then, whenever we refer to PC_FILE_PTR^, we get access to the entire file as an array of records.

Why would we want to use mapped files?
One reason is convenience -- in a situation like this one, it's easier (and makes more sense in light of the logic of the program) to view the file as an array rather than as a file. Instead of having to do a READDIR every time we want to get a record, we just access the record directly.

Another reason is performance. Avoiding the extra READDIRs not only makes the program smaller and cleaner, but also saves the CPU time that would otherwise be taken by each READ, READDIR, WRITE, or WRITEDIR. Each file system intrinsic (and READ, READDIR, WRITE, WRITEDIR, and all the similar constructs in the other languages ultimately call the file system intrinsics) has to do a lot of work finding control blocks, checking file types, etc., even before a disc I/O is actually done. This can take many thousands of instructions, amounting to up to a millisecond per call (or more). Access to a mapped file can take as little as one instruction -- one memory access.

As we will discuss later, mapped file access actually has some performance penalties, too, especially when we're doing sequential accesses to files that are not likely to be already in memory. It is actually quite possible with mapped file access to lose much more on disc I/O increases than you would gain on CPU time savings. However, if you're accessing files that are already likely to be in memory -- which often includes many heavily-accessed files -- mapped I/O can give you very large performance gains (again, more about that later).

Beyond convenience and optimization, I think that there are many more very interesting things that mapped files can let us do -- things that have rarely been contemplated in the past, precisely because they were so difficult to do in the past. There is one idea that I have along these lines; I've never tried it in a production program, but I feel that it could very well prove quite useful. One of the things that mapped files can give us is shared variables.
By this I don't mean global variables that are shared among all the procedures in a program, but rather variables that are shared among multiple programs and processes. For example, let's say that you have a program that runs in a job stream. The program might run for a long time, and you may want to check on its progress -- see which phase of processing it's in, what was the last record it processed, and so on. With mapped files, you can do the following:

* Keep some crucial variables -- the current processing phase, the current record being processed, etc. -- in fields of a record structure. (This is a bit more complicated than having them all be separate variables, but not much.)

* Have the record structure be associated with a mapped file by HPFOPENing the file (with shared access) and using the pointer that HPFOPEN returns as a pointer to the record structure.

* Have another program that you can run online that will open the mapped file and print its contents for you.

Whenever the background program modifies one of the fields of this mapped-file-resident data structure, the field will be automatically updated in the file (and this will almost certainly require, on the average, far less than one disc I/O for each field modification). Then, the online program can at any time look at the contents of the file and tell you what's going on; and, if the batch program aborts, you'll be able to see where it was in its processing when it aborted (since the data is saved in the permanent mapped file).

This would also be an excellent tool if you'd like to write a debugger for some interpreter program that you have. As long as the interpreter keeps all its control variables in a mapped-file-resident area, then a debugger program (running in the same session or in a different one) can look at these variables and figure out exactly what the interpreter is doing.
It can even change the variables, for instance, setting some debugging flag, changing the value of a user variable, or whatever; and, if all the important data is actually kept in this file, it would permit dump analysis in case the program aborts, and even interruptions and restarts (since the entire state of the program would be automatically saved).

Another possible application is to have a program periodically (e.g. for every record that it processes) check a mapped-file-resident variable and terminate cleanly if it is set. Then, if we want to terminate all the processes running this program, we just set the variable, and all of them will stop. (Something like this could be done before with message files and soft interrupts, but it would require one record to be written to the message file for each process accessing it.)

Of course, this could all have been done before mapped files: instead of accessing the mapped-file-resident variables directly, we could just do FREADs or FWRITEs to read or write the appropriate record from the file. However, this would have been prohibitively expensive and clumsy -- imagine that you had to do an intrinsic call every time you wanted to access a particular variable; it would badly slow things down and make your program much more complicated. As I said, all of the above are relatively untested ideas, but I feel that much can be gained by doing something along those lines.

The really sad thing about mapped files -- something that I think is likely to drastically reduce their utility -- is that they can only be accessed from PASCAL/XL, C/XL, and SPLash!. FORTRAN/XL and COBOL/XL programs cannot access mapped files, not because of any file system limitation, but because those languages do not support pointers. In FORTRAN and COBOL, all variables are preallocated when you run the program or enter a procedure; to use mapped files you have to be able to assign a particular address to a variable.
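For readers on other systems: this shared-variable idea is the same one behind memory-mapped files elsewhere. Here is a minimal sketch using POSIX-style mmap in Python -- an analogy only (MPE/XL mapped files are not mmap), and the file name and record layout are invented for illustration:

```python
import mmap
import struct

STATUS_FILE = "jobstat.dat"   # hypothetical status file shared by two programs

# The "batch" side creates an 8-byte record: phase (4 bytes) + current record #
with open(STATUS_FILE, "wb") as f:
    f.write(bytes(8))

with open(STATUS_FILE, "r+b") as f:
    m = mmap.mmap(f.fileno(), 8)
    # Updating the shared variables is just a store into the mapping --
    # no explicit FWRITE-style call per update
    struct.pack_into("<ii", m, 0, 3, 41500)   # phase 3, record 41,500

    # The "online monitor" side (normally a separate process mapping the
    # same file) reads the variables directly out of the mapping
    phase, current_rec = struct.unpack_from("<ii", m, 0)
    print(phase, current_rec)   # -> 3 41500
    m.close()
```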
Actually, if you really wanted to use mapped files from FORTRAN or COBOL, you could write a PASCAL, C, or SPLash! procedure that lets you access pointers; however, this would most likely cancel any convenience advantages that mapped files can give you. A few other notes about mapped files -- they're all documented in various places, but they're worth repeating:

* There are two ways of opening a file for mapped access -- "long mapped" and "short mapped". Long-mapped access lets you access any size file but requires you to use long (64-bit) pointers; in PASCAL, they have to be declared as $EXTNADDR$. Short-mapped access only lets you access a file of at most 4 megabytes; furthermore, you may have no more than 6 megabytes worth of short-mapped files open for each process. On the other hand, short-mapped access lets you use 32-bit pointers, which are faster to operate with than the 64-bit ones.

* Because of the restriction on short-mapped file size and the fact that you can't open a file short-mapped if it's already opened by somebody else without mapping, your general-purpose programs (e.g. copiers, editors, etc.) should probably open files long-mapped rather than short-mapped -- it seems to be a more versatile, less restrictive access method.

* You cannot access RIO, CIRcular, or MSG files as mapped files; you can access variable-record-length and KSAM files, but you'll see their internal structure (i.e. what you'd see if you read a variable file NOBUF or a KSAM file with COPY access) rather than their normal appearance. This may not seem to be such a big problem, and it often isn't; however, I've found that one of the most useful features of the MPE file system is its ability to substitute one type of file for another -- for instance, give a message file to a program that expects input from a standard file, or a variable-record-length file to a program that expects input from a fixed-record-length file.
This interchangeability will be lost for files that you open with mapped access.

* Remember that writing to a mapped file only writes the data; it does not increment the EOF. Even if you write data that ends up in record 1000 of the file, if the EOF is 200 it will stay 200. You have to do an FPOINT to the right record and then an FCONTROL mode 6 (as documented in the Accessing Files manual and in the Beyond RISC! book) to set the EOF to the right place. An interesting aspect of this is that the data that you write beyond the EOF will actually be written there and will remain readable the next time you open the file for mapped access. However, it will not be readable when you open the file normally, and will almost certainly disappear if the file is copied, :STOREd/:RESTOREd, or %ALTFILE;XLTRIMed. Thus, if you write to a mapped file and forget to adjust the EOF, your programs might very well keep working just fine until the file is next :STOREd/:RESTOREd or %ALTFILE;XLTRIMed. You can get some truly bizarre bugs that way. Don't even dare think that this is a useful feature and try to exploit it (e.g. to have some place to put hidden data that will appear not to actually be there)! Imagine trying to maintain or manage a system on which seemingly empty files were actually chock-full of data.

* If you have processes share a mapped file, you may have to do appropriate locking to prevent problems (especially if you have more than one person writing to the file). For instance, you might write all your programs so that they FLOCK the file before making any updates to it; unfortunately, this will make your program more complicated -- after all, the whole point was to treat the file data just as if it were normal program variables. Furthermore, the very fact that it's so easy to modify one of these shared variables (just assign to it or pass it as an output parameter to a procedure) may make it easier for you to forget to put in an FLOCK in the right place.
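The FLOCK discipline in the last bullet has a direct modern counterpart in POSIX advisory locks. Here is a hedged Python sketch (the file layout and names are invented): every update locks before the read and unlocks only after the write, so concurrent updaters can't lose each other's work.

```python
import fcntl
import os
import struct
import tempfile

# A shared 32-bit counter kept in a file, as a stand-in for a
# mapped-file-resident shared variable.
path = os.path.join(tempfile.mkdtemp(), "counter.dat")
with open(path, "wb") as f:
    f.write(struct.pack("<i", 0))

def increment(path):
    """Read-modify-write under an exclusive advisory lock."""
    with open(path, "r+b") as f:
        fcntl.flock(f, fcntl.LOCK_EX)            # lock BEFORE the read...
        try:
            value, = struct.unpack("<i", f.read(4))
            f.seek(0)
            f.write(struct.pack("<i", value + 1))  # ...update in place...
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)        # ...unlock AFTER the write

for _ in range(5):
    increment(path)

with open(path, "rb") as f:
    final, = struct.unpack("<i", f.read(4))
print(final)
```

The point of the sketch is the ordering: locking only around the write (and not around the preparatory read) would leave the same lost-update window the bullet warns about.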
HOW THE FILE SYSTEM DOES I/O

One rule that we learned under MPE/V is: always do disc I/Os in as large chunks as possible. If a file has 256-byte records, don't read it from (or write it to) disc one record at a time; read it ten records at a time or, even better, thirty or sixty at a time. The reason for this was, of course, that as the transfer count (the number of words of data read or written) on a particular disc I/O increases, the time to do the disc I/O increases much more slowly. Thus, it might take you 30 milliseconds to read 256 bytes, but 100 milliseconds to read 8192 bytes; if you were planning to read those 8192 bytes anyway (and weren't just going to read one 256-byte record), you could read them almost ten times faster by reading them in one 8192-byte chunk than by reading them in 32 256-byte chunks. Furthermore, you'd incur the CPU overhead (which can be pretty substantial) of only one FREAD call rather than of 32 FREAD calls.

On MPE/V, the file system would always do disc I/Os in units of one block. The default blocking factor (the number of records per block) was usually not well-chosen by the operating system; for instance, any file whose record size was 65 words or more would, by default, have a blocking factor of 1. This might have made sense on the earliest HP3000s (on which memory was a very scarce resource), but not on 8-Megabyte Series 70s, which tended to end up being quite disc I/O-bound. Thus, on MPE/V the recommendation was to raise the blocking factors of MPE (and KSAM) files that you frequently access, especially serially; this could save you a large fraction of the file's I/Os (increasing a blocking factor from 1 to 10 could cut by 90% the number of I/Os needed to read the file).
When disc caching was introduced, this became somewhat less important, since the file system would pre-read from 16 to 96 sectors (4K bytes to 24K bytes) whenever you'd do a serial disc I/O; thus even a file with a low blocking factor could be read with relatively few disc I/Os. However, it still paid to have the blocking factor be high, since going to cache was still more expensive than getting the record from the file system buffer (though not as expensive as going to disc). Finally, beyond increasing the blocking factor, it was often a good idea to read or write the file NOBUF (so that each FREAD returned an entire block) or MR NOBUF (so that each FREAD returned several blocks). Reading a file NOBUF caused you to do the same number of disc I/Os (since the file system also read the file a block at a time); however, you would save the CPU overhead of all those FREADs (which could be quite a lot). Reading a file MR NOBUF was even better, since it let you do even fewer disc I/Os (though increasing the blocking factor to a high enough value and then using plain NOBUF or even normal access could accomplish the same purpose). The trouble with reading files NOBUF or MR NOBUF is that your program had to do its own deblocking, i.e. it had to, by itself, separate each record in the block from the next -- not a very difficult task, but not a trivially easy one, either.

To summarize (again, remember that this is on MPE/V), here are the ways you might read a file of 1024 256-byte records (depending on the file's blocking factor and access method):

   Blocking factor   Type of access          # of disc I/Os   # of FREAD calls
         1           Normal                       1024             1024
         4           Normal                        256             1024
        16           Normal                         64             1024
        16           NOBUF                          64               64
        16           MR NOBUF (reading              32               32
                     8192 bytes at a time)

OK, enough re-cap. What's new in MPE/XL? Well, the good news is that all file system disc I/O is now done in units of several 4,096-byte pages, often 8 pages (32,768 bytes), though the number seems to vary rather unpredictably.
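The table above is just arithmetic, and it can be handy to redo it for your own record sizes and blocking factors. A small Python sketch (modern shorthand, not MPE code) that reproduces the MPE/V numbers:

```python
import math

# For a file of 1024 records of 256 bytes each, count the disc I/Os and
# FREAD calls under each MPE/V access method.  The file system transfers
# one block (blocking factor * record size bytes) per disc I/O.
RECORDS, RECSIZE = 1024, 256

def normal(bf):
    """Buffered access: one I/O per block, one FREAD per record."""
    blocks = math.ceil(RECORDS / bf)
    return blocks, RECORDS

def nobuf(bf):
    """NOBUF: each FREAD returns one whole block."""
    blocks = math.ceil(RECORDS / bf)
    return blocks, blocks

def mr_nobuf(bf, xfer_bytes):
    """MR NOBUF: each FREAD returns several blocks at once."""
    blocks = math.ceil(RECORDS / bf)
    blocks_per_read = xfer_bytes // (bf * RECSIZE)
    calls = math.ceil(blocks / blocks_per_read)
    return calls, calls

print(normal(1), normal(4), normal(16), nobuf(16), mr_nobuf(16, 8192))
```

Each function returns a (disc I/Os, FREAD calls) pair; the five calls above correspond to the five rows of the table.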
That's a lot of data (probably close to the optimum from the disc drive's point of view, since at some point the beneficial effects of reading larger and larger chunks of data will peter out), and it will substantially decrease the amount of disc I/O that will be done. Of course, what makes this possible is all those megabytes of memory that you had to buy to make your Spectrum run; they allow HP to go straight after performance optimization without having to optimize memory usage as well (or so we hope). This means that blocking factors are now quite irrelevant to file performance (just as they are, as we mentioned before, irrelevant to disc space usage). You may set them high or set them low, but the file transfer unit will not change.

The bad news is that each FREAD and FWRITE call still takes a good deal of time -- about 0.25 milliseconds running stand-alone on my 925/LX. (This may not seem like much, but remember that not everybody gets to use a Spectrum as a personal computer! On heavily-loaded systems the FREADs and FWRITEs will take even longer to execute, and will adversely impact other users' response times.) What can we do about all this file system CPU overhead? Well, we could access the files NOBUF or MR NOBUF; the MR NOBUF would now be needed not so much to decrease disc I/Os (which at one I/O per 16,384 bytes can't really be decreased much further) as to decrease the number of file system calls. Alternatively, we could access these files as mapped files. Once we open the file, we could then access all the data in the file using simple memory accesses -- when a disc I/O is required, it'll be done for us by the memory manager, but no file system overhead will be required! The only problem -- and this is a really big one -- is that, together with the substantial CPU time savings that mapped files give us, they can also substantially increase the amount of disc I/O that is done.
While the file system accesses the disc several 4,096-byte pages at a time (my observations showed me that it usually accesses 8 pages, or 32,768 bytes, in one shot), the memory manager (and thus mapped files) accesses the disc only 4,096 bytes at a time. Thus, while we can totally eliminate file system CPU overhead by using mapped files, we could at the same time quadruple the amount of disc I/O that needs to be done! Now, as it happens, this disc I/O increase only becomes an issue if the file is not already in memory; to the extent that it is in memory (and many parts of your most heavily used files will be), the disc I/O size is irrelevant because no disc I/O will be needed. However, if a file is entirely or largely not in memory, you could suffer a very serious performance penalty by using mapped files.

To revisit our little table of the ways you can read a file of 1024 256-byte records, this time on MPE/XL (we'll assume that the file's blocking factor is 32 -- it's quite irrelevant except for NOBUF access):

   Type of access               Maximum # of disc I/Os   # of FREAD calls
   Normal                                  8                  1024
   NOBUF                                   8                    64
   MR NOBUF (reading 16384
     bytes at a time)                      8                    16
   Mapped                                 64                     0

(This assumes that the file system reads 8 4,096-byte pages at a time, something that experiments on MPE/XL version 1.2 seem to indicate.) Of course, the actual number of disc I/Os will vary depending on how much of the file is in memory. Stan Sieler of Allegro Consultants (co-author of Beyond RISC! and one of the foremost experts on MPE/XL and the RISC architecture) ran some experiments that showed that a mapped read of a file that was 100% non-memory-resident took more than 3 times the elapsed time (though only about half the CPU time) of a file-system read; a mapped read of a 100%-memory-resident file took less than 1/9th the elapsed and CPU time of a file-system read. The moral of the story:

* Blocking factors are no longer relevant to performance.

* NOBUF and MR NOBUF can still be a good idea.
* Mapped file access is much faster for memory-resident files, much slower for non-memory-resident files.

Finally (as Stan Sieler discusses in his paper and as we'll discuss more in the NM FILES VS. CM FILES chapter), KSAM access can be faster from Compatibility Mode than from Native Mode -- it's faster still from OCTCOMPed code. Oh, yes, one other thing: FOPEN calls are much faster on MPE/XL than they were on MPE/V (they typically take from 25 to about 100 milliseconds running stand-alone on our 925/LX, compared to about 300 to about 500 milliseconds on a stand-alone Micro/3000). This may not seem like much, but it can be very important for programs that open some files, do a few checks, and then terminate (e.g. logon UDC security programs). These programs can now take a lot less time than before.

CM FILES VS. NM FILES

Every so often, you'll hear people talk about CM (Compatibility Mode) files and NM (Native Mode) files. There are a few things that are worth saying about this distinction. The first thing that might come to mind is that a CM file is somehow accessible only from CM and an NM file only from NM. This is not so; both kinds of files are equally accessible from both modes (and, of course, from OCTCOMPed code, too); in fact, the access is completely transparent -- nothing behaves any differently (at least externally) from one mode to the other. The distinction between CM files and NM files is purely internal. CM files are those for which the internal file system code is implemented in CM. KSAM files, message files, circular files, and RIO files -- the code that handles these files has simply never been rewritten by HP in PASCAL/XL; whenever you access these files, MPE/XL will execute the CM code that pertains to these files, even if this requires switching from NM to CM.
NM files, of course, are those whose internal code is implemented in NM -- they include all the "vanilla" files, both fixed- and variable-record-length, as well as IMAGE databases. The main way in which this CM/NM difference manifests itself is in speed of file access. As we said, if you try to access a CM file from NM (or an NM file from CM), the system will have to switch into the other mode in order to execute the appropriate file system code. In addition to the switch from, say, your NM program to the CM file handling procedures, the CM file handling procedures will then have to switch to the NM internal file system procedures to do the actual file I/O. All these switches can take a non-trivial amount of time; for instance, it took a CM program more than 23 times longer to read a circular file than an otherwise identical non-circular file; however, even with this, the CM program ran 20% faster than an NM program reading the same circular file! A bizarre incident indeed -- NM code running slower than CM code.

This would not be that much of a problem if the only files that were slower in NM were message files, circular files, and RIO files -- after all, how much are these rather esoteric file types used in production? Unfortunately, the same thing applies to KSAM files, which can indeed often be quite performance-critical. My tests (and Stan Sieler's as well) showed that KSAM file accesses from NM were over 10% slower than from CM and about 20% slower than from OCTCOMPed code. This might very well mean that KSAM users ought not migrate their programs to Native Mode for now (presumably, HP will come out with an NM KSAM soon). It seems that converting to NM will slow your KSAM file accesses down by about 20% (compared to OCTCOMPed code -- if you're still in CM, you should probably be OCTCOMPing all your code); you'll have to balance this against whatever performance improvement you expect to get on your other, non-KSAM-file-access code.
HPFOPEN

A number of the new features of the MPE/XL file system (including mapped file access and a few others that we'll talk more about shortly) have been implemented in the new HPFOPEN intrinsic, a successor to the old, well-loved FOPEN. Why a new intrinsic? Because the old FOPEN intrinsic, with its limited number of parameters (13 of them), just didn't have enough room for all the data that needed to be passed. By the time MPE/XL came around,

* 14 of the 16 foptions bits and 13 of the 16 aoptions bits were used up;

* The "device" parameter was actually used to pass no less than 4 different values (the device, the environment file, the tape density, and the ;VTERM parameter);

* The "forms message" parameter was used to pass 3 different values (the forms message, the tape label, and the KSAM file characteristics).

The MPE/V designers squeezed every last bit (almost) out of the FOPEN intrinsic because it was designed in an inherently non-expandable way; there was no way HP could have fit in the new parameters required to support the new features of the MPE/XL file system. Much like the CREATEPROCESS intrinsic supplanted the CREATE intrinsic before it, HPFOPEN was designed to be a much more expandable (albeit, in some respects, harder to use) version of FOPEN. The general calling sequence of HPFOPEN is

   HPFOPEN (FNUM,       (* 32-bit integer by reference *)
            STATUS,     (* 32-bit integer by reference *)
            ITEMNUM1,   (* 32-bit integer by value *)
            ITEM1,      (* by reference *)
            ...
            ITEMNUMn,   (* 32-bit integer by value *)
            ITEMn);     (* by reference *)

HPFOPEN takes as input a list of item numbers and item values; it returns the file number and the error status. A typical call might be (naturally, we hope that you define constants for the item numbers -- 2, 3, 11, 12, 13, etc. -- and for the possible domain, access type, exclusive state, etc.
values):

   LOCK := 1;
   DOMAIN := 1  (* old *);
   ACCESS_TYPE := 4  (* input/output *);
   FILENAME := '/MYFILE.MYGROUP.MYACCT/';
   EXCLUSIVE := 3  (* shared *);
   HPFOPEN (FNUM, STATUS, 2, FILENAME, 3, DOMAIN,
            11, ACCESS_TYPE, 12, LOCK, 13, EXCLUSIVE);
   IF STATUS<>0 THEN
     PRINTFILEINFO (0);

The same call with the FOPEN intrinsic would be:

   FILENAME := 'MYFILE.MYGROUP.MYACCT ';
   FNUM := FOPEN (FILENAME, 1, OCTAL('344'));
   IF CCODE<>2 (* condition code equal *) THEN
     PRINTFILEINFO (0);

As you see, the HPFOPEN intrinsic call is actually rather more verbose than the FOPEN intrinsic call, and may be argued to be harder to write, especially since you actually have to declare the variables DOMAIN, LOCK, ACCESS_TYPE, and EXCLUSIVE (since they, like all HPFOPEN item values, must be passed by reference). On the other hand, it does keep you from having to remember what all the positional parameters are (quick -- which FOPEN parameter is the file code?). More importantly, HPFOPEN lets you do things that FOPEN won't:

* Open a file for mapped access (items number 18 and 21).

* Open a file given the entire right-hand side of a file equation (item number 52). This way you don't have to worry about all the other items or any magic numbers -- just say something like:

   FILEEQ := '%MYFILE.MYGROUP.MYACCT,OLD;ACC=INOUT;SHR;LOCK%';
   HPFOPEN (FNUM, STATUS, 52, FILEEQ);

This can be a lot cleaner than the normal HPFOPEN approach, especially if all the parameters are constant (rather than having one be a variable, in which case you'd have to assemble the FILEEQ string using a STRWRITE or, in C, an sprintf). Unfortunately, not all file parameters are supported with this syntax -- exceptions include mapped files, user labels, disallowing file equations, and several others. Note that the value of FILEEQ started and ended with a "%"; it actually didn't matter what character it started and ended with, as long as it was the same character.
Rather than rely on terminators such as blank, semicolon, or whatever, HPFOPEN lets you specify your own string terminator as the first character in the string. Almost all the HPFOPEN items that are strings (including the filename parameter itself) must be passed this way.

* Open a file as a new file and immediately save it as a permanent file (item number 3, value 4); this avoids the MPE/V headache of having an FOPEN succeed and then, at the very end of the program, having the FCLOSE fail because a file with this name already exists.

* Specify, when a file is opened, what disposition it is to be closed with (item number 50). In other words, if you open a file that you know should be purged when you're done with it, you can indicate this on the HPFOPEN call; then, even if the program aborts before FCLOSEing the file, the file will get deleted.

* And a few other, less important things.

In the future, though, new features that are added to the file system will be added through the HPFOPEN intrinsic, not through the already overloaded FOPEN, so this list will probably grow with time. One other important point: how are you to interpret the STATUS variable that HPFOPEN returns to you? The manual tells you that the low-order 16 bits are always 143 (the code indicating that this is a File System error) and the high-order 16 bits are values from a 16-page table in the Intrinsics Manual. Naturally, rather than referring the user of your program to this manual, you really ought to format the message yourself, using the HPERRMSG intrinsic:

   HPERRMSG (2, 0, 0, STATUS);

Easy enough to do, but I'll bet you that half the programs you run won't do this. If they don't, and you have MPEX version 2.2.11 or later, you can just say %CALC WRITEMPEXLSTATUS (statusvalue) and get the text of the error message. Of course, both the HPERRMSG call and the %CALC WRITEMPEXLSTATUS will work for all "MPE/XL-standard" 32-bit status results.
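The bit layout described here is easy to take apart yourself if you're logging raw status values rather than calling HPERRMSG. A hedged Python sketch of the split (the error number 52 below is a made-up example, not taken from the Intrinsics Manual table):

```python
# An MPE/XL-standard status word, per the text: the low-order 16 bits
# identify the subsystem (143 = file system); the high-order 16 bits
# carry the error number from the Intrinsics Manual table.
FILE_SYSTEM_SUBSYS = 143

def decode_status(status):
    """Split a 32-bit status word into (error number, subsystem)."""
    subsys = status & 0xFFFF          # low-order 16 bits
    info = (status >> 16) & 0xFFFF    # high-order 16 bits
    return info, subsys

# A hypothetical file system error number 52, packed the way HPFOPEN
# would return it:
status = (52 << 16) | FILE_SYSTEM_SUBSYS
print(decode_status(status))
```

A zero STATUS decodes to (0, 0), which is why the "IF STATUS<>0" test in the earlier example is all a program needs to detect success.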
Finally, a few other interesting points:

* Whenever you create a file using HPFOPEN and do not specify a file limit, it will be built with a file limit of 8,388,607 records (not the measly little 1,023 that is the default on MPE/XL). This may be a good idea in theory, but in practice it means that you must close your files with disposition 16 (the "trimming" disposition), since otherwise your file will be allocated in chunks of 2,048 sectors each, so you could easily have some 2048-sector files with one or two records.

* It has been said that HPFOPEN is not callable from CM programs. HPFOPEN is indeed an NM procedure, so it can't be called from CM programs as simply as, say, FOPEN can be; however, using HPSWTONMNAME (or HPLOADNMPROC and HPSWTONMPLABEL), one can relatively easily switch directly to HPFOPEN, passing to it whatever parameters you please. You don't have to write any native mode code to do this, nor do you have to put anything into any SLs or NLs -- it's a bit trickier than a direct call, but not by all that much. It is, however, true that there is no (documented) way of directly manipulating virtual pointers in CM, so mapped file access from CM is pretty much out.

ACCESSING FILES

While internal file structure and disc space considerations have changed dramatically with MPE/XL, the rules for accessing and sharing files have not (except for the addition of mapped files and the decrease in importance of NOBUF and MR NOBUF access). There's no reason to go into them in much detail now; I'll just go through a few of the key items that are worth repeating:

* If you want to have multiple writers appending to a (non-KSAM) file, use ;SHR;GMULTI;ACC=APPEND access. If you do this, you will not need to lock the file.

* If you want to have multiple writers doing any sort of writing other than appending, be sure that you lock the file -- not just before the write, but before any read done in preparation for the write.
Thus, if a process needs to read a record, calculate a new value for a field in the record, and then write the record back, it must lock before the read and unlock after the write; otherwise, it risks the record being modified by somebody else between the read and the write, and then having this other person's modifications wiped out.

* Attempts to lock a file (or a database) when you already have a file or database locked will normally fail with an FSERR 64. If you :PREP your program with ;CAP=MR, the attempt will succeed, but you stand the risk of causing a deadlock (which will still require a system reboot to resolve).

* If you must use ;CAP=MR, make sure that all your multiple locks are acquired in the same order -- if one program locks file A and then file B, all programs must lock those files in that order; otherwise, if any program locks file B and then file A, a deadlock becomes quite possible.

PROTECTING YOUR FILES AGAINST SYSTEM FAILURES

MPE/XL relies very heavily (even more so than MPE/V) on disc caching -- keeping as much disc data in memory as possible to speed up access to it. Unlike MPE/V, which, by default, only used this cache for reads and always did the writes to disc, MPE/XL caches writes, too; if you write a record to a file, that record might not get written to disc for an indefinite amount of time. This has some substantial performance advantages (since a lot of disc I/O is avoided this way), but obviously puts your files very much at risk when the system crashes. KSAM files and IMAGE files seem to be protected by MPE/XL against loss of data at system failure time; unfortunately, plain MPE files can very easily lose a lot of recently-written data when the system crashes. One of these forms of data loss could happen (and often did) on MPE/V -- when you're appending to an MPE file, the EOF pointer does not get updated on disc until the file is closed or a new extent is allocated.
Thus, the data that you append to an MPE file can get completely destroyed by a system failure because the EOF pointer did not get properly set. The solution to this problem, just as on MPE/V, is to do FCONTROLs mode 6, which post the EOF pointer to disc, as often as possible when you're appending to an MPE file. You might, for instance, do an FCONTROL mode 6 after every write, which will give you almost complete safety but also slow things down substantially; or, you could keep a counter, and do FCONTROLs mode 6 every, say, five or ten records, thus minimizing your overhead while still protecting most of your data.

Unfortunately, on MPE/XL, there's more to it than this. Any data that you write to a plain MPE file -- even if you're not appending to it -- might get lost in a system failure, because it may not get posted to disc until some time after you do the writes. On MPE/V, this possibility was limited to the data that was in your memory buffers (usually no more than about 2 blocks worth of data); on MPE/XL, any data written since you opened the file could conceivably be lost. For example, I ran a test with a file of 1000 records, each 256 bytes wide; I overwrote all 1000 records, kept the file open, and re-booted the system. When the system came back up, only the first 768 of the new records were actually in the file; the remaining 232 records were still the old records from the time before I did my writes. (Note that 768 = 3*256; I'm not sure if there's any significance to this, but I suspect that there is.) What can you do about this? Well, the simplest solution seems to be to call the FSETMODE intrinsic with the second parameter set to 2. This means (according to the manual) "force your program to wait until the physical write operation is completed (the record is posted)", and this is what it seems to do. Of course, this causes each logical write to generate a physical I/O -- a great deal of overhead -- but it protects your data.
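For what it's worth, the same trade-off exists on POSIX systems, where fsync plays the role of forcing a posted write. A hedged Python sketch of the counter compromise described above (flush every N records; the file name and sizes are invented), just to make the cost model concrete:

```python
import os
import tempfile

# Write 25 records of 256 bytes, forcing the data to disc only once
# every FLUSH_EVERY records -- the "keep a counter" compromise between
# per-write safety and per-write overhead.
path = os.path.join(tempfile.mkdtemp(), "log.dat")
FLUSH_EVERY = 10

f = open(path, "wb")
written = 0
for rec in range(25):
    f.write(b"X" * 256)                # one 256-byte record
    written += 1
    if written % FLUSH_EVERY == 0:
        f.flush()
        os.fsync(f.fileno())           # force the buffered data to disc
f.flush()
os.fsync(f.fileno())                   # final flush before closing
f.close()
print(os.path.getsize(path))
```

Raising FLUSH_EVERY cuts the physical I/O overhead but widens the window of data a crash can take with it, exactly as with FCONTROLs every few records.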
Alternatively, you can call FCONTROL mode 2 or FCONTROL mode 6 after each write or once every several writes (FCONTROL mode 2 is faster and may work well in cases where you're not appending and thus need not post the EOF); this is more work for you as a programmer than just calling FSETMODE, but it may be more efficient, because you can do the FCONTROLs once every several records, thus decreasing the overhead of the extra disc I/O (but increasing the amount of data you may lose in case of a system failure).

A FEW WORDS ABOUT PERFORMANCE TESTS

The performance guidelines I've talked about (such as "FREADs of files that aren't in memory are faster than mapped file accesses" or "FCONTROLs mode 2 are faster than FCONTROLs mode 6") are strictly based on experience (my own or Stan Sieler's -- see his "MPE XL and Performance: Not Incompatible" paper). This experience may be inapplicable to your particular application, inapplicable to your version of the operating system, or perhaps just plain mistaken; I strongly encourage you to run your own performance tests to figure out how fast various file access methods work for you. Unfortunately, file system performance measurement on MPE/XL is substantially more difficult than on MPE/V because of MPE/XL's immense caching capabilities. It is almost guaranteed that, if you run a test twice in a row, you will get completely different results -- the first time, your data was quite likely out on disc, but the second time it had just been read into memory and was therefore quite probably still in memory. Unlike on MPE/V, there is no :STOPCACHE command that you can use to make sure that this doesn't happen. There are two key things you can do to detect possible bias due to a file's presence in memory and to avoid such bias:

* To find out how much of a file is in memory, do the following:

  - Go into DEBUG.

  - Enter "MAP filename" to open the file as mapped; this will output a line such as:

       1  MYFILE.MYGROUP.MYACCT  1234.0  Bytes = ...
  - The 1234.0 in the above line is the virtual memory address of the file -- type

       =VAINFO(1234.0, "PAGES_IN_MEM")*#16,#

    The value output will be the number of sectors of the file that are currently in memory (the #16 is there because there are 16 sectors per 4,096-byte page).

  - Close the file by saying "UNMAP n", where n is the first number output by the MAP command (in this example, 1).

* Getting the file out of memory is a tougher proposition. My experience has been that the only way of doing this is to cause enough memory pressure to get the file's pages to be discarded (after they have, of course, been flushed to disc). One way of doing this is to read a very large file into memory; SL.PUB.SYS (22 megabytes on my system), NL.PUB.SYS (15 megabytes), and XL.PUB.SYS (6 megabytes) are good candidates. Just say:

   FILE S=SL.PUB.SYS;LOCK
   COPY *S,TESTFILE;YES
   PURGE TESTFILE

This will read all of SL.PUB.SYS into memory, which on my 925/LX is enough to flush any other files I may already have in memory. All you rich people out there with 128 megabytes of unused memory may need more than just this file, but you can always tell if the flushing succeeded by using DEBUG's VAINFO function discussed above -- if it tells you that your file has only 0 pages in memory, you know that you've flushed it out. Given these precautions, you should be able to do your own performance tests (on an otherwise idle system, of course). Beware, though -- at least one key test I know of yielded completely different results on MPE/XL 1.1 and 1.2; much of the file system's performance characteristics seem to be quite MPE-version-dependent.

ODDS AND ENDS

Finally, a few miscellaneous features which couldn't fit in anywhere else:

* DEBUG/XL's MAP command makes the debugger a powerful data file editor, even more convenient than the old DISKED on MPE/V. (Anything would be more convenient than DISKED.
Do you remember how, when you asked for it to display octal and ASCII on the same line, it would display 8 words of the data in octal and then the first 8 bytes of the data in ASCII, completely ignoring the last 8 bytes?) In DEBUG/XL, you can say

   MAP filename WRITEACCESS

DEBUG will output for you the "file index number" (used to close the file with the UNMAP command), the filename, the file's virtual address, and the file size, e.g.

   1  MYFILE.MYGROUP.MYACCT  1237.0  Bytes = 7560

(For this example, we're assuming that you're in CM debug, so the numbers are output in octal; in NM debug, the output and default input would be in hex.) You can then display and modify data with addresses 1237.0 through 1237.7557 -- all the bytes in the file; thus, you could say DV 1237.200,10 to display the 10 (octal) 32-bit words starting with byte 200 (octal) of the file -- the MV command will let you modify data. Note that DV and MV expect byte addresses, not record numbers and offsets within records; you have to do the calculation yourself (for instance, if the file's records are #256 [decimal] bytes long, record #10 occupies bytes #2560 through #2815). The MAP command also provides you with one of the few ways to easily see (and edit) a file's user labels (:FCOPY, for instance, doesn't let you display their contents, and neither does the :PRINT command). When MAP gives you an address whose second word is not 0 (1237.1400, for instance, rather than 1237.0), this means that the file has user labels (in this case, %1400/%400 = 3 labels). User label 0 starts at 1237.0, user label 1 starts at 1237.400, user label 2 starts at 1237.1000, and data record 0 starts at 1237.1400. (User labels are always %400 = #256 bytes long.) Thus, you can use DV and MV to modify the file's user labels; also, remember that data record 0 now starts at a byte address other than 0 (in the example, %1400) -- keep this in mind when calculating the byte address of a particular byte in a particular record.
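Since DV and MV want raw byte addresses, it's worth writing the arithmetic down once: user labels come first (each %400 = #256 bytes), then the fixed-length records. A small Python sketch of the calculation, in decimal (the 3-label, 80-byte-record values echo the example in the text):

```python
# Byte address within a MAPped file, per the layout described above:
#   labels * 256  +  record * recsize  +  offset-within-record
def map_byte_address(user_labels, recsize, record, offset):
    """Decimal byte offset of a given byte of a given data record."""
    return user_labels * 256 + record * recsize + offset

# e.g. a file with 3 user labels and 80-byte records:
# byte 6 of record 4.
print(map_byte_address(3, 80, 4, 6))
```

Remember that CM debug wants the result back in octal, so convert before typing it into DV or MV.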
If your file's records are #80 (= %120) bytes long, then, say, byte 6 of record 4 will be at location %2106 (= %1400 + 4*%120 + 6, which is #1094).

* If you do a :DISCFREE A (which shows you how many free space chunks of each size there are on your disc), beware! You'll often see several large free chunks on LDEV 1 even though you're running out of disc space (or at least of contiguous disc space). On MPE/XL (unlike MPE/V), transient space (analogous to MPE/V's virtual memory) is treated as free disc space; however, at least 17% of the system disc (or more, if you configure it that way) is reserved for transient space. Thus, you could have a huge chunk of free space on your system disc and still have it completely unusable for new disc files, because it's reserved for transient space. :DISCFREE B tells you how much space is reserved for transient space, so its output shouldn't be too confusing; :DISCFREE A's output, however, can be quite misleading if you don't keep the transient space issue in mind. This should probably not be overwhelmingly important, since contiguous space is less important on MPE/XL than on MPE/V, and you should therefore run :DISCFREE B more often than :DISCFREE A; still, I got bit by this myself when doing research for this paper, so I decided to mention it.

CONCLUSION

The MPE/XL file system is different from the MPE/V file system in many respects, but is also similar to it in many respects. This paper was largely dedicated to the differences (since they're more interesting), but there are very many similarities as well, largely dictated by the requirement of complete (well, almost complete) compatibility -- a requirement that HP rigidly enforced on itself and, I must say, very much lived up to.
Many of the old and unpleasant limitations of the MPE/V file system have been lifted; a few remain in place (such as the 3-level directory structure and a few other, relatively minor, problems); a few new ones have probably been added, but the user community hasn't discovered them yet, and probably won't for some time. (Who would have thought, in 1972, that people would be running into the 2,097,120-sector limit on file size?) Performance and disc space are still potential problems, and will be for a long time to come, as long as CPU power and disc storage cost money.

I would like to thank Jason Goertz, Bob Green, Randy Medd, David Merit, Ross Scroggs, and, especially, Steve Cooper and Stan Sieler for reviewing this paper and for their many excellent comments and suggestions. I would also like to refer the interested reader to Stan Sieler's "MPE XL and Performance: Not Incompatible" paper, published in the SCRUG '89 Proceedings, and, of course, to the Beyond RISC! book from Software Research Northwest (by S. Cooper, J. Goertz, S. Levine, J. Mosher, S. Sieler, and J. Van Damme, edited by W. Holt).