Dynamic Dataset Expansion

Fred White

Adager Corporation

Sun Valley, Idaho 83353-2358 · USA

adager.com

Introduction

A TurboIMAGE (or, for short, IMAGE) dataset is enabled for DDX by DBSCHEMA at root file creation time if the capacity specs for the dataset include a maximum capacity, an initial capacity and an increment. For convenience, I will use DDX (dynamic dataset expansion) for both details and masters. Other people may use DDX for details and MDX for masters.

When the database is created by DBUTIL, the EOF (end-of-file) of a DDX dataset and the disk space allocated to it are determined by the initial capacity. The FLIMIT (file limit) is determined by the maximum capacity. The increment is unknown to the File System but is retained in the root file along with the initial, current and maximum capacities.

You may use Adager to enable a dataset for DDX (i.e., to change the dataset's FLIMIT and to modify the increment and the initial, current and maximum capacity fields in the root file in keeping with your specs).

The term dynamic refers to the fact that, whenever IMAGE determines that a DDX dataset requires additional capacity, IMAGE can perform an online expansion of that dataset (bumping the EOF to the new, higher value and updating the database's root file and the dataset's user label to reflect this increase in current capacity).

DDX for details

When a DBPUT occurs on a DDX detail which is "full" (relative to its current capacity), DBPUT responds in one of two ways. If the current capacity has already reached the maximum capacity, DBPUT returns a dataset full error code. Otherwise, DBPUT requests allocation of additional disk space sufficient to accommodate the specified increment of capacity.

If the Disk Space Manager is able to fulfill the request, the newly acquired space is zeroed out, the EOF is increased in keeping with the increased size of the space allocated to the dataset, DBPUT updates the root file and the dataset's user label to match the new current capacity and then utilizes the newly acquired space to complete the DBPUT. Otherwise, DBPUT returns a dataset full error code.
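The decision path described above can be sketched as a toy model in Python. This is only an illustration, not IMAGE's code: the names DetailDataset, DatasetFull and dbput are hypothetical, disk space is counted in entries rather than sectors, and clipping the last increment to the maximum capacity is an assumption.

```python
from dataclasses import dataclass


class DatasetFull(Exception):
    """Stands in for the dataset full error code returned by DBPUT."""


@dataclass
class DetailDataset:
    # Hypothetical model of the capacity fields kept in the root file.
    initial: int      # initial capacity, in entries
    current: int      # current capacity, in entries
    maximum: int      # maximum capacity, in entries
    increment: int    # entries added per expansion
    entries: int = 0  # entries currently stored


def dbput(ds, free_disk):
    """Add one entry; return the free disk space (in entries) afterwards."""
    if ds.entries >= ds.current:            # "full" relative to current capacity
        if ds.current >= ds.maximum:
            raise DatasetFull("current capacity has reached the maximum")
        # Assumption: the last increment is clipped to the maximum capacity.
        grow = min(ds.increment, ds.maximum - ds.current)
        if grow > free_disk:                # Disk Space Manager cannot comply
            raise DatasetFull("disk space not available")
        free_disk -= grow
        ds.current += grow                  # EOF, root file and user label follow
    ds.entries += 1
    return free_disk
```

With an initial capacity of 2, a maximum of 5 and an increment of 2, the third DBPUT triggers an expansion to 4 entries, the fifth expands to 5, and the sixth fails with the dataset full condition.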

Benefits of DDX for details

You may create your DETAIL datasets with smaller initial capacities with little fear of encountering a dataset full condition. This minimizes your initial disk space requirements while still permitting your detail datasets to grow (or not) on an as-needed basis.

Over time, some details may expand significantly and others little if at all. Because each detail occupies only the space it has actually grown into, disk space usage and backup times are kept to a minimum.

You may never need to shut down your application simply to increase the capacity of a detail dataset.

Adding entries to masters

Before discussing master datasets which have been enabled for DDX, let's understand how IMAGE goes about adding an entry to an ordinary master (i.e., a non-DDX master):

It uses the master's capacity and the new entry's key value to calculate the primary address of the new entry. It accesses the IMAGE block containing the primary address.

If the primary address location is empty, the new entry is placed there.

If the primary location is occupied by a secondary (of some other synonym chain), a search is made (see below) for an empty location and the secondary is moved (thus becoming a migrating secondary) to that location and the new entry is placed in its primary location.

If the primary location is occupied by a primary entry, IMAGE verifies that the new entry's key value doesn't match that of the primary entry and then traverses the synonym chain (if any) verifying that the new entry's key value doesn't match the key value of any of the synonyms. A search is then made (see below) for an empty location and the new entry is placed there and attached to the end of its synonym chain.

Searching for an empty location

IMAGE first checks the current block. If an empty location is not found in the current block, IMAGE cyclically searches successive blocks until an empty location is found.
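The three cases and the cyclical search can be sketched in Python as follows. This is a simplified model under stated assumptions, not IMAGE's implementation: the Master and Entry classes are hypothetical, the primary address is a plain modulo stand-in for the real calculation, locations are numbered from zero, and the "current block" is taken to be the block holding the primary address.

```python
from dataclasses import dataclass
from typing import Optional


class DuplicateKey(Exception):
    pass


class MasterFull(Exception):
    pass


@dataclass
class Entry:
    key: int
    next: Optional[int] = None  # location of the next synonym, if any


class Master:
    def __init__(self, capacity, blocking=4):
        self.capacity = capacity
        self.blocking = blocking            # locations per IMAGE block
        self.slots = [None] * capacity

    def primary_address(self, key):
        return key % self.capacity          # stand-in for the real calculation

    def find_empty(self, loc):
        """Check the block holding loc, then successive blocks cyclically."""
        nblocks = -(-self.capacity // self.blocking)
        block = loc // self.blocking
        for _ in range(nblocks):
            first = block * self.blocking
            for i in range(first, min(first + self.blocking, self.capacity)):
                if self.slots[i] is None:
                    return i
            block = (block + 1) % nblocks
        raise MasterFull("no empty location")

    def put(self, key):
        home = self.primary_address(key)
        occupant = self.slots[home]
        if occupant is None:                # case 1: primary location is empty
            self.slots[home] = Entry(key)
            return home
        occupant_home = self.primary_address(occupant.key)
        if occupant_home != home:           # case 2: migrate the secondary
            free = self.find_empty(home)
            prev = occupant_home            # find its predecessor in the chain
            while self.slots[prev].next != home:
                prev = self.slots[prev].next
            self.slots[prev].next = free
            self.slots[free] = occupant
            self.slots[home] = Entry(key)
            return home
        tail = home                         # case 3: walk the synonym chain
        while True:
            if self.slots[tail].key == key:
                raise DuplicateKey(key)
            if self.slots[tail].next is None:
                break
            tail = self.slots[tail].next
        free = self.find_empty(home)
        self.slots[free] = Entry(key)
        self.slots[tail].next = free        # attach to the end of the chain
        return free
```

For example, with a capacity of 8, adding keys 1 and 9 makes 9 a secondary stored at location 0; adding key 8 (whose primary address is 0) then forces 9 to migrate to location 2 so that 8 can occupy its primary location.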

The disk accesses and wall time needed for this search are usually quite small. The search significantly degrades performance only when the dataset is large and either (a) through the misuse of non-hashing keys (or the use of hashing key values whose primary addresses are not evenly distributed) one or more long clusters of occupied locations exist, even when the master is not nearly full, or (b) despite a good distribution of entries, the dataset becomes so full (>95%?) that sheer overcrowding produces long clusters which some searches are compelled to span.

Minimizing the performance degradation of long searches

If non-hashing keys have been misused, in some cases changing the capacity can result in all entries becoming primaries so that searching is never needed and performance of DBPUT, DBFIND and directed DBGET is optimal. If your case doesn't lend itself to that solution, your other option is to convert the keys to hashing keys with Adager and recompile your application software.
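A small worked example shows how a capacity change alone can turn every entry into a primary. It is a sketch under assumptions: the helper count_secondaries is hypothetical, the key values are made up, and key modulo capacity is used as a stand-in for IMAGE's actual primary address calculation for integer keys.

```python
def count_secondaries(keys, capacity):
    """Number of entries that cannot sit at their primary address.

    Assumes the keys are distinct and that the primary address of an
    integer key is simply key modulo capacity (an illustrative stand-in).
    """
    primary_addresses = {key % capacity for key in keys}
    return len(keys) - len(primary_addresses)


# Fifty made-up integer keys spaced ten apart: 1000, 1010, ..., 1490.
keys = [1000 + 10 * i for i in range(50)]

print(count_secondaries(keys, 100))  # 40: only ten distinct primary addresses
print(count_secondaries(keys, 101))  # 0: every entry is a primary
```

With a capacity of 100 the keys collapse onto ten addresses, so forty of the fifty entries are secondaries. With a capacity of 101, which shares no factor with the key spacing, each key gets its own primary address.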

If your search performance is bad with hashing keys, your only option for regaining performance is to change capacity (increasing it if the master is nearly full).

Both of these "solutions" are expensive.

The storage areas of DDX masters

The storage areas for DDX masters consist of a primary area and a secondary-only area (initially non-existent).

At dataset creation time, only the disk space for the primary area is allocated and its size is determined by the value of the initial capacity.

Disk space is allocated for the secondary-only area only when the master is dynamically expanded.

Additional expansions enlarge the secondary-only area but the size of the primary area never changes.

Adding entries to DDX masters

IMAGE uses the initial capacity and the new entry's key value to calculate the primary address of the new entry.

It then accesses the IMAGE block containing the primary address. If the primary address location is empty, the new entry is placed there. If the primary location is occupied by a secondary (of some other synonym chain), a search is made (see below) for an empty location and the secondary is moved (thus becoming a migrating secondary) to that empty location and the new entry is placed in its primary location.

If the primary location is occupied by a primary entry, IMAGE verifies that the new entry's key value doesn't match that of the primary entry and then traverses the synonym chain (if any) verifying that the new key value doesn't match the key value of any of the synonyms. A search is made (see below) for an empty location and the new entry is placed there and attached to the end of its synonym chain.

Searching for an empty location

There are various possibilities, depending on the location of the current block.

If the current block is within the PRIMARY area

IMAGE checks the current block. If an empty location is not found in the current block, IMAGE cyclically searches successive blocks until an empty location is found or the cyclical searching becomes excessive (see below), in which case IMAGE obtains an empty location from the secondary-only area (see below).

If the secondary-only area is full or doesn't exist, IMAGE performs a dynamic expansion (as described in the introduction) before obtaining the empty location.

IMAGE employs two thresholds to determine whether the search time is excessive. One is a count of the maximum number of IMAGE blocks to be examined. The other is a percent of the initial capacity.

At present, both thresholds are constants but they will probably become configurable.

A search time is considered to be excessive if either of these thresholds is reached without an empty location being found.
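The two-threshold rule can be sketched in Python. This is an assumption-laden illustration: the function name, the default threshold values, the zero-based locations, and the reading of the percent threshold as "number of locations examined, as a percent of the initial capacity" are all hypothetical, since the paper does not give the actual constants.

```python
def search_primary_area(slots, blocking, start_loc,
                        max_blocks=10, max_percent=1.0):
    """Return an empty location in the primary area, or None.

    None means the search became excessive (or the primary area is full),
    so the caller should take a location from the secondary-only area.
    slots models the primary area; its length is the initial capacity.
    """
    capacity = len(slots)
    nblocks = -(-capacity // blocking)
    # Hypothetical reading of the percent threshold: locations examined.
    max_locations = max(1, int(capacity * max_percent / 100))
    examined_locations = 0
    block = start_loc // blocking
    for examined_blocks in range(1, nblocks + 1):
        first = block * blocking
        for loc in range(first, min(first + blocking, capacity)):
            if slots[loc] is None:
                return loc
            examined_locations += 1
        # Either threshold being reached ends the search.
        if examined_blocks >= max_blocks or examined_locations >= max_locations:
            return None
        block = (block + 1) % nblocks
    return None
```

In this sketch the thresholds are tested after each block has been scanned, which keeps the search aligned with whole IMAGE blocks; a real implementation may test them at a different point.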

If the current block is in the SECONDARY-ONLY area

IMAGE obtains an empty location directly from the SECONDARY-ONLY area (whose space management is identical to that of DETAIL datasets).

If the delete chain head is non-zero, its value provides the address of the empty location. Otherwise, the secondary-only area's HighWaterMark (HWM) is incremented by one to provide the address of the empty location.
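The delete chain and HighWaterMark rule can be sketched as a small Python class. The names are hypothetical, addresses are 1-based so that zero can mean "empty chain", and the release method (which pushes a freed location onto the front of the delete chain) is an assumption added so that the example can show reuse.

```python
class AreaFull(Exception):
    """The area has no room; the caller would request a dynamic expansion."""


class SecondaryOnlyArea:
    def __init__(self, capacity):
        self.capacity = capacity
        self.delete_chain_head = 0   # 0 means there is nothing to reuse
        self.hwm = 0                 # HighWaterMark: highest address ever used
        # Stands in for the link that a freed location would hold on disk.
        self._next_deleted = {}

    def allocate(self):
        """Return the address of an empty location."""
        if self.delete_chain_head != 0:
            addr = self.delete_chain_head
            self.delete_chain_head = self._next_deleted.pop(addr)
            return addr
        if self.hwm >= self.capacity:
            raise AreaFull("secondary-only area is full")
        self.hwm += 1
        return self.hwm

    def release(self, addr):
        """Assumption: a freed location becomes the new delete chain head."""
        self._next_deleted[addr] = self.delete_chain_head
        self.delete_chain_head = addr
```

Allocating twice from an empty area of capacity 3 yields addresses 1 and 2 from the HWM. After releasing address 1, the next allocation reuses 1 from the delete chain, the one after that takes 3 from the HWM, and a further request raises AreaFull.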

Benefits of DDX for masters

You can eliminate performance degradation caused by long searches for empty locations. You may never need to shut down your application to increase the capacity of a master dataset.

Of course, if your master employs hashing keys, you can obtain better performance (except for serial DBGETs) right from the start (and in the long run) by specifying an initial capacity which provides at least as much capacity as you can afford and about 10% more than you may ever need. The problem with this solution is that you may not know how much capacity you need.

By providing a large primary area, you minimize synonyms and (with the exception of serial reads) maximize performance.

By providing a small primary area you maximize synonyms and minimize performance.

Your motivation for using DDX masters should not be to conserve disk space.

If you wish to conserve on disk space usage, focus your efforts on detail datasets, not on master datasets. If you do a good job with your detail datasets you can be more generous with space for your master datasets.

If you specify DDX for a master, stipulate a modest increment just in case you have underestimated your needs (and solely to protect against search time performance degradation).

The primary address calculation for integer (i.e., non-hashing) keys is predictable and such keys should never be used unless you know exactly what you're doing. (See my paper, The Use and Abuse of Non-hashing Keys, on Adager's web site at http://www.adager.com/TechnicalPapers.html)

However, if you have inherited a master with integer keys and your performance falls off a cliff, either (a) someone has mistakenly changed its capacity from the one carefully chosen by the designer to a capacity which destroyed the performance or (b) integer keys shouldn't have been used in the first place.

DDX Shortcomings and Pitfalls

Jumbo datasets (i.e., exceeding 4 gigabytes in size) cannot be DDXed. This may change in the future.

If your increments are too small and expansion occurs frequently, over time your disk space becomes fragmented.

If your increments are unnecessarily large, at expansion time the Disk Space Manager may not be able to allocate the requested disk space.

To avoid these last two problems, continue to monitor and anticipate your system-wide disk space needs.

At DDX enabling time, don't specify an unnecessarily large increment. A smaller increment has a better chance of being available at expansion time.

Furthermore, the expansion operation (typically a hiccup in your application's performance) can become an annoying pause when the increment is so large that it takes a long time for the Disk Space Manager to allocate the space and for the space to be initialized.

Epilogue

The purpose of this paper was not to provide you with hard and fast rules on the use of DDX but rather to provide you with a basic understanding of how DDX works and why you might want to use it so that you can make informed decisions when applying the concept to your databases.

Fred White (Senior Research Scientist at Adager since 1981) and Jonathan Bale (current Manager of Database Research & Development at Hewlett-Packard's Computer Systems Division) worked together as leaders of the original IMAGE/3000 development team.
