Your Adager Guide: Adager Maintenance Functions

Your Adager Guide (Section 11 of 13)


Adager Maintenance Functions

You can change dataset capacities. You can delete all of a dataset's entries very quickly. To optimize disc throughput, you can move datasets to disc devices of your choice. To improve the performance of your applications, you can reblock and repack datasets.

Change Capacity
You may increase or decrease a dataset's capacity (the number of entries it can hold). You may specify the new capacity in several ways:

A number requests a specific capacity. If you specify 12345, the resulting capacity will be 12345.

A number followed by a percent sign specifies a percentage full. Minimum is 1%, maximum is 100%. If the dataset has 183 entries and you specify 10%, the resulting capacity will be 1830.

A number preceded by a plus sign specifies a number of free entries. If the dataset has 183 entries and you specify +17, the resulting capacity will be 200.
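The three request forms can be summarized in a small function. This is a minimal sketch, not Adager code: `requested_capacity` is a hypothetical helper, and the assumption that a percentage request rounds up to a whole entry is ours. Any later adjustment to a blocking-factor multiple is not modeled.

```python
def requested_capacity(spec: str, entries: int) -> int:
    """Translate a Change Capacity request into a capacity (hypothetical helper).

    spec    -- "12345" (specific), "10%" (percentage full) or "+17" (free entries)
    entries -- number of entries currently in the dataset
    """
    spec = spec.strip()
    if spec.endswith("%"):
        pct = int(spec[:-1])
        if not 1 <= pct <= 100:
            raise ValueError("percentage full must be between 1% and 100%")
        # Assumption: round up so that the dataset is at most pct percent full.
        return (entries * 100 + pct - 1) // pct
    if spec.startswith("+"):
        return entries + int(spec[1:])  # current entries plus requested free entries
    return int(spec)  # a specific capacity


print(requested_capacity("12345", 183))  # 12345
print(requested_capacity("10%", 183))    # 1830
print(requested_capacity("+17", 183))    # 200
```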

Master Capacities
Changing a master dataset's capacity requires the reorganization of its entries, because IMAGE's hashing algorithm depends on the dataset's capacity.
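A toy model shows why a new capacity forces a reorganization. The sketch below uses a plain modulo of an integer key as a stand-in for IMAGE's hashing algorithm. The real algorithm also depends on the key's type and is not reproduced here, and `home_address` is a hypothetical name.

```python
def home_address(key: int, capacity: int) -> int:
    """Toy stand-in for IMAGE hashing: a 1-based address derived from key and capacity."""
    return key % capacity + 1


keys = [150, 250, 1234]
for capacity in (100, 200):
    print(capacity, [home_address(k, capacity) for k in keys])
# 100 [51, 51, 35]
# 200 [151, 51, 35]
```

With capacity 100, keys 150 and 250 are synonyms competing for address 51. With capacity 200, key 150 moves to address 151, so existing entries must be relocated to the addresses the new capacity implies.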

For master datasets, you should provide a reasonable amount of excess capacity for DBPUT's sake. You want enough free entries scattered throughout the dataset so that DBPUT can quickly find space for new entries or for migrating secondaries. But you don't want so many free entries that you seriously impact the time for serial scans and backup (not to mention disc space and backup tapes).

Detail Capacities
A detail dataset requires less processing time than a master dataset, because Change Capacity does not alter the internal distribution of detail entries. (If you want to reorganize the entries of a detail to improve its performance and to reduce its HighWater mark to the lowest possible value, please use Repack Dataset.)

Since 1978, Adager has performed detail capacity changes remarkably quickly, and in the 1990s its improved technology has made them faster still. Even if you run an older CISC HP3000, you benefit from a hardware-independent feature of Adager: ultra-fast detail capacity decreases. If you run a newer RISC machine (under MPE/iX 5.0 or later), you also enjoy Adager's ultra-fast native-mode detail capacity increases.

According to IMAGE's rules, a detail dataset must have a capacity which is a multiple of its blocking factor. As a consequence, Adager may round up your requested capacity.
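For example, with a hypothetical blocking factor of 7, a request for 1830 entries becomes 1834 (262 blocks of 7 entries). A minimal sketch of the rounding rule, with `round_up_capacity` as an illustrative name:

```python
def round_up_capacity(requested: int, blocking_factor: int) -> int:
    """Round a requested detail capacity up to the next multiple of the blocking factor."""
    blocks = -(-requested // blocking_factor)  # ceiling division
    return blocks * blocking_factor


print(round_up_capacity(1830, 7))   # 1834
print(round_up_capacity(200, 10))   # 200
```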

Fast dataset capacity changes
Both CISC and RISC computers benefit from a hardware-independent feature of Adager: ultra-fast detail capacity decreases.

In addition, RISC users (under MPE/iX 5.0 and later) benefit from Adager's ultra-fast native-mode increases of detail capacities and changes of master maximum capacities (whether increasing or decreasing) that do not change InitialCapacity.

Enabling datasets (masters or details) for dynamic expansion
To enable a dataset for DX, specify the following information (which closely follows DBSCHEMA's syntax for DX attributes) with a non-zero Increment:

MaximumCapacity InitialCapacity Increment

Maximum Capacity
MaximumCapacity is the value beyond which you are not willing to let the dataset grow.

You can specify MaximumCapacity as you would specify any capacity (see the section on Change Capacity above for information regarding specific, percentage-full & relative requests).

Initial Capacity
InitialCapacity is not important for details but defines the hashing capacity for masters. IMAGE uses this value (in conjunction with the key's type and value) to calculate a master entry's location.

If you put "!" instead of a specific value for InitialCapacity, Adager will:

Calculate a reasonable InitialCapacity default on your behalf for detail datasets.

Preserve the InitialCapacity for master datasets (also known as the master's PrimaryCapacity or as its HashingCapacity).

This is very useful to enable a master dataset for MDX (or to change the MaximumCapacity for a master dataset that is already enabled for MDX) without rehashing all of its entries, which is a time-consuming operation.

You can change the InitialCapacity of detail datasets without any performance penalty, because there is no need to relocate detail dataset entries. If you want to reorganize detail entries to improve the performance of your applications, please see "Repack Dataset" later in this section.

Increment
Increment is the amount of space that IMAGE adds to the dataset each time it reaches the limit imposed by its current capacity.

You can specify Increment as a number of entries or, by appending "%" to the value, as a percentage of the initial capacity.

If you put "!" instead of a specific value for Increment, Adager will preserve its current value.
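The sketch below converts an Increment into entries and lists the capacities a dataset would pass through as it expands. It is a simplified model under our own assumptions: a percentage increment is rounded up to a whole entry, and the last expansion stops at MaximumCapacity. IMAGE's actual rounding (for example, to whole blocks) is not modeled, and both function names are hypothetical.

```python
def increment_entries(increment: str, initial_capacity: int) -> int:
    """Convert an Increment ("500" or "10%") into a number of entries."""
    if increment.endswith("%"):
        pct = int(increment[:-1])
        # Assumption: percentages are of InitialCapacity, rounded up to a whole entry.
        return (initial_capacity * pct + 99) // 100
    return int(increment)


def expansion_steps(initial: int, maximum: int, increment: int) -> list[int]:
    """Capacities reached by successive expansions, capped at MaximumCapacity (simplified)."""
    if increment <= 0:
        raise ValueError("dynamic expansion needs a non-zero Increment")
    steps = [initial]
    while steps[-1] < maximum:
        steps.append(min(steps[-1] + increment, maximum))
    return steps


inc = increment_entries("10%", 1000)
print(inc)                               # 100
print(expansion_steps(1000, 1250, inc))  # [1000, 1100, 1200, 1250]
```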

MDX primary and secondary storage areas
For master dynamic capacity expansion (MDX), IMAGE uses two concepts:

Primary storage area is the area between 1 and the initial capacity (also known as hashing capacity for masters).

Secondary storage area is the area between the initial capacity (also known as hashing capacity) and the current highwater mark.
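A minimal sketch of the two areas as address ranges, assuming 1-based addresses and treating the HighWater mark as the last used address; `storage_area` is a hypothetical name.

```python
def storage_area(address: int, initial_capacity: int, high_water: int) -> str:
    """Classify a 1-based master address as primary or secondary storage (MDX)."""
    if 1 <= address <= initial_capacity:
        return "primary"    # within the hashing capacity
    if initial_capacity < address <= high_water:
        return "secondary"  # above the hashing capacity, up to the HighWater mark
    raise ValueError("address is outside the used part of the dataset")


print(storage_area(500, 1000, 1200))   # primary
print(storage_area(1100, 1000, 1200))  # secondary
```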

Disabling dynamic expansion for datasets (masters or details)
To disable dynamic expansion, specify the dataset's current capacity (which Adager conveniently displays for you) as the only value.

Do not put any values (or "!") for InitialCapacity or for Increment. Simply specify the dataset's current capacity by itself:

CurrentCapacity
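The enabling and disabling rules can be summarized in a small parser. This is a hypothetical model of the reply format described above (`DxRequest` and `parse_dx_reply` are illustrative names, not Adager internals). It keeps MaximumCapacity as text because that value may use any of the Change Capacity forms, and it represents Adager's computed InitialCapacity default for details as `None` because the calculation is not described here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DxRequest:
    enable: bool
    capacity: str             # MaximumCapacity when enabling, CurrentCapacity when disabling
    initial: Optional[int]    # None: Adager computes a default (detail datasets)
    increment: Optional[str]  # e.g. "500" or "10%"; None when disabling


def parse_dx_reply(reply: str, is_master: bool,
                   current_initial: int, current_increment: str) -> DxRequest:
    """Interpret a reply to the dynamic-expansion dialogue (hypothetical model)."""
    tokens = reply.split()
    if len(tokens) == 1:
        # A lone value (the dataset's current capacity) disables dynamic expansion.
        return DxRequest(False, tokens[0], None, None)
    if len(tokens) != 3:
        raise ValueError("expected 'MaximumCapacity InitialCapacity Increment' "
                         "or a lone CurrentCapacity")
    maximum, initial, increment = tokens
    if initial == "!":
        # Masters keep their hashing capacity; details get a computed default.
        initial_value = current_initial if is_master else None
    else:
        initial_value = int(initial)
    if increment == "!":
        increment = current_increment  # "!" preserves the current Increment
    if int(increment.rstrip("%")) == 0:
        raise ValueError("enabling dynamic expansion requires a non-zero Increment")
    return DxRequest(True, maximum, initial_value, increment)


print(parse_dx_reply("50000 ! 10%", True, 20011, "5%"))
print(parse_dx_reply("12000", False, 1000, "500"))
```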

Verification of DDX values
During its preprocess consistency checking, Adager detects any inconsistencies in DDX values and recommends that you correct them.

For instance, Adager does not allow DDX if MaximumCapacity would result in a dataset greater than 4 gigabytes, because IMAGE does not yet handle DDX with jumbo datasets. As soon as IMAGE removes this restriction, Adager will allow DDX for jumbo datasets.
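A rough version of this check estimates the file size from MaximumCapacity. It is a sketch under simplifying assumptions: the caller supplies the block size in bytes, 4 gigabytes is taken as 2^32 bytes, and file labels and other overhead are ignored. `exceeds_four_gb` is a hypothetical name, not Adager's actual test.

```python
FOUR_GB = 4 * 1024 ** 3  # bytes (assumption: 2**32)


def exceeds_four_gb(maximum_capacity: int, blocking_factor: int, block_bytes: int) -> bool:
    """Rough size check: would a dataset at MaximumCapacity exceed 4 gigabytes?"""
    blocks = -(-maximum_capacity // blocking_factor)  # ceiling division
    return blocks * block_bytes > FOUR_GB


print(exceeds_four_gb(1_000_000, 10, 2560))   # False (about 256 MB)
print(exceeds_four_gb(20_000_000, 10, 2560))  # True  (about 5.1 GB)
```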

Automatic capacity management
There are many ways to automate the management of dataset capacities. Here is a small sample:

MPEX's DBADGALT command
If you are a licensed user of MPEX from VESOFT, you can take advantage of MPEX's DBADGALT command to manage your capacity requirements with Adager.

CapMgr from ICS
If you are a licensed user of CapMgr from Idaho Computer Services, you can take advantage of its Adager-related commands.

Command files
MPE's Command Interpreter is very powerful. As an example you can follow, Paul H. Christidis created and contributed a command file, capchang.library.rego, which is distributed on the Adager tape. It builds jobs that monitor your datasets and, when necessary, invoke Adager to repack the datasets and keep them within the limits that you specify.

Erase Dataset
Adager deletes all of the entries of a dataset, thereby erasing it. Adager completely initializes the dataset and all of its associated data structures. Adager even cleans up an empty dataset, because it may have structural remnants from previous activities. These are the specifications:

Adager can always erase a detail dataset.

Adager can always erase a manual master dataset that has zero paths.

Adager can erase a manual master dataset with paths as long as its related details are empty.

To erase an automatic master dataset, erase all of its associated detail datasets. According to IMAGE's rules, you cannot delete automatic master entries explicitly.
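The rules above can be restated as a decision function. This is a hypothetical helper (`can_erase` is not an Adager command); it answers only whether Adager can erase the dataset directly.

```python
def can_erase(kind: str, path_count: int = 0, related_details_empty: bool = True) -> bool:
    """Apply the Erase Dataset rules listed above (hypothetical helper).

    kind -- "detail", "manual" (manual master) or "automatic" (automatic master)
    """
    if kind == "detail":
        return True
    if kind == "manual":
        # With no paths there is nothing to check; otherwise the details must be empty.
        return path_count == 0 or related_details_empty
    if kind == "automatic":
        # IMAGE forbids explicit deletion of automatic master entries:
        # erase the associated details instead.
        return False
    raise ValueError(f"unknown dataset kind: {kind!r}")
```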

Move Dataset
By moving one or more of your datasets to different disc device classes or disc device numbers, you may be able to balance the I/O activity across your drives, channels and buses. For example, you may want to place a heavily used detail dataset on one disc drive and the related master datasets on other disc drives.

Reblock Database
You can reblock datasets and you can also change the database's BufferLength.

IMAGE uses MPE and Posix files to keep datasets on disc. A file consists of physical units called blocks. The number of IMAGE entries that can exist in a block is the dataset's blocking factor. The database's BufferLength is the maximum block size for any dataset.

You may improve performance or minimize disc requirements by changing dataset blocking factors. Your reblocking strategy depends on various issues, including the kind of hardware (CISC or RISC) and operating system (MPE V, MPE XL, or MPE/iX), the amount of disc space, the duration of backup operations and amount of backup tapes, the size of main memory, and the kind of database access.

As examples, let's discuss the type of database access and disc space for CISC hardware under MPE V or earlier (these considerations do not apply to RISC hardware, where the concept of an IMAGE block loses importance because of mapped-file access). If your applications support an on-line environment, your priority is to provide short response times. You typically want smaller blocking factors to minimize the size of each "packet" going back and forth between disc and main memory.

If your applications support a batch environment, your objective is to maximize the throughput of large serial scans. You probably want larger blocking factors to minimize the number of disc accesses.

Ideally, you would like to minimize disc usage and maximize the run-time performance of your database applications. In reality, though, you may have to settle for some reasonable compromise.

To minimize disc usage, in general, use a BufferLength which is a multiple of 128 (or slightly less than a multiple of 128). This, by itself, is not sufficient, because you still have to try to line up every dataset block size as close as possible to the BufferLength. With Adager, you can try all kinds of combinations to suit your requirements.
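The interplay between BufferLength, entry length and disc sectors can be explored with a small calculation. This is a simplified sketch, assuming lengths in 16-bit words, 128-word sectors, and a caller-supplied per-block overhead (IMAGE's real block layout is not reproduced); `fit_block` is a hypothetical name.

```python
SECTOR_WORDS = 128  # assumption: 128-word disc sectors


def fit_block(entry_words: int, buffer_length: int, overhead_words: int = 0):
    """Largest blocking factor whose block fits in BufferLength, with sector waste.

    Simplified model: block size = blocking factor * entry length + overhead.
    Returns (blocking factor, block size in words, unused words in the last sector).
    """
    blocking_factor = (buffer_length - overhead_words) // entry_words
    if blocking_factor < 1:
        raise ValueError("entry does not fit in BufferLength")
    block_words = blocking_factor * entry_words + overhead_words
    sectors = -(-block_words // SECTOR_WORDS)  # ceiling division
    wasted = sectors * SECTOR_WORDS - block_words
    return blocking_factor, block_words, wasted


for length in (512, 500):
    print(length, fit_block(30, length))
# 512 (17, 510, 2)
# 500 (16, 480, 32)
```

For a 30-word entry, a BufferLength of 512 fills the block to within 2 words of a sector boundary, while 500 leaves 32 words unused: the BufferLength alone does not settle the question.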

Reblock Database allows you to specify a target BufferLength and then select a set of datasets that you want to optimize for disc space. To prevent any dataset from exceeding IMAGE's Capacity Limit, Adager will not automatically reduce the blocking factor of any dataset. To reduce the blocking factor of an individual dataset, use Reblock Dataset.

To maximize the run-time performance of your applications, you should consider the optimization of chained access via Repack Dataset.

Reblock Dataset
Reblock Dataset allows you to specify a new blocking factor for any dataset.

This is very useful when you need to increase a dataset capacity beyond the IMAGE Capacity Limit allowed by the dataset's current blocking factor.

Repack Dataset
In everyday life, we do spring cleaning to tidy things up. In IMAGE, we do dataset repacking to clean up messy chains.

With Adager, you can repack master datasets as well as detail datasets, which are optimized for different access strategies.

MastPack (repacking a master dataset)
You can access master entries more efficiently when there is a reasonable amount of "ventilation" space for new entries (and for the migration of secondary entries when you add a new primary entry that collides with an existing secondary or when you delete the head of a synonym chain).

"Reasonable" means that you do not want to have excessive idle space which will increase the time required for serial master-dataset scans and backups (not to mention the amounts of wasted disc space and tape-backup space).

You control the amount of free space for a master dataset by means of Change Capacity. Within your specified master capacity, MastPack distributes—as equitably as possible—the available free space throughout the dataset's space.

DetPack (repacking a detail dataset)
Repacking a detail dataset along a given path reorganizes the dataset's entries to optimize chained access along that path.

Adager places the members of each chain in contiguous locations (so that their physical order coincides with their logical order on the chain). This minimizes the number of blocks occupied by each chain of the repacking path, thereby reducing disc input/output (I/O) during chained access along this path.

In the process, Adager squeezes out empty (deleted) entries and lowers the dataset's HighWater mark.

Adager preserves chronology
Adager provides several repacking options, all of which preserve the chronological ordering of the entries on all chains on all paths.

You can optimize chained access for only one path per detail dataset
Optimizing one path usually affects the performance of other paths. As an example, consider your library:

If you sort your books by author, it is very convenient and fast to look for a given author's books—but it becomes difficult to look for books on a given subject.

If you sort your books by subject, it is very convenient and fast to look for a given subject—but it becomes difficult to look for books by a given author (particularly so in the case of authors who write books on different subjects).

If you do not have duplicated collections of books, you must choose one sorting method. If you are willing to have duplicated collections of books, then you can sort each collection according to its own method.

Analogously, the only way to optimize the information in a given detail dataset for more than one access path (via DBFIND and DBGET/chained—and without having separate TPI indexes) is to actually have physically different copies of the dataset, each organized according to a given criterion. This, unfortunately, adds to your maintenance burden and you may not be willing to follow this idea.

If you prefer to stay with just one copy of the dataset, you should give some thought to your selection of the repacking path.

A good choice is to repack, using option 3 (chained), along a path that is frequently accessed and whose chain lengths are greater than 1.

You can certainly repack along a given path for daily on-line processing and you can repack along a totally different path for weekly batch processing.

Repacking along the primary path may not be a good idea if the primary path just "happened" by default: if the database designer did not explicitly specify a primary path in the original schema, IMAGE simply selected the first unsorted path as the primary path.

A randomly-chosen primary path may have chains with only one entry. Because you can only repack along one path, repacking an undeserving primary path would not benefit anybody.

The frequency with which you should repack a dataset depends on the turnover rate of the entries in the dataset. As entries come and go, the chain locality derived from the last repacking diminishes, gradually degrading the performance of your chained access on the repacked path.

DetPacking & changing capacity
You can change a detail dataset's capacity as you repack it, including capacity changes that involve Dynamic Dataset Expansion (DDX) and jumbo datasets.

See "Change Capacity" earlier in this section for information on the dialogue.

DetPack (serial)
The serial DetPack option compacts all existing entries—maintaining their original relative locations—into the smallest possible space (by squeezing out empty entries).

This option reduces the HighWater mark to its lowest possible value (thereby allowing you to reduce the dataset's capacity to its lowest possible value, if you desire to do so).

This option is the fastest way to repack a detail dataset, because Adager does not reorganize the entries. Adager simply slides the entries together, consolidating them into one contiguous span. As all DetPack options do, this method preserves the chronological ordering of the entries on all chains on all paths.

As an example, consider a detail dataset (blocking factor = 10) with 21 entries. The path has 4 chains:

6 entries have search field value A

8 entries have search field value B

4 entries have search field value C

3 entries have search field value D

A dot represents a free entry. The plus sign ("+") represents the first entry above the HighWater mark. Before repacking, the first 5 blocks of the detail dataset look like this:

CAD.ABBC.A .BCBAB.D.B A....DB... ...C.A.B.. ...+......

Option 1 (serial) produces this result:

CADABBCABC BABDBADBCA B+........ .......... ..........
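The serial option can be reproduced with the notation used in this example: letters are entries, dots are free entries, and "+" marks the first entry above the HighWater mark. This sketch models only the resulting layout (not how Adager moves the data); `detpack_serial` and `render` are hypothetical names.

```python
def render(entries: list[str], total_slots: int, bf: int = 10) -> str:
    """Lay entries out from slot 1 and mark the first slot above the HighWater mark with '+'."""
    slots = entries + ["+"] + ["."] * (total_slots - len(entries) - 1)
    return " ".join("".join(slots[i:i + bf]) for i in range(0, total_slots, bf))


def detpack_serial(layout: str, bf: int = 10) -> str:
    """Squeeze out free entries below the HighWater mark, keeping relative order."""
    slots = layout.replace(" ", "")
    high_water = slots.index("+")  # everything from '+' onward is unused
    entries = [c for c in slots[:high_water] if c != "."]
    return render(entries, len(slots), bf)


before = "CAD.ABBC.A .BCBAB.D.B A....DB... ...C.A.B.. ...+......"
print(detpack_serial(before))
# CADABBCABC BABDBADBCA B+........ .......... ..........
```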

DetPack (sorted)
The sorted DetPack option sorts the entire dataset according to the values of a virtual sort key which you specify (using up to four fields of the dataset).

This option may benefit serial scans that do control breaks when the value of the virtual sort key changes. But this option may not benefit chained access to the dataset along any path, unless you have specified some path's search field as the primary sort key in your virtual sort key. (Even so, because of Adager's preservation of the logical chronological order of the entries in the chains, there is no guarantee of "good locality.")
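A minimal sketch of a virtual sort key, assuming entries are modeled as Python dictionaries with hypothetical field names (`customer`, `date`, `amount`). Python's `sorted` is stable, so it keeps the prior relative order of entries with equal keys; whether Adager breaks ties the same way in physical order is not stated here.

```python
from operator import itemgetter


def detpack_sorted(entries: list[dict], sort_fields: list[str]) -> list[dict]:
    """Order entries by a virtual sort key built from up to four fields."""
    if not 1 <= len(sort_fields) <= 4:
        raise ValueError("the virtual sort key uses one to four fields")
    # sorted() is stable: entries with equal keys keep their previous relative order.
    return sorted(entries, key=itemgetter(*sort_fields))


orders = [
    {"customer": "B", "date": 19960302, "amount": 40},
    {"customer": "A", "date": 19960115, "amount": 10},
    {"customer": "B", "date": 19960101, "amount": 25},
    {"customer": "A", "date": 19960115, "amount": 99},
]
for entry in detpack_sorted(orders, ["customer", "date"]):
    print(entry)
```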

DetPack (chained)
The chained DetPack option reorganizes the detail's chains to coincide with the serial order of the master ChainHeads. This optimizes performance for those applications which read a master dataset serially and, for each master entry, perform chained detail reads.

This option also optimizes performance for on-line applications that use random access to master ChainHeads followed by chained access to their corresponding detail entries. As all DetPack options do, this method preserves the chronological ordering of the entries on all chains on all paths.

DetPack (SuperChained)
The SuperChained DetPack option guarantees that each detail chain, except possibly the last one, spans the minimum number of IMAGE blocks. As all DetPack options do, this method preserves the chronological ordering of the entries on all chains on all paths.

Differences between DetPack options
To see the difference between DetPack options 3 (chained) and 4 (SuperChained), let's use, as an example, a path whose master dataset's entries are organized sequentially in order ABCD.

The detail dataset (blocking factor = 10) has 21 entries. The path has 4 chains:

6 entries have search field value A

8 entries have search field value B

4 entries have search field value C

3 entries have search field value D

A dot represents a free entry. The plus sign ("+") represents the first entry above the HighWater mark. Before repacking, the first 5 blocks of the detail dataset look like this:

CAD.ABBC.A .BCBAB.D.B A....DB... ...C.A.B.. ...+......

DetPack option 3 (chained) produces this result:

AAAAAABBBB BBBBCCCCDD D+........ .......... ..........
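Using the same notation, option 3 amounts to ordering the entries by the serial position of their master chain head. This sketch reproduces the layout above under one stated assumption: the serial order of the entries before repacking is taken as their chronological order, so a stable sort keeps each chain's chronology. `detpack_chained` is a hypothetical name.

```python
def detpack_chained(layout: str, master_order: str, bf: int = 10) -> str:
    """Group detail entries chain by chain, following the master's serial order.

    Assumption: the serial order of the entries before repacking is their
    chronological order, so a stable sort keeps each chain's chronology.
    """
    slots = layout.replace(" ", "")
    high_water = slots.index("+")
    entries = [c for c in slots[:high_water] if c != "."]
    packed = sorted(entries, key=master_order.index)  # stable sort by chain head position
    out = packed + ["+"] + ["."] * (len(slots) - len(packed) - 1)
    return " ".join("".join(out[i:i + bf]) for i in range(0, len(out), bf))


before = "CAD.ABBC.A .BCBAB.D.B A....DB... ...C.A.B.. ...+......"
print(detpack_chained(before, "ABCD"))
# AAAAAABBBB BBBBCCCCDD D+........ .......... ..........
```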

DetPack option 4 (SuperChained) minimizes the number of times that each chain (on the repacked path) crosses a block boundary. To accomplish this goal, this option does extra work (for instance, the detail entries for "C" are placed before the detail entries for "B"). Option 4 (SuperChained) produces this result:

AAAAAACCCC BBBBBBBBDD D+........ .......... ..........
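The following is not Adager's algorithm: it is a simple first-fit heuristic, chosen because it reproduces the example above. At each step it takes the first remaining chain (in master order) that fits in the current block's free space, and otherwise the next chain in master order. It does not guarantee the minimum-block property for every mix of chain lengths. The function names are hypothetical.

```python
def superchained_order(chain_lengths: dict[str, int], master_order: str, bf: int) -> list[str]:
    """First-fit heuristic: prefer a chain that fits in the current block's free space."""
    remaining = list(master_order)
    order, pos = [], 0
    while remaining:
        free = bf - pos % bf  # free slots left in the current block
        fitting = [k for k in remaining if chain_lengths[k] <= free]
        chain = fitting[0] if fitting else remaining[0]
        remaining.remove(chain)
        order.append(chain)
        pos += chain_lengths[chain]
    return order


def blocks_spanned(packed: str, bf: int) -> dict[str, int]:
    """Number of distinct blocks each chain occupies in a packed layout."""
    spans: dict[str, set[int]] = {}
    for i, key in enumerate(packed):
        spans.setdefault(key, set()).add(i // bf)
    return {key: len(blocks) for key, blocks in spans.items()}


lengths = {"A": 6, "B": 8, "C": 4, "D": 3}
order = superchained_order(lengths, "ABCD", 10)
super_packed = "".join(k * lengths[k] for k in order)
chained_packed = "".join(k * lengths[k] for k in "ABCD")
print(order)                               # ['A', 'C', 'B', 'D']
print(blocks_spanned(chained_packed, 10))  # {'A': 1, 'B': 2, 'C': 1, 'D': 2}
print(blocks_spanned(super_packed, 10))    # {'A': 1, 'C': 1, 'B': 1, 'D': 2}
```

In the chained layout, chain B spans two blocks; in the heuristic's order, A, B and C each stay within one block, and only the last chain, D, crosses a boundary.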

DetPack SuperChained and RISC
Everything has a price, and DetPack option 4 (SuperChained) is no exception: it takes longer to perform than DetPack option 3 (chained).

In addition, under mapped-file access on RISC machines, the performance of your applications is about the same (or sometimes even worse, due to the lack of locality) after repacking a detail dataset with Option 4.

As a consequence, should you request Option 4 when running on a RISC computer, Adager automatically does Option 3 instead.
