Tuning Parameters Available   

HDS currently uses the following tuning parameters to control its behaviour.

 INALQ - Initial File Allocation Quantity:

This value determines how many blocks[*] are to be allocated when a new container file is created. The default value of 2 is the minimum value allowed; the first block contains header information and the second contains the top-level object. Note that the host operating system may impose further restrictions on allowable file sizes, so the actual size of a file may not match the value specified exactly. The value of this parameter reverts to its default value (or the value specified by the HDS_INALQ environment variable) after each file is created, so if it is being set from within a program, it must be set every time that it is required.

If a file is to be extended frequently (through the creation of new objects within it), then this parameter may provide a worthwhile efficiency gain by allowing a file of a suitable size to be created initially. On most UNIX systems, however, the benefits are minimal.
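As a minimal sketch (the file name, object name, structure type and the figure of 1000 blocks are purely illustrative), a program could set INALQ immediately before creating the container file:

      INCLUDE 'SAE_PAR'          ! Defines SAI__OK
      INCLUDE 'DAT_PAR'          ! Defines DAT__SZLOC

      CHARACTER * ( DAT__SZLOC ) LOC
      INTEGER STATUS, DIM( 1 )

      STATUS = SAI__OK

*  Ask for an initial allocation of 1000 blocks (an illustrative
*  figure), then create the container file with a scalar top-level
*  structure. INALQ reverts to its default once the file exists.
      CALL HDS_TUNE( 'INALQ', 1000, STATUS )
      CALL HDS_NEW( 'bigfile', 'DATASET', 'MY_STRUCT', 0, DIM,
     :              LOC, STATUS )

*  ... create further components within the structure here ...

      CALL DAT_ANNUL( LOC, STATUS )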

 MAP - Use file mapping if available?

This value controls the method by which HDS performs I/O operations on the values of primitive objects and may take the following values:

MAP=1:

Use "file mapping" (if supported) as the preferred method of accessing primitive data.
MAP=0:

Use read/write operations (if supported) as the preferred data access method.

MAP=-1:

Use whichever method is normally faster for sequential access to all elements of a large array of data.

MAP=-2:

Use whichever method is normally faster for sparse random access to a large array of data.

MAP=-3:

Use whichever method normally makes the smaller demand on system memory resources (normally this means a request to minimise use of address space or swap file space, but the precise interpretation is operating system dependent). This is normally the appropriate option if you intend to use HDS arrays as temporary workspace.

HDS converts all other values to one. The value may be changed at any time.

A subsequent call to HDS_GTUNE, specifying the `MAP' tuning parameter, will return 0 or 1 to indicate which option was actually chosen. This may depend on the capabilities of the host operating system and the particular implementation of HDS in use. The default value for this tuning parameter is also system dependent (see §[*]).
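For example, a program that intends to make sparse random access to a large array might request that behaviour and then check which access method HDS actually selected (a sketch only; STATUS handling follows the usual Starlink convention):

      INCLUDE 'SAE_PAR'          ! Defines SAI__OK

      INTEGER STATUS, MAPVAL

      STATUS = SAI__OK

*  Ask for whichever access method is normally faster for sparse
*  random access to large arrays.
      CALL HDS_TUNE( 'MAP', -2, STATUS )

*  Find out whether file mapping (1) or read/write access (0) was
*  actually selected on this system.
      CALL HDS_GTUNE( 'MAP', MAPVAL, STATUS )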

Typically, file mapping has the following plus and minus points:

+
It allows large arrays accessed via the HDS mapping routines to be sparsely accessed in an efficient way. In this case, only those regions of the array actually accessed will need to be read/written, as opposed to reading the entire array just to access a small fraction of it. This might be useful, for instance, if a 1-dimensional profile through a large image were being generated.
+
It allows HDS container files to act as "backing store" for the virtual memory associated with objects accessed via the mapping routines. The operating system can then use HDS files, rather than its own backing (swap) file, to implement virtual memory management. This means that you do not need to have a large system backing file available in order to access large datasets.

+
For the same reason, temporary objects created with DAT_TEMP and mapped to provide temporary workspace make no additional demand on the system backing file (a sketch of this usage follows this list).

?
On some operating systems file mapping may be less efficient in terms of elapsed time than direct read/write operations. Conversely, on some operating systems it may be more efficient.

-
Despite the memory efficiency of file mapping, there may be a significant efficiency penalty when large arrays are mapped to provide workspace. This is because the scratch data will often be written back to the container file when the array is unmapped (despite the fact that the file is about to be deleted). This can take a considerable time and cannot be prevented as the operating system has control over this process.

Unfortunately, on some operating systems, this process appears to occur even when normal system calls are used to allocate memory because file mapping is used implicitly. In this case, HDS's file mapping is at no particular disadvantage.

-
Not all operating systems support file mapping and it generally requires system-specific programming techniques, making it more trouble to implement on a new operating system.
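As an illustration of the temporary-workspace point above, the following sketch requests the memory-conserving access method, then creates and maps a scratch array (the _DOUBLE type and the array size are arbitrary choices made for the example):

      INCLUDE 'SAE_PAR'          ! Defines SAI__OK
      INCLUDE 'DAT_PAR'          ! Defines DAT__SZLOC

      CHARACTER * ( DAT__SZLOC ) LOC
      INTEGER STATUS, DIM( 1 ), PNTR, EL

      STATUS = SAI__OK

*  Prefer the access method that minimises demands on memory; this is
*  normally appropriate for temporary workspace.
      CALL HDS_TUNE( 'MAP', -3, STATUS )

*  Create a temporary 1000000-element _DOUBLE array and map it for
*  write access to obtain workspace.
      DIM( 1 ) = 1000000
      CALL DAT_TEMP( '_DOUBLE', 1, DIM, LOC, STATUS )
      CALL DAT_MAPV( LOC, '_DOUBLE', 'WRITE', PNTR, EL, STATUS )

*  ... use the EL mapped elements via the pointer PNTR ...

*  Unmap and annul the locator when the workspace is no longer needed.
      CALL DAT_UNMAP( LOC, STATUS )
      CALL DAT_ANNUL( LOC, STATUS )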

Using read/write access has the following advantages and disadvantages:

+?
On some operating systems it may be more efficient than file mapping in terms of elapsed time in cases where an array of data will be accessed in its entirety (the normal situation). This is generally not true of UNIX systems, however.
-
It is an inefficient method of accessing a small subset of a large array because it requires the entire array to be read/written. The solution to this problem is to access the required subset explicitly using (e.g.) DAT_SLICE, although this complicates the software somewhat (a sketch follows this list).

-
It makes demands on the operating system's backing file which the file mapping technique avoids (see above). As a result, there is little point in creating scratch arrays with DAT_TEMP for use as workspace unless file mapping is available (because the system backing file will be used anyway).

?
If an object is accessed several times simultaneously using HDS mapping routines, then modifications made via one mapping may not be consistently reflected in the other mapping (modifications are only written back to the container file when the object is unmapped, so the two mappings may get out of step in the meantime). Conversely, if file mapping is in use and a primitive object is mapped in its entirety without type conversion, then this behaviour does not occur (all mappings remain consistent). It may occur, however, if a slice is being accessed or if type conversion is needed.

It is debatable which behaviour is preferable. The best policy is to avoid the problem entirely by not utilising multiple access to the same object while modifications are being made.
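Returning to the subset-access point above, a minimal sketch of the DAT_SLICE approach might look like this. It assumes LOC already locates a 2-dimensional _REAL array of shape (512, 512) and that only a single column is wanted; the bounds chosen are illustrative:

      INCLUDE 'SAE_PAR'          ! Defines SAI__OK
      INCLUDE 'DAT_PAR'          ! Defines DAT__SZLOC

      CHARACTER * ( DAT__SZLOC ) LOC, SLICE
      INTEGER STATUS, LOWER( 2 ), UPPER( 2 ), PNTR, EL

*  Select column 100 of the array as a slice...
      LOWER( 1 ) = 1
      LOWER( 2 ) = 100
      UPPER( 1 ) = 512
      UPPER( 2 ) = 100
      CALL DAT_SLICE( LOC, 2, LOWER, UPPER, SLICE, STATUS )

*  ...and map only that slice, so that the whole array need not be
*  transferred just to access one column.
      CALL DAT_MAPV( SLICE, '_REAL', 'READ', PNTR, EL, STATUS )

*  ... process the EL mapped values via the pointer PNTR ...

      CALL DAT_UNMAP( SLICE, STATUS )
      CALL DAT_ANNUL( SLICE, STATUS )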

 MAXWPL - Maximum Size of the "Working Page List":

This value specifies how many blocks[*] are to be allocated to the memory cache which HDS uses to hold information about the structure of HDS files and objects and to buffer its I/O operations when obtaining this information. The default value is 32 blocks; this value cannot be decreased. Modifications to this value will only have an effect if made before HDS becomes active (i.e. before any call is made to another HDS routine).

There will not normally be any need to increase this value unless excessively complex data structures are being accessed with very large numbers of locators simultaneously active.
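Because the setting only takes effect before HDS becomes active, any adjustment must be made before any other HDS call, for example (the value of 64 blocks is purely illustrative):

      INCLUDE 'SAE_PAR'          ! Defines SAI__OK

      INTEGER STATUS

      STATUS = SAI__OK

*  Enlarge the working page list. This call must precede all other
*  HDS calls, otherwise it has no effect.
      CALL HDS_TUNE( 'MAXWPL', 64, STATUS )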

 NBLOCKS - Size of the internal "Transfer Buffer":

When HDS has to move large quantities of data from one location to another, it often has to store an intermediate result. In such cases, rather than allocate a large buffer to hold all the intermediate data, it uses a smaller buffer and performs the transfer in pieces. This parameter specifies the maximum size in blocks[*] which this transfer buffer may have and is constrained to be no less than the default, which is 32 blocks.

The value should not be too small, or excessive time will be spent in loops which repeatedly refill the buffer. Conversely, too large a value will make excessive demands on memory. In practice there is a wide range of acceptable values, so this tuning parameter will almost never need to be altered.

 NCOMP - Optimum number of structure components:

This value may be used to specify the expected number of components which will be stored in an HDS structure. HDS does not limit the number of structure components, but when a structure is first created, space is set aside for creation of components in future. If more than the expected number of components are subsequently created, then HDS must eventually re-organise part of the container file to obtain the space needed. Conversely, if fewer components are created, then some space in the file will remain unused. The value is constrained to be at least one, the default being 6 components.

The value of this parameter is used during the creation of the first component in every new structure. It reverts to its default value (or the value specified by the HDS_NCOMP environment variable) afterwards, so if it is being set from within a program, it must be set every time it is needed.
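For example, a program about to populate a structure that it knows will hold around 20 components could set NCOMP immediately before creating the first of them. In this sketch LOC is assumed to locate a newly created, empty structure, and the component name and the value of 20 are illustrative:

      INCLUDE 'SAE_PAR'          ! Defines SAI__OK
      INCLUDE 'DAT_PAR'          ! Defines DAT__SZLOC

      CHARACTER * ( DAT__SZLOC ) LOC
      INTEGER STATUS

*  Reserve space for about 20 components, then create the first one
*  (a scalar _REAL in this example); NCOMP reverts to its default
*  afterwards.
      CALL HDS_TUNE( 'NCOMP', 20, STATUS )
      CALL DAT_NEW0R( LOC, 'SCALE', STATUS )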

 SHELL - Preferred shell:

This parameter determines which UNIX shell should be used to interpret container file names which contain "special" characters representing pattern-matching, environment variable substitution, etc. Each shell typically has its own particular way of interpreting these characters, so users of HDS may wish to select the same shell as they normally use for entering commands. The following values are allowed:

SHELL=2:

Use the "tcsh" shell (if available). If this is not available, then use the same shell as when SHELL=1.
SHELL=1:

Use the "csh" shell (C shell on traditional UNIX systems). If this is not available, then use the same shell as when SHELL=0.

SHELL=0 (the default):

Use the "sh" shell. This normally means the Bourne Shell on traditional UNIX systems, but on systems which support it, the similar POSIX "sh" shell may be used instead.

SHELL=-1:

Don't use any shell for interpreting single file names (all special characters are to be interpreted literally). When performing "wild-card" searches for multiple files (with HDS_WILD), use the same shell as when SHELL=0.

HDS converts all other values to zero.
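For instance, a program could ask for tcsh-style interpretation of file names before opening a file whose name involves environment variable substitution (the variable name MYDATA and the file name are illustrative assumptions):

      INCLUDE 'SAE_PAR'          ! Defines SAI__OK
      INCLUDE 'DAT_PAR'          ! Defines DAT__SZLOC

      CHARACTER * ( DAT__SZLOC ) LOC
      INTEGER STATUS

      STATUS = SAI__OK

*  Prefer the tcsh shell (falling back to csh, then sh, if it is not
*  available) when interpreting special characters in file names.
      CALL HDS_TUNE( 'SHELL', 2, STATUS )

*  Open an existing container file whose name refers to an
*  environment variable.
      CALL HDS_OPEN( '$MYDATA/run42', 'READ', LOC, STATUS )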

 SYSLCK - System wide lock flag:

This parameter is present for historical reasons and has no effect on UNIX systems.

 WAIT - Wait for locked files?

This parameter is present for historical reasons and currently has no effect on UNIX systems, where HDS file locking is not implemented.


