ECG Database Applications Guide Table of Contents

NAME

header - ECG database header file format

DESCRIPTION

For each database record, a header file specifies the names of the associated signal files and their attributes. Programs compiled with the DB library (-ldb) can read header files created by newheader (see db(3) ). Header files contain line- and field-oriented ASCII text. ASCII linefeed characters separate lines (which may not contain more than 255 characters each, including the linefeed), and spaces or tabs separate fields (except as noted below). Beginning with DB library version 6.1, an ASCII carriage return character may precede each linefeed. Fields not specifically designated below as optional must be present.

Header files contain at a minimum a record line, which specifies the record name, the number of segments, and the number of signals. Header files for ordinary records (those that contain one segment) also contain a signal specification line for each signal. Header files for multi-segment records (supported by DB library version 9.1 and later versions) contain a segment specification line for each segment.

Comment lines may appear anywhere in a header file. The first printing character in a comment line must be `#'. Comment lines that follow the last signal specification line are treated specially (see Info strings, below). All other comment lines are ignored by DB library functions that read header files.

Record line

The first non-empty, non-comment line is the record line. It contains information applicable to all signals in the record. Its fields are, from left to right:

record name
A string of characters that identify the record. The record name may include letters, digits and underscores (`_') only.
number of segments [optional]
This field, if present, is not separated by whitespace from the record name field; rather, it follows a `/', which serves as a field separator. If the field is present, it indicates that the record is a multi-segment record containing the specified number of segments, and that the header file contains segment specification lines rather than signal specification lines. The number of segments must be greater than zero. A value of 1 in this field is legal, though unlikely to be useful.
number of signals
Note that this is not necessarily equal to the number of signal files, since two or more signals can share a signal file. This number must not be negative; a value of zero is legal, however.
sampling frequency (in samples per second per signal) [optional]
This number can be expressed in any format legal for scanf(3) input of floating point numbers (thus `360', `360.', `360.0', and `3.6e2' are all legal and equivalent). The sampling frequency must be greater than zero; if it is missing, a value of 250 (DEFREQ, defined in <ecg/db.h>) is assumed.
counter frequency (in ticks per second) [optional]
This field (a floating-point number, in the same format as the sampling frequency) can be present only if the sampling frequency is also present. It is not separated by whitespace from the sampling frequency field; rather, it follows a `/', which serves as a field separator. The sampling and counter frequencies are used by strtim to convert strings beginning with `c' into sample intervals. Typically, the counter frequency may be derived from an analog tape counter, or from page numbers in a chart recording. If the counter frequency is absent or not positive, it is assumed to be equal to the sampling frequency. DB library versions 5.1 and earlier ignore the counter frequency field.
base counter value [optional]
This field can be present only if the counter frequency is also present. It is not separated by whitespace from the counter frequency field; rather, it is surrounded by parentheses, which delimit it. The base counter value is a floating-point number that specifies the counter value corresponding to sample 0. If absent, the base counter value is taken to be zero. DB library versions 5.1 and earlier ignore the base counter value field.
number of samples per signal [optional]
This field can be present only if the sampling frequency is also present. If it is zero or missing, the number of samples is unspecified and checksum verification of the signals is disabled.
base time [optional]
This field can be present only if the number of samples is also present. It gives the time of day that corresponds to the beginning of the record, in HH:MM:SS format (using a 24-hour clock; thus 13:05:00, or 13:5:0, represent 1:05 pm). If this field is absent, the time-conversion functions assume a value of 0:0:0, corresponding to midnight.
base date [optional]
This field can be present only if the base time is also present. It contains the date that corresponds to the beginning of the record, in DD/MM/YYYY format (e.g., 25/4/1989 is 25 April 1989).

Signal specification lines

Each non-empty, non-comment line following the record line in a single-segment record contains specifications for one signal, beginning with signal 0. Header files must contain valid signal specification lines for at least as many signals as were indicated in the record line (the first non-empty, non-comment line in the file). Any extra signal specification lines are not read by DB library functions. From left to right in each line, the fields are:

file name
The name of the file in which samples of the signal are kept. The environment variable DB (the database path) lists the directories in which signal files (as well as DB header and annotation files) are found; normally DB should include an initial empty component, so that signal files can be kept in any directory if they are designated by absolute path names in the header file. If the file name specifies that the signal file is to be found in a directory that is not already in DB, that directory is appended to the end of DB (by functions that read header files in DB library version 6.2 and later versions). Although the record name is usually part of the signal file name, this convention is not a requirement (see, e.g., examples 3, 4, and 5 below). Note that several signals can share the same file (i.e., they can belong to the same signal group); all entries for signals that share a given file must be consecutive, however. The file name `-' refers to the standard input or output. The sum of the lengths of the file name and description fields (see below) is limited to 80 characters.
format
This field is an integer that specifies the storage format of the signal. All signals in a given group are stored in the same format. The most common formats are format 8 (eight-bit first differences) and format 16 (sixteen-bit amplitudes); see signal(5) (or <ecg/db.h>) for a list of other supported formats. The following three optional fields, if present, are bound to the format field (i.e., not separated from it by whitespace); they may be considered as format modifiers, since they further describe the encoding of samples within the signal file.
samples per frame [optional]
If present, this field follows an `x' that serves as a field separator. Normally, all signals in a given record are sampled at the (base) sampling frequency as specified in the record line; in this case, the number of samples per frame is 1 for all signals, and this field is conventionally omitted. If the signal was sampled at some integer multiple, n, of the base sampling frequency, however, each frame (set of samples returned by getframe) contains n samples of the signal, and the value specified in this field is also n. (Note that non-integer multiples of the base sampling frequency are not supported.) DB library versions 8.3 and earlier ignore this field if it is present, and cannot properly read signal files that contain more than one sample per signal per frame.
skew [optional]
If present, this field follows a `:' that serves as a field separator. Ideally, within a given record, samples of different signals with the same sample number are simultaneous (within one sampling interval). If this is not the case (as, for example, when a multitrack analog tape recording is digitized and the azimuth of the playback head does not match that of the recording head), the skew between signals can sometimes determined (for example, by locating recorded waveform features with known time relationships, such as calibration signals). If this has been done, the skew field may be inserted into the header file to indicate the (positive) number of samples of the signal that are considered to precede sample 0. These samples, if any, are included in the checksum, but cannot be returned by getvec or getframe (thus the checksum need not be changed if the skew field is inserted or modified). DB library versions 9.1 and earlier ignore this field if it is present; later versions correctly deskew signals in accordance with the contents of this field.
byte offset [optional]
If present, this field follows a `+' that serves as a field separator. Normally, signal files include only sample data. If a signal file includes a preamble, however, this field specifies the offset in bytes from the beginning of the signal file to sample 0 (i.e., the length of the preamble). Data within the preamble is not included in the signal checksum. Note that the byte offset must be the same for all signals within a given group (use the skew field to correct for intersignal skew). This feature is provided only to simplify the task of reading signal files not generated using the DB library; the DB library does not support any means of writing such files, and byte offsets must be inserted into header files manually. DB library versions 4.4 and earlier ignore byte offsets; these versions return any preamble data as samples.
ADC gain (ADC units per physical unit) [optional]
This field is a floating-point number that specifies the difference in sample values that would be observed if a step of one physical unit occurred in the original analog signal. For ECGs, the gain is usually roughly equal to the R-wave amplitude in a lead that is roughly parallel to the mean cardiac electrical axis. If the gain is zero or missing, this indicates that the signal amplitude is uncalibrated; in such cases, a value of 200 (DEFGAIN, defined in <ecg/db.h>) ADC units per physical unit may be assumed.
baseline (ADC units) [optional]
This field can be present only if the ADC gain is also present. It is not separated by whitespace from the ADC gain field; rather, it is surrounded by parentheses, which delimit it. The baseline is an integer that specifies the sample value corresponding to 0 physical units. If absent, the baseline is taken to be equal to the ADC zero. Note that the baseline need not be a value within the ADC range; for example, if the ADC input range corresponds to 200-300 degrees Kelvin, the baseline is the (extended precision) value that would map to 0 degrees Kelvin. DB library versions 5.0 and earlier ignore baseline fields.
units [optional]
This field can be present only if the ADC gain is also present. It follows the baseline field if that field is present, or the gain field if the baseline field is absent. It is not separated by whitespace from the previous field; rather, it follows a `/', which serves as a field separator. The units field is a character string without embedded whitespace that specifies the type of physical unit. If the units field is absent, the physical unit may be assumed to be one millivolt. DB library versions 4.7 and earlier ignore units fields.
ADC resolution (bits) [optional]
This field can be present only if the ADC gain is also present. It specifies the resolution of the analog-to-digital converter used to digitize the signal. Typical ADCs have resolutions between 8 and 16 bits. If this field is missing or zero, the default value is 12 bits for amplitude-format signals, or 10 bits for difference-format signals (unless a lower value is specified by the format field).
ADC zero [optional]
This field can be present only if the ADC resolution is also present. It is an integer that represents the amplitude (sample value) that would be observed if the analog signal present at the ADC inputs had a level that fell exactly in the middle of the input range of the ADC. For a bipolar ADC, this value is usually zero, but a unipolar (offset binary) ADC usually produces a non-zero value in the middle of its range. Together with the ADC resolution, the contents of this field can be used to determine the range of possible sample values. If this field is missing, a value of zero is assumed.
initial value [optional]
This field can be present only if the ADC zero is also present. It specifies the value of sample 0 in the signal, but is used only if the signal is stored in difference format. If this field is missing, a value equal to the ADC zero is assumed.
checksum [optional]
This field can be present only if the initial value is also present. It is a 16-bit signed checksum of all samples in the signal. (Thus the checksum is independent of the storage format.) If the entire record is read without skipping samples, and the header's record line specifies the correct number of samples per signal, this field is compared against a computed checksum to verify that the signal file has not been corrupted. A value of zero may be used as a field placeholder if the number of samples is unspecified.
block size [optional]
This field can be present only if the checksum is present. This field is an integer and is usually zero. If the signal is stored in a file that must be read in blocks of a specific size, however, this field specifies the block size in bytes. (On UNIX systems, this is the case only for character special files, corresponding to certain tape and raw disk files. If necessary, the block size may be given as a negative number to indicate that the associated file lacks I/O driver support for fseek(3) operations.) All signals belonging to the same signal group have the same block size.
description [optional]
This field can be present only if the block size is present. Any text between the block size field and the end of the line is taken to be a description of the signal. When creating new records, follow the style used to document the signals in existing header files. Unlike the other fields in the header file, the description may include embedded spaces; note that whitespace between the block size and description fields is not considered to be part of the description, however. If the description is missing, the DB library functions that read header files supply a description of the form ``record xxx, signal n''.

Segment specification lines

Each non-empty, non-comment line following the record line in a multi-segment record contains specifications for one segment, beginning with segment 0. Header files must contain valid segment specification lines for at least as many segments as were indicated in the record line. Any extra segment specification lines are not read by DB library functions.

A segment is simply an ordinary (single-segment) record, with its own header and signal files. By including segments in a multi-segment record, the signals within them can be read by DB applications as if they were continuous signals, beginning with those in segment 0 and continuing with those in segment 1, with no need for the applications to do anything special to move from one segment to another. The only restrictions are that segments cannot themselves contain other segments (they must be single-segment records), and the number of samples per signal must be defined for each segment in the record line of the segment's own header file. In addition, the number of signals and the sampling frequency should match in all segments of a record, and it is best if the signal gain, baseline, units, ADC resolution and zero, and description match for corresponding signals in all segments (these recommendations are not enforced by the DB library, but existing applications are likely to behave unpredictably if they are not followed). Note, however, that it is not necessary to use the same signal storage format in all segments, and significant space savings may be possible in some cases by selecting an optimal format for each segment.

Each segment specification line contains the following fields, separated by whitespace:

record name
A string of characters identifying the single-segment record that comprises the segment. As in the record line, the record name may include letters, digits, and underscores (`_') only.
number of samples per signal
This number must match the number specified in the header file for the single-segment record that comprises the segment.

Info strings

Comment lines that follow the last signal specification line in a header file can be read and written by the DB library functions getinfo and putinfo; the contents of these lines (excluding the initial `#' comment character) are referred to as `info strings'. There must be no whitespace preceding the initial `#' in any line that is to be recognized by getinfo.

Examples:

Example 1 (MIT DB record 100):
100 2 360 650000 0:0:0 0/0/0
100.dat 212 200 11 1024 995 -22131 0 MLII
100.dat 212 200 11 1024 1011 20052 0 V5

# 69 M 1085 1629 x1
# Aldomet, Inderal

This header specifies 2 signals each sampled at 360 Hz, each 650000 samples (slightly over 30 minutes) long. The starting time and date were not recorded; in the example, the defaults are shown, but they might be omitted without changing the meaning of the header file. Each signal is stored in 12-bit bit-packed format (2 samples per 3 bytes; see signal(5) for details), and one file contains both signals. Since the filename given (100.dat) does not include path information, DB library-based programs will find the signal file only if it is located in one of the directories specified by the DB environment variable. The gain for each signal was the (default) 200 ADC units per millivolt (the default physical unit), and the ADC had 11-bit resolution and an offset such that its output was 1024 ADC units given an input exactly in the middle of its range. The baseline is not given explicitly, but may be assumed to be equal to the ADC zero value of 1024. The first samples acquired had values of 995 and 1011 (i.e., both signals began slightly below 0 VDC). The checksums of the 650000 samples are -22131 and 20052, and I/O may be performed in blocks of any desired size (since the block size fields are zero). The signal descriptions specify which leads were used (MLII: modified lead II). Finally, the last two lines contain `info strings'. (In this example, the first info string specifies the sex and age of the subject and data about the recording, and the second lists the subject's medications. The contents and format of info strings vary between databases; it is not wise to rely on the presence of specific data in info strings, since their use in header files is optional.)

Example 2 (AHA DB record 7001):
7001 2 250 525000
/db1/data0/d0.7001 8 100 10 0 -53 -1279 0 ECG signal 0
/db1/data1/d1.7001 8 100 10 0 -69 15626 0 ECG signal 1

This header illustrates how on-line AHA DB records were formerly kept at MIT. Note that the sampling frequency and ADC specifications differ from the previous example. In this example, each signal is kept in its own signal file, specified by its absolute pathname. As shown here, AHA DB records may be kept in 8-bit first difference format, but the sampling rate requires that the signals be scaled down (from 12-bit to 10-bit ADC resolution) to stay within the slew rate limits imposed by the format. Note that signal checksums (-1279 and 15626 in this example) are derived from the reconstructed sample values, and not from the first differences; thus they should not change if the signals are reformatted.

Example 3 (Local record 8l):
8l 16
data0 8
data1 8
...
data15 8

This example illustrates how relative pathnames can be used for user-created records. If data* files in the proper format are created in any of the directories named by the DB environment variable, they become the signal files for record 8l.

Example 4 (Piped record 16x4):
# Piped record 16x4. Use this record to read or write 4 signals
# using the standard I/O.
16x4 4
- 16
- 16
- 16
- 16

This example illustrates several features not seen in the earlier examples. The special file name `-' means that samples will be read from the standard input or written to the standard output when using this record. All four signals are associated with the same file. The signals are kept in 16-bit amplitude format. The example includes two comment lines, which are ignored by the DB library functions that read header files.

Example 5 ("ahatape" header file):
# Use this record on a UNIX system to read directly
# from a 9-track AHA DB distribution tape with
# 4096-byte blocks. The tape must be positioned
# to the beginning of the ECG data file before
# using this record.

ahatape 2 250
/dev/nrmt0 16 0 12 0 0 0 4096
/dev/nrmt0 16 0 12 0 0 0 4096

As in the previous example, both signals are associated with the same file; in this case, the file is /dev/nrmt0, the non-rewinding raw 9-track tape drive (on some systems, the name of this device may differ). The block size must be specified in this case, since I/O to or from a raw device (character special file) is not buffered by the operating system and must be performed in the units appropriate to the device (in this case, the tape block size). AHA DB tapes written at 1600 bpi contain 4096 bytes per block (i.e., 1024 two-byte samples from each of the two signals).

Example 6 ("multi" header file):
multi/3 2 360 45000
100s 21600
null 1800
100s 21600

This header file is a sample of a multi-segment record. The first line contains the record name ("multi"), the number of segments (there are 3), the number of signals (2; this must be the same in each segment), the sampling frequency (360), and the total length of the record in sample intervals (45000; this must be the sum of the segment lengths).

The second line contains the record name ("100s") of the first segment of the record, and its length in sample intervals (21600). The third and fourth lines contain the record names and lengths of the remaining segments. The remaining lines are comments.

Note that a segment may appear more than once in a multi-segment record, as in this sample, and that storage formats may vary between segments (the second segment is a "null" record, containing format 0 "signals", and the others are written in format 8).

This record may be read by any DB application built using DB library version 9.1 or later; the application need not be aware that this is a multi-segment record. Earlier versions of the DB library do not support multi-segment records (or format 0 signals).

Old format

Versions 2.3 through 4.6 of the DB library included support for reading header files written in an obsolete format. This support has been removed from DB library version 5.0. Obsolete-format header files can be brought up-to-date using revise (in the convert directory of the DB software distribution).

SEE ALSO

ECG Database Programmer's Guide


Table of Contents