

Reading an ASCII file produced by commercial equipment and marking the QRS complexes

We begin with the file base, which can be downloaded from Neurotraces. The directory contains the file in both compressed (gzip) and uncompressed form; the uncompressed file is about six times as large as the compressed one. [These files may also be downloaded from PhysioNet; choose base (uncompressed, 584 Kb) or base.gz (compressed, 97 Kb).] To avoid conflicts with other files, we will create a directory called Code in which to keep our files. We assume that WFDB has already been installed.

The base file

The file was created by a Nihon Kohden EEG-1100. Similar files can be created with many types of modern neurophysiological equipment. The file is part of a polysomnographic recording, and one of its signals is an electrocardiographic signal. While viewing the recording, we selected a segment and stored it as an ASCII file.

The first step is to take a look at the contents of the file:

[j@localhost Code]$ cat base | more
TimePoints=3400 Channels=19 BeginSweep[ms]=0.00 Sampli
ngInterval[ms]=5.000 Bins/uV=1.000
C3-A2 C4-A1 O1-A2 O2-A1 T1-A1 T2-A1 PG1-PG2 T5-P3 P4-T
6 X1-X2 X3-X4 X5-X6 X7-X8 E-X9 E-X10 E-X11 DC01 DC02 D
C03
    9.56    21.32    -1.47    11.76    -8.82    -8.09
  -26.47     1.47    13.97   -66.18    -6.62   -12.50
  -44.12    33.09   -17.65    44.85 -923529.41 -2205.8
8 -376470.59
   -7.35   -35.29    -7.35     1.47    -7.35    37.50
  -22.06     1.47    19.12     0.00    -3.68   -10.29
  -22.06    31.62   -18.38    43.38 -924264.71 -2941.1
8 -375735.29
   -4.41    -7.35   -12.50     4.41    -2.94    18.38
   -2.94     1.47   -42.65     0.00    -1.47    -2.94
  -55.15    31.62   -19.12    43.38 -924264.71 -2941.1
8 -375000.00
    2.94    -0.74    -2.21     8.09    -4.41     4.41
   32.35     1.47   -63.97   -22.06     0.00     3.68
 -121.32    31.62   -16.91    44.12 -924264.71 -3676.4
7 -376470.59
   -1.47    -8.09    -4.41     8.82     1.47     9.56
--More--

By inspecting the contents of the file we can see that it includes 3400 samples of 19 channels. Since the sampling interval is 5 ms (equivalent to a sampling rate of 200 Hz), we have 17 seconds of recording (3400 samples / 200 samples/second = 17 seconds). We also know that the values are expressed in $\mu$V. The first lines describe the recording and the signals it includes. In our case, the electrocardiographic signal is in the channel labelled X1-X2. Each line has been folded to fit the width of the window: the first line begins with TimePoints..., the second with C3-A2..., the third with 9.56..., and the fourth with -7.35....
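The arithmetic above can be reproduced with a few lines of Python. This is only a sketch: the two strings are the first two lines of base with the folding undone, and parse_header is a hypothetical helper written for this illustration, not part of WFDB.

```python
# First two lines of "base", unfolded (copied from the listing above).
header_line = ("TimePoints=3400 Channels=19 BeginSweep[ms]=0.00 "
               "SamplingInterval[ms]=5.000 Bins/uV=1.000")
label_line = ("C3-A2 C4-A1 O1-A2 O2-A1 T1-A1 T2-A1 PG1-PG2 T5-P3 P4-T6 "
              "X1-X2 X3-X4 X5-X6 X7-X8 E-X9 E-X10 E-X11 DC01 DC02 DC03")


def parse_header(line):
    """Split 'key=value' fields into a dict of floats (hypothetical helper)."""
    fields = {}
    for item in line.split():
        key, value = item.split("=", 1)
        fields[key] = float(value)
    return fields


header = parse_header(header_line)
labels = label_line.split()

# A sampling interval in milliseconds gives the rate in samples per second.
sampling_rate_hz = 1000.0 / header["SamplingInterval[ms]"]
duration_s = header["TimePoints"] / sampling_rate_hz
# Zero-based position of the ECG channel among the labels.
ecg_column = labels.index("X1-X2")

print(sampling_rate_hz, duration_s, ecg_column)
```

The zero-based position of X1-X2 found here is the column number we will need later.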

ASCII files as glue between applications

Because it is a key point in exchanging information, I would like to discuss briefly the use of ASCII files to share neurophysiological recordings:

A recording is represented in ASCII files as a matrix of values. Usually, each column is a different signal, and the file contains as many columns as signals are stored; each row represents the samples of these signals at the same time.

Unless we include time as one of the signals, the file does not record the sampling rate. This is inconvenient because the sampling rate is not stored in a standard location, so we must supply it by hand for further processing. Moreover, we may no longer remember the sampling rate by the time we use the recording.
Adding a header mixes the description of the signals with the samples (the digitized values of the signals) themselves. In our file, the signal descriptions (the first two lines) and the sample values (from the third line to the end of the file) share a single file. Since the format is not standardized, these lines differ in files obtained with other manufacturers' equipment.
When we store polygraphic recordings in ASCII files, the size of the file is an inconvenience too. Consider what would happen if we stored the same information in binary format (3400 samples of 19 channels stored as two-byte integers). Each sample of all 19 channels uses 38 bytes, so a 17-second recording sampled at 200 Hz, such as base, has 3400 samples and needs 129,200 bytes. Our file uses 598,203 bytes, nearly five times as much! File size matters less and less now that large disks are common, but some reduction would still be welcome. We can improve the efficiency of ASCII files by compressing them. A compressed form of base can be downloaded from Neurotraces [or from PhysioNet]; its size is about 100,000 bytes, less than the size of the same signals stored in binary format. It was compressed with gzip; with bzip2, the size is about 60,000 bytes, about half the size of the binary file. Of course, compressing binary files also makes copying, transmitting, and (sometimes) reading them more efficient.
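The size comparison is easy to check. The following sketch uses only the figures quoted in the text (3400 samples, 19 channels, two bytes per value, and 598,203 bytes for the ASCII file); the variable names are mine.

```python
samples = 3400          # TimePoints in the header of base
channels = 19
bytes_per_value = 2     # two-byte integers, as assumed in the text

# One "sample" here means one value for every channel at the same instant.
bytes_per_frame = channels * bytes_per_value
binary_size = samples * bytes_per_frame

ascii_size = 598_203    # size of base, as quoted in the text
ratio = ascii_size / binary_size

print(bytes_per_frame, binary_size, round(ratio, 2))
```

The ratio comes out a little above 4.6, which is why "nearly five times" is a fair description.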

Having considered these inconveniences, let us say something about the benefits of ASCII files.

ASCII files, simple as they are, are a reasonable format when they are compressed (binary files are frequently downloaded in uncompressed form) or when they store short signals.
Since we are not limited by any format, each value can be written with arbitrary precision.
The main benefit, however, is that they can be understood by almost any program, from spreadsheets to sophisticated digital signal processing packages.

In summary, conversion to and from ASCII files is an important feature of any format.

Of course, representing a recording as a matrix of rows and columns does not readily allow a different sampling rate for each signal (to do this, we might define a code to indicate that a signal was not sampled at the time corresponding to a specific row), but even so I can foresee that ASCII files are going to be used for a long time (unless XML is quickly and universally adopted).

Creating a WFDB file

Our first task is to create a WFDB signal file from an ASCII file. To do this, we have a very easy command: wrsamp (something like write samples). Most WFDB applications show a short summary of how they are used if we type the name of the program (only) as a command; wrsamp shows us this description of itself:

[j@localhost Code]$ wrsamp
usage: wrsamp [OPTIONS ...] COLUMN [COLUMN ...]
where COLUMN selects a field to be copied (leftmost field is column 0),
and OPTIONS may include:
 -c          check that each input line contains the same number of fields
 -f N        start copying with line N (default: 0)
 -F FREQ     specify frequency to be written to header file (default: 250)
 -G GAIN     specify gain to be written to header file (default: 200)
 -h          print this usage summary
 -i FILE     read input from FILE (default: standard input)
 -l LEN      read up to LEN characters in each line (default: 1024)
 -o RECORD   save output in RECORD.dat, and generate a header file for
              RECORD (default: write to standard output in format 16, do
              not generate a header file)
 -r RSEP     interpret RSEP as the input line separator (default: \n)
 -s FSEP     interpret FSEP as the input field separator (default: space
              or tab)
 -t N        stop copying at line N (default: end of input file)
 -x SCALE    multiply all inputs by SCALE (default: 1)

A lot of interesting options. We have to indicate the sampling frequency (200 Hz); otherwise the program will assume that it is sampled at 250 Hz.

Electroencephalographic and electromyographic amplitudes are usually expressed in $\mu$V, so one of the options deserves further comment. Here is a more detailed description of this option from the WFDB Applications Guide:

-G n:

Specify the gain (in A/D units per millivolt) for the output
signals (default: 200). This option is useful only in 
conjunction with -o, since it affects the output header
file only. This option has no effect on the output signal
file. If you wish to rescale samples in the signal file, use -x.

Our ASCII file contains sample values in $\mu$V, so there are 1000 A/D units per millivolt, and we should therefore specify a gain of 1000. If we overlook this point, the signal will appear five times larger than it really is (because the default gain is 200). Another interesting option is -x, which rescales the input values themselves. It is important when our file contains values smaller than 1 (that is not the case for our signal).
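To see why the gain matters, here is a minimal sketch of how a reader of the record might turn a stored A/D value back into millivolts. The function to_millivolts and the stored value of 500 are hypothetical, chosen only for illustration; the baseline is assumed to be 0.

```python
def to_millivolts(adu, gain, baseline=0):
    """Convert an A/D value to mV, with gain in A/D units per millivolt."""
    return (adu - baseline) / gain


stored_value = 500   # hypothetical sample: 500 microvolts stored as 500 A/D units

correct = to_millivolts(stored_value, gain=1000)   # header written with -G 1000
wrong = to_millivolts(stored_value, gain=200)      # header written with the default

print(correct, wrong, wrong / correct)
```

With the default gain, the same stored value is read as an amplitude five times too large, although the signal file itself is identical in both cases.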

We know that the ECG is contained in column 9 of our ASCII file, base. (wrsamp numbers the columns beginning at 0.) But will wrsamp be able to detect that the first two lines of the file are not data? Let's see.

[j@localhost Code]$ wrsamp -i base -F 200 -G 1000 -o ecg 9
wrsamp: line 0, column 9 missing
wrsamp: line 1, column 9 improperly formatted

wrsamp detected that the first two lines were not properly formatted and reported each of them. Now let us look at the result:

[j@localhost Code]$ ls ecg*
ecg.dat  ecg.hea
[j@localhost Code]$ cat ecg.hea
ecg 1 200 3402
ecg.dat 16 1000 12 0 0 -25694 0 base, column 9

We created two files: a binary signal file, ecg.dat, that stores the digitized samples of the ECG signal, and a short text header file, ecg.hea, that contains information that will be needed by any WFDB application that reads the signal file.
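Because the header is plain text, it is easy to pick apart. The sketch below handles only the simple two-line layout shown above; the field names follow my reading of the WFDB header format description and should be checked against the WFDB documentation, and parse_simple_header is a hypothetical helper, not a WFDB function.

```python
hea_text = """ecg 1 200 3402
ecg.dat 16 1000 12 0 0 -25694 0 base, column 9
"""


def parse_simple_header(text):
    """Parse a one-signal header of the simple form shown above (sketch only)."""
    lines = text.splitlines()

    # Record line: record name, number of signals, sampling frequency, samples.
    name, n_signals, fs, n_samples = lines[0].split()
    record = {"name": name, "signals": int(n_signals),
              "fs": float(fs), "samples": int(n_samples)}

    # Signal line: eight whitespace-separated fields, then a free-text description.
    (file_name, fmt, gain, adc_res, adc_zero,
     initial, checksum, block_size, description) = lines[1].split(None, 8)
    signal = {"file": file_name, "format": int(fmt), "gain": float(gain),
              "adc_resolution": int(adc_res), "adc_zero": int(adc_zero),
              "initial_value": int(initial), "checksum": int(checksum),
              "block_size": int(block_size), "description": description}
    return record, signal


record, signal = parse_simple_header(hea_text)
print(record["fs"], signal["gain"], signal["description"])
```

The values 200 and 1000 are the ones we passed with -F and -G, which confirms that these two options affect the header file.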

A typical WFDB application reads a record, which is a collection of files that are all related to the same recording. It is important to understand that the name of the record we have just created is ecg, and not the name of either of the files that belong to this record. When we read these files later on, we will refer to them by the record name, ecg, and not by the names of the individual files.

We were lucky that wrsamp rejected the first two lines of base. If we had chosen a different column number, one or both of these lines might have been accepted, and our signal file would have a spurious sample or two at its beginning. Looking back at wrsamp's options, we can see that -f allows us to tell wrsamp where to begin; so in the future, if we know that there are two header lines in our input file, we will add "-f 2" to our wrsamp command.
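The following sketch shows one way to understand the two messages and the effect of skipping the header lines. It is an illustration in Python, not wrsamp's actual logic: check_field and extract_column are hypothetical helpers, and the input is the first four lines of base with the folding undone.

```python
lines = [
    "TimePoints=3400 Channels=19 BeginSweep[ms]=0.00 "
    "SamplingInterval[ms]=5.000 Bins/uV=1.000",
    "C3-A2 C4-A1 O1-A2 O2-A1 T1-A1 T2-A1 PG1-PG2 T5-P3 P4-T6 "
    "X1-X2 X3-X4 X5-X6 X7-X8 E-X9 E-X10 E-X11 DC01 DC02 DC03",
    "9.56 21.32 -1.47 11.76 -8.82 -8.09 -26.47 1.47 13.97 -66.18 -6.62 "
    "-12.50 -44.12 33.09 -17.65 44.85 -923529.41 -2205.88 -376470.59",
    "-7.35 -35.29 -7.35 1.47 -7.35 37.50 -22.06 1.47 19.12 0.00 -3.68 "
    "-10.29 -22.06 31.62 -18.38 43.38 -924264.71 -2941.18 -375735.29",
]


def check_field(line, column):
    """Say whether the requested column exists and holds a number."""
    fields = line.split()
    if column >= len(fields):
        return "missing"
    try:
        float(fields[column])
    except ValueError:
        return "improperly formatted"
    return "ok"


def extract_column(lines, column, first_line=0):
    """Return one column as floats, starting at line first_line (like -f)."""
    return [float(line.split()[column]) for line in lines[first_line:]]


statuses = [check_field(line, 9) for line in lines]
ecg = extract_column(lines, 9, first_line=2)

print(statuses)
print(ecg)
```

Line 0 has only five fields, so column 9 does not exist; line 1 has a label (X1-X2) where a number is expected. Starting at line 2 avoids both problems without relying on the rejection.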

Analyzing the files

At this point we have a WFDB record containing the ECG from our recording. We are interested in determining the heart rate, and the first step is to locate the QRS complexes. We are going to use the command sqrs:

 
[j@localhost Code]$ sqrs -r ecg
[j@localhost Code]$ ls ecg*
ecg.dat  ecg.hea  ecg.qrs

A new file (ecg.qrs) has been added to the ecg record. It is an annotation file that contains the positions of the QRS complexes. We can read the annotations with rdann:

 
[j@localhost Code]$ rdann -r ecg -a qrs | more
    0:00.110       22     N    0    0    0
    0:00.785      157     N    0    0    0
    0:01.450      290     N    0    0    0
    0:02.115      423     N    0    0    0
    0:02.790      558     N    0    0    0
    0:03.450      690     N    0    0    0
    0:04.110      822     N    0    0    0
    0:04.775      955     N    0    0    0
    0:05.445     1089...

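Each line corresponds to one detected QRS complex. The first column gives its time, and the second its sample number (at 200 Hz, sample 22 falls at 0.110 s).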
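Since we set out to obtain the heart rate, here is a minimal sketch of how it could be estimated from this listing. It uses only the eight complete lines shown above and assumes the 200 Hz sampling frequency of our record; it is plain Python for illustration, not a WFDB tool.

```python
rdann_output = """\
    0:00.110       22     N    0    0    0
    0:00.785      157     N    0    0    0
    0:01.450      290     N    0    0    0
    0:02.115      423     N    0    0    0
    0:02.790      558     N    0    0    0
    0:03.450      690     N    0    0    0
    0:04.110      822     N    0    0    0
    0:04.775      955     N    0    0    0
"""

fs = 200.0  # sampling frequency of record ecg, in Hz

# The second field of each line is the sample number of the QRS complex.
qrs_samples = [int(line.split()[1]) for line in rdann_output.splitlines()
               if line.strip()]

# RR intervals in seconds: differences between consecutive QRS positions.
rr_intervals = [(b - a) / fs for a, b in zip(qrs_samples, qrs_samples[1:])]

mean_rr = sum(rr_intervals) / len(rr_intervals)
heart_rate_bpm = 60.0 / mean_rr

print(rr_intervals)
print(round(heart_rate_bpm))
```

The intervals are all close to two thirds of a second, which corresponds to a heart rate of about 90 beats per minute over this short segment.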

In summary

Let us recapitulate what we did in this section:

We had an ASCII file containing a segment of a polygraphic recording. We created a WFDB record with the content of one of its signals (by using wrsamp).
Then we created an annotation file with the positions of the QRS complexes (by using sqrs).

But how can we be confident of the result? WFDB has a very nice tool to view and edit the result: wave. In the next section we are going to edit the result by using it.

The WFDB Software Package includes two QRS detectors, named sqrs and wqrs, and PhysioToolkit offers another one, named ecgpuwave. All of them are used in a similar way, and all of them create an annotation file containing the times of the QRS complexes that they detect. Each has advantages for some types of studies; you can read more about them in the WFDB Applications Guide.


j 2002-12-11