Many readers wish to convert binary data from PhysioBank (PhysioNet's data archive) into text form for further processing. There are many good reasons not to do so. If you are determined to do it anyway, here's how.
If you seek heart rate or interbeat (RR) interval time series, see the RR Intervals, Heart Rate, and HRV Howto. Otherwise, continue reading here.
Signals, annotations, and notes
For most PhysioBank recordings, there are at least three types of data that might be of interest:
- Signals such as ECGs, blood pressure waveforms, and other continuously recorded physiologic signals
- Annotations that describe events within the recording, such as heart beats or apneas
- Notes about the subject (age, gender, diagnoses, medications) or about the recording (signals recorded, type and location of transducers, sampling frequency, resolution, bandwidth), etc.
Choosing how to convert the data
Most PhysioBank signals and annotations are stored in binary formats, and can be read using the WFDB Software Package, a free, open source set of tools that can be downloaded from this web site and run on your computer. Once you have done so, it becomes very easy to convert any of the binary signals or annotations in PhysioBank into text form; the tools can even read the binary data directly from the PhysioNet web server if you haven't already downloaded them.
By default, the tools described below write their outputs into the text windows in which they are run. To save their outputs in files instead, simply redirect them using the ">" operator, as shown in the examples below.
If you need only a small amount of signals or annotations in text format,
you can run a few of the tools in the WFDB Software Package on the PhysioNet
web server, and save their outputs using your web browser. Since the PhysioNet
server is a shared resource, and since the binary data can be transmitted
about 10 times as rapidly as the same data in text format, this method is much
less efficient than running the same tools on your own computer. Since you
don't need to download and install the tools, however, you may find it quicker
to use this method for an initial exploration, or for a project with very
small data requirements. The choice is yours.
Converting signals to text
The tool for converting signals to text is rdsamp. If you have installed the WFDB Software Package, open a terminal window and type
    rdsamp
for brief instructions. For example, to convert all of the signal data for record 100 of the MIT-BIH Arrhythmia Database (mitdb) into a text file named 100.txt, type
    rdsamp -r mitdb/100 >100.txt
It is not necessary to have downloaded the signal and header files that rdsamp needs as input, because (unless rdsamp finds local copies of them) it will read them directly from the PhysioNet web server. The text file output by this command contains three columns (the sample number followed by the samples from each of the two signals in the record); it begins like this:
          0    995   1011
          1    995   1011
          2    995   1011
          3    995   1011
          4    995   1011
          5    995   1011
          6    995   1011
          7    995   1011
          8   1000   1008
          9    997   1008
and continues for a total of 650,000 lines containing 12,350,000 characters.
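Once the samples are in text form, most languages can read them back with a few lines of code. The following is a minimal Python sketch, assuming the whitespace-separated integer layout shown in the excerpt above; the function name parse_rdsamp_raw is hypothetical and is not part of the WFDB Software Package.

```python
from typing import List, Tuple


def parse_rdsamp_raw(text: str) -> List[Tuple[int, ...]]:
    """Parse rdsamp's default text output (hypothetical helper).

    Assumes one row per sample: the sample number followed by one raw
    A/D value per signal, separated by whitespace.
    """
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        rows.append(tuple(int(field) for field in fields))
    return rows


# A few of the lines quoted above for record mitdb/100.
sample_text = """\
      0    995   1011
      1    995   1011
      8   1000   1008
      9    997   1008
"""

rows = parse_rdsamp_raw(sample_text)
sample_numbers = [row[0] for row in rows]
first_signal = [row[1] for row in rows]
second_signal = [row[2] for row in rows]
print(sample_numbers, first_signal, second_signal)
```

To process a whole file, the same function can be applied to the contents of 100.txt. For long records it may be preferable to process the file line by line rather than holding every row in memory.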
The next example illustrates a few of rdsamp's options that may be useful:
    rdsamp -r mitdb/200 -f 5:0 -t 10:30 -p -v >200.txt
Here, the conversion starts five minutes from the beginning of the record (-f 5:0) and ends ten minutes and thirty seconds from the beginning of the record (-t 10:30). The -p option tells rdsamp to convert the sample values from raw A/D converter units (as in the previous example) to physical units, and the -v option tells rdsamp to write a set of column labels at the beginning of its output, which begins like this:
       time     MLII       V1
      (sec)     (mV)     (mV)
    300.000   -0.095   -0.140
    300.003   -0.110   -0.140
    300.006   -0.110   -0.120
    300.008   -0.115   -0.110
    300.011   -0.115   -0.120
    300.014   -0.110   -0.110
    300.017   -0.100   -0.120
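Output written with -p and -v carries two label lines (signal names, then units) ahead of the data. The sketch below shows one way to read it in Python, assuming exactly the layout of the excerpt above and signal names that contain no spaces; parse_rdsamp_physical is a hypothetical helper, not a WFDB function.

```python
from typing import Dict, List, Tuple


def parse_rdsamp_physical(text: str) -> Tuple[Dict[str, List[float]], Dict[str, str]]:
    """Parse `rdsamp -p -v` style output (hypothetical helper).

    Assumes the first non-blank line holds column names, the second holds
    units in parentheses, and every later line holds one float per column.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    names = lines[0].split()
    units = [unit.strip("()") for unit in lines[1].split()]
    data = [[float(value) for value in line.split()] for line in lines[2:]]
    # Turn the row-oriented data into one list per named column.
    columns = {name: [row[i] for row in data] for i, name in enumerate(names)}
    return columns, dict(zip(names, units))


# The first lines of the mitdb/200 excerpt quoted above.
sample_text = """\
   time     MLII       V1
  (sec)     (mV)     (mV)
300.000   -0.095   -0.140
300.003   -0.110   -0.140
300.006   -0.110   -0.120
"""

columns, units = parse_rdsamp_physical(sample_text)
print(units)
print(columns["time"], columns["MLII"])
```

Keeping the units alongside the values makes it harder to confuse physical units with raw A/D units later in an analysis.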
Without installing the WFDB Software Package, you can use the
PhysioBank ATM to run rdsamp on
the PhysioNet server, with a set of options similar to those in the
second example above. Follow the ATM link and select "Show samples
as text" from the toolbox.
The ATM limits the amount of output you can obtain in a single request
to 60,000 samples (at least a minute, often more) in order to limit
the impact on other PhysioNet users; note that this amount can still
exceed a megabyte and may take a while to receive if you have a slow
connection or at times when the PhysioNet server is especially busy.
Converting annotations to text
The tool for converting annotations to text is rdann. If you have installed the WFDB Software Package, open a terminal window and type
    rdann
for brief instructions. For example, to convert all of the reference (atr) annotations for record 100 of the MIT-BIH Arrhythmia Database (mitdb) into a text file named 100.txt, type
    rdann -r mitdb/100 -a atr >100.txt
It is not necessary to have downloaded the annotation and header files that rdann needs as input, because (unless rdann finds local copies of them) it will read them directly from the PhysioNet web server. The text file output by this command contains seven columns (the last column is usually empty, however); it begins like this:
    0:00.050       18     +    0    0    0    (N
    0:00.213       77     N    0    0    0
    0:01.027      370     N    0    0    0
    0:01.838      662     N    0    0    0
    0:02.627      946     N    0    0    0
    0:03.419     1231     N    0    0    0
    0:04.208     1515     N    0    0    0
and continues for a total of 2274 lines containing 97,785 characters.
The next example illustrates a few of rdann's options that may be useful:
    rdann -r mitdb/200 -a atr -f 5:10 -t 10:30 -v >200.txt
Here, the conversion starts five minutes and ten seconds from the beginning of the record (-f 5:10) and ends ten minutes and thirty seconds from the beginning of the record (-t 10:30). The -v option tells rdann to write a set of column labels at the beginning of its output, which begins like this:
        Time   Sample #  Type  Sub Chan  Num  Aux
    5:10.277     111700     N    0    0    0
    5:10.838     111902     N    0    0    0
    5:11.391     112101     N    0    0    0
    5:11.961     112306     N    0    0    0
    5:12.525     112509     N    0    0    0
    5:13.155     112736     N    0    0    0
    5:13.480     112853     +    0    0    0  (B
    5:13.805     112970     V    1    0    0
    5:14.461     113206     N    0    0    0
    5:15.011     113404     V    1    0    0
    5:15.658     113637     N    0    0    0
Each line (after the labels) corresponds to one annotation; most of the annotations in these examples are QRS (heart beat) annotations. The columns are:
- Time: the elapsed time of the annotation from the beginning of the record (expressed as minutes, seconds, and milliseconds; in longer records, this column may include hours and even days).
- Sample #: the same elapsed time, expressed as the number of sample intervals from the beginning of the record.
- Type: a label indicating what type of event is being annotated (N means a normal beat, and V, a premature ventricular ectopic beat; see this table of standard annotation labels for more).
- Sub, Chan, and Num: these columns do not have standard meanings; refer to the documentation for the database you are examining to see how these fields have been used.
- Aux: usually empty, but often used to describe an event more precisely than the Type alone can.
For example, the annotation at 5:13.480 in the example above marks a change in the predominant cardiac rhythm using the Type "+", and the Aux "(B" indicates that the rhythm has changed to ventricular bigeminy (see footnote 2 in the table of standard annotation labels for a list of commonly used rhythm annotation strings).
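The column layout described above can be read back into structured records. The following Python sketch assumes the seven-column layout of the excerpts above, with an optional label line and an Aux field that may be empty. The names Annotation and parse_rdann are hypothetical, and the 360 Hz sampling frequency is an assumption for mitdb records that should be checked against the record's header.

```python
from typing import List, NamedTuple


class Annotation(NamedTuple):
    time: str    # elapsed time as printed, e.g. "5:13.480"
    sample: int  # elapsed time in sample intervals
    type: str    # annotation label, e.g. "N", "V", "+"
    sub: int
    chan: int
    num: int
    aux: str     # empty string when the Aux column is blank


def parse_rdann(text: str) -> List[Annotation]:
    """Parse rdann text output (hypothetical helper)."""
    annotations = []
    for line in text.splitlines():
        # Split into at most seven fields so Aux keeps any internal spaces.
        fields = line.split(None, 6)
        if len(fields) < 6 or not fields[1].isdigit():
            continue  # blank line, or the label line written by -v
        aux = fields[6].strip() if len(fields) > 6 else ""
        annotations.append(
            Annotation(fields[0], int(fields[1]), fields[2],
                       int(fields[3]), int(fields[4]), int(fields[5]), aux)
        )
    return annotations


# Assumed sampling frequency for mitdb records; verify with wfdbdesc.
SAMPLING_FREQUENCY_HZ = 360

# A few of the lines quoted above for record mitdb/200.
sample_text = """\
Time   Sample #  Type  Sub Chan  Num  Aux
5:12.525     112509     N    0    0    0
5:13.155     112736     N    0    0    0
5:13.480     112853     +    0    0    0  (B
5:13.805     112970     V    1    0    0
"""

annotations = parse_rdann(sample_text)
for ann in annotations:
    seconds = ann.sample / SAMPLING_FREQUENCY_HZ
    print(f"{ann.time}  {seconds:9.3f} s  {ann.type}  {ann.aux}")
```

Working from the Sample # column rather than the Time column avoids having to parse time strings whose format changes for longer records.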
Without installing the WFDB Software Package, you can use the
PhysioBank ATM to run rdann on
the PhysioNet server, with a set of options similar to those in the second
example above. Follow the ATM link and select "Show annotations as text"
from the toolbox.
Reading notes
Notes are stored in text format within the header (.hea) files associated with PhysioBank records. If you have installed the WFDB software package, you can read .hea files using wfdbdesc, an application that reformats the contents of .hea files into more easily readable English. Open a terminal window and type
    wfdbdesc
for brief instructions. For example, to read the notes for record 100 of the MIT-BIH Arrhythmia Database (mitdb) into a text file named 100.txt, type
    wfdbdesc mitdb/100 >100.txt
(Omit the ">100.txt" if you wish to read the notes directly in the terminal window.) It is not necessary to have downloaded the header file that wfdbdesc needs as input, because (unless wfdbdesc finds a local copy) it will read it directly from the PhysioNet web server.
Without installing the WFDB Software Package, you can use the PhysioBank ATM to run wfdbdesc on the PhysioNet server. Follow the ATM link and select "Describe record" from the toolbox.
Updated Friday, 14-Oct-2016 22:18:45 CEST
PhysioNet is supported by the National Institute of General Medical Sciences (NIGMS) and the National Institute of Biomedical Imaging and Bioengineering (NIBIB) under NIH grant number 2R01GM104987-09.