How to obtain PhysioBank data in text form

Many readers wish to convert binary data from PhysioBank (PhysioNet's data archive) into text form for further processing. There are many good reasons not to do so. If you are determined to do it anyway, here's how.

If you seek heart rate or interbeat (RR) interval time series, see the RR Intervals, Heart Rate, and HRV Howto. Otherwise, continue reading here.

Signals, annotations, and notes

For most PhysioBank recordings, there are at least three types of data that might be of interest: signals, annotations, and notes.

The distinction between the second and third types of data is that annotations point to local or transient events and identify their locations (times) within a recording, whereas notes describe global or persistent characteristics of an entire recording (or, sometimes, extended segments of a recording).

Choosing how to convert the data

Most PhysioBank signals and annotations are stored in binary formats, and can be read using the WFDB Software Package, a free, open source set of tools that can be downloaded from this web site and run on your computer. Once you have installed the package, it becomes very easy to convert any of the binary signals or annotations in PhysioBank into text form; the tools can even read the binary data directly from the PhysioNet web server if you haven't already downloaded the data files.

By default, the tools described below write their outputs into the text windows in which they are run. To save their outputs in files instead, simply redirect them using the ">" operator, as shown in the examples below.

If you need only a small amount of signals or annotations in text format, you can run a few of the tools in the WFDB Software Package on the PhysioNet web server, and save their outputs using your web browser. Since the PhysioNet server is a shared resource, and since the binary data can be transmitted about 10 times as rapidly as the same data in text format, this method is much less efficient than running the same tools on your own computer. Since you don't need to download and install the tools, however, you may find it quicker to use this method for an initial exploration, or for a project with very small data requirements. The choice is yours.

Converting signals to text

The tool for converting signals to text is rdsamp. If you have installed the WFDB Software Package, open a terminal window and type

rdsamp
for brief instructions. For example, to convert all of the signal data for record 100 of the MIT-BIH Arrhythmia Database (mitdb) into a text file named 100.txt, type
rdsamp -r mitdb/100 >100.txt
It is not necessary to have downloaded the signal and header files that rdsamp needs as input, because (unless rdsamp finds local copies of them) it will read them directly from the PhysioNet web server. The text file output by this command contains three columns (the sample number followed by the samples from each of the two signals in the record); it begins like this:
     0    995    1011
     1    995    1011
     2    995    1011
     3    995    1011
     4    995    1011
     5    995    1011
     6    995    1011
     7    995    1011
     8   1000    1008
     9    997    1008
and continues for a total of 650,000 lines containing 12,350,000 characters.
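
If the converted samples are headed for a spreadsheet or a statistics package, comma-separated values are often easier to import than aligned columns. The following is a minimal sketch, not part of the WFDB Software Package: samples_to_csv is a hypothetical helper name, and it assumes input laid out like the output shown above (whitespace-separated columns, one sample per line).

```shell
# samples_to_csv: hypothetical helper, not a WFDB tool.
# Reads whitespace-separated columns on standard input and
# writes the same values as comma-separated lines.
# Empty lines are skipped.
samples_to_csv() {
    awk 'NF > 0 {
        line = $1
        for (i = 2; i <= NF; i++) line = line "," $i
        print line
    }'
}

# Try it on a few lines copied from the output shown above.
samples_to_csv <<'EOF'
     0    995    1011
     1    995    1011
     8   1000    1008
EOF
```

With the WFDB Software Package installed, the same helper could be placed at the end of a pipeline, for example: rdsamp -r mitdb/100 | samples_to_csv >100.csv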

The next example illustrates a few of rdsamp's options that may be useful:

rdsamp -r mitdb/200 -f 5:0 -t 10:30 -p -v >200.txt
Here, we have started converting samples five minutes from the beginning of the record (-f 5:0) and ending ten minutes and thirty seconds from the beginning of the record (-t 10:30). The -p option tells rdsamp to convert the sample values from raw A/D converter units (as in the previous example) to physical units, and the -v option tells rdsamp to write a set of column labels at the beginning of its output, which begins like this:
  time    MLII    V1
(sec)   (mV)    (mV)
300.000  -0.095  -0.140
300.003  -0.110  -0.140
300.006  -0.110  -0.120
300.008  -0.115  -0.110
300.011  -0.115  -0.120
300.014  -0.110  -0.110
300.017  -0.100  -0.120
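
The time column follows from the sample number and the sampling frequency of the record. The sketch below assumes a sampling frequency of 360 samples per second for this record (a value not stated in the text above, although it is consistent with the time steps in the output); sample_to_seconds is a hypothetical helper name, not a WFDB tool.

```shell
# sample_to_seconds: hypothetical helper, not a WFDB tool.
# Usage: sample_to_seconds SAMPLE_NUMBER [FREQUENCY_HZ]
# The default of 360 Hz is an assumption about this record.
sample_to_seconds() {
    awk -v n="$1" -v fs="${2:-360}" 'BEGIN { printf "%.3f\n", n / fs }'
}

# Under that assumption, five minutes into the record
# is sample 300 * 360 = 108000.
sample_to_seconds 108000
sample_to_seconds 108001
sample_to_seconds 108003
```

Under the 360 Hz assumption, these three calls print 300.000, 300.003, and 300.008, which match the first, second, and fourth rows of the output above.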

Without installing the WFDB Software Package, you can use the PhysioBank ATM to run rdsamp on the PhysioNet server, with a set of options similar to those in the second example above. Follow the ATM link and select "Show samples as text" from the toolbox. The ATM limits the amount of output you can obtain in a single request to 60,000 samples (at least a minute, often more) in order to limit the impact on other PhysioNet users; note that this amount can still exceed a megabyte and may take a while to receive if you have a slow connection or at times when the PhysioNet server is especially busy.

Converting annotations to text

The tool for converting annotations to text is rdann. If you have installed the WFDB Software Package, open a terminal window and type

rdann
for brief instructions. For example, to convert all of the reference (atr) annotations for record 100 of the MIT-BIH Arrhythmia Database (mitdb) into a text file named 100.txt, type
rdann -r mitdb/100 -a atr >100.txt
It is not necessary to have downloaded the annotation and header files that rdann needs as input, because (unless rdann finds local copies of them) it will read them directly from the PhysioNet web server. The text file output by this command contains seven columns (the last column is usually empty, however); it begins like this:
    0:00.050       18     +    0    0    0      (N
    0:00.213       77     N    0    0    0
    0:01.027      370     N    0    0    0
    0:01.838      662     N    0    0    0
    0:02.627      946     N    0    0    0
    0:03.419     1231     N    0    0    0
    0:04.208     1515     N    0    0    0
and continues for a total of 2274 lines containing 97,785 characters.

The next example illustrates a few of rdann's options that may be useful:

rdann -r mitdb/200 -a atr -f 5:10 -t 10:30 -v >200.txt
Here, we have started converting annotations five minutes and ten seconds from the beginning of the record (-f 5:10) and ending ten minutes and thirty seconds from the beginning of the record (-t 10:30). The -v option tells rdann to write a set of column labels at the beginning of its output, which begins like this:
      Time   Sample #  Type  Sub Chan  Num      Aux
    5:10.277   111700     N    0    0    0
    5:10.838   111902     N    0    0    0
    5:11.391   112101     N    0    0    0
    5:11.961   112306     N    0    0    0
    5:12.525   112509     N    0    0    0
    5:13.155   112736     N    0    0    0
    5:13.480   112853     +    0    0    0      (B
    5:13.805   112970     V    1    0    0
    5:14.461   113206     N    0    0    0
    5:15.011   113404     V    1    0    0
    5:15.658   113637     N    0    0    0
Each line (after the labels) corresponds to one annotation; most of the annotations in these examples are QRS (heart beat) annotations. The first column contains the elapsed time of the annotation from the beginning of the record (expressed as minutes, seconds, and milliseconds; in longer records, this column may include hours and even days). The second column also contains the elapsed time, in this case shown as the number of sample intervals from the beginning of the record. The third column contains a label indicating what type of event is being annotated (N means a normal beat, and V, a premature ventricular ectopic beat; see this table of standard annotation labels for more). The Sub, Chan, and Num columns do not have standard meanings; refer to the documentation for the database you are examining to see how these fields have been used. The Aux column, which is usually empty, is often used to describe an event with greater precision than the Type can do. For example, the annotation at 5:13.480 in the example above marks a change in the predominant cardiac rhythm using the Type "+", and the Aux "(B" indicates that the rhythm has changed to ventricular bigeminy (see footnote 2 in the table of standard annotation labels for a list of commonly used rhythm annotation strings).
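
Once annotations are in text form, the columns described above can be processed with ordinary text tools. As a minimal sketch, the following counts how often each annotation type occurs. It assumes the column layout shown above, with elapsed times in the first column; count_annotation_types is a hypothetical helper name, not a WFDB tool.

```shell
# count_annotation_types: hypothetical helper, not a WFDB tool.
# Reads rdann text output on standard input and prints each
# annotation type (third column) with the number of times it occurs.
# Lines whose second column is not a sample number, such as the
# label line written by -v, are skipped.
count_annotation_types() {
    awk '$2 ~ /^[0-9]+$/ { count[$3]++ }
         END { for (t in count) print t, count[t] }' | LC_ALL=C sort
}

# Try it on a few lines copied from the output shown above.
count_annotation_types <<'EOF'
      Time   Sample #  Type  Sub Chan  Num      Aux
    5:12.525   112509     N    0    0    0
    5:13.480   112853     +    0    0    0      (B
    5:13.805   112970     V    1    0    0
    5:14.461   113206     N    0    0    0
EOF
```

With the WFDB Software Package installed, the helper could be fed directly from rdann, for example: rdann -r mitdb/200 -a atr | count_annotation_types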

Without installing the WFDB Software Package, you can use the PhysioBank ATM to run rdann on the PhysioNet server, with a set of options similar to those in the second example above. Follow the ATM link and select "Show annotations as text" from the toolbox.

Reading notes

Notes are stored in text format within the header (.hea) files associated with PhysioBank records. If you have installed the WFDB Software Package, you can read .hea files using wfdbdesc, an application that reformats the contents of .hea files into more easily readable English. Open a terminal window and type

wfdbdesc
for brief instructions. For example, to save the notes for record 100 of the MIT-BIH Arrhythmia Database (mitdb) in a text file named 100.txt, type
wfdbdesc mitdb/100 >100.txt
(Omit the ">100.txt" if you wish to read the notes directly in the terminal window.) It is not necessary to have downloaded the header file that wfdbdesc needs as input, because (unless wfdbdesc finds a local copy) it will read it directly from the PhysioNet web server.
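
Because notes are plain text inside the header file, they can also be picked out of a local copy of a .hea file with standard text tools. The sketch below assumes that notes appear as lines beginning with "#", which is how WFDB header files mark comment lines. header_notes is a hypothetical helper name, and the header text in the demonstration is invented for illustration, not copied from a PhysioBank record.

```shell
# header_notes: hypothetical helper, not a WFDB tool.
# Prints the text of every line that begins with "#" in the header
# read from standard input, with the "#" and any blanks after it removed.
header_notes() {
    sed -n 's/^#[[:space:]]*//p'
}

# Invented header text, for illustration only.
header_notes <<'EOF'
demo 1 250 1000
demo.dat 16 200 12 0 0 0 0 ECG
# example note: subject at rest
# second example note
EOF
```

Given a local header file named 100.hea, the helper would be run as: header_notes <100.hea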

Without installing the WFDB Software Package, you can use the PhysioBank ATM to run wfdbdesc on the PhysioNet server. Follow the ATM link and select "Describe record" from the toolbox.

Updated Friday, 14-Oct-2016 22:18:45 CEST

PhysioNet is supported by the National Institute of General Medical Sciences (NIGMS) and the National Institute of Biomedical Imaging and Bioengineering (NIBIB) under NIH grant number 2R01GM104987-09.