An Introduction to the PhysioBank Archives

PhysioBank currently contains over 36,000 recordings of annotated, digitized physiologic signals and time series, organized in over 50 databases (collections of recordings). All are freely available from PhysioNet, but where should one begin looking among nearly 4 terabytes of data?

Getting Started

Try the PhysioBank ATM first. It lets you view any of the recordings and annotations in PhysioBank within your web browser. Choose a few records from several different databases to get a sense of the variety of data available. The PhysioBank ATM is best suited to a quick overview rather than in-depth study, because generating images on the PhysioNet server is slower than downloading the raw data and processing it on your own computer.

If you choose a record that contains ECG signals, you will probably see "·" (normal sinus beat) annotations and perhaps a few other types of annotations in the ATM output. If the mnemonics used to display these annotations are unfamiliar, you may wish to look them up in a table of the annotation codes displayed by the ATM and other PhysioToolkit software.

Next, get an overview of PhysioBank by looking at the PhysioBank Archive Index. All of the available databases are listed there, grouped by the types of signals each contains. Choose one of these databases and follow the links to read about it and to gain access to the records it contains (see About records, below).

If you need only a small amount of data for study, the PhysioBank ATM allows you to obtain up to 100,000 samples (at least a minute, usually more) of digitized signals of your choice in text format, and it similarly converts any amount of annotations into text format. You can view and save either type of text output using your web browser.

Using the PhysioBank ATM allows you to explore PhysioBank with nothing more than a web browser, but you should choose appropriate PhysioToolkit software before beginning a project that requires large amounts of data. Much of this software can access PhysioNet and other web servers directly, allowing you to draw on the resources of PhysioBank without having to download and store huge amounts of data in advance.

Downloading

It is worth repeating the last point of the previous section: most of the freely available PhysioToolkit software can read PhysioBank data directly from the PhysioNet web server, without a web browser and without first copying the data to your own disk. These applications retrieve only the data they need (it is not necessary, for example, to download an entire 40-hour record in order to study a 5-minute region of interest in the middle of it). The same applications can also read data files from your own disk, so you will not need to learn a new set of tools if and when you decide to download the data you wish to study intensively. You may save considerable time and disk space by avoiding downloading entirely, or by postponing it until you have explored the data enough to choose a subset of interest.
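To make the saving concrete, the arithmetic below works out which bytes of a signal file hold a 5-minute window taken from the middle of a 40-hour record. It is a minimal sketch under assumptions not stated on this page (two signals stored in WFDB format 212, a hypothetical 250 Hz sampling frequency, no byte offset in the header), and it is not a description of how the WFDB library itself fetches data; it only shows why a client able to request byte ranges transfers a tiny fraction of the file.

```python
# Sketch: which bytes of a format-212 signal file cover a given time window?
# Assumptions (not taken from this page): two signals, format 212 (two 12-bit
# samples packed into 3 bytes), and no byte offset declared in the header.
# Function names are hypothetical, not part of PhysioToolkit.


def format212_byte_range(start_s, duration_s, fs, nsig=2):
    """Inclusive (first, last) byte positions holding the requested window."""
    if nsig % 2:
        raise ValueError("this sketch handles only an even number of signals")
    bytes_per_frame = 3 * nsig // 2      # each pair of samples takes 3 bytes
    first_frame = round(start_s * fs)    # frame = one sample of every signal
    n_frames = round(duration_s * fs)
    first = first_frame * bytes_per_frame
    last = first + n_frames * bytes_per_frame - 1
    return first, last


def range_header(first, last):
    """Value for a standard HTTP/1.1 Range request header."""
    return f"bytes={first}-{last}"


fs = 250                                  # hypothetical sampling frequency (Hz)
file_size = 40 * 3600 * fs * 3            # whole 40-hour, 2-signal file (bytes)
first, last = format212_byte_range(20 * 3600, 5 * 60, fs)
print(range_header(first, last))          # bytes=54000000-54224999
print(f"{(last - first + 1) / file_size:.2%} of the file")   # 0.21% of the file
```

Under these assumptions the window occupies 225,000 bytes of a 108,000,000-byte file.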

Before downloading large amounts of data, consider using one of the PhysioNet mirrors if your connection to the master PhysioNet web server (at MIT in Cambridge, Massachusetts) is slow. (Some mirrors provide only the PhysioBank core collection; if you use one of these mirrors, you will be redirected to the master server if you follow a link to a recording not in the core collection.)

Once you have chosen data to download, be sure to download all of the files associated with each record of interest. There are at least 2 and usually 3 or more such files in each case; see the individual listings for each database for details. The details of how to download a file depend on your browser. Most browsers allow you to download a file by right-clicking on the link to the file you wish to download and then choosing Save Target As... (or similar) from the pop-up menu that appears; another method supported by most browsers is to press and hold the Shift key while left-clicking on the link. See the PhysioNet FAQ for additional hints about downloading files.
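Outside a browser, fetching every file of a record amounts to joining the record name and each suffix onto the database's base URL. The sketch below uses only the Python standard library; the base URL is borrowed from the wget example later on this page and the suffix list fits MIT-BIH record 100, so both are assumptions to adjust for the database and server you actually use.

```python
# Sketch: build the URL of every file belonging to one record, then fetch them.
# Assumptions: the base URL (taken from the wget example on this page; substitute
# a current server) and the default suffix list, which fits MIT-BIH record 100.
# Check each database's listing for the files its records actually contain.
import os
import urllib.request
from urllib.parse import urljoin


def record_file_urls(base_url, record, suffixes=("hea", "dat", "atr")):
    """Return the URL of each file that makes up `record`."""
    if not base_url.endswith("/"):
        base_url += "/"                  # urljoin drops the last path part otherwise
    return [urljoin(base_url, f"{record}.{sfx}") for sfx in suffixes]


def download_record(base_url, record, dest_dir=".", suffixes=("hea", "dat", "atr")):
    """Save every file of `record` into dest_dir and return the local paths."""
    os.makedirs(dest_dir, exist_ok=True)
    paths = []
    for url in record_file_urls(base_url, record, suffixes):
        path = os.path.join(dest_dir, url.rsplit("/", 1)[-1])
        urllib.request.urlretrieve(url, path)   # network access happens here
        paths.append(path)
    return paths


BASE = "http://physionet.cps.unizar.es/physiobank/database/mitdb/"
for url in record_file_urls(BASE, "100"):
    print(url)
```

Calling `download_record(BASE, "100", "mitdb")` would then save 100.hea, 100.dat and 100.atr in a local mitdb directory.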

The PhysioBank ATM can package all of the files associated with any single record into a tar archive or a zip file. This may be particularly convenient if you wish to download the multi-segment records of the MIMIC or MIMIC II databases, which may contain hundreds of files each.

If you wish to download all of the files in a PhysioBank database without selecting each one individually, try a utility for batch HTTP transfers such as wget, which is freely available in source form for UNIX systems and as a precompiled binary for MS-Windows. Most Linux distributions include wget. Once you have installed wget, retrieve a batch of files using a command such as

wget -r -np http://physionet.cps.unizar.es/physiobank/database/mitdb/

(or substitute the name of a nearby PhysioNet mirror for physionet.cps.unizar.es above). See the PhysioNet FAQ for other ways to download complete databases efficiently.

Be sure to download the text file wfdbcal, which contains information about the customary scales used by software such as WAVE and pschart for displaying or plotting signals of various types. Install it in the same directory as the data files.

Databases

In this context, a database is simply a collection of recordings (records), available as a set of flat files. Unlike typical relational databases, PhysioBank databases consist of relatively small numbers (tens to thousands) of records that may each be quite large (in some PhysioBank databases a single record can occupy a gigabyte or more, although typical records are a few megabytes).

Many of the databases currently in the PhysioBank Archives were developed at MIT and at Boston's Beth Israel Hospital (now the Beth Israel Deaconess Medical Center) and were previously distributed on CD-ROM. All of these databases are available in their entirety from these archives. Support from the NIH makes it possible for us to offer the research community free access to these databases via PhysioNet.

About records

Each database consists of a set of records (recordings), each identified by its record name; lists of the record names in each database are available on PhysioNet. In most cases, a record consists of at least three files, named using the record name followed by distinct suffixes (extensions) that indicate their contents. For example, the MIT-BIH Arrhythmia Database includes record 100, which comprises the three files 100.atr, 100.dat, and 100.hea.

Almost all records include a binary .dat (signal) file containing digitized samples of one or more signals; these files can be very large. The .hea (header) file is a short text file that describes the signals (including the name or URL of the signal file, storage format, number and type of signals, sampling frequency, calibration data, digitizer characteristics, record duration and starting time).

Most records also include one or more binary annotation files (in the example, .atr denotes an annotation file). Annotation files contain sets of labels (annotations), each of which describes a feature of one or more signals at a specified time in the record. 100.atr, for example, contains an annotation for each QRS complex (heart beat) in the recording, indicating its location (time of occurrence) and type (normal, ventricular ectopic, etc.), as well as annotations that mark changes in the predominant cardiac rhythm and in signal quality. In other databases, annotations mark other features of the signals.
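Because the header is plain text, its first lines are easy to read by hand. The sketch below parses the record line and the signal lines of a simple header and derives the record duration from the number of samples and the sampling frequency. It assumes a single-segment record and ignores optional fields; the header text is modeled on record 100 for illustration, and header(5) or the WFDB library remains the reference for the full format.

```python
# Sketch of a reader for the first lines of a header (.hea) file.
# Assumptions: a single-segment record; the record line starts with
# "name nsig fs nsamp", and each signal line starts with
# "file format gain bits zero initval checksum blocksize description".
# Optional fields (base time and date, units, baseline) are ignored here.
from dataclasses import dataclass, field


@dataclass
class SignalInfo:
    file_name: str
    fmt: str
    description: str


@dataclass
class HeaderInfo:
    record: str
    nsig: int
    fs: float
    nsamp: int
    signals: list = field(default_factory=list)

    @property
    def duration_s(self):
        return self.nsamp / self.fs


def parse_header(text):
    # Drop blank lines and '#' comment lines (often patient information).
    lines = [ln.split() for ln in text.splitlines()
             if ln.strip() and not ln.lstrip().startswith("#")]
    name, nsig, fs, nsamp = lines[0][:4]
    info = HeaderInfo(name, int(nsig), float(fs.split("/")[0]), int(nsamp))
    for f in lines[1:1 + info.nsig]:
        info.signals.append(SignalInfo(f[0], f[1], " ".join(f[8:])))
    return info


# Header modeled on MIT-BIH record 100 (values shown for illustration).
HEADER_100 = """100 2 360 650000
100.dat 212 200 11 1024 995 -22131 0 MLII
100.dat 212 200 11 1024 1011 20052 0 V5
# comment lines like this one are skipped
"""

h = parse_header(HEADER_100)
print(h.record, h.fs, [s.description for s in h.signals])
print(f"{h.duration_s / 60:.1f} minutes")     # 30.1 minutes
```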

The PhysioBank Core Collection

Since PhysioBank currently occupies over 700 gigabytes and is growing, some PhysioNet mirrors provide only a subset of PhysioBank, known as the PhysioBank Core Collection. PhysioBank has been designed so that visitors to these mirrors are redirected to the master PhysioNet server when following a link to a PhysioBank record outside of the core collection. You may not notice that any redirection has occurred unless your connection to the master server is significantly slower than your connection to the mirror.

Currently the PhysioBank Core Collection includes all of PhysioBank except for the most recently added databases and a few extremely large databases.

PhysioToolkit Software

PhysioToolkit's open-source WFDB software for reading and analyzing PhysioBank data is freely available for, and runs on, FreeBSD, GNU/Linux, Mac OS X, MS-Windows, Solaris, and most other popular operating systems.

If you wish to write your own software to read PhysioBank data, you are encouraged to download the WFDB library (a portable set of functions for reading and writing signal, annotation and header files in the formats used in PhysioBank, among others). The WFDB library can be used by your own software written in C, C++, Fortran, Java, Matlab, Perl, Python, and other languages. The advantage of incorporating the WFDB library in your software over attempting to write your own code for reading PhysioBank files (apart from the immediate savings of effort) is that support for new file formats and new file access methods is added to the library from time to time, and your software can then incorporate this support simply by recompiling or relinking with the latest version of the WFDB library.

A workable alternative is to use rdsamp(1) and rdann(1) to convert any desired portions of PhysioBank files into an easy-to-process text format. Sources for these programs are included in the WFDB Software Package; binaries are also available for several popular operating systems. You may also run these utilities on our web server without downloading the data files or software first; in this case, you can capture their output using your browser. Visit the PhysioBank ATM to obtain digitized samples or annotations as text.
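Once rdann has converted an annotation file to text (for example, with `rdann -r mitdb/100 -a atr`), a few lines of code turn its output into structured data. The sketch below assumes one annotation per line with whitespace-separated columns (time, sample number, mnemonic, subtype, channel, num) followed by an optional aux string; the sample text is illustrative rather than copied from a particular run.

```python
# Sketch: parse text printed by rdann into (time, sample, code, aux) records.
# Assumed layout per line: time, sample number, mnemonic, subtype, channel,
# num, then an optional aux string. The sample text is illustrative.
from collections import Counter
from typing import NamedTuple, Optional


class Annotation(NamedTuple):
    time: str
    sample: int
    code: str
    aux: Optional[str]


def parse_rdann(text):
    anns = []
    for line in text.splitlines():
        fields = line.split(None, 6)         # aux, if present, is the 7th field
        if len(fields) < 6:
            continue                         # blank or short line
        try:
            sample = int(fields[1])
        except ValueError:
            continue                         # column-heading line
        aux = fields[6] if len(fields) == 7 else None
        anns.append(Annotation(fields[0], sample, fields[2], aux))
    return anns


RDANN_TEXT = """    0:00.050       18     +    0    0    0\t(N
    0:00.214       77     N    0    0    0
    0:01.028      370     N    0    0    0
"""

anns = parse_rdann(RDANN_TEXT)
print(Counter(a.code for a in anns))   # Counter({'N': 2, '+': 1})
```

Splitting with a maximum of six splits keeps an aux string intact even if it contains spaces.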

If, despite the above, you still wish to read PhysioBank files directly (perhaps because C, C++, Fortran, Java, Matlab, Perl, Python, and the other languages are all unavailable in your environment), see the file format specifications (annot(5), wfdbcal(5), header(5), and signal(5), in the WFDB Applications Guide).
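For readers who do take this route, the sketch below decodes the MIT annotation format as described in annot(5): each annotation begins with a 16-bit little-endian word whose top 6 bits are the annotation code and whose low 10 bits are the interval in samples since the previous annotation, with special codes for longer intervals and auxiliary text. It deliberately simplifies (the NUM, SUB and CHN modifiers are skipped, and only four codes are mapped to mnemonics), and the input bytes are constructed for illustration, not read from a real file; check annot(5) before relying on it.

```python
# Sketch of a decoder for MIT-format annotation files (see annot(5)).
# Simplifications (assumptions of this sketch): NUM, SUB and CHN modifiers
# are skipped, and only a few codes are mapped to mnemonics.
import struct

SKIP, NUM, SUB, CHN, AUX = 59, 60, 61, 62, 63
MNEMONIC = {1: "N", 5: "V", 8: "A", 28: "+"}  # small subset of the code table


def decode_mit_annotations(data):
    """Return (sample, mnemonic, aux) tuples from MIT-format annotation bytes."""
    anns, t, pos = [], 0, 0
    while pos + 2 <= len(data):
        (word,) = struct.unpack_from("<H", data, pos)
        pos += 2
        a, i = word >> 10, word & 0x3FF          # 6-bit code, 10-bit interval
        if a == 0 and i == 0:
            break                                # end-of-file marker
        if a == SKIP:
            # 32-bit interval, high 16 bits first, each half little-endian.
            hi, lo = struct.unpack_from("<HH", data, pos)
            pos += 4
            t += struct.unpack("<i", struct.pack("<HH", lo, hi))[0]
        elif a == AUX:
            aux = data[pos:pos + i].rstrip(b"\0").decode("ascii", "replace")
            pos += i + (i & 1)                   # padded to an even length
            if anns:                             # attach to preceding annotation
                anns[-1] = anns[-1][:2] + (aux,)
        elif a in (NUM, SUB, CHN):
            continue                             # ignored in this sketch
        else:
            t += i
            anns.append((t, MNEMONIC.get(a, str(a)), None))
    return anns


def encode_word(a, i):
    """Pack one (code, interval) word; used here only to build sample bytes."""
    return struct.pack("<H", (a << 10) | i)


# Hypothetical bytes built for illustration, not read from a real file.
SAMPLE = (encode_word(28, 18) + encode_word(AUX, 2) + b"(N"
          + encode_word(1, 59)
          + encode_word(SKIP, 0) + struct.pack("<HH", 0, 2000)
          + encode_word(5, 11)
          + encode_word(0, 0))

print(decode_mit_annotations(SAMPLE))
# [(18, '+', '(N'), (77, 'N', None), (2088, 'V', None)]
```

The SKIP word is needed whenever the gap between annotations exceeds 1023 samples, the largest value a 10-bit interval can hold.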