PhysioBank currently contains over 36,000 recordings of annotated, digitized physiologic signals and time series, organized in over 50 databases (collections of recordings). All are freely available from PhysioNet, but where should one begin looking among nearly 4 terabytes of data?
Getting Started
Try the PhysioBank ATM first. It lets you view any of the recordings and annotations in PhysioBank within your web browser. Choose a few records from several different databases to get a sense of the variety of data available. The PhysioBank ATM is better suited to a quick overview than to in-depth study, because generating images on the PhysioNet server is slower than downloading the raw data and processing it on your own computer.
If you choose a record that contains ECG signals, you will probably see "·" (normal sinus beat) annotations and perhaps a few other types of annotations in the ATM output. If the mnemonics used to display these annotations are unfamiliar, you may wish to look them up in a table of the annotation codes displayed by the ATM and other PhysioToolkit software.
Next, get an overview of PhysioBank by looking at the PhysioBank Archive Index, which lists all of the available databases grouped by the types of signals they contain. Choose one of these databases and follow the links to read about it and to gain access to the records it contains (records are described in more detail below).
If you need only a small amount of data for study, the PhysioBank ATM lets you obtain up to 100,000 samples (at least a minute, usually more) of digitized signals of your choice in text format, and it can convert any number of annotations to text format in the same way. You can view and save either type of text output using your web browser.
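The arithmetic behind "at least a minute, usually more" is just a sample count divided by a sampling frequency. The sketch below makes it explicit. It assumes, for illustration only, that the 100,000-sample limit applies to each selected signal; the optional `n_signals` parameter shows how the duration would shrink if the limit were instead shared among signals. The sampling frequencies are hypothetical examples.

```python
def seconds_covered(sample_budget: int, fs_hz: float, n_signals: int = 1) -> float:
    """Return the duration (in seconds) spanned by `sample_budget` samples.

    Assumption (not confirmed by PhysioBank): with n_signals > 1 the budget
    is shared equally, so each signal receives sample_budget / n_signals.
    """
    if fs_hz <= 0 or n_signals <= 0:
        raise ValueError("fs_hz and n_signals must be positive")
    return sample_budget / (fs_hz * n_signals)


if __name__ == "__main__":
    # Hypothetical sampling frequencies, chosen only for illustration.
    for fs in (125, 250, 360, 1000):
        print(f"{fs:>5} Hz: {seconds_covered(100_000, fs):7.1f} s per signal")
```

Even at 1000 Hz, 100,000 samples of one signal cover 100 seconds.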
The PhysioBank ATM lets you explore PhysioBank with nothing more than a web browser, but you should choose appropriate PhysioToolkit software before beginning a project that requires large amounts of data. Much of this software can access PhysioNet and other web servers directly, allowing you to draw on the resources of PhysioBank without having to download and store huge amounts of data in advance.
Downloading
It is worth repeating the last point of the previous section: most of the freely available PhysioToolkit software can read PhysioBank data directly from the PhysioNet web server, without a web browser and without first copying the data to your own disk drive. These applications retrieve only the data needed; it is not necessary, for example, to download an entire 40-hour record in order to study a 5-minute region of interest in the middle of the record. The same applications can read data files from your own disk, so you do not need to learn a new set of tools if and when you decide to download the data that you wish to study intensively. You may be able to save considerable time and disk space by avoiding downloading entirely, or by postponing it until you have explored the data enough to choose a subset of interest.
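One way a client can fetch only a region of interest is an HTTP byte-range request. The sketch below assumes a signal file in WFDB format 16 (each sample a 16-bit little-endian integer, with the samples of all signals interleaved frame by frame) and no prolog bytes at the start of the file. It computes which bytes hold a time window and builds, but does not send, a `urllib.request.Request` carrying a `Range` header. The URL, sampling frequency, and signal count are hypothetical, and this is not a claim about exactly how PhysioToolkit software performs its partial reads.

```python
import urllib.request


def format16_byte_range(start_s: float, dur_s: float, fs_hz: float, n_signals: int):
    """Return (first_byte, last_byte), inclusive, for a time window.

    Assumes format 16: 2 bytes per sample, all signals interleaved in each
    frame, and no header or prolog bytes at the start of the file.
    """
    bytes_per_frame = 2 * n_signals
    first_frame = int(round(start_s * fs_hz))
    n_frames = int(round(dur_s * fs_hz))
    first_byte = first_frame * bytes_per_frame
    last_byte = first_byte + n_frames * bytes_per_frame - 1
    return first_byte, last_byte


def range_request(url: str, first_byte: int, last_byte: int) -> urllib.request.Request:
    """Build (but do not send) an HTTP request for an inclusive byte range."""
    return urllib.request.Request(url, headers={"Range": f"bytes={first_byte}-{last_byte}"})


if __name__ == "__main__":
    # Hypothetical 40-hour, 2-signal, 360 Hz record: fetch 5 minutes
    # starting at the 20-hour mark instead of the whole file.
    first, last = format16_byte_range(20 * 3600, 5 * 60, 360, 2)
    req = range_request("http://example.org/hypothetical/rec.dat", first, last)
    print(req.get_header("Range"))  # bytes=103680000-104111999
    total = 40 * 3600 * 360 * 2 * 2
    print(f"{last - first + 1} of {total} bytes")  # 432000 of 207360000 bytes
```

Here the 5-minute window is about 0.2% of the file, which is why retrieving only the needed bytes matters for long records.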
Before downloading large amounts of data, consider using one of the PhysioNet mirrors if your connection to the master PhysioNet web server (at MIT in Cambridge, Massachusetts) is slow. (Some mirrors provide only the PhysioBank Core Collection; if you use one of these mirrors and follow a link to a recording outside the core collection, you will be redirected to the master server.)
Once you have chosen data to download, be sure to download all of the files associated with each record of interest. There are at least 2 and usually 3 or more such files in each case; see the individual listings for each database for details. The details of how to download a file depend on your browser. Most browsers allow you to download a file by right-clicking on the link to the file you wish to download and then choosing Save Target As... (or similar) from the pop-up menu that appears; another method supported by most browsers is to press and hold the Shift key while left-clicking on the link. See the PhysioNet FAQ for additional hints about downloading files.
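Because the header file names the signal file(s), a script can work out most of what to download for a record from the .hea file alone. The sketch below assumes a single-segment record and takes the annotator suffixes (such as atr) as a parameter, since header files do not list annotation files. The function name and the header text are hypothetical, written only for this example.

```python
# Hypothetical header for a two-signal record; not a real PhysioBank file.
EXAMPLE_HEADER = """\
rec01 2 250 75000
rec01.dat 16 200 12 0 0 0 0 ECG I
rec01.dat 16 200 12 0 0 0 0 ECG II
"""


def files_for_record(record: str, header_text: str, annotators: list) -> list:
    """List the files of a single-segment record, header file first.

    Signal file names are taken from the first field of each signal line;
    annotation file suffixes must be supplied by the caller.
    """
    lines = [ln.strip() for ln in header_text.splitlines()]
    lines = [ln for ln in lines if ln and not ln.startswith("#")]  # drop comments
    if not lines:
        raise ValueError("header contains no record line")
    record_fields = lines[0].split()
    if "/" in record_fields[0]:
        raise ValueError("multi-segment records are not handled in this sketch")
    n_signals = int(record_fields[1]) if len(record_fields) > 1 else 0
    files = [f"{record}.hea"]
    for signal_line in lines[1:1 + n_signals]:
        name = signal_line.split()[0]
        if name not in files:  # several signals often share one signal file
            files.append(name)
    files.extend(f"{record}.{ann}" for ann in annotators)
    return files


if __name__ == "__main__":
    print(files_for_record("rec01", EXAMPLE_HEADER, ["atr"]))
    # ['rec01.hea', 'rec01.dat', 'rec01.atr']
```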
The PhysioBank ATM can package all of the files associated with any single record into a tar archive or a zip file. This may be particularly convenient for downloading the multi-segment records of the MIMIC and MIMIC II databases, each of which may contain hundreds of files.
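After downloading an archive, a quick sanity check is to confirm that every signal file has a matching header file. The sketch below uses Python's standard `zipfile` module. The convention it checks (a .hea file with the same base name as each .dat file) is an assumption that fits simple records and segment files, not a rule stated here for every database, and the archive in the example is built in memory from hypothetical file names.

```python
import io
import posixpath
import zipfile


def dat_files_missing_headers(archive: zipfile.ZipFile) -> list:
    """Return .dat members that have no .hea member with the same base name."""
    names = set(archive.namelist())
    missing = []
    for name in sorted(names):
        stem, ext = posixpath.splitext(name)
        if ext == ".dat" and stem + ".hea" not in names:
            missing.append(name)
    return missing


if __name__ == "__main__":
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # Hypothetical member names; the file contents do not matter here.
        for name in ("seg_0001.hea", "seg_0001.dat", "seg_0002.dat"):
            zf.writestr(name, b"")
    with zipfile.ZipFile(buf) as zf:
        print(dat_files_missing_headers(zf))  # ['seg_0002.dat']
```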
If you wish to download all of the files in any of the PhysioBank databases without selecting each one individually, try using a utility for batch HTTP transfers such as wget, which is available in source form for all versions of UNIX and as a precompiled binary for MS-Windows. Most Linux distributions include wget. Once you have installed wget, retrieve a batch of files using a command such as

    wget -r -np http://physionet.cps.unizar.es/physiobank/database/mitdb/

(or substitute the name of a nearby PhysioNet mirror for physionet.cps.unizar.es above). See the PhysioNet FAQ for other ways to download complete databases efficiently.
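To make the two wget options concrete: -r follows links recursively, and -np ("no parent") refuses to climb above the starting directory. The sketch below imitates that behaviour with Python's standard `html.parser` and `urllib.parse`. Page fetching is passed in as a function so the example runs offline against a hypothetical in-memory site; treating links that end in "/" as directories, and skipping links with query strings (such as the sort links on some directory index pages), are assumptions about how the index pages are laid out.

```python
from html.parser import HTMLParser
from typing import Callable
from urllib.parse import urldefrag, urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag on a page."""

    def __init__(self) -> None:
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(v for k, v in attrs if k == "href" and v)


def crawl(start: str, fetch: Callable[[str], str]) -> list:
    """Return file URLs reachable from `start` without leaving its subtree."""
    if not start.endswith("/"):
        raise ValueError("start must be a directory URL ending in '/'")
    seen_dirs, files, queue = set(), set(), [start]
    while queue:
        page = queue.pop()
        if page in seen_dirs:
            continue
        seen_dirs.add(page)
        collector = LinkCollector()
        collector.feed(fetch(page))
        collector.close()
        for href in collector.links:
            url = urldefrag(urljoin(page, href)).url
            if not url.startswith(start) or urlsplit(url).query:
                continue  # outside the start directory ("no parent"), or a sort link
            if url.endswith("/"):
                queue.append(url)  # subdirectory: follow it recursively
            else:
                files.add(url)
    return sorted(files)


# A hypothetical two-page site, served from memory instead of the network.
SITE = {
    "http://example.org/db/": '<a href="../">Parent</a> <a href="?C=N;O=D">Name</a>'
    ' <a href="100.hea">100.hea</a> <a href="100.dat">100.dat</a> <a href="sub/">sub/</a>',
    "http://example.org/db/sub/": '<a href="/db/">up</a> <a href="x.atr">x.atr</a>',
}

if __name__ == "__main__":
    for url in crawl("http://example.org/db/", SITE.__getitem__):
        print(url)
```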
Be sure to download the text file wfdbcal, which contains information about the customary scales used by software such as WAVE and pschart for displaying or plotting signals of various types. Install it in the same directory as the data files.
Databases
In this context, a database is simply a collection of recordings (records), available as a set of flat files. In contrast to typical relational databases, PhysioBank databases consist of relatively small numbers (tens to thousands) of records that may each be quite large (in some PhysioBank databases, a record can occupy a gigabyte or more, although typical records are a few megabytes).
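The size of a signal file follows from simple arithmetic: duration × sampling frequency × number of signals × bytes per sample. The sketch below computes it for two hypothetical records: a half-hour, two-signal recording with 12-bit samples packed two to every three bytes (1.5 bytes per sample, as in WFDB format 212), and a 40-hour, 12-signal recording stored at 2 bytes per sample. The parameters are illustrative, not taken from any particular database.

```python
def signal_file_bytes(duration_s: float, fs_hz: float, n_signals: int,
                      bytes_per_sample: float) -> int:
    """Approximate size of an uncompressed signal file, ignoring any prolog."""
    return round(duration_s * fs_hz * n_signals * bytes_per_sample)


if __name__ == "__main__":
    # Hypothetical half-hour record: 2 signals at 360 Hz, 1.5 bytes/sample.
    short_rec = signal_file_bytes(30 * 60, 360, 2, 1.5)
    # Hypothetical 40-hour record: 12 signals at 500 Hz, 2 bytes/sample.
    long_rec = signal_file_bytes(40 * 3600, 500, 12, 2)
    print(f"{short_rec / 1e6:.2f} MB, {long_rec / 1e9:.2f} GB")  # 1.94 MB, 1.73 GB
```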
Many of the databases currently in the PhysioBank Archives were developed at MIT and at Boston's Beth Israel Hospital (now the Beth Israel Deaconess Medical Center) and have previously been distributed in CD-ROM format. All of these databases are available in their entirety from these archives. The support provided to PhysioBank by the NIH makes it possible for us to provide free access to these databases via PhysioNet to the research community.
About Records
Each database consists of a set of records (recordings), each identified by a record name. Lists of record names are available for each database. In most cases, a record consists of at least three files, named using the record name followed by distinct suffixes (extensions) that indicate their contents. For example, the MIT-BIH Arrhythmia Database includes record 100; the three files 100.atr, 100.dat, and 100.hea together make up record 100. Almost all records include a binary .dat (signal) file containing digitized samples of one or more signals; these files can be very large. The .hea (header) file is a short text file that describes the signals (including the name or URL of the signal file, storage format, number and type of signals, sampling frequency, calibration data, digitizer characteristics, record duration, and starting time).

Most records include one or more binary annotation files (in the example, .atr denotes an annotation file). Annotation files contain sets of labels (annotations), each of which describes a feature of one or more signals at a specified time in the record. For example, 100.atr contains an annotation for each QRS complex (heart beat) in the recording, indicating its location (time of occurrence) and type (normal, ventricular ectopic, etc.), as well as other annotations that indicate changes in the predominant cardiac rhythm and in signal quality. In other databases, annotations mark other features of the signals.
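To make the header description concrete, the sketch below parses the record line and signal lines of a hypothetical single-segment header and converts a raw sample to physical units as physical = (adc - baseline) / gain. It handles only the simple layout shown (record name, number of signals, sampling frequency, number of samples; then file name, format, gain with optional baseline and units, ADC resolution, ADC zero, initial value, checksum, block size, description). It assumes that an omitted baseline equals the ADC zero; check the full syntax in header(5) before relying on it.

```python
import re
from dataclasses import dataclass

# Hypothetical header text, written for this example only.
EXAMPLE_HEADER = """\
rec01 2 250 75000
rec01.dat 16 200(0)/mV 12 0 0 0 0 ECG lead I
rec01.dat 16 100/mV 12 10 0 0 0 ECG lead II
# comment lines begin with '#'
"""

# gain, optional "(baseline)", optional "/units", e.g. "200(0)/mV" or "100/mV"
_GAIN = re.compile(r"^([-+.\deE]+)(?:\((-?\d+)\))?(?:/(\S+))?$")


@dataclass
class Signal:
    file_name: str
    fmt: str
    gain: float      # ADC units per physical unit
    baseline: int    # ADC value corresponding to 0 physical units
    units: str
    description: str

    def to_physical(self, adc: int) -> float:
        return (adc - self.baseline) / self.gain


def parse_header(text: str):
    """Return (record name, sampling frequency, samples per signal, signals)."""
    lines = [ln.strip() for ln in text.splitlines()]
    lines = [ln for ln in lines if ln and not ln.startswith("#")]
    rec = lines[0].split()
    name, n_sig = rec[0], int(rec[1])
    fs = float(rec[2].split("/")[0])  # ignore any counter-frequency suffix
    n_samples = int(rec[3]) if len(rec) > 3 else None
    signals = []
    for ln in lines[1:1 + n_sig]:
        f = ln.split()
        m = _GAIN.match(f[2])
        if m is None:
            raise ValueError(f"unrecognised gain field: {f[2]!r}")
        adc_zero = int(f[4]) if len(f) > 4 else 0
        # Assumption: an omitted baseline defaults to the ADC zero.
        baseline = int(m.group(2)) if m.group(2) is not None else adc_zero
        signals.append(Signal(f[0], f[1], float(m.group(1)), baseline,
                              m.group(3) or "", " ".join(f[8:])))
    return name, fs, n_samples, signals


if __name__ == "__main__":
    name, fs, n, sigs = parse_header(EXAMPLE_HEADER)
    print(name, fs, n / fs, "s")            # rec01 250.0 300.0 s
    print(sigs[0].to_physical(100), sigs[0].units)  # 0.5 mV
```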
The PhysioBank Core Collection
Since PhysioBank is very large and still growing, some PhysioNet mirrors provide only a subset of PhysioBank, known as the PhysioBank Core Collection. PhysioBank has been designed so that visitors to these mirrors are redirected to the master PhysioNet server when following a link to a PhysioBank record outside of the core collection. You may not notice that any redirection has occurred unless your connection to the master server is significantly slower than your connection to the mirror.
Currently the PhysioBank Core Collection includes all of PhysioBank except for the most recently added databases and a few extremely large databases.
PhysioToolkit Software
PhysioToolkit's open-source WFDB software for reading and analyzing PhysioBank data runs on, and is freely available for, FreeBSD, GNU/Linux, Mac OS X, MS-Windows, Solaris, and most other popular operating systems.
If you wish to write your own software to read PhysioBank data, you are encouraged to download the WFDB library, a portable set of functions for reading and writing signal, annotation, and header files in the formats used in PhysioBank, among others. The WFDB library can be used by software written in C, C++, Fortran, Java, Matlab, Perl, Python, and other languages. Apart from the immediate savings in effort, using the WFDB library rather than writing your own code to read PhysioBank files has a further advantage: support for new file formats and new file access methods is added to the library from time to time, and your software can gain this support simply by being recompiled or relinked with the latest version of the WFDB library.
A workable alternative is to use rdsamp(1) and rdann(1) to convert any desired portions of PhysioBank files into an easy-to-process text format. Sources for these programs are included in the WFDB Software Package; binaries are also available for several popular operating systems. You may also run these utilities on our web server without downloading the data files or software first; in this case, you can capture their output using your browser. Visit the PhysioBank ATM to obtain digitized samples or annotations as text.
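Text output captured from rdsamp or the ATM is easy to load with a few lines of code. The sketch below assumes whitespace-separated columns with one row per sample (for example, a time column followed by one column per signal) and skips any line that does not parse entirely as numbers, such as column headings; a data row containing any non-numeric field is skipped too. The layout of real output depends on the options used, so check it first. The sample text is hypothetical.

```python
def parse_columns(text: str) -> list:
    """Return numeric columns from whitespace-separated text output."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        try:
            rows.append([float(x) for x in fields])
        except ValueError:
            continue  # heading or other non-numeric line
    if not rows:
        return []
    width = len(rows[0])
    if any(len(r) != width for r in rows):
        raise ValueError("rows have differing numbers of columns")
    return [list(col) for col in zip(*rows)]


# Hypothetical text output with two heading lines and three samples.
SAMPLE_OUTPUT = """\
  'Elapsed time'  'ECG'  'BP'
  'seconds'       'mV'   'mmHg'
   0.000  -0.145  81.2
   0.004  -0.150  81.5
   0.008  -0.160  82.0
"""

if __name__ == "__main__":
    time_s, ecg, bp = parse_columns(SAMPLE_OUTPUT)
    print(f"{len(time_s)} samples, mean BP {sum(bp) / len(bp):.2f}")  # 3 samples, mean BP 81.57
```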
If, despite the above, you still wish to read PhysioBank files directly (perhaps because C, C++, Fortran, Java, Matlab, Perl, Python, and the other languages are all unavailable in your environment), see the file format specifications (annot(5), wfdbcal(5), header(5), and signal(5), in the WFDB Applications Guide).
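As an indication of what reading a format directly involves, the sketch below decodes the basic layout of an MIT-format annotation file as we understand annot(5): a sequence of 16-bit little-endian words, each holding a 6-bit annotation code in its high bits and a 10-bit time interval in its low bits. Only the END, SKIP, and AUX cases are handled; NUM, SUB, and CHN words are ignored rather than applied, and the mnemonic table is deliberately partial. Treat it as a sketch to check against annot(5), not a replacement for the WFDB library. The `encode_word` helper and the byte string in the example are constructed by hand.

```python
import struct

END, SKIP, NUM, SUB, CHN, AUX = 0, 59, 60, 61, 62, 63
MNEMONICS = {1: "N", 5: "V", 28: "+"}  # partial: normal, PVC, rhythm change


def decode_annotations(data: bytes) -> list:
    """Decode (time, code, mnemonic, aux) from MIT-format annotation bytes."""
    anns, t, i = [], 0, 0
    while i + 2 <= len(data):
        (word,) = struct.unpack_from("<H", data, i)
        i += 2
        code, interval = word >> 10, word & 0x3FF
        if code == END and interval == 0:
            break  # end of file
        if code == SKIP:
            # 32-bit interval: high 16-bit word first, each word low byte first
            high, low = struct.unpack_from("<HH", data, i)
            i += 4
            skip = (high << 16) | low
            if skip >= 1 << 31:
                skip -= 1 << 32  # interpret as signed
            t += skip
        elif code == AUX:
            aux = data[i:i + interval].rstrip(b"\0")
            i += interval + (interval & 1)  # odd lengths are padded by one byte
            if anns:
                anns[-1]["aux"] = aux.decode("latin-1")
        elif code in (NUM, SUB, CHN):
            pass  # field modifiers for the previous annotation: ignored here
        else:
            t += interval
            anns.append({"time": t, "code": code,
                         "mnemonic": MNEMONICS.get(code, str(code)), "aux": ""})
    return anns


def encode_word(code: int, interval: int) -> bytes:
    """Hypothetical helper for building test input."""
    return struct.pack("<H", (code << 10) | interval)


if __name__ == "__main__":
    data = (encode_word(1, 100) + encode_word(5, 200) + encode_word(28, 50)
            + encode_word(AUX, 3) + b"(AB\x00"
            + encode_word(SKIP, 0) + struct.pack("<HH", 0, 1000)
            + encode_word(1, 10) + encode_word(END, 0))
    for a in decode_annotations(data):
        print(a["time"], a["mnemonic"], a["aux"])
```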