The top questions are answered directly below. If your question is not among these, please see the rest of the FAQ below. It is very detailed and most likely has the answer you need.
Top questions
- 1. I am looking for [...] data or content.
- 2.1. What do these file formats mean? Which files are the data and/or annotations?
- 2.2. How do I read them?
- 3. How do I get a dataset into text/matlab/python so I can process it?
- 4.1. How do I know if the signals are digital or physical?
- 4.2. Does WFDB give digital or physical values?
- 4.3. Digital vs physical - concepts of storing and representing information
- 5. Help, the data are corrupt / How do I download the files?
- 6. What do the signals look like? Can I view them before I download them?
- 7. How can I report a problem with Physionet?
1. I am looking for [...] data or content.
You have several options:
- Browse the PhysioBank signal archives, which list the databases sorted by signal category.
- Use the keyword search, which is a Google search of PhysioNet. Type your keywords in the search bar at the top right of the web page to search the website’s content.
- Use the PhysioBank record search. Specific instructions are on the page. For more information about the PhysioBank record search, see the page about the physiobank-index, the large metadata file the search is based on.
- Explore MIMIC-III, a massive healthcare dataset collected from over 40000 critical care patients. MIMIC-III is not part of PhysioBank but a project in PhysioNetWorks, so it is not completely openly accessible. Users must sign a data use agreement and apply for access. Some subsets of older versions of MIMIC are part of PhysioBank and can be found in the signal archives.
If you are looking for a very modern or unconventional type of recording, note that PhysioNet does not have every type of data. All databases hosted by PhysioNet are listed in the PhysioBank signal archives directory mentioned above.
Do not email the PhysioNet webmaster asking them to find data for you. All the search resources are mentioned above.
2.1. What do these file formats mean? Which files are the data and/or annotations?
The data and annotations in most PhysioBank databases are stored in a Waveform Database (WFDB) format, which contains two standard categories:
- MIT Format
  - MIT Signal files (.dat) are binary files containing samples of digitized signals. These store the waveforms, but they cannot be interpreted properly without their corresponding header files. These files are in the form: RECORDNAME.dat.
  - MIT Header files (.hea) are short text files that describe the contents of associated signal files. These files are in the form: RECORDNAME.hea.
  - MIT Annotation files are binary files containing annotations (labels that generally refer to specific samples in associated signal files). Annotation files should be read with their associated header files. If you see files in a directory called RECORDNAME.dat or RECORDNAME.hea, any other file with the same name but a different extension, for example RECORDNAME.atr, is an annotation file for that record.
- European Data Format (EDF)
  - EDF files contain digital signals stored in their standard international format. EDF files store their header information at the beginning of the file, as opposed to MIT format, which has a separate header file. Since recent versions of the WFDB library can read them directly, EDF is a WFDB and PhysioBank-compatible format. EDF files may also have associated annotation files. For example, if a directory contains RECORDNAME.edf and RECORDNAME.edf.qrs, the .qrs file is the annotation file associated with the record.
  - EDF+ files are EDF files that also contain annotations encoded as signals.
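As a rough illustration of the naming rules above, here is a small Python sketch that groups a directory listing by record and labels each file's role. The function name classify_record_files and the example file names are made up for this illustration; the sketch relies only on the extension conventions described in this answer and does not use any WFDB tool.

```python
from collections import defaultdict

def classify_record_files(filenames):
    """Group file names by record and label each file's role.

    Conventions assumed (taken from the descriptions above):
      RECORDNAME.dat      -> MIT signal file
      RECORDNAME.hea      -> MIT header file
      RECORDNAME.<other>  -> annotation file for RECORDNAME
      RECORDNAME.edf      -> EDF file (header and signals together)
      RECORDNAME.edf.<x>  -> annotation file for the EDF record
    """
    records = defaultdict(dict)
    for name in filenames:
        if name.endswith(".edf"):
            records[name]["edf"] = name
            continue
        stem, _, ext = name.rpartition(".")
        if not stem:
            continue  # no extension, so not a record file
        if ext == "dat":
            records[stem]["signal"] = name
        elif ext == "hea":
            records[stem]["header"] = name
        else:
            records[stem].setdefault("annotations", []).append(name)
    # Only keep groups that really have a signal, header, or EDF file;
    # otherwise an unrelated file would be mistaken for an annotation.
    return {rec: files for rec, files in records.items()
            if files.keys() & {"signal", "header", "edf"}}
```

For example, calling it on ["100.dat", "100.hea", "100.atr"] groups all three files under the record name "100", with 100.atr listed as an annotation file.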
2.2. How do I read them?
PhysioNet provides the WFDB software package, which is highly useful for reading, writing, and processing the WFDB files described above. See the WFDB Applications Guide for details about its many functionalities.
To read MIT format signals and annotations RECORDNAME.dat, RECORDNAME.hea, and RECORDNAME.qrs:
rdsamp -r RECORDNAME
rdann -r RECORDNAME -a qrs
To read EDF format signals and annotations RECORDNAME.edf and RECORDNAME.edf.atr:
rdsamp -r RECORDNAME.edf
rdann -r RECORDNAME.edf -a atr
There is also the WFDB Matlab toolbox, a Matlab implementation of the WFDB software package. See also the development repository.
Finally, there is the WFDB Python package, which contains functions to read MIT WFDB format signal and annotation files into Python data structures. Release versions are hosted on PyPI and can be installed from your terminal by calling: pip install wfdb.
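Because MIT header files are plain text, you can also peek at one without installing anything. The Python sketch below is a minimal, hedged example: parse_simple_header is a hypothetical helper, the sample header text is made up, and the field layout and defaults it assumes (gain 200, units mV, baseline equal to the ADC zero when not stated) reflect the common simple case only. It ignores multi-segment records and optional modifiers on the format and frequency fields, so prefer the WFDB tools for real work.

```python
import re

# Made-up header text for illustration only; it does not come from any
# PhysioBank record.
SAMPLE_HEADER = """\
# a comment line
rec01 2 250 15000
rec01.dat 16 200(0)/mV 16 0 -12 0 0 ECG lead II
rec01.dat 16 100/mmHg 16 0 80 0 0 ABP
"""

def parse_simple_header(text):
    """Parse the record line and signal lines of a simple MIT header."""
    lines = [ln.strip() for ln in text.splitlines()]
    lines = [ln for ln in lines if ln and not ln.startswith("#")]

    rec = lines[0].split()
    header = {"record": rec[0], "n_sig": int(rec[1]), "signals": []}
    if len(rec) > 2:
        header["fs"] = float(rec[2])       # samples per second per signal
    if len(rec) > 3:
        header["n_samples"] = int(rec[3])  # samples per signal

    for ln in lines[1:1 + header["n_sig"]]:
        f = ln.split(None, 8)              # the description may contain spaces
        adc_zero = int(f[4]) if len(f) > 4 else 0
        # Gain field layout assumed: gain, optional (baseline), optional /units
        m = re.fullmatch(r"([^(/]+)(?:\(([^)]+)\))?(?:/(.+))?",
                         f[2] if len(f) > 2 else "200")
        header["signals"].append({
            "file": f[0],
            "format": f[1],
            "gain": float(m.group(1)) or 200.0,  # 0 means uncalibrated
            "baseline": int(m.group(2)) if m.group(2) else adc_zero,
            "units": m.group(3) or "mV",
            "description": f[8] if len(f) > 8 else "",
        })
    return header
```

Calling parse_simple_header(SAMPLE_HEADER) returns a dictionary holding the record name, the number of signals, the sampling frequency, and one entry per signal with its gain, baseline, and units.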
3. How do I get a dataset into text/matlab/python so I can process it?
There are several ways:
- Install and use our WFDB software package. It is a large collection of software for signal reading, writing, processing and automated analysis. See the WFDB Applications Guide for details about its many functionalities. Some basic commands include: rdsamp, rdann, wfdb2mat. For example, to convert a record into a text file, call: rdsamp -r RECORD -p > RECORD.txt. For more details, see How to obtain PhysioBank data in text format. To convert a record into a Matlab matrix, call: wfdb2mat -r RECORD.
- Install and use the WFDB Matlab toolbox, which is a Matlab implementation of the WFDB software package.
- Use the PhysioBank ATM. Under 'Input' select your database and record. Under 'Output/length' select 'to end'. Under 'Toolbox' select 'Export signals as .mat' or whatever format you want. Note that the page will show the commands from the WFDB software package used to generate the files/graph you request. It is highly recommended to download the WFDB software package, which is full of useful and powerful commands.
- Install and use the WFDB Python package, which contains python functions to read MIT WFDB signals and annotations into python.
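Once you have a text file such as the one produced by rdsamp -r RECORD -p > RECORD.txt, loading it into Python takes only a few lines. The sketch below is one possible way to do it, under stated assumptions: columns are separated by whitespace, the first column is the time or sample number, any line whose first field is not a number is a heading and can be skipped, and a non-numeric entry in a signal column (for example "-") marks a missing sample. The function name parse_rdsamp_text is hypothetical.

```python
import math

def parse_rdsamp_text(text):
    """Turn whitespace-separated sample text into rows of floats.

    Lines whose first field is not numeric (such as column headings)
    are skipped. Non-numeric signal entries become NaN.
    """
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        try:
            first = float(fields[0])
        except ValueError:
            continue  # heading or units line
        row = [first]
        for field in fields[1:]:
            try:
                row.append(float(field))
            except ValueError:
                row.append(math.nan)  # assumed marker for a missing sample
        rows.append(row)
    return rows
```

To use it on a file, read the file first, for example rows = parse_rdsamp_text(open("RECORD.txt").read()), where RECORD.txt is the file you created with rdsamp.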
4.1. How do I know if the signals are digital or physical?
Easy method - just have a look at the signal values:
- If they are all integers in the range [-2^(N-1), 2^(N-1)-1] or [0, 2^N-1], where N is the number of bits, they are probably digital. Compare the values to see if they are in the expected physiological range of the signal you are analyzing. For example, if the header states that the signal is an ECG stored in millivolts, which typically has an amplitude of about 2mV, a signal of integers ranging from -32000 to 32000 probably isn't giving you the physical ECG in millivolts.
- If they are not integers then they are physical. Once again you can quickly compare the values to see if they are in the expected physiological range of the signal you are analyzing.
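The rule of thumb above can be written as a short Python check. This is only a heuristic sketch: the function name guess_representation is made up, and it assumes an N-bit signed range of -2^(N-1) to 2^(N-1)-1 or an unsigned range of 0 to 2^N-1. It cannot replace comparing the values with the physiological range you expect.

```python
def guess_representation(values, adc_bits=16):
    """Guess whether sample values are digital (ADC units) or physical.

    Returns "physical" if any value has a fractional part,
    "probably digital" if all values are integers that fit in adc_bits,
    and "unclear" otherwise.
    """
    if not all(float(v).is_integer() for v in values):
        return "physical"
    lo, hi = min(values), max(values)
    fits_signed = -2 ** (adc_bits - 1) <= lo and hi <= 2 ** (adc_bits - 1) - 1
    fits_unsigned = 0 <= lo and hi <= 2 ** adc_bits - 1
    if fits_signed or fits_unsigned:
        return "probably digital"
    return "unclear"
```

An ECG column holding values like -0.145 and 0.2 comes back as "physical", while a column of integers between -32000 and 32000 comes back as "probably digital".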
4.2. Does WFDB give digital or physical values?
- The WFDB Software Package's (10.5.24) rdsamp produces digital values by default: rdsamp -r RECORDNAME
You can use the -p option to obtain physical values instead: rdsamp -r RECORDNAME -p
- The WFDB Matlab Toolbox's rdsamp produces physical values by default (different from the above). Set the 'rawunits' flag (default 1 for 64 bit double precision floating point physical values) according to your preferences.
- The WFDB Software Package's and the WFDB Matlab Toolbox's wfdb2mat both produce digital values. You can obtain physical values by using the WFDB Matlab Toolbox's rdmat function after calling wfdb2mat.
If you have digital values, you can manually convert all values of -2^(N-1) into NaN, subtract the baseline and then divide by the gain for each channel to obtain physical units, where N is the number of bits. But files using WFDB format 80 store integers from 0 to 255, which actually represent integers from -128 to 127, so you would have to first subtract 128, convert all -128 into NaN, and then subtract the stated baseline and divide by the gain. It is safer in general to use rdsamp -p or wfdb2mat + rdmat, which account for these scenarios.
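For readers who want to see the manual conversion spelled out, here is a minimal Python sketch of the steps just described. The function name digital_to_physical and its parameters are made up for this example, and the gain and baseline are assumed to have been read from the header for the channel in question. It treats the most negative N-bit code as the invalid-sample marker and handles the offset storage used by format 80 through the offset_binary switch.

```python
import math

def digital_to_physical(samples, gain, baseline, n_bits=16, offset_binary=False):
    """Convert digital sample values of one channel to physical units.

    physical = (digital - baseline) / gain
    The most negative n_bits code, -2^(n_bits-1), is mapped to NaN.
    With offset_binary=True (for example format 80, n_bits=8) the stored
    values 0..2^n_bits-1 are first shifted down by 2^(n_bits-1).
    """
    invalid = -2 ** (n_bits - 1)
    physical = []
    for d in samples:
        if offset_binary:
            d -= 2 ** (n_bits - 1)  # e.g. 0..255 becomes -128..127
        physical.append(math.nan if d == invalid else (d - baseline) / gain)
    return physical
```

With a gain of 200 and a baseline of 1024, the digital values 1024 and 1224 map to 0.0 and 1.0 physical units, and -32768 maps to NaN.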
wrsamp
Currently wrsamp only uses integer input values, which are directly written to the digital signal file. It is the reverse of rdsamp, which reads digital values from signal files. All non-integers will be rounded off, so if you input a physical signal of decimals all under 0.5, the output will just be 0's. This is fine if you already happen to have the digital values in text format, but very troublesome if you only have physical values.
- One feature that may help in both instances is the -x option of wrsamp, which multiplies each input channel by a specified factor before writing it to the signal file. Do not confuse this with the -G option, which only affects writing the header file for interpreting the signal after it has been written. See the wrsamp man page (man wrsamp) for more details.
- If you have Matlab, you can use the mat2wfdb function from the WFDB Matlab Toolbox, which automatically chooses and applies appropriate gains and offsets on input Matlab signals before writing the output WFDB file.
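The rounding problem and the idea of scaling before writing can be seen in a small numeric example. The Python sketch below is an illustration only: choose_gain_and_baseline and to_digital are hypothetical helpers, and the way they pick the gain and baseline (spreading the signal over the available integer range) is an assumption made for this example, not the method used by wrsamp or mat2wfdb.

```python
def choose_gain_and_baseline(values, n_bits=16):
    """Pick a gain and baseline that spread values over the integer range.

    The most negative code is left unused so it can mark invalid samples.
    """
    d_min = -2 ** (n_bits - 1) + 1
    d_max = 2 ** (n_bits - 1) - 1
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0              # avoid dividing by zero for flat signals
    gain = (d_max - d_min) / span        # digital units per physical unit
    baseline = round(d_min - lo * gain)  # digital value that represents 0.0
    return gain, baseline

def to_digital(values, gain, baseline, n_bits=16):
    """Scale, shift, round, and clamp physical values to integers."""
    d_min = -2 ** (n_bits - 1) + 1
    d_max = 2 ** (n_bits - 1) - 1
    return [min(d_max, max(d_min, round(v * gain + baseline))) for v in values]

physical = [0.1, 0.2, 0.4]
rounded_directly = [round(v) for v in physical]   # all information is lost
gain, baseline = choose_gain_and_baseline(physical)
digital = to_digital(physical, gain, baseline)
recovered = [(d - baseline) / gain for d in digital]
```

Here rounded_directly contains only zeros, while recovered matches the input to within one quantization step (1/gain). The gain and baseline must of course be recorded in the header so that the integers can be mapped back.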
4.3. Digital vs physical - concepts of storing and representing information
Researchers want to analyze the actual value of signals, i.e. the value of this ECG signal in millivolts. But to process information using computers, they must collect the signals via some capturing device which discretely samples and digitizes the signals into 2^N levels, where N is the resolution of the device in bits. Each sample captured requires N bits to store, and takes one of 2^N possible integer values. There is also information stored which allows the user/program to map these integers back to the physical values the device managed to capture given its resolution. For example, if they have a 12 bit oscilloscope, they have 4096 levels to capture the range and details of the signal. A higher N allows us to resolve finer details, but requires more storage space per sample.
Because the user wants to analyze the actual value of the signals, we can map these digital values back to the original physical values the device managed to capture. These mapped values can be loaded into an environment like Python, C, or Matlab, which has the double precision floating point (64 bit) variable type that can represent numbers and decimals in very fine detail (2^64 = 1.8447e+19 levels of precision!). Then the user will have 'physical' values to process and apply algorithms on in their highly detailed 64 bit environment. Remember however that the original signal resolution is limited by the capturing device, and is not increased just by loading it into a 64 bit environment.
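To make the last point concrete, the following Python sketch maps every code of a 12 bit device to a 64 bit floating point value and counts how many distinct values result. The helper names and the gain of 200 digital units per physical unit are assumptions chosen for illustration.

```python
def physical_step(gain):
    """Smallest physical difference that the stored integers can represent."""
    return 1.0 / gain

def count_distinct_levels(digital, gain, baseline):
    """Map digital values to floats and count the distinct results."""
    physical = [(d - baseline) / gain for d in digital]
    return len(set(physical))

all_12_bit_codes = list(range(-2 ** 11, 2 ** 11))  # 4096 possible codes
levels = count_distinct_levels(all_12_bit_codes, gain=200, baseline=0)
step = physical_step(200)
```

levels is still 4096 and step is 0.005 physical units: storing the values as 64 bit floats changes how they are represented, not how finely the device measured them.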
We say that signals are in 'physical units' when the values are used to represent the actual real life values as closely as possible, although obviously everything on the computer is digital and discrete rather than analogue and continuous. This includes our precious 64 bit double precision floating point values, but this is as close as we can get and already very close to the actual physical values, so we refer to them as 'physical'.
Binary files such as the WFDB .dat files store signal values as integers, using enough space per sample to retain the signal's original resolution, but not an excessive amount.
For example, if a 15 bit signal is collected via a capturing device, Physionet will likely store it as a 16 bit signal. Each 16 bit block stores an integer value between -2^15 and 2^15-1, and using the gain and offset stated in the header for each channel, the original physical signal can be mapped out for processing. If we know that the signal only has 15 bits of precision when it was recorded, why not store it as integers in a 16 bit file along with a small header text, rather than waste 4x as much space storing the physical signal using 64 bits per sample? Because the capturing device was exactly 15 bits, assigning more space to allow for storing values that fall between the original values will be wasted and won't make the signal more detailed. Imagine using 5TB vs 20TB of disk space to store the exact same information!
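The storage comparison is simple arithmetic, sketched here in Python. The helper name storage_bytes and the one million sample count are arbitrary choices for the example; header size is ignored because it is tiny by comparison.

```python
def storage_bytes(n_samples, bits_per_sample):
    """Bytes needed to store n_samples at the given sample width."""
    return n_samples * bits_per_sample // 8

n_samples = 1_000_000
as_16_bit_integers = storage_bytes(n_samples, 16)  # digital values in a .dat file
as_64_bit_floats = storage_bytes(n_samples, 64)    # physical values as doubles
ratio = as_64_bit_floats / as_16_bit_integers
```

For one million samples this gives 2,000,000 bytes against 8,000,000 bytes, a ratio of 4, which is the same factor as in the 5TB vs 20TB comparison above.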
5. Help, the data are corrupt / How do I download the files?
No, they’re probably not corrupt. Did you left click on a digital signal file (.edf or .dat) storing the data? That makes your browser view it in text format, which makes no sense. See above for descriptions of file types.
If you want to download the file, right click -> save as. If you want to convert the WFDB or EDF files into another form, see the question about converting file formats, above.
If you want to download an entire database at once, see the downloading-databases section.
6. What do the signals look like? Can I view them before I download them?
You can view all physiobank signals with Lightwave or with the Physiobank ATM.
7. How can I report a problem with Physionet?
If you are experiencing issues when using PhysioNet or if you have a suggestion for improvement, please raise an issue on our issue tracker.
To raise an issue, first navigate to the PhysioNet repository on GitHub. After logging in to GitHub, click on the "Issues" tab, click "New issue", add a title and description of the problem, and select the “Submit new issue” button.
General
- How can I get an answer to my question?
- What is all of this, anyway?
- Who are you?
- Why is PhysioNet here?
- Who can use data and software from PhysioNet?
- Have the PhysioBank data been fully deidentified (anonymized), and may they be used without (further) IRB approval?
- Is all of this really free?
- How can I buy a copy of ...?
- Please send me a copy of ...
- What are the license terms?
- Is this software Y2K-compliant?
- My connection is slow. Is there a mirror?
- Can I set up a mirror?
- Will you post a link to my web site?
Sign-in, Accounts, and Passwords
- Why should I sign in?
- Why would I need an account and how do I get one?
- I can’t log in!
- How can I change my PhysioNetworks password?
- How can I change my MIMIC II Explorer/Query Builder password?
Where is ...
- Where can I find the specific type of data I need?
- Where can I find data for healthy subjects?
- Where can I find serial data (multiple recordings of the same subjects)?
- Where can I find long-duration signals and time series?
- Is the AHA Database available on PhysioNet?
- Are there any 12-lead (diagnostic) ECGs in PhysioBank?
- Are updates for CD-ROM databases of physiologic signals available here?
- Where is [some file]?
- I can’t find [something]!
Downloading
- How can I download binary files?
- Can I download an entire PhysioBank database in one step?
- There are so many files in .... Can I get a zip file or a tar archive of it?
- How can I unpack a .tar.gz archive (a “tarball”)?
- Can I get these files via FTP?
- Can I look at these recordings using only my web browser?
PhysioBank Files
- What are PhysioBank-compatible (or WFDB-compatible) formats?
- What are .dat, .hea, .atr, .qrs, ... files?
- What are .xws files and how can I view them?
- What is a “record name” or an “annotator name”?
- How can I run ... on all of the records in a PhysioBank database?
- Where are the annotation, signal, or header files I just created?
- How were the signals in PhysioBank digitized?
- Should I use PhysioBank formats for my project?
Reading and Writing Digitized Signals
- How can I find out what signals were recorded?
- What do the signal names MLII, V2, ... mean?
- What do the signal names ‘signal 0’, ‘signal 1’, ... mean?
- What is the format of the signal files?
- How can I read signal files?
- How can I use Matlab’s import feature to read signal files?
- Why does Matlab say “file might be corrupt” when loading a huge .mat file?
- Is there any direct way of converting sample values to physical units using wfdb2mat?
- How can I use Excel’s import feature to read signal files?
- Where do I get rdsamp?
- How do I use rdsamp?
- What do the sample values represent?
- What does the error “init: can’t open header for ...” mean?
- I can’t run rdsamp. Can you please send me a copy of ... in text format?
- How can I get more than 100,000 samples?
- How can I create a PhysioBank-compatible record from my own data?
Reading and Writing Annotations
- What is an annotation?
- What do the annotation codes (N, V, S, F, ...) mean?
- What is the format of the annotation files?
- How can I read annotation files?
- What does the error “annopen: can’t read annotator ... for record ...” mean?
- Where do I get rdann?
- How do I use rdann?
- What are the columns in rdann’s output?
- I can’t run rdann. Can you please send me a copy of ... in text format?
- How can I create an annotation file?
- How can I annotate a record?
Software
- I double-clicked on the program icon, and nothing happens!
- I typed the program name in the ‘Run...’ dialog, and nothing happens!
- What is a “standard input” or a “standard output”?
- How can I save the output of ... in a file?
- How can one program read another’s output?
- My question is about WAVE (or gtkwave). Is there a WAVE FAQ?
- I tried to compile ... but the compiler can’t find wfdb.h (or ecgcodes.h, or ecgmap.h).
- I tried to compile ... but the compiler complains that isigopen (or iannopen, or strtim, or wfdbinit) is undefined.
- I’m writing a program to work with PhysioBank data, but my compiler can’t link to the WFDB library. What should I do?
- Where on this site can I find software for my favorite operating system/compiler?
- Can I use your code in my commercial application?
- How should I report a bug?
Help!
- Some links don’t work, but I don’t see any error. Why not?
- I’m having trouble viewing images on this site. Why?
- I’m having trouble printing PostScript files from this site. Why?
- I don’t understand how to use the software or data on this site.
- What’s a man page?
- Are PhysioNet or its mirror maintainers responsible for the content of external sites?
- Why isn’t my question here?
- What’s the magic word?
General
How can I get an answer to my question?
Have you read this FAQ? If not, please take a few minutes to do so. It answers many common questions.
Have you tried searching for key words using the Search tool? All text on the PhysioNet web site is indexed and can be found by searching for it. To do this, type one or more terms related to your topic or question into the search box at the top right corner of this and almost every other page on PhysioNet, then click on the "Search" button to its right.
If you have not found an answer to your question in the PhysioNet FAQ or by a PhysioNet search, you may wish to ask your question by email. If everyone who took the time to ask a question by email first took the time to read How to Ask Questions the Smart Way, we would be able to take the time to answer all of the questions we receive with the detailed and pertinent answers they deserve. Since this will never happen, we give priority in answering questions to those who have read this FAQ. How can we tell who has done so? That’s easy; we look for the magic word in the subject line of the email. (Important: the author of “How to Ask Questions the Smart Way” cannot answer your questions; read his disclaimer!)
What is all of this, anyway?
You’re looking at the PhysioNet web site, or one of its mirrors. Read more about PhysioNet and the NIH-sponsored research resource to which it belongs here.
We have large collections of physiologic signals (time series) and software that can be used to study these signals, and smaller but growing collections of research papers, tutorials, and reference materials that relate to the signals and software.
Who are you?
We are a diverse group of computer scientists, physicists, mathematicians, biomedical researchers, clinicians, and educators at MIT (Cambridge, MA, USA), the Beth Israel Deaconess Medical Center, and Harvard Medical School (Boston, MA, USA). Many of us have worked together for 20 years or even longer on problems relating to characterizing and understanding the dynamics of human physiology and the implications of dynamical change in diagnosis and treatment of pathophysiology.
PhysioNet receives contributions of data, software, publications, and tutorials from researchers worldwide; see the PhysioNet Contributors page for a list.
Why is PhysioNet here?
You can’t learn everything there is to know about snow by studying a single snowflake, or even a few hundred of them. In much the same way (and for some of the same reasons), physiologic signals display astonishing diversity, between individuals and even within individual subjects over time. To study them seriously requires large amounts of data that are difficult and expensive to gather and to characterize, and software that can be flexibly and efficiently modified to meet the unique requirements of new research.
PhysioNet is here first of all because we (see the previous question) needed to gather such data and to design such software for our own work. Having done so, we believe that other researchers should not be forced to do the same, and that by making our data and software available, others should be able to explore them, to develop, test, and refine hypotheses; in short, to do investigations that would not be possible otherwise.
Many researchers around the world share this vision of open science, in which investigators who need data with which to test their ideas can bootstrap their studies using large, freely available, and well-understood data collections, and in which investigators wishing to explore their data using a wide variety of methods can find verifiable, open-source, reference implementations of analysis software that can be adapted to their own studies. PhysioNet began in 1999 with our own collections of data and software, but its archives continue to grow in scope and depth thanks to the contributions of many others.
Who can use data and software from PhysioNet?
These materials have been placed here for the use of researchers anywhere in the world (our visitors in the month of December 2009 came from at least 149 countries and territories on every continent, including Antarctica). Many of them are biomedical and clinical researchers in academia and industry, but others include physicists, mathematicians, computer scientists, educators, graduate and undergraduate university students, and even secondary school students.
Have the PhysioBank data been fully deidentified (anonymized), and may they be used without (further) IRB approval?
Yes.
If you are planning to contribute data to PhysioNet, it is your responsibility to ensure that they have been fully deidentified before transmitting them to us. Please review our guidelines for contributors. Our software for deidentification of free text medical records may be helpful while preparing data to be contributed.
Is all of this really free?
Yes.
We encourage contributions of data and software to PhysioNet, but only if contributors are willing to allow their contributions to be used freely. See our guidelines for contributors and our copying policy.
How can I buy a copy of ...?
See the answer to the previous question.
Printed copies of some of our books are now available at the PhysioNet Bookstore.
Please send me a copy of ...
Everything we have is free, and can be freely redistributed. You can download it yourself, or you can ask a friend to download it for you. We understand that web access can be slow or expensive in some locations; please understand that preparing and mailing materials from this web site to individual users would also be slow and expensive.
For downloading tips, read the questions and answers below, beginning with How can I download binary files?.
What are the license terms?
The software is licensed under the GNU General Public License (GPL), or (if noted in the source files) other licenses that conform to the Open Source Definition. These licenses permit verbatim copying and redistribution of the source files, and generally grant other permissions as well. For further details, see Can I use your code in my commercial application? (below).
There is nothing analogous to the GPL for data, but we permit copying and redistribution of unaltered data from this site without restrictions, in the spirit of the GPL. We do not allow distribution of altered data except under conditions that make it clear that the data have been altered, because it is very important that users should be able to distinguish between original data from this site and modified versions of those data.
Other materials from this site (books, tutorials, papers, and commentary) may be reproduced freely, with appropriate credit to the original authors.
See the PhysioNet Copying Policy for further details.
Is this software Y2K-compliant?
Yes. See our statement of Y10K compliance.
This really isn’t a frequently-asked question any more. The last person who asked it sent his question by email, dated 1 January 103.
My connection is slow. Is there a mirror?
Yes. See Mirrors for a list.
Can I set up a mirror?
Yes. Please use rsync as described in How to set up a mirror of PhysioNet.
Will you post a link to my web site?
Probably not, unless it is directly relevant to the content of PhysioNet. Most external links on this site reference publications and other materials that provide additional information, examples of use, or context for PhysioBank data or PhysioToolkit software. We also maintain short and highly selective lists of other data and software resources likely to be of interest to PhysioNet visitors. These lists are limited to non-commercial sites that provide access unavailable elsewhere to collections of physiologic signals or related data, or open-source software for study of such data.
Sign-in, Accounts, and Passwords
Why should I sign in?
You are not required to log in to use PhysioBank, PhysioToolkit, or the PhysioNet Library, all of which can be accessed freely. Use of PhysioNetWorks is also free, but it requires logging in.
PhysioNetWorks workspaces are available to members of the PhysioNet community for works in progress that will be made publicly available via PhysioNet when complete. Unlike other areas of PhysioNet, these workspaces are password-protected.
Why would I need an account and how do I get one?
Most visitors don’t need accounts (see the previous question).
If you wish to create a PhysioNetWorks project, to join an existing one, or to participate in an annual PhysioNet/Computing in Cardiology Challenge, you will need an account and a password in order to establish your identity and gain access to password-protected workspaces. Owners of PhysioNetWorks projects, which are works in progress, may allow access to invited collaborators only, or they may allow access to PhysioNetWorks members only under the terms of a Data Use Agreement (DUA).
The MIMIC II Clinical Database is an example of a PhysioNetWorks project that requires a password and DUA for access.
To create an account, go to the PhysioNetWorks login page, enter your email address, and click on ‘Create account’. Instructions for setting up your account and choosing your password are sent immediately to the address you enter, with the subject line “PhysioNetWorks login” and the sender address “DoNotReply” at physionet.org, so be sure to enter a valid email address at which you can receive it. If it doesn’t arrive within a few minutes, check that your spam filter has not discarded it. If you forget your password, or wish to change it at any time, simply return to the login page and request a new one.
Since your access to PhysioNet’s restricted or protected content will be interrupted if you lose both your password and access to your registered email address, we suggest not using a temporary address as your account name.
I can’t log in!
Check your assumptions: most users don’t need to log in (see the previous question and answer).
If you really do need to log in, go to the PhysioNetWorks login page and follow the instructions there.
How can I change my PhysioNetWorks password?
How can I change my MIMIC II Explorer/Query Builder password?
PhysioNetWorks users: Go to the PhysioNetWorks login page, enter your email address, and click “Reset password”. Follow the instructions that will be sent to your email address by the autoresponder within a minute or two.
MIMIC Query Builder users: A dedicated server, https://querybuilder-lcp.mit.edu/, provides access to the MIMIC Query Builder. Use your PhysioNetWorks username and password to log in.
Where is ...
Where can I find the specific type of data I need?
Some of the most popular versions of this question are answered in this section; read it first.
The next place to look is in the PhysioBank Archive Index. It lists all of the data collections in PhysioBank, with brief descriptions and links to longer descriptions of each.
If you are looking for records with specific combinations of signals, durations, time or amplitude resolution, annotations of specific types, or female or male subjects of particular ages, try a PhysioBank Record Search to locate relevant data. A limited amount of information about diagnoses and medications is also searchable in this way. A tutorial introduction to this tool is available here.
A PhysioNet (text) search can also be helpful. Using the search box at the top of almost any page on this web site, look for keywords that describe the data you seek.
Where can I find data for healthy subjects?
Most data in PhysioBank have been obtained from subjects with a variety of health problems. About twenty PhysioBank databases, however, include healthy subjects.
The control records (c01, c02, ... c10) from the Apnea-ECG Database were obtained from healthy volunteers during sleep; the recordings each contain a single ECG signal and are each about 8 hours long. Simultaneously recorded respiration and oxygen saturation signals are available for one of these recordings.
The CAP Sleep Database includes 16 full-length polysomnograms of healthy subjects. Signals include 3 or more EEG channels, 2 EOG channels, submentalis and bilateral anterior tibial EMG, airflow, abdominal and thoracic respiratory effort, SaO2, and ECG.
The Fantasia Database is a very well-controlled set of 2-hour recordings of ECG (with beat annotations) and respiration signals from 40 rigorously-screened healthy subjects (20 young, 20 elderly, with equal numbers of men and women in each group). Half of the recordings also include an uncalibrated continuous non-invasive blood pressure signal.
Heart rate time series from five additional groups of healthy volunteers are available in a collection of data used for a study of exaggerated heart rate oscillations during meditation (two groups of meditators recorded before and during meditation, a group of volunteers recorded during sleep, a group of volunteers recorded during metronomic (fixed-rate) breathing, and a group of elite athletes recorded during sleeping hours).
The PTB Diagnostic ECG Database includes records from 52 healthy volunteers; here is a list of them.
The MIT-BIH Normal Sinus Rhythm Database consists of ECG recordings from subjects who were found to have had no significant arrhythmias, ST changes, or known cardiac disease. Since these subjects were recorded for medical reasons, however, they are not necessarily “healthy” — but their medical problems are not heart-related. Subjects included in the Normal Sinus Rhythm RR Interval Database were known to be healthy, however.
The Sleep-EDF Database [Expanded] contains EEG, EOG, and other signals from 42 healthy subjects. (Twenty-two of these had mild difficulty falling asleep, but were otherwise healthy.)
ECG, EMG, GSR, and respiration from seventeen healthy volunteers are included in data collected for a study of Stress Recognition in Automobile Drivers.
All six of PhysioBank’s gait and balance databases include at least some data collected from healthy volunteers. Among PhysioBank’s neuro- and myoelectric databases, several include data from healthy volunteers.
This is not a comprehensive list; depending on your interests, you may find other relevant data in PhysioBank. Read the descriptions of the data collections in the PhysioBank Archive Index to learn about them, and follow the links there and above for additional information.
This list also does not include data sets that are in development within projects on PhysioNetWorks. These data sets are currently accessible to members of the respective projects only. When they are complete, they will become open-access data within PhysioBank. To learn about them, join PhysioNetWorks (it’s free, and it takes only a minute or two) and browse through the list of works in progress. Many project owners welcome other interested researchers to join their projects, so in some cases it may be possible to get access while development is still in progress.
Where can I find serial data (multiple recordings of the same subjects)?
These databases include multiple recordings of some or all subjects:
- MIMIC Database
- MIMIC II Waveform Database
- Non-Invasive Fetal Electrocardiogram Database
- PTB Diagnostic ECG Database
- CAST RR Interval Sub-Study Database
- CHB-MIT Scalp EEG Database
A few other PhysioBank databases include multiple recordings of a few subjects, but lack information about the sequence of the recordings and the intervals between them:
- European ST-T Database
- Long-Term ST Database
- MIT-BIH Arrhythmia Database
- Spontaneous Ventricular Tachyarrhythmia Database
Studies requiring data collected at different times of the day, or during sleep and non-sleep, etc., may also be able to use segments of long continuous recordings (see the next question).
Where can I find long-duration signals and time series?
Many PhysioBank databases include at least some records that are on the order of 24 hours or longer in duration. These include:
- MIMIC Database
- MIMIC II Waveform Database
- BIDMC Congestive Heart Failure Database
- Long-Term AF Database
- MIT-BIH Long-Term Database
- MIT-BIH Normal Sinus Rhythm Database
- CAST RR Interval Sub-Study Database
- Congestive Heart Failure RR Interval Database
- Normal Sinus Rhythm RR Interval Database
- CHB-MIT Scalp EEG Database
- Long-Term ST Database
Is the AHA Database available on PhysioNet?
No, it is currently available only from ECRI. Additional information about the AHA Database is available here.
A single sample record that was prepared as an example by the creators of the AHA Database is available in PhysioBank.
Are there any 12-lead (diagnostic) ECGs in PhysioBank?
The PTB Diagnostic ECG Database contains 549 twelve-lead ECGs from 294 subjects. Most of these ECGs are two minutes in duration. (They also include simultaneously recorded Frank XYZ leads.)
The St.-Petersburg Institute of Cardiological Technics 12-lead Arrhythmia Database contains 75 twelve-lead ECGs from 32 subjects. Each recording is 30 minutes in duration.
The PhysioNet/Computing in Cardiology Challenge 2011 addressed the problem of quality assessment of 12-lead ECGs, making use of 1500 twelve-lead ECGs (a training set of 1000 ECGs, and a test set of 500 ECGs). These ECGs are unscored, although those in the training set have been classified individually with respect to acceptability for purposes of diagnostic interpretation.
Twelve-lead ECGs are also available from sources other than PhysioBank, including ECG Wave-Maven, the CSE Database and the 12-lead ECG Library.
Are updates for CD-ROM databases of physiologic signals available here?
Yes. Find them here.
Where is [some file]?
I can’t find [something]!
The search box is your friend. It’s at the top right corner of nearly every single page on PhysioNet. Use the search box!
Downloading
How can I download binary files?
The details of doing this depend on your web browser, not on anything specific to PhysioNet or to the specific files you wish to download, so the first thing you should do is to learn how to use your web browser. Most browsers have a Help button that can get you started.
In Firefox or Chrome, right-click on the link, and choose “Save Link As...” from the popup menu.
In Lynx, press d to download the target of the highlighted link.
If you are using MS Internet Explorer, it is often possible to download a file simply by left-clicking on the link to that file. This is not a foolproof method, however, since MSIE attempts to guess the file type and may attempt to open the file rather than downloading it. A more reliable method is to right-click on the link and then to choose Save Target As... from the pop-up menu that appears. In most cases, you can accept the suggested file name, but be aware that MSIE will generate a .txt extension for any file that has a name without an extension (such as the files named Makefile that are found throughout PhysioToolkit), so you will need to correct these file names.
In Safari, right-click (or, with a single-button mouse, press the Control key and click), and choose “Download Linked File”.
Many other web browsers, including Galeon, Konqueror, Mozilla, Netscape, and Opera, allow you to download a file by pressing and holding the Shift key while left-clicking on the link to the file you wish to download.
Can I download an entire PhysioBank database in one step?
Yes. Before you do so, however, note that this may not be necessary.
The recommended way to read PhysioBank data files is by using either PhysioToolkit software linked to the WFDB library, or (for those who like to write their own code) your own software linked to the WFDB library. In either case, the WFDB library does the work of finding and reading PhysioBank files. If you have a local copy of a PhysioBank file, the WFDB library reads that copy; otherwise, it reads the file from PhysioNet using the same HTTP protocol that your web browser uses.
If you want to read PhysioBank files without using the WFDB library (but why?) you will probably need to reformat the files into some less storage-efficient format first, and to do that you will need to read the original files using the WFDB library. In that case, you may as well allow the WFDB library to read the original files via HTTP, and write only the reformatted files to local storage.
If you decide to download a local copy of an entire database, there are two ways to do so that are much more efficient than downloading the files one at a time using a web browser.
The first method uses rsync, which is the same free software used by the PhysioNet mirrors. Install rsync if you don’t have it already, and then use the command
rsync physionet.org::
to get a list of databases available via rsync. The output of this command will contain lines such as
aami-ec13 ANSI/AAMI EC13 Test Waveforms (1 Mb)
afdb MIT-BIH Atrial Fibrillation Database (607 Mb)
afpdb PAF Prediction Challenge Database (195 Mb)
aftdb AF Termination Challenge Database (3 Mb)
The entries in the first column are names of available “modules” (sets of files). To download (for example) the AF Termination Challenge Database into a subdirectory of its own within /usr/database, type:
rsync -Cavz physionet.org::aftdb /usr/database/aftdb
(You may, of course, use any directory for storage of the downloaded files. The suggested directory, /usr/database, is searched by default by the WFDB library, so it’s a good choice.)
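If you want to pick modules out of that listing in a script, the following minimal sketch splits each line into a module name, a description, and an approximate size, and builds the download command shown above. It assumes every line has the layout of the sample lines (name, description, size in parentheses); parse_module_list and rsync_command are hypothetical helper names, not part of rsync or the WFDB software.

```python
import re

# Sample lines copied from the listing shown above.
LISTING = """\
aami-ec13 ANSI/AAMI EC13 Test Waveforms (1 Mb)
afdb MIT-BIH Atrial Fibrillation Database (607 Mb)
afpdb PAF Prediction Challenge Database (195 Mb)
aftdb AF Termination Challenge Database (3 Mb)
"""

# Assumed line layout: module name, description, "(<size> Mb)".
LINE = re.compile(r"^(\S+)\s+(.*?)\s+\((\d+) Mb\)\s*$")


def parse_module_list(text):
    """Return a dict mapping module name -> (description, size in Mb)."""
    modules = {}
    for line in text.splitlines():
        match = LINE.match(line)
        if match:  # silently skip lines that use some other layout
            name, description, size = match.groups()
            modules[name] = (description, int(size))
    return modules


def rsync_command(module, target_root="/usr/database"):
    """Build the argument list for the download command shown above."""
    return ["rsync", "-Cavz", f"physionet.org::{module}", f"{target_root}/{module}"]


modules = parse_module_list(LISTING)
print(modules["aftdb"])
print(" ".join(rsync_command("aftdb")))
```

The command is only printed here; pass the list to subprocess.run() if you want the script to start the transfer itself.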
To download the MIMIC II Waveform Database Matched Subset, the recommended procedure is slightly different; see these notes.
Using rsync is particularly convenient if you have an unreliable connection; if the transfer is interrupted, simply repeat the command once the connection has been re-established, and rsync will quickly determine where it needs to resume the download. Another advantage of using rsync is that it preserves the timestamps of the original files on PhysioNet, so that if you return to PhysioNet, it will be easy to see if the original files have been updated since the last time you downloaded them. If there have been any updates, you can bring your local copy up-to-date by running the same rsync command that you used to create it, without copying anything that hasn’t changed.
Note that rsync has its own IANA-assigned port (873); if you can reach PhysioNet with your web browser (port 80, HTTP) but not via rsync, your firewall may be blocking port 873.
The second method is to use the dldatabase function from the wfdb-python package. You can see the code repository and documentation here.
There is another method described in the answer to the next question. You can choose it if you can’t use rsync or the wfdb-python package, or if your connection to one of the PhysioNet mirrors (which do not generally support rsync access) is much better than your connection to the PhysioNet master server.
There are so many files in .... Can I get a zip file or a tar archive of it?
You can obtain a tar archive or zip file of any single PhysioBank record using the PhysioBank ATM.
If you would like to download an entire PhysioBank database, see the previous question.
Otherwise, you can try looking for a .zip or a .tar.gz archive in the directory that contains the files of interest, or in its parent directory. If you don’t find one, however, the answer is no. It is necessary to keep individual files available, and maintaining redundant copies of these files within archives would not be the best use of available resources.
There are excellent alternatives, however, to downloading many files one at a time using a web browser. Try using a utility that can do batch-oriented HTTP transfers, such as wget, available from this site in source form for Unix, Mac OS X, or MS-Windows, or in binary form for MS-Windows. Once you have installed wget, retrieve a batch of files using a command such as
wget -r -np http://physionet.cps.unizar.es/physiobank/database/mitdb/
(or substitute the name of a nearby PhysioNet mirror for physionet.cps.unizar.es above).
How can I unpack a .tar.gz archive (a “tarball”)?
These files are gzip-compressed tar archives.
MS Windows: The free 7-zip file archiver can unpack .tar.gz archives as well as most other common compressed formats.
Alternatively, if you have installed Cygwin, follow the instructions below for using GNU tar.
GNU/Linux, Mac OS X:
Using GNU tar, you can decompress and unpack foo.tar.gz in one step:
tar xfvz foo.tar.gz
If your browser decompressed the archive while downloading it, just unpack it:
tar xfv foo.tar
Other Unix platforms:
Traditional versions of tar may not support GNU tar’s z option. If you have one of these, decompress using gzip before unpacking, and then unpack the decompressed archive, like this:
gzip -d foo.tar.gz
tar xfv foo.tar
(If you don’t have gzip, free versions are available for all popular operating systems from gzip.org.)
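If none of these tools is at hand but Python is, its standard tarfile module can do the same job. The sketch below is one possible approach rather than an official PhysioNet procedure; unpack_tarball is a hypothetical helper name, and the demonstration builds a throwaway foo.tar.gz in a temporary directory so that it can run anywhere.

```python
import os
import tarfile
import tempfile


def unpack_tarball(archive_path, dest_dir="."):
    """Unpack a .tar.gz (or plain .tar) archive and return the member names."""
    # Mode "r:*" lets tarfile detect compression by itself, so the same call
    # covers foo.tar.gz and an already-decompressed foo.tar.
    with tarfile.open(archive_path, "r:*") as archive:
        names = archive.getnames()
        # Newer Python versions offer extraction filters; use the safer
        # "data" filter when this Python provides it.
        extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        archive.extractall(dest_dir, **extra)
    return names


# Demonstration on a throwaway archive (all names here are made up).
with tempfile.TemporaryDirectory() as work:
    os.mkdir(os.path.join(work, "foo"))
    with open(os.path.join(work, "foo", "README"), "w") as handle:
        handle.write("hello\n")
    archive_path = os.path.join(work, "foo.tar.gz")
    with tarfile.open(archive_path, "w:gz") as archive:
        archive.add(os.path.join(work, "foo"), arcname="foo")

    out_dir = os.path.join(work, "out")
    os.mkdir(out_dir)
    unpacked = unpack_tarball(archive_path, out_dir)
    readme_exists = os.path.isfile(os.path.join(out_dir, "foo", "README"))
    print(unpacked)
```

The returned member names show which subdirectory the files were written to.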
I unpacked the tarball, now where are the files?
An archive named foo.tar.gz would normally be unpacked into a subdirectory (folder) named foo within the current directory (folder). Look at the file names shown during the unpacking process to see where the unpacked files have been written.
Can I get these files via FTP?
No. It is considered insecure for an FTP server and a web server to share a file system, and it is not practical for us to maintain separate web and FTP servers.
If you are interested in batch file transfers, read the answer to "There are so many files in ....", above.
Can I look at the waveforms using only my web browser?
Yes. Go to the PhysioBank ATM and fill in the form. For a sample, click here.
PhysioBank Files
What are PhysioBank-compatible (or WFDB-compatible) formats?
The contents of almost all PhysioNet databases are collections of flat files (not relational databases). These files can be read by programs that use the WFDB library to do so. The WFDB library reads files in a variety of formats, presenting their contents in a uniform manner to the programs that use it, so that those programs need not be concerned with the details of the storage formats used in each case. The formats that can be read by the WFDB library are referred to as “PhysioBank-compatible formats”, because they are permissible for files within standard PhysioBank databases. The terms “PhysioBank-compatible” and “WFDB-compatible” are synonymous. Note, however, that the WFDB library is capable of reading a wider variety of formats than those that are actually used within PhysioBank.
Many visitors who ask this question assume that they need to understand the details of PhysioBank’s file formats in order to use PhysioBank. This is not necessary, however. Numerous options exist for reading and writing files in PhysioBank-compatible formats; read the other questions and answers in this section of the PhysioNet FAQ for pointers to many of them. If you really need to know the details of the formats, however, follow the pointers in the next paragraph.
There are several types of files in standard PhysioNet databases:
- Signal files are binary files containing samples of digitized signals.
- Header files are short text files that describe the contents of associated signal files.
- Annotation files are binary files containing annotations (labels that generally refer to specific samples in associated signal files).
- RECORDS and ANNOTATORS are text files listing the records belonging to the database, and the types of annotation files available for the database. (Each database has its own RECORDS file, and each annotated database has its own ANNOTATORS file.)
- EDF files are included in some PhysioBank databases in lieu of separate signal and header files. Since recent versions of the WFDB library can read them directly, EDF is a PhysioBank-compatible format.
- EDF+ files are EDF files that also contain annotations encoded as signals. Although the WFDB library can read the signals from an EDF+ file, it does not support decoding the annotations, so this format is only “mostly PhysioBank-compatible”. The PhysioToolkit application rdedfann can decode these annotations into text that can be easily converted into PhysioBank-compatible annotation files using wrann.
- Calibration files are text files that define customary scales for each type of signal. (There is a default calibration file containing definitions for most signals appearing in PhysioBank, so most standard PhysioBank databases do not need to have their own calibration files.)
What are .dat, .hea, .atr, .qrs, ... files?
Files belonging to PhysioBank databases have two-part names: the first part is the record name, and the second part (following the “.”) indicates the file type. For example, a file named “chf08.hea” is a file of type .hea (see below) belonging to a record named “chf08”.
All of these file types are found in PhysioBank databases:
- .dat files are binary signal files. See the questions and answers below, beginning with “What is the format of the signal files?”, for information about their format and how to read them.
- .hea files are short text “header” files used by all of the software that reads the signal files to determine their location and format. In some cases, .hea files also contain structured comments that include information about the subjects (e.g., age, gender, medications, diagnoses).
- .atr and .qrs files (and other files described in the database index pages as annotation files) are binary files containing labels (annotations) that point to specific locations within the signal (.dat) files and describe events at those locations. For example, many of these annotations indicate the times of occurrence and types of individual heart beats in records containing ECG signals. See the questions and answers below, beginning with “What is the format of the annotation files?”, for information about the format of these files and how to read them.
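The two-part naming rule is easy to apply in a script. The sketch below splits a file name into its record name and file type and looks the type up in a small table; split_physiobank_name and describe are hypothetical helpers, and the table only paraphrases the file types listed above (other types exist).

```python
import os

# Descriptions paraphrased from the list above; this table is not exhaustive.
KNOWN_TYPES = {
    "dat": "binary signal file",
    "hea": "text header file",
    "atr": "binary annotation file",
    "qrs": "binary annotation file",
}


def split_physiobank_name(path):
    """Split 'chf08.hea' into ('chf08', 'hea'); directories are ignored."""
    base = os.path.basename(path)
    record, dot, file_type = base.rpartition(".")
    if not dot:
        # No "." in the name, so there is no file type (for example RECORDS).
        return base, ""
    return record, file_type


def describe(path):
    """Return a one-line description of the file named by path."""
    record, file_type = split_physiobank_name(path)
    kind = KNOWN_TYPES.get(file_type, "type not in this sketch's table")
    return f"record {record!r}: {kind}"


print(describe("chf08.hea"))
```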
What are .xws files and how can I view them?
These are short text files that point to specific locations within the records with which they are associated. You can view the same locations using, for example, the PhysioBank ATM. If you haven’t set up a browser helper application for viewing .xws files, you can read them as text and copy the database, record, and annotator into the PhysioBank ATM, then navigate to the location of interest.
You can set up WAVE (actually, wavescript) as a helper application for your browser so that when you click on a .xws file, WAVE will open the associated record at the specified location using its built-in HTTP client code (this is much faster and more flexible than using the PhysioBank ATM). See Controlling WAVE from a web browser in the WAVE User’s Guide for details.
What is a “record name” or an “annotator name”?
Records are identified by record names, which contain letters, digits, and underscores. For example, the MIT-BIH Arrhythmia Database has record names consisting of three-digit numbers beginning with ‘1’ or ‘2’, and the European ST-T Database has record names that are four-digit numbers prefixed by ‘e’. Case is significant in record names that contain letters, even in environments such as MS-DOS for which case translation is normally performed by the operating system on file names; thus ‘e0104’ is the name of a record found in the European ST-T Database, whereas ‘E0104’ is not. A record comprises several files, which contain signals, annotations, and specifications of signal attributes; each file belonging to a given record normally includes the record name as part of its name. The files named RECORDS found in the PhysioBank database directories list the record names for each database.
There may be many annotation files associated with the same record; they are distinguished by annotator names. The name of an annotation file is the record name, followed by a ‘.’, followed by the annotator name. The files named ANNOTATORS found in the PhysioBank database directories list the annotator names for the annotation files that are available here. The annotator name ‘atr’ is reserved to identify reference annotation files supplied by the developers of the databases to document correct beat labels. You may use other annotator names (which may contain letters, digits and underscores, as for record names) to identify annotation files that you create. You may wish to adopt the convention that the annotator name is the name of the file’s creator (a program or a person).
How can I run ... on all of the records in a PhysioBank database?
Write a shell script to iterate over the records. You can use the (text) file called RECORDS in each database directory (see the previous question) as the list of records to be processed; wfdbcat can be used to get this file from PhysioBank. For example:
for R in `wfdbcat mitdb/RECORDS`
do
    echo -n "Record mitdb/$R ..."
    sigamp -r mitdb/$R -a atr -p
done
The example above runs sigamp on each record in the MIT-BIH Arrhythmia Database (mitdb). Use whatever scripting language you wish; the example is written in the standard POSIX sh scripting language and can be run in a terminal window on GNU/Linux, Mac OS X, or any other UNIX, or in a Cygwin window under MS-Windows. For a tutorial introduction to writing shell scripts, try the three-part Bash by Example series or the more comprehensive Unix/Linux Shell Scripting Tutorial [external links open in another window].
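For readers who prefer Python to sh, the same loop might be sketched as follows. It assumes that a RECORDS file lists one record name per line; read_record_names and sigamp_commands are hypothetical helper names, and the three record names are sample input only. The commands are printed rather than executed, so the sketch runs without the WFDB applications installed.

```python
import shlex

# Sample RECORDS content, one record name per line (illustrative input).
RECORDS_TEXT = "100\n101\n102\n"


def read_record_names(text):
    """Return the record names listed in the text of a RECORDS file."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def sigamp_commands(database, records_text):
    """Build one sigamp argument list per record, mirroring the shell loop."""
    return [
        ["sigamp", "-r", f"{database}/{name}", "-a", "atr", "-p"]
        for name in read_record_names(records_text)
    ]


for command in sigamp_commands("mitdb", RECORDS_TEXT):
    # Printed only; pass `command` to subprocess.run() to execute it
    # if the WFDB applications are installed.
    print(shlex.join(command))
```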
Where are the annotation, signal, or header files I just created?
WFDB applications can read from local files or directly from remote locations such as the PhysioNet web site, but they always write to local files. In order to read annotation, signal, or header files that you have written, it will usually be simplest to begin within the directory (folder) that was current when they were created.
If you use WFDB applications to create new annotation, signal, or header files, those files are created within the current working directory (or, in some cases, its subdirectories). Thus, for example, the output annotation file created by the command
wqrs -r 100s
is a file in the current directory named 100s.wqrs. If the record name contains additional path information, the output file is written in a location accessible by following that path from the current directory. For example, the command
wqrs -r mitdb/100
writes its output annotation file (100.wqrs) in the mitdb subdirectory of the current working directory. If mitdb doesn’t exist in the current directory, wqrs creates it.
Applications that use the WFDB library behave this way so that their output files can be located by other WFDB applications. For example, given the above, the command
wave -r mitdb/100 -a wqrs
displays the annotations from the local file created by wqrs together with the corresponding signals from PhysioBank. Neither wqrs nor wave needs to read local copies of the header or the (much larger) signal files, however. If no local copies exist, they are read directly from the PhysioNet server, using the additional path information in the record name (mitdb/, in this example) to find them.
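The rule for where output files land can be written down in a few lines. This sketch only models the naming rule described above (it does not call the WFDB library), and output_annotation_path is a hypothetical helper name.

```python
from pathlib import PurePosixPath


def output_annotation_path(record, annotator):
    """Path, relative to the current directory, of the output annotation file."""
    record_path = PurePosixPath(record)
    # Any path information in the record name is kept; the annotator name
    # is appended to the record name after a ".".
    return str(record_path.parent / f"{record_path.name}.{annotator}")


print(output_annotation_path("100s", "wqrs"))       # 100s.wqrs
print(output_annotation_path("mitdb/100", "wqrs"))  # mitdb/100.wqrs
```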
How were the signals in PhysioBank digitized?
They come from many sources, but in all cases the signals have been digitally recorded or digitized from analog recordings. See the descriptions of the individual databases for details.
We are occasionally asked about digitizing paper ECGs and other hard-copy data. A brief survey of this subject is available here.
Should I use PhysioBank formats for my project?
Perhaps we’re biased, but we think so, and here’s why:
- There is a large amount of free and open-source software that reads and writes data in these formats. (The WFDB software package is a collection of such software, including viewers, signal processing and analysis applications, and an I/O library that can be used to build custom applications that read and write these formats.) If you use PhysioBank formats, you can use all of this software as is.
- These formats are reasonably storage efficient, while still permitting efficient random access. Since recordings can be of arbitrary duration (some of those in PhysioBank are up to 40 days in length), it is worthwhile to store them efficiently, not only to reduce disk requirements but also to reduce the time needed to transmit them and to read them. It is also worthwhile to be able to read only a segment of interest from somewhere in the middle of a long record, without having to read data sequentially from the beginning. These binary formats are more compact than text-based formats such as FDA XML, and less so than variable-length compressed formats such as SCP ECG; they represent a reasonable compromise in terms of storage efficiency to achieve a significant advantage in use efficiency.
- These formats, when used to store multiple simultaneous signals, have the advantage over EDF that it is unnecessary to skip around in the record in order to assemble a vector of simultaneous samples of signals (a very common and basic operation in signal processing of multiple signals).
  If you can read an entire EDF frame (often a minute of digitized signals) into memory, and if a latency on the order of the frame length is tolerable, then EDF is also a good choice.
- Although it has been argued that meta-information (signal descriptions, sampling frequency, gain, etc.) should be kept in the same file as the digitized signals, there are advantages to keeping this information in separate “header” (.hea) files as is done on PhysioNet:
  - A large number of (very small) header files can be kept in one place (to make it easy to find a record), and the digitized signals in their (possibly very large) signal files can be kept in many locations (not necessarily the same directory, the same disk, or even the same file server).
  - When recording signals, we often do not have a priori knowledge of details such as the length of the recording or the gains of the signal amplifiers (although we might record analog calibration signals so that gains can be measured later on). This information can be added when available to separate header files, without needing to rewrite the possibly huge associated signal files.
  - Occasionally we assemble records from multiple instruments, or by combining recorded signals with additional (computed or synthesized) signals. In this case we can keep signal sets in separate, parallel signal files, and add their metadata to the header file without rewriting the signals.
  Of course, it is possible to embed metadata in a signal file in PhysioBank format if this is desired; we don’t generally do so, for these reasons.
- There is an enormous amount of data already available in these formats (from PhysioBank and from other sources), so most users of physiologic signal data are already using these formats. For many researchers, the first step in using records in some other format is to convert them to a PhysioBank format, so that they can be analyzed using familiar tools.
Some PhysioBank formats that are good choices for new projects are format 16 (very easy to read using almost any software, though not as storage efficient as format 212 if you have 12-bit or lower resolution) and format 212 (most used in PhysioBank, because it’s ideal for widely available 12-bit resolution data, and it’s still relatively easy to read). Format 8 is not recommended for new projects, because it does not preserve the DC offset when used in random-access mode, and because it limits the maximum slew rate of the signals that can be recorded. (It is supported for historical reasons, and it was devised as a way to circumvent memory and storage limits encountered in recording the MIT-BIH Arrhythmia Database.)
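To put numbers on the difference in storage efficiency, the sketch below estimates signal file sizes from the packing rules alone: two bytes per sample in format 16, three bytes per pair of samples in format 212. The record dimensions are hypothetical, signal_file_bytes is a made-up helper, and rounding an odd sample count up to a whole pair is an assumption of this sketch.

```python
import math


def signal_file_bytes(samples_per_signal, signal_count, fmt):
    """Approximate size in bytes of a signal file in format 16 or 212."""
    total_samples = samples_per_signal * signal_count
    if fmt == 16:
        return 2 * total_samples  # two bytes per sample
    if fmt == 212:
        # Three bytes per pair of samples; rounding an odd count up to a
        # whole pair is an assumption made for this sketch.
        return 3 * math.ceil(total_samples / 2)
    raise ValueError(f"format {fmt} is not covered by this sketch")


# Hypothetical record: two signals of 650000 samples each.
for fmt in (16, 212):
    print(fmt, signal_file_bytes(650000, 2, fmt))
```

For this example, format 212 needs three quarters of the space used by format 16.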
If you need even more storage efficiency than is provided by PhysioBank formats, consider using gzip or bzip2 to compress files stored in format 16 or 212, or (especially for a commercial product) consider SCP ECG.
If you need an easy-to-read format, and efficiency is not a concern, use rdsamp’s output (text) format (see this note).
Reading and Writing Digitized Signals
How can I find out what signals were recorded?
If you are looking for recordings with a specific type of signal, look first in the PhysioBank Archive Index, which indicates in general terms the types of signals in each of the available databases.
Within each database, there may be variations in the choice of signals from record to record. The pages that describe each database (see the links from the PhysioBank Archive Index) can help in locating subsets of records that contain specific signals of interest.
Each recording has an associated header file that lists (among other things) the names of the signals included in that recording. The wfdbdesc utility reads header files and produces a readable summary of their contents, including the signal names. Many other PhysioToolkit applications that read PhysioBank data are capable of printing or displaying signal names. The PhysioBank ATM shows the names of the signals belonging to the selected record (in the drop-down Signals list).
What do the signal names MLII, V2, ... mean?
Short answer: MLII and V2 are ECG signals. The names refer to the electrode positions, using standard nomenclature for lead names. MLII is "modified lead II", a bipolar lead parallel to the standard limb lead II, but acquired using electrodes placed on the torso (a requirement for long-term ECG monitoring). V2 is a precordial lead that is roughly orthogonal to MLII. These two leads are favored for many recordings, since MLII yields high-amplitude normal QRS complexes in most subjects, and V2 usually offers a good view of any ectopic beats that happen to be of low amplitude in MLII.
Long answer: Signals in PhysioBank databases have standardized names (see the previous question). Most of these are in common clinical use for designating signals such as arterial blood pressure (ABP), respiration (RESP), or heart rate (HR). Generally the pages that describe the databases in which these signals appear include definitions of any unusual signal names (see the links to these pages from the PhysioBank Archive Index).
Most ECG recordings contain two or more simultaneously recorded ECG signals, called "leads." Since the heart generates an electrical field that varies spatially as well as temporally, there is no uniquely determined (scalar) signal that offers a complete view of cardiac electrical activity. The standard practice among clinicians and researchers interested in the ECG is to record two or more signals (leads) derived using sensing electrodes placed at certain specific locations. Some leads are bipolar (they are potential differences between pairs of electrodes); others are unipolar (they are potentials measured with respect to an artificial "zero" reference potential typically derived by summing potentials measured at multiple locations). Confusingly, the wires that connect the electrodes to the recording equipment are also called "leads"; thus, for example, a five-lead (five-wire) harness is generally used to record a two-lead (two-signal) ECG!
The three Einthoven bipolar limb leads (designated I, II, and III) are determined by the pairwise potential differences between electrodes placed on the left arm (LA), right arm (RA), and left leg (LL); specifically, lead II is the potential difference between LL and RA. In most subjects, the axis defined by these points is roughly parallel to the mean cardiac electrical axis, so it is a lead in which QRS complexes are typically observable at nearly maximum amplitude.
In long-term ECG recordings (including most of those on PhysioNet), limb leads are not generally used, since physical activity causes significant interference in these leads. Commonly, equivalent "modified" leads are used, with electrodes placed on the torso in positions chosen so that the signals closely match the limb leads. This is possible because the cardiac electrical field is (to a good approximation) a time-varying dipole field, so that it is generally sufficient to choose positions that allow one to observe the same projections of the dipole field onto the axes defined by the limb leads.
For MLII (modified lead II), the LL equivalent electrode is ideally placed at the left iliac crest, and the RA equivalent electrode is ideally placed in the infraclavicular fossa, medial to the border of the deltoid muscle and 2 cm below the lower border of the clavicle.
Additional information about ECG lead systems can be found in many textbooks about electrocardiography. A clear and comprehensive discussion can be found in chapter 15 of Bioelectromagnetism: Principles and Applications of Bioelectric and Biomagnetic Fields by J Malmivuo and R Plonsey (Oxford University Press, 1995; also available on-line here).
What do the signal names ‘signal 0’, ‘signal 1’, ... mean?
As noted in the answers to the previous two questions, signal names recorded in header files describe the signals in each record. In rare cases, however, this information is missing, usually because it was not preserved when the signals were recorded. Names of the form ‘signal N’ are default signal names used in such cases; they may appear in header files explicitly, or they may be displayed by PhysioToolkit software if no explicit signal names appear in a header file.
What is the format of the signal files?
Many formats are supported. Most signal files are written in “format 212”, in which two 12-bit samples are bit-packed into three 8-bit bytes, or in "format 16", in which a 16-bit sample is written as two bytes, least significant byte first ("little-endian"). See signal(5) in the WFDB Applications Guide for details on formats 16 and 212 and on the other supported formats.
To determine which format is used for a given signal file, look in the associated header file. (This is a text file that usually has the same name as the signal file, except for a suffix of .hea instead of .dat.) Each line of the header file that begins with the name of the signal file describes the format and contents of a signal within the signal file. See header(5) in the WFDB Applications Guide for details.
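As an illustration of the two most common formats, here is a minimal Python sketch of how formats 16 and 212 might be decoded by hand. The function names are hypothetical, and the placement of the two high-order nibbles in the middle byte of format 212 is an assumption based on our reading of signal(5); check it against that page before relying on it. The sketch does not read the header file and does not separate the samples of multi-signal files.

```python
import struct

def decode_format16(data):
    """Decode format 16: one signed 16-bit sample per two bytes, little-endian."""
    count = len(data) // 2
    return list(struct.unpack("<%dh" % count, data[:2 * count]))

def _sign_extend_12(value):
    """Interpret a 12-bit value as a two's-complement integer."""
    return value - 4096 if value & 0x800 else value

def decode_format212(data):
    """Decode format 212: two 12-bit samples packed into every three bytes.

    Assumed layout (see signal(5)): the low nibble of the middle byte holds
    the high 4 bits of the first sample, and the high nibble of the middle
    byte holds the high 4 bits of the second sample.
    """
    samples = []
    for i in range(0, len(data) - 2, 3):
        b0, b1, b2 = data[i], data[i + 1], data[i + 2]
        first = b0 | ((b1 & 0x0F) << 8)
        second = b2 | ((b1 & 0xF0) << 4)
        samples.append(_sign_extend_12(first))
        samples.append(_sign_extend_12(second))
    return samples
```

In practice the WFDB library or rdsamp is a much safer way to read signal files, since they handle every supported format and consult the header file for you.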
How can I read signal files?
If you would like to read signal files within a C, C++, Fortran, Java, Perl, or Python program, see the WFDB Programmer’s Guide for information on doing this using the WFDB library. Other programming languages supported by SWIG may also be usable with the WFDB library, but have not been tested. Briefly, use isigopen() to open the files, and getvec() to read them.
If you would like to do this within a Matlab or Octave program, we recommend using the WFDB Toolbox for MATLAB. For an overview of this solution and a variety of alternatives of varying degrees of complexity, see Reading and writing PhysioBank and compatible data on the Contributed software for Matlab and Octave page. Note that Matlab and Octave are not able to import most signal files directly; for exceptions, see wfdb2mat.
Another possibility is to convert the portions of interest into text format using rdsamp (described in detail in How to obtain PhysioBank data in text form). To save rdsamp’s output in a file, or to read rdsamp’s output using another program, see this note.
Alternatively, segments of up to 100,000 samples in length of signal files found
on this web server can be converted into text using the
PhysioBank ATM, which can be accessed using
your web browser. This may be useful if you wish to read signals
using Excel or another spreadsheet (although spreadsheets in general
are not recommended as tools for signal processing, visualization, or
analysis; there are much better choices freely available in
PhysioToolkit).
MS-Windows Media Player and similar software for reading audio and multimedia files cannot be used to read these files (or any others in PhysioBank).
How can I use Matlab’s import feature to read signal files?
You can’t in general, because Matlab doesn’t know how to figure out which of the many supported formats is used in any given signal file, because it can’t understand the most commonly used formats in any case, and because (in many cases) signal files are orders of magnitude larger than any matrix that Matlab can handle.
You can export a segment of signal files up to a million samples in length as a .mat file readable by Matlab or Octave, using the PhysioBank ATM. Longer segments may be difficult to handle, but you can make them if you wish using wfdb2mat, included in the WFDB software package and used by the PhysioBank ATM. The .mat files produced in this way can be read and plotted using plotATM.m.
See How can I read signal files? for a variety of other ways to read signal files from Matlab without using its import feature.
Why does Matlab say “file might be corrupt” when loading a huge .mat file?
Matlab cannot handle version 4 .mat files containing more than 100,000,000 samples. If you need to load a file this large (and you have enough memory to do so), use Octave instead.
In most cases, however, a better strategy is to redesign your program so that it does not need to read the entire record into memory at once.
Is there any direct way of converting sample values to physical units using wfdb2mat?
The program plotATM.m reads the output of wfdb2mat and converts the raw samples into physical units. The conversion is very simple and easily incorporated in your own Matlab or Octave code if you prefer for whatever reason not to use plotATM.m; see plotATM.m for details.
wfdb2mat doesn’t do this conversion itself since this would (1) increase the size of the generated .mat files by a factor of 4 or 8 (depending on the ADC resolution), (2) slow down and significantly complicate wfdb2mat, because of the need to convert from native (IEEE 754 on most platforms) floating-point format to the VAX floating-point format required by Matlab, (3) make the .mat files incompatible with WFDB applications, and (4) make it unnecessarily difficult to distinguish the effects of quantization error from other sources of noise in the signal, for those who might wish to do so.
How can I use Excel’s import feature to read signal files?
You can’t, because Excel doesn’t know how to figure out which of the many supported formats is used in any given signal file, because it can’t understand any of those formats in any case, and because signal files are almost always orders of magnitude larger than any spreadsheet that Excel can handle. Spreadsheets are not suitable for studying, visualizing, or analyzing digitized signals; many better tools are freely available in PhysioToolkit.
If, despite the above, you still wish to read a piece of a signal file into a spreadsheet, see How can I read signal files?.
Where do I get rdsamp?
It’s part of the free, open source WFDB Software Package. Both C sources and binaries for several popular operating systems are available.
How do I use rdsamp?
Install it, then type rdsamp for a brief summary of options. For details, see rdsamp(1) in the WFDB Applications Guide.
The output of rdsamp is in text format. Unless you have used the -v option, the output contains data only (no column labels) and can be plotted directly using, for example, plt. The first column contains sample numbers (or elapsed times in seconds and milliseconds if you have used the -p option), and each of the remaining columns contains samples for one signal (in raw ADC units unless you have used the -p option).
To save rdsamp’s output in a file, or to read rdsamp’s output using another program, see this note.
What do the sample values represent?
Analog-to-digital converters (ADCs) are usually used to produce PhysioBank signal files, which consist of sequences of integer samples in unscaled analog-to-digital converter units (adus). Samples are stored in this way not only because doing so usually requires less space than most alternatives, but also because this scheme introduces no loss of precision beyond the quantization error of the ADC. By default, rdsamp outputs sample values in unscaled adus (raw ADC units).
The header file for each record contains fields that describe the characteristics of each signal and of the ADC used to digitize it. These fields include the signal type (such as ECG, ABP, or SpO2), the physical units of the original analog signal (such as mV, mmHg, or degreesC), the gain (the number of adus per physical unit), the baseline (the sample value that would correspond to a physical value of zero, which is often but not always at the center of the ADC range, and may even lie outside of the ADC range), the adczero (the sample value at the center of the ADC range, which is 0 for a bipolar ADC and a non-zero value for an offset binary ADC), and the number of bits of ADC precision (most often 12 for PhysioBank recordings). Taken together, the values specified for these parameters allow identification of the signals, conversion of sample values from raw ADC units to baseline-corrected physical units and back again, and calculation of the ADC range in raw or physical units.
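The conversions described above can be sketched in a few lines of Python. The function names are hypothetical, and the gain and baseline used in the usage note are illustrative values rather than values taken from any particular record. The range formula assumes that the ADC range is centered on adczero as described above.

```python
def adu_to_physical(sample, gain, baseline):
    """Convert a raw sample (adus) to baseline-corrected physical units."""
    return (sample - baseline) / gain

def physical_to_adu(value, gain, baseline):
    """Convert a physical value back to the nearest raw sample value."""
    return round(value * gain) + baseline

def adc_range(adc_resolution, adc_zero):
    """Return (lowest, highest) raw sample values for an ADC with the given
    number of bits, assuming the range is centered on adc_zero."""
    half = 1 << (adc_resolution - 1)
    return adc_zero - half, adc_zero + half - 1
```

For example, with a gain of 200 adus per mV and a baseline of 1024, a raw sample of 1224 corresponds to (1224 - 1024) / 200 = 1 mV.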
Using rdsamp’s -p (physical unit output) option, or using the PhysioBank ATM (which uses rdsamp), the sample values are presented in baseline-corrected physical units. The signal types and units used appear in the first two lines of rdsamp’s output when using rdsamp’s -v (verbose output) option. Note that in these cases, the sample values are given to exactly three decimal places by default regardless of the precision of the integer samples, although additional precision can be obtained using rdsamp’s -P option.
The database of Evoked Auditory Responses in Normals across Stimulus Level is the first (and so far, the only) PhysioBank database containing 24-bit and 32-bit signals. All other current PhysioBank databases were recorded using ADCs with resolutions of 16 bits or fewer. A raw sample value of -32768 has a special meaning: it signifies that no valid observation of the associated signal was made during the corresponding sampling interval. This value is the most negative number that can be represented in 16 bits, so (in data with 16 or fewer bits of ADC resolution) it is less than any valid sample value.
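If you process raw samples yourself, it may help to treat this special value separately before scaling. The following Python sketch shows one way to do so; the names are hypothetical, and it applies only to data with 16 or fewer bits of ADC resolution, as noted above.

```python
INVALID_SAMPLE = -32768  # raw value meaning "no valid observation"

def to_physical_with_gaps(samples, gain, baseline):
    """Convert raw samples to physical units, leaving None wherever the
    raw value marks a missing observation."""
    return [
        None if s == INVALID_SAMPLE else (s - baseline) / gain
        for s in samples
    ]
```

Representing gaps as None (rather than scaling -32768 like any other sample) keeps missing observations from appearing as large negative spikes in later analysis.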
What does the message “init: can’t open header for ...” mean?
This message can be produced by any application linked to the WFDB library, including rdsamp and rdann. In order to read data files, these applications need to find a header (.hea) file for the input record you specify. The message indicates that the header file was not found in any of the expected places, or that it was unreadable. There are three common reasons why this can happen:
- The record name supplied to the application is not correct. Note that record names are not file names; if you wish to read, for example, a signal file named slp60.dat using rdsamp, you must specify the name of the record to which this file belongs (slp60) after the -r option, and not the name of the file itself. Whatever follows “init: can’t open header for ...” is what the application thinks is the name of the record you wish to read. Also, be aware that case matters in record names, even under operating systems that ignore case in file names. Thus “SLP60” is not a valid record name; “slp60” is.
- The header file is missing. If you download signal (.dat) or annotation (.atr, .qrs, etc.) files, be sure to download the corresponding .hea files from the same locations.
- The list of locations to be searched does not include the location of the header file. WFDB applications find their input files by searching a list of locations specified by the WFDB path (the environment variable WFDB, or a default list of locations if WFDB has not been set). The WFDB path normally includes the current directory, but this may not be true if the WFDB path has been modified; the current directory must appear explicitly (either as a “.” or as an empty component in the path) in order to be included in the list of locations to be searched. For further information, see “The Database Path and Other Environment Variables” in the WFDB Programmer’s Guide.
I can’t run rdsamp. Can you please send me a copy of ... in text format?
Yes. Go to the PhysioBank ATM, request the data of interest, and save the results in your browser.
How can I get more than 100,000 samples?
The PhysioBank ATM “Show samples as text” and "Export signals as CSV" tools are intended to offer short segments of data in text form without the need for anything more than a web browser. They are not intended to be methods for obtaining large amounts of data.
The ATM’s “Export signals as .mat” tool also limits the amount of data converted, but since .mat format is significantly more compact than text or CSV formats, the limit is 1,000,000 samples per signal. Larger amounts may be difficult to load into Matlab.
You may download an entire record in EDF or PhysioNet (binary) formats using the ATM’s “Export signals as EDF”, “Make tarball of record”, or “Make zip file of record” tools (or by simply downloading the files you need from the PhysioBank archive with your web browser).
If you need more data in text, CSV, or .mat format than allowed by the ATM’s tools, first convert a short segment in the desired format, then read the notes immediately below the ATM’s control panel to learn how to run the ATM’s format conversion applications (rdsamp and wfdb2mat) on your own computer, without limitations on the length of the output. If you install the WFDB Software Package and run rdsamp on your own computer, for example, you can convert entire records to text if you wish.
The PhysioBank ATM limits you to 100,000 samples in text or CSV formats at a time, because signal files converted to text can be very large, and reading them would not be possible with standard web browsers. If you do not wish to use any of the alternatives above, you may concatenate successive segments obtained by multiple requests to the ATM; this will allow you to obtain the data you wish without significantly affecting other PhysioNet users or crashing your web browser.
How can I create a PhysioBank-compatible record from my own data?
There are many ways to create a PhysioBank-compatible record. Here is an easy way to do so:
- If your signals are still in analog format, digitize them. For ECGs, we recommend using a sampling frequency of at least 120 Hz, with at least 8-bit resolution over a ±5 mV range (ideally, 250 Hz to 1 kHz, with 12-bit or higher resolution over a ±10 mV range). As is necessary whenever digitizing any signal, use an appropriate antialiasing filter (a low-pass filter with a cutoff no higher than about 40% of the sampling frequency).
- Write the samples into a file in text form, as a column of decimal numbers. If you have digitized more than one signal, use a separate column for each signal. (The software that was used to digitize your signals may include a means for doing this.)
- Read about wrsamp to see how to prepare a binary signal file and a header file from the text file. Typically, you will need to use a command such as
wrsamp -F 128 -G 102.4 -i data.txt -o Rec01 0
This example reads a text file named data.txt, and creates the files needed for a record named Rec01, namely, a signal file named Rec01.dat and a header file named Rec01.hea. (See What are .dat, .hea, .atr, .qrs, ... files? for definitions of signal and header files.) The arguments of -F and -G specify that the signal was sampled at 128 Hz and that the signal was amplified in such a way that a step of 1 millivolt would appear as sample values that differ by 102.4 units. The final argument (0) indicates that the leftmost column in the input (column 0) contains the data.
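If your samples are already in memory in a program of your own, the text file described in the second step can be produced with a few lines of code. The Python sketch below is one way to do it; the function names are hypothetical, and the use of tabs between columns assumes that wrsamp accepts tab-separated input by default (see wrsamp(1) to confirm, or to choose a different separator).

```python
def millivolts_to_adu(millivolts, gain):
    """Quantize a value in mV to integer sample units, given the gain in
    sample units per mV (102.4 in the wrsamp example above)."""
    return round(millivolts * gain)

def write_sample_columns(rows, stream):
    """Write one line per sample time, with one column per signal.

    rows is an iterable of sequences of integer sample values; stream is
    any writable text file object.
    """
    for row in rows:
        stream.write("\t".join(str(int(value)) for value in row) + "\n")
```

Writing the rows to a file opened as data.txt would then give a file that can be passed to the wrsamp command shown above.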
Records that belong to PhysioBank never have names that include upper-case characters, so you may wish to follow the example above and include at least one upper-case character in the names of any records you create, to avoid any possibility of confusing them with PhysioBank records.
There are shortcuts that may be useful if your data happen to be in a format for which a converter is available:
- If you have EDF files, use edf2mit to convert them to PhysioBank-compatible format.
- If you have AHA Database files (or others in AHA format), use a2m and ad2m to convert them to PhysioBank-compatible format.
- If you have a .wav (audio format) file, it may be PhysioBank-compatible already, but you will need to make a PhysioBank-compatible header (.hea) file for it using wav2mit, so that WFDB applications can read the .wav file directly. (This works for most .wav files, but there are many infrequently-used variants of .wav format, and not all of them are compatible. Read on if wav2mit complains about your .wav file.)
- If your signals are in another audio file format, you may be able to use audio file conversion software such as the freely available SoX converter to create a .wav file first, and then use wav2mit to finish the conversion.
If you wish to annotate your record, see How can I annotate a record?
Reading and Writing Annotations
What is an annotation?
Informally, an annotation is a note about some feature of a signal. On this web site, an annotation is a tag (label) that "points" to a specific sample of a digitized recording.
Most PhysioBank databases include annotations for each record. In some cases, these may be reference annotations that have been independently reviewed by one or more (usually, two) human experts; in others, they may be machine annotations generated by automated signal-processing and analysis software. The documentation for each database indicates what types of annotations are available.
Usually, annotations mark events that are localized in time (such as individual heart beats); sometimes, they are used to indicate persistent attributes (such as the beginning of a period of sleep). In recordings that contain two or more simultaneously recorded signals, an annotation can "point" to all signals at once, or to a specific signal.
Each annotation can be thought of as an object having six attributes: the time (the number of sample intervals that precede the sample that the annotation marks); an annotation type (anntyp [sic], usually displayed as a mnemonic annotation code; see the next question); three numeric attributes (subtyp [sic], chan, and num); and an optional string (the aux string). Only the time attribute has a fixed meaning; all of the others can be redefined to fit the characteristics of the data and the needs of the investigator.
Annotations are kept in files that exist independently of the signals that they annotate; this means, among other things, that multiple sets of annotations (created by different applications or people) can coexist, and that annotations can be read even if the signals to which they refer are not available.
Within an annotation file, annotations are stored in a compact binary format. See the questions and answers below for information about reading annotation files.
What do the annotation codes (N, V, S, F, ...) mean?
WFDB applications such as those used by the PhysioBank ATM display annotations using these and other codes. When these codes are used to annotate ECGs, N is a normal sinus beat, V is a ventricular ectopic beat, S is a supraventricular ectopic beat, and F is a fusion of a normal beat and a ventricular ectopic beat. These and many others are described here.
What is the format of the annotation files?
Most annotations occupy two bytes, of which 10 bits contain the time interval (in units of sample intervals) from the previous annotation, and 6 contain an annotation type code. Special type codes allow for annotations at intervals that exceed 1023 sample intervals, and for other numeric and text fields to be associated with individual annotations. See annot(5) in the WFDB Applications Guide for details.
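To make the two-byte layout concrete, here is a minimal Python sketch that splits one byte pair into its two fields. The function name is hypothetical. The sketch assumes that the pair is stored least significant byte first, with the type code in the six most significant bits and the interval in the ten least significant bits; this is our reading of annot(5), which should be treated as the authority. It does not interpret the special type codes mentioned above.

```python
def decode_annotation_word(low_byte, high_byte):
    """Split one two-byte annotation into (type_code, interval).

    Assumed layout: little-endian 16-bit word, type code in the top
    6 bits, interval (in sample intervals) in the bottom 10 bits.
    """
    word = low_byte | (high_byte << 8)
    type_code = word >> 10
    interval = word & 0x3FF
    return type_code, interval
```

Because the interval field has only 10 bits, the largest value it can hold is 1023, which is why longer intervals need the special type codes.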
How can I read annotation files?
If you would like to read annotation files within a C, C++, Fortran, Java, Perl, or Python program, see the WFDB Programmer’s Guide. Other programming languages supported by SWIG may also be usable with the WFDB library, but have not been tested. Briefly, use annopen() to open the files, and getann() to read annotations from them.
If you would like to do this within a Matlab or Octave program, we recommend using the WFDB Toolbox for MATLAB. For an overview of this solution and a variety of alternatives of varying degrees of complexity, see Reading and writing PhysioBank and compatible data on the Contributed software for Matlab and Octave page. Note that Matlab and Octave are not able to import annotation files directly.
Another possibility is to convert the portions of interest into text format using rdann (described in detail in How to obtain PhysioBank data in text form). rdann can be downloaded as part of the WFDB Software Package and run on your own computer. To save rdann’s output in a file, or to read rdann’s output using another program, see this note. Alternatively, annotation files found on this web server can be converted into text using the PhysioBank ATM, which can be accessed using your web browser. The format of rdann’s text output is described below.
What does the error “annopen: can’t read annotator ... for record ...” mean?
This message can be produced by any application linked to the WFDB library that attempts to read annotation files. In order to do so successfully, these applications need to find the annotation file for the annotator and input record you specify. The message indicates that the annotation file was not found in any of the expected places, or that it was unreadable. There are several common reasons why this can happen:
- The record name supplied to the application is not correct. Note that record names are not file names; if you wish to read, for example, an annotation file named 100.atr using rdann, you must specify the name of the record to which this file belongs (100) after the -r option, and not the name of the annotation file itself. Whatever follows “for record ...” in the error message is what the application thinks is the name of the record you wish to read.
- The annotator name supplied to the application is not correct. Note that annotator names are not file names, either; if you wish to read, for example, an annotation file named 100.atr using rdann, you must specify the annotator name of the file (its suffix, atr) after the -a option, and not the name of the annotation file itself. Whatever follows “annotator ...” in the error message is what the application thinks is the annotator name of the file you wish to read.
- The annotation file may not be in the WFDB path. Check this using wfdbwhich, as in
wfdbwhich 100.atr
If wfdbwhich cannot find the annotation file, copy or move it into any of the locations in the WFDB path (listed by wfdbwhich), or add the directory containing the annotation file to the WFDB path. For further information, see “The Database Path and Other Environment Variables” in the WFDB Programmer’s Guide.
Where do I get rdann?
It’s part of the free, open source WFDB Software Package. Both C sources and binaries for several popular operating systems are available.
How do I use rdann?
Install it, then type rdann for a brief summary of options. For details, see rdann(1) in the WFDB Applications Guide.
The format of rdann’s output is described in the answer to the next question.
To save rdann’s output in a file, or to read rdann’s output using another program, see this note.
What are the columns in rdann’s output?
If you add the -v option at the end of the command line, rdann prints a set of column headings above the first annotation line.
The output contains one annotation per line; from left to right, each line contains the time of the annotation in hours, minutes, seconds, and milliseconds; the time of the annotation in sample intervals; a mnemonic for the annotation type; the annotation subtyp [sic], chan, and num fields; and the auxiliary information string, if any. The meanings of the annotation type mnemonics and of the other fields are discussed here.
For example, if we read the first five seconds of the reference (atr) annotations for record 200 of the MIT-BIH Arrhythmia Database using the command
rdann -r mitdb/200 -a atr -t 5
then we obtain this output:
0:00.186    67  +  0  0  0  (B
0:00.625   225  V  1  0  0
0:01.352   487  N  0  0  0
0:01.913   689  V  1  0  0
0:02.677   964  N  0  0  0
0:03.186  1147  V  1  0  0
0:03.980  1433  N  0  0  0
0:04.472  1610  V  1  0  0
Each of these eight lines contains one annotation. The third column shows the annotation mnemonics, and by referring to the table of mnemonics, we can see that the ‘+’ in the first annotation indicates that it marks the underlying rhythm of the beats that follow; the rhythm type is ventricular bigeminy, specified by the “(B” that appears in the aux field at the end of the line; see this table for descriptions of rhythm annotation strings such as “(B”.
The remaining seven lines each mark a QRS complex, associated with either a
normal (N) or premature ventricular (V) beat. Times in the first and second
columns indicate when the events marked by the annotations occur. For example,
the first V beat occurs 0.625 seconds (625 milliseconds), or 225 sample
intervals, after the beginning of the record. (A quick calculation shows that
one sample interval is 2.777... milliseconds for this record, or that its
sampling frequency is 360 Hz. Sample intervals may vary between records.)
The subtyp, chan, and num fields in columns 4, 5, and 6 are usually zero in reference annotation files, but occasionally one or more of these fields is used to indicate additional information, as in this case, in which the subtyp field in the V annotations indicates which of several ventricular ectopic beat morphologies has occurred. See the documentation for the associated database to see how to interpret these fields.
In some cases, the times in the first column may be enclosed in square brackets [like this]. This format indicates that the times are given as times of day (in the local time zone where the recording was made). Bracketed times may also include the date (in DD/MM/YYYY format), if this information is available. If the time of the beginning of the recording is not available, the times in the first column are not bracketed, and in this case they represent the elapsed time from the beginning of the recording.
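If you want to read rdann’s text output from a program of your own, the column layout described above can be parsed along the following lines. This Python sketch is only an illustration: the class and function names are hypothetical, it assumes that columns are separated by spaces or tabs, and it assumes that output produced with the -v option has had its heading line removed first.

```python
from typing import NamedTuple, Optional

class Annotation(NamedTuple):
    time: str            # first column, exactly as printed
    sample: int          # second column: time in sample intervals
    code: str            # annotation type mnemonic, e.g. "N" or "V"
    subtyp: int
    chan: int
    num: int
    aux: Optional[str]   # None when the aux column is absent

def parse_rdann_line(line):
    """Parse one line of rdann output into an Annotation."""
    line = line.strip()
    if line.startswith("["):
        # Bracketed times of day may contain a space (before the date).
        end = line.index("]") + 1
        time, rest = line[:end], line[end:]
    else:
        time, rest = line.split(None, 1)
    fields = rest.split(None, 5)
    aux = fields[5] if len(fields) > 5 else None
    return Annotation(time, int(fields[0]), fields[1],
                      int(fields[2]), int(fields[3]), int(fields[4]), aux)

def elapsed_seconds(time):
    """Convert an unbracketed elapsed time such as "0:00.625" to seconds."""
    seconds = 0.0
    for part in time.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds
```

Applied to the line for the first V beat in the example above, dividing the sample number by the elapsed time (225 / 0.625) reproduces the 360 Hz sampling frequency. Because the printed times are rounded to milliseconds, the same calculation on other lines gives only an approximate value.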
I can’t run rdann. Can you please send me a copy of ... in text format?
Yes. Go to the PhysioBank ATM, request the data of interest, and save the results in your browser.
How can I create an annotation file?
How can I annotate a record?
If the signals you wish to annotate are not already in a PhysioBank-compatible format, including .dat and .hea files, follow the instructions in How can I create a PhysioBank-compatible record from my own data?
If your record contains ECG or blood pressure signals, you may wish to make a beat annotation file. There are several ways to create one using PhysioToolkit software:
- Use sqrs, a good, fast and simple QRS detector.
- Use wqrs, a reasonably fast QRS detector that generally works better than sqrs.
- Use ecgpuwave, a very good QRS detector that also locates the P- and T-waves and their boundaries. Follow the link above to obtain ecgpuwave.
- Another PhysioToolkit application, wabp, can create a beat annotation file from a blood pressure signal.
You should always review the beat annotation file generated by any of these detectors; although all of them work well in most cases, there is wide variability among recordings, and any detector will make errors if the data quality is insufficient. There are, once again, several ways to do this:
- Use WAVE, an interactive viewer that permits you to correct QRS detection errors manually if you wish to do so; you can also create annotation files from scratch (completely manually) using WAVE. WAVE is included in the WFDB Software Package and runs under FreeBSD, GNU/Linux, Mac OS X, MS-Windows, and Solaris.
- Use pschart or psfd. Both produce PostScript output. If you don’t have a PostScript printer, you can view or print the output using GhostScript (a free PostScript-compatible rasterizer; follow the link to obtain GhostScript from its developers).
All four of these detectors mark all detected beats as normal (N). If your record includes abnormal beats, change the N annotations for these beats to the correct annotations (a complete list of annotation types can be found here). This can be done manually using WAVE.
Another possibility is to use OSAS, free software for QRS detection and beat classification available from its author (follow the link for details). This may be particularly helpful if your records contain more than a handful of abnormal beats, since OSAS can find most abnormal beats and annotate them appropriately, but it is still necessary to review the automatically-generated annotation file and correct any errors.
If you have annotations (or their equivalent) that must be converted into PhysioBank-compatible annotation file format, it may be easiest to convert them first into a text format that can be read by rr2ann, which can then be used to produce the desired (binary) annotation file. If you wish to use any of the optional annotation attributes (subtyp, chan, num, or aux), rr2ann will not be sufficient. In this case, you may wish to convert your data first into rdann’s output (text) format; this can be read as input by wrann, which will convert the data into PhysioBank-compatible annotation format. If you do this, note that the first column (time in hours, minutes, and seconds) must be present but need not be valid, since wrann determines the annotation times from the second column (time in sample intervals); note also that entries in the last column may be omitted for any annotations that have an empty aux field. Both rr2ann and wrann read text-format data from their standard input.
Software
I double-clicked on the program icon, and nothing happens!
I typed the program name in the ‘Run...’ dialog, and nothing happens!
Don’t do this!
With few exceptions, PhysioToolkit applications run in text mode (i.e., they do not include a graphical user interface). These programs are intended to be run within a terminal emulator using a command-line interface. In most cases, if you attempt to run them by clicking on their icons or names, or by entering the program name in the MS-Windows Run... dialog box, these programs will open a DOS box, print a usage summary, and exit, usually much too fast for you to read anything.
By far the best way to use these programs under MS-Windows is to install a Unix-compatible terminal emulator and shell in which to run them. The best of these is also free; if you have not already done so, download and install the Cygwin software package. This package includes bash (the GNU Bourne Again Shell), and a terminal emulator in which to run it. After a standard installation of Cygwin, you can launch a terminal emulator and bash by clicking on the Cygwin icon that will have been installed on your desktop.
If you do not wish to use Cygwin, it is possible to run text-mode applications under MS-Windows within a DOS box, but there are many limitations of command.com that may prove frustrating. In particular, command.com supports a relatively small space for environment variables that is not secure against buffer overruns, and has idiosyncratic filename globbing behavior.
What is a “standard input” or a “standard output”?
These concepts are common to all text mode applications (see the previous question). A program’s standard input is whatever it reads from the keyboard (i.e., whatever you type into its terminal emulator window once the program begins to run). A program’s standard output is whatever it prints in its terminal emulator window. There are (of course) exceptions, and the exceptions are what make these ideas useful!
First, it’s possible to redirect either or both of the standard input and the standard output before the program begins to run, by adding appropriate parameters to the command line. So, for example, a program named pour can read its standard input from a file named teapot, and then write its standard output to another file named teacup, using a command such as:
pour <teapot >teacup
(For an explanation of this command, see the answer to the next question.)
Second, most applications have an additional standard error output that is also printed in the terminal window, intermingled with the standard output. The standard error output is reserved for warning and error messages. If you redirect the standard output to a file, the standard error output still appears in the terminal window (and is not copied into the file). In most cases, this is useful behavior, since it allows you to see quickly if there have been any errors or warnings without the need to look through what may be lengthy output. If you wish, you can capture the standard error output in its own file using a command such as:
frobnicate <input >output 2>errors
How can I save the output of ... in a file?
How can one program read another’s output?
If you are running programs from a command prompt (by typing commands into a terminal emulator window or an MS-DOS box), these things can be done easily.
If you have ever used GNU/Linux, Unix, or MS-DOS, you may have captured the output of a program by redirecting it to a file, like this:
foo >bar
The > operator redirects foo’s standard output (which would normally appear on-screen) into a file named bar. If bar exists already, its contents are replaced. If you wish to append foo’s output to whatever is already contained in bar, use a command such as this instead:
foo >>bar
There is an analogous operator that arranges for a program’s standard input (which would normally be read from whatever you type on the keyboard) to be read from a file instead:
baz <bar
Here, the < operator arranges for baz to read its input from a file named bar. If bar was created by foo, then this command allows baz to read foo’s output.
You can combine input and output redirection in a single command using the pipe (|) operator:
foo | baz
This command runs foo and sends its standard output directly to baz, without requiring an intermediate file. True multitasking operating systems such as Unix, GNU/Linux, and Mac OS X allow both programs to run (apparently) simultaneously; under MS-DOS and older, MS-DOS-based versions of MS-Windows, the first program runs to completion before the second one begins execution.
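As a concrete illustration of the four operators, here is a short sketch that uses the standard printf, sort, and head commands in place of foo and baz. The file names fruit, sorted, and first are arbitrary and chosen only for this example.

```shell
# Work in a scratch directory so that no existing files are overwritten.
workdir=$(mktemp -d)
cd "$workdir"

printf 'banana\napple\n' >fruit     # > creates fruit (or replaces its contents)
printf 'cherry\n' >>fruit           # >> appends to what fruit already contains
sort <fruit >sorted                 # < makes sort read its standard input from fruit
sort <fruit | head -n 1 >first      # | sends sort's output directly to head

cat sorted    # apple, banana, cherry (one per line)
cat first     # apple
```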
You can use these techniques whenever you run programs from a command prompt, whether those programs are among those available here or obtained from some other source. You can use the same techniques with programs you write yourself; the only requirement is that your programs must read from the standard input and write to the standard output (i.e., they must not attempt to bypass the standard input/output mechanism by reading directly from the keyboard or writing directly to the screen).
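To make that requirement concrete, here is a minimal sketch of a filter of your own. count_lines is a hypothetical shell function, not part of any package on this site; it reads only from its standard input and writes only to its standard output, so it works unchanged with a redirected file or at the end of a pipe.

```shell
# Work in a scratch directory so that no existing files are overwritten.
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical filter: counts the lines arriving on standard input
# and prints the count on standard output.
count_lines() {
    n=0
    while IFS= read -r line; do
        n=$((n + 1))
    done
    echo "$n"
}

printf 'a\nb\nc\n' >three

count_lines <three                    # input redirected from a file
printf 'a\nb\nc\n' | count_lines      # input arriving through a pipe
```

Both commands print the same count, because the filter never needs to know where its input comes from.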
These operators (>, >>, <, and |) are supported by all shells (command interpreters) under Unix, GNU/Linux, Mac OS X, and MS-DOS (including those that run within MS-DOS boxes or other types of terminal emulators under MS-Windows). For further information, please refer to the documentation for your shell or command interpreter.
My question is about WAVE (or gtkwave). Is there a WAVE FAQ?
Yes, look here for answers to many frequently asked questions about WAVE. The gtkwave project is no longer active, since WAVE now runs on all of the popular platforms, including those formerly supported by gtkwave only.
I tried to compile ... but the compiler can’t find wfdb.h (or ecgcodes.h, or ecgmap.h).
These files are included with the WFDB library. Most of the PhysioToolkit applications use at least one of them; if you are trying to compile such an application, you will need to have installed the WFDB library and its *.h files first. The easiest way to do this is to install the WFDB Software Package, which includes the WFDB library and many of the PhysioToolkit applications. Find instructions for doing so in the quick start guide for your platform (FreeBSD, GNU/Linux, Mac OS X (Darwin), MS-Windows, and Solaris), or on the WFDB Software Package introductory page.
If you have already installed the WFDB Software Package and your compiler is still complaining, the WFDB *.h files may not be installed in any of the directories where your compiler is looking for them. Use wfdb-config to find out where they are.
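For example, the following sketch asks wfdb-config where the headers and library were installed. It assumes that wfdb-config is on your PATH and that it accepts the --cflags and --libs options described in its man page; check the man page for your installed version if either command fails.

```shell
if command -v wfdb-config >/dev/null 2>&1; then
    # Compiler options (typically a -I option naming the directory
    # that contains wfdb.h and the other WFDB *.h files).
    wfdb-config --cflags
    # Linker options needed to link a program to the WFDB library.
    wfdb-config --libs
    status="found"
else
    echo "wfdb-config not found; is the WFDB Software Package installed?" >&2
    status="missing"
fi
```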
I tried to compile ... but the compiler complains that isigopen (or iannopen, or strtim, or wfdbinit) is undefined.
These are among the functions defined in the WFDB library; most of the PhysioToolkit applications use at least one of these functions. If you are trying to compile such an application, it must be linked to the WFDB library. If you have not yet installed the WFDB library, see the answer to the previous question.
For details on how to link to the WFDB library, see Compiling a Program with the WFDB Library in the WFDB Programmer’s Guide.
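As a rough sketch of what such a command can look like with gcc: myprog.c below is a hypothetical source file of your own that calls WFDB library functions, and the wfdb-config options are assumed to behave as described in its man page. The WFDB Programmer’s Guide remains the authoritative reference.

```shell
src=myprog.c    # hypothetical source file that calls isigopen(), etc.

if command -v wfdb-config >/dev/null 2>&1 && [ -f "$src" ]; then
    # The library options come after the source file, so that the linker
    # can resolve the WFDB functions that the program refers to.
    if gcc $(wfdb-config --cflags) -o myprog "$src" $(wfdb-config --libs); then
        result="built"
    else
        result="failed"
    fi
else
    echo "skipping: need wfdb-config on the PATH and $src in this directory" >&2
    result="skipped"
fi
```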
I’m writing a program to work with PhysioBank data, but my compiler can’t link to the WFDB library. What should I do?
If you are using one of the precompiled versions of the library, be sure that you have the correct version for use with your compiler and operating system. If there is none available, you have two reasonable choices:
- Get the WFDB library sources and, if desired, the libcurl or libwww sources, and compile them yourself. Please contribute the binary to PhysioToolkit once you have tested it and are sure it works properly.
- Use a compiler for which a precompiled version of the library is available, such as the free and open-source GNU C compiler for GNU/Linux, Mac OS X, MS-DOS, MS-Windows, and Solaris.
Where on this site can I find software for my favorite operating system/compiler?
Look in PhysioToolkit, the repository for all software available on this site. With very few exceptions, the software available here is portable among all popular operating systems, including GNU/Linux, Mac OS X, MS-Windows, and Unix. Since all of it is provided in source form, you can compile it (using free or proprietary compilers) into binaries that can run under any of these operating systems.
For convenience, some PhysioToolkit software is also available as ready-to-run binaries for a variety of operating systems.
Generally, the same sources can be compiled without modification under any supported OS or compiler; you will not find separate sets of sources for different compilers or platforms. Following conventions used by most free or open-source software, look for files named README or INSTALL in each software package; these files indicate what’s included in the package, and how to compile it from the sources.
Most PhysioToolkit software is written in portable (ANSI/ISO standard) C. In most cases, ANSI/ISO C code can also be compiled by standard C++ compilers. There is a small amount in other languages, including Fortran 77 and Matlab/Octave m-code. If you don’t have a C or C++ compiler, we strongly recommend the excellent and free GNU Compiler Collection (gcc), which includes C, C++, and Fortran 77 compilers (among others), and is available for a vast range of platforms, including GNU/Linux, Mac OS X, MS-Windows, and all versions of Unix.
If you wish to write your own software to work with PhysioBank data, the WFDB library provides standard, portable interfaces in C, C++, and Fortran for doing so. The wfdb-swig package provides Perl, Python, Java, and C# interfaces to the WFDB library. Matlab can use any of several compatible APIs.
Although it is possible to compile PhysioToolkit software using proprietary compilers, you are generally on your own if you choose to do so; we don’t use these compilers ourselves, and we can’t help you learn how to use them.
Can I use your code in my commercial application?
Yes. There are two different categories of PhysioToolkit code, and the rules for using them are slightly different.
The WFDB library is free under the GNU Lesser General Public License (LGPL). The LGPL permits you to use (or sell, or give away) the library with your own code. The only significant restriction is that you must make the sources for the library itself freely available. You do not need to disclose the sources for your own code simply because you have used the WFDB library with it.
All of the remaining PhysioToolkit software (the applications) is free under the GNU General Public License (GPL). What this means in simple terms is that you can sell it or give it away to others, but if you do so, you must distribute the sources under the same terms as those under which you received them.
If you incorporate GPL code into your own code, the resulting code must be distributed under the GPL or not at all; this is the so-called “viral” property of the GPL. What this implies is that you cannot simply make minor (or even major) modifications to free code and then sell it without honoring the original terms under which you received it.
There are ways to use GPL code together with proprietary code, however. For example, software that reads output from a GPL program (or that writes data to be read by a GPL program) does not automatically fall under the GPL. As another example, you may incorporate GPL code in a plugin for a proprietary program, but the sources for the plugin itself would have to be made available under the GPL.
Contributors of software may choose another license conforming to the Open Source Definition (OSD), so that, in the future, other licenses may apply. Other OSD licenses have provisions very similar to those outlined above.
How should I report a bug?
First, be sure that it is a bug. Try to reproduce it. Try doing so on another computer if possible.
If you have not read How to Report Bugs Effectively, please take a few minutes to do so. (Important: do not send bug reports about PhysioNet to the author of How to Report Bugs Effectively; he is an innocent bystander.)
Bug reports should provide enough specific information to permit duplicating your problem. At a minimum, this information includes:
- the name and version number of the software in which you found the bug, and the location on PhysioNet where the software can be found
- the name and version number of your operating system (e.g., Mac OS X 10.4, Fedora Core 4, Windows XP Professional)
- the exact command or sequence of events needed to replicate the problem
- an exact copy of any text output, including any errors or warning messages encountered
- the symptoms of the bug (how the output varied from what you expected, e.g., “v0 is smaller than it should be by a factor of 400”)
Do not send binary input or output files or core dumps unless requested. If you can reproduce the problem using input data available on PhysioNet, please tell us how to do so.
Carefully written bug reports are very valuable to us; we want our software to work reliably, we are grateful for information that helps us to fix defects, and we acknowledge the help of those who send us useful bug reports. If you wish to remain anonymous, please let us know when you write.
If you are able, by inspection of the sources, to locate the cause of a problem, tell us what you discover. If you can fix the problem yourself, send us a patch against the latest sources. These things, though very much appreciated, are not essential components of a useful bug report, however; what is essential is an accurate description of the symptoms of the bug. In some cases, what we think of as a feature may be what you think of as a bug; please help us understand what looks wrong to you. Without this context, a patch may be of little use to us.
All software on this site is provided in source form. If the documentation for the software in which you have found a bug does not provide an email address for bug reports, find it at the top of the source file. Please send all bug reports to both the author/maintainer and PhysioNet.
Help!
Some links don’t work, but I don’t see any error. Why not?
On PhysioNet, links to external sites (URLs that point outside of the PhysioNet domain) are designed to open the external URL in a separate window or tab. In most cases, this window or tab will open on top (in front) of the window that contains the link, but your browser and your window manager or operating system may override this behavior, especially if the second window was already open and was hidden. If clicking on a link doesn’t seem to do anything, check to see if there is a second browser window that is hidden behind other windows, or iconified (minimized, closed).
If you are sure that a link is broken, please send a note about it to webmaster@physionet.org.
I’m having trouble viewing images on this site. Why?
Most of the graphics on PhysioNet, including all of the dynamically generated graphics, are PNG images. PNG has been a W3C recommendation since 1996, and is one of only three standard image types that are rendered by all current graphical web browsers, most of which have supported PNG for ten years or more. The other supported types are JPEG (which uses lossy compression and is best suited for continuous-tone graphics such as photos) and GIF (which uses a lossless compression algorithm that is inferior to PNG’s). If you have an obsolete browser, upgrading it should fix this problem.
The QuickTime plugin sometimes interferes with some browsers’ built-in capability of rendering PNG images, however, notably when using MSIE. To avoid these problems, update or uninstall QuickTime, or use another browser such as Chrome or Firefox.
I’m having trouble printing PostScript files from this site. Why?
PostScript versions of books and papers available here are ready to be printed on a PostScript printer without any additional formatting. Some users have experienced problems, particularly with older PostScript versions of the WFDB Applications Guide (which consists of several PostScript documents concatenated together into one file). Software that attempts to insert additional PostScript code, or that attempts to reformat these files rather than simply printing them as is, is generally the cause of these difficulties. MS-Windows users can use GSView to view or print these files. Under Unix, GNU/Linux, or Mac OS X, simply print the files using lp or lpr, or view them using gv. (Both GSView and gv require GhostScript to render PostScript or PDF input.) If your printer has insufficient memory, it may stop after printing part of the file; in this case, try using GSView or gv to print the file in sections.
If a PDF version of the file is available here, you may also wish to try printing it using GSView, gv, or xpdf (all three of these are free and open-source), or Adobe Acrobat Reader (free binaries, closed-source).
I don’t understand how to use the software or data on this site.
Go back to the beginning of this FAQ and read it carefully. Still confused? Read on....
Are you looking for something specific? Examples might include:
- a set of data or software from a published study
- an answer to a question
- data of a specific type
- software to solve a specific problem
If so, try using the Search tool. All text on the PhysioNet web site is indexed and can be found by searching for it. To do this, type one or more terms related to your topic or question into the search box in the top right corner of this and almost every other page on PhysioNet, then click on the “Search” button to its right.
Have you found something relevant to your interest, but don’t know how to use it? If so, look for tutorial materials that can help you get started. Browse through PhysioNet’s list of tutorials, or use a PhysioNet search to find information to help you get started.
If you have a question about a specific page in the PhysioNet web site, click on the “webmaster@physionet.org” link at the bottom of that page; doing this opens a preaddressed email window, with the URL of the page filled in as the subject, which will help us to understand the context of your question and to give you a relevant answer.
Before writing, please formulate specific questions (“I don’t understand how to use the data.” or “The software doesn’t work, please help me!” are examples of non-specific questions that cannot be usefully answered). Whoever replies to your question cannot read your mind; if you don’t say clearly what you need to know, you will not get a satisfactory answer.
Don’t be offended if the reply to your question is “Read the FAQ!” (this page). If the answers aren’t here, or if they aren’t clear, write again, and try to be more specific, or to point out what’s confusing or missing in the FAQ. Doing so will not only help you to get a useful answer, but it will also help us to write a better FAQ.
What’s a man page?
A man page is a concise description of how to use a piece of software, intended to be read using man or one of its work-alikes. Think of man pages as pages from a reference manual.
All Unix platforms, including GNU/Linux and Mac OS X, as well as Cygwin/MS-Windows, include a program called man that can be used to find and display man pages. This is the standard form of documentation for all Unix software. Almost all PhysioToolkit applications have man pages. The near-universality of man pages means that you are very likely to be able to learn about any program by typing its name as an argument to a man command, as in:
man tar
which will display the man page that describes the standard tar command. On most platforms, the output of man is sent through a program such as more, which allows you to read it one screenful at a time; you may usually advance to the next screenful by pressing the space bar, go back a screenful by typing ‘b’, or exit by typing ‘q’.
The format of man pages is fairly rigid, which allows a variety of software to extract useful information from them for purposes of indexing, cross-referencing, etc. They are not intended as tutorial material, but once you are familiar with their format, reading them is usually the quickest way to learn how to use the software they document.
The largest collection of man pages on PhysioNet is the WFDB Applications Guide, which includes not only the man pages for the roughly 70 applications in the WFDB Software Package, but also those for a number of contributed applications that are compatible with WFDB Software. These pages can be read within your web browser; if you download and install the WFDB Software Package on your own computer, you can use man to read the local copies of these man pages that will be installed together with the software itself.
A distinctive feature of PhysioNet’s man pages is the Sources section at the end of each one, with one or more URLs that give the location of the software sources. This feature is particularly handy if you are reading a man page in your web browser and would like to refer to the source of a program in order to see how it is implemented.
The Computer Science Department of McGill University offers a gentle introduction to reading man pages that will help you get started if you haven’t used man pages previously.
Although you won’t find the acronym “RTFM” used elsewhere on this site, it refers to the usefulness of Reading The Fine Manual (i.e., the man pages) to inform yourself about software that may be unfamiliar. Try it!
Are PhysioNet or its mirror maintainers responsible for the content of external sites?
No.
Why isn’t my question here?
This FAQ is revised frequently, and we may not have got to your question yet. It’s possible that yours is an infrequently, maybe even never-before, asked question. No matter, we’d like to hear it, and we’ll try to answer it as quickly as possible. Please send us feedback by following the link below!
What’s the magic word?
“Please.”
If you would like help understanding, using, or downloading content, please see our Frequently Asked Questions. If you have any comments, feedback, or particular questions regarding this page, please send them to the webmaster. Comments and issues can also be raised on PhysioNet's GitHub page. Updated Monday, 07-Aug-2017 15:55:13 CEST
PhysioNet is supported by the National Institute of General Medical Sciences (NIGMS) and the National Institute of Biomedical Imaging and Bioengineering (NIBIB) under NIH grant number 2R01GM104987-09.