This is not the user-friendly search tool that you are looking for! We are developing such a tool, which will encapsulate the procedures described below within a web-accessible GUI, and we will post a pointer to it here as soon as it's ready to be used.
If you don't mind a bit of typing, however, this page describes how to use grep, cut, and uniq to find records in PhysioBank that have desired features (such as specific combinations of signals). If you are familiar (or become familiar) with other standard command-line tools for text manipulation such as sort and join, you will be able to do much more.
The necessary command-line tools are standard components of Linux, Mac OS X, and all other Unix and Unix-like platforms; Windows users can get them by installing Cygwin.
The PhysioBank Index
All records in PhysioBank that can be viewed by the PhysioBank ATM (nearly 30,000 as of June 2011, from over 50 collections) are indexed in this text file:
physiobank-index (392M, last updated Wednesday, 17 April 2019)
Each line of the PhysioBank Index describes one signal, annotation file, or other feature of a single record; there are about 420,000 lines in the Index. All lines pertaining to any given record are consecutive, and the records appear in dictionary order. The first few lines of the Index are:
aftdb/learning-set/n01  ECG1   ECG   128  142.045 adu/mV  60
aftdb/learning-set/n01  ECG2   ECG   128  143.062 adu/mV  60
aftdb/learning-set/n01  AnnM1  qrs   128  76              60  0-60
aftdb/learning-set/n01  AnnR2  qrsc  128  76              60  0-60
aftdb/learning-set/n02  ECG1   ECG   128  202.429 adu/mV  60
aftdb/learning-set/n02  ECG2   ECG   128  202.429 adu/mV  60
aftdb/learning-set/n02  AnnM1  qrs   128  73              59  1-60
aftdb/learning-set/n02  AnnR2  qrsc  128  73              59  1-60
The lines above describe records n01 and n02 of the collection named aftdb/learning-set. (The file DBS contains short descriptions of each collection; aftdb is the AF Termination Challenge Database, which contains a learning set and two test sets of records.)
Each line of the Index contains up to seven tab-separated columns that describe a signal, annotation set, or feature associated with the record. For lines describing signal and annotation sets, these columns are (from left to right):
- Record name
- Class
- Signal or annotator name
- Sampling frequency (Hz)
- Gain (adu per physical unit), or number of annotations
- Duration (in seconds)
- Time intervals during which samples* or annotations are present (in seconds)
Lines describing features are not present for all records; they are described below.
Class is the category of data: either a category of signals (defined in sigclasses), a category of annotations (AnnM for machine-derived annotations, or AnnR for reference annotations), or a category of features associated with the record (AgeSex, Meds [medications], Diag [diagnoses], or Info [other information about the subject or the recording]). A sequence number is affixed to each instance of the class if more than one instance is possible in a single record (e.g., ECG1, ECG2, etc.); this is done even if only a single instance is actually present.
An adu is one analog-to-digital converter unit (the quantization step, which is the smallest measurable difference between samples). An amplitude resolution of 20 adu/mmHg means that two unscaled samples that differ by 20 units represent a pressure difference of 1 mmHg.
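In other words, a difference measured in adu is converted to physical units by dividing it by the gain G. As a worked example, using the 20 adu/mmHg figure above and the 142.045 adu/mV gain of signal ECG1 in record aftdb/learning-set/n01 (the sample differences of 50 and 284.09 adu are chosen only for illustration):

```latex
\Delta v = \frac{\Delta s}{G}, \qquad
\frac{50\ \text{adu}}{20\ \text{adu/mmHg}} = 2.5\ \text{mmHg}, \qquad
\frac{284.09\ \text{adu}}{142.045\ \text{adu/mV}} = 2\ \text{mV}
```

Here \Delta s is a difference between two raw samples (in adu) and \Delta v the corresponding difference in physical units. This applies to differences only; converting an individual sample value would also require the converter's baseline, which is not one of the Index columns.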
* In most cases, signals are present throughout, and the last column is omitted. The MIMIC II Waveform Database is an exception to this rule.
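To see how these columns line up in practice, the sketch below rebuilds the four n01 lines quoted earlier as a small TAB-separated file and uses awk to label each field. The file name sample-index is hypothetical; substitute physiobank-index to run it on the real Index. Note that the gain field (for example, 142.045 adu/mV) is a single column containing a space, which is why the field separator is set to TAB rather than left at awk's default of whitespace. The test for feature lines assumes their classes all begin with AgeSex, Diag, Med, or Info, as described above.

```shell
# The four aftdb/learning-set/n01 lines quoted above, TAB-separated
# (hypothetical file name). Signal lines have 6 fields, annotation lines 7.
printf '%s\t%s\t%s\t%s\t%s\t%s\n' \
  aftdb/learning-set/n01 ECG1 ECG 128 '142.045 adu/mV' 60 \
  aftdb/learning-set/n01 ECG2 ECG 128 '143.062 adu/mV' 60 > sample-index
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
  aftdb/learning-set/n01 AnnM1 qrs  128 76 60 0-60 \
  aftdb/learning-set/n01 AnnR2 qrsc 128 76 60 0-60 >> sample-index

# Label each column. Field 5 is a gain for signals but a count for annotations.
awk -F '\t' '
  $2 ~ /^(AgeSex|Diag|Med|Info)/ { next }   # skip feature lines
  $2 ~ /^Ann/ { printf "%s: %s annotations by %s (%s Hz), %s s\n", $1, $5, $3, $4, $6; next }
              { printf "%s: signal %s (%s Hz), gain %s, %s s\n", $1, $3, $4, $5, $6 }
' sample-index
```

The first line printed should read aftdb/learning-set/n01: signal ECG (128 Hz), gain 142.045 adu/mV, 60 s.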
Typical record feature lines appear below:
iafdb/iaf1_afw  Diag1   Atrial Fibrillation
iafdb/iaf1_afw  Meds1   Atenolol, Monopril
iafdb/iaf1_afw  Info1   Adenosine injected at 70 sec
iafdb/iaf1_afw  Info2   Note: signals are uncalibrated
iafdb/iaf1_afw  AgeSex  81  F
As with the signal and annotation lines, the first two columns are the record name and class (data type). The first four feature lines shown above illustrate diagnoses, medications, and two lines of free-text information; in each, the data appear in the third column. The final feature line contains the age (in years) in the third column and the sex (M, F, or ?) in the fourth column. If the subject's age is over 89, it is shown as 90 (since ages over 89 are protected health information); if the age was not recorded, it is shown as -1. Ages of infants less than 1 year old may be shown as 0, or as a decimal fraction of a year (e.g., 0.3).
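Because age and sex occupy fixed columns of the AgeSex line, they can be filtered numerically. The sketch below rebuilds the five iafdb/iaf1_afw feature lines shown above in a hypothetical file named sample-features, selects female subjects aged 80 or over, and separately finds records with a diagnosis line mentioning atrial fibrillation. The age threshold is arbitrary; keep in mind that 90 stands for any age over 89 and -1 for an unrecorded age, so an upper bound (for example, ages below 30) would also need to exclude -1 explicitly.

```shell
# The iafdb/iaf1_afw feature lines shown above, TAB-separated
# (hypothetical file name).
printf '%s\t%s\t%s\n' \
  iafdb/iaf1_afw Diag1 'Atrial Fibrillation' \
  iafdb/iaf1_afw Meds1 'Atenolol, Monopril' \
  iafdb/iaf1_afw Info1 'Adenosine injected at 70 sec' \
  iafdb/iaf1_afw Info2 'Note: signals are uncalibrated' > sample-features
printf '%s\t%s\t%s\t%s\n' iafdb/iaf1_afw AgeSex 81 F >> sample-features

# Female subjects aged 80 or over (age in column 3, sex in column 4).
# A lower bound like this one never matches -1 (age not recorded).
awk -F '\t' '$2 == "AgeSex" && $4 == "F" && $3 + 0 >= 80 { print $1 }' sample-features

# Records with a diagnosis line mentioning atrial fibrillation (any case).
awk -F '\t' '$2 ~ /^Diag/ && tolower($3) ~ /atrial fibrillation/ { print $1 }' sample-features
```

Both commands print iafdb/iaf1_afw for this excerpt; on the real Index, a record with several matching Diag lines would be printed once per line.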
Using the PhysioBank Index
Begin by downloading the Index from the link above.
Open a terminal emulator window and navigate to the directory in which you saved physiobank-index.
There are five records in PhysioBank that include a left ventricular stroke volume signal, which is labelled SV. Finding them is simple: type
grep SV physiobank-index
and the results appear in your terminal window quickly:
slpdb/slp59   SV  SV  250  7.93846 adu/ml  14400
slpdb/slp60   SV  SV  250  7.90293 adu/ml  21300
slpdb/slp61   SV  SV  250  958.995 adu/cc  22200
slpdb/slp66   SV  SV  250  9.957 adu/ml    13200
slpdb/slp67x  SV  SV  250  5.25615 adu/ml  4620
If you were looking for such recordings, you would now know where to find them by looking at the record names in the left-hand column.
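grep SV matches the letters SV anywhere on a line, so in other searches it could also report lines where those letters happen to appear in another column (inside a free-text Info or Diag line, for instance). A slightly stricter sketch compares the class column exactly. The five lines above are rebuilt in a hypothetical file named sample-sv so the example is self-contained; on the real Index, replace sample-sv with physiobank-index.

```shell
# The five SV lines shown above, TAB-separated (hypothetical file name).
printf '%s\t%s\t%s\t%s\t%s\t%s\n' \
  slpdb/slp59  SV SV 250 '7.93846 adu/ml' 14400 \
  slpdb/slp60  SV SV 250 '7.90293 adu/ml' 21300 \
  slpdb/slp61  SV SV 250 '958.995 adu/cc' 22200 \
  slpdb/slp66  SV SV 250 '9.957 adu/ml'   13200 \
  slpdb/slp67x SV SV 250 '5.25615 adu/ml' 4620 > sample-sv

# Match SV only when it is the entire class field (column 2),
# not when it merely appears somewhere on the line.
awk -F '\t' '$2 == "SV" { print $1 }' sample-sv
```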
Getting (re)acquainted with the command line
If you've ever used any version of Unix, or even MS-DOS, the examples on this page should not look strange. If they do, consult any introductory book or on-line tutorial about Unix or Linux. Here are a few places to start:
- The PhysioNet FAQ includes basic information about standard input and output, I/O redirection, and pipes: powerful, easily understood concepts that are useful whenever you work on the command line.
- On-line tutorials, such as Working with Data or Command Line Essentials: Text and Pipeline, provide more examples of the use of the tools shown here.
- After nearly 30 years, Kernighan and Pike's The Unix Programming Environment remains the best introduction to this approach of tackling problems using tools that each do one job well, and work well together. Used copies are far less expensive than new.
If we want to find records that have at least 3 ECG signals, we can look for ECG3:
grep ECG3 physiobank-index
This results in a very long list of records that quickly scrolls off the screen. If we want to know how long the list is, we can use wc to count the lines:
grep ECG3 physiobank-index | wc -l
(The pipe symbol, '|', connects a pair of commands; it means "take the standard output of the command on the left and feed it to the standard input of the command on the right".) When this page was written, there were 6519 recordings with at least 3 ECG signals in PhysioBank. We can save the entire list by redirecting the standard output into a file, like this:
grep ECG3 physiobank-index >ECG3-records
The '>' collects the standard output of the command, which would otherwise be shown in the terminal window, and saves it in a file (ECG3-records).
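The wc -l count above is a count of lines, which equals the number of records only if each record contributes exactly one matching line. Because all lines for a given record are consecutive in the Index, uniq can collapse repeated record names without sorting them first. This sketch demonstrates the idea on the record and class columns of the eight aftdb lines quoted earlier, written to a hypothetical file named sample-records:

```shell
# Record and class columns of the eight aftdb/learning-set lines quoted earlier,
# written TAB-separated to a hypothetical file named sample-records.
for rec in n01 n02; do
  for cls in ECG1 ECG2 AnnM1 AnnR2; do
    printf 'aftdb/learning-set/%s\t%s\n' "$rec" "$cls"
  done
done > sample-records

# Number of lines (8) versus number of distinct records (2).
wc -l < sample-records
cut -f 1 sample-records | uniq | wc -l

# uniq -c also reports how many lines each record contributes.
cut -f 1 sample-records | uniq -c

# On the real Index, the same idea counts records (not lines) with an ECG3 signal:
#   grep ECG3 physiobank-index | cut -f 1 | uniq | wc -l
```

If the lines have been reordered (for example, by an earlier sort on another column), use sort -u in place of uniq, since uniq only merges adjacent duplicates.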
Suppose what we really want are the longest such recordings. Here's how to find the 5 longest cases:
grep ECG3 physiobank-index | cut -f 1,6 | sort -nr -k2 | head -5
(This command uses pipes to chain four commands together, each one reading the output of the previous one; cut selects the first and sixth fields from each line output by grep; sort rearranges the lines in reverse numerical order of the second field output by cut; and head discards all but the first five lines output by sort.) The output lists 5 recordings, each containing over 400 hours of ECG3:
mimic2db/a46013/a46013   1815699
mimic2db/a44012/a44012b  1608251
mimic2db/a40308/a40308   1577319
mimic2db/a44261/a44261c  1531616
mimic2db/a44267/a44267b  1527637
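The durations in the sixth column are in seconds. Appending a short awk step converts them to hours, which makes the "over 400 hours" figure easy to check. In this sketch the five output lines above are stored in a hypothetical file named sample-durations; on the real Index, the awk step would simply be added to the end of the earlier pipeline.

```shell
# The five "record<TAB>seconds" lines shown above (hypothetical file name).
printf '%s\t%s\n' \
  mimic2db/a46013/a46013  1815699 \
  mimic2db/a44012/a44012b 1608251 \
  mimic2db/a40308/a40308  1577319 \
  mimic2db/a44261/a44261c 1531616 \
  mimic2db/a44267/a44267b 1527637 > sample-durations

# Convert seconds (field 2) to hours with one decimal place.
# The first line reports about 504.4 h for mimic2db/a46013/a46013.
awk -F '\t' '{ printf "%s\t%.1f h\n", $1, $2 / 3600 }' sample-durations

# Full pipeline on the real Index:
#   grep ECG3 physiobank-index | cut -f 1,6 | sort -nr -k2 | head -5 | \
#     awk -F '\t' '{ printf "%s\t%.1f h\n", $1, $2 / 3600 }'
```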
There is a caveat, however: these recordings are all from the MIMIC II database, and the signals are not necessarily continuous; in fact, they may not even be simultaneously available. To find a set of long records with at least 3 continuous, simultaneous ECG signals, we can exclude the MIMIC databases and the similar Challenge 2009 database from the search:
grep ECG3 physiobank-index | \
    grep -E -i -v "mimic|challenge/2009" | \
    cut -f 1,6 | sort -nr -k2 | head -5
(Here the \ characters indicate the command continues on the following line.) The results are:
ltstdb/s30691  85860
ltstdb/s30731  85845
ltstdb/s30801  85821
ltstdb/s30741  85800
ltstdb/s30752  85736
These somewhat contrived examples illustrate the flexibility of standard command-line tools for searching the PhysioBank Index. Once you are familiar with these tools, it's easy to perform much more complex searches, including many that would be very difficult to express using a relational database and SQL.
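As one example of a more complex search, the sketch below finds records that meet two criteria at once (here, an ECG2 signal and a set of reference annotations, class AnnR) by writing one sorted list of record names per criterion and intersecting the lists with comm -12, which prints only the lines common to both sorted inputs; join could be used in the same way. The input is a hypothetical file, sample-combo, holding the record and class columns of five lines quoted on this page, so the result reflects only those lines, and the criteria are illustrative.

```shell
export LC_ALL=C   # make sort and comm agree on collation order

# Record and class columns of a few lines quoted on this page
# (hypothetical file name; results reflect only these lines).
printf '%s\t%s\n' \
  aftdb/learning-set/n01 ECG1 \
  aftdb/learning-set/n01 ECG2 \
  aftdb/learning-set/n01 AnnM1 \
  aftdb/learning-set/n01 AnnR2 \
  slpdb/slp59            SV > sample-combo

# One sorted, duplicate-free list of record names per criterion.
awk -F '\t' '$2 == "ECG2" { print $1 }' sample-combo | sort -u > has-ecg2
awk -F '\t' '$2 ~ /^AnnR/ { print $1 }' sample-combo | sort -u > has-annr

# comm -12 suppresses lines unique to either file, leaving records in both.
comm -12 has-ecg2 has-annr
```

For this excerpt the only record printed is aftdb/learning-set/n01. Further criteria can be added by intersecting more lists in turn.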
Updated Monday, 13-Jul-2015 20:52:50 CEST
PhysioNet is supported by the National Institute of General Medical Sciences (NIGMS) and the National Institute of Biomedical Imaging and Bioengineering (NIBIB) under NIH grant number 2R01GM104987-09.