A challenge from PhysioNet and Computers in Cardiology 2000
22 September 2000: The deadline for entries has passed and
no further entries will be accepted. The final scores have now been posted
here, together with links to the
abstracts submitted by entrants for presentation at Computers in Cardiology.
14 March 2003: Several of the participants in this challenge, together with the organizers, have published a paper that compares the methods used in the challenge and investigates how several of the most successful strategies can be combined. This paper can now be read on-line: [PDF] Penzel T, McNames J, de Chazal P, Raymond B, Murray A, Moody G. Systematic comparison of different algorithms for apnoea detection based on electrocardiogram recordings. Medical & Biological Engineering & Computing 40:402-407, 2002.
January 2009: Andy Fraser, who with colleagues James McNames and Andreas Rechtsteiner won event 2 and achieved a perfect score in event 1, has published a book about hidden Markov models and their applications, with a chapter detailing the winning method and variations on it: Fraser AM. Hidden Markov Models and Dynamical Systems [ISBN 978-0-898716-65-8]. Philadelphia: Society for Industrial and Applied Mathematics, 2008.
A list of the papers about the Challenge presented at Computers in Cardiology 2000 is available.
Introduction
Obstructive sleep apnea (intermittent cessation of breathing) is a common problem with major health implications, ranging from excessive daytime drowsiness to serious cardiac arrhythmias. Obstructive sleep apnea is associated with increased risks of high blood pressure, myocardial infarction, and stroke, and with increased mortality rates. Standard methods for detecting and quantifying sleep apnea are based on respiration monitoring, which often disturbs or interferes with sleep and is generally expensive. A number of studies during the past 15 years have hinted at the possibility of detecting sleep apnea using features of the electrocardiogram. Such approaches are minimally intrusive, inexpensive, and may be particularly well-suited for screening. The major obstacle to use of such methods is that careful quantitative comparisons of their accuracy against that of conventional techniques for apnea detection have not been published.
We therefore offer a challenge to the biomedical research community: demonstrate the efficacy of ECG-based methods for apnea detection using a large, well-characterized, and representative set of data. The goal of the contest is to stimulate effort and advance the state of the art in this clinically significant problem, and to foster both friendly competition and wide-ranging collaborations. We will award prizes of US$500 to the most successful entrant in each of two events (see Acknowledgements, note 1).
Data for development and evaluation
Data for this contest have kindly been provided by Dr. Thomas Penzel of Philipps-University, Marburg, Germany, and are available here.
The data to be used in the contest are divided into a learning set and a test set of equal size. Each set consists of 35 recordings, each containing a single ECG signal digitized at 100 Hz with 12-bit resolution, continuously for approximately 8 hours (individual recordings vary in length from slightly less than 7 hours to nearly 10 hours). Each recording includes a set of reference annotations, one for each minute of the recording, that indicate the presence or absence of apnea during that minute. These reference annotations were made by human experts on the basis of simultaneously recorded respiration signals. Note that the reference annotations for the test set will not be made available until the conclusion of the contest. Eight of the recordings in the learning set include three respiration signals (oronasal airflow measured using nasal thermistors, and chest and abdominal respiratory effort measured using inductive plethysmography), each digitized at 20 Hz, and an oxygen saturation signal digitized at 1 Hz. These additional signals can be used as reference material to understand how the apnea annotations were made, and to study the relationships between the respiration and ECG signals.
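The relationship between the 100 Hz ECG samples and the one-per-minute annotations can be illustrated with a little arithmetic. The following is a minimal sketch, not part of the challenge materials; the function names are hypothetical, and it assumes that annotation k covers the 6000 samples starting at sample k * 6000 and that only complete minutes are annotated.

```python
ECG_HZ = 100                      # ECG sampling frequency given in the text
SAMPLES_PER_MINUTE = ECG_HZ * 60  # 6000 samples per annotated minute


def minute_of_sample(sample_index: int) -> int:
    """Index of the minute that contains a given ECG sample.

    Assumption: minute k spans samples [k * 6000, (k + 1) * 6000).
    """
    return sample_index // SAMPLES_PER_MINUTE


def complete_minutes(n_samples: int) -> int:
    """Number of complete minutes in a recording of n_samples samples.

    Assumption: a trailing partial minute carries no annotation.
    """
    return n_samples // SAMPLES_PER_MINUTE


# A nominal 8-hour recording: 8 h * 3600 s/h * 100 samples/s
nominal_samples = 8 * 3600 * ECG_HZ
print(nominal_samples)                    # 2880000
print(complete_minutes(nominal_samples))  # 480
```

Under these assumptions a nominal 8-hour recording yields 480 annotations, which is the per-record figure used in the scoring of event 2.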
The database does not contain episodes of pure central apnea or of Cheyne-Stokes respiration; all apneas in these recordings are either obstructive or mixed. Minutes containing hypopneas (defined as intermittent drops in respiratory flow below 50%, accompanied by drops in oxygen saturation of at least 4%, and followed by compensating hyperventilation) are also scored as minutes containing apnea. Additional information about the recordings was posted here after the conclusion of the competition, including (for all recordings) age, gender, height, weight, AI (apnea index), HI (hypopnea index), and AHI (apnea-hypopnea index). The subjects of these recordings are men and women between 27 and 63 years of age, with weights between 53 and 135 kg (BMI between 20.3 and 42.1); AHI ranges from 0 to 93.5 in these recordings.
Sleep apnea definitions
Several definitions for clinically significant sleep apnea have been in clinical use since 1978, when Guilleminault defined "sleep apnea syndrome" as more than 30 apneas per night. In 1981, Lavie proposed a more selective criterion of 100 apneas per night. Later criteria were based on an "apnea index" (the number of apneas per hour, or the number of minutes containing apnea per hour). Most clinicians regard an apnea index below 5 as normal, and an apnea index of 10 or more as pathologic. In 1988, He et al. found increased mortality in untreated patients with apnea indices of 20 or more, and such patients are now recognized as in need of treatment. Criteria used in current practice rely not only on an apnea index, but also on symptoms and cardiovascular sequelae (see Acknowledgements, note 2).
Data classes
For the purposes of this challenge, based on these varied criteria, we have defined three classes of recordings:
- Class A (Apnea): These meet all criteria. Recordings in class A contain at least one hour with an apnea index of 10 or more, and at least 100 minutes with apnea during the recording. The learning and test sets each contain 20 class A recordings.
- Class B (Borderline): These meet some but not all of the criteria. Recordings in class B contain at least one hour with an apnea index of 5 or more, and between 5 and 99 minutes with apnea during the recording. The learning and test sets each contain 5 class B recordings.
- Class C (Control): These meet none of the criteria, and may be considered normal. Recordings in class C contain fewer than 5 minutes with apnea during the recording. The learning and test sets each contain 10 class C recordings.
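The three class definitions above can be expressed as a small function over the per-minute reference annotations. This is a minimal sketch under stated assumptions, not the organizers' procedure: the function name is hypothetical, annotations are represented as a list of booleans (True for a minute containing apnea), the apnea index of an hour is taken as the number of apnea minutes in it, and an "hour" is taken to mean any 60 consecutive minutes. Recordings that fit none of the three definitions return None.

```python
from typing import List, Optional


def classify_recording(apnea_minutes: List[bool], window: int = 60) -> Optional[str]:
    """Return 'A', 'B', 'C', or None from per-minute apnea flags."""
    total = sum(apnea_minutes)

    # Largest count of apnea minutes within any `window` consecutive minutes.
    # Assumption: an "hour" is a sliding 60-minute window, not a clock hour.
    current = sum(apnea_minutes[:window])
    peak = current
    for i in range(window, len(apnea_minutes)):
        current += apnea_minutes[i] - apnea_minutes[i - window]
        peak = max(peak, current)

    if total < 5:
        return "C"  # control: fewer than 5 minutes with apnea
    if peak >= 10 and total >= 100:
        return "A"  # apnea: an hour with index >= 10 and >= 100 minutes overall
    if peak >= 5 and 5 <= total <= 99:
        return "B"  # borderline: an hour with index >= 5 and 5 to 99 minutes
    return None     # matches none of the three definitions as written


print(classify_recording([False] * 480))                # C
print(classify_recording([True] * 120 + [False] * 360)) # A
print(classify_recording([True] * 6 + [False] * 474))   # B
```

Choosing non-overlapping hours counted from the start of the recording instead of a sliding window would be an equally plausible reading of the definitions and could change the result for recordings near a threshold.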
Events and scoring
Each entrant may compete in one or both of the following events:
1. Apnea screening
In this event, your task is to design software that can classify the 35 test set recordings into class A (apnea) and class C (control or normal) groups, using the ECG signal to determine if significant sleep apnea is present. Your classifications for the 5 class B (borderline) recordings will not influence your score in this event (but you must classify them into either class A or class C, since you will not know which records belong to class B until the correct classifications of the 35 test set records are disclosed after the end of the contest). Your score for this event is simply the number of correct classifications; thus the maximum score possible is 30.
An example may help to clarify the scoring: A contestant submits her results, classifying 22 recordings in class A and 13 in class C (for a total of 35). Of the 22 recordings that her software has identified as class A, 16 are actually class A, 3 are class B, and 3 are class C. Of the 13 recordings that her software identified as class C, 7 have been correctly identified, and the other 6 include 4 class A and 2 class B. The score in this case is 23 (16 correct class A identifications, plus 7 correct class C identifications). Class B cases do not contribute to the final score; rather, they provide a buffer zone between classes A and C.
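The scoring rule for this event can be sketched in a few lines. The code below is an illustration only, not the scoring program used for the challenge; the function name and the record names (r00 to r34) are hypothetical. It rebuilds the worked example from its counts and applies the rule that class B recordings are ignored.

```python
from typing import Dict


def screening_score(predicted: Dict[str, str], reference: Dict[str, str]) -> int:
    """Count correct A/C classifications; class B recordings are ignored."""
    score = 0
    for record, true_class in reference.items():
        if true_class == "B":
            continue  # borderline recordings never contribute to the score
        if predicted.get(record) == true_class:
            score += 1
    return score


# (predicted class, true class, number of recordings) from the worked example
groups = [
    ("A", "A", 16), ("A", "B", 3), ("A", "C", 3),
    ("C", "C", 7), ("C", "A", 4), ("C", "B", 2),
]

predicted: Dict[str, str] = {}
reference: Dict[str, str] = {}
index = 0
for predicted_class, true_class, count in groups:
    for _ in range(count):
        name = f"r{index:02d}"  # hypothetical record name
        predicted[name] = predicted_class
        reference[name] = true_class
        index += 1

example_score = screening_score(predicted, reference)
print(len(reference), example_score)  # 35 23
```

The 7 misclassified class A and class C recordings (3 plus 4) account for the difference between 23 and the maximum of 30.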
We have chosen to exclude the class B recordings from the calculation of the scores because the utility of a screening test depends primarily on the accuracy with which it classifies the unambiguous cases, both positive and negative (classes A and C respectively in this instance). If you wish to attempt to classify recordings into all three groups, you may submit a second set of classifications, and we will calculate your score in the same way (but the maximum possible score in this case will be 35). The highest scores obtained in this way will be published, but will not be the basis for an award.
2. Quantitative assessment of apnea
In this event, your software must generate a minute-by-minute annotation file for each recording, in the same format as those provided with the learning set, using the ECG signal to determine when sleep apnea occurs. Your annotations will be compared with a set of reference annotations to determine your score. Each annotation that matches a reference annotation earns one point; thus the highest possible score for this event will be approximately 16800 (480 annotations in each of 35 records). It is important to understand that scores approaching the maximum are very unlikely, since apnea assessment can be very difficult even for human experts. Nevertheless, the scores can be expected to provide a reasonable ranking of the ability of the respective algorithms to mimic the decisions made by human experts.
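As an illustration of how the score for this event is computed, the following minimal sketch counts matching minutes across records. It is not the organizers' scoring software; the function name, the record name x01, and the representation of annotations as lists of booleans (True for a minute with apnea) are assumptions made for the example. Minutes missing from a submission are treated here as earning no points, which is also an assumption.

```python
from typing import Dict, List


def annotation_score(
    predicted: Dict[str, List[bool]],
    reference: Dict[str, List[bool]],
) -> int:
    """One point for each minute whose annotation matches the reference."""
    score = 0
    for record, ref in reference.items():
        pred = predicted.get(record, [])
        # zip stops at the shorter list, so unannotated minutes earn nothing
        score += sum(p == r for p, r in zip(pred, ref))
    return score


# Tiny example: 4 minutes of one hypothetical record, 2 of them matching
reference = {"x01": [True, True, False, False]}
predicted = {"x01": [True, False, False, True]}
print(annotation_score(predicted, reference))  # 2

# Approximate maximum quoted in the text: 480 annotations in each of 35 records
print(480 * 35)  # 16800
```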
Obtaining scores
A form that will permit you to submit your classifications and/or annotations for scoring is now available here. You will receive a reference number and your score(s) by return e-mail. You may revise your submissions and try again if you wish, but attempts to exploit this service in order to discover the correct classifications are contrary to the spirit of the contest and will result in disqualification.
How to enter
To enter the competition, submit an abstract with a concise description of your approach to the problem to Computers in Cardiology 2000 no later than Wednesday, 3 May 2000. Your abstract must include your reference number and score(s); for this reason, do not wait until the last minute to submit your classifications and/or annotations for scoring. If your abstract is accepted, you will be expected to prepare a four-page paper for presentation during the conference and publication in the conference proceedings. We welcome and encourage contributions to PhysioNet of software developed during this competition.
Awards
The author(s) of the top-scoring eligible entry in each event will receive an award of US$500 in recognition of their achievement. In the event of a tie, the date of the author's abstract submission will be the tie-breaker. This rule favors early submission of abstracts, but permits authors to improve their results, if they can, after submitting their abstracts. Classifications or annotations received for scoring after noon GMT on Friday, 22 September 2000 will not be eligible for awards. Submissions from members and affiliates of our research groups at MIT, Boston University, Harvard Medical School, Beth Israel Deaconess Medical Center, McGill University, and Philipps-University are not eligible for awards, although all are welcome to participate.
Workshop/Panel discussion
All entrants are invited to describe their methods during a panel discussion at Computers in Cardiology in Boston on Sunday, 24 September 2000, when the awards will be given. Individual presentations of accepted papers will be scheduled for one or more sessions of the conference during the following days (25-27 September).
Acknowledgements
1. Funding for the awards has been contributed by the Margret and H.A. Rey Laboratory for Nonlinear Dynamics in Medicine at Boston's Beth Israel Deaconess Medical Center.
2. We thank Thomas Penzel for the discussion of diagnostic criteria for sleep apnea syndrome, as well as for making this event possible by his generous contribution of data.