This page describes how entries in PhysioNet/CinC Challenges are evaluated and scored automatically. The method described below has been developed to support the 2017 Challenge.
This page also includes instructions for setting up a replica of the Challenge test environment, which may be useful for debugging Challenge entries.
For previous Challenge test environments, see the archived documentation:
- 2014 (Robust Detection of Heart Beats in Multimodal Data)
- 2015 (Reducing False Arrhythmia Alarms in the ICU)
- 2016 (Classification of Normal/Abnormal Heart Sound Recordings)
Structure of a 2017 Challenge entry
A Challenge entry must be uploaded in the form of a zip archive (entry.zip) or a gzipped tar archive (entry.tar.gz). The entry must contain all source code for your algorithm, as well as any data files required to run it (e.g., numerical models derived from the training databases). In addition, the following files must be included in the top-level directory of the entry:
- AUTHORS.txt - A text file listing the authors of the entry (individuals who contributed to the design or implementation of the algorithm), and their affiliations.
- LICENSE.txt - A text file listing the terms for distribution. Official Challenge entries must be distributable under a license approved by the Open Source Initiative.
- dependencies.txt (example) - A text file containing the names of additional packages that must be installed prior to compiling and running the entry. Lines beginning with ‘#’ are treated as comments. Each non-comment line must contain either the name of a package (e.g., python3), or the name of a package followed by ‘=’ and a version number (e.g., wfdb = 10.5.25~pre2-0~pn1). The version number is required if the package comes from the Challenge repository. (A sketch of such a file follows this list.)
- setup.sh (example) - A shell script to be invoked by /bin/bash. This script is run once, during stage 1 (see below), to compile the entry.
- next.sh (example) - A shell script to be invoked by /bin/bash. This script is run once for each record in the database, during stages 2 and 3. The name of the record is passed as the first command-line argument (‘$1’), and the script must write its result to answers.txt. (A sketch of such a script follows this list.)
- answers.txt (example) - A text file giving the expected output for each record in the “validation” set. During stage 2, this will be compared against the result of running next.sh.
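For illustration, a dependencies.txt requesting one package from Debian and one pinned version from the Challenge repository might look like the sketch below; the particular package choices are only examples:
    # packages required to compile and run this entry
    python3
    wfdb = 10.5.25~pre2-0~pn1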
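Similarly, a minimal next.sh might look like the following sketch. The classify.py helper and the comma-separated record,label output line are assumptions for illustration; the actual command and output format must match your own entry and the contents of your answers.txt:
    #!/bin/bash
    # next.sh: called once per record; the record name is passed as $1.
    RECORD=$1
    # Hypothetical classifier that prints a single label for the record.
    LABEL=$(python3 classify.py "$RECORD")
    # Append this record's result to answers.txt, one line per record.
    echo "$RECORD,$LABEL" >> answers.txt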
An entry may also contain a file DRYRUN, which causes evaluation to stop after the end of stage 2.
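Once these files are in place, the entry can be packaged from its top-level directory with standard tools; the following is one possible sketch (the DRYRUN step is optional, and the archive paths are illustrative):
    # optional: include a DRYRUN file to stop evaluation after stage 2
    touch DRYRUN
    # package the current directory as a zip archive ...
    zip -r ../entry.zip .
    # ... or as a gzipped tar archive
    tar -czf ../entry.tar.gz .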
Evaluation details
All of the processing needed to check, evaluate, and score challenge entries is performed on dedicated 64-bit Linux servers, under control of the supervisor script (evaluate). Each server runs several virtual machines (VMs) using qemu and kvm hardware virtualization.
Newly uploaded entries are initially placed into a queue. The oldest entry in the queue is loaded into an idle VM as soon as one is available, and stage 1 processing begins.
Stage 1 (prep): The entry is checked to be certain that it contains all of the components required by the rules of the challenge. If so, the packages specified by dependencies.txt are installed. Finally, the entry's setup.sh script is run. The evaluation ends if stage 1 does not succeed for any of these reasons:
- the entry is unreadable or incomplete
- packages specified by dependencies.txt are not available or cannot be installed
- setup.sh does not exit within 20 minutes
- setup.sh fails (exits with non-zero status)
If stage 1 ends early, the diagnostic output of setup.sh (its standard output and standard error output) is reported in the last case, or an appropriate error message is reported otherwise.
If setup.sh exits successfully (with zero status), the entry is queued for stage 2 processing.
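As a concrete illustration of the stage 1 build step, a setup.sh for an entry written in C might be as simple as the sketch below; the source file name and compiler options are assumptions:
    #!/bin/bash
    # setup.sh: run once during stage 1 to compile the entry.
    # Assumes a single hypothetical C source file, classify.c.
    set -e
    gcc -O2 -o classify classify.c -lm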
Stage 2 (quiz): The validation data set is copied into the VM. The entry's next.sh script is run for each record in this set. The evaluation ends if stage 2 does not succeed for any of these reasons:
- next.sh fails on any record
- next.sh does not exit within the time limit (2 × 10¹¹ CPU instructions for the 2017 Challenge)
- next.sh's results do not match the expected results (answers.txt) submitted with the entry
If stage 2 ends early, the diagnostic output of next.sh (its standard output and standard error output) is reported in the first case, or an appropriate error message is reported otherwise. If all of the validation set records are processed successfully, all results match the expected results, and the entry does not include a DRYRUN file (which forces a premature exit after completion of stage 2), the entry is queued for stage 3 processing.
In order to be fair to all competitors, the limits on entry running time are measured in CPU instructions rather than in seconds (since the exact running time will depend on many factors that are impractical to control, such as hard disk access speeds). On a GNU/Linux system, you can measure the number of instructions used by your program by running the command perf stat -e instructions:u ./next.sh A00001.
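To estimate whether an entry stays within the limit before submitting it, you can run next.sh under perf for every record and inspect the instructions:u counter in each log. The sketch below assumes a RECORDS file listing one record name per line (the usual PhysioNet convention); adapt the file name to however you enumerate records:
    #!/bin/bash
    # Run next.sh under perf for each record listed in RECORDS, saving a
    # per-record log whose instructions:u count can be compared against
    # the 2 × 10^11 instruction limit.
    while read -r record; do
        perf stat -e instructions:u -o "perf-$record.log" ./next.sh "$record"
    done < RECORDS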
Stage 3 (exam): The test data set is copied into the VM. The entry's next.sh script is run once with each test record as input. Unlike stage 2, however, errors do not cause premature termination of stage 3, and next.sh's diagnostic output is not reported (to prevent leakage of information about the test data). The numbers of failures and timeouts, if any, are reported in lieu of detailed diagnostics. The results are collected and transmitted from the VM to the dedicated host for stage 4 processing.
Stage 4 (score): The collected results are compared with the Challenge's reference results to determine performance statistics and scores, which are reported to the user.
Replicating the Challenge test environment
You do not need to replicate the Challenge test hardware in order to test your entry, but comparing it with your own hardware may help you estimate your entry's running time. The dedicated Challenge servers have two quad-core 2.6 GHz AMD Opteron 6212 CPUs and 96 GB of RAM.
Each virtual machine is configured with a single-core amd64 CPU, 2 GB
of RAM, a 2 GB read-write /home partition, and a 500 MB read-write
/tmp partition. The root filesystem is an aufs
filesystem with 5 GB of space for installing packages. A virtual
CD-ROM drive and serial port are used for transferring data to and
from the guest system. A virtual Ethernet interface is provided only
when running MATLAB entries, and only allows connections to the
designated MATLAB license server.
To replicate the Challenge test software environment available within the VMs, install 64-bit Debian 8 (jessie), then install these additional packages:
sudo apt-get install devscripts build-essential zip
(See the complete list of packages included in the base system as of Tuesday, 28 March 2017 at 21:26 CEST.)
In addition, enable the PhysioNet Challenge package repository:
sudo tee -a /etc/apt/sources.list <<EOF
deb http://physionet.org/physiotools/debian/ jessie-challenge-2017 main contrib non-free
deb-src http://physionet.org/physiotools/debian/ jessie-challenge-2017 main contrib non-free
EOF
gpg --recv-keys BDE026E5901D1BE2
gpg --export BDE026E5901D1BE2 | sudo apt-key add -
sudo apt-get update
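With the Challenge repository enabled, the packages named in your dependencies.txt can be installed locally at a pinned version, so that your test environment matches what the evaluation VM would install. For example, using the version string shown earlier (which may not be the latest):
    # install a specific package version from the Challenge repository
    sudo apt-get install wfdb=10.5.25~pre2-0~pn1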
You may install and use the test software environment on a spare computer if you wish, or in a VM using whatever VM technology you prefer on your favorite host OS. On the Challenge servers, we use qemu-kvm, hosted on Debian GNU/Linux.
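If you choose qemu-kvm and want your guest to approximate the resource limits described above (one CPU core, 2 GB of RAM), an invocation along the following lines is a reasonable starting point; the disk image name is an assumption, and this is only a rough sketch rather than the exact Challenge VM configuration:
    # boot a Debian 8 guest with a single core and 2 GB of RAM under KVM
    qemu-system-x86_64 -enable-kvm -smp 1 -m 2048 \
        -drive file=debian8.img,format=raw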