This page describes how entries in PhysioNet/CinC Challenges are evaluated and scored automatically. The method described below was developed to support the 2014 Challenge. Entries for future challenges, and unofficial late entries for previous challenges, will also be evaluated and scored using this method.
This page also includes instructions for setting up a replica of the Challenge test environment, which may be useful for debugging Challenge entries.
Evaluation details
All of the processing needed to check, evaluate, and score challenge entries is performed on dedicated 64-bit Linux servers, under control of the supervisor script (evaluate). Each server runs several virtual machines (VMs) using qemu and kvm hardware virtualization.
Newly uploaded entries are initially placed into a queue. The oldest entry in the queue is loaded into an idle VM as soon as one is available, and stage 1 processing begins.
Stage 1 (prep): The entry is checked to be certain that it contains all of the components required by the rules of the challenge; if so, its setup.sh script is run. The evaluation ends if stage 1 does not succeed for any of these reasons:
- the entry is unreadable or incomplete
- setup.sh does not exit within five minutes
- setup.sh fails (exits with non-zero status)
If stage 1 ends early, the diagnostic output of setup.sh (its standard output and standard error output) is reported in the last case, or an appropriate error message is reported otherwise. If setup.sh exits successfully (with zero status), the entry is queued for stage 2 processing.
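To rehearse stage 1 locally, the checks above can be approximated with a short shell function. This is a minimal sketch, not the supervisor script itself: the function name run_prep, the PREP_LOG variable, and the status strings are hypothetical, and it assumes the GNU coreutils timeout command, which exits with status 124 when the limit expires.

```shell
#!/bin/sh
# Sketch of a local stage 1 rehearsal (not the Challenge supervisor script).
# run_prep SCRIPT [LIMIT_SECONDS]
#   Runs SCRIPT under a time limit, capturing stdout and stderr in one log.
#   Prints "ok", "timeout", "failed (status N)", or "unreadable".
run_prep() (
    script=$1
    limit=${2:-300}              # five minutes, as in the stage 1 rules
    log=${PREP_LOG:-prep.log}    # hypothetical log file name
    if [ ! -r "$script" ]; then
        echo "unreadable"
        exit 1
    fi
    status=0
    timeout "$limit" sh "$script" >"$log" 2>&1 || status=$?
    case $status in
        0)   echo "ok" ;;
        124) echo "timeout"; exit 1 ;;
        *)   echo "failed (status $status)"; exit 1 ;;
    esac
)
```

For example, `run_prep ./setup.sh` prints one of the status strings and leaves the script's diagnostic output in prep.log, which is roughly what would be reported if setup.sh failed.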
Stage 2 (quiz): The training data set is copied into the VM. The entry's next.sh script is run for a randomly selected subset of the training records. The evaluation ends if stage 2 does not succeed for any of these reasons:
- next.sh fails on any record
- next.sh does not exit within the time limit (10^11 CPU instructions for the 2015 Challenge)
- next.sh's results do not match the expected results (answers.txt) submitted with the entry
If stage 2 ends early, the diagnostic output of next.sh (its standard output and standard error output) is reported in the first case, or an appropriate error message is reported otherwise. If all of the selected training records are processed successfully, all results match the expected results, and the entry does not include a DRYRUN file (which forces a premature exit after completion of stage 2), the entry is queued for stage 3 processing.
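The comparison against answers.txt can be rehearsed before submitting. The sketch below assumes, purely for illustration, that the results produced for the quiz records have been collected into a text file with one line per record, and that a result is correct when the same line appears verbatim in answers.txt. The function name check_answers and the line format in the usage example are hypothetical.

```shell
#!/bin/sh
# check_answers PRODUCED EXPECTED
#   Succeeds if every line of PRODUCED appears verbatim in EXPECTED.
#   PRODUCED may cover only a subset of the records listed in EXPECTED.
check_answers() (
    produced=$1
    expected=$2
    # -F fixed strings, -x whole-line match, -v keep lines with no match.
    bad=$(grep -F -x -v -f "$expected" "$produced" || true)
    if [ -z "$bad" ]; then
        echo "all results match"
    else
        echo "mismatched results:"
        echo "$bad"
        exit 1
    fi
)
```

With a hypothetical results file quiz.txt, `check_answers quiz.txt answers.txt` either confirms the match or lists the lines that differ.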
In order to be fair to all competitors, the limits on entry running time are measured in CPU instructions rather than in seconds, since the exact running time depends on many factors that are impractical to control, such as hard disk access speeds. On a GNU/Linux system, you can measure the number of instructions used by your program by running the command perf stat -e instructions:u ./next.sh a103l (where a103l is the record given as input).
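To compare a measurement with a limit automatically, perf stat can write machine-readable output with its -x (field separator) and -o (output file) options. The helper functions below are a sketch under that assumption. Their names are hypothetical, and the limit passed in the usage example is only an illustrative value.

```shell
#!/bin/sh
# instructions_from_csv FILE
#   Prints the instruction count found in the output of
#   `perf stat -x, -o FILE -e instructions:u COMMAND`.
instructions_from_csv() {
    awk -F, '$3 ~ /^instructions/ && $1 ~ /^[0-9]+$/ { print $1; exit }' "$1"
}

# within_limit COUNT LIMIT
#   Succeeds if COUNT <= LIMIT. awk does the comparison because the
#   values may be too large for some shells' integer arithmetic.
within_limit() {
    awk -v n="$1" -v lim="$2" 'BEGIN { exit !(n + 0 <= lim + 0) }'
}
```

A typical sequence would be `perf stat -x, -o perf.csv -e instructions:u ./next.sh a103l`, then `n=$(instructions_from_csv perf.csv)`, then `within_limit "$n" 100000000000 && echo "within limit"`.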
Stage 3 (exam): The test data set is copied into the VM. The entry's next.sh script is run once with each test record as input. Unlike stage 2, however, errors do not cause premature termination of stage 3, and next.sh's diagnostic output is not reported (to prevent leakage of information about the test data). The numbers of failures and timeouts, if any, are reported in lieu of detailed diagnostics. The results are collected and transmitted from the VM to the dedicated host for stage 4 processing.
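The stage 3 behavior (continue past errors, discard diagnostics, report only counts) can be imitated on local data. In this sketch the function name run_exam and the summary format are hypothetical, and a wall-clock limit enforced by GNU timeout stands in for the instruction-count limit that the Challenge actually applies.

```shell
#!/bin/sh
# run_exam COMMAND LIMIT_SECONDS RECORD...
#   Runs COMMAND once per record, never stopping early, and prints
#   only the numbers of failures and timeouts.
run_exam() (
    cmd=$1
    limit=$2
    shift 2
    failures=0
    timeouts=0
    for rec in "$@"; do
        status=0
        # Diagnostic output is discarded, as in stage 3.
        timeout "$limit" "$cmd" "$rec" >/dev/null 2>&1 || status=$?
        case $status in
            0)   ;;
            124) timeouts=$((timeouts + 1)) ;;   # GNU timeout: limit expired
            *)   failures=$((failures + 1)) ;;
        esac
    done
    echo "records: $#, failures: $failures, timeouts: $timeouts"
)
```

For example, `run_exam ./next.sh 60 a103l` runs a single record with a 60-second limit and prints a one-line summary.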
Stage 4 (score): The collected results are compared with the Challenge's reference results to determine performance statistics and scores, which are reported to the user.
Replicating the Challenge test environment
In order to test your entry, it is not necessary to replicate the Challenge test hardware, but it may be helpful to compare it with your hardware to estimate your entry's run time. The dedicated Challenge servers have two quad-core 2.6 GHz AMD Opteron 6212 CPUs and 32 GB of RAM. The VMs are configured with a single-core amd64 CPU, 2 GB of RAM, a 20 GB read-only root partition, a 2 GB read-write /home partition, and a 500 MB read-write /tmp partition. A virtual CD-ROM drive and serial port are used for transferring data to and from the guest system. A virtual Ethernet interface is provided only when running MATLAB entries, and only allows connections to the designated MATLAB license server.
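For a rough local approximation of these VM resources under qemu with kvm, a command line along the following lines could be used. This is a sketch only, not the actual configuration of the Challenge servers: the function name, the disk image path, and its qcow2 format are assumptions, and the partition layout and the CD-ROM and serial data channels are not reproduced.

```shell
#!/bin/sh
# vm_command IMAGE
#   Prints (does not run) a qemu command line for a guest with one CPU
#   core, 2 GB of RAM, and no network interface.
vm_command() {
    image=$1    # hypothetical disk image holding the guest installation
    printf '%s %s\n' \
        "qemu-system-x86_64 -enable-kvm -smp 1 -m 2048 -net none" \
        "-drive file=$image,format=qcow2"
}
```

Running `vm_command trusty.qcow2` prints the command so that it can be inspected or adjusted before being executed.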
To replicate the Challenge test software environment available within the VMs, install 64-bit Ubuntu 14.04 (Trusty Tahr), then install these additional packages from the Ubuntu repository:
    sudo apt-get install \
        build-essential automake bc cmake cython cython3 dc gdb gsl-bin \
        libgsl0-dev libgmp-dev libarmadillo-dev libatlas-dev libboost-all-dev \
        libopenblas-dev libsndfile1-dev libsox-dev openjdk-7-jdk parallel \
        python-h5py python-pandas python-pip python-sklearn python-virtualenv \
        python3-pip r-base r-cran-rcpp r-cran-scales r-recommended octave \
        liboctave-dev octave-image octave-statistics octave-optim
(See the complete list of installed Ubuntu packages as of Wednesday, 27 April 2016 at 00:11 CEST. Note that the VMs do not include a graphical desktop environment.)
Install these additional Python 3.4 packages:
    sudo pip3 install numpy Pandas Scipy Theano scikit-learn PyWavelets keras h5py
    wget https://github.com/aaren/wavelets/archive/e1f0d9cd813afc297c3f40d32506f911d2301f30.zip -O wavelets-20150214.zip
    unzip wavelets-20150214.zip
    cd wavelets-e1f0d9cd*
    sudo python3 setup.py install
(See the complete list of python3.4 packages as of Friday, 29 April 2016 at 20:48 CEST.)
Install these additional Python 2.7 packages:
    sudo pip install --upgrade https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.7.1-cp27-none-linux_x86_64.whl
    sudo pip install theano keras lasagne
    sudo pip install --pre xgboost
(See the complete list of python2.7 packages as of Friday, 29 April 2016 at 20:48 CEST.)
Install nupic (in a virtualenv directory, since it requires specific versions of other packages):
    sudo -s
    virtualenv --system-site-packages /usr/local/nupic
    (
      . /usr/local/nupic/bin/activate
      pip install https://s3-us-west-2.amazonaws.com/artifacts.numenta.org/numenta/nupic.core/releases/nupic.bindings/nupic.bindings-0.4.2-cp27-none-linux_x86_64.whl
      pip install nupic
      pip install Cython
      pip install cochlea
    )
(In order to use this package from a challenge entry, you will need to include the command . /usr/local/nupic/bin/activate at the start of your next.sh script.)
Install these additional R packages:
    wget http://cran.r-project.org/src/contrib/fftw_1.0-3.tar.gz \
         http://cran.r-project.org/src/contrib/signal_0.7-6.tar.gz \
         http://cran.r-project.org/src/contrib/tuneR_1.2.1.tar.gz \
         http://cran.r-project.org/src/contrib/seewave_2.0.2.tar.gz \
         http://cran.r-project.org/src/contrib/RSNNS_0.4-7.tar.gz
    sudo R CMD INSTALL fftw_1.0-3.tar.gz signal_0.7-6.tar.gz \
         tuneR_1.2.1.tar.gz seewave_2.0.2.tar.gz RSNNS_0.4-7.tar.gz
Install Torch:
    sudo -s
    cd
    wget https://physionet.org/physiotools/utilities/torch/torch-distro-20160328-pn.tar.gz
    tar xfvz torch-distro-20160328-pn.tar.gz
    cd torch-distro-20160328-pn/
    PREFIX=/usr/local ./install.sh -s
    cd
    wget https://github.com/akfidjeland/torch-totem/archive/21240faf12ee7db95d7adc41760a04ec4ca036d0.tar.gz -O torch-totem-20160308.tar.gz
    tar xfvz torch-totem-20160308.tar.gz
    cd torch-totem-21240faf12ee7db95d7adc41760a04ec4ca036d0/
    luarocks make rocks/totem-0-0.rockspec
    cd
    wget https://github.com/soumith/torch-hdf5/archive/480a43e7505037eb61b9b002cbc68aa2d0618bda.tar.gz -O torch-hdf5-20150204.tar.gz
    tar xfvz torch-hdf5-20150204.tar.gz
    cd torch-hdf5-480a43e7505037eb61b9b002cbc68aa2d0618bda/
    luarocks make
Finally, install the WFDB Software Package and the WFDB Toolbox for MATLAB and Octave:
    sudo -s
    cd
    wget https://physionet.org/physiotools/beta/wfdb-10.5.24pre1.tar.gz
    tar xfvz wfdb-10.5.24pre1.tar.gz
    cd wfdb-10.5.24
    ./configure --without-netfiles --prefix=/usr
    make && make install
    cd
    wget https://physionet.org/physiotools/matlab/wfdb-app-matlab/wfdb-app-toolbox-0-9-9.zip
    unzip -d /usr/local/lib wfdb-app-toolbox-0-9-9.zip
    rm -r /usr/local/lib/wfdb-app-toolbox-0-9-9/mcode/nativelibs/linux
    ln -s /usr /usr/local/lib/wfdb-app-toolbox-0-9-9/mcode/nativelibs/linux
    echo "addpath('/usr/local/lib/wfdb-app-toolbox-0-9-9/mcode');" >> /etc/octave.conf
You may install and use the test software environment on a spare computer if you wish, or in a VM using whatever VM technology you prefer on your favorite host OS. On the Challenge servers, we use qemu-kvm, hosted on Debian GNU/Linux.
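After installation, a quick sanity check is to confirm that the main tools are on the PATH. The helper below is a minimal sketch. The function name is hypothetical, and the command names in the usage example are the ones these packages are expected to provide rather than an official checklist.

```shell
#!/bin/sh
# missing_commands NAME...
#   Prints each NAME that is not found on PATH and returns non-zero
#   if at least one is missing.
missing_commands() (
    rc=0
    for name in "$@"; do
        if ! command -v "$name" >/dev/null 2>&1; then
            echo "$name"
            rc=1
        fi
    done
    exit $rc
)
```

For example, `missing_commands gcc make octave R python3 pip3 java luarocks rdsamp` prints nothing when all of those commands are available.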