Data

Training data

Algorithms can be trained and tuned on any suitable data. However, the training data used must be disclosed when reporting the method. We recommend using data from the Osteoarthritis Initiative (OAI, https://oai.epi-ucsf.org/datarelease/) as training data. The OAI dataset contains 3T MRI scans and X-ray images of knee joints. Its sagittal 3D dual-echo in steady state with selective water excitation (DESS WE) and coronal 2D intermediate-weighted turbo spin-echo (TSE IW) MRI sequences are similar to the sequences in the test set (see below). For example, participants can train their methods using the baseline image and covariate data and use the outcome data from the 72-month (6-year) follow-up as the ground truth.

OAI data can be downloaded from the NDA website (https://nda.nih.gov/oai/). Images, background and clinical variables, and visual scores for osteoarthritic changes on radiographs are available there. For example, you can download OAICompleteData in ASCII format from the OAI website. The files named kxr_sq_buXX.txt (XX = follow-up time point) contain KL grades and osteophyte grades for the knees, and the files named AllClinicalXX.txt (XX = follow-up time point) contain the clinical variables. Please note that the OAI dataset does not contain a ready-made variable for incident symptomatic radiographic knee osteoarthritis; it has to be created by combining knee pain and osteophyte variables. To aid the participants, we have created an outcome variable that can be used. You can download the ground truth labels for all OAI knees here. In these labels, iSROA=0 means that the knee did not develop symptomatic radiographic knee osteoarthritis within the follow-up, and iSROA=1 means that it did. iSROA=NaN means that the knee was excluded, for example because of symptomatic radiographic knee osteoarthritis at baseline or incomplete follow-up data. You can download the labels for the knees that are matched with the age, BMI, and gender of the test data here. The code used to create the variables can be viewed here.
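A minimal sketch of turning the provided labels into a training target is shown below. It assumes a comma-separated label file with the hypothetical columns `ID`, `SIDE`, and `iSROA`; the actual file layout may differ. The point it illustrates is that knees with iSROA=NaN were excluded from the outcome definition, so they should be dropped rather than counted as negatives.

```python
import csv
import io
import math


def load_isroa_labels(text):
    """Map (ID, SIDE) -> 0/1 for knees with a usable iSROA label.

    The column names ID, SIDE and iSROA are assumptions about the label file.
    Knees with iSROA = NaN were excluded from the outcome definition, so they
    are skipped rather than treated as negatives.
    """
    labels = {}
    for row in csv.DictReader(io.StringIO(text)):
        raw = row["iSROA"].strip()
        value = float(raw) if raw else math.nan  # float() also accepts "NaN"
        if math.isnan(value):
            continue  # excluded knee, e.g. SROA at baseline or incomplete follow-up
        labels[(row["ID"], row["SIDE"])] = int(value)
    return labels


# Made-up example rows, not real OAI IDs.
sample = "ID,SIDE,iSROA\n1001,R,0\n1001,L,NaN\n1002,R,1\n"
print(load_isroa_labels(sample))  # {('1001', 'R'): 0, ('1002', 'R'): 1}
```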

Interim leaderboard: We will add an interim leaderboard for predictions on a subset of the OAI data (20% of the knees that were matched with the age, BMI, and gender of the test data). The submission template and the IDs of these knees can be downloaded here (template updated October 27, 2020). Because the leaderboard is evaluated and updated by automatic scripts, please do not change the column names in the template, and provide probabilities for all knees. As the ground truth labels for these knees are available, please do not cheat by using the leaderboard knees when training your algorithm for the leaderboard. The leaderboard predictions are purely of scientific value: they are not considered when deciding the winner of the challenge and do not make you eligible for co-authorship on the scientific paper documenting the challenge.
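Since submissions are scored automatically, it may help to check a file locally before uploading it. The sketch below is not the organizers' evaluation script; the column names `ID` and `prob` are placeholders for whatever the official template uses. It checks that the expected columns exist, that every knee has a probability in [0, 1], and that no knee is missing or unexpected.

```python
import csv
import io


def check_submission(text, expected_ids, id_col="ID", prob_col="prob"):
    """Return a list of problems in a leaderboard submission (empty if it looks valid).

    id_col and prob_col are placeholders: use the column names of the official
    template, unchanged.
    """
    reader = csv.DictReader(io.StringIO(text))
    columns = reader.fieldnames or []
    if id_col not in columns or prob_col not in columns:
        return [f"expected columns {id_col!r} and {prob_col!r}, found {columns}"]

    problems, seen = [], set()
    for row in reader:
        knee_id = row[id_col]
        seen.add(knee_id)
        try:
            p = float(row[prob_col])
        except (TypeError, ValueError):
            problems.append(f"{knee_id}: probability is missing or not a number")
            continue
        if not 0.0 <= p <= 1.0:  # comparison is False for NaN, so NaN is flagged too
            problems.append(f"{knee_id}: probability {p} is outside [0, 1]")

    missing = set(expected_ids) - seen
    if missing:
        problems.append(f"no prediction for {len(missing)} knee(s)")
    extra = seen - set(expected_ids)
    if extra:
        problems.append(f"{len(extra)} unexpected knee ID(s)")
    return problems
```

An empty list means the file passed these basic checks; it says nothing about the quality of the predictions.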

Test set

Data from a clinical study [Runhaar et al. 2015] will be used as the test set in this challenge. The test set contains MRI scans (scanners: 1.0T Philips Intera and 1.5T Siemens Symphony/Magnetom Essenza) and knee X-ray images of 423 knees. Its sagittal 3D sequences with water excitation and coronal 2D TSE proton density (PD) weighted sequences are similar to the sequences in the OAI data. Baseline image data, baseline clinical data, and outcome data from the 78-month (6.5-year) time point will be used in the challenge. In the test set, the mean age is 55.7 years (SD: 3.2, range: 50–62) and the mean BMI is 31.7 kg/m² (SD: 3.7, range: 26.1–47.1).

In addition to baseline MRI and X-ray images, we will provide certain background information and known risk factors for knee osteoarthritis. The variables were selected based on the literature [Glyn-Jones et al. 2015, Hunter & Bierma-Zeinstra 2019, Silverwood et al. 2015, Emery et al. 2019] and on their availability in the released test set. We plan to release the following variables measured at baseline:

  1. Age
  2. Body mass index (BMI)
  3. Gender
  4. Kellgren-Lawrence (KL) grade
  5. Varus malalignment
  6. History of knee injury
  7. Mild symptoms
  8. Presence of Heberden nodes
  9. Joint line tenderness
  10. Crepitus
  11. Morning stiffness
  12. Postmenopausal status
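The variable descriptions in this section name the baseline OAI counterpart of each of these variables. As a convenience, they can be collected into a lookup table such as the hypothetical one below. Knee-specific variables are stored as (right, left) pairs; reading `R`/`L` in the OAI names as right/left is an assumption based on the naming pattern, and joint line tenderness has two columns (lateral and medial) per knee.

```python
# Hypothetical lookup table: test-set variable -> baseline OAI variable(s).
# Knee-specific entries are (right, left) pairs; the R/L reading of the names
# is an assumption based on the OAI naming pattern.
OAI_VARIABLES = {
    "age": "V00AGE",
    "bmi": "P01BMI",
    "gender": "P02SEX",
    "kl_grade": "V00XRKL",
    "varus_malalignment": "V00FTANGLE",
    "knee_injury": ("P01INJR", "P01INJL"),
    "mild_symptoms": ("P01KPNR12", "P01KPNL12"),
    "heberden_nodes": "P02JBMP",
    "joint_line_tenderness": (("V00RKLTTPN", "V00RKMTTPN"), ("V00LKLTTPN", "V00LKMTTPN")),
    "crepitus": ("V00RKPFCRE", "V00LKPFCRE"),
    "morning_stiffness": ("V00WSRKN1", "V00WSLKN1"),
    "postmenopausal": "P01MENSTR",
}


def oai_columns(variable, side=None):
    """Return the OAI column(s) for `variable`; `side` is "R" or "L" for knee-specific ones."""
    entry = OAI_VARIABLES[variable]
    if isinstance(entry, str):
        return entry  # a single column name, not split into right/left variables
    if side not in ("R", "L"):
        raise ValueError(f"{variable!r} is knee-specific; pass side='R' or side='L'")
    right, left = entry
    return right if side == "R" else left
```

For example, `oai_columns("crepitus", "L")` gives `"V00LKPFCRE"`, while `oai_columns("age")` gives `"V00AGE"` regardless of side.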


Description of the variables:

Aging is a primary risk factor for osteoarthritis, probably due to a reduction in regenerative capacity and an increase in the number of risk factors present [Glyn-Jones et al. 2015, Hunter & Bierma-Zeinstra 2019]. The age (years) of the test set subjects at baseline will be provided. In the OAI data, the V00AGE variable contains the age of the subjects at baseline.

Obesity is a strong risk factor for knee osteoarthritis [Arden & Nevitt 2006, Silverwood et al. 2015]. Obesity increases the loading of the joints, and metabolic factors associated with osteoarthritis may also play a role in its pathogenesis [Arden & Nevitt 2006]. The BMI (kg/m²) of the test set subjects will be provided. In the OAI data, the P01BMI variable contains the BMI of the subjects at baseline.

Female gender is a risk factor for osteoarthritis [Glyn-Jones et al. 2015, Arden & Nevitt 2006, Silverwood et al. 2015], and menopause may also be related to osteoarthritis [Arden & Nevitt 2006, Wluka et al. 2000]. The reason why osteoarthritis is more common in women than in men is still unknown, although estrogens may play a role in its pathogenesis. Please note that the released test dataset consists of females only. Postmenopausal status was defined as 12 consecutive months of amenorrhoea. In the OAI data, the P02SEX variable contains the gender of the subjects, and the P01MENSTR variable can be used to obtain the postmenopausal status.
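Because the test set contains women only, one option is to restrict the OAI training data to women as well. The sketch below assumes that P02SEX codes female as 2 and that values may appear either as a bare code (`"2"`) or with a label (`"2: Female"`); both are assumptions to check against the OAI codebook. The postmenopausal helper only encodes the test-set definition (at least 12 consecutive months of amenorrhoea) and takes a hypothetical month count; deriving that count from P01MENSTR depends on its coding and is not shown.

```python
def leading_code(value):
    """Extract the numeric code from values such as "2" or "2: Female" (assumed formats)."""
    return int(str(value).split(":", 1)[0].strip())


def is_female(p02sex_value, female_code=2):
    # Assumption: P02SEX codes female as 2; check the OAI codebook before use.
    return leading_code(p02sex_value) == female_code


def is_postmenopausal(months_without_menses):
    # Test-set definition: at least 12 consecutive months of amenorrhoea.
    # months_without_menses is a hypothetical input, not an OAI variable.
    return months_without_menses >= 12


# Usage on made-up records: keep only women, as in the test set.
records = [{"ID": "A", "P02SEX": "2: Female"}, {"ID": "B", "P02SEX": "1: Male"}]
women = [r for r in records if is_female(r["P02SEX"])]
```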

Varus malalignment is a moderate or strong risk factor for knee osteoarthritis [Runhaar et al. 2014, Hunter & Bierma-Zeinstra 2019]. Altered loading of the knee joint may lead to changes in the composition, structure, metabolism, and mechanical properties of articular cartilage [Guilak 2011]. In the test set, the medial anatomical knee alignment was assessed on knee radiographs as the angle between two lines: one from the center of the tibial spines through the center of the femoral shaft, and one from the center of the tibial spines through the center of the tibial shaft, with both shaft centers taken approximately 10 cm from the joint margin. The anatomical angle was used because it correlates with the mechanical axis of the knee and can be assessed on knee radiographs [Kraus et al. 2005]. In healthy adults with neutral alignment, the mechanical axis is between 1 and 1.5 degrees varus [Sheely et al. 2011]. For women, the reported offset between the mechanical and anatomical axes is between 3.5 and 4.6 degrees [Kraus et al. 2005, Sheely et al. 2011]. Since anatomical knee angles were measured in whole degrees, we defined a mechanical axis of 179 ±1 degrees as neutral and corrected for an offset of 4 degrees. Accordingly, anatomical knee alignment angles below 182 degrees were defined as varus alignment [Brouwer et al. 2007]. In the test set, varus alignment (no/yes) of the knees is available. In the OAI data, the V00FTANGLE variable can be used to obtain the varus alignment of the knees at baseline.
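The 182-degree cut-off follows from the neutral range and offset given above: the lower limit of the neutral mechanical axis is 179 − 1 = 178 degrees, and adding the 4-degree offset gives 182 degrees on the anatomical scale. The sketch below encodes that arithmetic. Applying it to OAI V00FTANGLE assumes that the OAI angle uses the same medial-angle convention (values around 180 degrees); this is an assumption, and if the OAI values are reported differently (for example as a deviation from 180 degrees) they would need converting first.

```python
# Neutral mechanical axis used for the test set: 179 +/- 1 degrees (whole degrees).
NEUTRAL_MECHANICAL_DEG = 179
NEUTRAL_TOLERANCE_DEG = 1
# Offset between the anatomical and mechanical axes used for women.
ANATOMICAL_OFFSET_DEG = 4

# Lower limit of the neutral range on the anatomical scale: 179 - 1 + 4 = 182 degrees.
VARUS_THRESHOLD_DEG = NEUTRAL_MECHANICAL_DEG - NEUTRAL_TOLERANCE_DEG + ANATOMICAL_OFFSET_DEG


def is_varus(anatomical_angle_deg):
    """Varus if the medial anatomical angle lies below the neutral range (< 182 degrees).

    Assumes the input uses the same angle convention as the test set.
    """
    return anatomical_angle_deg < VARUS_THRESHOLD_DEG


# Usage on made-up angles.
flags = [is_varus(angle) for angle in (180, 181, 182, 185)]  # [True, True, False, False]
```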

Previous knee injury has been identified as a major risk factor for the development of knee osteoarthritis [Silverwood et al. 2015, Muthuri et al. 2011]. Injuries can, for example, affect joint biomechanics and make the joint more susceptible to further damage [Glyn-Jones et al. 2015]. In the test set, a knee injury (no/yes) was recorded if the woman had ever visited a doctor for a knee injury. In the OAI data, the P01INJR and P01INJL variables (Ever injured badly enough to limit ability to walk for at least two days?) contain information about previous knee injuries.

Mild symptoms may be associated with incident symptomatic knee osteoarthritis [Landsmeer et al. 2019]. In the test set, mild symptoms were assessed with the question “Did you experience any pain in or around your knee within the past 12 months?” (no/yes). In the OAI data, the P01KPNR12 and P01KPNL12 variables contain information on mild symptoms at baseline.

The presence of Heberden nodes, a marker of hand osteoarthritis, is a potential risk factor for knee osteoarthritis [Silverwood et al. 2015]. Both hands of the subjects in the test set were examined for Heberden nodes (no/yes). In the OAI data, the P02JBMP variable (obvious hard bumps on the joints closest to the fingertips) contains information about the presence of Heberden nodes at baseline.

Radiographic features of osteoarthritis are usually classified using the Kellgren-Lawrence (KL) grading system [Kellgren & Lawrence 1957], which assesses joint space narrowing, osteophyte formation, sclerosis, and deformity of bony contours. Baseline KL grade is a predictor of incident radiographic osteoarthritis and is available in the test set. In the OAI data, the V00XRKL variable contains the KL grades of the knees at baseline.

Morning stiffness is one of the features evaluated in the ACR criteria for osteoarthritis [Altman et al. 1986]. In the test set, morning stiffness was evaluated with the stiffness subscale of the KOOS (Knee injury and Osteoarthritis Outcome Score) [Roos et al. 1998]. Morning stiffness was considered present when the knee had moderate, much, or very much joint stiffness after sleeping (vs. no or little stiffness). In the OAI data, the V00WSRKN1 and V00WSLKN1 variables can be used to obtain the morning stiffness of the knees at baseline.

Joint line tenderness and crepitus (grating and crackling sounds from the knee) have been proposed for the assessment of early knee osteoarthritis and may be associated with future osteoarthritis [Emery et al. 2019, Schiphof et al. 2014, Bastick et al. 2016]. In the test set, both knees of the subjects were examined for pain on palpation of the medial and lateral joint lines (no/yes). Crepitus was tested during active flexion and extension of the knee (no/yes). In the OAI data, the V00RKLTTPN, V00RKMTTPN, V00LKLTTPN, and V00LKMTTPN variables can be used to obtain joint line tenderness, and the V00RKPFCRE and V00LKPFCRE variables to obtain crepitus at baseline.
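The test set records a single tenderness flag per knee, whereas the OAI provides separate lateral and medial variables. One way to combine them, sketched below, is to count a knee as tender if either joint line is tender. The sketch assumes 0/1 (no/yes) coding, that `R`/`L` in the names mean right/left, and that the `L`/`M` after `K` mean lateral/medial; all three are assumptions to check against the OAI codebook.

```python
# Assumed meaning of the OAI names: R/L = right/left knee, L/M = lateral/medial joint line.
TENDERNESS_COLUMNS = {"R": ("V00RKLTTPN", "V00RKMTTPN"), "L": ("V00LKLTTPN", "V00LKMTTPN")}
CREPITUS_COLUMNS = {"R": "V00RKPFCRE", "L": "V00LKPFCRE"}


def joint_line_tenderness(lateral, medial):
    """Combine lateral and medial findings (assumed 0/1, None if missing) into one flag."""
    if lateral == 1 or medial == 1:
        return 1  # pain at either joint line counts as tenderness
    if lateral == 0 and medial == 0:
        return 0
    return None  # not enough information to rule tenderness out


def knee_exam_features(record, side):
    """Per-knee tenderness and crepitus from a dict of baseline OAI values."""
    lateral_col, medial_col = TENDERNESS_COLUMNS[side]
    return {
        "joint_line_tenderness": joint_line_tenderness(record.get(lateral_col),
                                                       record.get(medial_col)),
        "crepitus": record.get(CREPITUS_COLUMNS[side]),
    }


# Usage on a made-up record.
record = {"V00RKLTTPN": 0, "V00RKMTTPN": 1, "V00RKPFCRE": 0}
right_knee = knee_exam_features(record, "R")  # {'joint_line_tenderness': 1, 'crepitus': 0}
```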


References