From a71487653d1269997574edba82dc30dffd3a9b7b Mon Sep 17 00:00:00 2001
From: Suneeta Mall
Date: Mon, 9 Sep 2024 09:14:39 +1000
Subject: [PATCH] [chore] Add more info to explain how to use dataset and images

---
 docs/datasets/radbench.md | 49 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 49 insertions(+)

diff --git a/docs/datasets/radbench.md b/docs/datasets/radbench.md
index 5d6d1aa..b647be6 100644
--- a/docs/datasets/radbench.md
+++ b/docs/datasets/radbench.md
@@ -30,6 +30,55 @@ We share this concern and worry that Radiology foundation models perhaps are als
 * Existing datasets are not selected for clinically challenging cases where the pathology is visually subtle or rare. RadBench specifically selects a wide range of pathology in different anatomical parts with the intention of including challenging cases.
+## Working with the RadBench Dataset
+
+The RadBench dataset is a collection of 89 unique cases: 40 sourced from MedPix and 49 sourced from Radiopaedia. These cases come with 497 questions in total, of which 377 are closed-ended and 120 are open-ended.
+
+Here is a breakdown of the dataset's structure:
+
+| Header | Description |
+| ------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| imageSource | The source of the images: either MedPix or Radiopaedia. |
+| CASE\_ID | For MedPix cases, this field is left blank. For Radiopaedia cases, it contains a numerical case ID that can be used to search for the case at [Radiopaedia.org](https://radiopaedia.org/search?scope=cases). |
+| imageIDs | A comma-separated list of unique image identifiers. These are either MedPix image IDs or Radiopaedia image URLs. To obtain URLs for MedPix images, refer to the following section on loading RadBench images. |
+| modality | Currently, the dataset includes only the "XR - Plain Film" modality. |
+| IMAGE\_ORGAN | The organ present in the image, such as ABDOMEN, BRAIN, or PELVIS. |
+| PRIMARY\_DX | The primary diagnosis for the given case. |
+| QUESTION | The question to be provided to the AI model. |
+| Q\_TYPE | The type of question asked, such as Pathology, Radiological View, Body Part, Clinical, Demographic, or Comparison. |
+| ANSWER | The expected answer from the model. |
+| A\_TYPE | Whether the question is open-ended or closed-ended. |
+| OPTIONS | For closed-ended questions, a list of answer options, such as "left" and "right". For open-ended questions, this field is left blank. |
+
+### Loading RadBench images
+
+To load the images for the RadBench dataset, first identify the source of each image using the `imageSource` column in the [RadBench dataset](https://github.com/harrison-ai/radbench/blob/main/data/radbench/radbench.csv).
+
+If the images are sourced from [Radiopaedia](https://radiopaedia.org/), the image URLs are listed in the `imageIDs` column and can be used directly to fetch the images.
+
+For [MedPix](https://medpix.nlm.nih.gov/home) images, you first need to query the MedPix image metadata to obtain the image URLs. The following Python script can be used as a starting point:
+
+```python
+from typing import List
+import requests
+
+MEDPIX_DATA_BASE_URL = "https://medpix.nlm.nih.gov/rest/image.json?imageID="
+
+def get_medpix_image_urls(image_ids_str: str) -> List[str]:
+    image_ids: List[str] = [image_id.strip() for image_id in image_ids_str.split(",")]  # tolerate spaces after commas
+    session = requests.Session()  # reuse one connection for all metadata requests
+    images: List[str] = []
+    for image_id in image_ids:
+        image_data_json = session.get(f"{MEDPIX_DATA_BASE_URL}{image_id}").json()
+        image_url = image_data_json["imageURL"]
+        assert image_url.endswith(".jpg")
+        images.append(image_url)
+    return images
+```
+
+The `get_medpix_image_urls` function takes the comma-separated MedPix image IDs from the `imageIDs` column and returns the corresponding image URLs. It uses the `requests` library to send HTTP requests and obtain the image metadata from the MedPix API.
+
+Please note that this is a basic example and may need to be adapted to your specific requirements and to the structure of the RadBench dataset.
 ## Acknowledgements