Commit

[chore] Add more info to explain how to use dataset and images
suneeta-mall committed Sep 8, 2024
1 parent e2b3a12 commit a714876
Showing 1 changed file with 49 additions and 0 deletions.
49 changes: 49 additions & 0 deletions docs/datasets/radbench.md
@@ -30,6 +30,55 @@ We share this concern and worry that Radiology foundation models perhaps are als
* Existing datasets are not selected for clinically challenging cases where the pathology is visually subtle or rare. RadBench specifically selects a wide range of pathology in different anatomical parts with the intention of including challenging cases.


## Working with RadBench Dataset

The RadBench dataset contains 89 unique cases: 40 sourced from MedPix and 49 sourced from Radiopaedia. These cases are accompanied by 497 questions, of which 377 are closed-ended and 120 are open-ended.

Here is a breakdown of the dataset's structure:

| Header | Description |
| ------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| imageSource | Describes the source of the images, indicating whether they are from MedPix or Radiopaedia. |
| CASE\_ID | For MedPix cases, this field is left blank. For Radiopaedia cases, it contains a numerical value that can be used to search for the case image at [Radiopaedia.org](https://radiopaedia.org/search?scope=cases) using the provided case ID. |
| imageIDs | A list of unique IDs for each image, separated by commas. These IDs can be either MedPix image IDs or Radiopaedia image URLs. To obtain URLs for MedPix images, refer to the following section on loading RadBench images. |
| modality | Currently, the dataset only includes "XR - Plain Film" modality. |
| IMAGE\_ORGAN | Specifies the organ present in the image, such as ABDOMEN, BRAIN, or PELVIS. |
| PRIMARY\_DX | Indicates the primary diagnosis for the given case. |
| QUESTION | Represents the question to be provided to the AI model. |
| Q\_TYPE | Specifies the type of question asked, such as Pathology, Radiological View, Body Part, Clinical, Demographic, or Comparison. |
| ANSWER | Denotes the expected answer from the model. |
| A\_TYPE | Indicates whether the question is open-ended or closed-ended. |
| OPTIONS | For closed-ended questions, this field contains a list of answer options, such as "left" and "right". For open-ended questions, this field is left blank. |
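
As a rough illustration of how the columns above might be consumed, the following sketch parses RadBench-style CSV text with Python's standard `csv` module and counts questions per `A_TYPE`. The helper names (`read_radbench_rows`, `count_by_answer_type`) and the sample rows are hypothetical and made up for this example. The exact strings stored in `A_TYPE` and `imageSource` are assumptions, so check them against the real CSV before relying on this.

```python
import csv
import io
from collections import Counter
from typing import Dict, List


def read_radbench_rows(csv_text: str) -> List[Dict[str, str]]:
    """Parse RadBench-style CSV text into one dict per question row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def count_by_answer_type(rows: List[Dict[str, str]]) -> Counter:
    """Count questions per A_TYPE value, ignoring case and surrounding spaces."""
    return Counter(row["A_TYPE"].strip().lower() for row in rows)


# Made-up sample rows for illustration only; they are NOT taken from RadBench,
# and the A_TYPE values ("closed", "open") are assumed spellings.
SAMPLE_CSV = """imageSource,CASE_ID,imageIDs,QUESTION,ANSWER,A_TYPE,OPTIONS
MedPix,,"id-1,id-2",Which side is affected?,left,closed,"left,right"
Radiopaedia,12345,https://example.org/a.jpg,What is the diagnosis?,example,open,
"""

rows = read_radbench_rows(SAMPLE_CSV)
counts = count_by_answer_type(rows)
print(counts)
```

To run this against the real file, read `radbench.csv` from disk and pass its contents to `read_radbench_rows`. Note that `imageIDs` and `OPTIONS` hold comma-separated lists inside a single cell, so they need a second split after the CSV is parsed.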

### Loading RadBench images

To load the images for the RadBench dataset, you will need to identify the source of the images using the `imageSource` column provided in the [RadBench dataset](https://github.com/harrison-ai/radbench/blob/main/data/radbench/radbench.csv).

If the images are sourced from [Radiopaedia](https://radiopaedia.org/), the image URLs can be found in the `imageIDs` column and can be directly used to fetch the images.
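
For Radiopaedia rows, a minimal sketch of turning an `imageIDs` cell into a list of URLs could look like the following. The function name `radiopaedia_image_urls` is hypothetical, and the sketch assumes that the URLs themselves contain no commas, since the column uses commas as the separator.

```python
from typing import List
from urllib.parse import urlparse


def radiopaedia_image_urls(image_ids_str: str) -> List[str]:
    """Split a comma-separated imageIDs cell into a list of image URLs."""
    # Assumption: individual URLs contain no commas.
    urls = [part.strip() for part in image_ids_str.split(",") if part.strip()]
    for url in urls:
        parsed = urlparse(url)
        # Reject values that are not absolute http(s) URLs, e.g. MedPix IDs.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"Not an http(s) URL: {url!r}")
    return urls
```

The returned URLs can then be passed to any HTTP client to download the image bytes.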

For [MedPix](https://medpix.nlm.nih.gov/home) images, the `imageIDs` column contains MedPix image IDs rather than URLs, so you first need to query the MedPix image metadata to obtain each image's URL. The following Python script can be used as a starting point:

```python
from typing import List

import requests

MEDPIX_DATA_BASE_URL = "https://medpix.nlm.nih.gov/rest/image.json?imageID="


def get_medpix_image_urls(image_ids_str: str) -> List[str]:
    """Return the image URL for each comma-separated MedPix image ID."""
    # Strip whitespace around each ID and skip empty entries.
    image_ids: List[str] = [i.strip() for i in image_ids_str.split(",") if i.strip()]
    session = requests.Session()
    images: List[str] = []
    for image_id in image_ids:
        # Fetch the metadata for this image; it includes the image URL.
        response = session.get(f"{MEDPIX_DATA_BASE_URL}{image_id}")
        response.raise_for_status()
        image_url = response.json()["imageURL"]
        assert image_url.endswith(".jpg")
        images.append(image_url)
    return images
```

The `get_medpix_image_urls` function takes the comma-separated `imageIDs` value of a MedPix row and returns a list of image URLs. It uses the `requests` library to query the MedPix API for the metadata of each image and reads the URL from the `imageURL` field.

Please note that this is a basic example and may need to be customized based on your specific requirements and the structure of the RadBench dataset.
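
Putting the two sources together, one possible way to resolve the image URLs for any row is sketched below. The name `resolve_image_urls` is hypothetical. The MedPix lookup is passed in as a parameter (for example `get_medpix_image_urls` from the script above) so that the routing logic can be exercised without network access. The comparison against `"medpix"` and `"radiopaedia"` assumes those are the values found in `imageSource`, matched case-insensitively.

```python
from typing import Callable, List, Mapping


def resolve_image_urls(
    row: Mapping[str, str],
    medpix_resolver: Callable[[str], List[str]],
) -> List[str]:
    """Return image URLs for one RadBench row, routing on imageSource."""
    source = row["imageSource"].strip().lower()
    image_ids = row["imageIDs"]
    if source == "medpix":
        # MedPix rows store image IDs, so a metadata lookup is required.
        return medpix_resolver(image_ids)
    if source == "radiopaedia":
        # Radiopaedia rows already store image URLs.
        return [part.strip() for part in image_ids.split(",") if part.strip()]
    raise ValueError(f"Unknown imageSource: {row['imageSource']!r}")
```

A call such as `resolve_image_urls(row, get_medpix_image_urls)` would then work for rows from either source.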


## Acknowledgements