Wednesday, May 18, 2016

TCIA dataset hierarchy

The Cancer Imaging Archive (TCIA) is an open-access database of medical images for cancer research. Users can download the entire contents of a collection in bulk, or use the RESTful API to search and download within or across collections.

TCIA's data is organized as a hierarchy:
         the whole database
                Collection1  Collection2  Collection3 ......
         in a Collection
                Patient1  Patient2  Patient3 ......
         in a Patient
                Study1  Study2  Study3 ......
         in a Study
                Series1  Series2  Series3 ......
         in a Series
                image1  image2  image3 ...... (DICOM format)

         One can use the RESTful API to search for a set of patients or a set of studies, restricted to specific collections or by other filter conditions.
         One can also download all images of a Series as a zip file, or download one specific image as a DICOM file.
         In addition, TCIA supports further options: one can choose the response format (CSV/HTML/XML/JSON), the modality (CT/MR...), or the body part examined.
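
As a rough illustration of the search options above, here is a minimal sketch that only builds a query URL and does not send any request. The base URL, the endpoint name `getSeries`, and the parameter names `Collection`, `Modality`, `BodyPartExamined` and `format` are assumptions that are not given in this post, and the collection name is a made-up placeholder; check the TCIA API guide for the real values and for whether an API key is required.

```python
from urllib.parse import urlencode

# Assumed base URL (not from this post); check the TCIA API guide.
BASE_URL = "https://services.cancerimagingarchive.net/services/v3/TCIA/query"


def build_query_url(endpoint, response_format="json", **filters):
    """Build a query URL from an endpoint name and optional filters."""
    # Drop filters that were not supplied so they do not appear in the URL.
    params = {key: value for key, value in filters.items() if value is not None}
    params["format"] = response_format
    return f"{BASE_URL}/{endpoint}?{urlencode(params)}"


# Hypothetical query: the CT series of one collection, filtered by body part.
url = build_query_url(
    "getSeries",
    Collection="Example-Collection",  # placeholder, not a real collection
    Modality="CT",
    BodyPartExamined="CHEST",
)
print(url)
```

Keeping the filters as keyword arguments means the same helper can serve patient, study and series searches by changing only the endpoint name.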


An image has a globally unique ID: SOPInstanceUID.
The set of images in a Series has a globally unique ID: SeriesInstanceUID.
A Study also has a globally unique ID: StudyInstanceUID.
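
One possible way to hold these identifiers in memory is sketched below. The class and function names are hypothetical, and the UID strings are short placeholders (real DICOM UIDs are long dotted numeric strings).

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Series:
    series_instance_uid: str
    # One SOPInstanceUID per image in the series.
    sop_instance_uids: List[str] = field(default_factory=list)


@dataclass
class Study:
    study_instance_uid: str
    series: List[Series] = field(default_factory=list)


def image_keys(study: Study) -> Iterator[Tuple[str, str]]:
    """Yield (SeriesInstanceUID, SOPInstanceUID) for every image in a study."""
    for series in study.series:
        for sop_uid in series.sop_instance_uids:
            yield series.series_instance_uid, sop_uid


# Placeholder UIDs for illustration only.
study = Study("study-1", [Series("series-1", ["image-1", "image-2"])])
print(list(image_keys(study)))
```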

To avoid re-downloading the same image, one can keep track of the SOPInstanceUIDs and SeriesInstanceUIDs that have already been downloaded, because images can only be downloaded through the RESTful API by those two UIDs.
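
The bookkeeping described above could look roughly like the following sketch. It assumes a small JSON file is an acceptable place to store the downloaded UIDs; the class name `DownloadLog` and the file layout are hypothetical, and the actual download call is left out.

```python
import json
from pathlib import Path


class DownloadLog:
    """Remember which series and images have already been downloaded."""

    def __init__(self, path):
        self.path = Path(path)
        self.series = set()
        self.images = set()
        # Reload earlier state so the log survives between runs.
        if self.path.exists():
            saved = json.loads(self.path.read_text())
            self.series = set(saved["series"])
            self.images = set(saved["images"])

    def needs_series(self, series_uid):
        return series_uid not in self.series

    def needs_image(self, series_uid, sop_uid):
        # An image is already on disk if it was fetched alone
        # or as part of its whole series.
        return series_uid not in self.series and sop_uid not in self.images

    def record_series(self, series_uid):
        self.series.add(series_uid)
        self._save()

    def record_image(self, sop_uid):
        self.images.add(sop_uid)
        self._save()

    def _save(self):
        data = {"series": sorted(self.series), "images": sorted(self.images)}
        self.path.write_text(json.dumps(data))
```

Before each download, call `needs_series` or `needs_image`; after a successful download, call the matching `record_` method so the next run skips it.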

