Wednesday, May 25, 2016

Medicurator Abstract Definition



Medicurator is composed of six main parts (more details are in the Bitbucket repository).

User 
         A user has a username, a password, and replica sets. A user can save many replica sets.

ReplicaSet

        A replica set contains many datasets.
        It supports operations such as add, get, remove, and download on those datasets.

Datasource
      
      The data source is decided by the system. It currently has two functions: getRootDataSet and retrieveDataSet.

Dataset
     
      A dataset is a tree structure. Each dataset has a parent, child datasets, a metadata ID, and the data.
      
Metadata
      
      tag--string
      attribute--key-value<string, string>

Data

      Represents the content requested by the user.
      Download process
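
The six parts above can be sketched as plain Python classes. This is only an illustration of the abstract definition, not the actual Medicurator code: the Python names, the method signatures, and the choice to key datasets by metadata ID are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Metadata:
    tag: str
    attributes: Dict[str, str] = field(default_factory=dict)  # key-value<string, string>


@dataclass
class DataSet:
    metadata_id: str
    data: Optional[bytes] = None
    # The parent link is left out of repr/compare so they do not loop through the tree.
    parent: Optional["DataSet"] = field(default=None, repr=False, compare=False)
    children: List["DataSet"] = field(default_factory=list)

    def add_child(self, child: "DataSet") -> "DataSet":
        child.parent = self
        self.children.append(child)
        return child


class DataSource:
    """Mirrors the two functions named above; the signatures are assumed."""

    def get_root_dataset(self) -> DataSet:
        raise NotImplementedError

    def retrieve_dataset(self, dataset: DataSet) -> DataSet:
        raise NotImplementedError


@dataclass
class ReplicaSet:
    # Assumption: datasets are looked up by their metadata ID.
    datasets: Dict[str, DataSet] = field(default_factory=dict)

    def add(self, dataset: DataSet) -> None:
        self.datasets[dataset.metadata_id] = dataset

    def get(self, metadata_id: str) -> Optional[DataSet]:
        return self.datasets.get(metadata_id)

    def remove(self, metadata_id: str) -> Optional[DataSet]:
        return self.datasets.pop(metadata_id, None)


@dataclass
class User:
    username: str
    password: str  # a real system would store a hash, not the password itself
    replica_sets: Dict[str, ReplicaSet] = field(default_factory=dict)


# Usage with made-up IDs
root = DataSet(metadata_id="m-root")
series = root.add_child(DataSet(metadata_id="m-series", data=b"..."))
user = User("alice", "secret")
user.replica_sets["demo"] = ReplicaSet()
user.replica_sets["demo"].add(series)
```
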




Wednesday, May 18, 2016

TCIA dataset hierarchy

The Cancer Imaging Archive (TCIA) is an open-access database of medical images for cancer research. Users can download the entire contents of a collection in bulk, or use the RESTful API to search or download within or across collections.

TCIA's data is organized hierarchically:
         the Whole database
                Collection1  Collection2  Collection3 .......
         in a Collection
                Patient1  Patient2  Patient3 .......
         in a Patient
                Study1  Study2  Study3 ......
         in a Study
                 Series1  Series2  Series3 ......
         in a Series
                 image1  image2  image3 ...... (DICOM format)

         One can use the RESTful API to query the set of Patients or Studies in specific collections, or to apply other filter conditions.
         One can also download the images of a Series as a zip file, or download one specific image as a DICOM file.
         In addition, TCIA supports other options: one can choose the response format (CSV/HTML/XML/JSON), the modality of a Study (CT/MR...), or the body part examined.
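
As a rough sketch of how such a query could be put together, the helper below only builds a URL and makes no network call. The base URL is a placeholder, and the query names (getPatient, getSeries, ...) and parameter names (Collection, Modality, format) are written from memory of the TCIA API guide, so treat them as assumptions to verify there. "TCGA-GBM" is just an example collection name.

```python
from urllib.parse import urlencode

# Hypothetical placeholder; substitute the base URL given in the TCIA API guide.
BASE_URL = "https://tcia.example.org/query"

# Assumed query name for each level of the hierarchy (verify against the API guide).
LEVEL_QUERIES = {
    "collection": "getCollectionValues",
    "patient": "getPatient",
    "study": "getPatientStudy",
    "series": "getSeries",
}


def build_query_url(level, response_format="json", **filters):
    """Build a query URL for one hierarchy level with optional filters."""
    if level not in LEVEL_QUERIES:
        raise ValueError(f"unknown level: {level}")
    # Drop filters that were not set, then add the response format.
    params = {key: value for key, value in filters.items() if value is not None}
    params["format"] = response_format
    # Sort the parameters so the same query always yields the same URL.
    query = urlencode(sorted(params.items()))
    return f"{BASE_URL}/{LEVEL_QUERIES[level]}?{query}"


url = build_query_url("series", Collection="TCGA-GBM", Modality="MR")
```
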


An image has a globally unique ID: SOPInstanceUID
The image set of a Series has a globally unique ID: SeriesInstanceUID
A Study also has a globally unique ID: StudyInstanceUID

To avoid redownloading the same image, one can keep track of the SOPInstanceUIDs and SeriesInstanceUIDs that have already been downloaded, because the RESTful API only allows downloading images by those two UIDs.
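
A minimal sketch of that bookkeeping, assuming the UIDs are kept in two in-memory sets. The class and method names are hypothetical, and the UIDs in the usage lines are made up.

```python
class DownloadTracker:
    """Remembers which UIDs have already been fetched."""

    def __init__(self):
        self.series_done = set()  # SeriesInstanceUIDs downloaded as a whole zip
        self.images_done = set()  # SOPInstanceUIDs downloaded so far

    def mark_series(self, series_uid, sop_uids=()):
        # Record a full Series download, plus its image UIDs when they are known.
        self.series_done.add(series_uid)
        self.images_done.update(sop_uids)

    def mark_image(self, sop_uid):
        self.images_done.add(sop_uid)

    def needs_series(self, series_uid):
        return series_uid not in self.series_done

    def needs_image(self, series_uid, sop_uid):
        # An image is already covered if its whole Series was downloaded.
        if series_uid in self.series_done:
            return False
        return sop_uid not in self.images_done


tracker = DownloadTracker()
tracker.mark_image("1.2.3.1")  # made-up SOPInstanceUID
```

Before each request, the caller would check needs_series or needs_image and skip the download when it returns False.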


Wednesday, May 11, 2016

Infinispan and Spark

Today I found something amazing: Infinispan and Spark can work together! It may be very useful for my project. https://github.com/infinispan/infinispan-spark

Infinispan is a distributed in-memory key/value data store, which should be much faster than HDFS. It can be used in distributed computing tasks as fast temporary storage and as a cache. http://infinispan.org/

Spark is a MapReduce-style distributed computing framework. Visit http://spark.apache.org/ for more information.

My thoughts on MEDIator: A Data Sharing Synchronization Platform for Heterogeneous Medical Image Archives



MEDIator: A Data Sharing Synchronization Platform for Heterogeneous Medical Image Archives
This is a paper written by my mentor. Its content is related to the project I am going to do.

Here I list some things I learned from this paper.

While sharing data is encouraged in science, algorithms and architectures should be designed for mashing up and sharing medical data efficiently. Hence, a data sharing synchronization system should be secure and should minimize data duplication in client instances, in addition to meeting the regular requirements of data access integration platforms. A data sharing synchronization platform should let data consumers view subsets of data that satisfy user-defined search criteria, and share them with others using pointers to the actual data.

This paper presents MEDIator, a data sharing and synchronization middleware platform for heterogeneous medical image archives. MEDIator allows sharing pointers to medical data efficiently, while letting the consumers manipulate the pointers without modifying the raw medical data. MEDIator has been implemented for multiple data sources, including Amazon S3, The Cancer Imaging Archive (TCIA), caMicroscope, and metadata from CSV files for cancer images.


Also, an in-memory data grid can be an alternative to traditional storage for the replica sets, as it provides faster storage, access, and execution. The paper uses Infinispan for this. Incidentally, I plan to use Infinispan in my project as well.

MEDIator lets the users create, update, retrieve, and delete replica sets, and share the replica sets with others. 
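
To make the idea of sharing pointers concrete, here is a small sketch of create, retrieve, update, delete, and share on replica sets. It is not MEDIator's implementation: plain dictionaries stand in for the in-memory data grid, the class and method names are hypothetical, and modelling "share" as granting another user access to the same replica set ID is an assumption.

```python
import uuid


class ReplicaSetStore:
    def __init__(self):
        self._pointers = {}  # replica set ID -> set of pointers (e.g. SeriesInstanceUIDs)
        self._access = {}    # user name -> set of replica set IDs the user may use

    def _check(self, user, rs_id):
        if rs_id not in self._access.get(user, set()):
            raise KeyError(f"{user} has no replica set {rs_id}")

    def create(self, user, pointers):
        rs_id = str(uuid.uuid4())
        self._pointers[rs_id] = set(pointers)
        self._access.setdefault(user, set()).add(rs_id)
        return rs_id

    def retrieve(self, user, rs_id):
        self._check(user, rs_id)
        return frozenset(self._pointers[rs_id])

    def update(self, user, rs_id, pointers):
        # Only the pointers change; the raw data they refer to is never touched.
        self._check(user, rs_id)
        self._pointers[rs_id] = set(pointers)

    def delete(self, user, rs_id):
        self._check(user, rs_id)
        del self._pointers[rs_id]
        for ids in self._access.values():
            ids.discard(rs_id)

    def share(self, owner, rs_id, other_user):
        # Sharing hands over the ID, not a copy of the data.
        self._check(owner, rs_id)
        self._access.setdefault(other_user, set()).add(rs_id)


store = ReplicaSetStore()
rs_id = store.create("alice", ["series-1", "series-2"])  # made-up pointers
store.share("alice", rs_id, "bob")
```
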
Higher Level Use Case View


MEDIator APIs: InterfaceAPI, PubConsAPI, Integrator
(details omitted)


Integration with Medical Data Sources 

Clinical data is deployed in multiple data sources such as TCIA, caMicroscope, and Amazon S3. Figure 3 depicts the deployment of the system with multiple medical data sources. This part can help us access different sources of data.


"MEDIator is multi-tenanted where multiple users co-exist without the knowledge of existence of the other users, sharing the same cache space. Involving a time stamp for the class extending PubConsAPI, downloaded items can be tracked, and the diffs can be produced for the user download. Thus a download can be paused and resumed later, downloading the images that have not been downloaded yet."

--- This paragraph is not clear to me; to be discussed.

To conclude: first, I think I can apply the Representation of Medical Image Sources part to my project, since it can help me represent the source data. Second, I can integrate with MEDIator to access data and then do the Near Duplicate Detection work on top of that.


Thursday, May 5, 2016

Hello, GSoC 2016!

I'm really happy to have been selected for Google Summer of Code 2016. Thanks to my mentors Pradeeban and Ashish; your patience and help gave me confidence when I wrote my proposal. I look forward to a good collaboration.

The name of my GSoC project is Near Duplicate Detection in Medical Image Archives. My proposal is here.

I've set up my code repository at https://bitbucket.org/BMI/medicurator.

Good luck and have fun!

Happy Birthday, Peking University!

Yesterday, May 4th, was Peking University's 118th birthday. Congratulations! I will always be proud of you.