Irene: 2016

Friday, August 12, 2016

Read the Docs

Over the past few days, I finished the documentation for MediCurator.

Below is the link:

http://medicurator.readthedocs.io/en/latest/

Wednesday, August 10, 2016

This week, I wrote the documentation and fixed some bugs.

Details:

Download only what is new

https://bitbucket.org/BMI/medicurator/issues/6/download-only-what-is-new-when-attempting

The documentation:

https://bitbucket.org/BMI/medicurator/issues/5/medicurator-readthedocs-documentation

Avoid hard-coding the API key

https://bitbucket.org/BMI/medicurator/issues/7/avoid-the-need-to-hard-code-the-api-key

Sunday, July 31, 2016

Research on MediCurator Support for DICOMweb

I have done some research on DICOMweb, and it can be applied to MediCurator. DICOMweb has three levels: study, series, and instance. I can query it level by level and integrate it into MediCurator in the same way as TCIA. The user only needs to add the URL of the server for it to work, and I can use the retrieve function to download the images, so MediCurator can be implemented on top of it. However, the only server I found that implements DICOMweb is http://www.dicomserver.co.uk/DICOMRS.html, and its retrieve function is not finished, so I cannot experiment with retrieval. In conclusion, MediCurator can use DICOMweb without too much modification.
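The level-by-level access described above maps onto the standard DICOMweb resource paths from DICOM PS3.18: QIDO-RS for searching and WADO-RS for retrieving. The following minimal sketch only builds those URLs from the server URL the user supplies. It does not send requests or parse the JSON responses, and the class name `DicomWebUrls` and the example server URL are hypothetical.

```java
/**
 * Hypothetical helper (not part of MediCurator) that builds DICOMweb URLs
 * level by level: study, then series, then instance.
 * Search paths are QIDO-RS; the retrieve path is WADO-RS (DICOM PS3.18).
 */
class DicomWebUrls {
    private final String baseUrl; // the server URL the user supplies

    DicomWebUrls(String baseUrl) {
        // Drop a trailing slash so that path segments join cleanly.
        this.baseUrl = baseUrl.endsWith("/")
                ? baseUrl.substring(0, baseUrl.length() - 1)
                : baseUrl;
    }

    /** QIDO-RS: search for studies on the server. */
    String searchStudies() {
        return baseUrl + "/studies";
    }

    /** QIDO-RS: search for the series of one study. */
    String searchSeries(String studyUid) {
        return baseUrl + "/studies/" + studyUid + "/series";
    }

    /** QIDO-RS: search for the instances of one series. */
    String searchInstances(String studyUid, String seriesUid) {
        return searchSeries(studyUid) + "/" + seriesUid + "/instances";
    }

    /** WADO-RS: retrieve a single instance (the "retrieve" step). */
    String retrieveInstance(String studyUid, String seriesUid, String instanceUid) {
        return searchInstances(studyUid, seriesUid) + "/" + instanceUid;
    }
}
```

Each level's URL is derived from the one above it, which mirrors how a DICOMweb data source could walk the hierarchy in the same way the TCIA source does.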

Tuesday, July 26, 2016

Refactoring MediCurator and Adding Near-Duplicate Detection

This week, I refactored MediCurator into three modules:

medicurator-core for the core MediCurator functionality.
medicurator-server for the API.
medicurator-client for the web app.

This way, I can avoid dependency conflicts.

I also added the near-duplicate detection function to the web app and the API:

http://localhost:4567/duplicateSets?replicasetID1=***&replicasetID2=***

This completes the planned functionality.
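To make the idea of comparing two replica sets more concrete, here is a hedged sketch of one simple way to flag near-duplicate metadata. Each record is treated as a set of key-value pairs, and two records count as near-duplicates when the Jaccard similarity of those sets reaches a threshold. This is a generic, swapped-in technique and not necessarily what the duplicateSets endpoint or DetectMetadata.java actually do. The class `MetadataNearDup`, the threshold, and any example tags are hypothetical.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Illustrative metadata near-duplicate check using Jaccard similarity (hypothetical). */
class MetadataNearDup {
    /** Jaccard similarity of the two records' key-value pairs, in [0, 1]. */
    static double jaccard(Map<String, String> a, Map<String, String> b) {
        Set<Map.Entry<String, String>> sa = entries(a);
        Set<Map.Entry<String, String>> sb = entries(b);
        if (sa.isEmpty() && sb.isEmpty()) {
            return 1.0; // two empty records count as identical
        }
        Set<Map.Entry<String, String>> inter = new HashSet<>(sa);
        inter.retainAll(sb);
        Set<Map.Entry<String, String>> union = new HashSet<>(sa);
        union.addAll(sb);
        return (double) inter.size() / union.size();
    }

    // Copy each pair into an immutable entry; two entries are equal when key and value both match.
    private static Set<Map.Entry<String, String>> entries(Map<String, String> m) {
        Set<Map.Entry<String, String>> s = new HashSet<>();
        for (Map.Entry<String, String> e : m.entrySet()) {
            s.add(new AbstractMap.SimpleImmutableEntry<>(e));
        }
        return s;
    }

    /** Index pairs (i, j), i from set1 and j from set2, whose similarity is at least threshold. */
    static List<int[]> findPairs(List<Map<String, String>> set1,
                                 List<Map<String, String>> set2,
                                 double threshold) {
        List<int[]> pairs = new ArrayList<>();
        for (int i = 0; i < set1.size(); i++) {
            for (int j = 0; j < set2.size(); j++) {
                if (jaccard(set1.get(i), set2.get(j)) >= threshold) {
                    pairs.add(new int[] {i, j});
                }
            }
        }
        return pairs;
    }
}
```

The pairwise loop makes n·m comparisons. For large archives, a blocking or hashing step such as MinHash would normally be added so that not every pair has to be compared.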

I also wrote some scripts that make it easier for users to run the project.

Building

./compile.sh

Run the web app

./run_servlet.sh

Run the RESTful API

./run_api.sh

Wednesday, July 20, 2016

RESTful API and the Delete-Download Workflow

This week, I first implemented the RESTful API.

As shown in the README, it includes the following endpoints:

http://localhost:4567/signup?username=***&password=***

http://localhost:4567/login?username=***&password=***

http://localhost:4567/getReplicaSets?userid=***

http://localhost:4567/createReplicaSets?userid=***&replicaName=***

http://localhost:4567/getDataSets?replicasetID=***

http://localhost:4567/addDataSet?replicasetID=***&datasetID=***

http://localhost:4567/removeDataSet?replicasetID=***&datasetID=***

http://localhost:4567/getRootDataSets

http://localhost:4567/getSubsets?datasetID=***

http://localhost:4567/downloadDataSets?datasetID=***

http://localhost:4567/downloadOneDataSets?datasetID=***

http://localhost:4567/deleteDataSets?datasetID=***

http://localhost:4567/deleteOneDataSet?datasetID=***

There is also one more endpoint, which is still to be implemented:

http://localhost:4567/duplicateSets?replicasetID1=***&replicasetID2=***
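Several of these endpoints take user-supplied values such as usernames, passwords, and IDs, so a client should percent-encode them when building the URLs. The following is a minimal sketch of a hypothetical client-side helper (`MediCuratorApi` is not part of the project) that builds request URLs for the endpoints listed above. It assumes Java 10+ for `URLEncoder.encode(String, Charset)` and only builds strings; it does not send requests.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical URL builder for the MediCurator REST endpoints (e.g. http://localhost:4567). */
class MediCuratorApi {
    private final String base;

    MediCuratorApi(String base) {
        this.base = base;
    }

    /** Builds base/endpoint?k1=v1&k2=v2 with every key and value form-encoded. */
    String url(String endpoint, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> p : params.entrySet()) {
            query.add(URLEncoder.encode(p.getKey(), StandardCharsets.UTF_8) + "="
                    + URLEncoder.encode(p.getValue(), StandardCharsets.UTF_8));
        }
        String q = query.toString();
        return base + "/" + endpoint + (q.isEmpty() ? "" : "?" + q);
    }

    /** Convenience wrapper for /addDataSet?replicasetID=...&datasetID=... */
    String addDataSet(String replicaSetId, String dataSetId) {
        Map<String, String> p = new LinkedHashMap<>(); // keeps parameter order stable
        p.put("replicasetID", replicaSetId);
        p.put("datasetID", dataSetId);
        return url("addDataSet", p);
    }
}
```

`URLEncoder` applies application/x-www-form-urlencoded rules, so a space becomes `+`. For example, `addDataSet("r1", "d 2")` yields `http://localhost:4567/addDataSet?replicasetID=r1&datasetID=d+2`.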

The second thing I did was fix the delete-download workflow. MediCurator now supports downloading a dataset, deleting it, and downloading it again.

I implemented this by separating the meanings of "remove" and "delete". "Remove" means taking the dataset out of the replica set, while "delete" means deleting the downloaded data directly, after which the dataset can be downloaded again.
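The distinction between "remove" and "delete" can be illustrated with a small in-memory model. This is a hedged sketch, not the project's ReplicaSet code. The class `ReplicaSetModel` and its methods are hypothetical, and the actual file fetching and deletion are left as comments.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

/**
 * Hypothetical model of "remove" vs "delete":
 * remove: the dataset leaves the replica set entirely.
 * delete: only the downloaded copy is discarded; the dataset stays in the
 *         replica set and is marked as not downloaded, so it can be downloaded again.
 */
class ReplicaSetModel {
    private final Set<String> dataSets = new LinkedHashSet<>();
    private final Set<String> downloaded = new HashSet<>();

    void add(String id) { dataSets.add(id); }

    void remove(String id) {
        dataSets.remove(id);
        downloaded.remove(id); // stop tracking a dataset the replica set no longer holds
    }

    /** Returns true if a download actually happened. */
    boolean download(String id) {
        if (!dataSets.contains(id) || downloaded.contains(id)) {
            return false; // not in this replica set, or already downloaded
        }
        // ... fetch the files for id here ...
        downloaded.add(id);
        return true;
    }

    void delete(String id) {
        // ... delete the local files for id here ...
        downloaded.remove(id); // forget the download so download(id) works again
    }

    boolean contains(String id) { return dataSets.contains(id); }

    boolean isDownloaded(String id) { return downloaded.contains(id); }
}
```

After `delete`, the dataset is still listed, so the download-delete-download sequence works. After `remove`, the dataset is gone and `download` refuses it.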

Wednesday, July 13, 2016

Implementing the Local File Source

This week, I implemented MediCurator support for a local file source.

In my tests, it works well. The local test files are under the path medicurator/target/classes/image

They can be downloaded through the web application I wrote last week.

The downloaded files are now stored at medicurator/target/classes/local.test
The path can be changed in Constant.java.
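For a local file source, "downloading" can be as simple as copying every file under the source directory into the download directory while keeping relative paths. Here is a minimal sketch under that assumption, using only `java.nio.file`. The class `LocalFileSource` is hypothetical, and the directories are passed in rather than read from the constants file.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical local source: copy a directory tree into the download directory. */
class LocalFileSource {
    /** Copies all regular files under sourceDir into targetDir; returns the file count. */
    static int download(Path sourceDir, Path targetDir) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(sourceDir)) { // close the directory stream
            files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        for (Path file : files) {
            // Keep the sub-directory layout (e.g. series folders) of the source.
            Path dest = targetDir.resolve(sourceDir.relativize(file));
            Files.createDirectories(dest.getParent());
            Files.copy(file, dest, StandardCopyOption.REPLACE_EXISTING);
        }
        return files.size();
    }
}
```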

Next, I will finish the complete download-tracking workflow to solve the delete problem. I think I should implement a delete handler function that is invoked whenever a delete happens, whether the user deletes a dataset through the website or the duplicate detection module sends a delete message.
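The planned design, with one delete entry point shared by the website and the duplicate detection module, can be sketched as a small dispatcher with registered listeners. This is a hypothetical illustration of the plan, not existing MediCurator code. `DeleteDispatcher`, `Source`, and the log format are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical single delete entry point shared by all delete triggers. */
class DeleteDispatcher {
    enum Source { WEBSITE, DUPLICATE_DETECTION }

    private final List<Consumer<String>> listeners = new ArrayList<>();
    private final List<String> log = new ArrayList<>();

    /** Registers work to run on every delete, e.g. removing files or clearing download tracking. */
    void register(Consumer<String> listener) {
        listeners.add(listener);
    }

    /** Called by the website and by the duplicate detection module alike. */
    void onDelete(String dataSetId, Source source) {
        log.add(source + ":" + dataSetId);
        for (Consumer<String> listener : listeners) {
            listener.accept(dataSetId);
        }
    }

    List<String> log() {
        return log;
    }
}
```

Because both triggers go through `onDelete`, download tracking only has to be updated in one place.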

Wednesday, July 6, 2016

Applying HDFS to MediCurator

This week, I added HDFS support to MediCurator.

As is well known, the Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware. HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. It provides high-throughput access to application data and is suitable for applications that have large data sets.

To make it easier for MediCurator to handle high throughput, I decided to use HDFS. I implemented the Storage interface in a new HdfsStorage class by adapting the existing LocalStorage. To do this, I mainly used the Hadoop API: https://hadoop.apache.org/docs/r2.6.1/api/overview-summary.html.

In my tests, it works well. However, I have only run it on my single computer; making it run on a cluster still requires some work.

In addition, users can choose LocalStorage or HdfsStorage according to their preference by changing STORAGE (hdfs/local) in Constants.java. To use HDFS, the user should configure HDFS_URI and HDFS_BASEDIR in Constants.java, for example HDFS_URI = "hdfs://localhost:9000/" and HDFS_BASEDIR = "/user/xxx/medicurator/".
Source code and more information:
https://bitbucket.org/BMI/medicurator
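Choosing a backend through the STORAGE setting is essentially a factory over a common save/load interface. The following is a minimal, JDK-only sketch of that idea. The method signatures, `StorageFactory`, and the error handling are assumptions and not the project's actual code. A real HdfsStorage would implement the same interface using Hadoop's `FileSystem.get(URI, Configuration)`, `FileSystem.create(Path)`, and `FileSystem.open(Path)`, and that part is omitted here so that the sketch compiles without Hadoop.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Save/load abstraction (signatures are assumptions for this sketch). */
interface Storage {
    void save(String name, byte[] data) throws IOException;
    byte[] load(String name) throws IOException;
}

/** Local-disk backend: files live under a base directory. */
class LocalStorage implements Storage {
    private final Path baseDir;

    LocalStorage(Path baseDir) { this.baseDir = baseDir; }

    public void save(String name, byte[] data) throws IOException {
        Path file = baseDir.resolve(name);
        Files.createDirectories(file.getParent()); // create sub-directories as needed
        Files.write(file, data);
    }

    public byte[] load(String name) throws IOException {
        return Files.readAllBytes(baseDir.resolve(name));
    }
}

/** Picks a backend from a STORAGE-style setting: "local" or "hdfs". */
class StorageFactory {
    static Storage create(String storage, String baseDir) {
        switch (storage) {
            case "local":
                return new LocalStorage(Paths.get(baseDir));
            case "hdfs":
                // An HdfsStorage would go here; it needs the Hadoop client library.
                throw new UnsupportedOperationException("HdfsStorage requires the Hadoop client");
            default:
                throw new IllegalArgumentException("Unknown STORAGE: " + storage);
        }
    }
}
```

Keeping every backend behind one interface means the rest of the code never needs to know which storage is configured.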

Wednesday, June 29, 2016

Web Application - MediCurator

I used the Tomcat 7 Maven plugin to implement a website.

This web application lets users work with MediCurator.

It runs at http://localhost:2222/index

It contains

Signup
Login
Logout
a personal page that lists all of the user's replica sets
functions:

add a replica set
add a dataset
delete a dataset
download a dataset

In addition:

All the images are organized hierarchically and displayed level by level, so that users can easily find what they want.
The implementation is relatively robust. A server crash does not affect it: all users' information is retained no matter what happens to the server.

To learn more about this:

https://bitbucket.org/BMI/medicurator

Wednesday, June 8, 2016

The code hierarchy of MediCurator version 1 (before the midterm evaluation)

/medicurator/src/main/java/edu/emory/bmi/medicurator/
.
├── dupdetect ----------------- Near-duplicate detection module
│ │
│ ├── DetectImage.java ----- Detect duplicate image pairs
│ │
│ ├── DetectMetadata.java ----- Detect near-duplicate metadata pairs
│ │
│ ├── DupDetect.java --------- Entry of the detection module
│ │
│ ├── DuplicatePair.java ------ Define the data type of duplicate pair
│ │
│ └── Verify.java ---------- Check if a pair is really near-duplicate
│
│
├── general -------------------- Define the abstract data structures
│ │
│ ├── DataSet.java ------------ A DataSet may contain several Images
│ │ and sub-DataSets, maintained as a tree.
│ │
│ ├── DataSource.java --------- DataSource has a root DataSet
│ │
│ │
│ ├── Metadata.java ----------- Metadata is a collection of key-value
│ │ pairs. Both key and value are Strings.
│ │
│ ├── ReplicaSet.java --------- ReplicaSet contains many DataSets. The
│ │ DataSets might come from different DataSources.
│ │
│ └── User.java --------------- User has username and password as well
│ as several ReplicaSets.
│
├── image ------------------------- Various image types
│ │
│ ├── DicomImage.java --------- Implementation of DICOM image type.
│ │
│ └── Image.java -------------- The abstraction of an image. An image
│ consists of a Metadata and a byte[] of
│ raw image data.
│
│
├── infinispan ------------------- Interaction with Infinispan
│ │
│ ├── ID.java ----------------- Put and get various data with data id
│ │
│ ├── Manager.java ------------ The single global DefaultCacheManager
│ │
│ └── StartInfinispan.java ---- Start an Infinispan node
│
│
├── storage --------------------- Persistent storage
│ │
│ ├── HdfsStorage.java ------ (TODO)store to HDFS
│ │
│ ├── LocalStorage.java ------ Store to local disk
│ │
│ └── Storage.java ---------- Interface of storage, save and load
│
│
└── tcia ------------------------ Implementation of TCIA data source
│
├── TciaAPI.java ----------- Implementation of TCIA RESTful API
│
├── TciaDataSet.java ------- DataSet
│
├── TciaDataSource.java ---- DataSource
│
├── TciaHierarchy.java ----- The five hierarchy levels of a TCIA DataSet
│
└── TciaQuery.java --------- Generate and send requests with HTTPS GET

Wednesday, May 25, 2016

MediCurator Abstract Definition

MediCurator is composed of six main parts (more details on Bitbucket):

User

A user has a username, a password, and replica sets. A user can save many replica sets.

ReplicaSet

A replica set contains many datasets.

It supports operations such as add, get, remove, and download on its datasets.

Datasource

The data source is decided by the system. It currently has two functions: getRootDataSet and retrieveDataSet.

Dataset

Datasets form a tree structure. Each dataset has a parent, children, a metadataID, and the data.

Metadata

tag: String

attributes: key-value pairs <String, String>

Data

Represents the content requested by the user.
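The parts defined above can be sketched as a handful of plain Java types. This is a hedged illustration of the described relationships (a user owns replica sets, replica sets hold datasets, and datasets form a tree with a parent, children, a metadata ID, and data). It is not the project's actual classes. Field types, the `size()` helper, and the `retrieveDataSet` signature are assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

/** A node in the dataset tree. */
class DataSet {
    final UUID id = UUID.randomUUID();
    final DataSet parent;                          // null for a root dataset
    final List<DataSet> children = new ArrayList<>();
    UUID metadataId;                               // points at a Metadata record
    byte[] data;                                   // content requested by the user

    DataSet(DataSet parent) {
        this.parent = parent;
        if (parent != null) {
            parent.children.add(this);             // keep the tree links consistent
        }
    }

    /** Number of datasets in this subtree, including this one. */
    int size() {
        int n = 1;
        for (DataSet child : children) {
            n += child.size();
        }
        return n;
    }
}

/** Metadata: a tag plus String key-value attributes. */
class Metadata {
    String tag;
    final Map<String, String> attributes = new LinkedHashMap<>();
}

/** The two data source functions named above (signatures assumed). */
interface DataSource {
    DataSet getRootDataSet();
    void retrieveDataSet(DataSet dataSet);
}

/** A replica set groups datasets, possibly from different data sources. */
class ReplicaSet {
    final String name;
    final List<DataSet> dataSets = new ArrayList<>();

    ReplicaSet(String name) { this.name = name; }
}

/** A user with a username, a password and many replica sets. */
class User {
    final String username;
    final String password;
    final List<ReplicaSet> replicaSets = new ArrayList<>();

    User(String username, String password) {
        this.username = username;
        this.password = password;
    }
}
```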

Download process

Wednesday, May 18, 2016

TCIA dataset hierarchy

The Cancer Imaging Archive (TCIA) is an open-access database of medical images for cancer research. People can download the entire contents of a collection in bulk or use the RESTful API to search or download within or across collections.

TCIA's data is managed hierarchically:
the whole database contains Collection1, Collection2, Collection3, ...
a Collection contains Patient1, Patient2, Patient3, ...
a Patient contains Study1, Study2, Study3, ...
a Study contains Series1, Series2, Series3, ...
a Series contains image1, image2, image3, ... (DICOM format)

One can use the RESTful API to search for a set of patients or a set of studies in specific collections, or under other filter conditions.
One can also download the images of a Series as a zip file, or download one specific image as a DICOM file.
Besides, TCIA supports other options: for example, one can choose the response format (CSV/HTML/XML/JSON), the modality of the study (CT/MR...), or the body part examined.
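The five-level hierarchy and the query and download options above can be sketched as a small URL builder. The endpoint and parameter names used here (`getPatient`, `getImage`, `getSingleImage`, `Collection`, `SeriesInstanceUID`, `SOPInstanceUID`, `format`) are taken from TCIA's REST API documentation of that period and should be checked against the current documentation. The base URL is left to the caller, authentication such as the API key is not handled, and the class `Tcia` is hypothetical. Java 10+ is assumed for `URLEncoder.encode(String, Charset)`.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical sketch of the TCIA hierarchy and of building query URLs. */
class Tcia {
    enum Level {
        COLLECTION, PATIENT, STUDY, SERIES, IMAGE;

        /** The next level down the hierarchy, or null below IMAGE. */
        Level child() {
            return ordinal() + 1 < values().length ? values()[ordinal() + 1] : null;
        }
    }

    private final String base; // e.g. the TCIA query base URL (caller-supplied)

    Tcia(String base) {
        this.base = base.endsWith("/") ? base : base + "/";
    }

    /** Builds base + endpoint + "?k=v&..." with form-encoded values. */
    String query(String endpoint, Map<String, String> params) {
        StringJoiner q = new StringJoiner("&");
        params.forEach((k, v) -> q.add(k + "=" + URLEncoder.encode(v, StandardCharsets.UTF_8)));
        return base + endpoint + (params.isEmpty() ? "" : "?" + q);
    }

    /** Patients of one collection, in the chosen response format (csv/html/xml/json). */
    String patients(String collection, String format) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("Collection", collection);
        p.put("format", format);
        return query("getPatient", p);
    }

    /** All images of one series, returned as a zip file. */
    String seriesZip(String seriesUid) {
        return query("getImage", Map.of("SeriesInstanceUID", seriesUid));
    }

    /** One image, returned as a DICOM file. */
    String singleImage(String seriesUid, String sopUid) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("SeriesInstanceUID", seriesUid);
        p.put("SOPInstanceUID", sopUid);
        return query("getSingleImage", p);
    }
}
```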

An image has a globally unique ID: SOPInstanceUID.
The image set of a Series has a globally unique ID: SeriesInstanceUID.
A Study also has a globally unique ID: StudyInstanceUID.

To avoid re-downloading the same image, one can keep track of the SOPInstanceUIDs and SeriesInstanceUIDs that have already been downloaded, because the RESTful API only downloads images by these two UIDs.
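The tracking idea above fits into two sets of UIDs. Here is a minimal sketch, assuming that a whole-series (zip) download covers every image in that series. The class `DownloadTracker` and its methods are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical tracker of already-downloaded SeriesInstanceUIDs and SOPInstanceUIDs. */
class DownloadTracker {
    private final Set<String> series = new HashSet<>();
    private final Set<String> images = new HashSet<>();

    /** Records a whole-series (zip) download. */
    void markSeries(String seriesUid) { series.add(seriesUid); }

    /** Records a single-image download. */
    void markImage(String sopUid) { images.add(sopUid); }

    boolean needsSeries(String seriesUid) { return !series.contains(seriesUid); }

    /** An image is needed unless it, or its whole series, was downloaded already. */
    boolean needsImage(String seriesUid, String sopUid) {
        return !series.contains(seriesUid) && !images.contains(sopUid);
    }
}
```

If a series can gain new images on the server after it was downloaded, tracking at the image level (SOPInstanceUID) is the safer choice, because a series-level mark would hide the new images.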

Wednesday, May 11, 2016

Infinispan and Spark

Today I found something amazing: Infinispan and Spark can work together! This may be very useful for my project. https://github.com/infinispan/infinispan-spark

Infinispan is a distributed in-memory key/value data storage system which should be much faster than HDFS. It can be used in distributed computing task as fast temporary storage and cache. http://infinispan.org/

Spark is a MapReduce-style distributed computing framework. Visit http://spark.apache.org/ for more information.

My Thoughts on MEDIator: A Data Sharing Synchronization Platform for Heterogeneous Medical Image Archives

"MEDIator: A Data Sharing Synchronization Platform for Heterogeneous Medical Image Archives" is a paper written by my mentor. Its content is related to the project I am going to do.

Here, I list some things I learned from this paper.

While sharing data is encouraged in science, algorithms and architectures should be designed to mash up and share medical data efficiently. Hence, in addition to the regular requirements of data access integration platforms, a data sharing synchronization system should be secure and should minimize data duplication in client instances. A data sharing synchronization platform should let data consumers view subsets of data that satisfy user-defined search criteria, and share them with others using pointers to the actual data.

This paper presents MEDIator, a data sharing and synchronization middleware platform for heterogeneous medical image archives. MEDIator allows sharing pointers to medical data efficiently, while letting the consumers manipulate the pointers without modifying the raw medical data. MEDIator has been implemented for multiple data sources, including Amazon S3, The Cancer Imaging Archive (TCIA), caMicroscope, and metadata from CSV files for cancer images.

Also, an in-memory data grid can be an alternative to traditional storage for the replica sets, as it provides faster storage, access, and execution. The paper uses Infinispan as this platform, and I plan to use Infinispan in my project as well.

MEDIator lets the users create, update, retrieve, and delete replica sets, and share the replica sets with others.

Higher-Level Use Case View

MEDIator APIs: InterfaceAPI, PubConsAPI, Integrator
(details omitted)

Integration with Medical Data Sources

Clinical data is deployed in multiple data sources such as TCIA, caMicroscope, and Amazon S3. Figure 3 of the paper depicts the deployment of the system with multiple medical data sources. This part can help us access different sources of data.

"MEDIator is multi-tenanted where multiple users co-exist without the knowledge of existence of the other users, sharing the same cache space. Involving a time stamp for the class extending PubConsAPI, downloaded items can be tracked, and the diffs can be produced for the user download. Thus a download can be paused and resumed later, downloading the images that have not been downloaded yet."

I am not clear about this paragraph; to be discussed.

In conclusion, first, I think I can apply the paper's representation of medical image sources to my project; this part can help me represent the source data. Second, I can build on MEDIator to access the data and then do the near-duplicate detection work based on it.

Thursday, May 5, 2016

Hello, GSoC 2016!

I'm really happy to have been selected for Google Summer of Code 2016. Thanks to my mentors Pradeeban and Ashish: your patience and help gave me confidence when I wrote my proposal. I look forward to working together.

The name of my GSoC project is Near Duplicate Detection in Medical Image Archives. My proposal is here.

I've set up my code repository at https://bitbucket.org/BMI/medicurator.

Good luck and have fun!

Happy Birthday, Peking University!

Yesterday, May 4th, was Peking University's 118th birthday. Congratulations! I'm proud of you forever.
