Friday, August 12, 2016
Read the Docs
Over the past few days, I finished the documentation for MediCurator.
The link is below:
http://medicurator.readthedocs.io/en/latest/
Wednesday, August 10, 2016
Documentation and Bug Fixes
This week, I wrote the documentation and fixed some bugs.
Details:
Download only what is new
https://bitbucket.org/BMI/medicurator/issues/6/download-only-what-is-new-when-attempting
The documentation:
https://bitbucket.org/BMI/medicurator/issues/5/medicurator-readthedocs-documentation
Avoid hard-coding the API key
https://bitbucket.org/BMI/medicurator/issues/7/avoid-the-need-to-hard-code-the-api-key
Sunday, July 31, 2016
Research on MediCurator Supporting DICOMweb
I have done some research on DICOMweb, and it can be applied to MediCurator. DICOMweb has three levels: study, series, and instance. I can query it level by level and map that hierarchy into MediCurator, as with TCIA. The user only needs to add the URL of the server for it to work, and I can use the retrieve function to download the images, so MediCurator can be implemented on top of it. However, the only server I found with DICOMweb implemented is http://www.dicomserver.co.uk/DICOMRS.html, and its retrieve function is not finished, so I could not experiment with retrieve. In conclusion, MediCurator can use DICOMweb without too much modification.
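To make the level-by-level querying concrete, here is a minimal sketch of how the request URLs for the three levels could be built. The path layout follows the DICOMweb study/series/instance resource hierarchy; the class name `DicomWebUrls`, the base URL, and the UIDs are made up for illustration and are not part of MediCurator.

```java
public class DicomWebUrls {
    private final String baseUrl;

    public DicomWebUrls(String baseUrl) {
        // Drop a trailing slash so joined paths never contain "//".
        this.baseUrl = baseUrl.endsWith("/")
                ? baseUrl.substring(0, baseUrl.length() - 1)
                : baseUrl;
    }

    // Level 1: query all studies.
    public String studies() {
        return baseUrl + "/studies";
    }

    // Level 2: query the series of one study.
    public String series(String studyUid) {
        return studies() + "/" + studyUid + "/series";
    }

    // Level 3: query the instances of one series.
    public String instances(String studyUid, String seriesUid) {
        return series(studyUid) + "/" + seriesUid + "/instances";
    }

    // Retrieve: address of a single instance (image).
    public String retrieveInstance(String studyUid, String seriesUid, String sopUid) {
        return instances(studyUid, seriesUid) + "/" + sopUid;
    }

    public static void main(String[] args) {
        // Hypothetical server address and UIDs.
        DicomWebUrls urls = new DicomWebUrls("http://example.org/dicomweb/");
        System.out.println(urls.studies());
        System.out.println(urls.series("1.2.3"));
        System.out.println(urls.instances("1.2.3", "4.5.6"));
        System.out.println(urls.retrieveInstance("1.2.3", "4.5.6", "7.8.9"));
    }
}
```

Only the server's base URL is configurable here, which matches the idea that the user just adds the URL of the server.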
Tuesday, July 26, 2016
MediCurator Refactoring and Near-Duplicate Detection
This week, I refactored MediCurator into three modules:
- medicurator-core for the core MediCurator code.
- medicurator-server for the API.
- medicurator-client for the web app.
This way, I can avoid dependency conflicts.
I also added the near-duplicate detection function to the web app and the API:
http://localhost:4567/duplicateSets?replicasetID1=***&replicasetID2=***
This completes the planned set of functions.
What's more, I wrote some scripts that make it easier for users to run my project.
Building
./compile.sh
Run webapp
./run_servlet.sh
Run Restful API
./run_api.sh
Wednesday, July 20, 2016
RESTful API and Delete-Download Workflow
First, this week I implemented the RESTful API.
As shown in the README, it includes the following endpoints.
API:
http://localhost:4567/signup?username=***&password=***
http://localhost:4567/login?username=***&password=***
http://localhost:4567/getReplicaSets?userid=***
http://localhost:4567/createReplicaSets?userid=***&replicaName=***
http://localhost:4567/getDataSets?replicasetID=***
http://localhost:4567/addDataSet?replicasetID=***&datasetID=***
http://localhost:4567/removeDataSet?replicasetID=***&datasetID=***
http://localhost:4567/getRootDataSets
http://localhost:4567/getSubsets?datasetID=***
http://localhost:4567/downloadDataSets?datasetID=***
http://localhost:4567/downloadOneDataSets?datasetID=***
http://localhost:4567/deleteDataSets?datasetID=***
http://localhost:4567/deleteOneDataSet?datasetID=***
There is also
http://localhost:4567/duplicateSets?replicasetID1=***&replicasetID2=***
which is still to be done.
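As a small illustration of how a client might assemble these request URLs, here is a sketch that builds an endpoint URL with URL-encoded query parameters. The base address and the endpoint and parameter names come from the list above; the class name `MediCuratorApiUrls` and the sample values are made up, and the sketch only builds the URL without sending a request.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class MediCuratorApiUrls {
    // Base address taken from the endpoint list above.
    static final String BASE = "http://localhost:4567";

    // Builds BASE/endpoint?k1=v1&k2=v2 with URL-encoded values.
    static String build(String endpoint, Map<String, String> params) {
        StringBuilder url = new StringBuilder(BASE).append('/').append(endpoint);
        char separator = '?';
        for (Map.Entry<String, String> p : params.entrySet()) {
            url.append(separator)
               .append(p.getKey())
               .append('=')
               .append(encode(p.getValue()));
            separator = '&';
        }
        return url.toString();
    }

    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the parameters in insertion order.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("userid", "42");           // hypothetical user ID
        params.put("replicaName", "my set");  // hypothetical name containing a space
        System.out.println(build("createReplicaSets", params));
    }
}
```

Encoding the values matters as soon as a replica set name contains spaces or characters such as `&`.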
The second thing I did was fix the delete-download workflow. MediCurator now supports the case where the user downloads a dataset, deletes it, and downloads it again.
I implemented this by separating the meanings of "remove" and "delete". "Remove" means taking the dataset out of the replicaset, while "delete" means deleting the data directly, after which it can be downloaded again.
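The difference between the two operations can be sketched with two sets: one for membership in the replicaset and one for what currently has a downloaded copy. This is only an illustrative model, not the MediCurator code; the class and method names are made up, and having "remove" also drop the downloaded copy is an assumption of the sketch.

```java
import java.util.HashSet;
import java.util.Set;

public class ReplicaSetSketch {
    // IDs of the datasets that belong to this replicaset.
    private final Set<String> members = new HashSet<>();
    // IDs of the datasets that currently have a downloaded copy.
    private final Set<String> downloaded = new HashSet<>();

    public void add(String datasetId) {
        members.add(datasetId);
    }

    // Returns true only if something was actually fetched.
    public boolean download(String datasetId) {
        if (!members.contains(datasetId)) {
            return false; // not part of the replicaset
        }
        return downloaded.add(datasetId); // false if it is already downloaded
    }

    // "Delete": drop the downloaded copy, keep the membership.
    public void delete(String datasetId) {
        downloaded.remove(datasetId);
    }

    // "Remove": take the dataset out of the replicaset.
    // Assumption of this sketch: the downloaded copy goes away as well.
    public void remove(String datasetId) {
        members.remove(datasetId);
        downloaded.remove(datasetId);
    }

    public boolean contains(String datasetId) {
        return members.contains(datasetId);
    }

    public boolean isDownloaded(String datasetId) {
        return downloaded.contains(datasetId);
    }
}
```

With this split, download, delete, download works because delete never touches the membership set.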
Wednesday, July 13, 2016
Implement Local File Source
This week, I implemented a local file source for MediCurator.
My tests show that it works well. The local test files are under the path: medicurator/target/classes/image
They can be downloaded through the web application written last week.
Downloaded files are now stored at medicurator/target/classes/local.test
The path can be changed in Constant.java
Next, I will finish the complete download-tracking workflow to solve the delete problem. I think I should implement a delete hook that is invoked both when the user deletes something through the website and when a delete message is sent from the duplicate detection.
Wednesday, July 6, 2016
Applying HDFS to MediCurator
This week, I applied HDFS to MediCurator.
The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware. HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. It provides high-throughput access to application data and is suitable for applications that have large data sets.
To make it easier for MediCurator to handle high throughput, I decided to use HDFS. I built on the Storage abstraction and adapted the existing LocalStorage into an HdfsStorage. To do this, I mainly used the Hadoop API, documented at https://hadoop.apache.org/docs/r2.6.1/api/overview-summary.html.
My tests show that it works well. So far I have only run it on a single computer; making it run on a cluster still requires some work.
In addition, users can choose localStorage or hdfsStorage according to their preference by changing STORAGE (hdfs/local) in Constants.java. To use HDFS, the user should configure HDFS_URI and HDFS_BASEDIR in Constants.java. For example, HDFS_URI = "hdfs://localhost:9000/" and HDFS_BASEDIR = "/user/xxx/medicurator/"
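The idea of a switchable storage backend can be sketched as follows. This is a simplified illustration, not the actual MediCurator classes: the method signatures, the `create` factory, and the `hdfsPath` helper are assumptions. The HDFS branch is left unimplemented because it needs the Hadoop client library, which this self-contained sketch does not pull in.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class StorageSketch {
    // Hypothetical storage interface: save and load raw bytes by name.
    interface Storage {
        void save(String name, byte[] data);
        byte[] load(String name);
    }

    // Stores every item as a file under a base directory on local disk.
    static class LocalStorage implements Storage {
        private final Path baseDir;

        LocalStorage(Path baseDir) {
            this.baseDir = baseDir;
        }

        @Override
        public void save(String name, byte[] data) {
            try {
                Files.createDirectories(baseDir);
                Files.write(baseDir.resolve(name), data);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        @Override
        public byte[] load(String name) {
            try {
                return Files.readAllBytes(baseDir.resolve(name));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    // Picks a backend from a STORAGE-style setting ("local" or "hdfs").
    static Storage create(String storage, Path localDir) {
        if ("local".equals(storage)) {
            return new LocalStorage(localDir);
        }
        if ("hdfs".equals(storage)) {
            // A real HDFS backend needs the Hadoop client; omitted in this sketch.
            throw new UnsupportedOperationException("HDFS backend not included in this sketch");
        }
        throw new IllegalArgumentException("Unknown STORAGE value: " + storage);
    }

    // Shows how HDFS_URI and HDFS_BASEDIR combine into the full location of a file.
    static String hdfsPath(String hdfsUri, String baseDir, String name) {
        return URI.create(hdfsUri).resolve(baseDir).resolve(name).toString();
    }

    public static void main(String[] args) {
        Path dir = Paths.get(System.getProperty("java.io.tmpdir"), "medicurator-sketch");
        Storage storage = create("local", dir);
        storage.save("hello.bin", new byte[] {1, 2, 3});
        System.out.println(storage.load("hello.bin").length);
        // "image1.dcm" is a made-up file name.
        System.out.println(hdfsPath("hdfs://localhost:9000/", "/user/xxx/medicurator/", "image1.dcm"));
    }
}
```

Because callers only see the interface, switching between the two backends comes down to changing one setting.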
Source code and more information:
https://bitbucket.org/BMI/medicurator
Wednesday, June 29, 2016
Web Application - MediCurator
I used the Maven Tomcat7 plugin to implement a website.
This web application lets users work with MediCurator.
It runs at http://localhost:2222/index
It contains:
- Signup
- Login
- Logout
- Personal page, which lists all the replicasets
- Functions:
  - add a replicaset
  - add a dataset
  - delete a dataset
  - download a dataset
In addition:
- All the images are organized hierarchically and shown level by level, to help users easily reach what they want.
- The implementation is relatively robust. It is not affected by a server crash, which means it keeps all the users' information no matter what happens to the server.
To learn more about this:
https://bitbucket.org/BMI/medicurator
Wednesday, June 8, 2016
The code hierarchy of MediCurator version 1 (before the midterm evaluation)
/medicurator/src/main/java/edu/emory/bmi/medicurator/
.
├── dupdetect ------------------ Near-duplicate detection module
│   ├── DetectImage.java ------- Detect duplicate image pairs
│   ├── DetectMetadata.java ---- Detect near-duplicate metadata pairs
│   ├── DupDetect.java --------- Entry point of the detection module
│   ├── DuplicatePair.java ----- Data type of a duplicate pair
│   └── Verify.java ------------ Check whether a pair is really near-duplicate
│
├── general -------------------- Abstract data structures
│   ├── DataSet.java ----------- A DataSet may contain several Images
│   │                            and sub-DataSets. Maintained as a tree.
│   ├── DataSource.java -------- A DataSource has a root DataSet
│   ├── Metadata.java ---------- Metadata is a collection of key-value
│   │                            pairs. Both key and value are Strings.
│   ├── ReplicaSet.java -------- A ReplicaSet contains many DataSets. The
│   │                            DataSets may come from different DataSources.
│   └── User.java -------------- A User has a username and password as well
│                                as several ReplicaSets.
│
├── image ---------------------- Various image types
│   ├── DicomImage.java -------- Implementation of the DICOM image type
│   └── Image.java ------------- The abstraction of an image: an image
│                                consists of a Metadata and a byte[] of
│                                raw image data.
│
├── infinispan ----------------- Interaction with Infinispan
│   ├── ID.java ---------------- Put and get various data by data ID
│   ├── Manager.java ----------- The single global DefaultCacheManager
│   └── StartInfinispan.java --- Just start an Infinispan node
│
├── storage -------------------- Persistent storage
│   ├── HdfsStorage.java ------- (TODO) Store to HDFS
│   ├── LocalStorage.java ------ Store to local disk
│   └── Storage.java ----------- Storage interface: save and load
│
└── tcia ----------------------- Implementation of the TCIA data source
    ├── TciaAPI.java ----------- Implementation of the TCIA RESTful API
    ├── TciaDataSet.java ------- DataSet
    ├── TciaDataSource.java ---- DataSource
    ├── TciaHierarchy.java ----- The five levels of the TCIA DataSet hierarchy
    └── TciaQuery.java --------- Generate and send requests with HTTPS GET
Wednesday, May 25, 2016
MediCurator Abstract Definition
MediCurator is composed of six main parts (more details on Bitbucket):
User
A user has a username, a password, and replicasets. A user can save many replicasets.
ReplicaSet
A replicaset contains many datasets.
It supports operations such as add, get, remove, and download on the datasets.
DataSource
The data source is determined by the system. It currently has two functions: getRootDataSet and retrieveDataSet.
DataSet
A dataset is a tree structure. Each dataset has a parent, children, a metadata ID, and the data.
Metadata
tag: string
attribute: key-value <string, string>
Data
Represents the content requested by the user.
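The tree structure of a dataset can be sketched like this. It is a simplified model for illustration only: the field names, the `addChild`, `depth`, and `leafIds` helpers, and the sample IDs are assumptions, and the real classes in the repository may look different.

```java
import java.util.ArrayList;
import java.util.List;

public class DataSetSketch {
    // One node of the dataset tree.
    static class DataSet {
        final String id;
        final String metadataId; // points to this dataset's Metadata
        DataSet parent;          // null for the root dataset
        final List<DataSet> children = new ArrayList<>();

        DataSet(String id, String metadataId) {
            this.id = id;
            this.metadataId = metadataId;
        }

        // Attaches a child and returns it, so calls can be chained.
        DataSet addChild(DataSet child) {
            child.parent = this;
            children.add(child);
            return child;
        }

        // Number of ancestors between this dataset and the root.
        int depth() {
            int d = 0;
            for (DataSet p = parent; p != null; p = p.parent) {
                d++;
            }
            return d;
        }

        // IDs of all leaves below this dataset (or its own ID if it is a leaf).
        List<String> leafIds() {
            List<String> out = new ArrayList<>();
            collectLeaves(this, out);
            return out;
        }

        private static void collectLeaves(DataSet node, List<String> out) {
            if (node.children.isEmpty()) {
                out.add(node.id);
                return;
            }
            for (DataSet child : node.children) {
                collectLeaves(child, out);
            }
        }
    }

    public static void main(String[] args) {
        DataSet root = new DataSet("root", "m0");
        DataSet collection = root.addChild(new DataSet("collection1", "m1"));
        collection.addChild(new DataSet("series1", "m2"));
        collection.addChild(new DataSet("series2", "m3"));
        System.out.println(root.leafIds());
    }
}
```

Downloading a dataset then amounts to walking the subtree and fetching the data of its leaves.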
Download process
Wednesday, May 18, 2016
TCIA dataset hierarchy
The Cancer Imaging Archive (TCIA) is an open-access database of medical images for cancer research. People can download the entire contents of a collection in bulk or use the RESTful API to search or download within or across collections.
TCIA's data is organized hierarchically:
the Whole database
Collection1 Collection2 Collection3 .......
in a Collection
Patient1 Patient2 Patient3 .......
in a Patient
Study1 Study2 Study3 ......
in a Study
Series1 Series2 Series3 ......
in a Series
image1 image2 image3 ...... (DICOM format)
One can use the RESTful API to search for the set of patients or studies in specific collections, or with other filter conditions.
One can also download the images of a Series as a zip file, or download one specific image as a DICOM file.
Besides, TCIA supports some other options: one can choose the response format (CSV/HTML/XML/JSON), the modality of a Study (CT/MR...), or the body part examined.
An image has a globally unique ID: SOPInstanceUID
The image set of a Series has a globally unique ID: SeriesInstanceUID
A Study also has a globally unique ID: StudyInstanceUID
To avoid redownloading the same image, one can keep track of the SOPInstanceUIDs and SeriesInstanceUIDs that have been downloaded before, because images can only be downloaded by those two UIDs via the RESTful API.
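The tracking idea can be sketched with a set of already-downloaded UIDs. This is an in-memory illustration only; the class name `DownloadTracker`, its methods, and the short sample UIDs are made up, and a real implementation would have to persist the set.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DownloadTracker {
    // SOPInstanceUIDs of the images that have already been downloaded.
    private final Set<String> downloadedSopUids = new HashSet<>();

    // Records that an image has been downloaded.
    public void markDownloaded(String sopInstanceUid) {
        downloadedSopUids.add(sopInstanceUid);
    }

    // Returns the UIDs from the request that still need to be downloaded,
    // keeping the order of the request.
    public List<String> pending(List<String> requestedSopUids) {
        List<String> result = new ArrayList<>();
        for (String uid : requestedSopUids) {
            if (!downloadedSopUids.contains(uid)) {
                result.add(uid);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        DownloadTracker tracker = new DownloadTracker();
        tracker.markDownloaded("1.1"); // hypothetical, shortened UIDs
        System.out.println(tracker.pending(Arrays.asList("1.1", "1.2", "1.3")));
    }
}
```

The same pattern works one level up with SeriesInstanceUIDs, so a whole series that was fetched before can be skipped.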
Wednesday, May 11, 2016
Infinispan and Spark
Today I found something amazing: Infinispan and Spark can work together! This may be very useful for my project. https://github.com/infinispan/infinispan-spark
Infinispan is a distributed in-memory key/value data store, which should be much faster than HDFS. It can be used in distributed computing tasks as fast temporary storage and cache. http://infinispan.org/
Spark is a MapReduce-style distributed computing framework. Visit http://spark.apache.org/ for more information.
My Thoughts on MEDIator: A Data Sharing Synchronization Platform for Heterogeneous Medical Image Archives
MEDIator: A Data Sharing Synchronization Platform for Heterogeneous Medical Image Archives
This is a paper written by my mentor. Its content is related to the project I am going to do.
Here, I list some things I learned from this paper.
While sharing data is encouraged in science, algorithms and architectures should be designed for mashing up and sharing the medical data efficiently. Hence, a data sharing synchronization system should be secured and minimize data duplication in client instances, in addition to the regular requirements of the data access integration platforms. A data sharing synchronization platform should let data consumers to view subsets of data that satisfy user-defined search criteria, and share them with others using pointers to the actual data.
This paper presents MEDIator, a data sharing and synchronization middleware platform for heterogeneous medical image archives. MEDIator allows sharing pointers to medical data efficiently, while letting the consumers manipulate the pointers without modifying the raw medical data. MEDIator has been implemented for multiple data sources, including Amazon S3, The Cancer Imaging Archive (TCIA), caMicroscope, and metadata from CSV files for cancer images.
Also, an in-memory data grid can be an alternative to traditional storage for the replica sets, as it provides faster storage, access, and execution. The paper uses Infinispan as this platform, and I plan to use Infinispan in my project as well.
MEDIator lets the users create, update, retrieve, and delete replica sets, and share the replica sets with others.
Higher Level Use Case View
MEDIator APIs: InterfaceAPI, PubConsAPI, Integrator
(details ignored)
Integration with Medical Data Sources
Clinical data is deployed in multiple data sources such as TCIA, caMicroscope, and Amazon S3. Figure 3 depicts the deployment of the system with multiple medical data sources. This part can help us access different sources of data.
"MEDIator is multi-tenanted where multiple users co-exist without the knowledge of existence of the other users, sharing the same cache space. Involving a time stamp for the class extending PubConsAPI, downloaded items can be tracked, and the diffs can be produced for the user download. Thus a download can be paused and resumed later, downloading the images that have not been downloaded yet."
I am not clear about this paragraph; to be discussed.
To conclude: first, I think I can apply the Representation of Medical Image Sources part to my project, since it can help me represent the source data. Second, I can build on MEDIator to access data and then do the near-duplicate detection work on top of that.
Thursday, May 5, 2016
Hello, GSoC 2016!
I'm really happy to have been selected for Google Summer of Code 2016. Thanks to my mentors Pradeeban and Ashish; your patience and help gave me confidence while I wrote my proposal. I look forward to working together.
The name of my GSoC project is Near Duplicate Detection in Medical Image Archives. My proposal is here.
I've set up my code repository at https://bitbucket.org/BMI/medicurator.
Good luck and have fun!
Happy Birthday, Peking University!
Yesterday, May 4th, was Peking University's 118th birthday. Congratulations! I'm proud of you forever.