Today, large sequencing centers produce genomic data at a rate of 10 terabytes a day, and complicated processing is required to transform these massive amounts of noisy data into biological information. To address these needs, we are developing GESALL (GEnomic Scalable Analysis with Low Latency), a system for end-to-end processing of genomic data. We aim to improve overall system performance by applying a variety of ideas from the database systems research community.
Building and Benchmarking a Parallel Deep Analysis Pipeline [Technical Report]
Yanlei Diao, Abhishek Roy, Toby Bloom. Building Highly-Optimized, Low-Latency Pipelines for Genomic Data Analysis. In 7th Biennial Conference on Innovative Data Systems Research (CIDR 2015). [pdf]
Abhishek Roy, Yanlei Diao, Evan Mauceli, Yiping Shen, Bai-Lin Wu. Massive Genomic Data Processing and Deep Analysis. In Proceedings of the VLDB Endowment 5(12) (VLDB 2012). [pdf]
A link to the code will be posted here.
We gratefully acknowledge the funding provided by the following agencies: