Large sequencing centers today produce genomic data at a rate of 10 terabytes a day, and transforming these massive amounts of noisy data into biological information requires complicated processing. To address these needs, we are developing GESALL (GEnomic Scalable Analysis with Low Latency), a system for end-to-end processing of genomic data. We aim to improve overall system performance by applying a variety of ideas from the database systems research community.

  • Building and Benchmarking a Parallel Deep Analysis Pipeline [Technical Report]

  • Yanlei Diao, Abhishek Roy, Toby Bloom. Building Highly-Optimized, Low-Latency Pipelines for Genomic Data Analysis. In 7th Biennial Conference on Innovative Data Systems Research (CIDR 2015). [pdf]

  • Abhishek Roy, Yanlei Diao, Evan Mauceli, Yiping Shen, Bai-Lin Wu. Massive Genomic Data Processing and Deep Analysis. In Proceedings of the VLDB Endowment 5(12) (VLDB 2012). [pdf]

A link to the code will be posted here.

GESALL is being developed by Yanlei Diao, Abhishek Roy, and Prashant Shenoy at the University of Massachusetts Amherst. Our collaborators include Evan Mauceli, Dr. Yiping Shen, and Dr. Bai-Lin Wu at Children's Hospital Boston, and Dr. Toby Bloom at the New York Genome Center.

We gratefully acknowledge the funding provided by the following agencies:

  • National Science Foundation

  • Massachusetts Green High Performance Computing Center

  • UMass Science & Technology Program