Single-cell sequencing generates a new kind of genomic data, and with it new storage and compute challenges. I'll talk about recent work parallelizing the analysis of this data using a variety of distributed backends (Apache Spark, Dask, PyWren, Apache Beam). I'll also discuss Zarr, a format for storing and working with N-dimensional arrays that several scientific domains have recently gravitated toward in response to the challenges of using HDF5 in parallel and in the cloud.
