Reproducibility in Data Science

Juliana Freire | Professor of Computer Science and Engineering and Data Science | NYU

The abundance of data, coupled with cheap and widely available computing and storage, has revolutionized science, industry, and government alike. Now, to a large extent, the bottleneck to extracting actionable insights lies with people. Complex computational pipelines are required to ingest, clean, analyze, visualize, and create models from data. But the process of assembling these pipelines is inherently iterative and time-consuming.

In addition, after a series of steps, there are many ways in which the computations, the data, and the analyst could have gone wrong. Thus, when results are derived, an important question is whether you can trust them. In this talk, I will discuss the importance of computational provenance for data science and how it enables reproducibility and transparency and helps build trust in results obtained from data-driven exploration. I will also present techniques and tools that support automatic provenance capture and simplify the reproducibility of computations.

Juliana Freire
Professor of Computer Science and Engineering and Data Science | NYU

Juliana Freire is a Professor of Computer Science and Data Science at New York University. She is the elected chair of the ACM Special Interest Group on Management of Data (SIGMOD) and a council member of the Computing Research Association’s Computing Community Consortium (CCC). She was the lead investigator and executive director of the NYU Moore-Sloan Data Science Environment.

Her research interests are in large-scale data analysis, curation and integration, visualization, provenance management, and web information discovery. She has made fundamental contributions to data management methods and tools that address problems introduced by emerging applications including urban analytics and computational reproducibility. Freire has published over 180 technical papers and several open-source systems, and is an inventor of 12 U.S. patents.

She has co-authored six award-winning papers, including one that received the ACM SIGMOD Most Reproducible Paper Award. She is an ACM Fellow and a recipient of an NSF CAREER award, two IBM Faculty awards, and a Google Faculty Research award. Her research has been funded by the National Science Foundation, DARPA, the Department of Energy, the National Institutes of Health, the Sloan Foundation, the Gordon and Betty Moore Foundation, the W. M. Keck Foundation, Google, Amazon, AT&T Research, Microsoft Research, Yahoo! and IBM. She received M.Sc. and Ph.D. degrees in computer science from the State University of New York at Stony Brook.