Technical Talks

GenAI and Datacomp: Creating the Largest Public Multimodal Dataset in Academia

Alex Dimakis | Professor | The University of Texas at Austin

This talk examines the role that universities and the open source community play in the Generative AI (GenAI) ecosystem, focusing on the creation, curation, and evaluation of large-scale datasets. Drawing on our NeurIPS'23 Datacomp paper, it outlines how the largest public multimodal dataset in academia to date was created.

We will explore four pivotal trends that are shaping the future of AI data management:

  1. Automated Data Curation: As datasets expand, the task of data cleaning, traditionally performed manually by junior researchers, is evolving. We discuss how AI models are now being designed and trained to automate data curation, turning what was once a mundane task into an opportunity for intellectual and methodological innovation.

  2. Data-Centric AI: Traditional AI research iterates on models while the dataset stays fixed. Data-centric AI flips this approach: the model is held fixed, and researchers iterate on curating a dataset from a flexible pool to optimize performance. This emerging trend, highlighted by our work with Datacomp, emphasizes the importance of dataset quality over model architecture.

  3. Legal and Privacy Challenges: We address the increasing difficulties posed by legal and privacy issues in data sharing within the industry, which hinder the ability to fine-tune, specialize, or distill AI models effectively.

  4. Synthetic Dataset Curation: The talk will also cover the role of synthetic datasets and the augmentation of real datasets with synthetic elements, such as enhanced image captions, making this an area ripe for academic exploration and innovation.
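The data-centric loop described in the second trend can be sketched in a few lines of Python. This is only an illustrative toy, not the Datacomp code: the sample pool, the quality scores, the filter names, and the `train_and_evaluate` stand-in are all hypothetical. In practice the score would come from a learned model, and evaluation would mean training the fixed model on each curated subset and measuring downstream accuracy.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical sample: (caption, precomputed quality score in [0, 1]).
Sample = Tuple[str, float]

# Toy candidate pool (made-up data, for illustration only).
POOL: List[Sample] = [
    ("a dog running on a beach", 0.91),
    ("IMG_0042.jpg", 0.12),
    ("click here to buy now", 0.20),
    ("two cats sleeping on a sofa", 0.85),
    ("photo", 0.35),
    ("a red bicycle leaning against a wall", 0.78),
]


def train_and_evaluate(subset: List[Sample]) -> float:
    """Stand-in for the FIXED training recipe plus evaluation.

    Assumption: a toy proxy that rewards high average quality and
    penalizes subsets with fewer than 3 samples. A real benchmark
    would train a model here and report its accuracy.
    """
    if not subset:
        return 0.0
    mean_quality = sum(score for _, score in subset) / len(subset)
    size_factor = min(1.0, len(subset) / 3)
    return mean_quality * size_factor


# Candidate curation strategies: the only thing that varies.
FILTERS: Dict[str, Callable[[Sample], bool]] = {
    "no_filter": lambda s: True,
    "words>=3": lambda s: len(s[0].split()) >= 3,
    "score>=0.5": lambda s: s[1] >= 0.5,
    "score>=0.9": lambda s: s[1] >= 0.9,
}

# Iterate on the dataset while the model and recipe stay fixed.
results = {
    name: train_and_evaluate([s for s in POOL if keep(s)])
    for name, keep in FILTERS.items()
}
best = max(results, key=results.get)

for name, value in results.items():
    print(f"{name:12s} {value:.3f}")
print("best filter:", best)
```

In this toy setup the strictest filter is not the best one, because it leaves too little data. That quality versus quantity trade-off is the kind of question a fixed-model, variable-dataset comparison is meant to expose.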

Join us to understand how these trends are not only addressing current challenges but are also steering the direction of future AI research and application, ensuring academia remains at the forefront of technological advancement in GenAI.

Alex Dimakis
Professor | The University of Texas at Austin

Alex Dimakis is a Professor in the Department of Electrical and Computer Engineering at the University of Texas at Austin. He received his Ph.D. from UC Berkeley and his Diploma degree from the National Technical University of Athens. He has received several awards, including the James Massey Award, an NSF CAREER award, a Google research award, the Eli Jury dissertation award, and the joint Information Theory and Communications Society Best Paper Award. His research interests include information theory, coding theory, and machine learning. He is currently the Director of the Center for Generative AI at the University of Texas and Co-director of the National AI Institute for Foundations of Machine Learning.