Data Council Talks

Creating an Extensible Big Data Platform
Reza Shiftehfar | Uber
DBT: Powerful, Open Source Data Transformations
Connor McArthur | Fishtown Analytics / DBT
Architecting the Data Lake: How to Ensure that Your ETL Pipelines Deliver High Quality Data
Paul Lappas | Intermix
OASIS – Data Analysis Platform for Enterprise
Keiji Yoshida | LINE Corporation
Building Highly Reliable Data Pipelines at Datadog
Quentin Francois | Datadog
Building a Feature Platform to Scale Machine Learning
Willem Pienaar | GO-JEK
Creating a Data Engineering Culture
Jesse Anderson | Big Data Institute
Big Data Platform-as-a-Service for Cross-Media Monitoring
Liana Napalkova | Eurecat Technology Center
Apache Arrow: A Cross-language Development Platform for In-memory Data
Wes McKinney | Ursa Labs
Computer Vision in Space
Marco Bressan | Satellogic
Easy Access to Data with Presto
Iker Martinez de Apellaniz | Schibsted Classified Media
Event-Driven Data Architecture at Letgo
Ricardo Fanjul | Letgo
Automation of Machine Learning Workflows at BBVA
Jose Serrano | BBVA Data & Analytics
Flink SQL in Action
Timo Walther | Ververica
Data Processing with Apache Beam: Towards Portability and Beyond
Maximilian Michels | Google
Using R in a Mid-Sized Data Analysis Scenario
Richard Brosi | Yara Digital Labs
Driving in Dataland
Carlos Herrera | Cabify
Marquez: A Metadata Service for Data Abstraction, Data Lineage, and Event-based Triggers
Willy Lulciuc | Datakin
Presto: Fast SQL-on-Anything
Kamil Bajda-Pawlikowski | Starburst
Extract - Tiered Transform - Load (ETTL): A pipeline for a modular, scalable, and observable Internal Analytics platform
Jean-Mathieu Saponaro | Datadog
Using Embeddings to Understand the Evolution of Data Science Skill Sets
Maryam Jahanshahi | TapRecruit
Data Pipeline Frameworks: The Dream and the Reality
Mark Weiss | Beeswax
An Update on Scikit-learn
Andreas Mueller | Data Science Institute
Active Learning: Why Smart Labeling is the Future of Data Annotation
Jennifer Prendki | Alectio
Scaling Personalization via Machine-Learned Assortment Optimization
Ethan Rosenthal | Dia&Co
Hindsight Bias: How to Deal with Label Leakage at Scale
Till Bergmann | Salesforce
Oops I did it Again -- Adapting a Pop Music Identifier to Find Syndicated Content in Talk Radio
Allison King | Cortico
Artwork Personalization at Netflix
Tony Jebara | Netflix
PyTorch 1.0 - The Platform for Accelerating AI Research to Production
Jeff Smith | Facebook AI Research
Scalability is Quantifiable: The Universal Scalability Law
Baron Schwartz | VividCortex
Content Based Recommendations: Using Word Embeddings to Automate Related Content Generation at BuzzFeed
Carolyn Huangci | BuzzFeed
Building a Modern Machine Learning Platform on Kubernetes
Saurabh Bajaj | Lyft
Evolving Stitch Fix's Data Platform for Data Lineage
Neelesh Srinivas Salian | Stitch Fix
The Literate Programmer: Cargo Cult Open Source
Wes Chow | Cortico
Accelerating Single-cell Bioinformatics with N-dimensional Arrays in the Cloud
Ryan Williams | Icahn School of Medicine at Mount Sinai
The Software Architecture of WayUp's Job Recommender System
Harlan Harris | WayUp
Automating Modeling Pipelines
William Nelson | Intent Media Inc
Building Data Tools that Work
Benn Stancil | Mode
Analyzing Data in the Cloud: Is True Privacy and Security Possible?
Raghu Murthy | Datacoral
Stream Processing Design Patterns
Andreas Markmann | Capital One
Scale Processes, Not People: How Data Teams Do More With Less By Adopting Software Engineering Best Practices
Thomas La Piana | GitLab
AI farming: 100x the yield with a data team of 1
Sam Swift | Bowery Farming
The Highs and Lows of Building an Adtech Data Pipeline
Dan Goldin | TripleLift
The Customer as The Unit of Analysis: Models, Metrics and a Multitude of Uses
Brian Bloniarz | Second Measure
Technical Founders Panel
William Falcon | Facebook / NYU
Computer Vision AI to Disrupt Digital Advertising
Joy Tang | Markable AI
Running Effective Machine Learning Teams: Common Issues, Challenges and Solutions
Gideon Mendels | Comet.ml
Optimizing Time to Data through Streams and Data Abstraction
Nicolas Joseph | Datalogue
Engineering Lessons Learned by Data Scientists in Growing MalwareScore from Kaggle Competition to Trusted Antivirus Solution
Phil Roth | Endgame
The Unreasonable Deceptiveness of Bad Data
Rigel Swavely | Clarifai
Fixing the Big Data Development Cycle with SQL
Justin Coffey | Criteo Labs
Building a Knowledge Graph Platform from Billions of Unstructured Online Sources Using AI
Aditya Jami | Meltwater
Three Tips for Better Predictive Modeling
Stephanie Yang | Foursquare
AI Challenges in Customer Care Automation
Sameer Yami | Linc Global
Building a Music Analytics Pipeline at Pandora
Brian Femiano | Pandora
Fast Data apps with Alpakka Kafka connector and Akka Streams
Sean Glover | Lightbend
Causal Data Science
Adam Kelleher | Barclays Investment Bank
Machine Learning from Development to Production at Instacart
Montana Low | Instacart
Cloud Data Warehouse Benchmark: Redshift vs Snowflake vs BigQuery
George Fraser | Fivetran
Democratizing Data with the Clover Transform Framework
Chris Hartfield | Clover Health
Functional Data Engineering - A Set of Best Practices
Max Beauchemin | Stealth
Uber’s Data Journey: 100+PB with Minute Latency
Reza Shiftehfar | Uber
Scaling a Relational Database for the Cloud-age
Sumedh Pathak | Citus Data
Lazy Beats Smart and Fast
Julian Hyde | Looker
Effective Management of High Volume Numeric Data with Histograms
Fred Moyer | Circonus
From Flat Files to Deconstructed Database: The Evolution and Future of the Big Data Ecosystem
Julien Le Dem | WeWork
A Trillion Rows Per Second as a Foundation for Interactive Analytics
Eric Hanson | MemSQL
What the heck is an In-Memory Data Grid?
Addison Huddy | Pivotal
Data Access for Data Science
Jacques Nadeau | Dremio
A Multi-Armed Bandit Framework for Recommendations at Netflix
Jaya Kawale | Netflix
Fast & Effective: Natural Language Understanding
Mike Conover | Workday
Weld: Accelerating Data Science by 100x
Shoumik Palkar | Stanford University Infolab
AutoML: The Assembly Line of Machine Learning
Mayukh Bhaowal | Salesforce
Enabling Full Stack Data Scientists at Stitch Fix
Juliet Hougland | Stitch Fix
Safely Streamlining Healthcare Policy Management using Ideas from Structured Natural Language Processing (SNLP)
Asif Khalak | Collective Health
Democratizing Metric Definition and Discovery at Airbnb
Lauren Chircus | Airbnb
Define Once, Evaluate Anywhere: Building Repeatable and Correct Features at Stripe
Kelley Rivoire | Stripe
Hazardous Models and Risk Mitigation in Real Estate
David Lundgren | Opendoor
The Design of Systems for Real-time Prediction Serving
Joseph Gonzalez | UC Berkeley
Data Science: Past, Present, Future
Shubha Nabar | Salesforce
VC Panel Talk
Lisha Li | Amplify Partners
Macrobase: A Search Engine for Fast Data Streams
Sahaana Suri | Stanford University
TimescaleDB: Rearchitecting a SQL database for time-series data
Mike Freedman | TimescaleDB
Production Analytics With a Distributed Column Store
Sam Stokes | Honeycomb
Don’t optimize my queries, optimize my data!
Julian Hyde | Looker
Worse Case Scenario in the Database
Marianne Bellotti | United States Digital Service
The Challenges of Distributing Postgres: A Citus Story
Ozgun Erdogan | Citus Data
The Statistics of Dirty Data
Sanjay Krishnan | UC Berkeley
Easy, Scalable, Fault-tolerant Stream Processing with Structured Streaming in Apache Spark
Burak Yavuz | Databricks
Using Apache Arrow, Calcite and Parquet to build a Relational Cache
Jacques Nadeau | Dremio
How Spotify Distills Terabytes of Raw Data into Meaningful Music Recommendations
Gandalf Hernandez | Spotify
Building a Recursive BigQuery Mapper
Darren McCleary | The New York Times
Building ETL Infrastructure that Analysts Love
Christian Romming | ETLeap
Using Apache Spark for processing trillions of records each day at Datadog
Vadim Semenov | Datadog
Productizing structural models
James Savage | Schmidt Futures
Lessons in hiring data scientists
Genevieve Smith | Insight
Using Causal Forests for Subpopulation Identification in Randomized Clinical Trials
James Faghmous | Icahn School of Medicine at Mt. Sinai
You Won't Believe How We Optimize our Headlines
Lucy Wang | BuzzFeed
Deploying Data Science for Distribution of The New York Times
Anne Bauer | The New York Times
Zip codes and other lies your map told you
Peter Lenz | Dstillery
Automating machine learning
Andreas Mueller | Data Science Institute
Building automated support at Kickstarter
Jeffrey Doker | Kickstarter
Privacy Techniques for Data Science
Jim Klucar | Immuta
The Future of Data Science in the Media
Haile Owusu | Mashable
Composable Interfaces for Parallel Processing in Apache Spark & Weld
Matei Zaharia | Databricks
Data Science @ Pinterest
Mohammad Shahangian | Pinterest
Data Science in the Enterprise
Sean Anderson | Cloudera
Practical Solutions for Annoying Machine Learning Problems
Alyssa Frazee | Stripe
Beyond 50,000 Partitions: How Heroku Pushes the Limits of Kafka at Scale
Jeff Chao | Heroku
Scaling Up Spark at Facebook – a 60TB Production Use Case
Shuojie Wang | Facebook
How Superset and Druid Power Real-Time Analytics at Airbnb
Max Beauchemin | Stealth
Hoodie: An Open Source Incremental Processing Framework From Uber
Vinoth Chandar | Uber
Data for the 99%
Benn Stancil | Mode
Payment Fraud in Digital Currency
Soups Ranjan | Revolut
The Limitations of Big Data in Predictive Analytics
Jennifer Prendki | Alectio
Practical Lessons for Building Machine Learning Models in Production
Sharath Rao | Instacart
An Introduction to Big Data's Unsung Hero: The Log
Liz Bennett | Stitch Fix
Why, When, How: Lessons Learned in Applying Deep Learning to Real-World Problems
Daniel Galron | eBay
Anomaly Detection for Data Quality and Metric Shifts at Netflix
Laura Pruitt | Netflix
Cloud-Native Stream Processing
Sid Anand | Apache Software Foundation
Parsing of Diverse Healthcare Data
Chris Hartfield | Clover Health
Interactive Exploratory Analytics with Druid
Fangjin Yang | Imply
InfluxDB Storage Engine Internals
Paul Dix | Metamarkets
A Nation of Immigrants: The Data Sciences
Kenneth Sanford | Dataiku
Twitter Heron: The Path Towards Elastic Streaming
Ashvin Agrawal | Microsoft
Simulation-based Inference: Advantages Over A/B Testing in Real Estate
Nelson Ray | Opendoor
Format Wars: from VHS and Beta to Avro and Parquet
Silvia Oliveros Torres | Silicon Valley Data Science
Real-time System Computing Engines
Steffen Peter | Levyx
The Right Stuff: Lessons Learned from a Decade of Data Engineering
Ben Hamner | Kaggle
How Engineer Angels Evaluate Data-Oriented Companies
Jocelyn Goldfein | Zetta
The Future of Column-Oriented Data Processing with Arrow and Parquet
Julien Le Dem | WeWork
Computational Social Science: Exciting Progress & Future Challenges
Duncan Watts | Microsoft Research
Causal Inference in Data Science From Prediction to Causation
Amit Sharma | Microsoft Research
Reinforcement Learning for Data Scientists
Brian Farris | Bloomberg
How to Change a City with Data Science
Ben Wellington | Two Sigma
Python Data Wrangling: Preparing for the Future
Wes McKinney | Ursa Labs
Lessons Learned Optimizing NoSQL for Apache Spark
John Musser | Ford Motor Company
Unified Pipeline Architecture: The Evolution of Data Processing at Spotify
Erin Palmer | Spotify
SystemML & Spark: a Framework for Scalable Data Science Algorithm Development
Jerome Nilmeier | IBM
Stop Obsessing about Data Infrastructure
Yair Weinberger | Alooma
Apache Kafka and the Rise of Stream Processing
Guozhang Wang | Confluent
Genomic Data Analysis with Spark & Hadoop
Ryan Williams | Icahn School of Medicine at Mount Sinai
Anomaly Detection for Real-World Systems
Manojit Nandi | STEALTHbits
Building a Cloud-Native SQL Database
Alex Robinson | Cockroach Labs
To Get the Value, Ditch the Hype
Nick Ursa | The New York Times
Statistical and Computational Challenges of Real-Time News Clustering
Jeiran Jahani | Chartbeat
Predicting Chaotic Systems with Sparse Data: Lessons from Numerical Weather Prediction
David Kelly | New York University
The Trials and Tribulations of Scaling Data Science and Engineering
Ashley Miller | Datadog
Peloton: The Self-Driving Database Management System
Andy Pavlo | Carnegie Mellon University
Elastic Big Data Platform at Datadog
Doug Daniels | Datadog
Processing Geographic Data at Internet Scale
Peter Lenz | Dstillery
Bias, Variance and Adaptive Products
George Davis | Frame.ai
Career Panel - Leveling Up in Your Career as a Data Scientist/Engineer
Nick Chamandy | Lyft
VC Panel - The Present Future of Data-Oriented Startups | DataEngConf NY '16
Matt Hartman | Betaworks
Fighting Churn with Data
Carl Gold | Zuora
How Data is Transforming Politics
Catherine Tarsney | Democratic National Committee
Scaling model training: from flexible training APIs to resource management with Kubernetes
Kelley Rivoire | Stripe
Dagster: A New Programming Model for Data Processing
Nicholas Schrock | Elementl
Time Series Prediction with TensorFlow
Jerome Nilmeier | IBM
Ray for Reinforcement Learning
Ion Stoica | Rise Lab (UC Berkeley)
Building a Distributed Data Access Layer for Analytics on Any Cloud
Bin Fan | Alluxio Inc
Notebooks as Functions with Papermill
Matt Seal | Netflix
The history and anatomy of Apache Superset
Max Beauchemin | Stealth
Machine Learning Infrastructure at an Early Stage
Spencer Barton | Branch International
Actionable and Interpretable Predictions from a Stacked Model
Austen Head | Quid
Scalability! But at What COST?
Frank McSherry | Materialize
Reducing Student Loans with Bot-Powered Humans
William Falcon | Facebook / NYU
Building Bots, Building Blocks: How Forbes Experiments, Evaluates, and Kills Data-driven Bots
Luis Capelo | Forbes
Tactical Data Engineering
Julian Hyde | Looker
Scaling Data Products Under Startup Constraints: A Case Study of ML Bias Testing
Edwin Ong | TinyData
Operating Multi-Tenant Kafka Services for Developers on Heroku
Ali Hamidi | Salesforce
Building a Lean AI Startup - Lessons Learned
Paul Cothenet | MadKudu
Scaling the best healthcare to everyone, with AI
Anitha Kannan | Curai
Introducing Data Downtime: From Firefighting to Winning
Barr Moses | Monte Carlo
Explaining AI: Putting Theory into Practice
Luke Merrick | Fiddler Labs
When Testing in Production is a Good Idea
Dan Robinson | Heap
Accelerating Machine Learning with Training Data Management
Alex Ratner | Stanford University
Making Friends with Generative Models
Andrew Colombi | Tonic
Amundsen: A Data Discovery Platform From Lyft
Tao Feng | Lyft
Powering Uber's global network analytics pipelines in near real-time with Apache Hudi (Incubating) Delta Streamer
Nishith Agarwal | Uber
Introducing Switch: A Framework for Custom Data Applications
Josh Ferguson | Mode
Appifying Data Science Workflows to Create Composable, User-Friendly Data Pipeline Products
Austen Head | Quid
End-to-end Exactly-once Aggregation Over Ad Streams
Amit Ramesh | Yelp
Transfer Learning in NLP - How to Help Small Teams Account for Small Datasets
Ryan Smith | Wootric
Bighead: Airbnb's end-to-end Machine Learning Platform
Andrew Hoh | Airbnb
Balancing Broad Data Access with Usability at Scale
Austin Wilt | Slack
Real-Time Data Pipelines Made Easy with Structured Streaming in Apache Spark
Tathagata Das | Databricks
Scalable Data Ingestion Architecture Using Airflow and Spark
Johannes Leppä | Komodo Health
Spatial Data Science Methods for Improving Models
Andy Eschbacher | Carto
Building a Programming By Example (PBE) Framework in Trifacta: Lessons learned and implications for Data Analytics
Anish Doshi | Trifacta
Split Learning: A Resource Efficient Distributed Deep Learning Method without Sensitive Data Sharing
Praneeth Vepakomma | Massachusetts Institute of Technology
Building a Data-Powered Sales Intelligence Platform
Durgam Vahia | LinkedIn
Running Airflow reliably with Kubernetes
Greg Neiheisel | Astronomer
Distributed SQL Databases Deconstructed
Karthik Ranganathan | YugaByte
Swimming in the Data River, or, when “Streaming Analytics” isn’t
Gian Merlino | Imply
Building Resilient Machine Learning Pipelines with IoT Data
Hedi Razavi | Keewi
Architecting a Low-Latency Schemaless SQL Engine
Igor Canadi | Rockset
Data Security and Privacy in the Age of Machine Learning
Soups Ranjan | Revolut
Delphi: A Hybrid Approach to Forecasting a Global Marketplace
Kai Brusch | Airbnb
Building Real-Time Analytics Applications Using Apache Pinot: A Case Study of LinkedIn
Kishore Gopalakrishna | Founding Engineer
Data Modeling and Processing for a Travel Super App
Rendy Bambang Junior | Traveloka
Argo: Kubernetes Native Workflows and Pipelines
Greg Roodt | Canva
Data Architecture 101 for Your Business
Bence Faludi | Independent Consultant
Causal Inference: Making the Right Intervention
Paul Beaumont | QuantumBlack
Scaling Data Science Teams: Twitter's Perspective
Miguel Rios | Twitter
Building Data Products with Machine Learning at Zendesk
Chris Hausler | Zendesk
Revenue Maximization in the Shared Bike Business Using Network Analysis and Geospatial Mapping
Arpit Agarwal | Zoomcar
Sparklens: Understanding the Scalability Limits of Spark Applications
Ashish Dubey | Qubole
A View from Apache Flink on Evolution and Outlooks for the Modern Stateful Stream Processor
Tzu-Li (Gordon) Tai | Ververica
Presto: Optimizing Performance of SQL-on-Anything
Kamil Bajda-Pawlikowski | Starburst
Building Data Orchestration for Big Data Analytics in the Cloud
Bin Fan | Alluxio Inc
7 Habits to Build Ethical AI
Karthik Thirumalai | Teradata
Translating Source Code into Natural Language with AI
Mikhail Filippov | Quod AI
Taking Recommendation to the Masses
Le Zhang | Microsoft
Autoencoder Forest for Anomaly Detection from IoT Time Series
Yiqun Hu | SP Group
Delivering ML Models the Safe and Sane Way
David Tan | Thoughtworks
Building Data Engineering Teams
Wouter de Bie | Datadog
Murron: Reliable Logging Pipeline
Ananth Packkildurai | Slack
Evolution of Data Ingestion and Product Instrumentation at Prezi
Tamás Németh | Prezi
Data at Marfeel: Addressing Complexity at Scale with the Latest Technologies
Alessandro Pregnolato | Marfeel
Kafka Streams in Production: From Use Case to Monitoring
Alexander Kudryashov | New Relic
GDPR: Discover The Main Challenges & Considerations
Jordi Miró Bruix | Lernin Games
Blending Event Stream Processing with Machine Learning Using the Kafka Ecosystem
Andrea Spina | Radicalbit
A Federated Information Infrastructure That Works
Xavier Gumara Rigol | Adevinta
Explainable AI
Ricardo Baeza-Yates | NTENT
A Machine Learning Approach to Optimize Prices During Clearance Sales at MANGO
Carmen Herrero | MANGO
Uncertainty-Aware Food Recognition by Deep Learning
Petia Radeva | University of Barcelona
Chatbots at Nestle: Improving Performance on Intent Detection
Maria Crosas Batista | Nestlé
Talking Bayes to Business: A/B Testing Use Case
Yizhar Toren | Shopify
Machine Learning for Brain Health and Understanding at Starlab Neuroscience
Aureli Soria-Frisch | Starlab Consulting Division
Improving Search with Natural Language Processing and Deep Learning
Markus Ludwig | Scout24
Rethinking Transportation in Cities: Making Traffic Smarter Through Optimization and Location Intelligence
Miguel Alvarez | CARTO
Why We Defined a Metalanguage for SQL
Lewis Hemens | Dataform
Building a 1500-Class Listing Categorizer from Implicit User Feedback
Arnau Tibau Puig | Letgo
The Case for Metadata for Machine Learning Platforms
Joerg Schad | ArangoDB
VEA: Validating, Evolving & Anonymizing Data in Real Time
Albert Franzi Cros | Alpha Health
It's About Time: An Introduction to Timely Dataflow
Malte Sandstede | Clockworks
Stream Processing Beyond Streaming Data
Timo Walther | Ververica
Kubernetes as a Streaming Data Platform: A Federated Operator Approach
Gerard Maas | Lightbend
Intelligent Orchestration - Data's Missing Link
Sean Knapp | Ascend
Building a Scalable Real-Time Data Pipeline
Vicente Valls Rios | Delivery Hero
Emotion Recognition in Images and Text
Agata Lapedriza | UOC / MIT Media Lab
Issue with Tracking? Fail That Build!
Steve Coppin-Smith | Snowplow Analytics
A Data- and Knowledge-Driven Approach for Mental Health Policy-Making at World Health Organization
Karina Gibert | UPC
Cloud-Based Practices for Effective Data Science at Scale
Tiago Henriques | Microsoft
From Water Purification to Science Documentaries: Industrial Applications of AI, High Performance Computing, and Data Visualization
Fernando Cucchietti | Barcelona Supercomputing Center
Data Stories by Humans, for Humans

Data Stories by Humans, for Humans
Xaquín G.V.
News in the Age of Algorithmic Recommendation

News in the Age of Algorithmic Recommendation
Nick Rockwell | The New York Times
One Explanation Does Not Fit All: A Toolkit and Taxonomy of AI Explainability Techniques

One Explanation Does Not Fit All: A Toolkit and ...
Ronny Luss | IBM Research AI
From Batch to Streaming to Both: The Data Platform at Skyscanner

From Batch to Streaming to Both: The Data ...
Herman Schaaf | Skyscanner
A Label-Free World - Current State Of Unsupervised Deep Learning

A Label-Free World - Current State Of ...
William Falcon | Facebook / NYU
Online Learning of Website Embeddings for Accurate Prediction of User Behavior Even When Data Are Scarce

Online Learning of Website Embeddings for ...
Amelia White | Dstillery
Tailor-S: Look What You Made Me Do

Tailor-S: Look What You Made Me Do
Vadim Semenov | Datadog
Testing and Documenting Your Data Doesn't Have to Suck

Testing and Documenting Your Data Doesn't ...
Abe Gong | Superconductive
The Materialize Incremental View Maintenance Engine

The Materialize Incremental View Maintenance ...
Frank McSherry | Materialize
A Modern Love Story: Machine Learning & The Global Sports Betting Industry

A Modern Love Story: Machine Learning & The ...
Lloyd Danzig | ICED(AI)
Empowering Customer-Facing Teams With Voice-Based AI

Empowering Customer-Facing Teams With Voice-Based ...
Yev Meyer | Guru
Accelerate Source to Signal: Data Engineering Efficiency

Accelerate Source to Signal: Data Engineering ...
Mark Etherington | Crux Informatics
Real-time SQL Stream Processing at Scale with Apache Kafka and KSQL

Real-time SQL Stream Processing at Scale with ...
Viktor Gamov | Confluent
Combating AI Bias at Scale

Combating AI Bias at Scale
Lucy Vasserman | Jigsaw
Kubernetes-Native Workflow Orchestration with Argo

Kubernetes-Native Workflow Orchestration with Argo
Kai Rikhye | Skillshare
3 Best Practices for Data Organizations: Structure, ROI, Communications

3 Best Practices for Data Organizations: ...
Barr Moses | Monte Carlo
Building an End-to-End Data Stack in Thirty Minutes

Building an End-to-End Data Stack in Thirty ...
Benn Stancil | Mode
Messy Data and Reluctant Users - The Trouble with Healthcare Data

Messy Data and Reluctant Users - The Trouble with ...
Samantha Bail
Building a Knowledge Graph Using Messy Real Estate Data

Building a Knowledge Graph Using Messy Real ...
John Maiden | Cherre
Intrinsic Autoregressive Models in Stan

Intrinsic Autoregressive Models in Stan
Susana Marquez | Rockefeller Foundation
Data Scientist, or the Most Dangerous Job of the 21st Century

Data Scientist, or the Most Dangerous Job of the ...
Hugo Bowne-Anderson | DataCamp
Augmented Programming

Augmented Programming
Gideon Mann | Bloomberg
Building Efficient ML Pipelines and Responsible AI Solutions

Building Efficient ML Pipelines and Responsible ...
Adi Polak | Microsoft
Statistical Aspects of Distributed Tracing

Statistical Aspects of Distributed Tracing
Joe Ross | Splunk
Reducing Flight Delays with Kubernetes and Tensorflow

Reducing Flight Delays with Kubernetes and ...
Daniel van der Ende | GoDataDriven
Creating Knowledge Graphs via a Symbiosis of Data Science and Data Engineering

Creating Knowledge Graphs via a Symbiosis of Data ...
Maureen Teyssier | Reonomy
Leveraging Compute

Leveraging Compute
Riva-Melissa Tez | Intel Corporation
Building Systems to Monitor Data and Model Health in Production Systems

Building Systems to Monitor Data and Model Health ...
Mohammed Ridwanul | Dessa
Monarch, Google’s Planet-Scale Streaming Monitoring Infrastructure

Monarch, Google’s Planet-Scale Streaming ...
George Talbot | Google
Valid Inference after Model Selection and the selectiveInference Package

Valid Inference after Model Selection and the ...
Joshua Loftus | NYU
The Observatorium - Using Machine Learning and Observability Together to Reduce Incident Impact

The Observatorium - Using Machine Learning and ...
Alex Kass | DigitalOcean
Uncovering the Potential of TensorFlow 2.0

Uncovering the Potential of TensorFlow 2.0
Jerome Nilmeier | IBM
CockroachDB: Architecture of a Geo-Distributed SQL Database

CockroachDB: Architecture of a Geo-Distributed ...
Nathan VanBenschoten | Cockroach Labs
Leveraging Stateful Functions to Power the Next Generation of Event-Driven Applications

Leveraging Stateful Functions to Power the Next ...
Seth Wiesman | Ververica
Reproducibility in Data Science

Reproducibility in Data Science
Juliana Freire | NYU
Zipline - Airbnb's Declarative Feature Engineering Framework

Zipline - Airbnb's Declarative Feature ...
Nikhil Simha | Airbnb
Dagster: Workflows for Data Science, Machine Learning, and Data Engineering

Dagster: Workflows for Data Science, Machine ...
Nicholas Schrock | Elementl
Meet dbt: The Data Transformation Tool Used by JetBlue, GitLab, Wistia, and Away

Meet dbt: The Data Transformation Tool Used by ...
Jeremy Cohen | Fishtown Analytics
Time to Rethink Visual Data Management for Machine Learning

Time to Rethink Visual Data Management for ...
Vishakha Gupta-Cledat | ApertureData
Amundsen - From Discovering Data to Securing Data

Amundsen - From Discovering Data to Securing Data
Mark Grover | Lyft
The Unreasonable Effectiveness of Product Sense

The Unreasonable Effectiveness of Product Sense
Vitaly Gordon | Faros AI
Real-time Retrieval with Deep Learning: Benefits and Challenges

Real-time Retrieval with Deep Learning: Benefits ...
Edo Liberty | HyperCube
Data Reliability for Data Lakes

Data Reliability for Data Lakes
Michael Armbrust | Databricks
Agile AI: From Research, Production to Customer Adoption

Agile AI: From Research, Production to Customer ...
Zineb Laraki | Salesforce
Flyte: Cloud Native Machine Learning & Data Processing Platform

Flyte: Cloud Native Machine Learning & Data ...
Haytham Abuelfutuh | Lyft
Pitfalls and Challenges of ML-Powered Applications

Pitfalls and Challenges of ML-Powered Applications
Emmanuel Ameisen | Stripe
Data Lineage with Apache Airflow

Data Lineage with Apache Airflow
Willy Lulciuc | Datakin
Lessons Developing Conversational AI Virtual Agents

Lessons Developing Conversational AI Virtual ...
Mitul Tiwari | ServiceNow
MESA: Building a Personalized Messaging System at Netflix

MESA: Building a Personalized Messaging System at ...
Grace Huang | Netflix
Federated Learning and Analytics at Google and Beyond

Federated Learning and Analytics at Google and ...
Peter Kairouz | Google
Responsible AI – Model Interpretability and Fairness

Responsible AI – Model Interpretability and ...
Mehrnoosh Sameki | Microsoft