In many products, users interact with data by writing a series of statements in a specified “language” that transforms the data as desired. While such domain-specific languages (DSLs) can be powerful, crafting statements in them is often challenging, especially when the statements involve complex string patterns, regex manipulations, or datetime transformations. Users are often more aware of what they'd like the output of a statement to look like than of how the statement itself should be written. The methodology of programming by example (PBE) has therefore become salient for data products in recent years. In the PBE context, rather than writing transforms directly, users specify a set of input-output examples and have the product infer the transformations they are trying to craft.
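To make the idea concrete, here is a minimal sketch of programming by example as a brute-force search over a tiny string DSL. The DSL, its primitive names, and the `synthesize` helper are all hypothetical and chosen only for illustration; this is not Trifacta's implementation, and practical systems rely on far richer DSLs and smarter search strategies.

```python
from itertools import product

# Hypothetical toy DSL: each primitive maps a string to a string.
PRIMITIVES = {
    "upper": str.upper,
    "lower": str.lower,
    "strip": str.strip,
    "first_word": lambda s: s.split(" ")[0],
    "last_word": lambda s: s.split(" ")[-1],
}


def run(program, text):
    """Apply a sequence of primitive names to text, left to right."""
    for name in program:
        text = PRIMITIVES[name](text)
    return text


def synthesize(examples, max_len=3):
    """Return the shortest program consistent with every example, or None."""
    for length in range(1, max_len + 1):
        # Enumerate every sequence of primitives of the current length.
        for program in product(PRIMITIVES, repeat=length):
            if all(run(program, inp) == out for inp, out in examples):
                return program
    return None


# The user supplies input-output pairs instead of writing the transform.
examples = [("  Ada Lovelace ", "LOVELACE"), ("Alan Turing", "TURING")]
program = synthesize(examples)
print(program)
print(run(program, "  Grace Hopper "))
```

With five primitives, the number of candidate programs of length n is 5^n, so even this toy search grows exponentially with program length. A DSL with more primitives and parameters grows the space faster still, which is why exhaustive enumeration does not scale.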
In this talk, we will address some of the challenges of designing, building, and integrating a PBE framework into Trifacta, a data preparation platform. After investigating numerous open source and research solutions for the task, we found that most PBE algorithms either involve optimized searches over large spaces of transforms or restrict the size of the DSL. This is a tradeoff between performance and power that we attempt to balance with our solution. Recent research into deep neural networks has also yielded interesting results for the programming by example problem, which may mitigate these challenges. We’ll talk about some of the pros and cons of using such neural networks, in particular the challenge of training deep networks with large amounts of data while also respecting user data privacy.
We’ll also talk about our journey designing user interfaces for programming by example. Such a tool should be a meaningful dialogue between human and program: one that both empowers the user to keep supplying example specifications and abstracts away the complexity of program generation. PBE represents a movement towards Software 2.0 in the space of data analytics, and we believe our journey towards building it into Trifacta can serve as a starting point for future exploration.
Anish Doshi is an engineer on the machine learning team at Trifacta, where he builds features that leverage intelligence for data preparation. He holds a Bachelor of Science in Electrical Engineering and Computer Science from the University of California, Berkeley, where he worked on research in data visualization and deep learning.
Data Council, PO Box 2087, Wilson, WY 83014, USA - Phone: +1 (415) 800-4938 - Email: community (at) datacouncil.ai