The cloud has dramatically changed the landscape of data engineering, as well as how data engineers work. In particular, data storage is migrating from the colocated model (e.g., HDFS) to a data lake model (e.g., AWS S3) that is more cost-effective and more scalable, but often fully disaggregated and remote. This shift has created a strong need for data orchestration in the cloud, analogous to what Kubernetes provides for container-based workloads, so that data can be presented in the right layout and at the right location for the applications that consume it.
Originally developed at UC Berkeley's AMPLab as the research project "Tachyon", Alluxio (www.alluxio.io) implements the world’s first open-source data orchestration system for the cloud. Alluxio provides a unified access layer for data-driven applications in big data and ML, enabling frameworks such as Spark, Presto, and TensorFlow to transparently access different external storage systems while using in-memory caching to accelerate data access.
In this talk, the speaker will present:
- New trends and challenges in the data ecosystem in the cloud era;
- Effective data engineering in the cloud world with data orchestration;
- Production use cases built on popular stacks such as Presto/Alluxio/S3.
Bin Fan is the founding engineer of Alluxio, Inc. and a PMC member of the Alluxio open-source project. Before Alluxio, he worked at Google building next-generation storage infrastructure. Bin received his Ph.D. in Computer Science from Carnegie Mellon University, where his research focused on the design and implementation of distributed systems and algorithms.
Data Council, PO Box 2087, Wilson, WY 83014, USA - Phone: +1 (415) 800-4938 - Email: community (at) datacouncil.ai