Cerebro
The Vision
Deep learning (DL) is revolutionizing data analytics applications across many domains. But making effective use of DL is often a painful empirical process, since accuracy is tied to the data representation, neural architecture, and hyperparameter settings. This process, called model selection, is a bottleneck to democratizing DL due to both its resource costs and the user time it consumes.
Cerebro is a first-of-its-kind platform that mitigates this bottleneck for DL model selection at scale. It raises model building throughput without raising resource costs, while ensuring accuracy, reproducibility, and generality across multiple DL frameworks (PyTorch and TensorFlow). Our target setting is small clusters, which cover the vast majority of DL use cases in practice.
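To make the model selection bottleneck concrete, here is a minimal, hypothetical sketch (the grid values below are illustrative, not from Cerebro) of how quickly candidate configurations multiply once you vary architecture and hyperparameters together:

```python
from itertools import product

# Hypothetical model-selection grid: every combination is a separate
# training run that must be executed and evaluated. Even this small
# grid already yields 3 * 2 * 3 = 18 full training runs.
learning_rates = [1e-2, 1e-3, 1e-4]
batch_sizes = [32, 256]
architectures = ["resnet18", "resnet50", "mobilenet_v2"]

configs = [
    {"lr": lr, "batch_size": bs, "arch": arch}
    for lr, bs, arch in product(learning_rates, batch_sizes, architectures)
]
print(len(configs))  # 18 candidate models to train and compare
```

Executing such grids naively, one run per GPU or one run at a time, is what drives up both resource costs and user wait time; Cerebro's goal is to execute them far more efficiently.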
Cerebro is open source under the Apache License v2.0.
Components and Capabilities
Deep Learning
Cerebro will be the first DL platform to offer unified execution support with holistic resource optimization across all axes of scalability: model size, dataset size, example size, number of tasks, and transfer learning. Its carefully layered architecture decouples the specification of model building tasks (e.g., in Keras APIs or AutoML heuristics) from the execution backend, making it portable across multiple backends: Kubernetes, Spark, Dask, Greenplum, Ray, and soon, cloud-native IaaS.
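As an illustration of this decoupling, here is a hedged sketch loosely following the Keras-style API from Cerebro's documentation for the Spark backend. Names such as `SparkBackend`, `SparkEstimator`, `GridSearch`, and `hp_choice` follow the project's docs, but treat the exact module paths, signatures, and the toy model/data below as assumptions that may have drifted from the current release:

```python
# Sketch only: module paths and signatures follow Cerebro's docs for the
# Spark backend, but verify against the installed version before use.
import tensorflow as tf
from pyspark.sql import SparkSession
from cerebro.backend import SparkBackend
from cerebro.keras import SparkEstimator
from cerebro.storage import LocalStore
from cerebro.tune import GridSearch, hp_choice

spark = SparkSession.builder.appName("cerebro-demo").getOrCreate()

# The backend and storage layers are pluggable; only these two lines
# would change to target, say, a different cluster manager or filesystem.
backend = SparkBackend(spark_context=spark.sparkContext, num_workers=4)
store = LocalStore(prefix_path="/tmp/cerebro_experiments")  # illustrative path

def estimator_gen_fn(params):
    # Builds one Keras model per hyperparameter configuration drawn
    # from the search space below (toy architecture for illustration).
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu", input_shape=(100,)),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    optimizer = tf.keras.optimizers.Adam(learning_rate=params["lr"])
    return SparkEstimator(
        model=model,
        optimizer=optimizer,
        loss="binary_crossentropy",
        metrics=["acc"],
        batch_size=params["batch_size"],
    )

search_space = {
    "lr": hp_choice([1e-3, 1e-4]),
    "batch_size": hp_choice([32, 256]),
}

grid_search = GridSearch(
    backend, store, estimator_gen_fn, search_space,
    num_epochs=5, evaluation_metric="loss",
    label_columns=["label"], feature_columns=["features"],
)
best_model = grid_search.fit(train_df)  # train_df: a prepared Spark DataFrame
```

Note how the model building task (the Keras estimator and the search space) is specified independently of the backend object; that separation is what makes the same workload portable across Spark, Dask, Ray, and the other supported backends.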
Overview Resources
Some Damaging Delusions of Deep Learning Practice (and How to Avoid Them)
Arun Kumar, Supun Nakandala, and Yuhao Zhang
KDD 2021 Deep Learning Day | Extended Abstract PDF | Talk slides | Talk video
Cerebro: A Layered Data Platform for Scalable Deep Learning
Arun Kumar, Supun Nakandala, Yuhao Zhang, Side Li, Advitya Gemawat, and Kabir Nagrecha
CIDR 2021 (Vision paper) | Paper PDF and BibTeX | Talk video
Our Sponsors
This project has been supported in part by a Hellman Fellowship, the NIDDK of the NIH under award number R01DK114945, an NSF CAREER Award under award number 1942724, and gifts from VMware.