Whether you're a new or an experienced user, there is plenty to discover about the Core Engine. We've collected (and continue to expand) this information and present it in a digestible way so you can get started and build.
If you are already familiar with the basics, the core concepts serve as a reference guide for the available features and internals.
What is the Core Engine
The Core Engine is an orchestration tool for Machine Learning and Deep Learning. It provides a consolidated interface to build tailored pipelines, from data ingestion through training and evaluation to serving (the last of which is still a work in progress). While initially built for timeseries datasets, it also supports other, non-sequential datasets.
The Core Engine is built to bridge the gap between research and production in machine learning. It aims to consolidate the fragmented landscape of ML development. By providing an easy-to-use, managed, and powerful computing platform, we hope to expedite the transition of ML models to production services.
All experiments and model trainings run in individual pipelines and are organized into workspaces. Similar experiments, i.e. those using the same datasource, should be grouped in the same workspace, since subsequent pipeline runs let the Core Engine skip processing steps it has already completed. This native caching is built in and supported across all plans and for all types of datasets.
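The idea behind this kind of caching can be sketched in plain Python: fingerprint a step's name, configuration, and inputs, and reuse the stored result whenever the fingerprint has been seen before. This is a conceptual illustration only, not the Core Engine's actual implementation:

```python
import hashlib
import json

_cache = {}  # maps fingerprint -> previously computed result


def run_step(name, config, inputs, fn):
    """Run a pipeline step, reusing the cached result when the step's
    name, configuration, and inputs are all unchanged."""
    fingerprint = hashlib.sha256(
        json.dumps([name, config, inputs], sort_keys=True).encode()
    ).hexdigest()
    if fingerprint in _cache:
        return _cache[fingerprint]  # cache hit: skip recomputation
    result = fn(inputs)             # cache miss: compute and store
    _cache[fingerprint] = result
    return result
```

A second call with identical arguments returns the stored result without invoking the step function again, which is why grouping experiments over the same datasource in one workspace pays off.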
Configuration files are at the heart of each pipeline. They define your features and labels, specify how your data is split and which kind of preprocessing is applied, configure your model and training (trainer), define the evaluator, and optionally contain additional configuration for timeseries datasets.
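To make this concrete, a pipeline configuration along these lines might look as follows. All keys and values here are hypothetical, chosen to mirror the sections described above; they are not the Core Engine's actual schema:

```yaml
# Hypothetical pipeline configuration -- illustrative only
features:
  - pressure
  - temperature
labels:
  - failure
split:
  train: 0.7
  eval: 0.3
preprocessing:
  temperature: standard_scaling
trainer:
  architecture: feedforward
  epochs: 50
evaluator:
  metrics:
    - accuracy
timeseries:           # optional, for sequential datasets
  resampling_rate: 60s
```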
Datasources are the heart and soul you bring to the party. They contain and define your data. As of today, only BigQuery is supported, but we will continue to add more over time. Feel free to reach out at firstname.lastname@example.org and tell us which datasource you'd like to see next.
Declarative Configurations for pipelines, models and datasources guarantee repeatable, reliable and comparable experiments.
Deep Learning development involves repetitive experimentation. Thanks to native caching of all computations, you'll never have to run the same computation twice, saving time and money on all subsequent experiments.
All the right tools
The Core Engine supports a wide variety of plugins and tools, including favorites such as TensorBoard and the What-If Tool (WIP). They all come pre-configured and ready out of the box with every pipeline run.
Native distributed computation
With big enough data, crunching through it on a single machine can take hours. The Core Engine uses distributed data processing technology (Apache Beam) for efficient execution, reducing hours of computation to minutes.
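The core idea, partitioning a dataset and transforming the partitions in parallel, can be sketched with Python's standard library. This is a local, simplified stand-in: the Core Engine delegates the real work to Apache Beam, which additionally distributes partitions across many machines:

```python
from concurrent.futures import ThreadPoolExecutor


def preprocess(chunk):
    """Stand-in for a per-record transformation (e.g. scaling)."""
    return [x * 2 for x in chunk]


def process_in_parallel(data, n_workers=4):
    # Partition the data into one chunk per worker...
    chunks = [data[i::n_workers] for i in range(n_workers)]
    # ...and transform the chunks concurrently. A distributed engine
    # would schedule these chunks onto separate machines instead.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = pool.map(preprocess, chunks)
    return [x for chunk in results for x in chunk]
```

Because the chunks are independent, adding workers (or machines) shortens wall-clock time roughly in proportion, which is what turns hours of computation into minutes.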
» Install the Core Engine CLI
The most convenient way (in our opinion at least) to interact with the Core Engine is through the CLI. Installing is easy:
Once the installation is complete, you can verify that it was successful with:
» A quick start to the Core Engine
Once you've successfully created your account and completed the installation, you're ready to go. If you're new to the Core Engine, jump right into the quick start to run your first pipeline within minutes!
» A primer on workspaces
Workspaces exist to maintain an organized and efficient structure within your organization's ML workflow.
Through a workspace, developers can:
- Gather all their solutions to a problem under one roof
- Collaborate on the same problem by sharing their experiments and solutions