Core Concepts

Everything at a glance

The Core Engine lets you build and execute machine learning pipelines. A pipeline is simply a sequence of data-processing steps that train, evaluate, and serve machine learning models.

Similar experiments should be grouped in the same workspace. Within a workspace, subsequent pipeline runs let the Core Engine skip processing steps whose inputs and configuration have not changed. This caching is built in natively and is supported across all plans and all types of datasets.
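To make the idea concrete, here is a minimal sketch of step-level caching, assuming a simplified model of what the engine does internally: each step's result is keyed by a hash of its name, parameters, and inputs, so a repeated run with identical inputs is skipped. The function names and the in-memory cache are hypothetical; the Core Engine persists results per workspace rather than in a dictionary.

```python
import hashlib
import json

_cache: dict = {}  # hypothetical in-memory stand-in for per-workspace storage

def run_step(name: str, params: dict, inputs, fn):
    """Run fn(inputs) unless an identical invocation was cached earlier."""
    key = hashlib.sha256(
        json.dumps({"step": name, "params": params, "inputs": inputs},
                   sort_keys=True, default=str).encode()
    ).hexdigest()
    if key in _cache:
        print(f"Skipping '{name}' (cached)")
        return _cache[key]
    result = fn(inputs)
    _cache[key] = result
    return result

# Two runs with identical inputs: the second one hits the cache and is skipped.
data = [1, 2, 3, 4]
run_step("split", {"ratio": 0.8}, data, lambda d: (d[:3], d[3:]))
run_step("split", {"ratio": 0.8}, data, lambda d: (d[:3], d[3:]))  # cache hit
```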

Pipelines are driven by a configuration file: a YAML file that holds every configuration setting of a pipeline run. It defines your features and labels, how your data is split, and which preprocessing steps are applied; it configures your model and training (the trainer), defines the evaluator, and can optionally carry additional configuration for time-series datasets.
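The sketch below shows the general shape of such a file. The key names are illustrative assumptions, not the Core Engine's actual schema; consult the configuration reference for the real keys.

```yaml
# Hypothetical example of a pipeline configuration file.
features:
  - price
  - quantity
labels:
  - has_churned
split:
  train: 0.7
  eval: 0.3
preprocessing:
  price: [standard_scaler]
trainer:
  type: feedforward
  epochs: 50
  batch_size: 32
evaluator:
  metrics: [accuracy, auc]
timeseries:            # optional, only for time-series datasets
  resampling_rate: 1h
```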

A high-level overview of a pipeline in the Core Engine

Pipelines are at the heart of most tasks executed in the Core Engine. A pipeline is split into generic steps (shown in the diagram above), each of which offers a different degree of custom control alongside pre-built automations.

For example, the Train block is both customizable and automated: you can inject a model function that creates an ML model into the Train block, and then leverage built-in automations for tasks like GPU training, distributed training, and automated hyperparameter tuning.
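As a rough illustration, a model function might look like the Keras-based sketch below. How the function is registered with the Train block depends on the Core Engine's API; `build_model` here is only a hypothetical example of the kind of function you would inject.

```python
import tensorflow as tf

def build_model(input_dim: int) -> tf.keras.Model:
    """Create and compile a small feedforward binary classifier."""
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(input_dim,)),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model
```

Because the function only builds and returns the model, concerns like GPU placement, distribution, and hyperparameter tuning can be layered on top by the engine's automations without changes to your code.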