Architectural Overview

An overview of how the Core Engine is designed and how it integrates with your cloud accounts and existing infrastructure.

Getting started with the Core Engine is as easy as connecting a datasource and a processing backend. A wide array of integrations is supported, and advanced users with custom needs can write their own. If you want to utilize the power of a specific training backend, let's say Google's AI Platform, you can easily integrate that, too.

System design

Our design choices follow the understanding that production-ready model training pipelines need to be immutable, repeatable, discoverable, descriptive and efficient. The Core Engine takes care of the orchestration of your pipelines, from sourcing data all the way to continuous training - no matter if you're running somewhere in an on-premise datacenter or in the Cloud.

In other words, we run your preprocessing and model code while taking care of the "Operations" for you:

  • interfacing between the individual processing steps (splitting, transform, training),

  • tracking of intermediate results and metadata,

  • caching your processing artefacts,

  • parallelisation of compute tasks,

  • ensuring immutability of your pipelines from data sourcing to model artefacts,

  • no matter where - Cloud, On-Prem or managed by us.
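The caching and immutability duties above boil down to content-addressed bookkeeping. The sketch below illustrates the general idea with an in-memory cache keyed by a hash of the step name, its inputs, and its configuration; all names here are illustrative, not the Core Engine's actual API:

```python
import hashlib
import json

# In-memory cache keyed by a content hash of step name + inputs + config.
# If the same step runs again on identical inputs and config, the cached
# artefact is returned instead of being recomputed.
_cache = {}

def cache_key(step_name, inputs, config):
    payload = json.dumps(
        {"step": step_name, "inputs": inputs, "config": config},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()

def run_step(step_name, inputs, config, compute):
    key = cache_key(step_name, inputs, config)
    if key in _cache:
        return _cache[key], True   # cache hit, computation skipped
    artefact = compute(inputs)
    _cache[key] = artefact
    return artefact, False

# First run computes; an identical second run is served from the cache.
result, hit = run_step("transform", [1, 2, 3], {"scale": 2},
                       lambda xs: [x * 2 for x in xs])
again, hit2 = run_step("transform", [1, 2, 3], {"scale": 2},
                       lambda xs: [x * 2 for x in xs])
```

Because the key covers inputs and configuration, any change to either produces a new key and forces a fresh run, which is also what keeps pipeline results immutable and repeatable.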

To that end the Core Engine provides a "golden path" that requires nothing but a datasource and a processing backend. We provide a few recommended quickstart environments to get users up and running without the need to deal with further details.
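To make the "nothing but a datasource and a processing backend" idea concrete, a golden-path pipeline declaration might look like the following. This is a hypothetical shape for illustration only (the keys, backend names, and `validate` helper are assumptions, not the Core Engine's real configuration format):

```python
# Hypothetical minimal pipeline declaration: only a datasource and a
# processing backend are required; everything else falls back to defaults.
pipeline = {
    "datasource": {"type": "bigquery", "table": "project.dataset.table"},
    "processing_backend": {"type": "dataflow", "region": "europe-west1"},
}

def validate(cfg):
    """Return the list of required keys missing from the config."""
    required = ("datasource", "processing_backend")
    return [k for k in required if k not in cfg]
```

A config with both keys present passes validation; an empty one reports both as missing.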

Production scenarios however often look different, so the Core Engine is built with integrations in mind. We support a broad range of integrations for processing, training and serving, and we are going to provide the ability to add custom backends in the near future.


We believe practitioners should not have to deal with the complexity of managing infrastructure - unless they want to. Read on for an overview of the available integrations and the specifics of each category.


Datasources

All supported datasource integrations are fully versionable through our commit-style interface.

  • Google BigQuery

  • PostgreSQL

  • MySQL

  • MongoDB

  • JSON files

  • CSV files
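A commit-style interface means each version of a datasource is pinned as an immutable snapshot, so a pipeline can always be re-run against exactly the data it originally saw. A minimal sketch of that idea, with illustrative names rather than the Core Engine's actual interface:

```python
import hashlib
import json

# Each "commit" pins an immutable snapshot of the datasource, addressed by
# a content hash, so older versions remain retrievable by hash.
commits = []

def commit(rows, message):
    digest = hashlib.sha256(
        json.dumps(rows, sort_keys=True).encode()
    ).hexdigest()
    commits.append({"hash": digest, "message": message, "rows": list(rows)})
    return digest

def checkout(digest):
    for c in commits:
        if c["hash"] == digest:
            return c["rows"]
    raise KeyError(digest)

v1 = commit([{"id": 1, "label": "a"}], "initial load")
v2 = commit([{"id": 1, "label": "a"}, {"id": 2, "label": "b"}], "new rows")
```

Checking out `v1` after `v2` was committed still yields the original single-row snapshot, which is what makes trainings on older data versions repeatable.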

Processing Backends

Processing backends are the literal backbone of your pipelines: they run the actual computation of your processing tasks. Some backends also handle model training, while others require an additional training backend.

  • Kubeflow (processing & training)

  • Google AI Platform Pipelines (processing & training)

  • Airflow (processing & training)

  • Google Composer (processing & training)

  • Apache Spark (processing only)

  • Google Dataflow (processing only)

Training Backends

Training backends can greatly enhance your pipeline performance, e.g. via GPU support or a seamless transition from training to serving.

  • Google AI Platform

  • AWS Sagemaker

  • Azure ML

Serving Backends (coming soon)

Pipelines can transition seamlessly from training to serving when connected to a serving backend. While some processing backends can be used for batch-inference "serving", real-time serving requires a dedicated serving backend to be configured.

  • Google AI Platform

  • Seldon

  • Cortex