Whether you're a new or an experienced user, there is plenty to discover about the Core Engine. We've collected (and continue to expand) this information and present it in a digestible way so you can get started and build.
Most people reading this will want to know why they should use yet another MLOps tool that supposedly solves all production problems. The simple answer is that there is still no single solution out there that really solves the ML-in-production headache. Most tools solve either for really Ops-y problems (CI/CD, deployments, feature stores) or for really Data Science-y problems (remote kernels, metadata tracking, hyperparameter tuning). The tools that are truly state-of-the-art and come close are not approachable, financially or technologically, for hobbyists or smaller teams that just want to get models into production. The result is that 87% of ML models never make it into production, and those that do tend to be looked after by enormous engineering teams with big budgets.
The team behind the Core Engine has been through the wringer putting models in production, and has built the Core Engine from the perspective of both ML and Ops people. Our goal with the Core Engine is to provide a neat interface for data scientists to write production-ready code from day 0 of training, and to provide a configurable, extensible and managed backend for Ops people to keep things chugging along.
Last but not least, our hope is that the Core Engine provides hobbyists and smaller companies with a golden path to put models in production. With our free plan, you can start writing production-ready ML pipelines immediately.
For the people who actually create models and run experiments, the Core Engine exposes a simple interface to plug and play your models and data. You can run experiments remotely with minimal effort, and use the built-in automatic evaluation mechanisms to analyze what happened. The goal is for you to follow the pandas/numpy/scikit paradigm you are familiar with as closely as possible, but to end up with production-ready, scalable and deployable models.
For the people who are responsible for managing the infrastructure and tasked with navigating the ever-changing ML ecosystem, the Core Engine should be seen as a platform that provides high-level integrations to various backends that are tedious to build and maintain. If you want to swap out some components of the Core Engine for others, you are free to do so! For example, if you want to deploy on a different cloud provider (AWS, GCP, Azure) or use a different data processing backend (Spark, Dataflow, etc.), the Core Engine lets you do that.
The Core Engine is an end-to-end MLOps platform that serves multiple roles in your machine learning workflow. It is:
A workload processing engine - it processes and executes your code (in a distributed environment).
An orchestrator - it automates configuration, management, and coordination of your ML workloads.
An ML framework - it provides built-in plug-ins for common tasks (like evaluation and serving).
A standardized interface - it lets you quickly configure and run pipelines, from data ingestion to training, evaluation, and finally serving.
If you are coming from the land of writing Jupyter notebooks, scripts and glue-code to get your ML experiments or pipelines going, you should give the Core Engine a try. The Core Engine will provide you with an easy way to run your code in a distributed, transparent and tracked environment. You can leverage all the perks of running in a production-ready environment, but without the overhead of setting up the Ops and the datasources, organizing it all, and writing the code that brings everything together into one coherent environment for your organization.
The Core Engine takes care of much of the hassle of ML development, so you can focus on writing your app without needing to reinvent the wheel. By providing an easy-to-use and powerful computing platform, we expedite the transition of ML models to production services.
To simplify things, the Core Engine lets you create an ML pipeline either through Python or the command line. No matter how you create it, the result is an easy-to-read YAML configuration file containing all the information required to uniquely identify what this pipeline is set up to do. This YAML file is an immutable source of ground truth for your colleagues that you can always trust, no matter when it was produced and by whom.
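To make the idea of a configuration that "uniquely identifies" a pipeline more concrete, here is a minimal sketch using only the Python standard library. It is not the Core Engine's actual schema or API: the config keys, the `pipeline_fingerprint` helper, and the use of JSON in place of YAML are all assumptions made for illustration. The point is that a declarative config can be serialized canonically and hashed, so two configs with the same content get the same identifier regardless of key order.

```python
import hashlib
import json


def pipeline_fingerprint(config: dict) -> str:
    """Return a stable SHA-256 hex digest of a pipeline configuration.

    Hypothetical helper, not part of the Core Engine.
    """
    # sort_keys makes the serialization independent of dict insertion order,
    # so logically identical configs always produce the same bytes.
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical config contents; every key and value here is illustrative.
config_a = {
    "datasource_commit": "example-commit-id",
    "split": {"train": 0.8, "eval": 0.2},
    "trainer": {"fn": "my_trainer", "params": {"epochs": 10}},
}

# Same content as config_a, written in a different key order.
config_b = {
    "trainer": {"params": {"epochs": 10}, "fn": "my_trainer"},
    "split": {"eval": 0.2, "train": 0.8},
    "datasource_commit": "example-commit-id",
}

# A genuinely different pipeline: one hyperparameter changed.
config_c = {**config_a, "trainer": {"fn": "my_trainer", "params": {"epochs": 20}}}

same = pipeline_fingerprint(config_a) == pipeline_fingerprint(config_b)
different = pipeline_fingerprint(config_a) != pipeline_fingerprint(config_c)
print(same, different)
```

Because the fingerprint depends only on content, a colleague who receives the same file can recompute it and confirm they are looking at the same pipeline definition.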
Each pipeline is connected to a datasource commit - an immutable snapshot of any supported datasource. By versioning datasources using the Core Engine, you are able to track precisely what flows through your pipelines at any moment in time. The Core Engine supports multiple datasource types (images, tabular, text) and sources (relational databases, blob storage, etc.).
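One way to picture a datasource commit is as a read-only record whose identifier is derived from the data it contains. The sketch below assumes exactly that and nothing more: `DatasourceCommit`, `commit_datasource`, and the content-hashing scheme are hypothetical names and choices for illustration, not the Core Engine's implementation.

```python
import hashlib
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)  # frozen: attributes cannot be reassigned after creation
class DatasourceCommit:
    """Hypothetical immutable snapshot record for a datasource."""
    datasource_name: str
    commit_id: str
    row_count: int


def commit_datasource(name: str, rows: Sequence[bytes]) -> DatasourceCommit:
    """Create a snapshot whose ID is a hash of the row contents."""
    digest = hashlib.sha256()
    for row in rows:
        # Length-prefix each row so that row boundaries affect the hash:
        # [b"ab", b"c"] and [b"a", b"bc"] must not collide.
        digest.update(len(row).to_bytes(8, "big"))
        digest.update(row)
    return DatasourceCommit(name, digest.hexdigest(), len(rows))


first = commit_datasource("census", [b"alice,34", b"bob,51"])
again = commit_datasource("census", [b"alice,34", b"bob,51"])
grown = commit_datasource("census", [b"alice,34", b"bob,51", b"carol,29"])

# Identical data yields an identical commit; new data yields a new one.
print(first == again, first.commit_id != grown.commit_id)
```

A pipeline that records the `commit_id` it ran against can later be matched to exactly the rows it saw, even if the underlying datasource has since grown.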
The actual code that is being executed also exists in different types of functions that users can create asynchronously or during the creation of a pipeline. This creates a complete separation of the code from the configuration, and lets the Core Engine automatically track the important metadata that is necessary to keep an eye on as you progress through the ML life-cycle.
At the end of each training pipeline, the model is deployed on a supported backend as an endpoint. You are then able to schedule recurring training pipelines based on time or data triggers, run batch inference pipelines (also on a schedule if needed), and run evaluation pipelines on other datasources according to your requirements. This way, every pipeline produces artifacts that are battle-tested and production-ready from day 1.
All the computation, training and deployment in the Core Engine is done on multiple supported backends, which can be swapped in and out according to your wishes. We will publish a full list of supported environments and backends soon!
Declarative configurations for pipelines, models and datasources guarantee repeatable, reliable and comparable experiments.
Machine learning development involves repetitive experimentation. Thanks to native caching of all computations, you'll never have to run the same computation twice, saving time and money on all subsequent experiments.
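The caching idea can be sketched in a few lines: derive a key from everything that determines a step's output (the step name, its parameters, and the identity of its input), and only execute the step when that key has not been seen before. This is a simplified in-memory illustration under those assumptions; `StepCache` and its key scheme are hypothetical and do not describe how the Core Engine caches internally.

```python
import hashlib
import json
from typing import Any, Callable, Dict


class StepCache:
    """Hypothetical in-memory cache keyed by step name, params, and input ID."""

    def __init__(self) -> None:
        self._store: Dict[str, Any] = {}
        self.hits = 0
        self.misses = 0

    def run(self, step: str, params: dict, input_id: str,
            fn: Callable[[], Any]) -> Any:
        # The key covers everything that determines the step's output.
        material = json.dumps(
            {"step": step, "params": params, "input": input_id},
            sort_keys=True,
        )
        key = hashlib.sha256(material.encode("utf-8")).hexdigest()
        if key in self._store:
            self.hits += 1
            return self._store[key]  # reuse the earlier result
        self.misses += 1
        result = fn()  # only executed on a cache miss
        self._store[key] = result
        return result


calls = []


def expensive_preprocess() -> int:
    calls.append("ran")  # record each real execution
    return 42


cache = StepCache()
cache.run("preprocess", {"scale": True}, "commit-1", expensive_preprocess)
cache.run("preprocess", {"scale": True}, "commit-1", expensive_preprocess)   # cached
cache.run("preprocess", {"scale": False}, "commit-1", expensive_preprocess)  # new params
print(len(calls), cache.hits, cache.misses)
```

Changing either the parameters or the input identifier produces a new key, so only the steps actually affected by a change are recomputed.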
The Core Engine supports a wide variety of plugins and tools, including your favorite ones such as TensorBoard and the What-If tool (WIP). They all come pre-configured and out-of-the-box as a result of every pipeline run.
With big enough data, it can take hours to crunch through everything on a single machine. The Core Engine uses distributed data processing technologies (Apache Beam) for efficient execution, reducing hours of computation to just minutes.
The most convenient way (in our opinion at least) to interact with the Core Engine is through the CLI. Installing is easy:
pip install cengine
Once the installation is completed, you can check whether the installation was successful through:
The Core Engine is an end-to-end platform to run ML experiments.
Experiments are conceptualized as pipelines, which are bundled into workspaces. There is caching, config files, evaluation, model architectures and so much more.
Once you've successfully created your account and completed the installation, you're ready to go. If you're new to the Core Engine, jump right into the quick start to run your first pipeline within minutes!