Managing versions

Forget about my_datasource, my_datasource_v2, my_datasource_final, my_datasource_realfinal...

Datasource commits

Datasources come in all shapes and sizes. Most small projects use a relatively small, rarely changing, almost static datasource. As you start to work on bigger projects, however, there is a high chance that you will have to handle a datasource that grows over time, and you will have to deal with its dynamic nature too.

That is exactly where Core Engine comes into play with the concept of datasource commits. Creating a commit takes a snapshot of your datasource and stores it. This way, you can collaborate with your teammates on the same version of your datasource, and you end up with traceable and repeatable experiments even on ever-changing datasources.

Creating a datasource commit

You can create a new commit of a selected datasource as follows:

CLI
cengine datasource commit DATASOURCE_ID [OPTIONS]

ARGUMENT      | TYPE   | DESCRIPTION
--------------+--------+-----------------------------------
DATASOURCE_ID | string | The ID of the selected datasource

When creating the commit, you also have the option to write a commit message and define a schema. The message is only used as an annotation for that specific commit, whereas the schema determines how the data is restored. If you do not specify a schema, Core Engine will automatically infer it from the datapoints in your datasource.

OPTIONS   | TYPE   | DESCRIPTION
----------+--------+-------------------------------------
--message | string | A message for the datasource commit
--schema  | path   | Path to a schema file
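
If you script your commits, the argument and options above can be assembled programmatically. The following is a minimal sketch in plain Python that only builds the command line; it is not the Python SDK. It assumes the cengine executable is on your PATH, and the helper name build_commit_command, the datasource ID and the schema file name are hypothetical and used for illustration only.

```python
import shlex


def build_commit_command(datasource_id, message=None, schema=None):
    """Build the argv list for `cengine datasource commit`.

    Hypothetical helper: it only assembles the command described above
    and does not execute anything.
    """
    if not datasource_id:
        raise ValueError("datasource_id must be a non-empty string")
    argv = ["cengine", "datasource", "commit", datasource_id]
    if message is not None:
        # Optional annotation for this commit
        argv += ["--message", message]
    if schema is not None:
        # Optional path to a schema file; when omitted, the schema is inferred
        argv += ["--schema", str(schema)]
    return argv


# Hypothetical ID and schema path, for illustration only
argv = build_commit_command(
    "my_datasource_id", message="first try", schema="schema.yaml"
)

# Print a copy-pasteable version of the command
print(shlex.join(argv))
```

To actually run the command, the list could be passed to subprocess.run(argv, check=True). Building a list rather than a single string avoids quoting problems when the commit message contains spaces.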

Python SDK

Currently, this feature is unavailable.

We are working hard to create a Python SDK for the Core Engine. Please see our roadmap for an indication on when this feature will be released.

CE Dashboard

Currently, this feature is unavailable.

We are working hard to create a dashboard for the Core Engine. Please see our roadmap for an indication on when this feature will be released.

Listing the commits

You can also list the commits of a datasource with:

CLI
cengine datasource commits DATASOURCE_ID

This should result in a commits table with some details such as:

Selection | ID       | Created At          | Status  | Message
----------+----------+---------------------+---------+------------
*         | a27g9c41 | 2020-07-21 18:04:50 | Success | first try
          | f3da3h91 | 2020-07-21 18:07:56 | Success | second try
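
If you want to post-process this listing in a script, the table can be parsed as text. The sketch below is plain Python, not part of Core Engine, and it assumes the output has exactly the pipe-separated layout shown above, with an asterisk in the Selection column marking the active commit. The function names parse_commits_table and active_commit_id are hypothetical.

```python
def parse_commits_table(text):
    """Parse the pipe-separated commits table into a list of dicts."""
    header = None
    rows = []
    for line in text.splitlines():
        stripped = line.strip()
        # Skip blank lines and the "----+----" separator line
        if not stripped or set(stripped) <= set("-+"):
            continue
        if header is None:
            header = [cell.strip() for cell in line.split("|")]
            continue
        # Limit the splits so a "|" inside the message stays in the last cell
        cells = [cell.strip() for cell in line.split("|", len(header) - 1)]
        rows.append(dict(zip(header, cells)))
    return rows


def active_commit_id(rows):
    """Return the ID of the row marked with '*', or None if there is none."""
    for row in rows:
        if row.get("Selection") == "*":
            return row["ID"]
    return None


# Sample text copied from the table above
SAMPLE = """\
Selection | ID       | Created At          | Status  | Message
----------+----------+---------------------+---------+------------
*         | a27g9c41 | 2020-07-21 18:04:50 | Success | first try
          | f3da3h91 | 2020-07-21 18:07:56 | Success | second try
"""

rows = parse_commits_table(SAMPLE)
print(active_commit_id(rows))
for row in rows:
    print(row["ID"], row["Status"], row["Message"])
```

Parsing printed output is fragile by nature, so treat this as a stopgap: if the layout of the listing changes, the parser has to change with it.
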
Python SDK

Currently, this feature is unavailable.

We are working hard to create a Python SDK for the Core Engine. Please see our roadmap for an indication on when this feature will be released.

CE Dashboard

Currently, this feature is unavailable.

We are working hard to create a dashboard for the Core Engine. Please see our roadmap for an indication on when this feature will be released.

Selecting an active commit

While working on different pipelines on the Core Engine, you will have to select a datasource commit to work on, and you can do so in two different ways. You can either define it explicitly upon pipeline execution, or you can select a datasource commit as the active datasource commit, in which case the Core Engine will use the selected commit whenever one is required.

CLI
cengine datasource set SOURCE_ID

ARGUMENT  | TYPE   | DESCRIPTION
----------+--------+------------------------------------
SOURCE_ID | string | Identifier for the selected commit

You can define the SOURCE_ID by using one of the following formats:

  • DATASOURCE_ID: the Core Engine will select the latest commit of the datasource as the active datasource commit

  • DATASOURCE_ID:COMMIT_ID: the Core Engine will select the specified commit as the active datasource commit
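
The two SOURCE_ID formats above can be told apart with a small amount of string handling, which is useful if you validate identifiers before calling the CLI. This is a minimal sketch in plain Python under the assumption that neither ID contains a colon; SourceRef and parse_source_id are hypothetical names, not part of Core Engine.

```python
from typing import NamedTuple, Optional


class SourceRef(NamedTuple):
    datasource_id: str
    commit_id: Optional[str]  # None means "use the latest commit"


def parse_source_id(source_id):
    """Split a SOURCE_ID into its datasource ID and optional commit ID."""
    datasource_id, sep, commit_id = source_id.partition(":")
    if not datasource_id:
        raise ValueError("SOURCE_ID must start with a datasource ID")
    if sep and not commit_id:
        raise ValueError("SOURCE_ID ends with ':' but has no commit ID")
    if ":" in commit_id:
        # Assumption: IDs never contain ':', so more than one is an error
        raise ValueError("SOURCE_ID may contain at most one ':'")
    return SourceRef(datasource_id, commit_id or None)


# Hypothetical datasource ID; the commit ID is taken from the listing example
print(parse_source_id("my_datasource_id"))
print(parse_source_id("my_datasource_id:a27g9c41"))
```

A result with commit_id set to None corresponds to the first format, where the latest commit of the datasource becomes the active one.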

Python SDK

Currently, this feature is unavailable.

We are working hard to create a Python SDK for the Core Engine. Please see our roadmap for an indication on when this feature will be released.

CE Dashboard

Currently, this feature is unavailable.

We are working hard to create a dashboard for the Core Engine. Please see our roadmap for an indication on when this feature will be released.