Datasources come in all shapes and sizes. Although most small projects use a relatively small, rarely changing, almost static datasource, as you move on to bigger projects there is a high chance that you will have to handle a datasource that grows over time, along with its dynamic nature.
That is exactly the point where Core Engine comes into play with the concept of datasource commits. When you create a commit, it creates a snapshot of your datasource and stores it. This way, you can collaborate with your teammates on the same version of your datasource and you end up with traceable and repeatable experiments even on ever-changing datasources.
You can create a new commit of a selected datasource as follows:
cengine datasource commit DATASOURCE_ID [OPTIONS]
DATASOURCE_ID: the ID of the selected datasource
When you create a commit, you can also write a commit message and define a schema. The message is only an annotation for that specific commit, whereas the schema determines how the data is restored. If you do not specify a schema, the Core Engine automatically infers it from the datapoints in your datasource.
The available [OPTIONS] let you provide:
a message for the datasource commit
the path to a schema file
You can also list the commits of a datasource with:
cengine datasource commits DATASOURCE_ID
This prints a table of commits with details such as:
 Selection | ID       | Created At          | Status  | Message
-----------+----------+---------------------+---------+------------
 *         | a27g9c41 | 2020-07-21 18:04:50 | Success | first try
           | f3da3h91 | 2020-07-21 18:07:56 | Success | second try
When working on pipelines in the Core Engine, you have to select a datasource to work on, and you can do so in two ways. You can either specify it explicitly when you execute a pipeline, or set a datasource commit as the active datasource commit, in which case the Core Engine uses that commit whenever one is required.
cengine datasource set SOURCE_ID
SOURCE_ID: the identifier of the selected commit
You can define SOURCE_ID using one of the following formats:
DATASOURCE_ID: the Core Engine selects the latest commit of the datasource as the active datasource commit
DATASOURCE_ID:COMMIT_ID: the Core Engine selects the specified commit as the active datasource commit
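To make the two SOURCE_ID formats concrete, here is a minimal shell sketch of how such an identifier splits into its parts. The function name parse_source_id, the example IDs, and the word "latest" are hypothetical illustrations, not part of the cengine CLI, which does its own parsing. The sketch only mirrors the rule described above: the part before the colon is the datasource ID, and the part after it, if present, is the commit ID.

```shell
#!/bin/sh
# Hypothetical helper, NOT part of the cengine CLI: splits a SOURCE_ID
# into "<datasource> <commit>" following the two formats listed above.
parse_source_id() {
  source_id="$1"
  # Everything before the first colon is the datasource ID.
  datasource_id="${source_id%%:*}"
  case "$source_id" in
    # DATASOURCE_ID:COMMIT_ID -> everything after the first colon.
    *:*) commit_id="${source_id#*:}" ;;
    # DATASOURCE_ID alone -> placeholder standing for the latest commit.
    *)   commit_id="latest" ;;
  esac
  printf '%s %s\n' "$datasource_id" "$commit_id"
}

parse_source_id "my_datasource"           # prints: my_datasource latest
parse_source_id "my_datasource:a27g9c41"  # prints: my_datasource a27g9c41
```

In practice you would pass either form directly to cengine datasource set; the helper exists only to show which part of the identifier plays which role.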