Pipelines - Running your pipelines

Registering a pipeline

Once you have created a configuration file, you can submit and run a pipeline based on this configuration.

cengine pipeline push [CONFIG_PATH] [PIPELINE_NAME] [--workers] [--cpus_per_worker]

kind     | name              | type | description                       | required
---------+-------------------+------+-----------------------------------+---------
argument | CONFIG_PATH       | str  | Path to the config file to submit | True
argument | PIPELINE_NAME     | str  | Name of the new pipeline          | True
option   | --workers         | int  | Number of desired workers         | False
option   | --cpus_per_worker | int  | Number of CPUs per worker         | False

As described in the table above, for the push to succeed, the path to the configuration file and a name for the new pipeline must be specified. The --workers and --cpus_per_worker options are optional; if they are not provided, cengine will try to derive the best settings based on the selected datasource.

note

For instance, if you do:

cengine pipeline push my_first_config.yaml my_first_pipeline --workers 5 --cpus_per_worker 4

cengine uses the configuration file named my_first_config.yaml, which resides in the working directory, and registers a pipeline called my_first_pipeline. Moreover, the pipeline will be executed with 5 workers, each of which will have 4 CPUs available.
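If you script the push step, the command line can be assembled programmatically. The following is a minimal Python sketch, not part of cengine: build_push_command is a hypothetical helper that only uses the arguments and options documented for the push command, and it leaves out the optional flags when no value is given.

```python
from typing import List, Optional


def build_push_command(config_path: str,
                       pipeline_name: str,
                       workers: Optional[int] = None,
                       cpus_per_worker: Optional[int] = None) -> List[str]:
    """Assemble the argv for `cengine pipeline push` (hypothetical helper)."""
    cmd = ["cengine", "pipeline", "push", config_path, pipeline_name]
    # Optional flags are appended only when a value was given, so that
    # cengine can derive its own settings otherwise.
    if workers is not None:
        cmd += ["--workers", str(workers)]
    if cpus_per_worker is not None:
        cmd += ["--cpus_per_worker", str(cpus_per_worker)]
    return cmd


cmd = build_push_command("my_first_config.yaml", "my_first_pipeline",
                         workers=5, cpus_per_worker=4)
print(" ".join(cmd))
```

On a machine where cengine is installed, the resulting list could be passed to subprocess.run(cmd, check=True).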

Listing the registered pipelines

Once a workspace is selected, it is also possible to see a list of the registered pipelines through:

cengine pipeline list

This command will not just display the pipelines registered by a single user, but all the pipelines within the context of the selected workspace.

note

For instance, if you followed the last note, executing:

cengine pipeline list

should display something structured like the following:

You are working on the workspace:
Name: demo_workspace
ID: 1
Fetching pipelines. This might take a few seconds..
Currently, you have 2 different pipeline(s) in workspace 1.
ID | Name | Status | Start Time | End Time
---------------+-----------------------------------+------------+------------+------------
PIPELINE_ID_1 | my_colleagues_first_pipeline | NotStarted | |
PIPELINE_ID_2 | my_first_pipeline | NotStarted | |
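If you want to process such a listing in a script, the table can be parsed into records. The sketch below is a minimal Python example, not part of cengine: parse_cli_table is a hypothetical helper, and it assumes the output keeps the layout shown above (a header row, a separator row made of '-' and '+', then one row per pipeline).

```python
from typing import Dict, List


def parse_cli_table(output: str) -> List[Dict[str, str]]:
    """Parse a pipe-separated table laid out like the example listing."""
    lines = output.splitlines()
    # The separator row consists only of '-' and '+' characters; the header
    # is the line right before it and the data rows follow it.
    sep_index = next(i for i, line in enumerate(lines)
                     if line.strip() and set(line.strip()) <= {"-", "+"})
    header = [cell.strip() for cell in lines[sep_index - 1].split("|")]
    rows = []
    for line in lines[sep_index + 1:]:
        if "|" not in line:
            continue  # skip blank or non-table lines
        cells = [cell.strip() for cell in line.split("|")]
        rows.append(dict(zip(header, cells)))
    return rows


sample = """\
ID | Name | Status | Start Time | End Time
---------------+-----------------------------------+------------+------------+------------
PIPELINE_ID_1 | my_colleagues_first_pipeline | NotStarted | |
PIPELINE_ID_2 | my_first_pipeline | NotStarted | |
"""

pipelines = parse_cli_table(sample)
# Collect the names of all pipelines that have not been run yet.
not_started = [p["Name"] for p in pipelines if p["Status"] == "NotStarted"]
print(not_started)
```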

Executing a pipeline

In the Core Engine, once you push a configuration file, the corresponding pipeline gets registered in the workspace. However, as seen in the example above, the initial status of the pipeline will always be NotStarted, and it will not be executed until you decide to run it:

cengine pipeline run [PIPELINE_ID]

kind     | name        | type | description               | required
---------+-------------+------+---------------------------+---------
argument | PIPELINE_ID | int  | ID of the pipeline to run | True

note

If you want to continue the example, you can run my_first_pipeline using its ID from the table above:

cengine pipeline run PIPELINE_ID_2

This will execute the pipeline run and update its status in the list.

If you want to observe the state of the started pipeline runs, you can use:

cengine pipeline status

This will show you details such as the current status of the pipelines, their completion rate, and their cost and execution time.

note

After my_first_pipeline finishes successfully, if you execute:

cengine pipeline status

you will see a table similar to the one below, which shows the detailed status of your pipeline.

ID | Name | Pipeline Status | Completion | Compute Cost | Training Cost | Total Cost | Execution Time
---------------+-------------------+-----------------+------------+--------------+---------------+------------+----------------
PIPELINE_ID_2 | my_first_pipeline | Succeeded | 100% | 1.627 | 1.412 | 2.97 | 0:26:30
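For monitoring scripts, the fields of a status row can be converted into typed values. The following is a minimal Python sketch under the assumption that the columns keep the format of the example row above ('100%' for completion, 'H:MM:SS' for execution time); parse_execution_time is a hypothetical helper, not a cengine function.

```python
from datetime import timedelta


def parse_execution_time(value: str) -> timedelta:
    """Convert an 'H:MM:SS' string such as '0:26:30' into a timedelta."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


header = ("ID | Name | Pipeline Status | Completion | Compute Cost | "
          "Training Cost | Total Cost | Execution Time")
row = ("PIPELINE_ID_2 | my_first_pipeline | Succeeded | 100% | 1.627 | "
       "1.412 | 2.97 | 0:26:30")

# Pair each column name with the matching cell of the row.
record = dict(zip((h.strip() for h in header.split("|")),
                  (c.strip() for c in row.split("|"))))

completion = float(record["Completion"].rstrip("%")) / 100
duration = parse_execution_time(record["Execution Time"])
print(record["Pipeline Status"], completion, duration.total_seconds())
```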

Interacting with a pipeline

Additionally, the Core Engine gives its users the ability to interact with the pipeline runs which are already registered within a selected workspace. The first way of doing this is to use the pull command, which downloads the configuration of a specified pipeline to a desired location.

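cengine pipeline pull [PIPELINE_ID] [--output_path]

kind     | name          | type | description                    | required
---------+---------------+------+--------------------------------+---------
argument | PIPELINE_ID   | int  | ID of the pipeline to pull     | True
option   | --output_path | str  | Path to download the config to | False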

If --output_path is not provided, the configuration is downloaded to the working directory and named 'ce_config.yaml'.

note

For instance, following our example from above, a user within the same workspace as you can do the following:

cengine pipeline pull PIPELINE_ID_2 --output_path=config.yaml

in order to download the config file that you submitted when you registered your pipeline. This opens up a collaboration opportunity, where they can use it either to trace what has been done before or as a template for future pipeline runs.
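The documented default for --output_path can be mirrored in a script, for example to check beforehand where a pulled configuration would end up. This is a minimal Python sketch, not part of cengine: resolve_pull_destination and build_pull_command are hypothetical helpers, and whether cengine overwrites an existing file is not stated here, so the warning below is only an assumption.

```python
from pathlib import Path
from typing import List, Optional, Union

# Default file name used when --output_path is omitted.
DEFAULT_CONFIG_NAME = "ce_config.yaml"


def resolve_pull_destination(output_path: Optional[str] = None) -> Path:
    """Return where the config is expected to land (hypothetical helper)."""
    if output_path is None:
        return Path.cwd() / DEFAULT_CONFIG_NAME
    return Path(output_path)


def build_pull_command(pipeline_id: Union[int, str],
                       output_path: Optional[str] = None) -> List[str]:
    """Assemble the argv for `cengine pipeline pull` (hypothetical helper)."""
    cmd = ["cengine", "pipeline", "pull", str(pipeline_id)]
    if output_path is not None:
        cmd.append(f"--output_path={output_path}")
    return cmd


destination = resolve_pull_destination()
if destination.exists():
    # Assumption: an existing file might be replaced by the pull.
    print(f"{destination} already exists and might be overwritten")
print(" ".join(build_pull_command("PIPELINE_ID_2", "config.yaml")))
```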

Lastly, users can use the update command to change the number of workers and the number of CPUs per worker before the execution of a pipeline.

cengine pipeline update [PIPELINE_ID] [--workers] [--cpus_per_worker]

kind     | name              | type | description                | required
---------+-------------------+------+----------------------------+---------
argument | PIPELINE_ID       | int  | ID of the selected pipeline | True
option   | --workers         | int  | Number of desired workers  | True
option   | --cpus_per_worker | int  | Number of CPUs per worker  | True