Connecting datasources

Because every pipeline needs a datasource!

Register a datasource

To register a new datasource on the Core Engine, do the following:

CLI
cengine datasource create NAME DS_TYPE SOURCE [ARGS] [OPTIONS]

ARGUMENT | TYPE   | DESCRIPTION
---------+--------+----------------------------------------------------------------------
NAME     | string | The name given to the datasource
DS_TYPE  | string | The type of data contained in the datasource: tabular, images or text
SOURCE   | string | The source where the data resides
ARGS     | string | Multiple arguments for the selected type of source

Apart from NAME, DS_TYPE and SOURCE, you also have to provide additional parameters when you register a new datasource. Which parameters are required depends on the value of the SOURCE parameter.

As of now, the only supported SOURCE is bq, which stands for Google BigQuery. You can learn about the parameters required to create a bq datasource here.

We are working hard to make the Core Engine compatible with more SOURCE types. Please see our roadmap for an indication of when new SOURCE variants will be added.

The Core Engine also helps you version your data through datasource commits. The first commit of your datasource is created when you register it. That is why you have the option to provide a commit message and a schema, which are used for this first datasource commit. You can find more information about datasource versioning here.

OPTIONS   | TYPE   | DESCRIPTION
----------+--------+------------------------------------
--message | string | A message for the datasource commit
--schema  | path   | Path to a schema file
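
If you drive the CLI from a script, it can help to assemble the command in one place. The Python sketch below builds the argument list for cengine datasource create in the order given by the synopsis (NAME DS_TYPE SOURCE [ARGS] [OPTIONS]). It also checks DS_TYPE and SOURCE against the values listed above. The helper name build_create_command is hypothetical and is not part of the Core Engine. The bq-specific ARGS are not covered on this page, so the sketch passes them through unchanged as an opaque list.

```python
import shlex
from typing import List, Optional, Sequence

# DS_TYPE values listed in the arguments table.
VALID_DS_TYPES = {"tabular", "images", "text"}
# bq (Google BigQuery) is currently the only supported SOURCE.
VALID_SOURCES = {"bq"}


def build_create_command(
    name: str,
    ds_type: str,
    source: str,
    source_args: Sequence[str] = (),
    message: Optional[str] = None,
    schema: Optional[str] = None,
) -> List[str]:
    """Build the argv for `cengine datasource create` without running it.

    Hypothetical helper, not part of the Core Engine CLI or SDK.
    source_args stands in for the source-specific ARGS; they are passed
    through unchanged because their exact form is documented elsewhere.
    """
    if ds_type not in VALID_DS_TYPES:
        raise ValueError(f"DS_TYPE must be one of {sorted(VALID_DS_TYPES)}, got {ds_type!r}")
    if source not in VALID_SOURCES:
        raise ValueError(f"unsupported SOURCE {source!r}; supported: {sorted(VALID_SOURCES)}")
    # Positional arguments first: NAME DS_TYPE SOURCE [ARGS]
    argv = ["cengine", "datasource", "create", name, ds_type, source, *source_args]
    # [OPTIONS]: message and schema for the first datasource commit.
    if message is not None:
        argv += ["--message", message]
    if schema is not None:
        argv += ["--schema", schema]
    return argv


if __name__ == "__main__":
    argv = build_create_command("FirstDatasource", "tabular", "bq", message="Initial import")
    # Print a copy-pasteable, correctly quoted command line.
    print(shlex.join(argv))
    # prints: cengine datasource create FirstDatasource tabular bq --message 'Initial import'
```

The example omits the bq-specific ARGS, so this exact command would most likely be incomplete for a real registration. To run a complete command, pass the list to subprocess.run(argv, check=True). Passing a list rather than a single string avoids shell-quoting problems with commit messages that contain spaces.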

Python SDK

Currently, this feature is unavailable.

We are working hard to create a Python SDK for the Core Engine. Please see our roadmap for an indication on when this feature will be released.

CE Dashboard

Currently, this feature is unavailable.

We are working hard to create a dashboard for the Core Engine. Please see our roadmap for an indication on when this feature will be released.

Listing registered datasources

If you want to see all the datasources registered within your organization, do the following:

CLI
cengine datasource list

This prints a table of all registered datasources and their details, for example:

ID | Name | Type | # Commits | Latest Commit
----------+------------------+---------+-------------+---------------------
fb106f24 | FirstDatasource | tabular | 1 | 2020-07-23 13:38:19
0fef3355 | SecondDatasource | tabular | 1 | 2020-07-23 13:40:36
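
The listing is printed as a plain-text table. If you need it in a script, a small parser can turn it into one dictionary per datasource, keyed by the column headers. The following is a minimal sketch that assumes the layout matches the example above: a header row, a separator row of dashes and plus signs, then one pipe-separated row per datasource. It also assumes that datasource names contain no | characters. parse_datasource_list is a hypothetical helper, not part of the CLI.

```python
from typing import Dict, List


def parse_datasource_list(output: str) -> List[Dict[str, str]]:
    """Parse the table printed by `cengine datasource list` into a list of dicts.

    Hypothetical helper. Assumes a header row, a separator row made of
    '-' and '+', and one '|'-separated row per datasource.
    """
    # Ignore blank lines, e.g. a trailing newline at the end of the output.
    lines = [line for line in output.splitlines() if line.strip()]
    if not lines:
        return []
    # Header row, e.g. "ID | Name | Type | # Commits | Latest Commit".
    headers = [cell.strip() for cell in lines[0].split("|")]
    rows: List[Dict[str, str]] = []
    for line in lines[1:]:
        # Skip the "----------+------..." separator row.
        if set(line.strip()) <= {"-", "+"}:
            continue
        cells = [cell.strip() for cell in line.split("|")]
        rows.append(dict(zip(headers, cells)))
    return rows


if __name__ == "__main__":
    sample = """\
ID | Name | Type | # Commits | Latest Commit
----------+------------------+---------+-------------+---------------------
fb106f24 | FirstDatasource | tabular | 1 | 2020-07-23 13:38:19
0fef3355 | SecondDatasource | tabular | 1 | 2020-07-23 13:40:36
"""
    # Map each datasource name to its ID.
    ids_by_name = {row["Name"]: row["ID"] for row in parse_datasource_list(sample)}
    print(ids_by_name["SecondDatasource"])  # prints 0fef3355
```

To parse live output instead of the sample, you could pass subprocess.run(["cengine", "datasource", "list"], capture_output=True, text=True, check=True).stdout to the function. All values stay strings, so convert "# Commits" or "Latest Commit" yourself if you need numbers or timestamps.
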
Python SDK

Currently, this feature is unavailable.

We are working hard to create a Python SDK for the Core Engine. Please see our roadmap for an indication on when this feature will be released.

CE Dashboard

Currently, this feature is unavailable.

We are working hard to create a dashboard for the Core Engine. Please see our roadmap for an indication on when this feature will be released.