Connecting datasources

Because every pipeline needs a datasource!

Register a datasource

To register a new datasource on the Core Engine, use one of the following interfaces:

CLI
cengine datasource create NAME DS_TYPE SOURCE PROVIDER_ID [ARGS] [OPTIONS]

ARGUMENT    | TYPE   | DESCRIPTION
------------+--------+-----------------------------------------------------------------------
NAME        | string | The name given to the datasource.
DS_TYPE     | string | The type of data the datasource will contain: tabular, images or text.
SOURCE      | string | The source where the data resides.
PROVIDER_ID | string | The ID of the provider you want to use.
ARGS        | string | Additional arguments required by the selected type of source.
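If you call the CLI from a script, building the argument list explicitly avoids shell-quoting problems. The sketch below assumes only the positional order shown in the command above; the helper name `build_create_command` and the sample values (`bq`, `my-provider-id`) are hypothetical placeholders, and the format of the extra ARGS entries depends on the selected SOURCE, so they are passed through verbatim.

```python
import shlex
from typing import List, Sequence


def build_create_command(
    name: str,
    ds_type: str,
    source: str,
    provider_id: str,
    args: Sequence[str] = (),
) -> List[str]:
    """Assemble the argv for `cengine datasource create` in positional order.

    `args` is appended unchanged; its exact format depends on the selected
    SOURCE and is an assumption of this sketch.
    """
    return ["cengine", "datasource", "create", name, ds_type, source, provider_id, *args]


if __name__ == "__main__":
    # Placeholder values for illustration only.
    cmd = build_create_command("FirstDatasource", "tabular", "bq", "my-provider-id")
    # Print a shell-safe rendering; pass `cmd` to subprocess.run(cmd, check=True) to execute it.
    print(shlex.join(cmd))
```

Keeping the command as a list means `subprocess.run` receives each value as a separate argument, so names containing spaces do not need manual quoting.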

Python SDK
client.create_datasource(
    name: str,
    type: str,
    source: str,
    provider_id: str,
    args: Dict[str, Any],
) -> cengine.models.Datasource

ARGUMENT    | TYPE   | DESCRIPTION
------------+--------+-----------------------------------------------------------------------
name        | string | The name given to the datasource.
type        | string | The type of data the datasource will contain: tabular, images or text.
source      | string | The source where the data resides.
provider_id | string | The ID of the provider you want to use.
args        | Dict   | Additional arguments required by the selected type of source.
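A thin wrapper can catch obvious mistakes before a request is sent. This is a minimal sketch that assumes `client` is an already-authenticated Core Engine client and that `create_datasource` accepts the keyword arguments listed in the signature above; the wrapper name `register_datasource` and its local checks are hypothetical, not part of the SDK.

```python
from typing import Any, Dict, Optional

# Types listed for the `type` argument in the table above.
ALLOWED_TYPES = {"tabular", "images", "text"}


def register_datasource(
    client: Any,
    name: str,
    ds_type: str,
    source: str,
    provider_id: str,
    args: Optional[Dict[str, Any]] = None,
) -> Any:
    """Validate inputs locally, then forward them to client.create_datasource.

    Only the keyword arguments from the documented signature are used;
    the return value is whatever the client returns.
    """
    if ds_type not in ALLOWED_TYPES:
        raise ValueError(f"type must be one of {sorted(ALLOWED_TYPES)}, got {ds_type!r}")
    if not name:
        raise ValueError("name must be a non-empty string")
    return client.create_datasource(
        name=name,
        type=ds_type,
        source=source,
        provider_id=provider_id,
        args=args or {},  # default to an empty dict when no extra args are given
    )
```

Because the client is passed in rather than created inside the function, the wrapper can be exercised with a stand-in object in tests.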

CE Dashboard

Currently, this feature is unavailable.

We are working hard to create a dashboard for the Core Engine. Please see our roadmap for an indication of when this feature will be released.

Apart from NAME, DS_TYPE, SOURCE and PROVIDER_ID, you also have to provide additional parameters when registering a new datasource. Which parameters are required depends on the selected SOURCE and DS_TYPE. For a concrete example, see how to connect Google Cloud Platform datasources such as BigQuery.
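Since the required parameters are determined by the combination of SOURCE and DS_TYPE, one way to check them client-side is a lookup keyed on that pair. In this sketch the table contents are entirely hypothetical: `example_source` and the parameter names `project`, `dataset` and `table` are placeholders, not the Core Engine's actual requirements for BigQuery or any other source.

```python
from typing import Any, Dict, FrozenSet, Tuple

# HYPOTHETICAL requirements table: source names and parameter names are
# placeholders for illustration, not the Core Engine's real requirements.
REQUIRED_ARGS: Dict[Tuple[str, str], FrozenSet[str]] = {
    ("example_source", "tabular"): frozenset({"project", "dataset", "table"}),
}


def missing_args(source: str, ds_type: str, args: Dict[str, Any]) -> FrozenSet[str]:
    """Return the required argument names that are absent from `args`."""
    required = REQUIRED_ARGS.get((source, ds_type))
    if required is None:
        raise KeyError(f"no requirements registered for source={source!r}, type={ds_type!r}")
    # Iterating a dict yields its keys, so this removes every provided name.
    return required.difference(args)
```

An empty result means every parameter in the (hypothetical) table was supplied.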

Creating commits (versions)

The Core Engine also helps you version your data through datasource commits. The first commit of a datasource is created when you register it, which is why you have the option to provide a commit message and a schema at registration time; both apply to that first commit. You can find more information about datasource versioning here. You can specify the following options when committing:

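OPTION           | TYPE   | DESCRIPTION
-----------------+--------+-------------------------------------------------------
--message string | string | A message for the datasource commit.
--schema path    | string | Path to a schema file; see the expected format here.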
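When scripting registration, the commit options can be appended only when they are actually set. This is a minimal sketch assuming the `--message` and `--schema` flags behave as described in the table above; the helper name `with_commit_options` and the file name `schema_file` in the usage note are hypothetical, and the schema file format itself is not shown here.

```python
from typing import List, Optional


def with_commit_options(
    cmd: List[str],
    message: Optional[str] = None,
    schema_path: Optional[str] = None,
) -> List[str]:
    """Return a copy of `cmd` with the first-commit options appended, if given."""
    out = list(cmd)  # copy so the caller's list is left unchanged
    if message is not None:
        out += ["--message", message]
    if schema_path is not None:
        out += ["--schema", schema_path]
    return out
```

For example, `with_commit_options(["cengine", "datasource", "create", "FirstDatasource", "tabular", "SOURCE", "PROVIDER_ID"], message="Initial import", schema_path="schema_file")` adds both options after the positional arguments, where SOURCE, PROVIDER_ID and `schema_file` stand in for real values.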

Listing registered datasources

If you want to see all the datasources registered within your organization, follow these steps:

CLI
cengine datasource list

This will show you a list with all the registered datasources and their respective details as follows:

ID | Name | Type | # Commits | Latest Commit
----------+------------------+---------+-------------+---------------------
fb106f24 | FirstDatasource | tabular | 1 | 2020-07-23 13:38:19
0fef3355 | SecondDatasource | tabular | 1 | 2020-07-23 13:40:36
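If you need the listing in a script, the pipe-separated output above can be turned into one dictionary per datasource. This sketch assumes the output keeps the layout shown (a header row, a `---+---` separator row, then one row per datasource); the function name `parse_datasource_list` is hypothetical, and the sample text is copied from the example output.

```python
from typing import Dict, List


def parse_datasource_list(output: str) -> List[Dict[str, str]]:
    """Parse the pipe-separated table printed by `cengine datasource list`."""
    lines = [line for line in output.splitlines() if line.strip()]
    if not lines:
        return []
    header = [cell.strip() for cell in lines[0].split("|")]
    rows = []
    for line in lines[1:]:
        if set(line.strip()) <= set("-+ "):
            continue  # skip the ----+---- separator row
        cells = [cell.strip() for cell in line.split("|")]
        rows.append(dict(zip(header, cells)))
    return rows


# Sample output copied from the listing above.
SAMPLE = """\
ID | Name | Type | # Commits | Latest Commit
----------+------------------+---------+-------------+---------------------
fb106f24 | FirstDatasource | tabular | 1 | 2020-07-23 13:38:19
0fef3355 | SecondDatasource | tabular | 1 | 2020-07-23 13:40:36
"""
```

Each row is keyed by the column headers, so `parse_datasource_list(SAMPLE)[0]["Name"]` refers to the first datasource's name. If the CLI output format changes, this parser would need to change with it.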
Python SDK
# `client` is an already-initialized Core Engine client.
datasources = client.get_datasources()
print(datasources)
CE Dashboard

Currently, this feature is unavailable.

We are working hard to create a dashboard for the Core Engine. Please see our roadmap for an indication of when this feature will be released.