Time-series

The Core Engine can also work with time-series datasets. Because sequential tasks depend on the order and spacing of datapoints, a few additional parameters must be defined to enable this functionality.

Main Key: timeseries

Structurally, the timeseries block takes 4 mandatory parameters and 1 optional parameter:

Parameters                      dtype           required

resampling_rate_in_secs         int or float    True

trip_gap_threshold_in_secs      int or float    True

process_sequence_w_timestamp    string          True

process_sequence_w_category     string          False

sequence_shift                  int             True

  • resampling_rate_in_secs defines the resampling rate in seconds. It is applied in the corresponding preprocessing step.

  • trip_gap_threshold_in_secs defines the maximum gap, in seconds, between datapoints that belong to the same trip. The dataset is split into trips based on this value, and sequential transformations are applied once the data has been split.

  • process_sequence_w_timestamp specifies which data column holds the timestamp.

  • process_sequence_w_category is optional. If provided, it is used to split the data into categories before the sequential processing steps.

  • sequence_shift defines the shift (in datapoints) between consecutive sequences extracted from the dataset.
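
To make trip_gap_threshold_in_secs and sequence_shift more concrete, here is a minimal Python sketch of the two ideas. It is not the Core Engine's implementation. The function names and the seq_len argument are hypothetical and introduced only for illustration, and the sketch assumes a new trip starts whenever the gap to the previous datapoint is strictly larger than the threshold.

```python
from typing import List, Sequence


def split_into_trips(timestamps: Sequence[float],
                     trip_gap_threshold_in_secs: float) -> List[List[float]]:
    """Group timestamps (in seconds) into trips.

    Assumption: a new trip starts when the gap to the previous
    timestamp is strictly larger than the threshold.
    """
    trips: List[List[float]] = []
    for ts in sorted(timestamps):
        if trips and ts - trips[-1][-1] <= trip_gap_threshold_in_secs:
            trips[-1].append(ts)   # still within the current trip
        else:
            trips.append([ts])     # gap too large (or first point): new trip
    return trips


def extract_sequences(points: Sequence, seq_len: int,
                      sequence_shift: int) -> List[list]:
    """Return windows of seq_len points; each window starts
    sequence_shift points after the previous one.

    seq_len is a hypothetical parameter used only for this sketch.
    """
    if seq_len <= 0 or sequence_shift <= 0:
        raise ValueError("seq_len and sequence_shift must be positive")
    points = list(points)
    return [points[start:start + seq_len]
            for start in range(0, len(points) - seq_len + 1, sequence_shift)]
```

With a shift of 1, consecutive sequences overlap in all but one datapoint. A larger shift yields fewer sequences with less overlap.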


Examples

Imagine you have an asset in the field that transmits sensor data and is only active during a certain period each day. The individual sensors, however, transmit at different frequencies. To feed your model consistent, equidistant datapoints, you have to resample your dataset. In that case, you can build the timeseries block as follows:

timeseries:
  resampling_rate_in_secs: 30
  trip_gap_threshold_in_secs: 1800
  process_sequence_w_timestamp: 'timestamp_column'
  sequence_shift: 1

This instructs the Core Engine to split your data into trips that are separated by gaps of more than 30 minutes (1800 seconds), and then to resample those trips at a rate of 30 seconds, using timestamp_column as the time index. When extracting sequences after resampling, it shifts by only 1 datapoint before extracting the next sequence.
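
The resampling step can be pictured with the following minimal Python sketch. It is an illustration under stated assumptions, not the Core Engine's actual method: the function name resample_trip is hypothetical, values falling into the same 30-second bucket are averaged, and buckets without data are left as None. The aggregation and gap-filling the Core Engine really uses are not specified here.

```python
from collections import defaultdict
from statistics import mean
from typing import List, Optional, Sequence, Tuple


def resample_trip(rows: Sequence[Tuple[float, float]],
                  resampling_rate_in_secs: float
                  ) -> List[Tuple[float, Optional[float]]]:
    """Resample one trip onto an equidistant time grid.

    rows holds (timestamp_in_secs, value) pairs of a single trip.
    Assumptions: values in the same bucket are averaged and
    empty buckets are reported as None.
    """
    if not rows:
        return []
    rows = sorted(rows)
    start = rows[0][0]

    # Collect values per bucket index, counted from the trip start.
    buckets = defaultdict(list)
    for ts, value in rows:
        buckets[int((ts - start) // resampling_rate_in_secs)].append(value)

    # Emit one row per grid step, including steps without data.
    last_bucket = max(buckets)
    return [
        (start + i * resampling_rate_in_secs,
         mean(buckets[i]) if i in buckets else None)
        for i in range(last_bucket + 1)
    ]


# Readings at irregular times, resampled to a 30-second grid.
resampled = resample_trip([(0, 1.0), (10, 3.0), (35, 5.0), (95, 7.0)], 30)
```

In this example the two readings in the first 30 seconds collapse into one averaged datapoint, and the bucket starting at 60 seconds stays empty because no sensor reported in that interval.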

You can build on this example by using a fleet of assets instead of a single one. In this scenario, multiple assets might be active simultaneously, which means that during resampling, values from one asset might influence the values of another. That is exactly where the key process_sequence_w_category comes into play. It is used to split the data into categories before the sequential transformations, so that the integrity of the data is preserved within each category. The resulting block is as follows:

timeseries:
  resampling_rate_in_secs: 30
  trip_gap_threshold_in_secs: 1800
  process_sequence_w_timestamp: 'timestamp_column'
  process_sequence_w_category: 'asset_id'
  sequence_shift: 1
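
The effect of process_sequence_w_category can be sketched in Python as follows. This is only an illustration of the idea: the function name and the row layout (a list of dicts) are hypothetical, and the trip-splitting rule (a gap strictly larger than the threshold starts a new trip) is an assumption. Grouping by asset first means that datapoints from different assets never end up in the same trip, even if their timestamps interleave.

```python
from collections import defaultdict
from typing import Dict, List


def split_by_category_then_trips(rows: List[dict],
                                 category_key: str,
                                 timestamp_key: str,
                                 trip_gap_threshold_in_secs: float
                                 ) -> Dict[object, List[List[dict]]]:
    """Group rows by category, then split each group into trips."""
    # Step 1: separate the rows per category (for example per asset).
    by_category = defaultdict(list)
    for row in rows:
        by_category[row[category_key]].append(row)

    # Step 2: split each category into trips independently.
    result: Dict[object, List[List[dict]]] = {}
    for category, cat_rows in by_category.items():
        cat_rows.sort(key=lambda r: r[timestamp_key])
        trips: List[List[dict]] = []
        for row in cat_rows:
            previous = trips[-1][-1][timestamp_key] if trips else None
            if (previous is not None
                    and row[timestamp_key] - previous <= trip_gap_threshold_in_secs):
                trips[-1].append(row)
            else:
                trips.append([row])
        result[category] = trips
    return result


# Two assets transmitting at overlapping times.
rows = [
    {"asset_id": "a", "timestamp_column": 0},
    {"asset_id": "b", "timestamp_column": 15},
    {"asset_id": "a", "timestamp_column": 30},
    {"asset_id": "b", "timestamp_column": 45},
    {"asset_id": "a", "timestamp_column": 5000},
]
trips = split_by_category_then_trips(rows, "asset_id", "timestamp_column", 1800)
```

Here asset "a" ends up with two trips, because its last datapoint arrives more than 1800 seconds after the previous one, while asset "b" has a single trip. Any resampling would then run on each of these trips separately.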