With the Core Engine, it is also possible to work on timeseries datasets. However, due to the nature of sequential tasks, a few additional parameters need to be defined to enable this functionality.
Structurally, the timeseries block holds four mandatory keys and one optional key:
resampling_rate_in_secs (int or float) defines the resampling rate in seconds, which is used in the corresponding preprocessing step.
trip_gap_threshold_in_secs (int or float) defines the maximum time gap, in seconds, allowed between consecutive datapoints of the same trip; the dataset is split into trips wherever this gap is exceeded. Sequential transformations take place only after the data has been split into trips.
process_sequence_w_timestamp specifies which data column holds the timestamp.
process_sequence_w_category is optional; if provided, it names the column used to split the data into categories before the sequential processes run.
sequence_shift defines the shift (in datapoints) applied between consecutive sequences when extracting sequences from the dataset.
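To make the shape of the block concrete, the following Python sketch checks a parsed timeseries block (for example, the dictionary you get after loading the YAML) against the key list above. The function name validate_timeseries_block and the type rules beyond "int or float" (strings for column names, an int of at least 1 for sequence_shift) are assumptions for illustration; the Core Engine's own validation may differ.

```python
from numbers import Real

# Hypothetical type expectations: only "int or float" for the first two keys
# comes from the documentation; the remaining rules are assumptions.
MANDATORY_KEYS = {
    "resampling_rate_in_secs": Real,       # int or float
    "trip_gap_threshold_in_secs": Real,    # int or float
    "process_sequence_w_timestamp": str,   # name of the timestamp column
    "sequence_shift": int,                 # shift in datapoints
}
OPTIONAL_KEYS = {
    "process_sequence_w_category": str,    # name of the category column
}


def _has_type(value, expected):
    """isinstance check that rejects booleans, which Python treats as ints."""
    return isinstance(value, expected) and not isinstance(value, bool)


def validate_timeseries_block(block):
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    for key, expected in MANDATORY_KEYS.items():
        if key not in block:
            problems.append(f"missing mandatory key: {key}")
        elif not _has_type(block[key], expected):
            problems.append(f"{key} should be of type {expected.__name__}")
    for key, expected in OPTIONAL_KEYS.items():
        if key in block and not _has_type(block[key], expected):
            problems.append(f"{key} should be of type {expected.__name__}")
    # Assumption: a shift of zero or less would never advance the window.
    shift = block.get("sequence_shift")
    if _has_type(shift, int) and shift < 1:
        problems.append("sequence_shift should be at least 1")
    return problems
```

Returning a list of problems rather than raising on the first one lets a user fix every mistake in the block in a single pass.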
Imagine a scenario where you have an asset in the field that transmits sensor data and is only active during a certain period each day. Its sensors, however, transmit at different frequencies. To feed your model consistent, equidistant datapoints, you need to resample the dataset. In that case, you can build the
timeseries block as follows:
timeseries:
  resampling_rate_in_secs: 30
  trip_gap_threshold_in_secs: 1800
  process_sequence_w_timestamp: 'timestamp_column'
  sequence_shift: 1
This instructs the Core Engine to split your data into trips wherever consecutive datapoints are more than 30 minutes (1800 seconds) apart, and then to resample each trip at a rate of 30 seconds, using
timestamp_column as the time index. When extracting sequences after resampling, it shifts by only one datapoint before extracting the next sequence.
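The three sequential steps described above (trip splitting, resampling, sequence extraction) can be sketched in plain Python. This is an illustration under stated assumptions, not the Core Engine's implementation: records are hypothetical (timestamp_in_secs, value) pairs, a gap strictly greater than the threshold starts a new trip, each resampling bin takes the mean of its values with empty bins forward-filled, and sequence_length is a hypothetical parameter that the timeseries block does not define.

```python
from statistics import mean


def split_into_trips(records, trip_gap_threshold_in_secs):
    """Start a new trip whenever the gap to the previous record exceeds the threshold."""
    trips = []
    for ts, value in sorted(records):
        if trips and ts - trips[-1][-1][0] <= trip_gap_threshold_in_secs:
            trips[-1].append((ts, value))
        else:
            trips.append([(ts, value)])
    return trips


def resample_trip(trip, resampling_rate_in_secs):
    """Average values into equidistant bins, forward-filling empty bins (assumption)."""
    start = trip[0][0]
    bins = {}
    for ts, value in trip:
        bins.setdefault(int((ts - start) // resampling_rate_in_secs), []).append(value)
    resampled, last = [], None
    for i in range(max(bins) + 1):
        if i in bins:
            last = mean(bins[i])
        resampled.append((start + i * resampling_rate_in_secs, last))
    return resampled


def extract_sequences(points, sequence_length, sequence_shift):
    """Slide a window of sequence_length points, advancing sequence_shift points each step."""
    return [
        points[i:i + sequence_length]
        for i in range(0, len(points) - sequence_length + 1, sequence_shift)
    ]


def process(records, config, sequence_length):
    """Run the steps per trip so that no sequence spans two trips."""
    sequences = []
    for trip in split_into_trips(records, config["trip_gap_threshold_in_secs"]):
        points = resample_trip(trip, config["resampling_rate_in_secs"])
        sequences.extend(extract_sequences(points, sequence_length, config["sequence_shift"]))
    return sequences
```

For example, with records at 0, 10, 45, 4000 and 4030 seconds and the configuration above, the 3955-second gap exceeds 1800 seconds, so the data forms two trips, and each trip is resampled onto a 30-second grid before sequences are cut from it. With sequence_shift set to 1, consecutive sequences overlap in all but one datapoint.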
You can extend this example from a single asset to a fleet of assets. In that scenario, multiple assets might be active simultaneously, so during resampling, values from one asset could influence the values of another. This is exactly where the key
process_sequence_w_category comes into play: it splits the data into categories before the sequential transformations, so the data stays intact within each category for tasks such as the one described above. The resulting block is as follows:
timeseries:
  resampling_rate_in_secs: 30
  trip_gap_threshold_in_secs: 1800
  process_sequence_w_timestamp: 'timestamp_column'
  process_sequence_w_category: 'asset_id'
  sequence_shift: 1
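The effect of process_sequence_w_category can be illustrated with a small Python sketch that groups records by category before splitting trips, so gaps are measured per asset and readings from different assets never share a trip. It assumes each record is a dictionary carrying the configured timestamp and category columns; the helper names are hypothetical and this is not the Core Engine's code.

```python
from collections import defaultdict


def group_by_category(records, category_column):
    """Bucket records by the value of the category column (e.g. asset_id)."""
    groups = defaultdict(list)
    for record in records:
        groups[record[category_column]].append(record)
    return dict(groups)


def split_into_trips(records, timestamp_column, trip_gap_threshold_in_secs):
    """Start a new trip whenever the gap to the previous record exceeds the threshold."""
    trips = []
    for record in sorted(records, key=lambda r: r[timestamp_column]):
        gap_ok = trips and (
            record[timestamp_column] - trips[-1][-1][timestamp_column]
            <= trip_gap_threshold_in_secs
        )
        if gap_ok:
            trips[-1].append(record)
        else:
            trips.append([record])
    return trips


def trips_per_category(records, config):
    """Group first, then split each category's records into trips independently."""
    groups = group_by_category(records, config["process_sequence_w_category"])
    return {
        category: split_into_trips(
            group,
            config["process_sequence_w_timestamp"],
            config["trip_gap_threshold_in_secs"],
        )
        for category, group in groups.items()
    }
```

For instance, if asset A reports at 0, 100 and 5000 seconds and asset B at 50 and 150 seconds, grouping first yields trips [0, 100] and [5000] for A and [50, 150] for B. Splitting the combined stream instead would merge the readings of both assets into a single trip from 0 to 150 seconds, which is the cross-asset mixing the category key is meant to prevent.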