Evaluation

Training is done, evaluation is next.

Within the context of a pipeline in the Core Engine, you can dig deeper into the trained model and gain a better understanding of it by conducting a post-training evaluation, which features configurable slices and metrics.

Main block: evaluator

The main block evaluator is responsible for configuring the post-training evaluation. Structurally, it features two main keys, namely slices and metrics.

Parameters

key     | dtype      | required
--------|------------|---------
slices  | list       | True
metrics | list, dict | True

  • slices is a list of lists of strings, each inner list holding one or more slicing columns. Your eval data will be sliced based on the columns given in each entry, and the post-training evaluation will be conducted on every resulting slice.

  • metrics defines which metrics will be computed during the evaluation. It takes either a list or a dictionary, depending on the type of the model: for a single-output model, a simple list of metrics is sufficient; for a multi-output model, you can use a dictionary to define metrics specifically for each label.
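To make the two shapes of the metrics value concrete, here is a minimal sketch, independent of the Core Engine internals, of how a list-or-dict metrics value could be normalized to a per-label mapping (the function name and behavior are illustrative assumptions, not part of the SDK):

```python
def normalize_metrics(metrics, labels):
    """Expand the evaluator's metrics value to a per-label mapping:
    a plain list applies the same metrics to every label, while a
    dict is taken as an explicit per-label specification."""
    if isinstance(metrics, dict):
        missing = set(labels) - set(metrics)
        if missing:
            raise ValueError(f"No metrics given for labels: {sorted(missing)}")
        return metrics
    # Single-output style: one list shared across all labels.
    return {label: list(metrics) for label in labels}

# Single-output model: a list is enough.
print(normalize_metrics(['mean_squared_error'], ['age']))
# Multi-output model: a dict assigns metrics per label.
print(normalize_metrics(
    {'age': ['mean_squared_error'],
     'income_bracket': ['binary_crossentropy']},
    ['age', 'income_bracket'],
))
```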

Examples

Let's start with a simple example. Imagine you are dealing with a simple regression task on a single label and you want to compute the mean_squared_error of your trained model on the eval dataset. Moreover, you want to do this not just on the entire eval dataset, but also on each category within the native_country feature of your eval dataset. To configure your pipeline accordingly:

Python SDK
from cengine import PipelineConfig
p = PipelineConfig()
# Defining the slices
p.evaluator.slices = [['native_country']]
# Defining the metrics
p.evaluator.metrics = ['mean_squared_error']
YAML
evaluator:
  slices:
    - - 'native_country'
  metrics:
    - 'mean_squared_error'
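To build intuition for what this configuration computes, the following is a plain-Python sketch, independent of the Core Engine API, that mimics slicing an eval dataset on native_country and computing the mean squared error within each slice (the row layout and function names are hypothetical):

```python
from collections import defaultdict

def mse(y_true, y_pred):
    """Mean squared error over paired true/predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def evaluate_per_slice(rows, slice_col, label_col, pred_col):
    """Group eval rows by slice_col and compute MSE within each group,
    mirroring what a single-column slice configuration produces."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[slice_col]].append(row)
    return {
        value: mse([r[label_col] for r in grp], [r[pred_col] for r in grp])
        for value, grp in groups.items()
    }

eval_rows = [
    {"native_country": "US", "label": 3.0, "pred": 2.5},
    {"native_country": "US", "label": 1.0, "pred": 1.0},
    {"native_country": "DE", "label": 2.0, "pred": 3.0},
]
print(evaluate_per_slice(eval_rows, "native_country", "label", "pred"))
# → {'US': 0.125, 'DE': 1.0}
```

The overall metric on the full dataset is what you would get without any slicing; the per-slice values surface categories where the model underperforms.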

Moving on to a slightly more complicated example, let us assume that the task handles more than one label, age and income_bracket, and that due to the nature of these labels you want to apply different metrics to each. As for the slices, you do not just want to slice on the features native_country and marital_status individually, but also on their combination, creating a multi-feature slice that features both native_country and marital_status:

Python SDK
from cengine import PipelineConfig
p = PipelineConfig()
# Defining the slices
p.evaluator.slices = [
    ['native_country'],
    ['marital_status'],
    ['native_country', 'marital_status'],
]
# Defining the metrics
p.evaluator.metrics = {
    'age': ['mean_squared_error'],
    'income_bracket': ['binary_crossentropy'],
}
YAML
evaluator:
  slices:
    - - 'native_country'
    - - 'marital_status'
    - - 'native_country'
      - 'marital_status'
  metrics:
    age:
      - 'mean_squared_error'
    income_bracket:
      - 'binary_crossentropy'
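The multi-feature entry groups the data on the combination of the two columns, so every observed (native_country, marital_status) pair becomes its own slice. A hedged sketch in plain Python (no Core Engine dependency; the helper name is an assumption) of how such a slice specification partitions rows:

```python
from collections import defaultdict

def slice_counts(rows, slice_spec):
    """For each slice definition, count rows per slice. A multi-column
    entry groups on the tuple of its column values, producing one slice
    per observed combination."""
    results = {}
    for cols in slice_spec:
        groups = defaultdict(int)
        for row in rows:
            groups[tuple(row[c] for c in cols)] += 1
        results[tuple(cols)] = dict(groups)
    return results

rows = [
    {"native_country": "US", "marital_status": "married"},
    {"native_country": "US", "marital_status": "single"},
    {"native_country": "DE", "marital_status": "married"},
]
spec = [
    ["native_country"],
    ["marital_status"],
    ["native_country", "marital_status"],
]
for cols, groups in slice_counts(rows, spec).items():
    print(cols, groups)
```

Note that multi-feature slices can become very small as the number of combined columns grows, which makes their metric values noisier than those of single-column slices.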