Custom models

The Core Engine comes packed with a selection of built-in models. However, in a field such as Machine Learning, the problems you encounter quite often require a tailored model as a solution. That is why you have the option to define your own custom model.

This section closely follows the narrative of the custom functions section, since both are built around the same concept.

Functions and versions

If you want to use a custom model in your Core Engine pipeline, there are a few things to take into consideration. First, in order to keep your pipeline traceable, repeatable and open to collaboration, you need to register your model as a function and version it. Second, you have to reference the registered function as your model in your configuration.

Python SDK

In order to push your custom function to the Core Engine, you can use the client method push_function as follows:

def push_function(self,
                  name: Text,
                  function_type: Text,
                  local_path: Text,
                  udf_name: Text,
                  message: Text = None)

Here, name is the name given to the function. function_type denotes the type of the custom function, which should be set to model in this instance. local_path holds the path to the file containing the custom model code, whereas udf_name is the name of the function within that file. Finally, message is an optional descriptive commit message.

If the name was not used before in your organization, the Core Engine will create a function called name and push its first version. If it was used before, a new version of the existing function will be created and pushed. You can see the list of your custom functions through:

client.get_functions()

and you can see the versions of a specific function through:

client.get_function_versions(function_id=F_ID)

Example:

client.push_function(name='my_first_model',
                     function_type='model',
                     local_path='/home/my_project/my_custom_model.py',
                     udf_name='baseline_model')
CLI

In order to create a custom model through the CLI, you can use cengine function as follows:

cengine function create NAME LOCAL_PATH FUNC_TYPE UDF_NAME

where:

  • NAME is a string value which is the name given to the function.

  • LOCAL_PATH is a string value which holds the path to the file containing the custom model code.

  • FUNC_TYPE is a string value defining the type of the custom function; use model to create a custom model.

  • UDF_NAME is a string value which is the name of the actual function within the specified file.

If the name was not used before in your organization, the Core Engine will create a function called NAME and push its first version. If it was used before, a new version of the existing function will be created and pushed. You can see the list of your custom functions through:

cengine function list

and you can see the versions of a specific function through:

cengine function versions FUNCTION_ID

Example:

cengine function create my_first_model \
    /home/my_project/my_custom_model.py \
    model \
    baseline_model
CE Dashboard

Currently, this feature is unavailable.

We are working hard to create a dashboard for the Core Engine. Please see our roadmap for an indication of when this feature will be released.

Writing a custom model

Now that we have covered the inner workings of pushing and versioning custom models, the next step is to look at how to actually write one.

Write it in a single file

As of now, it is only possible to use a custom model if it is written in a single file that is pushed to the Core Engine. If you want to use any additional helper functions in your custom model code, please define them in the same file.

In the future, we are planning to bring a Docker-based solution to this problem, which would treat your custom model not just as a function within a single file but rather as a module. Please see our roadmap for an indication of when this feature will be released.

Adjust your signature

As the model function will be executed within a Core Engine pipeline, it needs to have a specific signature:

def custom_model(train_dataset: tf.data.Dataset,
                 eval_dataset: tf.data.Dataset,
                 schema: Dict,
                 log_dir: str,
                 hparam1,
                 hparam2,
                 ...
                 hparamN):

The first four input parameters that you see above will be passed to your custom_model function by the Core Engine. As their names suggest, the first two are the non-repeated, unbatched datasets for training and evaluation, respectively. schema is a dictionary where the keys are the feature names and the values define the spec of each feature. Finally, log_dir is the directory used for the TensorBoard callback.
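
For illustration, the schema could look like the following sketch. The feature names and the spec format shown here are purely hypothetical; the actual contents depend on your datasource:

# Hypothetical sketch: keys are feature names, values describe each feature
schema = {'age': {'dtype': 'int'},
          'hours_per_week': {'dtype': 'int'},
          'income_bracket': {'dtype': 'float'}}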

On top of these four parameters, you have the ability to define as many hyperparameters as you need. They will be parsed from your configuration and passed to your custom function.

Inside the function

When writing a custom model function, there are three crucial steps which need to be covered:

  1. The model needs to be compiled, i.e. model.compile() has to be called on the defined Keras model instance.

  2. The model needs to be fit, which is done by calling model.fit() on the same Keras model instance.

  3. The compiled and fit model needs to be returned in the end, so the function custom_model from above needs to conclude with the statement return model, where model is the previously constructed Keras model instance, as the sketch below illustrates.
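
The following minimal sketch shows these three steps in isolation. The architecture and the hyperparameters are mere placeholders; see the full example further below for a realistic version:

from typing import Dict

import tensorflow as tf


def minimal_model(train_dataset: tf.data.Dataset,
                  eval_dataset: tf.data.Dataset,
                  schema: Dict,
                  log_dir: str,
                  epochs: int = 5):
    # Placeholder architecture: a single dense layer
    model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
    # 1. Compile the model
    model.compile(loss='mse', optimizer='adam')
    # 2. Fit the model on the batched datasets
    model.fit(train_dataset.batch(32),
              validation_data=eval_dataset.batch(32),
              epochs=epochs)
    # 3. Return the compiled and fit model
    return model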

Dependencies

For now, all custom functions are run in the same environment. Some of the most critical libraries in terms of model training are as follows:

Library         Version
tensorflow      2.1.0
tensorboard     2.1.1
keras           2.1.0
numpy           1.18.5
pandas          0.24.2
scikit-learn    0.21.3

You can find a list of all the requirements right here.

Example

/home/my_project/my_custom_model.py
from typing import Dict

import tensorflow as tf


def custom_model(train_dataset: tf.data.Dataset,
                 eval_dataset: tf.data.Dataset,
                 schema: Dict,
                 log_dir: str,
                 batch_size: int = 32,
                 lr: float = 0.0001,
                 epochs: int = 10,
                 loss: str = 'mse',
                 last_activation: str = 'sigmoid',
                 input_units: int = 11,
                 output_units: int = 1):
    """
    A custom TensorFlow/Keras model
    """
    # Create batched datasets
    train_dataset = train_dataset.batch(batch_size, drop_remainder=True)
    eval_dataset = eval_dataset.batch(batch_size, drop_remainder=True)

    # Set the metrics
    metrics = ['accuracy']

    # Set up the network architecture
    input_layer = tf.keras.layers.Input(shape=(input_units,))
    d = tf.keras.layers.Dense(64, activation='relu')(input_layer)
    d = tf.keras.layers.Dropout(0.2)(d)
    d = tf.keras.layers.Dense(64, activation='relu')(d)
    d = tf.keras.layers.Dropout(0.2)(d)
    output_layer = tf.keras.layers.Dense(output_units,
                                         activation=last_activation,
                                         name='income_bracket')(d)
    model = tf.keras.Model(inputs=input_layer, outputs=output_layer)

    # Compile the model
    model.compile(loss=loss,
                  optimizer=tf.keras.optimizers.Adam(lr=lr),
                  metrics=metrics)
    model.summary()

    # Set up the TensorBoard callback
    tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir)

    # Train the model and return it
    model.fit(train_dataset,
              epochs=epochs,
              validation_data=eval_dataset,
              callbacks=[tensorboard_callback])
    return model

Using your model

In order to use your custom model, you need to point to a function and a specific version of that function:

Python SDK
from cengine import PipelineConfig

p = PipelineConfig()

# Define the trainer and the hparams
p.trainer.fn = 'my_first_model@4281ab731bc123a0'
p.trainer.params = {'batch_size': 64,
                    'epochs': 15,
                    'lr': 0.00015,
                    'loss': 'mean_squared_error'}
YAML
trainer:
  fn: my_first_model@4281ab731bc123a0
  params:
    batch_size: 64
    epochs: 200
    lr: 0.00015
    loss: mean_squared_error

You do not have to assign a value to each optional parameter. If you have defined default values for hyperparameters in your model function, the Core Engine will infer these values from your function and use them for experiment tracking and comparison.
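
For instance, assuming the custom_model example from above, the following sketch overrides just a single hyperparameter:

# Only batch_size is set explicitly; epochs, lr, loss and the remaining
# parameters fall back to the defaults in the function signature
p.trainer.params = {'batch_size': 64}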

Regarding the version: latest

p.trainer.fn = 'my_first_model@latest'

trainer:
  fn: my_first_model@latest

The immutability of pipelines is one of the core values of the Core Engine. In that regard, it is important to mention that if you use a custom model with the tag latest as the version, the version will be resolved upon the creation of the pipeline. That means that if you do cengine pipeline push on the CLI or client.push_pipeline(.....) with the Python SDK, the Core Engine will take the latest version of your model at pipeline creation time, register your pipeline with that version, and the pipeline will not change with time. In other words, even if you push a new version of the model and then run your pipeline, it will not use the newly pushed version but the one that was the latest at the time of pipeline creation.

Pulling your model

In order to make your pipelines both traceable and open to collaboration, you have the ability to pull a version of any function in your organization at any time and look at the custom code.

Python SDK

In order to pull a version of a function through the Python SDK, you can use the client method pull_function_version:

def pull_function_version(self,
                          function_id: Text,
                          version_id: Text,
                          output_path: Text = None)

The function_id and the version_id define which function version to download, whereas output_path, if defined, will be used as the path to download the function to. If it is not defined, the function is downloaded to the current working directory.
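
For example, following the F_ID convention from above (V_ID is likewise a placeholder for a version identifier):

# F_ID and V_ID are placeholder identifiers
client.pull_function_version(function_id=F_ID,
                             version_id=V_ID,
                             output_path='/home/my_project/')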

Moreover, if you are in a Jupyter Notebook setting, you can use the client method magic_function. It will pull the function, but instead of downloading it into a local file like pull_function_version does, it will write it into your notebook as an executable cell, so you can test and improve it in an efficient manner.

def magic_function(self,
                   function_id: Text,
                   version_id: Text)
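
For example, with the same placeholder identifiers as above:

# Writes the pulled function version into the notebook as a new cell
client.magic_function(function_id=F_ID, version_id=V_ID)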
CLI

In order to pull a function through the CLI, you can use cengine function pull as follows:

cengine function pull FUNCTION_ID VERSION_ID [--output_path=OUTPUT_PATH]

where:

  • FUNCTION_ID is a string identifier for the function.

  • VERSION_ID is a string identifier for the function version.

  • OUTPUT_PATH is an optional path value, which, if defined, will be used as the path to download the function to. If it is not defined, the function is downloaded to the current working directory.
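
Example, with placeholder identifiers:

cengine function pull FUNCTION_ID VERSION_ID \
    --output_path=/home/my_project/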

CE Dashboard

Currently, this feature is unavailable.

We are working hard to create a dashboard for the Core Engine. Please see our roadmap for an indication of when this feature will be released.