Barrage Config¶
Configs serve a central role in Barrage. A config is an entire recipe for a deep learning model: it specifies the components (e.g. ETL classes, TensorFlow metrics, TensorFlow optimizers, etc.), comes with auto-configuring defaults and best practices, and serves as instructions for the underlying Barrage engine to orchestrate training and scoring.
Configs are broken into four distinct sections:

dataset: define how to load, transform, and augment a dataset.
model: define the model structure, metrics, losses, loss weights, etc.
solver: define the optimizer and learning rate scheduling.
services: define the underlying artifacting and configure best practices.
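Assembled, a top-level config is a single JSON object containing these four sections. A minimal skeleton (section contents elided; each section is detailed below):

{
    "dataset": { ... },
    "model": { ... },
    "solver": { ... },
    "services": { ... }
}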
Note: It is highly recommended (but not required) to use a container (e.g. Docker) to version the entire environment, because it enables the user to exactly reproduce the environment used to train the model at scoring (and/or deployment) time.
Imports¶
Components are imported via their Python paths in the config. An import block in the config is defined as follows:
{
    "import": str (required python path),
    "params": dict (optional params)
}
Let’s consider several examples where a config import is translated into Python code:
A custom user class:
{
    "import": "placeholder.path.MyClass",
    "params": {
        "a": "hello world",
        "b": 42
    }
}
from placeholder.path import MyClass
MyClass(a="hello world", b=42)
A custom user function:
{
    "import": "sample.my_func"
}
from sample import my_func
# params are optional, none passed by the config
my_func()
The above strategy additionally works for TensorFlow imports but can lead to verbose Python paths (e.g. tensorflow.keras.losses.CategoricalCrossentropy). The following import shorthands are adopted for all TensorFlow imports (e.g. metrics, losses, optimizers, schedulers, etc.):
# Respect TensorFlow string aliases
"categorical_crossentropy" == "tensorflow.keras.losses.CategoricalCrossentropy"
# Search TensorFlow paths automatically
"Adam" == "tensorflow.python.keras.optimizer_v2.adam.Adam" == "tensorflow.keras.optimizers.Adam"
In addition, in the dataset section of the config, the following import shorthands are adopted (e.g. loaders, transformers, etc.):
# Search barrage.dataset paths
"KeySelector" == "barrage.dataset.KeySelector"
Config Section: dataset¶
dataset configures the following:
loader
transformer
augmentor
Schema¶
"dataset": {
"loader": {
"import": string,
"params": dict // optional
},
"transformer": {
"import": string,
"params": dict // optional
},
"augmentor": [ // optional
{
"import": string,
"params": dict // optional
}
],
"sample_count": string, //optional
"seed": int // optional
}
Breakdown¶
loader: import a class derived from barrage.dataset.RecordLoader.
transformer: import a class derived from barrage.dataset.RecordTransformer.
augmentor: list of augmentation functions to import and apply in sequential order.
sample_count: name of a key that contains integer counts representing the number of times to put a sample in an epoch.
seed: numpy random seed.
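For illustration, here is a sketch of a complete dataset section; the placeholder.path imports, their params, and the sample_count key name are hypothetical:

"dataset": {
    "loader": {
        "import": "placeholder.path.MyLoader", // or a barrage.dataset shorthand, e.g. "KeySelector"
        "params": {"a": "hello world"}
    },
    "transformer": {
        "import": "placeholder.path.MyTransformer" // hypothetical custom class
    },
    "augmentor": [
        {
            "import": "placeholder.path.my_augment_func", // hypothetical function
            "params": {"scale": 0.1}
        }
    ],
    "sample_count": "sample_count", // hypothetical key in the records
    "seed": 42
}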
Config Section: model¶
model configures the following:
network architecture
loss functions and loss weights
metrics
Schema¶
"model": {
"network": {
"import": string,
"params": dict // optional
},
"outputs": [
"name": string,
"loss": {
"import": string,
"params": dict // optional
},
"loss_weight": float, // required if len(outputs) > 1
"metrics": [ // optional
{
"import": string,
"params": dict // optional
}
],
"sample_weight_mode": str //optional
]
}
Breakdown¶
network: import a function that returns a tensorflow.python.keras.Model.
outputs.name: string that must match an output name from the Model returned by network.
outputs.loss: import a loss (must be v2 loss class compliant).
outputs.loss_weight: loss weight for a multi output network.
outputs.metrics: import a list of metrics (must be v2 metric or loss class compliant).
outputs.sample_weight_mode: sample weight mode.
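For illustration, a sketch of a model section for a hypothetical two-output network; the placeholder.path import, its params, and the output names are assumptions (loss_weight is required here because there is more than one output):

"model": {
    "network": {
        "import": "placeholder.path.build_net", // hypothetical function returning a Model
        "params": {"hidden_dim": 128}
    },
    "outputs": [
        {
            "name": "class_output", // must match an output name of the returned Model
            "loss": {"import": "categorical_crossentropy"}, // TensorFlow string alias
            "loss_weight": 0.7,
            "metrics": [{"import": "CategoricalAccuracy"}]
        },
        {
            "name": "aux_output",
            "loss": {"import": "MeanSquaredError"},
            "loss_weight": 0.3
        }
    ]
}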
Config Section: solver¶
solver configures the following:
optimizer
learning rate scheduling technique
batch size
epochs
Schema¶
"solver": {
"optimizer": { // optional, all or none
"import": string, // required
"learning_rate": float or import block // required
"params": dict // optional
},
"batch_size": int, // optional
"epochs": int, // optional
"steps": int, // optional
"learning_rate_reducer": {
"monitor": string,
"mode": "min" or "max",
"patience": int,
"factor": float
// optional additional ReduceLROnPlateau callback params
}
}
Note: mode = "auto" is not supported.
Defaults¶
"solver": {
"optimizer": {
"import": "Adam",
"learning_rate": 1e-3,
"params": {}
},
"batch_size": 32,
"epochs": 10
}
Breakdown¶
optimizer: import a TensorFlow optimizer (must be compatible with the v2 optimizer class).
optimizer.learning_rate: can be a float or an import block for a schedule (must be compatible with the v2 schedule class):
// float
"learning_rate": 1e-3
// import block
"learning_rate": {
"import": "ExponentialDecay",
"params": {
"initial_learning_rate": 1e-3,
"decay_steps": 100,
"decay_rate": 0.99,
}
}
batch_size: batch size.
epochs: number of epochs to train.
steps: modify the length of an epoch to steps batches. Can be used to shorten or lengthen an epoch.
learning_rate_reducer: defines params for a ReduceLROnPlateau callback:
from tensorflow.python.keras import callbacks
callbacks.ReduceLROnPlateau(**cfg["solver"]["learning_rate_reducer"])
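Putting the solver pieces together, a sketch with an optimizer and a reducer (all values are illustrative):

"solver": {
    "optimizer": {
        "import": "Adam",
        "learning_rate": 1e-3
    },
    "batch_size": 64,
    "epochs": 50,
    "learning_rate_reducer": {
        "monitor": "val_loss",
        "mode": "min",
        "patience": 5,
        "factor": 0.5
    }
}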
Config Section: services¶
services automatically configures the following best practices with default settings:

the best graph should be saved, and "best" should be derived from performance on a validation metric, not a training metric (e.g. val_loss vs. loss).
after every checkpoint interval the graph should be saved.
TensorBoard should be automatically set up.
if the training loss is not changing -> early stop.
if the validation metric that is monitored is not changing -> early stop.
Note: Early stopping has the potential to prematurely terminate a training run even when loss or val_loss may continue to improve later (e.g. with learning rate scheduling). To avoid this issue, the defaults have been set generously: a large number of checkpoint intervals and a very lax improvement condition (near floating point precision).
Schema¶
"services": {
{
"best_checkpoint": { // optional, all or none
"monitor": string,
"mode": "min" or "max"
},
"tensorboard": dict, // optional TensorBoard callback params
"train_early_stopping": { // optional, all or none
"monitor": string,
"mode": "min" or "max",
"patience": int,
"min_delta": float
// optional additional EarlyStopping callback params
}
"validation_early_stopping": { // optional, all or none
"monitor": string,
"mode": "min" or "max",
"patience": int,
"min_delta": float
// optional additional EarlyStopping callback params
}
}
}
Note: mode = "auto" is not supported.
Defaults¶
"services": {
"best_checkpoint": {
"monitor": "val_loss",
"mode": "min"
},
"tensorboard": {},
"train_early_stopping": {
"monitor": "val_loss",
"mode": "min",
"patience": 10,
"min_delta": 1e-5,
"verbose": 1
}
"validation_early_stopping": {
"monitor": "val_loss",
"mode": "min",
"min_delta": float,
"min_delta": 1e-5,
"verbose": 1
}
}
}
Breakdown¶
best_checkpoint: defines a ModelCheckpoint callback where save_best_only=True:
from tensorflow.python.keras import callbacks
callbacks.ModelCheckpoint(filepath=..., **cfg["services"]["best_checkpoint"], save_best_only=True)
tensorboard: defines params for a TensorBoard callback (log_dir preconfigured automatically):
from tensorflow.python.keras import callbacks
callbacks.TensorBoard(log_dir=..., **cfg["services"]["tensorboard"])
train_early_stopping: defines params for an EarlyStopping callback that must monitor a train metric:
from tensorflow.python.keras import callbacks
callbacks.EarlyStopping(**cfg["services"]["train_early_stopping"])
validation_early_stopping: defines params for an EarlyStopping callback that must monitor a validation metric:
from tensorflow.python.keras import callbacks
callbacks.EarlyStopping(**cfg["services"]["validation_early_stopping"])
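Putting the services pieces together, a sketch that checkpoints on a hypothetical validation accuracy metric while early stopping on losses (the metric name and values are illustrative):

"services": {
    "best_checkpoint": {
        "monitor": "val_class_output_accuracy", // hypothetical validation metric name
        "mode": "max"
    },
    "tensorboard": {"update_freq": "epoch"}, // standard TensorBoard callback param
    "train_early_stopping": {
        "monitor": "loss",
        "mode": "min",
        "patience": 20,
        "min_delta": 1e-6
    },
    "validation_early_stopping": {
        "monitor": "val_loss",
        "mode": "min",
        "patience": 20,
        "min_delta": 1e-6
    }
}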