The Configuration file and the Register

Let’s take a closer look at the structure of Biom3d.

Biom3d code is modular: its components, called modules, can easily be plugged in or swapped out. The complete list of existing modules can be found in biom3d.register.

Training configuration file definition

All hyper-parameters are defined in the configuration file. Configuration files are stored in Python format in the configs folder. You can create a new config file by copying one of the existing ones and adapting the parameters described below. For instance, copy unet_pancreas.py within the same folder, rename the copy, and open it with your favourite text editor.

There are two types of hyper-parameters in the configuration file: builder parameters and module parameters.

Builder parameters

Builder parameters are written as follows: NAME=value. The dataset builder parameters must be adapted to your own dataset, and the Auto-config builder parameters can be set to the values produced by pre-processing. The remaining builder parameters are optional.

Here is the exhaustive list of builder parameters:

#---------------------------------------------------------------------------
# Dataset builder-parameters
# EDIT THE FOLLOWING PARAMETERS TO MATCH YOUR OWN DATASET

# Folder where pre-processed images are stored
IMG_DIR = 'data/pancreas/tif_imagesTr_small'

# Folder where pre-processed masks are stored
MSK_DIR = 'data/pancreas/tif_labelsTr_small'

# (optional) path to the .csv file storing "filename,hold_out,fold", where:
# "filename" is the image name,
# "hold_out" is either 0 (training image) or 1 (testing image),
# "fold" (non-negative integer) indicates the k-th fold, 
# by default fold 0 of the training images (hold_out=0) is the validation set.
CSV_DIR = 'data/pancreas/folds_pancreas.csv'

# CSV_DIR can be set to None, in which case the validation set will be
# automatically chosen from the training set (20% of the training images/masks)
# CSV_DIR = None 

# model name
DESC = 'unet_mine-pancreas_21'

# number of classes of objects
# the background does not count, so the minimum is 1 (the max is 255)
NUM_CLASSES=2

#---------------------------------------------------------------------------
# Auto-config builder-parameters
# PASTE AUTO-CONFIG RESULTS HERE

# batch size
BATCH_SIZE = 2

# patch size passed to the model
PATCH_SIZE = [40,224,224]

# larger patch size used prior to rotation augmentation to avoid "empty" corners.
AUG_PATCH_SIZE = [48,263,263]

# number of pooling done in the UNet
NUM_POOLS = [3,5,5]

# median spacing is used only during prediction to normalize the output images
# it is commented out here because we did not notice any improvement
# MEDIAN_SPACING=[0.79492199, 0.79492199, 2.5]
MEDIAN_SPACING=[]

#---------------------------------------------------------------------------
# Advanced parameters (can be left as is)
# training configs

# whether to also store the best model
SAVE_BEST = True 

# number of epochs
# the number of epochs can be reduced for a small training set (e.g. a set of 10 images/masks of 128x128x64)
NB_EPOCHS = 1000

# optimizer parameters
LR_START = 1e-2 # comment out if you need to reload the learning rate after a training interruption
WEIGHT_DECAY = 3e-5

# whether to use deep-supervision loss:
# a loss is placed at each stage of the UNet model
USE_DEEP_SUPERVISION = False

# whether to use softmax loss instead of sigmoid
# should not be set to True if object classes are overlapping in the masks
USE_SOFTMAX=False 

# training loop parameters
USE_FP16 = True
NUM_WORKERS = 4

#---------------------------------------------------------------------------
# callback setup (can be left as such) 
# callbacks are routines that execute periodically during the training loop

# folder where the training logs will be stored, including:
# - model .pth files (state_dict)
# - image snapshots of model training (only if USE_IMAGE_CLBK is True)
# - logs with this configuration stored in .yaml format and tensorboard logs
LOG_DIR = 'logs/'

SAVE_MODEL_EVERY_EPOCH = 1
USE_IMAGE_CLBK = True
VAL_EVERY_EPOCH = SAVE_MODEL_EVERY_EPOCH
SAVE_IMAGE_EVERY_EPOCH = SAVE_MODEL_EVERY_EPOCH
USE_FG_CLBK = True
#---------------------------------------------------------------------------
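The fold file described in the comments above (columns filename, hold_out, fold) can be illustrated with a short sketch. The file names and the helper split_folds are hypothetical and only mirror the convention stated in the configuration comments; this is not Biom3d's own loader.

```python
import csv
import io

# Hypothetical fold file following the "filename,hold_out,fold" convention.
CSV_TEXT = """filename,hold_out,fold
img_000.tif,0,0
img_001.tif,0,1
img_002.tif,0,2
img_003.tif,1,0
"""


def split_folds(csv_text, val_fold=0):
    """Split file names into train/val/test lists.

    hold_out == 1                      -> test image
    hold_out == 0 and fold == val_fold -> validation image
    hold_out == 0 otherwise            -> training image
    """
    train, val, test = [], [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if int(row["hold_out"]) == 1:
            test.append(row["filename"])
        elif int(row["fold"]) == val_fold:
            val.append(row["filename"])
        else:
            train.append(row["filename"])
    return train, val, test


train, val, test = split_folds(CSV_TEXT)
print("train:", train)
print("val:  ", val)
print("test: ", test)
```

With the default val_fold=0, the first image ends up in the validation set, the held-out image in the test set, and the remaining two in the training set.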

Module parameters

Module parameters are written as follows in the configuration file:

NAME=Dict(
  fct="RegisterName",
  kwargs=Dict(
    key_word=arguments,
  )
)

The fct argument corresponds to one of the module names listed in the register.py file. The register.py file lists all existing modules in Biom3d. For more details about a specific module, we recommend reading the documentation of that module. There are currently 5 main module types: dataset, model, metric, trainer and predictor. Not every module is compatible with every other module; read the documentation for more details.
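One way to picture how a module entry might be resolved against a register is the sketch below. It is only an illustration of the pattern under stated assumptions: the Dict class is a minimal stand-in for the one used in configuration files, and ToyMetric, REGISTER and build_module are hypothetical names that do not come from Biom3d.

```python
class Dict(dict):
    """Dictionary with attribute access (stand-in for the Dict used in configs)."""

    def __getattr__(self, name):
        try:
            return self[name]
        except KeyError as err:
            raise AttributeError(name) from err

    def __setattr__(self, name, value):
        self[name] = value


# Hypothetical module, standing in for a real dataset/model/metric class.
class ToyMetric:
    def __init__(self, name, smooth=1.0):
        self.name = name
        self.smooth = smooth


# Hypothetical register: maps a register name to a callable and default kwargs.
REGISTER = Dict(
    ToyMetric=Dict(fct=ToyMetric, kwargs=Dict(smooth=1.0)),
)

# Module parameter as it could appear in a configuration file.
VAL_METRIC = Dict(
    fct="ToyMetric",
    kwargs=Dict(name="val_metric", smooth=0.5),
)


def build_module(entry, register=REGISTER):
    """Look up entry.fct in the register and call it with the merged kwargs."""
    if entry.fct not in register:
        raise KeyError(f"unknown module: {entry.fct!r}")
    registered = register[entry.fct]
    kwargs = {**registered.kwargs, **entry.kwargs}  # config values override defaults
    return registered.fct(**kwargs)


metric = build_module(VAL_METRIC)
print(type(metric).__name__, metric.name, metric.smooth)
```

Keeping only a string in the configuration file means the file stays declarative, and an unknown name can be reported early, before any training starts.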

Training

Please create a folder named logs/ in the current directory.

Once the configuration file is defined, the training can start with the following command:

python biom3d/train.py --config configs.your_config_file

Be careful not to append .py to your config file name.
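The note about the .py suffix makes sense if the --config value is treated as a dotted Python module path rather than a file path, which the configs.your_config_file form suggests. The sketch below illustrates that assumption with the standard importlib module; the temporary package demo_configs and the file my_config.py are hypothetical, and Biom3d's actual loading code may differ.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package "demo_configs" containing one config file.
root = Path(tempfile.mkdtemp())
pkg = root / "demo_configs"
pkg.mkdir()
(pkg / "__init__.py").write_text("")
(pkg / "my_config.py").write_text("BATCH_SIZE = 2\nPATCH_SIZE = [40, 224, 224]\n")
sys.path.insert(0, str(root))
importlib.invalidate_caches()

# Dotted module path, without the .py suffix: the import succeeds.
cfg = importlib.import_module("demo_configs.my_config")
print(cfg.BATCH_SIZE, cfg.PATCH_SIZE)

# With the suffix, Python looks for a submodule named "py" and fails.
try:
    importlib.import_module("demo_configs.my_config.py")
except ModuleNotFoundError as err:
    print("import failed:", err)
```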

A new sub-folder, referred to as the base-folder in this documentation, will be created in the logs/ folder. The base-folder contains 3 sub-folders:

image: snapshots of the current training results
log: the configuration files stored in YAML format and the TensorBoard event file
model: the PyTorch model(s).
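To locate the output of the most recent run programmatically, something like the following sketch could be used. It relies only on the layout described above (a base-folder under logs/ with image, log and model sub-folders); the base-folder name and the model file name used to build the dummy tree are invented for the demonstration.

```python
import tempfile
from pathlib import Path

EXPECTED_SUBFOLDERS = ("image", "log", "model")


def latest_base_folder(log_dir):
    """Return the most recently modified sub-folder of log_dir, or None."""
    folders = [p for p in Path(log_dir).iterdir() if p.is_dir()]
    return max(folders, key=lambda p: p.stat().st_mtime, default=None)


def missing_subfolders(base_folder):
    """List the expected sub-folders that are absent from base_folder."""
    return [name for name in EXPECTED_SUBFOLDERS
            if not (Path(base_folder) / name).is_dir()]


# Dummy tree standing in for a real logs/ directory (names are made up).
logs = Path(tempfile.mkdtemp()) / "logs"
base = logs / "example_run"
for name in EXPECTED_SUBFOLDERS:
    (base / name).mkdir(parents=True)
(base / "model" / "example.pth").touch()

found = latest_base_folder(logs)
print("base-folder:", found.name)
print("missing sub-folders:", missing_subfolders(found))
print("model files:", [p.name for p in (found / "model").glob("*.pth")])
```

The path returned by latest_base_folder is the kind of value that the prediction command expects for its --log argument.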

You can plot the training curves during model training with the following command:

tensorboard --logdir=logs/

Advanced training/evaluation/prediction

Biom3d was originally designed to speed up the development of state-of-the-art tools for 3D biomedical imaging, which is why training, test prediction, and test metric computation can all be run in a single command. Use python biom3d/train.py --help for more details.

Prediction

Once your model is trained, it is ready to use for prediction with the following command:

python biom3d/pred.py --log path/to/base-folder --dir_in path/to/raw/data --dir_out path/of/the/future/predictions 

OMERO users can use the following command to download an OMERO Dataset or an OMERO Project and run the prediction directly on it:

python biom3d/omero_pred.py --obj Dataset:ID

or, with an OMERO Project:

python biom3d/omero_pred.py --obj Project:ID

Either command will prompt you for your OMERO server name, your OMERO username, and your OMERO password.

Advanced prediction

pred.py can also be used to compare the prediction results with existing test annotations. Use python biom3d/pred.py --help for more details.