How to use Biom3d in Docker

Biom3d now has Docker images that you can find here. We will explore here some simples use-cases of those images.

What is Docker ?

Docker is a tool that lets you run software in isolated environments, called containers. These containers are based on images, prebuilt environments that contain everything the software needs to run (OS, libraries, Python, etc.). For example, the command docker run hello-world creates and runs a container from the official hello-world image. It download the image, create the container and execute the default command of the image.

Installing Docker

Installing Docker is a complex step and we strongly recommend asking your IT support.

For Windows you can use Docker Desktop.

Biom3d use Linux-based images, you must enable WSL2 Back-end or Hyper-V backend to run them.

Docker Desktop is also available on MacOs but Biom3d doesn’t have arm64 images yet.

For Linux, the installation process depends on the distribution.

With Docker Desktop

Unfortunatly, the images aren’t usable with Docker Desktop as it doesn’t allow to make a run with container parameters. However, it is a good mean to install Docker on your machine.

With command lines

Basic Docker commands

This section introduces the essential Docker commands. If you are already used to Docker you can skip to Running a Biom3d container.

Docker is separated in several submodules and three of them will be used in this tutorial : image, container and build.

Image

The image submodule is here to manipulate images. An image is a blueprint, it contains the whole environment, pre-built, ready-to-use.

It is defined by two part : the image name (or repository) and the tag. For example let’s take ubuntu:22.04 where ubuntu is the image name and 22.04, is the tag. The tag is here to differentiate the differents versions of the same image.

Those images are stored in repository such as DockerHub or directly on your machine. To download an image on your computer you use docker pull such as :

  docker pull helloworld
  docker pull ubuntu:22.04
  docker pull pytorch/pytorch:2.3.1-cuda11.8-cudnn8-runtime

It will download it from distant repository, default being DockerHub.

Once you have the image on your machine you can see it by doing docker image ls :

  docker image ls
  REPOSITORY        TAG                             IMAGE ID       CREATED              SIZE
  ubuntu            22.04                           1b668a2d748d   2 weeks ago          77.9MB
  pytorch/pytorch   2.7.1-cuda11.8-cudnn9-runtime   cc0fe24aee5e   4 weeks ago          6.48GB

If you want to remove an image (to liberate disk space) you can do docker image rm with an identifier:

  docker image rm ubuntu # The images from ubuntu repository
  docker image rm ubuntu:22.04 # A specific image by full name
  docker image rm 1b668a2d748d # A specific image by ID

With that we covered the basics on image managment. We can now see how to use them.

Container

The container submodule is the one used by default when you don’t specify one, which mean that those two commands are equivalent :

  docker run helloworld
  docker container run helloworld
Running a container

Container are instances of an image, you can run one of them with docker run :

  docker run docker_arguments image_name image_argument

With some examples :

  docker run --rm helloworld
  docker run biom3d pred -i foo -o bar
  docker run ubuntu
  docker run ubuntu:22.04

Let’s go break down our docker run docker_arguments image_name image_argument :

  1. Docker arguments, they describe how the container should run. Here are the most commons :

    • --rm destroy the container once its job is finished.

    • -n foo or --name foo give a name to the container (here foo).

    • -e FOO=bar set an environment variable in the container

    • -v absolute_path_on_machine:absolute_path_in_container attach a volume to the container. For example -v /home/me:/home/me or -v C:\users\me:/home/me will link your user folder to the user folder in the container, which mean that any modification done in this folder in the container is also done outside and reverse. Another example is -v $(pwd):/data will link the folder you are in to the /datafolder. Be careful to not link a folder you don’t want to modify or be sure that the things you do in the container doesn’t modify it.

    Those are the basics in Docker argument, you can find a complete list here

  2. Image name, seen earlier. The image is pulled if not on your machine and by default the tag used is latest if not specified. Biom3d doesn’t have a latest tag, so a precise tag must be given, which means :

  docker run biom3d # Will not work
  docker run biom3d:v1.0.0-x86_64-torch2.7.1-cpu # Will work
  1. Image arguments, they depends on the image entrypoint that is by default not defined. This entrypoint describe what script is launch when you do a run on an image. Biom3d containers have Biom3d as entry point which mean they want to know the module you want to use.

  docker run biom3d:v1.0.0-x86_64-torch2.7.1-cpu pred -i foo -o bar # Will run prediction on the foo folder and send it to bar
  docker run biom3d:v1.0.0-x86_64-torch2.7.1-cpu # Will crash because the entry point want you to say which submodule you want to use and then its parameters

Keep in mind that everything that is written after the image name will be treated as a command to execute in the container.

Listing containers

Now your container is running but how to monitor it ? Well you can do docker ps :

  docker container ls 
  # Or
  docker ps
  # Or better
  docker ps -a # Will show you even non running container
  CONTAINER ID   IMAGE          COMMAND                  CREATED             STATUS                      PORTS     NAMES
  00e85ccba457   ubuntu:22.04   "/bin/bash"              12 seconds ago      Exited (0) 8 seconds ago              cool_kowalevski
  73ec202bb921   ubuntu:22.04   "/bin/bash"              18 seconds ago      Exited (0) 14 seconds ago             stupefied_banach

It will show you the name, id and status (running, stopped,…) of every container.

Cleaning containers

Biom3d container stop once they finish their job (Status Exited like above) so they must be removed after by doing one of the following:

  docker rm my_biom3d_container_name
  # Or
  docker rm my_biom3d_container_id
  # Or run it with --rm arguments

If you want to remove a container while it is running you can do :

  # Proper way
  docker stop my_biom3d_container
  docker rm my_biom3d_container
  # More violent way
  docker rm my_biom3d_container -f

Here is a bonus : to remove all existing container you can do

  docker rm $(docker ps -aq) # Eventually -f

But it is irreversible so use it wisely.

Now you see the basics of running container. If you just want to use Biom3d you can directly go to running a Biom3d container, if you want to contribute on developement you may be interested on building

Building

If you want to contribute to Biom3d or make your custom images tailored to your needs, you’ll have to build an image. To do that you use a DockerFile, you can see some examples here.

A DockerFile is a plain text that describe how to build an image. It follow this structure :

  FROM baseimage # It is always the first line
  ENV FOO=Bar
  COPY relative_path_from_dockerfile absolute_path_in_image
  ADD source destination
  WORKDIR folder # Set the folder in which you locate the rest of the dockerfile or execution
  RUN command
  ENTRYPOINT my_entrypoint_script # Define the default command

Let’s break down :

  • FROM is always the first (or one of the first) command. It is the address of an existing Docker image, and you will add new thing to it.

  • ENV set an environment variable.

  • COPY copy the element at givent location on your drive to the given location in the image.

  • ADD work the same way as copy in a sense it copy the source to the destination in the image. But it can also take an URL as a source and it automically exract archives (.tar,tar.gz,…)

  • WORKDIR path allow you to go to the given path, it is equivalent to a cd path command. For example in the Biom3d DockerFiles, there is always at the end WORKDIR /workspace, it means that when you use the container it will save files in /workspace and that’s why we always want to attach a volume to /workspace

  • RUN command will run the command, following the base image syntaxe (shell for MacOs/Linux, batch or powershell on Windows), it is quite simple but there are some good pratices. One of those are to group RUN statement :

      # Don't do
      RUN mdir /foo
      RUN pip install biom3d
      # But
      RUN mkdir /foo \
      && pip install torch
      # Or on Windows images
      RUN mkdir /foo ^
      && pip install torch
    

    This will make your image smaller. But while you’re creating a new image, keep separated RUN statement, it will be easier to debug.

  • ENTRYPOINT is a script or program (shell, batch, python,..) that will be used each time you access the container.

You can also add arguments with the ARG FOO=default_value, and access the value with ${FOO} (or %FOO% if you’re using it in a Windows command).

Then you build the image with docker build

  docker build -f path_to_dockerfile .
  docker build . # Only build the file named Dockerfile
  docker build --build_arg FOO=not_bar -f path . # Will change the value of the given ARG
  docker build -f path ./subforlder # The ADD or COPY will be relative to path/subfolder instead of path 

With that you should be able to build your image if you need it, the hard part of Biom3d build is the managment of python dependencies (there are examples of DockerFiles on the git repository and a specific guide on Biom3d images here).

Running a Biom3d container

Now you should know some basics on how to use Docker, so here we will see how to use Biom3d container themselves. As explained earlier, Biom3d has a specific entrypoint that make containers not (easily) reusable, they are throwable container.

You run a command -> it executes -> it exits -> it gets removed.

Image tagging

As said earlier, Biom3d doesn’t have a latest tag, meaning that specific tag must be given.

  docker run biom3d # Will not work
  docker run biom3d:v1.0.0-x86_64-torch2.7.1-cpu # Will work

Here is how our tag is structured :

# CPU only image
biom3d:<Version>-<Architecture>-torch<torch version>-cpu
# GPU images
biom3d:<Version>-<Architecture>-torch<torch version>-cuda<cuda version>-cudnn<cudnn version>

Running

Here is the basic command for using Biom3d container :

  docker run --rm \
  -v folder_with_data_set:/workpace \
  biom3d:tag_relevant_with_hardware \
  module \
  module_argument \

  # Example
  docker run --rm\
  -v /home/me/dataset1:/workspace \
  biom3d:v1.0.0-x86_64-torch2.7.1-cpu \
  preprocess_train \
  --img_dir raw \ # Is a subfolder of /home/me/dataset1
  --img_dir pred  # Will use or create subfolder of /home/me/dataset1

Or on windows :

  docker run --rm ^
  -v folder_with_data_set:/workpace ^
  biom3d:tag_relevant_with_hardware ^
  module ^
  module_argument ^

  :: Example
  docker run --rm^
  -v C:\users\me\dataset1:/workspace ^
  biom3d:v1.0.0-x86_64-torch2.7.1-cpu ^
  preprocess_train ^
  --img_dir raw ^ :: Is a subfolder of C:\users\me\dataset1
  --img_dir pred  :: Will use or create subfolder of C:\users\me\dataset1

Here is a description.

  1. As said earlier, Biom3d container are one use only so we add --rm that allow us to automatically destroy the container at the end of execution.

  2. We want to transmit our dataset to the container so Biom3d can use it, the simpler solution is by mounting a volume with the folder containing the dataset with -v.

  3. We select the tag that is the most pertinent for your use case (CUDA drivers, Biom3d version, architecture,…).

  4. We pass the submodule and it’s argument (that are described in CLI Documentation).

GUI specificity

However there is a twist, if you want to use the GUI module, you have another thing to do, and it is giving the container access to you screen.

On Linux with :

  -e DISPLAY=$DISPLAY\ # Tell to use your screen
  # Transmit permission to container
  -v /tmp/.X11-unix:/tmp/.X11-unix:ro \
  -v $HOME/.Xauthority:/root/.Xauthority:ro \

  # That give 
  docker run --rm \
  -e DISPLAY=$DISPLAY \
  -v /tmp/.X11-unix:/tmp/.X11-unix:ro \
  -v $HOME/.Xauthority:/root/.Xauthority:ro \
  -v /home/me/dataset1:/workspace \
  biom3d:v1.0.0-x86_64-torch2.7.1-cpu gui

On Windows : On Windows, it is harder. You will first need to install a X server (like VcXsrv for example). Start it. Then you will need to know your IP address. In a terminal :

ipconfig

Ethernet adapter Ethernet:

   Connection-specific DNS Suffix  . : corp.example.local
   IPv4 Address. . . . . . . . . . . : 10.42.16.87
   Subnet Mask . . . . . . . . . . . : 255.255.252.0
   Default Gateway . . . . . . . . . : 10.42.16.1

Ethernet adapter vEthernet (WSL):

   Connection-specific DNS Suffix  . :
   Link-local IPv6 Address . . . . . : fe80::215:5aff:fe7a:3d0e%57
   IPv4 Address. . . . . . . . . . . : 172.29.112.1
   Subnet Mask . . . . . . . . . . . : 255.255.240.0
   Default Gateway . . . . . . . . . :

Don’t take the WSL one but the one of your machine.

Then :

  docker run --rm \
  -e DISPLAY=your_ip_address \
  -v /home/me/dataset1:/workspace \
  biom3d:v1.0.0-x86_64-torch2.7.1-cpu gui

GPU specificity

If you want to use a container with your GPU you have to meet two condition :

  • Use a container with the correct version of CUDA and cudNN

  • Use the container argument --gpus all, it will allow your container to interact with your GPU

      docker run --rm\
      -v /home/me/dataset1:/workspace \
      --gpus all \
      biom3d:v1.0.0-x86_64-torch2.7.1-cpu \
      preprocess_train \
      --img_dir raw \ # Is a subfolder of /home/me/dataset1
      --img_dir pred
    

Memory troubleshooting

If you encounter a memory problem. It can be two thing :

RAM problem

You don’t have enough RAM to allocate everything Biom3d needs. You can see the RAM accessible to your container by doing :

  # Create a permanent container
  docker run --rm\
  biom3d:v1.0.0-x86_64-torch2.7.1-cpu \
  gui

  docker inspect --format='{{.HostConfig.Memory}}' gui_container_identifier

It will give you an int, if it is 0 there is no memory limit and if you still don’t have enough free RAM it is either you don’t have enough on your computer or other processes use too much. Else it is the limit in Byte, the default value is 0 (unlimited).

You can set the limit with the --memory parameter.

  docker run --rm\
  -v /home/me/dataset1:/workspace \
  --memory=32g \
  biom3d:v1.0.0-x86_64-torch2.7.1-cpu \
  preprocess_train \
  --img_dir raw \ # Is a subfolder of /home/me/dataset1
  --img_dir pred

You you can also reduce the NUM_WORKERS variable in the model’s config file (model/log/config.yaml)

Shared memory problem

This problem is more common, it is caused by the multiprocessing used by Biom3d. First you need to know how much of shared memory your container can use :

  # Create a permanent container
  docker run --rm\
  biom3d:v1.0.0-x86_64-torch2.7.1-cpu \
  gui

  docker inspect --format='{{.HostConfig.ShmSize}}' gui_container_identifier

It will give you the shared memory size in Byte, default being 64 MiB, so around 67MB. You can augment it by using --shm-size.

  docker run --rm\
  -v /home/me/dataset1:/workspace \
  --shm-size=12g \
  biom3d:v1.0.0-x86_64-torch2.7.1-cpu \
  preprocess_train \
  --img_dir raw \ # Is a subfolder of /home/me/dataset1
  --img_dir pred

The amount of shared memory allowed will depends on your OS, whether it is RAM or VRAM (so your GPU). If it isn’t enough (or you don’t want to meddle with memory), you can reduce the NUM_WORKERS variable in the model’s config file (model/log/config.yaml).