# How to use Biom3d in Docker

Biom3d now has Docker images, available [here](https://hub.docker.com/r/gumougeot/biom3d). This tutorial walks through a few simple use cases of these images.

## What is Docker?

Docker is a tool that lets you run software in **isolated environments**, called **containers**. These containers are based on **images**: prebuilt environments that contain everything the software needs to run (OS, libraries, Python, etc.).

For example, the command `docker run hello-world` creates and runs a container from the official hello-world image. It downloads the image (if it is not already on your machine), creates the container and executes the image's default command.

## Installing Docker

Installing Docker can be complex, and we strongly recommend asking your IT support for help.

For Windows, you can use [Docker Desktop](https://docs.docker.com/desktop/setup/install/windows-install/).

> Biom3d uses Linux-based images, so you must enable the WSL 2 backend or the Hyper-V backend to run them.

> Docker Desktop is also available on macOS, but Biom3d doesn't have arm64 images yet.

For Linux, the installation process depends on the distribution.

## With Docker Desktop

Unfortunately, the images can't be used from the Docker Desktop interface, because it doesn't let you pass arguments to the container when you run an image. It is, however, a good way to install Docker on your machine.

## With command lines

### Basic Docker commands

This section introduces the essential Docker commands. If you are already familiar with Docker, you can skip to [Running a Biom3d container](#running-a-biom3d-container).

Docker is split into several submodules, and this tutorial uses three of them: `image`, `container` and `build`.

#### Image

The `image` submodule is used to manipulate images. An image is a blueprint: it contains the whole environment, pre-built and ready to use. It is identified by two parts: the image name (or repository) and the tag. For example, in `ubuntu:22.04`, `ubuntu` is the image name and `22.04` is the tag.
The tag distinguishes the different versions of the same image. Images are stored in registries such as [DockerHub](https://hub.docker.com/), or directly on your machine.

To download an image to your computer, use `docker pull`:

```shell
docker pull hello-world
docker pull ubuntu:22.04
docker pull pytorch/pytorch:2.7.1-cuda11.8-cudnn9-runtime
```

This downloads the image from a remote registry, DockerHub by default. Once the image is on your machine, you can see it with `docker image ls`:

```shell
docker image ls
REPOSITORY        TAG                             IMAGE ID       CREATED       SIZE
ubuntu            22.04                           1b668a2d748d   2 weeks ago   77.9MB
pytorch/pytorch   2.7.1-cuda11.8-cudnn9-runtime   cc0fe24aee5e   4 weeks ago   6.48GB
```

To remove an image (to free disk space), use `docker image rm` with an identifier:

```shell
docker image rm ubuntu          # ubuntu:latest (the tag defaults to latest)
docker image rm ubuntu:22.04    # A specific image by full name
docker image rm 1b668a2d748d    # A specific image by ID
```

That covers the basics of image management. Let's now see how to use images.

#### Container

Several `container` commands can be used without the `container` keyword. For example, these two commands are equivalent:

```shell
docker run hello-world
docker container run hello-world
```

##### Running a container

Containers are instances of an image. You start one with `docker run`:

```shell
docker run docker_arguments image_name image_arguments
```

Some examples:

```bash
docker run --rm hello-world
docker run --rm gumougeot/biom3d:v1.0.0-x86_64-torch2.7.1-cpu pred -i foo -o bar
docker run ubuntu
docker run ubuntu:22.04
```

Let's break down `docker run docker_arguments image_name image_arguments`:

1. **Docker arguments** describe how the container should run. The most common ones are:
   - `--rm` removes the container once it exits.
   - `--name foo` gives the container a name (here `foo`).
   - `-e FOO=bar` sets an environment variable in the container.
   - `-v absolute_path_on_machine:absolute_path_in_container` mounts a folder of your machine into the container. For example, `-v /home/me:/home/me` or `-v C:\users\me:/home/me` links your user folder to the user folder in the container: any change made to this folder inside the container also happens outside, and vice versa. Similarly, `-v "$(pwd)":/data` links the folder you are in to the `/data` folder. Be careful not to mount a folder you don't want modified, or make sure that what you run in the container doesn't modify it. You can also append `:ro` to mount a folder read-only.

   Those are the basic Docker arguments; you can find the complete list [here](https://docs.docker.com/reference/cli/docker/container/run/).
2. **Image name**, seen earlier. If the image is not on your machine, it is pulled, and if no tag is given, `latest` is used. Biom3d doesn't have a `latest` tag, so a precise tag must be given:

   ```shell
   docker pull gumougeot/biom3d                                # Will not work
   docker pull gumougeot/biom3d:v1.0.0-x86_64-torch2.7.1-cpu   # Will work
   ```

3. **Image arguments**: their meaning depends on the image's entrypoint (if the image doesn't define one, they replace the image's default command). The entrypoint is the program launched when a container starts from the image. Biom3d images use Biom3d as their entrypoint, so the first argument must be the Biom3d module you want to use, followed by its parameters:

   ```bash
   # Runs prediction on the foo folder and writes the results to bar
   docker run gumougeot/biom3d:v1.0.0-x86_64-torch2.7.1-cpu pred -i foo -o bar
   # Fails: the entrypoint expects a module name followed by its parameters
   docker run gumougeot/biom3d:v1.0.0-x86_64-torch2.7.1-cpu
   ```

Keep in mind that everything written **after** the image name is passed to the container (for Biom3d images, as arguments to the entrypoint), not to Docker. Docker arguments must therefore come before the image name.

##### Listing containers

Now that your container is running, how do you monitor it?
Use `docker ps`:

```shell
docker container ls
# Or
docker ps
# Or better
docker ps -a  # Also shows stopped containers
CONTAINER ID   IMAGE          COMMAND       CREATED          STATUS                      PORTS     NAMES
00e85ccba457   ubuntu:22.04   "/bin/bash"   12 seconds ago   Exited (0) 8 seconds ago              cool_kowalevski
73ec202bb921   ubuntu:22.04   "/bin/bash"   18 seconds ago   Exited (0) 14 seconds ago             stupefied_banach
```

It shows the name, ID and status (running, exited, ...) of every container.

##### Cleaning containers

Biom3d containers stop once they finish their job (status `Exited`, as above). A stopped container is not deleted unless it was started with `--rm`, so remove it with one of the following:

```shell
docker rm my_biom3d_container_name
# Or
docker rm my_biom3d_container_id
# Or start it with the --rm argument in the first place
```

To remove a container while it is running:

```shell
# Proper way
docker stop my_biom3d_container
docker rm my_biom3d_container
# More violent way
docker rm -f my_biom3d_container
```

Bonus: to remove all existing containers, you can run

```shell
docker rm $(docker ps -aq)  # Add -f to also remove running containers
```

This is irreversible, so use it wisely.

You now know the basics of running containers. If you just want to use Biom3d, you can go directly to [Running a Biom3d container](#running-a-biom3d-container). If you want to contribute to its development, you may be interested in [Building](#building).

(building)=
#### Building

If you want to contribute to Biom3d or make custom images tailored to your needs, you will have to build an image. To do that, you write a `Dockerfile`; you can find some examples [here](https://github.com/GuillaumeMougeot/biom3d/tree/deployment/deployment/dockerfiles/examples). A Dockerfile is a plain-text file that describes how to build an image.
It follows this structure:

```Dockerfile
# Always the first instruction
FROM baseimage
ENV FOO=Bar
COPY relative_path_in_build_context absolute_path_in_image
ADD source destination
# Set the folder in which the next instructions and the container run
WORKDIR folder
RUN command
# Define the default command
ENTRYPOINT my_entrypoint_script
```

Comments must be on their own line: a `#` placed after an instruction is not treated as a comment.

Let's break it down:

- `FROM` is always the first instruction (only `ARG` instructions, comments and parser directives can come before it). It names an existing Docker image that your image builds upon.
- `ENV` sets an environment variable.
- `COPY` copies files from the build context (the folder given to `docker build`) to the given location in the image.
- `ADD` works like `COPY`: it copies the source to the destination in the image. It can also take a URL as source, and it automatically extracts local archives (`.tar`, `.tar.gz`, ...).
- `WORKDIR path` sets the working directory for the following instructions and for the container, much like a `cd path` command. For example, the Biom3d Dockerfiles always end with `WORKDIR /workspace`: when you use the container, it reads and saves files relative to `/workspace`, which is why we always mount a volume on `/workspace`.
- `RUN command` runs the command in the image's shell (`/bin/sh -c` by default on Linux images, `cmd` on Windows images; the `SHELL` instruction can change it, for instance to PowerShell). It is quite simple, but there are some good practices. One of them is to group `RUN` instructions:

  ```Dockerfile
  # Don't do
  RUN mkdir /foo
  RUN pip install biom3d

  # But
  RUN mkdir /foo \
      && pip install biom3d

  # Or on Windows images, after putting the parser directive "# escape=`"
  # on the first line of the Dockerfile
  RUN mkdir C:\foo `
      && pip install biom3d
  ```

  Each `RUN` instruction creates a new layer. Grouping commands reduces the number of layers, and files deleted within the same `RUN` never end up in the image, so the image stays smaller. While you are still developing a new image, however, keep `RUN` instructions separate: it makes debugging easier.
- `ENTRYPOINT` sets the script or program (shell, batch, Python, ...) that is executed each time a container starts from the image. The arguments given after the image name in `docker run` are passed to it.

You can also declare build arguments with `ARG FOO=default_value` and access their value with `${FOO}` (or `%FOO%` inside a `RUN` command on a Windows image).
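To see these instructions together, here is a minimal sketch that writes a complete Dockerfile with a small shell function. It is a hypothetical example, not one of the Biom3d Dockerfiles: the `ubuntu` base image, the `UBUNTU_TAG` argument, the `write_example_dockerfile` name and the `echo` entrypoint are illustrative choices.

```shell
#!/bin/sh
# Hypothetical example (not a Biom3d Dockerfile): writes a minimal Dockerfile
# into the folder given as first argument, creating the folder if needed.
write_example_dockerfile() {
    mkdir -p "$1"
    # The quoted 'EOF' keeps ${UBUNTU_TAG} and the backslashes literal,
    # so Docker expands them at build time, not the shell.
    cat > "$1/Dockerfile" <<'EOF'
# An ARG declared before FROM can be used in the FROM line
ARG UBUNTU_TAG=22.04
FROM ubuntu:${UBUNTU_TAG}

ENV LANG=C.UTF-8

# Grouped commands: one layer, and the apt cache is deleted in that same layer
RUN apt-get update \
    && apt-get install -y --no-install-recommends python3 \
    && rm -rf /var/lib/apt/lists/*

# Later instructions and the container itself run in /workspace
WORKDIR /workspace

# Arguments written after the image name in docker run are appended here
ENTRYPOINT ["echo", "Arguments received:"]
EOF
}
```

For example, after `write_example_dockerfile my_image`, running `docker build -t my_image:test my_image` and then `docker run --rm my_image:test hello` should print `Arguments received: hello`. Adding `--build-arg UBUNTU_TAG=24.04` to the build command switches the base image without editing the file.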
Then you build the image with `docker build`:

```shell
docker build -f path_to_dockerfile .
docker build .                                   # Uses the file named Dockerfile in the build context
docker build -t my_image:my_tag .                # Names and tags the resulting image
docker build --build-arg FOO=not_bar -f path .   # Changes the value of the given ARG
docker build -f path ./subfolder                 # COPY and ADD sources are relative to ./subfolder instead of .
```

With that, you should be able to build your own image if you need to. The hard part of building Biom3d images is managing the Python dependencies. There are example Dockerfiles in the [git repository](https://github.com/GuillaumeMougeot/biom3d) and a specific guide on Biom3d images [here](../dep/docker).

### Running a Biom3d container

Now that you know the basics of Docker, let's see how to use the Biom3d containers themselves.

As explained earlier, Biom3d images have a specific entrypoint that makes their containers hard to reuse: they are disposable. You run a command -> it executes -> the container exits -> it gets removed.

#### Image tagging

As said earlier, Biom3d doesn't have a `latest` tag, so a specific tag must be given:
```shell
docker pull gumougeot/biom3d                                # Will not work
docker pull gumougeot/biom3d:v1.0.0-x86_64-torch2.7.1-cpu   # Will work
```

Our tags are structured as follows:

```text
# CPU-only images
gumougeot/biom3d:<biom3d_version>-<architecture>-torch<torch_version>-cpu
# GPU images
gumougeot/biom3d:<biom3d_version>-<architecture>-torch<torch_version>-cuda<cuda_version>-cudnn<cudnn_version>
```

#### Running

Here is the basic command for using a Biom3d container:

```shell
docker run --rm \
    -v folder_with_dataset:/workspace \
    gumougeot/biom3d:tag_matching_your_hardware \
    module \
    module_arguments

# Example: raw and masks are subfolders of /home/me/dataset1
docker run --rm \
    -v /home/me/dataset1:/workspace \
    gumougeot/biom3d:v1.0.0-x86_64-torch2.7.1-cpu \
    preprocess_train \
    --img_dir raw \
    --msk_dir masks
```

Or on Windows:

```batch
docker run --rm ^
    -v folder_with_dataset:/workspace ^
    gumougeot/biom3d:tag_matching_your_hardware ^
    module ^
    module_arguments

:: Example: raw and masks are subfolders of C:\users\me\dataset1
docker run --rm ^
    -v C:\users\me\dataset1:/workspace ^
    gumougeot/biom3d:v1.0.0-x86_64-torch2.7.1-cpu ^
    preprocess_train ^
    --img_dir raw ^
    --msk_dir masks
```

Here is what each part does:

1. As said earlier, Biom3d containers are single-use, so we add `--rm` to destroy the container automatically at the end of the execution.
2. We want to give our dataset to the container so Biom3d can use it. The simplest solution is to mount the folder containing the dataset on `/workspace` with `-v`. Since `/workspace` is the container's working directory, relative paths such as `raw` point inside your dataset folder, and the files Biom3d creates there end up in your dataset folder.
3. We select the tag that best matches your setup (Biom3d version, architecture, CUDA version, ...).
4. We pass the module and its arguments (described in the [CLI documentation](../tuto/cli.md)).

#### GUI specificity

There is a twist if you want to use the `gui` module: you also have to give the container access to your screen.
On Linux:

```shell
# Options to add:
#   -e DISPLAY="$DISPLAY"                         tells the container which screen to use
#   -v /tmp/.X11-unix:/tmp/.X11-unix:ro           shares the X11 socket
#   -v "$HOME/.Xauthority":/root/.Xauthority:ro   gives the container your display permissions

# Which gives
docker run --rm \
    -e DISPLAY="$DISPLAY" \
    -v /tmp/.X11-unix:/tmp/.X11-unix:ro \
    -v "$HOME/.Xauthority":/root/.Xauthority:ro \
    -v /home/me/dataset1:/workspace \
    gumougeot/biom3d:v1.0.0-x86_64-torch2.7.1-cpu gui
```

On Windows, it is harder. You first need to install an X server (VcXsrv, for example) and start it. When launching it, tick *Disable access control* so that the container is allowed to connect. Then you need your IP address. In a terminal:

```batch
ipconfig

Ethernet adapter Ethernet:

   Connection-specific DNS Suffix  . : corp.example.local
   IPv4 Address. . . . . . . . . . . : 10.42.16.87
   Subnet Mask . . . . . . . . . . . : 255.255.252.0
   Default Gateway . . . . . . . . . : 10.42.16.1

Ethernet adapter vEthernet (WSL):

   Connection-specific DNS Suffix  . :
   Link-local IPv6 Address . . . . . : fe80::215:5aff:fe7a:3d0e%57
   IPv4 Address. . . . . . . . . . . : 172.29.112.1
   Subnet Mask . . . . . . . . . . . : 255.255.240.0
   Default Gateway . . . . . . . . . :
```

Use your machine's address (here `10.42.16.87`), not the WSL one. Then add the display number (`:0.0` for the first display) to the `DISPLAY` variable:

```batch
docker run --rm ^
    -e DISPLAY=your_ip_address:0.0 ^
    -v C:\users\me\dataset1:/workspace ^
    gumougeot/biom3d:v1.0.0-x86_64-torch2.7.1-cpu gui
```

#### GPU specificity

To use a container with your GPU, you have to meet two conditions:

- Use an image whose CUDA and cuDNN versions are compatible with your GPU and its driver (a `cuda` tag, not a `cpu` one).
- Add the Docker argument `--gpus all`, which gives the container access to your GPUs. On Linux, this requires the NVIDIA Container Toolkit.

```bash
# raw and masks are subfolders of /home/me/dataset1
docker run --rm \
    -v /home/me/dataset1:/workspace \
    --gpus all \
    gumougeot/biom3d:cuda_tag_matching_your_driver \
    preprocess_train \
    --img_dir raw \
    --msk_dir masks
```

#### Memory troubleshooting

If you run into a memory problem, there are two likely causes.

##### RAM problem

You don't have enough RAM to allocate everything Biom3d needs.
You can check the RAM limit of a container with `docker inspect`:

```bash
# Create a container without starting it (it is not removed automatically)
docker create --name biom3d_check gumougeot/biom3d:v1.0.0-x86_64-torch2.7.1-cpu gui
docker inspect --format='{{.HostConfig.Memory}}' biom3d_check
# Remove it once you are done
docker rm biom3d_check
```

It prints an integer: the memory limit in bytes. The default value, `0`, means there is no limit. In that case, if Biom3d still runs out of RAM, either your computer doesn't have enough or other processes are using too much. While a container is running, `docker stats` also shows its memory usage and limit.

You can set the limit with the `--memory` argument:

```bash
docker run --rm \
    -v /home/me/dataset1:/workspace \
    --memory=32g \
    gumougeot/biom3d:v1.0.0-x86_64-torch2.7.1-cpu \
    preprocess_train \
    --img_dir raw \
    --msk_dir masks
```

You can also reduce the `NUM_WORKERS` variable in the model's config file (`model/log/config.yaml`).

##### Shared memory problem

This problem is more common. It is caused by the multiprocessing used by Biom3d: the data-loading worker processes exchange data through shared memory (`/dev/shm`). First, check how much shared memory your container can use:

```bash
# Create a container without starting it (it is not removed automatically)
docker create --name biom3d_check gumougeot/biom3d:v1.0.0-x86_64-torch2.7.1-cpu gui
docker inspect --format='{{.HostConfig.ShmSize}}' biom3d_check
docker rm biom3d_check
```

It prints the shared memory size in bytes. The default is 64 MiB (67108864 bytes, about 67 MB). You can increase it with `--shm-size`:

```bash
docker run --rm \
    -v /home/me/dataset1:/workspace \
    --shm-size=12g \
    gumougeot/biom3d:v1.0.0-x86_64-torch2.7.1-cpu \
    preprocess_train \
    --img_dir raw \
    --msk_dir masks
```

Shared memory is held in RAM, so the size you can actually use is bounded by the memory available to Docker. If it isn't enough (or you don't want to meddle with memory settings), you can reduce the `NUM_WORKERS` variable in the model's config file (`model/log/config.yaml`).
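The options from the previous sections can be combined in a small wrapper so that you don't have to retype them. Here is a minimal sketch for a POSIX shell (Linux or macOS). The `biom3d_run` function and its `SHM_SIZE`, `MEMORY`, `USE_GPU` and `DRY_RUN` variables are hypothetical and not part of Biom3d. The image name is the DockerHub repository linked at the top of this guide.

```shell
#!/bin/sh
# Hypothetical helper (not part of Biom3d): builds the docker run command
# described in this guide.
#   Usage: biom3d_run dataset_dir tag module [module_arguments...]
#   Optional environment variables: SHM_SIZE, MEMORY, USE_GPU=1, DRY_RUN=1
biom3d_run() {
    local dataset_dir=$1 tag=$2
    shift 2
    # Build the argument list from the end: image name and Biom3d arguments first...
    set -- "gumougeot/biom3d:${tag}" "$@"
    # ...then prepend the optional Docker arguments, which must precede the image name
    if [ "${USE_GPU:-0}" = 1 ]; then set -- --gpus all "$@"; fi
    if [ -n "${MEMORY:-}" ]; then set -- "--memory=${MEMORY}" "$@"; fi
    if [ -n "${SHM_SIZE:-}" ]; then set -- "--shm-size=${SHM_SIZE}" "$@"; fi
    set -- docker run --rm -v "${dataset_dir}:/workspace" "$@"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        # Print the command instead of running it
        printf '%s\n' "$*"
    else
        "$@"
    fi
}
```

For example, `DRY_RUN=1 SHM_SIZE=12g biom3d_run /home/me/dataset1 v1.0.0-x86_64-torch2.7.1-cpu preprocess_train --img_dir raw` prints the command, and the same call without `DRY_RUN=1` runs it. Building the command in the positional parameters with `set --`, rather than in a string, keeps paths that contain spaces intact when the command is executed.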