
About Containerization: (Containerization, Docker, Docker-Compose, Kubernetes / K8s, etc.)

An introduction to container-related knowledge: (Containerization, Docker, Docker-Compose, Kubernetes / K8s, etc.)

Background

Almost all of my websites are deployed with Docker. When I talked with friends about how I built them, I found that many had heard of Docker or containers but didn't really know what they are.

In fact, when I first started using Docker I ran into many container-related concepts too, and because I had never practiced with them I was just as lost. Now that I have been using container technology for quite a while, I want to summarize what I have learned, in the hope that it helps others.

Containerization

What is containerization

Containerization is a lightweight, operating-system-level virtualization technology that packages an application together with its dependencies so that it can run consistently in different environments. Containerized applications can run anywhere: a developer's laptop, physical servers, virtual machines, container clusters, or public clouds.

Why containerization is needed

The definition alone can be hard to grasp, so let's use an example to illustrate why containerization is needed; the concept should then become much clearer.

First, if you are just a user of a program or service and don’t care how the program or service runs on the machine, you don’t need to understand the concept of containerization at all.

Second, if the program or service you run is simple and only needs to run on your own computer, you don't need containerization either.

So let's assume you are a developer (you don't really have to develop anything yourself; most beginners simply run programs they found on open-source sites or elsewhere). You have a program that provides some service, and you want to reach that service from different places. For example, you have built a website with a popular web framework, you want to open it from different devices in different places, and you want other people to be able to visit it too. So you deploy the website on your own computer; assuming your network operator gives you a static IP and an open port, your website is now reachable on the public Internet, and you can access it from anywhere with an Internet connection.

After a while, you bought a domain name and bound it to your website using a web server (such as Nginx or Apache), so you can access your website through the domain name.

After a while, you found that you had deployed more and more services, your website was becoming more and more popular, and the number of visitors kept growing. Your underpowered computer could no longer handle so many visitors, so you decided to buy a more powerful computer or server and move your website there.

So you assembled a server, installed Linux, and started migrating the website. You copied the website files from the old computer to the new one and tried to run them, only to find that the website wouldn't run at all. Your website has many dependencies, such as Python libraries and assorted other packages, and you have to install them one by one. You then discover that many of the default library versions are newer than the ones on the old computer and are simply not compatible with your old code, so you have to hunt down the old versions on the Internet and install them one by one. If you are lucky, the old versions install smoothly; if you are unlucky, your operating system itself depends on some of those libraries and breaks after you downgrade them.

And that is not the end of it. Even after you solve the dependency problems, you still have to reconfigure the database, web server, and everything else you had set up for the website on the original computer.

At this point you may think: wouldn't it be great if you could just pack up the website, the database, the web server, and all their dependencies and runtime environment on the original computer, and run that package directly on the new one? If you have used virtual machines before, you might think: isn't that just creating a virtual machine identical to the original computer and migrating the virtual machine?

But virtual machines are too heavy: a virtual machine contains a complete operating system, which takes up a lot of resources and starts slowly. We don't want to replicate the entire operating system; we only want to recreate the runtime environment the original program needs. That is exactly what containerization does.

How to containerize

Now that the goal is clear (package a program together with its dependencies and runtime environment, then run that package anywhere), containerization boils down to two steps: 1. packaging the program with its dependencies and runtime environment; 2. isolating an independent runtime environment on the target machine and running the packaged program in it.

Packaging

This step is relatively straightforward: bundle the program together with its dependencies and runtime environment. The resulting bundle is called an image.

Isolating the environment

How the environment is isolated depends on the operating system. On Linux, isolation relies on three kernel technologies: namespaces, control groups (cgroups), and chroot:

  • Namespace: Namespaces isolate what a process can see (process IDs, mount points, network interfaces, hostnames, and so on), so each process only sees its own view of the system and not that of other processes.
  • Control Group: Control groups limit the resources a process may use, such as CPU, memory, and disk I/O.
  • chroot: chroot changes the apparent root directory of a process, so it cannot see files outside that directory tree.

A detailed introduction to these three technologies is beyond the scope of this article, but the short sketch below gives a flavor of them; interested readers can dig deeper on their own.
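Here is a minimal shell sketch of these primitives (run as root on a Linux machine). It is only an illustration of the building blocks, not what Docker literally executes, and the ./rootfs directory is a hypothetical minimal root filesystem you would have to prepare yourself:

# Namespaces: start a shell with its own PID, mount and network view;
# inside it, `ps aux` only shows processes of the new namespaces
sudo unshare --pid --fork --mount --net --mount-proc bash

# Control groups (cgroup v2, assuming the cpu controller is enabled for
# this subtree): cap the current shell at half a CPU core
sudo mkdir /sys/fs/cgroup/demo
echo "50000 100000" | sudo tee /sys/fs/cgroup/demo/cpu.max
echo $$ | sudo tee /sys/fs/cgroup/demo/cgroup.procs

# chroot: make ./rootfs (a hypothetical minimal root filesystem) appear as /
sudo chroot ./rootfs /bin/sh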

Windows also has container technology, but it works differently from Linux. In Docker Desktop for Windows, containers run inside a lightweight virtual machine provided by Hyper-V (or, in newer versions, the WSL 2 backend).

Containers and virtual machines

From the above description, it can be seen that containers and virtual machines both provide the function of isolating the environment, and both can be called sandbox technology. But there are also significant differences between the two:

  • Containers are implemented at the operating system level and provide process-level isolation, and must share the operating system kernel with the host machine.
  • Virtual machines are implemented at the hardware level and provide operating system-level isolation, and can have their own operating system kernel.

Because containers share the operating system kernel with the host machine, the startup speed of containers (seconds) is much faster than that of virtual machines (minutes), and the resource usage of containers is also much lower than that of virtual machines.

Docker

The three technologies underlying containerization appeared in 1979 (chroot), 2002 (namespaces), and 2007 (cgroups), but none of them was originally designed for containerization. It wasn't until Docker appeared in 2013 and combined them that containerization started to become popular.

Docker is simple in composition and use, as shown in the figure below:

Docker usage

Docker’s composition

The figure contains the three core concepts of Docker: image, container, and repository.

  • Image: An image is a read-only template that contains everything needed to run a program, including code, runtime, libraries, environment variables, and configuration files. The image is the basis of the container.
  • Repository: A repository is a place to store images and can be understood as a collection of images. Repositories can be public or private: anyone can download images from a public repository (and registered users can publish their own), while a private repository can only be used by its owner.
  • Container: A container is a running instance of an image: the image plus everything it needs at runtime, such as a writable file system, system environment, and network configuration. The container is the image in its running state.

Docker usage (single container)

There are two ways to use Docker: using someone else’s image and making your own image.

Using someone else’s image

Using someone else’s image is very simple, just two steps:

  1. Download the image: Use the docker pull command to download the image, for example:

    docker pull ubuntu:latest
    

    This command will download an image named ubuntu from Docker Hub, and the tag latest indicates that the downloaded image is the latest version.

  2. Run the container: Use the docker run command to run the container, for example:

    docker run -it --rm ubuntu:latest /bin/bash
    

    The command above starts a container from the ubuntu:latest image and drops you into a bash shell inside it; -it attaches an interactive terminal, and --rm automatically removes the container when you exit.
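Two inspection commands are also worth knowing at this point: docker images lists the images stored on your machine, and docker ps lists containers (add -a to include stopped ones):

docker images    # local images, e.g. ubuntu:latest
docker ps -a     # all containers, running or stopped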

Using your own image

Using your own image requires three steps:

  1. Write a Dockerfile: The Dockerfile is a text file that contains a series of commands used to build the image. For example:

    FROM python:3.12
    WORKDIR /app
    COPY . /app
    RUN pip install -r requirements.txt
    CMD ["python", "app.py"]
    

    In the Dockerfile above, FROM sets the base image to python:3.12, WORKDIR sets the working directory to /app, COPY copies everything in the current directory (the build context) into /app, RUN installs the dependencies listed in requirements.txt, and CMD specifies the command to run when the container starts.

  2. Build the image: Use the docker build command to build the image, for example:

    docker build -t myapp .
    

    The above command builds an image tagged myapp (-t) using the Dockerfile in the current directory; the trailing . specifies the build context.

  3. Run the container: Use the docker run command to run the container, for example:

    docker run -d -p 5000:5000 myapp
    

    The above command starts a container from the myapp image in the background (-d) and maps port 5000 inside the container to port 5000 on the host; a quick way to check that it is running is shown below.
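As a quick sanity check (a sketch, assuming the hypothetical app.py actually serves HTTP on port 5000, which is not shown here), you can verify that the container came up:

docker ps                     # the new container should appear in the list
docker logs <container-id>    # application output and errors
curl http://localhost:5000/   # should reach the app if it listens on port 5000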

Docker usage (multiple containers)

In most cases, a single container is not enough to deploy a service. For example, a personal cloud storage service might need one container for the application and another for its database, and more complex services may require even more containers.

The technique of deploying and managing multiple containers together is called container orchestration, and Docker provides a tool called docker-compose for it.

docker-compose installation

When the Docker client was first released, it had no container orchestration capability of its own, so a separate docker-compose tool was developed for the job. This is what is now called the first version (v1) of docker-compose.

Later, Docker integrated this capability into the Docker CLI itself (as the docker compose plugin); this is known as the second version (v2) of docker-compose.

On Ubuntu 22.04, docker-compose can be installed with the apt package manager. Running apt search docker-compose shows three related packages in the apt sources:

$ apt search docker-compose       
Sorting... Done
Full Text Search... Done
docker-compose/jammy,jammy,now 1.29.2-1 all [installed]
  define and run multi-container Docker applications with YAML

docker-compose-plugin/jammy 2.24.5-1~ubuntu.22.04~jammy amd64 [upgradable from: 2.24.1-1~ubuntu.22.04~jammy]
  Docker Compose (V2) plugin for the Docker CLI.

docker-compose-v2/jammy-updates 2.20.2+ds1-0ubuntu1~22.04.1 amd64
  tool for running multi-container applications on Docker

The first package is the first version (v1), which we won't consider. The second and third are both the second version (v2):

  • docker-compose-plugin is a plugin for the Docker CLI, and is invoked as docker compose
  • docker-compose-v2 is a standalone tool, and is invoked as docker-compose

We generally choose to install the third one, because many tutorials on the Internet use the docker-compose command, not the docker compose command.
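On Ubuntu 22.04, a minimal installation sketch looks like this (assuming Docker itself is already installed; the package name comes from the apt search output above):

sudo apt install docker-compose-v2
docker-compose version    # should print a 2.x version number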

docker-compose usage

docker-compose uses a file named docker-compose.yml to define the configuration of multiple containers. For example, when deploying a personal cloud storage service with Nextcloud, the docker-compose.yml file we use looks like this:

version: '3'

services:
  db:
    image: mariadb
    container_name: nextcloud-mariadb
    networks:
      - nextcloud_network
    volumes:
      - ./db:/var/lib/mysql
      - /etc/localtime:/etc/localtime:ro
    environment:
      - MYSQL_ROOT_PASSWORD=PASSWORD1
      - MYSQL_PASSWORD=PASSWORD2
      - MYSQL_DATABASE=nextcloud
      - MYSQL_USER=nextcloud
    restart: unless-stopped

  app:
    image: nextcloud:latest
    container_name: nextcloud-app
    networks:
      - nextcloud_network
    ports:
      - 7080:80
    depends_on:
      - db
    volumes:
      - ./nextcloud:/var/www/html
      - ./app/config:/var/www/html/config
      - ./app/custom_apps:/var/www/html/custom_apps
      - ./app/data:/var/www/html/data
      - ./app/themes:/var/www/html/themes
      - /etc/localtime:/etc/localtime:ro
    environment:
      - VIRTUAL_HOST=your.cloud.domain.name
    restart: unless-stopped

networks:
  nextcloud_network:

This file defines two services: db and app. The db service uses the mariadb image and the app service uses the nextcloud image; both are attached to the nextcloud_network network. The app service publishes the container's port 80 on host port 7080 and declares depends_on: db, so the database container is started first.

We can use the docker-compose command to start these two services:

docker-compose up -d

If you want to stop the two services again, use docker-compose down, which stops and removes the containers (and the network created for them):

docker-compose down
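
A few other docker-compose commands are handy for day-to-day maintenance (run them from the directory containing docker-compose.yml):

docker-compose ps           # show the state of the services
docker-compose logs -f app  # follow the logs of the app service
docker-compose pull         # fetch newer versions of the images
docker-compose up -d        # recreate containers whose image has changed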

Kubernetes / K8s

As mentioned earlier, docker-compose is relatively simple and is well suited to small projects that run on a single computer or server. For large projects spanning clusters of machines, docker-compose falls short, and that is where Kubernetes (K8s) comes in.

Kubernetes (K8s, because there are 8 letters between K and s) is an open-source container orchestration engine that can automate the deployment, scaling, and management of containerized applications, and its functionality is very powerful.

It should be noted that container standards are defined by the Open Container Initiative (OCI), and Docker is just one implementation of those standards. Kubernetes used to rely on Docker as its container runtime, but it no longer depends on Docker and now typically uses containerd (another OCI-compliant runtime) instead.

I haven’t used Kubernetes yet, so I won’t go into detail here.
