Containers

Containerization is a technology that allows us to run multiple application processes or multiple services (for microservices) on the same host machine while –

  • providing different environments for each of them
  • and also isolating them from each other.

Containers are more efficient than Virtual Machines

Containerization is way more efficient in terms of resource utilization as –

  • processes in containers share the host kernel, so no guest OS has to run per container
  • much less overhead than a VM
  • faster boot time than a VM
  • near-native performance

How do containers isolate processes

Processes running inside containers still end up running on the same host machine. How does containerization isolate them?

Container runtimes rely on two Linux kernel mechanisms –

  • Namespaces
  • cgroups

Linux Namespaces

In Linux, there’s this idea of namespaces, where each kind of resource (e.g. process IDs, users, network interfaces etc.) belongs to a namespace. For simplicity, think of a namespace as a group of resources.

We can create additional namespaces and run a new process inside them.

A running process only sees the resources that belong to the namespaces it is running in.

Processes running in different namespaces feel as if they’re running on different machines.
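
To make this a bit more concrete, here is a minimal sketch that lists the namespaces the current process belongs to. It assumes a Linux host where /proc is mounted; the function name current_namespaces is just something made up for this example.

```python
import os

NS_DIR = "/proc/self/ns"  # Linux-only; each entry is a symlink such as "pid:[4026531836]"


def current_namespaces(ns_dir: str = NS_DIR) -> dict[str, str]:
    """Map namespace type (pid, net, mnt, ...) to the namespace this process is in."""
    if not os.path.isdir(ns_dir):
        return {}  # not on Linux, or /proc is not mounted
    namespaces = {}
    for name in sorted(os.listdir(ns_dir)):
        try:
            namespaces[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            continue  # entry vanished or is not readable; skip it
    return namespaces


if __name__ == "__main__":
    for ns_type, ns_id in current_namespaces().items():
        print(f"{ns_type:>18}  {ns_id}")
```

Two processes that print the same identifier for, say, net are in the same network namespace. Run it once on the host and once inside a container and the identifiers will usually differ.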

Linux cgroups

cgroups (control groups) is a Linux kernel feature that limits how much of a system resource, like CPU or RAM, a process or a group of processes can use.

Hence a process can’t hog resources reserved for other processes, which gives better isolation, as if they’re running on separate machines.
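
As a rough illustration, the sketch below reads the memory and CPU limits that cgroups impose on the current process. It assumes cgroup v2 mounted at /sys/fs/cgroup and is meant to be run inside a container, where that directory is the container's own cgroup. On a host's root cgroup, or with cgroup v1, these files are missing and the sketch just reports None. The helper names are made up for this example.

```python
from pathlib import Path
from typing import Optional

CGROUP_ROOT = Path("/sys/fs/cgroup")  # assumed cgroup v2 mount point


def parse_memory_max(text: str) -> Optional[int]:
    """Return the memory limit in bytes, or None when the value is "max" (no limit)."""
    value = text.strip()
    return None if value == "max" else int(value)


def parse_cpu_max(text: str) -> Optional[float]:
    """Return the CPU limit as a number of CPUs, or None when unlimited.

    cpu.max holds "<quota> <period>" in microseconds, e.g. "50000 100000" is half a CPU.
    """
    quota, period = text.split()
    return None if quota == "max" else int(quota) / int(period)


def read_limit(filename: str, parser):
    """Read one cgroup v2 control file, returning None if it is missing or malformed."""
    path = CGROUP_ROOT / filename
    try:
        return parser(path.read_text())
    except (OSError, ValueError):
        return None  # e.g. root cgroup, cgroup v1, or not Linux


if __name__ == "__main__":
    print("memory limit (bytes):", read_limit("memory.max", parse_memory_max))
    print("cpu limit (CPUs):", read_limit("cpu.max", parse_cpu_max))
```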

Multiple containers are better than multiple processes in a container

It’s always better to run one application process per container.

For example, use one container for the frontend and another for the backend, instead of cramming both application processes into one container.

Cons of having multiple processes in one container –

  • all processes must be on the same machine
    • we can’t deploy, say, the frontend and the backend separately
  • we need to restart the container if either the frontend or the backend crashes.
    • but the container can keep running as long as one process is still alive
    • so we have to manually stop the surviving process when the other one crashes, which is not ideal.
  • both the frontend and the backend will produce logs.
    • as they’re in the same container, the logs get mixed together and we can’t easily tell which line came from the frontend and which from the backend.

Pods - an abstraction over Containers

Pods are the fundamental units of Kubernetes. A pod is a group of one or more containers bound together and managed as a single unit.

But you may be wondering, why do we need another abstraction over containers?

Why pods? Why not directly use containers?

We’ve just seen that we should always run one process per container.

But some tightly coupled processes are meant to be run together.

Take Oracle Database, for example. There’s something called ORDS (Oracle REST Data Services), a tool for developing REST APIs that interact with Oracle. It would make sense to run the two together in one container, so that they could share some resources like the network interface, users etc.

But we don’t want the issues that come with running multiple processes in a container. Hence we need another form of abstraction, Pods.

Multiple pods vs multiple containers in a pod

Now we’re encountering another dilemma.

Say you have some containers. Now you can’t decide whether you wanna put them in one pod or separate pods.

To answer that we need to know some key things about containers in a pod –

  • Pods NEVER span multiple nodes
    • That means all containers in a pod must live on the same machine (node)
  • as Pods are the fundamental unit in k8s, k8s can only horizontally scale pods, not containers.
    • hence we can’t scale the containers in a pod individually.
  • Containers in the same pod share -
    • the same IP address and port space
    • the same set of Linux namespaces (such as network and IPC), but not the filesystem
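
A practical consequence of the shared port space is that two containers in the same pod can’t listen on the same port, just like two processes on one machine. Here is a small sketch of that behaviour. It uses two sockets in one Python process as a stand-in for two containers sharing a network namespace, so it shows the idea rather than anything specific to Kubernetes. The function name is made up for this example.

```python
import socket


def second_bind_fails(host: str = "127.0.0.1") -> bool:
    """Return True if a second listener cannot bind a port that is already in use."""
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        first.bind((host, 0))          # port 0 lets the OS pick a free port
        first.listen()
        port = first.getsockname()[1]
        try:
            second.bind((host, port))  # same IP, same port: same "port space"
        except OSError:
            return True                # typically "address already in use"
        return False
    finally:
        first.close()
        second.close()


if __name__ == "__main__":
    print("second bind failed:", second_bind_fails())
```

So if two containers both want port 80, that is a hint they belong in separate pods, or that one of them needs a different port.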

So while organizing containers in pods, we should make sure each pod only contains tightly coupled containers.

By tightly coupled, I mean one container running the main process (like a web server) plus one or more sidecar containers closely tied to that web server. For example, a container running a process that logs events from that server.

  • It makes sense to put that logging container in the same pod as the web server, as they’ll share the same Linux namespaces.
  • Also, we don’t want to scale the server and its logging process separately.
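
The web server plus logging sidecar pairing could look roughly like the sketch below. It builds a Pod manifest as a Python dict and prints it as JSON. The pod name, container names, images (nginx, busybox), log path and tail command are all placeholders picked for illustration, and the manifest has not been tested against a real cluster.

```python
import json


def sidecar_pod_manifest(name: str = "web-with-logger") -> dict:
    """Build a Pod manifest with a web server container and a logging sidecar."""
    # Both containers mount the same emptyDir volume, so the sidecar can read
    # the log files the web server writes.
    shared_logs = {"name": "logs", "mountPath": "/var/log/nginx"}
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "volumes": [{"name": "logs", "emptyDir": {}}],
            "containers": [
                {
                    "name": "web",
                    "image": "nginx",  # placeholder image
                    "ports": [{"containerPort": 80}],
                    "volumeMounts": [shared_logs],
                },
                {
                    "name": "log-sidecar",
                    "image": "busybox",  # placeholder image
                    "command": ["sh", "-c", "tail -F /var/log/nginx/access.log"],
                    "volumeMounts": [shared_logs],
                },
            ],
        },
    }


if __name__ == "__main__":
    print(json.dumps(sidecar_pod_manifest(), indent=2))
```

Both containers sit in one pod, so they land on the same node and are scaled together as a unit. The printed JSON can be saved to a file and passed to kubectl apply -f, since kubectl accepts JSON as well as YAML.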

Ask these questions if you can’t decide between multi-pods or multi-containers

  1. Do the containers always need to be scaled together?
  2. Must they always run on the same machine?
  3. Are those containers so tightly coupled that they need to share the same Linux namespaces?

If the answer to any of the questions is ‘yes’, then you should put those containers in the same pod. Otherwise, you should put them in separate pods.
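
The rule above is simple enough to write down as a tiny helper. This is only a sketch of the checklist from this post; the class and function names are made up for this example.

```python
from dataclasses import dataclass


@dataclass
class Coupling:
    scaled_together: bool    # question 1: always scaled together?
    same_machine: bool       # question 2: must run on the same machine?
    share_namespaces: bool   # question 3: need to share Linux namespaces?


def same_pod(c: Coupling) -> bool:
    """Apply the rule from the text: a 'yes' to any question means one pod."""
    return c.scaled_together or c.same_machine or c.share_namespaces


if __name__ == "__main__":
    # A frontend and a backend that scale and deploy independently.
    print(same_pod(Coupling(False, False, False)))
    # A web server and the sidecar that reads its log files.
    print(same_pod(Coupling(True, True, True)))
```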

Tl;dr

  • Pods are the most fundamental unit in k8s.
  • Pods are like logical machines that behave much like separate physical machines.
  • Processes running in the same pod are like processes running on the same machine, except
    • each process running in a pod is encapsulated in its own container.
