Reproducing OOM Errors with Docker/Podman/*Containers
📆: 2024-03-28 - 🏷: blog, til, k8s, oom, cgroups
From the previous adventures in Why my Kubernetes rails app returns 502 errors?: do I really need a Kubernetes cluster to experience an OOM in my workload? That seems excessive…
I couldn’t let this problem go as a “deployment environment limitation”; I needed something I could reproduce faster, within my reach.
That workload is mostly developed with docker-compose, and I do remember that docker has memory limits:
docker run --help | grep memory
--kernel-memory bytes Kernel memory limit
-m, --memory bytes Memory limit
--memory-reservation bytes Memory soft limit
--memory-swap bytes Swap limit equal to memory plus swap: '-1' to enable unlimited swap
--memory-swappiness int Tune container memory swappiness (0 to 100) (default -1)
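As a quick sanity check (not part of the original setup), those flags can be exercised directly with docker run; the following is just a sketch reusing the workload image:
# Sketch: run the workload image with a hard 512M limit and no extra swap
# (setting --memory-swap equal to --memory means no swap on top of it).
docker run --rm -m 512m --memory-swap 512m registry.internal/workload:base bin/puma -C config/puma.rb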
Maybe I could get the same conditions as in the cluster if docker-compose passed --memory to the containers it spins up… 🤔 but how? 🤔🤔🤔
And it just so happens that… it’s possible!
From the horse’s mouth:
deploy
[…]
Specify configuration related to the deployment and running of services. The following sub-options only takes effect when deploying to a swarm with docker stack deploy, and is ignored by docker-compose up and docker-compose run, except for resources.
It looks like it’s a matter of adding:
version: '3.0' # this is what the workload had
services:
  workload:
    image: registry.internal/workload:base
    # vvv this part
    deploy:
      resources:
        limits:
          memory: 1G
        reservations:
          memory: 512M
    # ^^^ this part
to the service definition, and we get the sweet, sweet OOM locally! Let’s give it a try!
docker-compose -f docker-compose.yml run workload bin/puma -C config/puma.rb | grep 'Worker '
# grep just keeps the puma worker restart lines
It seems to boot just fine, but no matter how hard I hit the workload, I can’t seem to kill it… Maybe the memory limits aren’t properly set?
# i'm lazy, this gives me the container name I need
# docker ps --filter name=container --format "{{ .Names }}"
docker inspect $(docker ps --filter name=container --format "{{ .Names }}") | grep -i memory
"Memory": 1073741824,
"MemoryReservation": 536870912,
"MemorySwap": 2147483648,
"MemorySwappiness": null,
Hmm, that swap… it seems swap is counted as usable memory on top of the limit. Per the help output above, MemorySwap is memory plus swap, and docker defaults it to twice the memory limit, so 2147483648 (2G) means 1G of RAM plus another 1G of swap. We don’t really have a memory: 1G limit, but rather 2G of usable memory…
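Those raw byte counts are easier to read after a quick conversion; numfmt from GNU coreutils does the trick (assuming a GNU userland):
# Convert the byte values reported by docker inspect into human-readable sizes.
numfmt --to=iec 1073741824 2147483648
1.0G
2.0G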
Maybe setting the limit to half of what you actually want (512M instead of 1G) could get us to exploitable values, but I went a tad deeper into this container beast… and went with:
# Run the workload...
docker-compose -f docker-compose.yml run workload bin/puma -C config/puma.rb | grep 'Worker '
In another shell then:
docker container update --memory 1G --memory-swap 1025M $(docker ps --filter name=container --format "{{ .Names }}")
docker inspect $(docker ps --filter name=container --format "{{ .Names }}") | grep -i memory
"Memory": 1073741824,
"MemoryReservation": 536870912,
"MemorySwap": 1074790400,
"MemorySwappiness": 0,
Aha! That certainly looks better :D
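To be extra sure, here’s a sketch (not in my original notes) that reads the limits from inside the container as well; the exact file names depend on whether the host runs cgroup v1 or v2:
# Sketch: check the memory limits as the kernel sees them, from inside the container.
# cgroup v2 exposes memory.max and memory.swap.max; cgroup v1 uses
# memory.limit_in_bytes and memory.memsw.limit_in_bytes instead.
docker exec -it $(docker ps --filter name=container --format "{{ .Names }}") \
  sh -c 'cat /sys/fs/cgroup/memory.max /sys/fs/cgroup/memory.swap.max 2>/dev/null || cat /sys/fs/cgroup/memory/memory.limit_in_bytes /sys/fs/cgroup/memory/memory.memsw.limit_in_bytes'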
But I still couldn’t hit that OOM behavior that easily, so I added one last quick tweak to the whole setup: import the OOM behavior from the cluster.
I was still seeing:
[1] - Worker 1 (PID: 206) booted in 0.06s, phase: 0
And nothing else.
When I was ls-ing shit around in the previous post, I stumbled by chance upon some oom_* files in the /proc filesystem. I didn’t pay much attention to them back then, but now, after reading the kernel documentation about those oom_* files:
These files can be used to adjust the badness heuristic used to select which process gets killed in out of memory (oom) conditions.
The badness heuristic assigns a value to each candidate task ranging from 0 (never kill) to 1000 (always kill) to determine which process is targeted.
[…]
The value of /proc/<pid>/oom_score_adj is added to the badness score before it is used to determine which task to kill. Acceptable values range from -1000 (OOM_SCORE_ADJ_MIN) to +1000 (OOM_SCORE_ADJ_MAX) […]
For backwards compatibility with previous kernels, /proc/<pid>/oom_adj may also be used to tune the badness score. Its acceptable values range from -16 (OOM_ADJUST_MIN) to +15 (OOM_ADJUST_MAX) […]
I went ahead and checked the cluster and noticed that, when the workload runs there under the memory limits, the OOM-related proc files have values that my local machine doesn’t have.
# cat /proc/1/oom_score
«something close to 800»
# cat /proc/self/oom_score_adj
«something close to 1000»
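(For the record, a hypothetical way to read those values from a running pod; the namespace and pod name below are made up:)
# Hypothetical: peek at the OOM-related proc files inside a pod in the cluster.
kubectl -n production exec -it workload-6b7f9 -- sh -c 'cat /proc/1/oom_score /proc/1/oom_score_adj'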
Following the docs and those values, I should write 1000 to /proc/<pid>/oom_score_adj to make the OOM killer unforgiving (just like in the cluster).
I ended up with the following script:
# Run the workload...
docker-compose -f docker-compose.yml run workload bin/puma -C config/puma.rb | grep 'Worker '
In another shell then:
docker container update --memory 1G --memory-swap 1025M $(docker ps --filter name=container --format "{{ .Names }}")
docker inspect $(docker ps --filter name=container --format "{{ .Names }}") | grep -i memory
"Memory": 1073741824,
"MemoryReservation": 536870912,
"MemorySwap": 1074790400,
"MemorySwappiness": 0,
docker exec -it $(docker ps --filter name=container --format "{{ .Names }}") bash
# inside the container: adjust oom proc file for all puma processes (even the started ones).
for pid in $(ps aux | grep puma | grep -v grep | perl -lane 'print "@F[1]"'); do echo 1000 > /proc/$pid/oom_score_adj; done
# Yes, perl is everywhere
And sure enough, I could finally do it! I saw puma workers restarting! 🎉
[1] - Worker 1 (PID: 206) booted in 0.06s, phase: 0
[1] - Worker 1 (PID: 301) booted in 0.06s, phase: 0
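Bonus: a hypothetical way to push the container over its limit without hammering the app over HTTP is to allocate memory from a throwaway process inside it (this assumes ruby is available in the image, which it should be for a Rails workload):
# Hypothetical memory hog: grow an array until the cgroup blows past its 1G limit;
# with the oom_score_adj tweak above, the puma workers are what the kernel picks.
docker exec -it $(docker ps --filter name=container --format "{{ .Names }}") \
  ruby -e 'chunk = "x" * 1_000_000; hog = []; loop { hog << chunk.dup }'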