Extracting a Heap Dump from a running OpenJ9 Java process in Kubernetes

Posted by Joep Weijers on November 26, 2020

At TOPdesk we run many Java services in our Service Architecture. We use the AdoptOpenJDK OpenJ9 JRE as the base image for our services running in Kubernetes. In this post we will find out how to get a Java heap dump from a Java application running on the OpenJ9 JRE.

A Java Heap Dump is a snapshot of all the objects that are in memory in the JVM at a certain moment. Typically, a heap dump is created at the moment a Java application crashes because it runs out of memory. The heap dump can then show you what the application was doing in its dying moments, providing insight into potential memory leaks.

You can also extract a heap dump from a running JVM. This is a useful technique to peek under the hood of a service that is running with abnormal memory usage, but that is not running out of memory (yet). For us at TOPdesk, this is a very valuable tool to investigate performance issues.

The JVM supplies the Attach API to allow external tools to attach to the JVM. Tools like jmap, jcmd and JVisualVM use that API to monitor and troubleshoot the Java process inside the JVM.

Unfortunately, we can’t use any of these tools in our containers. JVisualVM is the most user-friendly, as it provides a simple graphical user interface, but it is not usable from within our headless Docker containers. jmap is not an officially supported tool and is not part of the OpenJ9 distribution. And jcmd is only available in the JDK variant of OpenJ9, not in the JRE.

So we need a way to make the jcmd tool available in an already running OpenJ9 JRE container.

Attempt 1: Using a Kubernetes Ephemeral Debug Container

Kubernetes 1.16 introduced an alpha feature called Ephemeral Containers: a special type of container that runs temporarily in an existing Pod to accomplish user-initiated actions such as troubleshooting.

Could we use such an ephemeral container containing the OpenJ9 JDK to attach to a container running a service on OpenJ9 JRE?

In this example we will use a Docker container running a simple Hello World webserver: topdesk/example-openj9-web-service:1.0.0. You can find the code of this example service in the GitHub repository: https://github.com/TOPdesk/example-openj9-web-service.

Starting the web service container

Let’s start by firing up a minikube cluster. Since ephemeral containers are still an alpha feature, the EphemeralContainers feature gate has to be enabled:

minikube start --feature-gates=EphemeralContainers=true

Now we are going to start a pod with our example-openj9-web-service:

kubectl run example-openj9-web-service --image=topdesk/example-openj9-web-service

If we exec into the pod, we can verify that the Java version is indeed the OpenJ9 JRE. We also see that jcmd is not present in this image:

kubectl exec -it example-openj9-web-service -- sh

$ java -version
openjdk version "11.0.9" 2020-10-20
OpenJDK Runtime Environment AdoptOpenJDK (build 11.0.9+11)
Eclipse OpenJ9 VM AdoptOpenJDK (build openj9-0.23.0, JRE 11 Linux amd64-64-Bit Compressed References 20201022_810 (JIT enabled, AOT enabled)

$ which java

$ ls /opt/java/openjdk/bin/
java jitserver jjs jrunscript keytool pack200 rmid rmiregistry unpack200

$ jcmd
sh: 4: jcmd: not found

Starting the ephemeral debug container

Now we are going to attach an ephemeral container to the example-openj9-web-service pod. We use the OpenJ9 JDK image, target the process namespace of the example-openj9-web-service container, and start a sh shell:

kubectl alpha debug -it example-openj9-web-service --image=adoptopenjdk:11.0.9_11-jdk-openj9-0.23.0 --target=example-openj9-web-service -- sh

We can use ps to list all running processes and see the web service of the example-openj9-web-service container running:

$ ps -ef
UID          PID    PPID  C STIME TTY          TIME CMD
999            1       0  0 16:34 ?        00:00:00 /bin/sh -c java $JAVA_OPTS -jar /opt/webservice/webservice.jar
999            7       1  0 16:34 ?        00:00:00 java -jar /opt/webservice/webservice.jar
root         251       0  0 17:08 pts/0    00:00:00 sh
root         280     251  0 17:08 pts/0    00:00:00 ps -ef
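
In a container with more processes, picking the JVM out of the ps listing by eye gets error-prone. Below is a minimal sketch that filters the listing for us, assuming the ps -ef column layout shown above (PID in the second column, command in the eighth). The function name find_java_pids is our own invention for illustration, not part of any tool.

```shell
# Hypothetical helper: read `ps -ef` output on stdin and print the PID of
# every process whose executable (8th column) is java or ends in /java.
# The header line (NR == 1) is skipped.
find_java_pids() {
    awk 'NR > 1 && ($8 == "java" || $8 ~ /\/java$/) { print $2 }'
}
```

Used as ps -ef | find_java_pids, it skips the /bin/sh wrapper with PID 1, because that line only mentions java as an argument, and prints the PID of the actual JVM.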

The additional tools of the JDK are now at our disposal. Let’s use jcmd to create a heap dump of our Java process with PID 7:

$ jcmd 7 GC.heap_dump /tmp/heapdump
Error getting data from 7: Exception connecting to 7

It seems we can’t connect to the JVM. Let’s use jps to list all the running JVMs:

$ jps -l
327 jdk.jcmd/openj9.tools.attach.diagnostics.tools.Jps

That is odd, we only see the tool itself running. So we seem unable to reach the JVM in the container we are attached to. Are we doing something wrong here?

Attempt 2: Switching user in the ephemeral debug container

According to the jps documentation “the tool shows information for every Java process that is owned by the current user ID on the current host”. We are currently running as root, but the Java process is running as user 999. Maybe it works if we create a user 999 and run jps under that user?

$ apt-get update
$ apt-get install -y sudo
$ useradd --no-create-home --uid 999 debuguser
$ sudo su debuguser
$ /opt/java/openjdk/bin/jps -l
613 jdk.jcmd/openj9.tools.attach.diagnostics.tools.Jps
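
The ownership mismatch that motivated this attempt can also be checked directly instead of read off the ps listing. This is a small sketch, assuming a Linux /proc filesystem and a stat that supports -c (GNU coreutils and BusyBox both do). The function name owned_by_current_user is hypothetical.

```shell
# Hypothetical helper: succeed only if process $1 is owned by the user
# running this shell. Returns 2 when the process cannot be inspected.
owned_by_current_user() {
    # The owner of /proc/<pid> normally reflects the effective UID of the process.
    pid_owner="$(stat -c '%u' "/proc/$1" 2>/dev/null)" || return 2
    [ "$pid_owner" = "$(id -u)" ]
}
```

Running owned_by_current_user 7 && echo same user as root in the debug container would print nothing, while running it as debuguser would print same user. As the jps output shows, matching the user ID alone did not make the JVM visible.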

Unfortunately, we still can’t connect to the JVM running in the example-openj9-web-service container. I am not sure whether that is caused by something in Kubernetes’ process sharing between the main container and the debug container, or whether the JDK tooling can’t handle this situation.
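
One possible explanation, which we did not verify in this post: the OpenJ9 Attach API advertises running JVMs through a shared directory on the filesystem, by default /tmp/.com_ibm_tools_attach. The debug container shares the process namespace of the target but has its own filesystem, so its /tmp is a different directory. The sketch below only helps to inspect that hypothesis. The function name list_attach_targets is our own, and reading /proc/<pid>/root is an assumption that depends on having sufficient permissions on the target process.

```shell
# Hypothetical helper: list the entries in the OpenJ9 attach directory
# below a given filesystem root.
# $1 = root to inspect; empty means the current container's own filesystem,
#      /proc/7/root would be the filesystem as seen by PID 7.
list_attach_targets() {
    attach_dir="${1:-}/tmp/.com_ibm_tools_attach"
    if [ ! -d "$attach_dir" ]; then
        echo "no attach directory at $attach_dir" >&2
        return 1
    fi
    ls -A "$attach_dir"
}
```

Comparing list_attach_targets with list_attach_targets /proc/7/root inside the debug container would show whether the web service registered itself in a directory that jps in the debug container never looks at.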

Attempt 3: Downloading the JDK into the container

A completely different approach is to copy the debug tools into your container. For instance, by downloading the JDK into the running web service container. After downloading and unzipping, you can use jps and jcmd directly:

$ cd /tmp
$ curl -L https://github.com/AdoptOpenJDK/openjdk11-binaries/releases/download/jdk-11.0.9%2B11_openj9-0.23.0/OpenJDK11U-jdk_x64_linux_openj9_11.0.9_11_openj9-0.23.0.tar.gz --output jdk.tgz
$ tar -zxvf jdk.tgz
$ jdk-11.0.9+11/bin/jps -l
687 jdk.jcmd/openj9.tools.attach.diagnostics.tools.Jps
7 /opt/webservice/webservice.jar
$ jdk-11.0.9+11/bin/jcmd 7 GC.heap_dump /tmp/heapdump
Dump written to /tmp/heapdump

Success! We have our heap dump. However, we also polluted our container with a JDK. We can of course remove the JDK files afterwards. But the concept of ephemeral debug containers, which disappear when you are done debugging, is much more appealing. If you know of a way to do heap dumps on running containers using ephemeral debug containers, please reach out!
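
The dump still lives in /tmp inside the container, next to the downloaded JDK. Here is a minimal sketch of the wrap-up: copy the dump to your workstation, then remove the JDK files. It assumes kubectl cp works against the pod, which requires a tar binary in the container (this image has one, since we used tar above). The function names fetch_heap_dump and remove_downloaded_jdk and the DRY_RUN switch are hypothetical. The file names match the ones used in this walkthrough.

```shell
# Hypothetical helper, run on your workstation: copy a file out of a pod.
# $1 = pod name, $2 = path inside the pod, $3 = local target (optional)
# Set DRY_RUN=1 to print the kubectl command instead of running it.
fetch_heap_dump() {
    pod="$1"
    remote_path="$2"
    local_path="${3:-./heapdump}"
    set -- kubectl cp "$pod:$remote_path" "$local_path"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper, run inside the container: remove the JDK archive and
# the unpacked JDK directory, leaving everything else (such as the dump) alone.
# $1 = directory the JDK was downloaded into (required; /tmp in this post)
remove_downloaded_jdk() {
    work_dir="${1:?usage: remove_downloaded_jdk DIR}"
    rm -rf -- "$work_dir/jdk.tgz" "$work_dir/jdk-11.0.9+11"
}
```

For the example in this post that would be fetch_heap_dump example-openj9-web-service /tmp/heapdump on the workstation, followed by remove_downloaded_jdk /tmp inside the container.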

About the author: Joep Weijers

Joep is a Developer Experience Engineer at TOPdesk with a keen interest in delivering quality software continuously. He loves playing around with Jenkins Pipelines, GitLab CI, Selenium, Docker and Kubernetes, and keeps in touch with his inner developer by educating his colleagues on testable Java code.
