Beware of calling long running native code together with high throughput workload in the same JVM
Sometimes there is a need to call native library functions (through JNI) from the application running on Java. For example, there are libraries available with fast native implementation algorithms like compression, cipher-suites, machine learning etc related use cases where processing can be done without all JVM overheads, in other cases we need to reuse existing piece of native code without porting it to Java. Usually these kind of libraries are used to improve performance of a specific task of the application, but if its impact on other parts of application is unknown then overall application performance may suffer.
One specific problem we are going to look in this post is - while long running native code is in progress, other threads might go on hold if they need garbage collection to be performed in order to proceed further. This is not always a problem when native function is called, but only if native code is using some specific functionality, for example - https://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/functions.html#GetPrimitiveArrayCritical_ReleasePrimitiveArrayCritical
In this post we will use Zstd compression library (https://github.com/luben/zstd-jni) for Java application and see impact of calling it. For demonstration purpose I have created single application which spawns 2 threads for performing following tasks
1) repeatedly compress
120 MB json file using Zstd library. This task keeps repeating until task 2 is complete
2) repeatedly count number of lines in a
~5 GB log file (
~38 Million log lines). This task is executed 10 times and after that final statistics related to performance are captured
All tests are done on
2 Core machine. CPU usage statistics are captured using
mpstat linux tool
For file compression task, each iteration takes about
9–10 secondsto generate compressed output. This performance remains same whether or not other task is running in the JVM. During the runs, the task utilizes ~100% of single CPU through out the run.
Counting a log line is GC intensive task as it has to create new String object for each line read (excluding any other internal objects and details). I will mainly focus on this task’s performance.
When running only log line counting task in an application, each iteration completes in about
5–6 seconds with processing
~6726 lines/millisecond. During the runs, the task utilizes ~100% of single CPU through out the run.
When both tasks are running together in same JVM in separate threads, log line counting task quickly fills up heap with a lot of small short-living objects and waits for GC to be performed before continuing further. However while native code is running with using specific feature which can block GC execution, GC execution will remain on hold too till usage of specific feature in native function is complete. In our case, until whole compression operation is complete (about
9–10 seconds), GC can’t be performed and log line counting thread won’t be able to proceed even though half of the CPU cycles are almost free.
When compression operation is complete, GC is performed and log line counting thread can continue further. Again it will start filling up heap quickly and will go on hold until GC is performed, and GC will not run until current compression task is complete. We can see overall 2 Core CPU usage pattern in below graph -
Since log line counting task goes on hold frequently, its performance is largely impacted. In this scenario each iteration takes about
57 seconds and processing only
For the first task if I add
2 secondsdelay between execution of next iteration, it gives log counting task some room to proceed quickly during that duration. But as soon as next compression task is started and GC needs to be performed for log counting task, it goes on hold again. This time each iteration of log line counting task completes in about
24 seconds and processing bit faster with
Throughput of log line counting task is largely affected due to GC can’t be performed while compression task is in progress. This constraint is when running a log line counting task together with a compression task in the same JVM. But if we execute both tasks in separate JVMs on the same host, then each iteration of log line counting task completes in just
8-9 seconds and able to process much faster with
~4234 lines/millisecond. And during the runs, this task is able to utilize ~100% of single CPU through out the run and main aspect is that it does not go on hold. Performance is still relatively low compared to running only log counting task on the host - due to I/O operations are sharing hardware and both CPU cores are roughly utilized with full capacity.
Calling a long running native code from Java application will not always cause a problem. Performance may degrade only if native code is using some specific JNI feature which can block GC execution and other threads rely on GC to be performed to continue further. There could be other such issues too, so necessary impact analysis should be done and if it is affecting performance we should isolate such task in separate JVM instance