Checking performance of different SSL/TLS implementations for Java Applications
Java ships with a built-in, platform-independent implementation of cryptographic algorithms, which is used by default to handle SSL/TLS traffic. However, this Java-based implementation has been considered slow and CPU intensive for many years. Various libraries provide platform-specific native implementations of these algorithms and can be used to address the performance and scalability problems associated with the built-in one.
In this post I compare the relative performance of different implementations by measuring several metrics for server-side HTTPS request handling. The implementations are the Java-based built-in one, the Tomcat Native fork for Netty (https://github.com/netty/netty-tcnative) and the Amazon Corretto Crypto Provider (https://github.com/corretto/amazon-corretto-crypto-provider), each tested with both Java 8 and Java 17.
The server-side application under test is a Reactive Spring Boot application that serves a 1 MB static image file when a particular URL is requested. Gatling is used as the load testing tool. The Spring Boot application and Gatling run on different virtual machines in the cloud. The Linux mpstat tool is used to capture CPU utilization on the virtual machine where the Spring Boot application runs.
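As a rough illustration of how an average CPU figure can be derived from mpstat output, here is a minimal sketch. It assumes the common mpstat layout in which the "all" rows end with the %idle column and a dot is used as the decimal separator, and it treats utilization as 100 minus %idle. The class name and the sample rows in the usage note are made up for illustration and are not measurements from this post.

```java
import java.util.Arrays;
import java.util.List;
import java.util.OptionalDouble;

/** Sketch: average CPU utilization from mpstat "all" rows (assumed layout, see lead-in). */
final class MpstatAverage {

    /** Returns the mean of (100 - %idle) over all per-interval "all" rows, if any. */
    static OptionalDouble averageUtilization(List<String> lines) {
        return lines.stream()
                .map(String::trim)
                // Skip blank lines and mpstat's own "Average:" summary row.
                .filter(line -> !line.isEmpty() && !line.startsWith("Average"))
                .map(line -> line.split("\\s+"))
                // Keep only the rows that aggregate all CPUs.
                .filter(tokens -> Arrays.asList(tokens).contains("all"))
                // The last column is assumed to be %idle.
                .filter(tokens -> isNumber(tokens[tokens.length - 1]))
                .mapToDouble(tokens -> 100.0 - Double.parseDouble(tokens[tokens.length - 1]))
                .average();
    }

    private static boolean isNumber(String token) {
        try {
            Double.parseDouble(token);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample rows, not data from the test runs.
        List<String> sample = Arrays.asList(
                "10:00:01 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle",
                "10:00:02 all 30.00 0.00 5.00 0.00 0.00 2.00 0.00 0.00 0.00 63.00",
                "10:00:03 all 28.00 0.00 4.00 0.00 0.00 1.00 0.00 0.00 0.00 67.00");
        System.out.println("Average CPU utilization: " + averageUtilization(sample));
    }
}
```

In practice the lines would come from a file captured with something like "mpstat 1" during the test window. The header row is skipped because it does not contain an "all" token.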
Server-side Software Stack
Operating System : Ubuntu 20.04
Amazon Corretto JDK (1.8.0_322 and 17.0.1)
Reactive Spring Boot (2.6.2)
TC-Native Library (netty-tcnative-boringssl-static:2.0.47.Final)
ACCP Library (1.6.1:linux-x86_64)
The server-side JVM runs with a maximum heap size of 2 GB and the default garbage collector for the given JDK version. Before each test run, it is warmed up by sending 120 requests/second for 10 minutes. These requests are not monitored and are not included in the statistics below. The actual test results are collected by sending 120 requests/second for 5 minutes.
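To make the size of this load profile concrete, the sketch below works out the request counts and the approximate payload bandwidth. It assumes a constant rate with every request completing, takes 1 MB as 10^6 bytes, and ignores TLS and HTTP overhead. The class and method names are illustrative only.

```java
/** Back-of-the-envelope numbers for the load profile described above. */
final class LoadProfile {

    /** Total requests sent at a constant rate over the given duration. */
    static long totalRequests(int requestsPerSecond, int durationSeconds) {
        return (long) requestsPerSecond * durationSeconds;
    }

    /** Payload bandwidth in megabits per second (assumes 1 MB = 10^6 bytes). */
    static double payloadMegabitsPerSecond(int requestsPerSecond, double responseMegabytes) {
        return requestsPerSecond * responseMegabytes * 8.0;
    }

    public static void main(String[] args) {
        int rate = 120; // requests/second, as stated in the post

        System.out.println("Warm-up requests:  " + totalRequests(rate, 10 * 60)); // 72000
        System.out.println("Measured requests: " + totalRequests(rate, 5 * 60));  // 36000
        System.out.println("Payload Mbit/s:    " + payloadMegabitsPerSecond(rate, 1.0)); // 960.0
    }
}
```

Under these assumptions each measured run covers 36,000 requests, and the server encrypts close to 1 Gbit/s of payload, which explains why the cost of the crypto implementation shows up so clearly in CPU usage.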
Here is a load test summary result after testing with both Java 8 and Java 17
As we can see, with Java 8 the native library implementations reduce mean response time by roughly 46% and the 95th percentile response time by roughly 29%. With Java 17, the performance metrics look almost the same in all cases and are sometimes better with the built-in implementation, which shows the impressive amount of optimization done in the JVM itself over the years.
Let’s look at CPU utilization measured during load-test period across different runs
With the Java 8 built-in implementation, average CPU usage is 86%. With ACCP it is about 42%, and with Netty TC-Native it is only about 33%, for the same workload and exactly the same application code. This shows the large performance benefit of a native implementation over the built-in one when running applications on Java 8.
With Java 17, there is no major difference in CPU usage across runs, but there are still interesting things to observe. For the given workload, the built-in implementation averages 37% CPU usage, which is slightly lower than ACCP at 42%. With Netty TC-Native, the application runs with the lowest average CPU usage at 32%.
Lower CPU usage for the same workload leaves more room for serving additional requests per second or for using the available CPU cycles for other tasks. With that in mind:
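The CPU figures quoted above can be put in relative terms with a little arithmetic. The sketch below computes the relative reduction against the built-in implementation and a naive capacity multiple. The capacity figure assumes CPU cost grows linearly with request rate, which the test did not verify, and the class and method names are illustrative only.

```java
/** Relative comparison of the average CPU usage figures quoted in the post. */
final class CpuComparison {

    /** Relative reduction in percent when going from baseline to candidate (negative means an increase). */
    static double relativeReductionPercent(double baselineCpu, double candidateCpu) {
        return (baselineCpu - candidateCpu) / baselineCpu * 100.0;
    }

    /** Naive capacity multiple, assuming CPU cost scales linearly with request rate. */
    static double capacityMultiple(double baselineCpu, double candidateCpu) {
        return baselineCpu / candidateCpu;
    }

    public static void main(String[] args) {
        // Average CPU usage in percent, taken from the text above.
        double java8BuiltIn = 86, java8Accp = 42, java8TcNative = 33;
        double java17BuiltIn = 37, java17Accp = 42, java17TcNative = 32;

        System.out.printf("Java 8  ACCP:      %.1f%% less CPU%n",
                relativeReductionPercent(java8BuiltIn, java8Accp));
        System.out.printf("Java 8  TC-Native: %.1f%% less CPU, ~%.1fx naive capacity%n",
                relativeReductionPercent(java8BuiltIn, java8TcNative),
                capacityMultiple(java8BuiltIn, java8TcNative));
        System.out.printf("Java 17 ACCP:      %.1f%% less CPU%n",
                relativeReductionPercent(java17BuiltIn, java17Accp));
        System.out.printf("Java 17 TC-Native: %.1f%% less CPU%n",
                relativeReductionPercent(java17BuiltIn, java17TcNative));
    }
}
```

On Java 8 this works out to roughly 51% less CPU with ACCP and roughly 62% less with Netty TC-Native. On Java 17 the ACCP value comes out negative, meaning it used more CPU than the built-in implementation for this workload.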
The Netty TC-Native implementation gives the best performance, in terms of both throughput and CPU utilization, with both Java 8 and Java 17. However, the application must do its network communication through Netty to get those benefits.
The ACCP implementation is a drop-in replacement for the built-in implementation of the algorithms. With the workload described in this post, it performs slightly worse than the built-in implementation on Java 17. But if the application runs on Java 8 and does not communicate over the network through Netty, ACCP could be the best choice for making good use of resources.
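A provider such as ACCP can act as a drop-in replacement because Java selects crypto implementations through its provider list, in priority order. The sketch below uses only standard JDK APIs to show which provider currently serves a given cipher. The class name is illustrative, and the remark about installing ACCP is an assumption based on its documented install() entry point, not something executed here.

```java
import java.security.GeneralSecurityException;
import java.security.Provider;
import java.security.Security;
import java.util.ArrayList;
import java.util.List;
import javax.crypto.Cipher;

/** Sketch: inspect which JCA provider backs a cipher on the running JVM. */
final class ProviderCheck {

    /** Names of the registered JCA providers, highest priority first. */
    static List<String> providerNames() {
        List<String> names = new ArrayList<>();
        for (Provider provider : Security.getProviders()) {
            names.add(provider.getName());
        }
        return names;
    }

    /** Name of the provider the JCA selects for the given cipher transformation. */
    static String cipherProvider(String transformation) {
        try {
            return Cipher.getInstance(transformation).getProvider().getName();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("No provider for " + transformation, e);
        }
    }

    public static void main(String[] args) {
        // Assumption: with the ACCP jar on the classpath, calling
        // com.amazon.corretto.crypto.provider.AmazonCorrettoCryptoProvider.install()
        // before this point would register ACCP as the highest-priority provider.
        System.out.println("Providers: " + providerNames());
        System.out.println("AES/GCM/NoPadding served by: " + cipherProvider("AES/GCM/NoPadding"));
    }
}
```

Running this before and after installing a native provider is a quick way to confirm that the replacement actually took effect, since the application code requesting the cipher does not change.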