Assessing web application performance on a single-core machine

Parth Mistry
5 min read · Oct 10, 2022

Modern web applications built on top of non-blocking I/O facilities can easily serve tens of thousands of concurrent requests per host (with moderately sized hardware). Together with horizontal scalability, scenarios where applications serve millions of concurrent requests are not new. Each deployed web application instance is expected to handle a different amount of concurrent load depending on its use case, and it needs to run on a machine with suitable hardware. Sometimes these applications are deployed on predetermined, over-sized hardware, which can result in unnecessary infrastructure cost. Assessing the optimal hardware size for running an application can be important in such cases. I often wonder how much concurrent load a single-core machine can handle, hence this post based on my experiments.

Test Scenario
For assessing baseline performance on a single-core machine, I am taking the example of a basic demo web application with a RESTful service endpoint. Usually these kinds of applications don’t perform CPU-intensive calculations; rather, they rely on external data sources and/or other applications to exchange data and perform some in-memory processing on that data as required by the use case. I have developed 2 applications (one in Java and one in Rust), each exposing a single RESTful service endpoint /api/demo. These applications are deployed on a host with a single-core CPU. To serve a single request, the endpoint implementation makes 8 sequential REST API calls to another application running on a remote host (to mimic requests to external systems). That other application runs on a host with sufficient hardware to handle any number of concurrent requests that could potentially come from the demo application running on the single-core host. I have used Gatling as the load-testing tool, which runs the various load-testing scenarios from yet another host. All 3 hosts run in the same virtual network in the cloud. The following diagram shows how the applications are deployed across hosts -

Software Stack
Operating System: Ubuntu 22.04
JDK: Amazon Corretto 17.0.4
Rust: 1.63.0

The Java application is developed using the reactive Spring Boot framework and runs on top of Netty. The Rust application is developed using Axum (web application framework) and Reqwest (HTTP client), both of which run on top of the Tokio runtime. Source code for the demo applications and the Gatling load-test scenarios is located at single-core-performance-test.
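To make the setup concrete, here is a minimal sketch of what the Rust demo app looks like conceptually: an Axum route for /api/demo whose handler makes 8 sequential Reqwest calls to the backend host. This is not the exact code from the repository; the backend URL, ports, and error handling are placeholder assumptions.

```rust
use axum::{extract::Extension, routing::get, Router};

// Handler for /api/demo: 8 sequential calls to the backend, all going
// through one shared reqwest::Client (its connection pool is reused).
async fn demo(Extension(client): Extension<reqwest::Client>) -> String {
    let mut combined = String::new();
    for i in 0..8 {
        // Each call waits for the previous one to finish (sequential, not parallel).
        // The backend URL below is a placeholder, not the repo's actual endpoint.
        let body = client
            .get(format!("http://backend-host:8080/api/data/{i}"))
            .send()
            .await
            .expect("backend request failed")
            .text()
            .await
            .expect("backend response body failed");
        combined.push_str(&body);
    }
    combined
}

#[tokio::main]
async fn main() {
    // One client shared across all incoming requests via an Extension layer.
    let client = reqwest::Client::new();
    let app = Router::new()
        .route("/api/demo", get(demo))
        .layer(Extension(client));

    axum::Server::bind(&"0.0.0.0:8080".parse().unwrap())
        .serve(app.into_make_service())
        .await
        .unwrap();
}
```

The Java application implements the same flow (one endpoint making 8 sequential downstream calls) on the reactive Spring stack.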

Before capturing any stats from the load-testing scenarios, the applications running on the single-core machine are warmed up with 120 requests/sec for 10 minutes. Of course, the Rust application doesn’t need such an extensive warm-up (indeed, it doesn’t need any warm-up at all), but for the sake of consistency between the Java and Rust runs, I execute the warm-up scenario from the Gatling app in all cases. The actual load-testing scenarios are executed for 5 minutes and statistics are captured. For each scenario, the concurrent requests/sec rate is adjusted so that the CPU on the single-core machine remains around 10% - 15% idle throughout the run. Here are the stats of the load-testing scenarios when sending concurrent requests and establishing a new connection for each request -

Load test stats - new connection for each request

When the load-testing scenario sends concurrent HTTP requests with a new connection for each request, the resulting stats look good considering it is a single-core machine, and as we can see the Rust app is able to handle ~1.8 times more requests per second on the same hardware resources. But if we look at the stats for sending concurrent HTTPS requests with a new connection for each request, the Java app is able to handle a constant load of just 195 requests/sec. Even the natively compiled Rust app is only able to handle a constant load of 540 requests/sec. Establishing a new SSL/TLS connection involves a lot of overhead due to the initial handshake required between client and server.

In this example the load-test scenario establishes a new connection for each request; however, in the real world, if multiple requests come from the same browser or from some HTTP client instance (programmatically), subsequent requests can usually reuse a connection from a pool of already established idle connections. The following load-testing stats are based on such a use case, where a set of persistent connections is established initially and subsequent requests are sent to the server with a 1 second delay from each established connection -

Load test stats - reuse persistent connections

As we can see, when sending new requests by reusing already established connections, the stats are considerably better than when establishing a new connection for each request. Even in the HTTPS scenarios, the Java app is able to handle a constant load of ~1100 requests/sec and the Rust app a constant load of ~1950 requests/sec.
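From the client’s point of view, the difference between the two scenarios comes down to whether an HTTP client (and its connection pool) is reused across requests. The sketch below illustrates this with Reqwest; it is not the actual Gatling setup used for the tests (Gatling has its own DSL), and the URL is a placeholder.

```rust
use std::time::Instant;

#[tokio::main]
async fn main() -> Result<(), reqwest::Error> {
    // Placeholder URL for the demo app on the single-core host.
    let url = "https://single-core-host:8443/api/demo";

    // Scenario 1: a brand-new client (and therefore a brand-new TCP + TLS
    // connection and handshake) for every request.
    let start = Instant::now();
    for _ in 0..10 {
        let client = reqwest::Client::new();
        client.get(url).send().await?.text().await?;
    }
    println!("new connection per request: {:?}", start.elapsed());

    // Scenario 2: one client reused for all requests; after the first request,
    // its connection pool keeps the already-handshaken connection alive.
    let client = reqwest::Client::new();
    let start = Instant::now();
    for _ in 0..10 {
        client.get(url).send().await?.text().await?;
    }
    println!("reused persistent connection: {:?}", start.elapsed());
    Ok(())
}
```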

In the real world there is usually mixed incoming traffic, with some requests establishing new connections and some arriving on already established connections. We can expect this single-core machine to handle a constant load of concurrent requests somewhere in the range observed between the above 2 scenarios. Remember that these stats are specific to the demo web application use case I have considered in this post. In my opinion, single-core machines are very much capable of serving a few hundred to a few thousand requests per second, and if that is not sufficient, horizontal and/or vertical scalability can always be implemented.

It’s “Java and Rust” not “Java vs Rust”
Even though the demo application developed with Rust outperforms the one developed with Java in all the scenarios discussed in this post, this post shouldn’t be seen as Java vs Rust. Performance usually depends on the implementation of the specific library/framework and the runtime used to run the application. Different implementations handle different amounts of complexity and offer varying degrees of flexibility. For example, the Spring framework has been around for a couple of decades and offers a lot of features and flexibility for developing web applications. Compared to that, Axum is quite new and nowhere near as flexible and feature-rich as the Spring framework, but it provides the features necessary to develop the demo app in this post and performs a lot better than the demo app running on Java.

Developing an app with Rust takes time and thought, especially if the developer has a background in a language with a garbage collector. Apart from that, JIT compilation can generate highly efficient machine code with a CPU-specific instruction set, speculative optimization, dynamic inlining, etc., and in some cases this dynamically generated machine code can run faster than static native code generated once at compile time. I usually consider Java and Rust as just tools at my disposal to solve problems, and depending on the use case one might be a better fit than the other.


Parth Mistry

Enterprise Application and BigData engineering enthusiast. Interested in highly efficient low-cost app design and development with Java/Scala and Rust 🦀