Tuning the Linux network stack - with examples - Part 2

# Enable TCP window scaling (takes effect immediately)
sudo sysctl -w net.ipv4.tcp_window_scaling=1
sudo sysctl -p
# Disable TCP window scaling
sudo sysctl -w net.ipv4.tcp_window_scaling=0
sudo sysctl -p
# Default socket buffer sizes, in bytes
sudo sysctl -w net.core.rmem_default="65535"
sudo sysctl -w net.core.wmem_default="65535"
# Maximum socket buffer sizes, in bytes (<BUFFER_SIZE> = the value chosen for your workload)
sudo sysctl -w net.core.rmem_max="<BUFFER_SIZE>"
sudo sysctl -w net.core.wmem_max="<BUFFER_SIZE>"
# Per-socket TCP auto-tuning limits: "min default max", in bytes
sudo sysctl -w net.ipv4.tcp_rmem="4096 65535 <BUFFER_SIZE>"
sudo sysctl -w net.ipv4.tcp_wmem="4096 65535 <BUFFER_SIZE>"
# Note: sysctl -p reloads /etc/sysctl.conf, so add the settings there as well
# to make them persist across reboots
sudo sysctl -p
BDP (bits) = bandwidth (bits/second) * latency (seconds)
BDP (bytes) = BDP (bits) / 8
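As a worked example of the formula (the 1 Gbit/s link and 30 ms latency figures below are illustrative assumptions, not values from a real measurement):

```python
def bdp_bytes(bandwidth_bits_per_sec, latency_sec):
    """Bandwidth-delay product, converted from bits to bytes."""
    return bandwidth_bits_per_sec * latency_sec / 8

# e.g. a 1 Gbit/s link with 30 ms of latency:
buf = bdp_bytes(1_000_000_000, 0.030)
print(f"{buf:.0f} bytes (~{buf / (1024 * 1024):.1f} MB)")  # 3750000 bytes (~3.6 MB)
```

A buffer around the BDP lets the sender keep the pipe full for one full round trip, which is why it is the usual starting point for the max values in tcp_rmem/tcp_wmem.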
The values of net.ipv4.tcp_mem are measured in pages, not bytes, so the page size is needed to translate them:

$ sysctl net.ipv4.tcp_mem
net.ipv4.tcp_mem = 190791 254389 381582
$ getconf PAGESIZE

The mem counters in /proc/net/sockstat are in pages as well:

$ cat /proc/net/sockstat
sockets: used 165
TCP: inuse 5 orphan 0 tw 0 alloc 6 mem 1054
UDP: inuse 3 mem 1
UDPLITE: inuse 0
RAW: inuse 0
FRAG: inuse 0 memory 0
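Since these counters are in pages, converting them to bytes is a small calculation (4096 is assumed as the page size below; use whatever getconf PAGESIZE reports on your host):

```python
PAGE_SIZE = 4096  # assumed; check `getconf PAGESIZE` on your host

def pages_to_mb(pages, page_size=PAGE_SIZE):
    """Convert a page count (as in tcp_mem or sockstat's mem field) to MB."""
    return pages * page_size / (1024 * 1024)

# the tcp_mem hard limit shown above: 381582 pages
print(f"{pages_to_mb(381582):.1f} MB")  # ~1490.6 MB
# current TCP usage from sockstat: mem 1054 pages
print(f"{pages_to_mb(1054):.1f} MB")
```

So on this host, TCP as a whole is allowed roughly 1.5 GB of buffer memory at the hard limit, while current usage is only a few MB.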
  1. The server app is up and listening for incoming connections. Once connections are accepted, writing to the established sockets does not start until the second stage begins. The server app remains in this stage for the duration specified by conn_wait (default = 15s)
  2. The server app starts writing (in parallel) to all socket connections accepted in the previous stage until their write buffers are full. It remains in this stage for the duration specified by write_wait (default = 15s)
  3. The server app remains idle, which allows us to monitor overall system memory usage for TCP and per-socket network buffer usage with other tools. It remains in this stage for the duration specified by close_wait (default = 30s)
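The core of stage 2 can be sketched as a non-blocking write loop that stops once the kernel's send buffer fills up. This is a minimal illustration, not the actual test app; a local socketpair stands in for a real client connection, and the peer deliberately never reads:

```python
import socket

def fill_send_buffer(sock, chunk=b"x" * 4096):
    """Write to a non-blocking socket until the kernel send buffer is full."""
    sock.setblocking(False)
    total = 0
    try:
        while True:
            total += sock.send(chunk)
    except BlockingIOError:
        pass  # send buffer is full; a real app would now wait for writability
    return total

# Demo: the peer never reads, so the kernel buffer fills and the loop stops.
a, b = socket.socketpair()
sent = fill_send_buffer(a)
print(sent)  # total bytes the kernel buffered for this socket
a.close(); b.close()
```

The number printed is exactly the per-socket buffer usage that tools like ss report while the app idles in stage 3.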
min: 4180 (0.004 MB)
max: 338736 (0.323 MB)
average: 105136.6496 (0.1 MB)
sum: 1051366496 (1002.661 MB)
25th percentile: 61768 (0.059 MB)
50th percentile: 95744 (0.091 MB)
75th percentile: 134256 (0.128 MB)
85th percentile: 168232 (0.16 MB)
95th percentile: 237832 (0.227 MB)
99th percentile: 333576 (0.318 MB)

min: 215004 (0.205 MB)
max: 338112 (0.322 MB)
average: 298133.2868 (0.284 MB)
sum: 2981332868 (2843.221 MB)
25th percentile: 236184 (0.225 MB)
50th percentile: 321748 (0.307 MB)
75th percentile: 333576 (0.318 MB)
85th percentile: 335256 (0.32 MB)
95th percentile: 335256 (0.32 MB)
99th percentile: 338112 (0.322 MB)
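Summaries like the ones above can be produced with a short script. This is a sketch; the original measurement tooling isn't shown in the article, and the nearest-rank percentile method used here is an assumption:

```python
import math

def percentile(sorted_vals, p):
    """Nearest-rank percentile on an already-sorted list."""
    k = max(0, math.ceil(p / 100 * len(sorted_vals)) - 1)
    return sorted_vals[k]

def summarize(vals):
    """min/max/average/sum and the percentiles reported above, from raw per-socket byte counts."""
    vals = sorted(vals)
    return {
        "min": vals[0],
        "max": vals[-1],
        "average": sum(vals) / len(vals),
        "sum": sum(vals),
        **{f"p{p}": percentile(vals, p) for p in (25, 50, 75, 85, 95, 99)},
    }

stats = summarize([4096, 65536, 131072, 262144])
print(stats["min"], stats["max"])  # 4096 262144
```

Feeding it the per-socket buffer sizes collected during stage 3 yields the kind of distribution shown above.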
  • For public-facing servers with clients at widely varying latencies and bandwidths, there is no single set of values from which to calculate an optimal BDP. Medium-sized buffers of 2-8 MB are usually appropriate, depending on available system memory and the expected number of concurrent clients. Network monitoring tools can then be used to measure latency statistics of the connected clients, and buffer sizes can be adjusted over time based on those measurements
  • If packet loss is frequent in the network, then buffer sizes can be kept lower to avoid long cycles of retransmission for the lost packets
  • For server-to-server communication where both bandwidth and latency are high, large buffers of 12-32 MB or more can be used (depending on the use case). This can result in significantly higher memory usage for TCP buffers, so the upper bounds on overall TCP memory usage should be tuned carefully; other memory-intensive applications running on the same host must still get enough memory to run without crashing
  • When clients and servers are on the same network, or latency is below roughly 2-6 ms, TCP buffer sizes can be kept very low (e.g. less than 200 KB). Depending on the use case, window scaling can even be disabled and maximum buffer sizes capped at 64 KB without losing any network throughput. This minimizes the memory overhead of TCP buffers and frees system memory for other purposes
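The guidelines above can be condensed into a rough sizing helper. The thresholds are illustrative assumptions lifted from the bullet points, not hard rules:

```python
def suggest_max_buffer(rtt_ms, bandwidth_mbps, lossy=False):
    """Rough TCP max-buffer suggestion in bytes, following the guidelines above."""
    if rtt_ms <= 6:                    # same network / very low latency
        return 64 * 1024               # 64 KB, window scaling optional
    bdp = bandwidth_mbps * 1_000_000 * (rtt_ms / 1000) / 8  # BDP in bytes
    if lossy:                          # frequent packet loss: keep buffers small
        return min(int(bdp), 2 * 1024 * 1024)
    # otherwise clamp the BDP into the 2-32 MB band discussed above
    return max(2 * 1024 * 1024, min(int(bdp), 32 * 1024 * 1024))

print(suggest_max_buffer(3, 1000))    # 65536 (LAN case)
print(suggest_max_buffer(30, 1000))   # 3750000 (BDP-sized, ~3.6 MB)
```

Treat the output as a starting point for net.core.rmem_max/wmem_max and the max field of tcp_rmem/tcp_wmem, to be validated with throughput tests.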



Parth Mistry

Enterprise Application and BigData engineering enthusiast. Interested in highly efficient low-cost app design and development with Java/Scala and Rust 🦀