Nginx Web Server Best Practices (techniques used by Nginx itself)
Can serve up to 56K concurrent connections (solves the C10K problem): https://www.aosabook.org/en/nginx.html
Use | How |
---|---|
SO_REUSEPORT (also called kernel socket sharding) | Multiple worker threads/processes can each listen on the same IP+port combination; the kernel spreads incoming connections evenly across them. |
NON-BLOCKING, ASYNC SOCKETS | Sockets are set non-blocking, so a worker never stalls on a slow read or write; it parks that connection and services another one. |
SSL TERMINATORS AND LOAD BALANCERS | Offload TLS termination and load balancing to dedicated front-end proxies so backend workers spend their cycles serving requests. |
SKIP THE KERNEL STACK | Write a custom driver that hands packets directly to the application; the kernel stack is complicated and slow. How fast can this be? Intel has a benchmark where they process 80 million packets/second in user mode on a fairly lightweight server. |
MULTI-CORE | Badly written code can get slower as cores are added; we want to get faster with every core. Recommendations: keep data structures per core, avoid atomic operations (those are expensive), use lock-free data structures. |
COMPRESS DATA | Use compact, cache-friendly data layouts so more of the working set fits in the CPU caches. |
MULTI-THREADED | Decide on the threading model (pipelined or worker pool?) and set processor affinity so each thread stays on its core. |
EVENT-TRIGGERED LIBRARIES (e.g. libevent) | Use event-triggered libraries in place of poll() or select(). poll() is better than select(): select() can only monitor 1024 (FD_SETSIZE) sockets. Event-triggered libraries (built on epoll/kqueue) offer significant performance advantages, especially when dealing with a large number of concurrent connections. |
LOAD BALANCING ACROSS WORKER PROCESSES | Nginx's accept_mutex directive: when enabled, worker processes accept new connections by turn; otherwise all workers are notified about each new connection, and if the volume of new connections is low, some workers just waste system resources. |
NON-BLOCKING STATE MACHINE | A state machine is like the pre-defined rules of chess running on the server: every HTTP transaction has a pre-defined action and a state the server should transition to. Each worker can serve thousands of web clients this way. Nginx provides a separate state machine per protocol, e.g. HTTP, POP3, IMAP, SMTP. |
TUNING THE KERNEL FOR NGINX | a. net.core.somaxconn: maximum number of connections that can be queued for acceptance (default 128, maximum 65535). b. net.core.netdev_max_backlog: how many packets the network card may buffer before they are handed to the CPU (default 128, maximum 65535). c. fs.file-max: system-wide limit on open file handles; commonly raised to around 1 million (1M). |
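The accept_mutex and socket-sharding rows map to real nginx directives. A minimal nginx.conf sketch; the worker and connection numbers are illustrative, not recommendations:

```nginx
worker_processes auto;           # one worker per CPU core
events {
    worker_connections 4096;     # per-worker connection budget
    accept_mutex on;             # workers take turns accepting connections
}
http {
    server {
        # reuseport gives each worker its own SO_REUSEPORT listen socket;
        # the kernel then balances, and accept_mutex is no longer used.
        listen 80 reuseport;
        return 200 "ok\n";
    }
}
```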
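The kernel knobs in the tuning row go in /etc/sysctl.conf (or are set with `sysctl -w`). The values below simply mirror the table above; verify the defaults and safe maxima on your own kernel before applying them:

```conf
# /etc/sysctl.conf fragment (values from the notes above)
net.core.somaxconn = 65535          # accept-queue limit (default 128)
net.core.netdev_max_backlog = 65535 # NIC -> CPU input queue length
fs.file-max = 1000000               # system-wide open-file limit (~1M)
```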
SCALING A WEB SERVER TO 10 MILLION (1 Crore) CONNECTIONS (solves the C10M problem)
Use | How |
---|---|
Control Plane | Separate out the control plane: all functions that determine which path to use for sending a packet, e.g. routing protocols, spanning tree, LDP. |
Data/Forwarding Plane | Separate out the data/forwarding plane: all functions/processes that forward packets from one interface to another. |
Why Separation of Control & Data Plane Is Needed
- Because the kernel is doing both, and that is the problem. GIVE THE CONTROL PLANE TO THE KERNEL & THE DATA PLANE TO THE APPLICATION.
- 1. Run the Data Plane in the Application
1a. Packet Handling (don't let the kernel handle packets; pass them directly to the application):
Write your own custom driver to bypass the kernel stack, or use an existing one: PF_RING, Netmap, or Intel DPDK (Data Plane Development Kit) drivers.
The kernel stack is complicated and slow; Intel has a benchmark of 80 million packets/sec on a lightweight server in user mode.
1b. Memory Management:
Preallocate all memory at once on startup.
One large, contiguous allocation (especially when backed by huge pages) keeps the page table small.
Co-locate Data: don't scatter related data across memory behind pointers. Every pointer you follow is a likely cache miss.