
Standard Linux Tuning

Hello Bloggers,

The majority of applications these days are deployed with (Debian or Red Hat based) Linux as the base operating system.

I would like to share some generic tuning that can be done before deploying any application on it.

Each item below names the component, followed by the question or check, the test or command, and the reason behind it.
Network
  These are some checks to validate the network setup.
  Network Are the switches redundant?
Unplug one switch.
Fault-tolerance.
 
  Network Is the cabling redundant?
Pull cables.
Fault-tolerance.
 
  Network Is the network full-duplex?
Double check setup.
Performance.
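
A quick way to double-check speed and duplex, assuming the interface is named eth0 (adjust the device name to your system):

ethtool eth0 | grep -i -E 'speed|duplex'   # expect e.g. Speed: 1000Mb/s, Duplex: Full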
 
       
Network adapter (NIC) Tuning
  It is recommended to consult the network adapter vendor for the Linux TCP/IP settings they suggest for optimal performance and stability.

There are also quite a few TCP/IP tuning sources on the Internet, such as http://fasterdata.es.net/TCP-tuning/linux.html

  NIC Are the NICs fault-tolerant (auto-port negotiation)?
Pull cables and/or disable network adapter.
Fault-tolerance.
 
  NIC Set the transmission queue depth to at least 1000.

txqueuelen <length>

'cat /proc/net/softnet_stat'

Performance and stability (packet drops).
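
A minimal way to apply and verify this, assuming the interface is named eth0 (adjust to your device; the values are examples):

ip link set dev eth0 txqueuelen 1000       # or: ifconfig eth0 txqueuelen 1000
cat /sys/class/net/eth0/tx_queue_len       # verify the new queue length
cat /proc/net/softnet_stat                 # per-CPU counters; the second column counts dropped packets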

 
  NIC Enable TCP/IP offloading (aka. Generic Segmentation Offload (GSO)), which was added in kernel 2.6.18.

See: http://www.linuxfoundation.org/en/Net:GSO

Check: ethtool -k eth0

Modify: ethtool -K <DevName> gso on

Performance.

Note: I recommend enabling all supported TCP/IP offloading capabilities on an EMS host to free CPU resources.
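
A sketch of checking and enabling the common offloads, assuming the interface is named eth0 and that the NIC/driver supports them (ethtool -k lists what is available):

ethtool -k eth0                            # list supported/enabled offload capabilities
ethtool -K eth0 gso on tso on gro on       # enable generic/TCP segmentation and generic receive offload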

 
  NIC Enable Interrupt Coalescence (aka. Interrupt Moderation or Interrupt Blanking).

See: http://kb.pert.geant.net/PERTKB/InterruptCoalescence

Check: ethtool -c eth0

Modify: ethtool -C <DevName>

Performance.

Note: The configuration is system dependent, but the goal is to reduce the number of interrupts per second at the 'cost' of slightly increased latency.
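
An illustrative adjustment, assuming the interface is named eth0; the parameters and values are examples only and not every driver supports every option:

ethtool -c eth0                            # show current coalescing settings
ethtool -C eth0 rx-usecs 100 rx-frames 64  # coalesce interrupts: wait up to 100 us or 64 frames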

 
       
TCP/IP Buffer Tuning
  For a low-latency or high-throughput messaging system, TCP/IP buffer tuning is important.
Instead of blindly tuning the default values, one should rather check whether the current settings (sysctl -a) already provide large enough buffers. The values can be changed via the command sysctl -w <name>=<value>.

The values and comments below were taken from TIBCO support (FAQ1-6YOAA) and serve as a guideline towards "large enough" buffers, i.e. if your system configuration has lower values it is suggested to raise them to the values below.

  TCP/IP Maximum OS receive buffer size for all connection types.

sysctl -w net.core.rmem_max=8388608

Default: 131071
Performance.

 
  TCP/IP Default OS receive buffer size for all connection types.

sysctl -w net.core.rmem_default=65536

Default: 126976
Performance.

 
  TCP/IP Maximum OS send buffer size for all connection types.

sysctl -w net.core.wmem_max=8388608

Default: 131071
Performance.

 
  TCP/IP Default OS send buffer size for all types of connections.

sysctl -w net.core.wmem_default=65536

Default: 126976
Performance.
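
To make the four net.core values above survive a reboot, they can be added to /etc/sysctl.conf (a sketch using the values from this section; adjust to your distribution and requirements):

# /etc/sysctl.conf (excerpt)
net.core.rmem_max = 8388608
net.core.wmem_max = 8388608
net.core.rmem_default = 65536
net.core.wmem_default = 65536

Afterwards run 'sysctl -p' to load the file.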

 
  TCP/IP Is TCP/IP window scaling enabled?

sysctl net.ipv4.tcp_window_scaling

Default: 1

Performance.
Note: As applications set buffer sizes explicitly, this effectively 'disables' TCP/IP window scaling on Linux. Thus there is little point in enabling it, though there should be no harm in leaving the default (enabled). [This is my understanding / what I have been told, but I never double-checked and it could vary with kernel versions.]

 
  TCP/IP TCP auto-tuning setting:

sysctl -w net.ipv4.tcp_mem='8388608 8388608 8388608'

Default: 1966087 262144 393216
Performance.

The tcp_mem variable defines how the TCP stack should behave when it comes to memory usage:

- The first value specified in the tcp_mem variable tells the kernel the low threshold. Below this point, the TCP stack does not bother at all about putting any pressure on the memory usage by different TCP sockets.

- The second value tells the kernel at which point to start pressuring memory usage down.

- The final value tells the kernel how many memory pages it may use maximally. If this value is reached, TCP streams and packets start getting dropped until we reach a lower memory usage again. This value includes all TCP sockets currently in use.
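
Note that tcp_mem is specified in memory pages, not bytes, so it is worth converting when sanity-checking a value; a quick check, assuming the usual 4 KiB page size (verify with getconf PAGESIZE):

cat /proc/sys/net/ipv4/tcp_mem             # e.g. 8388608 8388608 8388608
getconf PAGESIZE                           # typically 4096
# maximum TCP memory = 8388608 pages * 4096 bytes = 32 GiB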

 
  TCP/IP TCP auto-tuning (receive) setting:

sysctl -w net.ipv4.tcp_rmem='4096 87380 8388608'

Default: 4096 87380 4194304
Performance.

The tcp_rmem variable defines how the TCP stack should behave when it comes to memory usage:

- The first value tells the kernel the minimum receive buffer for each TCP connection, and this buffer is always allocated to a TCP socket, even under high pressure on the system.

- The second value tells the kernel the default receive buffer allocated for each TCP socket. This value overrides the /proc/sys/net/core/rmem_default value used by other protocols.

- The third and last value specifies the maximum receive buffer that can be allocated for a TCP socket.

 
  TCP/IP TCP auto-tuning (send) setting:

sysctl -w net.ipv4.tcp_wmem='4096 65536 8388608'

Default: 4096 87380 4194304
Performance.

This variable takes three values which define how much TCP send buffer memory space each TCP socket may use. Every TCP socket has this much buffer space to use before the buffer is filled up. Each of the three values is used under different conditions:

- The first value tells the minimum TCP send buffer space available for a single TCP socket.

- The second value tells us the default buffer space allowed for a single TCP socket to use.

- The third value tells the kernel the maximum TCP send buffer space.

 
  TCP/IP This ensures that subsequent connections immediately use the new values.

sysctl -w net.ipv4.route.flush=1

Default: Not present
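
A minimal sequence to apply and verify the auto-tuning values from this section in one go (values as above; adjust to your environment):

sysctl -w net.ipv4.tcp_rmem='4096 87380 8388608'
sysctl -w net.ipv4.tcp_wmem='4096 65536 8388608'
sysctl -w net.ipv4.route.flush=1           # flush cached routes so new connections pick up the values
sysctl net.ipv4.tcp_rmem net.ipv4.tcp_wmem # verify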

 
       
TCP Keep Alive
  In order to detect ungracefully closed sockets, either TCP keep-alive or the EMS client-server heartbeat comes into play. Which setup, or which combination of parameters, works better depends on the requirements and test scenarios.

As the EMS daemon does not explicitly enable TCP keep-alive on its sockets, the TCP keep-alive settings (net.ipv4.tcp_keepalive_intvl, net.ipv4.tcp_keepalive_probes, net.ipv4.tcp_keepalive_time) do not play a role.

  TCP How many times to retry before killing an alive TCP connection. RFC 1122 says that the limit should be longer than 100 seconds, which is a rather small number. The default value of 15 corresponds to 13-30 minutes, depending on the retransmission timeout (RTO).

sysctl -w net.ipv4.tcp_retries2=<test> (7 preferred)

Default: 15

Fault-Tolerance (EMS failover)

The default (15) is often considered too high and a value of 3 is often felt to be too 'edgy'; customer testing should establish a good value in the range of 4 to 10.
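
A sketch of testing and then persisting a candidate value (7 is used here only as an example; the file name under /etc/sysctl.d is illustrative):

sysctl -w net.ipv4.tcp_retries2=7          # apply for the current boot and run failover tests
sysctl net.ipv4.tcp_retries2               # verify
echo 'net.ipv4.tcp_retries2 = 7' >> /etc/sysctl.d/90-tcp-tuning.conf   # persist once validated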

 
       
Linux System Settings
  System limits (ulimit) are used to establish boundaries for resource utilization by individual processes and thus protect the system and other processes. A too high or unlimited value provides zero protection but a too low value could hinder growth or cause premature errors.
  Linux Is the number of file descriptors at least 4096?

ulimit -n

Scalability

Note: It is expected that the number of connected clients, and thus the number of connections, is going to increase over time; this setting allows for greater growth and also provides more headroom should some application have a connection leak. Also note that a large number of open connections can decrease system performance due to the way the OS handles the select() API. Thus, if the number of connected clients increases over time, care should be taken that all SLAs are still met.
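
To raise the limit persistently for the account running the application, an entry in /etc/security/limits.conf can be used (a sketch; 'emsuser' is a placeholder account name and the values are examples):

# /etc/security/limits.conf (excerpt)
emsuser  soft  nofile  4096
emsuser  hard  nofile  8192

A new login session is required before 'ulimit -n' reflects the change.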

 
  Linux Limit maximum file size for EMS to 2/5 of the disk space if the disk space is shared between EMS servers.

ulimit -f

Robustness: Contain the damage of a very large backlog.

 
  Linux Consider limiting the maximum data segment size for EMS daemons in order to avoid one EMS monopolizing all available memory.

ulimit -d

Robustness:  Contain the damage of a very large backlog.

Note: It should be tested if such a limit operates well with (triggers) the EMS reserved memory mode.

 
  Linux Limit the number of child processes to X to contain a rogue application (fork bomb).

ulimit -u

Robustness: Contain the damage a rogue application can do.
See: http://www.cyberciti.biz/tips/linux-limiting-user-process.html

This is just an example of a Linux system setting that is unrelated to TIBCO products. It is recommended to consult with Linux experts for recommended settings.
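
A corresponding sketch for limiting processes per user via /etc/security/limits.conf (user name and values are placeholders):

# /etc/security/limits.conf (excerpt)
appuser  soft  nproc  1024
appuser  hard  nproc  2048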

 
       
Linux Virtual Memory Management
  There are a couple of virtual memory related settings that determine how likely Linux is to swap out memory pages and how Linux reacts to out-of-memory conditions. Neither aspect matters much under "normal" operating conditions, but both are very important under memory pressure and thus for the system's stability under stress.

 

A server running EAI software, and even more so a server running a messaging server like EMS, should rarely have to resort to swap space, for obvious performance reasons. However, considerations around the malloc/sbrk high-water-mark behavior, the behavior of the different over-commit strategies and the price of storage lead to the recommendations below: even with the tuning of the EMS server towards larger malloc regions[1], the reality is that the EMS daemon is still subject to the sbrk() high-water-mark and potentially allocates a lot of memory pages that could be swapped out without impacting performance. Of course the EMS server instance must eventually be bounced, but the recommendations in this section aim to provide operations with a larger window to schedule that maintenance.

 

As these values operate as a bundle, they must be changed together, or any variation must be well understood.

  Linux Swap-Space: 1.5 to 2x the physical RAM (24-32 GB)

Logical-Partition: One of the first ones, but after the EMS disk storage and application checkpoint files.

Physical-Partition: Use a different physical partition than the one used for storage files, logging or application checkpoints to avoid competing disk IO.

 

 
  Linux Committing virtual memory:

sysctl -w vm.overcommit_memory=2

$ cat /proc/sys/vm/overcommit_memory

Default: 0

Robustness

 

Note: The recommended setting (2) uses strict overcommit accounting and only commits as much memory as is available, where available is defined as swap space plus a portion of RAM. The portion of RAM is defined in the overcommit_ratio. See also: http://www.mjmwired.net/kernel/Documentation/vm/overcommit-accounting and http://www.centos.org/docs/5/html/5.2/Deployment_Guide/s3-proc-sys-vm.html

 
  Linux Committing virtual memory II:

sysctl -w vm.overcommit_ratio=25 (or less)

$ cat /proc/sys/vm/overcommit_ratio

Default: 50

Robustness

Note: This value specifies what percentage of the RAM Linux adds to the swap space in order to calculate the "available" memory. The more the swap space exceeds the physical RAM, the lower this value can be chosen. See also: http://www.linuxinsight.com/proc_sys_vm_overcommit_ratio.html
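
For illustration only, assuming 16 GB of RAM and 24 GB of swap together with the settings above:

# CommitLimit = swap + RAM * overcommit_ratio / 100
#             = 24 GB + 16 GB * 25 / 100 = 28 GB of committable memory
grep -i commit /proc/meminfo               # shows CommitLimit and Committed_AS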

 
  Linux Swappiness

sysctl -w vm.swappiness=25 (or less)

$ cat /proc/sys/vm/swappiness
Default: 60

Robustness

 

Note: The swappiness defines how likely it is that memory pages get swapped out in order to make room for the file buffer cache.

 

Generally speaking, an enterprise server should not need to swap out pages in order to make room for the file buffer cache or other processes, which would favor a setting of 0.

On the other hand, it is likely that applications have at least some memory pages that almost never get referenced again, and swapping those out is a good thing.

 
  Linux Exclude essential processes (the application) from being killed by the out-of-memory (OOM) killer.

echo -17 > /proc/<pid>/oom_adj

Default: NA

Robustness

See: http://linux-mm.org/OOM_Killer and http://lwn.net/Articles/317814/

 

Note: With any configuration other than overcommit_memory=2 and overcommit_ratio=0, the Linux virtual memory management can commit more memory than is available. If that memory must then actually be provided, Linux engages the out-of-memory killer to kill processes based on "badness". In order to exclude essential processes from being killed one can set their oom_adj to -17.
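
A sketch of excluding running EMS daemons from the OOM killer; the binary name tibemsd is assumed here, and on kernels 2.6.36 or later the replacement interface is /proc/<pid>/oom_score_adj with a value of -1000:

for pid in $(pidof tibemsd); do
    echo -17 > /proc/$pid/oom_adj          # -17 disables OOM killing for this process
done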

 
  Linux 32bit Low memory area - 32bit Linux only

# cat /proc/sys/vm/lower_zone_protection
# echo "250" > /proc/sys/vm/lower_zone_protection

(NOT APPLICABLE)
To set this option on boot, add the following to /etc/sysctl.conf:
vm.lower_zone_protection = 250

 

See: http://linux.derkeiler.com/Mailing-Lists/RedHat/2007-08/msg00062.html

 
       
Linux CPU Tuning (Processor Binding & Priorities)
  This level of tuning is seldom required for a typical application deployment. The tuning options are mentioned in case there is a need to go the extra mile.
  Linux IRQ-Binding

Recommendation: Leave default
Note: For real-time messaging, binding interrupts to a dedicated, exclusively used CPU reduces jitter and thus improves the system characteristics needed by ultra-low-latency solutions.

The default on Linux is IRQ balancing across multiple CPUs, and Linux offers two solutions in that area (an in-kernel one and a user-space daemon), of which at most one should be enabled.
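
For completeness, a sketch of manual IRQ binding; the IRQ number 24 and the CPU mask are purely illustrative and the default balancing remains the recommendation:

grep eth0 /proc/interrupts                 # find the NIC's IRQ number, e.g. 24
echo 4 > /proc/irq/24/smp_affinity         # hex CPU bitmask: 0x4 = CPU 2 only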

 
  Linux Process Base Priority
Recommendation: Leave default

 

Note: The process base priority is determined by the user running the process instance; thus running processes as root (chown to root and set the setuid bit) increases the process's base priority.

A root user can further raise the priority of the application to real-time scheduling, which can improve performance, particularly in terms of jitter. However, in 2008 we observed that doing so actually decreased the performance of EMS in terms of messages per second. That issue was researched with Novell at the time, but I am not sure of its outcome.
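
For reference, a sketch of raising the priority (as root); the binary name and values are placeholders, and real-time scheduling should only be used after the kind of testing mentioned above:

nice -n -10 ./tibemsd                      # raise the static priority
chrt -f 10 ./tibemsd                       # run under SCHED_FIFO real-time scheduling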

 
  Linux Foreground and Background Processes

Recommendation: TBD

 

Note: Linux assigns foreground processes a better base priority than background processes, but whether this really matters and, if so, how to change the start-up scripts is still to be determined.

 
  Linux Processor Set

Recommendation: Don’t bother

 

Note: Linux allows defining a processor set and limiting a process to only use cores from that processor set. This can be used to increase cache hits and cap the CPU resource for a particular process instance.
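
If it is ever needed, a sketch using taskset (core list, binary name and PID are placeholders):

taskset -c 0-3 ./tibemsd                   # start a process restricted to cores 0-3
taskset -cp 0-3 <pid>                      # change the affinity of an already running process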

 

[1] If larger memory regions are allocated, the malloc() in the Linux glibc library uses mmap() instead of sbrk() to provide the memory pages to the process.

Memory obtained via mmap() is released back to the OS when freed, and thus the high-water-mark effect is avoided for these regions.
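
If one wants to push more allocations towards mmap(), glibc exposes the mmap threshold as an environment variable; a sketch, with the threshold value and binary name chosen purely as examples:

export MALLOC_MMAP_THRESHOLD_=65536        # allocations above 64 KiB are served via mmap()
./tibemsd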
