OOM Killer survival guide (Linux kernel 2.6 or higher)

01-02-2022 | 2 min to read

Why was the OOM killer introduced in operating systems?

Recent Linux distributions include a memory management feature, the OOM (out-of-memory) killer, that prevents processes from overusing memory.

This feature exists because, when one or more processes use more memory than the available RAM, the operating system falls back on the "swap area". The result is a general slowdown of the whole system, which very often makes it unusable.

OOM killer tries to overcome this problem by putting an end to processes that go beyond certain limits while safeguarding the functioning of the operating system and the remaining processes.

The underlying problem lies in the design of programs, which often do not take the capabilities of the system into account, or in the heavy use of third-party components inside applications, which makes it impossible to control and predict resource usage at runtime.

Why do Oplon products not need the OOM killer?

The Oplon suite, and the ADC in particular, is designed to work in real time. At startup the ADC allocates 90% of the memory it will need, because it has been designed from the beginning to work in environments with "finite resources". As a result, it does not even need the "swap area".

In this context, Oplon ADC does not need an OOM killer if the sizing has been done properly. Oplon ADC still benefits from the OOM killer settings: if the sizing has not been done correctly, the processes are stopped immediately at startup. If, on the other hand, the processes start, they will never be stopped at runtime, because they do not allocate more memory even under stress: they allocated it all at the beginning.

The OOM killer configuration must be set as indicated below. We advise against applying this configuration to environments not produced by Oplon Networks, because they may not share the design assumptions of our products.

Please note: the next Oplon versions on the roadmap, 10.1.7 and 11.0.0, will already include these parameters. The settings described here apply to earlier versions when the operating system is upgraded.

Prerequisites of Oplon Networks products

  • They are designed as real-time systems
  • They are designed as finite resource systems
  • They do not require a swap area

Parameters to modify

  • vm.oom-kill: enables or disables the OOM killer (at the time of writing, this parameter does not work)
  • vm.overcommit_memory: selects the algorithm the kernel uses to decide whether the memory limits are exceeded, and hence whether the OOM killer has to intervene
  • vm.overcommit_ratio: the percentage of physical RAM that, added to the swap area, gives the memory limit applied when vm.overcommit_memory = 2

Values to use

  • vm.oom-kill = 0 (disables the OOM killer)
  • vm.overcommit_ratio = 200 (percentage of RAM counted toward the limit, in addition to the swap area; the default is 50)
  • vm.overcommit_memory = 2 (strict accounting: the limit derived from vm.overcommit_ratio is enforced; the default is 0, heuristic overcommit)
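
As a rough illustration of what these values imply: with vm.overcommit_memory = 2, the kernel's overcommit documentation describes the commit limit as the swap size plus vm.overcommit_ratio percent of physical RAM. The sketch below computes that figure. The function name commit_limit_kb and the sample sizes are illustrative assumptions, not part of any Oplon tooling, and the calculation ignores huge pages, which the kernel excludes from the RAM figure.

```shell
#!/bin/sh
# Hypothetical helper: estimate the strict-overcommit commit limit in kB.
# Arguments: RAM in kB, swap in kB, vm.overcommit_ratio (percent).
# Simplification: huge pages are not taken into account.
commit_limit_kb() {
    ram_kb=$1
    swap_kb=$2
    ratio=$3
    echo $(( swap_kb + ram_kb * ratio / 100 ))
}

# Example: 8 GiB of RAM, no swap, ratio 200
commit_limit_kb 8388608 0 200    # prints 16777216 (16 GiB expressed in kB)
```

With no swap area and a ratio of 200, the total memory that processes may commit is therefore about twice the physical RAM.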

System file to modify

/etc/sysctl.conf

vm.oom-kill=0
vm.overcommit_ratio=200
vm.overcommit_memory=2
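
Before rebooting or reloading the settings, it can be useful to confirm what the file actually contains. The following is a minimal sketch, assuming a standard sysctl.conf layout (one name=value per line, comments starting with # or ;). The helper name sysctl_conf_value is hypothetical and is not an Oplon or procps command.

```shell
#!/bin/sh
# Hypothetical helper: print the value assigned to a key in a
# sysctl.conf-style file, ignoring comments and spaces around "=".
# If the key appears more than once, the last assignment is printed.
sysctl_conf_value() {
    key=$1
    file=$2
    awk -F= -v key="$key" '
        /^[ \t]*[#;]/ { next }                    # skip comment lines
        NF >= 2 {
            name = $1; value = $2
            gsub(/[ \t]/, "", name)               # strip blanks from the key
            gsub(/^[ \t]+|[ \t]+$/, "", value)    # trim the value
            if (name == key) found = value
        }
        END { if (found != "") print found }
    ' "$file"
}

# Example usage against the system file, if it is readable
conf=/etc/sysctl.conf
if [ -r "$conf" ]; then
    for k in vm.oom-kill vm.overcommit_ratio vm.overcommit_memory; do
        echo "$k -> $(sysctl_conf_value "$k" "$conf")"
    done
fi
```

An empty value after the arrow means the key is not set in the file. Settings in /etc/sysctl.conf are normally loaded at boot, or on demand with sysctl -p run as root.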

In any case, Oplon sets up the main parameters when it starts, so also modify the file:

/TCOProject/bin/LBL/LBL_HOME/lbloptenv.sh

# prevent out of memory
sysctl -w vm.oom-kill=0
sysctl -w vm.overcommit_ratio=200
sysctl -w vm.overcommit_memory=2

Warning: the spacing around = matters. In /etc/sysctl.conf, spaces around the = sign are optional, so both name=value and name = value are read correctly. On the command line, the shell splits name = value into three separate arguments, while sysctl -w expects a single name=value argument. The lines in the script file /TCOProject/bin/LBL/LBL_HOME/lbloptenv.sh must therefore be written without spaces around the = sign.
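
Once the parameters are in effect, the kernel reports the resulting limit and the amount of memory currently committed in /proc/meminfo, in the CommitLimit and Committed_AS fields. The sketch below prints the difference between the two as a quick sizing check. The helper name commit_headroom_kb is an assumption made for illustration, and the optional file argument exists only so that the function can be tried against a saved copy of meminfo.

```shell
#!/bin/sh
# Hypothetical helper: print how much memory can still be committed, in kB,
# reading a meminfo-style file (default: /proc/meminfo).
commit_headroom_kb() {
    meminfo=${1:-/proc/meminfo}
    awk '
        $1 == "CommitLimit:"  { limit = $2 }   # total commit allowed
        $1 == "Committed_AS:" { used  = $2 }   # currently committed
        END { printf "%d\n", limit - used }
    ' "$meminfo"
}

# Example usage on a Linux host
if [ -r /proc/meminfo ]; then
    commit_headroom_kb
fi
```

A small or negative value before the Oplon processes are started suggests that the sizing should be reviewed, since under strict accounting the allocations made at startup would be refused.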

References

https://www.kernel.org/doc/gorman/html/understand/understand016.html

https://www.kernel.org/doc/Documentation/vm/overcommit-accounting