In this article we will discuss how to limit CPU resources in Linux using cgroups and slices, with some practical examples.
To start with, a cgroup (Control Group) provides resource management and resource accounting for groups of processes. The kernel implementation of cgroups sits mostly in paths that are not performance critical. The cgroups subsystem implements a Virtual File System (VFS) type named “cgroup”, and all cgroup actions are performed through filesystem operations, such as creating cgroup directories in a cgroup filesystem, reading or writing entries in these directories, mounting cgroup filesystems, and so on.
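Since every cgroup action is a filesystem action, a cgroup can be created and populated with nothing more than mkdir and echo. Below is a minimal sketch on a cgroup v1 system; the group name "demo" is only an example and is not used elsewhere in this article (on cgroup v1 the default weight in cpu.shares is 1024):
# mkdir /sys/fs/cgroup/cpu/demo
# echo $$ > /sys/fs/cgroup/cpu/demo/tasks
# cat /sys/fs/cgroup/cpu/demo/cpu.shares
Here the current shell's PID is added to the new cgroup by writing it into the tasks file, and the last command prints the group's CPU weight.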
A few pointers on using cgroups to limit resources
- cgroups have been part of the Linux kernel since 2.6.24 and are integrated with systemd in recent Linux versions
- Control groups place resources in controllers that represent the type of resource, i.e. you can define groups of available resources to make sure an application such as a web server has a guaranteed claim on resources
- To do so, cgroups work with default controllers, which are cpu, memory and blkio
- These controllers are divided into a tree structure where different weights or limits are applied to each branch
- Each of these branches is a cgroup
- One or more processes are assigned to a cgroup
- cgroups can be applied from the command line or from systemd
- Manual creation happens through the cgconfig service and the cgred process
- In all cases, cgroup settings are written to /sys/fs/cgroup
# ls -l /sys/fs/cgroup/
total 0
drwxr-xr-x 2 root root  0 Nov 26 12:48 blkio
lrwxrwxrwx 1 root root 11 Nov 26 12:48 cpu -> cpu,cpuacct
lrwxrwxrwx 1 root root 11 Nov 26 12:48 cpuacct -> cpu,cpuacct
drwxr-xr-x 2 root root  0 Nov 26 12:48 cpu,cpuacct
drwxr-xr-x 2 root root  0 Nov 26 12:48 cpuset
drwxr-xr-x 3 root root  0 Nov 26 12:50 devices
drwxr-xr-x 2 root root  0 Nov 26 12:48 freezer
drwxr-xr-x 2 root root  0 Nov 26 12:48 hugetlb
drwxr-xr-x 2 root root  0 Nov 26 12:48 memory
lrwxrwxrwx 1 root root 16 Nov 26 12:48 net_cls -> net_cls,net_prio
drwxr-xr-x 2 root root  0 Nov 26 12:48 net_cls,net_prio
lrwxrwxrwx 1 root root 16 Nov 26 12:48 net_prio -> net_cls,net_prio
drwxr-xr-x 2 root root  0 Nov 26 12:48 perf_event
drwxr-xr-x 2 root root  0 Nov 26 12:48 pids
drwxr-xr-x 4 root root  0 Nov 26 12:48 systemd
These are the different controllers which are created by the kernel itself. Each of these controllers has its own tunables, for example:
# ls -l /sys/fs/cgroup/cpuacct/
total 0
-rw-r--r-- 1 root root 0 Nov 26 12:48 cgroup.clone_children
--w--w---- 1 root root 0 Nov 26 12:48 cgroup.event_control
-rw-r--r-- 1 root root 0 Nov 26 12:48 cgroup.procs
-r--r--r-- 1 root root 0 Nov 26 12:48 cgroup.sane_behavior
-r--r--r-- 1 root root 0 Nov 26 12:48 cpuacct.stat
-rw-r--r-- 1 root root 0 Nov 26 12:48 cpuacct.usage
-r--r--r-- 1 root root 0 Nov 26 12:48 cpuacct.usage_percpu
-rw-r--r-- 1 root root 0 Nov 26 12:48 cpu.cfs_period_us
-rw-r--r-- 1 root root 0 Nov 26 12:48 cpu.cfs_quota_us
-rw-r--r-- 1 root root 0 Nov 26 12:48 cpu.rt_period_us
-rw-r--r-- 1 root root 0 Nov 26 12:48 cpu.rt_runtime_us
-rw-r--r-- 1 root root 0 Nov 26 12:48 cpu.shares
-r--r--r-- 1 root root 0 Nov 26 12:48 cpu.stat
-rw-r--r-- 1 root root 0 Nov 26 12:48 notify_on_release
-rw-r--r-- 1 root root 0 Nov 26 12:48 release_agent
-rw-r--r-- 1 root root 0 Nov 26 12:48 tasks
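Among these tunables, cpu.shares sets the relative weight of the cgroup, while cpu.cfs_quota_us and cpu.cfs_period_us place a hard cap on CPU time. As a rough illustration (reusing the hypothetical "demo" group from the sketch above), capping a group to roughly 50% of one CPU could look like this:
# echo 100000 > /sys/fs/cgroup/cpu/demo/cpu.cfs_period_us
# echo 50000 > /sys/fs/cgroup/cpu/demo/cpu.cfs_quota_us
A quota of -1 (the default) means no hard limit; otherwise the quota/period ratio defines the fraction of CPU time the group may consume.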
Resource controllers in the Linux kernel
Understanding slice
By default, systemd automatically creates a hierarchy of slice, scope and service units to provide a unified structure for the cgroup tree. Services, scopes and slices can be created manually by the system administrator or dynamically by programs. Out of the box, the operating system defines a number of built-in services that are necessary to run the system, and there are four slices created by default:
- -.slice — the root slice;
- system.slice — the default place for all system services;
- user.slice — the default place for all user sessions;
- machine.slice — the default place for all virtual machines and Linux containers.
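To see how these slices map onto the cgroup tree on your own system, systemd-cgls prints the whole hierarchy, and systemctl status shows which cgroup a given unit lives in (crond.service below is just an example unit, not something required for this article):
# systemd-cgls
# systemctl status crond.service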
How to limit CPU using slice?
Let us take an example of using CPUShares to limit CPU resources. Now assume we assign the following CPUShares values to the slices below:
system.slice  -> 1024
user.slice    -> 256
machine.slice -> 2048
What do these values mean?
Individually they mean nothing; these values are only used as a comparison factor between all the slices. If we assume the total CPU availability is 100%, then user.slice will get ~7%, system.slice will get four times the allocation of user.slice, i.e. ~30%, and machine.slice will get twice the allocation of system.slice, which is around ~60% of the available CPU resources.
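These weights can be applied at runtime with systemctl set-property, which writes them into drop-in files so they survive a reboot (add --runtime if you only want a temporary change). A sketch using the values from this example:
# systemctl set-property system.slice CPUShares=1024
# systemctl set-property user.slice CPUShares=256
# systemctl set-property machine.slice CPUShares=2048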
Can I limit CPU of multiple services in system.slice?
This is a valid question. Assume I created three services inside system.slice with the CPUShares values defined below:
service1 -> 1024
service2 -> 256
service3 -> 512
If we sum these up the total becomes larger than the 1024 assigned to system.slice in the earlier example. Again, these values are only meant for comparison and on their own mean nothing. Here service1 will get the largest share of the available resources: if 100% of the CPU is available to system.slice, then service1 will get ~57%, service2 will get ~14% and service3 will get ~29% of the available CPU.
This is how cgroup settings relate at the top level between different slices and, within a slice, between different services.
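Per-service CPUShares can be assigned in the same way, either in the unit file (as shown later in this article) or on the fly with systemctl set-property. The names below are the hypothetical service1, service2 and service3 from the example above:
# systemctl set-property service1.service CPUShares=1024
# systemctl set-property service2.service CPUShares=256
# systemctl set-property service3.service CPUShares=512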
How to create a custom slice?
- The name of a slice unit corresponds to the path of its location in the hierarchy.
- A child slice inherits the settings of its parent slice.
- The dash ("-") character acts as a separator of the path components.
For example, if the name of a slice looks as follows:
parent-name.slice
it means that a slice called parent-name.slice is a subslice of parent.slice. This slice can have its own subslice named parent-name-name2.slice, and so on.
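As a minimal sketch (the slice name myservices.slice, the service name myapp.service and the CPUShares value below are examples of my own, not part of this article's setup), a custom slice can be defined as a unit file and a service assigned to it with the Slice= directive:
# cat /etc/systemd/system/myservices.slice
[Unit]
Description=Custom slice for my services

[Slice]
CPUShares=512

# cat /etc/systemd/system/myapp.service
[Unit]
Description=Example service running inside the custom slice

[Service]
Slice=myservices.slice
ExecStart=/usr/bin/dd if=/dev/zero of=/dev/null
After a systemctl daemon-reload, the processes of myapp.service are accounted under the myservices.slice branch of the cgroup tree.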
Test CPU resource allocation using practical examples
Now we will create two systemd unit files, stress1.service and stress2.service, to test whether we are able to limit CPU usage with CPUShares. These services will try to utilise all the CPU on my system, and I will use them to put some CPU load inside system.slice.
# cat /etc/systemd/system/stress1.service
[Unit]
Description=Put some stress
[Service]
Type=simple
ExecStart=/usr/bin/dd if=/dev/zero of=/dev/null
This is my second unit file, with the same content, to stress the CPU:
# cat /etc/systemd/system/stress2.service
[Unit]
Description=Put some stress
[Service]
Type=simple
ExecStart=/usr/bin/dd if=/dev/zero of=/dev/null
Start these services
# systemctl daemon-reload
# systemctl start stress1
# systemctl start stress2
Now validate the CPU usage using the top command:
  PID USER  PR NI   VIRT  RES  SHR S %CPU %MEM   TIME+ COMMAND
 1994 root  20  0 107992  608  516 R 49.7  0.0 0:03.11 dd
 2001 root  20  0 107992  612  516 R 49.7  0.0 0:02.21 dd
As you can see, I have two processes trying to utilise the available CPU. Since both are in the system slice, they get an equal share of the available resources, so each process gets ~50% of the CPU as expected.
Now let us add a new process in user.slice using a while loop in the background:
# while true; do true; done &
Next check the CPU usage: as expected, the available CPU is now equally divided among the 3 processes, and there is no distinction between user.slice and system.slice.
  PID USER  PR NI   VIRT  RES  SHR S %CPU %MEM   TIME+ COMMAND
 1983 root  20  0 116220 1404  152 R 32.9  0.0 1:53.28 bash
 2193 root  20  0 107992  608  516 R 32.9  0.0 0:07.59 dd
 2200 root  20  0 107992  612  516 R 32.9  0.0 0:07.13 dd
Now let us enable slicing by setting the below values in "/etc/systemd/system.conf":
DefaultCPUAccounting=yes
DefaultBlockIOAccounting=yes
DefaultMemoryAccounting=yes
Reboot the node to activate the changes
Once the system is back up, we will again start our stress1 and stress2 services along with a while loop in the bash shell:
# systemctl start stress1
# systemctl start stress2
# while true; do true; done &
Now validate the CPU usage using the top command:
  PID USER  PR NI   VIRT  RES  SHR S %CPU %MEM   TIME+ COMMAND
 2132 root  20  0 116220 1520  392 R 49.3  0.0 1:16.47 bash
 1994 root  20  0 107992  608  516 R 24.8  0.0 2:30.40 dd
 2001 root  20  0 107992  612  516 R 24.8  0.0 2:29.50 dd
As you can see, the slicing has now become effective. The user slice is able to claim 50% of the CPU, while the system slice's share is divided at ~25% for each of the stress services.
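If you want to double check that CPU accounting is really active for a unit after the reboot, systemctl show can report it (stress1.service is the service created earlier in this article); with the defaults enabled above it should print CPUAccounting=yes:
# systemctl show -p CPUAccounting stress1.service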
Let us now further reserve CPU using CPUShares in our systemd unit files.
# cat /etc/systemd/system/stress2.service
[Unit]
Description=Put some stress
[Service]
CPUShares=1024
Type=simple
ExecStart=/usr/bin/dd if=/dev/zero of=/dev/null
# cat /etc/systemd/system/stress1.service
[Unit]
Description=Put some stress
[Service]
CPUShares=512
Type=simple
ExecStart=/usr/bin/dd if=/dev/zero of=/dev/null
In the above unit files I have given higher priority to stress2.service, so it will be allowed double the resources allocated to stress1.service.
Next restart the services
# systemctl daemon-reload
# systemctl restart stress1
# systemctl restart stress2
Validate the top output:
  PID USER  PR NI   VIRT  RES  SHR S %CPU %MEM   TIME+ COMMAND
 2132 root  20  0 116220 1520  392 R 49.7  0.0 2:43.11 bash
 2414 root  20  0 107992  612  516 R 33.1  0.0 0:04.85 dd
 2421 root  20  0 107992  608  516 R 16.6  0.0 0:01.95 dd
So as expected, out of the ~50% CPU available to system.slice, stress2 gets double the CPU allocated to the stress1 service. If there are no active processes in user.slice, then system.slice will be allowed to use up to 100% of the available CPU resources.
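To confirm the values that systemd has actually applied to each unit, systemctl show can also print the CPUShares property of the stress services defined above:
# systemctl show -p CPUShares stress1.service
# systemctl show -p CPUShares stress2.service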
Monitor CPU resource usage per slice
systemd-cgtop shows the top control groups of the local Linux control group hierarchy, ordered by their CPU, memory, or disk I/O load. The display is refreshed at regular intervals (by default every 1s), similar in style to the top command. Resource usage is only accounted for control groups in the relevant hierarchy, i.e. CPU usage is only accounted for control groups in the "cpuacct" hierarchy, memory usage only for those in "memory", and disk I/O usage for those in "blkio".
Path                                               Tasks   %CPU   Memory  Input/s  Output/s
/                                                     56  100.0   309.0M        -         -
/system.slice                                          -   97.5   277.4M        -         -
/system.slice/stress3.service                          1   59.9   104.0K        -         -
/system.slice/stress1.service                          1   29.9   104.0K        -         -
/system.slice/stress2.service                          1    7.5   108.0K        -         -
/user.slice                                            -    1.7    10.7M        -         -
/user.slice/user-0.slice                               -    1.7    10.4M        -         -
/user.slice/user-0.slice/session-7.scope               3    1.7     4.6M        -         -
/system.slice/pacemaker.service                        7    0.0    41.6M        -         -
/system.slice/pcsd.service                             1    0.0    46.8M        -         -
/system.slice/fail2ban.service                         1    0.0     9.0M        -         -
/system.slice/dhcpd.service                            1    0.0     4.4M        -         -
/system.slice/tuned.service                            1    0.0    11.8M        -         -
/system.slice/NetworkManager.service                   3    0.0    11.3M        -         -
/system.slice/httpd.service                            6    0.0     4.6M        -         -
/system.slice/abrt-oops.service                        1    0.0     1.4M        -         -
/system.slice/rsyslog.service                          1    0.0     1.5M        -         -
/system.slice/rngd.service                             1    0.0   176.0K        -         -
/system.slice/ModemManager.service                     1      -     3.6M        -         -
/system.slice/NetworkManager-dispatcher.service        1      -   944.0K        -         -
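A few systemd-cgtop options come in handy here (check the man page on your distribution for the exact set): -b runs in batch mode, -n limits the number of iterations, -d changes the refresh delay and -c orders the output by CPU load. For example, to capture a single CPU-ordered snapshot:
# systemd-cgtop -b -c -n 1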
Lastly I hope this article on understanding cgroups and slices with examples to limit CPU resources on Linux was helpful. So, let me know your suggestions and feedback using the comment section.
Hi
I really enjoyed this article.
I need to learn more about cgroups.
I have 32 CPUs and I need to allow each user the use of a maximum of only 2 CPUs.
Can you give me some guidance on this please?
thx
Josh
Hi Josh,
This is an interesting question; give me some time and I will look into this for you, and if possible I may also write an article on this topic.
With RHEL 6 I know this was possible using cgconfig and creating user/group based templates, but this is deprecated with RHEL 7, where systemd lets us create cgroups based on slices and users, and with RHEL 8 we have cgroup v2, which I have not analysed properly yet.
So let me see what the best possible option is to achieve this.
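As a rough pointer in the systemd direction (an untested assumption on my side, not something covered in this article): CPUQuota= can put a hard cap on a user's slice, where 100% corresponds to one full CPU, so 200% would limit that user to the equivalent of two CPUs. user-1001.slice below is just an example UID:
# systemctl set-property user-1001.slice CPUQuota=200%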
I am using RHEL 8 to test this out, and I did not have to enable default accounting in /etc/systemd/system.conf.
I have two CPUs.
1st case: only started stress1 and stress2, and they were using 100% of both CPUs.
2nd case: along with the above, I started a while true loop in my logged-in session. This process landed in user.slice, and I observed that it takes 100% of one CPU while the other two processes take 50% each, i.e. the system divided the CPUs equally: 1 CPU for user.slice and 1 CPU for system.slice.
3rd case: I added CPUShares accordingly to stress1 and stress2. Now I see that user.slice is 100% utilized by the while true loop, and stress2 takes about 66% and stress1 about 33% of the other CPU.
Thanks Manish for sharing your observation. Were you using the same values as provided in the article?
I will also retest this using RHEL 8 and update here if I see any discrepancies or if this article needs any update