Limit CPU with cgroups & slice in Linux [100% Working]


Written by - Deepak Prasad

In this article we will discuss how to limit CPU resources in Linux using cgroups and slices, with some practical examples.

To start with, a cgroup (Control Group) provides resource management and resource accounting for groups of processes. The kernel's cgroups implementation sits mostly in paths that are non-critical in terms of performance. The cgroups subsystem implements a new Virtual File System (VFS) type named "cgroups", and all cgroup actions are performed through filesystem actions: creating directories in a cgroup filesystem, reading and writing entries in those directories, mounting cgroup filesystems, and so on.
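
Because every cgroup action maps to a filesystem action, a cgroup can be created and populated with nothing more than mkdir and echo. Below is a minimal sketch under cgroup v1; the group name mygroup is hypothetical, and the cpu,cpuacct hierarchy is normally already mounted for you by systemd.

# mkdir /sys/fs/cgroup/cpu,cpuacct/mygroup
# echo 512 > /sys/fs/cgroup/cpu,cpuacct/mygroup/cpu.shares
# echo $$ > /sys/fs/cgroup/cpu,cpuacct/mygroup/tasks

Creating the directory creates the cgroup, writing to cpu.shares sets its relative CPU weight, and writing the shell's PID into tasks moves the current shell (and any children it spawns) into the group.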


 

A few pointers on cgroups to limit resources

  • cgroups have been part of the Linux kernel since version 2.6.24 and are integrated with systemd on recent Linux versions.
  • Control groups place resources in controllers that represent the type of resource, i.e. you can define groups of available resources to make sure an application such as a web server has a guaranteed claim on resources.
  • To do so, cgroups work with default controllers, which are cpu, memory and blkio.
  • These controllers are organised into a tree structure where a different weight or limit is applied to each branch.
    • Each of these branches is a cgroup.
    • One or more processes are assigned to a cgroup.
  • cgroups can be applied from the command line or from systemd.
    • Manual creation happens through the cgconfig service and the cgred process.
    • In all cases, cgroup settings are written to /sys/fs/cgroup:
# ls -l /sys/fs/cgroup/
total 0
drwxr-xr-x 2 root root  0 Nov 26 12:48 blkio
lrwxrwxrwx 1 root root 11 Nov 26 12:48 cpu -> cpu,cpuacct
lrwxrwxrwx 1 root root 11 Nov 26 12:48 cpuacct -> cpu,cpuacct
drwxr-xr-x 2 root root  0 Nov 26 12:48 cpu,cpuacct
drwxr-xr-x 2 root root  0 Nov 26 12:48 cpuset
drwxr-xr-x 3 root root  0 Nov 26 12:50 devices
drwxr-xr-x 2 root root  0 Nov 26 12:48 freezer
drwxr-xr-x 2 root root  0 Nov 26 12:48 hugetlb
drwxr-xr-x 2 root root  0 Nov 26 12:48 memory
lrwxrwxrwx 1 root root 16 Nov 26 12:48 net_cls -> net_cls,net_prio
drwxr-xr-x 2 root root  0 Nov 26 12:48 net_cls,net_prio
lrwxrwxrwx 1 root root 16 Nov 26 12:48 net_prio -> net_cls,net_prio
drwxr-xr-x 2 root root  0 Nov 26 12:48 perf_event
drwxr-xr-x 2 root root  0 Nov 26 12:48 pids
drwxr-xr-x 4 root root  0 Nov 26 12:48 systemd

These are the different controllers created by the kernel itself. Each of these controllers has its own tunables, for example:

# ls -l /sys/fs/cgroup/cpuacct/
total 0
-rw-r--r-- 1 root root 0 Nov 26 12:48 cgroup.clone_children
--w--w---- 1 root root 0 Nov 26 12:48 cgroup.event_control
-rw-r--r-- 1 root root 0 Nov 26 12:48 cgroup.procs
-r--r--r-- 1 root root 0 Nov 26 12:48 cgroup.sane_behavior
-r--r--r-- 1 root root 0 Nov 26 12:48 cpuacct.stat
-rw-r--r-- 1 root root 0 Nov 26 12:48 cpuacct.usage
-r--r--r-- 1 root root 0 Nov 26 12:48 cpuacct.usage_percpu
-rw-r--r-- 1 root root 0 Nov 26 12:48 cpu.cfs_period_us
-rw-r--r-- 1 root root 0 Nov 26 12:48 cpu.cfs_quota_us
-rw-r--r-- 1 root root 0 Nov 26 12:48 cpu.rt_period_us
-rw-r--r-- 1 root root 0 Nov 26 12:48 cpu.rt_runtime_us
-rw-r--r-- 1 root root 0 Nov 26 12:48 cpu.shares
-r--r--r-- 1 root root 0 Nov 26 12:48 cpu.stat
-rw-r--r-- 1 root root 0 Nov 26 12:48 notify_on_release
-rw-r--r-- 1 root root 0 Nov 26 12:48 release_agent
-rw-r--r-- 1 root root 0 Nov 26 12:48 tasks
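
Each tunable behaves like an ordinary file that can be read and written. For instance, the CPU weight of the hierarchy root can be inspected as below; 1024 is the documented default for cpu.shares.

# cat /sys/fs/cgroup/cpuacct/cpu.shares
1024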

 

 

Understanding slice

By default, systemd automatically creates a hierarchy of slice, scope and service units to provide a unified structure for the cgroup tree. Services, scopes and slices can be created manually by the system administrator or dynamically by programs. The operating system also defines a number of built-in services that are necessary to run the system. In addition, four slices are created by default:

  • -.slice — the root slice;
  • system.slice — the default place for all system services;
  • user.slice — the default place for all user sessions;
  • machine.slice — the default place for all virtual machines and Linux containers.
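
To see how these slices lay out the cgroup tree on a running system, you can use the systemd-cgls command, which prints the hierarchy of slices, scopes and services:

# systemd-cgls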

 

How to limit CPU using slice?

Let us take CPUShares as an example of limiting CPU resources. Now assume we assign the following CPUShares values to the slices below

system.slice -> 1024
user.slice -> 256
machine.slice -> 2048
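
As a sketch, these values could be applied at runtime with systemctl set-property, which also persists them as drop-in files; the numbers simply mirror the example above:

# systemctl set-property system.slice CPUShares=1024
# systemctl set-property user.slice CPUShares=256
# systemctl set-property machine.slice CPUShares=2048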

 

What do these values mean?

Individually these values mean nothing; they are only used as a comparison factor between the slices. The total here is 1024 + 256 + 2048 = 3328 shares, so if the total CPU availability is 100%, user.slice will get 256/3328 ≈ 7.7%, system.slice will get four times the allocation of user.slice, i.e. ≈ 31%, and machine.slice will get twice the allocation of system.slice, i.e. ≈ 61.5% of the available CPU resource.

 

Can I limit CPU of multiple services in system.slice?

This is a valid question. Assume I created three services inside system.slice with the CPUShares values defined below

service1 -> 1024
service2 -> 256
service3 -> 512

If we sum these up, the total (1792) is larger than the 1024 that was assigned to system.slice in the earlier example. Again, these values are only meant for comparison and mean nothing in absolute terms. Here service1 will get the largest share of the available resource: if 100% of the CPU is available to system.slice, then service1 will get 1024/1792 ≈ 57%, service2 will get 256/1792 ≈ 14%, and service3 will get 512/1792 ≈ 29% of the available CPU.
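
If you want to pin such values without editing the unit files themselves, one option is a drop-in file per service. A minimal sketch for the hypothetical service1 (repeat for the others with their own values):

# mkdir -p /etc/systemd/system/service1.service.d
# cat /etc/systemd/system/service1.service.d/cpu.conf
[Service]
CPUShares=1024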

 

This is how cgroup settings relate at the top level between the different slices, and within each slice between the different services.

 

How to create a custom slice?

  • The name of a slice unit corresponds to its path in the hierarchy.
  • A child slice inherits the settings of its parent slice.
  • The dash ("-") character acts as a separator of the path components.

For example, if the name of a slice looks as follows:

parent-name.slice

It means that the slice called parent-name.slice is a subslice of parent.slice. This slice can have its own subslice named parent-name-name2.slice, and so on.
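
Putting this together, a custom slice is just a unit file of type slice, and a service is placed in it with the Slice= directive. A minimal sketch using the hypothetical names myapp.slice and myapp.service:

# cat /etc/systemd/system/myapp.slice
[Unit]
Description=Custom slice for my application

[Slice]
CPUShares=512

# cat /etc/systemd/system/myapp.service
[Unit]
Description=My application

[Service]
Slice=myapp.slice
ExecStart=/usr/bin/dd if=/dev/zero of=/dev/null

After a systemctl daemon-reload, the processes of myapp.service are accounted under myapp.slice in the cgroup tree.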

 

Test CPU resource allocation using practical examples

Now we will create two systemd unit files, namely stress1.service and stress2.service, to test whether we are able to limit CPU. Each of these services will try to utilise all the available CPU on my system.

NOTE:
For demonstration I have disabled all other CPUs and enabled only one, because with more than one CPU the load would be distributed across them and I would not be able to show the resource allocation done by CPUShares.
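
If you want to reproduce this single-CPU setup, one way (assuming your kernel supports CPU hotplug) is to take the extra CPUs offline through sysfs; cpu1 here stands for each additional CPU:

# echo 0 > /sys/devices/system/cpu/cpu1/online

Writing 1 to the same file brings the CPU back online once you are done testing.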

Using these systemd unit files I will put some CPU load on the system through system.slice

# cat /etc/systemd/system/stress1.service
[Unit]
Description=Put some stress

[Service]
Type=simple
ExecStart=/usr/bin/dd if=/dev/zero of=/dev/null

This is my second unit file, with the same content, to stress the CPU

# cat /etc/systemd/system/stress2.service
[Unit]
Description=Put some stress

[Service]
Type=simple
ExecStart=/usr/bin/dd if=/dev/zero of=/dev/null

Start these services

# systemctl daemon-reload
# systemctl start stress1
# systemctl start stress2

Now validate the CPU usage using the top command

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
 1994 root      20   0  107992    608    516 R 49.7  0.0   0:03.11 dd
 2001 root      20   0  107992    612    516 R 49.7  0.0   0:02.21 dd

As you can see, I have two processes trying to utilise the available CPU. Since both are in the system slice, they share the available resource equally, so each process gets ~50% of the CPU as expected.

 

Now let us add a new process in the user.slice by running a while loop in the background

# while true; do true; done &

Next check the CPU usage. As expected, the available CPU is now divided equally among the 3 processes; there is no distinction between user.slice and system.slice.

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
 1983 root      20   0  116220   1404    152 R 32.9  0.0   1:53.28 bash
 2193 root      20   0  107992    608    516 R 32.9  0.0   0:07.59 dd
 2200 root      20   0  107992    612    516 R 32.9  0.0   0:07.13 dd

NOTE:
Here both the user and system slice processes get an equal amount of the available resource. This is because, by default in our distribution, DefaultCPUAccounting, DefaultBlockIOAccounting and DefaultMemoryAccounting are disabled.

Now let us make the slicing effective by enabling the values below in "/etc/systemd/system.conf"

DefaultCPUAccounting=yes
DefaultBlockIOAccounting=yes
DefaultMemoryAccounting=yes

Reboot the node to activate the changes.

IMPORTANT NOTE:
Simply editing the file is not enough; the new accounting defaults only take effect after the node has been rebooted.
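
Once the system is back up, you can optionally confirm that systemd picked up the new defaults by querying the manager's properties (the grep is just a convenience filter); the three Default*Accounting properties should now report yes:

# systemctl show | grep -i accounting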

Next we will again start our stress1 and stress2 services along with a while loop from the bash shell

# systemctl start stress1
# systemctl start stress2

# while true; do true; done &

Now validate the CPU usage using the top command

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
 2132 root      20   0  116220   1520    392 R 49.3  0.0   1:16.47 bash
 1994 root      20   0  107992    608    516 R 24.8  0.0   2:30.40 dd
 2001 root      20   0  107992    612    516 R 24.8  0.0   2:29.50 dd

As you can see, our slicing has now become effective. The user slice is able to claim 50% of the CPU, while the system slice's 50% is split at ~25% for each of the two stress services.

Let us now go a step further and prioritise CPU with CPUShares in our systemd unit files.

# cat /etc/systemd/system/stress2.service
[Unit]
Description=Put some stress

[Service]
CPUShares=1024
Type=simple
ExecStart=/usr/bin/dd if=/dev/zero of=/dev/null

# cat /etc/systemd/system/stress1.service
[Unit]
Description=Put some stress

[Service]
CPUShares=512
Type=simple
ExecStart=/usr/bin/dd if=/dev/zero of=/dev/null

In the unit files above I have given priority to stress2.service, so it will be allowed double the resources allocated to stress1.service.

NOTE:
The allowed range for CPUShares is 2 to 262144. Defaults to 1024.
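
As an alternative to editing the unit files, the same shares can be applied at runtime; systemctl set-property persists the setting as a drop-in, for example:

# systemctl set-property stress2.service CPUShares=1024
# systemctl set-property stress1.service CPUShares=512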

Next restart the services

# systemctl daemon-reload
# systemctl restart stress1
# systemctl restart stress2

Validate the top output

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
 2132 root      20   0  116220   1520    392 R 49.7  0.0   2:43.11 bash
 2414 root      20   0  107992    612    516 R 33.1  0.0   0:04.85 dd
 2421 root      20   0  107992    608    516 R 16.6  0.0   0:01.95 dd

So as expected, out of the 50% CPU resources available to system.slice, stress2 gets double the CPU allocated to the stress1 service.

NOTE:
If there are no active processes running from user.slice, then system.slice will be allowed to use up to 100% of the available CPU resource.
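
Since CPUShares only sets relative weights, a slice or service can still consume everything when nothing else competes, as the note above says. If you instead need a hard ceiling, systemd also exposes CPUQuota=, which maps to the cpu.cfs_quota_us tunable we listed earlier; a sketch capping a service at one fifth of a CPU:

# systemctl set-property stress1.service CPUQuota=20%

With this in place, stress1 will not exceed 20% CPU even when the rest of the system is idle.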

 

Monitor CPU resource usage per slice

systemd-cgtop shows the top control groups of the local Linux control group hierarchy, ordered by their CPU, memory or disk I/O load. The display is refreshed at regular intervals (by default every 1s), similar in style to the top command. Resource usage is only accounted for control groups in the relevant hierarchy, i.e. CPU usage is only accounted for control groups in the "cpuacct" hierarchy, memory usage only for those in "memory", and disk I/O usage only for those in "blkio".
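
A snapshot like the one below can be captured by simply running the command while the stress services are active:

# systemd-cgtop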

Path                                                                               Tasks   %CPU   Memory  Input/s Output/s

/                                                                                     56  100.0   309.0M        -        -
/system.slice                                                                          -   97.5   277.4M        -        -
/system.slice/stress3.service                                                          1   59.9   104.0K        -        -
/system.slice/stress1.service                                                          1   29.9   104.0K        -        -
/system.slice/stress2.service                                                          1    7.5   108.0K        -        -
/user.slice                                                                            -    1.7    10.7M        -        -
/user.slice/user-0.slice                                                               -    1.7    10.4M        -        -
/user.slice/user-0.slice/session-7.scope                                               3    1.7     4.6M        -        -
/system.slice/pacemaker.service                                                        7    0.0    41.6M        -        -
/system.slice/pcsd.service                                                             1    0.0    46.8M        -        -
/system.slice/fail2ban.service                                                         1    0.0     9.0M        -        -
/system.slice/dhcpd.service                                                            1    0.0     4.4M        -        -
/system.slice/tuned.service                                                            1    0.0    11.8M        -        -
/system.slice/NetworkManager.service                                                   3    0.0    11.3M        -        -
/system.slice/httpd.service                                                            6    0.0     4.6M        -        -
/system.slice/abrt-oops.service                                                        1    0.0     1.4M        -        -
/system.slice/rsyslog.service                                                          1    0.0     1.5M        -        -
/system.slice/rngd.service                                                             1    0.0   176.0K        -        -
/system.slice/ModemManager.service                                                     1      -     3.6M        -        -
/system.slice/NetworkManager-dispatcher.service                                        1      -   944.0K        -        -

 

Lastly, I hope this article on understanding cgroups and slices, with practical examples to limit CPU resources on Linux, was helpful. Let me know your suggestions and feedback using the comment section.

 

