Mar 07, 2026 | 53 min read

Advanced Systemd for Production: Reliability, Isolation, and Monitoring of Services on Linux Servers

TL;DR

  • Systemd is not just an init system: In 2026, it is the de-facto standard for service management, offering powerful tools to ensure reliability, security, and efficient resource utilization in production environments.
  • Reliability through automation: Use Systemd's capabilities for automatic restarts, dependency management, socket and timer activation to create fault-tolerant services, minimizing manual intervention.
  • Isolation and security: Implement advanced Systemd directives (PrivateTmp, ProtectHome, RestrictAddressFamilies, CapabilityBoundingSet, DynamicUser, and others) to create secure sandboxes, significantly reducing the attack surface.
  • Monitoring with Journald and Cgroups: Centralized log collection via Journald and resource control through integration with cgroups v2 allow for effective monitoring of service status and prevention of resource starvation.
  • Resource and performance optimization: Apply CPUQuota, MemoryMax (the cgroups v2 successor to the deprecated MemoryLimit), IOWeight, and Systemd Slices for precise resource management, preventing one service from degrading another.
  • Savings and efficiency: Proper use of Systemd reduces operational costs by minimizing downtime, optimizing server capacity utilization, and simplifying deployment and management automation.
  • Scaling and orchestration: Systemd integrates well with modern orchestration tools such as Ansible, Terraform, and Kubernetes (via CRI-O and Kubelet), ensuring consistent service management on individual hosts and in clusters.

Introduction

In the world of high-load systems and continuous delivery, where every second of downtime costs money and security is not just an option but a critical requirement, service management on Linux servers comes to the forefront. By 2026, Systemd has firmly established itself as the dominant init system and service manager in most popular Linux distributions. However, for many engineers, its potential is still underestimated or only superficially utilized. Systemd is not just a tool for starting daemons; it is a comprehensive platform for ensuring reliability, isolation, monitoring, and efficient resource management that can significantly simplify the lives of DevOps engineers, backend developers, and system administrators.

This article covers the advanced capabilities of Systemd that go far beyond basic systemctl start/stop/enable commands. We will use cgroups v2 for precise resource management, examine security directives for creating isolated sandboxes, explore socket and timer activation mechanisms for optimizing resource utilization and increasing fault tolerance, and look closely at monitoring with Journald. The goal is to provide deep, practical knowledge that will enable you to build more reliable, secure, and cost-effective production systems.

The problems we will address in this article include: how to ensure your service is always running and automatically recovers from failures; how to minimize security risks by isolating applications from each other and from the system; how to effectively allocate resources so that one "resource-hungry" process doesn't "kill" the entire server; and how to get a complete picture of your services' status through centralized log and metric collection. This material will be useful for DevOps engineers striving for automation and fault tolerance; backend developers who want better control over their application environment; SaaS project founders for whom stability and security are the foundation of their business; system administrators looking for ways to optimize and unify management; and technical directors of startups who need to build scalable and reliable infrastructure with limited resources.

In 2026, when microservice architectures and cloud environments have become the norm, the ability to effectively manage processes at the host level remains fundamental. Even when using containers, Systemd plays a key role in managing basic operating system services, and can also be used to launch and manage containerized workloads outside orchestrators, or to provide an additional layer of protection and monitoring for Kubelet and other system daemons. Understanding and mastering Systemd is an investment in the stability and security of your infrastructure.

Key Criteria for Reliability, Isolation, and Monitoring

Diagram: Key Criteria for Reliability, Isolation, and Monitoring

When deploying and managing services in a production environment, it is necessary to consider a number of key criteria that determine the overall stability, security, and manageability of the system. Systemd provides powerful tools to meet these requirements.

1. Reliability and Fault Tolerance

Service reliability in production is its ability to perform its functions without interruption over a long period, and to quickly recover from failures without manual intervention. This is critically important for any business, as downtime directly leads to financial losses and reputational damage.

  • Automatic restart after failure: A service must be able to automatically restart upon abnormal termination or freezing. Systemd provides Restart= and RestartSec= directives for fine-tuning this behavior.
  • Dependency management: Services often depend on each other. Systemd allows precise definition of start and stop order, as well as behavior upon dependency failure (e.g., Requires=, After=, Wants=, PartOf=).
  • On-demand activation (Socket/Path Activation): Starting a service only when it is actually needed (e.g., upon an incoming connection or file appearance) saves resources and speeds up system boot.
  • Resistance to resource starvation: A service should not "hang" due to memory or CPU exhaustion. Systemd, integrated with cgroups, allows setting limits and priorities.
  • Graceful termination: When stopping a service, it must terminate correctly, saving data and releasing resources, rather than being forcibly killed. Systemd provides KillMode=, TimeoutStopSec= directives.

How to evaluate: Downtime, Mean Time To Recovery (MTTR), Number of service-related incidents.
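
The reliability points above can be sketched as a single drop-in file. This is a minimal illustration, not a recommended baseline: the unit name myapp.service, the dependency on postgresql.service, and all timing values are assumptions chosen for the example.

```ini
# /etc/systemd/system/myapp.service.d/reliability.conf
# (myapp.service and postgresql.service are hypothetical names)
[Unit]
# Stop retrying after 5 failed starts within 10 minutes instead of looping forever
StartLimitIntervalSec=10min
StartLimitBurst=5
# Start only together with, and after, the database this service depends on
Requires=postgresql.service
After=postgresql.service

[Service]
# Restart on non-zero exit codes, fatal signals, timeouts and watchdog expiry
Restart=on-failure
RestartSec=5s
# Treat the service as hung if it sends no WATCHDOG=1 keep-alive for 30 s.
# Assumption: the application calls sd_notify(); remove this line if it does not,
# otherwise the service will be killed and restarted every 30 s.
WatchdogSec=30s
# Allow 30 s for a clean shutdown before SIGKILL is sent
TimeoutStopSec=30s
```

The start-limit settings live in the [Unit] section, while the restart policy lives in [Service]; mixing them up is a common reason for a limit being silently ignored.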

2. Isolation and Security

Service isolation aims to minimize the attack surface and prevent the spread of potential vulnerabilities. If one service is compromised, it should not lead to the compromise of the entire system or other services. Security is the foundation of trust in any system.

  • Filesystem restriction: A service should only have access to the files and directories it absolutely needs. Directives: RootDirectory=, RootImage=, ReadWritePaths=, ReadOnlyPaths=, ProtectSystem=, ProtectHome=, PrivateTmp=, PrivateDevices=.
  • Network access restriction: A service should only be able to interact with permitted network addresses and ports. Directives: RestrictAddressFamilies=, IPAddressDeny=, IPAddressAllow=.
  • System call restriction (seccomp): Filtering system calls available to a process significantly reduces the risks of exploiting vulnerabilities. Directive: SystemCallFilter=.
  • Privilege and capability restriction: Running a service with minimal privileges and a set of Linux capabilities. Directives: User=, Group=, DynamicUser=, CapabilityBoundingSet=.
  • IPC (Inter-Process Communication) isolation: Preventing unwanted interaction between processes via shared memory or other IPC mechanisms. Directive: PrivateIPC=.

How to evaluate: Number of vulnerabilities, possibility of lateral attack movement, compliance with security standards (e.g., CIS Benchmarks).
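
As a rough sketch of the network and IPC restrictions listed above, the following drop-in limits a service to loopback traffic plus one internal subnet. The unit name and the 10.0.0.0/8 range are assumptions for illustration; adjust them to the addresses your service really needs.

```ini
# /etc/systemd/system/myapp.service.d/network.conf (hypothetical unit name)
[Service]
# Only Unix, IPv4 and IPv6 sockets may be created
RestrictAddressFamilies=AF_UNIX AF_INET AF_INET6
# Deny all IP traffic by default ...
IPAddressDeny=any
# ... then allow loopback and one internal subnet (example range)
IPAddressAllow=localhost
IPAddressAllow=10.0.0.0/8
# Give the service its own namespace for SysV IPC and POSIX message queues
PrivateIPC=yes
```

When an address matches both lists, the allow list wins, which is what makes the "deny everything, then allow a few ranges" pattern work.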

3. Monitoring and Diagnostics

Effective monitoring allows for timely problem detection, performance analysis, and resource planning. Without adequate monitoring, it is impossible to ensure reliability and respond promptly to incidents.

  • Centralized log collection: All service logs should be available in one place, easily filterable and analyzable. Journald, built into Systemd, provides powerful capabilities for this.
  • Resource metrics: Tracking CPU, memory, disk I/O, and network traffic usage at the service level. Systemd, through cgroups, provides this data.
  • Service status: Ability to quickly obtain the current state of a service (running, stopped, failed). Commands: systemctl status, journalctl -u.
  • Incident notifications: Automatic alerting of engineers when critical events occur or thresholds are exceeded. Integration with external monitoring systems.
  • Ease of debugging: Ability to easily obtain diagnostic information when problems arise.

How to evaluate: Completeness of collected metrics, speed of incident detection, Mean Time To Detect (MTTD), convenience of log and metric analysis.
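
For the incident-notification point, one option that stays inside Systemd is an OnFailure= hook. The sketch below assumes a hypothetical script /usr/local/bin/notify-failure.sh that forwards the failed unit's name to your alerting system; the script and both unit names are illustrative, not part of any package.

```ini
# /etc/systemd/system/notify-failure@.service
# Template unit: %i is the name of the unit that failed
[Unit]
Description=Send an alert about the failure of %i

[Service]
Type=oneshot
# Hypothetical script that posts to a chat webhook, pager, or mail gateway
ExecStart=/usr/local/bin/notify-failure.sh %i

# /etc/systemd/system/myapp.service.d/alert.conf
# Drop-in for the monitored unit: %n expands to the full unit name
[Unit]
OnFailure=notify-failure@%n.service
```

This complements, rather than replaces, an external monitoring system: it fires only when the unit enters the failed state, not when a metric crosses a threshold.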

4. Resource Management

Effective resource management is necessary to prevent service starvation, ensure stable performance, and optimize infrastructure costs. In 2026, when cloud resources are billed by consumption, this becomes particularly relevant.

  • CPU limiting: Setting the maximum CPU percentage a service can use. Directive: CPUQuota=.
  • Memory limiting: Setting the maximum amount of RAM available to a service. Directive: MemoryMax= (MemoryLimit= is its deprecated cgroups v1 predecessor).
  • I/O priorities: Managing disk I/O priorities for different services. Directive: IOWeight=.
  • Grouping services into Slices: Logical grouping of related services for joint resource management. Systemd Slices.
  • Process management: Limiting how many processes and threads a service may spawn and making sure child processes are tracked and cleaned up on stop. Directives: TasksMax=, KillMode=.

How to evaluate: Stability of service performance, absence of resource starvation, efficiency of hardware utilization, size of cloud infrastructure bills.

Understanding these criteria and knowing how Systemd can help achieve them is a cornerstone for building a modern, reliable, and secure production infrastructure.

Comparative Table: Systemd vs. Traditional Approaches

For clarity, let's compare Systemd's capabilities with traditional service management approaches (e.g., SysVinit/Upstart with custom scripts and simple process managers like Supervisord), focusing on production requirements for 2026. The table demonstrates why Systemd has become the de-facto standard.

| Criterion / Feature | Systemd (Advanced Usage) | SysVinit/Upstart (with Custom Scripts) | Supervisord (Simple Process Manager) | Rating (2026) |
|---|---|---|---|---|
| Reliability: Automatic Restart | Built-in Restart=, RestartSec= with flexible strategies (on-failure, always, on-success, etc.). Tracks exit codes. | Requires complex logic in shell scripts, often unreliable, poorly tracks actual state. | Built-in auto-restart, but less flexible than Systemd. Depends on the Supervisord parent process. | Systemd: 5/5 (Reliable, flexible) |
| Reliability: Dependency Management | Declarative definition of Requires=, After=, Wants=, PartOf=, BindsTo=. Parallel startup. | Manual control of startup order via symbolic links, prone to race conditions. | Basic dependencies between processes within Supervisord, but not at the OS level. | Systemd: 5/5 (Comprehensive, fault-tolerant) |
| Reliability: On-Demand Activation (Socket/Path) | Built-in support for Socket Activation (.socket), Path Activation (.path), Timer Activation (.timer). Resource saving. | Absent. Requires external daemons (xinetd) or custom solutions. | Absent. Services are always running. | Systemd: 5/5 (Unique, efficient) |
| Isolation: File System Restrictions (Sandboxing) | ProtectSystem=, ProtectHome=, PrivateTmp=, ReadOnlyPaths=, ReadWritePaths=, RootDirectory=. | Only chroot, limited capabilities, requires manual configuration. | No built-in support, depends on external tools or containerization. | Systemd: 5/5 (Powerful, declarative) |
| Isolation: Network/IPC/Capabilities Restrictions | RestrictAddressFamilies=, PrivateIPC=, CapabilityBoundingSet=, NoNewPrivileges=, SystemCallFilter= (seccomp). | Requires manual configuration of iptables, sysctl, libcap, seccomp; complex and fragmented. | No built-in support. | Systemd: 5/5 (Comprehensive, unified) |
| Monitoring: Centralized Logs | Journald: structured logs, indexing, filtering, forwarding. Automatic stdout/stderr collection. | Logs in different files, unstructured, require Logrotate and grep/awk for analysis. | Logs in separate files, basic stdout/stderr capture, no advanced filtering. | Systemd: 5/5 (Integrated, powerful) |
| Monitoring: Resource Metrics (cgroups) | Full integration with cgroups v2 for CPU, RAM, I/O, Network. Declarative limits and priorities. | Manual cgroups configuration (complex), or external tools (cgget, cgroup-tools). | No built-in support, only basic process metrics. | Systemd: 5/5 (Native, precise) |
| Resource Management: Limits and Priorities | CPUQuota=, MemoryMax=, IOWeight=, LimitNOFILE=, Nice=, Slices. | Manual configuration of ulimit, nice, cgroups (complex and fragmented). | Basic ulimit, nice. No cgroups. | Systemd: 5/5 (Detailed, efficient) |
| User/Privilege Management | User=, Group=, DynamicUser= (automatic creation of temporary users). | su -c or manual user creation. | user= (in configuration). | Systemd: 5/5 (Secure, convenient) |
| Configuration Complexity | Unified declarative format for .service files. Learning curve for advanced features. | Shell scripts, much boilerplate, lack of a unified standard. High probability of errors. | Simple INI-like format, but limited capabilities. | Systemd: 4/5 (Powerful, but requires learning) |
| OS Integration | Deep integration with the Linux kernel (cgroups, seccomp), udev, networking, logging. | Basic integration via kernel calls and utilities. | Minimal integration, operates as a user application. | Systemd: 5/5 (Seamless, comprehensive) |
| Relevance in 2026 | De-facto standard for Linux servers. Actively developed. | Outdated approach, used only in legacy systems. | Applicable for very simple cases, but not for production systems with high requirements. | Systemd: 5/5 (Key element) |

This table clearly demonstrates that Systemd has not merely replaced old init systems, but has also provided a qualitatively new level of capabilities for service management, which are critically important for modern production environments. Its deep integration with the Linux kernel and a unified declarative approach to configuration significantly surpass fragmented and often unreliable solutions based on scripts or simple process managers.

Detailed Overview of Advanced Systemd Capabilities

Diagram: Detailed Overview of Advanced Systemd Capabilities

Systemd is much more than just an initialization system. It is a comprehensive platform for service management that provides a wide range of features to enhance reliability, security, efficiency, and manageability. Let's explore the key advanced capabilities that every DevOps engineer and system administrator should know.

1. Various Unit File Types and Their Application

Systemd operates with the concept of "units," which represent various objects managed by Systemd. Each unit type has its own file suffix (.service, .socket, .timer, and so on) and specific directives.

  • .service (Services): The primary type for running daemons and applications.

    This is the most frequently used type. It describes how to start, stop, restart, and monitor a specific application or daemon. Advanced directives include Type=forking/simple/oneshot/notify/idle for defining process behavior, ExecStartPre/Post for commands executed before/after the main service, and RemainAfterExit=yes for services that perform a task and exit but are still considered "active." Understanding these types is crucial for correctly managing long-running and short-lived processes. For example, Type=notify allows a service to signal Systemd about its readiness, which improves dependency management.

  • .socket (Sockets): On-demand service activation via sockets.

    Socket Activation is a powerful feature that allows services to be started only when an incoming connection arrives on a specific socket. Systemd listens on the socket, and when a connection comes in, it launches the corresponding service and passes the already open socket to it. This saves resources, as the service does not run constantly, and increases fault tolerance: because Systemd keeps the listening socket open, connections arriving during a service restart wait in the socket's backlog instead of being refused. It also simplifies hot updates of services (zero-downtime deployments), as the old process can be stopped and the new one started while the socket stays open. Typical candidates are web application servers, API gateways, and other network daemons, provided the application can accept a listening socket inherited from Systemd. For example, OpenSSH (ssh.socket or sshd.socket, depending on the distribution) and Docker (docker.socket) ship socket units.

  • .timer (Timers): Cron replacement for task scheduling.

    Timer Units provide a more flexible and reliable alternative to traditional cron. They can be configured to run at a specific time (OnCalendar=) or after a certain interval after system boot/Systemd startup (OnBootSec=) or after the last service run (OnUnitActiveSec=). Advantages: integration with Systemd (logging to Journald, dependency management), more precise execution times, ability to run missed tasks on next system power-on (Persistent=true). This is an ideal tool for daily backups, log cleanup, and periodic data synchronization.

  • .path (Paths): Service activation upon file system changes.

    Path Activation allows a service to be launched when a specific path in the file system changes (e.g., file creation, modification, or deletion). This is useful for tasks that need to be executed in response to file events, such as processing uploaded files or monitoring configuration changes. Systemd tracks changes using inotify. For example, an image processing service can be launched every time a new image file appears in a directory.

  • .slice (Slices): Grouping services for resource management.

    Slice Units are used for hierarchical organization of cgroups and resource management for groups of services. Instead of assigning limits to each service individually, you can create a "slice" (e.g., web.slice, db.slice, batch.slice) and assign it overall CPU, memory, and I/O limits. All services belonging to this slice will use resources from the common pool, limited by the slice. This prevents resource "starvation" among critical service groups and ensures fair distribution. For example, giving web.slice CPUWeight=300 and batch.slice CPUWeight=100 means that, when both compete for CPU, web servers receive about three times as much CPU time as background tasks; a hard cap such as CPUQuota= on batch.slice additionally keeps background tasks below a fixed ceiling even if they try to use more.

  • .scope (Scopes): For external processes managed by Systemd.

    Scope Units are used to manage groups of external processes that were not launched directly by Systemd (e.g., user sessions, processes launched via SSH). Systemd places them in cgroups and can apply resource limits and monitor their status. They are automatically created and deleted by Systemd. This is useful for monitoring and managing user processes or processes launched by container runtimes.
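
Of the unit types above, path units are the least commonly seen in practice, so here is a minimal sketch of the image-processing case. The directory /srv/uploads/incoming, the unit names, and the script path are all assumptions made for the example.

```ini
# /etc/systemd/system/image-import.path (hypothetical)
[Unit]
Description=Watch the upload directory for new images

[Path]
# Trigger whenever the directory contains at least one entry
DirectoryNotEmpty=/srv/uploads/incoming
# Create the watched directory if it does not exist yet
MakeDirectory=yes
Unit=image-import.service

[Install]
WantedBy=paths.target

# /etc/systemd/system/image-import.service (hypothetical)
[Unit]
Description=Process uploaded images

[Service]
Type=oneshot
# Hypothetical script; it must move or delete the files it has processed,
# otherwise the path unit triggers the service again immediately
ExecStart=/usr/local/bin/process-images.sh /srv/uploads/incoming
```

Only the .path unit is enabled; the service is started by Systemd when the condition is met.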

2. Resource Management with Cgroups v2

Systemd is deeply integrated with the cgroups (Control Groups) subsystem of the Linux kernel, especially with cgroups v2, which provides a more unified and powerful mechanism for resource control. This allows for precise tuning of CPU, memory, disk I/O, and network usage for each service or group of services.

  • CPUQuota=: Limits CPU usage by a service. For example, CPUQuota=50% ensures that a service will not use more than 50% of one CPU core. This prevents a single "greedy" process from monopolizing the CPU.
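  • CPUWeight= (formerly CPUShares=): Defines the relative CPU weight for a service. CPUWeight= is the cgroups v2 directive (range 1 to 10000, default 100); CPUShares= is its deprecated cgroups v1 predecessor (default 1024). If one service has CPUWeight=200 and another has CPUWeight=100, then under CPU contention the first receives twice as much CPU time.
  • MemoryMax= (formerly MemoryLimit=): Sets the maximum amount of RAM a service can use, for example MemoryMax=2G. If the limit is reached and memory cannot be reclaimed, the kernel OOM killer terminates processes in the cgroup. For softer behavior, MemoryHigh= sets a threshold above which the service is throttled and its memory is reclaimed aggressively instead of processes being killed. MemoryLimit= is the deprecated cgroups v1 name.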
  • IOWeight= / IODeviceWeight= / IOReadBandwidthMax= / IOWriteBandwidthMax=: Manages disk I/O priorities and limits. This is critical for preventing system slowdowns due to intensive disk activity of a single service. For example, IOWeight=200 for priority services and IOWeight=50 for background ones.
  • TasksMax=: Limits the maximum number of processes and threads a service can create. Protects against fork bombs and PID exhaustion.

Using Slices in conjunction with these directives allows for creating a complex yet manageable hierarchy of resource allocation, ensuring the stability of critical services.
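
A compact drop-in combining these controls might look like the sketch below. It uses the cgroups v2 directive names (CPUWeight=, MemoryHigh=, MemoryMax=); the unit name and every number are placeholders to be replaced with values measured for your workload.

```ini
# /etc/systemd/system/myapp.service.d/resources.conf (hypothetical unit name)
[Service]
# Relative share under contention (default is 100)
CPUWeight=200
# Hard ceiling: at most one and a half CPU cores
CPUQuota=150%
# Soft limit: above this the service is throttled and memory is reclaimed
MemoryHigh=768M
# Hard limit: above this the OOM killer acts inside the service's cgroup
MemoryMax=1G
# Do not let the service spill into swap
MemorySwapMax=0
# Relative disk I/O priority (default is 100)
IOWeight=200
# Upper bound on processes and threads
TasksMax=256
```

Setting MemoryHigh= somewhat below MemoryMax= gives the service a chance to slow down and shed memory before it is killed.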

3. Advanced Security Directives (Sandboxing)

Systemd provides unprecedented capabilities for isolating services, turning them into lightweight containers or sandboxes, which significantly enhances system security.

  • PrivateTmp=yes: Provides the service with its own isolated /tmp and /var/tmp. This prevents access to temporary files of other services and data leakage through them.
  • ProtectSystem=yes/full/strict: Makes parts of the file system read-only for the service. yes makes /usr and the boot loader directories (/boot, /efi) read-only; full additionally covers /etc; strict makes the entire file system hierarchy read-only except for /dev, /proc, and /sys, so writable locations must be re-enabled explicitly with ReadWritePaths= or StateDirectory=.
  • ProtectHome=yes/read-only: Makes /home, /root, and /run/user inaccessible (yes) or read-only (read-only). Prevents service access to sensitive user data.
  • PrivateDevices=yes: Gives the service a private /dev that contains only pseudo-devices such as /dev/null, /dev/zero, and /dev/random, hiding the host's physical devices. This prevents unauthorized access to hardware devices.
  • NoNewPrivileges=yes: Prohibits the service and its descendants from gaining new privileges (e.g., through setuid/setgid programs). This is a powerful protection against privilege escalation.
  • DynamicUser=yes: Systemd automatically creates a temporary, unprivileged user and group for the service when it starts and removes them when it stops. This eliminates the need to manually create system users and ensures that each service has its unique UID/GID, minimizing risks.
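  • CapabilityBoundingSet=: Limits the set of Linux capabilities available to a process. Instead of running as root and then dropping privileges, capabilities can be restricted from the outset. The directive takes an allow-list, for example CapabilityBoundingSet=CAP_NET_BIND_SERVICE; a leading ~ inverts it, so CapabilityBoundingSet=~CAP_NET_ADMIN CAP_SYS_RAWIO keeps everything except the two listed capabilities.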
  • SystemCallFilter= / SystemCallArchitectures=: Uses seccomp to filter allowed system calls. Allows creating a very strict sandbox, permitting only absolutely necessary system calls. For example, a web server may not need reboot or mount.
  • RestrictAddressFamilies=AF_UNIX AF_INET AF_INET6: Restricts which network address families a service can use. For example, only AF_UNIX for local interaction.
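  • IPAddressAllow= / IPAddressDeny=: IP address filtering for the service's incoming and outgoing traffic, enforced per cgroup with eBPF.
  • RestrictSUIDSGID=yes: Denies any attempt by the service to set the set-user-ID (SUID) or set-group-ID (SGID) bits on files or directories, so a compromised service cannot plant SUID/SGID binaries. Combine it with NoNewPrivileges=yes, which stops existing SUID/SGID programs from granting extra privileges.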

By combining these directives, an extremely secure environment can be created for each service, significantly reducing the risks of compromise.
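
As an illustration of how these directives combine, here is a hardening drop-in for a hypothetical web service that writes only to /var/lib/myapp and needs to bind a port below 1024. Treat it as a starting point under those assumptions: a real service may need additional writable paths or system calls, and running systemd-analyze security myapp.service afterwards is a reasonable way to see what remains exposed.

```ini
# /etc/systemd/system/myapp.service.d/hardening.conf (hypothetical unit name)
[Service]
NoNewPrivileges=yes
PrivateTmp=yes
PrivateDevices=yes
ProtectHome=yes
# Whole file system read-only; re-open only what the service writes.
# The path is an assumption and must exist before the service starts.
ProtectSystem=strict
ReadWritePaths=/var/lib/myapp
ProtectKernelTunables=yes
ProtectKernelModules=yes
ProtectControlGroups=yes
# Keep only the capability needed to bind ports below 1024
CapabilityBoundingSet=CAP_NET_BIND_SERVICE
AmbientCapabilities=CAP_NET_BIND_SERVICE
RestrictSUIDSGID=yes
# Allow the common service syscall set, then remove privileged groups
SystemCallArchitectures=native
SystemCallFilter=@system-service
SystemCallFilter=~@privileged @resources
```

Introduce the directives a few at a time and watch the journal for permission errors; a filter that is too strict usually shows up as the service failing right after start.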

4. Centralized Logging with Journald

Journald is a Systemd-integrated daemon for collecting and managing logs. It replaces traditional syslog daemons and offers several advantages:

  • Centralized Collection: All logs from all services, the kernel, udev, and other sources are collected into a single binary store.
  • Structured Data: Logs are stored not as plain text, but as structured entries with metadata (PID, UID, UNIT, SYSLOG_IDENTIFIER, etc.), which simplifies filtering and analysis.
  • Log Persistence After Reboot: When Storage=persistent is configured in /etc/systemd/journald.conf, logs are preserved after a system reboot.
  • Powerful Filtering: The journalctl command allows filtering logs by unit, PID, time, priority, field, and other parameters. For example, journalctl -u myapp.service --since "1 hour ago" -p err.
  • Log Forwarding: Journald can forward logs to syslog, a remote server, or other log aggregation systems (e.g., rsyslog, fluentd, vector).
  • Rate-limiting: Built-in mechanisms prevent log "spam," protecting the system from overload.

Journald significantly simplifies monitoring and debugging by providing a single point of access to all system and service logs.
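
The persistence and rate-limiting settings mentioned above are configured in journald.conf or a drop-in next to it. The following is a sketch with illustrative sizes and retention periods, not tuned recommendations.

```ini
# /etc/systemd/journald.conf.d/production.conf (file name is an example)
[Journal]
# Keep logs across reboots in /var/log/journal
Storage=persistent
Compress=yes
# Cap total disk usage and age of stored logs
SystemMaxUse=2G
MaxRetentionSec=1month
# Per-service rate limit: at most 10000 messages per 30 s window
RateLimitIntervalSec=30s
RateLimitBurst=10000
# Set to yes only if a classic syslog daemon should also receive the messages
ForwardToSyslog=no
```

The settings take effect after restarting the journal daemon with systemctl restart systemd-journald.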

5. DynamicUser and Temporary Users

The DynamicUser=yes directive is an extremely useful Systemd feature. When enabled, Systemd automatically allocates a temporary, unprivileged user and group with a unique UID/GID each time the service starts. These UID/GIDs are chosen from a range reserved for dynamic users (typically 61184-65519). After the service stops, the user and group are released. This offers several advantages:

  • Improved Security: Each service runs under its unique user, preventing UID/GID conflicts and making it impossible for one service to access another's resources through shared UID/GIDs.
  • Simplified Management: There is no need to manually create and delete system users and groups or manage their UID/GIDs. Systemd does this automatically.
  • File System Isolation: In conjunction with StateDirectory=, CacheDirectory=, LogsDirectory=, Systemd can automatically create and assign permissions to these directories for the dynamic user, ensuring clean data isolation.

This makes service deployment cleaner, more secure, and more automated, especially in scenarios where services may start and stop frequently or in large numbers.
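
A minimal sketch of DynamicUser= together with the managed directories is shown below. The binary /usr/bin/myapp and its --data-dir flag are hypothetical; the directory names are relative to /var/lib, /var/cache, /var/log, and /run respectively.

```ini
# /etc/systemd/system/myapp.service (hypothetical service)
[Unit]
Description=Example service running as a dynamic user

[Service]
DynamicUser=yes
# Created and chowned to the dynamic user on every start:
# /var/lib/myapp, /var/cache/myapp, /var/log/myapp, /run/myapp
StateDirectory=myapp
CacheDirectory=myapp
LogsDirectory=myapp
RuntimeDirectory=myapp
# --data-dir is an assumed flag of the example application
ExecStart=/usr/bin/myapp --data-dir=/var/lib/myapp
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

Because the UID can change between starts, anything the service must keep should live in these managed directories rather than in paths owned by a fixed user.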

6. Unit File Overrides (Drop-in Snippets)

Systemd allows modifying unit settings without editing the original files shipped by packages in /usr/lib/systemd/system/ (/lib/systemd/system/ on some distributions). This is achieved using "drop-in" files or "override" files.

  • /etc/systemd/system/my-service.service.d/override.conf: To modify or add individual directives, a directory named <unit-name>.d is created, inside which .conf files are placed. For example, for nginx.service, this would be /etc/systemd/system/nginx.service.d/custom-memory.conf. In this file, you can specify only the directives you want to change, for example:
    
    [Service]
    MemoryMax=512M
    RestartSec=10s
                
    Systemd will merge these settings with the original file.
  • /etc/systemd/system/my-service.service: You can completely override a unit by creating a file with the same name in /etc/systemd/system/. This is less preferred as it makes tracking changes relative to the original unit more difficult.

Using drop-in files is best practice, as it allows changes to be preserved during package updates without affecting the original files, and makes changes easily traceable. The command systemctl edit my-service.service automatically creates and opens such a file.
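
One detail that often trips people up in drop-ins: directives that can be given multiple times, such as ExecStart=, are appended to rather than replaced. To change the command, the inherited value has to be cleared with an empty assignment first. The unit name and command below are assumptions for illustration.

```ini
# /etc/systemd/system/myapp.service.d/exec.conf (hypothetical unit name)
[Service]
# The empty assignment clears the ExecStart= inherited from the original unit
ExecStart=
# Now set the replacement command (path and flag are examples)
ExecStart=/usr/bin/myapp --config /etc/myapp/production.conf
```

Without the empty line, an ordinary (non-oneshot) service would end up with two ExecStart= entries and Systemd would refuse to load it. The merged result can be inspected with systemctl cat myapp.service.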

7. Unit File Templates (Templated Units)

Templated Units allow running multiple instances of the same service with different parameters from a single template file. The template name ends with @.service (e.g., myapp@.service). When such a service is started, for example, systemctl start myapp@instance1.service, the part after @ (in this case, instance1) becomes the "instance name" and is available within the unit file via the %i variable.

This is extremely useful for:

  • Multi-instance applications: Running multiple web servers or worker processes, each with its own configuration or port.
  • Unified management: A single template for all instances simplifies maintenance.
  • Dynamic creation: Instances can be created and launched programmatically.

Example: myapp@.service can use ExecStart=/usr/bin/myapp --instance=%i --port=80%i. Then myapp@01.service will listen on port 8001, myapp@02.service on 8002, and so on.
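
Written out as a file, such a template could look like the sketch below. The template name myapp@.service and the optional per-instance environment file are assumptions; the ExecStart= line is the one from the example above.

```ini
# /etc/systemd/system/myapp@.service (assumed template name)
[Unit]
Description=My application, instance %i
After=network.target

[Service]
# %i is replaced by the text between "@" and ".service"
ExecStart=/usr/bin/myapp --instance=%i --port=80%i
# Optional per-instance settings; the leading "-" makes the file optional
EnvironmentFile=-/etc/myapp/%i.env
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

With this in place, systemctl enable --now myapp@01.service myapp@02.service would start two instances, each with its own cgroup, logs, and restart state.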

Mastering these advanced Systemd capabilities significantly improves service management in production, making them more reliable, secure, efficient, and easily scalable. This is an investment in the stability and manageability of your infrastructure.

Practical Tips and Implementation Recommendations

Diagram: Practical Tips and Implementation Recommendations

Transitioning from basic to advanced Systemd usage requires a systematic approach and attention to detail. Below are specific steps, commands, and configurations that will help you use Systemd most effectively in production.

1. Unit File Structure and Best Practices

Always place your custom unit files in /etc/systemd/system/. To override system units, use drop-in files in /etc/systemd/system/unit.service.d/. Use systemctl edit --full your-service.service to create or edit a full unit, or systemctl edit your-service.service to create a drop-in file.

Example of a basic unit file for a Python application with Gunicorn:


# /etc/systemd/system/mywebapp.service
[Unit]
Description=My Web Application Gunicorn Service
After=network.target

[Service]
User=mywebapp
Group=mywebapp
WorkingDirectory=/opt/mywebapp
ExecStart=/opt/mywebapp/venv/bin/gunicorn --workers 3 --bind unix:/run/mywebapp.sock mywebapp.wsgi:application
ExecReload=/bin/kill -s HUP $MAINPID
KillMode=mixed
TimeoutStopSec=5
Restart=on-failure
RestartSec=5s
StandardOutput=journal
StandardError=journal

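# Security Directives
PrivateTmp=true
ProtectSystem=full
ProtectHome=true
NoNewPrivileges=true
CapabilityBoundingSet=
# Systemd does not support trailing comments, so notes go on their own lines.
# DynamicUser=true can replace User=/Group= above if a permanent UID/GID is not required.
# DynamicUser=true

# Resource Control
CPUQuota=75%
# MemoryMax= is the cgroups v2 replacement for the deprecated MemoryLimit=
MemoryMax=512M
TasksMax=50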

[Install]
WantedBy=multi-user.target
    

After creating/modifying the unit file:


sudo systemctl daemon-reload
sudo systemctl enable mywebapp.service
sudo systemctl start mywebapp.service
sudo systemctl status mywebapp.service
    

2. Implementing Socket Activation for Resource Saving and Fault Tolerance

Socket Activation allows services to be launched on demand and provides zero downtime during restarts. It is ideal for HTTP/HTTPS services, databases (via proxy), and any other service that listens on a socket.

Example for a Python application with Gunicorn and Nginx:

Create /etc/systemd/system/mywebapp.socket:


# /etc/systemd/system/mywebapp.socket
[Unit]
Description=My Web Application Socket

[Socket]
ListenStream=/run/mywebapp.sock
# SocketUser= is not needed if DynamicUser is used
SocketUser=mywebapp
# Grant Nginx access to the socket
SocketGroup=nginx
SocketMode=0660
# FreeBind= and ReusePort= apply only to IP sockets, so they are omitted for this Unix socket

[Install]
WantedBy=sockets.target
    

Modify mywebapp.service (remove --bind from ExecStart, as the socket will be passed):


# /etc/systemd/system/mywebapp.service
...
[Unit]
# ... other directives ...
# Requires= belongs in the [Unit] section; it ties the service to its socket unit
Requires=mywebapp.socket

[Service]
# ... other directives ...
ExecStart=/opt/mywebapp/venv/bin/gunicorn --workers 3 mywebapp.wsgi:application
# ...
...
    

In the Nginx configuration, use the same socket path:


# /etc/nginx/sites-available/mywebapp.conf
server {
    listen 80;
    server_name mywebapp.com;

    location / {
        proxy_pass http://unix:/run/mywebapp.sock;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
    

Enable and start the socket:


sudo systemctl daemon-reload
sudo systemctl enable mywebapp.socket
sudo systemctl start mywebapp.socket
sudo systemctl stop mywebapp.service # The service is not needed until there are requests
sudo systemctl status mywebapp.socket
    

Now mywebapp.service will only start on the first request to the /run/mywebapp.sock socket.

3. Replacing Cron with Timer Units

Timer Units are more reliable and integrated with Systemd. Use them for all periodic tasks.

Example: Daily backup at 02:00 AM.

Create /etc/systemd/system/mybackup.service:


# /etc/systemd/system/mybackup.service
[Unit]
Description=My Daily Backup Service
After=network.target

[Service]
Type=oneshot
ExecStart=/usr/local/bin/run_daily_backup.sh
StandardOutput=journal
StandardError=journal
# Alternatively, use DynamicUser=yes
User=backupuser

[Install]
WantedBy=multi-user.target
    

Create /etc/systemd/system/mybackup.timer:


# /etc/systemd/system/mybackup.timer
[Unit]
Description=Run My Daily Backup Service daily at 02:00

[Timer]
OnCalendar=*-*-* 02:00:00
# If the system was off at the scheduled time, run the task at the next boot
Persistent=true
Unit=mybackup.service

[Install]
WantedBy=timers.target
    

Enable and start the timer (not the service directly):


sudo systemctl daemon-reload
sudo systemctl enable mybackup.timer
sudo systemctl start mybackup.timer
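sudo systemctl status mybackup.timer
systemctl list-timers mybackup.timer # Show the last and next trigger times
sudo journalctl -u mybackup.service # The backup's own output is logged under the service unit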
    

4. Precise Resource Management with Cgroups

Always set resource limits for production services to prevent them from "bloating" and affecting other processes.

Example: Memory and CPU limits for a Node.js service.


# /etc/systemd/system/mynodeapp.service
[Unit]
Description=My Node.js Application
After=network.target

[Service]
User=nodeapp
WorkingDirectory=/opt/mynodeapp
ExecStart=/usr/bin/node /opt/mynodeapp/app.js
Restart=on-failure
RestartSec=5s
StandardOutput=journal
StandardError=journal

# Resource Control
# Up to 1 full CPU core
CPUQuota=100%
# Hard cap of 1 GiB RAM (MemoryMax= is the cgroup v2 replacement for the deprecated MemoryLimit=)
MemoryMax=1G
# Maximum 100 processes/threads
TasksMax=100
# Default I/O weight
IOWeight=100

[Install]
WantedBy=multi-user.target
    

For aggregated resource management, use Slices. For example, for a group of web services:

Create /etc/systemd/system/web.slice:


# /etc/systemd/system/web.slice
[Unit]
Description=Web Services Slice

[Slice]
# All web services combined: no more than 3 cores
CPUQuota=300%
# All web services combined: no more than 8 GiB RAM
MemoryMax=8G
# Higher I/O weight than the default of 100
IOWeight=500
    

Then, in each web service unit file, specify:


# /etc/systemd/system/mywebapp.service
[Unit]
Description=My Web Application Gunicorn Service

[Service]
# Place this service's cgroup inside web.slice (Slice= sets membership, not PartOf=)
Slice=web.slice
# ...
    

Reload Systemd and ensure the slice is running:


sudo systemctl daemon-reload
sudo systemctl start web.slice # If the slice does not activate automatically
sudo systemctl status web.slice
    

5. Security Hardening with Isolation

Always apply the strongest possible security directives for each service without compromising its functionality. This is one of Systemd's most powerful aspects.

Example: Maximum isolation for a simple worker service (no network, file system only).


# /etc/systemd/system/myworker.service
[Unit]
Description=My Isolated Worker Service
After=network.target

[Service]
Type=exec
DynamicUser=yes
WorkingDirectory=/opt/myworker
ExecStart=/opt/myworker/worker_script.sh
Restart=on-failure
RestartSec=5s
StandardOutput=journal
StandardError=journal

# File System Security
PrivateTmp=true
ProtectSystem=full
ProtectHome=true
PrivateDevices=true
# Keep these system paths read-only for the service (space-separated list)
ReadOnlyPaths=/usr/bin /usr/lib /etc/ssl
# Allow writing only to a specific directory
ReadWritePaths=/var/lib/myworker
# Automatically creates /var/lib/myworker with the correct permissions
StateDirectory=myworker

# Network Security
# Allow only Unix sockets, if needed
RestrictAddressFamilies=AF_UNIX
# Can deny all IP traffic if the service does not need the network:
# IPAddressDeny=any

# Process & Privilege Security
NoNewPrivileges=true
# Drop all capabilities if not needed
CapabilityBoundingSet=
# Example of syscall filtering (allow-list)
SystemCallFilter=@system-service @basic-io @file-system @process
# Alternatively, exclude dangerous syscall groups:
# SystemCallFilter=~@mount @reboot @swap
# Remove IPC objects on stop
RemoveIPC=true

[Install]
WantedBy=multi-user.target
    

Important: When using SystemCallFilter, it is essential to thoroughly test the service to ensure it does not use forbidden system calls. Start with more general filters (e.g., @system-service) and gradually tighten them.

6. Effective Use of Journald

Configure Journald to persist logs after reboot and leverage its powerful filtering capabilities.

Ensure that /etc/systemd/journald.conf contains:


[Journal]
Storage=persistent
# Limit the total size of logs on disk:
# SystemMaxUse=10G
# Delete logs older than 1 month:
# MaxRetentionSec=1month
    

Restart Journald:


sudo systemctl restart systemd-journald
    

Useful journalctl commands:

  • journalctl -u mywebapp.service: Show logs for a specific service.
  • journalctl -u mywebapp.service -f: Follow service logs in real-time.
  • journalctl -u mywebapp.service --since "2026-01-01 10:00:00" --until "2026-01-01 11:00:00": Logs for a specific period.
  • journalctl -u mywebapp.service -p err..crit: Show only errors and critical messages.
  • journalctl -k: Kernel logs.
  • journalctl -x: Add explanations for some messages.
  • journalctl -o json: Output logs in JSON format for parsing.
  • journalctl _COMM=nginx: Show logs by process name (the _COMM field), which helps when the process does not set its own SYSLOG_IDENTIFIER.
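
The JSON output mode is the easiest one to post-process. Below is a small sketch that filters `journalctl -o json` lines by priority; it assumes one JSON object per line with string-valued fields such as PRIORITY, MESSAGE, and _SYSTEMD_UNIT, and the function name errors_from_journal is made up for this example. Entries whose MESSAGE is binary data are not handled.

```python
import json

ERR = 3  # syslog priority number for "err" as stored in the PRIORITY field


def errors_from_journal(lines, max_priority=ERR):
    """Yield (unit, message) for entries at or above the given severity.

    `lines` is an iterable of strings, one JSON object per line.
    Lower priority numbers mean higher severity.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        # Journal fields are exported as strings; treat a missing PRIORITY as "info".
        try:
            priority = int(entry.get("PRIORITY", 6))
        except (TypeError, ValueError):
            continue
        if priority <= max_priority:
            yield entry.get("_SYSTEMD_UNIT", "?"), entry.get("MESSAGE", "")


sample = [
    '{"PRIORITY": "6", "_SYSTEMD_UNIT": "mywebapp.service", "MESSAGE": "started"}',
    '{"PRIORITY": "3", "_SYSTEMD_UNIT": "mywebapp.service", "MESSAGE": "db timeout"}',
]
print(list(errors_from_journal(sample)))
```

On a real host the same function can read from sys.stdin, with `journalctl -u mywebapp.service -o json` piped into the script.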

7. Using Template Units for Scaling

For similar services that differ by only one parameter (e.g., instance number, port, config), use template units.
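
A template's file name ends in @.service, and the text between @ and the suffix of the started unit becomes the instance string that Systemd substitutes for %i. The sketch below mimics only that one substitution in Python; the unit name worker@7.service is a hypothetical example, and real Systemd also handles escaping and many other specifiers.

```python
def instance_name(unit):
    """Return the instance part of a templated unit name, or None."""
    name, _, _suffix = unit.rpartition(".")
    _prefix, at, instance = name.partition("@")
    return instance if at and instance else None


def expand_instance(text, unit):
    """Replace %i in `text` with the instance of `unit` (only this one specifier)."""
    instance = instance_name(unit)
    if instance is None:
        raise ValueError(f"{unit!r} is not an instantiated template unit")
    return text.replace("%i", instance)


print(expand_instance("/etc/workers/config-%i.yaml", "worker@7.service"))
```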

Example: Multiple workers with different IDs.

Create /etc/systemd/system/myworker@.service:


# /etc/systemd/system/myworker@.service
[Unit]
Description=My Worker Service Instance %i
After=network.target

[Service]
User=workeruser
WorkingDirectory=/opt/workers/worker-%i
ExecStart=/opt/workers/run_worker.sh --id=%i --config=/etc/workers/config-%i.yaml
Restart=on-failure
RestartSec=5s
StandardOutput=journal
StandardError=journal

# Security & Resources (applicable to each instance)
PrivateTmp=true
MemoryMax=256M
CPUQuota=25%

[Install]
WantedBy=multi-user.target
    

Starting two instances:


sudo systemctl daemon-reload
sudo systemctl enable myworker@1.service myworker@2.service
sudo systemctl start myworker@1.service myworker@2.service
sudo systemctl status myworker@1.service
sudo journalctl -u myworker@1.service
    

These practical tips and examples will help you start using Systemd at a deeper level, creating reliable, secure, and manageable production services.

Common Mistakes When Working with Systemd in Production

Diagram: Common Mistakes When Working with Systemd in Production

Despite its power, Systemd has its pitfalls. Errors in its configuration or a misunderstanding of its operating principles can lead to instability, security issues, or debugging difficulties. Below are the most common mistakes engineers encounter and ways to avoid them.

1. Missing or Incorrect Configuration of the Directive Restart=

Mistake: Many engineers forget to specify Restart=on-failure (or another suitable option) for production services, assuming that Systemd will handle restarts automatically. Or they use Restart=always for services that should terminate after completing a task.

Consequences: The service crashes and does not restart automatically, leading to downtime. If Restart=always is used for a one-shot task, it will constantly restart, consuming resources and potentially creating an infinite error loop.

How to Avoid: For long-running daemons that should always be active, use Restart=on-failure or Restart=on-abnormal. For services that should perform a task and then terminate (e.g., cleanup scripts, backups), use Type=oneshot and do not specify Restart=, or use Restart=no. Always test the service's behavior in case of failure.
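
The Restart= policies differ in which exit causes trigger a restart. The lookup table below is a sketch of the table in the systemd.service manual page; the cause labels ("clean", "unclean-exit", and so on) are shorthand invented for this example.

```python
# Which exit causes trigger a restart for each Restart= setting.
RESTART_TABLE = {
    "no": set(),
    "always": {"clean", "unclean-exit", "unclean-signal", "timeout", "watchdog"},
    "on-success": {"clean"},
    "on-failure": {"unclean-exit", "unclean-signal", "timeout", "watchdog"},
    "on-abnormal": {"unclean-signal", "timeout", "watchdog"},
    "on-abort": {"unclean-signal"},
    "on-watchdog": {"watchdog"},
}


def will_restart(policy, cause):
    """Return True if a service with this Restart= policy restarts after `cause`."""
    return cause in RESTART_TABLE[policy]


# A non-zero exit code restarts the service under on-failure but not on-abnormal.
print(will_restart("on-failure", "unclean-exit"), will_restart("on-abnormal", "unclean-exit"))
```

The practical difference: on-abnormal leaves a service stopped after an ordinary non-zero exit, which suits programs that use exit codes to signal a deliberate shutdown.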


# Correct for a long-running service
[Service]
Type=simple
ExecStart=/usr/bin/my_daemon
Restart=on-failure
# Delay before restart
RestartSec=10s

# Correct for a one-shot task
[Service]
Type=oneshot
ExecStart=/usr/bin/my_script.sh
# Default, but can be stated explicitly
RemainAfterExit=no
# Explicitly indicate that no restart is needed
Restart=no
    

2. Running Services as Root Unnecessarily

Mistake: Out of habit or lack of knowledge, many services are run as the root user, even if they don't require elevated privileges.

Consequences: The attack surface is significantly increased. If a service is compromised, an attacker gains root access to the system, which can lead to a complete server compromise.

How to Avoid: Always run services as the least privileged user. Use the User= and Group= directives. Ideally, use DynamicUser=yes to have Systemd allocate a temporary unprivileged user for the lifetime of the service. If a service needs a specific privilege (e.g., binding to a port below 1024), grant only that capability with AmbientCapabilities=CAP_NET_BIND_SERVICE and restrict everything else with CapabilityBoundingSet= instead of running as root.


# Bad
[Service]
ExecStart=/usr/bin/my_app

# Good
[Service]
User=myuser
Group=mygroup
ExecStart=/usr/bin/my_app

# Even better
[Service]
DynamicUser=yes
ExecStart=/usr/bin/my_app
    

3. Ignoring Security Directives (Sandboxing)

Mistake: Not using powerful directives such as PrivateTmp, ProtectSystem, ProtectHome, NoNewPrivileges, and SystemCallFilter.

Consequences: The service has excessive access to the file system, network resources, and system calls. This allows a compromised service to read/modify system files, access data of other users, or exploit kernel vulnerabilities through forbidden system calls.

How to Avoid: Implement security directives based on the principle of least privilege. Start with basic ones (PrivateTmp=yes, ProtectSystem=full, ProtectHome=true, NoNewPrivileges=true) and gradually add stricter ones (PrivateDevices=yes, CapabilityBoundingSet=, SystemCallFilter=), thoroughly testing each. This requires a deep understanding of your application's needs.


# Example of a minimal set for most web services
[Service]
# ...
PrivateTmp=true
ProtectSystem=full
ProtectHome=true
NoNewPrivileges=true
    

4. Lack of Resource Limits (CPU, Memory, I/O)

Mistake: Running services without any limits on CPU, memory, or disk I/O consumption.

Consequences: A single "resource-hungry" or misbehaving service can easily exhaust all server resources, leading to slowdowns or complete freezes of other critical services. It can also trigger the OOM killer (Out-Of-Memory killer) or, in the worst case, leave the host so unresponsive that it has to be rebooted.

How to Avoid: Always set reasonable resource limits based on the service's expected consumption. Use CPUQuota=, MemoryMax= (the cgroup v2 replacement for the deprecated MemoryLimit=), IOWeight=, and TasksMax=. Regularly monitor actual resource consumption and adjust limits. Use Systemd Slices for group resource management.
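
To make the numbers behind these directives concrete, here is a rough sketch of how a memory size and a CPU quota translate into the values found in the cgroup v2 files memory.max and cpu.max. It assumes the default quota period of 100 ms and base-1024 size suffixes; the helper names parse_size and cpu_max are illustrative only.

```python
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(value):
    """Convert a size such as '1G' or '768M' to bytes (suffixes are base 1024)."""
    value = value.strip()
    if value and value[-1] in UNITS:
        return int(float(value[:-1]) * UNITS[value[-1]])
    return int(value)


def cpu_max(quota_percent, period_us=100_000):
    """Return the 'quota period' pair (microseconds) for a CPUQuota= percentage."""
    quota_us = round(period_us * quota_percent / 100)
    return f"{quota_us} {period_us}"


print(parse_size("1G"))   # bytes written to memory.max for a 1G limit
print(cpu_max(100))       # one full core: quota equals the period
print(cpu_max(300))       # three cores' worth of CPU time per period
```

A quota above 100% is therefore not an error: it simply allows more CPU time per period than a single core can provide.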


# Setting limits
[Service]
# ...
CPUQuota=70%
MemoryMax=1.5G
IOWeight=150
TasksMax=200
    

5. Directly Editing System Unit Files or Incorrect Use of Override Files

Mistake: Modifying files in /lib/systemd/system/ directly or creating override files that completely duplicate the original.

Consequences: When a package is updated, your changes will be lost or overwritten, leading to unexpected service behavior. Duplicating configuration makes tracking changes difficult and is prone to errors.

How to Avoid: Always use /etc/systemd/system/your-service.service.d/override.conf to make changes. Use systemctl edit your-service.service to automatically create and open such a file. In override files, specify only the directives you want to change or add. To replace a list-type directive from the original file, first reset it with an empty assignment and then set your own value (e.g., an empty ExecStart= line followed by the new ExecStart= command).


# Correct way to modify service settings
sudo systemctl edit mywebapp.service
    

# Contents of /etc/systemd/system/mywebapp.service.d/override.conf
[Service]
MemoryMax=768M
RestartSec=8s
    

6. Ignoring Journald and Lack of Centralized Logging

Mistake: Continuing to use outdated logging methods (e.g., writing to separate files without rotation), ignoring Journald.

Consequences: Difficulties in finding and analyzing logs, especially when problems arise. Dispersed logs complicate diagnosis and monitoring, increasing MTTR.

How to Avoid: Always direct StandardOutput and StandardError to journal. Configure Storage=persistent in journald.conf. Use journalctl for filtering and analyzing logs. Integrate Journald with your log aggregation system (Fluentd, Vector, Loki) for centralized collection and analysis at scale.


[Service]
# ...
StandardOutput=journal
StandardError=journal
    

7. Incorrect or Insufficient Dependency Management

Mistake: Incorrect use of After=, Requires=, Wants=, leading to services that fail to start, start too early or too late, or do not terminate correctly.

Consequences: Services may crash on startup if their dependencies are not yet ready. Or, conversely, the system may hang during shutdown if services do not terminate in the correct order.

How to Avoid:

  • After=: The service starts after the specified unit. Order only.
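  • Requires=: The service requires the specified unit. Starting this service also starts the dependency. If the dependency is explicitly stopped, this service is stopped too, and if the dependency fails to start while After= is also set, this service is not started.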
  • Wants=: The service wants the specified unit. If the dependency fails to start, this service will still attempt to start. A softer dependency.
  • BindsTo=: If the dependency stops, this service will also stop. A strong dependency for linking lifecycles.
  • PartOf=: Propagates stop and restart of the specified unit to this service, but not the other way around. Slice membership is set separately with Slice=.
Always carefully consider and test dependencies. Use systemctl list-dependencies my-service.service for visualization. For example, a web service typically has After=network.target db.service and Wants=db.service.
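
Since After= only expresses ordering, its effect can be pictured as a topological sort. The sketch below uses Python's graphlib (3.9+) on the example units mentioned above; it models After= alone and ignores that Systemd starts unordered units in parallel.

```python
from graphlib import TopologicalSorter

# unit -> set of units it is ordered After=
after = {
    "mywebapp.service": {"network.target", "db.service"},
    "db.service": {"network.target"},
    "network.target": set(),
}

# static_order() yields every unit after all of its predecessors.
start_order = list(TopologicalSorter(after).static_order())
print(start_order)
```

A cycle in the After= relations makes TopologicalSorter raise CycleError, which mirrors the ordering cycles Systemd has to break at boot by dropping a job.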

By avoiding these common mistakes, you will significantly enhance the stability, security, and manageability of your Systemd-based production infrastructure.

Checklist for Practical Systemd Application

Before deploying or updating a service in production, work through this checklist. It will help ensure that you are using Systemd as efficiently and securely as possible.

1. Unit File: Basics

  • File Path: The unit is located in /etc/systemd/system/ or is a drop-in file in /etc/systemd/system/unit.service.d/.

  • Description: The Description= directive contains a clear and understandable description of the service.

  • Service Type: The Type= directive is correctly configured (simple, forking, oneshot, notify, idle) depending on the application's behavior.

  • Working Directory: The WorkingDirectory= directive points to the correct working directory of the service.

  • Start Command: The ExecStart= directive specifies the full path to the executable file or script.

  • Restart: The Restart=on-failure directive (or similar) is configured for long-running services.

  • Restart Delay: The RestartSec= directive is set to prevent service "flapping".

  • Termination: The KillMode= directive (e.g., mixed or control-group) and TimeoutStopSec= are configured for proper termination.

2. Dependency Management

  • Run After: The After=network.target directive (and other necessary units, e.g., db.service) is specified.

  • Required Dependencies: The Requires= directive is used for critically important dependencies.

  • Desired Dependencies: The Wants= directive is used for optional but desired dependencies.

  • Target Unit: The WantedBy=multi-user.target directive (or another suitable target) is specified for automatic startup at boot.

  • On-demand Activation: If applicable, .socket, .path, or .timer units are used for on-demand activation.

3. Isolation and Security

  • User Privileges: The service runs as an unprivileged user/group (User=, Group=) or with DynamicUser=yes.

  • Temporary Files: The PrivateTmp=true directive is enabled for temporary file isolation.

  • System Directories: The ProtectSystem=full (or strict) directive is enabled to protect system directories.

  • Home Directories: The ProtectHome=true directive is enabled to protect user home directories.

  • Devices: The PrivateDevices=true directive is enabled for device isolation.

  • New Privileges: The NoNewPrivileges=true directive is enabled to prevent privilege escalation.

  • Capability Restriction: The CapabilityBoundingSet= directive is used to minimize available Linux capabilities.

  • System Call Filtering: The SystemCallFilter= directive (seccomp) is used to restrict allowed system calls (after thorough testing!).

  • Network Access: The RestrictAddressFamilies=, IPAddressAllow=/IPAddressDeny= directives are used to restrict network access, if necessary.

  • IPC Isolation: The PrivateIPC=true directive is enabled if the service does not require shared IPC.
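
The isolation items above lend themselves to an automated check. The following is a deliberately simple sketch that scans the text of a unit file for the directives from this checklist; the RECOMMENDED list and the function names are choices made for this example, and it does not evaluate whether the configured values are actually strict.

```python
RECOMMENDED = [
    "PrivateTmp", "ProtectSystem", "ProtectHome", "PrivateDevices",
    "NoNewPrivileges", "CapabilityBoundingSet", "SystemCallFilter",
    "RestrictAddressFamilies",
]


def service_directives(unit_text):
    """Collect directive names that appear in the [Service] section."""
    names, section = set(), None
    for raw in unit_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
        elif section == "Service" and "=" in line:
            names.add(line.split("=", 1)[0].strip())
    return names


def missing_hardening(unit_text):
    """List checklist directives that the unit does not set at all."""
    present = service_directives(unit_text)
    has_user = bool(present & {"User", "DynamicUser"})
    missing = [name for name in RECOMMENDED if name not in present]
    return ([] if has_user else ["User or DynamicUser"]) + missing
```

For a real assessment, `systemd-analyze security <unit>` inspects the loaded unit and scores its exposure; a script like this is only useful as a quick pre-commit check on unit files.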

4. Resource Management

  • CPU Limit: The CPUQuota= directive is set to limit CPU usage.

  • Memory Limit: The MemoryMax= directive (MemoryLimit= on legacy cgroup v1 hosts) is set to limit RAM usage.

  • I/O Priority: The IOWeight= directive is set to manage disk I/O priority.

  • Max Tasks: The TasksMax= directive is set to limit the number of processes/threads.

  • Slice Grouping: The service is placed in the appropriate .slice (e.g., Slice=web.slice) for group resource management.

5. Monitoring and Logging

  • Logs in Journald: The StandardOutput=journal and StandardError=journal directives are configured.

  • Log Retention: Journald is configured for Storage=persistent and, if necessary, for disk usage limits (SystemMaxUse=, MaxRetentionSec=).

  • Metric Monitoring: You know how to obtain resource metrics via systemd-cgtop, systemctl status, or external tools integrated with cgroups.

  • Alerts: Integration with a monitoring and alerting system for critical service events (e.g., failures, limit overruns).

6. Deployment and Maintenance

  • Daemon Reload: After modifying the unit file, sudo systemctl daemon-reload is executed.

  • Enable Service: The service is enabled for autostart at boot (sudo systemctl enable unit.service).

  • Status Check: The service status (sudo systemctl status unit.service) and logs (journalctl -u unit.service) are regularly checked.

  • Testing: The service's behavior has been tested under failures, restarts, and load.

  • Automation: Systemd unit configuration is part of an automated deployment process (e.g., with Ansible, Terraform).

By following this checklist, you will significantly improve the quality and reliability of your Systemd-managed production infrastructure.

Cost Calculation and Economic Efficiency of Systemd

Diagram: Cost Calculation and Economic Efficiency of Systemd

At first glance, Systemd is an integrated Linux component that has no direct cost. However, its advanced use in production environments significantly impacts the total cost of ownership (TCO) of infrastructure and the economic efficiency of projects. Savings are achieved through reduced operating expenses, increased reliability, and more efficient resource utilization.

Areas of Cost Reduction

  1. Downtime Reduction:
    • Automatic Recovery: Directives like Restart=on-failure, RestartSec=, and Socket Activation significantly reduce downtime by automatically restarting failed services. Each hour of downtime for a high-load SaaS project can cost from several hundred to tens of thousands of dollars. Reducing MTTR (Mean Time To Recovery) by 30-50% through Systemd automation leads to direct savings.
    • Example: A SaaS project loses $500 for every hour of downtime. If Systemd prevents 2 incidents per month, each of which would have lasted 1 hour without automatic recovery, the savings amount to $1,000/month.
  2. Resource Optimization:
    • Cgroups v2: Directives like CPUQuota, MemoryMax, and IOWeight allow for precise resource allocation among services. This prevents resource "starvation" and enables more efficient use of server hardware.
    • Socket Activation: On-demand service startup saves CPU and RAM for rarely used services, allowing more services to be hosted on a single machine or smaller instances to be used.
    • Example: If, thanks to resource optimization on 10 servers, it's possible to use instances one size smaller (e.g., m6g.large instead of m6g.xlarge in AWS), this can save $0.086 × 24 × 30 × 10 = $619.20 per month on CPU/RAM alone, not including I/O.
    • DynamicUser: Simplifies user management, reducing overhead for security administration.
  3. Enhanced Security:
    • Isolation (Sandboxing): Directives like PrivateTmp, ProtectSystem, ProtectHome, SystemCallFilter, NoNewPrivileges create isolated sandboxes for services. This significantly reduces the risk of horizontal attack movement in case a single service is compromised.
    • Risk Reduction: The cost of a data breach or complete server takeover can be catastrophic (fines, reputational damage, legal costs). Investments in security through Systemd pay off many times over.
    • Example: Assume a data breach costs an SMB roughly $150,000 (an illustrative figure). Reducing the probability of such a breach by 5-10% through advanced Systemd isolation represents significant indirect savings.
  4. Simplified Administration & Automation:
    • Unified Configuration: A single declarative unit file format simplifies service management compared to disparate shell scripts.
    • Centralized Logging (Journald): Accelerates problem diagnosis, reducing the time engineers spend searching and analyzing logs.
    • Tool Integration: Systemd integrates well with Ansible, Terraform, Puppet, simplifying deployment and management automation and reducing the labor costs of DevOps engineers.
    • Example: If automation and simplified debugging save 2 hours of engineering time per week (at an engineer's rate of $70/hour), this amounts to $560/month.
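
The three worked examples above are simple products, and writing them down as functions makes it easy to plug in your own figures. This is only a sketch of the arithmetic; the function and parameter names are made up, and a month is taken as 30 days and 4 weeks, as in the examples.

```python
def downtime_savings(cost_per_hour, incidents_avoided, hours_per_incident):
    """Money not lost because incidents were prevented or shortened."""
    return cost_per_hour * incidents_avoided * hours_per_incident


def instance_savings(hourly_price_delta, servers, hours_per_month=24 * 30):
    """Savings from running every server on a cheaper instance size."""
    return hourly_price_delta * hours_per_month * servers


def engineering_savings(hours_per_week, hourly_rate, weeks_per_month=4):
    """Value of engineering time freed up each month."""
    return hours_per_week * hourly_rate * weeks_per_month


print(round(downtime_savings(500, 2, 1), 2))     # downtime example
print(round(instance_savings(0.086, 10), 2))     # smaller-instance example
print(round(engineering_savings(2, 70), 2))      # engineer-time example
```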

Hidden Costs and How to Optimize Them

  • Learning Curve: Learning advanced Systemd features requires time and effort from engineers.
    • Optimization: Invest in team training, create internal documentation, and unit file templates. Initial costs will be recouped through increased efficiency.
  • Debugging Complexity: Sometimes deep isolation can complicate debugging, as services have limited access to the system.
    • Optimization: Use journalctl -x to get context. Temporarily relax security directives for debugging (in override.conf) and then revert to strict settings.
  • Potential Overlap: Some Systemd features may duplicate the capabilities of container orchestrators (e.g., Kubernetes).
    • Optimization: Understand where Systemd complements and where it overlaps with other tools. Systemd remains critical for managing basic host services and for running container runtimes (CRI-O, containerd) and Kubelet.

Table with Examples of Economic Efficiency Calculations (Hypothetical SaaS Project)

Let's assume we have a SaaS project running on 10 cloud servers, with a monthly revenue of $50,000.

Metric / Area | "Before" Systemd (Traditional Approaches) | "After" Systemd (Advanced Use) | Savings / Benefit per Month | Comment
Mean Time To Recovery (MTTR) | 30 min/incident | 10 min/incident | $46.30 | 2 incidents/month: 20 min saved × 2 incidents × ($50,000 / 720 hours ≈ $69.44/hour) ≈ $46.30.
Number of Incidents due to Failures | 4 incidents/month | 2 incidents/month | $69.44 | 2 fewer incidents/month × 30 min/incident × $69.44/hour ≈ $69.44.
Instance Costs (Cloud Compute) | 10 × m6g.xlarge ($0.172/hour) = $1,238.40/month | 10 × m6g.large ($0.086/hour) = $619.20/month | $619.20 | Resource optimization allowed for the use of smaller instances.
Engineer Time for Debugging/Administration | 40 hours/month | 20 hours/month | $1,400 | Simplified logging and automation (rate: $70/hour).
Security Risks (Potential Losses) | High | Significantly Lower | Invaluable / >$10,000 | Reduced probability of data breach or compromise. Difficult to quantify, but critically important.
Total Estimated Savings/Benefit | | | ~$2,134.94 + (risk reduction) | Per month, excluding training.

This table demonstrates that even without a direct cost, Systemd is a powerful tool for increasing economic efficiency and reducing TCO. Investments in mastering and correctly applying Systemd pay off through increased reliability, security, resource utilization efficiency, and reduced operating expenses.

Cases and Examples of Systemd Usage in Production

Diagram: Cases and Examples of Systemd Usage in Production

To illustrate the practical value of advanced Systemd, let's consider several realistic scenarios from a production environment.

Case 1: High-Load API Gateway with Socket Activation and Resource Limits

Problem:

The startup "API-Hub" provides critical APIs for numerous clients. Their API Gateway (written in Go) was launched in a traditional way, constantly consuming resources. During peak loads, delays sometimes occurred due to CPU contention, and service updates resulted in brief service interruptions.

Solution with Systemd:

The DevOps team decided to optimize the API Gateway's operation using Socket Activation and fine-tuning Systemd cgroups.

  1. Socket Activation:
    • An api-gateway.socket unit was created, listening on port 8080.
    • api-gateway.service was configured to launch on demand from api-gateway.socket.
    • This allowed reducing baseline RAM and CPU consumption during off-peak hours, as the service was not running constantly. Systemd buffered incoming connections while the service restarted during updates.
  2. Resource Limits:
    • Strict resource limits were set in api-gateway.service: CPUQuota=200% (no more than two cores), MemoryMax=2G.
    • For critical APIs, a critical-api.slice was created with higher IOWeight= and CPUWeight= to guarantee them priority.
  3. Security:
    • The service was launched with DynamicUser=yes, PrivateTmp=true, ProtectSystem=full, ProtectHome=true.
    • SystemCallFilter= was configured to allow only necessary system calls, significantly reducing the attack surface.

Results:

  • Zero downtime during updates: Clients no longer noticed brief interruptions during the deployment of new versions.
  • Reduced infrastructure costs: During off-peak hours, servers could host more other services, as the API Gateway was not consuming resources. The number of API Gateway instances was reduced by 15% due to more efficient resource utilization.
  • Increased stability: Random load spikes on the API Gateway no longer led to resource starvation for other critical services on the same host.
  • Improved security: The risk of compromise was reduced thanks to deep isolation.

Case 2: Background Data Processing with Timer Units and Resource Slices

Problem:

The company "Data-Flow" processes large volumes of data. They have several types of background tasks: daily reports, weekly old data cleanup, and hourly synchronization. These tasks were launched via cron. Problems often arose: cron jobs sometimes "hung," there was no centralized logging, and heavy tasks could consume too many resources, affecting the performance of the production database.

Solution with Systemd:

Engineers migrated all background tasks to Systemd Timer Units, grouping them into Resource Slices.

  1. Timer Units:
    • Corresponding .timer units were created for each task (daily-report.service, weekly-cleanup.service, hourly-sync.service).
    • OnCalendar= was used for precise scheduling, and Persistent=true ensured that missed tasks were launched after a reboot.
    • All task logs were now directed to Journald (StandardOutput=journal), which simplified monitoring.
  2. Resource Slices:
    • Two Systemd Slices were created: batch.slice for all background tasks and priority-batch.slice for critical synchronization tasks.
    • batch.slice received CPUQuota=300% and MemoryMax=16G (a common limit for all background tasks).
    • priority-batch.slice, which included hourly-sync.service, received higher IOWeight= and CPUWeight=.
    • Each background task (.service unit) was placed in the corresponding slice using Slice=batch.slice or Slice=priority-batch.slice.
  3. Isolation:
    • DynamicUser=yes and PrivateTmp=true were used for each task.
    • File system access was restricted using ReadOnlyPaths= and ReadWritePaths=, allowing access only to necessary data.

Results:

  • Reliable task execution: Tasks no longer "hung" unnoticed, and their execution was stable. Missed tasks were launched automatically.
  • Elimination of resource conflicts: Background tasks no longer affected the performance of the production database, as their resource consumption was strictly limited and prioritized.
  • Centralized monitoring: All task logs became accessible via journalctl, which significantly simplified debugging and auditing.
  • Simplified management: A unified approach to task scheduling instead of scattered entries in crontab.

Case 3: Managing Core Services for Kubernetes Hosts

Problem:

Cloud provider "KubeCloud" deploys Kubernetes clusters for its clients. On each cluster node (worker node), numerous system daemons run (kubelet, containerd, cni-plugins, node-exporter, etc.). It was important to ensure their stability, security, and predictable resource consumption to avoid impacting client workloads.

Solution with Systemd:

Systemd was used to manage all system services on Kubernetes nodes.

  1. Strict resource limits for system components:
    • CPUQuota= and MemoryMax= were set for kubelet.service, containerd.service, node-exporter.service, and other critical system daemons. This kept the resource consumption of system components predictable, even under high load on the node.
    • A system-core.slice was created, including all core Kubernetes components, to manage their overall resource consumption.
  2. Enhanced isolation:
    • Strict security directives were applied for each system daemon: PrivateTmp=true, ProtectSystem=full, ProtectHome=true, NoNewPrivileges=true.
    • For containerd.service and kubelet.service, SystemCallFilter= was configured to allow only the system calls necessary for their operation, significantly reducing the host-level attack surface.
  3. Auditing and Monitoring:
    • All system logs were directed to Journald and then aggregated into a centralized monitoring system.
    • .timer units were used for periodic cluster health checks and node maintenance tasks (e.g., disk cleanup of old images).

Results:

  • Cluster stability: System daemons on nodes operated predictably, without resource starvation, even under high load from client containers.
  • Enhanced node security: Isolation of system components reduced the risk of node compromise through vulnerabilities in core services.
  • Simplified diagnostics: Centralized Systemd logs and metrics significantly accelerated the diagnosis of node-level issues.
  • Unified management: All system services were managed uniformly via Systemd, which simplified the automation of node deployment using Ansible and Terraform.

These cases demonstrate that Systemd is not just a tool for launching services, but a full-fledged platform capable of solving complex tasks to ensure reliability, security, and efficiency in the most demanding production environments.

Tools and Resources for Working with Systemd

Diagram: Tools and Resources for Working with Systemd

Effective work with Systemd in production requires not only an understanding of its functions but also knowledge of tools for its configuration, monitoring, and debugging, as well as access to up-to-date documentation.

1. Essential Systemd CLI Utilities

  • systemctl: The main utility for managing Systemd.
    • systemctl status <unit>: Show the current status of a unit.
    • systemctl start/stop/restart/reload <unit>: Manage the state of a unit.
    • systemctl enable/disable <unit>: Enable/disable autostart for a unit.
    • systemctl is-active/is-enabled/is-failed <unit>: Check the status of a unit.
    • systemctl list-units: List all active/loaded units.
    • systemctl list-unit-files: List all unit files and their statuses.
    • systemctl list-dependencies <unit>: Show the dependency tree of a unit.
    • systemctl show <unit>: Show all properties of a unit, including current directive values.
    • systemctl edit <unit> / systemctl edit --full <unit>: Convenient editing of override files or full units.
    • systemctl daemon-reload: Reload Systemd configuration after changing unit files.
    • systemctl set-property <unit> <property>=<value>: Dynamically change unit properties during runtime (e.g., resource limits).
  • journalctl: Utility for working with Journald logs.
    • journalctl -u <unit>: Show logs for a specific unit.
    • journalctl -f: Follow logs in real-time.
    • journalctl --since "1 hour ago": Logs from the last hour.
    • journalctl -p err: Show only errors.
    • journalctl -o json: Output logs in JSON format.
    • journalctl -k: Show kernel logs.
    • journalctl _PID=<pid>: Show logs by PID.
  • systemd-cgtop: Utility for interactive monitoring of cgroups resource usage.
    • Shows real-time task counts, CPU, memory, and I/O consumption per control group (including Systemd units and slices).
    • Very useful for quickly identifying "resource-hungry" services or groups.
  • loginctl: Utility for managing user sessions.
    • loginctl list-sessions: List active sessions.
    • loginctl show-session <id>: Session details.
  • hostnamectl: Utility for managing the hostname.
  • timedatectl: Utility for managing system time and timezone.
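These commands compose well in scripts. As a minimal sketch, the helper below prints the state of several units in one pass. The function name unit_report and the unit names in the usage comment are hypothetical; the only real command it wraps is systemctl is-active.

```shell
# unit_report: print "<unit> <state>" for each unit given as an argument.
# The helper name is hypothetical; it only wraps `systemctl is-active`.
unit_report() (
    for unit in "$@"; do
        # is-active prints the state (active, inactive, failed, ...) and
        # exits non-zero for anything but "active", so ignore the exit code.
        state=$(systemctl is-active "$unit" 2>/dev/null) || true
        printf '%s %s\n' "$unit" "${state:-unknown}"
    done
)

# Usage (unit names are examples):
#   unit_report sshd.service myapp.service
```

The body runs in a subshell (parentheses instead of braces), so the loop variables do not leak into the calling shell.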

2. Monitoring and Testing

  • Prometheus + Node Exporter:
    • Node Exporter collects host-level metrics; its optional systemd collector (--collector.systemd) additionally exposes unit states.
    • Prometheus can aggregate these metrics, and Grafana can visualize them. Tracking CPU, RAM, and I/O consumption of each Systemd unit requires an exporter that reads per-cgroup data, such as cAdvisor or systemd_exporter.
    • This provides a deep understanding of service behavior at the OS level.
  • Loki / ELK Stack:
    • For centralized collection and analysis of logs from Journald.
    • Loki (from Grafana Labs) is well-suited for structured Journald logs, allowing easy filtering and analysis.
    • Filebeat with the Journald module or Vector can forward logs from Journald to the ELK Stack (Elasticsearch, Logstash, Kibana) or other systems.
  • stress-ng: Utility for creating artificial load on the system (CPU, RAM, I/O).
    • Useful for testing how your Systemd resource limits perform under pressure.
    • stress-ng --cpu 4 --vm 2 --vm-bytes 1G --timeout 60s: Load on 4 CPU cores and 2GB RAM for 60 seconds.
  • strace, ltrace: For debugging issues with SystemCallFilter.
    • strace -f -o /tmp/syscalls.log /usr/bin/my_app: Records all system calls made by the application and its descendants. This helps determine which system calls the application needs and, consequently, which ones to allow in SystemCallFilter=.
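Reading a long strace log by hand is tedious. As a rough sketch, the helper below reduces a log written by strace -f -o to the unique syscall names it contains. The function name syscalls_from_strace is hypothetical, and the pattern assumes the default strace output format, with an optional PID prefix on each line.

```shell
# syscalls_from_strace: print the unique syscall names found in a log written
# by `strace -f -o <file>`. Lines look like "1234 openat(AT_FDCWD, ...) = 3";
# signal lines ("--- SIGCHLD ... ---"), exit lines ("+++ exited ... +++") and
# "<... read resumed>" continuations do not match and are skipped.
syscalls_from_strace() (
    sed -nE 's/^([0-9]+[[:space:]]+)?([a-z_][a-z0-9_]*)\(.*/\2/p' "$1" | sort -u
)

# Usage: join the names into a space-separated list for SystemCallFilter=
#   syscalls_from_strace /tmp/syscalls.log | tr '\n' ' '
```

A single traced run only covers the code paths that were exercised, so treat the result as a starting point. Systemd also ships predefined groups such as @system-service, which are often easier to maintain than a hand-built list.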

3. Automation Tools

  • Ansible:
    • The ansible.builtin.systemd module allows managing Systemd units (start, stop, enable, disable, daemon-reload) in a declarative style.
    • Use templates (Jinja2) to generate unit files based on variables.
    • Example:
      
      - name: Ensure mywebapp service is running and enabled
        ansible.builtin.systemd:
          name: mywebapp.service
          state: started
          enabled: true
          daemon_reload: true
                          
  • Terraform:
    • While Terraform itself does not directly manage Systemd, it can be used to deploy cloud instances and execute initialization scripts (e.g., via cloud-init or remote-exec) that configure Systemd units.
    • Unit files can be generated and copied to servers.
  • SaltStack / Puppet / Chef:
    • Similar to Ansible, these configuration management tools have modules for working with Systemd, allowing declarative description of service states.

4. Useful Links and Documentation

  • Official Systemd Documentation (man pages): The most authoritative source of information.
    • man systemd.unit: General information about unit files.
    • man systemd.service: Directives for services.
    • man systemd.socket, man systemd.timer, man systemd.path, man systemd.slice: Specific directives for other unit types.
    • man systemd.exec: General directives concerning process execution (including security and resources).
    • man systemd.resource-control: Directives for cgroups.
    • man journalctl: Using Journald.
  • ArchWiki Systemd: An excellent resource with clear explanations and examples, often more accessible than man pages.
  • Systemd by Example: A project with many practical examples of unit files for various scenarios.
  • Systemd for Developers (YouTube): A series of video tutorials that may be useful.
  • Systemd Repository on GitHub: For those who want to delve into the source code and project development.

By using this set of tools and resources, you will be able to effectively manage Systemd in your production environment, ensuring high reliability, security, and service manageability.

Troubleshooting: Systemd Problem Solving

Diagram: Troubleshooting: Systemd Problem Solving

When working with Systemd in production, situations where services do not behave as expected are inevitable. Effective diagnosis and troubleshooting require a systematic approach and knowledge of basic commands and methods. Below are typical problems and ways to solve them.

1. Service fails to start or immediately crashes (failed status)

Symptoms:

  • systemctl status myapp.service shows Active: failed.
  • The service attempts to start multiple times but crashes each time.

Diagnosis and Solution:

  1. Check logs: This is the first and most important step.
    
    journalctl -u myapp.service --since "1 hour ago" -b -x
                
    • -b: Limit output to the current boot.
    • -x: Augment log lines with explanatory text from the message catalog where available, which often provides important clues.
    Look for error messages, stack traces, file/network access errors.
  2. Check the ExecStart command:
    • Ensure that the path to the executable file or script is correct.
    • Ensure that all command arguments are correct.
    • Try running the ExecStart command manually from the command line under the same user (sudo -u <user> <command>) and in the same WorkingDirectory to rule out environment issues.
  3. Check permissions:
    • Ensure that the user under which the service runs (User= or DynamicUser) has the necessary read/write permissions in the WorkingDirectory and other directories the service interacts with.
    • Check permissions on the executable file itself.
  4. Reload the daemon: If you recently changed the unit file, make sure you ran sudo systemctl daemon-reload.
  5. Check dependencies: Ensure that all necessary dependencies (Requires=, After=) are running and active.
    
    systemctl list-dependencies myapp.service
    systemctl status network.target # Example
                
  6. Check environment variables: Ensure that all necessary environment variables (Environment=, EnvironmentFile=) are set correctly.
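When Systemd itself fails to set up the process (before the application runs), it reports a status in the 200+ range, such as status=203/EXEC in systemctl status. The sketch below maps a few of these codes to the checklist step they usually point at. The codes come from the "Process exit codes" table in systemd.exec(5); the function name explain_exec_status and the hint texts are this example's own.

```shell
# explain_exec_status: map a systemd-generated exit status (the "status=NNN/NAME"
# shown by `systemctl status`) to the checklist step it usually points at.
# Codes follow systemd.exec(5); the hint wording is this sketch's own.
explain_exec_status() (
    case "$1" in
        200) echo "CHDIR: WorkingDirectory= missing or inaccessible" ;;
        203) echo "EXEC: ExecStart= binary missing or not executable" ;;
        216) echo "GROUP: Group= could not be resolved or applied" ;;
        217) echo "USER: User= could not be resolved or applied" ;;
        226) echo "NAMESPACE: mount namespace setup failed, check sandboxing paths" ;;
        228) echo "SECCOMP: SystemCallFilter= could not be applied" ;;
        *)   echo "status $1 is not covered here; see systemd.exec(5) and the service logs" ;;
    esac
)

# Usage: feed it the main process status of a failed unit
#   explain_exec_status "$(systemctl show -p ExecMainStatus --value myapp.service)"
```

Only a handful of codes are listed; anything below 200 normally comes from the application itself.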

2. Service runs slowly or "hangs" due to resource starvation

Symptoms:

  • The application responds with a delay or does not respond at all.
  • The server as a whole runs slowly.
  • There are no explicit errors in the logs, but there are messages about timeouts or slowdowns.

Diagnosis and Solution:

  1. Resource monitoring:
    
    systemd-cgtop
    htop # Or top
                
    Use systemd-cgtop to view CPU, memory, I/O consumption by Systemd units/slices. This will help quickly identify a "resource-hungry" service.
  2. Check resource limits:
    
    systemctl show myapp.service | grep -E "CPUQuota|MemoryMax|MemoryHigh|IOWeight|TasksMax"

    Ensure that CPUQuota=, MemoryMax= (the cgroups v2 replacement for the deprecated MemoryLimit=), IOWeight=, and TasksMax= are not too low for the service to function normally. If the service reaches the memory limit, it may be killed by the OOM killer (check journalctl -k -p err for OOM messages, or look for Result=oom-kill in the systemctl show output).
  3. Configure priorities: If there are competing services, use CPUWeight= (CPUShares= on legacy cgroups v1) and IOWeight=, as well as Systemd Slices for more fine-grained priority management.
  4. Log analysis: Look for signs of slowdowns, long database queries, or network issues in the service logs (journalctl -u myapp.service).
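To confirm whether a memory limit has already triggered kills, the kernel's per-cgroup counters can be read directly. This is a sketch under two assumptions: cgroups v2 is mounted at /sys/fs/cgroup, and the unit is currently running so that its cgroup exists. The function names are hypothetical.

```shell
# oom_kills: print the oom_kill counter from a cgroup v2 memory.events file.
# The file holds "key value" lines such as "oom_kill 2".
oom_kills() (
    awk '$1 == "oom_kill" { print $2; found = 1 } END { if (!found) print 0 }' "$1"
)

# unit_oom_kills: locate the unit's cgroup and read its counter.
# Assumes cgroups v2 mounted at /sys/fs/cgroup and a running unit.
unit_oom_kills() (
    cg=$(systemctl show -p ControlGroup --value "$1")
    oom_kills "/sys/fs/cgroup${cg}/memory.events"
)

# Usage (unit name is an example):
#   unit_oom_kills myapp.service
```

A non-zero value means the kernel killed at least one process in that cgroup since it was created, which points at the memory limit rather than at the application logic.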

3. Service does not work after applying security directives

Symptoms:

  • The service crashes with file access, network, or system call errors.
  • Errors like "Permission denied", "Operation not permitted".

Diagnosis and Solution:

This is one of the most complex problems, as isolation can be very strict.

  1. Disable directives one by one:
    • Create an override file (sudo systemctl edit myapp.service).
    • Temporarily comment out or set security directives to no/empty value (e.g., PrivateTmp=no, ProtectSystem=no, SystemCallFilter=).
    • Reload the Systemd daemon (systemctl daemon-reload) and restart the service (systemctl restart myapp.service).
    • Gradually re-enable directives until you find the one causing the problem.
  2. Log analysis and strace:
    • Carefully examine logs (journalctl -u myapp.service) for access errors.
    • If the problem is with SystemCallFilter=, run the application with strace to see which system calls it makes.
      
      strace -f -o /tmp/myapp_syscalls.log /usr/bin/my_app
      # Then analyze /tmp/myapp_syscalls.log
                          
      This will help add the necessary system calls to SystemCallFilter=.
  3. Check paths: If ReadOnlyPaths=, ReadWritePaths=, StateDirectory= are used, ensure that all necessary read/write paths are explicitly allowed.
  4. Check network access: If the service needs to communicate over the network, ensure that RestrictAddressFamilies= and IPAddressDeny=/IPAddressAllow= are not blocking the required traffic.
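The "disable directives one by one" step can be done without touching the unit's permanent configuration. The sketch below writes a temporary drop-in under /run/systemd/system, which is cleared on reboot. The function name write_debug_dropin, the file name 90-debug.conf, and the selection of directives are assumptions of this example; adjust them to the unit being debugged.

```shell
# write_debug_dropin: create a temporary drop-in that relaxes a few sandboxing
# directives for one unit. File name and directive selection are examples.
#   $1 = unit name
#   $2 = base directory (default: /run/systemd/system, cleared on reboot)
write_debug_dropin() (
    unit=$1
    base=${2:-/run/systemd/system}
    mkdir -p "$base/$unit.d"
    cat > "$base/$unit.d/90-debug.conf" <<'EOF'
[Service]
PrivateTmp=no
ProtectSystem=no
ProtectHome=no
# An empty assignment resets the syscall filter list
SystemCallFilter=
EOF
    # Print the path so the caller can edit or remove the file later
    echo "$base/$unit.d/90-debug.conf"
)

# Usage (as root), then delete lines from the drop-in one at a time:
#   write_debug_dropin myapp.service
#   systemctl daemon-reload && systemctl restart myapp.service
```

Removing one line at a time from the drop-in re-enables the corresponding directive from the main unit file, which narrows the failure down to a single setting.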

4. Socket Activation is not working

Symptoms:

  • The service does not start when an incoming connection arrives.
  • The Systemd socket is listening, but the connection is not passed to the service.

Diagnosis and Solution:

  1. Check socket and service status:
    
    systemctl status myapp.socket
    systemctl status myapp.service
                
    Ensure that myapp.socket is active (Active: active (listening)). myapp.service should be inactive (dead) until the first connection.
  2. Check socket logs:
    
    journalctl -u myapp.socket
                
    Look for errors during socket creation or connection transfer.
  3. Check the link between the socket and the service:
    • Ensure that myapp.socket specifies the Service=myapp.service directive if the service name differs from the socket name (by default, a socket activates the service with the same name).
    • Ensure that the program started by ExecStart in myapp.service does not try to create the socket itself. It should use the socket passed by Systemd (passed file descriptors start at 3).
    • If the service is written in Python, Node.js, or Go, ensure that it correctly picks up the passed socket (e.g., by checking the LISTEN_PID and LISTEN_FDS environment variables, or by using sd_listen_fds() or an equivalent library).
  4. Check socket permissions: If it's a Unix socket, ensure that SocketUser=, SocketGroup=, SocketMode= are configured so that the client (e.g., Nginx) can connect to it.
  5. Firewall: Ensure that the firewall (ufw, firewalld, iptables) is not blocking incoming connections to the port the socket is listening on.
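The hand-off protocol itself is small enough to check from a wrapper script. The sketch below follows the convention described in sd_listen_fds(3): LISTEN_PID must match the process's own PID, and LISTEN_FDS gives the number of descriptors, numbered from 3 upward. The function name listen_fd_count and the wrapper in the usage comment are hypothetical.

```shell
# listen_fd_count: print how many sockets systemd passed to this process.
# Per sd_listen_fds(3), LISTEN_PID must equal our own PID and the descriptors
# are numbered consecutively starting at 3.
listen_fd_count() {
    if [ "${LISTEN_PID:-}" = "$$" ] && [ "${LISTEN_FDS:-0}" -gt 0 ] 2>/dev/null; then
        echo "$LISTEN_FDS"
    else
        echo 0
    fi
}

# Usage in a wrapper script used as ExecStart= (the path is an example):
#   if [ "$(listen_fd_count)" -eq 0 ]; then
#       echo "no socket passed; is myapp.socket active?" >&2
#       exit 1
#   fi
#   exec /usr/bin/my_app
```

Because exec keeps the PID, the application started by the wrapper still sees a matching LISTEN_PID. For testing outside a unit, systemd-socket-activate can start a program with a listening socket passed in the same way.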

5. Timer Unit does not start the service

Symptoms:

  • The service that should start by timer does not start.
  • systemctl status mytimer.timer shows that the timer is active, but the Trigger: time (or the NEXT and LAST columns in systemctl list-timers) does not match expectations.

Diagnosis and Solution:

  1. Check timer status:
    
    systemctl status mytimer.timer
                
    Ensure that the timer shows Active: active (waiting) and check the Trigger: line for the next scheduled run.
  2. Check timer logs:
    
    journalctl -u mytimer.timer
                
    Look for configuration errors or service activation errors.
  3. Check OnCalendar=:
    • Ensure that the OnCalendar= syntax is correct. Use man systemd.time for reference.
    • Ensure that the server's timezone matches your expectations (timedatectl).
  4. Check the link between the timer and the service:
    • Ensure that mytimer.timer specifies the Unit=myapp.service directive.
    • Ensure that myapp.service exists and can be started manually (systemctl start myapp.service).
  5. Persistent=true: If the task should run even if the system was off during the scheduled run, ensure that Persistent=true is set in the [Timer] section. It only affects timers that use OnCalendar=.
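An OnCalendar= expression can be validated before it goes into a unit file. As a sketch, the helper below wraps systemd-analyze calendar and extracts the "Next elapse:" line from its output. The function name next_run and the expression in the usage comment are examples.

```shell
# next_run: print when a calendar expression will next elapse, or fail if
# systemd-analyze rejects the expression.
next_run() (
    out=$(systemd-analyze calendar "$1" 2>&1) || {
        echo "invalid calendar expression: $1" >&2
        exit 1
    }
    # Valid expressions produce a "Next elapse: <timestamp>" line
    printf '%s\n' "$out" | sed -n 's/^ *Next elapse: *//p'
)

# Usage:
#   next_run "Mon..Fri 02:00"
```

The timestamp is printed in the server's local timezone, which makes timezone mismatches visible before the timer is deployed.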

Always start by checking logs and status. Systemd is very informative, and most problems can be diagnosed by carefully examining the output of journalctl and systemctl status. If the problem is not resolved, temporarily loosen the configuration (e.g., security directives) one by one to isolate the source of the problem, and then re-enable them.

When to seek support

If you have exhausted all your diagnostic capabilities and the problem remains unresolved:

  • Linux Communities: Distribution forums (Ubuntu Forums, Ask Fedora, Arch Linux Forums), Stack Overflow, Reddit (r/linuxadmin, r/systemd, r/devops). Provide as much information as possible: OS version, Systemd version, full unit file, systemctl status output, and relevant logs from journalctl.
  • Official Bug Trackers: If you suspect a bug in Systemd itself or in your distribution, refer to official support channels or the Systemd project's bug trackers on GitHub.
  • Vendor Support: For commercial distributions (Red Hat Enterprise Linux, SUSE Linux Enterprise Server) or cloud providers (AWS, Azure, GCP), use their official support channels if the problem affects the base OS or integration with cloud services.

FAQ: Frequently Asked Questions about Systemd in Production

1. Should I use Systemd if I'm already using Docker/Kubernetes?

Yes, absolutely. Systemd and container orchestrators solve different but complementary tasks. Systemd manages the host operating system's basic services (kubelet, containerd, network daemons, sshd, logging, monitoring), and can also be used to start the Docker daemon. Kubernetes orchestrates containers, but Kubelet itself and the container runtime (e.g., containerd) are managed by Systemd. Advanced Systemd features (e.g., cgroups for limiting system process resources, isolation for host security) remain critically important even in containerized environments.

2. How secure is DynamicUser=yes?

DynamicUser=yes significantly enhances security. Systemd automatically creates a unique, unprivileged user and group for a service upon its startup and removes them upon shutdown. This eliminates the need for manual UID/GID management and prevents conflicts or unauthorized access between services. In conjunction with directives like StateDirectory=, CacheDirectory=, LogsDirectory=, Systemd also creates and manages data directory permissions for this dynamic user, ensuring clean isolation. This is a best practice for many services, especially those that do not require a persistent UID/GID for interaction with other systems.

3. What are the main benefits of Socket Activation for production?

Main benefits: 1) Resource Savings: The service starts only upon the first request, freeing up CPU and RAM during inactive periods. 2) Zero-Downtime Deployments: The listening socket stays open while the service restarts, so incoming connections queue in the kernel backlog instead of being refused, allowing application updates without dropped connections. 3) Increased Fault Tolerance: If a service crashes, Systemd continues to listen on the socket and can restart the service upon the next request. 4) Simplified Dependency Management: Services can depend on sockets rather than other services, allowing them to start in any order.

4. Can Journald fill up the entire disk with logs?

Not by default, but the default limits may be larger than you want. Journald stores logs on disk when /var/log/journal/ exists and in memory under /run/log/journal/ otherwise; persistent logs are capped at 10% of the file system (at most 4G) while keeping 15% free. To set explicit limits, configure directives in /etc/systemd/journald.conf: SystemMaxUse=10G (maximum size of all logs on disk), SystemKeepFree=2G (free space to leave on the file system; the value is an absolute size, not a percentage), MaxRetentionSec=1month (delete logs older than one month). After modifying the file, remember to run sudo systemctl restart systemd-journald.
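The same limits can be kept in a drop-in instead of editing journald.conf directly, which makes them easier to deploy with configuration management. This is a sketch: the function name write_journald_limits, the file name 50-limits.conf, and the values are examples.

```shell
# write_journald_limits: write size and retention limits as a journald drop-in.
#   $1 = directory (default: /etc/systemd/journald.conf.d)
# File name and values are examples; sizes are absolute (K, M, G, ...).
write_journald_limits() (
    dir=${1:-/etc/systemd/journald.conf.d}
    mkdir -p "$dir"
    cat > "$dir/50-limits.conf" <<'EOF'
[Journal]
SystemMaxUse=10G
SystemKeepFree=2G
MaxRetentionSec=1month
EOF
)

# Usage (as root):
#   write_journald_limits
#   systemctl restart systemd-journald
#   journalctl --disk-usage          # confirm current usage
#   journalctl --vacuum-size=10G     # optionally trim archived journals now
```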

5. Why are Systemd Timer Units better than Cron for production?

Systemd Timer Units surpass Cron for several reasons: 1) Reliability: Integration with Systemd allows for the use of dependencies, cgroups, and logging to Journald. 2) Persistence: With Persistent=true, missed tasks are run upon the next system boot. 3) Precision: More flexible scheduling options (OnCalendar=, OnBootSec=, OnUnitActiveSec=). 4) Monitoring: Easily check status, last and next run times, and logs via systemctl status <timer> and journalctl -u <service>. 5) Isolation: Services launched by timers can utilize all Systemd security directives.

6. What is the difference between cgroups v1 and v2, and why is it important for Systemd?

Cgroups v2 is a unified cgroups hierarchy, offering a simpler and more powerful resource management model compared to cgroups v1, where each subsystem (CPU, memory, I/O) had its own hierarchy. Systemd fully supports cgroups v2 (using it by default in most modern distributions), which allows for more precise and consistent application of resource limits and priorities. This is crucial for preventing resource starvation, ensuring stable performance, and efficiently utilizing hardware resources in production.

7. How deep is Systemd's isolation from a security perspective?

Systemd's isolation is very deep and comparable to lightweight containers, but at the host level. It utilizes numerous Linux kernel mechanisms: namespaces (PID, mount, UTS, IPC, cgroup), seccomp, capabilities, chroot. Directives like PrivateTmp, ProtectSystem, NoNewPrivileges, SystemCallFilter create powerful sandboxes, significantly limiting the capabilities of a compromised service. While it's not full virtualization like KVM, nor full isolation like Docker (without additional layers), for many tasks, this is a sufficient and highly effective level of protection.

8. How to test Systemd unit files?

Testing unit files includes: 1) Manual Testing: Starting and stopping the service, checking status and logs. 2) Stress Testing: Using stress-ng to verify how the service behaves under load and how resource limits function. 3) Failure Testing: Forcibly terminating the service (kill -9 <PID>) to check automatic restarts. 4) Security Testing: Running exploits (if safe) or attempting unauthorized access from the service's environment to verify isolation directives. 5) Integration Testing: Starting the entire service stack to check dependencies. 6) CI/CD: Including checks for unit files and their functionality in the continuous integration/delivery pipeline.
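For the CI/CD step, a static check with systemd-analyze verify catches many syntax and reference errors before deployment. The sketch below assumes unit files are kept together in one directory of the repository; the function name verify_units is hypothetical, and how strictly the exit status reflects warnings may vary between Systemd versions.

```shell
# verify_units: run `systemd-analyze verify` on every *.service, *.socket and
# *.timer file in a directory; the function fails if any file fails.
# The flat directory layout is an assumption of this sketch.
verify_units() (
    status=0
    for f in "$1"/*.service "$1"/*.socket "$1"/*.timer; do
        [ -e "$f" ] || continue          # skip unmatched glob patterns
        if systemd-analyze verify "$f"; then
            echo "ok: $f"
        else
            echo "FAILED: $f" >&2
            status=1
        fi
    done
    exit "$status"
)

# Usage in a pipeline step (directory name is an example):
#   verify_units ./units
```

The check runs against the host where the pipeline executes, so binaries referenced in ExecStart= need to exist there or the verification reports them as missing.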

9. What are the general best practices for Systemd in production?

1) Always use User=/DynamicUser=. 2) Apply security directives (PrivateTmp, ProtectSystem, etc.). 3) Set resource limits (CPUQuota, and MemoryMax, the cgroups v2 successor of MemoryLimit). 4) Direct logs to Journald. 5) Use Restart=on-failure for daemons. 6) Use Socket/Timer Activation when appropriate. 7) Store custom units in /etc/systemd/system/, use drop-in files. 8) Automate unit management with tools (Ansible, Terraform). 9) Regularly monitor service status and resources.

10. Can Systemd be used on Windows or macOS?

No, Systemd is a Linux-specific init system and service manager, deeply integrated with the Linux kernel. It does not run directly on Windows or macOS. These operating systems have their own service managers (the Service Control Manager on Windows, launchd on macOS). However, the Windows Subsystem for Linux (WSL2) can run Systemd inside a Linux distribution once it is enabled with systemd=true in the [boot] section of /etc/wsl.conf; this is still a Linux environment within Windows.

Conclusion

By 2026, Systemd has firmly established itself as a cornerstone of modern Linux infrastructure. Its evolution from a simple init system to a comprehensive platform for service management has transformed the approach to ensuring reliability, security, isolation, and monitoring in production environments. As we have seen, Systemd offers much more than basic systemctl start/stop commands; it provides a powerful toolkit for creating fault-tolerant, secure, and resource-efficient applications.

Mastering Systemd's advanced capabilities is not just a "nice to have"; it's a mandatory requirement for any engineer working with Linux servers in production. The ability to use Socket Activation for zero downtime, Timer Units for reliable periodic tasks, cgroups v2 for precise resource management, and an extensive set of security directives to create isolated sandboxes significantly enhances the quality and resilience of your infrastructure. This directly translates into reduced operational costs, decreased downtime, and minimized security risks, which in turn contributes to the success of any SaaS project or high-load system.

We have explored how Systemd can help solve real-world problems: from preventing resource starvation and ensuring seamless deployments to centralized logging and enhanced security. The presented use cases demonstrate that these capabilities are applicable in a wide range of scenarios — from high-load API gateways to background data processing and managing the core services of Kubernetes nodes.

Final Recommendations:

  1. Apply the Principle of Least Privilege: Always run services as an unprivileged user, use DynamicUser=yes, and tighten security directives as much as possible (PrivateTmp, ProtectSystem, SystemCallFilter).
  2. Manage Resources: Set CPUQuota, MemoryMax (the cgroups v2 successor of MemoryLimit), IOWeight, and use Systemd Slices for all production services to prevent resource starvation and ensure predictable performance.
  3. Automate and Monitor: Integrate Systemd into your CI/CD pipelines using Ansible or Terraform. Configure Journald for centralized logging and use Prometheus/Grafana for monitoring cgroups metrics.
  4. Utilize Advanced Activation Mechanisms: Socket Activation and Timer Units are powerful tools for enhancing fault tolerance and optimizing resources.
  5. Continuously Learn and Improve: The Linux and Systemd ecosystem is constantly evolving. Regularly consult official documentation and communities to stay informed about best practices and new capabilities.

Next Steps for the Reader:

  • Practice: Start applying the learned directives on test environments. Create your first .service with full isolation, configure a .socket and a .timer.
  • Review Existing Services: Analyze your current production unit files. Are there opportunities to improve security, reliability, or resource optimization?
  • Study man pages: For each directive mentioned in the article, open the corresponding man page (e.g., man systemd.exec) and delve into the details.
  • Implement in CI/CD: Automate the creation and deployment of unit files to ensure consistency and reliability.
  • Share Knowledge: Exchange experiences with colleagues, participate in communities, as collective knowledge is power.

Systemd is not just a tool; it's a philosophy for managing Linux systems. By mastering it at an advanced level, you will become a more effective, confident, and valuable specialist in the world of modern IT infrastructure.
