Systemd Hardening
Evaluate Security of Services
Security score:
$ systemd-analyze security
prometheus-node-exporter.service 8.6 EXPOSED 🙁
prometheus.service 4.3 OK 🙂
rescue.service 9.5 UNSAFE 😨
rtkit-daemon.service 7.2 MEDIUM 😐
...
List of All Options
Service Options
Options that can be used in the [Service] section of a systemd service unit.
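As a minimal sketch, hardening options can be added through a drop-in file rather than by editing the packaged unit. The drop-in path and the file name hardening.conf are assumptions chosen for illustration; the unit name is borrowed from the example output above.

```ini
# /etc/systemd/system/prometheus-node-exporter.service.d/hardening.conf
# (assumed drop-in location; the file name "hardening.conf" is arbitrary)
[Service]
NoNewPrivileges=yes
PrivateTmp=yes
ProtectHome=yes
```

After systemctl daemon-reload and a restart of the service, systemd-analyze security prometheus-node-exporter.service shows the assessment for that single unit.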
Capabilities
On Linux, super-user privileges are divided into capabilities. Available capabilities are listed in capabilities(7), and systemd-analyze capability lists all capabilities known to systemd.
AmbientCapabilities
By default, all capabilities are dropped when running a service as a non-root user. In order to grant a non-root user limited super-user capabilities, this directive can be used.
Grant user backup-daemon capability CAP_DAC_READ_SEARCH:
User=backup-daemon
AmbientCapabilities=CAP_DAC_READ_SEARCH
This is generally preferred to running a service as root and dropping capabilities.
See:
systemd.exec(5) → SecureBits= (subtly changes behavior of capabilities)
NoNewPrivileges
Prevent the service from gaining new privileges:
NoNewPrivileges=yes
In particular, the service process and all its children will ignore the setuid and setgid bits that su and sudo rely on to gain privileges.
Devices
DeviceAllow
Allow access to the devices /dev/loop-control and /dev/loop[0-9]:
DeviceAllow=/dev/loop-control
DeviceAllow=block-loop
Allow read-only access to /dev/sda:
DeviceAllow=/dev/sda:r
Use PrivateDevices when only the default set of pseudo-devices like /dev/null, /dev/zero, and /dev/urandom is needed.
By default, access to common pseudo-devices like /dev/null or /dev/urandom is always granted.
PrivateDevices
Only provide a minimal set of devices like /dev/null, /dev/zero or /dev/urandom to the service. Systemd will also take other measures to prevent device creation and access.
Enable private devices:
PrivateDevices=yes
PrivateIPC
Create a private IPC namespace for the service:
PrivateIPC=yes
Multiple services can be made to share their IPC namespace using JoinsNamespaceOf.
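A minimal sketch of two units sharing one IPC namespace. The unit names producer.service and consumer.service are hypothetical. JoinsNamespaceOf belongs in the [Unit] section, and both units need PrivateIPC enabled for the setting to take effect.

```ini
# producer.service (hypothetical unit)
[Service]
PrivateIPC=yes

# consumer.service (hypothetical unit)
[Unit]
JoinsNamespaceOf=producer.service

[Service]
PrivateIPC=yes
```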
PrivatePIDs
Create a private PID namespace:
PrivatePIDs=true
This isolates the PID space of the service.
Caveat: You probably want to use ProtectProc instead.
RemoveIPC
Remove IPC objects when service is stopped:
User=exampled
RemoveIPC=yes
Remove all IPC objects owned by the user running the service.
/proc and /sys Filesystem
ProcSubset
Only allow access to PID information in /proc (e.g. /proc/<pid>):
ProcSubset=pid
ProtectProc
Control access to processes in /proc.
Deny access to other users' processes:
ProtectProc=noaccess
Hide other users' processes:
ProtectProc=invisible
Hide non-ptraceable processes:
ProtectProc=ptraceable
You should usually prefer invisible over noaccess as many services do not handle being denied access well.
These directives correspond to the hidepid= mount option of proc. See proc(5)#Mount_options
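ProtectProc and ProcSubset can be combined. The fragment below is a sketch, not a recommendation for every service: many programs read non-process entries such as /proc/sys, which ProcSubset=pid makes unavailable.

```ini
[Service]
# Hide processes that belong to other users ...
ProtectProc=invisible
# ... and expose only the per-process directories of /proc
ProcSubset=pid
```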
ProtectKernelTunables
Protect kernel variables accessible in /proc, /sys or via sysctl(8)/sysctl.conf(5):
ProtectKernelTunables=yes
ProtectClock
Prevent service from changing the time:
ProtectClock=yes
ProtectControlGroups
Prevent modifications to cgroup hierarchies:
ProtectControlGroups=yes
Filesystem Access
NoExecPaths
Only allow execution of /usr/bin/serviced:
NoExecPaths=/
ExecPaths=/usr/bin/serviced
This, in combination with MemoryDenyWriteExecute, may be used to make arbitrary code execution harder.
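A sketch of the combined setup. It assumes the service is dynamically linked and loads its shared objects from /usr/lib and /usr/lib64, so those directories are allowed as well. Adjust the library paths for the distribution in use.

```ini
[Service]
# Deny execution everywhere, then allow the service binary and the
# library directories it is assumed to load shared objects from
NoExecPaths=/
ExecPaths=/usr/bin/serviced /usr/lib /usr/lib64
# Reject memory mappings that are writable and executable at the same time
MemoryDenyWriteExecute=yes
```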
PrivateTmp
Create private, empty /tmp and /var/tmp for the service:
PrivateTmp=yes
Explicitly create a disconnected tmpfs(5) for /var/tmp and /tmp:
PrivateTmp=disconnected
Use disconnected to avoid temporary data hitting the disk when /tmp and /var/tmp are on unencrypted, persistent storage.
Multiple services can be made to share their /tmp and /var/tmp using JoinsNamespaceOf. This cannot be used with disconnected.
Temporary files are cleaned when the service is stopped.
ProtectHome
Restrict access to /home, /root and /run/user for a service.
Make /home inaccessible:
ProtectHome=yes
Make /home read-only:
ProtectHome=read-only
You can use ReadWritePaths to lift read-only restriction on subdirectories.
Replace /home with an empty, read-only directory:
ProtectHome=tmpfs
See:
ExecStart (full write access in ExecStart=, ExecStartPre=, etc.)
InaccessiblePaths
Make directories at /etc/hidden, /hidden and /home inaccessible:
InaccessiblePaths=/etc/hidden /hidden
InaccessiblePaths=/home
See:
ExecStart (full write access in ExecStart=, ExecStartPre=, etc.)
ReadOnlyPaths
Make files and directories read-only at /etc/read-only, /read-only and /home:
ReadOnlyPaths=/etc/read-only /read-only
ReadOnlyPaths=/home
See:
ExecStart (full write access in ExecStart=, ExecStartPre=, etc.)
ReadWritePaths
Make files and directories read-write at /etc/read-write, /read-write and /home:
ReadWritePaths=/etc/read-write /read-write
ReadWritePaths=/home/
Directories otherwise read-only or inaccessible due to the use of ProtectHome or ProtectSystem may be made readable/writable.
Subdirectories or files specified in ReadOnlyPaths may be made writable. However, this does not extend to InaccessiblePaths.
See:
ExecStart (full write access in ExecStart=, ExecStartPre=, etc.)
RestrictFileSystems
Only allow opening files on an ext4 or tmpfs filesystem:
RestrictFileSystems=ext4 tmpfs
Only deny access to network filesystems:
RestrictFileSystems=~@network
Obtain a list of all known filesystems and groups:
$ systemd-analyze filesystems
ProtectSystem
Mount /usr/, /boot/ and /efi/ read-only:
ProtectSystem=yes
Additionally mount /etc/ read-only:
ProtectSystem=full
Mount everything read-only except /dev/, /proc/ and /sys/:
ProtectSystem=strict
Use ReadWritePaths to allow write access to specific files or directories.
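For instance, a service that only needs to write to its own data directory could be confined as sketched here. The path /var/lib/serviced is a hypothetical data directory.

```ini
[Service]
# Whole file system hierarchy read-only (except /dev/, /proc/ and /sys/)
ProtectSystem=strict
# Hypothetical data directory the service must be able to write to
ReadWritePaths=/var/lib/serviced
```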
See
ExecStart (full write access in ExecStart=, ExecStartPre=, etc.)
ProtectHostname
Prevent service from manipulating hostname (UTS namespace):
ProtectHostname=yes
ProtectKernelLogs
Deny service access to kernel logs (e.g. via dmesg(1)):
ProtectKernelLogs=yes
ProtectKernelModules
Prevent loading of kernel modules by service:
ProtectKernelModules=yes
TemporaryFileSystem
Place an empty tmpfs filesystem at /path/directory:
TemporaryFileSystem=/path/directory
The same but make the directory read-only:
TemporaryFileSystem=/path/directory:ro
This is often useful when a service can’t deal with a directory being read-only or inaccessible but is fine with it being empty.
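TemporaryFileSystem can also hide a whole tree while keeping selected parts visible. The sketch below follows the pattern described in systemd.exec(5) and uses BindReadOnlyPaths, a directive not otherwise covered here; the path /var/lib/serviced is hypothetical.

```ini
[Service]
# Hide everything under /var behind an empty, read-only tmpfs ...
TemporaryFileSystem=/var:ro
# ... and bind the one (hypothetical) directory the service needs back in
BindReadOnlyPaths=/var/lib/serviced
```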
Networking
PrivateNetwork
Create a private network namespace with only a private loopback interface:
PrivateNetwork=yes
Multiple services can be made to share their network namespace using JoinsNamespaceOf. Restricting access to the (global) loopback interface, or any other interface, can be done using RestrictNetworkInterfaces.
RestrictAddressFamilies
Restrict socket access to the IPv4, IPv6 and Unix socket families respectively:
RestrictAddressFamilies=AF_INET AF_INET6 AF_UNIX
Allow no address family:
RestrictAddressFamilies=none
The special value none is only supported starting with systemd 249.
Often required:
| Family | Reason |
|---|---|
| AF_NETLINK | Enumerating network interfaces, for instance, to be able to bind to specific interfaces. |
| AF_UNIX | Logging via syslog(3). |
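Putting the families listed above together, a network service that binds to specific interfaces and logs via syslog might use an allow-list like this sketch:

```ini
[Service]
# IPv4/IPv6 for the service itself, AF_UNIX for syslog(3),
# AF_NETLINK for enumerating network interfaces
RestrictAddressFamilies=AF_INET AF_INET6 AF_UNIX AF_NETLINK
```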
See
List of address families in socket(2)
RestrictNetworkInterfaces
Allow access to the loopback (lo) interface only:
RestrictNetworkInterfaces=lo
Deny access to interface eth0 only:
RestrictNetworkInterfaces=~eth0
When no network access is needed use PrivateNetwork.
See systemd.resource-control(5) → RestrictNetworkInterfaces=
SocketBindAllow
Important
Without also specifying SocketBindDeny=any, the service may bind to all ports.
Allow service to bind to TCP ports 80 and 443 only:
SocketBindAllow=tcp:80
SocketBindAllow=tcp:443
SocketBindDeny=any
Omit protocol to allow TCP and UDP:
SocketBindAllow=80
SocketBindAllow=443
SocketBindDeny=any
Port ranges, like 1200-1300, are accepted too.
Allow an unprivileged service to bind to TCP ports 80 and 443 only:
AmbientCapabilities=CAP_NET_BIND_SERVICE
SocketBindAllow=tcp:80
SocketBindAllow=tcp:443
SocketBindDeny=any
User=www-data
Capability CAP_NET_BIND_SERVICE is required to bind to any port lower than 1024. SocketBindAllow can be used to restrict this privilege to certain ports.
IPAddressAllow
Important
Without also specifying IPAddressDeny=any, the service will be allowed to connect to any address.
Only allow connecting to CIDR networks 10.0.0.0/8 and fc00::/7:
IPAddressAllow=10.0.0.0/8 fc00::/7
IPAddressDeny=any
The value localhost can be used to restrict access to 127.0.0.0/8 and ::1. If you wish to restrict access to localhost only, consider using RestrictNetworkInterfaces=lo in addition.
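A localhost-only service could be sketched by combining the directives mentioned above:

```ini
[Service]
# Allow loopback addresses only and deny everything else
IPAddressAllow=localhost
IPAddressDeny=any
# Additionally restrict the service to the loopback interface
RestrictNetworkInterfaces=lo
```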
IPAddressDeny
Deny access to CIDR networks 10.0.0.0/8 and fc00::/7:
IPAddressDeny=10.0.0.0/8 fc00::/7
RestrictNamespaces
Deny any namespace change:
RestrictNamespaces=yes
Only allow access to the namespace types ipc and net:
RestrictNamespaces=ipc net
Only deny access to the namespace types ipc and net:
RestrictNamespaces=~ipc net
RestrictRealtime
Deny access to any realtime scheduling functionality:
RestrictRealtime=yes
RestrictSUIDSGID
Prevent setting of SUID and SGID bits for file permissions:
RestrictSUIDSGID=yes
See
Details about SUID/SGID (AKA set-user-ID/set-group-ID) in execve(2)
UMask
Create files and directories that are only accessible by the user/owner if permissions are not explicitly set during creation:
UMask=0077
Allow user and group only:
UMask=0007
See
User / Group
DynamicUser
Dynamically allocate a Unix user under which the service is run:
DynamicUser=yes
This is not suitable for services that write persistent data to disk or have to read private data. This is because the UID/GID will be unpredictable and may be shared (though not at the same time) with other services.
Read systemd.exec(5) → DynamicUser= before use.
See also ExecStart (run ExecStart=, ExecStartPre=, etc. with full privileges)
PrivateUsers
Run service in a private user namespace:
PrivateUsers=yes
User
Run process as user serviced:
User=serviced
The primary group is taken from the passwd database unless specified via Group, and supplementary groups are taken from the group database.
See:
ExecStart (run ExecStart=, ExecStartPre=, etc. with full privileges)
Group
Set the user's group to serviced:
Group=serviced
See:
ExecStart (run ExecStart=, ExecStartPre=, etc. with full privileges)
SupplementaryGroups
On Unix, any process belongs to a user (UID) and group (GID), but it may also belong to additional/supplementary groups. Such supplementary groups are shown in groups= by id:
$ id user
uid=1000(user) gid=1000(user) groups=1000(user),999(qubes),126(docker)
Add service to supplementary group inet:
SupplementaryGroups=inet
Groups from the system’s group database are left untouched and SupplementaryGroups are appended.
Exec{Start,Stop}{,Pre,Post}
Prefixes + and ! can be used to execute commands with elevated privileges. With +, the command runs with full privileges (without User/Group/etc. being applied) and without filesystem access restrictions (ProtectHome/ReadOnlyPaths/etc.) being applied.
Call mkdir /etc/directory/ as root and with /etc/ being writable:
ExecStartPre=+mkdir /etc/directory
ExecStart=serviced --foreground
ReadOnlyPaths=/etc/
User=serviced
Use ! to only revert the effects of User, Group and SupplementaryGroups.
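A sketch of the ! prefix in use. The directory /run/serviced and the location /usr/bin/install are assumptions for illustration. The command runs as root because User is ignored, but filesystem restrictions such as ReadOnlyPaths stay in effect.

```ini
[Service]
User=serviced
ReadOnlyPaths=/etc/
# "!" ignores User= here, so install(1) runs as root and can create the
# directory and hand it to the service user; /etc/ remains read-only
ExecStartPre=!/usr/bin/install -d -o serviced /run/serviced
ExecStart=/usr/bin/serviced --foreground
```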
These prefixes can be used with ExecStart, ExecStartPre, ExecStartPost, ExecStop and ExecStopPost.