If you have ever tried to configure systemd’s SystemCallFilter= directive to
harden a systemd unit, you know it’s quite a pain: how are you supposed to know
which system calls a program actually needs?
The systemd developers did notice that very few people will ever know every system call a process might use, so they gave us system call groups, each of which roughly corresponds to one kernel subsystem.
But there is no information on which groups a given program needs, so it’s still not really
better than trial and error or spending a lot of time reading strace logs.
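What systemd does expose is the membership of each group: systemd-analyze syscall-filter prints every group name followed by its indented members. As a minimal sketch, assuming that layout (group names starting with @ at the beginning of a line, with members and any "# ..." descriptions indented below them), the awk-based shell function below turns the listing into "syscall group" pairs so you can look up where a given call lives. group_pairs is a hypothetical helper name, not something systemd ships.

```sh
# Minimal sketch (hypothetical helper): read a listing in the layout printed by
# systemd-analyze syscall-filter and print one "member group" pair per line.
# Nested group references (e.g. "@basic-io" listed inside another group) are
# printed as-is and not expanded; the catch-all @known group is skipped.
group_pairs() {
    awk '
        /^@/ { group = $1; next }               # a new group header starts
        /^[ \t]+#/ { next }                     # skip indented description lines
        /^[ \t]+[^ \t]/ && group != "" && group != "@known" {
            print $1, group                     # indented member of the current group
        }
    '
}
```

For example, systemd-analyze syscall-filter | group_pairs | grep '^openat ' should list every group that contains openat on the current system.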
Collecting a list of all the system calls used
Fortunately, strace has a handy “summary-only” mode (-c) that reduces the
otherwise endless log of system calls to a table of the system calls that
occurred and how often each one was used.
Since this already simplifies things a lot, run your program like this (-f also follows child processes and threads, -o writes the summary to a file) and make sure to trigger as many of its features as you can:
$ strace -f -c -o /tmp/strace.stats <progname> [<args>, …]
It will run somewhat slower, but once it finishes (you may use
Ctrl+C, it’s safe!) it will create the
/tmp/strace.stats file mentioned above, containing statistics on all the system calls that
occurred:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- -------------------
 92,54   34,660383        5779      5997       608 futex
  1,77    0,664396       33219        20           clock_nanosleep
  1,18    0,442909          15     28667           clock_gettime
  1,05    0,393020          17     22346     21769 readlink
  0,44    0,166252         112      1478           munmap
  0,40    0,150442          46      3245           mprotect
  0,36    0,133489          21      6254           read
  0,36    0,133216          27      4788      2218 openat
  0,34    0,127973          18      6963      2451 statx
  0,17    0,063905          83       761         1 epoll_wait
  0,16    0,060857          15      4022           fcntl
  0,14    0,054210          61       879           mmap
  0,14    0,053680          83       641           sched_yield
  0,14    0,050997          18      2720           close
  0,11    0,041341          11      3637       340 recvfrom
[… more lines …]
  0,00    0,000004           4         1           arch_prctl
  0,00    0,000004           4         1           set_tid_address
  0,00    0,000000           0         1           listen
  0,00    0,000000           0         1           execve
  0,00    0,000000           0         1           rename
  0,00    0,000000           0         1         1 pkey_alloc
------ ----------- ----------- --------- --------- -------------------
100,00   37,454788         346    108137     27505 total
The rightmost column is what we want to pass to systemd.
In theory we could use that column as-is, but it’s quite likely that minor updates of glibc, the application runtime or the application itself will end up using slightly different (but very likely related) system calls, so hard-coding the exact list is fragile.
Instead, we want to map this list to systemd’s more dynamic system call groups!
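The rightmost column can also be pulled out without relying on fixed byte offsets. A minimal sketch, assuming the summary layout shown above (a header line, a dashed separator, one row per system call, another dashed separator, then the total line): the awk-based function below prints the last whitespace-separated field of every row between the two separators, which works whether or not the errors column is filled in. strace_syscalls is a hypothetical helper name.

```sh
# Minimal sketch (hypothetical helper): print the syscall names from an
# strace -c summary given as file arguments or on standard input.
strace_syscalls() {
    awk '
        /^-+ / { sep++; next }          # count the dashed separator lines
        sep == 1 && NF { print $NF }    # rows between them: last field is the syscall
    ' "$@"
}
```

For example, strace_syscalls /tmp/strace.stats | ./systemd-callgroups.sh should feed the script the same list as the tail/head/cut pipeline in the next section.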
Mapping the list
First, save the following Zsh script (based on this simpler version by SjonHortensius) as systemd-callgroups.sh:
#!/usr/bin/env zsh
# Generate SystemCallFilter from list of syscalls
#
# Run this script, then type or paste a list of syscalls (one per line); it
# prints the system call groups covering them, based on the list of groups
# returned by `systemd-analyze syscall-filter` on the current system.
## Sjon Hortensius, 12020
## Erin of Yukis, 12025
set -eu
# Dynamically initialize call2group (${syscall} → ${groups})
declare -A call2group
declare -A group2call
maxgrouplen=0
while IFS= read -r line
do
    [[ ${#line} -eq 0 ]] && continue
    if [[ $line == @* ]]
    then
        group=${line}
        group2call[${group}]=
        if [ ${#group} -gt ${maxgrouplen} ];
        then
            maxgrouplen=${#group}
        fi
    elif [[ $line != \ *\#* && -n ${group+set} && ${group} != @known ]]
    then
        # Strip the indentation in front of the system call (or @group) name
        syscall=${line//[[:space:]]/}
        call2group[${syscall}]=${call2group[$syscall]:-}${call2group[$syscall]:+,}${group}
        # Append to this group's comma-separated member list
        group2call[${group}]=${group2call[$group]:-}${group2call[$group]:+,}${syscall}
    fi
done < <(systemd-analyze syscall-filter)
# Expand group references
for name group in ${(kv)group2call[@]};
do
    if [[ ${name} == @.* ]];
    then
        for syscall in ${(s:,:)group2call[${name}]};
        do
            call2group[${syscall}]=${call2group[$syscall]:-}${call2group[$syscall]:+,}${name}
        done
    fi
done
unset group2call
# Read used syscalls, e.g. from strace -c, and build forward mappings (${group} → ${syscalls})
declare -A groupuse
while read -r syscall;
do
    if [[ -n ${call2group[${syscall}]+set} ]];
    then
        for group in ${(s:,:)call2group[${syscall}]};
        do
            groupuse[${group}]=${groupuse[${group}]-}${groupuse[${group}]+,}${syscall}
        done
    else
        # Not in any group: report the system call on its own
        groupuse[${syscall}]=${syscall}
    fi
done
# Drop groups entirely subsumed (strict subset) by other groups
for group syscalls in ${(kv)groupuse[@]};
do
    for group2 syscalls2 in ${(kv)groupuse[@]};
    do
        all_found=true
        for syscall in ${(s:,:)syscalls};
        do
            # Glob match: is ${syscall} one of the entries in ${syscalls2}?
            if [[ ",${syscalls2}," != *,${syscall},* ]];
            then
                all_found=false
                break
            fi
        done
        # Check if all system calls were found AND the number of items in the
        # reference is strictly greater than in our list
        if ${all_found} \
            && [ ${(ws:,:)#syscalls} -lt ${(ws:,:)#syscalls2} ] \
            && (( ${+groupuse[${group}]} ));
        then
            # Quoted so the [...] subscript is not treated as a glob pattern
            unset "groupuse[${group}]"
        fi
    done
done
# Pretty print and sort each group and used syscall therein
for group syscalls in ${(kv)groupuse[@]};
do
    printf "%-$((maxgrouplen+1))s %s\n" "${group}:" ${(j:, :)${(os:,:)syscalls}}
done | sort
Review it if you want, make it executable with chmod +x systemd-callgroups.sh, and then use it like this:
$ tail -n+3 /tmp/strace.stats | head -n-2 | cut -b52- | ./systemd-callgroups.sh
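This cuts out the part of the strace summary we care about (just the list of
system calls: tail skips the two header lines, head drops the closing separator
and the total line, and cut keeps everything from byte 52 on, where the syscall
column starts), then passes it to the script, which matches the system calls
against all the system call groups present on the current system and generates
a report on the possible groups to use: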
@basic-io: close, lseek, pread64, pwrite64, read, write
@default: arch_prctl, brk, clock_gettime, clock_nanosleep, execve, futex, geteuid, getpid, getrandom, gettid, gettimeofday, mmap, mprotect, munmap, prlimit64, rseq, sched_getaffinity, sched_yield, set_robust_list, set_tid_address
@file-system: access, close, fcntl, fstat, ftruncate, getcwd, getdents64, mkdir, newfstatat, openat, readlink, rename, statx, unlink, unlinkat
@io-event: epoll_create1, epoll_ctl, epoll_wait, eventfd2, poll, pselect6
@network-io: accept4, bind, connect, getpeername, getsockname, getsockopt, listen, recvfrom, recvmsg, sendto, setsockopt, shutdown, socket, socketpair
@pkey: pkey_alloc
@process: clone3, prctl
@signal: rt_sigaction, rt_sigprocmask, sigaltstack
@system-service: flock, ioctl, madvise, mremap, sched_getparam, sched_getscheduler, sched_yield, sysinfo, uname
Note that some system calls may show up in multiple groups! This simply mirrors systemd, which also lists some system calls in more than one group. (Groups whose used system calls are entirely subsumed by another group are removed, however.)
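The "entirely subsumed" rule is a strict-subset test on the comma-separated lists the script builds: every system call in one group's list must also appear in the other group's list, and the other list must be longer. A standalone sketch of that test as an awk-based shell function (is_strict_subset is a hypothetical name; the script performs the equivalent check inline in zsh):

```sh
# Minimal sketch (hypothetical helper): succeed if every entry of the
# comma-separated list $1 occurs in the comma-separated list $2 and $2 has
# strictly more entries; fail otherwise.
is_strict_subset() {
    awk -v small="$1" -v big="$2" 'BEGIN {
        n = split(small, s, ",")
        m = split(big, b, ",")
        for (i = 1; i <= m; i++) inbig[b[i]] = 1
        for (i = 1; i <= n; i++) if (!(s[i] in inbig)) exit 1   # missing entry
        exit !(n < m)                                             # 0 only if strictly smaller
    }'
}
```

For example, is_strict_subset read,write close,read,write succeeds, while comparing a list with itself fails, which is why a group is never dropped in favour of itself.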
Review which groups make the most sense for your use-case (and are least-likely
to break!) and add them to the SystemCallFilter= directive.
Much better!
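Once you have settled on the groups, the directive itself is just a space-separated list of group names (plus any individual system call names, which SystemCallFilter= accepts as well). A minimal sketch, assuming you saved the report to a file and deleted the lines you do not want to allow: filter_line (a hypothetical helper) keeps the text before each colon and joins the names into a single SystemCallFilter= line.

```sh
# Minimal sketch (hypothetical helper): read report lines such as
# "@basic-io: close, read, write" and print one SystemCallFilter= line.
filter_line() {
    awk '
        NF { sub(/:.*/, ""); names = names (names == "" ? "" : " ") $1 }  # name before the colon
        END { if (names != "") print "SystemCallFilter=" names }
    '
}
```

Fed the unedited sample report above, it would print SystemCallFilter=@basic-io @default @file-system @io-event @network-io @pkey @process @signal @system-service, which can then go into the unit's [Service] section.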