Slowing the flow of core-dump-related CVEs

The 6.16 kernel will include a number of changes to how the kernel handles the processing of core dumps for crashed processes. Christian Brauner explained his reasons for doing this work as: “Because I'm a clown and also I had it with all the CVEs because we provide a **** API for userspace”. The handling of core dumps has indeed been a constant source of vulnerabilities; with luck, the 6.16 work will result in rather fewer of them in the future.

—

**The problem with core dumps**

A core dump is an image of a process's data areas (everything except the executable text); it can be used to investigate the cause of a crash by examining a process's state at the time things went wrong. Once upon a time, Unix systems would routinely place a core dump into a file called `core` in the current working directory when a program crashed. The main effects of this practice were to inspire system administrators worldwide to remove `core` files daily via cron jobs, and to make it hazardous to use the name `core` for anything you wanted to keep. Linux systems can still create `core` files, but are usually configured not to.

An alternative that is used on some systems is to have the kernel launch a process to read the core dump from a crashing process and, presumably, do something useful with it. This behavior is configured by writing an appropriate string to the `core_pattern` sysctl knob. A number of distributors use this mechanism to set up core-dump handlers that phone home to report crashes so that the guilty programs can, hopefully, be fixed.
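
As a rough illustration of the two modes described above, the sketch below reads the current `core_pattern` setting and reports whether core dumps go to a file or to a piped helper. The function names are invented for this example. The leading `|` convention is the one documented in the core(5) manual page, and the `/proc/sys/kernel/core_pattern` path is the usual location of this sysctl on Linux.

```python
from __future__ import annotations

from pathlib import Path

# The sysctl knob is exposed as a file under /proc on Linux.
CORE_PATTERN_PATH = Path("/proc/sys/kernel/core_pattern")


def classify_core_pattern(pattern: str) -> tuple[str, str]:
    """Return (mode, target) for a core_pattern string.

    A leading '|' means the kernel runs the rest of the string as a
    user-mode helper and feeds it the core dump on standard input.
    Anything else is treated here as a file-name template.
    """
    pattern = pattern.strip()
    if pattern.startswith("|"):
        return ("pipe", pattern[1:].strip())
    return ("file", pattern)


def read_core_pattern(path: Path = CORE_PATTERN_PATH) -> str | None:
    """Read the live setting, or return None when it is unavailable."""
    try:
        return path.read_text()
    except OSError:
        return None


if __name__ == "__main__":
    current = read_core_pattern()
    if current is None:
        print("core_pattern is not readable on this system")
    else:
        print(classify_core_pattern(current))
```

A result in `pipe` mode means that each crash causes the kernel to launch the named program, which is the arrangement the rest of this article is concerned with.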

This is the “**** API” referred to by Brauner; it indeed has a number of problems. For example, the core-dump handler is launched by the kernel as a user-mode helper, meaning that it runs fully privileged in the root namespace. That, needless to say, makes it an attractive target for attackers. There are also a number of race conditions that emerge from this design that have led to vulnerabilities of their own.

See, for example, this recent Qualys advisory describing a vulnerability in Ubuntu's apport tool and the systemd-coredump utility, both of which are designed to process core dumps. In short, an attacker starts by running a setuid binary, then forcing it to crash at an opportune moment. While the core-dump handler is being launched (a step that the attacker can delay in various ways), the crashed process is killed outright with a SIGKILL signal, then quickly replaced by another process with the same process ID. The core-dump handler will then begin to examine the core dump from the crashed process, but with the information from the replacement process.

That process is running in its own attacker-crafted namespace, with some strategic environmental changes. In this environment, the core-dump handler's attempt to pass the core-dump socket to a helper can be intercepted; that allows said process to gain access to the file descriptor from which the core dump can be read. That, in turn, gives the attacker the ability to read the (original, privileged) process's memory, happily pillaging any secrets found there. The example given by Qualys obtains the contents of /etc/shadow, which is normally unreadable, but it seems that SSH servers (and the keys in their memory) are vulnerable to the same sort of attack.

Interested readers should consult the advisory for a much more detailed (and coherent) description of how this attack works, as well as information on some previous vulnerabilities in this area. The key takeaways, though, are that core-dump handlers on a number of widely used distributions are vulnerable to this attack, and that reusable integer IDs as a way to identify processes are just as much of a problem as the pidfd developers have been saying over the years.
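
The difference between an integer process ID and a pidfd can be seen in a small experiment. This is a minimal sketch, not anything taken from the advisory: it assumes Linux 5.3 or later and Python 3.9 or later (for `os.pidfd_open()` and `signal.pidfd_send_signal()`), and the helper names are invented for illustration. It returns `None` rather than failing when pidfds are not available.

```python
from __future__ import annotations

import os
import signal
import subprocess
import sys


def still_alive(pidfd: int) -> bool:
    """Probe the process behind a pidfd using the null signal."""
    try:
        signal.pidfd_send_signal(pidfd, 0)
    except ProcessLookupError:
        return False
    return True


def pidfd_notices_exit() -> bool | None:
    """Return True if a pidfd reports its process gone after it is reaped.

    Returns None when pidfds are unavailable in this environment.
    """
    if not (hasattr(os, "pidfd_open") and hasattr(signal, "pidfd_send_signal")):
        return None
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    pidfd = -1
    try:
        pidfd = os.pidfd_open(child.pid)
        alive_before = still_alive(pidfd)
        child.kill()
        child.wait()  # reap the child; its PID number may now be reused
        alive_after = still_alive(pidfd)
        return alive_before and not alive_after
    except OSError:
        return None  # e.g. the system calls are blocked or unsupported
    finally:
        if pidfd >= 0:
            os.close(pidfd)
        if child.poll() is None:
            child.kill()
            child.wait()


if __name__ == "__main__":
    print(pidfd_notices_exit())
```

Once the child has been reaped, the bare number in `child.pid` could be handed to an unrelated process, while the pidfd keeps reporting that the original process is gone. That property is what the changes described in the next section rely on.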

—

**Toward a better API**

The solution to this kind of race condition is to give the core-dump handler a way to know that the process it is investigating is, indeed, the one that crashed. The 6.16 kernel contains two separate changes toward that goal. The first is this patch from Brauner adding a new format specifier (“%F”) for the string written to `core_pattern`. This specifier will cause the core-dump handler to be launched with a pidfd identifying the crashed process installed as file descriptor number three. Since it is a pidfd, it will always refer to the intended process and cannot be fooled by process-ID reuse.
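
A handler launched this way might begin by asking which process its inherited pidfd refers to. The sketch below is one possible approach, under the assumption (from the pidfd_open(2) manual page, not from the patch itself) that `/proc/self/fdinfo/<fd>` for a pidfd carries a `Pid:` field, which reads `-1` once the process is gone. The function names are hypothetical; only the descriptor number three comes from the description above.

```python
from __future__ import annotations

# Descriptor number under which "%F" installs the pidfd, per the text above.
COREDUMP_PIDFD = 3


def parse_fdinfo_pid(fdinfo_text: str) -> int | None:
    """Extract the 'Pid:' field from the text of an fdinfo file."""
    for line in fdinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key == "Pid":
            return int(value.strip())
    return None


def crashed_pid(pidfd: int = COREDUMP_PIDFD) -> int | None:
    """Return the PID behind the inherited pidfd, or None if unknown.

    None is also returned when the field is -1, meaning that the
    process no longer exists from this handler's point of view.
    """
    try:
        with open(f"/proc/self/fdinfo/{pidfd}") as fdinfo:
            pid = parse_fdinfo_pid(fdinfo.read())
    except OSError:
        return None
    if pid is None or pid <= 0:
        return None
    return pid
```

A PID obtained this way is derived from the pidfd rather than from a number passed on the command line, so it names the crashed process at the moment it is read. A careful handler would still recheck the pidfd after consulting anything under `/proc/<pid>`.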

This change makes it relatively easy to adapt core-dump handlers to avoid the most recently identified vulnerabilities; it has already been backported to a recent set of stable kernels. But it does not change the basic nature of the `core_pattern` API, which still requires the launch of a new, fully privileged process to handle each crash. It is, instead, a workaround for one of the worst problems with that API.

The longer-term fix is this series from Brauner, which was also merged for 6.16. It adds a new syntax to `core_pattern` instructing the kernel to write core dumps to an existing socket; a user-space handler can bind to that socket and accept a new connection for each core dump that the kernel sends its way. The handler must be privileged to bind to the socket, but it remains an ordinary process rather than a kernel-created user-mode helper, and the process that actually reads core dumps requires no special privileges at all. So the core-dump handler can bind to the socket, then drop its privileges and sandbox itself, closing off a number of attack vectors.

Once a new connection has been made, the handler can obtain a pidfd for the crashed process using the `SO_PEERPIDFD` request for `getsockopt()`. Once again, the pidfd will refer to the actual crashed process, rather than something an attacker might want the handler to treat like the crashed process. The handler can pass the new `PIDFD_INFO_COREDUMP` option to the `PIDFD_GET_INFO` `ioctl()` command to learn more about the crashed process, including whether the process is, indeed, having its core dumped. There are, in other words, a couple of layers of defense against the sort of substitution attack demonstrated by Qualys.
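
The overall shape of such a socket-based handler might look like the sketch below. It is an illustration under several assumptions rather than a reference implementation: the function names and the socket path passed to `serve()` are placeholders, the fallback value 77 for `SO_PEERPIDFD` is the one used on most architectures and may differ on others, and the `PIDFD_GET_INFO` check is left out entirely. `SO_PEERPIDFD` itself requires a reasonably recent kernel, so the lookup returns `None` when it is not supported.

```python
from __future__ import annotations

import os
import socket

# Use Python's constant when it exists; otherwise fall back to the value
# used on most Linux architectures (an assumption; some ports differ).
SO_PEERPIDFD = getattr(socket, "SO_PEERPIDFD", 77)


def peer_pidfd(conn: socket.socket) -> int | None:
    """Return a pidfd for the connection's peer, or None if unsupported."""
    try:
        return conn.getsockopt(socket.SOL_SOCKET, SO_PEERPIDFD)
    except OSError:
        return None


def drain_core_dump(conn: socket.socket, sink) -> int:
    """Copy one core dump from conn into sink; return the byte count."""
    pidfd = peer_pidfd(conn)
    total = 0
    try:
        # A real handler would examine the process through pidfd here,
        # before trusting anything else about the crash.
        while chunk := conn.recv(65536):
            sink.write(chunk)
            total += len(chunk)
    finally:
        if pidfd is not None:
            os.close(pidfd)
    return total


def serve(path: str, open_sink) -> None:
    """Bind to path and handle one connection per core dump, forever."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as listener:
        listener.bind(path)
        listener.listen()
        # Binding is the only privileged step; privileges could be
        # dropped at this point, before any dump is read.
        while True:
            conn, _ = listener.accept()
            with conn, open_sink() as sink:
                drain_core_dump(conn, sink)
```

Keeping the accept loop separate from `drain_core_dump()` reflects the split described above: only the bind needs privilege, while the code that touches untrusted core-dump data can run in a sandboxed process.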

The end result is a system for handling core dumps that is more efficient (since there is no need to launch new helper processes each time) and which should be far more resistant to many types of attacks. It may take some time to roll out to deployed systems, since this change seems unlikely to be backported to the stable kernels (though distributors may well choose to backport it to their own kernels). But, eventually, this particular source of CVEs should become rather less productive than it traditionally has been.
