Co-authored by Minerva.
AI-assisted writing was used to improve clarity and structure; all technical viewpoints and conclusions are independently validated by the author. The blog was developed from my original notes and later refined into its current form.
The Linux virtual file system is a beautiful mess. It’s a layer of abstractions that mostly works, until you start poking at the corners and realize that things like /proc are essentially giant, leaky buckets of internal state. If you’ve spent any time in the offensive security trenches, you know that the filesystem is rarely as static or as private as developers hope.
Here is a deep dive into two specific primitives—FD resurrection and TOCTOU races—that demonstrate why relying on path strings or “secure” unlinking is a dangerous game.
The /proc Leak: Magic Links and Zombie Files
In Linux, the /proc/[pid]/fd/ directory contains entries for every file a process has open. These aren’t standard symlinks; they are “magic links.” The difference is subtle but devastating for security. A normal symlink just points to a path. A magic link points directly to the underlying file object in the kernel.
If a process opens a sensitive file and then unlinks it (deletes it from the disk), the file disappears from the directory tree. However, the data remains on the disk as long as that process keeps the file descriptor (FD) open.
Here is where it gets weird: any process that passes the kernel's access check can open /proc/[pid]/fd/[num] and obtain a new, fully functional file descriptor to that deleted file. The check is the ptrace-style one that guards a process's private /proc entries, so in practice the attacker must run as the same user as the target (and the target must not have marked itself non-dumpable) or hold superuser-level privileges such as CAP_SYS_PTRACE. The reopen itself is also subject to the file's own permission bits. These constraints matter a great deal for real-world exploitability in many environments.
It's kernel-level necromancy. You can "kill" the file on the disk, but as long as a process is holding a handle, the file is effectively a digital zombie that can be resurrected by anyone who knows where to look in /proc.
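A minimal sketch of the resurrection trick, assuming a Linux system with /proc mounted. It uses /proc/self so the access check trivially passes; against another process you would substitute its PID and need the permissions described above. The function name is mine, purely for illustration.

```python
import os
import tempfile


def resurrect_deleted_file(secret: bytes) -> bytes:
    """Write a secret, unlink the file, then reopen it via /proc/self/fd."""
    fd, path = tempfile.mkstemp()  # named temp file, opened read/write
    try:
        os.write(fd, secret)
        os.unlink(path)  # the name is gone from the directory tree
        assert not os.path.exists(path)

        # The magic link resolves to the open file, not to the old path.
        new_fd = os.open(f"/proc/self/fd/{fd}", os.O_RDONLY)
        try:
            # A reopen creates a fresh open file description at offset 0.
            return os.read(new_fd, len(secret))
        finally:
            os.close(new_fd)
    finally:
        os.close(fd)
```

Calling `resurrect_deleted_file(b"hunter2")` hands the bytes back even though no path on disk refers to the file any more.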
The Security Primitive
This is a classic way to bypass “delete-on-open” patterns. Many developers create a temporary secret file, open it, and immediately unlink it so it “won’t be accessible to others.” But if that process stays alive, the secret is still very much available to any other process on the system that can read that PID’s file descriptors.
In practice, this is most dangerous in multi-tenant environments (containers, shared hosting) where the attacker is a co-located process running as the same or a related user. Single-user systems and those with strict user separation are less vulnerable, though privilege escalation chains can make this relevant even in apparently secure contexts.
The Race: Unlink vs. Open (TOCTOU)
This leads to the second category of weirdness: the Time-of-Check to Time-of-Use (TOCTOU) race condition. Most code follows a simple logic: check if a file exists, then open it. Or, check if a path is a file, and then write to it.
The problem is that the kernel does not freeze the filesystem state between those two operations. In high-concurrency environments, an attacker can swap the file out exactly between your check and your open.
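To make the pattern concrete, here is a small sketch of the racy shape next to one common alternative: open first, then inspect the descriptor you actually got. Both function names are hypothetical, and the "is it a regular file" check stands in for whatever property your code cares about.

```python
import os
import stat


def read_regular_file_racy(path: str) -> bytes:
    # Two separate path lookups: the name can be re-pointed between them.
    if not os.path.isfile(path):  # check
        raise ValueError("not a regular file")
    with open(path, "rb") as f:  # use (a second, independent lookup)
        return f.read()


def read_regular_file(path: str) -> bytes:
    # One path lookup; every later question is asked of the descriptor.
    fd = os.open(path, os.O_RDONLY)
    try:
        st = os.fstat(fd)  # describes the object we hold, not the name
        if not stat.S_ISREG(st.st_mode):
            raise ValueError("not a regular file")
        chunks = []
        while True:
            chunk = os.read(fd, 65536)
            if not chunk:
                break
            chunks.append(chunk)
        return b"".join(chunks)
    finally:
        os.close(fd)
```

The second version can still be pointed at the wrong file by whoever controls the path, but it can no longer be told one thing by the check and handed another by the open.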
Winning the Race
Winning a TOCTOU race means landing your swap inside the window between the check and the use, which is usually microseconds or less. In practice, this is more difficult than it appears in textbooks.
The simplest approach is a tight loop that continuously unlinks and replaces the target file:
    # Simple race primitive (low success rate on modern kernels)
    while true; do
        ln -sf /etc/passwd /tmp/target
        unlink /tmp/target
        ln -sf /dev/null /tmp/target
    done
However, this brute-force approach has a low success rate on modern kernels with high scheduler resolution. More reliable exploitation requires:
- Multi-threaded concurrency: One or more attacker threads manipulate /tmp/target while the victim process repeatedly opens it. The added CPU contention and context switching create wider race windows.
- Syscall tracing delays: Using tools like strace to slow the victim process down at specific syscalls, expanding the vulnerable window. This requires the attacker to either run the victim process directly or have ptrace capabilities over it.
- Predictable timing: Identifying patterns in how often the victim opens the file, then timing the swap to coincide with those opens.
If you can force a process to open a file at the exact microsecond you are unlinking the original and replacing it with a symlink to /etc/shadow, you win. You've convinced a privileged process to give you a handle to a file you shouldn't be able to touch. But success requires understanding your target's behavior and kernel timing; it's not a "spray and pray" primitive on modern systems.
If your security logic depends on a file path being "stable" for more than one system call, you've already created an exploit primitive. The path is a lie; the inode is the truth.
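The "path is a lie" point can be shown without any race at all. The sketch below (a hypothetical helper, run in a scratch directory it creates itself) opens a file, swaps the name to a different file with rename(), and then compares what the path says with what the descriptor says.

```python
import os
import shutil
import tempfile


def path_vs_fd_after_swap():
    """Return (same_inode, bytes_via_path, bytes_via_fd) after a name swap."""
    workdir = tempfile.mkdtemp()
    target = os.path.join(workdir, "target")
    decoy = os.path.join(workdir, "decoy")
    try:
        with open(target, "wb") as f:
            f.write(b"original")
        fd = os.open(target, os.O_RDONLY)  # the victim's handle
        try:
            # "Attacker" step: atomically re-point the name at another file.
            with open(decoy, "wb") as f:
                f.write(b"swapped")
            os.rename(decoy, target)

            by_path = os.stat(target)  # follows the name
            by_fd = os.fstat(fd)  # follows the handle
            same_inode = (by_path.st_dev, by_path.st_ino) == (
                by_fd.st_dev,
                by_fd.st_ino,
            )
            with open(target, "rb") as f:
                via_path = f.read()
            via_fd = os.read(fd, 64)
            return same_inode, via_path, via_fd
        finally:
            os.close(fd)
    finally:
        shutil.rmtree(workdir)
```

The name now leads to the decoy while the descriptor still reads the original bytes, which is exactly the gap a check-then-use sequence falls into.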
Technical Deep-Dive: File Descriptor Resurrection
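Why does this happen? When you unlink a file, you are removing a name from a directory. The kernel doesn’t actually free the file data until two things are true: the “link count” (the number of directory entries pointing at the inode) has reached zero, and no process still holds the file open. An open file descriptor does not count as a link, but it does hold a reference that keeps the inode alive.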
When you open a magic link in /proc, the kernel skips the standard path resolution logic and goes straight to the file object associated with that FD. This is why you can “re-open” a file that no longer exists on the disk.
The kernel maintains a reference count on the underlying inode. As long as any process has the file open, or any directory entry points to it, the inode remains allocated. This is elegant for normal use cases but becomes a security liability once /proc/[pid]/fd/ exposes those still-open descriptors to every process that is allowed to look there.
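You can watch the two counters diverge from userspace. This is a small sketch with a made-up function name; the link count it reports comes from fstat(), and on local filesystems such as ext4 or tmpfs it drops to 0 after the unlink (network filesystems can behave differently).

```python
import os
import tempfile


def link_count_after_unlink():
    """Return (nlink_before, nlink_after, data) around an unlink()."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"still here")
        before = os.fstat(fd).st_nlink  # one directory entry points here
        os.unlink(path)  # remove that entry
        after = os.fstat(fd).st_nlink  # no names left, inode still alive
        os.lseek(fd, 0, os.SEEK_SET)
        data = os.read(fd, 64)  # the open reference keeps the data readable
        return before, after, data
    finally:
        os.close(fd)  # dropping the last reference lets the kernel free it
```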
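To reduce the exposure, use O_TMPFILE for temporary files. It creates an unnamed inode that never has a directory entry in the first place, so there is no window between create and unlink in which the name can be opened, swapped, or raced, and no path traversal vector to exploit. Be clear about what it does not do: the descriptor still shows up under /proc/[pid]/fd/ like any other, and a process that passes the /proc access check can reopen it there. In fact, open(2) documents giving an O_TMPFILE file a name by calling linkat() on its /proc/self/fd/ entry, unless the file was opened with O_EXCL. O_TMPFILE removes the path-based attack surface, not the /proc one.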
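Here is a hedged sketch of what O_TMPFILE does and does not change. The helper names are hypothetical. Not every filesystem or kernel supports the flag, so the sketch falls back to the classic create-then-unlink pattern (which does have a brief named window) when the open fails.

```python
import os
import tempfile


def open_anonymous_tempfile(directory: str) -> int:
    """Return an fd for a file with no directory entry."""
    tmpfile_flag = getattr(os, "O_TMPFILE", None)  # Linux-only constant
    if tmpfile_flag is not None:
        try:
            # With O_TMPFILE, O_EXCL means the inode can never be given
            # a name later through linkat().
            flags = tmpfile_flag | os.O_RDWR | os.O_EXCL
            return os.open(directory, flags, 0o600)
        except OSError:
            pass  # filesystem or kernel without O_TMPFILE support
    # Fallback: create, then unlink. The name exists for a short moment.
    fd, path = tempfile.mkstemp(dir=directory)
    os.unlink(path)
    return fd


def tmpfile_demo():
    """Return (proc_link_target, bytes_read_via_proc) for an unnamed file."""
    fd = open_anonymous_tempfile(tempfile.gettempdir())
    try:
        os.write(fd, b"secret")
        proc_entry = f"/proc/self/fd/{fd}"
        link_target = os.readlink(proc_entry)  # what /proc shows for the fd
        with open(proc_entry, "rb") as f:  # still reopenable via /proc
            reopened = f.read()
        return link_target, reopened
    finally:
        os.close(fd)
```

The file never had a usable path, yet the /proc entry hands the contents back, so the same-user and ptrace-capable threat model from earlier still applies.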
Real-World Primitive: FD Re-opening and Permission Changes
One subtle vulnerability involves reopening file descriptors through /proc/self/fd/ or /proc/[pid]/fd/ when the original file has been replaced or when the descriptor’s underlying inode has changed ownership or permissions.
Here’s a specific scenario: A privileged process (running as root) opens a world-writable temporary file for reading. The developer assumes this is safe because they’re using read-only flags (O_RDONLY). However, between the time the file was created and the time it was opened, an attacker has replaced the file with a symlink pointing to /etc/shadow.
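Reopening the FD through /proc/[pid]/fd/[num] does not re-resolve the original path, but it does run a fresh permission check against the inode the descriptor refers to, using the caller's credentials and the access mode requested in the new open() rather than the flags of the original descriptor. A handle that was opened read-only can therefore be reopened for writing if the inode's current permissions allow it. This can allow privilege escalation if the permissions of the file the descriptor really points to are more permissive than expected.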
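Inode numbers add another trap. An open descriptor pins its inode, so the number cannot be recycled underneath a live FD, and reopening through /proc always reaches the file that was originally opened. Once the last descriptor is closed and the inode is freed, however, many filesystems will hand the same number to an unrelated file, so code that remembers a bare inode number and compares it later without holding a descriptor open can be fooled. This is filesystem-dependent and less reliable, but it shows how fragile identity checks are when nothing pins the inode.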
Path traversal is elementary. FD-based attacks are where the real system-level complexity lives. Once an attacker has a handle in /proc, standard path-based security rules like mount flags (e.g., noexec) can behave in unexpected ways. More importantly, the semantics of what an open FD actually points to can shift beneath your feet if the underlying filesystem state changes.
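One concrete instance of FD semantics not staying put is the access mode. The sketch below (hypothetical function name, run against a file we own so the inode permissions allow it) opens a file read-only, confirms that writing through that descriptor fails, and then reopens the same descriptor for writing via /proc/self/fd.

```python
import os
import tempfile


def reopen_readonly_fd_for_writing():
    """Return (write_blocked_on_ro_fd, file_content_after_reopen)."""
    fd_rw, path = tempfile.mkstemp()  # mode 0600, owned by this user
    os.write(fd_rw, b"original")
    os.close(fd_rw)

    ro_fd = os.open(path, os.O_RDONLY)
    try:
        try:
            os.write(ro_fd, b"x")
            write_blocked = False
        except OSError:
            write_blocked = True  # the read-only descriptor refuses writes

        # The reopen is a new open(): permissions are checked against the
        # inode, and the original O_RDONLY flag is not inherited.
        wr_fd = os.open(f"/proc/self/fd/{ro_fd}", os.O_WRONLY)
        try:
            os.write(wr_fd, b"REPLACED")
        finally:
            os.close(wr_fd)

        os.lseek(ro_fd, 0, os.SEEK_SET)
        return write_blocked, os.read(ro_fd, 64)
    finally:
        os.close(ro_fd)
        os.unlink(path)
```

So passing someone a read-only descriptor is not a guarantee of read-only access if they can reach it through /proc and the file's permission bits would let them write.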
Lessons for the Defensive Engineer
- Don’t Trust Paths: Use openat(), fstat(), and other FD-based system calls once you have an initial handle. Validate the file’s properties (inode number, device ID, permissions) with fstat() immediately after opening and re-check them before sensitive operations.
- Pin Your Operations: Use O_NOFOLLOW to refuse a symlink in the final path component and O_CLOEXEC to keep FDs from leaking across exec() into child processes. For critical files, use fstat() to verify inode and device numbers before reading or writing.
- Limit /proc Exposure: Be aware that your process’s internal life is visible. If you are handling sensitive data in temporary files, audit which processes can read /proc/[your-pid]/fd/. In containerized environments, consider mount options that restrict /proc access.
- Use O_TMPFILE When Possible: For temporary data that should never have a directory entry, O_TMPFILE eliminates the path-based attack surface entirely.
- Atomic Operations Over State Transitions: Where possible, use a single syscall that combines check and use (e.g., openat() with appropriate flags such as O_NOFOLLOW) rather than separate check-then-open patterns.
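The first two lessons can be sketched together. This is one possible shape, not a complete hardening recipe: `open_beneath` is a name I made up, it only accepts a single path component, and it assumes the caller already trusts the directory itself. Python's `os.open(..., dir_fd=...)` maps to openat() on Linux.

```python
import os
import stat


def open_beneath(dir_path: str, name: str) -> int:
    """Open a regular file called `name` directly inside `dir_path`."""
    if "/" in name or name in ("", ".", ".."):
        raise ValueError("name must be a single path component")
    dir_fd = os.open(dir_path, os.O_RDONLY | os.O_DIRECTORY)
    try:
        # openat(dir_fd, name, ...): the lookup is relative to the directory
        # we already hold, and O_NOFOLLOW fails with ELOOP on a symlink.
        fd = os.open(name, os.O_RDONLY | os.O_NOFOLLOW, dir_fd=dir_fd)
    finally:
        os.close(dir_fd)
    # Every later question is asked of the descriptor, never of the path.
    st = os.fstat(fd)
    if not stat.S_ISREG(st.st_mode):
        os.close(fd)
        raise ValueError("not a regular file")
    return fd
```

A caller would read from the returned descriptor and close it when done; if an attacker drops a symlink in place of the expected file, the open fails instead of quietly following it.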
Most "secure" logging implementations I've audited fail exactly here. They rotate logs by unlinking and creating new ones, but they leave the old FDs open for seconds—more than enough time for a zombie-resurrection attack. The fix is simple: ensure old FDs are closed before the new log file is created, or use dedicated log rotation APIs that handle this atomically.
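If you want to check your own daemon for lingering handles like this, a rough audit is easy to write. The sketch below lists descriptors whose /proc link target carries the kernel's " (deleted)" suffix. The function name is hypothetical, it needs the usual access to the target's /proc/[pid]/fd/ directory, and a file whose real name ends in " (deleted)" would show up as a false positive.

```python
import os


def deleted_open_files(pid="self"):
    """Return [(fd_number, link_target)] for open-but-unlinked files."""
    fd_dir = f"/proc/{pid}/fd"
    found = []
    for entry in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, entry))
        except OSError:
            continue  # descriptor was closed while we were scanning
        if target.endswith(" (deleted)"):
            found.append((int(entry), target))
    return found
```

Running it against a logging process right after rotation shows how long the old log's descriptor stays reachable.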
I have written in detail about Privilege Escalation methods for Windows - Privilege Escalation and Linux - Privilege Escalation.