Skip to content

fix: cache ELF file descriptors per file, not per process#528

Open
r1viollet wants to merge 1 commit intomainfrom
r1viollet/elf-fd-cache-per-file
Open

fix: cache ELF file descriptors per file, not per process#528
r1viollet wants to merge 1 commit intomainfrom
r1viollet/elf-fd-cache-per-file

Conversation

@r1viollet
Copy link
Copy Markdown
Collaborator

What does this PR do?

When profiling many forked processes sharing the same ELF files (e.g. libc), report_module() was opening a fresh fd for each process's Dwfl session. With N forks and M shared libraries, this resulted in N*M open file descriptors.

Fix: add _fd_cache (unordered_map<FileInfoId_t, UniqueFd>) to DsoHdr, which is shared across all processes in UnwindState. The first time an ELF file is registered, it is opened and cached. Subsequent processes dup() the cached fd for dwfl ownership transfer. elfutils uses MAP_PRIVATE mmaps so the OS page cache already deduplicates physical pages; the dup() avoids redundant open() syscalls and fd exhaustion.

Add STATS_ELF_FDS metric (elf.fds) to track the fd cache size each profiling cycle, observable in the notice-level stats output.

Motivation

@bwoebi reported an issue with forking processes

Additional Notes

NA

How to test the change?

I straced a process that was forking before and after the fix:

  - Buggy: 93 openat() calls on ELF files — same shared libraries opened repeatedly, once per process                                                                                                       
  - Fixed: 17 openat() calls — each unique ELF file opened exactly once, dup'd for all 20 forked processes                                                                                                                                                                                                                                                                                                       

@pr-commenter
Copy link
Copy Markdown

pr-commenter bot commented Mar 27, 2026

Benchmark results for collatz

Parameters

Baseline Candidate
config baseline candidate
profiler-version ddprof 0.24.0+6a698080.104358164 ddprof 0.24.0+95e24c76.104834675

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 1 metrics, 0 unstable metrics.

See unchanged results
scenario Δ mean execution_time
scenario:ddprof -S bench-collatz --preset cpu_only collatz_runner.sh same

@pr-commenter
Copy link
Copy Markdown

pr-commenter bot commented Mar 27, 2026

Benchmark results for BadBoggleSolver_run

Parameters

Baseline Candidate
config baseline candidate
profiler-version ddprof 0.24.0+6a698080.104358164 ddprof 0.24.0+95e24c76.104834675

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 1 metrics, 0 unstable metrics.

See unchanged results
scenario Δ mean execution_time
scenario:ddprof -S bench-bad-boggle-solver BadBoggleSolver_run work 1000 same

When profiling many forked processes sharing the same ELF files (e.g.
libc), report_module() was opening a fresh fd for each process's Dwfl
session. With N forks and M shared libraries, this resulted in N*M open
file descriptors.

Fix: add _fd_cache (unordered_map<FileInfoId_t, UniqueFd>) to DsoHdr,
which is shared across all processes in UnwindState. The first time an
ELF file is registered, it is opened and cached. Subsequent processes
dup() the cached fd for dwfl ownership transfer. elfutils uses
MAP_PRIVATE mmaps so the OS page cache already deduplicates physical
pages; the dup() avoids redundant open() syscalls and fd exhaustion.

Add STATS_ELF_FDS metric (elf.fds) to track the fd cache size each
profiling cycle, observable in the notice-level stats output.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@r1viollet r1viollet force-pushed the r1viollet/elf-fd-cache-per-file branch from 4519337 to 95e24c7 Compare March 27, 2026 18:43
// dup with O_CLOEXEC preserved: plain dup() drops close-on-exec, which
// would leak ELF fds into exec'd children. F_DUPFD_CLOEXEC atomically
// duplicates and sets FD_CLOEXEC, matching the original open(O_CLOEXEC).
UniqueFd fd_holder{::fcntl(cached_fd.get(), F_DUPFD_CLOEXEC, 0)};
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not very useful to add this (cloexec)

@r1viollet r1viollet marked this pull request as ready for review March 27, 2026 18:51
@r1viollet r1viollet requested a review from nsavoire as a code owner March 27, 2026 18:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant