Test output: buffer the result stream and drop the per-line flush (~6x faster file output) #15
Open
antalvdb wants to merge 1 commit into
Writing test results was dominated by write() syscalls: the default std::ofstream buffer is tiny, so one result per test instance turned into a flood of small writes, and show_results() additionally flushed the stream on every line via std::endl.

Give outStream a 1 MB buffer (a per-experiment member, set with rdbuf()->pubsetbuf() before open()) and write '\n' instead of std::endl in show_results(). The stream is flushed when it is closed at the end of testing, so output is unchanged.

Measured (IGTree, 512k test instances written to a file; reused saved base):
- wall time: ~31 s -> ~4.6 s (~17k -> ~111k instances/s, ~6.5x)
- instructions: ~430 B -> ~36 B (~12x)

Output is byte-identical (verified IGTree 512k and TRIBL2, plain and +v db).
pubsetbuf must precede open(); honoured by both libc++ and libstdc++.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Summary
Writing test results is dominated by `write()` syscalls. Profiling a batch test run shows ~82% of the time in `__write_nocancel`: the default `std::ofstream` buffer is tiny, so emitting one result per test instance becomes a flood of small writes, and `show_results()` additionally flushes the stream on every line via `std::endl`.

This change:
- gives `outStream` a 1 MB buffer (a per-experiment member, set with `rdbuf()->pubsetbuf()` before `open()`), so lines batch into a handful of large writes, and
- writes `'\n'` instead of `std::endl` in `show_results()` so the per-line flush doesn't defeat the buffer.

The stream is flushed when it is closed at the end of testing, so the output is unchanged.
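The ordering described above (install the buffer, then open, then write `'\n'`) can be sketched in isolation. This is a minimal illustration, not the code in this PR: `BufferedResultWriter`, `write_results` and `count_lines` are hypothetical names, and the buffer is a `std::vector<char>` member only to make its lifetime obvious.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the per-experiment object; only the ordering matters.
struct BufferedResultWriter {
    std::vector<char> buffer;  // declared first, so it outlives `out`
    std::ofstream out;         // destroyed (and flushed) before `buffer`

    explicit BufferedResultWriter(const std::string& path,
                                  std::size_t size = 1u << 20)  // 1 MB
        : buffer(size) {
        assert(size > 0);
        // Install the buffer BEFORE open(); a later call may be ignored.
        out.rdbuf()->pubsetbuf(buffer.data(),
                               static_cast<std::streamsize>(buffer.size()));
        out.open(path, std::ios::out | std::ios::trunc);
    }

    void write_line(const std::string& line) {
        out << line << '\n';   // newline only, no per-line flush
    }
};

// Writes n dummy result lines; close() flushes whatever is still buffered.
bool write_results(const std::string& path, std::size_t n) {
    BufferedResultWriter w(path);
    for (std::size_t i = 0; i < n; ++i) {
        w.write_line("instance " + std::to_string(i));
    }
    w.out.close();
    return !w.out.fail();
}

// Reads the file back and counts its lines.
std::size_t count_lines(const std::string& path) {
    std::ifstream in(path);
    std::size_t n = 0;
    std::string line;
    while (std::getline(in, line)) ++n;
    return n;
}
```

Calling `write_results("results.out", 1000)` and then `count_lines("results.out")` should give back 1000 lines, which is a small-scale version of the "output is unchanged" claim: nothing is lost by deferring the flush to `close()`.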
Measured impact
IGTree, 512k test instances written to a file (reused saved instance base):

- wall time: ~31 s → ~4.6 s (~17k → ~111k instances/s)
- instructions: ~430 B → ~36 B

≈ 6.5× faster wall-clock, ≈ 12× fewer instructions. Both changes matter: the buffer alone takes ~430 B → ~47 B (even with `endl`), and `'\n'` then takes ~47 B → ~36 B; either change alone with the default buffer makes no difference. As a side effect, this shows the actual IGTree classification is only ~17 B instructions for 512k (~33k each); the rest was write-syscall overhead.

Correctness
Output is byte-identical and complete, verified on IGTree (512k lines) and TRIBL2 (20k), both plain and with `+v db` (distributions printed).

Notes
- `pubsetbuf` must precede `open()` to take effect; this is honoured by both libc++ and libstdc++.
- … (`NS_Test`) still uses `endl`; only the common `show_results()` path is changed here.

Posting for your consideration; happy to adjust (e.g. buffer size, or making it configurable).
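For anyone weighing whether the remaining `endl` call sites are worth changing, the difference between `std::endl` and `'\n'` can be observed directly by counting flush requests. The sketch below is illustrative only and is not part of this PR: `CountingBuf`, `flushes_with_endl` and `flushes_with_newline` are hypothetical names, and the stream buffer discards its input rather than writing to a file.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <streambuf>

// Stream buffer that discards characters and counts flush requests.
class CountingBuf : public std::streambuf {
public:
    std::size_t flushes = 0;
    std::size_t chars = 0;

protected:
    // No put area is set, so every character arrives here.
    int_type overflow(int_type ch) override {
        if (!traits_type::eq_int_type(ch, traits_type::eof())) ++chars;
        return traits_type::not_eof(ch);
    }
    // Called by ostream::flush(), which std::endl invokes after the newline.
    int sync() override {
        ++flushes;
        return 0;
    }
};

// Number of flush requests when each line ends with std::endl.
std::size_t flushes_with_endl(std::size_t lines) {
    CountingBuf buf;
    std::ostream os(&buf);
    for (std::size_t i = 0; i < lines; ++i) os << "result" << std::endl;
    return buf.flushes;
}

// Number of flush requests when each line ends with a plain '\n'.
std::size_t flushes_with_newline(std::size_t lines) {
    CountingBuf buf;
    std::ostream os(&buf);
    for (std::size_t i = 0; i < lines; ++i) os << "result" << '\n';
    return buf.flushes;
}
```

With `std::endl` there is one flush request per line; with `'\n'` there are none until the stream is flushed or closed explicitly. On a file stream each of those flush requests can turn into a `write()` call, which is the per-line cost this PR removes from `show_results()`.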
🤖 Generated with Claude Code