This is an archive of the discontinued LLVM Phabricator instance.

Fix -ftime-trace breaking flame-graph assumptions
ClosedPublic

Authored by russell.gallop on Aug 19 2019, 3:42 AM.

Details

Summary

-ftime-trace could break flame-graph assumptions on Windows, with an inner scope overrunning outer scopes. This was due to the way that times were truncated. Changing this so time points for the flame-graph are truncated, instead of durations, preserves the relative order of event starts and ends.

I have tried to retain the extra precision for the totals, which count thousands or millions of events.

Added assert to check this property holds in future.

Fixes: https://bugs.llvm.org/show_bug.cgi?id=43043

Diff Detail

Event Timeline

russell.gallop created this revision.Aug 19 2019, 3:42 AM
russell.gallop set the repository for this revision to rG LLVM Github Monorepo.Aug 19 2019, 7:58 AM
rnk added a comment.Aug 19 2019, 1:57 PM

Code looks good.

llvm/lib/Support/TimeProfiler.cpp
117–118

nit: "microsecond precision" instead of "us precision", I read it wrong. Variables names seem clear, though.

llvm/unittests/Support/TimeProfilerTest.cpp
19 ↗(On Diff #215852)

I'm not sure this test has that much value because it will not fail deterministically if you revert the fix to bring back the truncation bugs.

Fixed comment

russell.gallop added inline comments.Aug 20 2019, 7:52 AM
llvm/unittests/Support/TimeProfilerTest.cpp
19 ↗(On Diff #215852)

Thanks for the review. I agree it's not ideal, this was the best way I could think of for testing this behaviour.

It might be possible to set the time values to appropriate values to trigger this but that would require exposing a fair bit more of TimeProfiler.cpp.

One alternative might be to have a check in Write() that reconstructed the flame graph and asserted if there was a problem. It would require a very large file (or a lot of files) to reliably hit the issue though.

Or is this okay to send without a test?

rnk added inline comments.Aug 20 2019, 11:56 AM
llvm/unittests/Support/TimeProfilerTest.cpp
19 ↗(On Diff #215852)

Test aside, adding the assert seems valuable. As it is written now, it should hold unless the steady clock lies, right?

For the test, how long does the test normally take? If it takes more than 8s-10s, I think I would prefer to remove it and commit this without a test.

russell.gallop marked an inline comment as done.Aug 21 2019, 7:59 AM
russell.gallop added inline comments.
llvm/unittests/Support/TimeProfilerTest.cpp
19 ↗(On Diff #215852)

Test aside, adding the assert seems valuable. As it is written now, it should hold unless the steady clock lies, right?

Yes, I think so. I'll try to add that.

For the test, how long does the test normally take? If it takes more than 8s-10s, I think I would prefer to remove it and commit this without a test.

I measure 5-7 seconds in Release which is quite a large increase to SupportTests.exe (previously 10s).

Added structure check on time profile and assertion.

Without the fix, this assertion fires compiling a single moderately large source file (~30 seconds). With the fix it compiles fine (even with finer granularity (I tried down to 5 microseconds). I think that this is probably a better check than TimeProfilerTest.cpp so, unless anyone disagrees, I'll remove that and just commit the changes to TimeProfiler.cpp.

mgrang added inline comments.Aug 22 2019, 10:02 AM
llvm/lib/Support/TimeProfiler.cpp
112

You can use the range-based version of llvm:sort here.

llvm::sort(SortedEntries);
russell.gallop marked 2 inline comments as done.
russell.gallop added inline comments.
llvm/lib/Support/TimeProfiler.cpp
112

Thanks, fixed.

russell.gallop marked an inline comment as done.
russell.gallop edited the summary of this revision. (Show Details)

Fix some style issues and remove unit test.

russell.gallop marked an inline comment as done.
russell.gallop edited the summary of this revision. (Show Details)

Realised that this could be done with just an assert (no separate check function) so this is a much smaller patch now. This still mostly hits the assertion with the previous way of calculating duration (duration_cast). I think the extra code in end() may make it slightly less likely that this case happens, but only with assertions enabled.

Please take another look.

rnk accepted this revision.Sep 4 2019, 1:13 PM

lgtm, sorry for the delay

This revision is now accepted and ready to land.Sep 4 2019, 1:13 PM
This revision was automatically updated to reflect the committed changes.