This is an archive of the discontinued LLVM Phabricator instance.

[XRay] [compiler-rt] - Fix standalone and non-deterministic test issue
ClosedPublic

Authored by kpw on Apr 19 2017, 5:04 PM.

Details

Summary

The thread order test fails sometimes my machine independently of standalone
build.

From testing both standalone and in-tree build, I see I configured it wrong.

The other hypothesis for an issue is that cold starts can interfere with whether
record unwriting happens. Once this happens more than once, we can naively
FileCheck on the wrong test output, which compounds the issue.

While "rm blah.* || true" will print to stderr if the glob can't expand, this is
mostly harmless and makes sure earlier failing tests don't sabotage us.

Example failure:


header:

version:         1
type:            1
constant-tsc:    true
nonstop-tsc:     true
cycle-frequency: 3800000000

records:

  • { type: 0, func-id: 1, function: 'f1()', cpu: 9, thread: 21377, kind: function-enter, tsc: 2413745203147228 }
  • { type: 0, func-id: 1, function: 'f1()', cpu: 9, thread: 21377, kind: function-exit, tsc: 2413745203304238 }

...

The CMAKE related change fixes the expectation that COMPILER_RT_STANDALONE_BUILD will be explicitly FALSE instead
of empty string when it is not "TRUE".

Event Timeline

kpw created this revision.Apr 19 2017, 5:04 PM
kpw added a comment.Apr 19 2017, 5:27 PM

With a standalone build, check-xray produces

Testing Time: 0.31s

Expected Passes    : 6
Unsupported Tests  : 2

With an in-tree build, the results are

Testing Time: 4.62s

Expected Passes    : 7
Expected Failures  : 1
dberris edited edge metadata.Apr 19 2017, 7:04 PM

Interesting. I think this test is flaky by construction either due to a race condition (f2 may have already run before the patching happened), or there's some other thing causing f2's records to not show up (maybe something to do with the changes made recently to check the cpu frequency the first time the handler is called).

On a related note, do we have a test case in the FDR log reader tests that checks what we do when for example we cannot find the entry record for an unmatched exit record?

FWIW, I cannot reproduce this locally in a consistent manner (the failure) but I have seen it happen a couple of times.

kpw added a comment.Apr 19 2017, 7:06 PM

I've confirmed this test is buggy in the writer. The reader sees the EOB record before any function records for one of the threads.

kpw added a comment.Apr 19 2017, 7:23 PM

On a related note, do we have a test case in the FDR log reader tests that checks what we do when for example we cannot find the entry record for an unmatched exit record?

We don't have that, but the reader will allow that currently. We may have to be careful about assertions like that due to potential StronglyConnectedComponent optimizations. I don't know if optimizations would go so far as to jump to an alternative prologue that saves less registers or something crazy like that. Our analysis would be hosed for at least a portion of the program if that happened though. Where is it better to fail?

I think this test is flaky by construction either due to a race condition (f2 may have already run before the patching happened), or there's some other thing causing f2's records to not show up (maybe something to do with the changes made recently to check the cpu frequency the first time the handler is called).

It's possible that added I/O and indeterminism tip the function over the unwriting heuristic. There is definitely a cold start for the first function, it makes calls to get metadata in order.
Adding xray_fdr_log_func_duration_threshold_us to that test is probably a good idea, but it doesn't make the test pass for me. The number of ticks to qualify for unwrite could be behaving weird if that data isn't managed across threads correctly though...

kpw updated this revision to Diff 95875.Apr 19 2017, 7:57 PM

My best guess is that the problem is test speed unwriting determinism coupled with
leftover test garbage. This relies on bash being the shell run by RUN, but works
whether or not there is garbage from a previous failure.

Bash traps might be a way to get "finally" like behavior in lit scripts in the future.

dberris accepted this revision.Apr 19 2017, 8:10 PM

LGTM with one nit.

test/xray/TestCases/Linux/fdr-thread-order.cc
2

I don't think we can make the assumption that bash is the shell, but unconditionally removing the files anyway would be fine at this point.

This revision is now accepted and ready to land.Apr 19 2017, 8:10 PM
kpw updated this revision to Diff 95885.Apr 19 2017, 10:04 PM

Using only posix features instead of bash's compdef.

The command will print to stderr, but that's harmless and better than relying on the
location of /dev/null.

Btw, none of the xray tests in the Linux/X86 folder work with the lit internal
shell.
This mostly comes down to the absence of variable assignment for setting flags
on our test invocations. This is actually OK because the tests already target
Linux.

LGTM -- thanks @kpw!

kpw retitled this revision from [XRay] [compiler-rt] - Fix standalone build test and XFAIL a test. to [XRay] [compiler-rt] - Fix standalone and non-deterministic test issue.Apr 19 2017, 11:03 PM
kpw edited the summary of this revision. (Show Details)
This revision was automatically updated to reflect the committed changes.