This is an archive of the discontinued LLVM Phabricator instance.

creduce script for clang crashes
ClosedPublic

Authored by akhuang on Mar 7 2019, 4:17 PM.

Details

Summary

Add a script that calls C-Reduce on an input file and given the clang crash script, which is used to generate an interestingness test for C-Reduce.

Diff Detail

Repository
rC Clang

Event Timeline

akhuang created this revision.Mar 7 2019, 4:17 PM
Herald added a project: Restricted Project. · View Herald Transcript
Herald added a subscriber: cfe-commits. · View Herald Transcript
smeenai added a subscriber: smeenai.Mar 7 2019, 4:59 PM

Thanks for this! Functionally, this looks good. My comments are mostly just simplicity/readability nitpicks.

clang/utils/creduce-clang-crash.py
36 ↗(On Diff #189796)

nit: this isn't necessary; with doesn't introduce its own scope

37 ↗(On Diff #189796)

nit , 'r' is implied; please remove

38 ↗(On Diff #189796)

Hrm. Looks like there are cases where these crash reproducers are multiple lines, though the actually meaningful one (to us) is always the last one? If so, it may be cleaner to say cmd = f.readlines()[-1] here

43 ↗(On Diff #189796)

nit: can replace with subprocess.check_output unless we explicitly want to ignore the return value (in which case, we should probably still call wait() anyway?)

49 ↗(On Diff #189796)

please pipes.quote(llvm_not) and pipes.quote(cmd)

54 ↗(On Diff #189796)

Why are we escaping the graves and single-quotes?

Also, nit: when possible with regex strings, please use raw strings. Doing so makes it way easier to reason about what Python's going to transform in the string literal before the regex engine gets to see it.

58 ↗(On Diff #189796)

nit: please pipes.quote instead of adding manual quotes

62 ↗(On Diff #189796)

nit: please simplify to del matches[:-5]

63 ↗(On Diff #189796)

nit: pipes.quote please

76 ↗(On Diff #189796)

nit: probably better to concatenate these; otherwise, I think we'll end up with a lot of spaces between these sentences? e.g.

help="The path to the llvm-not executable. "
     "Required if [...]")
108 ↗(On Diff #189796)

I hate being *that* person, this technically isn't portable Python. I don't honestly know if we care about not-CPython being able to run our code, but the fix is to simply use with open(...) as f: instead of this one-liner.

(Specifically, CPython uses refcounts, so testfile will always be closed promptly, unless CPython decides someday to make files cyclic with themselves. GC'ed python implementations might take a while to flush this file and clean it up, which would be bad...)

113 ↗(On Diff #189796)

Does creduce try and jump out into its own pgid? If not, I think this try/except block can be replaced with sys.exit(subprocess.call([creduce, testfile, file_to_reduce]))

120 ↗(On Diff #189796)

nit: this is implied; please remove

akhuang updated this revision to Diff 190126.Mar 11 2019, 11:11 AM
akhuang marked 15 inline comments as done.

Addressed readability comments

The new diffs should not be relative to the previously uploaded diff, but to the git master/svn trunk.

akhuang added inline comments.Mar 11 2019, 11:15 AM
clang/utils/creduce-clang-crash.py
43 ↗(On Diff #189796)

check_output() raises an error when the return code is nonzero, which it is in this case. I think communicate() calls wait(), though.

49 ↗(On Diff #189796)

cmd is already quoted, since it's read out of another file

joerg added a subscriber: joerg.Mar 11 2019, 2:09 PM

For it to be really useful for the majority of bugs, it would be nice to figure out automatically how to get the preprocessing step done and filter out the # lines afterwards. That part alone significantly cuts down the creduce time.

akhuang updated this revision to Diff 190164.Mar 11 2019, 3:02 PM

fixed diff with style edits

rnk added a subscriber: gbiv.Mar 11 2019, 4:09 PM

@gbiv already got all my shell quoting comments.

I think we should do one more round of fixes, we can commit that for you, and then move on to the next steps.

clang/utils/creduce-clang-crash.py
109 ↗(On Diff #189796)

We could try validating that the interestingness test passes here. If it doesn't, that's a bug in this script, I suppose.

CReduce already does this for the user, but it's not clear with our usage model how to do this.

110 ↗(On Diff #189796)

Let's add a TODO (or FIXME, that's more LLVM-y) here to add a step that runs the full pre-processor with -E and -P here. As we've discussed, it often doesn't work, but when it does, it avoids all those issues with #defines, comments, etc, breaking up topformflat. This doesn't have to be in the first version, of course.

I see @joerg added a comment about this as well.

I think we should do one more round of fixes, we can commit that for you, and then move on to the next steps

+1. This looks good to me with outstanding nits fixed

and filter out the # lines afterwards.

AIUI, the very-recently-merged https://github.com/csmith-project/creduce/pull/183 and https://github.com/csmith-project/creduce/pull/188 should hopefully handle the majority of those?

clang/utils/creduce-clang-crash.py
1 ↗(On Diff #190164)

nit: please specify a python version here, since the world is steadily making python == python3 (if pipes.quote is working, I assume you'll want #!/usr/bin/env python2)

rnk added inline comments.Mar 11 2019, 4:55 PM
clang/utils/creduce-clang-crash.py
1 ↗(On Diff #190164)

Please don't do that, there is no python2.exe or python3.exe on Windows. #!/usr/bin/env python is the simplest thing that could work.

akhuang updated this revision to Diff 190189.Mar 11 2019, 5:11 PM

add interestingness test sanity check;
revive ctrl-c hack

thakis added a subscriber: thakis.Mar 11 2019, 6:23 PM

(A few python style comments; feel free to ignore, and feel free to land no matter what you do with the comments.)

clang/utils/creduce-clang-crash.py
1 ↗(On Diff #190164)

There's no python2 on mac either. #!/usr/bin/env python is the correct first line for executable python scripts.

29 ↗(On Diff #190189)

There's distutils.spawn.find_executable() which does this for you.

68 ↗(On Diff #190189)

It's a somewhat common pattern to put a module docstring at the top of the file instead of a comment which explains usage, and then say description=__doc__ here; see llvm/utils/gn/build/symlink_or_copy.py for an example.

90 ↗(On Diff #190189)

It's a bit more common to say return 1 here and do sys.exit(main()) at the bottom.

akhuang updated this revision to Diff 190285.Mar 12 2019, 9:46 AM
akhuang marked 6 inline comments as done.

style edits

I think that addresses all of the concerns people have put forward; given rnk's comment about one more round of fixes, this LGTM. Will check this in for you shortly.

Thanks again!

clang/utils/creduce-clang-crash.py
1 ↗(On Diff #190164)

TIL. Thank you both for the counterexamples :)

This revision is now accepted and ready to land.Mar 12 2019, 10:29 AM
This revision was automatically updated to reflect the committed changes.

In case anyone is interested, for the CHERI fork of LLVM/Clang I added a similar script that contains additional features such as inferring the crash message (so that you get the minimal reproducer for the issue that you are trying to reduce and not some obscure parser crash for invalid input), dealing with infinite loops, generating a test case with a mostly sensible RUN: line.
Furthermore, it will attempt to use bugpoint if it is a backend rather than a frontend crash since that is much much faster than creduce.
It should allows you to run $LLVM_BUILDIR/bin/creduce_crash_testcase.py /tmp/clang-reproducers.sh and then give a sensible minimal test case back

I can try to clean that up and remove the CHERI-specific compiler flags and llvm-lit substitutions.

For reference that script can be found here: https://github.com/CTSRD-CHERI/llvm-project/blob/master/clang/utils/creduce_crash_testcase.py