
Lower llvm.objectsize earlier in our optimization pipeline
Needs ReviewPublic

Authored by george.burgess.iv on Jul 9 2018, 3:04 PM.

Details

Reviewers
efriedma
jfb
Summary

For context, please see the email thread "cleaning up ‘br i1 false’ cases in CodeGenPrepare".

I don't know if InstCombine is a great place for this, or if we'd prefer to have some kind of LowerBestEffortPostOptimizationIntrinsics pass, or ...

Suggestions for how to better test this are appreciated.

To summarize my soon-to-appear response on said email thread: I ran this on a large project that has a clang-tailored FORTIFY implementation. With this change, we fail to lower (i.e. we return failure values for) 1.9% more calls to objectsize, but we also end up lowering quite a few more objectsize intrinsics in total. The most likely explanation seems to be that, today, we're able to DCE most of those "new" failures before hitting CGP. So I think a 1.9% degradation is likely an overstatement.

During that test, I added an llvm_unreachable to CGP's objectsize lowering logic. It wasn't hit.

Diff Detail

Event Timeline

I don't really like adding arbitrary flags to existing passes; better to have a separate pass, I think.

Switch to having a pass to do this, as requested.

> I don't really like adding arbitrary flags to existing passes; better to have a separate pass, I think.

SGTM; better names for the pass are appreciated :)

This approach seems reasonable, but it would be great to have a chance to run some numbers internally to be sure we're not losing something important.

You mentioned a 1.9% degradation with the current placement. Did you happen to try any other locations in the pipeline?

> You mentioned a 1.9% degradation with the current placement. Did you happen to try any other locations in the pipeline?

I did not, though I'd be happy to do so if anyone has recommendations :)

Looking at this, it appears that we may be running this pass too early when LTO is enabled. I'll update this patch shortly to rebase it and to delay running this pass until link time if we're going to be running some flavor of LTO, and I'll collect new accuracy numbers for ${large_project} soonish.

Any other numbers/vetting would be highly appreciated.

Rebase, and only run this in the non-prelink full-LTO pipeline

Apparently I misread the ThinLTO code, so we were fine with that. Full-LTO would lower these before linking, though, which is suboptimal.

There doesn't appear to be an existing full-LTO test case (and I can't figure out how to flip PrepareForLTO without going through e.g. cfe), so making a test for this behavior on full-LTO seems a bit icky. Happy to try to add some cruft to do so if people want.

And numbers for building ${large_project}, as promised:

I built each compiler and ran the numbers twice with the patch and twice without; all of the numbers were stable across reruns.

Across ~29,000 TUs that had at least one attempt by LLVM to lower an @llvm.objectsize intrinsic:

  • With this patch, we lowered 585,692 successfully, and we failed to lower 649,086, for a total of 1,234,778 "forceful" lowerings (i.e. we either returned a successful answer or a "we're giving up; here's a conservative answer" value). This is a 52.6% failure rate.
  • Without this patch, we lowered 589,559 successfully, and we failed to lower 640,908, for a total of 1,230,467 "forceful" lowerings. This is a 52.1% failure rate.

It's interesting to note that the "with the patch" numbers included 30 more TUs than the "without the patch" numbers. As noted above, a TU only "counts" here if it had one or more objectsize calls that LLVM had to lower, so it would seem that some number of our failed lowerings with the patch sit in functions or blocks of code that would otherwise just get DCE'd later on.

(For users of C/C++/ObjC specifically, I'd like to emphasize that these numbers don't include any calls to __builtin_object_size that clang could lower without the help of LLVM.)