Page MenuHomePhabricator

Add a pass to lower is.constant and objectsize intrinsics
ClosedPublic

Authored by joerg on Jul 25 2019, 6:51 AM.

Details

Summary

This pass lowers the remaining is.constant and objectsize intrinsics. This moves the lowering from the codegen-prepare pass and the fallback in the SDAG/GlobalISel/FastISel layers.

The API for replaceAndRecursivelySimplify is extended to provide a list of un-modified instructions. Those are scanned for conditional branches with now invariant condition.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
Herald added a project: Restricted Project. · View Herald TranscriptJul 25 2019, 6:51 AM
joerg updated this revision to Diff 212070.Jul 27 2019, 1:22 PM

Avoid goto. Create new BranchInst instead of modifying in-place. Update tests to reflect changes. Move most of the x86 is-constant test to generic.

Couple of random comments, haven't looked at most of it yet.

include/llvm/Analysis/InstructionSimplify.h
258

Really would like to avoid boolean arguments like this.

lib/Analysis/InstructionSimplify.cpp
5047

This seems to return nullptr everywhere?

arsenm added a subscriber: arsenm.Jul 29 2019, 1:39 PM
arsenm added inline comments.
lib/Transforms/Scalar/LowerIsConstantIntrinsic.cpp
51

Should early exit if there are is no is_constant declaration, or just visit uses of it rather than search the whole function for a rarely used intrinsic?

void added a comment.Jul 29 2019, 1:42 PM

Feel free to hijack my revision. :-) I may not get back to my change soon.

joerg marked 3 inline comments as done.Jul 29 2019, 3:01 PM
joerg added inline comments.
include/llvm/Analysis/InstructionSimplify.h
258

Right, this is the one big part of the change I am absolutely not sure about. I was considering passing a pointer to a vector down to replaceAndRecursivelySimplify. The contract would be to return the work list of all modified uses. The caller can then iterate the list and see if any branch instructions are affected.

lib/Analysis/InstructionSimplify.cpp
5047

It does, trying to preserve the style of the other functions here. The question might become irrelevant when going with the work list idea.

lib/Transforms/Scalar/LowerIsConstantIntrinsic.cpp
51

As mentioned on IRC, llvm.is.constant is a family of intrinsics depending on the type of the argument. There doesn't seem to be an easy way to check if a given template-like intrinsic is used.

joerg updated this revision to Diff 212341.Jul 30 2019, 7:45 AM
joerg retitled this revision from Add a pass to lower is.constant intrinsics Needs ReviewPublic to Add a pass to lower is.constant intrinsics.
joerg edited the summary of this revision. (Show Details)

Replace boolean argument to replaceAndRecursivelySimplify with optional vector of un-modified instructions. This simplifies the API change significantly and allows other potential use cases. Redo the restart handling. After a successful simplification step, restart the current BB only, but always do another full pass of the function.

joerg marked 2 inline comments as done and an inline comment as not done.Aug 1 2019, 6:49 AM

I think the pass needs to handle the removal of any remaining llvm.objectsize, as well, so that llvm.is.constant(llvm.objectsize(...)) continues to return true -- even if the object size cannot be determined.

With this new pass, I think we should be able to remove the code handling the objectsize and is.constant instrinsics in CodegenPrepare, FastISel, SelectionDAGBuilder.cpp, and IRTranslator. Which sure is a nice cleanup.

test/Transforms/LowerIsConstant/is-constant.ll
1

While this is a more targeted run, the old asserts are checking what _actually_ needs to continue to work.

E.g. I'd be more confident that my suggested removal of the handling code elsewhere was correct if this test still ran the original 4 tests.

joerg updated this revision to Diff 212867.Aug 1 2019, 11:34 AM

Update PHI nodes in disconnected block.

joerg updated this revision to Diff 213201.Aug 3 2019, 1:04 PM
joerg retitled this revision from Add a pass to lower is.constant intrinsics to Add a pass to lower is.constant and objectsize intrinsics.
joerg edited the summary of this revision. (Show Details)

Generalize slightly to also cover llvm.objectsize which has very similar constraints.

chandlerc requested changes to this revision.Aug 3 2019, 6:54 PM

Generally, I do like the approach. Two high level comments:

First, I don't think we can crash if these things reach lower layers. I think we should retain the logic to just fold them to a constant if somehow they show up. Anything else makes things weird -- the IR is valid, passes the verifier, but crashes? We want to support fuzzing and bisecting and such and so it should be possible to rip this pass out and still have things lower successfully. This pass should just be a much more *advanced* way of lowering these intrinsics. And yes, I understand that in some bizarre cases failing to do this "optimization" will result in inline assembly that cannot be emitted or some such. While I find that really frustrating as well, I still think it is better than making even more boring uses of these intrinsics hostile to things like fuzzers.

Second, I don't think you need the fancy approach you're taking to find the branches that need simplification. How about the following approach:

  1. Walk every instruction and if it is one of these instrinsics, fold it and then recursively simplify.
  1. If we simplify any intrinsics, but after simplifying *all* of them found in the function, walk every terminator in every block and if it is a conditional branch with a constant condition, fold the branch.

I don't think this will be too wasteful, and seems much simpler.

This revision now requires changes to proceed.Aug 3 2019, 6:54 PM
joerg added a comment.Aug 4 2019, 7:28 AM

For the first part, I was actually asked to do that. I don't mind either way.

For the second part, it seems to lead to inconsistent behavior. If we want to go down that road, it seems to me that this behavior should either be a separate branch or be folded into the existing unreachable block removal pass.

joerg added a comment.Aug 5 2019, 6:54 AM

Looking a bit more into the details. Chandler, you've originally suggested going with the LowerAtomic route and that actually does create code that fails the SDAG lowering if the pass is skipped, e.g. on ARM.
The second part is currently overlapping with the CodeGenPrepare pass. I can cleanup the implementation somewhat by reusing the same functionality as that pass is using OR I could factor out a minimal branch for doing the constant folding optimization from CodeGenPrepare as a general branch that is included in the pass chain for -O0, e.g. instead of the more general CodeGenPrepare pass. The main difference is that the non-optimized build would not get any recursive folding from PHI-simplification, but I think that's fine for the original use case. It would also not get the block merging, but again, that seems to be fine for the constraints.

Looking a bit more into the details. Chandler, you've originally suggested going with the LowerAtomic route and that actually does create code that fails the SDAG lowering if the pass is skipped, e.g. on ARM.

Sorry, I had forgotten about that discussion. Thank you for digging it up and reminding me to be consistent with my past self.

If this pass is going into the llc pipeline like the lower atomic pass, you're right it totally makes sense to fail hard in SDAG lowering. Sorry for the confusion again.

The second part is currently overlapping with the CodeGenPrepare pass. I can cleanup the implementation somewhat by reusing the same functionality as that pass is using OR I could factor out a minimal branch for doing the constant folding optimization from CodeGenPrepare as a general branch that is included in the pass chain for -O0, e.g. instead of the more general CodeGenPrepare pass. The main difference is that the non-optimized build would not get any recursive folding from PHI-simplification, but I think that's fine for the original use case. It would also not get the block merging, but again, that seems to be fine for the constraints.

I'm not really sure I understand the problem?

This pass needs to run (at least) in the same general area as the atomics lowering for the same reasons (it would be part of SDAG but it can't be due to CFG mutation). It might optionally be run in other places of course.

I think you have all the code you need in this patch already. I'm just suggesting a different strategy for applying it to the IR that I think will be simpler (and not require any changes to instsimplify). Maybe my suggestion isn't clear? If so I can try to clarify....

void added a comment.Aug 13 2019, 3:03 AM

This PR is possibly related to the original issue: https://bugs.llvm.org/show_bug.cgi?id=42956

Thanks to Joerg for some useful discussion on IRC -- there was a concern I hadn't thought about that is exactly right: we somewhat want this pass to minimally disrupt things but also to be reasonably self contained.

Based on the discussion, I think the patch can still be simplified a bit though (although not as much as I had originally suggested). Notably, the additional API for inst simplify makes a lot more sense to me now.

I'd suggest thinking about this in three phases.

First, simplify the instructions to constants. Here, you would keep the new API to extract a list of potentially relevant instructions. After each simplification, scan this and append any branches or switches to a list of terminators needing an update. (May want it to be a set vector so you don't duplicate.) But don't actually update the CFG in any way during this phase.

Second, rewrite the branches and switches collected in the first phase, collecting basic blocks that might be made unreachable in another SetVector. This just mutates the CFG but doesn't delete any blocks and so doesn't make any of the instructions invalid to visit. You should be able to collect a batch of domtree updates during this phase as well without applying them eagerly.

Third, apply the domtree updates (so you have precise reachability) and remove any blocks from the list that are still reachable. Hand the remaining blocks to the DeleteBasicBlocks utility to completely remove them. This will remove the implicit dependence between this pass and the later codegen passes which I think is much cleaner.

I think that will still remove the need to iterate, and will enhance this to actually fully remove the blocks orphaned by the folding of these branches (but *only* those blocks). Thoughts?

joerg updated this revision to Diff 216822.Aug 23 2019, 6:27 AM

Simplify the pass logic. First round will update the predecessor links and note if any block is orphaned, a second round will remove unreachable blocks if necessary.
The pass can be further refined to use and update DominatorTree and AssumptionCache incrementally, but this should be functionally complete now. It will not handle more complex cases like orphaned loops, but I don't think those are commonly used with is.constant or objectsize conditions either.
The pass will now scan every BB once, but fall back to the start of a BB of recursive removal invalidates the iterator. This seems to be the strictest form I can manage.

joerg updated this revision to Diff 217160.Aug 26 2019, 8:22 AM

Hook up a second instance of the pass after the Float To Int pass for optimized builds. This is after the initial loop transforms, so it can profit from some unrolling, but it is before vectorization. The late run of the pass should is kept for now and ensures any potentially added variants are still dropped before SDAG.

joerg added a comment.Aug 27 2019, 7:03 AM

Chandler, are you OK with getting the InstructionSimplify.h part in now, so that it can be merged into 9.0 and the rest follow separately?

Chandler, are you OK with getting the InstructionSimplify.h part in now, so that it can be merged into 9.0 and the rest follow separately?

Can we not get the entire thing merged? I'd really like that... I think the patch is actually really close. I have a bunch of comments below but they're all pretty boring in reality.

include/llvm/Analysis/InstructionSimplify.h
265–266

I'd call this UnsimplifiedUsers.

lib/Analysis/InstructionSimplify.cpp
5242–5243

I'd make this a bit more precise...

"Recursively visited users which do not themselves simplify are ..."

5251

If this is what clang-format did with this... I am sad. If not, clang-format?

lib/Transforms/Scalar/LowerConstantIntrinsics.cpp
41 ↗(On Diff #217160)

No #include after usings and statistics and such please.

45–51 ↗(On Diff #217160)

Seems simpler to write as:

return isa<Constant>(Op) ? ConstantInt::getTrue(II->getType())
                         : ConstantInt::getFalse(II->getType());
54–55 ↗(On Diff #217160)

clang-format please.

I'd also call this replaceConditionalBranchesOnConstant.

113–117 ↗(On Diff #217160)

As I tried to indicate before, this can be even simpler AFAICT.

I think you should keep a single worklist of unsimplified users. Then you should just do the recursive inst simplify here using the same worklist each time. You'll want to change the loop to go in RPO over the function rather than in the naive order.

Then below this loop you will have a single worklist of unsimplified users that you can walk with the exact code you have above AFAICT.

hans added a subscriber: hans.Aug 28 2019, 7:14 AM

Can we not get the entire thing merged? I'd really like that... I think the patch is actually really close. I have a bunch of comments below but they're all pretty boring in reality.

It's really late in the process -- as in I'm only really waiting for this and one other bug -- so I'm hesitant to take a large change, but I also don't really understand exactly what's at stake here.

Can we not get the entire thing merged? I'd really like that... I think the patch is actually really close. I have a bunch of comments below but they're all pretty boring in reality.

It's really late in the process -- as in I'm only really waiting for this and one other bug -- so I'm hesitant to take a large change, but I also don't really understand exactly what's at stake here.

I'm also against putting it into 9.0, which is supposed to have a final release Real Soon Now. This is not a obviously-correct change, and it should bake in trunk for at least a couple weeks before going into the release, to shake out any unexpected problems.

I think we should target this for the 9.0.1 patch though, which is why joerg wants to merge the API change to 9.0 now.

hans added a comment.Aug 28 2019, 7:43 AM

I'm also against putting it into 9.0, which is supposed to have a final release Real Soon Now. This is not a obviously-correct change, and it should bake in trunk for at least a couple weeks before going into the release, to shake out any unexpected problems.

I think we should target this for the 9.0.1 patch though, which is why joerg wants to merge the API change to 9.0 now.

This sounds good to me. From my point of view, ideally the API change would be broken out and landed on trunk as soon as possible, and then I'd merge that change to the 9 branch.

joerg updated this revision to Diff 217656.Aug 28 2019, 8:00 AM
joerg marked 5 inline comments as done.

Apply feedback.

joerg marked an inline comment as not done.Aug 28 2019, 8:02 AM
joerg added inline comments.
lib/Transforms/Scalar/LowerConstantIntrinsics.cpp
113–117 ↗(On Diff #217160)

Inside a BB, we have to use forward order for the processing so that the intrinsic calls are removed with the expected results. This means that the recursive simplification can remove the next instruction (InstPtr) or unhook it (no parent). Both cases are covered by the test cases. I don't see how a different iteration order inside the BB can work or avoid the problem.

Updating the CFG directly can help in those cases were PHIs are removed by removePredecessor. That seems generally desirable as property. So the question is which iteration order for the BBs to use. BB iteration order can change the result, but I'm not sure computing RPO is worth the hassle here? I'm leaving the actual removing of the dead block to removeUnreachableBlocks, especially since I don't want to open-code recursive removal and other edge cases.

hans added a comment.Aug 30 2019, 2:07 AM

I've merged the InstructionSimplify.h part of this (r370355) to release_90 in r370447.

joerg updated this revision to Diff 219444.Sep 9 2019, 3:39 PM

Switch the pass to use two rounds. The first round will collect all relevant intrinsics in RPO, the second one will translate them accordingly.

joerg marked 2 inline comments as done.Sep 9 2019, 3:40 PM
joerg added a comment.Tue, Oct 1, 1:20 PM

2nd ping. Chandler, care to check this please?

(Tried to get this out last weekend, but was blocked by the Phab down time... Sorry about that ...)

Mostly nits around the exact code here. The approach looks really nice now (and sorry it took so many iterations to get there).

lib/Passes/PassRegistry.def
189

Maybe lower-constant-intrinsics as a name? (Since it handles objectsize as well.

lib/Transforms/Scalar/LowerConstantIntrinsics.cpp
93 ↗(On Diff #219444)

Use an early continue to reduce indentation?

96 ↗(On Diff #219444)

Odd to continue here but break below. Doesn't matter in this case of course, but just seemed surprising.

105–106 ↗(On Diff #219444)

FWIW, this doesn't skip anything, the loop has the same behavior.

112–117 ↗(On Diff #219444)

For both the II thing and the default case -- do we really expect these to ever fail?

I would expect either the VH to be null, or for it to definitively be one of the two intrinsics we added. Maybe switch to cast_or_null above with VN.get() or some such, and llvm_unreachable on the default case.

joerg updated this revision to Diff 224563.Fri, Oct 11, 4:41 AM
joerg marked 6 inline comments as done.

Adjust based on comments.

joerg added inline comments.Fri, Oct 11, 4:55 AM
lib/Transforms/Scalar/LowerConstantIntrinsics.cpp
105–106 ↗(On Diff #219444)

It was primarily to get the correct return value, but I'm changing it to push the check to the final return.

112–117 ↗(On Diff #219444)

Yes, the same concerns as with the earlier version still apply. The recursive simplification can change the instruction type in place or remove it. The logic is still simpler since no new instructions can appear.

chandlerc accepted this revision.Sun, Oct 13, 12:42 AM

FWIW, the adjustments I'm suggesting around tightening the logic can easily be in a follow-up patch if you like. I think generally the code LGTM and I'd just like us to pin down exactly what changes we expect to happen w/ the handles as much as possible to avoid subtle latent bugs creeping in and never getting noticed.

The other two are trivial, feel free to land w/ those fixed.

lib/Transforms/Scalar/LowerConstantIntrinsics.cpp
112–117 ↗(On Diff #219444)

I'm really surprised that it can *change* the value handle in this way.

I guess because we're using a tracking value handle (is that really necessary?) they may be moved onto the constant, but IMO that'd be more cleanly handled by checking for the value handle being either null or a non-instruction value. If its an instruction, it should really only be one of these two intrinsics or something deeply wrong has happened elsewhere, no?

I'm mostly suggesting we assert on that to track down the strange behavior and make sure the overall logic is actually still correct if it comes up rather than potentially hiding a deeper bug.

test/Transforms/LowerConstantIntrinsics/crash-on-large-allocas.ll
1 ↗(On Diff #224563)

Probable just one - is fine?

test/Transforms/LowerConstantIntrinsics/objectsize_basic.ll
1 ↗(On Diff #224563)

Probably just one - is fine?

This revision is now accepted and ready to land.Sun, Oct 13, 12:42 AM
This revision was automatically updated to reflect the committed changes.

Sorry, this change broke the build (http://lab.llvm.org:8011/builders/clang-x86_64-debian-fast/builds/19218) and I reverted it in r374768.