Partial rewrite based on Hal's suggestions. I now handle the case of constant i1 function arguments as well as multiple i1 sinks per function. The style violations noted by Hal have also been resolved.
Diff Detail
Event Timeline
Please upload the patch with full context, see: http://llvm.org/docs/Phabricator.html#requesting-a-review-via-the-web-interface
As a general comment, I'd make this a bit more general. The core problem, as you know, if that the logic that takes care of eliminating unnecessary round-tripping through crbit regs is in PPCISelLowering, and thus is BB local. Using crbits makes a lot of sense only when some input comes from a comparison, or feeds a conditional branch. If none of the i1 values comes from a comparison, and none feeds a branch, then it seems likely we'll have unnecessary round-tripping through crbit regs.
Thus, it seems like you could start with all i1 consumers that are not selects or conditional branches (returns, function arguments, [zs]exts), and build a tree of defines all the way back to the initial producers. Unless one of the producers is a comparison, then promote everything to i32.
lib/Transforms/Scalar/BoolRetToInt.cpp | ||
---|---|---|
42 ↗ | (On Diff #38416) | Why are you using SetVector here? We normally use SmallVector and SmallPtrSet for worklist iterations. Also, you can use pop_back_val below. |
75 ↗ | (On Diff #38416) | Use llvm_unreachable. |
93 ↗ | (On Diff #38416) | There is a lot of weird indentation in this file (tabs?). Please fit it. Running clang-format over the file is the easiest way. |
97 ↗ | (On Diff #38416) | You probably need to ignore debug intrinsics here (at least). |
100 ↗ | (On Diff #38416) | Local variables in LLVM start with a capital letter. |
118 ↗ | (On Diff #38416) | This needs to be filled in appropriately. I assume it can have at least: AU.addPreserved<DominatorTreeWrapperPass>(); |
test/Transforms/BoolRetToInt/BoolRetToIntTest.ll | ||
7 ↗ | (On Diff #38416) | Use CHECK-LABEL for the function-name matching |
15 ↗ | (On Diff #38416) | CHECK-LABEL |
48 ↗ | (On Diff #38416) | Please remove unnecessary attributes and metadata. |
A couple general comments, in addition to the inline comments below:
- Could you please add the LLVM::Statistics class to collect some stats on how many times we are changing bools to ints
- Are there any pass dependencies that need to be specified for this pass?
include/llvm/LinkAllPasses.h | ||
---|---|---|
68 ↗ | (On Diff #39447) | Spacing is off here |
lib/Target/PowerPC/PPCTargetMachine.cpp | ||
289 | This placement will make this one of the first passes that gets runs, won't it? Is this where we want to run it? I admit I haven't thought through the different placements for this, but for some reason I was expecting it to run later (not sure why). | |
lib/Transforms/Scalar/BoolRetToInt.cpp | ||
130 ↗ | (On Diff #39447) | Why not make PromotablePHINodes a private member of BoolRetToInt? |
I have added some llvm::Statistics to collect data on how frequently booleans are promoted to integers, and fixed the spacing irregularity.
I feel like you've not addressed this, and this pass is still much too focused on a particular special case. While I'm generally fine with incremental improvement, the comments in the code should at least explain why this particular special case is worth handling in isolation.
I think this is justifiable because handling more-complex cases might require additional logic to take them out of canonical form. To provide a quick example, consider this:
$ cat /tmp/f.cpp bool test1(bool a, bool b, bool c, bool d) { return a && b && c && d; }
but as we canonicalize this, we get:
$ clang++ -O3 -S -emit-llvm -o - /tmp/f.cpp ; ModuleID = '/tmp/f.cpp' target datalayout = "E-m:e-i64:64-n32:64" target triple = "powerpc64-unknown-linux-gnu" ; Function Attrs: nounwind readnone define zeroext i1 @_Z5test1bbbb(i1 zeroext %a, i1 zeroext %b, i1 zeroext %c, i1 zeroext %d) #0 { entry: %brmerge.demorgan = and i1 %a, %b br i1 %brmerge.demorgan, label %land.lhs.true.5, label %land.end land.lhs.true.5: ; preds = %entry %d. = and i1 %c, %d ret i1 %d. land.end: ; preds = %entry ret i1 false }
and the code we generate, using crbits or not, is pretty bad:
$ clang++ -O3 -S -o - /tmp/f.cpp ... .Lfunc_begin0: # BB#0: # %entry andi. 6, 6, 1 and 3, 3, 4 crmove 20, 1 andi. 5, 5, 1 crmove 21, 1 andi. 3, 3, 1 bc 4, 1, .LBB0_3 # BB#1: # %land.lhs.true.5 crand 20, 21, 20 li 3, 1 bclr 12, 20, 0 # BB#2: # %land.lhs.true.5 li 3, 0 blr .LBB0_3: # %land.end li 3, 0 blr
but this is still not good:
$ clang++ -O3 -S -o - /tmp/f.cpp -mno-crbits ... .Lfunc_begin0: # BB#0: # %entry cmplwi 0, 3, 0 beq 0, .LBB0_3 # BB#1: # %entry cmplwi 0, 4, 0 beq 0, .LBB0_3 # BB#2: # %land.lhs.true.5 and 3, 5, 6 clrldi 3, 3, 32 blr .LBB0_3: # %land.end li 3, 0 blr
so we have a lot of work to do on this front.
I will point out, however, that:
$ cat /tmp/f2.cpp bool test1(bool a, bool b, bool c, bool d) { return a & b & c & d; } $ clang++ -O3 -S -o - /tmp/f2.cpp ... .Lfunc_begin0: # BB#0: # %entry and 3, 3, 4 and 3, 3, 5 and 3, 3, 6 blr
and that is what we want.
lib/Target/PowerPC/PPCTargetMachine.cpp | ||
---|---|---|
289 | This seems like a reasonable place for it. We probably do want it before CodeGenPrep runs, and I see no particular reason for it to be later. |
Expand comments to discuss the performance benefits, the limitations of the current implementation, and how these limitations could be addressed in the future. Also, add support for promoting i1 Arguments to i32 when it makes sense.
This problem has been reported on bugzilla (https://llvm.org/bugs/show_bug.cgi?id=25548). Can you please verify this fix works for the options reported there? Thanks.
I think that we should first add this pass into the PPC backend; there maybe other backends that can use it, but we should only generalize when we have a use case. There are other IR-level passes completely defined in the backend (PPCLoopPreIncPrep and PPCLoopDataPrefetch for example). Emulate how those are added and don't touch the common files/directories here.
lib/Transforms/Scalar/BoolRetToInt.cpp | ||
---|---|---|
32 ↗ | (On Diff #40101) | |
85 ↗ | (On Diff #40101) | Remove extra blank line. |
127 ↗ | (On Diff #40101) | toRemove -> ToRemove |
129 ↗ | (On Diff #40101) | Remove blank line. |
135 ↗ | (On Diff #40101) | Please extract this code into a function (or lambda), so that you can do an early return here so that you don't do any further processing on PHIs you've already decided to remove. |
139 ↗ | (On Diff #40101) | Indentation odd here. |
150 ↗ | (On Diff #40101) | Remove blank line. |
156 ↗ | (On Diff #40101) | Same here; there's no need to check condition 5 if condition 4 fails. Refactor so you don't. |
184 ↗ | (On Diff #40101) | Indentation is odd here. |
191 ↗ | (On Diff #40101) | We should also start searching from [sz]ext i1 -> iN nodes. Either do that, or add a note. |
210 ↗ | (On Diff #40101) | Indentation. |
224 ↗ | (On Diff #40101) | Indentation. |
Move pass to PPC backend, address formatting issues, add early exit when searching for disqualifying PHINodes.
Hi Hal,
If you are satisfied, would you mind committing the patch? I don't have
commit access. Thanks!
Tom
This placement will make this one of the first passes that gets runs, won't it? Is this where we want to run it? I admit I haven't thought through the different placements for this, but for some reason I was expecting it to run later (not sure why).