This is an archive of the discontinued LLVM Phabricator instance.

Convert Returned Constant i1 Values to i32 on PPC64
ClosedPublic

Authored by tjablin on Oct 26 2015, 6:45 AM.

Details

Reviewers
kbarton
hfinkel
Summary

Partial rewrite based on Hal's suggestions. I now handle the case of constant i1 function arguments as well as multiple i1 sinks per function. The style violations noted by Hal have also been resolved.

Diff Detail

Event Timeline

tjablin updated this revision to Diff 36688.Oct 26 2015, 6:45 AM
tjablin updated this revision to Diff 38410.
tjablin retitled this revision from to Convert Returned Constant i1 Values to i32 on PPC64.
tjablin updated this object.
tjablin set the repository for this revision to rL LLVM.
tjablin removed rL LLVM as the repository for this revision.
tjablin updated this revision to Diff 38416.Oct 26 2015, 6:52 AM
tjablin updated this object.
tjablin added a reviewer: hfinkel.
tjablin added a subscriber: llvm-commits.
hfinkel edited edge metadata.Oct 27 2015, 6:50 AM

Please upload the patch with full context, see: http://llvm.org/docs/Phabricator.html#requesting-a-review-via-the-web-interface

As a general comment, I'd make this a bit more general. The core problem, as you know, if that the logic that takes care of eliminating unnecessary round-tripping through crbit regs is in PPCISelLowering, and thus is BB local. Using crbits makes a lot of sense only when some input comes from a comparison, or feeds a conditional branch. If none of the i1 values comes from a comparison, and none feeds a branch, then it seems likely we'll have unnecessary round-tripping through crbit regs.

Thus, it seems like you could start with all i1 consumers that are not selects or conditional branches (returns, function arguments, [zs]exts), and build a tree of defines all the way back to the initial producers. Unless one of the producers is a comparison, then promote everything to i32.

lib/Transforms/Scalar/BoolRetToInt.cpp
43

Why are you using SetVector here? We normally use SmallVector and SmallPtrSet for worklist iterations. Also, you can use pop_back_val below.

76

Use llvm_unreachable.

94

There is a lot of weird indentation in this file (tabs?). Please fit it. Running clang-format over the file is the easiest way.

98

You probably need to ignore debug intrinsics here (at least).

101

Local variables in LLVM start with a capital letter.

119

This needs to be filled in appropriately. I assume it can have at least:

AU.addPreserved<DominatorTreeWrapperPass>();
test/Transforms/BoolRetToInt/BoolRetToIntTest.ll
8

Use CHECK-LABEL for the function-name matching

16

CHECK-LABEL

49

Please remove unnecessary attributes and metadata.

tjablin updated this revision to Diff 39447.Nov 5 2015, 5:09 PM
tjablin updated this object.
tjablin added a reviewer: kbarton.
kbarton edited edge metadata.Nov 9 2015, 10:32 AM

A couple general comments, in addition to the inline comments below:

  1. Could you please add the LLVM::Statistics class to collect some stats on how many times we are changing bools to ints
  2. Are there any pass dependencies that need to be specified for this pass?
include/llvm/LinkAllPasses.h
68

Spacing is off here

lib/Target/PowerPC/PPCTargetMachine.cpp
286

This placement will make this one of the first passes that gets runs, won't it? Is this where we want to run it? I admit I haven't thought through the different placements for this, but for some reason I was expecting it to run later (not sure why).

lib/Transforms/Scalar/BoolRetToInt.cpp
131

Why not make PromotablePHINodes a private member of BoolRetToInt?
That saves the copy here, and eliminates the need to pass it to the other methods.

tjablin updated this revision to Diff 39811.Nov 10 2015, 7:58 AM
tjablin edited edge metadata.

I have added some llvm::Statistics to collect data on how frequently booleans are promoted to integers, and fixed the spacing irregularity.

...

As a general comment, I'd make this a bit more general. The core problem, as you know, if that the logic that takes care of eliminating unnecessary round-tripping through crbit regs is in PPCISelLowering, and thus is BB local. Using crbits makes a lot of sense only when some input comes from a comparison, or feeds a conditional branch. If none of the i1 values comes from a comparison, and none feeds a branch, then it seems likely we'll have unnecessary round-tripping through crbit regs.

Thus, it seems like you could start with all i1 consumers that are not selects or conditional branches (returns, function arguments, [zs]exts), and build a tree of defines all the way back to the initial producers. Unless one of the producers is a comparison, then promote everything to i32.

I feel like you've not addressed this, and this pass is still much too focused on a particular special case. While I'm generally fine with incremental improvement, the comments in the code should at least explain why this particular special case is worth handling in isolation.

I think this is justifiable because handling more-complex cases might require additional logic to take them out of canonical form. To provide a quick example, consider this:

$ cat /tmp/f.cpp 
bool test1(bool a, bool b, bool c, bool d) {
  return a && b && c && d;
}

but as we canonicalize this, we get:

$ clang++ -O3 -S -emit-llvm -o - /tmp/f.cpp
; ModuleID = '/tmp/f.cpp'
target datalayout = "E-m:e-i64:64-n32:64"
target triple = "powerpc64-unknown-linux-gnu"

; Function Attrs: nounwind readnone
define zeroext i1 @_Z5test1bbbb(i1 zeroext %a, i1 zeroext %b, i1 zeroext %c, i1 zeroext %d) #0 {
entry:
  %brmerge.demorgan = and i1 %a, %b
  br i1 %brmerge.demorgan, label %land.lhs.true.5, label %land.end

land.lhs.true.5:                                  ; preds = %entry
  %d. = and i1 %c, %d
  ret i1 %d.

land.end:                                         ; preds = %entry
  ret i1 false
}

and the code we generate, using crbits or not, is pretty bad:

$ clang++ -O3 -S -o - /tmp/f.cpp
...
.Lfunc_begin0:
# BB#0:                                 # %entry
	andi. 6, 6, 1
	and 3, 3, 4
	crmove	 20, 1
	andi. 5, 5, 1
	crmove	 21, 1
	andi. 3, 3, 1
	bc 4, 1, .LBB0_3
# BB#1:                                 # %land.lhs.true.5
	crand 20, 21, 20
	li 3, 1
	bclr 12, 20, 0
# BB#2:                                 # %land.lhs.true.5
	li 3, 0
	blr
.LBB0_3:                                # %land.end
	li 3, 0
	blr

but this is still not good:

$ clang++ -O3 -S -o - /tmp/f.cpp -mno-crbits
...
.Lfunc_begin0:
# BB#0:                                 # %entry
	cmplwi 0, 3, 0
	beq	 0, .LBB0_3
# BB#1:                                 # %entry
	cmplwi 0, 4, 0
	beq	 0, .LBB0_3
# BB#2:                                 # %land.lhs.true.5
	and 3, 5, 6
	clrldi	 3, 3, 32
	blr
.LBB0_3:                                # %land.end
	li 3, 0
	blr

so we have a lot of work to do on this front.

I will point out, however, that:

$ cat /tmp/f2.cpp 
bool test1(bool a, bool b, bool c, bool d) {
  return a & b & c & d;
}

$ clang++ -O3 -S -o - /tmp/f2.cpp
...
.Lfunc_begin0:
# BB#0:                                 # %entry
	and 3, 3, 4
	and 3, 3, 5
	and 3, 3, 6
	blr

and that is what we want.

lib/Target/PowerPC/PPCTargetMachine.cpp
286

This seems like a reasonable place for it. We probably do want it before CodeGenPrep runs, and I see no particular reason for it to be later.

tjablin updated this revision to Diff 40101.Nov 12 2015, 5:53 PM

Expand comments to discuss the performance benefits, the limitations of the current implementation, and how these limitations could be addressed in the future. Also, add support for promoting i1 Arguments to i32 when it makes sense.

This problem has been reported on bugzilla (https://llvm.org/bugs/show_bug.cgi?id=25548). Can you please verify this fix works for the options reported there? Thanks.

I think that we should first add this pass into the PPC backend; there maybe other backends that can use it, but we should only generalize when we have a use case. There are other IR-level passes completely defined in the backend (PPCLoopPreIncPrep and PPCLoopDataPrefetch for example). Emulate how those are added and don't touch the common files/directories here.

lib/Transforms/Scalar/BoolRetToInt.cpp
33

Saying PPC here is over-generalizing, because it is not true on the A2, or on various Freescale cores. You specifically mean P7/P8.

86

Remove extra blank line.

128

toRemove -> ToRemove

130

Remove blank line.

136

Please extract this code into a function (or lambda), so that you can do an early return here so that you don't do any further processing on PHIs you've already decided to remove.

140

Indentation odd here.

151

Remove blank line.

157

Same here; there's no need to check condition 5 if condition 4 fails. Refactor so you don't.

185

Indentation is odd here.

192

We should also start searching from [sz]ext i1 -> iN nodes. Either do that, or add a note.

211

Indentation.

225

Indentation.

tjablin updated this revision to Diff 41161.Nov 25 2015, 11:37 AM

Move pass to PPC backend, address formatting issues, add early exit when searching for disqualifying PHINodes.

hfinkel accepted this revision.Nov 26 2015, 8:09 AM
hfinkel edited edge metadata.

LGTM.

This revision is now accepted and ready to land.Nov 26 2015, 8:09 AM

Hi Hal,
If you are satisfied, would you mind committing the patch? I don't have
commit access. Thanks!
Tom

tjablin updated this revision to Diff 41629.Dec 2 2015, 8:20 AM
tjablin edited edge metadata.

Fix the bug caught by Kit. Sorry.

kbarton closed this revision.Dec 7 2015, 12:53 PM

Committed r254942