This is an archive of the discontinued LLVM Phabricator instance.

[CGP] widen switch condition and case constants to target's register width
ClosedPublic

Authored by spatel on Oct 7 2015, 2:08 PM.

Details

Summary

This is a follow-up from the discussion in D12965. The block-at-a-time limitation of SelectionDAG also came up in D13297.

Without the InstCombine change from D12965, I don't expect this patch to make any difference in the real world because InstCombine will already be widening cases like this in visitSwitchInst(). But we need to have this CGP safety harness in place before proceeding with any shrinkage in D12965, so we won't generate extra extends for compares.

There are regression tests for CGP in both test/Transforms/CodeGenPrepare and test/CodeGen/* . I opted for IR regression tests in the patch because that seems like a clearer way to test the transform, but PowerPC CodeGen for the i16 widening test is shown below. x86 will need more work to solve: https://llvm.org/bugs/show_bug.cgi?id=22473

Before:

  1. BB#0: mr 4, 3 extsh. 3, 4 ble 0, .LBB0_5
  2. BB#1: cmpwi 3, 99 bgt 0, .LBB0_9
  3. BB#2: rlwinm 4, 4, 0, 16, 31 <--- 32-bit mask/extend li 3, 0 cmplwi 4, 1 beqlr 0
  4. BB#3: cmplwi 4, 10 bne 0, .LBB0_12
  5. BB#4: li 3, 1 blr .LBB0_5: rlwinm 3, 4, 0, 16, 31 <--- 32-bit mask/extend cmplwi 3, 65436 beq 0, .LBB0_13
  6. BB#6: cmplwi 3, 65526 beq 0, .LBB0_15
  7. BB#7: cmplwi 3, 65535 bne 0, .LBB0_12
  8. BB#8: li 3, 4 blr .LBB0_9: rlwinm 3, 4, 0, 16, 31 <--- 32-bit mask/extend cmplwi 3, 100 beq 0, .LBB0_14 ...

After:

    1. BB#0: rlwinm 4, 3, 0, 16, 31 <--- mask/extend to 32-bit and then use that for comparisons cmpwi 4, 999 ble 0, .LBB0_5
    2. BB#1: lis 3, 0 ori 3, 3, 65525 cmpw 4, 3 bgt 0, .LBB0_9
    3. BB#2: cmplwi 4, 1000 beq 0, .LBB0_14
  1. BB#3: cmplwi 4, 65436 bne 0, .LBB0_13
    1. BB#4: li 3, 6 blr .LBB0_5: li 3, 0 cmplwi 4, 1 beqlr 0
    2. BB#6: cmplwi 4, 10 beq 0, .LBB0_12
    3. BB#7: cmplwi 4, 100 bne 0, .LBB0_13
    4. BB#8: li 3, 2 blr .LBB0_9: cmplwi 4, 65526 beq 0, .LBB0_15
    5. BB#10: cmplwi 4, 65535 bne 0, .LBB0_13 ...

Diff Detail

Event Timeline

spatel updated this revision to Diff 36790.Oct 7 2015, 2:08 PM
spatel retitled this revision from to [CGP] widen switch condition and case constants to target's register width.
spatel updated this object.
spatel added reviewers: hfinkel, reames, mehdi_amini.
spatel added a subscriber: llvm-commits.
spatel added a comment.Oct 7 2015, 2:12 PM

Phab formatting was thrown off by '#' on the BB's. Removed below:

Before:

 BB#0:
  mr 4, 3
  extsh. 3, 4
  ble 0, .LBB0_5
 BB#1: 
  cmpwi	 3, 99
  bgt	 0, .LBB0_9
 BB#2:            
  rlwinm 4, 4, 0, 16, 31      <--- 32-bit mask/extend
  li 3, 0
  cmplwi	 4, 1
  beqlr 0
 BB#3:            
  cmplwi	 4, 10
  bne	 0, .LBB0_12
 BB#4:                      
  li 3, 1
  blr
.LBB0_5:                             
  rlwinm 3, 4, 0, 16, 31         <--- 32-bit mask/extend
  cmplwi	 3, 65436
  beq	 0, .LBB0_13
 BB#6:                            
  cmplwi	 3, 65526
  beq	 0, .LBB0_15
 BB#7:                       
  cmplwi	 3, 65535
  bne	 0, .LBB0_12
 BB#8:                       
  li 3, 4
  blr
.LBB0_9:                       
  rlwinm 3, 4, 0, 16, 31      <--- 32-bit mask/extend
  cmplwi	 3, 100
  beq	 0, .LBB0_14
...

After:

 BB#0:        
  rlwinm 4, 3, 0, 16, 31   <--- mask/extend to 32-bit and then use that for comparisons
  cmpwi	 4, 999
  ble 0, .LBB0_5
 BB#1:          
  lis 3, 0
  ori 3, 3, 65525
  cmpw	 4, 3
  bgt	 0, .LBB0_9
 BB#2:         
  cmplwi	 4, 1000
  beq	 0, .LBB0_14
 BB#3:    
  cmplwi	 4, 65436
  bne	 0, .LBB0_13
 BB#4:       
  li 3, 6
  blr
.LBB0_5:   
  li 3, 0
  cmplwi	 4, 1
  beqlr 0
 BB#6: 
  cmplwi	 4, 10
  beq	 0, .LBB0_12
 BB#7:             
  cmplwi	 4, 100
  bne	 0, .LBB0_13
 BB#8:             
  li 3, 2
  blr
.LBB0_9:       
  cmplwi	 4, 65526
  beq	 0, .LBB0_15
 BB#10:      
  cmplwi	 4, 65535
  bne	 0, .LBB0_13
...
reames edited edge metadata.Oct 14 2015, 2:15 PM

This looks entirely reasonable to me, but I'm not an expert in this area. The code LGTM, but I'm not sure about the implications of the change. Can someone else comment on that?

lib/CodeGen/CodeGenPrepare.cpp
4015

auto

hfinkel edited edge metadata.Oct 27 2015, 6:10 PM

After:

BB#0:        
 rlwinm 4, 3, 0, 16, 31   <--- mask/extend to 32-bit and then use that for comparisons
 cmpwi	 4, 999

...

Someone amusingly, even though you can't tell from this test case, this is somewhat suboptimal. Consider this:

$ cat /tmp/foo.c 
short foo(short a) { return a; }

$ clang -O3 -S -emit-llvm -o - /tmp/foo.c 
target datalayout = "E-m:e-i64:64-n32:64"
target triple = "powerpc64-unknown-linux-gnu"

define signext i16 @foo(i16 signext %a) #0 {
entry:
  ret i16 %a
}

and, notice here that the argument has the 'signext' attribute. This argument will be carried in an i32 register, but sign extended from i16. Thus, at least in theory, if we sign extended in this transformation, instead of zero-extended, we'd not actually need any extension instruction at all.

Can you think of any way to have it pick sext instead of zext when we know the input is really sign extended anyway? This might only apply to function arguments?

lib/CodeGen/CodeGenPrepare.cpp
4016

We don't need the 'OrBitCast' here, do we?

spatel marked 2 inline comments as done.Nov 2 2015, 11:28 AM
spatel added inline comments.
lib/CodeGen/CodeGenPrepare.cpp
4016

Correct - the check above guarantees that RegWidth is greater than the original type.

spatel marked an inline comment as done.Nov 2 2015, 11:30 AM

Can you think of any way to have it pick sext instead of zext when we know the input is really sign extended anyway? This might only apply to function arguments?

Nice catch. I agree that we should sext instead of zext in that case. If it's only applicable to function args, it seems straightforward to add that check. New patch coming soon.

spatel updated this revision to Diff 38953.Nov 2 2015, 11:39 AM
spatel edited edge metadata.

Patch updated:

  1. Add check for a sign-extended function argument - in that case sext everything instead of zext.
  2. Minimized tests - we don't need so many switch cases to show the differences in IR (the larger examples were better for visualizing the asm differences).
  3. Added test case for #1.
hfinkel accepted this revision.Nov 2 2015, 11:51 AM
hfinkel edited edge metadata.

LGTM, thanks!

This revision is now accepted and ready to land.Nov 2 2015, 11:51 AM
This revision was automatically updated to reflect the committed changes.