This is an archive of the discontinued LLVM Phabricator instance.

[SelectionDAG] Lower @llvm.get.active.lane.mask to setcc
ClosedPublic

Authored by SjoerdMeijer on Jun 22 2020, 6:06 AM.

Details

Summary

This lowers intrinsic @llvm.get.active.lane.mask to a setcc node, i.e. an icmp ule, and creates vectors for its 2 arguments on which the comparison is performed.

Diff Detail

Event Timeline

SjoerdMeijer created this revision.Jun 22 2020, 6:06 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 22 2020, 6:06 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
efriedma added inline comments.Jun 22 2020, 2:02 PM
llvm/test/CodeGen/Thumb2/active_lane_mask.ll
8

Where's the increment?

SjoerdMeijer added inline comments.Jun 23 2020, 12:33 AM
llvm/test/CodeGen/Thumb2/active_lane_mask.ll
8

Arggg, how silly!

Now with the increment included.

In general, you need to overflow check the increment. (In lanes where it overflows, we should return false.)

In general, you need to overflow check the increment. (In lanes where it overflows, we should return false.)

I don't think I am going to get the overflow checks right for a case like this:

define <4 x i32> @v4i32(i32 %index, i32 %BTC, <4 x i32> %V1, <4 x i32> %V2) {
  %active.lane.mask = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i32(i32 %index, i32 %BTC)
 ..
}

Even with more context, if this is a loop, I am not sure, and I don't think SelectionDAG is the right framework for this.

To me it looks like we've discovered the perfect motivating case here to start looking into redefining llvm.get.active.lane.mask, because this is now problematic in 2 different places: here and in the tail-predication pass. I had postponed doing this as I thought I could first quickly fix this before returning to that and happily forgot about this overflow case here. So, I am going to look at llvm.get.active.lane.mask now.

It's not hard to write out the overflow check; just replace the ADD with UADDO, and AND the overflow result with the compare result.

Not sure about the quality of the code that would produce. I suspect in common cases, SelectionDAG could prove the UADDO doesn't overflow using known bits.

Ah, okay, sorry. Yes, writing out the overflow check isn't hard. As I was concerned about the codegen, I was stuck with the idea to prove absence of overflow, thus it didn't occur to me to codegen the checks. But yes, I guess that's what is needed.

And now with the overflow checks included and codegen'ed.

efriedma added inline comments.Jun 25 2020, 9:50 AM
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
6932

I know I wrote "just replace the ADD with UADDO, and AND the overflow result with the compare result", but I think you need to NOT the overflow vector before the AND.

SjoerdMeijer updated this revision to Diff 273468.EditedJun 25 2020, 11:30 AM

I know I wrote "just replace the ADD with UADDO, and AND the overflow result with the compare result", but I think you need to NOT the overflow vector before the AND.

Yep, thanks, redoing my pen-and-paper exercise with some numbers plugged in, to also confirm this.

viv  = [ i i i i ]
step = [ 0 1 2 3 ]
[i+0 i+1 i+2 i+3 ], [ 0 0 1 1 ] = uaddo [ i i i i ], [ 0 1 2 3 ]
vecbtc = [ btc btc btc btc]
[ 1 1 1 0 ] = setcc ule [i+0 i+1 i+2 i+3 ], [ btc btc btc btc]
[ 0 0 1 0 ] = and [ 1 1 1 0], [ 0 0 1 1 ]
  ^^^^
  wrong: lanes 0 and 1 should be enabled, lanes 2 and 3 should be disabled
  because of overflow in these lanes.

Replacing the last line with:

[ 1 1 0 0 ] = not [ 0 0 1 1 ]
[ 1 1 0 0 ] = and [ 1 1 1 0 ], [ 1 1 0 0 ]

gives the first 2 lanes enabled, and the last 2 disabled.

This revision is now accepted and ready to land.Jun 25 2020, 12:06 PM