This lowers intrinsic @llvm.get.active.lane.mask to a setcc node, i.e. an icmp ule, and creates vectors for its 2 arguments on which the comparison is performed.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/test/CodeGen/Thumb2/active_lane_mask.ll | ||
---|---|---|
9 | Where's the increment? |
llvm/test/CodeGen/Thumb2/active_lane_mask.ll | ||
---|---|---|
9 | Arggg, how silly! |
In general, you need to overflow check the increment. (In lanes where it overflows, we should return false.)
I don't think I am going to get the overflow checks right for a case like this:
define <4 x i32> @v4i32(i32 %index, i32 %BTC, <4 x i32> %V1, <4 x i32> %V2) { %active.lane.mask = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i32(i32 %index, i32 %BTC) .. }
Even with more context, if this is a loop, I am not sure, and I don't think SelectionDAG is the right framework for this.
To me it looks like we've discovered the perfect motivating case here to start looking into redefining llvm.get.active.lane.mask, because this is now problematic in 2 different places: here and in the tail-predication pass. I had postponed doing this as I thought I could first quickly fix this before returning to that and happily forgot about this overflow case here. So, I am going to look at llvm.get.active.lane.mask now.
It's not hard to write out the overflow check; just replace the ADD with UADDO, and AND the overflow result with the compare result.
Not sure about the quality of the code that would produce. I suspect in common cases, SelectionDAG could prove the UADDO doesn't overflow using known bits.
Ah, okay, sorry. Yes, writing out the overflow check isn't hard. As I was concerned about the codegen, I was stuck with the idea to prove absence of overflow, thus it didn't occur to me to codegen the checks. But yes, I guess that's what is needed.
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp | ||
---|---|---|
6946 | I know I wrote "just replace the ADD with UADDO, and AND the overflow result with the compare result", but I think you need to NOT the overflow vector before the AND. |
I know I wrote "just replace the ADD with UADDO, and AND the overflow result with the compare result", but I think you need to NOT the overflow vector before the AND.
Yep, thanks, redoing my pen-and-paper exercise with some numbers plugged in, to also confirm this.
viv = [ i i i i ] step = [ 0 1 2 3 ] [i+0 i+1 i+2 i+3 ], [ 0 0 1 1 ] = uaddo [ i i i i ], [ 0 1 2 3 ] vecbtc = [ btc btc btc btc] [ 1 1 1 0 ] = setcc ule [i+0 i+1 i+2 i+3 ], [ btc btc btc btc] [ 0 0 1 0 ] = and [ 1 1 1 0], [ 0 0 1 1 ] ^^^^ wrong: lanes 0 and 1 should be enabled, lanes 2 and 3 should be disabled because of overflow in these lanes.
Replacing the last line with:
[ 1 1 0 0 ] = not [ 0 0 1 1 ] [ 1 1 0 0 ] = and [ 1 1 1 0 ], [ 1 1 0 0 ]
gives the first 2 lanes enabled, and the last 2 disabled.
I know I wrote "just replace the ADD with UADDO, and AND the overflow result with the compare result", but I think you need to NOT the overflow vector before the AND.