This is an archive of the discontinued LLVM Phabricator instance.

Emit table lookup from TargetLowering::expandCTTZ()
ClosedPublic

Authored by gsocshubham on Jun 30 2022, 7:17 AM.

Details

Summary

This patch emits table lookup in expandCTTZ. The patch is child revision of https://reviews.llvm.org/D113291.

Context -

https://reviews.llvm.org/D113291 transforms set of IR instructions to cttz intrinsic but there are some targets which does not support CTTZ or CTLZ. Hence, I generate a table lookup in TargetLowering::expandCTTZ().

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
gsocshubham added inline comments.Jul 15 2022, 5:02 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7853

I can use DAG.getNode(ISD::ROTL, dl, VT, Op, DAG.getConstant(i, dl, SHVT)); instead of << but this changes variable from unsigned int to SDNode.

7858–7867

I tried ConstantDataArray::get(*DAG.getContext(), RshrArr) but it does not generate table in constant pool since RshrArr is not array of unsigned int and not Constant*

I think there might be better alternative to simplify above. I am exploring it.

dmgreen added inline comments.Jul 18 2022, 2:51 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7849

Move DeBruijn up so that it can be re-used in the creation of Lookup above.
Perhaps add a variable for the Shift Amount too, which is 27 here but in general is BitWidth - log2(BitWdith), I think.

7853

That seems to be mixing up the instructions we want to produce in the output (DAG.getNode) and the calculations we are doing at compile time.
If DeBruijn is an APInt of the correct size APInt DeBruijn(32, 0x077CB531U), then it will have a rotl method. The advantage of APInt is that they will also be able to work with other bitwidths, once those are added.

llvm/test/CodeGen/VE/Scalar/cttz.ll
44 ↗(On Diff #444939)

If the target has a ctpop instruction then that should be preferred to the table lookup.

Fixing review comments - Using APInt for DeBruijn constant.

I am using *DeBruijn.getRawData() to get the integer from the APInt but it does not seem appropriate way. Can anyone suggest me good approach to fetch integer from an APInt? I went through Constants.h and found getRawData() is the only way to fetch a data pointer which then I have to derefer it.

gsocshubham added inline comments.Jul 21 2022, 12:57 PM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7849

Done. Can you please give me some suggestions to make usage of APInt minimal and clean?

7853

Understood. I have used - APInt DeBruijn(32, 0x077CB531U)

Is it correct way? Can you suggest a way to simplify below usage of APInt?

8101

Regarding this line - RshrArr[Rshr] = i;

In debug mode, when I run SPARC/cttz.ll which is present in this patch, I get a segmentation fault. but in release it is working fine.

Why is there a difference in the behaviour?

llvm/test/CodeGen/VE/Scalar/cttz.ll
44 ↗(On Diff #444939)

I have added a check - !isOperationLegal(ISD::CTPOP, VT) but that does not seem to help.

barannikov88 added inline comments.Jul 21 2022, 3:23 PM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8101

Most probably there is a bug in your code. Most probably you haven't enabled assertions in release mode (-DLLVM_ENABLE_ASSERTIONS=ON is the default in debug mode).
You should be able to find some hints in the printed backtrace, but first make sure that llvm-symbolizer target is built.

llvm/test/CodeGen/VE/Scalar/cttz.ll
44 ↗(On Diff #444939)

VE sets the action to Promote. Perhaps, you should call isOperationExpand instead and do the transformation if it returns true.

barannikov88 added inline comments.Jul 21 2022, 3:54 PM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8083

I think your code should be moved a little higher, before SDValue Tmp = . Otherwise the Tmp node will be unused if your optimization has been applied.
The check will probably need some corrections. Note the comments above, they may help deduce the correct condition.
Would also be better if you extract your implementation into separate function.

8086

getRawData is very low level method, you should be able to easily avoid it.

8089

Be sure to clang-format your changes before submitting a patch. This is required both by the coding style and by the contribution guidelines.

8091

Ditto, don't use getRawData.

8095

There are always exactly 32 elements, you can use plain C array and avoid dynamic memory allocation at all.
This is a bit of hard-code though, so SmallVector with 32 minimum size might be better.

8099

Ditto here and on the next line. APInt Lshr = DeBruijn.rotl(i) should do.

8115

You should use the alignment requirement of the array (i.e. CA), not of its element. They may differ.

Fix review comments.

There are 3 LIT test failures -

LLVM :: CodeGen/RISCV/ctlz-cttz-ctpop.ll
LLVM :: CodeGen/RISCV/rv32zbb.ll
LLVM :: CodeGen/RISCV/rv64zbb.ll

I will update them as soon as table lookup code is finalized which is still under review.

gsocshubham added inline comments.Jul 22 2022, 4:29 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8083

Moved code above SDValue Tmp = and added into a separate function.

I am using below check to do the optimization -

if (NumBitsPerElt == 32 && !VT.isVector() &&
      TLI.isOperationExpand(ISD::CTPOP, VT) && !isOperationLegal(ISD::CTLZ, VT))
8086

Done. Now I am using - getZExtValue()

8089

I have formatted code using -

$INSTALL/bin/clang-format -style=LLVM TargetLowering.cpp -i --lines=7866:7872

8091

Updated to getZExtValue()

8095

I am using plain C array and got rid of dynamic memory allocation error.

8099

Done.

8101

Thanks. -DLLVM_ENABLE_ASSERTIONS=ON this helped.

gsocshubham added inline comments.Jul 22 2022, 4:36 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8115

From the assembly dump of SPARC/cttz.ll, I am not sure whether to use array element alignment or array alignment?

If I use array alignment CA, I get below assembly as compared to SPARC/cttz.ll assembly if array element alignment is used. What do you think? Should I update from CPIdx to CA?

f:                                      ! @f
        .cfi_startproc
! %bb.0:                                ! %entry
        mov     %o0, %o1
        cmp %o0, 0
        be      .LBB0_2
        mov     %g0, %o0
! %bb.1:                                ! %entry
        sub %o0, %o1, %o0
        and %o1, %o0, %o0
        sethi 122669, %o1
        or %o1, 305, %o1
        smul %o0, %o1, %o0
        srl %o0, 27, %o0
        sethi %hi(.LCPI0_0), %o1
        add %o1, %lo(.LCPI0_0), %o1
        add %o1, %o0, %o2
        ldub [%o2+2], %o3
        ldub [%o2+3], %o4
        ldub [%o1+%o0], %o0
        ldub [%o2+1], %o1
        sll %o3, 8, %o2
        or %o2, %o4, %o2
        sll %o0, 8, %o0
        or %o0, %o1, %o0
        sll %o0, 16, %o0
        or %o0, %o2, %o0
barannikov88 added inline comments.Jul 22 2022, 8:10 AM
llvm/include/llvm/CodeGen/TargetLowering.h
4762

I believe it returns something else

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7991

You should be able to avoid getZExtValue here either.
Moreover, you don't need APInt here at all, the values are small enough to be put into 'unsigned', e.g.:
unsinged ShiftAmt = BitWidth - Log2_32(BitWidth);
Same for most other APInts.

8000

Could as well be C array.

8004

There is no standalone "rotl" function which would work for 'unsigned', but you could do something like this:
`Lshr = (DeBruijn << i) | (DeBruijn >> (NumBitsPerElt - Amt)));

8035

This is redundant. Your code is already in TargetLowering class. Just call isOperationExpand directly.

barannikov88 added inline comments.Jul 22 2022, 8:19 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8115

I meant you should call
TD.getPrefTypeAlign(Elts->getType())
instead of
TD.getPrefTypeAlign(Elts[0]->getType()
Is the above assembly a result of such change, or did you do something different?

llvm/test/CodeGen/SPARC/cttz.ll
5

Unused

27

Why not just ret i32 %0 ?

When we get that far, we might need to add a way for targets to opt out of the new lowering if it will cause them problems.

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7990

Sorry the suggestion for using APInt wasn't very clear. The point is that DeBruijn isn't necessarily a 32bit quantity - there are other values for 8/16/64. We should add i64 at least, so long as they look beneficial (which they should for 64, not sure about the other values). APInt makes that simpler because it can store any size without overflow, and the operations are performed on the right size.

BitWidth and ShiftAmt can remain as unsigned - the values will always easily fit in an unsigned value. ShiftWidth can be Bitwidth - Log2_32(BitWidth). I would rename the "NumBitsPerElt" argument to "Bitwidth" too, to make it clear what it is and avoid the need for the two.

7996

I think we can make a constant from a APInt directly.

8001

This shouldn't be a plain C array. It's size is dependant on the BitWidth. SmallVector<uint8_t> RshrArr(BitWidth, 0) should create an array that is initialized to 0's with the correct size.

8005

APInt Rshr = Lshr.lshr(ShiftAmt), then use Rshr.getZExtValue() in the line below. It is a little strange to use getZExtValue in an array index, but so long as the array is a safe type, it should complain if the value is out of bounds.

8009–8014

Do we need this loop, or can we create the array from the constant pool directly? The elements should be MVT::i8.

auto *CA = ConstantDataArray::get(*DAG.getContext(), RshrArr);
8024

The load should be loading MVT::i8, extended the result into VT.

gsocshubham added inline comments.Jul 24 2022, 2:49 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8009–8014

If I directly use RshrArr, I get below table in assembly -

.LCPI0_0:
        .ascii  "\000\001\034\002\035\016\030\003\036\026\024\017\031\021\004\b\037\033\r\027\025\023\020\007\032\f\022\006\013\005\n\t"

instead of -

.LCPI0_0:
        .word   0                               ! 0x0
        .word   1                               ! 0x1
        .word   28                              ! 0x1c
        .word   2                               ! 0x2
        .word   29                              ! 0x1d
        .word   14                              ! 0xe
        .word   24                              ! 0x18
        .word   3                               ! 0x3
        .word   30                              ! 0x1e
        .word   22                              ! 0x16
        .word   20                              ! 0x14
        .word   15                              ! 0xf
        .word   25                              ! 0x19
        .word   17                              ! 0x11
        .word   4                               ! 0x4
        .word   8                               ! 0x8
        .word   31                              ! 0x1f
        .word   27                              ! 0x1b
        .word   13                              ! 0xd
        .word   23                              ! 0x17
        .word   21                              ! 0x15
        .word   19                              ! 0x13
        .word   16                              ! 0x10
        .word   7                               ! 0x7
        .word   26                              ! 0x1a
        .word   12                              ! 0xc
        .word   18                              ! 0x12
        .word   6                               ! 0x6
        .word   11                              ! 0xb
        .word   5                               ! 0x5
        .word   10                              ! 0xa
        .word   9                               ! 0x9
        .text
        .globl  f
dmgreen added inline comments.Jul 24 2022, 8:29 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8009–8014

Yes - that seems better to be, so long as it is loading i8's from the array. The .word's will be i32 I think, which uses much more data than it needs, as all the values are in the range 0-BitWidth.

gsocshubham marked an inline comment as not done.Jul 24 2022, 8:58 AM

Fix review comments.

With this revision, there are 2 LIT test failures from RISCV backend which I am guessing will be fixed once we finalize conditions for lowering check.

LLVM :: CodeGen/RISCV/ctlz-cttz-ctpop.ll
LLVM :: CodeGen/RISCV/rv64zbb.ll
gsocshubham added inline comments.Jul 24 2022, 11:39 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7996

I tried replacing DAG.getConstant(DeBruijn.getZExtValue(), dl, VT)) with just DeBruijn but there does not seem a direct conversion from APInt to SDValue`?

8001

Changed it to SmallVector. Thanks!

8005

Done.

8009–8014

Done accordingly.

8024

Can you please elaborate it? I did not understand it. Do you mean to change VT to MVT::i8 in the return statement?

8115

I did something differently. But now it seems fine. Now, we don't have Elts and hence I am taking alignment of an element from RshrArr. Is it fine?

llvm/test/CodeGen/SPARC/cttz.ll
5

Removed.

27

Updated test accordingly.

gsocshubham added inline comments.Jul 24 2022, 11:52 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7990

Here, do you mean to change APInt DeBruijn(32, 0x077CB531U) to APInt DeBruijn(64, 0x0218A392CD3D5DBF)?

NumBitsPerElt is 32 even in the case for call i64 @llvm.cttz.i64(i64 %x, i1 true)

Any suggestions on above?

dmgreen added inline comments.Jul 25 2022, 5:51 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7990

It would be based on the BitWidth, providing that the BitWidth is known to be 32 or 64:

APInt DeBruijn = BitWidth == 32 ? APInt(32, 0x077CB531U) : APInt(64, 0x0218A392CD3D5DBFULL)

For some targets the i64 cttz will be legalized to a i32 cttz. It is for 64bit targets that the 64bit variant is useful.

8024

We want to create an array of i8 elements, load an i8 from it at the right index, and extend that to the original VT. That can either be done by creating a load of an i8 and calling DAG.getZExtOrTrunc - but that might introduce an illegal type where one cannot be created. So it may need to create a ZEXTLOAD load directly.

craig.topper added inline comments.Jul 25 2022, 10:02 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7997

This should be getShiftAmountConstant.

8007

We shouldn't need a ConstantSDNode and ConstantInt to get the Type*. You can use VT.getTypeForEVT(*DAG.getContext()) in place of CI->getType()

Though really we should be using an array of Int8Ty and doing a zextload from i8 to VT.

8064

Should this be checking CTTZ_ZERO_UNDEF? The zero case is not handled correctly by the table lookup. For CTTZ we need a select. CodeGenPrepare rewrites llvm.cttz(i32 %x, i1 false) into a branch around llvm.cttz(i32 %x, i1 true) on some targets. So the difference might be hard to test.

craig.topper added inline comments.Jul 25 2022, 10:07 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7853

Is it really cyclic? The multiply is a shift by a power of 2. So emulating the multiply should be a shl not rotl.

craig.topper added inline comments.Jul 25 2022, 10:14 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8015

Lookup needs a sextOrTrunc to the target's pointer type.

craig.topper added inline comments.Jul 25 2022, 10:18 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8001

What does Lshr mean? shr is usually "shift right", and lshr is logical shift right. But here L means left? But that means I don't know what shr means.

Fix review comments.

gsocshubham added inline comments.Jul 26 2022, 6:21 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7853

Updated to shl().

7990

Done.

7997

Updated accordingly.

8001

It was a typo in naming. I have updated it accordingly.

8007

Done.

8015

I have created a new variable TargetLookup using sextOrTrunc().

8024

Created a load using getZExtOrTrunc. If type is illegal then I am returning getExtLoad(ISD::ZEXTLOAD....

Let me know your suggestions on above.

gsocshubham added inline comments.Jul 26 2022, 6:30 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8064

Can you point me to target and test where above scenario occurs? I will update it accordingly.

craig.topper added inline comments.Jul 26 2022, 9:16 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7999

You overwrite Lookup here instead of creating TargetLookup.

8012

The ConstantDataArray is an array of bytes, but the alignment here is based on VT. Should be MVT::i8

8015

The data in the array is only 8-bits. The load type shouldn't be VT.

8019

NewVT is the same as VT.

8023

ZExtOrTrunc and Load are unused if this else is taken. We shouldn't create dead nodes if it can be avoided.

8064

Compiling llvm.cttz.i32 for riscv32 with -O0 should do it I think. -O0 will disable codegen prepare.

gsocshubham added inline comments.Jul 28 2022, 10:42 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8064

@craig.topper - What is the desired output value from the table when x=0?

define i32 @f(i32 %x) {
entry:
  %0 = call i32 @llvm.cttz.i32(i32 %x, i1 true)
  ret i32 %0
}
craig.topper added inline comments.Jul 28 2022, 10:46 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8064

With i1 true there is no desired out put for 0. I'm concerned about i1 false which needs to produce 32.

Fix some of review comments.

gsocshubham added inline comments.Jul 29 2022, 4:54 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7999

Done.

8012

I tried using MVT::i8 by converting it to Value Type to be used instead of VT but there seems some compatibility issue.

8019

Updated.

8023

Modified the blocks. Actually else part depends on ZExtOrTrunc.

craig.topper added inline comments.Jul 29 2022, 3:24 PM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8013

Replace VT.getTypeForEVT(*DAG.getContext()) with CA->getType()

8015

What happens if you always create a zextload even if the type isn't legal?

8018

This ZextOrTrunc doesn't do anything, the Load already has VT as its ValueType.

8023

The VT being passed to the operand named MemVT needs to be MemVT.

dmgreen added inline comments.Jul 30 2022, 12:52 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7853

Yeah DeBruijn sequences are cyclic. They are explained here: https://en.wikipedia.org/wiki/De_Bruijn_sequence#Construction. The lower bits of the last elements use the wrapped-around upper bits of the constant. A rotate is more general than a shift, but for the constants we chose the upper value in the constant is always zero, so they become equivalent. If we are relying on a shift on the generated code, using a shift here too sounds OK.

7989

If BitWidth isn't 32 or 64, we need to return SDValue.
We need to make sure the 64it value is tested too.

gsocshubham added inline comments.Aug 1 2022, 4:41 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8013

If I change alignment as above, then for call i32 @llvm.cttz.i32(i32 %x, i1 true) compiled with SPARC -

I get -

f:                                      ! @f
        .cfi_startproc
! %bb.0:                                ! %entry
        mov     %g0, %o1
        sub %o1, %o0, %o1
        and %o0, %o1, %o0
        sethi 122669, %o1
        or %o1, 305, %o1
        smul %o0, %o1, %o0
        srl %o0, 27, %o0
        sethi %hi(.LCPI0_0), %o1
        add %o1, %lo(.LCPI0_0), %o1
        add %o1, %o0, %o2
        ldub [%o2+2], %o3
        ldub [%o2+3], %o4
        ldub [%o1+%o0], %o0
        ldub [%o2+1], %o1
        sll %o3, 8, %o2
        or %o2, %o4, %o2
        sll %o0, 8, %o0
        or %o0, %o1, %o0
        sll %o0, 16, %o0
        retl
        or %o0, %o2, %o0

instead of -

mov     %g0, %o1
sub %o1, %o0, %o1
and %o0, %o1, %o0
sethi 122669, %o1
or %o1, 305, %o1
smul %o0, %o1, %o0
srl %o0, 27, %o0
sethi %hi(.LCPI0_0), %o1
add %o1, %lo(.LCPI0_0), %o1
retl
ld [%o1+%o0], %o0
8015

There is no change in generated code nor in LIT test failures as it was originally submitted earlier.

craig.topper added inline comments.Aug 1 2022, 8:31 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8015

A zext load with i8 memVT?

gsocshubham added inline comments.Aug 1 2022, 11:34 PM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8015

Zext eith memVt without checking VT type legality for llvm.cttz.i32(i32 %x, i1 true) compiled with SPARC gives -

f:                                      ! @f
        .cfi_startproc
! %bb.0:                                ! %entry
        mov     %o0, %o1
        cmp %o0, 0
        be      .LBB0_2
        mov     %g0, %o0
! %bb.1:                                ! %entry
        sub %o0, %o1, %o0
        and %o1, %o0, %o0
        sethi 122669, %o1
        or %o1, 305, %o1
        smul %o0, %o1, %o0
        srl %o0, 27, %o0
        sethi %hi(.LCPI0_0), %o1
        add %o1, %lo(.LCPI0_0), %o1
        ld [%o1+%o0], %o0
gsocshubham added inline comments.Aug 1 2022, 11:35 PM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8015

Zext eith memVt without checking VT type legality for llvm.cttz.i32(i32 %x, i1 true) compiled with SPARC gives -

f:                                      ! @f
        .cfi_startproc
! %bb.0:                                ! %entry
        mov     %o0, %o1
        cmp %o0, 0
        be      .LBB0_2
        mov     %g0, %o0
! %bb.1:                                ! %entry
        sub %o0, %o1, %o0
        and %o1, %o0, %o0
        sethi 122669, %o1
        or %o1, 305, %o1
        smul %o0, %o1, %o0
        srl %o0, 27, %o0
        sethi %hi(.LCPI0_0), %o1
        add %o1, %lo(.LCPI0_0), %o1
        ld [%o1+%o0], %o0

ZExt with MemVT

Fix review comments.

gsocshubham added inline comments.Aug 3 2022, 7:47 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7989

Done. Added a check.

There are already tests present for cttz.i64 which I have updated in this patch.

gsocshubham added inline comments.Aug 3 2022, 7:50 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8027

@craig.topper - What should be the check here for case llvm.cttz(i32 %x, i1 false)?

It can't be isOperationLegalOrCustom(ISD::CTTZ_ZERO_UNDEF, VT) because that is taken care before this new lowering.

dmgreen added inline comments.Aug 3 2022, 9:41 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7989

This needs to handle other types too - just doing if (BitWidth != 32 && BitWidth != 64) is probably easiest.

8005

Why has this become uint16_t? I don't think it needs to be any bigger than i8 to hold all the values.

8018

It doesn't need to create a load just to create another load. It can just use the getExtLoad method with the MemoryVT set to MVT::i8. I think the other parameters can be default/left out.

8027

I think, if I understand, this should be if (Node->getOpcode() != ISD::CTLZ_ZERO_UNDEF) {
But I'm not sure what this does to the profitability of the transform.

It might be possible to just encode it into the table. It involves changing how the table to have bitwidth+1 elements, and using a different debruijn constant so that the zero element can be the bitwidth.

craig.topper added inline comments.Aug 3 2022, 10:11 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8016

The Type for the alignment needs to be the type of the elements in CA which is based on what data type is used for the Table SmallVector.

barannikov88 added inline comments.Aug 3 2022, 10:46 PM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8016

Why the type of the elements specifically and not of the CA itself?
AFAIK the alignment passed to the load is the "base alignment"; the alignment of the accessed element will be inferred based on the base alignment and the offset of the element.

Fix review comments.

gsocshubham added inline comments.Aug 4 2022, 3:39 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7989

Updated accordingly.

8005

Reverted to uint8_t.

For @llvm.cttz.i64(i64 %x, i1 true) compiled with riscv64, I am getting table sequence as -

.ascii  "\000\001\002\007\003\r\b\023\004\031\016\034\t\"\024(\005\021\032&\017.\0350\n\037#6\0252)9?\006\f\022\030\033!'\020%-/\036518>\013\027 $,47=\026+3<*;:"

Hence, earlier I had changed it to uint16_t where I was getting -

.half   0                               # 0x0
.half   1                               # 0x1
.half   2                               # 0x2
.half   7                               # 0x7
.half   3                               # 0x3
.half   13                              # 0xd
.half   8                               # 0x8
.half   19                              # 0x13
.half   4                               # 0x4
.half   25                              # 0x19
.half   14                              # 0xe
.half   28                              # 0x1c
.half   9                               # 0x9
.half   34                              # 0x22
.half   20                              # 0x14
.half   40                              # 0x28
.half   5                               # 0x5
.half   17                              # 0x11
.half   26                              # 0x1a
.half   38                              # 0x26
.half   15                              # 0xf
.half   46                              # 0x2e
.half   29                              # 0x1d
.half   48                              # 0x30
.half   10                              # 0xa
.half   31                              # 0x1f
.half   35                              # 0x23
.half   54                              # 0x36
.half   21                              # 0x15
.half   50                              # 0x32
.half   41                              # 0x29
.half   57                              # 0x39
.half   63                              # 0x3f
.half   6                               # 0x6
.half   12                              # 0xc
.half   18                              # 0x12
.half   24                              # 0x18
.half   27                              # 0x1b
.half   33                              # 0x21
.half   39                              # 0x27
.half   16                              # 0x10
.half   37                              # 0x25
.half   45                              # 0x2d
.half   47                              # 0x2f
.half   30                              # 0x1e
.half   53                              # 0x35
.half   49                              # 0x31
.half   56                              # 0x38
.half   62                              # 0x3e
.half   11                              # 0xb
.half   23                              # 0x17
.half   32                              # 0x20
.half   36                              # 0x24
.half   44                              # 0x2c
.half   52                              # 0x34
.half   55                              # 0x37
.half   61                              # 0x3d
.half   22                              # 0x16
.half   43                              # 0x2b
.half   51                              # 0x33
.half   60                              # 0x3c
.half   42                              # 0x2a
.half   59                              # 0x3b
.half   58                              # 0x3a
8016

Understood. Thanks!
Now, the alignment is of CA.

8018

Removed Load. It is just getExtLoad with MVT::i8 now.

8027

I have added above check!

gsocshubham added inline comments.Aug 4 2022, 8:09 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8064

Regarding this check -

`!VT.isVector() && TLI.isOperationExpand(ISD::CTPOP, VT) &&

!isOperationLegal(ISD::CTLZ, VT)`

I am not sure whether these checks are complete to allow new lowering. I have these 3 failures currently -

LLVM :: CodeGen/AVR/cttz.ll
LLVM :: CodeGen/RISCV/ctlz-cttz-ctpop.ll
LLVM :: CodeGen/RISCV/rv64zbb.ll
  1. When I update these tests using utils/update_llc_test_checks.py, it crashes. Does it mean the new lowering is generating wrong code for AVR and RISCV?
  1. Or is it like I am lowering those cases which should not be lowered in first place. Like adding some more checks in above if() block.
craig.topper added inline comments.Aug 4 2022, 8:42 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8003

The RISC-V crash needs this to fix

Lookup = DAG.getSExtOrTrunc(Lookup, dl, getPointerTy(TD));

Still looking at AVR

craig.topper added inline comments.Aug 4 2022, 8:59 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8066

This needs to be

if (SDValue V = CTTZTableLookup(Node, DAG, dl, VT, Op, NumBitsPerElt))
  return V;

Because there is an early out in CTTZTableLookup for types other than i32/i64.

craig.topper added inline comments.Aug 4 2022, 9:02 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8066

That will fix the AVR failure

Fix LIT test failures.

Add cttz.i64() in SPARC/cttz.ll

gsocshubham added inline comments.Aug 4 2022, 11:21 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8003

Added above line!

8066

Updated as above.

@craig.topper - Thanks a lot for the help. Are there any other suggestions on above patch? or Is it good to merge?

craig.topper added inline comments.Aug 4 2022, 11:23 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8035

You don't need this. We're in a member function of TargetLowering. DAG.getTargetLoweringInfo() is equivalent to this.

Fix review comment.

gsocshubham added inline comments.Aug 4 2022, 12:37 PM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8035

Got it. Removed.

Is it fine now?

craig.topper added inline comments.Aug 4 2022, 1:35 PM
llvm/include/llvm/CodeGen/TargetLowering.h
4764

No need for long here.

SDLoc should be passed by const reference.

Fix review comments.

llvm/include/llvm/CodeGen/TargetLowering.h
4764

Done.

Is it in the state to merge?

craig.topper accepted this revision.Aug 5 2022, 8:04 AM

LGTM to me with those two changes

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8001

You don't need getZExtValue() here

8018

I don't think you need an explicit EVT around MVT::i8

This revision is now accepted and ready to land.Aug 5 2022, 8:04 AM
barannikov88 added inline comments.Aug 5 2022, 8:55 AM
llvm/include/llvm/CodeGen/TargetLowering.h
4762

The comment is still confusing. The function does not return reference to the generated table, it returns the expanded node.
The "brief" part of the should also be fixed. Currently, it documents the way the function is used, which it should not. Uses may be added or removed, while the comment should stay the same.

gsocshubham added inline comments.Aug 5 2022, 11:08 AM
llvm/include/llvm/CodeGen/TargetLowering.h
4762

How about now?

/// Expand CTTZ node if CTLZ/CTPOP operations are not legal.
/// \param N Node to expand
/// \returns The expansion result or SDValue() if it fails.

@barannikov88 - Is brief part fine now? If not, can you please suggest a better one?

barannikov88 added inline comments.Aug 5 2022, 12:31 PM
llvm/include/llvm/CodeGen/TargetLowering.h
4762

Sounds ok to me now.
My suggestion would be "Expands CTTZ via table lookup". This would not imply that ctlz/ctpop have to be illegal if you want to use this method. This, however, is what the name of the method already says, making the whole comment kind of redundant. Use it or leave it as it is; I guess it is not a major issue since noone else objected.

barannikov88 accepted this revision.Aug 5 2022, 12:32 PM

Thank you for your work!

Fix review comments.

gsocshubham added inline comments.Aug 6 2022, 12:16 AM
llvm/include/llvm/CodeGen/TargetLowering.h
4762

Thanks for suggestion. It looks better now!

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8001

Removed.

8018

Removed. Done.

Can someone please commit this patch with below details? I do not have commit access.

Name - Shubham Narlawar
Email - shubham.narlawar@rrlogic.co.in

"Shubham Narlawar <shubham.narlawar@rrlogic.co.in>"

dmgreen accepted this revision.Aug 8 2022, 2:44 AM

LGTM - I can commit this for you. I will do so now.

I was trying some testing before I did, everything seems to be alright. The code isn't as optimal as it could be in places, but it is certainly an improvement over what was already present. I tried other backend to see if they all at least compiled successfully. The only one that didn't was BPF, but that appears to be an pre-existing condition. All the others seemed OK for a compile at least, and the results on Arm/hacked up AArch64 produce the correct values. Looks OK.

This revision was landed with ongoing or failed builds.Aug 8 2022, 4:08 AM
This revision was automatically updated to reflect the committed changes.