This is an archive of the discontinued LLVM Phabricator instance.

Emit table lookup from TargetLowering::expandCTTZ()
ClosedPublic

Authored by gsocshubham on Jun 30 2022, 7:17 AM.

Details

Summary

This patch emits table lookup in expandCTTZ. The patch is child revision of https://reviews.llvm.org/D113291.

Context -

https://reviews.llvm.org/D113291 transforms set of IR instructions to cttz intrinsic but there are some targets which does not support CTTZ or CTLZ. Hence, I generate a table lookup in TargetLowering::expandCTTZ().

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
gsocshubham added inline comments.Jul 15 2022, 5:02 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7853

I can use DAG.getNode(ISD::ROTL, dl, VT, Op, DAG.getConstant(i, dl, SHVT)); instead of << but this changes variable from unsigned int to SDNode.

7858–7867

I tried ConstantDataArray::get(*DAG.getContext(), RshrArr) but it does not generate table in constant pool since RshrArr is not array of unsigned int and not Constant*

I think there might be better alternative to simplify above. I am exploring it.

dmgreen added inline comments.Jul 18 2022, 2:51 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7849

Move DeBruijn up so that it can be re-used in the creation of Lookup above.
Perhaps add a variable for the Shift Amount too, which is 27 here but in general is BitWidth - log2(BitWdith), I think.

7853

That seems to be mixing up the instructions we want to produce in the output (DAG.getNode) and the calculations we are doing at compile time.
If DeBruijn is an APInt of the correct size APInt DeBruijn(32, 0x077CB531U), then it will have a rotl method. The advantage of APInt is that they will also be able to work with other bitwidths, once those are added.

llvm/test/CodeGen/VE/Scalar/cttz.ll
44

If the target has a ctpop instruction then that should be preferred to the table lookup.

Fixing review comments - Using APInt for DeBruijn constant.

I am using *DeBruijn.getRawData() to get the integer from the APInt but it does not seem appropriate way. Can anyone suggest me good approach to fetch integer from an APInt? I went through Constants.h and found getRawData() is the only way to fetch a data pointer which then I have to derefer it.

gsocshubham added inline comments.Jul 21 2022, 12:57 PM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7849

Done. Can you please give me some suggestions to make usage of APInt minimal and clean?

7853

Understood. I have used - APInt DeBruijn(32, 0x077CB531U)

Is it correct way? Can you suggest a way to simplify below usage of APInt?

7856

Regarding this line - RshrArr[Rshr] = i;

In debug mode, when I run SPARC/cttz.ll which is present in this patch, I get a segmentation fault. but in release it is working fine.

Why is there a difference in the behaviour?

llvm/test/CodeGen/VE/Scalar/cttz.ll
44

I have added a check - !isOperationLegal(ISD::CTPOP, VT) but that does not seem to help.

barannikov88 added inline comments.Jul 21 2022, 3:23 PM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7856

Most probably there is a bug in your code. Most probably you haven't enabled assertions in release mode (-DLLVM_ENABLE_ASSERTIONS=ON is the default in debug mode).
You should be able to find some hints in the printed backtrace, but first make sure that llvm-symbolizer target is built.

llvm/test/CodeGen/VE/Scalar/cttz.ll
44

VE sets the action to Promote. Perhaps, you should call isOperationExpand instead and do the transformation if it returns true.

barannikov88 added inline comments.Jul 21 2022, 3:54 PM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7838

I think your code should be moved a little higher, before SDValue Tmp = . Otherwise the Tmp node will be unused if your optimization has been applied.
The check will probably need some corrections. Note the comments above, they may help deduce the correct condition.
Would also be better if you extract your implementation into separate function.

7841

getRawData is very low level method, you should be able to easily avoid it.

7844

Be sure to clang-format your changes before submitting a patch. This is required both by the coding style and by the contribution guidelines.

7846

Ditto, don't use getRawData.

7850

There are always exactly 32 elements, you can use plain C array and avoid dynamic memory allocation at all.
This is a bit of hard-code though, so SmallVector with 32 minimum size might be better.

7854

Ditto here and on the next line. APInt Lshr = DeBruijn.rotl(i) should do.

7870

You should use the alignment requirement of the array (i.e. CA), not of its element. They may differ.

Fix review comments.

There are 3 LIT test failures -

LLVM :: CodeGen/RISCV/ctlz-cttz-ctpop.ll
LLVM :: CodeGen/RISCV/rv32zbb.ll
LLVM :: CodeGen/RISCV/rv64zbb.ll

I will update them as soon as table lookup code is finalized which is still under review.

gsocshubham added inline comments.Jul 22 2022, 4:29 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7838

Moved code above SDValue Tmp = and added into a separate function.

I am using below check to do the optimization -

if (NumBitsPerElt == 32 && !VT.isVector() &&
      TLI.isOperationExpand(ISD::CTPOP, VT) && !isOperationLegal(ISD::CTLZ, VT))
7841

Done. Now I am using - getZExtValue()

7844

I have formatted code using -

$INSTALL/bin/clang-format -style=LLVM TargetLowering.cpp -i --lines=7866:7872

7846

Updated to getZExtValue()

7850

I am using plain C array and got rid of dynamic memory allocation error.

7854

Done.

7856

Thanks. -DLLVM_ENABLE_ASSERTIONS=ON this helped.

gsocshubham added inline comments.Jul 22 2022, 4:36 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7870

From the assembly dump of SPARC/cttz.ll, I am not sure whether to use array element alignment or array alignment?

If I use array alignment CA, I get below assembly as compared to SPARC/cttz.ll assembly if array element alignment is used. What do you think? Should I update from CPIdx to CA?

f:                                      ! @f
        .cfi_startproc
! %bb.0:                                ! %entry
        mov     %o0, %o1
        cmp %o0, 0
        be      .LBB0_2
        mov     %g0, %o0
! %bb.1:                                ! %entry
        sub %o0, %o1, %o0
        and %o1, %o0, %o0
        sethi 122669, %o1
        or %o1, 305, %o1
        smul %o0, %o1, %o0
        srl %o0, 27, %o0
        sethi %hi(.LCPI0_0), %o1
        add %o1, %lo(.LCPI0_0), %o1
        add %o1, %o0, %o2
        ldub [%o2+2], %o3
        ldub [%o2+3], %o4
        ldub [%o1+%o0], %o0
        ldub [%o2+1], %o1
        sll %o3, 8, %o2
        or %o2, %o4, %o2
        sll %o0, 8, %o0
        or %o0, %o1, %o0
        sll %o0, 16, %o0
        or %o0, %o2, %o0
barannikov88 added inline comments.Jul 22 2022, 8:10 AM
llvm/include/llvm/CodeGen/TargetLowering.h
4725 ↗(On Diff #446780)

I believe it returns something else

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7796

You should be able to avoid getZExtValue here either.
Moreover, you don't need APInt here at all, the values are small enough to be put into 'unsigned', e.g.:
unsinged ShiftAmt = BitWidth - Log2_32(BitWidth);
Same for most other APInts.

7805

Could as well be C array.

7809

There is no standalone "rotl" function which would work for 'unsigned', but you could do something like this:
`Lshr = (DeBruijn << i) | (DeBruijn >> (NumBitsPerElt - Amt)));

7839

This is redundant. Your code is already in TargetLowering class. Just call isOperationExpand directly.

barannikov88 added inline comments.Jul 22 2022, 8:19 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7870

I meant you should call
TD.getPrefTypeAlign(Elts->getType())
instead of
TD.getPrefTypeAlign(Elts[0]->getType()
Is the above assembly a result of such change, or did you do something different?

llvm/test/CodeGen/SPARC/cttz.ll
5

Unused

27

Why not just ret i32 %0 ?

When we get that far, we might need to add a way for targets to opt out of the new lowering if it will cause them problems.

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7795

Sorry the suggestion for using APInt wasn't very clear. The point is that DeBruijn isn't necessarily a 32bit quantity - there are other values for 8/16/64. We should add i64 at least, so long as they look beneficial (which they should for 64, not sure about the other values). APInt makes that simpler because it can store any size without overflow, and the operations are performed on the right size.

BitWidth and ShiftAmt can remain as unsigned - the values will always easily fit in an unsigned value. ShiftWidth can be Bitwidth - Log2_32(BitWidth). I would rename the "NumBitsPerElt" argument to "Bitwidth" too, to make it clear what it is and avoid the need for the two.

7801

I think we can make a constant from a APInt directly.

7806

This shouldn't be a plain C array. It's size is dependant on the BitWidth. SmallVector<uint8_t> RshrArr(BitWidth, 0) should create an array that is initialized to 0's with the correct size.

7810

APInt Rshr = Lshr.lshr(ShiftAmt), then use Rshr.getZExtValue() in the line below. It is a little strange to use getZExtValue in an array index, but so long as the array is a safe type, it should complain if the value is out of bounds.

7814–7819

Do we need this loop, or can we create the array from the constant pool directly? The elements should be MVT::i8.

auto *CA = ConstantDataArray::get(*DAG.getContext(), RshrArr);
7829

The load should be loading MVT::i8, extended the result into VT.

gsocshubham added inline comments.Jul 24 2022, 2:49 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7814–7819

If I directly use RshrArr, I get below table in assembly -

.LCPI0_0:
        .ascii  "\000\001\034\002\035\016\030\003\036\026\024\017\031\021\004\b\037\033\r\027\025\023\020\007\032\f\022\006\013\005\n\t"

instead of -

.LCPI0_0:
        .word   0                               ! 0x0
        .word   1                               ! 0x1
        .word   28                              ! 0x1c
        .word   2                               ! 0x2
        .word   29                              ! 0x1d
        .word   14                              ! 0xe
        .word   24                              ! 0x18
        .word   3                               ! 0x3
        .word   30                              ! 0x1e
        .word   22                              ! 0x16
        .word   20                              ! 0x14
        .word   15                              ! 0xf
        .word   25                              ! 0x19
        .word   17                              ! 0x11
        .word   4                               ! 0x4
        .word   8                               ! 0x8
        .word   31                              ! 0x1f
        .word   27                              ! 0x1b
        .word   13                              ! 0xd
        .word   23                              ! 0x17
        .word   21                              ! 0x15
        .word   19                              ! 0x13
        .word   16                              ! 0x10
        .word   7                               ! 0x7
        .word   26                              ! 0x1a
        .word   12                              ! 0xc
        .word   18                              ! 0x12
        .word   6                               ! 0x6
        .word   11                              ! 0xb
        .word   5                               ! 0x5
        .word   10                              ! 0xa
        .word   9                               ! 0x9
        .text
        .globl  f
dmgreen added inline comments.Jul 24 2022, 8:29 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7814–7819

Yes - that seems better to be, so long as it is loading i8's from the array. The .word's will be i32 I think, which uses much more data than it needs, as all the values are in the range 0-BitWidth.

gsocshubham marked an inline comment as not done.Jul 24 2022, 8:58 AM

Fix review comments.

With this revision, there are 2 LIT test failures from RISCV backend which I am guessing will be fixed once we finalize conditions for lowering check.

LLVM :: CodeGen/RISCV/ctlz-cttz-ctpop.ll
LLVM :: CodeGen/RISCV/rv64zbb.ll
gsocshubham added inline comments.Jul 24 2022, 11:39 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7801

I tried replacing DAG.getConstant(DeBruijn.getZExtValue(), dl, VT)) with just DeBruijn but there does not seem a direct conversion from APInt to SDValue`?

7806

Changed it to SmallVector. Thanks!

7810

Done.

7814–7819

Done accordingly.

7829

Can you please elaborate it? I did not understand it. Do you mean to change VT to MVT::i8 in the return statement?

7870

I did something differently. But now it seems fine. Now, we don't have Elts and hence I am taking alignment of an element from RshrArr. Is it fine?

llvm/test/CodeGen/SPARC/cttz.ll
5

Removed.

27

Updated test accordingly.

gsocshubham added inline comments.Jul 24 2022, 11:52 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7795

Here, do you mean to change APInt DeBruijn(32, 0x077CB531U) to APInt DeBruijn(64, 0x0218A392CD3D5DBF)?

NumBitsPerElt is 32 even in the case for call i64 @llvm.cttz.i64(i64 %x, i1 true)

Any suggestions on above?

dmgreen added inline comments.Jul 25 2022, 5:51 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7795

It would be based on the BitWidth, providing that the BitWidth is known to be 32 or 64:

APInt DeBruijn = BitWidth == 32 ? APInt(32, 0x077CB531U) : APInt(64, 0x0218A392CD3D5DBFULL)

For some targets the i64 cttz will be legalized to a i32 cttz. It is for 64bit targets that the 64bit variant is useful.

7829

We want to create an array of i8 elements, load an i8 from it at the right index, and extend that to the original VT. That can either be done by creating a load of an i8 and calling DAG.getZExtOrTrunc - but that might introduce an illegal type where one cannot be created. So it may need to create a ZEXTLOAD load directly.

craig.topper added inline comments.Jul 25 2022, 10:02 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7968

This should be getShiftAmountConstant.

7978

We shouldn't need a ConstantSDNode and ConstantInt to get the Type*. You can use VT.getTypeForEVT(*DAG.getContext()) in place of CI->getType()

Though really we should be using an array of Int8Ty and doing a zextload from i8 to VT.

8025

Should this be checking CTTZ_ZERO_UNDEF? The zero case is not handled correctly by the table lookup. For CTTZ we need a select. CodeGenPrepare rewrites llvm.cttz(i32 %x, i1 false) into a branch around llvm.cttz(i32 %x, i1 true) on some targets. So the difference might be hard to test.

craig.topper added inline comments.Jul 25 2022, 10:07 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7853

Is it really cyclic? The multiply is a shift by a power of 2. So emulating the multiply should be a shl not rotl.

craig.topper added inline comments.Jul 25 2022, 10:14 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7986

Lookup needs a sextOrTrunc to the target's pointer type.

craig.topper added inline comments.Jul 25 2022, 10:18 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7972

What does Lshr mean? shr is usually "shift right", and lshr is logical shift right. But here L means left? But that means I don't know what shr means.

Fix review comments.

gsocshubham added inline comments.Jul 26 2022, 6:21 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7795

Done.

7829

Created a load using getZExtOrTrunc. If type is illegal then I am returning getExtLoad(ISD::ZEXTLOAD....

Let me know your suggestions on above.

7853

Updated to shl().

7968

Updated accordingly.

7972

It was a typo in naming. I have updated it accordingly.

7978

Done.

7986

I have created a new variable TargetLookup using sextOrTrunc().

gsocshubham added inline comments.Jul 26 2022, 6:30 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8025

Can you point me to target and test where above scenario occurs? I will update it accordingly.

craig.topper added inline comments.Jul 26 2022, 9:16 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7970

You overwrite Lookup here instead of creating TargetLookup.

7983

The ConstantDataArray is an array of bytes, but the alignment here is based on VT. Should be MVT::i8

7986

The data in the array is only 8-bits. The load type shouldn't be VT.

7990

NewVT is the same as VT.

7994

ZExtOrTrunc and Load are unused if this else is taken. We shouldn't create dead nodes if it can be avoided.

8025

Compiling llvm.cttz.i32 for riscv32 with -O0 should do it I think. -O0 will disable codegen prepare.

gsocshubham added inline comments.Jul 28 2022, 10:42 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8025

@craig.topper - What is the desired output value from the table when x=0?

define i32 @f(i32 %x) {
entry:
  %0 = call i32 @llvm.cttz.i32(i32 %x, i1 true)
  ret i32 %0
}
craig.topper added inline comments.Jul 28 2022, 10:46 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8025

With i1 true there is no desired out put for 0. I'm concerned about i1 false which needs to produce 32.

Fix some of review comments.

gsocshubham added inline comments.Jul 29 2022, 4:54 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7970

Done.

7983

I tried using MVT::i8 by converting it to Value Type to be used instead of VT but there seems some compatibility issue.

7990

Updated.

7994

Modified the blocks. Actually else part depends on ZExtOrTrunc.

craig.topper added inline comments.Jul 29 2022, 3:24 PM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7984

Replace VT.getTypeForEVT(*DAG.getContext()) with CA->getType()

7986

What happens if you always create a zextload even if the type isn't legal?

7989

This ZextOrTrunc doesn't do anything, the Load already has VT as its ValueType.

7994

The VT being passed to the operand named MemVT needs to be MemVT.

dmgreen added inline comments.Jul 30 2022, 12:52 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7853

Yeah DeBruijn sequences are cyclic. They are explained here: https://en.wikipedia.org/wiki/De_Bruijn_sequence#Construction. The lower bits of the last elements use the wrapped-around upper bits of the constant. A rotate is more general than a shift, but for the constants we chose the upper value in the constant is always zero, so they become equivalent. If we are relying on a shift on the generated code, using a shift here too sounds OK.

7960

If BitWidth isn't 32 or 64, we need to return SDValue.
We need to make sure the 64it value is tested too.

gsocshubham added inline comments.Aug 1 2022, 4:41 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7984

If I change alignment as above, then for call i32 @llvm.cttz.i32(i32 %x, i1 true) compiled with SPARC -

I get -

f:                                      ! @f
        .cfi_startproc
! %bb.0:                                ! %entry
        mov     %g0, %o1
        sub %o1, %o0, %o1
        and %o0, %o1, %o0
        sethi 122669, %o1
        or %o1, 305, %o1
        smul %o0, %o1, %o0
        srl %o0, 27, %o0
        sethi %hi(.LCPI0_0), %o1
        add %o1, %lo(.LCPI0_0), %o1
        add %o1, %o0, %o2
        ldub [%o2+2], %o3
        ldub [%o2+3], %o4
        ldub [%o1+%o0], %o0
        ldub [%o2+1], %o1
        sll %o3, 8, %o2
        or %o2, %o4, %o2
        sll %o0, 8, %o0
        or %o0, %o1, %o0
        sll %o0, 16, %o0
        retl
        or %o0, %o2, %o0

instead of -

mov     %g0, %o1
sub %o1, %o0, %o1
and %o0, %o1, %o0
sethi 122669, %o1
or %o1, 305, %o1
smul %o0, %o1, %o0
srl %o0, 27, %o0
sethi %hi(.LCPI0_0), %o1
add %o1, %lo(.LCPI0_0), %o1
retl
ld [%o1+%o0], %o0
7986

There is no change in generated code nor in LIT test failures as it was originally submitted earlier.

craig.topper added inline comments.Aug 1 2022, 8:31 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7986

A zext load with i8 memVT?

gsocshubham added inline comments.Aug 1 2022, 11:34 PM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7986

Zext eith memVt without checking VT type legality for llvm.cttz.i32(i32 %x, i1 true) compiled with SPARC gives -

f:                                      ! @f
        .cfi_startproc
! %bb.0:                                ! %entry
        mov     %o0, %o1
        cmp %o0, 0
        be      .LBB0_2
        mov     %g0, %o0
! %bb.1:                                ! %entry
        sub %o0, %o1, %o0
        and %o1, %o0, %o0
        sethi 122669, %o1
        or %o1, 305, %o1
        smul %o0, %o1, %o0
        srl %o0, 27, %o0
        sethi %hi(.LCPI0_0), %o1
        add %o1, %lo(.LCPI0_0), %o1
        ld [%o1+%o0], %o0
gsocshubham added inline comments.Aug 1 2022, 11:35 PM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7986

Zext eith memVt without checking VT type legality for llvm.cttz.i32(i32 %x, i1 true) compiled with SPARC gives -

f:                                      ! @f
        .cfi_startproc
! %bb.0:                                ! %entry
        mov     %o0, %o1
        cmp %o0, 0
        be      .LBB0_2
        mov     %g0, %o0
! %bb.1:                                ! %entry
        sub %o0, %o1, %o0
        and %o1, %o0, %o0
        sethi 122669, %o1
        or %o1, 305, %o1
        smul %o0, %o1, %o0
        srl %o0, 27, %o0
        sethi %hi(.LCPI0_0), %o1
        add %o1, %lo(.LCPI0_0), %o1
        ld [%o1+%o0], %o0

ZExt with MemVT

Fix review comments.

gsocshubham added inline comments.Aug 3 2022, 7:47 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7960

Done. Added a check.

There are already tests present for cttz.i64 which I have updated in this patch.

gsocshubham added inline comments.Aug 3 2022, 7:50 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7998

@craig.topper - What should be the check here for case llvm.cttz(i32 %x, i1 false)?

It can't be isOperationLegalOrCustom(ISD::CTTZ_ZERO_UNDEF, VT) because that is taken care before this new lowering.

dmgreen added inline comments.Aug 3 2022, 9:41 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7960

This needs to handle other types too - just doing if (BitWidth != 32 && BitWidth != 64) is probably easiest.

7976

Why has this become uint16_t? I don't think it needs to be any bigger than i8 to hold all the values.

7989

It doesn't need to create a load just to create another load. It can just use the getExtLoad method with the MemoryVT set to MVT::i8. I think the other parameters can be default/left out.

7998

I think, if I understand, this should be if (Node->getOpcode() != ISD::CTLZ_ZERO_UNDEF) {
But I'm not sure what this does to the profitability of the transform.

It might be possible to just encode it into the table. It involves changing how the table to have bitwidth+1 elements, and using a different debruijn constant so that the zero element can be the bitwidth.

craig.topper added inline comments.Aug 3 2022, 10:11 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7987

The Type for the alignment needs to be the type of the elements in CA which is based on what data type is used for the Table SmallVector.

barannikov88 added inline comments.Aug 3 2022, 10:46 PM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7987

Why the type of the elements specifically and not of the CA itself?
AFAIK the alignment passed to the load is the "base alignment"; the alignment of the accessed element will be inferred based on the base alignment and the offset of the element.

Fix review comments.

gsocshubham added inline comments.Aug 4 2022, 3:39 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7960

Updated accordingly.

7976

Reverted to uint8_t.

For @llvm.cttz.i64(i64 %x, i1 true) compiled with riscv64, I am getting table sequence as -

.ascii  "\000\001\002\007\003\r\b\023\004\031\016\034\t\"\024(\005\021\032&\017.\0350\n\037#6\0252)9?\006\f\022\030\033!'\020%-/\036518>\013\027 $,47=\026+3<*;:"

Hence, earlier I had changed it to uint16_t where I was getting -

.half   0                               # 0x0
.half   1                               # 0x1
.half   2                               # 0x2
.half   7                               # 0x7
.half   3                               # 0x3
.half   13                              # 0xd
.half   8                               # 0x8
.half   19                              # 0x13
.half   4                               # 0x4
.half   25                              # 0x19
.half   14                              # 0xe
.half   28                              # 0x1c
.half   9                               # 0x9
.half   34                              # 0x22
.half   20                              # 0x14
.half   40                              # 0x28
.half   5                               # 0x5
.half   17                              # 0x11
.half   26                              # 0x1a
.half   38                              # 0x26
.half   15                              # 0xf
.half   46                              # 0x2e
.half   29                              # 0x1d
.half   48                              # 0x30
.half   10                              # 0xa
.half   31                              # 0x1f
.half   35                              # 0x23
.half   54                              # 0x36
.half   21                              # 0x15
.half   50                              # 0x32
.half   41                              # 0x29
.half   57                              # 0x39
.half   63                              # 0x3f
.half   6                               # 0x6
.half   12                              # 0xc
.half   18                              # 0x12
.half   24                              # 0x18
.half   27                              # 0x1b
.half   33                              # 0x21
.half   39                              # 0x27
.half   16                              # 0x10
.half   37                              # 0x25
.half   45                              # 0x2d
.half   47                              # 0x2f
.half   30                              # 0x1e
.half   53                              # 0x35
.half   49                              # 0x31
.half   56                              # 0x38
.half   62                              # 0x3e
.half   11                              # 0xb
.half   23                              # 0x17
.half   32                              # 0x20
.half   36                              # 0x24
.half   44                              # 0x2c
.half   52                              # 0x34
.half   55                              # 0x37
.half   61                              # 0x3d
.half   22                              # 0x16
.half   43                              # 0x2b
.half   51                              # 0x33
.half   60                              # 0x3c
.half   42                              # 0x2a
.half   59                              # 0x3b
.half   58                              # 0x3a
7987

Understood. Thanks!
Now, the alignment is of CA.

7989

Removed Load. It is just getExtLoad with MVT::i8 now.

7998

I have added above check!

gsocshubham added inline comments.Aug 4 2022, 8:09 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8035

Regarding this check -

`!VT.isVector() && TLI.isOperationExpand(ISD::CTPOP, VT) &&

!isOperationLegal(ISD::CTLZ, VT)`

I am not sure whether these checks are complete to allow new lowering. I have these 3 failures currently -

LLVM :: CodeGen/AVR/cttz.ll
LLVM :: CodeGen/RISCV/ctlz-cttz-ctpop.ll
LLVM :: CodeGen/RISCV/rv64zbb.ll
  1. When I update these tests using utils/update_llc_test_checks.py, it crashes. Does it mean the new lowering is generating wrong code for AVR and RISCV?
  1. Or is it like I am lowering those cases which should not be lowered in first place. Like adding some more checks in above if() block.
craig.topper added inline comments.Aug 4 2022, 8:42 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7974

The RISC-V crash needs this to fix

Lookup = DAG.getSExtOrTrunc(Lookup, dl, getPointerTy(TD));

Still looking at AVR

craig.topper added inline comments.Aug 4 2022, 8:59 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8037

This needs to be

if (SDValue V = CTTZTableLookup(Node, DAG, dl, VT, Op, NumBitsPerElt))
  return V;

Because there is an early out in CTTZTableLookup for types other than i32/i64.

craig.topper added inline comments.Aug 4 2022, 9:02 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8037

That will fix the AVR failure

Fix LIT test failures.

Add cttz.i64() in SPARC/cttz.ll

gsocshubham added inline comments.Aug 4 2022, 11:21 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7974

Added above line!

8037

Updated as above.

@craig.topper - Thanks a lot for the help. Are there any other suggestions on above patch? or Is it good to merge?

craig.topper added inline comments.Aug 4 2022, 11:23 AM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8006

You don't need this. We're in a member function of TargetLowering. DAG.getTargetLoweringInfo() is equivalent to this.

Fix review comment.

gsocshubham added inline comments.Aug 4 2022, 12:37 PM
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8006

Got it. Removed.

Is it fine now?

craig.topper added inline comments.Aug 4 2022, 1:35 PM
llvm/include/llvm/CodeGen/TargetLowering.h
4764 ↗(On Diff #450102)

No need for long here.

SDLoc should be passed by const reference.

Fix review comments.

llvm/include/llvm/CodeGen/TargetLowering.h
4764 ↗(On Diff #450102)

Done.

Is it in the state to merge?

craig.topper accepted this revision.Aug 5 2022, 8:04 AM

LGTM to me with those two changes

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8001

You don't need getZExtValue() here

8018

I don't think you need an explicit EVT around MVT::i8

This revision is now accepted and ready to land.Aug 5 2022, 8:04 AM
barannikov88 added inline comments.Aug 5 2022, 8:55 AM
llvm/include/llvm/CodeGen/TargetLowering.h
4762 ↗(On Diff #450227)

The comment is still confusing. The function does not return reference to the generated table, it returns the expanded node.
The "brief" part of the should also be fixed. Currently, it documents the way the function is used, which it should not. Uses may be added or removed, while the comment should stay the same.

gsocshubham added inline comments.Aug 5 2022, 11:08 AM
llvm/include/llvm/CodeGen/TargetLowering.h
4762 ↗(On Diff #450227)

How about now?

/// Expand CTTZ node if CTLZ/CTPOP operations are not legal.
/// \param N Node to expand
/// \returns The expansion result or SDValue() if it fails.

@barannikov88 - Is brief part fine now? If not, can you please suggest a better one?

barannikov88 added inline comments.Aug 5 2022, 12:31 PM
llvm/include/llvm/CodeGen/TargetLowering.h
4762 ↗(On Diff #450227)

Sounds ok to me now.
My suggestion would be "Expands CTTZ via table lookup". This would not imply that ctlz/ctpop have to be illegal if you want to use this method. This, however, is what the name of the method already says, making the whole comment kind of redundant. Use it or leave it as it is; I guess it is not a major issue since noone else objected.

barannikov88 accepted this revision.Aug 5 2022, 12:32 PM

Thank you for your work!

Fix review comments.

gsocshubham added inline comments.Aug 6 2022, 12:16 AM
llvm/include/llvm/CodeGen/TargetLowering.h
4762 ↗(On Diff #450227)

Thanks for suggestion. It looks better now!

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8001

Removed.

8018

Removed. Done.

Can someone please commit this patch with below details? I do not have commit access.

Name - Shubham Narlawar
Email - shubham.narlawar@rrlogic.co.in

"Shubham Narlawar <shubham.narlawar@rrlogic.co.in>"

dmgreen accepted this revision.Aug 8 2022, 2:44 AM

LGTM - I can commit this for you. I will do so now.

I was trying some testing before I did, everything seems to be alright. The code isn't as optimal as it could be in places, but it is certainly an improvement over what was already present. I tried other backend to see if they all at least compiled successfully. The only one that didn't was BPF, but that appears to be an pre-existing condition. All the others seemed OK for a compile at least, and the results on Arm/hacked up AArch64 produce the correct values. Looks OK.

This revision was landed with ongoing or failed builds.Aug 8 2022, 4:08 AM
This revision was automatically updated to reflect the committed changes.