This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
llvm/
-
lib/CodeGen/SelectionDAG/
-
CodeGen/
-
SelectionDAG/
49/107
TargetLowering.cpp
-
test/CodeGen/
-
CodeGen/
-
RISCV/
-
rv32zbb.ll
-
SPARC/
2/4
cttz.ll
-
VE/Scalar/
-
Scalar/
1/3
cttz.ll

Differential D128911

Emit table lookup from TargetLowering::expandCTTZ()
ClosedPublic

Authored by gsocshubham on Jun 30 2022, 7:17 AM.

Download Raw Diff

Details

Reviewers

craig.topper
momchil.velikov
greened
efriedma
barannikov88
dmgreen

Commits

rGab4fc87a9d96: [DAG] Emit table lookup from TargetLowering::expandCTTZ()

Summary

This patch emits table lookup in expandCTTZ. The patch is child revision of https://reviews.llvm.org/D113291.

Context -

https://reviews.llvm.org/D113291 transforms set of IR instructions to cttz intrinsic but there are some targets which does not support CTTZ or CTLZ. Hence, I generate a table lookup in TargetLowering::expandCTTZ().

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

gsocshubham added inline comments.Jul 15 2022, 5:02 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7853	I can use `DAG.getNode(ISD::ROTL, dl, VT, Op, DAG.getConstant(i, dl, SHVT));` instead of `<<` but this changes variable from `unsigned int` to `SDNode`.
7858–7867	I tried `ConstantDataArray::get(DAG.getContext(), RshrArr)` but it does not generate table in constant pool since `RshrArr` is not array of `unsigned int` and not `Constant` I think there might be better alternative to simplify above. I am exploring it.

dmgreen added inline comments.Jul 18 2022, 2:51 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7849	Move DeBruijn up so that it can be re-used in the creation of Lookup above. Perhaps add a variable for the Shift Amount too, which is 27 here but in general is BitWidth - log2(BitWdith), I think.
7853	That seems to be mixing up the instructions we want to produce in the output (DAG.getNode) and the calculations we are doing at compile time. If DeBruijn is an APInt of the correct size `APInt DeBruijn(32, 0x077CB531U)`, then it will have a rotl method. The advantage of APInt is that they will also be able to work with other bitwidths, once those are added.
llvm/test/CodeGen/VE/Scalar/cttz.ll
44	If the target has a ctpop instruction then that should be preferred to the table lookup.

Fixing review comments - Using APInt for DeBruijn constant.

I am using *DeBruijn.getRawData() to get the integer from the APInt but it does not seem appropriate way. Can anyone suggest me good approach to fetch integer from an APInt? I went through Constants.h and found getRawData() is the only way to fetch a data pointer which then I have to derefer it.

Harbormaster completed remote builds in B176845: Diff 446612.Jul 21 2022, 12:49 PM

gsocshubham added inline comments.Jul 21 2022, 12:57 PM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7849	Done. Can you please give me some suggestions to make usage of APInt minimal and clean?
7853	Understood. I have used - `APInt DeBruijn(32, 0x077CB531U)` Is it correct way? Can you suggest a way to simplify below usage of APInt?
7856	Regarding this line - `RshrArr[Rshr] = i;` In debug mode, when I run `SPARC/cttz.ll` which is present in this patch, I get a segmentation fault. but in release it is working fine. Why is there a difference in the behaviour?
llvm/test/CodeGen/VE/Scalar/cttz.ll
44	I have added a check - `!isOperationLegal(ISD::CTPOP, VT)` but that does not seem to help.

barannikov88 added a subscriber: barannikov88.Jul 21 2022, 3:14 PM

barannikov88 added inline comments.Jul 21 2022, 3:23 PM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7856	Most probably there is a bug in your code. Most probably you haven't enabled assertions in release mode (-DLLVM_ENABLE_ASSERTIONS=ON is the default in debug mode). You should be able to find some hints in the printed backtrace, but first make sure that `llvm-symbolizer` target is built.
llvm/test/CodeGen/VE/Scalar/cttz.ll
44	VE sets the action to `Promote`. Perhaps, you should call `isOperationExpand` instead and do the transformation if it returns true.

barannikov88 added inline comments.Jul 21 2022, 3:54 PM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7838	I think your code should be moved a little higher, before `SDValue Tmp =` . Otherwise the `Tmp` node will be unused if your optimization has been applied. The check will probably need some corrections. Note the comments above, they may help deduce the correct condition. Would also be better if you extract your implementation into separate function.
7841	`getRawData` is very low level method, you should be able to easily avoid it.
7844	Be sure to clang-format your changes before submitting a patch. This is required both by the coding style and by the contribution guidelines.
7846	Ditto, don't use `getRawData`.
7850	There are always exactly 32 elements, you can use plain C array and avoid dynamic memory allocation at all. This is a bit of hard-code though, so SmallVector with 32 minimum size might be better.
7854	Ditto here and on the next line. `APInt Lshr = DeBruijn.rotl(i)` should do.
7870	You should use the alignment requirement of the array (i.e. CA), not of its element. They may differ.

Fix review comments.

There are 3 LIT test failures -

LLVM :: CodeGen/RISCV/ctlz-cttz-ctpop.ll
LLVM :: CodeGen/RISCV/rv32zbb.ll
LLVM :: CodeGen/RISCV/rv64zbb.ll

I will update them as soon as table lookup code is finalized which is still under review.

Harbormaster completed remote builds in B176977: Diff 446780.Jul 22 2022, 4:23 AM

gsocshubham added inline comments.Jul 22 2022, 4:29 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7838	Moved code above `SDValue Tmp =` and added into a separate function. I am using below check to do the optimization - if (NumBitsPerElt == 32 && !VT.isVector() && TLI.isOperationExpand(ISD::CTPOP, VT) && !isOperationLegal(ISD::CTLZ, VT))
7841	Done. Now I am using - `getZExtValue()`
7844	I have formatted code using - `$INSTALL/bin/clang-format -style=LLVM TargetLowering.cpp -i --lines=7866:7872`
7846	Updated to `getZExtValue()`
7850	I am using plain C array and got rid of dynamic memory allocation error.
7854	Done.
7856	Thanks. `-DLLVM_ENABLE_ASSERTIONS=ON` this helped.

gsocshubham added inline comments.Jul 22 2022, 4:36 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp

7870

From the assembly dump of SPARC/cttz.ll, I am not sure whether to use array element alignment or array alignment?

If I use array alignment CA, I get below assembly as compared to SPARC/cttz.ll assembly if array element alignment is used. What do you think? Should I update from CPIdx to CA?

f:                                      ! @f
        .cfi_startproc
! %bb.0:                                ! %entry
        mov     %o0, %o1
        cmp %o0, 0
        be      .LBB0_2
        mov     %g0, %o0
! %bb.1:                                ! %entry
        sub %o0, %o1, %o0
        and %o1, %o0, %o0
        sethi 122669, %o1
        or %o1, 305, %o1
        smul %o0, %o1, %o0
        srl %o0, 27, %o0
        sethi %hi(.LCPI0_0), %o1
        add %o1, %lo(.LCPI0_0), %o1
        add %o1, %o0, %o2
        ldub [%o2+2], %o3
        ldub [%o2+3], %o4
        ldub [%o1+%o0], %o0
        ldub [%o2+1], %o1
        sll %o3, 8, %o2
        or %o2, %o4, %o2
        sll %o0, 8, %o0
        or %o0, %o1, %o0
        sll %o0, 16, %o0
        or %o0, %o2, %o0

barannikov88 added inline comments.Jul 22 2022, 8:10 AM

llvm/include/llvm/CodeGen/TargetLowering.h
4725 ↗	(On Diff #446780)	I believe it returns something else
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7796	You should be able to avoid getZExtValue here either. Moreover, you don't need APInt here at all, the values are small enough to be put into 'unsigned', e.g.: `unsinged ShiftAmt = BitWidth - Log2_32(BitWidth);` Same for most other APInts.
7805	Could as well be C array.
7809	There is no standalone "rotl" function which would work for 'unsigned', but you could do something like this: `Lshr = (DeBruijn << i) \| (DeBruijn >> (NumBitsPerElt - Amt)));
7839	This is redundant. Your code is already in TargetLowering class. Just call `isOperationExpand` directly.

barannikov88 added inline comments.Jul 22 2022, 8:19 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7870	I meant you should call `TD.getPrefTypeAlign(Elts->getType())` instead of `TD.getPrefTypeAlign(Elts[0]->getType()` Is the above assembly a result of such change, or did you do something different?
llvm/test/CodeGen/SPARC/cttz.ll
5	Unused
27	Why not just `ret i32 %0` ?

When we get that far, we might need to add a way for targets to opt out of the new lowering if it will cause them problems.

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7795	Sorry the suggestion for using APInt wasn't very clear. The point is that DeBruijn isn't necessarily a 32bit quantity - there are other values for 8/16/64. We should add i64 at least, so long as they look beneficial (which they should for 64, not sure about the other values). APInt makes that simpler because it can store any size without overflow, and the operations are performed on the right size. BitWidth and ShiftAmt can remain as unsigned - the values will always easily fit in an unsigned value. ShiftWidth can be `Bitwidth - Log2_32(BitWidth)`. I would rename the "NumBitsPerElt" argument to "Bitwidth" too, to make it clear what it is and avoid the need for the two.
7801	I think we can make a constant from a APInt directly.
7806	This shouldn't be a plain C array. It's size is dependant on the BitWidth. `SmallVector<uint8_t> RshrArr(BitWidth, 0)` should create an array that is initialized to 0's with the correct size.
7810	`APInt Rshr = Lshr.lshr(ShiftAmt)`, then use `Rshr.getZExtValue()` in the line below. It is a little strange to use getZExtValue in an array index, but so long as the array is a safe type, it should complain if the value is out of bounds.
7814–7819	Do we need this loop, or can we create the array from the constant pool directly? The elements should be MVT::i8. auto CA = ConstantDataArray::get(DAG.getContext(), RshrArr);
7829	The load should be loading MVT::i8, extended the result into VT.

dmgreen mentioned this in D113291: [AggressiveInstCombine] Lower Table Based CTTZ .Jul 23 2022, 4:47 AM

gsocshubham added inline comments.Jul 24 2022, 2:49 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp

7814–7819

If I directly use RshrArr, I get below table in assembly -

.LCPI0_0:
        .ascii  "\000\001\034\002\035\016\030\003\036\026\024\017\031\021\004\b\037\033\r\027\025\023\020\007\032\f\022\006\013\005\n\t"

instead of -

.LCPI0_0:
        .word   0                               ! 0x0
        .word   1                               ! 0x1
        .word   28                              ! 0x1c
        .word   2                               ! 0x2
        .word   29                              ! 0x1d
        .word   14                              ! 0xe
        .word   24                              ! 0x18
        .word   3                               ! 0x3
        .word   30                              ! 0x1e
        .word   22                              ! 0x16
        .word   20                              ! 0x14
        .word   15                              ! 0xf
        .word   25                              ! 0x19
        .word   17                              ! 0x11
        .word   4                               ! 0x4
        .word   8                               ! 0x8
        .word   31                              ! 0x1f
        .word   27                              ! 0x1b
        .word   13                              ! 0xd
        .word   23                              ! 0x17
        .word   21                              ! 0x15
        .word   19                              ! 0x13
        .word   16                              ! 0x10
        .word   7                               ! 0x7
        .word   26                              ! 0x1a
        .word   12                              ! 0xc
        .word   18                              ! 0x12
        .word   6                               ! 0x6
        .word   11                              ! 0xb
        .word   5                               ! 0x5
        .word   10                              ! 0xa
        .word   9                               ! 0x9
        .text
        .globl  f

dmgreen added inline comments.Jul 24 2022, 8:29 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7814–7819	Yes - that seems better to be, so long as it is loading i8's from the array. The .word's will be i32 I think, which uses much more data than it needs, as all the values are in the range 0-BitWidth.

gsocshubham marked an inline comment as not done.Jul 24 2022, 8:58 AM

Fix review comments.

With this revision, there are 2 LIT test failures from RISCV backend which I am guessing will be fixed once we finalize conditions for lowering check.

LLVM :: CodeGen/RISCV/ctlz-cttz-ctpop.ll
LLVM :: CodeGen/RISCV/rv64zbb.ll

Harbormaster completed remote builds in B177241: Diff 447148.Jul 24 2022, 11:33 AM

gsocshubham added inline comments.Jul 24 2022, 11:39 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7801	I tried replacing `DAG.getConstant(DeBruijn.getZExtValue(), dl, VT))` with just `DeBruijn` but there does not seem a direct conversion from `APInt` to SDValue`?
7806	Changed it to SmallVector. Thanks!
7810	Done.
7814–7819	Done accordingly.
7829	Can you please elaborate it? I did not understand it. Do you mean to change `VT` to `MVT::i8` in the return statement?
7870	I did something differently. But now it seems fine. Now, we don't have `Elts` and hence I am taking alignment of an element from `RshrArr`. Is it fine?
llvm/test/CodeGen/SPARC/cttz.ll
5	Removed.
27	Updated test accordingly.

gsocshubham added inline comments.Jul 24 2022, 11:52 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7795	Here, do you mean to change `APInt DeBruijn(32, 0x077CB531U)` to `APInt DeBruijn(64, 0x0218A392CD3D5DBF)`? `NumBitsPerElt` is 32 even in the case for `call i64 @llvm.cttz.i64(i64 %x, i1 true)`

Any suggestions on above?

dmgreen added inline comments.Jul 25 2022, 5:51 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7795	It would be based on the BitWidth, providing that the BitWidth is known to be 32 or 64: APInt DeBruijn = BitWidth == 32 ? APInt(32, 0x077CB531U) : APInt(64, 0x0218A392CD3D5DBFULL) For some targets the i64 cttz will be legalized to a i32 cttz. It is for 64bit targets that the 64bit variant is useful.
7829	We want to create an array of i8 elements, load an i8 from it at the right index, and extend that to the original VT. That can either be done by creating a load of an i8 and calling DAG.getZExtOrTrunc - but that might introduce an illegal type where one cannot be created. So it may need to create a ZEXTLOAD load directly.

craig.topper added inline comments.Jul 25 2022, 10:02 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7968	This should be getShiftAmountConstant.
7978	We shouldn't need a ConstantSDNode and ConstantInt to get the Type. You can use `VT.getTypeForEVT(DAG.getContext())` in place of `CI->getType()` Though really we should be using an array of Int8Ty and doing a zextload from i8 to VT.
8025	Should this be checking CTTZ_ZERO_UNDEF? The zero case is not handled correctly by the table lookup. For CTTZ we need a select. CodeGenPrepare rewrites llvm.cttz(i32 %x, i1 false) into a branch around llvm.cttz(i32 %x, i1 true) on some targets. So the difference might be hard to test.

craig.topper added inline comments.Jul 25 2022, 10:07 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7853	Is it really cyclic? The multiply is a shift by a power of 2. So emulating the multiply should be a shl not rotl.

craig.topper added inline comments.Jul 25 2022, 10:14 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7986	`Lookup` needs a sextOrTrunc to the target's pointer type.

craig.topper added inline comments.Jul 25 2022, 10:18 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7972	What does `Lshr` mean? `shr` is usually "shift right", and lshr is logical shift right. But here `L` means left? But that means I don't know what `shr` means.

Fix review comments.

Harbormaster completed remote builds in B177585: Diff 447662.Jul 26 2022, 6:15 AM

gsocshubham added inline comments.Jul 26 2022, 6:21 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7795	Done.
7829	Created a load using `getZExtOrTrunc`. If type is illegal then I am returning `getExtLoad(ISD::ZEXTLOAD....` Let me know your suggestions on above.
7853	Updated to `shl()`.
7968	Updated accordingly.
7972	It was a typo in naming. I have updated it accordingly.
7978	Done.
7986	I have created a new variable `TargetLookup` using `sextOrTrunc()`.

gsocshubham added inline comments.Jul 26 2022, 6:30 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8025	Can you point me to target and test where above scenario occurs? I will update it accordingly.

craig.topper added inline comments.Jul 26 2022, 9:16 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7970	You overwrite `Lookup` here instead of creating `TargetLookup`.
7983	The ConstantDataArray is an array of bytes, but the alignment here is based on VT. Should be MVT::i8
7986	The data in the array is only 8-bits. The load type shouldn't be VT.
7990	NewVT is the same as VT.
7994	ZExtOrTrunc and Load are unused if this else is taken. We shouldn't create dead nodes if it can be avoided.
8025	Compiling llvm.cttz.i32 for riscv32 with -O0 should do it I think. -O0 will disable codegen prepare.

gsocshubham added inline comments.Jul 28 2022, 10:42 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8025	@craig.topper - What is the desired output value from the table when x=0? define i32 @f(i32 %x) { entry: %0 = call i32 @llvm.cttz.i32(i32 %x, i1 true) ret i32 %0 }

craig.topper added inline comments.Jul 28 2022, 10:46 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8025	With `i1 true` there is no desired out put for 0. I'm concerned about `i1 false` which needs to produce 32.

Fix some of review comments.

Harbormaster completed remote builds in B178248: Diff 448583.Jul 29 2022, 4:49 AM

gsocshubham added inline comments.Jul 29 2022, 4:54 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7970	Done.
7983	I tried using `MVT::i8` by converting it to Value Type to be used instead of VT but there seems some compatibility issue.
7990	Updated.
7994	Modified the blocks. Actually else part depends on `ZExtOrTrunc`.

craig.topper added inline comments.Jul 29 2022, 3:24 PM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7984	Replace VT.getTypeForEVT(*DAG.getContext()) with `CA->getType()`
7986	What happens if you always create a zextload even if the type isn't legal?
7989	This ZextOrTrunc doesn't do anything, the Load already has VT as its ValueType.
7994	The `VT` being passed to the operand named `MemVT` needs to be MemVT.

dmgreen added inline comments.Jul 30 2022, 12:52 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7853	Yeah DeBruijn sequences are cyclic. They are explained here: https://en.wikipedia.org/wiki/De_Bruijn_sequence#Construction. The lower bits of the last elements use the wrapped-around upper bits of the constant. A rotate is more general than a shift, but for the constants we chose the upper value in the constant is always zero, so they become equivalent. If we are relying on a shift on the generated code, using a shift here too sounds OK.
7960	If BitWidth isn't 32 or 64, we need to return SDValue. We need to make sure the 64it value is tested too.

gsocshubham added inline comments.Aug 1 2022, 4:41 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp

7984

If I change alignment as above, then for call i32 @llvm.cttz.i32(i32 %x, i1 true) compiled with SPARC -

I get -

f:                                      ! @f
        .cfi_startproc
! %bb.0:                                ! %entry
        mov     %g0, %o1
        sub %o1, %o0, %o1
        and %o0, %o1, %o0
        sethi 122669, %o1
        or %o1, 305, %o1
        smul %o0, %o1, %o0
        srl %o0, 27, %o0
        sethi %hi(.LCPI0_0), %o1
        add %o1, %lo(.LCPI0_0), %o1
        add %o1, %o0, %o2
        ldub [%o2+2], %o3
        ldub [%o2+3], %o4
        ldub [%o1+%o0], %o0
        ldub [%o2+1], %o1
        sll %o3, 8, %o2
        or %o2, %o4, %o2
        sll %o0, 8, %o0
        or %o0, %o1, %o0
        sll %o0, 16, %o0
        retl
        or %o0, %o2, %o0

instead of -

mov     %g0, %o1
sub %o1, %o0, %o1
and %o0, %o1, %o0
sethi 122669, %o1
or %o1, 305, %o1
smul %o0, %o1, %o0
srl %o0, 27, %o0
sethi %hi(.LCPI0_0), %o1
add %o1, %lo(.LCPI0_0), %o1
retl
ld [%o1+%o0], %o0

7986

There is no change in generated code nor in LIT test failures as it was originally submitted earlier.

craig.topper added inline comments.Aug 1 2022, 8:31 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7986	A zext load with i8 memVT?

gsocshubham added inline comments.Aug 1 2022, 11:34 PM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp

7986

Zext eith memVt without checking VT type legality for llvm.cttz.i32(i32 %x, i1 true) compiled with SPARC gives -

f:                                      ! @f
        .cfi_startproc
! %bb.0:                                ! %entry
        mov     %o0, %o1
        cmp %o0, 0
        be      .LBB0_2
        mov     %g0, %o0
! %bb.1:                                ! %entry
        sub %o0, %o1, %o0
        and %o1, %o0, %o0
        sethi 122669, %o1
        or %o1, 305, %o1
        smul %o0, %o1, %o0
        srl %o0, 27, %o0
        sethi %hi(.LCPI0_0), %o1
        add %o1, %lo(.LCPI0_0), %o1
        ld [%o1+%o0], %o0

gsocshubham added inline comments.Aug 1 2022, 11:35 PM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp

7986

Zext eith memVt without checking VT type legality for llvm.cttz.i32(i32 %x, i1 true) compiled with SPARC gives -

f:                                      ! @f
        .cfi_startproc
! %bb.0:                                ! %entry
        mov     %o0, %o1
        cmp %o0, 0
        be      .LBB0_2
        mov     %g0, %o0
! %bb.1:                                ! %entry
        sub %o0, %o1, %o0
        and %o1, %o0, %o0
        sethi 122669, %o1
        or %o1, 305, %o1
        smul %o0, %o1, %o0
        srl %o0, 27, %o0
        sethi %hi(.LCPI0_0), %o1
        add %o1, %lo(.LCPI0_0), %o1
        ld [%o1+%o0], %o0

ZExt with MemVT

Fix review comments.

Harbormaster completed remote builds in B179030: Diff 449665.Aug 3 2022, 7:46 AM

gsocshubham added inline comments.Aug 3 2022, 7:47 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7960	Done. Added a check. There are already tests present for cttz.i64 which I have updated in this patch.

gsocshubham added inline comments.Aug 3 2022, 7:50 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7998	@craig.topper - What should be the check here for case `llvm.cttz(i32 %x, i1 false)`? It can't be `isOperationLegalOrCustom(ISD::CTTZ_ZERO_UNDEF, VT)` because that is taken care before this new lowering.

dmgreen added inline comments.Aug 3 2022, 9:41 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7960	This needs to handle other types too - just doing `if (BitWidth != 32 && BitWidth != 64)` is probably easiest.
7976	Why has this become uint16_t? I don't think it needs to be any bigger than i8 to hold all the values.
7989	It doesn't need to create a load just to create another load. It can just use the getExtLoad method with the MemoryVT set to MVT::i8. I think the other parameters can be default/left out.
7998	I think, if I understand, this should be `if (Node->getOpcode() != ISD::CTLZ_ZERO_UNDEF) {` But I'm not sure what this does to the profitability of the transform. It might be possible to just encode it into the table. It involves changing how the table to have bitwidth+1 elements, and using a different debruijn constant so that the zero element can be the bitwidth.

craig.topper added inline comments.Aug 3 2022, 10:11 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7987	The Type for the alignment needs to be the type of the elements in `CA` which is based on what data type is used for the `Table` SmallVector.

barannikov88 added inline comments.Aug 3 2022, 10:46 PM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7987	Why the type of the elements specifically and not of the CA itself? AFAIK the alignment passed to the load is the "base alignment"; the alignment of the accessed element will be inferred based on the base alignment and the offset of the element.

Fix review comments.

Harbormaster completed remote builds in B179231: Diff 449925.Aug 4 2022, 3:31 AM

gsocshubham added inline comments.Aug 4 2022, 3:39 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7960	Updated accordingly.
7976	Reverted to `uint8_t`. For `@llvm.cttz.i64(i64 %x, i1 true)` compiled with `riscv64`, I am getting table sequence as - .ascii "\000\001\002\007\003\r\b\023\004\031\016\034\t\"\024(\005\021\032&\017.\0350\n\037#6\0252)9?\006\f\022\030\033!'\020%-/\036518>\013\027 $,47=\026+3<*;:" Hence, earlier I had changed it to `uint16_t` where I was getting - .half 0 # 0x0 .half 1 # 0x1 .half 2 # 0x2 .half 7 # 0x7 .half 3 # 0x3 .half 13 # 0xd .half 8 # 0x8 .half 19 # 0x13 .half 4 # 0x4 .half 25 # 0x19 .half 14 # 0xe .half 28 # 0x1c .half 9 # 0x9 .half 34 # 0x22 .half 20 # 0x14 .half 40 # 0x28 .half 5 # 0x5 .half 17 # 0x11 .half 26 # 0x1a .half 38 # 0x26 .half 15 # 0xf .half 46 # 0x2e .half 29 # 0x1d .half 48 # 0x30 .half 10 # 0xa .half 31 # 0x1f .half 35 # 0x23 .half 54 # 0x36 .half 21 # 0x15 .half 50 # 0x32 .half 41 # 0x29 .half 57 # 0x39 .half 63 # 0x3f .half 6 # 0x6 .half 12 # 0xc .half 18 # 0x12 .half 24 # 0x18 .half 27 # 0x1b .half 33 # 0x21 .half 39 # 0x27 .half 16 # 0x10 .half 37 # 0x25 .half 45 # 0x2d .half 47 # 0x2f .half 30 # 0x1e .half 53 # 0x35 .half 49 # 0x31 .half 56 # 0x38 .half 62 # 0x3e .half 11 # 0xb .half 23 # 0x17 .half 32 # 0x20 .half 36 # 0x24 .half 44 # 0x2c .half 52 # 0x34 .half 55 # 0x37 .half 61 # 0x3d .half 22 # 0x16 .half 43 # 0x2b .half 51 # 0x33 .half 60 # 0x3c .half 42 # 0x2a .half 59 # 0x3b .half 58 # 0x3a
7987	Understood. Thanks! Now, the alignment is of `CA`.
7989	Removed Load. It is just `getExtLoad` with `MVT::i8` now.
7998	I have added above check!

gsocshubham removed a parent revision: D113291: [AggressiveInstCombine] Lower Table Based CTTZ .Aug 4 2022, 5:07 AM

gsocshubham added inline comments.Aug 4 2022, 8:09 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8035	Regarding this check - `!VT.isVector() && TLI.isOperationExpand(ISD::CTPOP, VT) && !isOperationLegal(ISD::CTLZ, VT)` I am not sure whether these checks are complete to allow new lowering. I have these 3 failures currently - LLVM :: CodeGen/AVR/cttz.ll LLVM :: CodeGen/RISCV/ctlz-cttz-ctpop.ll LLVM :: CodeGen/RISCV/rv64zbb.ll When I update these tests using `utils/update_llc_test_checks.py`, it crashes. Does it mean the new lowering is generating wrong code for `AVR` and `RISCV`? Or is it like I am lowering those cases which should not be lowered in first place. Like adding some more checks in above `if()` block.

craig.topper added inline comments.Aug 4 2022, 8:42 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7974	The RISC-V crash needs this to fix Lookup = DAG.getSExtOrTrunc(Lookup, dl, getPointerTy(TD)); Still looking at AVR

craig.topper added inline comments.Aug 4 2022, 8:59 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8037	This needs to be if (SDValue V = CTTZTableLookup(Node, DAG, dl, VT, Op, NumBitsPerElt)) return V; Because there is an early out in CTTZTableLookup for types other than i32/i64.

craig.topper added inline comments.Aug 4 2022, 9:02 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8037	That will fix the AVR failure

Fix LIT test failures.

Add cttz.i64() in SPARC/cttz.ll

Harbormaster completed remote builds in B179351: Diff 450079.Aug 4 2022, 11:21 AM

gsocshubham added inline comments.Aug 4 2022, 11:21 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7974	Added above line!
8037	Updated as above.

@craig.topper - Thanks a lot for the help. Are there any other suggestions on above patch? or Is it good to merge?

craig.topper added inline comments.Aug 4 2022, 11:23 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8006	You don't need this. We're in a member function of TargetLowering. DAG.getTargetLoweringInfo() is equivalent to `this`.

Fix review comment.

gsocshubham added inline comments.Aug 4 2022, 12:37 PM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8006	Got it. Removed. Is it fine now?

craig.topper added inline comments.Aug 4 2022, 1:35 PM

llvm/include/llvm/CodeGen/TargetLowering.h
4764 ↗	(On Diff #450102)	No need for `long` here. `SDLoc` should be passed by const reference.

Harbormaster completed remote builds in B179368: Diff 450102.Aug 4 2022, 2:45 PM

Fix review comments.

llvm/include/llvm/CodeGen/TargetLowering.h
4764 ↗	(On Diff #450102)	Done.

Harbormaster completed remote builds in B179467: Diff 450227.Aug 4 2022, 11:11 PM

Is it in the state to merge?

LGTM to me with those two changes

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8001	You don't need getZExtValue() here
8018	I don't think you need an explicit `EVT` around MVT::i8

This revision is now accepted and ready to land.Aug 5 2022, 8:04 AM

barannikov88 added inline comments.Aug 5 2022, 8:55 AM

llvm/include/llvm/CodeGen/TargetLowering.h
4762 ↗	(On Diff #450227)	The comment is still confusing. The function does not return reference to the generated table, it returns the expanded node. The "brief" part of the should also be fixed. Currently, it documents the way the function is used, which it should not. Uses may be added or removed, while the comment should stay the same.

gsocshubham added inline comments.Aug 5 2022, 11:08 AM

llvm/include/llvm/CodeGen/TargetLowering.h
4762 ↗	(On Diff #450227)	How about now? /// Expand CTTZ node if CTLZ/CTPOP operations are not legal. /// \param N Node to expand /// \returns The expansion result or SDValue() if it fails. @barannikov88 - Is brief part fine now? If not, can you please suggest a better one?

barannikov88 added inline comments.Aug 5 2022, 12:31 PM

llvm/include/llvm/CodeGen/TargetLowering.h
4762 ↗	(On Diff #450227)	Sounds ok to me now. My suggestion would be "Expands CTTZ via table lookup". This would not imply that ctlz/ctpop have to be illegal if you want to use this method. This, however, is what the name of the method already says, making the whole comment kind of redundant. Use it or leave it as it is; I guess it is not a major issue since noone else objected.

Thank you for your work!

Fix review comments.

gsocshubham added inline comments.Aug 6 2022, 12:16 AM

llvm/include/llvm/CodeGen/TargetLowering.h
4762 ↗	(On Diff #450227)	Thanks for suggestion. It looks better now!
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8001	Removed.
8018	Removed. Done.

Harbormaster completed remote builds in B179665: Diff 450481.Aug 6 2022, 1:05 AM

Can someone please commit this patch with below details? I do not have commit access.

Name - Shubham Narlawar
Email - shubham.narlawar@rrlogic.co.in

"Shubham Narlawar <shubham.narlawar@rrlogic.co.in>"

@greened @craig.topper @barannikov88 - Can you commit this?

LGTM - I can commit this for you. I will do so now.

I was trying some testing before I did, everything seems to be alright. The code isn't as optimal as it could be in places, but it is certainly an improvement over what was already present. I tried other backend to see if they all at least compiled successfully. The only one that didn't was BPF, but that appears to be an pre-existing condition. All the others seemed OK for a compile at least, and the results on Arm/hacked up AArch64 produce the correct values. Looks OK.

This revision was landed with ongoing or failed builds.Aug 8 2022, 4:08 AM

Closed by commit rGab4fc87a9d96: [DAG] Emit table lookup from TargetLowering::expandCTTZ() (authored by gsocshubham, committed by dmgreen). · Explain Why

This revision was automatically updated to reflect the committed changes.

dmgreen added a commit: rGab4fc87a9d96: [DAG] Emit table lookup from TargetLowering::expandCTTZ().

john.brawn mentioned this in D133199: [ARM] Constant pools need 4-byte alignment if we only have tADR.Sep 5 2022, 4:12 AM

john.brawn mentioned this in rGe26cadcc32a2: [ARM] Constant pools need 4-byte alignment if we only have tADR.Sep 6 2022, 3:36 AM

Revision Contents

Path

Size

llvm/

lib/

CodeGen/

SelectionDAG/

TargetLowering.cpp

36 lines

test/

CodeGen/

RISCV/

rv32zbb.ll

83 lines

SPARC/

cttz.ll

36 lines

VE/

Scalar/

cttz.ll

27 lines

Diff 444939

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp

Context not available.
	DAG.getNode(ISD::CTLZ, dl, VT, Tmp));	DAG.getNode(ISD::CTLZ, dl, VT, Tmp));
	}	}

		if (NumBitsPerElt == 32 && !VT.isVector()) {
		barannikov88Unsubmitted Not Done Reply Inline Actions I think your code should be moved a little higher, before `SDValue Tmp =` . Otherwise the `Tmp` node will be unused if your optimization has been applied. The check will probably need some corrections. Note the comments above, they may help deduce the correct condition. Would also be better if you extract your implementation into separate function. barannikov88: I think your code should be moved a little higher, before `SDValue Tmp = `. Otherwise the `Tmp`…
		gsocshubhamAuthorUnsubmitted Done Reply Inline Actions Moved code above `SDValue Tmp =` and added into a separate function. I am using below check to do the optimization - if (NumBitsPerElt == 32 && !VT.isVector() && TLI.isOperationExpand(ISD::CTPOP, VT) && !isOperationLegal(ISD::CTLZ, VT)) gsocshubham: Moved code above `SDValue Tmp = ` and added into a separate function. I am using below check…
		SDValue Neg = DAG.getNode(ISD::SUB, dl, VT, DAG.getConstant(0, dl, VT), Op);
		barannikov88Unsubmitted Not Done Reply Inline Actions This is redundant. Your code is already in TargetLowering class. Just call `isOperationExpand` directly. barannikov88: This is redundant. Your code is already in TargetLowering class. Just call `isOperationExpand`…
		SDValue Lookup = DAG.getNode(
		dmgreenUnsubmitted Not Done Reply Inline Actions This needs to be in a `if (NumBitsPerElt == 32 && !VT.isVector())` block. Hopefully with other types available too. Else it would fall back to the existing CTPop. If the CTPOP is legal it is worth using it too, instead of this table lookup. dmgreen: This needs to be in a `if (NumBitsPerElt == 32 && !VT.isVector())` block. Hopefully with other…
		gsocshubhamAuthorUnsubmitted Not Done Reply Inline Actions Done. gsocshubham: Done.
		ISD::SRL, dl, VT,
		barannikov88Unsubmitted Not Done Reply Inline Actions `getRawData` is very low level method, you should be able to easily avoid it. barannikov88: `getRawData` is very low level method, you should be able to easily avoid it.
		gsocshubhamAuthorUnsubmitted Done Reply Inline Actions Done. Now I am using - `getZExtValue()` gsocshubham: Done. Now I am using - `getZExtValue()`
		DAG.getNode(ISD::MUL, dl, VT, DAG.getNode(ISD::AND, dl, VT, Op, Neg),
		DAG.getConstant(0x077CB531U, dl, VT)),
		DAG.getConstant(27, dl, VT));
		dmgreenUnsubmitted Not Done Reply Inline Actions I think that some other DeBruijn constants would be: i8: 0x17 i16: 0x09AF i64: 0x0218A392CD3D5DBF Hopefully I didn't get those wrong. The i64 almost certainly better than the alternative, but I haven't looked at whether the other two are useful. dmgreen: I think that some other DeBruijn constants would be: i8: 0x17 i16: 0x09AF i64…
		barannikov88Unsubmitted Not Done Reply Inline Actions Be sure to clang-format your changes before submitting a patch. This is required both by the coding style and by the contribution guidelines. barannikov88: Be sure to clang-format your changes before submitting a patch. This is required both by the…
		gsocshubhamAuthorUnsubmitted Done Reply Inline Actions I have formatted code using - `$INSTALL/bin/clang-format -style=LLVM TargetLowering.cpp -i --lines=7866:7872` gsocshubham: I have formatted code using - `$INSTALL/bin/clang-format -style=LLVM TargetLowering.cpp -i…

		gsocshubhamAuthorUnsubmitted Done Reply Inline Actions Can anyone suggest how do I get reference to table here? If I get a reference here then I can index into the table using above `Lookup`. gsocshubham: Can anyone suggest how do I get reference to table here? If I get a reference here then I can…
		dmgreenUnsubmitted Not Done Reply Inline Actions Eli had some comments in https://reviews.llvm.org/D113291#3540702. DAGCombiner:: convertSelectOfFPConstantsToLoadOffset has some code that emits a table. dmgreen: Eli had some comments in https://reviews.llvm.org/D113291#3540702. DAGCombiner…
		craig.topperUnsubmitted Not Done Reply Inline Actions You can't use the original table. It might never have existed in IR or will have been removed once all references to it are removed. You'll need to create a new table in the constant pool. craig.topper: You can't use the original table. It might never have existed in IR or will have been removed…
		djtodoroUnsubmitted Not Done Reply Inline Actions Agree, a new table is required here. djtodoro: Agree, a new table is required here.
		// Create a table in constant data pool
		barannikov88Unsubmitted Not Done Reply Inline Actions Ditto, don't use `getRawData`. barannikov88: Ditto, don't use `getRawData`.
		gsocshubhamAuthorUnsubmitted Done Reply Inline Actions Updated to `getZExtValue()` gsocshubham: Updated to `getZExtValue()`
		std::vector<Constant *> Elts;
		SmallVector<uint8_t, 32> RshrArr;
		unsigned int DeBruijn = 0x077CB531U;
		dmgreenUnsubmitted Not Done Reply Inline Actions Use a SmallVector for the constant values, probably with uint8_t elements. dmgreen: Use a SmallVector for the constant values, probably with uint8_t elements.
		gsocshubhamAuthorUnsubmitted Done Reply Inline Actions Done. gsocshubham: Done.
		dmgreenUnsubmitted Not Done Reply Inline Actions Move DeBruijn up so that it can be re-used in the creation of Lookup above. Perhaps add a variable for the Shift Amount too, which is 27 here but in general is BitWidth - log2(BitWdith), I think. dmgreen: Move DeBruijn up so that it can be re-used in the creation of Lookup above. Perhaps add a…
		gsocshubhamAuthorUnsubmitted Done Reply Inline Actions Done. Can you please give me some suggestions to make usage of APInt minimal and clean? gsocshubham: Done. Can you please give me some suggestions to make usage of APInt minimal and clean?

		dmgreenUnsubmitted Not Done Reply Inline Actions "DeBruijn" dmgreen: "DeBruijn"
		gsocshubhamAuthorUnsubmitted Done Reply Inline Actions Done. gsocshubham: Done.
		barannikov88Unsubmitted Not Done Reply Inline Actions There are always exactly 32 elements, you can use plain C array and avoid dynamic memory allocation at all. This is a bit of hard-code though, so SmallVector with 32 minimum size might be better. barannikov88: There are always exactly 32 elements, you can use plain C array and avoid dynamic memory…
		gsocshubhamAuthorUnsubmitted Done Reply Inline Actions I am using plain C array and got rid of dynamic memory allocation error. gsocshubham: I am using plain C array and got rid of dynamic memory allocation error.
		for (unsigned int i = 0; i < NumBitsPerElt; i++) {
		unsigned int Lshr = DeBruijn << i;
		unsigned int Rshr = Lshr >> 27;
		dmgreenUnsubmitted Not Done Reply Inline Actions I think this should technically be a rotl, because the DeBruijn sequences are cyclic. The top element in the sequence is always 0 though, so it probably doesn't make a difference in practice. It may be better to us APInt for all the DeBrujin constants though. It is easier to be more precise with them, and it's useful when i64 constants are supported. dmgreen: I think this should technically be a rotl, because the DeBruijn sequences are cyclic. The top…
		gsocshubhamAuthorUnsubmitted Done Reply Inline Actions I can use `DAG.getNode(ISD::ROTL, dl, VT, Op, DAG.getConstant(i, dl, SHVT));` instead of `<<` but this changes variable from `unsigned int` to `SDNode`. gsocshubham: I can use `DAG.getNode(ISD::ROTL, dl, VT, Op, DAG.getConstant(i, dl, SHVT));` instead of `<<`…
		dmgreenUnsubmitted Not Done Reply Inline Actions That seems to be mixing up the instructions we want to produce in the output (DAG.getNode) and the calculations we are doing at compile time. If DeBruijn is an APInt of the correct size `APInt DeBruijn(32, 0x077CB531U)`, then it will have a rotl method. The advantage of APInt is that they will also be able to work with other bitwidths, once those are added. dmgreen: That seems to be mixing up the instructions we want to produce in the output (DAG.getNode) and…
		gsocshubhamAuthorUnsubmitted Done Reply Inline Actions Understood. I have used - `APInt DeBruijn(32, 0x077CB531U)` Is it correct way? Can you suggest a way to simplify below usage of APInt? gsocshubham: Understood. I have used - `APInt DeBruijn(32, 0x077CB531U)` Is it correct way? Can you suggest…
		craig.topperUnsubmitted Not Done Reply Inline Actions Is it really cyclic? The multiply is a shift by a power of 2. So emulating the multiply should be a shl not rotl. craig.topper: Is it really cyclic? The multiply is a shift by a power of 2. So emulating the multiply should…
		gsocshubhamAuthorUnsubmitted Done Reply Inline Actions Updated to `shl()`. gsocshubham: Updated to `shl()`.
		dmgreenUnsubmitted Not Done Reply Inline Actions Yeah DeBruijn sequences are cyclic. They are explained here: https://en.wikipedia.org/wiki/De_Bruijn_sequence#Construction. The lower bits of the last elements use the wrapped-around upper bits of the constant. A rotate is more general than a shift, but for the constants we chose the upper value in the constant is always zero, so they become equivalent. If we are relying on a shift on the generated code, using a shift here too sounds OK. dmgreen: Yeah DeBruijn sequences are cyclic. They are explained here: https://en.wikipedia.
		RshrArr[Rshr] = i;
		barannikov88Unsubmitted Not Done Reply Inline Actions Ditto here and on the next line. `APInt Lshr = DeBruijn.rotl(i)` should do. barannikov88: Ditto here and on the next line. `APInt Lshr = DeBruijn.rotl(i)` should do.
		gsocshubhamAuthorUnsubmitted Done Reply Inline Actions Done. gsocshubham: Done.
		}

		gsocshubhamAuthorUnsubmitted Done Reply Inline Actions Regarding this line - `RshrArr[Rshr] = i;` In debug mode, when I run `SPARC/cttz.ll` which is present in this patch, I get a segmentation fault. but in release it is working fine. Why is there a difference in the behaviour? gsocshubham: Regarding this line - `RshrArr[Rshr] = i;` In debug mode, when I run `SPARC/cttz.ll` which is…
		barannikov88Unsubmitted Not Done Reply Inline Actions Most probably there is a bug in your code. Most probably you haven't enabled assertions in release mode (-DLLVM_ENABLE_ASSERTIONS=ON is the default in debug mode). You should be able to find some hints in the printed backtrace, but first make sure that `llvm-symbolizer` target is built. barannikov88: Most probably there is a bug in your code. Most probably you haven't enabled assertions in…
		gsocshubhamAuthorUnsubmitted Done Reply Inline Actions Thanks. `-DLLVM_ENABLE_ASSERTIONS=ON` this helped. gsocshubham: Thanks. `-DLLVM_ENABLE_ASSERTIONS=ON` this helped.
		for (unsigned int i = 0; i < NumBitsPerElt; i++) {
		SDValue Index = DAG.getConstant(RshrArr[i], dl, VT);
		ConstantSDNode *IndexNode = cast<ConstantSDNode>(Index);
		ConstantInt CI = const_cast<ConstantInt >(IndexNode->getConstantIntValue());
		craig.topperUnsubmitted Not Done Reply Inline Actions `dyn_cast` should only be used if the cast can fail. Otherwise use `cast` craig.topper: `dyn_cast` should only be used if the cast can fail. Otherwise use `cast`
		gsocshubhamAuthorUnsubmitted Done Reply Inline Actions Understood. Changed it to `cast`. gsocshubham: Understood. Changed it to `cast`.
		Elts.push_back(CI);
		}
		craig.topperUnsubmitted Not Done Reply Inline Actions Same comment about `dyn_cast` craig.topper: Same comment about `dyn_cast`
		gsocshubhamAuthorUnsubmitted Done Reply Inline Actions Changed it to `cast`. gsocshubham: Changed it to `cast`.
		const DataLayout &TD = DAG.getDataLayout();

		// Create a ConstantArray of size NumBitsPerElt
		craig.topperUnsubmitted Not Done Reply Inline Actions Why do we need to create an explicit ArrayRef here? craig.topper: Why do we need to create an explicit ArrayRef here?
		gsocshubhamAuthorUnsubmitted Done Reply Inline Actions Not required. I have removed it. gsocshubham: Not required. I have removed it.
		auto *CA = ConstantArray::get(ArrayType::get(Elts[0]->getType(), NumBitsPerElt), Elts);
		SDValue CPIdx = DAG.getConstantPool(CA, getPointerTy(TD),
		dmgreenUnsubmitted Not Done Reply Inline Actions I think a lot of this may simplify into `auto CA = ConstantDataArray::get(DAG.getContext(), RshrArr);` or something like it. I don't think we need to create ConstantSDNode, just to get the Constant's from them. dmgreen: I think a lot of this may simplify into `auto CA = ConstantDataArray::get(DAG.getContext()…
		gsocshubhamAuthorUnsubmitted Done Reply Inline Actions I tried `ConstantDataArray::get(DAG.getContext(), RshrArr)` but it does not generate table in constant pool since `RshrArr` is not array of `unsigned int` and not `Constant` I think there might be better alternative to simplify above. I am exploring it. gsocshubham: I tried `ConstantDataArray::get(*DAG.getContext(), RshrArr)` but it does not generate table in…
		TD.getPrefTypeAlign(Elts[0]->getType()));
		Align Alignment = cast<ConstantPoolSDNode>(CPIdx)->getAlign();
		return DAG.getLoad(VT, dl, DAG.getEntryNode(), DAG.getMemBasePlusOffset(CPIdx, Lookup, dl),
		barannikov88Unsubmitted Not Done Reply Inline Actions You should use the alignment requirement of the array (i.e. CA), not of its element. They may differ. barannikov88: You should use the alignment requirement of the array (i.e. CA), not of its element. They may…
		gsocshubhamAuthorUnsubmitted Done Reply Inline Actions From the assembly dump of SPARC/cttz.ll, I am not sure whether to use array element alignment or array alignment? If I use array alignment `CA`, I get below assembly as compared to `SPARC/cttz.ll` assembly if array element alignment is used. What do you think? Should I update from `CPIdx` to `CA`? f: ! @f .cfi_startproc ! %bb.0: ! %entry mov %o0, %o1 cmp %o0, 0 be .LBB0_2 mov %g0, %o0 ! %bb.1: ! %entry sub %o0, %o1, %o0 and %o1, %o0, %o0 sethi 122669, %o1 or %o1, 305, %o1 smul %o0, %o1, %o0 srl %o0, 27, %o0 sethi %hi(.LCPI0_0), %o1 add %o1, %lo(.LCPI0_0), %o1 add %o1, %o0, %o2 ldub [%o2+2], %o3 ldub [%o2+3], %o4 ldub [%o1+%o0], %o0 ldub [%o2+1], %o1 sll %o3, 8, %o2 or %o2, %o4, %o2 sll %o0, 8, %o0 or %o0, %o1, %o0 sll %o0, 16, %o0 or %o0, %o2, %o0 gsocshubham: From the assembly dump of SPARC/cttz.ll, I am not sure whether to use array element alignment…
		barannikov88Unsubmitted Not Done Reply Inline Actions I meant you should call `TD.getPrefTypeAlign(Elts->getType())` instead of `TD.getPrefTypeAlign(Elts[0]->getType()` Is the above assembly a result of such change, or did you do something different? barannikov88: I meant you should call `TD.getPrefTypeAlign(Elts->getType())` instead of `TD.getPrefTypeAlign…
		gsocshubhamAuthorUnsubmitted Done Reply Inline Actions I did something differently. But now it seems fine. Now, we don't have `Elts` and hence I am taking alignment of an element from `RshrArr`. Is it fine? gsocshubham: I did something differently. But now it seems fine. Now, we don't have `Elts` and hence I am…
		MachinePointerInfo::getConstantPool(
		DAG.getMachineFunction()), Alignment);
		}
	return DAG.getNode(ISD::CTPOP, dl, VT, Tmp);	return DAG.getNode(ISD::CTPOP, dl, VT, Tmp);
	}	}

Context not available.
		dmgreenUnsubmitted Not Done Reply Inline Actions I think this should be an extload from the i8 elements in the table to the original VT. Chain can be DAG.getEntryNode(), and the Ptr we load from needs to be CPIdx + Lookup. dmgreen: I think this should be an extload from the i8 elements in the table to the original VT. Chain…
		gsocshubhamAuthorUnsubmitted Done Reply Inline Actions Done. Please check it. gsocshubham: Done. Please check it.
		craig.topperUnsubmitted Not Done Reply Inline Actions This should be getShiftAmountConstant. craig.topper: This should be getShiftAmountConstant.
		gsocshubhamAuthorUnsubmitted Done Reply Inline Actions Updated accordingly. gsocshubham: Updated accordingly.
		craig.topperUnsubmitted Not Done Reply Inline Actions Should this be checking CTTZ_ZERO_UNDEF? The zero case is not handled correctly by the table lookup. For CTTZ we need a select. CodeGenPrepare rewrites llvm.cttz(i32 %x, i1 false) into a branch around llvm.cttz(i32 %x, i1 true) on some targets. So the difference might be hard to test. craig.topper: Should this be checking CTTZ_ZERO_UNDEF? The zero case is not handled correctly by the table…
		gsocshubhamAuthorUnsubmitted Done Reply Inline Actions Can you point me to target and test where above scenario occurs? I will update it accordingly. gsocshubham: Can you point me to target and test where above scenario occurs? I will update it accordingly.
		craig.topperUnsubmitted Not Done Reply Inline Actions Compiling llvm.cttz.i32 for riscv32 with -O0 should do it I think. -O0 will disable codegen prepare. craig.topper: Compiling llvm.cttz.i32 for riscv32 with -O0 should do it I think. -O0 will disable codegen…
		gsocshubhamAuthorUnsubmitted Done Reply Inline Actions @craig.topper - What is the desired output value from the table when x=0? define i32 @f(i32 %x) { entry: %0 = call i32 @llvm.cttz.i32(i32 %x, i1 true) ret i32 %0 } gsocshubham: @craig.topper - What is the desired output value from the table when x=0? ``` define i32 @f…
		craig.topperUnsubmitted Not Done Reply Inline Actions With `i1 true` there is no desired out put for 0. I'm concerned about `i1 false` which needs to produce 32. craig.topper: With `i1 true` there is no desired out put for 0. I'm concerned about `i1 false` which needs to…
		craig.topperUnsubmitted Not Done Reply Inline Actions We shouldn't need a ConstantSDNode and ConstantInt to get the Type. You can use `VT.getTypeForEVT(DAG.getContext())` in place of `CI->getType()` Though really we should be using an array of Int8Ty and doing a zextload from i8 to VT. craig.topper: We shouldn't need a ConstantSDNode and ConstantInt to get the Type*. You can use `VT.
		gsocshubhamAuthorUnsubmitted Done Reply Inline Actions Done. gsocshubham: Done.
		craig.topperUnsubmitted Not Done Reply Inline Actions `Lookup` needs a sextOrTrunc to the target's pointer type. craig.topper: `Lookup` needs a sextOrTrunc to the target's pointer type.
		gsocshubhamAuthorUnsubmitted Done Reply Inline Actions I have created a new variable `TargetLookup` using `sextOrTrunc()`. gsocshubham: I have created a new variable `TargetLookup` using `sextOrTrunc()`.
		craig.topperUnsubmitted Not Done Reply Inline Actions What does `Lshr` mean? `shr` is usually "shift right", and lshr is logical shift right. But here `L` means left? But that means I don't know what `shr` means. craig.topper: What does `Lshr` mean? `shr` is usually "shift right", and lshr is logical shift right. But…
		gsocshubhamAuthorUnsubmitted Done Reply Inline Actions It was a typo in naming. I have updated it accordingly. gsocshubham: It was a typo in naming. I have updated it accordingly.
		craig.topperUnsubmitted Not Done Reply Inline Actions NewVT is the same as VT. craig.topper: NewVT is the same as VT.
		gsocshubhamAuthorUnsubmitted Done Reply Inline Actions Updated. gsocshubham: Updated.
		craig.topperUnsubmitted Not Done Reply Inline Actions ZExtOrTrunc and Load are unused if this else is taken. We shouldn't create dead nodes if it can be avoided. craig.topper: ZExtOrTrunc and Load are unused if this else is taken. We shouldn't create dead nodes if it can…
		gsocshubhamAuthorUnsubmitted Done Reply Inline Actions Modified the blocks. Actually else part depends on `ZExtOrTrunc`. gsocshubham: Modified the blocks. Actually else part depends on `ZExtOrTrunc`.
		craig.topperUnsubmitted Not Done Reply Inline Actions You overwrite `Lookup` here instead of creating `TargetLookup`. craig.topper: You overwrite `Lookup` here instead of creating `TargetLookup`.
		gsocshubhamAuthorUnsubmitted Done Reply Inline Actions Done. gsocshubham: Done.
		craig.topperUnsubmitted Not Done Reply Inline Actions The ConstantDataArray is an array of bytes, but the alignment here is based on VT. Should be MVT::i8 craig.topper: The ConstantDataArray is an array of bytes, but the alignment here is based on VT. Should be…
		gsocshubhamAuthorUnsubmitted Done Reply Inline Actions I tried using `MVT::i8` by converting it to Value Type to be used instead of VT but there seems some compatibility issue. gsocshubham: I tried using `MVT::i8` by converting it to Value Type to be used instead of VT but there seems…
		craig.topperUnsubmitted Not Done Reply Inline Actions The data in the array is only 8-bits. The load type shouldn't be VT. craig.topper: The data in the array is only 8-bits. The load type shouldn't be VT.
		craig.topperUnsubmitted Not Done Reply Inline Actions This ZextOrTrunc doesn't do anything, the Load already has VT as its ValueType. craig.topper: This ZextOrTrunc doesn't do anything, the Load already has VT as its ValueType.
		craig.topperUnsubmitted Not Done Reply Inline Actions Replace VT.getTypeForEVT(DAG.getContext()) with `CA->getType()` craig.topper:* Replace VT.getTypeForEVT(*DAG.getContext()) with `CA->getType()`
		gsocshubhamAuthorUnsubmitted Done Reply Inline Actions If I change alignment as above, then for `call i32 @llvm.cttz.i32(i32 %x, i1 true)` compiled with SPARC - I get - f: ! @f .cfi_startproc ! %bb.0: ! %entry mov %g0, %o1 sub %o1, %o0, %o1 and %o0, %o1, %o0 sethi 122669, %o1 or %o1, 305, %o1 smul %o0, %o1, %o0 srl %o0, 27, %o0 sethi %hi(.LCPI0_0), %o1 add %o1, %lo(.LCPI0_0), %o1 add %o1, %o0, %o2 ldub [%o2+2], %o3 ldub [%o2+3], %o4 ldub [%o1+%o0], %o0 ldub [%o2+1], %o1 sll %o3, 8, %o2 or %o2, %o4, %o2 sll %o0, 8, %o0 or %o0, %o1, %o0 sll %o0, 16, %o0 retl or %o0, %o2, %o0 instead of - mov %g0, %o1 sub %o1, %o0, %o1 and %o0, %o1, %o0 sethi 122669, %o1 or %o1, 305, %o1 smul %o0, %o1, %o0 srl %o0, 27, %o0 sethi %hi(.LCPI0_0), %o1 add %o1, %lo(.LCPI0_0), %o1 retl ld [%o1+%o0], %o0 gsocshubham: If I change alignment as above, then for `call i32 @llvm.cttz.i32(i32 %x, i1 true)` compiled…
		craig.topperUnsubmitted Not Done Reply Inline Actions The `VT` being passed to the operand named `MemVT` needs to be MemVT. craig.topper: The `VT` being passed to the operand named `MemVT` needs to be MemVT.
		craig.topperUnsubmitted Not Done Reply Inline Actions What happens if you always create a zextload even if the type isn't legal? craig.topper: What happens if you always create a zextload even if the type isn't legal?
		gsocshubhamAuthorUnsubmitted Done Reply Inline Actions There is no change in generated code nor in LIT test failures as it was originally submitted earlier. gsocshubham: There is no change in generated code nor in LIT test failures as it was originally submitted…
		craig.topperUnsubmitted Not Done Reply Inline Actions A zext load with i8 memVT? craig.topper: A zext load with i8 memVT?
		gsocshubhamAuthorUnsubmitted Done Reply Inline Actions Zext eith memVt without checking VT type legality for `llvm.cttz.i32(i32 %x, i1 true)` compiled with SPARC gives - f: ! @f .cfi_startproc ! %bb.0: ! %entry mov %o0, %o1 cmp %o0, 0 be .LBB0_2 mov %g0, %o0 ! %bb.1: ! %entry sub %o0, %o1, %o0 and %o1, %o0, %o0 sethi 122669, %o1 or %o1, 305, %o1 smul %o0, %o1, %o0 srl %o0, 27, %o0 sethi %hi(.LCPI0_0), %o1 add %o1, %lo(.LCPI0_0), %o1 ld [%o1+%o0], %o0 gsocshubham: Zext eith memVt without checking VT type legality for `llvm.cttz.i32(i32 %x, i1 true)` compiled…
		gsocshubhamAuthorUnsubmitted Done Reply Inline Actions Zext eith memVt without checking VT type legality for `llvm.cttz.i32(i32 %x, i1 true)` compiled with SPARC gives - f: ! @f .cfi_startproc ! %bb.0: ! %entry mov %o0, %o1 cmp %o0, 0 be .LBB0_2 mov %g0, %o0 ! %bb.1: ! %entry sub %o0, %o1, %o0 and %o1, %o0, %o0 sethi 122669, %o1 or %o1, 305, %o1 smul %o0, %o1, %o0 srl %o0, 27, %o0 sethi %hi(.LCPI0_0), %o1 add %o1, %lo(.LCPI0_0), %o1 ld [%o1+%o0], %o0 ZExt with MemVT gsocshubham: > Zext eith memVt without checking VT type legality for `llvm.cttz.i32(i32 %x, i1 true)`…
		dmgreenUnsubmitted Not Done Reply Inline Actions If BitWidth isn't 32 or 64, we need to return SDValue. We need to make sure the 64it value is tested too. dmgreen: If BitWidth isn't 32 or 64, we need to return SDValue. We need to make sure the 64it value is…
		gsocshubhamAuthorUnsubmitted Done Reply Inline Actions Done. Added a check. There are already tests present for cttz.i64 which I have updated in this patch. gsocshubham: Done. Added a check. There are already tests present for cttz.i64 which I have updated in this…
		gsocshubhamAuthorUnsubmitted Done Reply Inline Actions @craig.topper - What should be the check here for case `llvm.cttz(i32 %x, i1 false)`? It can't be `isOperationLegalOrCustom(ISD::CTTZ_ZERO_UNDEF, VT)` because that is taken care before this new lowering. gsocshubham: @craig.topper - What should be the check here for case `llvm.cttz(i32 %x, i1 false)`? It can't…
		dmgreenUnsubmitted Not Done Reply Inline Actions This needs to handle other types too - just doing `if (BitWidth != 32 && BitWidth != 64)` is probably easiest. dmgreen: This needs to handle other types too - just doing `if (BitWidth != 32 && BitWidth != 64)` is…
		gsocshubhamAuthorUnsubmitted Done Reply Inline Actions Updated accordingly. gsocshubham: Updated accordingly.
		dmgreenUnsubmitted Not Done Reply Inline Actions Why has this become uint16_t? I don't think it needs to be any bigger than i8 to hold all the values. dmgreen: Why has this become uint16_t? I don't think it needs to be any bigger than i8 to hold all the…
		gsocshubhamAuthorUnsubmitted Done Reply Inline Actions Reverted to `uint8_t`. For `@llvm.cttz.i64(i64 %x, i1 true)` compiled with `riscv64`, I am getting table sequence as - .ascii "\000\001\002\007\003\r\b\023\004\031\016\034\t\"\024(\005\021\032&\017.\0350\n\037#6\0252)9?\006\f\022\030\033!'\020%-/\036518>\013\027 $,47=\026+3<;:" Hence, earlier I had changed it to `uint16_t` where I was getting - .half 0 # 0x0 .half 1 # 0x1 .half 2 # 0x2 .half 7 # 0x7 .half 3 # 0x3 .half 13 # 0xd .half 8 # 0x8 .half 19 # 0x13 .half 4 # 0x4 .half 25 # 0x19 .half 14 # 0xe .half 28 # 0x1c .half 9 # 0x9 .half 34 # 0x22 .half 20 # 0x14 .half 40 # 0x28 .half 5 # 0x5 .half 17 # 0x11 .half 26 # 0x1a .half 38 # 0x26 .half 15 # 0xf .half 46 # 0x2e .half 29 # 0x1d .half 48 # 0x30 .half 10 # 0xa .half 31 # 0x1f .half 35 # 0x23 .half 54 # 0x36 .half 21 # 0x15 .half 50 # 0x32 .half 41 # 0x29 .half 57 # 0x39 .half 63 # 0x3f .half 6 # 0x6 .half 12 # 0xc .half 18 # 0x12 .half 24 # 0x18 .half 27 # 0x1b .half 33 # 0x21 .half 39 # 0x27 .half 16 # 0x10 .half 37 # 0x25 .half 45 # 0x2d .half 47 # 0x2f .half 30 # 0x1e .half 53 # 0x35 .half 49 # 0x31 .half 56 # 0x38 .half 62 # 0x3e .half 11 # 0xb .half 23 # 0x17 .half 32 # 0x20 .half 36 # 0x24 .half 44 # 0x2c .half 52 # 0x34 .half 55 # 0x37 .half 61 # 0x3d .half 22 # 0x16 .half 43 # 0x2b .half 51 # 0x33 .half 60 # 0x3c .half 42 # 0x2a .half 59 # 0x3b .half 58 # 0x3a gsocshubham:* Reverted to `uint8_t`. For `@llvm.cttz.i64(i64 %x, i1 true)` compiled with `riscv64`, I am…
		dmgreenUnsubmitted Not Done Reply Inline Actions It doesn't need to create a load just to create another load. It can just use the getExtLoad method with the MemoryVT set to MVT::i8. I think the other parameters can be default/left out. dmgreen: It doesn't need to create a load just to create another load. It can just use the getExtLoad…
		gsocshubhamAuthorUnsubmitted Done Reply Inline Actions Removed Load. It is just `getExtLoad` with `MVT::i8` now. gsocshubham: Removed Load. It is just `getExtLoad` with `MVT::i8` now.
		dmgreenUnsubmitted Not Done Reply Inline Actions I think, if I understand, this should be `if (Node->getOpcode() != ISD::CTLZ_ZERO_UNDEF) {` But I'm not sure what this does to the profitability of the transform. It might be possible to just encode it into the table. It involves changing how the table to have bitwidth+1 elements, and using a different debruijn constant so that the zero element can be the bitwidth. dmgreen: I think, if I understand, this should be `if (Node->getOpcode() != ISD::CTLZ_ZERO_UNDEF) {` But…
		gsocshubhamAuthorUnsubmitted Done Reply Inline Actions I have added above check! gsocshubham: I have added above check!
		craig.topperUnsubmitted Not Done Reply Inline Actions The Type for the alignment needs to be the type of the elements in `CA` which is based on what data type is used for the `Table` SmallVector. craig.topper: The Type for the alignment needs to be the type of the elements in `CA` which is based on what…
		barannikov88Unsubmitted Not Done Reply Inline Actions Why the type of the elements specifically and not of the CA itself? AFAIK the alignment passed to the load is the "base alignment"; the alignment of the accessed element will be inferred based on the base alignment and the offset of the element. barannikov88: Why the type of the elements specifically and not of the CA itself? AFAIK the alignment passed…
		gsocshubhamAuthorUnsubmitted Done Reply Inline Actions Understood. Thanks! Now, the alignment is of `CA`. gsocshubham: Understood. Thanks! Now, the alignment is of `CA`.
		gsocshubhamAuthorUnsubmitted Done Reply Inline Actions Regarding this check - `!VT.isVector() && TLI.isOperationExpand(ISD::CTPOP, VT) && !isOperationLegal(ISD::CTLZ, VT)` I am not sure whether these checks are complete to allow new lowering. I have these 3 failures currently - LLVM :: CodeGen/AVR/cttz.ll LLVM :: CodeGen/RISCV/ctlz-cttz-ctpop.ll LLVM :: CodeGen/RISCV/rv64zbb.ll When I update these tests using `utils/update_llc_test_checks.py`, it crashes. Does it mean the new lowering is generating wrong code for `AVR` and `RISCV`? Or is it like I am lowering those cases which should not be lowered in first place. Like adding some more checks in above `if()` block. gsocshubham: Regarding this check - `!VT.isVector() && TLI.isOperationExpand(ISD::CTPOP, VT) && !
		craig.topperUnsubmitted Not Done Reply Inline Actions The RISC-V crash needs this to fix Lookup = DAG.getSExtOrTrunc(Lookup, dl, getPointerTy(TD)); Still looking at AVR craig.topper: The RISC-V crash needs this to fix ``` Lookup = DAG.getSExtOrTrunc(Lookup, dl, getPointerTy…
		gsocshubhamAuthorUnsubmitted Done Reply Inline Actions Added above line! gsocshubham: Added above line!
		craig.topperUnsubmitted Not Done Reply Inline Actions This needs to be if (SDValue V = CTTZTableLookup(Node, DAG, dl, VT, Op, NumBitsPerElt)) return V; Because there is an early out in CTTZTableLookup for types other than i32/i64. craig.topper: This needs to be ``` if (SDValue V = CTTZTableLookup(Node, DAG, dl, VT, Op, NumBitsPerElt))…
		craig.topperUnsubmitted Not Done Reply Inline Actions That will fix the AVR failure craig.topper: That will fix the AVR failure
		gsocshubhamAuthorUnsubmitted Done Reply Inline Actions Updated as above. gsocshubham: Updated as above.
		craig.topperUnsubmitted Not Done Reply Inline Actions You don't need this. We're in a member function of TargetLowering. DAG.getTargetLoweringInfo() is equivalent to `this`. craig.topper: You don't need this. We're in a member function of TargetLowering. DAG.getTargetLoweringInfo()…
		gsocshubhamAuthorUnsubmitted Done Reply Inline Actions Got it. Removed. Is it fine now? gsocshubham: Got it. Removed. Is it fine now?
		craig.topperUnsubmitted Not Done Reply Inline Actions You don't need getZExtValue() here craig.topper: You don't need getZExtValue() here
		gsocshubhamAuthorUnsubmitted Done Reply Inline Actions Removed. gsocshubham: Removed.
		craig.topperUnsubmitted Not Done Reply Inline Actions I don't think you need an explicit `EVT` around MVT::i8 craig.topper: I don't think you need an explicit `EVT` around MVT::i8
		gsocshubhamAuthorUnsubmitted Done Reply Inline Actions Removed. Done. gsocshubham: Removed. Done.

llvm/test/CodeGen/RISCV/rv32zbb.ll

Context not available.
	; RV32I-NEXT: sw ra, 12(sp) # 4-byte Folded Spill	; RV32I-NEXT: sw ra, 12(sp) # 4-byte Folded Spill
	; RV32I-NEXT: beqz a0, .LBB2_2	; RV32I-NEXT: beqz a0, .LBB2_2
	; RV32I-NEXT: # %bb.1: # %cond.false	; RV32I-NEXT: # %bb.1: # %cond.false
	; RV32I-NEXT: addi a1, a0, -1	; RV32I-NEXT: neg a1, a0
	; RV32I-NEXT: not a0, a0
	; RV32I-NEXT: and a0, a0, a1
	; RV32I-NEXT: srli a1, a0, 1
	; RV32I-NEXT: lui a2, 349525
	; RV32I-NEXT: addi a2, a2, 1365
	; RV32I-NEXT: and a1, a1, a2
	; RV32I-NEXT: sub a0, a0, a1
	; RV32I-NEXT: lui a1, 209715
	; RV32I-NEXT: addi a1, a1, 819
	; RV32I-NEXT: and a2, a0, a1
	; RV32I-NEXT: srli a0, a0, 2
	; RV32I-NEXT: and a0, a0, a1	; RV32I-NEXT: and a0, a0, a1
	; RV32I-NEXT: add a0, a2, a0	; RV32I-NEXT: lui a1, 30667
	; RV32I-NEXT: srli a1, a0, 4	; RV32I-NEXT: addi a1, a1, 1329
	; RV32I-NEXT: add a0, a0, a1
	; RV32I-NEXT: lui a1, 61681
	; RV32I-NEXT: addi a1, a1, -241
	; RV32I-NEXT: and a0, a0, a1
	; RV32I-NEXT: lui a1, 4112
	; RV32I-NEXT: addi a1, a1, 257
	; RV32I-NEXT: call __mulsi3@plt	; RV32I-NEXT: call __mulsi3@plt
	; RV32I-NEXT: srli a0, a0, 24	; RV32I-NEXT: srli a0, a0, 27
		; RV32I-NEXT: lui a1, %hi(.LCPI2_0)
		; RV32I-NEXT: addi a1, a1, %lo(.LCPI2_0)
		; RV32I-NEXT: add a0, a1, a0
		; RV32I-NEXT: lw a0, 0(a0)
	; RV32I-NEXT: j .LBB2_3	; RV32I-NEXT: j .LBB2_3
	; RV32I-NEXT: .LBB2_2:	; RV32I-NEXT: .LBB2_2:
	; RV32I-NEXT: li a0, 32	; RV32I-NEXT: li a0, 32
Context not available.
	; RV32I-NEXT: sw s2, 16(sp) # 4-byte Folded Spill	; RV32I-NEXT: sw s2, 16(sp) # 4-byte Folded Spill
	; RV32I-NEXT: sw s3, 12(sp) # 4-byte Folded Spill	; RV32I-NEXT: sw s3, 12(sp) # 4-byte Folded Spill
	; RV32I-NEXT: sw s4, 8(sp) # 4-byte Folded Spill	; RV32I-NEXT: sw s4, 8(sp) # 4-byte Folded Spill
	; RV32I-NEXT: sw s5, 4(sp) # 4-byte Folded Spill
	; RV32I-NEXT: sw s6, 0(sp) # 4-byte Folded Spill
	; RV32I-NEXT: mv s1, a1	; RV32I-NEXT: mv s1, a1
	; RV32I-NEXT: mv s2, a0	; RV32I-NEXT: mv s2, a0
	; RV32I-NEXT: addi a0, a0, -1	; RV32I-NEXT: neg a0, a0
	; RV32I-NEXT: not a1, s2	; RV32I-NEXT: and a0, s2, a0
	; RV32I-NEXT: and a0, a1, a0	; RV32I-NEXT: lui a1, 30667
	; RV32I-NEXT: srli a1, a0, 1	; RV32I-NEXT: addi s3, a1, 1329
	; RV32I-NEXT: lui a2, 349525
	; RV32I-NEXT: addi s4, a2, 1365
	; RV32I-NEXT: and a1, a1, s4
	; RV32I-NEXT: sub a0, a0, a1
	; RV32I-NEXT: lui a1, 209715
	; RV32I-NEXT: addi s5, a1, 819
	; RV32I-NEXT: and a1, a0, s5
	; RV32I-NEXT: srli a0, a0, 2
	; RV32I-NEXT: and a0, a0, s5
	; RV32I-NEXT: add a0, a1, a0
	; RV32I-NEXT: srli a1, a0, 4
	; RV32I-NEXT: add a0, a0, a1
	; RV32I-NEXT: lui a1, 61681
	; RV32I-NEXT: addi s6, a1, -241
	; RV32I-NEXT: and a0, a0, s6
	; RV32I-NEXT: lui a1, 4112
	; RV32I-NEXT: addi s3, a1, 257
	; RV32I-NEXT: mv a1, s3	; RV32I-NEXT: mv a1, s3
	; RV32I-NEXT: call __mulsi3@plt	; RV32I-NEXT: call __mulsi3@plt
	; RV32I-NEXT: mv s0, a0	; RV32I-NEXT: mv s0, a0
	; RV32I-NEXT: addi a0, s1, -1	; RV32I-NEXT: lui a0, %hi(.LCPI3_0)
	; RV32I-NEXT: not a1, s1	; RV32I-NEXT: addi s4, a0, %lo(.LCPI3_0)
	; RV32I-NEXT: and a0, a1, a0	; RV32I-NEXT: neg a0, s1
	; RV32I-NEXT: srli a1, a0, 1	; RV32I-NEXT: and a0, s1, a0
	; RV32I-NEXT: and a1, a1, s4
	; RV32I-NEXT: sub a0, a0, a1
	; RV32I-NEXT: and a1, a0, s5
	; RV32I-NEXT: srli a0, a0, 2
	; RV32I-NEXT: and a0, a0, s5
	; RV32I-NEXT: add a0, a1, a0
	; RV32I-NEXT: srli a1, a0, 4
	; RV32I-NEXT: add a0, a0, a1
	; RV32I-NEXT: and a0, a0, s6
	; RV32I-NEXT: mv a1, s3	; RV32I-NEXT: mv a1, s3
	; RV32I-NEXT: call __mulsi3@plt	; RV32I-NEXT: call __mulsi3@plt
	; RV32I-NEXT: bnez s2, .LBB3_2	; RV32I-NEXT: bnez s2, .LBB3_2
	; RV32I-NEXT: # %bb.1:	; RV32I-NEXT: # %bb.1:
	; RV32I-NEXT: srli a0, a0, 24	; RV32I-NEXT: srli a0, a0, 27
		; RV32I-NEXT: add a0, s4, a0
		; RV32I-NEXT: lw a0, 0(a0)
	; RV32I-NEXT: addi a0, a0, 32	; RV32I-NEXT: addi a0, a0, 32
	; RV32I-NEXT: j .LBB3_3	; RV32I-NEXT: j .LBB3_3
	; RV32I-NEXT: .LBB3_2:	; RV32I-NEXT: .LBB3_2:
	; RV32I-NEXT: srli a0, s0, 24	; RV32I-NEXT: srli a0, s0, 27
		; RV32I-NEXT: add a0, s4, a0
		; RV32I-NEXT: lw a0, 0(a0)
	; RV32I-NEXT: .LBB3_3:	; RV32I-NEXT: .LBB3_3:
	; RV32I-NEXT: li a1, 0	; RV32I-NEXT: li a1, 0
	; RV32I-NEXT: lw ra, 28(sp) # 4-byte Folded Reload	; RV32I-NEXT: lw ra, 28(sp) # 4-byte Folded Reload
Context not available.
	; RV32I-NEXT: lw s2, 16(sp) # 4-byte Folded Reload	; RV32I-NEXT: lw s2, 16(sp) # 4-byte Folded Reload
	; RV32I-NEXT: lw s3, 12(sp) # 4-byte Folded Reload	; RV32I-NEXT: lw s3, 12(sp) # 4-byte Folded Reload
	; RV32I-NEXT: lw s4, 8(sp) # 4-byte Folded Reload	; RV32I-NEXT: lw s4, 8(sp) # 4-byte Folded Reload
	; RV32I-NEXT: lw s5, 4(sp) # 4-byte Folded Reload
	; RV32I-NEXT: lw s6, 0(sp) # 4-byte Folded Reload
	; RV32I-NEXT: addi sp, sp, 32	; RV32I-NEXT: addi sp, sp, 32
	; RV32I-NEXT: ret	; RV32I-NEXT: ret
	;	;
Context not available.

llvm/test/CodeGen/SPARC/cttz.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc < %s -march=sparc -mcpu=v9 \| FileCheck %s

				@f.table = internal unnamed_addr constant [32 x i8] c"\00\01\1C\02\1D\0E\18\03\1E\16\14\0F\19\11\04\08\1F\1B\0D\17\15\13\10\07\1A\0C\12\06\0B\05\0A\09", align 1

				barannikov88Unsubmitted Not Done Reply Inline Actions Unused barannikov88: Unused
				gsocshubhamAuthorUnsubmitted Done Reply Inline Actions Removed. gsocshubham: Removed.
				define i32 @f(i32 %x) {
				; CHECK-LABEL: f:
				; CHECK: .cfi_startproc
				; CHECK-NEXT: ! %bb.0: ! %entry
				; CHECK-NEXT: mov %g0, %o1
				; CHECK-NEXT: sub %o1, %o0, %o1
				; CHECK-NEXT: and %o0, %o1, %o1
				; CHECK-NEXT: sethi 122669, %o2
				; CHECK-NEXT: or %o2, 305, %o2
				; CHECK-NEXT: smul %o1, %o2, %o1
				; CHECK-NEXT: srl %o1, 27, %o1
				; CHECK-NEXT: sethi %hi(.LCPI0_0), %o2
				; CHECK-NEXT: add %o2, %lo(.LCPI0_0), %o2
				; CHECK-NEXT: ld [%o2+%o1], %o1
				; CHECK-NEXT: cmp %o0, 0
				; CHECK-NEXT: move %icc, 0, %o1
				; CHECK-NEXT: retl
				; CHECK-NEXT: mov %o1, %o0
				entry:
				%0 = call i32 @llvm.cttz.i32(i32 %x, i1 true)
				%1 = icmp eq i32 %x, 0
				%2 = select i1 %1, i32 0, i32 %0
				barannikov88Unsubmitted Not Done Reply Inline Actions Why not just `ret i32 %0` ? barannikov88: Why not just `ret i32 %0` ?
				gsocshubhamAuthorUnsubmitted Done Reply Inline Actions Updated test accordingly. gsocshubham: Updated test accordingly.
				%3 = trunc i32 %2 to i8
				%conv = zext i8 %3 to i32
				ret i32 %conv
				}

				; Function Attrs: nocallback nofree nosync nounwind readnone speculatable willreturn
				declare i32 @llvm.cttz.i32(i32, i1 immarg) #0

				attributes #0 = { nocallback nofree nosync nounwind readnone speculatable willreturn }

llvm/test/CodeGen/VE/Scalar/cttz.ll

		; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -mtriple=ve-unknown-unknown \| FileCheck %s	; RUN: llc < %s -mtriple=ve-unknown-unknown \| FileCheck %s

	declare i128 @llvm.cttz.i128(i128, i1)	declare i128 @llvm.cttz.i128(i128, i1)
Context not available.
	define signext i32 @func32s(i32 signext %p) {	define signext i32 @func32s(i32 signext %p) {
	; CHECK-LABEL: func32s:	; CHECK-LABEL: func32s:
	; CHECK: # %bb.0:	; CHECK: # %bb.0:
	; CHECK-NEXT: adds.w.sx %s1, -1, %s0	; CHECK-NEXT: subs.w.sx %s1, 0, %s0
	; CHECK-NEXT: nnd %s0, %s0, %s1	; CHECK-NEXT: and %s0, %s0, %s1
		; CHECK-NEXT: lea %s1, 125613361
		; CHECK-NEXT: muls.w.sx %s0, %s0, %s1
	; CHECK-NEXT: and %s0, %s0, (32)0	; CHECK-NEXT: and %s0, %s0, (32)0
	; CHECK-NEXT: pcnt %s0, %s0	; CHECK-NEXT: srl %s0, %s0, 27
	dmgreenUnsubmitted Not Done Reply Inline Actions If the target has a ctpop instruction then that should be preferred to the table lookup. dmgreen: If the target has a ctpop instruction then that should be preferred to the table lookup.
	gsocshubhamAuthorUnsubmitted Done Reply Inline Actions I have added a check - `!isOperationLegal(ISD::CTPOP, VT)` but that does not seem to help. gsocshubham: I have added a check - `!isOperationLegal(ISD::CTPOP, VT)` but that does not seem to help.
	barannikov88Unsubmitted Not Done Reply Inline Actions VE sets the action to `Promote`. Perhaps, you should call `isOperationExpand` instead and do the transformation if it returns true. barannikov88: VE sets the action to `Promote`. Perhaps, you should call `isOperationExpand` instead and do…
		; CHECK-NEXT: lea %s1, .LCPI2_0@lo
		; CHECK-NEXT: and %s1, %s1, (32)0
		; CHECK-NEXT: lea.sl %s1, .LCPI2_0@hi(, %s1)
		; CHECK-NEXT: or %s0, 0, %s0
		; CHECK-NEXT: ldl.zx %s0, (%s0, %s1)
	; CHECK-NEXT: b.l.t (, %s10)	; CHECK-NEXT: b.l.t (, %s10)
	%r = tail call i32 @llvm.cttz.i32(i32 %p, i1 true)	%r = tail call i32 @llvm.cttz.i32(i32 %p, i1 true)
	ret i32 %r	ret i32 %r
Context not available.
	define zeroext i32 @func32z(i32 zeroext %p) {	define zeroext i32 @func32z(i32 zeroext %p) {
	; CHECK-LABEL: func32z:	; CHECK-LABEL: func32z:
	; CHECK: # %bb.0:	; CHECK: # %bb.0:
	; CHECK-NEXT: adds.w.sx %s1, -1, %s0	; CHECK-NEXT: subs.w.sx %s1, 0, %s0
	; CHECK-NEXT: nnd %s0, %s0, %s1	; CHECK-NEXT: and %s0, %s0, %s1
		; CHECK-NEXT: lea %s1, 125613361
		; CHECK-NEXT: muls.w.sx %s0, %s0, %s1
	; CHECK-NEXT: and %s0, %s0, (32)0	; CHECK-NEXT: and %s0, %s0, (32)0
	; CHECK-NEXT: pcnt %s0, %s0	; CHECK-NEXT: srl %s0, %s0, 27
		; CHECK-NEXT: lea %s1, .LCPI3_0@lo
		; CHECK-NEXT: and %s1, %s1, (32)0
		; CHECK-NEXT: lea.sl %s1, .LCPI3_0@hi(, %s1)
		; CHECK-NEXT: or %s0, 0, %s0
		; CHECK-NEXT: ldl.zx %s0, (%s0, %s1)
	; CHECK-NEXT: b.l.t (, %s10)	; CHECK-NEXT: b.l.t (, %s10)
	%r = tail call i32 @llvm.cttz.i32(i32 %p, i1 true)	%r = tail call i32 @llvm.cttz.i32(i32 %p, i1 true)
	ret i32 %r	ret i32 %r
Context not available.

This is an archive of the discontinued LLVM Phabricator instance.

Emit table lookup from TargetLowering::expandCTTZ()ClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 444939

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp

llvm/test/CodeGen/RISCV/rv32zbb.ll

llvm/test/CodeGen/SPARC/cttz.ll

llvm/test/CodeGen/VE/Scalar/cttz.ll

Emit table lookup from TargetLowering::expandCTTZ()
ClosedPublic