This is an archive of the discontinued LLVM Phabricator instance.

Differential D128911

Emit table lookup from TargetLowering::expandCTTZ()
ClosedPublic

Authored by gsocshubham on Jun 30 2022, 7:17 AM.

Details

Reviewers

craig.topper
momchil.velikov
greened
efriedma
barannikov88
dmgreen

Commits

rGab4fc87a9d96: [DAG] Emit table lookup from TargetLowering::expandCTTZ()

Summary

This patch emits a table lookup in expandCTTZ. It is a child revision of https://reviews.llvm.org/D113291.

Context:

https://reviews.llvm.org/D113291 transforms a set of IR instructions into the cttz intrinsic, but some targets do not support CTTZ or CTLZ. Hence, this patch generates a table lookup in TargetLowering::expandCTTZ().
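
For readers unfamiliar with the technique, the kind of lookup described in the summary can be sketched in plain C++. This is only an illustration, not the patch's SelectionDAG code: the names `kCttzTable32` and `cttz32ZeroUndef` are hypothetical, while the constant 0x077CB531, the shift by 27 and the table bytes are the ones quoted later in this review.

```cpp
#include <cassert>
#include <cstdint>

// Lookup table for the 32-bit de Bruijn constant 0x077CB531.
// Entry k holds the bit position i for which ((0x077CB531 << i) >> 27) == k.
static const uint8_t kCttzTable32[32] = {
    0,  1,  28, 2,  29, 14, 24, 3,  30, 22, 20, 15, 25, 17, 4,  8,
    31, 27, 13, 23, 21, 19, 16, 7,  26, 12, 18, 6,  11, 5,  10, 9};

// Count trailing zeros of a nonzero 32-bit value. The result for 0 is not
// meaningful, which mirrors the CTTZ_ZERO_UNDEF semantics.
inline unsigned cttz32ZeroUndef(uint32_t X) {
  uint32_t LowBit = X & (0u - X);                 // isolate the lowest set bit
  uint32_t Index = (LowBit * 0x077CB531u) >> 27;  // top 5 bits pick the entry
  return kCttzTable32[Index];
}
```

Because `LowBit` is a power of two, the multiply acts as a left shift of the constant, and the top five bits of the product are unique for each bit position.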

Diff Detail

Unit Tests: Failed

	Time	Test
	60,060 ms	x64 debian > AddressSanitizer-x86_64-linux-dynamic.TestCases::scariness_score_test.cpp
	60,110 ms	x64 debian > AddressSanitizer-x86_64-linux.TestCases::scariness_score_test.cpp

Event Timeline

There are a very large number of changes, so older changes are hidden.

gsocshubham added inline comments.Jul 15 2022, 5:02 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7853	I can use `DAG.getNode(ISD::ROTL, dl, VT, Op, DAG.getConstant(i, dl, SHVT));` instead of `<<` but this changes variable from `unsigned int` to `SDNode`.
7858–7867	I tried `ConstantDataArray::get(DAG.getContext(), RshrArr)` but it does not generate a table in the constant pool, since `RshrArr` is not an array of `unsigned int` and not a `Constant`. I think there might be a better alternative to simplify the above. I am exploring it.

dmgreen added inline comments.Jul 18 2022, 2:51 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7849	Move DeBruijn up so that it can be re-used in the creation of Lookup above. Perhaps add a variable for the Shift Amount too, which is 27 here but in general is BitWidth - log2(BitWidth), I think.
7853	That seems to be mixing up the instructions we want to produce in the output (DAG.getNode) and the calculations we are doing at compile time. If DeBruijn is an APInt of the correct size `APInt DeBruijn(32, 0x077CB531U)`, then it will have a rotl method. The advantage of APInt is that they will also be able to work with other bitwidths, once those are added.
llvm/test/CodeGen/VE/Scalar/cttz.ll
44 ↗	(On Diff #444939)	If the target has a ctpop instruction then that should be preferred to the table lookup.
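
As a small worked check of the shift amount mentioned above (BitWidth - log2(BitWidth)), the sketch below computes it for the two widths discussed in this review. The helper names `log2Floor` and `deBruijnShiftAmount` are hypothetical; `log2Floor` is only a stand-in for an integer log2 such as `Log2_32`.

```cpp
#include <cassert>

// floor(log2(V)) for V > 0; a stand-in for an integer log2 helper.
inline unsigned log2Floor(unsigned V) {
  unsigned R = 0;
  while (V >>= 1)
    ++R;
  return R;
}

// Number of bits to shift right so that only the top log2(BitWidth) bits
// of the multiplied value remain as the table index.
inline unsigned deBruijnShiftAmount(unsigned BitWidth) {
  return BitWidth - log2Floor(BitWidth);
}
```

For a 32-bit value this gives 27 (a 5-bit index), and for a 64-bit value it gives 58 (a 6-bit index).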

Fixing review comments - Using APInt for DeBruijn constant.

I am using *DeBruijn.getRawData() to get the integer from the APInt, but that does not seem like an appropriate way. Can anyone suggest a good approach to fetch an integer from an APInt? I went through Constants.h and found that getRawData() is the only way to fetch a data pointer, which I then have to dereference.

Harbormaster completed remote builds in B176845: Diff 446612.Jul 21 2022, 12:49 PM

gsocshubham added inline comments.Jul 21 2022, 12:57 PM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7849	Done. Can you please give me some suggestions to make usage of APInt minimal and clean?
7853	Understood. I have used - `APInt DeBruijn(32, 0x077CB531U)` Is it correct way? Can you suggest a way to simplify below usage of APInt?
8101	Regarding this line - `RshrArr[Rshr] = i;` In debug mode, when I run `SPARC/cttz.ll`, which is present in this patch, I get a segmentation fault, but in release mode it works fine. Why is there a difference in behaviour?
llvm/test/CodeGen/VE/Scalar/cttz.ll
44 ↗	(On Diff #444939)	I have added a check - `!isOperationLegal(ISD::CTPOP, VT)` but that does not seem to help.

barannikov88 added a subscriber: barannikov88.Jul 21 2022, 3:14 PM

barannikov88 added inline comments.Jul 21 2022, 3:23 PM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8101	Most probably there is a bug in your code. Most probably you haven't enabled assertions in release mode (-DLLVM_ENABLE_ASSERTIONS=ON is the default in debug mode). You should be able to find some hints in the printed backtrace, but first make sure that `llvm-symbolizer` target is built.
llvm/test/CodeGen/VE/Scalar/cttz.ll
44 ↗	(On Diff #444939)	VE sets the action to `Promote`. Perhaps, you should call `isOperationExpand` instead and do the transformation if it returns true.

barannikov88 added inline comments.Jul 21 2022, 3:54 PM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8083	I think your code should be moved a little higher, before `SDValue Tmp =` . Otherwise the `Tmp` node will be unused if your optimization has been applied. The check will probably need some corrections. Note the comments above, they may help deduce the correct condition. Would also be better if you extract your implementation into separate function.
8086	`getRawData` is very low level method, you should be able to easily avoid it.
8089	Be sure to clang-format your changes before submitting a patch. This is required both by the coding style and by the contribution guidelines.
8091	Ditto, don't use `getRawData`.
8095	There are always exactly 32 elements, you can use plain C array and avoid dynamic memory allocation at all. This is a bit of hard-code though, so SmallVector with 32 minimum size might be better.
8099	Ditto here and on the next line. `APInt Lshr = DeBruijn.rotl(i)` should do.
8115	You should use the alignment requirement of the array (i.e. CA), not of its element. They may differ.

Fix review comments.

There are 3 LIT test failures -

LLVM :: CodeGen/RISCV/ctlz-cttz-ctpop.ll
LLVM :: CodeGen/RISCV/rv32zbb.ll
LLVM :: CodeGen/RISCV/rv64zbb.ll

I will update them as soon as table lookup code is finalized which is still under review.

Harbormaster completed remote builds in B176977: Diff 446780.Jul 22 2022, 4:23 AM

gsocshubham added inline comments.Jul 22 2022, 4:29 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8083	Moved the code above `SDValue Tmp =` and extracted it into a separate function. I am using the check below to do the optimization - `if (NumBitsPerElt == 32 && !VT.isVector() && TLI.isOperationExpand(ISD::CTPOP, VT) && !isOperationLegal(ISD::CTLZ, VT))`
8086	Done. Now I am using - `getZExtValue()`
8089	I have formatted code using - `$INSTALL/bin/clang-format -style=LLVM TargetLowering.cpp -i --lines=7866:7872`
8091	Updated to `getZExtValue()`
8095	I am using plain C array and got rid of dynamic memory allocation error.
8099	Done.
8101	Thanks. `-DLLVM_ENABLE_ASSERTIONS=ON` this helped.

gsocshubham added inline comments.Jul 22 2022, 4:36 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp

8115

From the assembly dump of SPARC/cttz.ll, I am not sure whether to use array element alignment or array alignment?

If I use array alignment CA, I get below assembly as compared to SPARC/cttz.ll assembly if array element alignment is used. What do you think? Should I update from CPIdx to CA?

f:                                      ! @f
        .cfi_startproc
! %bb.0:                                ! %entry
        mov     %o0, %o1
        cmp %o0, 0
        be      .LBB0_2
        mov     %g0, %o0
! %bb.1:                                ! %entry
        sub %o0, %o1, %o0
        and %o1, %o0, %o0
        sethi 122669, %o1
        or %o1, 305, %o1
        smul %o0, %o1, %o0
        srl %o0, 27, %o0
        sethi %hi(.LCPI0_0), %o1
        add %o1, %lo(.LCPI0_0), %o1
        add %o1, %o0, %o2
        ldub [%o2+2], %o3
        ldub [%o2+3], %o4
        ldub [%o1+%o0], %o0
        ldub [%o2+1], %o1
        sll %o3, 8, %o2
        or %o2, %o4, %o2
        sll %o0, 8, %o0
        or %o0, %o1, %o0
        sll %o0, 16, %o0
        or %o0, %o2, %o0

barannikov88 added inline comments.Jul 22 2022, 8:10 AM

llvm/include/llvm/CodeGen/TargetLowering.h
4762	I believe it returns something else
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7991	You should be able to avoid getZExtValue here as well. Moreover, you don't need APInt here at all, the values are small enough to be put into 'unsigned', e.g.: `unsigned ShiftAmt = BitWidth - Log2_32(BitWidth);` Same for most other APInts.
8000	Could as well be C array.
8004	There is no standalone "rotl" function which would work for 'unsigned', but you could do something like this: `Lshr = (DeBruijn << i) | (DeBruijn >> (NumBitsPerElt - Amt));`
8035	This is redundant. Your code is already in TargetLowering class. Just call `isOperationExpand` directly.

barannikov88 added inline comments.Jul 22 2022, 8:19 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8115	I meant you should call `TD.getPrefTypeAlign(Elts->getType())` instead of `TD.getPrefTypeAlign(Elts[0]->getType())`. Is the above assembly a result of such a change, or did you do something different?
llvm/test/CodeGen/SPARC/cttz.ll
5	Unused
27	Why not just `ret i32 %0` ?

When we get that far, we might need to add a way for targets to opt out of the new lowering if it will cause them problems.

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7990	Sorry the suggestion for using APInt wasn't very clear. The point is that DeBruijn isn't necessarily a 32bit quantity - there are other values for 8/16/64. We should add i64 at least, so long as they look beneficial (which they should for 64, not sure about the other values). APInt makes that simpler because it can store any size without overflow, and the operations are performed on the right size. BitWidth and ShiftAmt can remain as unsigned - the values will always easily fit in an unsigned value. ShiftWidth can be `Bitwidth - Log2_32(BitWidth)`. I would rename the "NumBitsPerElt" argument to "Bitwidth" too, to make it clear what it is and avoid the need for the two.
7996	I think we can make a constant from a APInt directly.
8001	This shouldn't be a plain C array. Its size is dependent on the BitWidth. `SmallVector<uint8_t> RshrArr(BitWidth, 0)` should create an array that is initialized to 0's with the correct size.
8005	`APInt Rshr = Lshr.lshr(ShiftAmt)`, then use `Rshr.getZExtValue()` in the line below. It is a little strange to use getZExtValue in an array index, but so long as the array is a safe type, it should complain if the value is out of bounds.
8009–8014	Do we need this loop, or can we create the array from the constant pool directly? The elements should be MVT::i8. auto CA = ConstantDataArray::get(DAG.getContext(), RshrArr);
8024	The load should be loading MVT::i8, extended the result into VT.
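
The table construction suggested in the comments above can be sketched without any LLVM types. This is an assumption-laden illustration: `std::vector<uint8_t>` stands in for `SmallVector<uint8_t>`, a plain `uint32_t` stands in for `APInt`, and the function name `buildCttzTable32` is hypothetical. Only the 32-bit constant is handled here.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Build the byte table for the 32-bit de Bruijn constant from the review.
inline std::vector<uint8_t> buildCttzTable32() {
  const uint32_t DeBruijn = 0x077CB531u;
  const unsigned BitWidth = 32;
  const unsigned ShiftAmt = 27;  // BitWidth - log2(BitWidth)
  std::vector<uint8_t> Table(BitWidth, 0);
  for (unsigned I = 0; I < BitWidth; ++I) {
    uint32_t Shl = DeBruijn << I;      // emulate multiplying by (1 << I)
    uint32_t Index = Shl >> ShiftAmt;  // top 5 bits form the table index
    Table.at(Index) = static_cast<uint8_t>(I);  // at() is bounds-checked
  }
  return Table;
}
```

Each entry fits in one byte, so the table occupies 32 bytes, compared with 128 bytes if every entry were stored as a 32-bit word.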

dmgreen mentioned this in D113291: [AggressiveInstCombine] Lower Table Based CTTZ .Jul 23 2022, 4:47 AM

gsocshubham added inline comments.Jul 24 2022, 2:49 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp

8009–8014

If I directly use RshrArr, I get below table in assembly -

.LCPI0_0:
        .ascii  "\000\001\034\002\035\016\030\003\036\026\024\017\031\021\004\b\037\033\r\027\025\023\020\007\032\f\022\006\013\005\n\t"

instead of -

.LCPI0_0:
        .word   0                               ! 0x0
        .word   1                               ! 0x1
        .word   28                              ! 0x1c
        .word   2                               ! 0x2
        .word   29                              ! 0x1d
        .word   14                              ! 0xe
        .word   24                              ! 0x18
        .word   3                               ! 0x3
        .word   30                              ! 0x1e
        .word   22                              ! 0x16
        .word   20                              ! 0x14
        .word   15                              ! 0xf
        .word   25                              ! 0x19
        .word   17                              ! 0x11
        .word   4                               ! 0x4
        .word   8                               ! 0x8
        .word   31                              ! 0x1f
        .word   27                              ! 0x1b
        .word   13                              ! 0xd
        .word   23                              ! 0x17
        .word   21                              ! 0x15
        .word   19                              ! 0x13
        .word   16                              ! 0x10
        .word   7                               ! 0x7
        .word   26                              ! 0x1a
        .word   12                              ! 0xc
        .word   18                              ! 0x12
        .word   6                               ! 0x6
        .word   11                              ! 0xb
        .word   5                               ! 0x5
        .word   10                              ! 0xa
        .word   9                               ! 0x9
        .text
        .globl  f

dmgreen added inline comments.Jul 24 2022, 8:29 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8009–8014	Yes - that seems better to me, so long as it is loading i8's from the array. The .word's will be i32 I think, which uses much more data than it needs, as all the values are in the range 0-BitWidth.

gsocshubham marked an inline comment as not done.Jul 24 2022, 8:58 AM

Fix review comments.

With this revision, there are 2 LIT test failures from RISCV backend which I am guessing will be fixed once we finalize conditions for lowering check.

LLVM :: CodeGen/RISCV/ctlz-cttz-ctpop.ll
LLVM :: CodeGen/RISCV/rv64zbb.ll

Harbormaster completed remote builds in B177241: Diff 447148.Jul 24 2022, 11:33 AM

gsocshubham added inline comments.Jul 24 2022, 11:39 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7996	I tried replacing `DAG.getConstant(DeBruijn.getZExtValue(), dl, VT))` with just `DeBruijn` but there does not seem to be a direct conversion from `APInt` to `SDValue`?
8001	Changed it to SmallVector. Thanks!
8005	Done.
8009–8014	Done accordingly.
8024	Can you please elaborate it? I did not understand it. Do you mean to change `VT` to `MVT::i8` in the return statement?
8115	I did something differently. But now it seems fine. Now, we don't have `Elts` and hence I am taking alignment of an element from `RshrArr`. Is it fine?
llvm/test/CodeGen/SPARC/cttz.ll
5	Removed.
27	Updated test accordingly.

gsocshubham added inline comments.Jul 24 2022, 11:52 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7990	Here, do you mean to change `APInt DeBruijn(32, 0x077CB531U)` to `APInt DeBruijn(64, 0x0218A392CD3D5DBF)`? `NumBitsPerElt` is 32 even in the case for `call i64 @llvm.cttz.i64(i64 %x, i1 true)`

Any suggestions on above?

dmgreen added inline comments.Jul 25 2022, 5:51 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7990	It would be based on the BitWidth, provided that the BitWidth is known to be 32 or 64: `APInt DeBruijn = BitWidth == 32 ? APInt(32, 0x077CB531U) : APInt(64, 0x0218A392CD3D5DBFULL)`. For some targets the i64 cttz will be legalized to an i32 cttz. It is for 64-bit targets that the 64-bit variant is useful.
8024	We want to create an array of i8 elements, load an i8 from it at the right index, and extend that to the original VT. That can either be done by creating a load of an i8 and calling DAG.getZExtOrTrunc - but that might introduce an illegal type where one cannot be created. So it may need to create a ZEXTLOAD load directly.
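
To illustrate the 64-bit variant mentioned above, here is a plain C++ sketch using the 64-bit constant from the comment and a shift of 58 (64 - log2(64)). The function names are hypothetical, and loading a byte and widening it to `unsigned` only loosely mirrors the "load an i8 and zero-extend to VT" idea; it is not the DAG code.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Build the 64-entry byte table for the 64-bit de Bruijn constant.
inline std::vector<uint8_t> buildCttzTable64() {
  const uint64_t DeBruijn = 0x0218A392CD3D5DBFULL;
  std::vector<uint8_t> Table(64, 0);
  for (unsigned I = 0; I < 64; ++I)
    Table.at((DeBruijn << I) >> 58) = static_cast<uint8_t>(I);
  return Table;
}

// Count trailing zeros of a nonzero 64-bit value via the table.
inline unsigned cttz64ZeroUndef(uint64_t X) {
  static const std::vector<uint8_t> Table = buildCttzTable64();
  uint64_t LowBit = X & (0ULL - X);  // isolate the lowest set bit
  // The uint8_t entry is zero-extended to unsigned on return.
  return Table[(LowBit * 0x0218A392CD3D5DBFULL) >> 58];
}
```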

craig.topper added inline comments.Jul 25 2022, 10:02 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7997	This should be getShiftAmountConstant.
8007	We shouldn't need a ConstantSDNode and ConstantInt to get the Type. You can use `VT.getTypeForEVT(DAG.getContext())` in place of `CI->getType()` Though really we should be using an array of Int8Ty and doing a zextload from i8 to VT.
8064	Should this be checking CTTZ_ZERO_UNDEF? The zero case is not handled correctly by the table lookup. For CTTZ we need a select. CodeGenPrepare rewrites llvm.cttz(i32 %x, i1 false) into a branch around llvm.cttz(i32 %x, i1 true) on some targets. So the difference might be hard to test.

craig.topper added inline comments.Jul 25 2022, 10:07 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7853	Is it really cyclic? The multiply is a shift by a power of 2. So emulating the multiply should be a shl not rotl.

craig.topper added inline comments.Jul 25 2022, 10:14 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8015	`Lookup` needs a sextOrTrunc to the target's pointer type.

craig.topper added inline comments.Jul 25 2022, 10:18 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8001	What does `Lshr` mean? `shr` is usually "shift right", and lshr is logical shift right. But here `L` means left? But that means I don't know what `shr` means.

Fix review comments.

Harbormaster completed remote builds in B177585: Diff 447662.Jul 26 2022, 6:15 AM

gsocshubham added inline comments.Jul 26 2022, 6:21 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7853	Updated to `shl()`.
7990	Done.
7997	Updated accordingly.
8001	It was a typo in naming. I have updated it accordingly.
8007	Done.
8015	I have created a new variable `TargetLookup` using `sextOrTrunc()`.
8024	Created a load using `getZExtOrTrunc`. If type is illegal then I am returning `getExtLoad(ISD::ZEXTLOAD....` Let me know your suggestions on above.

gsocshubham added inline comments.Jul 26 2022, 6:30 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8064	Can you point me to target and test where above scenario occurs? I will update it accordingly.

craig.topper added inline comments.Jul 26 2022, 9:16 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7999	You overwrite `Lookup` here instead of creating `TargetLookup`.
8012	The ConstantDataArray is an array of bytes, but the alignment here is based on VT. Should be MVT::i8
8015	The data in the array is only 8-bits. The load type shouldn't be VT.
8019	NewVT is the same as VT.
8023	ZExtOrTrunc and Load are unused if this else is taken. We shouldn't create dead nodes if it can be avoided.
8064	Compiling llvm.cttz.i32 for riscv32 with -O0 should do it I think. -O0 will disable codegen prepare.

gsocshubham added inline comments.Jul 28 2022, 10:42 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8064	@craig.topper - What is the desired output value from the table when x=0?

define i32 @f(i32 %x) {
entry:
  %0 = call i32 @llvm.cttz.i32(i32 %x, i1 true)
  ret i32 %0
}

craig.topper added inline comments.Jul 28 2022, 10:46 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8064	With `i1 true` there is no desired output for 0. I'm concerned about `i1 false` which needs to produce 32.
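
The difference between the two cases can be sketched in plain C++. The raw lookup returns table entry 0 for a zero input, so the `i1 false` form needs an explicit select on zero that yields the bit width. The names `kTable32`, `cttz32Raw` and `cttz32` are hypothetical; the table values are the ones shown earlier in this review.

```cpp
#include <cassert>
#include <cstdint>

static const uint8_t kTable32[32] = {
    0,  1,  28, 2,  29, 14, 24, 3,  30, 22, 20, 15, 25, 17, 4,  8,
    31, 27, 13, 23, 21, 19, 16, 7,  26, 12, 18, 6,  11, 5,  10, 9};

// Raw lookup: fine when zero is undefined, but for X == 0 the index is 0
// and the result is kTable32[0] == 0, which is not the wanted answer.
inline unsigned cttz32Raw(uint32_t X) {
  return kTable32[((X & (0u - X)) * 0x077CB531u) >> 27];
}

// Zero-defined variant: select the bit width when the input is zero.
inline unsigned cttz32(uint32_t X) {
  return X == 0 ? 32u : cttz32Raw(X);
}
```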

Fix some of review comments.

Harbormaster completed remote builds in B178248: Diff 448583.Jul 29 2022, 4:49 AM

gsocshubham added inline comments.Jul 29 2022, 4:54 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7999	Done.
8012	I tried using `MVT::i8` by converting it to Value Type to be used instead of VT but there seems some compatibility issue.
8019	Updated.
8023	Modified the blocks. Actually else part depends on `ZExtOrTrunc`.

craig.topper added inline comments.Jul 29 2022, 3:24 PM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8013	Replace VT.getTypeForEVT(*DAG.getContext()) with `CA->getType()`
8015	What happens if you always create a zextload even if the type isn't legal?
8018	This ZextOrTrunc doesn't do anything, the Load already has VT as its ValueType.
8023	The `VT` being passed to the operand named `MemVT` needs to be MemVT.

dmgreen added inline comments.Jul 30 2022, 12:52 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7853	Yeah DeBruijn sequences are cyclic. They are explained here: https://en.wikipedia.org/wiki/De_Bruijn_sequence#Construction. The lower bits of the last elements use the wrapped-around upper bits of the constant. A rotate is more general than a shift, but for the constants we chose the upper value in the constant is always zero, so they become equivalent. If we are relying on a shift on the generated code, using a shift here too sounds OK.
7989	If BitWidth isn't 32 or 64, we need to return SDValue. We need to make sure the 64-bit value is tested too.
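
The claim above, that a rotate and a shift give the same table index when the upper bits of the constant are zero, can be checked with a short sketch. The helper name `shlMatchesRotl` is hypothetical; the two constants and shift amounts are the ones used in this review.

```cpp
#include <cassert>
#include <cstdint>

// Returns true if, for every bit position I, the top bits selected by
// Shift are the same whether the constant is rotated or shifted left by I.
template <typename T>
bool shlMatchesRotl(T D, unsigned Shift) {
  const unsigned Bits = sizeof(T) * 8;
  for (unsigned I = 0; I < Bits; ++I) {
    T Shl = static_cast<T>(D << I);
    // Avoid shifting by the full width when I == 0.
    T Rot = I == 0 ? D : static_cast<T>(Shl | (D >> (Bits - I)));
    if ((Rot >> Shift) != (Shl >> Shift))
      return false;
  }
  return true;
}
```

With 0x077CB531 (shift 27) and 0x0218A392CD3D5DBF (shift 58) the check passes, whereas a constant with set upper bits, such as 0xF77CB531, fails it because the rotate wraps those bits into the index.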

gsocshubham added inline comments.Aug 1 2022, 4:41 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp

8013

If I change alignment as above, then for call i32 @llvm.cttz.i32(i32 %x, i1 true) compiled with SPARC -

I get -

f:                                      ! @f
        .cfi_startproc
! %bb.0:                                ! %entry
        mov     %g0, %o1
        sub %o1, %o0, %o1
        and %o0, %o1, %o0
        sethi 122669, %o1
        or %o1, 305, %o1
        smul %o0, %o1, %o0
        srl %o0, 27, %o0
        sethi %hi(.LCPI0_0), %o1
        add %o1, %lo(.LCPI0_0), %o1
        add %o1, %o0, %o2
        ldub [%o2+2], %o3
        ldub [%o2+3], %o4
        ldub [%o1+%o0], %o0
        ldub [%o2+1], %o1
        sll %o3, 8, %o2
        or %o2, %o4, %o2
        sll %o0, 8, %o0
        or %o0, %o1, %o0
        sll %o0, 16, %o0
        retl
        or %o0, %o2, %o0

instead of -

mov     %g0, %o1
sub %o1, %o0, %o1
and %o0, %o1, %o0
sethi 122669, %o1
or %o1, 305, %o1
smul %o0, %o1, %o0
srl %o0, 27, %o0
sethi %hi(.LCPI0_0), %o1
add %o1, %lo(.LCPI0_0), %o1
retl
ld [%o1+%o0], %o0

8015

There is no change in generated code nor in LIT test failures as it was originally submitted earlier.

craig.topper added inline comments.Aug 1 2022, 8:31 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8015	A zext load with i8 memVT?

gsocshubham added inline comments.Aug 1 2022, 11:34 PM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp

8015

Zext with MemVT, without checking VT type legality, for llvm.cttz.i32(i32 %x, i1 true) compiled with SPARC gives -

f:                                      ! @f
        .cfi_startproc
! %bb.0:                                ! %entry
        mov     %o0, %o1
        cmp %o0, 0
        be      .LBB0_2
        mov     %g0, %o0
! %bb.1:                                ! %entry
        sub %o0, %o1, %o0
        and %o1, %o0, %o0
        sethi 122669, %o1
        or %o1, 305, %o1
        smul %o0, %o1, %o0
        srl %o0, 27, %o0
        sethi %hi(.LCPI0_0), %o1
        add %o1, %lo(.LCPI0_0), %o1
        ld [%o1+%o0], %o0

ZExt with MemVT

Fix review comments.

Harbormaster completed remote builds in B179030: Diff 449665.Aug 3 2022, 7:46 AM

gsocshubham added inline comments.Aug 3 2022, 7:47 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7989	Done. Added a check. There are already tests present for cttz.i64 which I have updated in this patch.

gsocshubham added inline comments.Aug 3 2022, 7:50 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8027	@craig.topper - What should be the check here for case `llvm.cttz(i32 %x, i1 false)`? It can't be `isOperationLegalOrCustom(ISD::CTTZ_ZERO_UNDEF, VT)` because that is taken care before this new lowering.

dmgreen added inline comments.Aug 3 2022, 9:41 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7989	This needs to handle other types too - just doing `if (BitWidth != 32 && BitWidth != 64)` is probably easiest.
8005	Why has this become uint16_t? I don't think it needs to be any bigger than i8 to hold all the values.
8018	It doesn't need to create a load just to create another load. It can just use the getExtLoad method with the MemoryVT set to MVT::i8. I think the other parameters can be default/left out.
8027	I think, if I understand, this should be `if (Node->getOpcode() != ISD::CTTZ_ZERO_UNDEF) {`. But I'm not sure what this does to the profitability of the transform. It might be possible to just encode it into the table. That involves changing the table to have bitwidth+1 elements, and using a different de Bruijn constant so that the zero element can be the bitwidth.

craig.topper added inline comments.Aug 3 2022, 10:11 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8016	The Type for the alignment needs to be the type of the elements in `CA` which is based on what data type is used for the `Table` SmallVector.

barannikov88 added inline comments.Aug 3 2022, 10:46 PM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8016	Why the type of the elements specifically and not of the CA itself? AFAIK the alignment passed to the load is the "base alignment"; the alignment of the accessed element will be inferred based on the base alignment and the offset of the element.

Fix review comments.

Harbormaster completed remote builds in B179231: Diff 449925.Aug 4 2022, 3:31 AM

gsocshubham added inline comments.Aug 4 2022, 3:39 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
7989	Updated accordingly.
8005	Reverted to `uint8_t`. For `@llvm.cttz.i64(i64 %x, i1 true)` compiled with `riscv64`, I am getting table sequence as -

        .ascii  "\000\001\002\007\003\r\b\023\004\031\016\034\t\"\024(\005\021\032&\017.\0350\n\037#6\0252)9?\006\f\022\030\033!'\020%-/\036518>\013\027 $,47=\026+3<*;:"

Hence, earlier I had changed it to `uint16_t` where I was getting -

        .half   0                               # 0x0
        .half   1                               # 0x1
        .half   2                               # 0x2
        .half   7                               # 0x7
        .half   3                               # 0x3
        .half   13                              # 0xd
        .half   8                               # 0x8
        .half   19                              # 0x13
        .half   4                               # 0x4
        .half   25                              # 0x19
        .half   14                              # 0xe
        .half   28                              # 0x1c
        .half   9                               # 0x9
        .half   34                              # 0x22
        .half   20                              # 0x14
        .half   40                              # 0x28
        .half   5                               # 0x5
        .half   17                              # 0x11
        .half   26                              # 0x1a
        .half   38                              # 0x26
        .half   15                              # 0xf
        .half   46                              # 0x2e
        .half   29                              # 0x1d
        .half   48                              # 0x30
        .half   10                              # 0xa
        .half   31                              # 0x1f
        .half   35                              # 0x23
        .half   54                              # 0x36
        .half   21                              # 0x15
        .half   50                              # 0x32
        .half   41                              # 0x29
        .half   57                              # 0x39
        .half   63                              # 0x3f
        .half   6                               # 0x6
        .half   12                              # 0xc
        .half   18                              # 0x12
        .half   24                              # 0x18
        .half   27                              # 0x1b
        .half   33                              # 0x21
        .half   39                              # 0x27
        .half   16                              # 0x10
        .half   37                              # 0x25
        .half   45                              # 0x2d
        .half   47                              # 0x2f
        .half   30                              # 0x1e
        .half   53                              # 0x35
        .half   49                              # 0x31
        .half   56                              # 0x38
        .half   62                              # 0x3e
        .half   11                              # 0xb
        .half   23                              # 0x17
        .half   32                              # 0x20
        .half   36                              # 0x24
        .half   44                              # 0x2c
        .half   52                              # 0x34
        .half   55                              # 0x37
        .half   61                              # 0x3d
        .half   22                              # 0x16
        .half   43                              # 0x2b
        .half   51                              # 0x33
        .half   60                              # 0x3c
        .half   42                              # 0x2a
        .half   59                              # 0x3b
        .half   58                              # 0x3a

8016	Understood. Thanks! Now, the alignment is of `CA`.
8018	Removed Load. It is just `getExtLoad` with `MVT::i8` now.
8027	I have added above check!

gsocshubham removed a parent revision: D113291: [AggressiveInstCombine] Lower Table Based CTTZ .Aug 4 2022, 5:07 AM

gsocshubham added inline comments.Aug 4 2022, 8:09 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8064	Regarding this check - `!VT.isVector() && TLI.isOperationExpand(ISD::CTPOP, VT) && !isOperationLegal(ISD::CTLZ, VT)` I am not sure whether these checks are complete to allow the new lowering. I have these 3 failures currently -

LLVM :: CodeGen/AVR/cttz.ll
LLVM :: CodeGen/RISCV/ctlz-cttz-ctpop.ll
LLVM :: CodeGen/RISCV/rv64zbb.ll

When I update these tests using `utils/update_llc_test_checks.py`, it crashes. Does it mean the new lowering is generating wrong code for `AVR` and `RISCV`? Or am I lowering cases that should not be lowered in the first place, meaning more checks are needed in the `if()` block above?

craig.topper added inline comments.Aug 4 2022, 8:42 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8003	The RISC-V crash needs this to fix: `Lookup = DAG.getSExtOrTrunc(Lookup, dl, getPointerTy(TD));` Still looking at AVR.

craig.topper added inline comments.Aug 4 2022, 8:59 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8066	This needs to be `if (SDValue V = CTTZTableLookup(Node, DAG, dl, VT, Op, NumBitsPerElt)) return V;` because there is an early out in CTTZTableLookup for types other than i32/i64.

craig.topper added inline comments.Aug 4 2022, 9:02 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8066	That will fix the AVR failure

Fix LIT test failures.

Add cttz.i64() in SPARC/cttz.ll

Harbormaster completed remote builds in B179351: Diff 450079.Aug 4 2022, 11:21 AM

gsocshubham added inline comments.Aug 4 2022, 11:21 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8003	Added above line!
8066	Updated as above.

@craig.topper - Thanks a lot for the help. Are there any other suggestions on the above patch, or is it good to merge?

craig.topper added inline comments.Aug 4 2022, 11:23 AM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8035	You don't need this. We're in a member function of TargetLowering. DAG.getTargetLoweringInfo() is equivalent to `this`.

Fix review comment.

gsocshubham added inline comments.Aug 4 2022, 12:37 PM

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8035	Got it. Removed. Is it fine now?

craig.topper added inline comments.Aug 4 2022, 1:35 PM

llvm/include/llvm/CodeGen/TargetLowering.h
4764	No need for `long` here. `SDLoc` should be passed by const reference.

Harbormaster completed remote builds in B179368: Diff 450102.Aug 4 2022, 2:45 PM

Fix review comments.

llvm/include/llvm/CodeGen/TargetLowering.h
4764	Done.

Harbormaster completed remote builds in B179467: Diff 450227.Aug 4 2022, 11:11 PM

Is it in the state to merge?

LGTM to me with those two changes

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8001	You don't need getZExtValue() here
8018	I don't think you need an explicit `EVT` around MVT::i8

This revision is now accepted and ready to land.Aug 5 2022, 8:04 AM

barannikov88 added inline comments.Aug 5 2022, 8:55 AM

llvm/include/llvm/CodeGen/TargetLowering.h
4762	The comment is still confusing. The function does not return a reference to the generated table, it returns the expanded node. The "brief" part of the comment should also be fixed. Currently, it documents the way the function is used, which it should not. Uses may be added or removed, while the comment should stay the same.

gsocshubham added inline comments.Aug 5 2022, 11:08 AM

llvm/include/llvm/CodeGen/TargetLowering.h
4762	How about now?

/// Expand CTTZ node if CTLZ/CTPOP operations are not legal.
/// \param N Node to expand
/// \returns The expansion result or SDValue() if it fails.

@barannikov88 - Is the brief part fine now? If not, can you please suggest a better one?

barannikov88 added inline comments.Aug 5 2022, 12:31 PM

llvm/include/llvm/CodeGen/TargetLowering.h
4762	Sounds ok to me now. My suggestion would be "Expands CTTZ via table lookup". This would not imply that ctlz/ctpop have to be illegal if you want to use this method. This, however, is what the name of the method already says, making the whole comment kind of redundant. Use it or leave it as it is; I guess it is not a major issue since no one else objected.

Thank you for your work!

Fix review comments.

gsocshubham added inline comments.Aug 6 2022, 12:16 AM

llvm/include/llvm/CodeGen/TargetLowering.h
4762	Thanks for suggestion. It looks better now!
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp
8001	Removed.
8018	Removed. Done.

Harbormaster completed remote builds in B179665: Diff 450481.Aug 6 2022, 1:05 AM

Can someone please commit this patch with below details? I do not have commit access.

Name - Shubham Narlawar
Email - shubham.narlawar@rrlogic.co.in

"Shubham Narlawar <shubham.narlawar@rrlogic.co.in>"

@greened @craig.topper @barannikov88 - Can you commit this?

LGTM - I can commit this for you. I will do so now.

I was trying some testing before I did, and everything seems to be alright. The code isn't as optimal as it could be in places, but it is certainly an improvement over what was already present. I tried other backends to see if they all at least compiled successfully. The only one that didn't was BPF, but that appears to be a pre-existing condition. All the others seemed OK for a compile at least, and the results on Arm/hacked up AArch64 produce the correct values. Looks OK.

This revision was landed with ongoing or failed builds.Aug 8 2022, 4:08 AM

Closed by commit rGab4fc87a9d96: [DAG] Emit table lookup from TargetLowering::expandCTTZ() (authored by gsocshubham, committed by dmgreen).

This revision was automatically updated to reflect the committed changes.

dmgreen added a commit: rGab4fc87a9d96: [DAG] Emit table lookup from TargetLowering::expandCTTZ().

john.brawn mentioned this in D133199: [ARM] Constant pools need 4-byte alignment if we only have tADR.Sep 5 2022, 4:12 AM

john.brawn mentioned this in rGe26cadcc32a2: [ARM] Constant pools need 4-byte alignment if we only have tADR.Sep 6 2022, 3:36 AM

Revision Contents

Path	Size
llvm/include/llvm/CodeGen/TargetLowering.h	6 lines
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp	50 lines
llvm/test/CodeGen/ARM/cttz.ll	381 lines
llvm/test/CodeGen/RISCV/ctlz-cttz-ctpop.ll	750 lines
llvm/test/CodeGen/RISCV/rv32zbb.ll	115 lines
llvm/test/CodeGen/RISCV/rv64zbb.ll	224 lines
llvm/test/CodeGen/SPARC/cttz.ll	77 lines

Diff 450227

llvm/include/llvm/CodeGen/TargetLowering.h

	Show First 20 Lines • Show All 4,751 Lines • ▼ Show 20 Lines
	SDValue expandCTPOP(SDNode *N, SelectionDAG &DAG) const;			SDValue expandCTPOP(SDNode *N, SelectionDAG &DAG) const;

	/// Expand CTLZ/CTLZ_ZERO_UNDEF nodes. Expands vector/scalar CTLZ nodes,			/// Expand CTLZ/CTLZ_ZERO_UNDEF nodes. Expands vector/scalar CTLZ nodes,
	/// vector nodes can only succeed if all operations are legal/custom.			/// vector nodes can only succeed if all operations are legal/custom.
	/// \param N Node to expand			/// \param N Node to expand
	/// \returns The expansion result or SDValue() if it fails.			/// \returns The expansion result or SDValue() if it fails.
	SDValue expandCTLZ(SDNode *N, SelectionDAG &DAG) const;			SDValue expandCTLZ(SDNode *N, SelectionDAG &DAG) const;

				/// Emit Table Lookup if ISD::CTLZ and ISD::CTPOP are not legal.
				/// \param N Node to expand
				/// \returns Reference to table generated in Constant Pool.
				barannikov88: I believe it returns something else.
				barannikov88: The comment is still confusing. The function does not return a reference to the generated table, it returns the expanded node. The "brief" part of the comment should also be fixed. Currently, it documents the way the function is used, which it should not. Uses may be added or removed, while the comment should stay the same.
				gsocshubham: How about now? `/// Expand CTTZ node if CTLZ/CTPOP operations are not legal.` `/// \param N Node to expand` `/// \returns The expansion result or SDValue() if it fails.` @barannikov88 - Is the brief part fine now? If not, can you please suggest a better one?
				barannikov88: Sounds ok to me now. My suggestion would be "Expands CTTZ via table lookup". This would not imply that ctlz/ctpop have to be illegal if you want to use this method. This, however, is what the name of the method already says, making the whole comment kind of redundant. Use it or leave it as it is; I guess it is not a major issue since no one else objected.
				gsocshubham: Thanks for suggestion. It looks better now!
				SDValue CTTZTableLookup(SDNode *N, SelectionDAG &DAG, const SDLoc &DL, EVT VT,
				SDValue Op, unsigned NumBitsPerElt) const;
				craig.topper: No need for `long` here. `SDLoc` should be passed by const reference.
				gsocshubham: Done.

	/// Expand CTTZ/CTTZ_ZERO_UNDEF nodes. Expands vector/scalar CTTZ nodes,			/// Expand CTTZ/CTTZ_ZERO_UNDEF nodes. Expands vector/scalar CTTZ nodes,
	/// vector nodes can only succeed if all operations are legal/custom.			/// vector nodes can only succeed if all operations are legal/custom.
	/// \param N Node to expand			/// \param N Node to expand
	/// \returns The expansion result or SDValue() if it fails.			/// \returns The expansion result or SDValue() if it fails.
	SDValue expandCTTZ(SDNode *N, SelectionDAG &DAG) const;			SDValue expandCTTZ(SDNode *N, SelectionDAG &DAG) const;

	/// Expand ABS nodes. Expands vector/scalar ABS nodes,			/// Expand ABS nodes. Expands vector/scalar ABS nodes,
	/// vector nodes can only succeed if all operations are legal/custom.			/// vector nodes can only succeed if all operations are legal/custom.
	▲ Show 20 Lines • Show All 239 Lines • Show Last 20 Lines

llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp


Show First 20 Lines • Show All 7,831 Lines • ▼ Show 20 Lines	if (unsigned PartialCheck = Test & fcNormal) {
APInt ExpLSB = ExpMask & ~(ExpMask.shl(1));		APInt ExpLSB = ExpMask & ~(ExpMask.shl(1));
SDValue ExpLSBV = DAG.getConstant(ExpLSB, DL, IntVT);		SDValue ExpLSBV = DAG.getConstant(ExpLSB, DL, IntVT);
SDValue ExpMinus1 = DAG.getNode(ISD::SUB, DL, IntVT, AbsV, ExpLSBV);		SDValue ExpMinus1 = DAG.getNode(ISD::SUB, DL, IntVT, AbsV, ExpLSBV);
APInt ExpLimit = ExpMask - ExpLSB;		APInt ExpLimit = ExpMask - ExpLSB;
SDValue ExpLimitV = DAG.getConstant(ExpLimit, DL, IntVT);		SDValue ExpLimitV = DAG.getConstant(ExpLimit, DL, IntVT);
PartialRes = DAG.getSetCC(DL, ResultVT, ExpMinus1, ExpLimitV, ISD::SETULT);		PartialRes = DAG.getSetCC(DL, ResultVT, ExpMinus1, ExpLimitV, ISD::SETULT);
if (PartialCheck == fcNegNormal)		if (PartialCheck == fcNegNormal)
PartialRes = DAG.getNode(ISD::AND, DL, ResultVT, PartialRes, SignV);		PartialRes = DAG.getNode(ISD::AND, DL, ResultVT, PartialRes, SignV);
else if (PartialCheck == fcPosNormal) {		else if (PartialCheck == fcPosNormal) {
		dmgreen: This needs to be in a `if (NumBitsPerElt == 32 && !VT.isVector())` block. Hopefully with other types available too. Else it would fall back to the existing CTPop. If the CTPOP is legal it is worth using it too, instead of this table lookup.
		gsocshubham: Done.
SDValue PosSignV =		SDValue PosSignV =
DAG.getNode(ISD::XOR, DL, ResultVT, SignV, ResultInvertionMask);		DAG.getNode(ISD::XOR, DL, ResultVT, SignV, ResultInvertionMask);
PartialRes = DAG.getNode(ISD::AND, DL, ResultVT, PartialRes, PosSignV);		PartialRes = DAG.getNode(ISD::AND, DL, ResultVT, PartialRes, PosSignV);
}		}
		dmgreen: I think that some other DeBruijn constants would be i8: 0x17, i16: 0x09AF, i64: 0x0218A392CD3D5DBF. Hopefully I didn't get those wrong. The i64 is almost certainly better than the alternative, but I haven't looked at whether the other two are useful.
if (IsF80)		if (IsF80)
		gsocshubham: Can anyone suggest how I can get a reference to the table here? If I get a reference here then I can index into the table using the above `Lookup`.
		dmgreen: Eli had some comments in https://reviews.llvm.org/D113291#3540702. `DAGCombiner::convertSelectOfFPConstantsToLoadOffset` has some code that emits a table.
		craig.topper: You can't use the original table. It might never have existed in IR or will have been removed once all references to it are removed. You'll need to create a new table in the constant pool.
		djtodoro: Agree, a new table is required here.
PartialRes =		PartialRes =
DAG.getNode(ISD::AND, DL, ResultVT, PartialRes, getIntBitIsSet());		DAG.getNode(ISD::AND, DL, ResultVT, PartialRes, getIntBitIsSet());
appendResult(PartialRes);		appendResult(PartialRes);
}		}
		dmgreen: Use a SmallVector for the constant values, probably with uint8_t elements.
		gsocshubham: Done.
		dmgreen: Move DeBruijn up so that it can be re-used in the creation of Lookup above. Perhaps add a variable for the Shift Amount too, which is 27 here but in general is BitWidth - log2(BitWidth), I think.
		gsocshubham: Done. Can you please give me some suggestions to make usage of APInt minimal and clean?

		dmgreen: "DeBruijn"
		gsocshubham: Done.
if (!Res)		if (!Res)
return DAG.getConstant(IsInverted, DL, ResultVT);		return DAG.getConstant(IsInverted, DL, ResultVT);
if (IsInverted)		if (IsInverted)
		dmgreen: I think this should technically be a rotl, because the DeBruijn sequences are cyclic. The top element in the sequence is always 0 though, so it probably doesn't make a difference in practice. It may be better to use APInt for all the DeBruijn constants though. It is easier to be more precise with them, and it's useful when i64 constants are supported.
		gsocshubham: I can use `DAG.getNode(ISD::ROTL, dl, VT, Op, DAG.getConstant(i, dl, SHVT));` instead of `<<` but this changes the variable from `unsigned int` to `SDNode`.
		dmgreen: That seems to be mixing up the instructions we want to produce in the output (DAG.getNode) and the calculations we are doing at compile time. If DeBruijn is an APInt of the correct size, `APInt DeBruijn(32, 0x077CB531U)`, then it will have a rotl method. The advantage of APInt is that they will also be able to work with other bitwidths, once those are added.
		gsocshubham: Understood. I have used `APInt DeBruijn(32, 0x077CB531U)`. Is it the correct way? Can you suggest a way to simplify the below usage of APInt?
		craig.topper: Is it really cyclic? The multiply is a shift by a power of 2. So emulating the multiply should be a shl not rotl.
		gsocshubham: Updated to `shl()`.
		dmgreen: Yeah, DeBruijn sequences are cyclic. They are explained here: https://en.wikipedia.org/wiki/De_Bruijn_sequence#Construction. The lower bits of the last elements use the wrapped-around upper bits of the constant. A rotate is more general than a shift, but for the constants we chose the upper value in the constant is always zero, so they become equivalent. If we are relying on a shift in the generated code, using a shift here too sounds OK.
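The equivalence discussed in this thread can be checked with a small standalone sketch. It uses plain `uint32_t` arithmetic instead of `APInt`, and `rotl32` and `shiftMatchesRotate` are hypothetical helper names that do not appear in the patch.

```cpp
#include <cassert>
#include <cstdint>

// Rotate a 32-bit value left by Amt bits (Amt in [0, 31]).
static uint32_t rotl32(uint32_t V, unsigned Amt) {
  return Amt == 0 ? V : (V << Amt) | (V >> (32 - Amt));
}

// Returns true if a plain shift and a rotate of the constant select the
// same 5-bit table index for every bit position I.
static bool shiftMatchesRotate(uint32_t DeBruijn) {
  const unsigned ShiftAmt = 32 - 5; // BitWidth - log2(BitWidth)
  for (unsigned I = 0; I < 32; ++I)
    if ((DeBruijn << I) >> ShiftAmt != rotl32(DeBruijn, I) >> ShiftAmt)
      return false;
  return true;
}
```

For `0x077CB531U` the check passes, because the upper bits that a rotate would wrap around are all zero. A constant with its top bit set would not generally pass.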
Res = DAG.getNode(ISD::XOR, DL, ResultVT, Res, ResultInvertionMask);		Res = DAG.getNode(ISD::XOR, DL, ResultVT, Res, ResultInvertionMask);
return Res;		return Res;
}		}

// Only expand vector types if we have the appropriate vector bit operations.		// Only expand vector types if we have the appropriate vector bit operations.
static bool canExpandVectorCTPOP(const TargetLowering &TLI, EVT VT) {		static bool canExpandVectorCTPOP(const TargetLowering &TLI, EVT VT) {
assert(VT.isVector() && "Expected vector type");		assert(VT.isVector() && "Expected vector type");
		craig.topper: `dyn_cast` should only be used if the cast can fail. Otherwise use `cast`.
		gsocshubham: Understood. Changed it to `cast`.
unsigned Len = VT.getScalarSizeInBits();		unsigned Len = VT.getScalarSizeInBits();
return TLI.isOperationLegalOrCustom(ISD::ADD, VT) &&		return TLI.isOperationLegalOrCustom(ISD::ADD, VT) &&
		craig.topper: Same comment about `dyn_cast`.
		gsocshubham: Changed it to `cast`.
TLI.isOperationLegalOrCustom(ISD::SUB, VT) &&		TLI.isOperationLegalOrCustom(ISD::SUB, VT) &&
TLI.isOperationLegalOrCustom(ISD::SRL, VT) &&		TLI.isOperationLegalOrCustom(ISD::SRL, VT) &&
(Len == 8 || TLI.isOperationLegalOrCustom(ISD::MUL, VT)) &&		(Len == 8 || TLI.isOperationLegalOrCustom(ISD::MUL, VT)) &&
		craig.topper: Why do we need to create an explicit ArrayRef here?
		gsocshubham: Not required. I have removed it.
TLI.isOperationLegalOrCustomOrPromote(ISD::AND, VT);		TLI.isOperationLegalOrCustomOrPromote(ISD::AND, VT);
}		}
		dmgreen: I think a lot of this may simplify into `auto CA = ConstantDataArray::get(DAG.getContext(), RshrArr);` or something like it. I don't think we need to create ConstantSDNode, just to get the Constant's from them.
		gsocshubham: I tried `ConstantDataArray::get(*DAG.getContext(), RshrArr)` but it does not generate a table in the constant pool, since `RshrArr` is not an array of `unsigned int` and not a `Constant`. I think there might be a better alternative to simplify the above. I am exploring it.

SDValue TargetLowering::expandCTPOP(SDNode *Node, SelectionDAG &DAG) const {		SDValue TargetLowering::expandCTPOP(SDNode *Node, SelectionDAG &DAG) const {
SDLoc dl(Node);		SDLoc dl(Node);
EVT VT = Node->getValueType(0);		EVT VT = Node->getValueType(0);
EVT ShVT = getShiftAmountTy(VT, DAG.getDataLayout());		EVT ShVT = getShiftAmountTy(VT, DAG.getDataLayout());
SDValue Op = Node->getOperand(0);		SDValue Op = Node->getOperand(0);
unsigned Len = VT.getScalarSizeInBits();		unsigned Len = VT.getScalarSizeInBits();
assert(VT.isInteger() && "CTPOP not implemented for this type.");		assert(VT.isInteger() && "CTPOP not implemented for this type.");

		dmgreen: I think this should be an extload from the i8 elements in the table to the original VT. Chain can be DAG.getEntryNode(), and the Ptr we load from needs to be CPIdx + Lookup.
		gsocshubham: Done. Please check it.
// TODO: Add support for irregular type lengths.		// TODO: Add support for irregular type lengths.
if (!(Len <= 128 && Len % 8 == 0))		if (!(Len <= 128 && Len % 8 == 0))
return SDValue();		return SDValue();

// Only expand vector types if we have the appropriate vector bit operations.		// Only expand vector types if we have the appropriate vector bit operations.
if (VT.isVector() && !canExpandVectorCTPOP(*this, VT))		if (VT.isVector() && !canExpandVectorCTPOP(*this, VT))
return SDValue();		return SDValue();

▲ Show 20 Lines • Show All 93 Lines • ▼ Show 20 Lines	for (unsigned i = 0; (1U << i) < NumBitsPerElt; ++i) {
SDValue Tmp = DAG.getConstant(1ULL << i, dl, ShVT);		SDValue Tmp = DAG.getConstant(1ULL << i, dl, ShVT);
Op = DAG.getNode(ISD::OR, dl, VT, Op,		Op = DAG.getNode(ISD::OR, dl, VT, Op,
DAG.getNode(ISD::SRL, dl, VT, Op, Tmp));		DAG.getNode(ISD::SRL, dl, VT, Op, Tmp));
}		}
Op = DAG.getNOT(dl, Op, VT);		Op = DAG.getNOT(dl, Op, VT);
return DAG.getNode(ISD::CTPOP, dl, VT, Op);		return DAG.getNode(ISD::CTPOP, dl, VT, Op);
}		}

		SDValue TargetLowering::CTTZTableLookup(SDNode *Node, SelectionDAG &DAG,
		const SDLoc &DL, EVT VT, SDValue Op,
		unsigned BitWidth) const {
		if (BitWidth != 32 && BitWidth != 64)
		dmgreen: If BitWidth isn't 32 or 64, we need to return SDValue. We need to make sure the 64-bit value is tested too.
		gsocshubham: Done. Added a check. There are already tests present for cttz.i64 which I have updated in this patch.
		dmgreen: This needs to handle other types too - just doing `if (BitWidth != 32 && BitWidth != 64)` is probably easiest.
		gsocshubham: Updated accordingly.
		return SDValue();
		dmgreen: Sorry, the suggestion for using APInt wasn't very clear. The point is that DeBruijn isn't necessarily a 32-bit quantity - there are other values for 8/16/64. We should add i64 at least, so long as they look beneficial (which they should for 64, not sure about the other values). APInt makes that simpler because it can store any size without overflow, and the operations are performed on the right size. BitWidth and ShiftAmt can remain as unsigned - the values will always easily fit in an unsigned value. ShiftWidth can be `BitWidth - Log2_32(BitWidth)`. I would rename the "NumBitsPerElt" argument to "BitWidth" too, to make it clear what it is and avoid the need for the two.
		gsocshubham: Here, do you mean to change `APInt DeBruijn(32, 0x077CB531U)` to `APInt DeBruijn(64, 0x0218A392CD3D5DBF)`? `NumBitsPerElt` is 32 even in the case for `call i64 @llvm.cttz.i64(i64 %x, i1 true)`.
		dmgreen: It would be based on the BitWidth, providing that the BitWidth is known to be 32 or 64: `APInt DeBruijn = BitWidth == 32 ? APInt(32, 0x077CB531U) : APInt(64, 0x0218A392CD3D5DBFULL)`. For some targets the i64 cttz will be legalized to an i32 cttz. It is for 64-bit targets that the 64-bit variant is useful.
		gsocshubham: Done.
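As a rough illustration of the width-dependent constant and shift amount discussed in this thread, the following standalone sketch builds the lookup table and performs the lookup with plain integer arithmetic. It is not the patch's SelectionDAG code: `CttzParams`, `getParams`, `buildCttzTable` and `cttzViaTable` are hypothetical names, and a zero input is assumed to be handled elsewhere.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Parameters derived from the bit width (only 32 and 64 are handled here).
struct CttzParams {
  uint64_t DeBruijn; // De Bruijn constant quoted in the review
  uint64_t Mask;     // keeps the arithmetic within BitWidth bits
  unsigned ShiftAmt; // BitWidth - log2(BitWidth)
};

static CttzParams getParams(unsigned BitWidth) {
  assert((BitWidth == 32 || BitWidth == 64) && "unsupported width");
  if (BitWidth == 32)
    return {0x077CB531ULL, 0xFFFFFFFFULL, 32 - 5};
  return {0x0218A392CD3D5DBFULL, ~0ULL, 64 - 6};
}

// Table[(DeBruijn << I) >> ShiftAmt] = I, with one byte per entry.
static std::vector<uint8_t> buildCttzTable(unsigned BitWidth) {
  CttzParams P = getParams(BitWidth);
  std::vector<uint8_t> Table(BitWidth, 0);
  for (unsigned I = 0; I < BitWidth; ++I) {
    uint64_t Shl = (P.DeBruijn << I) & P.Mask;
    Table[Shl >> P.ShiftAmt] = static_cast<uint8_t>(I);
  }
  return Table;
}

// Count trailing zeros of a nonzero X that fits in BitWidth bits.
static unsigned cttzViaTable(uint64_t X, unsigned BitWidth) {
  assert(X != 0 && "zero input is not handled in this sketch");
  CttzParams P = getParams(BitWidth);
  std::vector<uint8_t> Table = buildCttzTable(BitWidth);
  uint64_t Lowest = X & (0 - X); // isolate the lowest set bit
  uint64_t Index = ((Lowest * P.DeBruijn) & P.Mask) >> P.ShiftAmt;
  return Table[Index];
}
```

The table is rebuilt on every call to keep the sketch short. The patch instead emits the table once into the constant pool.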
		APInt DeBruijn = BitWidth == 32 ? APInt(32, 0x077CB531U)
		barannikov88: You should be able to avoid getZExtValue here either. Moreover, you don't need APInt here at all, the values are small enough to be put into 'unsigned', e.g.: `unsigned ShiftAmt = BitWidth - Log2_32(BitWidth);` Same for most other APInts.
		: APInt(64, 0x0218A392CD3D5DBFULL);
		const DataLayout &TD = DAG.getDataLayout();
		MachinePointerInfo PtrInfo =
		MachinePointerInfo::getConstantPool(DAG.getMachineFunction());
		unsigned ShiftAmt = BitWidth - Log2_32(BitWidth);
		dmgreen: I think we can make a constant from an APInt directly.
		gsocshubham: I tried replacing `DAG.getConstant(DeBruijn.getZExtValue(), dl, VT))` with just `DeBruijn` but there does not seem to be a direct conversion from `APInt` to `SDValue`?
		SDValue Neg = DAG.getNode(ISD::SUB, DL, VT, DAG.getConstant(0, DL, VT), Op);
		craig.topper: This should be getShiftAmountConstant.
		gsocshubham: Updated accordingly.
		SDValue Lookup = DAG.getNode(
		ISD::SRL, DL, VT,
		craig.topper: You overwrite `Lookup` here instead of creating `TargetLookup`.
		gsocshubham: Done.
		DAG.getNode(ISD::MUL, DL, VT, DAG.getNode(ISD::AND, DL, VT, Op, Neg),
		barannikov88: Could as well be a C array.
		DAG.getConstant(DeBruijn.getZExtValue(), DL, VT)),
		dmgreen: This shouldn't be a plain C array. Its size is dependent on the BitWidth. `SmallVector<uint8_t> RshrArr(BitWidth, 0)` should create an array that is initialized to 0's with the correct size.
		gsocshubham: Changed it to SmallVector. Thanks!
		craig.topper: What does `Lshr` mean? `shr` is usually "shift right", and lshr is logical shift right. But here `L` means left? But that means I don't know what `shr` means.
		gsocshubham: It was a typo in naming. I have updated it accordingly.
		craig.topper: You don't need getZExtValue() here.
		gsocshubham: Removed.
		DAG.getConstant(ShiftAmt, DL, VT));
		Lookup = DAG.getSExtOrTrunc(Lookup, DL, getPointerTy(TD));
		craig.topper: The RISC-V crash needs this to fix it: `Lookup = DAG.getSExtOrTrunc(Lookup, dl, getPointerTy(TD));` Still looking at AVR.
		gsocshubham: Added above line!

		barannikov88: There is no standalone "rotl" function which would work for 'unsigned', but you could do something like this: `Lshr = (DeBruijn << i) | (DeBruijn >> (NumBitsPerElt - Amt));`
		SmallVector<uint8_t> Table(BitWidth, 0);
		dmgreen: `APInt Rshr = Lshr.lshr(ShiftAmt)`, then use `Rshr.getZExtValue()` in the line below. It is a little strange to use getZExtValue in an array index, but so long as the array is a safe type, it should complain if the value is out of bounds.
		gsocshubham: Done.
		dmgreen: Why has this become uint16_t? I don't think it needs to be any bigger than i8 to hold all the values.
		gsocshubhamAuthorUnsubmitted Done Reply Inline Actions Reverted to `uint8_t`. For `@llvm.cttz.i64(i64 %x, i1 true)` compiled with `riscv64`, I am getting table sequence as - .ascii "\000\001\002\007\003\r\b\023\004\031\016\034\t\"\024(\005\021\032&\017.\0350\n\037#6\0252)9?\006\f\022\030\033!'\020%-/\036518>\013\027 $,47=\026+3<;:" Hence, earlier I had changed it to `uint16_t` where I was getting - .half 0 # 0x0 .half 1 # 0x1 .half 2 # 0x2 .half 7 # 0x7 .half 3 # 0x3 .half 13 # 0xd .half 8 # 0x8 .half 19 # 0x13 .half 4 # 0x4 .half 25 # 0x19 .half 14 # 0xe .half 28 # 0x1c .half 9 # 0x9 .half 34 # 0x22 .half 20 # 0x14 .half 40 # 0x28 .half 5 # 0x5 .half 17 # 0x11 .half 26 # 0x1a .half 38 # 0x26 .half 15 # 0xf .half 46 # 0x2e .half 29 # 0x1d .half 48 # 0x30 .half 10 # 0xa .half 31 # 0x1f .half 35 # 0x23 .half 54 # 0x36 .half 21 # 0x15 .half 50 # 0x32 .half 41 # 0x29 .half 57 # 0x39 .half 63 # 0x3f .half 6 # 0x6 .half 12 # 0xc .half 18 # 0x12 .half 24 # 0x18 .half 27 # 0x1b .half 33 # 0x21 .half 39 # 0x27 .half 16 # 0x10 .half 37 # 0x25 .half 45 # 0x2d .half 47 # 0x2f .half 30 # 0x1e .half 53 # 0x35 .half 49 # 0x31 .half 56 # 0x38 .half 62 # 0x3e .half 11 # 0xb .half 23 # 0x17 .half 32 # 0x20 .half 36 # 0x24 .half 44 # 0x2c .half 52 # 0x34 .half 55 # 0x37 .half 61 # 0x3d .half 22 # 0x16 .half 43 # 0x2b .half 51 # 0x33 .half 60 # 0x3c .half 42 # 0x2a .half 59 # 0x3b .half 58 # 0x3a gsocshubham:* Reverted to `uint8_t`. For `@llvm.cttz.i64(i64 %x, i1 true)` compiled with `riscv64`, I am…
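The `.ascii` and `.half` listings quoted above appear to describe the same table values, with `.ascii` printing each byte as an octal or character escape. The sketch below decodes the first eight bytes of the quoted `.ascii` string to show this. `AsciiPrefix` and `decodeAsciiPrefix` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// First eight bytes of the `.ascii` table quoted above, written with the
// same escapes the assembler output uses.
static const char AsciiPrefix[] = "\000\001\002\007\003\r\b\023";

// Decode the escaped bytes into their numeric values.
static std::vector<unsigned> decodeAsciiPrefix() {
  std::vector<unsigned> Values;
  // sizeof counts the implicit trailing NUL, which is not part of the table.
  for (unsigned I = 0; I + 1 < sizeof(AsciiPrefix); ++I)
    Values.push_back(static_cast<uint8_t>(AsciiPrefix[I]));
  return Values;
}
```

The decoded values match the first eight `.half` entries (0, 1, 2, 7, 3, 13, 8, 19), so the `uint8_t` table holds the same data in one byte per entry.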
		for (unsigned i = 0; i < BitWidth; i++) {
		APInt Shl = DeBruijn.shl(i);
		craig.topper: We shouldn't need a ConstantSDNode and ConstantInt to get the Type*. You can use `VT.getTypeForEVT(*DAG.getContext())` in place of `CI->getType()`. Though really we should be using an array of Int8Ty and doing a zextload from i8 to VT.
		gsocshubham: Done.
		APInt Lshr = Shl.lshr(ShiftAmt);
		Table[Lshr.getZExtValue()] = i;
		}

		// Create a ConstantArray in Constant Pool
		craig.topper: The ConstantDataArray is an array of bytes, but the alignment here is based on VT. Should be MVT::i8.
		gsocshubham: I tried using `MVT::i8` by converting it to a value type to be used instead of VT, but there seems to be some compatibility issue.
		auto CA = ConstantDataArray::get(DAG.getContext(), Table);
		craig.topper: Replace `VT.getTypeForEVT(*DAG.getContext())` with `CA->getType()`.
		gsocshubhamAuthorUnsubmitted Done Reply Inline Actions If I change alignment as above, then for `call i32 @llvm.cttz.i32(i32 %x, i1 true)` compiled with SPARC - I get - f: ! @f .cfi_startproc ! %bb.0: ! %entry mov %g0, %o1 sub %o1, %o0, %o1 and %o0, %o1, %o0 sethi 122669, %o1 or %o1, 305, %o1 smul %o0, %o1, %o0 srl %o0, 27, %o0 sethi %hi(.LCPI0_0), %o1 add %o1, %lo(.LCPI0_0), %o1 add %o1, %o0, %o2 ldub [%o2+2], %o3 ldub [%o2+3], %o4 ldub [%o1+%o0], %o0 ldub [%o2+1], %o1 sll %o3, 8, %o2 or %o2, %o4, %o2 sll %o0, 8, %o0 or %o0, %o1, %o0 sll %o0, 16, %o0 retl or %o0, %o2, %o0 instead of - mov %g0, %o1 sub %o1, %o0, %o1 and %o0, %o1, %o0 sethi 122669, %o1 or %o1, 305, %o1 smul %o0, %o1, %o0 srl %o0, 27, %o0 sethi %hi(.LCPI0_0), %o1 add %o1, %lo(.LCPI0_0), %o1 retl ld [%o1+%o0], %o0 gsocshubham: If I change alignment as above, then for `call i32 @llvm.cttz.i32(i32 %x, i1 true)` compiled…
		SDValue CPIdx = DAG.getConstantPool(CA, getPointerTy(TD),
		dmgreen: Do we need this loop, or can we create the array from the constant pool directly? The elements should be MVT::i8. `auto CA = ConstantDataArray::get(DAG.getContext(), RshrArr);`
		gsocshubhamAuthorUnsubmitted Done Reply Inline Actions If I directly use `RshrArr`, I get below table in assembly - .LCPI0_0: .ascii "\000\001\034\002\035\016\030\003\036\026\024\017\031\021\004\b\037\033\r\027\025\023\020\007\032\f\022\006\013\005\n\t" instead of - .LCPI0_0: .word 0 ! 0x0 .word 1 ! 0x1 .word 28 ! 0x1c .word 2 ! 0x2 .word 29 ! 0x1d .word 14 ! 0xe .word 24 ! 0x18 .word 3 ! 0x3 .word 30 ! 0x1e .word 22 ! 0x16 .word 20 ! 0x14 .word 15 ! 0xf .word 25 ! 0x19 .word 17 ! 0x11 .word 4 ! 0x4 .word 8 ! 0x8 .word 31 ! 0x1f .word 27 ! 0x1b .word 13 ! 0xd .word 23 ! 0x17 .word 21 ! 0x15 .word 19 ! 0x13 .word 16 ! 0x10 .word 7 ! 0x7 .word 26 ! 0x1a .word 12 ! 0xc .word 18 ! 0x12 .word 6 ! 0x6 .word 11 ! 0xb .word 5 ! 0x5 .word 10 ! 0xa .word 9 ! 0x9 .text .globl f gsocshubham: If I directly use `RshrArr`, I get below table in assembly - ``` .LCPI0_0: .ascii…
		dmgreen: Yes - that seems better to me, so long as it is loading i8's from the array. The .word's will be i32 I think, which uses much more data than it needs, as all the values are in the range 0-BitWidth.
		gsocshubham: Done accordingly.
		TD.getPrefTypeAlign(CA->getType()));
		craig.topper: `Lookup` needs a sextOrTrunc to the target's pointer type.
		gsocshubham: I have created a new variable `TargetLookup` using `sextOrTrunc()`.
		craig.topper: The data in the array is only 8-bits. The load type shouldn't be VT.
		craig.topper: What happens if you always create a zextload even if the type isn't legal?
		gsocshubham: There is no change in generated code nor in LIT test failures as it was originally submitted earlier.
		craig.topper: A zext load with i8 memVT?
		gsocshubham: A zext load with MemVT, without checking the legality of the VT type, for `llvm.cttz.i32(i32 %x, i1 true)` compiled for SPARC gives:
		    f: ! @f
		    .cfi_startproc
		    ! %bb.0: ! %entry
		    mov %o0, %o1
		    cmp %o0, 0
		    be .LBB0_2
		    mov %g0, %o0
		    ! %bb.1: ! %entry
		    sub %o0, %o1, %o0
		    and %o1, %o0, %o0
		    sethi 122669, %o1
		    or %o1, 305, %o1
		    smul %o0, %o1, %o0
		    srl %o0, 27, %o0
		    sethi %hi(.LCPI0_0), %o1
		    add %o1, %lo(.LCPI0_0), %o1
		    ld [%o1+%o0], %o0
		gsocshubham: > (the previous comment, quoted in full)
		    ZExt with MemVT
		SDValue ExtLoad = DAG.getExtLoad(ISD::ZEXTLOAD, DL, VT, DAG.getEntryNode(),
		craig.topper: The type for the alignment needs to be the type of the elements in `CA`, which is based on the data type used for the `Table` SmallVector.
		barannikov88: Why the type of the elements specifically and not of the CA itself? AFAIK the alignment passed to the load is the "base alignment"; the alignment of the accessed element will be inferred from the base alignment and the offset of the element.
		gsocshubham: Understood. Thanks! Now, the alignment is of `CA`.
		DAG.getMemBasePlusOffset(CPIdx, Lookup, DL),
		PtrInfo, EVT(MVT::i8));
		craig.topper: This ZextOrTrunc doesn't do anything; the Load already has VT as its ValueType.
		dmgreen: It doesn't need to create a load just to create another load. It can just use the getExtLoad method with the MemoryVT set to MVT::i8. I think the other parameters can be default/left out.
		gsocshubham: Removed Load. It is just `getExtLoad` with `MVT::i8` now.
		craig.topper: I don't think you need an explicit `EVT` around MVT::i8.
		gsocshubham: Removed. Done.
		if (Node->getOpcode() != ISD::CTLZ_ZERO_UNDEF) {
		craig.topper: NewVT is the same as VT.
		gsocshubham: Updated.
		EVT SetCCVT =
		getSetCCResultType(DAG.getDataLayout(), *DAG.getContext(), VT);
		SDValue Zero = DAG.getConstant(0, DL, VT);
		SDValue SrcIsZero = DAG.getSetCC(DL, SetCCVT, Op, Zero, ISD::SETEQ);
		craig.topper: ZExtOrTrunc and Load are unused if this else is taken. We shouldn't create dead nodes if it can be avoided.
		gsocshubham: Modified the blocks. Actually, the else part depends on `ZExtOrTrunc`.
		craig.topper: The `VT` being passed to the operand named `MemVT` needs to be MemVT.
		ExtLoad = DAG.getSelect(DL, VT, SrcIsZero,
		dmgreen: The load should be loading MVT::i8 and extending the result into VT.
		gsocshubham: Can you please elaborate? I did not understand it. Do you mean to change `VT` to `MVT::i8` in the return statement?
		dmgreen: We want to create an array of i8 elements, load an i8 from it at the right index, and extend that to the original VT. That can be done by creating a load of an i8 and calling DAG.getZExtOrTrunc, but that might introduce an illegal type where one cannot be created. So it may need to create a ZEXTLOAD load directly.
		gsocshubham: Created a load using `getZExtOrTrunc`. If the type is illegal, I am returning `getExtLoad(ISD::ZEXTLOAD....` Let me know your suggestions on the above.
		DAG.getConstant(BitWidth, DL, VT), ExtLoad);
		}
		return ExtLoad;
		gsocshubham: @craig.topper - What should the check here be for the case `llvm.cttz(i32 %x, i1 false)`? It can't be `isOperationLegalOrCustom(ISD::CTTZ_ZERO_UNDEF, VT)` because that is taken care of before this new lowering.
		dmgreen: I think, if I understand, this should be `if (Node->getOpcode() != ISD::CTLZ_ZERO_UNDEF) {` But I'm not sure what this does to the profitability of the transform. It might be possible to just encode it into the table. That involves changing the table to have bitwidth+1 elements and using a different de Bruijn constant so that the zero element can be the bitwidth.
		gsocshubham: I have added the above check!
		}
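The zero-input handling discussed in this thread can be illustrated in plain C++. This is only a sketch under stated assumptions, not the SelectionDAG code from the patch: the table is the byte sequence from the `.ascii` constant-pool entry in the updated tests, 0x077CB531 is the constant from the `.long 125613361` entry, and the function names `cttzZeroUndef` and `cttz` are hypothetical. It shows why the raw lookup alone is only suitable where the result for zero is undefined, and how a select on `x == 0` yields the bit width for the defined-at-zero form.

```cpp
#include <cstdint>

// Byte table as emitted in the .ascii constant-pool entry (decimal values).
static const std::uint8_t kCttzTable[32] = {
    0,  1,  28, 2,  29, 14, 24, 3,  30, 22, 20, 15, 25, 17, 4,  8,
    31, 27, 13, 23, 21, 19, 16, 7,  26, 12, 18, 6,  11, 5,  10, 9};

// Raw lookup. For X == 0 the index is 0 and the table yields 0, which is
// not a valid trailing-zero count, so zero must be treated as undefined.
std::uint32_t cttzZeroUndef(std::uint32_t X) {
  return kCttzTable[((X & (0u - X)) * 0x077CB531u) >> 27];
}

// Defined-at-zero variant: select the bit width when the source is zero,
// analogous to wrapping the loaded value in a select on (Op == 0).
std::uint32_t cttz(std::uint32_t X) {
  const std::uint32_t BitWidth = 32;
  std::uint32_t Lookup = cttzZeroUndef(X);
  return X == 0 ? BitWidth : Lookup;
}
```

In this sketch `cttz(0)` yields 32, which is the value the review asks for in the `i1 false` case, while the raw lookup returns 0 for the same input.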

SDValue TargetLowering::expandCTTZ(SDNode *Node, SelectionDAG &DAG) const {		SDValue TargetLowering::expandCTTZ(SDNode *Node, SelectionDAG &DAG) const {
SDLoc dl(Node);		SDLoc dl(Node);
EVT VT = Node->getValueType(0);		EVT VT = Node->getValueType(0);
SDValue Op = Node->getOperand(0);		SDValue Op = Node->getOperand(0);
unsigned NumBitsPerElt = VT.getScalarSizeInBits();		unsigned NumBitsPerElt = VT.getScalarSizeInBits();

		barannikov88: This is redundant. Your code is already in the TargetLowering class. Just call `isOperationExpand` directly.
		craig.topper: You don't need this. We're in a member function of TargetLowering. DAG.getTargetLoweringInfo() is equivalent to `this`.
		gsocshubham: Got it. Removed. Is it fine now?
// If the non-ZERO_UNDEF version is supported we can use that instead.		// If the non-ZERO_UNDEF version is supported we can use that instead.
if (Node->getOpcode() == ISD::CTTZ_ZERO_UNDEF &&		if (Node->getOpcode() == ISD::CTTZ_ZERO_UNDEF &&
isOperationLegalOrCustom(ISD::CTTZ, VT))		isOperationLegalOrCustom(ISD::CTTZ, VT))
return DAG.getNode(ISD::CTTZ, dl, VT, Op);		return DAG.getNode(ISD::CTTZ, dl, VT, Op);

// If the ZERO_UNDEF version is supported use that and handle the zero case.		// If the ZERO_UNDEF version is supported use that and handle the zero case.
if (isOperationLegalOrCustom(ISD::CTTZ_ZERO_UNDEF, VT)) {		if (isOperationLegalOrCustom(ISD::CTTZ_ZERO_UNDEF, VT)) {
EVT SetCCVT =		EVT SetCCVT =
Show All 11 Lines	if (VT.isVector() && (!isPowerOf2_32(NumBitsPerElt) ||
(!isOperationLegalOrCustom(ISD::CTPOP, VT) &&		(!isOperationLegalOrCustom(ISD::CTPOP, VT) &&
!isOperationLegalOrCustom(ISD::CTLZ, VT) &&		!isOperationLegalOrCustom(ISD::CTLZ, VT) &&
!canExpandVectorCTPOP(*this, VT)) ||		!canExpandVectorCTPOP(*this, VT)) ||
!isOperationLegalOrCustom(ISD::SUB, VT) ||		!isOperationLegalOrCustom(ISD::SUB, VT) ||
!isOperationLegalOrCustomOrPromote(ISD::AND, VT) ||		!isOperationLegalOrCustomOrPromote(ISD::AND, VT) ||
!isOperationLegalOrCustomOrPromote(ISD::XOR, VT)))		!isOperationLegalOrCustomOrPromote(ISD::XOR, VT)))
return SDValue();		return SDValue();

		// Emit Table Lookup if ISD::CTLZ and ISD::CTPOP are not legal.
		if (!VT.isVector() && isOperationExpand(ISD::CTPOP, VT) &&
		craig.topper: Should this be checking CTTZ_ZERO_UNDEF? The zero case is not handled correctly by the table lookup. For CTTZ we need a select. CodeGenPrepare rewrites llvm.cttz(i32 %x, i1 false) into a branch around llvm.cttz(i32 %x, i1 true) on some targets, so the difference might be hard to test.
		gsocshubham: Can you point me to a target and test where the above scenario occurs? I will update it accordingly.
		craig.topper: Compiling llvm.cttz.i32 for riscv32 with -O0 should do it, I think. -O0 will disable CodeGenPrepare.
		gsocshubham: @craig.topper - What is the desired output value from the table when x=0?
		    define i32 @f(i32 %x) {
		    entry:
		    %0 = call i32 @llvm.cttz.i32(i32 %x, i1 true)
		    ret i32 %0
		    }
		craig.topper: With `i1 true` there is no desired output for 0. I'm concerned about `i1 false`, which needs to produce 32.
		gsocshubham: Regarding this check: `!VT.isVector() && TLI.isOperationExpand(ISD::CTPOP, VT) && !isOperationLegal(ISD::CTLZ, VT)`. I am not sure whether these checks are sufficient to allow the new lowering. I currently have these 3 failures:
		    LLVM :: CodeGen/AVR/cttz.ll
		    LLVM :: CodeGen/RISCV/ctlz-cttz-ctpop.ll
		    LLVM :: CodeGen/RISCV/rv64zbb.ll
		When I update these tests using `utils/update_llc_test_checks.py`, it crashes. Does that mean the new lowering is generating wrong code for `AVR` and `RISCV`? Or am I lowering cases that should not be lowered in the first place, so that more checks are needed in the `if()` block above?
		!isOperationLegal(ISD::CTLZ, VT))
		if (SDValue V = CTTZTableLookup(Node, DAG, dl, VT, Op, NumBitsPerElt))
		craig.topper: This needs to be
		    if (SDValue V = CTTZTableLookup(Node, DAG, dl, VT, Op, NumBitsPerElt))
		      return V;
		because there is an early out in CTTZTableLookup for types other than i32/i64.
		craig.topper: That will fix the AVR failure.
		gsocshubham: Updated as above.
		return V;
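The early-out pattern requested above (try the table lookup, and fall through to the generic expansion when the helper declines) can be sketched in standalone C++. This is a hedged illustration, not LLVM code: `std::optional` stands in for an empty `SDValue`, the helper `tableLookupCttz` is hypothetical and in this sketch handles only 32-bit values, and the fallback uses the `popcount(~x & (x - 1))` identity quoted in the existing comment.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical stand-in for the table-lookup helper: it declines (returns
// std::nullopt) for any width it does not handle, here everything but 32.
std::optional<unsigned> tableLookupCttz(std::uint64_t X, unsigned BitWidth) {
  if (BitWidth != 32)
    return std::nullopt; // early out, caller must use another expansion
  static const std::uint8_t Table[32] = {
      0,  1,  28, 2,  29, 14, 24, 3,  30, 22, 20, 15, 25, 17, 4,  8,
      31, 27, 13, 23, 21, 19, 16, 7,  26, 12, 18, 6,  11, 5,  10, 9};
  std::uint32_t V = static_cast<std::uint32_t>(X);
  return static_cast<unsigned>(Table[((V & (0u - V)) * 0x077CB531u) >> 27]);
}

// Portable population count (clears the lowest set bit each iteration).
unsigned popcount64(std::uint64_t X) {
  unsigned N = 0;
  while (X) {
    X &= X - 1;
    ++N;
  }
  return N;
}

// Expansion for widths 1..64: use the lookup only if the helper produced a
// value, otherwise fall back to popcount(~x & (x - 1)).
unsigned expandCttz(std::uint64_t X, unsigned BitWidth) {
  std::uint64_t Mask = BitWidth == 64 ? ~0ull : ((1ull << BitWidth) - 1);
  X &= Mask;
  if (X == 0)
    return BitWidth; // zero input: defined result is the bit width
  if (std::optional<unsigned> V = tableLookupCttz(X, BitWidth))
    return *V;
  return popcount64(~X & (X - 1) & Mask);
}
```

Testing the returned value before using it is what keeps widths the helper rejects, such as the 16-bit case here, on the generic path instead of returning an empty result.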

// for now, we use: { return popcount(~x & (x - 1)); }		// for now, we use: { return popcount(~x & (x - 1)); }
// unless the target has ctlz but not ctpop, in which case we use:		// unless the target has ctlz but not ctpop, in which case we use:
// { return 32 - nlz(~x & (x-1)); }		// { return 32 - nlz(~x & (x-1)); }
// Ref: "Hacker's Delight" by Henry Warren		// Ref: "Hacker's Delight" by Henry Warren
SDValue Tmp = DAG.getNode(		SDValue Tmp = DAG.getNode(
ISD::AND, dl, VT, DAG.getNOT(dl, Op, VT),		ISD::AND, dl, VT, DAG.getNOT(dl, Op, VT),
DAG.getNode(ISD::SUB, dl, VT, Op, DAG.getConstant(1, dl, VT)));		DAG.getNode(ISD::SUB, dl, VT, Op, DAG.getConstant(1, dl, VT)));

// If ISD::CTLZ is legal and CTPOP isn't, then do that instead.		// If ISD::CTLZ is legal and CTPOP isn't, then do that instead.
if (isOperationLegal(ISD::CTLZ, VT) && !isOperationLegal(ISD::CTPOP, VT)) {		if (isOperationLegal(ISD::CTLZ, VT) && !isOperationLegal(ISD::CTPOP, VT)) {
return DAG.getNode(ISD::SUB, dl, VT, DAG.getConstant(NumBitsPerElt, dl, VT),		return DAG.getNode(ISD::SUB, dl, VT, DAG.getConstant(NumBitsPerElt, dl, VT),
DAG.getNode(ISD::CTLZ, dl, VT, Tmp));		DAG.getNode(ISD::CTLZ, dl, VT, Tmp));
}		}

return DAG.getNode(ISD::CTPOP, dl, VT, Tmp);		return DAG.getNode(ISD::CTPOP, dl, VT, Tmp);
		barannikov88: I think your code should be moved a little higher, before `SDValue Tmp =`. Otherwise the `Tmp` node will be unused if your optimization has been applied. The check will probably need some corrections. Note the comments above; they may help deduce the correct condition. It would also be better to extract your implementation into a separate function.
		gsocshubham: Moved the code above `SDValue Tmp =` and put it into a separate function. I am using the following check to do the optimization:
		    if (NumBitsPerElt == 32 && !VT.isVector() && TLI.isOperationExpand(ISD::CTPOP, VT) && !isOperationLegal(ISD::CTLZ, VT))
}		}

SDValue TargetLowering::expandABS(SDNode *N, SelectionDAG &DAG,		SDValue TargetLowering::expandABS(SDNode *N, SelectionDAG &DAG,
		barannikov88: `getRawData` is a very low-level method; you should be able to easily avoid it.
		gsocshubham: Done. Now I am using `getZExtValue()`.
bool IsNegative) const {		bool IsNegative) const {
SDLoc dl(N);		SDLoc dl(N);
EVT VT = N->getValueType(0);		EVT VT = N->getValueType(0);
		barannikov88: Be sure to clang-format your changes before submitting a patch. This is required both by the coding style and by the contribution guidelines.
		gsocshubham: I have formatted the code using `$INSTALL/bin/clang-format -style=LLVM TargetLowering.cpp -i --lines=7866:7872`.
EVT ShVT = getShiftAmountTy(VT, DAG.getDataLayout());		EVT ShVT = getShiftAmountTy(VT, DAG.getDataLayout());
SDValue Op = N->getOperand(0);		SDValue Op = N->getOperand(0);
		barannikov88: Ditto, don't use `getRawData`.
		gsocshubham: Updated to `getZExtValue()`.

// abs(x) -> smax(x,sub(0,x))		// abs(x) -> smax(x,sub(0,x))
if (!IsNegative && isOperationLegal(ISD::SUB, VT) &&		if (!IsNegative && isOperationLegal(ISD::SUB, VT) &&
isOperationLegal(ISD::SMAX, VT)) {		isOperationLegal(ISD::SMAX, VT)) {
		barannikov88: There are always exactly 32 elements, so you can use a plain C array and avoid dynamic memory allocation altogether. This is a bit of hard-coding though, so a SmallVector with a minimum size of 32 might be better.
		gsocshubham: I am now using a plain C array, which got rid of the dynamic memory allocation error.
SDValue Zero = DAG.getConstant(0, dl, VT);		SDValue Zero = DAG.getConstant(0, dl, VT);
return DAG.getNode(ISD::SMAX, dl, VT, Op,		return DAG.getNode(ISD::SMAX, dl, VT, Op,
DAG.getNode(ISD::SUB, dl, VT, Zero, Op));		DAG.getNode(ISD::SUB, dl, VT, Zero, Op));
}		}
		barannikov88: Ditto here and on the next line. `APInt Lshr = DeBruijn.rotl(i)` should do.
		gsocshubham: Done.

// abs(x) -> umin(x,sub(0,x))		// abs(x) -> umin(x,sub(0,x))
		gsocshubham: Regarding this line: `RshrArr[Rshr] = i;` In debug mode, when I run `SPARC/cttz.ll`, which is present in this patch, I get a segmentation fault, but in release mode it works fine. Why is there a difference in behaviour?
		barannikov88: Most probably there is a bug in your code. Most probably you haven't enabled assertions in release mode (-DLLVM_ENABLE_ASSERTIONS=ON is the default in debug mode). You should be able to find some hints in the printed backtrace, but first make sure that the `llvm-symbolizer` target is built.
		gsocshubham: Thanks. `-DLLVM_ENABLE_ASSERTIONS=ON` helped.
if (!IsNegative && isOperationLegal(ISD::SUB, VT) &&		if (!IsNegative && isOperationLegal(ISD::SUB, VT) &&
isOperationLegal(ISD::UMIN, VT)) {		isOperationLegal(ISD::UMIN, VT)) {
SDValue Zero = DAG.getConstant(0, dl, VT);		SDValue Zero = DAG.getConstant(0, dl, VT);
Op = DAG.getFreeze(Op);		Op = DAG.getFreeze(Op);
return DAG.getNode(ISD::UMIN, dl, VT, Op,		return DAG.getNode(ISD::UMIN, dl, VT, Op,
DAG.getNode(ISD::SUB, dl, VT, Zero, Op));		DAG.getNode(ISD::SUB, dl, VT, Zero, Op));
}		}

// 0 - abs(x) -> smin(x, sub(0,x))		// 0 - abs(x) -> smin(x, sub(0,x))
if (IsNegative && isOperationLegal(ISD::SUB, VT) &&		if (IsNegative && isOperationLegal(ISD::SUB, VT) &&
isOperationLegal(ISD::SMIN, VT)) {		isOperationLegal(ISD::SMIN, VT)) {
Op = DAG.getFreeze(Op);		Op = DAG.getFreeze(Op);
SDValue Zero = DAG.getConstant(0, dl, VT);		SDValue Zero = DAG.getConstant(0, dl, VT);
return DAG.getNode(ISD::SMIN, dl, VT, Op,		return DAG.getNode(ISD::SMIN, dl, VT, Op,
		barannikov88: You should use the alignment requirement of the array (i.e. CA), not of its element. They may differ.
		gsocshubham: From the assembly dump of SPARC/cttz.ll, I am not sure whether to use the array element alignment or the array alignment. If I use the array alignment `CA`, I get the assembly below, as opposed to the `SPARC/cttz.ll` assembly when the array element alignment is used. What do you think? Should I update from `CPIdx` to `CA`?
		    f: ! @f
		    .cfi_startproc
		    ! %bb.0: ! %entry
		    mov %o0, %o1
		    cmp %o0, 0
		    be .LBB0_2
		    mov %g0, %o0
		    ! %bb.1: ! %entry
		    sub %o0, %o1, %o0
		    and %o1, %o0, %o0
		    sethi 122669, %o1
		    or %o1, 305, %o1
		    smul %o0, %o1, %o0
		    srl %o0, 27, %o0
		    sethi %hi(.LCPI0_0), %o1
		    add %o1, %lo(.LCPI0_0), %o1
		    add %o1, %o0, %o2
		    ldub [%o2+2], %o3
		    ldub [%o2+3], %o4
		    ldub [%o1+%o0], %o0
		    ldub [%o2+1], %o1
		    sll %o3, 8, %o2
		    or %o2, %o4, %o2
		    sll %o0, 8, %o0
		    or %o0, %o1, %o0
		    sll %o0, 16, %o0
		    or %o0, %o2, %o0
		barannikov88: I meant you should call `TD.getPrefTypeAlign(Elts->getType())` instead of `TD.getPrefTypeAlign(Elts[0]->getType())`. Is the above assembly a result of such a change, or did you do something different?
		gsocshubham: I did something different, but now it seems fine. We no longer have `Elts`, so I am taking the alignment of an element from `RshrArr`. Is that fine?
DAG.getNode(ISD::SUB, dl, VT, Zero, Op));		DAG.getNode(ISD::SUB, dl, VT, Zero, Op));
}		}

// Only expand vector types if we have the appropriate vector operations.		// Only expand vector types if we have the appropriate vector operations.
if (VT.isVector() &&		if (VT.isVector() &&
(!isOperationLegalOrCustom(ISD::SRA, VT) ||		(!isOperationLegalOrCustom(ISD::SRA, VT) ||
(!IsNegative && !isOperationLegalOrCustom(ISD::ADD, VT)) ||		(!IsNegative && !isOperationLegalOrCustom(ISD::ADD, VT)) ||
(IsNegative && !isOperationLegalOrCustom(ISD::SUB, VT)) ||		(IsNegative && !isOperationLegalOrCustom(ISD::SUB, VT)) ||
▲ Show 20 Lines • Show All 1,796 Lines • Show Last 20 Lines

llvm/test/CodeGen/ARM/cttz.ll

	Show All 17 Lines
	; CHECK-NEXT: orr r0, r0, #256			; CHECK-NEXT: orr r0, r0, #256
	; CHECK-NEXT: rbit r0, r0			; CHECK-NEXT: rbit r0, r0
	; CHECK-NEXT: clz r0, r0			; CHECK-NEXT: clz r0, r0
	; CHECK-NEXT: bx lr			; CHECK-NEXT: bx lr
	;			;
	; CHECK-THUMB-LABEL: test_i8:			; CHECK-THUMB-LABEL: test_i8:
	; CHECK-THUMB: @ %bb.0:			; CHECK-THUMB: @ %bb.0:
	; CHECK-THUMB-NEXT: lsls r1, r0, #24			; CHECK-THUMB-NEXT: lsls r1, r0, #24
	; CHECK-THUMB-NEXT: beq .LBB0_2			; CHECK-THUMB-NEXT: beq .LBB0_3
	; CHECK-THUMB-NEXT: @ %bb.1: @ %cond.false			; CHECK-THUMB-NEXT: @ %bb.1: @ %cond.false
	; CHECK-THUMB-NEXT: subs r1, r0, #1			; CHECK-THUMB-NEXT: cmp r0, #0
	; CHECK-THUMB-NEXT: bics r1, r0			; CHECK-THUMB-NEXT: beq .LBB0_4
	; CHECK-THUMB-NEXT: lsrs r0, r1, #1			; CHECK-THUMB-NEXT: @ %bb.2: @ %cond.false
	; CHECK-THUMB-NEXT: ldr r2, .LCPI0_0			; CHECK-THUMB-NEXT: rsbs r1, r0, #0
	; CHECK-THUMB-NEXT: ands r2, r0
	; CHECK-THUMB-NEXT: subs r0, r1, r2
	; CHECK-THUMB-NEXT: ldr r1, .LCPI0_1
	; CHECK-THUMB-NEXT: lsrs r2, r0, #2
	; CHECK-THUMB-NEXT: ands r0, r1
	; CHECK-THUMB-NEXT: ands r2, r1
	; CHECK-THUMB-NEXT: adds r0, r0, r2
	; CHECK-THUMB-NEXT: lsrs r1, r0, #4
	; CHECK-THUMB-NEXT: adds r0, r0, r1
	; CHECK-THUMB-NEXT: ldr r1, .LCPI0_2
	; CHECK-THUMB-NEXT: ands r1, r0			; CHECK-THUMB-NEXT: ands r1, r0
	; CHECK-THUMB-NEXT: ldr r0, .LCPI0_3			; CHECK-THUMB-NEXT: ldr r0, .LCPI0_0
	; CHECK-THUMB-NEXT: muls r0, r1, r0			; CHECK-THUMB-NEXT: muls r0, r1, r0
	; CHECK-THUMB-NEXT: lsrs r0, r0, #24			; CHECK-THUMB-NEXT: lsrs r0, r0, #27
				; CHECK-THUMB-NEXT: adr r1, .LCPI0_1
				; CHECK-THUMB-NEXT: ldrb r0, [r1, r0]
	; CHECK-THUMB-NEXT: bx lr			; CHECK-THUMB-NEXT: bx lr
	; CHECK-THUMB-NEXT: .LBB0_2:			; CHECK-THUMB-NEXT: .LBB0_3:
	; CHECK-THUMB-NEXT: movs r0, #8			; CHECK-THUMB-NEXT: movs r0, #8
	; CHECK-THUMB-NEXT: bx lr			; CHECK-THUMB-NEXT: bx lr
				; CHECK-THUMB-NEXT: .LBB0_4:
				; CHECK-THUMB-NEXT: movs r0, #32
				; CHECK-THUMB-NEXT: bx lr
	; CHECK-THUMB-NEXT: .p2align 2			; CHECK-THUMB-NEXT: .p2align 2
	; CHECK-THUMB-NEXT: @ %bb.3:			; CHECK-THUMB-NEXT: @ %bb.5:
	; CHECK-THUMB-NEXT: .LCPI0_0:			; CHECK-THUMB-NEXT: .LCPI0_0:
	; CHECK-THUMB-NEXT: .long 1431655765 @ 0x55555555			; CHECK-THUMB-NEXT: .long 125613361 @ 0x77cb531
	; CHECK-THUMB-NEXT: .LCPI0_1:			; CHECK-THUMB-NEXT: .LCPI0_1:
	; CHECK-THUMB-NEXT: .long 858993459 @ 0x33333333			; CHECK-THUMB-NEXT: .ascii "\000\001\034\002\035\016\030\003\036\026\024\017\031\021\004\b\037\033\r\027\025\023\020\007\032\f\022\006\013\005\n\t"
	; CHECK-THUMB-NEXT: .LCPI0_2:
	; CHECK-THUMB-NEXT: .long 252645135 @ 0xf0f0f0f
	; CHECK-THUMB-NEXT: .LCPI0_3:
	; CHECK-THUMB-NEXT: .long 16843009 @ 0x1010101
	%tmp = call i8 @llvm.cttz.i8(i8 %a, i1 false)			%tmp = call i8 @llvm.cttz.i8(i8 %a, i1 false)
	ret i8 %tmp			ret i8 %tmp
	}			}

	define i16 @test_i16(i16 %a) {			define i16 @test_i16(i16 %a) {
	; CHECK-LABEL: test_i16:			; CHECK-LABEL: test_i16:
	; CHECK: @ %bb.0:			; CHECK: @ %bb.0:
	; CHECK-NEXT: orr r0, r0, #65536			; CHECK-NEXT: orr r0, r0, #65536
	; CHECK-NEXT: rbit r0, r0			; CHECK-NEXT: rbit r0, r0
	; CHECK-NEXT: clz r0, r0			; CHECK-NEXT: clz r0, r0
	; CHECK-NEXT: bx lr			; CHECK-NEXT: bx lr
	;			;
	; CHECK-THUMB-LABEL: test_i16:			; CHECK-THUMB-LABEL: test_i16:
	; CHECK-THUMB: @ %bb.0:			; CHECK-THUMB: @ %bb.0:
	; CHECK-THUMB-NEXT: lsls r1, r0, #16			; CHECK-THUMB-NEXT: lsls r1, r0, #16
	; CHECK-THUMB-NEXT: beq .LBB1_2			; CHECK-THUMB-NEXT: beq .LBB1_3
	; CHECK-THUMB-NEXT: @ %bb.1: @ %cond.false			; CHECK-THUMB-NEXT: @ %bb.1: @ %cond.false
	; CHECK-THUMB-NEXT: subs r1, r0, #1			; CHECK-THUMB-NEXT: cmp r0, #0
	; CHECK-THUMB-NEXT: bics r1, r0			; CHECK-THUMB-NEXT: beq .LBB1_4
	; CHECK-THUMB-NEXT: lsrs r0, r1, #1			; CHECK-THUMB-NEXT: @ %bb.2: @ %cond.false
	; CHECK-THUMB-NEXT: ldr r2, .LCPI1_0			; CHECK-THUMB-NEXT: rsbs r1, r0, #0
	; CHECK-THUMB-NEXT: ands r2, r0
	; CHECK-THUMB-NEXT: subs r0, r1, r2
	; CHECK-THUMB-NEXT: ldr r1, .LCPI1_1
	; CHECK-THUMB-NEXT: lsrs r2, r0, #2
	; CHECK-THUMB-NEXT: ands r0, r1
	; CHECK-THUMB-NEXT: ands r2, r1
	; CHECK-THUMB-NEXT: adds r0, r0, r2
	; CHECK-THUMB-NEXT: lsrs r1, r0, #4
	; CHECK-THUMB-NEXT: adds r0, r0, r1
	; CHECK-THUMB-NEXT: ldr r1, .LCPI1_2
	; CHECK-THUMB-NEXT: ands r1, r0			; CHECK-THUMB-NEXT: ands r1, r0
	; CHECK-THUMB-NEXT: ldr r0, .LCPI1_3			; CHECK-THUMB-NEXT: ldr r0, .LCPI1_0
	; CHECK-THUMB-NEXT: muls r0, r1, r0			; CHECK-THUMB-NEXT: muls r0, r1, r0
	; CHECK-THUMB-NEXT: lsrs r0, r0, #24			; CHECK-THUMB-NEXT: lsrs r0, r0, #27
				; CHECK-THUMB-NEXT: adr r1, .LCPI1_1
				; CHECK-THUMB-NEXT: ldrb r0, [r1, r0]
	; CHECK-THUMB-NEXT: bx lr			; CHECK-THUMB-NEXT: bx lr
	; CHECK-THUMB-NEXT: .LBB1_2:			; CHECK-THUMB-NEXT: .LBB1_3:
	; CHECK-THUMB-NEXT: movs r0, #16			; CHECK-THUMB-NEXT: movs r0, #16
	; CHECK-THUMB-NEXT: bx lr			; CHECK-THUMB-NEXT: bx lr
				; CHECK-THUMB-NEXT: .LBB1_4:
				; CHECK-THUMB-NEXT: movs r0, #32
				; CHECK-THUMB-NEXT: bx lr
	; CHECK-THUMB-NEXT: .p2align 2			; CHECK-THUMB-NEXT: .p2align 2
	; CHECK-THUMB-NEXT: @ %bb.3:			; CHECK-THUMB-NEXT: @ %bb.5:
	; CHECK-THUMB-NEXT: .LCPI1_0:			; CHECK-THUMB-NEXT: .LCPI1_0:
	; CHECK-THUMB-NEXT: .long 1431655765 @ 0x55555555			; CHECK-THUMB-NEXT: .long 125613361 @ 0x77cb531
	; CHECK-THUMB-NEXT: .LCPI1_1:			; CHECK-THUMB-NEXT: .LCPI1_1:
	; CHECK-THUMB-NEXT: .long 858993459 @ 0x33333333			; CHECK-THUMB-NEXT: .ascii "\000\001\034\002\035\016\030\003\036\026\024\017\031\021\004\b\037\033\r\027\025\023\020\007\032\f\022\006\013\005\n\t"
	; CHECK-THUMB-NEXT: .LCPI1_2:
	; CHECK-THUMB-NEXT: .long 252645135 @ 0xf0f0f0f
	; CHECK-THUMB-NEXT: .LCPI1_3:
	; CHECK-THUMB-NEXT: .long 16843009 @ 0x1010101
	%tmp = call i16 @llvm.cttz.i16(i16 %a, i1 false)			%tmp = call i16 @llvm.cttz.i16(i16 %a, i1 false)
	ret i16 %tmp			ret i16 %tmp
	}			}

	define i32 @test_i32(i32 %a) {			define i32 @test_i32(i32 %a) {
	; CHECK-LABEL: test_i32:			; CHECK-LABEL: test_i32:
	; CHECK: @ %bb.0:			; CHECK: @ %bb.0:
	; CHECK-NEXT: rbit r0, r0			; CHECK-NEXT: rbit r0, r0
	; CHECK-NEXT: clz r0, r0			; CHECK-NEXT: clz r0, r0
	; CHECK-NEXT: bx lr			; CHECK-NEXT: bx lr
	;			;
	; CHECK-THUMB-LABEL: test_i32:			; CHECK-THUMB-LABEL: test_i32:
	; CHECK-THUMB: @ %bb.0:			; CHECK-THUMB: @ %bb.0:
	; CHECK-THUMB-NEXT: cmp r0, #0			; CHECK-THUMB-NEXT: cmp r0, #0
	; CHECK-THUMB-NEXT: beq .LBB2_2			; CHECK-THUMB-NEXT: beq .LBB2_3
	; CHECK-THUMB-NEXT: @ %bb.1: @ %cond.false			; CHECK-THUMB-NEXT: @ %bb.1: @ %cond.false
	; CHECK-THUMB-NEXT: subs r1, r0, #1			; CHECK-THUMB-NEXT: cmp r0, #0
	; CHECK-THUMB-NEXT: bics r1, r0			; CHECK-THUMB-NEXT: beq .LBB2_3
	; CHECK-THUMB-NEXT: lsrs r0, r1, #1			; CHECK-THUMB-NEXT: @ %bb.2: @ %cond.false
	; CHECK-THUMB-NEXT: ldr r2, .LCPI2_0			; CHECK-THUMB-NEXT: rsbs r1, r0, #0
	; CHECK-THUMB-NEXT: ands r2, r0
	; CHECK-THUMB-NEXT: subs r0, r1, r2
	; CHECK-THUMB-NEXT: ldr r1, .LCPI2_1
	; CHECK-THUMB-NEXT: lsrs r2, r0, #2
	; CHECK-THUMB-NEXT: ands r0, r1
	; CHECK-THUMB-NEXT: ands r2, r1
	; CHECK-THUMB-NEXT: adds r0, r0, r2
	; CHECK-THUMB-NEXT: lsrs r1, r0, #4
	; CHECK-THUMB-NEXT: adds r0, r0, r1
	; CHECK-THUMB-NEXT: ldr r1, .LCPI2_2
	; CHECK-THUMB-NEXT: ands r1, r0			; CHECK-THUMB-NEXT: ands r1, r0
	; CHECK-THUMB-NEXT: ldr r0, .LCPI2_3			; CHECK-THUMB-NEXT: ldr r0, .LCPI2_0
	; CHECK-THUMB-NEXT: muls r0, r1, r0			; CHECK-THUMB-NEXT: muls r0, r1, r0
	; CHECK-THUMB-NEXT: lsrs r0, r0, #24			; CHECK-THUMB-NEXT: lsrs r0, r0, #27
				; CHECK-THUMB-NEXT: adr r1, .LCPI2_1
				; CHECK-THUMB-NEXT: ldrb r0, [r1, r0]
	; CHECK-THUMB-NEXT: bx lr			; CHECK-THUMB-NEXT: bx lr
	; CHECK-THUMB-NEXT: .LBB2_2:			; CHECK-THUMB-NEXT: .LBB2_3:
	; CHECK-THUMB-NEXT: movs r0, #32			; CHECK-THUMB-NEXT: movs r0, #32
	; CHECK-THUMB-NEXT: bx lr			; CHECK-THUMB-NEXT: bx lr
	; CHECK-THUMB-NEXT: .p2align 2			; CHECK-THUMB-NEXT: .p2align 2
	; CHECK-THUMB-NEXT: @ %bb.3:			; CHECK-THUMB-NEXT: @ %bb.4:
	; CHECK-THUMB-NEXT: .LCPI2_0:			; CHECK-THUMB-NEXT: .LCPI2_0:
	; CHECK-THUMB-NEXT: .long 1431655765 @ 0x55555555			; CHECK-THUMB-NEXT: .long 125613361 @ 0x77cb531
	; CHECK-THUMB-NEXT: .LCPI2_1:			; CHECK-THUMB-NEXT: .LCPI2_1:
	; CHECK-THUMB-NEXT: .long 858993459 @ 0x33333333			; CHECK-THUMB-NEXT: .ascii "\000\001\034\002\035\016\030\003\036\026\024\017\031\021\004\b\037\033\r\027\025\023\020\007\032\f\022\006\013\005\n\t"
	; CHECK-THUMB-NEXT: .LCPI2_2:
	; CHECK-THUMB-NEXT: .long 252645135 @ 0xf0f0f0f
	; CHECK-THUMB-NEXT: .LCPI2_3:
	; CHECK-THUMB-NEXT: .long 16843009 @ 0x1010101
	%tmp = call i32 @llvm.cttz.i32(i32 %a, i1 false)			%tmp = call i32 @llvm.cttz.i32(i32 %a, i1 false)
	ret i32 %tmp			ret i32 %tmp
	}			}

	define i64 @test_i64(i64 %a) {			define i64 @test_i64(i64 %a) {
	; CHECK-LABEL: test_i64:			; CHECK-LABEL: test_i64:
	; CHECK: @ %bb.0:			; CHECK: @ %bb.0:
	; CHECK-NEXT: rbit r1, r1			; CHECK-NEXT: rbit r1, r1
	; CHECK-NEXT: rbit r2, r0			; CHECK-NEXT: rbit r2, r0
	; CHECK-NEXT: clz r1, r1			; CHECK-NEXT: clz r1, r1
	; CHECK-NEXT: cmp r0, #0			; CHECK-NEXT: cmp r0, #0
	; CHECK-NEXT: add r1, r1, #32			; CHECK-NEXT: add r1, r1, #32
	; CHECK-NEXT: clzne r1, r2			; CHECK-NEXT: clzne r1, r2
	; CHECK-NEXT: mov r0, r1			; CHECK-NEXT: mov r0, r1
	; CHECK-NEXT: mov r1, #0			; CHECK-NEXT: mov r1, #0
	; CHECK-NEXT: bx lr			; CHECK-NEXT: bx lr
	;			;
	; CHECK-THUMB-LABEL: test_i64:			; CHECK-THUMB-LABEL: test_i64:
	; CHECK-THUMB: @ %bb.0:			; CHECK-THUMB: @ %bb.0:
	; CHECK-THUMB-NEXT: .save {r4, r5, r7, lr}			; CHECK-THUMB-NEXT: .save {r4, r5, r7, lr}
	; CHECK-THUMB-NEXT: push {r4, r5, r7, lr}			; CHECK-THUMB-NEXT: push {r4, r5, r7, lr}
	; CHECK-THUMB-NEXT: ldr r5, .LCPI3_0			; CHECK-THUMB-NEXT: ldr r5, .LCPI3_0
	; CHECK-THUMB-NEXT: ldr r4, .LCPI3_1			; CHECK-THUMB-NEXT: adr r4, .LCPI3_1
	; CHECK-THUMB-NEXT: ldr r3, .LCPI3_2			; CHECK-THUMB-NEXT: movs r3, #32
	; CHECK-THUMB-NEXT: ldr r2, .LCPI3_3
	; CHECK-THUMB-NEXT: cmp r0, #0			; CHECK-THUMB-NEXT: cmp r0, #0
	; CHECK-THUMB-NEXT: bne .LBB3_2			; CHECK-THUMB-NEXT: mov r2, r3
				; CHECK-THUMB-NEXT: bne .LBB3_5
	; CHECK-THUMB-NEXT: @ %bb.1:			; CHECK-THUMB-NEXT: @ %bb.1:
	; CHECK-THUMB-NEXT: subs r0, r1, #1			; CHECK-THUMB-NEXT: cmp r1, #0
	; CHECK-THUMB-NEXT: bics r0, r1			; CHECK-THUMB-NEXT: bne .LBB3_6
	; CHECK-THUMB-NEXT: lsrs r1, r0, #1
	; CHECK-THUMB-NEXT: ands r1, r5
	; CHECK-THUMB-NEXT: subs r0, r0, r1
	; CHECK-THUMB-NEXT: lsrs r1, r0, #2
	; CHECK-THUMB-NEXT: ands r0, r4
	; CHECK-THUMB-NEXT: ands r1, r4
	; CHECK-THUMB-NEXT: adds r0, r0, r1
	; CHECK-THUMB-NEXT: lsrs r1, r0, #4
	; CHECK-THUMB-NEXT: adds r0, r0, r1
	; CHECK-THUMB-NEXT: ands r0, r3
	; CHECK-THUMB-NEXT: muls r2, r0, r2
	; CHECK-THUMB-NEXT: lsrs r0, r2, #24
	; CHECK-THUMB-NEXT: adds r0, #32
	; CHECK-THUMB-NEXT: movs r1, #0
	; CHECK-THUMB-NEXT: pop {r4, r5, r7, pc}
	; CHECK-THUMB-NEXT: .LBB3_2:			; CHECK-THUMB-NEXT: .LBB3_2:
	; CHECK-THUMB-NEXT: subs r1, r0, #1			; CHECK-THUMB-NEXT: cmp r0, #0
	; CHECK-THUMB-NEXT: bics r1, r0			; CHECK-THUMB-NEXT: bne .LBB3_4
	; CHECK-THUMB-NEXT: lsrs r0, r1, #1			; CHECK-THUMB-NEXT: .LBB3_3:
	; CHECK-THUMB-NEXT: ands r0, r5			; CHECK-THUMB-NEXT: adds r3, #32
	; CHECK-THUMB-NEXT: subs r0, r1, r0			; CHECK-THUMB-NEXT: mov r2, r3
	; CHECK-THUMB-NEXT: lsrs r1, r0, #2			; CHECK-THUMB-NEXT: .LBB3_4:
	; CHECK-THUMB-NEXT: ands r0, r4
	; CHECK-THUMB-NEXT: ands r1, r4
	; CHECK-THUMB-NEXT: adds r0, r0, r1
	; CHECK-THUMB-NEXT: lsrs r1, r0, #4
	; CHECK-THUMB-NEXT: adds r0, r0, r1
	; CHECK-THUMB-NEXT: ands r0, r3
	; CHECK-THUMB-NEXT: muls r2, r0, r2
	; CHECK-THUMB-NEXT: lsrs r0, r2, #24
	; CHECK-THUMB-NEXT: movs r1, #0			; CHECK-THUMB-NEXT: movs r1, #0
				; CHECK-THUMB-NEXT: mov r0, r2
	; CHECK-THUMB-NEXT: pop {r4, r5, r7, pc}			; CHECK-THUMB-NEXT: pop {r4, r5, r7, pc}
				; CHECK-THUMB-NEXT: .LBB3_5:
				; CHECK-THUMB-NEXT: rsbs r2, r0, #0
				; CHECK-THUMB-NEXT: ands r2, r0
				; CHECK-THUMB-NEXT: muls r2, r5, r2
				; CHECK-THUMB-NEXT: lsrs r2, r2, #27
				; CHECK-THUMB-NEXT: ldrb r2, [r4, r2]
				; CHECK-THUMB-NEXT: cmp r1, #0
				; CHECK-THUMB-NEXT: beq .LBB3_2
				; CHECK-THUMB-NEXT: .LBB3_6:
				; CHECK-THUMB-NEXT: rsbs r3, r1, #0
				; CHECK-THUMB-NEXT: ands r3, r1
				; CHECK-THUMB-NEXT: muls r5, r3, r5
				; CHECK-THUMB-NEXT: lsrs r1, r5, #27
				; CHECK-THUMB-NEXT: ldrb r3, [r4, r1]
				; CHECK-THUMB-NEXT: cmp r0, #0
				; CHECK-THUMB-NEXT: beq .LBB3_3
				; CHECK-THUMB-NEXT: b .LBB3_4
	; CHECK-THUMB-NEXT: .p2align 2			; CHECK-THUMB-NEXT: .p2align 2
	; CHECK-THUMB-NEXT: @ %bb.3:			; CHECK-THUMB-NEXT: @ %bb.7:
	; CHECK-THUMB-NEXT: .LCPI3_0:			; CHECK-THUMB-NEXT: .LCPI3_0:
	; CHECK-THUMB-NEXT: .long 1431655765 @ 0x55555555			; CHECK-THUMB-NEXT: .long 125613361 @ 0x77cb531
	; CHECK-THUMB-NEXT: .LCPI3_1:			; CHECK-THUMB-NEXT: .LCPI3_1:
	; CHECK-THUMB-NEXT: .long 858993459 @ 0x33333333			; CHECK-THUMB-NEXT: .ascii "\000\001\034\002\035\016\030\003\036\026\024\017\031\021\004\b\037\033\r\027\025\023\020\007\032\f\022\006\013\005\n\t"
	; CHECK-THUMB-NEXT: .LCPI3_2:
	; CHECK-THUMB-NEXT: .long 252645135 @ 0xf0f0f0f
	; CHECK-THUMB-NEXT: .LCPI3_3:
	; CHECK-THUMB-NEXT: .long 16843009 @ 0x1010101
	%tmp = call i64 @llvm.cttz.i64(i64 %a, i1 false)			%tmp = call i64 @llvm.cttz.i64(i64 %a, i1 false)
	ret i64 %tmp			ret i64 %tmp
	}			}
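
Note on the i64 change above: on this 32-bit target the i64 case is still split into two i32 halves, and the new Thumb code performs the table lookup once per half (`.LBB3_5` for the low word in r0, `.LBB3_6` for the high word in r1) before selecting between them. The following is a hedged C++ sketch of the selection logic as it can be read off the CHECK lines. All names are hypothetical, and the 32-bit helper repeats the multiply/shift/table sequence so that the snippet stands on its own.

```cpp
#include <cassert>
#include <cstdint>

// Same 32-entry table as the .ascii constant at .LCPI3_1.
static const uint8_t Table32[32] = {
    0,  1,  28, 2,  29, 14, 24, 3,  30, 22, 20, 15, 25, 17, 4,  8,
    31, 27, 13, 23, 21, 19, 16, 7,  26, 12, 18, 6,  11, 5,  10, 9};

// Per-half lookup; a zero half yields 32, matching "movs r3, #32".
static uint32_t cttzHalf(uint32_t X) {
  if (X == 0)
    return 32;
  uint32_t LowBit = X & (0u - X);
  return Table32[(LowBit * 0x077CB531u) >> 27];
}

// i64 result assembled from the halves: use the low half when it is
// non-zero, otherwise add 32 to the high half's count ("adds r3, #32").
uint32_t cttz64ViaHalves(uint64_t X) {
  uint32_t Lo = static_cast<uint32_t>(X);
  uint32_t Hi = static_cast<uint32_t>(X >> 32);
  return Lo != 0 ? cttzHalf(Lo) : 32 + cttzHalf(Hi);
}
```

With both halves zero the sketch returns 64, which is the value the generated code produces on the path that falls through `.LBB3_2` into `.LBB3_3`.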

	;------------------------------------------------------------------------------			;------------------------------------------------------------------------------

	define i8 @test_i8_zero_undef(i8 %a) {			define i8 @test_i8_zero_undef(i8 %a) {
	; CHECK-LABEL: test_i8_zero_undef:			; CHECK-LABEL: test_i8_zero_undef:
	; CHECK: @ %bb.0:			; CHECK: @ %bb.0:
	; CHECK-NEXT: rbit r0, r0			; CHECK-NEXT: rbit r0, r0
	; CHECK-NEXT: clz r0, r0			; CHECK-NEXT: clz r0, r0
	; CHECK-NEXT: bx lr			; CHECK-NEXT: bx lr
	;			;
	; CHECK-THUMB-LABEL: test_i8_zero_undef:			; CHECK-THUMB-LABEL: test_i8_zero_undef:
	; CHECK-THUMB: @ %bb.0:			; CHECK-THUMB: @ %bb.0:
	; CHECK-THUMB-NEXT: subs r1, r0, #1			; CHECK-THUMB-NEXT: cmp r0, #0
	; CHECK-THUMB-NEXT: bics r1, r0			; CHECK-THUMB-NEXT: beq .LBB4_2
	; CHECK-THUMB-NEXT: lsrs r0, r1, #1			; CHECK-THUMB-NEXT: @ %bb.1:
	; CHECK-THUMB-NEXT: ldr r2, .LCPI4_0			; CHECK-THUMB-NEXT: rsbs r1, r0, #0
	; CHECK-THUMB-NEXT: ands r2, r0
	; CHECK-THUMB-NEXT: subs r0, r1, r2
	; CHECK-THUMB-NEXT: ldr r1, .LCPI4_1
	; CHECK-THUMB-NEXT: lsrs r2, r0, #2
	; CHECK-THUMB-NEXT: ands r0, r1
	; CHECK-THUMB-NEXT: ands r2, r1
	; CHECK-THUMB-NEXT: adds r0, r0, r2
	; CHECK-THUMB-NEXT: lsrs r1, r0, #4
	; CHECK-THUMB-NEXT: adds r0, r0, r1
	; CHECK-THUMB-NEXT: ldr r1, .LCPI4_2
	; CHECK-THUMB-NEXT: ands r1, r0			; CHECK-THUMB-NEXT: ands r1, r0
	; CHECK-THUMB-NEXT: ldr r0, .LCPI4_3			; CHECK-THUMB-NEXT: ldr r0, .LCPI4_0
	; CHECK-THUMB-NEXT: muls r0, r1, r0			; CHECK-THUMB-NEXT: muls r0, r1, r0
	; CHECK-THUMB-NEXT: lsrs r0, r0, #24			; CHECK-THUMB-NEXT: lsrs r0, r0, #27
				; CHECK-THUMB-NEXT: adr r1, .LCPI4_1
				; CHECK-THUMB-NEXT: ldrb r0, [r1, r0]
				; CHECK-THUMB-NEXT: bx lr
				; CHECK-THUMB-NEXT: .LBB4_2:
				; CHECK-THUMB-NEXT: movs r0, #32
	; CHECK-THUMB-NEXT: bx lr			; CHECK-THUMB-NEXT: bx lr
	; CHECK-THUMB-NEXT: .p2align 2			; CHECK-THUMB-NEXT: .p2align 2
	; CHECK-THUMB-NEXT: @ %bb.1:			; CHECK-THUMB-NEXT: @ %bb.3:
	; CHECK-THUMB-NEXT: .LCPI4_0:			; CHECK-THUMB-NEXT: .LCPI4_0:
	; CHECK-THUMB-NEXT: .long 1431655765 @ 0x55555555			; CHECK-THUMB-NEXT: .long 125613361 @ 0x77cb531
	; CHECK-THUMB-NEXT: .LCPI4_1:			; CHECK-THUMB-NEXT: .LCPI4_1:
	; CHECK-THUMB-NEXT: .long 858993459 @ 0x33333333			; CHECK-THUMB-NEXT: .ascii "\000\001\034\002\035\016\030\003\036\026\024\017\031\021\004\b\037\033\r\027\025\023\020\007\032\f\022\006\013\005\n\t"
	; CHECK-THUMB-NEXT: .LCPI4_2:
	; CHECK-THUMB-NEXT: .long 252645135 @ 0xf0f0f0f
	; CHECK-THUMB-NEXT: .LCPI4_3:
	; CHECK-THUMB-NEXT: .long 16843009 @ 0x1010101
	%tmp = call i8 @llvm.cttz.i8(i8 %a, i1 true)			%tmp = call i8 @llvm.cttz.i8(i8 %a, i1 true)
	ret i8 %tmp			ret i8 %tmp
	}			}

	define i16 @test_i16_zero_undef(i16 %a) {			define i16 @test_i16_zero_undef(i16 %a) {
	; CHECK-LABEL: test_i16_zero_undef:			; CHECK-LABEL: test_i16_zero_undef:
	; CHECK: @ %bb.0:			; CHECK: @ %bb.0:
	; CHECK-NEXT: rbit r0, r0			; CHECK-NEXT: rbit r0, r0
	; CHECK-NEXT: clz r0, r0			; CHECK-NEXT: clz r0, r0
	; CHECK-NEXT: bx lr			; CHECK-NEXT: bx lr
	;			;
	; CHECK-THUMB-LABEL: test_i16_zero_undef:			; CHECK-THUMB-LABEL: test_i16_zero_undef:
	; CHECK-THUMB: @ %bb.0:			; CHECK-THUMB: @ %bb.0:
	; CHECK-THUMB-NEXT: subs r1, r0, #1			; CHECK-THUMB-NEXT: cmp r0, #0
	; CHECK-THUMB-NEXT: bics r1, r0			; CHECK-THUMB-NEXT: beq .LBB5_2
	; CHECK-THUMB-NEXT: lsrs r0, r1, #1			; CHECK-THUMB-NEXT: @ %bb.1:
	; CHECK-THUMB-NEXT: ldr r2, .LCPI5_0			; CHECK-THUMB-NEXT: rsbs r1, r0, #0
	; CHECK-THUMB-NEXT: ands r2, r0
	; CHECK-THUMB-NEXT: subs r0, r1, r2
	; CHECK-THUMB-NEXT: ldr r1, .LCPI5_1
	; CHECK-THUMB-NEXT: lsrs r2, r0, #2
	; CHECK-THUMB-NEXT: ands r0, r1
	; CHECK-THUMB-NEXT: ands r2, r1
	; CHECK-THUMB-NEXT: adds r0, r0, r2
	; CHECK-THUMB-NEXT: lsrs r1, r0, #4
	; CHECK-THUMB-NEXT: adds r0, r0, r1
	; CHECK-THUMB-NEXT: ldr r1, .LCPI5_2
	; CHECK-THUMB-NEXT: ands r1, r0			; CHECK-THUMB-NEXT: ands r1, r0
	; CHECK-THUMB-NEXT: ldr r0, .LCPI5_3			; CHECK-THUMB-NEXT: ldr r0, .LCPI5_0
	; CHECK-THUMB-NEXT: muls r0, r1, r0			; CHECK-THUMB-NEXT: muls r0, r1, r0
	; CHECK-THUMB-NEXT: lsrs r0, r0, #24			; CHECK-THUMB-NEXT: lsrs r0, r0, #27
				; CHECK-THUMB-NEXT: adr r1, .LCPI5_1
				; CHECK-THUMB-NEXT: ldrb r0, [r1, r0]
				; CHECK-THUMB-NEXT: bx lr
				; CHECK-THUMB-NEXT: .LBB5_2:
				; CHECK-THUMB-NEXT: movs r0, #32
	; CHECK-THUMB-NEXT: bx lr			; CHECK-THUMB-NEXT: bx lr
	; CHECK-THUMB-NEXT: .p2align 2			; CHECK-THUMB-NEXT: .p2align 2
	; CHECK-THUMB-NEXT: @ %bb.1:			; CHECK-THUMB-NEXT: @ %bb.3:
	; CHECK-THUMB-NEXT: .LCPI5_0:			; CHECK-THUMB-NEXT: .LCPI5_0:
	; CHECK-THUMB-NEXT: .long 1431655765 @ 0x55555555			; CHECK-THUMB-NEXT: .long 125613361 @ 0x77cb531
	; CHECK-THUMB-NEXT: .LCPI5_1:			; CHECK-THUMB-NEXT: .LCPI5_1:
	; CHECK-THUMB-NEXT: .long 858993459 @ 0x33333333			; CHECK-THUMB-NEXT: .ascii "\000\001\034\002\035\016\030\003\036\026\024\017\031\021\004\b\037\033\r\027\025\023\020\007\032\f\022\006\013\005\n\t"
	; CHECK-THUMB-NEXT: .LCPI5_2:
	; CHECK-THUMB-NEXT: .long 252645135 @ 0xf0f0f0f
	; CHECK-THUMB-NEXT: .LCPI5_3:
	; CHECK-THUMB-NEXT: .long 16843009 @ 0x1010101
	%tmp = call i16 @llvm.cttz.i16(i16 %a, i1 true)			%tmp = call i16 @llvm.cttz.i16(i16 %a, i1 true)
	ret i16 %tmp			ret i16 %tmp
	}			}


	define i32 @test_i32_zero_undef(i32 %a) {			define i32 @test_i32_zero_undef(i32 %a) {
	; CHECK-LABEL: test_i32_zero_undef:			; CHECK-LABEL: test_i32_zero_undef:
	; CHECK: @ %bb.0:			; CHECK: @ %bb.0:
	; CHECK-NEXT: rbit r0, r0			; CHECK-NEXT: rbit r0, r0
	; CHECK-NEXT: clz r0, r0			; CHECK-NEXT: clz r0, r0
	; CHECK-NEXT: bx lr			; CHECK-NEXT: bx lr
	;			;
	; CHECK-THUMB-LABEL: test_i32_zero_undef:			; CHECK-THUMB-LABEL: test_i32_zero_undef:
	; CHECK-THUMB: @ %bb.0:			; CHECK-THUMB: @ %bb.0:
	; CHECK-THUMB-NEXT: subs r1, r0, #1			; CHECK-THUMB-NEXT: cmp r0, #0
	; CHECK-THUMB-NEXT: bics r1, r0			; CHECK-THUMB-NEXT: beq .LBB6_2
	; CHECK-THUMB-NEXT: lsrs r0, r1, #1			; CHECK-THUMB-NEXT: @ %bb.1:
	; CHECK-THUMB-NEXT: ldr r2, .LCPI6_0			; CHECK-THUMB-NEXT: rsbs r1, r0, #0
	; CHECK-THUMB-NEXT: ands r2, r0
	; CHECK-THUMB-NEXT: subs r0, r1, r2
	; CHECK-THUMB-NEXT: ldr r1, .LCPI6_1
	; CHECK-THUMB-NEXT: lsrs r2, r0, #2
	; CHECK-THUMB-NEXT: ands r0, r1
	; CHECK-THUMB-NEXT: ands r2, r1
	; CHECK-THUMB-NEXT: adds r0, r0, r2
	; CHECK-THUMB-NEXT: lsrs r1, r0, #4
	; CHECK-THUMB-NEXT: adds r0, r0, r1
	; CHECK-THUMB-NEXT: ldr r1, .LCPI6_2
	; CHECK-THUMB-NEXT: ands r1, r0			; CHECK-THUMB-NEXT: ands r1, r0
	; CHECK-THUMB-NEXT: ldr r0, .LCPI6_3			; CHECK-THUMB-NEXT: ldr r0, .LCPI6_0
	; CHECK-THUMB-NEXT: muls r0, r1, r0			; CHECK-THUMB-NEXT: muls r0, r1, r0
	; CHECK-THUMB-NEXT: lsrs r0, r0, #24			; CHECK-THUMB-NEXT: lsrs r0, r0, #27
				; CHECK-THUMB-NEXT: adr r1, .LCPI6_1
				; CHECK-THUMB-NEXT: ldrb r0, [r1, r0]
				; CHECK-THUMB-NEXT: bx lr
				; CHECK-THUMB-NEXT: .LBB6_2:
				; CHECK-THUMB-NEXT: movs r0, #32
	; CHECK-THUMB-NEXT: bx lr			; CHECK-THUMB-NEXT: bx lr
	; CHECK-THUMB-NEXT: .p2align 2			; CHECK-THUMB-NEXT: .p2align 2
	; CHECK-THUMB-NEXT: @ %bb.1:			; CHECK-THUMB-NEXT: @ %bb.3:
	; CHECK-THUMB-NEXT: .LCPI6_0:			; CHECK-THUMB-NEXT: .LCPI6_0:
	; CHECK-THUMB-NEXT: .long 1431655765 @ 0x55555555			; CHECK-THUMB-NEXT: .long 125613361 @ 0x77cb531
	; CHECK-THUMB-NEXT: .LCPI6_1:			; CHECK-THUMB-NEXT: .LCPI6_1:
	; CHECK-THUMB-NEXT: .long 858993459 @ 0x33333333			; CHECK-THUMB-NEXT: .ascii "\000\001\034\002\035\016\030\003\036\026\024\017\031\021\004\b\037\033\r\027\025\023\020\007\032\f\022\006\013\005\n\t"
	; CHECK-THUMB-NEXT: .LCPI6_2:
	; CHECK-THUMB-NEXT: .long 252645135 @ 0xf0f0f0f
	; CHECK-THUMB-NEXT: .LCPI6_3:
	; CHECK-THUMB-NEXT: .long 16843009 @ 0x1010101
	%tmp = call i32 @llvm.cttz.i32(i32 %a, i1 true)			%tmp = call i32 @llvm.cttz.i32(i32 %a, i1 true)
	ret i32 %tmp			ret i32 %tmp
	}			}

	define i64 @test_i64_zero_undef(i64 %a) {			define i64 @test_i64_zero_undef(i64 %a) {
	; CHECK-LABEL: test_i64_zero_undef:			; CHECK-LABEL: test_i64_zero_undef:
	; CHECK: @ %bb.0:			; CHECK: @ %bb.0:
	; CHECK-NEXT: rbit r1, r1			; CHECK-NEXT: rbit r1, r1
	; CHECK-NEXT: rbit r2, r0			; CHECK-NEXT: rbit r2, r0
	; CHECK-NEXT: clz r1, r1			; CHECK-NEXT: clz r1, r1
	; CHECK-NEXT: cmp r0, #0			; CHECK-NEXT: cmp r0, #0
	; CHECK-NEXT: add r1, r1, #32			; CHECK-NEXT: add r1, r1, #32
	; CHECK-NEXT: clzne r1, r2			; CHECK-NEXT: clzne r1, r2
	; CHECK-NEXT: mov r0, r1			; CHECK-NEXT: mov r0, r1
	; CHECK-NEXT: mov r1, #0			; CHECK-NEXT: mov r1, #0
	; CHECK-NEXT: bx lr			; CHECK-NEXT: bx lr
	;			;
	; CHECK-THUMB-LABEL: test_i64_zero_undef:			; CHECK-THUMB-LABEL: test_i64_zero_undef:
	; CHECK-THUMB: @ %bb.0:			; CHECK-THUMB: @ %bb.0:
	; CHECK-THUMB-NEXT: .save {r4, r5, r7, lr}			; CHECK-THUMB-NEXT: .save {r4, r5, r7, lr}
	; CHECK-THUMB-NEXT: push {r4, r5, r7, lr}			; CHECK-THUMB-NEXT: push {r4, r5, r7, lr}
	; CHECK-THUMB-NEXT: ldr r5, .LCPI7_0			; CHECK-THUMB-NEXT: ldr r5, .LCPI7_0
	; CHECK-THUMB-NEXT: ldr r4, .LCPI7_1			; CHECK-THUMB-NEXT: adr r4, .LCPI7_1
	; CHECK-THUMB-NEXT: ldr r3, .LCPI7_2			; CHECK-THUMB-NEXT: movs r3, #32
	; CHECK-THUMB-NEXT: ldr r2, .LCPI7_3
	; CHECK-THUMB-NEXT: cmp r0, #0			; CHECK-THUMB-NEXT: cmp r0, #0
	; CHECK-THUMB-NEXT: bne .LBB7_2			; CHECK-THUMB-NEXT: mov r2, r3
				; CHECK-THUMB-NEXT: bne .LBB7_5
	; CHECK-THUMB-NEXT: @ %bb.1:			; CHECK-THUMB-NEXT: @ %bb.1:
	; CHECK-THUMB-NEXT: subs r0, r1, #1			; CHECK-THUMB-NEXT: cmp r1, #0
	; CHECK-THUMB-NEXT: bics r0, r1			; CHECK-THUMB-NEXT: bne .LBB7_6
	; CHECK-THUMB-NEXT: lsrs r1, r0, #1
	; CHECK-THUMB-NEXT: ands r1, r5
	; CHECK-THUMB-NEXT: subs r0, r0, r1
	; CHECK-THUMB-NEXT: lsrs r1, r0, #2
	; CHECK-THUMB-NEXT: ands r0, r4
	; CHECK-THUMB-NEXT: ands r1, r4
	; CHECK-THUMB-NEXT: adds r0, r0, r1
	; CHECK-THUMB-NEXT: lsrs r1, r0, #4
	; CHECK-THUMB-NEXT: adds r0, r0, r1
	; CHECK-THUMB-NEXT: ands r0, r3
	; CHECK-THUMB-NEXT: muls r2, r0, r2
	; CHECK-THUMB-NEXT: lsrs r0, r2, #24
	; CHECK-THUMB-NEXT: adds r0, #32
	; CHECK-THUMB-NEXT: movs r1, #0
	; CHECK-THUMB-NEXT: pop {r4, r5, r7, pc}
	; CHECK-THUMB-NEXT: .LBB7_2:			; CHECK-THUMB-NEXT: .LBB7_2:
	; CHECK-THUMB-NEXT: subs r1, r0, #1			; CHECK-THUMB-NEXT: cmp r0, #0
	; CHECK-THUMB-NEXT: bics r1, r0			; CHECK-THUMB-NEXT: bne .LBB7_4
	; CHECK-THUMB-NEXT: lsrs r0, r1, #1			; CHECK-THUMB-NEXT: .LBB7_3:
	; CHECK-THUMB-NEXT: ands r0, r5			; CHECK-THUMB-NEXT: adds r3, #32
	; CHECK-THUMB-NEXT: subs r0, r1, r0			; CHECK-THUMB-NEXT: mov r2, r3
	; CHECK-THUMB-NEXT: lsrs r1, r0, #2			; CHECK-THUMB-NEXT: .LBB7_4:
	; CHECK-THUMB-NEXT: ands r0, r4
	; CHECK-THUMB-NEXT: ands r1, r4
	; CHECK-THUMB-NEXT: adds r0, r0, r1
	; CHECK-THUMB-NEXT: lsrs r1, r0, #4
	; CHECK-THUMB-NEXT: adds r0, r0, r1
	; CHECK-THUMB-NEXT: ands r0, r3
	; CHECK-THUMB-NEXT: muls r2, r0, r2
	; CHECK-THUMB-NEXT: lsrs r0, r2, #24
	; CHECK-THUMB-NEXT: movs r1, #0			; CHECK-THUMB-NEXT: movs r1, #0
				; CHECK-THUMB-NEXT: mov r0, r2
	; CHECK-THUMB-NEXT: pop {r4, r5, r7, pc}			; CHECK-THUMB-NEXT: pop {r4, r5, r7, pc}
				; CHECK-THUMB-NEXT: .LBB7_5:
				; CHECK-THUMB-NEXT: rsbs r2, r0, #0
				; CHECK-THUMB-NEXT: ands r2, r0
				; CHECK-THUMB-NEXT: muls r2, r5, r2
				; CHECK-THUMB-NEXT: lsrs r2, r2, #27
				; CHECK-THUMB-NEXT: ldrb r2, [r4, r2]
				; CHECK-THUMB-NEXT: cmp r1, #0
				; CHECK-THUMB-NEXT: beq .LBB7_2
				; CHECK-THUMB-NEXT: .LBB7_6:
				; CHECK-THUMB-NEXT: rsbs r3, r1, #0
				; CHECK-THUMB-NEXT: ands r3, r1
				; CHECK-THUMB-NEXT: muls r5, r3, r5
				; CHECK-THUMB-NEXT: lsrs r1, r5, #27
				; CHECK-THUMB-NEXT: ldrb r3, [r4, r1]
				; CHECK-THUMB-NEXT: cmp r0, #0
				; CHECK-THUMB-NEXT: beq .LBB7_3
				; CHECK-THUMB-NEXT: b .LBB7_4
	; CHECK-THUMB-NEXT: .p2align 2			; CHECK-THUMB-NEXT: .p2align 2
	; CHECK-THUMB-NEXT: @ %bb.3:			; CHECK-THUMB-NEXT: @ %bb.7:
	; CHECK-THUMB-NEXT: .LCPI7_0:			; CHECK-THUMB-NEXT: .LCPI7_0:
	; CHECK-THUMB-NEXT: .long 1431655765 @ 0x55555555			; CHECK-THUMB-NEXT: .long 125613361 @ 0x77cb531
	; CHECK-THUMB-NEXT: .LCPI7_1:			; CHECK-THUMB-NEXT: .LCPI7_1:
	; CHECK-THUMB-NEXT: .long 858993459 @ 0x33333333			; CHECK-THUMB-NEXT: .ascii "\000\001\034\002\035\016\030\003\036\026\024\017\031\021\004\b\037\033\r\027\025\023\020\007\032\f\022\006\013\005\n\t"
	; CHECK-THUMB-NEXT: .LCPI7_2:
	; CHECK-THUMB-NEXT: .long 252645135 @ 0xf0f0f0f
	; CHECK-THUMB-NEXT: .LCPI7_3:
	; CHECK-THUMB-NEXT: .long 16843009 @ 0x1010101
	%tmp = call i64 @llvm.cttz.i64(i64 %a, i1 true)			%tmp = call i64 @llvm.cttz.i64(i64 %a, i1 true)
	ret i64 %tmp			ret i64 %tmp
	}			}

llvm/test/CodeGen/RISCV/ctlz-cttz-ctpop.ll

; RV64ZBB-NEXT: ret		; RV64ZBB-NEXT: ret
%tmp = call i16 @llvm.cttz.i16(i16 %a, i1 false)		%tmp = call i16 @llvm.cttz.i16(i16 %a, i1 false)
ret i16 %tmp		ret i16 %tmp
}		}

define i32 @test_cttz_i32(i32 %a) nounwind {		define i32 @test_cttz_i32(i32 %a) nounwind {
; RV32I-LABEL: test_cttz_i32:		; RV32I-LABEL: test_cttz_i32:
; RV32I: # %bb.0:		; RV32I: # %bb.0:
; RV32I-NEXT: beqz a0, .LBB2_2		; RV32I-NEXT: beqz a0, .LBB2_4
; RV32I-NEXT: # %bb.1: # %cond.false		; RV32I-NEXT: # %bb.1: # %cond.false
; RV32I-NEXT: addi sp, sp, -16		; RV32I-NEXT: addi sp, sp, -16
; RV32I-NEXT: sw ra, 12(sp) # 4-byte Folded Spill		; RV32I-NEXT: sw ra, 12(sp) # 4-byte Folded Spill
; RV32I-NEXT: addi a1, a0, -1		; RV32I-NEXT: sw s0, 8(sp) # 4-byte Folded Spill
; RV32I-NEXT: not a0, a0		; RV32I-NEXT: mv s0, a0
; RV32I-NEXT: and a0, a0, a1		; RV32I-NEXT: neg a0, a0
; RV32I-NEXT: srli a1, a0, 1		; RV32I-NEXT: and a0, s0, a0
; RV32I-NEXT: lui a2, 349525		; RV32I-NEXT: lui a1, 30667
; RV32I-NEXT: addi a2, a2, 1365		; RV32I-NEXT: addi a1, a1, 1329
; RV32I-NEXT: and a1, a1, a2
; RV32I-NEXT: sub a0, a0, a1
; RV32I-NEXT: lui a1, 209715
; RV32I-NEXT: addi a1, a1, 819
; RV32I-NEXT: and a2, a0, a1
; RV32I-NEXT: srli a0, a0, 2
; RV32I-NEXT: and a0, a0, a1
; RV32I-NEXT: add a0, a2, a0
; RV32I-NEXT: srli a1, a0, 4
; RV32I-NEXT: add a0, a0, a1
; RV32I-NEXT: lui a1, 61681
; RV32I-NEXT: addi a1, a1, -241
; RV32I-NEXT: and a0, a0, a1
; RV32I-NEXT: lui a1, 4112
; RV32I-NEXT: addi a1, a1, 257
; RV32I-NEXT: call __mulsi3@plt		; RV32I-NEXT: call __mulsi3@plt
; RV32I-NEXT: srli a0, a0, 24		; RV32I-NEXT: mv a1, a0
		; RV32I-NEXT: li a0, 32
		; RV32I-NEXT: beqz s0, .LBB2_3
		; RV32I-NEXT: # %bb.2: # %cond.false
		; RV32I-NEXT: srli a0, a1, 27
		; RV32I-NEXT: lui a1, %hi(.LCPI2_0)
		; RV32I-NEXT: addi a1, a1, %lo(.LCPI2_0)
		; RV32I-NEXT: add a0, a1, a0
		; RV32I-NEXT: lbu a0, 0(a0)
		; RV32I-NEXT: .LBB2_3: # %cond.false
; RV32I-NEXT: lw ra, 12(sp) # 4-byte Folded Reload		; RV32I-NEXT: lw ra, 12(sp) # 4-byte Folded Reload
		; RV32I-NEXT: lw s0, 8(sp) # 4-byte Folded Reload
; RV32I-NEXT: addi sp, sp, 16		; RV32I-NEXT: addi sp, sp, 16
; RV32I-NEXT: ret		; RV32I-NEXT: ret
; RV32I-NEXT: .LBB2_2:		; RV32I-NEXT: .LBB2_4:
; RV32I-NEXT: li a0, 32		; RV32I-NEXT: li a0, 32
; RV32I-NEXT: ret		; RV32I-NEXT: ret
;		;
; RV64I-LABEL: test_cttz_i32:		; RV64I-LABEL: test_cttz_i32:
; RV64I: # %bb.0:		; RV64I: # %bb.0:
; RV64I-NEXT: sext.w a1, a0
; RV64I-NEXT: beqz a1, .LBB2_2
; RV64I-NEXT: # %bb.1: # %cond.false
; RV64I-NEXT: addi sp, sp, -16		; RV64I-NEXT: addi sp, sp, -16
; RV64I-NEXT: sd ra, 8(sp) # 8-byte Folded Spill		; RV64I-NEXT: sd ra, 8(sp) # 8-byte Folded Spill
; RV64I-NEXT: addiw a1, a0, -1		; RV64I-NEXT: sd s0, 0(sp) # 8-byte Folded Spill
; RV64I-NEXT: not a0, a0		; RV64I-NEXT: sext.w s0, a0
; RV64I-NEXT: and a0, a0, a1		; RV64I-NEXT: beqz s0, .LBB2_3
; RV64I-NEXT: srli a1, a0, 1		; RV64I-NEXT: # %bb.1: # %cond.false
; RV64I-NEXT: lui a2, 349525		; RV64I-NEXT: neg a1, a0
; RV64I-NEXT: addiw a2, a2, 1365
; RV64I-NEXT: and a1, a1, a2
; RV64I-NEXT: sub a0, a0, a1
; RV64I-NEXT: lui a1, 209715
; RV64I-NEXT: addiw a1, a1, 819
; RV64I-NEXT: and a2, a0, a1
; RV64I-NEXT: srli a0, a0, 2
; RV64I-NEXT: and a0, a0, a1
; RV64I-NEXT: add a0, a2, a0
; RV64I-NEXT: srli a1, a0, 4
; RV64I-NEXT: add a0, a0, a1
; RV64I-NEXT: lui a1, 61681
; RV64I-NEXT: addiw a1, a1, -241
; RV64I-NEXT: and a0, a0, a1		; RV64I-NEXT: and a0, a0, a1
; RV64I-NEXT: lui a1, 4112		; RV64I-NEXT: lui a1, 30667
; RV64I-NEXT: addiw a1, a1, 257		; RV64I-NEXT: addiw a1, a1, 1329
; RV64I-NEXT: call __muldi3@plt		; RV64I-NEXT: call __muldi3@plt
; RV64I-NEXT: srliw a0, a0, 24		; RV64I-NEXT: mv a1, a0
		; RV64I-NEXT: li a0, 32
		; RV64I-NEXT: beqz s0, .LBB2_4
		; RV64I-NEXT: # %bb.2: # %cond.false
		; RV64I-NEXT: srliw a0, a1, 27
		; RV64I-NEXT: lui a1, %hi(.LCPI2_0)
		; RV64I-NEXT: addi a1, a1, %lo(.LCPI2_0)
		; RV64I-NEXT: add a0, a1, a0
		; RV64I-NEXT: lbu a0, 0(a0)
		; RV64I-NEXT: j .LBB2_4
		; RV64I-NEXT: .LBB2_3:
		; RV64I-NEXT: li a0, 32
		; RV64I-NEXT: .LBB2_4: # %cond.end
; RV64I-NEXT: ld ra, 8(sp) # 8-byte Folded Reload		; RV64I-NEXT: ld ra, 8(sp) # 8-byte Folded Reload
		; RV64I-NEXT: ld s0, 0(sp) # 8-byte Folded Reload
; RV64I-NEXT: addi sp, sp, 16		; RV64I-NEXT: addi sp, sp, 16
; RV64I-NEXT: ret		; RV64I-NEXT: ret
; RV64I-NEXT: .LBB2_2:
; RV64I-NEXT: li a0, 32
; RV64I-NEXT: ret
;		;
; RV32M-LABEL: test_cttz_i32:		; RV32M-LABEL: test_cttz_i32:
; RV32M: # %bb.0:		; RV32M: # %bb.0:
; RV32M-NEXT: beqz a0, .LBB2_2		; RV32M-NEXT: beqz a0, .LBB2_4
; RV32M-NEXT: # %bb.1: # %cond.false		; RV32M-NEXT: # %bb.1: # %cond.false
; RV32M-NEXT: addi a1, a0, -1		; RV32M-NEXT: mv a1, a0
; RV32M-NEXT: not a0, a0		; RV32M-NEXT: li a0, 32
; RV32M-NEXT: and a0, a0, a1		; RV32M-NEXT: beqz a1, .LBB2_3
; RV32M-NEXT: srli a1, a0, 1		; RV32M-NEXT: # %bb.2: # %cond.false
; RV32M-NEXT: lui a2, 349525		; RV32M-NEXT: neg a0, a1
; RV32M-NEXT: addi a2, a2, 1365		; RV32M-NEXT: and a0, a1, a0
; RV32M-NEXT: and a1, a1, a2		; RV32M-NEXT: lui a1, 30667
; RV32M-NEXT: sub a0, a0, a1		; RV32M-NEXT: addi a1, a1, 1329
; RV32M-NEXT: lui a1, 209715
; RV32M-NEXT: addi a1, a1, 819
; RV32M-NEXT: and a2, a0, a1
; RV32M-NEXT: srli a0, a0, 2
; RV32M-NEXT: and a0, a0, a1
; RV32M-NEXT: add a0, a2, a0
; RV32M-NEXT: srli a1, a0, 4
; RV32M-NEXT: add a0, a0, a1
; RV32M-NEXT: lui a1, 61681
; RV32M-NEXT: addi a1, a1, -241
; RV32M-NEXT: and a0, a0, a1
; RV32M-NEXT: lui a1, 4112
; RV32M-NEXT: addi a1, a1, 257
; RV32M-NEXT: mul a0, a0, a1		; RV32M-NEXT: mul a0, a0, a1
; RV32M-NEXT: srli a0, a0, 24		; RV32M-NEXT: srli a0, a0, 27
		; RV32M-NEXT: lui a1, %hi(.LCPI2_0)
		; RV32M-NEXT: addi a1, a1, %lo(.LCPI2_0)
		; RV32M-NEXT: add a0, a1, a0
		; RV32M-NEXT: lbu a0, 0(a0)
		; RV32M-NEXT: .LBB2_3: # %cond.end
; RV32M-NEXT: ret		; RV32M-NEXT: ret
; RV32M-NEXT: .LBB2_2:		; RV32M-NEXT: .LBB2_4:
; RV32M-NEXT: li a0, 32		; RV32M-NEXT: li a0, 32
; RV32M-NEXT: ret		; RV32M-NEXT: ret
;		;
; RV64M-LABEL: test_cttz_i32:		; RV64M-LABEL: test_cttz_i32:
; RV64M: # %bb.0:		; RV64M: # %bb.0:
; RV64M-NEXT: sext.w a1, a0		; RV64M-NEXT: sext.w a2, a0
; RV64M-NEXT: beqz a1, .LBB2_2		; RV64M-NEXT: beqz a2, .LBB2_4
; RV64M-NEXT: # %bb.1: # %cond.false		; RV64M-NEXT: # %bb.1: # %cond.false
; RV64M-NEXT: addiw a1, a0, -1		; RV64M-NEXT: mv a1, a0
; RV64M-NEXT: not a0, a0		; RV64M-NEXT: li a0, 32
; RV64M-NEXT: and a0, a0, a1		; RV64M-NEXT: beqz a2, .LBB2_3
; RV64M-NEXT: srli a1, a0, 1		; RV64M-NEXT: # %bb.2: # %cond.false
; RV64M-NEXT: lui a2, 349525		; RV64M-NEXT: neg a0, a1
; RV64M-NEXT: addiw a2, a2, 1365		; RV64M-NEXT: and a0, a1, a0
; RV64M-NEXT: and a1, a1, a2		; RV64M-NEXT: lui a1, 30667
; RV64M-NEXT: sub a0, a0, a1		; RV64M-NEXT: addiw a1, a1, 1329
; RV64M-NEXT: lui a1, 209715
; RV64M-NEXT: addiw a1, a1, 819
; RV64M-NEXT: and a2, a0, a1
; RV64M-NEXT: srli a0, a0, 2
; RV64M-NEXT: and a0, a0, a1
; RV64M-NEXT: add a0, a2, a0
; RV64M-NEXT: srli a1, a0, 4
; RV64M-NEXT: add a0, a0, a1
; RV64M-NEXT: lui a1, 61681
; RV64M-NEXT: addiw a1, a1, -241
; RV64M-NEXT: and a0, a0, a1
; RV64M-NEXT: lui a1, 4112
; RV64M-NEXT: addiw a1, a1, 257
; RV64M-NEXT: mulw a0, a0, a1		; RV64M-NEXT: mulw a0, a0, a1
; RV64M-NEXT: srliw a0, a0, 24		; RV64M-NEXT: srliw a0, a0, 27
		; RV64M-NEXT: lui a1, %hi(.LCPI2_0)
		; RV64M-NEXT: addi a1, a1, %lo(.LCPI2_0)
		; RV64M-NEXT: add a0, a1, a0
		; RV64M-NEXT: lbu a0, 0(a0)
		; RV64M-NEXT: .LBB2_3: # %cond.end
; RV64M-NEXT: ret		; RV64M-NEXT: ret
; RV64M-NEXT: .LBB2_2:		; RV64M-NEXT: .LBB2_4:
; RV64M-NEXT: li a0, 32		; RV64M-NEXT: li a0, 32
; RV64M-NEXT: ret		; RV64M-NEXT: ret
;		;
; RV32ZBB-LABEL: test_cttz_i32:		; RV32ZBB-LABEL: test_cttz_i32:
; RV32ZBB: # %bb.0:		; RV32ZBB: # %bb.0:
; RV32ZBB-NEXT: ctz a0, a0		; RV32ZBB-NEXT: ctz a0, a0
; RV32ZBB-NEXT: ret		; RV32ZBB-NEXT: ret
;		;
; RV32I-NEXT: addi sp, sp, -32		; RV32I-NEXT: addi sp, sp, -32
; RV32I-NEXT: sw ra, 28(sp) # 4-byte Folded Spill		; RV32I-NEXT: sw ra, 28(sp) # 4-byte Folded Spill
; RV32I-NEXT: sw s0, 24(sp) # 4-byte Folded Spill		; RV32I-NEXT: sw s0, 24(sp) # 4-byte Folded Spill
; RV32I-NEXT: sw s1, 20(sp) # 4-byte Folded Spill		; RV32I-NEXT: sw s1, 20(sp) # 4-byte Folded Spill
; RV32I-NEXT: sw s2, 16(sp) # 4-byte Folded Spill		; RV32I-NEXT: sw s2, 16(sp) # 4-byte Folded Spill
; RV32I-NEXT: sw s3, 12(sp) # 4-byte Folded Spill		; RV32I-NEXT: sw s3, 12(sp) # 4-byte Folded Spill
; RV32I-NEXT: sw s4, 8(sp) # 4-byte Folded Spill		; RV32I-NEXT: sw s4, 8(sp) # 4-byte Folded Spill
; RV32I-NEXT: sw s5, 4(sp) # 4-byte Folded Spill		; RV32I-NEXT: sw s5, 4(sp) # 4-byte Folded Spill
; RV32I-NEXT: sw s6, 0(sp) # 4-byte Folded Spill
; RV32I-NEXT: mv s1, a1		; RV32I-NEXT: mv s1, a1
; RV32I-NEXT: mv s2, a0
; RV32I-NEXT: addi a0, a0, -1
; RV32I-NEXT: not a1, s2
; RV32I-NEXT: and a0, a1, a0
; RV32I-NEXT: srli a1, a0, 1
; RV32I-NEXT: lui a2, 349525
; RV32I-NEXT: addi s4, a2, 1365
; RV32I-NEXT: and a1, a1, s4
; RV32I-NEXT: sub a0, a0, a1
; RV32I-NEXT: lui a1, 209715
; RV32I-NEXT: addi s5, a1, 819
; RV32I-NEXT: and a1, a0, s5
; RV32I-NEXT: srli a0, a0, 2
; RV32I-NEXT: and a0, a0, s5
; RV32I-NEXT: add a0, a1, a0
; RV32I-NEXT: srli a1, a0, 4
; RV32I-NEXT: add a0, a0, a1
; RV32I-NEXT: lui a1, 61681
; RV32I-NEXT: addi s6, a1, -241
; RV32I-NEXT: and a0, a0, s6
; RV32I-NEXT: lui a1, 4112
; RV32I-NEXT: addi s3, a1, 257
; RV32I-NEXT: mv a1, s3
; RV32I-NEXT: call __mulsi3@plt
; RV32I-NEXT: mv s0, a0		; RV32I-NEXT: mv s0, a0
; RV32I-NEXT: addi a0, s1, -1		; RV32I-NEXT: neg a0, a0
; RV32I-NEXT: not a1, s1		; RV32I-NEXT: and a0, s0, a0
; RV32I-NEXT: and a0, a1, a0		; RV32I-NEXT: lui a1, 30667
; RV32I-NEXT: srli a1, a0, 1		; RV32I-NEXT: addi s3, a1, 1329
; RV32I-NEXT: and a1, a1, s4
; RV32I-NEXT: sub a0, a0, a1
; RV32I-NEXT: and a1, a0, s5
; RV32I-NEXT: srli a0, a0, 2
; RV32I-NEXT: and a0, a0, s5
; RV32I-NEXT: add a0, a1, a0
; RV32I-NEXT: srli a1, a0, 4
; RV32I-NEXT: add a0, a0, a1
; RV32I-NEXT: and a0, a0, s6
; RV32I-NEXT: mv a1, s3		; RV32I-NEXT: mv a1, s3
; RV32I-NEXT: call __mulsi3@plt		; RV32I-NEXT: call __mulsi3@plt
; RV32I-NEXT: bnez s2, .LBB3_2		; RV32I-NEXT: lui a1, %hi(.LCPI3_0)
		; RV32I-NEXT: addi s5, a1, %lo(.LCPI3_0)
		; RV32I-NEXT: li s4, 32
		; RV32I-NEXT: li s2, 32
		; RV32I-NEXT: beqz s0, .LBB3_2
; RV32I-NEXT: # %bb.1:		; RV32I-NEXT: # %bb.1:
; RV32I-NEXT: srli a0, a0, 24		; RV32I-NEXT: srli a0, a0, 27
; RV32I-NEXT: addi a0, a0, 32		; RV32I-NEXT: add a0, s5, a0
; RV32I-NEXT: j .LBB3_3		; RV32I-NEXT: lbu s2, 0(a0)
; RV32I-NEXT: .LBB3_2:		; RV32I-NEXT: .LBB3_2:
; RV32I-NEXT: srli a0, s0, 24		; RV32I-NEXT: neg a0, s1
; RV32I-NEXT: .LBB3_3:		; RV32I-NEXT: and a0, s1, a0
		; RV32I-NEXT: mv a1, s3
		; RV32I-NEXT: call __mulsi3@plt
		; RV32I-NEXT: beqz s1, .LBB3_4
		; RV32I-NEXT: # %bb.3:
		; RV32I-NEXT: srli a0, a0, 27
		; RV32I-NEXT: add a0, s5, a0
		; RV32I-NEXT: lbu s4, 0(a0)
		; RV32I-NEXT: .LBB3_4:
		; RV32I-NEXT: bnez s0, .LBB3_6
		; RV32I-NEXT: # %bb.5:
		; RV32I-NEXT: addi s2, s4, 32
		; RV32I-NEXT: .LBB3_6:
		; RV32I-NEXT: mv a0, s2
; RV32I-NEXT: li a1, 0		; RV32I-NEXT: li a1, 0
; RV32I-NEXT: lw ra, 28(sp) # 4-byte Folded Reload		; RV32I-NEXT: lw ra, 28(sp) # 4-byte Folded Reload
; RV32I-NEXT: lw s0, 24(sp) # 4-byte Folded Reload		; RV32I-NEXT: lw s0, 24(sp) # 4-byte Folded Reload
; RV32I-NEXT: lw s1, 20(sp) # 4-byte Folded Reload		; RV32I-NEXT: lw s1, 20(sp) # 4-byte Folded Reload
; RV32I-NEXT: lw s2, 16(sp) # 4-byte Folded Reload		; RV32I-NEXT: lw s2, 16(sp) # 4-byte Folded Reload
; RV32I-NEXT: lw s3, 12(sp) # 4-byte Folded Reload		; RV32I-NEXT: lw s3, 12(sp) # 4-byte Folded Reload
; RV32I-NEXT: lw s4, 8(sp) # 4-byte Folded Reload		; RV32I-NEXT: lw s4, 8(sp) # 4-byte Folded Reload
; RV32I-NEXT: lw s5, 4(sp) # 4-byte Folded Reload		; RV32I-NEXT: lw s5, 4(sp) # 4-byte Folded Reload
; RV32I-NEXT: lw s6, 0(sp) # 4-byte Folded Reload
; RV32I-NEXT: addi sp, sp, 32		; RV32I-NEXT: addi sp, sp, 32
; RV32I-NEXT: ret		; RV32I-NEXT: ret
;		;
; RV64I-LABEL: test_cttz_i64:		; RV64I-LABEL: test_cttz_i64:
; RV64I: # %bb.0:		; RV64I: # %bb.0:
; RV64I-NEXT: beqz a0, .LBB3_2		; RV64I-NEXT: beqz a0, .LBB3_4
; RV64I-NEXT: # %bb.1: # %cond.false		; RV64I-NEXT: # %bb.1: # %cond.false
; RV64I-NEXT: addi sp, sp, -16		; RV64I-NEXT: addi sp, sp, -16
; RV64I-NEXT: sd ra, 8(sp) # 8-byte Folded Spill		; RV64I-NEXT: sd ra, 8(sp) # 8-byte Folded Spill
; RV64I-NEXT: addi a1, a0, -1		; RV64I-NEXT: sd s0, 0(sp) # 8-byte Folded Spill
; RV64I-NEXT: not a0, a0		; RV64I-NEXT: mv s0, a0
; RV64I-NEXT: and a0, a0, a1		; RV64I-NEXT: neg a0, a0
		; RV64I-NEXT: and a0, s0, a0
; RV64I-NEXT: lui a1, %hi(.LCPI3_0)		; RV64I-NEXT: lui a1, %hi(.LCPI3_0)
; RV64I-NEXT: ld a1, %lo(.LCPI3_0)(a1)		; RV64I-NEXT: ld a1, %lo(.LCPI3_0)(a1)
; RV64I-NEXT: lui a2, %hi(.LCPI3_1)
; RV64I-NEXT: ld a2, %lo(.LCPI3_1)(a2)
; RV64I-NEXT: srli a3, a0, 1
; RV64I-NEXT: and a1, a3, a1
; RV64I-NEXT: sub a0, a0, a1
; RV64I-NEXT: and a1, a0, a2
; RV64I-NEXT: srli a0, a0, 2
; RV64I-NEXT: and a0, a0, a2
; RV64I-NEXT: lui a2, %hi(.LCPI3_2)
; RV64I-NEXT: ld a2, %lo(.LCPI3_2)(a2)
; RV64I-NEXT: add a0, a1, a0
; RV64I-NEXT: srli a1, a0, 4
; RV64I-NEXT: add a0, a0, a1
; RV64I-NEXT: and a0, a0, a2
; RV64I-NEXT: lui a1, %hi(.LCPI3_3)
; RV64I-NEXT: ld a1, %lo(.LCPI3_3)(a1)
; RV64I-NEXT: call __muldi3@plt		; RV64I-NEXT: call __muldi3@plt
; RV64I-NEXT: srli a0, a0, 56		; RV64I-NEXT: mv a1, a0
		; RV64I-NEXT: li a0, 64
		; RV64I-NEXT: beqz s0, .LBB3_3
		; RV64I-NEXT: # %bb.2: # %cond.false
		; RV64I-NEXT: srli a0, a1, 58
		; RV64I-NEXT: lui a1, %hi(.LCPI3_1)
		; RV64I-NEXT: addi a1, a1, %lo(.LCPI3_1)
		; RV64I-NEXT: add a0, a1, a0
		; RV64I-NEXT: lbu a0, 0(a0)
		; RV64I-NEXT: .LBB3_3: # %cond.false
; RV64I-NEXT: ld ra, 8(sp) # 8-byte Folded Reload		; RV64I-NEXT: ld ra, 8(sp) # 8-byte Folded Reload
		; RV64I-NEXT: ld s0, 0(sp) # 8-byte Folded Reload
; RV64I-NEXT: addi sp, sp, 16		; RV64I-NEXT: addi sp, sp, 16
; RV64I-NEXT: ret		; RV64I-NEXT: ret
; RV64I-NEXT: .LBB3_2:		; RV64I-NEXT: .LBB3_4:
; RV64I-NEXT: li a0, 64		; RV64I-NEXT: li a0, 64
; RV64I-NEXT: ret		; RV64I-NEXT: ret
;		;
; RV32M-LABEL: test_cttz_i64:		; RV32M-LABEL: test_cttz_i64:
; RV32M: # %bb.0:		; RV32M: # %bb.0:
; RV32M-NEXT: lui a2, 349525		; RV32M-NEXT: lui a2, 30667
; RV32M-NEXT: addi a5, a2, 1365		; RV32M-NEXT: addi a4, a2, 1329
; RV32M-NEXT: lui a2, 209715		; RV32M-NEXT: lui a2, %hi(.LCPI3_0)
; RV32M-NEXT: addi a4, a2, 819		; RV32M-NEXT: addi a5, a2, %lo(.LCPI3_0)
; RV32M-NEXT: lui a2, 61681		; RV32M-NEXT: li a3, 32
; RV32M-NEXT: addi a2, a2, -241		; RV32M-NEXT: li a2, 32
; RV32M-NEXT: lui a3, 4112		; RV32M-NEXT: bnez a0, .LBB3_5
; RV32M-NEXT: addi a3, a3, 257
; RV32M-NEXT: bnez a0, .LBB3_2
; RV32M-NEXT: # %bb.1:		; RV32M-NEXT: # %bb.1:
; RV32M-NEXT: addi a0, a1, -1		; RV32M-NEXT: bnez a1, .LBB3_6
; RV32M-NEXT: not a1, a1
; RV32M-NEXT: and a0, a1, a0
; RV32M-NEXT: srli a1, a0, 1
; RV32M-NEXT: and a1, a1, a5
; RV32M-NEXT: sub a0, a0, a1
; RV32M-NEXT: and a1, a0, a4
; RV32M-NEXT: srli a0, a0, 2
; RV32M-NEXT: and a0, a0, a4
; RV32M-NEXT: add a0, a1, a0
; RV32M-NEXT: srli a1, a0, 4
; RV32M-NEXT: add a0, a0, a1
; RV32M-NEXT: and a0, a0, a2
; RV32M-NEXT: mul a0, a0, a3
; RV32M-NEXT: srli a0, a0, 24
; RV32M-NEXT: addi a0, a0, 32
; RV32M-NEXT: li a1, 0
; RV32M-NEXT: ret
; RV32M-NEXT: .LBB3_2:		; RV32M-NEXT: .LBB3_2:
; RV32M-NEXT: addi a1, a0, -1		; RV32M-NEXT: bnez a0, .LBB3_4
; RV32M-NEXT: not a0, a0		; RV32M-NEXT: .LBB3_3:
; RV32M-NEXT: and a0, a0, a1		; RV32M-NEXT: addi a2, a3, 32
; RV32M-NEXT: srli a1, a0, 1		; RV32M-NEXT: .LBB3_4:
; RV32M-NEXT: and a1, a1, a5		; RV32M-NEXT: mv a0, a2
; RV32M-NEXT: sub a0, a0, a1
; RV32M-NEXT: and a1, a0, a4
; RV32M-NEXT: srli a0, a0, 2
; RV32M-NEXT: and a0, a0, a4
; RV32M-NEXT: add a0, a1, a0
; RV32M-NEXT: srli a1, a0, 4
; RV32M-NEXT: add a0, a0, a1
; RV32M-NEXT: and a0, a0, a2
; RV32M-NEXT: mul a0, a0, a3
; RV32M-NEXT: srli a0, a0, 24
; RV32M-NEXT: li a1, 0		; RV32M-NEXT: li a1, 0
; RV32M-NEXT: ret		; RV32M-NEXT: ret
		; RV32M-NEXT: .LBB3_5:
		; RV32M-NEXT: neg a2, a0
		; RV32M-NEXT: and a2, a0, a2
		; RV32M-NEXT: mul a2, a2, a4
		; RV32M-NEXT: srli a2, a2, 27
		; RV32M-NEXT: add a2, a5, a2
		; RV32M-NEXT: lbu a2, 0(a2)
		; RV32M-NEXT: beqz a1, .LBB3_2
		; RV32M-NEXT: .LBB3_6:
		; RV32M-NEXT: neg a3, a1
		; RV32M-NEXT: and a1, a1, a3
		; RV32M-NEXT: mul a1, a1, a4
		; RV32M-NEXT: srli a1, a1, 27
		; RV32M-NEXT: add a1, a5, a1
		; RV32M-NEXT: lbu a3, 0(a1)
		; RV32M-NEXT: beqz a0, .LBB3_3
		; RV32M-NEXT: j .LBB3_4
;		;
; RV64M-LABEL: test_cttz_i64:		; RV64M-LABEL: test_cttz_i64:
; RV64M: # %bb.0:		; RV64M: # %bb.0:
; RV64M-NEXT: beqz a0, .LBB3_2		; RV64M-NEXT: beqz a0, .LBB3_4
; RV64M-NEXT: # %bb.1: # %cond.false		; RV64M-NEXT: # %bb.1: # %cond.false
; RV64M-NEXT: addi a1, a0, -1		; RV64M-NEXT: mv a1, a0
; RV64M-NEXT: not a0, a0		; RV64M-NEXT: li a0, 64
; RV64M-NEXT: and a0, a0, a1		; RV64M-NEXT: beqz a1, .LBB3_3
; RV64M-NEXT: lui a1, %hi(.LCPI3_0)		; RV64M-NEXT: # %bb.2: # %cond.false
; RV64M-NEXT: ld a1, %lo(.LCPI3_0)(a1)		; RV64M-NEXT: lui a0, %hi(.LCPI3_0)
; RV64M-NEXT: lui a2, %hi(.LCPI3_1)		; RV64M-NEXT: ld a0, %lo(.LCPI3_0)(a0)
; RV64M-NEXT: ld a2, %lo(.LCPI3_1)(a2)		; RV64M-NEXT: neg a2, a1
; RV64M-NEXT: srli a3, a0, 1		; RV64M-NEXT: and a1, a1, a2
; RV64M-NEXT: and a1, a3, a1		; RV64M-NEXT: mul a0, a1, a0
; RV64M-NEXT: sub a0, a0, a1		; RV64M-NEXT: srli a0, a0, 58
; RV64M-NEXT: and a1, a0, a2		; RV64M-NEXT: lui a1, %hi(.LCPI3_1)
; RV64M-NEXT: srli a0, a0, 2		; RV64M-NEXT: addi a1, a1, %lo(.LCPI3_1)
; RV64M-NEXT: and a0, a0, a2
; RV64M-NEXT: add a0, a1, a0		; RV64M-NEXT: add a0, a1, a0
; RV64M-NEXT: lui a1, %hi(.LCPI3_2)		; RV64M-NEXT: lbu a0, 0(a0)
; RV64M-NEXT: ld a1, %lo(.LCPI3_2)(a1)		; RV64M-NEXT: .LBB3_3: # %cond.end
; RV64M-NEXT: lui a2, %hi(.LCPI3_3)
; RV64M-NEXT: ld a2, %lo(.LCPI3_3)(a2)
; RV64M-NEXT: srli a3, a0, 4
; RV64M-NEXT: add a0, a0, a3
; RV64M-NEXT: and a0, a0, a1
; RV64M-NEXT: mul a0, a0, a2
; RV64M-NEXT: srli a0, a0, 56
; RV64M-NEXT: ret		; RV64M-NEXT: ret
; RV64M-NEXT: .LBB3_2:		; RV64M-NEXT: .LBB3_4:
; RV64M-NEXT: li a0, 64		; RV64M-NEXT: li a0, 64
; RV64M-NEXT: ret		; RV64M-NEXT: ret
;		;
; RV32ZBB-LABEL: test_cttz_i64:		; RV32ZBB-LABEL: test_cttz_i64:
; RV32ZBB: # %bb.0:		; RV32ZBB: # %bb.0:
; RV32ZBB-NEXT: bnez a0, .LBB3_2		; RV32ZBB-NEXT: bnez a0, .LBB3_2
; RV32ZBB-NEXT: # %bb.1:		; RV32ZBB-NEXT: # %bb.1:
; RV32ZBB-NEXT: ctz a0, a1		; RV32ZBB-NEXT: ctz a0, a1
[123 lines not shown]
; RV64ZBB-NEXT: ret
ret i16 %tmp		ret i16 %tmp
}		}

define i32 @test_cttz_i32_zero_undef(i32 %a) nounwind {		define i32 @test_cttz_i32_zero_undef(i32 %a) nounwind {
; RV32I-LABEL: test_cttz_i32_zero_undef:		; RV32I-LABEL: test_cttz_i32_zero_undef:
; RV32I: # %bb.0:		; RV32I: # %bb.0:
; RV32I-NEXT: addi sp, sp, -16		; RV32I-NEXT: addi sp, sp, -16
; RV32I-NEXT: sw ra, 12(sp) # 4-byte Folded Spill		; RV32I-NEXT: sw ra, 12(sp) # 4-byte Folded Spill
; RV32I-NEXT: addi a1, a0, -1		; RV32I-NEXT: sw s0, 8(sp) # 4-byte Folded Spill
; RV32I-NEXT: not a0, a0		; RV32I-NEXT: mv s0, a0
; RV32I-NEXT: and a0, a0, a1		; RV32I-NEXT: neg a0, a0
; RV32I-NEXT: srli a1, a0, 1		; RV32I-NEXT: and a0, s0, a0
; RV32I-NEXT: lui a2, 349525		; RV32I-NEXT: lui a1, 30667
; RV32I-NEXT: addi a2, a2, 1365		; RV32I-NEXT: addi a1, a1, 1329
; RV32I-NEXT: and a1, a1, a2
; RV32I-NEXT: sub a0, a0, a1
; RV32I-NEXT: lui a1, 209715
; RV32I-NEXT: addi a1, a1, 819
; RV32I-NEXT: and a2, a0, a1
; RV32I-NEXT: srli a0, a0, 2
; RV32I-NEXT: and a0, a0, a1
; RV32I-NEXT: add a0, a2, a0
; RV32I-NEXT: srli a1, a0, 4
; RV32I-NEXT: add a0, a0, a1
; RV32I-NEXT: lui a1, 61681
; RV32I-NEXT: addi a1, a1, -241
; RV32I-NEXT: and a0, a0, a1
; RV32I-NEXT: lui a1, 4112
; RV32I-NEXT: addi a1, a1, 257
; RV32I-NEXT: call __mulsi3@plt		; RV32I-NEXT: call __mulsi3@plt
; RV32I-NEXT: srli a0, a0, 24		; RV32I-NEXT: mv a1, a0
		; RV32I-NEXT: li a0, 32
		; RV32I-NEXT: beqz s0, .LBB6_2
		; RV32I-NEXT: # %bb.1:
		; RV32I-NEXT: srli a0, a1, 27
		; RV32I-NEXT: lui a1, %hi(.LCPI6_0)
		; RV32I-NEXT: addi a1, a1, %lo(.LCPI6_0)
		; RV32I-NEXT: add a0, a1, a0
		; RV32I-NEXT: lbu a0, 0(a0)
		; RV32I-NEXT: .LBB6_2:
; RV32I-NEXT: lw ra, 12(sp) # 4-byte Folded Reload		; RV32I-NEXT: lw ra, 12(sp) # 4-byte Folded Reload
		; RV32I-NEXT: lw s0, 8(sp) # 4-byte Folded Reload
; RV32I-NEXT: addi sp, sp, 16		; RV32I-NEXT: addi sp, sp, 16
; RV32I-NEXT: ret		; RV32I-NEXT: ret
;		;
; RV64I-LABEL: test_cttz_i32_zero_undef:		; RV64I-LABEL: test_cttz_i32_zero_undef:
; RV64I: # %bb.0:		; RV64I: # %bb.0:
; RV64I-NEXT: addi sp, sp, -16		; RV64I-NEXT: addi sp, sp, -16
; RV64I-NEXT: sd ra, 8(sp) # 8-byte Folded Spill		; RV64I-NEXT: sd ra, 8(sp) # 8-byte Folded Spill
; RV64I-NEXT: addiw a1, a0, -1		; RV64I-NEXT: sd s0, 0(sp) # 8-byte Folded Spill
; RV64I-NEXT: not a0, a0		; RV64I-NEXT: sext.w s0, a0
		; RV64I-NEXT: neg a1, a0
; RV64I-NEXT: and a0, a0, a1		; RV64I-NEXT: and a0, a0, a1
; RV64I-NEXT: srli a1, a0, 1		; RV64I-NEXT: lui a1, 30667
; RV64I-NEXT: lui a2, 349525		; RV64I-NEXT: addiw a1, a1, 1329
; RV64I-NEXT: addiw a2, a2, 1365
; RV64I-NEXT: and a1, a1, a2
; RV64I-NEXT: sub a0, a0, a1
; RV64I-NEXT: lui a1, 209715
; RV64I-NEXT: addiw a1, a1, 819
; RV64I-NEXT: and a2, a0, a1
; RV64I-NEXT: srli a0, a0, 2
; RV64I-NEXT: and a0, a0, a1
; RV64I-NEXT: add a0, a2, a0
; RV64I-NEXT: srli a1, a0, 4
; RV64I-NEXT: add a0, a0, a1
; RV64I-NEXT: lui a1, 61681
; RV64I-NEXT: addiw a1, a1, -241
; RV64I-NEXT: and a0, a0, a1
; RV64I-NEXT: lui a1, 4112
; RV64I-NEXT: addiw a1, a1, 257
; RV64I-NEXT: call __muldi3@plt		; RV64I-NEXT: call __muldi3@plt
; RV64I-NEXT: srliw a0, a0, 24		; RV64I-NEXT: mv a1, a0
		; RV64I-NEXT: li a0, 32
		; RV64I-NEXT: beqz s0, .LBB6_2
		; RV64I-NEXT: # %bb.1:
		; RV64I-NEXT: srliw a0, a1, 27
		; RV64I-NEXT: lui a1, %hi(.LCPI6_0)
		; RV64I-NEXT: addi a1, a1, %lo(.LCPI6_0)
		; RV64I-NEXT: add a0, a1, a0
		; RV64I-NEXT: lbu a0, 0(a0)
		; RV64I-NEXT: .LBB6_2:
; RV64I-NEXT: ld ra, 8(sp) # 8-byte Folded Reload		; RV64I-NEXT: ld ra, 8(sp) # 8-byte Folded Reload
		; RV64I-NEXT: ld s0, 0(sp) # 8-byte Folded Reload
; RV64I-NEXT: addi sp, sp, 16		; RV64I-NEXT: addi sp, sp, 16
; RV64I-NEXT: ret		; RV64I-NEXT: ret
;		;
; RV32M-LABEL: test_cttz_i32_zero_undef:		; RV32M-LABEL: test_cttz_i32_zero_undef:
; RV32M: # %bb.0:		; RV32M: # %bb.0:
; RV32M-NEXT: addi a1, a0, -1		; RV32M-NEXT: li a1, 32
; RV32M-NEXT: not a0, a0		; RV32M-NEXT: beqz a0, .LBB6_2
; RV32M-NEXT: and a0, a0, a1		; RV32M-NEXT: # %bb.1:
; RV32M-NEXT: srli a1, a0, 1		; RV32M-NEXT: neg a1, a0
; RV32M-NEXT: lui a2, 349525
; RV32M-NEXT: addi a2, a2, 1365
; RV32M-NEXT: and a1, a1, a2
; RV32M-NEXT: sub a0, a0, a1
; RV32M-NEXT: lui a1, 209715
; RV32M-NEXT: addi a1, a1, 819
; RV32M-NEXT: and a2, a0, a1
; RV32M-NEXT: srli a0, a0, 2
; RV32M-NEXT: and a0, a0, a1
; RV32M-NEXT: add a0, a2, a0
; RV32M-NEXT: srli a1, a0, 4
; RV32M-NEXT: add a0, a0, a1
; RV32M-NEXT: lui a1, 61681
; RV32M-NEXT: addi a1, a1, -241
; RV32M-NEXT: and a0, a0, a1		; RV32M-NEXT: and a0, a0, a1
; RV32M-NEXT: lui a1, 4112		; RV32M-NEXT: lui a1, 30667
; RV32M-NEXT: addi a1, a1, 257		; RV32M-NEXT: addi a1, a1, 1329
; RV32M-NEXT: mul a0, a0, a1		; RV32M-NEXT: mul a0, a0, a1
; RV32M-NEXT: srli a0, a0, 24		; RV32M-NEXT: srli a0, a0, 27
		; RV32M-NEXT: lui a1, %hi(.LCPI6_0)
		; RV32M-NEXT: addi a1, a1, %lo(.LCPI6_0)
		; RV32M-NEXT: add a0, a1, a0
		; RV32M-NEXT: lbu a1, 0(a0)
		; RV32M-NEXT: .LBB6_2:
		; RV32M-NEXT: mv a0, a1
; RV32M-NEXT: ret		; RV32M-NEXT: ret
;		;
; RV64M-LABEL: test_cttz_i32_zero_undef:		; RV64M-LABEL: test_cttz_i32_zero_undef:
; RV64M: # %bb.0:		; RV64M: # %bb.0:
; RV64M-NEXT: addiw a1, a0, -1		; RV64M-NEXT: sext.w a2, a0
; RV64M-NEXT: not a0, a0		; RV64M-NEXT: li a1, 32
; RV64M-NEXT: and a0, a0, a1		; RV64M-NEXT: beqz a2, .LBB6_2
; RV64M-NEXT: srli a1, a0, 1		; RV64M-NEXT: # %bb.1:
; RV64M-NEXT: lui a2, 349525		; RV64M-NEXT: neg a1, a0
; RV64M-NEXT: addiw a2, a2, 1365
; RV64M-NEXT: and a1, a1, a2
; RV64M-NEXT: sub a0, a0, a1
; RV64M-NEXT: lui a1, 209715
; RV64M-NEXT: addiw a1, a1, 819
; RV64M-NEXT: and a2, a0, a1
; RV64M-NEXT: srli a0, a0, 2
; RV64M-NEXT: and a0, a0, a1
; RV64M-NEXT: add a0, a2, a0
; RV64M-NEXT: srli a1, a0, 4
; RV64M-NEXT: add a0, a0, a1
; RV64M-NEXT: lui a1, 61681
; RV64M-NEXT: addiw a1, a1, -241
; RV64M-NEXT: and a0, a0, a1		; RV64M-NEXT: and a0, a0, a1
; RV64M-NEXT: lui a1, 4112		; RV64M-NEXT: lui a1, 30667
; RV64M-NEXT: addiw a1, a1, 257		; RV64M-NEXT: addiw a1, a1, 1329
; RV64M-NEXT: mulw a0, a0, a1		; RV64M-NEXT: mulw a0, a0, a1
; RV64M-NEXT: srliw a0, a0, 24		; RV64M-NEXT: srliw a0, a0, 27
		; RV64M-NEXT: lui a1, %hi(.LCPI6_0)
		; RV64M-NEXT: addi a1, a1, %lo(.LCPI6_0)
		; RV64M-NEXT: add a0, a1, a0
		; RV64M-NEXT: lbu a1, 0(a0)
		; RV64M-NEXT: .LBB6_2:
		; RV64M-NEXT: mv a0, a1
; RV64M-NEXT: ret		; RV64M-NEXT: ret
;		;
; RV32ZBB-LABEL: test_cttz_i32_zero_undef:		; RV32ZBB-LABEL: test_cttz_i32_zero_undef:
; RV32ZBB: # %bb.0:		; RV32ZBB: # %bb.0:
; RV32ZBB-NEXT: ctz a0, a0		; RV32ZBB-NEXT: ctz a0, a0
; RV32ZBB-NEXT: ret		; RV32ZBB-NEXT: ret
;		;
; RV64ZBB-LABEL: test_cttz_i32_zero_undef:		; RV64ZBB-LABEL: test_cttz_i32_zero_undef:
[10 lines not shown]
; RV32I-NEXT: addi sp, sp, -32		; RV32I-NEXT: addi sp, sp, -32
; RV32I-NEXT: sw ra, 28(sp) # 4-byte Folded Spill		; RV32I-NEXT: sw ra, 28(sp) # 4-byte Folded Spill
; RV32I-NEXT: sw s0, 24(sp) # 4-byte Folded Spill		; RV32I-NEXT: sw s0, 24(sp) # 4-byte Folded Spill
; RV32I-NEXT: sw s1, 20(sp) # 4-byte Folded Spill		; RV32I-NEXT: sw s1, 20(sp) # 4-byte Folded Spill
; RV32I-NEXT: sw s2, 16(sp) # 4-byte Folded Spill		; RV32I-NEXT: sw s2, 16(sp) # 4-byte Folded Spill
; RV32I-NEXT: sw s3, 12(sp) # 4-byte Folded Spill		; RV32I-NEXT: sw s3, 12(sp) # 4-byte Folded Spill
; RV32I-NEXT: sw s4, 8(sp) # 4-byte Folded Spill		; RV32I-NEXT: sw s4, 8(sp) # 4-byte Folded Spill
; RV32I-NEXT: sw s5, 4(sp) # 4-byte Folded Spill		; RV32I-NEXT: sw s5, 4(sp) # 4-byte Folded Spill
; RV32I-NEXT: sw s6, 0(sp) # 4-byte Folded Spill
; RV32I-NEXT: mv s1, a1		; RV32I-NEXT: mv s1, a1
; RV32I-NEXT: mv s2, a0
; RV32I-NEXT: addi a0, a0, -1
; RV32I-NEXT: not a1, s2
; RV32I-NEXT: and a0, a1, a0
; RV32I-NEXT: srli a1, a0, 1
; RV32I-NEXT: lui a2, 349525
; RV32I-NEXT: addi s4, a2, 1365
; RV32I-NEXT: and a1, a1, s4
; RV32I-NEXT: sub a0, a0, a1
; RV32I-NEXT: lui a1, 209715
; RV32I-NEXT: addi s5, a1, 819
; RV32I-NEXT: and a1, a0, s5
; RV32I-NEXT: srli a0, a0, 2
; RV32I-NEXT: and a0, a0, s5
; RV32I-NEXT: add a0, a1, a0
; RV32I-NEXT: srli a1, a0, 4
; RV32I-NEXT: add a0, a0, a1
; RV32I-NEXT: lui a1, 61681
; RV32I-NEXT: addi s6, a1, -241
; RV32I-NEXT: and a0, a0, s6
; RV32I-NEXT: lui a1, 4112
; RV32I-NEXT: addi s3, a1, 257
; RV32I-NEXT: mv a1, s3
; RV32I-NEXT: call __mulsi3@plt
; RV32I-NEXT: mv s0, a0		; RV32I-NEXT: mv s0, a0
; RV32I-NEXT: addi a0, s1, -1		; RV32I-NEXT: neg a0, a0
; RV32I-NEXT: not a1, s1		; RV32I-NEXT: and a0, s0, a0
; RV32I-NEXT: and a0, a1, a0		; RV32I-NEXT: lui a1, 30667
; RV32I-NEXT: srli a1, a0, 1		; RV32I-NEXT: addi s3, a1, 1329
; RV32I-NEXT: and a1, a1, s4
; RV32I-NEXT: sub a0, a0, a1
; RV32I-NEXT: and a1, a0, s5
; RV32I-NEXT: srli a0, a0, 2
; RV32I-NEXT: and a0, a0, s5
; RV32I-NEXT: add a0, a1, a0
; RV32I-NEXT: srli a1, a0, 4
; RV32I-NEXT: add a0, a0, a1
; RV32I-NEXT: and a0, a0, s6
; RV32I-NEXT: mv a1, s3		; RV32I-NEXT: mv a1, s3
; RV32I-NEXT: call __mulsi3@plt		; RV32I-NEXT: call __mulsi3@plt
; RV32I-NEXT: bnez s2, .LBB7_2		; RV32I-NEXT: lui a1, %hi(.LCPI7_0)
		; RV32I-NEXT: addi s5, a1, %lo(.LCPI7_0)
		; RV32I-NEXT: li s4, 32
		; RV32I-NEXT: li s2, 32
		; RV32I-NEXT: beqz s0, .LBB7_2
; RV32I-NEXT: # %bb.1:		; RV32I-NEXT: # %bb.1:
; RV32I-NEXT: srli a0, a0, 24		; RV32I-NEXT: srli a0, a0, 27
; RV32I-NEXT: addi a0, a0, 32		; RV32I-NEXT: add a0, s5, a0
; RV32I-NEXT: j .LBB7_3		; RV32I-NEXT: lbu s2, 0(a0)
; RV32I-NEXT: .LBB7_2:		; RV32I-NEXT: .LBB7_2:
; RV32I-NEXT: srli a0, s0, 24		; RV32I-NEXT: neg a0, s1
; RV32I-NEXT: .LBB7_3:		; RV32I-NEXT: and a0, s1, a0
		; RV32I-NEXT: mv a1, s3
		; RV32I-NEXT: call __mulsi3@plt
		; RV32I-NEXT: beqz s1, .LBB7_4
		; RV32I-NEXT: # %bb.3:
		; RV32I-NEXT: srli a0, a0, 27
		; RV32I-NEXT: add a0, s5, a0
		; RV32I-NEXT: lbu s4, 0(a0)
		; RV32I-NEXT: .LBB7_4:
		; RV32I-NEXT: bnez s0, .LBB7_6
		; RV32I-NEXT: # %bb.5:
		; RV32I-NEXT: addi s2, s4, 32
		; RV32I-NEXT: .LBB7_6:
		; RV32I-NEXT: mv a0, s2
; RV32I-NEXT: li a1, 0		; RV32I-NEXT: li a1, 0
; RV32I-NEXT: lw ra, 28(sp) # 4-byte Folded Reload		; RV32I-NEXT: lw ra, 28(sp) # 4-byte Folded Reload
; RV32I-NEXT: lw s0, 24(sp) # 4-byte Folded Reload		; RV32I-NEXT: lw s0, 24(sp) # 4-byte Folded Reload
; RV32I-NEXT: lw s1, 20(sp) # 4-byte Folded Reload		; RV32I-NEXT: lw s1, 20(sp) # 4-byte Folded Reload
; RV32I-NEXT: lw s2, 16(sp) # 4-byte Folded Reload		; RV32I-NEXT: lw s2, 16(sp) # 4-byte Folded Reload
; RV32I-NEXT: lw s3, 12(sp) # 4-byte Folded Reload		; RV32I-NEXT: lw s3, 12(sp) # 4-byte Folded Reload
; RV32I-NEXT: lw s4, 8(sp) # 4-byte Folded Reload		; RV32I-NEXT: lw s4, 8(sp) # 4-byte Folded Reload
; RV32I-NEXT: lw s5, 4(sp) # 4-byte Folded Reload		; RV32I-NEXT: lw s5, 4(sp) # 4-byte Folded Reload
; RV32I-NEXT: lw s6, 0(sp) # 4-byte Folded Reload
; RV32I-NEXT: addi sp, sp, 32		; RV32I-NEXT: addi sp, sp, 32
; RV32I-NEXT: ret		; RV32I-NEXT: ret
;		;
; RV64I-LABEL: test_cttz_i64_zero_undef:		; RV64I-LABEL: test_cttz_i64_zero_undef:
; RV64I: # %bb.0:		; RV64I: # %bb.0:
; RV64I-NEXT: addi sp, sp, -16		; RV64I-NEXT: addi sp, sp, -16
; RV64I-NEXT: sd ra, 8(sp) # 8-byte Folded Spill		; RV64I-NEXT: sd ra, 8(sp) # 8-byte Folded Spill
; RV64I-NEXT: addi a1, a0, -1		; RV64I-NEXT: sd s0, 0(sp) # 8-byte Folded Spill
; RV64I-NEXT: not a0, a0		; RV64I-NEXT: mv s0, a0
; RV64I-NEXT: and a0, a0, a1		; RV64I-NEXT: neg a0, a0
		; RV64I-NEXT: and a0, s0, a0
; RV64I-NEXT: lui a1, %hi(.LCPI7_0)		; RV64I-NEXT: lui a1, %hi(.LCPI7_0)
; RV64I-NEXT: ld a1, %lo(.LCPI7_0)(a1)		; RV64I-NEXT: ld a1, %lo(.LCPI7_0)(a1)
; RV64I-NEXT: lui a2, %hi(.LCPI7_1)
; RV64I-NEXT: ld a2, %lo(.LCPI7_1)(a2)
; RV64I-NEXT: srli a3, a0, 1
; RV64I-NEXT: and a1, a3, a1
; RV64I-NEXT: sub a0, a0, a1
; RV64I-NEXT: and a1, a0, a2
; RV64I-NEXT: srli a0, a0, 2
; RV64I-NEXT: and a0, a0, a2
; RV64I-NEXT: lui a2, %hi(.LCPI7_2)
; RV64I-NEXT: ld a2, %lo(.LCPI7_2)(a2)
; RV64I-NEXT: add a0, a1, a0
; RV64I-NEXT: srli a1, a0, 4
; RV64I-NEXT: add a0, a0, a1
; RV64I-NEXT: and a0, a0, a2
; RV64I-NEXT: lui a1, %hi(.LCPI7_3)
; RV64I-NEXT: ld a1, %lo(.LCPI7_3)(a1)
; RV64I-NEXT: call __muldi3@plt		; RV64I-NEXT: call __muldi3@plt
; RV64I-NEXT: srli a0, a0, 56		; RV64I-NEXT: mv a1, a0
		; RV64I-NEXT: li a0, 64
		; RV64I-NEXT: beqz s0, .LBB7_2
		; RV64I-NEXT: # %bb.1:
		; RV64I-NEXT: srli a0, a1, 58
		; RV64I-NEXT: lui a1, %hi(.LCPI7_1)
		; RV64I-NEXT: addi a1, a1, %lo(.LCPI7_1)
		; RV64I-NEXT: add a0, a1, a0
		; RV64I-NEXT: lbu a0, 0(a0)
		; RV64I-NEXT: .LBB7_2:
; RV64I-NEXT: ld ra, 8(sp) # 8-byte Folded Reload		; RV64I-NEXT: ld ra, 8(sp) # 8-byte Folded Reload
		; RV64I-NEXT: ld s0, 0(sp) # 8-byte Folded Reload
; RV64I-NEXT: addi sp, sp, 16		; RV64I-NEXT: addi sp, sp, 16
; RV64I-NEXT: ret		; RV64I-NEXT: ret
;		;
; RV32M-LABEL: test_cttz_i64_zero_undef:		; RV32M-LABEL: test_cttz_i64_zero_undef:
; RV32M: # %bb.0:		; RV32M: # %bb.0:
; RV32M-NEXT: lui a2, 349525		; RV32M-NEXT: lui a2, 30667
; RV32M-NEXT: addi a5, a2, 1365		; RV32M-NEXT: addi a4, a2, 1329
; RV32M-NEXT: lui a2, 209715		; RV32M-NEXT: lui a2, %hi(.LCPI7_0)
; RV32M-NEXT: addi a4, a2, 819		; RV32M-NEXT: addi a5, a2, %lo(.LCPI7_0)
; RV32M-NEXT: lui a2, 61681		; RV32M-NEXT: li a3, 32
; RV32M-NEXT: addi a2, a2, -241		; RV32M-NEXT: li a2, 32
; RV32M-NEXT: lui a3, 4112		; RV32M-NEXT: bnez a0, .LBB7_5
; RV32M-NEXT: addi a3, a3, 257
; RV32M-NEXT: bnez a0, .LBB7_2
; RV32M-NEXT: # %bb.1:		; RV32M-NEXT: # %bb.1:
; RV32M-NEXT: addi a0, a1, -1		; RV32M-NEXT: bnez a1, .LBB7_6
; RV32M-NEXT: not a1, a1
; RV32M-NEXT: and a0, a1, a0
; RV32M-NEXT: srli a1, a0, 1
; RV32M-NEXT: and a1, a1, a5
; RV32M-NEXT: sub a0, a0, a1
; RV32M-NEXT: and a1, a0, a4
; RV32M-NEXT: srli a0, a0, 2
; RV32M-NEXT: and a0, a0, a4
; RV32M-NEXT: add a0, a1, a0
; RV32M-NEXT: srli a1, a0, 4
; RV32M-NEXT: add a0, a0, a1
; RV32M-NEXT: and a0, a0, a2
; RV32M-NEXT: mul a0, a0, a3
; RV32M-NEXT: srli a0, a0, 24
; RV32M-NEXT: addi a0, a0, 32
; RV32M-NEXT: li a1, 0
; RV32M-NEXT: ret
; RV32M-NEXT: .LBB7_2:		; RV32M-NEXT: .LBB7_2:
; RV32M-NEXT: addi a1, a0, -1		; RV32M-NEXT: bnez a0, .LBB7_4
; RV32M-NEXT: not a0, a0		; RV32M-NEXT: .LBB7_3:
; RV32M-NEXT: and a0, a0, a1		; RV32M-NEXT: addi a2, a3, 32
; RV32M-NEXT: srli a1, a0, 1		; RV32M-NEXT: .LBB7_4:
; RV32M-NEXT: and a1, a1, a5		; RV32M-NEXT: mv a0, a2
; RV32M-NEXT: sub a0, a0, a1
; RV32M-NEXT: and a1, a0, a4
; RV32M-NEXT: srli a0, a0, 2
; RV32M-NEXT: and a0, a0, a4
; RV32M-NEXT: add a0, a1, a0
; RV32M-NEXT: srli a1, a0, 4
; RV32M-NEXT: add a0, a0, a1
; RV32M-NEXT: and a0, a0, a2
; RV32M-NEXT: mul a0, a0, a3
; RV32M-NEXT: srli a0, a0, 24
; RV32M-NEXT: li a1, 0		; RV32M-NEXT: li a1, 0
; RV32M-NEXT: ret		; RV32M-NEXT: ret
		; RV32M-NEXT: .LBB7_5:
		; RV32M-NEXT: neg a2, a0
		; RV32M-NEXT: and a2, a0, a2
		; RV32M-NEXT: mul a2, a2, a4
		; RV32M-NEXT: srli a2, a2, 27
		; RV32M-NEXT: add a2, a5, a2
		; RV32M-NEXT: lbu a2, 0(a2)
		; RV32M-NEXT: beqz a1, .LBB7_2
		; RV32M-NEXT: .LBB7_6:
		; RV32M-NEXT: neg a3, a1
		; RV32M-NEXT: and a1, a1, a3
		; RV32M-NEXT: mul a1, a1, a4
		; RV32M-NEXT: srli a1, a1, 27
		; RV32M-NEXT: add a1, a5, a1
		; RV32M-NEXT: lbu a3, 0(a1)
		; RV32M-NEXT: beqz a0, .LBB7_3
		; RV32M-NEXT: j .LBB7_4
;		;
; RV64M-LABEL: test_cttz_i64_zero_undef:		; RV64M-LABEL: test_cttz_i64_zero_undef:
; RV64M: # %bb.0:		; RV64M: # %bb.0:
; RV64M-NEXT: addi a1, a0, -1		; RV64M-NEXT: li a1, 64
; RV64M-NEXT: not a0, a0		; RV64M-NEXT: beqz a0, .LBB7_2
; RV64M-NEXT: and a0, a0, a1		; RV64M-NEXT: # %bb.1:
; RV64M-NEXT: lui a1, %hi(.LCPI7_0)		; RV64M-NEXT: lui a1, %hi(.LCPI7_0)
; RV64M-NEXT: ld a1, %lo(.LCPI7_0)(a1)		; RV64M-NEXT: ld a1, %lo(.LCPI7_0)(a1)
; RV64M-NEXT: lui a2, %hi(.LCPI7_1)		; RV64M-NEXT: neg a2, a0
; RV64M-NEXT: ld a2, %lo(.LCPI7_1)(a2)
; RV64M-NEXT: srli a3, a0, 1
; RV64M-NEXT: and a1, a3, a1
; RV64M-NEXT: sub a0, a0, a1
; RV64M-NEXT: and a1, a0, a2
; RV64M-NEXT: srli a0, a0, 2
; RV64M-NEXT: and a0, a0, a2		; RV64M-NEXT: and a0, a0, a2
		; RV64M-NEXT: mul a0, a0, a1
		; RV64M-NEXT: srli a0, a0, 58
		; RV64M-NEXT: lui a1, %hi(.LCPI7_1)
		; RV64M-NEXT: addi a1, a1, %lo(.LCPI7_1)
; RV64M-NEXT: add a0, a1, a0		; RV64M-NEXT: add a0, a1, a0
; RV64M-NEXT: lui a1, %hi(.LCPI7_2)		; RV64M-NEXT: lbu a1, 0(a0)
; RV64M-NEXT: ld a1, %lo(.LCPI7_2)(a1)		; RV64M-NEXT: .LBB7_2:
; RV64M-NEXT: lui a2, %hi(.LCPI7_3)		; RV64M-NEXT: mv a0, a1
; RV64M-NEXT: ld a2, %lo(.LCPI7_3)(a2)
; RV64M-NEXT: srli a3, a0, 4
; RV64M-NEXT: add a0, a0, a3
; RV64M-NEXT: and a0, a0, a1
; RV64M-NEXT: mul a0, a0, a2
; RV64M-NEXT: srli a0, a0, 56
; RV64M-NEXT: ret		; RV64M-NEXT: ret
;		;
; RV32ZBB-LABEL: test_cttz_i64_zero_undef:		; RV32ZBB-LABEL: test_cttz_i64_zero_undef:
; RV32ZBB: # %bb.0:		; RV32ZBB: # %bb.0:
; RV32ZBB-NEXT: bnez a0, .LBB7_2		; RV32ZBB-NEXT: bnez a0, .LBB7_2
; RV32ZBB-NEXT: # %bb.1:		; RV32ZBB-NEXT: # %bb.1:
; RV32ZBB-NEXT: ctz a0, a1		; RV32ZBB-NEXT: ctz a0, a1
; RV32ZBB-NEXT: addi a0, a0, 32		; RV32ZBB-NEXT: addi a0, a0, 32
[1,782 lines not shown]
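
The updated RV32 CHECK lines above all follow one pattern: isolate the lowest set bit (`neg` then `and`), multiply by the constant built from `lui 30667` / `addi 1329` (0x077CB531), shift right by 27, and load one byte from the constant-pool table, with 32 returned for a zero input. A minimal C++ sketch of that sequence follows. The table contents of `.LCPI*_0` are not shown in this diff, so the sketch derives a table from the multiplier instead of copying one. The names `kDeBruijn32`, `buildCttzTable32`, and `cttz32ViaTable` are hypothetical and are not taken from the patch.

```cpp
#include <array>
#include <cstdint>

// Multiplier seen in the CHECK lines: (30667 << 12) + 1329 == 0x077CB531.
constexpr uint32_t kDeBruijn32 = 0x077CB531u;

// Assumption: the constant-pool table maps each 5-bit hash back to the bit
// position that produced it. The table is built here rather than copied,
// because its contents do not appear in this diff.
std::array<uint8_t, 32> buildCttzTable32() {
  std::array<uint8_t, 32> table{};
  for (unsigned bit = 0; bit < 32; ++bit) {
    uint32_t index = ((uint32_t{1} << bit) * kDeBruijn32) >> 27;
    table[index] = static_cast<uint8_t>(bit);
  }
  return table;
}

// Count trailing zeros of a 32-bit value; returns 32 for zero input,
// matching the `li a0, 32` / `beqz` guard in the expected assembly.
unsigned cttz32ViaTable(uint32_t x) {
  static const std::array<uint8_t, 32> table = buildCttzTable32();
  if (x == 0)
    return 32;
  uint32_t lowest = x & (0u - x);              // neg + and
  return table[(lowest * kDeBruijn32) >> 27];  // mul, srli 27, lbu
}
```

For example, `cttz32ViaTable(8)` yields 3 and `cttz32ViaTable(0)` yields 32.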

llvm/test/CodeGen/RISCV/rv32zbb.ll

[165 lines not shown]
; RV32ZBB-NEXT: ret
ret i64 %1		ret i64 %1
}		}

declare i32 @llvm.cttz.i32(i32, i1)		declare i32 @llvm.cttz.i32(i32, i1)

define i32 @cttz_i32(i32 %a) nounwind {		define i32 @cttz_i32(i32 %a) nounwind {
; RV32I-LABEL: cttz_i32:		; RV32I-LABEL: cttz_i32:
; RV32I: # %bb.0:		; RV32I: # %bb.0:
; RV32I-NEXT: beqz a0, .LBB2_2		; RV32I-NEXT: beqz a0, .LBB2_4
; RV32I-NEXT: # %bb.1: # %cond.false		; RV32I-NEXT: # %bb.1: # %cond.false
; RV32I-NEXT: addi sp, sp, -16		; RV32I-NEXT: addi sp, sp, -16
; RV32I-NEXT: sw ra, 12(sp) # 4-byte Folded Spill		; RV32I-NEXT: sw ra, 12(sp) # 4-byte Folded Spill
; RV32I-NEXT: addi a1, a0, -1		; RV32I-NEXT: sw s0, 8(sp) # 4-byte Folded Spill
; RV32I-NEXT: not a0, a0		; RV32I-NEXT: mv s0, a0
; RV32I-NEXT: and a0, a0, a1		; RV32I-NEXT: neg a0, a0
; RV32I-NEXT: srli a1, a0, 1		; RV32I-NEXT: and a0, s0, a0
; RV32I-NEXT: lui a2, 349525		; RV32I-NEXT: lui a1, 30667
; RV32I-NEXT: addi a2, a2, 1365		; RV32I-NEXT: addi a1, a1, 1329
; RV32I-NEXT: and a1, a1, a2
; RV32I-NEXT: sub a0, a0, a1
; RV32I-NEXT: lui a1, 209715
; RV32I-NEXT: addi a1, a1, 819
; RV32I-NEXT: and a2, a0, a1
; RV32I-NEXT: srli a0, a0, 2
; RV32I-NEXT: and a0, a0, a1
; RV32I-NEXT: add a0, a2, a0
; RV32I-NEXT: srli a1, a0, 4
; RV32I-NEXT: add a0, a0, a1
; RV32I-NEXT: lui a1, 61681
; RV32I-NEXT: addi a1, a1, -241
; RV32I-NEXT: and a0, a0, a1
; RV32I-NEXT: lui a1, 4112
; RV32I-NEXT: addi a1, a1, 257
; RV32I-NEXT: call __mulsi3@plt		; RV32I-NEXT: call __mulsi3@plt
; RV32I-NEXT: srli a0, a0, 24		; RV32I-NEXT: mv a1, a0
		; RV32I-NEXT: li a0, 32
		; RV32I-NEXT: beqz s0, .LBB2_3
		; RV32I-NEXT: # %bb.2: # %cond.false
		; RV32I-NEXT: srli a0, a1, 27
		; RV32I-NEXT: lui a1, %hi(.LCPI2_0)
		; RV32I-NEXT: addi a1, a1, %lo(.LCPI2_0)
		; RV32I-NEXT: add a0, a1, a0
		; RV32I-NEXT: lbu a0, 0(a0)
		; RV32I-NEXT: .LBB2_3: # %cond.false
; RV32I-NEXT: lw ra, 12(sp) # 4-byte Folded Reload		; RV32I-NEXT: lw ra, 12(sp) # 4-byte Folded Reload
		; RV32I-NEXT: lw s0, 8(sp) # 4-byte Folded Reload
; RV32I-NEXT: addi sp, sp, 16		; RV32I-NEXT: addi sp, sp, 16
; RV32I-NEXT: ret		; RV32I-NEXT: ret
; RV32I-NEXT: .LBB2_2:		; RV32I-NEXT: .LBB2_4:
; RV32I-NEXT: li a0, 32		; RV32I-NEXT: li a0, 32
; RV32I-NEXT: ret		; RV32I-NEXT: ret
;		;
; RV32ZBB-LABEL: cttz_i32:		; RV32ZBB-LABEL: cttz_i32:
; RV32ZBB: # %bb.0:		; RV32ZBB: # %bb.0:
; RV32ZBB-NEXT: ctz a0, a0		; RV32ZBB-NEXT: ctz a0, a0
; RV32ZBB-NEXT: ret		; RV32ZBB-NEXT: ret
%1 = call i32 @llvm.cttz.i32(i32 %a, i1 false)		%1 = call i32 @llvm.cttz.i32(i32 %a, i1 false)
ret i32 %1		ret i32 %1
}		}

declare i64 @llvm.cttz.i64(i64, i1)		declare i64 @llvm.cttz.i64(i64, i1)

define i64 @cttz_i64(i64 %a) nounwind {		define i64 @cttz_i64(i64 %a) nounwind {
; RV32I-LABEL: cttz_i64:		; RV32I-LABEL: cttz_i64:
; RV32I: # %bb.0:		; RV32I: # %bb.0:
; RV32I-NEXT: addi sp, sp, -32		; RV32I-NEXT: addi sp, sp, -32
; RV32I-NEXT: sw ra, 28(sp) # 4-byte Folded Spill		; RV32I-NEXT: sw ra, 28(sp) # 4-byte Folded Spill
; RV32I-NEXT: sw s0, 24(sp) # 4-byte Folded Spill		; RV32I-NEXT: sw s0, 24(sp) # 4-byte Folded Spill
; RV32I-NEXT: sw s1, 20(sp) # 4-byte Folded Spill		; RV32I-NEXT: sw s1, 20(sp) # 4-byte Folded Spill
; RV32I-NEXT: sw s2, 16(sp) # 4-byte Folded Spill		; RV32I-NEXT: sw s2, 16(sp) # 4-byte Folded Spill
; RV32I-NEXT: sw s3, 12(sp) # 4-byte Folded Spill		; RV32I-NEXT: sw s3, 12(sp) # 4-byte Folded Spill
; RV32I-NEXT: sw s4, 8(sp) # 4-byte Folded Spill		; RV32I-NEXT: sw s4, 8(sp) # 4-byte Folded Spill
; RV32I-NEXT: sw s5, 4(sp) # 4-byte Folded Spill		; RV32I-NEXT: sw s5, 4(sp) # 4-byte Folded Spill
; RV32I-NEXT: sw s6, 0(sp) # 4-byte Folded Spill
; RV32I-NEXT: mv s1, a1		; RV32I-NEXT: mv s1, a1
; RV32I-NEXT: mv s2, a0
; RV32I-NEXT: addi a0, a0, -1
; RV32I-NEXT: not a1, s2
; RV32I-NEXT: and a0, a1, a0
; RV32I-NEXT: srli a1, a0, 1
; RV32I-NEXT: lui a2, 349525
; RV32I-NEXT: addi s4, a2, 1365
; RV32I-NEXT: and a1, a1, s4
; RV32I-NEXT: sub a0, a0, a1
; RV32I-NEXT: lui a1, 209715
; RV32I-NEXT: addi s5, a1, 819
; RV32I-NEXT: and a1, a0, s5
; RV32I-NEXT: srli a0, a0, 2
; RV32I-NEXT: and a0, a0, s5
; RV32I-NEXT: add a0, a1, a0
; RV32I-NEXT: srli a1, a0, 4
; RV32I-NEXT: add a0, a0, a1
; RV32I-NEXT: lui a1, 61681
; RV32I-NEXT: addi s6, a1, -241
; RV32I-NEXT: and a0, a0, s6
; RV32I-NEXT: lui a1, 4112
; RV32I-NEXT: addi s3, a1, 257
; RV32I-NEXT: mv a1, s3
; RV32I-NEXT: call __mulsi3@plt
; RV32I-NEXT: mv s0, a0		; RV32I-NEXT: mv s0, a0
; RV32I-NEXT: addi a0, s1, -1		; RV32I-NEXT: neg a0, a0
; RV32I-NEXT: not a1, s1		; RV32I-NEXT: and a0, s0, a0
; RV32I-NEXT: and a0, a1, a0		; RV32I-NEXT: lui a1, 30667
; RV32I-NEXT: srli a1, a0, 1		; RV32I-NEXT: addi s3, a1, 1329
; RV32I-NEXT: and a1, a1, s4
; RV32I-NEXT: sub a0, a0, a1
; RV32I-NEXT: and a1, a0, s5
; RV32I-NEXT: srli a0, a0, 2
; RV32I-NEXT: and a0, a0, s5
; RV32I-NEXT: add a0, a1, a0
; RV32I-NEXT: srli a1, a0, 4
; RV32I-NEXT: add a0, a0, a1
; RV32I-NEXT: and a0, a0, s6
; RV32I-NEXT: mv a1, s3		; RV32I-NEXT: mv a1, s3
; RV32I-NEXT: call __mulsi3@plt		; RV32I-NEXT: call __mulsi3@plt
; RV32I-NEXT: bnez s2, .LBB3_2		; RV32I-NEXT: lui a1, %hi(.LCPI3_0)
		; RV32I-NEXT: addi s5, a1, %lo(.LCPI3_0)
		; RV32I-NEXT: li s4, 32
		; RV32I-NEXT: li s2, 32
		; RV32I-NEXT: beqz s0, .LBB3_2
; RV32I-NEXT: # %bb.1:		; RV32I-NEXT: # %bb.1:
; RV32I-NEXT: srli a0, a0, 24		; RV32I-NEXT: srli a0, a0, 27
; RV32I-NEXT: addi a0, a0, 32		; RV32I-NEXT: add a0, s5, a0
; RV32I-NEXT: j .LBB3_3		; RV32I-NEXT: lbu s2, 0(a0)
; RV32I-NEXT: .LBB3_2:		; RV32I-NEXT: .LBB3_2:
; RV32I-NEXT: srli a0, s0, 24		; RV32I-NEXT: neg a0, s1
; RV32I-NEXT: .LBB3_3:		; RV32I-NEXT: and a0, s1, a0
		; RV32I-NEXT: mv a1, s3
		; RV32I-NEXT: call __mulsi3@plt
		; RV32I-NEXT: beqz s1, .LBB3_4
		; RV32I-NEXT: # %bb.3:
		; RV32I-NEXT: srli a0, a0, 27
		; RV32I-NEXT: add a0, s5, a0
		; RV32I-NEXT: lbu s4, 0(a0)
		; RV32I-NEXT: .LBB3_4:
		; RV32I-NEXT: bnez s0, .LBB3_6
		; RV32I-NEXT: # %bb.5:
		; RV32I-NEXT: addi s2, s4, 32
		; RV32I-NEXT: .LBB3_6:
		; RV32I-NEXT: mv a0, s2
; RV32I-NEXT: li a1, 0		; RV32I-NEXT: li a1, 0
; RV32I-NEXT: lw ra, 28(sp) # 4-byte Folded Reload		; RV32I-NEXT: lw ra, 28(sp) # 4-byte Folded Reload
; RV32I-NEXT: lw s0, 24(sp) # 4-byte Folded Reload		; RV32I-NEXT: lw s0, 24(sp) # 4-byte Folded Reload
; RV32I-NEXT: lw s1, 20(sp) # 4-byte Folded Reload		; RV32I-NEXT: lw s1, 20(sp) # 4-byte Folded Reload
; RV32I-NEXT: lw s2, 16(sp) # 4-byte Folded Reload		; RV32I-NEXT: lw s2, 16(sp) # 4-byte Folded Reload
; RV32I-NEXT: lw s3, 12(sp) # 4-byte Folded Reload		; RV32I-NEXT: lw s3, 12(sp) # 4-byte Folded Reload
; RV32I-NEXT: lw s4, 8(sp) # 4-byte Folded Reload		; RV32I-NEXT: lw s4, 8(sp) # 4-byte Folded Reload
; RV32I-NEXT: lw s5, 4(sp) # 4-byte Folded Reload		; RV32I-NEXT: lw s5, 4(sp) # 4-byte Folded Reload
; RV32I-NEXT: lw s6, 0(sp) # 4-byte Folded Reload
; RV32I-NEXT: addi sp, sp, 32		; RV32I-NEXT: addi sp, sp, 32
; RV32I-NEXT: ret		; RV32I-NEXT: ret
;		;
; RV32ZBB-LABEL: cttz_i64:		; RV32ZBB-LABEL: cttz_i64:
; RV32ZBB: # %bb.0:		; RV32ZBB: # %bb.0:
; RV32ZBB-NEXT: bnez a0, .LBB3_2		; RV32ZBB-NEXT: bnez a0, .LBB3_2
; RV32ZBB-NEXT: # %bb.1:		; RV32ZBB-NEXT: # %bb.1:
; RV32ZBB-NEXT: ctz a0, a1		; RV32ZBB-NEXT: ctz a0, a1
[566 lines not shown]
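
In the RV32I `cttz_i64` expectations above, the 64-bit count is assembled from two 32-bit counts: each half defaults to 32 (`li s2, 32`, `li s4, 32`), the low-half result is used when the low word is nonzero, and otherwise 32 is added to the high-half result (`addi s2, s4, 32`). The sketch below shows that composition only. `cttz32Ref` is a hypothetical stand-in for any 32-bit count that returns 32 on zero, and `cttz64FromHalves` is likewise an illustrative name, not something from the patch.

```cpp
#include <cstdint>

// Hypothetical reference 32-bit cttz; returns 32 for zero input.
unsigned cttz32Ref(uint32_t x) {
  if (x == 0)
    return 32;
  unsigned n = 0;
  while ((x & 1u) == 0) {
    x >>= 1;
    ++n;
  }
  return n;
}

// 64-bit cttz built from the two 32-bit halves, as a 32-bit target would do.
unsigned cttz64FromHalves(uint64_t x) {
  uint32_t lo = static_cast<uint32_t>(x);
  uint32_t hi = static_cast<uint32_t>(x >> 32);
  // A nonzero low word decides the result; otherwise count in the high word.
  return lo != 0 ? cttz32Ref(lo) : 32 + cttz32Ref(hi);
}
```

A zero input gives 32 + 32 = 64, which is the defined result of `llvm.cttz.i64` when the second argument is `i1 false`.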

llvm/test/CodeGen/RISCV/rv64zbb.ll

[362 lines not shown]
; RV64ZBB-NEXT: ret
ret i64 %1		ret i64 %1
}		}

declare i32 @llvm.cttz.i32(i32, i1)		declare i32 @llvm.cttz.i32(i32, i1)

define signext i32 @cttz_i32(i32 signext %a) nounwind {		define signext i32 @cttz_i32(i32 signext %a) nounwind {
; RV64I-LABEL: cttz_i32:		; RV64I-LABEL: cttz_i32:
; RV64I: # %bb.0:		; RV64I: # %bb.0:
; RV64I-NEXT: sext.w a1, a0
; RV64I-NEXT: beqz a1, .LBB6_2
; RV64I-NEXT: # %bb.1: # %cond.false
; RV64I-NEXT: addi sp, sp, -16		; RV64I-NEXT: addi sp, sp, -16
; RV64I-NEXT: sd ra, 8(sp) # 8-byte Folded Spill		; RV64I-NEXT: sd ra, 8(sp) # 8-byte Folded Spill
; RV64I-NEXT: addiw a1, a0, -1		; RV64I-NEXT: sd s0, 0(sp) # 8-byte Folded Spill
; RV64I-NEXT: not a0, a0		; RV64I-NEXT: sext.w s0, a0
; RV64I-NEXT: and a0, a0, a1		; RV64I-NEXT: beqz s0, .LBB6_3
; RV64I-NEXT: srli a1, a0, 1		; RV64I-NEXT: # %bb.1: # %cond.false
; RV64I-NEXT: lui a2, 349525		; RV64I-NEXT: neg a1, a0
; RV64I-NEXT: addiw a2, a2, 1365
; RV64I-NEXT: and a1, a1, a2
; RV64I-NEXT: sub a0, a0, a1
; RV64I-NEXT: lui a1, 209715
; RV64I-NEXT: addiw a1, a1, 819
; RV64I-NEXT: and a2, a0, a1
; RV64I-NEXT: srli a0, a0, 2
; RV64I-NEXT: and a0, a0, a1
; RV64I-NEXT: add a0, a2, a0
; RV64I-NEXT: srli a1, a0, 4
; RV64I-NEXT: add a0, a0, a1
; RV64I-NEXT: lui a1, 61681
; RV64I-NEXT: addiw a1, a1, -241
; RV64I-NEXT: and a0, a0, a1		; RV64I-NEXT: and a0, a0, a1
; RV64I-NEXT: lui a1, 4112		; RV64I-NEXT: lui a1, 30667
; RV64I-NEXT: addiw a1, a1, 257		; RV64I-NEXT: addiw a1, a1, 1329
; RV64I-NEXT: call __muldi3@plt		; RV64I-NEXT: call __muldi3@plt
; RV64I-NEXT: srliw a0, a0, 24		; RV64I-NEXT: mv a1, a0
		; RV64I-NEXT: li a0, 32
		; RV64I-NEXT: beqz s0, .LBB6_4
		; RV64I-NEXT: # %bb.2: # %cond.false
		; RV64I-NEXT: srliw a0, a1, 27
		; RV64I-NEXT: lui a1, %hi(.LCPI6_0)
		; RV64I-NEXT: addi a1, a1, %lo(.LCPI6_0)
		; RV64I-NEXT: add a0, a1, a0
		; RV64I-NEXT: lbu a0, 0(a0)
		; RV64I-NEXT: j .LBB6_4
		; RV64I-NEXT: .LBB6_3:
		; RV64I-NEXT: li a0, 32
		; RV64I-NEXT: .LBB6_4: # %cond.end
; RV64I-NEXT: ld ra, 8(sp) # 8-byte Folded Reload		; RV64I-NEXT: ld ra, 8(sp) # 8-byte Folded Reload
		; RV64I-NEXT: ld s0, 0(sp) # 8-byte Folded Reload
; RV64I-NEXT: addi sp, sp, 16		; RV64I-NEXT: addi sp, sp, 16
; RV64I-NEXT: ret		; RV64I-NEXT: ret
; RV64I-NEXT: .LBB6_2:
; RV64I-NEXT: li a0, 32
; RV64I-NEXT: ret
;		;
; RV64ZBB-LABEL: cttz_i32:		; RV64ZBB-LABEL: cttz_i32:
; RV64ZBB: # %bb.0:		; RV64ZBB: # %bb.0:
; RV64ZBB-NEXT: ctzw a0, a0		; RV64ZBB-NEXT: ctzw a0, a0
; RV64ZBB-NEXT: ret		; RV64ZBB-NEXT: ret
%1 = call i32 @llvm.cttz.i32(i32 %a, i1 false)		%1 = call i32 @llvm.cttz.i32(i32 %a, i1 false)
ret i32 %1		ret i32 %1
}		}

define signext i32 @cttz_zero_undef_i32(i32 signext %a) nounwind {		define signext i32 @cttz_zero_undef_i32(i32 signext %a) nounwind {
; RV64I-LABEL: cttz_zero_undef_i32:		; RV64I-LABEL: cttz_zero_undef_i32:
; RV64I: # %bb.0:		; RV64I: # %bb.0:
; RV64I-NEXT: addi sp, sp, -16		; RV64I-NEXT: addi sp, sp, -16
; RV64I-NEXT: sd ra, 8(sp) # 8-byte Folded Spill		; RV64I-NEXT: sd ra, 8(sp) # 8-byte Folded Spill
; RV64I-NEXT: addiw a1, a0, -1		; RV64I-NEXT: sd s0, 0(sp) # 8-byte Folded Spill
; RV64I-NEXT: not a0, a0		; RV64I-NEXT: mv s0, a0
; RV64I-NEXT: and a0, a0, a1		; RV64I-NEXT: neg a0, a0
; RV64I-NEXT: srli a1, a0, 1		; RV64I-NEXT: and a0, s0, a0
; RV64I-NEXT: lui a2, 349525		; RV64I-NEXT: lui a1, 30667
; RV64I-NEXT: addiw a2, a2, 1365		; RV64I-NEXT: addiw a1, a1, 1329
; RV64I-NEXT: and a1, a1, a2
; RV64I-NEXT: sub a0, a0, a1
; RV64I-NEXT: lui a1, 209715
; RV64I-NEXT: addiw a1, a1, 819
; RV64I-NEXT: and a2, a0, a1
; RV64I-NEXT: srli a0, a0, 2
; RV64I-NEXT: and a0, a0, a1
; RV64I-NEXT: add a0, a2, a0
; RV64I-NEXT: srli a1, a0, 4
; RV64I-NEXT: add a0, a0, a1
; RV64I-NEXT: lui a1, 61681
; RV64I-NEXT: addiw a1, a1, -241
; RV64I-NEXT: and a0, a0, a1
; RV64I-NEXT: lui a1, 4112
; RV64I-NEXT: addiw a1, a1, 257
; RV64I-NEXT: call __muldi3@plt		; RV64I-NEXT: call __muldi3@plt
; RV64I-NEXT: srliw a0, a0, 24		; RV64I-NEXT: mv a1, a0
		; RV64I-NEXT: li a0, 32
		; RV64I-NEXT: beqz s0, .LBB7_2
		; RV64I-NEXT: # %bb.1:
		; RV64I-NEXT: srliw a0, a1, 27
		; RV64I-NEXT: lui a1, %hi(.LCPI7_0)
		; RV64I-NEXT: addi a1, a1, %lo(.LCPI7_0)
		; RV64I-NEXT: add a0, a1, a0
		; RV64I-NEXT: lbu a0, 0(a0)
		; RV64I-NEXT: .LBB7_2:
; RV64I-NEXT: ld ra, 8(sp) # 8-byte Folded Reload		; RV64I-NEXT: ld ra, 8(sp) # 8-byte Folded Reload
		; RV64I-NEXT: ld s0, 0(sp) # 8-byte Folded Reload
; RV64I-NEXT: addi sp, sp, 16		; RV64I-NEXT: addi sp, sp, 16
; RV64I-NEXT: ret		; RV64I-NEXT: ret
;		;
; RV64ZBB-LABEL: cttz_zero_undef_i32:		; RV64ZBB-LABEL: cttz_zero_undef_i32:
; RV64ZBB: # %bb.0:		; RV64ZBB: # %bb.0:
; RV64ZBB-NEXT: ctzw a0, a0		; RV64ZBB-NEXT: ctzw a0, a0
; RV64ZBB-NEXT: ret		; RV64ZBB-NEXT: ret
%1 = call i32 @llvm.cttz.i32(i32 %a, i1 true)		%1 = call i32 @llvm.cttz.i32(i32 %a, i1 true)
ret i32 %1		ret i32 %1
}		}

define signext i32 @findFirstSet_i32(i32 signext %a) nounwind {		define signext i32 @findFirstSet_i32(i32 signext %a) nounwind {
; RV64I-LABEL: findFirstSet_i32:		; RV64I-LABEL: findFirstSet_i32:
; RV64I: # %bb.0:		; RV64I: # %bb.0:
; RV64I-NEXT: addi sp, sp, -16		; RV64I-NEXT: addi sp, sp, -16
; RV64I-NEXT: sd ra, 8(sp) # 8-byte Folded Spill		; RV64I-NEXT: sd ra, 8(sp) # 8-byte Folded Spill
; RV64I-NEXT: sd s0, 0(sp) # 8-byte Folded Spill		; RV64I-NEXT: sd s0, 0(sp) # 8-byte Folded Spill
; RV64I-NEXT: mv s0, a0		; RV64I-NEXT: mv s0, a0
; RV64I-NEXT: addiw a0, a0, -1		; RV64I-NEXT: neg a0, a0
; RV64I-NEXT: not a1, s0		; RV64I-NEXT: and a0, s0, a0
; RV64I-NEXT: and a0, a1, a0		; RV64I-NEXT: lui a1, 30667
; RV64I-NEXT: srli a1, a0, 1		; RV64I-NEXT: addiw a1, a1, 1329
; RV64I-NEXT: lui a2, 349525
; RV64I-NEXT: addiw a2, a2, 1365
; RV64I-NEXT: and a1, a1, a2
; RV64I-NEXT: sub a0, a0, a1
; RV64I-NEXT: lui a1, 209715
; RV64I-NEXT: addiw a1, a1, 819
; RV64I-NEXT: and a2, a0, a1
; RV64I-NEXT: srli a0, a0, 2
; RV64I-NEXT: and a0, a0, a1
; RV64I-NEXT: add a0, a2, a0
; RV64I-NEXT: srli a1, a0, 4
; RV64I-NEXT: add a0, a0, a1
; RV64I-NEXT: lui a1, 61681
; RV64I-NEXT: addiw a1, a1, -241
; RV64I-NEXT: and a0, a0, a1
; RV64I-NEXT: lui a1, 4112
; RV64I-NEXT: addiw a1, a1, 257
; RV64I-NEXT: call __muldi3@plt		; RV64I-NEXT: call __muldi3@plt
; RV64I-NEXT: mv a1, a0		; RV64I-NEXT: li a1, 32
; RV64I-NEXT: li a0, -1
; RV64I-NEXT: beqz s0, .LBB8_2		; RV64I-NEXT: beqz s0, .LBB8_2
; RV64I-NEXT: # %bb.1:		; RV64I-NEXT: # %bb.1:
; RV64I-NEXT: srliw a0, a1, 24		; RV64I-NEXT: srliw a0, a0, 27
		; RV64I-NEXT: lui a1, %hi(.LCPI8_0)
		; RV64I-NEXT: addi a1, a1, %lo(.LCPI8_0)
		; RV64I-NEXT: add a0, a1, a0
		; RV64I-NEXT: lbu a1, 0(a0)
; RV64I-NEXT: .LBB8_2:		; RV64I-NEXT: .LBB8_2:
		; RV64I-NEXT: li a0, -1
		; RV64I-NEXT: beqz s0, .LBB8_4
		; RV64I-NEXT: # %bb.3:
		; RV64I-NEXT: mv a0, a1
		; RV64I-NEXT: .LBB8_4:
; RV64I-NEXT: ld ra, 8(sp) # 8-byte Folded Reload		; RV64I-NEXT: ld ra, 8(sp) # 8-byte Folded Reload
; RV64I-NEXT: ld s0, 0(sp) # 8-byte Folded Reload		; RV64I-NEXT: ld s0, 0(sp) # 8-byte Folded Reload
; RV64I-NEXT: addi sp, sp, 16		; RV64I-NEXT: addi sp, sp, 16
; RV64I-NEXT: ret		; RV64I-NEXT: ret
;		;
; RV64ZBB-LABEL: findFirstSet_i32:		; RV64ZBB-LABEL: findFirstSet_i32:
; RV64ZBB: # %bb.0:		; RV64ZBB: # %bb.0:
; RV64ZBB-NEXT: mv a1, a0		; RV64ZBB-NEXT: mv a1, a0
; RV64ZBB-NEXT: li a0, -1		; RV64ZBB-NEXT: li a0, -1
; RV64ZBB-NEXT: beqz a1, .LBB8_2		; RV64ZBB-NEXT: beqz a1, .LBB8_2
; RV64ZBB-NEXT: # %bb.1:		; RV64ZBB-NEXT: # %bb.1:
; RV64ZBB-NEXT: ctzw a0, a1		; RV64ZBB-NEXT: ctzw a0, a1
; RV64ZBB-NEXT: .LBB8_2:		; RV64ZBB-NEXT: .LBB8_2:
; RV64ZBB-NEXT: ret		; RV64ZBB-NEXT: ret
%1 = call i32 @llvm.cttz.i32(i32 %a, i1 true)		%1 = call i32 @llvm.cttz.i32(i32 %a, i1 true)
%2 = icmp eq i32 %a, 0		%2 = icmp eq i32 %a, 0
%3 = select i1 %2, i32 -1, i32 %1		%3 = select i1 %2, i32 -1, i32 %1
ret i32 %3		ret i32 %3
}		}

define signext i32 @ffs_i32(i32 signext %a) nounwind {		define signext i32 @ffs_i32(i32 signext %a) nounwind {
; RV64I-LABEL: ffs_i32:		; RV64I-LABEL: ffs_i32:
; RV64I: # %bb.0:		; RV64I: # %bb.0:
; RV64I-NEXT: addi sp, sp, -16		; RV64I-NEXT: addi sp, sp, -32
; RV64I-NEXT: sd ra, 8(sp) # 8-byte Folded Spill		; RV64I-NEXT: sd ra, 24(sp) # 8-byte Folded Spill
; RV64I-NEXT: sd s0, 0(sp) # 8-byte Folded Spill		; RV64I-NEXT: sd s0, 16(sp) # 8-byte Folded Spill
; RV64I-NEXT: mv s0, a0		; RV64I-NEXT: sd s1, 8(sp) # 8-byte Folded Spill
; RV64I-NEXT: addiw a0, a0, -1		; RV64I-NEXT: mv s1, a0
; RV64I-NEXT: not a1, s0		; RV64I-NEXT: li s0, 0
; RV64I-NEXT: and a0, a1, a0		; RV64I-NEXT: neg a0, a0
; RV64I-NEXT: srli a1, a0, 1		; RV64I-NEXT: and a0, s1, a0
; RV64I-NEXT: lui a2, 349525		; RV64I-NEXT: lui a1, 30667
; RV64I-NEXT: addiw a2, a2, 1365		; RV64I-NEXT: addiw a1, a1, 1329
; RV64I-NEXT: and a1, a1, a2
; RV64I-NEXT: sub a0, a0, a1
; RV64I-NEXT: lui a1, 209715
; RV64I-NEXT: addiw a1, a1, 819
; RV64I-NEXT: and a2, a0, a1
; RV64I-NEXT: srli a0, a0, 2
; RV64I-NEXT: and a0, a0, a1
; RV64I-NEXT: add a0, a2, a0
; RV64I-NEXT: srli a1, a0, 4
; RV64I-NEXT: add a0, a0, a1
; RV64I-NEXT: lui a1, 61681
; RV64I-NEXT: addiw a1, a1, -241
; RV64I-NEXT: and a0, a0, a1
; RV64I-NEXT: lui a1, 4112
; RV64I-NEXT: addiw a1, a1, 257
; RV64I-NEXT: call __muldi3@plt		; RV64I-NEXT: call __muldi3@plt
; RV64I-NEXT: mv a1, a0		; RV64I-NEXT: li a1, 32
; RV64I-NEXT: li a0, 0		; RV64I-NEXT: beqz s1, .LBB9_2
; RV64I-NEXT: beqz s0, .LBB9_2
; RV64I-NEXT: # %bb.1:		; RV64I-NEXT: # %bb.1:
; RV64I-NEXT: srliw a0, a1, 24		; RV64I-NEXT: srliw a0, a0, 27
; RV64I-NEXT: addi a0, a0, 1		; RV64I-NEXT: lui a1, %hi(.LCPI9_0)
		; RV64I-NEXT: addi a1, a1, %lo(.LCPI9_0)
		; RV64I-NEXT: add a0, a1, a0
		; RV64I-NEXT: lbu a1, 0(a0)
; RV64I-NEXT: .LBB9_2:		; RV64I-NEXT: .LBB9_2:
; RV64I-NEXT: ld ra, 8(sp) # 8-byte Folded Reload		; RV64I-NEXT: beqz s1, .LBB9_4
; RV64I-NEXT: ld s0, 0(sp) # 8-byte Folded Reload		; RV64I-NEXT: # %bb.3:
; RV64I-NEXT: addi sp, sp, 16		; RV64I-NEXT: addi s0, a1, 1
		; RV64I-NEXT: .LBB9_4:
		; RV64I-NEXT: mv a0, s0
		; RV64I-NEXT: ld ra, 24(sp) # 8-byte Folded Reload
		; RV64I-NEXT: ld s0, 16(sp) # 8-byte Folded Reload
		; RV64I-NEXT: ld s1, 8(sp) # 8-byte Folded Reload
		; RV64I-NEXT: addi sp, sp, 32
; RV64I-NEXT: ret		; RV64I-NEXT: ret
;		;
; RV64ZBB-LABEL: ffs_i32:		; RV64ZBB-LABEL: ffs_i32:
; RV64ZBB: # %bb.0:		; RV64ZBB: # %bb.0:
; RV64ZBB-NEXT: mv a1, a0		; RV64ZBB-NEXT: mv a1, a0
; RV64ZBB-NEXT: li a0, 0		; RV64ZBB-NEXT: li a0, 0
; RV64ZBB-NEXT: beqz a1, .LBB9_2		; RV64ZBB-NEXT: beqz a1, .LBB9_2
; RV64ZBB-NEXT: # %bb.1:		; RV64ZBB-NEXT: # %bb.1:
; RV64ZBB-NEXT: ctzw a0, a1		; RV64ZBB-NEXT: ctzw a0, a1
; RV64ZBB-NEXT: addi a0, a0, 1		; RV64ZBB-NEXT: addi a0, a0, 1
; RV64ZBB-NEXT: .LBB9_2:		; RV64ZBB-NEXT: .LBB9_2:
; RV64ZBB-NEXT: ret		; RV64ZBB-NEXT: ret
%1 = call i32 @llvm.cttz.i32(i32 %a, i1 true)		%1 = call i32 @llvm.cttz.i32(i32 %a, i1 true)
%2 = add i32 %1, 1		%2 = add i32 %1, 1
%3 = icmp eq i32 %a, 0		%3 = icmp eq i32 %a, 0
%4 = select i1 %3, i32 0, i32 %2		%4 = select i1 %3, i32 0, i32 %2
ret i32 %4		ret i32 %4
}		}

declare i64 @llvm.cttz.i64(i64, i1)		declare i64 @llvm.cttz.i64(i64, i1)

define i64 @cttz_i64(i64 %a) nounwind {		define i64 @cttz_i64(i64 %a) nounwind {
; RV64I-LABEL: cttz_i64:		; RV64I-LABEL: cttz_i64:
; RV64I: # %bb.0:		; RV64I: # %bb.0:
; RV64I-NEXT: beqz a0, .LBB10_2		; RV64I-NEXT: beqz a0, .LBB10_4
; RV64I-NEXT: # %bb.1: # %cond.false		; RV64I-NEXT: # %bb.1: # %cond.false
; RV64I-NEXT: addi sp, sp, -16		; RV64I-NEXT: addi sp, sp, -16
; RV64I-NEXT: sd ra, 8(sp) # 8-byte Folded Spill		; RV64I-NEXT: sd ra, 8(sp) # 8-byte Folded Spill
; RV64I-NEXT: addi a1, a0, -1		; RV64I-NEXT: sd s0, 0(sp) # 8-byte Folded Spill
; RV64I-NEXT: not a0, a0		; RV64I-NEXT: mv s0, a0
; RV64I-NEXT: and a0, a0, a1		; RV64I-NEXT: neg a0, a0
		; RV64I-NEXT: and a0, s0, a0
; RV64I-NEXT: lui a1, %hi(.LCPI10_0)		; RV64I-NEXT: lui a1, %hi(.LCPI10_0)
; RV64I-NEXT: ld a1, %lo(.LCPI10_0)(a1)		; RV64I-NEXT: ld a1, %lo(.LCPI10_0)(a1)
; RV64I-NEXT: lui a2, %hi(.LCPI10_1)
; RV64I-NEXT: ld a2, %lo(.LCPI10_1)(a2)
; RV64I-NEXT: srli a3, a0, 1
; RV64I-NEXT: and a1, a3, a1
; RV64I-NEXT: sub a0, a0, a1
; RV64I-NEXT: and a1, a0, a2
; RV64I-NEXT: srli a0, a0, 2
; RV64I-NEXT: and a0, a0, a2
; RV64I-NEXT: lui a2, %hi(.LCPI10_2)
; RV64I-NEXT: ld a2, %lo(.LCPI10_2)(a2)
; RV64I-NEXT: add a0, a1, a0
; RV64I-NEXT: srli a1, a0, 4
; RV64I-NEXT: add a0, a0, a1
; RV64I-NEXT: and a0, a0, a2
; RV64I-NEXT: lui a1, %hi(.LCPI10_3)
; RV64I-NEXT: ld a1, %lo(.LCPI10_3)(a1)
; RV64I-NEXT: call __muldi3@plt		; RV64I-NEXT: call __muldi3@plt
; RV64I-NEXT: srli a0, a0, 56		; RV64I-NEXT: mv a1, a0
		; RV64I-NEXT: li a0, 64
		; RV64I-NEXT: beqz s0, .LBB10_3
		; RV64I-NEXT: # %bb.2: # %cond.false
		; RV64I-NEXT: srli a0, a1, 58
		; RV64I-NEXT: lui a1, %hi(.LCPI10_1)
		; RV64I-NEXT: addi a1, a1, %lo(.LCPI10_1)
		; RV64I-NEXT: add a0, a1, a0
		; RV64I-NEXT: lbu a0, 0(a0)
		; RV64I-NEXT: .LBB10_3: # %cond.false
; RV64I-NEXT: ld ra, 8(sp) # 8-byte Folded Reload		; RV64I-NEXT: ld ra, 8(sp) # 8-byte Folded Reload
		; RV64I-NEXT: ld s0, 0(sp) # 8-byte Folded Reload
; RV64I-NEXT: addi sp, sp, 16		; RV64I-NEXT: addi sp, sp, 16
; RV64I-NEXT: ret		; RV64I-NEXT: ret
; RV64I-NEXT: .LBB10_2:		; RV64I-NEXT: .LBB10_4:
; RV64I-NEXT: li a0, 64		; RV64I-NEXT: li a0, 64
; RV64I-NEXT: ret		; RV64I-NEXT: ret
;		;
; RV64ZBB-LABEL: cttz_i64:		; RV64ZBB-LABEL: cttz_i64:
; RV64ZBB: # %bb.0:		; RV64ZBB: # %bb.0:
; RV64ZBB-NEXT: ctz a0, a0		; RV64ZBB-NEXT: ctz a0, a0
; RV64ZBB-NEXT: ret		; RV64ZBB-NEXT: ret
%1 = call i64 @llvm.cttz.i64(i64 %a, i1 false)		%1 = call i64 @llvm.cttz.i64(i64 %a, i1 false)
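For readers following the new RV64I check lines: the sequence `neg` / `and` / multiply / `srliw ..., 27` / `lbu` is consistent with a de Bruijn table lookup. The multiplier built by `lui a1, 30667` and `addiw a1, a1, 1329` is 30667 * 4096 + 1329 = 0x077CB531. A minimal C++ sketch of that technique follows. The names `kDeBruijn32`, `makeTable`, and `cttz32_table` are illustrative only and are not taken from the patch, and the table here is built at run time rather than emitted into a constant pool.

```cpp
#include <array>
#include <cstdint>

// Multiplier reconstructed from the check lines above
// (lui 30667 / addiw 1329); the name is hypothetical.
constexpr uint32_t kDeBruijn32 = 0x077CB531u;

// Build the 32-entry byte table: the slot selected by the top five bits
// of (1 << i) * kDeBruijn32 stores i.
std::array<uint8_t, 32> makeTable() {
  std::array<uint8_t, 32> table{};
  for (uint32_t i = 0; i < 32; ++i)
    table[((uint32_t{1} << i) * kDeBruijn32) >> 27] = static_cast<uint8_t>(i);
  return table;
}

// Count trailing zeros of x, returning 32 for x == 0 (the `li a0, 32`
// plus `beqz` path in the generated code).
uint32_t cttz32_table(uint32_t x) {
  static const std::array<uint8_t, 32> table = makeTable();
  if (x == 0)
    return 32;
  uint32_t lowestSetBit = x & (0u - x);            // neg + and
  return table[(lowestSetBit * kDeBruijn32) >> 27]; // mul + srliw 27 + lbu
}
```

Because `x & -x` isolates a single set bit, the multiplication is effectively a left shift of the constant, so the top five bits of the product are distinct for each of the 32 bit positions.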

llvm/test/CodeGen/SPARC/cttz.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc < %s -march=sparc -mcpu=v9 | FileCheck %s

				define i32 @f(i32 %x) {
				; CHECK-LABEL: f:
				barannikov88 (inline comment, Not Done): Unused
				gsocshubham (author, Done): Removed.
				; CHECK: .cfi_startproc
				; CHECK-NEXT: ! %bb.0: ! %entry
				; CHECK-NEXT: mov %g0, %o1
				; CHECK-NEXT: sub %o1, %o0, %o1
				; CHECK-NEXT: and %o0, %o1, %o1
				; CHECK-NEXT: sethi 122669, %o2
				; CHECK-NEXT: or %o2, 305, %o2
				; CHECK-NEXT: smul %o1, %o2, %o1
				; CHECK-NEXT: srl %o1, 27, %o1
				; CHECK-NEXT: sethi %hi(.LCPI0_0), %o2
				; CHECK-NEXT: add %o2, %lo(.LCPI0_0), %o2
				; CHECK-NEXT: ldub [%o2+%o1], %o1
				; CHECK-NEXT: cmp %o0, 0
				; CHECK-NEXT: move %icc, 32, %o1
				; CHECK-NEXT: move %icc, 0, %o1
				; CHECK-NEXT: retl
				; CHECK-NEXT: mov %o1, %o0
				entry:
				%0 = call i32 @llvm.cttz.i32(i32 %x, i1 true)
				%1 = icmp eq i32 %x, 0
				%2 = select i1 %1, i32 0, i32 %0
				%3 = trunc i32 %2 to i8
				barannikov88 (inline comment, Not Done): Why not just `ret i32 %0` ?
				gsocshubham (author, Done): Updated test accordingly.
				%conv = zext i8 %3 to i32
				ret i32 %conv
				}

				define i64 @g(i64 %x) {
				; CHECK-LABEL: g:
				; CHECK: .cfi_startproc
				; CHECK-NEXT: ! %bb.0: ! %entry
				; CHECK-NEXT: mov %g0, %o2
				; CHECK-NEXT: sub %o2, %o1, %o3
				; CHECK-NEXT: and %o1, %o3, %o3
				; CHECK-NEXT: sethi 122669, %o4
				; CHECK-NEXT: or %o4, 305, %o4
				; CHECK-NEXT: smul %o3, %o4, %o3
				; CHECK-NEXT: srl %o3, 27, %o3
				; CHECK-NEXT: sethi %hi(.LCPI1_0), %o5
				; CHECK-NEXT: add %o5, %lo(.LCPI1_0), %o5
				; CHECK-NEXT: ldub [%o5+%o3], %g2
				; CHECK-NEXT: sub %o2, %o0, %o3
				; CHECK-NEXT: and %o0, %o3, %o3
				; CHECK-NEXT: smul %o3, %o4, %o3
				; CHECK-NEXT: srl %o3, 27, %o3
				; CHECK-NEXT: ldub [%o5+%o3], %o3
				; CHECK-NEXT: cmp %o1, 0
				; CHECK-NEXT: move %icc, 32, %g2
				; CHECK-NEXT: cmp %o0, 0
				; CHECK-NEXT: move %icc, 32, %o3
				; CHECK-NEXT: add %o3, 32, %o3
				; CHECK-NEXT: cmp %o1, 0
				; CHECK-NEXT: movne %icc, %g2, %o3
				; CHECK-NEXT: or %o1, %o0, %o0
				; CHECK-NEXT: cmp %o0, 0
				; CHECK-NEXT: move %icc, 0, %o3
				; CHECK-NEXT: mov %o2, %o0
				; CHECK-NEXT: retl
				; CHECK-NEXT: mov %o3, %o1
				entry:
				%0 = call i64 @llvm.cttz.i64(i64 %x, i1 true)
				%1 = icmp eq i64 %x, 0
				%2 = select i1 %1, i64 0, i64 %0
				%3 = trunc i64 %2 to i32
				%conv = zext i32 %3 to i64
				ret i64 %conv
				}

				; Function Attrs: nocallback nofree nosync nounwind readnone speculatable willreturn
				declare i32 @llvm.cttz.i32(i32, i1 immarg) #0
				declare i64 @llvm.cttz.i64(i64, i1 immarg) #0

				attributes #0 = { nocallback nofree nosync nounwind readnone speculatable willreturn }
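The check lines for `g` suggest that on 32-bit SPARC the i64 `cttz` is split into two 32-bit halves: each half goes through the same multiply, `srl ..., 27`, `ldub` lookup, the high-half result has 32 added, and `movne` picks the low-half result when the low word is nonzero. A rough C++ sketch of that split is below. `cttz32` is a plain loop stand-in for the 32-bit primitive, and both function names are hypothetical rather than taken from the patch.

```cpp
#include <cstdint>

// Stand-in for the 32-bit primitive; returns 32 for a zero input.
// A simple loop is used here so the sketch stays focused on the split.
uint32_t cttz32(uint32_t x) {
  if (x == 0)
    return 32;
  uint32_t n = 0;
  while ((x & 1u) == 0) {
    x >>= 1;
    ++n;
  }
  return n;
}

// 64-bit count of trailing zeros built from two 32-bit halves.
uint32_t cttz64_via_halves(uint64_t x) {
  uint32_t lo = static_cast<uint32_t>(x);
  uint32_t hi = static_cast<uint32_t>(x >> 32);
  uint32_t fromLo = cttz32(lo);      // result if the low word has a set bit
  uint32_t fromHi = cttz32(hi) + 32; // otherwise count through the high word
  return lo != 0 ? fromLo : fromHi;  // x == 0 yields 32 + 32 = 64
}
```

The test body then applies its own `select` on `%x == 0`, which corresponds to the final `move %icc, 0, %o3` in the check lines.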

Revision Contents

Diff 450227
