This is an archive of the discontinued LLVM Phabricator instance.

Allen retitled this revision from [AArch64][SelectionDAG] Fold the mov and lsl into ubfiz to [AArch64][CodeGen] Fold the mov and lsl into ubfiz. Aug 21 2022, 7:32 PM

Allen mentioned this in D132322: [AArch64][SelectionDAG] Optimize multiplication by constant. Aug 22 2022, 6:04 PM

Allen added a reviewer: efriedma. Aug 22 2022, 6:29 PM

LGTM

Some of these changes feel a little silly, like the change to select_cc.ll, but there isn't any performance difference, so I guess it's fine. (lsl is actually ubfiz in disguise.)
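The remark that lsl is ubfiz in disguise can be checked against the alias definitions in the Arm ARM, where both `LSL (immediate)` and `UBFIZ` are aliases of `UBFM`. The following is a minimal sketch of that arithmetic; the helper names `lsl_as_ubfm` and `ubfiz_as_ubfm` are hypothetical and are not part of LLVM.

```python
def lsl_as_ubfm(shift, regsize=64):
    """(immr, imms) of the UBFM that `lsl Rd, Rn, #shift` is an alias of."""
    return (-shift) % regsize, regsize - 1 - shift


def ubfiz_as_ubfm(lsb, width, regsize=64):
    """(immr, imms) of the UBFM that `ubfiz Rd, Rn, #lsb, #width` is an alias of."""
    return (-lsb) % regsize, width - 1


# A left shift by s has the same UBFM operands as a ubfiz that keeps
# the low (64 - s) bits of the source.
for s in range(1, 64):
    assert lsl_as_ubfm(s) == ubfiz_as_ubfm(s, 64 - s)

print(lsl_as_ubfm(2))        # operands for: lsl   x9, x9, #2
print(ubfiz_as_ubfm(2, 32))  # operands for: ubfiz x9, x0, #2, #32
```

The two forms differ only in the `imms` field, which is why switching between them should not change the instruction count or the execution unit used.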

This revision is now accepted and ready to land. Aug 23 2022, 10:29 AM

paulwalker-arm added inline comments. Aug 23 2022, 10:54 AM

llvm/lib/Target/AArch64/AArch64InstrInfo.td
7348	@efriedma / @Allen: Sorry for the naive question, but I'm not hugely familiar with this node and the documentation says:
/// All other bits are
/// assumed to be equal to the bits in the immediate integer constant in the
/// first operand. This instruction just communicates information; No code
/// should be generated.
So is this safe? I mean, if no code is generated, then in this instance how can we be sure `$Rn` has its top 32 bits zeroed? I'm kind of assuming this is why the original code is using `INSERT_SUBREG`? The emitted code is valid, but could something query the `SUBREG_TO_REG` post isel and reuse/transform it in an invalid way?

efriedma added inline comments. Aug 23 2022, 12:23 PM

llvm/lib/Target/AArch64/AArch64InstrInfo.td
7348	Oh, I didn't notice it was changed from using INSERT_SUBREG to use SUBREG_TO_REG. I just glanced over the patterns and assumed it was just using the same pattern we were already using. Not sure why it was changed. We normally only use SUBREG_TO_REG if we know the operand actually clears the high bits. Not sure there's really any practical consequence here; there isn't very much code that's actually aware of the semantics of SUBREG_TO_REG.

Allen added inline comments. Aug 23 2022, 8:19 PM

llvm/lib/Target/AArch64/AArch64InstrInfo.td

7348

hi @paulwalker-arm @efriedma

According to https://developer.arm.com/documentation/ddi0602/2021-12/Base-Instructions/UBFIZ--Unsigned-Bitfield-Insert-in-Zero--an-alias-of-UBFM-?lang=en, the ubfiz instruction copies the specified bits from the source $Rn and clears the remaining bits.
In this pattern, the top 32 bits of the result will be cleared, so I think it is safe.
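To make the claim concrete, here is a small sketch that models the two instruction sequences on plain integers. The helpers `mov_w`, `lsl_x` and `ubfiz_x` are hypothetical stand-ins written for this illustration, not LLVM or assembler APIs, and the input value is arbitrary.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def mov_w(x):
    """mov wd, wn: copies the low 32 bits and zeroes the upper 32 bits."""
    return x & MASK32


def lsl_x(x, shift):
    """lsl xd, xn, #shift on a 64-bit register."""
    return (x << shift) & MASK64


def ubfiz_x(x, lsb, width):
    """ubfiz xd, xn, #lsb, #width: take the low `width` bits, place them at `lsb`."""
    return ((x & ((1 << width) - 1)) << lsb) & MASK64


# The upper half of x8 is deliberately non-zero to show it does not matter.
x8 = 0xDEADBEEF_80000001
old_sequence = lsl_x(mov_w(x8), 1)   # mov w9, w8 ; lsl x9, x9, #1
new_sequence = ubfiz_x(x8, 1, 32)    # ubfiz x9, x8, #1, #32
assert old_sequence == new_sequence == 0x1_00000002
```

This only shows that the result of the ubfiz is well defined. It says nothing about the upper bits of its input operand, which is the point raised later in the review.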

If we use the original INSERT_SUBREG, the test case @loop2 in llvm/test/CodeGen/AArch64/tbl-loops.ll regresses.

+++ b/llvm/test/CodeGen/AArch64/tbl-loops.ll
@@ -151,7 +151,8 @@ define void @loop2(i8* noalias nocapture noundef writeonly %dst, float* nocaptur
 ; CHECK-NEXT:    cmp w8, #2
 ; CHECK-NEXT:    b.ls .LBB1_4
 ; CHECK-NEXT:  // %bb.2: // %vector.memcheck
-; CHECK-NEXT:    ubfiz x9, x8, #1, #32
+; CHECK-NEXT:    mov w9, w8
+; CHECK-NEXT:    ubfiz x9, x9, #1, #32

I think this is because the extra %37:gpr64all = IMPLICIT_DEF needs a register allocated for %37, and that register is not always the same one assigned to %36:
%36:gpr64 = INSERT_SUBREG %37:gpr64all(tied-def 0), %24:gpr32common, %subreg.sub_32

paulwalker-arm added inline comments. Aug 24 2022, 7:40 AM

llvm/lib/Target/AArch64/AArch64InstrInfo.td
7348	I can see the emitted code is correct, but my concern relates to a scenario where a later pass decides to reuse the result of the `SUBREG_TO_REG` MachineInstr in isolation, with the understanding that the top 32 bits will be zero. In this instance we don't know how `$Rn` was produced, so we don't know anything about its top 32 bits. Within PeepholeOptimizer.cpp there's this comment:
// It's an error to translate this:
//
//    %reg1025 = <sext> %reg1024
//     ...
//    %reg1026 = SUBREG_TO_REG 0, %reg1024, 4
//
// into this:
//
//    %reg1025 = <sext> %reg1024
//     ...
//    %reg1027 = COPY %reg1025:4
//    %reg1026 = SUBREG_TO_REG 0, %reg1027, 4
//
// The problem here is that SUBREG_TO_REG is there to assert that an
// implicit zext occurs. It doesn't insert a zext instruction. If we allow
// the COPY here, it will give us the value after the <sext>, not the
// original value of %reg1024 before <sext>.
This articulates what I mean. An optimisation is prevented that would otherwise break `SUBREG_TO_REG`'s contract. Presumably this is for a good reason and suggests we should only use `SUBREG_TO_REG` when we can honour its contract. That said, if Eli is happy then I shan't worry about it, but to me this just looks unsafe. Perhaps we're missing a MachineInstr transformation for `INSERT_SUBREG (undef), $GPR` -> `SUBREG_TO_REG 0, $GPR` for the cases where we know the top bits of `$GPR` will be zero?
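The concern can be illustrated with a toy model in which registers are plain integers. This is only a sketch under stated assumptions: `subreg_to_reg`, `zext32_explicit` and `zext32_trusting_claim` are hypothetical helpers, and the "trusting" pass is an invented example of a later optimisation, not a description of any specific LLVM pass.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def ubfiz_x(x, lsb, width):
    """ubfiz xd, xn, #lsb, #width modelled on integers."""
    return ((x & ((1 << width) - 1)) << lsb) & MASK64


def subreg_to_reg(reg64):
    """SUBREG_TO_REG 0, ...: no code is generated, so the bits are unchanged.
    The node only *claims* that the upper 32 bits are zero."""
    return reg64


def zext32_explicit(reg64):
    """A real zero-extension, e.g. `mov wd, wn`."""
    return reg64 & MASK32


def zext32_trusting_claim(reg64):
    """Hypothetical later pass: it believes the claim and drops the zext."""
    return reg64


# $Rn lives in a register whose upper half was never cleared.
rn = 0xFFFFFFFF_00000005
claimed = subreg_to_reg(rn)

# The ubfiz consumer masks its input, so the emitted code is still correct.
assert ubfiz_x(claimed, 1, 32) == 10

# A consumer that reuses the SUBREG_TO_REG result in isolation is not.
assert zext32_explicit(claimed) == 5
assert zext32_trusting_claim(claimed) != 5
```

In this model the pattern is safe only as long as every user of the `SUBREG_TO_REG` result happens to ignore the upper bits, which is not something the node's contract guarantees.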

Allen added inline comments. Aug 24 2022, 8:39 AM

llvm/lib/Target/AArch64/AArch64InstrInfo.td
7348	Thanks @paulwalker-arm for the detailed explanation; I now think your concern makes sense. I'll take a look at the MachineInstr transformation you suggested.

Allen added inline comments. Aug 30 2022, 6:40 AM

llvm/lib/Target/AArch64/AArch64InstrInfo.td
7348	Addressed with D132939.

Allen mentioned this in D132939: [Peephole] rewrite INSERT_SUBREG to SUBREG_TO_REG if upper bits zero. Sep 7 2022, 6:15 PM

Allen mentioned this in rGb6655333c255: [Peephole] rewrite INSERT_SUBREG to SUBREG_TO_REG if upper bits zero. Sep 8 2022, 6:03 PM

rebase after D132939

Hi @paulwalker-arm, I have updated the pattern to use INSERT_SUBREG. Any further concerns? Thanks.

Harbormaster completed remote builds in B185768: Diff 458951. Sep 8 2022, 8:54 PM

In D132325#3779167, @Allen wrote:

Hi @paulwalker-arm, I have updated the pattern to use INSERT_SUBREG. Any further concerns? Thanks.

I've a small request to ensure we preserve an existing test but otherwise this looks good to me. Thanks for persevering with the work.

llvm/test/CodeGen/AArch64/aarch64-dynamic-stack-layout.ll
461–466	Your version of these `CHECK-MACHO` lines is missing `add x9, x9, #15` that is fundamental to what the checks are validating. I've checked myself and the add is present so you just need to re-add the `CHECK-MACHO` line itself.
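The reason the `add x9, x9, #15` check matters is that it is one half of the round-up-to-16 computation for the VLA size. A minimal sketch of the arithmetic performed by the checked sequence follows, assuming 4-byte elements as implied by the `#2` shift; `vla_alloc_bytes` is a hypothetical name used only for this illustration.

```python
def vla_alloc_bytes(n):
    """Stack bytes reserved for a VLA of n 4-byte elements (n passed in w0)."""
    size = (n & 0xFFFFFFFF) << 2   # ubfiz x9, x0, #2, #32
    size += 15                     # add   x9, x9, #15
    return size & 0x7FFFFFFF0      # and   x9, x9, #0x7fffffff0


for n in (0, 1, 4, 5):
    print(n, vla_alloc_bytes(n))
```

Without the add, the `and` would round the size down instead of up, so a test that omits that line no longer verifies that the reservation keeps the stack 16-byte aligned while still covering the whole array.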

Update the test case according to the comment.

Harbormaster completed remote builds in B185832: Diff 459042. Sep 9 2022, 7:31 AM

Closed by commit rG7a8178258516: [AArch64][CodeGen]Fold the mov and lsl into ubfiz (authored by Allen). Sep 9 2022, 8:50 AM

This revision was automatically updated to reflect the committed changes.

Allen marked an inline comment as done.

Allen added a commit: rG7a8178258516: [AArch64][CodeGen]Fold the mov and lsl into ubfiz.

Looks like this is breaking ASAN buildbot:
https://lab.llvm.org/buildbot/#/builders/168/builds/8760

Log snippet:

******************** TEST 'MLIR-Unit :: IR/./MLIRIRTests/25/57' FAILED ********************
Script(shard):
--
GTEST_OUTPUT=json:/usr/local/google/home/kda/src/bw02/llvm-project/llvm_build_asan/tools/mlir/unittests/IR/./MLIRIRTests-MLIR-Unit-669455-25-57.json GTEST_SHUFFLE=0 GTEST_TOTAL_SHARDS=57 GTEST_SHARD_INDEX=25 /usr/local/google/home/kda/src/bw02/llvm-project/llvm_build_asan/tools/mlir/unittests/IR/./MLIRIRTests
--


Note: This is test shard 26 of 57.
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from BlockAndValueMapping
[ RUN      ] BlockAndValueMapping.TypedValue
[       OK ] BlockAndValueMapping.TypedValue (9 ms)
[----------] 1 test from BlockAndValueMapping (9 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (9 ms total)
[  PASSED  ] 1 test.

=================================================================
==2874064==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 80 byte(s) in 1 object(s) allocated from:
    #0 0x5571e8d4b60e in malloc /usr/local/google/home/kda/src/bw02/llvm-project/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:69:3
    #1 0x5571e920c8d8 in mlir::Operation::create(mlir::Location, mlir::OperationName, mlir::TypeRange, mlir::ValueRange, mlir::NamedAttrList&&, mlir::BlockRange, unsigned int) /usr/local/google/home/kda/src/bw02/llvm-project/llvm-project/mlir/lib/IR/Operation.cpp:77:46
    #2 0x5571e920c48d in mlir::Operation::create(mlir::Location, mlir::OperationName, mlir::TypeRange, mlir::ValueRange, mlir::NamedAttrList&&, mlir::BlockRange, mlir::RegionRange) /usr/local/google/home/kda/src/bw02/llvm-project/llvm-project/mlir/lib/IR/Operation.cpp:39:19
    #3 0x5571e920c171 in mlir::Operation::create(mlir::OperationState const&) /usr/local/google/home/kda/src/bw02/llvm-project/llvm-project/mlir/lib/IR/Operation.cpp:28:10
    #4 0x5571e908b8d4 in mlir::OpBuilder::create(mlir::OperationState const&) /usr/local/google/home/kda/src/bw02/llvm-project/llvm-project/mlir/lib/IR/Builders.cpp:409:17
    #5 0x5571e8dbbddb in test::TestOpConstant mlir::OpBuilder::create<test::TestOpConstant, mlir::FloatType, mlir::FloatAttr>(mlir::Location, mlir::FloatType&&, mlir::FloatAttr&&) /usr/local/google/home/kda/src/bw02/llvm-project/llvm-project/mlir/include/mlir/IR/Builders.h:457:16
    #6 0x5571e8dbaf8a in BlockAndValueMapping_TypedValue_Test::TestBody() /usr/local/google/home/kda/src/bw02/llvm-project/llvm-project/mlir/unittests/IR/BlockAndValueMapping.cpp:27:26
    #7 0x5571e8f8ec4c in HandleExceptionsInMethodIfSupported<testing::Test, void> /usr/local/google/home/kda/src/bw02/llvm-project/llvm-project/llvm/utils/unittest/googletest/src/gtest.cc
    #8 0x5571e8f8ec4c in testing::Test::Run() /usr/local/google/home/kda/src/bw02/llvm-project/llvm-project/llvm/utils/unittest/googletest/src/gtest.cc:2508:5
    #9 0x5571e8f9141c in testing::TestInfo::Run() /usr/local/google/home/kda/src/bw02/llvm-project/llvm-project/llvm/utils/unittest/googletest/src/gtest.cc:2684:11
    #10 0x5571e8f9281f in testing::TestSuite::Run() /usr/local/google/home/kda/src/bw02/llvm-project/llvm-project/llvm/utils/unittest/googletest/src/gtest.cc:2816:28
    #11 0x5571e8fbd61a in testing::internal::UnitTestImpl::RunAllTests() /usr/local/google/home/kda/src/bw02/llvm-project/llvm-project/llvm/utils/unittest/googletest/src/gtest.cc:5338:44
    #12 0x5571e8fbc301 in HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> /usr/local/google/home/kda/src/bw02/llvm-project/llvm-project/llvm/utils/unittest/googletest/src/gtest.cc
    #13 0x5571e8fbc301 in testing::UnitTest::Run() /usr/local/google/home/kda/src/bw02/llvm-project/llvm-project/llvm/utils/unittest/googletest/src/gtest.cc:4925:10
    #14 0x5571e8f71c7a in RUN_ALL_TESTS /usr/local/google/home/kda/src/bw02/llvm-project/llvm-project/llvm/utils/unittest/googletest/include/gtest/gtest.h:2472:46
    #15 0x5571e8f71c7a in main /usr/local/google/home/kda/src/bw02/llvm-project/llvm-project/llvm/utils/unittest/UnitTestMain/TestMain.cpp:55:10
    #16 0x7fbe1698481c in __libc_start_main csu/../csu/libc-start.c:332:16

Direct leak of 80 byte(s) in 1 object(s) allocated from:
    #0 0x5571e8d4b60e in malloc /usr/local/google/home/kda/src/bw02/llvm-project/llvm-project/compiler-rt/lib/asan/asan_malloc_linux.cpp:69:3
    #1 0x5571e920c8d8 in mlir::Operation::create(mlir::Location, mlir::OperationName, mlir::TypeRange, mlir::ValueRange, mlir::NamedAttrList&&, mlir::BlockRange, unsigned int) /usr/local/google/home/kda/src/bw02/llvm-project/llvm-project/mlir/lib/IR/Operation.cpp:77:46
    #2 0x5571e920c48d in mlir::Operation::create(mlir::Location, mlir::OperationName, mlir::TypeRange, mlir::ValueRange, mlir::NamedAttrList&&, mlir::BlockRange, mlir::RegionRange) /usr/local/google/home/kda/src/bw02/llvm-project/llvm-project/mlir/lib/IR/Operation.cpp:39:19
    #3 0x5571e920c171 in mlir::Operation::create(mlir::OperationState const&) /usr/local/google/home/kda/src/bw02/llvm-project/llvm-project/mlir/lib/IR/Operation.cpp:28:10
    #4 0x5571e908b8d4 in mlir::OpBuilder::create(mlir::OperationState const&) /usr/local/google/home/kda/src/bw02/llvm-project/llvm-project/mlir/lib/IR/Builders.cpp:409:17
    #5 0x5571e8dbb75b in test::TestOpConstant mlir::OpBuilder::create<test::TestOpConstant, mlir::IntegerType, mlir::IntegerAttr>(mlir::Location, mlir::IntegerType&&, mlir::IntegerAttr&&) /usr/local/google/home/kda/src/bw02/llvm-project/llvm-project/mlir/include/mlir/IR/Builders.h:457:16
    #6 0x5571e8dbaefc in BlockAndValueMapping_TypedValue_Test::TestBody() /usr/local/google/home/kda/src/bw02/llvm-project/llvm-project/mlir/unittests/IR/BlockAndValueMapping.cpp:25:26
    #7 0x5571e8f8ec4c in HandleExceptionsInMethodIfSupported<testing::Test, void> /usr/local/google/home/kda/src/bw02/llvm-project/llvm-project/llvm/utils/unittest/googletest/src/gtest.cc
    #8 0x5571e8f8ec4c in testing::Test::Run() /usr/local/google/home/kda/src/bw02/llvm-project/llvm-project/llvm/utils/unittest/googletest/src/gtest.cc:2508:5
    #9 0x5571e8f9141c in testing::TestInfo::Run() /usr/local/google/home/kda/src/bw02/llvm-project/llvm-project/llvm/utils/unittest/googletest/src/gtest.cc:2684:11
    #10 0x5571e8f9281f in testing::TestSuite::Run() /usr/local/google/home/kda/src/bw02/llvm-project/llvm-project/llvm/utils/unittest/googletest/src/gtest.cc:2816:28
    #11 0x5571e8fbd61a in testing::internal::UnitTestImpl::RunAllTests() /usr/local/google/home/kda/src/bw02/llvm-project/llvm-project/llvm/utils/unittest/googletest/src/gtest.cc:5338:44
    #12 0x5571e8fbc301 in HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool> /usr/local/google/home/kda/src/bw02/llvm-project/llvm-project/llvm/utils/unittest/googletest/src/gtest.cc
    #13 0x5571e8fbc301 in testing::UnitTest::Run() /usr/local/google/home/kda/src/bw02/llvm-project/llvm-project/llvm/utils/unittest/googletest/src/gtest.cc:4925:10
    #14 0x5571e8f71c7a in RUN_ALL_TESTS /usr/local/google/home/kda/src/bw02/llvm-project/llvm-project/llvm/utils/unittest/googletest/include/gtest/gtest.h:2472:46
    #15 0x5571e8f71c7a in main /usr/local/google/home/kda/src/bw02/llvm-project/llvm-project/llvm/utils/unittest/UnitTestMain/TestMain.cpp:55:10
    #16 0x7fbe1698481c in __libc_start_main csu/../csu/libc-start.c:332:16

SUMMARY: AddressSanitizer: 160 byte(s) leaked in 2 allocation(s).

--
exit: 1
--

********************

In D132325#3781749, @kda wrote:

Looks like this is breaking ASAN buildbot:
https://lab.llvm.org/buildbot/#/builders/168/builds/8760

Are you certain you're assigning blame correctly? That's a unittest failure in MLIR on x86; I don't see how it could possibly involve the AArch64 backend.

In D132325#3781756, @efriedma wrote:

In D132325#3781749, @kda wrote:

Looks like this is breaking ASAN buildbot:
https://lab.llvm.org/buildbot/#/builders/168/builds/8760

Are you certain you're assigning blame correctly? That's a unittest failure in MLIR on x86; I don't see how it could possibly involve the AArch64 backend.

Well, I reverted this change and re-ran the buildbot. It looks like it passed (admittedly the run got hung up, but I saw the offending test pass).
I am re-running the test to see if I can get it to run to completion.
If it passes, I will probably submit the revert.

I agree, I thought this was an unlikely candidate, but I tried 3 others with no luck.

Revision Contents

Path	Size
llvm/lib/Target/AArch64/AArch64InstrInfo.td	5 lines
llvm/test/CodeGen/AArch64/aarch64-dynamic-stack-layout.ll	74 lines
llvm/test/CodeGen/AArch64/aarch64-matrix-umull-smull.ll	4 lines
llvm/test/CodeGen/AArch64/select_cc.ll	8 lines
llvm/test/CodeGen/AArch64/shrink-wrapping-vla.ll	11 lines
llvm/test/CodeGen/AArch64/tbl-loops.ll	4 lines

Diff 459087

llvm/lib/Target/AArch64/AArch64InstrInfo.td

 def : Pat<(shl (sext_inreg GPR32:$Rn, i16), (i64 imm0_31:$imm)),
           (SBFMWri GPR32:$Rn, (i64 (i32shift_a imm0_31:$imm)),
                    (i64 (i32shift_sext_i16 imm0_31:$imm)))>;
 def : Pat<(shl (sext_inreg GPR64:$Rn, i16), (i64 imm0_63:$imm)),
           (SBFMXri GPR64:$Rn, (i64 (i64shift_a imm0_63:$imm)),
                    (i64 (i64shift_sext_i16 imm0_63:$imm)))>;

 def : Pat<(shl (i64 (sext GPR32:$Rn)), (i64 imm0_63:$imm)),
           (SBFMXri (INSERT_SUBREG (i64 (IMPLICIT_DEF)), GPR32:$Rn, sub_32),
                    (i64 (i64shift_a imm0_63:$imm)),
                    (i64 (i64shift_sext_i32 imm0_63:$imm)))>;

+def : Pat<(shl (i64 (zext GPR32:$Rn)), (i64 imm0_63:$imm)),
+          (UBFMXri (INSERT_SUBREG (i64 (IMPLICIT_DEF)), GPR32:$Rn, sub_32),
+                   (i64 (i64shift_a imm0_63:$imm)),
+                   (i64 (i64shift_sext_i32 imm0_63:$imm)))>;
+
 // sra patterns have an AddedComplexity of 10, so make sure we have a higher
 // AddedComplexity for the following patterns since we want to match sext + sra
 // patterns before we attempt to match a single sra node.
 let AddedComplexity = 20 in {
 // We support all sext + sra combinations which preserve at least one bit of the
 // original value which is to be sign extended. E.g. we support shifts up to
 // bitwidth-1 bits.
 def : Pat<(sra (sext_inreg GPR32:$Rn, i8), (i64 imm0_7:$imm)),

llvm/test/CodeGen/AArch64/aarch64-dynamic-stack-layout.ll

 ; CHECK: .cfi_offset w20, -16
 ; CHECK: .cfi_offset w30, -24
 ; CHECK: .cfi_offset w29, -32
 ; Check that space is reserved on the stack for the local variable,
 ; rounded up to a multiple of 16 to keep the stack pointer 16-byte aligned.
 ; CHECK: sub sp, sp, #16
 ; Check correct access to arguments passed on the stack, through frame pointer
 ; CHECK: ldr w[[IARG:[0-9]+]], [x29, #40]
-; CHECK: ldr d[[DARG:[0-9]+]], [x29, #56]
 ; Check correct reservation of 16-byte aligned VLA (size in w0) on stack
-; CHECK: mov w9, w0
-; CHECK: mov x10, sp
-; CHECK: lsl x9, x9, #2
+; CHECK: ubfiz x9, x0, #2, #32
 ; CHECK: add x9, x9, #15
+; CHECK: ldr d[[DARG:[0-9]+]], [x29, #56]
 ; CHECK: and x9, x9, #0x7fffffff0
+; CHECK: mov x10, sp
 ; CHECK: sub x[[VLASPTMP:[0-9]+]], x10, x9
 ; CHECK: mov sp, x[[VLASPTMP]]
 ; Check correct access to local variable, through frame pointer
 ; CHECK: ldur w[[ILOC:[0-9]+]], [x29, #-4]
 ; Check correct accessing of the VLA variable through the base pointer
 ; CHECK: ldr w[[VLA:[0-9]+]], [x[[VLASPTMP]]]
 ; Check epilogue:
 ; Check that stack pointer get restored from frame pointer.
	Show All 24 Lines
 ; CHECK: stp x29, x30, [sp, #-16]!
 ; CHECK: mov x29, sp
 ; Check that space is reserved on the stack for the local variable,
 ; rounded up to a multiple of 16 to keep the stack pointer 16-byte aligned.
 ; CHECK: sub sp, sp, #16
 ; Check correctness of cfi pseudo-instructions
 ; Check correct access to arguments passed on the stack, through frame pointer
 ; CHECK: ldr w[[IARG:[0-9]+]], [x29, #24]
-; CHECK: ldr d[[DARG:[0-9]+]], [x29, #40]
 ; Check correct reservation of 16-byte aligned VLA (size in w0) on stack
-; CHECK: mov w9, w0
-; CHECK: mov x10, sp
-; CHECK: lsl x9, x9, #2
+; CHECK: ubfiz x9, x0, #2, #32
 ; CHECK: add x9, x9, #15
+; CHECK: ldr d[[DARG:[0-9]+]], [x29, #40]
 ; CHECK: and x9, x9, #0x7fffffff0
+; CHECK: mov x10, sp
 ; CHECK: sub x[[VLASPTMP:[0-9]+]], x10, x9
 ; CHECK: mov sp, x[[VLASPTMP]]
 ; Check correct access to local variable, through frame pointer
 ; CHECK: ldur w[[ILOC:[0-9]+]], [x29, #-4]
 ; Check correct accessing of the VLA variable through the base pointer
 ; CHECK: ldr w[[VLA:[0-9]+]], [x[[VLASPTMP]]]
 ; Check epilogue:
 ; Check that stack pointer get restored from frame pointer.
 ; CHECK: mov sp, x29
 ; CHECK: ldp x29, x30, [sp], #16
	Show All 36 Lines
 ; bytes & the base pointer (x19) gets initialized to
 ; this 128-byte aligned area for local variables &
 ; spill slots
 ; CHECK: sub x9, sp, #80
 ; CHECK: and sp, x9, #0xffffffffffffff80
 ; CHECK: mov x19, sp
 ; Check correct access to arguments passed on the stack, through frame pointer
 ; CHECK: ldr w[[IARG:[0-9]+]], [x29, #56]
-; CHECK: ldr d[[DARG:[0-9]+]], [x29, #72]
 ; Check correct reservation of 16-byte aligned VLA (size in w0) on stack
 ; and set-up of base pointer (x19).
-; CHECK: mov w9, w0
-; CHECK: mov x10, sp
-; CHECK: lsl x9, x9, #2
+; CHECK: ubfiz x9, x0, #2, #32
 ; CHECK: add x9, x9, #15
+; CHECK: ldr d[[DARG:[0-9]+]], [x29, #72]
 ; CHECK: and x9, x9, #0x7fffffff0
+; CHECK: mov x10, sp
 ; CHECK: sub x[[VLASPTMP:[0-9]+]], x10, x9
 ; CHECK: mov sp, x[[VLASPTMP]]
 ; Check correct access to local variable, through base pointer
 ; CHECK: ldr w[[ILOC:[0-9]+]], [x19]
 ; CHECK: ldr w[[VLA:[0-9]+]], [x[[VLASPTMP]]]
 ; Check epilogue:
 ; Check that stack pointer get restored from frame pointer.
 ; CHECK: mov sp, x29
	Show All 24 Lines
 ; bytes & the base pointer (x19) gets initialized to
 ; this 128-byte aligned area for local variables &
 ; spill slots
 ; CHECK-MACHO: sub x9, sp, #80
 ; CHECK-MACHO: and sp, x9, #0xffffffffffffff80
 ; CHECK-MACHO: mov x19, sp
 ; Check correct access to arguments passed on the stack, through frame pointer
 ; CHECK-MACHO: ldr w[[IARG:[0-9]+]], [x29, #20]
-; CHECK-MACHO: ldr d[[DARG:[0-9]+]], [x29, #32]
 ; Check correct reservation of 16-byte aligned VLA (size in w0) on stack
 ; and set-up of base pointer (x19).
-; CHECK-MACHO: mov w9, w0
-; CHECK-MACHO: mov x10, sp
-; CHECK-MACHO: lsl x9, x9, #2
+; CHECK-MACHO: ubfiz x9, x0, #2, #32
 ; CHECK-MACHO: add x9, x9, #15
+; CHECK-MACHO: ldr d[[DARG:[0-9]+]], [x29, #32]
 ; CHECK-MACHO: and x9, x9, #0x7fffffff0
+; CHECK-MACHO: mov x10, sp
 ; CHECK-MACHO: sub x[[VLASPTMP:[0-9]+]], x10, x9
 ; CHECK-MACHO: mov sp, x[[VLASPTMP]]
 ; Check correct access to local variable, through base pointer
 ; CHECK-MACHO: ldr w[[ILOC:[0-9]+]], [x19]
 ; CHECK-MACHO: ldr w[[VLA:[0-9]+]], [x[[VLASPTMP]]]
 ; Check epilogue:
 ; Check that stack pointer get restored from frame pointer.
 ; CHECK-MACHO: sub sp, x29, #32
 ; CHECK-MACHO: ldp x29, x30, [sp, #32]
	Show All 28 Lines
 ; bytes & the base pointer (x19) gets initialized to
 ; this 128-byte aligned area for local variables &
 ; spill slots
 ; CHECK: sub x9, sp, #96
 ; CHECK: and sp, x9, #0xffffffffffffff80
 ; CHECK: mov x19, sp
 ; Check correct access to arguments passed on the stack, through frame pointer
 ; CHECK: ldr w[[IARG:[0-9]+]], [x29, #40]
-; CHECK: ldr d[[DARG:[0-9]+]], [x29, #56]
 ; Check correct reservation of 16-byte aligned VLA (size in w0) on stack
 ; and set-up of base pointer (x19).
-; CHECK: mov w9, w0
-; CHECK: mov x10, sp
-; CHECK: lsl x9, x9, #2
+; CHECK: ubfiz x9, x0, #2, #32
 ; CHECK: add x9, x9, #15
+; CHECK: ldr d[[DARG:[0-9]+]], [x29, #56]
 ; CHECK: and x9, x9, #0x7fffffff0
+; CHECK: mov x10, sp
 ; CHECK: sub x[[VLASPTMP:[0-9]+]], x10, x9
 ; CHECK: mov sp, x[[VLASPTMP]]
 ; Check correct access to local variable, through base pointer
 ; CHECK: ldr w[[ILOC:[0-9]+]], [x19]
 ; CHECK: ldr w[[VLA:[0-9]+]], [x[[VLASPTMP]]]
 ; Check epilogue:
 ; Check that stack pointer get restored from frame pointer.
 ; CHECK: mov sp, x29
	Show All 11 Lines
	; bytes & the base pointer (x19) gets initialized to			; bytes & the base pointer (x19) gets initialized to
	; this 128-byte aligned area for local variables &			; this 128-byte aligned area for local variables &
	; spill slots			; spill slots
	; CHECK-MACHO: sub x9, sp, #96			; CHECK-MACHO: sub x9, sp, #96
	; CHECK-MACHO: and sp, x9, #0xffffffffffffff80			; CHECK-MACHO: and sp, x9, #0xffffffffffffff80
	; CHECK-MACHO: mov x19, sp			; CHECK-MACHO: mov x19, sp
	; Check correct access to arguments passed on the stack, through frame pointer			; Check correct access to arguments passed on the stack, through frame pointer
	; CHECK-MACHO: ldr w[[IARG:[0-9]+]], [x29, #20]			; CHECK-MACHO: ldr w[[IARG:[0-9]+]], [x29, #20]
	; CHECK-MACHO: ldr d[[DARG:[0-9]+]], [x29, #32]
	; Check correct reservation of 16-byte aligned VLA (size in w0) on stack			; Check correct reservation of 16-byte aligned VLA (size in w0) on stack
	; and set-up of base pointer (x19).			; and set-up of base pointer (x19).
	; CHECK-MACHO: mov w9, w0			; CHECK-MACHO: ubfiz x9, x0, #2, #32
	; CHECK-MACHO: mov x10, sp
	; CHECK-MACHO: lsl x9, x9, #2
	; CHECK-MACHO: add x9, x9, #15			; CHECK-MACHO: add x9, x9, #15
				; CHECK-MACHO: ldr d[[DARG:[0-9]+]], [x29, #32]
	; CHECK-MACHO: and x9, x9, #0x7fffffff0			; CHECK-MACHO: and x9, x9, #0x7fffffff0
				; CHECK-MACHO: mov x10, sp
	; CHECK-MACHO: sub x[[VLASPTMP:[0-9]+]], x10, x9			; CHECK-MACHO: sub x[[VLASPTMP:[0-9]+]], x10, x9
	; CHECK-MACHO: mov sp, x[[VLASPTMP]]			; CHECK-MACHO: mov sp, x[[VLASPTMP]]
	; Check correct access to local variable, through base pointer			; Check correct access to local variable, through base pointer
	; CHECK-MACHO: ldr w[[ILOC:[0-9]+]], [x19]			; CHECK-MACHO: ldr w[[ILOC:[0-9]+]], [x19]
	; CHECK-MACHO: ldr w[[VLA:[0-9]+]], [x[[VLASPTMP]]]			; CHECK-MACHO: ldr w[[VLA:[0-9]+]], [x[[VLASPTMP]]]
	; Check epilogue:			; Check epilogue:
	; Check that stack pointer get restored from frame pointer.			; Check that stack pointer get restored from frame pointer.
	; CHECK-MACHO: sub sp, x29, #16			; CHECK-MACHO: sub sp, x29, #16
	Show All 27 Lines
	; bytes & the base pointer (x19) gets initialized to			; bytes & the base pointer (x19) gets initialized to
	; this 128-byte aligned area for local variables &			; this 128-byte aligned area for local variables &
	; spill slots			; spill slots
	; CHECK: sub x9, sp, #7, lsl #12			; CHECK: sub x9, sp, #7, lsl #12
	; CHECK: and sp, x9, #0xffffffffffff8000			; CHECK: and sp, x9, #0xffffffffffff8000
	; CHECK: mov x19, sp			; CHECK: mov x19, sp
	; Check correct access to arguments passed on the stack, through frame pointer			; Check correct access to arguments passed on the stack, through frame pointer
	; CHECK: ldr w[[IARG:[0-9]+]], [x29, #40]			; CHECK: ldr w[[IARG:[0-9]+]], [x29, #40]
	; CHECK: ldr d[[DARG:[0-9]+]], [x29, #56]
	; Check correct reservation of 16-byte aligned VLA (size in w0) on stack			; Check correct reservation of 16-byte aligned VLA (size in w0) on stack
	; and set-up of base pointer (x19).			; and set-up of base pointer (x19).
	; CHECK: mov w9, w0			; CHECK: ubfiz x9, x0, #2, #32
	; CHECK: mov x10, sp
	; CHECK: lsl x9, x9, #2
	; CHECK: add x9, x9, #15			; CHECK: add x9, x9, #15
				; CHECK: ldr d[[DARG:[0-9]+]], [x29, #56]
	; CHECK: and x9, x9, #0x7fffffff0			; CHECK: and x9, x9, #0x7fffffff0
				; CHECK: mov x10, sp
	; CHECK: sub x[[VLASPTMP:[0-9]+]], x10, x9			; CHECK: sub x[[VLASPTMP:[0-9]+]], x10, x9
	; CHECK: mov sp, x[[VLASPTMP]]			; CHECK: mov sp, x[[VLASPTMP]]
	; Check correct access to local variable, through base pointer			; Check correct access to local variable, through base pointer
	; CHECK: ldr w[[ILOC:[0-9]+]], [x19]			; CHECK: ldr w[[ILOC:[0-9]+]], [x19]
	; CHECK: ldr w[[VLA:[0-9]+]], [x[[VLASPTMP]]]			; CHECK: ldr w[[VLA:[0-9]+]], [x[[VLASPTMP]]]
	; Check epilogue:			; Check epilogue:
	; Check that stack pointer get restored from frame pointer.			; Check that stack pointer get restored from frame pointer.
	; CHECK: mov sp, x29			; CHECK: mov sp, x29
	Show All 11 Lines
	; bytes & the base pointer (x19) gets initialized to			; bytes & the base pointer (x19) gets initialized to
	; this 128-byte aligned area for local variables &			; this 128-byte aligned area for local variables &
	; spill slots			; spill slots
	; CHECK-MACHO: sub x9, sp, #7, lsl #12			; CHECK-MACHO: sub x9, sp, #7, lsl #12
	; CHECK-MACHO: and sp, x9, #0xffffffffffff8000			; CHECK-MACHO: and sp, x9, #0xffffffffffff8000
	; CHECK-MACHO: mov x19, sp			; CHECK-MACHO: mov x19, sp
	; Check correct access to arguments passed on the stack, through frame pointer			; Check correct access to arguments passed on the stack, through frame pointer
	; CHECK-MACHO: ldr w[[IARG:[0-9]+]], [x29, #20]			; CHECK-MACHO: ldr w[[IARG:[0-9]+]], [x29, #20]
	; CHECK-MACHO: ldr d[[DARG:[0-9]+]], [x29, #32]
	; Check correct reservation of 16-byte aligned VLA (size in w0) on stack			; Check correct reservation of 16-byte aligned VLA (size in w0) on stack
	; and set-up of base pointer (x19).			; and set-up of base pointer (x19).
	; CHECK-MACHO: mov w9, w0			; CHECK-MACHO: ubfiz x9, x0, #2, #32
	; CHECK-MACHO: mov x10, sp
	; CHECK-MACHO: lsl x9, x9, #2
	; CHECK-MACHO: add x9, x9, #15			; CHECK-MACHO: add x9, x9, #15
				; CHECK-MACHO: ldr d[[DARG:[0-9]+]], [x29, #32]
	; CHECK-MACHO: and x9, x9, #0x7fffffff0			; CHECK-MACHO: and x9, x9, #0x7fffffff0
				; CHECK-MACHO: mov x10, sp
	; CHECK-MACHO: sub x[[VLASPTMP:[0-9]+]], x10, x9			; CHECK-MACHO: sub x[[VLASPTMP:[0-9]+]], x10, x9
	; CHECK-MACHO: mov sp, x[[VLASPTMP]]			; CHECK-MACHO: mov sp, x[[VLASPTMP]]
	; Check correct access to local variable, through base pointer			; Check correct access to local variable, through base pointer
	; CHECK-MACHO: ldr w[[ILOC:[0-9]+]], [x19]			; CHECK-MACHO: ldr w[[ILOC:[0-9]+]], [x19]
	; CHECK-MACHO: ldr w[[VLA:[0-9]+]], [x[[VLASPTMP]]]			; CHECK-MACHO: ldr w[[VLA:[0-9]+]], [x[[VLASPTMP]]]
	; Check epilogue:			; Check epilogue:
	; Check that stack pointer get restored from frame pointer.			; Check that stack pointer get restored from frame pointer.
	; CHECK-MACHO: sub sp, x29, #16			; CHECK-MACHO: sub sp, x29, #16
	▲ Show 20 Lines • Show All 61 Lines • Show Last 20 Lines
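
For readers following the `mov w9, w0` / `lsl x9, x9, #2` to `ubfiz x9, x0, #2, #32` changes above, here is a minimal sketch of why the two sequences should agree. It models registers as Python integers; the helper names `mov_w_then_lsl` and `ubfiz` are hypothetical and only mirror the architectural behaviour (a write to a W register zeroes bits 63:32, and UBFIZ copies the low `width` bits of the source to bit position `lsb` with zeros elsewhere).

```python
MASK64 = (1 << 64) - 1

def mov_w_then_lsl(x0: int, shift: int) -> int:
    """Model of `mov w9, w0` followed by `lsl x9, x9, #shift`."""
    x9 = x0 & 0xFFFFFFFF            # W-register write zeroes the upper 32 bits
    return (x9 << shift) & MASK64   # 64-bit logical shift left

def ubfiz(xn: int, lsb: int, width: int) -> int:
    """Model of `ubfiz xd, xn, #lsb, #width` (64-bit form)."""
    field = xn & ((1 << width) - 1)  # take the low `width` bits of the source
    return (field << lsb) & MASK64   # place them at bit `lsb`, zeros elsewhere

# The upper half of x0 is deliberately non-zero in some of these inputs.
for x0 in (0, 1, 0xFFFFFFFF, 0xDEADBEEF12345678, MASK64):
    assert mov_w_then_lsl(x0, 2) == ubfiz(x0, 2, 32)
```

The inputs with garbage in bits 63:32 are the interesting ones: the single `ubfiz` performs the zero-extension itself, so it does not rely on the source register having been cleared beforehand.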

llvm/test/CodeGen/AArch64/aarch64-matrix-umull-smull.ll

	Show First 20 Lines • Show All 131 Lines • ▼ Show 20 Lines

	define void @matrix_mul_double_shuffle(i32 %N, i32* nocapture %C, i16* nocapture readonly %A, i16 %val) {			define void @matrix_mul_double_shuffle(i32 %N, i32* nocapture %C, i16* nocapture readonly %A, i16 %val) {
	; CHECK-LABEL: matrix_mul_double_shuffle:			; CHECK-LABEL: matrix_mul_double_shuffle:
	; CHECK: // %bb.0: // %vector.header			; CHECK: // %bb.0: // %vector.header
	; CHECK-NEXT: and w8, w3, #0xffff			; CHECK-NEXT: and w8, w3, #0xffff
	; CHECK-NEXT: // kill: def $w0 killed $w0 def $x0			; CHECK-NEXT: // kill: def $w0 killed $w0 def $x0
	; CHECK-NEXT: dup v0.4h, w8			; CHECK-NEXT: dup v0.4h, w8
	; CHECK-NEXT: and x8, x0, #0xfffffff8			; CHECK-NEXT: and x8, x0, #0xfffffff8
				; CHECK-NEXT: // kill: def $w0 killed $w0 killed $x0 def $x0
	; CHECK-NEXT: .LBB2_1: // %vector.body			; CHECK-NEXT: .LBB2_1: // %vector.body
	; CHECK-NEXT: // =>This Inner Loop Header: Depth=1			; CHECK-NEXT: // =>This Inner Loop Header: Depth=1
	; CHECK-NEXT: ldrh w9, [x2], #16			; CHECK-NEXT: ldrh w9, [x2], #16
	; CHECK-NEXT: subs x8, x8, #8			; CHECK-NEXT: subs x8, x8, #8
	; CHECK-NEXT: dup v1.4h, w9			; CHECK-NEXT: dup v1.4h, w9
	; CHECK-NEXT: mov w9, w0			; CHECK-NEXT: ubfiz x9, x0, #2, #32
	; CHECK-NEXT: lsl x9, x9, #2
	; CHECK-NEXT: add w0, w0, #8			; CHECK-NEXT: add w0, w0, #8
	; CHECK-NEXT: umull v1.4s, v0.4h, v1.4h			; CHECK-NEXT: umull v1.4s, v0.4h, v1.4h
	; CHECK-NEXT: str q1, [x1, x9]			; CHECK-NEXT: str q1, [x1, x9]
	; CHECK-NEXT: b.ne .LBB2_1			; CHECK-NEXT: b.ne .LBB2_1
	; CHECK-NEXT: // %bb.2: // %for.end12			; CHECK-NEXT: // %bb.2: // %for.end12
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	vector.header:			vector.header:
	%conv4 = zext i16 %val to i32			%conv4 = zext i16 %val to i32
	▲ Show 20 Lines • Show All 578 Lines • Show Last 20 Lines

llvm/test/CodeGen/AArch64/select_cc.ll

	; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
	; RUN: llc < %s -mtriple=aarch64 \| FileCheck %s			; RUN: llc < %s -mtriple=aarch64 \| FileCheck %s

	define i64 @select_ogt_float(float %a, float %b) {			define i64 @select_ogt_float(float %a, float %b) {
	; CHECK-LABEL: select_ogt_float:			; CHECK-LABEL: select_ogt_float:
	; CHECK: // %bb.0: // %entry			; CHECK: // %bb.0: // %entry
	; CHECK-NEXT: fcmp s0, s1			; CHECK-NEXT: fcmp s0, s1
	; CHECK-NEXT: cset w8, gt			; CHECK-NEXT: cset w8, gt
	; CHECK-NEXT: lsl x0, x8, #2			; CHECK-NEXT: ubfiz x0, x8, #2, #32
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	entry:			entry:
	%cc = fcmp ogt float %a, %b			%cc = fcmp ogt float %a, %b
	%sel = select i1 %cc, i64 4, i64 0			%sel = select i1 %cc, i64 4, i64 0
	ret i64 %sel			ret i64 %sel
	}			}

	define i64 @select_ule_float_inverse(float %a, float %b) {			define i64 @select_ule_float_inverse(float %a, float %b) {
	; CHECK-LABEL: select_ule_float_inverse:			; CHECK-LABEL: select_ule_float_inverse:
	; CHECK: // %bb.0: // %entry			; CHECK: // %bb.0: // %entry
	; CHECK-NEXT: fcmp s0, s1			; CHECK-NEXT: fcmp s0, s1
	; CHECK-NEXT: cset w8, gt			; CHECK-NEXT: cset w8, gt
	; CHECK-NEXT: lsl x0, x8, #2			; CHECK-NEXT: ubfiz x0, x8, #2, #32
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	entry:			entry:
	%cc = fcmp ule float %a, %b			%cc = fcmp ule float %a, %b
	%sel = select i1 %cc, i64 0, i64 4			%sel = select i1 %cc, i64 0, i64 4
	ret i64 %sel			ret i64 %sel
	}			}

	define i64 @select_eq_i32(i32 %a, i32 %b) {			define i64 @select_eq_i32(i32 %a, i32 %b) {
	; CHECK-LABEL: select_eq_i32:			; CHECK-LABEL: select_eq_i32:
	; CHECK: // %bb.0: // %entry			; CHECK: // %bb.0: // %entry
	; CHECK-NEXT: cmp w0, w1			; CHECK-NEXT: cmp w0, w1
	; CHECK-NEXT: cset w8, eq			; CHECK-NEXT: cset w8, eq
	; CHECK-NEXT: lsl x0, x8, #2			; CHECK-NEXT: ubfiz x0, x8, #2, #32
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	entry:			entry:
	%cc = icmp eq i32 %a, %b			%cc = icmp eq i32 %a, %b
	%sel = select i1 %cc, i64 4, i64 0			%sel = select i1 %cc, i64 4, i64 0
	ret i64 %sel			ret i64 %sel
	}			}

	define i64 @select_ne_i32_inverse(i32 %a, i32 %b) {			define i64 @select_ne_i32_inverse(i32 %a, i32 %b) {
	; CHECK-LABEL: select_ne_i32_inverse:			; CHECK-LABEL: select_ne_i32_inverse:
	; CHECK: // %bb.0: // %entry			; CHECK: // %bb.0: // %entry
	; CHECK-NEXT: cmp w0, w1			; CHECK-NEXT: cmp w0, w1
	; CHECK-NEXT: cset w8, eq			; CHECK-NEXT: cset w8, eq
	; CHECK-NEXT: lsl x0, x8, #2			; CHECK-NEXT: ubfiz x0, x8, #2, #32
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
	entry:			entry:
	%cc = icmp ne i32 %a, %b			%cc = icmp ne i32 %a, %b
	%sel = select i1 %cc, i64 0, i64 4			%sel = select i1 %cc, i64 0, i64 4
	ret i64 %sel			ret i64 %sel
	}			}

	define <2 x double> @select_olt_load_cmp(<2 x double> %a, <2 x float>* %src) {			define <2 x double> @select_olt_load_cmp(<2 x double> %a, <2 x float>* %src) {
	Show All 29 Lines
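
The `lsl` to `ubfiz` changes in this file relate to the review remark that "lsl is actually ubfiz in disguise". As a rough illustration, both mnemonics are aliases of `UBFM`, and the sketch below computes the `UBFM` immediates each alias maps to in the 64-bit form. The function names are hypothetical; the formulas follow the alias definitions in the Arm architecture reference as I understand them.

```python
def lsl_as_ubfm(shift: int, regsize: int = 64) -> tuple[int, int]:
    """(immr, imms) for `lsl xd, xn, #shift`."""
    return ((-shift) % regsize, regsize - 1 - shift)

def ubfiz_as_ubfm(lsb: int, width: int, regsize: int = 64) -> tuple[int, int]:
    """(immr, imms) for `ubfiz xd, xn, #lsb, #width`."""
    return ((-lsb) % regsize, width - 1)

print(lsl_as_ubfm(2))        # (62, 61)
print(ubfiz_as_ubfm(2, 32))  # (62, 31)
print(ubfiz_as_ubfm(2, 62))  # (62, 61), the same encoding as `lsl #2`
```

Under this model the old and new instructions differ only in the `imms` field, that is, in how many source bits are kept. Since `cset w8, ...` writes a W register and therefore leaves bits 63:32 of `x8` zero, keeping 32 or 62 source bits gives the same result in these tests.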

llvm/test/CodeGen/AArch64/shrink-wrapping-vla.ll

	Show First 20 Lines • Show All 80 Lines • ▼ Show 20 Lines
	; CHECK-NEXT: .cfi_def_cfa_offset 16			; CHECK-NEXT: .cfi_def_cfa_offset 16
	; CHECK-NEXT: mov x29, sp			; CHECK-NEXT: mov x29, sp
	; CHECK-NEXT: .cfi_def_cfa w29, 16			; CHECK-NEXT: .cfi_def_cfa w29, 16
	; CHECK-NEXT: .cfi_offset w30, -8			; CHECK-NEXT: .cfi_offset w30, -8
	; CHECK-NEXT: .cfi_offset w29, -16			; CHECK-NEXT: .cfi_offset w29, -16


	; VLA allocation			; VLA allocation
	; CHECK: mov [[X2:x[0-9]+]], sp			; CHECK: ubfiz x8, x0, #2, #32
				; CHECK: mov x9, sp
				; CHECK: add x8, x8, #15
	; CHECK: mov [[SAVE:x[0-9]+]], sp			; CHECK: mov [[SAVE:x[0-9]+]], sp
	; CHECK: add [[X1:x[0-9]+]], [[X1]], #15			; CHECK: and [[X1:x[0-9]+]], [[X1]], #0x7fffffff0
	; CHECK: and [[X1]], [[X1]], #0x7fffffff0
	; Saving the SP via llvm.stacksave()			; Saving the SP via llvm.stacksave()
	; CHECK: sub [[X2]], [[X2]], [[X1]]			; CHECK: sub [[X1]], [[X2:x[0-9]+]], [[X1]]

	; The next instruction comes from llvm.stackrestore()			; The next instruction comes from llvm.stackrestore()
	; CHECK: mov sp, [[SAVE]]			; CHECK: mov sp, [[SAVE]]
	; Epilogue			; Epilogue
	; CHECK-NEXT: mov sp, x29			; CHECK-NEXT: mov sp, x29
	; CHECK-NEXT: .cfi_def_cfa wsp, 16			; CHECK-NEXT: .cfi_def_cfa wsp, 16
	; CHECK-NEXT: ldp x29, x30, [sp], #16			; CHECK-NEXT: ldp x29, x30, [sp], #16
	; CHECK-NEXT: .cfi_def_cfa_offset 0			; CHECK-NEXT: .cfi_def_cfa_offset 0
	; CHECK-NEXT: .cfi_restore w30			; CHECK-NEXT: .cfi_restore w30
	; CHECK-NEXT: .cfi_restore w29			; CHECK-NEXT: .cfi_restore w29
	; CHECK-NEXT: ret			; CHECK-NEXT: ret
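
As a reading aid for the VLA allocation checks in this file and in aarch64-dynamic-stack-layout.ll, the following sketch models the arithmetic the checked instructions perform. It assumes 4-byte elements (implied by the shift of 2) and the name `vla_alloc` is hypothetical; it is not code from the patch.

```python
def vla_alloc(sp: int, n_w0: int) -> int:
    """Return the new stack pointer after reserving a 16-byte aligned VLA."""
    size = (n_w0 & 0xFFFFFFFF) << 2  # ubfiz x8, x0, #2, #32 : zext(n) * 4
    size += 15                       # add   x8, x8, #15     : round up ...
    size &= 0x7FFFFFFF0              # and   x8, x8, #0x7fffffff0 : ... to 16
    return sp - size                 # sub   xN, x9, x8      : grow stack down

print(hex(vla_alloc(0x1000, 0)))  # 0x1000
print(hex(vla_alloc(0x1000, 1)))  # 0xff0
print(hex(vla_alloc(0x1000, 5)))  # 0xfe0
```

The `add ..., #15` step is what turns the mask into a round-up rather than a round-down, which is why dropping its `CHECK` line would weaken what the test validates. The mask stops at bit 34 because a zero-extended 32-bit count times 4, plus 15, cannot exceed 35 bits.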

llvm/test/CodeGen/AArch64/tbl-loops.ll

	Show First 20 Lines • Show All 145 Lines • ▼ Show 20 Lines
	; CHECK-LABEL: loop2:			; CHECK-LABEL: loop2:
	; CHECK: // %bb.0: // %entry			; CHECK: // %bb.0: // %entry
	; CHECK-NEXT: subs w8, w2, #1			; CHECK-NEXT: subs w8, w2, #1
	; CHECK-NEXT: b.lt .LBB1_7			; CHECK-NEXT: b.lt .LBB1_7
	; CHECK-NEXT: // %bb.1: // %for.body.preheader			; CHECK-NEXT: // %bb.1: // %for.body.preheader
	; CHECK-NEXT: cmp w8, #2			; CHECK-NEXT: cmp w8, #2
	; CHECK-NEXT: b.ls .LBB1_4			; CHECK-NEXT: b.ls .LBB1_4
	; CHECK-NEXT: // %bb.2: // %vector.memcheck			; CHECK-NEXT: // %bb.2: // %vector.memcheck
	; CHECK-NEXT: lsl x9, x8, #1			; CHECK-NEXT: ubfiz x9, x8, #1, #32
	; CHECK-NEXT: add x9, x9, #2			; CHECK-NEXT: add x9, x9, #2
	; CHECK-NEXT: add x10, x1, x9, lsl #2			; CHECK-NEXT: add x10, x1, x9, lsl #2
	; CHECK-NEXT: cmp x10, x0			; CHECK-NEXT: cmp x10, x0
	; CHECK-NEXT: b.ls .LBB1_8			; CHECK-NEXT: b.ls .LBB1_8
	; CHECK-NEXT: // %bb.3: // %vector.memcheck			; CHECK-NEXT: // %bb.3: // %vector.memcheck
	; CHECK-NEXT: add x9, x0, x9			; CHECK-NEXT: add x9, x0, x9
	; CHECK-NEXT: cmp x9, x1			; CHECK-NEXT: cmp x9, x1
	; CHECK-NEXT: b.ls .LBB1_8			; CHECK-NEXT: b.ls .LBB1_8
	▲ Show 20 Lines • Show All 367 Lines • ▼ Show 20 Lines
	; CHECK-LABEL: loop4:			; CHECK-LABEL: loop4:
	; CHECK: // %bb.0: // %entry			; CHECK: // %bb.0: // %entry
	; CHECK-NEXT: subs w8, w2, #1			; CHECK-NEXT: subs w8, w2, #1
	; CHECK-NEXT: b.lt .LBB3_7			; CHECK-NEXT: b.lt .LBB3_7
	; CHECK-NEXT: // %bb.1: // %for.body.preheader			; CHECK-NEXT: // %bb.1: // %for.body.preheader
	; CHECK-NEXT: cmp w8, #2			; CHECK-NEXT: cmp w8, #2
	; CHECK-NEXT: b.ls .LBB3_4			; CHECK-NEXT: b.ls .LBB3_4
	; CHECK-NEXT: // %bb.2: // %vector.memcheck			; CHECK-NEXT: // %bb.2: // %vector.memcheck
	; CHECK-NEXT: lsl x9, x8, #2			; CHECK-NEXT: ubfiz x9, x8, #2, #32
	; CHECK-NEXT: add x9, x9, #4			; CHECK-NEXT: add x9, x9, #4
	; CHECK-NEXT: add x10, x1, x9, lsl #2			; CHECK-NEXT: add x10, x1, x9, lsl #2
	; CHECK-NEXT: cmp x10, x0			; CHECK-NEXT: cmp x10, x0
	; CHECK-NEXT: b.ls .LBB3_8			; CHECK-NEXT: b.ls .LBB3_8
	; CHECK-NEXT: // %bb.3: // %vector.memcheck			; CHECK-NEXT: // %bb.3: // %vector.memcheck
	; CHECK-NEXT: add x9, x0, x9			; CHECK-NEXT: add x9, x0, x9
	; CHECK-NEXT: cmp x9, x1			; CHECK-NEXT: cmp x9, x1
	; CHECK-NEXT: b.ls .LBB3_8			; CHECK-NEXT: b.ls .LBB3_8
	▲ Show 20 Lines • Show All 212 Lines • Show Last 20 Lines


[AArch64][CodeGen] Fold the mov and lsl into ubfiz (Closed, Public)

Diff 459087

llvm/lib/Target/AArch64/AArch64InstrInfo.td

llvm/test/CodeGen/AArch64/aarch64-dynamic-stack-layout.ll

llvm/test/CodeGen/AArch64/aarch64-matrix-umull-smull.ll

llvm/test/CodeGen/AArch64/select_cc.ll

llvm/test/CodeGen/AArch64/shrink-wrapping-vla.ll

llvm/test/CodeGen/AArch64/tbl-loops.ll
