This is an archive of the discontinued LLVM Phabricator instance.

Differential D52816

[AArch64] Create proper memoperand for multi-vector stores
Closed, Public

Authored by greened on Oct 2 2018, 6:20 PM.

Details

Reviewers

niravd
trong
sdesmalen
jfb
eli.friedman
mstorsjo
t.p.northover
javed.absar

Commits

rG3e89fa8e088d: [AArch64] Create proper memoperand for multi-vector stores
rG53e869da7d66: [AArch64] Create proper memoperand for multi-vector stores
rL345631: [AArch64] Create proper memoperand for multi-vector stores
rL345315: [AArch64] Create proper memoperand for multi-vector stores

Summary

Include all of the store's source vector operands when creating the MachineMemOperand. Previously, we were missing the first operand, making the store size seem smaller than it really is.
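The effect of the starting index can be reproduced without LLVM. The following is a minimal sketch, not the lowering code itself: `Arg` and `storeSizeInBytes` are hypothetical stand-ins that model only the two properties the loop in getTgtMemIntrinsic inspects (whether an operand is a vector, and its size in bits).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one intrinsic call operand. Only the two
// properties the size computation looks at are modelled.
struct Arg {
  bool isVector;
  uint64_t sizeInBits;
};

// Sums the leading vector operands in 64-bit units, stopping at the first
// non-vector operand, and returns the resulting store size in bytes.
// `first` is the starting operand index: 1 before the patch, 0 after it.
uint64_t storeSizeInBytes(const std::vector<Arg> &args, unsigned first) {
  uint64_t numElts = 0;
  for (unsigned i = first; i < args.size(); ++i) {
    if (!args[i].isVector)
      break;
    numElts += args[i].sizeInBits / 64;
  }
  return numElts * 8; // the memory type is a vector of i64 elements
}
```

For a store of four 128-bit vectors followed by the address operand, starting at index 1 yields 48 bytes and starting at index 0 yields 64 bytes.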

Diff Detail

Repository: rL LLVM

Event Timeline

greened created this revision. Oct 2 2018, 6:20 PM

Herald added a reviewer: javed.absar. Oct 2 2018, 6:20 PM

Herald added subscribers: llvm-commits, kristof.beyls.

I don't have a good testcase for this, unfortunately. I discovered it while investigating strange aliasing results in a very large codebase. I'm not entirely sure how to write a synthetic test for machine-level alias analysis. I can write the IR, of course, but I don't know how to run alias analysis on it and emit useful information to compare with FileCheck. The alias analysis has to happen after isel. Is anyone aware of other tests that do something like that?

I don't have a complete solution for you, but for reducing test cases from larger code bases I've had a lot of success with creduce and am a big fan of it. It might also be interesting to see whether the original author of this line left context in the commit message for _why_ this loop starts at 1 rather than 0. With a git checkout, you can find the commit that last touched this line (which isn't necessarily the commit that added it) via git blame <file> -L <line number>. Then, with the sha of that commit, run git show <sha>; it might have more information.

nickdesaulniers removed a reviewer: nickdesaulniers. Oct 3 2018, 9:38 AM

nickdesaulniers added a subscriber: nickdesaulniers.

You could just directly test that the computed memory operand is correct: write a test that runs "llc -stop-after=isel" and check the MIR. Without your patch, you should see something like "ST1Fourv2d killed %5, %4 :: (store 48 into %ir.addr, align 64)"; with your patch, that will be "store 64".

I don't think it's necessary to write a test where the final assembly is actually affected by the incorrect memory operand.

In D52816#1254107, @efriedma wrote:

You could just directly test that the computed memory operand is correct: write a test that runs "llc -stop-after=isel" and check the MIR. Without your patch, you should see something like "ST1Fourv2d killed %5, %4 :: (store 48 into %ir.addr, align 64)"; with your patch, that will be "store 64".

Ah yeah, that's a great idea. I'll do that. Thanks!

In D52816#1253887, @nickdesaulniers wrote:

With a git checkout, you can find the commit that last touched this (which isn't necessarily the commit that added it) via git blame <file> -L <line number>. Then with that sha of the commit, git show <sha>. Might have more information.

I checked that but didn't find anything enlightening. My guess is that at one time the intrinsics were defined to take the address as the first operand but then later got switched to make the address the last operand.

Added testcase.

efriedma added inline comments. Oct 5 2018, 11:21 AM

test/CodeGen/AArch64/multi-vector-store-size.ll
Line 69 (on Diff #168430): I'm pretty sure these numbers aren't right. You don't have to fix it here, necessarily; being overconservative isn't really a big deal. But it would be nice to have a note in the testcase.

greened added inline comments. Oct 8 2018, 6:11 PM

test/CodeGen/AArch64/multi-vector-store-size.ll
Line 69 (on Diff #168430): Yeah, there's an explicit comment in the lowering code about being conservative. I'll add a similar comment to the test.
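To illustrate what "conservative" means here, the sketch below contrasts the size recorded in the memory operand with the bytes a lane store actually writes. It assumes the remark concerns the lane variants (st2lane, st3lane, st4lane), which store a single element from each source vector; the helper names are hypothetical and not part of LLVM.

```cpp
#include <cassert>
#include <cstdint>

// Size recorded by the conservative lowering: every source vector is
// counted in full, regardless of how much of it is written.
constexpr uint64_t conservativeBytes(unsigned numVecs, unsigned vecBits) {
  return static_cast<uint64_t>(numVecs) * (vecBits / 8);
}

// Bytes a lane store writes: one element from each source vector.
constexpr uint64_t laneStoreBytes(unsigned numVecs, unsigned eltBits) {
  return static_cast<uint64_t>(numVecs) * (eltBits / 8);
}

// st2lane on two <4 x float> vectors: 32 bytes recorded, 8 bytes written.
static_assert(conservativeBytes(2, 128) == 32, "recorded size");
static_assert(laneStoreBytes(2, 32) == 8, "bytes actually written");
```

Overstating the size is safe for alias analysis because it can only make two accesses look more likely to overlap, never less.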

Added comments about conservative store sizes.

greened marked an inline comment as done. Oct 22 2018, 6:58 AM

LGTM modulo minor typo.

test/CodeGen/AArch64/multi-vector-store-size.ll
Line 73 (on Diff #170405): entire*

This revision is now accepted and ready to land. Oct 22 2018, 7:21 AM

Closed by commit rL345315: [AArch64] Create proper memoperand for multi-vector stores (authored by greened). Oct 25 2018, 2:13 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path: llvm/trunk/lib/Target/AArch64/AArch64ISelLowering.cpp
Size: 2 lines

Path: llvm/trunk/test/CodeGen/AArch64/multi-vector-store-size.ll
Size: 164 lines

Diff 171196

llvm/trunk/lib/Target/AArch64/AArch64ISelLowering.cpp

@@ bool AArch64TargetLowering::getTgtMemIntrinsic(IntrinsicInfo &Info,
   case Intrinsic::aarch64_neon_st1x3:
   case Intrinsic::aarch64_neon_st1x4:
   case Intrinsic::aarch64_neon_st2lane:
   case Intrinsic::aarch64_neon_st3lane:
   case Intrinsic::aarch64_neon_st4lane: {
     Info.opc = ISD::INTRINSIC_VOID;
     // Conservatively set memVT to the entire set of vectors stored.
     unsigned NumElts = 0;
-    for (unsigned ArgI = 1, ArgE = I.getNumArgOperands(); ArgI < ArgE; ++ArgI) {
+    for (unsigned ArgI = 0, ArgE = I.getNumArgOperands(); ArgI < ArgE; ++ArgI) {
       Type *ArgTy = I.getArgOperand(ArgI)->getType();
       if (!ArgTy->isVectorTy())
         break;
       NumElts += DL.getTypeSizeInBits(ArgTy) / 64;
     }
     Info.memVT = EVT::getVectorVT(I.getType()->getContext(), MVT::i64, NumElts);
     Info.ptrVal = I.getArgOperand(I.getNumArgOperands() - 1);
     Info.offset = 0;

llvm/trunk/test/CodeGen/AArch64/multi-vector-store-size.ll

; RUN: llc -mtriple=aarch64-linux-gnu -stop-after=isel < %s | FileCheck %s

declare void @llvm.aarch64.neon.st2.v4f32.p0f32(<4 x float>, <4 x float>, float*)
declare void @llvm.aarch64.neon.st3.v4f32.p0f32(<4 x float>, <4 x float>, <4 x float>, float*)
declare void @llvm.aarch64.neon.st4.v4f32.p0f32(<4 x float>, <4 x float>, <4 x float>, <4 x float>, float*)

declare void @llvm.aarch64.neon.st1x2.v4f32.p0f32(<4 x float>, <4 x float>, float*)
declare void @llvm.aarch64.neon.st1x3.v4f32.p0f32(<4 x float>, <4 x float>, <4 x float>, float*)
declare void @llvm.aarch64.neon.st1x4.v4f32.p0f32(<4 x float>, <4 x float>, <4 x float>, <4 x float>, float*)

declare void @llvm.aarch64.neon.st2lane.v4f32.p0f32(<4 x float>, <4 x float>, i64, float*)
declare void @llvm.aarch64.neon.st3lane.v4f32.p0f32(<4 x float>, <4 x float>, <4 x float>, i64, float*)
declare void @llvm.aarch64.neon.st4lane.v4f32.p0f32(<4 x float>, <4 x float>, <4 x float>, <4 x float>, i64, float*)

define void @addstx(float* %res, <4 x float>* %a, <4 x float>* %b, <4 x float>* %c, <4 x float>* %d) {
  %al = load <4 x float>, <4 x float>* %a
  %bl = load <4 x float>, <4 x float>* %b
  %cl = load <4 x float>, <4 x float>* %c
  %dl = load <4 x float>, <4 x float>* %d

  %ar = fadd <4 x float> %al, %bl
  %br = fadd <4 x float> %bl, %cl
  %cr = fadd <4 x float> %cl, %dl
  %dr = fadd <4 x float> %dl, %al

; The sizes below are conservative. AArch64TargetLowering
; conservatively assumes the entire vector is stored.
  tail call void @llvm.aarch64.neon.st2.v4f32.p0f32(<4 x float> %ar, <4 x float> %br, float* %res)
; CHECK: ST2Twov4s {{.*}} :: (store 32 {{.*}})
  tail call void @llvm.aarch64.neon.st3.v4f32.p0f32(<4 x float> %ar, <4 x float> %br, <4 x float> %cr, float* %res)
; CHECK: ST3Threev4s {{.*}} :: (store 48 {{.*}})
  tail call void @llvm.aarch64.neon.st4.v4f32.p0f32(<4 x float> %ar, <4 x float> %br, <4 x float> %cr, <4 x float> %dr, float* %res)
; CHECK: ST4Fourv4s {{.*}} :: (store 64 {{.*}})

  ret void
}

define void @addst1x(float* %res, <4 x float>* %a, <4 x float>* %b, <4 x float>* %c, <4 x float>* %d) {
  %al = load <4 x float>, <4 x float>* %a
  %bl = load <4 x float>, <4 x float>* %b
  %cl = load <4 x float>, <4 x float>* %c
  %dl = load <4 x float>, <4 x float>* %d

  %ar = fadd <4 x float> %al, %bl
  %br = fadd <4 x float> %bl, %cl
  %cr = fadd <4 x float> %cl, %dl
  %dr = fadd <4 x float> %dl, %al

; The sizes below are conservative. AArch64TargetLowering
; conservatively assumes the entire vector is stored.
  tail call void @llvm.aarch64.neon.st1x2.v4f32.p0f32(<4 x float> %ar, <4 x float> %br, float* %res)
; CHECK: ST1Twov4s {{.*}} :: (store 32 {{.*}})
  tail call void @llvm.aarch64.neon.st1x3.v4f32.p0f32(<4 x float> %ar, <4 x float> %br, <4 x float> %cr, float* %res)
; CHECK: ST1Threev4s {{.*}} :: (store 48 {{.*}})
  tail call void @llvm.aarch64.neon.st1x4.v4f32.p0f32(<4 x float> %ar, <4 x float> %br, <4 x float> %cr, <4 x float> %dr, float* %res)
; CHECK: ST1Fourv4s {{.*}} :: (store 64 {{.*}})

  ret void
}

define void @addstxlane(float* %res, <4 x float>* %a, <4 x float>* %b, <4 x float>* %c, <4 x float>* %d) {
  %al = load <4 x float>, <4 x float>* %a
  %bl = load <4 x float>, <4 x float>* %b
  %cl = load <4 x float>, <4 x float>* %c
  %dl = load <4 x float>, <4 x float>* %d

  %ar = fadd <4 x float> %al, %bl
  %br = fadd <4 x float> %bl, %cl
  %cr = fadd <4 x float> %cl, %dl
  %dr = fadd <4 x float> %dl, %al

; The sizes below are conservative. AArch64TargetLowering
; conservatively assumes the entire vector is stored.
  tail call void @llvm.aarch64.neon.st2lane.v4f32.p0f32(<4 x float> %ar, <4 x float> %br, i64 1, float* %res)
; CHECK: ST2i32 {{.*}} :: (store 32 {{.*}})
  tail call void @llvm.aarch64.neon.st3lane.v4f32.p0f32(<4 x float> %ar, <4 x float> %br, <4 x float> %cr, i64 1, float* %res)
; CHECK: ST3i32 {{.*}} :: (store 48 {{.*}})
  tail call void @llvm.aarch64.neon.st4lane.v4f32.p0f32(<4 x float> %ar, <4 x float> %br, <4 x float> %cr, <4 x float> %dr, i64 1, float* %res)
; CHECK: ST4i32 {{.*}} :: (store 64 {{.*}})

  ret void
}