Differential D42100
Fix codegen of stores of vectors with non byte-sized elements
Authored by jonpa on Jan 16 2018, 4:18 AM.

Details
https://bugs.llvm.org/show_bug.cgi?id=35520
This is completely broken currently.
Comment Actions

First attempt, which seems to work, sort of:

Most failing tests regenerated (as they had already been generated previously), except for two tests:

Failing Tests (2):
    LLVM :: CodeGen/SystemZ/vec-move-17.ll
    LLVM :: CodeGen/X86/pr32108.ll

During these tests, an assert triggers in SelectionDAG::computeKnownBits(). It seems that somewhere there is a mixup of integer widths where identical widths are assumed:

(gdb) p Known
$92 = (llvm::KnownBits &) @0x3ffffffb930: {Zero = {static WORD_MAX = 18446744073709551615, U = {VAL = 4294967294, pVal = 0xfffffffe}, BitWidth = 32}, One = {static WORD_MAX = 18446744073709551615, U = {VAL = 1, pVal = 0x1}, BitWidth = 32}}
(gdb) p Known2
$93 = {Zero = {static WORD_MAX = 18446744073709551615, U = {VAL = 0, pVal = 0x0}, BitWidth = 64}, One = {static WORD_MAX = 18446744073709551615, U = {VAL = 0, pVal = 0x0}, BitWidth = 64}}

// Output known-1 bits are only known if set in both the LHS & RHS.
Known.One &= Known2.One;

->

APInt &operator&=(const APInt &RHS) {
  assert(BitWidth == RHS.BitWidth && "Bit widths must be the same");

I am guessing that my new code is probably missing some kind of flag, but I am not sure exactly what the fix is... (SystemZ/vec-move-17.ll was previously just supposed to compile, but now there should probably be some checks as well once it actually compiles with the patch.)

The tests I have gathered so far are currently in CodeGen/Generic/store_nonbytesized_vecs.ll. They seem to be correctly handled by CodeGen, but I am hoping that you will help verify this, and also suggest where this test should be placed: in SystemZ, run just for that target, or in Generic, run by multiple targets, perhaps? Currently, I just added the output generated by update_llc_test_checks.py for SystemZ.

Comment Actions

I don't see a change to DAGTypeLegalizer::SplitVecOp_STORE? (I think you're currently miscompiling the <8 x i31> testcase.)
We're generating pretty terrible code for some of the x86 testcases, but you obviously don't need to fix that here. For the assertion failure, it looks like you're uncovering some existing bug in computeKnownBits; I can't think of any way your changes could cause that issue. Maybe some bug in the bitcast handling code?

Comment Actions

I found that if I extract the element into its native width and then truncate, everything works. I then had to narrow this handling to integer types, as I saw X86 failing with v4f80 vectors, since TRUNCATE is only for integers.

About the problem in computeKnownBits: computeKnownBits() gets called with Known of 32 bits to analyze an EXTRACT_VECTOR_ELT of a v2i64 vector. Therefore Known becomes 64 bits wide, and the calling computeKnownBits() crashes on Known.One &= Known2.One;, as Known is 32 bits. I fixed this by

diff --git a/lib/CodeGen/SelectionDAG/SelectionDAG.cpp b/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
index 837173e..9f98bdd 100644
--- a/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+++ b/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
@@ -2805,6 +2805,10 @@ void SelectionDAG::computeKnownBits(SDValue Op, KnownBits &Known,
     }
     if (BitWidth > EltBitWidth)
       Known = Known.zext(BitWidth);
+    // If the extracted element is wider than BitWidth, truncate to match
+    // the caller's expectations.
+    if (Known.getBitWidth() > BitWidth)
+      Known = Known.trunc(BitWidth);
     break;
   }
   case ISD::INSERT_VECTOR_ELT: {

which made the DAGCombiner work for the test case (before I tried the explicit truncate, which worked without this). Not sure if this patch should still go in, or if there should be an assert that an EXTRACT_VECTOR_ELT must not implicitly truncate?
You're right, I added the same handling there. Tests updated. Comment Actions
There should be an assert, yes.
I think all you need for non-integer types is a bitcast to an integer type after the EXTRACT_VECTOR_ELT. Comment Actions
Err, actually, wait a sec... why are we triggering this path for v4f80? As long as the element size in bits is a multiple of 8, we can use separate stores for each element. (Whether the element size is a power of two isn't really important.)

Comment Actions

Where should this assert be located? Should it be asserted at the point of node creation, or just at the end during isel? I don't see any asserts in the getNode() method. (We could make this a separate patch, I guess.)
OK. I asked about this previously because I was worried that when storing e.g. a v3i24, the i24 element stores might be promoted / split / ... somehow, so it seemed safest to just build the integer.
So then we trust that the original loop in scalarizeVectorStore() will do the job, which seems to be the case. I changed the check to simply call isByteSized() on the vector element type. If we get a floating-point element that is not byte-sized, the assert for ISD::TRUNCATE will trigger; since there are no such types in the language reference, I skipped that check. Tests still pass, unmodified from before.

Comment Actions
Really? I see a bunch. (getNode() is overloaded; make sure you're looking at the version for nodes with two operands.) But anyway, that can be a separate patch.
Comment Actions

Aah, sorry, now I see it. But actually, it says:

if (VT != Elt.getValueType())
  // If the vector element type is not legal, the BUILD_VECTOR operands
  // are promoted and implicitly truncated, and the result implicitly
  // extended. Make that explicit here.
  Elt = getAnyExtOrTrunc(Elt, DL, VT);

So it is not really true that there could be a general assert, since this is actually expected, no? I don't think I have a patch for this, then.
I updated the check in SplitVecOp_STORE to

+  // Scalarize if elements and the split halves are not byte-sized.

which seems to work. Two stores will be produced, and the second store will be at the byte offset given by the size of the first store, which works as long as that first store is byte-sized. Two X86 tests updated (test/CodeGen/X86/avx512-mask-op.ll, test/CodeGen/X86/vector-compare-results.ll).

Comment Actions

LGTM
That code is specifically dealing with an EXTRACT_VECTOR_ELT from a BUILD_VECTOR; in that case, before type legalization is finished, you could have a mix of types: either the BUILD_VECTOR operand type could be wider than the EXTRACT_VECTOR_ELT result, or the EXTRACT_VECTOR_ELT result type could be wider than the BUILD_VECTOR operand. But they should both be at least as wide as the vector element type.
Comment Actions Thanks for review! Committed as trunk@323042.
Comment Actions

I've been talking to Jonas about the big-endian question, and I now think that there may actually be open questions about what the layout should be. In general, these are the same questions that need to be answered when laying out a struct type including bitfields. However, for bitfields you usually have a platform ABI that specifies the precise layout. But these vector types are LLVM IR types that do not match any high-level language type directly, so they really should be defined by LLVM. I was not able to find any such definition in the IR documentation -- am I missing something here?

Starting with the simple case of <8 x i1>, the question that needs to be answered is: if you convert this to an i8, either by storing it to memory and loading the byte, or else by doing a bitcast, where in the resulting i8 can you find the elements 0 ... 7 of the original <8 x i1>? And specifically, should the answer to that question depend on whether the system is big- or little-endian?

Note that a bitfield implementing a similar type:

struct { char b0 : 1; char b1 : 1; ... char b7 : 1 };

... here, the platform ABI defines where to find the bits b0 .. b7, and this does actually differ. On (most) little-endian machines, b0 is placed in the least-significant bit of the byte in memory, while on (most) big-endian machines, b0 is placed in the most-significant bit. [GCC in fact also supports a few rare platforms where this doesn't hold. I think those are not supported by LLVM today.]

Should LLVM define the layout of <8 x i1> the same way? Or should it impose its own layout, which may be defined independently of byte order? (Again, maybe this has been defined and I just don't know where to find this information?)

Once we've decided the simple case, we can probably extend the definition to more complex cases like <7 x i1> or <2 x i3> in a straightforward manner...

Comment Actions

We never really formally defined the encoding of vectors, I think.
The DataLayout has computed the size in bits of vector types as element_size_in_bits * num_elements for a long time, though (so there's no padding between elements), and we've generally had the rule that the lowest element has the lowest address, since that's generally how vector units work on both big-endian and little-endian targets. Given those two constraints, there's only one possible rule for non-power-of-two byte-sized elements: <4 x i24> should have the same layout as <{ i24, i24, i24, i24 }> (a packed struct). And there's only one way to extrapolate that rule to non-byte-sized element types: essentially, if you bitcast the vector to an integer, the first element should be in the high bit(s).

Comment Actions

Yes, that makes sense to me. It also agrees with the layout of bitfields on (all currently supported) big-endian platforms, so that's another argument for it.

Comment Actions

This patch follows your suggestion by reversing the order of the vector elements in the built integer for big-endian targets. SystemZ tests updated.

Comment Actions

Could you open a new review request instead of reusing the existing one with a new patch? This is going to get confusing.
I just realized there's a problem here. I guess I've spent too much time on little-endian targets.
Specifically, the problem is that this code will store the elements in the wrong order on big-endian targets. This will lead to inconsistent, and generally messy, results.
I think you can solve this by reversing the loop on big-endian targets, so that the high element of the vector is in the low bits of the integer.