This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombine] Handle big endian correctly in CombineConsecutiveLoads
ClosedPublic

Authored by bjope on Nov 24 2017, 9:21 AM.

Details

Reviewers

niravd
hfinkel

Commits

rG823b299fbce7: [DAGCombine] Handle big endian correctly in CombineConsecutiveLoads
rL319771: [DAGCombine] Handle big endian correctly in CombineConsecutiveLoads

Summary

Found, during code inspection, that there was a bug in
DAGCombiner::CombineConsecutiveLoads for big-endian targets.

A BUILD_PAIR always has the least significant bits of
the composite value in element 0. So when checking for
consecutive loads on big-endian targets, we should check
that the load for elt 1 is at the lower address and the load
for elt 0 is at the higher address.
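
The BUILD_PAIR convention described above can be illustrated with a small standalone model. This is only a sketch, not LLVM code: `loadInt`, `buildPair`, and `pairEqualsWideLoad` are hypothetical helpers that mimic a target load with a chosen byte order and the BUILD_PAIR semantics (element 0 holds the least significant half).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model of a target load: read `size` bytes starting at
// mem[addr] and interpret them with the given byte order.
uint64_t loadInt(const uint8_t *mem, size_t addr, size_t size, bool bigEndian) {
  assert(size >= 1 && size <= 8);
  uint64_t value = 0;
  for (size_t i = 0; i < size; ++i) {
    // Big endian: the byte at the lowest address is the most significant.
    // Little endian: the byte at the highest address is the most significant.
    size_t idx = bigEndian ? addr + i : addr + size - 1 - i;
    value = (value << 8) | mem[idx];
  }
  return value;
}

// Model of BUILD_PAIR for two i32 halves: element 0 holds the least
// significant bits, element 1 the most significant bits.
uint64_t buildPair(uint32_t elt0, uint32_t elt1) {
  return (static_cast<uint64_t>(elt1) << 32) | elt0;
}

// Returns true if build_pair(load4(elt0Addr), load4(elt1Addr)) yields the
// same value as one 8-byte load from the lower of the two addresses.
bool pairEqualsWideLoad(const uint8_t *mem, size_t elt0Addr, size_t elt1Addr,
                        bool bigEndian) {
  uint32_t elt0 = static_cast<uint32_t>(loadInt(mem, elt0Addr, 4, bigEndian));
  uint32_t elt1 = static_cast<uint32_t>(loadInt(mem, elt1Addr, 4, bigEndian));
  size_t base = elt0Addr < elt1Addr ? elt0Addr : elt1Addr;
  return buildPair(elt0, elt1) == loadInt(mem, base, 8, bigEndian);
}
```

In this model, with an 8-byte buffer, the pair matches the wide load for little endian only when elt 0 is loaded from offset 0 and elt 1 from offset 4, and for big endian only when elt 0 is loaded from offset 4 and elt 1 from offset 0.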

Normally this bug only resulted in missed opportunities for
doing the load combine. I guess that in some rare situations it
could lead to faulty combines, but I have not seen that happen.

Note that this patch actually will trigger load combine for
some big endian regression tests.
One example is test/CodeGen/PowerPC/anon_aggr.ll where we now get

t76: i64,ch = load<LD8[FixedStack-9]

instead of

t37: i32,ch = load<LD4[FixedStack-10]>
t35: i32,ch = load<LD4[FixedStack-9]>
t41: i64 = build_pair t37, t35

before legalization. Legalization then splits the LD8
into two loads, so the end result is the same. That should
verify that the transformation is correct now.

Diff Detail

Build Status

Buildable 12726
Build 12726: arc lint + arc unit

Event Timeline

bjope created this revision. Nov 24 2017, 9:21 AM

bjope added reviewers: niravd, hfinkel. Nov 24 2017, 9:29 AM

This endianness problem is probably also latent where we do load combination, but we should sink this check into areNonVolatileConsecutiveLoads and MatchLoadCombine.

In D40444#935387, @niravd wrote:

This endianness problem is probably also latent where we do load combination, but we should sink this check into areNonVolatileConsecutiveLoads and MatchLoadCombine.

I've looked at MatchLoadCombine and it looks like it already tries to support both little and big endian: it combines the loads and inserts a bswap if the ordering in memory does not match how the bytes are ordered after OR:ing the pieces together.
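
The bswap idea mentioned above can be sketched in isolation. This is not MatchLoadCombine's implementation; it is a minimal model with hypothetical helper names, assuming a little-endian target and four byte loads that are OR:ed together with the lowest-addressed byte in the most significant position.

```cpp
#include <cassert>
#include <cstdint>

// OR together four byte loads so that the byte at the lowest address ends up
// in the most significant position (a big-endian assembly pattern).
uint32_t orBytesMsbFirst(const uint8_t *p) {
  return (uint32_t(p[0]) << 24) | (uint32_t(p[1]) << 16) |
         (uint32_t(p[2]) << 8) | uint32_t(p[3]);
}

// Model of a single 32-bit load on a little-endian target.
uint32_t loadLittleEndian32(const uint8_t *p) {
  return uint32_t(p[0]) | (uint32_t(p[1]) << 8) | (uint32_t(p[2]) << 16) |
         (uint32_t(p[3]) << 24);
}

// Reverse the byte order of a 32-bit value.
uint32_t byteSwap32(uint32_t v) {
  return (v >> 24) | ((v >> 8) & 0x0000FF00u) | ((v << 8) & 0x00FF0000u) |
         (v << 24);
}
```

For any four bytes, `orBytesMsbFirst(p)` equals `byteSwap32(loadLittleEndian32(p))`, which is why a combined load followed by a byte swap can stand in for the OR pattern when the memory order and the OR order disagree.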

And areNonVolatileConsecutiveLoads works just as it is described: it checks if one load is "Bytes * Dist" bytes after another load. I think how to use the result of areNonVolatileConsecutiveLoads depends on the use case, and making it aware of endianness would only complicate such a low-level function.
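
The split of responsibilities argued for above can be sketched as follows. This is a simplified model working on plain integer addresses, not the SelectionDAG API: `isConsecutive` and `canCombineBuildPair` are hypothetical names, and the real function also checks things (such as volatility) that are omitted here.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Endianness-agnostic check: is the load at `addr` located exactly
// bytes * dist bytes after the load at `baseAddr`?
bool isConsecutive(int64_t addr, int64_t baseAddr, int64_t bytes, int64_t dist) {
  return addr == baseAddr + bytes * dist;
}

// Caller-side policy for a BUILD_PAIR whose halves are `halfBytes` wide.
// Element 0 is the least significant half, so on a big-endian target it must
// sit at the higher address; swapping the roles keeps the low-level check
// unaware of endianness.
bool canCombineBuildPair(int64_t elt0Addr, int64_t elt1Addr, int64_t halfBytes,
                         bool bigEndian) {
  int64_t first = elt0Addr;
  int64_t second = elt1Addr;
  if (bigEndian)
    std::swap(first, second);
  return isConsecutive(second, first, halfBytes, 1);
}
```

With 4-byte halves, this model accepts (elt 0 at 0, elt 1 at 4) only for little endian and (elt 0 at 4, elt 1 at 0) only for big endian.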

I've double-checked all in-tree uses myself, and all the uses of areNonVolatileConsecutiveLoads should be fine as well.

So modulo a test case this LGTM.

So the existing test/CodeGen/PowerPC/anon_aggr.ll test was not enough?

What kind of test are you requesting? Something that dumps output after ISel to see that we got some load combines?
I had a hard time trying to do something like that. I basically need some target that will produce a BUILD_PAIR during ISel; that often happens when some argument is passed byval(?).
I found out that I could get the optimization to trigger for PowerPC, but PowerPC splits the combined load into two smaller loads again during legalization. That is what happens in test/CodeGen/PowerPC/anon_aggr.ll for one of the RUN-lines. So there is already an in-tree test for which the optimization now triggers, and where the end result ends up the same as before.

I actually wanted to create a test case that showed that we could get miscompiles earlier, but I haven't been able to do that for any in-tree target. For that to happen I need SelectionDAG to create a BUILD_PAIR where the two loads are in non-consecutive order. I don't know how to trigger that.

Passing a struct of i16 by load to generate a BUILD_PAIR and

Do they have to be non-consecutive? Could you get something by doing a direct operation on the BUILD_PAIR? Something like: load a struct of {i16, i16}, bitcast the struct to an i32, apply a (+ 1), and return it. Ideally we could have that case, but I suspect it's not worth spending too much time working something out, and a case dumping debug information is reasonable.

Added a test case that checks that the build_pair -> combine load transform is done during ISel.

Herald added a subscriber: nemanjai. Dec 4 2017, 10:41 AM

Harbormaster completed remote builds in B12726: Diff 125371. Dec 4 2017, 10:42 AM

LGTM. Thanks

This revision is now accepted and ready to land. Dec 4 2017, 10:48 AM

Closed by commit rL319771: [DAGCombine] Handle big endian correctly in CombineConsecutiveLoads (authored by bjope). Dec 5 2017, 6:50 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                                     Size
include/llvm/CodeGen/ISDOpcodes.h                        3 lines
lib/CodeGen/SelectionDAG/DAGCombiner.cpp                 7 lines
test/CodeGen/PowerPC/combine_loads_from_build_pair.ll    19 lines

Diff 125371

include/llvm/CodeGen/ISDOpcodes.h

 enum NodeType {
   /// a Constant, which is required to be operand #1) half of the integer or
   /// float value specified as operand #0. This is only for use before
   /// legalization, for values that will be broken into multiple registers.
   EXTRACT_ELEMENT,

   /// BUILD_PAIR - This is the opposite of EXTRACT_ELEMENT in some ways.
   /// Given two values of the same integer value type, this produces a value
   /// twice as big. Like EXTRACT_ELEMENT, this can only be used before
-  /// legalization.
+  /// legalization. The lower part of the composite value should be in
+  /// element 0 and the upper part should be in element 1.
   BUILD_PAIR,

   /// MERGE_VALUES - This node takes multiple discrete operands and returns
   /// them all as its individual results. This nodes has exactly the same
   /// number of inputs and outputs. This node is useful for some pieces of the
   /// code generator that want to think about a single node with multiple
   /// results, not multiple nodes.
   MERGE_VALUES,

lib/CodeGen/SelectionDAG/DAGCombiner.cpp

 /// build_pair (load, load) -> load
 /// if load locations are consecutive.
 SDValue DAGCombiner::CombineConsecutiveLoads(SDNode *N, EVT VT) {
   assert(N->getOpcode() == ISD::BUILD_PAIR);

   LoadSDNode *LD1 = dyn_cast<LoadSDNode>(getBuildPairElt(N, 0));
   LoadSDNode *LD2 = dyn_cast<LoadSDNode>(getBuildPairElt(N, 1));

+  // A BUILD_PAIR is always having the least significant part in elt 0 and the
+  // most significant part in elt 1. So when combining into one large load, we
+  // need to consider the endianness.
+  if (DAG.getDataLayout().isBigEndian())
+    std::swap(LD1, LD2);

   if (!LD1 || !LD2 || !ISD::isNON_EXTLoad(LD1) || !LD1->hasOneUse() ||
       LD1->getAddressSpace() != LD2->getAddressSpace())
     return SDValue();
   EVT LD1VT = LD1->getValueType(0);
   unsigned LD1Bytes = LD1VT.getStoreSize();
   if (ISD::isNON_EXTLoad(LD2) && LD2->hasOneUse() &&
       DAG.areNonVolatileConsecutiveLoads(LD2, LD1, LD1Bytes, 1)) {
     unsigned Align = LD1->getAlignment();

test/CodeGen/PowerPC/combine_loads_from_build_pair.ll

This file was added.

; RUN: llc -verify-machineinstrs -O0 -mcpu=g4 -mtriple=powerpc-apple-darwin8 < %s -debug -stop-after=machineverifier 2>&1 | FileCheck %s

define i64 @func1(i64 %p1, i64 %p2, i64 %p3, i64 %p4, { i64, i8* } %struct) {
; Verify that we get a combine on the build_pair, creating a LD8 load somewhere
; between "Initial selection DAG" and "Optimized lowered selection DAG".
; The target is big-endian, and stack grows towards higher addresses,
; so we expect the LD8 to load from the address used in the original HIBITS
; load.
; CHECK-LABEL: Initial selection DAG:
; CHECK-DAG: [[LOBITS:t[0-9]+]]: i32,ch = load<LD4[FixedStack-2]>
; CHECK-DAG: [[HIBITS:t[0-9]+]]: i32,ch = load<LD4[FixedStack-1]>
; CHECK: Combining: t{{[0-9]+}}: i64 = build_pair [[LOBITS]], [[HIBITS]]
; CHECK-NEXT: into
; CHECK-SAME: load<LD8[FixedStack-1]
; CHECK-LABEL: Optimized lowered selection DAG:
  %result = extractvalue {i64, i8* } %struct, 0
  ret i64 %result
}
