This is an archive of the discontinued LLVM Phabricator instance.

Differential D60601

[DAGCombiner] Exploiting more about the transformation of TransformFPLoadStorePair function
ClosedPublic

Authored by • wuzish on Apr 12 2019, 2:23 AM.

Details

Reviewers

t.p.northover
bogner
eli.friedman
nemanjai
steven.zhang
hfinkel
RKSimon
efriedma
jsji

Commits

rG7ae536a1cedf: [DAGCombiner] Exploiting more about the transformation of…
rL364883: [DAGCombiner] Exploiting more about the transformation of…

Summary

For a given floating point load / store pair, if the load value isn't used by any other operations,
then consider transforming the pair to integer load / store operations if the target deems the transformation profitable.

We can exploit this much further when there are other nodes with a chain operand between the load/store pair, as long as we keep the original chain ordering. We only change the register used for the load/store from float to integer.

I only added a test case for ARM because the TLI.isDesirableToTransformToIntegerOp hook is only enabled for the ARM target.
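
To make the relaxed condition concrete, here is a toy model in Python; it is a sketch, not LLVM code. `ToyNode` and `is_candidate` are hypothetical names, and the model deliberately ignores the floating-point type, alignment, address-space and target-hook checks that the real function also performs. It only contrasts requiring the store's chain to be the load itself with dropping that requirement.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class ToyNode:
    """Toy stand-in for a DAG node; every name here is hypothetical."""
    kind: str                          # "entry", "load", "store" or "other"
    chain: Optional["ToyNode"] = None  # chain operand (the preceding ordered node)
    value: Optional["ToyNode"] = None  # node producing the stored value (stores only)
    value_uses: int = 0                # number of users of a load's value result


def is_candidate(store, require_adjacent_chain):
    """Return True if the toy load/store pair passes the structural checks."""
    load = store.value
    if store.kind != "store" or load is None or load.kind != "load":
        return False
    # The loaded value must have no user other than this store.
    if load.value_uses != 1:
        return False
    # Stricter rule: the store must be chained directly to the load.
    if require_adjacent_chain and store.chain is not load:
        return False
    return True


# A load, then an unrelated chained node, then the store of the loaded value.
entry = ToyNode("entry")
load = ToyNode("load", chain=entry, value_uses=1)
between = ToyNode("other", chain=load)
store = ToyNode("store", chain=between, value=load)

old_rule = is_candidate(store, require_adjacent_chain=True)
new_rule = is_candidate(store, require_adjacent_chain=False)
```

In this model the pair is rejected when the direct-chain requirement is enforced and accepted once it is dropped, which is the situation the summary describes.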

Diff Detail

Repository: rL LLVM

Event Timeline

• wuzish created this revision. Apr 12 2019, 2:23 AM

Herald added a project: Restricted Project. Apr 12 2019, 2:23 AM

Herald added subscribers: llvm-commits, hiraditya, kristof.beyls, javed.absar.

• wuzish edited the summary of this revision. Apr 12 2019, 2:26 AM

• wuzish added a reviewer: hfinkel. Apr 12 2019, 2:31 AM

I can't find any previous discussion of this particular check. I agree it isn't necessary for correctness.

Do you have any benchmarks showing this actually helps? I'm a little concerned that this might not be a performance win in practice due to register pressure.

In D60601#1464855, @efriedma wrote:

I can't find any previous discussion of this particular check. I agree it isn't necessary for correctness.

Do you have any benchmarks showing this actually helps? I'm a little concerned that this might not be a performance win in practice due to register pressure.

Thank you. Register pressure is indeed a concern, and the previous code carries the same risk.
I have run SPEC2017 on the POWER target, but not on ARM, because I am trying to enable this for POWER in a follow-up patch and I don't have an ARM machine to test on.
No obvious performance regression was found on POWER. I also think this particular check is too old (the code is from 2011) for its discussion to be found.

Gentle ping...
Does anybody have any insight into this issue? Or could someone nominate more reviewers?

Gentle ping...

Is it possible to do such an optimization during or after RA, when integer register pressure is not that high? Is there an existing place or hook in the LLVM infrastructure where I could implement it?

jsji resigned from this revision. Jun 25 2019, 9:18 AM

Herald added a subscriber: jsji. Jun 25 2019, 9:18 AM

xbolva00 added a reviewer: RKSimon. Jun 25 2019, 9:35 AM

xbolva00 added a subscriber: hfinkel.

LGTM

Sorry, I meant to reply to this earlier. If you've tested the performance on PowerPC, it's probably fine. ARM has fewer registers, but not so few that it's likely to cause problems here.

This revision is now accepted and ready to land. Jun 25 2019, 10:09 AM

xbolva00 added a subscriber: xbolva00. Jun 25 2019, 10:35 AM

xbolva00 added inline comments.

llvm/test/CodeGen/ARM/ldst-f32-2-i32.ll
57 ↗	(On Diff #194819)	Can you please remove the test checks and regenerate them using the update_llc_test_checks.py script before you commit this patch? Thanks.

• wuzish marked an inline comment as done. Jun 27 2019, 12:17 AM

• wuzish added inline comments.

llvm/test/CodeGen/ARM/ldst-f32-2-i32.ll
57 ↗	(On Diff #194819)	I tried, and I am afraid that a darwin -mtriple is not supported by update_llc_test_checks.py, so we just wrote the checks manually.

steven.zhang added inline comments. Jun 27 2019, 12:59 AM

llvm/test/CodeGen/ARM/ldst-f32-2-i32.ll
57 ↗	(On Diff #194819)	Maybe you can fix it the way this patch did for the darwin triple: https://reviews.llvm.org/D63723

lebedev.ri added a subscriber: lebedev.ri. Jun 27 2019, 1:37 AM

lebedev.ri added inline comments.

llvm/test/CodeGen/ARM/ldst-f32-2-i32.ll
57 ↗	(On Diff #194819)	What @steven.zhang said, let's just add the support there :) I'd suggest to first try `'armv7-apple-darwin' : (scrub_asm_arm_eabi, ASM_FUNCTION_AARCH64_DARWIN_RE),` how does that look?

• wuzish marked an inline comment as done. Jun 27 2019, 2:50 AM

• wuzish added inline comments.

llvm/test/CodeGen/ARM/ldst-f32-2-i32.ll
57 ↗	(On Diff #194819)	I have tested all of the ASM_FUNCTION regexes for AARCH64 and ARM, and none of them work. I am afraid the right one is not among them; a new regular expression needs to be written.

lebedev.ri added inline comments. Jun 27 2019, 3:09 AM

llvm/test/CodeGen/ARM/ldst-f32-2-i32.ll

57 ↗	(On Diff #194819)	So as per https://godbolt.org/z/Z9JT6W the structure is:

        .globl  _test                   @ -- Begin function test
        .p2align        2
        .code   32                      @ @test
_test:
<body>
                                        @ -- End function

I haven't tried it, but I think it should be roughly this:

ASM_FUNCTION_ARM_DARWIN_RE = re.compile(
     r'^[ \t]*\.globl[ \t]*_(?P<func>)[ \t]*@[ \t]--[ \t]Begin[ \t]function[ \t](?P=func)',
     r'^_(?P=func):'
     r'(?P<body>.*?)'
     r'^[ \t]*@[ \t]--[ \t]End[ \t]function',
     flags=(re.M | re.S))
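
For readers following along, here is a minimal sketch of a pattern that does match the structure quoted above. It is not the regex that was eventually added to the script; `ARM_DARWIN_FUNC_RE`, `SAMPLE_ASM` and `extract_functions` are hypothetical names, and the sample assembly is hand-written from the layout shown in the comment. Compared with the untested suggestion, it gives the `func` group something to capture, skips the directives between `.globl` and the label, and passes the whole pattern to `re.compile` as one concatenated string.

```python
import re

# Hypothetical pattern for the armv7-apple-darwin layout shown above.
# Assumption: every function is bracketed by "@ -- Begin function <name>"
# and "@ -- End function" comments, as in the quoted llc output.
ARM_DARWIN_FUNC_RE = re.compile(
    r'^[ \t]*\.globl[ \t]+_(?P<func>[^ \t\n]+)[ \t]*'
    r'@[ \t]*--[ \t]*Begin[ \t]+function[ \t]+(?P=func)[ \t]*\n'
    r'(?P<directives>.*?)'          # .p2align, .code, ... before the label
    r'^_(?P=func):[ \t]*\n'
    r'(?P<body>.*?)'                # instructions up to the end marker
    r'^[ \t]*@[ \t]*--[ \t]*End[ \t]+function',
    flags=(re.M | re.S))

# Hand-written sample; the two instructions are illustrative only.
SAMPLE_ASM = """\
\t.globl\t_test                   @ -- Begin function test
\t.p2align\t2
\t.code\t32                      @ @test
_test:
\tldr\tr0, [pc, r0]
\tbx\tlr
                                        @ -- End function
"""


def extract_functions(asm):
    """Map each function name found in asm to its body text."""
    return {m.group('func'): m.group('body')
            for m in ARM_DARWIN_FUNC_RE.finditer(asm)}


functions = extract_functions(SAMPLE_ASM)
```

The lazy `directives` group is what lets the match step over `.p2align` and `.code` before reaching the `_test:` label.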

• wuzish marked an inline comment as done. Jun 27 2019, 8:29 PM

• wuzish added inline comments.

llvm/test/CodeGen/ARM/ldst-f32-2-i32.ll
57 ↗	(On Diff #194819)	It doesn't work.

jsji added inline comments. Jun 28 2019, 10:17 AM

llvm/test/CodeGen/ARM/ldst-f32-2-i32.ll
57 ↗	(On Diff #194819)	I made a fix in https://reviews.llvm.org/D63939, you can review and try it.

• wuzish marked 2 inline comments as done. Jul 1 2019, 7:06 PM

• wuzish added inline comments.

llvm/test/CodeGen/ARM/ldst-f32-2-i32.ll
57 ↗	(On Diff #194819)	Thank you for the fix.

• wuzish marked an inline comment as done. Jul 1 2019, 7:55 PM

• wuzish added inline comments.

llvm/test/CodeGen/ARM/ldst-f32-2-i32.ll
57 ↗	(On Diff #194819)	I tried it, but it does not handle the function I added. I will commit this patch first. Thank you.

Closed by commit rL364883: [DAGCombiner] Exploiting more about the transformation of… (authored by • wuzish). Jul 1 2019, 7:56 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path

Size

llvm/

trunk/

lib/

CodeGen/

SelectionDAG/

DAGCombiner.cpp

6 lines

test/

CodeGen/

ARM/

ldst-f32-2-i32.ll

40 lines

Diff 207468

llvm/trunk/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

Show First 20 Lines • Show All 14,922 Lines • ▼ Show 20 Lines	SDValue DAGCombiner::ReduceLoadOpStoreWidth(SDNode *N) {
return SDValue();		return SDValue();
}		}

/// For a given floating point load / store pair, if the load value isn't used		/// For a given floating point load / store pair, if the load value isn't used
/// by any other operations, then consider transforming the pair to integer		/// by any other operations, then consider transforming the pair to integer
/// load / store operations if the target deems the transformation profitable.		/// load / store operations if the target deems the transformation profitable.
SDValue DAGCombiner::TransformFPLoadStorePair(SDNode *N) {		SDValue DAGCombiner::TransformFPLoadStorePair(SDNode *N) {
StoreSDNode *ST = cast<StoreSDNode>(N);		StoreSDNode *ST = cast<StoreSDNode>(N);
SDValue Chain = ST->getChain();
SDValue Value = ST->getValue();		SDValue Value = ST->getValue();
if (ISD::isNormalStore(ST) && ISD::isNormalLoad(Value.getNode()) &&		if (ISD::isNormalStore(ST) && ISD::isNormalLoad(Value.getNode()) &&
Value.hasOneUse() &&		Value.hasOneUse()) {
Chain == SDValue(Value.getNode(), 1)) {
LoadSDNode *LD = cast<LoadSDNode>(Value);		LoadSDNode *LD = cast<LoadSDNode>(Value);
EVT VT = LD->getMemoryVT();		EVT VT = LD->getMemoryVT();
if (!VT.isFloatingPoint() ||		if (!VT.isFloatingPoint() ||
VT != ST->getMemoryVT() ||		VT != ST->getMemoryVT() ||
LD->isNonTemporal() ||		LD->isNonTemporal() ||
ST->isNonTemporal() ||		ST->isNonTemporal() ||
LD->getPointerInfo().getAddrSpace() != 0 ||		LD->getPointerInfo().getAddrSpace() != 0 ||
ST->getPointerInfo().getAddrSpace() != 0)		ST->getPointerInfo().getAddrSpace() != 0)
Show All 13 Lines	if (ISD::isNormalStore(ST) && ISD::isNormalLoad(Value.getNode()) &&
if (LDAlign < ABIAlign || STAlign < ABIAlign)		if (LDAlign < ABIAlign || STAlign < ABIAlign)
return SDValue();		return SDValue();

SDValue NewLD =		SDValue NewLD =
DAG.getLoad(IntVT, SDLoc(Value), LD->getChain(), LD->getBasePtr(),		DAG.getLoad(IntVT, SDLoc(Value), LD->getChain(), LD->getBasePtr(),
LD->getPointerInfo(), LDAlign);		LD->getPointerInfo(), LDAlign);

SDValue NewST =		SDValue NewST =
DAG.getStore(NewLD.getValue(1), SDLoc(N), NewLD, ST->getBasePtr(),		DAG.getStore(ST->getChain(), SDLoc(N), NewLD, ST->getBasePtr(),
ST->getPointerInfo(), STAlign);		ST->getPointerInfo(), STAlign);

AddToWorklist(NewLD.getNode());		AddToWorklist(NewLD.getNode());
AddToWorklist(NewST.getNode());		AddToWorklist(NewST.getNode());
WorklistRemover DeadNodes(*this);		WorklistRemover DeadNodes(*this);
DAG.ReplaceAllUsesOfValueWith(Value.getValue(1), NewLD.getValue(1));		DAG.ReplaceAllUsesOfValueWith(Value.getValue(1), NewLD.getValue(1));
++LdStFP2Int;		++LdStFP2Int;
return NewST;		return NewST;
▲ Show 20 Lines • Show All 5,751 Lines • Show Last 20 Lines

llvm/trunk/test/CodeGen/ARM/ldst-f32-2-i32.ll

Show All 30 Lines	bb:
store float %1, float* %dst_addr.03, align 4		store float %1, float* %dst_addr.03, align 4
%2 = add i32 %j.05, 1		%2 = add i32 %j.05, 1
%exitcond = icmp eq i32 %2, %width		%exitcond = icmp eq i32 %2, %width
br i1 %exitcond, label %return, label %bb		br i1 %exitcond, label %return, label %bb

return:		return:
ret void		ret void
}		}

		@a1 = local_unnamed_addr global float 0.000000e+00, align 4
		@a2 = local_unnamed_addr global float 0.000000e+00, align 4
		@a3 = local_unnamed_addr global float 0.000000e+00, align 4
		@a4 = local_unnamed_addr global float 0.000000e+00, align 4
		@a5 = local_unnamed_addr global float 0.000000e+00, align 4
		@a6 = local_unnamed_addr global float 0.000000e+00, align 4
		@a7 = local_unnamed_addr global float 0.000000e+00, align 4
		@a8 = local_unnamed_addr global float 0.000000e+00, align 4


		declare void @_Z3fooddddddddddddddd(float, float, float, float, float, float, float, float)

		; Because this test function is trying to pass float argument by stack,
		; it can be optimized to i32 load / store
		define signext i32 @test() {
		%1 = load float, float* @a1, align 4
		%2 = load float, float* @a2, align 4
		%3 = load float, float* @a3, align 4
		%4 = load float, float* @a4, align 4
		%5 = load float, float* @a5, align 4
		%6 = load float, float* @a6, align 4
		%7 = load float, float* @a7, align 4
		%8 = load float, float* @a8, align 4
		tail call void @_Z3fooddddddddddddddd(float %1, float %2, float %3, float %4, float %5, float %6, float %7, float %8)
		ret i32 0
		}

		; CHECK-LABEL: _test:
		; CHECK: ldr r3, [pc, r3]
		; CHECK: ldr r2, [pc, r2]
		; CHECK: ldr r1, [pc, r1]
		; CHECK: ldr r0, [pc, r0]
		; CHECK: ldr r9, [pc, r9]
		; CHECK: ldr r12, [pc, r12]
		; CHECK: ldr lr, [pc, lr]
		; CHECK: stm sp, {r9, r12, lr}
		; CHECK: ldr r4, [pc, r4]
		; CHECK: str r4, [sp, #12]
		; CHECK: bl __Z3fooddddddddddddddd
