Page MenuHomePhabricator

[DAGCombiner] Exploiting more about the transformation of TransformFPLoadStorePair function
ClosedPublic

Authored by wuzish on Apr 12 2019, 2:23 AM.

Details

Summary

For a given floating point load / store pair, if the load value isn't used by any other operations,
then consider transforming the pair to integer load / store operations if the target deems the transformation profitable.

And we can exploiting much more when there are other operation nodes with chain operand between the load/store pair so long as we keep the chain ordering original. We only replace the register used to load/store from float to integer.

I only add testcase in ARM because the TLI.isDesirableToTransformToIntegerOp hook is only enabled in ARM target.

Diff Detail

Repository
rL LLVM

Event Timeline

wuzish created this revision.Apr 12 2019, 2:23 AM
wuzish edited the summary of this revision. (Show Details)Apr 12 2019, 2:26 AM

I can't find any previous discussion of this particular check. I agree it isn't necessary for correctness.

Do you have any benchmarks showing this actually helps? I'm a little concerned that this might not be a performance win in practice due to register pressure.

I can't find any previous discussion of this particular check. I agree it isn't necessary for correctness.

Do you have any benchmarks showing this actually helps? I'm a little concerned that this might not be a performance win in practice due to register pressure.

Thank you. It's actually a concern about register pressure. And the previous code also has such risk.
I have run the spec2017 in POWER target, but not in ARM because I am trying to enable this for POWER in a follow-up patch and I don't have ARM machine to test.
No obvious performance degression was found in POWER. And I think it's too long time ago (code of 2011) to find the discussion of this particular check.

wuzish added a comment.EditedApr 19 2019, 12:02 AM

Gentle pin...
Anybody has any insights of this issue? Or add any more nominators?

Gentle pin....

Is there any possibility to do such optimization during RA or after RA, when the integer register pressure is not that big? Any way or hook where I can implement it llvm infra has prepared?

jsji resigned from this revision.Jun 25 2019, 9:18 AM
xbolva00 added a subscriber: hfinkel.
efriedma accepted this revision.Jun 25 2019, 10:09 AM

LGTM

Sorry, I meant to reply to this earlier. If you've tested the performance on PowerPC, it's probably fine. ARM has fewer registers, but not so few that it's likely to cause problems here.

This revision is now accepted and ready to land.Jun 25 2019, 10:09 AM
xbolva00 added inline comments.
llvm/test/CodeGen/ARM/ldst-f32-2-i32.ll
57 ↗(On Diff #194819)

Can you please remove test checks and regenerate them using update_llc_tests_checks.py script before you commit this patch ?
Thanks.

wuzish marked an inline comment as done.Jun 27 2019, 12:17 AM
wuzish added inline comments.
llvm/test/CodeGen/ARM/ldst-f32-2-i32.ll
57 ↗(On Diff #194819)

I am afraid that -mtriple of darwin is not supported in update_llc_tests_checks.py after I try, and we just write checks manually.

steven.zhang added inline comments.Jun 27 2019, 12:59 AM
llvm/test/CodeGen/ARM/ldst-f32-2-i32.ll
57 ↗(On Diff #194819)

Maybe, you can fix it as this patch did for darwin triple. https://reviews.llvm.org/D63723

lebedev.ri added inline comments.
llvm/test/CodeGen/ARM/ldst-f32-2-i32.ll
57 ↗(On Diff #194819)

What @steven.zhang said, let's just add the support there :)
I'd suggest to first try 'armv7-apple-darwin' : (scrub_asm_arm_eabi, ASM_FUNCTION_AARCH64_DARWIN_RE),
how does that look?

wuzish marked an inline comment as done.Jun 27 2019, 2:50 AM
wuzish added inline comments.
llvm/test/CodeGen/ARM/ldst-f32-2-i32.ll
57 ↗(On Diff #194819)

I have test all ASM_FUNCTION about AARCH64 and ARM, all of them don't work. I am afraid it's not among them. Need write a new regular expression.

lebedev.ri added inline comments.Jun 27 2019, 3:09 AM
llvm/test/CodeGen/ARM/ldst-f32-2-i32.ll
57 ↗(On Diff #194819)

So as per https://godbolt.org/z/Z9JT6W the structure is:

        .globl  _test                   @ -- Begin function test
        .p2align        2
        .code   32                      @ @test
_test:
<body>
                                        @ -- End function

I haven't tried but i think it should be roughly this:

ASM_FUNCTION_ARM_DARWIN_RE = re.compile(
     r'^[ \t]*\.globl[ \t]*_(?P<func>)[ \t]*@[ \t]--[ \t]Begin[ \t]function[ \t](?P=func)',
     r'^_(?P=func):'
     r'(?P<body>.*?)'
     r'^[ \t]*@[ \t]--[ \t]End[ \t]function',
     flags=(re.M | re.S))
wuzish marked an inline comment as done.Jun 27 2019, 8:29 PM
wuzish added inline comments.
llvm/test/CodeGen/ARM/ldst-f32-2-i32.ll
57 ↗(On Diff #194819)

It doesn't work.

jsji added inline comments.Jun 28 2019, 10:17 AM
llvm/test/CodeGen/ARM/ldst-f32-2-i32.ll
57 ↗(On Diff #194819)

I made a fix in https://reviews.llvm.org/D63939, you can review and try it.

wuzish marked 2 inline comments as done.Jul 1 2019, 7:06 PM
wuzish added inline comments.
llvm/test/CodeGen/ARM/ldst-f32-2-i32.ll
57 ↗(On Diff #194819)

Thank you for your fixing.

wuzish marked an inline comment as done.Jul 1 2019, 7:55 PM
wuzish added inline comments.
llvm/test/CodeGen/ARM/ldst-f32-2-i32.ll
57 ↗(On Diff #194819)

After I tried, it can not solve my added function. I would commit it firstly. Thank you.

This revision was automatically updated to reflect the committed changes.