This is an archive of the discontinued LLVM Phabricator instance.

X86-FMA3: Memory folding for scalar loads + FMA3
ClosedPublic

Authored by v_klochkov on Nov 17 2015, 3:15 PM.

Details

Summary

Hello,

Please review the patch that enables memory folding optimization for
sequences like this:

#include <immintrin.h>
double mem;
__m128d func(__m128d a, __m128d b) {
  __m128d m = _mm_load_sd(&mem);
  return _mm_fmadd_sd(a, b, m);
}

Code without the patch (clang -O3 -S):

func:                                   # @func
        .cfi_startproc
# BB#0:                                 # %entry
        movsd   mem(%rip), %xmm2        # xmm2 = mem[0],zero
        vfmadd213sd     %xmm2, %xmm1, %xmm0
        retq

Code with the patch:

func:                                   # @func
        .cfi_startproc
# BB#0:                                 # %entry
        vfmadd213sd     mem(%rip), %xmm1, %xmm0
        retq

The load can be folded into 2nd or 3rd operand of FMA*_Int instruction.
The newly added test fma-scalar-memfold.ll checks memory folding for both of operands.

lib/Target/X86/X86InstrFMA.td:

Removed the redundant register to register moves.
Memory folding does not work with those moves.
// TODO: perhaps, the register-to-register moves can be just stripped in such/some cases,
// but that is a separate optimization/change-set.

lib/Target/X86/X86InstrInfo.cpp:

Added the FMA*_Int opcodes to the routine
isNonFoldablePartialRegisterLoad()

test/CodeGen/X86/fma-scalar-memfold.ll:

New test. Checks that result of _mm_load_{s,d}() can be folded into 2nd or 3rd operand of FMA*_Int.

Thank you,
Slava

Diff Detail

Repository
rL LLVM

Event Timeline

v_klochkov updated this revision to Diff 40441.Nov 17 2015, 3:15 PM
v_klochkov retitled this revision from to X86-FMA3: Memory folding for scalar loads + FMA3.
v_klochkov updated this object.
v_klochkov added a reviewer: DavidKreitzer.
v_klochkov added subscribers: llvm-commits, qcolombet.
DavidKreitzer edited edge metadata.Nov 20 2015, 8:37 AM

Hi Slava,

Everything looks straightforward to me. I just have a few minor comments.

Thanks,
-Dave

llvm/lib/Target/X86/X86InstrFMA.td
164 ↗(On Diff #40441)

Just noticed a few typos in this comments while I was reviewing the code.

sence --> sense

173 ↗(On Diff #40441)

implemened --> implemented

245 ↗(On Diff #40441)

Please add a space after ','

llvm/test/CodeGen/X86/fma-scalar-memfold.ll
1 ↗(On Diff #40441)

I like the thoroughness of your test!

A couple ideas to make the test a little less sensitive to innocuous changes.

(1) You could avoid checking for the block labels, e.g. "# BB#0" and just change the subsequent CHECK-NEXT to CHECK.
(2) You could avoid explicitly checking for xmm0 and instead use a variable.

5 ↗(On Diff #40441)

"#3" is not defined.

v_klochkov updated this revision to Diff 40905.Nov 23 2015, 1:14 AM
v_klochkov edited edge metadata.
v_klochkov marked 4 inline comments as done.

Fixed the misprints and updated the unit test.

Hi David,

Thank you for the quick code-review. Excuse me for the delay - I am traveling these days.
I fixed the misprints and updated the unit test.

Thank you,
Slava

llvm/test/CodeGen/X86/fma-scalar-memfold.ll
2 ↗(On Diff #40905)

I replaced xmm0 with a variable as you recommended.

Regarding the BB#0 label. Due to some unknown reasons the script update_llc_test_checks.py does not work when I run it, but that script usually generates "CHECK: # BB#0:" line (I noticed that in other people's change-sets fixing tests with help of that script). So, to relax the test checks a little bit I replaced CHECK-NEXT with CHECK (i.e. it ma be ok to have another label between func entry and # BB#0, which happens on some targets if not use { nounwind }).
Please let me know if it looks good now.

5 ↗(On Diff #40441)

Fixed.

DavidKreitzer added inline comments.Nov 23 2015, 5:46 AM
llvm/test/CodeGen/X86/fma-scalar-memfold.ll
17 ↗(On Diff #40905)

This isn't quite what I meant about the block labels. I think you should just delete line 17 here and change line 18 "CHECK-NEXT" --> "CHECK". That way, if # BB#0 changes to something else, it won't affect this test.

The %[[XMM]] changes are great, thanks!

RKSimon added inline comments.
llvm/lib/Target/X86/X86InstrFMA.td
233 ↗(On Diff #40905)

Please would it be possible to add ExecutionDomains to these definitions? For some reason only the packed FMA instructions have Single/Double domains set.

Updated the unit test.

Thank you for the review!

I updated the unit test.
Undefined ExeDomain for scalar FMAs is the problem unrelated to the one fixed by this patch (memory folding for FMA*_Int). So, it should be fixed in a separate patch. I can prepare such patch later.

Thank you,
Slava

llvm/lib/Target/X86/X86InstrFMA.td
233 ↗(On Diff #40905)

Good catch, thank you for the comment!
That would be a simple additional fix, but I want to follow the recommendation: "1 patch fixes 1 problem",
I.e. I can set the ExeDomain for scalar FMAs in a separate patch.

llvm/test/CodeGen/X86/fma-scalar-memfold.ll
17 ↗(On Diff #40905)

Ok, understood, I'll fix the checks.

Thanks, Slava, looks good! I have no further comments.

  • Dave
DavidKreitzer accepted this revision.Nov 25 2015, 7:43 AM
DavidKreitzer edited edge metadata.
This revision is now accepted and ready to land.Nov 25 2015, 7:43 AM
This revision was automatically updated to reflect the committed changes.