This is an archive of the discontinued LLVM Phabricator instance.

[X86] GLC: Break false dependency for dest register for several instructions.
ClosedPublic

Authored by gpei on Dec 20 2021, 7:27 PM.

Details

Summary

Force insert zero-idiom and break false dependency of dest register for several instructions.

The related instructions are:

VPERMD/Q/PS/PD
VRANGEPD/PS/SD/SS
VGETMANTSS/SD/SH
VGETMANDPS/PD - mem version only
VPMULLQ
VFMULCSH/PH
VFCMULCSH/PH

Diff Detail

Event Timeline

gpei created this revision.Dec 20 2021, 7:27 PM
gpei requested review of this revision.Dec 20 2021, 7:27 PM
Herald added a project: Restricted Project. · View Herald TranscriptDec 20 2021, 7:27 PM

Some of these are quite surprising to me especially PMULLQ. Can you provide any more explanation about why these have false dependencies?

craig.topper added inline comments.Dec 20 2021, 7:38 PM
llvm/lib/Target/X86/X86InstrInfo.cpp
5785

This doesn't make sense. VPXORD is an integer instruction.

gpei added a reviewer: wxiao3.Dec 20 2021, 9:18 PM
RKSimon added inline comments.Dec 21 2021, 2:31 AM
llvm/lib/Target/X86/X86.td
447

(style) Keep all the FalseDeps tuning flags together?

999

Out of interest - why is Alderlake defined with the atom core intel defs? I realise its big-little, but I'd still count it as a mainstream core and not an embedded budget core.

gpei added a comment.Mar 17 2022, 12:14 AM

Some of these are quite surprising to me especially PMULLQ. Can you provide any more explanation about why these have false dependencies?

Intel® 64 and IA-32 Architectures Optimization Reference Manual is updated, 2.2.1.4 Avoiding Destination False Dependency tells the details.

llvm/lib/Target/X86/X86InstrInfo.cpp
5785

Yes, I copied the comments above. There is no VPXORPSZ128rr, is it acceptable I only delete my comments?

Herald added a project: Restricted Project. · View Herald TranscriptMar 17 2022, 12:14 AM
craig.topper added inline comments.Mar 17 2022, 12:37 AM
llvm/lib/Target/X86/X86.td
583

Probably shouldn't overly shorten instruction names in documentation that goes into user facing help text.

llvm/lib/Target/X86/X86InstrInfo.cpp
5785

That's because VP means integer. There's a VXORPSZ128rr, but it requires AVX512DQ. So using VPXORD is probably best.

gpei updated this revision to Diff 416433.Mar 18 2022, 2:51 AM
gpei marked 2 inline comments as done.Mar 18 2022, 2:58 AM
gpei added inline comments.
llvm/lib/Target/X86/X86.td
447

Done, thanks for pointing this!

583

You are right, have fixed it.

999

Sorry, I have no ideas, maybe it's caused by HW limitations.

llvm/lib/Target/X86/X86InstrInfo.cpp
5785

OK, I know, thanks!

LuoYuanke added inline comments.Mar 18 2022, 3:26 AM
llvm/lib/Target/X86/X86.td
999

Out of interest - why is Alderlake defined with the atom core intel defs? I realise its big-little, but I'd still count it as a mainstream core and not an embedded budget core.

The big-small core share the same ISA. I think Alderlake miss some features that previous generation of client CPU has. I notice Alderlake miss FeatureVP2INTERSECT which is introduced in tigerlake. We can add features with !listconcat(features1, features2), but it may not easy to exclude a feature from a CPU feature list.

LuoYuanke accepted this revision.Apr 16 2022, 8:39 PM

LGTM, pls wait for 1 or 2 days in case other reviews have some comments.

This revision is now accepted and ready to land.Apr 16 2022, 8:39 PM
RKSimon requested changes to this revision.Apr 18 2022, 9:48 AM

A few minors to address

llvm/lib/Target/X86/X86.td
588

This change should be a separate patch

llvm/lib/Target/X86/X86Subtarget.h
265 ↗(On Diff #416433)

Move this block above HasSBBDepBreaking so all the FalseDeps attributes are together

702 ↗(On Diff #416433)

Don't put these here - move them below hasLZCNTFalseDeps() further down

llvm/test/CodeGen/X86/cmul-false-deps.ll
2 ↗(On Diff #416433)

Do you need -O3?

Add a RUNs that use the +/- false-deps-cmul attribute to check that it works - same for all other test files,

This revision now requires changes to proceed.Apr 18 2022, 9:48 AM
gpei updated this revision to Diff 423618.Apr 19 2022, 7:33 AM
gpei marked 2 inline comments as done.
gpei marked 4 inline comments as done.Apr 19 2022, 7:36 AM

@RKSimon Fixed these issue you mentions.

RKSimon added inline comments.Apr 19 2022, 8:16 AM
llvm/test/CodeGen/X86/cmul-false-deps.ll
3 ↗(On Diff #423618)

Do you need disable-peephole?

6 ↗(On Diff #423618)

you can remove these - as you have specified a triple in the RUN

gpei updated this revision to Diff 423788.Apr 19 2022, 7:21 PM
gpei marked an inline comment as done.
gpei marked 2 inline comments as done.

Done, thanks for pointing this!

craig.topper added inline comments.Apr 19 2022, 7:26 PM
llvm/lib/Target/X86/X86.td
460

This should probably be MULC since the instruction names are VFMULC and VFCMULC. The consistent C is after MUL not before.

gpei updated this revision to Diff 423802.Apr 19 2022, 8:22 PM
gpei added inline comments.
llvm/lib/Target/X86/X86.td
460

I think cmul stands for complex multiply. However, I will fix it.

RKSimon accepted this revision.Apr 20 2022, 1:34 AM

LGTM - cheers

This revision is now accepted and ready to land.Apr 20 2022, 1:34 AM
RKSimon requested changes to this revision.Apr 20 2022, 9:39 AM

Where have the changes in X86Subtarget.h gone?

This revision now requires changes to proceed.Apr 20 2022, 9:39 AM
gpei added a comment.Apr 20 2022, 9:50 AM

Where have the changes in X86Subtarget.h gone?

We don't need change X86Subtarget.h anymore. These code are generated automatically by *.td now.

RKSimon accepted this revision.Apr 20 2022, 2:14 PM

OK - cheers

This revision is now accepted and ready to land.Apr 20 2022, 2:14 PM
This revision was landed with ongoing or failed builds.Apr 21 2022, 1:47 AM
This revision was automatically updated to reflect the committed changes.

Will this make it into 14.x ?

Will this make it into 14.x ?

This is being discussed on https://github.com/llvm/llvm-project/issues/55130 - if you can prove that there is a performance regression since 13.x without the patch then it should probably go in. Other wise it's really a new feature and I'm not sure whether we are accepting those for point releases.