This is an archive of the discontinued LLVM Phabricator instance.

[X86MacroFusion] Handle branch fusion (AMD CPUs).
ClosedPublic

Authored by courbet on Mar 27 2019, 4:44 AM.

Details

Summary

This adds a BranchFusion feature to replace the usage of the MacroFusion
for AMD CPUs.

See D59688 for context.

Event Timeline

courbet created this revision.Mar 27 2019, 4:44 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 27 2019, 4:44 AM
courbet updated this revision to Diff 192425.Mar 27 2019, 4:45 AM

clang-format

Harbormaster completed remote builds in B29682: Diff 192425.

Looks good.
Can there be a test that we don't fuse non-CMP/TEST instructions if we only have branchfusion?

llvm/lib/Target/X86/X86.td
347–359

I think it would be better to swap these around?
FeatureMacroFusion is a superset of FeatureBranchFusion.
(maybe even inherit FeatureBranchFusion by FeatureMacroFusion, not sure it is possible here.)

354
// Bulldozer and newer processors can merge CMP/TEST (but not other instructions)
// with conditional branches.
lebedev.ri added inline comments.Mar 27 2019, 6:01 AM
llvm/test/Transforms/LoopStrengthReduce/X86/macro-fuse-cmp.ll
2–3

Also, if you do add a test that we don't fuse non-CMP/TEST instructions if we only have branchfusion,
could you please also add a bdver2 runline, and precommit all that?

courbet updated this revision to Diff 192478.Mar 27 2019, 10:47 AM
courbet marked 3 inline comments as done.

Add tests.

courbet updated this revision to Diff 192479.Mar 27 2019, 10:49 AM

Update comment, swap features.

llvm/lib/Target/X86/X86.td
347–359

I'd rather not inherit because they are actually different (e.g. w.r.t. CMP). Also they have different subtle limitations that are not currently modeled (see Agner), but we might want to model in the future.

Harbormaster completed remote builds in B29698: Diff 192479.
lebedev.ri accepted this revision.Mar 27 2019, 11:13 AM
lebedev.ri marked an inline comment as done.

Looks good to me, but please wait a bit for @andreadb.

llvm/lib/Target/X86/X86.td
347–359

Sounds good.

llvm/lib/Target/X86/X86MacroFusion.cpp
23

FirstInstrKind?

25–30

JumpKind?

llvm/test/CodeGen/X86/testb-je-fusion.ll
1–3

In trunk, i think you will get slightly more meaningful diff with

; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
; RUN: llc < %s -mtriple=x86_64-- -mattr=-macrofusion,-branchfusion | FileCheck %s --check-prefixes=NOFUSION
; RUN: llc < %s -mtriple=x86_64-- -mattr=-macrofusion,+branchfusion | FileCheck %s --check-prefixes=NOFUSION,BRANCHFUSION,BRANCHFUSIONONLY
; RUN: llc < %s -mtriple=x86_64-- -mattr=+macrofusion,-branchfusion | FileCheck %s --check-prefixes=NOFUSION,BRANCHFUSION,MACROFUSION

(drop [+-]branchfusion for trunk, add it back in patch)

This revision is now accepted and ready to land.Mar 27 2019, 11:13 AM
RKSimon requested changes to this revision.Mar 28 2019, 4:10 AM
RKSimon added a reviewer: RKSimon.
RKSimon added a subscriber: RKSimon.

Some style changes

llvm/lib/Target/X86/X86MacroFusion.cpp
29

(style) Please comment enums - they are not obvious.

212

Add a unreachable message

llvm/lib/Target/X86/X86TargetTransformInfo.h
40

NFC change - commit separetely

Macro-op fusion on Intel processors is essentially a form branch fusion.
The only difference in practice is that - starting from Sandybridge - a few extra opcodes (mostly arithmetic) other than CMP/TEST can now be fuse with branches.
But conceptually, what AMD calls branch fusion is (a form of) macro-op fusion.
Using FeatureMacroFusion and FeatureBranchFusion to refer to the two different forms of macro-op fusion may be a bit misleading... But then, I am not good with names, so I am not sure I am able to suggest better names for those :-( .

What if in future Intel/AMD processors decide to further expand the set of opcodes that are subject to macro-op fusion? If we adopt this approach, then we end up with even more feature flags. whether that is acceptable or not is often a matter of tastes. It may not be a problem for now (we only have two flags). However, we should really look into a tablegen-driven approach. For example, every scheduling model could declare the set of opcodes that can be fused with branches. Note that we can already do something like that today if we use STIPRedicateDecl and STIPredicate definitions via tablegen (with the advantage that the information is also accessible at the MC layer - and tools like mca could use it..).

In case I can help you with how to write those STIPredicate.

But for the short term, I think this patch is fine.
We may try to revisit this approach in future if we decide to get rid of those target features entirely, and expose that info to the MC layer (https://bugs.llvm.org/show_bug.cgi?id=36670).

-Andrea

andreadb accepted this revision.Mar 28 2019, 5:13 AM
courbet updated this revision to Diff 192624.Mar 28 2019, 6:43 AM
courbet marked 7 inline comments as done.

Address review comments, rebase.

llvm/test/CodeGen/X86/testb-je-fusion.ll
1–3

Good point, done.

Perfect, thanks!

Thanks for the review.

This revision was automatically updated to reflect the committed changes.