This is an archive of the discontinued LLVM Phabricator instance.

[X86] selectLEAAddr - add X86ISD::SMUL/UMULO handling
ClosedPublic

Authored by RKSimon on Feb 11 2022, 12:40 PM.

Details

Summary

After D118128 relaxed the heuristic to require only one EFLAGS-generating operand, it now makes sense to avoid X86ISD::SMUL/UMULO duplication as well.
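
As a rough illustration of the heuristic described in the summary, here is a minimal, self-contained sketch. It is not the code from X86ISelDAGToDAG.cpp: the `Opcode` enum, `isMathWithFlags`, and `leaComplexityBonus` are hypothetical stand-ins, and the list of flag-producing opcodes is an assumption based only on this summary. The idea modelled is that LEA does not write EFLAGS, so when at least one operand of an add also produces flags, preferring LEA makes it less likely that the flag-producing operation has to be duplicated.

```cpp
// Hypothetical opcodes for illustration only; not LLVM's X86ISD enum.
enum class Opcode { Add, Sub, Adc, Sbb, SMul, UMul, Load, Constant };

// Returns true for opcodes assumed to produce an EFLAGS result.
// SMul and UMul stand for the multiply cases this revision adds.
bool isMathWithFlags(Opcode Op) {
  switch (Op) {
  case Opcode::Add:
  case Opcode::Sub:
  case Opcode::Adc:
  case Opcode::Sbb:
  case Opcode::SMul:
  case Opcode::UMul:
    return true;
  default:
    return false;
  }
}

// Relaxed form of the heuristic: a single flag-producing operand of the
// add is enough to make forming an LEA more attractive.
int leaComplexityBonus(Opcode LHS, Opcode RHS) {
  return (isMathWithFlags(LHS) || isMathWithFlags(RHS)) ? 1 : 0;
}
```

In this toy model, `leaComplexityBonus(Opcode::SMul, Opcode::Constant)` yields a bonus, whereas an add of a load and a constant does not.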

Event Timeline

RKSimon created this revision. Feb 11 2022, 12:40 PM
RKSimon requested review of this revision. Feb 11 2022, 12:40 PM
Herald added a project: Restricted Project. Feb 11 2022, 12:40 PM
pengfei added inline comments. Feb 12 2022, 5:19 AM
llvm/lib/Target/X86/X86ISelDAGToDAG.cpp
2787

Indent

RKSimon updated this revision to Diff 408179. Feb 12 2022, 6:31 AM

Fix clang-format

pengfei added inline comments. Feb 16 2022, 6:58 PM
llvm/test/CodeGen/X86/xmulo.ll
489–492

Not sure if it's always beneficial. IIRC, there are some disadvantages to decoding a complex LEA.

RKSimon added inline comments. Feb 17 2022, 1:57 AM
llvm/test/CodeGen/X86/xmulo.ll
489–492

Do Intel architectures count a simple add LEA with a different destination register as complex?

pengfei added inline comments. Feb 17 2022, 4:08 AM
llvm/test/CodeGen/X86/xmulo.ll
489–492

The AOM F.3.2.2 says

LEA: The LEA instruction uses the AGU instead of the ALU. If one of the source register of LEA must
come from an execution unit. This dependency will also cause a 3 cycle delay. Thus, LEA should not
be used in the technique of adding two values and produce the result in a third register. LEA should
be used for address computation.

lebedev.ri added inline comments. Feb 17 2022, 4:15 AM
llvm/test/CodeGen/X86/xmulo.ll
489–492

Won't TuningSlowLEA take care of that by undoing the transform if we end up with bad code?

pengfei added inline comments. Feb 17 2022, 4:39 AM
llvm/test/CodeGen/X86/xmulo.ll
489–492

Not sure. It seems they are different. We only set TuningSlowLEA for Atom, but AOM says:

Assembly/Compiler Coding Rule 5. (MH impact, H generality) For Intel Atom processors, LEA
should be used for address manipulation; but software should avoid the following situations which
creates dependencies from ALU to AGU: an ALU instruction (instead of LEA) for address manipulation or
ESP updates; a LEA for ternary addition or non-destructive writes which do not feed address
generation. Alternatively, hoist producer instruction more than 3 cycles above the consumer instruction
that uses the AGU.

The case here looks like a "non-destructive write".

pengfei added inline comments. Feb 17 2022, 4:42 AM
llvm/test/CodeGen/X86/xmulo.ll
489–492

OK, I misunderstood it. It only applies to Atom. Sorry for the noise.

pengfei accepted this revision. Feb 17 2022, 4:43 AM

LGTM.

This revision is now accepted and ready to land. Feb 17 2022, 4:43 AM
This revision was landed with ongoing or failed builds. Feb 17 2022, 5:51 AM
This revision was automatically updated to reflect the committed changes.