After D118128 relaxed the heuristic to require only one EFLAGS-generating operand, it now makes sense to avoid X86ISD::SMUL/UMULO duplication as well.
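For context, here is a small C sketch of the kind of source that ends up in these nodes. It is an illustration only, not code from this patch, and `smul_sat`/`umul_sat` are hypothetical names. It assumes GCC/Clang's `__builtin_mul_overflow`, which Clang typically lowers to `llvm.smul.with.overflow` / `llvm.umul.with.overflow` (the intrinsics exercised by xmulo.ll) when both operands and the result share a type. Both the product and the overflow bit are consumed, so the flag-producing multiply has a value user and an overflow user.

```c
#include <stdint.h>

/* Hypothetical helper (illustration only): saturating signed multiply.
 * With matching signed 32-bit types, Clang typically lowers this builtin
 * to llvm.smul.with.overflow.i32. The overflow bit picks the saturated
 * value; otherwise the product itself is returned. */
int32_t smul_sat(int32_t a, int32_t b) {
    int32_t prod;
    if (__builtin_mul_overflow(a, b, &prod)) {
        /* The true product is negative exactly when the signs differ. */
        return ((a < 0) != (b < 0)) ? INT32_MIN : INT32_MAX;
    }
    return prod;
}

/* Hypothetical unsigned counterpart; typically lowered by Clang to
 * llvm.umul.with.overflow.i32. */
uint32_t umul_sat(uint32_t a, uint32_t b) {
    uint32_t prod;
    if (__builtin_mul_overflow(a, b, &prod))
        return UINT32_MAX;
    return prod;
}
```

For example, `smul_sat(65536, 65536)` overflows and saturates to `INT32_MAX`, while `smul_sat(-3, 7)` returns `-21`. Compiling with `clang -O2 -S -emit-llvm` shows which intrinsic was actually selected.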
Repository: rG LLVM Github Monorepo
Event Timeline
llvm/lib/Target/X86/X86ISelDAGToDAG.cpp

| Line | Comment |
|---|---|
| 2787 | Indent |
llvm/test/CodeGen/X86/xmulo.ll

| Lines | Comment |
|---|---|
| 489–492 | Not sure if it's always beneficial. IIRC, we have some disadvantages in decoding complex LEAs. |
| 489–492 | Do Intel architectures count a simple add-style LEA with a different destination register as complex? |
| 489–492 | The AOM F.3.2.2 says: "LEA: The LEA instruction uses the AGU instead of the ALU. If one of the source registers of LEA must come from an execution unit, this dependency will also cause a 3 cycle delay. Thus, LEA should not be used in the technique of adding two values and producing the result in a third register. LEA should be used for address computation." |
| 489–492 | Won't TuningSlowLEA take care of that by undoing the transform if we end up with bad code? |
| 489–492 | Not sure; they seem to be different. We only set TuningSlowLEA for Atom, but the AOM says: "Assembly/Compiler Coding Rule 5. (MH impact, H generality) For Intel Atom processors, LEA should be used for address manipulation; but software should avoid the following situations which create dependencies from ALU to AGU: an ALU instruction (instead of LEA) for address manipulation or ESP updates; a LEA for ternary addition or non-destructive writes which do not feed address generation. Alternatively, hoist producer instruction more than 3 cycles above the consumer instruction that uses the AGU." The case here looks like a "non-destructive write". |
| 489–492 | OK, I misunderstood it; it only applies to Atom. Sorry for the noise. |
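To make the LEA patterns in this thread concrete, here is a small C sketch. The function names are hypothetical, and the instruction selection described in the comments is typical x86-64 compiler behavior rather than anything guaranteed by this patch. It shows the two shapes the quoted Atom rule names: adding two values into a third register (a non-destructive write) and a ternary base + index + displacement addition.

```c
#include <stdint.h>

/* Hypothetical example: under the x86-64 System V ABI, a arrives in EDI,
 * b in ESI and the result leaves in EAX, so neither source register is
 * overwritten (a "non-destructive write"). Optimizing compilers commonly
 * select a single LEA here instead of MOV + ADD. */
uint32_t add_nondestructive(uint32_t a, uint32_t b) {
    return a + b;
}

/* Hypothetical example: base + index + displacement, a "ternary addition"
 * that fits in one LEA. Unsigned arithmetic wraps, so there is no UB. */
uint32_t add_ternary(uint32_t a, uint32_t b) {
    return a + b + 8u;
}
```

Compiling with `-O2 -S` shows what a given compiler and tuning actually emit; per the quoted rule, on Atom such LEAs are worth avoiding when their result does not feed address generation.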