- User Since
- Jul 30 2013, 7:58 PM (412 w, 2 d)
AMD's optimization manual for Bulldozer only shows a 2 cycle latency. I'm not sure why Agner reports 20 unless there's some bad case for some particular input that isn't documented. A single uop taking 20 cycles sounds very strange and must be serializing the machine. I would only expect divide/sqrt to be that high from a single uop. Maybe someone can run llvm-exegesis on one of those AMD CPUs.
Add constant argument range checking to SemaChecking
FXAM appears to be two uops where FTST is one on modern Intel CPUs based on Agner Fog's data. Agner's data for some AMD CPUs shows ~20 cycles of latency.
Doesn't gcc also fold isnan to false under fast math? If we diverge here that means your code would only work correctly with clang.
Honor zext function attributes over a preference for sext.
We may want to update the code in X86ISelLowering getAVX2GatherNode and getGatherNode to replace freeze+poison on Src with a zero vector. We already do this when the Src is undef.
Reword a comment slightly
Tue, Jun 22
Mon, Jun 21
Sun, Jun 20
Sat, Jun 19
Fri, Jun 18
I did some more digging and it looks like ISD::UNDEF for X86 should be turned into ConstantFP<0> by LegalizeDAG. So I really need more information about how we got here.
I'd also like to know what happens if you add "(fneg undef) -> undef" fold to DAGCombiner::visitFNEG.
Can you get IR and use bugpoint to reduce it? I'd really like to see the backend codegen that led to this case.
Limit to cases where mul has a single use. There may be a better heuristic here,
but this is a simple starting point.
Thu, Jun 17
How did you get a CHS with an undef input?
There's a more generic optimization hiding here. Could we teach decomposeMulByConstant to emit (shl (sh1add X, X), C) to handle any constant of the form (3 << C)? Similarly, (shl (sh2add X, X), C) would handle (5 << C), and (shl (sh3add X, X), C) would handle (9 << C). If the multiply happens to be used by an add, the existing patterns would combine the ADD and the SHL when possible.
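To make the arithmetic behind that suggestion concrete, here is a minimal sketch in Python that models the decomposition on 64-bit values. It is not the LLVM implementation: `sh_add`, `shl`, and `decompose_mul` are hypothetical helper names, and the 64-bit register width is an assumption. It relies only on the Zba semantics that shNadd computes (rs1 << N) + rs2, so (shNadd X, X) yields 3X, 5X, or 9X.

```python
MASK64 = (1 << 64) - 1  # assumption: model 64-bit registers (RV64)


def sh_add(n, rs1, rs2):
    """Model of Zba sh1add/sh2add/sh3add: (rs1 << n) + rs2, wrapped to 64 bits."""
    return ((rs1 << n) + rs2) & MASK64


def shl(x, c):
    """Logical shift left, wrapped to 64 bits."""
    return (x << c) & MASK64


def decompose_mul(x, imm):
    """Hypothetical helper: compute x * imm as (shl (shNadd x, x), c).

    Returns None when imm is not of the form (3 << c), (5 << c), or (9 << c).
    """
    if imm <= 0:
        return None
    c = (imm & -imm).bit_length() - 1      # number of trailing zero bits
    n = {3: 1, 5: 2, 9: 3}.get(imm >> c)   # odd factor -> shNadd shift amount
    if n is None:
        return None
    return shl(sh_add(n, x, x), c)
```

For example, `decompose_mul(x, 24)` takes the sh1add path with a shift of 3, since 24 = 3 << 3, while a constant such as 7 is rejected and would be left to other lowering strategies.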
Add strictfp to caller in test case.
Address review feedback
Tue, Jun 15
Initialize member variable
Mon, Jun 14
Will frem need to be expanded to a loop in the expansion pass?
Sat, Jun 12
Fri, Jun 11
Fixed typo in comment
I don't know what Intel's original failure looked like, but here's a test that should reproduce this with -run-pass=machinelicm: https://reviews.llvm.org/P8267. It needs more cleanup.
Thu, Jun 10
It would be nice to have a test, but this change seems ok.