2 cycle latency for MOV64rm seems low to me. There's an address calculation and a TLB lookup before it can even start accessing the cache.
This patch makes sure each regular expression covers at least one instruction. We already checked that each InstRW line matched at least one instruction, but if there were multiple regular expressions listed, we didn't check each one individually.
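Roughly, the check amounts to something like this (a sketch in plain C++, not the actual TableGen code; the names are made up for illustration):

```cpp
#include <cstdio>
#include <regex>
#include <string>
#include <vector>

// Sketch only: for every regex listed on an InstRW-style entry, verify it
// matches at least one instruction name, rather than only checking that the
// entry as a whole matched something.
void checkRegexes(const std::vector<std::string> &Regexes,
                  const std::vector<std::string> &InstrNames) {
  for (const std::string &Pat : Regexes) {
    std::regex RE(Pat);
    bool Matched = false;
    for (const std::string &Name : InstrNames)
      if (std::regex_match(Name, RE)) {
        Matched = true;
        break;
      }
    if (!Matched)
      std::fprintf(stderr, "regex '%s' matches no instructions\n", Pat.c_str());
  }
}
```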
I'd like to see the loops merged to get rid of the nested std::pairs. But yeah I don't want to get in the way of fixing all the targets.
Committed in r327866, but forgot the Differential Revision line.
Committed in r327869, but forgot to add the Differential Revision line.
Mon, Mar 19
And I'll fix TEST
I'll fix the ror/rol/shr/sar problems too.
I think this mostly looks OK. Most of this is just stuff SNB doesn't support anyway. We should file bugs for the obvious issues it's showing.
I suspect this is wrong for PMULLD. I think that improved to a single uop on Goldmont.
Do any instructions on Sandybridge have a Custom load latency of 4? It looks like even the most basic instructions, like ADD32rr vs. ADD32rm, have a difference of 5 cycles.
Sun, Mar 18
Fri, Mar 16
icc seems to match gcc for those last 4 cases I sent. And MSVC is throwing an odd signed/unsigned mismatch warning.
What compiler is warning on this? I thought the build bots usually catch this, but maybe the assignments inside the loop are making some compilers not notice?
Fair point, what is the default signedness of char?
gcc also warns for this
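For reference, here is a minimal example of the kind of pattern that draws these warnings (hypothetical, not taken from the patch under review). Whether plain char is signed is implementation-defined, and after integer promotion a char loop counter compared against an unsigned bound is a signed/unsigned comparison:

```cpp
// Hypothetical illustration: gcc/clang warn with -Wsign-compare here, and MSVC
// emits its signed/unsigned mismatch warning, because the char counter is
// promoted to int and then compared against an unsigned value.
unsigned sumUpTo(unsigned Bound) {
  unsigned Sum = 0;
  for (char I = 0; I < Bound; ++I) // int vs. unsigned comparison after promotion
    Sum += I;
  return Sum;
}
```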
Why should the behavior be X86 specific?
Thu, Mar 15
Committed in r327683, but forgot the Differential Revision line.
Wed, Mar 14
The pattern matching detects (and (shift)) and (shift (and)) since we canonicalize the order of shifts and ands. This patch handles 0xffff for (and (srl)) and (shl (and)).
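As a rough source-level illustration (hypothetical C++, not from the patch), the two forms correspond to code like this:

```cpp
// Both keep only 16 bits around a shift; after DAG canonicalization they show
// up as (and (srl x, c), 0xffff) and (shl (and x, 0xffff), c) respectively.
unsigned andOfShift(unsigned X) {
  return (X >> 8) & 0xffff; // the (and (srl)) form
}
unsigned shiftOfAnd(unsigned X) {
  return (X & 0xffff) << 8; // the (shl (and)) form
}
```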
Tue, Mar 13
Mon, Mar 12
Added a flag to SplitBinaryOpsAndApply to control whether it checks useBWIRegs or useAVX512Regs. I wanted to do it based on VT, but PMADDWD uses a vXi32 result VT and a vXi16 input VT.
Sun, Mar 11
Remove the MULDQ code from LowerMUL. The DAG combine should cover it now.
Sat, Mar 10
I also picked up some broadcast and fma constant changes in here. I'll separate those out before commit.
Switch to hex. Support splats. Print and/or/xor as 64-bit because byte size interferes with splat recognition.
Add comment about the operand locations. Rename OpStr->AccStr.
Fri, Mar 9
Once more with context
Made the code less of a heuristic.
Don't continue searching if we found a non-aggressive commute. We don't want an aggressive commute to overwrite it.
Thu, Mar 8
Tue, Mar 6
LGTM. This makes sense to me. The extra uop on IMUL16rri/IMUL16rri8 is probably there to preserve the upper bits of the destination register since that operand is not part of the multiply operation. It makes sense that IMUL16rr wouldn't need that.
Mon, Mar 5
Add RUN line for gnux32
The isPseudo flag is supposed to refer to instructions that are expanded by the ExpandPostRAPseudos pass.
Sun, Mar 4
Fri, Mar 2
The scalar code is now type based and doesn't create new constants. Rework this patch to build on that and find a minimal safe type by looking at each element. Then turn that into a vector type.
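The per-element search is conceptually something like this (a sketch with made-up names treating elements as unsigned for simplicity, not the actual LLVM code):

```cpp
#include <algorithm>
#include <cstdint>
#include <initializer_list>
#include <vector>

// Sketch: find the smallest common element width (8/16/32/64 bits) that can
// hold every constant element, which can then be used to form a narrower
// vector type. Uses a GCC/Clang builtin to count leading zeros.
unsigned minimalSafeElementBits(const std::vector<uint64_t> &Elts) {
  unsigned NeededBits = 1;
  for (uint64_t E : Elts) {
    unsigned Bits = 64 - __builtin_clzll(E | 1); // bits needed for this element
    NeededBits = std::max(NeededBits, Bits);
  }
  for (unsigned Width : {8u, 16u, 32u, 64u}) // round up to a legal element width
    if (NeededBits <= Width)
      return Width;
  return 64;
}
```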
Fix spl/bpl/sil/dil as well
Remove comment and rebase
Thu, Mar 1
Probably not, but I don't know how to create that case without an intermediate type larger than 128 bits. Unless I'm missing something.
Add directed tests
Allow it to be turned into fptrunc as well, which I believe makes it equivalent to the handler for integers.
@RKSimon, are you ok with this patch then?