- User Since
- Jul 30 2013, 7:58 PM (224 w, 5 d)
If it supports masking we can't use the intrinsic in the tablegen as it would go against our normal lowering of intrinsics.
Sat, Nov 18
All skylake-avx512 and cannonlake now set corei7 as of r318616. Abandoning this.
Can you add command lines to vector-tzcnt-128/256/512.ll? We should be able to use popcnt for tzcnt when avx512cd's lzcnt is not available.
Fri, Nov 17
Thu, Nov 16
Yes I'm going to fix the QQ issue as well.
Wed, Nov 15
Sorry I think there's still a bug.
One issue with the struct approach: suppose I opt-bisect to a failure caused by one of these simplifyCFG passes. How do I invoke that specific version of simplifycfg from opt to debug it, or even to produce a reduced test case?
Tue, Nov 14
I don't recognize some of these
Now with test case.
Mon, Nov 13
I noticed this is missing some load folding of stack reloads in the multiply tests. I'm also not seeing much benefit and a couple small regressions in our benchmark runs. So I'm going to hold off on this.
Sun, Nov 12
Add a large code model test. Merge 'else' and 'if (matchVectorAddress'. Add a comment.
Sorry, I didn't necessarily mean to add MIN/MAX to this patch. I assume they are missing from all scheduling models.
Since I noticed this while looking at the patch: can you also add the (V)MAXCPD/MAXCPS/MAXCSD/MAXCSS/MINCPD/MINCPS/MINCSD/MINCSS instructions? They're equivalent to their counterparts without the 'C'. They are part of a hack to make floating-point min/max commutable under fast math.
Sat, Nov 11
Fri, Nov 10
Use matchSelectPattern instead. Use user_back instead of user_begin since it does the same thing without the explicit dereference. Also updated the equivalent place in visitFCmp.
gas seems to error on movl %fs,(%rsi) and movl (%rsi), %fs
Remove the hasIndices check
This solution doesn't seem very general, it won't catch.
Thu, Nov 9
Remove accidental update
- Gather optimization
We should have scheduling information for both forms. They are both used by isel and both make it all the way to code emission.
Tue, Nov 7
Fix some typos
I don't see any tests that produce kunpckbw after this change.
Add full context
Mon, Nov 6
Address Sanjay's comments.
Sat, Nov 4
Ok then we can keep the new test.
Can we just add -Werror to test/CodeGen/3dnow-builtins.c to test this? I believe it should be throwing a warning currently.
I believe the X86ISelLowering.cpp changes are no longer necessary after r317403 and r317410. I've taught lowering to prefer the EVEX encoded SHUF instructions by default and then EVEX->VEX will turn them back into VPERM2F128 if they don't end up being masked, using broadcast loads, or using XMM16-31.
Fri, Nov 3
This had no effect on the binary output for any of the benchmarks we ran on trunk.
In the longer term, we may end up making -mprefer-avx256 on by default for skylake-avx512, so I think a vectorizer-specific command line option would prevent us from having that control.
Thu, Nov 2
There's an oddity with fma. The version without __builtin has 'e' already
Wed, Nov 1
I added a CPUISVALID that you should be able to use to know if it's valid for builtin_cpu_is. The name for builtin_cpu_is is the first string; it is an empty string if it's not valid. The corresponding march names are the second string. Host.cpp needed to translate to -march names.
Tue, Oct 31
Remove code that wasn't supposed to be in this review.
Mon, Oct 30
Replicate check instead of moving it. Add optsize fast-isel tests.
I don't think this makes fast isel slower; rather, it makes sure the fast isel entry point does this check (at least the one used by X86FastISel::tryToFoldLoadIntoMI). So it may actually slow down peephole folding and spill folding. Maybe I should just leave the other checks in place and add a new one?