For the costs - ideally you need to run the code snippet through llvm-mca (you can do this in godbolt) for various CPUs of that level (e.g. avx1 -> sandybridge/btver2/bdver2, avx2 -> znver2/haswell etc.) and use the worst-case throughput cost of those runs. For older targets (sse2...) we're more limited on testable CPU targets; I tend to just use slm's costs as they tend to match weak pre-AVX CPUs.
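A minimal sketch of that workflow, assuming the "Block RThroughput" line of llvm-mca's summary is the number being compared and that the larger value is the worse case. The CPU groupings are just the examples named above; the helper names and the `snippet.s` path are hypothetical, and rounding the result to an integer cost is left out.

```python
import re
import subprocess

# CPU lists per ISA level, copied from the examples in the comment above.
# They are illustrative, not an exhaustive or official mapping.
CPUS_BY_LEVEL = {
    "avx1": ["sandybridge", "btver2", "bdver2"],
    "avx2": ["znver2", "haswell"],
    "sse2": ["slm"],
}

# llvm-mca's summary view prints a line such as "Block RThroughput: 2.0".
_RTHROUGHPUT = re.compile(r"^Block RThroughput:\s*([0-9.]+)", re.MULTILINE)


def parse_block_rthroughput(report: str) -> float:
    """Extract the block reciprocal throughput from one llvm-mca report."""
    match = _RTHROUGHPUT.search(report)
    if match is None:
        raise ValueError("no 'Block RThroughput' line found in report")
    return float(match.group(1))


def run_mca(asm_path: str, cpu: str, triple: str = "x86_64-unknown-unknown") -> str:
    """Run llvm-mca on an assembly file for one CPU and return its stdout.

    Assumes llvm-mca is on PATH.
    """
    result = subprocess.run(
        ["llvm-mca", f"-mtriple={triple}", f"-mcpu={cpu}", asm_path],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def worst_case_cost(reports: dict) -> float:
    """Return the highest (slowest) reciprocal throughput across all reports."""
    return max(parse_block_rthroughput(text) for text in reports.values())


if __name__ == "__main__":
    # Hypothetical input file holding the assembly for the snippet under test.
    reports = {cpu: run_mca("snippet.s", cpu) for cpu in CPUS_BY_LEVEL["avx2"]}
    print(worst_case_cost(reports))
```

Keeping the parsing separate from the llvm-mca invocation means the same helper can be fed reports pasted from godbolt.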
sorry - missed a minor - LGTM with that change
Sun, Oct 18
Sat, Oct 17
@saugustine we should probably move this to a bugzilla ticket if that is OK. Is there any way that you can get the IR output with and without the non-uniform constant matchers in matchShiftAmount and diff the two? It should be something to do with vector rotate handling in the DAG - either generic or PowerPC specific - but I'd like to see the IR diff to help narrow it down.
Fri, Oct 16
@saugustine Any update on this? Please can you tell me if it's an assert, crash or miscompile?
Thu, Oct 15
@kbelochapka Abandon this patch? The plan is now to handle it in InstCombine inside narrowRotate.
@kbelochapka Abandon this patch?
Wed, Oct 14
Tue, Oct 13
Thanks @evgeny777 !
Reworked with a Constant::mergeUndefsWith helper as suggested by @lebedev.ri
I agree there isn't any reason not to try and move this to generic legalization - although we are still finding a few edge legalization cases where funnel/rotates fail on some targets so you might encounter that.
No objections from me and this looks better than D88194
Google has tracked down a failure in openssl for powerpc to this change. (And the fix for the 32-bit vs 64-bit issue below doesn't seem to fix this.)
I'm trying to get a smaller reproduction, but wanted to put this on your radar in the meantime.
@Jac1494 Please can you rebase this patch against trunk - from the CHECK changes in existing tests it looks like you have some local changes as well.
Mon, Oct 12
LGTM - thank you!
I'd prefer that our docs explicitly state what we've implemented instead of just referencing an external webpage.
Also, can BSF be handled here as well?
Sun, Oct 11
Please can you add an entry to the 12.00 release notes describing this? Maybe somewhere in the clang docs as well?
Sat, Oct 10
Use getZeroExtendInReg and refresh with ARM/MIPS changes