For undefined lane indices, fill the mask with {0..N} instead of zeros, to allow further reduction to word/dword shuffle at VM.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
To me this makes since this applies to undefined indices and should not make any difference in case of valid ones.
llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp | ||
---|---|---|
2325–2328 | Nit: maybe expand on why '{0..J}', it might be not immediately obvious for those who didn't look at shuffles recently. |
The code looks good, but I don't know much about native SIMD. Is this strictly better for engine-side codegen on all platforms? LGTM if so.
Yes, it is, that's why maybe expanding on description/comments would be good :)
Filling undefined lanes with 0's would effectively turn the shuffle into a byte shuffle, which is more expensive than "coarser" shuffles.
Sounds good. @fanchenkong1, it would be great if you could expand the comment as @penzn suggested. After that, do you have commit access or would you like me to merge this for you? If you would like me to commit it, what author email should I use for you?
A stray thought: we have disabled combining shuffles because it was producing precisely the masks this change seeks to eliminate, I am wondering if we can reenable some of those optimization based on this.
Background:
https://github.com/emscripten-core/emscripten/issues/9340
https://github.com/WebAssembly/simd/issues/196
Thank Thomas and Petr for reviewing this patch! The comment is expanded in the latest change.
Failed test is found on x64 windows, which seems to be related to mlir. Is there any action needed to fix the test?
Hi @tlively, I don't have the commit access. Would you please help land this patch once it is ready? The author name and mail could be "Fanchen Kong <fanchen.kong@intel.com>". Thanks!
IMO you don't need to worry about those, you can retest on Windows if you want, but your change shouldn't/can't cause this. @tlively what do you think?
Yeah the unrelated test failures can be ignored. Good point about our shuffle combine workaround, @penzn. Would you have time and resources to investigate whether we can remove the workaround now?
Nit: maybe expand on why '{0..J}', it might be not immediately obvious for those who didn't look at shuffles recently.