This is an archive of the discontinued LLVM Phabricator instance.

[WebAssembly] Improve codegen for shuffles with undefined lane indices
ClosedPublic

Authored by fanchenkong1 on Sep 7 2022, 10:53 PM.

Details

Summary

For undefined lane indices, fill the mask with {0..N} instead of zeros, to allow further reduction to word/dword shuffle at VM.

Diff Detail

Event Timeline

fanchenkong1 created this revision.Sep 7 2022, 10:53 PM
Herald added a project: Restricted Project. · View Herald Transcript
fanchenkong1 requested review of this revision.Sep 7 2022, 10:53 PM
Herald added a project: Restricted Project. · View Herald TranscriptSep 7 2022, 10:53 PM

minor change

penzn added a comment.Sep 8 2022, 4:56 PM

To me this makes since this applies to undefined indices and should not make any difference in case of valid ones.

llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
2325–2328

Nit: maybe expand on why '{0..J}', it might be not immediately obvious for those who didn't look at shuffles recently.

tlively accepted this revision.Sep 9 2022, 6:44 PM

The code looks good, but I don't know much about native SIMD. Is this strictly better for engine-side codegen on all platforms? LGTM if so.

This revision is now accepted and ready to land.Sep 9 2022, 6:44 PM

The code looks good, but I don't know much about native SIMD. Is this strictly better for engine-side codegen on all platforms? LGTM if so.

Yes, it is, that's why maybe expanding on description/comments would be good :)

Filling undefined lanes with 0's would effectively turn the shuffle into a byte shuffle, which is more expensive than "coarser" shuffles.

Sounds good. @fanchenkong1, it would be great if you could expand the comment as @penzn suggested. After that, do you have commit access or would you like me to merge this for you? If you would like me to commit it, what author email should I use for you?

penzn accepted this revision.Sep 12 2022, 5:37 PM

A stray thought: we have disabled combining shuffles because it was producing precisely the masks this change seeks to eliminate, I am wondering if we can reenable some of those optimization based on this.

Background:

https://github.com/emscripten-core/emscripten/issues/9340
https://github.com/WebAssembly/simd/issues/196

handle comments

minor change

Thank Thomas and Petr for reviewing this patch! The comment is expanded in the latest change.

Failed test is found on x64 windows, which seems to be related to mlir. Is there any action needed to fix the test?

Hi @tlively, I don't have the commit access. Would you please help land this patch once it is ready? The author name and mail could be "Fanchen Kong <fanchen.kong@intel.com>". Thanks!

penzn accepted this revision.Sep 13 2022, 9:28 AM

Failed test is found on x64 windows, which seems to be related to mlir. Is there any action needed to fix the test?

IMO you don't need to worry about those, you can retest on Windows if you want, but your change shouldn't/can't cause this. @tlively what do you think?

Yeah the unrelated test failures can be ignored. Good point about our shuffle combine workaround, @penzn. Would you have time and resources to investigate whether we can remove the workaround now?

This revision was landed with ongoing or failed builds.Sep 13 2022, 4:03 PM
This revision was automatically updated to reflect the committed changes.
penzn added a comment.Sep 13 2022, 4:04 PM

Yeah the unrelated test failures can be ignored. Good point about our shuffle combine workaround, @penzn. Would you have time and resources to investigate whether we can remove the workaround now?

Yes, we will have time to investigate.