Page MenuHomePhabricator

[WebAssembly] Custom optimization for truncate
ClosedPublic

Authored by jingbao on Sep 8 2021, 8:07 PM.

Details

Summary

When possible, optimize TRUNCATE to generate Wasm SIMD narrow instructions (i16x8.narrow_i32x4_u, i8x16.narrow_i16x8_u), rather than generate lots of extract_lane and replace_lane.
See Bug 51006 for discussion.

Diff Detail

Unit TestsFailed

TimeTest
70 msx64 debian > LLVM.Bindings/Go::go.test
Script: -- : 'RUN: at line 1'; /var/lib/buildkite-agent/builds/llvm-project/build/bin/llvm-go go=/usr/bin/go test llvm.org/llvm/bindings/go/llvm

Event Timeline

jingbao created this revision.Sep 8 2021, 8:07 PM
jingbao requested review of this revision.Sep 8 2021, 8:07 PM
Herald added a project: Restricted Project. · View Herald TranscriptSep 8 2021, 8:07 PM
tlively requested changes to this revision.Sep 9 2021, 5:02 PM

Can you add tests for this to llvm/test/CodeGen/WebAssembly?

llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
2658

It would be good to write out the transformation we are doing in terms of selection dag operations and WebAssembly instructions. I don't know what PACK*SDW and PACK*SWB mean.

This revision now requires changes to proceed.Sep 9 2021, 5:02 PM
jingbao updated this revision to Diff 372424.Sep 14 2021, 1:17 AM

Update comments and add tests

tlively added inline comments.Sep 14 2021, 3:25 PM
llvm/test/CodeGen/WebAssembly/simd-vector-trunc.ll
2

If you change the run line to this, you can use utils/update_llc_test_checks.py to autogenerate the test output, which will make it easier for me to understand the transformation we are doing. Also, it looks like the code is more general than it needs to be to handle this one case. Is there an opportunity to either simplify the code or test more cases here?

jingbao added inline comments.Sep 14 2021, 7:25 PM
llvm/test/CodeGen/WebAssembly/simd-vector-trunc.ll
2

the code is more general than it needs to be to handle this one case

I'm not sure I fully understand what you mean by this. Do you want me to change the function body of @trunc16i32_16i8? Is there detailed suggestions on how to change?

tlively added inline comments.Sep 14 2021, 7:45 PM
llvm/test/CodeGen/WebAssembly/simd-vector-trunc.ll
2

For example, extractSubVector and truncateVectorWithNARROW look like they are meant to work with many different vector types. If that's correct, it would be good to add tests for more vector types. Alternatively, if we only care about a smaller set of types, it would be nice if we could simplify the code and make it less general.

jingbao added inline comments.Sep 14 2021, 7:54 PM
llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
2707

Hi Thomas, you're right that the code is meant to handle more types. My initial purpose is to discuss the solution with you and Petr first so I keep the types simple and leaves a TODO here. If you're OK with the solution, I can try to cover more types and more tests.

(Actually I have little experience with LLVM types, it may take me some time to cover more, and any advice on the if conditions here is very welcome)

jingbao updated this revision to Diff 375171.Sep 27 2021, 1:54 AM

Make it more general

jingbao updated this revision to Diff 375451.Sep 27 2021, 6:17 PM

clang-tidy

Nice, these tests give me a good sense of what this optimization is doing. The transformation looks good.

llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
2614

For both this and truncateVectorWithNARROW, it would be good to add comments about what the parameters are for and what will be returned.

jingbao updated this revision to Diff 376166.Sep 30 2021, 4:58 AM

Add comments and simplify parameters

Hi Thomas, would you please take a look at the newest patch? Thanks!

jingbao marked 4 inline comments as done.Nov 21 2021, 11:11 PM
jingbao edited the summary of this revision. (Show Details)Nov 21 2021, 11:27 PM

Hi Thomas and Petr,

I added a bit more description to the summary and has done all existing comments. Would you please kindly review again? Thanks a lot!

penzn accepted this revision.Nov 30 2021, 10:56 AM

Looks like the feedback is addressed and overall this is a needed optimization. @tlively what do you think?

tlively accepted this revision.Dec 3 2021, 8:39 PM

Thanks for doing this, @jingbao. Sorry I let it sit for so long. Do you have commit rights or would you like me to land this for you?

This revision is now accepted and ready to land.Dec 3 2021, 8:39 PM

Thank Thomas and Petr. I don't know if I have commit rights, @tlively would you please help land this? Thanks!

@jingbao I was working on landing this, but the CodeGen/WebAssembly/fpclamptosat_vec.ll test had to be updated. Unfortunately Phabricator won't let me attach the changes to this revision, but it looks like this change introduces some new unnecessary masking that was not there before in a couple of the test functions. (It also makes a bunch of good changes!). Would you be able to take a look at that? I'd also be ok with landing this as-is since the improvements looks more significant and more common than the regression.

jingbao updated this revision to Diff 394183.Dec 14 2021, 1:36 AM

update fpclamptosat_vec.ll

Hi @tlively, I updated the test file. After some investigation, I still cannot delete the masking operation if you are talking about the "v128.and" part, because those can avoid the narrow operations to saturate incorrectly. If you have concerns about other parts, please point that out on the test file and we can discuss again.