This is an archive of the discontinued LLVM Phabricator instance.

[WebAssembly] Emulate v128.const efficiently
ClosedPublic

Authored by tlively on Sep 30 2020, 10:01 AM.

Details

Summary

v128.const was recently implemented in V8, but until it rolls into Chrome
stable, we can't enable it in the WebAssembly backend without breaking origin
trial users. So far we have been lowering build_vectors that would otherwise
have been lowered to v128.const to splats followed by sequences of replace_lane
instructions to initialize each lane individually. That produces large and
inefficient code, so this patch introduces new logic to lower integer vector
constants to a single i64x2.splat where possible, with at most a single
i64x2.replace_lane following it if necessary.

Adapted from a patch authored by @omnisip.

Diff Detail

Event Timeline

tlively created this revision.Sep 30 2020, 10:01 AM
tlively requested review of this revision.Sep 30 2020, 10:01 AM

Nice idea! I believe this approach will be better in most cases; the only possible downside I can think of is something like this. In our test case there's this test:

define <4 x i32> @splat_common_const_i32x4() {
  ret <4 x i32> <i32 undef, i32 3, i32 3, i32 1>
}

This is compiled to this currently:

i32.const	$push0=, 3
i32x4.splat	$push1=, $pop0
i32.const	$push2=, 1
i32x4.replace_lane	$push3=, $pop1, 3, $pop2

After this patch this becomes:

i64.const	$push0=, 12884901888
i64x2.splat	$push1=, $pop0
i64.const	$push2=, 4294967299
i64x2.replace_lane	$push3=, $pop1, 1, $pop2

The number of instructions is the same, but because we are now using i64.const rather than i32.const, can this increase the code size?

llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
1586

It'd be easier to read to have a little more comments here on what we are trying to do and why it is always better than the simpler splatting-and-replacing option which already existed. Something like this. (Also it might be a good idea to link this issue in the CL/commit description for full reference).

1601

What does getLimitedValue do and why is it necessary?

The number of instructions is the same, but because we are now using i64.const rather than i32.const, can this increase the code size?

Yes, you're right code size can increase. However, once we enable v128.const by default, the code size will increase even more. We've generally been trying to optimize SIMD code to minimize the number of instructions (and more generally maximize performance) rather than minimizing code size, so I don't think this is a problem.

tlively added inline comments.Oct 1 2020, 1:18 PM
llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
1586

Sounds good. I will elaborate here.

1601

It gets the constant value, capping it so that it fits in a uint64_t. It's similar to getZExtValue, but doesn't assert if the value happens to be too large to fit in a uint64_t for some reason.

tlively updated this revision to Diff 295658.Oct 1 2020, 1:25 PM
  • Elaborate commment
aheejin accepted this revision.Oct 1 2020, 1:52 PM
aheejin added inline comments.
llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
1601

Given that our biggest lane is 64 bits, do we need this then? (I'm not against using this as a safe measure or anything; I'm just trying to understand what this function does)

This revision is now accepted and ready to land.Oct 1 2020, 1:52 PM
tlively added inline comments.Oct 2 2020, 12:20 AM
llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
1601

No, I think we could have just as well used getZExtVal, but I don't think that's any clearer.

This revision was automatically updated to reflect the committed changes.

This patch appears to be the likely cause of big endian hosts failing with a timeout:

This is what shows on our AIX-hosted bot:

buildbot-user 19006934  7538476 120 07:48:02      - 68:54 /buildbot/buildbot-user/buildbot-worker/worker/LLVM-Master-AIX-Release-powerpc64le-gnu-linux/build/bin/llc -asm-verbose=false -verify-machineinstrs -disable-wasm-fallthrough-return-opt -wasm-disable-explicit-locals -wasm-keep-registers -mattr=+simd128
buildbot-user 30541264  7538476   0 07:48:02      -  0:00 /buildbot/buildbot-user/buildbot-worker/worker/LLVM-Master-AIX-Release-powerpc64le-gnu-linux/build/bin/FileCheck /buildbot/buildbot-user/buildbot-worker/worker/LLVM-Master-AIX-Release-powerpc64le-gnu-linux/llvm/llvm/test/CodeGen/WebAssembly/simd-build-vector.ll --check-prefixes=CHECK,SIMD-VM
buildbot-user  7538476 17762386   0 07:48:01      -  0:00 /bin/bash /buildbot/buildbot-user/buildbot-worker/worker/LLVM-Master-AIX-Release-powerpc64le-gnu-linux/build/test/CodeGen/WebAssembly/Output/simd-build-vector.ll.script

Please address buildbot failure: http://lab.llvm.org:8011/builders/lldb-x64-windows-ninja/builds/19311/steps/build/logs/stdio

E:\build_slave\lldb-x64-windows-ninja\llvm-project\llvm\lib\Target\WebAssembly\WebAssemblyISelLowering.cpp(1589): error C2100: illegal indirection
E:\build_slave\lldb-x64-windows-ninja\llvm-project\llvm\lib\Target\WebAssembly\WebAssemblyISelLowering.cpp(1590): error C2100: illegal indirection
llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
1594

More comments or assertions about the expected range for ByteStep could be helpful.

1595

More comments of assertions about the expected number of loop iterations and the relationship to ByteStep could be helpful.

stefanp added a subscriber: stefanp.Oct 2 2020, 9:34 AM

I can confirm that this changeset is causing the timeout on the clang-ppc64be-linux buildbot.
The following was run on a Big Endian Power PC machine.

Without the change:

$ ./bin/llvm-lit test/CodeGen/WebAssembly/simd-build-vector.ll
-- Testing: 1 tests, 1 workers --
PASS: LLVM :: CodeGen/WebAssembly/simd-build-vector.ll (1 of 1)Testing Time: 0.31s
  Passed: 1

With the change (I killed it after 200 seconds)

$ ./bin/llvm-lit test/CodeGen/WebAssembly/simd-build-vector.ll
-- Testing: 1 tests, 1 workers --
^C  interrupted by user, skipping remaining testsTesting Time: 201.41s
  Skipped: 1

@stefanp Yikes, I must have messed up the endianness handling here. Will revert and investigate.

max-kudr removed a subscriber: max-kudr.Oct 2 2020, 12:20 PM
efriedma added inline comments.
llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
1603

I think this byte_swap isn't doing the right thing; it's in the wrong width.

There are a couple of reasonable ways to write this:

  1. Construct an std::array<uint8_t, 16>/std::array<ulittle16_t, 8>/std::array<ulittle32_t, 4>, and memcpy from it to an std::array<ulittle64_t, 2>.
  2. Don't type-pun at all; something like:
for (int i = 0; i < NumElements; ++i)
  Result[i / (NumElements / 2)] |= Elements[i] << ElementWidth * (i  % (NumElements / 2);
dweber added a subscriber: dweber.Oct 2 2020, 5:08 PM
dweber added inline comments.
llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
1603

When I wrote this, I checked the implementation of byte swap. It's a template that derives the byte_swap from the value type. In this case, getLimitedValue returns u64. It should be right. It makes me wonder since webassembly is guaranteed to be little endian if a conversion occurs before this happens.

dweber added inline comments.Oct 2 2020, 5:38 PM
llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
1595

@tlively @hubert.reinterpretcast the byte step exists to simplify the branching logic that would exist if we had to do masking for each integer type. It takes advantage of little endian to always capture the least significant bits relevant to the typed integer in question.

dweber added a subscriber: max-kudr.Oct 2 2020, 6:02 PM
This comment was removed by dweber.
dweber added a comment.EditedOct 2 2020, 6:56 PM

Please address buildbot failure: http://lab.llvm.org:8011/builders/lldb-x64-windows-ninja/builds/19311/steps/build/logs/stdio

E:\build_slave\lldb-x64-windows-ninja\llvm-project\llvm\lib\Target\WebAssembly\WebAssemblyISelLowering.cpp(1589): error C2100: illegal indirection
E:\build_slave\lldb-x64-windows-ninja\llvm-project\llvm\lib\Target\WebAssembly\WebAssemblyISelLowering.cpp(1590): error C2100: illegal indirection

This issue is caused by VS not accepting the initializer list on the machine running the build. It appears to be broken in VS up through v19.24. But starts working with v19.25. This can be resolved by using the {{0,0}} initializer syntax with std::array.

Please address buildbot failure: http://lab.llvm.org:8011/builders/lldb-x64-windows-ninja/builds/19311/steps/build/logs/stdio

E:\build_slave\lldb-x64-windows-ninja\llvm-project\llvm\lib\Target\WebAssembly\WebAssemblyISelLowering.cpp(1589): error C2100: illegal indirection
E:\build_slave\lldb-x64-windows-ninja\llvm-project\llvm\lib\Target\WebAssembly\WebAssemblyISelLowering.cpp(1590): error C2100: illegal indirection

This issue is caused by VS not accepting the initializer list on the machine running the build. It appears to be broken in VS up through v19.24. But starts working with v19.25. This can be resolved by using the {{0,0}} initializer syntax with std::array.

Thanks for looking into this!

llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
1631

Splatted represents a different value on big endian systems at this point.

dweber added inline comments.Oct 2 2020, 7:27 PM
llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
1631

@hubert.reinterpretcast If you remove the byte_swap function all together, do you still get the same issue?

llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
1631

I mean, from code inspection, without needing to assume anything weird on the original byte_swap.

Using half the width as an example (4 elements of 16 bits):
Values:

0x1234 0x5678 0x1234 0x5678

Resulting bytes in storage:

0x34 0x12 0x78 0x56 // 0x34 0x12 0x78 0x56
0xff 0xff 0xff 0xff // 0xff 0xff 0xff 0xff

Request constant from value with bytes:

0x34 0x12 0x78 0x56

i.e., on little endian systems, request constant with value:

0x56781234

or, on big endian systems, request constant with value:

0x34127856
llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
1634

Similarly for I64s[1] here.

efriedma added inline comments.Oct 2 2020, 7:45 PM
llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
1603

Oh, you're extending to u64, putting a little-endian u64 into memory, and copying the first N bits. That's effectively the same as putting a little-endian uN into memory. But I guess that leaves you with a problem at the other end: the elements of I64s are never swapped back from little-endian to native endianness.

In any case, the interaction between the endianness and the pointer manipulation is hard to follow; I'd suggest rewriting it even if the current implementation were correct.

dweber added inline comments.Oct 2 2020, 7:47 PM
llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
1631

@hubert.reinterpretcast the reason I ask is because webassembly assumes little endian. On big endian machines, the constant could have been converted before we do anything here -- especially since the developer using emscripten is taught there is no other byte order but little endian. If that's the case, the byte swap could be creating the problem.

dweber added inline comments.Oct 2 2020, 7:49 PM
llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
1631

(the above byte_swap on a little endian machine will always produce the original parameter without fail).

dweber added inline comments.Oct 2 2020, 7:53 PM
llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
1603

They're not supposed to be converted back to native endianness unless there's something I misunderstand about the implementation. If two byte swaps are required, I'm pretty sure none is required.

tlively added inline comments.Oct 2 2020, 7:58 PM
llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
1603

I'm working on a rewrite that uses @efriedma's suggestion to not use type punning above, by the way.

dweber added inline comments.Oct 2 2020, 8:08 PM
llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
1603

You could just change ByteStep to IntegerWidthInBytes if it's really a clarification change.

dweber added inline comments.Oct 2 2020, 8:19 PM
llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
1631

Two byte swaps would definitely be safe though.

llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
1631

The first byte_swap is necessary. The following makes the case pass:

diff --git a/llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp b/llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
index 8474e50..5a94041 100644
--- a/llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
+++ b/llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
@@ -1586,8 +1586,10 @@ SDValue WebAssemblyTargetLowering::LowerBUILD_VECTOR(SDValue Op,
     // we can construct the 128-bit constant from a pair of i64s using a splat
     // followed by at most one i64x2.replace_lane. Also keep track of the lanes
     // that actually matter so we can avoid the replace_lane in more cases.
-    std::array<uint64_t, 2> I64s({0, 0});
-    std::array<uint64_t, 2> ConstLaneMasks({0, 0});
+    using llvm::support::ulittle64_t;
+    const ulittle64_t ulittle64_zero(0);
+    std::array<ulittle64_t, 2> I64s({ulittle64_zero, ulittle64_zero});
+    std::array<ulittle64_t, 2> ConstLaneMasks({ulittle64_zero, ulittle64_zero});
     uint8_t *I64Bytes = reinterpret_cast<uint8_t *>(I64s.data());
     uint8_t *MaskBytes = reinterpret_cast<uint8_t *>(ConstLaneMasks.data());
     unsigned I = 0;
llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
1603

I'm working on a rewrite that uses @efriedma's suggestion to not use type punning above, by the way.

I am guessing that a memcpy into a ulittle64_t would be preferred for getting the desired 64-bit value that would produce the right memory image on the target machine.

dweber added inline comments.Oct 2 2020, 8:48 PM
llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
1631

Excellent. If you fix the std:;array initializer syntax to be {{0,0}} instead of ({0,0}) it'll compile and work on VS too.

max-kudr removed a subscriber: max-kudr.Oct 2 2020, 8:52 PM