This is an archive of the discontinued LLVM Phabricator instance.

Differential D88591

[WebAssembly] Emulate v128.const efficiently
ClosedPublic

Authored by tlively on Sep 30 2020, 10:01 AM.

Details

Reviewers

aheejin

Commits

rG542523a61a21: [WebAssembly] Emulate v128.const efficiently

Summary

v128.const was recently implemented in V8, but until it rolls into Chrome
stable, we can't enable it in the WebAssembly backend without breaking origin
trial users. So far we have been lowering build_vectors that would otherwise
have been lowered to v128.const to splats followed by sequences of replace_lane
instructions to initialize each lane individually. That produces large and
inefficient code, so this patch introduces new logic to lower integer vector
constants to a single i64x2.splat where possible, with at most a single
i64x2.replace_lane following it if necessary.
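As a rough illustration of the approach described in this summary, the decision between "one splat" and "one splat plus one replace_lane" can be sketched outside of LLVM. This is a minimal standalone sketch, not the patch itself: `PackedHalves`, `SplatPlan`, and `planSplat` are hypothetical names, and the sketch assumes the lanes have already been packed into two 64-bit halves, with masks marking the bits that come from constant lanes. The three boolean checks mirror the ones in the diff further down.

```cpp
#include <cstdint>

// Hypothetical stand-in for the packed state: the two 64-bit halves of the
// 128-bit constant, plus masks marking which bits come from constant lanes.
struct PackedHalves {
  uint64_t Value[2];
  uint64_t Mask[2];
};

// Hypothetical result type describing the instructions to emit.
struct SplatPlan {
  uint64_t Splatted;     // operand of i64x2.splat
  bool NeedsReplaceLane; // whether i64x2.replace_lane on lane 1 follows
  uint64_t Replacement;  // operand of that replace_lane, if needed
};

constexpr SplatPlan planSplat(PackedHalves P) {
  // Does one half already agree with the other on every constant bit?
  bool FirstHalfSufficient = (P.Value[0] & P.Mask[1]) == P.Value[1];
  bool SecondHalfSufficient = (P.Value[1] & P.Mask[0]) == P.Value[0];
  // Or do the halves have no constant bits in common, so they can be OR'ed?
  bool CombinedSufficient = (P.Mask[0] & P.Mask[1]) == 0;

  uint64_t Splatted = SecondHalfSufficient ? P.Value[1]
                      : CombinedSufficient ? (P.Value[0] | P.Value[1])
                                           : P.Value[0];
  bool NeedsReplace =
      !FirstHalfSufficient && !SecondHalfSufficient && !CombinedSufficient;
  return SplatPlan{Splatted, NeedsReplace, P.Value[1]};
}

// Both halves identical: a single splat is enough.
constexpr SplatPlan SameHalves = planSplat({{7, 7}, {~0ULL, ~0ULL}});
static_assert(SameHalves.Splatted == 7 && !SameHalves.NeedsReplaceLane,
              "identical halves need only a splat");
```

When none of the three checks holds, the plan falls back to splatting the first half and replacing lane 1 with the second half, which is the "at most a single i64x2.replace_lane" case.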

Adapted from a patch authored by @omnisip.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

tlively created this revision. Sep 30 2020, 10:01 AM

Herald added a project: Restricted Project. Sep 30 2020, 10:01 AM

Herald added subscribers: llvm-commits, ecnelises, sunfish and 4 others.

tlively requested review of this revision. Sep 30 2020, 10:01 AM

Harbormaster completed remote builds in B73536: Diff 295337. Sep 30 2020, 10:13 AM

Nice idea! I believe this approach will be better in most cases; the only possible downside I can think of is something like this. In our test case there's this test:

define <4 x i32> @splat_common_const_i32x4() {
  ret <4 x i32> <i32 undef, i32 3, i32 3, i32 1>
}

This is compiled to this currently:

i32.const	$push0=, 3
i32x4.splat	$push1=, $pop0
i32.const	$push2=, 1
i32x4.replace_lane	$push3=, $pop1, 3, $pop2

After this patch this becomes:

i64.const	$push0=, 12884901888
i64x2.splat	$push1=, $pop0
i64.const	$push2=, 4294967299
i64x2.replace_lane	$push3=, $pop1, 1, $pop2

The number of instructions is the same, but because we are now using i64.const rather than i32.const, can this increase the code size?
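For reference, the two i64 constants in the new output follow directly from packing adjacent 32-bit lanes, with the lower-numbered lane in the low bits. The sketch below is only a worked check of that arithmetic; `packPair` is a hypothetical helper, and the undef lane is assumed to be treated as 0.

```cpp
#include <cstdint>

// Pack two 32-bit lanes into one 64-bit value, lower-numbered lane in the
// low bits (WebAssembly's little-endian lane order). Hypothetical helper,
// not part of the patch.
constexpr uint64_t packPair(uint32_t Lo, uint32_t Hi) {
  return static_cast<uint64_t>(Lo) | (static_cast<uint64_t>(Hi) << 32);
}

// <i32 undef, i32 3, i32 3, i32 1>, with the undef lane treated as 0.
constexpr uint64_t LowHalf = packPair(0, 3);  // lanes 0 and 1
constexpr uint64_t HighHalf = packPair(3, 1); // lanes 2 and 3

static_assert(LowHalf == 12884901888ULL, "operand of i64x2.splat");
static_assert(HighHalf == 4294967299ULL, "operand of i64x2.replace_lane");
```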

llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
1586	It'd be easier to read to have a little more comments here on what we are trying to do and why it is always better than the simpler splatting-and-replacing option which already existed. Something like this. (Also it might be a good idea to link this issue in the CL/commit description for full reference).
1601	What does `getLimitedValue` do and why is it necessary?

In D88591#2306093, @aheejin wrote:

The number of instructions is the same, but because we are now using i64.const rather than i32.const, can this increase the code size?

Yes, you're right, code size can increase. However, once we enable v128.const by default, the code size will increase even more. We've generally been trying to optimize SIMD code to minimize the number of instructions (and more generally maximize performance) rather than minimizing code size, so I don't think this is a problem.
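To put a number on the size difference for the example above: `i32.const` and `i64.const` immediates are encoded as signed LEB128, so their size depends on the magnitude of the value. The following is a minimal sketch of the byte count only; `sleb128Size` is a hypothetical helper, and it assumes right-shifting a negative value is an arithmetic shift (true of mainstream compilers, guaranteed from C++20).

```cpp
#include <cstddef>
#include <cstdint>

// Number of bytes a value occupies as a signed LEB128 immediate.
// Hypothetical helper for illustration only.
constexpr std::size_t sleb128Size(int64_t Value) {
  std::size_t Size = 0;
  bool More = true;
  while (More) {
    uint8_t Byte = static_cast<uint8_t>(Value & 0x7f);
    Value >>= 7; // assumed arithmetic shift, so the sign is preserved
    bool SignBit = (Byte & 0x40) != 0;
    // Stop once the remaining bits are pure sign extension of this byte.
    More = !((Value == 0 && !SignBit) || (Value == -1 && SignBit));
    ++Size;
  }
  return Size;
}

// Old lowering: i32.const 3 and i32.const 1.
static_assert(sleb128Size(3) + sleb128Size(1) == 2, "two 1-byte immediates");
// New lowering: i64.const 12884901888 and i64.const 4294967299.
static_assert(sleb128Size(12884901888LL) + sleb128Size(4294967299LL) == 10,
              "two 5-byte immediates");
```

Under these assumptions the immediates alone grow from 2 bytes to 10 bytes for this test case, which is the kind of increase the question anticipates.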

tlively added inline comments. Oct 1 2020, 1:18 PM

llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
1586	Sounds good. I will elaborate here.
1601	It gets the constant value, capping it so that it fits in a `uint64_t`. It's similar to `getZExtValue`, but doesn't assert if the value happens to be too large to fit in a uint64_t for some reason.

Elaborate comment

Harbormaster completed remote builds in B73706: Diff 295658. Oct 1 2020, 1:40 PM

aheejin accepted this revision. Oct 1 2020, 1:52 PM

aheejin added inline comments.

llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
1601	Given that our biggest lane is 64 bits, do we need this then? (I'm not against using this as a safe measure or anything; I'm just trying to understand what this function does)

This revision is now accepted and ready to land. Oct 1 2020, 1:52 PM

tlively added inline comments. Oct 2 2020, 12:20 AM

llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
1601	No, I think we could have just as well used `getZExtValue`, but I don't think that's any clearer.

Closed by commit rG542523a61a21: [WebAssembly] Emulate v128.const efficiently (authored by tlively). Oct 2 2020, 12:28 AM

This revision was automatically updated to reflect the committed changes.

tlively added a commit: rG542523a61a21: [WebAssembly] Emulate v128.const efficiently.

This patch appears to be the likely cause of big endian hosts failing with a timeout:

This is what shows on our AIX-hosted bot:

buildbot-user 19006934  7538476 120 07:48:02      - 68:54 /buildbot/buildbot-user/buildbot-worker/worker/LLVM-Master-AIX-Release-powerpc64le-gnu-linux/build/bin/llc -asm-verbose=false -verify-machineinstrs -disable-wasm-fallthrough-return-opt -wasm-disable-explicit-locals -wasm-keep-registers -mattr=+simd128
buildbot-user 30541264  7538476   0 07:48:02      -  0:00 /buildbot/buildbot-user/buildbot-worker/worker/LLVM-Master-AIX-Release-powerpc64le-gnu-linux/build/bin/FileCheck /buildbot/buildbot-user/buildbot-worker/worker/LLVM-Master-AIX-Release-powerpc64le-gnu-linux/llvm/llvm/test/CodeGen/WebAssembly/simd-build-vector.ll --check-prefixes=CHECK,SIMD-VM
buildbot-user  7538476 17762386   0 07:48:01      -  0:00 /bin/bash /buildbot/buildbot-user/buildbot-worker/worker/LLVM-Master-AIX-Release-powerpc64le-gnu-linux/build/test/CodeGen/WebAssembly/Output/simd-build-vector.ll.script

@uweigand, fyi re: clang-s390x-linux bot: https://reviews.llvm.org/D88591#2308590.

Please address buildbot failure: http://lab.llvm.org:8011/builders/lldb-x64-windows-ninja/builds/19311/steps/build/logs/stdio

E:\build_slave\lldb-x64-windows-ninja\llvm-project\llvm\lib\Target\WebAssembly\WebAssemblyISelLowering.cpp(1589): error C2100: illegal indirection
E:\build_slave\lldb-x64-windows-ninja\llvm-project\llvm\lib\Target\WebAssembly\WebAssemblyISelLowering.cpp(1590): error C2100: illegal indirection

stella.stamenova added a reverting change: rG432e4e56d3d2: Revert "[WebAssembly] Emulate v128.const efficiently". Oct 2 2020, 9:26 AM

hubert.reinterpretcast added inline comments. Oct 2 2020, 9:31 AM

llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
1594	More comments or assertions about the expected range for ByteStep could be helpful.
1595	More comments or assertions about the expected number of loop iterations and the relationship to ByteStep could be helpful.

I can confirm that this changeset is causing the timeout on the clang-ppc64be-linux buildbot.
The following was run on a Big Endian Power PC machine.

Without the change:

$ ./bin/llvm-lit test/CodeGen/WebAssembly/simd-build-vector.ll
-- Testing: 1 tests, 1 workers --
PASS: LLVM :: CodeGen/WebAssembly/simd-build-vector.ll (1 of 1)

Testing Time: 0.31s
  Passed: 1

With the change (I killed it after 200 seconds)

$ ./bin/llvm-lit test/CodeGen/WebAssembly/simd-build-vector.ll
-- Testing: 1 tests, 1 workers --
^C  interrupted by user, skipping remaining tests

Testing Time: 201.41s
  Skipped: 1

@stefanp Yikes, I must have messed up the endianness handling here. Will revert and investigate.

max-kudr removed a subscriber: max-kudr. Oct 2 2020, 12:20 PM

efriedma added a subscriber: efriedma. Oct 2 2020, 1:16 PM

efriedma added inline comments.

llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
1603	I think this byte_swap isn't doing the right thing; it's in the wrong width. There are a couple of reasonable ways to write this:

1. Construct an `std::array<uint8_t, 16>`/`std::array<ulittle16_t, 8>`/`std::array<ulittle32_t, 4>`, and memcpy from it to an `std::array<ulittle64_t, 2>`.
2. Don't type-pun at all; something like:

for (int i = 0; i < NumElements; ++i)
  Result[i / (NumElements / 2)] |= Elements[i] << ElementWidth * (i % (NumElements / 2));
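The second suggestion can be expanded into a standalone sketch. This is an illustration under assumptions, not the code that eventually landed: `Packed128` and `packLanes` are hypothetical names, lane values are passed as plain `uint64_t`s, and `NumElements * ElementWidth` is assumed to be 128. Because it only uses shifts and ORs on integer values, the result does not depend on the byte order of the machine running the compiler.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical result type: Half[0] holds the lower-numbered lanes.
struct Packed128 {
  uint64_t Half[2];
};

// Pack NumElements lanes of ElementWidth bits each into two 64-bit halves,
// lower-numbered lanes in lower bits. No type punning is involved.
constexpr Packed128 packLanes(const uint64_t *Elements,
                              std::size_t NumElements,
                              unsigned ElementWidth) {
  Packed128 Result{{0, 0}};
  std::size_t PerHalf = NumElements / 2;
  // Keep only the low ElementWidth bits of each lane value.
  uint64_t LaneMask = ElementWidth >= 64
                          ? ~uint64_t(0)
                          : (uint64_t(1) << ElementWidth) - 1;
  for (std::size_t I = 0; I < NumElements; ++I)
    Result.Half[I / PerHalf] |= (Elements[I] & LaneMask)
                                << (ElementWidth * (I % PerHalf));
  return Result;
}

// Eight 16-bit lanes alternating 0x1234 and 0x5678.
constexpr uint64_t ExampleLanes16[8] = {0x1234, 0x5678, 0x1234, 0x5678,
                                        0x1234, 0x5678, 0x1234, 0x5678};
static_assert(packLanes(ExampleLanes16, 8, 16).Half[0] ==
                  0x5678123456781234ULL,
              "lane 0 ends up in the least significant bits");

// The <i32 undef, i32 3, i32 3, i32 1> test case, with undef taken as 0.
constexpr uint64_t ExampleLanes32[4] = {0, 3, 3, 1};
static_assert(packLanes(ExampleLanes32, 4, 32).Half[1] == 4294967299ULL,
              "matches the i64.const in the generated code");
```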

dweber added a subscriber: dweber. Oct 2 2020, 5:08 PM

dweber added inline comments.

llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
1603	When I wrote this, I checked the implementation of byte_swap. It's a template that derives the byte swap from the value type. In this case, getLimitedValue returns u64, so it should be right. Since WebAssembly is guaranteed to be little endian, it makes me wonder whether a conversion occurs before this happens.

dweber added inline comments. Oct 2 2020, 5:38 PM

llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
1595	@tlively @hubert.reinterpretcast the byte step exists to simplify the branching logic that would exist if we had to do masking for each integer type. It takes advantage of little endian to always capture the least significant bits relevant to the typed integer in question.

dweber added a subscriber: max-kudr. Oct 2 2020, 6:02 PM

This comment was removed by dweber.

In D88591#2308659, @max-kudr wrote:
Please address buildbot failure: http://lab.llvm.org:8011/builders/lldb-x64-windows-ninja/builds/19311/steps/build/logs/stdio
E:\build_slave\lldb-x64-windows-ninja\llvm-project\llvm\lib\Target\WebAssembly\WebAssemblyISelLowering.cpp(1589): error C2100: illegal indirection
E:\build_slave\lldb-x64-windows-ninja\llvm-project\llvm\lib\Target\WebAssembly\WebAssemblyISelLowering.cpp(1590): error C2100: illegal indirection

This issue is caused by VS not accepting the initializer list on the machine running the build. It appears to be broken in VS up through v19.24 but starts working with v19.25. This can be resolved by using the {{0,0}} initializer syntax with std::array.
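A minimal sketch of the suggested workaround, shown outside the patch: the parenthesized form `std::array<uint64_t, 2> I64s({0, 0});` is what the failing build rejected, while the double-brace form below initializes the `std::array` aggregate and its underlying array explicitly. `ZeroHalves` is a hypothetical name used only for this illustration.

```cpp
#include <array>
#include <cstdint>

// Double braces: the outer pair initializes the std::array aggregate, the
// inner pair initializes the C array it wraps.
constexpr std::array<uint64_t, 2> ZeroHalves{{0, 0}};

static_assert(ZeroHalves[0] == 0 && ZeroHalves[1] == 0,
              "both halves start out zeroed");
```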

In D88591#2309812, @dweber wrote:
In D88591#2308659, @max-kudr wrote:
Please address buildbot failure: http://lab.llvm.org:8011/builders/lldb-x64-windows-ninja/builds/19311/steps/build/logs/stdio
E:\build_slave\lldb-x64-windows-ninja\llvm-project\llvm\lib\Target\WebAssembly\WebAssemblyISelLowering.cpp(1589): error C2100: illegal indirection
E:\build_slave\lldb-x64-windows-ninja\llvm-project\llvm\lib\Target\WebAssembly\WebAssemblyISelLowering.cpp(1590): error C2100: illegal indirection
This issue is caused by VS not accepting the initializer list on the machine running the build. It appears to be broken in VS up through v19.24 but starts working with v19.25. This can be resolved by using the {{0,0}} initializer syntax with std::array.

Thanks for looking into this!

hubert.reinterpretcast added inline comments. Oct 2 2020, 7:24 PM

llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
1631	`Splatted` represents a different value on big endian systems at this point.

dweber added inline comments. Oct 2 2020, 7:27 PM

llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
1631	@hubert.reinterpretcast If you remove the byte_swap function altogether, do you still get the same issue?

hubert.reinterpretcast added inline comments. Oct 2 2020, 7:37 PM

llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
1631	I mean, from code inspection, without needing to assume anything weird on the original `byte_swap`.

Using half the width as an example (4 elements of 16 bits):
Values:
0x1234 0x5678 0x1234 0x5678
Resulting bytes in storage:
0x34 0x12 0x78 0x56 // 0x34 0x12 0x78 0x56
0xff 0xff 0xff 0xff // 0xff 0xff 0xff 0xff
Request constant from value with bytes:
0x34 0x12 0x78 0x56
i.e., on little endian systems, request constant with value: 0x56781234
or, on big endian systems, request constant with value: 0x34127856
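The two values in this walkthrough can be checked with a small standalone sketch. It does not touch host memory layout at all; instead it spells out what a little-endian and a big-endian host would each read back from the same four stored bytes. `readLittleEndian32`, `readBigEndian32`, and `StoredBytes` are hypothetical names for this illustration.

```cpp
#include <cstdint>

// What a little-endian host reads from four consecutive bytes.
constexpr uint32_t readLittleEndian32(const uint8_t *Bytes) {
  return uint32_t(Bytes[0]) | (uint32_t(Bytes[1]) << 8) |
         (uint32_t(Bytes[2]) << 16) | (uint32_t(Bytes[3]) << 24);
}

// What a big-endian host reads from the same four bytes.
constexpr uint32_t readBigEndian32(const uint8_t *Bytes) {
  return (uint32_t(Bytes[0]) << 24) | (uint32_t(Bytes[1]) << 16) |
         (uint32_t(Bytes[2]) << 8) | uint32_t(Bytes[3]);
}

// Bytes in storage for the 16-bit lanes 0x1234 and 0x5678.
constexpr uint8_t StoredBytes[4] = {0x34, 0x12, 0x78, 0x56};

static_assert(readLittleEndian32(StoredBytes) == 0x56781234,
              "value requested on a little-endian host");
static_assert(readBigEndian32(StoredBytes) == 0x34127856,
              "value requested on a big-endian host");
```

The stored bytes are identical in both cases; only the integer read back from them differs, which is why the constant handed to the DAG changes with the host.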

hubert.reinterpretcast added inline comments. Oct 2 2020, 7:41 PM

llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
1634	Similarly for `I64s[1]` here.

efriedma added inline comments. Oct 2 2020, 7:45 PM

llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
1603	Oh, you're extending to u64, putting a little-endian u64 into memory, and copying the first N bits. That's effectively the same as putting a little-endian uN into memory. But I guess that leaves you with a problem at the other end: the elements of I64s are never swapped back from little-endian to native endianness. In any case, the interaction between the endianness and the pointer manipulation is hard to follow; I'd suggest rewriting it even if the current implementation were correct.

dweber added inline comments. Oct 2 2020, 7:47 PM

llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
1631	@hubert.reinterpretcast the reason I ask is because webassembly assumes little endian. On big endian machines, the constant could have been converted before we do anything here -- especially since the developer using emscripten is taught there is no other byte order but little endian. If that's the case, the byte swap could be creating the problem.

dweber added inline comments. Oct 2 2020, 7:49 PM

llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
1631	(the above byte_swap on a little endian machine will always produce the original parameter without fail).

dweber added inline comments. Oct 2 2020, 7:53 PM

llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
1603	They're not supposed to be converted back to native endianness unless there's something I misunderstand about the implementation. If two byte swaps are required, I'm pretty sure none is required.

tlively added inline comments. Oct 2 2020, 7:58 PM

llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
1603	I'm working on a rewrite that uses @efriedma's suggestion to not use type punning above, by the way.

dweber added inline comments. Oct 2 2020, 8:08 PM

llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
1603	You could just change ByteStep to IntegerWidthInBytes if it's really a clarification change.

dweber added inline comments. Oct 2 2020, 8:19 PM

llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
1631	Two byte swaps would definitely be safe though.

hubert.reinterpretcast added inline comments. Oct 2 2020, 8:42 PM

llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp

1631

The first byte_swap is necessary. The following makes the case pass:

diff --git a/llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp b/llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
index 8474e50..5a94041 100644
--- a/llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
+++ b/llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
@@ -1586,8 +1586,10 @@ SDValue WebAssemblyTargetLowering::LowerBUILD_VECTOR(SDValue Op,
     // we can construct the 128-bit constant from a pair of i64s using a splat
     // followed by at most one i64x2.replace_lane. Also keep track of the lanes
     // that actually matter so we can avoid the replace_lane in more cases.
-    std::array<uint64_t, 2> I64s({0, 0});
-    std::array<uint64_t, 2> ConstLaneMasks({0, 0});
+    using llvm::support::ulittle64_t;
+    const ulittle64_t ulittle64_zero(0);
+    std::array<ulittle64_t, 2> I64s({ulittle64_zero, ulittle64_zero});
+    std::array<ulittle64_t, 2> ConstLaneMasks({ulittle64_zero, ulittle64_zero});
     uint8_t *I64Bytes = reinterpret_cast<uint8_t *>(I64s.data());
     uint8_t *MaskBytes = reinterpret_cast<uint8_t *>(ConstLaneMasks.data());
     unsigned I = 0;
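The reason this change helps can be sketched without LLVM headers. The idea behind `llvm::support::ulittle64_t` is that the object's storage is always laid out little-endian, and reading it as an integer converts to the host's representation. The `LittleU64` type below is a hypothetical stand-in written for illustration; it is not the LLVM class and omits everything except the read path.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in: storage is always little-endian, and value()
// reassembles the integer with shifts, so the result is the same on any host.
struct LittleU64 {
  uint8_t Bytes[8];

  constexpr uint64_t value() const {
    uint64_t V = 0;
    for (std::size_t I = 0; I < 8; ++I)
      V |= static_cast<uint64_t>(Bytes[I]) << (8 * I);
    return V;
  }
};

// Storage after copying in the 16-bit lanes 0x1234 0x5678 0x1234 0x5678,
// each in little-endian byte order.
constexpr LittleU64 Stored = {
    {0x34, 0x12, 0x78, 0x56, 0x34, 0x12, 0x78, 0x56}};

static_assert(Stored.value() == 0x5678123456781234ULL,
              "same integer regardless of host byte order");
```

With a plain `uint64_t`, the same bytes would be read back in host order, which is what produced a different `Splatted` value on big-endian builders.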

hubert.reinterpretcast added inline comments. Oct 2 2020, 8:47 PM

llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
1603	> I'm working on a rewrite that uses @efriedma's suggestion to not use type punning above, by the way.

I am guessing that a memcpy into a `ulittle64_t` would be preferred for getting the desired 64-bit value that would produce the right memory image on the target machine.

dweber added inline comments. Oct 2 2020, 8:48 PM

llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
1631	Excellent. If you fix the std::array initializer syntax to be {{0,0}} instead of ({0,0}) it'll compile and work on VS too.

max-kudr removed a subscriber: max-kudr. Oct 2 2020, 8:52 PM

tlively added a reverting change: D88773: Reland "[WebAssembly] Emulate v128.const efficiently"". Oct 2 2020, 9:03 PM

tlively added a reverting change: rG72c628e83580: Reland "[WebAssembly] Emulate v128.const efficiently"". Oct 12 2020, 9:37 PM

Revision Contents

Path	Size
llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp	69 lines
llvm/test/CodeGen/WebAssembly/simd-build-vector.ll	69 lines

Diff 295742

llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp

Show All 24 Lines
#include "llvm/CodeGen/SelectionDAG.h"		#include "llvm/CodeGen/SelectionDAG.h"
#include "llvm/CodeGen/WasmEHFuncInfo.h"		#include "llvm/CodeGen/WasmEHFuncInfo.h"
#include "llvm/IR/DiagnosticInfo.h"		#include "llvm/IR/DiagnosticInfo.h"
#include "llvm/IR/DiagnosticPrinter.h"		#include "llvm/IR/DiagnosticPrinter.h"
#include "llvm/IR/Function.h"		#include "llvm/IR/Function.h"
#include "llvm/IR/Intrinsics.h"		#include "llvm/IR/Intrinsics.h"
#include "llvm/IR/IntrinsicsWebAssembly.h"		#include "llvm/IR/IntrinsicsWebAssembly.h"
#include "llvm/Support/Debug.h"		#include "llvm/Support/Debug.h"
		#include "llvm/Support/Endian.h"
#include "llvm/Support/ErrorHandling.h"		#include "llvm/Support/ErrorHandling.h"
#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"
#include "llvm/Target/TargetOptions.h"		#include "llvm/Target/TargetOptions.h"
using namespace llvm;		using namespace llvm;

#define DEBUG_TYPE "wasm-lower"		#define DEBUG_TYPE "wasm-lower"

WebAssemblyTargetLowering::WebAssemblyTargetLowering(		WebAssemblyTargetLowering::WebAssemblyTargetLowering(
▲ Show 20 Lines • Show All 1,519 Lines • ▼ Show 20 Lines	if (NumSwizzleLanes >= NumSplatLanes &&
Result = DAG.getNode(WebAssemblyISD::SWIZZLE, DL, VecT, SwizzleSrc,		Result = DAG.getNode(WebAssemblyISD::SWIZZLE, DL, VecT, SwizzleSrc,
SwizzleIndices);		SwizzleIndices);
auto Swizzled = std::make_pair(SwizzleSrc, SwizzleIndices);		auto Swizzled = std::make_pair(SwizzleSrc, SwizzleIndices);
IsLaneConstructed = [&, Swizzled](size_t I, const SDValue &Lane) {		IsLaneConstructed = [&, Swizzled](size_t I, const SDValue &Lane) {
return Swizzled == GetSwizzleSrcs(I, Lane);		return Swizzled == GetSwizzleSrcs(I, Lane);
};		};
} else if (NumConstantLanes >= NumSplatLanes &&		} else if (NumConstantLanes >= NumSplatLanes &&
Subtarget->hasUnimplementedSIMD128()) {		Subtarget->hasUnimplementedSIMD128()) {
		// If we support v128.const, emit it directly
SmallVector<SDValue, 16> ConstLanes;		SmallVector<SDValue, 16> ConstLanes;
for (const SDValue &Lane : Op->op_values()) {		for (const SDValue &Lane : Op->op_values()) {
if (IsConstant(Lane)) {		if (IsConstant(Lane)) {
ConstLanes.push_back(Lane);		ConstLanes.push_back(Lane);
} else if (LaneT.isFloatingPoint()) {		} else if (LaneT.isFloatingPoint()) {
ConstLanes.push_back(DAG.getConstantFP(0, DL, LaneT));		ConstLanes.push_back(DAG.getConstantFP(0, DL, LaneT));
} else {		} else {
ConstLanes.push_back(DAG.getConstant(0, DL, LaneT));		ConstLanes.push_back(DAG.getConstant(0, DL, LaneT));
}		}
}		}
Result = DAG.getBuildVector(VecT, DL, ConstLanes);		Result = DAG.getBuildVector(VecT, DL, ConstLanes);
IsLaneConstructed = [&](size_t _, const SDValue &Lane) {		IsLaneConstructed = [&IsConstant](size_t _, const SDValue &Lane) {
return IsConstant(Lane);		return IsConstant(Lane);
};		};
		} else if (NumConstantLanes >= NumSplatLanes && VecT.isInteger()) {
		// Otherwise, if this is an integer vector, pack the lane values together so
		// we can construct the 128-bit constant from a pair of i64s using a splat
		// followed by at most one i64x2.replace_lane. Also keep track of the lanes
		// that actually matter so we can avoid the replace_lane in more cases.
		std::array<uint64_t, 2> I64s({0, 0});
		std::array<uint64_t, 2> ConstLaneMasks({0, 0});
		uint8_t *I64Bytes = reinterpret_cast<uint8_t *>(I64s.data());
		uint8_t *MaskBytes = reinterpret_cast<uint8_t *>(ConstLaneMasks.data());
		unsigned I = 0;
		size_t ByteStep = VecT.getScalarSizeInBits() / 8;
		for (const SDValue &Lane : Op->op_values()) {
		if (IsConstant(Lane)) {
		using llvm::support::little;
		using llvm::support::endian::byte_swap;
		// The endianness of the compiler matters here. We want to enforce
		// little endianness so that the bytes of a smaller integer type will
		// occur first in the uint64_t.
		auto *Const = cast<ConstantSDNode>(Lane.getNode());
		uint64_t Val = byte_swap(Const->getLimitedValue(), little);
		uint8_t *ValPtr = reinterpret_cast<uint8_t *>(&Val);
		std::copy(ValPtr, ValPtr + ByteStep, I64Bytes + I * ByteStep);
		uint64_t Mask = uint64_t(-1LL);
		uint8_t *MaskPtr = reinterpret_cast<uint8_t *>(&Mask);
		std::copy(MaskPtr, MaskPtr + ByteStep, MaskBytes + I * ByteStep);
		}
		++I;
		}
		// Check whether all constant lanes in the second half of the vector are
		// equivalent in the first half or vice versa to determine whether splatting
		// either side will be sufficient to materialize the constant. As a special
		// case, if the first and second halves have no constant lanes in common, we
		// can just combine them.
		bool FirstHalfSufficient = (I64s[0] & ConstLaneMasks[1]) == I64s[1];
		bool SecondHalfSufficient = (I64s[1] & ConstLaneMasks[0]) == I64s[0];
		bool CombinedSufficient = (ConstLaneMasks[0] & ConstLaneMasks[1]) == 0;

		uint64_t Splatted;
		if (SecondHalfSufficient) {
		Splatted = I64s[1];
		} else if (CombinedSufficient) {
		Splatted = I64s[0] | I64s[1];
		} else {
		Splatted = I64s[0];
		}

		Result = DAG.getSplatBuildVector(MVT::v2i64, DL,
		DAG.getConstant(Splatted, DL, MVT::i64));
		if (!FirstHalfSufficient && !SecondHalfSufficient && !CombinedSufficient) {
		Result = DAG.getNode(ISD::INSERT_VECTOR_ELT, DL, MVT::v2i64, Result,
		DAG.getConstant(I64s[1], DL, MVT::i64),
		DAG.getConstant(1, DL, MVT::i32));
}		}
if (!Result) {		Result = DAG.getBitcast(VecT, Result);
		IsLaneConstructed = [&IsConstant](size_t _, const SDValue &Lane) {
		return IsConstant(Lane);
		};
		} else {
// Use a splat, but possibly a load_splat		// Use a splat, but possibly a load_splat
LoadSDNode *SplattedLoad;		LoadSDNode *SplattedLoad;
if ((SplattedLoad = dyn_cast<LoadSDNode>(SplatValue)) &&		if ((SplattedLoad = dyn_cast<LoadSDNode>(SplatValue)) &&
SplattedLoad->getMemoryVT() == VecT.getVectorElementType()) {		SplattedLoad->getMemoryVT() == VecT.getVectorElementType()) {
Result = DAG.getMemIntrinsicNode(		Result = DAG.getMemIntrinsicNode(
WebAssemblyISD::LOAD_SPLAT, DL, DAG.getVTList(VecT),		WebAssemblyISD::LOAD_SPLAT, DL, DAG.getVTList(VecT),
{SplattedLoad->getChain(), SplattedLoad->getBasePtr(),		{SplattedLoad->getChain(), SplattedLoad->getBasePtr(),
SplattedLoad->getOffset()},		SplattedLoad->getOffset()},
SplattedLoad->getMemoryVT(), SplattedLoad->getMemOperand());		SplattedLoad->getMemoryVT(), SplattedLoad->getMemOperand());
} else {		} else {
Result = DAG.getSplatBuildVector(VecT, DL, SplatValue);		Result = DAG.getSplatBuildVector(VecT, DL, SplatValue);
}		}
IsLaneConstructed = [&](size_t _, const SDValue &Lane) {		IsLaneConstructed = [&SplatValue](size_t _, const SDValue &Lane) {
return Lane == SplatValue;		return Lane == SplatValue;
};		};
}		}

		assert(Result);
		assert(IsLaneConstructed);

// Add replace_lane instructions for any unhandled values		// Add replace_lane instructions for any unhandled values
for (size_t I = 0; I < Lanes; ++I) {		for (size_t I = 0; I < Lanes; ++I) {
const SDValue &Lane = Op->getOperand(I);		const SDValue &Lane = Op->getOperand(I);
if (!Lane.isUndef() && !IsLaneConstructed(I, Lane))		if (!Lane.isUndef() && !IsLaneConstructed(I, Lane))
Result = DAG.getNode(ISD::INSERT_VECTOR_ELT, DL, VecT, Result, Lane,		Result = DAG.getNode(ISD::INSERT_VECTOR_ELT, DL, VecT, Result, Lane,
DAG.getConstant(I, DL, MVT::i32));		DAG.getConstant(I, DL, MVT::i32));
}		}
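
The final loop above patches every defined lane that the initial vector did not already produce. The following standalone sketch models that step outside of SelectionDAG. It is an illustration under assumptions: `Lane`, `lanesNeedingReplace`, and `replaceAfterSplat` are hypothetical names, and `std::nullopt` stands in for an undef lane.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <optional>
#include <vector>

using Lane = std::optional<int32_t>; // std::nullopt models an undef lane

// Hypothetical stand-in for the fix-up loop: returns the lane indices that
// would receive a replace_lane after the initial vector has been built.
inline std::vector<std::size_t> lanesNeedingReplace(
    const std::vector<Lane> &Lanes,
    const std::function<bool(std::size_t, const Lane &)> &IsLaneConstructed) {
  assert(IsLaneConstructed && "a predicate must be chosen before the loop");
  std::vector<std::size_t> Result;
  for (std::size_t I = 0; I < Lanes.size(); ++I)
    if (Lanes[I].has_value() && !IsLaneConstructed(I, Lanes[I]))
      Result.push_back(I); // defined lane that the initial vector got wrong
  return Result;
}

// Splat case: a lane is already correct when it equals the splatted value.
inline std::vector<std::size_t>
replaceAfterSplat(const std::vector<Lane> &Lanes, int32_t SplatValue) {
  return lanesNeedingReplace(
      Lanes, [SplatValue](std::size_t, const Lane &L) { return L == SplatValue; });
}
```

With lanes `<undef, 3, 3, 1>` and a splat of 3, only lane 3 needs a replacement, which matches the `i32x4.splat` plus single `i32x4.replace_lane` sequence quoted earlier in the review.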


llvm/test/CodeGen/WebAssembly/simd-build-vector.ll

; RUN: llc < %s -asm-verbose=false -verify-machineinstrs -disable-wasm-fallthrough-return-opt -wasm-disable-explicit-locals -wasm-keep-registers -mattr=+unimplemented-simd128 \| FileCheck %s --check-prefixes=CHECK,UNIMP		; RUN: llc < %s -asm-verbose=false -verify-machineinstrs -disable-wasm-fallthrough-return-opt -wasm-disable-explicit-locals -wasm-keep-registers -mattr=+unimplemented-simd128 \| FileCheck %s --check-prefixes=CHECK,UNIMP
; RUN: llc < %s -asm-verbose=false -verify-machineinstrs -disable-wasm-fallthrough-return-opt -wasm-disable-explicit-locals -wasm-keep-registers -mattr=+simd128 \| FileCheck %s --check-prefixes=CHECK,SIMD-VM		; RUN: llc < %s -asm-verbose=false -verify-machineinstrs -disable-wasm-fallthrough-return-opt -wasm-disable-explicit-locals -wasm-keep-registers -mattr=+simd128 \| FileCheck %s --check-prefixes=CHECK,SIMD-VM

; Test that the logic to choose between v128.const vector		; Test that the logic to choose between v128.const vector
; initialization and splat vector initialization and to optimize the		; initialization and splat vector initialization and to optimize the
; choice of splat value works correctly.		; choice of splat value works correctly.

target datalayout = "e-m:e-p:32:32-i64:64-n32:64-S128"		target datalayout = "e-m:e-p:32:32-i64:64-n32:64-S128"
target triple = "wasm32-unknown-unknown"		target triple = "wasm32-unknown-unknown"

		; CHECK-LABEL: emulated_const_trivial_splat:
		; CHECK-NEXT: .functype emulated_const_trivial_splat () -> (v128)
		; SIMD-VM-NEXT: i64.const $push0=, 8589934593
		; SIMD-VM-NEXT: i64x2.splat $push1=, $pop0
		; SIMD-VM-NEXT: return $pop1
		; UNIMP: v128.const
		define <4 x i32> @emulated_const_trivial_splat() {
		ret <4 x i32> <i32 1, i32 2, i32 1, i32 2>
		}

		; CHECK-LABEL: emulated_const_first_sufficient:
		; CHECK-NEXT: .functype emulated_const_first_sufficient () -> (v128)
		; SIMD-VM-NEXT: i64.const $push0=, 8589934593
		; SIMD-VM-NEXT: i64x2.splat $push1=, $pop0
		; SIMD-VM-NEXT: return $pop1
		; UNIMP: v128.const
		define <4 x i32> @emulated_const_first_sufficient() {
		ret <4 x i32> <i32 1, i32 2, i32 undef, i32 2>
		}

		; CHECK-LABEL: emulated_const_second_sufficient:
		; CHECK-NEXT: .functype emulated_const_second_sufficient () -> (v128)
		; SIMD-VM-NEXT: i64.const $push0=, 8589934593
		; SIMD-VM-NEXT: i64x2.splat $push1=, $pop0
		; SIMD-VM-NEXT: return $pop1
		; UNIMP: v128.const
		define <4 x i32> @emulated_const_second_sufficient() {
		ret <4 x i32> <i32 1, i32 undef, i32 1, i32 2>
		}

		; CHECK-LABEL: emulated_const_combined_sufficient:
		; CHECK-NEXT: .functype emulated_const_combined_sufficient () -> (v128)
		; SIMD-VM-NEXT: i64.const $push0=, 8589934593
		; SIMD-VM-NEXT: i64x2.splat $push1=, $pop0
		; SIMD-VM-NEXT: return $pop1
		; UNIMP: v128.const
		define <4 x i32> @emulated_const_combined_sufficient() {
		ret <4 x i32> <i32 1, i32 undef, i32 undef, i32 2>
		}

		; CHECK-LABEL: emulated_const_either_sufficient:
		; CHECK-NEXT: .functype emulated_const_either_sufficient () -> (v128)
		; SIMD-VM-NEXT: i64.const $push0=, 1
		; SIMD-VM-NEXT: i64x2.splat $push1=, $pop0
		; SIMD-VM-NEXT: return $pop1
		; UNIMP: v128.const
		define <4 x i32> @emulated_const_either_sufficient() {
		ret <4 x i32> <i32 1, i32 undef, i32 1, i32 undef>
		}

		; CHECK-LABEL: emulated_const_neither_sufficient:
		; CHECK-NEXT: .functype emulated_const_neither_sufficient () -> (v128)
		; SIMD-VM-NEXT: i64.const $push0=, 8589934593
		; SIMD-VM-NEXT: i64x2.splat $push1=, $pop0
		; SIMD-VM-NEXT: i64.const $push2=, 17179869184
		; SIMD-VM-NEXT: i64x2.replace_lane $push3=, $pop1, 1, $pop2
		; SIMD-VM-NEXT: return $pop3
		define <4 x i32> @emulated_const_neither_sufficient() {
		ret <4 x i32> <i32 1, i32 2, i32 undef, i32 4>
		}
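
The expected constants in the six `emulated_const_*` tests can be reproduced by hand: 8589934593 is 2·2^32 + 1 (lanes `<1, 2>`), and 17179869184 is 4·2^32 (lanes `<undef, 4>` with the undef lane left as zero). The sketch below models the selection in plain C++. Only the replace_lane condition is visible in the diff excerpt here, so the definitions of the three predicates and the choice of splat value are assumptions chosen to be consistent with these tests. `EmulatedConst` and `emulateV4I32` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

// Illustrative model only; not the code from the patch.
struct EmulatedConst {
  uint64_t Splat;                       // operand of i64x2.splat
  std::optional<uint64_t> ReplaceLane1; // operand of i64x2.replace_lane 1
};

// Lanes are four i32 values; std::nullopt models an undef lane.
inline EmulatedConst
emulateV4I32(const std::array<std::optional<uint32_t>, 4> &Lanes) {
  std::array<uint64_t, 2> I64s{{0, 0}};
  std::array<uint64_t, 2> Masks{{0, 0}};
  for (std::size_t I = 0; I < Lanes.size(); ++I) {
    if (!Lanes[I].has_value())
      continue; // undef lanes contribute neither value bits nor mask bits
    const unsigned Shift = static_cast<unsigned>(I % 2) * 32;
    I64s[I / 2] |= static_cast<uint64_t>(*Lanes[I]) << Shift;
    Masks[I / 2] |= uint64_t{0xFFFFFFFF} << Shift;
  }
  assert((I64s[0] & ~Masks[0]) == 0 && (I64s[1] & ~Masks[1]) == 0);

  // Assumed predicates: one half suffices if it agrees with every constant
  // lane of the other half; the halves combine if no lane position is
  // constant in both.
  const bool FirstHalfSufficient = (I64s[0] & Masks[1]) == I64s[1];
  const bool SecondHalfSufficient = (I64s[1] & Masks[0]) == I64s[0];
  const bool CombinedSufficient = (Masks[0] & Masks[1]) == 0;

  uint64_t Splat = I64s[0];
  if (SecondHalfSufficient)
    Splat = I64s[1];
  else if (CombinedSufficient)
    Splat = I64s[0] | I64s[1];

  EmulatedConst Out{Splat, std::nullopt};
  if (!FirstHalfSufficient && !SecondHalfSufficient && !CombinedSufficient)
    Out.ReplaceLane1 = I64s[1]; // the single replace_lane fallback
  return Out;
}
```

Under these assumptions, `<1, 2, undef, 4>` is the only one of the six inputs that needs the extra `i64x2.replace_lane`, as in `emulated_const_neither_sufficient`.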

; CHECK-LABEL: same_const_one_replaced_i16x8:		; CHECK-LABEL: same_const_one_replaced_i16x8:
; CHECK-NEXT: .functype same_const_one_replaced_i16x8 (i32) -> (v128)		; CHECK-NEXT: .functype same_const_one_replaced_i16x8 (i32) -> (v128)
; UNIMP-NEXT: v128.const $push[[L0:[0-9]+]]=, 42, 42, 42, 42, 42, 0, 42, 42		; UNIMP-NEXT: v128.const $push[[L0:[0-9]+]]=, 42, 42, 42, 42, 42, 0, 42, 42
; UNIMP-NEXT: i16x8.replace_lane $push[[L1:[0-9]+]]=, $pop[[L0]], 5, $0		; UNIMP-NEXT: i16x8.replace_lane $push[[L1:[0-9]+]]=, $pop[[L0]], 5, $0
; UNIMP-NEXT: return $pop[[L1]]		; UNIMP-NEXT: return $pop[[L1]]
; SIMD-VM: i16x8.splat		; SIMD-VM: i64x2.splat
define <8 x i16> @same_const_one_replaced_i16x8(i16 %x) {		define <8 x i16> @same_const_one_replaced_i16x8(i16 %x) {
%v = insertelement		%v = insertelement
<8 x i16> <i16 42, i16 42, i16 42, i16 42, i16 42, i16 42, i16 42, i16 42>,		<8 x i16> <i16 42, i16 42, i16 42, i16 42, i16 42, i16 42, i16 42, i16 42>,
i16 %x,		i16 %x,
i32 5		i32 5
ret <8 x i16> %v		ret <8 x i16> %v
}		}

; CHECK-LABEL: different_const_one_replaced_i16x8:		; CHECK-LABEL: different_const_one_replaced_i16x8:
; CHECK-NEXT: .functype different_const_one_replaced_i16x8 (i32) -> (v128)		; CHECK-NEXT: .functype different_const_one_replaced_i16x8 (i32) -> (v128)
; UNIMP-NEXT: v128.const $push[[L0:[0-9]+]]=, 1, -2, 3, -4, 5, 0, 7, -8		; UNIMP-NEXT: v128.const $push[[L0:[0-9]+]]=, 1, -2, 3, -4, 5, 0, 7, -8
; UNIMP-NEXT: i16x8.replace_lane $push[[L1:[0-9]+]]=, $pop[[L0]], 5, $0		; UNIMP-NEXT: i16x8.replace_lane $push[[L1:[0-9]+]]=, $pop[[L0]], 5, $0
; UNIMP-NEXT: return $pop[[L1]]		; UNIMP-NEXT: return $pop[[L1]]
; SIMD-VM: i16x8.splat		; SIMD-VM: i64x2.splat
define <8 x i16> @different_const_one_replaced_i16x8(i16 %x) {		define <8 x i16> @different_const_one_replaced_i16x8(i16 %x) {
%v = insertelement		%v = insertelement
<8 x i16> <i16 1, i16 -2, i16 3, i16 -4, i16 5, i16 -6, i16 7, i16 -8>,		<8 x i16> <i16 1, i16 -2, i16 3, i16 -4, i16 5, i16 -6, i16 7, i16 -8>,
i16 %x,		i16 %x,
i32 5		i32 5
ret <8 x i16> %v		ret <8 x i16> %v
}		}

		%v = insertelement
i32 2		i32 2
ret <4 x float> %v		ret <4 x float> %v
}		}

; CHECK-LABEL: splat_common_const_i32x4:		; CHECK-LABEL: splat_common_const_i32x4:
; CHECK-NEXT: .functype splat_common_const_i32x4 () -> (v128)		; CHECK-NEXT: .functype splat_common_const_i32x4 () -> (v128)
; UNIMP-NEXT: v128.const $push[[L0:[0-9]+]]=, 0, 3, 3, 1		; UNIMP-NEXT: v128.const $push[[L0:[0-9]+]]=, 0, 3, 3, 1
; UNIMP-NEXT: return $pop[[L0]]		; UNIMP-NEXT: return $pop[[L0]]
; SIMD-VM: i32x4.splat		; SIMD-VM: i64x2.splat
define <4 x i32> @splat_common_const_i32x4() {		define <4 x i32> @splat_common_const_i32x4() {
ret <4 x i32> <i32 undef, i32 3, i32 3, i32 1>		ret <4 x i32> <i32 undef, i32 3, i32 3, i32 1>
}		}

; CHECK-LABEL: splat_common_arg_i16x8:		; CHECK-LABEL: splat_common_arg_i16x8:
; CHECK-NEXT: .functype splat_common_arg_i16x8 (i32, i32, i32) -> (v128)		; CHECK-NEXT: .functype splat_common_arg_i16x8 (i32, i32, i32) -> (v128)
; CHECK-NEXT: i16x8.splat $push[[L0:[0-9]+]]=, $2		; CHECK-NEXT: i16x8.splat $push[[L0:[0-9]+]]=, $2
; CHECK-NEXT: i16x8.replace_lane $push[[L1:[0-9]+]]=, $pop[[L0]], 0, $1		; CHECK-NEXT: i16x8.replace_lane $push[[L1:[0-9]+]]=, $pop[[L0]], 0, $1

; CHECK-LABEL: mashup_const_i8x16:		; CHECK-LABEL: mashup_const_i8x16:
; CHECK-NEXT: .functype mashup_const_i8x16 (v128, v128, i32) -> (v128)		; CHECK-NEXT: .functype mashup_const_i8x16 (v128, v128, i32) -> (v128)
; UNIMP: v128.const $push[[L0:[0-9]+]]=, 0, 0, 0, 0, 42, 0, 0, 0, 0, 0, 0, 0, 0, 0, 42, 0		; UNIMP: v128.const $push[[L0:[0-9]+]]=, 0, 0, 0, 0, 42, 0, 0, 0, 0, 0, 0, 0, 0, 0, 42, 0
; UNIMP: i8x16.replace_lane		; UNIMP: i8x16.replace_lane
; UNIMP: i8x16.replace_lane		; UNIMP: i8x16.replace_lane
; UNIMP: i8x16.replace_lane		; UNIMP: i8x16.replace_lane
; UNIMP: return		; UNIMP: return
; SIMD-VM: i8x16.splat		; SIMD-VM: i64x2.splat
define <16 x i8> @mashup_const_i8x16(<16 x i8> %src, <16 x i8> %mask, i8 %splatted) {		define <16 x i8> @mashup_const_i8x16(<16 x i8> %src, <16 x i8> %mask, i8 %splatted) {
; swizzle 0		; swizzle 0
%m0 = extractelement <16 x i8> %mask, i32 0		%m0 = extractelement <16 x i8> %mask, i32 0
%s0 = extractelement <16 x i8> %src, i8 %m0		%s0 = extractelement <16 x i8> %src, i8 %m0
%v0 = insertelement <16 x i8> undef, i8 %s0, i32 0		%v0 = insertelement <16 x i8> undef, i8 %s0, i32 0
; splat 3		; splat 3
%v1 = insertelement <16 x i8> %v0, i8 %splatted, i32 3		%v1 = insertelement <16 x i8> %v0, i8 %splatted, i32 3
; splat 12		; splat 12

Revision Contents

Diff 295742

llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp

llvm/test/CodeGen/WebAssembly/simd-build-vector.ll
