This is an archive of the discontinued LLVM Phabricator instance.

Differential D88773

Reland "[WebAssembly] Emulate v128.const efficiently""
ClosedPublic

Authored by tlively on Oct 2 2020, 9:03 PM.

Details

Reviewers

aheejin
efriedma
dweber
hubert.reinterpretcast

Commits

rG72c628e83580: Reland "[WebAssembly] Emulate v128.const efficiently""

Summary

This reverts commit 432e4e56d3d2, which reverted 542523a61a21. Two issues from
the original commit have been fixed. First, MSVC does not like when std::array
is initialized from a braced init list contained within a parenthesized
expression list, so this commit switches to using the more portable double
braces. Second, there was a subtle endianness bug that prevented the original
commit from working correctly on big-endian machines, which has been fixed by
switching to using endianness-agnostic bit twiddling instead of type punning.
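
As an illustration of the second fix, here is a minimal sketch (not the code from this patch) of why shift-based packing does not depend on the byte order of the machine running the compiler. `packLanes32` is a hypothetical helper introduced only for this example; it also uses the double-brace `std::array` initialization mentioned above.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not from the patch): pack four 32-bit lanes into two
// 64-bit halves using only shifts and ORs. Lane 0 lands in the low bits of
// half 0, which is the little-endian lane order WebAssembly expects, no
// matter how the host stores integers in memory.
std::array<uint64_t, 2> packLanes32(const std::array<uint32_t, 4> &Lanes) {
  std::array<uint64_t, 2> Halves{{0, 0}}; // double braces, no parentheses
  const size_t HalfLanes = Lanes.size() / 2;
  assert(HalfLanes == 2 && "sketch assumes exactly four lanes");
  for (size_t I = 0; I < Lanes.size(); ++I) {
    // How far to shift this lane within its 64-bit half.
    const size_t Shift = 32 * (I % HalfLanes);
    Halves[I / HalfLanes] |= static_cast<uint64_t>(Lanes[I]) << Shift;
  }
  return Halves;
}
```

Because the result is computed from values rather than from bytes read back out of memory, `packLanes32({{1, 2, 3, 4}})` yields `{0x0000000200000001, 0x0000000400000003}` on little-endian and big-endian hosts alike.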

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

tlively created this revision. Oct 2 2020, 9:03 PM

Herald added a project: Restricted Project. Oct 2 2020, 9:03 PM

Herald added subscribers: llvm-commits, ecnelises, sunfish and 4 others.

tlively requested review of this revision. Oct 2 2020, 9:03 PM

Thanks all for your help diagnosing and fixing these issues! I decided to go with the bit twiddling solution rather than the corrected type punning solution because it seems simpler overall to not have to think about endianness.

llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
1589–1602	This is the only part that has changed from the previous revision.

hubert.reinterpretcast added inline comments. Oct 2 2020, 9:08 PM

llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
1589–1590	MSVC doesn't work with the parens even with the extra braces.

dweber added inline comments. Oct 2 2020, 9:16 PM

llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
1589–1590	Right. It shouldn't have parens.

Harbormaster completed remote builds in B73864: Diff 295957. Oct 2 2020, 9:20 PM

hubert.reinterpretcast added inline comments. Oct 2 2020, 9:25 PM

llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
1600	Can `LaneBits` be 64?

hubert.reinterpretcast added inline comments. Oct 2 2020, 9:39 PM

llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
33	We can remove this now, I think.

Make initializations even more MSVC-friendly

tlively marked 2 inline comments as done and an inline comment as not done. Oct 2 2020, 9:50 PM

tlively added inline comments.

llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
1600	Yes, if the vector is already a v2i64.

Harbormaster completed remote builds in B73866: Diff 295959. Oct 2 2020, 10:01 PM

I can confirm that at least llvm/test/CodeGen/WebAssembly/simd-build-vector.ll is good on AIX (big-endian Power) with this patch (at least with the maskTrailingOnes<uint64_t>(LaneBits) change).

llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp

1600

Okay, I suggest using maskTrailingOnes<uint64_t>(LaneBits) to avoid undefined behaviour:

diff --git a/llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp b/llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
index ec62f2a..b2913b6 100644
--- a/llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
+++ b/llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
@@ -30,8 +30,8 @@
 #include "llvm/IR/Intrinsics.h"
 #include "llvm/IR/IntrinsicsWebAssembly.h"
 #include "llvm/Support/Debug.h"
-#include "llvm/Support/Endian.h"
 #include "llvm/Support/ErrorHandling.h"
+#include "llvm/Support/MathExtras.h"
 #include "llvm/Support/raw_ostream.h"
 #include "llvm/Target/TargetOptions.h"
 using namespace llvm;
@@ -1597,7 +1597,8 @@ SDValue WebAssemblyTargetLowering::LowerBUILD_VECTOR(SDValue Op,
         auto Shift = LaneBits * (I % HalfLanes);
         auto Val = cast<ConstantSDNode>(Lane.getNode())->getZExtValue();
         I64s[I / HalfLanes] |= Val << Shift;
-        ConstLaneMasks[I / HalfLanes] |= ((1ULL << LaneBits) - 1) << Shift;
+        ConstLaneMasks[I / HalfLanes] |= maskTrailingOnes<uint64_t>(LaneBits)
+                                         << Shift;
       }
     }
     // Check whether all constant lanes in the second half of the vector are

In D88773#2309949, @tlively wrote:

Make initializations even more MSVC-friendly

The patch description should be updated to reflect what the issue with MSVC actually was.

dweber added inline comments. Oct 3 2020, 12:15 PM

llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
1600	@tlively I think you need to add test cases for each integer type with all of the variants of 0xdeadbeef. This logic has so many branches, it would have been easier to read if it merely used if statements to compare integer types for the appropriate shifts.

dweber added inline comments. Oct 3 2020, 12:32 PM

llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
1584–1633	@tlively this is a real edge case, but I think you need to verify that the integer type is a representable native integer type. If APInt causes VecT.isInteger() to return true, you'll have a colossal headache with supporting integers that are 27 bits.

Fix for the integer check. Proposal for a better, documented alternative solution.

llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
1584–1633
1589–1602

Side note: I filed a bug request on the operator= behavior for ulittle64_t. It has an explicit constructor that takes in any value in native endianness, but operator= is defined to convert endianness. My guess is because the operator= doesn't return ulittle64_t&, it can't construct properly through operator=. Issue is here: https://bugs.llvm.org/show_bug.cgi?id=47719

Thanks for your continued work on this @dweber! I still think this solution is simpler than the type punning solution, though, because it saves readers from having to go figure out what ulittle64_t is and how it works.

llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
1584–1633	This operation lowering occurs only after type legalization, so we know that the only machine value types we need to consider are those that WebAssembly supports. I also don't think this suggested change changes the semantics. Am I missing something?
1589–1590	Thanks!
1600	Is there UB here? `1ULL << 64` is well defined to be 0ULL [1], and 0ULL - 1 is well defined to be the max int for that type [2]. The reason I'm pushing back here is that I want to save readers of this code from having to go look up what exactly `maskTrailingOnes` does. [1] https://en.cppreference.com/w/cpp/language/operator_arithmetic#Bitwise_shift_operators:~:text=For%20unsigned%20a%2C%20the%20value%20of%20a,of%20the%20destination%20type%20are%20discarded). [2] https://en.cppreference.com/w/cpp/language/operator_arithmetic#Overflows:~:text=Unsigned%20integer%20arithmetic%20is%20always%20performed%20modulo%202n

In D88773#2309958, @hubert.reinterpretcast wrote:

In D88773#2309949, @tlively wrote:

Make initializations even more MSVC-friendly

The patch description should be updated to reflect what the issue with MSVC actually was.

I think the description still applies. Were you thinking of mentioning the specific error message? "C2100: illegal indirection" doesn't seem to give any more information than is already in the description.

efriedma added inline comments. Oct 5 2020, 6:11 PM

llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
1600	I think you missed the part where it says "if the value of the right operand is negative or is greater or equal to the number of bits in the promoted left operand, the behavior is undefined".
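
To make the point about the shift concrete: with a 64-bit left operand, `(1ULL << LaneBits) - 1` has undefined behaviour exactly when `LaneBits` is 64, which is the v2i64 case discussed above. The following is a minimal sketch of a mask helper that avoids the problem. `lowBitsMask` is a hypothetical name; it illustrates the idea behind `maskTrailingOnes<uint64_t>(N)` and is not claimed to be LLVM's implementation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: return a value with the low N bits set, for N in
// [0, 64], without ever shifting a 64-bit value by 64 or more.
uint64_t lowBitsMask(unsigned N) {
  assert(N <= 64 && "mask cannot be wider than the type");
  if (N == 0)
    return 0;
  // The shift amount 64 - N lies in [0, 63], so the shift is well defined.
  return ~uint64_t{0} >> (64 - N);
}
```

Shifting an all-ones value right keeps every shift amount below the bit width, so no special case is needed for full-width lanes.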

In D88773#2312611, @tlively wrote:

I think the description still applies. Were you thinking of mentioning the specific error message? "C2100: illegal indirection" doesn't seem to give any more information than is already in the description.

I was referring to the following.

Quote:

First, MSVC does not like when std::array is initialized with only single braces [ ... ]

MSVC does not like when std::array is initialized from a braced init list contained within a parenthesized expression list.

tlively edited the summary of this revision. Oct 9 2020, 2:38 PM

Address review comments

tlively added inline comments. Oct 9 2020, 5:37 PM

llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
1600	Ugh, yep. Thank you! Switched to using `maskTrailingOnes`.

Harbormaster completed remote builds in B74668: Diff 297362. Oct 9 2020, 6:06 PM

Thanks. This matches what I tested with a big endian host system. I'm not sure if @dweber agrees that their comments have been addressed though.

In D88773#2323431, @hubert.reinterpretcast wrote:

Thanks. This matches what I tested with a big endian host system. I'm not sure if @dweber agrees that their comments have been addressed though.

I'm pretty okay with this change. It took me a minute to wrap my head around because it wasn't immediately obvious how it was placing the integers inside the 64s, but overall it's fine. The only thing I think is missing is a safety check on the bitwise or operation inside the loop. Before it implicitly extracted the least significant bits by byte size meaning there was no room for a value to exceed range (e.g. bits set above the shift before the or). I spoke with @tlively offline about this, and he said he would add an assertion to make sure the value is in range. With that change, this has my blessing.

Add masking of lane value and new tests

@dweber, you were totally right that this needed to be masked. In particular, when the lane contains a negative number, getZExtValue returns a very large 64-bit constant that needs to be truncated. Thanks for pressing me on that!
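
A small sketch of the problem described here, using plain integers rather than LLVM types. `insertLane8` is a hypothetical helper, and `Raw` stands in for a lane constant that arrives with bits set above the lane width, as described above for negative lane values.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: OR an 8-bit lane value into byte position `Lane`
// (0..7) of a 64-bit word. With MaskFirst == false the raw value is used
// as-is, which models the bug; with MaskFirst == true it is truncated to
// the lane width first, which models the fix.
uint64_t insertLane8(uint64_t Word, unsigned Lane, uint64_t Raw,
                     bool MaskFirst) {
  assert(Lane < 8 && "only eight 8-bit lanes fit in 64 bits");
  const uint64_t Mask = 0xFF;
  const uint64_t Val = MaskFirst ? (Raw & Mask) : Raw;
  return Word | (Val << (8 * Lane));
}
```

With `Raw = 0xFFFFFFFFFFFFFFFF` in lane 0, the unmasked version sets every bit of the word and so overwrites all neighbouring lanes, while the masked version sets only the low byte.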

In D88773#2326620, @tlively wrote:

Add masking of lane value and new tests

@dweber, you were totally right that this needed to be masked. In particular, when the lane contains a negative number, getZExtValue returns a very large 64-bit constant that needs to be truncated. Thanks for pressing me on that!

Yeah... I had an eerie feeling something like that could happen -- I just couldn't put my finger on it. Really. Thanks go to you for taking care of this.

Do you know if it would still be the case if you were using getLimitedValue instead of getZExtValue? I want to say yes, but I'm not entirely sure how these two work.

In D88773#2326695, @dweber wrote:

Do you know if it would still be the case if you were using getLimitedValue instead of getZExtValue? I want to say yes, but I'm not entirely sure how these two work.

Yes, getLimitedValue bottoms out with this line of code: return ugt(Limit) ? Limit : getZExtValue(); so they are identical in this situation.

In D88773#2326696, @tlively wrote:

In D88773#2326695, @dweber wrote:

Do you know if it would still be the case if you were using getLimitedValue instead of getZExtValue? I want to say yes, but I'm not entirely sure how these two work.

Yes, getLimitedValue bottoms out with this line of code: return ugt(Limit) ? Limit : getZExtValue(); so they are identical in this situation.

That answers all of my questions... now to figure out where I press the checkbox to approve.

dweber accepted this revision. Oct 12 2020, 9:33 PM

This revision is now accepted and ready to land. Oct 12 2020, 9:33 PM

This revision was landed with ongoing or failed builds. Oct 12 2020, 9:37 PM

Closed by commit rG72c628e83580: Reland "[WebAssembly] Emulate v128.const efficiently"" (authored by tlively).

This revision was automatically updated to reflect the committed changes.

tlively added a commit: rG72c628e83580: Reland "[WebAssembly] Emulate v128.const efficiently"".

Harbormaster completed remote builds in B74888: Diff 297752. Oct 12 2020, 10:21 PM

Revision Contents

Path	Size
llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp	61 lines
llvm/test/CodeGen/WebAssembly/simd-build-vector.ll	91 lines

Diff 297760

llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp

[24 unchanged lines not shown]

  #include "llvm/CodeGen/SelectionDAG.h"
  #include "llvm/CodeGen/WasmEHFuncInfo.h"
  #include "llvm/IR/DiagnosticInfo.h"
  #include "llvm/IR/DiagnosticPrinter.h"
  #include "llvm/IR/Function.h"
  #include "llvm/IR/Intrinsics.h"
  #include "llvm/IR/IntrinsicsWebAssembly.h"
  #include "llvm/Support/Debug.h"
  #include "llvm/Support/ErrorHandling.h"

hubert.reinterpretcast (Not Done): We can remove this now, I think.

+ #include "llvm/Support/MathExtras.h"
  #include "llvm/Support/raw_ostream.h"
  #include "llvm/Target/TargetOptions.h"
  using namespace llvm;
  #define DEBUG_TYPE "wasm-lower"
  WebAssemblyTargetLowering::WebAssemblyTargetLowering(
      const TargetMachine &TM, const WebAssemblySubtarget &STI)

[1,518 unchanged lines not shown; hunk context: if (NumSwizzleLanes >= NumSplatLanes &&]

      Result = DAG.getNode(WebAssemblyISD::SWIZZLE, DL, VecT, SwizzleSrc,
                           SwizzleIndices);
      auto Swizzled = std::make_pair(SwizzleSrc, SwizzleIndices);
      IsLaneConstructed = [&, Swizzled](size_t I, const SDValue &Lane) {
        return Swizzled == GetSwizzleSrcs(I, Lane);
      };
    } else if (NumConstantLanes >= NumSplatLanes &&
               Subtarget->hasUnimplementedSIMD128()) {
+     // If we support v128.const, emit it directly
      SmallVector<SDValue, 16> ConstLanes;
      for (const SDValue &Lane : Op->op_values()) {
        if (IsConstant(Lane)) {
          ConstLanes.push_back(Lane);
        } else if (LaneT.isFloatingPoint()) {
          ConstLanes.push_back(DAG.getConstantFP(0, DL, LaneT));
        } else {
          ConstLanes.push_back(DAG.getConstant(0, DL, LaneT));
        }
      }
      Result = DAG.getBuildVector(VecT, DL, ConstLanes);
-     IsLaneConstructed = [&](size_t _, const SDValue &Lane) {
+     IsLaneConstructed = [&IsConstant](size_t _, const SDValue &Lane) {
        return IsConstant(Lane);
      };

+   } else if (NumConstantLanes >= NumSplatLanes && VecT.isInteger()) {
+     // Otherwise, if this is an integer vector, pack the lane values together so
+     // we can construct the 128-bit constant from a pair of i64s using a splat
+     // followed by at most one i64x2.replace_lane. Also keep track of the lanes
+     // that actually matter so we can avoid the replace_lane in more cases.
+     std::array<uint64_t, 2> I64s{{0, 0}};
+     std::array<uint64_t, 2> ConstLaneMasks{{0, 0}};

hubert.reinterpretcast (Done):

// that actually matter so we can avoid the replace_lane in more cases.
- std::array<uint64_t, 2> I64s({{0, 0}});
- std::array<uint64_t, 2> ConstLaneMasks({{0, 0}});
+ std::array<uint64_t, 2> I64s{{0, 0}};
+ std::array<uint64_t, 2> ConstLaneMasks{{0, 0}};
size_t LaneBits = 128 / Lanes;

MSVC doesn't work with the parens even with the extra braces.

dweber (Done): Right. It shouldn't have parens.

tlively (Author, Done): Thanks!

+     size_t LaneBits = 128 / Lanes;
+     size_t HalfLanes = Lanes / 2;
+     for (size_t I = 0; I < Lanes; ++I) {
+       const SDValue &Lane = Op.getOperand(I);
+       if (IsConstant(Lane)) {
+         // How much we need to shift Val to position it in an i64
+         auto Shift = LaneBits * (I % HalfLanes);
+         auto Mask = maskTrailingOnes<uint64_t>(LaneBits);
+         auto Val = cast<ConstantSDNode>(Lane.getNode())->getZExtValue() & Mask;
+         I64s[I / HalfLanes] |= Val << Shift;

hubert.reinterpretcast (Not Done): Can `LaneBits` be 64?

tlively (Author, Not Done): Yes, if the vector is already a v2i64.

hubert.reinterpretcast (Not Done): Okay, I suggest using maskTrailingOnes<uint64_t>(LaneBits) to avoid undefined behaviour:

diff --git a/llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp b/llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
index ec62f2a..b2913b6 100644
--- a/llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
+++ b/llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
@@ -30,8 +30,8 @@
 #include "llvm/IR/Intrinsics.h"
 #include "llvm/IR/IntrinsicsWebAssembly.h"
 #include "llvm/Support/Debug.h"
-#include "llvm/Support/Endian.h"
 #include "llvm/Support/ErrorHandling.h"
+#include "llvm/Support/MathExtras.h"
 #include "llvm/Support/raw_ostream.h"
 #include "llvm/Target/TargetOptions.h"
 using namespace llvm;
@@ -1597,7 +1597,8 @@ SDValue WebAssemblyTargetLowering::LowerBUILD_VECTOR(SDValue Op,
         auto Shift = LaneBits * (I % HalfLanes);
         auto Val = cast<ConstantSDNode>(Lane.getNode())->getZExtValue();
         I64s[I / HalfLanes] |= Val << Shift;
-        ConstLaneMasks[I / HalfLanes] |= ((1ULL << LaneBits) - 1) << Shift;
+        ConstLaneMasks[I / HalfLanes] |= maskTrailingOnes<uint64_t>(LaneBits)
+                                         << Shift;
       }
     }
     // Check whether all constant lanes in the second half of the vector are

tlively (Author, Done): Is there UB here? `1ULL << 64` is well defined to be 0ULL [1], and 0ULL - 1 is well defined to be the max int for that type [2]. The reason I'm pushing back here is that I want to save readers of this code from having to go look up what exactly `maskTrailingOnes` does.

[1] https://en.cppreference.com/w/cpp/language/operator_arithmetic#Bitwise_shift_operators:~:text=For%20unsigned%20a%2C%20the%20value%20of%20a,of%20the%20destination%20type%20are%20discarded).

[2] https://en.cppreference.com/w/cpp/language/operator_arithmetic#Overflows:~:text=Unsigned%20integer%20arithmetic%20is%20always%20performed%20modulo%202n

efriedma (Not Done): I think you missed the part where it says "if the value of the right operand is negative or is greater or equal to the number of bits in the promoted left operand, the behavior is undefined".

tlively (Author, Done): Ugh, yep. Thank you! Switched to using `maskTrailingOnes`.

dweber (Not Done): @tlively I think you need to add test cases for each integer type with all of the variants of 0xdeadbeef. This logic has so many branches, it would have been easier to read if it merely used if statements to compare integer types for the appropriate shifts.

+         ConstLaneMasks[I / HalfLanes] |= Mask << Shift;
+       }

tlively (Author, Done): This is the only part that has changed from the previous revision.

dweber (Not Done):

// that actually matter so we can avoid the replace_lane in more cases.
- std::array<uint64_t, 2> I64s{{0, 0}};
- std::array<uint64_t, 2> ConstLaneMasks{{0, 0}};
- size_t LaneBits = 128 / Lanes;
- size_t HalfLanes = Lanes / 2;
- for (size_t I = 0; I < Lanes; ++I) {
- const SDValue &Lane = Op.getOperand(I);
+ // Otherwise, if this is an integer vector, pack the lane values together so
+ // we can construct the 128-bit constant from a pair of i64s using a splat
+ // followed by at most one i64x2.replace_lane. Also keep track of the lanes
+ // that actually matter so we can avoid the replace_lane in more cases.
+ using llvm::support::ulittle64_t;
+ // What ulittle64_t does is guarantee that on big endian machines
+ // operator uint64_t() returns in native byte order.
+ // Essentially this makes it so that the two calls to DAG.getConstant below
+ // produce the proper output value on big endian systems.
+ std::array<ulittle64_t, 2> I64s{{ulittle64_t(0), ulittle64_t(0)}};
+ std::array<ulittle64_t, 2> ConstLaneMasks{{ulittle64_t(0), ulittle64_t(0)}};
+ uint8_t *I64Bytes = reinterpret_cast<uint8_t *>(I64s.data());
+ uint8_t *MaskBytes = reinterpret_cast<uint8_t *>(ConstLaneMasks.data());
+ unsigned I = 0;
+ size_t IntegerWidthInBytes = VecT.getScalarSizeInBits() / 8;
+ for (const SDValue &Lane : Op->op_values()) {
if (IsConstant(Lane)) {
- // How much we need to shift Val to position it in an i64
- auto Shift = LaneBits * (I % HalfLanes);
- auto Val = cast<ConstantSDNode>(Lane.getNode())->getZExtValue();
- I64s[I / HalfLanes] |= Val << Shift;
- ConstLaneMasks[I / HalfLanes] |= ((1ULL << LaneBits) - 1) << Shift;
- }
- // Check whether all constant lanes in the second half of the vector are
+ // The endianness of the compiler matters here. We want to enforce
+ // little endianness so that the bytes of a smaller integer type will
+ // occur first in the uint64_t.
+ auto *Const = cast<ConstantSDNode>(Lane.getNode());
+ // This causes Val to be converted to little endian implicitly if it is not the native byte order.
+ // So if value is a constant 6 in big endian byte order
+ // it goes from a byte array of:
+ // {0,0,0,0,0,0,0,6} to:
+ // {6,0,0,0,0,0,0,0}
+ ulittle64_t Val;
+ // This is really goofy, but it appears that operator= is missing an operator
+ // or friend operator declaration.
+ Val.operator=(Const->getLimitedValue());
+ // Since we're now guaranteed to be in little endian, we can assume that the lowest bits
+ // and bytes of the integer are at the front of the 16 byte ulittle64_t array.
+ uint8_t *ValPtr = reinterpret_cast<uint8_t *>(&Val);
+ // Now that the significant bits of the integer are packed at the front of the byte array,
+ // this will copy each value to the proper position 16 byte array.
+ // As an example, if we have 4x32bit value vector, and the first value (Val) is a constant 6
+ // and last value is a constant 7, this will yield 16 byte array in little endian byte order:
+ // which is really:
+ // {6,0,0,0,0,0,0,0
+ // 0,0,0,0,0,0,0,0
+ // 7,0,0,0,0,0,0,0}
+ // Note that the 2 middle lanes are skipped (left as zero) because there is no constant
+ // value in that spot.
+ std::copy(ValPtr, ValPtr + IntegerWidthInBytes, I64Bytes + I * IntegerWidthInBytes);
+ // This will produce a full mask regardless of byte order.
+ uint64_t Mask = std::numeric_limits<uint64_t>::max();
+ uint8_t *MaskPtr = reinterpret_cast<uint8_t *>(&Mask);
+ std::copy(MaskPtr, MaskPtr + IntegerWidthInBytes, MaskBytes + I * IntegerWidthInBytes);
+ }
+ ++I;
+ } // Check whether all constant lanes in the second half of the vector are

+     }
+     // Check whether all constant lanes in the second half of the vector are
+     // equivalent in the first half or vice versa to determine whether splatting
+     // either side will be sufficient to materialize the constant. As a special
+     // case, if the first and second halves have no constant lanes in common, we
+     // can just combine them.
+     bool FirstHalfSufficient = (I64s[0] & ConstLaneMasks[1]) == I64s[1];
+     bool SecondHalfSufficient = (I64s[1] & ConstLaneMasks[0]) == I64s[0];
+     bool CombinedSufficient = (ConstLaneMasks[0] & ConstLaneMasks[1]) == 0;
+     uint64_t Splatted;
+     if (SecondHalfSufficient) {
+       Splatted = I64s[1];
+     } else if (CombinedSufficient) {
+       Splatted = I64s[0] | I64s[1];
+     } else {
+       Splatted = I64s[0];
+     }
+     Result = DAG.getSplatBuildVector(MVT::v2i64, DL,
+                                      DAG.getConstant(Splatted, DL, MVT::i64));
+     if (!FirstHalfSufficient && !SecondHalfSufficient && !CombinedSufficient) {
+       Result = DAG.getNode(ISD::INSERT_VECTOR_ELT, DL, MVT::v2i64, Result,
+                            DAG.getConstant(I64s[1], DL, MVT::i64),
+                            DAG.getConstant(1, DL, MVT::i32));
      }
-   if (!Result) {
+     Result = DAG.getBitcast(VecT, Result);
+     IsLaneConstructed = [&IsConstant](size_t _, const SDValue &Lane) {
+       return IsConstant(Lane);
+     };
+   } else {

dweber (Not Done): @tlively this is a real edge case, but I think you need to verify that the integer type is a representable native integer type. If APInt causes VecT.isInteger() to return true, you'll have a colossal headache with supporting integers that are 27 bits.

tlively (Author, Done): This operation lowering occurs only after type legalization, so we know that the only machine value types we need to consider are those that WebAssembly supports. I also don't think this suggested change changes the semantics. Am I missing something?

dweber (Not Done):

return IsConstant(Lane);
};
- } else if (NumConstantLanes >= NumSplatLanes && VecT.isInteger()) {
+ } else if (NumConstantLanes >= NumSplatLanes && VecT.getVectorElementType().isScalarInteger()) {
// Otherwise, if this is an integer vector, pack the lane values together so

      // Use a splat, but possibly a load_splat
      LoadSDNode *SplattedLoad;
      if ((SplattedLoad = dyn_cast<LoadSDNode>(SplatValue)) &&
          SplattedLoad->getMemoryVT() == VecT.getVectorElementType()) {
        Result = DAG.getMemIntrinsicNode(
            WebAssemblyISD::LOAD_SPLAT, DL, DAG.getVTList(VecT),
            {SplattedLoad->getChain(), SplattedLoad->getBasePtr(),
             SplattedLoad->getOffset()},
            SplattedLoad->getMemoryVT(), SplattedLoad->getMemOperand());
      } else {
        Result = DAG.getSplatBuildVector(VecT, DL, SplatValue);
      }
-     IsLaneConstructed = [&](size_t _, const SDValue &Lane) {
+     IsLaneConstructed = [&SplatValue](size_t _, const SDValue &Lane) {
        return Lane == SplatValue;
      };
    }
+   assert(Result);
+   assert(IsLaneConstructed);
    // Add replace_lane instructions for any unhandled values
    for (size_t I = 0; I < Lanes; ++I) {
      const SDValue &Lane = Op->getOperand(I);
      if (!Lane.isUndef() && !IsLaneConstructed(I, Lane))
        Result = DAG.getNode(ISD::INSERT_VECTOR_ELT, DL, VecT, Result, Lane,
                             DAG.getConstant(I, DL, MVT::i32));
    }

[208 unchanged lines not shown]
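
The half-comparison logic in the new integer branch can be tried out in isolation. The sketch below re-implements it for a <4 x i32> vector using plain integers; `SplatPlan` and `planSplat` are hypothetical names for this example, and `IsConst[I] == false` models an undef lane. It is a simplified model of the logic in the diff, not the lowering code itself.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical result type for this sketch.
struct SplatPlan {
  uint64_t Splatted;     // operand of the i64x2.splat
  bool NeedsReplaceLane; // true if i64 lane 1 must then be overwritten
  uint64_t Lane1;        // value for that replace_lane (the second half)
};

// Pack the constant lanes into two 64-bit halves, record which bits are
// constant, then decide which half (or their union) can be splatted.
SplatPlan planSplat(const std::array<uint32_t, 4> &Vals,
                    const std::array<bool, 4> &IsConst) {
  std::array<uint64_t, 2> I64s{{0, 0}};
  std::array<uint64_t, 2> Masks{{0, 0}};
  for (size_t I = 0; I < 4; ++I) {
    if (!IsConst[I])
      continue;
    const size_t Shift = 32 * (I % 2);
    I64s[I / 2] |= static_cast<uint64_t>(Vals[I]) << Shift;
    Masks[I / 2] |= uint64_t{0xFFFFFFFF} << Shift;
  }
  const bool FirstHalfSufficient = (I64s[0] & Masks[1]) == I64s[1];
  const bool SecondHalfSufficient = (I64s[1] & Masks[0]) == I64s[0];
  const bool CombinedSufficient = (Masks[0] & Masks[1]) == 0;
  uint64_t Splatted = I64s[0];
  if (SecondHalfSufficient)
    Splatted = I64s[1];
  else if (CombinedSufficient)
    Splatted = I64s[0] | I64s[1];
  const bool NeedsReplace =
      !FirstHalfSufficient && !SecondHalfSufficient && !CombinedSufficient;
  assert((!NeedsReplace || Splatted == I64s[0]) &&
         "when a replace is needed, the first half is the splat value");
  return {Splatted, NeedsReplace, I64s[1]};
}
```

For the lanes <1, 2, undef, 2> this gives a splat of 8589934593 (0x200000001) with no replace_lane, and for <1, 2, undef, 4> it gives the same splat followed by a replace of lane 1 with 17179869184, which are the constants the new tests in simd-build-vector.ll check for.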

llvm/test/CodeGen/WebAssembly/simd-build-vector.ll

  ; RUN: llc < %s -asm-verbose=false -verify-machineinstrs -disable-wasm-fallthrough-return-opt -wasm-disable-explicit-locals -wasm-keep-registers -mattr=+unimplemented-simd128 | FileCheck %s --check-prefixes=CHECK,UNIMP
  ; RUN: llc < %s -asm-verbose=false -verify-machineinstrs -disable-wasm-fallthrough-return-opt -wasm-disable-explicit-locals -wasm-keep-registers -mattr=+simd128 | FileCheck %s --check-prefixes=CHECK,SIMD-VM

  ; Test that the logic to choose between v128.const vector
  ; initialization and splat vector initialization and to optimize the
  ; choice of splat value works correctly.

  target datalayout = "e-m:e-p:32:32-i64:64-n32:64-S128"
  target triple = "wasm32-unknown-unknown"

		; CHECK-LABEL: emulated_const_trivial_splat:
		; CHECK-NEXT: .functype emulated_const_trivial_splat () -> (v128)
		; SIMD-VM-NEXT: i64.const $push0=, 8589934593
		; SIMD-VM-NEXT: i64x2.splat $push1=, $pop0
		; SIMD-VM-NEXT: return $pop1
		; UNIMP: v128.const
		define <4 x i32> @emulated_const_trivial_splat() {
		ret <4 x i32> <i32 1, i32 2, i32 1, i32 2>
		}

		; CHECK-LABEL: emulated_const_first_sufficient:
		; CHECK-NEXT: .functype emulated_const_first_sufficient () -> (v128)
		; SIMD-VM-NEXT: i64.const $push0=, 8589934593
		; SIMD-VM-NEXT: i64x2.splat $push1=, $pop0
		; SIMD-VM-NEXT: return $pop1
		; UNIMP: v128.const
		define <4 x i32> @emulated_const_first_sufficient() {
		ret <4 x i32> <i32 1, i32 2, i32 undef, i32 2>
		}

		; CHECK-LABEL: emulated_const_second_sufficient:
		; CHECK-NEXT: .functype emulated_const_second_sufficient () -> (v128)
		; SIMD-VM-NEXT: i64.const $push0=, 8589934593
		; SIMD-VM-NEXT: i64x2.splat $push1=, $pop0
		; SIMD-VM-NEXT: return $pop1
		; UNIMP: v128.const
		define <4 x i32> @emulated_const_second_sufficient() {
		ret <4 x i32> <i32 1, i32 undef, i32 1, i32 2>
		}

		; CHECK-LABEL: emulated_const_combined_sufficient:
		; CHECK-NEXT: .functype emulated_const_combined_sufficient () -> (v128)
		; SIMD-VM-NEXT: i64.const $push0=, 8589934593
		; SIMD-VM-NEXT: i64x2.splat $push1=, $pop0
		; SIMD-VM-NEXT: return $pop1
		; UNIMP: v128.const
		define <4 x i32> @emulated_const_combined_sufficient() {
		ret <4 x i32> <i32 1, i32 undef, i32 undef, i32 2>
		}

		; CHECK-LABEL: emulated_const_either_sufficient:
		; CHECK-NEXT: .functype emulated_const_either_sufficient () -> (v128)
		; SIMD-VM-NEXT: i64.const $push0=, 1
		; SIMD-VM-NEXT: i64x2.splat $push1=, $pop0
		; SIMD-VM-NEXT: return $pop1
		; UNIMP: v128.const
		define <4 x i32> @emulated_const_either_sufficient() {
		ret <4 x i32> <i32 1, i32 undef, i32 1, i32 undef>
		}

		; CHECK-LABEL: emulated_const_neither_sufficient:
		; CHECK-NEXT: .functype emulated_const_neither_sufficient () -> (v128)
		; SIMD-VM-NEXT: i64.const $push0=, 8589934593
		; SIMD-VM-NEXT: i64x2.splat $push1=, $pop0
		; SIMD-VM-NEXT: i64.const $push2=, 17179869184
		; SIMD-VM-NEXT: i64x2.replace_lane $push3=, $pop1, 1, $pop2
		; SIMD-VM-NEXT: return $pop3
		define <4 x i32> @emulated_const_neither_sufficient() {
		ret <4 x i32> <i32 1, i32 2, i32 undef, i32 4>
		}
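
The constants in the `emulated_const_*` checks above follow from packing the defined lanes of each 64-bit half into an `i64`, with lane 0 in the least significant bits. The C++ below is a minimal sketch of that arithmetic using shifts and masks only. It is not the code from WebAssemblyISelLowering.cpp: `Half`, `Packed`, `packLanes`, and `splatValue` are hypothetical names introduced for illustration, and undef lanes are modeled as `std::nullopt`.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical model of one 64-bit half of a v128 constant:
// `value` holds the defined lane bits, `mask` marks which bits are defined.
struct Half {
  uint64_t value = 0;
  uint64_t mask = 0;
};

struct Packed {
  Half lo;  // lanes in the first 64 bits
  Half hi;  // lanes in the second 64 bits
};

// Pack a 128-bit vector's lanes into two i64 halves. `lanes` is assumed to
// hold 128 / laneBits entries; std::nullopt stands for an undef lane.
Packed packLanes(const std::vector<std::optional<uint64_t>>& lanes,
                 unsigned laneBits) {
  Packed p;
  const std::size_t lanesPerHalf = 64 / laneBits;
  const uint64_t laneMask =
      laneBits >= 64 ? ~0ULL : ((1ULL << laneBits) - 1);
  for (std::size_t i = 0; i < lanes.size(); ++i) {
    if (!lanes[i]) continue;  // undef lane: contributes no bits
    Half& h = i < lanesPerHalf ? p.lo : p.hi;
    const unsigned shift =
        static_cast<unsigned>(i % lanesPerHalf) * laneBits;
    h.value |= (*lanes[i] & laneMask) << shift;
    h.mask |= laneMask << shift;
  }
  return p;
}

// If the two halves agree wherever both are defined, a single i64x2.splat
// of the merged value covers the whole vector. Otherwise return nullopt,
// meaning a splat followed by a replace_lane is needed.
std::optional<uint64_t> splatValue(const Packed& p) {
  const uint64_t common = p.lo.mask & p.hi.mask;
  if ((p.lo.value & common) != (p.hi.value & common)) return std::nullopt;
  return p.lo.value | p.hi.value;
}
```

For `<i32 1, i32 undef, i32 undef, i32 2>` the sketch yields the single splat value 8589934593, which is `1 | (2 << 32)`. For `<i32 1, i32 2, i32 undef, i32 4>` the halves conflict in their upper 32 bits, so `splatValue` returns no value and the two halves are 8589934593 and 17179869184, matching the `i64x2.splat` plus `i64x2.replace_lane` sequence in the check lines.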

		; CHECK-LABEL: emulated_const_combined_sufficient_large:
		; CHECK-NEXT: .functype emulated_const_combined_sufficient_large () -> (v128)
		; SIMD-VM-NEXT: i64.const $push0=, 506097522914230528
		; SIMD-VM-NEXT: i64x2.splat $push1=, $pop0
		; SIMD-VM-NEXT: return $pop1
		define <16 x i8> @emulated_const_combined_sufficient_large() {
		ret <16 x i8> <i8 0, i8 undef, i8 2, i8 undef, i8 4, i8 undef, i8 6, i8 undef,
		i8 undef, i8 1, i8 undef, i8 3, i8 undef, i8 5, i8 undef, i8 7>
		}

		; CHECK-LABEL: emulated_const_neither_sufficient_large:
		; CHECK-NEXT: .functype emulated_const_neither_sufficient_large () -> (v128)
		; SIMD-VM-NEXT: i64.const $push0=, -70368726997663744
		; SIMD-VM-NEXT: i64x2.splat $push1=, $pop0
		; SIMD-VM-NEXT: i64.const $push2=, 504408655873966336
		; SIMD-VM-NEXT: i64x2.replace_lane $push3=, $pop1, 1, $pop2
		; SIMD-VM-NEXT: return $pop3
		define <16 x i8> @emulated_const_neither_sufficient_large() {
		ret <16 x i8> <i8 0, i8 undef, i8 2, i8 undef, i8 4, i8 undef, i8 6, i8 255,
		i8 undef, i8 1, i8 undef, i8 3, i8 undef, i8 5, i8 undef, i8 7>
		}
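
To sanity-check the `i8x16` constants above, the emitted `i64` lanes can be expanded back into bytes. The following is a small sketch under the assumption that lane 0 occupies the lowest-addressed bytes, as in WebAssembly's little-endian layout; `bytesOfI64x2` is a hypothetical helper, not an LLVM API. It extracts bytes by shifting, so the result does not depend on the host's byte order.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper: byte view of a v128 built from two i64 lanes.
// Byte i of each lane is taken with a shift, so no type punning is involved.
std::array<uint8_t, 16> bytesOfI64x2(uint64_t lane0, uint64_t lane1) {
  std::array<uint8_t, 16> out{};
  for (unsigned i = 0; i < 8; ++i) {
    out[i] = static_cast<uint8_t>(lane0 >> (8 * i));
    out[8 + i] = static_cast<uint8_t>(lane1 >> (8 * i));
  }
  return out;
}
```

Splatting 506097522914230528 (0x0706050403020100) gives bytes 0 through 7 in each half, which agrees with every defined lane of `emulated_const_combined_sufficient_large`. In `emulated_const_neither_sufficient_large`, the first constant is printed as a signed decimal: -70368726997663744 is the bit pattern 0xFF06000400020000, whose top byte is the `i8 255` lane.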

; CHECK-LABEL: same_const_one_replaced_i16x8:		; CHECK-LABEL: same_const_one_replaced_i16x8:
; CHECK-NEXT: .functype same_const_one_replaced_i16x8 (i32) -> (v128)		; CHECK-NEXT: .functype same_const_one_replaced_i16x8 (i32) -> (v128)
; UNIMP-NEXT: v128.const $push[[L0:[0-9]+]]=, 42, 42, 42, 42, 42, 0, 42, 42		; UNIMP-NEXT: v128.const $push[[L0:[0-9]+]]=, 42, 42, 42, 42, 42, 0, 42, 42
; UNIMP-NEXT: i16x8.replace_lane $push[[L1:[0-9]+]]=, $pop[[L0]], 5, $0		; UNIMP-NEXT: i16x8.replace_lane $push[[L1:[0-9]+]]=, $pop[[L0]], 5, $0
; UNIMP-NEXT: return $pop[[L1]]		; UNIMP-NEXT: return $pop[[L1]]
; SIMD-VM: i16x8.splat		; SIMD-VM: i64x2.splat
define <8 x i16> @same_const_one_replaced_i16x8(i16 %x) {		define <8 x i16> @same_const_one_replaced_i16x8(i16 %x) {
%v = insertelement		%v = insertelement
<8 x i16> <i16 42, i16 42, i16 42, i16 42, i16 42, i16 42, i16 42, i16 42>,		<8 x i16> <i16 42, i16 42, i16 42, i16 42, i16 42, i16 42, i16 42, i16 42>,
i16 %x,		i16 %x,
i32 5		i32 5
ret <8 x i16> %v		ret <8 x i16> %v
}		}

; CHECK-LABEL: different_const_one_replaced_i16x8:		; CHECK-LABEL: different_const_one_replaced_i16x8:
; CHECK-NEXT: .functype different_const_one_replaced_i16x8 (i32) -> (v128)		; CHECK-NEXT: .functype different_const_one_replaced_i16x8 (i32) -> (v128)
; UNIMP-NEXT: v128.const $push[[L0:[0-9]+]]=, 1, -2, 3, -4, 5, 0, 7, -8		; UNIMP-NEXT: v128.const $push[[L0:[0-9]+]]=, 1, -2, 3, -4, 5, 0, 7, -8
; UNIMP-NEXT: i16x8.replace_lane $push[[L1:[0-9]+]]=, $pop[[L0]], 5, $0		; UNIMP-NEXT: i16x8.replace_lane $push[[L1:[0-9]+]]=, $pop[[L0]], 5, $0
; UNIMP-NEXT: return $pop[[L1]]		; UNIMP-NEXT: return $pop[[L1]]
; SIMD-VM: i16x8.splat		; SIMD-VM: i64x2.splat
define <8 x i16> @different_const_one_replaced_i16x8(i16 %x) {		define <8 x i16> @different_const_one_replaced_i16x8(i16 %x) {
%v = insertelement		%v = insertelement
<8 x i16> <i16 1, i16 -2, i16 3, i16 -4, i16 5, i16 -6, i16 7, i16 -8>,		<8 x i16> <i16 1, i16 -2, i16 3, i16 -4, i16 5, i16 -6, i16 7, i16 -8>,
i16 %x,		i16 %x,
i32 5		i32 5
ret <8 x i16> %v		ret <8 x i16> %v
}		}

[24 unchanged lines hidden; context: %v = insertelement]
i32 2		i32 2
ret <4 x float> %v		ret <4 x float> %v
}		}

; CHECK-LABEL: splat_common_const_i32x4:		; CHECK-LABEL: splat_common_const_i32x4:
; CHECK-NEXT: .functype splat_common_const_i32x4 () -> (v128)		; CHECK-NEXT: .functype splat_common_const_i32x4 () -> (v128)
; UNIMP-NEXT: v128.const $push[[L0:[0-9]+]]=, 0, 3, 3, 1		; UNIMP-NEXT: v128.const $push[[L0:[0-9]+]]=, 0, 3, 3, 1
; UNIMP-NEXT: return $pop[[L0]]		; UNIMP-NEXT: return $pop[[L0]]
; SIMD-VM: i32x4.splat		; SIMD-VM: i64x2.splat
define <4 x i32> @splat_common_const_i32x4() {		define <4 x i32> @splat_common_const_i32x4() {
ret <4 x i32> <i32 undef, i32 3, i32 3, i32 1>		ret <4 x i32> <i32 undef, i32 3, i32 3, i32 1>
}		}

; CHECK-LABEL: splat_common_arg_i16x8:		; CHECK-LABEL: splat_common_arg_i16x8:
; CHECK-NEXT: .functype splat_common_arg_i16x8 (i32, i32, i32) -> (v128)		; CHECK-NEXT: .functype splat_common_arg_i16x8 (i32, i32, i32) -> (v128)
; CHECK-NEXT: i16x8.splat $push[[L0:[0-9]+]]=, $2		; CHECK-NEXT: i16x8.splat $push[[L0:[0-9]+]]=, $2
; CHECK-NEXT: i16x8.replace_lane $push[[L1:[0-9]+]]=, $pop[[L0]], 0, $1		; CHECK-NEXT: i16x8.replace_lane $push[[L1:[0-9]+]]=, $pop[[L0]], 0, $1
[121 unchanged lines hidden]

; CHECK-LABEL: mashup_const_i8x16:		; CHECK-LABEL: mashup_const_i8x16:
; CHECK-NEXT: .functype mashup_const_i8x16 (v128, v128, i32) -> (v128)		; CHECK-NEXT: .functype mashup_const_i8x16 (v128, v128, i32) -> (v128)
; UNIMP: v128.const $push[[L0:[0-9]+]]=, 0, 0, 0, 0, 42, 0, 0, 0, 0, 0, 0, 0, 0, 0, 42, 0		; UNIMP: v128.const $push[[L0:[0-9]+]]=, 0, 0, 0, 0, 42, 0, 0, 0, 0, 0, 0, 0, 0, 0, 42, 0
; UNIMP: i8x16.replace_lane		; UNIMP: i8x16.replace_lane
; UNIMP: i8x16.replace_lane		; UNIMP: i8x16.replace_lane
; UNIMP: i8x16.replace_lane		; UNIMP: i8x16.replace_lane
; UNIMP: return		; UNIMP: return
; SIMD-VM: i8x16.splat		; SIMD-VM: i64x2.splat
define <16 x i8> @mashup_const_i8x16(<16 x i8> %src, <16 x i8> %mask, i8 %splatted) {		define <16 x i8> @mashup_const_i8x16(<16 x i8> %src, <16 x i8> %mask, i8 %splatted) {
; swizzle 0		; swizzle 0
%m0 = extractelement <16 x i8> %mask, i32 0		%m0 = extractelement <16 x i8> %mask, i32 0
%s0 = extractelement <16 x i8> %src, i8 %m0		%s0 = extractelement <16 x i8> %src, i8 %m0
%v0 = insertelement <16 x i8> undef, i8 %s0, i32 0		%v0 = insertelement <16 x i8> undef, i8 %s0, i32 0
; splat 3		; splat 3
%v1 = insertelement <16 x i8> %v0, i8 %splatted, i32 3		%v1 = insertelement <16 x i8> %v0, i8 %splatted, i32 3
; splat 12		; splat 12
[61 unchanged lines hidden]


Diff 297760
