This is an archive of the discontinued LLVM Phabricator instance.

Differential D158399

[WebAssembly] Optimize vector shift using a splat value from outside block
Closed · Public

Authored by YolandaCY on Aug 21 2023, 1:54 AM.

Details

Reviewers

tlively
craig.topper

Commits

rG291101aa8ea5: [WebAssembly] Optimize vector shift using a splat value from outside block

Summary

The vector shift operation in WebAssembly uses an i32 shift amount type,
while LLVM IR requires both operands of a binary operator to have the same type.
When the shift amount operand is splatted in a different block, the splat source
is not exported and the vector shift is unrolled into scalar shifts.
This patch enables the vector shift to identify the splat source value from
the other block and to generate the expected WebAssembly bytecode when lowering.
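
To illustrate the type mismatch described above, here is a minimal standalone C++ sketch (a toy model, not LLVM or WebAssembly code). The names `V16`, `splat`, `shl_vector_amount`, and `shl_scalar_amount` are hypothetical and exist only for this illustration. It models an IR-style shift whose amount is a full vector next to a WebAssembly-style `i8x16.shl` whose amount is a single scalar taken modulo the lane width, and shows that the two agree when the vector amount is a splat of an in-range scalar.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using V16 = std::array<std::uint8_t, 16>;

// Build a vector whose 16 lanes all hold the same scalar (a "splat").
V16 splat(std::uint8_t x) {
  V16 r{};
  r.fill(x);
  return r;
}

// IR-style shift: both operands are <16 x i8>, lane i is shifted by amt[i].
// Amounts of 8 or more are poison in LLVM IR, so this model rejects them.
V16 shl_vector_amount(const V16 &v, const V16 &amt) {
  V16 r{};
  for (std::size_t i = 0; i < r.size(); ++i) {
    assert(amt[i] < 8 && "out-of-range amounts are not modeled");
    r[i] = static_cast<std::uint8_t>(v[i] << amt[i]);
  }
  return r;
}

// WebAssembly-style i8x16.shl: one scalar amount, taken modulo the lane width.
V16 shl_scalar_amount(const V16 &v, std::uint32_t amt) {
  V16 r{};
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = static_cast<std::uint8_t>(v[i] << (amt % 8));
  return r;
}
```

When the lowering can see that the vector amount is a splat of some scalar `s`, it can use the scalar form directly. When it cannot see the splat, it has no scalar to hand to the instruction, which is the situation the summary describes.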

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

YolandaCY created this revision. Aug 21 2023, 1:54 AM

Herald added a project: Restricted Project. Aug 21 2023, 1:54 AM

Herald added subscribers: sunshaoce, pmatos, asb and 5 others.

Harbormaster completed remote builds in B253790: Diff 551933. Aug 21 2023, 3:02 AM

This is to resolve a WebAssembly codegen issue that occurs when a vector shift is used in a loop while the shift amount is initialized outside the loop. Could you help take a look? Thanks!

Herald added a project: Restricted Project. Aug 21 2023, 5:18 AM

Herald added subscribers: llvm-commits, aheejin.

Thanks for the patch! It looks like this will be a nice improvement.

@craig.topper, it would be great to get your comments as well, as someone more familiar with the target-independent infrastructure here.

llvm/include/llvm/CodeGen/SelectionDAG.h
2459–2460 ↗	(On Diff #551933)	It would be good to add a comment describing the contents of this map.
llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
841	Why allow null instruction pointers here? It would seem simpler to ensure that callers pass a valid instruction and to assume we have a valid instruction here.
845	What is the benefit of including `isShiftAmountScalar()` here, given that we know it is always true?
llvm/test/CodeGen/WebAssembly/simd-shift-in-loop.ll
2	What do you think about using the auto-update script for this test? The output would be more verbose, but it would also be easy to update if anything changes, and I think it would be helpful to see that the whole function is emitted correctly.
17	Can we add a test where the vshift is a phi, just to show that that still works correctly?

Add test and comments

Harbormaster completed remote builds in B254048: Diff 552298. Aug 22 2023, 4:34 AM

Thank you Thomas for the comments! Please see my updates in the new revision.

llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
841	This is a fast check done in the outside block when visiting the splat vector, at which point we don't yet know whether the splat vector will be used by a shift op. To identify the instruction I need to iterate over all uses of the splat vector until we find the vector shift. Since this is only needed for the WebAssembly target, I added a quick check here to reduce the cost for other platforms. It seems a little confusing; do you think we need to separate it into two functions?
845	This will be called in SelectionDAGBuilder to skip the optimization for other platforms when visiting a shift.
llvm/test/CodeGen/WebAssembly/simd-shift-in-loop.ll
2	Sure. Previously I kept it simple to avoid mismatches on unrelated changes. But if we have an auto-update script, that would be helpful for verifying the whole function directly.
17	Sure. I have added one more test.

tlively added inline comments. Aug 22 2023, 12:54 PM

llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
841	Oh I see, that makes sense.
845	We know that the `isShiftAmountScalar()` here will be the WebAssemblyTargetLowering version of `isShiftAmountScalar()`, so it will always be true, right? So this line could be: `return I->isShift() && I->getOperand(1) == Splat;`

Could this use the TargetLowering::shouldSinkOperands hook to get CodeGenPrepare to move the splat into the loop? ARM, X86, and RISC-V all do that.

Use the shouldSinkOperands hook in CodeGenPrepare

In D158399#4607933, @craig.topper wrote:

Could this use the TargetLowering::shouldSinkOperands hook to get CodeGenPrepare to move the splat into the loop? ARM, X86, and RISC-V all do that.

Thanks for the suggestion! I have revised the code to use this existing hook. Please help take a look again. @craig.topper @tlively Thanks!
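
As a rough illustration of what sinking the splat into the loop means, here is a minimal standalone C++ sketch on a toy IR. It is not LLVM's implementation: `Op`, `Inst`, `isSplatOfScalar`, and `sinkSplatIntoUserBlock` are hypothetical names, and copying the insert and shuffle into the shift's block is an assumption about the effect of the transformation rather than a description of CodeGenPrepare's code.

```cpp
#include <vector>

// Hypothetical toy IR; these are not LLVM classes.
enum class Op { Arg, InsertElt0, SplatShuffle, Load, Shl };

struct Inst {
  Op op;
  int block;                  // id of the basic block holding the instruction
  std::vector<int> operands;  // indices into the function's instruction list
};

// Shape check: shuffle(insertelement(_, x, 0), _, zero mask).
bool isSplatOfScalar(const std::vector<Inst> &f, int id) {
  const Inst &s = f[id];
  return s.op == Op::SplatShuffle && !s.operands.empty() &&
         f[s.operands[0]].op == Op::InsertElt0;
}

// If the shift amount is a splat defined in another block, copy the insert
// and the shuffle into the shift's block and rewire the shift to the copies.
// Returns true when the function was changed.
bool sinkSplatIntoUserBlock(std::vector<Inst> &f, int shiftId) {
  if (f[shiftId].op != Op::Shl || f[shiftId].operands.size() < 2)
    return false;
  const int splatId = f[shiftId].operands[1];
  if (!isSplatOfScalar(f, splatId))
    return false;
  const int userBlock = f[shiftId].block;
  if (f[splatId].block == userBlock)
    return false;  // already local, nothing to sink

  Inst insert = f[f[splatId].operands[0]];  // copy of the insertelement
  insert.block = userBlock;
  f.push_back(insert);
  const int newInsert = static_cast<int>(f.size()) - 1;

  Inst shuffle = f[splatId];                // copy of the shufflevector
  shuffle.block = userBlock;
  shuffle.operands[0] = newInsert;
  f.push_back(shuffle);
  f[shiftId].operands[1] = static_cast<int>(f.size()) - 1;
  return true;
}
```

Once the splat sits in the same block as the shift, a block-local lowering can see the scalar behind the shift amount, which is the effect the hook is meant to achieve here.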

Harbormaster completed remote builds in B254586: Diff 553057. Aug 24 2023, 4:05 AM

Nice, this LGTM. Thanks for the tip, @craig.topper!

This revision is now accepted and ready to land. Aug 24 2023, 7:41 AM

In D158399#4613902, @tlively wrote:

Nice, this LGTM. Thanks for the tip, @craig.topper!

Thanks Thomas. Could you help me commit this change?

Sure thing. I'll use your author email from the other patches in your phabricator profile.

Closed by commit rG291101aa8ea5: [WebAssembly] Optimize vector shift using a splat value from outside block (authored by YolandaCY, committed by tlively). Aug 25 2023, 8:13 AM

This revision was automatically updated to reflect the committed changes.

tlively added a commit: rG291101aa8ea5: [WebAssembly] Optimize vector shift using a splat value from outside block.

In D158399#4617129, @tlively wrote:

Sure thing. I'll use your author email from the other patches in your phabricator profile.

OK, Thank you!

Revision Contents

Path	Size
llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.h	2 lines
llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp	25 lines
llvm/test/CodeGen/WebAssembly/simd-shift-in-loop.ll	104 lines

Diff 553483

llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.h

(... 70 lines not shown ...)
   bool isLegalAddressingMode(const DataLayout &DL, const AddrMode &AM, Type *Ty,
                              unsigned AS,
                              Instruction *I = nullptr) const override;
   bool allowsMisalignedMemoryAccesses(EVT, unsigned AddrSpace, Align Alignment,
                                       MachineMemOperand::Flags Flags,
                                       unsigned *Fast) const override;
   bool isIntDivCheap(EVT VT, AttributeList Attr) const override;
   bool isVectorLoadExtDesirable(SDValue ExtVal) const override;
   bool isOffsetFoldingLegal(const GlobalAddressSDNode *GA) const override;
+  bool shouldSinkOperands(Instruction *I,
+                          SmallVectorImpl<Use *> &Ops) const override;
   EVT getSetCCResultType(const DataLayout &DL, LLVMContext &Context,
                          EVT VT) const override;
   bool getTgtMemIntrinsic(IntrinsicInfo &Info, const CallInst &I,
                           MachineFunction &MF,
                           unsigned Intrinsic) const override;

   void computeKnownBitsForTargetNode(const SDValue Op, KnownBits &Known,
                                      const APInt &DemandedElts,
(... 71 lines not shown ...)

llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp

(... 26 lines not shown ...)
 #include "llvm/CodeGen/MachineRegisterInfo.h"
 #include "llvm/CodeGen/SelectionDAG.h"
 #include "llvm/CodeGen/SelectionDAGNodes.h"
 #include "llvm/IR/DiagnosticInfo.h"
 #include "llvm/IR/DiagnosticPrinter.h"
 #include "llvm/IR/Function.h"
 #include "llvm/IR/Intrinsics.h"
 #include "llvm/IR/IntrinsicsWebAssembly.h"
+#include "llvm/IR/PatternMatch.h"
 #include "llvm/Support/Debug.h"
 #include "llvm/Support/ErrorHandling.h"
 #include "llvm/Support/KnownBits.h"
 #include "llvm/Support/MathExtras.h"
 #include "llvm/Support/raw_ostream.h"
 #include "llvm/Target/TargetOptions.h"
 using namespace llvm;

(... 785 lines not shown ...)

 bool WebAssemblyTargetLowering::isOffsetFoldingLegal(
     const GlobalAddressSDNode *GA) const {
   // Wasm doesn't support function addresses with offsets
   const GlobalValue *GV = GA->getGlobal();
   return isa<Function>(GV) ? false : TargetLowering::isOffsetFoldingLegal(GA);
 }

+bool WebAssemblyTargetLowering::shouldSinkOperands(
+    Instruction *I, SmallVectorImpl<Use *> &Ops) const {
+  using namespace llvm::PatternMatch;
+
+  if (!I->getType()->isVectorTy() || !I->isShift())
+    return false;
+
+  Value *V = I->getOperand(1);
+  // We don't need to sink constant splat.
+  if (dyn_cast<Constant>(V))
+    return false;
+
+  if (match(V, m_Shuffle(m_InsertElt(m_Value(), m_Value(), m_ZeroInt()),
+                         m_Value(), m_ZeroMask()))) {
+    // Sink insert
+    Ops.push_back(&cast<Instruction>(V)->getOperandUse(0));
+    // Sink shuffle
+    Ops.push_back(&I->getOperandUse(1));
+    return true;
+  }
+
+  return false;
+}
+
 EVT WebAssemblyTargetLowering::getSetCCResultType(const DataLayout &DL,
                                                   LLVMContext &C,
                                                   EVT VT) const {
   if (VT.isVector())
     return VT.changeVectorElementTypeToInteger();

   // So far, all branch instructions in Wasm take an I32 condition.
   // The default TargetLowering::getSetCCResultType returns the pointer size,
(... 2,016 lines not shown ...)

llvm/test/CodeGen/WebAssembly/simd-shift-in-loop.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 2
				; RUN: llc < %s -verify-machineinstrs -mattr=+simd128 | FileCheck %s

				; Test that SIMD shifts can be lowered correctly even when shift
				; values are exported from outside blocks.

				target triple = "wasm32-unknown-unknown"

				define void @shl_loop(ptr %a, i8 %shift, i32 %count) {
				; CHECK-LABEL: shl_loop:
				; CHECK: .functype shl_loop (i32, i32, i32) -> ()
				; CHECK-NEXT: # %bb.0: # %entry
				; CHECK-NEXT: .LBB0_1: # %body
				; CHECK-NEXT: # =>This Inner Loop Header: Depth=1
				; CHECK-NEXT: loop # label0:
				; CHECK-NEXT: local.get 0
				; CHECK-NEXT: local.get 0
				; CHECK-NEXT: v128.load 0:p2align=0
				; CHECK-NEXT: local.get 1
				; CHECK-NEXT: i8x16.shl
				; CHECK-NEXT: v128.store 16
				; CHECK-NEXT: local.get 0
				; CHECK-NEXT: i32.const 16
				; CHECK-NEXT: i32.add
				; CHECK-NEXT: local.set 0
				; CHECK-NEXT: local.get 2
				; CHECK-NEXT: i32.const -1
				; CHECK-NEXT: i32.add
				; CHECK-NEXT: local.tee 2
				; CHECK-NEXT: i32.eqz
				; CHECK-NEXT: br_if 0 # 0: up to label0
				; CHECK-NEXT: # %bb.2: # %exit
				; CHECK-NEXT: end_loop
				; CHECK-NEXT: # fallthrough-return
				entry:
				%t1 = insertelement <16 x i8> undef, i8 %shift, i32 0
				%vshift = shufflevector <16 x i8> %t1, <16 x i8> undef, <16 x i32> zeroinitializer
				br label %body
				body:
				%out = phi ptr [%a, %entry], [%b, %body]
				%i = phi i32 [0, %entry], [%next, %body]
				%v = load <16 x i8>, ptr %out, align 1
				%r = shl <16 x i8> %v, %vshift
				%b = getelementptr inbounds i8, ptr %out, i32 16
				store <16 x i8> %r, ptr %b
				%next = add i32 %i, 1
				%i.cmp = icmp eq i32 %next, %count
				br i1 %i.cmp, label %body, label %exit
				exit:
				ret void
				}

				; Test that SIMD shifts can be lowered correctly when shift value
				; is a phi inside loop body.

				define void @shl_phi_loop(ptr %a, i8 %shift, i32 %count) {
				; CHECK-LABEL: shl_phi_loop:
				; CHECK: .functype shl_phi_loop (i32, i32, i32) -> ()
				; CHECK-NEXT: # %bb.0: # %entry
				; CHECK-NEXT: .LBB1_1: # %body
				; CHECK-NEXT: # =>This Inner Loop Header: Depth=1
				; CHECK-NEXT: loop # label1:
				; CHECK-NEXT: local.get 0
				; CHECK-NEXT: local.get 0
				; CHECK-NEXT: v128.load 0:p2align=0
				; CHECK-NEXT: local.get 1
				; CHECK-NEXT: i8x16.shl
				; CHECK-NEXT: v128.store 16
				; CHECK-NEXT: local.get 1
				; CHECK-NEXT: i32.const 1
				; CHECK-NEXT: i32.and
				; CHECK-NEXT: local.set 1
				; CHECK-NEXT: local.get 0
				; CHECK-NEXT: i32.const 16
				; CHECK-NEXT: i32.add
				; CHECK-NEXT: local.set 0
				; CHECK-NEXT: local.get 2
				; CHECK-NEXT: i32.const -1
				; CHECK-NEXT: i32.add
				; CHECK-NEXT: local.tee 2
				; CHECK-NEXT: i32.eqz
				; CHECK-NEXT: br_if 0 # 0: up to label1
				; CHECK-NEXT: # %bb.2: # %exit
				; CHECK-NEXT: end_loop
				; CHECK-NEXT: # fallthrough-return
				entry:
				br label %body
				body:
				%out = phi ptr [%a, %entry], [%b, %body]
				%i = phi i32 [0, %entry], [%next, %body]
				%t1 = phi i8 [%shift, %entry], [%sand, %body]
				%t2 = insertelement <16 x i8> undef, i8 %t1, i32 0
				%vshift = shufflevector <16 x i8> %t2, <16 x i8> undef, <16 x i32> zeroinitializer
				%v = load <16 x i8>, ptr %out, align 1
				%r = shl <16 x i8> %v, %vshift
				%b = getelementptr inbounds i8, ptr %out, i32 16
				store <16 x i8> %r, ptr %b
				%sand = and i8 %t1, 1
				%next = add i32 %i, 1
				%i.cmp = icmp eq i32 %next, %count
				br i1 %i.cmp, label %body, label %exit
				exit:
				ret void
				}