This is an archive of the discontinued LLVM Phabricator instance.

Differential D158399

[WebAssembly] Optimize vector shift using a splat value from outside block
Closed · Public

Authored by YolandaCY on Aug 21 2023, 1:54 AM.

Details

Reviewers

tlively
craig.topper

Commits

rG291101aa8ea5: [WebAssembly] Optimize vector shift using a splat value from outside block

Summary

The vector shift operation in WebAssembly uses an i32 shift amount type,
while LLVM IR requires both operands of a binary operator to have the same type.
When the shift amount operand is splatted in a different block, the splat source
is not exported and the vector shift is unrolled into scalar shifts.
This patch enables the vector shift to identify the splat source value from
the other block and generate the expected WebAssembly bytecode when lowering.
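
To make the type mismatch concrete, here is a minimal, self-contained C++ sketch (not LLVM code) of the two shift forms. `shlPerLane` models the LLVM IR `shl` on `<16 x i8>` with a vector of amounts, and `shlScalarAmount` models an `i8x16.shl`-style operation that takes a single scalar amount reduced modulo the lane width. The type alias and function names are assumptions made for illustration only; the point is that the two forms agree whenever the amount vector is a splat of an in-range scalar, which is why the lowering needs to recover the splat source.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a <16 x i8> vector value.
using V16 = std::array<std::uint8_t, 16>;

// Broadcast one scalar to every lane (what insertelement + shufflevector
// with a zero mask does in the IR).
V16 splat(std::uint8_t x) {
  V16 out{};
  out.fill(x);
  return out;
}

// LLVM IR style: both operands are vectors, one shift amount per lane.
// Amounts must be < 8 here; larger amounts are poison in LLVM IR.
V16 shlPerLane(const V16 &v, const V16 &amounts) {
  V16 out{};
  for (std::size_t i = 0; i < v.size(); ++i)
    out[i] = static_cast<std::uint8_t>(v[i] << amounts[i]);
  return out;
}

// WebAssembly style: a single scalar amount, taken modulo the lane width.
V16 shlScalarAmount(const V16 &v, std::uint32_t amount) {
  V16 out{};
  for (std::size_t i = 0; i < v.size(); ++i)
    out[i] = static_cast<std::uint8_t>(v[i] << (amount % 8));
  return out;
}
```

If the lowering cannot see that the amount vector is a splat, it has to fall back to the per-lane form, which is the unrolling the summary describes.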

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

Time	Test
60,030 ms	x64 debian > MLIR.Examples/standalone::test.toy

Event Timeline

YolandaCY created this revision. · Aug 21 2023, 1:54 AM

Herald added a project: Restricted Project. · Aug 21 2023, 1:54 AM

Herald added subscribers: sunshaoce, pmatos, asb and 5 others.

Harbormaster completed remote builds in B253790: Diff 551933. · Aug 21 2023, 3:02 AM

This resolves a WebAssembly codegen issue that occurs when a vector shift is used in a loop while the shift amount is initialized outside the loop. Could you take a look? Thanks!

Herald added a project: Restricted Project. · Aug 21 2023, 5:18 AM

Herald added subscribers: llvm-commits, aheejin.

Thanks for the patch! It looks like this will be a nice improvement.

@craig.topper, it would be great to get your comments as well, as someone more familiar with the target-independent infrastructure here.

llvm/include/llvm/CodeGen/SelectionDAG.h
2459–2460	It would be good to add a comment describing the contents of this map.
llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
840	Why allow null instruction pointers here? It would seem simpler to ensure that callers pass a valid instruction and to assume we have a valid instruction here.
844	What is the benefit of including `isShiftAmountScalar()` here, given that we know it is always true?
llvm/test/CodeGen/WebAssembly/simd-shift-in-loop.ll
2	What do you think about using the auto-update script for this test? The output would be more verbose, but it would also be easy to update if anything changes, and I think it would be helpful to see that the whole function is emitted correctly.
17	Can we add a test where the vshift is a phi, just to show that that still works correctly?

Add test and comments

Harbormaster completed remote builds in B254048: Diff 552298. · Aug 22 2023, 4:34 AM

Thank you Thomas for the comments! Please see my updates in the new revision.

llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
840	This is a fast check made in the outside block when visiting the splat vector, when we don't yet know whether the splat vector will be used by a shift op. To identify the instruction I need to iterate over all uses of the splat vector until we find the vector shift. Since this is only needed for the WebAssembly target, I added a quick check here to reduce the cost for other platforms. It seems a little confusing; do you think we need to separate it into two functions?
844	This will be called in SelectionDAGBuilder to skip the optimization for other platforms when visiting a shift.
llvm/test/CodeGen/WebAssembly/simd-shift-in-loop.ll
2	Sure. Previously I kept it simple to avoid mismatches on unrelated changes. But if we have an auto-update script, that would be helpful to verify the whole function directly.
17	Sure. I have added one more test.

tlively added inline comments. · Aug 22 2023, 12:54 PM

llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
840	Oh I see, that makes sense.
844	We know that the `isShiftAmountScalar()` here will be the WebAssemblyTargetLowering version of `isShiftAmountScalar()`, so it will always be true, right? So this line could be: `return I->isShift() && I->getOperand(1) == Splat;`
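
The two-mode query discussed in this thread can be sketched outside of LLVM as follows. This is a toy model, not the patch: `ToyInst` and the free function `hasSplatValueUseForVectorOp` are hypothetical stand-ins for `Instruction` and the target hook, and the function is written the way the WebAssembly override would behave after the simplification suggested above. Passing a null instruction asks only whether the target cares about splat shift amounts at all; passing an instruction asks whether that particular user is a shift whose amount operand is the given splat.

```cpp
#include <cassert>

// Hypothetical, heavily simplified stand-in for llvm::Instruction.
struct ToyInst {
  bool IsShift;
  const void *Operands[2]; // operand 1 is the shift amount
};

// Toy version of the WebAssembly override.
bool hasSplatValueUseForVectorOp(const ToyInst *I = nullptr,
                                 const void *Splat = nullptr) {
  // Fast path: no instruction given, so only report that this target takes a
  // scalar shift amount. A target that does not would return false here.
  if (!I)
    return true;
  // Precise query: the user must be a shift that takes Splat as its amount.
  return I->IsShift && I->Operands[1] == Splat;
}
```

Splitting the fast path into its own function, as the author asks about, would give the same behaviour with two narrower signatures.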

Could this use the TargetLowering::shouldSinkOperands hook to get CodeGenPrepare to move the splat into the loop? ARM, X86, and RISC-V all do that.
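
As a rough illustration of what such a hook has to decide, the sketch below models the pattern match in plain C++: a vector shift whose amount is a non-constant splat built from an insert into lane 0 followed by a zero-mask shuffle. It is not the code that landed; `Node`, `Kind`, and `shouldSinkSplatShiftAmount` are hypothetical names, and the real hook operates on LLVM's own instruction and use objects rather than this toy graph. When the pattern matches, the function records the insert and the shuffle as the values to sink next to the shift.

```cpp
#include <cassert>
#include <vector>

// Hypothetical miniature IR: just enough to express the splat pattern.
enum class Kind { Other, Constant, InsertElement, ShuffleVector, Shift };

struct Node {
  Kind K = Kind::Other;
  std::vector<const Node *> Ops;
  bool IsVector = false;
  int InsertLane = -1;   // meaningful for InsertElement only
  bool ZeroMask = false; // meaningful for ShuffleVector only
};

// Returns true if I is a vector shift whose amount (operand 1) is a splat of
// the form shuffle(insert(_, x, lane 0), _, zero mask), and appends the two
// values that would need to move into the shift's block.
bool shouldSinkSplatShiftAmount(const Node &I,
                                std::vector<const Node *> &ToSink) {
  if (I.K != Kind::Shift || !I.IsVector || I.Ops.size() != 2)
    return false;
  const Node *Amt = I.Ops[1];
  // Anything that is not a zero-mask shuffle, including a constant splat,
  // is left where it is.
  if (Amt->K != Kind::ShuffleVector || !Amt->ZeroMask || Amt->Ops.empty())
    return false;
  const Node *Ins = Amt->Ops[0];
  if (Ins->K != Kind::InsertElement || Ins->InsertLane != 0)
    return false;
  ToSink.push_back(Ins); // the insert that places the scalar in lane 0
  ToSink.push_back(Amt); // the shuffle that broadcasts it
  return true;
}
```

With the splat rebuilt inside the loop body, the shift and its amount live in the same block, so the existing lowering can find the scalar without any cross-block bookkeeping.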

Use the shouldSinkOperands hook in CodeGenPrepare

In D158399#4607933, @craig.topper wrote:

Could this use the TargetLowering::shouldSinkOperands hook to get CodeGenPrepare to move the splat into the loop? ARM, X86, and RISC-V all do that.

Thanks for the suggestion! I have revised the code to use this existing hook. Please help take a look again. @craig.topper @tlively Thanks!

Harbormaster completed remote builds in B254586: Diff 553057. · Aug 24 2023, 4:05 AM

Nice, this LGTM. Thanks for the tip, @craig.topper!

This revision is now accepted and ready to land. · Aug 24 2023, 7:41 AM

In D158399#4613902, @tlively wrote:

Nice, this LGTM. Thanks for the tip, @craig.topper!

Thanks Thomas. Could you help me commit this change?

Sure thing. I'll use your author email from the other patches in your phabricator profile.

Closed by commit rG291101aa8ea5: [WebAssembly] Optimize vector shift using a splat value from outside block (authored by YolandaCY, committed by tlively). · Aug 25 2023, 8:13 AM

This revision was automatically updated to reflect the committed changes.

tlively added a commit: rG291101aa8ea5: [WebAssembly] Optimize vector shift using a splat value from outside block.

In D158399#4617129, @tlively wrote:

Sure thing. I'll use your author email from the other patches in your phabricator profile.

OK, Thank you!

Revision Contents

Path	Size
llvm/include/llvm/CodeGen/SelectionDAG.h	23 lines
llvm/include/llvm/CodeGen/TargetLowering.h	7 lines
llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp	1 line
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h	1 line
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp	47 lines
llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.h	3 lines
llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp	33 lines
llvm/test/CodeGen/WebAssembly/simd-shift-in-loop.ll	104 lines

Diff 552298

llvm/include/llvm/CodeGen/SelectionDAG.h

@@ 873 lines not shown @@ #endif
   /// either a BUILD_VECTOR or SPLAT_VECTOR depending on the
   /// scalability of the desired vector type.
   SDValue getSplat(EVT VT, const SDLoc &DL, SDValue Op) {
     assert(VT.isVector() && "Can't splat to non-vector type");
     return VT.isScalableVector() ?
       getSplatVector(VT, DL, Op) : getSplatBuildVector(VT, DL, Op);
   }

+  /// Returns an exported splat source from another block. This helps the
+  /// WebAssembly target lowering for vector shift operation where i32 is used
+  /// as shift amount value type.
+  const Value *getExportedSplatSource(const SDNode *N, Register &Reg) const {
+    auto I = ExportedSplatValueMap.find(N);
+    if (I != ExportedSplatValueMap.end()) {
+      Reg = I->second.second;
+      return I->second.first;
+    }
+    return nullptr;
+  }
+
+  /// Set exported splat source mapping to the splat node.
+  void addExportedSplatSource(const SDNode *N, const Value *V,
+                              const unsigned Reg) {
+    ExportedSplatValueMap[N] = {V, Reg};
+  }
+
   /// Returns a vector of type ResVT whose elements contain the linear sequence
   ///   <0, Step, Step * 2, Step * 3, ...>
   SDValue getStepVector(const SDLoc &DL, EVT ResVT, APInt StepVal);

   /// Returns a vector of type ResVT whose elements contain the linear sequence
   ///   <0, 1, 2, 3, ...>
   SDValue getStepVector(const SDLoc &DL, EVT ResVT);

@@ 1,543 lines not shown @@ private:
   std::vector<CondCodeSDNode*> CondCodeNodes;

   std::vector<SDNode*> ValueTypeNodes;
   std::map<EVT, SDNode*, EVT::compareRawBits> ExtendedValueTypeNodes;
   StringMap<SDNode*> ExternalSymbols;

   std::map<std::pair<std::string, unsigned>, SDNode *> TargetExternalSymbols;
   DenseMap<MCSymbol *, SDNode *> MCSymbols;

+  /// Maps the splat source value to splat vector when they are exported from
+  /// external block. This information is needed for later WebAssembly lowering.
+  DenseMap<const SDNode *, std::pair<const Value *, unsigned>>
+      ExportedSplatValueMap;
+
   FlagInserter *Inserter = nullptr;
 };

 template <> struct GraphTraits<SelectionDAG*> : public GraphTraits<SDNode*> {
   using nodes_iterator = pointer_iterator<SelectionDAG::allnodes_iterator>;

   static nodes_iterator nodes_begin(SelectionDAG *G) {
     return nodes_iterator(G->allnodes_begin());
@@ 10 lines not shown @@

llvm/include/llvm/CodeGen/TargetLowering.h

@@ 1,114 lines not shown @@ public:
   /// Returns true if the target can instruction select the specified FP
   /// immediate natively. If false, the legalizer will materialize the FP
   /// immediate as a load from a constant pool.
   virtual bool isFPImmLegal(const APFloat & /*Imm*/, EVT /*VT*/,
                             bool ForCodeSize = false) const {
     return false;
   }

+  virtual bool isShiftAmountScalar() const { return false; }
+
+  virtual bool hasSplatValueUseForVectorOp(const Instruction *I = nullptr,
+                                           const Value *Splat = nullptr) const {
+    return false;
+  }
+
   /// Targets can use this to indicate that they only support *some*
   /// VECTOR_SHUFFLE operations, those with specific masks. By default, if a
   /// target supports the VECTOR_SHUFFLE node, all mask values are assumed to be
   /// legal.
   virtual bool isShuffleMaskLegal(ArrayRef<int> /*Mask*/, EVT /*VT*/) const {
     return true;
   }

@@ 4,228 lines not shown @@

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp

@@ 1,396 lines not shown @@ void SelectionDAG::clear() {
   OperandRecycler.clear(OperandAllocator);
   OperandAllocator.Reset();
   CSEMap.clear();

   ExtendedValueTypeNodes.clear();
   ExternalSymbols.clear();
   TargetExternalSymbols.clear();
   MCSymbols.clear();
+  ExportedSplatValueMap.clear();
   SDEI.clear();
   std::fill(CondCodeNodes.begin(), CondCodeNodes.end(),
             static_cast<CondCodeSDNode*>(nullptr));
   std::fill(ValueTypeNodes.begin(), ValueTypeNodes.end(),
             static_cast<SDNode*>(nullptr));

   EntryNode.UseList = nullptr;
   InsertNode(&EntryNode);
@@ 11,393 lines not shown @@

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h

@@ 408 lines not shown @@ void EmitBranchForMergedCondition(const Value *Cond, MachineBasicBlock *TBB,
                                     MachineBasicBlock *FBB,
                                     MachineBasicBlock *CurBB,
                                     MachineBasicBlock *SwitchBB,
                                     BranchProbability TProb, BranchProbability FProb,
                                     bool InvertCond);
   bool ShouldEmitAsBranches(const std::vector<SwitchCG::CaseBlock> &Cases);
   bool isExportableFromCurrentBlock(const Value *V, const BasicBlock *FromBB);
   void CopyToExportRegsIfNeeded(const Value *V);
+  Register GetExportReg(const Value *V);
   void ExportFromCurrentBlock(const Value *V);
   void LowerCallTo(const CallBase &CB, SDValue Callee, bool IsTailCall,
                    bool IsMustTailCall, const BasicBlock *EHPadBB = nullptr);

   // Lower range metadata from 0 to N to assert zext to an integer of nearest
   // floor power of two.
   SDValue lowerRangeToAssertZExt(SelectionDAG &DAG, const Instruction &I,
                                  SDValue Op);
@@ 379 lines not shown @@

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp

@@ 820 lines not shown @@ RegsForValue::RegsForValue(const SmallVector<unsigned, 4> &regs, MVT regvt,
                            EVT valuevt, std::optional<CallingConv::ID> CC)
     : ValueVTs(1, valuevt), RegVTs(1, regvt), Regs(regs),
       RegCount(1, regs.size()), CallConv(CC) {}

 RegsForValue::RegsForValue(LLVMContext &Context, const TargetLowering &TLI,
                            const DataLayout &DL, unsigned Reg, Type *Ty,
                            std::optional<CallingConv::ID> CC) {
   ComputeValueVTs(TLI, DL, Ty, ValueVTs);

   CallConv = CC;

   for (EVT ValueVT : ValueVTs) {
     unsigned NumRegs =
         isABIMangled()
             ? TLI.getNumRegistersForCallingConv(Context, *CC, ValueVT)
             : TLI.getNumRegisters(Context, ValueVT);
     MVT RegisterVT =
@@ 1,327 lines not shown @@ void SelectionDAGBuilder::visitRet(const ReturnInst &I) {
   // Update the DAG with the new chain value resulting from return lowering.
   DAG.setRoot(Chain);
 }

 /// CopyToExportRegsIfNeeded - If the given value has virtual registers
 /// created for it, emit nodes to copy the value into the virtual
 /// registers.
 void SelectionDAGBuilder::CopyToExportRegsIfNeeded(const Value *V) {
+  if (Register reg = GetExportReg(V)) {
+    assert((!V->use_empty() || isa<CallBrInst>(V)) &&
+           "Unused value assigned virtual registers!");
+    CopyValueToVirtualRegister(V, reg);
+  }
+}
+
+Register SelectionDAGBuilder::GetExportReg(const Value *V) {
   // Skip empty types
   if (V->getType()->isEmptyTy())
-    return;
+    return Register();

   DenseMap<const Value *, Register>::iterator VMI = FuncInfo.ValueMap.find(V);
-  if (VMI != FuncInfo.ValueMap.end()) {
-    assert((!V->use_empty() || isa<CallBrInst>(V)) &&
-           "Unused value assigned virtual registers!");
-    CopyValueToVirtualRegister(V, VMI->second);
-  }
+  return (VMI != FuncInfo.ValueMap.end()) ? VMI->second : Register();
 }

 /// ExportFromCurrentBlock - If this condition isn't known to be exported from
 /// the current basic block, add it to ValueMap now so that we'll get a
 /// CopyTo/FromReg.
 void SelectionDAGBuilder::ExportFromCurrentBlock(const Value *V) {
   // No need to export constants.
   if (!isa<Instruction>(V) && !isa<Argument>(V)) return;
@@ 1,076 lines not shown @@ SDValue BinNodeValue = DAG.getNode(Opcode, getCurSDLoc(), Op1.getValueType(),
                                      Op1, Op2, Flags);
   setValue(&I, BinNodeValue);
 }

 void SelectionDAGBuilder::visitShift(const User &I, unsigned Opcode) {
   SDValue Op1 = getValue(I.getOperand(0));
   SDValue Op2 = getValue(I.getOperand(1));

-  EVT ShiftTy = DAG.getTargetLoweringInfo().getShiftAmountTy(
-      Op1.getValueType(), DAG.getDataLayout());
+  auto &TLI = DAG.getTargetLoweringInfo();
+  Value *V2 = I.getOperand(1);
+  if (I.getType()->isVectorTy() && TLI.isShiftAmountScalar() &&
+      !(NodeMap[V2].getNode())) {
+    const Value *splat = getSplatValue(V2);
+    if (splat && !isa<Constant>(splat)) {
+      assert(FuncInfo.isExportedInst(splat) && "Splat value is not exported");
+      // TODO: It's possible to be mapped to multiple splat vectors.
+      DenseMap<const Value *, Register>::iterator It =
+          FuncInfo.ValueMap.find(splat);
+      if (It != FuncInfo.ValueMap.end()) {
+        DAG.addExportedSplatSource(Op2.getNode(), splat, It->second);
+      }
+    }
+  }
+
+  EVT ShiftTy = TLI.getShiftAmountTy(Op1.getValueType(), DAG.getDataLayout());

   // Coerce the shift amount to the right type if we can. This exposes the
   // truncate or zext to optimization early.
   if (!I.getType()->isVectorTy() && Op2.getValueType() != ShiftTy) {
     assert(ShiftTy.getSizeInBits() >= Log2_32_Ceil(Op1.getValueSizeInBits()) &&
            "Unexpected shift type");
     Op2 = DAG.getZExtOrTrunc(Op2, getCurSDLoc(), ShiftTy);
   }
@@ 413 lines not shown @@ void SelectionDAGBuilder::visitShuffleVector(const User &I) {
   // The DAGCombiner will perform a BUILD_VECTOR -> SPLAT_VECTOR transformation
   // for targets that support a SPLAT_VECTOR for non-scalable vector types.
   assert(!VT.isScalableVector() && "Unsupported scalable vector shuffle");

   unsigned SrcNumElts = SrcVT.getVectorNumElements();
   unsigned MaskNumElts = Mask.size();

   if (SrcNumElts == MaskNumElts) {
+    if (TLI.hasSplatValueUseForVectorOp() && GetExportReg(&I)) {
+      if (const Value *splat = getSplatValue(&I)) {
+        for (const Use &U : I.uses()) {
+          Instruction *UserI = cast<Instruction>(U.getUser());
+          if (UserI->getType()->isVectorTy() &&
+              TLI.hasSplatValueUseForVectorOp(UserI, &I)) {
+            ExportFromCurrentBlock(splat);
+          }
+        }
+      }
+    }
     setValue(&I, DAG.getVectorShuffle(VT, DL, Src1, Src2, Mask));
     return;
   }

   // Normalize the shuffle vector since mask and vector length don't match.
   if (SrcNumElts < MaskNumElts) {
     // Mask is longer than the source vectors. We can use concatenate vector to
     // make the mask and vectors lengths match.
@@ 8,264 lines not shown @@

llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.h

@@ 70 lines not shown @@ bool isLegalAddressingMode(const DataLayout &DL, const AddrMode &AM, Type *Ty,
                              unsigned AS,
                              Instruction *I = nullptr) const override;
   bool allowsMisalignedMemoryAccesses(EVT, unsigned AddrSpace, Align Alignment,
                                       MachineMemOperand::Flags Flags,
                                       unsigned *Fast) const override;
   bool isIntDivCheap(EVT VT, AttributeList Attr) const override;
   bool isVectorLoadExtDesirable(SDValue ExtVal) const override;
   bool isOffsetFoldingLegal(const GlobalAddressSDNode *GA) const override;
+  bool isShiftAmountScalar() const override;
+  bool hasSplatValueUseForVectorOp(const Instruction *I = nullptr,
+                                   const Value *Splat = nullptr) const override;
   EVT getSetCCResultType(const DataLayout &DL, LLVMContext &Context,
                          EVT VT) const override;
   bool getTgtMemIntrinsic(IntrinsicInfo &Info, const CallInst &I,
                           MachineFunction &MF,
                           unsigned Intrinsic) const override;

   void computeKnownBitsForTargetNode(const SDValue Op, KnownBits &Known,
                                      const APInt &DemandedElts,
@@ 71 lines not shown @@

llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp

@@ 827 lines not shown @@
 bool WebAssemblyTargetLowering::isOffsetFoldingLegal(
     const GlobalAddressSDNode *GA) const {
   // Wasm doesn't support function addresses with offsets
   const GlobalValue *GV = GA->getGlobal();
   return isa<Function>(GV) ? false : TargetLowering::isOffsetFoldingLegal(GA);
 }

+bool WebAssemblyTargetLowering::isShiftAmountScalar() const { return true; }
+
+bool WebAssemblyTargetLowering::hasSplatValueUseForVectorOp(
+    const Instruction *I, const Value *Splat) const {
+  if (!I) {
+    return isShiftAmountScalar();
+  }
+
+  return I->isShift() && isShiftAmountScalar() && I->getOperand(1) == Splat;
+}
+
 EVT WebAssemblyTargetLowering::getSetCCResultType(const DataLayout &DL,
                                                   LLVMContext &C,
                                                   EVT VT) const {
   if (VT.isVector())
     return VT.changeVectorElementTypeToInteger();

   // So far, all branch instructions in Wasm take an I32 condition.
   // The default TargetLowering::getSetCCResultType returns the pointer size,
@@ 1,534 lines not shown @@ if (MaskOp.getValueType().isVector()) {
       MaskOp = LHS;
     }

     return MaskOp;
   };

   // Skip vector and operation
   ShiftVal = SkipImpliedMask(ShiftVal, LaneBits - 1);
+  if (ShiftVal.getValueType().isVector()) {
+    auto SavedShiftVal = ShiftVal;
     ShiftVal = DAG.getSplatValue(ShiftVal);
-  if (!ShiftVal)
+    if (!ShiftVal) {
+      Register InReg;
+      if (auto splat =
+              DAG.getExportedSplatSource(SavedShiftVal.getNode(), InReg)) {
+        EVT RegisterVT = getRegisterType(
+            splat->getContext(),
+            getValueType(DAG.getDataLayout(), splat->getType()));
+        ShiftVal =
+            DAG.getCopyFromReg(DAG.getEntryNode(), DL, InReg, RegisterVT);
+      }
+
+      if (!ShiftVal) {
         return unrollVectorShift(Op, DAG);
+      }
+    }
+  }

   // Skip scalar and operation
   ShiftVal = SkipImpliedMask(ShiftVal, LaneBits - 1);
   // Use anyext because none of the high bits can affect the shift
   ShiftVal = DAG.getAnyExtOrTrunc(ShiftVal, DL, MVT::i32);

   unsigned Opcode;
   switch (Op.getOpcode()) {
@@ 463 lines not shown @@

llvm/test/CodeGen/WebAssembly/simd-shift-in-loop.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 2
				; RUN: llc < %s -verify-machineinstrs -mattr=+simd128 | FileCheck %s

				; Test that SIMD shifts can be lowered correctly even when shift
				; values are exported from outside blocks.

				target triple = "wasm32-unknown-unknown"

				define void @shl_loop(ptr %a, i8 %shift, i32 %count) {
				; CHECK-LABEL: shl_loop:
				; CHECK: .functype shl_loop (i32, i32, i32) -> ()
				; CHECK-NEXT: # %bb.0: # %entry
				; CHECK-NEXT: .LBB0_1: # %body
				; CHECK-NEXT: # =>This Inner Loop Header: Depth=1
				; CHECK-NEXT: loop # label0:
				; CHECK-NEXT: local.get 0
				; CHECK-NEXT: local.get 0
				; CHECK-NEXT: v128.load 0:p2align=0
				; CHECK-NEXT: local.get 1
				; CHECK-NEXT: i8x16.shl
				; CHECK-NEXT: v128.store 16
				; CHECK-NEXT: local.get 0
				; CHECK-NEXT: i32.const 16
				; CHECK-NEXT: i32.add
				; CHECK-NEXT: local.set 0
				; CHECK-NEXT: local.get 2
				; CHECK-NEXT: i32.const -1
				; CHECK-NEXT: i32.add
				; CHECK-NEXT: local.tee 2
				; CHECK-NEXT: i32.eqz
				; CHECK-NEXT: br_if 0 # 0: up to label0
				; CHECK-NEXT: # %bb.2: # %exit
				; CHECK-NEXT: end_loop
				; CHECK-NEXT: # fallthrough-return
				entry:
				%t1 = insertelement <16 x i8> undef, i8 %shift, i32 0
				%vshift = shufflevector <16 x i8> %t1, <16 x i8> undef, <16 x i32> zeroinitializer
				br label %body
				body:
				%out = phi ptr [%a, %entry], [%b, %body]
				%i = phi i32 [0, %entry], [%next, %body]
				%v = load <16 x i8>, ptr %out, align 1
				%r = shl <16 x i8> %v, %vshift
				%b = getelementptr inbounds i8, ptr %out, i32 16
				store <16 x i8> %r, ptr %b
				%next = add i32 %i, 1
				%i.cmp = icmp eq i32 %next, %count
				br i1 %i.cmp, label %body, label %exit
				exit:
				ret void
				}

				; Test that SIMD shifts can be lowered correctly when shift value
				; is a phi inside loop body.

				define void @shl_phi_loop(ptr %a, i8 %shift, i32 %count) {
				; CHECK-LABEL: shl_phi_loop:
				; CHECK: .functype shl_phi_loop (i32, i32, i32) -> ()
				; CHECK-NEXT: # %bb.0: # %entry
				; CHECK-NEXT: .LBB1_1: # %body
				; CHECK-NEXT: # =>This Inner Loop Header: Depth=1
				; CHECK-NEXT: loop # label1:
				; CHECK-NEXT: local.get 0
				; CHECK-NEXT: local.get 0
				; CHECK-NEXT: v128.load 0:p2align=0
				; CHECK-NEXT: local.get 1
				; CHECK-NEXT: i8x16.shl
				; CHECK-NEXT: v128.store 16
				; CHECK-NEXT: local.get 1
				; CHECK-NEXT: i32.const 1
				; CHECK-NEXT: i32.and
				; CHECK-NEXT: local.set 1
				; CHECK-NEXT: local.get 0
				; CHECK-NEXT: i32.const 16
				; CHECK-NEXT: i32.add
				; CHECK-NEXT: local.set 0
				; CHECK-NEXT: local.get 2
				; CHECK-NEXT: i32.const -1
				; CHECK-NEXT: i32.add
				; CHECK-NEXT: local.tee 2
				; CHECK-NEXT: i32.eqz
				; CHECK-NEXT: br_if 0 # 0: up to label1
				; CHECK-NEXT: # %bb.2: # %exit
				; CHECK-NEXT: end_loop
				; CHECK-NEXT: # fallthrough-return
				entry:
				br label %body
				body:
				%out = phi ptr [%a, %entry], [%b, %body]
				%i = phi i32 [0, %entry], [%next, %body]
				%t1 = phi i8 [%shift, %entry], [%sand, %body]
				%t2 = insertelement <16 x i8> undef, i8 %t1, i32 0
				%vshift = shufflevector <16 x i8> %t2, <16 x i8> undef, <16 x i32> zeroinitializer
				%v = load <16 x i8>, ptr %out, align 1
				%r = shl <16 x i8> %v, %vshift
				%b = getelementptr inbounds i8, ptr %out, i32 16
				store <16 x i8> %r, ptr %b
				%sand = and i8 %t1, 1
				%next = add i32 %i, 1
				%i.cmp = icmp eq i32 %next, %count
				br i1 %i.cmp, label %body, label %exit
				exit:
				ret void
				}
