Diff 551243

mlir/include/mlir/Dialect/MemRef/Utils/MemRefUtils.h

	Show All 22 Lines
	class MemRefType;			class MemRefType;

	namespace memref {			namespace memref {

	/// Returns true, if the memref type has static shapes and represents a			/// Returns true, if the memref type has static shapes and represents a
	/// contiguous chunk of memory.			/// contiguous chunk of memory.
	bool isStaticShapeAndContiguousRowMajor(MemRefType type);			bool isStaticShapeAndContiguousRowMajor(MemRefType type);

	/// Returns the flattened 1-D memref and linearized offset for narrow type			/// For a `memref` with `offset`, `sizes` and `strides`, returns the
	/// emulation.			/// offset and size to use for the linearized `memref`.
	///			/// - If the linearization is done for emulating load/stores of
	/// The emulation only works on 1D memref types. To make this work on N-D			/// element type with bitwidth `srcBits` using element type with
	/// memref, we need to linearize the offset.			/// bitwidth `dstBits`, the linearized offset and size are
	///			/// scaled down by `dstBits`/`srcBits`.
	/// For example, to emulate i4 to i8, the following op:			/// - If `indices` is provided, it represents the position in the
	///			/// original `memref` being accessed. The method then returns the
	/// %0 = memref.load %arg0[%v0, %v1] :			/// index to use in the linearized `memref`. The linearized index
	/// memref<?x?xi4, strided<[?, ?], offset: ?>>			/// is also scaled down by `dstBits`/`srcBits`. If `indices` is not provided,
	///			/// 0 is returned for the linearized index.
	/// can be replaced with			struct LinearizedMemRefInfo {
	///			OpFoldResult linearizedOffset;
	/// %b, %offset, %sizes:2, %strides:2 = memref.extract_strided_metadata %0			OpFoldResult linearizedSize;
	///			};
	/// %linearized_offset = %v0 * %stride#0 + %v1 * %stride#1			std::pair<LinearizedMemRefInfo, OpFoldResult> getLinearizedMemRefOffsetAndSize(
	/// %linearized_size = %size0 * %size1			OpBuilder &builder, Location loc, int srcBits, int dstBits,
	/// %scaled_linear_offset = %linearized_offset / 8 * 4			OpFoldResult offset, ArrayRef<OpFoldResult> sizes,
	/// %scaled_base_offset = %offset / 8 * 4			ArrayRef<OpFoldResult> strides, ArrayRef<OpFoldResult> indices = {});
	///
	/// %linearized = memref.reinterpret_cast %b, offset = [%scaled_base_offset],			/// For a `memref` with `offset` and `sizes`, returns the
	/// sizes = [%linearized_size], strides = [%stride#1]			/// offset and size to use for the linearized `memref`, assuming that
	///			/// the strides are computed from a row-major ordering of the sizes;
	/// %new_load = memref.load %linearized[%scaled_linear_offset] :			/// - If the linearization is done for emulating load/stores of
	/// memref<?xi8, strided<[?], offset: ?>>			/// element type with bitwidth `srcBits` using element type with
	std::pair<Value, Value>			/// bitwidth `dstBits`, the linearized offset and size are
	getLinearizeMemRefAndOffset(Location loc, MemRefType sourceType, int srcBits,			/// scaled down by `dstBits`/`srcBits`.
	int dstBits, SmallVector<Value> indices,			LinearizedMemRefInfo
	memref::ExtractStridedMetadataOp stridedMetadata,			getLinearizedMemRefOffsetAndSize(OpBuilder &builder, Location loc, int srcBits,
	OpBuilder &builder);			int dstBits, OpFoldResult offset,
				ArrayRef<OpFoldResult> sizes);

				hanchung: Can we just break it into two functions? Looking through the comment and usage, they look like two methods with the same interface to me. Breaking it into two functions and using them correctly helps readability a lot, and people won't be confused about why `std::ignore` is used; they don't have to go back to the comments. The method name should just tell them what they get.
				mravishankar (author): I think the `std::ignore` is unrelated. It's about whether the caller needs `LinearizedMemRefInfo` or `linearizedIndices`. Depending on the use case you need one or the other, and the logic to compute them is pretty much the same. So I'd rather not have a combinatorial explosion in the number of API entries, with different callers having to issue pretty much the same sequence of calls. There is no IR overhead since all of this is done through `makeComposedFoldedAffineApply` calls.
	} // namespace memref			} // namespace memref
	} // namespace mlir			} // namespace mlir

	#endif // MLIR_DIALECT_MEMREF_UTILS_MEMREFUTILS_H			#endif // MLIR_DIALECT_MEMREF_UTILS_MEMREFUTILS_H
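
To make the documented scaling concrete, here is a minimal sketch of the same arithmetic on plain integers. It is not the MLIR implementation: `LinearizedInfoSketch` and `linearizeSketch` are hypothetical names, all values are assumed to be static and non-negative (so C++ integer division matches `floorDiv`), and no IR is built.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical plain-integer stand-in for LinearizedMemRefInfo plus the
// linearized index that the first overload returns alongside it.
struct LinearizedInfoSketch {
  int64_t offset; // base offset, scaled down by dstBits / srcBits
  int64_t size;   // product of sizes, scaled down by dstBits / srcBits
  int64_t index;  // sum(indices[i] * strides[i]), scaled down the same way
};

// Mirrors the documented behaviour: if `indices` is empty, the linearized
// index is 0. Assumes non-negative inputs.
LinearizedInfoSketch linearizeSketch(int srcBits, int dstBits, int64_t offset,
                                     const std::vector<int64_t> &sizes,
                                     const std::vector<int64_t> &strides,
                                     std::vector<int64_t> indices = {}) {
  assert(dstBits % srcBits == 0 && "dstBits must be a multiple of srcBits");
  assert(sizes.size() == strides.size() && "one stride per size");
  if (indices.empty())
    indices.assign(sizes.size(), 0);
  assert(indices.size() == sizes.size() && "one index per dimension");

  const int64_t scale = dstBits / srcBits;
  int64_t linearIndex = 0;
  int64_t linearSize = 1;
  for (std::size_t i = 0; i < sizes.size(); ++i) {
    linearIndex += indices[i] * strides[i];
    linearSize *= sizes[i];
  }
  return {offset / scale, linearSize / scale, linearIndex / scale};
}
```

For a 3x4 `i4` buffer emulated with `i8` (strides `[4, 1]`), accessing `[2, 3]` gives a linear index of 11, which scales down to 5, and the 12 elements occupy 6 bytes.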

mlir/lib/Dialect/MemRef/Transforms/EmulateNarrowType.cpp

	Show All 29 Lines

	/// When data is loaded/stored in `targetBits` granularity, but is used in			/// When data is loaded/stored in `targetBits` granularity, but is used in
	/// `sourceBits` granularity (`sourceBits` < `targetBits`), the `targetBits` is			/// `sourceBits` granularity (`sourceBits` < `targetBits`), the `targetBits` is
	/// treated as an array of elements of width `sourceBits`.			/// treated as an array of elements of width `sourceBits`.
	/// Return the bit offset of the value at position `srcIdx`. For example, if			/// Return the bit offset of the value at position `srcIdx`. For example, if
	/// `sourceBits` equals to 4 and `targetBits` equals to 8, the x-th element is			/// `sourceBits` equals to 4 and `targetBits` equals to 8, the x-th element is
	/// located at (x % 2) * 4. Because there are two elements in one i8, and one			/// located at (x % 2) * 4. Because there are two elements in one i8, and one
	/// element has 4 bits.			/// element has 4 bits.
	static Value getOffsetForBitwidth(Location loc, Value srcIdx, int sourceBits,			static Value getOffsetForBitwidth(Location loc, OpFoldResult srcIdx,
	int targetBits, OpBuilder &builder) {			int sourceBits, int targetBits,
				OpBuilder &builder) {
	assert(targetBits % sourceBits == 0);			assert(targetBits % sourceBits == 0);
	IntegerType targetType = builder.getIntegerType(targetBits);			AffineExpr s0;
	IntegerAttr idxAttr =			bindSymbols(builder.getContext(), s0);
	builder.getIntegerAttr(targetType, targetBits / sourceBits);			int scaleFactor = targetBits / sourceBits;
	auto idx = builder.create<arith::ConstantOp>(loc, targetType, idxAttr);			OpFoldResult offsetVal = affine::makeComposedFoldedAffineApply(
	IntegerAttr srcBitsAttr = builder.getIntegerAttr(targetType, sourceBits);			builder, loc, (s0 % scaleFactor) * sourceBits, {srcIdx});
	auto srcBitsValue =			Value bitOffset = getValueOrCreateConstantIndexOp(builder, loc, offsetVal);
	builder.create<arith::ConstantOp>(loc, targetType, srcBitsAttr);			IntegerType dstType = builder.getIntegerType(targetBits);
	auto m = builder.create<arith::RemUIOp>(loc, srcIdx, idx);			return builder.create<arith::IndexCastOp>(loc, dstType, bitOffset);
	return builder.create<arith::MulIOp>(loc, targetType, m, srcBitsValue);
	}			}

	namespace {			namespace {

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	// ConvertMemRefAlloc			// ConvertMemRefAlloc
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	struct ConvertMemRefAlloc final : OpConversionPattern<memref::AllocOp> {			struct ConvertMemRefAlloc final : OpConversionPattern<memref::AllocOp> {
	using OpConversionPattern::OpConversionPattern;			using OpConversionPattern::OpConversionPattern;

	LogicalResult			LogicalResult
	matchAndRewrite(memref::AllocOp op, OpAdaptor adaptor,			matchAndRewrite(memref::AllocOp op, OpAdaptor adaptor,
	ConversionPatternRewriter &rewriter) const override {			ConversionPatternRewriter &rewriter) const override {
	Type newTy = getTypeConverter()->convertType(op.getType());			auto currentType = op.getMemref().getType().cast<MemRefType>();
	if (!newTy) {			auto newResultType =
				getTypeConverter()->convertType(op.getType()).dyn_cast<MemRefType>();
				if (!newResultType) {
	return rewriter.notifyMatchFailure(			return rewriter.notifyMatchFailure(
				hanchung: style nit: use `auto`. `dyn_cast` and `cast` already spell the type.
	op->getLoc(),			op->getLoc(),
	llvm::formatv("failed to convert memref type: {0}", op.getType()));			llvm::formatv("failed to convert memref type: {0}", op.getType()));
	}			}

				// Special case zero-rank memrefs.
				if (currentType.getRank() == 0) {
				rewriter.replaceOpWithNewOp<memref::AllocOp>(
				op, newResultType, ValueRange{}, adaptor.getSymbolOperands(),
				adaptor.getAlignmentAttr());
				return success();
				}

				Location loc = op.getLoc();
				OpFoldResult zero = rewriter.getIndexAttr(0);
				SmallVector<OpFoldResult> indices(currentType.getRank(), zero);

				// Get linearized type.
				int srcBits = currentType.getElementType().getIntOrFloatBitWidth();
				int dstBits = newResultType.getElementType().getIntOrFloatBitWidth();
				SmallVector<OpFoldResult> sizes = op.getMixedSizes();

				memref::LinearizedMemRefInfo linearizedMemRefInfo =
				memref::getLinearizedMemRefOffsetAndSize(
				rewriter, loc, srcBits, dstBits, /*offset =*/zero, sizes);
				SmallVector<Value> dynamicLinearizedSize;
				if (!newResultType.hasStaticShape()) {
				dynamicLinearizedSize.push_back(getValueOrCreateConstantIndexOp(
				rewriter, loc, linearizedMemRefInfo.linearizedSize));
				}

	rewriter.replaceOpWithNewOp<memref::AllocOp>(			rewriter.replaceOpWithNewOp<memref::AllocOp>(
	op, newTy, adaptor.getDynamicSizes(), adaptor.getSymbolOperands(),			op, newResultType, dynamicLinearizedSize, adaptor.getSymbolOperands(),
	adaptor.getAlignmentAttr());			adaptor.getAlignmentAttr());
	return success();			return success();
	}			}
	};			};

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	// ConvertMemRefAssumeAlignment			// ConvertMemRefAssumeAlignment
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	Show All 22 Lines
	// ConvertMemRefLoad			// ConvertMemRefLoad
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	struct ConvertMemRefLoad final : OpConversionPattern<memref::LoadOp> {			struct ConvertMemRefLoad final : OpConversionPattern<memref::LoadOp> {
	using OpConversionPattern::OpConversionPattern;			using OpConversionPattern::OpConversionPattern;

	LogicalResult			LogicalResult
	matchAndRewrite(memref::LoadOp op, OpAdaptor adaptor,			matchAndRewrite(memref::LoadOp op, OpAdaptor adaptor,
	ConversionPatternRewriter &rewriter) const override {			ConversionPatternRewriter &rewriter) const override {
	Type newTy = getTypeConverter()->convertType(op.getMemRefType());			auto convertedType = adaptor.getMemref().getType().cast<MemRefType>();
				hanchung: Don't we need a check to avoid an infinite loop? It would probably converge after applying the pattern 10 times, but I think we still want to bail out if it's already converted? if (op.getMemRefType() == convertedType) return failure();
				mravishankar (author): I don't think we need to do that. If the `memRefType` is correct already, then the dialect conversion framework will not see the op as illegal and will not even call the conversion pattern.
				hanchung: Good point, I missed the mechanism about TypeConversion. Thanks for the explanation!
	if (!newTy) {			auto convertedElementType = convertedType.getElementType();
	return rewriter.notifyMatchFailure(
	op->getLoc(), llvm::formatv("failed to convert memref type: {0}",
	op.getMemRefType()));
	}

	if (op.getMemRefType() == newTy)
	return failure();

	auto loc = op.getLoc();
	auto sourceType = cast<MemRefType>(adaptor.getMemref().getType());
	unsigned sourceRank = sourceType.getRank();
	SmallVector<Value> indices = adaptor.getIndices();
	assert(indices.size() == sourceRank);

	auto srcElementType = sourceType.getElementType();
	auto oldElementType = op.getMemRefType().getElementType();			auto oldElementType = op.getMemRefType().getElementType();
	int srcBits = oldElementType.getIntOrFloatBitWidth();			int srcBits = oldElementType.getIntOrFloatBitWidth();
	int dstBits = srcElementType.getIntOrFloatBitWidth();			int dstBits = convertedElementType.getIntOrFloatBitWidth();
	if (dstBits % srcBits != 0) {			if (dstBits % srcBits != 0) {
	return rewriter.notifyMatchFailure(			return rewriter.notifyMatchFailure(
	op, "only dstBits % srcBits == 0 supported");			op, "only dstBits % srcBits == 0 supported");
	}			}

	auto stridedMetadata = rewriter.create<memref::ExtractStridedMetadataOp>(			Location loc = op.getLoc();
	loc, adaptor.getMemref());			// Special case 0-rank memref loads.
				Value bitsLoad;
	Value newLoad, lastIdx;			if (convertedType.getRank() == 0) {
	if (sourceRank == 0) {			bitsLoad = rewriter.create<memref::LoadOp>(loc, adaptor.getMemref(),
	newLoad = rewriter.create<memref::LoadOp>(			ValueRange{});
	loc, srcElementType, adaptor.getMemref(), adaptor.getIndices());

	lastIdx = stridedMetadata.getOffset();
	} else {			} else {
	auto [reinterpret, linearizedOffset] =			SmallVector<OpFoldResult> indices =
	memref::getLinearizeMemRefAndOffset(loc, sourceType, srcBits, dstBits,			getAsOpFoldResult(adaptor.getIndices());
	adaptor.getIndices(),
	stridedMetadata, rewriter);

	newLoad = rewriter.create<memref::LoadOp>(loc, srcElementType,			auto stridedMetadata = rewriter.create<memref::ExtractStridedMetadataOp>(
	reinterpret, linearizedOffset);			loc, op.getMemRef());

	lastIdx = adaptor.getIndices().back();			// Linearize the indices of the original load instruction. Do not account
	}			// for the scaling yet. This will be accounted for later.
				OpFoldResult linearizedIndices;
				std::tie(std::ignore, linearizedIndices) =
				memref::getLinearizedMemRefOffsetAndSize(
				rewriter, loc, srcBits, srcBits,
				stridedMetadata.getConstifiedMixedOffset(),
				stridedMetadata.getConstifiedMixedSizes(),
				stridedMetadata.getConstifiedMixedStrides(), indices);

				AffineExpr s0;
				bindSymbols(rewriter.getContext(), s0);
				int64_t scaler = dstBits / srcBits;
				OpFoldResult scaledLinearizedIndices =
				affine::makeComposedFoldedAffineApply(
				rewriter, loc, s0.floorDiv(scaler), {linearizedIndices});
				Value newLoad = rewriter.create<memref::LoadOp>(
				loc, adaptor.getMemref(),
				getValueOrCreateConstantIndexOp(rewriter, loc,
				scaledLinearizedIndices));

	// Get the offset and shift the bits to the rightmost.			// Get the offset and shift the bits to the rightmost.
	// Note, currently only the big-endian is supported.			// Note, currently only the big-endian is supported.
	auto castLastIdx =			Value bitwidthOffset = getOffsetForBitwidth(loc, linearizedIndices,
	rewriter.create<arith::IndexCastUIOp>(loc, srcElementType, lastIdx);			srcBits, dstBits, rewriter);
				bitsLoad = rewriter.create<arith::ShRSIOp>(loc, newLoad, bitwidthOffset);
	Value BitwidthOffset =			}
	getOffsetForBitwidth(loc, castLastIdx, srcBits, dstBits, rewriter);
	auto bitsLoad =
	rewriter.create<arith::ShRSIOp>(loc, newLoad, BitwidthOffset);

	// Get the corresponding bits. If the arith computation bitwidth equals			// Get the corresponding bits. If the arith computation bitwidth equals
	// to the emulated bitwidth, we apply a mask to extract the low bits.			// to the emulated bitwidth, we apply a mask to extract the low bits.
	// It is not clear if this case actually happens in practice, but we keep			// It is not clear if this case actually happens in practice, but we keep
	// the operations just in case. Otherwise, if the arith computation bitwidth			// the operations just in case. Otherwise, if the arith computation bitwidth
	// is different from the emulated bitwidth we truncate the result.			// is different from the emulated bitwidth we truncate the result.
	Operation *result;			Operation *result;
	auto resultTy = getTypeConverter()->convertType(oldElementType);			auto resultTy = getTypeConverter()->convertType(oldElementType);
	if (resultTy == srcElementType) {			if (resultTy == convertedElementType) {
	auto mask = rewriter.create<arith::ConstantOp>(			auto mask = rewriter.create<arith::ConstantOp>(
	loc, srcElementType,			loc, convertedElementType,
	rewriter.getIntegerAttr(srcElementType, (1 << srcBits) - 1));			rewriter.getIntegerAttr(convertedElementType, (1 << srcBits) - 1));

	result = rewriter.create<arith::AndIOp>(loc, bitsLoad, mask);			result = rewriter.create<arith::AndIOp>(loc, bitsLoad, mask);
	} else {			} else {
	result = rewriter.create<arith::TruncIOp>(loc, resultTy, bitsLoad);			result = rewriter.create<arith::TruncIOp>(loc, resultTy, bitsLoad);
	}			}

	rewriter.replaceOp(op, result->getResult(0));			rewriter.replaceOp(op, result->getResult(0));
	return success();			return success();
	}			}
	};			};
	} // end anonymous namespace			} // end anonymous namespace

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	// Public Interface Definition			// Public Interface Definition
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	void memref::populateMemRefNarrowTypeEmulationPatterns(			void memref::populateMemRefNarrowTypeEmulationPatterns(
	arith::NarrowTypeEmulationConverter &typeConverter,			arith::NarrowTypeEmulationConverter &typeConverter,
	RewritePatternSet &patterns) {			RewritePatternSet &patterns) {

	// Populate `memref.*` conversion patterns.			// Populate `memref.*` conversion patterns.
	patterns			patterns
	.add<ConvertMemRefAlloc, ConvertMemRefLoad, ConvertMemRefAssumeAlignment>(			.add<ConvertMemRefAlloc, ConvertMemRefLoad, ConvertMemRefAssumeAlignment>(
	typeConverter, patterns.getContext());			typeConverter, patterns.getContext());
				memref::populateResolveExtractStridedMetadataPatterns(patterns);
				}

				static SmallVector<int64_t> getLinearizedShape(MemRefType ty, int srcBits,
				int dstBits) {
				if (ty.getRank() == 0)
				return {};

				hanchung: mlir style nit: do not add braces for single statement.
				int64_t linearizedShape = 1;
				for (auto shape : ty.getShape()) {
				if (shape == ShapedType::kDynamic)
				return {ShapedType::kDynamic};
				linearizedShape *= shape;
				hanchung: ditto
				}
				int scale = dstBits / srcBits;
				// Scale the size to the ceilDiv(linearizedShape, scale)
				// to accommodate all the values.
				yzhang93: Nit: Can you add some comments for the logic behind this?
				linearizedShape = (linearizedShape + scale - 1) / scale;
				return {linearizedShape};
	}			}

	void memref::populateMemRefNarrowTypeEmulationConversions(			void memref::populateMemRefNarrowTypeEmulationConversions(
	arith::NarrowTypeEmulationConverter &typeConverter) {			arith::NarrowTypeEmulationConverter &typeConverter) {
	typeConverter.addConversion(			typeConverter.addConversion(
	[&typeConverter](MemRefType ty) -> std::optional<Type> {			[&typeConverter](MemRefType ty) -> std::optional<Type> {
	auto intTy = dyn_cast<IntegerType>(ty.getElementType());			auto intTy = dyn_cast<IntegerType>(ty.getElementType());
	if (!intTy)			if (!intTy)
	return ty;			return ty;

	unsigned width = intTy.getWidth();			unsigned width = intTy.getWidth();
	unsigned loadStoreWidth = typeConverter.getLoadStoreBitwidth();			unsigned loadStoreWidth = typeConverter.getLoadStoreBitwidth();
	if (width >= loadStoreWidth)			if (width >= loadStoreWidth)
	return ty;			return ty;

				// Currently only the case where the innermost stride is 1 is handled.
				SmallVector<int64_t> strides;
				int64_t offset;
				if (failed(getStridesAndOffset(ty, strides, offset)))
				return std::nullopt;
				if (!strides.empty() && strides.back() != 1)
				return std::nullopt;

				hanchung: nit: remove braces
	auto newElemTy = IntegerType::get(ty.getContext(), loadStoreWidth,			auto newElemTy = IntegerType::get(ty.getContext(), loadStoreWidth,
	intTy.getSignedness());			intTy.getSignedness());
	if (!newElemTy)			if (!newElemTy)
	return std::nullopt;			return std::nullopt;

	return ty.cloneWith(std::nullopt, newElemTy);			StridedLayoutAttr layoutAttr;
				if (offset != 0) {
				layoutAttr = StridedLayoutAttr::get(ty.getContext(), offset,
				ArrayRef<int64_t>{1});
				}

				return MemRefType::get(getLinearizedShape(ty, width, loadStoreWidth),
				newElemTy, layoutAttr, ty.getMemorySpace());
	});			});
	}			}
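
The load pattern and the type conversion above boil down to a few integer operations. The following is a minimal sketch of that arithmetic for `i4` values packed in `i8` storage, written on plain integers rather than with MLIR ops. The names `packedSizeSketch` and `loadI4Sketch` are hypothetical, and element 0 is assumed to occupy the low-order bits, which is what the right shift by `(x % 2) * 4` from `getOffsetForBitwidth` implies. The sketch uses an unsigned shift for simplicity, whereas the pattern emits `arith.shrsi` followed by a mask or truncation.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Number of dstBits-wide elements needed to hold `numElements` values of
// srcBits each: ceilDiv(numElements, dstBits / srcBits).
int64_t packedSizeSketch(int64_t numElements, int srcBits, int dstBits) {
  assert(dstBits % srcBits == 0 && "dstBits must be a multiple of srcBits");
  const int64_t scale = dstBits / srcBits;
  return (numElements + scale - 1) / scale;
}

// Emulated i4 load from i8 storage, given the unscaled linearized index.
uint8_t loadI4Sketch(const std::vector<uint8_t> &bytes, int64_t linearIndex) {
  const int srcBits = 4;
  const int dstBits = 8;
  const int64_t scale = dstBits / srcBits; // two i4 values per i8

  // Step 1: the index into the wide storage is floor(linearIndex / scale).
  const uint8_t wide = bytes[static_cast<std::size_t>(linearIndex / scale)];
  // Step 2: the bit offset inside the wide element is (x % scale) * srcBits.
  const int bitOffset = static_cast<int>(linearIndex % scale) * srcBits;
  // Step 3: shift the wanted bits to the bottom and mask off the rest.
  const uint8_t mask = static_cast<uint8_t>((1 << srcBits) - 1);
  return static_cast<uint8_t>((wide >> bitOffset) & mask);
}
```

With storage `{0x21, 0x43}`, indices 0 through 3 read back 1, 2, 3 and 4. Five `i4` values need three bytes, which is why the converted shape uses a ceiling division.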

mlir/lib/Dialect/MemRef/Transforms/ExpandStridedMetadata.cpp

Show First 20 Lines • Show All 681 Lines • ▼ Show 20 Lines	LogicalResult matchAndRewrite(memref::ExtractStridedMetadataOp op,
}		}

// Put all the values together to replace the results.		// Put all the values together to replace the results.
SmallVector<Value> results;		SmallVector<Value> results;
results.reserve(rank * 2 + 2);		results.reserve(rank * 2 + 2);

auto baseBufferType = cast<MemRefType>(op.getBaseBuffer().getType());		auto baseBufferType = cast<MemRefType>(op.getBaseBuffer().getType());
int64_t offset = 0;		int64_t offset = 0;
		if (op.getBaseBuffer().use_empty()) {
		results.push_back(nullptr);
		} else {
if (allocLikeOp.getType() == baseBufferType)		if (allocLikeOp.getType() == baseBufferType)
results.push_back(allocLikeOp);		results.push_back(allocLikeOp);
else		else
results.push_back(rewriter.create<memref::ReinterpretCastOp>(		results.push_back(rewriter.create<memref::ReinterpretCastOp>(
loc, baseBufferType, allocLikeOp, offset,		loc, baseBufferType, allocLikeOp, offset,
/*sizes=*/ArrayRef<int64_t>(),		/*sizes=*/ArrayRef<int64_t>(),
/*strides=*/ArrayRef<int64_t>()));		/*strides=*/ArrayRef<int64_t>()));
		}

// Offset.		// Offset.
results.push_back(rewriter.create<arith::ConstantIndexOp>(loc, offset));		results.push_back(rewriter.create<arith::ConstantIndexOp>(loc, offset));

for (OpFoldResult size : sizes)		for (OpFoldResult size : sizes)
results.push_back(getValueOrCreateConstantIndexOp(rewriter, loc, size));		results.push_back(getValueOrCreateConstantIndexOp(rewriter, loc, size));

for (OpFoldResult stride : strides)		for (OpFoldResult stride : strides)
▲ Show 20 Lines • Show All 244 Lines • Show Last 20 Lines

mlir/lib/Dialect/MemRef/Utils/MemRefUtils.cpp

Show All 40 Lines	bool isStaticShapeAndContiguousRowMajor(MemRefType type) {
while (curDim >= 0 && type.getDimSize(curDim) == 1) {		while (curDim >= 0 && type.getDimSize(curDim) == 1) {
--curDim;		--curDim;
}		}

// All dims are unit-strided or size-1.		// All dims are unit-strided or size-1.
return curDim < 0;		return curDim < 0;
}		}

std::pair<Value, Value>		std::pair<LinearizedMemRefInfo, OpFoldResult> getLinearizedMemRefOffsetAndSize(
getLinearizeMemRefAndOffset(Location loc, MemRefType sourceType, int srcBits,		OpBuilder &builder, Location loc, int srcBits, int dstBits,
int dstBits, SmallVector<Value> indices,		OpFoldResult offset, ArrayRef<OpFoldResult> sizes,
memref::ExtractStridedMetadataOp stridedMetadata,		ArrayRef<OpFoldResult> strides, ArrayRef<OpFoldResult> indices) {
OpBuilder &builder) {		unsigned sourceRank = sizes.size();
auto srcElementType = sourceType.getElementType();		assert(sizes.size() == strides.size() &&
unsigned sourceRank = indices.size();		"expected as many sizes as strides for a memref");
		SmallVector indicesVec = llvm::to_vector(indices);
		hanchung: hmm, should we pass the type to `SmallVector`? I'm surprised that it's working.
		mravishankar (author): I don't think we need to. Not sure I see the issue here. It's creating a new vector from `indices`, and it is resized if it is empty.
Value baseBuffer = stridedMetadata.getBaseBuffer();		if (indices.empty())
SmallVector<Value> baseSizes = stridedMetadata.getSizes();		indicesVec.resize(sourceRank, builder.getIndexAttr(0));
SmallVector<Value> baseStrides = stridedMetadata.getStrides();		assert(indicesVec.size() == strides.size() &&
		hanchung: nit: remove braces
Value baseOffset = stridedMetadata.getOffset();		"expected as many indices as rank of memref");
assert(indices.size() == baseStrides.size());

// Create the affine symbols and values for linearization.		// Create the affine symbols and values for linearization.
SmallVector<AffineExpr> symbols(2 * sourceRank + 2);		SmallVector<AffineExpr> symbols(2 * sourceRank);
bindSymbolsList(builder.getContext(), MutableArrayRef{symbols});		bindSymbolsList(builder.getContext(), MutableArrayRef{symbols});
symbols[0] = builder.getAffineSymbolExpr(0);		AffineExpr addMulMap = builder.getAffineConstantExpr(0);
AffineExpr addMulMap = symbols.front();		AffineExpr mulMap = builder.getAffineConstantExpr(1);
AffineExpr mulMap = symbols.front();
		SmallVector<OpFoldResult> offsetValues(2 * sourceRank);
SmallVector<OpFoldResult> offsetValues(2 * sourceRank + 2);		SmallVector<OpFoldResult> sizeValues(sourceRank);
offsetValues[0] = builder.getIndexAttr(0);
SmallVector<OpFoldResult> sizeValues(sourceRank + 1);
sizeValues[0] = builder.getIndexAttr(1);

for (unsigned i = 0; i < sourceRank; ++i) {		for (unsigned i = 0; i < sourceRank; ++i) {
unsigned offsetIdx = 2 * i + 1;		unsigned offsetIdx = 2 * i;
addMulMap = addMulMap + symbols[offsetIdx] * symbols[offsetIdx + 1];		addMulMap = addMulMap + symbols[offsetIdx] * symbols[offsetIdx + 1];
offsetValues[offsetIdx] = indices[i];		offsetValues[offsetIdx] = indicesVec[i];
offsetValues[offsetIdx + 1] = baseStrides[i];		offsetValues[offsetIdx + 1] = strides[i];

unsigned sizeIdx = i + 1;		mulMap = mulMap * symbols[i];
mulMap = mulMap * symbols[sizeIdx];
sizeValues[sizeIdx] = baseSizes[i];
}		}

// Adjust linearizedOffset by the scale factor (dstBits / srcBits).		// Adjust linearizedIndices, size and offset by the scale factor (dstBits /
OpFoldResult scaler = builder.getIndexAttr(dstBits / srcBits);		// srcBits).
AffineExpr scaledAddMulMap = addMulMap.floorDiv(symbols.back());		int64_t scaler = dstBits / srcBits;
offsetValues.back() = scaler;		addMulMap = addMulMap.floorDiv(scaler);
		mulMap = mulMap.floorDiv(scaler);

OpFoldResult linearizedOffset = affine::makeComposedFoldedAffineApply(		OpFoldResult linearizedIndices = affine::makeComposedFoldedAffineApply(
builder, loc, scaledAddMulMap, offsetValues);		builder, loc, addMulMap, offsetValues);
OpFoldResult linearizedSize =		OpFoldResult linearizedSize =
affine::makeComposedFoldedAffineApply(builder, loc, mulMap, sizeValues);		affine::makeComposedFoldedAffineApply(builder, loc, mulMap, sizes);

// Adjust baseOffset by the scale factor (dstBits / srcBits).		// Adjust baseOffset by the scale factor (dstBits / srcBits).
AffineExpr s0, s1;		AffineExpr s0;
bindSymbols(builder.getContext(), s0, s1);		bindSymbols(builder.getContext(), s0);
OpFoldResult adjustBaseOffset = affine::makeComposedFoldedAffineApply(		OpFoldResult adjustBaseOffset = affine::makeComposedFoldedAffineApply(
builder, loc, s0.floorDiv(s1), {baseOffset, scaler});		builder, loc, s0.floorDiv(scaler), {offset});

		return {{adjustBaseOffset, linearizedSize}, linearizedIndices};
		}

// Flatten n-D MemRef to 1-D MemRef.		LinearizedMemRefInfo
std::optional<int64_t> stride =		getLinearizedMemRefOffsetAndSize(OpBuilder &builder, Location loc, int srcBits,
getConstantIntValue(stridedMetadata.getConstifiedMixedStrides().back());		int dstBits, OpFoldResult offset,
auto layoutAttr =		ArrayRef<OpFoldResult> sizes) {
StridedLayoutAttr::get(sourceType.getContext(), ShapedType::kDynamic,		SmallVector<OpFoldResult> strides(sizes.size());
{stride ? stride.value() : ShapedType::kDynamic});		if (sizes.size() > 0) {
int64_t staticShape = sourceType.hasStaticShape()		strides.back() = builder.getIndexAttr(1);
? sourceType.getNumElements()		AffineExpr s0, s1;
: ShapedType::kDynamic;		bindSymbols(builder.getContext(), s0, s1);
auto flattenMemrefType = MemRefType::get(		for (int index = sizes.size() - 1; index > 0; --index) {
staticShape, srcElementType, layoutAttr, sourceType.getMemorySpace());		strides[index - 1] = affine::makeComposedFoldedAffineApply(
		builder, loc, s0 * s1,
auto reinterpret = builder.create<memref::ReinterpretCastOp>(		ArrayRef<OpFoldResult>{strides[index], sizes[index]});
loc, flattenMemrefType, baseBuffer,		}
getValueOrCreateConstantIndexOp(builder, loc, adjustBaseOffset),		}
getValueOrCreateConstantIndexOp(builder, loc, linearizedSize),
baseStrides.back());

return std::make_pair(reinterpret, getValueOrCreateConstantIndexOp(		LinearizedMemRefInfo linearizedMemRefInfo;
builder, loc, linearizedOffset));		std::tie(linearizedMemRefInfo, std::ignore) =
		getLinearizedMemRefOffsetAndSize(builder, loc, srcBits, dstBits, offset,
		sizes, strides);
		return linearizedMemRefInfo;
}		}

} // namespace memref		} // namespace memref
} // namespace mlir		} // namespace mlir
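
The second overload derives strides from a row-major ordering of the sizes before delegating to the first one. A minimal sketch of that stride computation on static integers follows; `rowMajorStridesSketch` is a hypothetical name, and the real code builds the products with `affine::makeComposedFoldedAffineApply` so that dynamic sizes also work.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Row-major strides: the innermost stride is 1 and each outer stride is the
// next inner stride multiplied by the next inner size.
std::vector<int64_t> rowMajorStridesSketch(const std::vector<int64_t> &sizes) {
  std::vector<int64_t> strides(sizes.size());
  if (sizes.empty())
    return strides; // rank-0: nothing to compute
  strides.back() = 1;
  for (std::size_t index = sizes.size() - 1; index > 0; --index)
    strides[index - 1] = strides[index] * sizes[index];
  return strides;
}
```

For sizes `[2, 3, 4]` this yields strides `[12, 4, 1]`.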

mlir/lib/Dialect/Vector/Transforms/VectorEmulateNarrowType.cpp

Show First 20 Lines • Show All 63 Lines • ▼ Show 20 Lines	matchAndRewrite(vector::LoadOp op, OpAdaptor adaptor,
// TODO: Currently, only the even number of elements loading is supported.		// TODO: Currently, only the even number of elements loading is supported.
// To deal with the odd number of elements, one has to extract the		// To deal with the odd number of elements, one has to extract the
// subvector at the proper offset after bit-casting.		// subvector at the proper offset after bit-casting.

auto origElements = op.getVectorType().getNumElements();		auto origElements = op.getVectorType().getNumElements();
if (origElements % scale != 0)		if (origElements % scale != 0)
return failure();		return failure();

auto stridedMetadata = rewriter.create<memref::ExtractStridedMetadataOp>(		auto stridedMetadata =
loc, adaptor.getBase());		rewriter.create<memref::ExtractStridedMetadataOp>(loc, op.getBase());

auto [reinterpret, linearizedOffset] = memref::getLinearizeMemRefAndOffset(		OpFoldResult linearizedIndices;
loc, sourceType, srcBits, dstBits, adaptor.getIndices(),		std::tie(std::ignore, linearizedIndices) =
stridedMetadata, rewriter);		memref::getLinearizedMemRefOffsetAndSize(
		rewriter, loc, srcBits, dstBits,
		stridedMetadata.getConstifiedMixedOffset(),
		stridedMetadata.getConstifiedMixedSizes(),
		stridedMetadata.getConstifiedMixedStrides(),
		getAsOpFoldResult(adaptor.getIndices()));

auto srcElementType = sourceType.getElementType();		auto srcElementType = sourceType.getElementType();
auto numElements =		auto numElements =
static_cast<int>(std::ceil(static_cast<double>(origElements) / scale));		static_cast<int>(std::ceil(static_cast<double>(origElements) / scale));
auto newLoad = rewriter.create<vector::LoadOp>(		auto newLoad = rewriter.create<vector::LoadOp>(
loc, VectorType::get(numElements, srcElementType), reinterpret,		loc, VectorType::get(numElements, srcElementType), adaptor.getBase(),
linearizedOffset);		getValueOrCreateConstantIndexOp(rewriter, loc, linearizedIndices));

		hanchung: I don't follow this. I thought that we should do vector.load on a linearized base pointer (i.e., `void*`) with a linearized index?
		mravishankar (author): Yes, it is using the `linearizedIndices` to do the `vector.load`. Not sure I follow the question.
		hanchung: I think I follow the logic now... My question was why it is `adaptor.getBase()`, but not something related to `stridedMetadata.getBaseBuffer()`. It looks like you assume that all the sources should be flattened to 1D memref in the pass as well?
		mravishankar (author): This is the type of the operand after the producer has been modified. The `TypeConverter` ensures that it is a linearized type (see the associated change in the type converter). So at this point the adaptor already has a linearized memref. The base buffer of the strided metadata does not have the offset of the memref included. We should be using the base + offset + linearized indices. The `adaptor.getBase()` is already at base + offset.
numElements *= scale;		numElements *= scale;
auto castType = VectorType::get(numElements, oldElementType);		auto castType = VectorType::get(numElements, oldElementType);
auto bitCast = rewriter.create<vector::BitCastOp>(loc, castType, newLoad);		auto bitCast = rewriter.create<vector::BitCastOp>(loc, castType, newLoad);

rewriter.replaceOp(op, bitCast->getResult(0));		rewriter.replaceOp(op, bitCast->getResult(0));
return success();		return success();
}		}
};		};
Show All 13 Lines
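
For readers following the vector load pattern above, the element-count logic can be sketched in plain C++. This is only an illustration under assumptions: `emulatedVectorLength`, `narrowBits` and `wideBits` are hypothetical names that do not appear in the patch, and `scale` is assumed to be the ratio of the wide to the narrow bitwidth, since its definition sits in the elided lines.

```cpp
#include <cassert>
#include <optional>

// Hypothetical helper, not part of MLIR: returns how many wide elements the
// emulated vector.load would read for `origElements` narrow elements, or
// std::nullopt where the pattern above would return failure().
std::optional<int> emulatedVectorLength(int origElements, int narrowBits,
                                        int wideBits) {
  assert(narrowBits > 0 && wideBits % narrowBits == 0);
  // Assumption: `scale` is the number of narrow elements per wide element.
  int scale = wideBits / narrowBits;
  // Mirrors `if (origElements % scale != 0) return failure();`.
  if (origElements % scale != 0)
    return std::nullopt;
  // Ceiling division, as in the std::ceil expression of the pattern; it is
  // exact here because of the divisibility check above.
  return (origElements + scale - 1) / scale;
}
```

With these assumptions, a `vector<8xi4>` load maps to 4 elements of i8 or 1 element of i32, which agrees with the `vector<4xi8>` and `vector<1xi32>` types checked in vector-emulate-narrow-type.mlir.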

mlir/test/Dialect/MemRef/emulate-narrow-type-diff-load-compute.mlir

This file was deleted.

	// RUN: mlir-opt --test-emulate-narrow-int="arith-compute-bitwidth=4 memref-load-bitwidth=8" %s | FileCheck %s

	// CHECK-DAG: #[[$MAP0:.*]] = affine_map<()[s0, s1] -> ((s0 * s1) floordiv 2)>
	// CHECK-DAG: #[[$MAP1:.*]] = affine_map<()[s0] -> (s0 floordiv 2)>
	// CHECK-DAG: #[[$MAP2:.*]] = affine_map<()[s0, s1, s2, s3] -> ((s0 * s1 + s2 * s3) floordiv 2)>
	// CHECK-DAG: #[[$MAP3:.*]] = affine_map<()[s0, s1] -> (s0 * s1)>

	// Expect no conversions, i32 is supported.
	// CHECK-LABEL: func @memref_i32
	// CHECK: [[M:%.+]] = memref.alloc() : memref<4xi32, 1>
	// CHECK-NEXT: [[V:%.+]] = memref.load [[M]][{{%.+}}] : memref<4xi32, 1>
	// CHECK-NEXT: memref.store {{%.+}}, [[M]][{{%.+}}] : memref<4xi32, 1>
	// CHECK-NEXT: return
	func.func @memref_i32() {
	%c0 = arith.constant 0 : index
	%c1 = arith.constant 1 : i32
	%m = memref.alloc() : memref<4xi32, 1>
	%v = memref.load %m[%c0] : memref<4xi32, 1>
	memref.store %c1, %m[%c0] : memref<4xi32, 1>
	return
	}

	// -----

	// Expect no conversions, f32 is not an integer type.
	// CHECK-LABEL: func @memref_f32
	// CHECK: [[M:%.+]] = memref.alloc() : memref<4xf32, 1>
	// CHECK-NEXT: [[V:%.+]] = memref.load [[M]][{{%.+}}] : memref<4xf32, 1>
	// CHECK-NEXT: memref.store {{%.+}}, [[M]][{{%.+}}] : memref<4xf32, 1>
	// CHECK-NEXT: return
	func.func @memref_f32() {
	%c0 = arith.constant 0 : index
	%c1 = arith.constant 1.0 : f32
	%m = memref.alloc() : memref<4xf32, 1>
	%v = memref.load %m[%c0] : memref<4xf32, 1>
	memref.store %c1, %m[%c0] : memref<4xf32, 1>
	return
	}

	// -----

	// CHECK-LABEL: func @memref_load_i4_zero_rank
	// CHECK-NEXT: %[[M:.*]] = memref.alloc() : memref<i8>
	// CHECK-NEXT: %[[BASE:.*]], %[[OFFSET:.*]] = memref.extract_strided_metadata %[[M]] : memref<i8> -> memref<i8>, index
	// CHECK-NEXT: %[[LOAD:.*]] = memref.load %[[M]][] : memref<i8>
	// CHECK-NEXT: %[[I:.*]] = arith.index_castui %[[OFFSET]] : index to i8
	// CHECK-NEXT: %[[C2:.*]] = arith.constant 2 : i8
	// CHECK-NEXT: %[[C4:.*]] = arith.constant 4 : i8
	// CHECK-NEXT: %[[REM:.*]] = arith.remui %[[I]], %[[C2]] : i8
	// CHECK-NEXT: %[[STEP:.*]] = arith.muli %[[REM]], %[[C4]] : i8
	// CHECK-NEXT: %[[SHIFT:.*]] = arith.shrsi %[[LOAD]], %[[STEP]] : i8
	// CHECK-NEXT: %[[RES:.*]] = arith.trunci %[[SHIFT]] : i8 to i4
	// CHECK-NEXT: return
	func.func @memref_load_i4_zero_rank() {
	%0 = memref.alloc() : memref<i4>
	%1 = memref.load %0[] : memref<i4>
	return
	}

	// -----

	// CHECK-LABEL: func @memref_load_i4
	// CHECK-SAME: (%[[ARG:.*]]: index)
	// CHECK-NEXT: %[[M:.*]] = memref.alloc() : memref<4xi8>
	// CHECK-NEXT: %[[BASE:.*]], %[[OFFSET:.*]], %[[SIZES:.*]], %[[STRIDES:.*]] = memref.extract_strided_metadata %[[M]] : memref<4xi8> -> memref<i8>, index, index, index
	// CHECK-NEXT: %[[INDEX:.*]] = affine.apply #[[$MAP0]]()[%[[ARG]], %[[STRIDES]]]
	// CHECK-NEXT: %[[AOFF:.*]] = affine.apply #[[$MAP1]]()[%[[OFFSET]]]
	// CHECK-NEXT: %[[CAST:.*]] = memref.reinterpret_cast %[[BASE]] to offset: [%[[AOFF]]], sizes: [%[[SIZES]]], strides: [%[[STRIDES]]] : memref<i8> to memref<4xi8, strided<[1], offset: ?>>
	// CHECK-NEXT: %[[LOAD:.*]] = memref.load %[[CAST]][%[[INDEX]]] : memref<4xi8, strided<[1], offset: ?>>
	// CHECK-NEXT: %[[I:.*]] = arith.index_castui %[[ARG]] : index to i8
	// CHECK-NEXT: %[[C2:.*]] = arith.constant 2 : i8
	// CHECK-NEXT: %[[C4:.*]] = arith.constant 4 : i8
	// CHECK-NEXT: %[[REM:.*]] = arith.remui %[[I]], %[[C2]] : i8
	// CHECK-NEXT: %[[STEP:.*]] = arith.muli %[[REM]], %[[C4]] : i8
	// CHECK-NEXT: %[[SHIFT:.*]] = arith.shrsi %[[LOAD]], %[[STEP]] : i8
	// CHECK-NEXT: %[[RES:.*]] = arith.trunci %[[SHIFT]] : i8 to i4
	// CHECK-NEXT: return
	func.func @memref_load_i4(%arg0: index) {
	%0 = memref.alloc() : memref<4xi4>
	%1 = memref.load %0[%arg0] : memref<4xi4>
	return
	}

	// -----

	// CHECK-LABEL: func @memref_load_i4_rank2
	// CHECK-SAME: (%[[ARG:.*]]: memref<4x128xi8>, %[[ARG0:.*]]: index, %[[ARG1:.*]]: index)
	// CHECK-NEXT: memref.assume_alignment %[[ARG]], 64 : memref<4x128xi8>
	// CHECK-NEXT: %[[BASE:.*]], %[[OFFSET:.*]], %[[SIZES:.*]]:2, %[[STRIDES:.*]]:2 = memref.extract_strided_metadata %[[ARG]] : memref<4x128xi8> -> memref<i8>, index, index, index, index, index
	// CHECK-NEXT: %[[INDEX:.*]] = affine.apply #[[$MAP2]]()[%[[ARG0]], %[[STRIDES]]#0, %[[ARG1]], %[[STRIDES]]#1]
	// CHECK-NEXT: %[[LSIZE:.*]] = affine.apply #[[$MAP3]]()[%[[SIZES]]#0, %[[SIZES]]#1]
	// CHECK-NEXT: %[[AOFF:.*]] = affine.apply #[[$MAP1]]()[%[[OFFSET]]]
	// CHECK-NEXT: %[[CAST:.*]] = memref.reinterpret_cast %[[BASE]] to offset: [%[[AOFF]]], sizes: [%[[LSIZE]]], strides: [%[[STRIDES]]#1] : memref<i8> to memref<512xi8, strided<[1], offset: ?>>
	// CHECK-NEXT: %[[LOAD:.*]] = memref.load %[[CAST]][%[[INDEX]]] : memref<512xi8, strided<[1], offset: ?>>
	// CHECK-NEXT: %[[I:.*]] = arith.index_castui %[[ARG1]] : index to i8
	// CHECK-NEXT: %[[C2:.*]] = arith.constant 2 : i8
	// CHECK-NEXT: %[[C4:.*]] = arith.constant 4 : i8
	// CHECK-NEXT: %[[REM:.*]] = arith.remui %[[I]], %[[C2]] : i8
	// CHECK-NEXT: %[[STEP:.*]] = arith.muli %[[REM]], %[[C4]] : i8
	// CHECK-NEXT: %[[SHIFT:.*]] = arith.shrsi %[[LOAD]], %[[STEP]] : i8
	// CHECK-NEXT: %[[RES:.*]] = arith.trunci %[[SHIFT]] : i8 to i4
	// CHECK-NEXT: return
	func.func @memref_load_i4_rank2(%0: memref<4x128xi4>, %arg0: index, %arg1: index) {
	memref.assume_alignment %0, 64 : memref<4x128xi4>
	%1 = memref.load %0[%arg0,%arg1] : memref<4x128xi4>
	return
	}

mlir/test/Dialect/MemRef/emulate-narrow-type-same-load-compute.mlir

This file was deleted.

	// RUN: mlir-opt --test-emulate-narrow-int="arith-compute-bitwidth=8 memref-load-bitwidth=8" %s | FileCheck %s

	// CHECK-DAG: #[[$MAP0:.*]] = affine_map<()[s0, s1] -> ((s0 * s1) floordiv 2)>
	// CHECK-DAG: #[[$MAP1:.*]] = affine_map<()[s0] -> (s0 floordiv 2)>
	// CHECK-DAG: #[[$MAP2:.*]] = affine_map<()[s0, s1, s2, s3] -> ((s0 * s1 + s2 * s3) floordiv 2)>
	// CHECK-DAG: #[[$MAP3:.*]] = affine_map<()[s0, s1] -> (s0 * s1)>

	// Expect no conversions.
	// CHECK-LABEL: func @memref_i8
	// CHECK: [[M:%.+]] = memref.alloc() : memref<4xi8, 1>
	// CHECK-NEXT: [[V:%.+]] = memref.load [[M]][{{%.+}}] : memref<4xi8, 1>
	// CHECK-NEXT: memref.store {{%.+}}, [[M]][{{%.+}}] : memref<4xi8, 1>
	// CHECK-NEXT: return
	func.func @memref_i8() {
	%c0 = arith.constant 0 : index
	%c1 = arith.constant 1 : i8
	%m = memref.alloc() : memref<4xi8, 1>
	%v = memref.load %m[%c0] : memref<4xi8, 1>
	memref.store %c1, %m[%c0] : memref<4xi8, 1>
	return
	}

	// -----

	// CHECK-LABEL: func @memref_load_i4
	// CHECK-SAME: (%[[ARG:.*]]: index)
	// CHECK-NEXT: %[[M:.*]] = memref.alloc() : memref<4xi8>
	// CHECK-NEXT: %[[BASE:.*]], %[[OFFSET:.*]], %[[SIZES:.*]], %[[STRIDES:.*]] = memref.extract_strided_metadata %[[M]] : memref<4xi8> -> memref<i8>, index, index, index
	// CHECK-NEXT: %[[INDEX:.*]] = affine.apply #[[$MAP0]]()[%[[ARG]], %[[STRIDES]]]
	// CHECK-NEXT: %[[AOFF:.*]] = affine.apply #[[$MAP1]]()[%[[OFFSET]]]
	// CHECK-NEXT: %[[CAST:.*]] = memref.reinterpret_cast %[[BASE]] to offset: [%[[AOFF]]], sizes: [%[[SIZES]]], strides: [%[[STRIDES]]] : memref<i8> to memref<4xi8, strided<[1], offset: ?>>
	// CHECK-NEXT: %[[LOAD:.*]] = memref.load %[[CAST]][%[[INDEX]]] : memref<4xi8, strided<[1], offset: ?>>
	// CHECK-NEXT: %[[I:.*]] = arith.index_castui %[[ARG]] : index to i8
	// CHECK-NEXT: %[[C2:.*]] = arith.constant 2 : i8
	// CHECK-NEXT: %[[C4:.*]] = arith.constant 4 : i8
	// CHECK-NEXT: %[[REM:.*]] = arith.remui %[[I]], %[[C2]] : i8
	// CHECK-NEXT: %[[STEP:.*]] = arith.muli %[[REM]], %[[C4]] : i8
	// CHECK-NEXT: %[[SHIFT:.*]] = arith.shrsi %[[LOAD]], %[[STEP]] : i8
	// CHECK-NEXT: %[[MASK:.*]] = arith.constant 15 : i8
	// CHECK-NEXT: %[[RES:.*]] = arith.andi %[[SHIFT]], %[[MASK]] : i8
	// CHECK-NEXT: return
	func.func @memref_load_i4(%arg0: index) {
	%0 = memref.alloc() : memref<4xi4>
	%1 = memref.load %0[%arg0] : memref<4xi4>
	return
	}

	// -----

	// CHECK-LABEL: func @memref_load_i4_rank2
	// CHECK-SAME: (%[[ARG:.*]]: memref<4x128xi8>, %[[ARG0:.*]]: index, %[[ARG1:.*]]: index)
	// CHECK-NEXT: memref.assume_alignment %[[ARG]], 64 : memref<4x128xi8>
	// CHECK-NEXT: %[[BASE:.*]], %[[OFFSET:.*]], %[[SIZES:.*]]:2, %[[STRIDES:.*]]:2 = memref.extract_strided_metadata %[[ARG]] : memref<4x128xi8> -> memref<i8>, index, index, index, index, index
	// CHECK-NEXT: %[[INDEX:.*]] = affine.apply #[[$MAP2]]()[%[[ARG0]], %[[STRIDES]]#0, %[[ARG1]], %[[STRIDES]]#1]
	// CHECK-NEXT: %[[LSIZE:.*]] = affine.apply #[[$MAP3]]()[%[[SIZES]]#0, %[[SIZES]]#1]
	// CHECK-NEXT: %[[AOFF:.*]] = affine.apply #[[$MAP1]]()[%[[OFFSET]]]
	// CHECK-NEXT: %[[CAST:.*]] = memref.reinterpret_cast %[[BASE]] to offset: [%[[AOFF]]], sizes: [%[[LSIZE]]], strides: [%[[STRIDES]]#1] : memref<i8> to memref<512xi8, strided<[1], offset: ?>>
	// CHECK-NEXT: %[[LOAD:.*]] = memref.load %[[CAST]][%[[INDEX]]] : memref<512xi8, strided<[1], offset: ?>>
	// CHECK-NEXT: %[[I:.*]] = arith.index_castui %[[ARG1]] : index to i8
	// CHECK-NEXT: %[[C2:.*]] = arith.constant 2 : i8
	// CHECK-NEXT: %[[C4:.*]] = arith.constant 4 : i8
	// CHECK-NEXT: %[[REM:.*]] = arith.remui %[[I]], %[[C2]] : i8
	// CHECK-NEXT: %[[STEP:.*]] = arith.muli %[[REM]], %[[C4]] : i8
	// CHECK-NEXT: %[[SHIFT:.*]] = arith.shrsi %[[LOAD]], %[[STEP]] : i8
	// CHECK-NEXT: %[[MASK:.*]] = arith.constant 15 : i8
	// CHECK-NEXT: %[[RES:.*]] = arith.andi %[[SHIFT]], %[[MASK]] : i8
	// CHECK-NEXT: return
	func.func @memref_load_i4_rank2(%0: memref<4x128xi4>, %arg0: index, %arg1: index) {
	memref.assume_alignment %0, 64 : memref<4x128xi4>
	%1 = memref.load %0[%arg0,%arg1] : memref<4x128xi4>
	return
	}

mlir/test/Dialect/MemRef/emulate-narrow-type.mlir

This file was added.

				// RUN: mlir-opt --test-emulate-narrow-int="memref-load-bitwidth=8" --cse --split-input-file %s | FileCheck %s
				// RUN: mlir-opt --test-emulate-narrow-int="memref-load-bitwidth=32" --cse --split-input-file %s | FileCheck %s --check-prefix=CHECK32

				// Expect no conversions.
				func.func @memref_i8() -> i8 {
				%c3 = arith.constant 3 : index
				%m = memref.alloc() : memref<4xi8, 1>
				%v = memref.load %m[%c3] : memref<4xi8, 1>
				return %v : i8
				}
				// CHECK-LABEL: func @memref_i8()
				// CHECK: %[[M:.+]] = memref.alloc() : memref<4xi8, 1>
				// CHECK-NEXT: %[[V:.+]] = memref.load %[[M]][%{{.+}}] : memref<4xi8, 1>
				// CHECK-NEXT: return %[[V]]

				// CHECK32-LABEL: func @memref_i8()
				// CHECK32: %[[M:.+]] = memref.alloc() : memref<1xi32, 1>
				// CHECK32: %[[C0:.+]] = arith.constant 0 : index
				// CHECK32: %[[V:.+]] = memref.load %[[M]][%[[C0]]] : memref<1xi32, 1>
				// CHECK32: %[[C24:.+]] = arith.constant 24 : index
				// CHECK32: %[[CAST:.+]] = arith.index_cast %[[C24]] : index to i32
				// CHECK32: %[[SHIFTRT:.+]] = arith.shrsi %[[V]], %[[CAST]]
				// CHECK32: %[[TRUNC:.+]] = arith.trunci %[[SHIFTRT]] : i32 to i8
				// CHECK32-NEXT: return %[[TRUNC]]

				// -----

				func.func @memref_load_i4(%arg0: index) -> i4 {
				%0 = memref.alloc() : memref<5xi4>
				%1 = memref.load %0[%arg0] : memref<5xi4>
				return %1 : i4
				}
				// CHECK-DAG: #[[MAP0:.+]] = affine_map<()[s0] -> (s0 floordiv 2)>
				// CHECK-DAG: #[[MAP1:.+]] = affine_map<()[s0] -> (s0 * 4 - (s0 floordiv 2) * 8)
				// CHECK: func @memref_load_i4(
				// CHECK-SAME: %[[ARG0:.+]]: index
				// CHECK: %[[ALLOC:.+]] = memref.alloc() : memref<3xi8>
				// CHECK: %[[INDEX:.+]] = affine.apply #[[MAP0]]()[%[[ARG0]]]
				// CHECK: %[[LOADVAL:.+]] = memref.load %[[ALLOC]][%[[INDEX]]]
				// CHECK: %[[BITOFFSET:.+]] = affine.apply #[[MAP1]]()[%[[ARG0]]]
				// CHECK: %[[CAST:.+]] = arith.index_cast %[[BITOFFSET]] : index to i8
				// CHECK: %[[SHIFTRT:.+]] = arith.shrsi %[[LOADVAL]], %[[CAST]]
				// CHECK: %[[TRUNC:.+]] = arith.trunci %[[SHIFTRT]] : i8 to i4
				// CHECK: return %[[TRUNC]]

				// CHECK32-DAG: #[[MAP0:.+]] = affine_map<()[s0] -> (s0 floordiv 8)>
				// CHECK32-DAG: #[[MAP1:.+]] = affine_map<()[s0] -> (s0 * 4 - (s0 floordiv 8) * 32)
				// CHECK32: func @memref_load_i4(
				// CHECK32-SAME: %[[ARG0:.+]]: index
				// CHECK32: %[[ALLOC:.+]] = memref.alloc() : memref<1xi32>
				// CHECK32: %[[INDEX:.+]] = affine.apply #[[MAP0]]()[%[[ARG0]]]
				// CHECK32: %[[LOADVAL:.+]] = memref.load %[[ALLOC]][%[[INDEX]]]
				// CHECK32: %[[BITOFFSET:.+]] = affine.apply #[[MAP1]]()[%[[ARG0]]]
				// CHECK32: %[[CAST:.+]] = arith.index_cast %[[BITOFFSET]] : index to i32
				// CHECK32: %[[SHIFTRT:.+]] = arith.shrsi %[[LOADVAL]], %[[CAST]]
				// CHECK32: %[[TRUNC:.+]] = arith.trunci %[[SHIFTRT]] : i32 to i4
				// CHECK32: return %[[TRUNC]]

				// -----

				func.func @memref_load_i4_rank2(%arg0: index, %arg1: index) -> i4 {
				%0 = memref.alloc() : memref<3x125xi4>
				memref.assume_alignment %0, 64 : memref<3x125xi4>
				%1 = memref.load %0[%arg0,%arg1] : memref<3x125xi4>
				return %1 : i4
				}
				// CHECK-DAG: #[[MAP0:.+]] = affine_map<()[s0, s1] -> ((s0 * 125 + s1) floordiv 2)>
				// CHECK-DAG: #[[MAP1:.+]] = affine_map<()[s0, s1] -> (s0 * 500 + s1 * 4 - ((s0 * 125 + s1) floordiv 2) * 8)
				// CHECK: func @memref_load_i4_rank2(
				// CHECK-SAME: %[[ARG0:[a-zA-Z0-9_]+]]: index
				// CHECK-SAME: %[[ARG1:[a-zA-Z0-9_]+]]: index
				// CHECK: %[[ALLOC:.+]] = memref.alloc() : memref<188xi8>
				// CHECK: memref.assume_alignment %[[ALLOC]], 64 : memref<188xi8>
				// CHECK: %[[INDEX:.+]] = affine.apply #[[MAP0]]()[%[[ARG0]], %[[ARG1]]]
				// CHECK: %[[LOAD:.+]] = memref.load %[[ALLOC]][%[[INDEX]]]
				// CHECK: %[[BITOFFSET:.+]] = affine.apply #[[MAP1]]()[%[[ARG0]], %[[ARG1]]]
				// CHECK: %[[CAST:.+]] = arith.index_cast %[[BITOFFSET]] : index to i8
				// CHECK: %[[SHIFTRT:.+]] = arith.shrsi %[[LOAD]], %[[CAST]]
				// CHECK: %[[TRUNC:.+]] = arith.trunci %[[SHIFTRT]] : i8 to i4
				// CHECK: return %[[TRUNC]]

				// CHECK32-DAG: #[[MAP0:.+]] = affine_map<()[s0, s1] -> ((s0 * 125 + s1) floordiv 8)>
				// CHECK32-DAG: #[[MAP1:.+]] = affine_map<()[s0, s1] -> (s0 * 500 + s1 * 4 - ((s0 * 125 + s1) floordiv 8) * 32)
				// CHECK32: func @memref_load_i4_rank2(
				// CHECK32-SAME: %[[ARG0:[a-zA-Z0-9_]+]]: index
				// CHECK32-SAME: %[[ARG1:[a-zA-Z0-9_]+]]: index
				// CHECK32: %[[ALLOC:.+]] = memref.alloc() : memref<47xi32>
				// CHECK32: memref.assume_alignment %[[ALLOC]], 64 : memref<47xi32>
				// CHECK32: %[[INDEX:.+]] = affine.apply #[[MAP0]]()[%[[ARG0]], %[[ARG1]]]
				// CHECK32: %[[LOAD:.+]] = memref.load %[[ALLOC]][%[[INDEX]]]
				// CHECK32: %[[BITOFFSET:.+]] = affine.apply #[[MAP1]]()[%[[ARG0]], %[[ARG1]]]
				// CHECK32: %[[CAST:.+]] = arith.index_cast %[[BITOFFSET]] : index to i32
				// CHECK32: %[[SHIFTRT:.+]] = arith.shrsi %[[LOAD]], %[[CAST]]
				// CHECK32: %[[TRUNC:.+]] = arith.trunci %[[SHIFTRT]] : i32 to i4
				// CHECK32: return %[[TRUNC]]

				// -----

				func.func @memref_load_i4_dynamic(%arg0: index, %arg1 : index, %arg2 : index, %arg3 : index) -> i4 {
				%0 = memref.alloc(%arg0, %arg1) : memref<?x?xi4>
				%1 = memref.load %0[%arg2, %arg3] : memref<?x?xi4>
				return %1 : i4
				}
				// CHECK-DAG: #[[MAP0:.+]] = affine_map<()[s0, s1] -> ((s0 * s1) floordiv 2)>
				// CHECK-DAG: #[[MAP1:.+]] = affine_map<()[s0, s1, s2] -> ((s2 + s0 * s1) floordiv 2)>
				// CHECK-DAG: #[[MAP2:.+]] = affine_map<()[s0, s1, s2] -> ((s0 * s1) * 4 + s2 * 4 - ((s2 + s0 * s1) floordiv 2) * 8)>
				// CHECK: func @memref_load_i4_dynamic(
				// CHECK-SAME: %[[ARG0:[a-zA-Z0-9]+]]: index
				// CHECK-SAME: %[[ARG1:[a-zA-Z0-9]+]]: index
				// CHECK-SAME: %[[ARG2:[a-zA-Z0-9]+]]: index
				// CHECK-SAME: %[[ARG3:[a-zA-Z0-9]+]]: index
				// CHECK: %[[SIZE:.+]] = affine.apply #[[MAP0]]()[%[[ARG0]], %[[ARG1]]]
				// CHECK: %[[ALLOC:.+]] = memref.alloc(%[[SIZE]])
				// CHECK: %[[INDEX:.+]] = affine.apply #[[MAP1]]()[%[[ARG2]], %[[ARG1]], %[[ARG3]]]
				// CHECK: %[[LOAD:.+]] = memref.load %[[ALLOC]][%[[INDEX]]]
				// CHECK: %[[BITOFFSET:.+]] = affine.apply #[[MAP2]]()[%[[ARG2]], %[[ARG1]], %[[ARG3]]]
				// CHECK: %[[CAST:.+]] = arith.index_cast %[[BITOFFSET]] : index to i8
				// CHECK: %[[SHIFTRT:.+]] = arith.shrsi %[[LOAD]], %[[CAST]]
				// CHECK: %[[TRUNC:.+]] = arith.trunci %[[SHIFTRT]] : i8 to i4
				// CHECK: return %[[TRUNC]]

				// CHECK32-DAG: #[[MAP0:.+]] = affine_map<()[s0, s1] -> ((s0 * s1) floordiv 8)>
				// CHECK32-DAG: #[[MAP1:.+]] = affine_map<()[s0, s1, s2] -> ((s2 + s0 * s1) floordiv 8)>
				// CHECK32-DAG: #[[MAP2:.+]] = affine_map<()[s0, s1, s2] -> ((s0 * s1) * 4 + s2 * 4 - ((s2 + s0 * s1) floordiv 8) * 32)>
				// CHECK32: func @memref_load_i4_dynamic(
				// CHECK32-SAME: %[[ARG0:[a-zA-Z0-9]+]]: index
				// CHECK32-SAME: %[[ARG1:[a-zA-Z0-9]+]]: index
				// CHECK32-SAME: %[[ARG2:[a-zA-Z0-9]+]]: index
				// CHECK32-SAME: %[[ARG3:[a-zA-Z0-9]+]]: index
				// CHECK32: %[[SIZE:.+]] = affine.apply #[[MAP0]]()[%[[ARG0]], %[[ARG1]]]
				// CHECK32: %[[ALLOC:.+]] = memref.alloc(%[[SIZE]])
				// CHECK32: %[[INDEX:.+]] = affine.apply #[[MAP1]]()[%[[ARG2]], %[[ARG1]], %[[ARG3]]]
				// CHECK32: %[[LOAD:.+]] = memref.load %[[ALLOC]][%[[INDEX]]]
				// CHECK32: %[[BITOFFSET:.+]] = affine.apply #[[MAP2]]()[%[[ARG2]], %[[ARG1]], %[[ARG3]]]
				// CHECK32: %[[CAST:.+]] = arith.index_cast %[[BITOFFSET]] : index to i32
				// CHECK32: %[[SHIFTRT:.+]] = arith.shrsi %[[LOAD]], %[[CAST]]
				// CHECK32: %[[TRUNC:.+]] = arith.trunci %[[SHIFTRT]] : i32 to i4
				// CHECK32: return %[[TRUNC]]

				// -----

				func.func @rank_zero_memref() -> i4 {
				%0 = memref.alloc() : memref<i4>
				%1 = memref.load %0[] : memref<i4>
				return %1 : i4
				}
				// CHECK-LABEL: func @rank_zero_memref()
				// CHECK: %[[ALLOC:.+]] = memref.alloc() : memref<i8>
				// CHECK: %[[LOAD:.+]] = memref.load %[[ALLOC]][] : memref<i8>
				// CHECK: %[[TRUNC:.+]] = arith.trunci %[[LOAD]] : i8 to i4
				// CHECK: return %[[TRUNC]]

				// CHECK32-LABEL: func @rank_zero_memref()
				// CHECK32: %[[ALLOC:.+]] = memref.alloc() : memref<i32>
				// CHECK32: %[[LOAD:.+]] = memref.load %[[ALLOC]][] : memref<i32>
				// CHECK32: %[[TRUNC:.+]] = arith.trunci %[[LOAD]] : i32 to i4
				// CHECK32: return %[[TRUNC]]
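
The scalar checks in this test follow a load, shift, truncate sequence. As a rough sketch of the same arithmetic for the i4-in-i8 case, the C++ below reproduces the two affine maps checked for `@memref_load_i4`. The name `loadPackedI4` is hypothetical, and the sketch assumes element 0 sits in the low nibble, which is what a bit offset of 0 for index 0 implies.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of MLIR: reads the i4 element at `index` from
// a byte buffer that packs two i4 values per byte (low nibble first, by
// assumption).
std::uint8_t loadPackedI4(const std::vector<std::uint8_t> &packed,
                          std::size_t index) {
  constexpr std::size_t narrowBits = 4;
  constexpr std::size_t wideBits = 8;
  constexpr std::size_t scale = wideBits / narrowBits;
  // Corresponds to the map (s0 floordiv 2).
  std::size_t packedIndex = index / scale;
  // Corresponds to the map (s0 * 4 - (s0 floordiv 2) * 8).
  std::size_t bitOffset = index * narrowBits - packedIndex * wideBits;
  assert(packedIndex < packed.size());
  // Shift the loaded byte right, then keep the low 4 bits. The mask plays the
  // role of arith.trunci to i4; the result is returned as raw bits.
  return static_cast<std::uint8_t>((packed[packedIndex] >> bitOffset) & 0xF);
}
```

For example, the bytes `{0x21, 0x43, 0x05}` would hold the five i4 values 1, 2, 3, 4, 5 under the low-nibble-first assumption, matching the `memref<5xi4>` to `memref<3xi8>` conversion checked above.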

mlir/test/Dialect/Vector/vector-emulate-narrow-type.mlir

	// RUN: mlir-opt --test-emulate-narrow-int="arith-compute-bitwidth=4 memref-load-bitwidth=8" %s | FileCheck %s			// RUN: mlir-opt --test-emulate-narrow-int="memref-load-bitwidth=8" --cse --split-input-file %s | FileCheck %s
				// RUN: mlir-opt --test-emulate-narrow-int="memref-load-bitwidth=32" --cse --split-input-file %s | FileCheck %s --check-prefix=CHECK32
	// CHECK-DAG: #[[$MAP1:.*]] = affine_map<()[s0] -> (s0 floordiv 2)>
	// CHECK-DAG: #[[$MAP2:.*]] = affine_map<()[s0, s1, s2, s3] -> ((s0 * s1 + s2 * s3) floordiv 2)>
	// CHECK-DAG: #[[$MAP3:.*]] = affine_map<()[s0, s1] -> (s0 * s1)>

				func.func @vector_load_i8(%arg1: index, %arg2: index) -> vector<4xi8> {
				%0 = memref.alloc() : memref<3x4xi8>
				%1 = vector.load %0[%arg1, %arg2] : memref<3x4xi8>, vector<4xi8>
				return %1 : vector<4xi8>
				}
	// Expect no conversions, i8 is supported.			// Expect no conversions, i8 is supported.
	// CHECK-LABEL: func @vector_load_i8			// CHECK: func @vector_load_i8(
	// CHECK-SAME: (%[[ARG:.*]]: memref<3x4xi8>, %[[IDX0:.*]]: index, %[[IDX1:.*]]: index)			// CHECK-SAME: %[[ARG0:[a-zA-Z0-9]+]]: index, %[[ARG1:[a-zA-Z0-9]+]]: index)
	// CHECK-NEXT: [[L:%.+]] = vector.load %[[ARG]][%[[IDX0]], %[[IDX1]]] : memref<3x4xi8>, vector<4xi8>			// CHECK-NEXT: %[[ALLOC:.+]] = memref.alloc() : memref<3x4xi8>
				// CHECK-NEXT: [[L:%.+]] = vector.load %[[ALLOC]][%[[ARG0]], %[[ARG1]]] : memref<3x4xi8>, vector<4xi8>
	// CHECK-NEXT: return			// CHECK-NEXT: return
	func.func @vector_load_i8(%arg0: memref<3x4xi8>, %arg1: index, %arg2: index) {
	%0 = vector.load %arg0[%arg1, %arg2] : memref<3x4xi8>, vector<4xi8>			// CHECK32: #[[MAP:.+]] = affine_map<()[s0, s1] -> (s0 + s1 floordiv 4)>
	return			// CHECK32: func @vector_load_i8(
				// CHECK32-SAME: %[[ARG0:[a-zA-Z0-9]+]]: index, %[[ARG1:[a-zA-Z0-9]+]]: index)
				// CHECK32: %[[ALLOC:.+]] = memref.alloc() : memref<3xi32>
				// CHECK32: %[[INDEX:.+]] = affine.apply #[[MAP]]()[%[[ARG0]], %[[ARG1]]]
				// CHECK32: %[[VECLOAD:.+]] = vector.load %[[ALLOC]][%[[INDEX]]] : memref<3xi32>, vector<1xi32>
				// CHECK32: %[[VEC_I4:.+]] = vector.bitcast %[[VECLOAD]] : vector<1xi32> to vector<4xi8>
				// CHECK32: return %[[VEC_I4]]

				// -----

				func.func @vector_load_i4(%arg1: index, %arg2: index) -> vector<3x8xi4> {
				%0 = memref.alloc() : memref<3x8xi4>
				%cst = arith.constant dense<0> : vector<3x8xi4>
				%1 = vector.load %0[%arg1, %arg2] : memref<3x8xi4>, vector<8xi4>
				%2 = vector.insert %1, %cst [0] : vector<8xi4> into vector<3x8xi4>
				return %2 : vector<3x8xi4>
	}			}
				// CHECK-DAG: #[[MAP:.+]] = affine_map<()[s0, s1] -> (s0 * 4 + s1 floordiv 2)>
				// CHECK: func @vector_load_i4
				// CHECK-SAME: (%[[ARG0:[a-zA-Z0-9]+]]: index, %[[ARG1:[a-zA-Z0-9]+]]: index)
				// CHECK: %[[ALLOC:.+]] = memref.alloc() : memref<12xi8>
				// CHECK: %[[INDEX:.+]] = affine.apply #[[MAP]]()[%[[ARG0]], %[[ARG1]]]
				// CHECK: %[[VEC:.+]] = vector.load %[[ALLOC]][%[[INDEX]]] : memref<12xi8>, vector<4xi8>
				// CHECK: %[[VEC_I4:.+]] = vector.bitcast %[[VEC]] : vector<4xi8> to vector<8xi4>

				// CHECK32-DAG: #[[MAP:.+]] = affine_map<()[s0, s1] -> (s0 + s1 floordiv 8)>
				// CHECK32: func @vector_load_i4
				// CHECK32-SAME: (%[[ARG0:[a-zA-Z0-9]+]]: index, %[[ARG1:[a-zA-Z0-9]+]]: index)
				// CHECK32: %[[ALLOC:.+]] = memref.alloc() : memref<3xi32>
				// CHECK32: %[[INDEX:.+]] = affine.apply #[[MAP]]()[%[[ARG0]], %[[ARG1]]]
				// CHECK32: %[[VEC:.+]] = vector.load %[[ALLOC]][%[[INDEX]]] : memref<3xi32>, vector<1xi32>
				// CHECK32: %[[VEC_I4:.+]] = vector.bitcast %[[VEC]] : vector<1xi32> to vector<8xi4>

	// -----			// -----

	// CHECK-LABEL: func @vector_load_i4			func.func @vector_load_i4_dynamic(%arg0 : index, %arg1 : index, %arg2 : index, %arg3 : index) -> vector<8xi4> {
	// CHECK-SAME: (%[[ARG:.*]]: memref<3x4xi8>, %[[IDX0:.*]]: index, %[[IDX1:.*]]: index)			%0 = memref.alloc(%arg0, %arg1) : memref<?x?xi4>
	// CHECK-NEXT: %[[CST:.*]] = arith.constant dense<0> : vector<3x4xi4>			%1 = vector.load %0[%arg2, %arg3] : memref<?x?xi4>, vector<8xi4>
	// CHECK-NEXT: %[[BASE:.*]], %[[OFFSET:.*]], %[[SIZES:.*]]:2, %[[STRIDES:.*]]:2 = memref.extract_strided_metadata %[[ARG]] : memref<3x4xi8> -> memref<i8>, index, index, index, index, index			return %1 : vector<8xi4>
	// CHECK-NEXT: %[[INDEX:.*]] = affine.apply #[[$MAP2]]()[%[[IDX0]], %[[STRIDES]]#0, %[[IDX1]], %[[STRIDES]]#1]
	// CHECK-NEXT: %[[LSIZE:.*]] = affine.apply #[[$MAP3]]()[%[[SIZES]]#0, %[[SIZES]]#1]
	// CHECK-NEXT: %[[AOFF:.*]] = affine.apply #[[$MAP1]]()[%[[OFFSET]]]
	// CHECK-NEXT: %[[CAST:.*]] = memref.reinterpret_cast %[[BASE]] to offset: [%[[AOFF]]], sizes: [%[[LSIZE]]], strides: [%[[STRIDES]]#1] : memref<i8> to memref<12xi8, strided<[1], offset: ?>>
	// CHECK-NEXT: %[[LOAD:.*]] = vector.load %[[CAST]][%[[INDEX]]] : memref<12xi8, strided<[1], offset: ?>>, vector<2xi8>
	// CHECK-NEXT: %[[BITCAST:.*]] = vector.bitcast %[[LOAD]] : vector<2xi8> to vector<4xi4>
	// CHECK-NEXT: %[[INSERT:.*]] = vector.insert %[[BITCAST]], %[[CST]] [0] : vector<4xi4> into vector<3x4xi4>
	// CHECK-NEXT: return
	func.func @vector_load_i4(%arg0: memref<3x4xi4>, %arg1: index, %arg2: index) {
	%cst = arith.constant dense<0> : vector<3x4xi4>
	%0 = vector.load %arg0[%arg1, %arg2] : memref<3x4xi4>, vector<4xi4>
	%1 = vector.insert %0, %cst [0] : vector<4xi4> into vector<3x4xi4>
	return
	}			}
				// CHECK-DAG: #[[MAP0:.+]] = affine_map<()[s0, s1] -> ((s0 * s1) floordiv 2)>
				// CHECK-DAG: #[[MAP1:.+]] = affine_map<()[s0, s1, s2] -> ((s2 + s0 * s1) floordiv 2)>
				// CHECK: func.func @vector_load_i4_dynamic(
				// CHECK-SAME: %[[ARG0:[a-zA-Z0-9_]+]]: index
				// CHECK-SAME: %[[ARG1:[a-zA-Z0-9_]+]]: index
				// CHECK-SAME: %[[ARG2:[a-zA-Z0-9_]+]]: index
				// CHECK-SAME: %[[ARG3:[a-zA-Z0-9_]+]]: index
				// CHECK: %[[SIZE:.+]] = affine.apply #[[MAP0]]()[%[[ARG0]], %[[ARG1]]]
				// CHECK: %[[ALLOC:.+]] = memref.alloc(%[[SIZE]]) : memref<?xi8>
				// CHECK: %[[INDEX:.+]] = affine.apply #[[MAP1]]()[%[[ARG2]], %[[ARG1]], %[[ARG3]]]
				// CHECK: %[[VEC:.+]] = vector.load %[[ALLOC]][%[[INDEX]]] : memref<?xi8>, vector<4xi8>
				// CHECK: %[[VEC_I4:.+]] = vector.bitcast %[[VEC]] : vector<4xi8> to vector<8xi4>

				// CHECK32-DAG: #[[MAP0:.+]] = affine_map<()[s0, s1] -> ((s0 * s1) floordiv 8)>
				// CHECK32-DAG: #[[MAP1:.+]] = affine_map<()[s0, s1, s2] -> ((s2 + s0 * s1) floordiv 8)>
				// CHECK32: func.func @vector_load_i4_dynamic(
				// CHECK32-SAME: %[[ARG0:[a-zA-Z0-9_]+]]: index
				// CHECK32-SAME: %[[ARG1:[a-zA-Z0-9_]+]]: index
				// CHECK32-SAME: %[[ARG2:[a-zA-Z0-9_]+]]: index
				// CHECK32-SAME: %[[ARG3:[a-zA-Z0-9_]+]]: index
				// CHECK32: %[[SIZE:.+]] = affine.apply #[[MAP0]]()[%[[ARG0]], %[[ARG1]]]
				// CHECK32: %[[ALLOC:.+]] = memref.alloc(%[[SIZE]]) : memref<?xi32>
				// CHECK32: %[[INDEX:.+]] = affine.apply #[[MAP1]]()[%[[ARG2]], %[[ARG1]], %[[ARG3]]]
				// CHECK32: %[[VEC:.+]] = vector.load %[[ALLOC]][%[[INDEX]]] : memref<?xi32>, vector<1xi32>
				// CHECK32: %[[VEC_I4:.+]] = vector.bitcast %[[VEC]] : vector<1xi32> to vector<8xi4>
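
The static-shape checks in these tests can be cross-checked with a small C++ sketch of the flattening arithmetic. Both helpers are hypothetical and only model what the expected output suggests for contiguous row-major static shapes: the packed buffer length rounds up to a whole number of wide elements, and the packed index is the row-major linear index divided by the number of narrow elements per wide element. Dynamic shapes and non-zero offsets are not modeled.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: number of wide elements needed to hold `numElements`
// narrow elements, rounding up (assumption drawn from the static test shapes).
std::int64_t packedElementCount(std::int64_t numElements, int narrowBits,
                                int wideBits) {
  assert(narrowBits > 0 && wideBits > 0);
  std::int64_t totalBits = numElements * narrowBits;
  return (totalBits + wideBits - 1) / wideBits;
}

// Hypothetical helper: index into the packed 1-D buffer for element
// [row, col] of a contiguous row-major 2-D memref with `numCols` columns.
std::int64_t packedIndex(std::int64_t row, std::int64_t col,
                         std::int64_t numCols, int narrowBits, int wideBits) {
  assert(narrowBits > 0 && wideBits % narrowBits == 0);
  std::int64_t scale = wideBits / narrowBits;
  // Row-major linearization followed by the floordiv seen in the affine maps.
  return (row * numCols + col) / scale;
}
```

Under these assumptions, `memref<3x8xi4>` packs into 12 bytes or 3 words of 32 bits, and `memref<3x125xi4>` into 188 bytes or 47 words, in line with the `memref.alloc` types in the CHECK and CHECK32 lines. For `memref<3x8xi4>` the packed index `(row * 8 + col) / 2` equals `row * 4 + col / 2`, the simplified form that appears in the checked map.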

mlir/test/lib/Dialect/MemRef/TestEmulateNarrowType.cpp

Show First 20 Lines • Show All 83 Lines • ▼ Show 20 Lines	target.addDynamicallyLegalOp<func::FuncOp>([&typeConverter](Operation *op) {
return typeConverter.isLegal(cast<func::FuncOp>(op).getFunctionType());		return typeConverter.isLegal(cast<func::FuncOp>(op).getFunctionType());
});		});
auto opLegalCallback = [&typeConverter](Operation *op) {		auto opLegalCallback = [&typeConverter](Operation *op) {
return typeConverter.isLegal(op);		return typeConverter.isLegal(op);
};		};
target.addDynamicallyLegalOp<func::CallOp, func::ReturnOp>(opLegalCallback);		target.addDynamicallyLegalOp<func::CallOp, func::ReturnOp>(opLegalCallback);
target.addDynamicallyLegalDialect<		target.addDynamicallyLegalDialect<
arith::ArithDialect, vector::VectorDialect, memref::MemRefDialect,		arith::ArithDialect, vector::VectorDialect, memref::MemRefDialect,
affine::AffineDialect>(		affine::AffineDialect>(opLegalCallback);
[&typeConverter](Operation *op) { return typeConverter.isLegal(op); });

RewritePatternSet patterns(ctx);		RewritePatternSet patterns(ctx);

arith::populateArithNarrowTypeEmulationPatterns(typeConverter, patterns);		arith::populateArithNarrowTypeEmulationPatterns(typeConverter, patterns);
memref::populateMemRefNarrowTypeEmulationPatterns(typeConverter, patterns);		memref::populateMemRefNarrowTypeEmulationPatterns(typeConverter, patterns);
vector::populateVectorNarrowTypeEmulationPatterns(typeConverter, patterns);		vector::populateVectorNarrowTypeEmulationPatterns(typeConverter, patterns);

if (failed(applyPartialConversion(op, target, std::move(patterns))))		if (failed(applyPartialConversion(op, target, std::move(patterns))))
Show All 19 Lines

This is an archive of the discontinued LLVM Phabricator instance.

[mlir] Revamp implementation of sub-byte load/store emulation.
ClosedPublic

Revision Contents

Diff 551243

mlir/include/mlir/Dialect/MemRef/Utils/MemRefUtils.h

mlir/lib/Dialect/MemRef/Transforms/EmulateNarrowType.cpp

mlir/lib/Dialect/MemRef/Transforms/ExpandStridedMetadata.cpp

mlir/lib/Dialect/MemRef/Utils/MemRefUtils.cpp

mlir/lib/Dialect/Vector/Transforms/VectorEmulateNarrowType.cpp

mlir/test/Dialect/MemRef/emulate-narrow-type-diff-load-compute.mlir

mlir/test/Dialect/MemRef/emulate-narrow-type-same-load-compute.mlir

mlir/test/Dialect/MemRef/emulate-narrow-type.mlir

mlir/test/Dialect/Vector/vector-emulate-narrow-type.mlir

mlir/test/lib/Dialect/MemRef/TestEmulateNarrowType.cpp
