This is an archive of the discontinued LLVM Phabricator instance.

[Matrix] Add remark propagation along the inlined-at chain.
ClosedPublic

Authored by fhahn on Jan 28 2020, 5:52 PM.

Details

Reviewers

anemet
Gerolf
thegameg
hfinkel
andrew.w.kaylor
LuoYuanke

Commits

rGbc6c8c4bbbee: [Matrix] Add remark propagation along the inlined-at chain.

Summary

This patch adds support for propagating matrix expressions along the
inlined-at chain and emitting remarks at the traversed function scopes.

To motivate this new behavior, consider the example below. Without the
remark 'up-leveling', we would only get remarks in load.h and store.h,
but we cannot generate a remark describing the full expression in
toplevel.cpp, which is the place where the user has the best chance of
spotting/fixing potential problems.

With this patch, we generate a remark for the load in load.h, one for
the store in store.h and one for the complete expression in
toplevel.cpp. For a bigger example, please see remarks-inlining.ll.

load.h:
template <typename Ty, unsigned R, unsigned C> Matrix<Ty, R, C> load(Ty *Ptr) {
  Matrix<Ty, R, C> Result;
  Result.value = *reinterpret_cast<typename Matrix<Ty, R, C>::matrix_t *>(Ptr);
  return Result;
}

store.h:
template <typename Ty, unsigned R, unsigned C> void store(Matrix<Ty, R, C> M1, Ty *Ptr) {
  *reinterpret_cast<typename decltype(M1)::matrix_t *>(Ptr) = M1.value;
}

toplevel.cpp:
void test(double *A, double *B, double *C) {
  store(add(load<double, 3, 5>(A), load<double, 3, 5>(B)), C);
}

For a given function, we traverse the inlined-at chain for each
matrix instruction (= instructions with shape information). We collect
the matrix instructions in each DISubprogram we visit. This produces a
mapping of DISubprogram -> (List of matrix instructions visible in the
subprogram). We then generate remarks using the list of instructions for
each subprogram. This allows surfacing the remarks at a level useful to
users.

Please note that the current approach may create a lot of extra remarks.
Additional heuristics to cut-off the traversal can be implemented in the
future. For example, it might make sense to stop 'up-leveling' once all
matrix instructions are at the same debug location.
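
As a rough illustration of the grouping step described above, here is a minimal, self-contained sketch in plain C++. It does not use the LLVM API: Subprogram, Location, MatrixInst and groupBySubprogram are hypothetical stand-ins for DISubprogram, DILocation, matrix instructions and the traversal in the patch, and std::map is assumed here in place of the MapVector the patch uses.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for DISubprogram (not the LLVM class).
struct Subprogram {
  std::string Name;
};

// Hypothetical stand-in for DILocation: a scope plus an optional call site
// the code was inlined into.
struct Location {
  const Subprogram *Scope;
  const Location *InlinedAt; // nullptr if not inlined
};

// Hypothetical stand-in for an instruction with shape information.
struct MatrixInst {
  std::string Name;
  const Location *Loc;
};

using SubprogramToInsts =
    std::map<const Subprogram *, std::vector<const MatrixInst *>>;

// Walk the inlined-at chain of every instruction and record the instruction
// in each subprogram visited on the way up.
SubprogramToInsts groupBySubprogram(const std::vector<MatrixInst> &Insts) {
  SubprogramToInsts Result;
  for (const MatrixInst &I : Insts) {
    for (const Location *L = I.Loc; L; L = L->InlinedAt) {
      assert(L->Scope && "sketch assumes every location has a scope");
      Result[L->Scope].push_back(&I);
    }
  }
  return Result;
}
```

In the load/store example, an instruction that originates in load.h and is inlined into test would be recorded under both load and test, which is what lets the remark for test describe the complete expression.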

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

fhahn created this revision. Jan 28 2020, 5:52 PM

Herald added a project: Restricted Project. Jan 28 2020, 5:52 PM

Herald added subscribers: tschuett, hiraditya.

Unit tests: fail. 62274 tests passed, 1 failed and 827 were skipped.

failed: Clang.CodeGenOpenCL/amdgpu-features.cl

clang-tidy: fail. clang-tidy found 0 errors and 1 warnings. 0 of them are added as review comments below.

clang-format: pass.

Herald added a subscriber: ormris. Jan 28 2020, 6:07 PM

Harbormaster failed remote builds in B45198: Diff 241036! Jan 28 2020, 6:14 PM

On the description: I would first explain what you do and then how you do it, i.e. have "To motivate" and the example before the paragraph "For a given function, we traverse".

llvm/lib/Transforms/Scalar/LowerMatrixIntrinsics.cpp
960	I am assuming these are the values that fall into a given DISubprogram? More explanation would be good here and probably a bit more specific name than ExprSet.
1222	Explain Ops in the comment.
1297	Mapping is usually not a good name ;)

Add additional comments, fix some variable names.

In D73600#1852881, @anemet wrote:

On the description: I would first explain what you do and then how you do it, i.e. have "To motivate" and the example before the paragraph "For a given function, we traverse".

Sounds good, thanks. I'll re-order the paragraphs.

fhahn marked 5 inline comments as done. Mar 7 2020, 7:09 AM

fhahn added inline comments.

llvm/lib/Transforms/Scalar/LowerMatrixIntrinsics.cpp
960	Yes, I've renamed it to ExprsInSubprogram and updated the comment.
1222	Also renamed Ops to ExprsInSubprogram.
1297	Done, renamed to Subprog2Exprs.

fhahn edited the summary of this revision. Mar 7 2020, 7:10 AM

Harbormaster failed remote builds in B48456: Diff 248930! Mar 7 2020, 8:02 AM

In the description when you say:

We then generate remarks using the list of instructions for
each subprogram. This allows surfacing the remarks at a level useful to
users.

I would make it explicit that a subprogram here is meant to include its own subprograms recursively. E.g. using the example, the subprogram test includes the inlined load and store functions.

LGTM with these.

llvm/lib/Transforms/Scalar/LowerMatrixIntrinsics.cpp
1207	Also explain why we need leaves and what that means for sharing. Again an example would be useful.
1226	returning

This revision is now accepted and ready to land. Mar 11 2020, 8:55 AM

In D73600#1917094, @anemet wrote:

In the description when you say:

We then generate remarks using the list of instructions for
each subprogram. This allows surfacing the remarks at a level useful to
users.

I would make it explicit that a subprogram here is meant to include its own subprograms recursively. E.g. using the example, the subprogram test includes the inlined load and store functions.

LGTM with these.

Thanks, will do!

Closed by commit rGbc6c8c4bbbee: [Matrix] Add remark propagation along the inlined-at chain. (authored by fhahn). Mar 11 2020, 10:46 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
llvm/lib/Transforms/Scalar/LowerMatrixIntrinsics.cpp	193 lines
llvm/test/Transforms/LowerMatrixIntrinsics/remarks-inlining.ll	166 lines
llvm/test/Transforms/LowerMatrixIntrinsics/remarks.ll	14 lines

Diff 249683

llvm/lib/Transforms/Scalar/LowerMatrixIntrinsics.cpp

//===- LowerMatrixIntrinsics.cpp - Lower matrix intrinsics ------ C++ --===//		//===- LowerMatrixIntrinsics.cpp - Lower matrix intrinsics ------ C++ --===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
// Lower matrix intrinsics to vector operations.		// Lower matrix intrinsics to vector operations.
//		//
// TODO:		// TODO:
// * Implement multiply & add fusion		// * Implement multiply & add fusion
// * Add remark, summarizing the available matrix optimization opportunities
// (WIP).
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "llvm/Transforms/Scalar/LowerMatrixIntrinsics.h"		#include "llvm/Transforms/Scalar/LowerMatrixIntrinsics.h"
#include "llvm/ADT/GraphTraits.h"		#include "llvm/ADT/GraphTraits.h"
#include "llvm/ADT/PostOrderIterator.h"		#include "llvm/ADT/PostOrderIterator.h"
#include "llvm/ADT/SmallVector.h"		#include "llvm/ADT/SmallVector.h"
#include "llvm/Analysis/OptimizationRemarkEmitter.h"		#include "llvm/Analysis/OptimizationRemarkEmitter.h"
#include "llvm/Analysis/TargetTransformInfo.h"		#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/Analysis/ValueTracking.h"		#include "llvm/Analysis/ValueTracking.h"
#include "llvm/Analysis/VectorUtils.h"		#include "llvm/Analysis/VectorUtils.h"
#include "llvm/IR/CFG.h"		#include "llvm/IR/CFG.h"
#include "llvm/IR/DataLayout.h"		#include "llvm/IR/DataLayout.h"
		#include "llvm/IR/DebugInfoMetadata.h"
#include "llvm/IR/Function.h"		#include "llvm/IR/Function.h"
#include "llvm/IR/IRBuilder.h"		#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/Instructions.h"		#include "llvm/IR/Instructions.h"
#include "llvm/IR/IntrinsicInst.h"		#include "llvm/IR/IntrinsicInst.h"
#include "llvm/IR/PatternMatch.h"		#include "llvm/IR/PatternMatch.h"
#include "llvm/InitializePasses.h"		#include "llvm/InitializePasses.h"
#include "llvm/Pass.h"		#include "llvm/Pass.h"
#include "llvm/Support/Debug.h"		#include "llvm/Support/Debug.h"
[9 lines hidden]	static cl::opt<bool> EnableShapePropagation(
cl::desc("Enable/disable shape propagation from matrix intrinsics to other "		cl::desc("Enable/disable shape propagation from matrix intrinsics to other "
"instructions."));		"instructions."));

static cl::opt<bool> AllowContractEnabled(		static cl::opt<bool> AllowContractEnabled(
"matrix-allow-contract", cl::init(false), cl::Hidden,		"matrix-allow-contract", cl::init(false), cl::Hidden,
cl::desc("Allow the use of FMAs if available and profitable. This may "		cl::desc("Allow the use of FMAs if available and profitable. This may "
"result in different results, due to less rounding error."));		"result in different results, due to less rounding error."));

		/// Helper function to either return Scope, if it is a subprogram or the
		/// attached subprogram for a local scope.
		static DISubprogram *getSubprogram(DIScope *Scope) {
		if (auto *Subprogram = dyn_cast<DISubprogram>(Scope))
		return Subprogram;
		return cast<DILocalScope>(Scope)->getSubprogram();
		}

namespace {		namespace {

// Given an element poitner \p BasePtr to the start of a (sub) matrix, compute		// Given an element poitner \p BasePtr to the start of a (sub) matrix, compute
// the start address of column \p Col with type (\p EltType x \p NumRows)		// the start address of column \p Col with type (\p EltType x \p NumRows)
// assuming \p Stride elements between start two consecutive columns.		// assuming \p Stride elements between start two consecutive columns.
// \p Stride must be >= \p NumRows.		// \p Stride must be >= \p NumRows.
//		//
// Consider a 4x4 matrix like below		// Consider a 4x4 matrix like below
[508 lines hidden]	for (auto *BB : RPOT) {
Changed |= VisitBinaryOperator(BinOp);		Changed |= VisitBinaryOperator(BinOp);
if (match(&Inst, m_Load(m_Value(Op1))))		if (match(&Inst, m_Load(m_Value(Op1))))
Changed |= VisitLoad(&Inst, Op1, Builder);		Changed |= VisitLoad(&Inst, Op1, Builder);
else if (match(&Inst, m_Store(m_Value(Op1), m_Value(Op2))))		else if (match(&Inst, m_Store(m_Value(Op1), m_Value(Op2))))
Changed |= VisitStore(&Inst, Op1, Op2, Builder);		Changed |= VisitStore(&Inst, Op1, Op2, Builder);
}		}
}		}

RemarkGenerator RemarkGen(Inst2ColumnMatrix, ORE, DL);		RemarkGenerator RemarkGen(Inst2ColumnMatrix, ORE, Func);
RemarkGen.emitRemarks();		RemarkGen.emitRemarks();

for (Instruction *Inst : reverse(ToRemove))		for (Instruction *Inst : reverse(ToRemove))
Inst->eraseFromParent();		Inst->eraseFromParent();

return Changed;		return Changed;
}		}

[359 lines hidden]	struct ExprLinearizer {
/// Mapping from instructions to column matrixes. It is used to identify		/// Mapping from instructions to column matrixes. It is used to identify
/// matrix instructions.		/// matrix instructions.
const MapVector<Value *, ColumnMatrixTy> &Inst2ColumnMatrix;		const MapVector<Value *, ColumnMatrixTy> &Inst2ColumnMatrix;

/// Mapping from values to the leaves of all expressions that the value is		/// Mapping from values to the leaves of all expressions that the value is
/// part of.		/// part of.
const DenseMap<Value *, SmallPtrSet<Value *, 2>> &Shared;		const DenseMap<Value *, SmallPtrSet<Value *, 2>> &Shared;

		/// Set of matrix expressions in the scope of a given DISubprogram.
		const SmallSetVector<Value *, 32> &ExprsInSubprogram;

/// Leaf node of the expression to linearize.		/// Leaf node of the expression to linearize.
Value *Leaf;		Value *Leaf;

/// Used to keep track of sub-expressions that get reused while linearizing		/// Used to keep track of sub-expressions that get reused while linearizing
/// the expression. Re-used sub-expressions are marked as (reused).		/// the expression. Re-used sub-expressions are marked as (reused).
SmallPtrSet<Value *, 8> ReusedExprs;		SmallPtrSet<Value *, 8> ReusedExprs;

ExprLinearizer(const DataLayout &DL,		ExprLinearizer(const DataLayout &DL,
const MapVector<Value *, ColumnMatrixTy> &Inst2ColumnMatrix,		const MapVector<Value *, ColumnMatrixTy> &Inst2ColumnMatrix,
const DenseMap<Value *, SmallPtrSet<Value *, 2>> &Shared,		const DenseMap<Value *, SmallPtrSet<Value *, 2>> &Shared,
		const SmallSetVector<Value *, 32> &ExprsInSubprogram,
Value *Leaf)		Value *Leaf)
: Str(), Stream(Str), DL(DL), Inst2ColumnMatrix(Inst2ColumnMatrix),		: Str(), Stream(Str), DL(DL), Inst2ColumnMatrix(Inst2ColumnMatrix),
Shared(Shared), Leaf(Leaf) {}		Shared(Shared), ExprsInSubprogram(ExprsInSubprogram), Leaf(Leaf) {}

void indent(unsigned N) {		void indent(unsigned N) {
LineLength += N;		LineLength += N;
for (unsigned i = 0; i < N; i++)		for (unsigned i = 0; i < N; i++)
Stream << " ";		Stream << " ";
}		}

void lineBreak() {		void lineBreak() {
[17 lines hidden]	struct ExprLinearizer {
Value *getUnderlyingObjectThroughLoads(Value *V) {		Value *getUnderlyingObjectThroughLoads(Value *V) {
if (Value *Ptr = getPointerOperand(V))		if (Value *Ptr = getPointerOperand(V))
return getUnderlyingObjectThroughLoads(Ptr);		return getUnderlyingObjectThroughLoads(Ptr);
else if (V->getType()->isPointerTy())		else if (V->getType()->isPointerTy())
return GetUnderlyingObject(V, DL);		return GetUnderlyingObject(V, DL);
return V;		return V;
}		}

/// Returns true if \p V is a matrix value.		/// Returns true if \p V is a matrix value in the given subprogram.
bool isMatrix(Value *V) const {		bool isMatrix(Value *V) const { return ExprsInSubprogram.count(V); }
return Inst2ColumnMatrix.find(V) != Inst2ColumnMatrix.end();
}

/// If \p V is a matrix value, print its shape as as NumRows x NumColumns to		/// If \p V is a matrix value, print its shape as as NumRows x NumColumns to
/// \p SS.		/// \p SS.
void prettyPrintMatrixType(Value *V, raw_string_ostream &SS) {		void prettyPrintMatrixType(Value *V, raw_string_ostream &SS) {
auto M = Inst2ColumnMatrix.find(V);		auto M = Inst2ColumnMatrix.find(V);
if (M == Inst2ColumnMatrix.end())		if (M == Inst2ColumnMatrix.end())
SS << "unknown";		SS << "unknown";
else {		else {
[175 lines hidden]	struct ExprLinearizer {
const std::string &getResult() {		const std::string &getResult() {
Stream.flush();		Stream.flush();
return Str;		return Str;
}		}
};		};

/// Generate remarks for matrix operations in a function. To generate remarks		/// Generate remarks for matrix operations in a function. To generate remarks
/// for matrix expressions, the following approach is used:		/// for matrix expressions, the following approach is used:
/// 1. Collect leafs of matrix expressions (done in		/// 1. Use the inlined-at debug information to group matrix operations to the
/// RemarkGenerator::getExpressionLeaves). Leaves are lowered matrix		/// DISubprograms they are contained in.
/// instructions without other matrix users (like stores).		/// 2. Collect leaves of matrix expressions (done in
///		/// RemarkGenerator::getExpressionLeaves) for each subprogram - expression
/// 2. For each leaf, create a remark containing a linearizied version of the		// mapping. Leaves are lowered matrix instructions without other matrix
/// matrix expression.		// users (like stores) in the current subprogram.
///		/// 3. For each leaf, create a remark containing a linearizied version of the
/// TODO:		/// matrix expression. The expression is linearized by a recursive
/// * Summarize number of vector instructions generated for each expression.		/// bottom-up traversal of the matrix operands, starting at a leaf. Note
/// * Propagate matrix remarks up the inlining chain.		/// that multiple leaves can share sub-expressions. Shared subexpressions
		/// are explicitly marked as shared().
struct RemarkGenerator {		struct RemarkGenerator {
const MapVector<Value *, ColumnMatrixTy> &Inst2ColumnMatrix;		const MapVector<Value *, ColumnMatrixTy> &Inst2ColumnMatrix;
OptimizationRemarkEmitter &ORE;		OptimizationRemarkEmitter &ORE;
		Function &Func;
const DataLayout &DL;		const DataLayout &DL;

RemarkGenerator(const MapVector<Value *, ColumnMatrixTy> &Inst2ColumnMatrix,		RemarkGenerator(const MapVector<Value *, ColumnMatrixTy> &Inst2ColumnMatrix,
OptimizationRemarkEmitter &ORE, const DataLayout &DL)		OptimizationRemarkEmitter &ORE, Function &Func)
: Inst2ColumnMatrix(Inst2ColumnMatrix), ORE(ORE), DL(DL) {}		: Inst2ColumnMatrix(Inst2ColumnMatrix), ORE(ORE), Func(Func),
		DL(Func.getParent()->getDataLayout()) {}
/// Return all leafs of matrix expressions. Those are instructions in
/// Inst2ColumnMatrix returing void. Currently that should only include		/// Return all leaves of the expressions in \p ExprsInSubprogram. Those are
/// stores.		/// instructions in Inst2ColumnMatrix returning void or without any users in
SmallVector<Value *, 4> getExpressionLeaves() {		/// \p ExprsInSubprogram. Currently that should only include stores.
		SmallVector<Value *, 4>
		getExpressionLeaves(const SmallSetVector<Value *, 32> &ExprsInSubprogram) {
SmallVector<Value *, 4> Leaves;		SmallVector<Value *, 4> Leaves;
for (auto &KV : Inst2ColumnMatrix)		for (auto *Expr : ExprsInSubprogram)
if (KV.first->getType()->isVoidTy())		if (Expr->getType()->isVoidTy() ||
Leaves.push_back(KV.first);		!any_of(Expr->users(), [&ExprsInSubprogram](User *U) {
		return ExprsInSubprogram.count(U);
		}))
		Leaves.push_back(Expr);
return Leaves;		return Leaves;
}		}

/// Recursively traverse expression \p V starting at \p Leaf and add \p Leaf		/// Recursively traverse expression \p V starting at \p Leaf and add \p Leaf
/// to all visited expressions in \p Shared.		/// to all visited expressions in \p Shared. Limit the matrix operations to
		/// the ones in \p ExprsInSubprogram.
void collectSharedInfo(Value *Leaf, Value *V,		void collectSharedInfo(Value *Leaf, Value *V,
		const SmallSetVector<Value *, 32> &ExprsInSubprogram,
DenseMap<Value *, SmallPtrSet<Value *, 2>> &Shared) {		DenseMap<Value *, SmallPtrSet<Value *, 2>> &Shared) {

if (Inst2ColumnMatrix.find(V) == Inst2ColumnMatrix.end())		if (!ExprsInSubprogram.count(V))
return;		return;

auto I = Shared.insert({V, {}});		auto I = Shared.insert({V, {}});
I.first->second.insert(Leaf);		I.first->second.insert(Leaf);

for (Value *Op : cast<Instruction>(V)->operand_values())		for (Value *Op : cast<Instruction>(V)->operand_values())
collectSharedInfo(Leaf, Op, Shared);		collectSharedInfo(Leaf, Op, ExprsInSubprogram, Shared);
return;		return;
}		}

/// Calculate the number of exclusive and shared op counts for expression		/// Calculate the number of exclusive and shared op counts for expression
/// starting at \p V. Expressions used multiple times are counted once.		/// starting at \p V. Expressions used multiple times are counted once.
		/// Limit the matrix operations to the ones in \p ExprsInSubprogram.
std::pair<OpInfoTy, OpInfoTy>		std::pair<OpInfoTy, OpInfoTy>
sumOpInfos(Value *Root, SmallPtrSetImpl<Value *> &ReusedExprs,		sumOpInfos(Value *Root, SmallPtrSetImpl<Value *> &ReusedExprs,
DenseMap<Value *, SmallPtrSet<Value *, 2>> &Shared) {		const SmallSetVector<Value *, 32> &ExprsInSubprogram,
auto CM = Inst2ColumnMatrix.find(Root);		DenseMap<Value *, SmallPtrSet<Value *, 2>> &Shared) const {
if (CM == Inst2ColumnMatrix.end())		if (!ExprsInSubprogram.count(Root))
return {};		return {};

// Already counted this expression. Stop.		// Already counted this expression. Stop.
if (!ReusedExprs.insert(Root).second)		if (!ReusedExprs.insert(Root).second)
return {};		return {};

OpInfoTy SharedCount;		OpInfoTy SharedCount;
OpInfoTy Count;		OpInfoTy Count;

auto I = Shared.find(Root);		auto I = Shared.find(Root);
		auto CM = Inst2ColumnMatrix.find(Root);
if (I->second.size() == 1)		if (I->second.size() == 1)
Count = CM->second.getOpInfo();		Count = CM->second.getOpInfo();
else		else
SharedCount = CM->second.getOpInfo();		SharedCount = CM->second.getOpInfo();

for (Value *Op : cast<Instruction>(Root)->operand_values()) {		for (Value *Op : cast<Instruction>(Root)->operand_values()) {
auto C = sumOpInfos(Op, ReusedExprs, Shared);		auto C = sumOpInfos(Op, ReusedExprs, ExprsInSubprogram, Shared);
Count += C.first;		Count += C.first;
SharedCount += C.second;		SharedCount += C.second;
}		}
return {Count, SharedCount};		return {Count, SharedCount};
}		}

void emitRemarks() {		void emitRemarks() {
if (!ORE.allowExtraAnalysis(DEBUG_TYPE))		if (!ORE.allowExtraAnalysis(DEBUG_TYPE))
return;		return;

// Find leafs of matrix expressions.		// Map matrix operations to their containting subprograms, by traversing
auto Leaves = getExpressionLeaves();		// the inlinedAt chain. If the function does not have a DISubprogram, we
		// only map them to the containing function.
		MapVector<DISubprogram *, SmallVector<Value *, 8>> Subprog2Exprs;
		for (auto &KV : Inst2ColumnMatrix) {
		if (Func.getSubprogram()) {
		auto *I = cast<Instruction>(KV.first);
		DILocation *Context = I->getDebugLoc();
		while (Context) {
		auto I =
		Subprog2Exprs.insert({getSubprogram(Context->getScope()), {}});
		I.first->second.push_back(KV.first);
		Context = DebugLoc(Context).getInlinedAt();
		}
		} else {
		auto I = Subprog2Exprs.insert({nullptr, {}});
		I.first->second.push_back(KV.first);
		}
		}
		for (auto &KV : Subprog2Exprs) {
		SmallSetVector<Value *, 32> ExprsInSubprogram(KV.second.begin(),
		KV.second.end());
		auto Leaves = getExpressionLeaves(ExprsInSubprogram);

DenseMap<Value *, SmallPtrSet<Value *, 2>> Shared;		DenseMap<Value *, SmallPtrSet<Value *, 2>> Shared;

for (Value *Leaf : Leaves)		for (Value *Leaf : Leaves)
collectSharedInfo(Leaf, Leaf, Shared);		collectSharedInfo(Leaf, Leaf, ExprsInSubprogram, Shared);

// Generate remarks for each leaf.		// Generate remarks for each leaf.
for (auto *L : Leaves) {		for (auto *L : Leaves) {

		DebugLoc Loc = cast<Instruction>(L)->getDebugLoc();
		DILocation *Context = cast<Instruction>(L)->getDebugLoc();
		while (Context) {
		if (getSubprogram(Context->getScope()) == KV.first) {
		Loc = Context;
		break;
		}
		Context = DebugLoc(Context).getInlinedAt();
		}

SmallPtrSet<Value *, 8> ReusedExprs;		SmallPtrSet<Value *, 8> ReusedExprs;
OpInfoTy Counts, SharedCounts;		OpInfoTy Counts, SharedCounts;
std::tie(Counts, SharedCounts) = sumOpInfos(L, ReusedExprs, Shared);		std::tie(Counts, SharedCounts) =
		sumOpInfos(L, ReusedExprs, ExprsInSubprogram, Shared);

OptimizationRemark Rem(DEBUG_TYPE, "matrix-lowered",		OptimizationRemark Rem(DEBUG_TYPE, "matrix-lowered", Loc,
cast<Instruction>(L)->getDebugLoc(),
cast<Instruction>(L)->getParent());		cast<Instruction>(L)->getParent());

Rem << "Lowered with ";		Rem << "Lowered with ";
Rem << ore::NV("NumStores", Counts.NumStores) << " stores, "		Rem << ore::NV("NumStores", Counts.NumStores) << " stores, "
<< ore::NV("NumLoads", Counts.NumLoads) << " loads, "		<< ore::NV("NumLoads", Counts.NumLoads) << " loads, "
<< ore::NV("NumComputeOps", Counts.NumComputeOps) << " compute ops";		<< ore::NV("NumComputeOps", Counts.NumComputeOps)
		<< " compute ops";

if (SharedCounts.NumStores > 0 || SharedCounts.NumLoads > 0 ||		if (SharedCounts.NumStores > 0 || SharedCounts.NumLoads > 0 ||
SharedCounts.NumComputeOps > 0) {		SharedCounts.NumComputeOps > 0) {
Rem << ",\nadditionally "		Rem << ",\nadditionally "
<< ore::NV("NumStores", SharedCounts.NumStores) << " stores, "		<< ore::NV("NumStores", SharedCounts.NumStores) << " stores, "
<< ore::NV("NumLoads", SharedCounts.NumLoads) << " loads, "		<< ore::NV("NumLoads", SharedCounts.NumLoads) << " loads, "
<< ore::NV("NumFPOps", SharedCounts.NumComputeOps)		<< ore::NV("NumFPOps", SharedCounts.NumComputeOps)
<< " compute ops"		<< " compute ops"
<< " are shared with other expressions";		<< " are shared with other expressions";
}		}

Rem << ("\n" + linearize(L, Shared, DL));		Rem << ("\n" + linearize(L, Shared, ExprsInSubprogram, DL));
ORE.emit(Rem);		ORE.emit(Rem);
}		}
}		}
		}

std::string		std::string
linearize(Value *L,		linearize(Value *L,
const DenseMap<Value *, SmallPtrSet<Value *, 2>> &Shared,		const DenseMap<Value *, SmallPtrSet<Value *, 2>> &Shared,
		const SmallSetVector<Value *, 32> &ExprsInSubprogram,
const DataLayout &DL) {		const DataLayout &DL) {
ExprLinearizer Lin(DL, Inst2ColumnMatrix, Shared, L);		ExprLinearizer Lin(DL, Inst2ColumnMatrix, Shared, ExprsInSubprogram, L);
Lin.linearizeExpr(L, 0, false, false);		Lin.linearizeExpr(L, 0, false, false);
return Lin.getResult();		return Lin.getResult();
}		}
};		};
};		};
} // namespace		} // namespace

PreservedAnalyses LowerMatrixIntrinsicsPass::run(Function &F,		PreservedAnalyses LowerMatrixIntrinsicsPass::run(Function &F,
[50 lines hidden]
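
The leaf and sharing logic in RemarkGenerator above can be sketched without LLVM types. This is a simplified illustration under stated assumptions, not the code from the patch: Expr, ExprSet, findLeaves and collectShared are hypothetical names, and leaves are found here by scanning operands, whereas the patch queries the users of each value.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

// Hypothetical expression node; stands in for llvm::Value in this sketch.
struct Expr {
  std::vector<const Expr *> Operands;
};

using ExprSet = std::set<const Expr *>;

// A leaf is an expression in the set that no other expression in the same
// set uses as an operand (for example a store, or a load when only load.h
// is considered).
std::vector<const Expr *> findLeaves(const ExprSet &InSubprogram) {
  ExprSet Used;
  for (const Expr *E : InSubprogram)
    for (const Expr *Op : E->Operands)
      if (InSubprogram.count(Op))
        Used.insert(Op);

  std::vector<const Expr *> Leaves;
  for (const Expr *E : InSubprogram)
    if (!Used.count(E))
      Leaves.push_back(E);
  return Leaves;
}

// Record Leaf for every expression reachable from E, restricted to the set.
// An expression that ends up with more than one leaf is shared.
void collectShared(const Expr *Leaf, const Expr *E,
                   const ExprSet &InSubprogram,
                   std::map<const Expr *, ExprSet> &Shared) {
  assert(Leaf && E && "sketch assumes non-null expressions");
  if (!InSubprogram.count(E))
    return;
  Shared[E].insert(Leaf);
  for (const Expr *Op : E->Operands)
    collectShared(Leaf, Op, InSubprogram, Shared);
}
```

With one load feeding two stores, both stores are leaves and the load is recorded for both of them, so its cost would be reported as shared rather than counted twice. Restricting the set to the load alone makes the load itself a leaf, which corresponds to the separate remark emitted for load.h.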

llvm/test/Transforms/LowerMatrixIntrinsics/remarks-inlining.ll

This file was added.

				; REQUIRES: aarch64-registered-target

				; This test needs to be target specific due to the cost estimate in the output.

				; RUN: opt -lower-matrix-intrinsics -pass-remarks=lower-matrix-intrinsics -mtriple=arm64-apple-iphoneos -S < %s 2>&1 | FileCheck %s

				; Test the propagation of matrix expressions along to inlined-at chain. The IR
				; in the test roughly corresponds to the C++ code below, with the IR containing
				; references to a few more functions.

				; matrix.h
				; template <typename Ty, unsigned R, unsigned C>
				; struct Matrix {
				; using matrix_t = Ty __attribute__((matrix_type(R, C)));
				;
				; matrix_t value;
				; };
				;
				; ; add.h
				; template <typename Ty, unsigned R, unsigned C>
				; Matrix<Ty, R, C> add(Matrix<Ty, R, C> M1, Matrix<Ty, R, C> M2) {
				; Matrix<Ty, R, C> Result;
				; Result.value = __builtin_matrix_add(M1.value, M2.value);
				; return Result;
				; }
				;
				; load.h:
				; template <typename Ty, unsigned R, unsigned C>
				; Matrix<Ty, R, C> load(Ty *Ptr) {
				; Matrix<Ty, R, C> Result;
				; Result.value = *reinterpret_cast <typename Matrix<Ty, R, C>::matrix_t *>(Ptr);
				; return Result;
				; }
				;
				; store.h:
				; template <typename Ty, unsigned R, unsigned C>
				; void store(Matrix<Ty, R, C> M1, Ty *Ptr) {
				; *reinterpret_cast<typename decltype(M1)::matrix_t *>(Ptr) = M1.value;
				; }
				;
				; toplevel.cpp
				; void test(double *A, double *B, double *C) {
				; store(add(load<double, 3, 5>(A), load<double, 3, 5>(B)), C);
				; }
				;

				target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "aarch64-apple-ios"

				; CHECK-LABEL: remark: load.h:41:43: Lowered with 0 stores, 10 loads, 0 compute ops
				; CHECK-NEXT: load(addr %A)

				; CHECK-LABEL: remark: load.h:41:43: Lowered with 0 stores, 10 loads, 0 compute ops
				; CHECK-NEXT: columnwise.load.3x5.double(addr %B, 5)

				; CHECK-LABEL: remark: load.h:41:11: Lowered with 0 stores, 1 loads, 0 compute ops
				; CHECK-NEXT: load(addr %D)

				; CHECK-LABEL: remark: assign.h:32:43: Lowered with 0 stores, 10 loads, 0 compute ops
				; CHECK-NEXT: load(addr %A)

				; CHECK-LABEL: remark: assign.h:32:43: Lowered with 0 stores, 10 loads, 0 compute ops
				; CHECK-NEXT: columnwise.load.3x5.double(addr %B, 5)

				; CHECK-LABEL: remark: toplevel.c:410:0: Lowered with 10 stores, 20 loads, 10 compute ops
				; CHECK-NEXT: store(
				; CHECK-NEXT: fadd(
				; CHECK-NEXT: load(addr %A),
				; CHECK-NEXT: columnwise.load.3x5.double(addr %B, 5)),
				; CHECK-NEXT: addr %C)

				; CHECK-LABEL: remark: toplevel.c:510:0: Lowered with 1 stores, 1 loads, 8 compute ops
				; CHECK-NEXT: store(
				; CHECK-NEXT: transpose.1x2.float(transpose.2x1.float(load(addr %D))),
				; CHECK-NEXT: addr %D)

				; CHECK-LABEL: remark: add.h:66:11: Lowered with 0 stores, 0 loads, 10 compute ops
				; CHECK-NEXT: fadd(
				; CHECK-NEXT: addr %A,
				; CHECK-NEXT: scalar)

				; CHECK-LABEL: remark: store.h:10:11: Lowered with 10 stores, 0 loads, 0 compute ops
				; CHECK-NEXT: store(
				; CHECK-NEXT: scalar,
				; CHECK-NEXT: addr %C)

				; CHECK-LABEL: remark: store.h:66:11: Lowered with 1 stores, 0 loads, 0 compute ops
				; CHECK-NEXT: store(
				; CHECK-NEXT: scalar,
				; CHECK-NEXT: addr %D)

				; CHECK-LABEL: remark: transpose.h:13:11: Lowered with 0 stores, 0 loads, 8 compute ops
				; CHECK-NEXT: transpose.1x2.float(transpose.2x1.float(addr %D))

				define void @toplevel(<15 x double>* %A, <15 x double>* %B, <15 x double>* %C, <2 x float>* %D) !dbg !16 {
				entry:
				%a = load <15 x double>, <15 x double> *%A, align 16, !dbg !3791
				%b = call <15 x double> @llvm.matrix.columnwise.load(<15 x double>* %B, i32 5, i32 3, i32 5), !dbg !3793
				%c = fadd <15 x double> %a, %b, !dbg !100
				store <15 x double> %c, <15 x double> *%C, align 16, !dbg !102

				%load = load <2 x float>, <2 x float>* %D, !dbg !104
				%t1 = call <2 x float> @llvm.matrix.transpose(<2 x float> %load, i32 2, i32 1), !dbg !106
				%t2 = call <2 x float> @llvm.matrix.transpose(<2 x float> %t1, i32 1, i32 2), !dbg !106
				store <2 x float> %t2, <2 x float>* %D, !dbg !108
				ret void
				}

				declare <15 x double> @llvm.matrix.columnwise.load(<15 x double>*, i32, i32, i32)
				declare <2 x float> @llvm.matrix.transpose(<2 x float>, i32, i32)

				!llvm.dbg.cu = !{!0}
				!llvm.module.flags = !{!3, !4}

				!0 = distinct !DICompileUnit(language: DW_LANG_C99, file: !1, producer: "clang", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !2)
				!1 = !DIFile(filename: "load.h", directory: "/test")
				!2 = !{}
				!3 = !{i32 2, !"Dwarf Version", i32 4}
				!4 = !{i32 2, !"Debug Info Version", i32 3}
				!5 = distinct !DISubprogram(name: "load_fn", scope: !1, file: !1, line: 1, type: !6, isLocal: false, isDefinition: true, scopeLine: 1, flags: DIFlagPrototyped, isOptimized: true, unit: !0, retainedNodes: !12)
				!17 = !DIFile(filename: "toplevel.c", directory: "/test")
				!16 = distinct !DISubprogram(name: "toplevel", scope: !1, file: !17, line: 1, type: !6, isLocal: false, isDefinition: true, scopeLine: 1, flags: DIFlagPrototyped, isOptimized: true, unit: !0, retainedNodes: !12)
				!18 = !DIFile(filename: "assign.h", directory: "/test")
				!19 = distinct !DISubprogram(name: "assign", scope: !1, file: !18, line: 1, type: !6, isLocal: false, isDefinition: true, scopeLine: 1, flags: DIFlagPrototyped, isOptimized: true, unit: !0, retainedNodes: !12)

				!20 = !DIFile(filename: "add.h", directory: "/test")
				!21 = distinct !DISubprogram(name: "add_fn", scope: !1, file: !20, line: 1, type: !6, isLocal: false, isDefinition: true, scopeLine: 1, flags: DIFlagPrototyped, isOptimized: true, unit: !0, retainedNodes: !12)

				!22 = !DIFile(filename: "store.h", directory: "/test")
				!23 = distinct !DISubprogram(name: "store_fn", scope: !1, file: !22, line: 1, type: !6, isLocal: false, isDefinition: true, scopeLine: 1, flags: DIFlagPrototyped, isOptimized: true, unit: !0, retainedNodes: !12)

				!24 = !DIFile(filename: "transpose.h", directory: "/test")
				!25 = distinct !DISubprogram(name: "transpose", scope: !1, file: !24, line: 1, type: !6, isLocal: false, isDefinition: true, scopeLine: 1, flags: DIFlagPrototyped, isOptimized: true, unit: !0, retainedNodes: !12)


				!6 = !DISubroutineType(types: !7)
				!7 = !{null, !8, !8, !11}
				!8 = !DIDerivedType(tag: DW_TAG_restrict_type, baseType: !9)
				!9 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !10, size: 32, align: 32)
				!10 = !DIBasicType(name: "float", size: 32, align: 32, encoding: DW_ATE_float)
				!11 = !DIBasicType(name: "int", size: 32, align: 32, encoding: DW_ATE_signed)
				!12 = !{!13}
				!13 = !DILocalVariable(name: "a", arg: 1, scope: !5, file: !1, line: 1, type: !8)
				!14 = !DILocation(line: 1, column: 27, scope: !5)

				!3791 = !DILocation(line: 41, column: 43, scope: !5, inlinedAt: !3795)
				!3792 = !DILocation(line: 405, column: 3, scope: !16)
				!3793 = !DILocation(line: 41, column: 43, scope: !5, inlinedAt: !3796)
				!3794 = !DILocation(line: 406, column: 11, scope: !16)
				!3795 = !DILocation(line: 32, column: 43, scope: !19, inlinedAt: !3792)
				!3796 = !DILocation(line: 32, column: 43, scope: !19, inlinedAt: !3794)

				!100 = !DILocation(line: 66, column: 11, scope: !21, inlinedAt: !101)
				!101 = !DILocation(line: 410, column: 11, scope: !16)

				!102 = !DILocation(line: 10, column: 11, scope: !23, inlinedAt: !103)
				!103 = !DILocation(line: 410, column: 0, scope: !16)

				!104 = !DILocation(line: 41, column: 11, scope: !5, inlinedAt: !101)
				!105 = !DILocation(line: 500, column: 11, scope: !16)

				!106 = !DILocation(line: 13, column: 11, scope: !25, inlinedAt: !101)
				!107 = !DILocation(line: 510, column: 11, scope: !16)

				!108 = !DILocation(line: 66, column: 11, scope: !23, inlinedAt: !109)
				!109 = !DILocation(line: 510, column: 0, scope: !16)

llvm/test/Transforms/LowerMatrixIntrinsics/remarks.ll

	; CHECK-NEXT: columnwise.load.3x3.double(addr %A, 5)			; CHECK-NEXT: columnwise.load.3x3.double(addr %A, 5)
	; CHECK-NEXT: (reused) columnwise.load.3x3.double(addr %A, 5)),			; CHECK-NEXT: (reused) columnwise.load.3x3.double(addr %A, 5)),
	; CHECK-NEXT: (reused) columnwise.load.3x3.double(addr %A, 5)),			; CHECK-NEXT: (reused) columnwise.load.3x3.double(addr %A, 5)),
	; CHECK-NEXT: addr %B,			; CHECK-NEXT: addr %B,
	; CHECK-NEXT: 10)			; CHECK-NEXT: 10)

	define void @binaryops(<9 x double>* %A, <9 x double>* %B) !dbg !31 {			define void @binaryops(<9 x double>* %A, <9 x double>* %B) !dbg !31 {
	%A.matrix = call <9 x double> @llvm.matrix.columnwise.load(<9 x double>* %A, i32 5, i32 3, i32 3), !dbg !32			%A.matrix = call <9 x double> @llvm.matrix.columnwise.load(<9 x double>* %A, i32 5, i32 3, i32 3), !dbg !32
	%R1.matrix = fadd <9 x double> %A.matrix, %A.matrix			%R1.matrix = fadd <9 x double> %A.matrix, %A.matrix, !dbg !32
	%R2.matrix = fmul <9 x double> %R1.matrix, %A.matrix			%R2.matrix = fmul <9 x double> %R1.matrix, %A.matrix, !dbg !32
	call void @llvm.matrix.columnwise.store(<9 x double> %R2.matrix, <9 x double>* %B, i32 10, i32 3, i32 3), !dbg !32			call void @llvm.matrix.columnwise.store(<9 x double> %R2.matrix, <9 x double>* %B, i32 10, i32 3, i32 3), !dbg !32
	ret void			ret void
	}			}

	; CHECK-LABEL: remark: test.h:90:20: Lowered with 6 stores, 6 loads, 12 compute ops			; CHECK-LABEL: remark: test.h:90:20: Lowered with 6 stores, 6 loads, 12 compute ops
	; CHECK-NEXT: columnwise.store.3x3.double(			; CHECK-NEXT: columnwise.store.3x3.double(
	; CHECK-NEXT: fmul(			; CHECK-NEXT: fmul(
	; CHECK-NEXT: fadd(			; CHECK-NEXT: fadd(
	; CHECK-NEXT: columnwise.load.3x3.double(addr %A, 5)			; CHECK-NEXT: columnwise.load.3x3.double(addr %A, 5)
	; CHECK-NEXT: (reused) columnwise.load.3x3.double(addr %A, 5)),			; CHECK-NEXT: (reused) columnwise.load.3x3.double(addr %A, 5)),
	; CHECK-NEXT: (reused) columnwise.load.3x3.double(addr %A, 5)),			; CHECK-NEXT: (reused) columnwise.load.3x3.double(addr %A, 5)),
	; CHECK-NEXT: addr %B,			; CHECK-NEXT: addr %B,
	; CHECK-NEXT: 10)			; CHECK-NEXT: 10)
	; CHECK-NEXT: remark: test.h:90:20: Lowered with 2 stores, 12 loads, 22 compute ops			; CHECK-NEXT: remark: test.h:90:20: Lowered with 2 stores, 12 loads, 22 compute ops
	; CHECK-NEXT: store(			; CHECK-NEXT: store(
	; CHECK-NEXT: multiply.2x6.6x2.double(			; CHECK-NEXT: multiply.2x6.6x2.double(
	; CHECK-NEXT: load(addr %C),			; CHECK-NEXT: load(addr %C),
	; CHECK-NEXT: load(addr %D)),			; CHECK-NEXT: load(addr %D)),
	; CHECK-NEXT: addr %E)			; CHECK-NEXT: addr %E)

	define void @multiple_expressions(<9 x double>* %A, <9 x double>* %B, <12 x double>* %C, <12 x double>* %D, <4 x double>* %E) !dbg !33 {			define void @multiple_expressions(<9 x double>* %A, <9 x double>* %B, <12 x double>* %C, <12 x double>* %D, <4 x double>* %E) !dbg !33 {
	%A.matrix = call <9 x double> @llvm.matrix.columnwise.load(<9 x double>* %A, i32 5, i32 3, i32 3), !dbg !34			%A.matrix = call <9 x double> @llvm.matrix.columnwise.load(<9 x double>* %A, i32 5, i32 3, i32 3), !dbg !34
	%R1.matrix = fadd <9 x double> %A.matrix, %A.matrix			%R1.matrix = fadd <9 x double> %A.matrix, %A.matrix, !dbg !34
	%R2.matrix = fmul <9 x double> %R1.matrix, %A.matrix			%R2.matrix = fmul <9 x double> %R1.matrix, %A.matrix, !dbg !34
	call void @llvm.matrix.columnwise.store(<9 x double> %R2.matrix, <9 x double>* %B, i32 10, i32 3, i32 3), !dbg !34			call void @llvm.matrix.columnwise.store(<9 x double> %R2.matrix, <9 x double>* %B, i32 10, i32 3, i32 3), !dbg !34

	%C.matrix = load <12 x double>, <12 x double>* %C, !dbg !34			%C.matrix = load <12 x double>, <12 x double>* %C, !dbg !34
	%D.matrix = load <12 x double>, <12 x double>* %D, !dbg !34			%D.matrix = load <12 x double>, <12 x double>* %D, !dbg !34
	%Mult.matrix = call <4 x double> @llvm.matrix.multiply(<12 x double> %C.matrix, <12 x double> %D.matrix, i32 2, i32 6, i32 2), !dbg !34			%Mult.matrix = call <4 x double> @llvm.matrix.multiply(<12 x double> %C.matrix, <12 x double> %D.matrix, i32 2, i32 6, i32 2), !dbg !34
	store <4 x double> %Mult.matrix, <4 x double>* %E, !dbg !34			store <4 x double> %Mult.matrix, <4 x double>* %E, !dbg !34

	ret void			ret void
	}			}

	; CHECK-LABEL: remark: test.h:100:20: Lowered with 6 stores, 6 loads, 12 compute ops			; CHECK-LABEL: remark: test.h:100:20: Lowered with 6 stores, 6 loads, 12 compute ops
	; CHECK-NEXT: columnwise.store.3x3.double(			; CHECK-NEXT: columnwise.store.3x3.double(
	; CHECK-NEXT: fmul(			; CHECK-NEXT: fmul(
	; CHECK-NEXT: fadd(			; CHECK-NEXT: fadd(
	; CHECK-NEXT: columnwise.load.3x3.double(addr %A, 5)			; CHECK-NEXT: columnwise.load.3x3.double(addr %A, 5)
	; CHECK-NEXT: (reused) columnwise.load.3x3.double(addr %A, 5)),			; CHECK-NEXT: (reused) columnwise.load.3x3.double(addr %A, 5)),
	; CHECK-NEXT: (reused) columnwise.load.3x3.double(addr %A, 5)),			; CHECK-NEXT: (reused) columnwise.load.3x3.double(addr %A, 5)),
	; CHECK-NEXT: stack addr %B,			; CHECK-NEXT: stack addr %B,
	; CHECK-NEXT: 10)			; CHECK-NEXT: 10)
	define void @stackaddresses(<9 x double>* %A) !dbg !35 {			define void @stackaddresses(<9 x double>* %A) !dbg !35 {
	%B = alloca <9 x double>			%B = alloca <9 x double>
	%A.matrix = call <9 x double> @llvm.matrix.columnwise.load(<9 x double>* %A, i32 5, i32 3, i32 3), !dbg !36			%A.matrix = call <9 x double> @llvm.matrix.columnwise.load(<9 x double>* %A, i32 5, i32 3, i32 3), !dbg !36
	%R1.matrix = fadd <9 x double> %A.matrix, %A.matrix			%R1.matrix = fadd <9 x double> %A.matrix, %A.matrix, !dbg !36
	%R2.matrix = fmul <9 x double> %R1.matrix, %A.matrix			%R2.matrix = fmul <9 x double> %R1.matrix, %A.matrix, !dbg !36
	call void @llvm.matrix.columnwise.store(<9 x double> %R2.matrix, <9 x double>* %B, i32 10, i32 3, i32 3), !dbg !36			call void @llvm.matrix.columnwise.store(<9 x double> %R2.matrix, <9 x double>* %B, i32 10, i32 3, i32 3), !dbg !36
	ret void			ret void
	}			}

	; CHECK-LABEL: remark: test.h:30:20: Lowered with 10 stores, 9 loads, 30 compute ops			; CHECK-LABEL: remark: test.h:30:20: Lowered with 10 stores, 9 loads, 30 compute ops
	; CHECK-NEXT: store(			; CHECK-NEXT: store(
	; CHECK-NEXT: transpose.5x3.double(load(addr %A)),			; CHECK-NEXT: transpose.5x3.double(load(addr %A)),
	; CHECK-NEXT: stack addr %s1)			; CHECK-NEXT: stack addr %s1)
	%S1 = type {<15 x double>*}			%S1 = type {<15 x double>*}
	define void @get_underlying_object(%S1* %A) !dbg !21 {			define void @get_underlying_object(%S1* %A) !dbg !21 {
	entry:			entry:
	%s1 = alloca <15 x double>, !dbg !22			%s1 = alloca <15 x double>, !dbg !22
	%a1 = getelementptr %S1, %S1* %A, i32 0, i32 0, !dbg !22			%a1 = getelementptr %S1, %S1* %A, i32 0, i32 0, !dbg !22
	%a2 = load <15 x double>*, <15 x double>** %a1, !dbg !22			%a2 = load <15 x double>*, <15 x double>** %a1, !dbg !22
	%av = load <15 x double>, <15 x double>* %a2, !dbg !22			%av = load <15 x double>, <15 x double>* %a2, !dbg !22

	%s2 = bitcast <15 x double>* %s1 to i64*, !dbg !22			%s2 = bitcast <15 x double>* %s1 to i64*, !dbg !22
	%s3 = bitcast i64* %s2 to <15 x double>*, !dbg !22			%s3 = bitcast i64* %s2 to <15 x double>*, !dbg !22

	%t = call <15 x double> @llvm.matrix.transpose.v15f64.v15f64(<15 x double> %av, i32 5, i32 3)			%t = call <15 x double> @llvm.matrix.transpose.v15f64.v15f64(<15 x double> %av, i32 5, i32 3), !dbg !22

	store <15 x double> %t, <15 x double>* %s3, !dbg !22			store <15 x double> %t, <15 x double>* %s3, !dbg !22
	ret void			ret void
	}			}

	declare <15 x double> @llvm.matrix.transpose.v15f64.v15f64(<15 x double>, i32, i32)			declare <15 x double> @llvm.matrix.transpose.v15f64.v15f64(<15 x double>, i32, i32)

	!llvm.dbg.cu = !{!0}			!llvm.dbg.cu = !{!0}