This is an archive of the discontinued LLVM Phabricator instance.

Enable Operand Reordering for Commutative Instructions in the FunctionComparator/MergeFunctions
Needs ReviewPublic

Authored by rcorcs on Mar 15 2019, 5:11 PM.

Details

Reviewers
jfb
davide
Summary

The function merger should be able to handle small differences if the two functions are still equivalent.
This patch, as the title suggests, enables the function merger to handle simple operand reordering for commutative binary operators.
For example:

mul nsw i32 3, %a

should be equivalent to

mul nsw i32 %a, 3

This check is handled by the FunctionComparator. Whenever it says that two functions are equivalent, the MergeFunctions pass is able to merge them.

I've added a simple unit test to exercise this change. All unit tests pass as expected.

  • Rodrigo Rocha

Diff Detail

Repository
rL LLVM

Event Timeline

rcorcs created this revision.Mar 15 2019, 5:11 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 15 2019, 5:11 PM
jfb added a comment.Mar 15 2019, 5:21 PM

This seems like something that we should canonicalize in general, no? i.e. the function comparator should only compare the canonicalized form (which another pass would ensure), and never expect to see a non-canonical one. I'd much rather make sure something else canonicalizes this than try to merge all permutations.

The example I gave is a simplistic one that gets canonicalized away.
However, this patch also handles more interesting cases, such as the following:

int foo(int *v, int a, int b, int c) {
  int t1 = (3*a);
  int t2 = (c*b);
  v[0] = t1+t2;
  int t3 = (b*7);
  int t4 = (c*a);
  v[1] = t3+t4;
  return v[0]+v[1];
}

int bar(int *v, int a, int b, int c) {
  int t1 = (a*3);
  int t2 = (b*c);
  v[0] = t2+t1;
  int t3 = (7*b);
  int t4 = (a*c);
  v[1] = t4+t3;
  return v[1]+v[0];
}

which clang at -O1, -O2, or -Os (I haven't tested other flags) compiles to something similar to:

define dso_local i32 @foo(i32* nocapture %v, i32 %a, i32 %b, i32 %c) local_unnamed_addr #0 {
entry:
  %mul = mul nsw i32 %a, 3
  %mul1 = mul nsw i32 %c, %b
  %add = add nsw i32 %mul1, %mul
  store i32 %add, i32* %v, align 4, !tbaa !2
  %mul2 = mul nsw i32 %b, 7
  %mul3 = mul nsw i32 %c, %a
  %add4 = add nsw i32 %mul3, %mul2
  %arrayidx5 = getelementptr inbounds i32, i32* %v, i64 1
  store i32 %add4, i32* %arrayidx5, align 4, !tbaa !2
  %add8 = add nsw i32 %add, %add4
  ret i32 %add8
}

define dso_local i32 @bar(i32* nocapture %v, i32 %a, i32 %b, i32 %c) local_unnamed_addr #0 {
entry:
  %mul = mul nsw i32 %a, 3
  %mul1 = mul nsw i32 %c, %b
  %add = add nsw i32 %mul1, %mul
  store i32 %add, i32* %v, align 4, !tbaa !2
  %mul2 = mul nsw i32 %b, 7
  %mul3 = mul nsw i32 %c, %a
  %add4 = add nsw i32 %mul3, %mul2
  %arrayidx5 = getelementptr inbounds i32, i32* %v, i64 1
  store i32 %add4, i32* %arrayidx5, align 4, !tbaa !2
  %add8 = add nsw i32 %add4, %add
  ret i32 %add8
}

Notice that the instruction

  %add8 = add nsw i32 %add4, %add

in @bar has its operands in the "wrong" order compared to the corresponding instruction in @foo.

This makes the existing -mergefunc fail to merge the two functions, but this patch fixes it.

I agree that canonicalization steps help a lot, but, of course, I like the idea of having a more powerful function merger (as a proponent of one).

jfb added a comment.Mar 18 2019, 8:38 AM

The example I gave is a simplistic one that gets canonicalized away.
However, this patch also handles more interesting cases, such as the following:

[...]

This makes the existing -mergefunc fail to merge the two functions, but this patch fixes it.

I agree that canonicalization steps help a lot, but, of course, I like the idea of having a more powerful function merger (as a proponent of one).

Gotcha. Have you seen this occur? Which cases? Are there better mergefuncs-independent canonicalizations that would handle such matching? i.e. can we improve canonicalization so that we don't need to do the matching here?

I want to make sure we don't unnecessarily slow down mergefuncs. We might at some point consider a "slower but more powerful" set of comparators, and "fast but good" ones. As I described on the mailing list last week, mergefuncs can help compile time if we run the simple one early, and way later run the complex one.

lib/Transforms/Utils/FunctionComparator.cpp
780

Please run the added code through clang-format.

928

How does your change behave when hashing? Hashing is an important part of mergefuncs avoiding O(n²) behavior.

unittests/Transforms/Utils/FunctionComparatorTest.cpp
163

You should test the more complex example you describe as well.
In general you should be testing all the binary ops that LLVM has.

172

I don't think you want the tests here, you instead want regular IR tests with FileCheck. See test/Transforms/MergeFunc/.

jfb added a subscriber: dexonsmith.Mar 19 2019, 9:24 AM

I was talking to @dexonsmith about this review, and we like where this will eventually get LLVM. However, what really makes me uneasy is that I'm not sure the example you're trying to optimize happens often enough to warrant the algorithmic complexity of the matching. Here are a few more thoughts:

  • In the limit, you're trying to prove that two function graphs are isomorphic and that's generally NP. We don't want that by default, though it might be interesting to go full NP at high optimization levels for a subset of problems that occur in real code (e.g. use some heuristic to determine when going NP might pay off and not take too long (whatever that means)).
  • Without doing full graph isomorphism, there are probably some limited graph canonicalizations that would generally pay off on real-world code (for mergefuncs and LLVM in general). That sounds like what you tried to do in your paper on sequence alignment, and I am worried about the 15% compile-time overhead there.
  • We want mergefuncs to help compile time, maybe even at -O0, so there needs to be a way to run mergefuncs early with very few complex matching rules.
  • We'd like mergefuncs to be on by default before trying to make it super powerful. Solving the engineering problems that prevent it from being on by default is a good amount of work, but the payoff is that super powerful optimizations will get way more testing and real-world measurements (both on compile-time and size improvements) than if mergefuncs is off.
jfb added a subscriber: vsk.Mar 19 2019, 9:52 AM

Can't we canonicalize to use the smaller index as the LHS?

v[0] + v[1] - ok
v[1] + v[0] - swap the indexes

jfb added a comment.Mar 19 2019, 10:28 AM

Can't we canonicalize to use the smaller index as the LHS?

v[0] + v[1] - ok
v[1] + v[0] - swap the indexes

For arrays that have constant indices sure, but in general no: it could be arrays with variable indices or just straight up variables and then we've got no truly natural ordering.

I understand your point about having a light-weight function merger (FM) that is always on to reduce code size and compilation time.
I agree that perhaps we could have a parametrized merger that enables/disables the more powerful comparisons.
That's why I also suggested that a more powerful merger (like my FM by Sequence Alignment) could be enabled only for heavier code-size optimizations (like -Oz), with a light-weight FM always enabled (or enabled for most optimization settings).

You're right that the general problem is NP, but we can use tricks similar to what SLP does, for example.
In SLP, we are also constantly fighting this trade-off between better isomorphism identification and compilation time.
SLP is also based on finding isomorphic graphs (mostly trees), and it also handles operand reordering for commutative operations: when buildTree_rec processes binary operations, the reordering is done by reorderInputsAccordingToOpcode.
In this patch, I'm kind of borrowing that from SLP.

Regarding the 15% overhead of my FMSA: that is true for my prototype implementation, but I believe the number can be reduced significantly.
For example, in the paper we still perform a lot of wasteful merges (80+%) that we throw away, because we always try to merge with the best candidate from the ranking mechanism.
In most cases even the top candidate is very dissimilar from the function being considered, so we could simply avoid trying to merge them.
We didn't try to improve that for the paper, but we could quickly discard bad candidates, similar to how the ranking itself works.
This, among other things, would definitely reduce the 15% average overhead.

I will look into the canonicalization pass.
I also need to read the discussions you shared with me on the mailing list in more detail. But I'm aware of the MergeSimilarFunctions work, as it is the state of the art that I use for comparison in my paper.

Nicola added a subscriber: Nicola.Apr 8 2019, 3:04 AM
aykevl added a subscriber: aykevl.May 16 2019, 5:42 PM