This is an archive of the discontinued LLVM Phabricator instance.

An analysis to find external function pointers and trace their data flow
Abandoned, Public

Authored by tmroeder on Feb 24 2014, 4:20 PM.

Details

Reviewers: None

Summary

This patch provides a new analysis: ExternalFunctionAnalysis (EFA). EFA is useful for my Control-Flow Integrity pass (I'm waiting to submit the CFI patch until EFA is dealt with), since it can inform CFI about potential false positives due to known external function pointers.

The end result of EFA is an analysis that can answer two questions:

is this Instruction maybe an indirect call that is calling a function that is not in the current Module?
does this Function maybe contain indirect calls that target functions that are not in the current Module?

These questions are useful when the "current Module" is the Module used during Link-Time Optimization (LTO), since LTO gathers as much of the code as possible into a single Module.

EFA looks for function pointers returned by functions external to the module it is analyzing, and it traces the dataflow of these incoming external function pointers to find places where they are stored or called in indirect function calls.

EFA also supports an attribute annotation: "efa-maybe-external". This can be applied to variables, pointers, or functions (using llvm.var.annotation, llvm.ptr.annotation, and llvm.global.annotations, respectively). EFA traces the dataflow from variables and pointers annotated with efa-maybe-external and finds store instructions and indirect calls. It tries to match these store locations with the stores found from the incoming external function pointer analysis. And it warns if external function pointers are being stored into non-annotated locations.
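As a rough illustration of how source code might carry this annotation, here is a minimal sketch. The macro name `EFA_MAYBE_EXTERNAL`, the `Callback` type, and the functions `twice`, `lookupCallback`, and `callThrough` are hypothetical and not part of the patch; only the annotation string and the `__attribute__((annotate(...)))` spelling come from the patch. The attribute is guarded because compilers other than Clang may not recognize it.

```cpp
#include <cassert>

// Only Clang is assumed to understand annotate(); elsewhere the macro expands
// to nothing so the example still compiles.
#if defined(__clang__)
#define EFA_MAYBE_EXTERNAL __attribute__((annotate("efa-maybe-external")))
#else
#define EFA_MAYBE_EXTERNAL
#endif

typedef int (*Callback)(int);

static int twice(int X) { return 2 * X; }

// Hypothetical stand-in for a lookup that could hand back a pointer to code
// outside the module (for example, something obtained through dlsym()).
static Callback lookupCallback() { return &twice; }

// Function-level annotation: this function may contain indirect calls to
// external code.
EFA_MAYBE_EXTERNAL int callThrough(int X) {
  // Variable-level annotation: this local may hold an external function
  // pointer, so the store into it and the call through it are of interest.
  EFA_MAYBE_EXTERNAL Callback CB = lookupCallback();
  assert(CB && "lookup returned a null function pointer");
  return CB(X); // indirect call through the annotated variable
}
```

Calling `callThrough(21)` simply forwards to `twice` here; the point is only where the annotations sit, not what the callee does.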

EFA could be generalized to an Analysis Group, since there's a trivial EFA that just returns false for both questions mentioned above: it simply assumes that no instructions target external function pointers, and no functions contain indirect calls to external function pointers.
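The trivial implementation described above could look roughly like the following sketch. It is standalone C++ rather than LLVM code: `Instruction` and `Function` are empty stand-ins for the LLVM classes, and the interface name `ExternalFunctionQueries`, the class `TrivialExternalFunctionQueries`, and the client `shouldInstrument` are hypothetical. The two query names mirror the ones in the patch header.

```cpp
#include <cassert>

// Empty stand-ins for llvm::Instruction and llvm::Function (assumption: the
// real classes are not needed to show the shape of the interface).
struct Instruction {};
struct Function {};

// The two questions from the summary, expressed as an abstract interface.
class ExternalFunctionQueries {
public:
  virtual ~ExternalFunctionQueries() = default;
  virtual bool maybeIsExternalCall(const Instruction *I) const = 0;
  virtual bool maybeContainsExternalCall(const Function *F) const = 0;
};

// The trivial member of the group: it assumes nothing is external and so
// answers false to both questions.
class TrivialExternalFunctionQueries : public ExternalFunctionQueries {
public:
  bool maybeIsExternalCall(const Instruction *) const override {
    return false;
  }
  bool maybeContainsExternalCall(const Function *) const override {
    return false;
  }
};

// Hypothetical client: a CFI-like pass might check a call site only when the
// analysis does not flag it as possibly external.
bool shouldInstrument(const ExternalFunctionQueries &Q, const Instruction *I) {
  assert(I && "expected a non-null instruction");
  return !Q.maybeIsExternalCall(I);
}
```

With the trivial implementation, `shouldInstrument` returns true for every instruction, which matches the behavior of a CFI pass that has no external-pointer information at all.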

Diff Detail

Event Timeline

First pass comments.

Overall LLVM has had a lot of churn w.r.t. C++11, so you may want to rebase and use the new C++11 features since it'll make some of the code easier to read (and it won't need updating).

include/llvm/Analysis/ExternalFunctionAnalysis.h
11	It would be nice to have a better explanation of the algorithm and its use here, or on the class. You had some good information in your email.
22	I'll let others comment here, but it seems like LLVM likes using its own data structures for some use cases in the Programmer's Manual. I don't think you need SetVector<Value*> since AFAICT you don't traverse the set, but you may want DenseMap.
49	virtual
76	It seems weird to call this and the next two functions "get*" when they don't return anything. Compute instead?
lib/Analysis/ExternalFunctionAnalysis.cpp
43	Put in anonymous namespace since it's local to the file.
54	You should probably hoist the above strings with efa_annotation since it's used in other parts of the file.
181	S->getValueOperand()
185	S->getPointerOperand()
199	This could be a constant if the declaration were an array, and you used array_lengthof.
418	You should use include/llvm/Target/TargetLibraryInfo.h. You're also missing the nothrow variant.
433	Why not malloc too? Is this just a performance thing, or does it impact correctness?

jfb added inline comments. Mar 10 2014, 4:20 PM

include/llvm/Analysis/ExternalFunctionAnalysis.h
2	s/cpp/h/

This diff makes changes as suggested by the last review. In summary:

Fixed the minor points noted in JF's comments
C++11-ified loops and such. Please let me know if there's other C++11-ification I should do here
Switched to using TargetLibraryInfo instead of hardcoded strings to check for memory-allocation functions.
Switched to using LLVM ADT as much as possible; I still need std::list in a couple places for the iterator guarantees it gives
The switch to C++11 range loops forced a bunch of const throughout the code. This probably should have been there in the first place.
Added explanatory docs on the class.
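The iterator guarantee mentioned above can be sketched outside LLVM as follows. This is an illustrative worklist over a made-up def-use graph, not the patch's code: `UseGraph`, `traceWithList`, and `traceWithVector` are hypothetical names. The first function relies on the fact that `std::list::push_back` never invalidates existing iterators; the second gets the same traversal from a `std::vector` by walking it with an index, since `push_back` on a vector may reallocate and invalidate iterators.

```cpp
#include <cassert> // for the equality check in the usage note
#include <cstddef>
#include <list>
#include <map>
#include <set>
#include <vector>

// Hypothetical def-use graph: each value id maps to the ids of its users.
using UseGraph = std::map<int, std::vector<int>>;

// Worklist on std::list: appending while iterating is safe because list
// iterators stay valid across push_back.
std::vector<int> traceWithList(const UseGraph &G, int Root) {
  std::list<int> Work{Root};
  std::set<int> Seen{Root};
  std::vector<int> Order;
  for (auto It = Work.begin(); It != Work.end(); ++It) {
    Order.push_back(*It);
    auto Uses = G.find(*It);
    if (Uses == G.end())
      continue;
    for (int U : Uses->second)
      if (Seen.insert(U).second) // only follow values not seen before
        Work.push_back(U);
  }
  return Order;
}

// Worklist on std::vector: an index survives reallocation, an iterator
// would not.
std::vector<int> traceWithVector(const UseGraph &G, int Root) {
  std::vector<int> Work{Root};
  std::set<int> Seen{Root};
  for (std::size_t I = 0; I != Work.size(); ++I) {
    int Cur = Work[I]; // copy before push_back can move the storage
    auto Uses = G.find(Cur);
    if (Uses == G.end())
      continue;
    for (int U : Uses->second)
      if (Seen.insert(U).second)
        Work.push_back(U);
  }
  return Work;
}
```

For any graph `G` and root `R`, `assert(traceWithList(G, R) == traceWithVector(G, R))` should hold, including on graphs with cycles, because both versions share the same seen-set logic.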

tmroeder added inline comments. Mar 12 2014, 12:21 PM

lib/Analysis/ExternalFunctionAnalysis.cpp
433	Memory allocation functions are a frequent source of false positives, so the code tries to skip them. I see what you mean though, about malloc, and in the next revision, I've switched to using TargetLibraryInfo and adding other memory allocation functions from it.

jfb added inline comments. Mar 12 2014, 4:58 PM

lib/Analysis/ExternalFunctionAnalysis.cpp
132	Doesn't the algorithm become imprecise if you don't follow vaarg arguments? AFAICT the call may just fail? Realistically it's probably just printf, but it would be good to document what happens with:

    void ugly(size_t n, ...) {
      typedef void (*F)();
      va_list list;
      va_start(list, n);
      for (size_t i = 0; i != n; ++i) {
        F f = va_arg(list, F);
        (*f)();
      }
      va_end(list);
    }
    void callee() { printf("Callee\n"); }
    int main() { ugly(1, &callee); }

Maybe add a test if it's expected to work?
141	This could just be a cast<> and get the assert for free.
237	I'm not sure I understand what the bad implication is here, could you improve the warning to explain? Is it a kind of taint tracking thing, where it's hard to know which maybe-external the pointer now refers to?
252	This should be array_lengthof(efa_annotation) - 1? The array contains the null terminator. The same applies a few places below.
261	The assert is redundant when using cast<>.
268	You could build a local StringRef(efa_annotation, annotation_len) and then just use operator== here. The same applies below.
290	Redundant.
300	Redundant.
302	Redundant.
328	3 redundant asserts above.

This patch removes several asserts and fixes string handling, as suggested. It also removes some extra unnecessary/unused code from the tests.

lib/Analysis/ExternalFunctionAnalysis.cpp
132	Yes, the algorithm becomes (more) imprecise by not following var args. The algorithm does not guarantee that it finds all the dataflow for external function pointers, only that it finds some. In fact, it's relatively easy to write code that the analysis misses even without considering var args: just do pointer arithmetic in C. The analysis doesn't trace dataflow through arithmetic operations, so it will lose track of that external pointer. In this case, not following the var args means that the analysis will miss places that external functions might be called, and they will become false positives when this analysis is used for Control-flow integrity. The CFI failure function will determine in that case what happens to the call. It needn't necessarily fail. Note that the case in the code above will not cause a call to fail or be a false positive, since callee is defined in the module. The only time this matters is when code is passing an external function pointer (e.g., from dlsym) to a var args call that then calls this function pointer. So, I think it's probably not worth the extra complexity to make the analysis more precise on an unusual corner case.
237	Yes, it's a matter of the precision of the analysis: this means that a maybe-external has been stored into a pointer that we're not tracking. It's just a warning, though, and is only a DEBUG warning, since imprecision of this algorithm is almost guaranteed in any complex program. I've updated the warning to try to make this more clear.
252	It turns out that the initializer length in the IR also contains the null terminator, so both strings end up with the same length this way.
261	The assert is for getOperand rather than the cast. But maybe getOperand can't return null. It will certainly assert if its argument is out of range.
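The length question in the line 252 exchange comes down to whether the trailing null byte is counted. The sketch below uses `std::string` as a stand-in for `llvm::StringRef` (an assumption made so the example has no LLVM dependency); both compare lengths as part of equality, which is the property that matters here. The helper names `withTerminator`, `withoutTerminator`, and `fromCString` are hypothetical.

```cpp
#include <cassert> // for the checks in the usage note
#include <cstring>
#include <string>

static const char efa_annotation[] = "efa-maybe-external";

// Built with the full array length: the trailing '\0' becomes part of the
// string's contents.
std::string withTerminator() {
  return std::string(efa_annotation, sizeof(efa_annotation));
}

// Built with the array length minus one: the terminator is excluded.
std::string withoutTerminator() {
  return std::string(efa_annotation, sizeof(efa_annotation) - 1);
}

// Built from a plain C string: the length comes from strlen, which stops
// before the terminator.
std::string fromCString() { return std::string(efa_annotation); }
```

Two strings that both count the terminator compare equal, and so do two that both exclude it, but mixing the conventions fails: `withTerminator() != fromCString()` while `withoutTerminator() == fromCString()`.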

Looks good overall, but it would be great to have someone more seasoned with LLVM reviewing this.

lib/Analysis/ExternalFunctionAnalysis.cpp
252	Hmm, weird. It still seems wrong to initialize the StringRef below counting the null terminator, instead of excluding it. Isn't the StringRef str below initialized with just the const char * constructor for StringRef? That would then set Length with strlen on Data, and the length would mismatch that of StringRef efa (because StringRef::equal first compares length, and then compares memory). Anyhow, I may be confused, but I'm surprised when string comparison counts the null terminator too; it seems like it could break in odd ways if one of the strings' lengths stops including the null terminator.
261	Oh right, I was over-zealous on cast<>.
278	Unrelated to your change, but I was sad that StringRef can't have the following constructor: template<size_t N> StringRef(const char (&Data)[N]) : StringRef(Data, N) { } Templates don't participate in overload resolution if there is a best match nontemplate overload.
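The overload-resolution behavior behind the line 278 remark can be reproduced with a small standalone type. `Ref` below is a hypothetical stand-in for `StringRef`, with a flag added purely to record which constructor ran; unlike the constructor quoted in the comment, the array constructor here uses `N - 1` so that the length excludes the terminator. For a string literal argument both constructors are exact matches, and the tie is broken in favor of the non-template one.

```cpp
#include <cassert> // for the checks in the usage note
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for llvm::StringRef, instrumented to show which
// constructor overload resolution picked.
struct Ref {
  const char *Data;
  std::size_t Length;
  bool UsedArrayCtor;

  // Non-template constructor: length computed at run time with strlen.
  Ref(const char *Str)
      : Data(Str), Length(std::strlen(Str)), UsedArrayCtor(false) {}

  // Template constructor: length known at compile time from the array bound.
  template <std::size_t N>
  Ref(const char (&Str)[N]) : Data(Str), Length(N - 1), UsedArrayCtor(true) {}
};
```

Constructing `Ref R("abc")` leaves `R.UsedArrayCtor` false and `R.Length` equal to 3: the array-to-pointer conversion still ranks as an exact match, so the template never gets selected for this call.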

I tried to request a reviewer earlier today, but I didn't see any email sent about it. Apologies in advance if I somehow missed it and this is effectively a duplicate email.

Eric: this is a prerequisite for the implementation of Control-Flow Integrity (CFI) code we were discussing on llvmdev; this helps find external function pointers that can cause false positives in CFI. Can you please take a look and tell me what you think? Or can you point me at the right person to review this if it's not you?

I think that, as far as the pass infrastructure and analysis passes in general go, Chandler is going to be the best person to review this.

echristo resigned from this revision. Apr 22 2014, 9:36 AM

echristo removed a reviewer: echristo.

ping

Just a few comments from looking at it once, this isn't a complete review yet.

lib/Analysis/ExternalFunctionAnalysis.cpp
164–168	llvm::isAllocationFn? It's in include/llvm/Analysis/MemoryBuiltins.h and there are other similar functions in case that one isn't quite right.
224–225	If I is not null then I->getParent() is guaranteed* to be a BasicBlock. The only way it isn't guaranteed is if your pass is creating instructions and not inserting them into BasicBlocks (or taking instructions out of BBs and not deleting them). At the start and finish of every pass, this property holds. I think what you're referring to as instructions that appear as subexpressions are actually ConstantExpr's.
283	I claim this is always false. The only way an operand can be null is if the object whose operands we're looking at is metadata, and metadata never shows up as a user of another Value.
448–449	Why not if (!F->isDeclaration()) without the explicit cast?

This fixes comments from the recent review

nlewycky added a subscriber: nlewycky. Aug 5 2014, 8:28 PM

nlewycky added inline comments.

include/llvm/Analysis/ExternalFunctionAnalysis.h
38–39	This means "does this Function contain any Instruction matching case #1" right?
47–50	There's three possible states: certainly not calling an external function, certainly calling an external function, and who knows. The previous paragraph ("1. Is this Instruction an indirect call that is maybe calling a function that is not defined or declared in the current Module?") suggests that you answer "false" only when certainly not calling into an external function, and "true" when calling into an external function or you're not sure. This paragraph suggests that you're going to find external function pointers and try to track their uses, and that's only going to help you solve cases which are certainly calling an external function. Which way is it? I can't review the dataflow part of this patch without understanding this.
89–92	In C++ "struct SourcePair { ... };" will suffice.
lib/Analysis/ExternalFunctionAnalysis.cpp
1	No need for the emacs major mode marker on a .cpp file, just on the .h files.
71–82	This always returns an Argument*, any reason not to make that clear in the type? Why not use std::advance? I think this becomes: static const Argument *getParameterForArgument(unsigned ArgNo, const Function *F) { if (ArgNo < F->arg_size()) return nullptr; return std::advance(F->arg_begin(), ArgNo); } but I haven't tried to compile that.
132–134	The form: if (const Value *Arg = getParameterForArgument(ArgNo, CalledFun)) followValue(Arg, FoundValues, SeenValues); is very common in LLVM.
299–300	It's possible to do that with a vector too, you just need to keep an index number (I assume you only add to the back).
372–373	Missing blank line.
392	Don't create a new one here, use the one that was created for the given target and module being compiled: const TargetLibraryInfo *TLI = P->getAnalysisIfAvailable<TargetLibraryInfo>();
424	TargetLibraryInfo supports a whole ton of different functions. Do you just care about allocation functions or any recognized library function (strlen, etc.)?
test/Analysis/ExternalFunctionAnalysis/external_function_analysis.ll
1	That shouldn't be necessary, opt works on .ll files as well as .bc files.
2	If you plan to discard %t2, use -disable-output instead?
2	Please don't use -debug-only=efa here, it makes your test only work on debug builds of llvm and not optimized builds. Instead, you could implement Pass::print() in your pass and then use that via "opt -analyze". This is the same thing that the "opt -analyze -scalar-evolution" tests do with the ScalarEvolution analysis.
150	Why?
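The getParameterForArgument suggestion in the comment on lines 71–82 was explicitly untested, and as written it would not compile or behave as intended: `std::advance` modifies its iterator in place and returns `void`, and the bounds check should reject indices at or beyond `arg_size()` rather than below it. A version of the same idea using `std::next` might look like the sketch below. `Argument` and `Function` here are minimal hypothetical stand-ins, not the LLVM classes, and only the members the helper touches are modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>

// Minimal stand-ins for llvm::Argument and llvm::Function (assumption: a
// function owns a sequence of arguments reachable through arg_begin()).
struct Argument {
  int Id;
};

struct Function {
  std::list<Argument> Args;
  std::size_t arg_size() const { return Args.size(); }
  std::list<Argument>::const_iterator arg_begin() const {
    return Args.begin();
  }
};

// Returns the formal parameter at position ArgNo, or nullptr when ArgNo is
// out of range (for example, an argument in the "..." part of a varargs
// call).
static const Argument *getParameterForArgument(unsigned ArgNo,
                                               const Function *F) {
  assert(F && "expected a non-null function");
  if (ArgNo >= F->arg_size())
    return nullptr;
  // std::next returns the advanced iterator; std::advance would return void.
  return &*std::next(F->arg_begin(), ArgNo);
}
```

Returning `const Argument *` rather than `const Value *` keeps the more specific type visible to callers, which is the other half of the review comment.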

amanone added a subscriber: amanone. Nov 6 2014, 6:35 AM

chandlerc removed a reviewer: chandlerc. Mar 29 2015, 11:36 AM

This is an old revision that is no longer needed (the implementation of CFI moved on long ago).

Herald added subscribers: dexonsmith, mehdi_amini, mgorny. Nov 16 2018, 4:00 PM

Revision Contents

Path	Size
include/llvm/Analysis/ExternalFunctionAnalysis.h	139 lines
include/llvm/InitializePasses.h	1 line
lib/Analysis/Analysis.cpp	1 line
lib/Analysis/CMakeLists.txt	1 line
lib/Analysis/ExternalFunctionAnalysis.cpp	458 lines
test/Analysis/ExternalFunctionAnalysis/external_function_analysis.ll	159 lines

Diff 11586

include/llvm/Analysis/ExternalFunctionAnalysis.h

This file was added.

				//=- ExternalFunctionAnalysis.h: Find external function pointers -- C++ --==//
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				///
				/// \file
				/// \brief A pass that finds incoming external function pointers and finds
				/// annotated storage locations and indirect calls based on these locations.
				///
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_ANALYSIS_EXTERNALFUNCTIONANALYSIS_H_
				#define LLVM_ANALYSIS_EXTERNALFUNCTIONANALYSIS_H_

				#include "llvm/ADT/DenseMap.h"
				#include "llvm/ADT/DenseSet.h"
				#include "llvm/Pass.h"

				namespace llvm {

				class AnalysisUsage;
				class CallInst;
				class CallSite;
				class Function;
				class GlobalVariable;
				class Instruction;
				class StoreInst;
				class Value;

				/// External-Function Analysis (EFA) finds external function pointers and
				/// related stores and indirect calls.
				///
				/// EFA can answer two questions after it has run on a Module:
				/// 1. Is this Instruction an indirect call that is maybe calling a function
				/// that is not defined or declared in the current Module?
				/// 2. Does this Function contain indirect calls that maybe target functions
				/// that are not defined or declared in the current Module?
				///
				/// Answers to these questions are more useful when the current Module is the
				/// Module used during Link-Time Optimization (LTO), since LTO gathers as much
				/// of the code as possible into a single Module. This pass is useful in
				/// particular for Control-Flow Integrity (CFI), since CFI passes often need to
				/// find indirect calls that might be made through external function pointers.
				///
				/// EFA looks for function pointers returned by functions external to the module
				/// it is analyzing, and it traces the dataflow of these incoming external
				/// function pointers to find places where they are stored or called in indirect
				/// function calls.
				///
				/// EFA also supports an attribute annotation: "efa-maybe-external". This can be
				/// applied to variables, pointers, or functions (using llvm.var.annotation,
				/// llvm.ptr.annotation, and llvm.global.annotations, respectively). EFA traces
				/// the dataflow from variables and pointers annotated with efa-maybe-external
				/// and finds store instructions and indirect calls. It tries to match these
				/// store locations with the stores found from the incoming external function
				/// pointer analysis. And it warns if external function pointers are being
				/// stored into non-annotated locations.
				class ExternalFunctionAnalysis : public ModulePass {

				ExternalFunctionAnalysis(const ExternalFunctionAnalysis &)
				LLVM_DELETED_FUNCTION;

				ExternalFunctionAnalysis &
				operator=(const ExternalFunctionAnalysis &) LLVM_DELETED_FUNCTION;

				public:
				static char ID;
				ExternalFunctionAnalysis() : ModulePass(ID) {
				initializeExternalFunctionAnalysisPass(*PassRegistry::getPassRegistry());
				}

				virtual ~ExternalFunctionAnalysis() {}

				bool runOnModule(Module &M);
				void getAnalysisUsage(AnalysisUsage &AU) const;
				const char *getPassName() const { return "ExternalFunctionAnalysis"; }

				/// Analyzes an instruction to see if it might be an indirect call to an
				/// external function pointer.
				bool maybeIsExternalCall(const Instruction *I);

				/// Analyzes a function to see if it was annotated to say it might contain
				/// indirect external calls.
				bool maybeContainsExternalCall(const Function *F);

				private:
				typedef struct {
				const Function *Source;
				const Function *Caller;
				} SourcePair;

				typedef DenseSet<const Instruction *> InstructionSet;
				typedef DenseSet<const StoreInst *> StoreSet;
				typedef DenseSet<const Function *> FunctionSet;
				typedef DenseMap<const StoreInst *, SourcePair> StoreSources;

				StoreSet MaybeExternalStores;
				FunctionSet MaybeExternalFuns;
				InstructionSet MaybeExternalCalls;

				/// Gets indirect call/invoke instructions that came from values that
				/// were annotated with __attribute__((annotate("efa-maybe-external"))).
				void computeMaybeExternalPtrInstrs(Module &M);

				/// Gets indirect call/invoke instructions that came from values that
				/// were annotated with __attribute__((annotate("efa-maybe-external"))). Also
				/// finds places where values are stored into these variables.
				void computeMaybeExternalVarInstrs(Module &M);

				/// Gets indirect call/invoked instructions that are in functions that
				/// are annotated with __attribute__((annotate("efa-maybe-external"))). Also
				/// gets call instructions that flow from annotated global function pointer
				/// variables.
				void computeMaybeExternalFuns(Module &M);

				/// Finds calls to the given GlobalVariable and finds stores into this
				/// variable.
				void findCalls(const GlobalVariable *GV);

				/// Finds call instructions and function types for each call that returns an
				/// external function pointer.
				void findExternalFunctionPointers(const Module &M);

				/// Finds store instructions that flow from a function pointer in the given
				/// instruction (and are derived from a call to the given function).
				void findFPStores(const Function *F, const Instruction *I,
				StoreSources &FPStores);

				/// Walks the chain of uses from a Value and adds any call instructions in
				/// chain to the Instrs set.
				void findRelatedInstrs(const Value *Val);
				};

				ModulePass *createExternalFunctionAnalysisPass();
				}

				#endif /* LLVM_ANALYSIS_EXTERNALFUNCTIONANALYSIS_H_ */

include/llvm/InitializePasses.h

Context not available.
 void initializeCFGOnlyViewerPass(PassRegistry&);
 void initializeCFGPrinterPass(PassRegistry&);
 void initializeCFGSimplifyPassPass(PassRegistry&);
+void initializeExternalFunctionAnalysisPass(PassRegistry&);
 void initializeFlattenCFGPassPass(PassRegistry&);
 void initializeStructurizeCFGPass(PassRegistry&);
 void initializeCFGViewerPass(PassRegistry&);
Context not available.

lib/Analysis/Analysis.cpp

Context not available.
 initializeDomViewerPass(Registry);
 initializeDomPrinterPass(Registry);
 initializeDomOnlyViewerPass(Registry);
+initializeExternalFunctionAnalysisPass(Registry);
 initializePostDomViewerPass(Registry);
 initializeDomOnlyPrinterPass(Registry);
 initializePostDomPrinterPass(Registry);
Context not available.

lib/Analysis/CMakeLists.txt

Context not available.
 DependenceAnalysis.cpp
 DomPrinter.cpp
 DominanceFrontier.cpp
+ExternalFunctionAnalysis.cpp
 IVUsers.cpp
 InstCount.cpp
 InstructionSimplify.cpp
Context not available.

lib/Analysis/ExternalFunctionAnalysis.cpp

This file was added.

				//=- ExternalFunctionAnalysis.cpp: Find external function pointers -- C++ --//
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				///
				/// \file
				/// \brief A pass that finds incoming external function pointers and finds
				/// annotated storage locations and indirect calls based on these locations.
				///
				//===----------------------------------------------------------------------===//

				#define DEBUG_TYPE "efa"
				#include "llvm/Analysis/ExternalFunctionAnalysis.h"

				#include "llvm/ADT/Statistic.h"
				#include "llvm/ADT/STLExtras.h"
				#include "llvm/Analysis/MemoryBuiltins.h"
				#include "llvm/IR/CallSite.h"
				#include "llvm/IR/Constants.h"
				#include "llvm/IR/DerivedTypes.h"
				#include "llvm/IR/Function.h"
				#include "llvm/IR/GlobalValue.h"
				#include "llvm/IR/Instructions.h"
				#include "llvm/IR/LLVMContext.h"
				#include "llvm/IR/Module.h"
				#include "llvm/Support/Debug.h"
				#include "llvm/Support/raw_ostream.h"
				#include "llvm/Target/TargetLibraryInfo.h"

				#include <list>

				using namespace llvm;

				STATISTIC(NumMaybeExternalCalls, "Number of indirect call sites maybe using"
				" external function pointers");
				STATISTIC(NumMaybeExternalStores, "Number of store instructions into annotated"
				" locations");
				STATISTIC(NumAnnotatedFunctions, "Number of functions annotated as maybe"
				" containing indirect calls to external code");

				char ExternalFunctionAnalysis::ID = 0;
				INITIALIZE_PASS(ExternalFunctionAnalysis, "efa", "ExternalFunctionAnalysis",
				true, true)

				ModulePass *llvm::createExternalFunctionAnalysisPass() {
				return new ExternalFunctionAnalysis();
				}

				// This is the annotation string used to help the analysis.
				static const char efa_annotation[] = "efa-maybe-external";

				// These are the names of llvm-added external functions that appear when code is
				// annotated with __attribute__((annotate(...))).
				static const char llvm_var_annotation[] = "llvm.var.annotation";
				static const char llvm_ptr_annotation[] = "llvm.ptr.annotation";
				static const char llvm_ptr_annotation_p0i8[] = "llvm.ptr.annotation.p0i8";
				static const char llvm_global_annotations[] = "llvm.global.annotations";

				// Helper functions

				static bool isLLVMExternal(const Function *F) {
				StringRef FunName(F->getName());
				StringRef VarAnnotation(llvm_var_annotation);
				StringRef PtrAnnotation(llvm_ptr_annotation);
				return (FunName.startswith(VarAnnotation) ||
				FunName.startswith(PtrAnnotation));
				}

				static const Value *getParameterForArgument(unsigned ArgNo, const Function *F) {
				unsigned count = 0;
				Function::const_arg_iterator FAI, FAE;
				for (FAI = F->arg_begin(), FAE = F->arg_end(); FAI != FAE; ++FAI, ++count) {
				if (count == ArgNo) {
				const Value *Arg = FAI;
				return Arg;
				}
				}

				return NULL;
				}

				static void followValue(const Value *Val, std::list<const Value *> &FoundValues,
				DenseSet<const Value *> &SeenValues) {
				if (SeenValues.find(Val) == SeenValues.end()) {
				SeenValues.insert(Val);
				FoundValues.push_back(Val);
				}
				}

				// Follows a given call instruction that is either calling Val or is passing Val
				// in one of its arguments. If Val is being called, then add this to the set of
				// calls that might use an external function pointer. Otherwise, trace the
				// argument of a direct call down into the function itself and add the argument
				// to the list of values to follow.
				static void followCall(const CallInst *CI, const Use *U, const Value *Val,
				std::list<const Value *> &FoundValues,
				DenseSet<const Value *> &SeenValues,
				DenseSet<const Instruction *> &MaybeExternalCalls) {
				if (CI->getCalledValue() == Val) {
				MaybeExternalCalls.insert(CI);
				++NumMaybeExternalCalls;
				return;
				}

				// It must be one of the operands. So, add the operand in the called
				// function if this is a direct call to a function defined in this
				// module.
				Function *CalledFun = CI->getCalledFunction();
				if (!CalledFun)
				return;

				if (CalledFun->isDeclaration()) {
				if (isLLVMExternal(CalledFun)) {
				// An LLVM external like llvm.ptr.annotation or
				// llvm.var.annotation is like a cast instruction in effect.
				followValue(Val, FoundValues, SeenValues);
				}

				return;
				}

				ImmutableCallSite ICS(CI);
				if (!ICS.hasArgument(Val))
				return;

				unsigned ArgNo = ICS.getArgumentNo(U);

				// This can return NULL if CalledFun is a VarArg function, and the
				// argument we want is in the "..." part.
				const Value *Arg = getParameterForArgument(ArgNo, CalledFun);
				if (Arg)
				followValue(Arg, FoundValues, SeenValues);
				}
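
The var args case from the thread above can be written as a standalone program. This is a minimal sketch using only the C++ standard library; CalleeCount is a hypothetical counter added so the indirect calls can be observed. Because callee is defined in the same translation unit, this corresponds to the case that does not produce a false positive. The problematic case would instead pass a pointer obtained from outside the module (e.g., from dlsym).

```cpp
#include <cassert>
#include <cstdarg>
#include <cstddef>
#include <cstdio>

typedef void (*F)();

// Hypothetical counter, added only so the indirect calls can be observed.
static int CalleeCount = 0;

// Defined in this translation unit, so it is not an external function.
void callee() {
  ++CalleeCount;
  std::printf("Callee\n");
}

// Calls each of the n function pointers passed as variadic arguments.
void ugly(std::size_t n, ...) {
  std::va_list list;
  va_start(list, n);
  for (std::size_t i = 0; i != n; ++i) {
    F f = va_arg(list, F);
    assert(f && "expected a non-null function pointer");
    f();
  }
  va_end(list);
}
```

Calling ugly(2, &callee, &callee) invokes callee twice through pointers that reach the indirect call only via va_arg, which is the dataflow the analysis does not follow.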

				static const Function *getFunctionParent(const Value *Val) {
				// Val must be an Instruction; return the Function that contains it.
				const Instruction *I = cast<Instruction>(Val);
				const BasicBlock *BB = I->getParent();
				assert(BB && "Couldn't get the parent of an instruction");
				jfb: This could just be a cast<> and get the assert for free.

				const Function *ParentFun = BB->getParent();
				assert(ParentFun && "Couldn't get the function parent of a BasicBlock");
				return ParentFun;
				}

				// Follows a return instruction in Val by finding places it might return and
				// adding them to the list of values to follow (in FoundValues), if they haven't
				// been seen before.
				static void followReturn(const Value *Val,
				std::list<const Value *> &FoundValues,
				DenseSet<const Value *> &SeenValues) {
				const Function *ParentFun = getFunctionParent(Val);
				Function::const_use_iterator PFI, PFE;
				for (PFI = ParentFun->use_begin(), PFE = ParentFun->use_end(); PFI != PFE;
				++PFI) {
				const Use &PFU = *PFI;
				const User *PFUs = PFU.getUser();
				followValue(cast<Value>(PFUs), FoundValues, SeenValues);
				}
				}

				// Member functions

				void ExternalFunctionAnalysis::getAnalysisUsage(AnalysisUsage &AU) const {
				AU.setPreservesAll();
				}
				nicholas: llvm::isAllocationFn? It's in include/llvm/Analysis/MemoryBuiltins.h and there are other similar functions in case that one isn't quite right.

				void ExternalFunctionAnalysis::findRelatedInstrs(const Value *Val) {
				// Search all related values and the instructions that use them. This is
				// a very restricted data-flow analysis.

				const Function *F = getFunctionParent(Val);
				// We can't use any of the vector classes here, since iterators must still be
				// valid after push_back, which isn't guaranteed by a vector.
				std::list<const Value *> Vals;
				Vals.push_back(Val);
				DenseSet<const Value *> SeenValues;
				SeenValues.insert(Val);

				jfb: S->getValueOperand()
				for (const Value *V : Vals) {
				for (const Use &U : V->uses()) {
				const User *Us = U.getUser();
				if (isa<CastInst>(Us) || isa<LoadInst>(Us) ||
				jfb: S->getPointerOperand()
				isa<GetElementPtrInst>(Us)) {
				followValue(cast<Value>(Us), Vals, SeenValues);
				} else if (const CallInst *CI = dyn_cast<CallInst>(Us)) {
				followCall(CI, &U, V, Vals, SeenValues, MaybeExternalCalls);
				} else if (const StoreInst *S = dyn_cast<StoreInst>(Us)) {
				if (S->getValueOperand() == V) {
				DEBUG(dbgs() << "Warning: in " << F->getName() << " an"
				<< " efa-maybe-external pointer gets stored into another"
				<< " pointer; the analysis will not track this new"
				<< " pointer.\n");
				} else if (S->getPointerOperand() == V) {
				MaybeExternalStores.insert(S);
				++NumMaybeExternalStores;
				}
				jfb: This could be a constant if the declaration were an array, and you used array_lengthof.
				}
				}
				}
				}

				void ExternalFunctionAnalysis::computeMaybeExternalPtrInstrs(Module &M) {
				const Function *PtrAnnotation = M.getFunction(llvm_ptr_annotation_p0i8);
				if (!PtrAnnotation)
				return;

				size_t annotation_len = array_lengthof(efa_annotation);
				StringRef efa(efa_annotation, annotation_len);
				for (const auto &Us : PtrAnnotation->users()) {
				// The second operand should say "efa-maybe-external" for this to be the
				// right kind of annotation.

				const Value *OpVal = Us->getOperand(1);
				const Value *V = cast<User>(OpVal)->getOperand(0);
				const ConstantDataSequential *StrInit =
				cast<ConstantDataSequential>(cast<GlobalVariable>(V)->getInitializer());
				StringRef str = StrInit->getAsString();
				if (efa == str) {
				// The llvm.ptr.annotation returns a value that replaces the first arg.
				const Value *VPtr = cast<Value>(Us);
				findRelatedInstrs(VPtr);
				}
				nicholas: If I is not null then I->getParent() is guaranteed* to be a BasicBlock. * The only way it isn't guaranteed is if your pass is creating instructions and not inserting them into BasicBlocks (or taking instructions out of BBs and not deleting them). At the start and finish of every pass, this property holds. I think what you're referring to as instructions that appear as subexpressions are actually ConstantExprs.
				}
				}

				void ExternalFunctionAnalysis::computeMaybeExternalVarInstrs(Module &M) {
				Function *VarAnnotation = M.getFunction(llvm_var_annotation);
				if (!VarAnnotation)
				return;

				size_t annotation_len = array_lengthof(efa_annotation);
				StringRef efa(efa_annotation, annotation_len);
				for (const auto &Us : VarAnnotation->users()) {
				// The second operand should say "efa-maybe-external" for this to be the
				jfb: I'm not sure I understand what the bad implication is here, could you improve the warning to explain? Is it a kind of taint tracking thing, where it's hard to know which maybe-external the pointer now refers to?
				tmroeder (Author): Yes, it's a matter of the precision of the analysis: this means that a maybe-external has been stored into a pointer that we're not tracking. It's just a warning, though, and is only a DEBUG warning, since imprecision of this algorithm is almost guaranteed in any complex program. I've updated the warning to try to make this more clear.
				// right kind of annotation.
				const Value *OpVal = Us->getOperand(1);
				const Value *V = cast<User>(OpVal)->getOperand(0);
				const ConstantDataSequential *StrInit =
				cast<ConstantDataSequential>(cast<GlobalVariable>(V)->getInitializer());
				StringRef str = StrInit->getAsString();
				if (efa == str) {
				// The llvm.var.annotation does not return a value, but the pointer it
				// annotates is its first argument.
				const Value *VPtr = cast<User>(Us->getOperand(0))->getOperand(0);
				findRelatedInstrs(VPtr);
				}
				}
				}

				jfb: This should be array_lengthof(efa_annotation) - 1? The array contains the null terminator. The same applies a few places below.
				tmroeder (Author): It turns out that the initializer length in the IR also contains the null terminator, so both strings end up with the same length this way.
				jfb: Hmm, weird. It still seems wrong to initialize the StringRef below counting the null terminator, instead of excluding it. Isn't the StringRef str below initialized with just the const char * constructor for StringRef? That would then set Length with strlen on Data, and the length would mismatch that of StringRef efa (because StringRef::equals first compares length, and then compares memory). Anyhow, I may be confused, but I'm surprised when string comparison counts the null terminator too; it seems like it could break in odd ways if one of the strings' lengths stops containing the null terminator.
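
The length question in this thread can be checked outside LLVM. The sketch below uses std::string as a stand-in for StringRef (an assumption made for illustration, not the LLVM API) and a hypothetical arrayLength helper in place of array_lengthof. It shows that a string built from the full array length carries the terminator as its last character, and so compares unequal to one built from the C-string length.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

static const char efa_annotation[] = "efa-maybe-external";

// Hypothetical stand-in for array_lengthof: the element count of a char
// array, which for a string literal includes the null terminator.
template <std::size_t N>
constexpr std::size_t arrayLength(const char (&)[N]) {
  return N;
}

// Built from the full array length, so the terminator is part of the data.
std::string withTerminator() {
  const std::size_t Len = arrayLength(efa_annotation);
  assert(efa_annotation[Len - 1] == '\0');
  return std::string(efa_annotation, Len);
}

// Built from the C string, so the length stops before the terminator.
std::string withoutTerminator() {
  return std::string(efa_annotation);
}
```

The two results differ only in the trailing terminator, so an equality test succeeds only when both sides follow the same convention. That matches the [19 x i8] initializer for "efa-maybe-external" in the test file, which also counts the terminator.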
				void ExternalFunctionAnalysis::computeMaybeExternalFuns(Module &M) {
				GlobalVariable *Annotations = M.getNamedGlobal(llvm_global_annotations);
				if (!Annotations)
				return;

				size_t annotation_len = array_lengthof(efa_annotation);
				StringRef efa(efa_annotation, annotation_len);
				Constant *Init = Annotations->getInitializer();
				assert(Init && "No initializer for global annotations");
				jfb: The assert is redundant when using cast<>.
				tmroeder (Author): The assert is for getOperand rather than the cast. But maybe getOperand can't return null. It will certainly assert if its argument is out of range.
				jfb: Oh right, I was over-zealous on cast<>.
				for (const auto &V : Init->operand_values()) {
				const User *U = cast<User>(V);
				const Value *V2 = U->getOperand(0);
				const User *U2 = cast<User>(V2);
				if (const Function *F = dyn_cast<Function>(U2->getOperand(0))) {
				const Value *StrV = cast<User>(U->getOperand(1))->getOperand(0);
				const ConstantDataSequential *StrInit = cast<ConstantDataSequential>(
				jfb: You could build a local StringRef(efa_annotation, annotation_len) and then just use operator== here. The same applies below.
				cast<GlobalVariable>(StrV)->getInitializer());
				StringRef str = StrInit->getAsString();
				if (efa == str) {
				MaybeExternalFuns.insert(F);
				++NumAnnotatedFunctions;
				}
				} else if (const GlobalVariable *GV =
				dyn_cast<GlobalVariable>(U2->getOperand(0))) {
				// Only look at this global variable if it is of pointer type.
				Type *GVT = GV->getType();
				jfb: Unrelated to your change, but I was sad that StringRef can't have the following constructor: template<size_t N> StringRef(const char (&Data)[N]) : StringRef(Data, N) { } Templates don't participate in overload resolution if there is a best match nontemplate overload.
				if (isa<PointerType>(GVT)) {
				findCalls(GV);
				}
				}
				}
				nicholas: I claim this is always false. The only way an operand can be null is if the object whose operands we're looking at is metadata, and metadata never shows up as a user of another Value.
				}

				void ExternalFunctionAnalysis::findCalls(const GlobalVariable *GV) {
				for (const auto &GUs : GV->users()) {
				const Value *V = cast<Value>(GUs);

				// For store instructions, the global variable can be used directly.
				jfb: Redundant.
				if (const StoreInst *VS = dyn_cast<StoreInst>(V)) {
				const Value *P = VS->getPointerOperand();
				if (P == GV) {
				MaybeExternalStores.insert(VS);
				++NumMaybeExternalStores;
				}
				}

				// We can't use a vector type here because this algorithm depends on being
				// able to continue iterating through FoundValues as new items are added.
				jfb: Redundant.
				nlewycky: It's possible to do that with a vector too, you just need to keep an index number (I assume you only add to the back).
				std::list<const Value *> FoundValues;
				FoundValues.push_back(V);
				jfb: Redundant.
				DenseSet<const Value *> SeenValues;
				SeenValues.insert(V);

				for (const Value *V : FoundValues) {
				for (const Use &U : V->uses()) {
				const User *Us = U.getUser();
				const Value *Val = cast<Value>(Us);
				if (isa<CastInst>(Val)) {
				followValue(Val, FoundValues, SeenValues);
				} else if (isa<ReturnInst>(Val)) {
				followReturn(Val, FoundValues, SeenValues);
				} else if (const CallInst *CI = dyn_cast<CallInst>(Val)) {
				followCall(CI, &U, V, FoundValues, SeenValues, MaybeExternalCalls);
				} else if (const StoreInst *S = dyn_cast<StoreInst>(Val)) {
				// If it's storing into this variable, then mark this store as a safe
				// store.
				const Value *P = S->getPointerOperand();
				if (P == V) {
				MaybeExternalStores.insert(S);
				++NumMaybeExternalStores;
				}
				}
				}
				}
				}
				}
				jfb: 3 redundant asserts above.
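
nlewycky's suggestion of a vector with an index, in place of the std::list worklist used in findCalls, can be sketched without LLVM. The Graph type and the reachable function below are hypothetical stand-ins: integer nodes replace Value pointers, and Succ replaces the uses that followValue and followCall would add. The point is only that indices stay valid across push_back even when iterators do not.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical stand-in for the use graph: Succ[i] lists the nodes that
// become reachable once node i has been visited.
typedef std::vector<std::vector<int> > Graph;

// Returns every node reachable from Start, in the order it was discovered.
std::vector<int> reachable(const Graph &Succ, int Start) {
  assert(Start >= 0 && static_cast<std::size_t>(Start) < Succ.size());
  std::vector<int> Worklist;
  std::set<int> Seen;
  Worklist.push_back(Start);
  Seen.insert(Start);
  // Index-based loop: push_back may reallocate the vector, which would
  // invalidate iterators but leaves indices meaningful.
  for (std::size_t I = 0; I != Worklist.size(); ++I) {
    const int Node = Worklist[I]; // copy the element before any push_back
    for (int Next : Succ[Node])
      if (Seen.insert(Next).second)
        Worklist.push_back(Next);
  }
  return Worklist;
}
```

The loop condition re-reads Worklist.size() on every iteration, so items appended during the walk are visited in turn, which is the property the comments in the patch attribute to std::list.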

				void ExternalFunctionAnalysis::findFPStores(const Function *F,
				const Instruction *I,
				StoreSources &FPStores) {
				// We can't use a vector here because the algorithm depends on modifying
				// FoundValues while iterating through it.
				std::list<const Value *> FoundValues;
				FoundValues.push_back(I);
				DenseSet<const Value *> SeenValues;
				SeenValues.insert(I);

				const Function *CallerFun = getFunctionParent(I);

				for (const auto &V : FoundValues) {
				for (const Use &U : V->uses()) {
				const User *Us = U.getUser();
				const Value *Val = cast<Value>(Us);
				if (isa<CastInst>(Val)) {
				followValue(Val, FoundValues, SeenValues);
				} else if (isa<ReturnInst>(Val)) {
				followReturn(Val, FoundValues, SeenValues);
				} else if (const StoreInst *S = dyn_cast<StoreInst>(Val)) {
				const Value *P = S->getPointerOperand();
				Type *OpTy = P->getType();
				PointerType *PPTy = dyn_cast<PointerType>(OpTy);
				if (!PPTy)
				continue;

				Type *PElementTy = PPTy->getElementType();
				PointerType *PTy = dyn_cast<PointerType>(PElementTy);
				if (!PTy)
				continue;

				Type *ElementTy = PTy->getElementType();
				if (isa<FunctionType>(ElementTy)) {
				FPStores[S].Source = F;
				FPStores[S].Caller = CallerFun;
				}
				} else if (const CallInst *CI = dyn_cast<CallInst>(Val)) {
				followCall(CI, &U, V, FoundValues, SeenValues, MaybeExternalCalls);
				}
				}
				}
				}
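
findFPStores records a store only when its pointer operand has type pointer-to-pointer-to-function, which it establishes by peeling two pointer levels. The same two-level peel can be illustrated over C++ types. This is a sketch on C++ types with a hypothetical trait name, IsFunctionPointerSlot; it does not operate on LLVM Type objects.

```cpp
#include <type_traits>

// True when T is a pointer to a pointer to a function, i.e. the type of a
// memory location that can hold a function pointer.
template <typename T>
struct IsFunctionPointerSlot {
  // First peel: what the store's destination pointer points to.
  typedef typename std::remove_pointer<T>::type Pointee;
  // Second peel: what the stored pointer itself points to.
  typedef typename std::remove_pointer<Pointee>::type Element;

  static const bool value = std::is_pointer<T>::value &&
                            std::is_pointer<Pointee>::value &&
                            std::is_function<Element>::value;
};
```

With this trait, int (**)() qualifies, while int ** and a bare function pointer int (*)() do not, mirroring the two early continue statements in the patch.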
				bool ExternalFunctionAnalysis::runOnModule(Module &M) {
				nlewycky: Missing blank line.
				// Build up the sets of maybe-external functions, variables, pointers, and
				// their associated indirect calls and stores by looking for incoming external
				// function pointers and tracing both their dataflow and dataflow from
				// annotated storage locations.
				computeMaybeExternalFuns(M);
				computeMaybeExternalVarInstrs(M);
				computeMaybeExternalPtrInstrs(M);

				// Find function pointers returned from external functions.
				findExternalFunctionPointers(M);

				return true;
				}

				// For each call site of an external function that returns a pointer, trace this
				// value up to see if it is ever cast to a function pointer and stored.
				void ExternalFunctionAnalysis::findExternalFunctionPointers(const Module &M) {

				TargetLibraryInfo TLI;
				nlewycky: Don't create a new one here, use the one that was created for the given target and module being compiled: const TargetLibraryInfo *TLI = P->getAnalysisIfAvailable<TargetLibraryInfo>();

				// Walk through the set of functions looking for ones that return pointers.
				// The first function in FPStores is the external function that originally
				// generated the external pointer, and the second function is the function in
				// which the call to the first function took place.
				StoreSources FPStores;
				for (const Function &FR : M) {
				const Function *F = &FR;
				// We only follow calls to external pointers.
				if (!F->isDeclaration())
				continue;

				// We don't follow calls to llvm annotation functions
				if (isLLVMExternal(F))
				continue;

				// Make sure this external function returns a pointer type.
				FunctionType *FT = F->getFunctionType();
				Type *RT = FT->getReturnType();
				if (!isa<PointerType>(RT))
				continue;
				DEBUG(dbgs() << "External function '" << F->getName()
				<< "' returns a pointer\n");

				for (const auto &Us : F->users()) {
				const Instruction *I = dyn_cast<Instruction>(Us);
				jfb: You should use include/llvm/Target/TargetLibraryInfo.h. You're also missing the nothrow variant.
				if (!I)
				continue;

				// By policy, we ignore memory allocation, since it is a frequent source
				// of false positives.
				if (isAllocationFn(I, &TLI, true /* LookThroughBitCast */))
				nlewycky: TargetLibraryInfo supports a whole ton of different functions. Do you just care about allocation functions or any recognized library function (strlen, etc.)?
				break;

				// Only trace uses of direct calls.
				ImmutableCallSite ICS(I);
				if ((ICS.isCall() || ICS.isInvoke()) && ICS.getCalledFunction()) {
				findFPStores(F, I, FPStores);
				}
				}
				}
				jfb: Why not malloc too? Is this just a performance thing, or does it impact correctness?
				tmroeder (Author): Memory allocation functions are a frequent source of false positives, so the code tries to skip them. I see what you mean though, about malloc, and in the next revision, I've switched to using TargetLibraryInfo and adding other memory allocation functions from it.

				for (const auto &KV : FPStores) {
				// Is the storage location annotated with efa-maybe-external? If not, then
				// complain.
				if (MaybeExternalStores.find(KV.first) != MaybeExternalStores.end())
				continue;

				const Function *ParentFun = getFunctionParent(KV.first);
				if (!maybeContainsExternalCall(ParentFun)) {
				errs() << "A store instruction in " << ParentFun->getName()
				<< " is storing an external function pointer derived from a call"
				<< " to " << KV.second.Source->getName() << " in the function "
				<< KV.second.Caller->getName() << " but is not annotated with"
				<< " efa-maybe-external\n";
				}
				}
				nicholas: Why not if (!F->isDeclaration()) without the explicit cast?
				}

				bool ExternalFunctionAnalysis::maybeIsExternalCall(const Instruction *I) {
				return MaybeExternalCalls.find(I) != MaybeExternalCalls.end();
				}

				bool ExternalFunctionAnalysis::maybeContainsExternalCall(const Function *F) {
				return MaybeExternalFuns.find(F) != MaybeExternalFuns.end();
				}

test/Analysis/ExternalFunctionAnalysis/external_function_analysis.ll

This file was added.

				; RUN: llvm-as < %s >%t1
				nlewycky: That shouldn't be necessary, opt works on .ll files as well as .bc files.
				; RUN: opt -efa -o %t2 %t1 -stats -debug-only=efa 2>&1 | FileCheck %s
				nlewycky: If you plan to discard %t2, use -disable-output instead?
				nlewycky: Please don't use -debug-only=efa here, it makes your test only work on debug builds of llvm and not optimized builds. Instead, you could implement Pass::print() in your pass and then use that via "opt -analyze". This is the same thing that the "opt -analyze -scalar-evolution" tests do with the ScalarEvolution analysis.

				target triple = "x86_64-unknown-linux-gnu"

				@.str1 = private unnamed_addr constant [17 x i8] c"external_call.ll\00", section "llvm.metadata"
				@.str2 = private unnamed_addr constant [19 x i8] c"efa-maybe-external\00", section "llvm.metadata"
				@fff = internal global i32 (...)* null, align 8
				@llvm.global.annotations = appending global [3 x { i8*, i8*, i8*, i32 }] [{ i8*, i8*, i8*, i32 } { i8* bitcast (i32 (...)** @fff to i8*), i8* getelementptr inbounds ([19 x i8]* @.str2, i32 0, i32 0), i8* getelementptr inbounds ([17 x i8]* @.str1, i32 0, i32 0), i32 19 }, { i8*, i8*, i8*, i32 } { i8* bitcast (void ()* @known_external_call_fun to i8*), i8* getelementptr inbounds ([19 x i8]* @.str2, i32 0, i32 0), i8* getelementptr inbounds ([17 x i8]* @.str1, i32 0, i32 0), i32 19 }, { i8*, i8*, i8*, i32 } { i8* bitcast (void ()* @known_external_call_fun to i8*), i8* getelementptr inbounds ([19 x i8]* @.str2, i32 0, i32 0), i8* getelementptr inbounds ([17 x i8]* @.str1, i32 0, i32 0), i32 19 }], section "llvm.metadata"

				%struct.fun_struct = type { i32 (...)* }
				@sfs = internal global %struct.fun_struct zeroinitializer, align 8

				define internal i32 (...)* @g() {
				entry:
				%call = call i32 (...)* (...)* @f()
				ret i32 (...)* %call
				}

				declare i32 (...)* @f(...)
				declare i8* (...)* @f2(...)
				declare void ()* @get_fun()

				define i32 @m(void ()* %fun) {
				call void ()* %fun()
				ret i32 0
				}

				define i32 @m_no_rewrite(void ()* %fun) {
				call void ()* %fun()
				ret i32 0
				}

				declare void @llvm.var.annotation(i8*, i8*, i8*, i32)
				declare i8* @llvm.ptr.annotation.p0i8(i8*, i8*, i8*, i32)

				define i32 @call_ext_fun() {
				%f = call void ()* ()* @get_fun()
				%a = call i32 @m(void ()* %f)
				ret i32 %a
				}

				; Check the case of storing into an annotated variable
				define void @var_annotation() {
				%h = alloca i32 (...)*, align 8
				%h1 = bitcast i32 (...)** %h to i8*
				call void @llvm.var.annotation(i8* %h1, i8* getelementptr inbounds ([19 x i8]* @.str2, i32 0, i32 0), i8* getelementptr inbounds ([17 x i8]* @.str1, i32 0, i32 0), i32 25)
				%call = call i32 (...)* ()* @g()
				store i32 (...)* %call, i32 (...)** %h, align 8
				ret void
				}

				; Check the case of storing into an annotated struct member
				define void @struct_annotation() {
				%fs = alloca %struct.fun_struct, align 8
				%call = call i8* (...)* (...)* @f2()
				%1 = bitcast i8* (...)* %call to i32 (...)*
				%v = getelementptr inbounds %struct.fun_struct* %fs, i32 0, i32 0
				%2 = bitcast i32 (...)** %v to i8*
				%3 = call i8* @llvm.ptr.annotation.p0i8(i8* %2, i8* getelementptr inbounds ([19 x i8]* @.str2, i32 0, i32 0), i8* getelementptr inbounds ([17 x i8]* @.str1, i32 0, i32 0), i32 10)
				%4 = bitcast i8* %3 to i32 (...)**
				store i32 (...)* %1, i32 (...)** %4, align 8
				ret void
				}

				; Check the case of storing into part of an annotated structure
				define void @struct_annotation_through_GEP() {
				%fs = alloca %struct.fun_struct, align 8
				%fs.v = bitcast %struct.fun_struct* %fs to i8*
				%1 = call i8* @llvm.ptr.annotation.p0i8(i8* %fs.v, i8* getelementptr inbounds ([19 x i8]* @.str2, i32 0, i32 0), i8* getelementptr inbounds ([17 x i8]* @.str1, i32 0, i32 0), i32 10)
				%2 = bitcast i8* %1 to %struct.fun_struct*
				%call = call i8* (...)* (...)* @f2()
				%call.cast = bitcast i8* (...)* %call to i32 (...)*
				%v = getelementptr inbounds %struct.fun_struct* %2, i32 0, i32 0
				store i32 (...)* %call.cast, i32 (...)** %v, align 8
				ret void
				}

				; Check the case of storing into a non-annotated struct member
				define void @struct_non_annotation() {
				%fs = alloca %struct.fun_struct, align 8
				%call = call i8* (...)* (...)* @f2()
				%1 = bitcast i8* (...)* %call to i32 (...)*
				%v = getelementptr inbounds %struct.fun_struct* %fs, i32 0, i32 0
				store i32 (...)* %1, i32 (...)** %v, align 8
				ret void
				}

				; Check the case of calling a value from an annotated struct member
				define void @call_struct_annotation() {
				%fs = alloca %struct.fun_struct, align 8
				%v = getelementptr inbounds %struct.fun_struct* %fs, i32 0, i32 0
				%1 = bitcast i32 (...)** %v to i8*
				%2 = call i8* @llvm.ptr.annotation.p0i8(i8* %1, i8* getelementptr inbounds ([19 x i8]* @.str2, i32 0, i32 0), i8* getelementptr inbounds ([17 x i8]* @.str1, i32 0, i32 0), i32 10)
				%3 = bitcast i8* %2 to i32 (...)**
				%4 = load i32 (...)** %3, align 8
				%rv = call i32 (...)* %4()
				ret void
				}

				define i32 (...)* @transitive_ext_fun() {
				%f = call i32 (...)* (...)* @f()
				%b = bitcast i32 (...)* %f to i8*
				%a = bitcast i8* %b to i32 (...)*
				ret i32 (...)* %f
				}

				define i32 (...)* @transitive_return_fun(i32 (...)* %in) {
				%b = bitcast i32 (...)* %in to i8*
				%a = bitcast i8* %b to i32 (...)*
				ret i32 (...)* %a
				}

				; Check the case of storing an external pointer returned through a wrapper function into an annotated variable
				define void @transitive_pointer_detection() {
				%h = alloca i32 (...)*, align 8
				%h1 = bitcast i32 (...)** %h to i8*
				call void @llvm.var.annotation(i8* %h1, i8* getelementptr inbounds ([19 x i8]* @.str2, i32 0, i32 0), i8* getelementptr inbounds ([17 x i8]* @.str1, i32 0, i32 0), i32 25)
				%call = call i32 (...)* ()* @transitive_ext_fun()
				store i32 (...)* %call, i32 (...)** %h, align 8
				ret void
				}

				; Check the case of storing an external pointer passed through and returned by another function into an annotated variable
				define void @transitive_return_detection() {
				%h = alloca i32 (...)*, align 8
				%h1 = bitcast i32 (...)** %h to i8*
				call void @llvm.var.annotation(i8* %h1, i8* getelementptr inbounds ([19 x i8]* @.str2, i32 0, i32 0), i8* getelementptr inbounds ([17 x i8]* @.str1, i32 0, i32 0), i32 25)
				%f = call i32 (...)* (...)* @f()
				%call = call i32 (...)* (i32 (...)*)* @transitive_return_fun(i32 (...)* %f)
				store i32 (...)* %call, i32 (...)** %h, align 8
				ret void
				}

				; Check the case of storing into a non-annotated variable
				define void @var_non_annotation() {
				%h = alloca i32 (...)*, align 8
				%call = call i32 (...)* ()* @g()
				store i32 (...)* %call, i32 (...)** %h, align 8
				ret void
				}

				define void @known_external_call_fun() {
				%h = alloca i32 (...)*, align 8
				%call = call i32 (...)* ()* @g()
				store i32 (...)* %call, i32 (...)** %h, align 8
				ret void
				}

				; XFAIL: win32
				nlewycky: Why?
				; CHECK: External function 'f' returns a pointer
				; CHECK: External function 'f2' returns a pointer
				; CHECK: External function 'get_fun' returns a pointer
				; CHECK-NOT: A store instruction in struct_annotation_through_GEP is storing an external function pointer derived from a call to f2 in the function struct_annotation_through_GEP
				; CHECK-DAG: A store instruction in struct_non_annotation is storing an external function pointer derived from a call to f2 in the function struct_non_annotation
				; CHECK-DAG: A store instruction in var_non_annotation is storing an external function pointer derived from a call to f in the function g
				; CHECK-NOT: A store instruction in known_external_call_fun is storing an external function pointer derived from a call to f in the function g
				; CHECK: 2 efa - Number of indirect call sites maybe using external function pointers
				; CHECK: 5 efa - Number of store instructions into annotated locations

Revision Contents

Diff 11586

include/llvm/Analysis/ExternalFunctionAnalysis.h

include/llvm/InitializePasses.h

lib/Analysis/Analysis.cpp

lib/Analysis/CMakeLists.txt

lib/Analysis/ExternalFunctionAnalysis.cpp

test/Analysis/ExternalFunctionAnalysis/external_function_analysis.ll
