This is an archive of the discontinued LLVM Phabricator instance.

lib/Transforms/Instrumentation/EfficiencySanitizer.cpp
224	What is the purpose of having different struct types for each tool? Is there a case where they are different, or would it always be "{ToolType, ModuleName}"?

zhaoqin added inline comments.May 24 2016, 8:16 AM

lib/Transforms/Instrumentation/EfficiencySanitizer.cpp
224	So far, only the cache fragmentation tool takes a variable passed to the runtime. Creating a struct variable has two parts: the type specification (lines 202-214) and the variable initialization (lines 216-225). I feel it is clearer and simpler to keep the two parts together so the code is easy to read. The header creation is mainly just lines 216-218 and line 223, so separating it seems to do more harm than good.

Rename CacheFragType to CacheFragTy for consistency.

update commit msg title

zhaoqin retitled this revision from [esan|cfrag]: Add createCacheFragGV. to [esan|cfrag] Create the cfrag variable for the runtime.May 24 2016, 9:34 AM

zhaoqin added a parent revision: D20488: [esan] Add calls from the ctor/dtor to the runtime library.May 24 2016, 12:34 PM

aizatsky added inline comments.May 24 2016, 2:16 PM

lib/Transforms/Instrumentation/EfficiencySanitizer.cpp
224	What I meant: maybe there should be a "createToolGV" that would be used for all efficiency sanitizers? Do you think the working set tool will need a different layout for this variable? Maybe it is something you don't have to specialize, but can keep generic instead?

aizatsky mentioned this in D20542: [esan|cfrag] Add the skeleton to handle the cfrag argument.May 24 2016, 2:18 PM

aizatsky accepted this revision.May 24 2016, 3:04 PM

aizatsky edited edge metadata.

This revision is now accepted and ready to land.May 24 2016, 3:04 PM

Do you want me to have a general GV layout for the different tools?
If so, it would probably be something like:

struct ToolTy {
  int ToolType;
  const char *UnitName;
  void *SpecificToolGV;
};

Otherwise, expanding struct ToolTy with different layouts for different tools is too troublesome (the struct layout needs to be specified upfront, at line 213).
It would add an indirect reference at runtime, but that is probably fine.
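
To make the trade-off concrete, here is a minimal sketch in plain C++ (not LLVM IR construction code) of how a runtime might read the general layout. Only the three `ToolTy` fields come from the comment above; the `ToolKind` values, `CacheFragInfo`, and `describeTool` are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical tool identifiers; the real values are defined by ESan.
enum ToolKind : int { ESAN_None = 0, ESAN_CacheFrag = 1, ESAN_WorkingSet = 2 };

// The general layout from the comment: fixed fields plus an opaque pointer
// to tool-specific data.
struct ToolTy {
  int ToolType;
  const char *UnitName;
  void *SpecificToolGV;
};

// Hypothetical payload for the cache fragmentation tool.
struct CacheFragInfo {
  int NumStructs;
};

// The runtime reads the fixed fields first and then follows the pointer.
// That second load is the "indirect reference at runtime".
inline std::string describeTool(const ToolTy *T) {
  std::string Out = std::string(T->UnitName) + ":";
  if (T->ToolType == ESAN_CacheFrag && T->SpecificToolGV) {
    auto *Info = static_cast<const CacheFragInfo *>(T->SpecificToolGV);
    Out += "cfrag(" + std::to_string(Info->NumStructs) + ")";
  } else {
    Out += "no-tool-data";
  }
  return Out;
}
```

With this shape every tool shares one struct type, at the cost of one extra pointer dereference whenever tool-specific data is needed.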

Do you want me to have a general GV layout for different Tool?

No, I don't think you need a general layout. But you can have a base struct with ToolType at least. This way you'll guarantee that ToolType is always the first field => no confusion would happen.

Optional though.

In D20541#438443, @aizatsky wrote:

Do you want me to have a general GV layout for different Tool?

No, I don't think you need a general layout. But you can have a base struct with ToolType at least. This way you'll guarantee that ToolType is always the first field => no confusion would happen.

Yes, I agree that would be better. What worries me is how to implement it.
As you can see at line 213, the struct type is declared upfront, and then we create the new GlobalVariable with that type and fill in the value at line 220.
I can think of two ways to do so:

general layout as I described before
creating a struct ToolHeader or something similar to put at the beginning of each tool-specific struct. That still relies on each tool putting the header struct at the beginning.
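
The second option can be sketched in plain C++ as follows. This is only an illustration of the convention and one way to check it at compile time; `ToolHeader`, `CacheFragInfo`, `NumStructs`, and `toolTypeOf` are hypothetical names, and the real pass builds the struct as LLVM IR rather than as a C++ type.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical shared header; the field names follow the discussion above.
struct ToolHeader {
  int ToolType;
  const char *UnitName;
};

// Hypothetical tool-specific struct that embeds the header as its first field.
struct CacheFragInfo {
  ToolHeader Header;
  unsigned NumStructs;
};

// The convention is not automatic, but it can be verified at compile time.
static_assert(std::is_standard_layout<CacheFragInfo>::value,
              "tool struct must be standard layout");
static_assert(offsetof(CacheFragInfo, Header) == 0,
              "ToolHeader must be the first field");

// Code that only knows the header can then read any tool's struct.
inline int toolTypeOf(const void *ToolInfo) {
  return static_cast<const ToolHeader *>(ToolInfo)->ToolType;
}
```

The static assertions address the concern that each tool must remember to put the header first: a struct that violates the convention fails to compile in this model.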

In D20541#438477, @zhaoqin wrote:

In D20541#438443, @aizatsky wrote:

Do you want me to have a general GV layout for different Tool?

No, I don't think you need a general layout. But you can have a base struct with ToolType at least. This way you'll guarantee that ToolType is always the first field => no confusion would happen.

Yes, I agree that would be better. What I worried is how to implement so.
As you can see at line 213, the Struct Type is declared upfront and then we create the new globalvariable with that type and fill in the value at line 220.
I can think of two ways to do so:

general layout as I described before

creating a struct ToolHeader or something to put beginning of each tool specific struct. Then it still rely on each tool putting that header struct at beginning.

Let's see how this will play out. Will push this decision onto the one adding the second struct type :)

Please add a test for the cache fragmentation tool's global variable.

lib/Transforms/Instrumentation/EfficiencySanitizer.cpp
232	I don't really see a reason to include the tool type here. We're already emitting it in the call to `__esan_init`.
251	I'd rather have a more abstract version of this that just creates an anonymous structure, and then each tool knows what to do with it. A bit like what UBSan does in clang. Check out clang's `lib/CodeGen/CGExpr.cpp`.

Create the array and then call EmitCheck (line 585):

llvm::Constant *StaticData[] = {
  EmitCheckSourceLocation(E->getLocStart()),
  EmitCheckTypeDescriptor(CalleeType)
};
EmitCheck(Checks, "type_mismatch", StaticData, Ptr);

In EmitCheck, we have a common way to emit the static args:

// Emit handler arguments and create handler function type.
if (!StaticArgs.empty()) {
  llvm::Constant *Info = llvm::ConstantStruct::getAnon(StaticArgs);
  auto *InfoPtr =
      new llvm::GlobalVariable(CGM.getModule(), Info->getType(), false,
                               llvm::GlobalVariable::PrivateLinkage, Info);
  InfoPtr->setUnnamedAddr(true);
  CGM.getSanitizerMetadata()->disableSanitizerForGlobal(InfoPtr);
  Args.push_back(Builder.CreateBitCast(InfoPtr, Int8PtrTy));
  ArgTypes.push_back(Int8PtrTy);
}

That way, each tool can just create the ArrayRef and pass it to a general `createToolGV` or something. `__esan_init` would get the two args: `ToolType` and a `void *` which is tool specific.

In D20541#439070, @filcab wrote:

Please add a test for the cache fragmentation tool's global variable.

http://reviews.llvm.org/D20542 is the test for the arg passed to runtime.
Are you thinking of a different test in llvm?

lib/Transforms/Instrumentation/EfficiencySanitizer.cpp
232	One reason is to potentially support multiple tools in the same build/run, where each tool may need separate information. That is not in any near-term plan, so I am not refactoring the code for it (i.e., removing ToolType from __esan_init). But adding code with that in mind would help.
251	Thanks for the pointer. It is nice to know there are flexible ways to create structs. Working on it.

In D20541#439341, @zhaoqin wrote:

In D20541#439070, @filcab wrote:

Please add a test for the cache fragmentation tool's global variable.

http://reviews.llvm.org/D20542 is the test for the arg passed to runtime.
Are you thinking of a different test in llvm?

Yes, I'd like an LLVM test checking we emit the correct tool number.

Thanks for working on this.

Filipe

lib/Transforms/Instrumentation/EfficiencySanitizer.cpp
231	I'd prefer having `CacheFragInfo` (`CacheFragCUInfo` since it's a per-compile unit thing?) or something similar. `Ty` doesn't add anything to the rest of the name, and is not explanatory.
232	From what I remember, the plan is to only have one active tool at a time, especially with the different shadow scales, etc. Please don't implement partial extension points unless there is a plan (I'm ok with planning for portability across OS if it's not a big problem. But I'd rather avoid doing small things "planning" possible functionality which would imply huge changes). We're already saying this is a CacheFrag thing, so the tool was already picked.

In D20541#439464, @filcab wrote:

In D20541#439341, @zhaoqin wrote:

In D20541#439070, @filcab wrote:

Please add a test for the cache fragmentation tool's global variable.

http://reviews.llvm.org/D20542 is the test for the arg passed to runtime.
Are you thinking of a different test in llvm?

Yes, I'd like an LLVM test checking we emit the correct tool number.

In that case, http://reviews.llvm.org/D20488 has added a test to check the tool number passed to __esan_init.

PTAL

In D20541#439794, @zhaoqin wrote:

In that case, http://reviews.llvm.org/D20488 has added a test to check the tool number passed to __esan_init.

It's already there, in cache_frag_basic.ll.

Re: this change:
I think it's better, but I don't really like one thing here: Finding out what's getting included in the anonymous structure has us browsing the source code in several different locations.
Having to track what's getting pushed onto the SmallVector seems harder to follow without much gain.
(I think the current solution is acceptable, but I'd prefer to have it be a bit easier to read/follow)

How about:
The createEsanInitToolGV just dispatches to the appropriate (tool-specific) function and returns the same as that function (possibly creates the module string and passes it to the next function too, if you want to factor that out to here);
The tool-specific function:

Gets all the info/adds constants
Sets up an array with the information to be placed in the structure
Calls into a helper "create a GV for tool init" function (optional if you want to return the GV from the tool-specific function, at the expense of code duplication)

That way, each function deals only with one thing. Instead of having dispatch + (some) setup + GV creation all mixed in the same function. The creation of the structure is very easy to read, and you can easily make sure both places (here and compiler-rt) have a matching structure definition.
Adding a tool-specific function will also avoid introducing more bugs, since you'd only need to touch the dispatcher + your own tool's struct creator. The dispatcher wouldn't need to know anything about how the GV is created.

What do you think?
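
The split proposed above can be modeled in plain C++ without any LLVM types. This is a rough sketch of the structure only: `Field`, `ToolGV`, `createToolGV`, and the `ToolKind` values are hypothetical stand-ins (the real pass would work with `llvm::Constant *` and `GlobalVariable *`), and `createCacheFragGV` and `createEsanInitToolGV` merely reuse names from this review.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical tool identifiers.
enum ToolKind { ESAN_None, ESAN_CacheFrag, ESAN_WorkingSet };

// Stand-in for a constant field; the real pass would use llvm::Constant *.
using Field = std::string;

// Stand-in for the emitted global variable. Empty Fields models "no tool data".
struct ToolGV {
  std::vector<Field> Fields;
};

// Helper: the only place that knows how a GV is created.
inline ToolGV createToolGV(const std::vector<Field> &Fields) {
  return ToolGV{Fields};
}

// Tool-specific function: the whole layout is visible in one array.
inline ToolGV createCacheFragGV(const std::string &UnitName) {
  const std::vector<Field> Data = {UnitName};
  return createToolGV(Data);
}

// Dispatcher: picks the tool-specific function and returns its result.
inline ToolGV createEsanInitToolGV(ToolKind Tool, const std::string &UnitName) {
  switch (Tool) {
  case ESAN_CacheFrag:
    return createCacheFragGV(UnitName);
  default:
    return ToolGV{}; // Tools without extra data pass nothing.
  }
}
```

In this model, adding a tool means writing one new tool-specific function and adding one case to the dispatcher; the helper stays untouched.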

lib/Transforms/Instrumentation/EfficiencySanitizer.cpp
133	That `Vec` in the name feels weird. Why not just `ToolInfoData` or even `ToolData`? No need repeating something that's in the type ("saying" that it's a vector).

In D20541#440870, @filcab wrote:

In D20541#439794, @zhaoqin wrote:

In that case, http://reviews.llvm.org/D20488 has added a test to check the tool number passed to __esan_init.

It's already there, in cache_frag_basic.ll.

Re: this change:
I think it's better, but I don't really like one thing here: Finding out what's getting included in the anonymous structure has us browsing the source code in several different locations.
Having to track what's getting pushed onto the SmallVector seems harder to follow without much gain.
(I think the current solution is acceptable, but I'd prefer to have it be a bit easier to read/follow)

I feel we are going in circles here!!!
In my original CL, the full struct description and definition was in one place, i.e., in the tool-specific createGV function.
I was asked to split it so that the header (ToolType/UnitName) would be put at the beginning of the struct for all the different types of tool GV.
Now it is hard to track what the struct is because of the split!!!

How about:
The createEsanInitToolGV just dispatches to the appropriate (tool-specific) function and returns the same as that function (possibly creates the module string and passes it to the next function too, if you want to factor that out to here);
The tool-specific function:
Gets all the info/adds constants
Sets up an array with the information to be placed in the structure
Calls into a helper "create a GV for tool init" function (optional if you want to return the GV from the tool-specific function, at the expense of code duplication)
That way, each function deals only with one thing. Instead of having dispatch + (some) setup + GV creation all mixed in the same function. The creation of the structure is very easy to read, and you can easily make sure both places (here and compiler-rt) have a matching structure definition.
Adding a tool-specific function will also avoid introducing more bugs, since you'd only need to touch the dispatcher + your own tool's struct creator. The dispatcher wouldn't need to know anything about how the GV is created.

What do you think?

I am not sure how this differs from my original CL, where the struct is declared in each tool-specific create GV function.
The original CL had a static struct declaration, then filled in the entries and created the GV at the end of the function.

StructType *CacheFragType =
  StructType::get(Int32Ty, IntptrTy, nullptr);
...
GlobalVariable *CacheFragGV = new GlobalVariable(
    M, CacheFragType, true, GlobalVariable::InternalLinkage,
    ConstantStruct::get(CacheFragType,
                        ConstantInt::get(Int32Ty, Options.ToolType),
                        ConstantExpr::getPointerCast(UnitName, IntptrTy),
                        nullptr));

It seems clearer than pushing onto a vector in a separate function and then creating the GV without knowing what's inside.

Also, how do you guarantee the UnitName would be the first entry if each tool-specific function "Gets all the info/adds constants"?

lib/Transforms/Instrumentation/EfficiencySanitizer.cpp
133	It is not about knowing the type where it is declared; it is about knowing the type where it is used. If someone reading the code sees ToolInfoVec, it is clear, without looking for the declaration, that ToolInfoVec is a vec for ToolInfo with multiple entries and is extensible. In contrast, ToolInfoData conveys much less information. ToolInfoEntries would be better, but I feel ToolInfoVec is better still.

In D20541#440917, @zhaoqin wrote:

In D20541#440870, @filcab wrote:

Re: this change:
I think it's better, but I don't really like one thing here: Finding out what's getting included in the anonymous structure has us browsing the source code in several different locations.
Having to track what's getting pushed onto the SmallVector seems harder to follow without much gain.
(I think the current solution is acceptable, but I'd prefer to have it be a bit easier to read/follow)

I feel we are going in circles here!!!
In my original CL, the full struct description and definition was in one place, i.e., in the tool-specific createGV function.
I was asked to split it so that the header (ToolType/UnitName) would be put at the beginning of the struct for all the different types of tool GV.
Now it is hard to track what the struct is because of the split!!!

I don't think we're going full circle. But sorry if I'm hard to understand. Sometimes I might not explain what I would like to have in the best way. Feel free to ask for more details if what I'm asking for seems out of place or seems to go against what I (or others) have said before.

In your original CL you had createCacheFragGV which would:
create a struct type
create the string global for the module
create the constant int and the constant expr to put in the struct
create the struct itself
create the GV

This is to create the GV argument, which tools need for getting information from the ESan pass, basically.

Out of those, some are not needed (create the struct type. We can use an anonymous struct), some are common to any tool that needs an extra argument (create the struct itself, create the GV), and others are, for now, common to every tool that gets an argument (create the string global for the module). The rest is tool-specific, and is about creating the values to put in the struct.

My previous comment was about extracting away the struct + GV creation (and not needing to have a struct type).
Those two steps are what I pointed to from clang's source: every check will create an ArrayRef with the data to be put on the struct, and then common code simply takes that and creates an anonymous structure.

Your change addressed those two points, and thanks for doing that. But the resulting code, for me, seems to be more confusing than it needs to be. One of the very good things about the clang example I posted was that you just need to look at the temporary array to figure out what exactly the structure that is passed to __esan_init contains. This also makes it easy to add/change fields to that structure (simply add a new llvm::Constant *[] element to the array creation).

My opinion is that it's much easier to create the array at a single place than to track down the SmallVector's inserted elements.

How about:
The createEsanInitToolGV just dispatches to the appropriate (tool-specific) function and returns the same as that function (possibly creates the module string and passes it to the next function too, if you want to factor that out to here);
The tool-specific function:
Gets all the info/adds constants
Sets up an array with the information to be placed in the structure
Calls into a helper "create a GV for tool init" function (optional if you want to return the GV from the tool-specific function, at the expense of code duplication)
That way, each function deals only with one thing. Instead of having dispatch + (some) setup + GV creation all mixed in the same function. The creation of the structure is very easy to read, and you can easily make sure both places (here and compiler-rt) have a matching structure definition.
Adding a tool-specific function will also avoid introducing more bugs, since you'd only need to touch the dispatcher + your own tool's struct creator. The dispatcher wouldn't need to know anything about how the GV is created.

What do you think?
I am not sure what's the difference from my original CL, where the struct is declared in each tool-specific create GV?
The original CL had a static struct declaration and then filled the entries and created the GV at the end of the function.
StructType *CacheFragType =
  StructType::get(Int32Ty, IntptrTy, nullptr);
...
GlobalVariable *CacheFragGV = new GlobalVariable(
    M, CacheFragType, true, GlobalVariable::InternalLinkage,
    ConstantStruct::get(CacheFragType,
                        ConstantInt::get(Int32Ty, Options.ToolType),
                        ConstantExpr::getPointerCast(UnitName, IntptrTy),
                        nullptr));
It seems clearer than pushing onto a vector in a separate function and then creating the GV without knowing what's inside.

Also, how do you guarantee the UnitName would be the first entry if each tool-specific function "Gets all the info/adds constants"?

The rationale is that each tool knows what things to place in its structure. If it wants to get the UnitName and wants it to be the first thing in the structure, it will place it there.
This is what is being done, for example, in clang's EmitCheck. Look at the lib/CodeGen/CGExpr.cpp file I mentioned and search for calls to EmitCheck (especially the third parameter, usually called StaticData).

I don't think we're going full circle. But sorry if I'm hard to understand. Sometimes I might not explain what I would like to have in the best way. Feel free to ask for more details if what I'm asking for seems out of place or seems to go against what I (or others) have said before.

In your original CL you had createCacheFragGV which would:
create a struct type
create the string global for the module
create the constant int and the constant expr to put in the struct
create the struct itself
create the GV

This is to create the GV argument, which tools need for getting information from the ESan pass, basically.

Out of those, some are not needed (create the struct type. We can use an anonymous struct), some are common to any tool that needs an extra argument (create the struct itself, create the GV), and others are, for now, common to every tool that gets an argument (create the string global for the module). The rest is tool-specific, and is about creating the values to put in the struct.

IMHO, if you know a struct type layout, specify it upfront, i.e., create the struct type, and fill in the entries.
Because that's a good way to check that what you fill in is what you specified. I made several mistakes while implementing it, and thanks to the specified struct type I can easily spot the problem.

My previous comment was about extracting away the struct + GV creation (and not needing to have a struct type).
Those two steps are what I pointed to from clang's source: every check will create an ArrayRef with the data to be put on the struct, and then common code simply takes that and creates an anonymous structure.

EmitCheck does much more work than simply creating a GV, so it makes sense to have a separate place that creates the StaticData and then calls EmitCheck.
Here, we have one statement creating the GV, so it does not make sense to have a separate function for that.

Your change addressed those two points, and thanks for doing that. But the resulting code, for me, seems to be more confusing than it needs to be. One of the very good things about the clang example I posted was that you just need to look at the temporary array to figure out what exactly the structure that is passed to __esan_init contains. This also makes it easy to add/change fields to that structure (simply add a new llvm::Constant *[] element to the array creation).

It is easy to add/change fields, but that also makes it easy to make mistakes, because there is no struct type to check against.
Even if we want to change a struct that has a specified struct type, I do not see how it is hard to change the struct type, especially if it is at the beginning of the same function.
Basically, I feel that having a specified struct type upfront has the great benefit of error checking, with only the small downside of an extra step when changing the struct layout.
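
The error-checking argument can be illustrated with a small model in plain C++. This is not LLVM API code: `Ty`, `Const`, and `matchesLayout` are hypothetical stand-ins for IR types, typed constants, and the kind of check that a declared struct type makes possible.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Stand-ins for IR types and typed constants.
enum class Ty { Int32, Ptr };
struct Const {
  Ty Type;
  std::string Text;
};

// With a layout declared upfront, a field list that does not match the
// declared types or field count can be rejected. An anonymous struct built
// directly from the list has nothing to be compared against.
inline bool matchesLayout(const std::vector<Ty> &Layout,
                          const std::vector<Const> &Fields) {
  if (Layout.size() != Fields.size())
    return false;
  for (std::size_t I = 0; I < Layout.size(); ++I)
    if (Layout[I] != Fields[I].Type)
      return false;
  return true;
}
```

For example, a declared layout of `{Int32, Ptr}` accepts a tool type followed by a unit name and rejects the same two fields in swapped order.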

My opinion is that it's much easier to create the array at a single place than to track down the SmallVector's inserted elements.
How about:
The createEsanInitToolGV just dispatches to the appropriate (tool-specific) function and returns the same as that function (possibly creates the module string and passes it to the next function too, if you want to factor that out to here);
The tool-specific function:
Gets all the info/adds constants
Sets up an array with the information to be placed in the structure
Calls into a helper "create a GV for tool init" function (optional if you want to return the GV from the tool-specific function, at the expense of code duplication)
That way, each function deals only with one thing. Instead of having dispatch + (some) setup + GV creation all mixed in the same function. The creation of the structure is very easy to read, and you can easily make sure both places (here and compiler-rt) have a matching structure definition.
Adding a tool-specific function will also avoid introducing more bugs, since you'd only need to touch the dispatcher + your own tool's struct creator. The dispatcher wouldn't need to know anything about how the GV is created.

What do you think?
I am not sure what's the difference from my original CL, where the struct is declared in each tool-specific create GV?
The original CL had a static struct declaration and then filled the entries and created the GV at the end of the function.
StructType *CacheFragType =
  StructType::get(Int32Ty, IntptrTy, nullptr);
...
GlobalVariable *CacheFragGV = new GlobalVariable(
    M, CacheFragType, true, GlobalVariable::InternalLinkage,
    ConstantStruct::get(CacheFragType,
                        ConstantInt::get(Int32Ty, Options.ToolType),
                        ConstantExpr::getPointerCast(UnitName, IntptrTy),
                        nullptr));
It seems clearer than pushing onto a vector in a separate function and then creating the GV without knowing what's inside.

Also, how do you guarantee the UnitName would be the first entry if each tool-specific function "Gets all the info/adds constants"?
The rationale is that each tool knows what things to place in its structure. If it wants to get the UnitName and wants it to be the first thing in the structure, it will place it there.

The only reason I changed it to a Vec was to make sure we can insert UnitName upfront in createEsanInitToolGV and extend the struct in the tool-specific createGV functions.

In summary:

each tool itself is responsible for the struct layout, so no vec is necessary.
having a struct type specified helps check the correctness of creating the struct data.
creating the GV is a single statement, not worth separating out.

Looks like my original CL was good.

zhaoqin updated this object.May 27 2016, 8:40 AM

zhaoqin updated this object.May 27 2016, 8:43 AM

create static struct instead of using vec for creating ToolInfo and CacheFragInfo.

zhaoqin added a reviewer: bruening.May 27 2016, 9:56 AM

I think the commit message could do a better job of explaining that this is just a skeleton data structure being created here. Its contents are not filled in yet for the cache frag tool.

[esan|cfrag] Create the cfrag variable for the runtime

To me this implies it's created the finished data passed to the runtime, which is not true. Adding "skeleton" or something, or "Part 1 of 2" or something would help.

Adds static help routine createPrivateGlobalForString.

IMHO this is not a useful commit message line and should be removed. Please describe what the new code is doing, not the name of helper routines.

Adds createCacheFragInfoGV to create the cache-fragmentation tool
specific variable passed to the runtime library.

Updates createEsanInitToolInfoGV to create the default ToolInfo struct
if no tool specific variable is created.

I would suggest removing the function names and describing in prose what is happening. Maybe something like "Adds a global variable containing preliminary information for the runtime. Passes a pointer to the variable to the compilation unit init routine in the runtime. The variable initially contains just the compilation unit name."

bruening added inline comments.May 27 2016, 10:35 AM

lib/Transforms/Instrumentation/EfficiencySanitizer.cpp
234	Please elaborate in the comment: really it's not "expand" which implies we have something complete now: really, this unit name is not important and this is just a skeleton struct that is not finished. What we really want is to pass pointers to counters this pass will create for each field of each class/struct in the program. Maybe part of the confusion is that this CL is only passing something that seems general to every tool, which is why the other reviews all wanted to share fields and pass something for all tools. In hindsight it may have been better to send a CL that has all of the cache frag data (the counters) that we want to pass, and nullptr for other tools.
266	This seems to be a compromise that does not seem ideal: now we have two structs with the same field that is duplicated instead of shared. I would say, either go back to the original CL and pass nullptr here, or share this field via a contained struct or an anonymous generated struct sharing that field code (that approach seems to have been tried and rejected though).

zhaoqin retitled this revision from [esan|cfrag] Create the cfrag variable for the runtime to [esan|cfrag] Create the skeleton of cfrag variable for the runtime.May 27 2016, 3:34 PM

zhaoqin updated this object.

zhaoqin edited edge metadata.

zhaoqin updated this object.May 27 2016, 3:38 PM

PTAL

zhaoqin marked an inline comment as done.May 27 2016, 3:39 PM

zhaoqin added inline comments.

lib/Transforms/Instrumentation/EfficiencySanitizer.cpp
266	create a nullptr instead.

bruening accepted this revision.May 31 2016, 8:05 AM

bruening edited edge metadata.

Closed by commit rL271298: [esan|cfrag] Create the skeleton of cfrag variable for the runtime (authored by zhaoqin).May 31 2016, 10:20 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path: lib/Transforms/Instrumentation/EfficiencySanitizer.cpp
Size: 79 lines

Diff 58795

lib/Transforms/Instrumentation/EfficiencySanitizer.cpp

Show First 20 Lines • Show All 96 Lines • ▼ Show 20 Lines	OverrideOptionsFromCL(EfficiencySanitizerOptions Options) {
// Direct opt invocation with no params will have the default ESAN_None.		// Direct opt invocation with no params will have the default ESAN_None.
// We run the default tool in that case.		// We run the default tool in that case.
if (Options.ToolType == EfficiencySanitizerOptions::ESAN_None)		if (Options.ToolType == EfficiencySanitizerOptions::ESAN_None)
Options.ToolType = EfficiencySanitizerOptions::ESAN_CacheFrag;		Options.ToolType = EfficiencySanitizerOptions::ESAN_CacheFrag;

return Options;		return Options;
}		}

		// Create a constant for Str so that we can pass it to the run-time lib.
		static GlobalVariable *createPrivateGlobalForString(Module &M, StringRef Str,
		bool AllowMerging) {
		Constant *StrConst = ConstantDataArray::getString(M.getContext(), Str);
		// We use private linkage for module-local strings. If they can be merged
		// with another one, we set the unnamed_addr attribute.
		GlobalVariable *GV =
		new GlobalVariable(M, StrConst->getType(), true,
		GlobalValue::PrivateLinkage, StrConst, "");
		if (AllowMerging)
		GV->setUnnamedAddr(true);
		GV->setAlignment(1); // Strings may not be merged w/o setting align 1.
		return GV;
		}

/// EfficiencySanitizer: instrument each module to find performance issues.		/// EfficiencySanitizer: instrument each module to find performance issues.
class EfficiencySanitizer : public ModulePass {		class EfficiencySanitizer : public ModulePass {
public:		public:
EfficiencySanitizer(		EfficiencySanitizer(
const EfficiencySanitizerOptions &Opts = EfficiencySanitizerOptions())		const EfficiencySanitizerOptions &Opts = EfficiencySanitizerOptions())
: ModulePass(ID), Options(OverrideOptionsFromCL(Opts)) {}		: ModulePass(ID), Options(OverrideOptionsFromCL(Opts)) {}
const char *getPassName() const override;		const char *getPassName() const override;
bool runOnModule(Module &M) override;		bool runOnModule(Module &M) override;
static char ID;		static char ID;

private:		private:
bool initOnModule(Module &M);		bool initOnModule(Module &M);
void initializeCallbacks(Module &M);		void initializeCallbacks(Module &M);
GlobalVariable *createEsanInitToolGV(Module &M);		GlobalVariable createCacheFragInfoGV(Module &M, Constant UnitName);
		filcabUnsubmitted Not Done Reply Inline Actions That `Vec` in the name feels weird. Why not just `ToolInfoData` or even `ToolData`? No need repeating something that's in the type ("saying" that it's a vector). filcab: That `Vec` in the name feels weird. Why not just `ToolInfoData` or even `ToolData`? No need…
		zhaoqinAuthorUnsubmitted Not Done Reply Inline Actions It is not about knowing what type it is at where it is declared. It is for knowing what type it is where it is used. If someone reading the code sees ToolInfoVec, it is clear that ToolInfoVec is a vec for ToolInfo with multiple entries and extensible without looking for its declaration. In contrast, ToolInfoData tells much less information, ToolInfoEntries would better. But I feel TooInfoVec is even better. zhaoqin: It is not about knowing what type it is at where it is declared. It is for knowing what type it…
		GlobalVariable *createEsanInitToolInfoGV(Module &M);
void createDestructor(Module &M, GlobalVariable *GV);		void createDestructor(Module &M, GlobalVariable *GV);
bool runOnFunction(Function &F, Module &M);		bool runOnFunction(Function &F, Module &M);
bool instrumentLoadOrStore(Instruction *I, const DataLayout &DL);		bool instrumentLoadOrStore(Instruction *I, const DataLayout &DL);
bool instrumentMemIntrinsic(MemIntrinsic *MI);		bool instrumentMemIntrinsic(MemIntrinsic *MI);
bool shouldIgnoreMemoryAccess(Instruction *I);		bool shouldIgnoreMemoryAccess(Instruction *I);
int getMemoryAccessFuncIndex(Value *Addr, const DataLayout &DL);		int getMemoryAccessFuncIndex(Value *Addr, const DataLayout &DL);
Value appToShadow(Value Shadow, IRBuilder<> &IRB);		Value appToShadow(Value Shadow, IRBuilder<> &IRB);
bool instrumentFastpath(Instruction *I, const DataLayout &DL, bool IsStore,		bool instrumentFastpath(Instruction *I, const DataLayout &DL, bool IsStore,
Value *Addr, unsigned Alignment);		Value *Addr, unsigned Alignment);
// Each tool has its own fastpath routine:		// Each tool has its own fastpath routine:
bool instrumentFastpathCacheFrag(Instruction *I, const DataLayout &DL,		bool instrumentFastpathCacheFrag(Instruction *I, const DataLayout &DL,
Value *Addr, unsigned Alignment);		Value *Addr, unsigned Alignment);
bool instrumentFastpathWorkingSet(Instruction *I, const DataLayout &DL,		bool instrumentFastpathWorkingSet(Instruction *I, const DataLayout &DL,
Value *Addr, unsigned Alignment);		Value *Addr, unsigned Alignment);
		vitalybuka (Done): This function does nothing with the internals of EfficiencySanitizer, so why not keep it as a regular function outside of the class?

EfficiencySanitizerOptions Options;		EfficiencySanitizerOptions Options;
LLVMContext *Ctx;		LLVMContext *Ctx;
Type *IntptrTy;		Type *IntptrTy;
// Our slowpath involves callouts to the runtime library.		// Our slowpath involves callouts to the runtime library.
// Access sizes are powers of two: 1, 2, 4, 8, 16.		// Access sizes are powers of two: 1, 2, 4, 8, 16.
static const size_t NumberOfAccessSizes = 5;		static const size_t NumberOfAccessSizes = 5;
Function *EsanAlignedLoad[NumberOfAccessSizes];		Function *EsanAlignedLoad[NumberOfAccessSizes];
[58 unchanged lines collapsed; context: void EfficiencySanitizer::initializeCallbacks(Module &M) {]
MemcpyFn = checkSanitizerInterfaceFunction(		MemcpyFn = checkSanitizerInterfaceFunction(
M.getOrInsertFunction("memcpy", IRB.getInt8PtrTy(), IRB.getInt8PtrTy(),		M.getOrInsertFunction("memcpy", IRB.getInt8PtrTy(), IRB.getInt8PtrTy(),
IRB.getInt8PtrTy(), IntptrTy, nullptr));		IRB.getInt8PtrTy(), IntptrTy, nullptr));
MemsetFn = checkSanitizerInterfaceFunction(		MemsetFn = checkSanitizerInterfaceFunction(
M.getOrInsertFunction("memset", IRB.getInt8PtrTy(), IRB.getInt8PtrTy(),		M.getOrInsertFunction("memset", IRB.getInt8PtrTy(), IRB.getInt8PtrTy(),
IRB.getInt32Ty(), IntptrTy, nullptr));		IRB.getInt32Ty(), IntptrTy, nullptr));
}		}

// Create the tool-specific global variable passed to EsanInit and EsanExit.		// Create the global variable for the cache-fragmentation tool.
GlobalVariable *EfficiencySanitizer::createEsanInitToolGV(Module &M) {		GlobalVariable *EfficiencySanitizer::createCacheFragInfoGV(
		aizatsky (Not Done): What is the purpose of having a different struct type for each tool? Is there a case where they differ, or would it always be "{ToolType, ModuleName}"?
		zhaoqin (Author, Not Done): So far, only the cache fragmentation tool takes the variable passed to the runtime. Creating a struct variable has two parts: type specification (lines 202-214) and variable initialization (lines 216-225). I feel it is clearer and simpler to keep the two parts together for easier reading. The header creation is mainly just lines 216-218 and line 223, so separating it seems to do more harm than good.
		aizatsky (Not Done): What I meant: maybe there should be a "createToolGV" that is used for all efficiency sanitizers? Do you think the working set tool will need a different layout for this variable? Maybe this is something you don't have to specialize and can keep generic instead?
GlobalVariable *GV = nullptr;		Module &M, Constant *UnitName) {
// FIXME: create the tool specific global variable.		assert(Options.ToolType == EfficiencySanitizerOptions::ESAN_CacheFrag);
if (GV == nullptr) {		// Create the cache-fragmentation specific variable.
GV = new GlobalVariable(M, IntptrTy, true, GlobalVariable::InternalLinkage,		// This structure should be kept consistent with the CacheFragInfo struct
Constant::getNullValue(IntptrTy));		// in the runtime library.
		// struct CacheFragInfo {
		// const char *UnitName;
		filcab (Done): I'd prefer having `CacheFragInfo` (`CacheFragCUInfo`, since it's a per-compile-unit thing?) or something similar. `Ty` doesn't add anything to the rest of the name and is not explanatory.
		// ...
		filcab (Not Done): I don't really see a reason to include the tool type here. We're already emitting it in the call to `__esan_init`.
		zhaoqin (Author, Not Done): One reason is to potentially support multiple tools in the same build/run, where each tool may need separate information. That is not in any near-future plan, so I am not refactoring code for it (i.e., removing ToolType from __esan_init), but adding code with that in mind would help.
		filcab (Done): From what I remember, the plan is to have only one active tool at a time, especially given the different shadow scales, etc. Please don't implement partial extension points unless there is a plan. (I'm OK with planning for portability across OSes if it's not a big problem, but I'd rather avoid doing small things "planning" for possible functionality that would imply huge changes.) We're already saying this is a CacheFrag thing, so the tool has already been picked.
		// };
		// FIXME: expand CacheFragInfo.
		bruening (Done): Please elaborate in the comment. "Expand" implies we have something complete now, but really this unit name is not important and this is just a skeleton struct that is not finished. What we really want is to pass pointers to counters that this pass will create for each field of each class/struct in the program. Maybe part of the confusion is that this CL only passes something that seems general to every tool, which is why the other reviewers all wanted to share fields and pass something for all tools. In hindsight it may have been better to send a CL that has all of the cache frag data (the counters) that we want to pass, and nullptr for other tools.
		auto *Int8PtrTy = Type::getInt8PtrTy(*Ctx);
		// We specify the struct type upfront instead of using anonymous struct
		// for better error check.
		auto *CacheFragInfoTy = StructType::get(Int8PtrTy, nullptr);
		auto *CacheFragInfoGV = new GlobalVariable(
		M, CacheFragInfoTy, true, GlobalVariable::InternalLinkage,
		ConstantStruct::get(CacheFragInfoTy,
		UnitName,
		nullptr));
		return CacheFragInfoGV;
}		}
return GV;
		// Create the tool-specific global variable passed to EsanInit and EsanExit.
		GlobalVariable *EfficiencySanitizer::createEsanInitToolInfoGV(Module &M) {
		// This structure contains tool-specific information about each compilation
		// unit (module) and is passed to the runtime library.

		filcab (Done): I'd rather have a more abstract version of this that just creates an anonymous structure, and then each tool knows what to do with it. A bit like what UBSan does in clang. Check out clang's `lib/CodeGen/CGExpr.cpp`. Create the array and then call EmitCheck (line 585):
		    llvm::Constant *StaticData[] = {
		      EmitCheckSourceLocation(E->getLocStart()),
		      EmitCheckTypeDescriptor(CalleeType)
		    };
		    EmitCheck(Checks, "type_mismatch", StaticData, Ptr);
		In EmitCheck, we have a common way to emit the static args:
		    // Emit handler arguments and create handler function type.
		    if (!StaticArgs.empty()) {
		      llvm::Constant *Info = llvm::ConstantStruct::getAnon(StaticArgs);
		      auto *InfoPtr = new llvm::GlobalVariable(CGM.getModule(), Info->getType(), false, llvm::GlobalVariable::PrivateLinkage, Info);
		      InfoPtr->setUnnamedAddr(true);
		      CGM.getSanitizerMetadata()->disableSanitizerForGlobal(InfoPtr);
		      Args.push_back(Builder.CreateBitCast(InfoPtr, Int8PtrTy));
		      ArgTypes.push_back(Int8PtrTy);
		    }
		That way, each tool can just create the ArrayRef and pass it to a general `createToolGV` or something. `__esan_init` would get the two args: `ToolType` and a `void *` which is tool specific.
		zhaoqin (Author, Not Done): Thanks for the pointer. It is nice to know there are flexible ways to create structs. Working on it.
		auto *Int8PtrTy = Type::getInt8PtrTy(*Ctx);
		// Compilation unit name.
		auto *UnitName = ConstantExpr::getPointerCast(
		createPrivateGlobalForString(M, M.getModuleIdentifier(), true),
		Int8PtrTy);

		// Create the tool-specific variable.
		if (Options.ToolType == EfficiencySanitizerOptions::ESAN_CacheFrag)
		return createCacheFragInfoGV(M, UnitName);

		// Create the default general variable.
		// This structure should be kept consistent with the ToolInfo struct
		// in the runtime library.
		// struct ToolInfo {
		// const char *UnitName;
		bruening (Done): This seems to be a compromise that is not ideal: now we have two structs with the same field duplicated instead of shared. I would say either go back to the original CL and pass nullptr here, or share this field via a contained struct or an anonymous generated struct sharing that field code (that approach seems to have been tried and rejected, though).
		zhaoqin (Author, Not Done): Will create a nullptr instead.
		// };
		auto *ToolInfoTy = StructType::get(Int8PtrTy, nullptr);
		auto *ToolInfoGV = new GlobalVariable(M, ToolInfoTy, true,
		GlobalVariable::InternalLinkage,
		ConstantStruct::get(ToolInfoTy,
		UnitName,
		nullptr));
		return ToolInfoGV;
}		}
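
The comments in both functions above ask that the struct built by the pass stay consistent with its counterpart in the runtime library. A minimal sketch of how the runtime side could guard that layout at compile time follows. It assumes a runtime-side declaration of `CacheFragInfo` with the single `UnitName` field shown in the diff; the `static_assert` checks and the `unitNameOf` helper are hypothetical and not part of this patch.

```cpp
#include <cstddef>
#include <string>
#include <type_traits>

// Assumed runtime-side mirror of the skeleton struct built by the pass:
// the diff creates a struct type holding exactly one i8* field.
struct CacheFragInfo {
  const char *UnitName;
};

// Hypothetical compile-time guards: fail the build if the runtime struct
// drifts away from the one-pointer layout that the pass emits.
static_assert(std::is_standard_layout<CacheFragInfo>::value,
              "CacheFragInfo must have a C-compatible layout");
static_assert(offsetof(CacheFragInfo, UnitName) == 0,
              "UnitName must be the first field");
static_assert(sizeof(CacheFragInfo) == sizeof(const char *),
              "CacheFragInfo must contain exactly one pointer");

// Illustrative helper: read the unit name, tolerating null pointers.
std::string unitNameOf(const CacheFragInfo *Info) {
  return (Info && Info->UnitName) ? std::string(Info->UnitName)
                                  : std::string();
}
```

Checks like these would have to be updated together with the pass whenever fields such as the per-field counters mentioned by bruening are added.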

void EfficiencySanitizer::createDestructor(Module &M, GlobalVariable *GV) {		void EfficiencySanitizer::createDestructor(Module &M, GlobalVariable *GV) {
EsanDtorFunction = Function::Create(FunctionType::get(Type::getVoidTy(*Ctx),		EsanDtorFunction = Function::Create(FunctionType::get(Type::getVoidTy(*Ctx),
false),		false),
GlobalValue::InternalLinkage,		GlobalValue::InternalLinkage,
EsanModuleDtorName, &M);		EsanModuleDtorName, &M);
ReturnInst::Create(*Ctx, BasicBlock::Create(*Ctx, "", EsanDtorFunction));		ReturnInst::Create(*Ctx, BasicBlock::Create(*Ctx, "", EsanDtorFunction));
[9 unchanged lines collapsed]

bool EfficiencySanitizer::initOnModule(Module &M) {		bool EfficiencySanitizer::initOnModule(Module &M) {
Ctx = &M.getContext();		Ctx = &M.getContext();
const DataLayout &DL = M.getDataLayout();		const DataLayout &DL = M.getDataLayout();
IRBuilder<> IRB(M.getContext());		IRBuilder<> IRB(M.getContext());
IntegerType *OrdTy = IRB.getInt32Ty();		IntegerType *OrdTy = IRB.getInt32Ty();
IntptrTy = DL.getIntPtrType(M.getContext());		IntptrTy = DL.getIntPtrType(M.getContext());
// Create the variable passed to EsanInit and EsanExit.		// Create the variable passed to EsanInit and EsanExit.
GlobalVariable *GV = createEsanInitToolGV(M);		GlobalVariable *GV = createEsanInitToolInfoGV(M);
// Constructor		// Constructor
std::tie(EsanCtorFunction, std::ignore) = createSanitizerCtorAndInitFunctions(		std::tie(EsanCtorFunction, std::ignore) = createSanitizerCtorAndInitFunctions(
M, EsanModuleCtorName, EsanInitName, /*InitArgTypes=*/{OrdTy, IntptrTy},		M, EsanModuleCtorName, EsanInitName, /*InitArgTypes=*/{OrdTy, IntptrTy},
/*InitArgs=*/{		/*InitArgs=*/{
ConstantInt::get(OrdTy, static_cast<int>(Options.ToolType)),		ConstantInt::get(OrdTy, static_cast<int>(Options.ToolType)),
ConstantExpr::getPointerCast(GV, IntptrTy)});		ConstantExpr::getPointerCast(GV, IntptrTy)});
appendToGlobalCtors(M, EsanCtorFunction, EsanCtorAndDtorPriority);		appendToGlobalCtors(M, EsanCtorFunction, EsanCtorAndDtorPriority);

[252 unchanged lines collapsed]
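
In `initOnModule` the constructor passes two arguments to the init routine: the tool type and the address of the global variable cast to a pointer-sized integer. A rough sketch of how a runtime could consume that pair is shown here. The enum values, the `initSketch` function, and the decision to ignore the pointer for non-cache-fragmentation tools are all assumptions made for illustration; the real runtime entry point and tool numbering are not shown in this diff.

```cpp
#include <cstdint>
#include <string>

// Hypothetical tool identifiers; the real values come from
// EfficiencySanitizerOptions and are not visible in this diff.
enum ToolType : int { ToolNone = 0, ToolCacheFrag = 1, ToolWorkingSet = 2 };

// Assumed mirror of the skeleton struct emitted by the pass.
struct CacheFragInfo {
  const char *UnitName;
};

// Hypothetical stand-in for the runtime init entry point. It receives the
// tool type and the variable's address as a pointer-sized integer, and
// returns the unit name it recovered (empty if there is none).
std::string initSketch(int Tool, std::uintptr_t Ptr) {
  // Only the cache-fragmentation tool interprets the pointer in this sketch.
  if (Tool != ToolCacheFrag || Ptr == 0)
    return std::string();
  const auto *Info = reinterpret_cast<const CacheFragInfo *>(Ptr);
  return Info->UnitName ? std::string(Info->UnitName) : std::string();
}
```

Because the tool type selects how the pointer is interpreted, the struct itself does not need to carry a tool-type field, which is the point filcab raises in the inline comments.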


[esan|cfrag] Create the skeleton of cfrag variable for the runtime (Closed, Public)


Diff 58795

lib/Transforms/Instrumentation/EfficiencySanitizer.cpp
