This is an archive of the discontinued LLVM Phabricator instance.

[esan|cfrag] Create the skeleton of cfrag variable for the runtime
ClosedPublic

Authored by zhaoqin on May 23 2016, 2:48 PM.

Details

Summary

Creates a global variable containing preliminary information
for the cache-fragmentation tool runtime.

Passes a pointer to the variable (null if no variable is created) to the
compilation unit init and exit routines in the runtime.

Diff Detail

Repository
rL LLVM

Event Timeline

zhaoqin updated this revision to Diff 58154.May 23 2016, 2:48 PM
zhaoqin retitled this revision from to [esan|cfrag]: Add createCacheFragGV..
zhaoqin updated this object.
zhaoqin added a reviewer: aizatsky.
zhaoqin added subscribers: zhaoqin, llvm-commits, eugenis and 4 others.
vitalybuka added inline comments.May 23 2016, 2:59 PM
lib/Transforms/Instrumentation/EfficiencySanitizer.cpp
109 ↗(On Diff #58154)

This function does nothing with internals of EfficiencySanitizer so why not to keep it
as regular function outside of the class?

zhaoqin marked an inline comment as done.May 23 2016, 3:19 PM
zhaoqin updated this revision to Diff 58161.May 23 2016, 3:20 PM

Move createPrivateGlobalForString out of class.

Do not use ClTool* options.

aizatsky added inline comments.May 23 2016, 4:08 PM
lib/Transforms/Instrumentation/EfficiencySanitizer.cpp
199 ↗(On Diff #58161)

What is the purpose for having different struct types for each tool? Is there a case where they are different or would it allways be "{ToolType, ModuleName}"?

zhaoqin added inline comments.May 24 2016, 8:16 AM
lib/Transforms/Instrumentation/EfficiencySanitizer.cpp
199 ↗(On Diff #58161)

So far, only the cache fragmentation tool takes the variable passed to runtime.

Creating a struct variable includes two parts:
type specification: line 202 - line 214
and
variable initialization: line 216 - line 225.
I feel it is better/clearer/simpler to put the two parts together for easy code reading.

The header creation mainly just line 216 - line 218 and line 223. Separate it seems does more harm than good.

zhaoqin updated this revision to Diff 58245.May 24 2016, 9:16 AM

Rename CacheFragType to CacheFragTy for consistency.

zhaoqin updated this revision to Diff 58246.May 24 2016, 9:23 AM

update commit msg title

zhaoqin retitled this revision from [esan|cfrag]: Add createCacheFragGV. to [esan|cfrag] Create the cfrag variable for the runtime.May 24 2016, 9:34 AM
aizatsky added inline comments.May 24 2016, 2:16 PM
lib/Transforms/Instrumentation/EfficiencySanitizer.cpp
199 ↗(On Diff #58246)

What I meant: maybe there should be "createToolGV" that would be used for all efficiency sanitizers? Do you think that work set tool will need a different layout of this variable? Maybe it is something you don't have to specialize, but keep generic instead?

aizatsky accepted this revision.May 24 2016, 3:04 PM
aizatsky edited edge metadata.
This revision is now accepted and ready to land.May 24 2016, 3:04 PM

Do you want me to have a general GV layout for different Tool?
If so, it would probably be something like:
struct ToolTy {

int ToolType;
const char *UnitName;
void *SpecificToolGV;

};
Otherwise, expanding struct ToolTy with different layouts for different tools is too troublesome, (need specify the struct layout upfront line 213).
It would add indirect reference at runtime, but probably fine.

Do you want me to have a general GV layout for different Tool?

No, I don't think you need a general layout. But you can have a base struct with ToolType at least. This way you'll guarantee that ToolType would always be a first field => no confusion would happened.

Optional though.

Do you want me to have a general GV layout for different Tool?

No, I don't think you need a general layout. But you can have a base struct with ToolType at least. This way you'll guarantee that ToolType would always be a first field => no confusion would happened.

Yes, I agree that would be better. What I worried is how to implement so.
As you can see at line 213, the Struct Type is declared upfront and then we create the new globalvariable with that type and fill in the value at line 220.
I can think of two ways to do so:

  1. general layout as I described before
  2. creating a struct ToolHeader or something to put beginning of each tool specific struct. Then it still rely on each tool putting that header struct at beginning.

Do you want me to have a general GV layout for different Tool?

No, I don't think you need a general layout. But you can have a base struct with ToolType at least. This way you'll guarantee that ToolType would always be a first field => no confusion would happened.

Yes, I agree that would be better. What I worried is how to implement so.
As you can see at line 213, the Struct Type is declared upfront and then we create the new globalvariable with that type and fill in the value at line 220.
I can think of two ways to do so:

  1. general layout as I described before
  2. creating a struct ToolHeader or something to put beginning of each tool specific struct. Then it still rely on each tool putting that header struct at beginning.

Let's see how this will play out. Will push this decision onto the one adding the second struct type :)

filcab added a subscriber: filcab.May 25 2016, 6:57 AM

Please add a test for the cache fragmentation tool's global variable.

lib/Transforms/Instrumentation/EfficiencySanitizer.cpp
207 ↗(On Diff #58246)

I don't really see a reason to include the tool type here. We're already emitting it in the call to __esan_init.

226 ↗(On Diff #58246)

I'd rather have a more abstract version of this that just creates an anonymous structure, and then each tool knows what to do with it.
A bit like what UBSan does in clang.

Check out clang's lib/CodeGen/CGExpr.cpp
Create the array and then call EmitCheck (line 585):

llvm::Constant *StaticData[] = {
  EmitCheckSourceLocation(E->getLocStart()),
  EmitCheckTypeDescriptor(CalleeType)
};
EmitCheck(Checks, "type_mismatch", StaticData, Ptr);

In EmitCheck, we have a common way to emit the static args:

// Emit handler arguments and create handler function type.
if (!StaticArgs.empty()) {
  llvm::Constant *Info = llvm::ConstantStruct::getAnon(StaticArgs);
  auto *InfoPtr =
      new llvm::GlobalVariable(CGM.getModule(), Info->getType(), false,
                               llvm::GlobalVariable::PrivateLinkage, Info);
  InfoPtr->setUnnamedAddr(true);
  CGM.getSanitizerMetadata()->disableSanitizerForGlobal(InfoPtr);
  Args.push_back(Builder.CreateBitCast(InfoPtr, Int8PtrTy));
  ArgTypes.push_back(Int8PtrTy);
}

That way, each tool can just create the ArrayRef and pass it to a general createToolGV or something.

__esan_init would get the two args: ToolType and a void* which is tool specific.

Please add a test for the cache fragmentation tool's global variable.

http://reviews.llvm.org/D20542 is the test for the arg passed to runtime.
Are you thinking of a different test in llvm?

lib/Transforms/Instrumentation/EfficiencySanitizer.cpp
207 ↗(On Diff #58246)

One of the reason is to potentially support multiple tools in the same build/run, where each tool may need separate information.
It is not in any near future plan, so no code refactoring for it, i.e., remove ToolType from __esan_init.
But adding code with that in mind would help.

226 ↗(On Diff #58246)

Thanks for the pointer.
It is nice to know there are flexible ways to create structs. Working on it.

Please add a test for the cache fragmentation tool's global variable.

http://reviews.llvm.org/D20542 is the test for the arg passed to runtime.
Are you thinking of a different test in llvm?

Yes, I'd like an LLVM test checking we emit the correct tool number.

Thanks for working on this.

Filipe

lib/Transforms/Instrumentation/EfficiencySanitizer.cpp
206 ↗(On Diff #58246)

I'd prefer having CacheFragInfo (CacheFragCUInfo since it's a per-compile unit thing?) or something similar. Ty doesn't add anything to the rest of the name, and is not explanatory.

207 ↗(On Diff #58246)

From what I remember, the plan is to only have one active tool at a time, especially with the different shadow scales, etc.

Please don't implement partial extension points unless there is a plan (I'm ok with planning for portability across OS if it's not a big problem. But I'd rather avoid doing small things "planning" possible functionality which would imply huge changes).
We're already saying this is a CacheFrag thing, so the tool was already picked.

zhaoqin marked 3 inline comments as done.May 25 2016, 1:24 PM

Please add a test for the cache fragmentation tool's global variable.

http://reviews.llvm.org/D20542 is the test for the arg passed to runtime.
Are you thinking of a different test in llvm?

Yes, I'd like an LLVM test checking we emit the correct tool number.

In that case, http://reviews.llvm.org/D20488 has add test to check the tool number passed to __esan_init.

zhaoqin updated this revision to Diff 58489.May 25 2016, 1:25 PM
zhaoqin edited edge metadata.

PTAL

In that case, http://reviews.llvm.org/D20488 has add test to check the tool number passed to __esan_init.

It's already there, in cache_frag_basic.ll.

Re: this change:
I think it's better, but I don't really like one thing here: Finding out what's getting included in the anonymous structure has us browsing the source code in several different locations.
Having to track what's getting pushed onto the SmallVector seems harder to follow without much gain.
(I think the current solution is acceptable, but I'd prefer to have it be a bit easier to read/follow)

How about:
The createEsanInitToolGV just dispatches to the appropriate (tool-specific) function and returns the same as that function (possibly creates the module string and passes it to the next function too, if you want to factor that out to here);
The tool-specific function:

Gets all the info/adds constants
Sets up an array with the information to be placed in the structure
Calls into a helper "create a GV for tool init" function (optional if you want to return the GV from the tool-specific function, at the expense of code duplication)

That way, each function deals only with one thing. Instead of having dispatch + (some) setup + GV creation all mixed in the same function. The creation of the structure is very easy to read, and you can easily make sure both places (here and compiler-rt) have a matching structure definition.
Adding a tool-specific function will also avoid introducing more bugs, since you'd only need to touch the dispatcher + your own tool's struct creator. The dispatcher wouldn't need to know anything about how the GV is created.

What do you think?

lib/Transforms/Instrumentation/EfficiencySanitizer.cpp
133 ↗(On Diff #58489)

That Vec in the name feels weird. Why not just ToolInfoData or even ToolData? No need repeating something that's in the type ("saying" that it's a vector).

In that case, http://reviews.llvm.org/D20488 has add test to check the tool number passed to __esan_init.

It's already there, in cache_frag_basic.ll.

Re: this change:
I think it's better, but I don't really like one thing here: Finding out what's getting included in the anonymous structure has us browsing the source code in several different locations.
Having to track what's getting pushed onto the SmallVector seems harder to follow without much gain.
(I think the current solution is acceptable, but I'd prefer to have it be a bit easier to read/follow)

I feel we are going circle here!!!
In my original CL, the full struct description and definition is at one place, i.e., in the tool specific createGV function.
I am asked to split it so the header (ToolType/UnitName) would be put at beginning of the struct for all different types of Tool GV.
Now it is hard to track what the struct is because of the split!!!

How about:
The createEsanInitToolGV just dispatches to the appropriate (tool-specific) function and returns the same as that function (possibly creates the module string and passes it to the next function too, if you want to factor that out to here);
The tool-specific function:

Gets all the info/adds constants
Sets up an array with the information to be placed in the structure
Calls into a helper "create a GV for tool init" function (optional if you want to return the GV from the tool-specific function, at the expense of code duplication)

That way, each function deals only with one thing. Instead of having dispatch + (some) setup + GV creation all mixed in the same function. The creation of the structure is very easy to read, and you can easily make sure both places (here and compiler-rt) have a matching structure definition.
Adding a tool-specific function will also avoid introducing more bugs, since you'd only need to touch the dispatcher + your own tool's struct creator. The dispatcher wouldn't need to know anything about how the GV is created.

What do you think?

I am not sure what's the difference from my original CL, where the struct is declared in each tool-specific create GV?
The original CL had a static struct declaration and then filled the entries and created the GV at the end of the function.

StructType *CacheFragType =
  StructType::get(Int32Ty, IntptrTy, nullptr);
...
GlobalVariable *CacheFragGV = new GlobalVariable(
    M, CacheFragType, true, GlobalVariable::InternalLinkage,
    ConstantStruct::get(CacheFragType,
                        ConstantInt::get(Int32Ty, Options.ToolType),
                        ConstantExpr::getPointerCast(UnitName, IntptrTy),
                        nullptr));

It seems clearer than pushing a vector in a separate the function and then create the GV without knowing what's inside.

Also, how do you guarantee the UnitName would be the first entry if each tool-specific function "Gets all the info/adds constants"?

lib/Transforms/Instrumentation/EfficiencySanitizer.cpp
133 ↗(On Diff #58489)

It is not about knowing what type it is at where it is declared. It is for knowing what type it is where it is used.
If someone reading the code sees ToolInfoVec, it is clear that ToolInfoVec is a vec for ToolInfo with multiple entries and extensible without looking for its declaration.
In contrast, ToolInfoData tells much less information, ToolInfoEntries would better. But I feel TooInfoVec is even better.

Re: this change:
I think it's better, but I don't really like one thing here: Finding out what's getting included in the anonymous structure has us browsing the source code in several different locations.
Having to track what's getting pushed onto the SmallVector seems harder to follow without much gain.
(I think the current solution is acceptable, but I'd prefer to have it be a bit easier to read/follow)

I feel we are going circle here!!!
In my original CL, the full struct description and definition is at one place, i.e., in the tool specific createGV function.
I am asked to split it so the header (ToolType/UnitName) would be put at beginning of the struct for all different types of Tool GV.
Now it is hard to track what the struct is because of the split!!!

I don't think we're going full circle. But sorry if I'm hard to understand. Sometimes I might not explain what I would like to have in the best way. Feel free to ask for more details if what I'm asking for seems out of place or seems to go against what I (or others) have said before.

In your original CL you had createCacheFragGV which would:
create a struct type
create the string global for the module
create the constant int and the constant expr to put in the struct
create the struct itself
create the GV

This is to create the GV argument, which tools need for getting information from the ESan pass, basically.

Out of those, some are not needed (create the struct type. We can use an anonymous struct), some are common to any tool that needs an extra argument (create the struct itself, create the GV), and others are, for now, common to every tool that gets an argument (create the string global for the module). The rest is tool-specific, and is about creating the values to put in the struct.

My previous comment was about extracting away the struct + GV creation (and not needing to have a struct type).
Those two steps are what I pointed to from clang's source: every check will create an ArrayRef with the data to be put on the struct, and then common code simply takes that and creates an anonymous structure.

Your change addressed those two points, and thanks for doing that. But the resulting code, for me, seems to be more confusing than it needs to be. One of the very good things about the clang example I posted was that you just need to look at the temporary array to figure out what exactly the structure that is passed to __esan_init contains. This also makes it easy to add/change fields to that structure (simply add a new llvm::Constant *[] element to the array creation).

My opinion is that it's much easier to create the array at a single place than to track down the SmallVector's inserted elements.

How about:
The createEsanInitToolGV just dispatches to the appropriate (tool-specific) function and returns the same as that function (possibly creates the module string and passes it to the next function too, if you want to factor that out to here);
The tool-specific function:

Gets all the info/adds constants
Sets up an array with the information to be placed in the structure
Calls into a helper "create a GV for tool init" function (optional if you want to return the GV from the tool-specific function, at the expense of code duplication)

That way, each function deals only with one thing. Instead of having dispatch + (some) setup + GV creation all mixed in the same function. The creation of the structure is very easy to read, and you can easily make sure both places (here and compiler-rt) have a matching structure definition.
Adding a tool-specific function will also avoid introducing more bugs, since you'd only need to touch the dispatcher + your own tool's struct creator. The dispatcher wouldn't need to know anything about how the GV is created.

What do you think?

I am not sure what's the difference from my original CL, where the struct is declared in each tool-specific create GV?
The original CL had a static struct declaration and then filled the entries and created the GV at the end of the function.

StructType *CacheFragType =
  StructType::get(Int32Ty, IntptrTy, nullptr);
...
GlobalVariable *CacheFragGV = new GlobalVariable(
    M, CacheFragType, true, GlobalVariable::InternalLinkage,
    ConstantStruct::get(CacheFragType,
                        ConstantInt::get(Int32Ty, Options.ToolType),
                        ConstantExpr::getPointerCast(UnitName, IntptrTy),
                        nullptr));

It seems clearer than pushing a vector in a separate the function and then create the GV without knowing what's inside.

Also, how do you guarantee the UnitName would be the first entry if each tool-specific function "Gets all the info/adds constants"?

The rationale is that each tool knows what things to place in its structure. If it wants to get the UnitName and wants it to be the first thing in the structure, it will place it there.
This is what is being done, for example, in clang's EmitCheck Look at the lib/CodeGen/CGExpr.cpp file I mentioned and search for calls to EmitCheck (especially the third parameter, usually called StaticData).

I don't think we're going full circle. But sorry if I'm hard to understand. Sometimes I might not explain what I would like to have in the best way. Feel free to ask for more details if what I'm asking for seems out of place or seems to go against what I (or others) have said before.

In your original CL you had createCacheFragGV which would:
create a struct type
create the string global for the module
create the constant int and the constant expr to put in the struct
create the struct itself
create the GV

This is to create the GV argument, which tools need for getting information from the ESan pass, basically.

Out of those, some are not needed (create the struct type. We can use an anonymous struct), some are common to any tool that needs an extra argument (create the struct itself, create the GV), and others are, for now, common to every tool that gets an argument (create the string global for the module). The rest is tool-specific, and is about creating the values to put in the struct.

IMHO, if you know a struct type layout, specify it upfront, i.e., create the struct type, and fill in the entries.
Because that's a good way to check that what you fill in is what you specified. I made several mistakes while implementing it, and thanks to the specified struct type I can easily spot the problem.

My previous comment was about extracting away the struct + GV creation (and not needing to have a struct type).
Those two steps are what I pointed to from clang's source: every check will create an ArrayRef with the data to be put on the struct, and then common code simply takes that and creates an anonymous structure.

The EmitCheck has a much more work than simply creating a GV, so it makes sense to having a separate place creating the StaticData and then call EmitCheck.
Here, we have one statement creating the GV, it does not make any sense to have a separate function for that.

Your change addressed those two points, and thanks for doing that. But the resulting code, for me, seems to be more confusing than it needs to be. One of the very good things about the clang example I posted was that you just need to look at the temporary array to figure out what exactly the structure that is passed to __esan_init contains. This also makes it easy to add/change fields to that structure (simply add a new llvm::Constant *[] element to the array creation).

It is easy to add/change fields, but also make it easy to make mistakes, because there is no struct type to check against.
Even we want to change the struct with a specified struct type, I do not see how hard to change the struct type, esp if it is at the beginning of the same function.
Basically, I feel having a specified struct type upfront has great benefit of error checking, and little downside extra step of changing the struct layout.

My opinion is that it's much easier to create the array at a single place than to track down the SmallVector's inserted elements.

How about:
The createEsanInitToolGV just dispatches to the appropriate (tool-specific) function and returns the same as that function (possibly creates the module string and passes it to the next function too, if you want to factor that out to here);
The tool-specific function:

Gets all the info/adds constants
Sets up an array with the information to be placed in the structure
Calls into a helper "create a GV for tool init" function (optional if you want to return the GV from the tool-specific function, at the expense of code duplication)

That way, each function deals only with one thing. Instead of having dispatch + (some) setup + GV creation all mixed in the same function. The creation of the structure is very easy to read, and you can easily make sure both places (here and compiler-rt) have a matching structure definition.
Adding a tool-specific function will also avoid introducing more bugs, since you'd only need to touch the dispatcher + your own tool's struct creator. The dispatcher wouldn't need to know anything about how the GV is created.

What do you think?

I am not sure what's the difference from my original CL, where the struct is declared in each tool-specific create GV?
The original CL had a static struct declaration and then filled the entries and created the GV at the end of the function.

StructType *CacheFragType =
  StructType::get(Int32Ty, IntptrTy, nullptr);
...
GlobalVariable *CacheFragGV = new GlobalVariable(
    M, CacheFragType, true, GlobalVariable::InternalLinkage,
    ConstantStruct::get(CacheFragType,
                        ConstantInt::get(Int32Ty, Options.ToolType),
                        ConstantExpr::getPointerCast(UnitName, IntptrTy),
                        nullptr));

It seems clearer than pushing a vector in a separate the function and then create the GV without knowing what's inside.

Also, how do you guarantee the UnitName would be the first entry if each tool-specific function "Gets all the info/adds constants"?

The rationale is that each tool knows what things to place in its structure. If it wants to get the UnitName and wants it to be the first thing in the structure, it will place it there.

The only reason I change it to Vec is to make sure we can insert UnitName upfront in CreateEsanInitToolGV, and extend the struct in tool specific createGV functions.

In summary:

  1. each tool itself is responsible for the struct layout, so no vec is necessary.
  2. having a struct type specified helps check the correctness of creating the struct data.
  3. creating the GV is a single statement, not worth separting out.

Looks like my original CL was good.

zhaoqin updated this object.May 27 2016, 8:40 AM
zhaoqin updated this object.May 27 2016, 8:43 AM
zhaoqin updated this revision to Diff 58795.May 27 2016, 8:49 AM
zhaoqin updated this object.

create static struct instead of using vec for creating ToolInfo and CacheFragInfo.

bruening edited edge metadata.May 27 2016, 10:27 AM

I think the commit message could do a better job of explaining that this is just a skeleton data structure being created here. Its contents are not filled in yet for the cache frag tool.

[esan|cfrag] Create the cfrag variable for the runtime

To me this implies it's created the finished data passed to the runtime, which is not true. Adding "skeleton" or something, or "Part 1 of 2" or something would help.

Adds static help routine createPrivateGlobalForString.

IMHO this is not a useful commit message line and should be removed. Please describe what the new code is doing, not the name of helper routines.

Adds createCacheFragInfoGV to create the cache-fragmentation tool
specific variable passed to the runtime library.

Updates createEsanInitToolInfoGV to create the default ToolInfo struct
if no tool specific variable is created.

I would suggest removing the function names and describing in prose what is happening. Maybe something like "Adds a global variable containing preliminary information for the runtime. Passes a pointer to the variable to the compilation unit init routine in the runtime. The varable initially contains just the compilation unit name."

bruening added inline comments.May 27 2016, 10:35 AM
lib/Transforms/Instrumentation/EfficiencySanitizer.cpp
234 ↗(On Diff #58795)

Please elaborate in the comment: really it's not "expand" which implies we have something complete now: really, this unit name is not important and this is just a skeleton struct that is not finished. What we really want is to pass pointers to counters this pass will create for each field of each class/struct in the program. Maybe part of the confusion is that this CL is only passing something that seems general to every tool, which is why the other reviews all wanted to share fields and pass something for all tools. In hindsight it may have been better to send a CL that has all of the cache frag data (the counters) that we want to pass, and nullptr for other tools.

266 ↗(On Diff #58795)

This seems to be a compromise that does not seem ideal: now we have two structs with the same field that is duplicated instead of shared. I would say, either go back to the original CL and pass nullptr here, or share this field via a contained struct or an anonymous generated struct sharing that field code (that approach seems to have been tried and rejected though).

zhaoqin retitled this revision from [esan|cfrag] Create the cfrag variable for the runtime to [esan|cfrag] Create the skeleton of cfrag variable for the runtime.May 27 2016, 3:34 PM
zhaoqin updated this object.
zhaoqin edited edge metadata.
zhaoqin updated this object.May 27 2016, 3:38 PM
zhaoqin updated this revision to Diff 58851.May 27 2016, 3:38 PM
zhaoqin marked an inline comment as done.

PTAL

zhaoqin marked an inline comment as done.May 27 2016, 3:39 PM
zhaoqin added inline comments.
lib/Transforms/Instrumentation/EfficiencySanitizer.cpp
266 ↗(On Diff #58795)

create a nullptr instead.

bruening accepted this revision.May 31 2016, 8:05 AM
bruening edited edge metadata.
This revision was automatically updated to reflect the committed changes.