Page MenuHomePhabricator

[ASan] Make AddressSanitizer a ModulePass
AbandonedPublic

Authored by leonardchan on Nov 9 2018, 11:21 AM.

Details

Summary

This patch changes AddressSanitizer from a FunctionPass to a ModulePass. The purpose of this is to make it simpler to eventually port this pass to the new pass manager. The class depends on the module for some initialization, but does this through doInitialization, but the new PM infra does not seem to have an equivalent overridable method for new PM passes. Making the sanitizer run on modules allows for performing this initialization at the start of every module run while still performing per-function instrumentation. This change is made so that the logic behind AddressSanitizer can be abstracted out and be used between separate module passes for the legacy and new PM.

This is part of the second attempt of porting ASan to the new PM after D52739.

Other changes:

  • Remove the unused DominatorTree dependency

Diff Detail

Repository
rL LLVM

Event Timeline

leonardchan created this revision.Nov 9 2018, 11:21 AM
leonardchan edited the summary of this revision. (Show Details)Nov 9 2018, 11:24 AM

Making this a module pass will cost us all the nice chaining and cache locality we get from a function pass. That will make the already slow instrumentation even more expensive. Do you see a way to keep this as a function pass?

Making this a module pass will cost us all the nice chaining and cache locality we get from a function pass.
That will make the already slow instrumentation even more expensive. Do you see a way to keep this as a function pass?

I assume the main problem with this being function pass in NewPM is the need for initialization, which is too slow when being done per-function.

There are ways around this issue, perhaps the most logical one would be to implement analysis for this GlobalMetadata thing...

The problem is not speed but correctness. The initialization adds function definitions, which function passes can't do. So we need to initialize out-of-band, either with a separate module pass, or some function outside of the pass pipeline. I strongly prefer the latter, but it's not entirely clear to me how this function should be called. It's obviousl when running this through clang, but what about opt?

The problem is not speed but correctness. The initialization adds function definitions, which function passes can't do.

Hmm... can you point me where function definitions are being added in doInitialization, I just dont see it (frankly I'm not familiar with this code at all).

So we need to initialize out-of-band, either with a separate module pass, or some function outside of the pass pipeline. I strongly prefer the latter, but it's not entirely clear to me how this function should be called. It's obviousl when running this through clang, but what about opt?

I dont see why do you prefer the latter, adding function definitions in module passes seems to be quite normal, and then it will definitely solve the problem with opt.

Hmm... can you point me where function definitions are being added in doInitialization, I just dont see it (frankly I'm not familiar with this code at all).

I might have been wrong here, I assumed the Asan function pass did this,too, because all sanitizers I've looked at do it. But it's possible that only the Module variant does it.

I dont see why do you prefer the latter, adding function definitions in module passes seems to be quite normal, and then it will definitely solve the problem with opt.

I'm against a module pass because it implies inter-pass-dependencies. If you run the function pass but not the module pass, you end up with a broken program. I'd really like to avoid that.

Sorry for the delayed response. So the main problem is that AddressSantizer needs to perform some initialization involving reading stuff from global metadata and getting some target specific information. More specifically, just these 6 things which are all initialized in doInitialization:

GlobalsMD.init(M);
C = &(M.getContext());
LongSize = M.getDataLayout().getPointerSizeInBits();
IntptrTy = Type::getIntNTy(*C, LongSize);
TargetTriple = Triple(M.getTargetTriple());
Mapping = getShadowMapping(TargetTriple, LongSize, CompileKernel);

which are all accessible from the Module. (I think) this doesn't require adding any function definitions or changes to the module other than adding the declarations for functions used for the ASan runtime, and AddressSanitizerModule I think is the only ASan related pass that calls createSanitizerCtorAndInitFunctions for making these function definitions.

To me it seems that there just needs to be an equivalent doInitialization that accepts modules for function passes in the new PM, but it seems that given the way runs are performed in the new PM, there's no inherit way to access the "parent" IRUnit of a given IRUnit. Was this what you were suggesting with the function outside of the pass pipeline @philip.pfaffe ? An alternative idea that @tamur suggested was putting the global metadata and other initialized data into a lazy data structure that can be accessed by the AddressSanitizers.

Would having this as a ModulePass also make that much of an impact in instrumentation speed? From what I could determine, with AddressSanitizer as a FunctionPass, control of this is delegated to the FunctionPassManager whose 'runOnModule` method just iterates through every function in the module, and it seems that the only actions done between the FunctionPass's doInitialization and FunctionPassManager's runOnModule is counting the size of the module and initializing some analyses, but I could also be overlooking something big here.

Sorry for the delayed response. So the main problem is that AddressSantizer needs to perform some initialization involving reading stuff from global metadata and getting some target specific information. More specifically, just these 6 things which are all initialized in doInitialization:

GlobalsMD.init(M);
C = &(M.getContext());
LongSize = M.getDataLayout().getPointerSizeInBits();
IntptrTy = Type::getIntNTy(*C, LongSize);
TargetTriple = Triple(M.getTargetTriple());
Mapping = getShadowMapping(TargetTriple, LongSize, CompileKernel);

If asan doesn't modify the module for this ('this' meaning the Mapping and the GlobalsMD here), than this is an excellent candidate for an analysis.

To me it seems that there just needs to be an equivalent doInitialization that accepts modules for function passes in the new PM, but it seems that given the way runs are performed in the new PM, there's no inherit way to access the "parent" IRUnit of a given IRUnit. Was this what you were suggesting with the function outside of the pass pipeline @philip.pfaffe ? An alternative idea that @tamur suggested was putting the global metadata and other initialized data into a lazy data structure that can be accessed by the AddressSanitizers.

There can't be a doInitialization in the new pass manager. It really doesn't fit the design. There can be multiple instances of a pass, both in the same pass manager or in different managers. Who does the initialization?

The free function thing I suggested was to do the initialization bits that modify the module. Function passes currently are allowed to insert global variables, MD, and other function declarations, but no function bodies. If asan doesn't need that, great! Doing the rest of the initialization lazily works, though! A thing to keep in mind though is that multiple instances of the pass still initialize multiple times, so it should be both cheap(-ish) and reentrant (in the sense that it shouldn't add e.g. functions twice).

Would having this as a ModulePass also make that much of an impact in instrumentation speed? From what I could determine, with AddressSanitizer as a FunctionPass, control of this is delegated to the FunctionPassManager whose 'runOnModule` method just iterates through every function in the module, and it seems that the only actions done between the FunctionPass's doInitialization and FunctionPassManager's runOnModule is counting the size of the module and initializing some analyses, but I could also be overlooking something big here.

Function passes are pipelined per function. That means that for a given function you first run simplification, optimization, and then the sanitizer without touching another function. So everything is nice and local. With a module pass you loose all that locality.

Sorry for the delayed response. So the main problem is that AddressSantizer needs to perform some initialization involving reading stuff from global metadata and getting some target specific information. More specifically, just these 6 things which are all initialized in doInitialization:

GlobalsMD.init(M);
C = &(M.getContext());
LongSize = M.getDataLayout().getPointerSizeInBits();
IntptrTy = Type::getIntNTy(*C, LongSize);
TargetTriple = Triple(M.getTargetTriple());
Mapping = getShadowMapping(TargetTriple, LongSize, CompileKernel);

If asan doesn't modify the module for this ('this' meaning the Mapping and the GlobalsMD here), than this is an excellent candidate for an analysis.

When you say analysis, do you mean like the TargetLibraryInfoWrapperPass that AddressSanitizer is dependent on? Not sure what the difference between passes and analyses are, but if the purpose of TargetLibraryInfoWrapperPass is to just pass a TargetLibraryInfo to AddressSantizer, then it seems a new WrapperPass could similarly be made for this initialized data as a custom class made specifically for ASan.

I am not an expert or have a good context, please excuse my trespassing, just as a guy reading the code and trying to understand, this is my understanding:

  1. GlobalsMD.init(M); is the only expensive part of the AddressSanitizer initialization.
  2. All data in GlobalsMD is obtained from a module, in GlobalsMetadata::init(Module& M).
  3. (Maybe this is not obvious) Function objects have a reference to their module via Function::getParent() method.
  4. Therefore, every expensive data that the function pass for AddressSanitizer needs, can be stored in Module objects:
  5. GlobalsMetadata class is probably no longer needed, at worst, the GlobalVariable* --> Entry mapping can be stored within the Module class.
  6. I haven't looked whether this mapping can be initialized within the constructor of Module. If yes, great. If not, some care needs to be taken to make the mapping be initialized only once on the first demand, also with care with respect to synchronization (multiple AddressSanitizers for functions of the same module may be running in parallel -if not now, in the future-), but these should be minor complications.

With these changes, AddressSanitizer can become a function level pass.

There is also an AddressSanitizerModule class in AddressSanitizer.cpp. If I understand correctly, now it is running as a module level pass before the AddressSanitizer the function level pass. The owners probably do not want to have this as a module level pass either. There may be a better way to accomplish this, but this is the simplest solution that comes to my mind:

  • It seems that AddressSanitizerModule is run only for its side effects on the Module. Which is good, so we don't need to find a place to store the calculated data. (If I'm wrong please ignore the rest of my comment)
  • Let's call the AddressSanitizerModule business logic ASM_BL.
  • Either add a boolean instance variable to Module, or have a global Module-->bool mapping to keep track of for which modules ASM_BL code has run.
  • Each address sanitizer function pass looks whether the ASM_BL for the function's module has already run, and if not runs it as the first thing.
  • But be careful about synchronization, protect ASM_BL with a mutex or something, be sure that it runs only once.
  • Alternatively, make AddressSanitizerModule pass a separate function level pass, make it a prerequisite of the AddressSanitizer the function level pass, and still take care of synchronization etc. i.e. hide ASM_BL behind some abstraction, so that even though every function of the module will call ASM_BL, it will actually run only once.

Any time your understanding or other reviewers' suggestions contradict mine, please feel free to ignore mine.

I am not an expert or have a good context, please excuse my trespassing, just as a guy reading the code and trying to understand, this is my understanding:

  1. GlobalsMD.init(M); is the only expensive part of the AddressSanitizer initialization.
  2. All data in GlobalsMD is obtained from a module, in GlobalsMetadata::init(Module& M).
  3. (Maybe this is not obvious) Function objects have a reference to their module via Function::getParent() method.
  4. Therefore, every expensive data that the function pass for AddressSanitizer needs, can be stored in Module objects:

Up to this point it kinda makes sense.
However, this suggestion of storing supplementary passes information into the IR itself does not sound like a good design decision.
A typical solution for LLVM to store supporting information is to introduce an Analysis, so it kinda floats around the IR, but
is not being stored directly into it.

If asan doesn't modify the module for this ('this' meaning the Mapping and the GlobalsMD here), than this is an excellent candidate for an analysis.

When you say analysis, do you mean like the TargetLibraryInfoWrapperPass that AddressSanitizer is dependent on? Not sure what the difference between passes and analyses are, but if the purpose of TargetLibraryInfoWrapperPass is to just pass a TargetLibraryInfo to AddressSantizer, then it seems a new WrapperPass could similarly be made for this initialized data as a custom class made specifically for ASan.

Passes are transformations that are allowed (and actually expected) to mutate the IR.
Analyses are (well, in NewPM) not passes, they are run over IR and store their "analytics" as the analysis result into the Analysis Manager.
Passes can query Analysis Manager and get the results that are either cached or calculated on demand.

Analysis can be any information which is not an IR itself. TargetLibraryInfo, DominatorTree, GlobalsMetadata... whatever.
The only problem that you might find when using your custom analysis from the pass will be similar to the current dichotomy of module & function ASan.
You can not *run* module analysis from within a function pass. It can only be run on a module level.

Analysis can be any information which is not an IR itself. TargetLibraryInfo, DominatorTree, GlobalsMetadata... whatever.
The only problem that you might find when using your custom analysis from the pass will be similar to the current dichotomy of module & function ASan.
You can not *run* module analysis from within a function pass. It can only be run on a module level.

Hmm. So if we keep ASan as a function pass, there doesn't seem to be a clean already existing way of performing one task before pass runs, especially if a pass for one IR unit cannot depend on/require running a pass on a different IR unit type first.

It depends on what you mean by task. There is no clean way for depending on changes between IR units. I.e. you can't express the requirement that two passes have to run in a specific order. But that's okay, implementing such requirements would incure insane complexity. Fortunately, if I recall correctly Asan oesn't need that! Correct me if I'm wrong.

But depending on information about other IR units is possible, that's why we have analyses. For asan this means: leave asan a function pass, but pull things pre-computed for the module into a module level analysis.

@leonardchan Are you making any progress here?

@leonardchan Are you making any progress here?

I haven't gotten back to this yet, but will do so after looking into what may be required for also porting SafeStack and ShadowCallStack to new PM.

It depends on what you mean by task. There is no clean way for depending on changes between IR units. I.e. you can't express the requirement that two passes have to run in a specific order. But that's okay, implementing such requirements would incure insane complexity. Fortunately, if I recall correctly Asan oesn't need that! Correct me if I'm wrong.
But depending on information about other IR units is possible, that's why we have analyses. For asan this means: leave asan a function pass, but pull things pre-computed for the module into a module level analysis.

You're right in that Asan doesn't need to keep track of changes between IR units. I haven't started this yet, but the main thing I was concerned about was that, for some reason, function passes in the new PM couldn't depend on module level analyses and only function level analyses, but I think this is false since TargetLibraryAnalysis is an example of an analysis that was ported but is required by many function passes.

I haven't gotten back to this yet, but will do so after looking into what may be required for also porting SafeStack and ShadowCallStack to new PM.

If I recall correctly, it was pointed out on the SafeStack thread that, since SafeStack is part of CodeGen, porting it is not possible right now.

You're right in that Asan doesn't need to keep track of changes between IR units. I haven't started this yet, but the main thing I was concerned about was that, for some reason, function passes in the new PM couldn't depend on module level analyses and only function level analyses, but I think this is false since TargetLibraryAnalysis is an example of an analysis that was ported but is required by many function passes.

TargetLibraryAnalysis is in fact a function analysis. Function analyses cannot request Module analyses to be ran, that is correct. They can, however, use analysis results that are already available.

Posted second attempt at porting ASan using an analysis at D56470.

leonardchan abandoned this revision.Feb 19 2019, 6:46 PM

Addressed in rC353985

Herald added a project: Restricted Project. · View Herald TranscriptFeb 19 2019, 6:46 PM
Herald added a subscriber: jdoerfert. · View Herald Transcript