This is an archive of the discontinued LLVM Phabricator instance.

[NewPM] Second attempt at porting ASan
ClosedPublic

Authored by leonardchan on Jan 8 2019, 6:59 PM.

Details

Summary

This is the second attempt to port ASan to new PM after D52739. This takes the initialization requried by ASan from the Module by moving it into a separate class with it's own analysis that the new PM ASan can use.

Changes:

  • Split AddressSanitizer into 2 passes: 1 for the instrumentation on the function, and 1 for the pass itself which creates an instance of the first during it's run. The same is done for AddressSanitizerModule.
  • Add new PM AddressSanitizer and AddressSanitizerModule.
  • Add legacy and new PM analyses for reading data needed to initialize ASan with.
  • Removed DominatorTree dependency from ASan since it was unused.
  • Move GlobalsMetadata and ShadowMapping out of anonymous namespace since the new PM analysis holds these 2 classes and will need to expose them.

Diff Detail

Repository
rL LLVM

Event Timeline

leonardchan created this revision.Jan 8 2019, 6:59 PM
chandlerc requested changes to this revision.Jan 8 2019, 11:34 PM

I think a key thing that this is currently missing (or I'm missing it when looking at hte patch) the core shift that we made working through how to do sanitizer stuff with MSan.

Specifically, I suspect that we don't need any analysis layer, and the module and function layer can operate entirely independently. This would require roughly the following changes to the current ASan infrastructure:

  1. Change how the per-module initialization of the AddressSanitizerFunction pass works to resemble hwat was done for MSan and TSan, factoring it into a helper class that is used by a legacy function pass an a new function pass. This will require the get-or-create pattern for all the things touched, and registering things only when creation is necessary.
  1. Make a few things in the current function pass lazy such as buildig the inline assembly needed, etc.
  1. Directly port the AddressSanitizerModule to new PM.
  1. (maybe) decouple the ctor insertion in the module pass from the ctor insertion in the function pass. but maybe no need, just the first one to need it does it.

I don't think any analysis should be required at this point?

Let me know if this isn't making sense and I can try to explain more.

llvm/include/llvm/Transforms/Instrumentation/AddressSanitizerPass.h
108–123 ↗(On Diff #180780)

Why is this analysis needed? It seems like it could just be directly computed as part of the pass?

Is there some logic shared between this and some other pass? If so, documenting what it is and how it works seems really important.

125–126 ↗(On Diff #180780)

Similar to the MSan and TSan cases, this should also explain *how* this will modify the IR and whether there are any oddities or constraints on it.

140–141 ↗(On Diff #180780)

We should have some documentation for what this does?

This revision now requires changes to proceed.Jan 8 2019, 11:34 PM
leonardchan marked an inline comment as done.Jan 9 2019, 11:06 AM

I think a key thing that this is currently missing (or I'm missing it when looking at hte patch) the core shift that we made working through how to do sanitizer stuff with MSan.

Specifically, I suspect that we don't need any analysis layer, and the module and function layer can operate entirely independently. This would require roughly the following changes to the current ASan infrastructure:

Replied to a comment below with reasoning for the analysis, but it could also be that there's another way to perform initialization without the analysis.

  1. Change how the per-module initialization of the AddressSanitizerFunction pass works to resemble hwat was done for MSan and TSan, factoring it into a helper class that is used by a legacy function pass an a new function pass. This will require the get-or-create pattern for all the things touched, and registering things only when creation is necessary.

I believe I did this in this patch. AddressSanitizer was separated into the helper class AddressSanitizer and the pass AddressSanitizerLegacyPass. The helper class is also used by the new PM function pass AddressSanitizerPass. Similar actions were done with AddressSanitizerModule (which I believe is independent from AddressSanitizer) and created AddressSanitizerModuleLegacyPass and AddressSanitizerModulePass. I have not yet used getOrCreateInitFunction, but I think it only needs to be used with AddressSanitizerModule since AddressSanitizer does not seem to require changing llvm.global_ctors.

  1. Make a few things in the current function pass lazy such as buildig the inline assembly needed, etc.

Would this be more appropriate in a separate patch since this involves changing some ASan internals? The primary focus of this patch was to be able to use ASan with the new PM.

  1. Directly port the AddressSanitizerModule to new PM.

Already made a new PM equivalent to this (AddressSanitizerModulePass), or did you mean that the legacy AddressSanitizerModule should be deleted?

  1. (maybe) decouple the ctor insertion in the module pass from the ctor insertion in the function pass. but maybe no need, just the first one to need it does it.

This could be done with the module pass, but I don't think the AddressSanitizer function pass does any constructor insertion.

I don't think any analysis should be required at this point?

Let me know if this isn't making sense and I can try to explain more.

I think I understand where you're coming from, but do correct me if I'm wrong. Essentially follow the same pattern as MSan and TSan, which I think is just creating a sanitizer specific initialization function that gets added to llvm.global_ctors. I could be wrong on this, but I think this would only apply to AddressSanitizerModule and not AddressSanitizer specifically, where the main problem encountered last time with AddressSanitizer was that initializing GlobalsMD on each function pass was too expensive. Perhaps the initialization could also be moved to a function that could be added to llvm.global_ctors, although I would need to dig deeper into ASan internals and how it depends on GlobalsMetadata to see how this could work.

llvm/include/llvm/Transforms/Instrumentation/AddressSanitizerPass.h
108–123 ↗(On Diff #180780)

The main reason for the analysis is to just initialize some variables that were initialized in AddressSanitizer's doInitialization, namely:

GlobalsMetadata GlobalsMD;
LLVMContext *C;
int LongSize;
Type *IntptrTy;
Triple TargetTriple;
ShadowMapping Mapping;

These members only need to be initialized once per module instead of per function and can be retrieved from the module. We initially did what was done with MSan and TSan by moving this initialization to the sanitizer constructor (in D52739), but this initialization turned out to be expensive and timely since we would be performing this initialization every time we passed over a function. This resulted in a timeout for an internal ASan build. These parameters can be technically computed during construction of each sanitizer but would be expensive.

From what we could tell, it was initializing GlobalsMD which seemed to be the most expensive part, and neither MSan nor TSan require initializing something like GlobalsMetadata. A bunch of alternative approaches were discussed in D54337 and a recommended one was to create a class that just contained these required members and initializing them in an analysis that the function pass can depend on. This should work since the analysis wouldn't be changing any IR, the passes can depend on it and just get the variables it needs from it, and it can be used by both the independent AddressSanitizer and AddressSanitizerModule passes.

I ran a benchmark using asan on a synthetic testcase made up of 10k functions and 10k globals, and it's clearly visible that GlobalsMD causes the performance problem. This is because during initialization it looks at all the globals registered in llvm.asan.globals, which is used to mark the globals coming from the frontend which are supposed to be instrumented. The function pass, however, doesn't use most of this, and accesses GlobalsMD only within AddressSanitizer::GlobalIsLinkerInitialized to check whether a global is dynamically initialized. This means we can fix this in two ways:

  1. Change how the frontend marks globals, e.g. by attaching metadata directly to them instead of making the llvm.asan.globals list. This is the cleanest solution but it's quite invasive, and requires involving the sanitizer people to get them on board.
  2. Make a module-level analysis that unpacks the global list.

You've done a lot of #2 already, and so it seems to be the easiest path forward. However, your solution needs a couple improvements:

  • Your analysis does way too much. It should be a metadata analysis only.
  • You need to handle invalidation, in the sense that your analysis is never invalidated.
  • FIXME all the things, to make clear it should be replaced with direct metadata as in #1
  • Use a hook to insert a require wrapper into the last module pass manager EP before the function pass manager containing asan is ran.

I ran a benchmark using asan on a synthetic testcase made up of 10k functions and 10k globals, and it's clearly visible that GlobalsMD causes the performance problem. This is because during initialization it looks at all the globals registered in llvm.asan.globals, which is used to mark the globals coming from the frontend which are supposed to be instrumented. The function pass, however, doesn't use most of this, and accesses GlobalsMD only within AddressSanitizer::GlobalIsLinkerInitialized to check whether a global is dynamically initialized. This means we can fix this in two ways:

  1. Change how the frontend marks globals, e.g. by attaching metadata directly to them instead of making the llvm.asan.globals list. This is the cleanest solution but it's quite invasive, and requires involving the sanitizer people to get them on board.
  2. Make a module-level analysis that unpacks the global list.

You've done a lot of #2 already, and so it seems to be the easiest path forward. However, your solution needs a couple improvements:

  • Your analysis does way too much. It should be a metadata analysis only.

Removed AddressSanitizerParams so that only GlobalsMetadata is passed around. The analysis is now called GlobalsMetadataAnalysis and only initialized and returns a GlobalsMetadata instance.

  • You need to handle invalidation, in the sense that your analysis is never invalidated.

Not sure if this was the correct way to resolve this, bit I added invalidate() methods to the wrapper class for GlobalsMetadata.

  • FIXME all the things, to make clear it should be replaced with direct metadata as in #1

Added to places where we access GlobalsMD.

  • Use a hook to insert a require wrapper into the last module pass manager EP before the function pass manager containing asan is ran.

Added the same code for enabling asan with new PM in D52814, although I thought this could be added in a separate patch.

Nice, we're getting somewhere! A lot of comments inline. Most importantly, your analysis is a function analysis, which it mustn't be, otherwise there's no gain from it at all.

In addition to basic.ll, you should also test the code paths using GlobalsMD now.

llvm/include/llvm/Transforms/Instrumentation/AddressSanitizerPass.h
14 ↗(On Diff #183033)

The filename should be just AddressSanitizer.h, like for the other sanitizers.

49 ↗(On Diff #183033)

This isn't needed.

54 ↗(On Diff #183033)

You can drop this and init on construction.

69 ↗(On Diff #183033)

No need for an extra class. Just make GlobalsMetadata the result of the analysis.

90 ↗(On Diff #183033)

Name it ASanGlobalsMetadataAnalysis to make more clear what it does. Since it is an analysis now that people can use, this needs comments!

96 ↗(On Diff #183033)

This can't be a function analysis, or it would still run for every function.

102 ↗(On Diff #183033)

Make GlobalsMetadata the direct result of the analysis.

llvm/lib/Transforms/Instrumentation/AddressSanitizer.cpp
534

The convention is to name this ASanGlobalsMetadataWrapperPass.

562

explicit is superfluous now.

620

It's called instrumentFunction in the other sanitizers.

714

I'd prefer ModuleAddressSanitizer, because it's not a module, it's a sanitizer.

743

Should be named instrumentModule.

1147

When GlobalsMD stops being a function analysis, this won't work anymore. It might then happen that the result is not available, so you need to handle that case. IMHO that should be a hard error, but alternatively asan can either not run or compute the MD itself. In any case it should not be silent!

1175

Use the same name as for the new PM

1176

This description is wrong.

1202–1203

Needs a name update. At the same time, it's declaration should go into AddressSanitizer.h

One thing I missed.

clang/lib/CodeGen/BackendUtil.cpp
1104

This is wrong. Check out how this is done for the other sanitizers above! On top of what's happening there, you'll also have to add a requires wrapper for the GlobalsMD analysis at the last module EP before asan is scheduled.

leonardchan marked an inline comment as done.Jan 23 2019, 2:52 PM
leonardchan added inline comments.
llvm/lib/Transforms/Instrumentation/AddressSanitizer.cpp
1147

One question before continuing: so how can this function pass use the results of this module analysis if it's not a function pass? When turning on the debug pass manager with opt, the analysis does not actually get run. Wrapping it in a require wrapper with "-passed='require<asan-globals-md>'" allows it to run on the module pass manager, but still crashes when the function pass manager runs.

The crash is that "This analysis pass was not registered prior to being queried" (the GlobalsMetadataAnalysis).

philip.pfaffe added inline comments.Jan 25 2019, 12:48 AM
llvm/lib/Transforms/Instrumentation/AddressSanitizer.cpp
1147

You need to get it through the ModuleAnalysisManager proxy.

leonardchan marked 25 inline comments as done.
leonardchan added inline comments.
clang/lib/CodeGen/BackendUtil.cpp
1104

Done. Although I think placing them here will require compiling with -O1 in order to enter this block.

llvm/include/llvm/Transforms/Instrumentation/AddressSanitizerPass.h
69 ↗(On Diff #183033)

Done. But if we don't have a wrapper that holds a reference to GlobalsMetadata, wouldn't that mean copying it's internal map every time we return it as a result from the analysis?

Although I'm not sure how big this map can actually get, so it may not be worth caring about. I also couldn't find examples of analyses that return references either.

llvm/lib/Transforms/Instrumentation/AddressSanitizer.cpp
714

Done. Also renamed other functions + classes with AddressSanitizerModule to ModuleAddressSanitizer.

1147

Ah. I did not know about this proxy. This is very useful. Although this requires adding require<analysis> to the tests.

Also chose to make it a hard error if the analysis result was not found beforehand, but if there's a better alternative I don't mind changing it.

1202–1203

Done. Also moved the one for the function address sanitizer.

This is starting to look great! Minor nits inline.

Don't worry about copying the Result! Returning usually doesn't copy, but that's not really the point. Analyses are only ran if their result is missing or invalid, and the AnalysisManager caches analysis results! That's the absolute beauty of it!

llvm/lib/Transforms/Instrumentation/AddressSanitizer.cpp
529–531

No need to pull this out!

As a followup, I want to add options handling just like we did for loop unrolling. Then this will get unified in a much nicer way!

1140

This can be made a bit simpler. No need for an extra GlobalsMD variable, and you don't need to call F.getParent() twice.

leonardchan marked 3 inline comments as done.
leonardchan added inline comments.
llvm/lib/Transforms/Instrumentation/AddressSanitizer.cpp
529–531

Done. I assume also rL350808 is the commit you are referring to? Should that be a patch separate from this or added on top of this also?

philip.pfaffe accepted this revision.Feb 2 2019, 3:31 PM

Great! Just three more nits. Feel free to land after adressing them!

llvm/include/llvm/Transforms/Instrumentation/AddressSanitizer.h
75

GlobalsMetadata

llvm/lib/Transforms/Instrumentation/AddressSanitizer.cpp
529–531

Correct. I just submitted D57640 against msan to show how this might look.

1202–1203

The full name should be createModuleAddressSanitizerLegacyPassPass

llvm/test/Instrumentation/AddressSanitizer/basic.ll
4

Could you drop a comment here why the require<> is required?

Herald added a project: Restricted Project. · View Herald TranscriptFeb 2 2019, 3:31 PM
fedor.sergeev accepted this revision.Feb 4 2019, 12:44 AM

LGTM bar the small nit below.

clang/test/CodeGen/asan-new-pm.ll
23–26

If I get it right, the primary goal of this test is to check that address sanitizer actually kicks in for the new pass manager with -fexperimental-new-pass-manager -fsanitize=address.

Then the only check that needs to be done here is the presence of __asan instrumentation function calls and thats all.
No need to check exact sequence of instructions in the function, which might be subject to change depending on how -O1 is packed with optimizations.

And then perhaps you can even skip -O1.

leonardchan marked 6 inline comments as done.

Updated. Before submitting, @chandlerc would you mind taking one last look at this to make sure everything is in order? I wouldn't want to accidentally cause another ASan related timeout.

clang/test/CodeGen/asan-new-pm.ll
23–26

Yup, this is just for checking if the sanitizer works with those arguments. Based off where the sanitizer passes are added though, it seems that the -O1 is necessary since the sanitizers cannot get added if there is no optimization. On top of this, I think only a few __asan functions are declared since I think having -O1 removes unused functions.

Nevertheless, added the checks to make sure the appropriate __asan functions get declared.

philip.pfaffe requested changes to this revision.Feb 4 2019, 12:55 PM

Sorry, there's something I missed, comment inline.

It also sounds like we should enable inserting asan at O0, and as a followup probably do the same for msan and tsan as well.

clang/lib/CodeGen/BackendUtil.cpp
1064

This is wrong! Sorry I overlooked that. This pass needs to go into the last Module EP before the sanitizer is inserted.

You can't add it to MPM, it's being overwritten below.

This revision now requires changes to proceed.Feb 4 2019, 12:55 PM
leonardchan marked 2 inline comments as done.
leonardchan added inline comments.
clang/lib/CodeGen/BackendUtil.cpp
1064

I hope this was inserted correctly this time. By the last Module EP, do you mean the pipeline EP callbacks run over the ModulePassManager?

Also added RUN tests to make sure this works with LTO and ThinLTO.

*ping* @philip.pfaffe Any last comments on this?

It's still missing the O0 part. I forgot adding it for msan and tsan, but we should do asan correctly from the start.

It's still missing the O0 part. I forgot adding it for msan and tsan, but we should do asan correctly from the start.

Done. Although it isn't the exact same as what's in D57793.

philip.pfaffe accepted this revision.Feb 13 2019, 12:30 PM

Tiny final nit, otherwise looks great!

clang/lib/CodeGen/BackendUtil.cpp
921

Could this be a free function?

leonardchan marked 2 inline comments as done.
leonardchan added inline comments.
clang/lib/CodeGen/BackendUtil.cpp
921

Yes, although the sanitizer depends on a few things provided by the EmitAssemblyHelper, so those will have to be passed as arguments.

This revision was not accepted when it landed; it landed in state Needs Review.Feb 13 2019, 2:22 PM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptFeb 13 2019, 2:22 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript