This is an archive of the discontinued LLVM Phabricator instance.

Move Sanitizer metadata to be on-GlobalValue.
AbandonedPublic

Authored by hctim on Apr 26 2022, 6:09 PM.

Details

Summary

Currently sanitizer metadata is stored in a global LLVM metadata table,
generally "llvm.asan.globals". This contains certain attributes relevant
to the sanitizers that need to be carried around.

A while ago, when working on MTE globals, I discovered that some of the
GlobalOpt passes would delete a global variable, and replace it with
a new global variable. These optimisation passes clean up all the
general references, but don't clean up the sanitizer-specific metadata
because that wasn't associated well with the original global.

I could have just fixed that, but decided to actually fix the
outstanding N-year-old comment of "this should really be part of the
GlobalValue itself". So, remove the old sanitizer-specific global
descriptions, and move the information about sanitization to be on the
GlobalValue.

Because we don't want to carry around too much information on the
GlobalValue, I also made it so that we don't carry around the line
number and column information for each global variable in the binary in
an ASan-specific way. Instead, this patch (along with some dependencies
that add support in llvm-symbolizer and friends) changes ASan to look
for debuginfo to get this information. This mean that without debuginfo,
global-buffer-overflows and ODR reports now say "variable 'foo' defined
in 'my_source_file.cpp'" instead of "variable 'foo' defined in
'my_source_file.cpp:123:2'" (note that we still have the source line
from the file descriptions). I think this is a reasonable change.
This also provides binary savings of 0.275% when building clang under
ASan (~2.08MiB).

Diff Detail

Event Timeline

hctim created this revision.Apr 26 2022, 6:09 PM
Herald added a project: Restricted Project. · View Herald Transcript
hctim requested review of this revision.Apr 26 2022, 6:09 PM
Herald added projects: Restricted Project, Restricted Project, Restricted Project. · View Herald TranscriptApr 26 2022, 6:09 PM
Herald added subscribers: llvm-commits, Restricted Project, cfe-commits, MaskRay. · View Herald Transcript
hctim added a comment.Apr 26 2022, 6:11 PM

Sorry for the large patch. Honestly, looks bigger than it is due to 1-line changes in many files + many-line changes in stuff that was pretty much moved around. I didn't see a great way to split this up except for sharding out the GlobalValue changes + IR + bitcode read/write, but didn't think that was substantial enough to warrant it.

hctim updated this revision to Diff 425379.Apr 26 2022, 6:12 PM

Wrong base, updating.

filcab added a subscriber: filcab.Apr 27 2022, 8:40 AM

Hi @hctim, thanks for the patch.
I have one question, though. Do you really need to remove the information you removed?
Some people might be testing ASan binaries without access to debug info (because it's been stripped or it's not been loaded and is not available for the sanitizers to get symbols from, etc), and having the full global information would be useful for those reports.

Can this be done by having a flag to make clang not emit the source information + the sanitizer patch to get the line and column?
That way we could keep the current behaviour of emitting more info, and you'd still get your savings + use of debuginfo.

Thank you,
Filipe

hctim retitled this revision from [symbolizer] Parse DW_TAG_variable DIs to show line info for globals to Move Sanitizer metadata to be on-GlobalValue..Apr 27 2022, 2:58 PM
hctim edited the summary of this revision. (Show Details)
kstoimenov removed a subscriber: kstoimenov.

One meta-observation: can you split this into smaller patches?

Hi @hctim, thanks for the patch.
I have one question, though. Do you really need to remove the information you removed?
Some people might be testing ASan binaries without access to debug info (because it's been stripped or it's not been loaded and is not available for the sanitizers to get symbols from, etc), and having the full global information would be useful for those reports.

Can this be done by having a flag to make clang not emit the source information + the sanitizer patch to get the line and column?
That way we could keep the current behaviour of emitting more info, and you'd still get your savings + use of debuginfo.

Thank you,
Filipe

It's going to cost us supporting two implementations of the same thing? It would be nice to pick a single approach if existing behavior is not critical.

clang/lib/CodeGen/CMakeLists.txt
83

why do you need this rename?

depending on your response you should either:

  1. revert
  2. rename as NFC in a separate patch and rebase the rest
clang/lib/CodeGen/SanitizerMetadataFactory.h
6

can you please either revert

vitalybuka added inline comments.May 2 2022, 3:39 PM
clang/lib/CodeGen/SanitizerMetadataFactory.h
6

ignore this one

llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
1352

all serialization stuff deserve own patch, then rebase the rest on that

llvm/lib/IR/AsmWriter.cpp
3550

it extends IR language, corresponding documentation needs to be updated

kstoimenov added inline comments.May 2 2022, 3:45 PM
clang/lib/CodeGen/SanitizerMetadataFactory.h
33

Not sure if this class follows the 'factory' design pattern, which typically it would have some sort of 'produce' or 'create' method. Maybe SanitizerMatadataWriter or SanitizerMatadataCreator?

46–48

This could be private I think.

compiler-rt/lib/asan/asan_globals.cpp
90

Outside of the scope of this change. It looks like DataInfo's constructor is calling memset on it's fields. I would have probably designed this differently because now it looks like potential uninitialized when we check for 'info.line != 0'.

llvm/include/llvm/AsmParser/LLToken.h
205

Why do we need to add these to lex? I am surprised that didn't have the need so far.

llvm/include/llvm/IR/GlobalValue.h
312

I suspect that there might be another place where that logic exists. I am worried that we are duplicating it. Do you might looking for it in the code base?

I've always viewed the current implementation as a hack, and believed this data should live in debug info.

It can be convenient to get symbolized reports for global overflow bug in a stripped binary, but those bugs are quite rare, and "symbolization" only affects the global itself, not the stack trace of the bad memory access. It does not seem worth it to support two code paths just for this.

hctim planned changes to this revision.May 23 2022, 3:42 PM

Pulled out the IR-specific changes to D126100.

hctim abandoned this revision.Jun 16 2022, 3:08 PM

Integrated slowly and surely as part of the stack leading up to D127911.

clang/lib/CodeGen/ItaniumCXXABI.cpp