This is an archive of the discontinued LLVM Phabricator instance.

[dfsan] Record dfsan metadata in globals
ClosedPublic

Authored by gbalats on Feb 24 2021, 12:04 PM.

Details

Summary

This will allow identifying exactly how many shadow bytes were used
during compilation, for when fast8 mode is introduced.

Also, it will provide a consistent matching point for instrumentation
tests so that the exact llvm type used (i8 or i16) for the shadow can
be replaced by a pattern substitution. This is handy for tests with
multiple prefixes.

Diff Detail

Event Timeline

gbalats created this revision.Feb 24 2021, 12:04 PM
gbalats requested review of this revision.Feb 24 2021, 12:04 PM
llvm/lib/Transforms/Instrumentation/DataFlowSanitizer.cpp
1273

Are these two variables shared by dfsan runtime and each compilation unit?
Weak leakage is better, it allows all of them 'share' a global: each compilation unit inserts one if users select fast8, and the dfsan runtime reads it and decides what to do.
See __dfsan_track_origins as an example.

PrimitiveShadowTy is used for shadow data, while __dfsan_shadow_width_bits is only a flag.
Can we use IntegerType::get(*Ctx, 8)?

1283

Do we need both dfsan_shadow_width_bits and dfsan_shadow_width_bytes? __dfsan_shadow_width_bits seems sufficient.

gbalats added inline comments.Feb 24 2021, 1:20 PM
llvm/lib/Transforms/Instrumentation/DataFlowSanitizer.cpp
1273

I thought the sharing between compilation units part was satisfied by the use of getOrInsertGlobal and that internal vs external linkage was only relevant as to what compilation units can actually reference that variable (and since we don't need other compilation units to reference it at all, internal linkage was sufficient). Does this mean that this getOrInsertGlobal would actually create one definition per compilation unit with the current code?

As for the use of PrimitiveShadowTy, you're right that you could use i8. I opted for the actual shadow type to make pattern matching in tests easier (so that you could extract the type directly from the global).

1283

We don't need both. But, again, it will make testing easier as there are some patterns that could use expressions that either depend on one or the other, so having both would make the testing code simpler.

gbalats updated this revision to Diff 326192.Feb 24 2021, 1:36 PM
gbalats marked an inline comment as not done.

Fixed variable naming

llvm/lib/Transforms/Instrumentation/DataFlowSanitizer.cpp
1273

Yeah. Each compilation unit will create one because the instrumentation pass runs per compilation unit.

See the comments below. -D could help test bitwidth at different modes.
So here we can have a different type name. I feel i8 may be strange at C level since dfsan.cpp also needs to define it. Maybe i32 is better, in dfsan.cpp, that will be int.

I just noticed __dfsan_track_origins I added also uses OriginTy... This also needs to be fixed.

1283

This option -D may help.
See examples from llvm/test/FileCheck/numeric-expression.txt

If this works, we could keep only one weak global flag to indicate what runtime can do.

gbalats updated this revision to Diff 326216.Feb 24 2021, 3:18 PM

Switch to weak_odr linkage

gbalats marked an inline comment as not done.Feb 24 2021, 3:23 PM
gbalats added inline comments.
llvm/lib/Transforms/Instrumentation/DataFlowSanitizer.cpp
1273

Changed to weak ODR linkage. For the rest, see comment below.

1283

After offline discussion, switched to weak odr linkage which should remove the overhead of the two global flags to avoid specifying -D options to each RUN line.

llvm/lib/Transforms/Instrumentation/DataFlowSanitizer.cpp
1273

Please add some comments for injectMetadataGlobals about how the two variables would be used. For example, one is used by runtime, and both can help testing.

gbalats updated this revision to Diff 326225.Feb 24 2021, 3:48 PM

Add comment

gbalats marked an inline comment as done.Feb 24 2021, 3:49 PM
This revision is now accepted and ready to land.Feb 24 2021, 3:51 PM
morehouse added inline comments.Feb 25 2021, 10:20 AM
llvm/lib/Transforms/Instrumentation/DataFlowSanitizer.cpp
1278

If these globals have PrimitiveShadowTy type, will we have to declare multiple weak globals in the runtime (i.e. one that's int8 and one thats int16)?

Why not make it an int or size_t to simplify the runtime side?

gbalats added inline comments.Feb 25 2021, 11:26 AM
llvm/lib/Transforms/Instrumentation/DataFlowSanitizer.cpp
1278

I'm not yet familiar with how the runtime integration works. Is there a reason why the runtime has to mirror these using exactly the same type instead of e.g. int32?

The only reason to use PrimitiveShadowTy here, instead of i8, is to make testing easier so that by matching just one variable one can extract both the i16 and 16 parts to use later in the test:

; CHECK: @__dfsan_shadow_width_bits = weak_odr constant [[ST:i[0-9]+]] [[#SBITS:]], align {{[1-2]}}

Of course, you could use i[[SBITS]] instead of [[ST]] but it would be cumbersome. Alternatively, we could use a different global to extract the shadow type, though I'm not sure which one (it would have to be generated in all option combinations; perhaps one of the callback functions).

morehouse added inline comments.Feb 25 2021, 2:25 PM
llvm/lib/Transforms/Instrumentation/DataFlowSanitizer.cpp
1278

If the linker does not give an explicit error, I believe it is still undefined behavior to link two symbols of the same name with different types.

I would prefer using a single standard type across runtime/instrumentation and extracting the SBITS value in tests.

Also, I don't see much use for __dfsan_shadow_width_bytes since it isn't used in tests and its value can be easily obtained in the runtime from __dfsan_shadow_width_bits.

gbalats updated this revision to Diff 326548.Feb 25 2021, 4:12 PM

Use int32 instead of shadow type for dfsan metadata.

gbalats added inline comments.Feb 25 2021, 4:18 PM
llvm/lib/Transforms/Instrumentation/DataFlowSanitizer.cpp
1278

Changed to i32. Updated tests to use SBITS instead.

Regarding the usefulness of the __dfsan_shadow_width_bytes, it is in fact used in tests. Right now it's hardcoded to 2 in several places, like alignments of some alloca/load/store instructions that operate on a shadow. These would be change to #SBYTES.

morehouse accepted this revision.Feb 26 2021, 7:21 AM
This revision was landed with ongoing or failed builds.Feb 26 2021, 2:43 PM
This revision was automatically updated to reflect the committed changes.