This is a child diff of D92261. It extended TLS arg/ret to work with aggregate types. For a function t foo(t1 a1, t2 a2, ... tn an) Its arguments shadow are saved in TLS args like a1_s, a2_s, ..., an_s TLS ret simply includes r_s. By calculating the type size of each shadow value, we can get their offset. This is similar to what MSan does. See __msan_retval_tls and __msan_param_tls from llvm/lib/Transforms/Instrumentation/MemorySanitizer.cpp. Note that this change does not add test cases for overflowed TLS arg/ret because this is hard to test w/o supporting aggregate shdow types. We will be adding them after supporting that.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Unit Tests
Time | Test | |
---|---|---|
2,250 ms | x64 debian > libarcher.races::lock-unrelated.c |
Event Timeline
llvm/lib/Transforms/Instrumentation/DataFlowSanitizer.cpp | ||
---|---|---|
1079 | This is a duplicate of the declaration comment. Let's remove this one. (Same for functions below). | |
1101 | Nit: omit curly braces for one-line bodies. (Same elsewhere) | |
1109 | Nit: This comment can fit on the same line as the break. (Same below) | |
1122 | Nit: Base is a confusing name since it sounds like the base pointer to the TLS, not the offset. Let's rename it something else, maybe ArgShadowPtr? | |
llvm/test/Instrumentation/DataFlowSanitizer/arith.ll | ||
7–10 | Should we verify that align is 8? | |
8 | IIUC, this adds 8 to __dfsan_arg_tls to get b's shadow. This is because we have 8-byte alignment, right? Why do we need 8 byte alignment? | |
12 | The ST and VT names are a little confusing to me. What do they mean? | |
llvm/test/Instrumentation/DataFlowSanitizer/select.ll | ||
9 | Should we verify offsets in the TLS for each argument? |
llvm/test/Instrumentation/DataFlowSanitizer/arith.ll | ||
---|---|---|
8 | This flows MSan's setting. https://github.com/llvm/llvm-project/commit/d2bd319adc7cf19d6e47d3caf3b6f508f299ea57 introduced 8-byte alignment for MSan. It sounds like sometimes the origin arg's alignment can be too large. This makes arg_tls run out of space. | |
12 | renamed ST to SHADOWTYPE, and VT to ARGTLSTYPE |
Looks like load.ll and phi.ll need to be updated.
Also, what is the impact on code size with this change?
updated load.ll and phi.ll.
Evaluated code size.
- for large binaries with original size 2G, the additional overhead is within 0.2%.
- We also examined a small case.
Loading TLS arguments and storing TLS ret inside a function have the same overhead
between the two approaches.
At callsite, if all parameters are variables, their overheads are also the same.
When any parameter is a constant, the i16-shadow shadow can be of less overhead.
At a callsite this diff stores shadows for the 6 parameters like this:
mov 0x0(%rip),%rax
xor %ecx,%ecx
mov %cx,%fs:(%rax)
mov %cx,%fs:0x8(%rax)
mov %cx,%fs:0x10(%rax)
mov %cx,%fs:0x18(%rax)
mov %cx,%fs:0x20(%rax)
mov %cx,%fs:0x28(%rax)
The i16-shadow stores shadows for the 6 parameters like this:
mov 0x0(%rip),%rax
movq $0x0,%fs:(%rax)
movl $0x0,%fs:0x8(%rax)
Because of the 8-byte alignment codegen cannot zero-out consecutive memory addresses
with fewer instructions. If this turns out to be a common case, we may want to revisit this.
Why do we need the 8 byte alignment at all? It seems a little silly to have gaps in the TLS shadow.
Currently, when there's 6 params we get the following TLS pattern:
xxxxxx11xxxxxx22xxxxxx33xxxxxx44xxxxxx55xxxxxx66
where each x is an unused byte.
Hm, I checked the diff that introduced 8-byte alignment for MSan again.
It seems that this is specific to MSan because MSan's shadow type is the same to its original type: m128i's shadow type is also m128i. If m128i is set with a large alignment,
so does its shadow type. However DFSan's approach always uses i16 for all primitive types: if m128i exists, its shadow type is still i16.
So it is not necessary to follow MSan's. I will update.
switched to 2-byte alignment to be consistent with 16bit shadow values.
With this alignment, its binary code about TLS is the same as the code before this diff.
For large binaries with original size > 2G, the additional overhead is -0.1%. So its
overall code size overhead is slightly smaller than the current codegen.
clang-tidy: warning: invalid case style for variable 'kShadowTLSAlignment' [readability-identifier-naming]
not useful