Adding support for sscanf in DFSAN
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Sorry for the long delay. I was on vacation.
| compiler-rt/test/dfsan/custom.cpp | ||
|---|---|---|
| 2110–2115 | What is this testing, that is not tested with the labelled arg below? | |
| 2113 | Should this also check the return value of sscanf? | |
| 2124 | Should test with a label applied to padded_format, to show that it does not propagate. | |
| 2164–2168 | dfsan_get_origin() calls here are on empty variables. Is it intended to be called on some characters in s, like s_o does? | |
Thanks for the comments and sorry for the delay.
I would have the changes by now but I came across a strange issue. When doing ASSERT_INIT_ORIGINS I don't get the expected values. For example in the parsed string only the first 2 bytes have the right origin the others are 0. For the "m" variable even the first byte origin is incorrect. I was checking the implementation but it seems I am propagating the origins along with the labels. Could you please have a quick look, perhaps I'm missing something obvious here.
| compiler-rt/test/dfsan/custom.cpp | ||
|---|---|---|
| 2110–2115 | right, I'll leave just the labelled version | |
| 2113 | good point, I'll add it | |
| 2124 | yes, I will add a label to the format | |
| 2164–2168 | ah, right, good point, I'll fix this. | |
Specifically, I mean the following example from the test case:
strcpy(buf, "hello world, 2014/8/27 12345.678123 % 1000"); char *s = buf + 6; //starts with world dfsan_origin s_o = dfsan_get_origin((long)(s[1])); dfsan_set_label(i_label, (void *)(s + 12), 1); dfsan_origin m_o = dfsan_get_origin((long)s[12]);
...
int r = sscanf(buf, "hello %s %d/%d/%d %f %% %n%d", buf_out, &y, &m, &d,
               &fval, &n, &val);
assert(r == 6);
assert(strcmp(buf_out, "world,") == 0);
ASSERT_INIT_ORIGINS(buf_out + 1, 2, s_o); // if we do 3 or more bytes, this fails
ASSERT_INIT_ORIGINS(&m, 1, m_o); // this fails even with a single byteI'm not sure what you expect to happen differently.
Origins work a bit differently to taint labels.
Each taint label is 1 byte. Each Origin ID is 4 bytes (aligned). Origin IDs are only propagated when more than 4 bytes of contiguous data is written to memory (and it has to be aligned).
e.g. string where each letter is a byte:
char* x = strdup("abcdef");
// Updates 6 bytes of corresponding taint labels to [0x01, 0x01, 0x01, 0x01, 0x01, 0x01].
// Also updates 6 bytes of corresponding origin data to [<4 byte origin id>, 0x00, 0x00].
dfsan_set_label(0x01, x, 6);
char* y = strdup("abcdefgh");
// Updates 8 bytes of corresponding taint labels to [0x02, 0x02, 0x02, 0x02, 0x02, 0x02, 0x02, 0x02].
// Also updates 8 bytes of corresponding origin data to [<4 byte origin id>, <same 4 byte origin id>].
dfsan_set_label(0x02, y, 8);If you expect to find non-zero origins, be careful that the bytes are copied together. A for loop copying one byte at a time could break the origin chain (or it might be optimized to a larger copy by the time the DFSan pass runs).
Thank your for the comments. I addressed the comments from the review and fixed test cases.
One suggestion for checking the expected origins in the test are not zero, then LGTM.
| compiler-rt/test/dfsan/custom.cpp | ||
|---|---|---|
| 2172 | If origin tracking is enabled, I suggest checking that the origins (e.g. s_o, m_o, d_o, f_o) are non-zero. | |
/code/llvm-project/compiler-rt/lib/dfsan/dfsan_custom.cpp:2628:35: warning: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned long') [-Wsign-compare]
            int size = scan_count > write_size ? write_size : scan_count;
                       ~~~~~~~~~~ ^ ~~~~~~~~~~
/code/llvm-project/compiler-rt/lib/dfsan/dfsan_custom.cpp:2663:35: warning: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned long') [-Wsign-compare]
            int size = scan_count > write_size ? write_size : scan_count;
                       ~~~~~~~~~~ ^ ~~~~~~~~~~
/code/llvm-project/compiler-rt/lib/dfsan/dfsan_custom.cpp:2680:35: warning: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned long') [-Wsign-compare]
            int size = scan_count > write_size ? write_size : scan_count;
                       ~~~~~~~~~~ ^ ~~~~~~~~~~
/code/llvm-project/compiler-rt/lib/dfsan/dfsan_custom.cpp:2718:35: warning: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned long') [-Wsign-compare]
            int size = scan_count > write_size ? write_size : scan_count;
                       ~~~~~~~~~~ ^ ~~~~~~~~~~I will fix these warnings with changes like this:
diff --git a/compiler-rt/lib/dfsan/dfsan_custom.cpp b/compiler-rt/lib/dfsan/dfsan_custom.cpp index 8fe0c6e50f16..e903a99c97cc 100644 --- a/compiler-rt/lib/dfsan/dfsan_custom.cpp +++ b/compiler-rt/lib/dfsan/dfsan_custom.cpp @@ -2624,8 +2624,8 @@ static int scan_buffer(char *str, size_t size, const char *fmt, dfsan_set_label(l, dst_ptr, write_size); else { dfsan_set_label(l, dst_ptr, write_size); - int scan_count = formatter.num_written_bytes(retval); - int size = scan_count > write_size ? write_size : scan_count; + size_t scan_count = formatter.num_written_bytes(retval); + size_t size = scan_count > write_size ? write_size : scan_count; dfsan_mem_origin_transfer(dst_ptr, formatter.str_cur(), size); } end_fmt = true;
Should this also check the return value of sscanf?