This is an archive of the discontinued LLVM Phabricator instance.

[DFSAN] Add support for sscanf
ClosedPublic

Authored by tkuchta on Jun 26 2023, 8:06 AM.

Details

Summary

Adding support for sscanf in DFSAN

Diff Detail

Event Timeline

tkuchta created this revision.Jun 26 2023, 8:06 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 26 2023, 8:06 AM
Herald added a subscriber: Enna1. · View Herald Transcript
tkuchta requested review of this revision.Jun 26 2023, 8:06 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 26 2023, 8:06 AM
Herald added a subscriber: Restricted Project. · View Herald Transcript

Sorry for the long delay. I was on vacation.

compiler-rt/test/dfsan/custom.cpp
2110–2115

What is this testing, that is not tested with the labelled arg below?

2113

Should this also check the return value of sscanf?

2124

Should test with a label applied to padded_format, to show that it does not propagate.

2164–2168

dfsan_get_origin() calls here are on empty variables.

Is it intended to be called on some characters in s, like s_o does?

tkuchta marked an inline comment as done.Aug 11 2023, 9:49 AM

Thanks for the comments and sorry for the delay.

I would have the changes by now but I came across a strange issue. When doing ASSERT_INIT_ORIGINS I don't get the expected values. For example in the parsed string only the first 2 bytes have the right origin the others are 0. For the "m" variable even the first byte origin is incorrect. I was checking the implementation but it seems I am propagating the origins along with the labels. Could you please have a quick look, perhaps I'm missing something obvious here.

compiler-rt/test/dfsan/custom.cpp
2110–2115

right, I'll leave just the labelled version

2113

good point, I'll add it

2124

yes, I will add a label to the format

2164–2168

ah, right, good point, I'll fix this.

tkuchta marked an inline comment as done.Aug 11 2023, 12:05 PM

Specifically, I mean the following example from the test case:

strcpy(buf, "hello world, 2014/8/27 12345.678123 % 1000");
char *s = buf + 6; //starts with world

dfsan_origin s_o = dfsan_get_origin((long)(s[1]));
dfsan_set_label(i_label, (void *)(s + 12), 1);
dfsan_origin m_o = dfsan_get_origin((long)s[12]);

...

int r = sscanf(buf, "hello %s %d/%d/%d %f %% %n%d", buf_out, &y, &m, &d,
               &fval, &n, &val);
assert(r == 6);

assert(strcmp(buf_out, "world,") == 0);
ASSERT_INIT_ORIGINS(buf_out + 1, 2, s_o); // if we do 3 or more bytes, this fails
ASSERT_INIT_ORIGINS(&m, 1, m_o); // this fails even with a single byte

I'm not sure what you expect to happen differently.

Origins work a bit differently to taint labels.

Each taint label is 1 byte. Each Origin ID is 4 bytes (aligned). Origin IDs are only propagated when more than 4 bytes of contiguous data is written to memory (and it has to be aligned).

e.g. string where each letter is a byte:

char* x = strdup("abcdef");

// Updates 6 bytes of corresponding taint labels to [0x01, 0x01, 0x01, 0x01, 0x01, 0x01].
// Also updates 6 bytes of corresponding origin data to [<4 byte origin id>, 0x00, 0x00].
dfsan_set_label(0x01, x, 6);

char* y = strdup("abcdefgh");

// Updates 8 bytes of corresponding taint labels to [0x02, 0x02, 0x02, 0x02, 0x02, 0x02, 0x02, 0x02].
// Also updates 8 bytes of corresponding origin data to [<4 byte origin id>, <same 4 byte origin id>].
dfsan_set_label(0x02, y, 8);

If you expect to find non-zero origins, be careful that the bytes are copied together. A for loop copying one byte at a time could break the origin chain (or it might be optimized to a larger copy by the time the DFSan pass runs).

tkuchta updated this revision to Diff 551233.Aug 17 2023, 12:59 PM

Thank your for the comments. I addressed the comments from the review and fixed test cases.

browneee accepted this revision.Aug 21 2023, 9:50 AM

One suggestion for checking the expected origins in the test are not zero, then LGTM.

compiler-rt/test/dfsan/custom.cpp
2172

If origin tracking is enabled, I suggest checking that the origins (e.g. s_o, m_o, d_o, f_o) are non-zero.

This revision is now accepted and ready to land.Aug 21 2023, 9:50 AM
tkuchta updated this revision to Diff 552303.Aug 22 2023, 3:46 AM

Thank you, I've added the asserts.

Please feel free to land the patch in case it's LGTM. Thank you for the reviews!

/code/llvm-project/compiler-rt/lib/dfsan/dfsan_custom.cpp:2628:35: warning: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned long') [-Wsign-compare]
            int size = scan_count > write_size ? write_size : scan_count;
                       ~~~~~~~~~~ ^ ~~~~~~~~~~
/code/llvm-project/compiler-rt/lib/dfsan/dfsan_custom.cpp:2663:35: warning: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned long') [-Wsign-compare]
            int size = scan_count > write_size ? write_size : scan_count;
                       ~~~~~~~~~~ ^ ~~~~~~~~~~
/code/llvm-project/compiler-rt/lib/dfsan/dfsan_custom.cpp:2680:35: warning: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned long') [-Wsign-compare]
            int size = scan_count > write_size ? write_size : scan_count;
                       ~~~~~~~~~~ ^ ~~~~~~~~~~
/code/llvm-project/compiler-rt/lib/dfsan/dfsan_custom.cpp:2718:35: warning: comparison of integers of different signs: 'int' and 'size_t' (aka 'unsigned long') [-Wsign-compare]
            int size = scan_count > write_size ? write_size : scan_count;
                       ~~~~~~~~~~ ^ ~~~~~~~~~~

I will fix these warnings with changes like this:

diff --git a/compiler-rt/lib/dfsan/dfsan_custom.cpp b/compiler-rt/lib/dfsan/dfsan_custom.cpp
index 8fe0c6e50f16..e903a99c97cc 100644
--- a/compiler-rt/lib/dfsan/dfsan_custom.cpp
+++ b/compiler-rt/lib/dfsan/dfsan_custom.cpp
@@ -2624,8 +2624,8 @@ static int scan_buffer(char *str, size_t size, const char *fmt,
             dfsan_set_label(l, dst_ptr, write_size);
           else {
             dfsan_set_label(l, dst_ptr, write_size);
-            int scan_count = formatter.num_written_bytes(retval);
-            int size = scan_count > write_size ? write_size : scan_count;
+            size_t scan_count = formatter.num_written_bytes(retval);
+            size_t size = scan_count > write_size ? write_size : scan_count;
             dfsan_mem_origin_transfer(dst_ptr, formatter.str_cur(), size);
           }
           end_fmt = true;
This revision was automatically updated to reflect the committed changes.