This is an archive of the discontinued LLVM Phabricator instance.

[lldb/DataFormatter] Check for overflow when finding NSDate epoch

Authored by vsk on May 18 2020, 12:13 PM.



Fixes UBSan-reported issues where the date value inside of an
uninitialized NSDate overflows the 64-bit epoch.


Diff Detail

Event Timeline

vsk created this revision.May 18 2020, 12:13 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 18 2020, 12:13 PM
Herald added a subscriber: mgorny. · View Herald Transcript
davide added a subscriber: davide.May 18 2020, 12:35 PM

Looking, as I touched NSDate last year.

davide accepted this revision.May 18 2020, 12:38 PM
davide added inline comments.
1–8 ↗(On Diff #264695)

Neat way to test.


I think you can drop this comment [you just moved it, but it feels irrelevant at this point because there's only one vendor].

This revision is now accepted and ready to land.May 18 2020, 12:38 PM

Davide already covered what I was going to say :-)

vsk marked 2 inline comments as done.May 18 2020, 1:11 PM



I'll fix this before committing.

This revision was automatically updated to reflect the committed changes.
vsk marked an inline comment as done.
labath added a subscriber: labath.May 19 2020, 5:54 AM

While I think the idea in this patch is good, the actual implementation seems problematic to me, on several levels.

The name Mock.h is overly generic, and inappropriate. According to wikipedia, mock (object)s are "simulated objects that mimic the behavior of real objects in controlled ways, most often as part of a software testing initiative". This file misses that mark completely, because it is not simulating anything -- it *is* the real thing, production code. It's true that the main reason it is even declared in a header is testing, but that does not make it a "mock". A mock "nsdate formatter" would be if we had the ability to, in a test, replace the implementation of the "convert an nsdate value to a string". Such a thing might potentially be useful as a way to test some higher level data formatter functionality, but it is definitely not what is happening here.

The second issue is that the placement of the file in lldb/DataFormatter breaks encapsulation. The generic data formatter code should have no knowledge of a particular language (runtime) implementation details. That might be excused if this particular functionality was useful for more than one language, but I don't see any evidence that this is the case. What's worse, by putting the declaration into lldb/DataFormatter, but having the implementation in Language/ObjC you're making it very easy to unknowingly depend on a plugin. Last, but not least, it is against the llvm library layering rules. (The paths there are specific to the llvm subtree, but it's clear the intent is for the header/interface and the implementation to live together.)

TL;DR: The fix is quite simple: move lldb/DataFormatter/Mock.h to Plugins/Language/ObjC and rename it to something less misleading (Utilities.h ?) And move the test to unittests/Language/ObjC.

27 ↗(On Diff #264708)

EXPECT_EQ(formatDateValue(...), std::string(...)) would be better. That way we'll get a test failure if the function returns None (instead of an assertion failure due to "dereferencing" an empty optional. Optional<string> and string are comparable and pretty-print correctly. The only reason this does not work out-of-the-box is because the Optional's operator== does not accept implicit conversions due to it being a template.

30 ↗(On Diff #264708)

Isn't this actually std::numeric_limits<time_t>::min() (and UB due to singed wraparound) ? Did you want to convert to double before doing the +1 ?

vsk marked 3 inline comments as done.May 19 2020, 4:11 PM

@labath Agreed on all points, I've addressed the feedback in 82dbf4aca84 by moving "DataFormatters/Mock.h" to "Plugins/Language/ObjC/Utilities.h", and adding a separate LanguageObjCTests unit test.

30 ↗(On Diff #264708)

Yes, thank you! It looks like Eric caught this before I did.

omjavaid reopened this revision.May 19 2020, 7:18 PM
omjavaid added a subscriber: omjavaid.

This patch breaks lldb unit tests on lldb-arm-ubuntu buildbot.

This revision is now accepted and ready to land.May 19 2020, 7:18 PM
In D80150#2045364, @vsk wrote:

@labath Agreed on all points, I've addressed the feedback in 82dbf4aca84 by moving "DataFormatters/Mock.h" to "Plugins/Language/ObjC/Utilities.h", and adding a separate LanguageObjCTests unit test.

Cool. Thanks.

30 ↗(On Diff #264708)

Actually, thinking about that further, (for 64-bit time_ts), double(numeric_limits<time_t>::max()) is exactly the same value as double(numeric_limits<time_t>::max())+1.0 because double doesn't have enough bits to represent the value precisely. So, I have a feeling these checks are still not testing the exact thing you want to test (though I'm not sure what that is exactly).

This patch breaks lldb unit tests on lldb-arm-ubuntu buildbot.

It also breaks our internal build bots with an "Illegal instruction". Here is the stacktrace:

0  <redacted>                    0x00007f8c5f6d87ba llvm::sys::PrintStackTrace(llvm::raw_ostream&) + 58
1  <redacted>                    0x00007f8c5f6d8c79
2  <redacted>                    0x00007f8c5f6d71bb llvm::sys::RunSignalHandlers() + 123
3  <redacted>                    0x00007f8c5f6d92fb
4                            0x00007f8c5e7509a0
5  <redacted>                       0x00007f8c764820b1 lldb_private::formatters::NSDate::FormatDateValue(double, lldb_private::Stream&) + 129
6  <redacted> 0x00007f8c771c29b5
7  <redacted> 0x00007f8c771c24d8 DataFormatterMockTest_NSDate_Test::TestBody() + 248
8  <redacted>                      0x00007f8c5f85ce44 void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) + 132
9  <redacted>                      0x00007f8c5f84b532 void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) + 114
10 <redacted>                      0x00007f8c5f840866 testing::Test::Run() + 182
11 <redacted>                      0x00007f8c5f840eb5 testing::TestInfo::Run() + 213
12 <redacted>                      0x00007f8c5f8413e4 testing::TestCase::Run() + 228
13 <redacted>                      0x00007f8c5f8461db testing::internal::UnitTestImpl::RunAllTests() + 731
14 <redacted>                      0x00007f8c5f861a24 bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) + 132
15 <redacted>                      0x00007f8c5f84cf42 bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) + 114
16 <redacted>                      0x00007f8c5f845eda testing::UnitTest::Run() + 186
17 <redacted>                0x00007f8c5f8b7ce1 RUN_ALL_TESTS() + 17
18 <redacted>                0x00007f8c5f8b7c6a main + 186
19                                  0x00007f8c5e5bebbd __libc_start_main + 253
20 language_tests                             0x0000556b0f40dd09
Illegal instruction

@gribozavr2 is preparing a revert of this commit and the two follow-up changes.

MForster added inline comments.

This is the line where the "Illegal instruction" happens. Actually here, exactly (as there has been a follow-up change by @echristo)

vsk marked 2 inline comments as done and an inline comment as not done.May 22 2020, 6:04 PM
vsk added inline comments.
30 ↗(On Diff #264708)

Hm, that's right. What I'm trying to do is look for float-cast UB in two places: first, when we initially convert the double "date_value" to time_t, and second, when we add the OSXEpoch to that time_t.

I'll take a fresh cut at this. For the first conversion, I think I can rely on implementation-defined behavior from std::llrint to do the conversion without crashing LLDB due to a floating-point exception. For the second, we have checkedAdd from llvm, but the tricky part is testing it: to do that, I'm crafting a special time_t that should trigger overflow, and asserting that a round-trip conversion to/from double doesn't break the property.

vsk updated this revision to Diff 265829.May 22 2020, 6:11 PM

Apologies for all the breakage. I think I've identified the issues in the
initial patch which caused breakage on the bots, and which caused the tests to
not target what they meant to test.

I've reworked this patch to use std::llrint, instead of casting the max
time_t value to double as part of an overflow check. The cast to double
was not precise, and could crash on some machines (presumably due to a floating
point exception?).

For the tests, I've added asserts to make sure that the specially-crafted
values that are meant to trigger clamping behavior in std::llrint, or an
overflow in llvm::checkedAdd, actually do satisfy those conditions.

vsk updated this revision to Diff 265830.May 22 2020, 6:20 PM

Fix an ASSERT_LE that was actually meant to be an ASSERT_GE.

I don't know how widespread the llrint behavior is. Clamping to the largest/smallest int seems reasonable, so it might be reasonable to depend on that -- though I don't know what will that do for NaNs...

I actually don't think the previous approach was wrong. IIUC, it only became a problem when Eric added an explicit int cast to the input value -- it should have been done the other way around, casting the integer to the float type. Then we'd just need to be careful in the test to select a number that is sufficiently bigger than UINT64_MAX, so that the difference is observable after conversion to floating point.

In a way, I think that the real problem here is that we're using host floating point arithmetic on numbers that are: a) untrusted; b) come from a system which has potentially different floating point encoding/semantics. A more principled approach would be to do it like the compiler, and work with APFloats with semantics explicitly configured from based on the target.


I think that changing !_WIN32 to APPLE is a step in the wrong direction. Ideally, this function should work the same everywhere. The way to achieve that would be to have a host function which selects between gmtime and _mkgmtime depending on the platform. But failing that, I don't see a reason to restrict the scope of this further.


What about infinities and NaNs? (I can believe this will do the right thing for the former, but I have no clue what will happen in the NaN case).


I don't think the distinction between AppleObjC and ObjC is very clear, but if the code under test lives in ObjC, I think it makes sense to call the test folder the same way.