This is an archive of the discontinued LLVM Phabricator instance.

[Sema] -Wformat-pedantic only for NSInteger/NSUInteger %zu/%zi on Darwin
ClosedPublic

Authored by jfb on May 23 2018, 3:07 PM.

Details

Summary

Pick D42933 back up, and make NSInteger/NSUInteger with %zu/%zi specifiers on Darwin warn only in pedantic mode. The default -Wformat recently started warning for the following code because of the added support for analysis for the '%zi' specifier.

NSInteger i = NSIntegerMax;
NSLog(@"max NSInteger = %zi", i);

The problem is that on armv7 %zi is 'long', and NSInteger is typedefed to 'int' in Foundation. We should avoid this warning as it's inconvenient to our users: it's target specific (happens only on armv7 and not arm64), and breaks their existing code. We should also silence the warning for the '%zu' specifier to ensure consistency. This is acceptable because Darwin guarantees that, despite the unfortunate choice of typedef, sizeof(size_t) == sizeof(NS[U]Integer), the warning is therefore noisy for pedantic reasons. Once this is in I'll update public documentation.

Related discussion on cfe-dev:
http://lists.llvm.org/pipermail/cfe-dev/2018-May/058050.html

rdar://36874921&40501559

Diff Detail

Event Timeline

jfb created this revision.May 23 2018, 3:07 PM
test/FixIt/fixit-format-ios-nopedantic.m
22

maybe i'm missing smth, but i don't see any verification in this test.

jfb marked an inline comment as done.May 23 2018, 3:28 PM
jfb added inline comments.
test/FixIt/fixit-format-ios-nopedantic.m
22

-Werror makes the test fail if something is reported.

jfb edited the summary of this revision. (Show Details)May 23 2018, 3:42 PM
ahatanak added inline comments.May 24 2018, 9:41 AM
test/SemaObjC/format-size-spec-nsinteger.m
19

This test looks identical to test/SemaObjC/format-size-spec-nsinteger-nopedantic.m except for the CHECK lines. You can probably merge the two files if you separate the CHECK lines from the lines that call NSLog (see test/Sema/format-strings-darwin.c for example).

jfb updated this revision to Diff 148467.May 24 2018, 1:13 PM
jfb marked an inline comment as done.
  • Merge format-size-spec-nsinteger
jfb marked an inline comment as done.May 24 2018, 1:14 PM
jfb added inline comments.
test/SemaObjC/format-size-spec-nsinteger.m
19

Cute, I didn't know about this pattern.

This is relaxing -Wformat and making users instead write -Wformat-pedantic to get the strong guarantees, which is the opposite of what I thought the consensus was. Have I misunderstood something?

include/clang/Analysis/Analyses/FormatString.h
263–266

If we're going to update this, can you fix the parameter names to be in line with the coding conventions?

lib/Sema/SemaChecking.cpp
6613–6617

These identifiers also need some love to match the coding conventions. Happens elsewhere in this patch as well.

jfb marked an inline comment as done.May 26 2018, 12:20 PM

This is relaxing -Wformat and making users instead write -Wformat-pedantic to get the strong guarantees, which is the opposite of what I thought the consensus was. Have I misunderstood something?

From Shoaib's email:

Based on all the discussion so far, I think the consensus is that we don't want to relax -Wformat by default, but an additional relaxed option would be fine. That works for me (where I can just switch my codebases to use the new warning option), but it wouldn't handle JF Bastien's use case, so he would still want to pursue the relaxations specific to Apple platforms.

This patch is specific to the Darwin platform, as proposed in Alex' original patch and as I promised to do in the cfe-dev thread. Shoaib's follow-up would be different and do what I think you're referring to.

jfb updated this revision to Diff 148737.May 26 2018, 12:36 PM
jfb marked 2 inline comments as done.
  • Fix variable capitalization.
jfb added a comment.May 26 2018, 12:36 PM

Addressed comments.

In D47290#1113412, @jfb wrote:

This is relaxing -Wformat and making users instead write -Wformat-pedantic to get the strong guarantees, which is the opposite of what I thought the consensus was. Have I misunderstood something?

From Shoaib's email:

Based on all the discussion so far, I think the consensus is that we don't want to relax -Wformat by default, but an additional relaxed option would be fine. That works for me (where I can just switch my codebases to use the new warning option), but it wouldn't handle JF Bastien's use case, so he would still want to pursue the relaxations specific to Apple platforms.

This patch is specific to the Darwin platform, as proposed in Alex' original patch and as I promised to do in the cfe-dev thread. Shoaib's follow-up would be different and do what I think you're referring to.

There were several people on the thread asking to not relax -Wformat for any target, but instead wanted a separate warning flag to perform a more relaxed check.

http://lists.llvm.org/pipermail/cfe-dev/2018-May/057938.html
http://lists.llvm.org/pipermail/cfe-dev/2018-May/058030.html
http://lists.llvm.org/pipermail/cfe-dev/2018-May/058039.html

jfb added a comment.May 29 2018, 2:49 PM
In D47290#1113412, @jfb wrote:

This is relaxing -Wformat and making users instead write -Wformat-pedantic to get the strong guarantees, which is the opposite of what I thought the consensus was. Have I misunderstood something?

From Shoaib's email:

Based on all the discussion so far, I think the consensus is that we don't want to relax -Wformat by default, but an additional relaxed option would be fine. That works for me (where I can just switch my codebases to use the new warning option), but it wouldn't handle JF Bastien's use case, so he would still want to pursue the relaxations specific to Apple platforms.

This patch is specific to the Darwin platform, as proposed in Alex' original patch and as I promised to do in the cfe-dev thread. Shoaib's follow-up would be different and do what I think you're referring to.

There were several people on the thread asking to not relax -Wformat for any target, but instead wanted a separate warning flag to perform a more relaxed check.

http://lists.llvm.org/pipermail/cfe-dev/2018-May/057938.html

I don't think this one is talking about platform-specific warnings, I therefore don't think it applies to this patch.

http://lists.llvm.org/pipermail/cfe-dev/2018-May/058030.html
http://lists.llvm.org/pipermail/cfe-dev/2018-May/058039.html

Hans' comment and your +1 are on-topic here, and are the only ones AFAIK (others don't care or explicitly agree with this patch). What I'd like to emphasize is that the new stricter warnings are causing issues for developers on our platform. The warnings are new, and they don't help developers because the platform guarantees that their code is fine. That frustrates developers because they *know* that we know better (they might even find this discussion when searching! πŸ‘‹), yet we chose to come in and warn them anyways. I have yet to hear a reason why we're doing them a service by warning in this case. It really has the feel of a pedantic warning, and that's what this patch makes it.

I'll quote James slightly out of context:

No, this actually has been a thorn in the side of some people at google,
causing a good deal of angst (only for a very small number of people,
granted). I'd not truly put the blame on the warning, but rather on printf
itself -- and that we're still USING printf (and other functions that
ultimately wrap printf) all over the place. The long-term plan is certainly
to switch to better APIs. But, it'd be nice to be able to reduce the issue
in the meantime.

The problem we have is that Google has an internal "int64" typedef, which
predates the availability of C99 "int64_t". We'd like to eliminate this,
and switch everything over to using the standard int64_t. (Well, really
we'd've liked to have done that years ago...) However, "int64" is defined
as "long long", while "int64_t" is defined as "long" on 64-bit linux. A
bunch of printf specifiers all throughout the codebase use %lld. This
mismatching type-specifier makes it quite difficult to change the
type. Automatically updating all the format strings has not been found to
be easy.

If we could eliminate the warning on mismatch of "%lld" vs "long", on a
platform where they're the same, that could make things a whole lot easier.

It seems Hans' comment came from the sentiment "warning is the right thing to do, and from Google's vast experience it's not a problem". James provides a counter-point in line with ours: we receive exactly that feedback for NSInteger and %z. The code exists, it works fine, the platform guarantees it'll keep working fine, and now a tonne of "useless" warnings come in. That's frustrating because it means developers have these options:

  • Useless code churn in code that otherwise won't change, ever
  • Silence the warning, potentially silencing useful warnings later (because -Wformat-relaxed could start doing more things)

That's frustrating when you update compilers, especially if your update is incremental because only some developers see the warning. It means people are reluctant to update, because the first thing they see is us being pedants, not the cool new features we bring in. If they saw a flood of really useful warnings instead they'd be happy, but even there they get drowned by these format warnings... Kind of a shit sandwich if you'll pardon the graphic description.

Hopefully this makes sense? I'm not sure I summarize my context very well. I'm happy to talk about it in-person next week too.

In D47290#1115352, @jfb wrote:

Hopefully this makes sense? I'm not sure I summarize my context very well. I'm happy to talk about it in-person next week too.

I appreciate the in-person conversation, it helped me to understand your goals better.

That said, I still see this as changing the portability aspects of -Wformat (because we'd be changing the behavior based on the target, but this is perhaps different from the portability concerns @rjmccall raised) and I don't see where to get off that train. For instance, do we also silence -Wformat when targeting Windows with %ld and %d because long and int have the same size and alignment requirements for x86 and x86-64 (at least, I don't know about other Windows targets) and that is effectively baked into the platform requirements? If we don't, how is that situation different than the one motivating your changes?

As a user, I very much appreciate that -Wformat gives me the same diagnostics no matter what platform I target, so these changes make me a bit uncomfortable -- those warnings are not useless to me. You (rightly) asked for a concrete explanation of why. My reason is: the user's code here is UB and there is a long history demonstrating that "it's UB, but it works and does what I want!" suddenly changing into "it's not what I want" silently, perhaps many years down the line, because the optimizer suddenly got smarter and it results in exploitable security vulnerabilities. However, I'm no longer as scared that the optimizer is going to start making type-based decisions based on this format specifier string. I can imagine the optimizer making value-based decisions based on a format specifier string (like seeing a value matches a %s specifier and assuming the value cannot be null), but that's not this. However, I have had users for which UB is verboten. Period. Full stop. I doubt these users care at all about Apple's platform, so *this* change isn't likely to impact them, but the *precedent* from this change certainly could. However, perhaps making this pedantic as in this patch will be reasonable (I believe these folks already pass -pedantic when they realize that's a thing). I have to do my homework on that (and can't do that homework this week).

hans added a comment.Jun 7 2018, 5:42 AM

What's special about size_t though? If I understand your patch correctly, it would suppress warning about printing NSInteger with %zd, but still warn about %ld even though ssize_t=long on the target? As a user I'd find this confusing.

If we really want to special-case NSInteger, and given that you're targeting a specific wide-spread pattern maybe that's the right thing to do, I think we should make -Wformat accept (move the warning behind -Wformat-pedantic I suppose) printing NSInteger with *any* integral type of the right size, not just size_t.

Also, I haven't looked at what happens on the scanf side. Does that need an equivalent change?

If we really want to special-case NSInteger, and given that you're targeting a specific wide-spread pattern maybe that's the right thing to do, I think we should make -Wformat accept (move the warning behind -Wformat-pedantic I suppose) printing NSInteger with *any* integral type of the right size, not just size_t.

Would you be similarly okay with %ld and %d on Windows platforms when mixing up int and long?

hans added a comment.Jun 7 2018, 6:51 AM

If we really want to special-case NSInteger, and given that you're targeting a specific wide-spread pattern maybe that's the right thing to do, I think we should make -Wformat accept (move the warning behind -Wformat-pedantic I suppose) printing NSInteger with *any* integral type of the right size, not just size_t.

Would you be similarly okay with %ld and %d on Windows platforms when mixing up int and long?

No, I'm against a general relaxation of -Wformat, but to solve JF's problem I think special-casing NSInteger might be reasonable.

If we really want to special-case NSInteger, and given that you're targeting a specific wide-spread pattern maybe that's the right thing to do, I think we should make -Wformat accept (move the warning behind -Wformat-pedantic I suppose) printing NSInteger with *any* integral type of the right size, not just size_t.

Would you be similarly okay with %ld and %d on Windows platforms when mixing up int and long?

No, I'm against a general relaxation of -Wformat, but to solve JF's problem I think special-casing NSInteger might be reasonable.

How is JF's problem different?

hans added a comment.Jun 7 2018, 7:15 AM

If we really want to special-case NSInteger, and given that you're targeting a specific wide-spread pattern maybe that's the right thing to do, I think we should make -Wformat accept (move the warning behind -Wformat-pedantic I suppose) printing NSInteger with *any* integral type of the right size, not just size_t.

Would you be similarly okay with %ld and %d on Windows platforms when mixing up int and long?

No, I'm against a general relaxation of -Wformat, but to solve JF's problem I think special-casing NSInteger might be reasonable.

How is JF's problem different?

It concerns a vendor-specific type. Of course I personally think it would be better if the code could be fixed, but it doesn't sound like that's an option so then I think special-casing for NSInteger is an acceptable solution.

If we really want to special-case NSInteger, and given that you're targeting a specific wide-spread pattern maybe that's the right thing to do, I think we should make -Wformat accept (move the warning behind -Wformat-pedantic I suppose) printing NSInteger with *any* integral type of the right size, not just size_t.

Would you be similarly okay with %ld and %d on Windows platforms when mixing up int and long?

No, I'm against a general relaxation of -Wformat, but to solve JF's problem I think special-casing NSInteger might be reasonable.

How is JF's problem different?

It concerns a vendor-specific type. Of course I personally think it would be better if the code could be fixed, but it doesn't sound like that's an option so then I think special-casing for NSInteger is an acceptable solution.

Okay, that's fair, but the vendor-specific type for my Windows example is spelled DWORD. I'm really worried that this special case will become a precedent and we'll wind up with -Wformat being relaxed for everything based on the same rationale. If that's how the community wants -Wformat to work, cool, but I'd like to know if we're intending to change (what I see as) the design of this warning.

Okay, that's fair, but the vendor-specific type for my Windows example is spelled DWORD. I'm really worried that this special case will become a precedent and we'll wind up with -Wformat being relaxed for everything based on the same rationale. If that's how the community wants -Wformat to work, cool, but I'd like to know if we're intending to change (what I see as) the design of this warning.

I spoke with @jfb offline and am less concerned about this patch now. He's welcome to correct me if I misrepresent anything, but the precedent this sets is that a platform "owner" (someone with authority, not Joe Q Random-User) can relax -Wformat as in this patch, but this is not a precedent for a blanket change to -Wformat just because the UB happens to work and we "know" it.

jfb added a comment.Jun 8 2018, 11:18 PM

Okay, that's fair, but the vendor-specific type for my Windows example is spelled DWORD. I'm really worried that this special case will become a precedent and we'll wind up with -Wformat being relaxed for everything based on the same rationale. If that's how the community wants -Wformat to work, cool, but I'd like to know if we're intending to change (what I see as) the design of this warning.

I spoke with @jfb offline and am less concerned about this patch now. He's welcome to correct me if I misrepresent anything, but the precedent this sets is that a platform "owner" (someone with authority, not Joe Q Random-User) can relax -Wformat as in this patch, but this is not a precedent for a blanket change to -Wformat just because the UB happens to work and we "know" it.

Thanks for asking these questions Aaron, it's helped answer everyone's concerns and explain our respective positions. You've certainly summarized what I was thinking.

It sounds like you're both OK moving forward with this patch?

In D47290#1127190, @jfb wrote:

Okay, that's fair, but the vendor-specific type for my Windows example is spelled DWORD. I'm really worried that this special case will become a precedent and we'll wind up with -Wformat being relaxed for everything based on the same rationale. If that's how the community wants -Wformat to work, cool, but I'd like to know if we're intending to change (what I see as) the design of this warning.

I spoke with @jfb offline and am less concerned about this patch now. He's welcome to correct me if I misrepresent anything, but the precedent this sets is that a platform "owner" (someone with authority, not Joe Q Random-User) can relax -Wformat as in this patch, but this is not a precedent for a blanket change to -Wformat just because the UB happens to work and we "know" it.

Thanks for asking these questions Aaron, it's helped answer everyone's concerns and explain our respective positions. You've certainly summarized what I was thinking.

Excellent!

It sounds like you're both OK moving forward with this patch?

I am okay moving forward.

arphaman accepted this revision.Jun 22 2018, 2:33 PM
arphaman added a subscriber: arphaman.

LGTM. Thanks for working on this!

This revision is now accepted and ready to land.Jun 22 2018, 2:33 PM
This revision was automatically updated to reflect the committed changes.