This is an archive of the discontinued LLVM Phabricator instance.

[clang][NFC] Inclusive terms: replace some uses of sanity in clang
ClosedPublic

Authored by ZarkoCA on Nov 16 2021, 1:20 PM.

Details

Summary

Rewording of comments to avoid using sanity test, sanity check.

Diff Detail

Event Timeline

ZarkoCA created this revision.Nov 16 2021, 1:20 PM
ZarkoCA requested review of this revision.Nov 16 2021, 1:20 PM
aaron.ballman accepted this revision.Nov 18 2021, 4:48 AM

LGTM, thank you!

This revision is now accepted and ready to land.Nov 18 2021, 4:48 AM

Do we have a comprehensive list of non-inclusive terms and their inclusive correspondent somewhere available?
I mean master -> main, white list -> inclusive list, sanity -> validation, ...
I'd assume that we go through that list, and that could give me a clue of how many such patches to expect in the future.

Also, I was wondering that in list perhaps we could provide why we consider a term non-inclusive. Maybe it is just me but why is sanity considered non-inclusive?

Do we have a comprehensive list of non-inclusive terms and their inclusive correspondent somewhere available?
I mean master -> main, white list -> inclusive list, sanity -> validation, ...
I'd assume that we go through that list, and that could give me a clue of how many such patches to expect in the future.

Also, I was wondering that in list perhaps we could provide why we consider a term non-inclusive. Maybe it is just me but why is sanity considered non-inclusive?

I don't know of an official LLVM list, there could be one but I am not certain.

These two sources have been a good place to start for me:
https://tools.ietf.org/id/draft-knodel-terminology-00.html
https://developers.google.com/style/inclusive-documentation

I have been working on finding replacements for blacklist, whitelist, and sanity in Clang/LLVM and @quinnp has been working on replacing master when possible. That's the extent of our list.

Quuxplusone requested changes to this revision.Nov 18 2021, 7:25 AM
Quuxplusone added a subscriber: Quuxplusone.

Peanut gallery says: I'd recommend not taking this particular patch; it seems much less "obvious" than the whitelist/inclusionlist type of patch. Whereas "whitelist" is just a made-up buzzword that can easily be replaced with a different made-up buzzword, "sanity check" is not at all the same as "validation test." In many cases, I think "sanity-check" could be reasonably replaced with "smoke-test," but (1) this PR doesn't do that, and (2) the phrase "smoke-test" is probably harder to understand, and (3) this PR currently touches many instances of "sanity/insanity/sane/insane" in code comments which have nothing to do with testing or checking at all.

"We could do X, but that would be insane" does not mean "We could do X, but that would not pass validations." It means it would be stupid, irrational, foolish for obvious reasons, something along those lines.

I agree that "X is dumb/stupid/insane" is a bad code comment and should be improved. It is bad not just because it is un-PC but because it is vague and therefore not (directly) helpful to the reader. However, you cannot fix this kind of comment by just substituting some made-up reason like "it would fail validation," because a lying comment is worse than a vague comment.

Not all of the individual diffs in this PR are bad. But some of them are.

clang/lib/Sema/SemaDeclCXX.cpp
9175–9176

This comment seems incorrectly translated.

12260

This comment seems incorrectly translated.

clang/lib/StaticAnalyzer/Core/Store.cpp
252–255

This comment seems incorrectly translated. I interpret the writer's intent as, "We really don't ever expect to get here from a reinterpret_cast; but if we ever do, let's just return something plausible [so that we avoid doing anything insane in the following codepath]."

According to my interpretation, this would be better expressed as a simple assert, and you should file a bug if the assert ever gets hit.

Alternatively, this if is necessary for correctness, and the comment should simply say something like "We can get here from a reinterpret_cast; in that case, return UnknownVal so that [whatever the reason is]."

This revision now requires changes to proceed.Nov 18 2021, 7:25 AM

Do we have a comprehensive list of non-inclusive terms and their inclusive correspondent somewhere available?
I mean master -> main, white list -> inclusive list, sanity -> validation, ...
I'd assume that we go through that list, and that could give me a clue of how many such patches to expect in the future.

Also, I was wondering that in list perhaps we could provide why we consider a term non-inclusive. Maybe it is just me but why is sanity considered non-inclusive?

How dare you question our world's new overlords.

Do we have a comprehensive list of non-inclusive terms and their inclusive correspondent somewhere available?
I mean master -> main, white list -> inclusive list, sanity -> validation, ...
I'd assume that we go through that list, and that could give me a clue of how many such patches to expect in the future.

Also, I was wondering that in list perhaps we could provide why we consider a term non-inclusive. Maybe it is just me but why is sanity considered non-inclusive?

https://gist.github.com/seanmhanson/fe370c2d8bd2b3228680e38899baf5cc has a pretty reasonable explanation about why sanity is problematic.

How dare you question our world's new overlords.

This is not a particularly constructive comment...

ZarkoCA updated this revision to Diff 388207.Nov 18 2021, 8:05 AM
ZarkoCA edited the summary of this revision. (Show Details)
  • Reword comments based on suggestions.

Peanut gallery says: I'd recommend not taking this particular patch; it seems much less "obvious" than the whitelist/inclusionlist type of patch. Whereas "whitelist" is just a made-up buzzword that can easily be replaced with a different made-up buzzword, "sanity check" is not at all the same as "validation test." In many cases, I think "sanity-check" could be reasonably replaced with "smoke-test," but (1) this PR doesn't do that, and (2) the phrase "smoke-test" is probably harder to understand, and (3) this PR currently touches many instances of "sanity/insanity/sane/insane" in code comments which have nothing to do with testing or checking at all.

I think I understand your point and have tried to address that. Additionally, I would also argue that sanity check/test also serve as buzzwords and, while we may need to take greater care of how we reword things so they convey the meaning better, it shouldn't be fine to find replacements for them. Your review helps, thanks.

"We could do X, but that would be insane" does not mean "We could do X, but that would not pass validations." It means it would be stupid, irrational, foolish for obvious reasons, something along those lines.

I agree that "X is dumb/stupid/insane" is a bad code comment and should be improved. It is bad not just because it is un-PC but because it is vague and therefore not (directly) helpful to the reader. However, you cannot fix this kind of comment by just substituting some made-up reason like "it would fail validation," because a lying comment is worse than a vague comment.

Not all of the individual diffs in this PR are bad. But some of them are.

I've tried to address this in the updated diff. However, I think this also makes a case that there is a general issue with using sanity check/test aside from inclusive language.

clang/include/clang/AST/Redeclarable.h
261–262

// Make sure we don't infinite-loop on an invalid redecl chain. This should never happen.

clang/include/clang/CodeGen/CGFunctionInfo.h
253

This still feels like a misuse of "validation."
I'd say something like // Check that unpaddedCoerceToType has roughly the right shape.

To me, "validation" suggests the phrase "passes validations," which is like a Quality Assurance thing, and means "we did a lot of tests and couldn't break it, it's definitely valid," which is the exact opposite of the rough-cut quick-and-dirty smoke-test we mean when we say "sanity check."

After we have validated an input, the postcondition should be that the input is valid. That's not what's happening in any of these cases. These sanity-checks are just rejecting specific easy-to-detect invalid cases. They're not digging deep to actually validate the input. They're just sanity-checking it.

clang/include/clang/Sema/Lookup.h
708–709

The original comment strikes me as pretty useless, except that it's kind of obliquely explaining the meaning of the ungrammatical bool sanity() const. It could have been fixed better, pre-PC, by just removing the comment and changing the function to bool isSane() const. I have no particular suggestion for this one, other than "you'll have to look at how it's used" and/or "just leave the comment alone, until you're ready to rename the actual identifiers too" and/or "just delete the comment because it's useless."

clang/lib/Analysis/BodyFarm.cpp
793–795

// We expect that the property is...
although again it's not clear to me why this isn't an assertion.

clang/lib/Basic/SourceManager.cpp
61–67

This code seems silly to me. I would just delete these lines; or else how about

llvm::MemoryBuffer::BufferKind ContentCache::getMemoryBufferKind() const {
  if (Buffer == nullptr) {
    assert(0 && "Buffer should never be null");
    return llvm::MemoryBuffer::MemoryBuffer_Malloc;
  }
  return Buffer->getBufferKind();
}
864–865

Throughout this file, I'd just delete these comments.
This isn't the use of "sanity-checking" I'm familiar with, btw. This is what I would call "defensive programming," where we believe a codepath is unreachable but we leave it in the codebase anyway "just in case." I think the fact that the codepath starts with assert(0 && "reason") is sufficient to mark it as a defensive codepath, and it doesn't need any comment about why it's "sane" and/or "required".

The codepath on line 64 is the same deal.

Validation: Ensure that the state is good. (Can generally be done only to user inputs. Once pointers are in the mix, "validation" of an intermediate state quickly becomes Halting-Problem-hard.)
Sanity-checking/smoke-testing: Quickly reject some states that are obviously bad. (Assert-fail when "this should never be happening.")
Defensive programming: Do something sensible even when in a bad state. ("This should never be happening," but describe what to do instead of asserting, anyway.)
"For sanity": This data field should never be read ever again, but just in case it is, put something sensible into it. ("That should never happen.")

clang/lib/Driver/ToolChains/Clang.cpp
4305–4306

This comment can just be eliminated.

clang/lib/Sema/SemaChecking.cpp
5535–5536

This is the "for sanity" idiom; it has nothing to do with validation or even smoke-testing. This comment translates to something like

// GCC does not enforce these rules for GNU atomics, but we do,
// because if we didn't, it would be very confusing. [For whom? How so?]

Ditto line 5577.

clang/lib/Sema/SemaDecl.cpp
12624–12626

I suggest that lines 12627-12628 should be moved up here and combined with this comment in some way. We have the power to be much less vague here about the precise form of sanity we're attempting to restore.

clang/lib/Sema/SemaDeclCXX.cpp
9175–9176

This comment still seems incorrectly translated.
Things we do "for sanity's sake" aren't necessarily required, technically; but we're doing them anyway, for sanity.

clang/lib/Sema/SemaExpr.cpp
11187–11188

// Reject obviously incorrect shift values.
or just eliminate the comment because it's redundant with line 11189

clang/lib/Sema/SemaExprCXX.cpp
1510–1513

This comment is incorrectly translated. I'm not sure I understand what's going on on this codepath, but it seems like the right comment is just
// Functions aren't objects, so they can't take initializers.
The original commenter would have added, The Standard doesn't seem to say this anywhere, but it makes sense, right? However, I'm sure there must be at least an open issue about this, if it hasn't already been fixed; that second sentence would just bit-rot, so I wouldn't bother adding it.

clang/lib/StaticAnalyzer/Checkers/UnixAPIChecker.cpp
185

I believe this is another instance of someone misusing the term "sanity check" for (something like) "defensive programming." We're not checking anything on this codepath; we're doing something sensible in a case that has already been dealt with elsewhere (but in this case, it actually can fall through to here; asserting here would be wrong). I'd write:
// The frontend has already issued a warning. Just return.

clang/lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
1672–1673

// At this point, the negation of the constraint should be infeasible. If it is feasible,...

clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
329–333

This seems to be the "for sanity" idiom, but I have no idea what it's trying to say (because I don't know this code).

ZarkoCA updated this revision to Diff 388332.Nov 18 2021, 2:41 PM
  • Address review comments
  • Also rename some variables
ZarkoCA marked 13 inline comments as done.Nov 18 2021, 2:44 PM

@Quuxplusone Thanks for thorough review.

I think "sanity-check" could be reasonably replaced with "smoke-test," but (1) this PR doesn't do that, and (2) the phrase "smoke-test" is probably harder to understand,

It seems difficult considering the potential atmospheric pollution, carbon footprint, health issues, lung cancer, drug abuse, etc. implied.

aaron.ballman accepted this revision.Nov 19 2021, 5:10 AM

I think "sanity-check" could be reasonably replaced with "smoke-test," but (1) this PR doesn't do that, and (2) the phrase "smoke-test" is probably harder to understand,

It seems difficult considering the potential atmospheric pollution, carbon footprint, health issues, lung cancer, drug abuse, etc. implied.

This is not a constructive comment either, please stop.

@Quuxplusone Thanks for thorough review.

+1, you caught some stuff I was glossing over, but this is much improved. I made a few tiny suggestions (take them or leave them). Continues to LGTM

clang/include/clang/AST/Redeclarable.h
261–262

Alternatively, "Make sure we don't infinitely loop..."

clang/lib/Sema/SemaChecking.cpp
5536
5578
ZarkoCA marked 3 inline comments as done.Nov 19 2021, 7:07 AM

I think "sanity-check" could be reasonably replaced with "smoke-test," but (1) this PR doesn't do that, and (2) the phrase "smoke-test" is probably harder to understand,

It seems difficult considering the potential atmospheric pollution, carbon footprint, health issues, lung cancer, drug abuse, etc. implied.

This is not a constructive comment either, please stop.

@Quuxplusone Thanks for thorough review.

+1, you caught some stuff I was glossing over, but this is much improved. I made a few tiny suggestions (take them or leave them). Continues to LGTM

Yes, agreed, the suggestions made this much better. Thanks.

ZarkoCA updated this revision to Diff 388494.Nov 19 2021, 7:08 AM
  • Add FIXME to comments and fix grammar on one comment

(Re repeated thanks: You're welcome. :) For the record, I personally see nothing wrong with the phrase "sanity-check" either; but given that it's gonna happen, and we're re-wording comments on by definition the subtlest and most confusing parts of the code, I'm trying to help us not to lose/distort the semantics of those comments. Some of these comments are even improving through this exercise.)

clang/include/clang/AST/Redeclarable.h
261–262

s/on on/on an/
Also, I think comments should always end with a period ., or is it the other way around? :)

clang/include/clang/Sema/Lookup.h
708–709

The new version has the problem that check() is really vague, which again means that in order to explain what it does, you have to use the term "sanity-check". ;) Perhaps change the identifier to checkAssumptions() or even checkDebugAssumptions()?
(Pre-existing problem: the name of the function still doesn't describe what it does, because it doesn't check or assert anything; it simply returns true or false, and it's up to the caller to assert on the return value. But conformsToAssumptions() is a silly name.)

clang/lib/Basic/DiagnosticIDs.cpp
695

Minimum, and also, I think it's a maximum? :P

clang/lib/Sema/SemaDeclCXX.cpp
9175–9176

"Don't check ... but check it anyway"?

ZarkoCA updated this revision to Diff 388550.Nov 19 2021, 10:01 AM
ZarkoCA marked 3 inline comments as done.
  • Address latest round of comments
  • Fix unrelated formatting to stop distracting clang-format linter complaints
ZarkoCA added inline comments.Nov 19 2021, 10:05 AM
clang/lib/Sema/SemaDeclCXX.cpp
9175–9176

Right, that didn't make sense :). I noticed that there were warnings for this case in SemaDecl.cpp AFAIU so edited the comment to state that. Should be better now?

clang/lib/Sema/SemaDeclCXX.cpp
9175–9176

Well, this version of the comment now gives the impression that it must be clear to someone. ;) I don't understand its implications, but I also don't know the code.

Specifically, I don't know what "This" refers to (grammatically) — "Anonymous union types aren't conforming, but we support them anyway, and this is the right thing to do with them"? "Our behavior in this case isn't conforming, but it wouldn't make sense to do the conforming thing [wat]"? or something else?

More fundamentally to my confusion (but not to that hypothetical other someone), I don't know what "the implicit member of the anonymous union type" actually means (in terms of the arcane details of C++), i.e., I personally don't know when this codepath is hit or what its effect is when it does get hit.

12260

Perhaps just // where they may be used by two-phase lookup. (But this is now just a nit.)

Quuxplusone accepted this revision.Nov 19 2021, 10:35 AM
This revision is now accepted and ready to land.Nov 19 2021, 10:35 AM
ZarkoCA marked an inline comment as done.Nov 19 2021, 11:58 AM

Thanks @aaron.ballman and @Quuxplusone for the constructive reviews.

gandhi21299 added inline comments.
clang/lib/Sema/SemaChecking.cpp
5536

Is this change really necessary? It is a confusing comment and probably not too helpful for the developers.

If you aren't sure what a comment means, please feel free to CC Richard or me, and we might be able to help.

clang/include/clang/Analysis/CFG.h
520

Proper translation here is probably "assertions". A "validity check" sounds like a semantic restriction on the source language.

clang/lib/Sema/SemaChecking.cpp
5536

Perhaps "but we do to help catch trivial type errors."

5578

We enforce this for consistency with other atomics, which generally all require a trivially-copyable type because atomics just copy bits.

clang/lib/Sema/SemaDeclCXX.cpp
9175–9176

@rsmith probably has a better idea. I think it's saying that we shouldn't fall down into the generic path for the implicit field created for an anonymous union, presumably because we do special-case checks on the members of the anonymous union just above this point. I guess this isn't explicitly sanctioned by the standard but is the only sensible approach.

12260

This comment seems to be missing something — I think the previous comment was implying that the language rule really needs tags to be hidden when instantiating a using declaration. I don't know why that would be true, though; @rsmith might.

clang/lib/Sema/SemaExprCXX.cpp
1510–1513

I would prefer something like:

The standard doesn't explicitly forbid function types here, but that's an
obvious oversight, as there's no way to dynamically construct a function
in general.

@ZarkoCA If you are planning to do a lot of this, it might be good to write a script (or a clang-tidy check even) that we can add to CI, so these terms don't get re-introduced.

@rjmccall thanks for having a look, I will clarify the comments with your suggestions.

@ZarkoCA If you are planning to do a lot of this, it might be good to write a script (or a clang-tidy check even) that we can add to CI, so these terms don't get re-introduced.

That's a good idea, I will look into this definitely.

@rjmccall These are the proposed changes which address some of your comments. I am planning on waiting for further clarification on some of the others.

clang/include/clang/Analysis/CFG.h
520

Changed to:

/// Number of different kinds, for assertions. We subtract 1 so that
/// to keep receiving compiler warnings when we don't cover all enum values
/// in a switch
clang/lib/Sema/SemaChecking.cpp
5536

Changed to:

// GCC does not enforce these rules for GNU atomics, but we do to help catch
// trivial type errors.
5578

Changed to:

// For GNU atomics, require a trivially-copyable type. This is not part of
// the GNU atomics specification but we enforce it for consistency with
// other atomics which generally all require a trivially-copyable type. This
// is because atomics just copy bits.
clang/lib/Sema/SemaExprCXX.cpp
1510–1513

Changed to:

// The standard doesn't explicitly forbid function types here, but that's an
// obvious oversight, as there's no way to dynamically construct a function
// in general.

Those all look good, thanks.

Those all look good, thanks.

Committed here

This comment was removed by mehdi_amini.
Herald added a project: Restricted Project. · View Herald Transcript