Rewording of comments to avoid using sanity test, sanity check.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Do we have a comprehensive list of non-inclusive terms and their inclusive correspondent somewhere available?
I mean master -> main, white list -> inclusive list, sanity -> validation, ...
I'd assume that we go through that list, and that could give me a clue of how many such patches to expect in the future.
Also, I was wondering that in list perhaps we could provide why we consider a term non-inclusive. Maybe it is just me but why is sanity considered non-inclusive?
I don't know of an official LLVM list, there could be one but I am not certain.
These two sources have been a good place to start for me:
https://tools.ietf.org/id/draft-knodel-terminology-00.html
https://developers.google.com/style/inclusive-documentation
I have been working on finding replacements for blacklist, whitelist, and sanity in Clang/LLVM and @quinnp has been working on replacing master when possible. That's the extent of our list.
Peanut gallery says: I'd recommend not taking this particular patch; it seems much less "obvious" than the whitelist/inclusionlist type of patch. Whereas "whitelist" is just a made-up buzzword that can easily be replaced with a different made-up buzzword, "sanity check" is not at all the same as "validation test." In many cases, I think "sanity-check" could be reasonably replaced with "smoke-test," but (1) this PR doesn't do that, and (2) the phrase "smoke-test" is probably harder to understand, and (3) this PR currently touches many instances of "sanity/insanity/sane/insane" in code comments which have nothing to do with testing or checking at all.
"We could do X, but that would be insane" does not mean "We could do X, but that would not pass validations." It means it would be stupid, irrational, foolish for obvious reasons, something along those lines.
I agree that "X is dumb/stupid/insane" is a bad code comment and should be improved. It is bad not just because it is un-PC but because it is vague and therefore not (directly) helpful to the reader. However, you cannot fix this kind of comment by just substituting some made-up reason like "it would fail validation," because a lying comment is worse than a vague comment.
Not all of the individual diffs in this PR are bad. But some of them are.
clang/lib/Sema/SemaDeclCXX.cpp | ||
---|---|---|
9175–9176 | This comment seems incorrectly translated. | |
12260 | This comment seems incorrectly translated. | |
clang/lib/StaticAnalyzer/Core/Store.cpp | ||
252–255 | This comment seems incorrectly translated. I interpret the writer's intent as, "We really don't ever expect to get here from a reinterpret_cast; but if we ever do, let's just return something plausible [so that we avoid doing anything insane in the following codepath]." According to my interpretation, this would be better expressed as a simple assert, and you should file a bug if the assert ever gets hit. Alternatively, this if is necessary for correctness, and the comment should simply say something like "We can get here from a reinterpret_cast; in that case, return UnknownVal so that [whatever the reason is]." |
https://gist.github.com/seanmhanson/fe370c2d8bd2b3228680e38899baf5cc has a pretty reasonable explanation about why sanity is problematic.
This is not a particularly constructive comment...
I think I understand your point and have tried to address that. Additionally, I would also argue that sanity check/test also serve as buzzwords and, while we may need to take greater care of how we reword things so they convey the meaning better, it shouldn't be fine to find replacements for them. Your review helps, thanks.
"We could do X, but that would be insane" does not mean "We could do X, but that would not pass validations." It means it would be stupid, irrational, foolish for obvious reasons, something along those lines.
I agree that "X is dumb/stupid/insane" is a bad code comment and should be improved. It is bad not just because it is un-PC but because it is vague and therefore not (directly) helpful to the reader. However, you cannot fix this kind of comment by just substituting some made-up reason like "it would fail validation," because a lying comment is worse than a vague comment.
Not all of the individual diffs in this PR are bad. But some of them are.
I've tried to address this in the updated diff. However, I think this also makes a case that there is a general issue with using sanity check/test aside from inclusive language.
clang/include/clang/AST/Redeclarable.h | ||
---|---|---|
261–262 | // Make sure we don't infinite-loop on an invalid redecl chain. This should never happen. | |
clang/include/clang/CodeGen/CGFunctionInfo.h | ||
253 | This still feels like a misuse of "validation." To me, "validation" suggests the phrase "passes validations," which is like a Quality Assurance thing, and means "we did a lot of tests and couldn't break it, it's definitely valid," which is the exact opposite of the rough-cut quick-and-dirty smoke-test we mean when we say "sanity check." After we have validated an input, the postcondition should be that the input is valid. That's not what's happening in any of these cases. These sanity-checks are just rejecting specific easy-to-detect invalid cases. They're not digging deep to actually validate the input. They're just sanity-checking it. | |
clang/include/clang/Sema/Lookup.h | ||
708–709 | The original comment strikes me as pretty useless, except that it's kind of obliquely explaining the meaning of the ungrammatical bool sanity() const. It could have been fixed better, pre-PC, by just removing the comment and changing the function to bool isSane() const. I have no particular suggestion for this one, other than "you'll have to look at how it's used" and/or "just leave the comment alone, until you're ready to rename the actual identifiers too" and/or "just delete the comment because it's useless." | |
clang/lib/Analysis/BodyFarm.cpp | ||
793–795 | // We expect that the property is... | |
clang/lib/Basic/SourceManager.cpp | ||
61–67 | This code seems silly to me. I would just delete these lines; or else how about llvm::MemoryBuffer::BufferKind ContentCache::getMemoryBufferKind() const { if (Buffer == nullptr) { assert(0 && "Buffer should never be null"); return llvm::MemoryBuffer::MemoryBuffer_Malloc; } return Buffer->getBufferKind(); } | |
864–865 | Throughout this file, I'd just delete these comments. The codepath on line 64 is the same deal. Validation: Ensure that the state is good. (Can generally be done only to user inputs. Once pointers are in the mix, "validation" of an intermediate state quickly becomes Halting-Problem-hard.) | |
clang/lib/Driver/ToolChains/Clang.cpp | ||
4305–4306 | This comment can just be eliminated. | |
clang/lib/Sema/SemaChecking.cpp | ||
5535–5536 | This is the "for sanity" idiom; it has nothing to do with validation or even smoke-testing. This comment translates to something like // GCC does not enforce these rules for GNU atomics, but we do, // because if we didn't, it would be very confusing. [For whom? How so?] Ditto line 5577. | |
clang/lib/Sema/SemaDecl.cpp | ||
12624–12626 | I suggest that lines 12627-12628 should be moved up here and combined with this comment in some way. We have the power to be much less vague here about the precise form of sanity we're attempting to restore. | |
clang/lib/Sema/SemaDeclCXX.cpp | ||
9175–9176 | This comment still seems incorrectly translated. | |
clang/lib/Sema/SemaExpr.cpp | ||
11187–11188 | // Reject obviously incorrect shift values. | |
clang/lib/Sema/SemaExprCXX.cpp | ||
1510–1513 | This comment is incorrectly translated. I'm not sure I understand what's going on on this codepath, but it seems like the right comment is just | |
clang/lib/StaticAnalyzer/Checkers/UnixAPIChecker.cpp | ||
185 | I believe this is another instance of someone misusing the term "sanity check" for (something like) "defensive programming." We're not checking anything on this codepath; we're doing something sensible in a case that has already been dealt with elsewhere (but in this case, it actually can fall through to here; asserting here would be wrong). I'd write: | |
clang/lib/StaticAnalyzer/Core/BugReporterVisitors.cpp | ||
1672–1673 | // At this point, the negation of the constraint should be infeasible. If it is feasible,... | |
clang/lib/StaticAnalyzer/Core/ExprEngine.cpp | ||
329–333 | This seems to be the "for sanity" idiom, but I have no idea what it's trying to say (because I don't know this code). |
It seems difficult considering the potential atmospheric pollution, carbon footprint, health issues, lung cancer, drug abuse, etc. implied.
This is not a constructive comment either, please stop.
+1, you caught some stuff I was glossing over, but this is much improved. I made a few tiny suggestions (take them or leave them). Continues to LGTM
clang/include/clang/AST/Redeclarable.h | ||
---|---|---|
261–262 | Alternatively, "Make sure we don't infinitely loop..." | |
clang/lib/Sema/SemaChecking.cpp | ||
5536 | ||
5578 |
(Re repeated thanks: You're welcome. :) For the record, I personally see nothing wrong with the phrase "sanity-check" either; but given that it's gonna happen, and we're re-wording comments on by definition the subtlest and most confusing parts of the code, I'm trying to help us not to lose/distort the semantics of those comments. Some of these comments are even improving through this exercise.)
clang/include/clang/AST/Redeclarable.h | ||
---|---|---|
261–262 | s/on on/on an/ | |
clang/include/clang/Sema/Lookup.h | ||
708–709 | The new version has the problem that check() is really vague, which again means that in order to explain what it does, you have to use the term "sanity-check". ;) Perhaps change the identifier to checkAssumptions() or even checkDebugAssumptions()? | |
clang/lib/Basic/DiagnosticIDs.cpp | ||
695 | Minimum, and also, I think it's a maximum? :P | |
clang/lib/Sema/SemaDeclCXX.cpp | ||
9175–9176 | "Don't check ... but check it anyway"? |
- Address latest round of comments
- Fix unrelated formatting to stop distracting clang-format linter complaints
clang/lib/Sema/SemaDeclCXX.cpp | ||
---|---|---|
9175–9176 | Right, that didn't make sense :). I noticed that there were warnings for this case in SemaDecl.cpp AFAIU so edited the comment to state that. Should be better now? |
clang/lib/Sema/SemaDeclCXX.cpp | ||
---|---|---|
9175–9176 | Well, this version of the comment now gives the impression that it must be clear to someone. ;) I don't understand its implications, but I also don't know the code. Specifically, I don't know what "This" refers to (grammatically) — "Anonymous union types aren't conforming, but we support them anyway, and this is the right thing to do with them"? "Our behavior in this case isn't conforming, but it wouldn't make sense to do the conforming thing [wat]"? or something else? More fundamentally to my confusion (but not to that hypothetical other someone), I don't know what "the implicit member of the anonymous union type" actually means (in terms of the arcane details of C++), i.e., I personally don't know when this codepath is hit or what its effect is when it does get hit. | |
12260 | Perhaps just // where they may be used by two-phase lookup. (But this is now just a nit.) |
clang/lib/Sema/SemaChecking.cpp | ||
---|---|---|
5536 | Is this change really necessary? It is a confusing comment and probably not too helpful for the developers. |
If you aren't sure what a comment means, please feel free to CC Richard or me, and we might be able to help.
clang/include/clang/Analysis/CFG.h | ||
---|---|---|
520 | Proper translation here is probably "assertions". A "validity check" sounds like a semantic restriction on the source language. | |
clang/lib/Sema/SemaChecking.cpp | ||
5536 | Perhaps "but we do to help catch trivial type errors." | |
5578 | We enforce this for consistency with other atomics, which generally all require a trivially-copyable type because atomics just copy bits. | |
clang/lib/Sema/SemaDeclCXX.cpp | ||
9175–9176 | @rsmith probably has a better idea. I think it's saying that we shouldn't fall down into the generic path for the implicit field created for an anonymous union, presumably because we do special-case checks on the members of the anonymous union just above this point. I guess this isn't explicitly sanctioned by the standard but is the only sensible approach. | |
12260 | This comment seems to be missing something — I think the previous comment was implying that the language rule really needs tags to be hidden when instantiating a using declaration. I don't know why that would be true, though; @rsmith might. | |
clang/lib/Sema/SemaExprCXX.cpp | ||
1510–1513 | I would prefer something like:
|
@ZarkoCA If you are planning to do a lot of this, it might be good to write a script (or a clang-tidy check even) that we can add to CI, so these terms don't get re-introduced.
@rjmccall thanks for having a look, I will clarify the comments with your suggestions.
That's a good idea, I will look into this definitely.
@rjmccall These are the proposed changes which address some of your comments. I am planning on waiting for further clarification on some of the others.
clang/include/clang/Analysis/CFG.h | ||
---|---|---|
520 | Changed to: /// Number of different kinds, for assertions. We subtract 1 so that /// to keep receiving compiler warnings when we don't cover all enum values /// in a switch | |
clang/lib/Sema/SemaChecking.cpp | ||
5536 | Changed to: // GCC does not enforce these rules for GNU atomics, but we do to help catch // trivial type errors. | |
5578 | Changed to: // For GNU atomics, require a trivially-copyable type. This is not part of // the GNU atomics specification but we enforce it for consistency with // other atomics which generally all require a trivially-copyable type. This // is because atomics just copy bits. | |
clang/lib/Sema/SemaExprCXX.cpp | ||
1510–1513 | Changed to: // The standard doesn't explicitly forbid function types here, but that's an // obvious oversight, as there's no way to dynamically construct a function // in general. |
// Make sure we don't infinite-loop on an invalid redecl chain. This should never happen.