Page MenuHomePhabricator

Teach clang to tolerate the 'p = nullptr + n' idiom used by glibc
ClosedPublic

Authored by andrew.w.kaylor on Aug 22 2017, 5:33 PM.

Details

Summary

This patch adds a hack to clang's emitPointerArithmetic() function to get it to emit an inttoptr instruction rather than a GEP with a null base pointer when the 'p = nullptr + n' idiom is used to convert a pointer-sized integer to a pointer. This idiom is used by some versions of glibc and gcc.

This was previously discussed here: http://lists.llvm.org/pipermail/llvm-dev/2017-July/115053.html

Diff Detail

Repository
rL LLVM

Event Timeline

I'm not sure why the svn attributes got attached to the file I added. I'll remove them before committing.

I'd expect this to be more limited.

Something like, "if we have a BinaryOperator of + between a CStyleCastExpr of NullToPointer and 'X', emit inttoptr(X)"

I'd expect this to be more limited.

Something like, "if we have a BinaryOperator of + between a CStyleCastExpr of NullToPointer and 'X', emit inttoptr(X)"

Although maybe that is too strict... I guess stuff gets hairy quickly if you want to be able to also catch (char*)NULL + x....

My intention here was to make this as strict/limited as possible while still handling the cases of interest. There are two different implementations I want to handle. You can see the first variation in the __BPTR_ALIGN definition being added here:

https://github.com/lattera/glibc/commit/2fd4de4b15a66f821057af90714145d2c034a609#diff-cd0c2b2693d632ad8843ae58a6ea7ffaR125

You can see the other in the __INT_TO_PTR definition that was being deleted in the same change set here:

https://github.com/lattera/glibc/commit/2fd4de4b15a66f821057af90714145d2c034a609#diff-cd0c2b2693d632ad8843ae58a6ea7ffaL122

This second case shows up in some benchmark code, so it's important even though glibc phased it out.

I'm really not sure what this looks like in the front end, but it seems like it could be relatively open-ended as to where the value came from.

Basically, my intention is to eliminate anything I know isn't what I'm looking for and then handle whatever remains being added to a null pointer. Given that adding to a null pointer is undefined behavior, whatever we do should be acceptable in all cases. What I found in practice was that even with the existing handling of this it took an awful lot of optimizations before the null-based GEP and the load associated with it got close enough together for us to recognize that we could optimize it away, so I don't think we'd be losing much by tolerating additional cases in the way this patch proposes.

I'd really like to see this done in some way such that we can issue a warning along with generating well-defined code. The warning can specifically state that the code is using an extension, etc.

I'd really like to see this done in some way such that we can issue a warning along with generating well-defined code. The warning can specifically state that the code is using an extension, etc.

Should it warn on any nullptr arithmetic, or just the cases that are being handled by this change?

It would be nice to warn on any nullptr arithmetic, yes. (We probably want the wording of the warning to be a bit different if we're activating this workaround.)

It would be nice to warn on any nullptr arithmetic, yes. (We probably want the wording of the warning to be a bit different if we're activating this workaround.)

+1 (I'll likely want the ability to turn off the warning for one without turning off the warning for the other)

Added warnings for null pointer arithmetic.

I didn't think of this earlier, but strictly speaking, I think "(char*)nullptr+0" isn't undefined in C++? But probably worth emitting the warning anyway; anyone writing out arithmetic on null is probably doing something suspicious, even if it isn't technically undefined.

include/clang/Basic/DiagnosticSemaKinds.td
6032 ↗(On Diff #113134)

The keyword "inttoptr" is part of LLVM, not something we expect users to understand.

lib/Sema/SemaExpr.cpp
8808 ↗(On Diff #113134)

Please make sure you use exactly the same check in Sema and CodeGen (probably easiest to stick a helper into lib/AST/).

rsmith edited edge metadata.Aug 29 2017, 1:48 PM

I didn't think of this earlier, but strictly speaking, I think "(char*)nullptr+0" isn't undefined in C++?

Yes, that's correct. (C++'s model is basically equivalent to having an object of size zero at the null address, so things like (char*)nullptr - (char*)nullptr and (char*)nullptr + 0 are valid.)

But probably worth emitting the warning anyway; anyone writing out arithmetic on null is probably doing something suspicious, even if it isn't technically undefined.

I agree. We should probably suppress the warning if the non-pointer operand is known to be zero in C++, and weaken the wording slightly "[..] has undefined behavior if offset is nonzero" otherwise, though.

include/clang/Basic/DiagnosticSemaKinds.td
6032 ↗(On Diff #113134)

How about something like "arithmetic on null pointer treated as cast from integer to pointer as a GNU extension"?

lib/Sema/SemaExpr.cpp
8799 ↗(On Diff #113134)

Maybe // Adding to a null pointer results in undefined behavior. to explain why we warn (a reader of the code can already see that we do warn in this case).

8805 ↗(On Diff #113134)

It seems strange to me to disable this when the RHS is an ICE. If we're going to support this as an extension, we should make the rules for it as simple as we reasonably can; this ICE check seems like an unnecessary complication.

include/clang/Basic/DiagnosticSemaKinds.td
6032 ↗(On Diff #113134)

I like that. Thanks for the suggestion.

lib/Sema/SemaExpr.cpp
8805 ↗(On Diff #113134)

You're probably right. My thinking was that I wanted to keep the extension idiom as narrow as was reasonable, so these were cases that I felt like I could rule out, but the argument for simplicity is compelling since the alternative isn't doing anything particularly desirable in most cases.

lib/Sema/SemaExpr.cpp
8808 ↗(On Diff #113134)

That's a good idea, but I'm not really familiar enough with the structure of clang to be sure I'm not doing it in a ham-fisted way. Can you clarify? Are you suggesting adding something like ASTContext::isPointeeTypeCharSize() and ASTContext::isIntegerTypePointerSize()? Or maybe a single specialized function somewhere that does both checks like ASTContext::areOperandNullPointerArithmeticCompatible()?

rsmith added inline comments.Aug 29 2017, 4:57 PM
lib/Sema/SemaExpr.cpp
8808 ↗(On Diff #113134)

I would suggest something even more specific, such as isNullPointerPlusIntegerExtension

efriedma added inline comments.Aug 29 2017, 5:04 PM
lib/Sema/SemaExpr.cpp
8808 ↗(On Diff #113134)

I'm most concerned about the difference between "isa<llvm::ConstantPointerNull>(pointer)" and "PExp->IgnoreParenCasts()->isNullPointerConstant()"... they're different in important ways.

I was thinking something like "BinaryOperator::isNullPointerArithmeticExtension()"

lib/Sema/SemaExpr.cpp
8808 ↗(On Diff #113134)

At this point in Sema the BinaryOperator for the addition hasn't been created yet, right? So a static function that takes the opcode and operands?

efriedma added inline comments.Aug 29 2017, 5:29 PM
lib/Sema/SemaExpr.cpp
8808 ↗(On Diff #113134)

I wasn't really thinking about that... but yes, something like that.

Refactored the GNU idiom check to be shared by Sema and CodeGen.
Refined the checks so that nullptr+0 doesn't warn in C++.
Added the zero offset qualification in the warning produced with C++.

rsmith added inline comments.Aug 30 2017, 2:53 PM
lib/AST/Expr.cpp
1857 ↗(On Diff #113311)

If we get here with a value-dependent expression, we should treat it as non-null (do not warn on (char*)PtrTemplateParameter + N).

lib/Sema/SemaExpr.cpp
8804 ↗(On Diff #113311)

Likewise here, we should treat a value-dependent expression as non-null.

8877 ↗(On Diff #113311)

Subtracting zero from a null pointer should not warn in C++.

(Conversely, subtracting a non-null pointer from a pointer should warn in C++, and subtracting any pointer from a null pointer should warn in C.)

8881 ↗(On Diff #113311)

Likewise.

rjmccall added inline comments.Aug 30 2017, 3:17 PM
lib/Sema/SemaExpr.cpp
8808 ↗(On Diff #113134)

I've long thought that it would make sense to store a semantic operation kind in BinaryOperator and UnaryOperator instead of treating the operator alone as a sufficient discriminator. Of course the operator is *usually* sufficient, but there are cases where that's not true — for example, we could use that operation kind to distinguish integer and pointer arithmetic, and maybe even to identify which side of a + is the pointer.

If we had that, we could very easily flag a null+int operation as a completely different semantic operation, which it basically is.

lib/AST/Expr.cpp
1857 ↗(On Diff #113311)

OK. I wasn't sure about that.

So how do I test that? Is this right?

template<typename T, T *P>
T* f(intptr_t offset) {
  return P + offset;
}

char *test(intptr_t offset) {
  return f<char, nullptr>(offset);
}
lib/Sema/SemaExpr.cpp
8877 ↗(On Diff #113311)

Is it OK if I just add a FIXME in the section below that handles pointer-pointer?

rsmith added inline comments.Aug 30 2017, 4:17 PM
lib/AST/Expr.cpp
1857 ↗(On Diff #113311)

You can simplify that slightly by using template<char *P>, but yes.

lib/Sema/SemaExpr.cpp
8807 ↗(On Diff #113311)

This talk about value-dependence reminds me: it is an error to call EvaluateAsInt on a value-dependent expression (the expression evaluator will probably assert). If IExp->isValueDependent(), you should skip the diagnostic, since it might instantiate to zero.

8877 ↗(On Diff #113311)

Yes, that's fine. (But the nullptr - 0 case should be handled in this patch.)

Fixed value-dependent argument in isNullPointerConstant checks.
Added check for C++ zero offset in subtraction.
Added value-dependent test cases.

Does anything else need to be done for this to be ready to land?

efriedma added inline comments.Sep 13 2017, 1:08 PM
include/clang/Basic/DiagnosticSemaKinds.td
6031 ↗(On Diff #113343)

"extension" isn't really right here; this shouldn't be an error in -pedantic-errors mode. Probably best to just stick this into the NullPointerArithmetic, like the other new warning.

include/clang/Basic/DiagnosticSemaKinds.td
6031 ↗(On Diff #113343)

So how should a word the warning? Just this:

"arithmetic on a null pointer treated as a cast from integer to pointer"?

efriedma added inline comments.Sep 13 2017, 2:45 PM
include/clang/Basic/DiagnosticSemaKinds.td
6031 ↗(On Diff #113343)

That wasn't what I meant; the current wording is fine. I meant this should be something like def warn_gnu_null_ptr_arith : Warning<.

include/clang/Basic/DiagnosticSemaKinds.td
6031 ↗(On Diff #113343)

OK. I think I understand the behavior you wanted. I just thought maybe the current wording might be technically incorrect. I wasn't sure how precisely defined we consider "extension" to be in this context.

efriedma added inline comments.Sep 13 2017, 3:01 PM
include/clang/Basic/DiagnosticSemaKinds.td
6031 ↗(On Diff #113343)

The part that makes this a little weird is that unlike most extensions, this code is already well-formed. It's an extension because we're guaranteeing runtime behavior for a construct which has undefined behavior at runtime according to the standard. (This is in contrast to "implementation-defined" behaviors, which are the gaps in the standard we're allowed to fill in as we see fit.)

Given that, I think calling it a "GNU extension" in the text is fine.

-Changed GNU idiom from extension to warning.
-Updated to ToT.

This revision is now accepted and ready to land.Sep 15 2017, 4:47 PM
This revision was automatically updated to reflect the committed changes.

This is breaking buildbots for 32-bit targets because the offset in 'nullptr + int8_t_N' is being implicitly cast to an int. That makes the sizeof(offset) == sizeof(ptr) check turn out differently than my tests were assuming.

I can get the buildbots green quickly by taking out parts of the tests, but I just talked to Erich Keane about this and we think the right way to fix this long term is to stop checking for sizeof(offset) == sizeof(ptr) in the code that identifies the idiom, since that check is of dubious value and would be difficult to always get to behave the way I intended.

I agree, it doesn't add much value.

Either way, though, please make sure you address the buildbot failures quickly.

For the record, Firefox was using this trick. This patch is breaking a ci build (clang trunk + warning as errors)
More information here: https://bugzilla.mozilla.org/show_bug.cgi?id=1402362