This is an archive of the discontinued LLVM Phabricator instance.

Omit nullptr check for sufficiently simple delete-expressions
Needs Review · Public

Authored by ahh on Feb 17 2018, 12:21 AM.

Details

Reviewers
rsmith
rjmccall
Summary

[expr.delete] is crystal clear that it's OK to invoke a
deallocation function on a delete-expression whose operand is null:

"If the value of the operand of the delete-expression is a null
pointer value, it is unspecified whether a deallocation function will
be called as described above."

Even so, we currently check for null unconditionally. This is
wasteful for anything with a trivial destructor: deleting null is
rare enough that the call to ::operator delete the branch saves isn't
worth it, and in the common case the branch is useless (and a waste of
code size).

Instead, emit the branch only when we actually need to look
through the pointer for a vtable, size cookie, or non-trivial
destructor. (In principle a non-trivial destructor with no side
effects that never touches the object would also be safe to run
unconditionally, but I don't know how to test for one of those, and
who in the world would write one?)
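
To make the intended lowering concrete, here is a minimal sketch; the commented-out forms are approximate pseudo-lowering, not actual emitted IR:

    struct Trivial { int x; };             // trivial destructor, no vtable
    struct NonTrivial { ~NonTrivial(); };  // non-trivial destructor

    void f(Trivial *t, NonTrivial *n) {
      // Old lowering (roughly): if (t != null) ::operator delete(t);
      // New lowering: ::operator delete(t); nothing needs to read
      // through 't', so calling the deallocation function on null is fine.
      delete t;

      // Still gets the branch: the destructor must not run on a null object.
      //   if (n != null) { n->~NonTrivial(); ::operator delete(n); }
      delete n;
    }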

On an important and very large (~500 MiB) Google search binary, this
saves approximately 32 KiB of text. Okay, it's not impressive in a
relative sense, but it's still the right thing to do.

A note on optimization: we still successfully elide delete-expressions of
(literal) nullptr. Where before they were stuck behind a never-taken branch,
now they reduce (effectively) to calls to __builtin_operator_delete(nullptr),
which EarlyCSE is capable of optimizing out. So this has no cost in the
already well-optimized case.
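
For example (illustrative), a delete of a provably null pointer:

    void g() {
      int *p = nullptr;
      delete p;  // reduces to a deallocation call on null, which the
                 // optimizer knows has no effect and removes entirely
    }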

Diff Detail

Event Timeline

ahh created this revision.Feb 17 2018, 12:21 AM
ahh added a comment.Feb 17 2018, 12:26 AM

On my workstation's checkout of head, one test fails (Clang :: Driver/response-file.c) both with and without this change; everything else appears to pass.

I believe that between the tests I add to delete.cpp and the ones that are already there (and destroying-delete.cpp) we cover every case that has to get a nullptr check, and pretty much every one that should *not*.
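
For reference, a sketch of the kind of FileCheck codegen test involved; the RUN and CHECK lines here are illustrative, not copied from the patch:

    // RUN: %clang_cc1 -triple x86_64-linux-gnu -emit-llvm %s -o - | FileCheck %s
    struct Trivial { int x; };
    // CHECK-LABEL: define {{.*}}@_Z11del_trivialP7Trivial(
    // CHECK-NOT: icmp
    // CHECK: call void @_ZdlPv(
    void del_trivial(Trivial *p) { delete p; }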

Name of the helper function is, uh, good enough for me, but no objections to changing it.

kimgr added a subscriber: kimgr.Feb 17 2018, 1:40 AM

Peanut gallery observation: there was a discussion on the Boost list years and years ago where someone made the case that 'if (x != nullptr) delete x;' was measurably faster than just calling 'delete x;'. I can't find it now, but I think it might have been in the context of their checked_delete library. Anyway, the reasoning was that with an external null check you'd pay for one comparison, but without it you'd always pay for a jump plus a comparison. I suppose that only holds true for null pointers; for non-null pointers the extra check is just waste.

It looks to me like the compiler inserts an external null check, and you're now removing it in select cases; did I understand that right? I wonder if this could have negative effects for frequent deletion of null pointers (e.g. a sometimes-allocated member of a heavily used value type).

That said, I'm not sure how valid the observation back then still is.

> I wonder if this could have negative effects for frequent deletion of null pointers (e.g. a sometimes-allocated member of a heavily used value type).

For that to be better, I think we'd need one of two things to happen:

  1. The compiler can statically detect that the pointer is null, and remove the call to operator delete and potentially other code too. (This happens, e.g., when inlining vector::push_back on an empty vector.)
  2. The condition cannot be determined statically, but dynamically it turns out that the pointer is very frequently null, so that the cost of the extra checks in the non-null case is lower than the cost of the function calls in the null case.

For case 1, the optimizer already knows that it can remove calls to usual operator delete functions on a null pointer, so that optimization should not be inhibited by this change.

For case 2, it seems to me that our default assumption should probably be that most deleted pointers are not null. But I don't have measurements to back that up. If the user knows that their pointers are usually null, they can express that knowledge with an if, but if we always generate the branch on null here, then there would be no easy way for the programmer to express their intent that the pointer is usually not null.
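
For instance (names here are illustrative), a programmer who knows a rarely-allocated member is usually null can keep the branch themselves:

    struct Metadata { int payload; };  // trivially destructible

    struct Node {
      Metadata *extra = nullptr;       // allocated only in rare cases
      ~Node() {
        if (extra)   // explicit check: skips the call when usually null
          delete extra;
      }
    };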

lib/CodeGen/CGExprCXX.cpp
1977–1978

Reindent.

LGTM, but I'd also like @rjmccall's opinion.

ahh updated this revision to Diff 134846.Feb 18 2018, 1:13 PM

Fix indentation

ahh marked an inline comment as done.Feb 18 2018, 1:14 PM

Discourse nitpick: I would encourage you to just use the ordinary phrase "null pointer", or just "null", when referring to a pointer value that happens to be null, and to reserve "nullptr" for *statically* null pointers, especially the nullptr expression.

If the pointer is not null, the runtime overhead of the null check is pretty negligible next to the cost of actually doing the deallocation. If the pointer is null, the runtime overhead of making at least one unnecessary call (probably two, if 'operator delete' doesn't do its own null check before calling 'free', and probably one that crosses image boundaries) is not negligible at all. So the relative impact on code that does end up destroying a trivial value is outsized.

On the other hand, if the programmer adds an explicit null-check, it's unlikely to be optimized away; that means that if we did this automatically, there would still be an avenue for them to get the null check back.

The code-size argument against doing the null check seems strong, however. Have you considered just doing this in the code-size-sensitive modes, in particular -Os/-Oz (for obvious reasons) and -O0 (because less code size == faster compiles, especially when it involves control flow)?

ahh added a comment.Feb 23 2018, 11:50 PM

> If the pointer is not null, the runtime overhead of the null check is pretty negligible next to the cost of actually doing the deallocation. If the pointer is null, the runtime overhead of making at least one unnecessary call (probably two, if 'operator delete' doesn't do its own null check before calling 'free', and probably one that crosses image boundaries) is not negligible at all. So the relative impact on code that does end up destroying a trivial value is outsized.

In a reply of mine that I think got eaten by list moderation, I looked into this and benchmarked the cost of ::operator delete; with our tcmalloc, the cost of deleting null is about 8 cycles (compared to an empty loop). (I don't really know how to benchmark the version with an if around it, but if we assume that's free, 8 cycles is still very cheap.)
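
The thread doesn't include the benchmark itself; a rough sketch of the kind of loop described might look like this. The timing harness and iteration count are assumptions, and you may need -fno-builtin so the optimizer doesn't remove the calls outright:

    #include <chrono>
    #include <cstdio>
    #include <new>

    int main() {
      constexpr long kIters = 100000000;
      auto start = std::chrono::steady_clock::now();
      for (long i = 0; i < kIters; ++i)
        ::operator delete(static_cast<void *>(nullptr));  // unconditional null delete
      auto stop = std::chrono::steady_clock::now();
      double ns = std::chrono::duration<double, std::nano>(stop - start).count();
      std::printf("%.2f ns per null delete\n", ns / kIters);
      return 0;
    }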

I suppose this might go up somewhat in an environment where we have to make some sort of PLT call, or even two. My Google-centric response is "don't do that": linking directly against any modern malloc should avoid any jump in ::operator delete, and our environment makes direct calls quite fast; I'm not sure how generalizable that is. (The linking is, I think, universally good advice; I'm not sure who runs in an environment that cannot make efficient far calls.) But the point is: while your statement is true, the penalty for getting this wrong seems very small, and as you say any programmer can put an if around a "hot" null delete, no?

This is one of the few aspects of malloc calls that I don't have near-infinite telemetry for (our sampling architecture doesn't easily collect it). So I cannot give you a hard number for the fraction of deletes that are of null pointers, but I am convinced it is very small. Would more (Google-internal, obviously) data on this make a decision easier?

I could see why maybe this could be gated on -Os, but I didn't do this for two reasons:

  • I am new at Clang development and wasn't sure how to put that sort of a check in :) Though I can learn if this is a hard requirement.
  • From our perspective, I think Google would want this in non-size-optimized builds (-O2 or whatever). We delete null infrequently enough that I'd expect this to be a pure cycle win (if a very small one), and even though (because?) we don't optimize for size, we have a number of very large binaries, and reducing icache pressure can help a lot.

I'm unsure exactly how to make progress here, since for one thing I'm unsure how strongly you feel about the potential cost/benefits. Guidance would be greatly appreciated!

That is an extremely Google-specific analysis, actually; AFAIK almost nobody else uses static linking all the way down to and including the C and C++ standard libraries unless they're literally building an executable for a fully-static environment, like the kernel. The C and C++ language standards state that operator delete and free are independently overridable by just defining those functions outside the stdlib, so they generally cannot be resolved as direct calls without the sort of whole-program analysis that the linker can only do when linking the final executable.

I think a more reasonable benchmark would be to build a standard Linux executable that dynamically links libc and lib{std,}c++, or perhaps something with the NDK or Darwin.

I'm quite open to the idea that the right thing to do is just to do this in all modes, but I do think we should understand the cost a little better. (Xcode defaults release builds to -Os, so in practice my proposal of "-Os or -O0" would enable this by default for almost all builds for us at Apple.)

You can check for -Os by just checking getCodeGenOpts().OptimizeSize.
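
A self-contained model of that gate (illustrative only; real code would consult clang::CodeGenOptions from lib/CodeGen, and the helper name is hypothetical):

    // Mirrors the relevant CodeGenOptions fields.
    struct OptsModel {
      unsigned OptimizeSize = 0;       // 1 for -Os, 2 for -Oz
      unsigned OptimizationLevel = 2;  // 0 for -O0
    };

    // Returns true if codegen should skip the null check, per the
    // "-Os or -O0" proposal above.
    bool shouldOmitDeleteNullCheck(const OptsModel &Opts,
                                   bool NeedsDtorVTableOrCookie) {
      if (NeedsDtorVTableOrCookie)
        return false;  // must not touch a null object
      return Opts.OptimizeSize > 0 || Opts.OptimizationLevel == 0;
    }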

It should be quite easy to collect null-deletion counts by (1) enabling your patch and (2) writing an operator delete that counts nulls before calling free and reports that count in a global destructor. Then you just need to pick a C++-heavy program to count them in. :) Clang compiling 403.gcc isn't an unreasonable choice, although LLVM's use of allocation pools does mean that we're likely to have fewer delete calls than you might think.
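
A sketch of such a counting operator delete; this assumes the default operator new allocates with malloc, as the description above implies. The sized overload forwards so it is counted too, and array deletes would need operator delete[] replacements as well:

    #include <atomic>
    #include <cstdio>
    #include <cstdlib>
    #include <new>

    static std::atomic<unsigned long long> NullDeletes{0}, TotalDeletes{0};

    void operator delete(void *p) noexcept {
      TotalDeletes.fetch_add(1, std::memory_order_relaxed);
      if (!p)
        NullDeletes.fetch_add(1, std::memory_order_relaxed);
      std::free(p);
    }

    void operator delete(void *p, std::size_t) noexcept { operator delete(p); }

    // Reports the tally from a global destructor at exit.
    static struct Reporter {
      ~Reporter() {
        std::fprintf(stderr, "null deletes: %llu of %llu\n",
                     NullDeletes.load(), TotalDeletes.load());
      }
    } TheReporter;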