Page MenuHomePhabricator

Introduce noundef attribute at call sites for stricter poison analysis
Needs ReviewPublic

Authored by guiand on Jun 11 2020, 11:43 AM.

Details

Summary

This change adds a new IR noundef attribute, which denotes when a function call argument or return val may never contain uninitialized bits.

In MemorySanitizer, this attribute enables optimizations which decrease instrumented code size by up to 17% (measured with an instrumented build of clang) . I'll introduce the change allowing msan to take advantage of this information in a separate patch.

Tests as a separate patch:
https://reviews.llvm.org/D82317

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
guiand marked an inline comment as done.Jun 25 2020, 9:47 AM
guiand added inline comments.
clang/lib/CodeGen/CGCall.cpp
2148

This is a good point, and I think there was some nuance that I had previously included in determining this for records (since removed) that I didn't think to include in CGCall.cpp.

I think one particular check, llvm::DataLayout::typeSizeEqualsStoreSize, handles a lot of these cases:

  • long double
  • oddly-sized vectors
  • _ExtInt types

I don't know if the matrix type has internal padding (like structs) but if not it would cover that as well.
I believe that the struct with padding wouldn't be an issue as we're excluding record types here.
And I'd have to take a closer look at nullptr_t to see how it behaves here.

~~~

But if I'd like to build an exhaustive list of types I think will work correctly with a check like this:

  • scalars
  • floating point numbers
  • pointers

I think I'd need also need an inner typeSizeEqualsStoreSize check on the base type of vectors/complex numbers.

guiand updated this revision to Diff 273456.Jun 25 2020, 10:57 AM
guiand retitled this revision from Introduce frozen attribute at call sites for stricter poison analysis to Introduce noundef attribute at call sites for stricter poison analysis.
guiand edited the summary of this revision. (Show Details)

Renamed to noundef, added additional checks before determining attribute

guiand marked 3 inline comments as done.Jun 25 2020, 10:58 AM
guiand marked an inline comment as done.Jun 25 2020, 11:13 AM
guiand added inline comments.
clang/lib/CodeGen/CGCall.cpp
2148

nullptr_t (which has the same size and alignment as void* but is all padding)

From what I can gather looking at the IR of test/CodeGenCXX/nullptr.cpp, it looks like nullptr_t arguments and return values are represented as normal i8* types but given the special value null. From this it seems like we can mark them noundef without worry?

guiand marked 2 inline comments as not done.Jun 25 2020, 11:13 AM
guiand marked an inline comment as done.Jun 25 2020, 2:49 PM
guiand added inline comments.
clang/lib/CodeGen/CGCall.cpp
2148

Actually I've been thinking more about these cases, and they should only really be a problem if clang coerces them away from their native types to something larger.

For example, if clang emits i33 that's not a problem, as potential padding belonging to i33 can be always be determined by looking at the data layout.
But if it emits { i32, i32 } or i64 or something similar, it's introducing new hidden/erased padding and so we can't mark it noundef.
Same thing goes for long double: if it's emitted as x86_fp80 that's no problem.

There's been some confusion wrt this in the LangRef patch as well. I think what we really want is an attribute that indicates the presence of undef bits *not inherent to the IR type*. So (for the sake of argument, we're not handling structs right now) struct { char; short } emitted as { i8, i16 } noundef is fine, but emitting it as i64 is a problem.

guiand updated this revision to Diff 273558.Jun 25 2020, 5:03 PM

Made DetermineNoUndef more robust

nikic added inline comments.Jun 27 2020, 1:42 AM
llvm/lib/Transforms/Utils/CodeExtractor.cpp
932 ↗(On Diff #273558)

Not familiar with this code, but based on the placement of other similar attributes like nonnull, this should probably be in the first list.

jdoerfert added inline comments.Jun 29 2020, 8:27 AM
llvm/lib/Transforms/Utils/CodeExtractor.cpp
932 ↗(On Diff #273558)

Technically, it should not matter because this is not a function attribute. That said, the first list makes logically more sense. I would even prefer three lists, OK to propagate, not OK, and assertion.

jdoerfert added inline comments.Jun 29 2020, 9:15 AM
clang/lib/CodeGen/CGCall.cpp
2148

struct { char; short } emitted as { i8, i16 } noundef is fine, but emitting it as i64 is a problem.

FWIW, I agree with that.

I've run across an issue when bootstrapping clang, that manifests itself in the following code when compiled with optimizations:

#include <math.h>

float my_exp2(float a) {
    return pow(2.0, a);
}

With this code, the call to pow is lifted to a exp2, and then the resulting call causes llvm::verifyFunction to fail. The error is that attributes are present past the end of the argument list.
I'm thinking that whatever code does the lifting here fails to remove the attribute list corresponding to the 2.0 parameter here.

Could anyone point me to where I might need to make a change for that?

Could anyone point me to where I might need to make a change for that?

LibCallSimplifier::optimizePow

guiand updated this revision to Diff 274895.Jul 1 2020, 12:51 PM

Addressed comments, added test for indirect calls

guiand updated this revision to Diff 276487.Jul 8 2020, 10:32 AM

Per @nikic's suggestion, I isolated the LLVM side of the changes to a separate revision D83412, which should be good to go.

Is anything still pending here (besides the tests, of course)?

jdoerfert resigned from this revision.Jul 17 2020, 7:23 PM

My concerns have been addressed. I am not the right one to look at the clang changes but what we merged into LLVM was really useful, thanks again for working on this!

This revision is now accepted and ready to land.Jul 17 2020, 7:23 PM
guiand closed this revision.Jul 17 2020, 8:13 PM
This comment was removed by guiand.
guiand reopened this revision.Jul 17 2020, 8:14 PM

Sorry, closed this mistakenly!

This revision is now accepted and ready to land.Jul 17 2020, 8:14 PM
rsmith requested changes to this revision.Jul 21 2020, 2:26 PM

I'd like to see some testcases for the C++ side of this. The following things all seem like they might be interesting: passing and returning classes and unions, especially ones that aren't trivially-copyable, nullptr_t, pointers to members, this parameters, VTTs, the "complete object" flag for MS ABI constructors, the this return from constructors in the ARM ABI, parameters and return values of virtual adjustment thunks.

clang/lib/CodeGen/CGCall.cpp
1911

This includes nullptr_t (which is never noundef) and member pointers (which are sometimes noundef, and you'll need to ask the CXXABI implementation if you don't want to conservatively return false).

2150

This only applies in C++. In C, it's valid for a function to omit its return statement if the result value is unused, and in that case the returned value will legitimately be undef.

Even in C++, we're conservative about enforcing the constraint that functions return a proper value and control flow doesn't just fall off the bottom of the function. See the full set of checks here: https://github.com/llvm/llvm-project/blob/master/clang/lib/CodeGen/CodeGenFunction.cpp#L1322

This revision now requires changes to proceed.Jul 21 2020, 2:26 PM
rsmith added inline comments.Jul 21 2020, 2:27 PM
clang/lib/CodeGen/CGCall.cpp
2150

It would seem sensible to suppress this for extern "C" functions too, on the basis that the definition of such functions is probably in C code, and thus they might return undef.

guiand updated this revision to Diff 279949.Jul 22 2020, 3:00 PM

Adds additional constraints on noundef: Not a nullptr_t, not a member pointer, and not coerced to a type of larger size. Disabled emitting in return position for non-C++ languages (or inside extern "C").

@rsmith, I just didn't address your comment about being conservative with black-holeing C++ functions returning undef. At least for msan purposes this shouldn't be a *too* much of a problem, but of course this attribute is for general use by LLVM passes. I'm just wondering what kind of tradeoff we want to make here, I suppose.

guiand edited the summary of this revision. (Show Details)Jul 22 2020, 3:08 PM

Just saw your comment about tests as well. The idea was to have all tests ported over as part of a separate commit (I linked it in the main patch description) and then only to push either commit once both are ready to land.

To make it easier to be sure this specific feature is working correctly, I can port over some of those tests to this patch for review purposes.

guiand updated this revision to Diff 280622.Jul 24 2020, 5:03 PM

Added an across-the-board test with several different interesting cases. @rsmith, I'm not sure how to test things like VTables/VTTs, since afaik the way they would be exposed to function attributes would be if a struct is being passed by value. But in that case, we currently never emit noundef.

guiand updated this revision to Diff 281311.Jul 28 2020, 11:46 AM

Incorporate C++'s more conservative checks for omitted return values

guiand updated this revision to Diff 281323.Jul 28 2020, 12:08 PM
guiand edited the summary of this revision. (Show Details)

Fixes regression; allows emitting noundef for non-FunctionDecls as well.

guiand updated this revision to Diff 281345.Jul 28 2020, 1:23 PM

Fix typo in MayDropFunctionReturn

rsmith accepted this revision.Jul 28 2020, 1:45 PM

Looks good with suggestions applied, and with the portability problems in the test fixed. (Maybe just add a -triple? Though it would be good to also test inalloca and ARM constructor this returns and so on.)

clang/include/clang/Driver/Options.td
3980

(since this doesn't affect return values)

clang/lib/CodeGen/CGCall.cpp
2150

OpenCLCPlusPlus should only ever be true when CPlusPlus is also true, so the second half of this || should be unnecessary.

2154–2156

TargetDecl (the callee of a function call) should never be a variable. You shouldn't need this check.

clang/test/CodeGen/attr-noundef.cpp
23

I think this test will fail on 32-bit Windows because e will be passed inalloca.

79

I believe this test will fail on ARM, due to constructors implicitly returning this. Consider either specifying a target for this test or weakening the void to {{.*}} here.

80

I believe this test will fail on Windows, because getData and Object will appear in the opposite order in the mangling. Consider either specifying a target or matching these names either way around.

This revision is now accepted and ready to land.Jul 28 2020, 1:45 PM
guiand updated this revision to Diff 281676.Jul 29 2020, 11:13 AM

Addressed comments

guiand updated this revision to Diff 281679.Jul 29 2020, 11:16 AM

Updated comment on disable-noundef-args option

guiand marked 5 inline comments as done.Jul 29 2020, 11:18 AM
guiand added inline comments.
clang/lib/CodeGen/CGCall.cpp
2154–2156

I tried replacing the body of this if statement with an assertion to make sure, and the assertion fires.

guiand updated this revision to Diff 284907.Aug 11 2020, 3:06 PM

To try to alleviate the tests issue, @eugenis and I discussed that it might be best to take it slow. So now this patch will mask off emitting the attribute on clang tests by default.

guiand requested review of this revision.Aug 11 2020, 3:06 PM

I think I'd like someone to take a look at the llvm-lit changes to see if this makes sense as an approach

guiand updated this revision to Diff 284931.Aug 11 2020, 4:20 PM

Made the compiler flag non-public