This is an archive of the discontinued LLVM Phabricator instance.

[SanCov] Properly set ABI parameter attributes
ClosedPublic

Authored by aeubanks on May 27 2021, 2:56 PM.

Details

Summary

Arguments need to have the proper ABI parameter attributes set.

Followup to D101806.

Diff Detail

Unit TestsFailed

Event Timeline

aeubanks created this revision.May 27 2021, 2:56 PM
aeubanks requested review of this revision.May 27 2021, 2:56 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 27 2021, 2:56 PM
rnk added inline comments.May 27 2021, 3:03 PM
llvm/lib/Transforms/Instrumentation/SanitizerCoverage.cpp
845

Can we do something like CB->setAttributes(CB->getDirectCallee()->getAttributes() that avoids duplicating the knowledge of the ABI attributes to the call site? We know at this point that the constructed call will be direct, nothing stops this code from looking at the callee.

aeubanks updated this revision to Diff 348398.May 27 2021, 3:07 PM

directly use callee attributes

rnk accepted this revision.May 27 2021, 3:11 PM

lgtm

llvm/lib/Transforms/Instrumentation/SanitizerCoverage.cpp
892

nit: on second thought, maybe it's simpler to use CallbackFunc rather than CB->getCalledFunction(). It seems silly to recompute that which is available in a local variable.

This revision is now accepted and ready to land.May 27 2021, 3:11 PM
aeubanks updated this revision to Diff 348400.May 27 2021, 3:15 PM

use FunctionCallee's function
this is actually a bit more verbose since we have to cast FunctionCallee's callee to a Function

rnk added a comment.May 27 2021, 3:21 PM

Ew. I guess I was wrong, the getCalledFunction version was better. Sorry about that. :)

aeubanks updated this revision to Diff 348401.May 27 2021, 3:24 PM

back to getCalledFunction()

This revision was landed with ongoing or failed builds.May 27 2021, 3:27 PM
This revision was automatically updated to reflect the committed changes.
nikic added a subscriber: nikic.May 30 2021, 1:18 AM
nikic added inline comments.
llvm/lib/Transforms/Instrumentation/SanitizerCoverage.cpp
845

Is there anything preventing us from doing this directly in IRBuilder? Is there some situation where we don't want to inherit attributes from the declaration?

aeubanks added inline comments.May 31 2021, 9:11 AM
llvm/lib/Transforms/Instrumentation/SanitizerCoverage.cpp
845

In the case of indirect calls, we don't have a callee to look at. It would be unintuitive to only sometimes copy attributes depending on whether the call is direct or not.
In these cases we know that we're creating a direct call.

We could create a separate IRBuilder::CreateDirectCall() which does that.

I ran the verifier extension (D103412) on SystemZ/SPEC17, and got several failures. I reduced one as a test case:

declare dso_local i8* @strchr(i8*, i32) local_unnamed_addr #1
declare dso_local i8* @memchr(i8*, i32 signext, i64)
declare void @llvm.assume(i1 noundef)

@0 = private unnamed_addr constant [21 x i8] c"$&*(){}[]'\22;\\|?<>~`\0A\00", align 2

define i1 @Perl_do_exec3(i8* %arg, i32 %arg1, i32 %arg2) {
bb:
  %i = call i8* @strchr(i8* getelementptr inbounds ([21 x i8], [21 x i8]* @0, i64 0, i64 0), i32 signext undef) #4
  %i3 = icmp ne i8* %i, null
  call void @llvm.assume(i1 %i3)
  br label %bb5

bb5:                                              ; preds = %bb
  ret i1 undef
}
bin/opt -march=z14 -S -o - -enable-new-pm=0 -instcombine

gives

argument to signext parameter requires signext attribute
  %memchr = call i8* @memchr(i8* noundef nonnull dereferenceable(21) getelementptr inbounds ([21 x i8], [21 x i8]* @0, i64 0, i64 0), i32 undef, i64 21)
in function Perl_do_exec3
LLVM ERROR: Broken function found, compilation aborted!

I tried this patch, which helped me with the test case:

--- a/llvm/lib/IR/Instructions.cpp
+++ b/llvm/lib/IR/Instructions.cpp
@@ -501,6 +501,9 @@ void CallInst::init(FunctionType *FTy, Value *Func, ArrayRef<Value *> Args,
   (void)It;
   assert(It + 1 == op_end() && "Should add up!");
 
+ if (Function *Fun = dyn_cast<Function>(Func))
+   setAttributes(Fun->getAttributes());
+
   setName(NameStr);
 }

I saw the question as to why not do this in IRBuilder with the response that "It would be unintuitive to only sometimes copy attributes depending on whether the call is direct or not.", but there didn't seem to be some consensus on this. I would be one to agree that it seems very desirable to have this handled generally and not just hope that everyone will always remember this. It seems to me it could be done in the CallInst/InvokeInst constructors even per above. If somebody has the intent of *not* adding the attributes to the CallInst (don't understand why), there could be an option to not do so - like an argument 'bool CopyAttributes = true'.

I am new to this discussion so I may be missing something... I guess I don't fully understand why the declaration is no longer used for attributes - can the same function be called with/without e.g. a ZExt attribute on a parameter? I thought the callee expects this always per the ABI and that it would be an error to omit it..?

rnk added a comment.Jun 3 2021, 4:40 PM

I don't like the idea of having the IRBuilder look at the call target and implicitly setting attributes, but I am open to revisiting the backend change. Here's a rationale for going back to the old behavior:

  • calling a function with the wrong function prototype is UB
  • calling a function with mismatched ABI attributes is very much like prototype mismatch, so it's UB
  • LLVM can generate whatever code it wants in cases of UB. So, if we have a direct callee, why not have the backend implicitly use the attributes from the callee?

I've forgotten why exactly we wanted to avoid looking at the callee's attributes.

I was looking back to D101713 where I found some crashing tests due to mismatched ABI attributes, specifically ones with type parameters like byval. The solution I had come up with was D101806, which was to ignore caller ABI attributes. We could also fix those crashes by also looking at the caller indirect ABI attribute type parameters.

I still think it's more consistent to only look at the callee parameter ABI attributes. It's weird that indirect and direct calls could be different. For example, ABI lowering could be different if an indirect call is promoted to a direct call with some ABI attributes not present on the call instruction. But it's less work to insert random calls to functions (especially compiler-rt functions or library calls) and not have to worry about ABI attributes. At this point I'm willing to give up on D101806 and just go down the easier route of looking at the caller for the type parameters to byval/etc (and revert the changes I've made similar to this).

rnk added a comment.Jun 4 2021, 12:39 PM

OK, right, the issue we were trying to solve is that LLVM was crashing when byval (+inalloca, etc) types mismatched somehow.

It occurs to me that, in a world of typeless pointers, it is possible that the function type used by the call may not match the prototype of the callee. Consider something like this:

define dso_local void @pass_int() #0 {
entry:
  call void bitcast (void (i32*)* @use_int to void (i32)*)(i32 0)
  ret void
}
declare dso_local void @use_int(i32* byval(i32)) #1

With opaque pointers, we lose the bitcast, so the call may appear to be direct, but the prototypes do not match. If the prototypes don't match, the attribute lists may not even correspond. In this case, the argument we are passing to a byval parameter isn't even a pointer. I don't know how the codegenerator would handle that.

We may need to teach CallBase::getCallee to do more than just dyn_cast, maybe it should check that the function prototypes match, and the call will be considered indirect otherwise. Is this a known issue, or something new?

If the prototypes do match, then I think it is OK to look at ABI attributes on the declaration of the callee. In the case of byval, LLVM would get the type from the declaration and use that. The caller would have only a typeless pointer argument anyway, which has no information about the underlying value to be passed.

The other option is that we compromise and allow codegen to look at the callee only for some kinds of ABI attributes.

jonpa added a comment.Jun 9 2021, 3:05 PM

calling a function with mismatched ABI attributes is very much like prototype mismatch, so it's UB

Sounds right to me.

The other option is that we compromise and allow codegen to look at the callee only for some kinds of ABI attributes.

...just go down the easier route of looking at the caller for the type parameters to byval/etc

The sext/zext attributes are the ones of immediate concern and must never be missing, at least on SystemZ.

I have proposed a patch at https://reviews.llvm.org/D103992, that follows the route of trying to make sure that all calls everywhere have the proper attributes set. That is needed if you would prefer to avoid the checking of the prototype during isel. (On the other hand, I am not really saying that I think that is better than the old behaviour...)