This is an archive of the discontinued LLVM Phabricator instance.

[objc_direct] fix codegen for mismatched Decl/Impl return types
ClosedPublic

Authored by MadCoder on Jan 22 2020, 8:53 AM.

Details

Summary

For non direct methods, the codegen uses the type of the Implementation.
Because Objective-C rules allow some differences between the Declaration
and Implementation return types, when the Implementation is in this
translation unit, the type of the Implementation should be preferred to
emit the Function over the Declaration.

Radar-Id: rdar://problem/58797748

Diff Detail

Event Timeline

MadCoder created this revision.Jan 22 2020, 8:53 AM

Why isn't a similar dance needed for non-direct methods?

MadCoder added a comment.EditedJan 22 2020, 5:51 PM

Why isn't a similar dance needed for non-direct methods?

because non direct methods do not need an llvm::Function to be synthesized at the call-site. direct methods do, and they form one with the type of the declaration they see. Then that same llvm::Function is used when you CodeGen the Implementation, so if there's a mismatch, sadness ensues because the LLVM IR verifier will notice the discrepancy between the declared return type of the function and the actual types coming out of the ret codepaths.

Regular obj-C methods use the _implementation_ types for the codegen (the declaration(s) aren't even consulted) and I want to stick at what obj-c does as much as I can.

(as a data point: If you use obj-C types with C functions, the type of the first declaration seen is used instead).

Why isn't a similar dance needed for non-direct methods?

because non direct methods do not need an llvm::Function to be synthesized at the call-site. direct methods do, and they form one with the type of the declaration they see. Then that same llvm::Function is used when you CodeGen the Implementation, so if there's a mismatch, sadness ensues because the LLVM IR verifier will notice the discrepancy between the declared return type of the function and the actual types coming out of the ret codepaths.

Regular obj-C methods use the _implementation_ types for the codegen (the declaration(s) aren't even consulted) and I want to stick at what obj-c does as much as I can.

(as a data point: If you use obj-C types with C functions, the type of the first declaration seen is used instead).

Okay, that makes sense to me.

Another solution would be to change IRGen for the implementation: if the declaration already exists (getFunction), do a bitcast + RAUW dance to fix it up (and update the DirectMethodDefinitions table). WDYT?

Why isn't a similar dance needed for non-direct methods?

because non direct methods do not need an llvm::Function to be synthesized at the call-site. direct methods do, and they form one with the type of the declaration they see. Then that same llvm::Function is used when you CodeGen the Implementation, so if there's a mismatch, sadness ensues because the LLVM IR verifier will notice the discrepancy between the declared return type of the function and the actual types coming out of the ret codepaths.

Regular obj-C methods use the _implementation_ types for the codegen (the declaration(s) aren't even consulted) and I want to stick at what obj-c does as much as I can.

(as a data point: If you use obj-C types with C functions, the type of the first declaration seen is used instead).

Okay, that makes sense to me.

Another solution would be to change IRGen for the implementation: if the declaration already exists (getFunction), do a bitcast + RAUW dance to fix it up (and update the DirectMethodDefinitions table). WDYT?

I didn't want to do that because that would mean that the type used for the implementation would depart from dynamic Objective-C methods, and it feels that it shouldn't. hence I took this option.

Why isn't a similar dance needed for non-direct methods?

because non direct methods do not need an llvm::Function to be synthesized at the call-site. direct methods do, and they form one with the type of the declaration they see. Then that same llvm::Function is used when you CodeGen the Implementation, so if there's a mismatch, sadness ensues because the LLVM IR verifier will notice the discrepancy between the declared return type of the function and the actual types coming out of the ret codepaths.

Regular obj-C methods use the _implementation_ types for the codegen (the declaration(s) aren't even consulted) and I want to stick at what obj-c does as much as I can.

(as a data point: If you use obj-C types with C functions, the type of the first declaration seen is used instead).

Okay, that makes sense to me.

Another solution would be to change IRGen for the implementation: if the declaration already exists (getFunction), do a bitcast + RAUW dance to fix it up (and update the DirectMethodDefinitions table). WDYT?

I didn't want to do that because that would mean that the type used for the implementation would depart from dynamic Objective-C methods, and it feels that it shouldn't. hence I took this option.

I think we're talking across each other. The idea is check the type when generating the implementation; if it's not correct, you fix it (need to update existing uses to bitcast). So the type used for the implementation would match dynamic methods.

Why isn't a similar dance needed for non-direct methods?

because non direct methods do not need an llvm::Function to be synthesized at the call-site. direct methods do, and they form one with the type of the declaration they see. Then that same llvm::Function is used when you CodeGen the Implementation, so if there's a mismatch, sadness ensues because the LLVM IR verifier will notice the discrepancy between the declared return type of the function and the actual types coming out of the ret codepaths.

Regular obj-C methods use the _implementation_ types for the codegen (the declaration(s) aren't even consulted) and I want to stick at what obj-c does as much as I can.

(as a data point: If you use obj-C types with C functions, the type of the first declaration seen is used instead).

Okay, that makes sense to me.

Another solution would be to change IRGen for the implementation: if the declaration already exists (getFunction), do a bitcast + RAUW dance to fix it up (and update the DirectMethodDefinitions table). WDYT?

I didn't want to do that because that would mean that the type used for the implementation would depart from dynamic Objective-C methods, and it feels that it shouldn't. hence I took this option.

I think we're talking across each other. The idea is check the type when generating the implementation; if it's not correct, you fix it (need to update existing uses to bitcast). So the type used for the implementation would match dynamic methods.

ah I see, I don't know how to do that, I couldn't figure out how to fix the function declaration, if you want to give it a stab I'd love it.

I think we're talking across each other. The idea is check the type when generating the implementation; if it's not correct, you fix it (need to update existing uses to bitcast). So the type used for the implementation would match dynamic methods.

ah I see, I don't know how to do that, I couldn't figure out how to fix the function declaration, if you want to give it a stab I'd love it.

I missed that question until now. This uses llvm::Value::replaceAllUsesWith, if you git grep through clang you'll find examples.

Here's one from CodeGenFunction::AddInitializerToStaticVarDecl:

// The initializer may differ in type from the global. Rewrite
// the global to match the initializer.  (We have to do this
// because some types, like unions, can't be completely represented
// in the LLVM type system.)
if (GV->getType()->getElementType() != Init->getType()) {
  llvm::GlobalVariable *OldGV = GV;

  GV = new llvm::GlobalVariable(CGM.getModule(), Init->getType(),
                                OldGV->isConstant(),
                                OldGV->getLinkage(), Init, "",
                                /*InsertBefore*/ OldGV,
                                OldGV->getThreadLocalMode(),
                         CGM.getContext().getTargetAddressSpace(D.getType()));
  GV->setVisibility(OldGV->getVisibility());
  GV->setDSOLocal(OldGV->isDSOLocal());
  GV->setComdat(OldGV->getComdat());

  // Steal the name of the old global
  GV->takeName(OldGV);

  // Replace all uses of the old global with the new global
  llvm::Constant *NewPtrForOldDecl =
  llvm::ConstantExpr::getBitCast(GV, OldGV->getType());
  OldGV->replaceAllUsesWith(NewPtrForOldDecl);

  // Erase the old global, since it is no longer used.
  OldGV->eraseFromParent();
}

This example is analogous to your case. The pattern is:

  1. Create the new thing (with no name, or with a temporary/bad name).
  2. Steal the name of the old thing.
  3. Bitcast from the new thing to the old type.
  4. RAUW to replace all uses of the old thing with the bitcast-of-new-thing from (3).
  5. Delete the old thing.

The RAUW isn't going to fix up the DenseMap you have on the side though, so you'll need to handle that explicitly; likely during step (4) makes sense.

MadCoder updated this revision to Diff 241599.Jan 30 2020, 3:11 PM

@dexonsmith here, I still hook the same method but do it in a more LLVM-approved way ;)

It works with minimap perf impect because the only case we call GenerateDirectMethod with an Implementation is if we're about to codegen the body, so the case when you build against a header without seing the @implementation will not be affected perf-wise.

MadCoder updated this revision to Diff 241601.Jan 30 2020, 3:12 PM

damn you tabs!

dexonsmith accepted this revision.Jan 30 2020, 5:13 PM

LGTM, with one style nitpick.

clang/lib/CodeGen/CGObjCMac.cpp
4065–4067

I think the LLVM style is to leave out these braces.

This revision is now accepted and ready to land.Jan 30 2020, 5:13 PM
MadCoder updated this revision to Diff 241613.Jan 30 2020, 5:18 PM
MadCoder marked an inline comment as done.
This revision was automatically updated to reflect the committed changes.