This is an archive of the discontinued LLVM Phabricator instance.

[ObjC][ARC] Use operand bundle 'clang.arc.attachedcall' instead of explicitly emitting retainRV or claimRV calls in the IR
ClosedPublic

Authored by ahatanak on Dec 7 2020, 7:15 PM.

Details

Summary

Background:

This fixes a longstanding problem where llvm breaks ARC's autorelease optimization (see the link below) by separating calls from the marker instructions or retainRV/claimRV calls. The backend changes are in https://reviews.llvm.org/D92569.

https://clang.llvm.org/docs/AutomaticReferenceCounting.html#arc-runtime-objc-autoreleasereturnvalue

What this patch does to fix the problem:

  • The front-end adds operand bundle "clang.arc.attachedcall" to calls which indicates the call is implicitly followed by a marker instruction and an implicit retainRV/claimRV call that consumes the call result. In addition, it emits a call to @llvm.objc.clang.arc.noop.use, which consumes the call result, to prevent the middle-end passes from changing the return type of the called function. This is currently done only when the target is arm64 and the optimization level is higher than -O0.
  • ARC optimizer temporarily emits retainRV/claimRV calls after the calls with the operand bundle in the IR and removes the inserted calls after processing the function.
  • ARC contract pass emits retainRV/claimRV calls after the call with the operand bundle. It doesn't remove the operand bundle on the call since the backend needs it to emit the marker instruction. The retainRV/claimRV calls are emitted late in the pipeline to prevent optimization passes from transforming the IR in a way that makes it harder for the ARC middle-end passes to figure out the def-use relationship between the call and the retainRV/claimRV calls (which is the cause of PR31925).
  • The function inliner removes an autoreleaseRV call in the callee if nothing in the callee prevents it from being paired up with the retainRV/claimRV call in the caller. It then inserts a release call if claimRV is attached to the call since autoreleaseRV+claimRV is equivalent to a release. If it cannot find an autoreleaseRV call, it tries to transfer the operand bundle to a function call in the callee. This is important since the ARC optimizer (ObjCARCOpt::OptimizeReturns) can remove the autoreleaseRV returning the callee result, which makes it impossible to pair it up with the retainRV/claimRV call in the caller. If that fails, it simply emits a retain call in the IR if retainRV is attached to the call and does nothing if claimRV is attached to it.
  • SCCP refrains from replacing the return value of a call with a constant value if the call has the operand bundle. This ensures the call always has at least one user (the call to @llvm.objc.clang.arc.noop.use), which is necessary to prevent passes like deadargelim from changing the return type.
  • This patch also fixes a bug in replaceUsesOfNonProtoConstant where multiple operand bundles of the same kind were being added to a call.

Future work:

  • Use the operand bundle on x86-64.
  • Fix the auto upgrader to convert call+retainRV/claimRV pairs into calls with the operand bundles.

rdar://71443534

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
rjmccall added inline comments.Dec 8 2020, 2:28 PM
clang/lib/CodeGen/CGObjC.cpp
2283

This is too generic for a global variable name.

It should also be const.

2329

It would be good to explain why this is target-specific in a comment.

2335

Didn't we just add this? Can that fail to add the marker?

ahatanak updated this revision to Diff 310387.Dec 8 2020, 5:09 PM
ahatanak marked 2 inline comments as done.
ahatanak edited the summary of this revision. (Show Details)
ahatanak added inline comments.
clang/lib/CodeGen/CGObjC.cpp
2329

I didn't really have a reason for limiting this to arm64 and non-windows OS other than I didn't want to do too many things in the initial patch. I don't think it would be hard to do this on x86-64. For windows, it looks like we'll have to emit the markers and the ARC calls before CodeGen (before WinEHPrepare) is run, but I haven't thought it through.

2335

The check for module flag was here because the initial version of this patch added the attribute when the target wasn't arm64 (e.g., x86-64) too. I've removed the check for now.

compnerd added inline comments.Dec 9 2020, 8:32 AM
clang/lib/CodeGen/CGObjC.cpp
2327

Nit: I think that > 0 is easier to read.

2329

Hmm, why the explicit check for not-Windows?

llvm/include/llvm/Analysis/ObjCARCRVAttr.h
44

Nit: A new line before this would be nice.

llvm/lib/Transforms/ObjCARC/ObjCARCContract.cpp
443

It seems that IsBundled is unused after this point, why not just do:

if (BundledInsts->contains(Inst))
  return false;
ahatanak updated this revision to Diff 310561.Dec 9 2020, 9:14 AM
ahatanak marked 3 inline comments as done.

Address review comment.

clang/lib/CodeGen/CGObjC.cpp
2329

My understanding is that we'll have to do things differently in ARC contract on Windows, so for now, this patch disables this optimization on Windows.

Is the FIXME comment about Windows I added correct?.

There is nothing particularly special about that. The reason for the funclet handling there is that in the case of an outlined block for exception handling (i.e. a funclet), we need to ensure that the assembly marker received the funclet token as failure to do so would mark the assembly as unreachable and would thus thwart the auto-release claim. For the normal codepath, the behaviour should be identical.

ahatanak updated this revision to Diff 310568.Dec 9 2020, 9:31 AM

Fix an incorrect comment in insertRetainOrClaimRVCalls in InlineFunction.cpp.

There is nothing particularly special about that. The reason for the funclet handling there is that in the case of an outlined block for exception handling (i.e. a funclet), we need to ensure that the assembly marker received the funclet token as failure to do so would mark the assembly as unreachable and would thus thwart the auto-release claim. For the normal codepath, the behaviour should be identical.

ARC contract pass removes attribute retainRV/claimRV (but not attribute rv_marker) from a call and inserts a retainRV/claimRV call after the call in the IR. In that case, the inserted retainRV/claimRV should get the funclet token. Is that right?

Hmm, it seems that calls to some of the ARC runtime functions don't get the token. In some cases, I'm seeing WinEHPrepare (ToT clang, without this patch) turn a call to @objc_unsafeClaimAutoreleasedReturnValue in an EH cleanup block into an unreachable.

ahatanak updated this revision to Diff 310693.Dec 9 2020, 3:55 PM
ahatanak edited the summary of this revision. (Show Details)

Enable the optimization on Windows.

Move createCallInst defined in ObjCARCContract.cpp to ObjCARC.h and use it to create calls to retainRV/claimRV with funclet tokens when ARC contraction is run. Add a test case to contract-marker-funclet.ll.

ToT clang/llvm (without this patch) is turning the call to @objc_unsafeClaimAutoreleasedReturnValue in the EH cleanup block into an unreachable when compiling the following code. It looks like this happens because the inliner isn't adding the funclet token, but it's not clear to me whether this is a bug or not.

$ cat test.mm

void foo();
id noexcept_func() noexcept;

struct S0 {
  id x;
  ~S0() {
    noexcept_func();
  }
};

void test_attr_claimRV() {
  S0 s0;
  foo();
}

$ clang++ -std=c++11 -fobjc-arc test.mm -O3 -S -o - "-fobjc-runtime=ios-11.0.0" -target aarch64-unknown-windows-msvc

ahatanak updated this revision to Diff 310709.Dec 9 2020, 4:24 PM

Elaborate on when createCallInstWithColors should be called.

Hmm, I think that the function is marked as noexcept, so the the funclet shouldn't be invoked and should get DCE'd, but that seems unrelated to ARC.

ahatanak updated this revision to Diff 311662.Dec 14 2020, 11:42 AM

Set the insert point before generating new instructions in insertRetainOrClaimRVCalls.

ahatanak updated this revision to Diff 315886.Jan 11 2021, 11:43 AM

Rebase and fix a few bugs in the patch.

  • In the loop in DeadArgumentEliminationPass::SurveyFunction that visits the uses of a function, continue instead of break so that the function is marked live when a function is used by something that isn't a CallBase.
  • Make ObjCARCOpt::OptimizeIndividualCallImpl return early if the instruction is a retainRV/claimRV that was temporarily inserted after a call annotated with a retainRV/claimRV attribute. This fixes the first assertion in ObjCARCOpt::OptimizeRetainRVCall.

I think the plan should be to use this code pattern on all targets, but starting it with arm64 seems fine.

clang/lib/CodeGen/CGObjC.cpp
2341

It's weird that these attributes use two different capitalization styles. Also, why are they both needed? Did you consider making the role (retain/claim) be the value of the attribute rather than a separate attribute?

Should the attribute be namespaced, like clang.arc.rv_marker?

Let's go ahead and add globals for these strings so we can refer to them symbolically, like you did with retainRVMarkerKey. Is there an LLVM header for ARC optimization we could reasonably pull them from, or are we doomed to repeat ourselves across projects?

ahatanak marked an inline comment as done.Jan 21 2021, 11:58 AM

Add helper functions to ObjCARCRVAttr.h for adding, removing, or looking up attributes.

ahatanak updated this revision to Diff 318278.Jan 21 2021, 12:01 PM

Update patch.

Okay, so it's important that we emit both attributes? I'll take your word for it.

llvm/include/llvm/Analysis/ObjCARCRVAttr.h
57

This is just doing the lookup twice, right? You can just unconditionally remove the attribute, here and below.

ahatanak added inline comments.Jan 21 2021, 2:52 PM
clang/lib/CodeGen/CGObjC.cpp
2341

I think you are asking why both retain/claim and rv_marker are needed? Or are you asking why both 'retain' and 'claim' are needed?

We could do without clang.arc.rv_marker and just use clang.arc.rv=retain/claim since the middle-end and backend passes can easily determine whether a marker is needed or not. The only drawback to using only clang.arc.rv I can think of is that it's no longer possible to tell whether an instruction is implicitly followed by marker+retain/claimRV or just the marker, which makes Instruction::mayWriteToMemory return a conservative answer if the function call is read-only. A read-only call cannot be treated as read-only if it's followed by marker+retain/claimRV, but it is still read-only if it is followed just by the marker.

Note that ARC contract pass emits the retain/claimRV instruction into the IR and drops the corresponding clang.arc.rv attribute but doesn't remove the clang.arc.rv_marker attribute. So if we used only clang.arc.rv, a call annotated with the attribute would be implicitly followed by marker+retain/claimRV before ARC contract, while it would be followed by just the marker after ARC contract, but Instruction::mayWriteToMemory wouldn't' be able to tell the difference.

rjmccall added inline comments.Jan 21 2021, 3:11 PM
clang/lib/CodeGen/CGObjC.cpp
2341

I think you are asking why both retain/claim and rv_marker are needed?

Yes.

We could do without clang.arc.rv_marker and just use clang.arc.rv=retain/claim since the middle-end and backend passes can easily determine whether a marker is needed or not. The only drawback to using only clang.arc.rv I can think of is that it's no longer possible to tell whether an instruction is implicitly followed by marker+retain/claimRV or just the marker, which makes Instruction::mayWriteToMemory return a conservative answer if the function call is read-only. A read-only call cannot be treated as read-only if it's followed by marker+retain/claimRV, but it is still read-only if it is followed just by the marker.

Is there a situation where we emit the marker but not a following call? Or is that just in the places where we've already made the following call explicit in IR? Because in the latter case, presumably the following call will still act as a barrier to anything that's querying mayWriteToMemory, right? (That is: in general, I expect that nobody just asks whether a specific instruction can write to memory, they ask if any of the instructions in a region can write to memory, and since the following call can...)

ahatanak added inline comments.Jan 22 2021, 9:53 AM
clang/lib/CodeGen/CGObjC.cpp
2341

I think you are right. The retainRV/claimRV instruction that follows the annotated call will act as a barrier, so using clang.arc.rv_marker to distinguish between marker+retain/claimRV and marker only probably won't help the optimizations in practice.

ahatanak updated this revision to Diff 318594.Jan 22 2021, 12:07 PM
ahatanak edited the summary of this revision. (Show Details)

Stop using clang.arc.rv_marker and just use "clang.arc.rv"="retain/claim".

rjmccall accepted this revision.Jan 22 2021, 12:31 PM

LGTM. I know you mentioned doing this for more targets in follow-up commits. It would be good to also do this for -O0, so that we have a single correct pattern coming out of the frontend in all modes; but that's also okay to do in follow-ups.

This revision is now accepted and ready to land.Jan 22 2021, 12:31 PM
This revision was landed with ongoing or failed builds.Jan 25 2021, 11:57 AM
This revision was automatically updated to reflect the committed changes.
nikic added a subscriber: nikic.Jan 25 2021, 12:44 PM
nikic added inline comments.
llvm/lib/IR/Instruction.cpp
580

This change looks pretty fishy. Objective C shouldn't be hijacking LLVMs core instruction model in this way. If it writes to memory, this should either be reflected in the attributes, or modeled using operand bundles.

@fhahn Did you review these changes? If not, I'd suggest to revert this patch and get a review on the LLVM changes.

rsmith added a subscriber: rsmith.Jan 25 2021, 1:37 PM

This change violates layering by including an Analysis header from IR; I'll be landing a revert as soon as I've finished testing it.

This change violates layering by including an Analysis header from IR; I'll be landing a revert as soon as I've finished testing it.

Should the header just be in IR? We'd like to avoid duplicating constant strings unnecessarily between IRGen and IR passes.

llvm/lib/IR/Instruction.cpp
580

This could definitely be an operand bundle, and I suppose the presence of a bundle does force a conservative assumption on passes.

fhahn added inline comments.Jan 26 2021, 2:35 AM
llvm/include/llvm/Analysis/ObjCARCRVAttr.h
29

Is there a place the attributes could be documented?

llvm/lib/IR/Instruction.cpp
580

@fhahn Did you review these changes?

Nope I didn't have time to look at this so far.

<snip>

Can functions marked as readonly/readnone be called with the objc attributes?

I'm not very familiar with ObjC, but even for a function that just returns a passed in object id, don't we need to retain & release the object in the function? Which means the function cannot be readonly because we need to call @llvm.objc* functions? If that's the case, could we just check in the verifier that the attributes are never used with readonly functions?

If there are indeed cases where we can call readonly functions, operand bundles would probably be safest. It would probably also good to have at least a few alias-analysis tests, to make sure things work as expected.

llvm/lib/Transforms/IPO/DeadArgumentElimination.cpp
576

This seems fragile. IIUC this workaround is needed because the return value of a retainRV/claimRV function always has one implicit use. This aspect could get missed in other places as well.

As an alternative, could the frontend insert 'dummy' uses (e.g. llvm.objc.arc.use) to make this assumption transparent? Those could be removed sufficiently late. Their placement doesn't really matter and neither does if any instructions are moved between the original call and the dummy use, but the fact that the returned value is used is now explicit. IICU they could be marked as accessing inaccessible memory only, so they should be too much trouble for other optimizations (at least not more than the previous explicit release calls). Or perhaps there's a different way to mark the return value as used explicitly.

ahatanak added inline comments.Jan 26 2021, 7:17 AM
llvm/lib/IR/Instruction.cpp
580

A function compiled using ARC can call a function compiled using MRR, which can be readonly/readnone. Also, a function compiled using ARC can be marked as readonly/readnone (if an optimization pass wants to do so) after ARC optimizer removes the retainRV/autoreleaseRV pair.

define i8* @foo() {
  %1 = tail call i8* @readonlyMRRFunc()
  ; This function can be readonly if ARC optimizer removes the following two instructions.
  %2 = call i8* @llvm.objc.retainAutoreleasedReturnValue(i8* %1)
  %3 = tail call i8* @llvm.objc.autoreleaseReturnValue(i8* %2)
  ret i8* 
}
llvm/lib/Transforms/IPO/DeadArgumentElimination.cpp
576

I think we can make the front-end insert a dummy use instruction. The front-end currently emits @llvm.objc.clang.arc.use in some cases to make sure the object doesn't get released too early, but we can't use that in this case since ARC optimizer treats a call to @llvm.objc.clang.arc.use as a real use, so we probably need a new intrinsic.

fhahn added inline comments.Jan 26 2021, 7:50 AM
llvm/lib/IR/Instruction.cpp
580

A function compiled using ARC can call a function compiled using MRR, which can be readonly/readnone

Ok, sounds like a bundle would be a good option then?

ahatanak added inline comments.Jan 26 2021, 2:11 PM
llvm/lib/IR/Instruction.cpp
580

Yes. I think bundle should be used here.

I'm considering passing an integer flag, which distinguishes between retain and claim (e.g., 0 for retain and 1 for claim), to the bundles:

call i8* @foo() [ "clang.arc.rv"(i64 0) ]

Do you see any problems with this approach? Alternatively, we could pass the pointer to the runtime function (e.g., pointer to @llvm.objc.retainAutoreleasedReturnValue).

ahatanak updated this revision to Diff 319719.Jan 27 2021, 5:03 PM
ahatanak retitled this revision from [ObjC][ARC] Annotate calls with attributes instead of emitting retainRV or claimRV calls in the IR to [ObjC][ARC] Use operand bundle 'clang.arc.rv' instead of explicitly emitting retainRV or claimRV calls in the IR.
ahatanak edited the summary of this revision. (Show Details)
ahatanak reopened this revision.Jan 27 2021, 5:06 PM

Address post-commit review comments.

  • Use operand bundle instead of attribute.
  • Emit a call to @llvm.objc.clang.arc.noop.use in the front-end so that the optimization passes know the call result is used.
This revision is now accepted and ready to land.Jan 27 2021, 5:06 PM
ahatanak requested review of this revision.Jan 27 2021, 5:06 PM
ahatanak updated this revision to Diff 320001.Jan 28 2021, 4:59 PM

Teach markTails to ignore operand bundle clang.arc.rv when determining whether a call can be marked as tail. ObjCARCOpt::OptimizeReturn cannot eliminate the retainRV/autoreleaseRV pair if the call isn't marked as tail.

fhahn added a comment.Feb 1 2021, 10:06 AM

Thanks for the update! The approach with operand bundles and the no-op uses looks cleaner than the original version. The special handling in the inliner seems a bit unfortunate, but I guess there's no way around that?

(as a side note, it might be easier to submit if this would be split up in the changes adding the bundle, followed by the ARC optimization & clang changes)

llvm/include/llvm/Analysis/ObjCARCUtil.h
41 ↗(On Diff #320001)

I think we should include LLVMContext.h explicitly?

llvm/include/llvm/IR/InstrTypes.h
1217 ↗(On Diff #320001)

Please add documentation for the new functions.

llvm/include/llvm/IR/Intrinsics.td
449 ↗(On Diff #320001)

Can this be a DefaultAttrIntrinsics (which adds nofree nosync nounwind willreturn)?

Same probably for all/most objc_* intrinsics.

llvm/lib/Transforms/Utils/InlineFunction.cpp
1616

Could you add a comment elaborating what this function does and why it is needed?

ahatanak updated this revision to Diff 320663.Feb 1 2021, 7:42 PM
ahatanak marked 4 inline comments as done.

Address review comments.

We could run another pass to clean up the IR after inlining, but I think it's better to do the cleanup right after the callee function is cloned in the caller function.

llvm/include/llvm/IR/Intrinsics.td
449 ↗(On Diff #320001)

I'll change the other intrinsics in a separate patch.

fhahn added a comment.Feb 2 2021, 6:27 AM

Thanks for the updates! The LLVM side of things now looks good to me. Could you also add a description of the new bundle to LangRef https://llvm.org/docs/LangRef.html#operand-bundles?

We could run another pass to clean up the IR after inlining, but I think it's better to do the cleanup right after the callee function is cloned in the caller function.

Thanks for elaborating. Given that it is isolated to a single place and the inliner already has logic to deal with a few different intrinsics/bundle types, I think that should be fine.

llvm/include/llvm/IR/Intrinsics.td
449 ↗(On Diff #320001)

SGTM. thanks!

llvm/lib/Transforms/Utils/InlineFunction.cpp
1621

nit: typo autoreleasaeRV

1635

nit: Can you just use CB.getModule()?

1639

we are in the llvm:: namespace here I think, so this should not be needed ? above you use the objcarc namespace without llvm:: prefix as well. Same in other places in the function.

ahatanak updated this revision to Diff 320956.Feb 2 2021, 5:33 PM
ahatanak marked 3 inline comments as done.

Address review comments.

ahatanak updated this revision to Diff 320985.Feb 2 2021, 8:59 PM

Add a description of the new bundle to LangRef.

fhahn accepted this revision.Feb 4 2021, 6:20 AM

Core LLVM parts LGTM, thanks! I'm going to mark this as accepted, given that the other parts have been reviewed earlier already

This revision is now accepted and ready to land.Feb 4 2021, 6:20 AM
This revision was landed with ongoing or failed builds.Feb 5 2021, 5:55 AM
This revision was automatically updated to reflect the committed changes.

I ended up reverting the changes I made to llvm/lib/IR/AutoUpgrade.cpp as the file was including llvm/Analysis/ObjCARCUtil.h, which was violating layering.

I ended up reverting the changes I made to llvm/lib/IR/AutoUpgrade.cpp as the file was including llvm/Analysis/ObjCARCUtil.h, which was violating layering.

Are you planning to land the upgrade another way?

I do have a plan to modify the auto upgrader to use the bundles when it's possible to do so, but I'm currently not considering landing the changes I reverted since they were only needed to avoid duplicating constant strings. I don't think we can avoid duplicating the string unless objcarc::getRVMarkerModuleFlagStr() can be moved to a file in llvm/include/llvm/IR.

thakis added a subscriber: thakis.Feb 9 2021, 7:46 AM

This makes clang assert for some inputs. See https://bugs.chromium.org/p/chromium/issues/detail?id=1175886#c10 for a repro.

thakis added a comment.Feb 9 2021, 8:07 AM

Here's a reduced repro:

thakis@thakis:~/src/llvm-project$ cat GREYAssertDefaultConfiguration-9e1f3b.reduced.m
inline id a() {}
void b() {
  a();
  a();
}
thakis@thakis:~/src/llvm-project$ out/gn/bin/clang -cc1 -cc1 -triple arm64-apple-ios12.2.0 -Oz -fobjc-arc -emit-llvm GREYAssertDefaultConfiguration-9e1f3b.reduced.m
GREYAssertDefaultConfiguration-9e1f3b.reduced.m:1:16: warning: non-void function does not return a value [-Wreturn-type]
inline id a() {}
               ^
clang: ../../llvm/include/llvm/IR/InstrTypes.h:1969: Optional<llvm::OperandBundleUse> llvm::CallBase::getOperandBundle(uint32_t) const: Assertion `countOperandBundlesOfType(ID) < 2 && "Precondition violated!"' failed.
PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.	Program arguments: out/gn/bin/clang -cc1 -cc1 -triple arm64-apple-ios12.2.0 -Oz -fobjc-arc -emit-llvm GREYAssertDefaultConfiguration-9e1f3b.reduced.m
1.	<eof> parser at end of file
2.	Per-module optimization passes
3.	Running pass 'CallGraph Pass Manager' on module 'GREYAssertDefaultConfiguration-9e1f3b.reduced.m'.
 #0 0x00000000034a054c llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /usr/local/google/home/thakis/src/llvm-project/out/gn/../../llvm/lib/Support/Unix/Signals.inc:565:13
 #1 0x000000000349e45e llvm::sys::RunSignalHandlers() /usr/local/google/home/thakis/src/llvm-project/out/gn/../../llvm/lib/Support/Signals.cpp:72:18
 #2 0x00000000034a089f SignalHandler(int) /usr/local/google/home/thakis/src/llvm-project/out/gn/../../llvm/lib/Support/Unix/Signals.inc:407:1
 #3 0x00007f8505e34140 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x14140)
 #4 0x00007f850594cce1 raise ./signal/../sysdeps/unix/sysv/linux/raise.c:51:1
 #5 0x00007f8505936537 abort ./stdlib/abort.c:81:7
 #6 0x00007f850593640f get_sysdep_segment_value ./intl/loadmsgcat.c:509:8
 #7 0x00007f850593640f _nl_load_domain ./intl/loadmsgcat.c:970:34
 #8 0x00007f8505945662 (/lib/x86_64-linux-gnu/libc.so.6+0x34662)
 #9 0x0000000002be88bd (out/gn/bin/clang+0x2be88bd)
#10 0x0000000003afdd35 hasValue /usr/local/google/home/thakis/src/llvm-project/out/gn/../../llvm/include/llvm/ADT/Optional.h:194:53
#11 0x0000000003afdd35 hasValue /usr/local/google/home/thakis/src/llvm-project/out/gn/../../llvm/include/llvm/ADT/Optional.h:286:52
#12 0x0000000003afdd35 hasRVOpBundle /usr/local/google/home/thakis/src/llvm-project/out/gn/../../llvm/include/llvm/Analysis/ObjCARCUtil.h:42:61
#13 0x0000000003afdd35 llvm::InlineFunction(llvm::CallBase&, llvm::InlineFunctionInfo&, llvm::AAResults*, bool, llvm::Function*) /usr/local/google/home/thakis/src/llvm-project/out/gn/../../llvm/lib/Transforms/Utils/InlineFunction.cpp:1953:9

Given that the repro is so tiny, I've reverted this for now in de1966e5427985163f8e816834b3a0564b5e24cd.

It looks like there is a pre-existing bug in replaceUsesOfNonProtoConstant where the operand bundles of all call sites are accumulated into newBundles. This will be fixed if its declaration is moved into the loop body. Also, we found another case of deadargelim changing the return type of the called function to void. I'll post a new patch when I have the fixes.

This patch causes opt to crash when the following IR is compiled:

$ cat test.ll

@g0 = global i8 zeroinitializer, align 1

define internal i8* @foo() {
  ret i8* @g0
}

define void @test() {
  %r = call i8* @foo() [ "clang.arc.rv"(i64 1) ]
  call void (...) @llvm.objc.clang.arc.noop.use(i8* %r)
  ret void
}

declare void @llvm.objc.clang.arc.noop.use(...)

!llvm.module.flags = !{!0}

!0 = !{i32 1, !"clang.arc.retainAutoreleasedReturnValueMarker", !"mov\09fp, fp\09\09// marker for objc_retainAutoreleaseReturnValue"}

$ opt -ipsccp -deadargelim -objc-arc -o - -S test.ll

What's happening is that ipsccp replaces argument %r passed to call @llvm.objc.clang.arc.noop.use with @g0 and then deadargelim changes the return type of @foo to void since there are no explicit users of its return. ARC optimizer crashes because it expects a call with the operand bundle to have a return.

To fix the crash, we can modify tryToReplaceWithConstant to prevent ipsccp from replacing @llvm.objc.clang.arc.noop.use's argument. Alternatively, we can restore the check in DeadArgumentEliminationPass::SurveyFunction, which was in earlier versions of the patch (see https://reviews.llvm.org/D92808?id=319082#inline-892965). I feel like the latter option is better since we wouldn't' have to prevent ipsccp from optimizing the IR, which is perfectly legal. If it's not desirable to add a check that is too specific to ObjC, we could add a new generic function to the core library, which indicates the function's return type shouldn't be changed, and use it to tell deadargelim not to change the return type. Or we could use a bundle that is more generic than clang.arc.rv (for example, "implicitly.used.by"(@llvm.objc.retainAutoreleasedReturnValue), which indicates the result is used by an instruction that isn't explicitly emitted in the IR).

I have a couple of suggestions below to add to the mix (but I don't feel strongly about either of them, just adding them for consideration).

define void @test() {
  %r = call i8* @foo() [ "clang.arc.rv"(i64 1) ]
  call void (...) @llvm.objc.clang.arc.noop.use(i8* %r)
  ret void
}

Firstly, it feels to me like this IR would clarify that the bundle looks at the return value:

define void @test() {
  %r = call i8* @foo() [ "clang.arc.rv"(i64 1, i8* %r) ]
  ret void
}

Notice this passes %r to the operand bundle, which the verifier rejects. I wonder if the IR rule could safely be weakened to allow an operand bundle to reference the call/invoke it's attached to? (Not sure what all the implications would be, or if this would be sufficient...)

Or we could use a bundle that is more generic than clang.arc.rv (for example, "implicitly.used.by"(@llvm.objc.retainAutoreleasedReturnValue), which indicates the result is used by an instruction that isn't explicitly emitted in the IR).

Secondly, I agree it might be cleaner to formalize a generic operand bundle that has the semantics ARC needs. It might be nice to have this also model the inlining semantics, maybe something like "returnfilter":

%x = call type @f1(...) [ "returnfilter"(type(type)* @f2) ]

The semantics could be that %x is implicitly passed through @f2. Concretely, maybe that's:

  • %x has an implicit use.
  • @f1's return type cannot be changed.
  • Inlining @f1 generates explicit calls to @f2.
  • The backend lowers the bundle as late as possible to calls to @f2.

Thirdly, you could combine the two previous ideas, generalizing the semantics to an "onreturn" bundle that is allowed to self-reference the callable, that the inliner knows about, etc., something like:

%x = call type @f1(...) [ "onreturn"(type(type)* @f2, type %x) ]

This @f2 is implicitly called and its return used in place of %x, but it explicitly lists its arguments. Compared to "returnfilter", it has the flexibility to specify argument order and might be more useful for other applications.

The ultimate code-generation consequences of this feature aren't very generic, so we shouldn't let ourselves get caught up designing an extremely general feature that ultimately either doesn't do what we need it to do or doesn't actually work for any of the generalizations.

The ultimate code-generation consequences of this feature aren't very generic, so we shouldn't let ourselves get caught up designing an extremely general feature that ultimately either doesn't do what we need it to do or doesn't actually work for any of the generalizations.

Sure, those are two bad potential outcomes.

Stepping back, the goal of this patch is to encode ARC's code generation invariants in the IR so that maintainers don't inadvertently break them. I think the IR should describe those invariants as clearly and precisely as is reasonable.

Besides idle / misplaced optimism about reuse, I was trying to make the following points along those lines, so I'll restate them more directly:

  • Explicitly adding a use of the call avoids having to teach transformations / analyses about it. It might be simpler to add a self-reference carve-out for this operand bundle (or operand bundles in general) than to teach all the passes that this operand bundle implies a use of the callsite (but I'm not sure; maybe there are lots of assumptions about self-references not existing outside of PHIs...).
  • Passing the function that's implicitly called into the operand bundle is simpler to reason about than an opaque table index.
  • Naming the operand bundle after its code generation consequences makes it simpler reason about. (A few generic names that make some sense to me (besides what I listed yesterday) include "callafter", "callimmediately", "forwardto", and "attachedcall". If it's important to make this ARC only, it could be (e.g.) "clang.arc.attachedcall".)
ahatanak updated this revision to Diff 322867.Feb 10 2021, 5:20 PM

Fix two bugs which were discovered.

  • Fix bug in replaceUsesOfNonProtoConstant.
  • Teach SCCP not to replace the return of a "clang.arc.rv" call with a constant.
ahatanak reopened this revision.Feb 10 2021, 5:41 PM

The ultimate code-generation consequences of this feature aren't very generic, so we shouldn't let ourselves get caught up designing an extremely general feature that ultimately either doesn't do what we need it to do or doesn't actually work for any of the generalizations.

Sure, those are two bad potential outcomes.

Stepping back, the goal of this patch is to encode ARC's code generation invariants in the IR so that maintainers don't inadvertently break them. I think the IR should describe those invariants as clearly and precisely as is reasonable.

Besides idle / misplaced optimism about reuse, I was trying to make the following points along those lines, so I'll restate them more directly:

  • Explicitly adding a use of the call avoids having to teach transformations / analyses about it. It might be simpler to add a self-reference carve-out for this operand bundle (or operand bundles in general) than to teach all the passes that this operand bundle implies a use of the callsite (but I'm not sure; maybe there are lots of assumptions about self-references not existing outside of PHIs...).
  • Passing the function that's implicitly called into the operand bundle is simpler to reason about than an opaque table index.
  • Naming the operand bundle after its code generation consequences makes it simpler reason about. (A few generic names that make some sense to me (besides what I listed yesterday) include "callafter", "callimmediately", "forwardto", and "attachedcall". If it's important to make this ARC only, it could be (e.g.) "clang.arc.attachedcall".)

The second and third suggestions seem like an improvement over the bundle used in this patch.

I'm not sure about the first one though as it seems to me that you'll still have to teach at least some of the passes about the self-reference.

For example, if SCCP just does a normal RAUW on the following call, which is taken from the example I posted,

%r = call i8* @foo() [ "clang.arc.rv"(i64 1, i8* %r) ]

will become

%r = call i8* @foo() [ "clang.arc.rv"(i64 1, i8* @g0) ]

%r doesn't have an explicit use in the IR anymore, so passes like deadargelim can change the return type of the function to void.

This revision is now accepted and ready to land.Feb 10 2021, 5:41 PM
ahatanak requested review of this revision.Feb 10 2021, 5:42 PM
fhahn added a comment.Feb 11 2021, 5:12 AM

The ultimate code-generation consequences of this feature aren't very generic, so we shouldn't let ourselves get caught up designing an extremely general feature that ultimately either doesn't do what we need it to do or doesn't actually work for any of the generalizations.

Sure, those are two bad potential outcomes.

Stepping back, the goal of this patch is to encode ARC's code generation invariants in the IR so that maintainers don't inadvertently break them. I think the IR should describe those invariants as clearly and precisely as is reasonable.

Besides idle / misplaced optimism about reuse, I was trying to make the following points along those lines, so I'll restate them more directly:

  • Explicitly adding a use of the call avoids having to teach transformations / analyses about it. It might be simpler to add a self-reference carve-out for this operand bundle (or operand bundles in general) than to teach all the passes that this operand bundle implies a use of the callsite (but I'm not sure; maybe there are lots of assumptions about self-references not existing outside of PHIs...).
  • Passing the function that's implicitly called into the operand bundle is simpler to reason about than an opaque table index.
  • Naming the operand bundle after its code generation consequences makes it simpler reason about. (A few generic names that make some sense to me (besides what I listed yesterday) include "callafter", "callimmediately", "forwardto", and "attachedcall". If it's important to make this ARC only, it could be (e.g.) "clang.arc.attachedcall".)

The second and third suggestions seem like an improvement over the bundle used in this patch.

+1, especially passing the function to call as argument to the bundle would make the lowering slightly easier; at the moment we may need to inject a new declaration during codegen.

I'm not sure about the first one though as it seems to me that you'll still have to teach at least some of the passes about the self-reference.

The patch already tries to do that, by adding calls to llvm.objc.clang.arc.noop.use, to make it explicit that there is a use of the returned value. This ensures most optimizations are aware of the fact that the return value is used. The problem Akira is that regular uses can be updated and are not enough to block the problematic transformation in SCCP.

If the use is directly at the bundle, I think we could somehow teach to lowering to use the referenced value and pass that to the ObjC runtime. But it would disable the runtime optimization.

I think in the short-term it would be OK to update SCCP as done in this patch, if we specify the arc bundle implicitly uses the return value, which does not refer to any ObjC specifics. We already do something similar for function with certain attributes.

I think it would be good to consider generalizing this 'implicitly used return value bundle' concept as a follow up, if there are other potential users. We then already laid the ground-work and updated the existing passes to work for the arc bundle and only have to update the checks.

@dexonsmith @rjmccall What do you think? Would you be happy with iterating on the suggestions in tree?

llvm/docs/LangRef.rst
2335 ↗(On Diff #322867)

Can you add a sentence making it explicit that calls with that bundle implicitly use the returned value?

llvm/lib/Transforms/Scalar/SCCP.cpp
148 ↗(On Diff #322867)

I think it would be slightly simpler to have a positive name for the set, e.g. MustPreserveReturnsInFn. What do you think? IMO that's more general than referring to 'not zapping'.

1643 ↗(On Diff #322867)

I'd keep things generic and say something like Calls with "clang.arc.rv" implicitly use the return value and those uses cannot be updated with a constant..

fhahn added a comment.Feb 11 2021, 8:06 AM

Another thing I noticed that there's verifier support missing. I think we should at least check that only a single clang.arc.rv bundle is specified (https://github.com/llvm/llvm-project/blob/main/llvm/lib/IR/Verifier.cpp#L3191). We should probably also enforce that the bundle is only provided for functions with an i8* return type. That can also be done after the main patch lands.

For example, if SCCP just does a normal RAUW on the following call, which is taken from the example I posted,

%r = call i8* @foo() [ "clang.arc.rv"(i64 1, i8* %r) ]

will become

%r = call i8* @foo() [ "clang.arc.rv"(i64 1, i8* @g0) ]

%r doesn't have an explicit use in the IR anymore, so passes like deadargelim can change the return type of the function to void.

Is that a problem? It seems like this keeps all the information ARC needs. This could be lowered to:

call void @foo()
call @unsafeClaimAutoreleasedReturnValue(i8* @g0)

The ARC optimizer and ARC lowering both need to know about this case, but I don't see why deadargelim needs to know anything special.

@dexonsmith @rjmccall What do you think? Would you be happy with iterating on the suggestions in tree?

I don't feel strongly either way; happy for it to land as-is (bugs fixed) and iterated on there. One thought is that if the name is going to change it'd be nice to have it right for git-blame (today I would vote for "clang.arc.attachedcall" but anything a bit more clear would be fine), but of course that can change later too.

ahatanak marked 3 inline comments as done.Feb 11 2021, 11:50 AM

For example, if SCCP just does a normal RAUW on the following call, which is taken from the example I posted,

%r = call i8* @foo() [ "clang.arc.rv"(i64 1, i8* %r) ]

will become

%r = call i8* @foo() [ "clang.arc.rv"(i64 1, i8* @g0) ]

%r doesn't have an explicit use in the IR anymore, so passes like deadargelim can change the return type of the function to void.

Is that a problem? It seems like this keeps all the information ARC needs. This could be lowered to:

call void @foo()
call @unsafeClaimAutoreleasedReturnValue(i8* @g0)

The ARC optimizer and ARC lowering both need to know about this case, but I don't see why deadargelim needs to know anything special.

I see, you are right.

ahatanak updated this revision to Diff 323107.Feb 11 2021, 11:54 AM

Address all review comments except for the ones about the bundle name.

Another thing I noticed that there's verifier support missing. I think we should at least check that only a single clang.arc.rv bundle is specified (https://github.com/llvm/llvm-project/blob/main/llvm/lib/IR/Verifier.cpp#L3191). We should probably also enforce that the bundle is only provided for functions with an i8* return type. That can also be done after the main patch lands.

I added the checks to the verifier. Note that the verifier accepts calls returning any pointer type since the return type isn't always i8* (e.g., NSObject *foo(void)).

fhahn accepted this revision.Feb 11 2021, 12:06 PM

LGTM as initial step, but it would be good to adjust the name as Duncan suggested before landing.

This revision is now accepted and ready to land.Feb 11 2021, 12:06 PM
ahatanak updated this revision to Diff 323218.Feb 11 2021, 8:58 PM
ahatanak retitled this revision from [ObjC][ARC] Use operand bundle 'clang.arc.rv' instead of explicitly emitting retainRV or claimRV calls in the IR to [ObjC][ARC] Use operand bundle 'clang.arc.attachedcall' instead of explicitly emitting retainRV or claimRV calls in the IR.
ahatanak edited the summary of this revision. (Show Details)
  • Renamed the operand bundle name to clang.arc.attachedcall. I also tried to pass the address of the ARC function as the argument, but the verifier doesn't allow taking addresses of intrinsics.
  • Removed the call to @llvm.objc.clang.arc.noop.use, which in some cases was preventing the inliner to attach the operand bundle to a call in the callee function, when the operand bundle is removed from a call (see BundledRetainClaimRVs::eraseInst).
  • Make sure emitOptimizedARCReturnCall doesn't drop any existing operand bundle on a call when adding the new operand bundle to the call.
This revision was landed with ongoing or failed builds.Feb 12 2021, 9:52 AM
This revision was automatically updated to reflect the committed changes.

I ended up reverting the changes I made to llvm/lib/IR/AutoUpgrade.cpp as the file was including llvm/Analysis/ObjCARCUtil.h, which was violating layering.

It looks like it was not actually reverted in the version ultimately submitted. I've pushed commit 3c06676de14d7dd6dc3d1bf2989c503a2a5d438a which reverts the change to llvm/lib/IR/AutoUpgrade.cpp (duplicating the string constant).

Thanks! I didn't realize it hadn't been fixed in the patch I committed.

FYI, we're seeing test failures caused by this patch in iOS/arm64 builds when arc is used (simulator is fine though): https://bugs.chromium.org/p/chromium/issues/detail?id=1182642#c11

We'll try to investigate a bit more on our side, but I wanted to give an early(ish) heads-up in case others see this or whatnot.

Not sure if this landed before or after the 12.0 branch.

FYI, we're seeing test failures caused by this patch in iOS/arm64 builds when arc is used (simulator is fine though): https://bugs.chromium.org/p/chromium/issues/detail?id=1182642#c11

We'll try to investigate a bit more on our side, but I wanted to give an early(ish) heads-up in case others see this or whatnot.

Not sure if this landed before or after the 12.0 branch.

Thank you. Let me know when you have more information.

This isn't in release/12.x. The older version of the patch, which used attributes instead of operand bundles, was in release/12.x, but got reverted later.

FYI, we're seeing test failures caused by this patch in iOS/arm64 builds when arc is used (simulator is fine though): https://bugs.chromium.org/p/chromium/issues/detail?id=1182642#c11

We'll try to investigate a bit more on our side, but I wanted to give an early(ish) heads-up in case others see this or whatnot.

Not sure if this landed before or after the 12.0 branch.

Thank you. Let me know when you have more information.

This isn't in release/12.x. The older version of the patch, which used attributes instead of operand bundles, was in release/12.x, but got reverted later.

Just to confirm, the revert is also in release/12.x, so I don't think there's any branch concerns here.

thakis added a comment.Mar 3 2021, 5:39 AM

Thank you. Let me know when you have more information.

Repro is moving along. https://bugs.chromium.org/p/chromium/issues/detail?id=1182642#c26 and onward are getting pretty close.

fhahn added a comment.Mar 3 2021, 5:55 AM

Thank you. Let me know when you have more information.

Repro is moving along. https://bugs.chromium.org/p/chromium/issues/detail?id=1182642#c26 and onward are getting pretty close.

That's interesting, thanks for sharing! I had a look at the linked good.s/bad.s and the sequence mentioned in the comments looks unexpected. mov x29, x29 should be exactly between a regular call and _objc_retainAutoreleasedReturnValue, but this is not what's happening in this case. I could not find a link to download the (pre-processed) source file. Would it be possible to get that file?

Ltmp239:
	bl	_objc_msgSend
Ltmp705:
	mov	x29, x29
Ltmp240:
	b	LBB19_3
LBB19_3:                                ; %invoke.cont4
	.loc	149 0 9                         ; ../../ios/chrome/browser/ui/infobars/infobar_container_coordinator.mm:0:9
	bl	_objc_retainAutoreleasedReturnValue
hans added a subscriber: hans.Mar 3 2021, 6:45 AM

The preprocessed source is attached to this comment: https://bugs.chromium.org/p/chromium/issues/detail?id=1182642#c26

good.s and bad.s are here: https://bugs.chromium.org/p/chromium/issues/detail?id=1182642#c28

And this shows the problematic difference between them: https://bugs.chromium.org/p/chromium/issues/detail?id=1182642#c30

After this change, for some reason x0 is getting clobbered in the prologue.

In https://bugs.chromium.org/p/chromium/issues/detail?id=1182642#c31 I verify that manually changing the register back to x1 fixes the issue.

fhahn added a comment.Mar 3 2021, 1:59 PM

Thanks! I pushed a fix for the issue in 75805dce5ff8. @hans, would it be possible to check if D92808 + 75805dce5ff8 fixes the issue or should we just recommit D92808?

hans added a comment.Mar 4 2021, 1:23 AM

Thanks! I pushed a fix for the issue in 75805dce5ff8. @hans, would it be possible to check if D92808 + 75805dce5ff8 fixes the issue or should we just recommit D92808?

Yes, I confirmed that 75805dce5ff8 fixes the test case I was looking at. Thanks!

fhahn added a comment.Mar 4 2021, 11:24 AM

Thanks! I pushed a fix for the issue in 75805dce5ff8. @hans, would it be possible to check if D92808 + 75805dce5ff8 fixes the issue or should we just recommit D92808?

Yes, I confirmed that 75805dce5ff8 fixes the test case I was looking at. Thanks!

Thanks for confirming!