This is an archive of the discontinued LLVM Phabricator instance.

[GlobalDCE] In VFE, replace the whole 'sub' expression of unused relative-pointer-based vtable slots
ClosedPublic

Authored by kubamracek on Sep 1 2021, 5:24 PM.

Diff Detail

Event Timeline

kubamracek created this revision.Sep 1 2021, 5:24 PM
kubamracek requested review of this revision.Sep 1 2021, 5:24 PM
fhahn added a comment.Sep 2 2021, 12:29 AM

Could you elaborate on the motivation? If it is unused, it shouldn't matter what value we put in the slot, right?

Two distinct motivations:

  1. Turns out that the "relative pointer" expression, if it gets a NULL pointer replaced into it...
i32 trunc (i64 sub (i64 0, i64 ptrtoint ({ [2 x i32] }* @vtable to i64)) to i32)

...becomes invalid as a relocation expression, and so it can't actually be lowered into machine instructions.

  1. To make diagnosing possible miscompiles easier, because the crash will have a unique symbol name.
pcc added a comment.Sep 2 2021, 11:12 AM

Maybe you can set the relative offset to 0 in this case? That way, the crashing PC will point to the vtable, which should make it easier to track down the source of any problems. This would also require no runtime support (so doesn't need to be behind a flag) and also saves a PLT slot if the trap function would be in another DSO.

Maybe you can set the relative offset to 0 in this case? That way, the crashing PC will point to the vtable, which should make it easier to track down the source of any problems. This would also require no runtime support (so doesn't need to be behind a flag) and also saves a PLT slot if the trap function would be in another DSO.

Yes, that's certainly a feasible option, too. Although, may I ask what is it you don't like about the trap function approach? The patch as it is emits the trap function into the current translation unit, so I think it'll never be in another DSO, and if yes then wouldn't that be an invalid relocation again (and defeat the purpose of this change)?

I am asking because I think the implementation of the "set offset to 0" would be more complicated and more invasive -- the F->replaceNonMetadataUsesWith() step would then need to "walk up" the expression somehow and replace the outer "i32 trunc" with an "i32 0", which does not necessarily need to be the top-level entry in the array (the vtable structure for Swift classes is more complex than just a array of pointers).

pcc added a comment.Sep 2 2021, 12:47 PM

Oh right, I missed that you were emitting the function locally. So then replace "saves a PLT slot" with "saves emitting a trap function".

The PLT slot would be possible if it were an external function and you were using dso_local_equivalent to refer to it -- I'm not sure if you're doing that in Swift or even if the Mach-O side of that is implemented.

It seems that this should depend on what kind of relocation this is (absolute or relative), rather than a module flag. Because it's always possible that you'll want both modes in the same linkage unit (e.g. LTOing together C++ and Swift), and forcing everything to use the same mode will either produce invalid relocations or unnecessary relocations.

It doesn't seem too bad to look through the use list to find any ptrtoint references and then any sub references. At that point you can replace with 0, no need to look for the trunc as well.

It doesn't seem too bad to look through the use list to find any ptrtoint references and then any sub references. At that point you can replace with 0, no need to look for the trunc as well.

This sounds like a good option to me as well!

llvm/test/Transforms/GlobalDCE/virtual-functions-trap.ll
4 ↗(On Diff #370120)

can you add a comment why there are multiple run lines with different numbers of globaldce invocations?

kubamracek added inline comments.Sep 13 2021, 2:50 PM
llvm/test/Transforms/GlobalDCE/virtual-functions-trap.ll
4 ↗(On Diff #370120)

I'm going to drop that, since it looks like the way forward is to avoid emitting the trap function. The explanation why it was there is that in a previous iteration of the trap-function approach, I didn't have the if (name == kTrapFunctionName) IgnoreDependency = false change in UpdateGVDependencies, which caused a funny bug: If GlobalDCE is run exactly once, it works, fine, but the second time it runs (and in a typical full optimization pipeline it does run multiple times), it tried to eliminate the trap call as well, causing a F->replaceNonMetadataUsesWith(F) call which is invalid and asserts.

kubamracek retitled this revision from [GlobalDCE] Add a mode to VFE that replaces unused vtable slots with a trap function instead of a NULL pointer to [GlobalDCE] In VFE, replace the whole 'sub' expression of unused relative-pointer-based vtable slots.

Changing the title to reflect the change in direction of this patch.

fhahn added a comment.Sep 17 2021, 1:47 AM

Thanks for the update! Could you update the patch description? it looks empty at the moment.

llvm/lib/Transforms/IPO/GlobalDCE.cpp
427

Would it be possible to either only iterate over all users if we know that there are relative pointers or iterate over all uses once and either replace the sub with zero or the use with nullptr?

429

Can the code here (matching ptrtoint/sub) be unified with the code that matches it in getPointerAtOffset, so those patterns won't diverge?

430

the assert here seems dangerous, as nothing guards against PtrExpr having multiple users, right? PtrExpr having multiple users is valid IR, so we should probably handle this gracefully.

Can the code here (matching ptrtoint/sub) be unified with the code that matches it in getPointerAtOffset, so those patterns won't diverge?

Suggestions how to do that? The code in getPointerAtOffset is quite different, because is recurses *down into* the expression, whereas here we need to walk up by looking at users. I could move the code to be next to each other, would that be good enough?

the assert here seems dangerous, as nothing guards against PtrExpr having multiple users, right? PtrExpr having multiple users is valid IR, so we should probably handle this gracefully.

I believe the assert actually holds. At this point, we know that F is *only* referenced from vtables (either one or more) and nothing else (otherwise it wouldn't be in DeadFunctions). F can have multiple users, if there's multiple vtables, but each use is a Constant/ConstantExpr in a vtable. Unless I misunderstand something, I believe in a ConstantExpr like trunc(sub(ptrtoint(x),ptrtoint(y)) all the nodes always have a unique single user and that's the parent ConstantExpr. Therefore the ptrtoint here, which must be in the vtable expression, must always have a single user.

Would it be possible to either only iterate over all users if we know that there are relative pointers or iterate over all uses once and either replace the sub with zero or the use with nullptr?

Attempted to do this, but it complicates the implementation. Not sure if the extra complexity is worth it (we only save an extra iteration over vtables, and presumably in realistic programs that's a limited amount of those).

fhahn added a comment.Sep 30 2021, 9:02 AM

Can the code here (matching ptrtoint/sub) be unified with the code that matches it in getPointerAtOffset, so those patterns won't diverge?

Suggestions how to do that? The code in getPointerAtOffset is quite different, because is recurses *down into* the expression, whereas here we need to walk up by looking at users. I could move the code to be next to each other, would that be good enough?

Yeah moving the matcher code to a function might help, as well as moving them next to each other.

the assert here seems dangerous, as nothing guards against PtrExpr having multiple users, right? PtrExpr having multiple users is valid IR, so we should probably handle this gracefully.

I believe the assert actually holds. At this point, we know that F is *only* referenced from vtables (either one or more) and nothing else (otherwise it wouldn't be in DeadFunctions). F can have multiple users, if there's multiple vtables, but each use is a Constant/ConstantExpr in a vtable. Unless I misunderstand something, I believe in a ConstantExpr like trunc(sub(ptrtoint(x),ptrtoint(y)) all the nodes always have a unique single user and that's the parent ConstantExpr. Therefore the ptrtoint here, which must be in the vtable expression, must always have a single user.

Sounds good, can you add a message/comment to the assert?

Would it be possible to either only iterate over all users if we know that there are relative pointers or iterate over all uses once and either replace the sub with zero or the use with nullptr?

Attempted to do this, but it complicates the implementation. Not sure if the extra complexity is worth it (we only save an extra iteration over vtables, and presumably in realistic programs that's a limited amount of those).

Agreed that it might not be too important, I'm fine with either.

kubamracek updated this revision to Diff 376706.Oct 2 2021, 9:21 AM

Yeah moving the matcher code to a function might help, as well as moving them next to each other.

Done.

Sounds good, can you add a message/comment to the assert?

Done.

Attempted to do this, but it complicates the implementation. Not sure if the extra complexity is worth it (we only save an extra iteration over vtables, and presumably in realistic programs that's a limited amount of those).

Agreed that it might not be too important, I'm fine with either.

Moved back to iterating over users twice, I think the benefits of avoiding the extra work are small and probably not worth the complexity.

fhahn accepted this revision.Oct 4 2021, 11:08 AM

LGTM, thanks!

llvm/include/llvm/Analysis/TypeMetadataUtils.h
80

shouldn't this be ///?

llvm/lib/Analysis/TypeMetadataUtils.cpp
204

it might be worth turning some of those checks into early continues instead, to reduce the nesting level.

This revision is now accepted and ready to land.Oct 4 2021, 11:08 AM
fhahn added inline comments.Oct 4 2021, 11:09 AM
llvm/lib/Analysis/TypeMetadataUtils.cpp
206

oh and just to double check, the assert was removed intentionally?

kubamracek added inline comments.Oct 4 2021, 11:31 AM
llvm/include/llvm/Analysis/TypeMetadataUtils.h
80

Fixed (and also fixed for getPointerAtOffset).

llvm/lib/Analysis/TypeMetadataUtils.cpp
204

Yes. Much better now :)

206

Yes, I dropped it intentionally, and replaced it with a loop over all users to be more conservative.

Actually, I did end up seeing the assert fail (!) in my testing on a large codebase (compiling the Swift stdlib with VFE on), but I couldn't figure out how to isolate the exact IR that could reproduce the problem. I guess I should try harder and add it as a lit testcase...

kubamracek added inline comments.Oct 4 2021, 12:09 PM
llvm/lib/Analysis/TypeMetadataUtils.cpp
206

Added "GlobalDCE/call-with-ptrtoint.ll" that triggers the assert -- in this case, we end up with a dead function that has a ptrtoint user, which itself has zero users (!), which I assume is an expected artifact of using GlobalDCE dropping all bodies of dead functions upfront. If unexpected, then it sounds like a separate bug.

In either case, I think dropping the assert and just iterating over all users of PtrExpr is the right approach here. It also makes this helper function more robust (as it doesn't assume the assert to hold).

fhahn added a comment.Oct 4 2021, 12:22 PM

Thanks for the update. Still LGTM, but please wait a day or 2 with committing in case there are additional comments.

llvm/lib/Analysis/TypeMetadataUtils.cpp
204

I think you could also combine both checks if (!PtrExpr || PtrExpr->getOpcode() !=...) but that's a super tiny nit.

206

Ok thanks for clarifying & adding the test.

kubamracek added inline comments.
llvm/lib/Analysis/TypeMetadataUtils.cpp
204

Fixed!

Thanks for the review!