This is an archive of the discontinued LLVM Phabricator instance.

[RFC][Clang][Codegen] `std::type_info` needs special care with explicit address spaces
ClosedPublic

Authored by AlexVlx on Aug 8 2023, 4:17 PM.

Details

Summary

After https://reviews.llvm.org/D153092, for targets that use a non-default AS for globals, an "interesting" situation arises around typeid and its paired type, type_info:

  • on the AST level, the type_info interface is defined with default / generic addresses, be it for function arguments, or for this;
  • in IR, type_info values are globals, and thus pointers to type_info values are pointers to global

This leads to a mismatch between the function signature / formal type of the argument, and its actual type. Currently we try to handle such mismatches via bitcast, but that is wrong in this case, since an ascast is required. However, I'm not convinced this is the right way to address it, I've just not found a better / less noisy one (yet?); the other alternative would be to special case codegen for typeinfo itself, and adjust the IR function signatures there. That's less centralised and doesn't actually seem correct, since type_info has a C++(mangled) interface, and we'd be basically lying about the type. Hopefully I'm missing some obvious and significantly more tidy solution - hence the RFC.

Diff Detail

Event Timeline

AlexVlx created this revision.Aug 8 2023, 4:17 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 8 2023, 4:17 PM
AlexVlx requested review of this revision.Aug 8 2023, 4:17 PM
yaxunl added a comment.Aug 9 2023, 7:24 AM

It is a little concerning how far the global address will spread further. Compared to handling user-defined global variables, we keep the global address to its definition in the IR and any use of it will use the generic pointer addrcasted from its definition. This simplifies things a lot since the AST is not aware of the global address. Should we reconsider the handling of type id and type info here with a similar approach to how the user-defined global variables are handled? Or we are confident that the effect of global address can be confined.

It is a little concerning how far the global address will spread further. Compared to handling user-defined global variables, we keep the global address to its definition in the IR and any use of it will use the generic pointer addrcasted from its definition. This simplifies things a lot since the AST is not aware of the global address. Should we reconsider the handling of type id and type info here with a similar approach to how the user-defined global variables are handled? Or we are confident that the effect of global address can be confined.

This is a really good observation / question. I'm fairly confident this should be the last piece of noise, until (if) we decide to do something about functions. My reservation about address casting uses is that it might lead to a proliferation of casts, and it also appeared (when I tried) that it would be fairly spread out. type_info is special and a bit obnoxious because it is actually defined in the standard and has a mangled interface, so "lying" about the signature of its member functions in IR seems risky too. FWIW, this was silently incorrect in some cases today as well (without asserts we just do the bitcast, which happens to work on our target).

arsenm added inline comments.Aug 10 2023, 2:51 PM
clang/lib/CodeGen/CGCall.cpp
5237–5238 ↗(On Diff #548380)

you can also just unconditionally call CreateAddrSpaceCast and let it no-op if the types match

rjmccall added a comment.EditedAug 14 2023, 2:54 PM

The path of least resistance here is that IRGen should just insert conversions from the global AS to the default AS as part of evaluating typeid. I haven't looked at it closely, but that seems to be what this patch is doing.

However, std::type_info is an interesting special case in that we actually know statically that all objects of that type will be allocated in the global AS, so there's really no reason to pass around pointers in the default AS; std::type_info * should just default to being in the global AS. It'd be a non-trivial feature in support of a somewhat uncommonly-used C++ feature, and I can't tell how best to spend your time, *but*... if you're so inclined, I think it would make a somewhat compelling feature to be able to declare a default AS for a class type, which your target could then adopt in the headers for std::type_info.

AlexVlx updated this revision to Diff 550311.Aug 15 2023, 6:53 AM

Rework the patch as the proposed approach was unsound; keep typeid generic.

AlexVlx added a comment.EditedAug 15 2023, 6:57 AM

The path of least resistance here is that IRGen should just insert conversions from the global AS to the default AS as part of evaluating typeid. I haven't looked at it closely, but that seems to be what this patch is doing.

However, std::type_info is an interesting special case in that we actually know statically that all objects of that type will be allocated in the global AS, so there's really no reason to pass around pointers in the default AS; std::type_info * should just default to being in the global AS. It'd be a non-trivial feature in support of a somewhat uncommonly-used C++ feature, and I can't tell how best to spend your time, *but*... if you're so inclined, I think it would make a somewhat compelling feature to be able to declare a default AS for a class type, which your target could then adopt in the headers for std::type_info.

I've reworked things along these lines, as both you and @yaxunl suggested, thank you; turns out that what I was doing was unsound / would not catch all cases, whereas this does. As for the feature suggestion, that actually seems as if it would be very useful, beyond this application; I will add it to my TODO list, although I cannot promise to get to investigating it right away.

AlexVlx added inline comments.Aug 15 2023, 7:03 AM
clang/lib/CodeGen/CGCall.cpp
5237–5238 ↗(On Diff #548380)

I would've if I could've:) Sadly, CastIsValid returns false for address space casts between the same AS: https://github.com/llvm/llvm-project/blob/ca68a7f956f24aa3882134c5d8e72659355292dc/llvm/lib/IR/Instructions.cpp#L3895, and this makes asserts flare. I'm not sure if that is vestigial or just overly cautious behaviour.

AlexVlx updated this revision to Diff 550516.Aug 15 2023, 3:59 PM

Remove unneeded cast, the dynamic case already emitted a generic pointer to typeinfo

Gentle ping.

yaxunl accepted this revision.Aug 23 2023, 11:02 AM

LGTM. Thanks.

This revision is now accepted and ready to land.Aug 23 2023, 11:02 AM
This revision was landed with ongoing or failed builds.Aug 28 2023, 12:44 PM
This revision was automatically updated to reflect the committed changes.