This is an archive of the discontinued LLVM Phabricator instance.

[Clang][CodeGen]`vtable`, `typeinfo` et al. are globals
ClosedPublic

Authored by AlexVlx on Jun 15 2023, 6:07 PM.

Details

Summary

All data structures and values associated with handling virtual functions / inheritance, as well as RTTI, are globals and thus can only reside in the global address space. This was not taken fully taken into account because for most targets, global & generic appear to coincide. However, on targets where global & generic ASes differ (e.g. AMDGPU), this was problematic, since it led to the generation of invalid bitcasts (which would trigger asserts in Debug) and less than optimal code. This patch does two things:

  • ensures that vtables, vptrs, vtts, typeinfo are generated in the right AS, and populated accordingly;
  • removes a bunch of bitcasts which look like left-overs from the typed ptr era.

This is a bit more noisy than I'd have liked, but functionality is somewhat spread out. There's one bit of less than ideal code, stemming from the fact that functions are in the generic AS, and thus it's necessary to insert a constexpr cast from generic to global when populating the vtable. Adjusting appears disruptive enough to prefer to do it separately (unless I missed something obvious).

Diff Detail

Event Timeline

AlexVlx created this revision.Jun 15 2023, 6:07 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 15 2023, 6:07 PM
AlexVlx requested review of this revision.Jun 15 2023, 6:07 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 15 2023, 6:07 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript
AlexVlx updated this revision to Diff 532470.Jun 18 2023, 6:00 AM

clang-format

Gentle ping.

AlexVlx updated this revision to Diff 534380.Jun 25 2023, 4:16 PM

Fixed issue found via internal testing (thanks @yaxunl).

Fixed issue found via internal testing (thanks @yaxunl).

Can we add a test to cover the regression found via internal testing? Thanks.

AlexVlx updated this revision to Diff 534670.Jun 26 2023, 11:34 AM

Add missing test for vtable initializers on the __device__ side.

Fixed issue found via internal testing (thanks @yaxunl).

Can we add a test to cover the regression found via internal testing? Thanks.

Done, I had forgotten about that.

This could be a good chance to switch VT to constant address space instead of global address space. AFAIK if a target has global addr space they usually also has constant addr space since they usually support OpenCL or CUDA/HIP. Is there any reason we cannot introduce a CGM.ConstantGlobalsInt8PtrTy and use it for VT instead?

This could be a good chance to switch VT to constant address space instead of global address space. AFAIK if a target has global addr space they usually also has constant addr space since they usually support OpenCL or CUDA/HIP. Is there any reason we cannot introduce a CGM.ConstantGlobalsInt8PtrTy and use it for VT instead?

I did give this some thought and the benefits are somewhat unclear to the point of being ultimately counterproductive. Note that these are already marked constant, which IIRC is / was going to be enough to get most of the benefits, at least on our back-end. Furthermore, the semantics of the constant address space are a bit weird in something like OpenCL e.g. A pointer that points to the constant address space cannot be cast or implicitly converted to the generic address space.. This would lead to weirdness when composing with CUDA / HIP, where constant is treated as device, which is to say global. IIRC, you are also meant to use magical interfaces to write into constant from the host, which a loader wouldn't necessarily do. Overall, I think that the OCL formulation of constant is actually meant to allow for relatively strange things like loading things into ROM or having different pointer types (be it width or canonicity). TL;DR, I am concerned that a target could validly have the constant addr space be disjoint from generic/flat, with no viable way to even cast between the two. We could say “yes, but this is not that constant”, but then if OCL ever starts supporting dynamic polymorphism it would get confusing.

yaxunl accepted this revision.Jun 30 2023, 8:27 AM

This could be a good chance to switch VT to constant address space instead of global address space. AFAIK if a target has global addr space they usually also has constant addr space since they usually support OpenCL or CUDA/HIP. Is there any reason we cannot introduce a CGM.ConstantGlobalsInt8PtrTy and use it for VT instead?

I did give this some thought and the benefits are somewhat unclear to the point of being ultimately counterproductive. Note that these are already marked constant, which IIRC is / was going to be enough to get most of the benefits, at least on our back-end. Furthermore, the semantics of the constant address space are a bit weird in something like OpenCL e.g. A pointer that points to the constant address space cannot be cast or implicitly converted to the generic address space.. This would lead to weirdness when composing with CUDA / HIP, where constant is treated as device, which is to say global. IIRC, you are also meant to use magical interfaces to write into constant from the host, which a loader wouldn't necessarily do. Overall, I think that the OCL formulation of constant is actually meant to allow for relatively strange things like loading things into ROM or having different pointer types (be it width or canonicity). TL;DR, I am concerned that a target could validly have the constant addr space be disjoint from generic/flat, with no viable way to even cast between the two. We could say “yes, but this is not that constant”, but then if OCL ever starts supporting dynamic polymorphism it would get confusing.

I agree that constant address space may be target dependent about what can be put there. I also agree there is

This could be a good chance to switch VT to constant address space instead of global address space. AFAIK if a target has global addr space they usually also has constant addr space since they usually support OpenCL or CUDA/HIP. Is there any reason we cannot introduce a CGM.ConstantGlobalsInt8PtrTy and use it for VT instead?

I did give this some thought and the benefits are somewhat unclear to the point of being ultimately counterproductive. Note that these are already marked constant, which IIRC is / was going to be enough to get most of the benefits, at least on our back-end. Furthermore, the semantics of the constant address space are a bit weird in something like OpenCL e.g. A pointer that points to the constant address space cannot be cast or implicitly converted to the generic address space.. This would lead to weirdness when composing with CUDA / HIP, where constant is treated as device, which is to say global. IIRC, you are also meant to use magical interfaces to write into constant from the host, which a loader wouldn't necessarily do. Overall, I think that the OCL formulation of constant is actually meant to allow for relatively strange things like loading things into ROM or having different pointer types (be it width or canonicity). TL;DR, I am concerned that a target could validly have the constant addr space be disjoint from generic/flat, with no viable way to even cast between the two. We could say “yes, but this is not that constant”, but then if OCL ever starts supporting dynamic polymorphism it would get confusing.

Sounds reasonable.

LGTM. Thanks.

This revision is now accepted and ready to land.Jun 30 2023, 8:27 AM

Thank you @yaxunl. @rjmccall / @efriedma any input on this? I'd like to try landing it next week to unblock some additional work. Thanks!

This revision was landed with ongoing or failed builds.Jul 19 2023, 10:05 AM
This revision was automatically updated to reflect the committed changes.
efriedma added inline comments.Jul 19 2023, 1:02 PM
clang/lib/CodeGen/CGVTables.cpp
836

If I follow correctly, the fnPtr is guaranteed to be in the global address-space, but GetAddrOfFunction returns a generic pointer. So the vtable entries are in the global address-space for efficiency? Seems reasonable.

There isn't really much point to explicitly checking FnAS != GVAS before the call to getAddrSpaceCast; getAddrSpaceCast does the same check internally.

AlexVlx added inline comments.Jul 19 2023, 1:15 PM
clang/lib/CodeGen/CGVTables.cpp
836

Right, we know that these are going to be in global memory, and there is overhead when dealing with flat/generic. I would've liked to remove the check, but I think that's infeasible with how getAddrSpaceCast is currently implemented, because it asserts on castIsValid; addrspacecasts from the same AS to the same AS are invalid, so the assert flares on targets where FnAS == GVAS (e.g. x86). This is not super ergonomic IMHO, as it should be valid & a NOP just returning the source, but that's a change that would be required to allow deleting the silly check, I believe.

bjope added a subscriber: bjope.Jul 19 2023, 5:07 PM
bjope added inline comments.
clang/lib/CodeGen/CGVTables.cpp
693

I noticed that we have some old fixes downstream that conflicts with the changes you've made here. I thought that perhaps we could get rid of those now when you've fixed the code upstream.

Isn't the VTable holding function pointers when not using the relative layout, and then this should be a pointer to the ProgramAddressSpace and not a pointer to the DefaultGlobalsAddressSpace?

Downstream we've been using a special FnVoidPtrTy here. Defined as FnVoidPtrTy = Int8Ty->getPointerTo(DL.getProgramAddressSpace());.

rjmccall added inline comments.Jul 19 2023, 11:20 PM
clang/lib/CodeGen/CGVTables.cpp
693

It's a mix. The type_info pointer should be in the global address space (although it would be forgivable to just use the default address space), the top, vbase, and vcall offsets are all ptrdiff_ts (presumably the same size as the default address space), and the virtual functions are function pointers. If we're going to support a target where those can be different sizes, we probably need to start computing a byte layout of the v-table and doing byte GEPs into it, because our current IR patterns are naively assuming the components are all the same size.