When lowering from a HLL that does not use explicit generic address spaces (e.g. HIP) to target specific IR for a target that uses explicit address spaces (e.g. AMDGPU), it is possible for an explicitly placed pointer to be passed as an argument to a function taking a generic pointer. An example of this situation is when GEPs are formed into VTTs. This flares in Debug builds (since a bitcast would be invalid) and, possibly, at runtime on targets where the bitwise representation of pointers in different address spaces is not identical.
Details
Diff Detail
Event Timeline
Thank you for the fix! Please be sure to add test coverage for the changes and a release note for the fix.
Adding the codegen code owners for review. FWIW, precommit CI is failing to build the patch.
Corrected typo that was breaking the build; rewrote the AS-using branch to account for the (unsupported, but possible, I think?) case of typed pointers, which was showing up in some tests that haven't been ported to opaque pointers.
Thank you and apologies for the noise. I will add test coverage (shied away from doing it as I ran into this in a fairly oblique context, but turns out there are some existing tests that hit pretty close to home and I'll start from those).
At first glance, this seems like the wrong place to put this cast. If an expression in the AST produces a pointer with generic pointer type, then CodeGen should generate code for that expression that returns a generic pointer type. We shouldn't wait until the pointer is used to force a cast to the correct type.
I agree with Eli; if the address space is wrong coming into this, we're doing something wrong at a more basic level.
Thanks for the feedback. I believe what is going on is that for something like HIP, unless one goes outside of the language / touches builtins or intrinsics, one will never obtain anything but generic pointers as functiona arguments. However, it is possible to bind e.g. a global (the VTT case I mentioned), and we emit those in the global address space, so there's a bit of impedance mismatch for lack of a better word. I guess two alternative approaches here could be either: a) hoist the cast into GetVTTParameter, so that the GEP is formed over a casted pointer; b) modify buildStructorSignature so that it inserts VTT as a pointer to a pointer to void* in the target's GlobalAS. Seems like the latter is closer to the spirit of what is being suggested, unless I'm reading you wrong?
Passing the VTT in the appropriate AS for global variables seems like the right way to go — we do know that that argument will always be such a pointer, so there's no point in converting to the generic address space.
For that matter, v-table pointer slots within classes should also be pointers into the global AS and not the generic AS. That's a separate issue, though.
Sounds good, I will rework this patch; I have a set of forthcoming patches dealing with the second paragraph, FWIW:)
Reworked in accordance with review comments, correcting the embedded assumption about VTT being always in the generic AS. This ended up being slightly noisier than anticipated since functionality is spread out.
Can you add a test? I think we have some in-tree targets which put globals in a non-default address space.
clang/lib/CodeGen/ItaniumCXXABI.cpp | ||
---|---|---|
1679–1681 | Please just use GetGlobalVarAddressSpace here; we should try to avoid these reverse-mappings. |
Use GetGlobalVarAddressSpace everywhere. Added test, which I'm marking XFAIL for now since it turns out I'll need to fix the remaining vptr / vtable handling parts, yay.
I've added the test (think we, i.e. AMDGPU, are the only ones doing this at the moment; both SPIRV & PTX use generic, and I've not checked DXIL since it doesn't seem like a target for this use case), but I'd like to suggest that we keep it disabled until I fix the other parts of handling this (vtables & vptrs), since as it is it will just produce invalid bitcasts in the IR. Would that be acceptable? Thanks.
It's well-formed as IR, just not semantically valid, right? As long as it's not actually crashing in the verifier, please test as much as you reasonably can that's correct, like that the constructor and destructor functions take something in the right address space, and that calls to them pass a valid VTT. And then yeah, when you pull a v-table out and try to assign it, that much will be incorrect until your next fix, but that's okay.
Apologies, should have clarified: it is not well-formed IR in that it bitcasts pointers in different address spaces (e.g. vtable or typeinfo initializers are incorrectly typed as global array of generic pointers), so it will crash the verifier - think this https://gcc.godbolt.org/z/shfobqMqY captures most (including the behaviour this patch addresses). We "get away" with this in AMDGPU because this is a seldom if ever exercised code path and, more importantly, because the bitwise representation of generic & global pointers is the same, so there are no actual runtime bugs. My intention was to fix each of the issues in separate, isolated patches, fill out this test with additional checks since it captures all the problematic cases, and re-enable it as part of the final patch.
I see. Yes, in that case I think you're right that we can't test this yet — only base-subobject ctors and dtors get VTT parameters, we only emit calls to those from other ctors and dtors, and those ctors and dtors will always have their own stores to the v-table slot that will be invalid IR. So I agree with the plan of building up an XFAILed test. You should be able to locally build up that test by just adding an early return to CodeGenFunction::InitializeVTablePointer — obviously that will break a bunch of other tests, but as long as you don't commit it, you can create a fairly complete test case that at least should work in the future. I find it helpful to do that as you're going instead of going back at the end of a patch series and trying to add tests for all the cases you implemented. Please test (1) the declarations of base-subobject ctors/dtors, (2) the initial passing of the VTT argument from complete-object to base-subobject ctors/dtors, and (3) the forwarding of VTT arguments in base-subobject ctors/dtors to the ctors/dtors of their own base subobjects.
The actual code LGTM.
Apologies for the delay in resuming this. I've updated the test to be more comprehensive. FWIW I do not have commit rights, should I just follow the procedure here https://www.llvm.org/docs/DeveloperPolicy.html#obtaining-commit-access, or have a few initial patches committed by one of my colleagues & then ask for commit access?
The process is pretty lightweight; I'd recommend just doing it if you expect to make more than one patch. But we don't have a policy against committing patches for other people as long as there's clear attribution in the commit.
I do think you could successfully test the basic code patterns with passing VTT arguments. Your test is solely testing that these constructors and destructors are being declared with the right signature, but it doesn't test any calls to them.
Thanks, I've sorted the commit access since this (and some associated work) will entail multiple patches. I would like to clarify the second point to make sure that I'm incorporating that suggestion - the checks on lines 25, 26 & 27 are for actual calls - since now the signature matches the VTT type there's no arg setup prior to the call, it's a direct passing of the global. Were you thinking about something else here / am I missing somethign obvious?
clang/lib/CodeGen/ItaniumCXXABI.cpp | ||
---|---|---|
1630–1632 | Does it worth extracting the code as Context.getVTTType() since it is used at three locations. Since VTT seems to be immutable, in case we want to put it in constant addr space in the future, it will make things easier. |
clang/lib/CodeGen/ItaniumCXXABI.cpp | ||
---|---|---|
1630–1632 | That's not a bad idea. I think it might be profitable to do something like Context.getVTableType(), since there's actually 2.5 interlinked things here (VTT, vtable & vptr), and it makes intuitive to me to base it all around that (vptr points to vtbl) etc. |
I would like to clarify the second point to make sure that I'm incorporating that suggestion - the checks on lines 25, 26 & 27 are for actual calls - since now the signature matches the VTT type there's no arg setup prior to the call, it's a direct passing of the global. Were you thinking about something else here / am I missing somethign obvious?
Oh, sorry, somehow I glazed over those three lines.
clang/lib/CodeGen/ItaniumCXXABI.cpp | ||
---|---|---|
1630–1632 | Actually, I think I should probably do this in a separate patch; we already have getVtblPtrAddressSpace() in TargetInfo, but it doesn't seem to be used anywhere but for generating debug info. I suspect the latter will be subtly wrong for us anyway, because we'll advertise the constant AS, even though we're today using global. TL;DR, I'd handle this suggestion separately @yaxunl, if you don't mind. |
clang/lib/CodeGen/ItaniumCXXABI.cpp | ||
---|---|---|
1630–1632 | OK for me |
Does it worth extracting the code as Context.getVTTType() since it is used at three locations. Since VTT seems to be immutable, in case we want to put it in constant addr space in the future, it will make things easier.