If the alloca is accessed through an addrspacecasted pointer, allow
the normal changes on the alloca. Cast back to the original use
address space instead of the new alloca's natural address space.
Details
Diff Detail
Event Timeline
I don't have opinions on this, but Philip might have comments around whether this is okay for GC pointers.
No opinion GC wise, but mostly because I don't understand the intent of this patch. What is the desired outcome here?
For an equivalent bitcast, the type of the underlying alloca is changed into a nicer type and possibly split into multiple allocas (with volatile accesses). I conservatively added checks in r363462 to avoid using the alloca's natural address space for the new operations. I figure it's safer to not change the address space of volatile accesses, but I don't particularly care about this property. InferAddressSpaces avoids changing volatiles, but that was also my idea.
So, just to say this back to you, this would be treating an addrspacecast more like a bitcast when the source is known to be an alloca?
Honestly, this feels more than a bit suspect to me, but I can't argue against it on a principled basis. Semantics for addrspacecast are so ill specified that it's hard to say what is and isn't legal in terms of transforms. Just to be clear, the critical detail here to me is "how do we handle addrspacecast", not "how to do handle volatile". From what I can tell, the existing code for non-volatiles would already have surprising semantics. I don't see any need to have *different* semantics for volatile vs non-volatile accesses to an addrspacecast derived pointer. The asc either had semantics, or it didn't. Whether an access is later volatile shouldn't change that.
So, if anything, I'd either lean towards a more-aggressive transform which changed the AS for the volatile access, or not splitting through an addrspacecast at all. The intermediate semantics feel odd.
In the original patch for addrspacecast support in SROA, the only questions that came up were surrounding overflow behavior which is currently not specified in the LangRef. As for interchanging accesses in different address spaces, I think that is clearly allowed. The original and result pointer still need to access the same memory, so changing the address space or eliminating the access should be OK. I think volatile is the question here because different address spaces may have different caching behavior or something along those lines
It seems this patch contradicts the proposed LangRef patch. The LangRef patch says that you can't assume that a volatile operation yields the same result in different spaces.
And I think that's a fair condition. Otherwise we need to document that constraint for hardware vendors and change LangRef. Pick one of the patches basically :)
This isn't contradictory. This is leaving the address space untouched, and doing the other simplifications SROA does (like replace byte arrays with int). A non-volatile access in this case would have been rewritten to point to the original alloca address space
Ah, my bad, I misread the summary.
The patch keeps the AS of the original instructions.
So the concept looks correct to me. But I don't know enough of the SROA's code to know if all transformations that are done are ok. The test coverage doesn't look very extensive.
Looks reasonable. I assume the case where this would be actually useful if an alloca gets split into parts that have volatile and non-volatile accesses, and the non-volatile accesses then get promoted?
llvm/lib/Transforms/Scalar/SROA.cpp | ||
---|---|---|
3033 | Can't we use getPtrToNewAI here and keep the rest of the code structure? | |
llvm/test/CodeGen/AMDGPU/flat-address-space.ll | ||
132 | Can we use a proper pointer here to avoid making the store UB? (A null pointer is probably fine, given the non-zero address space.) | |
llvm/test/Transforms/SROA/addrspacecast.ll | ||
177 | Why does this load not get promoted (to undef)? | |
182 | Test needs to be converted to opaque pointers. (We should probably add a check to guard against mixed opaque + typed pointer tests, where the typed pointers get promoted.) |
Yes, but I was mostly noticing the missing cleanup up of the aggregate types
llvm/lib/Transforms/Scalar/SROA.cpp | ||
---|---|---|
3033 | I'm not sure what you're asking here, the new code does use it below | |
llvm/test/Transforms/SROA/addrspacecast.ll | ||
177 | I don't know, but I see the same thing without the patch and without address spaces or volatiles involved |
This looks good to me but I feel I'm not familiar enough with SROA to approve the patch.
Is there sufficient test coverage for this?
Did you check and ensure that this works as intended
on the addressspace-heavy code and does not miscompile it?
Every AMDGPU alloca has a non-0 address space, and this passed regular testing at some point (which includes some specific I-want-to-break-the-stack volatile tests). Volatile stack objects aren't super common for GPU code since that kind of defeats the point
Can't we use getPtrToNewAI here and keep the rest of the code structure?