Currently, the compiler crashes in instruction selection of global load/stores in gfx600 due to the lack of FLAT instructions. This patch fix the crash by selecting MUBUF instructions for global load/stores in gfx600.
Details
Diff Detail
Event Timeline
| llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp | ||
|---|---|---|
| 198 | Is this asking if the code contains use of the generic address space? | |
| 199 | This seems a very strange way for carrying this knowledge to the later code. Why not use a separate variable with a clear name? | |
| llvm/test/CodeGen/AMDGPU/si-global-buffer.ll | ||
| 1–18 ↗ | (On Diff #308957) | I think it would be better to add a run line to existing memory-legalizer tests rather than have a separate test. gfx600 now supports the memory model and so can now be tested by the memory-legalizer tests. | 
| llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp | ||
|---|---|---|
| 130–132 | This isn't a generation check, and should be the same as the above addr64 check | |
| 132 | This string search isn't actually a reliable test but this is already used | |
| 133 | I'm not sure printing a warning here is particularly helpful. It's not going to be reported in any meaningful way. A proper Diagnostic would be better, but I don't think this would be the right place to emit it. | |
Comments inline
| llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp | ||
|---|---|---|
| 130–132 | I'm not sure I follow your comment, this patch uses MUBUF instructions only for gfx60x. The addr64 check returns true for both southern_island and sea_island processors. I think if we conditionally check only for hasAddr64, that may not be enough to handle the case where (FlatForGlobal = false, for SOUTHERN_ISLANDS and FlatForGlobal = true, for SEA_ISLANDS). | |
| 133 | I will remove this check and warning here. | |
| 198 | No, this isn't a check for generic address space. The GPU string is set to a string "generic" by default when no target cpu is specified. It has nothing to do with address spaces access. | |
| 199 | In the previous version for this piece of code, the Gen member of this class is initialized to sea_islands when OS is HSA and southern_islands for all other cases and then the requested 'gen' is picked by the tablegen generated function "ParseSubtargetFeatures" in initializeSubtargetDependencies. The above mentioned behavior is not correct, because it initializes the Gen is sea_islands for "-mtriple=amdgcn-amd-amdhsa -mcpu=gfx600" where the correct "Gen" would be southern islands. This piece of change fixes this bug, and also preserves the existing functionality. I don't think we need a separate variable since this is the only place we are handling. please correct me if I'm wrong. | |
fixed a typo in comments.
| llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp | ||
|---|---|---|
| 199 | Sorry there is a typo, in the second passage it should be: " The above mentioned behavior is not correct, because it initializes the Gen to  | |
Removed error reporting based on string comparison. Updated the memory legalizer tests to include amdhsa for gfx60x
| llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp | ||
|---|---|---|
| 198 | I would probably just error out if we have a generic CPU + amdhsa requested. It seems cleaner. | |
| llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp | ||
|---|---|---|
| 198 | Thanks for your comment, this patch preserves the earlier functionality except the use of mubuf. Perhaps we can address your suggestion in a separate patch with some added context, and exploring the possibilities of reporting the error in other target-triple, cpu pairs? | |
| llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp | ||
|---|---|---|
| 198 | It is OK to have it in a separate commit, but I do not follow logic here. You are setting SI for any NON-generic chip. | |
| llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp | ||
|---|---|---|
| 198 | For a non-generic cpu, the correct Gen member is set by the ParseSubtargetFeatures function call (generated by tablegen) in initializeSubtargetDependencies function. It is similar to the previous version of this code, where Gen is set to SI if OS is not HSA and set to CI if OS is HSA and correct Gen is set by the call to ParseSubtargetFeatures function. | |
| llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp | ||
|---|---|---|
| 198 | Then this line is a no-op? You say: "for a non-generic CPU gen is initialized elsewhere". An this change only forces SI for a non-generic CPU. | |
| llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp | ||
|---|---|---|
| 198 | Actually there are bunch of lit tests which assumes Gen as CI if cpu isn't specified with AMDHSA OS. This change just means for HSA OS if the cpu isn't specified it initializes to CI, for non-HSA if cpu isn't specified it initializes to SI.  | |
| llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp | ||
|---|---|---|
| 198 | I still do not understand how does that correspond to the condition. That is the old condition: TT.getOS() == Triple::AMDHSA ? SEA_ISLANDS : SOUTHERN_ISLANDS And that is literally what you have described: "This change just means for HSA OS if the cpu isn't specified it initializes to CI, for non-HSA if cpu isn't specified it initializes to SI". At the same time this code: !GPU.contains("generic") ? SOUTHERN_ISLANDSReads: use SI for any CPU besides generic regardless of the triple. | |
Comments inline
| llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp | ||
|---|---|---|
| 198 | Sorry for the delay. I think I didn't explain it very well in my previous comments. So I take an attempt to describe here:  TT.getOS() == Triple::AMDHSA ? SEA_ISLANDS : SOUTHERN_ISLANDS for HSA OS type and -mcpu=gfx600 the above code forces Gen member variable to be SEA_ISLANDS where it should have been SOUTHERN_ISLANDS. This bug only shows up when you try to use HSA OS for -mcpu=gfx600 processors. What I'm trying to do is initialize the Gen member variable in following cases: if mcpu is specified, set Gen to the correct generation that cpu belongs to, by initializing it with SOUTHERN_ISLANDS. if mcpu is not specified (in this case GPU = "generic-hsa"), set Gen to the SEA_LANDS to be in parity with previous code I didn't see how I can pick up the correct generation that the processor belong to without forcing this in constructor. I would like to know you thoughts on this?  | |
You are forcing SI for any SPECIFIED GPU instead:
!GPU.contains("generic") ? SOUTHERN_ISLANDSRead this code line: if CPU is NOT generic, force it to SI.
Ugh. I see what you mean now, it took debugger to understand. The real value of Gen is assigned below in the GCNSubtarget::initializeSubtargetDependencies().
But I still do not think this is a right thing to do to accept amdhsa on SI, even if you turn off flat instructions. SI does not support HSA.
As far as I the support of HSA in SI processors. Tony suggested that gfx60x processors should support AMDHSA OS because it is capable of supporting OpenCL 1.2 which not need generic pointers. This topic was discussed in amdgcn weekly and using MUBUF instructions for global address space in gfx60x is decided. The same is also documented in AMDGPU User Guide earlier.
But it is not. In fact AMDGPUUsage.rst says that SI does NOT support amdhsa and it is only supported starting from gfx700:
**GCN GFX6 (Southern Islands (SI))** [AMD-GCN-GFX6]_
-----------------------------------------------------------------------------------------------------------------------
``gfx600``  - ``tahiti``    ``amdgcn``   dGPU                    - Does not      - *pal-amdpal*
                                                                   support
                                                                   generic
                                                                   address
                                                                   space
``gfx601``  - ``pitcairn``  ``amdgcn``   dGPU                    - Does not      - *pal-amdpal*
            - ``verde``                                            support
                                                                   generic
                                                                   address
                                                                   space
``gfx602``  - ``hainan``    ``amdgcn``   dGPU                    - Does not      - *pal-amdpal*
            - ``oland``                                            support
                                                                   generic
                                                                   address
                                                                   spaceIt is only amdpal listed here.
What was discussed is that we can use buffer instructions for global on SI even if amdhsa is not supported.
At one point, while you were on vacation, the consensus was "support amdhsa OS with gfx6, and fail to compile in the presence of e.g. generic pointers".
Has there been more discussion since then?
Yeah, support amdhsa OS
@scott.linder there is no discussion since then.
@rampitec the final consensus is to use buffer instructions for global address space accesses in SI in presence of amdhsa. In other OS types for SI, backend generates buffer instructions for global addr space access already.
I updated AMDGPUUsage to reflect the current state which is that gfx6* does not support amdhsa. But if this patch allows gfx6* to be supported without generic addresses then that table should also be updated as part of the patch.
We did discuss this further and decided that gfx6* should report it supports amdhsa with the restriction that it does not support generic addresses. AMDGPUUsage already notes the restriction in all relevant sections. The LIT tests would also need updating accordingly. The COMgr tests would also need updating.
The alternative would be that the compiler reports an error message for gfx6* if the triple specified amdhsa (rather than crash as it does currently). But it was felt that the amdhsa ABI can allow targets that do not support generic to be included provided it is documented. There is no need to artificially restrict them as a user may want to try using the target on an HSA complaint runtime to implement OpenCL 1.2.
It does not allow for generic pointers to be used on SI.
We did discuss this further and decided that gfx6* should report it supports amdhsa with the restriction that it does not support generic addresses. AMDGPUUsage already notes the restriction in all relevant sections. The LIT tests would also need updating accordingly. The COMgr tests would also need updating.
AMDGPUUsage does not list amdhsa for all 3 SI targets, yet it is going to be accepted with this patch. If that is so then amdhsa has to be listed in the table for these targets.
The alternative would be that the compiler reports an error message for gfx6* if the triple specified amdhsa (rather than crash as it does currently). But it was felt that the amdhsa ABI can allow targets that do not support generic to be included provided it is documented. There is no need to artificially restrict them as a user may want to try using the target on an HSA complaint runtime to implement OpenCL 1.2.
The table is indicating what runtimes support which ABIs. Since ROCm does not support gfx6* it would still be correct to not list it in the table. It may be useful to have another column to indicate which ABIs are supported by which processors.
Pfff... I do not like it, but OK.
| llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp | ||
|---|---|---|
| 199 | Please at least remove the parenthesis and reformat this statement. | |
I am not completely happy with it either, but wanted to address questions of which runtimes support which processors. So I looked at what ROCr, VDI and PAL runtimes currently support and added that to the column that already existed but was woefully out of date and did not include PAL information. MESA information is also still missing.
| llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp | ||
|---|---|---|
| 199 | I think I finally understand what this is doing, but I still think it is worth trying this another way as it seems to have confused each of Stas, Tony, and I to some extent. I don't think this is Praveen's doing, the underlying mechanisms we use for this are just not very intuitive. What I'd suggest is adding a new INVALID = 0 variant to enum AMDGPUSubtarget::Generation, and changing this initializer to Gen(INVALID). Then in GCNSubtarget::initializeSubtargetDependencies, after the call to ParseSubtargetFeatures, check for (Gen == INVALID), and implement whatever "generic" CPU stuff we want. I'd also accompany both places (the enum and the code after ParseSubtargetFeatures) with a summary of the situation in a comment. Maybe:   class AMDGPUSubtarget {
  public:
    enum Generation {
~     // INVALID is only used as an initial value before `ParseSubtargetFeatures`,
~     // and as a way to detect the "generic" processor case.
~     INVALID,
~     R600,
~     R700,
~     EVERGREEN,
~     NORTHERN_ISLANDS,
~     SOUTHERN_ISLANDS,
~     SEA_ISLANDS,
+     VOLCANIC_ISLANDS,
+     GFX9,
+     GFX10
    };     ParseSubtargetFeatures(GPU, /*TuneCPU*/ GPU, FullFS);
+   // Implement the "generic" processors, which act as the default when no
+   // generation features are enabled (e.g. for -mcpu=''). There are two variants
+   // of the generic processor, one for HSA and one for everything else.
+   if (Gen == INVALID) {
+     Gen = TT.getOS() == Triple::AMDHSA ? SEA_ISLANDS : SOUTHERN_ISLANDS;
+   }Thoughts? If this doesn't seem appreciably better to everyone I'm also OK with the patch as-is. As a separate issue, which we can discuss and address in a separate patch if we choose to do anything, is why we need/want the "generic" processor to actually be two different processors, with the decision being based on the OS. Tony suggested, and I think I agree, that we should probably just pick one generation that is the "lowest common denominator" that supports the major things we need. As the flat address space is pretty fundamental why not just have this be SEA_ISLANDS? Then we just have: if (Gen == INVALID) {
  Gen = SEA_ISLANDS;
}Doing this breaks about a dozen lit tests, which I assume can all be fixed by either adding -mcpu=gfx600 (i.e. they no longer test "generic") or updating some check lines. | |
I like what Scott describes as it seems intuitively obvious and cleaner. Are there any concerns for making these changes?
Updated the code according to scott's suggestion. And I tried to compile sample program with all possible combinations of processors and ABIs( amdhsa, mesa3d, amdpal), it seems like the backend supports all the ABIs for all processors. So Instead of adding a separate column, (since values of processors will be the same) I added a text pharse above the processor table which documents. Is it right thing to do so?
thanks
| llvm/docs/AMDGPUUsage.rst | ||
|---|---|---|
| 105 | I suggest: <p>Every processor supports every OS ABI (see :ref:`amdgpu-os`) with the following exceptions: * `amdhsa` is not supported on the `r600` architecture (see :ref:`amdgpu-architecture-table`). | |
Updated the AMDGPUUsage doc. Also I don't have commit access, please commit this patch on my behalf. 
Thanks a lot your time.
| llvm/docs/AMDGPUUsage.rst | ||
|---|---|---|
| 108 | Need blank line before start of list for reStructure to format it correctly. | |
| llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp | ||
|---|---|---|
| 93 | Seems +flat-for-global should not be added if SI. Suggest pulling up later logic to here and combine. | |
| 115–116 | Should we consider using AMDGPUSubtarget::SEA_ISLANDS as the "generic" processor for all OSs? That can be a separate patch. | |
| 125–134 | Would seem better to combine these? Perhaps move this up to be adjacent or combined with code above that sets flat-for-global. | |
| llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp | ||
|---|---|---|
| 93 | But the gen member variable is set to correct generation only after the call to parsesubtargetfeatures function, so it would be incorrect to use getGeneration function in this place. I think it would be better to conditionally select later. | |
| llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp | ||
|---|---|---|
| 93 | But could that earlier code be using some other means to determine if the target supports flat operations? Is there no target property for that? If not then leaving "as is" is fine with me. Just would be nice if the logic for each aspect was logically organized together rather than spread over different places making it hard to figure out what is happening. | |
If this is fine, place land. I don't have commit access yet.
| llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp | ||
|---|---|---|
| 93 | As far as I know there is no target property for that. I agree that it would be very much clear to have a common place for this mechanism | |
| llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp | ||
|---|---|---|
| 93 | There is FeatureFlatAddressSpace, but it will also only be set as part of ParseSubtargetFeatures. I'm not sure why this needs to be here, unless some parts of ParseSubtargetFeatures depend on these features being set or not? Otherwise this could all be done below in one place. | |
| llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp | ||
|---|---|---|
| 93 | I suppose the issue is that putting these after the call to ParseSubtargetFeatures makes allowing it to be overridden by the user messier? It seems like it might be nice to just update TableGen to also maintain a bitmap of explicitly set features. Then we can avoid any string manipulation here (all of these things can move below the call, and can just make the default conditional on ExplicitySet.test(FeatureFoo)). That way everyone has a uniform "escape hatch" for implementing things like mutual exclusion, or more complicated "implies" relationships, without all the string manipulation and comparison. | |
| llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp | ||
|---|---|---|
| 93 | It seems good to maintain a separate bitmap in tablegen but that should be separate patch since it is a refactor and applies to all features. | |
| llvm/lib/Target/AMDGPU/AMDGPUSubtarget.cpp | ||
|---|---|---|
| 92 | Suggest: // Turn on features that HSA ABI requires. Also turn on FlatForGlobal by // default. if (isAmdHsaOS()) | |
| 97–105 | I do not understand how this is supposed to work. The code does not seem to match the comment. The parser seems to convert all these features into a single WavefrontSizeLog2 so last + one would win. So why is this even needed? | |
| 111–113 | Suggest: // Implement the "generic" processors, which acts as the default when no // generation features are enabled (e.g for -mcpu=''). HSA OS defaults to the // first amdgcn target that supports flat addressing. Other OSes default to // the first amdgcn target. | |
| 122–132 | Suggest: // Targets must either support 64-bit offsets for MUBUF instructions, and/or
// support flat operations, otherwise they cannot access a 64-bit global
// address space.
assert(hasAddr64() || hasFlat());
// Unless +-flat-for-global is specified, turn on FlatForGlobal for targets
// that do not support ADDR64 variants of MUBUF instructions. Such targets
// cannot use a 64 bit offset with a MUBUF instruction to access the global
// address space.
if (!hasAddr64() && !FS.contains("flat-for-global") && !FlatForGlobal) {
  ToggleFeature(AMDGPU::FeatureFlatForGlobal);
  FlatForGlobal = true;
}
// Unless +-flat-for-global is specified, use MUBUF instructions for global
// address space access if flat operations are not available.
if (!hasFlat() && !FS.contains("flat-for-global") && FlatForGlobal) {
  ToggleFeature(AMDGPU::FeatureFlatForGlobal);
  FlatForGlobal = false;
}Also add the following to llvm/lib/Target/AMDGPU/AMDGPUSubtarget.h since that is the way other similar predicates are defined: bool hasFlat() const {
  return (getGeneration() > AMDGPUSubtarget::SOUTHERN_ISLANDS);
} | |
I suggest: