This reduces register pressure and helps avoid some unnecessary waits
as well.
Details
- Reviewers
• tstellarAMD
Diff Detail
Event Timeline
You also need to add lit tests for this. Take a look in test/CodeGen/AMDGPU for examples.
lib/Target/AMDGPU/AMDGPUInstrInfo.cpp | ||
---|---|---|
326–327 | Coding style braces go on the same line as function signature. | |
lib/Target/AMDGPU/SIISelLowering.cpp | ||
2098–2099 | Coding style. | |
2128–2131 | Coding style for the brace, and also function arguments should go on the same line. You should reivew. http://llvm.org/docs/CodingStandards.html You can also run clang-format on the file to get the correct style, but I'm sure there are other errors in the file, so this will probably update more than just your code. | |
lib/Target/AMDGPU/SIInstrInfo.td | ||
2787–2788 | You only need to create the mapping for the pseudo instructions, so I think this means you can drop isPseudo and Subtarget from the list. | |
lib/Target/AMDGPU/SIInstructions.td | ||
921 | Interesting, I did not now it was possible to inherit from multiclasses and classes, but as mentioned above, I think the pseudo instructions should inherit from ResizableLoad instead. |
I think this should be done on the IR intrinsics in instcombine. Is there some advantage to doing it in the DAG? We don't produce formatted loads for any other reason
Matt, can you give me some hints as to how to go about this? Keep in mind that things are icky when reducing to a single f32, because we need to bypass the EXTRACT_SUBREG.
Do you think adjustWritemask (for texture sampling) should be moved analogously? There may be a reason why that is done where it's done.
lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
2128–2131 | Heh. I actually read those standards, and produced what I feel is the most readable code following the rules spelled out in there (e.g., the braces thing is not actually in there). Function arguments on the same line is difficult and counter-productive for readability with the 80 columns limit (and also not in that document), etc. Anyway, thanks for telling me about clang-format (yet another thing not mentioned in that document...) - I'll run that later and see what it has to say. | |
lib/Target/AMDGPU/SIInstrInfo.td | ||
2787–2788 | True, I only need them for the pseudo instructions, but how can I tell TableGen about that? The issue is that the _si and _vi variants are generated in the inner-most defm, and I attach the ResizableLoad class outside, so the _si and _vi variants are added to the InstrMapping table even though I don't need them, and dropping isPseudo and Subtarget leads to a legitimate error about multiple matches in the same mapping table row. The thing is, it _would_ be nicer to add only the pseudos to the mapping table, but I ultimately gave up on that because I don't know how to add the ResizableLoad class only to the pseudo instructions that are loads. The defm at which the split into pseudo, si and vi happens is used by all MUBUF instructions, including stores and atomics. If you have a cleaner way to do it, I'd be delighted, but at some point I just gave up fighting TableGen and went with what you see here. | |
lib/Target/AMDGPU/SIInstructions.td | ||
921 | See the above. I see no way of doing that other than duplicating MUBUF_m into MUBUF_m_load, MUBUF_m_store, and MUBUF_m_atomic (or something like that). And even then, there's still the issue of what to do about the other BUFFER_LOAD_ instruction that also go via MUBUF_Load_Helper. |
Hmm.... conceptually, the issue with TableGen is that we have multiple dimensions in which instructions are created (pseudo/si/vi, offen/idxen/both, different types of load/store/atomic), and we want to filter out the rows in the InstrMapping based on _two_ (or more?) of those dimensions, which is something that InstrMapping does not support. So perhaps the correct way is to extend what InstrMapping can do - perhaps allow filtering rows based on values instead of just based on an attached class.
What do you think?
I still need to discuss with Matt the best way to implement this.
lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
2128–2131 | Ok, you are right, those things aren't in the document, I think clang-format may be the definitive coding style now. Let's just keep things consistent with the rest of the file, which means braces on the same line and when lines are too long arguments, argument on the next line are aligned with arguments on the first line. | |
lib/Target/AMDGPU/SIInstructions.td | ||
921 | I think it will work if you have MUBUF_Psuedo class inherit from ResizableLoad. |
That's why I think doing it in instcombine is easier. The set of possible users is shrunk, so you don't need to worry about as many of the variants of copy. You just need to see if you only have constant extractelement and shufflevector users
It can be done in LLVM instcombine. The intrinsics would have to be moved to include/llvm/IR/IntrinsicsAMDGPU.td (and ideally changed to use the amdgcn prefix and clean any other mistakes in how these are currently defined). There are already a few intrinsics handled there.
I thought about it some more and I think it's probably a bit much for instcombine. It should probably go in a new AMDGPU CodeGenPrepare like IR pass
lib/Target/AMDGPU/SIInstructions.td | ||
---|---|---|
921 | Sorry, missed that. That would cause buffer_store and _atomic instructions to also inherit from ResizableLoad, if I'm reading the code right. |
Coding style braces go on the same line as function signature.