Added: Tests for implemented features.
TODO: Support for TTMP quads, comma-separated syntax in "[]" and more.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp | ||
---|---|---|
391 | What will these be used for? | |
lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp | ||
549 | enum name should be capitalized: RegisterKind | |
test/CodeGen/AMDGPU/and.ll | ||
259–260 | Is adding -DAG here really necessary? Since both patterns define variables, I would expect them to be matched in the order they were written. |
I've seen random scheduling changes before by adding registers. I had a patch which I think solved this a few months ago, but it was reverted and I haven't had time to find out what the problem was
Thanks for reviewing. I am going to fix issues found and get back to fixing/improving CodeGen tests (to make those less dependent on inst scheduling) til no regressions.
lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp | ||
---|---|---|
391 | For implementing GPU trap/exception handlers. Writing of trap handlers require register allocation scheme which is different from one used for normal code. I am assuming that CodeGen should not use TBA/TMA/TTMPxx registers. | |
lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp | ||
567 | Don't mind, I will update that on next iteration or remove prior submit. We need support for TTMP quads for writing trap handlers in assembly. | |
lib/Target/AMDGPU/SIRegisterInfo.td | ||
68 | Yes, but... Please let me keep this small fragment just for aesthetic reasons. At least something easily understandable in .td files ))) | |
268–270 | I'm going to update this on next iteration or remove prior submit. | |
test/CodeGen/AMDGPU/and.ll | ||
259–260 | Yes. For verde target (but not for tonga), the next two insns (s_mov_b32 and s_movk_i32) are emitted prior these two buffer loads and missed. Adding -DAG to loads fixes that. |
It's a pity... So I will continue with tests. Perhaps the problem is inevitable or hard to resolve (e.g. if it is related to some unspecified aspects of C++ and/or llvm containers etc).
Small fixes as per review etc.
All failing tests now handled (fixed of marked XFAIL).
Hopefully, the final version.
I'm not really happy about all the XFAILs and test changes (I know these aren't your fault). Is it possible to support these in the compiler without adding the registers to the SReg_32 class or adding all the trap register classes?
Me too, especially taking into account how much efforts it took to double-check all those failures ))
Is it possible to support these in the compiler without adding the registers to the SReg_32 class or adding all the trap register classes?
I am not llvm expert yet, but I guess it is possible but would require workaround-style coding (like duplication of code and similar). TTMP registers can be used just like scalar registers (except that these can be written only in exception context).
I believe that test changes will be required sooner or later. Perhaps we can minimize the changes - for example, switch off MI scheduler instead of making tests scheduling-tolerant. But that may lead to narrower testing coverage etc.
WRT XFAIL tests - there are only two cases. I think that ds_write2 case just reveals some different problem related to load-store-opt. Another case (si-triv-disjoint-mem-access) is related to the machinery I am not quite familiar with, e.g. @reorder_constant_load_local_store_constant_load. I am going to submit bugzillas for both so we will not overlook those in the future.
I don't want to merge this with all these test changes. Maybe we can wait until someone can make the scheduler less susceptible to register changes.
All right. As far as I understand, the change is OK (except arguable test changes), and the problem lies in the scheduler.
Yes, we can wait, but not too long.
We need to provide trap handler support for the debugger team in reasonable time span.
Do you or Matt know who can update the scheduler and when?
If you have an old patch (which fixes that issue but have been reverted by some reason), we can find time to make that patch acceptable for the llvm trunk.
test\CodeGen\AMDGPU\ctlz.ll v_ctlz_i32: Error at line 39. Line 38 is suspicious. v_ctlz_i8: Error at line 102. v_ctlz_i64: Error at line 147. test\CodeGen\AMDGPU\ds_write2st64.ll simple_write2st64_two_val_max_offset_f32: Error at line 47. test\CodeGen\AMDGPU\setcc-opt.ll zext_bool_icmp_ne_1: Missing s_engpgm after line 145. test\CodeGen\AMDGPU\sra.ll s_ashr_63_i64: Error at line 234. test\CodeGen\AMDGPU\udivrem.ll test_udivrem: Three (too many) v_subrev_i32_e32 insts. Actual ISA contains only two of those. I do not know how current version of test pass.
I recommend integration of those changes regardless of the the rest.
You can try applying these 3 patches before your change and see if it helps to reduce the test changes:
http://reviews.llvm.org/D18451
http://reviews.llvm.org/D18452
http://reviews.llvm.org/D18453
Can you point me to this patch? I don't know why the scheduler would behave that much differently with more registers. Of course register pressure changes, but my understanding is that GPUs rarely hit the register limit anyway.
This current patch is a good example of this. Just take a look at any of the changed tests with and without this patch.
This was r252674, which was reverted. I think this help with some similar random effects, but it was a while ago now
Hi Matthias, I would appreciate if you share any news w.r.t. the scheduling issue. For example, is it so that any of patches mentioned by Tom and Matt solve the problem? Thanks!
I've committed some scheduling changes to trunk. You might want to try rebasing this patch to see if any of those patches helped.
Thanks to Tom, scheduling changes do not appear anymore. No more unnecessary test changes in this patch.
What will these be used for?