This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] [llvm-mc] Support of Trap Handler registers (TTMP0..11 and TBA/TMA)
ClosedPublic

Authored by artem.tamazov on Mar 2 2016, 12:43 PM.

Details

Summary

Added: Tests for implemented features.
TODO: Support for TTMP quads, comma-separated syntax in "[]" and more.

Diff Detail

Repository
rL LLVM

Event Timeline

artem.tamazov retitled this revision from to [AMDGPU] [llvm-mc] Support of Trap Handler registers (TTMP0..11 and TBA/TMA).
artem.tamazov updated this object.
artem.tamazov set the repository for this revision to rL LLVM.
artem.tamazov added a project: Restricted Project.
artem.tamazov added a subscriber: Restricted Project.
tstellarAMD added inline comments.Mar 2 2016, 7:41 PM
lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
391

What will these be used for?

lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
549

enum name should be capitalized: RegisterKind

test/CodeGen/AMDGPU/and.ll
259–260

Is adding -DAG here really necessary? Since both patterns define variables, I would expect them to be matched in the order they were written.

arsenm edited edge metadata.Mar 2 2016, 7:58 PM

I've seen random scheduling changes before by adding registers. I had a patch which I think solved this a few months ago, but it was reverted and I haven't had time to find out what the problem was

arsenm added inline comments.Mar 2 2016, 8:00 PM
lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
549

it cal also be enum RegisterKind with no typedef

567

Dead code

lib/Target/AMDGPU/SIRegisterInfo.td
68

You can use a loop here over the numbers and add to get the encoding value

artem.tamazov marked an inline comment as done.Mar 3 2016, 1:42 AM

Thanks for reviewing. I am going to fix issues found and get back to fixing/improving CodeGen tests (to make those less dependent on inst scheduling) til no regressions.

lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
391

For implementing GPU trap/exception handlers. Writing of trap handlers require register allocation scheme which is different from one used for normal code. I am assuming that CodeGen should not use TBA/TMA/TTMPxx registers.

lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
567

Don't mind, I will update that on next iteration or remove prior submit. We need support for TTMP quads for writing trap handlers in assembly.

lib/Target/AMDGPU/SIRegisterInfo.td
68

Yes, but... Please let me keep this small fragment just for aesthetic reasons. At least something easily understandable in .td files )))

268–270

I'm going to update this on next iteration or remove prior submit.

test/CodeGen/AMDGPU/and.ll
259–260

Yes. For verde target (but not for tonga), the next two insns (s_mov_b32 and s_movk_i32) are emitted prior these two buffer loads and missed. Adding -DAG to loads fixes that.

artem.tamazov marked an inline comment as done.Mar 3 2016, 2:02 AM

I've seen random scheduling changes before by adding registers. I had a patch which I think solved this a few months ago, but it was reverted and I haven't had time to find out what the problem was

It's a pity... So I will continue with tests. Perhaps the problem is inevitable or hard to resolve (e.g. if it is related to some unspecified aspects of C++ and/or llvm containers etc).

artem.tamazov updated this object.
artem.tamazov edited edge metadata.

Small fixes as per review etc.
All failing tests now handled (fixed of marked XFAIL).
Hopefully, the final version.

artem.tamazov marked 5 inline comments as done.Mar 16 2016, 1:33 PM

I'm not really happy about all the XFAILs and test changes (I know these aren't your fault). Is it possible to support these in the compiler without adding the registers to the SReg_32 class or adding all the trap register classes?

I'm not really happy about all the XFAILs and test changes (I know these aren't your fault).

Me too, especially taking into account how much efforts it took to double-check all those failures ))

Is it possible to support these in the compiler without adding the registers to the SReg_32 class or adding all the trap register classes?

I am not llvm expert yet, but I guess it is possible but would require workaround-style coding (like duplication of code and similar). TTMP registers can be used just like scalar registers (except that these can be written only in exception context).

I believe that test changes will be required sooner or later. Perhaps we can minimize the changes - for example, switch off MI scheduler instead of making tests scheduling-tolerant. But that may lead to narrower testing coverage etc.

WRT XFAIL tests - there are only two cases. I think that ds_write2 case just reveals some different problem related to load-store-opt. Another case (si-triv-disjoint-mem-access) is related to the machinery I am not quite familiar with, e.g. @reorder_constant_load_local_store_constant_load. I am going to submit bugzillas for both so we will not overlook those in the future.

I don't want to merge this with all these test changes. Maybe we can wait until someone can make the scheduler less susceptible to register changes.

artem.tamazov added a comment.EditedMar 23 2016, 6:52 AM

I don't want to merge this with all these test changes. Maybe we can wait until someone can make the scheduler less susceptible to register changes.

All right. As far as I understand, the change is OK (except arguable test changes), and the problem lies in the scheduler.

Yes, we can wait, but not too long.
We need to provide trap handler support for the debugger team in reasonable time span.
Do you or Matt know who can update the scheduler and when?
If you have an old patch (which fixes that issue but have been reverted by some reason), we can find time to make that patch acceptable for the llvm trunk.

NOTE: you do not need to perform actual merge. As soon as review is accepted, I will rebase the changes to the tip and update the diff.
IMPORTANT: The change contains fixes for the following issues in the tests:
test\CodeGen\AMDGPU\ctlz.ll
	v_ctlz_i32: Error at line 39. Line 38 is suspicious.
	v_ctlz_i8: Error at line 102.
	v_ctlz_i64: Error at line 147.
test\CodeGen\AMDGPU\ds_write2st64.ll
	simple_write2st64_two_val_max_offset_f32: Error at line 47.
test\CodeGen\AMDGPU\setcc-opt.ll
	zext_bool_icmp_ne_1: Missing s_engpgm after line 145.
test\CodeGen\AMDGPU\sra.ll
	s_ashr_63_i64: Error at line 234.
test\CodeGen\AMDGPU\udivrem.ll
	test_udivrem: Three (too many) v_subrev_i32_e32 insts. Actual ISA contains only two of those. I do not know how current version of test pass.

I recommend integration of those changes regardless of the the rest.

You can try applying these 3 patches before your change and see if it helps to reduce the test changes:

http://reviews.llvm.org/D18451
http://reviews.llvm.org/D18452
http://reviews.llvm.org/D18453

I've seen random scheduling changes before by adding registers. I had a patch which I think solved this a few months ago, but it was reverted and I haven't had time to find out what the problem was

Can you point me to this patch? I don't know why the scheduler would behave that much differently with more registers. Of course register pressure changes, but my understanding is that GPUs rarely hit the register limit anyway.

I've seen random scheduling changes before by adding registers. I had a patch which I think solved this a few months ago, but it was reverted and I haven't had time to find out what the problem was

Can you point me to this patch? I don't know why the scheduler would behave that much differently with more registers. Of course register pressure changes, but my understanding is that GPUs rarely hit the register limit anyway.

This current patch is a good example of this. Just take a look at any of the changed tests with and without this patch.

I've seen random scheduling changes before by adding registers. I had a patch which I think solved this a few months ago, but it was reverted and I haven't had time to find out what the problem was

Can you point me to this patch? I don't know why the scheduler would behave that much differently with more registers. Of course register pressure changes, but my understanding is that GPUs rarely hit the register limit anyway.

This was r252674, which was reverted. I think this help with some similar random effects, but it was a while ago now

I've seen random scheduling changes before by adding registers. I had a patch which I think solved this a few months ago, but it was reverted and I haven't had time to find out what the problem was

Can you point me to this patch? I don't know why the scheduler would behave that much differently with more registers. Of course register pressure changes, but my understanding is that GPUs rarely hit the register limit anyway.

Hi Matthias, I would appreciate if you share any news w.r.t. the scheduling issue. For example, is it so that any of patches mentioned by Tom and Matt solve the problem? Thanks!

I've seen random scheduling changes before by adding registers. I had a patch which I think solved this a few months ago, but it was reverted and I haven't had time to find out what the problem was

Can you point me to this patch? I don't know why the scheduler would behave that much differently with more registers. Of course register pressure changes, but my understanding is that GPUs rarely hit the register limit anyway.

Hi Matthias, I would appreciate if you share any news w.r.t. the scheduling issue. For example, is it so that any of patches mentioned by Tom and Matt solve the problem? Thanks!

I've committed some scheduling changes to trunk. You might want to try rebasing this patch to see if any of those patches helped.

artem.tamazov updated this object.
artem.tamazov edited edge metadata.

Thanks to Tom, scheduling changes do not appear anymore. No more unnecessary test changes in this patch.

tstellarAMD accepted this revision.Apr 7 2016, 11:23 AM
tstellarAMD edited edge metadata.

LGTM.

This revision is now accepted and ready to land.Apr 7 2016, 11:23 AM
This revision was automatically updated to reflect the committed changes.