This is an archive of the discontinued LLVM Phabricator instance.

SIScheduler should track lane masks
AbandonedPublic

Authored by alex-t on Feb 2 2017, 4:23 AM.

Details

Summary

On subreg COPY operation the not-clobbered subreg of the destination register is considered to be implicitly used.
For example: on "%vreg1.sub0 = COPY %vreg2" >> %vreg1.sub1 is implicit Use.
W/o lane lane masks tracking, destination register of the COPY is considered live in in current scheduling block. Here scheduling block has meaning as described in SIScheduler design.
Since it is not live out in any of the predecessors we end up in assert.

Diff Detail

Event Timeline

alex-t created this revision.Feb 2 2017, 4:23 AM
alex-t edited the summary of this revision. (Show Details)Feb 2 2017, 4:42 AM
alex-t abandoned this revision.Feb 2 2017, 5:18 AM
axeldavy edited edge metadata.Feb 2 2017, 8:09 AM

If the patch fixes a crash, why is it abandonned ?

alex-t added a comment.Feb 2 2017, 8:14 AM

There is a better way to enforce correct sub-register liveness tracking.
I should better switch EnableSIScheduler subtarget feature ON so that the liveness information is already correct at the point where RPTracker uses it.

alex-t updated this revision to Diff 86823.Feb 2 2017, 8:47 AM

SIMachineScheduler really treat not-clobbered sub-registers as an implicit uses and produce incorrect liv in set for the scheduling block.
Some time ago someone added a condition to the SISubtarget::overrideSchedPolicy function:
if (!enableSIScheduler())

Policy.ShouldTrackLaneMasks = true;
So, the correct answer is:
in case sub-registers liveness a tracked SIScheduleDAGMI::initRPTracker should call init with TrackLaneMasks = true
Now we have a bug in SIScheduleDAGMI that was covered with a lame fix mentioned above.

rampitec edited edge metadata.Feb 2 2017, 9:31 AM

Is there any test which crashes without it? Can you add one?

lib/Target/AMDGPU/AMDGPUSubtarget.cpp
263

Should be better to use enableSubRegLiveness() query.

lib/Target/AMDGPU/SIMachineScheduler.h
443–444

Same here.

More tests are definitely needed. SIScheduler currently is the worst tested file in the backend: http://llvm.org/reports/coverage/lib/Target/AMDGPU/index-sort-l.html

More tests are definitely needed. SIScheduler currently is the worst tested file in the backend: http://llvm.org/reports/coverage/lib/Target/AMDGPU/index-sort-l.html

I'm not going to switch it ON by default. BTW we have no command line option for that - just Subtarget Ctor initialized with 'false' for this feature.
I ran it on SGEMM and experienced no crashes. That is a kind of good testing itself.

More tests are definitely needed. SIScheduler currently is the worst tested file in the backend: http://llvm.org/reports/coverage/lib/Target/AMDGPU/index-sort-l.html

I'm not going to switch it ON by default. BTW we have no command line option for that - just Subtarget Ctor initialized with 'false' for this feature.
I ran it on SGEMM and experienced no crashes. That is a kind of good testing itself.

As far as I can see si-scheduler.ll enables it with --misched=si -mattr=si-scheduler. A test with si scheduler enabled to check it does not fail would be desirable.

alex-t updated this revision to Diff 86955.Feb 3 2017, 6:31 AM

Updated according to reviewers comments

rampitec accepted this revision.Feb 3 2017, 10:03 AM

LGTM.

This revision is now accepted and ready to land.Feb 3 2017, 10:03 AM
rampitec resigned from this revision.Feb 9 2017, 4:38 PM
This revision now requires review to proceed.Feb 9 2017, 4:38 PM

I have hard time thinking what differences the subreg tracking causes.

The reason I had advised to disable it with this scheduler when subreg tracking was introduced is I couldn't exactly figure out what are the implications of tracking subregs.

Possibly tracking subregs require as well changes to initRegPressure (Are the assumptions at the end still valid ?), or introduce changes to the way the scheduler works to track subregs as well instead of regs.

Could you explain why your patch is sufficient ?

alex-t accepted this revision.Feb 10 2017, 10:54 AM

This is a temporary solution. Given that nobody uses SI misched except AMD I had a hope that this change does not break anything.
The aim of this patch was to fix concrete crash to make possible investigate the SI misched behavior on concrete cases.
We need this to find out is this scheduler can solve specific problems.

Register tracking algorithm need to be revised if we want count lane masks. This is obvious.
Now I'm looking on the another crash due to the similar reason. I'm going to come up with the real patch soon.
As for this one I'd prefer to keep it in case it doesn't break things for you.

This revision is now accepted and ready to land.Feb 10 2017, 10:54 AM

I suspect it causes a regression on several games.

https://bugs.freedesktop.org/show_bug.cgi?id=99748

The user is currently compiling with this patch reverted, and we'll have confirmation it causes the regression.

The user says with the patch reverted, things work again.

I believe adding lanemask tracking to sisched will be non trivial work (and I think it would complexify the code, which is already complex).

Basically the scheduler does its own tracking (since it regroups instructions into blocks, and you want to track how scheduling a given block would affect register usage).
Thus its own tracking should be adapted to take lanemask into account.

I'm afraid turning RPTracker's lanemask tracking on breaks sisched assumptions when it builds its blocks.

With lanemask tracking disabled, it's not supposed to crash. Perhaps we could we look at fixing the example you reported ?
Which assert is raised ? The one in decreaseLiveRegs ?
The case you describe is weird, because the LiveIn registers are initialised with the registers that the RP tracker finds are used without being defined.
Thus in your example, it shouldn't crash.

alex-t added a comment.EditedFeb 13 2017, 5:25 AM

With lanemask tracking disabled, it's not supposed to crash. Perhaps we could we look at fixing the example you reported ?
Which assert is raised ? The one in decreaseLiveRegs ?
The case you describe is weird, because the LiveIn registers are initialised with the registers that the RP tracker finds are used without being defined.
Thus in your example, it shouldn't crash.

"%vreg1.sub0 = COPY %vreg2"

%vreg1.sub0 is defined
%vreg1.sub1 is implicitly used
w/o sub-registers tracking %vreg1 is cinsidered as used but not defined in the current block i.e. live in. Same time it is not live out in any of predecessors so we've got an assert in decreaseLiveRegs.

My question is: why do we count live ins in such manner? Why we look at the "used but not defined"? In my understanding, Live in is a register that is live out in any of the predecessors and that's it. Any other ways to count live ins look odd to me.

patch reverted
r295054 git svn

It looks like with --misched=si, the lane mask tracking is enabled, while with -mattr=si-scheduler it isn't.
That may explain why you were seing the crash in the first place, perhaps you were using --misched=si.

alex-t abandoned this revision.Jun 15 2017, 7:21 AM