On subreg COPY operation the not-clobbered subreg of the destination register is considered to be implicitly used.
For example: on "%vreg1.sub0 = COPY %vreg2" >> %vreg1.sub1 is implicit Use.
W/o lane lane masks tracking, destination register of the COPY is considered live in in current scheduling block. Here scheduling block has meaning as described in SIScheduler design.
Since it is not live out in any of the predecessors we end up in assert.
Details
- Reviewers
• tstellarAMD vpykhtin axeldavy alex-t
Diff Detail
Event Timeline
There is a better way to enforce correct sub-register liveness tracking.
I should better switch EnableSIScheduler subtarget feature ON so that the liveness information is already correct at the point where RPTracker uses it.
SIMachineScheduler really treat not-clobbered sub-registers as an implicit uses and produce incorrect liv in set for the scheduling block.
Some time ago someone added a condition to the SISubtarget::overrideSchedPolicy function:
if (!enableSIScheduler())
Policy.ShouldTrackLaneMasks = true;
So, the correct answer is:
in case sub-registers liveness a tracked SIScheduleDAGMI::initRPTracker should call init with TrackLaneMasks = true
Now we have a bug in SIScheduleDAGMI that was covered with a lame fix mentioned above.
More tests are definitely needed. SIScheduler currently is the worst tested file in the backend: http://llvm.org/reports/coverage/lib/Target/AMDGPU/index-sort-l.html
I'm not going to switch it ON by default. BTW we have no command line option for that - just Subtarget Ctor initialized with 'false' for this feature.
I ran it on SGEMM and experienced no crashes. That is a kind of good testing itself.
As far as I can see si-scheduler.ll enables it with --misched=si -mattr=si-scheduler. A test with si scheduler enabled to check it does not fail would be desirable.
I have hard time thinking what differences the subreg tracking causes.
The reason I had advised to disable it with this scheduler when subreg tracking was introduced is I couldn't exactly figure out what are the implications of tracking subregs.
Possibly tracking subregs require as well changes to initRegPressure (Are the assumptions at the end still valid ?), or introduce changes to the way the scheduler works to track subregs as well instead of regs.
Could you explain why your patch is sufficient ?
This is a temporary solution. Given that nobody uses SI misched except AMD I had a hope that this change does not break anything.
The aim of this patch was to fix concrete crash to make possible investigate the SI misched behavior on concrete cases.
We need this to find out is this scheduler can solve specific problems.
Register tracking algorithm need to be revised if we want count lane masks. This is obvious.
Now I'm looking on the another crash due to the similar reason. I'm going to come up with the real patch soon.
As for this one I'd prefer to keep it in case it doesn't break things for you.
I suspect it causes a regression on several games.
https://bugs.freedesktop.org/show_bug.cgi?id=99748
The user is currently compiling with this patch reverted, and we'll have confirmation it causes the regression.
I believe adding lanemask tracking to sisched will be non trivial work (and I think it would complexify the code, which is already complex).
Basically the scheduler does its own tracking (since it regroups instructions into blocks, and you want to track how scheduling a given block would affect register usage).
Thus its own tracking should be adapted to take lanemask into account.
I'm afraid turning RPTracker's lanemask tracking on breaks sisched assumptions when it builds its blocks.
With lanemask tracking disabled, it's not supposed to crash. Perhaps we could we look at fixing the example you reported ?
Which assert is raised ? The one in decreaseLiveRegs ?
The case you describe is weird, because the LiveIn registers are initialised with the registers that the RP tracker finds are used without being defined.
Thus in your example, it shouldn't crash.
"%vreg1.sub0 = COPY %vreg2"
%vreg1.sub0 is defined
%vreg1.sub1 is implicitly used
w/o sub-registers tracking %vreg1 is cinsidered as used but not defined in the current block i.e. live in. Same time it is not live out in any of predecessors so we've got an assert in decreaseLiveRegs.
My question is: why do we count live ins in such manner? Why we look at the "used but not defined"? In my understanding, Live in is a register that is live out in any of the predecessors and that's it. Any other ways to count live ins look odd to me.
It looks like with --misched=si, the lane mask tracking is enabled, while with -mattr=si-scheduler it isn't.
That may explain why you were seing the crash in the first place, perhaps you were using --misched=si.
Should be better to use enableSubRegLiveness() query.