This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU/SI: Add lane tracking to SI Scheduler
Needs ReviewPublic

Authored by axeldavy on Mar 19 2017, 4:14 AM.

Details

Summary

This patch adds lane tracking to SI Scheduler.

To handle lanes,
. When a register is always used with all its lanes, it replaces the register with RegisterMaskPair(Reg, LaneBitMask::all())
. In the other cases, it determines a 'basis' of masks such that any register/lane usage can be decomposed into fake registers of RegisterMaskPair(Reg, LaneBitMask element of the basis).

Previously the code assumed that a Register cannot be defined if already defined. LaneMasks break this assumption.
Decomposing into unique RegisterMaskPair "registers" (such that the Lanes don't intersect) enables to reuse the previous assumption.
A RegisterMaskPair can only be defined if not already alive. Thus enables to use the previous code, with some updates to how register usage is computed.

depends on D31123 and D31587

Diff Detail

Repository
rL LLVM

Event Timeline

axeldavy created this revision.Mar 19 2017, 4:14 AM
axeldavy updated this revision to Diff 92297.Mar 19 2017, 4:45 PM

I removed the usage of the [] operator for DenseMap<unsigned, RegisterMaskPair> IdentifierToReg.

This doesn't need anymore RegisterMaskPair to have a default constructor with no argument.

rampitec added inline comments.Mar 21 2017, 5:14 PM
lib/Target/AMDGPU/SIMachineScheduler.cpp
1788

Forgot '&'?

1961

Can you please avoid copy here and in function's arguments?

1991

auto& (and in other places too)?

axeldavy added inline comments.Mar 22 2017, 12:37 AM
lib/Target/AMDGPU/SIMachineScheduler.cpp
1788

indeed. Thanks.

1961

Right, I can get
SIScheduleBlockScheduler::getPairsForRegs(const SmallVector<RegisterMaskPair, 8> &Regs)
{

SmallVector<RegisterMaskPair, 8> Result;
for (const auto &RegPair : Regs) {

But for the line with:
for (const auto RegPairRes : getPairsForReg(RegPair.RegUnit,

Do you mean just use &RegPairRes ?
Daniel Berlin suggested to use std::transform in combination with a new vector_inserter to replace the two loops.
That wouldn't prevent the fact there is in all cases a copy from getPairsForReg results into the result tab of getPairsForRegs.

I could probably solve the issue by adding a new getPairsForReg function that takes the Result array to append to, and using that helper with the two functions. Do you think that would be a good solution ?

1991

Ok

rampitec added inline comments.Mar 22 2017, 11:03 AM
lib/Target/AMDGPU/SIMachineScheduler.cpp
1961

Yes, it sounds more efficient to me.

axeldavy updated this revision to Diff 92690.Mar 22 2017, 12:53 PM

Use more references.
Improved efficiency of getPairsForReg.

rampitec added inline comments.Mar 22 2017, 12:59 PM
lib/Target/AMDGPU/SIMachineScheduler.cpp
1936

ToAppend is passed by value... As far as I understand it should not work, i.e. it should not return any values.

1965

Argument needs to be reference.

axeldavy updated this revision to Diff 92703.Mar 22 2017, 1:56 PM

Fix getPairsForRegs.

axeldavy updated this revision to Diff 92705.Mar 22 2017, 2:03 PM

Use reference for the input argument of getPairsForRegs.

rampitec edited edge metadata.Mar 22 2017, 3:08 PM

Use reference for the input argument of getPairsForRegs.

Thank you. It looks good, but w/o reference previous version should not been working. I.e. it looks like this code is not covered by any test.
I think you need to add tests for this.

Si scheduler has a test file: test/CodeGen/AMDGPU/si-scheduler.ll

But perhaps you meant a test with subreg ?
I'm afraid it's hard to design a test for subregs specifically, since if the ops using subregs are inside the same blocks, the blocks inputs and outputs won't have subregs. And block creation algorithms can vary.

Si scheduler has a test file: test/CodeGen/AMDGPU/si-scheduler.ll

But perhaps you meant a test with subreg ?
I'm afraid it's hard to design a test for subregs specifically, since if the ops using subregs are inside the same blocks, the blocks inputs and outputs won't have subregs. And block creation algorithms can vary.

You can create a MIR test. For example schedule-regpressure.mir runs only machine scheduler.

t-tye added a subscriber: t-tye.Mar 22 2017, 6:40 PM
tony-tye removed a subscriber: tony-tye.Mar 22 2017, 6:47 PM
axeldavy updated this revision to Diff 92876.Mar 23 2017, 3:31 PM

Check shouldTrackLaneMask when building the list of the Block outputs.
When not set, force the LaneMask to be full. Because of findDefsBetween, if can end up partial.
Fixes assertion fault on some shaders.

Si scheduler has a test file: test/CodeGen/AMDGPU/si-scheduler.ll

But perhaps you meant a test with subreg ?
I'm afraid it's hard to design a test for subregs specifically, since if the ops using subregs are inside the same blocks, the blocks inputs and outputs won't have subregs. And block creation algorithms can vary.

You can create a MIR test. For example schedule-regpressure.mir runs only machine scheduler.

That doesn't entirely fix the mentionned problem.
Sisched works by regroupping instructions into Blocks, and scheduling the Blocks between them.
For the test to really check subregs, we'd have to make sure some blocks have subreg outputs.
One way would be to have the subregs involve high latency instructions, because those are always in separated blocks when there is Data dependency.
That is complicated though, I'd have to really investigate how to write tests.

axeldavy updated this revision to Diff 93183.Mar 27 2017, 3:07 PM

Switched the order of the patch with https://reviews.llvm.org/D30147

Why SIScheduleBlockCreator::scheduleInsideBlocks() actually move instructions? Why it isn't done on the final scheduling?

lib/Target/AMDGPU/SIMachineScheduler.cpp
1388

duplicated code

lib/Target/AMDGPU/SIMachineScheduler.h
498

You can make it const and return reference, like

const SmallVector<RegisterMaskPair, 8> &getInRegs() const;

502

Ditto

Ok, I understood SIScheduleBlockCreator::scheduleInsideBlocks() moves instructions to actually get LiveIn and LiveOut set for a block, but this is rather heavy. Have you thought about getting those using DAG directly, not regpressure tracker? By the common sence the dependencies between blocks correspond to that liveness info. There is a problem however: LiveIn and LiveOut dependencies aren't modelled for boundary SUs. I have local patch that build such dependencies - scheduling region LiveIns edges comes from EntrySU, LiveOut - to ExitSU. Another problem - dependency edges doesn't have lanemask, need to think how to deal with this.

lib/Target/AMDGPU/SIMachineScheduler.cpp
387

I would move LIS->getInstructionIndex(*BeginBlock).getRegSlot(), LIS->getInstructionIndex(*EndBlock).getRegSlot() out of the loop

Ok, I understood SIScheduleBlockCreator::scheduleInsideBlocks() moves instructions to actually get LiveIn and LiveOut set for a block, but this is rather heavy. Have you thought about getting those using DAG directly, not regpressure tracker? By the common sence the dependencies between blocks correspond to that liveness info. There is a problem however: LiveIn and LiveOut dependencies aren't modelled for boundary SUs. I have local patch that build such dependencies - scheduling region LiveIns edges comes from EntrySU, LiveOut - to ExitSU. Another problem - dependency edges doesn't have lanemask, need to think how to deal with this.

This part is the slowest part of the scheduler (handleMove is very innefficient). It's probably a good idea to replace it with a faster system.

axeldavy added inline comments.Mar 28 2017, 11:26 AM
lib/Target/AMDGPU/SIMachineScheduler.cpp
1388

This is duplicated with your iterative scheduler, and I suggested to have a common function for both schedulers and the default scheduler (you said you extracted the code from there), but you didn't take the comment into account.
Do you want a common function for both SI schedulers or with default scheduler too ?

lib/Target/AMDGPU/SIMachineScheduler.h
498

Right.

I had in mind we'd add the correct computation of the block liveins here (as for GCNScheduler), but for now we can return reference.

vpykhtin added inline comments.Mar 28 2017, 11:32 AM
lib/Target/AMDGPU/SIMachineScheduler.cpp
1388

Yes, I remember that comment. This doesn't mean this code should be duplicated even more, I encorage you to make short functions whenever possible, even for your own use. This code is actually isn't very efficient and I planned to make efficient version, I should have been noted this, sorry.

axeldavy updated this revision to Diff 93294.Mar 28 2017, 2:02 PM

Updated according to comments.

rampitec added inline comments.Mar 28 2017, 3:19 PM
lib/Target/AMDGPU/SIMachineScheduler.cpp
1704

According to coding standard, if (...) return;

Axel, thanks for the update.

If you aren't going to add test for this, can you at least run your change on the debug build (or release with asserts) with sisched turned on by default over all AMDGPU lit tests? They mostly would fail on miscompare, but they should run without asserts.

Axel, thanks for the update.

If you aren't going to add test for this, can you at least run your change on the debug build (or release with asserts) with sisched turned on by default over all AMDGPU lit tests? They mostly would fail on miscompare, but they should run without asserts.

I think it is a good idea to add a test, I think it'd have to be something with a lot of loads and involving a lot of subregs. I just hadn't time yet to write it.

vpykhtin added inline comments.Mar 31 2017, 3:21 AM
lib/Target/AMDGPU/SIMachineScheduler.cpp
2067

SmallPtrSet is designed for pointer types but used for unsigned type, gcc complains, try consider SmallDenseSet.

I ran lit tests with sished with ShouldTrackLaneMasks=true enabled by default with this patch, the following tests asserted:

******************** TEST 'LLVM :: CodeGen/AMDGPU/fceil64.ll' FAILED ********************
llc: /srv/vpykhtin/git/llvm/lib/CodeGen/RegisterPressure.cpp:74: void decreaseSetPressure(std::vector<unsigned int>&, const llvm::MachineRegisterInfo&, unsigned int, llvm::LaneBitmask, llvm::LaneBitmask): Assertion `CurrSetPressure[*PSetI] >= Weight && "register pressure underflow"' failed.
******************** TEST 'LLVM :: CodeGen/AMDGPU/fp_to_sint.f64.ll' FAILED ********************
llc: /srv/vpykhtin/git/llvm/lib/CodeGen/RegisterPressure.cpp:74: void decreaseSetPressure(std::vector<unsigned int>&, const llvm::MachineRegisterInfo&, unsigned int, llvm::LaneBitmask, llvm::LaneBitmask): Assertion `CurrSetPressure[*PSetI] >= Weight && "register pressure underflow"' failed.
******************** TEST 'LLVM :: CodeGen/AMDGPU/fp_to_uint.f64.ll' FAILED ********************
llc: /srv/vpykhtin/git/llvm/lib/CodeGen/RegisterPressure.cpp:74: void decreaseSetPressure(std::vector<unsigned int>&, const llvm::MachineRegisterInfo&, unsigned int, llvm::LaneBitmask, llvm::LaneBitmask): Assertion `CurrSetPressure[*PSetI] >= Weight && "register pressure underflow"' failed.
******************** TEST 'LLVM :: CodeGen/AMDGPU/srem.ll' FAILED ********************
llc: /srv/vpykhtin/git/llvm/lib/CodeGen/RegisterPressure.cpp:74: void decreaseSetPressure(std::vector<unsigned int>&, const llvm::MachineRegisterInfo&, unsigned int, llvm::LaneBitmask, llvm::LaneBitmask): Assertion `CurrSetPressure[*PSetI] >= Weight && "register pressure underflow"' failed.
******************** TEST 'LLVM :: CodeGen/AMDGPU/urem.ll' FAILED ********************
llc: /srv/vpykhtin/git/llvm/lib/CodeGen/RegisterPressure.cpp:74: void decreaseSetPressure(std::vector<unsigned int>&, const llvm::MachineRegisterInfo&, unsigned int, llvm::LaneBitmask, llvm::LaneBitmask): Assertion `CurrSetPressure[*PSetI] >= Weight && "register pressure underflow"' failed.

I enabled sisched by commenting out the following line:

ScheduleDAGInstrs *GCNPassConfig::createMachineScheduler(
  MachineSchedContext *C) const {
  const SISubtarget &ST = C->MF->getSubtarget<SISubtarget>();
  //if (ST.enableSIScheduler())
    return createSIMachineScheduler(C);
  return createGCNMaxOccupancyMachineScheduler(C);
}

This would indicate that handleMove + adjustLaneLiveness is insufficient. Do you have any ideas about what is missing ?

I ran lit tests with sished with ShouldTrackLaneMasks=true enabled by default with this patch, the following tests asserted:

******************** TEST 'LLVM :: CodeGen/AMDGPU/fceil64.ll' FAILED ********************
llc: /srv/vpykhtin/git/llvm/lib/CodeGen/RegisterPressure.cpp:74: void decreaseSetPressure(std::vector<unsigned int>&, const llvm::MachineRegisterInfo&, unsigned int, llvm::LaneBitmask, llvm::LaneBitmask): Assertion `CurrSetPressure[*PSetI] >= Weight && "register pressure underflow"' failed.
******************** TEST 'LLVM :: CodeGen/AMDGPU/fp_to_sint.f64.ll' FAILED ********************
llc: /srv/vpykhtin/git/llvm/lib/CodeGen/RegisterPressure.cpp:74: void decreaseSetPressure(std::vector<unsigned int>&, const llvm::MachineRegisterInfo&, unsigned int, llvm::LaneBitmask, llvm::LaneBitmask): Assertion `CurrSetPressure[*PSetI] >= Weight && "register pressure underflow"' failed.
******************** TEST 'LLVM :: CodeGen/AMDGPU/fp_to_uint.f64.ll' FAILED ********************
llc: /srv/vpykhtin/git/llvm/lib/CodeGen/RegisterPressure.cpp:74: void decreaseSetPressure(std::vector<unsigned int>&, const llvm::MachineRegisterInfo&, unsigned int, llvm::LaneBitmask, llvm::LaneBitmask): Assertion `CurrSetPressure[*PSetI] >= Weight && "register pressure underflow"' failed.
******************** TEST 'LLVM :: CodeGen/AMDGPU/srem.ll' FAILED ********************
llc: /srv/vpykhtin/git/llvm/lib/CodeGen/RegisterPressure.cpp:74: void decreaseSetPressure(std::vector<unsigned int>&, const llvm::MachineRegisterInfo&, unsigned int, llvm::LaneBitmask, llvm::LaneBitmask): Assertion `CurrSetPressure[*PSetI] >= Weight && "register pressure underflow"' failed.
******************** TEST 'LLVM :: CodeGen/AMDGPU/urem.ll' FAILED ********************
llc: /srv/vpykhtin/git/llvm/lib/CodeGen/RegisterPressure.cpp:74: void decreaseSetPressure(std::vector<unsigned int>&, const llvm::MachineRegisterInfo&, unsigned int, llvm::LaneBitmask, llvm::LaneBitmask): Assertion `CurrSetPressure[*PSetI] >= Weight && "register pressure underflow"' failed.

I enabled sisched by commenting out the following line:

ScheduleDAGInstrs *GCNPassConfig::createMachineScheduler(
  MachineSchedContext *C) const {
  const SISubtarget &ST = C->MF->getSubtarget<SISubtarget>();
  //if (ST.enableSIScheduler())
    return createSIMachineScheduler(C);
  return createGCNMaxOccupancyMachineScheduler(C);
}

I didn't debugged it and I don't know why you decided so.

This would indicate that handleMove + adjustLaneLiveness is insufficient. Do you have any ideas about what is missing ?

The assert you have is in CodeGen/RegisterPressure.cpp: void decreaseSetPressure

This call can be from only three locations:
. in SIScheduleBlock::initRegPressure, when we initialize block liveins and liveouts for a given Block.
. In SIScheduleBlock::schedule, when we compute the register usage inside a block with the scheduled order inside the Block.
. outside sisched

If outside sisched, it means we have given an invalid schedule.
For the first two cases, it likely means we have done something wrong when changing the order of the SUs, and thus that handleMove and adjustLaneLiveness wouldn't be enough.

I didn't debugged it and I don't know why you decided so.

This would indicate that handleMove + adjustLaneLiveness is insufficient. Do you have any ideas about what is missing ?

Stack trace:

llc: /srv/vpykhtin/git/llvm/lib/CodeGen/RegisterPressure.cpp:74: void decreaseSetPressure(std::vector<unsigned int>&, const llvm::MachineRegisterInfo&, unsigned int, llvm::LaneBitmask, llvm::LaneBitmask): Assertion `CurrSetPre
ssure[*PSetI] >= Weight && "register pressure underflow"' failed.
#0 0x00000000030175bb llvm::sys::PrintStackTrace(llvm::raw_ostream&) /srv/vpykhtin/git/llvm/lib/Support/Unix/Signals.inc:398:0
#1 0x000000000301764c PrintStackTraceSignalHandler(void*) /srv/vpykhtin/git/llvm/lib/Support/Unix/Signals.inc:462:0
#2 0x0000000003015aac llvm::sys::RunSignalHandlers() /srv/vpykhtin/git/llvm/lib/Support/Signals.cpp:44:0
#3 0x0000000003016f53 SignalHandler(int) /srv/vpykhtin/git/llvm/lib/Support/Unix/Signals.inc:252:0
#4 0x00002af5bbdcc330 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x10330)
#5 0x00002af5bce85c37 gsignal /build/eglibc-oGUzwX/eglibc-2.19/signal/../nptl/sysdeps/unix/sysv/linux/raise.c:56:0
#6 0x00002af5bce89028 abort /build/eglibc-oGUzwX/eglibc-2.19/stdlib/abort.c:91:0
#7 0x00002af5bce7ebf6 __assert_fail_base /build/eglibc-oGUzwX/eglibc-2.19/assert/assert.c:92:0
#8 0x00002af5bce7eca2 (/lib/x86_64-linux-gnu/libc.so.6+0x2fca2)
#9 0x000000000270061f decreaseSetPressure(std::vector<unsigned int, std::allocator<unsigned int> >&, llvm::MachineRegisterInfo const&, unsigned int, llvm::LaneBitmask, llvm::LaneBitmask) /srv/vpykhtin/git/llvm/lib/CodeGen/RegisterPressure.cpp:75:0
#10 0x0000000002700d9b llvm::RegPressureTracker::decreaseRegPressure(unsigned int, llvm::LaneBitmask, llvm::LaneBitmask) /srv/vpykhtin/git/llvm/lib/CodeGen/RegisterPressure.cpp:158:0
#11 0x0000000002703e14 llvm::RegPressureTracker::advance(llvm::RegisterOperands const&) /srv/vpykhtin/git/llvm/lib/CodeGen/RegisterPressure.cpp:892:0
#12 0x0000000002704053 llvm::RegPressureTracker::advance() /srv/vpykhtin/git/llvm/lib/CodeGen/RegisterPressure.cpp:933:0
#13 0x0000000001561341 llvm::SIScheduleBlock::initRegPressure(llvm::MachineInstrBundleIterator<llvm::MachineInstr, false>, llvm::MachineInstrBundleIterator<llvm::MachineInstr, false>) /srv/vpykhtin/git/llvm/lib/Target/AMDGPU/SIMachineScheduler.cpp:338:0
#14 0x0000000001561756 llvm::SIScheduleBlock::schedule(llvm::MachineInstrBundleIterator<llvm::MachineInstr, false>, llvm::MachineInstrBundleIterator<llvm::MachineInstr, false>) /srv/vpykhtin/git/llvm/lib/Target/AMDGPU/SIMachineScheduler.cpp:415:0
#15 0x000000000156726e llvm::SIScheduleBlockCreator::scheduleInsideBlocks() /srv/vpykhtin/git/llvm/lib/Target/AMDGPU/SIMachineScheduler.cpp:1375:0
#16 0x0000000001562d74 llvm::SIScheduleBlockCreator::getBlocks(llvm::SISchedulerBlockCreatorVariant) /srv/vpykhtin/git/llvm/lib/Target/AMDGPU/SIMachineScheduler.cpp:661:0
#17 0x000000000156a92d llvm::SIScheduler::scheduleVariant(llvm::SISchedulerBlockCreatorVariant, llvm::SISchedulerBlockSchedulerVariant) /srv/vpykhtin/git/llvm/lib/Target/AMDGPU/SIMachineScheduler.cpp:1929:0
#18 0x000000000156bb10 llvm::SIScheduleDAGMI::schedule() /srv/vpykhtin/git/llvm/lib/Target/AMDGPU/SIMachineScheduler.cpp:2120:0
#19 0x0000000002648b0d (anonymous namespace)::MachineSchedulerBase::scheduleRegions(llvm::ScheduleDAGInstrs&, bool) /srv/vpykhtin/git/llvm/lib/CodeGen/MachineScheduler.cpp:524:0
#20 0x0000000002647f56 (anonymous namespace)::MachineScheduler::runOnMachineFunction(llvm::MachineFunction&) /srv/vpykhtin/git/llvm/lib/CodeGen/MachineScheduler.cpp:387:0
#21 0x00000000025bdc15 llvm::MachineFunctionPass::runOnFunction(llvm::Function&) /srv/vpykhtin/git/llvm/lib/CodeGen/MachineFunctionPass.cpp:62:0
#22 0x000000000297d26c llvm::FPPassManager::runOnFunction(llvm::Function&) /srv/vpykhtin/git/llvm/lib/IR/LegacyPassManager.cpp:1513:0
#23 0x000000000297d3ff llvm::FPPassManager::runOnModule(llvm::Module&) /srv/vpykhtin/git/llvm/lib/IR/LegacyPassManager.cpp:1534:0
#24 0x000000000297d79a (anonymous namespace)::MPPassManager::runOnModule(llvm::Module&) /srv/vpykhtin/git/llvm/lib/IR/LegacyPassManager.cpp:1590:0
#25 0x000000000297deea llvm::legacy::PassManagerImpl::run(llvm::Module&) /srv/vpykhtin/git/llvm/lib/IR/LegacyPassManager.cpp:1693:0
#26 0x000000000297e12b llvm::legacy::PassManager::run(llvm::Module&) /srv/vpykhtin/git/llvm/lib/IR/LegacyPassManager.cpp:1725:0
#27 0x0000000001243107 compileModule(char**, llvm::LLVMContext&) /srv/vpykhtin/git/llvm/tools/llc/llc.cpp:579:0
#28 0x0000000001241788 main /srv/vpykhtin/git/llvm/tools/llc/llc.cpp:331:0
#29 0x00002af5bce70f45 __libc_start_main /build/eglibc-oGUzwX/eglibc-2.19/csu/libc-start.c:321:0
#30 0x000000000123f799 _start (/srv/vpykhtin/git/debug.llvm/./bin/llc+0x123f799)
Stack dump:
0.      Program arguments: /srv/vpykhtin/git/debug.llvm/./bin/llc -march=amdgcn -verify-machineinstrs
1.      Running pass 'Function Pass Manager' on module '<stdin>'.
2.      Running pass 'Machine Instruction Scheduler' on function '@fceil_v4f64'

In general, I think moving instructions just to use standard RP tracker to discover liveins/liveouts isn't a good idea. It isn't only slow but doesn't look reliable too. Why not discover these sets using DAG directly?

Could you describe how you would get all the relevant information ?

Liveins are everything needed by instructions of the block that are not produced by the block (but you need to be careful of register reuse. Some SU in the block may produce the register, but after another would have consumed it).
LiveOuts are everything produced by the Block which is not released (because another consumer elsewhere). Similarly you need to capture when the register is consumed for the last time in the Block (there may be a later block with a consumer of the register, but another instruction of another Block would have produced the register before).
This seems hard to do without correct liveIntervals. Those are correct only when reordering and doing handleMove, etc.

In general, I think moving instructions just to use standard RP tracker to discover liveins/liveouts isn't a good idea. It isn't only slow but doesn't look reliable too. Why not discover these sets using DAG directly?

I may miss something, but it looks that you can build data edges when building a superdag consisting of blocks. Incoming data edges would be liveins, outcoming - liveouts.

Could you describe how you would get all the relevant information ?

Liveins are everything needed by instructions of the block that are not produced by the block (but you need to be careful of register reuse. Some SU in the block may produce the register, but after another would have consumed it).
LiveOuts are everything produced by the Block which is not released (because another consumer elsewhere). Similarly you need to capture when the register is consumed for the last time in the Block (there may be a later block with a consumer of the register, but another instruction of another Block would have produced the register before).
This seems hard to do without correct liveIntervals. Those are correct only when reordering and doing handleMove, etc.

In general, I think moving instructions just to use standard RP tracker to discover liveins/liveouts isn't a good idea. It isn't only slow but doesn't look reliable too. Why not discover these sets using DAG directly?

Yes, that could work.
I didn't think of that.

Can we rely on having these properties:
. If SU(i) relies on the register produced by SU(j), there is a data dependency (even if SU(i) depends on SU(k), which depends itself on SU(j), in other words, redundancies are not removed).
. A data dependency between SU(j) and SU(i) always means that SU(i) needs as input the output of SU(j).

To sum up, do we have equivalence between "SU(i) has data dependency with SU(j)" and "one of SU(i) inputs is SU(j) output"

If we have these, I think we can make it work.

I may miss something, but it looks that you can build data edges when building a superdag consisting of blocks. Incoming data edges would be liveins, outcoming - liveouts.

If SU(i) uses register produced by SI(j):

SU(i) has a predecessor data edge with a number of register produced by SU(j)
SU(j) has a successor data edge with a number of register used by SU(i)

Except that data edges doesn't contain lanemask, but I think this can be solved.

This way SU(i) has predecessor data edges for all used registers in that SU, and SU(j) has successor data edges for every SU using SU(j)'s output.

Yes, that could work.
I didn't think of that.

Can we rely on having these properties:
. If SU(i) relies on the register produced by SU(j), there is a data dependency (even if SU(i) depends on SU(k), which depends itself on SU(j), in other words, redundancies are not removed).
. A data dependency between SU(j) and SU(i) always means that SU(i) needs as input the output of SU(j).

To sum up, do we have equivalence between "SU(i) has data dependency with SU(j)" and "one of SU(i) inputs is SU(j) output"

If we have these, I think we can make it work.

I may miss something, but it looks that you can build data edges when building a superdag consisting of blocks. Incoming data edges would be liveins, outcoming - liveouts.

axeldavy updated this revision to Diff 93815.Apr 2 2017, 2:43 PM
axeldavy added a reviewer: vpykhtin.

This update reworks significantly the structure of the scheduler.

. Put all the code to handle register tracking into a class SISchedulerRPTracker. The class handles lane tracking as described previously.
. The code to schedule inside blocks has been completly reworked to use this class. Thus it doesn't use default RPTracker anymore, doesn't call handleMove, etc and is thus very significantly faster.

depends on D31587

vpykhtin edited edge metadata.Apr 4 2017, 8:01 AM

First part of comments related only to C++ issues

lib/Target/AMDGPU/SIMachineScheduler.cpp
144

const SmallVectorImpl<RegisterMaskPair> &

196

I recommend use auto

210

You're iterating over a vector increasing its' size at the same time. This is almost ok, except that vector can reallocate and all iterators would become invalidated.

Iterator Invalidation Rules (C++03)
vector: all iterators and references before the point of insertion are unaffected, unless the new container size is greater than the previous capacity (in which case all iterators and references are invalidated) [23.2.4.3/1]

217

Use const SmallVectorImpl<RegisterMaskPair> &

229

auto

247

auto

294

auto

325

SmallPtrSet is designed for pointers, gcc would complain, try consider SmallDenseSet

Are you developing using MSVS?

344

I saw a lot of such code snippets in your code, please make a function

376

#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
LLVM_DUMP_METHOD

lib/Target/AMDGPU/SIMachineScheduler.h
102

I think I should change unsiged to signed values inside GCNRegPressure structure so we could use it as a regpressure difference too.

123

const SmallVectorImpl<RegisterMaskPair> &

In general, when using Small... containers use SmallContainerImpl<T>& for passing by reference, its designed to be used so.

124

ditto

219

const SmallVectorImpl<RegisterMaskPair>&

220

ditto

248–254

const SmallVector<RegisterMaskPair, 8> &getInRegs() const { return LiveInRegs; }

249

ditto

393

const SmallVectorImpl<RegisterMaskPair>&

395

ditto

525–526

const SmallVector<RegisterMaskPair, 8> &getInRegs() const

529–530

ditto

vpykhtin added a comment.EditedApr 6 2017, 9:41 AM

The following tests assert with this (and predecessors) patches with SISched turned on by default:

FAIL: LLVM :: CodeGen/AMDGPU/coalescer-subrange-crash.ll (4169 of 20226)
******************** TEST 'LLVM :: CodeGen/AMDGPU/coalescer-subrange-crash.ll' FAILED ********************
Script:
--
/srv/vpykhtin/git/debug.llvm/./bin/llc -march=amdgcn -verify-machineinstrs < /srv/vpykhtin/git/llvm/test/CodeGen/AMDGPU/coalescer-subrange-crash.ll | /srv/vpykhtin/git/debug.llvm/./bin/FileCheck /srv/vpykhtin/git/llvm/test/CodeGen/AMDGPU
/coalescer-subrange-crash.ll
--
Exit Code: 2

Command Output (stderr):
--
llc: malloc.c:2372: sysmalloc: Assertion `(old_top == (((mbinptr) (((char *) &((av)->bins[((1) - 1) * 2])) - __builtin_offsetof (struct malloc_chunk, fd)))) && old_size == 0) || ((unsigned long) (old_size) >= (unsigned long)((((__builtin_offsetof (struct malloc_chunk, fd_nextsize))+((2 *(sizeof(size_t))) - 1)) & ~((2 *(sizeof(size_t))) - 1))) && ((old_top)->size & 0x1) && ((unsigned long) old_end & pagemask) == 0)' failed.

FileCheck error: '-' is empty.
FileCheck command line:  /srv/vpykhtin/git/debug.llvm/./bin/FileCheck /srv/vpykhtin/git/llvm/test/CodeGen/AMDGPU/coalescer-subrange-crash.ll


FAIL: LLVM :: CodeGen/AMDGPU/undefined-subreg-liverange.ll (5010 of 20226)
******************** TEST 'LLVM :: CodeGen/AMDGPU/undefined-subreg-liverange.ll' FAILED ********************
Script:
--
/srv/vpykhtin/git/debug.llvm/./bin/llc -march=amdgcn -verify-machineinstrs < /srv/vpykhtin/git/llvm/test/CodeGen/AMDGPU/undefined-subreg-liverange.ll | /srv/vpykhtin/git/debug.llvm/./bin/FileCheck /srv/vpykhtin/git/llvm/test/CodeGen/AMDG
PU/undefined-subreg-liverange.ll
--
Exit Code: 1

Command Output (stderr):
--
llc: /srv/vpykhtin/git/llvm/lib/Target/AMDGPU/SIMachineScheduler.cpp:1793: void llvm::SIScheduleBlockCreator::removeUseFromDef(llvm::SmallVector<llvm::RegisterMaskPair, 8u>&, unsigned int, const llvm::SUnit*): Assertion `UsePos != Uses.end()' failed.
#0 0x00000000030298dd llvm::sys::PrintStackTrace(llvm::raw_ostream&) /srv/vpykhtin/git/llvm/lib/Support/Unix/Signals.inc:398:0
#1 0x000000000302996e PrintStackTraceSignalHandler(void*) /srv/vpykhtin/git/llvm/lib/Support/Unix/Signals.inc:462:0
#2 0x0000000003027dce llvm::sys::RunSignalHandlers() /srv/vpykhtin/git/llvm/lib/Support/Signals.cpp:43:0
#3 0x0000000003029275 SignalHandler(int) /srv/vpykhtin/git/llvm/lib/Support/Unix/Signals.inc:252:0
#4 0x00002b433235c330 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x10330)
#5 0x00002b4333415c37 gsignal /build/eglibc-MjiXCM/eglibc-2.19/signal/../nptl/sysdeps/unix/sysv/linux/raise.c:56:0
#6 0x00002b4333419028 abort /build/eglibc-MjiXCM/eglibc-2.19/stdlib/abort.c:91:0
#7 0x00002b433340ebf6 __assert_fail_base /build/eglibc-MjiXCM/eglibc-2.19/assert/assert.c:92:0
#8 0x00002b433340eca2 (/lib/x86_64-linux-gnu/libc.so.6+0x2fca2)
#9 0x000000000157376e llvm::SIScheduleBlockCreator::removeUseFromDef(llvm::SmallVector<llvm::RegisterMaskPair, 8u>&, unsigned int, llvm::SUnit const*) /srv/vpykhtin/git/llvm/lib/Target/AMDGPU/SIMachineScheduler.cpp:1795:0
#10 0x00000000015727e3 llvm::SIScheduleBlockCreator::createBlocksForVariant(llvm::SISchedulerBlockCreatorVariant) /srv/vpykhtin/git/llvm/lib/Target/AMDGPU/SIMachineScheduler.cpp:1613:0
#11 0x000000000156ef3e llvm::SIScheduleBlockCreator::getBlocks(llvm::SISchedulerBlockCreatorVariant) /srv/vpykhtin/git/llvm/lib/Target/AMDGPU/SIMachineScheduler.cpp:1043:0
#12 0x0000000001574e69 llvm::SIScheduler::scheduleVariant(llvm::SISchedulerBlockCreatorVariant, llvm::SISchedulerBlockSchedulerVariant) /srv/vpykhtin/git/llvm/lib/Target/AMDGPU/SIMachineScheduler.cpp:2073:0
#13 0x0000000001575eea llvm::SIScheduleDAGMI::schedule() /srv/vpykhtin/git/llvm/lib/Target/AMDGPU/SIMachineScheduler.cpp:2233:0
#14 0x0000000002659da5 (anonymous namespace)::MachineSchedulerBase::scheduleRegions(llvm::ScheduleDAGInstrs&, bool) /srv/vpykhtin/git/llvm/lib/CodeGen/MachineScheduler.cpp:524:0
#15 0x00000000026591ee (anonymous namespace)::MachineScheduler::runOnMachineFunction(llvm::MachineFunction&) /srv/vpykhtin/git/llvm/lib/CodeGen/MachineScheduler.cpp:387:0
#16 0x00000000025ce4f9 llvm::MachineFunctionPass::runOnFunction(llvm::Function&) /srv/vpykhtin/git/llvm/lib/CodeGen/MachineFunctionPass.cpp:62:0
#17 0x000000000298e138 llvm::FPPassManager::runOnFunction(llvm::Function&) /srv/vpykhtin/git/llvm/lib/IR/LegacyPassManager.cpp:1513:0
#18 0x000000000298e2cb llvm::FPPassManager::runOnModule(llvm::Module&) /srv/vpykhtin/git/llvm/lib/IR/LegacyPassManager.cpp:1534:0
#19 0x000000000298e666 (anonymous namespace)::MPPassManager::runOnModule(llvm::Module&) /srv/vpykhtin/git/llvm/lib/IR/LegacyPassManager.cpp:1590:0
#20 0x000000000298edb6 llvm::legacy::PassManagerImpl::run(llvm::Module&) /srv/vpykhtin/git/llvm/lib/IR/LegacyPassManager.cpp:1693:0
#21 0x000000000298eff7 llvm::legacy::PassManager::run(llvm::Module&) /srv/vpykhtin/git/llvm/lib/IR/LegacyPassManager.cpp:1725:0
#22 0x000000000124c627 compileModule(char**, llvm::LLVMContext&) /srv/vpykhtin/git/llvm/tools/llc/llc.cpp:579:0
#23 0x000000000124aca8 main /srv/vpykhtin/git/llvm/tools/llc/llc.cpp:331:0
#24 0x00002b4333400f45 __libc_start_main /build/eglibc-MjiXCM/eglibc-2.19/csu/libc-start.c:321:0
#25 0x0000000001248cb9 _start (/srv/vpykhtin/git/debug.llvm/./bin/llc+0x1248cb9)
Stack dump:
0.      Program arguments: /srv/vpykhtin/git/debug.llvm/./bin/llc -march=amdgcn -verify-machineinstrs
1.      Running pass 'Function Pass Manager' on module '<stdin>'.
2.      Running pass 'Machine Instruction Scheduler' on function '@partially_undef_copy'

FAIL: LLVM :: CodeGen/AMDGPU/unigine-liveness-crash.ll (5126 of 20226)
******************** TEST 'LLVM :: CodeGen/AMDGPU/unigine-liveness-crash.ll' FAILED ********************
Script:
--
/srv/vpykhtin/git/debug.llvm/./bin/llc -march=amdgcn -verify-machineinstrs < /srv/vpykhtin/git/llvm/test/CodeGen/AMDGPU/unigine-liveness-crash.ll | /srv/vpykhtin/git/debug.llvm/./bin/FileCheck /srv/vpykhtin/git/llvm/test/CodeGen/AMDGPU/unigine-liveness-crash.ll
--
Exit Code: 2

Command Output (stderr):
--
#0 0x00000000030298dd llvm::sys::PrintStackTrace(llvm::raw_ostream&) /srv/vpykhtin/git/llvm/lib/Support/Unix/Signals.inc:398:0
#1 0x000000000302996e PrintStackTraceSignalHandler(void*) /srv/vpykhtin/git/llvm/lib/Support/Unix/Signals.inc:462:0
#2 0x0000000003027dce llvm::sys::RunSignalHandlers() /srv/vpykhtin/git/llvm/lib/Support/Signals.cpp:43:0
#3 0x0000000003029275 SignalHandler(int) /srv/vpykhtin/git/llvm/lib/Support/Unix/Signals.inc:252:0
#4 0x00002ac49b609330 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x10330)
#5 0x000000000156a920 llvm::SISchedulerRPTracker::SISchedulerRPTracker(llvm::SmallVector<llvm::RegisterMaskPair, 8u> const&, llvm::SmallVector<llvm::RegisterMaskPair, 8u> const&, std::vector<llvm::SmallVector<unsigned int, 8u>, std::allocator<llvm::SmallVector<unsigned int, 8u> > > const&, std::vector<llvm::SmallVector<unsigned int, 8u>, std::allocator<llvm::SmallVector<unsigned int, 8u> > > const&, std::vector<llvm::SmallVector<llvm::RegisterMaskPair, 8u>, std::allocator<llvm::SmallVector<llvm::RegisterMaskPair, 8u> > > const&, std::vector<llvm::SmallVector<llvm::RegisterMaskPair, 8u>, std::allocator<llvm::SmallVector<llvm::RegisterMaskPair, 8u> > > const&, llvm::MachineRegisterInfo const*, llvm::TargetRegisterInfo const*, unsigned int, unsigned int, bool) /srv/vpykhtin/git/llvm/lib/Target/AMDGPU/SIMachineScheduler.cpp:198:0
#6 0x000000000156dace llvm::SIScheduleBlock::schedule() /srv/vpykhtin/git/llvm/lib/Target/AMDGPU/SIMachineScheduler.cpp:835:0
#7 0x000000000156df10 llvm::SIScheduleBlock::finalize() /srv/vpykhtin/git/llvm/lib/Target/AMDGPU/SIMachineScheduler.cpp:877:0
#8 0x0000000001572b1d llvm::SIScheduleBlockCreator::createBlocksForVariant(llvm::SISchedulerBlockCreatorVariant) /srv/vpykhtin/git/llvm/lib/Target/AMDGPU/SIMachineScheduler.cpp:1647:0
#9 0x000000000156ef3e llvm::SIScheduleBlockCreator::getBlocks(llvm::SISchedulerBlockCreatorVariant) /srv/vpykhtin/git/llvm/lib/Target/AMDGPU/SIMachineScheduler.cpp:1043:0
#10 0x0000000001574e69 llvm::SIScheduler::scheduleVariant(llvm::SISchedulerBlockCreatorVariant, llvm::SISchedulerBlockSchedulerVariant) /srv/vpykhtin/git/llvm/lib/Target/AMDGPU/SIMachineScheduler.cpp:2073:0
#11 0x0000000001575eea llvm::SIScheduleDAGMI::schedule() /srv/vpykhtin/git/llvm/lib/Target/AMDGPU/SIMachineScheduler.cpp:2233:0
#12 0x0000000002659da5 (anonymous namespace)::MachineSchedulerBase::scheduleRegions(llvm::ScheduleDAGInstrs&, bool) /srv/vpykhtin/git/llvm/lib/CodeGen/MachineScheduler.cpp:524:0
#13 0x00000000026591ee (anonymous namespace)::MachineScheduler::runOnMachineFunction(llvm::MachineFunction&) /srv/vpykhtin/git/llvm/lib/CodeGen/MachineScheduler.cpp:387:0
#14 0x00000000025ce4f9 llvm::MachineFunctionPass::runOnFunction(llvm::Function&) /srv/vpykhtin/git/llvm/lib/CodeGen/MachineFunctionPass.cpp:62:0
#15 0x000000000298e138 llvm::FPPassManager::runOnFunction(llvm::Function&) /srv/vpykhtin/git/llvm/lib/IR/LegacyPassManager.cpp:1513:0
#16 0x000000000298e2cb llvm::FPPassManager::runOnModule(llvm::Module&) /srv/vpykhtin/git/llvm/lib/IR/LegacyPassManager.cpp:1534:0
#17 0x000000000298e666 (anonymous namespace)::MPPassManager::runOnModule(llvm::Module&) /srv/vpykhtin/git/llvm/lib/IR/LegacyPassManager.cpp:1590:0
#18 0x000000000298edb6 llvm::legacy::PassManagerImpl::run(llvm::Module&) /srv/vpykhtin/git/llvm/lib/IR/LegacyPassManager.cpp:1693:0
#19 0x000000000298eff7 llvm::legacy::PassManager::run(llvm::Module&) /srv/vpykhtin/git/llvm/lib/IR/LegacyPassManager.cpp:1725:0
#20 0x000000000124c627 compileModule(char**, llvm::LLVMContext&) /srv/vpykhtin/git/llvm/tools/llc/llc.cpp:579:0
#21 0x000000000124aca8 main /srv/vpykhtin/git/llvm/tools/llc/llc.cpp:331:0
#22 0x00002ac49c6adf45 __libc_start_main /build/eglibc-MjiXCM/eglibc-2.19/csu/libc-start.c:321:0
#23 0x0000000001248cb9 _start (/srv/vpykhtin/git/debug.llvm/./bin/llc+0x1248cb9)
Stack dump:
0.      Program arguments: /srv/vpykhtin/git/debug.llvm/./bin/llc -march=amdgcn -verify-machineinstrs
1.      Running pass 'Function Pass Manager' on module '<stdin>'.
2.      Running pass 'Machine Instruction Scheduler' on function '@main'

For the first assert, it's a mistake in my code. Instead of assert, I should return doing nothing.

For the second error, I don't understand. There doesn't seem to be anything wrong at this line,
and when I run llc on that .ll I don't get error.

The following tests assert with this (and predecessors) patches with SISched turned on by default:

FAIL: LLVM :: CodeGen/AMDGPU/coalescer-subrange-crash.ll (4169 of 20226)
******************** TEST 'LLVM :: CodeGen/AMDGPU/coalescer-subrange-crash.ll' FAILED ********************
Script:
--
/srv/vpykhtin/git/debug.llvm/./bin/llc -march=amdgcn -verify-machineinstrs < /srv/vpykhtin/git/llvm/test/CodeGen/AMDGPU/coalescer-subrange-crash.ll | /srv/vpykhtin/git/debug.llvm/./bin/FileCheck /srv/vpykhtin/git/llvm/test/CodeGen/AMDGPU
/coalescer-subrange-crash.ll
--
Exit Code: 2

Command Output (stderr):
--
llc: malloc.c:2372: sysmalloc: Assertion `(old_top == (((mbinptr) (((char *) &((av)->bins[((1) - 1) * 2])) - __builtin_offsetof (struct malloc_chunk, fd)))) && old_size == 0) || ((unsigned long) (old_size) >= (unsigned long)((((__builtin_offsetof (struct malloc_chunk, fd_nextsize))+((2 *(sizeof(size_t))) - 1)) & ~((2 *(sizeof(size_t))) - 1))) && ((old_top)->size & 0x1) && ((unsigned long) old_end & pagemask) == 0)' failed.

FileCheck error: '-' is empty.
FileCheck command line:  /srv/vpykhtin/git/debug.llvm/./bin/FileCheck /srv/vpykhtin/git/llvm/test/CodeGen/AMDGPU/coalescer-subrange-crash.ll


FAIL: LLVM :: CodeGen/AMDGPU/undefined-subreg-liverange.ll (5010 of 20226)
******************** TEST 'LLVM :: CodeGen/AMDGPU/undefined-subreg-liverange.ll' FAILED ********************
Script:
--
/srv/vpykhtin/git/debug.llvm/./bin/llc -march=amdgcn -verify-machineinstrs < /srv/vpykhtin/git/llvm/test/CodeGen/AMDGPU/undefined-subreg-liverange.ll | /srv/vpykhtin/git/debug.llvm/./bin/FileCheck /srv/vpykhtin/git/llvm/test/CodeGen/AMDG
PU/undefined-subreg-liverange.ll
--
Exit Code: 1

Command Output (stderr):
--
llc: /srv/vpykhtin/git/llvm/lib/Target/AMDGPU/SIMachineScheduler.cpp:1793: void llvm::SIScheduleBlockCreator::removeUseFromDef(llvm::SmallVector<llvm::RegisterMaskPair, 8u>&, unsigned int, const llvm::SUnit*): Assertion `UsePos != Uses.end()' failed.
#0 0x00000000030298dd llvm::sys::PrintStackTrace(llvm::raw_ostream&) /srv/vpykhtin/git/llvm/lib/Support/Unix/Signals.inc:398:0
#1 0x000000000302996e PrintStackTraceSignalHandler(void*) /srv/vpykhtin/git/llvm/lib/Support/Unix/Signals.inc:462:0
#2 0x0000000003027dce llvm::sys::RunSignalHandlers() /srv/vpykhtin/git/llvm/lib/Support/Signals.cpp:43:0
#3 0x0000000003029275 SignalHandler(int) /srv/vpykhtin/git/llvm/lib/Support/Unix/Signals.inc:252:0
#4 0x00002b433235c330 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x10330)
#5 0x00002b4333415c37 gsignal /build/eglibc-MjiXCM/eglibc-2.19/signal/../nptl/sysdeps/unix/sysv/linux/raise.c:56:0
#6 0x00002b4333419028 abort /build/eglibc-MjiXCM/eglibc-2.19/stdlib/abort.c:91:0
#7 0x00002b433340ebf6 __assert_fail_base /build/eglibc-MjiXCM/eglibc-2.19/assert/assert.c:92:0
#8 0x00002b433340eca2 (/lib/x86_64-linux-gnu/libc.so.6+0x2fca2)
#9 0x000000000157376e llvm::SIScheduleBlockCreator::removeUseFromDef(llvm::SmallVector<llvm::RegisterMaskPair, 8u>&, unsigned int, llvm::SUnit const*) /srv/vpykhtin/git/llvm/lib/Target/AMDGPU/SIMachineScheduler.cpp:1795:0
#10 0x00000000015727e3 llvm::SIScheduleBlockCreator::createBlocksForVariant(llvm::SISchedulerBlockCreatorVariant) /srv/vpykhtin/git/llvm/lib/Target/AMDGPU/SIMachineScheduler.cpp:1613:0
#11 0x000000000156ef3e llvm::SIScheduleBlockCreator::getBlocks(llvm::SISchedulerBlockCreatorVariant) /srv/vpykhtin/git/llvm/lib/Target/AMDGPU/SIMachineScheduler.cpp:1043:0
#12 0x0000000001574e69 llvm::SIScheduler::scheduleVariant(llvm::SISchedulerBlockCreatorVariant, llvm::SISchedulerBlockSchedulerVariant) /srv/vpykhtin/git/llvm/lib/Target/AMDGPU/SIMachineScheduler.cpp:2073:0
#13 0x0000000001575eea llvm::SIScheduleDAGMI::schedule() /srv/vpykhtin/git/llvm/lib/Target/AMDGPU/SIMachineScheduler.cpp:2233:0
#14 0x0000000002659da5 (anonymous namespace)::MachineSchedulerBase::scheduleRegions(llvm::ScheduleDAGInstrs&, bool) /srv/vpykhtin/git/llvm/lib/CodeGen/MachineScheduler.cpp:524:0
#15 0x00000000026591ee (anonymous namespace)::MachineScheduler::runOnMachineFunction(llvm::MachineFunction&) /srv/vpykhtin/git/llvm/lib/CodeGen/MachineScheduler.cpp:387:0
#16 0x00000000025ce4f9 llvm::MachineFunctionPass::runOnFunction(llvm::Function&) /srv/vpykhtin/git/llvm/lib/CodeGen/MachineFunctionPass.cpp:62:0
#17 0x000000000298e138 llvm::FPPassManager::runOnFunction(llvm::Function&) /srv/vpykhtin/git/llvm/lib/IR/LegacyPassManager.cpp:1513:0
#18 0x000000000298e2cb llvm::FPPassManager::runOnModule(llvm::Module&) /srv/vpykhtin/git/llvm/lib/IR/LegacyPassManager.cpp:1534:0
#19 0x000000000298e666 (anonymous namespace)::MPPassManager::runOnModule(llvm::Module&) /srv/vpykhtin/git/llvm/lib/IR/LegacyPassManager.cpp:1590:0
#20 0x000000000298edb6 llvm::legacy::PassManagerImpl::run(llvm::Module&) /srv/vpykhtin/git/llvm/lib/IR/LegacyPassManager.cpp:1693:0
#21 0x000000000298eff7 llvm::legacy::PassManager::run(llvm::Module&) /srv/vpykhtin/git/llvm/lib/IR/LegacyPassManager.cpp:1725:0
#22 0x000000000124c627 compileModule(char**, llvm::LLVMContext&) /srv/vpykhtin/git/llvm/tools/llc/llc.cpp:579:0
#23 0x000000000124aca8 main /srv/vpykhtin/git/llvm/tools/llc/llc.cpp:331:0
#24 0x00002b4333400f45 __libc_start_main /build/eglibc-MjiXCM/eglibc-2.19/csu/libc-start.c:321:0
#25 0x0000000001248cb9 _start (/srv/vpykhtin/git/debug.llvm/./bin/llc+0x1248cb9)
Stack dump:
0.      Program arguments: /srv/vpykhtin/git/debug.llvm/./bin/llc -march=amdgcn -verify-machineinstrs
1.      Running pass 'Function Pass Manager' on module '<stdin>'.
2.      Running pass 'Machine Instruction Scheduler' on function '@partially_undef_copy'

FAIL: LLVM :: CodeGen/AMDGPU/unigine-liveness-crash.ll (5126 of 20226)
******************** TEST 'LLVM :: CodeGen/AMDGPU/unigine-liveness-crash.ll' FAILED ********************
Script:
--
/srv/vpykhtin/git/debug.llvm/./bin/llc -march=amdgcn -verify-machineinstrs < /srv/vpykhtin/git/llvm/test/CodeGen/AMDGPU/unigine-liveness-crash.ll | /srv/vpykhtin/git/debug.llvm/./bin/FileCheck /srv/vpykhtin/git/llvm/test/CodeGen/AMDGPU/unigine-liveness-crash.ll
--
Exit Code: 2

Command Output (stderr):
--
#0 0x00000000030298dd llvm::sys::PrintStackTrace(llvm::raw_ostream&) /srv/vpykhtin/git/llvm/lib/Support/Unix/Signals.inc:398:0
#1 0x000000000302996e PrintStackTraceSignalHandler(void*) /srv/vpykhtin/git/llvm/lib/Support/Unix/Signals.inc:462:0
#2 0x0000000003027dce llvm::sys::RunSignalHandlers() /srv/vpykhtin/git/llvm/lib/Support/Signals.cpp:43:0
#3 0x0000000003029275 SignalHandler(int) /srv/vpykhtin/git/llvm/lib/Support/Unix/Signals.inc:252:0
#4 0x00002ac49b609330 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x10330)
#5 0x000000000156a920 llvm::SISchedulerRPTracker::SISchedulerRPTracker(llvm::SmallVector<llvm::RegisterMaskPair, 8u> const&, llvm::SmallVector<llvm::RegisterMaskPair, 8u> const&, std::vector<llvm::SmallVector<unsigned int, 8u>, std::allocator<llvm::SmallVector<unsigned int, 8u> > > const&, std::vector<llvm::SmallVector<unsigned int, 8u>, std::allocator<llvm::SmallVector<unsigned int, 8u> > > const&, std::vector<llvm::SmallVector<llvm::RegisterMaskPair, 8u>, std::allocator<llvm::SmallVector<llvm::RegisterMaskPair, 8u> > > const&, std::vector<llvm::SmallVector<llvm::RegisterMaskPair, 8u>, std::allocator<llvm::SmallVector<llvm::RegisterMaskPair, 8u> > > const&, llvm::MachineRegisterInfo const*, llvm::TargetRegisterInfo const*, unsigned int, unsigned int, bool) /srv/vpykhtin/git/llvm/lib/Target/AMDGPU/SIMachineScheduler.cpp:198:0
#6 0x000000000156dace llvm::SIScheduleBlock::schedule() /srv/vpykhtin/git/llvm/lib/Target/AMDGPU/SIMachineScheduler.cpp:835:0
#7 0x000000000156df10 llvm::SIScheduleBlock::finalize() /srv/vpykhtin/git/llvm/lib/Target/AMDGPU/SIMachineScheduler.cpp:877:0
#8 0x0000000001572b1d llvm::SIScheduleBlockCreator::createBlocksForVariant(llvm::SISchedulerBlockCreatorVariant) /srv/vpykhtin/git/llvm/lib/Target/AMDGPU/SIMachineScheduler.cpp:1647:0
#9 0x000000000156ef3e llvm::SIScheduleBlockCreator::getBlocks(llvm::SISchedulerBlockCreatorVariant) /srv/vpykhtin/git/llvm/lib/Target/AMDGPU/SIMachineScheduler.cpp:1043:0
#10 0x0000000001574e69 llvm::SIScheduler::scheduleVariant(llvm::SISchedulerBlockCreatorVariant, llvm::SISchedulerBlockSchedulerVariant) /srv/vpykhtin/git/llvm/lib/Target/AMDGPU/SIMachineScheduler.cpp:2073:0
#11 0x0000000001575eea llvm::SIScheduleDAGMI::schedule() /srv/vpykhtin/git/llvm/lib/Target/AMDGPU/SIMachineScheduler.cpp:2233:0
#12 0x0000000002659da5 (anonymous namespace)::MachineSchedulerBase::scheduleRegions(llvm::ScheduleDAGInstrs&, bool) /srv/vpykhtin/git/llvm/lib/CodeGen/MachineScheduler.cpp:524:0
#13 0x00000000026591ee (anonymous namespace)::MachineScheduler::runOnMachineFunction(llvm::MachineFunction&) /srv/vpykhtin/git/llvm/lib/CodeGen/MachineScheduler.cpp:387:0
#14 0x00000000025ce4f9 llvm::MachineFunctionPass::runOnFunction(llvm::Function&) /srv/vpykhtin/git/llvm/lib/CodeGen/MachineFunctionPass.cpp:62:0
#15 0x000000000298e138 llvm::FPPassManager::runOnFunction(llvm::Function&) /srv/vpykhtin/git/llvm/lib/IR/LegacyPassManager.cpp:1513:0
#16 0x000000000298e2cb llvm::FPPassManager::runOnModule(llvm::Module&) /srv/vpykhtin/git/llvm/lib/IR/LegacyPassManager.cpp:1534:0
#17 0x000000000298e666 (anonymous namespace)::MPPassManager::runOnModule(llvm::Module&) /srv/vpykhtin/git/llvm/lib/IR/LegacyPassManager.cpp:1590:0
#18 0x000000000298edb6 llvm::legacy::PassManagerImpl::run(llvm::Module&) /srv/vpykhtin/git/llvm/lib/IR/LegacyPassManager.cpp:1693:0
#19 0x000000000298eff7 llvm::legacy::PassManager::run(llvm::Module&) /srv/vpykhtin/git/llvm/lib/IR/LegacyPassManager.cpp:1725:0
#20 0x000000000124c627 compileModule(char**, llvm::LLVMContext&) /srv/vpykhtin/git/llvm/tools/llc/llc.cpp:579:0
#21 0x000000000124aca8 main /srv/vpykhtin/git/llvm/tools/llc/llc.cpp:331:0
#22 0x00002ac49c6adf45 __libc_start_main /build/eglibc-MjiXCM/eglibc-2.19/csu/libc-start.c:321:0
#23 0x0000000001248cb9 _start (/srv/vpykhtin/git/debug.llvm/./bin/llc+0x1248cb9)
Stack dump:
0.      Program arguments: /srv/vpykhtin/git/debug.llvm/./bin/llc -march=amdgcn -verify-machineinstrs
1.      Running pass 'Function Pass Manager' on module '<stdin>'.
2.      Running pass 'Machine Instruction Scheduler' on function '@main'
axeldavy updated this revision to Diff 94440.Apr 6 2017, 2:41 PM

Fixed the assert in removeUseFromDef.

Added some const and replaced SmallVector<RegisterMaskPair, 8> by SmallVectorImpl<RegisterMaskPair>

axeldavy added inline comments.Apr 6 2017, 2:45 PM
lib/Target/AMDGPU/SIMachineScheduler.cpp
210

What do you suggest to fix this problem ?

376

Why LLVM_DUMP_METHOD on an empty line ?

lib/Target/AMDGPU/SIMachineScheduler.h
219

right, I missed that for this update. Will be for next time.

393

canno be const here because we remove things from Uses.

Thanks Axel!

I had to mention that I'm enabling SISched by commenting out

ScheduleDAGInstrs *GCNPassConfig::createMachineScheduler(
  MachineSchedContext *C) const {
  const SISubtarget &ST = C->MF->getSubtarget<SISubtarget>();
  //if (ST.enableSIScheduler())
    return createSIMachineScheduler(C);
  //return createGCNMaxOccupancyMachineScheduler(C);
}

This way I got ShouldTrackLaneMasks = true

Sure. In this patch I make ShouldTrackLaneMasks set to true always.

Thanks Axel!

I had to mention that I'm enabling SISched by commenting out

ScheduleDAGInstrs *GCNPassConfig::createMachineScheduler(
  MachineSchedContext *C) const {
  const SISubtarget &ST = C->MF->getSubtarget<SISubtarget>();
  //if (ST.enableSIScheduler())
    return createSIMachineScheduler(C);
  //return createGCNMaxOccupancyMachineScheduler(C);
}

This way I got ShouldTrackLaneMasks = true

For the second assert it may be the case for iterator invalidation, though I haven't checked.

lib/Target/AMDGPU/SIMachineScheduler.cpp
210

Easy way is to reserve sufficient number of elements to avoid reallocation, but this is not reliable as code correctness depends on reserve which is in general optional. You may consider using std::list for this.

376

I meant something like this:

#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)

LLVM_DUMP_METHOD
SISchedulerRPTracker::printDebugLives()

axeldavy updated this revision to Diff 103878.Jun 25 2017, 3:24 PM

I think I took into account all remaining comments, please tell if I missed something.

axeldavy edited the summary of this revision. (Show Details)Jun 25 2017, 3:27 PM
axeldavy updated this revision to Diff 104283.Jun 27 2017, 3:18 PM

I had incorrectly tested after introducing getPressureOfReg.
RPTracker->getPressureOfReg uses were incorrect, because the RPTracker wasn't initialized yet at the moment it was called.

I fixed the problem by defining it twice (each version using the available data in its class).
This is a small function, introduced to remove redundant code as you suggested.
If you have a better suggestion I'll take it.

Is this still needed?

Herald added a project: Restricted Project. · View Herald TranscriptFeb 21 2019, 7:10 PM

Is this still needed?

To my knowledge, yes. Do you have any insight on why it wouldn't be anymore ? Has there been change related to lane tracking ?