This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU/SI: Add lane tracking to SI Scheduler
Needs Review · Public

Authored by axeldavy on Mar 19 2017, 4:14 AM.


Details

Reviewers

rampitec
vpykhtin
arsenm
alex-t

Summary

This patch adds lane tracking to SI Scheduler.

To handle lanes,
. When a register is always used with all of its lanes, it is replaced by RegisterMaskPair(Reg, LaneBitmask::getAll()).
. Otherwise, it determines a 'basis' of masks such that any register/lane usage can be decomposed into fake registers RegisterMaskPair(Reg, M), where M is an element of the basis.

Previously the code assumed that a register cannot be defined if it is already defined. Lane masks break this assumption.
Decomposing into unique RegisterMaskPair "registers" (whose lanes don't intersect) makes it possible to keep that assumption:
a RegisterMaskPair can only be defined if it is not already alive. This lets us reuse the previous code, with some updates to how register usage is computed.
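A minimal sketch of the basis idea, assuming lane masks can be modelled as plain 64-bit integers (the patch itself uses LLVM's LaneBitmask and RegisterMaskPair; computeLaneBasis and decompose are hypothetical names). Each used mask refines the current set of disjoint masks, so every use ends up as an exact union of basis elements, and a register that is only ever used whole keeps the single all-lanes element:

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for LaneBitmask: one bit per lane.
using LaneMask = std::uint64_t;
constexpr LaneMask AllLanes = ~LaneMask(0);

// Returns pairwise-disjoint masks such that every mask in UsedMasks is
// exactly the union of some of them.
std::vector<LaneMask> computeLaneBasis(const std::vector<LaneMask> &UsedMasks) {
  std::vector<LaneMask> Basis{AllLanes};
  for (LaneMask Used : UsedMasks) {
    std::vector<LaneMask> Refined;
    for (LaneMask B : Basis) {
      // Split each basis element into the part inside and outside Used.
      LaneMask In = B & Used, Out = B & ~Used;
      if (In)
        Refined.push_back(In);
      if (Out)
        Refined.push_back(Out);
    }
    Basis = std::move(Refined);
  }
  return Basis;
}

// Decomposes one use into the basis elements it fully covers, i.e. into the
// "fake registers" (Reg, element) used for tracking.
std::vector<LaneMask> decompose(LaneMask Used, const std::vector<LaneMask> &Basis) {
  std::vector<LaneMask> Parts;
  for (LaneMask B : Basis)
    if ((B & Used) == B)
      Parts.push_back(B);
  return Parts;
}
```

For uses 0x3 and 0xC, the basis becomes {0x3, 0xC, ~0xF}, and a full 0xF use decomposes into {0x3, 0xC}. Starting from all 64 bits is a simplification; a real implementation would presumably start from the register class's own lane mask.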

depends on D31123 and D31587

Diff Detail

Repository: rL LLVM

Event Timeline

axeldavy created this revision. Mar 19 2017, 4:14 AM

Herald added subscribers: tpr, dstuttard, tony-tye and 5 others. Mar 19 2017, 4:14 AM

axeldavy edited the summary of this revision. Mar 19 2017, 3:52 PM

axeldavy added a parent revision: D31123: RegisterPressure: Add operators to RegisterMaskPair.

I removed the use of the [] operator on DenseMap<unsigned, RegisterMaskPair> IdentifierToReg.

RegisterMaskPair therefore no longer needs a default constructor with no arguments.

rampitec added inline comments. Mar 21 2017, 5:14 PM

lib/Target/AMDGPU/SIMachineScheduler.cpp
1788	Forgot '&'?
1961	Can you please avoid the copy here and in the function's arguments?
1991	auto& (and in other places too)?

axeldavy added inline comments. Mar 22 2017, 12:37 AM

lib/Target/AMDGPU/SIMachineScheduler.cpp
1788	indeed. Thanks.
1961	Right, I can write SIScheduleBlockScheduler::getPairsForRegs(const SmallVector<RegisterMaskPair, 8> &Regs) { SmallVector<RegisterMaskPair, 8> Result; for (const auto &RegPair : Regs) { But for the line with: for (const auto RegPairRes : getPairsForReg(RegPair.RegUnit, do you mean just using &RegPairRes? Daniel Berlin suggested using std::transform in combination with a new vector_inserter to replace the two loops. That would still copy the getPairsForReg results into the result array of getPairsForRegs in every case. I could probably solve the issue by adding a new getPairsForReg variant that takes the Result array to append to, and using that helper in both functions. Do you think that would be a good solution?
1991	Ok

rampitec added inline comments. Mar 22 2017, 11:03 AM

lib/Target/AMDGPU/SIMachineScheduler.cpp
1961	Yes, it sounds more efficient to me.
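The append-into-output design agreed on here could look roughly like this. RegMaskPair, basisFor, appendPairsForReg and the fixed two-halves basis are hypothetical simplifications for illustration, not the patch's actual code:

```cpp
#include <cstdint>
#include <vector>

// Simplified, hypothetical stand-in for llvm::RegisterMaskPair.
struct RegMaskPair {
  unsigned Reg;
  std::uint64_t Mask;
};

// Hypothetical per-register basis lookup: every register is split into two
// halves here, purely for illustration.
static const std::vector<std::uint64_t> &basisFor(unsigned /*Reg*/) {
  static const std::vector<std::uint64_t> Basis{0x3, 0xC};
  return Basis;
}

// Appends the basis pairs covering (Reg, Mask) directly into Out, so no
// temporary vector is built and then copied.
void appendPairsForReg(unsigned Reg, std::uint64_t Mask,
                       std::vector<RegMaskPair> &Out) {
  for (std::uint64_t B : basisFor(Reg))
    if ((B & Mask) == B)
      Out.push_back({Reg, B});
}

// Single-register wrapper built on the same helper.
std::vector<RegMaskPair> getPairsForReg(unsigned Reg, std::uint64_t Mask) {
  std::vector<RegMaskPair> Result;
  appendPairsForReg(Reg, Mask, Result);
  return Result;
}

// Input taken by const reference; results appended in place.
std::vector<RegMaskPair> getPairsForRegs(const std::vector<RegMaskPair> &Regs) {
  std::vector<RegMaskPair> Result;
  for (const RegMaskPair &RP : Regs)
    appendPairsForReg(RP.Reg, RP.Mask, Result);
  return Result;
}
```

Both public functions share one helper, which is what removes the intermediate copy the thread is concerned about.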

Use more references.
Improved efficiency of getPairsForReg.

rampitec added inline comments. Mar 22 2017, 12:59 PM

lib/Target/AMDGPU/SIMachineScheduler.cpp
1936	ToAppend is passed by value... As far as I understand it should not work, i.e. it should not return any values.
1965	The argument needs to be a reference.
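The ToAppend issue in a minimal, hypothetical form: an output vector taken by value is a copy, so whatever the callee appends never reaches the caller. Taking it by non-const reference fixes that:

```cpp
#include <vector>

// Bug: ToAppend is a copy, so the caller never sees the appended element.
void appendByValue(std::vector<unsigned> ToAppend, unsigned V) {
  ToAppend.push_back(V);
}

// Fix: take the output vector by non-const reference.
void appendByRef(std::vector<unsigned> &ToAppend, unsigned V) {
  ToAppend.push_back(V);
}
```

This compiles cleanly and silently returns nothing, which is why only a test that checks the results would catch it.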

Fix getPairsForRegs.

Use reference for the input argument of getPairsForRegs.

In D31124#707998, @axeldavy wrote:

Use reference for the input argument of getPairsForRegs.

Thank you. It looks good, but without the reference the previous version should not have worked, which suggests this code is not covered by any test.
I think you need to add tests for this.

SI scheduler has a test file: test/CodeGen/AMDGPU/si-scheduler.ll

But perhaps you meant a test with subregs?
I'm afraid it's hard to design a test for subregs specifically: if the ops using subregs are inside the same block, the block's inputs and outputs won't have subregs. And block creation algorithms can vary.

In D31124#708105, @axeldavy wrote:

SI scheduler has a test file: test/CodeGen/AMDGPU/si-scheduler.ll

But perhaps you meant a test with subregs?
I'm afraid it's hard to design a test for subregs specifically: if the ops using subregs are inside the same block, the block's inputs and outputs won't have subregs. And block creation algorithms can vary.

You can create a MIR test. For example, schedule-regpressure.mir runs only the machine scheduler.

t-tye added a subscriber: t-tye. Mar 22 2017, 6:40 PM

tony-tye removed a subscriber: tony-tye. Mar 22 2017, 6:47 PM

Check shouldTrackLaneMask when building the list of the Block outputs.
When it is not set, force the LaneMask to be full: because of findDefsBetween, it can end up partial.
This fixes an assertion failure on some shaders.
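A hedged sketch of the rule described above, using hypothetical types: when lane masks are not tracked, any partial mask (for instance one coming out of findDefsBetween) is widened to all lanes, and entries for the same register are merged so the output list stays consistent with whole-register tracking:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

using LaneMask = std::uint64_t;
constexpr LaneMask AllLanes = ~LaneMask(0);

// Hypothetical stand-in for llvm::RegisterMaskPair.
struct RegMaskPair {
  unsigned Reg;
  LaneMask Mask;
};

std::vector<RegMaskPair> buildBlockOutputs(const std::vector<RegMaskPair> &Defs,
                                           bool ShouldTrackLaneMasks) {
  std::vector<RegMaskPair> Outs;
  for (const RegMaskPair &Def : Defs) {
    // Without lane tracking, never let a partial mask through.
    LaneMask Mask = ShouldTrackLaneMasks ? Def.Mask : AllLanes;
    auto It = std::find_if(Outs.begin(), Outs.end(),
                           [&](const RegMaskPair &O) { return O.Reg == Def.Reg; });
    if (It != Outs.end())
      It->Mask |= Mask; // Merge masks for a register already in the list.
    else
      Outs.push_back({Def.Reg, Mask});
  }
  return Outs;
}
```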

In D31124#708110, @rampitec wrote:

You can create a MIR test. For example, schedule-regpressure.mir runs only the machine scheduler.

That doesn't entirely fix the problem I mentioned.
Sisched works by regrouping instructions into Blocks, and then scheduling the Blocks relative to each other.
For the test to really check subregs, we'd have to make sure some blocks have subreg outputs.
One way would be to have the subregs involve high-latency instructions, because those are always placed in separate blocks when there is a data dependency.
That is complicated though; I'd have to really investigate how to write such tests.

Switched the order of the patch with https://reviews.llvm.org/D30147

Why does SIScheduleBlockCreator::scheduleInsideBlocks() actually move instructions? Why isn't that done during the final scheduling?

lib/Target/AMDGPU/SIMachineScheduler.cpp
1388	duplicated code
lib/Target/AMDGPU/SIMachineScheduler.h
498	You can make it const and return reference, like const SmallVector<RegisterMaskPair, 8> &getInRegs() const;
502	Ditto

Ok, I understand that SIScheduleBlockCreator::scheduleInsideBlocks() moves instructions to actually get the LiveIn and LiveOut sets for a block, but this is rather heavy. Have you thought about getting those from the DAG directly, rather than from the regpressure tracker? Intuitively, the dependencies between blocks correspond to that liveness info. There is a problem however: LiveIn and LiveOut dependencies aren't modelled for boundary SUs. I have a local patch that builds such dependencies: edges for the scheduling region's LiveIns come from EntrySU, and LiveOut edges go to ExitSU. Another problem is that dependency edges don't carry a lane mask; we need to think about how to deal with this.

lib/Target/AMDGPU/SIMachineScheduler.cpp
387	I would move LIS->getInstructionIndex(BeginBlock).getRegSlot(), LIS->getInstructionIndex(EndBlock).getRegSlot() out of the loop

In D31124#712336, @vpykhtin wrote:

Ok, I understand that SIScheduleBlockCreator::scheduleInsideBlocks() moves instructions to actually get the LiveIn and LiveOut sets for a block, but this is rather heavy. Have you thought about getting those from the DAG directly, rather than from the regpressure tracker? Intuitively, the dependencies between blocks correspond to that liveness info. There is a problem however: LiveIn and LiveOut dependencies aren't modelled for boundary SUs. I have a local patch that builds such dependencies: edges for the scheduling region's LiveIns come from EntrySU, and LiveOut edges go to ExitSU. Another problem is that dependency edges don't carry a lane mask; we need to think about how to deal with this.

This is the slowest part of the scheduler (handleMove is very inefficient), so it's probably a good idea to replace it with something faster.

axeldavy added inline comments. Mar 28 2017, 11:26 AM

lib/Target/AMDGPU/SIMachineScheduler.cpp
1388	This is duplicated with your iterative scheduler, and I suggested to have a common function for both schedulers and the default scheduler (you said you extracted the code from there), but you didn't take the comment into account. Do you want a common function for both SI schedulers or with default scheduler too ?
lib/Target/AMDGPU/SIMachineScheduler.h
498	Right. I had in mind that we'd add the correct computation of the block liveins here (as for GCNScheduler), but for now we can return a reference.

vpykhtin added inline comments. Mar 28 2017, 11:32 AM

lib/Target/AMDGPU/SIMachineScheduler.cpp
1388	Yes, I remember that comment. That doesn't mean this code should be duplicated even more; I encourage you to write short functions whenever possible, even for your own use. This code actually isn't very efficient and I planned to write an efficient version; I should have noted this, sorry.

Updated according to comments.

rampitec added inline comments. Mar 28 2017, 3:19 PM

lib/Target/AMDGPU/SIMachineScheduler.cpp
1704	Per the coding standard, use an early exit: if (...) return;

Axel, thanks for the update.

If you aren't going to add a test for this, can you at least run your change on a debug build (or a release build with asserts) with sisched turned on by default over all AMDGPU lit tests? Most would fail on miscompares, but they should run without asserts.

In D31124#712850, @vpykhtin wrote:

Axel, thanks for the update.

If you aren't going to add a test for this, can you at least run your change on a debug build (or a release build with asserts) with sisched turned on by default over all AMDGPU lit tests? Most would fail on miscompares, but they should run without asserts.

I think it is a good idea to add a test; it would probably have to be something with a lot of loads involving a lot of subregs. I just haven't had time to write it yet.

vpykhtin added inline comments. Mar 31 2017, 3:21 AM

lib/Target/AMDGPU/SIMachineScheduler.cpp
2067	SmallPtrSet is designed for pointer types but is used here for an unsigned type, and gcc complains; consider SmallDenseSet instead.

I ran lit tests with sisched and ShouldTrackLaneMasks=true enabled by default with this patch; the following tests asserted:

******************** TEST 'LLVM :: CodeGen/AMDGPU/fceil64.ll' FAILED ********************
llc: /srv/vpykhtin/git/llvm/lib/CodeGen/RegisterPressure.cpp:74: void decreaseSetPressure(std::vector<unsigned int>&, const llvm::MachineRegisterInfo&, unsigned int, llvm::LaneBitmask, llvm::LaneBitmask): Assertion `CurrSetPressure[*PSetI] >= Weight && "register pressure underflow"' failed.
******************** TEST 'LLVM :: CodeGen/AMDGPU/fp_to_sint.f64.ll' FAILED ********************
llc: /srv/vpykhtin/git/llvm/lib/CodeGen/RegisterPressure.cpp:74: void decreaseSetPressure(std::vector<unsigned int>&, const llvm::MachineRegisterInfo&, unsigned int, llvm::LaneBitmask, llvm::LaneBitmask): Assertion `CurrSetPressure[*PSetI] >= Weight && "register pressure underflow"' failed.
******************** TEST 'LLVM :: CodeGen/AMDGPU/fp_to_uint.f64.ll' FAILED ********************
llc: /srv/vpykhtin/git/llvm/lib/CodeGen/RegisterPressure.cpp:74: void decreaseSetPressure(std::vector<unsigned int>&, const llvm::MachineRegisterInfo&, unsigned int, llvm::LaneBitmask, llvm::LaneBitmask): Assertion `CurrSetPressure[*PSetI] >= Weight && "register pressure underflow"' failed.
******************** TEST 'LLVM :: CodeGen/AMDGPU/srem.ll' FAILED ********************
llc: /srv/vpykhtin/git/llvm/lib/CodeGen/RegisterPressure.cpp:74: void decreaseSetPressure(std::vector<unsigned int>&, const llvm::MachineRegisterInfo&, unsigned int, llvm::LaneBitmask, llvm::LaneBitmask): Assertion `CurrSetPressure[*PSetI] >= Weight && "register pressure underflow"' failed.
******************** TEST 'LLVM :: CodeGen/AMDGPU/urem.ll' FAILED ********************
llc: /srv/vpykhtin/git/llvm/lib/CodeGen/RegisterPressure.cpp:74: void decreaseSetPressure(std::vector<unsigned int>&, const llvm::MachineRegisterInfo&, unsigned int, llvm::LaneBitmask, llvm::LaneBitmask): Assertion `CurrSetPressure[*PSetI] >= Weight && "register pressure underflow"' failed.

I enabled sisched by commenting out the following line:

ScheduleDAGInstrs *GCNPassConfig::createMachineScheduler(
  MachineSchedContext *C) const {
  const SISubtarget &ST = C->MF->getSubtarget<SISubtarget>();
  //if (ST.enableSIScheduler())
    return createSIMachineScheduler(C);
  return createGCNMaxOccupancyMachineScheduler(C);
}

This would indicate that handleMove + adjustLaneLiveness is insufficient. Do you have any idea what is missing?

In D31124#715170, @vpykhtin wrote:

I ran lit tests with sisched and ShouldTrackLaneMasks=true enabled by default with this patch; the following tests asserted: [...]

I didn't debug it, and I don't see why you concluded that.

In D31124#715389, @axeldavy wrote:

This would indicate that handleMove + adjustLaneLiveness is insufficient. Do you have any ideas about what is missing ?

The assert you have is in CodeGen/RegisterPressure.cpp: void decreaseSetPressure

This call can only come from three places:
. in SIScheduleBlock::initRegPressure, when we initialize the block liveins and liveouts for a given Block;
. in SIScheduleBlock::schedule, when we compute the register usage inside a Block with the order scheduled inside it;
. outside sisched.

If it comes from outside sisched, it means we have produced an invalid schedule.
In the first two cases, it likely means we did something wrong when changing the order of the SUs, and thus that handleMove and adjustLaneLiveness wouldn't be enough.
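A toy model of why stale liveness after reordering can trip a "register pressure underflow" style check. This is not LLVM's RegPressureTracker, which counts pressure per pressure set using register weights; here pressure is simply the number of live lanes, and ToyLanePressure is a hypothetical name. If the bookkeeping no longer matches the instruction order, a kill may name lanes that are not live, and subtracting anyway would push the count below its true value:

```cpp
#include <bitset>
#include <cstdint>
#include <map>

using LaneMask = std::uint64_t;

class ToyLanePressure {
  std::map<unsigned, LaneMask> LiveLanes; // live lanes per register
  unsigned Pressure = 0;                  // total number of live lanes

public:
  void def(unsigned Reg, LaneMask Mask) {
    // Only lanes that were not already live add pressure.
    LaneMask NewLanes = Mask & ~LiveLanes[Reg];
    LiveLanes[Reg] |= Mask;
    Pressure += static_cast<unsigned>(std::bitset<64>(NewLanes).count());
  }

  // Refuses (returns false) when the kill names lanes that are not live;
  // decrementing anyway is the kind of bookkeeping error behind underflows.
  bool kill(unsigned Reg, LaneMask Mask) {
    LaneMask Live = LiveLanes[Reg];
    if ((Mask & Live) != Mask)
      return false;
    LiveLanes[Reg] = Live & ~Mask;
    Pressure -= static_cast<unsigned>(std::bitset<64>(Mask).count());
    return true;
  }

  unsigned pressure() const { return Pressure; }
};
```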

In D31124#715398, @vpykhtin wrote:

I didn't debug it, and I don't see why you concluded that.

Stack trace:

llc: /srv/vpykhtin/git/llvm/lib/CodeGen/RegisterPressure.cpp:74: void decreaseSetPressure(std::vector<unsigned int>&, const llvm::MachineRegisterInfo&, unsigned int, llvm::LaneBitmask, llvm::LaneBitmask): Assertion `CurrSetPressure[*PSetI] >= Weight && "register pressure underflow"' failed.
#0 0x00000000030175bb llvm::sys::PrintStackTrace(llvm::raw_ostream&) /srv/vpykhtin/git/llvm/lib/Support/Unix/Signals.inc:398:0
#1 0x000000000301764c PrintStackTraceSignalHandler(void*) /srv/vpykhtin/git/llvm/lib/Support/Unix/Signals.inc:462:0
#2 0x0000000003015aac llvm::sys::RunSignalHandlers() /srv/vpykhtin/git/llvm/lib/Support/Signals.cpp:44:0
#3 0x0000000003016f53 SignalHandler(int) /srv/vpykhtin/git/llvm/lib/Support/Unix/Signals.inc:252:0
#4 0x00002af5bbdcc330 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x10330)
#5 0x00002af5bce85c37 gsignal /build/eglibc-oGUzwX/eglibc-2.19/signal/../nptl/sysdeps/unix/sysv/linux/raise.c:56:0
#6 0x00002af5bce89028 abort /build/eglibc-oGUzwX/eglibc-2.19/stdlib/abort.c:91:0
#7 0x00002af5bce7ebf6 __assert_fail_base /build/eglibc-oGUzwX/eglibc-2.19/assert/assert.c:92:0
#8 0x00002af5bce7eca2 (/lib/x86_64-linux-gnu/libc.so.6+0x2fca2)
#9 0x000000000270061f decreaseSetPressure(std::vector<unsigned int, std::allocator<unsigned int> >&, llvm::MachineRegisterInfo const&, unsigned int, llvm::LaneBitmask, llvm::LaneBitmask) /srv/vpykhtin/git/llvm/lib/CodeGen/RegisterPressure.cpp:75:0
#10 0x0000000002700d9b llvm::RegPressureTracker::decreaseRegPressure(unsigned int, llvm::LaneBitmask, llvm::LaneBitmask) /srv/vpykhtin/git/llvm/lib/CodeGen/RegisterPressure.cpp:158:0
#11 0x0000000002703e14 llvm::RegPressureTracker::advance(llvm::RegisterOperands const&) /srv/vpykhtin/git/llvm/lib/CodeGen/RegisterPressure.cpp:892:0
#12 0x0000000002704053 llvm::RegPressureTracker::advance() /srv/vpykhtin/git/llvm/lib/CodeGen/RegisterPressure.cpp:933:0
#13 0x0000000001561341 llvm::SIScheduleBlock::initRegPressure(llvm::MachineInstrBundleIterator<llvm::MachineInstr, false>, llvm::MachineInstrBundleIterator<llvm::MachineInstr, false>) /srv/vpykhtin/git/llvm/lib/Target/AMDGPU/SIMachineScheduler.cpp:338:0
#14 0x0000000001561756 llvm::SIScheduleBlock::schedule(llvm::MachineInstrBundleIterator<llvm::MachineInstr, false>, llvm::MachineInstrBundleIterator<llvm::MachineInstr, false>) /srv/vpykhtin/git/llvm/lib/Target/AMDGPU/SIMachineScheduler.cpp:415:0
#15 0x000000000156726e llvm::SIScheduleBlockCreator::scheduleInsideBlocks() /srv/vpykhtin/git/llvm/lib/Target/AMDGPU/SIMachineScheduler.cpp:1375:0
#16 0x0000000001562d74 llvm::SIScheduleBlockCreator::getBlocks(llvm::SISchedulerBlockCreatorVariant) /srv/vpykhtin/git/llvm/lib/Target/AMDGPU/SIMachineScheduler.cpp:661:0
#17 0x000000000156a92d llvm::SIScheduler::scheduleVariant(llvm::SISchedulerBlockCreatorVariant, llvm::SISchedulerBlockSchedulerVariant) /srv/vpykhtin/git/llvm/lib/Target/AMDGPU/SIMachineScheduler.cpp:1929:0
#18 0x000000000156bb10 llvm::SIScheduleDAGMI::schedule() /srv/vpykhtin/git/llvm/lib/Target/AMDGPU/SIMachineScheduler.cpp:2120:0
#19 0x0000000002648b0d (anonymous namespace)::MachineSchedulerBase::scheduleRegions(llvm::ScheduleDAGInstrs&, bool) /srv/vpykhtin/git/llvm/lib/CodeGen/MachineScheduler.cpp:524:0
#20 0x0000000002647f56 (anonymous namespace)::MachineScheduler::runOnMachineFunction(llvm::MachineFunction&) /srv/vpykhtin/git/llvm/lib/CodeGen/MachineScheduler.cpp:387:0
#21 0x00000000025bdc15 llvm::MachineFunctionPass::runOnFunction(llvm::Function&) /srv/vpykhtin/git/llvm/lib/CodeGen/MachineFunctionPass.cpp:62:0
#22 0x000000000297d26c llvm::FPPassManager::runOnFunction(llvm::Function&) /srv/vpykhtin/git/llvm/lib/IR/LegacyPassManager.cpp:1513:0
#23 0x000000000297d3ff llvm::FPPassManager::runOnModule(llvm::Module&) /srv/vpykhtin/git/llvm/lib/IR/LegacyPassManager.cpp:1534:0
#24 0x000000000297d79a (anonymous namespace)::MPPassManager::runOnModule(llvm::Module&) /srv/vpykhtin/git/llvm/lib/IR/LegacyPassManager.cpp:1590:0
#25 0x000000000297deea llvm::legacy::PassManagerImpl::run(llvm::Module&) /srv/vpykhtin/git/llvm/lib/IR/LegacyPassManager.cpp:1693:0
#26 0x000000000297e12b llvm::legacy::PassManager::run(llvm::Module&) /srv/vpykhtin/git/llvm/lib/IR/LegacyPassManager.cpp:1725:0
#27 0x0000000001243107 compileModule(char**, llvm::LLVMContext&) /srv/vpykhtin/git/llvm/tools/llc/llc.cpp:579:0
#28 0x0000000001241788 main /srv/vpykhtin/git/llvm/tools/llc/llc.cpp:331:0
#29 0x00002af5bce70f45 __libc_start_main /build/eglibc-oGUzwX/eglibc-2.19/csu/libc-start.c:321:0
#30 0x000000000123f799 _start (/srv/vpykhtin/git/debug.llvm/./bin/llc+0x123f799)
Stack dump:
0.      Program arguments: /srv/vpykhtin/git/debug.llvm/./bin/llc -march=amdgcn -verify-machineinstrs
1.      Running pass 'Function Pass Manager' on module '<stdin>'.
2.      Running pass 'Machine Instruction Scheduler' on function '@fceil_v4f64'

In general, I think moving instructions just to use the standard RP tracker to discover liveins/liveouts isn't a good idea. It is not only slow but also doesn't look reliable. Why not discover these sets from the DAG directly?

Could you describe how you would get all the relevant information?

Liveins are everything needed by instructions of the block that is not produced by the block (but you need to be careful about register reuse: some SU in the block may produce the register, but only after another one has consumed it).
LiveOuts are everything produced by the Block that is not released (because there is another consumer elsewhere). Similarly, you need to capture when the register is consumed for the last time in the Block (there may be a later block with a consumer of the register, but an instruction of another Block may have produced the register again before that).
This seems hard to do without correct LiveIntervals, and those are only correct if we actually reorder and call handleMove, etc.

In D31124#715499, @vpykhtin wrote:

In general, I think moving instructions just to use the standard RP tracker to discover liveins/liveouts isn't a good idea. It is not only slow but also doesn't look reliable. Why not discover these sets from the DAG directly?

I may be missing something, but it looks like you can build data edges when building a super-DAG of blocks. Incoming data edges would be the liveins, outgoing ones the liveouts.

In D31124#715509, @axeldavy wrote:

Could you describe how you would get all the relevant information?

Liveins are everything needed by instructions of the block that is not produced by the block (but you need to be careful about register reuse: some SU in the block may produce the register, but only after another one has consumed it).
LiveOuts are everything produced by the Block that is not released (because there is another consumer elsewhere). Similarly, you need to capture when the register is consumed for the last time in the Block (there may be a later block with a consumer of the register, but an instruction of another Block may have produced the register again before that).
This seems hard to do without correct LiveIntervals, and those are only correct if we actually reorder and call handleMove, etc.

Yes, that could work.
I didn't think of that.

Can we rely on having these properties:
. If SU(i) relies on the register produced by SU(j), there is a data dependency (even if SU(i) also depends on SU(k), which itself depends on SU(j); in other words, redundant edges are not removed).
. A data dependency between SU(j) and SU(i) always means that SU(i) needs as input the output of SU(j).

To sum up, do we have equivalence between "SU(i) has data dependency with SU(j)" and "one of SU(i) inputs is SU(j) output"

If we have these, I think we can make it work.

In D31124#715515, @vpykhtin wrote:

I may be missing something, but it looks like you can build data edges when building a super-DAG of blocks. Incoming data edges would be the liveins, outgoing ones the liveouts.

If SU(i) uses a register produced by SU(j):

SU(i) has a predecessor data edge carrying the number of the register produced by SU(j)
SU(j) has a successor data edge carrying the number of the register used by SU(i)

Except that data edges don't carry a lane mask, but I think this can be solved.

This way SU(i) has predecessor data edges for all registers used by that SU, and SU(j) has successor data edges for every SU using SU(j)'s output.
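A sketch of the DAG-based alternative, assuming the equivalence asked about in this thread holds (every register flowing from one SU to another has a data edge) and ignoring lane masks, which the edges do not carry. DataEdge, BlockOf and computeBlockLiveness are hypothetical names:

```cpp
#include <set>
#include <vector>

// One data edge: SU Succ reads register Reg produced by SU Pred.
struct DataEdge {
  unsigned Pred, Succ, Reg;
};

struct BlockLiveness {
  std::set<unsigned> LiveIns, LiveOuts;
};

// BlockOf[SU] is the block each SU was assigned to. A data edge crossing a
// block boundary is a live-out of the producer's block and a live-in of the
// consumer's block; edges inside one block contribute nothing.
std::vector<BlockLiveness> computeBlockLiveness(const std::vector<DataEdge> &Edges,
                                                const std::vector<unsigned> &BlockOf,
                                                unsigned NumBlocks) {
  std::vector<BlockLiveness> Result(NumBlocks);
  for (const DataEdge &E : Edges) {
    unsigned From = BlockOf[E.Pred], To = BlockOf[E.Succ];
    if (From == To)
      continue;
    Result[From].LiveOuts.insert(E.Reg);
    Result[To].LiveIns.insert(E.Reg);
  }
  return Result;
}
```

Live-ins and live-outs of the whole scheduling region would additionally need edges from and to boundary nodes such as EntrySU and ExitSU, as in the local patch mentioned earlier in the thread.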

In D31124#715528, @axeldavy wrote:

Yes, that could work.
I didn't think of that.

Can we rely on having these properties:
. If SU(i) relies on the register produced by SU(j), there is a data dependency (even if SU(i) also depends on SU(k), which itself depends on SU(j); in other words, redundant edges are not removed).
. A data dependency between SU(j) and SU(i) always means that SU(i) needs as input the output of SU(j).

To sum up, do we have equivalence between "SU(i) has data dependency with SU(j)" and "one of SU(i) inputs is SU(j) output"

If we have these, I think we can make it work.

axeldavy mentioned this in D31587: MachineScheduler/ScheduleDAG: Add support for getSUTopoIndex. Apr 2 2017, 2:38 PM

This update significantly reworks the structure of the scheduler.

. All the code handling register tracking is now in a class SISchedulerRPTracker. The class handles lane tracking as described previously.
. The code that schedules inside blocks has been completely reworked to use this class. It no longer uses the default RPTracker or calls handleMove, etc., and is thus much faster.
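A toy version of in-block tracking over disjoint "fake registers", in the spirit of the summary: because each (register, basis mask) pair may only be defined when it is not already live, pressure can be followed with per-register use counts alone, without moving instructions. ToySU and ToyBlockRPTracker are hypothetical, count live fake registers rather than weighted pressure, and ignore cases such as an SU that both reads and redefines the same fake register:

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

// A fake register is an integer id standing for one disjoint
// (Reg, basis mask) pair. Each SU lists the fake registers it reads/writes.
struct ToySU {
  std::vector<unsigned> Uses, Defs;
};

class ToyBlockRPTracker {
  std::map<unsigned, unsigned> RemainingUses; // in-block readers not yet scheduled
  std::set<unsigned> LiveOuts, Live;

public:
  ToyBlockRPTracker(const std::vector<ToySU> &SUs, const std::set<unsigned> &LiveIns,
                    const std::set<unsigned> &Outs)
      : LiveOuts(Outs), Live(LiveIns) {
    for (const ToySU &SU : SUs)
      for (unsigned R : SU.Uses)
        ++RemainingUses[R];
  }

  void schedule(const ToySU &SU) {
    for (unsigned R : SU.Uses) {
      assert(Live.count(R) && "use of a fake register that is not live");
      // Last in-block reader releases the register unless it is a live-out.
      if (--RemainingUses[R] == 0 && !LiveOuts.count(R))
        Live.erase(R);
    }
    for (unsigned R : SU.Defs) {
      assert(!Live.count(R) && "redefinition of a live fake register");
      if (RemainingUses[R] > 0 || LiveOuts.count(R))
        Live.insert(R);
    }
  }

  unsigned pressure() const { return static_cast<unsigned>(Live.size()); }
};
```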

depends on D31587

First batch of comments, related only to C++ issues:

lib/Target/AMDGPU/SIMachineScheduler.cpp
144	const SmallVectorImpl<RegisterMaskPair> &
196	I recommend use auto
210	You're iterating over a vector while increasing its size at the same time. This is almost ok, except that the vector can reallocate, and then all iterators become invalidated. Iterator Invalidation Rules (C++03) vector: all iterators and references before the point of insertion are unaffected, unless the new container size is greater than the previous capacity (in which case all iterators and references are invalidated) [23.2.4.3/1]
217	Use const SmallVectorImpl<RegisterMaskPair> &
229	auto
247	auto
294	auto
325	SmallPtrSet is designed for pointers and gcc would complain; consider SmallDenseSet instead. Are you developing with MSVS?
344	I see a lot of such code snippets in your code; please make a function.
376	#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP) LLVM_DUMP_METHOD
lib/Target/AMDGPU/SIMachineScheduler.h
102	I think I should change the unsigned values inside the GCNRegPressure structure to signed ones, so we could use it as a regpressure difference too.
123	const SmallVectorImpl<RegisterMaskPair> & In general, when passing Small... containers by reference, use SmallContainerImpl<T>&; that is what it is designed for.
124	ditto
219	const SmallVectorImpl<RegisterMaskPair>&
220	ditto
248–254	const SmallVector<RegisterMaskPair, 8> &getInRegs() const { return LiveInRegs; }
249	ditto
393	const SmallVectorImpl<RegisterMaskPair>&
395	ditto
525–526	const SmallVector<RegisterMaskPair, 8> &getInRegs() const
529–530	ditto
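Regarding the comment at line 210, a minimal, hypothetical illustration of walking a vector that grows during the walk: indexing by position, re-reading size() on each iteration and copying the element out before push_back stays valid across reallocations, whereas iterators or a range-for would not:

```cpp
#include <cstddef>
#include <vector>

// Worklist that appends while it is being walked.
std::vector<unsigned> expandWorklist(std::vector<unsigned> Work) {
  for (std::size_t I = 0; I < Work.size(); ++I) {
    unsigned V = Work[I]; // Copy: a reference to Work[I] could dangle after push_back.
    if (V > 1)
      Work.push_back(V / 2);
  }
  return Work;
}
```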

The following tests assert with this patch (and its predecessors) with SISched turned on by default:

FAIL: LLVM :: CodeGen/AMDGPU/coalescer-subrange-crash.ll (4169 of 20226)
******************** TEST 'LLVM :: CodeGen/AMDGPU/coalescer-subrange-crash.ll' FAILED ********************
Script:
--
/srv/vpykhtin/git/debug.llvm/./bin/llc -march=amdgcn -verify-machineinstrs < /srv/vpykhtin/git/llvm/test/CodeGen/AMDGPU/coalescer-subrange-crash.ll | /srv/vpykhtin/git/debug.llvm/./bin/FileCheck /srv/vpykhtin/git/llvm/test/CodeGen/AMDGPU/coalescer-subrange-crash.ll
--
Exit Code: 2

Command Output (stderr):
--
llc: malloc.c:2372: sysmalloc: Assertion `(old_top == (((mbinptr) (((char *) &((av)->bins[((1) - 1) * 2])) - __builtin_offsetof (struct malloc_chunk, fd)))) && old_size == 0) || ((unsigned long) (old_size) >= (unsigned long)((((__builtin_offsetof (struct malloc_chunk, fd_nextsize))+((2 *(sizeof(size_t))) - 1)) & ~((2 *(sizeof(size_t))) - 1))) && ((old_top)->size & 0x1) && ((unsigned long) old_end & pagemask) == 0)' failed.

FileCheck error: '-' is empty.
FileCheck command line:  /srv/vpykhtin/git/debug.llvm/./bin/FileCheck /srv/vpykhtin/git/llvm/test/CodeGen/AMDGPU/coalescer-subrange-crash.ll


FAIL: LLVM :: CodeGen/AMDGPU/undefined-subreg-liverange.ll (5010 of 20226)
******************** TEST 'LLVM :: CodeGen/AMDGPU/undefined-subreg-liverange.ll' FAILED ********************
Script:
--
/srv/vpykhtin/git/debug.llvm/./bin/llc -march=amdgcn -verify-machineinstrs < /srv/vpykhtin/git/llvm/test/CodeGen/AMDGPU/undefined-subreg-liverange.ll | /srv/vpykhtin/git/debug.llvm/./bin/FileCheck /srv/vpykhtin/git/llvm/test/CodeGen/AMDGPU/undefined-subreg-liverange.ll
--
Exit Code: 1

Command Output (stderr):
--
llc: /srv/vpykhtin/git/llvm/lib/Target/AMDGPU/SIMachineScheduler.cpp:1793: void llvm::SIScheduleBlockCreator::removeUseFromDef(llvm::SmallVector<llvm::RegisterMaskPair, 8u>&, unsigned int, const llvm::SUnit*): Assertion `UsePos != Uses.end()' failed.
#0 0x00000000030298dd llvm::sys::PrintStackTrace(llvm::raw_ostream&) /srv/vpykhtin/git/llvm/lib/Support/Unix/Signals.inc:398:0
#1 0x000000000302996e PrintStackTraceSignalHandler(void*) /srv/vpykhtin/git/llvm/lib/Support/Unix/Signals.inc:462:0
#2 0x0000000003027dce llvm::sys::RunSignalHandlers() /srv/vpykhtin/git/llvm/lib/Support/Signals.cpp:43:0
#3 0x0000000003029275 SignalHandler(int) /srv/vpykhtin/git/llvm/lib/Support/Unix/Signals.inc:252:0
#4 0x00002b433235c330 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x10330)
#5 0x00002b4333415c37 gsignal /build/eglibc-MjiXCM/eglibc-2.19/signal/../nptl/sysdeps/unix/sysv/linux/raise.c:56:0
#6 0x00002b4333419028 abort /build/eglibc-MjiXCM/eglibc-2.19/stdlib/abort.c:91:0
#7 0x00002b433340ebf6 __assert_fail_base /build/eglibc-MjiXCM/eglibc-2.19/assert/assert.c:92:0
#8 0x00002b433340eca2 (/lib/x86_64-linux-gnu/libc.so.6+0x2fca2)
#9 0x000000000157376e llvm::SIScheduleBlockCreator::removeUseFromDef(llvm::SmallVector<llvm::RegisterMaskPair, 8u>&, unsigned int, llvm::SUnit const*) /srv/vpykhtin/git/llvm/lib/Target/AMDGPU/SIMachineScheduler.cpp:1795:0
#10 0x00000000015727e3 llvm::SIScheduleBlockCreator::createBlocksForVariant(llvm::SISchedulerBlockCreatorVariant) /srv/vpykhtin/git/llvm/lib/Target/AMDGPU/SIMachineScheduler.cpp:1613:0
#11 0x000000000156ef3e llvm::SIScheduleBlockCreator::getBlocks(llvm::SISchedulerBlockCreatorVariant) /srv/vpykhtin/git/llvm/lib/Target/AMDGPU/SIMachineScheduler.cpp:1043:0
#12 0x0000000001574e69 llvm::SIScheduler::scheduleVariant(llvm::SISchedulerBlockCreatorVariant, llvm::SISchedulerBlockSchedulerVariant) /srv/vpykhtin/git/llvm/lib/Target/AMDGPU/SIMachineScheduler.cpp:2073:0
#13 0x0000000001575eea llvm::SIScheduleDAGMI::schedule() /srv/vpykhtin/git/llvm/lib/Target/AMDGPU/SIMachineScheduler.cpp:2233:0
#14 0x0000000002659da5 (anonymous namespace)::MachineSchedulerBase::scheduleRegions(llvm::ScheduleDAGInstrs&, bool) /srv/vpykhtin/git/llvm/lib/CodeGen/MachineScheduler.cpp:524:0
#15 0x00000000026591ee (anonymous namespace)::MachineScheduler::runOnMachineFunction(llvm::MachineFunction&) /srv/vpykhtin/git/llvm/lib/CodeGen/MachineScheduler.cpp:387:0
#16 0x00000000025ce4f9 llvm::MachineFunctionPass::runOnFunction(llvm::Function&) /srv/vpykhtin/git/llvm/lib/CodeGen/MachineFunctionPass.cpp:62:0
#17 0x000000000298e138 llvm::FPPassManager::runOnFunction(llvm::Function&) /srv/vpykhtin/git/llvm/lib/IR/LegacyPassManager.cpp:1513:0
#18 0x000000000298e2cb llvm::FPPassManager::runOnModule(llvm::Module&) /srv/vpykhtin/git/llvm/lib/IR/LegacyPassManager.cpp:1534:0
#19 0x000000000298e666 (anonymous namespace)::MPPassManager::runOnModule(llvm::Module&) /srv/vpykhtin/git/llvm/lib/IR/LegacyPassManager.cpp:1590:0
#20 0x000000000298edb6 llvm::legacy::PassManagerImpl::run(llvm::Module&) /srv/vpykhtin/git/llvm/lib/IR/LegacyPassManager.cpp:1693:0
#21 0x000000000298eff7 llvm::legacy::PassManager::run(llvm::Module&) /srv/vpykhtin/git/llvm/lib/IR/LegacyPassManager.cpp:1725:0
#22 0x000000000124c627 compileModule(char**, llvm::LLVMContext&) /srv/vpykhtin/git/llvm/tools/llc/llc.cpp:579:0
#23 0x000000000124aca8 main /srv/vpykhtin/git/llvm/tools/llc/llc.cpp:331:0
#24 0x00002b4333400f45 __libc_start_main /build/eglibc-MjiXCM/eglibc-2.19/csu/libc-start.c:321:0
#25 0x0000000001248cb9 _start (/srv/vpykhtin/git/debug.llvm/./bin/llc+0x1248cb9)
Stack dump:
0.      Program arguments: /srv/vpykhtin/git/debug.llvm/./bin/llc -march=amdgcn -verify-machineinstrs
1.      Running pass 'Function Pass Manager' on module '<stdin>'.
2.      Running pass 'Machine Instruction Scheduler' on function '@partially_undef_copy'

FAIL: LLVM :: CodeGen/AMDGPU/unigine-liveness-crash.ll (5126 of 20226)
******************** TEST 'LLVM :: CodeGen/AMDGPU/unigine-liveness-crash.ll' FAILED ********************
Script:
--
/srv/vpykhtin/git/debug.llvm/./bin/llc -march=amdgcn -verify-machineinstrs < /srv/vpykhtin/git/llvm/test/CodeGen/AMDGPU/unigine-liveness-crash.ll | /srv/vpykhtin/git/debug.llvm/./bin/FileCheck /srv/vpykhtin/git/llvm/test/CodeGen/AMDGPU/unigine-liveness-crash.ll
--
Exit Code: 2

Command Output (stderr):
--
#0 0x00000000030298dd llvm::sys::PrintStackTrace(llvm::raw_ostream&) /srv/vpykhtin/git/llvm/lib/Support/Unix/Signals.inc:398:0
#1 0x000000000302996e PrintStackTraceSignalHandler(void*) /srv/vpykhtin/git/llvm/lib/Support/Unix/Signals.inc:462:0
#2 0x0000000003027dce llvm::sys::RunSignalHandlers() /srv/vpykhtin/git/llvm/lib/Support/Signals.cpp:43:0
#3 0x0000000003029275 SignalHandler(int) /srv/vpykhtin/git/llvm/lib/Support/Unix/Signals.inc:252:0
#4 0x00002ac49b609330 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x10330)
#5 0x000000000156a920 llvm::SISchedulerRPTracker::SISchedulerRPTracker(llvm::SmallVector<llvm::RegisterMaskPair, 8u> const&, llvm::SmallVector<llvm::RegisterMaskPair, 8u> const&, std::vector<llvm::SmallVector<unsigned int, 8u>, std::allocator<llvm::SmallVector<unsigned int, 8u> > > const&, std::vector<llvm::SmallVector<unsigned int, 8u>, std::allocator<llvm::SmallVector<unsigned int, 8u> > > const&, std::vector<llvm::SmallVector<llvm::RegisterMaskPair, 8u>, std::allocator<llvm::SmallVector<llvm::RegisterMaskPair, 8u> > > const&, std::vector<llvm::SmallVector<llvm::RegisterMaskPair, 8u>, std::allocator<llvm::SmallVector<llvm::RegisterMaskPair, 8u> > > const&, llvm::MachineRegisterInfo const*, llvm::TargetRegisterInfo const*, unsigned int, unsigned int, bool) /srv/vpykhtin/git/llvm/lib/Target/AMDGPU/SIMachineScheduler.cpp:198:0
#6 0x000000000156dace llvm::SIScheduleBlock::schedule() /srv/vpykhtin/git/llvm/lib/Target/AMDGPU/SIMachineScheduler.cpp:835:0
#7 0x000000000156df10 llvm::SIScheduleBlock::finalize() /srv/vpykhtin/git/llvm/lib/Target/AMDGPU/SIMachineScheduler.cpp:877:0
#8 0x0000000001572b1d llvm::SIScheduleBlockCreator::createBlocksForVariant(llvm::SISchedulerBlockCreatorVariant) /srv/vpykhtin/git/llvm/lib/Target/AMDGPU/SIMachineScheduler.cpp:1647:0
#9 0x000000000156ef3e llvm::SIScheduleBlockCreator::getBlocks(llvm::SISchedulerBlockCreatorVariant) /srv/vpykhtin/git/llvm/lib/Target/AMDGPU/SIMachineScheduler.cpp:1043:0
#10 0x0000000001574e69 llvm::SIScheduler::scheduleVariant(llvm::SISchedulerBlockCreatorVariant, llvm::SISchedulerBlockSchedulerVariant) /srv/vpykhtin/git/llvm/lib/Target/AMDGPU/SIMachineScheduler.cpp:2073:0
#11 0x0000000001575eea llvm::SIScheduleDAGMI::schedule() /srv/vpykhtin/git/llvm/lib/Target/AMDGPU/SIMachineScheduler.cpp:2233:0
#12 0x0000000002659da5 (anonymous namespace)::MachineSchedulerBase::scheduleRegions(llvm::ScheduleDAGInstrs&, bool) /srv/vpykhtin/git/llvm/lib/CodeGen/MachineScheduler.cpp:524:0
#13 0x00000000026591ee (anonymous namespace)::MachineScheduler::runOnMachineFunction(llvm::MachineFunction&) /srv/vpykhtin/git/llvm/lib/CodeGen/MachineScheduler.cpp:387:0
#14 0x00000000025ce4f9 llvm::MachineFunctionPass::runOnFunction(llvm::Function&) /srv/vpykhtin/git/llvm/lib/CodeGen/MachineFunctionPass.cpp:62:0
#15 0x000000000298e138 llvm::FPPassManager::runOnFunction(llvm::Function&) /srv/vpykhtin/git/llvm/lib/IR/LegacyPassManager.cpp:1513:0
#16 0x000000000298e2cb llvm::FPPassManager::runOnModule(llvm::Module&) /srv/vpykhtin/git/llvm/lib/IR/LegacyPassManager.cpp:1534:0
#17 0x000000000298e666 (anonymous namespace)::MPPassManager::runOnModule(llvm::Module&) /srv/vpykhtin/git/llvm/lib/IR/LegacyPassManager.cpp:1590:0
#18 0x000000000298edb6 llvm::legacy::PassManagerImpl::run(llvm::Module&) /srv/vpykhtin/git/llvm/lib/IR/LegacyPassManager.cpp:1693:0
#19 0x000000000298eff7 llvm::legacy::PassManager::run(llvm::Module&) /srv/vpykhtin/git/llvm/lib/IR/LegacyPassManager.cpp:1725:0
#20 0x000000000124c627 compileModule(char**, llvm::LLVMContext&) /srv/vpykhtin/git/llvm/tools/llc/llc.cpp:579:0
#21 0x000000000124aca8 main /srv/vpykhtin/git/llvm/tools/llc/llc.cpp:331:0
#22 0x00002ac49c6adf45 __libc_start_main /build/eglibc-MjiXCM/eglibc-2.19/csu/libc-start.c:321:0
#23 0x0000000001248cb9 _start (/srv/vpykhtin/git/debug.llvm/./bin/llc+0x1248cb9)
Stack dump:
0.      Program arguments: /srv/vpykhtin/git/debug.llvm/./bin/llc -march=amdgcn -verify-machineinstrs
1.      Running pass 'Function Pass Manager' on module '<stdin>'.
2.      Running pass 'Machine Instruction Scheduler' on function '@main'

For the first assert, it's a mistake in my code. Instead of asserting, I should return without doing anything.

I don't understand the second error. Nothing seems wrong at this line,
and when I run llc on that .ll file I don't get an error.
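A minimal sketch of the fix described for the first assert. It uses standard C++ stand-ins, not the patch's code: RegMaskPair and removeUse are hypothetical names, and the real removeUseFromDef also takes the defining SUnit. It only shows the control flow: when the use is not recorded (the assert fired in @partially_undef_copy), return instead of asserting.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for llvm::RegisterMaskPair (register + lane mask).
struct RegMaskPair {
  unsigned Reg;
  uint64_t LaneMask;
  bool operator==(const RegMaskPair &O) const {
    return Reg == O.Reg && LaneMask == O.LaneMask;
  }
};

// Remove one recorded use of Pair from Uses.
// If the use is not recorded, do nothing instead of asserting.
// Returns true if an element was removed.
bool removeUse(std::vector<RegMaskPair> &Uses, const RegMaskPair &Pair) {
  auto UsePos = std::find(Uses.begin(), Uses.end(), Pair);
  if (UsePos == Uses.end())
    return false; // previously: assert(UsePos != Uses.end())
  Uses.erase(UsePos);
  return true;
}
```

Returning a bool is an extra convenience of this sketch; the caller can ignore it.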

In D31124#720304, @vpykhtin wrote:

The following tests assert with this (and predecessors) patches with SISched turned on by default:

FAIL: LLVM :: CodeGen/AMDGPU/coalescer-subrange-crash.ll (4169 of 20226)
FAIL: LLVM :: CodeGen/AMDGPU/undefined-subreg-liverange.ll (5010 of 20226)
FAIL: LLVM :: CodeGen/AMDGPU/unigine-liveness-crash.ll (5126 of 20226)

Fixed the assert in removeUseFromDef.

Added some const qualifiers and replaced SmallVector<RegisterMaskPair, 8> with SmallVectorImpl<RegisterMaskPair>.

axeldavy added inline comments. Apr 6 2017, 2:45 PM

lib/Target/AMDGPU/SIMachineScheduler.cpp
210	What do you suggest to fix this problem?
376	Why is LLVM_DUMP_METHOD on an empty line?
lib/Target/AMDGPU/SIMachineScheduler.h
219	Right, I missed that for this update. I will fix it in the next one.
393	This cannot be const here because we remove elements from Uses.

Thanks Axel!

I should have mentioned that I enable SISched by commenting out these lines:

ScheduleDAGInstrs *GCNPassConfig::createMachineScheduler(
  MachineSchedContext *C) const {
  const SISubtarget &ST = C->MF->getSubtarget<SISubtarget>();
  //if (ST.enableSIScheduler())
    return createSIMachineScheduler(C);
  //return createGCNMaxOccupancyMachineScheduler(C);
}

This way, ShouldTrackLaneMasks ended up set to true.

Sure. In this patch, ShouldTrackLaneMasks is always set to true.

In D31124#720730, @vpykhtin wrote:
Thanks Axel!

I had to mention that I'm enabling SISched by commenting out
ScheduleDAGInstrs *GCNPassConfig::createMachineScheduler(
  MachineSchedContext *C) const {
  const SISubtarget &ST = C->MF->getSubtarget<SISubtarget>();
  //if (ST.enableSIScheduler())
    return createSIMachineScheduler(C);
  //return createGCNMaxOccupancyMachineScheduler(C);
}
This way I got ShouldTrackLaneMasks = true

The second assert may be caused by iterator invalidation, though I haven't checked.

lib/Target/AMDGPU/SIMachineScheduler.cpp
210	The easy way is to reserve enough elements to avoid reallocation, but this is not reliable: code correctness would then depend on reserve(), which is normally treated as an optional optimization. You may consider using std::list for this.
376	I meant something like this:
#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
LLVM_DUMP_METHOD void SISchedulerRPTracker::printDebugLives()
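The concern behind the comment on line 210 can be illustrated with standard containers. This is a generic sketch, not the code at that line: std::list never relocates existing nodes when elements are appended, so an iterator taken earlier stays valid without relying on reserve(). Holding an index into a std::vector, shown here as an alternative that was not proposed in the thread, also stays correct across reallocation. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

// Keep an iterator to the first element of a std::list while appending N
// values. List insertion never invalidates iterators to existing elements,
// so no reserve() call is needed for correctness.
int firstAfterGrowthList(int N) {
  std::list<int> L{42};
  auto First = L.begin();
  for (int I = 0; I < N; ++I)
    L.push_back(I);
  return *First; // still valid
}

// With std::vector, an iterator taken before push_back may be invalidated by
// reallocation. An index does not depend on where the storage lives.
int firstAfterGrowthVector(int N) {
  std::vector<int> V{42};
  std::size_t FirstIdx = 0; // survives reallocation; an iterator might not
  for (int I = 0; I < N; ++I)
    V.push_back(I);
  return V[FirstIdx];
}
```

std::list trades contiguous storage and cache locality for this stability; the index approach keeps the vector but requires the code to avoid holding iterators or references across insertions.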

I think I have addressed all remaining comments; please tell me if I missed something.

axeldavy edited the summary of this revision. Jun 25 2017, 3:27 PM

I had not tested correctly after introducing getPressureOfReg.
The calls to RPTracker->getPressureOfReg were wrong, because the RPTracker was not yet initialized when they were made.

I fixed the problem by defining the function twice (each version using the data available in its own class).
It is a small function, introduced to remove redundant code as you suggested.
If you have a better suggestion, I'll take it.

Is this still needed?

Herald added a project: Restricted Project. Feb 21 2019, 7:10 PM

Herald added subscribers: jdoerfert, javed.absar, jvesely.

In D31124#1406796, @arsenm wrote:

Is this still needed?

To my knowledge, yes. Do you have any insight into why it would no longer be needed? Have there been changes related to lane tracking?
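For context, the lane tracking in this revision rests on the decomposition described in the summary: the lane masks used for a register are split into a basis of pairwise-disjoint masks (LaneMaskBasisForReg in the header diff), so each (register, basis mask) pair can be treated as an independent register that is only defined when not already live. The partition-refinement sketch below is one way to build such a basis. It is an assumption, not the patch's implementation, and uses a plain uint64_t in place of llvm::LaneBitmask; buildLaneBasis and decompose are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

using LaneMask = uint64_t; // hypothetical stand-in for llvm::LaneBitmask

// Build pairwise-disjoint, non-empty masks such that every mask in Used is
// exactly a union of basis elements (partition refinement).
std::vector<LaneMask> buildLaneBasis(const std::vector<LaneMask> &Used) {
  std::vector<LaneMask> Basis;
  for (LaneMask M : Used) {
    std::vector<LaneMask> Refined;
    LaneMask Covered = 0;
    for (LaneMask B : Basis) {
      // Split each existing element into its part inside M and outside M.
      if (B & M)
        Refined.push_back(B & M);
      if (B & ~M)
        Refined.push_back(B & ~M);
      Covered |= B;
    }
    // Lanes of M not covered by any earlier element form a new element.
    if (M & ~Covered)
      Refined.push_back(M & ~Covered);
    Basis = std::move(Refined);
  }
  return Basis;
}

// Decompose M into the basis elements it overlaps. Assumes M is one of the
// masks the basis was built from (or a union of basis elements).
std::vector<LaneMask> decompose(const std::vector<LaneMask> &Basis,
                                LaneMask M) {
  std::vector<LaneMask> Parts;
  for (LaneMask B : Basis)
    if (B & M)
      Parts.push_back(B);
  return Parts;
}
```

With Used = {0xF, 0x3}, the basis is {0x3, 0xC}: the full use decomposes into both elements and the partial use into 0x3 alone. A register only ever used with a single mask keeps a one-element basis, matching the summary's "all lanes" case.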

Revision Contents

Path	Size
lib/Target/AMDGPU/AMDGPUSubtarget.cpp	4 lines
lib/Target/AMDGPU/SIMachineScheduler.h	221 lines
lib/Target/AMDGPU/SIMachineScheduler.cpp	1403 lines

Diff 103878

lib/Target/AMDGPU/AMDGPUSubtarget.cpp

Show First 20 Lines • Show All 329 Lines • ▼ Show 20 Lines	void SISubtarget::overrideSchedPolicy(MachineSchedPolicy &Policy,
// SIRegisterInfo::getRegPressureSetLimit()		// SIRegisterInfo::getRegPressureSetLimit()
Policy.ShouldTrackPressure = true;		Policy.ShouldTrackPressure = true;

// Enabling both top down and bottom up scheduling seems to give us less		// Enabling both top down and bottom up scheduling seems to give us less
// register spills than just using one of these approaches on its own.		// register spills than just using one of these approaches on its own.
Policy.OnlyTopDown = false;		Policy.OnlyTopDown = false;
Policy.OnlyBottomUp = false;		Policy.OnlyBottomUp = false;

// Enabling ShouldTrackLaneMasks crashes the SI Machine Scheduler.
if (!enableSIScheduler())
Policy.ShouldTrackLaneMasks = true;		Policy.ShouldTrackLaneMasks = true;
}		}

bool SISubtarget::isVGPRSpillingEnabled(const Function& F) const {		bool SISubtarget::isVGPRSpillingEnabled(const Function& F) const {
return EnableVGPRSpilling || !AMDGPU::isShader(F.getCallingConv());		return EnableVGPRSpilling || !AMDGPU::isShader(F.getCallingConv());
}		}

unsigned SISubtarget::getKernArgSegmentSize(const MachineFunction &MF,		unsigned SISubtarget::getKernArgSegmentSize(const MachineFunction &MF,
unsigned ExplicitArgBytes) const {		unsigned ExplicitArgBytes) const {
▲ Show 20 Lines • Show All 150 Lines • Show Last 20 Lines

lib/Target/AMDGPU/SIMachineScheduler.h

Show All 10 Lines
/// \brief SI Machine Scheduler interface		/// \brief SI Machine Scheduler interface
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#ifndef LLVM_LIB_TARGET_AMDGPU_SIMACHINESCHEDULER_H		#ifndef LLVM_LIB_TARGET_AMDGPU_SIMACHINESCHEDULER_H
#define LLVM_LIB_TARGET_AMDGPU_SIMACHINESCHEDULER_H		#define LLVM_LIB_TARGET_AMDGPU_SIMACHINESCHEDULER_H

#include "SIInstrInfo.h"		#include "SIInstrInfo.h"
		#include "llvm/ADT/DenseMap.h"
#include "llvm/CodeGen/MachineBasicBlock.h"		#include "llvm/CodeGen/MachineBasicBlock.h"
#include "llvm/CodeGen/MachineScheduler.h"		#include "llvm/CodeGen/MachineScheduler.h"
#include "llvm/CodeGen/RegisterPressure.h"		#include "llvm/CodeGen/RegisterPressure.h"
#include "llvm/CodeGen/ScheduleDAG.h"		#include "llvm/CodeGen/ScheduleDAG.h"
		#include "llvm/MC/LaneBitmask.h"
#include <cassert>		#include <cassert>
#include <cstdint>		#include <cstdint>
#include <map>		#include <map>
#include <memory>		#include <memory>
#include <set>		#include <set>
#include <vector>		#include <vector>

namespace llvm {		namespace llvm {

		class SISchedulerRPTracker {
		const MachineRegisterInfo *MRI;
		const TargetRegisterInfo *TRI;
		unsigned VGPRSetID, SGPRSetID;

		// Register -> Base of LaneBitmask to describe all possible
		// LaneBitmask used by Items Ins and Outs.
		// Except specified otherwise, RegisterMaskPairs in this class
		// are always having the LaneMask be one of the element of
		// LaneMaskBasisForReg[Reg]
		std::map<unsigned, SmallVector<LaneBitmask, 8>> LaneMaskBasisForReg;

		// Static information about the Items:

		// Index of Item Successors or Predecessors.
		std::vector<SmallVector<unsigned, 8>> ItemSuccs;
		std::vector<SmallVector<unsigned, 8>> ItemPreds;

		std::vector<unsigned> TopoIndexToItem;
		std::vector<unsigned> TopoItemToIndex;

		// Blocks's getInRegs, but with RegisterMaskPair with Mask
		// in LaneMaskBasisForReg[Reg] if exists.
		std::vector<SmallVector<RegisterMaskPair, 8>> InRegsForItem;
		std::vector<SmallVector<RegisterMaskPair, 8>> OutRegsForItem;

		// Item num -> Number of usages for each Item output.
		std::vector<std::map<RegisterMaskPair, unsigned>> OutRegsNumUsages;

		// Variable information during the schedule:

		// Items ready for schedule
		SmallVector<unsigned, 16> ReadyItems;

		// Current live regs, and the valid LaneBitmask
		std::map<unsigned, LaneBitmask> LiveRegs;

		// Number of schedulable unscheduled blocks reading the register.
		std::map<RegisterMaskPair, unsigned> RemainingRegsConsumers;

		std::vector<unsigned> ItemNumPredsLeft;

		unsigned CurrentVGPRUsage, CurrentSGPRUsage;

		public:
		// Here the RegisterMaskPair can have arbitrary LaneMask (not
		// elements of LaneMaskBasisForReg, which is not yet built).
		SISchedulerRPTracker(
		const SmallVectorImpl<RegisterMaskPair> &LiveIns,
		const SmallVectorImpl<RegisterMaskPair> &LiveOuts,
		const std::vector<SmallVector<unsigned, 8>> &ItemSuccs,
		const std::vector<SmallVector<unsigned, 8>> &ItemPreds,
		const std::vector<SmallVector<RegisterMaskPair, 8>> &InRegsForItem_,
		const std::vector<SmallVector<RegisterMaskPair, 8>> &OutRegsForItem_,
		const MachineRegisterInfo *MRI,
		const TargetRegisterInfo *TRI,
		unsigned VGPRSetID,
		unsigned SGPRSetID
		);

		void itemScheduled(unsigned ID);

		const SmallVector<unsigned, 16> &getReadyItems() {
		return ReadyItems;
		}

		void getCurrentRegUsage(unsigned &VGPR, unsigned &SGPR);
		// Check register pressure change
		// by scheduling a item
		vpykhtin: I think I should change unsigned to signed values inside the GCNRegPressure structure so we could use it as a register pressure difference too.
		void checkRegUsageImpact(unsigned ID, int &DiffVGPR, int &DiffSGPR);
		// Give the pressure of a register
		void getPressureOfReg(unsigned Reg, unsigned &PressureVGPR,
		unsigned &PressureSGPR);

		void printDebugLives();

		private:
		// Convert Reg/Mask to a list of Reg/Mask, with Mask in
		// LaneMaskBasisForReg.
		SmallVector<RegisterMaskPair, 8> getPairsForReg(unsigned Reg,
		LaneBitmask Mask);
		// ToAppend: where to append the result.
		void getPairsForReg(SmallVector<RegisterMaskPair, 8> &ToAppend,
		unsigned Reg, LaneBitmask Mask);
		// Idem for a list of Reg/Mask
		SmallVector<RegisterMaskPair, 8> getPairsForRegs(
		const SmallVectorImpl<RegisterMaskPair> &Regs);

		void fillTopoData();

		vpykhtin: const SmallVectorImpl<RegisterMaskPair> &. In general, when passing Small... containers by reference, use SmallContainerImpl<T>&; they are designed to be used that way.
		void addLiveRegs(const SmallVectorImpl<RegisterMaskPair> &Regs);
		vpykhtin: ditto
		void decreaseLiveRegs(const SmallVectorImpl<RegisterMaskPair> &Regs);
		void releaseItemSuccs(unsigned ID);
		};

enum SIScheduleCandReason {		enum SIScheduleCandReason {
NoCand,		NoCand,
RegUsage,		RegUsage,
Latency,		Latency,
Successor,		Successor,
Depth,		Depth,
NodeOrder		NodeOrder
};		};
Show All 18 Lines	enum SIScheduleBlockLinkKind {
NoData,		NoData,
Data		Data
};		};

class SIScheduleBlock {		class SIScheduleBlock {
SIScheduleDAGMI *DAG;		SIScheduleDAGMI *DAG;
SIScheduleBlockCreator *BC;		SIScheduleBlockCreator *BC;

		std::unique_ptr<SISchedulerRPTracker> RPTracker;

std::vector<SUnit*> SUnits;		std::vector<SUnit*> SUnits;
std::map<unsigned, unsigned> NodeNum2Index;		std::map<unsigned, unsigned> NodeNum2Index;
std::vector<SUnit*> TopReadySUs;
std::vector<SUnit*> ScheduledSUnits;		std::vector<SUnit*> ScheduledSUnits;

/// The top of the unscheduled zone.
IntervalPressure TopPressure;
RegPressureTracker TopRPTracker;

// Pressure: number of said class of registers needed to		// Pressure: number of said class of registers needed to
// store the live virtual and real registers.		// store the live virtual and real registers.
// We do care only of SGPR32 and VGPR32 and do track only virtual registers.		// We do care only of SGPR32 and VGPR32 and do track only virtual registers.
// Pressure of additional registers required inside the block.		// Pressure of additional registers required inside the block.
std::vector<unsigned> InternalAdditionnalPressure;		std::vector<unsigned> InternalAdditionnalPressure;
// Pressure of input and output registers		// Pressure of input and output registers
std::vector<unsigned> LiveInPressure;		unsigned LiveInVGPRPressure;
std::vector<unsigned> LiveOutPressure;		unsigned LiveInSGPRPressure;
		unsigned LiveOutVGPRPressure;
		unsigned LiveOutSGPRPressure;
// Registers required by the block, and outputs.		// Registers required by the block, and outputs.
// We do track only virtual registers.		// We do track only virtual registers.
// Note that some registers are not 32 bits,		// Note that some registers are not 32 bits,
// and thus the pressure is not equal		// and thus the pressure is not equal
// to the number of live registers.		// to the number of live registers.
std::set<unsigned> LiveInRegs;		SmallVector<RegisterMaskPair, 8> LiveInRegs;
std::set<unsigned> LiveOutRegs;		SmallVector<RegisterMaskPair, 8> LiveOutRegs;

bool Scheduled = false;
bool HighLatencyBlock = false;		bool HighLatencyBlock = false;

std::vector<unsigned> HasLowLatencyNonWaitedParent;		std::vector<unsigned> HasLowLatencyNonWaitedParent;

// Unique ID, the index of the Block in the SIScheduleDAGMI Blocks table.		// Unique ID, the index of the Block in the SIScheduleDAGMI Blocks table.
unsigned ID;		unsigned ID;

std::vector<SIScheduleBlock*> Preds; // All blocks predecessors.		std::vector<SIScheduleBlock*> Preds; // All blocks predecessors.
// All blocks successors, and the kind of link		// All blocks successors, and the kind of link
std::vector<std::pair<SIScheduleBlock*, SIScheduleBlockLinkKind>> Succs;		std::vector<std::pair<SIScheduleBlock*, SIScheduleBlockLinkKind>> Succs;
unsigned NumHighLatencySuccessors = 0;		unsigned NumHighLatencySuccessors = 0;

public:		public:
SIScheduleBlock(SIScheduleDAGMI *DAG, SIScheduleBlockCreator *BC,		SIScheduleBlock(SIScheduleDAGMI *DAG, SIScheduleBlockCreator *BC,
unsigned ID):		unsigned ID):
DAG(DAG), BC(BC), TopRPTracker(TopPressure), ID(ID) {}		DAG(DAG), BC(BC), LiveInVGPRPressure(0), LiveInSGPRPressure(0),
		LiveOutVGPRPressure(0), LiveOutSGPRPressure(0), LiveInRegs(),
		LiveOutRegs(), ID(ID) {}

~SIScheduleBlock() = default;		~SIScheduleBlock() = default;

unsigned getID() const { return ID; }		unsigned getID() const { return ID; }

/// Functions for Block construction.		/// Functions for Block construction.
void addUnit(SUnit *SU);		void addUnit(SUnit *SU);

// When all SUs have been added.		// When all SUs have been added, and liveIns/Outs computed.
void finalizeUnits();		void finalize();

// Add block pred, which has instruction predecessor of SU.		// Add block pred, which has instruction predecessor of SU.
void addPred(SIScheduleBlock *Pred);		void addPred(SIScheduleBlock *Pred);
void addSucc(SIScheduleBlock *Succ, SIScheduleBlockLinkKind Kind);		void addSucc(SIScheduleBlock *Succ, SIScheduleBlockLinkKind Kind);
		void addLiveIns(const SmallVector<RegisterMaskPair, 8> &Ins);
		vpykhtin: const SmallVectorImpl<RegisterMaskPair>&
		axeldavy: Right, I missed that for this update. I will fix it in the next one.
		void addLiveOuts(const SmallVector<RegisterMaskPair, 8> &Outs);
		vpykhtin: ditto

const std::vector<SIScheduleBlock*>& getPreds() const { return Preds; }		const std::vector<SIScheduleBlock*>& getPreds() const { return Preds; }
ArrayRef<std::pair<SIScheduleBlock*, SIScheduleBlockLinkKind>>		ArrayRef<std::pair<SIScheduleBlock*, SIScheduleBlockLinkKind>>
getSuccs() const { return Succs; }		getSuccs() const { return Succs; }

unsigned Height; // Maximum topdown path length to block without outputs		unsigned Height; // Maximum topdown path length to block without outputs
unsigned Depth; // Maximum bottomup path length to block without inputs		unsigned Depth; // Maximum bottomup path length to block without inputs

unsigned getNumHighLatencySuccessors() const {		unsigned getNumHighLatencySuccessors() const {
return NumHighLatencySuccessors;		return NumHighLatencySuccessors;
}		}

bool isHighLatencyBlock() { return HighLatencyBlock; }		bool isHighLatencyBlock() { return HighLatencyBlock; }

// This is approximative.		// This is approximative.
// Ideally should take into accounts some instructions (rcp, etc)		// Ideally should take into accounts some instructions (rcp, etc)
// are 4 times slower.		// are 4 times slower.
int getCost() { return SUnits.size(); }		int getCost() { return SUnits.size(); }

// The block Predecessors and Successors must be all registered
// before fastSchedule().
// Fast schedule with no particular requirement.
void fastSchedule();

std::vector<SUnit*> getScheduledUnits() { return ScheduledSUnits; }		std::vector<SUnit*> getScheduledUnits() { return ScheduledSUnits; }

// Complete schedule that will try to minimize reg pressure and
// low latencies, and will fill liveins and liveouts.
// Needs all MIs to be grouped between BeginBlock and EndBlock.
// The MIs can be moved after the scheduling,
// it is just used to allow correct track of live registers.
void schedule(MachineBasicBlock::iterator BeginBlock,
MachineBasicBlock::iterator EndBlock);

bool isScheduled() { return Scheduled; }

// Needs the block to be scheduled inside		// Needs the block to be scheduled inside
// TODO: find a way to compute it.		// TODO: find a way to compute it.
std::vector<unsigned> &getInternalAdditionnalRegUsage() {		std::vector<unsigned> &getInternalAdditionnalRegUsage() {
return InternalAdditionnalPressure;		return InternalAdditionnalPressure;
}		}

std::set<unsigned> &getInRegs() { return LiveInRegs; }		const SmallVector<RegisterMaskPair, 8> &getInRegs() const {
std::set<unsigned> &getOutRegs() { return LiveOutRegs; }		return LiveInRegs;
		vpykhtin: ditto
		}

		const SmallVector<RegisterMaskPair, 8> &getOutRegs() const {
		return LiveOutRegs;
		}
		vpykhtin: const SmallVector<RegisterMaskPair, 8> &getInRegs() const { return LiveInRegs; }

void printDebug(bool Full);		void printDebug(bool Full);

private:		private:
struct SISchedCandidate : SISchedulerCandidate {		struct SISchedCandidate : SISchedulerCandidate {
// The best SUnit candidate.		// The best SUnit candidate.
SUnit *SU = nullptr;		SUnit *SU = nullptr;

(15 lines not shown)	void setBest(SISchedCandidate &Best) {
SGPRUsage = Best.SGPRUsage;		SGPRUsage = Best.SGPRUsage;
VGPRUsage = Best.VGPRUsage;		VGPRUsage = Best.VGPRUsage;
IsLowLatency = Best.IsLowLatency;		IsLowLatency = Best.IsLowLatency;
LowLatencyOffset = Best.LowLatencyOffset;		LowLatencyOffset = Best.LowLatencyOffset;
HasLowLatencyNonWaitedParent = Best.HasLowLatencyNonWaitedParent;		HasLowLatencyNonWaitedParent = Best.HasLowLatencyNonWaitedParent;
}		}
};		};

void undoSchedule();

void undoReleaseSucc(SUnit *SU, SDep *SuccEdge);
void releaseSucc(SUnit *SU, SDep *SuccEdge);
// InOrOutBlock: restrict to links pointing inside the block (true),
// or restrict to links pointing outside the block (false).
void releaseSuccessors(SUnit *SU, bool InOrOutBlock);

void nodeScheduled(SUnit *SU);		void nodeScheduled(SUnit *SU);
void tryCandidateTopDown(SISchedCandidate &Cand, SISchedCandidate &TryCand);		void tryCandidateTopDown(SISchedCandidate &Cand, SISchedCandidate &TryCand);
void tryCandidateBottomUp(SISchedCandidate &Cand, SISchedCandidate &TryCand);		void tryCandidateBottomUp(SISchedCandidate &Cand, SISchedCandidate &TryCand);
SUnit* pickNode();		SUnit* pickNode();
void traceCandidate(const SISchedCandidate &Cand);		void traceCandidate(const SISchedCandidate &Cand);
void initRegPressure(MachineBasicBlock::iterator BeginBlock,		void schedule();
MachineBasicBlock::iterator EndBlock);
};		};

struct SIScheduleBlocks {		struct SIScheduleBlocks {
std::vector<SIScheduleBlock*> Blocks;		std::vector<SIScheduleBlock*> Blocks;
std::vector<int> TopDownIndex2Block;		std::vector<int> TopDownIndex2Block;
std::vector<int> TopDownBlock2Index;		std::vector<int> TopDownBlock2Index;
};		};

(81 lines not shown)	private:
// Put in one group all instructions with no users in this scheduling region		// Put in one group all instructions with no users in this scheduling region
// (we'd want these groups be at the end).		// (we'd want these groups be at the end).
void regroupNoUserInstructions();		void regroupNoUserInstructions();

void createBlocksForVariant(SISchedulerBlockCreatorVariant BlockVariant);		void createBlocksForVariant(SISchedulerBlockCreatorVariant BlockVariant);

void topologicalSort();		void topologicalSort();

void scheduleInsideBlocks();

void fillStats();		void fillStats();

		LaneBitmask getLaneBitmaskForDef(const SUnit *SU, unsigned Reg);
		LaneBitmask getLaneBitmaskForUse(const SUnit *SU, unsigned Reg);
		void removeUseFromDef(SmallVectorImpl<RegisterMaskPair> &Uses,
		vpykhtin: const SmallVectorImpl<RegisterMaskPair>&
		axeldavy: It cannot be const here because we remove elements from Uses.
		unsigned Reg, const SUnit *SU);
		void addDefFromUse(SmallVectorImpl<RegisterMaskPair> &Defs,
		vpykhtin: ditto
		unsigned Reg, const SUnit *SUDef, const SUnit *SUUse);
};		};

enum SISchedulerBlockSchedulerVariant {		enum SISchedulerBlockSchedulerVariant {
BlockLatencyRegUsage,		BlockLatencyRegUsage,
BlockRegUsageLatency,		BlockRegUsageLatency,
BlockRegUsage		BlockRegUsage
};		};

class SIScheduleBlockScheduler {		class SIScheduleBlockScheduler {
SIScheduleDAGMI *DAG;		SIScheduleDAGMI *DAG;
SISchedulerBlockSchedulerVariant Variant;		SISchedulerBlockSchedulerVariant Variant;
std::vector<SIScheduleBlock*> Blocks;		std::vector<SIScheduleBlock*> Blocks;

std::vector<std::map<unsigned, unsigned>> LiveOutRegsNumUsages;		std::unique_ptr<SISchedulerRPTracker> RPTracker;
std::set<unsigned> LiveRegs;
// Num of schedulable unscheduled blocks reading the register.
std::map<unsigned, unsigned> LiveRegsConsumers;

std::vector<unsigned> LastPosHighLatencyParentScheduled;		std::vector<unsigned> LastPosHighLatencyParentScheduled;
int LastPosWaitedHighLatency;		int LastPosWaitedHighLatency;

std::vector<SIScheduleBlock*> BlocksScheduled;		std::vector<SIScheduleBlock*> BlocksScheduled;
unsigned NumBlockScheduled;		unsigned NumBlockScheduled;
std::vector<SIScheduleBlock*> ReadyBlocks;

unsigned VregCurrentUsage;
unsigned SregCurrentUsage;

// Currently is only approximation.		// Currently is only approximation.
unsigned maxVregUsage;		unsigned maxVregUsage;
unsigned maxSregUsage;		unsigned maxSregUsage;

std::vector<unsigned> BlockNumPredsLeft;
std::vector<unsigned> BlockNumSuccsLeft;

public:		public:
SIScheduleBlockScheduler(SIScheduleDAGMI *DAG,		SIScheduleBlockScheduler(SIScheduleDAGMI *DAG,
SISchedulerBlockSchedulerVariant Variant,		SISchedulerBlockSchedulerVariant Variant,
SIScheduleBlocks BlocksStruct);		SIScheduleBlocks BlocksStruct);
~SIScheduleBlockScheduler() = default;		~SIScheduleBlockScheduler() = default;

std::vector<SIScheduleBlock*> getBlocks() { return BlocksScheduled; }		std::vector<SIScheduleBlock*> getBlocks() { return BlocksScheduled; }

(31 lines not shown)	private:
};		};

bool tryCandidateLatency(SIBlockSchedCandidate &Cand,		bool tryCandidateLatency(SIBlockSchedCandidate &Cand,
SIBlockSchedCandidate &TryCand);		SIBlockSchedCandidate &TryCand);
bool tryCandidateRegUsage(SIBlockSchedCandidate &Cand,		bool tryCandidateRegUsage(SIBlockSchedCandidate &Cand,
SIBlockSchedCandidate &TryCand);		SIBlockSchedCandidate &TryCand);
SIScheduleBlock *pickBlock();		SIScheduleBlock *pickBlock();

void addLiveRegs(std::set<unsigned> &Regs);
void decreaseLiveRegs(SIScheduleBlock *Block, std::set<unsigned> &Regs);
void releaseBlockSuccs(SIScheduleBlock *Parent);
void blockScheduled(SIScheduleBlock *Block);		void blockScheduled(SIScheduleBlock *Block);

// Check register pressure change
// by scheduling a block with these LiveIn and LiveOut.
std::vector<int> checkRegUsageImpact(std::set<unsigned> &InRegs,
std::set<unsigned> &OutRegs);

void schedule();		void schedule();
};		};

struct SIScheduleBlockResult {		struct SIScheduleBlockResult {
std::vector<unsigned> SUs;		std::vector<unsigned> SUs;
unsigned MaxSGPRUsage;		unsigned MaxSGPRUsage;
unsigned MaxVGPRUsage;		unsigned MaxVGPRUsage;
};		};
(11 lines not shown)	public:
scheduleVariant(SISchedulerBlockCreatorVariant BlockVariant,		scheduleVariant(SISchedulerBlockCreatorVariant BlockVariant,
SISchedulerBlockSchedulerVariant ScheduleVariant);		SISchedulerBlockSchedulerVariant ScheduleVariant);
};		};

class SIScheduleDAGMI final : public ScheduleDAGMILive {		class SIScheduleDAGMI final : public ScheduleDAGMILive {
const SIInstrInfo *SITII;		const SIInstrInfo *SITII;
const SIRegisterInfo *SITRI;		const SIRegisterInfo *SITRI;

std::vector<SUnit> SUnitsLinksBackup;		std::vector<SUnit> SUnitsLinksBackup;
		vpykhtin: You can make it const and return a reference, like const SmallVector<RegisterMaskPair, 8> &getInRegs() const;
		axeldavy: Right. I had in mind that we would add the correct computation of the block liveins here (as for GCNScheduler), but for now we can return a reference.

// For moveLowLatencies. After all Scheduling variants are tested.		// For moveLowLatencies. After all Scheduling variants are tested.
std::vector<unsigned> ScheduledSUnits;		std::vector<unsigned> ScheduledSUnits;
std::vector<unsigned> ScheduledSUnitsInv;		std::vector<unsigned> ScheduledSUnitsInv;
		vpykhtin: Ditto

unsigned VGPRSetID;		unsigned VGPRSetID;
unsigned SGPRSetID;		unsigned SGPRSetID;

public:		public:
SIScheduleDAGMI(MachineSchedContext *C);		SIScheduleDAGMI(MachineSchedContext *C);

~SIScheduleDAGMI() override;		~SIScheduleDAGMI() override;

// Entry point for the schedule.		// Entry point for the schedule.
void schedule() override;		void schedule() override;

// To init Block's RPTracker.
void initRPTracker(RegPressureTracker &RPTracker) {
RPTracker.init(&MF, RegClassInfo, LIS, BB, RegionBegin, false, false);
}

MachineBasicBlock *getBB() { return BB; }		MachineBasicBlock *getBB() { return BB; }
MachineBasicBlock::iterator getCurrentTop() { return CurrentTop; }		MachineBasicBlock::iterator getCurrentTop() { return CurrentTop; }
MachineBasicBlock::iterator getCurrentBottom() { return CurrentBottom; }		MachineBasicBlock::iterator getCurrentBottom() { return CurrentBottom; }
LiveIntervals *getLIS() { return LIS; }		LiveIntervals *getLIS() { return LIS; }
MachineRegisterInfo *getMRI() { return &MRI; }		MachineRegisterInfo *getMRI() { return &MRI; }
const TargetRegisterInfo *getTRI() { return TRI; }		const TargetRegisterInfo *getTRI() { return TRI; }
ScheduleDAGTopologicalSort *GetTopo() { return &Topo; }		ScheduleDAGTopologicalSort *getTopo() { return &Topo; }
SUnit& getEntrySU() { return EntrySU; }		SUnit& getEntrySU() { return EntrySU; }
SUnit& getExitSU() { return ExitSU; }		SUnit& getExitSU() { return ExitSU; }

void restoreSULinksLeft();		const SmallVector<RegisterMaskPair, 8> &getInRegs() const {
		return RPTracker.getPressure().LiveInRegs;
		vpykhtin: const SmallVector<RegisterMaskPair, 8> &getInRegs() const
template<typename _Iterator> void fillVgprSgprCost(_Iterator First,
_Iterator End,
unsigned &VgprUsage,
unsigned &SgprUsage);

std::set<unsigned> getInRegs() {
std::set<unsigned> InRegs;
for (const auto &RegMaskPair : RPTracker.getPressure().LiveInRegs) {
InRegs.insert(RegMaskPair.RegUnit);
}
return InRegs;
}		}

std::set<unsigned> getOutRegs() {		const SmallVector<RegisterMaskPair, 8> &getOutRegs() const {
std::set<unsigned> OutRegs;		return RPTracker.getPressure().LiveOutRegs;
		vpykhtin: ditto
for (const auto &RegMaskPair : RPTracker.getPressure().LiveOutRegs) {
OutRegs.insert(RegMaskPair.RegUnit);
}
return OutRegs;
};		};

unsigned getVGPRSetID() const { return VGPRSetID; }		unsigned getVGPRSetID() const { return VGPRSetID; }
unsigned getSGPRSetID() const { return SGPRSetID; }		unsigned getSGPRSetID() const { return SGPRSetID; }

		bool shouldTrackLaneMasks() const { return ShouldTrackLaneMasks; }

private:		private:
void topologicalSort();		void topologicalSort();
// After scheduling is done, improve low latency placements.		// After scheduling is done, improve low latency placements.
void moveLowLatencies();		void moveLowLatencies();

public:		public:
// Some stats for scheduling inside blocks.		// Some stats for scheduling inside blocks.
std::vector<unsigned> IsLowLatencySU;		std::vector<unsigned> IsLowLatencySU;
(11 lines not shown)

lib/Target/AMDGPU/SIMachineScheduler.cpp

(19 lines not shown)
#include "llvm/ADT/SmallVector.h"		#include "llvm/ADT/SmallVector.h"
#include "llvm/CodeGen/LiveInterval.h"		#include "llvm/CodeGen/LiveInterval.h"
#include "llvm/CodeGen/LiveIntervalAnalysis.h"		#include "llvm/CodeGen/LiveIntervalAnalysis.h"
#include "llvm/CodeGen/MachineInstr.h"		#include "llvm/CodeGen/MachineInstr.h"
#include "llvm/CodeGen/MachineRegisterInfo.h"		#include "llvm/CodeGen/MachineRegisterInfo.h"
#include "llvm/CodeGen/MachineScheduler.h"		#include "llvm/CodeGen/MachineScheduler.h"
#include "llvm/CodeGen/RegisterPressure.h"		#include "llvm/CodeGen/RegisterPressure.h"
#include "llvm/CodeGen/SlotIndexes.h"		#include "llvm/CodeGen/SlotIndexes.h"
		#include "llvm/MC/LaneBitmask.h"
#include "llvm/Support/Debug.h"		#include "llvm/Support/Debug.h"
#include "llvm/Support/ErrorHandling.h"		#include "llvm/Support/ErrorHandling.h"
#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"
#include "llvm/Target/TargetRegisterInfo.h"		#include "llvm/Target/TargetRegisterInfo.h"
#include <algorithm>		#include <algorithm>
#include <cassert>		#include <cassert>
#include <map>		#include <map>
#include <set>		#include <set>
(97 lines not shown)
//		//
// As of today the driver doesn't preload the constants in cache, thus the		// As of today the driver doesn't preload the constants in cache, thus the
// first loads get extra latency. The doc says global memory access can be		// first loads get extra latency. The doc says global memory access can be
// 300-600 cycles. We do not specially take that into account when scheduling		// 300-600 cycles. We do not specially take that into account when scheduling
// As we expect the driver to be able to preload the constants soon.		// As we expect the driver to be able to preload the constants soon.

// common code //		// common code //

		SISchedulerRPTracker::SISchedulerRPTracker(
		const SmallVectorImpl<RegisterMaskPair> &LiveIns,
		const SmallVectorImpl<RegisterMaskPair> &LiveOuts,
		vpykhtin: const SmallVectorImpl<RegisterMaskPair> &
		const std::vector<SmallVector<unsigned, 8>> &ItemSuccs,
		const std::vector<SmallVector<unsigned, 8>> &ItemPreds,
		const std::vector<SmallVector<RegisterMaskPair, 8>> &InRegsForItem_,
		const std::vector<SmallVector<RegisterMaskPair, 8>> &OutRegsForItem_,
		const MachineRegisterInfo *MRI,
		const TargetRegisterInfo *TRI,
		unsigned VGPRSetID,
		unsigned SGPRSetID
		):
		MRI(MRI), TRI(TRI), VGPRSetID(VGPRSetID), SGPRSetID(SGPRSetID),
		ItemSuccs(ItemSuccs), ItemPreds(ItemPreds), CurrentVGPRUsage(0),
		CurrentSGPRUsage(0)
		{
		unsigned DAGSize = ItemSuccs.size();

		fillTopoData();

		// To track register usage, we define for each register definition
		// a number of usages before it gets released.
		// This doesn't work with LaneMasks.
		// To handle LaneMasks, we 'cut' registers affected by LaneMasks
		// into all their possible distinct lane groups
		// and treat each such (reg, LaneMask) pair as if it were a register.
		std::map<unsigned, SmallVector<LaneBitmask, 8>> RegWithLaneMask;
		auto addRegToRegWithLaneMask = [&](const RegisterMaskPair &P) {
		if (!P.LaneMask.all() &&
		P.LaneMask != MRI->getMaxLaneMaskForVReg(P.RegUnit))
		RegWithLaneMask[P.RegUnit].push_back(P.LaneMask);
		};

		std::for_each(LiveIns.begin(), LiveIns.end(), addRegToRegWithLaneMask);
		std::for_each(LiveOuts.begin(), LiveOuts.end(), addRegToRegWithLaneMask);
		for (const auto &InRegs : InRegsForItem_)
		std::for_each(InRegs.begin(), InRegs.end(), addRegToRegWithLaneMask);
		for (const auto &OutRegs : OutRegsForItem_)
		std::for_each(OutRegs.begin(), OutRegs.end(), addRegToRegWithLaneMask);

		// Since we ignored when the lane mask was getMaxLaneMaskForVReg,
		// we need to add it back. It doesn't hurt if there was no element
		// with this mask for this register.
		for (auto &RegLaneMasks : RegWithLaneMask) {
		RegLaneMasks.second.push_back(MRI->getMaxLaneMaskForVReg(RegLaneMasks.first));
		}

		// Fills LaneMaskBasisForReg
		for (const auto &RegLaneMasks : RegWithLaneMask) {
		SmallVector<LaneBitmask, 8> &LaneBasis =
		LaneMaskBasisForReg[RegLaneMasks.first];
		for (const LaneBitmask &LaneMask : RegLaneMasks.second) {
		LaneBitmask Remaining = LaneMask;
		// LaneBasis size may change during the iterations.
		for (unsigned LaneBasisIndex = 0; LaneBasisIndex != LaneBasis.size();
		vpykhtin: I recommend using auto.
		++LaneBasisIndex) {
		LaneBitmask Elem = LaneBasis[LaneBasisIndex];
		if ((Remaining & Elem).none())
		continue;
		if ((Remaining & Elem) == Elem) {
		Remaining &= ~Elem;
		continue;
		}
		// Remaining intersects Elem, but Elem is not
		// included in Remaining. We divide Elem into two elements:
		// the one included in Remaining, and the rest.
		LaneBitmask NewElem = Elem & ~Remaining;
		LaneBasis[LaneBasisIndex] = Elem & Remaining;
		Remaining &= ~Elem;
		vpykhtin: You're iterating over a vector while increasing its size at the same time. This is almost OK, except that the vector can reallocate, and then all iterators are invalidated. Iterator Invalidation Rules (C++03) vector: all iterators and references before the point of insertion are unaffected, unless the new container size is greater than the previous capacity (in which case all iterators and references are invalidated) [23.2.4.3/1]
		axeldavy: What do you suggest to fix this problem?
		vpykhtin: The easy way is to reserve a sufficient number of elements to avoid reallocation, but this is not reliable, as code correctness would then depend on reserve, which is in general optional. You may consider using std::list for this.
		LaneBasis.push_back(NewElem);
		}
		if (Remaining.any())
		LaneBasis.push_back(Remaining);
		}
		}

		vpykhtin: Use const SmallVectorImpl<RegisterMaskPair> &
		auto convertRegs = [=](const SmallVectorImpl<RegisterMaskPair> &Regs) {
		return getPairsForRegs(Regs);
		};

		InRegsForItem.resize(DAGSize);
		OutRegsForItem.resize(DAGSize);
		std::transform(InRegsForItem_.begin(), InRegsForItem_.end(),
		InRegsForItem.begin(), convertRegs);
		std::transform(OutRegsForItem_.begin(), OutRegsForItem_.end(),
		OutRegsForItem.begin(), convertRegs);

		// Add InRegs to LiveRegs after we add missing LiveIns
		vpykhtin: auto
		auto InRegs = getPairsForRegs(LiveIns);

		// Fill the usage of every output
		// Warning: while by construction we always have a link between two items
		// when one needs a result from the other, the number of users of an output
		// is not the sum of child items having as input the same virtual register.
		// Here is an example. A produces x and y. B eats x and produces x'.
		// C eats x' and y. The register coalescer may have attributed the same
		// virtual register to x and x'.
		// To count accurately, we do a topological sort. In case the register is
		// found for several parents, we increment the usage of the one with the
		// highest topological index.
		OutRegsNumUsages.resize(DAGSize);
		for (unsigned i = 0; i < DAGSize; i++) {
		for (const auto &RegPair : InRegsForItem[i]) {
		bool Found = false;
		int topoInd = -1;
		for (unsigned PredID : ItemPreds[i]) {
		vpykhtin: auto
		const auto &PredOutRegs =
		this->OutRegsForItem[PredID];
		for (const auto &RegPair2 : PredOutRegs) {
		if (RegPair == RegPair2) {
		Found = true;
		if (topoInd < (int)TopoItemToIndex[PredID]) {
		topoInd = TopoItemToIndex[PredID];
		}
		break;
		}
		}
		}

		if (!Found) {
		// Fill RemainingRegsConsumers for regs that were already
		// defined before scheduling.
		++RemainingRegsConsumers[RegPair];
		// Workaround for incomplete liveIns: add the missing liveIns.
		// addLiveRegs is a no-op if the register is already live.
		InRegs.push_back(RegPair);
		}
		else {
		unsigned PredID = TopoIndexToItem[topoInd];
		++OutRegsNumUsages[PredID][RegPair];
		}
		}
		}

		addLiveRegs(InRegs);

		ItemNumPredsLeft.resize(DAGSize);

		for (unsigned i = 0; i < DAGSize; i++) {
		unsigned NumPreds = ItemPreds[i].size();
		ItemNumPredsLeft[i] = NumPreds;
		if (NumPreds == 0)
		ReadyItems.push_back(i);
		}

		// Increase OutRegsNumUsages for items
		// producing registers consumed in another
		// scheduling region.
		for (const auto &RegPair : getPairsForRegs(LiveOuts)) {
		for (unsigned i = 0; i < DAGSize; i++) {
		// Do reverse traversal
		bool Found = false;
		int ID = TopoIndexToItem[DAGSize-1-i];
		vpykhtin: auto
		const auto &OutRegs =
		this->OutRegsForItem[ID];

		for (const auto &RegPair2 : OutRegs) {
		if (RegPair == RegPair2) {
		Found = true;
		break;
		}
		}

		if (!Found)
		continue;

		++OutRegsNumUsages[ID][RegPair];
		break;
		}
		}
		}
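The basis-splitting loop in the constructor can be exercised outside LLVM. The sketch below is a minimal standalone model, not the patch's code: `Lanes` is a hypothetical `uint32_t` stand-in for `LaneBitmask`, and `buildLaneBasis` is an illustrative name. It refines a list of lane masks into pairwise-disjoint elements so that every input mask is an exact union of some of them. Because the inner loop indexes into the vector, re-reads `size()` on every iteration and copies each element, a reallocating `push_back` does not invalidate anything the loop holds; the patch's loop appears to follow the same index-based pattern.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for llvm::LaneBitmask: one bit per lane.
using Lanes = std::uint32_t;

// Refines a basis so that every mask in Masks is an exact union of
// pairwise-disjoint basis elements (a standalone model of the patch's loop).
std::vector<Lanes> buildLaneBasis(const std::vector<Lanes> &Masks) {
  std::vector<Lanes> Basis;
  for (Lanes Mask : Masks) {
    Lanes Remaining = Mask;
    // Index-based loop: size() is re-read on each iteration and Elem is a
    // copy, so a reallocating push_back cannot invalidate anything held here.
    for (std::size_t I = 0; I != Basis.size(); ++I) {
      Lanes Elem = Basis[I];
      if ((Remaining & Elem) == 0)
        continue; // disjoint: nothing to do
      if ((Remaining & Elem) == Elem) {
        Remaining &= ~Elem; // Elem lies entirely inside the mask
        continue;
      }
      // Elem straddles the mask: keep the inside part in place and append
      // the outside part as a new element.
      Basis[I] = Elem & Remaining;
      Basis.push_back(Elem & ~Remaining);
      Remaining &= ~Elem;
    }
    if (Remaining != 0)
      Basis.push_back(Remaining); // lanes not covered by any element yet
  }
  return Basis;
}
```

For example, the masks 0x3, 0x6 and 0xF yield the basis {0x2, 0x1, 0x4, 0x8}: 0x6 splits the earlier 0x3 into 0x2 and 0x1, and 0xF then only adds 0x8.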

		void
		SISchedulerRPTracker::getCurrentRegUsage(unsigned &VGPR, unsigned &SGPR)
		{
		VGPR = CurrentVGPRUsage;
		SGPR = CurrentSGPRUsage;
		}
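The usage-counting loop in the `SISchedulerRPTracker` constructor attributes each read of a register to a single producer: among the predecessors that output it, the one with the highest topological index. A standalone sketch under simplified assumptions (each decomposed (register, lanes) pair is just an `int`; `UsageCounts` and `countUsages` are hypothetical names, not part of the patch):

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical simplified model: after lane decomposition every
// (reg, lanes) pair is represented by a plain int "register" id.
struct UsageCounts {
  std::vector<std::map<int, unsigned>> OutNumUsages; // per producing item
  std::map<int, unsigned> LiveInConsumers;           // regs live before region
};

UsageCounts countUsages(const std::vector<std::vector<int>> &Preds,
                        const std::vector<std::vector<int>> &Ins,
                        const std::vector<std::vector<int>> &Outs,
                        const std::vector<int> &TopoIndex) {
  UsageCounts R;
  R.OutNumUsages.resize(Preds.size());
  for (std::size_t I = 0; I != Preds.size(); ++I) {
    for (int Reg : Ins[I]) {
      int Best = -1; // producing predecessor with the highest topo index
      for (int P : Preds[I]) {
        bool Produces = false;
        for (int O : Outs[P])
          if (O == Reg) {
            Produces = true;
            break;
          }
        if (Produces && (Best < 0 || TopoIndex[P] > TopoIndex[Best]))
          Best = P;
      }
      if (Best < 0)
        ++R.LiveInConsumers[Reg]; // defined before the scheduling region
      else
        ++R.OutNumUsages[Best][Reg];
    }
  }
  return R;
}
```

With the example from the constructor's comment (item 0 = A outputs x = 1 and y = 2; item 1 = B reads x and outputs x', coalesced to 1; item 2 = C reads 1 and 2), A's output 1 gets one use rather than two, because C's read is attributed to B.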

		void
		SISchedulerRPTracker::checkRegUsageImpact(unsigned ID,
		int &DiffVGPR,
		int &DiffSGPR) {
		SmallDenseMap<unsigned, LaneBitmask> Map;
		vpykhtin: SmallPtrSet is designed for pointers and gcc would complain; consider SmallDenseSet instead. Are you developing using MSVS?
		SmallDenseSet<unsigned> Set;

		DiffVGPR = 0;
		DiffSGPR = 0;

		for (const auto &RegPair : InRegsForItem[ID]) {
		// For now only track virtual registers.
		unsigned Reg = RegPair.RegUnit;
		if (!TargetRegisterInfo::isVirtualRegister(Reg))
		continue;

		if (RemainingRegsConsumers[RegPair] > 1)
		continue;
		Map[Reg] |= RegPair.LaneMask;
		}

		for (const auto &RegPair : Map) {
		if (LiveRegs[RegPair.first] == RegPair.second) {
		unsigned PressureVGPR, PressureSGPR;
		vpykhtin: I saw a lot of such code snippets in your code; please make a function.
		getPressureOfReg(RegPair.first, PressureVGPR, PressureSGPR);
		DiffVGPR -= PressureVGPR;
		DiffSGPR -= PressureSGPR;
		}
		}

		for (const auto &RegPair : OutRegsForItem[ID]) {
		// For now only track virtual registers.
		unsigned Reg = RegPair.RegUnit;
		if (!TargetRegisterInfo::isVirtualRegister(Reg))
		continue;

		Set.insert(Reg);
		}

		for (unsigned Reg : Set) {
		// Check that the register is not already live (not even partially).
		if (LiveRegs.find(Reg) == LiveRegs.end()) {
		unsigned PressureVGPR, PressureSGPR;
		getPressureOfReg(Reg, PressureVGPR, PressureSGPR);
		DiffVGPR += PressureVGPR;
		DiffSGPR += PressureSGPR;
		}
		}
		}
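`checkRegUsageImpact` predicts how scheduling one item would change pressure without modifying the tracker. The following is a simplified model of that rule, assuming virtual registers only, one combined weight per register instead of separate VGPR/SGPR sets, and hypothetical names (`pressureDelta`, `RegLanes`): a register is freed only if the item is the last reader of every lane still live, and a defined register adds pressure only if none of its lanes is live yet.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <utility>
#include <vector>

using Lanes = unsigned;
using RegLanes = std::pair<int, Lanes>; // hypothetical (register, lanes) piece

// Predicted pressure change if an item with these inputs/outputs were
// scheduled now. Nothing is modified; this is a read-only query.
int pressureDelta(const std::vector<RegLanes> &Ins,
                  const std::vector<RegLanes> &Outs,
                  const std::map<int, Lanes> &LiveRegs,
                  const std::map<RegLanes, unsigned> &RemainingConsumers,
                  const std::map<int, int> &Weight) {
  int Delta = 0;
  // Collect, per register, the lanes for which this item is the last reader.
  std::map<int, Lanes> Killed;
  for (const RegLanes &P : Ins) {
    auto It = RemainingConsumers.find(P);
    if (It != RemainingConsumers.end() && It->second == 1)
      Killed[P.first] |= P.second;
  }
  // A register stops counting only when all of its live lanes die.
  for (const auto &K : Killed) {
    auto Live = LiveRegs.find(K.first);
    if (Live != LiveRegs.end() && Live->second == K.second)
      Delta -= Weight.at(K.first);
  }
  // A defined register starts counting only if no lane of it is live.
  std::set<int> Defined;
  for (const RegLanes &P : Outs)
    Defined.insert(P.first);
  for (int Reg : Defined)
    if (LiveRegs.find(Reg) == LiveRegs.end())
      Delta += Weight.at(Reg);
  return Delta;
}
```

Unlike the patch, which skips an input only when more than one reader remains, this sketch requires a remaining count of exactly one before treating the piece as killed.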

		void
		SISchedulerRPTracker::getPressureOfReg(unsigned Reg,
		unsigned &PressureVGPR,
		unsigned &PressureSGPR)
		{
		PressureVGPR = 0;
		vpykhtin: #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP) LLVM_DUMP_METHOD
		axeldavy: Why LLVM_DUMP_METHOD on an empty line?
		vpykhtin: I meant something like this: #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP) LLVM_DUMP_METHOD SISchedulerRPTracker::printDebugLives()
		PressureSGPR = 0;
		PSetIterator PSetI = MRI->getPressureSets(Reg);
		for (; PSetI.isValid(); ++PSetI) {
		if (*PSetI == VGPRSetID)
		PressureVGPR += PSetI.getWeight();
		if (*PSetI == SGPRSetID)
		PressureSGPR += PSetI.getWeight();
		}
		}
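The review asked for the repeated pressure-lookup snippet to become a function; `getPressureOfReg` does that with two out-parameters. A hypothetical alternative, sketched without LLVM types (`PSetWeight` stands in for one step of `PSetIterator`; `RegPressure`, `pressureOfReg` and `applyPressure` are illustrative names), returns both values in a struct and folds the repeated add/subtract pattern into one helper:

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for one (pressure set ID, weight) step of PSetIterator.
struct PSetWeight {
  unsigned SetID;
  unsigned Weight;
};

struct RegPressure {
  int VGPR = 0;
  int SGPR = 0;
};

// Sums the weights a register contributes to the VGPR and SGPR sets.
RegPressure pressureOfReg(const std::vector<PSetWeight> &Sets,
                          unsigned VGPRSetID, unsigned SGPRSetID) {
  RegPressure P;
  for (const PSetWeight &S : Sets) {
    if (S.SetID == VGPRSetID)
      P.VGPR += static_cast<int>(S.Weight);
    if (S.SetID == SGPRSetID)
      P.SGPR += static_cast<int>(S.Weight);
  }
  return P;
}

// Adds (Sign = +1) or removes (Sign = -1) one register's pressure.
void applyPressure(RegPressure &Total, const RegPressure &Reg, int Sign) {
  Total.VGPR += Sign * Reg.VGPR;
  Total.SGPR += Sign * Reg.SGPR;
}
```

Returning a small struct keeps each call site to a single line; whether that reads better than out-parameters here is a style choice.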

		#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
		vpykhtin: I would move LIS->getInstructionIndex(*BeginBlock).getRegSlot(), LIS->getInstructionIndex(*EndBlock).getRegSlot() out of the loop.

		LLVM_DUMP_METHOD void
		SISchedulerRPTracker::printDebugLives()
		{
		for (const auto &RegPair : LiveRegs) {
		dbgs() << PrintVRegOrUnit(RegPair.first, TRI);
		if (!RegPair.second.all())
		dbgs() << ':' << PrintLaneMask(RegPair.second);
		dbgs() << ' ';
		}
		}

		#endif

		SmallVector<RegisterMaskPair, 8>
		SISchedulerRPTracker::getPairsForReg(unsigned Reg, LaneBitmask Mask)
		{
		SmallVector<RegisterMaskPair, 8> Result;

		getPairsForReg(Result, Reg, Mask);

		return Result;
		}

		void
		SISchedulerRPTracker::getPairsForReg(SmallVector<RegisterMaskPair, 8> &ToAppend,
		unsigned Reg, LaneBitmask Mask)
		{
		auto Basis = LaneMaskBasisForReg.find(Reg);
		if (Basis == LaneMaskBasisForReg.end()) {
		assert(Mask.all() || Mask == MRI->getMaxLaneMaskForVReg(Reg));
		// We want a given register/mask to always give the same RegisterMaskPair.
		// Thus replace getMaxLaneMaskForVReg by all, since they have the same
		// meaning.
		// Note: Physical registers have Mask.all(), but are disallowed
		// to call getMaxLaneMaskForVReg.
		if (!Mask.all() && Mask == MRI->getMaxLaneMaskForVReg(Reg))
		Mask = LaneBitmask::getAll();
		ToAppend.push_back(RegisterMaskPair(Reg, Mask));
		} else {
		for (const auto &Elem : Basis->second) {
		if ((Mask & Elem).any()) {
		assert((Mask & Elem) == Elem);
		ToAppend.push_back(RegisterMaskPair(Reg, Elem));
		Mask &= ~Elem;
		}
		}
		// If Mask was all(), lanes outside the max lane mask remain set here;
		// that is fine, since all() is treated as equivalent to the max lane mask.
		assert((Mask & MRI->getMaxLaneMaskForVReg(Reg)).none());
		}
		}
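The decomposition performed by `getPairsForReg` can be modeled as follows. This is a sketch, not the patch: `AllLanes` stands in for `LaneBitmask::getAll()`, the `MaxMask` parameter for `getMaxLaneMaskForVReg(Reg)`, and `appendPairsForReg` is a hypothetical name. It appends into a caller-owned vector, the copy-avoiding shape discussed in review; registers without a basis are emitted whole with the canonical all-lanes mask, so equal pieces compare equal.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

using Lanes = unsigned;
const Lanes AllLanes = ~0u; // stand-in for LaneBitmask::getAll()
using RegLanes = std::pair<unsigned, Lanes>;

// Appends the basis pieces that make up (Reg, Mask) to Out.
void appendPairsForReg(std::vector<RegLanes> &Out,
                       const std::map<unsigned, std::vector<Lanes>> &Basis,
                       unsigned Reg, Lanes Mask, Lanes MaxMask) {
  auto It = Basis.find(Reg);
  if (It == Basis.end()) {
    // Register never used partially: canonicalize the full mask.
    assert(Mask == AllLanes || Mask == MaxMask);
    Out.push_back({Reg, AllLanes});
    return;
  }
  for (Lanes Elem : It->second) {
    if ((Mask & Elem) != 0) {
      assert((Mask & Elem) == Elem && "mask must be a union of basis elements");
      Out.push_back({Reg, Elem});
      Mask &= ~Elem;
    }
  }
  // Only lanes outside the register (e.g. extra bits of AllLanes) may remain.
  assert((Mask & MaxMask) == 0);
}
```

With the basis {0x2, 0x1, 0x4, 0x8} for register 7, the mask 0x6 yields (7, 0x2) and (7, 0x4), and `AllLanes` yields all four pieces.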

		SmallVector<RegisterMaskPair, 8>
		SISchedulerRPTracker::getPairsForRegs(const SmallVectorImpl<RegisterMaskPair> &Regs)
		{
		SmallVector<RegisterMaskPair, 8> Result;

		std::for_each(Regs.begin(), Regs.end(),
		[&](const RegisterMaskPair &RegPair){
		getPairsForReg(Result, RegPair.RegUnit,
		RegPair.LaneMask);
		});

		return Result;
		}

		void
		SISchedulerRPTracker::fillTopoData()
		{
		unsigned DAGSize = ItemSuccs.size();
		std::vector<int> WorkList;

		DEBUG(dbgs() << "Topological Sort\n");

		WorkList.reserve(DAGSize);
		TopoIndexToItem.resize(DAGSize);
		TopoItemToIndex.resize(DAGSize);

		for (unsigned i = 0, e = DAGSize; i != e; ++i) {
		unsigned Degree = ItemSuccs[i].size();
		TopoItemToIndex[i] = Degree;
		if (Degree == 0) {
		WorkList.push_back(i);
		}
		}

		int Id = DAGSize;
		while (!WorkList.empty()) {
		int i = WorkList.back();
		WorkList.pop_back();
		TopoItemToIndex[i] = --Id;
		TopoIndexToItem[Id] = i;
		for (unsigned PredID : ItemPreds[i]) {
		if (!--TopoItemToIndex[PredID])
		WorkList.push_back(PredID);
		}
		}

		#ifndef NDEBUG
		// Check correctness of the ordering.
		for (unsigned i = 0, e = DAGSize; i != e; ++i) {
		for (unsigned PredID : ItemPreds[i]) {
		assert(TopoItemToIndex[i] > TopoItemToIndex[PredID] &&
		"Wrong Top Down topological sorting");
		}
		}
		#endif
		}
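`fillTopoData` numbers items bottom-up, reusing `TopoItemToIndex` as a successor counter until each item receives its index. A standalone sketch of the same scheme (a Kahn-style sort run from the sinks), with plain `int` item IDs and a hypothetical function name `topoOrder`:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns IndexToItem and fills ItemToIndex so that every predecessor
// gets a smaller index than each of its successors.
std::vector<int> topoOrder(const std::vector<std::vector<int>> &Succs,
                           const std::vector<std::vector<int>> &Preds,
                           std::vector<int> &ItemToIndex) {
  const std::size_t N = Succs.size();
  std::vector<int> IndexToItem(N);
  std::vector<int> WorkList;
  // Until an item is numbered, ItemToIndex holds its count of
  // not-yet-numbered successors.
  ItemToIndex.assign(N, 0);
  for (std::size_t I = 0; I != N; ++I) {
    ItemToIndex[I] = static_cast<int>(Succs[I].size());
    if (ItemToIndex[I] == 0)
      WorkList.push_back(static_cast<int>(I)); // sinks start the walk
  }
  int Id = static_cast<int>(N);
  while (!WorkList.empty()) {
    int I = WorkList.back();
    WorkList.pop_back();
    ItemToIndex[I] = --Id; // highest free index: all successors come later
    IndexToItem[Id] = I;
    for (int P : Preds[I])
      if (--ItemToIndex[P] == 0)
        WorkList.push_back(P); // all successors of P are now numbered
  }
  assert(Id == 0 && "graph has a cycle");
  return IndexToItem;
}
```

For the diamond with edges 0->1, 0->2, 1->3 and 2->3, the order is 0, 1, 2, 3. Counting down from the sinks is what makes the `#ifndef NDEBUG` check above hold.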

		void
		SISchedulerRPTracker::addLiveRegs(const SmallVectorImpl<RegisterMaskPair> &Regs) {
		for (const RegisterMaskPair &RegPair : Regs) {
		unsigned Reg = RegPair.RegUnit;
		unsigned PressureVGPR, PressureSGPR;
		// For now only track virtual registers.
		if (!TargetRegisterInfo::isVirtualRegister(Reg))
		continue;

		auto Pos = LiveRegs.find(Reg);

		if (Pos == LiveRegs.end()) {
		assert(RegPair.LaneMask.any());
		LiveRegs.insert(std::make_pair(Reg, RegPair.LaneMask));
		getPressureOfReg(Reg, PressureVGPR, PressureSGPR);
		CurrentVGPRUsage += PressureVGPR;
		CurrentSGPRUsage += PressureSGPR;
		}
		else {
		Pos->second |= RegPair.LaneMask;
		}
		}
		}

		void
		SISchedulerRPTracker::decreaseLiveRegs(const SmallVectorImpl<RegisterMaskPair> &Regs) {
		for (const RegisterMaskPair &RegPair : Regs) {
		// For now only track virtual registers.
		unsigned Reg = RegPair.RegUnit;
		unsigned PressureVGPR, PressureSGPR;
		if (!TargetRegisterInfo::isVirtualRegister(Reg))
		continue;

		std::map<unsigned, LaneBitmask>::iterator Pos = LiveRegs.find(Reg);
		assert(Pos != LiveRegs.end() && // Reg must be live.
		RemainingRegsConsumers.find(RegPair) !=
		RemainingRegsConsumers.end() &&
		RemainingRegsConsumers[RegPair] >= 1);
		--RemainingRegsConsumers[RegPair];
		if (RemainingRegsConsumers[RegPair] == 0) {
		if (Pos->second == RegPair.LaneMask) {
		LiveRegs.erase(Pos);
		getPressureOfReg(Reg, PressureVGPR, PressureSGPR);
		CurrentVGPRUsage -= PressureVGPR;
		CurrentSGPRUsage -= PressureSGPR;
		}
		else
		Pos->second &= ~RegPair.LaneMask;
		}
		}
		}
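`addLiveRegs` and `decreaseLiveRegs` together implement per-lane reference counting in which pressure is charged per register, not per lane. A minimal model of that bookkeeping, assuming a single combined weight per register and using hypothetical names (`LaneLiveTracker`, `addLive`, `consume`):

```cpp
#include <cassert>
#include <map>
#include <utility>

using Lanes = unsigned;
using RegLanes = std::pair<unsigned, Lanes>;

// A register contributes its weight once, however many of its lanes are
// live, and stops contributing only when its last live lanes are released.
struct LaneLiveTracker {
  std::map<unsigned, Lanes> LiveRegs;     // live lanes per register
  std::map<RegLanes, unsigned> Consumers; // remaining readers per piece
  std::map<unsigned, int> Weight;         // pressure weight per register
  int Current = 0;

  void addLive(const RegLanes &P) {
    auto It = LiveRegs.find(P.first);
    if (It == LiveRegs.end()) {
      LiveRegs.emplace(P.first, P.second); // first live lanes: charge it
      Current += Weight.at(P.first);
    } else {
      It->second |= P.second; // already charged: just widen the lanes
    }
  }

  void consume(const RegLanes &P) {
    auto Live = LiveRegs.find(P.first);
    auto Count = Consumers.find(P);
    assert(Live != LiveRegs.end() && Count != Consumers.end() &&
           Count->second >= 1 && "piece must be live with a pending reader");
    if (--Count->second != 0)
      return; // other readers still need these lanes
    if (Live->second == P.second) {
      LiveRegs.erase(Live); // last live lanes released
      Current -= Weight.at(P.first);
    } else {
      Live->second &= ~P.second; // drop only this piece's lanes
    }
  }
};
```

In the test scenario, register 5 (weight 2) becomes live in two pieces; pressure stays at 2 until the last reader of the last live piece is consumed.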

		void
		SISchedulerRPTracker::releaseItemSuccs(unsigned ID) {
		for (unsigned SuccID : ItemSuccs[ID]) {
		if (--ItemNumPredsLeft[SuccID] == 0)
		ReadyItems.push_back(SuccID);
		}
		}

		void
		SISchedulerRPTracker::itemScheduled(unsigned ID) {
		auto ReadyItemPos = std::find(ReadyItems.begin(), ReadyItems.end(), ID);
		assert(ReadyItemPos != ReadyItems.end());
		ReadyItems.erase(ReadyItemPos);
		decreaseLiveRegs(InRegsForItem[ID]);
		addLiveRegs(OutRegsForItem[ID]);
		releaseItemSuccs(ID);
		for (std::map<RegisterMaskPair, unsigned>::iterator RegI =
		OutRegsNumUsages[ID].begin(),
		E = OutRegsNumUsages[ID].end(); RegI != E; ++RegI) {
		std::pair<RegisterMaskPair, unsigned> RegP = *RegI;
		if (!TargetRegisterInfo::isVirtualRegister(RegP.first.RegUnit))
		continue;
		// We produce this register, thus it must not be previously alive.
		assert(RemainingRegsConsumers.find(RegP.first) ==
		RemainingRegsConsumers.end() ||
		RemainingRegsConsumers[RegP.first] == 0);
		RemainingRegsConsumers[RegP.first] += RegP.second;
		}
		}
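`itemScheduled` and `releaseItemSuccs` maintain the ready list in the usual list-scheduling way. A stripped-down sketch that models only that part (no register tracking; `ReadyList` and `scheduled` are hypothetical names):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// An item becomes ready once all of its predecessors have been scheduled.
struct ReadyList {
  std::vector<std::vector<int>> Succs;
  std::vector<unsigned> NumPredsLeft;
  std::vector<int> Ready;

  ReadyList(const std::vector<std::vector<int>> &S,
            const std::vector<std::vector<int>> &Preds)
      : Succs(S), NumPredsLeft(Preds.size()) {
    for (std::size_t I = 0; I != Preds.size(); ++I) {
      NumPredsLeft[I] = static_cast<unsigned>(Preds[I].size());
      if (NumPredsLeft[I] == 0)
        Ready.push_back(static_cast<int>(I)); // roots are ready at once
    }
  }

  void scheduled(int ID) {
    auto Pos = std::find(Ready.begin(), Ready.end(), ID);
    assert(Pos != Ready.end() && "only ready items may be scheduled");
    Ready.erase(Pos);
    // Release successors whose last unscheduled predecessor was ID.
    for (int Succ : Succs[ID])
      if (--NumPredsLeft[Succ] == 0)
        Ready.push_back(Succ);
  }
};
```

Like the patch, the sketch removes the scheduled item with `std::find` plus `erase`, which is linear in the number of ready items; that is usually acceptable for short ready lists.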

#ifndef NDEBUG		#ifndef NDEBUG

static const char *getReasonStr(SIScheduleCandReason Reason) {		static const char *getReasonStr(SIScheduleCandReason Reason) {
switch (Reason) {		switch (Reason) {
case NoCand: return "NOCAND";		case NoCand: return "NOCAND";
case RegUsage: return "REGUSAGE";		case RegUsage: return "REGUSAGE";
case Latency: return "LATENCY";		case Latency: return "LATENCY";
case Successor: return "SUCCESSOR";		case Successor: return "SUCCESSOR";
(100 lines not shown)	void SIScheduleBlock::tryCandidateTopDown(SISchedCandidate &Cand,

// Fall through to original instruction order.		// Fall through to original instruction order.
if (TryCand.SU->NodeNum < Cand.SU->NodeNum) {		if (TryCand.SU->NodeNum < Cand.SU->NodeNum) {
TryCand.Reason = NodeOrder;		TryCand.Reason = NodeOrder;
}		}
}		}

SUnit* SIScheduleBlock::pickNode() {		SUnit* SIScheduleBlock::pickNode() {
		SmallVector<unsigned, 16> TopReadySUs = RPTracker->getReadyItems();
SISchedCandidate TopCand;		SISchedCandidate TopCand;
		unsigned VGPRCurrentUsage, SGPRCurrentUsage;
		RPTracker->getCurrentRegUsage(VGPRCurrentUsage, SGPRCurrentUsage);

for (SUnit* SU : TopReadySUs) {		for (unsigned ID : TopReadySUs) {
		SUnit *SU = SUnits[ID];
SISchedCandidate TryCand;		SISchedCandidate TryCand;
std::vector<unsigned> pressure;		int VGPRDiff, SGPRDiff;
std::vector<unsigned> MaxPressure;
// Predict register usage after this instruction.
TryCand.SU = SU;		TryCand.SU = SU;
TopRPTracker.getDownwardPressure(SU->getInstr(), pressure, MaxPressure);		RPTracker->checkRegUsageImpact(ID, VGPRDiff, SGPRDiff);
TryCand.SGPRUsage = pressure[DAG->getSGPRSetID()];		TryCand.SGPRUsage = SGPRCurrentUsage + SGPRDiff;
TryCand.VGPRUsage = pressure[DAG->getVGPRSetID()];		TryCand.VGPRUsage = VGPRCurrentUsage + VGPRDiff;
TryCand.IsLowLatency = DAG->IsLowLatencySU[SU->NodeNum];		TryCand.IsLowLatency = DAG->IsLowLatencySU[SU->NodeNum];
TryCand.LowLatencyOffset = DAG->LowLatencyOffset[SU->NodeNum];		TryCand.LowLatencyOffset = DAG->LowLatencyOffset[SU->NodeNum];
TryCand.HasLowLatencyNonWaitedParent =		TryCand.HasLowLatencyNonWaitedParent =
HasLowLatencyNonWaitedParent[NodeNum2Index[SU->NodeNum]];		HasLowLatencyNonWaitedParent[NodeNum2Index[SU->NodeNum]];
tryCandidateTopDown(TopCand, TryCand);		tryCandidateTopDown(TopCand, TryCand);
if (TryCand.Reason != NoCand)		if (TryCand.Reason != NoCand)
TopCand.setBest(TryCand);		TopCand.setBest(TryCand);
}		}

return TopCand.SU;		return TopCand.SU;
}		}
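In the new pickNode, the predicted usage of a candidate is the current usage plus the difference reported by `RPTracker->checkRegUsageImpact`. The sketch below shows only that arithmetic and the final fall-back to original instruction order; `tryCandidateTopDown` applies further criteria (latency and others partly elided here) that are not reproduced. `Candidate` and `pickLowestPressure` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, reduced candidate: only the fields this sketch needs.
struct Candidate {
  unsigned NodeNum; // original instruction order
  int VGPRDiff;     // predicted VGPR change if scheduled now
};

// Returns the index of the ready node with the lowest predicted VGPR usage
// (current usage + diff), preferring the lower NodeNum on ties.
// Returns Ready.size() when nothing is ready.
std::size_t pickLowestPressure(const std::vector<Candidate> &Ready,
                               unsigned VGPRCurrentUsage) {
  std::size_t Best = Ready.size();
  long BestUsage = 0;
  for (std::size_t I = 0; I < Ready.size(); ++I) {
    long Usage = static_cast<long>(VGPRCurrentUsage) + Ready[I].VGPRDiff;
    bool Better = Best == Ready.size() || Usage < BestUsage ||
                  (Usage == BestUsage && Ready[I].NodeNum < Ready[Best].NodeNum);
    if (Better) {
      Best = I;
      BestUsage = Usage;
    }
  }
  return Best;
}
```

Computing the usage once per scheduling step and only adding per-candidate deltas avoids re-running a full pressure query for each candidate, which is what the old `getDownwardPressure` path did.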

		static void addRegLanes(SmallVectorImpl<RegisterMaskPair> &RegUnits,
// Schedule something valid.		RegisterMaskPair Pair) {
void SIScheduleBlock::fastSchedule() {		unsigned RegUnit = Pair.RegUnit;
TopReadySUs.clear();		assert(Pair.LaneMask.any());
if (Scheduled)		auto I = llvm::find_if(RegUnits, [RegUnit](const RegisterMaskPair Other) {
undoSchedule();		return Other.RegUnit == RegUnit;
		});
for (SUnit* SU : SUnits) {		if (I == RegUnits.end()) {
if (!SU->NumPredsLeft)		RegUnits.push_back(Pair);
TopReadySUs.push_back(SU);		} else {
		I->LaneMask |= Pair.LaneMask;
}		}

while (!TopReadySUs.empty()) {
SUnit *SU = TopReadySUs[0];
ScheduledSUnits.push_back(SU);
nodeScheduled(SU);
}		}

Scheduled = true;		static void getDefsForSU(SmallVector<RegisterMaskPair, 8> &Defs,
		const SUnit &SU, const TargetRegisterInfo *TRI,
		bool ShouldTrackLaneMasks)
		{
		for (const MachineOperand &MO : SU.getInstr()->operands()) {
		if (MO.isReg() && MO.isDef() && !MO.isDead()) {
		unsigned Reg = MO.getReg();
		// For now only track virtual registers
		if (!TargetRegisterInfo::isVirtualRegister(Reg))
		continue;
		if (!ShouldTrackLaneMasks)
		addRegLanes(Defs, RegisterMaskPair(Reg, LaneBitmask::getAll()));
		else {
		unsigned SubRegIdx = MO.getSubReg();
		if (MO.isUndef())
		SubRegIdx = 0;
		addRegLanes(Defs, RegisterMaskPair(Reg,
		SubRegIdx != 0 ?
		TRI->getSubRegIndexLaneMask(SubRegIdx) :
		LaneBitmask::getAll()));
		}
		}
		}
}		}

// Returns true if the register was set between First and Last.		// One big difference from RegOpers.collect is that
static bool isDefBetween(unsigned Reg,		// Defs with readsReg() are not counted as Uses when
SlotIndex First, SlotIndex Last,		// ShouldTrackLaneMasks is false.
const MachineRegisterInfo *MRI,		// We choose not to count them because the undef flags
const LiveIntervals *LIS) {		// are not always set properly (we would need to schedule first
for (MachineRegisterInfo::def_instr_iterator		// and update LiveIntervals to get them right).
UI = MRI->def_instr_begin(Reg),		// For better tracking, we prefer to miss some uses rather than
UE = MRI->def_instr_end(); UI != UE; ++UI) {		// record incorrect ones.
const MachineInstr* MI = &*UI;		static void getUsesForSU(SmallVector<RegisterMaskPair, 8> &Uses,
if (MI->isDebugValue())		const SUnit &SU, const TargetRegisterInfo *TRI,
		bool ShouldTrackLaneMasks)
		{
		for (const MachineOperand &MO : SU.getInstr()->operands()) {
		if (MO.isReg() && MO.isUse() && !MO.isUndef() && !MO.isInternalRead()) {
		unsigned Reg = MO.getReg();
		if (!TargetRegisterInfo::isVirtualRegister(Reg))
continue;		continue;
SlotIndex InstSlot = LIS->getInstructionIndex(*MI).getRegSlot();		if (!ShouldTrackLaneMasks)
if (InstSlot >= First && InstSlot <= Last)		addRegLanes(Uses, RegisterMaskPair(Reg, LaneBitmask::getAll()));
return true;		else {
		unsigned SubRegIdx = MO.getSubReg();
		if (MO.isUndef())
		SubRegIdx = 0;
		addRegLanes(Uses, RegisterMaskPair(Reg,
		SubRegIdx != 0 ?
		TRI->getSubRegIndexLaneMask(SubRegIdx) :
		LaneBitmask::getAll()));
		}
		}
}		}
return false;
}		}
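getDefsForSU and getUsesForSU both funnel operands through addRegLanes, which keeps one entry per register and ORs the lanes together. A minimal sketch of that collection step, assuming simplified hypothetical types: `Operand::SubRegLanes` stands for the result of `getSubRegIndexLaneMask` (0 meaning "no subregister index"), and a `std::uint32_t` replaces `LaneBitmask`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

constexpr std::uint32_t AllLanes = ~0u;

// Hypothetical simplified operand; SubRegLanes == 0 means the full register.
struct Operand {
  unsigned Reg;
  std::uint32_t SubRegLanes;
  bool IsUse;
  bool IsUndef;
};

struct RegLanes {
  unsigned Reg;
  std::uint32_t Lanes;
};

// Same idea as addRegLanes: one entry per register, lanes OR-ed together.
void addRegLanes(std::vector<RegLanes> &Out, RegLanes Pair) {
  auto It = std::find_if(Out.begin(), Out.end(),
                         [&](const RegLanes &R) { return R.Reg == Pair.Reg; });
  if (It == Out.end())
    Out.push_back(Pair);
  else
    It->Lanes |= Pair.Lanes;
}

// Collects the lanes read by one instruction. Undef uses are skipped, and
// without lane tracking every use counts as a full-register use.
std::vector<RegLanes> collectUses(const std::vector<Operand> &Ops,
                                  bool TrackLaneMasks) {
  std::vector<RegLanes> Uses;
  for (const Operand &Op : Ops) {
    if (!Op.IsUse || Op.IsUndef)
      continue;
    std::uint32_t Lanes =
        (TrackLaneMasks && Op.SubRegLanes != 0) ? Op.SubRegLanes : AllLanes;
    addRegLanes(Uses, {Op.Reg, Lanes});
  }
  return Uses;
}
```

Two subregister reads of the same register (lanes 0x3 and 0xC) end up as a single entry with lanes 0xF, which is what keeps the per-register lists short.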

void SIScheduleBlock::initRegPressure(MachineBasicBlock::iterator BeginBlock,		void SIScheduleBlock::schedule() {
MachineBasicBlock::iterator EndBlock) {		std::vector<SmallVector<unsigned, 8>> SUSuccs;
IntervalPressure Pressure, BotPressure;		std::vector<SmallVector<unsigned, 8>> SUPreds;
RegPressureTracker RPTracker(Pressure), BotRPTracker(BotPressure);		std::vector<SmallVector<RegisterMaskPair, 8>> InRegsForSU;
LiveIntervals *LIS = DAG->getLIS();		std::vector<SmallVector<RegisterMaskPair, 8>> OutRegsForSU;
MachineRegisterInfo *MRI = DAG->getMRI();
DAG->initRPTracker(TopRPTracker);
DAG->initRPTracker(BotRPTracker);
DAG->initRPTracker(RPTracker);

// Goes through all SUs. RPTracker captures what had to be alive for the SUs
// to execute, and what is still alive at the end.
for (SUnit* SU : ScheduledSUnits) {
RPTracker.setPos(SU->getInstr());
RPTracker.advance();
}

// Close the RPTracker to finalize live ins/outs.
RPTracker.closeRegion();

// Initialize the live ins and live outs.
TopRPTracker.addLiveRegs(RPTracker.getPressure().LiveInRegs);
BotRPTracker.addLiveRegs(RPTracker.getPressure().LiveOutRegs);

// Do not track physical registers, because it messes up.
for (const auto &RegMaskPair : RPTracker.getPressure().LiveInRegs) {
if (TargetRegisterInfo::isVirtualRegister(RegMaskPair.RegUnit))
LiveInRegs.insert(RegMaskPair.RegUnit);
}
LiveOutRegs.clear();
// There are several possibilities to distinguish:
// 1) Reg is not input to any instruction in the block, but is output of one
// 2) 1) + read in the block and not needed after it
// 3) 1) + read in the block but needed in another block
// 4) Reg is input of an instruction but another block will read it too
// 5) Reg is input of an instruction and then rewritten in the block.
// The result is not read in the block (implies used in another block)
// 6) Reg is input of an instruction and then rewritten in the block.
// The result is read in the block and not needed in another block
// 7) Reg is input of an instruction and then rewritten in the block.
// The result is read in the block but also needed in another block
// LiveInRegs will contain all the regs in situations 4, 5, 6, 7.
// We want LiveOutRegs to contain only Regs whose content will be read
// later in another block, and whose content was written in the current
// block, that is, cases 1, 3, 5, 7.
// Since the MIs of a block were packed together before scheduling,
// the LiveIntervals were correct, and the RPTracker was able to
// correctly distinguish 5 vs 6 and 2 vs 3.
// (Note: this is not sufficient for RPTracker to avoid mistakes in case 4.)
// The RPTracker's LiveOutRegs contains 1, 3, 4 (correctly or not), 5, 7.
// Comparing to LiveInRegs is not sufficient to differentiate 4 vs 5, 7.
// The use of isDefBetween removes case 4.
for (const auto &RegMaskPair : RPTracker.getPressure().LiveOutRegs) {
unsigned Reg = RegMaskPair.RegUnit;
if (TargetRegisterInfo::isVirtualRegister(Reg) &&
isDefBetween(Reg, LIS->getInstructionIndex(*BeginBlock).getRegSlot(),
LIS->getInstructionIndex(*EndBlock).getRegSlot(), MRI,
LIS)) {
LiveOutRegs.insert(Reg);
}
}

// Pressure = sum_alive_registers register size
// Internally llvm will represent some registers as big 128 bits registers
// for example, but they actually correspond to 4 actual 32 bits registers.
// Thus Pressure is not equal to num_alive_registers * constant.
LiveInPressure = TopPressure.MaxSetPressure;
LiveOutPressure = BotPressure.MaxSetPressure;

// Prepares TopRPTracker for top down scheduling.
TopRPTracker.closeTop();
}
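The seven cases listed in initRegPressure reduce to three facts per register. The sketch below, with a hypothetical `RegFacts` struct, shows that "live-in" means the incoming value is read, while the wanted "live-out" means written in the block and needed afterwards. Cases 1 and 3 map to the same facts (the written value must be needed later, or it would be dead), and the RPTracker's "needed after" alone would also admit case 4, which is why a defined-in-block check is applied.

```cpp
#include <cassert>

// Hypothetical summary of how one virtual register behaves in a block.
struct RegFacts {
  bool ReadBeforeWrite; // the block reads the value it receives
  bool WrittenInBlock;  // some instruction of the block defines the register
  bool NeededAfter;     // another block reads the value live at block end
};

// Live-in: the incoming value is read inside the block (cases 4, 5, 6, 7).
bool isBlockLiveIn(const RegFacts &F) { return F.ReadBeforeWrite; }

// Live-out as the comment wants it: written here AND read later
// (cases 1, 3, 5, 7). "Needed after" alone would also admit case 4.
bool isBlockLiveOut(const RegFacts &F) {
  return F.WrittenInBlock && F.NeededAfter;
}
```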

void SIScheduleBlock::schedule(MachineBasicBlock::iterator BeginBlock,
MachineBasicBlock::iterator EndBlock) {
if (!Scheduled)
fastSchedule();

// PreScheduling phase to set LiveIn and LiveOut.
initRegPressure(BeginBlock, EndBlock);
undoSchedule();

// Schedule for real now.		SUSuccs.resize(SUnits.size());
		SUPreds.resize(SUnits.size());
		InRegsForSU.resize(SUnits.size());
		OutRegsForSU.resize(SUnits.size());

TopReadySUs.clear();		for (unsigned i = 0; i < SUnits.size(); i++) {
		SUnit *SU = SUnits[i];

for (SUnit* SU : SUnits) {		for (SDep& SuccDep : SU->Succs) {
if (!SU->NumPredsLeft)		SUnit *Succ = SuccDep.getSUnit();
TopReadySUs.push_back(SU);		if (Succ->isBoundaryNode() || !BC->isSUInBlock(Succ, ID))
		continue;
		if (SuccDep.isWeak())
		continue;
		SUSuccs[i].push_back(NodeNum2Index[Succ->NodeNum]);
}		}
		for (SDep& PredDep : SU->Preds) {
		SUnit *Pred = PredDep.getSUnit();
		if (Pred->isBoundaryNode() || !BC->isSUInBlock(Pred, ID))
		continue;
		if (PredDep.isWeak())
		continue;
		SUPreds[i].push_back(NodeNum2Index[Pred->NodeNum]);
		}
		getUsesForSU(InRegsForSU[i], *SU, DAG->getTRI(),
		DAG->shouldTrackLaneMasks());
		getDefsForSU(OutRegsForSU[i], *SU, DAG->getTRI(),
		DAG->shouldTrackLaneMasks());
		}

		RPTracker.reset(new SISchedulerRPTracker(
		LiveInRegs,
		LiveOutRegs,
		SUSuccs,
		SUPreds,
		InRegsForSU,
		OutRegsForSU,
		DAG->getMRI(),
		DAG->getTRI(),
		DAG->getVGPRSetID(),
		DAG->getSGPRSetID()
		));

while (!TopReadySUs.empty()) {		while (!RPTracker->getReadyItems().empty()) {
SUnit *SU = pickNode();		SUnit *SU = pickNode();
ScheduledSUnits.push_back(SU);		ScheduledSUnits.push_back(SU);
TopRPTracker.setPos(SU->getInstr());
TopRPTracker.advance();
nodeScheduled(SU);		nodeScheduled(SU);
}		}

// TODO: compute InternalAdditionnalPressure.		// TODO: compute InternalAdditionnalPressure.
InternalAdditionnalPressure.resize(TopPressure.MaxSetPressure.size());		InternalAdditionnalPressure.resize(DAG->getTRI()->getNumRegPressureSets());

// Check everything is right.		// Check everything is right.
#ifndef NDEBUG		#ifndef NDEBUG
assert(SUnits.size() == ScheduledSUnits.size() &&		assert(SUnits.size() == ScheduledSUnits.size() &&
TopReadySUs.empty());		RPTracker->getReadyItems().empty());
for (SUnit* SU : SUnits) {
assert(SU->isScheduled &&
SU->NumPredsLeft == 0);
}
#endif		#endif

Scheduled = true;
}

void SIScheduleBlock::undoSchedule() {
for (SUnit* SU : SUnits) {
SU->isScheduled = false;
for (SDep& Succ : SU->Succs) {
if (BC->isSUInBlock(Succ.getSUnit(), ID))
undoReleaseSucc(SU, &Succ);
}
}
HasLowLatencyNonWaitedParent.assign(SUnits.size(), 0);
ScheduledSUnits.clear();
Scheduled = false;
}

void SIScheduleBlock::undoReleaseSucc(SUnit *SU, SDep *SuccEdge) {
SUnit *SuccSU = SuccEdge->getSUnit();

if (SuccEdge->isWeak()) {
++SuccSU->WeakPredsLeft;
return;
}
++SuccSU->NumPredsLeft;
}

void SIScheduleBlock::releaseSucc(SUnit *SU, SDep *SuccEdge) {
SUnit *SuccSU = SuccEdge->getSUnit();

if (SuccEdge->isWeak()) {
--SuccSU->WeakPredsLeft;
return;
}
#ifndef NDEBUG
if (SuccSU->NumPredsLeft == 0) {
dbgs() << "* Scheduling failed! *\n";
SuccSU->dump(DAG);
dbgs() << " has been released too many times!\n";
llvm_unreachable(nullptr);
}
#endif

--SuccSU->NumPredsLeft;
}

/// Release Successors of the SU that are in the block or not.
void SIScheduleBlock::releaseSuccessors(SUnit *SU, bool InOrOutBlock) {
for (SDep& Succ : SU->Succs) {
SUnit *SuccSU = Succ.getSUnit();

if (SuccSU->NodeNum >= DAG->SUnits.size())
continue;

if (BC->isSUInBlock(SuccSU, ID) != InOrOutBlock)
continue;

releaseSucc(SU, &Succ);
if (SuccSU->NumPredsLeft == 0 && InOrOutBlock)
TopReadySUs.push_back(SuccSU);
}
}		}
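Both the old code (NumPredsLeft, releaseSucc, TopReadySUs) and the new tracker (`getReadyItems`, `itemScheduled`) follow the usual ready-list scheme: a node becomes ready once all of its predecessors are scheduled. A minimal sketch of that scheme (Kahn's algorithm), assuming a hypothetical adjacency list with only strong edges; like fastSchedule it simply takes the first ready node, where a real scheduler would pick by heuristics.

```cpp
#include <cassert>
#include <vector>

// Succs[i] lists the successors of node i (strong edges only).
// Returns a topological order; it is shorter than Succs.size() only if the
// graph contains a cycle.
std::vector<unsigned>
scheduleInOrder(const std::vector<std::vector<unsigned>> &Succs) {
  std::vector<unsigned> NumPredsLeft(Succs.size(), 0);
  for (const auto &S : Succs)
    for (unsigned Succ : S)
      ++NumPredsLeft[Succ];

  std::vector<unsigned> Ready;
  for (unsigned I = 0; I < Succs.size(); ++I)
    if (NumPredsLeft[I] == 0)
      Ready.push_back(I);

  std::vector<unsigned> Order;
  while (!Ready.empty()) {
    unsigned Node = Ready.front(); // first ready node, as in fastSchedule
    Ready.erase(Ready.begin());
    Order.push_back(Node);
    // "Release" successors: the last released predecessor makes them ready.
    for (unsigned Succ : Succs[Node])
      if (--NumPredsLeft[Succ] == 0)
        Ready.push_back(Succ);
  }
  return Order;
}
```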

void SIScheduleBlock::nodeScheduled(SUnit *SU) {		void SIScheduleBlock::nodeScheduled(SUnit *SU) {
// Is in TopReadySUs		RPTracker->itemScheduled(NodeNum2Index[SU->NodeNum]);
assert (!SU->NumPredsLeft);
std::vector<SUnit *>::iterator I = llvm::find(TopReadySUs, SU);
if (I == TopReadySUs.end()) {
dbgs() << "Data Structure Bug in SI Scheduler\n";
llvm_unreachable(nullptr);
}
TopReadySUs.erase(I);

releaseSuccessors(SU, true);
// Scheduling this node will trigger a wait,		// Scheduling this node will trigger a wait,
// thus propagate to other instructions that they do not need to wait either.		// thus propagate to other instructions that they do not need to wait either.
if (HasLowLatencyNonWaitedParent[NodeNum2Index[SU->NodeNum]])		if (HasLowLatencyNonWaitedParent[NodeNum2Index[SU->NodeNum]])
HasLowLatencyNonWaitedParent.assign(SUnits.size(), 0);		HasLowLatencyNonWaitedParent.assign(SUnits.size(), 0);

if (DAG->IsLowLatencySU[SU->NodeNum]) {		if (DAG->IsLowLatencySU[SU->NodeNum]) {
for (SDep& Succ : SU->Succs) {		for (SDep& Succ : SU->Succs) {
std::map<unsigned, unsigned>::iterator I =		std::map<unsigned, unsigned>::iterator I =
NodeNum2Index.find(Succ.getSUnit()->NodeNum);		NodeNum2Index.find(Succ.getSUnit()->NodeNum);
if (I != NodeNum2Index.end())		if (I != NodeNum2Index.end())
HasLowLatencyNonWaitedParent[I->second] = 1;		HasLowLatencyNonWaitedParent[I->second] = 1;
}		}
}		}
SU->isScheduled = true;
}		}
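The HasLowLatencyNonWaitedParent update in nodeScheduled can be read as two rules: if the scheduled node has a pending low-latency parent, scheduling it triggers a wait, so (per the comment) the other instructions no longer need to wait either; and if the node is itself low latency, its successors now have a pending parent. A minimal sketch with a hypothetical `LowLatencyTracker`, using plain node indices instead of `NodeNum2Index`:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Flag[i] == 1 means node i depends on a low-latency result that no
// scheduled instruction has waited for yet.
struct LowLatencyTracker {
  std::vector<char> Flag;

  explicit LowLatencyTracker(std::size_t N) : Flag(N, 0) {}

  void nodeScheduled(unsigned Node, bool IsLowLatency,
                     const std::vector<unsigned> &Succs) {
    // This node triggers a wait, so no other node needs to wait anymore.
    if (Flag[Node])
      Flag.assign(Flag.size(), 0);
    // Successors of a low-latency node would have to wait for its result.
    if (IsLowLatency)
      for (unsigned S : Succs)
        Flag[S] = 1;
  }
};
```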

void SIScheduleBlock::finalizeUnits() {		void SIScheduleBlock::finalize() {
// We remove links from outside blocks to enable scheduling inside the block.
for (SUnit* SU : SUnits) {		for (SUnit* SU : SUnits) {
releaseSuccessors(SU, false);
if (DAG->IsHighLatencySU[SU->NodeNum])		if (DAG->IsHighLatencySU[SU->NodeNum])
HighLatencyBlock = true;		HighLatencyBlock = true;
}		}
HasLowLatencyNonWaitedParent.resize(SUnits.size(), 0);		HasLowLatencyNonWaitedParent.resize(SUnits.size(), 0);
		schedule();
}		}

// we maintain ascending order of IDs		// we maintain ascending order of IDs
void SIScheduleBlock::addPred(SIScheduleBlock *Pred) {		void SIScheduleBlock::addPred(SIScheduleBlock *Pred) {
unsigned PredID = Pred->getID();		unsigned PredID = Pred->getID();

// Check if not already predecessor.		// Check if not already predecessor.
for (SIScheduleBlock* P : Preds) {		for (SIScheduleBlock* P : Preds) {
… 27 unchanged lines not shown …	if (Succ->isHighLatencyBlock())
++NumHighLatencySuccessors;		++NumHighLatencySuccessors;
Succs.push_back(std::make_pair(Succ, Kind));		Succs.push_back(std::make_pair(Succ, Kind));

assert(none_of(Preds,		assert(none_of(Preds,
[=](SIScheduleBlock *P) { return SuccID == P->getID(); }) &&		[=](SIScheduleBlock *P) { return SuccID == P->getID(); }) &&
"Loop in the Block Graph!");		"Loop in the Block Graph!");
}		}

#ifndef NDEBUG		void SIScheduleBlock::addLiveIns(const SmallVector<RegisterMaskPair, 8> &Ins)
		{
		auto addLiveIn = [&](const RegisterMaskPair &RegPair) {
		unsigned Reg = RegPair.RegUnit;
		assert(RegPair.LaneMask.any());
		auto Pos = std::find_if(LiveInRegs.begin(), LiveInRegs.end(),
		[=](RegisterMaskPair &RegPair2) {
		return RegPair2.RegUnit == Reg;
		});
		if (Pos == LiveInRegs.end()) {
		unsigned PressureVGPR, PressureSGPR;
		RPTracker->getPressureOfReg(Reg, PressureVGPR, PressureSGPR);
		LiveInVGPRPressure += PressureVGPR;
		LiveInSGPRPressure += PressureSGPR;
		LiveInRegs.push_back(RegPair);
		}
		else
		Pos->LaneMask |= RegPair.LaneMask;
		};

		std::for_each(Ins.begin(), Ins.end(), addLiveIn);
		}

		void SIScheduleBlock::addLiveOuts(const SmallVector<RegisterMaskPair, 8> &Outs)
		{
		auto addLiveOut = [&](const RegisterMaskPair &RegPair) {
		unsigned Reg = RegPair.RegUnit;
		auto Pos = std::find_if(LiveOutRegs.begin(), LiveOutRegs.end(),
		[=](RegisterMaskPair &RegPair2) {
		return RegPair2.RegUnit == Reg;
		});
		if (Pos == LiveOutRegs.end()) {
		unsigned PressureVGPR, PressureSGPR;
		RPTracker->getPressureOfReg(Reg, PressureVGPR, PressureSGPR);
		LiveOutVGPRPressure += PressureVGPR;
		LiveOutSGPRPressure += PressureSGPR;
		LiveOutRegs.push_back(RegPair);
		}
		else
		Pos->LaneMask |= RegPair.LaneMask;
		};

		std::for_each(Outs.begin(), Outs.end(), addLiveOut);
		}
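addLiveIns and addLiveOuts add a register's pressure the first time it appears and only widen its lane mask afterwards, so the block pressure is a sum of register sizes rather than a count of registers (a 128-bit register counts as four 32-bit units, as the initRegPressure comment notes). A minimal sketch with hypothetical types; `SizeIn32BitUnits` stands in for what the patch obtains from `RPTracker->getPressureOfReg`, and, as in the patch, the whole register is charged even when only some lanes are live.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

struct LiveReg {
  unsigned Reg;
  std::uint32_t Lanes;
};

struct LiveSet {
  std::vector<LiveReg> Regs;
  unsigned Pressure = 0;

  void add(unsigned Reg, std::uint32_t Lanes, unsigned SizeIn32BitUnits) {
    auto It = std::find_if(Regs.begin(), Regs.end(),
                           [&](const LiveReg &R) { return R.Reg == Reg; });
    if (It == Regs.end()) {
      // First time this register is seen: charge its whole size.
      Pressure += SizeIn32BitUnits;
      Regs.push_back({Reg, Lanes});
    } else {
      // Already charged: only widen the set of live lanes.
      It->Lanes |= Lanes;
    }
  }
};
```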

		#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)

		LLVM_DUMP_METHOD
void SIScheduleBlock::printDebug(bool full) {		void SIScheduleBlock::printDebug(bool full) {
dbgs() << "Block (" << ID << ")\n";		dbgs() << "Block (" << ID << ")\n";
if (!full)		if (!full)
return;		return;

dbgs() << "\nContains High Latency Instruction: "		dbgs() << "\nContains High Latency Instruction: "
<< HighLatencyBlock << '\n';		<< HighLatencyBlock << '\n';
dbgs() << "\nDepends On:\n";		dbgs() << "\nDepends On:\n";
for (SIScheduleBlock* P : Preds) {		for (SIScheduleBlock* P : Preds) {
P->printDebug(false);		P->printDebug(false);
}		}

dbgs() << "\nSuccessors:\n";		dbgs() << "\nSuccessors:\n";
for (std::pair<SIScheduleBlock*, SIScheduleBlockLinkKind> S : Succs) {		for (std::pair<SIScheduleBlock*, SIScheduleBlockLinkKind> S : Succs) {
if (S.second == SIScheduleBlockLinkKind::Data)		if (S.second == SIScheduleBlockLinkKind::Data)
dbgs() << "(Data Dep) ";		dbgs() << "(Data Dep) ";
S.first->printDebug(false);		S.first->printDebug(false);
}		}

if (Scheduled) {		dbgs() << "LiveInPressure " << LiveInVGPRPressure << ' '
dbgs() << "LiveInPressure " << LiveInPressure[DAG->getSGPRSetID()] << ' '		<< LiveInSGPRPressure << '\n';
<< LiveInPressure[DAG->getVGPRSetID()] << '\n';		dbgs() << "LiveOutPressure " << LiveOutVGPRPressure << ' '
dbgs() << "LiveOutPressure " << LiveOutPressure[DAG->getSGPRSetID()] << ' '		<< LiveOutSGPRPressure << "\n\n";
<< LiveOutPressure[DAG->getVGPRSetID()] << "\n\n";
dbgs() << "LiveIns:\n";		dbgs() << "LiveIns:\n";
for (unsigned Reg : LiveInRegs)		for (const auto &Ins : LiveInRegs) {
dbgs() << PrintVRegOrUnit(Reg, DAG->getTRI()) << ' ';		dbgs() << PrintVRegOrUnit(Ins.RegUnit, DAG->getTRI());
		if (!Ins.LaneMask.all())
		dbgs() << ':' << PrintLaneMask(Ins.LaneMask);
		dbgs() << ' ';
		}

dbgs() << "\nLiveOuts:\n";		dbgs() << "\nLiveOuts:\n";
for (unsigned Reg : LiveOutRegs)		for (const auto &Outs : LiveOutRegs) {
dbgs() << PrintVRegOrUnit(Reg, DAG->getTRI()) << ' ';		dbgs() << PrintVRegOrUnit(Outs.RegUnit, DAG->getTRI());
		if (!Outs.LaneMask.all())
		dbgs() << ':' << PrintLaneMask(Outs.LaneMask);
		dbgs() << ' ';
}		}

dbgs() << "\nInstructions:\n";		dbgs() << "\nInstructions:\n";
if (!Scheduled) {
for (SUnit* SU : SUnits) {		for (SUnit* SU : SUnits) {
SU->dump(DAG);		SU->dump(DAG);
}		}
} else {
for (SUnit* SU : SUnits) {
SU->dump(DAG);
}
}

dbgs() << "///////////////////////\n";		dbgs() << "///////////////////////\n";
}		}
#endif		#endif

// SIScheduleBlockCreator //		// SIScheduleBlockCreator //

SIScheduleBlockCreator::SIScheduleBlockCreator(SIScheduleDAGMI *DAG) :		SIScheduleBlockCreator::SIScheduleBlockCreator(SIScheduleDAGMI *DAG) :
DAG(DAG) {		DAG(DAG) {
}		}

SIScheduleBlockCreator::~SIScheduleBlockCreator() = default;		SIScheduleBlockCreator::~SIScheduleBlockCreator() = default;

SIScheduleBlocks		SIScheduleBlocks
SIScheduleBlockCreator::getBlocks(SISchedulerBlockCreatorVariant BlockVariant) {		SIScheduleBlockCreator::getBlocks(SISchedulerBlockCreatorVariant BlockVariant) {
std::map<SISchedulerBlockCreatorVariant, SIScheduleBlocks>::iterator B =		std::map<SISchedulerBlockCreatorVariant, SIScheduleBlocks>::iterator B =
Blocks.find(BlockVariant);		Blocks.find(BlockVariant);
if (B == Blocks.end()) {		if (B == Blocks.end()) {
SIScheduleBlocks Res;		SIScheduleBlocks Res;
createBlocksForVariant(BlockVariant);		createBlocksForVariant(BlockVariant);
topologicalSort();		topologicalSort();
scheduleInsideBlocks();
fillStats();		fillStats();
Res.Blocks = CurrentBlocks;		Res.Blocks = CurrentBlocks;
Res.TopDownIndex2Block = TopDownIndex2Block;		Res.TopDownIndex2Block = TopDownIndex2Block;
Res.TopDownBlock2Index = TopDownBlock2Index;		Res.TopDownBlock2Index = TopDownBlock2Index;
Blocks[BlockVariant] = Res;		Blocks[BlockVariant] = Res;
return Res;		return Res;
} else {		} else {
return B->second;		return B->second;
… 70 unchanged lines not shown …	if (DAG->IsHighLatencySU[SU.NodeNum]) {
// a data dependency or not.		// a data dependency or not.
for (unsigned j : FormingGroup) {		for (unsigned j : FormingGroup) {
bool HasSubGraph;		bool HasSubGraph;
std::vector<int> SubGraph;		std::vector<int> SubGraph;
// By construction (topological order), if SU and		// By construction (topological order), if SU and
// DAG->SUnits[j] are linked, DAG->SUnits[j] is necessarily		// DAG->SUnits[j] are linked, DAG->SUnits[j] is necessarily
// in the parent graph of SU.		// in the parent graph of SU.
#ifndef NDEBUG		#ifndef NDEBUG
SubGraph = DAG->GetTopo()->GetSubGraph(SU, DAG->SUnits[j],		SubGraph = DAG->getTopo()->GetSubGraph(SU, DAG->SUnits[j],
HasSubGraph);		HasSubGraph);
assert(!HasSubGraph);		assert(!HasSubGraph);
#endif		#endif
SubGraph = DAG->GetTopo()->GetSubGraph(DAG->SUnits[j], SU,		SubGraph = DAG->getTopo()->GetSubGraph(DAG->SUnits[j], SU,
HasSubGraph);		HasSubGraph);
if (!HasSubGraph)		if (!HasSubGraph)
continue; // No dependencies between each other		continue; // No dependencies between each other
else if (SubGraph.size() > 5) {		else if (SubGraph.size() > 5) {
// Too many elements would be required to be added to the block.		// Too many elements would be required to be added to the block.
CompatibleGroup = false;		CompatibleGroup = false;
break;		break;
}		}
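The grouping check above asks for the subgraph between `DAG->SUnits[j]` and SU and gives up when more than 5 nodes would have to be pulled into the block. Assuming `GetSubGraph` returns the nodes lying on paths from the first node to the second (excluding both endpoints) and reports whether any such path exists, the set can be sketched as "descendants of the start intersected with ancestors of the target". `Graph` and `nodesBetween` are hypothetical names.

```cpp
#include <cassert>
#include <vector>

// Hypothetical adjacency-list DAG: G[i] lists the successors of node i.
using Graph = std::vector<std::vector<unsigned>>;

static void markReachable(const Graph &G, unsigned From,
                          std::vector<char> &Seen) {
  std::vector<unsigned> Stack = {From};
  while (!Stack.empty()) {
    unsigned N = Stack.back();
    Stack.pop_back();
    for (unsigned S : G[N])
      if (!Seen[S]) {
        Seen[S] = 1;
        Stack.push_back(S);
      }
  }
}

// Nodes on some path From -> ... -> To, excluding both endpoints.
// Found is set to whether To is reachable from From at all.
std::vector<unsigned> nodesBetween(const Graph &G, unsigned From, unsigned To,
                                   bool &Found) {
  Graph Preds(G.size());
  for (unsigned N = 0; N < G.size(); ++N)
    for (unsigned S : G[N])
      Preds[S].push_back(N);

  std::vector<char> Below(G.size(), 0), Above(G.size(), 0);
  markReachable(G, From, Below);   // descendants of From
  markReachable(Preds, To, Above); // ancestors of To
  Found = Below[To] != 0;

  std::vector<unsigned> Result;
  if (!Found)
    return Result;
  for (unsigned N = 0; N < G.size(); ++N)
    if (Below[N] && Above[N] && N != From && N != To)
      Result.push_back(N);
  return Result;
}
```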
… 245 unchanged lines not shown …	void SIScheduleBlockCreator::colorForceConsecutiveOrderInGroup() {
for (unsigned i = 1, e = DAGSize; i != e; ++i) {		for (unsigned i = 1, e = DAGSize; i != e; ++i) {
SUnit *SU = &DAG->SUnits[i];		SUnit *SU = &DAG->SUnits[i];
unsigned CurrentColor = CurrentColoring[i];		unsigned CurrentColor = CurrentColoring[i];
unsigned PreviousColorSave = PreviousColor;		unsigned PreviousColorSave = PreviousColor;
assert(i == SU->NodeNum);		assert(i == SU->NodeNum);

if (CurrentColor != PreviousColor)		if (CurrentColor != PreviousColor)
SeenColors.insert(PreviousColor);		SeenColors.insert(PreviousColor);
PreviousColor = CurrentColor;		PreviousColor = CurrentColor;
		vpykhtin: duplicated code
		axeldavy (Author): This is duplicated with your iterative scheduler, and I suggested having a common function for both schedulers and the default scheduler (you said you extracted the code from there), but you didn't take the comment into account. Do you want a common function for both SI schedulers, or with the default scheduler too?
		vpykhtin: Yes, I remember that comment. That doesn't mean this code should be duplicated even more; I encourage you to make short functions whenever possible, even for your own use. This code actually isn't very efficient and I planned to make an efficient version; I should have noted this, sorry.

if (CurrentColoring[SU->NodeNum] <= (int)DAGSize)		if (CurrentColoring[SU->NodeNum] <= (int)DAGSize)
continue;		continue;

if (SeenColors.find(CurrentColor) == SeenColors.end())		if (SeenColors.find(CurrentColor) == SeenColors.end())
continue;		continue;

if (PreviousColorSave != CurrentColor)		if (PreviousColorSave != CurrentColor)
… 136 unchanged lines not shown …	void SIScheduleBlockCreator::createBlocksForVariant(SISchedulerBlockCreatorVariant BlockVariant) {
unsigned DAGSize = DAG->SUnits.size();		unsigned DAGSize = DAG->SUnits.size();
std::map<unsigned,unsigned> RealID;		std::map<unsigned,unsigned> RealID;

CurrentBlocks.clear();		CurrentBlocks.clear();
CurrentColoring.clear();		CurrentColoring.clear();
CurrentColoring.resize(DAGSize, 0);		CurrentColoring.resize(DAGSize, 0);
Node2CurrentBlock.clear();		Node2CurrentBlock.clear();

// Restore links previous scheduling variant has overridden.
DAG->restoreSULinksLeft();

NextReservedID = 1;		NextReservedID = 1;
NextNonReservedID = DAGSize + 1;		NextNonReservedID = DAGSize + 1;

DEBUG(dbgs() << "Coloring the graph\n");		DEBUG(dbgs() << "Coloring the graph\n");

if (BlockVariant == SISchedulerBlockCreatorVariant::LatenciesGrouped)		if (BlockVariant == SISchedulerBlockCreatorVariant::LatenciesGrouped)
colorHighLatenciesGroups();		colorHighLatenciesGroups();
else		else
… 38 unchanged lines not shown …	for (SDep& PredDep : SU->Preds) {
SUnit *Pred = PredDep.getSUnit();		SUnit *Pred = PredDep.getSUnit();
if (PredDep.isWeak() || Pred->NodeNum >= DAGSize)		if (PredDep.isWeak() || Pred->NodeNum >= DAGSize)
continue;		continue;
if (Node2CurrentBlock[Pred->NodeNum] != SUID)		if (Node2CurrentBlock[Pred->NodeNum] != SUID)
CurrentBlocks[SUID]->addPred(CurrentBlocks[Node2CurrentBlock[Pred->NodeNum]]);		CurrentBlocks[SUID]->addPred(CurrentBlocks[Node2CurrentBlock[Pred->NodeNum]]);
}		}
}		}

// Free roots and leaves of all blocks to enable scheduling inside them.		// Compute Block LiveIns
		for (unsigned i = 0, e = DAGSize; i != e; ++i) {
		SUnit *SU = &DAG->SUnits[i];
		int SUID = Node2CurrentBlock[i];
		SmallVector<RegisterMaskPair, 8> Uses;

		getUsesForSU(Uses, *SU, DAG->getTRI(),
		DAG->shouldTrackLaneMasks());

		// Remove From Uses everything that is defined by SUs of the same Block.
		for (SDep& PredDep : SU->Preds) {
		SUnit *Pred = PredDep.getSUnit();
		if (Pred->isBoundaryNode() || Node2CurrentBlock[Pred->NodeNum] != SUID)
		continue;
		if (!PredDep.isAssignedRegDep())
		continue;
		if (!TargetRegisterInfo::isVirtualRegister(PredDep.getReg()))
		continue;
		removeUseFromDef(Uses, PredDep.getReg(), Pred);
		}
		// The remaining Uses are Block LiveIns.
		CurrentBlocks[SUID]->addLiveIns(Uses);
		}

		// Compute Block LiveOut
		for (unsigned i = 0, e = DAGSize; i != e; ++i) {
		SUnit *SU = &DAG->SUnits[i];
		int SUID = Node2CurrentBlock[i];
		SmallVector<RegisterMaskPair, 8> LiveOuts;

		for (SDep& SuccDep : SU->Succs) {
		SUnit *Succ = SuccDep.getSUnit();
		if (!Succ->isBoundaryNode() && Node2CurrentBlock[Succ->NodeNum] == SUID)
		continue;
		if (!SuccDep.isAssignedRegDep())
		continue;
		// We don't track physical registers
		if (!TargetRegisterInfo::isVirtualRegister(SuccDep.getReg()))
		continue;
		addDefFromUse(LiveOuts, SuccDep.getReg(), SU, Succ);
		}
		CurrentBlocks[SUID]->addLiveOuts(LiveOuts);
		}

for (unsigned i = 0, e = CurrentBlocks.size(); i != e; ++i) {		for (unsigned i = 0, e = CurrentBlocks.size(); i != e; ++i) {
SIScheduleBlock *Block = CurrentBlocks[i];		SIScheduleBlock *Block = CurrentBlocks[i];
Block->finalizeUnits();		Block->finalize();
}		}
DEBUG(		DEBUG(
dbgs() << "Blocks created:\n\n";		dbgs() << "Blocks created:\n\n";
for (unsigned i = 0, e = CurrentBlocks.size(); i != e; ++i) {		for (unsigned i = 0, e = CurrentBlocks.size(); i != e; ++i) {
SIScheduleBlock *Block = CurrentBlocks[i];		SIScheduleBlock *Block = CurrentBlocks[i];
Block->printDebug(true);		Block->printDebug(true);
}		}
);		);
}		}
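The new block live-in/live-out computation in createBlocksForVariant boils down to two set rules: a node's uses, minus the registers supplied by predecessors in the same block, are live-ins of its block; and a register dependency whose consumer sits in another block is a live-out of the producer's block. The sketch keeps only that set logic: the patch works on SDep register edges, tracks lane masks, skips physical registers and treats boundary nodes as outside the block. `RegDep`, `BlockRegs` and `computeBlockLiveness` are hypothetical.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical register dependency: node Def produces Reg, node User reads it.
struct RegDep {
  unsigned Def, User, Reg;
};

struct BlockRegs {
  std::set<unsigned> LiveIns, LiveOuts;
};

// Block[i] is the block of node i; Uses[i] are the registers node i reads.
std::vector<BlockRegs>
computeBlockLiveness(const std::vector<unsigned> &Block,
                     const std::vector<std::vector<unsigned>> &Uses,
                     const std::vector<RegDep> &Deps, unsigned NumBlocks) {
  std::vector<BlockRegs> Result(NumBlocks);
  for (unsigned N = 0; N < Block.size(); ++N) {
    std::set<unsigned> Remaining(Uses[N].begin(), Uses[N].end());
    // Values produced inside the same block are not live-ins.
    for (const RegDep &D : Deps)
      if (D.User == N && Block[D.Def] == Block[N])
        Remaining.erase(D.Reg);
    Result[Block[N]].LiveIns.insert(Remaining.begin(), Remaining.end());
  }
  // A value consumed in another block is a live-out of its producer's block.
  for (const RegDep &D : Deps)
    if (Block[D.Def] != Block[D.User])
      Result[Block[D.Def]].LiveOuts.insert(D.Reg);
  return Result;
}
```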

// Two functions taken from Codegen/MachineScheduler.cpp		// Adapted from Codegen/MachineScheduler.cpp

/// Non-const version.
static MachineBasicBlock::iterator
nextIfDebug(MachineBasicBlock::iterator I,
MachineBasicBlock::const_iterator End) {
for (; I != End; ++I) {
if (!I->isDebugValue())
break;
}
return I;
}

void SIScheduleBlockCreator::topologicalSort() {		void SIScheduleBlockCreator::topologicalSort() {
unsigned DAGSize = CurrentBlocks.size();		unsigned DAGSize = CurrentBlocks.size();
std::vector<int> WorkList;		std::vector<int> WorkList;

DEBUG(dbgs() << "Topological Sort\n");		DEBUG(dbgs() << "Topological Sort\n");

WorkList.reserve(DAGSize);		WorkList.reserve(DAGSize);
… 33 unchanged lines not shown …	for (unsigned i = 0, e = DAGSize; i != e; ++i) {
}		}
}		}
#endif		#endif

BottomUpIndex2Block = std::vector<int>(TopDownIndex2Block.rbegin(),		BottomUpIndex2Block = std::vector<int>(TopDownIndex2Block.rbegin(),
TopDownIndex2Block.rend());		TopDownIndex2Block.rend());
}		}

void SIScheduleBlockCreator::scheduleInsideBlocks() {
unsigned DAGSize = CurrentBlocks.size();

DEBUG(dbgs() << "\nScheduling Blocks\n\n");

// We compute a valid schedule such that each Block corresponds
// to a contiguous range of instructions.
DEBUG(dbgs() << "First phase: Fast scheduling for Reg Liveness\n");
for (unsigned i = 0, e = DAGSize; i != e; ++i) {
SIScheduleBlock *Block = CurrentBlocks[i];
Block->fastSchedule();
}

// Note: the following code, and the part restoring previous position
// is by far the most expensive operation of the Scheduler.

// Do not update CurrentTop.
MachineBasicBlock::iterator CurrentTopFastSched = DAG->getCurrentTop();
std::vector<MachineBasicBlock::iterator> PosOld;
std::vector<MachineBasicBlock::iterator> PosNew;
PosOld.reserve(DAG->SUnits.size());
PosNew.reserve(DAG->SUnits.size());

for (unsigned i = 0, e = DAGSize; i != e; ++i) {
int BlockIndice = TopDownIndex2Block[i];
SIScheduleBlock *Block = CurrentBlocks[BlockIndice];
std::vector<SUnit*> SUs = Block->getScheduledUnits();

for (SUnit* SU : SUs) {
MachineInstr *MI = SU->getInstr();
MachineBasicBlock::iterator Pos = MI;
PosOld.push_back(Pos);
if (&*CurrentTopFastSched == MI) {
PosNew.push_back(Pos);
CurrentTopFastSched = nextIfDebug(++CurrentTopFastSched,
DAG->getCurrentBottom());
} else {
// Update the instruction stream.
DAG->getBB()->splice(CurrentTopFastSched, DAG->getBB(), MI);

// Update LiveIntervals.
// Note: Moving all instructions and calling handleMove every time
// is the most cpu intensive operation of the scheduler.
// It would gain a lot if there was a way to recompute the
// LiveIntervals for the entire scheduling region.
DAG->getLIS()->handleMove(*MI, /*UpdateFlags=*/true);
PosNew.push_back(CurrentTopFastSched);
}
}
}

// Now we have Block of SUs == Block of MI.
// We do the final schedule for the instructions inside the block.
// The property that all the SUs of the Block are grouped together as MI
// is used for correct reg usage tracking.
for (unsigned i = 0, e = DAGSize; i != e; ++i) {
SIScheduleBlock *Block = CurrentBlocks[i];
std::vector<SUnit*> SUs = Block->getScheduledUnits();
Block->schedule((*SUs.begin())->getInstr(), (*SUs.rbegin())->getInstr());
}

DEBUG(dbgs() << "Restoring MI Pos\n");
// Restore old ordering (which prevents a LIS->handleMove bug).
for (unsigned i = PosOld.size(), e = 0; i != e; --i) {
MachineBasicBlock::iterator POld = PosOld[i-1];
MachineBasicBlock::iterator PNew = PosNew[i-1];
if (PNew != POld) {
// Update the instruction stream.
DAG->getBB()->splice(POld, DAG->getBB(), PNew);

// Update LiveIntervals.
DAG->getLIS()->handleMove(*POld, /*UpdateFlags=*/true);
}
}

DEBUG(
for (unsigned i = 0, e = CurrentBlocks.size(); i != e; ++i) {
SIScheduleBlock *Block = CurrentBlocks[i];
Block->printDebug(true);
}
);
}
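The old scheduleInsideBlocks temporarily moves instructions so that every block is a contiguous range, schedules inside each block, and then restores the original order. The sketch below shows the same move-then-restore idea on a `std::list`, whose iterators, like `MachineBasicBlock` iterators, stay valid across `splice`. It is a simplification: the patch splices before `CurrentTopFastSched`, restores in reverse, and updates LiveIntervals with `handleMove`, none of which is modelled. It also assumes every instruction belongs to exactly one block; `groupByBlock` and `restoreOrder` are hypothetical.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <vector>

using InstrList = std::list<int>;

// Lays the blocks out contiguously, in the given order, by moving each
// instruction to the end. Returns the original order for later restoration.
std::vector<InstrList::iterator>
groupByBlock(InstrList &L,
             const std::vector<std::vector<InstrList::iterator>> &Blocks) {
  std::vector<InstrList::iterator> Original;
  for (auto It = L.begin(); It != L.end(); ++It)
    Original.push_back(It);
  for (const auto &Block : Blocks)
    for (InstrList::iterator I : Block)
      L.splice(L.end(), L, I);
  return Original;
}

// Moving every instruction to the end in original order restores that order.
void restoreOrder(InstrList &L, const std::vector<InstrList::iterator> &Orig) {
  for (InstrList::iterator I : Orig)
    L.splice(L.end(), L, I);
}
```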

void SIScheduleBlockCreator::fillStats() {		void SIScheduleBlockCreator::fillStats() {
unsigned DAGSize = CurrentBlocks.size();		unsigned DAGSize = CurrentBlocks.size();

		rampitec: According to the coding standard, if (...) return;
for (unsigned i = 0, e = DAGSize; i != e; ++i) {		for (unsigned i = 0, e = DAGSize; i != e; ++i) {
int BlockIndice = TopDownIndex2Block[i];		int BlockIndice = TopDownIndex2Block[i];
SIScheduleBlock *Block = CurrentBlocks[BlockIndice];		SIScheduleBlock *Block = CurrentBlocks[BlockIndice];
if (Block->getPreds().empty())		if (Block->getPreds().empty())
Block->Depth = 0;		Block->Depth = 0;
else {		else {
unsigned Depth = 0;		unsigned Depth = 0;
for (SIScheduleBlock *Pred : Block->getPreds()) {		for (SIScheduleBlock *Pred : Block->getPreds()) {
… 13 unchanged lines not shown …	else {
unsigned Height = 0;		unsigned Height = 0;
for (const auto &Succ : Block->getSuccs())		for (const auto &Succ : Block->getSuccs())
Height = std::max(Height, Succ.first->Height + 1);		Height = std::max(Height, Succ.first->Height + 1);
Block->Height = Height;		Block->Height = Height;
}		}
}		}
}		}
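fillStats gives every block a Depth (longest path from a root) and a Height (longest path to a leaf), computed in top-down and bottom-up order respectively. Both are longest-path lengths, so each must take a maximum over the neighbours; a minimum starting from 0 could never exceed 0. A minimal sketch over a hypothetical block graph given as predecessor/successor lists plus a topological order:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct BlockStats {
  std::vector<unsigned> Depth, Height;
};

BlockStats computeStats(const std::vector<std::vector<unsigned>> &Preds,
                        const std::vector<std::vector<unsigned>> &Succs,
                        const std::vector<unsigned> &TopDownOrder) {
  BlockStats S{std::vector<unsigned>(Preds.size(), 0),
               std::vector<unsigned>(Preds.size(), 0)};
  // Top-down: each predecessor's depth is final before its successors.
  for (unsigned B : TopDownOrder)
    for (unsigned P : Preds[B])
      S.Depth[B] = std::max(S.Depth[B], S.Depth[P] + 1);
  // Bottom-up: the same order reversed gives final heights of successors.
  for (auto It = TopDownOrder.rbegin(); It != TopDownOrder.rend(); ++It)
    for (unsigned Succ : Succs[*It])
      S.Height[*It] = std::max(S.Height[*It], S.Height[Succ] + 1);
  return S;
}
```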

// SIScheduleBlockScheduler //		LaneBitmask SIScheduleBlockCreator::getLaneBitmaskForDef(const SUnit *SU,
		unsigned Reg)
SIScheduleBlockScheduler::SIScheduleBlockScheduler(SIScheduleDAGMI *DAG,		{
SISchedulerBlockSchedulerVariant Variant,		LaneBitmask DefMask = LaneBitmask::getNone();
SIScheduleBlocks BlocksStruct) :
DAG(DAG), Variant(Variant), Blocks(BlocksStruct.Blocks),
LastPosWaitedHighLatency(0), NumBlockScheduled(0), VregCurrentUsage(0),
SregCurrentUsage(0), maxVregUsage(0), maxSregUsage(0) {

// Fill the usage of every output
// Warning: while by construction we always have a link between two blocks
// when one needs a result from the other, the number of users of an output
// is not the sum of child blocks having as input the same virtual register.
// Here is an example. A produces x and y. B eats x and produces x'.
// C eats x' and y. The register coalescer may have attributed the same
// virtual register to x and x'.
// To count accurately, we do a topological sort. In case the register is
// found for several parents, we increment the usage of the one with the
// highest topological index.
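The counting rule described in the comment above credits each consumed register to the predecessor with the highest topological index that outputs it. Below is a rough standalone model of that rule, not the patch's code. Block and countOutputUsages are hypothetical names. Blocks are assumed to be stored in topological order, so a block's index is also its topological index; the real code maps through TopDownBlock2Index and TopDownIndex2Block.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

// Hypothetical, simplified block model. Blocks are assumed to be stored in
// topological order, so a block's index is its topological index.
struct Block {
  std::vector<unsigned> Preds; // indices of predecessor blocks
  std::set<unsigned> InRegs;   // virtual registers consumed
  std::set<unsigned> OutRegs;  // virtual registers produced
};

// For every input register of every block, credit exactly one producer:
// the predecessor with the highest topological index that outputs it.
// Returns, per block, a map Reg -> number of consuming blocks.
std::vector<std::map<unsigned, unsigned>>
countOutputUsages(const std::vector<Block> &Blocks) {
  std::vector<std::map<unsigned, unsigned>> Usages(Blocks.size());
  for (const Block &B : Blocks) {
    for (unsigned Reg : B.InRegs) {
      int Producer = -1;
      for (unsigned Pred : B.Preds)
        if (Blocks[Pred].OutRegs.count(Reg) && (int)Pred > Producer)
          Producer = (int)Pred;
      if (Producer < 0)
        continue; // live into the region: no producing block to credit
      ++Usages[Producer][Reg];
    }
  }
  return Usages;
}
```

Take the A/B/C example from the comment, where x and x' share a virtual register. C's read of that register is credited to B, the later producer. A is therefore counted as having one consumer of it, not two.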
LiveOutRegsNumUsages.resize(Blocks.size());
for (unsigned i = 0, e = Blocks.size(); i != e; ++i) {
SIScheduleBlock *Block = Blocks[i];
for (unsigned Reg : Block->getInRegs()) {
bool Found = false;
int topoInd = -1;
for (SIScheduleBlock* Pred: Block->getPreds()) {
std::set<unsigned> PredOutRegs = Pred->getOutRegs();
std::set<unsigned>::iterator RegPos = PredOutRegs.find(Reg);

if (RegPos != PredOutRegs.end()) {		for (const MachineOperand &MO : SU->getInstr()->operands()) {
Found = true;		if (MO.isReg() && MO.isDef() && MO.getReg() == Reg && !MO.isDead()) {
if (topoInd < BlocksStruct.TopDownBlock2Index[Pred->getID()]) {		unsigned SubRegIdx = MO.getSubReg();
topoInd = BlocksStruct.TopDownBlock2Index[Pred->getID()];		DefMask |= SubRegIdx != 0 ?
		DAG->getTRI()->getSubRegIndexLaneMask(SubRegIdx) :
		LaneBitmask::getAll();
}		}
}		}

		return DefMask;
}		}

if (!Found)		LaneBitmask SIScheduleBlockCreator::getLaneBitmaskForUse(const SUnit *SU,
continue;		unsigned Reg)
		{
		LaneBitmask UseMask = LaneBitmask::getNone();

int PredID = BlocksStruct.TopDownIndex2Block[topoInd];		for (const MachineOperand &MO : SU->getInstr()->operands()) {
++LiveOutRegsNumUsages[PredID][Reg];		if (MO.isReg() && MO.isUse() && MO.getReg() == Reg &&
		!MO.isUndef() && !MO.isInternalRead()) {
		unsigned SubRegIdx = MO.getSubReg();
		UseMask |= SubRegIdx != 0 ?
		DAG->getTRI()->getSubRegIndexLaneMask(SubRegIdx) :
		LaneBitmask::getAll();
}		}
}		}

LastPosHighLatencyParentScheduled.resize(Blocks.size(), 0);		return UseMask;
BlockNumPredsLeft.resize(Blocks.size());
BlockNumSuccsLeft.resize(Blocks.size());

for (unsigned i = 0, e = Blocks.size(); i != e; ++i) {
SIScheduleBlock *Block = Blocks[i];
BlockNumPredsLeft[i] = Block->getPreds().size();
BlockNumSuccsLeft[i] = Block->getSuccs().size();
}		}

#ifndef NDEBUG		static void replaceAllByMask(LaneBitmask &L, LaneBitmask &Mask)
for (unsigned i = 0, e = Blocks.size(); i != e; ++i) {		{
SIScheduleBlock *Block = Blocks[i];		if (L.all())
assert(Block->getID() == i);		L = Mask;
}		}
#endif

std::set<unsigned> InRegs = DAG->getInRegs();		void
addLiveRegs(InRegs);		SIScheduleBlockCreator::removeUseFromDef(SmallVectorImpl<RegisterMaskPair> &Uses,
		unsigned Reg, const SUnit *SU)
		{
		assert(TargetRegisterInfo::isVirtualRegister(Reg));
		LaneBitmask LanesMaskMax, DefMask, UseMask;

// Increase LiveOutRegsNumUsages for blocks		auto UsePos = std::find_if(Uses.begin(), Uses.end(),
// producing registers consumed in another		[=](RegisterMaskPair P) {
// scheduling region.		return P.RegUnit == Reg;
for (unsigned Reg : DAG->getOutRegs()) {		});
for (unsigned i = 0, e = Blocks.size(); i != e; ++i) {
// Do reverse traversal
int ID = BlocksStruct.TopDownIndex2Block[Blocks.size()-1-i];
SIScheduleBlock *Block = Blocks[ID];
const std::set<unsigned> &OutRegs = Block->getOutRegs();

if (OutRegs.find(Reg) == OutRegs.end())		// Already removed
continue;		if (UsePos == Uses.end())
		rampitec: Forgot '&'?
		axeldavy: indeed. Thanks.
		return;

++LiveOutRegsNumUsages[ID][Reg];		if (!DAG->shouldTrackLaneMasks()) {
break;		Uses.erase(UsePos);
}		return;
}		}

// Fill LiveRegsConsumers for regs that were already		LanesMaskMax = DAG->getMRI()->getMaxLaneMaskForVReg(Reg);
// defined before scheduling.		DefMask = getLaneBitmaskForDef(SU, Reg);
for (unsigned i = 0, e = Blocks.size(); i != e; ++i) {		UseMask = UsePos->LaneMask;
SIScheduleBlock *Block = Blocks[i];
for (unsigned Reg : Block->getInRegs()) {
bool Found = false;
for (SIScheduleBlock* Pred: Block->getPreds()) {
std::set<unsigned> PredOutRegs = Pred->getOutRegs();
std::set<unsigned>::iterator RegPos = PredOutRegs.find(Reg);

if (RegPos != PredOutRegs.end()) {		replaceAllByMask(DefMask, LanesMaskMax);
Found = true;		replaceAllByMask(UseMask, LanesMaskMax);
break;
		if ((UseMask & ~DefMask).none())
		Uses.erase(UsePos);
		else
		UsePos->LaneMask = UseMask & ~DefMask;
}		}
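removeUseFromDef keeps only the used lanes that the defining instruction does not cover. Before that, replaceAllByMask narrows any "all lanes" mask to the lanes the virtual register actually has (getMaxLaneMaskForVReg). The sketch below shows that arithmetic using a plain 64-bit integer in place of LaneBitmask. The Lanes alias and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for LaneBitmask: one bit per lane.
using Lanes = std::uint64_t;
constexpr Lanes AllLanes = ~Lanes(0);

// Mirrors replaceAllByMask: an "all lanes" mask is narrowed to the lanes
// that really exist for this virtual register (MaxLanes).
Lanes narrowAll(Lanes L, Lanes MaxLanes) {
  return L == AllLanes ? MaxLanes : L;
}

// Lanes of a use that are still read from outside after an instruction
// defining DefMask; 0 means the use is fully satisfied and can be erased.
Lanes remainingUseLanes(Lanes UseMask, Lanes DefMask, Lanes MaxLanes) {
  UseMask = narrowAll(UseMask, MaxLanes);
  DefMask = narrowAll(DefMask, MaxLanes);
  return UseMask & ~DefMask;
}
```

Consider a full-register use whose lanes are all written through subregister defs (DefMask 0xF on a four-lane register). Without the narrowing step, AllLanes & ~0xF would still leave bits set for lanes that do not exist, so the use would never be erased.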

		void
		SIScheduleBlockCreator::addDefFromUse(SmallVectorImpl<RegisterMaskPair> &Defs,
		unsigned Reg, const SUnit *SUDef,
		const SUnit *SUUse)
		{
		LaneBitmask LanesMaskMax, DefMask, UseMask;
		int DefIndex = DAG->getTopo()->getSUTopoIndex(*SUDef);

		auto DefPos = std::find_if(Defs.begin(), Defs.end(),
		[=](RegisterMaskPair P) {
		return P.RegUnit == Reg;
		});

		if (!DAG->shouldTrackLaneMasks()) {
		if (DefPos == Defs.end())
		Defs.push_back(RegisterMaskPair(Reg, LaneBitmask::getAll()));
		return;
}		}

if (!Found)		LanesMaskMax = DAG->getMRI()->getMaxLaneMaskForVReg(Reg);
++LiveRegsConsumers[Reg];		DefMask = getLaneBitmaskForDef(SUDef, Reg);
		UseMask = getLaneBitmaskForUse(SUUse, Reg);

		replaceAllByMask(DefMask, LanesMaskMax);
		replaceAllByMask(UseMask, LanesMaskMax);

		for (const SDep& PredDep : SUUse->Preds) {
		const SUnit *Pred = PredDep.getSUnit();
		if (!PredDep.isAssignedRegDep())
		continue;
		if (PredDep.getReg() != Reg)
		continue;
		// Look for SUs that are after SUDef in the
		// topological sort.
		if (DAG->getTopo()->getSUTopoIndex(*Pred) <= DefIndex)
		continue;
		// These lanes content are defined by Pred,
		// and thus not from SUDef.
		UseMask &= ~getLaneBitmaskForDef(Pred, Reg);
}		}

		DefMask &= UseMask;
		// There is data dependency, thus DefMask cannot be none.
		assert(!DefMask.none());

		if (DefPos == Defs.end())
		Defs.push_back(RegisterMaskPair(Reg, DefMask));
		else
		DefPos->LaneMask |= DefMask;
}		}
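addDefFromUse determines which lanes of Reg the use in SUUse actually reads from SUDef. Some lanes may be rewritten by another dependency of SUUse that comes after SUDef in the topological order; those lanes are read from the later def instead. Below is a standalone sketch of that masking. LaterDef and lanesFromDef are hypothetical names, and the masks are assumed to be already narrowed by replaceAllByMask.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using Lanes = std::uint64_t; // hypothetical stand-in for LaneBitmask

// Another def of the same register feeding the use, with its position in
// the topological order (hypothetical simplification of an SDep).
struct LaterDef {
  int TopoIndex;
  Lanes DefMask;
};

// Lanes the use reads from the def at DefTopoIndex. Masks are assumed to
// be already narrowed to the register's real lanes.
Lanes lanesFromDef(Lanes DefMask, Lanes UseMask, int DefTopoIndex,
                   const std::vector<LaterDef> &OtherDefs) {
  for (const LaterDef &D : OtherDefs)
    if (D.TopoIndex > DefTopoIndex)
      UseMask &= ~D.DefMask; // these lanes come from the later def
  Lanes Result = DefMask & UseMask;
  // There is a data dependency, so at least one lane must flow from the def.
  assert(Result != 0);
  return Result;
}
```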

		// SIScheduleBlockScheduler //

		SIScheduleBlockScheduler::SIScheduleBlockScheduler(SIScheduleDAGMI *DAG,
		SISchedulerBlockSchedulerVariant Variant,
		SIScheduleBlocks BlocksStruct) :
		DAG(DAG), Variant(Variant), Blocks(BlocksStruct.Blocks),
		LastPosWaitedHighLatency(0), NumBlockScheduled(0),
		maxVregUsage(0), maxSregUsage(0) {

		LastPosHighLatencyParentScheduled.resize(Blocks.size(), 0);

		#ifndef NDEBUG
for (unsigned i = 0, e = Blocks.size(); i != e; ++i) {		for (unsigned i = 0, e = Blocks.size(); i != e; ++i) {
SIScheduleBlock *Block = Blocks[i];		SIScheduleBlock *Block = Blocks[i];
if (BlockNumPredsLeft[i] == 0) {		assert(Block->getID() == i);
ReadyBlocks.push_back(Block);
}
}		}
		#endif

		std::vector<SmallVector<unsigned, 8>> BlockSuccs;
		std::vector<SmallVector<unsigned, 8>> BlockPreds;
		std::vector<SmallVector<RegisterMaskPair, 8>> InRegsForBlock;
		std::vector<SmallVector<RegisterMaskPair, 8>> OutRegsForBlock;

		BlockSuccs.resize(Blocks.size());
		BlockPreds.resize(Blocks.size());
		InRegsForBlock.resize(Blocks.size());
		OutRegsForBlock.resize(Blocks.size());

		for (unsigned i = 0; i < Blocks.size(); i++) {
		SIScheduleBlock *Block = Blocks[i];
		for (const auto &Succ : Block->getSuccs())
		BlockSuccs[i].push_back(Succ.first->getID());
		for (const auto &Pred : Block->getPreds())
		BlockPreds[i].push_back(Pred->getID());
		InRegsForBlock[i] = Block->getInRegs();
		OutRegsForBlock[i] = Block->getOutRegs();
		}

		RPTracker.reset(new SISchedulerRPTracker(
		DAG->getInRegs(),
		DAG->getOutRegs(),
		BlockSuccs,
		BlockPreds,
		InRegsForBlock,
		OutRegsForBlock,
		DAG->getMRI(),
		DAG->getTRI(),
		DAG->getVGPRSetID(),
		DAG->getSGPRSetID()
		));

while (SIScheduleBlock *Block = pickBlock()) {		while (SIScheduleBlock *Block = pickBlock()) {
BlocksScheduled.push_back(Block);		BlocksScheduled.push_back(Block);
blockScheduled(Block);		blockScheduled(Block);
}		}

DEBUG(		DEBUG(
dbgs() << "Block Order:";		dbgs() << "Block Order:";
Show All 10 Lines	if (!Cand.isValid()) {
TryCand.Reason = NodeOrder;		TryCand.Reason = NodeOrder;
return true;		return true;
}		}

// Try to hide high latencies.		// Try to hide high latencies.
if (tryLess(TryCand.LastPosHighLatParentScheduled,		if (tryLess(TryCand.LastPosHighLatParentScheduled,
Cand.LastPosHighLatParentScheduled, TryCand, Cand, Latency))		Cand.LastPosHighLatParentScheduled, TryCand, Cand, Latency))
return true;		return true;
// Schedule high latencies early so you can hide them better.		// Schedule high latencies early so you can hide them better.
		rampitec: ToAppend is passed by value... As far as I understand it should not work, i.e. it should not return any values.
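rampitec's remark points at a common C++ pitfall. A parameter passed by value is a copy, so whatever the function appends to it is discarded on return. The minimal illustration below uses std::vector; appendByValue and appendByRef are hypothetical names.

```cpp
#include <cassert>
#include <vector>

// Appends to a local copy only: the caller's vector is left unchanged.
void appendByValue(std::vector<int> ToAppend, int V) {
  ToAppend.push_back(V);
}

// Taking the output by non-const reference makes the append visible.
void appendByRef(std::vector<int> &ToAppend, int V) {
  ToAppend.push_back(V);
}
```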
if (tryGreater(TryCand.IsHighLatency, Cand.IsHighLatency,		if (tryGreater(TryCand.IsHighLatency, Cand.IsHighLatency,
TryCand, Cand, Latency))		TryCand, Cand, Latency))
return true;		return true;
if (TryCand.IsHighLatency && tryGreater(TryCand.Height, Cand.Height,		if (TryCand.IsHighLatency && tryGreater(TryCand.Height, Cand.Height,
TryCand, Cand, Depth))		TryCand, Cand, Depth))
return true;		return true;
if (tryGreater(TryCand.NumHighLatencySuccessors,		if (tryGreater(TryCand.NumHighLatencySuccessors,
Cand.NumHighLatencySuccessors,		Cand.NumHighLatencySuccessors,
TryCand, Cand, Successor))		TryCand, Cand, Successor))
return true;		return true;
return false;		return false;
}		}

bool SIScheduleBlockScheduler::tryCandidateRegUsage(SIBlockSchedCandidate &Cand,		bool SIScheduleBlockScheduler::tryCandidateRegUsage(SIBlockSchedCandidate &Cand,
SIBlockSchedCandidate &TryCand) {		SIBlockSchedCandidate &TryCand) {
if (!Cand.isValid()) {		if (!Cand.isValid()) {
TryCand.Reason = NodeOrder;		TryCand.Reason = NodeOrder;
return true;		return true;
}		}

if (tryLess(TryCand.VGPRUsageDiff > 0, Cand.VGPRUsageDiff > 0,		if (tryLess(TryCand.VGPRUsageDiff > 0, Cand.VGPRUsageDiff > 0,
TryCand, Cand, RegUsage))		TryCand, Cand, RegUsage))
return true;		return true;
if (tryGreater(TryCand.NumSuccessors > 0,		if (tryGreater(TryCand.NumSuccessors > 0,
Cand.NumSuccessors > 0,		Cand.NumSuccessors > 0,
		rampitec: Can you please avoid the copy here and in the function's arguments?
		axeldavy: Right, I can change it to: SIScheduleBlockScheduler::getPairsForRegs(const SmallVector<RegisterMaskPair, 8> &Regs) { SmallVector<RegisterMaskPair, 8> Result; for (const auto &RegPair : Regs) { But for the line with for (const auto RegPairRes : getPairsForReg(RegPair.RegUnit, do you mean just using &RegPairRes? Daniel Berlin suggested using std::transform together with a new vector_inserter to replace the two loops. That would still copy the getPairsForReg results into the result array of getPairsForRegs in every case. I could probably solve the issue by adding a new getPairsForReg function that takes the Result array to append to, and using that helper in both functions. Do you think that would be a good solution?
		rampitec: Yes, it sounds more efficient to me.
TryCand, Cand, Successor))		TryCand, Cand, Successor))
return true;		return true;
if (tryGreater(TryCand.Height, Cand.Height, TryCand, Cand, Depth))		if (tryGreater(TryCand.Height, Cand.Height, TryCand, Cand, Depth))
return true;		return true;
		rampitec: Argument needs to be a reference.
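The fix agreed on in the getPairsForReg thread above is one appending helper shared by the single-register and multi-register functions. It could look roughly like the sketch below. The following are assumptions and hypothetical stand-ins:
. RegLanes, appendPairsForReg and pairsForRegs stand in for RegisterMaskPair, getPairsForReg and getPairsForRegs.
. std::vector replaces SmallVector.
. A single lane basis is shared by all registers; the patch computes one basis per register.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for llvm::RegisterMaskPair.
struct RegLanes {
  unsigned Reg;
  std::uint64_t LaneMask;
};

// Hypothetical helper: split (Reg, Mask) along a basis of disjoint lane
// masks and append the pieces straight into Result (no temporary vector).
// Assumes every mask is a union of basis elements.
void appendPairsForReg(unsigned Reg, std::uint64_t Mask,
                       const std::vector<std::uint64_t> &Basis,
                       std::vector<RegLanes> &Result) {
  for (std::uint64_t Piece : Basis)
    if (Mask & Piece)
      Result.push_back({Reg, Piece});
}

// The multi-register version takes its input by const reference and
// reuses the same helper to fill a single output vector.
std::vector<RegLanes> pairsForRegs(const std::vector<RegLanes> &Regs,
                                   const std::vector<std::uint64_t> &Basis) {
  std::vector<RegLanes> Result;
  for (const RegLanes &RP : Regs)
    appendPairsForReg(RP.Reg, RP.LaneMask, Basis, Result);
  return Result;
}
```

The helper writes directly into the caller's vector, so no per-register intermediate vector is built and then copied.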
if (tryLess(TryCand.VGPRUsageDiff, Cand.VGPRUsageDiff,		if (tryLess(TryCand.VGPRUsageDiff, Cand.VGPRUsageDiff,
TryCand, Cand, RegUsage))		TryCand, Cand, RegUsage))
return true;		return true;
return false;		return false;
}		}

SIScheduleBlock *SIScheduleBlockScheduler::pickBlock() {		SIScheduleBlock *SIScheduleBlockScheduler::pickBlock() {
SIBlockSchedCandidate Cand;		SIBlockSchedCandidate Cand;
std::vector<SIScheduleBlock*>::iterator Best;		unsigned VregCurrentUsage, SregCurrentUsage;
SIScheduleBlock *Block;		SIScheduleBlock *Block;
		SmallVector<unsigned, 16> ReadyBlocks = RPTracker->getReadyItems();
if (ReadyBlocks.empty())		if (ReadyBlocks.empty())
return nullptr;		return nullptr;

DAG->fillVgprSgprCost(LiveRegs.begin(), LiveRegs.end(),		RPTracker->getCurrentRegUsage(VregCurrentUsage, SregCurrentUsage);
VregCurrentUsage, SregCurrentUsage);
if (VregCurrentUsage > maxVregUsage)		if (VregCurrentUsage > maxVregUsage)
maxVregUsage = VregCurrentUsage;		maxVregUsage = VregCurrentUsage;
if (SregCurrentUsage > maxSregUsage)		if (SregCurrentUsage > maxSregUsage)
maxSregUsage = SregCurrentUsage;		maxSregUsage = SregCurrentUsage;
DEBUG(		DEBUG(
dbgs() << "Picking New Blocks\n";		dbgs() << "Picking New Blocks\n";
dbgs() << "Available: ";		dbgs() << "Available: ";
for (SIScheduleBlock* Block : ReadyBlocks)		for (unsigned ID : ReadyBlocks)
dbgs() << Block->getID() << ' ';		dbgs() << ID << ' ';
dbgs() << "\nCurrent Live:\n";		dbgs() << "\nCurrent Live:\n";
for (unsigned Reg : LiveRegs)		RPTracker->printDebugLives();
		rampitec: auto& (and in other places too)?
		axeldavy: Ok
dbgs() << PrintVRegOrUnit(Reg, DAG->getTRI()) << ' ';
dbgs() << '\n';		dbgs() << '\n';
dbgs() << "Current VGPRs: " << VregCurrentUsage << '\n';		dbgs() << "Current VGPRs: " << VregCurrentUsage << '\n';
dbgs() << "Current SGPRs: " << SregCurrentUsage << '\n';		dbgs() << "Current SGPRs: " << SregCurrentUsage << '\n';
);		);
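In a range-based for loop, auto deduces a value type, so each element is copied into the loop variable. const auto & binds to the element instead, as does auto & when the element is modified. The Tracked type below is purely illustrative; it counts its own copies so the difference is observable.

```cpp
#include <cassert>
#include <vector>

// Illustrative type that counts how many times it is copied.
struct Tracked {
  static int Copies;
  int Value = 0;
  Tracked() = default;
  explicit Tracked(int V) : Value(V) {}
  Tracked(const Tracked &O) : Value(O.Value) { ++Copies; }
};
int Tracked::Copies = 0;

// `auto` deduces Tracked, so every iteration copies the element.
int sumByValue(const std::vector<Tracked> &V) {
  int Sum = 0;
  for (auto T : V)
    Sum += T.Value;
  return Sum;
}

// `const auto &` binds to each element in place: no copies.
int sumByRef(const std::vector<Tracked> &V) {
  int Sum = 0;
  for (const auto &T : V)
    Sum += T.Value;
  return Sum;
}
```

For a plain unsigned, as in the LiveRegs loop above, the copy costs nothing. The difference grows with the size of the element type.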

Cand.Block = nullptr;		Cand.Block = nullptr;
for (std::vector<SIScheduleBlock*>::iterator I = ReadyBlocks.begin(),		for (unsigned ID : ReadyBlocks) {
E = ReadyBlocks.end(); I != E; ++I) {
SIBlockSchedCandidate TryCand;		SIBlockSchedCandidate TryCand;
TryCand.Block = *I;		int SGPRUsageDiff;

		TryCand.Block = Blocks[ID];
TryCand.IsHighLatency = TryCand.Block->isHighLatencyBlock();		TryCand.IsHighLatency = TryCand.Block->isHighLatencyBlock();
TryCand.VGPRUsageDiff =		RPTracker->checkRegUsageImpact(ID, TryCand.VGPRUsageDiff, SGPRUsageDiff);
checkRegUsageImpact(TryCand.Block->getInRegs(),
TryCand.Block->getOutRegs())[DAG->getVGPRSetID()];
TryCand.NumSuccessors = TryCand.Block->getSuccs().size();		TryCand.NumSuccessors = TryCand.Block->getSuccs().size();
TryCand.NumHighLatencySuccessors =		TryCand.NumHighLatencySuccessors =
TryCand.Block->getNumHighLatencySuccessors();		TryCand.Block->getNumHighLatencySuccessors();
TryCand.LastPosHighLatParentScheduled =		TryCand.LastPosHighLatParentScheduled =
(unsigned int) std::max<int> (0,		(unsigned int) std::max<int> (0,
LastPosHighLatencyParentScheduled[TryCand.Block->getID()] -		LastPosHighLatencyParentScheduled[ID] -
LastPosWaitedHighLatency);		LastPosWaitedHighLatency);
TryCand.Height = TryCand.Block->Height;		TryCand.Height = TryCand.Block->Height;
// Try not to increase VGPR usage too much, else we may spill.		// Try not to increase VGPR usage too much, else we may spill.
if (VregCurrentUsage > 120 ||		if (VregCurrentUsage > 120 ||
Variant != SISchedulerBlockSchedulerVariant::BlockLatencyRegUsage) {		Variant != SISchedulerBlockSchedulerVariant::BlockLatencyRegUsage) {
if (!tryCandidateRegUsage(Cand, TryCand) &&		if (!tryCandidateRegUsage(Cand, TryCand) &&
Variant != SISchedulerBlockSchedulerVariant::BlockRegUsage)		Variant != SISchedulerBlockSchedulerVariant::BlockRegUsage)
tryCandidateLatency(Cand, TryCand);		tryCandidateLatency(Cand, TryCand);
} else {		} else {
if (!tryCandidateLatency(Cand, TryCand))		if (!tryCandidateLatency(Cand, TryCand))
tryCandidateRegUsage(Cand, TryCand);		tryCandidateRegUsage(Cand, TryCand);
}		}
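pickBlock chooses the comparison order from the current VGPR usage. Above 120 VGPRs, register usage is compared first (to avoid spilling). Otherwise latency comes first, and the other criteria set breaks ties. Within each set, the first criterion that differs decides, as in a tryLess/tryGreater chain.

This is a simplified model of the BlockLatencyRegUsage variant only. Candidate, compareRegUsage, compareLatency and pickBest are hypothetical names. The latency criteria are trimmed to two; Height for high-latency blocks and NumHighLatencySuccessors are omitted.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, trimmed-down candidate holding only the fields compared.
struct Candidate {
  int ID = -1;
  int VGPRUsageDiff = 0;
  unsigned NumSuccessors = 0;
  unsigned Height = 0;
  bool IsHighLatency = false;
  unsigned LastPosHighLatParentScheduled = 0;
};

// Compare functions return -1 to prefer A, +1 to prefer B, 0 on a tie.
int compareRegUsage(const Candidate &A, const Candidate &B) {
  if ((A.VGPRUsageDiff > 0) != (B.VGPRUsageDiff > 0))
    return A.VGPRUsageDiff > 0 ? 1 : -1; // avoid increasing VGPR usage
  if ((A.NumSuccessors > 0) != (B.NumSuccessors > 0))
    return A.NumSuccessors > 0 ? -1 : 1; // prefer blocks with successors
  if (A.Height != B.Height)
    return A.Height > B.Height ? -1 : 1; // prefer the greater Height
  if (A.VGPRUsageDiff != B.VGPRUsageDiff)
    return A.VGPRUsageDiff < B.VGPRUsageDiff ? -1 : 1;
  return 0;
}

// Trimmed latency criteria: hide pending high latencies, then schedule
// high-latency blocks early.
int compareLatency(const Candidate &A, const Candidate &B) {
  if (A.LastPosHighLatParentScheduled != B.LastPosHighLatParentScheduled)
    return A.LastPosHighLatParentScheduled < B.LastPosHighLatParentScheduled
               ? -1 : 1;
  if (A.IsHighLatency != B.IsHighLatency)
    return A.IsHighLatency ? -1 : 1;
  return 0;
}

// Returns the ID of the chosen ready block, or -1 if none. Ties keep the
// earlier block, since a candidate only replaces Best when strictly better.
int pickBest(const std::vector<Candidate> &Ready, unsigned VregCurrentUsage) {
  const bool RegUsageFirst = VregCurrentUsage > 120;
  const Candidate *Best = nullptr;
  for (const Candidate &C : Ready) {
    if (!Best) {
      Best = &C;
      continue;
    }
    int R = RegUsageFirst ? compareRegUsage(C, *Best) : compareLatency(C, *Best);
    if (R == 0)
      R = RegUsageFirst ? compareLatency(C, *Best) : compareRegUsage(C, *Best);
    if (R < 0)
      Best = &C;
  }
  return Best ? Best->ID : -1;
}
```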
if (TryCand.Reason != NoCand) {		if (TryCand.Reason != NoCand) {
Cand.setBest(TryCand);		Cand.setBest(TryCand);
Best = I;
DEBUG(dbgs() << "Best Current Choice: " << Cand.Block->getID() << ' '		DEBUG(dbgs() << "Best Current Choice: " << Cand.Block->getID() << ' '
<< getReasonStr(Cand.Reason) << '\n');		<< getReasonStr(Cand.Reason) << '\n');
}		}
}		}

DEBUG(		DEBUG(
dbgs() << "Picking: " << Cand.Block->getID() << '\n';		dbgs() << "Picking: " << Cand.Block->getID() << '\n';
dbgs() << "Is a block with high latency instruction: "		dbgs() << "Is a block with high latency instruction: "
<< (Cand.IsHighLatency ? "yes\n" : "no\n");		<< (Cand.IsHighLatency ? "yes\n" : "no\n");
dbgs() << "Position of last high latency dependency: "		dbgs() << "Position of last high latency dependency: "
<< Cand.LastPosHighLatParentScheduled << '\n';		<< Cand.LastPosHighLatParentScheduled << '\n';
dbgs() << "VGPRUsageDiff: " << Cand.VGPRUsageDiff << '\n';		dbgs() << "VGPRUsageDiff: " << Cand.VGPRUsageDiff << '\n';
dbgs() << '\n';		dbgs() << '\n';
);		);

Block = Cand.Block;		Block = Cand.Block;
ReadyBlocks.erase(Best);
return Block;		return Block;
}		}

// Tracking of currently alive registers to determine VGPR Usage.		void SIScheduleBlockScheduler::blockScheduled(SIScheduleBlock *Block) {
		unsigned ID = Block->getID();
void SIScheduleBlockScheduler::addLiveRegs(std::set<unsigned> &Regs) {		RPTracker->itemScheduled(ID);
for (unsigned Reg : Regs) {
// For now only track virtual registers.
if (!TargetRegisterInfo::isVirtualRegister(Reg))
continue;
// If not already in the live set, then add it.
(void) LiveRegs.insert(Reg);
}
}

void SIScheduleBlockScheduler::decreaseLiveRegs(SIScheduleBlock *Block,
std::set<unsigned> &Regs) {
for (unsigned Reg : Regs) {
// For now only track virtual registers.
std::set<unsigned>::iterator Pos = LiveRegs.find(Reg);
assert (Pos != LiveRegs.end() && // Reg must be live.
LiveRegsConsumers.find(Reg) != LiveRegsConsumers.end() &&
LiveRegsConsumers[Reg] >= 1);
--LiveRegsConsumers[Reg];
if (LiveRegsConsumers[Reg] == 0)
LiveRegs.erase(Pos);
}
}

void SIScheduleBlockScheduler::releaseBlockSuccs(SIScheduleBlock *Parent) {		if (Block->isHighLatencyBlock()) {
for (const auto &Block : Parent->getSuccs()) {		for (const auto &Succ : Block->getSuccs()) {
if (--BlockNumPredsLeft[Block.first->getID()] == 0)		if (Succ.second == SIScheduleBlockLinkKind::Data)
ReadyBlocks.push_back(Block.first);		LastPosHighLatencyParentScheduled[Succ.first->getID()] =
		NumBlockScheduled;
if (Parent->isHighLatencyBlock() &&
Block.second == SIScheduleBlockLinkKind::Data)
LastPosHighLatencyParentScheduled[Block.first->getID()] = NumBlockScheduled;
}		}
}		}

void SIScheduleBlockScheduler::blockScheduled(SIScheduleBlock *Block) {		if (LastPosHighLatencyParentScheduled[ID] >
decreaseLiveRegs(Block, Block->getInRegs());
addLiveRegs(Block->getOutRegs());
releaseBlockSuccs(Block);
for (std::map<unsigned, unsigned>::iterator RegI =
LiveOutRegsNumUsages[Block->getID()].begin(),
E = LiveOutRegsNumUsages[Block->getID()].end(); RegI != E; ++RegI) {
std::pair<unsigned, unsigned> RegP = *RegI;
// We produce this register, thus it must not be previously alive.
assert(LiveRegsConsumers.find(RegP.first) == LiveRegsConsumers.end() ||
LiveRegsConsumers[RegP.first] == 0);
LiveRegsConsumers[RegP.first] += RegP.second;
}
if (LastPosHighLatencyParentScheduled[Block->getID()] >
(unsigned)LastPosWaitedHighLatency)		(unsigned)LastPosWaitedHighLatency)
LastPosWaitedHighLatency =		LastPosWaitedHighLatency =
LastPosHighLatencyParentScheduled[Block->getID()];		LastPosHighLatencyParentScheduled[ID];
++NumBlockScheduled;		++NumBlockScheduled;
}		}

std::vector<int>
SIScheduleBlockScheduler::checkRegUsageImpact(std::set<unsigned> &InRegs,
std::set<unsigned> &OutRegs) {
std::vector<int> DiffSetPressure;
DiffSetPressure.assign(DAG->getTRI()->getNumRegPressureSets(), 0);

for (unsigned Reg : InRegs) {
// For now only track virtual registers.
if (!TargetRegisterInfo::isVirtualRegister(Reg))
continue;
if (LiveRegsConsumers[Reg] > 1)
continue;
PSetIterator PSetI = DAG->getMRI()->getPressureSets(Reg);
for (; PSetI.isValid(); ++PSetI) {
DiffSetPressure[*PSetI] -= PSetI.getWeight();
}
}

for (unsigned Reg : OutRegs) {
// For now only track virtual registers.
if (!TargetRegisterInfo::isVirtualRegister(Reg))
continue;
PSetIterator PSetI = DAG->getMRI()->getPressureSets(Reg);
for (; PSetI.isValid(); ++PSetI) {
DiffSetPressure[*PSetI] += PSetI.getWeight();
}
}

return DiffSetPressure;
}
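The removed checkRegUsageImpact estimated how scheduling a block would change pressure in each pressure set. An input register is freed only when the block is its last remaining consumer, and every output becomes live. Below is a standalone sketch of the same rule, ignoring lanes. A hypothetical RegInfo table stands in for MachineRegisterInfo::getPressureSets, and each register is assumed to belong to a single pressure set.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical pressure model: each virtual register adds Weight units
// to exactly one pressure set.
struct RegInfo {
  unsigned PSet;
  int Weight;
};

std::vector<int>
regUsageImpact(const std::vector<unsigned> &InRegs,
               const std::vector<unsigned> &OutRegs,
               const std::map<unsigned, RegInfo> &Info,
               const std::map<unsigned, unsigned> &ConsumersLeft,
               unsigned NumPSets) {
  std::vector<int> Diff(NumPSets, 0);
  for (unsigned Reg : InRegs) {
    auto It = ConsumersLeft.find(Reg);
    if (It != ConsumersLeft.end() && It->second > 1)
      continue; // other blocks still need this register
    const RegInfo &RI = Info.at(Reg);
    Diff[RI.PSet] -= RI.Weight;
  }
  for (unsigned Reg : OutRegs) {
    const RegInfo &RI = Info.at(Reg);
    Diff[RI.PSet] += RI.Weight;
  }
  return Diff;
}
```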

// SIScheduler //		// SIScheduler //

struct SIScheduleBlockResult		struct SIScheduleBlockResult
SIScheduler::scheduleVariant(SISchedulerBlockCreatorVariant BlockVariant,		SIScheduler::scheduleVariant(SISchedulerBlockCreatorVariant BlockVariant,
SISchedulerBlockSchedulerVariant ScheduleVariant) {		SISchedulerBlockSchedulerVariant ScheduleVariant) {
		vpykhtin: SmallPtrSet is designed for pointer types but is used here for an unsigned type, which gcc complains about; consider SmallDenseSet.
SIScheduleBlocks Blocks = BlockCreator.getBlocks(BlockVariant);		SIScheduleBlocks Blocks = BlockCreator.getBlocks(BlockVariant);
SIScheduleBlockScheduler Scheduler(DAG, ScheduleVariant, Blocks);		SIScheduleBlockScheduler Scheduler(DAG, ScheduleVariant, Blocks);
std::vector<SIScheduleBlock*> ScheduledBlocks;		std::vector<SIScheduleBlock*> ScheduledBlocks;
struct SIScheduleBlockResult Res;		struct SIScheduleBlockResult Res;

ScheduledBlocks = Scheduler.getBlocks();		ScheduledBlocks = Scheduler.getBlocks();

for (unsigned b = 0; b < ScheduledBlocks.size(); ++b) {		for (unsigned b = 0; b < ScheduledBlocks.size(); ++b) {
(98 unchanged lines not shown)	// the low latency instructions too.
}		}
ScheduledSUnits[MinPos] = SU->NodeNum;		ScheduledSUnits[MinPos] = SU->NodeNum;
ScheduledSUnitsInv[SU->NodeNum] = MinPos;		ScheduledSUnitsInv[SU->NodeNum] = MinPos;
}		}
}		}
}		}
}		}

void SIScheduleDAGMI::restoreSULinksLeft() {
for (unsigned i = 0, e = SUnits.size(); i != e; ++i) {
SUnits[i].isScheduled = false;
SUnits[i].WeakPredsLeft = SUnitsLinksBackup[i].WeakPredsLeft;
SUnits[i].NumPredsLeft = SUnitsLinksBackup[i].NumPredsLeft;
SUnits[i].WeakSuccsLeft = SUnitsLinksBackup[i].WeakSuccsLeft;
SUnits[i].NumSuccsLeft = SUnitsLinksBackup[i].NumSuccsLeft;
}
}

// Return the Vgpr and Sgpr usage corresponding to some virtual registers.
template<typename _Iterator> void
SIScheduleDAGMI::fillVgprSgprCost(_Iterator First, _Iterator End,
unsigned &VgprUsage, unsigned &SgprUsage) {
VgprUsage = 0;
SgprUsage = 0;
for (_Iterator RegI = First; RegI != End; ++RegI) {
unsigned Reg = *RegI;
// For now only track virtual registers
if (!TargetRegisterInfo::isVirtualRegister(Reg))
continue;
PSetIterator PSetI = MRI.getPressureSets(Reg);
for (; PSetI.isValid(); ++PSetI) {
if (*PSetI == VGPRSetID)
VgprUsage += PSetI.getWeight();
else if (*PSetI == SGPRSetID)
SgprUsage += PSetI.getWeight();
}
}
}

void SIScheduleDAGMI::schedule()		void SIScheduleDAGMI::schedule()
{		{
SmallVector<SUnit*, 8> TopRoots, BotRoots;		SmallVector<SUnit*, 8> TopRoots, BotRoots;
SIScheduleBlockResult Best, Temp;		SIScheduleBlockResult Best, Temp;
DEBUG(dbgs() << "Preparing Scheduling\n");		DEBUG(dbgs() << "Preparing Scheduling\n");

buildDAGWithRegPressure();		buildDAGWithRegPressure();
DEBUG(		DEBUG(
(93 unchanged lines not shown)	// { LatenciesAlonePlusConsecutive, BlockLatencyRegUsage },
// Tell the outside world about the result of the scheduling.		// Tell the outside world about the result of the scheduling.

assert(TopRPTracker.getPos() == RegionBegin && "bad initial Top tracker");		assert(TopRPTracker.getPos() == RegionBegin && "bad initial Top tracker");
TopRPTracker.setPos(CurrentTop);		TopRPTracker.setPos(CurrentTop);

for (std::vector<unsigned>::iterator I = ScheduledSUnits.begin(),		for (std::vector<unsigned>::iterator I = ScheduledSUnits.begin(),
E = ScheduledSUnits.end(); I != E; ++I) {		E = ScheduledSUnits.end(); I != E; ++I) {
SUnit *SU = &SUnits[*I];		SUnit *SU = &SUnits[*I];
		SU->NumPredsLeft = 0; // To please scheduleMI

scheduleMI(SU, true);		scheduleMI(SU, true);
		SU->isScheduled = true;

DEBUG(dbgs() << "Scheduling SU(" << SU->NodeNum << ") "		DEBUG(dbgs() << "Scheduling SU(" << SU->NodeNum << ") "
<< *SU->getInstr());		<< *SU->getInstr());
}		}

assert(CurrentTop == CurrentBottom && "Nonempty unscheduled zone.");		assert(CurrentTop == CurrentBottom && "Nonempty unscheduled zone.");

placeDebugValues();		placeDebugValues();

DEBUG({		DEBUG({
unsigned BBNum = begin()->getParent()->getNumber();		unsigned BBNum = begin()->getParent()->getNumber();
dbgs() << "* Final schedule for BB#" << BBNum << " *\n";		dbgs() << "* Final schedule for BB#" << BBNum << " *\n";
dumpSchedule();		dumpSchedule();
dbgs() << '\n';		dbgs() << '\n';
});		});
}		}