This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Track occupancy in MFI
ClosedPublic

Authored by rampitec on May 29 2018, 6:53 PM.

Details

Summary

Keep track of achieved occupancy in SIMachineFunctionInfo.
At the moment we have a lot of duplicated or even missed code to
query and maintain occupancy info. Record it in the MFI and
query in a single call. Interfaces:

  • getOccupancy() - returns current recorded achieved occupancy.
  • getMinAllowedOccupancy() - returns lesser of the achieved occupancy

and the lowest occupancy we are ready to tolerate. For example if
a kernel is memory bound we are ready to tolerate 4 waves.

  • limitOccupancy() - record occupancy level if we have to lower it.
  • increaseOccupancy() - record occupancy if scheduler managed to

increase the occupancy.

MFI takes care of integrating different checks affecting occupancy,
including LDS use and waves-per-eu attribute. Note that scheduler
starts with not yet known register pressure, so has to record either
limit or increase in occupancy after it is done. Later passes can
just query a resulting value.

New interface is used in the active scheduler and NFC wrt its work.
Changes are also made to experimental schedulers to use it and record
an occupancy after they are done. Before the change waves-per-eu was
ignored by experimental schedulers and tolerance window for memory
bound kernels was not used.

Diff Detail

Event Timeline

rampitec created this revision.May 29 2018, 6:53 PM

The problem with this is until we can serialize MachineFunctionInfo, this is going fo further degrade MIR tests.

lib/Target/AMDGPU/SIISelLowering.cpp
4259–4263

Why is this here?

The problem with this is until we can serialize MachineFunctionInfo, this is going fo further degrade MIR tests.

That did not degrade mir tests so far because all what can be recorded from an original function is lost anyway. Default is to have full 10 waves, unless further limited by an attribute (which is lost for mir), static lds usage (which is lost as well) or scheduler register usage (which was never here until this change).

lib/Target/AMDGPU/SIISelLowering.cpp
4259–4263

allocateLDSGlobal() belongs to AMDGPUMachineFunction, not SIMachineFunctionInfo. The choice is to either make it virtual or handle in the single place it is really used.

rampitec added inline comments.May 30 2018, 1:35 AM
lib/Target/AMDGPU/SIISelLowering.cpp
4259–4263

In fact the other choice is to create SIMachineFunctionInfo::allocateLDSGlobal() which will call AMDGPUMachineFunction::allocateLDSGlobal() and then limitOccupancy(), then call it from here. Same thing basically, but more obscure to my taste. The only sound solution is to virtualize MFI, but I am not really sure it is worth it.

arsenm added inline comments.May 30 2018, 3:07 AM
lib/Target/AMDGPU/SIISelLowering.cpp
4259–4263

I don't see a reason to update this during lowering. We don't do anything with occupancy information in the DAG? Why can't you just adjust this once after the function is selected and the LDS size is known?

rampitec updated this revision to Diff 149170.May 30 2018, 11:28 AM
rampitec marked 4 inline comments as done.

Moved LDS processing into finalizeLowering().

lib/Target/AMDGPU/SIISelLowering.cpp
4259–4263

Thanks! Moved to finalizeLowering().

vpykhtin accepted this revision.May 30 2018, 9:40 PM
vpykhtin added inline comments.
lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
185

may be one call to limitOccupancy with min(getMaxWavesPerEU and getOcc..LMS)?

This revision is now accepted and ready to land.May 30 2018, 9:40 PM
rampitec added inline comments.May 30 2018, 10:10 PM
lib/Target/AMDGPU/SIMachineFunctionInfo.cpp
185

It is really the same, it will even produce the same code after compilation. The whole intent of "limit" semantics is that you can take into account just a single factor and omit all others. I would prefer to keep this snippet to emphasize that ;)

This revision was automatically updated to reflect the committed changes.