This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Constrain src0 RC of 64 bit shift amount on gfx90a
Abandoned · Public

Authored by rampitec on Sep 1 2022, 4:52 PM.

Details

Summary

This is another w/a for the bug where a shift amount cannot be
the highest allocated register. Unlike D133067, this w/a is not
guaranteed, as later passes may replace the operand or the whole
instruction, but it works most of the time. It therefore does not
make D133067 unneeded, but it makes it less likely to trigger,
hopefully resulting in better code.
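To make the condition concrete, here is a toy Python model. It is hypothetical and not LLVM code: the register numbering (v0 = 0, v1 = 1, ...) and the function name are illustrative assumptions. It only shows the hazard condition and why an early constraint is not a guarantee once a later pass can rewrite the operand.

```python
from typing import Iterable


def has_shift_amount_hazard(shift_amount_reg: int, used_regs: Iterable[int]) -> bool:
    """Toy check (hypothetical): True when the 64-bit shift's amount
    operand (src0) sits in the highest allocated register, the case the
    workarounds try to avoid."""
    regs = set(used_regs)
    regs.add(shift_amount_reg)  # the operand itself is always allocated
    return shift_amount_reg == max(regs)


if __name__ == "__main__":
    used = {0, 1, 2, 3}
    # Without any constraint the allocator happened to put src0 in v3.
    print(has_shift_amount_hazard(3, used))  # True
    # An early register-class constraint steered src0 into v1: no hazard.
    print(has_shift_amount_hazard(1, used))  # False
    # A later pass rewrites the operand back into v3, so the early
    # constraint alone cannot rule the hazard out.
    print(has_shift_amount_hazard(3, used))  # True
```

The last call mirrors the point above: a constraint applied early can be undone when a later pass replaces the operand or the whole instruction, so the guaranteed workaround in D133067 is still needed as a fallback.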


Event Timeline

rampitec created this revision. Sep 1 2022, 4:52 PM
Herald added a project: Restricted Project. Sep 1 2022, 4:52 PM
rampitec requested review of this revision. Sep 1 2022, 4:52 PM
Herald added a project: Restricted Project. Sep 1 2022, 4:52 PM
Herald added a subscriber: wdng.

Now that the w/a in the hazard recognizer is submitted, do we need this as well? The w/a in the recognizer produces quite heavy code, and this can reduce the chances of it triggering to a minimum, although it may also increase register pressure.

One important thing to note: defining this single RC bloats AMDGPUGenRegisterInfo.inc from 7.6 MB to 93 MB and increases the time to generate this file from 5s to 35s.

So maybe we do not need it after all and can live with just the hazard recognizer w/a.

rampitec abandoned this revision. Sep 8 2022, 4:29 PM

Given the compile-time impact, the fact that we have a less-than-ideal but working w/a, and that only one subtarget is affected, I think we do not need this. Abandoning.