This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Add patterns for v_dot*_IU for GFX11
AbandonedPublic

Authored by jrbyrnes on Jul 20 2023, 10:42 AM.

Details

Summary

Due to naming, it appears we don't actually try to select these instructions. In general, selection for dot instructions (specifically the dot4 variants) is a bit fragile, and I plan to rework lowering via combining to make selection more robust. For now, we should at least try to select them.

Event Timeline

jrbyrnes created this revision. Jul 20 2023, 10:42 AM
Herald added a project: Restricted Project. Jul 20 2023, 10:42 AM
jrbyrnes requested review of this revision. Jul 20 2023, 10:42 AM

We used to pattern match all the dot operations, but stopped because of a ridiculous blow up in compile time. Have you tried measuring that?

Also look at the generated selection tables. This shouldn't be one of the first patterns tried.

I'll take a look.

I have not directly measured the compile time of this patch, but I will. I was thinking of deleting these patterns, as they are supported by DAGCombining -- which shouldn't be too expensive with early exits on missed V_MUL_*24 operands. It is also worth mentioning that there is a huge compile-time cost for not selecting these instructions when we should -- we are seeing kernels getting stuck in RA for hours due to code bloat.

Will abandon if https://reviews.llvm.org/D155995 supersedes selection of these instructions.

This is better than the combiner - if it doesn't completely blow up compile time. It's probably easier to avoid the compile time problems with the combine, though.

The main problem is the number of permutations that can occur in the extract code. I haven't tried implementing this in TableGen and measuring, but I would assume that trying to pattern match them all with TableGen would blow up compile time.
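
To make the permutation concern concrete, here is a rough Python sketch, not taken from the patch. It models a 4-lane byte dot product with an accumulator, as a scalar reference for what a dot4-style instruction computes, and gives a back-of-the-envelope count of equivalent expression orderings. The names `extract_byte`, `dot4` and `rough_form_count` are hypothetical, and the count rests on simplifying assumptions: a single left-leaning add chain, and no variation in how each byte is extracted. It also does not model 32-bit wrapping or clamping.

```python
from math import factorial


def extract_byte(word, index, signed):
    """Return byte `index` (0 = least significant) of a 32-bit word."""
    byte = (word >> (8 * index)) & 0xFF
    if signed and byte >= 0x80:
        byte -= 0x100  # sign-extend the 8-bit value
    return byte


def dot4(a, b, acc, a_signed=True, b_signed=True):
    """Scalar model: acc plus the sum of four per-byte products.

    Signedness is chosen per operand, loosely mirroring the mixed
    signed/unsigned (IU) variants. Overflow handling is not modelled.
    """
    total = acc
    for i in range(4):
        total += extract_byte(a, i, a_signed) * extract_byte(b, i, b_signed)
    return total


def rough_form_count(lanes=4):
    """Illustrative count of equivalent scalar orderings (an assumption).

    Counts orderings of the addends (the products plus the accumulator)
    times the two operand orders of each commutative multiply.
    """
    addend_orders = factorial(lanes + 1)
    mul_commutations = 2 ** lanes
    return addend_orders * mul_commutations
```

Under these assumptions `rough_form_count()` gives 1920 orderings for a single tree shape, before accounting for the different ways each byte extract can be written, which is the kind of growth the comment above is worried about.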