This is an archive of the discontinued LLVM Phabricator instance.

[NVPTX] Coalesce register classes for {i16,f16,bf16}, {i32,v2f16,v2bf16}
ClosedPublic

Authored by tra on May 26 2023, 5:13 PM.

Details

Summary

They all use the same 16-bit and 32-bit PTX registers, so there is no point in creating a separate register class for each type.

The changes are largely mechanical, replacing the *f16 register classes with i16/i32, with the exception of a minor optimization to register copying: we now produce fewer pointless register moves.
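As a rough illustration of the idea in the summary (not code from the patch), the coalesced mapping from value types to register classes can be sketched as below. The `ValueType` enum and the helper functions are hypothetical, and the class names `Int16Regs` and `Int32Regs` are assumed here for illustration. In this toy model, two types that land in the same class can share a register, so converting between them needs no cross-class move.

```cpp
#include <cassert>

// Hypothetical stand-ins for the value types named in the revision title.
enum class ValueType { i16, f16, bf16, i32, v2f16, v2bf16 };

// Register classes after coalescing; the names are assumptions for this sketch.
enum class RegClass { Int16Regs, Int32Regs };

// Map each value type to the register class that holds it.
inline RegClass regClassFor(ValueType vt) {
  switch (vt) {
  case ValueType::i16:
  case ValueType::f16:
  case ValueType::bf16:
    return RegClass::Int16Regs; // all 16-bit scalars share one class
  case ValueType::i32:
  case ValueType::v2f16:
  case ValueType::v2bf16:
    return RegClass::Int32Regs; // i32 and packed 2x16-bit vectors share one class
  }
  return RegClass::Int32Regs; // not reached for valid enumerators
}

// A move between register classes is only needed when the classes differ.
inline bool needsCrossClassMove(ValueType from, ValueType to) {
  return regClassFor(from) != regClassFor(to);
}

// Small self-check of the mapping described above.
inline void checkCoalescing() {
  assert(!needsCrossClassMove(ValueType::f16, ValueType::i16));
  assert(!needsCrossClassMove(ValueType::v2bf16, ValueType::i32));
  assert(needsCrossClassMove(ValueType::f16, ValueType::v2f16));
}
```

Calling `checkCoalescing()` exercises the three cases. With one class per type, as before the change, the first two conversions would have crossed a class boundary in this model.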

Event Timeline

tra created this revision. May 26 2023, 5:13 PM
Herald added a project: Restricted Project. May 26 2023, 5:13 PM
tra updated this revision to Diff 526756. May 30 2023, 1:05 PM

Switched v2f16 and v2bf16 to use the same registers as i32.

tra updated this revision to Diff 527245. May 31 2023, 5:23 PM

Removed the now-redundant LD/ST instructions for f16/bf16/v2f16/v2bf16.

tra updated this revision to Diff 527609. Jun 1 2023, 2:02 PM

Removed more unnecessary stuff and reduced unnecessary register copying.

tra published this revision for review. Jun 1 2023, 2:16 PM
tra retitled this revision from "Coalesce i16/f16/bf16 to use the same register class." to "Coalesce register classes for {i16,f16,bf16}, {i32,v2f16,v2bf16}".
tra edited the summary of this revision.
tra added reviewers: jlebar, kushanam.
Herald added projects: Restricted Project, Restricted Project. Jun 1 2023, 2:16 PM
tra retitled this revision from "Coalesce register classes for {i16,f16,bf16}, {i32,v2f16,v2bf16}" to "[NVPTX] Coalesce register classes for {i16,f16,bf16}, {i32,v2f16,v2bf16}". Jun 1 2023, 2:17 PM
tra added a comment. Jun 2 2023, 11:42 AM

I've tested the change on a bunch of TensorFlow tests and the patch didn't cause any apparent issues.

jlebar accepted this revision. Jun 2 2023, 11:51 AM

I cannot say that I 100% looked over every line, but in principle this seems fine, and if it's passing TF tests then that's pretty strong evidence this is working.

This revision is now accepted and ready to land. Jun 2 2023, 11:51 AM
tra updated this revision to Diff 527929. Jun 2 2023, 12:18 PM
tra edited the summary of this revision.

Removed unused INT_PTX_{LDU,LDG} variants.

tra updated this revision to Diff 527936. Jun 2 2023, 12:34 PM

Added missing tests for llvm.nvvm.{ldg,ldu}.global.p

This revision was landed with ongoing or failed builds. Jun 5 2023, 12:22 PM
This revision was automatically updated to reflect the committed changes.