This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Accept arbitrary sized sources in CalculateByteProvider
ClosedPublic

Authored by jrbyrnes on Aug 28 2023, 4:38 PM.

Details

Summary

This allows working with e.g. v8i8 / v16i8 sources.

It is generally useful, but is primarily beneficial when allowing e.g. v8i8s to be passed to branches directly through registers. As such, this is the first in a series of patches to enable that work. However, it effects https://reviews.llvm.org/D155995, so it has been implemented on top of that.

Diff Detail

Unit TestsFailed

Event Timeline

jrbyrnes created this revision.Aug 28 2023, 4:38 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 28 2023, 4:38 PM
jrbyrnes requested review of this revision.Aug 28 2023, 4:38 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 28 2023, 4:38 PM
arsenm accepted this revision.Aug 29 2023, 5:00 AM
This revision is now accepted and ready to land.Aug 29 2023, 5:00 AM
jrbyrnes updated this revision to Diff 554812.Aug 30 2023, 12:56 PM

Account for max scalar size of i256. Factor out common code.

Minor regressions in load-hi16.ll are caused by problematic heuristic hasNon16BitAccesses which will be addressed separately.

arsenm accepted this revision.Aug 30 2023, 4:20 PM
This revision was landed with ongoing or failed builds.Oct 23 2023, 4:08 PM
This revision was automatically updated to reflect the committed changes.
foad added a comment.Oct 24 2023, 11:42 AM

This seems to be causing some Vulkan CTS failures. I'm working on reducing a test case.

This seems to be causing some Vulkan CTS failures. I'm working on reducing a test case.

Are they fixed by https://github.com/llvm/llvm-project/pull/70153?

foad added a comment.Oct 25 2023, 12:47 AM

This seems to be causing some Vulkan CTS failures. I'm working on reducing a test case.

Are they fixed by https://github.com/llvm/llvm-project/pull/70153?

No that does not change the test results.

foad added a comment.Oct 25 2023, 1:23 AM

Here is a test case:


I compiled with llc -march=amdgcn -mcpu=gfx1030 gs.ll -o /dev/null -debug and saw the following, heavily edited:

Initial selection DAG: %bb.5 '_amdgpu_gs_main:.exportVertex'
SelectionDAG has 261 nodes:
...
    t148: v2i32 = vselect # D:1 t64, t78, t147
  t149: v2i16 = truncate # D:1 t148
...
            t250: i16 = extract_vector_elt # D:1 t149, Constant:i32<1>
          t251: i32 = zero_extend # D:1 t250
        t252: i32 = shl nuw # D:1 t251, Constant:i32<16>
          t248: i16 = extract_vector_elt # D:1 t149, Constant:i32<0>
        t249: i32 = zero_extend # D:1 t248
      t253: i32 = or # D:1 t252, t249
...
Combining: t253: i32 = or # D:1 t252, t249
Creating new node: t262: i64 = bitcast # D:1 t148
Creating new node: t263: i32 = truncate # D:1 t262
 ... into: t263: i32 = truncate # D:1 t262

Note that t149 truncates each element of t148 from 32 to 16 bits.
t253 was extracting the two parts of t149 and combining them into single i32, i.e. it was equivalent to i32 bitcast t149.
You've replaced it with i32 truncate (i64 bitcast t148) which is equivalent to extracting element 0 of t148.
These are clearly not the same thing. I'd like to revert the patch unless you have a quick fix.