This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU][CODEGEN] Added 'A' constraint for inline assembler
ClosedPublic

Authored by dp on Apr 20 2020, 7:41 AM.

Details

Summary

The 'A' constraint requires an integer or a floating-point inline constant.

Diff Detail

Event Timeline

dp created this revision.Apr 20 2020, 7:41 AM
arsenm added inline comments.Apr 20 2020, 8:14 AM
llvm/docs/LangRef.rst
4131

Shouldn't this be any inline immediate, which includes the FP values (plus the subtarget dependent one)

llvm/test/CodeGen/AMDGPU/inline-constraints.ll
100

FP value tests including the subtarget dependent inv 2pi one?

I also assume this needs a clang part?

dp marked an inline comment as done.Apr 20 2020, 9:39 AM
dp added inline comments.
llvm/docs/LangRef.rst
4131

Probably should. But GCC actually supports about 30 constraints which we do not.

I'm not sure how easy it will be to add fp constants because llvm's C_Immediate is for integers only. I'll take a look.

arsenm added inline comments.Apr 20 2020, 9:45 AM
llvm/docs/LangRef.rst
4131

The type doesn't matter. The bit values should still work as integers. It doesn't need to be FP

dp marked an inline comment as done.Apr 20 2020, 10:10 AM
dp added inline comments.
llvm/docs/LangRef.rst
4131

Ok, thanks!

dp updated this revision to Diff 260886.Apr 29 2020, 4:56 AM
dp edited the summary of this revision. (Show Details)

Added support of fp inline constants. Clang part and documentation will be updated by a separate change.

Herald added a project: Restricted Project. · View Herald TranscriptApr 29 2020, 4:56 AM
dp updated this revision to Diff 260889.Apr 29 2020, 5:07 AM

Corrected documentation.

arsenm added inline comments.May 1 2020, 11:40 AM
llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp
1347

No else after return

llvm/test/CodeGen/AMDGPU/inline-constraints.ll
299

can you also add some tests for packed <2 x i16>/<2 x half> operands

dp updated this revision to Diff 262104.May 5 2020, 7:24 AM

Corrected code style; added support of v2x16 types.

dp marked an inline comment as done.May 5 2020, 7:26 AM
dp added inline comments.
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
10914

Should we enforce this limitation? GCC seems to allow arbitrary vector size and type provided that all elements are equal and may be inlined.

dp added a comment.May 20 2020, 3:35 AM

What issues do you see in this change? Thanks.

arsenm added inline comments.May 20 2020, 5:52 AM
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
10914

You mean it also accepts <4 x i8> because it's a 32-bit type?

llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
1464–1466

Logic doesn't make sense here? Returns the 64-bit value if it's 16 or 32?

1469–1478

I wouldn't expect any FP rounding in the assembler

dp marked 6 inline comments as done.May 20 2020, 9:53 AM
dp added inline comments.
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
10914

I was wrong about GCC 'vectors'. Looks like they are not regular types and used only for auto-vectorization.

Here is the GCC implementation:

(define_constraint "A"
  "Inline immediate parameter"
  (and (match_code "const_int,const_double,const_vector")
       (match_test "gcn_inline_constant_p (op)")))

...

bool
gcn_inline_constant_p (rtx x)
{
  if (GET_CODE (x) == CONST_INT)
    return INTVAL (x) >= -16 && INTVAL (x) <= 64;
  if (GET_CODE (x) == CONST_DOUBLE)
    return gcn_inline_fp_constant_p (x, false);
  if (GET_CODE (x) == CONST_VECTOR)
    {
      int n;
      if (!vgpr_vector_mode_p (GET_MODE (x)))
	return false;
      n = gcn_inline_constant_p (CONST_VECTOR_ELT (x, 0));
      if (!n)
	return false;
      for (int i = 1; i < 64; i++)
	if (CONST_VECTOR_ELT (x, i) != CONST_VECTOR_ELT (x, 0))
	  return false;
      return 1;
    }
  return false;
}

...

inline bool
vgpr_vector_mode_p (machine_mode mode)
{
  return (mode == V64QImode || mode == V64HImode
	  || mode == V64SImode || mode == V64DImode
	  || mode == V64HFmode || mode == V64SFmode || mode == V64DFmode);
}
llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
1464–1466

When working on this fix I was under an impression that types of values passed to inline assembler are not required to match instruction operand types. In other words, I assumed that the following code is legal:

"v_mov_b32 $0, $1", "=v,A"(half 0xH3C00) // 1.0h

Also I assumed that this code required a type conversion to end up as "v_mov_b32 ..., 1.0" and not "v_mov_b32 ..., 0x3C00" because in the latter case the constant cannot be encoded as inline value.

That assumption complicated implementation. Was the assumption incorrect?

If types of values passed to inline assembler are required to match types of corresponding assembler operands, the implementation would be much simpler.

1469–1478

See my comments above.

arsenm added inline comments.May 20 2020, 2:08 PM
llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
1464–1466

We have to use 32-bit instruction to materialize 16-bit values. This should be zero extended. This should end up as v_mov_b32 $0, 0x00003c00. For some cases we don't currently use in codegen, (and I'm not sure are supported by the assembler), some 64-bit operands accept 32-bit literals which are zero extended

dp updated this revision to Diff 265480.May 21 2020, 5:33 AM
dp marked 3 inline comments as done.

Updated after a discussion with Matt.

arsenm accepted this revision.May 21 2020, 7:51 AM

I guess the fundamental problem for the 16-bit immediate case is you can't know the use instruction's context. You're really accepting something that's not an inline immediate in the use context, but you have no way of knowing that.

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
10920

This should be getZExtValue? It looks like the tests worked correctly here, but not sure why it didn't matter

This revision is now accepted and ready to land.May 21 2020, 7:51 AM

I guess the fundamental problem for the 16-bit immediate case is you can't know the use instruction's context. You're really accepting something that's not an inline immediate in the use context, but you have no way of knowing that.

Actually, could we emit an error in the MC layer if it turns out to not be used in a 16-bit context? That would make more sense, but we don't have a way of tracking the immediate size

dp marked 2 inline comments as done.May 22 2020, 3:34 AM

I guess the fundamental problem for the 16-bit immediate case is you can't know the use instruction's context. You're really accepting something that's not an inline immediate in the use context, but you have no way of knowing that.

I believe this is a generic problem which affects all supported types because of unknown context.
In the previous version of this change I tried to alleviate this problem by emitting all fp constants as double so they would work in any context.
As you have pointed out, this is unsuitable in some cases where the constant type is importand but for these cases we could have used a 'B' constraint:

"v_mov_b32 $0, $1", "=v,B"(half 0xH3C00)

However I think the corrent version of this change is simpler and cleaner and it has its logic.
We expect constant type to match assembler operand type, but in the end it is up to programmer to decide how to use the constant in assembly code.

Actually, could we emit an error in the MC layer if it turns out to not be used in a 16-bit context? That would make more sense, but we don't have a way of tracking the immediate size

Agree, I do not see how this can be done.

llvm/lib/Target/AMDGPU/SIISelLowering.cpp
10920

This does not really matter because isInlinableLiteralXX truncates input value to XX bits before checking.
However getting sign-extended values is useful to easily recognize negative integer constants and get prettier output.

Consider this code (actually a corner case):

... asm "v_mov_b32 $0, $1", "=v,A"(half bitcast (i16 -1 to half))

Currently the code ends up as

v_mov_b32 v0, -1

If we replace getSExtValue with getZExtValue, the result would be

v_mov_b32 v0, 0xFFFF
This revision was automatically updated to reflect the committed changes.
dp marked an inline comment as done.