This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Assembler: better support for immediate literals in assembler.
ClosedPublic

Authored by SamWot on Jul 28 2016, 9:15 AM.

Details

Summary

Prevously assembler parsed all literals as either 32-bit integers or 32-bit floating-point values. Because of this we couldn't support f64 literals.
E.g. in instruction "v_fract_f64 v[0:1], 0.5", literal 0.5 was encoded as 32-bit literal 0x3f000000, which is incorrect and will be interpreted as 3.0517578125E-5 instead of 0.5. Correct encoding is inline constant 240 (optimal) or 32-bit literal 0x3FE00000 at least.

With this change the way immediate literals are parsed is changed. All literals are always parsed as 64-bit values either integer or floating-point. Then we convert parsed literals to correct form based on information about type of operand parsed (was literal floating or binary) and type of expected instruction operands (is this f32/64 or b32/64 instruction).
Here are rules how we convert literals:

  • We parsed fp literal:
    • Instruction expects 64-bit operand:
      • If parsed literal is inlinable (e.g. v_fract_f64_e32 v[0:1], 0.5)
        • then we do nothing this literal
      • Else if literal is not-inlinable but instruction requires to inline it (e.g. this is e64 encoding, v_fract_f64_e64 v[0:1], 1.5)
        • report error
      • Else literal is not-inlinable but we can encode it as additional 32-bit literal constant
        • If instruction expect fp operand type (f64)
          • Check if low 32 bits of literal are zeroes (e.g. v_fract_f64 v[0:1], 1.5)
            • If so then do nothing
          • Else (e.g. v_fract_f64 v[0:1], 3.1415)
            • report warning that low 32 bits will be set to zeroes and precision will be lost
            • set low 32 bits of literal to zeroes
        • Instruction expects integer operand type (e.g. s_mov_b64_e32 s[0:1], 1.5)
          • report error as it is unclear how to encode this literal
    • Instruction expects 32-bit operand:
      • Convert parsed 64 bit fp literal to 32 bit fp. Allow lose of precision but not overflow or underflow
      • Is this literal inlinable and are we required to inline literal (e.g. v_trunc_f32_e64 v0, 0.5)
        • do nothing
        • Else report error
      • Do nothing. We can encode any other 32-bit fp literal (e.g. v_trunc_f32 v0, 10000000.0)
  • Parsed binary literal:
    • Is this literal inlinable (e.g. v_trunc_f32_e32 v0, 35)
      • do nothing
    • Else, are we required to inline this literal (e.g. v_trunc_f32_e64 v0, 35)
      • report error
    • Else, literal is not-inlinable and we are not required to inline it
      • Are high 32 bit of literal zeroes or same as sign bit (32 bit)
        • do nothing (e.g. v_trunc_f32 v0, 0xdeadbeef)
      • Else
        • report error (e.g. v_trunc_f32 v0, 0x123456789abcdef0)

For this change it is required that we know operand types of instruction (are they f32/64 or b32/64). I added several new register operands (they extend previous register operands) and set operand types to corresponding types:
'''
enum OperandType {

OPERAND_REG_IMM32_INT,
OPERAND_REG_IMM32_FP,
OPERAND_REG_INLINE_C_INT,
OPERAND_REG_INLINE_C_FP,

}
'''

This is not working yet:

  • Several tests are failing
  • Problems with predicate methods for inline immediates
  • LLVM generated assembler parts try to select e64 encoding before e32.

More changes are required for several AsmOperands.

Diff Detail

Repository
rL LLVM

Event Timeline

SamWot updated this revision to Diff 65946.Jul 28 2016, 9:15 AM
SamWot retitled this revision from to [AMDGPU] Assembler: better support for immediate literals in assembler..
SamWot updated this object.
SamWot added a reviewer: vpykhtin.
SamWot added a subscriber: artem.tamazov.

Thanks for detailed explanation!

We parsed fp literal:

Instruction expects 64-bit operand:
  ...
  Else literal is not-inlinable but we can encode it as additional 32-bit literal constant
    If instruction expect fp operand type (f64)
      ...
      Else (e.g. v_fract_f64 v[0:1], 3.1415)
        report warning that low 32 bits will be set to zeroes and precision will be lost
        set low 32 bits of literal to zeroes

The last two sentences are questionable. I would prefer an error here. I believe we have to try to be exact wherever possible; do you agree?

We parsed fp literal:

Instruction expects 64-bit operand:
    ...
    Else literal is not-inlinable but we can encode it as additional 32-bit literal constant
        ...
        Instruction expects integer operand type (e.g. s_mov_b64_e32 s[0:1], 1.5)
            report error as it is unclear how to encode this literal

Actually it is clear provided that we know the signedness of an operand. 32-bit literals are zero-extended to 64 bits if operand type is B64/U64, or sign-extended for I64. So:

  • If operand type is I64
    • If 33 upper bits are all ones or all zeroes, then can be encoded
    • Else error
  • Else (we have B/U64)
    • If 32 upper bits are all zeroes, then can be encoded
    • Else error

We parsed fp literal:

...
Instruction expects 32-bit operand:
    Convert parsed 64 bit fp literal to 32 bit fp. Allow lose of precision but not overflow or underflow

Good place for the "loss of precision" warning here.

Parsed binary literal:

....

I am guessing that the whole "binary literal" branch needs to be reconsidered... More details later.

For this change it is required that we know operand types of instruction (are they f32/64 or b32/64)...

Hmm, I believe we need to know also if non-FP instruction operand is signed (I32/64), unsigned (U32/64) or "untyped bits" (B32/64).

Let's proceed with actual code review after we agree on basic algorithm.

Does using a 32-bit literal for a 64-bit operand actually work? I vaguely remember asking about this when working on operand folding and it wasn't clear if that really worked

artem.tamazov added a comment.EditedAug 16 2016, 4:43 AM

Does using a 32-bit literal for a 64-bit operand actually work? I vaguely remember asking about this when working on operand folding and it wasn't clear if that really worked

Yes, it works. _F64 operands are zero-padded (low 32 bits set to zero, high 32 bits taken from instruction stream), _B/U64 - zero-extended, _I64 - sign-extended.

SamWot updated this revision to Diff 69372.Aug 26 2016, 8:05 AM
SamWot edited edge metadata.

Changed way how it all works. Moved all checks to predicate methods and all mutators to render methods.

There are 5 failing tests for now. With this changes for several instruction (integer VOP instructions) assembler tries to generate 64-bit encoding in first place instead of 32-bit encoding where it is possible. This is correct behaviour (instructions are identical) but unwantable: we want assembler to generate 32-bit encoding where possible. This is caused by order in which instructions are sorted in MatchTable inside LLVM-generated assembler. We can't control how instructions are sorted and we shouldn't rely on this. It was just luck that previously they were sorted as we wanted and 32-bit instruction were generated in first place.
We should explicitly convert from 64-bit encoding to 32-bit encoding where possible. This might be done in converter method for VOP3 instructions. I will try to implement this.

...There are 5 failing tests for now... We should explicitly convert from 64-bit encoding to 32-bit encoding where possible.
This might be done in converter method for VOP3 instructions. I will try to implement this.

Yes, when encoding width is not specified explicitly (none of _e32 or _e64 suffix), assembler must use the narrowest width possible. Let's make this feature explicit: I recommend creating separate tests for it and fixing the 5 failing tests (by adding _e32 suffix). This way, this change would not lead to regressions. The new tests will fail until new VOP3 converter is implemented.

SamWot updated this revision to Diff 69698.Aug 30 2016, 8:36 AM

Fixed error with KIMM operand.
Moved all failing tests cases to new file vop3-conversion.s and marked them as XFAIL. I will fix them later.

SamWot updated this revision to Diff 69824.Aug 31 2016, 3:54 AM

Rebased on top of latest SOP changes.

SamWot added a reviewer: Restricted Project.Aug 31 2016, 3:56 AM

What about f16 inline immediates? I've been wondering how to handle those since currently the size of the register class is used in various places, but for f16 it will still be the same VReg_32

artem.tamazov accepted this revision.Sep 1 2016, 4:28 AM
artem.tamazov added a reviewer: artem.tamazov.

Actually it is clear provided that we know the signedness of an operand. 32-bit literals are zero-extended to 64 bits if operand type is B64/U64, or sign-extended for I64. So...

I am guessing that the whole "binary literal" branch needs to be reconsidered...

...we need to know also if non-FP instruction operand is signed (I32/64), unsigned (U32/64) or "untyped bits" (B32/64).

Finding out signness of operands seems to be complicated task. Let's address it later, by dedicated patch.

What about f16 inline immediates?...

I recommend sorting out f16 immediates separately.

This revision is now accepted and ready to land.Sep 1 2016, 4:28 AM
SamWot added a comment.Sep 1 2016, 4:47 AM

What about f16 inline immediates? I've been wondering how to handle those since currently the size of the register class is used in various places, but for f16 it will still be the same VReg_32

In this change f16 are processed similarly as f32. It is not correct but implementing full support for f16 would involve massive changes. We decided that it is not necessary for now but we plan to do this later.
As you said main problem here is that f16 uses 32-bit registers so we can't use register size to understad type of operand.

vpykhtin accepted this revision.Sep 9 2016, 6:17 AM
vpykhtin edited edge metadata.

LGTM.

lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
692 ↗(On Diff #69824)

spare part?

SamWot updated this revision to Diff 70836.Sep 9 2016, 7:45 AM
SamWot edited edge metadata.

Removed XFAIL from conversion tests

This revision was automatically updated to reflect the committed changes.