This is an archive of the discontinued LLVM Phabricator instance.

[NVPTX] Added support for half-precision floating point.
ClosedPublic

Authored by tra on Jan 10 2017, 4:50 PM.

Details

Summary

Only scalar half-precision operations are supported at the moment.

  • Adds general support for 'half' type in NVPTX
  • fp16 math operations are supported on sm_53+ GPUs only (can be disabled with --nvptx-no-f16-math)
  • type conversions to/from fp16 are supported on all GPU variants.
  • On GPU variants that do not have full fp16 support (or if it's disabled), fp16 operations are promoted to fp32 and results are converted back to fp16 for storage.
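The promote-to-fp32-and-round-back behavior described above can be sketched with Python's stdlib half-precision packing (the `'e'` struct format). This is only an illustration of the arithmetic semantics, not the compiler code; the helper names are made up:

```python
import struct

def fp16_round(x):
    # Round a Python float to IEEE-754 half precision and back,
    # mimicking an fp32 -> fp16 -> fp32 conversion round trip.
    return struct.unpack('<e', struct.pack('<e', x))[0]

def fp16_add_promoted(a, b):
    # Promote both fp16 operands to fp32 (exact), add in fp32,
    # then round the result back to fp16 for storage.
    return fp16_round(fp16_round(a) + fp16_round(b))

# fp16 spacing above 2048 is 2, so 2048 + 1 rounds back to 2048.
print(fp16_add_promoted(2048.0, 1.0))  # → 2048.0
```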

ptxas is rather peculiar when it comes to fp16-related syntax, which had to be worked around:

  • there's no way to represent an immediate fp16 argument as a hex value. We load such constants into a .b16 register first.
  • there are no .f16 variants of mov/ld/st instructions.
  • ptxas supports .f16 registers on sm_53+ only. It does accept .b16 registers for all supported fp16-related operations on all GPU variants, so that's the type the patch uses.
  • NVPTX ABI explicitly states that fp16 can't be used as a function argument or return value. It also states that arguments and return values must be at least 32 bits wide. The patch follows the doc and uses .b32 for fp16 arguments and return values. On the other hand, the current fp16 implementation in nvcc uses a struct to represent the fp16 type, which results in nvcc passing fp16 as an aggregate. I'm not sure whether we want/need to follow nvcc and pass fp16 as aggregates, too.
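For illustration, the 16-bit pattern that ends up in a .b16 register for a given fp16 constant (since ptxas accepts no fp16 hex immediates directly) can be computed with Python's stdlib; the helper name here is hypothetical:

```python
import struct

def fp16_bits(x):
    # IEEE-754 half-precision bit pattern of x, i.e. the raw 16-bit
    # value one would load into a .b16 register for this constant.
    return struct.unpack('<H', struct.pack('<e', x))[0]

print(hex(fp16_bits(1.0)))   # → 0x3c00 (sign 0, exponent 15, mantissa 0)
print(hex(fp16_bits(-2.0)))  # → 0xc000
print(hex(fp16_bits(0.5)))   # → 0x3800
```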

Event Timeline

tra updated this revision to Diff 83891.Jan 10 2017, 4:50 PM
tra retitled this revision from to [NVPTX] Added support for half-precision floating point..
tra updated this object.
tra added reviewers: jlebar, jholewinski.
tra added a subscriber: llvm-commits.
tra updated this object.Jan 10 2017, 5:30 PM
jlebar edited edge metadata.Jan 11 2017, 11:16 AM

Well this is heroic. Aside from my question about sin.approx, I just have nits, mostly about comments.

Do you think we should add some tests to the test-suite so we can cover this e2e?

lib/Target/NVPTX/NVPTXAsmPrinter.cpp
1389

Can we add a comment why this is not f16?

1617

Can we comment why this is not sz = 16?

lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp
547

s/constants/immediates/

759–761

Can we add a comment here?

1013

ibid

2213

ibid

lib/Target/NVPTX/NVPTXISelLowering.cpp
176

The comment above doesn't seem to apply to these two new lines.

183

ibid

322

Suggest being a tad more explicit:

Promote fp16 arithmetic if fp16 hardware isn't available or the user passed --flag-name. The flag is useful because, although sm_53+ GPUs ...

346

This is all legal even on sm_20 with ptxas from CUDA 7? (I tried looking this up in the docs and failed, sorry.)

354

Which library -- libm? Maybe "without calling an external library" or perhaps "without calling libm" (although maybe someone will come here and tell me it's not technically libm, it's libgcc or somesuch).

2148

Can we explain what we're doing differently here?

lib/Target/NVPTX/NVPTXInstrInfo.td
249

Update comment.

783

Please just make a separate paragraph, instead of indenting like this.

784

Run-on sentence. Suggest

instructions. Instead, we have to

lib/Target/NVPTX/NVPTXRegisterInfo.cpp
74

Nit, can we write this function either using if-no-else or if-else, instead of mixing the two? (I realize that this patch is not the original sin...)

lib/Target/NVPTX/NVPTXSubtarget.h
104

Do we need this function at all? It's kind of confusing because pre-sm_53 we had fp16 loads/stores, so you could say we "had" fp16. But also someone might accidentally call this instead of allowFP16Math().

test/CodeGen/NVPTX/f16-instructions.ll
765

How do we know it's correct to lower this as cvt.to.f16(sin.approx.f32(x))? That only works if we're guaranteed that the error of sin.approx.f32 is too small to be noticed in fp16. But that doesn't seem guaranteed. All the ISA says about precision is

The maximum absolute error is 2^-20.9 in quadrant 00.

This error is too small to be represented in an fp16, which would normally mean we're good. But because it qualifies with "in quadrant 00", that suggests that all bets are off if we're not in...whatever is quadrant 00. (I presume it's the first quadrant?)

Same for cosine.

jlebar added inline comments.Jan 11 2017, 11:17 AM
test/CodeGen/NVPTX/f16-instructions.ll
765

Actually, I take it back about 2^-20.9 being too small to fit in an fp16. I forgot about denormals. See https://en.wikipedia.org/wiki/Half-precision_floating-point_format#Precision_limitations_on_decimal_values_in_.5B0.2C_1.5D
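The numbers bear this out: fp16's spacing near 1.0 is 2^-10, far coarser than 2^-20.9, but the smallest positive denormal is 2^-24, so an error of 2^-20.9 is representable near zero. A quick check using Python's stdlib half-precision packing (a sketch, not part of the patch):

```python
import struct

def bits_to_fp16(b):
    # Interpret a 16-bit pattern as an IEEE-754 half-precision value.
    return struct.unpack('<e', struct.pack('<H', b))[0]

# Spacing between 1.0 (0x3C00) and the next fp16 value is 2^-10.
ulp_at_one = bits_to_fp16(0x3C01) - 1.0
assert ulp_at_one == 2.0 ** -10

# The smallest positive fp16 denormal is 2^-24, which is *smaller* than
# sin.approx.f32's documented max error of 2^-20.9 -- so that error is
# large enough to show up in fp16 results near zero.
smallest_denormal = bits_to_fp16(0x0001)
assert smallest_denormal == 2.0 ** -24
assert 2.0 ** -20.9 > smallest_denormal
```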

tra updated this revision to Diff 84019.Jan 11 2017, 1:47 PM
tra edited edge metadata.
tra marked 13 inline comments as done.

Addressed Justin's comments.

tra added inline comments.Jan 11 2017, 1:52 PM
lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp
1013

Removed this change as we're not handling f16 vectors here yet.

lib/Target/NVPTX/NVPTXISelLowering.cpp
176

Moved SETCC out. The comment still applies to {SELECT,BR}_CC below.

346

Yes. Conversion instructions support f16 on all sm_20+ GPUs according to CUDA 7.0+ docs.

2148

I don't think I need it any more -- LLVM does all f16 loads/stores using .b16 ops and registers now, so I don't have to explicitly store it as MVT::i16.
I've removed the function.

lib/Target/NVPTX/NVPTXSubtarget.h
104

Renamed to hasFP16Math.

test/CodeGen/NVPTX/f16-instructions.ll
765

I don't have a good answer. In general, given that fp16 has fewer bits of mantissa, whatever error sin.approx.f32 produces is likely to disappear because the low bits are discarded during conversion to fp16. We'll know more once I have the FP test suite running on a GPU.

jlebar added inline comments.Jan 12 2017, 9:21 PM
lib/Target/NVPTX/NVPTXAsmPrinter.cpp
411

Everywhere that we talk about f16s being stored/returned as/loaded as "integers" or "untyped integers", I think we should just say "a b16" or "an untyped value". An "untyped integer" might mean something like llvm's i16, which is definitely an integer, but is "untyped" inasmuch as it might represent a signed or an unsigned int. That's conceptually different from .b16, which is just a bag of bits.

I've noted places that would need to change with the comment "b16", but you don't have to change all of them to read "b16", so long as we don't say "untyped integer". :)

1618

b16

lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp
759–761

b16

2212–2214

b16

lib/Target/NVPTX/NVPTXISelLowering.cpp
1020

All scalar return values, function parameters, etc, but not all scalars in general. Can we clarify?

1021

b16

1081

Same here about "all scalars"

1082

b16

1396

Same about all scalars, and same b16 comment.

lib/Target/NVPTX/NVPTXInstrInfo.td
300

Heh, we should do a big rename of "doF32FTZ" in a separate patch.

lib/Target/NVPTX/NVPTXSubtarget.h
104

Yeah, but still...do we need it? It's confusing that the predicates in the .td are called "hasFP16Math" but they actually correspond to allowFP16Math, and I'm afraid in general someone will call hasFP16Math when they mean to call allowFP16Math. It seems that nobody needs this function today, so can we remove it and inline the SM version check into allowFP16Math?

We might also want to change the tablegen predicate to match this function name.

test/CodeGen/NVPTX/f16-instructions.ll
765

I really think, in the absence of evidence that it is safe, we need to be conservative and disable this (except for fastmath, if you like). We should not assume that sin.approx.f32 is sufficiently accurate outside of the first quadrant (and I am not even sure it's sufficiently accurate within the first quadrant).

tra updated this revision to Diff 84338.Jan 13 2017, 11:35 AM
tra marked 11 inline comments as done.

Set unsafe-fp-math attribute on test cases that use sin/cos, so LLVM can lower them. These tests only care whether fp16->fp32->fp16 conversion happens.
Updated comments according to Justin's suggestions.

lib/Target/NVPTX/NVPTXAsmPrinter.cpp
411

I've attempted to reword the comments so they use .b16 to describe storage type of f16 we use in PTX.

lib/Target/NVPTX/NVPTXInstrInfo.td
300

Will do.

jlebar added inline comments.Jan 13 2017, 11:47 AM
lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp
759–761

b16

lib/Target/NVPTX/NVPTXInstrInfo.td
156

Can we match the predicate name to the name of the subtarget function? Unless you think that would be more confusing than not.

lib/Target/NVPTX/NVPTXSubtarget.h
104

^ This is my only remaining nontrivial concern about the patch. Otherwise, looks great to me.

tra updated this revision to Diff 84345.Jan 13 2017, 11:55 AM
tra marked an inline comment as done.

Renamed tablegen predicate hasFP16Math -> useFP16Math to avoid confusion with NVPTXSubtarget::hasFP16Math().

tra added inline comments.Jan 13 2017, 11:56 AM
lib/Target/NVPTX/NVPTXSubtarget.h
104

I'd rather keep both functions here so one can tell 'hardware has this feature' from 'we can use this feature'.

You do have a point about the predicate name in .td file. I've renamed it to useFP16Math to avoid confusion with this function.

test/CodeGen/NVPTX/f16-instructions.ll
765

Done in D28619/r291936

jlebar accepted this revision.Jan 13 2017, 11:59 AM
jlebar edited edge metadata.
This revision is now accepted and ready to land.Jan 13 2017, 11:59 AM
This revision was automatically updated to reflect the committed changes.