This is an archive of the discontinued LLVM Phabricator instance.

[NVPTX] Added support for half-precision floating point.
ClosedPublic

Authored by tra on Jan 10 2017, 4:50 PM.

Details

Summary

Only scalar half-precision operations are supported at the moment.

  • Adds general support for 'half' type in NVPTX
  • fp16 math operations are supported on sm_53+ GPUs only (can be disabled with --nvptx-no-f16-math)
  • type conversions to/from fp16 are supported on all GPU variants.
  • On GPU variants that do not have full fp16 support (or if it's disabled), fp16 operations are promoted to fp32 and results are converted back to fp16 for storage.

ptxas is rather peculiar when it comes to fp16-related syntax, which had to be worked around:

  • there's no way to represent immediate fp16 argument as a hex value. We load such constants into a .b16 register first.
  • there are no .f16 variants of mov/ld/st instructions.
  • ptxas only supports .f16 registers on sm_53+ only. It does accept .b16 registers for all supported fp16-related operations on all GPU variants, so that's the type the patch uses.
  • NVPTX ABI explicitly states that fp16 can't be used as a function argument or return value. It also states that arguments and return values must be at least 32-bit wide. The patch follows the doc and uses .b32 for fp16 arguments and return values. On the other hand, current fp16 implementation in nvcc uses a struct to represent fp16 type which results in nvcc passing fp16 as an aggregate. I'm not sure whether we want/need to follow nvcc and pass fp16 as aggregates, too.

Diff Detail

Repository
rL LLVM

Event Timeline

tra updated this revision to Diff 83891.Jan 10 2017, 4:50 PM
tra retitled this revision from to [NVPTX] Added support for half-precision floating point..
tra updated this object.
tra added reviewers: jlebar, jholewinski.
tra added a subscriber: llvm-commits.
tra updated this object.Jan 10 2017, 5:30 PM
jlebar edited edge metadata.Jan 11 2017, 11:16 AM

Well this is heroic. Aside from my question about sin.approx, I just have nits, mostly about comments.

Do you think we should add some tests to the test-suite so we can cover this e2e?

lib/Target/NVPTX/NVPTXAsmPrinter.cpp
1352 ↗(On Diff #83891)

Can we add a comment why this is not f16?

1581 ↗(On Diff #83891)

Can we comment why this is not sz = 16?

lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp
542 ↗(On Diff #83891)

s/constants/immediates/

754 ↗(On Diff #83891)

Can we add a comment here?

1007 ↗(On Diff #83891)

ibid

2208 ↗(On Diff #83891)

ibid

lib/Target/NVPTX/NVPTXISelLowering.cpp
149 ↗(On Diff #83891)

The comment above doesn't seem to apply to these two new lines.

158 ↗(On Diff #83891)

ibid

297 ↗(On Diff #83891)

Suggest being a tad more explicit:

Promote fp16 arithmetic if fp16 hardware isn't available or the user passed --flag-name. The flag is useful because, although sm_53+ GPUs ...

320 ↗(On Diff #83891)

This is all legal even on sm_20 with ptxas from CUDA 7? (I tried looking this up in the docs and failed, sorry.)

328 ↗(On Diff #83891)

Which library -- libm? Maybe "without calling an external library" or perhaps "without calling libm" (although maybe someone will come here and tell me it's not technically libm, it's libgcc or somesuch).

2120 ↗(On Diff #83891)

Can we explain what we're doing differently here?

lib/Target/NVPTX/NVPTXInstrInfo.td
248 ↗(On Diff #83891)

Update comment.

782 ↗(On Diff #83891)

Please just make a separate paragraph, instead of indenting like this.

783 ↗(On Diff #83891)

Run-on sentence. Suggest

instructions. Instead, we have to

lib/Target/NVPTX/NVPTXRegisterInfo.cpp
80 ↗(On Diff #83891)

Nit, can we write this function either using if-no-else or if-else, instead of mixing the two? (I realize that this patch is not the original sin...)

lib/Target/NVPTX/NVPTXSubtarget.h
104 ↗(On Diff #83891)

Do we need this function at all? It's kind of confusing because pre-sm_53 we had fp16 loads/stores, so you could say we "had" fp16. But also someone might accidentally call this instead of allowFP16Math().

test/CodeGen/NVPTX/f16-instructions.ll
764 ↗(On Diff #83891)

How do we know it's correct to lower this as cvt.to.f16(sin.approx.f32(x))? That only works if we're guaranteed that the error of sin.approx.f32 is too small to be noticed in fp16. But that doesn't seem guaranteed. All the ISA says about precision is

The maximum absolute error is 2^-20.9 in quadrant 00.

This error is too small to be represented in an fp16, which would normally mean we're good. But because it qualifies with "in quadrant 00", that suggests that all bets are off if we're not in...whatever is quadrant 00. (I presume it's the first quadrant?)

Same for cosine.

jlebar added inline comments.Jan 11 2017, 11:17 AM
test/CodeGen/NVPTX/f16-instructions.ll
764 ↗(On Diff #83891)

Actually, I take it back about 2^-20.9 being too small to fit in an fp16. I forgot about denormals. See https://en.wikipedia.org/wiki/Half-precision_floating-point_format#Precision_limitations_on_decimal_values_in_.5B0.2C_1.5D

tra updated this revision to Diff 84019.Jan 11 2017, 1:47 PM
tra edited edge metadata.
tra marked 13 inline comments as done.

Addressed Justin's comments.

tra added inline comments.Jan 11 2017, 1:52 PM
lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp
1007 ↗(On Diff #83891)

Removed this change as we're not handling f16 vectors here yet.

lib/Target/NVPTX/NVPTXISelLowering.cpp
149 ↗(On Diff #83891)

Moved SETCC out. The comment still applies to {SELECT,BR}_CC below.

320 ↗(On Diff #83891)

Yes. Conversion instructions support f16 on all sm_20+ GPUs according to CUDA 7.0+ docs.

2120 ↗(On Diff #83891)

I don't think I need it any more -- llvm does all f16 loads/stores using .b16 ops and registers now, so I don't have to explicitly store it as MVT::i16.
I've removed the function.

lib/Target/NVPTX/NVPTXSubtarget.h
104 ↗(On Diff #83891)

Renamed to hasFP16Math.

test/CodeGen/NVPTX/f16-instructions.ll
764 ↗(On Diff #83891)

I don't have a good answer. In general, given that fp16 has fewer bits of mantissa, whatever error sin.approx.f32 produces is likely to be lost because the low bits will be lost during conversion to fp16. We'll know more once I have FP test suite running on GPU.

jlebar added inline comments.Jan 12 2017, 9:21 PM
lib/Target/NVPTX/NVPTXAsmPrinter.cpp
370 ↗(On Diff #84019)

Everywhere that we talk about f16s being stored/returned as/loaded as "integers" or "untyped integers", I think we should just say "a b16" or "an untyped value". An "untyped integer" might mean something like llvm's i16, which is definitely an integer, but is "untyped" inasmuch as it might represent a signed or an unsigned int. That's conceptually different from .b16, which is just a bag of bits.

I've noted places that would need to change with the comment "b16", but you don't have to change all of them to read "b16", so long as we don't say "untyped integer". :)

1586 ↗(On Diff #84019)

b16

lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp
754 ↗(On Diff #84019)

b16

754 ↗(On Diff #84019)

(Unable to delete this comment due to phab bug. This comment intentionally left empty.)

2207 ↗(On Diff #84019)

b16

lib/Target/NVPTX/NVPTXISelLowering.cpp
158 ↗(On Diff #83891)

(Yet another comment I cannot delete, please ignore me.)

150 ↗(On Diff #84019)

(I can't delete this comment; intentionally left empty.)

996 ↗(On Diff #84019)

All scalar return values, function parameters, etc, but not all scalars in general. Can we clarify?

997 ↗(On Diff #84019)

b16

1056 ↗(On Diff #84019)

Same here about "all scalars"

1057 ↗(On Diff #84019)

b16

1370 ↗(On Diff #84019)

Same about all scalars, and same b16 comment.

lib/Target/NVPTX/NVPTXInstrInfo.td
299 ↗(On Diff #84019)

Heh, we should do a big rename of "doF32FTZ" in a separate patch.

lib/Target/NVPTX/NVPTXSubtarget.h
104 ↗(On Diff #83891)

Yeah, but still...do we need it? It's confusing that the predicates in the .td are called "hasFP16Math" but they actually correspond to allowFP16Math, and I'm afraid in general someone will call hasFP16Math when they mean to call allowFP16Math. It seems that nobody needs this function today, so can we remove it and inline the SM version check into allowFP16Math?

We might also want to change the tablegen predicate to match this function name.

test/CodeGen/NVPTX/f16-instructions.ll
764 ↗(On Diff #83891)

I really think, in the absence of evidence that it is safe, we need to be conservative and disable this (except for fastmath, if you like). We should not assume that sin.approx.f32 is sufficiently accurate outside of the first quadrant (and I am not even sure it's sufficiently accurate within the first quadrant).

tra updated this revision to Diff 84338.Jan 13 2017, 11:35 AM
tra marked 11 inline comments as done.

Set unsafe-fp-math attribute on test cases that use sin/cos, so LLVM can lower them. These tests only care whether fp16->fp32->fp16 conversion happens.
Updated comments to according to Justin's suggestions.

lib/Target/NVPTX/NVPTXAsmPrinter.cpp
370 ↗(On Diff #84019)

I've attempted to reword the comments so they use .b16 to describe storage type of f16 we use in PTX.

lib/Target/NVPTX/NVPTXInstrInfo.td
299 ↗(On Diff #84019)

Will do.

jlebar added inline comments.Jan 13 2017, 11:47 AM
lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp
754 ↗(On Diff #84019)

b16

lib/Target/NVPTX/NVPTXInstrInfo.td
156 ↗(On Diff #84338)

Can we match the predicate name to the name of the subtarget function? Unless you think that would be more confusing than not.

lib/Target/NVPTX/NVPTXSubtarget.h
104 ↗(On Diff #83891)

^ This is my only remaining nontrivial concern about the patch. Otherwise, looks great to me.

tra updated this revision to Diff 84345.Jan 13 2017, 11:55 AM
tra marked an inline comment as done.

Renamed tablegen predicate hasFP16Math -> useFP16Math to avoid confusion with NVPTXSubtarget::hasFP16Math().

tra added inline comments.Jan 13 2017, 11:56 AM
lib/Target/NVPTX/NVPTXSubtarget.h
104 ↗(On Diff #83891)

I'd rather keep both functions here so one can tell 'hardware has this feature' from 'we can use this feature'.

You do have a point about the predicate name in .td file. I've renamed it to useFP16Math to avoid confusion with this function.

test/CodeGen/NVPTX/f16-instructions.ll
764 ↗(On Diff #83891)

Done in D28619/r291936

jlebar accepted this revision.Jan 13 2017, 11:59 AM
jlebar edited edge metadata.
This revision is now accepted and ready to land.Jan 13 2017, 11:59 AM
This revision was automatically updated to reflect the committed changes.