This is an archive of the discontinued LLVM Phabricator instance.

[NVPTX] Added support for v2f16 operations.
Closed, Public

Authored by tra on Feb 16 2017, 5:06 PM.

Details

Summary

This patch enables support for .f16x2 operations.

  • Added new register type Float16x2.
  • Added support for .f16x2 instructions.
  • Added handling of vectorized loads/stores of v2f16 values.

Event Timeline

jlebar edited edge metadata. Feb 16 2017, 6:36 PM

I really like this patch.

lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp
869

I'm confused how this works without additional changes elsewhere. We increase fromTypeWidth, but this doesn't affect the selection of Opcode below. So how do we know that this is a packed-i32 load rather than a regular f16 load?

2410

Should this be unified with the similar code above?

lib/Target/NVPTX/NVPTXISelLowering.cpp
176

, which

181

Vectors with an even number

188

j != NumElts?

546

expandef

546

, which

2332

getLoad needs a vector type, but it can't handle vectors which contain v2f16 elements. So we must load using i32 here and then bitcast back.
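
A minimal sketch of that pattern with the SelectionDAG API (Chain, BasePtr, and dl are assumed to be in scope; names are illustrative, not the patch's exact code):

  // Load the two f16 halves as a single plain 32-bit value...
  SDValue Load = DAG.getLoad(MVT::i32, dl, Chain, BasePtr, MachinePointerInfo());
  // ...then reinterpret those bits as a v2f16 value.
  SDValue V2F16 = DAG.getNode(ISD::BITCAST, dl, MVT::v2f16, Load);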

2333

, which

2356

", and", or even better, write as two sentences.

4225

, which :)

4227

The returned result will be scalarized by the legalizer, but the comparison will remain a single vector instruction.

4233

Nit, return DCI.DAG.getNode(...)?

lib/Target/NVPTX/NVPTXInstrInfo.td
2856

Could we give these slightly louder names, maybe ImmediateZero and ImmediateOne or immZero/immOne? Otherwise it's not really clear that "i1imm" and "imm0" are completely different things -- they look too similar.

2863

Nit, indentation, also on the one below.

2873

What if the vector element index is not an immediate? Do we just fail to select? I'd think we should be able to handle that.

2882

the b32 register

tra updated this revision to Diff 89448. Feb 22 2017, 5:15 PM
tra marked 12 inline comments as done.

Addressed Justin's comments.
Added support for lowering EXTRACT_VECTOR_ELT with non-constant index.
Improved handling of loads/stores of f16 vectors (we can handle v8f16 now).
Added a few more test cases.

tra marked 5 inline comments as done. Feb 22 2017, 5:22 PM
tra added inline comments.
lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp
869

fromTypeWidth = 16 with vecType = NVPTX::PTXLdStInstCode::V2 would normally result in ld.v2.b16. Unfortunately, that can't be used to load v2f16, because it does not accept the 32-bit registers we use for v2f16. This code changes the load width to 32 (which can load the v2f16 bits into a 32-bit register) and reduces the number of elements to compensate -- we emit ld.b32.
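
A hedged sketch of the adjustment described above (variable names follow this discussion, not necessarily the committed code):

  // A v2f16 load is really a single 32-bit load into a B32 register, so
  // widen the element and drop the V2 vectorization: ld.v2.b16 -> ld.b32.
  if (LoadedVT == MVT::v2f16) {
    fromTypeWidth = 32; // was 16
    NumElts /= 2;       // one 32-bit element instead of two 16-bit ones
  }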

LLVM keeps splitting vectors until it ends up with a legal register type. If an f16 vector has an even number of elements, it's split into v2f16 chunks; otherwise it's split into scalar f16 elements.

It also appears that vector types don't get passed to this function, as they are either handled by tryLoadVector() or split into scalars by the legalizer. This patch introduces v2f16 loads, which will be the only vector type handled here.

2410

It turns out that most of this code wasn't necessary. All vector types (except v2f16) are either handled in tryStoreVector() or scalarized. I've removed the unneeded code.

lib/Target/NVPTX/NVPTXISelLowering.cpp
2333

TODO discarded. getLoad/getStore are the right way to create loads/stores, as they handle unaligned accesses correctly, which getMemIntrinsicNode() does not.
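
For illustration, a minimal sketch of the preferred form (Chain, Val, Ptr, and dl are assumed to be in scope):

  // getStore records the MachinePointerInfo (and thus the alignment) on the
  // node, so the legalizer can split an under-aligned store when needed.
  SDValue St = DAG.getStore(Chain, dl, Val, Ptr, MachinePointerInfo());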

lib/Target/NVPTX/NVPTXInstrInfo.td
2856

I can just use 0/1 in the instruction patterns below without wrapping them in a predicate.
I've moved these predicates and simplified the subvector extraction instructions below.

2873

Yes. We did fail to select. I've added custom lowering for that case, so it works now. I'm not sure if it's needed, though.
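
One common way to lower EXTRACT_VECTOR_ELT of v2f16 with a non-constant index is to shift the underlying 32-bit value; a hedged sketch (Op, DAG, and an i32 index are assumed; this is not necessarily what the patch commits):

  SDLoc dl(Op);
  SDValue Vec = Op.getOperand(0);
  SDValue Idx = Op.getOperand(1);
  // Reinterpret the f16 pair as i32, shift the selected half into the low
  // 16 bits, then truncate and bitcast back to f16.
  SDValue Bits  = DAG.getNode(ISD::BITCAST, dl, MVT::i32, Vec);
  SDValue ShAmt = DAG.getNode(ISD::MUL, dl, MVT::i32, Idx,
                              DAG.getConstant(16, dl, MVT::i32));
  SDValue Lo    = DAG.getNode(ISD::SRL, dl, MVT::i32, Bits, ShAmt);
  SDValue Elt   = DAG.getNode(ISD::TRUNCATE, dl, MVT::i16, Lo);
  return DAG.getNode(ISD::BITCAST, dl, MVT::f16, Elt);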

This revision is now accepted and ready to land. Feb 23 2017, 10:21 AM
This revision was automatically updated to reflect the committed changes.
tra marked 3 inline comments as done.