Page MenuHomePhabricator

AMDGPU/SI: Implement d16 support for image intrinsics
ClosedPublic

Authored by cfang on Nov 10 2017, 11:13 AM.

Details

Reviewers
arsenm
b-sumner
Summary

This patch implements d16 support for image load, image store and image sample intrinsics.
It depends on the following patch that is pending reviews:
https://reviews.llvm.org/D38906

The LIT tests are still under development and will be ready soon.

Diff Detail

Event Timeline

cfang created this revision.Nov 10 2017, 11:13 AM
b-sumner edited edge metadata.Nov 10 2017, 12:17 PM

Pardon my ignorance, but why isn't include/llvm/IR/IntrinsicsAMDGCN.td being updated?

Pardon my ignorance, but why isn't include/llvm/IR/IntrinsicsAMDGCN.td being updated?

We did not add new intrinsics. We just add support for new data types. In IntrinsicsAMDGCN.td, we have already defined
the data types as any_float which includes half types.

In other wordm llvm.amdgcn.image.load.v4f16 (for example) has already been declared in IntrinsicsAMDGCN.td. And this patch just needs to actually define (implement) it.

Pardon my ignorance, but why isn't include/llvm/IR/IntrinsicsAMDGCN.td being updated?

We did not add new intrinsics. We just add support for new data types. In IntrinsicsAMDGCN.td, we have already defined
the data types as any_float which includes half types.

In other wordm llvm.amdgcn.image.load.v4f16 (for example) has already been declared in IntrinsicsAMDGCN.td. And this patch just needs to actually define (implement) it.

Ugh. I knew that. Sorry for the spam.

cfang updated this revision to Diff 122850.Nov 14 2017, 8:38 AM

Add LIT tests.

arsenm added inline comments.Nov 14 2017, 12:02 PM
lib/Target/AMDGPU/MIMGInstructions.td
39

Since the register class is changing, there shouldn't be an operand for this. This is a separate MI opcode

lib/Target/AMDGPU/SIISelLowering.cpp
6903

This should check something more specific than the generation

7131–7133

There shouldn't be a d16 operand, so this should be unnecessary

test/CodeGen/AMDGPU/llvm.amdgcn.image.sample.d16.ll
11–12

Checking exact registers is probably likely to break here

79

inreg doesn't do anything with amdgpu_kernel, so you should remove them.

cfang added inline comments.Nov 16 2017, 11:03 AM
lib/Target/AMDGPU/MIMGInstructions.td
39

For each register class, we have already had a MI opcode corresponds to it. For D16 support,
we are NOT introducing new register class that needs new MI Opcode. In stead we are using the existing ones.
Also, we could not use register class to identify whether d16 bit is set. This should come from the data types from
the intrinsics, and thus we need an operand bit.

lib/Target/AMDGPU/SIISelLowering.cpp
6903

I don't know what is appropriate. It should be that the image instruction has d16 bit. We haven't defined this feature yet.
since it starts from VI, so I just use the generation here. Do you think I can use something like target has f16? Or define a new feature? Thanks.

7131–7133

I think using a d16 operand might be the simpler approach to handle d16 support.

test/CodeGen/AMDGPU/llvm.amdgcn.image.sample.d16.ll
11–12

I know that. But I need way to differentiate v[0:1] with v[0:3], i.e. 2 or 4 registers.

79

will take a look and update accordingly. Thanks.

nhaehnle added inline comments.Nov 22 2017, 11:33 AM
lib/Target/AMDGPU/SIISelLowering.cpp
3491–3569

This is pretty terrible :/

I'm sorry I don't have anything more constructive to say about it yet (it's a pity that TableGen is so difficult to work with), but it needs to be said.

cfang updated this revision to Diff 126799.Dec 13 2017, 11:50 AM
  1. Add definitions for new machine instructions for D16 bit and for whether target support packed/unpacked f16.
  2. Add a flag bit to indicate whether an instruction is referencing 16-bit half typed data (D16).
  3. Update to the LIT test cases to avoid using specific registers.
NOTE: We may define more instructions than necessary for the "PACKED", but I would appreciate suggestions to get rid of them.
arsenm added inline comments.Dec 22 2017, 12:18 PM
lib/Target/AMDGPU/SIISelLowering.cpp
549

Extra space before (, can use Intrinsic::ID type

551

Returns on separte line

4906–4908

Can't you reuse the type convert function from the buffer patch?

7125

Dead code. Comment isn't helpful, this is a TODO to implement this optimization for d16. Can you open a bug for that?.

For the unpacked case I would expect it to be trivial to extend for it

arsenm added inline comments.Dec 22 2017, 12:20 PM
lib/Target/AMDGPU/SIISelLowering.cpp
3491–3569

If we didn't have to deal with the unpacked register layout mistake, we wouldn't need to do this. There's no way to create an InstrMapping type thing for other types.

Although I'm confused why I don't see the type conversion here? It's there for the stores, but the load should also need to re-pack the results

arsenm added inline comments.Dec 22 2017, 12:22 PM
lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
4134 ↗(On Diff #126799)

It's part of the asm string, so I don't think you need this?

lib/Target/AMDGPU/InstPrinter/AMDGPUInstPrinter.cpp
220–223 ↗(On Diff #126799)

You shouldn't need this? The d16 string is hardcoded into the asm string?

cfang added inline comments.Jan 5 2018, 8:44 AM
lib/Target/AMDGPU/SIISelLowering.cpp
3491–3569

The type manipulation is done in the buffer d16 support patch (common with buffer_load_format and tbuffer_load_format).

The idea is, (v4f16 as example), "load" to v4i32 --> truncate to v4i16, then bitcast back to v4f16.

cfang marked 2 inline comments as done.Jan 5 2018, 8:48 AM
cfang added inline comments.
lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
4134 ↗(On Diff #126799)

Thanks. Will remove this unnecessary part.

lib/Target/AMDGPU/InstPrinter/AMDGPUInstPrinter.cpp
220–223 ↗(On Diff #126799)

Yes, this is no longer needed.

cfang marked 6 inline comments as done.Jan 5 2018, 8:56 AM
cfang added inline comments.
lib/Target/AMDGPU/SIISelLowering.cpp
4906–4908

Yes. I have used a new function in the buffer patch to do the conversion, and will call that function. Thanks.

7125

Sure I will open a ticket to add similar optimization for d16. You are right for unpacked case. Thanks.

cfang updated this revision to Diff 129912.Jan 15 2018, 3:18 PM
cfang marked 2 inline comments as done.
  1. sync with the buffer d16 support patch to reuse some of the functionalities;
  2. Remove unnecessary functions that print "d16" in the AsmPrinter;
  3. Update based on Matt's other comments.
cfang added a comment.Jan 16 2018, 8:17 AM

Don't know why I didn't received a message after I updated the patch. So ping here with the updating message:

  1. Sync with the buffer d16 support patch to reuse some of the functionalities;
  2. Remove unnecessary functions that print "d16" in the AsmPrinter;
  3. Update based on Matt's other comments.
arsenm added inline comments.Jan 17 2018, 8:02 AM
lib/Target/AMDGPU/MIMGInstructions.td
181

This should be the default anyway, so you shouldn't need to 0 it again

322–331

These are all the same within each block, so you can introduce another multiclass rather than repeating the same combintions. Same for the other MIMG types

lib/Target/AMDGPU/SIISelLowering.cpp
3507

Extra space before (

3508–3509

Indentation off

3658

newline

3742

Extra space before (

cfang added inline comments.Jan 17 2018, 10:04 AM
lib/Target/AMDGPU/MIMGInstructions.td
322–331

Do you mean the following defs are the same in different block? I have difficulty in the name mangling to attch "_D16" and "_D16_gfx80" is the def. Can you please advice? Thanks.

def _V1 : MIMG_Gather_Helper <op, asm, dst_rc, VGPR_32, wqm>,
            MIMG_Mask<asm#"_V1", channels>;

def _V1_D16 : MIMG_Gather_Helper <op, asm, dst_rc, VGPR_32, wqm, 1>,

MIMG_Mask<asm#"_V1_D16", channels>;
cfang marked 5 inline comments as done.Jan 18 2018, 8:23 AM
cfang updated this revision to Diff 130419.Jan 18 2018, 8:25 AM

Update based on Matt's suggestion: Factor out the common defs.

cfang added a comment.Jan 18 2018, 8:35 AM

Patched updated! Request for reviewer's check. Thanks.
<still I could not receive message of the diff update>

arsenm accepted this revision.Jan 18 2018, 12:32 PM

LGTM

This revision is now accepted and ready to land.Jan 18 2018, 12:32 PM