This patch implements d16 support for image load, image store and image sample intrinsics.
It depends on the following patch that is pending reviews:
The LIT tests are still under development and will be ready soon.
We did not add new intrinsics. We just add support for new data types. In IntrinsicsAMDGCN.td, we have already defined
the data types as any_float which includes half types.
In other wordm llvm.amdgcn.image.load.v4f16 (for example) has already been declared in IntrinsicsAMDGCN.td. And this patch just needs to actually define (implement) it.
Since the register class is changing, there shouldn't be an operand for this. This is a separate MI opcode
This should check something more specific than the generation
There shouldn't be a d16 operand, so this should be unnecessary
Checking exact registers is probably likely to break here
inreg doesn't do anything with amdgpu_kernel, so you should remove them.
For each register class, we have already had a MI opcode corresponds to it. For D16 support,
I don't know what is appropriate. It should be that the image instruction has d16 bit. We haven't defined this feature yet.
I think using a d16 operand might be the simpler approach to handle d16 support.
I know that. But I need way to differentiate v[0:1] with v[0:3], i.e. 2 or 4 registers.
will take a look and update accordingly. Thanks.
This is pretty terrible :/
I'm sorry I don't have anything more constructive to say about it yet (it's a pity that TableGen is so difficult to work with), but it needs to be said.
Extra space before (, can use Intrinsic::ID type
Returns on separte line
Can't you reuse the type convert function from the buffer patch?
Dead code. Comment isn't helpful, this is a TODO to implement this optimization for d16. Can you open a bug for that?.
For the unpacked case I would expect it to be trivial to extend for it
If we didn't have to deal with the unpacked register layout mistake, we wouldn't need to do this. There's no way to create an InstrMapping type thing for other types.
Although I'm confused why I don't see the type conversion here? It's there for the stores, but the load should also need to re-pack the results
The type manipulation is done in the buffer d16 support patch (common with buffer_load_format and tbuffer_load_format).
The idea is, (v4f16 as example), "load" to v4i32 --> truncate to v4i16, then bitcast back to v4f16.
Don't know why I didn't received a message after I updated the patch. So ping here with the updating message:
This should be the default anyway, so you shouldn't need to 0 it again
These are all the same within each block, so you can introduce another multiclass rather than repeating the same combintions. Same for the other MIMG types
Extra space before (
Extra space before (
Do you mean the following defs are the same in different block? I have difficulty in the name mangling to attch "_D16" and "_D16_gfx80" is the def. Can you please advice? Thanks.
def _V1 : MIMG_Gather_Helper <op, asm, dst_rc, VGPR_32, wqm>, MIMG_Mask<asm#"_V1", channels>;
def _V1_D16 : MIMG_Gather_Helper <op, asm, dst_rc, VGPR_32, wqm, 1>,