Patch by: Wei Ding
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
All of these tests should go in the existing test files and be run on the other subtargets as well.
I also expected way more load and store test additions. We need to have extload tests for i8->i16, i16->i32, i16->i64, some of which probably exist already.
This patch should limit itself to the basic set of required operations on i16: load and store, add, sub, bitshifts, and conversions. Optimizations like min/max should come later.
lib/Target/AMDGPU/VIInstructions.td | ||
---|---|---|
134–135 | This pattern is necessary. I believe I had a test for this in my original patch | |
135–152 | Dead code should be removed | |
148 | Should follow camel case naming convention | |
150 | No spaces around the :s | |
153–154 | Should be indented like other Pats in the file | |
159 | This is incorrect if this is a scalar zext, which currently doesn't happen because there are no scalar i16 instructions (although we may want pseudos for these). To be consistent, this should use S_MOV_B32 to materialize the 0 | |
189–190 | These should be using the signed min/max. I also think the min/max matching should be a separate patch | |
195 | These should be done in a separate patch | |
198–216 | Dead code should be removed. These instruction's dont exist. However, tests should be added to the rotl/rotr/bswap/ctlz/cttz to make sure these are properly expanded when i16 is added as legal since most operations by default are assumed to be legal if the type is |
- global-extload-i16.ll contains extload tests for i16->i32, i16->i64. Test for load/store i8->i16 is added into file global-extload-i8.ll
- Dead codes have been removed.
- Fixed indentation issues.
- Fixed scalar zext, used S_MOV_B32 to materialize the 0
- Added tests for shl, sra, sub
- Removed min/max, which will be committed into another separate patch.
lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
264–268 | Are these really necessary? I thought making a type legal marked these operations legal by default. | |
270–276 | Same with these too. Are they really necessary? | |
1743–1756 | Why does i16 needs special handling here. These seems to be nearly identical to the block of code directly below. | |
2022–2031 | We do we need to custom lower i16 stores? Can't we just mark then as promote? |
lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
270–276 | min/max need to be explicitly made legal, but that should be a separate patch. setcc should be promote for now until those are added later |
lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
65 | I did this already, so this should check Subtarget->has16BitInsts() |
lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
2022–2031 | Load/store promote expects an equal size type for a bitcast promote. This is the same problem that i1 has, so it should follow that example |
Please temporarily ignore code changes in function LowerFDV32(). I will finish the code development for fp32 div in next patch.
Revert DFIV32 code changes. This patch contains i16 implementation only based on Tom / Matt comments.
I tried to apply this patch, but I get a lot of failing lit tests in test/CodeGen/AMDGPU. Do all the tests in this directory pass for you?
lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
65 | Do we still need this comment? | |
1740 | Does this need to be removed? | |
1748 | Coding style. Variable names should start with a captial. |
LGTM overall with minor fixes:
lib/Target/AMDGPU/AMDGPUISelLowering.cpp | ||
---|---|---|
600 ↗ | (On Diff #74904) | Subtarget->has16BitInsts() |
611 ↗ | (On Diff #74904) | Subtarget->has16BitInsts() |
2369 ↗ | (On Diff #74904) | Space after //. Capitalize. |
2370 ↗ | (On Diff #74904) | Subtarget->has16BitInsts() |
lib/Target/AMDGPU/SIISelLowering.cpp | ||
36 | Alphabetize | |
65 | I do not think we need this comment | |
66 | Subtarget->has16BitInsts() | |
264 | Subtarget->has16BitInsts() | |
308 | Detabify | |
310 | Detabify | |
312 | Detabify | |
314 | Detabify | |
315 | Remove extra new line |
lib/Target/AMDGPU/BUFInstructions.td | ||
---|---|---|
962–963 ↗ | (On Diff #74904) | Originally I was avoiding these by promoting the load return type and truncating. Is this the same or is there an advantage from known bits types of things knowing it is really a 32-bit extload? |
lib/Target/AMDGPU/SIISelLowering.cpp | ||
1327 | Why is this part of the patch? This looks unrelated | |
2403–2404 | Define on same line |
lib/Target/AMDGPU/BUFInstructions.td | ||
---|---|---|
962–963 ↗ | (On Diff #74904) | The main problem is that extload must be legal, so it's hard to custom lower all extloads. I attempted to do this, and I think it would require a lot of work in the backend and also the legalizer to make this work. I'm not sure it's worth the effort at this point. |
lib/Target/AMDGPU/BUFInstructions.td | ||
---|---|---|
960 ↗ | (On Diff #76087) | This can be refined to a has 16-bit predicate |
lib/Target/AMDGPU/VOP1Instructions.td | ||
614–618 ↗ | (On Diff #76087) | Commented out code |
lib/Target/AMDGPU/VOP2Instructions.td | ||
348 ↗ | (On Diff #76087) | This should only need to be around the actual defs, not the multiclasses |
355 ↗ | (On Diff #76087) | I don't think you need to repeat the type in the output here |
lib/Target/AMDGPU/VOP3Instructions.td | ||
232 ↗ | (On Diff #76087) | Line wrapping |
Alphabetize