Patch by: Wei Ding
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
All of these tests should go in the existing test files and be run on the other subtargets as well.
I also expected way more load and store test additions. We need to have extload tests for i8->i16, i16->i32, i16->i64, some of which probably exist already.
This patch should limit itself to the basic set of required operations on i16: load and store, add, sub, bitshifts, and conversions. Optimizations like min/max should come later.
lib/Target/AMDGPU/VIInstructions.td | ||
---|---|---|
170 ↗ | (On Diff #50288) | This pattern is necessary. I believe I had a test for this in my original patch |
178–195 ↗ | (On Diff #50288) | Dead code should be removed |
199 ↗ | (On Diff #50288) | Should follow camel case naming convention |
201 ↗ | (On Diff #50288) | No spaces around the :s |
204–205 ↗ | (On Diff #50288) | Should be indented like other Pats in the file |
210 ↗ | (On Diff #50288) | This is incorrect if this is a scalar zext, which currently doesn't happen because there are no scalar i16 instructions (although we may want pseudos for these). To be consistent, this should use S_MOV_B32 to materialize the 0 |
240–241 ↗ | (On Diff #50288) | These should be using the signed min/max. I also think the min/max matching should be a separate patch |
246 ↗ | (On Diff #50288) | These should be done in a separate patch |
249–267 ↗ | (On Diff #50288) | Dead code should be removed. These instruction's dont exist. However, tests should be added to the rotl/rotr/bswap/ctlz/cttz to make sure these are properly expanded when i16 is added as legal since most operations by default are assumed to be legal if the type is |
- global-extload-i16.ll contains extload tests for i16->i32, i16->i64. Test for load/store i8->i16 is added into file global-extload-i8.ll
- Dead codes have been removed.
- Fixed indentation issues.
- Fixed scalar zext, used S_MOV_B32 to materialize the 0
- Added tests for shl, sra, sub
- Removed min/max, which will be committed into another separate patch.
lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
264–268 ↗ | (On Diff #51139) | Are these really necessary? I thought making a type legal marked these operations legal by default. |
270–276 ↗ | (On Diff #51139) | Same with these too. Are they really necessary? |
1743–1756 ↗ | (On Diff #51139) | Why does i16 needs special handling here. These seems to be nearly identical to the block of code directly below. |
2022–2031 ↗ | (On Diff #51139) | We do we need to custom lower i16 stores? Can't we just mark then as promote? |
lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
270–276 ↗ | (On Diff #51139) | min/max need to be explicitly made legal, but that should be a separate patch. setcc should be promote for now until those are added later |
lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
65 ↗ | (On Diff #51139) | I did this already, so this should check Subtarget->has16BitInsts() |
lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
2022–2031 ↗ | (On Diff #51139) | Load/store promote expects an equal size type for a bitcast promote. This is the same problem that i1 has, so it should follow that example |
Please temporarily ignore code changes in function LowerFDV32(). I will finish the code development for fp32 div in next patch.
Revert DFIV32 code changes. This patch contains i16 implementation only based on Tom / Matt comments.
I tried to apply this patch, but I get a lot of failing lit tests in test/CodeGen/AMDGPU. Do all the tests in this directory pass for you?
lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
65 ↗ | (On Diff #52514) | Do we still need this comment? |
1740 ↗ | (On Diff #52514) | Does this need to be removed? |
1748 ↗ | (On Diff #52514) | Coding style. Variable names should start with a captial. |
LGTM overall with minor fixes:
lib/Target/AMDGPU/AMDGPUISelLowering.cpp | ||
---|---|---|
600 ↗ | (On Diff #74904) | Subtarget->has16BitInsts() |
611 ↗ | (On Diff #74904) | Subtarget->has16BitInsts() |
2369 ↗ | (On Diff #74904) | Space after //. Capitalize. |
2370 ↗ | (On Diff #74904) | Subtarget->has16BitInsts() |
lib/Target/AMDGPU/SIISelLowering.cpp | ||
37 ↗ | (On Diff #74904) | Alphabetize |
82 ↗ | (On Diff #74904) | I do not think we need this comment |
83 ↗ | (On Diff #74904) | Subtarget->has16BitInsts() |
229 ↗ | (On Diff #74904) | Subtarget->has16BitInsts() |
273 ↗ | (On Diff #74904) | Detabify |
275 ↗ | (On Diff #74904) | Detabify |
277 ↗ | (On Diff #74904) | Detabify |
279 ↗ | (On Diff #74904) | Detabify |
280 ↗ | (On Diff #74904) | Remove extra new line |
lib/Target/AMDGPU/BUFInstructions.td | ||
---|---|---|
962–963 ↗ | (On Diff #74904) | Originally I was avoiding these by promoting the load return type and truncating. Is this the same or is there an advantage from known bits types of things knowing it is really a 32-bit extload? |
lib/Target/AMDGPU/SIISelLowering.cpp | ||
1916 ↗ | (On Diff #74904) | Why is this part of the patch? This looks unrelated |
3428–3429 ↗ | (On Diff #74904) | Define on same line |
lib/Target/AMDGPU/BUFInstructions.td | ||
---|---|---|
962–963 ↗ | (On Diff #74904) | The main problem is that extload must be legal, so it's hard to custom lower all extloads. I attempted to do this, and I think it would require a lot of work in the backend and also the legalizer to make this work. I'm not sure it's worth the effort at this point. |
lib/Target/AMDGPU/BUFInstructions.td | ||
---|---|---|
960 ↗ | (On Diff #76087) | This can be refined to a has 16-bit predicate |
lib/Target/AMDGPU/VOP1Instructions.td | ||
614–618 ↗ | (On Diff #76087) | Commented out code |
lib/Target/AMDGPU/VOP2Instructions.td | ||
348 ↗ | (On Diff #76087) | This should only need to be around the actual defs, not the multiclasses |
355 ↗ | (On Diff #76087) | I don't think you need to repeat the type in the output here |
lib/Target/AMDGPU/VOP3Instructions.td | ||
232 ↗ | (On Diff #76087) | Line wrapping |