Patch by: Wei Ding
Details
Diff Detail
- Build Status
Buildable 836 Build 836: arc lint + arc unit
Event Timeline
All of these tests should go in the existing test files and be run on the other subtargets as well.
I also expected way more load and store test additions. We need to have extload tests for i8->i16, i16->i32, i16->i64, some of which probably exist already.
This patch should limit itself to the basic set of required operations on i16: load and store, add, sub, bitshifts, and conversions. Optimizations like min/max should come later.
lib/Target/AMDGPU/VIInstructions.td | ||
---|---|---|
170 | This pattern is necessary. I believe I had a test for this in my original patch | |
178–195 | Dead code should be removed | |
199 | Should follow camel case naming convention | |
201 | No spaces around the :s | |
204–205 | Should be indented like other Pats in the file | |
210 | This is incorrect if this is a scalar zext, which currently doesn't happen because there are no scalar i16 instructions (although we may want pseudos for these). To be consistent, this should use S_MOV_B32 to materialize the 0 | |
240–241 | These should be using the signed min/max. I also think the min/max matching should be a separate patch | |
246 | These should be done in a separate patch | |
249–267 | Dead code should be removed. These instruction's dont exist. However, tests should be added to the rotl/rotr/bswap/ctlz/cttz to make sure these are properly expanded when i16 is added as legal since most operations by default are assumed to be legal if the type is |
- global-extload-i16.ll contains extload tests for i16->i32, i16->i64. Test for load/store i8->i16 is added into file global-extload-i8.ll
- Dead codes have been removed.
- Fixed indentation issues.
- Fixed scalar zext, used S_MOV_B32 to materialize the 0
- Added tests for shl, sra, sub
- Removed min/max, which will be committed into another separate patch.
lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
264–268 | Are these really necessary? I thought making a type legal marked these operations legal by default. | |
270–276 | Same with these too. Are they really necessary? | |
1743–1756 | Why does i16 needs special handling here. These seems to be nearly identical to the block of code directly below. | |
2022–2031 | We do we need to custom lower i16 stores? Can't we just mark then as promote? |
lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
270–276 | min/max need to be explicitly made legal, but that should be a separate patch. setcc should be promote for now until those are added later |
lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
65 | I did this already, so this should check Subtarget->has16BitInsts() |
lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
2022–2031 | Load/store promote expects an equal size type for a bitcast promote. This is the same problem that i1 has, so it should follow that example |
Please temporarily ignore code changes in function LowerFDV32(). I will finish the code development for fp32 div in next patch.
Revert DFIV32 code changes. This patch contains i16 implementation only based on Tom / Matt comments.
I tried to apply this patch, but I get a lot of failing lit tests in test/CodeGen/AMDGPU. Do all the tests in this directory pass for you?
lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
81 | Do we still need this comment? | |
2612–2613 | Does this need to be removed? | |
2620 | Coding style. Variable names should start with a captial. |
LGTM overall with minor fixes:
lib/Target/AMDGPU/AMDGPUISelLowering.cpp | ||
---|---|---|
600 | Subtarget->has16BitInsts() | |
610 | Subtarget->has16BitInsts() | |
2367 | Space after //. Capitalize. | |
2368 | Subtarget->has16BitInsts() | |
lib/Target/AMDGPU/SIISelLowering.cpp | ||
37 | Alphabetize | |
81 | I do not think we need this comment | |
82 | Subtarget->has16BitInsts() | |
227 | Subtarget->has16BitInsts() | |
271 | Detabify | |
273 | Detabify | |
275 | Detabify | |
277 | Detabify | |
278 | Remove extra new line |
lib/Target/AMDGPU/BUFInstructions.td | ||
---|---|---|
962–963 | Originally I was avoiding these by promoting the load return type and truncating. Is this the same or is there an advantage from known bits types of things knowing it is really a 32-bit extload? | |
lib/Target/AMDGPU/SIISelLowering.cpp | ||
1929 | Why is this part of the patch? This looks unrelated | |
3439–3440 | Define on same line |
lib/Target/AMDGPU/BUFInstructions.td | ||
---|---|---|
962–963 | The main problem is that extload must be legal, so it's hard to custom lower all extloads. I attempted to do this, and I think it would require a lot of work in the backend and also the legalizer to make this work. I'm not sure it's worth the effort at this point. |
lib/Target/AMDGPU/BUFInstructions.td | ||
---|---|---|
960 | This can be refined to a has 16-bit predicate | |
lib/Target/AMDGPU/VOP1Instructions.td | ||
614–618 | Commented out code | |
lib/Target/AMDGPU/VOP2Instructions.td | ||
348 | This should only need to be around the actual defs, not the multiclasses | |
355 | I don't think you need to repeat the type in the output here | |
lib/Target/AMDGPU/VOP3Instructions.td | ||
232 | Line wrapping |
Subtarget->has16BitInsts()