The sea was angry that day, my friends - like an old man trying to send back soup in a deli.
- User Since
- Sep 9 2013, 3:45 AM (342 w, 1 d)
Sorry, forgot to add cfe-commits to the original diff.
LGTM too. One day I'm going to change all MOPs to MOp when you're not looking anyway.
Tue, Mar 24
Changed my mind again, I won't be a part of this.
I'm back in.
Mon, Mar 23
(after Matt's points).
Thu, Mar 19
Mon, Mar 16
Could you fix up the ARM64 selection code as I mentioned earlier so we don't generate different instructions?
Fri, Mar 6
Thu, Mar 5
@omjavaid can you look over the lldb changes? I don't have the hardware to be able to actually run this test but I've tried to relax the checks.
Wed, Mar 4
Hi, I just tried out this patch locally and I'm seeing failures running the tests:
Failing Tests (3): LLVM :: CodeGen/AArch64/machine-outliner-cfi.mir LLVM :: CodeGen/AArch64/machine-outliner-noreturn-save-lr.mir LLVM :: CodeGen/AArch64/machine-outliner-side-effect.mir
Tue, Mar 3
Which part of the langref are you referring to?
Now with test updates.
Actually this is missing a bunch of test updates....
Feb 28 2020
Feb 27 2020
I see what you mean. Essentially we would take the code size/perf hit with the specific MIR example I used, but instead try harder to form the extending load ops and ensure those are assigned to gpr banks. There might be other issues with G_EXTRACT but we'll see.
Feb 26 2020
The change itself is ok but is there any way to have more granular overriding of this behavior, instead of overriding the whole shouldLocalize()?
I committed before I saw your message.
Yes, on AArch64 all sub-32bit loads give a free ZEXT. We catch this case in the selector if for some reason we have a redundant ZEXT.
Anyway, I guess this ship has sailed.
s16 operations, if legal, just get selected to gpr32 regclass instructions. E.g.
%ld:gpr(s16) = G_LOAD ... => %ld:gpr32 = LDR_16 ...
For things like G_ADD we legalize to s32, but s16 itself as a type isn't illegal.
I would have expected that if you have to widen the size when assigning the actual register class, then legalization should have actually widen the value. Otherwise, that means we have to manually deal with "dangling" bits everywhere else.
Forgot to answer your first question:
I would expect G_PHI to be selected to PHI (i.e., just a opcode change) and then the phi elimination pass will insert the cross copies for you.
Where is this going wrong?
Thanks for taking a look. Yes the G_PHI gets selected to PHI. This is normally ok even in the case where we have operands with different regbanks. However on AArch64 this doesn't work when the type size is less than 32 bits. If this happens, the PHI operand with a GPR RegBank gets the gpr32 regclass assigned to it (we don't have anything smaller to represent types like s16 on GPR on AArch64). Then the FPR bank operand gets the appropriately sized FPR16 regclass assigned. These two registers have different sizes, and copyPhysReg doesn't know how to deal with copies between a gpr32 and an fpr16 or vice versa. I thought about maybe teaching copyPhysReg to deal with that, but the fact that we have to deal with subregisters (because of the differing sizes) makes me nervous and didn't seem like the best place to fix it.
Feb 24 2020
Feb 21 2020
Feb 18 2020
Feb 11 2020
LGTM with nits.
Feb 10 2020
Feb 9 2020
Feb 7 2020
Feb 6 2020
Feb 5 2020
LGTM with nit.
Feb 4 2020
The localizer is one pass that can sink instructions, although it currently doesn't do intrinsics yet.
I'm not 100% sure on what the convergent attribute means here, but assuming that it means it's not allowed to sink instructions into control flow etc, I don't think we should be marking these generic intrinsic ops with it. On arm64 we can use G_INTRINSIC to represent user-written intrinsic instructions which have no problem with being made control dependent.
Feb 3 2020
Can we have one extra test checking for a chain of G_XORs?
It's a real struggle for me to review AMDGPU legalizer changes sometimes. I wish we had smaller, more obvious for the change being made.
Jan 31 2020
Didn't mean to reject.
Jan 30 2020
Committed in 610f1d22f149 but with wrong phab link in commit msg.
Could you re-upload your patch so it can be reviewed?
Oops, linked to the wrong review in the commit message and ended up auto-closing this by mistake.
Jan 29 2020
Forgot to rename test file.
I've made the test XFAIL for now. Address other comments.
@arsenm Matt when I run the update_llc_test_checks script on the llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.update.dpp.ll test, it crashes with a backtrace like this:
Assertion failed: (Opcode < NumOpcodes && "Invalid opcode!"), function get, file /Users/aemerson/work/oss/co1/src/llvm/include/llvm/MC/MCInstrInfo.h, line 4 frame #4: 0x00000001000bdef3 llc`llvm::MCInstrInfo::get(this=0x0000000110010248, Opcode=4294967295) const at MCInstrInfo.h:45:5 frame #5: 0x000000010083b8d7 llc`llvm::SIInstrInfo::getMCOpcodeFromPseudo(this=0x0000000110010240, Opcode=3758) const at SIInstrInfo.h:939:12 frame #6: 0x00000001008277e3 llc`llvm::SIInstrInfo::getInstSizeInBytes(this=0x0000000110010240, MI=0x000000010f879b28) const at SIInstrInfo.cpp:6067:29 frame #7: 0x0000000101f9bf44 llc`(anonymous namespace)::BranchRelaxation::computeBlockSize(this=0x000000010f03fb60, MBB=0x000000010f8bbf28) const at BranchRelaxation.cpp:168:18 frame #8: 0x0000000101f9b7be llc`(anonymous namespace)::BranchRelaxation::scanFunction(this=0x000000010f03fb60) at BranchRelaxation.cpp:158:39 frame #9: 0x0000000101f9b264 llc`(anonymous namespace)::BranchRelaxation::runOnMachineFunction(this=0x000000010f03fb60, mf=0x000000010f0566f0) at BranchRelaxation.cpp:555:3
I'm replacing a G_MUL with an equivalent G_SHL here, is there an issue with doing this for AMDGPU?