- User Since
- Jan 29 2015, 2:08 PM (224 w, 4 d)
Jun 15 2018
May 24 2018
May 21 2018
May 15 2018
May 14 2018
May 11 2018
Add some comments
May 9 2018
Add lane index range check
May 8 2018
May 4 2018
May 1 2018
Apr 30 2018
Apr 26 2018
I've been looking into this issue for AArch64 recently as well (see bug 37240 for a PostRA scheduler issue I found). I'll be filing more bugs and/or posting fixes for other cases that we hit on our target with our different compiler flags in the next few days.
Apr 25 2018
I've put the full diff for my alternative fix here: D46063
Apr 24 2018
Apr 23 2018
Apr 19 2018
Apr 18 2018
@MatzeB This isn't about the scavenging of a register to use for large offsets, etc., but rather the scavenging of stack slots in the CSR save area to use for regular stack objects.
Ugh, good find. Isn't this an issue if variable sized stack objects are present as well?
Apr 10 2018
Apr 6 2018
Apr 3 2018
Another thought, would it be worthwhile to add a serialized MIR function property for this usesRedZone flag so you could write MIR tests for this instead?
Apr 2 2018
Mar 28 2018
I get it now. LGTM
I'm not sure I see what's wrong with removing the second COPY in either of these cases, unless the semantics of the multi-register COPY are that each component register is copied in order?
Mar 20 2018
Mar 16 2018
Mar 5 2018
Feb 27 2018
Feb 23 2018
Would it not be simpler to just add a subtarget bool that controls whether the problematic opcodes are emitted and set it for your subtargets (similar to the way STRQroIsSlow is handled)? That way you could avoid generating them not just in this pass, but in ISel and frame lowering?
Feb 22 2018
Add test/CodeGen/AMDGPU/postra-norename.mir as requested by Quentin
Feb 21 2018
@arsenm Could you take a look at the AMDGPU change?
New version based on review feedback.
Feb 12 2018
Checking hasExtraRegAllocReg in isRenamable sounds reasonable to me (though it may require the interface to be changed slightly since we'll need the opcode). An alternative that is closer to what I initially proposed would be to have setReg and addOperand both clear the isRenamable flag. This way the only operands that would be renamable are those that have been untouched since they were marked renamable by the register allocator. This would also allow us to get rid of the extra code that I added to clear the renamable flag in some post-RA target code. Targets would be able to opt in to preserving the renamable flag by correctly preserving the renamable flag in post-RA transformed code, but by default the flag would be set conservatively. I'm going to try this approach out to see what it looks like and what kind of impact it has unless someone has a reason they think it won't work.
Feb 7 2018
Feb 5 2018
I've thought about this some more and tested it out on Falkor. As currently written this change causes SIMD store instructions to not have pre/post increments folded into them, causing minor performance regressions. I have the following general reservations as well:
- does using the max latency of the load/store and add make sense given that the operations are dependent?
- does always favoring latency over number of uops (an approximation of throughput) make sense? unless the operation is on the critical path I would think not.
Feb 1 2018
Jan 31 2018
Jan 30 2018
Jan 29 2018
Add comments to address feedback from Quentin
Jan 24 2018
D42449 adds the machine verifier check that no renamable operands are assigned to reserved registers
I've uploaded a new version that addresses most of the issues brought up by @qcolombet.
Updates based on Quentin's comments
Jan 23 2018
Jan 19 2018
Jan 16 2018
Jan 11 2018
Jan 8 2018
@tstellar could you specifically take a look at the AMDGPU backend changes?
Jan 3 2018
I'm not sure using the redzone is safe on all targets.
Okay, I get it now, I had the wrong case in mind. I was thinking that you could put the scavenge slot close to [sp], but that doesn't work since you have a large outgoing stack parameter area.
This change looks good to me, but you might want to get a second opinion since my thinking on this hasn't been that clear.
I think what is bothering me about this change is that the return value of hasFP() now seems more dynamic. Did you consider a potentially simpler fix of just creating the spill slot as a fixed stack object with a hard-coded offset that would guarantee it is directly addressable from the SP/FP?
Dec 29 2017
Dec 22 2017
I'll resubmit this for review if I ever get back around to it
Dec 18 2017
One thing I noticed when looking at this: it seems fragile to me to have MachineFrameInfo::computeMaxCallFrameSize() and PEI::calculateFrameInfo() doing essentially the same calculation with duplicated code.
Would it make sense to add an assert to PEI::calculateFrameInfo that checks that the previously calculated MaxCallFrameSize isn't smaller than the one calculated later?