This is an archive of the discontinued LLVM Phabricator instance.

[WIP] syndicate code generation between dynamic alloca and static alloca during stack clash probing
AbandonedPublic

Authored by serge-sans-paille on May 7 2020, 10:24 AM.

Details

Summary

It's a WIP, but as spotted by @jonpa, there was room for improvement there, so I'm doing my best.

Event Timeline

Herald added a project: Restricted Project. May 7 2020, 10:24 AM
jonpa added a comment. May 7 2020, 10:59 AM

I think this was actually @uweigand's suggestion, but for now we are going to first try to make sure that the prologue is as fast as possible and not combine the reg/imm case. But it will be interesting to know if you can get the prologue code equally fast this way...

I also wonder if the X86 backend is adding the live-in list entries on the new MBBs in inlineStackProbe(). I discovered that I had to call recomputeLiveIns() on the new blocks (SystemZ patch is updated, so you can see what I mean there...)

I think this was actually @uweigand's suggestion, but for now we are going to first try to make sure that the prologue is as fast as possible and not combine the reg/imm case. But it will be interesting to know if you can get the prologue code equally fast this way...

Yeah, that's my goal, having both equally fast.

I also wonder if the X86 backend is adding the live-in list entries on the new MBBs in inlineStackProbe(). I discovered that I had to call recomputeLiveIns() on the new blocks (SystemZ patch is updated, so you can see what I mean there...)

Indeed, I manually added the LiveIn for the testMBB. I'm surprised you need to recompute them instead of just adding the extra registers by hand.

jonpa added a comment. May 11 2020, 7:03 AM

All the callee-saved registers are added as live-in on the entry block, and most of them are saved (and killed) immediately on SystemZ. The function argument registers, however, are also live-in and remain live past the stack-probing loop. I found it most convenient to make those argument registers in particular live-in after the probing by recomputing the liveness...

All the callee-saved registers are added as live-in on the entry block, and most of them are saved (and killed) immediately on SystemZ. The function argument registers, however, are also live-in and remain live past the stack-probing loop. I found it most convenient to make those argument registers in particular live-in after the probing by recomputing the liveness...

I've got no prior knowledge of SystemZ, so pardon the seemingly stupid question, but how does the live-in of the entry block impact the live-in of the probing loop?

jonpa added a comment. May 11 2020, 8:43 AM

There are no stupid questions here :-) Well, the entry block is split at the point of the stub, with new MBBs inserted in between those two new halves. So my understanding is that any phys reg that is live at the point of the stub should also be live after it (in the rest of the original MBB), and therefore it should be live-in to all the other new blocks created, so that the regs are properly modelled as live-through in those blocks...
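
To make the point above concrete, here is a minimal, self-contained sketch of a backward liveness computation on a toy CFG shaped like a split prologue (entry, probing loop, rest of the original block). It is not LLVM code and not what `recomputeLiveIns()` does internally: the `Block` struct, `computeLiveIns`, `makeSplitPrologue`, and the register names `sp`, `limit`, and `arg0` are all hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a machine basic block; all names here are hypothetical.
struct Block {
  std::set<std::string> uses;     // registers read before any write in the block
  std::set<std::string> defs;     // registers written in the block
  std::vector<std::size_t> succs; // indices of successor blocks
  std::set<std::string> liveIn;   // filled in by computeLiveIns()
};

// Classic backward liveness, iterated to a fixed point:
//   liveOut(B) = union of liveIn(S) over all successors S
//   liveIn(B)  = uses(B) + (liveOut(B) - defs(B))
void computeLiveIns(std::vector<Block> &cfg) {
  bool changed = true;
  while (changed) {
    changed = false;
    // Visiting blocks in reverse order converges faster, but any order works.
    for (std::size_t i = cfg.size(); i-- > 0;) {
      Block &b = cfg[i];
      std::set<std::string> in = b.uses;
      for (std::size_t s : b.succs) {
        assert(s < cfg.size() && "successor index out of range");
        for (const std::string &reg : cfg[s].liveIn)
          if (!b.defs.count(reg))
            in.insert(reg); // live-through: needed later, not redefined here
      }
      if (in != b.liveIn) {
        b.liveIn = std::move(in);
        changed = true;
      }
    }
  }
}

// Entry block split around a probing loop:
//   0: entry -> 1: loop -> 2: tail, with a back edge loop -> loop.
std::vector<Block> makeSplitPrologue() {
  std::vector<Block> cfg;
  cfg.push_back(Block{{"sp"}, {"sp", "limit"}, {1}, {}});    // entry
  cfg.push_back(Block{{"sp", "limit"}, {"sp"}, {1, 2}, {}}); // probing loop
  cfg.push_back(Block{{"arg0", "sp"}, {}, {}, {}});          // rest of the block
  return cfg;
}
```

In this toy model the loop never reads `arg0`, yet `arg0` ends up in the loop's live-in set because the tail block uses it and nothing on the way defines it. That is the "live-through" situation described above.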

There are no stupid questions here :-) Well, the entry block is split at the point of the stub, with new MBBs inserted in between those two new halves. So my understanding is that any phys reg that is live at the point of the stub should also be live after it (in the rest of the original MBB), and therefore it should be live-in to all the other new blocks created, so that the regs are properly modelled as live-through in those blocks...

I can't find a reference in the docs that states whether the live-ins of a block include the live-ins of its successors (which you seem to imply) or only those of the block itself (which was my original belief).

Apart from the fruitful discussion with @jonpa, I'm going to drop that attempt. I can't find a correct way to syndicate these code paths without breaking existing method visibility.

jonpa added a comment. May 12 2020, 3:20 AM

In theory, all uses must have a prior definition in the IR model, and the IR is broken if one does not. If there is a use in an MBB of a value defined in another MBB that is not an immediate predecessor, but further up in the CFG, that value should be marked as live-in to that MBB and also live-in to any other MBB that is on the path to the use (to model "live-through"). The machine verifier already has checks for this, and it is under development: https://reviews.llvm.org/D78586.
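
As a rough illustration of the rule stated above, the following self-contained sketch checks a toy CFG for live-ins that are not available at the end of a predecessor. It is only an assumption-laden model of the idea, not the MachineVerifier's actual logic and not the check proposed in D78586. The `ToyMBB` type, `findMissingLiveIns`, `makeProbedPrologue`, and the register names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a machine basic block; not an LLVM type.
struct ToyMBB {
  std::string name;
  std::set<std::string> liveIn;   // registers declared live on entry
  std::set<std::string> defs;     // registers written inside the block
  std::vector<std::size_t> succs; // indices of successor blocks
};

// Returns one message per (predecessor, successor, register) triple where the
// successor declares a live-in that the predecessor neither defines nor
// receives as a live-in itself.
std::vector<std::string> findMissingLiveIns(const std::vector<ToyMBB> &cfg) {
  std::vector<std::string> errors;
  for (const ToyMBB &pred : cfg) {
    for (std::size_t s : pred.succs) {
      assert(s < cfg.size() && "successor index out of range");
      for (const std::string &reg : cfg[s].liveIn)
        if (!pred.defs.count(reg) && !pred.liveIn.count(reg))
          errors.push_back(reg + " is live-in to " + cfg[s].name +
                           " but not available at the end of " + pred.name);
    }
  }
  return errors;
}

// Toy prologue: entry -> loop -> tail, with a back edge loop -> loop.
// The flag controls whether the argument register is listed as live-in
// to the probing loop.
std::vector<ToyMBB> makeProbedPrologue(bool loopHasArgLiveIn) {
  std::set<std::string> loopLiveIn = {"sp", "limit"};
  if (loopHasArgLiveIn)
    loopLiveIn.insert("arg0");
  std::vector<ToyMBB> cfg;
  cfg.push_back(ToyMBB{"entry", {"sp", "arg0"}, {"sp", "limit"}, {1}});
  cfg.push_back(ToyMBB{"loop", loopLiveIn, {"sp"}, {1, 2}});
  cfg.push_back(ToyMBB{"tail", {"sp", "arg0"}, {}, {}});
  return cfg;
}
```

With `makeProbedPrologue(false)` the check reports a single problem, because `arg0` is expected by the tail block but is neither defined in nor live-in to the loop. With `makeProbedPrologue(true)` the list of problems is empty.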