This patch makes vector spills valid for tail predication when all loads from the same stack slot are within the loop.
Details
Diff Detail
Event Timeline
| llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp | ||
|---|---|---|
| 1143 | Is it worth splitting this out into a new function? To keep this stack logic separate in case it needs to grow even more sophisticated in the future. | |
| 1147 | I think we may need to distinguish between a load that accesses SP and a stack spill slot. A SP address could be some other (potentially escaping) stack object. I think that info is usually available through the MMO, maybe via FrameInfo::isSpillSlotObjectIndex. I was looking around for a way to get all the MachineInstr's that access a specific stack slot, but couldn't fine one better than searching through a lot of instructions. | |
| 1150–1152 | I think this comment would be better if we explain what we do do, not why we don't use RDA (or maybe both). Explaining that we are searching all blocks after the loop for another access to the same stack slot. | |
| 1157 | Use I or It or Idx or something? | |
The searching feels gnarly, like it can end up searching a lot of instructions. I don't know of a better idea though, and it should be rare that it needs to search.
| llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp | ||
|---|---|---|
| 1087 | This function can be made static. | |
| 1090–1092 | getPseudoValue can return nullptr on failure, which needs to be checked. Or you may be able to use if (const auto *FS = dyn_cast_or_null<FixedStackPseudoSourceValue>(PseudoValue)) to get the same effect? | |
| 1106 | This will have to check that it has at least one memoperand. | |
| 1121 | Frame indexes look like they are usually int's. Does GetFrameIndex return the value by parameter reference because FrameIndexes can be negative? So we can't return -1 for not found? | |
| 1122 | Check for isSpillSlotObjectIndex(FI) too? | |
| 1124 | I think you can remove the ", 1", and let it choose a default small size. Same below. | |
| 1139 | If we find another stack store, does that mean we can stop the search? | |
| 1143 | As a function looks good. It's nice and self-contained. | |
| 1148 | Should this check that Succ is not already in Frontier too? | |
Make ValidateMVEStore static, use dyn_cast, check isSpillSlotObjectIndex, make GetFrameIndex return an int, remove the size for the SmallVector and check the size of memoperands.
| llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp | ||
|---|---|---|
| 1090–1092 | The dyn_cast<...> option looks good to me. 👍 | |
| 1106 | Ah I thought that would be guaranteed by the instruction being a V[STR|LDR]WU32. I'll add a check. | |
| 1121 | I chose for it to be taken as reference so that a boolean could be returned to indicate success, but if it can just return the frame index or -1 in case of error then that would be great. From looking through the codebase (this is a great use-case for more function documentation) it looks like frame indexes will be positive so I think returning -1 in case of failure is a good option. | |
| 1124 | SmallPtrSet requires an initial size but I can remove it for SmallVector. | |
| 1139 | If we have found another store to the same stack slot then we stop the search as any further loads in the same BB won't invalidate the store we're validating, since they are dependent on that second store instead. | |
| 1148 | Yeah that can't hurt, however I can't find a nice contains function in SmallVector. I wish container classes had the same interface. | |
Thanks. This looks good, minus some mostly Nitpicks.
| llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp | ||
|---|---|---|
| 1090 | const PseudoSourceValue* PseudoValue = | |
| 1104 | == 1 is probably fine. I wouldn't expect it to be higher. | |
| 1121 | I think they may be able to go negative, but a negative argument means a parameter passed via the stack (I think). So treating it the same as an invalid value sounds like it should work fine. | |
| 1124 | Oh of course, yeah. I hadn't realized it didn't have the same interface. SmallPtrSet sounds good, but you may want to increase the default to 4. It won't make a large difference, but 4 pointers are not going to hurt anyone. | |
| 1131 | unsigned int -> unsigned, is more common in llvm. | |
| 1139 | Sure. I was thinking in that case it wouldn't have to add any Successors, so we could potentially save from looking through a lot of other BB's. | |
| 1148 | There is an is_contained(Frontier, Succ) in ADT, that is a wrapper around std::find. | |
This function can be made static.