Avoid HW prefetcher instruction tag collisions in loops by inserting
MOVs to change the base address register of strided loads.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
lib/Target/AArch64/AArch64FalkorHWPFFix.cpp | ||
---|---|---|
188 ↗ | (On Diff #106459) | /// |
214 ↗ | (On Diff #106459) | I generally prefer long instruction list being in AArch64InstrInfo. I think a good guide is asking yourself, what happens if someone adds a new Load/Store instruction: does this code need to be adapted and will the person adding the load/store be able to find this code? AArch64InstrInfo, also already has various getMemXXX functions, at a first glance it looks like code duplication. Did you check whether this code could use the existing functions or the existing functions could be rewritten using this code? |
684 ↗ | (On Diff #106459) | It looks like you only need the scavenger to find free registers. In this case I'd recommend using the slightly leaner/simpler LiveRegUnits class instead:
available(Reg) && !MRI.isReserved(Reg) which avoids constructing intermediate bitsets (and hence saves you some malloc/free operations). |
685–687 ↗ | (On Diff #106459) | Less auto please. |
test/CodeGen/AArch64/falkor-hwpf-fix.mir | ||
30 ↗ | (On Diff #106459) | I think this should be surrounded by ---, ...; looks like the llvm yaml parser is lenient. |
lib/Target/AArch64/AArch64FalkorHWPFFix.cpp | ||
---|---|---|
214 ↗ | (On Diff #106459) | I considered this, but some of the info computed by this function is specific to the Falkor HW prefetcher tag implementation, so I figured if I'm going to need a giant switch op opcodes, I might as well compute everything I need to know in one place, even though some of it is a bit redundant. To your point about adding instructions in the future: I don't think that applies in this case since the opcodes supported by Falkor are not going to change going forward (unless we add more pseudo load/store instructions I guess, in which case those would hopefully have been expanded by the time this pass runs). |
685–687 ↗ | (On Diff #106459) | I got rid of the auto on 'MBB', but left it on the iterator since that seems fairly standard, and the variable is only used on the next line (which doesn't use 'auto'). |