Depends on D114642
Original review https://reviews.llvm.org/D112201
OS Laboratory. Huawei Russian Research Institute. Saint-Petersburg
Differential D117003
[SchedModels][CortexA55] Add ASIMD integer instructions
Authored by kpdev42 on Jan 11 2022, 3:04 AM.
Event Timeline

I'm not sure how much I love the predicate matching in the scheduler, as opposed to just matching instruction opcodes. There are quite a few instructions that narrow or widen vectors, where the register types are misleading. Can you at least move the code so it doesn't look like it is bolted onto the end of the existing schedule? :)
Hello. I'm getting a few reports of this making performance worse, especially on Cortex-A510 CPUs. I think that adding the forwarding paths that are present on the A55 but not available on the A510 is causing more hazards and making performance drop significantly in places, because the code is compiled for cpu=generic. The A510 generally has higher throughput, but also higher latencies in places. We may need to back out some of these changes, even if it makes the A55 model less precise, at least in the short term. We might need to take the route of not hurting other CPUs, providing it doesn't help the A55 performance much.

I have partially reverted this in 61b616755aced8ed7afc48ffd152f02194b9d201. I was trying not to undo the whole thing, so I just removed the forwarding paths and some other parts that were making performance worse around the "L" instructions. The rest was honestly making some performance worse too, but some things were better, and the parts removed seemed to be causing much of the change. We probably need to be more careful going forward to benchmark on more CPUs, not just the Cortex-A55. The schedule is used by any -mcpu=generic compile, so even if it's a less accurate model of the A55, we may need to strike more of a balance between different CPUs until we have a better option.
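
To illustrate the hazard described in the comments above, here is a minimal Python sketch of how a modeled forwarding path can hurt a core that lacks it. It assumes the scheduler treats a forwarding path as a read advance subtracted from the producer's write latency, and that it places the consumer exactly that many cycles after the producer. The function names and all cycle counts are hypothetical and are not taken from the A55 or A510 models.

```python
def operand_latency(write_latency: int, read_advance: int) -> int:
    """Latency the scheduler assumes between a producer and its consumer.

    A forwarding path is modeled as a read advance that shortens the
    producer's write latency, clamped at zero.
    """
    return max(0, write_latency - read_advance)


def stall_cycles(assumed_latency: int, actual_latency: int) -> int:
    """Cycles lost when the consumer is issued `assumed_latency` cycles
    after the producer, but the result is only ready after `actual_latency`."""
    return max(0, actual_latency - assumed_latency)


# Hypothetical numbers, for illustration only.
WRITE_LATENCY = 4   # cycles until the producer's result is written
READ_ADVANCE = 1    # cycles saved by the modeled forwarding path

# What the schedule assumes when the forwarding path is in the model.
modeled = operand_latency(WRITE_LATENCY, READ_ADVANCE)

# On a core with the forwarding path, the assumption holds.
stall_with_forwarding = stall_cycles(modeled, modeled)

# On a core without it, the full write latency applies and the consumer stalls.
stall_without_forwarding = stall_cycles(modeled, WRITE_LATENCY)

print(modeled, stall_with_forwarding, stall_without_forwarding)
```

Under these assumptions, a schedule that is tight for the core with forwarding leaves the consumer waiting on the core without it, which is the kind of cost a shared -mcpu=generic schedule has to weigh.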