Goldmont is similar to Silvermont, so we should probably use the Silvermont cost model as a starting point.
Diff Detail: Repository rL LLVM
I suspect this is wrong for PMULLD. I think that improved to a single uop on Goldmont.
It turns out that almost all of the operations that are slow in the SLM table have been improved on GLM, so this isn't the right table to use as-is. The only thing that didn't change much was floating-point division.
I'm hoping to start work on PR36550 reasonably soon - a better approach might be to (a) ensure that the SLM model matches what the TTI says it should and (b) decide how best to provide a GLM model. What do you think?
Introduced a GLM-specific table that overrides FDIV, and overrode FSQRT for both GLM and SLM. For packed operations, it appears only half of the 128-bit vector is processed at a time on both SLM and GLM, since the packed throughput is twice the scalar throughput. The default SSE42 throughputs we were getting otherwise don't match that behavior.
Throughput data for GLM was taken from table 16-17 in the latest Intel Optimization Manual.
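To make the override scheme concrete, here is a minimal sketch of the lookup order being proposed: a small GLM-only table (FDIV stayed slow), a table shared by GLM and SLM (FSQRT), and the full SLM table consulted only for SLM, with everything else falling through to the generic cost. All names, costs, and the enum are illustrative assumptions for this sketch, not LLVM's actual X86TargetTransformInfo API or its real numbers.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical ops and cost entries; the real code keys on ISD opcodes and MVTs.
enum Op { FDIV_F32, FSQRT_F32, PMULLD };

struct CostEntry { Op op; unsigned cost; };

static std::optional<unsigned> lookup(const std::vector<CostEntry> &Table, Op O) {
  for (const auto &E : Table)
    if (E.op == O)
      return E.cost;
  return std::nullopt;
}

unsigned getArithCost(bool IsGLM, bool IsSLM, Op O) {
  // GLM-only overrides: FDIV is the one op that stayed slow on Goldmont.
  static const std::vector<CostEntry> GLMTable = {{FDIV_F32, 17}};
  // Overrides shared by GLM and SLM (FSQRT, per the patch description).
  static const std::vector<CostEntry> SharedTable = {{FSQRT_F32, 20}};
  // SLM-only entries: most of these improved on GLM, so GLM must not see them.
  static const std::vector<CostEntry> SLMTable = {{PMULLD, 11}};

  if (IsGLM)
    if (auto C = lookup(GLMTable, O))
      return *C;
  if (IsGLM || IsSLM)
    if (auto C = lookup(SharedTable, O))
      return *C;
  if (IsSLM)
    if (auto C = lookup(SLMTable, O))
      return *C;
  return 1; // fall through to the generic SSE42-style cost
}
```

With this ordering, GLM picks up the SLM FSQRT penalty and its own FDIV penalty, but PMULLD on GLM falls through to the default cost rather than inheriting SLM's slow entry.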
Should I make a copy of the SLM scheduler and use it for GLM so we can start refining it?
Probably - I was looking for tidy ways to override models such as these - architecturally the same but with a few latency tweaks - but couldn't see anything. It's probably easier just to copy it, maybe once you're happy with the accuracy of the SLM model.
These changes LGTM though