[X86][SSE] Improve DIV/SQRT throughput estimates for SB/HW schedule models
Needs Review (Public)

Authored by RKSimon on Wed, Apr 19, 4:38 AM.

Details

Summary

The current DIV/SQRT throughput estimates for SB/HW schedule models use the default 1cy value, which is highly unrealistic.

I've updated the values with estimates based on the latencies, which is typically about right for DIV/SQRT units; it's also in the ballpark of what Agner suggests. If anyone has more accurate values that would be great, but these alone should be a major improvement to scheduling.
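For readers unfamiliar with the schedule models, here is a rough sketch of what such an entry can look like. It assumes the usual `WriteRes` fields from TargetSchedule.td (`Latency`, `NumMicroOps`, `ResourceCycles`); the numbers are hypothetical placeholders, not the values in this patch. The fragment is meant to sit inside the Haswell model's `let SchedModel = ... in { }` block and is not standalone.

```
// Hypothetical values for illustration only; not taken from this patch.
// Register form: when ResourceCycles is left unset, each listed resource
// is treated as busy for 1 cycle, which is the unrealistic default the
// summary refers to.
def : WriteRes<WriteFDiv, [HWPort0]> {
  let Latency = 12;          // cycles until the result is available
  let NumMicroOps = 1;       // register form issues as a single uop
  let ResourceCycles = [12]; // port busy time, estimated from the latency
}

// Folded-load form: one extra uop on a load port.
def : WriteRes<WriteFDivLd, [HWPort0, HWPort23]> {
  let Latency = 16;             // assumed: register latency plus load latency
  let NumMicroOps = 2;          // divide uop plus load uop
  let ResourceCycles = [12, 1]; // one entry per resource listed above
}
```

With `ResourceCycles` set this way, the scheduler sees the divider as occupied for roughly the whole latency, so back-to-back divides are no longer modelled as issuing every cycle.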

Diff Detail

Repository
rL LLVM
RKSimon created this revision. Wed, Apr 19, 4:38 AM
avt77 added a comment. Fri, Apr 21, 2:51 AM

What are your plans here? I've just checked (with help of "-print-schedule=true") IMUL and LEA for Jaguar: they are completely wrong if we compare with numbers from http://www.agner.org/optimize/instruction_tables.pdf. Are we going to change all these things step-by-step?

What are your plans here? I've just checked (with help of "-print-schedule=true") IMUL and LEA for Jaguar: they are completely wrong if we compare with numbers from http://www.agner.org/optimize/instruction_tables.pdf. Are we going to change all these things step-by-step?

The basic process will be: add thorough tests, identify issues, fix issues (either direct commit or reviewed patch if it warrants discussion). I'm intending to initially focus on the SSE/AVX instructions so if you want to add scheduler tests for the mul/imul/lea/etc. instructions then I say go for it.

gadi.haber added inline comments. Mon, Apr 24, 11:30 PM
lib/Target/X86/X86SchedHaswell.td
Line 139: let NumMicroOps = 1;
Line 143: let NumMicroOps = 2;
Line 148: let NumMicroOps = 1;
Line 152: let NumMicroOps = 2;

lib/Target/X86/X86SchedSandyBridge.td
Line 126: let NumMicroOps = 1;
Line 130: let NumMicroOps = 2;
Line 135: let NumMicroOps = 1;
Line 139: let NumMicroOps = 2;

RKSimon updated this revision to Diff 96539. Tue, Apr 25, 6:05 AM

Add NumMicroOps and regenerate (adds 256-bit vector cases which were added recently).