[WIP][RFC][Utils] Helper script to check sanity of cost tables vs scheduler models
Needs Review · Public

Authored by RKSimon on Fri, Jun 4, 5:27 AM.

Details

Summary

We're seeing more and more vectorizer perf regressions that turn out to be issues with values reported from the cost tables.

I've written this (admittedly hacky) helper script to compare the estimated (worst-case) costs reported by the cost tables against groups of similar CPUs, as represented by their scheduler models (+ llvm-mca). For each common IR instruction/intrinsic + type (up to a CPU's maximum vector width), it generates the IR/assembly, runs llvm-mca to get the reciprocal-throughput costs, compares them against 'opt --analyze --cost-model', and reports when the cost model doesn't match the worst-case value reported by the CPUs in a given 'level' (e.g. avx1 - btver2/bdver2/sandybridge). Only reciprocal-throughput costs are handled at this time.
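The comparison step can be sketched roughly as follows. This is a minimal illustration and not the code from the diff: the names `CPU_LEVELS`, `worst_case_cost` and `check_cost` are hypothetical, the llvm-mca results are assumed to have been parsed already into a dict of reciprocal throughputs per CPU, and rounding up to a whole cost unit (minimum 1) is an assumption about how the two numbers are made comparable.

```python
import math

# Hypothetical grouping, taken from the avx1 example in the summary.
CPU_LEVELS = {"avx1": ["btver2", "bdver2", "sandybridge"]}


def worst_case_cost(rthroughput_by_cpu):
    """Largest reciprocal throughput across the CPUs in a level.

    Assumption: the value is rounded up to a whole cost unit, minimum 1.
    """
    return max(1, math.ceil(max(rthroughput_by_cpu.values())))


def check_cost(table_cost, rthroughput_by_cpu):
    """Return None on a match, else a (table_cost, expected_cost) pair."""
    expected = worst_case_cost(rthroughput_by_cpu)
    return None if table_cost == expected else (table_cost, expected)


# Made-up numbers purely for illustration, not real llvm-mca output.
example = {"btver2": 2.0, "bdver2": 1.5, "sandybridge": 0.5}
print(check_cost(2, example))
print(check_cost(20, example))
```

A mismatch only says that the table and the models disagree; it does not say which side is wrong.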

If run without any args, the script will exhaustively (and slowly) test every cpulevel for every IR/type; you can specify a cpulevel and/or op to focus the test runs.

The script writes out the same 4 temp files to the cwd on each iteration (fuzz.ll, fuzz.s, analyze.txt and mca.txt). If you use the --stop-on-diff command line argument, you can easily grab these to dump into godbolt.org for triage.
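Because the temp files are overwritten on every iteration, stopping at the first mismatch is what keeps them useful for triage. A rough sketch of that control flow, assuming a hypothetical `run_checks` driver and a caller-supplied `check` callable that returns a truthy value on a mismatch (neither is taken from the diff):

```python
def run_checks(cases, check, stop_on_diff=False):
    """Run check(case) for each case and collect the mismatching cases.

    With stop_on_diff set, return at the first mismatch so that the files
    written for that case (fuzz.ll, fuzz.s, analyze.txt, mca.txt) are not
    overwritten by a later iteration.
    """
    mismatches = []
    for case in cases:
        if check(case):
            mismatches.append(case)
            if stop_on_diff:
                break
    return mismatches


# Toy check: treat even numbers as "mismatches".
print(run_checks([1, 2, 3, 4], lambda c: c % 2 == 0, stop_on_diff=True))
```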

There are still a lot of discrepancies reported, some in the cost tables and others in the scheduler models. Many are obvious (v2i32->v2f64 sitofp doesn't take 20 cycles...), but this script has to be used with due care and with the initial assumption that none of the cost tables, generated assembly or models/llvm-mca reports are perfectly correct.

This is very much a WIP (just count the TODO comments...) but I wanted to get this out so people can check my reasoning as I continue to develop this. There's plenty still to do before this is ready to be committed.

I've written this primarily for x86 but can't see much that would make it tricky to support other targets.

Please don't judge me on my rubbish python skills :)

Event Timeline

RKSimon created this revision. Fri, Jun 4, 5:27 AM
RKSimon requested review of this revision. Fri, Jun 4, 5:27 AM
Herald added a project: Restricted Project. Fri, Jun 4, 5:27 AM

High-level comment: I have been thinking about this for a while, and I'm basically set on at least trying to come up with infrastructure to autogenerate a cost model for a CPU. The differences between the worst-case and best-case models are too great to ignore.

I'm happy to resurrect D46276 someday, but until we actually have accurate, well maintained/tested models for a broad range of CPUs (for instance anything that shows up on https://store.steampowered.com/hwsurvey) we can't rely on them.

Clarification: I was *NOT* talking about just autogenerating the generic cost model as the worst case over all the models we have, but about having full custom cost models for specific, hand-picked sched models.

Matt added a subscriber: Matt. Fri, Jun 4, 9:02 AM
RKSimon updated this revision to Diff 349898. Fri, Jun 4, 9:37 AM

Thanks to @gbedwell for the Python cleanup.