This is an archive of the discontinued LLVM Phabricator instance.

[x86] Switch the x86 backend's execution domain order to prefer the packed integer domain.
AcceptedPublic

Authored by chandlerc on Feb 4 2015, 2:31 AM.

Details

Summary

While the packed single-precision domain has the smallest encodings and
was the first domain, it isn't a very good default. Consider operations
which could occur in any domain: loads, stores, shuffles, and, or, xor.
All of these operations are the same latency in the integer domain and
the floating point domains, but in many cases have 2x or 3x the
throughput in the integer domain! When in the floating point domain,
they end up bottlenecked on a single execution port in every micro
architecture since sandybridge, and probably some older ones as well.

This in turn uncovers some issues with our execution domain settings in
the backend. I've got a patch for one that is an independent improvement
and I'll submit shortly -- it adds an execution domain to movss and
movsd, both of which specifically target floating point domains. Adding
these causes us to match up floating point domain code much better
already.

A second issue I'm investigating is with vinsertf128 near a packed
double domain instruction becoming vinserti128.

Aside from these issues, everything I'm seeing looks like a huge
improvement to domain crossing and generally using the higher throughput
integer units. What do others think?

I'll update this patch with the test changes if folks want, but it will
be *MANY* updates to tests to make this change.

Diff Detail

Event Timeline

chandlerc updated this revision to Diff 19306.Feb 4 2015, 2:31 AM
chandlerc retitled this revision from to [x86] Switch the x86 backend's execution domain order to prefer the packed integer domain..
chandlerc updated this object.
chandlerc edited the test plan for this revision. (Show Details)
chandlerc added reviewers: craig.topper, qcolombet.
chandlerc added a subscriber: Unknown Object (MLST).
chandlerc updated this revision to Diff 19309.Feb 4 2015, 3:20 AM

Rotated the other table thanks to Craig Topper pointing out where I was being
blind. =]

The issue with vinsertf128 is now fixed. I've submitted the fix for movss.

However, most of the noise I see left from this change comes form the fact that
we also don't mark *scalar* floating point instructions as executing on the
packed floating point domain, even though they essentially do. I'm working on
a fix to that independently. Currentyl there are 105 tests that require updates
with this change, and I'm hoping the scalar operations specifying their domain
will reduced that significantly.

qcolombet accepted this revision.Feb 4 2015, 1:15 PM
qcolombet edited edge metadata.

LGTM.

Q.

This revision is now accepted and ready to land.Feb 4 2015, 1:15 PM

FWIW, I've run some very generic benchmarks (the only ones that are vector
heavy, are FP vector heavy) and have measured no significant swings in
performance in either direction. Everything is below the (frustratingly
high) noise floor.

Still planning to leave this under review for a while as i'd have to update
maaaany tests to submit it and I'm happy to let more folks benchmark in the
mean time.