This is an archive of the discontinued LLVM Phabricator instance.

[X86][SSE] Fix domains for VZEXT_LOAD type instructions
ClosedPublic

Authored by RKSimon on Dec 12 2016, 11:58 AM.

Repository
rL LLVM

Event Timeline

RKSimon updated this revision to Diff 81122. Dec 12 2016, 11:58 AM
RKSimon retitled this revision from to [X86][SSE] Fix domains for VZEXT_LOAD type instructions.
RKSimon updated this object.
RKSimon set the repository for this revision to rL LLVM.
RKSimon added a subscriber: llvm-commits.

Is there actually a domain crossing penalty for these cases?
(Adding Dave & Zia as authoritative sources of truth :-) )

The penalties are minor (and non-existent on some of the latest architectures), but they are definitely present on pre-AVX targets. It does allow us to be consistent along an instruction chain, which isn't a bad thing. By allowing domain switching we also encourage float domain instructions, which often have shorter encodings.

zansari edited edge metadata. Dec 12 2016, 4:25 PM

I'm working on getting some confirmation on the latest ones, but most current Core architectures suffer a 1-clk penalty switching between fp and int domains. This doesn't include the Atom line, which can do it for free.

The 1 clk isn't insignificant if you're latency bound and you do a lot of switching on the critical path. I'm not familiar with the code that decides to switch, but can it take architectures and maybe code size into consideration (i.e. favor smaller encoding with Os/Oz)?
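
To make the latency concern concrete, here is a minimal sketch that counts fp/int transitions along a dependency chain and charges a fixed cost for each one. The `Domain` enum and `crossingPenalty` helper are hypothetical names used only for illustration, and the 1-clk default is simply the estimate quoted above, not a measured value for any particular core.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical execution domains; illustrative only, not LLVM's own types.
enum class Domain { Int, Float };

// Walk a dependency chain and charge a fixed penalty for every
// transition between the integer and floating-point domains.
// clocksPerCrossing defaults to the 1-clk estimate from the discussion.
inline unsigned crossingPenalty(const std::vector<Domain> &chain,
                                unsigned clocksPerCrossing = 1) {
  unsigned penalty = 0;
  for (std::size_t i = 1; i < chain.size(); ++i)
    if (chain[i] != chain[i - 1])
      penalty += clocksPerCrossing;
  return penalty;
}
```

Under this model a chain that alternates domains on every instruction pays the penalty at each step, while a chain that stays in one domain pays nothing, which is why the cost only matters when the switches sit on the critical path.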

Float domain is the default because we assume that float instructions are at least as small as the equivalent double/integer alternatives (this was true in the SSE days, not so certain about the latest instruction sets), which is why most domain-agnostic code ends up using floats. Through that we get some optsize benefit automatically without requiring Os/Oz. There is nothing to ensure we always use the shortest instruction (domain switches be damned).

We don't do much for specific architectures - we currently filter just by a target's instruction set - as the code is really only there to try and maintain a particular domain as long as possible.
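
The behaviour described above (stay in the current domain as long as possible, fall back to float when nothing constrains the choice) can be sketched roughly as follows. This is a toy model and not LLVM's actual implementation: `Constraint`, `Domain`, and `assignDomains` are hypothetical names, and the real code also has to deal with the available instruction set, multiple operands, and basic-block boundaries.

```cpp
#include <vector>

// Hypothetical types for illustration; these are not LLVM's own.
enum class Domain { Int, Float };
enum class Constraint { IntOnly, FloatOnly, Either };

// Assign a domain to each instruction in a linear chain.
// Pinned instructions keep their own domain; domain-agnostic ones
// inherit the domain of the instruction before them. Float is the
// assumed default when no earlier instruction has pinned a domain.
inline std::vector<Domain> assignDomains(const std::vector<Constraint> &chain) {
  std::vector<Domain> out;
  out.reserve(chain.size());
  Domain current = Domain::Float; // default, per the comment above
  for (Constraint c : chain) {
    if (c == Constraint::IntOnly)
      current = Domain::Int;
    else if (c == Constraint::FloatOnly)
      current = Domain::Float;
    // Constraint::Either leaves `current` unchanged.
    out.push_back(current);
  }
  return out;
}
```

In this sketch a domain switch happens only where a pinned instruction forces one; nothing in it tries to pick the shortest encoding, which matches the point that code size is a side effect of the float default rather than a goal.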

andreadb accepted this revision. Dec 15 2016, 6:55 AM
andreadb edited edge metadata.

LGTM.

This revision is now accepted and ready to land. Dec 15 2016, 6:55 AM
This revision was automatically updated to reflect the committed changes.