This is an archive of the discontinued LLVM Phabricator instance.

[X86][SSE] Add general lowering of nontemporal vector loads
ClosedPublic

Authored by RKSimon on Jun 3 2016, 8:00 AM.

Details

Summary

Currently the only way to use the (V)MOVNTDQA nontemporal vector loads instructions is through the int_x86_sse41_movntdqa style builtins.

This patch adds support for lowering nontemporal loads from general IR, allowing us to remove the movntdqa builtins in a future patch.

We currently still fold nontemporal loads into suitable instructions, we should probably look at removing this (and nontemporal stores as well) or at least make the target's folding implementation aware that its dealing with a nontemporal memory transaction.

There is also an issue that VMOVNTDQA only acts on 128-bit vectors on pre-AVX2 hardware - so currently a normal ymm load is still used on AVX1 targets.

Diff Detail

Repository
rL LLVM

Event Timeline

RKSimon updated this revision to Diff 59559.Jun 3 2016, 8:00 AM
RKSimon retitled this revision from to [X86][SSE] Add general lowering of nontemporal vector loads.
RKSimon updated this object.
RKSimon set the repository for this revision to rL LLVM.
RKSimon added a subscriber: llvm-commits.
craig.topper edited edge metadata.Jun 3 2016, 8:06 AM

Does this make isel combine non-temporal loads into instructions even before the folding tables come into play?

let me rephrase that question. If we replace the intrinsic with generic IR, would we end up combining those loads into instructions during isel?

let me rephrase that question. If we replace the intrinsic with generic IR, would we end up combining those loads into instructions during isel?

At present yes, if we replace the intrinsics then cases that currently don't fold would then start folding - that's why I added those tests to nontemporal-loads.ll to make sure we fix that before moving to generic ir.

We have very few cases of folding stores but the ones I've tested (_mm_extract_epi* and __mm256_cvtps_ph) don't seem to fold non-temporal stores so its certainly possible (whether its intended or not is another question.......).

mkuper accepted this revision.Jun 6 2016, 11:41 AM
mkuper edited edge metadata.

LGTM

lib/Target/X86/X86InstrAVX512.td
3347

Any reason we support more types for loads than for stores? Are they just missing for stores?

test/CodeGen/X86/fast-isel-nontemporal.ll
599

I wonder if this is better or worse, in practice, than 2 * vmovntdqa %xmm.

This revision is now accepted and ready to land.Jun 6 2016, 11:41 AM
craig.topper added inline comments.Jun 6 2016, 9:47 PM
lib/Target/X86/X86InstrAVX512.td
3378

Aren't 128/256 integer loads still promoted to v2i64 and v4i64 even when AVX512 is enabled?

RKSimon added inline comments.Jun 7 2016, 6:28 AM
lib/Target/X86/X86InstrAVX512.td
3378

No - if I remove the i32/i16/i8 patterns then the nt loads don't happen - I haven't been able to work out why.

test/CodeGen/X86/fast-isel-nontemporal.ll
599

Its worse - if you're wanting to use NT loads you must have a good reason. I'll look at ways to split this in a future patch.