This is an archive of the discontinued LLVM Phabricator instance.

[X86] Remove IntrArgMemOnly from target specific gather/scatter intrinsics
ClosedPublic

Authored by craig.topper on Feb 28 2019, 5:33 PM.

Details

Summary

IntrArgMemOnly implies that only memory pointed to by pointer-typed arguments will be accessed. But these intrinsics allow you to pass null as the pointer argument and put the full address into the index argument. Other passes have no way to see that the index carries the address, so they can't understand which memory is accessed.

A colleague found that ISPC was creating gathers like this and then dead store elimination removed some stores because it didn't understand what the gather was doing since the pointer argument was null.
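To make the null-base pattern concrete, here is a minimal scalar sketch in portable C. It is a model of the address arithmetic only, not the X86 intrinsic itself: the helper names (`gather_lane_i32`, `gather4_i32`, `demo_null_base_gather`) are hypothetical, and the four-lane width is chosen just for illustration. Each lane loads from base + index * scale, so a null base with full addresses in the indices reads the same memory as a conventional base with small offsets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of one gather lane: the address is formed with
   plain integer arithmetic as base + index * scale, then loaded from. */
static int32_t gather_lane_i32(const void *base, int64_t index, int scale) {
    uintptr_t addr = (uintptr_t)base + (uintptr_t)index * (uintptr_t)scale;
    return *(const int32_t *)addr;
}

/* Hypothetical 4-lane gather built from the scalar model above. */
static void gather4_i32(int32_t out[4], const void *base,
                        const int64_t idx[4], int scale) {
    for (int i = 0; i < 4; ++i)
        out[i] = gather_lane_i32(base, idx[i], scale);
}

/* Shows that both addressing styles reach the same elements. */
static void demo_null_base_gather(void) {
    int32_t data[4] = {10, 20, 30, 40};

    /* Conventional use: real base pointer, element offsets, scale 4. */
    int64_t idx_offsets[4] = {3, 2, 1, 0};

    /* Pattern from the summary: null base, each index holds a full address,
       scale 1. Nothing in the pointer argument refers to `data`. */
    int64_t idx_addresses[4];
    for (int i = 0; i < 4; ++i)
        idx_addresses[i] = (int64_t)(intptr_t)&data[3 - i];

    int32_t a[4], b[4];
    gather4_i32(a, data, idx_offsets, 4);
    gather4_i32(b, NULL, idx_addresses, 1);

    for (int i = 0; i < 4; ++i)
        assert(a[i] == b[i]);
}
```

In the second call the only pointer argument is null, so an analysis that trusts an "argument memory only" attribute would conclude the call cannot read `data`. That is the kind of reasoning that would let earlier stores to the gathered memory look dead.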

Diff Detail

Repository
rL LLVM

Event Timeline

craig.topper created this revision. Feb 28 2019, 5:33 PM

Just because it's technically possible to put the entire pointer into an index doesn't make it legal, generally (e.g. that's not how GEP works). But if Intel wants to define its intrinsics that way, fine, I guess. Please try to verify that some other compiler behaves this way, and add a testcase.

I'm not sure I follow how you would reach useful memory on a 64-bit system; the indices are 32 bits, right?

Do we lower any Intel intrinsics to llvm.masked.gather/scatter? If we do, we might need to change clang as well.

There are intrinsics with 32-bit indices and intrinsics with 64-bit indices. The data elements are 32-bit or 64-bit integer or floating point, and we support basically all permutations of data and index widths. On 64-bit targets the 32-bit indices are sign extended. On 32-bit targets the 64-bit indices are truncated, I think.
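The index-width rules above can be sketched as plain integer arithmetic. This is an illustrative model, not code from the patch: the function names are hypothetical, and the truncating 32-bit-target case encodes the behavior the comment only describes as "I think", so treat it as an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* 64-bit target, 32-bit index: the index is sign extended before scaling. */
static uint64_t effective_addr64_idx32(uint64_t base, int32_t index,
                                       uint64_t scale) {
    return base + (uint64_t)(int64_t)index * scale;
}

/* 64-bit target, 64-bit index: the index is used at full width, so with a
   zero base and scale 1 it can name any address. */
static uint64_t effective_addr64_idx64(uint64_t base, int64_t index,
                                       uint64_t scale) {
    return base + (uint64_t)index * scale;
}

/* 32-bit target, 64-bit index: ASSUMED to truncate to the low 32 bits,
   following the "I think" in the thread. */
static uint32_t effective_addr32_idx64(uint32_t base, int64_t index,
                                       uint32_t scale) {
    return base + (uint32_t)index * scale;
}

static void demo_index_widths(void) {
    /* A negative 32-bit index steps backwards from the base. */
    assert(effective_addr64_idx32(0x1000, -1, 4) == 0x0FFC);

    /* With a null base and scale 1, the most negative 32-bit index lands
       at the start of the top 2 GiB of the 64-bit address space. */
    assert(effective_addr64_idx32(0, INT32_MIN, 1) == 0xFFFFFFFF80000000ULL);

    /* A 64-bit index can carry a whole user-space address. */
    assert(effective_addr64_idx64(0, 0x00007FFF12345678LL, 1)
           == 0x00007FFF12345678ULL);

    /* Under the truncation assumption, the upper 32 bits are dropped. */
    assert(effective_addr32_idx64(0, 0x100000010LL, 1) == 0x10);
}
```

Under this model, the null-base trick only reaches arbitrary memory on a 64-bit target when the index elements are 64 bits wide. With a null base, 32-bit indices cover just a sign-extended window around address zero, scaled by the chosen factor.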

We don't convert any target-specific intrinsics to masked.gather/scatter in clang. It would require combining the three dissected address pieces into a gep or into ptrtoint/inttoptr plus arithmetic, so I haven't looked into it.

Slightly related, how close are we to just using the generic masked gather/scatter intrinsics instead?

Verified gcc behavior here. https://godbolt.org/z/lXdXjL

@RKSimon I haven't really investigated using the generic intrinsics very much, so I'd say not close. Off the top of my head, we'd need to rebuild a gep by bitcasting the pointer and making sure the scale and load size match. If they don't match, then we have to do the arithmetic in IR using ptrtoint, splatting, vector multiplies, and vector additions, and then try to pattern match all of that back out in the backend. But we may not be able to use a gep at all if the pointer can be null like this.

RKSimon accepted this revision. Mar 1 2019, 12:30 PM

LGTM

This revision is now accepted and ready to land. Mar 1 2019, 12:30 PM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. Mar 1 2019, 1:02 PM