This is an archive of the discontinued LLVM Phabricator instance.

[X86] Widen 128/256-bit VPTERNLOG patterns to 512-bit on non-VLX targets
ClosedPublic

Authored by RKSimon on Nov 13 2021, 8:18 AM.

Details

Summary

Similar to what we've done for other ops, this patch widens VPTERNLOG to a 512-bit op for non-VLX targets.

Fixes regressions in D113192

Diff Detail

Event Timeline

RKSimon created this revision.Nov 13 2021, 8:18 AM
RKSimon requested review of this revision.Nov 13 2021, 8:18 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 13 2021, 8:18 AM

Enable use of 128/256-bit VPTERNLOG on non-VLX targets

I think the title is misleading. We widen to 512-bit VPTERNLOG rather than use 128/256-bit. Besides, should be better to mention the broadcastable work there?

llvm/lib/Target/X86/X86ISelLowering.cpp
46266–46269

Why do we need to do it? Do we have new VT type other than i32/i64 now? Or the previous code can handle them already?

RKSimon retitled this revision from [X86] Enable use of 128/256-bit VPTERNLOG on non-VLX targets to [X86] Widen 128/256-bit VPTERNLOG patterns to 512-bit on non-VLX targets.Nov 14 2021, 2:38 AM
RKSimon added inline comments.Nov 14 2021, 2:46 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
46266–46269

This is to ensure we are using types that getAVX512Node can recognise as potential broadcasts. We currently generate VPTERNLOG nodes with any integer type (not just vXi32/i64) and isel handles it later on,

Would it make sense to remove the new broadcast code from getAVX512Node for this patch and I propose it as a separate patch? It'd mean that we end up with unfolded 128/256-bit loads for the widened cases in this patch, but wouldn't be an actual regression.

pengfei added inline comments.Nov 14 2021, 4:16 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
46266–46269

Yeah, makes sense.

RKSimon updated this revision to Diff 387066.Nov 14 2021, 4:38 AM
RKSimon edited the summary of this revision. (Show Details)

Remove the broadcast handling from getAVX512Node

pengfei added inline comments.Nov 14 2021, 4:48 AM
llvm/test/CodeGen/X86/combine-bitselect.ll
35–41

I wonder if the left side code wins sometime, e.g. optsize?

RKSimon added inline comments.Nov 14 2021, 4:55 AM
llvm/test/CodeGen/X86/combine-bitselect.ll
35–41

We've replaced 2 x 128-bit (folded) loads with 1 x 128-bit (unfolded) load.

The issue I was most concerned with is increased register pressure - this is what the broadcast code helps with, but even that is of low concern.

pengfei accepted this revision.Nov 14 2021, 4:59 AM

LGTM.

llvm/test/CodeGen/X86/combine-bitselect.ll
35–41

Agreed. Thanks Simon.

This revision is now accepted and ready to land.Nov 14 2021, 4:59 AM