[X86][SSE] Simplify extract(shuffle(load())) handling (PR43971)
Changes Planned · Public

Authored by RKSimon on Thu, Nov 14, 12:20 PM.



PR43971 showed how XFormVExtractWithShuffleIntoLoad was relying on a later call to DAGCombiner::visitEXTRACT_VECTOR_ELT to succeed before the regenerated VECTOR_SHUFFLE was re-lowered to a target shuffle again.

This patch removes XFormVExtractWithShuffleIntoLoad entirely, which avoids creating the VECTOR_SHUFFLE. Instead, it uses combineExtractWithShuffle to extract directly from the load (stripping any bitcasts).
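
As a rough illustration of the idea (not the patch itself), the fold can be modelled outside LLVM: an extract of lane idx from a single-source shuffle of a loaded vector reads source element mask[idx], so it can become a scalar load at that element's byte offset, provided the load is simple (neither volatile nor atomic). Every name below (ToyVectorLoad, ToyScalarLoad, narrowExtractOfShuffledLoad) is hypothetical and does not exist in LLVM; this is a minimal sketch under those assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Toy stand-in for a vector load. NOT an LLVM type; all names are hypothetical.
struct ToyVectorLoad {
  std::size_t numElts;   // e.g. 8 for <8 x float>
  std::size_t eltBytes;  // e.g. 4 for float
  bool isSimple;         // false for volatile or atomic loads
};

// Byte range of the scalar load that could replace the extract.
struct ToyScalarLoad {
  std::size_t byteOffset;
  std::size_t byteSize;
};

// Models extract(shuffle(load, mask), idx) for a single-source shuffle.
// A negative mask entry stands for an undefined lane.
std::optional<ToyScalarLoad>
narrowExtractOfShuffledLoad(const ToyVectorLoad &ld,
                            const std::vector<int> &mask, std::size_t idx) {
  assert(ld.eltBytes > 0 && "element size must be non-zero");
  if (!ld.isSimple)
    return std::nullopt;  // never narrow a volatile or atomic load
  if (idx >= mask.size())
    return std::nullopt;  // extract index is out of range
  int src = mask[idx];
  if (src < 0 || static_cast<std::size_t>(src) >= ld.numElts)
    return std::nullopt;  // undefined lane, or not an element of this load
  return ToyScalarLoad{static_cast<std::size_t>(src) * ld.eltBytes,
                       ld.eltBytes};
}
```

In this model, extracting lane 1 of an unshuffled <8 x float> load maps to a 4-byte load at byte offset 4, while a non-simple load is left untouched.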

Event Timeline

RKSimon created this revision. · Thu, Nov 14, 12:20 PM
Herald added a project: Restricted Project. · Thu, Nov 14, 12:20 PM
wolfgangp added a subscriber: test. · Edited · Thu, Nov 14, 5:58 PM

I applied your patch to ToT, but now I'm seeing a loop with the following IR (on Linux). It still seems to be stuck in DAGCombine.

define void @test() local_unnamed_addr {
  %id34847 = alloca <2 x double>, align 16
  %id34846 = alloca double, align 8
  %id34847.0.id34847.0. = load volatile <2 x double>, <2 x double>* %id34847, align 16
  %vecext = extractelement <2 x double> %id34847.0.id34847.0., i32 1
  store volatile double %vecext, double* %id34846, align 8
  ret void
}

RKSimon updated this revision to Diff 229499. · Fri, Nov 15, 3:45 AM

Ensure the load is simple

Unfortunately, I'm still getting a loop in DAGCombiner with this one (llc -mattr=+avx on Linux). If you change the 1.000 in the select instruction to 0.000, it finishes.

define float @test(<8 x float> *%a0) {
  %0 = load <8 x float>, <8 x float>* %a0, align 32
  %vecext = extractelement <8 x float> %0, i32 1
  %cmp = fcmp oeq float %vecext, 0.000000e+00
  %cond = select i1 %cmp, float 1.000000e+00, float %vecext
  ret float %cond
}

RKSimon planned changes to this revision. · Fri, Nov 22, 8:37 AM