This is an archive of the discontinued LLVM Phabricator instance.

[GlobalISel] Add G_SBFX + G_UBFX (bitfield extraction opcodes)
Closed, Public

Authored by paquette on Mar 11 2021, 3:58 PM.

Details

Summary

There is a bunch of similar bitfield extraction code throughout *ISelDAGToDAG.

E.g., ARMISelDAGToDAG, AArch64ISelDAGToDAG, and AMDGPUISelDAGToDAG all contain code that matches a bitfield extract from an and + right shift.

Rather than duplicating that code in the same way for GlobalISel, this patch adds two opcodes:

  • G_UBFX (unsigned bitfield extract)
  • G_SBFX (signed bitfield extract)

They work like this:

%x = G_UBFX %y, lsb, width

Where lsb and width denote:

  • The least-significant bit of the extraction
  • The width of the extraction

This will extract width bits from %y, starting at lsb. G_UBFX zero-extends the result, while G_SBFX sign-extends the result.
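
As a rough illustration of the semantics (a hedged sketch, not code from this patch; it assumes a 64-bit scalar, 0 < width, and lsb + width <= 64):

#include <cstdint>

// Illustrative only: the value computed by G_UBFX/G_SBFX for in-range
// constant operands on a 64-bit scalar.
uint64_t ubfx(uint64_t y, unsigned lsb, unsigned width) {
  uint64_t mask = width == 64 ? ~0ULL : (1ULL << width) - 1;
  return (y >> lsb) & mask;                // zero-extended result
}

int64_t sbfx(uint64_t y, unsigned lsb, unsigned width) {
  uint64_t field = ubfx(y, lsb, width);
  uint64_t sign = 1ULL << (width - 1);     // sign bit of the extracted field
  return (int64_t)((field ^ sign) - sign); // sign-extend bit (width - 1)
}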

This should allow us to use the combiner to match the bitfield extraction patterns rather than duplicating pattern-matching code in each target.

Diff Detail

Event Timeline

paquette created this revision. Mar 11 2021, 3:58 PM
paquette requested review of this revision. Mar 11 2021, 3:58 PM
Herald added a project: Restricted Project. Mar 11 2021, 3:58 PM
Herald added a subscriber: wdng.
arsenm added inline comments. Mar 11 2021, 5:40 PM
llvm/include/llvm/Target/GenericOpcodes.td:1364

These can be registers like a normal op. AMDGPU has registers/variable inputs for the offset and width.

If we have to allow variable width operands for these, it doesn't really help AArch64 that much. We have to match the constant pattern to a target specific opcode, and then lower the rest of these to shifts by default. I can see the benefit of AMDGPU, but with variable extracts this just puts us back where we started for AArch64.
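
The shift-based fallback mentioned here could look roughly like this for a 64-bit scalar with constant operands (a sketch under those assumptions, not code from this patch):

#include <cstdint>

// Hypothetical lowering of a constant-operand extract to a left shift
// followed by a right shift: logical for the unsigned form, arithmetic for
// the signed form. Assumes 0 < width and lsb + width <= 64.
uint64_t lowerUBFXToShifts(uint64_t x, unsigned lsb, unsigned width) {
  return (x << (64 - lsb - width)) >> (64 - width);
}

int64_t lowerSBFXToShifts(uint64_t x, unsigned lsb, unsigned width) {
  return (int64_t)(x << (64 - lsb - width)) >> (64 - width);
}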

paquette updated this revision to Diff 330291. Mar 12 2021, 10:21 AM
paquette edited the summary of this revision.
  • Make the LSB + width registers
  • Update verifier
  • Add a MIRBuilder test

Would an allowsVariableWidthExtracts hook of some sort help? Then we could match the constants-only pattern in AArch64, but allow other targets to have variable-width operands.

I think something like this would work post-legalization:

// We'd need these functions...
if (isBeforeLegalizer() || !isLegal(...))
  return false;

Register LshrLHS, LshrRHS;
int64_t Mask;
if (!mi_match(Reg, MRI,
              m_GAnd(m_GLShr(m_Reg(LshrLHS), m_Reg(LshrRHS)), m_ICst(Mask))))
  return false;
if (/*Mask isn't a mask...*/)
  return false;

if (TLI.allowsRegisterExtractOps(/*...*/)) {
  // ... Do stuff ...
  return true;
}

// Need immediates for LSB + width.
int64_t LshrImm;
if (!mi_match(LshrRHS, MRI, m_ICst(LshrImm)))
  return false;

// ... Do stuff ...
return true;
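
One way the elided mask check and operand computation might be fleshed out (purely illustrative; it assumes the and-mask must be a plain trailing-ones mask and uses the MathExtras helpers):

// Hypothetical completion of the elided checks above, not code from the
// patch; relies on llvm/Support/MathExtras.h.
if (!isMask_64(static_cast<uint64_t>(Mask)))    // only accept 0...011...1 masks
  return false;
uint64_t Width = countTrailingOnes(static_cast<uint64_t>(Mask));

// With a constant shift amount, the LSB is just the shift amount, and the
// extract would be built as something like G_UBFX LshrLHS, LSB, Width.
uint64_t LSB = static_cast<uint64_t>(LshrImm);
if (LSB + Width > MRI.getType(Reg).getScalarSizeInBits())
  return false;                                  // extraction must stay in range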

So we form these post-legalize only if we can see that the sources are constant for arm64. Sounds reasonable to me. @arsenm, is that OK for AMDGPU?

Yes, forming them only if the target supports the variable case makes sense (although also having the legalize-back-to-shifts path would be nice for consistency).

aemerson accepted this revision. Mar 15 2021, 9:39 PM

LGTM.

This revision is now accepted and ready to land. Mar 15 2021, 9:39 PM