Page MenuHomePhabricator

[x86][SelectionDAG] Unroll vectorized FREM instructions which will be lowered to libcalls

Authored by n-omer on May 19 2022, 9:33 AM.



Currently, two element vectors produced as the result of a binary op are
widened to four element vectors on x86 by
DAGTypeLegalizer::WidenVecRes_BinaryCanTrap. If the op still isn't legal
after widening it is unrolled into scalar 4 scalar ops in selectionDAG before
being converted into a libcall. This way we end up with 4 libcalls (two of them
on known undef elements) instead of the original two libcalls.

This patch modifies DAGTypeLegalizer::WidenVectorResult to ensure
that if it is known that an binary op will be turned into a libcall, it is
unrolled instead of being widened. This prevents the creation of the extra
scalar instructions on known undef elements and (eventually) libcalls with
known undef parameters which would otherwise be created when the op gets
expanded post widening.


Regression test for the change.


The change in SelectionDAG as mentioned above.

Diff Detail

Event Timeline

n-omer created this revision.May 19 2022, 9:33 AM
n-omer requested review of this revision.May 19 2022, 9:33 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 19 2022, 9:33 AM

Probably pull out the frem.ll and frem-libcall.ll tests into their own phab for review first - frem-libcall.ll in particular doesn't show the current problem in trunk (it generates 4 fmodf calls atm).


Sort this to keep the description more easy to notice:

; NOTE: Assertions have been autogenerated by utils/
; RUN: llc -mtriple=x86_64-linux-gnu < %s  | FileCheck %s

; Ensure vectorized FREMs are not widened/unrolled such that they get lowered
; into libcalls on undef elements.
4 ↗(On Diff #430716)

Sort this to keep the description more easy to notice:

; NOTE: Assertions have been autogenerated by utils/
; RUN: llc -mtriple=x86_64-linux-gnu < %s  | FileCheck %s

; Basic test coverage for FREM
95 ↗(On Diff #430716)

very pedantic - but maybe sort the vector tests by size - the v8f16/v4f32/v2f64 first - then the 256 / 512-bit variants.

309 ↗(On Diff #430716)


This makes the names and conventions here seem very confusing. The function is called WidenVecRes_BinaryCanTrap, but we're dealing with operations that can't actually trap. And it now has two different ways of unrolling an operation, depending on whether we're generating a libcall: the existing codepath uses smaller vectors, and the new one completely unrolls.

Can we unify the determination of whether we want to generate operations involving undef? And can we unify how we unroll an operation?

Why do we need to anything more than what we do for ISD::FSIN and friends already? LegalizeDAG.cpp doesn't have an Expand other than libcall for ISD::FREM. So checking TLI.isOperationExpand like we do for FSIN should be sufficient right?

Hi @RKSimon, I've updated the tests for the current trunk and split them off into

n-omer updated this revision to Diff 431000.EditedMay 20 2022, 9:34 AM

Updating the diff for completeness because D126055. Still looking into the reviewers comments.

n-omer updated this revision to Diff 431365.May 23 2022, 7:27 AM

Update diff based on review comments.

n-omer updated this revision to Diff 431367.May 23 2022, 7:30 AM

Add context to diff.

n-omer edited the summary of this revision. (Show Details)May 23 2022, 7:33 AM
n-omer updated this revision to Diff 431371.May 23 2022, 7:36 AM

Remove FIXME from test.

LGTM - its annoying that we can't reuse the similar code for the unary ops, but not a big issue. I think we will probably need to add FPOW and the integer division/remainder, but we can address that later.

@craig.topper @efriedma Any comments?

spatel added inline comments.May 23 2022, 9:34 AM

Instead of copying this block, create a lambda above the switch and call it from here and the existing code?

auto unrollExpandedOp = [&]() {
  // We're going to widen this vector op to a legal type by padding with undef
  // elements. If the wide vector op is eventually going to be expanded to
  // scalar libcalls, then unroll into scalar ops now to avoid unnecessary
  // libcalls on the undef elements.
  EVT VT = N->getValueType(0);
  EVT WideVecVT = TLI.getTypeToTransformTo(*DAG.getContext(), VT);
  if (!TLI.isOperationLegalOrCustom(N->getOpcode(), WideVecVT) &&
      TLI.isOperationExpand(N->getOpcode(), VT.getScalarType())) {
    Res = DAG.UnrollVectorOp(N, WideVecVT.getVectorNumElements());
    return true;
  return false;
n-omer updated this revision to Diff 431415.May 23 2022, 10:01 AM

Update diff based on reviewer's comments.

Thanks @spatel.

RKSimon added inline comments.May 24 2022, 12:14 AM

remove these braces now?


these braces can go now?

n-omer updated this revision to Diff 431616.May 24 2022, 1:49 AM

Update diff based on reviewer's comments.

Thanks @RKSimon.

RKSimon accepted this revision.May 24 2022, 2:24 AM


This revision is now accepted and ready to land.May 24 2022, 2:24 AM
This revision was landed with ongoing or failed builds.May 24 2022, 5:35 AM
This revision was automatically updated to reflect the committed changes.