This is an archive of the discontinued LLVM Phabricator instance.


Differential D125111

[SLP] Make reordering aware of external vectorizable scalar stores.
Closed · Public

Authored by vporpo on May 6 2022, 11:25 AM.


Details

Reviewers

ABataev
RKSimon
dmgreen
vdmitrie

Commits

rG71bcead98b2e: [SLP] Make reordering aware of external vectorizable scalar stores.

Summary

The current reordering scheme only checks the ordering of in-tree operands.
There are some cases, however, where we need to adjust the ordering based on
the ordering of a future SLP-tree whose instructions are not part of the
current tree, but are external users.

This patch is a simple implementation of this. We keep track of scalar stores
that are users of TreeEntries and if they look profitable to vectorize, then
we keep track of their ordering. During the reordering step we take this new
index order into account. This can remove some shuffles in cases like in the
lit test.
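To make the last step more concrete, here is a minimal sketch of how an order derived from external stores could take part in order selection. It assumes that the reordering step picks the most frequently requested order among all candidates, and that external orders carry the same weight as in-tree ones. `Order` and `pickMostUsedOrder` are hypothetical names built on standard containers, not the LLVM types used in the patch.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical stand-in for BoUpSLP::OrdersType: a permutation of lane
// indices. An empty vector models the identity order.
using Order = std::vector<unsigned>;

// Counts how often each candidate order is requested and returns the most
// requested one. On a tie the lexicographically smallest order wins, because
// std::map iterates in key order and we only replace on a strictly larger
// count.
Order pickMostUsedOrder(const std::vector<Order> &InTreeOrders,
                        const std::vector<Order> &ExternalStoreOrders) {
  std::map<Order, unsigned> OrdersUses;
  for (const Order &O : InTreeOrders)
    ++OrdersUses[O];
  // Assumption: an order coming from external stores counts as one vote,
  // exactly like an in-tree order.
  for (const Order &O : ExternalStoreOrders)
    ++OrdersUses[O];

  Order Best;
  unsigned BestCount = 0;
  for (const auto &Entry : OrdersUses) {
    if (Entry.second > BestCount) {
      Best = Entry.first;
      BestCount = Entry.second;
    }
  }
  assert(BestCount > 0 || OrdersUses.empty());
  return Best;
}
```

With one in-tree identity order and two external store bundles that both want lanes swapped, the swapped order wins the vote, which is the situation where the extra shuffle can be avoided.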

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

vporpo created this revision. May 6 2022, 11:25 AM

Herald added a project: Restricted Project. May 6 2022, 11:25 AM

Herald added a subscriber: hiraditya.

vporpo requested review of this revision. May 6 2022, 11:25 AM

Herald added a subscriber: llvm-commits.

Harbormaster completed remote builds in B163187: Diff 427688. May 6 2022, 11:26 AM

jgorbe added a subscriber: jgorbe. May 6 2022, 11:31 AM

ABataev added inline comments. May 6 2022, 2:07 PM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
3537	Maybe just add another analysis somewhere here instead of adding new fields etc.?

vporpo added inline comments. May 6 2022, 2:30 PM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
3537	We could do the check here, but I think it is a bit more explicit if we have a field in TreeEntry. Also it is very similar in nature to the other reordering data, so I think they should be represented in a similar way. It also helps with debugging because you can actually see it with a `dumpVectorizableTree()` dump just like the other reordering data. Wdyt?

vdmitrie added inline comments. May 6 2022, 3:01 PM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp

4092

Hi Vasileios.

May I suggest some stylistic changes to this method to improve code readability?
I was a bit lazy to place all comments inline - sorry for that :). It was much easier for me to just apply your patch and use an editor locally. So I'm only putting a summary here:

reduce abuse of "auto"
do not create thin lambdas just to slightly reduce the amount of code.
improve Cmp to not call getPointersDiff if we already failed once
added type check to Cmp before querying for diff (although I'm not sure whether we can have different store types here)
inline Cmp into the call for stable_sort
inline stable_sort into CanFormVector
outline CanFormVector outside the loop
I'm not sure we need to look at each store to find out whether they are all sequential. We probably can take diff between the first and the last ones.
use range loop for sequential index traversing

After applying all these suggestions here is how the method could possibly look:

SmallVector<BoUpSLP::OrdersType, 1>
BoUpSLP::findExternalStoreUsersReorderIndices(TreeEntry *TE) const {

unsigned NumLanes = TE->Scalars.size();

DenseMap<Value *, SmallVector<StoreInst *, 4>> PtrToStoresMap =
    collectUserStores(TE);

// Holds the reorder indices for each candidate store vector that is a user of
// the current TreeEntry.
SmallVector<OrdersType, 1> ExternalReorderIndices;

// \returns true if the stores in `StoresVecSorted` are consecutive and can
// form a vector.
auto &&CanFormVector = [this](SmallVector<StoreInst *, 4> &StoresVecSorted) {
  Type *Ty = StoresVecSorted.front()->getValueOperand()->getType();
  bool FailedToSort = false;
  stable_sort(StoresVecSorted, [this, &FailedToSort, Ty](StoreInst *S1,
                                                         StoreInst *S2) {
    if (FailedToSort)
      return false;
    if (S1->getValueOperand()->getType() != Ty ||
        S2->getValueOperand()->getType() != Ty) {
      FailedToSort = true;
      return false;
    }
    Optional<int> Diff = getPointersDiff(Ty, S2->getPointerOperand(), Ty,
                                         S1->getPointerOperand(), *DL, *SE,
                                         /*StrictCheck=*/true);
    if (!Diff) {
      FailedToSort = true;
      return false;
    }
    return Diff < 0;
  });
  // If we failed to compare stores, then just abandon this stores vector.
  if (FailedToSort)
    return false;
  Optional<int> Range =
      getPointersDiff(Ty, StoresVecSorted.front()->getPointerOperand(), Ty,
                      StoresVecSorted.back()->getPointerOperand(), *DL, *SE,
                      /*StrictCheck=*/true);
  // Guard against a failed diff before dereferencing the Optional.
  return Range && *Range == (int)StoresVecSorted.size() - 1;
};

// Now inspect the stores collected per pointer and look for vectorization
// candidates. For each candidate calculate the reorder index vector and push
// it into `ExternalReorderIndices`
for (const auto &Pair : PtrToStoresMap) {
  const SmallVector<StoreInst *, 4> &StoresVec = Pair.second;

  // If the number of stores differs from NumLanes, then we can't form a
  // vector.
  if (StoresVec.size() != NumLanes)
    continue;

  // Sort the vector based on the pointers. We create a copy because we may
  // need the original later for calculating the reorder (shuffle) indices.
  auto StoresVecSorted = StoresVec;

  // If the stores are not consecutive then abandon this stores vector.
  if (!CanFormVector(StoresVecSorted))
    continue;

  // The scalars in StoresVec can form a vector instruction, so calculate the
  // shuffle indices.
  ExternalReorderIndices.resize(ExternalReorderIndices.size() + 1);
  OrdersType &ReorderIndices = ExternalReorderIndices.back();
  for (StoreInst *SI : StoresVec) {
    unsigned Idx = llvm::find(StoresVecSorted, SI) - StoresVecSorted.begin();
    ReorderIndices.push_back(Idx);
  }

  // Identity order (e.g., {0,1,2,3}) is modeled as an empty OrdersType in
  // reorderTopToBottom() and reorderBottomToTop(), so we are following the
  // same convention here.
  auto IsIdentityOrder = [](const OrdersType &Order) {
    for (unsigned Idx : seq<unsigned>(0, Order.size()))
      if (Idx != Order[Idx])
        return false;
    return true;
  };
  if (IsIdentityOrder(ReorderIndices))
    ReorderIndices.clear();
}
return ExternalReorderIndices;

}
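The index computation at the end of the suggested method, together with the "identity order is empty" convention, can be sketched in isolation. This is a simplified illustration, not the patch code: stores are modeled by distinct integer element offsets instead of `StoreInst` pointers, and `computeReorderIndices` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// For each store in its original (lane) order, record its position in the
// address-sorted sequence. Offsets are assumed to be distinct.
// Returns an empty vector when the result is the identity permutation,
// mirroring the convention described in the comment above.
std::vector<unsigned> computeReorderIndices(const std::vector<int> &Offsets) {
  std::vector<int> Sorted = Offsets;
  std::sort(Sorted.begin(), Sorted.end());

  std::vector<unsigned> Indices;
  for (int Off : Offsets) {
    auto It = std::find(Sorted.begin(), Sorted.end(), Off);
    assert(It != Sorted.end() && "every offset must appear in the sorted copy");
    Indices.push_back(static_cast<unsigned>(It - Sorted.begin()));
  }

  // Model the identity order (e.g. {0,1,2,3}) as an empty vector.
  bool IsIdentity = true;
  for (unsigned I = 0; I < Indices.size(); ++I)
    if (Indices[I] != I)
      IsIdentity = false;
  if (IsIdentity)
    Indices.clear();
  return Indices;
}
```

For stores that are already in address order the function returns an empty vector, so a caller can treat "empty" as "no reordering requested" without comparing against an explicit identity permutation.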

ABataev added inline comments. May 6 2022, 3:07 PM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
3537	I rather doubt that it is a wise decision to waste some extra memory just to handle this corner case.

Thanks @vdmitrie for your comments. I will update the code.

vporpo added inline comments. May 6 2022, 3:26 PM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
3537	I don't think that memory is a concern here since the VectorizableTree does not grow too large and we clear it before we build the next one. I think it is not worth making it less explicit just to save some memory. Reordering is already quite complex and without having this explicitly shown in the dump it would just make debugging harder. @vdmitrie any thoughts on this?

ABataev added inline comments. May 6 2022, 3:38 PM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
3537	The reordering data is not stored in the tree, except for some cases, where this data is also required for correct codegen/cost estimation. I do not like the idea to keep this data in the tree without actually being used for cost/codegen.

vdmitrie added inline comments. May 6 2022, 4:02 PM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
3537	I do not have strong objections to keeping it in the tree, although Alexey's arguments sound very reasonable. If debug printing is the issue, then maybe it is worth trying to tweak the debug printing routine(s) instead? dumpVectorizableTree(), for example, could collect this data and print it alongside each tree node.

ABataev added inline comments. May 6 2022, 4:11 PM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
3537	Better to teach reordering functions about dumping, rather than put some service data to the tree structure.

vporpo added inline comments. May 6 2022, 4:16 PM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
3537	It is not just about debugging, it is also about the design. I agree with Alexey that we should not keep data in the tree that is purely temporary. But I think in this case it is not temporary data. I believe it is a good design principle to separate phases; just because we can place the analysis in the reorder function, I don't think we should. Please bear in mind that we may decide in the future to extend the analysis to cover more cross-tree cases like this by doing a more extensive search, so the analysis could grow. If we do decide to have this analysis as a separate phase, then the natural place for holding the ordering data is the TreeEntry. I don't think this should be restricted to data passed to the cost/codegen phase only. Any data passed from analysis to transformations should qualify, reordering included.

ABataev added inline comments. May 6 2022, 4:33 PM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
3537	I would not store it in the tree unless we definitely will use it somewhere other than reordering. If (!) we need it for something else (probably cost estimation), we can make this data a data member. It should not be hard. But before that I'd keep it internal to the reordering phase.

What's the impact on the tests/benchmarks?

This fixes a regression in eigen BM_Dot_ComplexComplex_Naive. I doubt it will have any large impact anywhere else, but we should know soon enough.

In D125111#3498166, @vporpo wrote:

This fixes a regression in eigen BM_Dot_ComplexComplex_Naive. I doubt it will have any large impact anywhere else, but we should know soon enough.

The regressions would still appear unless the final solution for non-power-of-2 is landed. Even after this there might be issues with the cost model. What is the actual cause of the regression? The code was vectorized before but then it was not? There are a couple of regressions in reductions, caused by a slightly different processing order; they should go away with the final code for reductions and non-power-of-2.

vdmitrie added inline comments. May 6 2022, 5:02 PM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
3556	not your code but this is unreachable because of check at line 3539.
3599	nit: name Order is already used by lambda above.
4042	nit: for (unsigned Lane : seq<unsigned>(0, TE->Scalars.size())) { then NumLanes can be dropped
4056	nit: use dyn_cast and move it to line 4043 (presumably with isSimple and isValidElementType checks)

The regressions would still appear unless the final solution for non-power-of-2 is landed. Even after this there might be issues with the cost model. What is the actual cause of the regression? The code was vectorized before but then it was not? There are a couple of regressions in reductions, caused by a slightly different processing order; they should go away with the final code for reductions and non-power-of-2.

This is not related to the non-power-of-2. The cause is shown in the lit test: SLP vectorizes trees in isolation so it may generate extra shuffles. It was triggered by the load broadcast cost change.

I'm also observing a stability issue. I'll submit a test case once I reduce it.

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
3599	nit: name Order is already used by lambda above. just to correct myself: not lambda but by the result of the lambda call ( I first overlooked it is a call). Anyway, Order in this loop hides Order from line 3582
4139	std::distance(StoresVecSorted.begin(), find(StoresVecSorted, SI));

Addressed most of @vdmitrie's comments.

Harbormaster completed remote builds in B163276: Diff 427807. May 6 2022, 6:30 PM

vporpo added inline comments. May 6 2022, 6:31 PM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
4092	I think I addressed most of these points. Please take a look.

@vdmitrie please let me know if you find a stability issue, I will do more testing on my side too.

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
4139	Hmm why is it preferable over `operator-()`? `StoresVecSorted` is a `SmallVector`, it should implement a random access iterator.
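The point of the reply can be checked with a small sketch: for random access iterators, `operator-` and `std::distance` give the same result in constant time, so the choice is mostly one of style. The example uses `std::vector` as a stand-in for `SmallVector` (an assumption made to keep it self-contained), and the function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Index of Value in Vec, computed with iterator subtraction. This only
// compiles for random access iterators.
unsigned indexWithMinus(const std::vector<int> &Vec, int Value) {
  auto It = std::find(Vec.begin(), Vec.end(), Value);
  assert(It != Vec.end() && "value must be present");
  return static_cast<unsigned>(It - Vec.begin());
}

// Same index computed with std::distance. This also works for weaker
// iterator categories, where it walks the range instead of subtracting.
unsigned indexWithDistance(const std::vector<int> &Vec, int Value) {
  auto It = std::find(Vec.begin(), Vec.end(), Value);
  assert(It != Vec.end() && "value must be present");
  return static_cast<unsigned>(std::distance(Vec.begin(), It));
}
```

`std::distance` mainly helps if the container type might later change to one without random access iterators; for a vector-like container both forms are equivalent.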

reduced.ll (3 KB)

to reproduce crash: opt -slp-vectorizer -disable-output reduced.ll
Fails on assertion:
... llvm-project/llvm/include/llvm/ADT/DenseMap.h:1244: llvm::DenseMapIterator<KeyT, ValueT, KeyInfoT, Bucket, IsConst>::value_type* llvm::DenseMapIterator<KeyT, ValueT, KeyInfoT, Bucket, IsConst>::operator->() const [with KeyT = const llvm::slpvectorizer::BoUpSLP::TreeEntry*; ValueT = llvm::SmallVector<unsigned int, 4>; KeyInfoT = llvm::DenseMapInfo<const llvm::slpvectorizer::BoUpSLP::TreeEntry*, void>; Bucket = llvm::detail::DenseMapPair<const llvm::slpvectorizer::BoUpSLP::TreeEntry*, llvm::SmallVector<unsigned int, 4> >; bool IsConst = false; llvm::DenseMapIterator<KeyT, ValueT, KeyInfoT, Bucket, IsConst>::pointer = llvm::detail::DenseMapPair<const llvm::slpvectorizer::BoUpSLP::TreeEntry*, llvm::SmallVector<unsigned int, 4> >*; llvm::DenseMapIterator<KeyT, ValueT, KeyInfoT, Bucket, IsConst>::value_type = llvm::detail::DenseMapPair<const llvm::slpvectorizer::BoUpSLP::TreeEntry*, llvm::SmallVector<unsigned int, 4> >]: Assertion `Ptr != End && "dereferencing end() iterator"' failed.

Thanks for the reproducer @vdmitrie, I think I fixed the issue.

Fixed stability issue and also removed user tree indices out of the TreeEntry.
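The assertion quoted above fires when the result of a map `find()` is dereferenced without checking it against `end()`. The final diff guards the `GathersToOrders` lookup in this way, which is presumably the fix referred to here. A minimal sketch of the pattern, using `std::unordered_map` with integer keys as a stand-in for `DenseMap<const TreeEntry *, OrdersType>` (the names `Order` and `lookupOrder` are hypothetical):

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

using Order = std::vector<unsigned>;

// Returns the recorded order for Key if there is one, otherwise Fallback.
// The iterator is compared against end() before it is dereferenced, so a
// missing key never leads to dereferencing the end() iterator.
const Order &lookupOrder(const std::unordered_map<int, Order> &GathersToOrders,
                         int Key, const Order &Fallback) {
  auto It = GathersToOrders.find(Key);
  if (It != GathersToOrders.end())
    return It->second;
  return Fallback;
}
```

In the patch the missing-key case became possible because entries can now be queued for reordering solely on the basis of external store users, without a matching `GathersToOrders` record.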

Harbormaster completed remote builds in B163521: Diff 428132. May 9 2022, 10:41 AM

Looks good with a nit.

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
4119	wrap it with #ifndef NDEBUG (or delete).

This revision is now accepted and ready to land. May 10 2022, 7:59 AM

dmgreen added inline comments. May 10 2022, 8:15 AM

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
4094	Just a quick point - I would recommend against calling getPointersDiff in the sort compare function. I believe std::sort requires a strict weak ordering, and some compilers (MSVC) will complain if the comparator does not provide one. It's also probably calling getPointersDiff more times than necessary, being O(NlogN) as opposed to the N-1 calls needed. I think it should be possible to precompute all the offsets from the first pointer initially, and sort the offsets?
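The suggestion can be sketched as follows: compute one offset per store relative to the first store (N-1 computations), sort the {offset, store} pairs with a plain integer comparison, and then check that the sorted offsets are consecutive. This is an illustration under simplifying assumptions, not the committed code: `Store`, `AddrKnown` and `canFormVector` are hypothetical, and an unknown address stands in for a failed `getPointersDiff` query.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a store instruction: an id plus an element
// address. AddrKnown == false models a pointer whose distance from the first
// store cannot be computed.
struct Store {
  int Id;
  int Addr;
  bool AddrKnown;
};

// Returns true if the stores cover consecutive element addresses. On success
// SortedIds holds the store ids in increasing address order.
bool canFormVector(const std::vector<Store> &Stores,
                   std::vector<int> &SortedIds) {
  assert(!Stores.empty() && "expected at least one store");
  if (!Stores[0].AddrKnown)
    return false;

  // One offset per store relative to the first one: N-1 computations instead
  // of one per comparison inside the sort.
  std::vector<std::pair<int, int>> OffsetAndId; // {offset from first, id}
  OffsetAndId.push_back({0, Stores[0].Id});
  for (std::size_t I = 1; I < Stores.size(); ++I) {
    if (!Stores[I].AddrKnown)
      return false;
    OffsetAndId.push_back({Stores[I].Addr - Stores[0].Addr, Stores[I].Id});
  }

  // Comparing plain integers is a valid strict weak ordering, and the
  // comparator can no longer fail halfway through the sort.
  std::stable_sort(OffsetAndId.begin(), OffsetAndId.end(),
                   [](const std::pair<int, int> &A,
                      const std::pair<int, int> &B) {
                     return A.first < B.first;
                   });

  // The stores are consecutive if each offset is the previous one plus one.
  for (std::size_t I = 1; I < OffsetAndId.size(); ++I)
    if (OffsetAndId[I].first != OffsetAndId[I - 1].first + 1)
      return false;

  SortedIds.clear();
  for (const auto &P : OffsetAndId)
    SortedIds.push_back(P.second);
  return true;
}
```

Checking every adjacent pair, rather than only the distance between the first and last element, also rejects bundles with duplicate or missing addresses.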

Thanks for the comments. Updated sorting according to @dmgreen's comments.

Harbormaster completed remote builds in B163767: Diff 428476. May 10 2022, 1:06 PM

This revision was landed with ongoing or failed builds. May 10 2022, 3:27 PM

Closed by commit rG71bcead98b2e: [SLP] Make reordering aware of external vectorizable scalar stores. (authored by vporpo).

This revision was automatically updated to reflect the committed changes.

vporpo added a commit: rG71bcead98b2e: [SLP] Make reordering aware of external vectorizable scalar stores.

this is causing crashes

$ cat /tmp/d.ll
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

define void @f() {
  store i32 undef, ptr undef, align 4
  %1 = getelementptr inbounds [4 x i32], ptr undef, i64 0, i64 3
  %2 = load i32, ptr null, align 4
  store i32 %2, ptr %1, align 4
  br label %3

3:                                                ; preds = %0
  br label %4

4:                                                ; preds = %9, %3
  %5 = phi i32 [ %2, %3 ], [ 0, %9 ]
  %6 = phi i32 [ undef, %3 ], [ 0, %9 ]
  %7 = phi i32 [ undef, %3 ], [ 0, %9 ]
  %8 = phi i32 [ %2, %3 ], [ 0, %9 ]
  br label %9

9:                                                ; preds = %4
  br label %4
}
$ ./build/rel/bin/opt -passes=slp-vectorizer -disable-output /tmp/d.ll
opt: ../../llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:5851: llvm::InstructionCost llvm::slpvectorizer::BoUpSLP::getEntryCost(const llvm::slpvectorizer::BoUpSLP::TreeEntry *, ArrayRef<llvm::Value *>): Assertion `E->getOpcode() && allSameType(VL) && allSameBlock(VL) && "Invalid VL"' failed.

aeubanks added a reverting change: rGc2a7904aba46: Revert "[SLP] Make reordering aware of external vectorizable scalar stores.". May 11 2022, 3:31 PM

vporpo mentioned this in rG0950d4060cd9: Recommit "[SLP] Make reordering aware of external vectorizable scalar stores.". May 11 2022, 4:49 PM

Revision Contents

Path                                                                    Size
llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp                         202 lines
llvm/test/Transforms/SLPVectorizer/X86/reorder_with_external_users.ll   11 lines

Diff 428476

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp


[2,175 lines not shown] private:
   /// Reorder commutative or alt operands to get better probability of
   /// generating vectorized code.
   static void reorderInputsAccordingToOpcode(ArrayRef<Value *> VL,
                                              SmallVectorImpl<Value *> &Left,
                                              SmallVectorImpl<Value *> &Right,
                                              const DataLayout &DL,
                                              ScalarEvolution &SE,
                                              const BoUpSLP &R);

+  /// Helper for `findExternalStoreUsersReorderIndices()`. It iterates over the
+  /// users of \p TE and collects the stores. It returns the map from the store
+  /// pointers to the collected stores.
+  DenseMap<Value *, SmallVector<StoreInst *, 4>>
+  collectUserStores(const BoUpSLP::TreeEntry *TE) const;
+
+  /// Helper for `findExternalStoreUsersReorderIndices()`. It checks if the
+  /// stores in \p StoresVec can form a vector instruction. If so it returns true
+  /// and populates \p ReorderIndices with the shuffle indices of the stores
+  /// when compared to the sorted vector.
+  bool CanFormVector(const SmallVector<StoreInst *, 4> &StoresVec,
+                     OrdersType &ReorderIndices) const;
+
+  /// Iterates through the users of \p TE, looking for scalar stores that can be
+  /// potentially vectorized in a future SLP-tree. If found, it keeps track of
+  /// their order and builds an order index vector for each store bundle. It
+  /// returns all these order vectors found.
+  /// We run this after the tree has formed, otherwise we may come across user
+  /// instructions that are not yet in the tree.
+  SmallVector<OrdersType, 1>
+  findExternalStoreUsersReorderIndices(TreeEntry *TE) const;
+
   struct TreeEntry {
     using VecTreeTy = SmallVector<std::unique_ptr<TreeEntry>, 8>;
     TreeEntry(VecTreeTy &Container) : Container(Container) {}

     /// \returns true if the scalars in VL are equal to this entry.
     bool isSame(ArrayRef<Value *> VL) const {
       auto &&IsSame = [VL](ArrayRef<Value *> Scalars, ArrayRef<int> Mask) {
         if (Mask.size() != VL.size() && VL.size() == Scalars.size())
[1,294 lines not shown]
 }

 void BoUpSLP::reorderTopToBottom() {
   // Maps VF to the graph nodes.
   DenseMap<unsigned, SetVector<TreeEntry *>> VFToOrderedEntries;
   // ExtractElement gather nodes which can be vectorized and need to handle
   // their ordering.
   DenseMap<const TreeEntry *, OrdersType> GathersToOrders;

+  // Maps a TreeEntry to the reorder indices of external users.
+  DenseMap<const TreeEntry *, SmallVector<OrdersType, 1>>
+      ExternalUserReorderMap;
   // Find all reorderable nodes with the given VF.
   // Currently the are vectorized stores,loads,extracts + some gathering of
   // extracts.
-  for_each(VectorizableTree, [this, &VFToOrderedEntries, &GathersToOrders](
+  for_each(VectorizableTree, [this, &VFToOrderedEntries, &GathersToOrders,
+                              &ExternalUserReorderMap](
                                  const std::unique_ptr<TreeEntry> &TE) {
+    // Look for external users that will probably be vectorized.
+    SmallVector<OrdersType, 1> ExternalUserReorderIndices =
+        findExternalStoreUsersReorderIndices(TE.get());
+    if (!ExternalUserReorderIndices.empty()) {
+      VFToOrderedEntries[TE->Scalars.size()].insert(TE.get());
+      ExternalUserReorderMap.try_emplace(TE.get(),
+                                         std::move(ExternalUserReorderIndices));
+    }
+
     if (Optional<OrdersType> CurrentOrder =
             getReorderingData(*TE, /*TopToBottom=*/true)) {
       // Do not include ordering for nodes used in the alt opcode vectorization,
       // better to reorder them during bottom-to-top stage. If follow the order
       // here, it causes reordering of the whole graph though actually it is
       // profitable just to reorder the subgraph that starts from the alternate
       // opcode vectorization node. Such nodes already end-up with the shuffle
       // instruction and it is just enough to change this shuffle rather than
       // rotate the scalars for the whole graph.
       unsigned Cnt = 0;
       const TreeEntry *UserTE = TE.get();
       while (UserTE && Cnt < RecursionMaxDepth) {
         if (UserTE->UserTreeIndices.size() != 1)
           break;
         if (all_of(UserTE->UserTreeIndices, [](const EdgeInfo &EI) {
               return EI.UserTE->State == TreeEntry::Vectorize &&
                      EI.UserTE->isAltShuffle() && EI.UserTE->Idx != 0;
             }))
           return;
         if (UserTE->UserTreeIndices.empty())
           UserTE = nullptr;
         else
           UserTE = UserTE->UserTreeIndices.back().UserTE;
         ++Cnt;
       }
       VFToOrderedEntries[TE->Scalars.size()].insert(TE.get());
       if (TE->State != TreeEntry::Vectorize)
         GathersToOrders.try_emplace(TE.get(), *CurrentOrder);
     }
[17 lines not shown] for (unsigned VF = VectorizableTree.front()->Scalars.size(); VF > 1;
     SmallPtrSet<const TreeEntry *, 4> VisitedOps;
     for (const TreeEntry *OpTE : OrderedEntries) {
       // No need to reorder this nodes, still need to extend and to use shuffle,
       // just need to merge reordering shuffle and the reuse shuffle.
       if (!OpTE->ReuseShuffleIndices.empty())
         continue;
       // Count number of orders uses.
       const auto &Order = [OpTE, &GathersToOrders]() -> const OrdersType & {
-        if (OpTE->State == TreeEntry::NeedToGather)
-          return GathersToOrders.find(OpTE)->second;
+        if (OpTE->State == TreeEntry::NeedToGather) {
+          auto It = GathersToOrders.find(OpTE);
+          if (It != GathersToOrders.end())
+            return It->second;
+        }
         return OpTE->ReorderIndices;
       }();
+      // First consider the order of the external scalar users.
+      auto It = ExternalUserReorderMap.find(OpTE);
+      if (It != ExternalUserReorderMap.end()) {
+        const auto &ExternalUserReorderIndices = It->second;
+        for (const OrdersType &ExtOrder : ExternalUserReorderIndices)
+          ++OrdersUses.insert(std::make_pair(ExtOrder, 0)).first->second;
+        // No other useful reorder data in this entry.
+        if (Order.empty())
+          continue;
+      }
       // Stores actually store the mask, not the order, need to invert.
       if (OpTE->State == TreeEntry::Vectorize && !OpTE->isAltShuffle() &&
           OpTE->getOpcode() == Instruction::Store && !Order.empty()) {
         SmallVector<int> Mask;
         inversePermutation(Order, Mask);
         unsigned E = Order.size();
         OrdersType CurrentOrder(E, E);
         transform(Mask, CurrentOrder.begin(), [E](int Idx) {
[415 lines not shown] for (int Lane = 0, LE = Entry->Scalars.size(); Lane != LE; ++Lane) {
         LLVM_DEBUG(dbgs() << "SLP: Need to extract:" << *U << " from lane "
                           << Lane << " from " << *Scalar << ".\n");
         ExternalUses.push_back(ExternalUser(Scalar, U, FoundLane));
       }
     }
   }
 }

+DenseMap<Value *, SmallVector<StoreInst *, 4>>
+BoUpSLP::collectUserStores(const BoUpSLP::TreeEntry *TE) const {
+  DenseMap<Value *, SmallVector<StoreInst *, 4>> PtrToStoresMap;
+  for (unsigned Lane : seq<unsigned>(0, TE->Scalars.size())) {
+    Value *V = TE->Scalars[Lane];
		// To save compilation time we don't visit if we have too many users.
		static constexpr unsigned UsersLimit = 4;
		if (V->hasNUsesOrMore(UsersLimit))
		break;

		// Collect stores per pointer object.
		for (User *U : V->users()) {
		auto *SI = dyn_cast<StoreInst>(U);
		if (SI == nullptr || !SI->isSimple() ||
		!isValidElementType(SI->getValueOperand()->getType()))
		continue;
		// Skip entry if already
		if (getTreeEntry(U))
		continue;
		vdmitrie (inline comment, Done): nit: use dyn_cast and move it to line 4043 (presumably with isSimple and isValidElementType checks)

		Value *Ptr = getUnderlyingObject(SI->getPointerOperand());
		auto &StoresVec = PtrToStoresMap[Ptr];
		// For now just keep one store per pointer object per lane.
		// TODO: Extend this to support multiple stores per pointer per lane
		if (StoresVec.size() > Lane)
		continue;
		// Skip if in different BBs.
		if (!StoresVec.empty() &&
		SI->getParent() != StoresVec.back()->getParent())
		continue;
		// Make sure that the stores are of the same type.
		if (!StoresVec.empty() &&
		SI->getValueOperand()->getType() !=
		StoresVec.back()->getValueOperand()->getType())
		continue;
		StoresVec.push_back(SI);
		}
		}
		return PtrToStoresMap;
		}

		bool BoUpSLP::CanFormVector(const SmallVector<StoreInst *, 4> &StoresVec,
		OrdersType &ReorderIndices) const {
		// We check whether the stores in StoresVec can form a vector by sorting them
		// and checking whether they are consecutive.

		// To avoid calling getPointersDiff() while sorting we create a vector of
		// pairs {store, offset from first} and sort this instead.
		SmallVector<std::pair<StoreInst *, int>, 4> StoreOffsetVec(StoresVec.size());
		StoreInst *S0 = StoresVec[0];
		StoreOffsetVec[0] = {S0, 0};
		Type *S0Ty = S0->getValueOperand()->getType();
		Value *S0Ptr = S0->getPointerOperand();
		for (unsigned Idx : seq<unsigned>(1, StoresVec.size())) {
		StoreInst *SI = StoresVec[Idx];
		vdmitrie (inline comment, Not Done):
		Hi Vasileios. May I suggest some stylistic changes to this method to improve code readability? I was a bit lazy to place all comments inline - sorry for that :). It was much easier for me to just apply your patch and use an editor locally. So I'm only putting a summary here:
		- reduce abuse of "auto"
		- do not create thin lambdas just to slightly reduce the amount of code
		- improve Cmp to not call getPointersDiff if we already failed once
		- added type check to Cmp before querying for diff (although I'm not sure whether we can have different store types here)
		- inline Cmp into the call for stable_sort
		- inline stable_sort into CanFormVector
		- outline CanFormVector outside the loop
		- I'm not sure we need to look at each store to find out whether they are all sequential. We probably can take the diff between the first and the last ones.
		- use range loop for sequential index traversing
		After applying all these suggestions here is how the method could possibly look:

		SmallVector<BoUpSLP::OrdersType, 1>
		BoUpSLP::findExternalStoreUsersReorderIndices(TreeEntry *TE) const {
		  unsigned NumLanes = TE->Scalars.size();

		  DenseMap<Value *, SmallVector<StoreInst *, 4>> PtrToStoresMap =
		      collectUserStores(TE);

		  // Holds the reorder indices for each candidate store vector that is a user of
		  // the current TreeEntry.
		  SmallVector<OrdersType, 1> ExternalReorderIndices;

		  // \Returns true if the stores in `SortedStoresVec` are consecutive and can
		  // form a vector.
		  auto &&CanFormVector = [this](SmallVector<StoreInst *, 4> &StoresVecSorted) {
		    Type *Ty = StoresVecSorted.front()->getValueOperand()->getType();
		    bool FailedToSort = false;
		    stable_sort(StoresVecSorted,
		                [this, &FailedToSort, Ty](StoreInst *S1, StoreInst *S2) {
		                  if (FailedToSort)
		                    return false;
		                  if (S1->getValueOperand()->getType() != Ty ||
		                      S2->getValueOperand()->getType() != Ty) {
		                    FailedToSort = true;
		                    return false;
		                  }
		                  Optional<int> Diff =
		                      getPointersDiff(Ty, S2->getPointerOperand(), Ty,
		                                      S1->getPointerOperand(), DL, SE,
		                                      /*StrictCheck=*/true);
		                  if (!Diff) {
		                    FailedToSort = true;
		                    return false;
		                  }
		                  return Diff < 0;
		                });
		    // If we failed to compare stores, then just abandon this stores vector.
		    if (FailedToSort)
		      return false;
		    Optional<int> Range =
		        getPointersDiff(Ty, StoresVecSorted.front()->getPointerOperand(), Ty,
		                        StoresVecSorted.back()->getPointerOperand(), DL, SE,
		                        /*StrictCheck=*/true);
		    return Range == (int)StoresVecSorted.size() - 1;
		  };

		  // Now inspect the stores collected per pointer and look for vectorization
		  // candidates. For each candidate calculate the reorder index vector and push
		  // it into `ExternalReorderIndices`
		  for (const auto &Pair : PtrToStoresMap) {
		    const SmallVector<StoreInst *, 4> &StoresVec = Pair.second;
		    // If we have fewer than NumLanes stores, then we can't form a vector.
		    if (StoresVec.size() != NumLanes)
		      continue;

		    // Sort the vector based on the pointers. We create a copy because we may
		    // need the original later for calculating the reorder (shuffle) indices.
		    auto StoresVecSorted = StoresVec;

		    // If the stores are not consecutive then abandon this stores vector.
		    if (!CanFormVector(StoresVecSorted))
		      continue;

		    // The scalars in StoresVec can form a vector instruction, so calculate the
		    // shuffle indices.
		    ExternalReorderIndices.resize(ExternalReorderIndices.size() + 1);
		    OrdersType &ReorderIndices = ExternalReorderIndices.back();
		    for (StoreInst *SI : StoresVec) {
		      unsigned Idx = llvm::find(StoresVecSorted, SI) - StoresVecSorted.begin();
		      ReorderIndices.push_back(Idx);
		    }
		    // Identity order (e.g., {0,1,2,3}) is modeled as an empty OrdersType in
		    // reorderTopToBottom() and reorderBottomToTop(), so we are following the
		    // same convention here.
		    auto IsIdentityOrder = [](const OrdersType &Order) {
		      for (unsigned Idx : seq<unsigned>(0, Order.size()))
		        if (Idx != Order[Idx])
		          return false;
		      return true;
		    };
		    if (IsIdentityOrder(ReorderIndices))
		      ReorderIndices.clear();
		  }
		  return ExternalReorderIndices;
		}
		vporpo (author, inline comment, Not Done): I think I addressed most of these points. Please take a look.
		Optional<int> Diff =
		getPointersDiff(S0Ty, S0Ptr, SI->getValueOperand()->getType(),
		dmgreen (inline comment, Not Done): Just a quick point - I would recommend against calling getPointersDiff in the sort compare function. I believe std::sort requires a strict weak ordering, and some compilers (MSVC) will complain if they do not get one. It's also probably calling getPointersDiff more times than necessary, being O(N log N) as opposed to the N-1 calls needed. I think it should be possible to precompute all the offsets from the first pointer initially, and sort the offsets?
		SI->getPointerOperand(), DL, SE,
		/*StrictCheck=*/true);
		// We failed to compare the pointers so just abandon this StoresVec.
		if (!Diff)
		return false;
		StoreOffsetVec[Idx] = {StoresVec[Idx], *Diff};
		}

		// Sort the vector based on the pointers. We create a copy because we may
		// need the original later for calculating the reorder (shuffle) indices.
		stable_sort(StoreOffsetVec, [](const std::pair<StoreInst *, int> &Pair1,
		const std::pair<StoreInst *, int> &Pair2) {
		int Offset1 = Pair1.second;
		int Offset2 = Pair2.second;
		return Offset1 < Offset2;
		});

		// Check if the stores are consecutive by checking if last-first == size-1.
		int LastOffset = StoreOffsetVec.back().second;
		int FirstOffset = StoreOffsetVec.front().second;
		if (LastOffset - FirstOffset != (int)StoreOffsetVec.size() - 1)
		return false;

		// Calculate the shuffle indices according to their offset against the sorted
		// StoreOffsetVec.
		vdmitrie (inline comment, Not Done): wrap it with #ifndef NDEBUG (or delete).
		ReorderIndices.reserve(StoresVec.size());
		for (StoreInst *SI : StoresVec) {
		unsigned Idx = find_if(StoreOffsetVec,
		[SI](const std::pair<StoreInst *, int> &Pair) {
		return Pair.first == SI;
		}) -
		StoreOffsetVec.begin();
		ReorderIndices.push_back(Idx);
		}
		// Identity order (e.g., {0,1,2,3}) is modeled as an empty OrdersType in
		// reorderTopToBottom() and reorderBottomToTop(), so we are following the
		// same convention here.
		auto IsIdentityOrder = [](const OrdersType &Order) {
		for (unsigned Idx : seq<unsigned>(0, Order.size()))
		if (Idx != Order[Idx])
		return false;
		return true;
		};
		if (IsIdentityOrder(ReorderIndices))
		ReorderIndices.clear();
		vdmitrie (inline comment, Not Done): std::distance(StoresVecSorted.begin(), find(StoresVecSorted, SI));
		vporpo (author, inline comment, Done): Hmm why is it preferable over `operator-()`? `StoresVecSorted` is a `SmallVector`, it should implement a random access iterator.

		return true;
		}
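The offset-based approach in `CanFormVector` (precompute each store's offset from the first store, sort by offset, check that the offsets are consecutive, then derive the reorder indices) can be illustrated outside of LLVM. The following is only a sketch using standard containers: `computeReorderIndices` is a hypothetical name, and the offsets are passed in directly instead of being obtained from `getPointersDiff()`. It also checks every adjacent pair of sorted offsets rather than only `last - first`, which is a deliberate deviation from the patch and additionally rejects repeated offsets.

```cpp
#include <algorithm>
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical stand-in for CanFormVector(). Offsets[Lane] is the element
// offset of the store in that lane relative to the store in lane 0.
// Returns std::nullopt if the stores are not consecutive. Otherwise returns
// the reorder indices, where an empty vector stands for identity order.
std::optional<std::vector<unsigned>>
computeReorderIndices(const std::vector<int> &Offsets) {
  if (Offsets.empty())
    return std::nullopt;

  // Pair each lane with its precomputed offset and sort by offset only, so the
  // comparator is a plain integer comparison (a strict weak ordering).
  std::vector<std::pair<unsigned, int>> LaneOffsets;
  for (unsigned Lane = 0; Lane < Offsets.size(); ++Lane)
    LaneOffsets.emplace_back(Lane, Offsets[Lane]);
  std::stable_sort(
      LaneOffsets.begin(), LaneOffsets.end(),
      [](const auto &A, const auto &B) { return A.second < B.second; });

  // Consecutive: each sorted offset is exactly one past the previous one.
  for (std::size_t I = 1; I < LaneOffsets.size(); ++I)
    if (LaneOffsets[I].second != LaneOffsets[I - 1].second + 1)
      return std::nullopt;

  // For each original lane, record its position in the sorted sequence.
  std::vector<unsigned> Reorder(Offsets.size());
  for (unsigned Pos = 0; Pos < LaneOffsets.size(); ++Pos)
    Reorder[LaneOffsets[Pos].first] = Pos;

  // Follow the convention from the patch: identity order is modeled as empty.
  bool IsIdentity = true;
  for (unsigned I = 0; I < Reorder.size(); ++I)
    if (Reorder[I] != I)
      IsIdentity = false;
  if (IsIdentity)
    Reorder.clear();
  return Reorder;
}
```

For two stores in reverse memory order (offsets `{1, 0}`), the sketch yields the indices `{1, 0}`; for stores already in memory order it yields an empty vector.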

		#ifndef NDEBUG
		LLVM_DUMP_METHOD static void dumpOrder(const BoUpSLP::OrdersType &Order) {
		for (unsigned Idx : Order)
		dbgs() << Idx << ", ";
		dbgs() << "\n";
		}
		#endif

		SmallVector<BoUpSLP::OrdersType, 1>
		BoUpSLP::findExternalStoreUsersReorderIndices(TreeEntry *TE) const {
		unsigned NumLanes = TE->Scalars.size();

		DenseMap<Value *, SmallVector<StoreInst *, 4>> PtrToStoresMap =
		collectUserStores(TE);

		// Holds the reorder indices for each candidate store vector that is a user of
		// the current TreeEntry.
		SmallVector<OrdersType, 1> ExternalReorderIndices;

		// Now inspect the stores collected per pointer and look for vectorization
		// candidates. For each candidate calculate the reorder index vector and push
		// it into `ExternalReorderIndices`
		for (const auto &Pair : PtrToStoresMap) {
		auto &StoresVec = Pair.second;
		// If we have fewer than NumLanes stores, then we can't form a vector.
		if (StoresVec.size() != NumLanes)
		continue;

		// If the stores are not consecutive then abandon this StoresVec.
		OrdersType ReorderIndices;
		if (!CanFormVector(StoresVec, ReorderIndices))
		continue;

		// We now know that the scalars in StoresVec can form a vector instruction,
		// so set the reorder indices.
		ExternalReorderIndices.push_back(ReorderIndices);
		}
		return ExternalReorderIndices;
		}
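The orders returned by `findExternalStoreUsersReorderIndices` are counted in an `OrdersUses` map during reordering (the `++OrdersUses.insert(...)` line earlier in this diff). How those counts could influence the chosen order is sketched below with standard containers. This is an assumption-laden illustration, not the logic in `reorderTopToBottom()`, which is not shown in this excerpt: `pickMostUsedOrder` is a hypothetical helper that applies a simple majority vote, and ties go to the lexicographically smallest order.

```cpp
#include <map>
#include <vector>

using Order = std::vector<unsigned>;

// Hypothetical helper: tally how often each candidate order is requested by
// in-tree operands and by external vectorizable stores, then return the most
// requested one. An empty Order stands for identity order.
Order pickMostUsedOrder(const std::vector<Order> &InTreeOrders,
                        const std::vector<Order> &ExternalStoreOrders) {
  std::map<Order, unsigned> OrdersUses;
  for (const Order &O : InTreeOrders)
    ++OrdersUses[O];
  for (const Order &O : ExternalStoreOrders)
    ++OrdersUses[O];

  Order Best; // Stays empty (identity) if there are no candidates.
  unsigned BestCount = 0;
  // std::map iterates in lexicographic key order, so with a strict '>' the
  // smallest order (identity first) wins ties.
  for (const auto &[O, Count] : OrdersUses) {
    if (Count > BestCount) {
      Best = O;
      BestCount = Count;
    }
  }
  return Best;
}
```

With one in-tree vote for identity and two external store vectors that both want `{1, 0}`, the sketch selects `{1, 0}`, which corresponds to the situation where following the external stores' order avoids a shuffle.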

		void BoUpSLP::buildTree(ArrayRef<Value *> Roots,
		ArrayRef<Value *> UserIgnoreLst) {
		deleteTree();
		UserIgnoreList = UserIgnoreLst;
		if (!allSameType(Roots))
		return;
		buildTree_rec(Roots, 0, EdgeInfo());
		}

llvm/test/Transforms/SLPVectorizer/X86/reorder_with_external_users.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt < %s -slp-vectorizer -S -mtriple=x86_64-unknown-linux-gnu -mcpu=corei7-avx | FileCheck %s

	; Make sure that we rotate the graph to help avoid the shuffle to
	; the external vectorizable stores.
	;
	; SLP starts vectorizing from the operands of the `fcmp` in bb2, then crosses
	; into bb1, vectorizing all the way to the broadcast load at the top.
	; The stores in bb1 are external to this tree, but they are vectorizable and are
	; in reverse order.
	define void @rotate_with_external_users(double *%A, double *%ptr) {
	; CHECK-LABEL: @rotate_with_external_users(
	; CHECK-NEXT: bb1:
	; CHECK-NEXT: [[LD:%.*]] = load double, double* undef, align 8
	; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x double> poison, double [[LD]], i32 0
	; CHECK-NEXT: [[TMP1:%.*]] = insertelement <2 x double> [[TMP0]], double [[LD]], i32 1
-	; CHECK-NEXT: [[TMP2:%.*]] = fadd <2 x double> [[TMP1]], <double 1.100000e+00, double 2.200000e+00>
+	; CHECK-NEXT: [[TMP2:%.*]] = fadd <2 x double> [[TMP1]], <double 2.200000e+00, double 1.100000e+00>
-	; CHECK-NEXT: [[TMP3:%.*]] = fmul <2 x double> [[TMP2]], <double 1.100000e+00, double 2.200000e+00>
+	; CHECK-NEXT: [[TMP3:%.*]] = fmul <2 x double> [[TMP2]], <double 2.200000e+00, double 1.100000e+00>
	; CHECK-NEXT: [[PTRA1:%.*]] = getelementptr inbounds double, double* [[A:%.*]], i64 0
-	; CHECK-NEXT: [[SHUFFLE:%.*]] = shufflevector <2 x double> [[TMP3]], <2 x double> poison, <2 x i32> <i32 1, i32 0>
	; CHECK-NEXT: [[TMP4:%.*]] = bitcast double* [[PTRA1]] to <2 x double>*
-	; CHECK-NEXT: store <2 x double> [[SHUFFLE]], <2 x double>* [[TMP4]], align 8
+	; CHECK-NEXT: store <2 x double> [[TMP3]], <2 x double>* [[TMP4]], align 8
	; CHECK-NEXT: br label [[BB2:%.*]]
	; CHECK: bb2:
-	; CHECK-NEXT: [[TMP5:%.*]] = fadd <2 x double> [[TMP3]], <double 3.300000e+00, double 4.400000e+00>
+	; CHECK-NEXT: [[TMP5:%.*]] = fadd <2 x double> [[TMP3]], <double 4.400000e+00, double 3.300000e+00>
	; CHECK-NEXT: [[TMP6:%.*]] = extractelement <2 x double> [[TMP5]], i32 0
	; CHECK-NEXT: [[TMP7:%.*]] = extractelement <2 x double> [[TMP5]], i32 1
-	; CHECK-NEXT: [[SEED:%.*]] = fcmp ogt double [[TMP6]], [[TMP7]]
+	; CHECK-NEXT: [[SEED:%.*]] = fcmp ogt double [[TMP7]], [[TMP6]]
	; CHECK-NEXT: ret void
	;
	bb1:
	%ld = load double, double* undef

	%add1 = fadd double %ld, 1.1
	%add2 = fadd double %ld, 2.2



Revision Contents

Diff 428476

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp

llvm/test/Transforms/SLPVectorizer/X86/reorder_with_external_users.ll
