This is an archive of the discontinued LLVM Phabricator instance.


Differential D91518

[LV][NFC-ish] Allow vector widths over 256 elements
Closed, Public

Authored by simoll on Nov 16 2020, 12:41 AM.


Details

Reviewers

fhahn
k-ishizaka
kaz7

Commits

rGa1de391dae8b: [LV][NFC-ish] Allow vector widths over 256 elements

Summary

The assertion that vector widths are <= 256 elements was hard-wired in the LV code. E.g., VE allows for vectors of up to 512 elements. Test against the TTI vector register bit width instead; this is NFC for non-asserting builds.

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

Time	Test
400 ms	linux > HWAddressSanitizer-x86_64.TestCases::sizes.cpp

Event Timeline

simoll created this revision. Nov 16 2020, 12:41 AM

Herald added a project: Restricted Project. Nov 16 2020, 12:41 AM

Herald added subscribers: llvm-commits, hiraditya.

simoll requested review of this revision. Nov 16 2020, 12:41 AM

Harbormaster completed remote builds in B78926: Diff 305422. Nov 16 2020, 1:13 AM

I think removing this is a good idea, but I'm not sure why the maximum vector size was limited to 64 and recently jumped up to 256. So I cannot say LGTM at the moment. Does anyone know the background on this?

Here is the prior commit that changed this. I guess we can just commit our change if this looks good to you.

fhahn added inline comments. Nov 18 2020, 12:02 PM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
Line 5352: Could we just use `TTI->getRegisterBitWidth` as an upper bound? That would allow us to keep the assert as a sanity check.

Followed @fhahn's suggestion to test against the widest vector register width instead.

I thought it was a good idea when I heard it from @fhahn, but... I think it's not a good idea since 1) WidestRegister holds a bit width, and 2) MaxVectorSize is calculated from TTI->getRegisterBitWidth anyway.

Harbormaster completed remote builds in B79422: Diff 306330. Nov 19 2020, 1:18 AM

In D91518#2404870, @kaz7 wrote:

I thought it was a good idea when I heard it from @fhahn, but... I think it's not a good idea since 1) WidestRegister holds a bit width, and 2) MaxVectorSize is calculated from TTI->getRegisterBitWidth anyway.

I think it somewhat preserves the spirit of the assertion, which IIUC was added to ensure none of the calculations above go rogue or lead to huge vectorization factors. The way the computation is supposed to work, dividing the widest register in bits by the smallest possible type width in bits (1) seems a suitable upper bound to preserve the spirit of the assert (I'd say it's debatable whether the assert itself adds a lot of protection, but it does at least add a little bit).

This LGTM, but happy to discuss this further.

This revision is now accepted and ready to land. Nov 19 2020, 1:37 AM

In D91518#2404924, @fhahn wrote:

dividing the widest register in bits by the smallest possible type width in bits (1) seems a suitable upper bound to preserve the spirit of the assert

That makes sense. I understand now. Thank you for the explanations.

This revision was landed with ongoing or failed builds. Nov 19 2020, 1:58 AM

Closed by commit rGa1de391dae8b: [LV][NFC-ish] Allow vector widths over 256 elements (authored by simoll).

This revision was automatically updated to reflect the committed changes.

simoll added a commit: rGa1de391dae8b: [LV][NFC-ish] Allow vector widths over 256 elements.

The current version seems a good middle ground between computing a tighter bound and abolishing the assertion altogether. Trying to be smart in assertions is usually a bad idea because people might pull the rug (the underlying assumptions) out from under you some day.

Revision Contents

Path	Size
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp	5 lines

Diff 306330

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp


Changed hunk in LoopVectorizationCostModel::computeFeasibleMaxVF(unsigned ConstTripCount) (surrounding unchanged lines collapsed):

   // Note that both WidestRegister and WidestType may not be a powers of 2.
   unsigned MaxVectorSize = PowerOf2Floor(WidestRegister / WidestType);
 
   LLVM_DEBUG(dbgs() << "LV: The Smallest and Widest types: " << SmallestType
                     << " / " << WidestType << " bits.\n");
   LLVM_DEBUG(dbgs() << "LV: The Widest register safe to use is: "
                     << WidestRegister << " bits.\n");
 
-  assert(MaxVectorSize <= 256 && "Did not expect to pack so many elements"
-                                 " into one vector!");
+  assert(MaxVectorSize <= WidestRegister &&
+         "Did not expect to pack so many elements"
+         " into one vector!");
   if (MaxVectorSize == 0) {
     LLVM_DEBUG(dbgs() << "LV: The target has no vector registers.\n");
     MaxVectorSize = 1;
     return ElementCount::getFixed(MaxVectorSize);
   } else if (ConstTripCount && ConstTripCount < MaxVectorSize &&
              isPowerOf2_32(ConstTripCount)) {
     // We need to clamp the VF to be the ConstTripCount. There is no point in
     // choosing a higher viable VF as done in the loop below.