This is an archive of the discontinued LLVM Phabricator instance.

[LV] Fix crash when computing max VF too early

Authored by c-rhodes on Jan 16 2021, 9:00 AM.



D90687 introduced a crash:

llvm::LoopVectorizationCostModel::computeMaxVF(llvm::ElementCount, unsigned int): Assertion `WideningDecisions.empty() && Uniforms.empty() && Scalars.empty() && "No decisions should have been taken at this point"' failed.

when compiling the following C code:

typedef struct {
char a;
} b;

b *c;
int d, e;

int f() {
  int g = 0;
  for (; d; d++) {
    e = 0;
    for (; e < c[d].a; e++)
  return g;


clang -Os -target hexagon -mhvx -fvectorize -mv67 testcase.c -S -o -

This occurred since prior to D90687 computeFeasibleMaxVF would only be
called in computeMaxVF when a scalar epilogue was allowed, but now it's
always called. This causes the assert above since computeFeasibleMaxVF
collects all viable VFs larger than the default MaxVF, and for each VF
calculates the register usage which results in analysis being done the
assert above guards against. This can occur in computeFeasibleMaxVF if
TTI.shouldMaximizeVectorBandwidth and this target hook is implemented in
the hexagon backend to always return true.

Reported by @iajbar.

Diff Detail

Event Timeline

c-rhodes created this revision.Jan 16 2021, 9:00 AM
c-rhodes requested review of this revision.Jan 16 2021, 9:00 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 16 2021, 9:00 AM
c-rhodes added a subscriber: fhahn.Jan 18 2021, 6:47 AM
c-rhodes added inline comments.

@fhahn Any thoughts on this? This assert is firing since D90687, details above. I'm just wondering having looked at your patch D78298 if there's a more sensible fix here, maybe to get rid of the assert and call invalidateCostModelingDecisions?

fhahn added inline comments.Jan 21 2021, 9:15 AM

I think it is still preferable to avoid spending time computing cost-model decisions unnecessarily.

Duplicating the MaxVF computation at a few places to ensure we only do it when needed looks good to me.


please add some check lines to make sure something sensible happens, besides not crashing.


Personally I think the C source code mostly just adds clutter. Ideally the IR source would be concise and with descriptive variable names, so it should be relatively easy to see what's going on without C source; especially if the C source looks like something auto-generated/C-reduced. Also, there's no guarantee that Clang will generate the same IR in future versions.


can this function be a bit more simplified & cleaned up? I'll leave some suggestions below & I think the basic block names could be improved & shortened,


none of this should be needed to reproduce the failure, you should be able to just use a constant instead of as incoming value below.


can we instead just pass a pointer argument?


instead of using a struct, can this just be plain pointer to i8 or something like that?


are all those compares/extensions/selects needed?


not needed?


not needed?

c-rhodes updated this revision to Diff 319279.Jan 26 2021, 6:10 AM

@fhahn I've simplified the test, thanks for the comments!

fhahn accepted this revision.Feb 1 2021, 1:14 AM

LGTM, thanks! This should probably also go onto the 12.x release branch.

This revision is now accepted and ready to land.Feb 1 2021, 1:14 AM
This revision was automatically updated to reflect the committed changes.

LGTM, thanks! This should probably also go onto the 12.x release branch.

Landed, cheers!

I've created to get it cherry-picked onto the 12.x release branch.