This is an archive of the discontinued LLVM Phabricator instance.

[Driver][AArch64]Add driver support for neoverse-512tvb target
ClosedPublic

Authored by CarolineConcatto on Oct 24 2021, 11:59 PM.

Details

Summary

The support for neoverse-512tvb mirrors the same option available in GCC[1].
There is no functional effect for this option yet.
This patch ensures the driver accepts "-mcpu=neoverse-512tvb", and enough
plumbing is in place to allow the new option to be used in the future.

[1]https://gcc.gnu.org/onlinedocs/gcc/AArch64-Options.html

Diff Detail

Event Timeline

CarolineConcatto requested review of this revision.Oct 24 2021, 11:59 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptOct 24 2021, 11:59 PM

This is a bit of a shame. I was hoping we wouldn't need the same hacks as GCC. The llvm cost modelling can work quite differently at times to GCC and I didn't think we were close enough to optimal code to need to worry about these kinds of differences. I guess having the option is useful for consistency.

llvm/lib/Support/Host.cpp
216 ↗(On Diff #381853)

This doesn't sound right - for a fake cpu to work with -mcpu=native.

llvm/lib/Target/AArch64/AArch64Subtarget.cpp
168

Should this have Loop Alignment too?

Is the interleave factor higher due to the 512bit vector bandwidth?

llvm/lib/Target/ARM/ARM.td
1412 ↗(On Diff #381853)

Are we sure gcc has a -mcpu=neoverse-512tvb option for Arm? Or is it AArch64 only?

CarolineConcatto retitled this revision from [Driver][AArch64]Add driver support for neoverse-512tvb target to [WIP][Driver][AArch64]Add driver support for neoverse-512tvb target.Oct 25 2021, 5:09 AM
  • remove neoverse-512tvb from Host.cpp
CarolineConcatto retitled this revision from [WIP][Driver][AArch64]Add driver support for neoverse-512tvb target to [Driver][AArch64]Add driver support for neoverse-512tvb target.Oct 25 2021, 8:03 AM

This one might get a VScaleForTuning:
https://reviews.llvm.org/D112459
Do you need this as well?

  • Rebase and remove support on for ARM 32 bits
CarolineConcatto marked an inline comment as done.Oct 26 2021, 5:25 AM

Thank you for your review @dmgreen and @tschuett.
I rebase the patch, now VScaleForTuning is being set.
And I removed support for neoverse-512tvb from Arm 32 bits.
@paulwalker-arm pointed me that neoverse-512tvb is only supported by AArch64.

llvm/lib/Support/Host.cpp
216 ↗(On Diff #381853)

Good catch, I should have removed that.

Thanks. If the cpu has a 512 bit total vector bandwidth, should the VScaleForTuning be 1 or 2 (or higher)? llvm doesn't usually deal with total bandwidth a lot, perhaps not as much as it should.

@david-arm any thoughts?

Thanks. If the cpu has a 512 bit total vector bandwidth, should the VScaleForTuning be 1 or 2 (or higher)? llvm doesn't usually deal with total bandwidth a lot, perhaps not as much as it should.

@david-arm any thoughts?

The total vector bandwidth includes unrolling so currently having VScaleForTuning=1 and MaxInterleaveFactor=4 implies 512 tvb. If the target has >128bit vectors then vector loops will likely have more work than they can handle in parallel but as long as that does not negatively affect register pressure it shouldn't be a problem.

sdesmalen accepted this revision.Oct 27 2021, 3:08 AM

LGTM with nit addressed.

llvm/lib/Target/AArch64/AArch64.td
840

nit: s/512/512-TVB/

This revision is now accepted and ready to land.Oct 27 2021, 3:08 AM

The total vector bandwidth includes unrolling so currently having VScaleForTuning=1 and MaxInterleaveFactor=4 implies 512 tvb. If the target has >128bit vectors then vector loops will likely have more work than they can handle in parallel but as long as that does not negatively affect register pressure it shouldn't be a problem.

That doesn't fit with my understanding of how VScaleForTuning is currently used, and vectorizing/unrolling too far can easily cause the vector part to be skipped for many loop counts, falling back to the scalar part. But that all sounds fine to me for what this is. Cheers.

This revision was landed with ongoing or failed builds.Oct 28 2021, 1:10 AM
This revision was automatically updated to reflect the committed changes.