This is an archive of the discontinued LLVM Phabricator instance.

[ARM] Fix inline memcpy trip count sequence
ClosedPublic

Authored by dmgreen on May 17 2021, 7:40 AM.

Details

Summary

The trip count for a memcpy/memset will be n/16 rounded up to the nearest integer. So (n+15)>>4. The old code was including a BIC too, to clear one of the bits, which does not seem correct. This remove the extra BIC.

Note that ideally this would never actually be generated, as in the creation of a tail predicated loop we will DCE that setup code, letting the WLSTP perform the trip count calculation. So this doesn't usually come up in testing (and apparently the ARMLowOverheadLoops pass does not do any sort of validation on the tripcount). Only if the generation of the WLTP fails will it use the incorrect BIC instructions.

Diff Detail

Event Timeline

dmgreen created this revision.May 17 2021, 7:40 AM
dmgreen requested review of this revision.May 17 2021, 7:40 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 17 2021, 7:40 AM
malharJ accepted this revision.May 17 2021, 1:48 PM

Thanks for spotting and correcting this @dmgreen,

I was trying to go for BIC n, n, 15 (to clear off the last 4 bits, which would also have been redundant given the LSR after it).

This revision is now accepted and ready to land.May 17 2021, 1:48 PM
malharJ added inline comments.May 17 2021, 4:26 PM
llvm/lib/Target/ARM/ARMISelLowering.cpp
11204

I think this might need to be &ARM::GPRlrRegClass ...

I recall ARMLowOverheadLoops::IterationCountDCE() not working for me correctly
when I had used rGPR.

But if the tests work correctly then I guess it's fine leaving it as rGPR.

Thanks!

llvm/lib/Target/ARM/ARMISelLowering.cpp
11204

Yeah I saw that. I put up D102620 as a fix, which seems to be the cause of it not removing the instructions that it should be able to.

This revision was landed with ongoing or failed builds.May 24 2021, 3:02 AM
This revision was automatically updated to reflect the committed changes.