This is an archive of the discontinued LLVM Phabricator instance.

[ARM] Increase cost of unaligned double and float loads
AcceptedPublic

Authored by samtebbs on Jun 1 2023, 2:26 AM.

Details

Summary

Unaligned double and float loads produce a lot of stack stores and loads that are not accounted for in the cost model yet. This patch adjusts the cost model to account for those extra operations.

Diff Detail

Event Timeline

samtebbs created this revision.Jun 1 2023, 2:26 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 1 2023, 2:26 AM
samtebbs requested review of this revision.Jun 1 2023, 2:26 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 1 2023, 2:26 AM
dmgreen accepted this revision.Jun 1 2023, 11:11 PM

Sounds good. Remember to upload with context! Other than adding a few extra tests this LGTM.

llvm/test/Analysis/CostModel/ARM/load_store.ll
73

It might be worth adding a test with an alignment of 8 to be sure the cost remains OK. And add an unaligned float test.

This revision is now accepted and ready to land.Jun 1 2023, 11:11 PM

Just nits, lgtm too.

One question about my favourite fp16 data type while we are at it. Does or can that benefit from some handling too here?

And one last question about this:

Unaligned float loads introduce an extra store and a register move
The codegen for these unaligned loads could be improved,

Was just curious if this is visible in an existing test and if there's a FIXME/TODO as a comment.

llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
1498

Nit: punctation.

1501

Same

1502

Do you need to check Alignment ( .. && Alignment && ..) when you use valueOrOne()?

Sounds good. Remember to upload with context!

Oh yeah! It's been a while since I last submitted a patch.

Just nits, lgtm too.

Thanks :)

One question about my favourite fp16 data type while we are at it. Does or can that benefit from some handling too here?

I haven't considered fp16 since it didn't come up when I noticed this issue, but I can certainly add checking for it.

And one last question about this:

Unaligned float loads introduce an extra store and a register move
The codegen for these unaligned loads could be improved,

Was just curious if this is visible in an existing test and if there's a FIXME/TODO as a comment.

I noticed this issue after implementing something downstream and it has been tough to make upstream clang make unaligned loads to act as a test case for this. If you can think of some C that produces unaligned loads then I'd gladly add it.

llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp
1502

I believe we do, since if the instruction doesn't have an alignment operand, it uses the type's natural alignment and won't have the extra stack instructions.

llvm/test/Analysis/CostModel/ARM/load_store.ll
73

Good idea

samtebbs updated this revision to Diff 527795.Jun 2 2023, 3:30 AM
samtebbs marked 3 inline comments as done.

clang generates unaligned load/store for something like the following:

float f(char* x) { float a; __builtin_memcpy(&a, x, sizeof(a)); return a; }
float ff(char* x, float f) { __builtin_memcpy(x, &f, sizeof(f)); }
double d(char* x) { double a; __builtin_memcpy(&a, x, sizeof(a)); return a; }
double dd(char* x, double f) { __builtin_memcpy(x, &f, sizeof(f)); }

Looking at the generated code for this, not sure how you get a cost of "6" for double; I only count 4 instructions (https://godbolt.org/z/6rodoPMqj)

Also, with NEON, we generate vld1/vst1 for unaligned double load/store.

clang generates unaligned load/store for something like the following:

float f(char* x) { float a; __builtin_memcpy(&a, x, sizeof(a)); return a; }
float ff(char* x, float f) { __builtin_memcpy(x, &f, sizeof(f)); }
double d(char* x) { double a; __builtin_memcpy(&a, x, sizeof(a)); return a; }
double dd(char* x, double f) { __builtin_memcpy(x, &f, sizeof(f)); }

Thanks for that, I'll add some of those as a test case.

Looking at the generated code for this, not sure how you get a cost of "6" for double; I only count 4 instructions (https://godbolt.org/z/6rodoPMqj)

Perhaps my use case was a bit more complex and resulted in more instructions. I'll change it to 4 and see how it works.

Matt added a subscriber: Matt.Aug 28 2023, 12:23 PM