Unaligned double and float loads produce a lot of stack stores and loads that are not accounted for in the cost model yet. This patch adjusts the cost model to account for those extra operations.
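As a rough illustration of the kind of check under review (the helper name, thresholds, and cost values below are my own sketch, not the code in ARMTargetTransformInfo.cpp), the idea is to report extra cost for f32/f64 memory ops whose explicit alignment is below the type's natural alignment:

```cpp
#include <cstdint>
#include "llvm/IR/Type.h"
#include "llvm/Support/Alignment.h"
using namespace llvm;

// Illustrative helper, not the actual patch: an under-aligned f32/f64 load or
// store gets expanded through integer registers and a stack slot, so it should
// cost more than the baseline aligned VLDR/VSTR.
static unsigned unalignedFPMemOpCost(Type *Ty, MaybeAlign Alignment) {
  if (!Ty->isFloatTy() && !Ty->isDoubleTy())
    return 1; // this sketch only covers f32/f64
  // No align operand means the natural alignment is used, so no penalty.
  if (!Alignment)
    return 1;
  uint64_t NaturalAlign = Ty->isDoubleTy() ? 8 : 4;
  if (Alignment.valueOrOne().value() < NaturalAlign)
    return Ty->isDoubleTy() ? 4 : 3; // extra stores/moves (values illustrative)
  return 1;
}
```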
Event Timeline
Sounds good. Remember to upload with context! Other than adding a few extra tests, this LGTM.
llvm/test/Analysis/CostModel/ARM/load_store.ll:73
It might be worth adding a test with an alignment of 8 to be sure the cost remains OK. And add an unaligned float test.
Just nits, lgtm too.
One question about my favourite fp16 data type while we are at it. Does or can that benefit from some handling too here?
And one last question about this:
> Unaligned float loads introduce an extra store and a register move
> The codegen for these unaligned loads could be improved,
Was just curious if this is visible in an existing test and if there's a FIXME/TODO as a comment.
llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp:1498
Nit: punctuation.
llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp:1501
Same.
llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp:1502
Do you need to check Alignment (.. && Alignment && ..) when you use valueOrOne()?
Oh yeah! It's been a while since I last submitted a patch.
Thanks :)
> One question about my favourite fp16 data type while we are at it. Does or can that benefit from some handling too here?
I haven't considered fp16 since it didn't come up when I noticed this issue, but I can certainly add a check for it.
> And one last question about this:
> Unaligned float loads introduce an extra store and a register move
> The codegen for these unaligned loads could be improved,
> Was just curious if this is visible in an existing test and if there's a FIXME/TODO as a comment.
I noticed this issue after implementing something downstream, and it has been tough to get upstream clang to emit unaligned loads that could act as a test case for this. If you can think of some C that produces unaligned loads, I'd gladly add it.
llvm/lib/Target/ARM/ARMTargetTransformInfo.cpp:1502
I believe we do, since if the instruction doesn't have an alignment operand, it uses the type's natural alignment and won't have the extra stack instructions.
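To spell that guard out with a sketch (the helper below is hypothetical and only illustrates the point being discussed, not the patched code):

```cpp
#include <cstdint>
#include "llvm/Support/Alignment.h"

// Two cases a MaybeAlign can represent for, say, a double load:
//   - no align operand: empty MaybeAlign, the backend uses the type's natural
//     alignment, no extra stack instructions, so no penalty;
//   - an explicit alignment below natural (e.g. align 1): genuinely
//     under-aligned, the expansion needs extra loads/stores, so add cost.
// Without the `Alignment &&` check, valueOrOne() would turn the first case
// into Align(1) and wrongly penalize it.
static bool isUnderAligned(llvm::MaybeAlign Alignment, uint64_t NaturalAlign) {
  return Alignment && Alignment.valueOrOne().value() < NaturalAlign;
}
```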
llvm/test/Analysis/CostModel/ARM/load_store.ll:73
Good idea.
clang generates unaligned load/store for something like the following:
float f(char* x) { float a; __builtin_memcpy(&a, x, sizeof(a)); return a; }
void ff(char* x, float f) { __builtin_memcpy(x, &f, sizeof(f)); }
double d(char* x) { double a; __builtin_memcpy(&a, x, sizeof(a)); return a; }
void dd(char* x, double f) { __builtin_memcpy(x, &f, sizeof(f)); }
Looking at the generated code for this, not sure how you get a cost of "6" for double; I only count 4 instructions (https://godbolt.org/z/6rodoPMqj)
Thanks for that; I'll add some of those as a test case.
> Looking at the generated code for this, not sure how you get a cost of "6" for double; I only count 4 instructions (https://godbolt.org/z/6rodoPMqj)
Perhaps my use case was a bit more complex and resulted in more instructions. I'll change it to 4 and see how it works.