Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
Ignoring for a moment how I would like to use this, I think this is a nice self-contained improvement (or perhaps even bug fix) so that getIntstructionCost/getIntrinsicCost returns something an awful lot more reasonable for memcpy's. I think the dependent refactoring patches are general improvements too. So, a friendly ping, any opinions on this?
Ah, oops, that was a experiment in my local branch. I will get rid of this, and upload a new diff soon. So, thanks for spotting this, you haven't missed anything!
Run code-size benchmark CSiBe, triggered an assert in function FindOptimalMemOpLowering, and fixed that.
This are the code-size results:
App | Before | AFTER | Diff |
---|---|---|---|
csibe/OpenTCP-1.0.4 | 21879 | 21895 | 0.07% |
csibe/bzip2-1.0.2 | 47350 | 47390 | 0.08% |
csibe/cg_compiler_opensrc | 96212 | 96184 | -0.03% |
csibe/compiler | 19286 | 19297 | 0.06% |
csibe/jikespg-1.3 | 170538 | 170199 | -0.20% |
csibe/jpeg-6b | 96431 | 96441 | 0.01% |
csibe/libpng-1.2.5 | 77702 | 77902 | 0.26% |
csibe/lwip-0.5.3.preproc | 68611 | 68627 | 0.02% |
csibe/mpeg2dec-0.3.1 | 38988 | 38924 | -0.16% |
csibe/mpgcut-1.1 | 7196 | 7198 | 0.03% |
csibe/teem-1.6.0-src | 1033815 | 1034595 | 0.08% |
csibe/ttt-0.10.1.preproc | 12711 | 12703 | -0.06% |
csibe/unrarlib-0.4.0 | 10394 | 10415 | 0.20% |
csibe/zlib-1.1.4 | 29229 | 29228 | 0.00% |
total | 1730343 | 1730999 | 0.04% |
Overall, small differences and the changes are neutral, but I will look into libpng to see what is happening there.
Removed the change in findOptimalMemOpLowering, because that belongs to D59766 which I have just updated.
About the regression: in my local tree had a logic error in findOptimalMemOpLowering, i.e. the condition to bail early was wrong. It was right here in the upstream diffs, so somehow I managed to mess that up).
Now, the good news is that the regressions disappear, In fact, there are no changes at all anymore. Because of the logic error, is was triggering more often, also for other intrinsics. Thus, this is now almost a non-functional change, which is what I was expecting as this is just the initial plumbing to model the cost correctly.
Investigating the "regressions" was a useful exercise though. I learned that in libPNG it was actually making the right decisions, but we were unlucky with register allocation allocating high registers like R9 and R10 not allowing narrow encodings for instructions LDRB and STRB. The other regression, in TEEM, was because it was also transforming memset intrinsics to eabi_memclr4 calls. This was just wrong, and was an result of not bailing early in findOptimalMemOpLowering. But this shows I definitely want to look at removing this early bail in a follow-up patch, because with the memclr things fixed, it should be a win overall and reduce code-size.
But summarising: this is not causing any changes in codesize for two code-bases CSiBe and Mbed, and thus is almost a non-functional change, which was my intention.
lib/Target/ARM/ARMTargetTransformInfo.cpp | ||
---|---|---|
412 ↗ | (On Diff #193888) | Can you explain why this is more than 6? I would imagine it would be something like 4; 1 (for the call) + a few for argument setup. |
436 ↗ | (On Diff #193888) | Does this need to be clang-formatted? |
test/Analysis/CostModel/ARM/memcpy.ll | ||
373 ↗ | (On Diff #196444) | Is it worth adding a few strict-align tests too? |
Hi Dave, many thanks for taking a look!
Can you explain why this is more than 6? I would imagine it would be something like 4; 1 (for the call) + a few for argument setup.
6 was arbitrary, I wanted to make it slightly more costly than 'TCC_Expensive', but don't have good reasons at the moment, so I've changed it back to 4.
Does this need to be clang-formatted?
I did and made a few changes. But to keep the inline comments near the arguments readable, I kept them on separate lines (clang-format puts them after each other).
Is it worth adding a few strict-align tests too?
Yes, thanks for the suggestion. Done!
lib/Target/ARM/ARMTargetTransformInfo.cpp | ||
---|---|---|
420 ↗ | (On Diff #196632) | How come this is needed now? I was under the impression that findOptimalMemOpLowering handled allows unaligned cases. And if the memory is aligned (but size >= 8), we can still expand the memcpy. |
434 ↗ | (On Diff #196632) | Should we be using the same value as getMaxStoresPerMemcpy here? That way we may not need the "Size >= 32 && ... " checks above. |
Thanks for the suggestion to use getMaxStoresPerMemcpy; that greatly simplified and cleaned up things. Have added more tests too (all functions are now tested with/without strict align).