On failing to find sequences that can be converted into dual macs, try to find sequential 16-bit loads that are used by muls which we can then use smultb, smulbt, smultt with a wide load.
Details
- Reviewers
SjoerdMeijer rnk javed.absar efriedma - Commits
- rG2ef3c0dad6a9: [ARM] bottom-top mul support in ARMParallelDSP
rGa7b2405b06c9: [ARM] bottom-top mul support ARMParallelDSP
rG7b84fd78478f: [ARM] bottom-top mul support in ARMParallelDSP
rL344693: [ARM] bottom-top mul support in ARMParallelDSP
rL342870: [ARM] bottom-top mul support ARMParallelDSP
rL342210: [ARM] bottom-top mul support in ARMParallelDSP
Diff Detail
- Repository
- rL LLVM
Event Timeline
lib/Target/ARM/ARMParallelDSP.cpp | ||
---|---|---|
662 ↗ | (On Diff #165086) | I think the alias checks are the same as in MatchSMLAD, so it looks like we can share that, and avoid the duplication here. I could definitely live with that being done in a follow up patch. |
test/CodeGen/ARM/paralleldsp-top-bottom.ll | ||
28 ↗ | (On Diff #165086) | These tests are not ideal, in the sense we are not directly checking for the smultb, smulbt, smultt patterns here. Instead, we rely on these ashr + mul patterns that we now generate, to be picked up and turned in to smlab variants later, like it is tested in test/CodeGen/ARM/smul.ll. And thus, since breaking these shr + mul rewrite patterns would give problems in smul.ll, I think these tests in their current form are acceptable as I understood that turning these tests in to llc tests would complicate things a bit. |
lib/Target/ARM/ARMParallelDSP.cpp | ||
---|---|---|
662 ↗ | (On Diff #165086) | Will do. |
The patch caused an assert on some vectorized code because I missed a check that the muls are plain integer types. parallel-dsp-top-bottom-neg-vec.ll has been added which was the reproducer provided.
lib/Target/ARM/ARMParallelDSP.cpp | ||
---|---|---|
650 ↗ | (On Diff #166256) | Diff'ing this was a bit difficult, but these are the 2 new lines, which matches with your last comment. Looks like a straightforward fix to me. |
Two breaking assumptions were that:
- the base load would be before the offset load.
- each load would only have one user - this is true but I also really meant and assumed that the sext of the load has one user.
So, I've rewritten the sext user replacement code as the previous version was horrible. I've also added some code to reorder the load and address calculations. The way I've figured out the ordering looks bad, but I couldn't find a simple way in the APIs to find this kind of info out.
I've also created a ParallelDSP subdirectory for the tests and moved the smlad stuff into it as well.
lib/Target/ARM/ARMParallelDSP.cpp | ||
---|---|---|
719 ↗ | (On Diff #167745) | Nit: this loop is doing the same things as the previous loop, can we make and call a function? |
730 ↗ | (On Diff #167745) | Nit: looks like this is workaround for not having comparison operators on these iterators. Perhaps we can put this in little helper function too, sweep it under the carpet a little bit, but with a TODO/FIXME and some comments added to it. If a helper function is too much (this function MatchTopBottomMuls is becoming rather big), than at lease some comments here. |