The logic in ARMParallelDSP is set up to merge two 16-bit loads into a single 32-bit load and feed the result into an smlad. This requires that four loads are combined for the four inputs, but there wasn't actually a check for this.
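As a rough illustration of why all four inputs need to be loads, here is a minimal standalone sketch of the arithmetic involved. It is not code from the pass: `smlad_ref`, `load_pair`, `mac_scalar` and `mac_paired` are hypothetical names, and the model ignores overflow and the Q flag.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical reference model of a dual 16-bit multiply-accumulate:
// acc + lo(a) * lo(b) + hi(a) * hi(b), with signed 16-bit halves.
int32_t smlad_ref(uint32_t a, uint32_t b, int32_t acc) {
  int16_t a_lo = static_cast<int16_t>(static_cast<uint16_t>(a & 0xFFFFu));
  int16_t a_hi = static_cast<int16_t>(static_cast<uint16_t>(a >> 16));
  int16_t b_lo = static_cast<int16_t>(static_cast<uint16_t>(b & 0xFFFFu));
  int16_t b_hi = static_cast<int16_t>(static_cast<uint16_t>(b >> 16));
  return acc + a_lo * b_lo + a_hi * b_hi;
}

// Two adjacent 16-bit loads combined into one 32-bit load.
uint32_t load_pair(const int16_t *p) {
  uint32_t w;
  std::memcpy(&w, p, sizeof w);
  return w;
}

// Scalar form: four separate 16-bit loads, a[0], a[1], b[0], b[1].
int32_t mac_scalar(const int16_t *a, const int16_t *b, int32_t acc) {
  return acc + a[0] * b[0] + a[1] * b[1];
}

// Paired form: only expressible because each multiplier operand
// comes from memory, so each pair can become one wide load.
int32_t mac_paired(const int16_t *a, const int16_t *b, int32_t acc) {
  return smlad_ref(load_pair(a), load_pair(b), acc);
}
```

If any of the four multiplier inputs were not a load (say, a value computed in a register), there would be no adjacent pair to widen on that side, and the paired form would not apply.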
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo