Optimize multiplication with some specific immediates to
a pair of alsl.
Details
Diff Detail
Event Timeline
This is good strength reduction overall, thanks for the insight!
Though maybe the exact patterns can be automatically generated with nested loops, for reduced mental burden to prove correctness? I've sketched something and it seems doable:
>>> alsl = lambda j, k, imm: (j << imm) + k >>> a = [alsl(alsl(1, 1, i), alsl(1, 1, i), j) for j in range(1,5) for i in range(1,5)] >>> a [9, 15, 27, 51, 15, 25, 45, 85, 27, 45, 81, 153, 51, 85, 153, 289] >>> list(sorted(set(a))) [9, 15, 25, 27, 45, 51, 81, 85, 153, 289] >>> b = [alsl(alsl(1, 1, i), 1, j) for j in range(1,5) for i in range(1,5)] >>> b [7, 11, 19, 35, 13, 21, 37, 69, 25, 41, 73, 137, 49, 81, 145, 273] >>> ab = set(a).union(set(b)) >>> len(ab) 24 >>> list(sorted(set(a).intersection(set(b)))) [25, 81] >>> c = [alsl(1, alsl(1, 1, i), j) for j in range(1,5) for i in range(1,5)] >>> c [5, 7, 11, 19, 7, 9, 13, 21, 11, 13, 17, 25, 19, 21, 25, 33] >>> list(sorted(set(c))) [5, 7, 9, 11, 13, 17, 19, 21, 25, 33] >>> set(c).difference(ab) {33, 5, 17} >>> all = set(c).union(ab).difference({3, 5, 9, 17}) >>> len(all) 24 >>> list(sorted(all)) [7, 11, 13, 15, 19, 21, 25, 27, 33, 35, 37, 41, 45, 49, 51, 69, 73, 81, 85, 137, 145, 153, 273, 289]
So basically, we can strength-reduce a total of 24 different constant-multiplications with two alsl's:
- case 1: alsl T, X, X, i; alsl Y, T, T, j: 15, 25, 27, 45, 51, 81, 85, 153, 289
- case 2: alsl T, X, X, i; alsl Y, T, X, j: 7, 11, 13, 19, 21, 25, 35, 37, 41, 49, 69, 73, 81, 137, 145, 273
- case 3: alsl T, X, X, i; alsl Y, X, T, j: 7, 11, 13, 19, 21, 25, 33
Problem is that there are some overlaps between the 3 possible combinations, and some inside case 1 and 3. If we could somehow avoid producing conflicting rules then probably leveraging TableGen's loop and computation abilities would produce code that's easier to maintain. Otherwise, simplifying the code with some macros could also be beneficial.
llvm/lib/Target/LoongArch/LoongArchInstrInfo.td | ||
---|---|---|
840–841 | The inner if could be simplified into just return N1C->hasOneUse(). The outer return false could be kept though, for avoiding an overly complex single return expression. |
Thanks for your suggestion! Using foreach really makes the code more clear!
BTW: ALSL only accepts shift amount 1,2,3,4, the value 5 is not supported.
My pleasure. ;-)
BTW: ALSL only accepts shift amount 1,2,3,4, the value 5 is not supported.
It's just one of the Python idiosyncrasies: range(1, 5) really yields 1, 2, 3, 4. Just like how for (int i = 1; i < 5; i++) is the same in C.
Thanks for the improvements.
So this is only for the case 2 mentioned by @xen0n, right? Seems that the test for 81 is missing.
Will case 1 and case 3 be handled later?
Thanks for your comments.
- The missing case 81 is added.
- I will implement case 1 and case 3 later in another patch. BTW: Total amount of these two cases is small, and they even have redundant values with case 2, so how about implementing them without foreach, just standalone Pat one by one ?
As for point 2, fine by me. Case 1 would still have many remaining constants so a macro would go a long way (you could only go over the distinct Imm1 and Imm2 and auto-compute the source constant as you did for Case 2), and only 33 would be left for Case 3 so you may write the pattern straight-forward.
Unfortanately the case 3 you mentioned can not be implemented, besides other duplicates to case 2, the remaining immediate x * 33 will be optimized to (x << 5) + x.
The inner if could be simplified into just return N1C->hasOneUse(). The outer return false could be kept though, for avoiding an overly complex single return expression.