Pass triple for all the test runs
- rZORG LLVM Github Zorg
- Wed, Dec 4, 7:56 PM
- Git
- rT test-suite
- Oct 21 2019, 9:44 PM
- Subversion
- rPLO Polly
- 3,936 Commits
- ·
- Restricted Project
- Oct 21 2019, 8:48 AM
- Subversion
- Importing...
Today
I do find it odd that there is a PATH fallback in the existing code in the first place. I agree that basically no compiler other than the "system" compiler should ever use it (and also even the concept of the "system" compiler really only makes much sense on systems like Linux and BSDs where compiling things for the local system is common). I guess the other option here would be to just require that wasm-opt be in the same directory as clang, which we can arrange in wasi-sdk or wherever.
In D71064#1771063, @lebedev.ri wrote:In D71064#1771030, @spatel wrote:Scalar looks same all-around. Vector shows some potential diffs:
https://godbolt.org/z/y3E-mbIf I'm seeing it correctly, we always do better on the typical case where the bool vector is produced by a compare, but we might do worse if we don't have that cmp and don't have AssertSext knowledge.
So the comment is that the undo fold needs to be adjusted first, to fire for non-cmp i1 vectors on aarch64 and powerpc64le?
lgtm
In D71064#1771030, @spatel wrote:In D71064#1770969, @lebedev.ri wrote:In D71064#1770937, @spatel wrote:Did you confirm that codegen is equal or better for these cases?
I think we have DAGCombiner reversals for this transform, but some targets that seem like they would benefit have not enabled the TLI hook.I didn't yet. So, as far as i can tell, these are all the interesting cases:
(i'm looking at @t0_new_canon vs @t1_old_canon since that is the only change in this patch)
https://godbolt.org/z/vbb25Q - no regression for x86
https://godbolt.org/z/Htg7m5 - aarch64 also looks ok?
https://godbolt.org/z/xmP6JV - arm good?
https://godbolt.org/z/vNrNJ8 - thumb good?So i'd say everything is already covered by backend undo folds?
Let me know if i'm missing the point here?Scalar looks same all-around. Vector shows some potential diffs:
https://godbolt.org/z/y3E-mbIf I'm seeing it correctly, we always do better on the typical case where the bool vector is produced by a compare, but we might do worse if we don't have that cmp and don't have AssertSext knowledge.
Did performance test and I saw 0.4% improvement in an internal benchmark. That is a good improvement, thanks for the change!
Remove a typo after rebase (uimm5s2 vs uim5s4)
I agree with David about splitting. There are 4 issues mentioned in this patch, and they are not related. (I'm not sure if 1 or 3 could be combine or not)
Putting them together makes it hard to review, and hard to determine if the test case actually covered the issue that's raised.
Let's make separate patches if possible.
I rerun perf test and I don't see any performance change. Last run I saw very small improvement on latency in a benchmark. This is fine since the benchmark has some fluctuation by itself.
LGTM - thanks!
This will definitely work; Once you got the testcase, it might be good to check whether there is a more targeted root cause that we could fix and assert on EndLoc here.
LGTM!
In D71064#1770969, @lebedev.ri wrote:In D71064#1770937, @spatel wrote:Did you confirm that codegen is equal or better for these cases?
I think we have DAGCombiner reversals for this transform, but some targets that seem like they would benefit have not enabled the TLI hook.I didn't yet. So, as far as i can tell, these are all the interesting cases:
(i'm looking at @t0_new_canon vs @t1_old_canon since that is the only change in this patch)
https://godbolt.org/z/vbb25Q - no regression for x86
https://godbolt.org/z/Htg7m5 - aarch64 also looks ok?
https://godbolt.org/z/xmP6JV - arm good?
https://godbolt.org/z/vNrNJ8 - thumb good?So i'd say everything is already covered by backend undo folds?
Let me know if i'm missing the point here?
LGTM with minor nit.
I'm trying to get access to commit the patch, but it would be great if you can get someone to commit it.
[suggestion] Add a test case to check for "not rotated" to be rejected.
In D68717#1763170, @dmgreen wrote:Sorry for the delay. Now that I have added some extra Neon codegen (and some extra tests), this is how this looks.
The unsigned cases still look like a clear win. The signed cases looks worse (although you could argue in cases that this would simplify if part of a loop).
WDYT? Remove the signed part and keep the unsigned?
Reshuffle the loop structure, early continue rather than an indented block. I've also moved the PHI/EHpad test ahead of the domination check for efficiency.