- User Since
- Feb 13 2015, 12:29 AM (136 w, 3 d)
Aug 14 2017
Aug 9 2017
@RKSimon I'll find a way to make that fast, or find an alternative like activating it only in some specific situations. In addition to solving my specific problem, it seems to improve numerous other things, especially for the AMD backend. In any case, I think D33840 is a good thing either way and we should proceed with it.
Jul 31 2017
OK, benchmarks. Compiling clang from a bc containing clang in its entirety. With the patch:
Jul 29 2017
Jul 10 2017
I see nothing obviously wrong with this patch. It'd be better if someone with more ARM experience than I have could review it.
Jun 27 2017
I got sidetracked into another very urgent project during June. I'll do my best to provide that number as soon as I can. I don't expect the gain from this, as is, to be that great, but who knows? What I'm after here is reducing the performance impact of D33587.
Jun 22 2017
I have one more comment. It's looking good.
Jun 11 2017
Jun 6 2017
I don't know Go that well, so I'll leave it to someone who knows it better to check this out.
Jun 5 2017
So maybe we should change floating point undefs into NaNs and let everything else unfold?
- My best guess is that it is an oversight, but there could be a reason I'm not aware of.
- Yes, because of legalization.
- The problem I intended to solve was the (fadd constant, undef) case, where the operands get flipped again and again. fsub and fdiv/frem are not commutative ops, so that problem doesn't occur with them.
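To illustrate the flipping problem described above, here is a minimal toy sketch. It does not use any LLVM API: `Kind`, `FAdd`, and the two canonicalization rules are hypothetical stand-ins, and the assumption that the flip comes from two competing "move X to the right" rules is mine, not something confirmed by the patch. It only shows how a commutative node can ping-pong until a budget runs out.

```cpp
#include <cassert>
#include <utility>

// Toy operand kinds. These are NOT LLVM types, only labels for this sketch.
enum class Kind { Value, Constant, Undef };

// A toy commutative node with two operands.
struct FAdd {
  Kind lhs;
  Kind rhs;
};

// Hypothetical rule A: canonicalize a constant to the right-hand side.
bool moveConstantRight(FAdd &n) {
  if (n.lhs == Kind::Constant && n.rhs != Kind::Constant) {
    std::swap(n.lhs, n.rhs);
    return true;
  }
  return false;
}

// Hypothetical rule B: canonicalize undef to the right-hand side.
bool moveUndefRight(FAdd &n) {
  if (n.lhs == Kind::Undef && n.rhs != Kind::Undef) {
    std::swap(n.lhs, n.rhs);
    return true;
  }
  return false;
}

// Apply the rules until neither fires or the rewrite budget is exhausted.
// Returns the number of rewrites performed.
int runCombines(FAdd n, int budget) {
  assert(budget >= 0);
  int rewrites = 0;
  while (rewrites < budget) {
    if (!moveConstantRight(n) && !moveUndefRight(n))
      break; // Fixed point reached: no rule applies anymore.
    ++rewrites;
  }
  return rewrites;
}
```

In this toy model, `runCombines(FAdd{Kind::Constant, Kind::Undef}, 10)` uses the whole budget because each rule undoes the other, while `FAdd{Kind::Constant, Kind::Value}` settles after a single swap. A non-commutative node would never be swapped at all, which matches the remark about fsub and fdiv/frem.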
Jun 4 2017
Jun 3 2017
Jun 2 2017
Do it in all cases.
make comment clearer.
Jun 1 2017
@inouehrs That wouldn't be the same, as this bails out when no more combines are found.
@davide It's more like 3% as far as I can tell. The sad truth here, looking into it, is that there are a lot of combines that do and undo themselves, and most of the perf hit comes from there. These transforms are the very reason why I limited the number of iterations to begin with.
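For readers following along, the difference between running a fixed number of rounds and bailing out at a fixed point can be sketched as below. This is a toy model, not the DAG combiner: `combineOnce`, `runFixedRounds`, and `runToFixedPoint` are hypothetical names, and the "combine" is just halving even integers. The `maxRounds` cap stands in for the iteration limit mentioned above.

```cpp
#include <cassert>
#include <vector>

// One sweep of a toy "combine": halve every non-zero even entry once.
// Returns true if anything changed during the sweep.
bool combineOnce(std::vector<int> &nodes) {
  bool changed = false;
  for (int &n : nodes) {
    if (n != 0 && n % 2 == 0) {
      n /= 2;
      changed = true;
    }
  }
  return changed;
}

// Fixed count: always performs exactly `rounds` sweeps, useful or not.
int runFixedRounds(std::vector<int> &nodes, int rounds) {
  for (int i = 0; i < rounds; ++i)
    combineOnce(nodes);
  return rounds;
}

// Fixed point: stops after the first sweep that changes nothing,
// with `maxRounds` as a safety cap against transforms that never settle.
int runToFixedPoint(std::vector<int> &nodes, int maxRounds) {
  assert(maxRounds >= 0);
  int sweeps = 0;
  while (sweeps < maxRounds) {
    ++sweeps;
    if (!combineOnce(nodes))
      break; // Nothing changed: bail out early.
  }
  return sweeps;
}
```

On the input `{8, 3, 12}` the fixed-point driver does three productive sweeps plus one that detects nothing is left, while the fixed-count driver keeps sweeping for as many rounds as it was given. The cap only matters when transforms undo each other and a fixed point is never reached.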
Rebase, fix merge conflicts.
Remove setcc change in combineX86ADD.
This is not relevant anymore.
This is not relevant anymore.
May 31 2017
May 30 2017
There is an assert within getContainedType so I'm not sure how that's different.
There are also facilities to do it for pointers and the like, and I assume you need to check what kind of type you are iterating on, because otherwise you have no way to know whether the index is valid, so I'm not sure what the use case is here. The C++ code seems to use that to provide an iterator interface, but there are no iterators in C.
What's the difference between this and LLVMStructGetTypeAtIndex?
@RKSimon I don't think it fits the scope of this diff. Plus I'm not that familiar with these backends, so it'll take time and gate other work.
May 29 2017
Check result count.
May 28 2017
Check the kind of bool being generated as carry.
So on the full clang bc, post optimization:
Alright, so I ended up being able to create an LTO build of clang. I'm not sure how to get the bc file to do the benchmarking.
May 27 2017
I'm getting a bunch of
May 26 2017
Use getOperationAction, as isOperationLegalOrCustom is breaking the AMDGPU backend for some reason. Remove formatting change in wide-integer-cmp.ll.
I don't have a specific plan, except that it needs to be done and I've got to figure out how to do it :)
Also remove SETCCE.
I usually do not work with clang. Do you have instructions I can follow to get that bc file?
@RKSimon Most of these cases aren't caused by nodes not being added to the worklist, but by patterns that are somewhat deep, such as anything depending on KnownBits. Consider the following DAG:
Improve checks in constant_sextload_v8i16_to_v8i32.
I'd rather remove it as part of D33390 than this one. Just in case something goes wrong, it'll be easier to revert.
Do not match sext and any_ext as this is invalid.
May 25 2017
OK I was able to dig more. Something is screwed up with my test case. This is indeed not doing the right thing with the carry.
May 24 2017
I have no idea what clang is doing there. It seems like the intrinsics do not map directly to uaddo/usubo. See for yourself the generated IR (from clang 3.8, which is what I have available ATM):
May 22 2017
May 21 2017
This kicks in for fold-pcmpeqd-2.ll. Looking at the assembly, things look good, but I'm not really sure what this test is testing for, so if someone familiar with it could advise on what to do, that'd be great. @chandlerc, @dblaikie you worked on that, can you advise?