- User Since
- Jun 8 2016, 12:50 PM (76 w, 38 m)
LGTM with a minor comment. Thanks!
I think we will need a subtarget feature indicating whether the false-dependency on these instructions happens. For starters, does this happen in AMD processors?
Mon, Nov 20
One option is to run llc with -O0 which will disable passes which use InstructionSimplify, but this will require changing the CHECK's to expect different generated code. If the purpose of the test is to not crash in the Verifier, are the CHECK's needed?
Sun, Nov 19
With this patch i am seeing failures in test/CodeGen/AMDGPU/indirect-addressing-si.ll . This test include extractelement instructions with undef indices, which is what this patch targets. I would appreciate your help with advice on how we can modify these tests so that they can be used safely. Please bear in mind that i have no knowledge of the AMDGPU backend.
Rebase on ToT. NFC in this revision.
Address the last of Craig's comments:
- Thanks, @lsaba, for porting the pass to the new PassManager.
- Removed shrinkage of vector types until we sort out if it is generally allowed to shrink element types of vector operations.
- Some minor fixes to comments.
Thu, Nov 16
Address some of Craig's recent comments.
Sun, Nov 12
Thu, Nov 9
Commandeering this patch while Amjad is away on a few weeks of vacation.
Also sending a friendly ping to the reviewers. AFAIK, all comments were addressed as of the latest revision of this patch. Please let me know if i missed anything. Thanks.
Tue, Nov 7
Since the final commit of this patch, rL317483, the AVX2 buildbot is broken: http://lab.llvm.org:8011/builders/clang-cmake-x86_64-avx2-linux/builds/1402
Mon, Nov 6
Last minute changes in the spirit of Simon's suggestion
Thu, Nov 2
Follow Simon's suggestion to pull the oth index scan into the loop.
Fix typo + style defects.
Thanks, Amjad! This patch LGTM, but i think it would be best to wait for an LGTM from one of the assigned reviewers.
Some more minor comments
Wed, Nov 1
Mon, Oct 30
Add support for floating-point types.
I know you decided to move this to a new pass, but here's a couple of more comments that will be relevant.
LGTM, with a minor concern that with this change FastISel is a bit slower because the bail-out condition was moved to a later point?
Sun, Oct 29
Thu, Oct 26
Amjad, some questions about where do we want this work to evolve to.
Tue, Oct 24
Mon, Oct 23
Oct 20 2017
Oct 19 2017
Oct 18 2017
Oct 17 2017
The changes in WidenVSELECTAndMask and the X86 test LGTM.
Oct 16 2017
Oct 10 2017
Oct 9 2017
Oct 1 2017
LGTM, and yes please add the assert
Sep 28 2017
Sep 27 2017
The title should have an X86 label, and can you please add a cover message in the summary?
LGTM with minor comments
Please add the avx512vl feature to the AVX512 test (in the parent patch) and rebase this patch. It will drastically improve some of the cases :).
Sep 25 2017
Please remember to create the diff with -U
Sep 13 2017
Can you please move the NFC to the Title:
[X86][Skylake][KNL] Updating code gen regression test to use the KNL and SKYLAKE prefixes. NFC.
Sep 12 2017
LGTM, and as you mentioned in the cover message, there is more room for improvement.
Sep 11 2017
If this patch is still in review please rebase it.
Please add some details about the combine in the Summary. It's better to put a description in addition to the links to make it easier to read the commit log.
Maybe something like:
umax(a,b) - b -> subus(a,b)
a - umin(a,b) -> subus(a,b)
Looks ready apart from the missing AVX512 testing Dave suggested.
Sep 7 2017
Intuitively, it seems to me that choosing a minimum threshold, as suggested in note 2, is a better option.
Another concern for store-merging in general: are we more susceptible to losing store-to-load forwarding? I know that Intel pre-Nehalem processors had some limitations that were later improved. Sorry i can't recall the full details from the top of my head. Will look later at the Optimization Manual for the info.