lsaba (Lama)
User

Projects

User does not belong to any projects.

User Details

User Since
Aug 3 2016, 5:39 AM (59 w, 4 d)

Recent Activity

Wed, Sep 13

lsaba accepted D35014: [X86] PR32755 : Improvement in CodeGen instruction selection for LEAs..
Wed, Sep 13, 7:13 AM

Wed, Sep 6

lsaba added a comment to D35014: [X86] PR32755 : Improvement in CodeGen instruction selection for LEAs..

3-Ops LEA are costly starting target SandyBridge , is there a limitation in the code for the targets this transformation works on? If not I think there should be.
you can check the Slow3OpsLEA feature for the full list of targets.

Wed, Sep 6, 12:13 AM

Mon, Sep 4

lsaba added a comment to D35014: [X86] PR32755 : Improvement in CodeGen instruction selection for LEAs..

@lamas, @reviewers, comments have been taken care. Let me know if anything else.

Mon, Sep 4, 8:19 AM
lsaba added a comment to D35014: [X86] PR32755 : Improvement in CodeGen instruction selection for LEAs..

@lamas, @reviewers, comments have been taken care. Let me know if anything else.

Mon, Sep 4, 12:24 AM

Thu, Aug 31

lsaba added a comment to D37289: [X86] Speculatively load operands of select instruction.

I didn't look at the implementation, but why is it safe to speculate loads in these tests? I can create an example where one of the pointers in the select is unmapped, so speculating that load will crash in the general case.

The implementation handles a specific case where both operands of the select are GEPs into elements of the same struct, correct me if i'm wrong but this should be safe

I don't see how being elements of one struct changes anything. Just because one pointer is dereferenceable does not make the other dereferenceable? You would need 'dereferenceable' metadata or some other means to know that either load is safe to hoist ahead of the select.

You're proposing this transform as an x86-specific pass, so maybe I'm missing something. Is there some feature of x86 that makes speculating the load safe? I'm guessing no because I tested the example I was thinking of on x86, and this transform crashes there.

This is the transformation I am interested in doing:

struct S {

int a;
int b;

}

from:

foo (S* s, int x) {

 int c;
 if (x) 
   c = s->a;
 else
  c = s->b;
}

to:
foo (S* s, int x) {

 int c1= s->a;
 int c2 = a->b;
 c = x? c1 : c2;
}

I am assuming this transformation is legal in C for the given struct with the given types since the entire struct is allocated, is my assumption wrong? the idea is to limit the pass to these cases ( I am uploading a patch to limit the current implementation more)

Yes, I think your assumption is wrong. It's contrived, but consider this possibility based on the current version of the patch:

#include <stdlib.h>
typedef struct S {
  char padding[4088]; // not necessary, but might it make it easier for GuardMalloc or valgrind to see the bug
  struct S *p1;
  struct S *p2;
} S;

S *Sptr;

void init() {
  Sptr = malloc(4096); // sorry, p2 - no space for you
  Sptr->p1 = "1239"; // crazy, but just to prove a point
}

When input to a wrapper test similar to the test in this patch, this is safe without this transform, but crashes after. You need something to tell you the loads are dereferenceable (and whatever that information is will not be x86-specific, so there's no need to make an x86 pass for it).

Thu, Aug 31, 7:40 AM
lsaba updated the diff for D37289: [X86] Speculatively load operands of select instruction.

addressing aaboud's comments + limiting the opt. to non aggregate gep accesses

Thu, Aug 31, 7:11 AM
lsaba added a comment to D37289: [X86] Speculatively load operands of select instruction.

I didn't look at the implementation, but why is it safe to speculate loads in these tests? I can create an example where one of the pointers in the select is unmapped, so speculating that load will crash in the general case.

The implementation handles a specific case where both operands of the select are GEPs into elements of the same struct, correct me if i'm wrong but this should be safe

I don't see how being elements of one struct changes anything. Just because one pointer is dereferenceable does not make the other dereferenceable? You would need 'dereferenceable' metadata or some other means to know that either load is safe to hoist ahead of the select.

You're proposing this transform as an x86-specific pass, so maybe I'm missing something. Is there some feature of x86 that makes speculating the load safe? I'm guessing no because I tested the example I was thinking of on x86, and this transform crashes there.

Thu, Aug 31, 7:00 AM
lsaba added a comment to D37289: [X86] Speculatively load operands of select instruction.

I didn't look at the implementation, but why is it safe to speculate loads in these tests? I can create an example where one of the pointers in the select is unmapped, so speculating that load will crash in the general case.

Thu, Aug 31, 12:03 AM

Wed, Aug 30

lsaba created D37289: [X86] Speculatively load operands of select instruction.
Wed, Aug 30, 1:26 AM
lsaba abandoned D37288: [X86] Speculatively load operands of select instruction.
Wed, Aug 30, 1:19 AM
lsaba created D37288: [X86] Speculatively load operands of select instruction.
Wed, Aug 30, 12:46 AM

Aug 17 2017

lsaba added a comment to D35014: [X86] PR32755 : Improvement in CodeGen instruction selection for LEAs..

It seems like there are still correctness issues in the patch, I ran the llvm-test-suite and got a couple of runfails :
multisource_applications_alac_encode_alacconvert_encode
multisource_applications_jm_lencod_lencod

Aug 17 2017, 12:05 AM

Aug 9 2017

lsaba added a comment to D35014: [X86] PR32755 : Improvement in CodeGen instruction selection for LEAs..

@ reviewers , kindly let me know if there are any more comments apart from last comment from lsaba.
Thanks.

Hi,
I ran the patch on several benchmarks to check performance, overall the changes look good, but there is a regression in one of the benchmarks (EEMBC/coremark-pro) caused by creating an undesired lea instruction instead of the previously created add instruction, I am working on creating a simple reproducer for the problem and would appreciate your patience.

Thanks

Aug 9 2017, 8:20 AM

Aug 8 2017

lsaba added a comment to D35014: [X86] PR32755 : Improvement in CodeGen instruction selection for LEAs..

@ reviewers , kindly let me know if there are any more comments apart from last comment from lsaba.
Thanks.

Aug 8 2017, 9:57 AM

Aug 7 2017

lsaba added inline comments to D35014: [X86] PR32755 : Improvement in CodeGen instruction selection for LEAs..
Aug 7 2017, 5:19 AM

Aug 6 2017

lsaba added a comment to D35014: [X86] PR32755 : Improvement in CodeGen instruction selection for LEAs..

Pinging reviewers. Kindly pour your comments.
Thanks

Aug 6 2017, 1:33 AM

Jul 31 2017

lsaba added inline comments to D35014: [X86] PR32755 : Improvement in CodeGen instruction selection for LEAs..
Jul 31 2017, 4:50 AM

Jul 30 2017

lsaba committed rL309521: NFC: spell correction..
NFC: spell correction.
Jul 30 2017, 1:13 PM
lsaba closed D35885: NFC: spell correction. by committing rL309521: NFC: spell correction..
Jul 30 2017, 1:13 PM
lsaba accepted D36048: [X86]: Extending a test cases for LEA factorization..
Jul 30 2017, 6:37 AM

Jul 27 2017

lsaba added inline comments to D35014: [X86] PR32755 : Improvement in CodeGen instruction selection for LEAs..
Jul 27 2017, 12:49 AM

Jul 26 2017

lsaba added inline comments to D35014: [X86] PR32755 : Improvement in CodeGen instruction selection for LEAs..
Jul 26 2017, 4:22 AM

Jul 24 2017

lsaba added inline comments to D35014: [X86] PR32755 : Improvement in CodeGen instruction selection for LEAs..
Jul 24 2017, 5:53 AM

Jul 19 2017

lsaba added inline comments to D35014: [X86] PR32755 : Improvement in CodeGen instruction selection for LEAs..
Jul 19 2017, 8:04 AM

Jul 11 2017

lsaba added inline comments to D35014: [X86] PR32755 : Improvement in CodeGen instruction selection for LEAs..
Jul 11 2017, 5:02 AM

Jul 10 2017

lsaba added inline comments to D35014: [X86] PR32755 : Improvement in CodeGen instruction selection for LEAs..
Jul 10 2017, 6:20 AM

Jul 6 2017

lsaba added inline comments to D35014: [X86] PR32755 : Improvement in CodeGen instruction selection for LEAs..
Jul 6 2017, 2:06 AM
lsaba added a comment to D35014: [X86] PR32755 : Improvement in CodeGen instruction selection for LEAs..

Please add test cases for the optimization added in OptimizeLEAPass

Jul 6 2017, 1:42 AM
lsaba added inline comments to D35014: [X86] PR32755 : Improvement in CodeGen instruction selection for LEAs..
Jul 6 2017, 1:26 AM

May 18 2017

lsaba committed rL303333: [X86] Replace slow LEA instructions in X86.
[X86] Replace slow LEA instructions in X86
May 18 2017, 1:25 AM

May 16 2017

lsaba committed rL303183: [X86] Replace slow LEA instructions in X86.
[X86] Replace slow LEA instructions in X86
May 16 2017, 9:15 AM
lsaba closed D32277: Replace slow LEA instructions in X86 by committing rL303183: [X86] Replace slow LEA instructions in X86.
May 16 2017, 9:15 AM

May 8 2017

lsaba added inline comments to D32277: Replace slow LEA instructions in X86.
May 8 2017, 6:10 AM
lsaba updated the diff for D32277: Replace slow LEA instructions in X86.

Updated with Zvi's comments

May 8 2017, 4:41 AM
lsaba updated the diff for D32277: Replace slow LEA instructions in X86.
May 8 2017, 1:23 AM

May 7 2017

lsaba updated the diff for D32277: Replace slow LEA instructions in X86.

remove redundant variable

May 7 2017, 7:12 AM

May 3 2017

lsaba added inline comments to D32352: Go to eleven.
May 3 2017, 6:02 AM
lsaba updated the diff for D32277: Replace slow LEA instructions in X86.
May 3 2017, 5:44 AM

Apr 27 2017

lsaba added inline comments to D32352: Go to eleven.
Apr 27 2017, 1:53 AM
lsaba added inline comments to D32352: Go to eleven.
Apr 27 2017, 1:41 AM
lsaba added a comment to D32277: Replace slow LEA instructions in X86.

I have not got the conclusion: R13 is a bad register or not?
According to what I see in the comment it is not but patch still consider it is?
Could you please clarify this?

Apr 27 2017, 12:18 AM

Apr 26 2017

lsaba added a comment to D32277: Replace slow LEA instructions in X86.

D32352 is looking at more aggressive conversion of IMUL to multiple LEA instructions.

Thanks for notifying, the patch does not contain any 3-Operand LEAs as far as I can see, the only case we need to be careful with is for 2-Operand LEA with RBP/R13/EBP as a base register, since this is determined only after RA, I am thinking it's better to let my patch fix those cases rather than preventing that patch from running on the problematic targets

Sorry for being pedantic about the naming, but for AMD architectures the 'slowlea' cases (whether it uses the ALU or AGU pipe) include scale != 1 (even for 2-ops), but it doesn't seem to be noticeably affected by RBP/R13/EBP. Hence my interest in PR32326 to try and make it easier to discriminate.

will separating this feature form the existing slowLEA feature which is used in SLM (and giving it another name) make it less confusing?

Yes please, we need to discriminate between different slow LEA behaviours and separate features is probably the most straightforward way to do it.

Apr 26 2017, 7:07 AM
lsaba added a comment to D32277: Replace slow LEA instructions in X86.

Could you please also add some negative tests when you cannot do such transformation?
For example, involving eflags.

Apr 26 2017, 7:05 AM
lsaba updated the diff for D32277: Replace slow LEA instructions in X86.

Updated patch with ZVi's comments

Apr 26 2017, 7:05 AM

Apr 25 2017

lsaba added a comment to D32277: Replace slow LEA instructions in X86.

D32352 is looking at more aggressive conversion of IMUL to multiple LEA instructions.

Thanks for notifying, the patch does not contain any 3-Operand LEAs as far as I can see, the only case we need to be careful with is for 2-Operand LEA with RBP/R13/EBP as a base register, since this is determined only after RA, I am thinking it's better to let my patch fix those cases rather than preventing that patch from running on the problematic targets

Sorry for being pedantic about the naming, but for AMD architectures the 'slowlea' cases (whether it uses the ALU or AGU pipe) include scale != 1 (even for 2-ops), but it doesn't seem to be noticeably affected by RBP/R13/EBP. Hence my interest in PR32326 to try and make it easier to discriminate.

Apr 25 2017, 4:07 AM
lsaba added a comment to D32277: Replace slow LEA instructions in X86.

D32352 is looking at more aggressive conversion of IMUL to multiple LEA instructions.

Apr 25 2017, 1:53 AM
lsaba added a comment to D32277: Replace slow LEA instructions in X86.

@skatkov Well, the assemblies also included leal with r13 register as well. :)

102     leal    8(%r14), %eax

->

102     leal    8(%r13), %eax
The performance gap may be due to the instruction, but I'm not sure. (actually, converting `r14` to `r13` increased performance in this case, but I have no idea what's happening inside CPU..)
Apr 25 2017, 1:44 AM
lsaba added a comment to D32352: Go to eleven.
In D32352#736445, @zvi wrote:

In the meantime we might be better off only using multiple LEA calls when !Subtarget->slowLEA() ? In which case we need to add tests for silvermont

At least the tests in this patch don't show that any 'slow' LEA is being generated.

As a reminder, here's what the Intel Optimization Manual says about 'slow' LEA's:

For LEA instructions with three source operands and some specific situations, instruction latency has
increased to 3 cycles, and must dispatch via port 1:
— LEA that has all three source operands: base, index, and offset.
— LEA that uses base and index registers where the base is EBP, RBP, or R13.
— LEA that uses RIP relative addressing mode.
— LEA that uses 16-bit addressing mode.

It would be interesting to see tests that mimic expressions such as: (x*9+42)*(x*5+2) where it would be temping to select two 3-operand LEAs:

leal    42(%rdi,%rdi,8), %ecx
leal    2(%rdi,%rdi,4), %eax
imull   %ecx, %eax
Apr 25 2017, 1:00 AM

Apr 20 2017

lsaba added a reviewer for D32277: Replace slow LEA instructions in X86: RKSimon.
Apr 20 2017, 1:56 AM
lsaba created D32277: Replace slow LEA instructions in X86.
Apr 20 2017, 1:45 AM

Sep 27 2016

lsaba updated subscribers of D24951: Import/adapt the SLEEF vector math-function library as an LLVM runtime.
Sep 27 2016, 4:15 AM

Aug 10 2016

lsaba committed rL278209: [X86][AVX512] lower __mm512_andnot_ps/__mm512_andnot_pd to IR.
[X86][AVX512] lower __mm512_andnot_ps/__mm512_andnot_pd to IR
Aug 10 2016, 3:42 AM
lsaba closed D23262: lower __mm512_andnot_ps/__mm512_andnot_pd to IR by committing rL278209: [X86][AVX512] lower __mm512_andnot_ps/__mm512_andnot_pd to IR.
Aug 10 2016, 3:42 AM

Aug 9 2016

lsaba updated subscribers of D23262: lower __mm512_andnot_ps/__mm512_andnot_pd to IR.
Aug 9 2016, 12:20 AM

Aug 8 2016

lsaba added reviewers for D23262: lower __mm512_andnot_ps/__mm512_andnot_pd to IR: delena, igorb.
Aug 8 2016, 6:40 AM
lsaba retitled D23262: lower __mm512_andnot_ps/__mm512_andnot_pd to IR from to lower __mm512_andnot_ps/__mm512_andnot_pd to IR.
Aug 8 2016, 6:24 AM

Aug 3 2016

lsaba updated subscribers of D23108: Implemented 132/213/231 forms selection for X86-FMA3-AVX512 opcodes..
Aug 3 2016, 5:41 AM