nemanjai (Nemanja Ivanovic)
User

Projects

User does not belong to any projects.

User Details

User Since
Jan 23 2015, 9:38 AM (147 w, 5 d)

Recent Activity

Today

nemanjai accepted D39078: [PowerPC] Relax the checking on AND/AND8 in isSignOrZeroExtended.

LGTM.
I think this code may be off by default at this time due to issues on some external benchmarks. If that's the case, it should be fine to commit this, but you'll probably need the flag in the test case to enable the transformation that uses this.

Wed, Nov 22, 6:47 AM
nemanjai accepted D39736: [PowerPC] Remove redundant TOC saves.
Wed, Nov 22, 6:27 AM
nemanjai added a comment to D39736: [PowerPC] Remove redundant TOC saves.

LGTM.
The remaining comments are essentially just minor nits.

Wed, Nov 22, 6:26 AM
nemanjai accepted D39858: [PowerPC][msan] Update msan to handle changed memory layouts in newer kernels.

LGTM.
I am far from an expert on this, but the modification seems to make perfect sense. Perhaps @eugenis can have a look as well, but if not, I'd say that we should go ahead with this since:

  • It seems to make perfect sense (and shouldn't affect 46 and 45 bit VMA kernels)
  • You've tested it on multiple kernels
  • It's been up for review for around two weeks
  • This is blocking us from running on 47 bit VMA kernels
Wed, Nov 22, 6:05 AM
nemanjai accepted D38413: [PowerPC] Add handling for ColdCC calling convention and a pass to mark candidates with coldcc attribute.

Other than the tiny issue with an unnecessary cast, LGTM.

Wed, Nov 22, 5:46 AM
nemanjai created D40348: [PowerPC] Follow-up to r318436 to get the missed CSE opportunities.
Wed, Nov 22, 5:03 AM

Tue, Nov 14

nemanjai added a comment to D39942: [PPC] Add additional fnmadd patterns..

@nemanjai I cannot find tests for instructions XVNMSUBASP ... in // Additional fnmsub patterns: -a*c + b == -(a*c - b). It is also unclear to me what roles these FMA test files take and in which file I should add the test.

I wonder if they somehow got added without adding test cases...

% ls test/CodeGen/PowerPC/*fma*                                                                   
test/CodeGen/PowerPC/fma-aggr-FMF.ll               
test/CodeGen/PowerPC/fma-assoc.ll                  
test/CodeGen/PowerPC/fma-ext.ll                    
test/CodeGen/PowerPC/fma.ll

I would imagine the test case above is where you'd add general FMA tests. The rest of them deal with conditions under which it is safe to fuse, when to mutate the "A-Forms" to the "M-Forms" etc.

test/CodeGen/PowerPC/fma-mutate-duplicate-vreg.ll
test/CodeGen/PowerPC/fma-mutate.ll
test/CodeGen/PowerPC/fma-mutate-register-constraint.ll
test/CodeGen/PowerPC/fmaxnum.ll
test/CodeGen/PowerPC/vsx-fma-m.ll
test/CodeGen/PowerPC/vsx-fma-mutate-trivial-copy.ll
test/CodeGen/PowerPC/vsx-fma-mutate-undef.ll
test/CodeGen/PowerPC/vsx-fma-sp.ll

To be honest, it seems to me that what you might be after here is a DAG Combine to put the FMA into a form you can easily match. Perhaps the negated forms should just match a canonical form such as (fneg (fma %a, %b, %c)) and you'd convert things like (fma (fneg %a), %b, (fneg %c)) to that canonical form. That way you have a single pattern in the .td file and you don't have to introduce ambiguities there.

Power ISA 2.07 xsnm{add,sub}asp are defined after the "// Additional fnmsub patterns: -a*c + b == -(a*c - b)", and if I put the associated patterns after that, the patterns would not be used. (I need to learn more about the instruction selection mechanism)

Basically, the instruction selector matches the "best" input pattern to the "best" output pattern (as long as all predicates are satisfied). For input, best means that it consumes more nodes in the input DAG (i.e. has higher complexity). For output, best means that it produces fewest output nodes (i.e. smaller code size). However, when there's an ambiguity (both input and output DAGs for two patterns are equivalent and predicates satisfied), the selector will pick the one that it happened to emit earlier in the table. That view is probably not 100% accurate, but has served me well in understanding the selector and how to write .td files.

Tue, Nov 14, 11:53 PM
nemanjai added a comment to D39736: [PowerPC] Remove redundant TOC saves.

Also, for the test cases that have to keep multiple TOC saves, can you just add a comment as to why it isn't safe to remove one (i.e. neither dominates the other, etc.).

Tue, Nov 14, 4:19 PM
nemanjai added a comment to D39805: [Power9] Set MicroOpBufferSize for Power 9.

But before committing, please wait for at least another approval from either @echristo or @hfinkel.

Tue, Nov 14, 3:53 PM
nemanjai accepted D39805: [Power9] Set MicroOpBufferSize for Power 9.

LGTM. For now anyway.

Tue, Nov 14, 3:41 PM
nemanjai added a comment to D39875: [PPC] Change i32 constant in store instruction to i64.

There are two SelectionDAG::getTruncStore() functions, but neither of them can accept the general (base + offset) addressing mode, so if we use getTruncStore, we still need UpdateNodeOperands to add offset if there is one.

Fair enough, it's unfortunate that there isn't an overload that can do the job. And providing another overload is probably even more intrusive than adding a mutator function like this to StoreSDNode. Can you please add a comment explaining that in the code since this looks different from other combines and others may wonder why we didn't just use DAG.getTruncStore()?

Tue, Nov 14, 3:38 PM
nemanjai accepted D39875: [PPC] Change i32 constant in store instruction to i64.

Forgot to select "Accept"...

Tue, Nov 14, 3:38 PM
nemanjai added a comment to D39942: [PPC] Add additional fnmadd patterns..

Is there meant to be more to this patch? Please provide the test cases that will produce each of the added patterns.
I think it would also be good to add a functional test case to test-suite that will use all the FMA patterns with key values to verify the expected behaviour. But this of course belongs in a separate patch.

Tue, Nov 14, 12:09 PM
nemanjai added a comment to D39946: Relax unaligned access assertion when type is byte aligned.

A couple of general comments...
I'm not a fan of modifying the ABI of DataLayout for this. Seems that we can just compare the ABI alignment against 1 at the call site. But I also wonder if it would suffice to simply check that the load instruction's alignment is less than the ABI alignment for that type
if (I.getAlignment() < DL.getABITypeAlignment(Ty))

Tue, Nov 14, 11:56 AM

Mon, Nov 13

nemanjai added a comment to D39777: [PowerPC] implement mayBeEmittedAsTailCall for PPC.

Just a few other typos I noticed but forgot to click submit. Wanted to make sure I did so before this gets committed. Feel free to fix these on the commit.

Mon, Nov 13, 8:37 PM

Fri, Nov 10

nemanjai added a comment to D39875: [PPC] Change i32 constant in store instruction to i64.

Thanks for taking care of this, I think it's quite useful and it should reduce the number of load immediate instructions and registers used. However, do we really need to expose a new API in StoreSDNode? Can we not just create a new truncating store using SelectionDAG::getTruncStore()? That would make this look a lot more like the rest of the DAG combines.

Fri, Nov 10, 3:27 AM

Thu, Nov 9

nemanjai accepted D39860: [PowerPC] Simplify a Swap if it feeds a Splat.

This looks good, but of course, lets run this through all the testing before committing. Also, I think that for the test cases you added, you don't expect to see any swaps. You should be able to add -implicit-check-not to the RUN lines.
With these minor changes, LGTM.

Thu, Nov 9, 2:19 PM
nemanjai added inline comments to D39785: [PowerPC] Remove redundant register copies in late peephole.
Thu, Nov 9, 6:04 AM

Wed, Nov 8

nemanjai accepted D38128: Handle COPYs of physregs better (regalloc hints).

The PowerPC CodeGen test changes are fine. I've also confirmed that this patch reduces the total number of register move instructions (which implement the copies). So that LGTM.

Wed, Nov 8, 10:55 AM
nemanjai added a dependent revision for D31852: [PowerPC] Convert reg/reg instructions fed by constants to reg/imm instructions (pre and post RA): D39785: [PowerPC] Remove redundant register copies in late peephole.
Wed, Nov 8, 2:10 AM
nemanjai added a dependency for D39785: [PowerPC] Remove redundant register copies in late peephole: D31852: [PowerPC] Convert reg/reg instructions fed by constants to reg/imm instructions (pre and post RA).
Wed, Nov 8, 2:10 AM
nemanjai added a reviewer for D39785: [PowerPC] Remove redundant register copies in late peephole: power-llvm-team.
Wed, Nov 8, 2:03 AM
nemanjai created D39785: [PowerPC] Remove redundant register copies in late peephole.
Wed, Nov 8, 1:57 AM

Tue, Nov 7

nemanjai added inline comments to D39736: [PowerPC] Remove redundant TOC saves.
Tue, Nov 7, 11:22 AM
nemanjai added a comment to D39536: [PowerPC] Eliminate redundant register copys after register allocation.

If we're adding a new peephole, it likely won't only be used for this one purpose. A more general name might be in order.

I changed the pass name. Do you have any suggestion on an alternative name?

If you agree that this would likely be used for other purposes, why not just name it PPCPostRAPeephole? Otherwise if you think this will likely only be used for this purpose, I suppose the name is fine.

Tue, Nov 7, 6:20 AM
nemanjai added a comment to D31852: [PowerPC] Convert reg/reg instructions fed by constants to reg/imm instructions (pre and post RA).

Seems like a good example. Can you please document this in a comment in the pass as an example of why such a thing might be needed. I find this convincing.

I'll certainly add this to the comments.

Tue, Nov 7, 5:51 AM

Mon, Nov 6

nemanjai added a comment to D38128: Handle COPYs of physregs better (regalloc hints).

Just looking at the number and placement of reg-reg moves for PowerPC seems fine. It doesn't seem like there are regressions but no huge reduction in the number of copies. However, since the diff is without context, it's hard to tell exactly what is going on in the test cases. I'll apply this patch to ToT and have a look at how things change before signing off on the PPC changes.

Mon, Nov 6, 1:11 PM
nemanjai added a comment to D39536: [PowerPC] Eliminate redundant register copys after register allocation.

I think that the main loop definitely needs to be re-written and made more readable and easier to follow. To me, it seems like it does too much. It spans nearly 250 lines and does a whole bunch of analyses and transformations.
Could you extract functionality into easy-to-read functions that perform individual things so that the flow is easier to follow?

Mon, Nov 6, 12:37 AM
nemanjai added a reviewer for D39536: [PowerPC] Eliminate redundant register copys after register allocation: gberry.
Mon, Nov 6, 12:34 AM

Sun, Nov 5

nemanjai added inline comments to D38413: [PowerPC] Add handling for ColdCC calling convention and a pass to mark candidates with coldcc attribute.
Sun, Nov 5, 9:21 PM

Thu, Nov 2

nemanjai accepted D39510: [PPC] Use xxbrd to speed up bswap64.

This is a great idea considering direct moves are so fast on Power9. I guess we just didn't think of this use when we implemented the vector byte reversal. Thanks for doing this. Other than the rather obvious change to generate the faster mfvsrd instruction, this LGTM.

Thu, Nov 2, 12:31 AM

Wed, Nov 1

nemanjai added inline comments to D38486: [PPC] Implement the heuristic to choose between a X-Form VSX ld/st vs a X-Form FP ld/st..
Wed, Nov 1, 12:58 PM
nemanjai accepted D38486: [PPC] Implement the heuristic to choose between a X-Form VSX ld/st vs a X-Form FP ld/st..

The remaining comments I have are minor nits. Please feel free to fix them on the commit. Otherwise, LGTM.

Wed, Nov 1, 9:20 AM
nemanjai added a comment to D31852: [PowerPC] Convert reg/reg instructions fed by constants to reg/imm instructions (pre and post RA).

I don't think that we can catch all of these early enough not to need something like this. I think that fundamentally, we're limited in our ability to select the right instruction when values come from other blocks. Most of the early cases are likely due to SDAG being bb-local.

One definite source of these opportunities is something like this:

int a;
void fn1() { __atomic_fetch_add(&a, 0, 4); }

The atomic pseudo instructions are as general as possible so don't have immediate forms. Then of course, when we expand the pseudo, we end up with something like li 5, 4 -> add 3, 4, 5. This pass will clean it up. Of course, we can add all the immediate form atomic pseudo's, but I'm just afraid we'll keep finding more such cases.

Wed, Nov 1, 5:03 AM

Tue, Oct 31

nemanjai added a comment to D31852: [PowerPC] Convert reg/reg instructions fed by constants to reg/imm instructions (pre and post RA).

This is really interesting. So we're seeing a number of cases where, after instruction selection, we're doing something that makes operations have constant operands. Do you know what that is? Maybe one thing we could do with this is have it assert, instead of transforming the code, and then reduce the test cases in order to understand why this happens.

Tue, Oct 31, 2:11 AM

Mon, Oct 30

nemanjai updated the diff for D31852: [PowerPC] Convert reg/reg instructions fed by constants to reg/imm instructions (pre and post RA).
Mon, Oct 30, 9:46 AM
nemanjai accepted D38962: [test-suite] Update bitreverse benchmark..

LGTM.

Mon, Oct 30, 7:46 AM

Sat, Oct 28

nemanjai added a comment to D39344: [PowerPC] Add implementation for -msave-toc-indirect option - llvm portion.

I'd like to think that we already have something that would clean this up, but apparently we don't. We should.

I was under the impression that this patch was supposed to be the something that cleans it up. What I thought this would be is a patch that does 2 things:

  1. Save the TOC in the prologue if the option is set and there's an indirect call in the function
  2. Fix up the call lowering sequence to not do the TOC save prior to the call

If it accomplishes the stated goals, it would achieve what you're after, right @hfinkel? It certainly removes the two TOC saves from your test case. Of course, the prologue is just chosen because it is guaranteed to dominate all the indirect calls. Otherwise we may not always be able to do this without some kind of dataflow analysis + code motion + possibly creation of new basic blocks.
For example, where's the right place to save the TOC on something like this:

void foo2(void (*f)(int), int A) {
  if (A > 10)
    f(5);
  f(15);
}
Sat, Oct 28, 3:00 AM

Thu, Oct 26

nemanjai added a comment to D20217: Represent runtime preemption in the IR..

New keyword[s]? Seems it might be a nice addition to also add them to utils/vim/syntax/llvm.vim and friends if appropriate.

Thu, Oct 26, 6:10 AM

Wed, Oct 25

nemanjai accepted D38941: [PowerPC] Use record-form instruction for Less-or-Equal -1 and Greater-or-Equal 1.

As Eric points out, perhaps mention that it would be possible to optimize even if there were multiple uses of the CR register if all the uses fell into one of the four categories, but it would be very unlikely that doing to would be profitable.

Wed, Oct 25, 9:37 AM

Tue, Oct 24

nemanjai added a comment to D39016: Add Percent Symbol In PPC Registers for Linux.

If what we want is to name the registers with the percent sign prefix on all non-AIX, non-Darwin systems, why not just change the definitions in the .td files to include the prefix? That way you don't have to worry about this kind of hacky/leaky code to prepend the symbol to the register names. And on AIX/Darwin, we just strip the full prefix (percent symbol and letter). Considering that it is questionable whether LLVM will ever have full support on AIX and that Darwin support is likely to be pulled soon, this seems like the best choice.

Tue, Oct 24, 4:39 AM

Oct 20 2017

nemanjai updated subscribers of D39016: Add Percent Symbol In PPC Registers for Linux.

A few general comments:

  1. You should add reviewers to this review. At the minimum @hfinkel, @kbarton and @echristo. Maybe someone from FreeBSD if this affects them.
  2. Why do we need to change the default and get the existing behaviour with an option? Why not the other way around?
  3. Is this syntax accepted by all the assemblers (I think LLVM for PPC runs on FreeBSD as well, but I have no idea if FreeBSD returns true for isOSLinux()? We don't want to produce assembly that we can't assemble.
Oct 20 2017, 5:25 AM

Oct 19 2017

nemanjai committed rL316199: Disabling the transformation introduced in r315888.
Disabling the transformation introduced in r315888
Oct 19 2017, 5:37 PM

Oct 18 2017

nemanjai accepted D38820: [CGExprScalar] Add missing types in function GetIntrinsic.

LGTM.

Oct 18 2017, 10:32 AM
nemanjai accepted D38988: [PowerPC] Use helper functions to check sign-/zero-extended value.

And FWIW, I still don't like the names of these helper functions. There isn't really much indication in the name what it is sign/zero extending from/to. Presume I want to do add/remove an instruction depending on whether the input is known to be sign-extended from a halfword to a doubleword, but not if it is only known to be sign-extended from a word to a doubleword, it isn't clear how I would use these functions to determine that.

Is isSignExtendedFromWord better? Other candidates I have are isSignExtendedWord or isSignExtended_32_64. Any suggestions?

Oct 18 2017, 12:58 AM
nemanjai requested changes to D38962: [test-suite] Update bitreverse benchmark..

With the above changes to the test case (and the corresponding changes to the reference output), the problem would have been blatantly obvious from the start and we wouldn't have to jump through hoops to ensure opt isn't just giving us the right answer.

Oct 18 2017, 12:13 AM

Oct 17 2017

nemanjai requested changes to D38486: [PPC] Implement the heuristic to choose between a X-Form VSX ld/st vs a X-Form FP ld/st..

Please add test cases for the following:

  • An actual use of the VSX instructions (i.e. where the register pressure is high enough that an upper VSX register will be allocated - you should be able to accomplish this with inline asm clobbers lists in a small test case)
  • A Power9 test case where we will use an X-Form version (i.e. the offset is large enough not to fit in a displacement field)
Oct 17 2017, 5:55 AM
nemanjai added a comment to D38988: [PowerPC] Use helper functions to check sign-/zero-extended value.

This seems like a perfectly valid thing to do. Please add a test case before this can proceed though.
I would imagine that one of the extra 5000 eliminated compare instructions would make for a good test case.

Oct 17 2017, 5:00 AM

Oct 10 2017

nemanjai removed a dependency for D31852: [PowerPC] Convert reg/reg instructions fed by constants to reg/imm instructions (pre and post RA): D31851: [PowerPC] Eliminate compares - add handling for logical operations without the use of condition registers.
Oct 10 2017, 1:41 PM
nemanjai removed a dependent revision for D31851: [PowerPC] Eliminate compares - add handling for logical operations without the use of condition registers: D31852: [PowerPC] Convert reg/reg instructions fed by constants to reg/imm instructions (pre and post RA).
Oct 10 2017, 1:41 PM
nemanjai updated the diff for D31852: [PowerPC] Convert reg/reg instructions fed by constants to reg/imm instructions (pre and post RA).

This isn't really dependent on the compare elimination patches (although it will find more opportunities once those land).

Oct 10 2017, 1:41 PM
nemanjai added a comment to D22948: Add Percent Symbol Before Named PPC Registers.
In D22948#892973, @gut wrote:

Hi. This change seems interesting, why wasn't it reviewed?

Our team is considering implementing such a thing for POWER8 but then we noticed this patch.

Oct 10 2017, 5:51 AM
nemanjai committed rL315285: Fix for PR34888..
Fix for PR34888.
Oct 10 2017, 1:46 AM

Oct 9 2017

nemanjai added a comment to D38705: [PPC CodeGen] Fix the bitreverse.i64 intrinsic..

Where to put the test?

For projects/test-suite/SingleSource/Benchmarks/Misc/revertBits.c, the incorrect output was:

Sum1 = 807fffff800000, Sum2 = ff80000000000000

With this patch, it is:

Sum1 = 0, Sum2 = feff800000800000

The same as projects/test-suite/SingleSource/Benchmarks/Misc/revertBits.reference_output

Oct 9 2017, 10:56 PM
nemanjai added a comment to D38656: [CGExprScalar] In EmitCompare trunc the result if it has different type as E->getType().

I worked on a similar bug as 31161, and then found this one, it should be same as in comment7.
What is the current status of the work on that bug?

Oct 9 2017, 10:18 PM
nemanjai added a comment to D38498: [CodeGen][ExpandMemcmp][NFC] Allow memcmp to expand to vector loads, part 1..

Does this just not have reviewers because it isn't ready for review yet?

Oct 9 2017, 3:03 PM
nemanjai added a comment to D38705: [PPC CodeGen] Fix the bitreverse.i64 intrinsic..

I'm also adding the author of the original code.

Oct 9 2017, 2:29 PM
nemanjai updated subscribers of D38705: [PPC CodeGen] Fix the bitreverse.i64 intrinsic..
Oct 9 2017, 2:29 PM
nemanjai added a comment to D38705: [PPC CodeGen] Fix the bitreverse.i64 intrinsic..

So the bitreverse code appears to be functionally broken. Test case projects/test-suite/SingleSource/Benchmarks/Misc/revertBits.c was added to test this functionally. How does it miss the fact that the words are swapped in the register?
Actually, looking at the test case, there is no real test for functionality there (not obviously verifiable anyway). Can you please also add a straightforward test in there that will reverse a bit pattern that can be visually confirmed (of course ensuring that the optimizer doesn't just get rid of the code and emit a constant)?

Oct 9 2017, 2:24 PM
nemanjai requested changes to D38413: [PowerPC] Add handling for ColdCC calling convention and a pass to mark candidates with coldcc attribute.

An important aspect of the testing here is missing. Namely checking for the coldcc calling convention on the IR. You should be able to accomplish this either through an MIR test case or just a test case running opt.

Oct 9 2017, 7:42 AM
nemanjai added a comment to D36706: DAGCombiner: Add form of isFPExtFree to check uses.

It would appear to me that the PPC-only portion of this is NFC since it'll still return true for any floating point destination type. So I would say that the PPC back end is fine with those (as long as the assert gets a message :)).
However, shouldn't this patch have a test case?

Oct 9 2017, 5:30 AM

Oct 8 2017

nemanjai added a comment to D38656: [CGExprScalar] In EmitCompare trunc the result if it has different type as E->getType().

I assume this also fixes https://bugs.llvm.org/show_bug.cgi?id=31161?

Oct 8 2017, 10:27 PM

Oct 6 2017

nemanjai added inline comments to D38486: [PPC] Implement the heuristic to choose between a X-Form VSX ld/st vs a X-Form FP ld/st..
Oct 6 2017, 1:32 AM
nemanjai added inline comments to D38486: [PPC] Implement the heuristic to choose between a X-Form VSX ld/st vs a X-Form FP ld/st..
Oct 6 2017, 1:24 AM

Oct 4 2017

nemanjai added a reviewer for D38575: [PowerPC] Recommit r314244 with refactoring and off by default: djasper.
Oct 4 2017, 10:46 PM
nemanjai added inline comments to D38575: [PowerPC] Recommit r314244 with refactoring and off by default.
Oct 4 2017, 10:41 PM
nemanjai created D38575: [PowerPC] Recommit r314244 with refactoring and off by default.
Oct 4 2017, 10:36 PM

Oct 3 2017

nemanjai added a reviewer for D31319: [PPC] Eliminate redundant sign- and zero-extensions in PPC MI Peephole pass: iteratee.
Oct 3 2017, 2:47 PM

Sep 26 2017

nemanjai committed rL314244: [PowerPC] Reverting sequence of patches for elimination of comparison….
[PowerPC] Reverting sequence of patches for elimination of comparison…
Sep 26 2017, 3:35 PM

Sep 25 2017

nemanjai added inline comments to D38212: [PowerPC] Add profitablilty check for conversion to mtctr loops .
Sep 25 2017, 3:36 PM
nemanjai committed rL314106: [PowerPC] Eliminate compares - add i64 sext/zext handling for SETLT/SETGT.
[PowerPC] Eliminate compares - add i64 sext/zext handling for SETLT/SETGT
Sep 25 2017, 7:07 AM
nemanjai committed rL314073: [PowerPC] Eliminate compares - add i64 sext/zext handling for SETLE/SETGE.
[PowerPC] Eliminate compares - add i64 sext/zext handling for SETLE/SETGE
Sep 25 2017, 5:08 AM
nemanjai committed rL314060: [PowerPC] Eliminate compares - add i32 sext/zext handling for SETULE/SETUGE.
[PowerPC] Eliminate compares - add i32 sext/zext handling for SETULE/SETUGE
Sep 25 2017, 5:08 AM
nemanjai committed rL314062: [PowerPC] Eliminate compares - add i32 sext/zext handling for SETULT/SETUGT.
[PowerPC] Eliminate compares - add i32 sext/zext handling for SETULT/SETUGT
Sep 25 2017, 5:08 AM
nemanjai committed rL314055: [PowerPC] Eliminate compares - add i32 sext/zext handling for SETLT/SETGT.
[PowerPC] Eliminate compares - add i32 sext/zext handling for SETLT/SETGT
Sep 25 2017, 5:08 AM
nemanjai abandoned D36613: [PowerPC] Eliminate compares - add i32 sext/zext handling for SETLT/SETGT.

The SETLT/SETGT patterns were committed in rL314055.
The spill problem fix was committed in rL313978.

Sep 25 2017, 5:02 AM

Sep 22 2017

nemanjai accepted D38165: [CodeGenPrepare][NFC] Rename TargetTransformInfo::expandMemCmp -> TargetTransformInfo::enableMemCmpExpansion..

Yeah, the new name makes a lot more sense. LGTM.

Sep 22 2017, 11:41 AM
nemanjai committed rL313980: Remove the default clause from a fully-covering switch.
Remove the default clause from a fully-covering switch
Sep 22 2017, 5:27 AM
nemanjai committed rL313978: Recommit r310809 with a fix for the spill problem.
Recommit r310809 with a fix for the spill problem
Sep 22 2017, 4:52 AM
nemanjai closed D38054: [PowerPC] Fix the spill issue exposed by r310809 by committing rL313978: Recommit r310809 with a fix for the spill problem.
Sep 22 2017, 4:52 AM
nemanjai added a comment to D38054: [PowerPC] Fix the spill issue exposed by r310809.

I think this generally looks fine.

Just as a general note on code organizations and commenting, we seem to be accumulating a pretty deep call tree here of functions that only work correctly in 64-bit mode. Not in this patch, but we should do something about this (either segregating these things into a distinct part of the file and/or adding asserts and/or comments to the 64-bit-mode-only functions.

Agreed. Also a lot of the fine points of when to extend and when not could probably be split out as well. This is definitely getting pretty unwieldy.

-eric

Sep 22 2017, 3:05 AM

Sep 20 2017

nemanjai updated the diff for D38054: [PowerPC] Fix the spill issue exposed by r310809.

Add a test case.

Sep 20 2017, 4:10 PM
nemanjai updated the diff for D38054: [PowerPC] Fix the spill issue exposed by r310809.

Updated the patch to only include the bug fix that r310809 needs in order to function correctly. This won't apply cleanly to trunk because r310809 was pulled out, so this needs to apply on top of that.

Sep 20 2017, 3:22 PM
nemanjai added a comment to D38099: Fix crashes with -fprofile-use + new pass manager in queens.c from the testsuite (bugzilla bug 33776).
  1. Can you please explain the rationale behind this fix? I'm afraid it's just papering over the problem rather than solving it.

I am not at all sure how this is supposed to work, but it certainly seems messy to always do a NULL check before using ORE. Perhaps someone that really knows how this is supposed to work can comment on why there's a requirement that the ORE object be available in the cache (i.e. why we need to acquire it through getCachedResult() rather than getResult() and add it to the cache if it isn't already cached).

Sep 20 2017, 2:50 PM

Sep 19 2017

nemanjai created D38054: [PowerPC] Fix the spill issue exposed by r310809.
Sep 19 2017, 2:04 PM
nemanjai accepted D37851: [Power9] Add missing Power9 instructions..

LGTM.

Sep 19 2017, 1:13 AM

Sep 18 2017

nemanjai accepted D34815: [Power9] Spill gprs to vector registers rather than stack.

Forgot to accept :).

Sep 18 2017, 6:30 AM
nemanjai added a comment to D34815: [Power9] Spill gprs to vector registers rather than stack.

Other than a few minor inline comments, this LGTM.

Sep 18 2017, 6:30 AM
nemanjai accepted D36431: Add powerpc64 to compiler-rt build infrastructure..

I hope I haven't lost track of the patches that precluded this. If I remember correctly, all the X86 80-bit stuff was sorted out. We now know why those test cases were running forever (i.e. a vaarg function invoked as a non-vaarg function). So this just enables the truly generic builtins along with some PPC builtins. If my understanding is correct, I'd say this is ready to proceed (unless it was subsumed by one of the other patches).

Sep 18 2017, 5:54 AM
nemanjai accepted D36734: [PowerPC Peephole] Constants into a join add, use ADDI over LI/ADD..

LGTM. Just a request for a comment to be added to the code.

Sep 18 2017, 5:45 AM
nemanjai requested changes to D37851: [Power9] Add missing Power9 instructions..
Sep 18 2017, 4:30 AM

Sep 14 2017

nemanjai added inline comments to D37730: [PowerPC] eliminate unconditional branch to the next instruction.
Sep 14 2017, 12:32 AM

Sep 12 2017

nemanjai added a comment to D37342: [Power9] Add missing instructions: extswsli, popcntb.

The update to the EXTSWSLI definition looks fine. However, you'll want to coordinate with https://reviews.llvm.org/D37404 for POPCNTB.

Sep 12 2017, 3:19 PM

Sep 9 2017

nemanjai added a comment to D36504: [CodeGenPrepare][WIP] Convert uncond. branch to return into a return to help with shrink-wrapping.

Ping.
I am just wondering if any other targets have had a chance to try this out and whether there's any interest in proceeding with this patch as a result.
I will certainly address the comments raised by @junbuml if this patch has a future, but I'd like to evaluate whether there's any interest in proceeding before doing more work on this patch. Perhaps I'll open this up to a wider audience with a quick RFC email on llvm-dev.

Sep 9 2017, 1:43 AM

Sep 8 2017

nemanjai added a comment to D37654: PPC: Don't select lxv/stxv for insufficiently aligned stack slots..

I imagine this patch was prompted by a bug. Just out of curiosity, did the assert:

assert(MO.isImm() && !(MO.getImm() % 16) &&
       "Expecting an immediate that is a multiple of 16");

fire for the case where the frame op was lowered to an lxv/stxv with an offset that isn't a multiple of 16? If not, perhaps we need another assert somewhere else?

Sep 8 2017, 11:46 PM

Aug 31 2017

nemanjai added inline comments to D37342: [Power9] Add missing instructions: extswsli, popcntb.
Aug 31 2017, 3:02 PM

Aug 29 2017

nemanjai added inline comments to D36734: [PowerPC Peephole] Constants into a join add, use ADDI over LI/ADD..
Aug 29 2017, 11:08 AM

Aug 20 2017

nemanjai added a comment to D36734: [PowerPC Peephole] Constants into a join add, use ADDI over LI/ADD..

Lambda functions are useful constructs in some cases, but if a patch consists of 3 lambda functions and 5 lines of code to just call them, it's an indication of an overuse of this construct. I would say that as general guidance, use a lambda when it:

  1. Saves code repetition
  2. Capturing values from the enclosing function is necessary
  3. Functionality isn't needed outside of the function

or if you need to pass a functor to something and will likely just define it inline.

Aug 20 2017, 1:17 AM
nemanjai added a comment to D36764: The following functions and tests work fine for powerpc64, so enable them..

divtc3 and friends.

Aug 20 2017, 12:25 AM
nemanjai added a comment to D36840: [DAG] convert vector select-of-constants to logic/math.

This patch certainly looks good from the PPC perspective as far as I can tell.

Aug 20 2017, 12:13 AM