Page MenuHomePhabricator

sabuasal (Sameer AbuAsal)
User

Projects

User does not belong to any projects.

User Details

User Since
Oct 18 2016, 6:44 PM (152 w, 4 d)

Recent Activity

Jun 20 2019

sabuasal added inline comments to D63305: Propagate Trip count Assumptions to runtime check.
Jun 20 2019, 11:40 AM · Restricted Project, Restricted Project
sabuasal added a comment to D63305: Propagate Trip count Assumptions to runtime check.

Also, please let me know if you think it'll make sense to commit the two patches at once (once they are approved).

Jun 20 2019, 11:40 AM · Restricted Project, Restricted Project
sabuasal updated the diff for D63305: Propagate Trip count Assumptions to runtime check.
Jun 20 2019, 11:40 AM · Restricted Project, Restricted Project

Jun 17 2019

sabuasal updated the diff for D63304: Ignore Singletons in statement domains.
Jun 17 2019, 7:22 PM · Restricted Project, Restricted Project
sabuasal updated the diff for D63304: Ignore Singletons in statement domains.

Addressed comments from @Meinersbur.

Jun 17 2019, 7:22 PM · Restricted Project, Restricted Project
sabuasal added inline comments to D63304: Ignore Singletons in statement domains.
Jun 17 2019, 7:17 PM · Restricted Project, Restricted Project
sabuasal added inline comments to D63305: Propagate Trip count Assumptions to runtime check.
Jun 17 2019, 12:39 PM · Restricted Project, Restricted Project

Jun 14 2019

sabuasal added inline comments to D63304: Ignore Singletons in statement domains.
Jun 14 2019, 3:30 PM · Restricted Project, Restricted Project
sabuasal added inline comments to D63304: Ignore Singletons in statement domains.
Jun 14 2019, 2:58 PM · Restricted Project, Restricted Project
sabuasal added a comment to D63304: Ignore Singletons in statement domains.

You can detect this simple loop domain directly by using quering "is_singleton" but with nested loops you'll need to project out to find singletons in one dimension out of all the loop nest dimensions.

What I meant is NOT using isl at all for detecting the situation, but deriving it from the control flow graph (such as 'conditional branch found at the end of the loop instead of before'). Keep in mind this is only a [suggestion], not a requirement for me to accept this patch.

Oh I see, I only thought about this in terms of modeling in ISL instead of hacking something in ScopDetection\Info. Thanks for the suggestion.

Jun 14 2019, 2:50 PM · Restricted Project, Restricted Project
sabuasal added a reviewer for D63304: Ignore Singletons in statement domains: zinob.
Jun 14 2019, 11:59 AM · Restricted Project, Restricted Project
sabuasal added a comment to D63304: Ignore Singletons in statement domains.

Thanks for the patch. Do you happen to have performance, code size numbers?

[suggestion] The algorithm for detecting non-looping basic sets is quite ISL-heavy. If the goal is to better handle loops such as

i = 0;
do {
  Stmt(i)
} while (++i <n)

did you think about recognizing them directly?
Another possibility would be to merge the singleton with the main loop. E.g. the loop above could be converted into

for (int i = 0; i < max(1,n); i+=1) 
  Stmt(i);

You can detect this simple loop domain directly by using "is_singleton" but with nested loops you'll need to project out to find singletons in one dimension out of all the loop nest dimensions.

Jun 14 2019, 10:31 AM · Restricted Project, Restricted Project
sabuasal added a comment to D63304: Ignore Singletons in statement domains.

Thanks for the patch. Do you happen to have performance, code size numbers?

[suggestion] The algorithm for detecting non-looping basic sets is quite ISL-heavy. If the goal is to better handle loops such as

i = 0;
do {
  Stmt(i)
} while (++i <n)

did you think about recognizing them directly?
Another possibility would be to merge the singleton with the main loop. E.g. the loop above could be converted into

for (int i = 0; i < max(1,n); i+=1) 
  Stmt(i);

You can detect this simple loop domain directly by using "is_singleton" but with nested loops you'll need to project out to find singletons in one dimension out of all the loop nest dimensions.

Jun 14 2019, 10:31 AM · Restricted Project, Restricted Project
sabuasal added a comment to D63304: Ignore Singletons in statement domains.

Thanks for the patch. Do you happen to have performance, code size numbers?

[suggestion] The algorithm for detecting non-looping basic sets is quite ISL-heavy. If the goal is to better handle loops such as

i = 0;
do {
  Stmt(i)
} while (++i <n)

did you think about recognizing them directly?
Another possibility would be to merge the singleton with the main loop. E.g. the loop above could be converted into

for (int i = 0; i < max(1,n); i+=1) 
  Stmt(i);

You can detect this simple loop domain directly by using quering "is_singleton" but with nested loops you'll need to project out to find singletons in one dimension out of all the loop nest dimensions.

Jun 14 2019, 10:19 AM · Restricted Project, Restricted Project

Jun 13 2019

sabuasal updated the summary of D63304: Ignore Singletons in statement domains.
Jun 13 2019, 4:42 PM · Restricted Project, Restricted Project
sabuasal updated the summary of D63304: Ignore Singletons in statement domains.
Jun 13 2019, 4:08 PM · Restricted Project, Restricted Project
sabuasal updated the summary of D63305: Propagate Trip count Assumptions to runtime check.
Jun 13 2019, 4:05 PM · Restricted Project, Restricted Project
sabuasal added a parent revision for D63305: Propagate Trip count Assumptions to runtime check: D63304: Ignore Singletons in statement domains.
Jun 13 2019, 4:05 PM · Restricted Project, Restricted Project
sabuasal added a child revision for D63304: Ignore Singletons in statement domains: D63305: Propagate Trip count Assumptions to runtime check.
Jun 13 2019, 4:05 PM · Restricted Project, Restricted Project
sabuasal updated the summary of D63304: Ignore Singletons in statement domains.
Jun 13 2019, 4:05 PM · Restricted Project, Restricted Project
sabuasal created D63305: Propagate Trip count Assumptions to runtime check.
Jun 13 2019, 4:05 PM · Restricted Project, Restricted Project
sabuasal created D63304: Ignore Singletons in statement domains.
Jun 13 2019, 4:02 PM · Restricted Project, Restricted Project

Jun 10 2019

sabuasal committed rG04b5ee99f77e: [RISCV] Replace map with set in getReqFeatures (authored by sabuasal).
[RISCV] Replace map with set in getReqFeatures
Jun 10 2019, 10:15 AM
sabuasal committed rL362968: [RISCV] Replace map with set in getReqFeatures.
[RISCV] Replace map with set in getReqFeatures
Jun 10 2019, 10:14 AM
sabuasal closed D61412: [RISCV] Replace map with set in getReqFeatures.
Jun 10 2019, 10:14 AM · Restricted Project
sabuasal added a comment to D61412: [RISCV] Replace map with set in getReqFeatures.

Apologies for the late response

Jun 10 2019, 9:59 AM · Restricted Project

May 1 2019

sabuasal created D61412: [RISCV] Replace map with set in getReqFeatures.
May 1 2019, 4:20 PM · Restricted Project

Nov 28 2018

sabuasal added a comment to D55003: Two extra spaces have been added in unittests/Analysis/TargetLibraryInfoTest.cpp when ran clang-format-diff on it.

Thank you for updating this.

Nov 28 2018, 11:35 AM

Nov 27 2018

sabuasal added inline comments to rL346318: Fix unit tests after patch https://reviews.llvm.org/rL346313.
Nov 27 2018, 6:10 PM
sabuasal accepted D54816: [RISCV] Mark unit tests as "requires: riscv-registered-target".
Nov 27 2018, 10:52 AM
sabuasal added a reviewer for D54816: [RISCV] Mark unit tests as "requires: riscv-registered-target": sabuasal.
Nov 27 2018, 10:52 AM

Nov 15 2018

sabuasal accepted D52962: [RISCV] Constant materialisation for RV64I.

LGTM!

Nov 15 2018, 2:48 PM

Oct 22 2018

sabuasal edited reviewers for D52962: [RISCV] Constant materialisation for RV64I, added: sabuasal; removed: sameer.abuasal.
Oct 22 2018, 4:43 PM

Oct 15 2018

sabuasal added a comment to D52961: [RISCV] Introduce the RISCVMatInt::generateInstSeq helper.
In D52961#1265204, @asb wrote:
In D52961#1258464, @asb wrote:

No real reason to prefer one or the other when they're semantically identical. It's mostly to be closer to gas/gcc which uses addi at least for the isInt<12> case.

I see. makes sense. I just checked gcc and it seems to be favoring addi for that case.

Does this look good to you then?

Oct 15 2018, 11:58 AM
sabuasal accepted D52961: [RISCV] Introduce the RISCVMatInt::generateInstSeq helper.
Oct 15 2018, 11:58 AM
sabuasal edited reviewers for D52961: [RISCV] Introduce the RISCVMatInt::generateInstSeq helper, added: sabuasal; removed: sameer.abuasal.
Oct 15 2018, 11:57 AM

Oct 10 2018

sabuasal added a comment to D52977: [RISCV] Introduce codegen patterns for instructions introduced in RV64I.

Do you need to update test/CodeGen/RISCV/compress.ll to add an 64 bit run line?

Oct 10 2018, 3:54 PM
sabuasal added a comment to D52961: [RISCV] Introduce the RISCVMatInt::generateInstSeq helper.
In D52961#1258464, @asb wrote:

The helper makes it clear to follow. Thanks!

I am just not sure why you did the change to prefer add to addiw when for 64bit? is addi faster than addiw?

No real reason to prefer one or the other when they're semantically identical. It's mostly to be closer to gas/gcc which uses addi at least for the isInt<12> case.

I see. makes sense. I just checked gcc and it seems to be favoring addi for that case.

Oct 10 2018, 3:30 PM
sabuasal edited reviewers for D52977: [RISCV] Introduce codegen patterns for instructions introduced in RV64I, added: sabuasal; removed: sameer.abuasal.
Oct 10 2018, 2:51 PM

Oct 8 2018

sabuasal added a comment to D52961: [RISCV] Introduce the RISCVMatInt::generateInstSeq helper.

The helper makes it clear to follow. Thanks!

Oct 8 2018, 6:48 PM

Sep 20 2018

sabuasal added a comment to D52067: [inline Cost] Don't mark function accessing varargs as non-inlinable.

comments addressed in committed version rL342675.

Sep 20 2018, 11:45 AM
sabuasal committed rL342675: [inline Cost] Don't mark functions accessing varargs as non-inlinable.
[inline Cost] Don't mark functions accessing varargs as non-inlinable
Sep 20 2018, 11:41 AM
sabuasal closed D52067: [inline Cost] Don't mark function accessing varargs as non-inlinable.
Sep 20 2018, 11:41 AM

Sep 19 2018

sabuasal added inline comments to D52067: [inline Cost] Don't mark function accessing varargs as non-inlinable.
Sep 19 2018, 7:15 PM
sabuasal updated the diff for D52067: [inline Cost] Don't mark function accessing varargs as non-inlinable.
  • Simplified the test case.
  • Added it to inline-varargs instead of having it in a separate file.
Sep 19 2018, 7:12 PM

Sep 13 2018

sabuasal created D52067: [inline Cost] Don't mark function accessing varargs as non-inlinable.
Sep 13 2018, 6:44 PM

Aug 29 2018

sabuasal added inline comments to D50634: [RISCV] Add support for local PIC addressing.
Aug 29 2018, 12:19 PM

Aug 22 2018

sabuasal added a reviewer for D50043: [RISCV] RISC-V using -fuse-init-array by default: mgrang.
Aug 22 2018, 6:55 PM
sabuasal added inline comments to D50790: [RISCV] Fixed SmallVector.h Assertion `idx < size()'.
Aug 22 2018, 6:54 PM

Jul 31 2018

sabuasal added inline comments to D50046: [RISCV] Add InstAlias definitions for add[w], and, xor, or, sll[w], srl[w], sra[w], slt and sltu with immediate..
Jul 31 2018, 11:53 AM

Jun 27 2018

sabuasal edited reviewers for D48430: [RISCV] Add support for lowering jumptables, added: sabuasal; removed: sameer.abuasal.
Jun 27 2018, 2:50 PM
sabuasal added a comment to rL335786: [RISCV] Add machine function pass to merge base + offset.

Can something similar to this be done where the user is a load or store?

You quite often get things like...

lui r1, #nnnnn
addi r1,#nnn
lw r2 nnn(r1)

... for example to access a field in a global. The addi can be merged into the lw offset and the lui adjusted appropriately.

Jun 27 2018, 2:41 PM
sabuasal abandoned D48414: [RISCV] Fix test/CodeGen/RISCV/indirectbr.ll after D48202.
Jun 27 2018, 2:19 PM
sabuasal committed rL335786: [RISCV] Add machine function pass to merge base + offset.
[RISCV] Add machine function pass to merge base + offset
Jun 27 2018, 1:56 PM
sabuasal closed D47857: [RISCV] Add machine function pass to merge base + offset.
Jun 27 2018, 1:56 PM
sabuasal updated the summary of D47857: [RISCV] Add machine function pass to merge base + offset.
Jun 27 2018, 1:44 PM
sabuasal updated the diff for D47857: [RISCV] Add machine function pass to merge base + offset.

Updated comments by @asb .

Jun 27 2018, 1:35 PM

Jun 21 2018

sabuasal committed rL335239: [RISCV] Tail calls don't need to save return address.
[RISCV] Tail calls don't need to save return address
Jun 21 2018, 7:41 AM
sabuasal closed D48343: [RISCV] Tail calls don't need to save return address.
Jun 21 2018, 7:41 AM

Jun 20 2018

sabuasal added a comment to D48414: [RISCV] Fix test/CodeGen/RISCV/indirectbr.ll after D48202.

Perhaps we should update the test to not rely on label names on the first place.

Jun 20 2018, 7:17 PM
sabuasal created D48414: [RISCV] Fix test/CodeGen/RISCV/indirectbr.ll after D48202.
Jun 20 2018, 7:17 PM

Jun 19 2018

sabuasal created D48343: [RISCV] Tail calls don't need to save return address.
Jun 19 2018, 5:29 PM

Jun 15 2018

sabuasal added inline comments to D47857: [RISCV] Add machine function pass to merge base + offset.
Jun 15 2018, 1:42 PM

Jun 6 2018

sabuasal created D47857: [RISCV] Add machine function pass to merge base + offset.
Jun 6 2018, 4:55 PM

May 29 2018

sabuasal added inline comments to D46118: [RISCV] AsmParser support for the li pseudo instruction.
May 29 2018, 3:17 PM
sabuasal added a comment to rL333455: [RISCV] Add peepholes for Global Address lowering patterns.

commits: https://reviews.llvm.org/D45748

May 29 2018, 12:41 PM
sabuasal closed D45748: [RISCV] Add peepholes for Global Address lowering patterns.

committed in: https://reviews.llvm.org/rL333455

May 29 2018, 12:40 PM
sabuasal committed rL333455: [RISCV] Add peepholes for Global Address lowering patterns.
[RISCV] Add peepholes for Global Address lowering patterns
May 29 2018, 12:39 PM
sabuasal updated the diff for D45748: [RISCV] Add peepholes for Global Address lowering patterns.
  • Added comment to the peephole to mention that it might be better as a machine function pass.
  • Added one extra test case to test.
May 29 2018, 12:30 PM

May 24 2018

sabuasal updated the diff for D45748: [RISCV] Add peepholes for Global Address lowering patterns.
  • Added a test case that shows this patch inability to deal with global address nodes spanning multiple blocks.
May 24 2018, 2:20 PM
sabuasal added a comment to D45748: [RISCV] Add peepholes for Global Address lowering patterns.
In D45748#1111477, @asb wrote:
In D45748#1110644, @asb wrote:

I'm happy to land this patch as-is. It generates good code in a wide variety of cases (more so than many other backends). But please add the above example to your test file with a note about the missed optimisation opportunity.

The example is actually already added to the test (define dso_local i32 @load_half() nounwind).

Sorry, I was focusing on the diff and had forgotten you already added this in the previous patch. Thanks!

On the other hand, after some chat with @efriedma I am starting to think that this whole thing could be better handled with a machine function Pass that runs after MachineCSE. Your point in your original longer example (and the example I have in the test case called control_flow) shows that we are getting different notes for the lowered global address (just run llc with --stop-before=machine--cse), hasUseOnce always considers the uses within the basic block you are on, not the whole function (which makes sense). The reason these two examples worked (and by worked I mean NOT having the offset merged back into the global address lowering) is because the ADDI generated for the offset was folded into the Load\STore and my peephole found no Tail to kick in. If we add a MachineFunctionPasss we we will have:

I agree with your analysis. It's conceivable you could have an IR pass that tries to improve codegen, but it would have to operate based on assumption about how global lowering would work. I think as you suggest, a machine function pass is promising.

So would the plan be to always generate offset separate from the global (as is current upstream behaviour after your recent patch), and to opportunistically merge the offset in the case the the global base only has a single reference, or (optionally, probably less common) every reference using that base has the same offset.

May 24 2018, 2:02 PM
sabuasal updated subscribers of D45748: [RISCV] Add peepholes for Global Address lowering patterns.
In D45748#1110644, @asb wrote:

The point I was trying to make is that even with these peepholes, there are trivial cases that aren't caught, such as the example I shared:

@foo = global [6 x i16] [i16 1, i16 2, i16 3, i16 4, i16 5, i16 0], align 2                                     
                                                                                                                
define i32 @main() nounwind {                                                                                   
entry:                                                                                                          
  %0 = load i16, i16* getelementptr inbounds ([6 x i16], [6 x i16]* @foo, i32 0, i32 4), align 2                
  %cmp = icmp eq i16 %0, 140                                                                                    
  br i1 %cmp, label %if.end, label %if.then                                                                     
                                                                                                                
if.then:                                                                                                        
  tail call void @abort()                                                                                       
  unreachable                                                                                                   
                                                                                                                
if.end:                                                                                                         
  ret i32 0                                                                                                     
}                                                                                                               
                                                                                                                
declare void @abort()

This generates 3 instructions for the global access when you'd obviously prefer to use two:

lui     a0, %hi(foo)
addi    a0, a0, %lo(foo)
lhu     a0, 8(a0)

I proposed one way of adding a peephole, but don't see a way of differentiating between the case above and more complex cases where the peephole is counter-productive (such as in the longer IR sample I shared). Possible the solution would be to have an IR pass perform common subexpression elimination on the getelementptr calculations - possibly such a pass already exists.

Hi alex,

May 24 2018, 12:46 AM

May 23 2018

sabuasal committed rL333132: [RISCV] Set CostPerUse for registers.
[RISCV] Set CostPerUse for registers
May 23 2018, 2:38 PM
sabuasal closed D47039: [RISCV] Set CostPerUse for registers.
May 23 2018, 2:38 PM
sabuasal updated the diff for D47039: [RISCV] Set CostPerUse for registers.

Updated the comment.

May 23 2018, 2:31 PM
sabuasal added a comment to D45748: [RISCV] Add peepholes for Global Address lowering patterns.

Ping?

May 23 2018, 12:06 PM
sabuasal added a comment to D47039: [RISCV] Set CostPerUse for registers.

Ping?
is this good to merge?

May 23 2018, 12:03 PM

May 18 2018

sabuasal added a comment to D46630: [RISCV] Insert NOPs and R_RISCV_ALIGN relocation type for .align directive when linker relaxation enabled.

Hi Shiva,

May 18 2018, 4:13 PM
sabuasal updated the diff for D47039: [RISCV] Set CostPerUse for registers.
May 18 2018, 2:29 PM
sabuasal updated the diff for D47039: [RISCV] Set CostPerUse for registers.

Added descriptive comments for rationale behind setting the CostPerUse property.

May 18 2018, 2:28 PM

May 17 2018

sabuasal created D47039: [RISCV] Set CostPerUse for registers.
May 17 2018, 4:40 PM
sabuasal updated the summary of D45748: [RISCV] Add peepholes for Global Address lowering patterns.
May 17 2018, 12:51 PM
sabuasal updated the summary of D45748: [RISCV] Add peepholes for Global Address lowering patterns.
May 17 2018, 12:51 PM
sabuasal updated the summary of D45748: [RISCV] Add peepholes for Global Address lowering patterns.
May 17 2018, 12:51 PM
sabuasal updated the summary of D45748: [RISCV] Add peepholes for Global Address lowering patterns.
May 17 2018, 12:44 PM
sabuasal updated the summary of D45748: [RISCV] Add peepholes for Global Address lowering patterns.
May 17 2018, 12:25 PM
sabuasal added a child revision for D46989: [RISCV] Separate base from offset in lowerGlobalAddress: D45748: [RISCV] Add peepholes for Global Address lowering patterns.
May 17 2018, 12:20 PM
sabuasal added a parent revision for D45748: [RISCV] Add peepholes for Global Address lowering patterns: D46989: [RISCV] Separate base from offset in lowerGlobalAddress.
May 17 2018, 12:20 PM
sabuasal updated the diff for D45748: [RISCV] Add peepholes for Global Address lowering patterns.

Updated the patch to only show the peephole optimizations.

May 17 2018, 12:18 PM
sabuasal committed rL332641: [RISCV] Separate base from offset in lowerGlobalAddress.
[RISCV] Separate base from offset in lowerGlobalAddress
May 17 2018, 11:18 AM
sabuasal closed D46989: [RISCV] Separate base from offset in lowerGlobalAddress.
May 17 2018, 11:18 AM
sabuasal updated the diff for D46989: [RISCV] Separate base from offset in lowerGlobalAddress.
May 17 2018, 11:13 AM

May 16 2018

sabuasal retitled D46989: [RISCV] Separate base from offset in lowerGlobalAddress from [RISCV] Separate base from offset in lowerGlobalAddress to [RISCV] Separate base from offset in lowerGlobalAddress (no peephole).
May 16 2018, 5:05 PM
sabuasal added a comment to D45748: [RISCV] Add peepholes for Global Address lowering patterns.
In D45748#1100940, @asb wrote:

I think we've all spent enough time looking at this to see that separating the base from offset in lowerGlobalAddress is a better starting point, even though there are a range of examples where it results in more instructions. I think we'll soon arrive at a solution that fixes the most common of those cases, but it's obviously not quite as straight-forward as originally hoped. If you'd like to land the change to lowerGlobalAddress quickly, I'd happily now review that as a standalone change. If you like, you could create a new review thread that includes that change as well as test cases that demonstrates where it helps, and cases where it's worse marked with TODO.

May 16 2018, 4:20 PM
sabuasal updated the summary of D46989: [RISCV] Separate base from offset in lowerGlobalAddress.
May 16 2018, 4:18 PM
sabuasal updated the summary of D46989: [RISCV] Separate base from offset in lowerGlobalAddress.
May 16 2018, 4:18 PM
sabuasal added reviewers for D46989: [RISCV] Separate base from offset in lowerGlobalAddress: asb, apazos.
May 16 2018, 4:17 PM
sabuasal created D46989: [RISCV] Separate base from offset in lowerGlobalAddress.
May 16 2018, 4:16 PM
sabuasal added a comment to D45748: [RISCV] Add peepholes for Global Address lowering patterns.
In D45748#1100874, @asb wrote:

I've started a discussion on GlobalAddress lowering strategy on llvm-dev here: http://lists.llvm.org/pipermail/llvm-dev/2018-May/123359.html

Your peephole code seems well written and really solid, but I'd like to discuss the dagcombiner approach some more. The sort of approach taken in rL330630 seems to catch the cases discussed in this thread and has a simpler implementation.

May 16 2018, 10:49 AM
sabuasal added a comment to D45748: [RISCV] Add peepholes for Global Address lowering patterns.
In D45748#1100079, @asb wrote:

I looked at this in some more detail. I have a few very rough statistics on the effect of the various peepholes on the GCC torture suite. I don't claim this is representative of real code, but of course global accesses are highly codebase dependent anyway. Stats are just on the total static instruction count, i.e. it doesn't take into account if instructions are moved from hot to cold basic blocks or vice versa.

Improvement of your proposed peephole over your patch with the peephole disabled:

  • 26 files have reduced number of instructions
  • 10 have an increased number of instructions
  • 6 have changed output but the same static instruction count

    If you then add my proposed peephole on top of that, then you see the following changes (baseline = this patch including your peephole)
  • 23 files have reduced number of instructions
  • 11 have increased number of instructions
  • 2 have changed output but the same static instruction count

So you are saying your addition hurts size, right?

If it were doable, the ideal starting point would be the following:

  • If the global base has only a single reference across the whole function, or every reference has the same offset then combine the global with offset
  • If the global base has multiple references with different offsets, then never combine the global with the offset

This is exactly what I am doing in my peephole:

  • Catch a tail node (RISCV::ADDI or RISCV::ADD)
  • Look at the operands of the node t detect a global address lowering sequence (LUI %hi followed by and ADDI %lo)
  • Look at the other operands of the tail node to catch the and reconstruct the offset. (it can be just an operand if the tail is an ADDI, or you might have to reconstruct from an LUI + ADDI pair if the offset is big or just an LUI)
  • Merge the offset back into the GlobalAddress if the global address only had one use. We can trust that at this point because ALL the blocks have RISCV target nodes.

I think the above two rules would be sensible regardless of whether you are linking statically or e.g. dynamic linking with PIC. Unfortunately we don't have access to that information at the point we'd like to make a decision about combining an offset. A particular lui+addi of a global may have only a single use, but that global may be referenced many times in the function. New TargetGlobalAddress nodes are introduced each time rather than reusing them. In these cases, MachineCSE can remove redundant instructions as long as different offsets aren't folded into the global. Unless we can force the reuse of identical TargetGlobalAddress nodes we can't easily write a peephole pass that respects the two rules proposed above. This input shows the problem well:

@a = internal unnamed_addr global [4 x i32] zeroinitializer, align 4

; Function Attrs: noreturn nounwind
define dso_local i32 @main() local_unnamed_addr #0 {
entry:
  %0 = load i32, i32* getelementptr inbounds ([4 x i32], [4 x i32]* @a, i32 0, i32 0), align 4
  %cmp = icmp eq i32 %0, 0
  br i1 %cmp, label %if.end, label %if.then

if.then:                                          ; preds = %entry
  tail call void @abort() #3
  unreachable

if.end:                                           ; preds = %entry
  %1 = load i32, i32* getelementptr inbounds ([4 x i32], [4 x i32]* @a, i32 0, i32 1), align 4
  %cmp1 = icmp eq i32 %1, 3
  br i1 %cmp1, label %if.end3, label %if.then2

if.then2:                                         ; preds = %if.end
  tail call void @abort() #3
  unreachable

if.end3:                                          ; preds = %if.end
  %2 = load i32, i32* getelementptr inbounds ([4 x i32], [4 x i32]* @a, i32 0, i32 2), align 4
  %cmp4 = icmp eq i32 %2, 2
  br i1 %cmp4, label %if.end6, label %if.then5

if.then5:                                         ; preds = %if.end3
  tail call void @abort() #3
  unreachable

if.end6:                                          ; preds = %if.end3
  %3 = load i32, i32* getelementptr inbounds ([4 x i32], [4 x i32]* @a, i32 0, i32 3), align 4
  %cmp7 = icmp eq i32 %3, 1
  br i1 %cmp7, label %if.end9, label %if.then8

if.then8:                                         ; preds = %if.end6
  tail call void @abort() #3
  unreachable

if.end9:                                          ; preds = %if.end6
  tail call void @exit(i32 0) #3
  unreachable
}

declare void @abort()

declare void @exit(i32)

Let me take a closer look at this test case. But I think SelectionDAG builder should be able
to merge the TargetNodes if they refer to the same address.

May 16 2018, 10:31 AM