User Details
- User Since
- Oct 18 2016, 6:44 PM (336 w, 2 m)
Jun 20 2019
Also, please let me know if you think it'll make sense to commit the two patches at once (once they are approved).
Jun 17 2019
Addressed comments from @Meinersbur.
Jun 14 2019
Oh I see, I only thought about this in terms of modeling it in ISL instead of hacking something into ScopDetection/Info. Thanks for the suggestion.
You can detect this simple loop domain directly by querying "is_singleton", but with nested loops you'll need to project out to find singletons in one dimension out of all the loop nest dimensions.
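As a rough illustration of the idea above (a hypothetical sketch, not ISL or Polly code): a loop dimension is "singleton" when it executes exactly one iteration, and with nested loops, checking a single dimension in isolation corresponds to projecting out all the others first. The name `singleton_dims` and the bounds encoding are assumptions for the example.

```python
# Hypothetical sketch (not ISL/Polly code): a loop dimension is "singleton"
# when it executes exactly one iteration. With nested loops, checking one
# dimension in isolation corresponds to projecting out the others first.
def singleton_dims(bounds):
    """bounds: one (lower, upper) half-open range per loop, outermost first."""
    return [d for d, (lo, hi) in enumerate(bounds) if hi - lo == 1]

# for i in 0..4: for j in 5..6: for k in 0..8 -> only j runs once
print(singleton_dims([(0, 4), (5, 6), (0, 8)]))  # [1]
```

In real code the bounds would be ISL sets rather than constant ranges, and `is_singleton` would be asked of the projected set instead of comparing integers.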
Jun 13 2019
Jun 10 2019
Apologies for the late response
May 1 2019
Nov 28 2018
Thank you for updating this.
Nov 27 2018
Nov 15 2018
Oct 22 2018
Oct 15 2018
Oct 10 2018
Do you need to update test/CodeGen/RISCV/compress.ll to add a 64-bit run line?
I see, makes sense. I just checked GCC and it seems to favor addi for that case.
Oct 8 2018
The helper makes it clear to follow. Thanks!
Sep 20 2018
Comments addressed in the committed version, rL342675.
Sep 19 2018
- Simplified the test case.
- Added it to inline-varargs instead of having it in a separate file.
Sep 13 2018
Aug 29 2018
Aug 22 2018
Jul 31 2018
Jun 27 2018
Updated per comments by @asb.
Jun 21 2018
Jun 20 2018
Perhaps we should update the test not to rely on label names in the first place.
Jun 19 2018
Jun 15 2018
Jun 6 2018
May 29 2018
commits: https://reviews.llvm.org/D45748
committed in: https://reviews.llvm.org/rL333455
- Added comment to the peephole to mention that it might be better as a machine function pass.
- Added one extra test case to the test.
May 24 2018
- Added a test case that shows this patch's inability to deal with global address nodes spanning multiple blocks.
Hi Alex,
May 23 2018
Updated the comment.
Ping?
Ping?
Is this good to merge?
May 18 2018
Hi Shiva,
Added descriptive comments for rationale behind setting the CostPerUse property.
May 17 2018
Updated the patch to only show the peephole optimizations.
May 16 2018
So you are saying your addition hurts size, right?
If it were doable, the ideal starting point would be the following:
- If the global base has only a single reference across the whole function, or every reference has the same offset, then combine the global with the offset.
- If the global base has multiple references with different offsets, then never combine the global with the offset.
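The two rules above reduce to a single predicate over the set of offsets used with one global base. A minimal sketch, where `should_fold` is a hypothetical name and not an LLVM API:

```python
# Hedged sketch of the folding rule described above; `should_fold` is a
# hypothetical helper, not LLVM code. Fold an offset into the global's
# materialization only when every reference to that base uses the same offset.
def should_fold(offsets):
    """offsets: the offsets of all references to one global base in a function."""
    return len(set(offsets)) == 1

print(should_fold([8, 8, 8]))  # True: one shared offset, folding is safe
print(should_fold([0, 4, 8]))  # False: keep the bare base so CSE can reuse it
```

The second case is exactly the situation in the IR example further down: four loads from @a at offsets 0, 4, 8, and 12.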
This is exactly what I am doing in my peephole:
- Catch a tail node (RISCV::ADDI or RISCV::ADD)
- Look at the operands of the node to detect a global address lowering sequence (LUI %hi followed by an ADDI %lo)
- Look at the other operands of the tail node to catch and reconstruct the offset (it can be just an operand if the tail is an ADDI, or you might have to reconstruct it from an LUI + ADDI pair if the offset is big, or from just an LUI)
- Merge the offset back into the GlobalAddress if the global address only had one use. We can trust the use count at this point because ALL the blocks have already been lowered to RISCV target nodes.
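The matching steps above can be mocked up as a toy scan over a linear instruction stream. This is not LLVM MI code: instructions are plain tuples, registers are strings, and `match_global_offset` is a hypothetical helper invented for the sketch.

```python
# Toy model of the peephole steps above, not real LLVM code: scan for
# LUI %hi + ADDI %lo materializing a global address, followed by a tail
# ADDI that adds a constant offset to the result.
def match_global_offset(insns):
    out = []
    for i in range(2, len(insns)):
        lui, lo, tail = insns[i - 2], insns[i - 1], insns[i]
        if (lui[0] == "LUI" and lo[0] == "ADDI" and tail[0] == "ADDI"
                and lui[2].startswith("%hi(")   # LUI loads %hi(symbol)
                and lo[2] == lui[1]             # ADDI %lo reads the LUI result
                and tail[2] == lo[1]            # tail ADDI reads the address
                and isinstance(tail[3], int)):  # ...and adds a constant offset
            symbol = lui[2][len("%hi("):-1]
            out.append((tail[1], symbol, tail[3]))
    return out

insns = [("LUI", "x1", "%hi(a)"),
         ("ADDI", "x1", "x1", "%lo(a)"),
         ("ADDI", "x2", "x1", 8)]
print(match_global_offset(insns))  # [('x2', 'a', 8)]
```

The real peephole additionally handles the big-offset cases (LUI + ADDI, or a lone LUI, as the tail's other operand) and performs the single-use check before rewriting.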
I think the above two rules would be sensible regardless of whether you are linking statically or e.g. dynamic linking with PIC. Unfortunately we don't have access to that information at the point we'd like to make a decision about combining an offset. A particular lui+addi of a global may have only a single use, but that global may be referenced many times in the function. New TargetGlobalAddress nodes are introduced each time rather than reusing them. In these cases, MachineCSE can remove redundant instructions as long as different offsets aren't folded into the global. Unless we can force the reuse of identical TargetGlobalAddress nodes we can't easily write a peephole pass that respects the two rules proposed above. This input shows the problem well:
@a = internal unnamed_addr global [4 x i32] zeroinitializer, align 4

; Function Attrs: noreturn nounwind
define dso_local i32 @main() local_unnamed_addr #0 {
entry:
  %0 = load i32, i32* getelementptr inbounds ([4 x i32], [4 x i32]* @a, i32 0, i32 0), align 4
  %cmp = icmp eq i32 %0, 0
  br i1 %cmp, label %if.end, label %if.then

if.then:                                          ; preds = %entry
  tail call void @abort() #3
  unreachable

if.end:                                           ; preds = %entry
  %1 = load i32, i32* getelementptr inbounds ([4 x i32], [4 x i32]* @a, i32 0, i32 1), align 4
  %cmp1 = icmp eq i32 %1, 3
  br i1 %cmp1, label %if.end3, label %if.then2

if.then2:                                         ; preds = %if.end
  tail call void @abort() #3
  unreachable

if.end3:                                          ; preds = %if.end
  %2 = load i32, i32* getelementptr inbounds ([4 x i32], [4 x i32]* @a, i32 0, i32 2), align 4
  %cmp4 = icmp eq i32 %2, 2
  br i1 %cmp4, label %if.end6, label %if.then5

if.then5:                                         ; preds = %if.end3
  tail call void @abort() #3
  unreachable

if.end6:                                          ; preds = %if.end3
  %3 = load i32, i32* getelementptr inbounds ([4 x i32], [4 x i32]* @a, i32 0, i32 3), align 4
  %cmp7 = icmp eq i32 %3, 1
  br i1 %cmp7, label %if.end9, label %if.then8

if.then8:                                         ; preds = %if.end6
  tail call void @abort() #3
  unreachable

if.end9:                                          ; preds = %if.end6
  tail call void @exit(i32 0) #3
  unreachable
}

declare void @abort()
declare void @exit(i32)
Let me take a closer look at this test case. But I think the SelectionDAG builder should be able to merge the target nodes if they refer to the same address.
Yes. Here is the problem with DAG combine:
- In our optimization we lower the global address into a TargetGlobalAddress and an ISD::ADD (a generic node, not a target-specific node)
- We register a DAG combine callback for ISD::ADD.
- Right after lowerGlobalAddress finishes, you hit the DAG combiner function. At this point the other blocks are yet to be processed, so the global addresses there have not been lowered into TargetGlobalAddress nodes (as in the IR test I added called "control_flow"), and the target node we created has only one use. Our combiner check will fold the offset back into the target lowering simply because it doesn't know about the other references.
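The timing problem above can be shown with a toy use-count model (an illustration, not SelectionDAG code): the combiner runs per basic block, so a use count taken at combine time misses references in blocks that have not been lowered yet.

```python
# Toy illustration, not SelectionDAG code: the combiner runs per basic block,
# so a use count taken during combining undercounts, because references in
# blocks that have not been lowered yet are invisible at that point.
uses_per_block = [1, 1, 1, 1]                # @a referenced once in each of 4 blocks
visible_at_combine_time = uses_per_block[0]  # only the current block is lowered
total_uses = sum(uses_per_block)

print(visible_at_combine_time)  # 1 -> looks like a single use, "safe" to fold
print(total_uses)               # 4 -> folding distinct offsets defeats CSE
```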
May 9 2018
- Updated tests and formatting.
- Added peepholes to handle corner cases; handling these cases with the DAG combiner doesn't work for uses spanning multiple basic blocks (the DAG combiner works at the basic-block level).