The finding the combined complex addressing mode in some cases becomes
expensive. This CL limits the work by number of basic blocks to traverse.
This is a fix for PR39625.
Differential D54517
[CGP] Limit Complex Addressing mode by number of BasicBlocks to traverse skatkov on Nov 14 2018, 1:53 AM. Authored by
Details
The finding the combined complex addressing mode in some cases becomes This is a fix for PR39625.
Diff Detail Event TimelineComment Actions Basically I think the code looks ok. However, I'm not so familiar with this algorithm so it is hard to comment about the actual solution. My understanding is that you introduce a threshold, and if the size of the TraverseOrder vector grows past the threshold we bail out from findCommon. So what is the impact of this? How did you choose the current threshold? (that is probably something people want to know when trying to finetune this 4 years from now, so it could be nice to say something about it in the commit msg even if it just is an estimate) Comment Actions Hi Bjorn, thank you for looking into this patch. The optimization itself does the following: Say the difference in the base. Generally we want to build a number of phi nodes to have right base at load. The number 100 has no specific meaning. It is some threshold which is big enough to accept more or less big CFG. I hope it helps. Comment Actions I don't have any objections to the patch myself, but I don't really have the knowledge to understand if this could be bad for some important use cases either. A user can ofcourse override the threshold if needed, so I guess that makes this patch a little bit less "dangerous". It looks like @chandlerc and @haicheng has contributed to this pass in the past. Can perhaps one of you help out and take a quick look at this to see if introducing this threshold is a reasonable fix? Comment Actions Thank you, Bjorn. I guess @john.brawn would be best person to review this patch because he reviewed an original implementation. Comment Actions FYI: in parallel there is an idea how to simplify algorithm significantly. I'll try to implement it... Comment Actions Looks OK as a fix for the reported bug, with one minor nitpick. Taking a look at PR39625 it looks like there's an underlying problem where we insert a load of useless placeholders, e.g. for declare void @otherfn() define i32 @fn(i32* %arg1, i32* %arg2) { entry: %gep1 = getelementptr i32, i32* %arg1, i32 4 %gep2 = getelementptr i32, i32* %arg1, i32 8 br i1 undef, label %a1, label %b1 a1: call void @otherfn() br label %middle b1: call void @otherfn() br label %middle middle: br i1 undef, label %a2, label %b2 a2: call void @otherfn() br label %end b2: call void @otherfn() br label %end end: %phi = phi i32* [ %gep1, %a2 ], [ %gep2, %b2 ] %val = load i32, i32* %phi, align 4 ret i32 %val } we insert a total of 9 placeholders, but actually we need only one and the rest get deleted later. So I think there's something that could be done to improve that, but that's no reason not to do this fix.
Comment Actions Hi @john.brawn, could you please take a look at https://reviews.llvm.org/D54932. It makes a compile time improvement and I guess we will not need this patch for a while. |
This should have a << "\n" on the end.