twoh (Taewook Oh)
User

Projects

User does not belong to any projects.

User Details

User Since
Feb 4 2016, 3:03 PM (72 w, 2 d)

Recent Activity

Tue, Jun 20

twoh updated the diff for D31821: Remove redundant copy in recurrences.

Addressing @wmi's concern by limiting the targets to the recurrence cycles that only the last instruction of the recurrence (that feeds the PHI instruction) can have uses outside of the recurrence. This is not an ideal solution yet, and more fundamental solution (such as having recurrence optimization as a separate pass and/or using live range analysis for it) should follow. But still I think it is worth to have it here.

Tue, Jun 20, 1:44 PM
twoh added a comment to D34373: [LV] Optimize for size when vectorizing loops with tiny trip count.
In D34373#785485, @twoh wrote:
In D34373#784975, @twoh wrote:

I think this is a right approach, but concerned that the experimental results I shared on D32451 show that it is generally better to not to vectorize the low trip count loops. @Ayal, I wonder if you have any results that this patch actually improves the performance. Thanks!

I know that we're currently missing opportunities for large vectorizable loops with low (static) trip counts. Smaller inner loops are also good candidates for unpredicated vectorization, but we may need to be a bit careful because of modeling inaccuracies and phase-ordering effects (e.g. if we don't vectorize a loop, then we'll end up unrolling it when the unroller runs).

Got it. My concern was for small single-level loops with low trip counts, as I observe them pretty frequently. I have no objection accepting this patch and improve the cost estimator separately.

Do you mean that you see such loops frequently with dynamically-small trip counts, or with static trip counts? I assume the small loops with (small) static trip counts will generally be unrolled.

Tue, Jun 20, 11:11 AM
twoh added a comment to D34373: [LV] Optimize for size when vectorizing loops with tiny trip count.
In D34373#784975, @twoh wrote:

I think this is a right approach, but concerned that the experimental results I shared on D32451 show that it is generally better to not to vectorize the low trip count loops. @Ayal, I wonder if you have any results that this patch actually improves the performance. Thanks!

I know that we're currently missing opportunities for large vectorizable loops with low (static) trip counts. Smaller inner loops are also good candidates for unpredicated vectorization, but we may need to be a bit careful because of modeling inaccuracies and phase-ordering effects (e.g. if we don't vectorize a loop, then we'll end up unrolling it when the unroller runs).

Tue, Jun 20, 9:17 AM

Mon, Jun 19

twoh added a comment to D34373: [LV] Optimize for size when vectorizing loops with tiny trip count.

I think this is a right approach, but concerned that the experimental results I shared on D32451 show that it is generally better to not to vectorize the low trip count loops. @Ayal, I wonder if you have any results that this patch actually improves the performance. Thanks!

Mon, Jun 19, 10:13 PM
twoh committed rL305729: Improve profile-guided heuristics to use estimated trip count..
Improve profile-guided heuristics to use estimated trip count.
Mon, Jun 19, 11:50 AM
twoh closed D32451: Improve profile-guided heuristics to use estimated trip count. by committing rL305729: Improve profile-guided heuristics to use estimated trip count..
Mon, Jun 19, 11:49 AM
twoh updated the diff for D32451: Improve profile-guided heuristics to use estimated trip count..

Addressing comments from @tejohnson. Thanks!

Mon, Jun 19, 11:35 AM

Sun, Jun 18

twoh added a comment to D32451: Improve profile-guided heuristics to use estimated trip count..

@tejohnson Sorry for the late reply. I was out of internet for days. I have no objection to set the default value to true, but prefer to have it as a separate patch so that we can track the impact on performance of each chance more easily.

Sun, Jun 18, 5:20 PM

Mon, Jun 12

twoh added a comment to D32451: Improve profile-guided heuristics to use estimated trip count..

Ping. Thanks!

Mon, Jun 12, 1:42 PM

Sat, Jun 10

twoh added a comment to D31821: Remove redundant copy in recurrences.

@wmi Not at all! Thanks for your comments.

Sat, Jun 10, 9:46 PM

Wed, Jun 7

twoh added a comment to D31821: Remove redundant copy in recurrences.

ping. thanks!

Wed, Jun 7, 2:29 PM

Sun, Jun 4

twoh added inline comments to D32451: Improve profile-guided heuristics to use estimated trip count..
Sun, Jun 4, 10:10 PM

Fri, Jun 2

twoh added inline comments to D32451: Improve profile-guided heuristics to use estimated trip count..
Fri, Jun 2, 12:54 PM

Wed, May 31

twoh added a comment to D31821: Remove redundant copy in recurrences.

FYI, below is the compile difference in percentage for spec2006:

Wed, May 31, 11:11 AM
twoh updated the diff for D31821: Remove redundant copy in recurrences.

Add missing reference.

Wed, May 31, 11:09 AM
twoh updated the diff for D31821: Remove redundant copy in recurrences.

Reimplement the feature in peephole optimizer. This patch makes the values in the recurrence cycle to be tied to each other if possible.

Wed, May 31, 10:28 AM
twoh added a comment to D32451: Improve profile-guided heuristics to use estimated trip count..

Ping. Thanks!

Wed, May 31, 9:50 AM

May 22 2017

twoh added a comment to D32451: Improve profile-guided heuristics to use estimated trip count..

Comparing the existing implementation and this patch, I don't observe noticeable compile time different with Video-SIMDBench. I compiled the benchmark suite 3 times each and the median was 27.12 sec vs 27.55 sec, while the average was 27.96sec vs 27.89 sec. There were some code size differences, but it is simply more vectorization results bigger code size. There's no difference between OptForSize and non-vectorized.

May 22 2017, 3:33 PM

May 19 2017

twoh updated the diff for D32451: Improve profile-guided heuristics to use estimated trip count..

Addressing Ayal's comments.

May 19 2017, 1:20 PM
twoh added a comment to D31821: Remove redundant copy in recurrences.

@wmi Thank you for your reply. I agree on you that we should consider tied operand group, and as @qcolombet mentioned in the previous comment, if tie operand information is already available before this pass, I'd like to discuss where would be the best place to implement this.

May 19 2017, 1:14 PM

May 17 2017

twoh added a comment to D31821: Remove redundant copy in recurrences.

@qcolombet Actually you're right. Having tied operands, this can be done within SSA. Do you have a suggestion for a better place to implement this? I thought what isRevCopyChain does is most close to what this patch does.

May 17 2017, 1:25 PM

May 16 2017

twoh added a comment to D31821: Remove redundant copy in recurrences.

@MatzeB Thank you for your comments. I'm collection compile time numbers now with spec2006. For our internal benchmark, which takes about ~40min to compile, the compile time difference was negligible.

May 16 2017, 2:42 PM
twoh updated the diff for D31821: Remove redundant copy in recurrences.

Addressing comments from @MatzeB.

May 16 2017, 2:42 PM

May 13 2017

twoh added a comment to D32451: Improve profile-guided heuristics to use estimated trip count..

@Ayal Thanks for the pointer.

May 13 2017, 10:38 PM

May 12 2017

twoh added a comment to D31821: Remove redundant copy in recurrences.

Ping. Can someone please let me know what would be the best way to have this patch reviewed? Thanks!

May 12 2017, 2:36 PM

May 11 2017

twoh added a comment to D32451: Improve profile-guided heuristics to use estimated trip count..

I don't have a strong opinion about treating static trip count and profile-based trip count or not. I treated them separately because existing implementation sets OptForSize to true instead of returning false when it makes a profile-based decision. It would be great if someone can explain the reason behind that.

May 11 2017, 10:05 PM

May 7 2017

twoh added a comment to D32451: Improve profile-guided heuristics to use estimated trip count..

Thanks @Ayal for your comments! If the profile-based trip count checking is moved above the line

May 7 2017, 9:59 PM
twoh updated the diff for D32451: Improve profile-guided heuristics to use estimated trip count..

Addresing comments from Ayal.

May 7 2017, 9:49 PM

May 5 2017

twoh added a comment to D31821: Remove redundant copy in recurrences.

Ping.

May 5 2017, 8:46 AM
twoh added a comment to D32451: Improve profile-guided heuristics to use estimated trip count..

Ping. Thanks!

May 5 2017, 8:46 AM

Apr 28 2017

twoh added a comment to D31821: Remove redundant copy in recurrences.

Ping. If you're too busy to review this patch, could you please recommend someone else? Thanks!

Apr 28 2017, 9:40 AM

Apr 27 2017

twoh added a comment to D32451: Improve profile-guided heuristics to use estimated trip count..

So I evaluated loop vectorizer with https://github.com/malvanos/Video-SIMDBench, which is introduced in this paper: http://ieeexplore.ieee.org/document/7723550/. I first built it without profile data and ran linux perf with a command of

Apr 27 2017, 2:36 PM
twoh updated the diff for D32451: Improve profile-guided heuristics to use estimated trip count..

Use profile-based trip count estimation only when trip count cannot be computed.

Apr 27 2017, 2:35 PM

Apr 24 2017

twoh added a comment to D32451: Improve profile-guided heuristics to use estimated trip count..

@mkuper I can do that, but the issue is that the vectorized part doesn't dominated the execution time of the benchmark (profile is pretty flat across multiple functions), so it may not be suitable to tell the impact of vectorization change. Let me think about a good way to measure the it. Thanks!

Apr 24 2017, 2:17 PM
twoh added a comment to D32451: Improve profile-guided heuristics to use estimated trip count..

@mkuper This helps with our internal benchmarks, but I don't have numbers with public benchmarks. What would be the good benchmarks to evaluate the vectorization? Thanks!

Apr 24 2017, 2:00 PM
twoh created D32451: Improve profile-guided heuristics to use estimated trip count..
Apr 24 2017, 1:50 PM

Apr 21 2017

twoh added a comment to D31821: Remove redundant copy in recurrences.

@MatzeB I was considering other passes as well for the sake of simplicity, but I concluded two-address instruction is the right place to do. The problem doesn't appear while we're in SSA, but becomes an issue when we make an decision about which operand register should be the destination register. I think this is why isProfitableToCommute function is in two-address instruction pass.

Apr 21 2017, 11:39 AM
twoh added a comment to D31821: Remove redundant copy in recurrences.

Ping. Thanks!

Apr 21 2017, 11:16 AM

Apr 14 2017

twoh added a comment to D31821: Remove redundant copy in recurrences.

Friendly ping. Thanks!

Apr 14 2017, 10:43 AM
twoh added a reviewer for D31821: Remove redundant copy in recurrences: MatzeB.
Apr 14 2017, 10:43 AM

Apr 12 2017

twoh added a comment to D31827: Add a flag that enables more aggressive load PRE.

With numbers, it doesn't seem worth to turn it on by default. @dberlin, just curious, do you have any rough estimation about how much compilation time can be saved by implementing rank-order based iteration optimization?

Apr 12 2017, 11:07 PM
twoh added a comment to D31827: Add a flag that enables more aggressive load PRE.

Below are the numbers with spec2006 (The numbers in the last row are overall for SPEC score and total for compile time and binary size):

Apr 12 2017, 10:52 PM

Apr 11 2017

twoh added a comment to D31827: Add a flag that enables more aggressive load PRE.

@dberlin Thank you for your comments. I agree on you that it is not easy to invent a good heuristics around this knob, but still I find this useful for some cases that actually matters. Assuming that this doesn't make the implementation much complex, wouldn't it be better for compiler users to have more options?

Apr 11 2017, 3:58 PM
twoh added a comment to D31827: Add a flag that enables more aggressive load PRE.

No perf numbers with public benchmarks yet, but with our internal benchmark I confirmed that aggressive load PRE removes redundant instructions in a hot loop. I'll report spec2006 numbers later. This patch is for quick unlocking of the feature before making more fundamental improvement.

Apr 11 2017, 8:55 AM

Apr 7 2017

twoh created D31827: Add a flag that enables more aggressive load PRE.
Apr 7 2017, 1:11 PM
twoh updated the diff for D31821: Remove redundant copy in recurrences.

small changes to the test.

Apr 7 2017, 11:22 AM
twoh updated the summary of D31821: Remove redundant copy in recurrences.
Apr 7 2017, 11:19 AM
twoh created D31821: Remove redundant copy in recurrences.
Apr 7 2017, 11:19 AM

Mar 14 2017

twoh committed rL297808: NFC: Reformats comments according to the coding guildelines..
NFC: Reformats comments according to the coding guildelines.
Mar 14 2017, 11:41 PM
twoh committed rL297805: [BranchFolding] Merge debug locations from common tail instead of removing.
[BranchFolding] Merge debug locations from common tail instead of removing
Mar 14 2017, 10:57 PM
twoh closed D30226: [BranchFolding] Merge debug locations from common tail instead of removing by committing rL297805: [BranchFolding] Merge debug locations from common tail instead of removing.
Mar 14 2017, 10:57 PM

Mar 13 2017

twoh added a comment to D30226: [BranchFolding] Merge debug locations from common tail instead of removing.

@aprantl Could you please give your opinion on the inlined questions? Thanks!

Mar 13 2017, 9:10 AM

Mar 7 2017

twoh committed rL297194: Use filename in linemarker when compiling preprocessed source (Revised).
Use filename in linemarker when compiling preprocessed source (Revised)
Mar 7 2017, 12:32 PM
twoh closed D30663: Use filename in linemarker when compiling preprocessed source (Revised) by committing rL297194: Use filename in linemarker when compiling preprocessed source (Revised).
Mar 7 2017, 12:32 PM

Mar 6 2017

twoh added a comment to D30226: [BranchFolding] Merge debug locations from common tail instead of removing.

@aprantl Thanks for the review! I added inline questions asking for your opinion about consistency vs guideline.

Mar 6 2017, 2:29 PM
twoh updated the diff for D30663: Use filename in linemarker when compiling preprocessed source (Revised).

addressing comments from @inglorion

Mar 6 2017, 12:24 PM
twoh added inline comments to D30663: Use filename in linemarker when compiling preprocessed source (Revised).
Mar 6 2017, 11:53 AM
twoh created D30663: Use filename in linemarker when compiling preprocessed source (Revised).
Mar 6 2017, 10:03 AM

Mar 3 2017

twoh abandoned D30591: Introduce the feature "linux" for tests only for linux.

@inglorion That makes a lot of sense. Maybe we don't even need -g, because -S -emit-llvm shows source_filename. I'll run some more experiments with relpath/abspath and debug locations, and submit a revised patch. Thanks for the comment!

Mar 3 2017, 11:25 PM
twoh added a comment to D30226: [BranchFolding] Merge debug locations from common tail instead of removing.

Friendly Ping. Thanks!

Mar 3 2017, 3:24 PM
twoh created D30591: Introduce the feature "linux" for tests only for linux.
Mar 3 2017, 1:37 PM

Mar 2 2017

twoh committed rL296825: [DAGCombiner] Fix DebugLoc propagation when folding !(x cc y) -> (x !cc y).
[DAGCombiner] Fix DebugLoc propagation when folding !(x cc y) -> (x !cc y)
Mar 2 2017, 2:10 PM
twoh closed D29813: [DAGCombiner] Fix DebugLoc propagation when folding !(x cc y) -> (x !cc y) by committing rL296825: [DAGCombiner] Fix DebugLoc propagation when folding !(x cc y) -> (x !cc y).
Mar 2 2017, 2:10 PM
twoh updated the diff for D29813: [DAGCombiner] Fix DebugLoc propagation when folding !(x cc y) -> (x !cc y).

Rebase. Fix test/CodeGen/X86/and-sink.ll that added 9days ago.

Mar 2 2017, 2:02 PM

Mar 1 2017

twoh added a comment to D29813: [DAGCombiner] Fix DebugLoc propagation when folding !(x cc y) -> (x !cc y).

Ping. thanks!

Mar 1 2017, 8:30 PM

Feb 27 2017

twoh committed rL296371: [TailDuplicator] Maintain DebugLoc for branch instructions.
[TailDuplicator] Maintain DebugLoc for branch instructions
Feb 27 2017, 11:42 AM
twoh closed D30026: [TailDuplicator] Maintain DebugLoc for branch instructions by committing rL296371: [TailDuplicator] Maintain DebugLoc for branch instructions.
Feb 27 2017, 11:41 AM

Feb 24 2017

twoh added a comment to D30226: [BranchFolding] Merge debug locations from common tail instead of removing.

Yes I observed the case from refresh_potential function in spec2006 429.mcf. The original C code snippet is below:

Feb 24 2017, 10:00 AM

Feb 23 2017

twoh added a comment to D29833: Improve the API of DILocation::getMergedLocation().

@aprantl I like the idea of having a convenience function. I'm afraid I still don't understand @dblaikie on getMergedLocation is better than the proposed convenience function in preventing buggy profile.

Feb 23 2017, 9:13 AM

Feb 22 2017

twoh added a comment to D30026: [TailDuplicator] Maintain DebugLoc for branch instructions.

Friendly Ping. Thanks!

Feb 22 2017, 10:21 PM
twoh added inline comments to D29813: [DAGCombiner] Fix DebugLoc propagation when folding !(x cc y) -> (x !cc y).
Feb 22 2017, 11:03 AM
twoh added inline comments to D29813: [DAGCombiner] Fix DebugLoc propagation when folding !(x cc y) -> (x !cc y).
Feb 22 2017, 8:42 AM

Feb 21 2017

twoh added a comment to D29833: Improve the API of DILocation::getMergedLocation().
In D29833#682595, @twoh wrote:

I don't have strong opinion about scope, because anyway we won't be able to have precise scope. Taking DISubprogram of the destination instruction might be a conservative option.

Mmmm... Does scope imply source file? LTO can inline across CUs, then merging an instruction from CU#1 into a scope from CU#2 would make the line number meaningless.

Feb 21 2017, 3:38 PM
twoh committed rL295779: Fix for pr31836 - pp_nonportable_path on absolute paths: broken delimiters.
Fix for pr31836 - pp_nonportable_path on absolute paths: broken delimiters
Feb 21 2017, 2:42 PM
twoh closed D30000: Fix for pr31836 - pp_nonportable_path on absolute paths: broken delimiters by committing rL295779: Fix for pr31836 - pp_nonportable_path on absolute paths: broken delimiters.
Feb 21 2017, 2:42 PM
twoh created D30226: [BranchFolding] Merge debug locations from common tail instead of removing.
Feb 21 2017, 2:39 PM
twoh updated the diff for D29813: [DAGCombiner] Fix DebugLoc propagation when folding !(x cc y) -> (x !cc y).

Addressing @aprantl's comments. Thanks!

Feb 21 2017, 1:15 PM
twoh added a comment to D30000: Fix for pr31836 - pp_nonportable_path on absolute paths: broken delimiters.

@eric_niebler Do you want any more experiments with this patch? I think Windows machines not printing warnings/fixits for absolute path is a separate issue with this.

Feb 21 2017, 11:43 AM
twoh added a comment to D29833: Improve the API of DILocation::getMergedLocation().

Personally I prefer the existing API, because it doesn't allow the transient state before merging. For example, in lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp in this patch, DebugLoc of NewSI is not actually valid after line 1430. Existing API doesn't have this problem.

Feb 21 2017, 11:26 AM
twoh added inline comments to D29813: [DAGCombiner] Fix DebugLoc propagation when folding !(x cc y) -> (x !cc y).
Feb 21 2017, 10:53 AM

Feb 20 2017

twoh updated the diff for D29813: [DAGCombiner] Fix DebugLoc propagation when folding !(x cc y) -> (x !cc y).

Now avx512-fsel.ll is auto-generated from the script.

Feb 20 2017, 9:07 PM
twoh added inline comments to D29813: [DAGCombiner] Fix DebugLoc propagation when folding !(x cc y) -> (x !cc y).
Feb 20 2017, 9:05 PM
twoh added reviewers for D29813: [DAGCombiner] Fix DebugLoc propagation when folding !(x cc y) -> (x !cc y): andreadb, craig.topper.
Feb 20 2017, 7:59 PM
twoh updated the diff for D29813: [DAGCombiner] Fix DebugLoc propagation when folding !(x cc y) -> (x !cc y).

Some tests need to be updated as new DebugLoc changes instruction ordering.

Feb 20 2017, 7:55 PM
twoh committed rL295684: [BranchFolding] Update debug location along with the update of branch….
[BranchFolding] Update debug location along with the update of branch…
Feb 20 2017, 4:24 PM
twoh closed D29902: [BranchFolding] Update debug location along with the update of branch instruction. by committing rL295684: [BranchFolding] Update debug location along with the update of branch….
Feb 20 2017, 4:24 PM
twoh added a comment to D29902: [BranchFolding] Update debug location along with the update of branch instruction..

Ping. Thanks!

Feb 20 2017, 8:08 AM

Feb 16 2017

twoh added inline comments to D30000: Fix for pr31836 - pp_nonportable_path on absolute paths: broken delimiters.
Feb 16 2017, 11:16 AM

Feb 15 2017

twoh added a comment to D30000: Fix for pr31836 - pp_nonportable_path on absolute paths: broken delimiters.

@eric_niebler I just tried it on Windows machine, and it just succeeded with no warnings/fix-it. Is that expected?

Feb 15 2017, 11:57 PM
twoh updated the diff for D30026: [TailDuplicator] Maintain DebugLoc for branch instructions.

nit: auto

Feb 15 2017, 11:45 PM
twoh created D30026: [TailDuplicator] Maintain DebugLoc for branch instructions.
Feb 15 2017, 11:38 PM
twoh updated the diff for D29902: [BranchFolding] Update debug location along with the update of branch instruction..

Use MachineBasicBlock::findBranchDebugLoc instead of getBranchDebugLoc

Feb 15 2017, 4:10 PM
twoh updated the diff for D30000: Fix for pr31836 - pp_nonportable_path on absolute paths: broken delimiters.

Make it explicit that the test doesn't support windows. @eric_niebler, my original intention was avoiding use of platform-dependent path separator, but now made it explicit that the test is not for windows, it should be okay to use '/'. Thanks for the comments!

Feb 15 2017, 11:45 AM
twoh created D30000: Fix for pr31836 - pp_nonportable_path on absolute paths: broken delimiters.
Feb 15 2017, 10:57 AM

Feb 14 2017

twoh committed rL295106: [BasicBlockUtils] Use getFirstNonPHIOrDbg to set debugloc for instructions….
[BasicBlockUtils] Use getFirstNonPHIOrDbg to set debugloc for instructions…
Feb 14 2017, 1:22 PM
twoh closed D29867: [BasicBlockUtils] Use getFirstNonPHIOrDbg to set debugloc for instructions created in SplitBlockPredecessors by committing rL295106: [BasicBlockUtils] Use getFirstNonPHIOrDbg to set debugloc for instructions….
Feb 14 2017, 1:22 PM
twoh added a comment to D29867: [BasicBlockUtils] Use getFirstNonPHIOrDbg to set debugloc for instructions created in SplitBlockPredecessors.

Thanks for the review!

Feb 14 2017, 1:20 PM
twoh added inline comments to D29833: Improve the API of DILocation::getMergedLocation().
Feb 14 2017, 10:44 AM
twoh committed rL295075: Do not apply redundant LastCallToStaticBonus.
Do not apply redundant LastCallToStaticBonus
Feb 14 2017, 9:41 AM
twoh closed D29169: Do not apply redundant LastCallToStaticBonus by committing rL295075: Do not apply redundant LastCallToStaticBonus.
Feb 14 2017, 9:41 AM