Page MenuHomePhabricator

fhahn (Florian Hahn)
User

Projects

User does not belong to any projects.

User Details

User Since
Aug 18 2016, 4:39 AM (143 w, 5 d)

Recent Activity

Today

fhahn committed rGf9b28e53c7d1: [ScheduleDAGInstrs] Compute topological ordering on demand. (authored by fhahn).
[ScheduleDAGInstrs] Compute topological ordering on demand.
Tue, May 21, 6:03 AM
fhahn committed rL361253: [ScheduleDAGInstrs] Compute topological ordering on demand..
[ScheduleDAGInstrs] Compute topological ordering on demand.
Tue, May 21, 6:02 AM
fhahn closed D60839: [ScheduleDAGInstrs] Compute topological ordering on demand..
Tue, May 21, 6:02 AM · Restricted Project
fhahn added a comment to D60839: [ScheduleDAGInstrs] Compute topological ordering on demand..

Thanks Eli!

Tue, May 21, 6:02 AM · Restricted Project
fhahn committed rG4a8835c655e8: [AArch64] Skip mask checks for masks with an odd number of elements. (authored by fhahn).
[AArch64] Skip mask checks for masks with an odd number of elements.
Tue, May 21, 3:04 AM
fhahn committed rL361235: [AArch64] Skip mask checks for masks with an odd number of elements..
[AArch64] Skip mask checks for masks with an odd number of elements.
Tue, May 21, 3:03 AM
fhahn closed D60690: [AArch64] Skip mask checks for masks with an odd number of elements..
Tue, May 21, 3:03 AM · Restricted Project

Yesterday

fhahn added inline comments to D60690: [AArch64] Skip mask checks for masks with an odd number of elements..
Mon, May 20, 2:25 PM · Restricted Project
fhahn added a comment to D60690: [AArch64] Skip mask checks for masks with an odd number of elements..

Ping. @Gerolf, are you happy with the update?

Mon, May 20, 2:23 PM · Restricted Project

Sun, May 19

fhahn added a comment to D61409: [SimplifyCFG] Added condition assumption for unreachable blocks.

You can use something like the command below to get a comparison between 2 sets of runs. Also, doing single runs only will probably result in quite noisy results.

Sun, May 19, 2:35 PM · Restricted Project
fhahn added a comment to D60839: [ScheduleDAGInstrs] Compute topological ordering on demand..

I've verified this does not change CodeGen for the test-suite + various external suites. Slightly positive impact on compile-time (-0.1 % geomean speedup for test-suite + SPEC & co, with -O1 on X86)

Sun, May 19, 9:01 AM · Restricted Project
fhahn added inline comments to D60839: [ScheduleDAGInstrs] Compute topological ordering on demand..
Sun, May 19, 9:00 AM · Restricted Project
fhahn updated the diff for D60839: [ScheduleDAGInstrs] Compute topological ordering on demand..

Remove unused MarkDirty.

Sun, May 19, 9:00 AM · Restricted Project

Wed, May 15

fhahn added a comment to D60266: [LoopUnroll] Rotate loop, when optimizing for size and can fully unroll a loop..

Why do we need to rotate the loop before unrolling? llvm::UnrollLoop currently refuses to unroll loops where the latch is an unconditional branch, but that isn't a fundamental limitation, as far as I can tell. We already support unrolling loops where the latch is not the exit branch; allowing loops where the latch doesn't exit at all is a minor extension. Granted, it might be more efficient to explicitly rotate the loop before unrolling, so we don't clone quite so much code.

Wed, May 15, 1:51 PM · Restricted Project
fhahn created D61962: [WIP][LoopUnroll] Add support for loops with exiting headers and uncond latches..
Wed, May 15, 1:49 PM · Restricted Project
fhahn added a comment to D60619: New pass to produce more easily-read IR..

As a way to test this on a larger number of cases, could you add it to the default pass pipeline (maybe at multiple points) and bootstrap clang with it (or build the test-suite & co)?

Wed, May 15, 11:59 AM · Restricted Project
fhahn added inline comments to D60619: New pass to produce more easily-read IR..
Wed, May 15, 11:56 AM · Restricted Project
fhahn accepted D61882: [ARM] Don't use the Machine Scheduler for cortex-m at minsize.

LGTM, it would be great to have Cortex-M4 use the MachineScheduler in the non -Oz cases. Lets keep track of the issues (I expect more to surface) and go from there.

Wed, May 15, 3:38 AM · Restricted Project
fhahn committed rG9e778e6c730a: [LV] Move getScalarizationOverhead and vector call cost computations to CM. (authored by fhahn).
[LV] Move getScalarizationOverhead and vector call cost computations to CM.
Wed, May 15, 3:05 AM
fhahn committed rL360758: [LV] Move getScalarizationOverhead and vector call cost computations to CM..
[LV] Move getScalarizationOverhead and vector call cost computations to CM.
Wed, May 15, 3:05 AM
fhahn closed D61638: [LV] Move getScalarizationOverhead and vector call cost computations to CM. (NFC).
Wed, May 15, 3:05 AM · Restricted Project

Tue, May 14

fhahn added inline comments to D61920: [JumpThreading] A bug fix for stale loop info after unfold select.
Tue, May 14, 3:28 PM · Restricted Project
fhahn accepted D61770: [OpenMP][AArch64] Fix compile with LLVM trunk..

The change in the AArch64 assembly looks good to me.

Tue, May 14, 2:33 PM · Restricted Project
fhahn added a comment to D59065: Add ptrmask intrinsic.

@sanjoy it would be great if you could have a quick look and check if the new approach makes sense to you.

Tue, May 14, 2:21 PM · Restricted Project
fhahn updated the diff for D59065: Add ptrmask intrinsic.

Thanks for the excellent suggestions! I've updated the text to use the GEP based description, as the GEP documentation spells out the behavior with respect to invalid pointers in detail. Also incorporated Hal's comments.

Tue, May 14, 12:52 PM · Restricted Project
fhahn added a comment to D61904: [LICM] Allow AliasSetMap to contain top-level loops..

Thanks!

Tue, May 14, 12:48 PM · Restricted Project
fhahn committed rG53c9d585b5b1: [LICM] Allow AliasSetMap to contain top-level loops. (authored by fhahn).
[LICM] Allow AliasSetMap to contain top-level loops.
Tue, May 14, 12:41 PM
fhahn committed rL360704: [LICM] Allow AliasSetMap to contain top-level loops..
[LICM] Allow AliasSetMap to contain top-level loops.
Tue, May 14, 12:39 PM
fhahn closed D61904: [LICM] Allow AliasSetMap to contain top-level loops..
Tue, May 14, 12:39 PM · Restricted Project
fhahn added a reviewer for D59995: [LV] Exclude loop-invariant inputs from scalar cost computation.: Ayal.

Ping. Ayal, does this address your comments appropriately?

Tue, May 14, 12:32 PM · Restricted Project
fhahn added inline comments to D59918: [Attributor] Pass infrastructure and fixpoint framework.
Tue, May 14, 9:45 AM · Restricted Project
fhahn added a comment to D59065: Add ptrmask intrinsic.
  • What happened to the poison result sentence. I think we need that to ensure the "based on the same ..." part. Otherwise, we can butcher the pointer with the mask and end up with a "valid pointer" into a different object. I might be wrong here, just want to make sure we have thought of it.

My understanding from Hal's comments was that the behavior should be clear from the pre-existing semantics, but I might have missed something?

I don't see why but I might be wrong (@hfinkel). What I thought is detailed below.

The case where we fail to clear bits that must be zero should be clearly handled by the existing semantics.

Sure.

The case where we might clear meaningful bits

I'm not sure about this. At least I think this is not what you intended to do initially but maybe this is OK for you.

Tue, May 14, 9:32 AM · Restricted Project
fhahn updated the diff for D59065: Add ptrmask intrinsic.

What happens if I use an integer type bigger than the native pointer width? Or smaller?

Initially I wanted to restrict the mask type to the native pointer width, but couldn't find how to do that in Intrinsics.td. On second thought, this might be more restrictive than necessary. I've adjusted the wording to make it clear that we zero-extend or truncate the mask if the bit width does not match the pointer size of the target. What do you think?

The wording makes it clear now (what is supposed to happen at least).

Tue, May 14, 8:38 AM · Restricted Project
fhahn created D61904: [LICM] Allow AliasSetMap to contain top-level loops..
Tue, May 14, 8:32 AM · Restricted Project
fhahn added a comment to D61882: [ARM] Don't use the Machine Scheduler for cortex-m at minsize.

This looks like an interesting case! I think there are 2 things going wrong (see input to the scheduler below)

Tue, May 14, 4:19 AM · Restricted Project

Mon, May 13

fhahn added a comment to D61576: [LoopInterchange] Fix handling of LCSSA nodes defined in headers and latches..

ping

Mon, May 13, 10:52 AM · Restricted Project
fhahn updated the diff for D61638: [LV] Move getScalarizationOverhead and vector call cost computations to CM. (NFC).

Ran clang-format.

Mon, May 13, 10:16 AM · Restricted Project
fhahn added a comment to D60690: [AArch64] Skip mask checks for masks with an odd number of elements..

ping

Mon, May 13, 9:59 AM · Restricted Project
fhahn accepted D61258: AArch64: support binutils-like things on arm64_32..

LGTM. I've also added a few other reviewers, in case they have additional thoughts.

Mon, May 13, 3:41 AM · Restricted Project

Fri, May 10

fhahn added inline comments to D59065: Add ptrmask intrinsic.
Fri, May 10, 6:33 AM · Restricted Project
fhahn updated the diff for D59065: Add ptrmask intrinsic.

What happens if I use an integer type bigger than the native pointer width? Or smaller?

Fri, May 10, 6:27 AM · Restricted Project
fhahn updated the diff for D59065: Add ptrmask intrinsic.

Thanks for your feedback @jdoerfert & @hfinkel! I've adjusted the description as suggested. I dropped the information about 'relevant' bits and made the description more precise.

Fri, May 10, 6:16 AM · Restricted Project

Thu, May 9

fhahn updated the diff for D59065: Add ptrmask intrinsic.

Change behavior from undefined to poison when passing masks that zero out relevant bits of a pointer, thanks @aqjune

Thu, May 9, 10:39 AM · Restricted Project
fhahn added a comment to D61511: [DAGCombiner] Limit number of nodes explored as store candidates..

thanks!

Thu, May 9, 10:23 AM · Restricted Project
fhahn committed rGbe10bc71f9af: [DAGCombiner] Limit number of nodes explored as store candidates. (authored by fhahn).
[DAGCombiner] Limit number of nodes explored as store candidates.
Thu, May 9, 10:04 AM
fhahn committed rL360357: [DAGCombiner] Limit number of nodes explored as store candidates..
[DAGCombiner] Limit number of nodes explored as store candidates.
Thu, May 9, 10:03 AM
fhahn closed D61511: [DAGCombiner] Limit number of nodes explored as store candidates..
Thu, May 9, 10:03 AM · Restricted Project

Wed, May 8

fhahn added inline comments to D59065: Add ptrmask intrinsic.
Wed, May 8, 9:12 AM · Restricted Project
fhahn added a parent revision for D61669: [ValueTracking] Look through ptrmask intrinsics during getUnderlyingObject.: D59065: Add ptrmask intrinsic.
Wed, May 8, 9:02 AM · Restricted Project
fhahn added a comment to D61669: [ValueTracking] Look through ptrmask intrinsics during getUnderlyingObject..

I'd look into this but I have trouble locating the "ptrmask" intrinsic in LLVM/Clang/LangRef. Could you help me out?

Wed, May 8, 9:02 AM · Restricted Project
fhahn added a child revision for D59065: Add ptrmask intrinsic: D61669: [ValueTracking] Look through ptrmask intrinsics during getUnderlyingObject..
Wed, May 8, 9:02 AM · Restricted Project
fhahn added a comment to D60266: [LoopUnroll] Rotate loop, when optimizing for size and can fully unroll a loop..

On a related note, independent of what we do in unrolling, it would probably be worthwhile to teach loop rotation to rotate loops where the cloned header would fold to zero instructions.

Wed, May 8, 8:37 AM · Restricted Project
fhahn added a parent revision for D61683: [LoopRotate] Use UnrolledInstAnalyzer to account for simplifications in hoisted header. (WIP): D61682: [LoopUnroll] Move visitBB and cost computation to UnrolledInstAnalyzer, for reuse (WIP).
Wed, May 8, 8:33 AM · Restricted Project
fhahn added a child revision for D61682: [LoopUnroll] Move visitBB and cost computation to UnrolledInstAnalyzer, for reuse (WIP): D61683: [LoopRotate] Use UnrolledInstAnalyzer to account for simplifications in hoisted header. (WIP).
Wed, May 8, 8:33 AM · Restricted Project
fhahn created D61682: [LoopUnroll] Move visitBB and cost computation to UnrolledInstAnalyzer, for reuse (WIP).
Wed, May 8, 8:32 AM · Restricted Project
fhahn created D61683: [LoopRotate] Use UnrolledInstAnalyzer to account for simplifications in hoisted header. (WIP).
Wed, May 8, 8:32 AM · Restricted Project
fhahn added inline comments to D61258: AArch64: support binutils-like things on arm64_32..
Wed, May 8, 7:05 AM · Restricted Project
fhahn updated the diff for D61669: [ValueTracking] Look through ptrmask intrinsics during getUnderlyingObject..

Add missing attributes to @init argument.

Wed, May 8, 2:53 AM · Restricted Project
fhahn retitled D59065: Add ptrmask intrinsic from [BasicAA] Simplify inttoptr(and(ptrtoint(X), C)) to X, if C preserves all significant bits. to Add ptrmask intrinsic.
Wed, May 8, 2:41 AM · Restricted Project
fhahn added a reviewer for D61669: [ValueTracking] Look through ptrmask intrinsics during getUnderlyingObject.: aqjune.
Wed, May 8, 2:41 AM · Restricted Project
fhahn created D61669: [ValueTracking] Look through ptrmask intrinsics during getUnderlyingObject..
Wed, May 8, 2:39 AM · Restricted Project
fhahn updated the diff for D59065: Add ptrmask intrinsic.

I went with adding a pointer-mask intrinsic, as suggested.

Wed, May 8, 2:35 AM · Restricted Project
fhahn committed rG3c696b3e7c21: [SCCP] Fix crash when trying to constant-fold terminators multiple times. (authored by fhahn).
[SCCP] Fix crash when trying to constant-fold terminators multiple times.
Wed, May 8, 2:08 AM
fhahn committed rL360232: [SCCP] Fix crash when trying to constant-fold terminators multiple times..
[SCCP] Fix crash when trying to constant-fold terminators multiple times.
Wed, May 8, 2:07 AM
fhahn closed D61300: [SCCP] Fix crash when trying to constant-fold terminators multiple times..
Wed, May 8, 2:07 AM · Restricted Project

Tue, May 7

fhahn added a comment to D60582: [IPSCCP] Add general integer range support..

ping

Tue, May 7, 1:04 PM · Restricted Project
fhahn added a comment to D59918: [Attributor] Pass infrastructure and fixpoint framework.

And finally, what can we do to check correctness? As the current attribute definitions are a bit fuzzy, it seems this could lead to problems down the road, in case we infer invalid attributes, which only get used (and cause problems) a bit later. Assuming we have a clear definition of a set of attributes, how hard would it be to verify the attributes in a module (maybe we are doing it already)?

I'm unsure I understand this question. If a fixpoint is reached, as reported by the AbstractAttribute::updateImpl methods, we should have derived "correct" information. Obviously, there might be a bug in the logic of the AbstractAttribute::updateImpl function, or the associated materialization, the dependence tracking in the attributor, etc. but I do not have a better solution than more (stress) tests (see D59903). I open a few bugs while I was developing attribute deductions in this framework and compare the results to the existing one. We current derive different things we should not as described in PR41336 and PR41328. More attributes will also trigger, or better expose, "bugs" down the line (see D59917) and we will probably have to manually look into the root cause as they pop up, was the attribute not correct or later reasoning based on it.

Tue, May 7, 12:56 PM · Restricted Project
fhahn updated the diff for D61300: [SCCP] Fix crash when trying to constant-fold terminators multiple times..

Use InstBB instead of I->getParent(). Thanks, I was looking at the wrong line :)

Tue, May 7, 12:30 PM · Restricted Project
fhahn added a comment to D61300: [SCCP] Fix crash when trying to constant-fold terminators multiple times..

Do we need to "repair" the folded branch or could we also determine earlier that the current block is dead (it will have an unconditional branch to a dead block, right?) and avoid folding?
It might make sense to fold due to the DT update though.

Tue, May 7, 10:04 AM · Restricted Project
fhahn committed rGa9d6c32eafc6: [DAGCombiner] Avoid creating large tokenfactors in visitTokenFactor (authored by fhahn).
[DAGCombiner] Avoid creating large tokenfactors in visitTokenFactor
Tue, May 7, 9:46 AM
fhahn committed rL360171: [DAGCombiner] Avoid creating large tokenfactors in visitTokenFactor.
[DAGCombiner] Avoid creating large tokenfactors in visitTokenFactor
Tue, May 7, 9:45 AM
fhahn closed D61397: [DAGCombiner] Avoid creating large tokenfactors in visitTokenFactor.
Tue, May 7, 9:45 AM · Restricted Project
fhahn added a comment to D61300: [SCCP] Fix crash when trying to constant-fold terminators multiple times..

ping

Tue, May 7, 8:59 AM · Restricted Project
fhahn added a parent revision for D59995: [LV] Exclude loop-invariant inputs from scalar cost computation.: D61638: [LV] Move getScalarizationOverhead and vector call cost computations to CM. (NFC).
Tue, May 7, 4:13 AM · Restricted Project
fhahn updated the diff for D59995: [LV] Exclude loop-invariant inputs from scalar cost computation..

Filter operands for getIntrinsicCallCost as well.

Tue, May 7, 4:13 AM · Restricted Project
fhahn added a child revision for D61638: [LV] Move getScalarizationOverhead and vector call cost computations to CM. (NFC): D59995: [LV] Exclude loop-invariant inputs from scalar cost computation..
Tue, May 7, 4:13 AM · Restricted Project
fhahn added a comment to D59995: [LV] Exclude loop-invariant inputs from scalar cost computation..

The culprit here is the assumption made by TTI.getOperandsScalarizationOverhead(Operands, VF) that all its Operands will be vectorized according to VF, and would thus require extraction to feed a scalarized/replicated user. But any such Operand might not get vectorized, and possibly must not get vectorized, e.g., due to an incompatible type as demonstrated by PR41294 and the testcase. In some cases an Operand will obviously not be vectorized, such as if it's loop-invariant or live-in. More generally, LV uses the following:

auto needsExtract = [&](Instruction *I) -> bool {
  return TheLoop->contains(I) && !isScalarAfterVectorization(I, VF);
};

which would require passing not only TheLoop into getScalarizationOverhead(I, VF, TTI) but also the CM --- better turn this static function into a method of CM?

Tue, May 7, 4:08 AM · Restricted Project
fhahn updated the diff for D59995: [LV] Exclude loop-invariant inputs from scalar cost computation..

Split out moving various functions to LoopVectorizationCostModel to D61638.

Tue, May 7, 3:54 AM · Restricted Project
fhahn created D61638: [LV] Move getScalarizationOverhead and vector call cost computations to CM. (NFC).
Tue, May 7, 3:51 AM · Restricted Project

Mon, May 6

fhahn added inline comments to D61511: [DAGCombiner] Limit number of nodes explored as store candidates..
Mon, May 6, 12:38 PM · Restricted Project
fhahn updated the diff for D61511: [DAGCombiner] Limit number of nodes explored as store candidates..

Updated increments.

Mon, May 6, 12:37 PM · Restricted Project
fhahn added inline comments to D61576: [LoopInterchange] Fix handling of LCSSA nodes defined in headers and latches..
Mon, May 6, 5:00 AM · Restricted Project
fhahn updated the diff for D61576: [LoopInterchange] Fix handling of LCSSA nodes defined in headers and latches..

Remove restriction for single use PHIs. We can replace such PHIs with their incoming
value, iff all uses are in the loop exit block or in the outer loop header, if
the incoming value is defined in the inner loop header (reduction phis).

Mon, May 6, 4:54 AM · Restricted Project

Sun, May 5

fhahn updated the diff for D61576: [LoopInterchange] Fix handling of LCSSA nodes defined in headers and latches..

Add test case without loop exit, clang-format-diff

Sun, May 5, 2:54 PM · Restricted Project
fhahn created D61576: [LoopInterchange] Fix handling of LCSSA nodes defined in headers and latches..
Sun, May 5, 2:38 PM · Restricted Project

Fri, May 3

fhahn added a comment to D60133: [DAGCombiner] Improve detection of unmergable stores, based on type size (WIP).

It's not sufficient to check if you can merge two stores into a valid node; there are backends where you need 4 or more to get a legal merged store.

If you look at target-specific implementations of CanMergeStoresTo it essentially serves as a context-specific find maximum store which is what we need here. If you massage that interface a bit you can fold most of this check in there.

Fri, May 3, 8:49 AM · Restricted Project
fhahn added inline comments to D61397: [DAGCombiner] Avoid creating large tokenfactors in visitTokenFactor.
Fri, May 3, 8:10 AM · Restricted Project
fhahn created D61511: [DAGCombiner] Limit number of nodes explored as store candidates..
Fri, May 3, 8:09 AM · Restricted Project
fhahn added inline comments to D61397: [DAGCombiner] Avoid creating large tokenfactors in visitTokenFactor.
Fri, May 3, 8:06 AM · Restricted Project
fhahn updated the diff for D61397: [DAGCombiner] Avoid creating large tokenfactors in visitTokenFactor.

Limit this patch to visitTokenFactor, limiting the number of operands to inline to 2048 nodes.

Fri, May 3, 7:57 AM · Restricted Project
fhahn added a comment to D60690: [AArch64] Skip mask checks for masks with an odd number of elements..

Could you separate size one checks are OK vs. checks that require even powers of 2? That smells like an inconsistency worth calling out explicitly.

Fri, May 3, 3:47 AM · Restricted Project
fhahn updated the diff for D60690: [AArch64] Skip mask checks for masks with an odd number of elements..

Push even element checks to the impacted functions.

Fri, May 3, 3:45 AM · Restricted Project

Thu, May 2

fhahn added a comment to D61397: [DAGCombiner] Avoid creating large tokenfactors in visitTokenFactor.

Notes inline, but I think the majority of the compile time improvements come from keepign TokenFactor operand counts bounded. This should be changed to do reflect that.

Thu, May 2, 2:19 PM · Restricted Project
fhahn updated the diff for D61397: [DAGCombiner] Avoid creating large tokenfactors in visitTokenFactor.

Relax limits to 1000 nodes to explore. Further experimenting showed that those
bigger limits are still sufficient to ensure limiting quadratic compile time.

Thu, May 2, 5:43 AM · Restricted Project

Wed, May 1

fhahn abandoned D59108: [Docs] Add top level CONTRIBUTING.md..

As per feedback here & on llvm-commits it seems like this is not the right thing at the moment. Thanks for taking a look!

Wed, May 1, 1:27 PM · Restricted Project
fhahn added a comment to D61346: [CMake] Do not use libtool on Apple platforms when building LLVM with (full) LTO..

Thanks for taking a look so quickly! I'll look into how to best set-up DYLD_LIBRARY_PATH in that case. I am after a more lightweight solution than CLANG_ENABLE_BOOTSTRAP, i.e. creating/configuring the separate build stages manually.

Wed, May 1, 1:26 PM · Restricted Project
fhahn created D61397: [DAGCombiner] Avoid creating large tokenfactors in visitTokenFactor.
Wed, May 1, 1:22 PM · Restricted Project

Tue, Apr 30

fhahn created D61346: [CMake] Do not use libtool on Apple platforms when building LLVM with (full) LTO..
Tue, Apr 30, 3:00 PM · Restricted Project
fhahn added inline comments to D60582: [IPSCCP] Add general integer range support..
Tue, Apr 30, 8:43 AM · Restricted Project
fhahn abandoned D51431: [IPSCCP] Add lattice value for != constant .

I plan to add this on top of the integer range support patches, abandoning this for now

Tue, Apr 30, 8:40 AM