hfinkel (Hal Finkel)
User

Projects

User does not belong to any projects.

User Details

User Since
Nov 16 2012, 6:02 PM (295 w, 4 d)

Recent Activity

Yesterday

hfinkel accepted D49337: Skip debuginfo intrinsic in markLiveBlocks..

LGTM

Mon, Jul 16, 4:01 PM

Sun, Jul 15

hfinkel added inline comments to D49337: Skip debuginfo intrinsic in markLiveBlocks..
Sun, Jul 15, 7:37 PM

Sat, Jul 14

hfinkel added a comment to D49337: Skip debuginfo intrinsic in markLiveBlocks..

Hi @hfinkel.

Thank you for taking a look at this code. I agree we should definitely "else" the StoreInst check and I have observed improvements in Xcode Instrument for doing so.

And I should also move the DbgInfoIntrinsic check inside the IntrinsicInst check by comparing IntrinsicID.

The problem with moving the IntrinsicInst checks inside the dyn_cast<CallInst> is that every intrinsic call will need to go through a dyn_cast<CallInst> check, a getCalledValue() and then a dyn_cast<Function>(CalledValue) check. If we have a lot of intrinsic which we do when compiling with debuginfo, we could end up being overall slower than keeping the IntrinsicInst check.

When we compile without debuginfo, moving the intrinsic check inside the CallInst check will not necessarily bring us performance, as now every CallInst needs to checks for the existing cases in the if (dyn_cast<CallInst>)) and the Intrinsics (which require dyn_cast<Function>(CalledValue)). 4

Sat, Jul 14, 7:52 PM
hfinkel added a comment to D49337: Skip debuginfo intrinsic in markLiveBlocks..

There's nothing special about debug intrinsics here except that there are a lot of them. The problem, as far as I can tell, is that we're repeatedly using dyn_cast on each instruction and doing multiple redundant tests. Adding yet another redundant test will help having a lot of debug intrinsics makes things incrementally more expensive for all other kinds of intrinsics. Thus, this doesn't seem like the right way to fix this. Instead of testing for IntrinsicInst, and then CallInst (which is always true whenever the IntrinsicInst is true), and then we always test for StoreInst (but we don't use an 'else' so we always do this test even when the IntrinsicInst/CallInst is true (which will include debug intrinsics)).

Sat, Jul 14, 6:37 AM

Fri, Jul 13

hfinkel accepted D47963: [LangRef] nnan and ninf produce poison..

LGTM

Fri, Jul 13, 7:57 PM
hfinkel added inline comments to D49144: [FunctionAttrs] Infer the speculatable attribute.
Fri, Jul 13, 7:52 PM
hfinkel added a comment to D44564: [BasicAA] Use PhiValuesAnalysis if available when handling phi alias.

We have a check in aliasSameBasePointerGEPs to avoid PR32314 (which I'll glibly summarize as: we need to be careful about looking through PHIs so that we don't, without realizing it, end up comparing values from different loop iterations). The fact that this looks back through an unknown number of PHIs makes me concerned that we'll get into a similar situation.

Fri, Jul 13, 2:17 PM
hfinkel accepted D48887: [TableGen] Suppress type validation when parsing pattern fragments.

LGTM

Fri, Jul 13, 9:32 AM
hfinkel updated the diff for D49144: [FunctionAttrs] Infer the speculatable attribute.

Misc cleanup and try to validate ret attrs instead of disqualifying them all.

Fri, Jul 13, 8:10 AM

Thu, Jul 12

hfinkel updated the diff for D49144: [FunctionAttrs] Infer the speculatable attribute.

Updated to not infer speculatable if there are instructions with non-debug metadata, loads with alignment requirements that we can't independently prove, and for functions with value-constraining return-value attributes.

Thu, Jul 12, 7:35 PM
hfinkel accepted D48239: [LangRef] Clarify meaning of "dereferencable" attribute/metadata..

PointerMayBeCaptured doesn't do the right thing. The primary issue is that it only walks the uses of the argument; other pointers could alias the argument if it isn't also noalias. (The other issue is that freeing a pointer doesn't count as capturing it.)

Ah, indeed. You're certainly correct. And free is marked as nocapture (which we'd likely prefer to keep, although free almost certainly does capture the pointer value in some physical sense, it does not do so in a way that will be visible to the rest of the program (except that it might be returned by some later malloc call)).

We'd need to model this directly for it to be useful (as an attribute that means that the function doesn't call free, or doesn't free a particular argument, or similar). I'll send an RFC about that.

Thu, Jul 12, 4:16 PM
hfinkel added a comment to D48545: [RFC v2] "Alternative" matches for TableGen DAG patterns.

LGTM (there are a lot of changes here, but given that it produces no changes to existing matching tables, that seems like pretty good test coverage).

Thanks for the review!

Note that in order to avoid spurious changes to other targets, I've for now still restricted the new inference code for the isBitcast and hasChain flags to patterns that originated from Instruction nodes. In general, it would probably be better to remove this final difference between instruction patterns and "regular" patterns, but that should IMO only be done after analysing the specific impacts on existing targets, so this should be done as a follow-on patch.

I definitely encourage resolving these in follow up. I think it will be better if this works uniformly.

OK, I'll work on this as a follow-up.

  1. Theoretically, a PatFrags instance can now have an empty Fragments list -- this would simply never match anything. Should this be explicitly forbidden? On the other hand, we might also use this to implement "null_frag" without hardcoding its name in TableGen ...

I don't see any reason to disallow it. Getting rid of null_frag as a special case seems nice.

OK, I'll continue to allow empty Fragments. Unfortunately, getting rid of null_frag is more difficult than I expected: since current code checks for null_frag so very early, many backends get away with patterns that are actually invalid even structurally (number of results don't match number of users; use of some TableGen operators inside a pattern that aren't even dag nodes at all) as long as there's a null_frag anywhere in there as well. Trying to define a null_frag as a PatFrags with empty fragment list means that TableGen would still ignore the pattern, but only after doing a structural parse first -- which now errors out.

Thu, Jul 12, 3:26 PM
hfinkel accepted D48545: [RFC v2] "Alternative" matches for TableGen DAG patterns.

LGTM (there are a lot of changes here, but given that it produces no changes to existing matching tables, that seems like pretty good test coverage).

Thu, Jul 12, 11:57 AM
hfinkel added a comment to D46842: [OpenMP][libomptarget] Make bitcode library building depend on clang and llvm-linker being available .

I'm still opposed to this, as stated in my many comments. If at all, you need to make sure that this library doesn't rebuild whenever you do an unrelated change to Clang.

Thu, Jul 12, 7:25 AM

Tue, Jul 10

hfinkel added a comment to D49144: [FunctionAttrs] Infer the speculatable attribute.

I think marking a function speculatable requires stripping metadata from any loads in that function, to avoid UB. Does that sound reasonable?

Tue, Jul 10, 7:21 PM
hfinkel added inline comments to D49165: Add, and infer, a nofree function attribute.
Tue, Jul 10, 7:07 PM
hfinkel created D49165: Add, and infer, a nofree function attribute.
Tue, Jul 10, 6:29 PM
hfinkel added a comment to D49144: [FunctionAttrs] Infer the speculatable attribute.

My assumption here is that we need to assume that any loads performed by the function, and any incoming arguments, might be poison. As a result, we can't have any branches, PHIs, or stores (even to local static allocas) - because doing any of these things with poison values is UB. Is that correct?

Control flow that might lead to infinite loops might be a problem, but why not allow at least acyclic control flow?

If the value stored to an alloca is thrown away after the ret, why is it a problem to be executed speculatively?

Tue, Jul 10, 3:51 PM
hfinkel added inline comments to D49144: [FunctionAttrs] Infer the speculatable attribute.
Tue, Jul 10, 12:49 PM
hfinkel added a comment to D49139: [OptRemark] Demangle symbols when emitting remarks.

I specifically didn't want to do this for the YAML output. The tool consuming the YAML should demangle if it wants. It's hard to go backward, and so if the tool needs to map back to symbols in the program, it needs the mangled name. That having been said:

Tue, Jul 10, 10:12 AM
hfinkel added inline comments to D49041: [LangRef] Clarify undefined behavior for function attributes..
Tue, Jul 10, 9:21 AM
hfinkel created D49144: [FunctionAttrs] Infer the speculatable attribute.
Tue, Jul 10, 9:19 AM

Mon, Jul 9

hfinkel added a comment to D48239: [LangRef] Clarify meaning of "dereferencable" attribute/metadata..

PointerMayBeCaptured doesn't do the right thing. The primary issue is that it only walks the uses of the argument; other pointers could alias the argument if it isn't also noalias. (The other issue is that freeing a pointer doesn't count as capturing it.)

Mon, Jul 9, 7:04 PM
hfinkel added a comment to D49114: [clang-tidy] Add a check for "magic numbers".

This version detects and report integers only. If there is interest of merging the tool I can add the functionality for floats as well.

Mon, Jul 9, 6:47 PM
hfinkel added a comment to D48239: [LangRef] Clarify meaning of "dereferencable" attribute/metadata..

You're correct about getPointerDereferenceableBytes, but all uses go though isDereferenceableAndAlignedPointer (isDereferenceableAndAlignedPointer is really the only caller of getPointerDereferenceableBytes, as far as I know), and hooking this into isDereferenceableAndAlignedPointer should be straightforward because it takes a context instruction

Well, not all the users pass in the context parameter, since it's optional, but otherwise, that's all we need in theory.

Mon, Jul 9, 5:29 PM
hfinkel accepted D47854: [LangRef] Clarify semantics of load metadata..

ping

Mon, Jul 9, 3:41 PM

Sun, Jul 8

hfinkel added inline comments to D49041: [LangRef] Clarify undefined behavior for function attributes..
Sun, Jul 8, 2:47 PM

Fri, Jul 6

hfinkel added a comment to D48239: [LangRef] Clarify meaning of "dereferencable" attribute/metadata..

It's straightforward to define the alternative semantics where it only applies at the point of the call/load. And it would still be useful to the optimizer. But the optimizer code would have to be written from scratch; the existing getPointerDereferenceableBytes API isn't usable with an attribute like that. It's probably worth doing at some point, though: we could prove other interesting things with the context-sensitive analysis, though. For example, we could prove that a pointer is dereferenceable using a previous load or store operation.)

Fri, Jul 6, 8:06 PM
hfinkel added a comment to D48239: [LangRef] Clarify meaning of "dereferencable" attribute/metadata..

IMO, we really need to add the other attribute to the IR.

If we codify these semantics, we need to stop Clang from using this attribute. But we shouldn't just rip all that code out only to re-add it once the new attribute is in place. I would seem cleaner to add the new attribute with the desired semantics (even if it isn't (yet) wired up to the optimizer) so that we can just switch Clang from one attribute to the other.

Fri, Jul 6, 8:03 PM

Wed, Jul 4

hfinkel added a comment to D48663: [Power9] Code Cleanup - Remove needsAggressiveScheduling().

Hi Hal, Eric,

I ran SPEC2017 on a P9 machine and difference between bi-directional and bottom up scheduling was very small overall. The bi-directional scheduling continues to be a bit better with
xalancbmk - 0.5% better on bi-directional
leela - 2.0% better on bi-directional

Changes for all of the other benchmarks were too small.

Wed, Jul 4, 11:09 AM
hfinkel added a comment to D48721: Patch to fix pragma metadata for do-while loops.

I encountered the issue while working with the unroller and found that it was not following the pragma info, and traced it back to the issue with metadata.
As far as I understood, for for-loops and while-loops, we add the metadata only to the loop back-edge. So it would make sense to keep them consistent.
I'm not an expert in clang, and do not know how we can detect such problems.

Wed, Jul 4, 10:20 AM · Restricted Project
hfinkel added a comment to D48721: Patch to fix pragma metadata for do-while loops.

Is the fault that the metadata only should be put on the back edge, not the branch in the preheader?

Yea. Our past thinking has been that any backedge in the loop is valid. The metadata shouldn't end up other places, although it's benign unless those other places are (or may later become) a backedge for some different loop.

I'm no expert on clang. The patch seems to fix the problem if the goal is to only add the loop-metadata on the backedge , but I'll leave it to someone else to approve it.

I'm a little bit concerned about opt not detecting this kind of problems though.
Would it be possible for some verifier to detect if we have loop metadata on some branch that aren't in the latch block?
And/or should the optimization that "merges" two branches with different loop metadata), be smarter about which loop metadata to keep? Or maybe we should be defensive and discard loop metadata on the merged branch instruction?

Wed, Jul 4, 10:16 AM · Restricted Project

Mon, Jul 2

hfinkel added a comment to D48721: Patch to fix pragma metadata for do-while loops.

I tried running

/clang -cc1 -O3 -funroll-loops -S -emit-llvm pragma-do-while-unroll.cpp -o - -mllvm -print-after-all

and I get this

...
!2 = distinct !{!2, !3}
!3 = !{!"llvm.loop.unroll.count", i32 3}
!4 = !{!5, !5, i64 0}
!5 = !{!"int", !6, i64 0}
!6 = !{!"omnipotent char", !7, i64 0}
!7 = !{!"Simple C/C++ TBAA"}
!8 = distinct !{!8, !9}
!9 = !{!"llvm.loop.unroll.count", i32 5}
*** IR Dump After Combine redundant instructions ***
; Function Attrs: nounwind
define i32 @test(i32* %a, i32 %n) local_unnamed_addr #0 {
entry:
  br label %do.body, !llvm.loop !2

do.body:                                          ; preds = %do.body, %entry
  %i.0 = phi i32 [ 0, %entry ], [ %inc, %do.body ]
  %sum.0 = phi i32 [ 0, %entry ], [ %add5, %do.body ]
  %0 = zext i32 %i.0 to i64
  %arrayidx = getelementptr inbounds i32, i32* %a, i64 %0
  %1 = load i32, i32* %arrayidx, align 4, !tbaa !4
  %add = add nsw i32 %1, 1
  store i32 %add, i32* %arrayidx, align 4, !tbaa !4
  %add5 = add nsw i32 %sum.0, %add
  %inc = add nuw nsw i32 %i.0, 1
  %cmp = icmp slt i32 %inc, %n
  br i1 %cmp, label %do.body, label %do.end, !llvm.loop !2

do.end:                                           ; preds = %do.body
  br label %do.body6, !llvm.loop !8

do.body6:                                         ; preds = %do.body6, %do.end
  %i.1 = phi i32 [ 0, %do.end ], [ %inc15, %do.body6 ]
  %sum.1 = phi i32 [ %add5, %do.end ], [ %add14, %do.body6 ]
  %2 = zext i32 %i.1 to i64
  %arrayidx8 = getelementptr inbounds i32, i32* %a, i64 %2
  %3 = load i32, i32* %arrayidx8, align 4, !tbaa !4
  %add9 = add nsw i32 %3, 1
  store i32 %add9, i32* %arrayidx8, align 4, !tbaa !4
  %add14 = add nsw i32 %sum.1, %add9
  %inc15 = add nuw nsw i32 %i.1, 1
  %cmp17 = icmp slt i32 %inc15, %n
  br i1 %cmp17, label %do.body6, label %do.end18, !llvm.loop !8

do.end18:                                         ; preds = %do.body6
  ret i32 %add14
}
*** IR Dump After Simplify the CFG ***
; Function Attrs: nounwind
define i32 @test(i32* %a, i32 %n) local_unnamed_addr #0 {
entry:
  br label %do.body, !llvm.loop !2

do.body:                                          ; preds = %do.body, %entry
  %i.0 = phi i32 [ 0, %entry ], [ %inc, %do.body ]
  %sum.0 = phi i32 [ 0, %entry ], [ %add5, %do.body ]
  %0 = zext i32 %i.0 to i64
  %arrayidx = getelementptr inbounds i32, i32* %a, i64 %0
  %1 = load i32, i32* %arrayidx, align 4, !tbaa !4
  %add = add nsw i32 %1, 1
  store i32 %add, i32* %arrayidx, align 4, !tbaa !4
  %add5 = add nsw i32 %sum.0, %add
  %inc = add nuw nsw i32 %i.0, 1
  %cmp = icmp slt i32 %inc, %n
  br i1 %cmp, label %do.body, label %do.body6, !llvm.loop !8

do.body6:                                         ; preds = %do.body, %do.body6
  %i.1 = phi i32 [ %inc15, %do.body6 ], [ 0, %do.body ]
  %sum.1 = phi i32 [ %add14, %do.body6 ], [ %add5, %do.body ]
  %2 = zext i32 %i.1 to i64
  %arrayidx8 = getelementptr inbounds i32, i32* %a, i64 %2
  %3 = load i32, i32* %arrayidx8, align 4, !tbaa !4
  %add9 = add nsw i32 %3, 1
  store i32 %add9, i32* %arrayidx8, align 4, !tbaa !4
  %add14 = add nsw i32 %sum.1, %add9
  %inc15 = add nuw nsw i32 %i.1, 1
  %cmp17 = icmp slt i32 %inc15, %n
  br i1 %cmp17, label %do.body6, label %do.end18, !llvm.loop !8

do.end18:                                         ; preds = %do.body6
  ret i32 %add14
}
...

So up until simplifyCFG I think things look pretty good with different loop-metadata for the two loops. But when simplifyCFG is tranforming

  br i1 %cmp, label %do.body, label %do.end, !llvm.loop !2

do.end:                                           ; preds = %do.body
  br label %do.body6, !llvm.loop !8

into

br i1 %cmp, label %do.body, label %do.body6, !llvm.loop !8

we get incorrect metadata for one branch target.

Is the fault that the metadata only should be put on the back edge, not the branch in the preheader?

Mon, Jul 2, 6:02 PM · Restricted Project
hfinkel added a comment to D48808: [CodeGen] Emit parallel_loop_access for each loop in the loop stack..

We specifically defined the metadata to support nested loops. The LangRef says, "The llvm.mem.parallel_loop_access metadata refers to a loop identifier, or metadata containing a list of loop identifiers for nested loops." To handle nested loops, we need to make the instruction metadata point to a list of loop IDs.

Thanks for pointing me to it. I've not seen parallel_loop_access pointing to a list, so I didn't know it was possible.

Mon, Jul 2, 11:21 AM
hfinkel added a comment to D48808: [CodeGen] Emit parallel_loop_access for each loop in the loop stack..

Michael, can you please add a test with two inner loops, one where more than one is annotated, and one where only the outer loop is annotated? It's not clear to me that I->setMetadata will do the right thing here in the former case.

The test case pragma-loop-safety-nested.cpp check the case where the outer loop is annotated.

Iterating over Active will iterate from outer to inner loops, meaning I->setMetadata will overwrite the annotation of the outermost loop (which IMHO is the most useful behaviour). Since it not possible to add multiple !llvm.mem.parallel_loop_access annotation, we cannot annotate multiple loops.

Mon, Jul 2, 11:00 AM
hfinkel added a comment to D48808: [CodeGen] Emit parallel_loop_access for each loop in the loop stack..

I don't think that this is the intended behavior of the #pragma clang loop. it is better to ask the author of this pragma is this correct or not.

Mon, Jul 2, 8:41 AM

Sun, Jul 1

hfinkel accepted D48813: [PowerPC] Don't make it as pre-inc candidate if displacement isn't 4's multiple for i64 pre-inc load/store.

LGTM

Sun, Jul 1, 9:15 PM

Fri, Jun 29

hfinkel added a comment to D48663: [Power9] Code Cleanup - Remove needsAggressiveScheduling().

Hi Eric,

We set the generic scheduler to bi-directional scheduling back in 2013. It was at the time when the generic machine scheduler was added to PowerPC. I can only assume that when we did this we ran a set of benchmarks and determined that this is the best option based on that.
However, what I can do now is run a round of SPEC with Top-Of-Trunk and confirm that we still do better in the bi-directional case. (Or at least not worse.)

Fri, Jun 29, 11:14 AM

Tue, Jun 26

hfinkel accepted D48424: LoopUnroll: Allow analyzing intrinsic call costs.

LGTM

Tue, Jun 26, 10:30 AM

Thu, Jun 21

hfinkel added inline comments to D6260: Add -mlong-double-64 flag.
Thu, Jun 21, 6:51 AM

Tue, Jun 19

hfinkel added a comment to D48326: [RFC] "Alternative" matches for TableGen DAG patterns.

I've not looked at the patch itself, but regarding the general functionality: Nice! I've wanted this for as long as I've been working on LLVM :-)

Tue, Jun 19, 11:30 AM

Jun 15 2018

hfinkel added a comment to D48025: [PowerPC] avoid masking already-zero bits in BitPermutationSelector.

Seems like a good idea. A couple of questions below.

Jun 15 2018, 4:17 PM

Jun 14 2018

hfinkel added a comment to D47963: [LangRef] nnan and ninf produce poison..

I don't think this is right. Saying the result is undefined seems like what we intend. We might choose, as an implementation technique, to limit how we take advantage of that undefinedness in certain cases, but the violating the constraint still produces logical inconsistencies that can transfer to other parts of the code, and in general, turn into any other kind of undefined behavior (depending on the structure of the code).

Jun 14 2018, 12:12 PM

Jun 12 2018

hfinkel added a comment to D48066: Add one more No-alias case to alias analysis..
knownZeros(n) == (knownZeros(idx) << 1) | 1 and
knownOnes(n) == knownOnes(idx) << 1

I meant if nothing is known about idx, we fail to get No-alias opportunity. If there is nothing for KnownBits, let's just check odd number and the idx is used by both offsets as below.

if (success of computeKnownBits with `n` and `idx`) {
  knownZeros(n) == (knownZeros(idx) << 1) | 1 and
  knownOnes(n) == knownOnes(idx) << 1
} else {
  check just LHS is odd number and RHS is shared by offsets.
}

How do you think about it? @hfinkel

Jun 12 2018, 7:12 AM
hfinkel added a comment to D48066: Add one more No-alias case to alias analysis..

I think that this optimization is going to be very fragile in practice without also handling the case where both sides look like (m-idx). Nevertheless, it's good to make AA changes in minimal small increments. Let's try this single case for now.

Jun 12 2018, 6:17 AM
hfinkel added a comment to D48066: Add one more No-alias case to alias analysis..

I put this on the llvm-dev thread, but to repeat it here:

Jun 12 2018, 5:24 AM
hfinkel added inline comments to D47854: [LangRef] Clarify semantics of load metadata..
Jun 12 2018, 2:47 AM

Jun 11 2018

hfinkel added inline comments to D47854: [LangRef] Clarify semantics of load metadata..
Jun 11 2018, 6:32 AM
hfinkel added inline comments to D47854: [LangRef] Clarify semantics of load metadata..
Jun 11 2018, 3:37 AM

Jun 7 2018

hfinkel added a comment to D47854: [LangRef] Clarify semantics of load metadata..

I just got a scary ah-ah moment..
Take this code:

int f(int &p, int n, int a) {
    int r = 0;
    for (int i = 0; i < n; ++i) {
      if (a) {
        r = r * p + 3;
      } else {
        ++a;
      }
    }
    return r;
}

This code doesn't load p if f is called with a = false and n = 1 (i.e., only 1 iteration of the loop and goes to the else branch).

clang -O1 (not -O2 to avoid hard-to-read-unrolling):

define i32 @f(i32* dereferenceable(4), i32, i32) {
  %4 = icmp sgt i32 %1, 0
  br i1 %4, label %5, label %7

; <label>:5:
  %6 = load i32, i32* %0, align 4   ; <---
  br label %9

; <label>:7:
  %8 = phi i32 [ 0, %3 ], [ %17, %9 ]
  ret i32 %8

; <label>:9:
  %10 = phi i32 [ 0, %5 ], [ %18, %9 ]
  %11 = phi i32 [ 0, %5 ], [ %17, %9 ]
  %12 = phi i32 [ %2, %5 ], [ %16, %9 ]
  %13 = icmp eq i32 %12, 0
  %14 = mul nsw i32 %6, %11
  %15 = add nsw i32 %14, 3
  %16 = select i1 %13, i32 1, i32 %12
  %17 = select i1 %13, i32 %11, i32 %15
  %18 = add nuw nsw i32 %10, 1
  %19 = icmp eq i32 %18, %1
  br i1 %19, label %7, label %9
}

https://godbolt.org/g/SQh8EJ

The load was moved to the loop preheader, but it's executed irrespective of a.
Hence if p is not 4-byte aligned, then LLVM introduce undefined behavior. Being dereferenceable doesn't imply it's so with a 4-byte alignment load.

This implies we need to remove the !align attribute (as well as all the other) when hoisting loads. Lowering 1-byte loads is not particularly efficient..
We can't really define load !align as returning poison or anything other than UB since some chips trap with unaligned memory accesses. So this is a real bug we need to fix for those folks using SPARC & friends.

Jun 7 2018, 7:14 AM
hfinkel accepted D47867: [PowerPC] avoid unprofitable Repl32 flag in BitPermutationSelector.

LGTM

Jun 7 2018, 5:50 AM
hfinkel added a comment to D47849: [OpenMP][Clang][NVPTX] Enable math functions called in an OpenMP NVPTX target device region to be resolved as device-native function calls.

IMO this goes into the right direction, we should use the fast implementation in libdevice. If LLVM doesn't lower these calls in the NVPTX backend, I think it's ok to use header wrappers as CUDA already does.

Two questions:

  1. Can you explain where this is important for "correctness"? Yesterday I compiled a code using sqrt and it seems to spit out the correct results. Maybe that's relevant for other functions?
  2. Incidentally I ran into a closely related problem: I can't #include <math.h> in translation units compiled for offloading, Clang complains about inline assembly for x86 (see below). Does that work for you?

    ` In file included from /usr/include/math.h:413: /usr/include/bits/mathinline.h:131:43: error: invalid input constraint 'x' in asm asm ("pmovmskb %1, %0" : "=r" (m) : "x" (x)); ^ /usr/include/bits/mathinline.h:143:43: error: invalid input constraint 'x' in asm asm ("pmovmskb %1, %0" : "=r" (m) : "x" (x)); ^ 2 errors generated. `
Jun 7 2018, 5:34 AM
hfinkel added inline comments to D47867: [PowerPC] avoid unprofitable Repl32 flag in BitPermutationSelector.
Jun 7 2018, 5:09 AM

Jun 6 2018

hfinkel added a comment to D47816: [InstCombine] Stop sinking instructions across function call..

I agree that, generally it is not a good idea to sink across function call. I am a little bit consecutive on this change.

Thank you for your suggestion. I checked the document about TTI::isLoweredToCall.
IIUIC, it seems to me this backend information is still not strong enough.

  • If the backend returns false for a function, it will get expanded by some passes somewhere in the optimization pipeline. This might complicated the CFG, similar as the inlined function cases.
Jun 6 2018, 8:21 AM
hfinkel added a comment to D47816: [InstCombine] Stop sinking instructions across function call..

I don't see how inlining is relevant here. Sinking across function call boundaries is generally also bad (because increasing register pressure across calls tends to increase spilling). I feel like the test you want here is to call TTI::isLoweredToCall.

Jun 6 2018, 6:29 AM
hfinkel added inline comments to D47749: guard fsqrt with fmf sub flags.
Jun 6 2018, 5:49 AM
hfinkel added a comment to D47691: [NVPTX] Ignore target-cpu and -features for inling.

Bottom line -- the situation is far from perfect, but IMO the patch does sensible thing if we're compiling IR with mixed target-cpu and target-features attributes using NVPTX.

Jun 6 2018, 5:41 AM
hfinkel added inline comments to D47691: [NVPTX] Ignore target-cpu and -features for inling.
Jun 6 2018, 5:40 AM
hfinkel updated subscribers of D47747: [LangRef] Clarify "undefined" for various instructions..
Jun 6 2018, 2:55 AM
hfinkel added a comment to D47267: [UnrollAndJam] Add unroll_and_jam pragma handling.

I see your point about the mix of underscores. "nounroll_and_jam" also comes from the Intel docs, and theres already "nounroll" vs "unroll". The "no" becomes a qualifier on "unroll_and_jam". "no_unroll_and_jam" feels less consistent to me.

nounroll_and_jam looks like it should be parsed as "(no unroll) and jam" (do not unroll, but fuse) instead of "no (unroll-and-jam)" because nounroll is one word and as you mentioned, already used as a keyword somewhere else. Other variants use the underscore to append an option, e.g. vectorize_width.

But if you have a strong opinion, I'm happy enough to change it.

I don't. Feel free to chose the name you think fits best. We might support multiple spellings if necessary.

Jun 6 2018, 2:03 AM

Jun 5 2018

hfinkel accepted D45520: [PowerPC] add secure plt support for TLS symbols.

LGTM

Jun 5 2018, 4:58 AM
hfinkel accepted D46582: [PowerPC] Fix label address calculation for ppc32.

LGTM

Jun 5 2018, 4:57 AM
hfinkel accepted D47467: [IR] Begin removal of TerminatorInst by removing successor manipulation..

These are likely to be quite mechanical, and I suspect are probably best done with post-commit review.

Jun 5 2018, 2:49 AM
hfinkel accepted D47765: [PowerPC] reduce rotate in BitPermutationSelector.

LGTM

Jun 5 2018, 2:10 AM

Jun 4 2018

hfinkel accepted D47739: DAG: Stop dropping invariant/dereferencable.

That's odd; I can't think of any reason why we'd drop the flags in this case. LGTM.

Jun 4 2018, 12:56 PM

Jun 3 2018

hfinkel added inline comments to D47691: [NVPTX] Ignore target-cpu and -features for inling.
Jun 3 2018, 2:32 PM
hfinkel added a comment to D47683: [PM/LoopUnswitch] Teach the new unswitch to handle nontrivial unswitching of switches..

Can you please update the comment at the top of include/llvm/Transforms/Scalar/SimpleLoopUnswitch.h to define "full vs. partial" unswitching and "trivial vs. non-trivial" unswitching?

Jun 3 2018, 6:41 AM

May 31 2018

hfinkel accepted D42109: [PowerPC] Follow-up to r322373 for materializing constants that set the CR.

LGTM

May 31 2018, 7:52 AM

May 30 2018

hfinkel added inline comments to D47484: [test-suite] Corrections for MiniGMG.
May 30 2018, 6:38 PM
hfinkel added a comment to D47300: [RFC] Abstract parallel IR analyzes & optimizations + OpenMP implementations.

I've added a few specific comments, but I think that you should move forward with the associated RFC.

May 30 2018, 6:10 PM
hfinkel accepted D47531: [ValueTracking] Fix endless recursion in isKnownNonZero().

LGTM

May 30 2018, 8:11 AM

May 29 2018

hfinkel accepted D47456: [PowerPC] fix broken JIT-compiled code with tail call optimization.

LGTM

May 29 2018, 12:01 PM
hfinkel added inline comments to D42109: [PowerPC] Follow-up to r322373 for materializing constants that set the CR.
May 29 2018, 11:55 AM

May 28 2018

hfinkel accepted D47437: [PowerPC] Fix the incorrect iterator inside peephole .

I'll let Hal chime in here, but I'd prefer if we just refactor this code to use a range-for instead of this while loop (in all 3 cases).

May 28 2018, 8:32 PM

May 27 2018

hfinkel accepted D47436: [llvm-c] Remove LLVMAddBBVectorizePass.

LGTM. We're well past LLVM 5 now.

May 27 2018, 9:16 PM
hfinkel added inline comments to D45835: Add new driver mode for dumping compiler options.
May 27 2018, 8:46 PM
hfinkel added a comment to D41953: [LoopUnroll] Unroll and Jam.

It sounded like Hal wanted to review this

May 27 2018, 8:35 PM
hfinkel added a comment to D45151: [LICM] Hoisting invariant.group loads.

You say that you don't yet know if this is profitable yet. Do you have reason to believe that it might not be profitable (e.g., some example where it seems like we might want to constrain the behavior)? We almost never favor keeping metadata over doing other transformations, so I think it's worth being explicit about our reasoning here. Just saying "I don't know yet" is probably not sufficient, as we could say that about nearly all metadata, and in that case, change the default logic.

May 27 2018, 7:44 PM

May 25 2018

hfinkel added a comment to D47396: [LLD] Place .nv_fatbin section at the beginning of the executable..

Does this just not work well with other linkers, do other linkers also have special handling, or does it work better elsewhere for some other reason?

May 25 2018, 4:02 PM
hfinkel accepted D47382: [PowerPC] Set isAsmParserOnly=1 for X-form TLS loads/stores.

LGTM

May 25 2018, 11:46 AM

May 24 2018

hfinkel added a comment to D46706: [PM/LoopUnswitch] Support partial trivial unswitching..

and instead re-add the now mutated loop to the pass manager to re-visit (much like we do with non-trivial unswitching) and rely on it then iterating for us. This will ... not be a somewhat surprisingly significant behavior change. I think its good, but its worth noting. This will essentially make loop-unswitch a fixed-point pass in the pass pipeline, but it will do so using the pass manager. I'm not aware of any other passes that currently do this. Anyways, brave new world. The pipeline will now be ... truly dynamic.

May 24 2018, 9:01 PM
hfinkel added a comment to D47348: [LoopIdiomRecognize] Only convert loops to ctlz if we can prove that the input is non-negative..

Out of curiosity, did the original test case come from C/C++? Because if it did, given that we get to assume infinite loops like this don't happen in C/C++, we may end up "breaking" this again in the future.

May 24 2018, 8:53 PM

May 23 2018

hfinkel added a comment to D45576: [RFC] Allow target to handle STRICT floating-point nodes.

Ah, okay. That doesn't seem so bad. The fact that the number of times they're called can't be observable means, I think, that we only need to model the trapping behavior as writing (and not reading) inaccessible memory. This can't be removed or speculated, but otherwise shouldn't have undo optimization implications.

Correction: I mean writing to memory (the trap handler might write memory other than inaccessible memory (e.g., some global variable)). Anything that's escaped (trivially including globals) is fair game.

I suppose that they also get to read memory (so long as they're not using it to keep ordering/count information). So the trap handlers are modeled as reading/writing arbitrary escaped memory. That, plus register dependencies, seems like it should be sufficient.

So if I understand you correctly, you're suggesting (at the MI level) to mark trapping FP operations as mayLoad and mayStore instead of hasSideEffects (as the patch does originally)?

That's probably OK, I think. But this would still restrict the optimization possibilities quite a bit compared to now, so you still wouldn't want to enable it by default in non-strict FP mode, right? In that case we're back to the question whether we need to duplicate all FP patterns or not.

May 23 2018, 10:44 AM
hfinkel added a comment to D47267: [UnrollAndJam] Add unroll_and_jam pragma handling.

I have some doubts whether this will ever be used sensibly in the real world, but can
be useful for testing and was not difficult to put together.

May 23 2018, 9:09 AM
hfinkel added a reviewer for D47267: [UnrollAndJam] Add unroll_and_jam pragma handling: Meinersbur.
May 23 2018, 9:07 AM
hfinkel added a comment to D45576: [RFC] Allow target to handle STRICT floating-point nodes.

That's a good point. I'm not sure that we want to model the effects of trap handlers, however. We don't at the IR level (as we mark the intrinsics as IntrInaccessibleMemOnly), and don't want to (because otherwise we need to assume that every FP operation is like an arbitrary external function call). As a result, I don't think that we want to here either. Also, can we delete the instructions when they're dead?

Well, we certainly have a use case where trapping exception mode needs to be supported. (At the very least, the compiler must generate code that runs correctly in an environment where trapping mode is enabled; in particular, FP operations must not be speculated, so that we don't get spurious traps.)

Also, can we delete the instructions when they're dead?

According to the C11 standard F.9.1, no:

Concern about side effects may inhibit code motion and removal of seemingly useless code. For example, in

#include <fenv.h>
#pragma STDC FENV_ACCESS ON
void f(double x)
{
/* ... */
for (i = 0; i < n; i++) x + 1;
/* ... */
}

x + 1 might raise floating-point exceptions, so cannot be removed. And since the loop body might not execute (maybe 0 ≥ n), x + 1 cannot be moved out of the loop. (Of course these optimizations are valid if the implementation can rule out the nettlesome cases.)

This specification does not require support for trap handlers that maintain information about the order or count of floating-point exceptions. Therefore, between function calls, floating-point exceptions need not be precise: the actual order and number of occurrences of floating-point exceptions (> 1) may vary from what the source code expresses. Thus, the preceding loop could be treated as

if (0 < n) x + 1;

Note that this applies even in default (non-trapping) exception handling mode, because removing dead operations may still change the exception status flags read out by subsequent calls to fegetexcept.

Ah, okay. That doesn't seem so bad. The fact that the number of times they're called can't be observable means, I think, that we only need to model the trapping behavior as writing (and not reading) inaccessible memory. This can't be removed or speculated, but otherwise shouldn't have undo

Correction: I mean writing to memory (the trap handler might write memory other than inaccessible memory (e.g., some global variable)). Anything that's escaped (trivially including globals) is fair game.

May 23 2018, 9:05 AM
hfinkel added a comment to D45576: [RFC] Allow target to handle STRICT floating-point nodes.

That's a good point. I'm not sure that we want to model the effects of trap handlers, however. We don't at the IR level (as we mark the intrinsics as IntrInaccessibleMemOnly), and don't want to (because otherwise we need to assume that every FP operation is like an arbitrary external function call). As a result, I don't think that we want to here either. Also, can we delete the instructions when they're dead?

Well, we certainly have a use case where trapping exception mode needs to be supported. (At the very least, the compiler must generate code that runs correctly in an environment where trapping mode is enabled; in particular, FP operations must not be speculated, so that we don't get spurious traps.)

Also, can we delete the instructions when they're dead?

According to the C11 standard F.9.1, no:

Concern about side effects may inhibit code motion and removal of seemingly useless code. For example, in

#include <fenv.h>
#pragma STDC FENV_ACCESS ON
void f(double x)
{
/* ... */
for (i = 0; i < n; i++) x + 1;
/* ... */
}

x + 1 might raise floating-point exceptions, so cannot be removed. And since the loop body might not execute (maybe 0 ≥ n), x + 1 cannot be moved out of the loop. (Of course these optimizations are valid if the implementation can rule out the nettlesome cases.)

This specification does not require support for trap handlers that maintain information about the order or count of floating-point exceptions. Therefore, between function calls, floating-point exceptions need not be precise: the actual order and number of occurrences of floating-point exceptions (> 1) may vary from what the source code expresses. Thus, the preceding loop could be treated as

if (0 < n) x + 1;

Note that this applies even in default (non-trapping) exception handling mode, because removing dead operations may still change the exception status flags read out by subsequent calls to fegetexcept.

Ah, okay. That doesn't seem so bad. The fact that the number of times they're called can't be observable means, I think, that we only need to model the trapping behavior as writing (and not reading) inaccessible memory. This can't be removed or speculated, but otherwise shouldn't have undo

May 23 2018, 8:59 AM
hfinkel added a comment to D45576: [RFC] Allow target to handle STRICT floating-point nodes.

That's a good point. I'm not sure that we want to model the effects of trap handlers, however. We don't at the IR level (as we mark the intrinsics as IntrInaccessibleMemOnly), and don't want to (because otherwise we need to assume that every FP operation is like an arbitrary external function call). As a result, I don't think that we want to here either. Also, can we delete the instructions when they're dead?

Well, we certainly have a use case where trapping exception mode needs to be supported. (At the very least, the compiler must generate code that runs correctly in an environment where trapping mode is enabled; in particular, FP operations must not be speculated, so that we don't get spurious traps.)

Also, can we delete the instructions when they're dead?

According to the C11 standard F.9.1, no:

Concern about side effects may inhibit code motion and removal of seemingly useless code. For example, in

#include <fenv.h>
#pragma STDC FENV_ACCESS ON
void f(double x)
{
/* ... */
for (i = 0; i < n; i++) x + 1;
/* ... */
}

x + 1 might raise floating-point exceptions, so cannot be removed. And since the loop body might not execute (maybe 0 ≥ n), x + 1 cannot be moved out of the loop. (Of course these optimizations are valid if the implementation can rule out the nettlesome cases.)

This specification does not require support for trap handlers that maintain information about the order or count of floating-point exceptions. Therefore, between function calls, floating-point exceptions need not be precise: the actual order and number of occurrences of floating-point exceptions (> 1) may vary from what the source code expresses. Thus, the preceding loop could be treated as

if (0 < n) x + 1;

Note that this applies even in default (non-trapping) exception handling mode, because removing dead operations may still change the exception status flags read out by subsequent calls to fegetexcept.

May 23 2018, 8:55 AM
hfinkel added inline comments to D45520: [PowerPC] add secure plt support for TLS symbols.
May 23 2018, 8:08 AM
hfinkel added inline comments to D46582: [PowerPC] Fix label address calculation for ppc32.
May 23 2018, 8:00 AM
hfinkel added a comment to D45576: [RFC] Allow target to handle STRICT floating-point nodes.

Following the BoF at EuroLLVM, I've been talking to Chris Lattner about this, and he suggested a somewhat different approach: providing a way for an MI instruction to have the "unmodeled side effects" flag be provided by an MI *operand*. That is, right now the hasSideEffects flag is a constant: any particular MI instruction class has this either set or clear, but to model FP instructions, it might be better to have a setting where whether or not any particular instantiation of the instruction has side effects depends on the value of an operand.

But that's much stronger than necessary, and will prevent all kinds of optimizations. I don't think that we want to do that. Why not just add a dependence on the registers that matter?

So the above statement was assuming that you need hasSideEffects on the MI level to correctly model certain strict FP effects, specifically the mode where FP instructions may trap. I agree that hasSideEffects is unneeded if we only want to model e.g. rounding modes or non-trapping exception flags. But I don't really see how we can correctly model trapping instructions only by using register dependencies.

May 23 2018, 7:57 AM

May 22 2018

hfinkel accepted D47088: Fix aliasing of launder.invariant.group.

This is still a bit convoluted, but I think that someone can now figure out what's going on from the comments. They also raise an interesting point of whether we should somehow separate out the capture-only-by-return case from the more-general case (although that's nothing to be addressed here).

May 22 2018, 6:08 PM
hfinkel added inline comments to D47188: Intel SVML calling conventions.
May 22 2018, 10:57 AM
hfinkel added a comment to D45576: [RFC] Allow target to handle STRICT floating-point nodes.

The reason that I originally implemented this the way I did, mutating the strict nodes to their non-constrained equivalents, was that I thought it would require too much duplication in the .td files to implement all the pattern matching for the strict nodes. The original plan was to find some other way to communicate the "strict" state to the target after instruction selection, but I never found a way to do that.

What you've done here seems reasonable to me. Obviously it still involves a lot of updates to the td files, but your approach to making that manageable going forward seems plausible. I really don't have the expertise in instruction selection to judge that completely, but this looks like a promising direction.

Thanks for looking at this!

Following the BoF at EuroLLVM, I've been talking to Chris Lattner about this, and he suggested a somewhat different approach: providing a way for an MI instruction to have the "unmodeled side effects" flag be provided by an MI *operand*. That is, right now the hasSideEffects flag is a constant: any particular MI instruction class has this either set or clear, but to model FP instructions, it might be better to have a setting where whether or not any particular instantiation of the instruction has side effects depends on the value of an operand.

May 22 2018, 10:01 AM
hfinkel added inline comments to D47188: Intel SVML calling conventions.
May 22 2018, 8:40 AM
hfinkel added inline comments to D47188: Intel SVML calling conventions.
May 22 2018, 7:09 AM
hfinkel accepted D47178: [PowerPC] Remove the match pattern in the definition of LXSDX/STXSDX.

LGTM

May 22 2018, 6:59 AM

May 19 2018

hfinkel added inline comments to D47088: Fix aliasing of launder.invariant.group.
May 19 2018, 10:14 PM
hfinkel added inline comments to D47088: Fix aliasing of launder.invariant.group.
May 19 2018, 8:29 AM

May 18 2018

hfinkel committed rL332798: Add a link to the new sponsorship document..
Add a link to the new sponsorship document.
May 18 2018, 8:14 PM