hfinkel (Hal Finkel)
User

Projects

User does not belong to any projects.

User Details

User Since
Nov 16 2012, 6:02 PM (305 w, 14 h)

Recent Activity

Yesterday

hfinkel added a comment to D52327: [Loop Vectorizer] Abandon vectorization when no integer IV found.

Interesting, although given that the loop runs for more than one iteration, would that i1 variable work correctly?

The i1 variable doesn't actually get used for anything; the vectorizer makes a new induction variable to track the iteration count.

Fri, Sep 21, 2:50 PM
hfinkel added a comment to D52327: [Loop Vectorizer] Abandon vectorization when no integer IV found.

If there's some actual implementation work involved in supporting it, fine, I guess... but it seems like it can't be much work to support properly: even just adding an unused i1 induction variable is enough to get around the crash for the given testcase.

Fri, Sep 21, 2:27 PM
hfinkel added a comment to D52362: [CloneFunction] Simplify previously unsimplifiable instructions.

Is there any compile-time impact?

I haven't checked the compile time impact. Will do. However, this is just a linear addition of instructions and it's uses - we were already looking at phis and their uses. Also, later passes (that iterate over the inlined instructions) will anyway iterate over all instructions. So, I think it should be either neutral or a win. Win when we were wastefully iterating over instructions through multiple passes until we reached the pass which could simplify these instructions.

Should we just simplify all instructions at this later point instead of having two passes?

I'll take a look, but it seems a more invasive change. We first cloneBlocks, then iterate over all instructions in the callee and insert it in the caller in right order (now the instructions have a parent block), and then have two passes over the PHINodes. The second pass over the PHINode already looks over the uses and does recursive simplification. To this list, now that all instructions have basic block parents, we add them for recursive simplification.

Fri, Sep 21, 2:12 PM
hfinkel added a comment to D52362: [CloneFunction] Simplify previously unsimplifiable instructions.

Is there any compile-time impact? Should we just simplify all instructions at this later point instead of having two passes?

Fri, Sep 21, 10:28 AM
hfinkel accepted D52351: [LoopVectorizer] Fix in getScalarizationOverhead().

LGTM

Fri, Sep 21, 10:12 AM
hfinkel added a comment to D51994: TableGen/ISel: Allow PatFrag predicate code to access captured operands.

I'm somewhat confused by the motivation. Can you explain why the fact that the operands are commuted causes a problem. In D51995, you just seem to be looping over them anyway.

Fri, Sep 21, 9:58 AM
hfinkel accepted D51993: TableGen/CodeGenDAGPatterns: addPredicateFn only once.

Any perf numbers? For instance, on x86 -gen-dag-isel

No, and I don't see why we'd care. The motivation was purely code cleanup, and I don't see how the performance could get worse.

Curiosity - I've been trying very hard to reduce tablegen build times over the past few months.

Fri, Sep 21, 9:52 AM
hfinkel accepted D52345: [PowerPC] optimize conditional branch on CRSET/CRUNSET.

LGTM

Fri, Sep 21, 9:27 AM
hfinkel added inline comments to D49281: [Unroll/UnrollAndJam/Vectorizer/Distribute] Add followup loop attributes..
Fri, Sep 21, 8:44 AM

Mon, Sep 17

hfinkel added a comment to D52164: [InstSimplify] Fold ne/eq comparison of a pointer produced by a noalias function.

int f(int n) {

int *x = (int *)malloc(n);
free(x);
int *y =  (int *)malloc(n);
return (y == x) ;

}

This has been already folded to true even before this patch.

Mon, Sep 17, 11:56 AM
hfinkel added a comment to D52164: [InstSimplify] Fold ne/eq comparison of a pointer produced by a noalias function.

disable this fold when comparing noalias ptr with null

Mon, Sep 17, 11:52 AM
hfinkel requested changes to D52164: [InstSimplify] Fold ne/eq comparison of a pointer produced by a noalias function.

Unfortunately, I don't think that we can do this. noalias semantics only provide information about the overlap of memory accesses using the resulting pointers, in the context of other necessary constraints, and does not strongly constrain the pointer values. For example, it might be that:

Mon, Sep 17, 10:53 AM

Tue, Sep 11

hfinkel accepted D51108: [PowerPC] Fix wrong ABI for i1 stack arguments on PPC32.

LGTM

Tue, Sep 11, 8:26 AM

Mon, Aug 27

hfinkel accepted D49978: LangRef: Clarify expected sNaN behavior for minnum/maxnum.

ping

Mon, Aug 27, 10:00 AM
hfinkel updated subscribers of D50987: [Exception Handling] Unwind tables are required for all functions that have an EH personality..

I'm not 100% sure if we can remove the default option -funwind-tables for x86.
I know that using -funwind-tables prevents (or hides) this issue but it may be required for another reason that this fix does not cover. I don't actually know what the initial intention was when that default was added.
I think that someone on the x86 side would be a better judge of this.

Mon, Aug 27, 7:51 AM

Sat, Aug 25

hfinkel added a comment to D50987: [Exception Handling] Unwind tables are required for all functions that have an EH personality..

Does this mean that we can remove that previously-discussed x86 logic in Clang?

Sat, Aug 25, 5:09 PM
hfinkel added inline comments to D51108: [PowerPC] Fix wrong ABI for i1 stack arguments on PPC32.
Sat, Aug 25, 2:53 PM
hfinkel added a comment to D37624: add support for -fno-instrument-functions and -finstrument-functions-exclude-{file,function}-list=<arg1,arg2,...> to match gcc options..

Hello folks, is there a plan to merge this feature still?

Sat, Aug 25, 2:37 PM

Thu, Aug 23

hfinkel added inline comments to D51108: [PowerPC] Fix wrong ABI for i1 stack arguments on PPC32.
Thu, Aug 23, 1:52 PM

Aug 17 2018

hfinkel added a comment to D50908: [PowerPC] Enable -funwind-tables by default on powerpc.

Also, if you know, should this actually be the default behavior for all targets? I'm somewhat surprised if x86 and PPC are special cases.

Aug 17 2018, 11:21 AM
hfinkel added a comment to D50908: [PowerPC] Enable -funwind-tables by default on powerpc.

Interesting. Why? Does this effectively match GCC's behavior?

Aug 17 2018, 11:20 AM

Aug 16 2018

hfinkel added a comment to D50845: [CUDA/OpenMP] Define only some host macros during device compilation.

As a result, we should really have a separate header that has those actually-available functions. When targeting NVPTX, why don't we have the included math.h be CUDA's math.h? In the end, those are the functions we need to call when we generate code. Right?

That's what D47849 deals with.

Aug 16 2018, 12:25 PM
hfinkel added a comment to D50845: [CUDA/OpenMP] Define only some host macros during device compilation.

Another option would be to implement some sort of attribute-based overloading. Then OpenMP can provide its own version of the device-side library function without clashing with system headers.

Aug 16 2018, 11:54 AM

Aug 15 2018

hfinkel added a comment to D50791: [SelectionDAG] unroll unsupported vector FP ops earlier to avoid libcalls on undef elements (PR38527).

Thinking forward to a future where we have SVML library support for v4f32 and a v8f32 sin/cos. What should we do for v5f32?

Not sure. Over in:
https://bugs.llvm.org/show_bug.cgi?id=38528
...we discussed transforming to veclib calls late in IR, so any special lib support would already be taken care of before we reach this point?

Aug 15 2018, 4:40 PM

Aug 14 2018

hfinkel added a comment to D45151: [LICM] Hoisting invariant.group loads.

I'm sorry I can't be more decisive here since I wasn't deeply involved with the devirtualization work early on and so lack a lot of context. The general direction here seems fine to me -- given that we use metadata to express a large range of things, it seems ok to have metadata specific MD dropping policies. However, given that Hal originally objected to this, we should make sure he is on board.

Aug 14 2018, 5:21 PM

Aug 13 2018

hfinkel added inline comments to D50680: [SDAG] Remove the reliance on MI's allocation strategy for `MachineMemOperand` pointers attached to `MachineSDNodes` and instead have the `SelectionDAG` fully manage the memory for this array..
Aug 13 2018, 6:05 PM
hfinkel accepted D50679: [BasicAA] Don't assume tail calls with byval don't alias allocas.
In D50679#1198248, @rnk wrote:

The DSE test is fine, and I think that you should keep it, but can you please add a direct AA test (like, e.g., test/Analysis/BasicAA/cs-cs.ll)?

Should we just always return ModRefInfo::Ref in this case? It can read the data (to copy it), but can't write, correct?

I thought about that, but I think the code below this check is usually more precise in these cases. It tests if the alloca isNonEscapingLocalObject and then looks at each argument to see if it is actually used by the call. For local objects that *are* escaping, we could return ModRefInfo::Ref. Basically, we already get this test case:

define void @dead(i32* byval %x) {
entry:
  %p = alloca i32
  %q = alloca i32
  %v = load i32, i32* %x
  store i32 %v, i32* %p
  store i32 1, i32* %q  ; DEAD
  tail call void @g(i32* byval %p)
  store i32 2, i32* %q
  call void @escape(i32* %q)
  ret void
}
Aug 13 2018, 5:42 PM
hfinkel added inline comments to D50680: [SDAG] Remove the reliance on MI's allocation strategy for `MachineMemOperand` pointers attached to `MachineSDNodes` and instead have the `SelectionDAG` fully manage the memory for this array..
Aug 13 2018, 5:40 PM
hfinkel added a comment to D50680: [SDAG] Remove the reliance on MI's allocation strategy for `MachineMemOperand` pointers attached to `MachineSDNodes` and instead have the `SelectionDAG` fully manage the memory for this array..

This code certainly looks cleaner.

Aug 13 2018, 5:30 PM
hfinkel added a comment to D50679: [BasicAA] Don't assume tail calls with byval don't alias allocas.

The DSE test is fine, and I think that you should keep it, but can you please add a direct AA test (like, e.g., test/Analysis/BasicAA/cs-cs.ll)?

Aug 13 2018, 5:21 PM
hfinkel accepted D48386: [clang] Mark libm functions writeonly when we care about errno.

LGTM too.

Aug 13 2018, 3:21 PM
hfinkel added a comment to D48388: [EarlyCSE] Understand WriteOnly function calls in EarlyCSE with MSSA.

Makes sense to me. Obviously needs some test.

Aug 13 2018, 3:04 PM
hfinkel accepted D48387: [FunctionAttrs] Infer WriteOnly Function Attribute .

A couple of minor comments, but otherwise, LGTM.

Aug 13 2018, 2:58 PM

Aug 6 2018

hfinkel added a comment to D49281: [Unroll/UnrollAndJam/Vectorizer/Distribute] Add followup loop attributes..
... As such, how to fall back when the transformation doesn't happen is almost equally important as what to do next when the transformation happens.

Hideki, I think that LLVM does the right thing here: We provide a separation between the hint and the mandate (i.e., using assume_safety or not).

That, I know. I wasn't questioning about that.

What I'm not seeing from this RFC/patch is that, if the programmer specifies transformation behavior A -> B -> C, what happens if transformation A does not kick-in? Should we just warn that "A did not happen" and stop processing the request B and C?
Also, if the programmer requests that the loop to be distribute in three ways and specify different transformations for each, what should the latter transformation do if the loop is distributed in two ways or four ways? If we are serious about introducing this kind of features, we should clearly define what should happen when the programmer intention cannot be satisfied well enough ---- when we should continue honoring and when we should stop honoring. If we say "we should stop in all those circumstances", that should simplify the problem a lot. If we say "we should allow to continue on subset of those cases", we should clearly state which subset and why. If there are any prior discussions (or descriptions within this patch) along this lines, please point me to that.

Aug 6 2018, 11:35 AM

Aug 5 2018

hfinkel added a comment to D50295: Let GetUnderlyingObject/Objects use MemorySSA.

Thanks for the comments!

@hfinkel The motivation is that GetUnderlyingObject stucks at cases like this:

store i8* %p, i8** %q
%p2 = load i8*, i8** %q ; p2 is p

If we have MemorySSA, this can be resolved in constant time. I believe BasicAliasAnalysis can get benefit this, hence supporting partially flow sensitive analysis.

Aug 5 2018, 7:53 PM

Aug 4 2018

hfinkel added a comment to D50295: Let GetUnderlyingObject/Objects use MemorySSA.

Seems like an interesting idea. What's the motivation for doing this?

Aug 4 2018, 5:47 AM

Aug 3 2018

hfinkel added a comment to D49281: [Unroll/UnrollAndJam/Vectorizer/Distribute] Add followup loop attributes..
... As such, how to fall back when the transformation doesn't happen is almost equally important as what to do next when the transformation happens.
Aug 3 2018, 4:22 PM

Aug 1 2018

hfinkel accepted D48100: Append new attributes to the end of an AttributeList..

Assuming that @aaron.ballman is still okay with this, I think that we should move forward. @erichkeane, I suspect that fixing the ordering problems, which were often arbitrary before and are still so, is a larger change that we'd want to split out regardless (not to mention the fact that we'd need to decide how to fix them). Erich, is that alright with you?

Aug 1 2018, 5:59 PM
hfinkel added a comment to D48348: [ADT] Add zip_longest iterators..

Getting the size may require iterating the sequence in advance, which I tried to avoid in D48100.

Yeah, it's a tradeoff there, to be sure. Not sure if anyone else has thoughts on the design tradeoffs here - would hope other folks might chime in.

Aug 1 2018, 5:41 PM
hfinkel accepted D48808: [CodeGen] Emit parallel_loop_access for each loop in the loop stack..

LGTM

Aug 1 2018, 5:32 PM
hfinkel added inline comments to D46313: [clang] Add WriteOnly attribute.
Aug 1 2018, 2:21 PM
hfinkel added inline comments to D50055: Update the coding standard about NFC changes and whitespace.
Aug 1 2018, 2:08 PM
hfinkel added a comment to D47849: [OpenMP][Clang][NVPTX] Enable math functions called in an OpenMP NVPTX target device region to be resolved as device-native function calls.

The problem is that the inline assembly might actually be for the target, instead of the host, because we also have target preprocessor macros defined, and it's going to be hard to tell. I'm not sure that there's a great solution here, and I agree that having something more general than undefining some specific things that happen to matter for math.h would be better. As you point out, this is not just a system-header problem. We might indeed want to undefine all of the target-feature-related macros (although that won't always be sufficient, because we need basic arch macros for the system headers to work at all, and those are generally enough to guard some inline asm).

I think there was a reason for pulling in the host defines. I'd have to look at the commit message though...

Aug 1 2018, 9:05 AM
hfinkel accepted D50121: [PowerPC] Do not round values prior to converting to integer.

LGTM

Aug 1 2018, 8:40 AM
hfinkel accepted D50083: [SelectionDAG] Make binop reduction matcher available to all targets.

LGTM

Aug 1 2018, 8:34 AM
hfinkel added a comment to D47849: [OpenMP][Clang][NVPTX] Enable math functions called in an OpenMP NVPTX target device region to be resolved as device-native function calls.

Hrmm. Doesn't that make it so that whatever functions are implemented using that inline assembly will not be callable from target code (or, perhaps worse, will crash the backend if called)?

You are right :-(

However I'm getting worried about a more general case, not all inline assembly is guarded by #ifdefs that we could hope to get right. For example take sys/io.h which currently throws 18 errors when compiling with offloading to GPUs, even with -O0. The inline assembly is only guarded by #if defined __GNUC__ && __GNUC__ >= 2 which should be defined by any modern compiler claiming compatibility with GCC. I'm not sure this particular header will ever end up in an OpenMP application, but others with inline assembly will. From a quick grep it looks like some headers dealing with atomic operations have inline assembly and even eigen3/Eigen/src/Core/util/Memory.h for finding the cpuid.

Coming back to the original problem: Maybe we need to undefine optimization macros as in your patch to get as many correct inline functions as possible AND ignore errors from inline assembly as in my patch to not break when including weird headers?

Aug 1 2018, 8:32 AM

Jul 31 2018

hfinkel added inline comments to D50121: [PowerPC] Do not round values prior to converting to integer.
Jul 31 2018, 6:07 PM
hfinkel added a comment to D47849: [OpenMP][Clang][NVPTX] Enable math functions called in an OpenMP NVPTX target device region to be resolved as device-native function calls.
  1. Incidentally I ran into a closely related problem: I can't #include <math.h> in translation units compiled for offloading, Clang complains about inline assembly for x86 (see below). Does that work for you?
In file included from /usr/include/math.h:413:
/usr/include/bits/mathinline.h:131:43: error: invalid input constraint 'x' in asm
  __asm ("pmovmskb %1, %0" : "=r" (__m) : "x" (__x));
                                          ^
/usr/include/bits/mathinline.h:143:43: error: invalid input constraint 'x' in asm
  __asm ("pmovmskb %1, %0" : "=r" (__m) : "x" (__x));
                                          ^
2 errors generated.

Hrmm. I thought that we had fixed that already.

In case it's helpful, in an out-of-tree experimental target I have I ran into a similar problem, and to fix that I wrote the following code in the target's getTargetDefines function (in lib/Basic/Targets):

// If used as an OpenMP target on x86, x86 target feature macros are defined. math.h
// and other system headers will include inline asm if these are defined.
Builder.undefineMacro("__SSE2_MATH__");
Builder.undefineMacro("__SSE_MATH__");

Just found another workaround:

diff --git a/lib/Sema/SemaStmtAsm.cpp b/lib/Sema/SemaStmtAsm.cpp
index 0db15ea..b95f949 100644
--- a/lib/Sema/SemaStmtAsm.cpp
+++ b/lib/Sema/SemaStmtAsm.cpp
@@ -306,7 +306,9 @@ StmtResult Sema::ActOnGCCAsmStmt(SourceLocation AsmLoc, bool IsSimple,
 
     TargetInfo::ConstraintInfo Info(Literal->getString(), InputName);
     if (!Context.getTargetInfo().validateInputConstraint(OutputConstraintInfos,
-                                                         Info)) {
+                                                         Info) &&
+        !(Context.getLangOpts().OpenMPIsDevice &&
+          Context.getSourceManager().isInSystemHeader(AsmLoc))) {
       return StmtError(Diag(Literal->getLocStart(),
                             diag::err_asm_invalid_input_constraint)
                        << Info.getConstraintStr());

This will ignore all errors during OpenMP device codegen from system headers when the inline assembly is not used. In that case (calling signbit) you'll get

In file included from math.c:2:
In file included from /usr/include/math.h:413:
/usr/include/bits/mathinline.h:143:10: error: couldn't allocate input reg for constraint 'x'
  __asm ("pmovmskb %1, %0" : "=r" (__m) : "x" (__x));
         ^
1 error generated.

Not sure if that's acceptable...

Jul 31 2018, 12:27 PM
hfinkel added inline comments to D50083: [SelectionDAG] Make binop reduction matcher available to all targets.
Jul 31 2018, 11:10 AM

Jul 30 2018

hfinkel added inline comments to D49978: LangRef: Clarify expected sNaN behavior for minnum/maxnum.
Jul 30 2018, 12:58 PM

Jul 19 2018

hfinkel added inline comments to D49144: [FunctionAttrs] Infer the speculatable attribute.
Jul 19 2018, 4:57 PM
hfinkel added inline comments to D49144: [FunctionAttrs] Infer the speculatable attribute.
Jul 19 2018, 2:00 PM
hfinkel accepted D49041: [LangRef] Clarify undefined behavior for function attributes..

LGTM

Jul 19 2018, 10:01 AM

Jul 16 2018

hfinkel accepted D49337: Skip debuginfo intrinsic in markLiveBlocks..

LGTM

Jul 16 2018, 4:01 PM

Jul 15 2018

hfinkel added inline comments to D49337: Skip debuginfo intrinsic in markLiveBlocks..
Jul 15 2018, 7:37 PM

Jul 14 2018

hfinkel added a comment to D49337: Skip debuginfo intrinsic in markLiveBlocks..

Hi @hfinkel.

Thank you for taking a look at this code. I agree we should definitely "else" the StoreInst check and I have observed improvements in Xcode Instrument for doing so.

And I should also move the DbgInfoIntrinsic check inside the IntrinsicInst check by comparing IntrinsicID.

The problem with moving the IntrinsicInst checks inside the dyn_cast<CallInst> is that every intrinsic call will need to go through a dyn_cast<CallInst> check, a getCalledValue() and then a dyn_cast<Function>(CalledValue) check. If we have a lot of intrinsic which we do when compiling with debuginfo, we could end up being overall slower than keeping the IntrinsicInst check.

When we compile without debuginfo, moving the intrinsic check inside the CallInst check will not necessarily bring us performance, as now every CallInst needs to checks for the existing cases in the if (dyn_cast<CallInst>)) and the Intrinsics (which require dyn_cast<Function>(CalledValue)). 4

Jul 14 2018, 7:52 PM
hfinkel added a comment to D49337: Skip debuginfo intrinsic in markLiveBlocks..

There's nothing special about debug intrinsics here except that there are a lot of them. The problem, as far as I can tell, is that we're repeatedly using dyn_cast on each instruction and doing multiple redundant tests. Adding yet another redundant test will help having a lot of debug intrinsics makes things incrementally more expensive for all other kinds of intrinsics. Thus, this doesn't seem like the right way to fix this. Instead of testing for IntrinsicInst, and then CallInst (which is always true whenever the IntrinsicInst is true), and then we always test for StoreInst (but we don't use an 'else' so we always do this test even when the IntrinsicInst/CallInst is true (which will include debug intrinsics)).

Jul 14 2018, 6:37 AM

Jul 13 2018

hfinkel accepted D47963: [LangRef] nnan and ninf produce poison..

LGTM

Jul 13 2018, 7:57 PM
hfinkel added inline comments to D49144: [FunctionAttrs] Infer the speculatable attribute.
Jul 13 2018, 7:52 PM
hfinkel added a comment to D44564: [BasicAA] Use PhiValuesAnalysis if available when handling phi alias.

We have a check in aliasSameBasePointerGEPs to avoid PR32314 (which I'll glibly summarize as: we need to be careful about looking through PHIs so that we don't, without realizing it, end up comparing values from different loop iterations). The fact that this looks back through an unknown number of PHIs makes me concerned that we'll get into a similar situation.

Jul 13 2018, 2:17 PM
hfinkel accepted D48887: [TableGen] Suppress type validation when parsing pattern fragments.

LGTM

Jul 13 2018, 9:32 AM
hfinkel updated the diff for D49144: [FunctionAttrs] Infer the speculatable attribute.

Misc cleanup and try to validate ret attrs instead of disqualifying them all.

Jul 13 2018, 8:10 AM

Jul 12 2018

hfinkel updated the diff for D49144: [FunctionAttrs] Infer the speculatable attribute.

Updated to not infer speculatable if there are instructions with non-debug metadata, loads with alignment requirements that we can't independently prove, and for functions with value-constraining return-value attributes.

Jul 12 2018, 7:35 PM
hfinkel accepted D48239: [LangRef] Clarify meaning of "dereferencable" attribute/metadata..

PointerMayBeCaptured doesn't do the right thing. The primary issue is that it only walks the uses of the argument; other pointers could alias the argument if it isn't also noalias. (The other issue is that freeing a pointer doesn't count as capturing it.)

Ah, indeed. You're certainly correct. And free is marked as nocapture (which we'd likely prefer to keep, although free almost certainly does capture the pointer value in some physical sense, it does not do so in a way that will be visible to the rest of the program (except that it might be returned by some later malloc call)).

We'd need to model this directly for it to be useful (as an attribute that means that the function doesn't call free, or doesn't free a particular argument, or similar). I'll send an RFC about that.

Jul 12 2018, 4:16 PM
hfinkel added a comment to D48545: [RFC v2] "Alternative" matches for TableGen DAG patterns.

LGTM (there are a lot of changes here, but given that it produces no changes to existing matching tables, that seems like pretty good test coverage).

Thanks for the review!

Note that in order to avoid spurious changes to other targets, I've for now still restricted the new inference code for the isBitcast and hasChain flags to patterns that originated from Instruction nodes. In general, it would probably be better to remove this final difference between instruction patterns and "regular" patterns, but that should IMO only be done after analysing the specific impacts on existing targets, so this should be done as a follow-on patch.

I definitely encourage resolving these in follow up. I think it will be better if this works uniformly.

OK, I'll work on this as a follow-up.

  1. Theoretically, a PatFrags instance can now have an empty Fragments list -- this would simply never match anything. Should this be explicitly forbidden? On the other hand, we might also use this to implement "null_frag" without hardcoding its name in TableGen ...

I don't see any reason to disallow it. Getting rid of null_frag as a special case seems nice.

OK, I'll continue to allow empty Fragments. Unfortunately, getting rid of null_frag is more difficult than I expected: since current code checks for null_frag so very early, many backends get away with patterns that are actually invalid even structurally (number of results don't match number of users; use of some TableGen operators inside a pattern that aren't even dag nodes at all) as long as there's a null_frag anywhere in there as well. Trying to define a null_frag as a PatFrags with empty fragment list means that TableGen would still ignore the pattern, but only after doing a structural parse first -- which now errors out.

Jul 12 2018, 3:26 PM
hfinkel accepted D48545: [RFC v2] "Alternative" matches for TableGen DAG patterns.

LGTM (there are a lot of changes here, but given that it produces no changes to existing matching tables, that seems like pretty good test coverage).

Jul 12 2018, 11:57 AM
hfinkel added a comment to D46842: [OpenMP][libomptarget] Make bitcode library building depend on clang and llvm-linker being available .

I'm still opposed to this, as stated in my many comments. If at all, you need to make sure that this library doesn't rebuild whenever you do an unrelated change to Clang.

Jul 12 2018, 7:25 AM

Jul 10 2018

hfinkel added a comment to D49144: [FunctionAttrs] Infer the speculatable attribute.

I think marking a function speculatable requires stripping metadata from any loads in that function, to avoid UB. Does that sound reasonable?

Jul 10 2018, 7:21 PM
hfinkel added inline comments to D49165: Add, and infer, a nofree function attribute.
Jul 10 2018, 7:07 PM
hfinkel created D49165: Add, and infer, a nofree function attribute.
Jul 10 2018, 6:29 PM
hfinkel added a comment to D49144: [FunctionAttrs] Infer the speculatable attribute.

My assumption here is that we need to assume that any loads performed by the function, and any incoming arguments, might be poison. As a result, we can't have any branches, PHIs, or stores (even to local static allocas) - because doing any of these things with poison values is UB. Is that correct?

Control flow that might lead to infinite loops might be a problem, but why not allow at least acyclic control flow?

If the value stored to an alloca is thrown away after the ret, why is it a problem to be executed speculatively?

Jul 10 2018, 3:51 PM
hfinkel added inline comments to D49144: [FunctionAttrs] Infer the speculatable attribute.
Jul 10 2018, 12:49 PM
hfinkel added a comment to D49139: [OptRemark] Demangle symbols when emitting remarks.

I specifically didn't want to do this for the YAML output. The tool consuming the YAML should demangle if it wants. It's hard to go backward, and so if the tool needs to map back to symbols in the program, it needs the mangled name. That having been said:

Jul 10 2018, 10:12 AM
hfinkel added inline comments to D49041: [LangRef] Clarify undefined behavior for function attributes..
Jul 10 2018, 9:21 AM
hfinkel created D49144: [FunctionAttrs] Infer the speculatable attribute.
Jul 10 2018, 9:19 AM

Jul 9 2018

hfinkel added a comment to D48239: [LangRef] Clarify meaning of "dereferencable" attribute/metadata..

PointerMayBeCaptured doesn't do the right thing. The primary issue is that it only walks the uses of the argument; other pointers could alias the argument if it isn't also noalias. (The other issue is that freeing a pointer doesn't count as capturing it.)

Jul 9 2018, 7:04 PM
hfinkel added a comment to D49114: [clang-tidy] Add a check for "magic numbers".

This version detects and report integers only. If there is interest of merging the tool I can add the functionality for floats as well.

Jul 9 2018, 6:47 PM
hfinkel added a comment to D48239: [LangRef] Clarify meaning of "dereferencable" attribute/metadata..

You're correct about getPointerDereferenceableBytes, but all uses go though isDereferenceableAndAlignedPointer (isDereferenceableAndAlignedPointer is really the only caller of getPointerDereferenceableBytes, as far as I know), and hooking this into isDereferenceableAndAlignedPointer should be straightforward because it takes a context instruction

Well, not all the users pass in the context parameter, since it's optional, but otherwise, that's all we need in theory.

Jul 9 2018, 5:29 PM
hfinkel accepted D47854: [LangRef] Clarify semantics of load metadata..

ping

Jul 9 2018, 3:41 PM

Jul 8 2018

hfinkel added inline comments to D49041: [LangRef] Clarify undefined behavior for function attributes..
Jul 8 2018, 2:47 PM

Jul 6 2018

hfinkel added a comment to D48239: [LangRef] Clarify meaning of "dereferencable" attribute/metadata..

It's straightforward to define the alternative semantics where it only applies at the point of the call/load. And it would still be useful to the optimizer. But the optimizer code would have to be written from scratch; the existing getPointerDereferenceableBytes API isn't usable with an attribute like that. It's probably worth doing at some point, though: we could prove other interesting things with the context-sensitive analysis, though. For example, we could prove that a pointer is dereferenceable using a previous load or store operation.)

Jul 6 2018, 8:06 PM
hfinkel added a comment to D48239: [LangRef] Clarify meaning of "dereferencable" attribute/metadata..

IMO, we really need to add the other attribute to the IR.

If we codify these semantics, we need to stop Clang from using this attribute. But we shouldn't just rip all that code out only to re-add it once the new attribute is in place. I would seem cleaner to add the new attribute with the desired semantics (even if it isn't (yet) wired up to the optimizer) so that we can just switch Clang from one attribute to the other.

Jul 6 2018, 8:03 PM

Jul 4 2018

hfinkel added a comment to D48663: [Power9] Code Cleanup - Remove needsAggressiveScheduling().

Hi Hal, Eric,

I ran SPEC2017 on a P9 machine and difference between bi-directional and bottom up scheduling was very small overall. The bi-directional scheduling continues to be a bit better with
xalancbmk - 0.5% better on bi-directional
leela - 2.0% better on bi-directional

Changes for all of the other benchmarks were too small.

Jul 4 2018, 11:09 AM
hfinkel added a comment to D48721: Patch to fix pragma metadata for do-while loops.

I encountered the issue while working with the unroller and found that it was not following the pragma info, and traced it back to the issue with metadata.
As far as I understood, for for-loops and while-loops, we add the metadata only to the loop back-edge. So it would make sense to keep them consistent.
I'm not an expert in clang, and do not know how we can detect such problems.

Jul 4 2018, 10:20 AM · Restricted Project
hfinkel added a comment to D48721: Patch to fix pragma metadata for do-while loops.

Is the fault that the metadata only should be put on the back edge, not the branch in the preheader?

Yea. Our past thinking has been that any backedge in the loop is valid. The metadata shouldn't end up other places, although it's benign unless those other places are (or may later become) a backedge for some different loop.

I'm no expert on clang. The patch seems to fix the problem if the goal is to only add the loop-metadata on the backedge , but I'll leave it to someone else to approve it.

I'm a little bit concerned about opt not detecting this kind of problems though.
Would it be possible for some verifier to detect if we have loop metadata on some branch that aren't in the latch block?
And/or should the optimization that "merges" two branches with different loop metadata), be smarter about which loop metadata to keep? Or maybe we should be defensive and discard loop metadata on the merged branch instruction?

Jul 4 2018, 10:16 AM · Restricted Project

Jul 2 2018

hfinkel added a comment to D48721: Patch to fix pragma metadata for do-while loops.

I tried running

/clang -cc1 -O3 -funroll-loops -S -emit-llvm pragma-do-while-unroll.cpp -o - -mllvm -print-after-all

and I get this

...
!2 = distinct !{!2, !3}
!3 = !{!"llvm.loop.unroll.count", i32 3}
!4 = !{!5, !5, i64 0}
!5 = !{!"int", !6, i64 0}
!6 = !{!"omnipotent char", !7, i64 0}
!7 = !{!"Simple C/C++ TBAA"}
!8 = distinct !{!8, !9}
!9 = !{!"llvm.loop.unroll.count", i32 5}
*** IR Dump After Combine redundant instructions ***
; Function Attrs: nounwind
define i32 @test(i32* %a, i32 %n) local_unnamed_addr #0 {
entry:
  br label %do.body, !llvm.loop !2

do.body:                                          ; preds = %do.body, %entry
  %i.0 = phi i32 [ 0, %entry ], [ %inc, %do.body ]
  %sum.0 = phi i32 [ 0, %entry ], [ %add5, %do.body ]
  %0 = zext i32 %i.0 to i64
  %arrayidx = getelementptr inbounds i32, i32* %a, i64 %0
  %1 = load i32, i32* %arrayidx, align 4, !tbaa !4
  %add = add nsw i32 %1, 1
  store i32 %add, i32* %arrayidx, align 4, !tbaa !4
  %add5 = add nsw i32 %sum.0, %add
  %inc = add nuw nsw i32 %i.0, 1
  %cmp = icmp slt i32 %inc, %n
  br i1 %cmp, label %do.body, label %do.end, !llvm.loop !2

do.end:                                           ; preds = %do.body
  br label %do.body6, !llvm.loop !8

do.body6:                                         ; preds = %do.body6, %do.end
  %i.1 = phi i32 [ 0, %do.end ], [ %inc15, %do.body6 ]
  %sum.1 = phi i32 [ %add5, %do.end ], [ %add14, %do.body6 ]
  %2 = zext i32 %i.1 to i64
  %arrayidx8 = getelementptr inbounds i32, i32* %a, i64 %2
  %3 = load i32, i32* %arrayidx8, align 4, !tbaa !4
  %add9 = add nsw i32 %3, 1
  store i32 %add9, i32* %arrayidx8, align 4, !tbaa !4
  %add14 = add nsw i32 %sum.1, %add9
  %inc15 = add nuw nsw i32 %i.1, 1
  %cmp17 = icmp slt i32 %inc15, %n
  br i1 %cmp17, label %do.body6, label %do.end18, !llvm.loop !8

do.end18:                                         ; preds = %do.body6
  ret i32 %add14
}
*** IR Dump After Simplify the CFG ***
; Function Attrs: nounwind
define i32 @test(i32* %a, i32 %n) local_unnamed_addr #0 {
entry:
  br label %do.body, !llvm.loop !2

do.body:                                          ; preds = %do.body, %entry
  %i.0 = phi i32 [ 0, %entry ], [ %inc, %do.body ]
  %sum.0 = phi i32 [ 0, %entry ], [ %add5, %do.body ]
  %0 = zext i32 %i.0 to i64
  %arrayidx = getelementptr inbounds i32, i32* %a, i64 %0
  %1 = load i32, i32* %arrayidx, align 4, !tbaa !4
  %add = add nsw i32 %1, 1
  store i32 %add, i32* %arrayidx, align 4, !tbaa !4
  %add5 = add nsw i32 %sum.0, %add
  %inc = add nuw nsw i32 %i.0, 1
  %cmp = icmp slt i32 %inc, %n
  br i1 %cmp, label %do.body, label %do.body6, !llvm.loop !8

do.body6:                                         ; preds = %do.body, %do.body6
  %i.1 = phi i32 [ %inc15, %do.body6 ], [ 0, %do.body ]
  %sum.1 = phi i32 [ %add14, %do.body6 ], [ %add5, %do.body ]
  %2 = zext i32 %i.1 to i64
  %arrayidx8 = getelementptr inbounds i32, i32* %a, i64 %2
  %3 = load i32, i32* %arrayidx8, align 4, !tbaa !4
  %add9 = add nsw i32 %3, 1
  store i32 %add9, i32* %arrayidx8, align 4, !tbaa !4
  %add14 = add nsw i32 %sum.1, %add9
  %inc15 = add nuw nsw i32 %i.1, 1
  %cmp17 = icmp slt i32 %inc15, %n
  br i1 %cmp17, label %do.body6, label %do.end18, !llvm.loop !8

do.end18:                                         ; preds = %do.body6
  ret i32 %add14
}
...

So up until simplifyCFG I think things look pretty good with different loop-metadata for the two loops. But when simplifyCFG is tranforming

  br i1 %cmp, label %do.body, label %do.end, !llvm.loop !2

do.end:                                           ; preds = %do.body
  br label %do.body6, !llvm.loop !8

into

br i1 %cmp, label %do.body, label %do.body6, !llvm.loop !8

we get incorrect metadata for one branch target.

Is the fault that the metadata only should be put on the back edge, not the branch in the preheader?

Jul 2 2018, 6:02 PM · Restricted Project
hfinkel added a comment to D48808: [CodeGen] Emit parallel_loop_access for each loop in the loop stack..

We specifically defined the metadata to support nested loops. The LangRef says, "The llvm.mem.parallel_loop_access metadata refers to a loop identifier, or metadata containing a list of loop identifiers for nested loops." To handle nested loops, we need to make the instruction metadata point to a list of loop IDs.

Thanks for pointing me to it. I've not seen parallel_loop_access pointing to a list, so I didn't know it was possible.

Jul 2 2018, 11:21 AM
hfinkel added a comment to D48808: [CodeGen] Emit parallel_loop_access for each loop in the loop stack..

Michael, can you please add a test with two inner loops, one where more than one is annotated, and one where only the outer loop is annotated? It's not clear to me that I->setMetadata will do the right thing here in the former case.

The test case pragma-loop-safety-nested.cpp check the case where the outer loop is annotated.

Iterating over Active will iterate from outer to inner loops, meaning I->setMetadata will overwrite the annotation of the outermost loop (which IMHO is the most useful behaviour). Since it not possible to add multiple !llvm.mem.parallel_loop_access annotation, we cannot annotate multiple loops.

Jul 2 2018, 11:00 AM
hfinkel added a comment to D48808: [CodeGen] Emit parallel_loop_access for each loop in the loop stack..

I don't think that this is the intended behavior of the #pragma clang loop. it is better to ask the author of this pragma is this correct or not.

Jul 2 2018, 8:41 AM

Jul 1 2018

hfinkel accepted D48813: [PowerPC] Don't make it as pre-inc candidate if displacement isn't 4's multiple for i64 pre-inc load/store.

LGTM

Jul 1 2018, 9:15 PM

Jun 29 2018

hfinkel added a comment to D48663: [Power9] Code Cleanup - Remove needsAggressiveScheduling().

Hi Eric,

We set the generic scheduler to bi-directional scheduling back in 2013. It was at the time when the generic machine scheduler was added to PowerPC. I can only assume that when we did this we ran a set of benchmarks and determined that this is the best option based on that.
However, what I can do now is run a round of SPEC with Top-Of-Trunk and confirm that we still do better in the bi-directional case. (Or at least not worse.)

Jun 29 2018, 11:14 AM

Jun 26 2018

hfinkel accepted D48424: LoopUnroll: Allow analyzing intrinsic call costs.

LGTM

Jun 26 2018, 10:30 AM

Jun 21 2018

hfinkel added inline comments to D6260: Add -mlong-double-64 flag.
Jun 21 2018, 6:51 AM

Jun 19 2018

hfinkel added a comment to D48326: [RFC] "Alternative" matches for TableGen DAG patterns.

I've not looked at the patch itself, but regarding the general functionality: Nice! I've wanted this for as long as I've been working on LLVM :-)

Jun 19 2018, 11:30 AM

Jun 15 2018

hfinkel added a comment to D48025: [PowerPC] avoid masking already-zero bits in BitPermutationSelector.

Seems like a good idea. A couple of questions below.

Jun 15 2018, 4:17 PM

Jun 14 2018

hfinkel added a comment to D47963: [LangRef] nnan and ninf produce poison..

I don't think this is right. Saying the result is undefined seems like what we intend. We might choose, as an implementation technique, to limit how we take advantage of that undefinedness in certain cases, but the violating the constraint still produces logical inconsistencies that can transfer to other parts of the code, and in general, turn into any other kind of undefined behavior (depending on the structure of the code).

Jun 14 2018, 12:12 PM

Jun 12 2018

hfinkel added a comment to D48066: Add one more No-alias case to alias analysis..
knownZeros(n) == (knownZeros(idx) << 1) | 1 and
knownOnes(n) == knownOnes(idx) << 1

I meant if nothing is known about idx, we fail to get No-alias opportunity. If there is nothing for KnownBits, let's just check odd number and the idx is used by both offsets as below.

if (success of computeKnownBits with `n` and `idx`) {
  knownZeros(n) == (knownZeros(idx) << 1) | 1 and
  knownOnes(n) == knownOnes(idx) << 1
} else {
  check just LHS is odd number and RHS is shared by offsets.
}

How do you think about it? @hfinkel

Jun 12 2018, 7:12 AM
hfinkel added a comment to D48066: Add one more No-alias case to alias analysis..

I think that this optimization is going to be very fragile in practice without also handling the case where both sides look like (m-idx). Nevertheless, it's good to make AA changes in minimal small increments. Let's try this single case for now.

Jun 12 2018, 6:17 AM
hfinkel added a comment to D48066: Add one more No-alias case to alias analysis..

I put this on the llvm-dev thread, but to repeat it here:

Jun 12 2018, 5:24 AM
hfinkel added inline comments to D47854: [LangRef] Clarify semantics of load metadata..
Jun 12 2018, 2:47 AM