hfinkel (Hal Finkel)
User

Projects

User does not belong to any projects.

User Details

User Since
Nov 16 2012, 6:02 PM (312 w, 4 d)

Recent Activity

Mon, Nov 12

hfinkel added a comment to D53184: [LangRef] Clarify semantics of volatile operations..

LGTM

Mon, Nov 12, 10:00 PM

Wed, Nov 7

hfinkel added a comment to D53157: Teach the IRBuilder about constrained fadd and friends.
In D53157#1290266, @kpn wrote:

Adding the usual suspects: @andrew.w.kaylor @craig.topper @uweigand @hfinkel

I don't see the benefit of this change. The intrinsics are good enough to be functional right now (minus a few that are in progress), so I'm not sure we need IRBuilder functions. Am I missing something?

The IRBuilder is used in clang. Once the use of the pragma is wired down to clang's codegen we need a way to emit the constrained intrinsics. It makes for very readable code to change a call to Builder.CreateFAdd() so it conditionally calls Builder.CreateConstrainedFAdd() instead. And if CreateConstrainedFAdd() returns something other than a call to an intrinsic then clang doesn't care.

Wed, Nov 7, 9:18 AM

Fri, Nov 2

hfinkel accepted D54018: Update the relicensing website to better explain the list of companies, and update the list as some more signatures have come in..

LGTM

Fri, Nov 2, 7:53 AM
hfinkel added a comment to D53203: Allow MemoryLocation to carry pre-existing knowledge to AA to elide expensive repeated checks.

I'm not very familiar with the MemorySSA pass but based on a fairly quick skim of it ...
It looks like the MemoryDef/MemoryUse objects are attached to the BB rather than the instructions.

Fri, Nov 2, 7:52 AM

Thu, Nov 1

hfinkel added a comment to D43142: Experimental pass to convert all floating point operations to the equivalent constrained intrinsics.

I don't think this will work. The ConstantFolder could fold away traps early in the IRBuilder. We also wouldn't catch the FNeg/FSub case either.

I think Clang would need to explicitly generate the constrained intrinsics.

Thu, Nov 1, 4:44 PM
hfinkel added a comment to D43142: Experimental pass to convert all floating point operations to the equivalent constrained intrinsics.
In D43142#1284190, @kpn wrote:

...

It's unclear to me how LTO (or other cross file inlining) would work here. I haven't given it much though until now. My knee-jerk reaction is that we shouldn't be inlining from a FENV_ACCESS=OFF to FENV_ACCESS=ON location.

Thu, Nov 1, 4:28 PM
hfinkel accepted D51844: Allow subclassing ExternalAA.

Somewhat unfortunate amount of boilerplate for each target AA pass, but I don't have a better suggestion right now.

Thu, Nov 1, 9:05 AM

Wed, Oct 31

hfinkel added a comment to D53203: Allow MemoryLocation to carry pre-existing knowledge to AA to elide expensive repeated checks.

Hi Hal,

I didn't spot you in CODE_OWNERS.txt at first because I was searching for 'Alias' instead of 'alias' :-). Would you be able to take a look at this patch too?

Wed, Oct 31, 7:18 PM

Thu, Oct 18

hfinkel added a comment to D53362: [Prototype] Heap-To-Stack Conversion Pass.

I think you have to be careful about something like the following:

void f() {
  void *p = malloc(4);
  nocapture_func_frees_pointer(p);
  func_throws();
  free(p);
}

I don't think it's possible to trigger the issue at the moment, though, because we don't have a mustreturn attribute (so isGuaranteedToTransferExecutionToSuccessor() will return false for nocapture_func_frees_pointer).

Thu, Oct 18, 7:15 AM

Wed, Oct 17

hfinkel updated the diff for D53362: [Prototype] Heap-To-Stack Conversion Pass.

Add some optimization remarks. Now we'll get things like his:

Wed, Oct 17, 4:52 PM
hfinkel added a comment to D53362: [Prototype] Heap-To-Stack Conversion Pass.

Add the pointer-capturing check when potential exits are found.

Not sure if enough, since @efriedma noted in D47345:

"Your "capture" function captures the pointer, but it's possible to write a function which takes the pointer as a parameter but doesn't capture it. For example, given the function void f(void *p) { free(p); }, LLVM will mark the parameter "p" nocapture."

Wed, Oct 17, 12:03 PM
hfinkel updated the diff for D53362: [Prototype] Heap-To-Stack Conversion Pass.

Fix variable initialization so that we don't infinte-loop when we refine.

Wed, Oct 17, 11:35 AM
hfinkel updated the diff for D53362: [Prototype] Heap-To-Stack Conversion Pass.

Add the pointer-capturing check when potential exits are found.

Wed, Oct 17, 10:21 AM
hfinkel created D53362: [Prototype] Heap-To-Stack Conversion Pass.
Wed, Oct 17, 3:09 AM

Mon, Oct 15

hfinkel accepted D53269: Major update to relicensing page..

LGTM

Mon, Oct 15, 9:27 AM

Oct 11 2018

hfinkel added a comment to D53184: [LangRef] Clarify semantics of volatile operations..

Currently, there's sort of a split in the LLVM source code about whether volatile operations are allowed to trap; this resolves that dispute in favor of not allowing them to trap. (This is the opposite of the position I originally took, but thinking about it a bit more it makes sense for the uses we expect.)

Oct 11 2018, 8:03 PM
hfinkel accepted D48025: [PowerPC] avoid masking already-zero bits in BitPermutationSelector.

LGTM

Oct 11 2018, 7:48 PM

Oct 10 2018

hfinkel added a comment to D48025: [PowerPC] avoid masking already-zero bits in BitPermutationSelector.

gentle ping

Oct 10 2018, 7:33 AM

Oct 9 2018

hfinkel added inline comments to D49691: [DAGCombine] Allow alias analysis with inline asm calls and GluedNodes..
Oct 9 2018, 1:17 PM

Oct 8 2018

hfinkel added a comment to D52785: [PseudoSourceValue] New category to represent floating-point status.

As far as I understand MachineMemoryOperands:

  • You are allowed to drop them
Oct 8 2018, 4:08 PM

Oct 6 2018

hfinkel accepted D52960: Add a listing of companies that have agreed to the relicensing..

LGTM

Oct 6 2018, 7:56 PM

Oct 5 2018

hfinkel accepted D52929: Update to relicensing website with developments over the past few months..

Looks great, thanks Chandler!

Oct 5 2018, 8:22 PM

Oct 4 2018

hfinkel added a comment to D52785: [PseudoSourceValue] New category to represent floating-point status.

I'm not sure where it's documented, but MMOs are definitely not guaranteed to be preserved (except in global-isel, the verifier enforces this for some of the G_* instructions). I'm not aware of anywhere specifically dropping them, but code is not required to preserve them.

Why do you say this?

Things like MachineInstr::hasOrderedMemoryRef() has handling such as:

// Otherwise, if the instruction has no memory reference information,
// conservatively assume it wasn't preserved.
if (memoperands_empty())
  return true;

I always assumed thing were allowed to drop MMOs to avoid passes having to figure out what to do when an instruction had multiple memoperands.

Oct 4 2018, 10:35 PM
hfinkel added a comment to D52785: [PseudoSourceValue] New category to represent floating-point status.

I'm not sure where it's documented, but MMOs are definitely not guaranteed to be preserved (except in global-isel, the verifier enforces this for some of the G_* instructions). I'm not aware of anywhere specifically dropping them, but code is not required to preserve them.

Oct 4 2018, 7:54 PM

Sep 25 2018

hfinkel added inline comments to D52432: [PowerPC] Remove self-copies in pre-emit peephole.
Sep 25 2018, 11:16 AM
hfinkel accepted D52432: [PowerPC] Remove self-copies in pre-emit peephole.

LGTM

Sep 25 2018, 7:15 AM
hfinkel accepted D52431: [PowerPC] Turn on CR-Logical reducer pass.

Now that we understand the reason for the only degradation, we can turn this pass on by default. A subsequent patch will detect and remove self-copies in the pre-emit peephole.

Sep 25 2018, 7:01 AM

Sep 21 2018

hfinkel added a comment to D52327: [Loop Vectorizer] Abandon vectorization when no integer IV found.

Interesting, although given that the loop runs for more than one iteration, would that i1 variable work correctly?

The i1 variable doesn't actually get used for anything; the vectorizer makes a new induction variable to track the iteration count.

Sep 21 2018, 2:50 PM
hfinkel added a comment to D52327: [Loop Vectorizer] Abandon vectorization when no integer IV found.

If there's some actual implementation work involved in supporting it, fine, I guess... but it seems like it can't be much work to support properly: even just adding an unused i1 induction variable is enough to get around the crash for the given testcase.

Sep 21 2018, 2:27 PM
hfinkel added a comment to D52362: [CloneFunction] Simplify previously unsimplifiable instructions.

Is there any compile-time impact?

I haven't checked the compile time impact. Will do. However, this is just a linear addition of instructions and it's uses - we were already looking at phis and their uses. Also, later passes (that iterate over the inlined instructions) will anyway iterate over all instructions. So, I think it should be either neutral or a win. Win when we were wastefully iterating over instructions through multiple passes until we reached the pass which could simplify these instructions.

Should we just simplify all instructions at this later point instead of having two passes?

I'll take a look, but it seems a more invasive change. We first cloneBlocks, then iterate over all instructions in the callee and insert it in the caller in right order (now the instructions have a parent block), and then have two passes over the PHINodes. The second pass over the PHINode already looks over the uses and does recursive simplification. To this list, now that all instructions have basic block parents, we add them for recursive simplification.

Sep 21 2018, 2:12 PM
hfinkel added a comment to D52362: [CloneFunction] Simplify previously unsimplifiable instructions.

Is there any compile-time impact? Should we just simplify all instructions at this later point instead of having two passes?

Sep 21 2018, 10:28 AM
hfinkel accepted D52351: [LoopVectorizer] Fix in getScalarizationOverhead().

LGTM

Sep 21 2018, 10:12 AM
hfinkel added a comment to D51994: TableGen/ISel: Allow PatFrag predicate code to access captured operands.

I'm somewhat confused by the motivation. Can you explain why the fact that the operands are commuted causes a problem. In D51995, you just seem to be looping over them anyway.

Sep 21 2018, 9:58 AM
hfinkel accepted D51993: TableGen/CodeGenDAGPatterns: addPredicateFn only once.

Any perf numbers? For instance, on x86 -gen-dag-isel

No, and I don't see why we'd care. The motivation was purely code cleanup, and I don't see how the performance could get worse.

Curiosity - I've been trying very hard to reduce tablegen build times over the past few months.

Sep 21 2018, 9:52 AM
hfinkel accepted D52345: [PowerPC] optimize conditional branch on CRSET/CRUNSET.

LGTM

Sep 21 2018, 9:27 AM
hfinkel added inline comments to D49281: [Unroll/UnrollAndJam/Vectorizer/Distribute] Add followup loop attributes..
Sep 21 2018, 8:44 AM

Sep 17 2018

hfinkel added a comment to D52164: [InstSimplify] Fold ne/eq comparison of a pointer produced by a noalias function.

int f(int n) {

int *x = (int *)malloc(n);
free(x);
int *y =  (int *)malloc(n);
return (y == x) ;

}

This has been already folded to true even before this patch.

Sep 17 2018, 11:56 AM
hfinkel added a comment to D52164: [InstSimplify] Fold ne/eq comparison of a pointer produced by a noalias function.

disable this fold when comparing noalias ptr with null

Sep 17 2018, 11:52 AM
hfinkel requested changes to D52164: [InstSimplify] Fold ne/eq comparison of a pointer produced by a noalias function.

Unfortunately, I don't think that we can do this. noalias semantics only provide information about the overlap of memory accesses using the resulting pointers, in the context of other necessary constraints, and does not strongly constrain the pointer values. For example, it might be that:

Sep 17 2018, 10:53 AM

Sep 11 2018

hfinkel accepted D51108: [PowerPC] Fix wrong ABI for i1 stack arguments on PPC32.

LGTM

Sep 11 2018, 8:26 AM

Aug 27 2018

hfinkel accepted D49978: LangRef: Clarify expected sNaN behavior for minnum/maxnum.

ping

Aug 27 2018, 10:00 AM
hfinkel updated subscribers of D50987: [Exception Handling] Unwind tables are required for all functions that have an EH personality..

I'm not 100% sure if we can remove the default option -funwind-tables for x86.
I know that using -funwind-tables prevents (or hides) this issue but it may be required for another reason that this fix does not cover. I don't actually know what the initial intention was when that default was added.
I think that someone on the x86 side would be a better judge of this.

Aug 27 2018, 7:51 AM

Aug 25 2018

hfinkel added a comment to D50987: [Exception Handling] Unwind tables are required for all functions that have an EH personality..

Does this mean that we can remove that previously-discussed x86 logic in Clang?

Aug 25 2018, 5:09 PM
hfinkel added inline comments to D51108: [PowerPC] Fix wrong ABI for i1 stack arguments on PPC32.
Aug 25 2018, 2:53 PM
hfinkel added a comment to D37624: add support for -fno-instrument-functions and -finstrument-functions-exclude-{file,function}-list=<arg1,arg2,...> to match gcc options..

Hello folks, is there a plan to merge this feature still?

Aug 25 2018, 2:37 PM

Aug 23 2018

hfinkel added inline comments to D51108: [PowerPC] Fix wrong ABI for i1 stack arguments on PPC32.
Aug 23 2018, 1:52 PM

Aug 17 2018

hfinkel added a comment to D50908: [PowerPC] Enable -funwind-tables by default on powerpc.

Also, if you know, should this actually be the default behavior for all targets? I'm somewhat surprised if x86 and PPC are special cases.

Aug 17 2018, 11:21 AM
hfinkel added a comment to D50908: [PowerPC] Enable -funwind-tables by default on powerpc.

Interesting. Why? Does this effectively match GCC's behavior?

Aug 17 2018, 11:20 AM

Aug 16 2018

hfinkel added a comment to D50845: [CUDA/OpenMP] Define only some host macros during device compilation.

As a result, we should really have a separate header that has those actually-available functions. When targeting NVPTX, why don't we have the included math.h be CUDA's math.h? In the end, those are the functions we need to call when we generate code. Right?

That's what D47849 deals with.

Aug 16 2018, 12:25 PM
hfinkel added a comment to D50845: [CUDA/OpenMP] Define only some host macros during device compilation.

Another option would be to implement some sort of attribute-based overloading. Then OpenMP can provide its own version of the device-side library function without clashing with system headers.

Aug 16 2018, 11:54 AM

Aug 15 2018

hfinkel added a comment to D50791: [SelectionDAG] unroll unsupported vector FP ops earlier to avoid libcalls on undef elements (PR38527).

Thinking forward to a future where we have SVML library support for v4f32 and a v8f32 sin/cos. What should we do for v5f32?

Not sure. Over in:
https://bugs.llvm.org/show_bug.cgi?id=38528
...we discussed transforming to veclib calls late in IR, so any special lib support would already be taken care of before we reach this point?

Aug 15 2018, 4:40 PM

Aug 14 2018

hfinkel added a comment to D45151: [LICM] Hoisting invariant.group loads.

I'm sorry I can't be more decisive here since I wasn't deeply involved with the devirtualization work early on and so lack a lot of context. The general direction here seems fine to me -- given that we use metadata to express a large range of things, it seems ok to have metadata specific MD dropping policies. However, given that Hal originally objected to this, we should make sure he is on board.

Aug 14 2018, 5:21 PM

Aug 13 2018

hfinkel added inline comments to D50680: [SDAG] Remove the reliance on MI's allocation strategy for `MachineMemOperand` pointers attached to `MachineSDNodes` and instead have the `SelectionDAG` fully manage the memory for this array..
Aug 13 2018, 6:05 PM
hfinkel accepted D50679: [BasicAA] Don't assume tail calls with byval don't alias allocas.
In D50679#1198248, @rnk wrote:

The DSE test is fine, and I think that you should keep it, but can you please add a direct AA test (like, e.g., test/Analysis/BasicAA/cs-cs.ll)?

Should we just always return ModRefInfo::Ref in this case? It can read the data (to copy it), but can't write, correct?

I thought about that, but I think the code below this check is usually more precise in these cases. It tests if the alloca isNonEscapingLocalObject and then looks at each argument to see if it is actually used by the call. For local objects that *are* escaping, we could return ModRefInfo::Ref. Basically, we already get this test case:

define void @dead(i32* byval %x) {
entry:
  %p = alloca i32
  %q = alloca i32
  %v = load i32, i32* %x
  store i32 %v, i32* %p
  store i32 1, i32* %q  ; DEAD
  tail call void @g(i32* byval %p)
  store i32 2, i32* %q
  call void @escape(i32* %q)
  ret void
}
Aug 13 2018, 5:42 PM
hfinkel added inline comments to D50680: [SDAG] Remove the reliance on MI's allocation strategy for `MachineMemOperand` pointers attached to `MachineSDNodes` and instead have the `SelectionDAG` fully manage the memory for this array..
Aug 13 2018, 5:40 PM
hfinkel added a comment to D50680: [SDAG] Remove the reliance on MI's allocation strategy for `MachineMemOperand` pointers attached to `MachineSDNodes` and instead have the `SelectionDAG` fully manage the memory for this array..

This code certainly looks cleaner.

Aug 13 2018, 5:30 PM
hfinkel added a comment to D50679: [BasicAA] Don't assume tail calls with byval don't alias allocas.

The DSE test is fine, and I think that you should keep it, but can you please add a direct AA test (like, e.g., test/Analysis/BasicAA/cs-cs.ll)?

Aug 13 2018, 5:21 PM
hfinkel accepted D48386: [clang] Mark libm functions writeonly when we care about errno.

LGTM too.

Aug 13 2018, 3:21 PM
hfinkel added a comment to D48388: [EarlyCSE] Understand WriteOnly function calls in EarlyCSE with MSSA.

Makes sense to me. Obviously needs some test.

Aug 13 2018, 3:04 PM
hfinkel accepted D48387: [FunctionAttrs] Infer WriteOnly Function Attribute .

A couple of minor comments, but otherwise, LGTM.

Aug 13 2018, 2:58 PM

Aug 6 2018

hfinkel added a comment to D49281: [Unroll/UnrollAndJam/Vectorizer/Distribute] Add followup loop attributes..
... As such, how to fall back when the transformation doesn't happen is almost equally important as what to do next when the transformation happens.

Hideki, I think that LLVM does the right thing here: We provide a separation between the hint and the mandate (i.e., using assume_safety or not).

That, I know. I wasn't questioning about that.

What I'm not seeing from this RFC/patch is that, if the programmer specifies transformation behavior A -> B -> C, what happens if transformation A does not kick-in? Should we just warn that "A did not happen" and stop processing the request B and C?
Also, if the programmer requests that the loop to be distribute in three ways and specify different transformations for each, what should the latter transformation do if the loop is distributed in two ways or four ways? If we are serious about introducing this kind of features, we should clearly define what should happen when the programmer intention cannot be satisfied well enough ---- when we should continue honoring and when we should stop honoring. If we say "we should stop in all those circumstances", that should simplify the problem a lot. If we say "we should allow to continue on subset of those cases", we should clearly state which subset and why. If there are any prior discussions (or descriptions within this patch) along this lines, please point me to that.

Aug 6 2018, 11:35 AM

Aug 5 2018

hfinkel added a comment to D50295: Let GetUnderlyingObject/Objects use MemorySSA.

Thanks for the comments!

@hfinkel The motivation is that GetUnderlyingObject stucks at cases like this:

store i8* %p, i8** %q
%p2 = load i8*, i8** %q ; p2 is p

If we have MemorySSA, this can be resolved in constant time. I believe BasicAliasAnalysis can get benefit this, hence supporting partially flow sensitive analysis.

Aug 5 2018, 7:53 PM

Aug 4 2018

hfinkel added a comment to D50295: Let GetUnderlyingObject/Objects use MemorySSA.

Seems like an interesting idea. What's the motivation for doing this?

Aug 4 2018, 5:47 AM

Aug 3 2018

hfinkel added a comment to D49281: [Unroll/UnrollAndJam/Vectorizer/Distribute] Add followup loop attributes..
... As such, how to fall back when the transformation doesn't happen is almost equally important as what to do next when the transformation happens.
Aug 3 2018, 4:22 PM

Aug 1 2018

hfinkel accepted D48100: Append new attributes to the end of an AttributeList..

Assuming that @aaron.ballman is still okay with this, I think that we should move forward. @erichkeane, I suspect that fixing the ordering problems, which were often arbitrary before and are still so, is a larger change that we'd want to split out regardless (not to mention the fact that we'd need to decide how to fix them). Erich, is that alright with you?

Aug 1 2018, 5:59 PM
hfinkel added a comment to D48348: [ADT] Add zip_longest iterators..

Getting the size may require iterating the sequence in advance, which I tried to avoid in D48100.

Yeah, it's a tradeoff there, to be sure. Not sure if anyone else has thoughts on the design tradeoffs here - would hope other folks might chime in.

Aug 1 2018, 5:41 PM
hfinkel accepted D48808: [CodeGen] Emit parallel_loop_access for each loop in the loop stack..

LGTM

Aug 1 2018, 5:32 PM
hfinkel added inline comments to D46313: [clang] Add WriteOnly attribute.
Aug 1 2018, 2:21 PM
hfinkel added inline comments to D50055: Update the coding standard about NFC changes and whitespace.
Aug 1 2018, 2:08 PM
hfinkel added a comment to D47849: [OpenMP][Clang][NVPTX] Enable math functions called in an OpenMP NVPTX target device region to be resolved as device-native function calls.

The problem is that the inline assembly might actually be for the target, instead of the host, because we also have target preprocessor macros defined, and it's going to be hard to tell. I'm not sure that there's a great solution here, and I agree that having something more general than undefining some specific things that happen to matter for math.h would be better. As you point out, this is not just a system-header problem. We might indeed want to undefine all of the target-feature-related macros (although that won't always be sufficient, because we need basic arch macros for the system headers to work at all, and those are generally enough to guard some inline asm).

I think there was a reason for pulling in the host defines. I'd have to look at the commit message though...

Aug 1 2018, 9:05 AM
hfinkel accepted D50121: [PowerPC] Do not round values prior to converting to integer.

LGTM

Aug 1 2018, 8:40 AM
hfinkel accepted D50083: [SelectionDAG] Make binop reduction matcher available to all targets.

LGTM

Aug 1 2018, 8:34 AM
hfinkel added a comment to D47849: [OpenMP][Clang][NVPTX] Enable math functions called in an OpenMP NVPTX target device region to be resolved as device-native function calls.

Hrmm. Doesn't that make it so that whatever functions are implemented using that inline assembly will not be callable from target code (or, perhaps worse, will crash the backend if called)?

You are right :-(

However I'm getting worried about a more general case, not all inline assembly is guarded by #ifdefs that we could hope to get right. For example take sys/io.h which currently throws 18 errors when compiling with offloading to GPUs, even with -O0. The inline assembly is only guarded by #if defined __GNUC__ && __GNUC__ >= 2 which should be defined by any modern compiler claiming compatibility with GCC. I'm not sure this particular header will ever end up in an OpenMP application, but others with inline assembly will. From a quick grep it looks like some headers dealing with atomic operations have inline assembly and even eigen3/Eigen/src/Core/util/Memory.h for finding the cpuid.

Coming back to the original problem: Maybe we need to undefine optimization macros as in your patch to get as many correct inline functions as possible AND ignore errors from inline assembly as in my patch to not break when including weird headers?

Aug 1 2018, 8:32 AM

Jul 31 2018

hfinkel added inline comments to D50121: [PowerPC] Do not round values prior to converting to integer.
Jul 31 2018, 6:07 PM
hfinkel added a comment to D47849: [OpenMP][Clang][NVPTX] Enable math functions called in an OpenMP NVPTX target device region to be resolved as device-native function calls.
  1. Incidentally I ran into a closely related problem: I can't #include <math.h> in translation units compiled for offloading, Clang complains about inline assembly for x86 (see below). Does that work for you?
In file included from /usr/include/math.h:413:
/usr/include/bits/mathinline.h:131:43: error: invalid input constraint 'x' in asm
  __asm ("pmovmskb %1, %0" : "=r" (__m) : "x" (__x));
                                          ^
/usr/include/bits/mathinline.h:143:43: error: invalid input constraint 'x' in asm
  __asm ("pmovmskb %1, %0" : "=r" (__m) : "x" (__x));
                                          ^
2 errors generated.

Hrmm. I thought that we had fixed that already.

In case it's helpful, in an out-of-tree experimental target I have I ran into a similar problem, and to fix that I wrote the following code in the target's getTargetDefines function (in lib/Basic/Targets):

// If used as an OpenMP target on x86, x86 target feature macros are defined. math.h
// and other system headers will include inline asm if these are defined.
Builder.undefineMacro("__SSE2_MATH__");
Builder.undefineMacro("__SSE_MATH__");

Just found another workaround:

diff --git a/lib/Sema/SemaStmtAsm.cpp b/lib/Sema/SemaStmtAsm.cpp
index 0db15ea..b95f949 100644
--- a/lib/Sema/SemaStmtAsm.cpp
+++ b/lib/Sema/SemaStmtAsm.cpp
@@ -306,7 +306,9 @@ StmtResult Sema::ActOnGCCAsmStmt(SourceLocation AsmLoc, bool IsSimple,
 
     TargetInfo::ConstraintInfo Info(Literal->getString(), InputName);
     if (!Context.getTargetInfo().validateInputConstraint(OutputConstraintInfos,
-                                                         Info)) {
+                                                         Info) &&
+        !(Context.getLangOpts().OpenMPIsDevice &&
+          Context.getSourceManager().isInSystemHeader(AsmLoc))) {
       return StmtError(Diag(Literal->getLocStart(),
                             diag::err_asm_invalid_input_constraint)
                        << Info.getConstraintStr());

This will ignore all errors during OpenMP device codegen from system headers when the inline assembly is not used. In that case (calling signbit) you'll get

In file included from math.c:2:
In file included from /usr/include/math.h:413:
/usr/include/bits/mathinline.h:143:10: error: couldn't allocate input reg for constraint 'x'
  __asm ("pmovmskb %1, %0" : "=r" (__m) : "x" (__x));
         ^
1 error generated.

Not sure if that's acceptable...

Jul 31 2018, 12:27 PM
hfinkel added inline comments to D50083: [SelectionDAG] Make binop reduction matcher available to all targets.
Jul 31 2018, 11:10 AM

Jul 30 2018

hfinkel added inline comments to D49978: LangRef: Clarify expected sNaN behavior for minnum/maxnum.
Jul 30 2018, 12:58 PM

Jul 19 2018

hfinkel added inline comments to D49144: [FunctionAttrs] Infer the speculatable attribute.
Jul 19 2018, 4:57 PM
hfinkel added inline comments to D49144: [FunctionAttrs] Infer the speculatable attribute.
Jul 19 2018, 2:00 PM
hfinkel accepted D49041: [LangRef] Clarify undefined behavior for function attributes..

LGTM

Jul 19 2018, 10:01 AM

Jul 16 2018

hfinkel accepted D49337: Skip debuginfo intrinsic in markLiveBlocks..

LGTM

Jul 16 2018, 4:01 PM

Jul 15 2018

hfinkel added inline comments to D49337: Skip debuginfo intrinsic in markLiveBlocks..
Jul 15 2018, 7:37 PM

Jul 14 2018

hfinkel added a comment to D49337: Skip debuginfo intrinsic in markLiveBlocks..

Hi @hfinkel.

Thank you for taking a look at this code. I agree we should definitely "else" the StoreInst check and I have observed improvements in Xcode Instrument for doing so.

And I should also move the DbgInfoIntrinsic check inside the IntrinsicInst check by comparing IntrinsicID.

The problem with moving the IntrinsicInst checks inside the dyn_cast<CallInst> is that every intrinsic call will need to go through a dyn_cast<CallInst> check, a getCalledValue() and then a dyn_cast<Function>(CalledValue) check. If we have a lot of intrinsic which we do when compiling with debuginfo, we could end up being overall slower than keeping the IntrinsicInst check.

When we compile without debuginfo, moving the intrinsic check inside the CallInst check will not necessarily bring us performance, as now every CallInst needs to checks for the existing cases in the if (dyn_cast<CallInst>)) and the Intrinsics (which require dyn_cast<Function>(CalledValue)). 4

Jul 14 2018, 7:52 PM
hfinkel added a comment to D49337: Skip debuginfo intrinsic in markLiveBlocks..

There's nothing special about debug intrinsics here except that there are a lot of them. The problem, as far as I can tell, is that we're repeatedly using dyn_cast on each instruction and doing multiple redundant tests. Adding yet another redundant test will help having a lot of debug intrinsics makes things incrementally more expensive for all other kinds of intrinsics. Thus, this doesn't seem like the right way to fix this. Instead of testing for IntrinsicInst, and then CallInst (which is always true whenever the IntrinsicInst is true), and then we always test for StoreInst (but we don't use an 'else' so we always do this test even when the IntrinsicInst/CallInst is true (which will include debug intrinsics)).

Jul 14 2018, 6:37 AM

Jul 13 2018

hfinkel accepted D47963: [LangRef] nnan and ninf produce poison..

LGTM

Jul 13 2018, 7:57 PM
hfinkel added inline comments to D49144: [FunctionAttrs] Infer the speculatable attribute.
Jul 13 2018, 7:52 PM
hfinkel added a comment to D44564: [BasicAA] Use PhiValuesAnalysis if available when handling phi alias.

We have a check in aliasSameBasePointerGEPs to avoid PR32314 (which I'll glibly summarize as: we need to be careful about looking through PHIs so that we don't, without realizing it, end up comparing values from different loop iterations). The fact that this looks back through an unknown number of PHIs makes me concerned that we'll get into a similar situation.

Jul 13 2018, 2:17 PM
hfinkel accepted D48887: [TableGen] Suppress type validation when parsing pattern fragments.

LGTM

Jul 13 2018, 9:32 AM
hfinkel updated the diff for D49144: [FunctionAttrs] Infer the speculatable attribute.

Misc cleanup and try to validate ret attrs instead of disqualifying them all.

Jul 13 2018, 8:10 AM

Jul 12 2018

hfinkel updated the diff for D49144: [FunctionAttrs] Infer the speculatable attribute.

Updated to not infer speculatable if there are instructions with non-debug metadata, loads with alignment requirements that we can't independently prove, and for functions with value-constraining return-value attributes.

Jul 12 2018, 7:35 PM
hfinkel accepted D48239: [LangRef] Clarify meaning of "dereferencable" attribute/metadata..

PointerMayBeCaptured doesn't do the right thing. The primary issue is that it only walks the uses of the argument; other pointers could alias the argument if it isn't also noalias. (The other issue is that freeing a pointer doesn't count as capturing it.)

Ah, indeed. You're certainly correct. And free is marked as nocapture (which we'd likely prefer to keep, although free almost certainly does capture the pointer value in some physical sense, it does not do so in a way that will be visible to the rest of the program (except that it might be returned by some later malloc call)).

We'd need to model this directly for it to be useful (as an attribute that means that the function doesn't call free, or doesn't free a particular argument, or similar). I'll send an RFC about that.

Jul 12 2018, 4:16 PM
hfinkel added a comment to D48545: [RFC v2] "Alternative" matches for TableGen DAG patterns.

LGTM (there are a lot of changes here, but given that it produces no changes to existing matching tables, that seems like pretty good test coverage).

Thanks for the review!

Note that in order to avoid spurious changes to other targets, I've for now still restricted the new inference code for the isBitcast and hasChain flags to patterns that originated from Instruction nodes. In general, it would probably be better to remove this final difference between instruction patterns and "regular" patterns, but that should IMO only be done after analysing the specific impacts on existing targets, so this should be done as a follow-on patch.

I definitely encourage resolving these in follow up. I think it will be better if this works uniformly.

OK, I'll work on this as a follow-up.

  1. Theoretically, a PatFrags instance can now have an empty Fragments list -- this would simply never match anything. Should this be explicitly forbidden? On the other hand, we might also use this to implement "null_frag" without hardcoding its name in TableGen ...

I don't see any reason to disallow it. Getting rid of null_frag as a special case seems nice.

OK, I'll continue to allow empty Fragments. Unfortunately, getting rid of null_frag is more difficult than I expected: since current code checks for null_frag so very early, many backends get away with patterns that are actually invalid even structurally (number of results don't match number of users; use of some TableGen operators inside a pattern that aren't even dag nodes at all) as long as there's a null_frag anywhere in there as well. Trying to define a null_frag as a PatFrags with empty fragment list means that TableGen would still ignore the pattern, but only after doing a structural parse first -- which now errors out.

Jul 12 2018, 3:26 PM
hfinkel accepted D48545: [RFC v2] "Alternative" matches for TableGen DAG patterns.

LGTM (there are a lot of changes here, but given that it produces no changes to existing matching tables, that seems like pretty good test coverage).

Jul 12 2018, 11:57 AM
hfinkel added a comment to D46842: [OpenMP][libomptarget] Make bitcode library building depend on clang and llvm-linker being available .

I'm still opposed to this, as stated in my many comments. If at all, you need to make sure that this library doesn't rebuild whenever you do an unrelated change to Clang.

Jul 12 2018, 7:25 AM

Jul 10 2018

hfinkel added a comment to D49144: [FunctionAttrs] Infer the speculatable attribute.

I think marking a function speculatable requires stripping metadata from any loads in that function, to avoid UB. Does that sound reasonable?

Jul 10 2018, 7:21 PM
hfinkel added inline comments to D49165: Add, and infer, a nofree function attribute.
Jul 10 2018, 7:07 PM
hfinkel created D49165: Add, and infer, a nofree function attribute.
Jul 10 2018, 6:29 PM
hfinkel added a comment to D49144: [FunctionAttrs] Infer the speculatable attribute.

My assumption here is that we need to assume that any loads performed by the function, and any incoming arguments, might be poison. As a result, we can't have any branches, PHIs, or stores (even to local static allocas) - because doing any of these things with poison values is UB. Is that correct?

Control flow that might lead to infinite loops might be a problem, but why not allow at least acyclic control flow?

If the value stored to an alloca is thrown away after the ret, why is it a problem to be executed speculatively?

Jul 10 2018, 3:51 PM
hfinkel added inline comments to D49144: [FunctionAttrs] Infer the speculatable attribute.
Jul 10 2018, 12:49 PM
hfinkel added a comment to D49139: [OptRemark] Demangle symbols when emitting remarks.

I specifically didn't want to do this for the YAML output. The tool consuming the YAML should demangle if it wants. It's hard to go backward, and so if the tool needs to map back to symbols in the program, it needs the mangled name. That having been said:

Jul 10 2018, 10:12 AM