Page MenuHomePhabricator

hfinkel (Hal Finkel)
User

Projects

User does not belong to any projects.

User Details

User Since
Nov 16 2012, 6:02 PM (348 w, 2 d)

Recent Activity

Fri, Jul 19

hfinkel added a comment to D63782: [FPEnv] Add fptosi and fptoui constrained intrinsics.

I've been thinking about the getStrictFPOperationAction issue. I believe this function makes no sense at all and really should go away. Instead, the strict opcodes should use their own operation actions just like everything else. In current code, those should now be set up in a reasonable manner: they all default to Expand, unless the target overrides them (to whatever makes sense).

If the operation action is anything but Expand, I believe it should simply be respected as-is (i.e. Legal stays until ISel, Custom calls the target hook, Promote -if ever used- should be implemented in common code in a way that preserves strictness, and LibCall should emit a library call - possibly a strict version if necessary).

Now, if the operation action of a strict FP operation is Expand, common code should try to implement an expansion rule if possible in a way that preserves strictness. If that is not possible, then and only then should we think about falling back to a non-strict implementation. That fallback (and only that fallback) now should look at the operation action of the non-strict equivalent. If that is anything but Legal, then the expansion logic should replace the strict node with a non-strict node and push that non-strict node back to the legalizer to handle it in whatever way is indicated by its operation action. If the operation action is Legal, then we can leave the strict node in and mutate it to the non-strict node only shortly before ISel as is done today.

As to when an expansion is possible in a way that preserves strictness, and when to fall back to the non-strict equivalent: For a vector op, I believe it always makes sense to expand a strict vector op by scalarizing it to strict component ops. (Except, possibly, if that scalar op would itself fall back to its non-strict eqiuvalent -- then we might as well do the fallback on the vector type.) For scalar ops, for some it could be possible to expand them while respecting strictness, e.g. for some of the fp-to-sint cases above. That can be decided on a case-by-case basis. If at the end of ExpandNode we haven't found a way to expand a strict node, we can to the fallback then.

Does this make sense?

Fri, Jul 19, 12:00 PM · Restricted Project
hfinkel added a comment to D53362: [Prototype] Heap-To-Stack Conversion Pass.

Hi Hal! Are you planning on working on this one? If not maybe I can take over? Now that we have nofree & nosync added and deduced in the Attributor, maybe this can be revisited.

Fri, Jul 19, 12:09 AM

Thu, Jul 18

hfinkel added a comment to D64067: [X86][PPC] Support -mlong-double-64.

Hi, we just inherited this commit at Cray when we did our latest upstream merge and there are a few problems with it that I'd like to point out. Sorry that I was not part of the initial discussion here, but I didn't know that this work was being done and I had already done it for x86 in our downstream compiler a while ago.

  1. This patch adds the -m options to f_Group, which is weird. They should be in m_Group since they are target-specific. I actually implemented them as part of a subgroup of m_Group so that I can take the last option from that group that appears on the command line.
  2. This patch lacks the GNU -mlong-double-80 option for x86.
  3. Instead of tracking the size in clang/Basic/LangOptions.def, I track it in clang/Basic/TargetOptions.h. This is a target-specific thing afterall.

    We're trying to resolve merge conflicts with this change and might be able to submit a patch to help reduce our differences. Would that be of interest?
Thu, Jul 18, 12:58 PM · Restricted Project, Restricted Project

Wed, Jul 17

hfinkel accepted D64850: Remove use of malloc in PPC mm_malloc..

LGTM

Wed, Jul 17, 12:12 AM · Restricted Project, Restricted Project

Tue, Jul 16

hfinkel accepted D64499: [PowerPC][HTM] Fix impossible reg-to-reg copy assert with ttest builtin.

LGTM

Tue, Jul 16, 12:59 PM · Restricted Project

Mon, Jul 15

hfinkel added a comment to D64662: [FPEnv] [PowerPC] Lower ppc_fp128 StrictFP Nodes to libcalls.

Sorry Hal, I must have accidentally released an incomplete draft yesterday instead of simply deleting it like I meant to.

Mon, Jul 15, 11:27 AM · Restricted Project

Sun, Jul 14

hfinkel accepted D64283: [PowerPC] Support -mabi=ieeelongdouble and -mabi=ibmlongdouble.

LGTM

Sun, Jul 14, 10:18 PM · Restricted Project, Restricted Project
hfinkel accepted D64282: [PowerPC] Support fp128 libcalls.

LGTM

Sun, Jul 14, 9:37 PM · Restricted Project

Thu, Jul 11

hfinkel accepted D64192: [MachinePipeliner] Fix order for nodes with Anti dependence in same cycle.

LGTM.

Thu, Jul 11, 3:12 PM · Restricted Project
hfinkel added inline comments to D64192: [MachinePipeliner] Fix order for nodes with Anti dependence in same cycle.
Thu, Jul 11, 1:45 PM · Restricted Project
hfinkel added a comment to D64557: Add llvm.loop.licm.disable metadata.

Unfortunately I don't believe I have an example that is suitable for publishing. The problem is basically that the code motion is generally a good thing, but it can increase live ranges significantly pushing register pressure up such that performance is degraded, but that impact isn't apparent at the point at which the transformation is performed. Disabling the pass for certain loops is a low-impact pragmatic (but not ideal) solution.

Thu, Jul 11, 7:44 AM · Restricted Project
hfinkel added a reviewer for D64557: Add llvm.loop.licm.disable metadata: Meinersbur.
Thu, Jul 11, 7:37 AM · Restricted Project

Wed, Jul 10

hfinkel accepted D64368: [LoopUnroll+LoopUnswitch] do not transform loops containing callbr.

it would be good to add a test with callbr there as well, to make sure we do not miss it, in case we remove the restriction again.

I agree and would be happy to do so. Would you prefer that additional test case in this commit or a subsequent commit (either way, before I re-visit https://reviews.llvm.org/D64101)?

Wed, Jul 10, 5:24 PM · Restricted Project
hfinkel accepted D64468: Replace three "strip & accumulate" implementations with a single one.

LGTM

Wed, Jul 10, 4:12 PM · Restricted Project

Tue, Jul 9

hfinkel added a comment to D64283: [PowerPC] Support -mabi=ieeelongdouble and -mabi=ibmlongdouble.

Ah, fun with overloaded, legacy command-line options...

Tue, Jul 9, 1:11 PM · Restricted Project, Restricted Project
hfinkel added inline comments to D64258: [InferFuncAttributes] extend 'dereferenceable' attribute based on loads.
Tue, Jul 9, 11:58 AM · Restricted Project
hfinkel added a comment to D64128: [CodeGen] Generate llvm.ptrmask instead of inttoptr(and(ptrtoint, C)) if possible..

Do we have any policies against using clang-only builtins in the codebase?

Tue, Jul 9, 11:31 AM · Restricted Project, Restricted Project
hfinkel added a comment to D64394: [MachineCSE][MachinePRE] Do not hoist common computations into hot BBs.

However, if CMBB is in a loop body, we might get performance degradation.

Tue, Jul 9, 9:28 AM · Restricted Project

Mon, Jul 8

hfinkel accepted D63806: [PowerPC][Peephole] Combine extsw and sldi after instruction selection.

LGTM

Mon, Jul 8, 7:52 PM · Restricted Project
hfinkel added inline comments to D63806: [PowerPC][Peephole] Combine extsw and sldi after instruction selection.
Mon, Jul 8, 7:04 PM · Restricted Project
hfinkel added inline comments to D64258: [InferFuncAttributes] extend 'dereferenceable' attribute based on loads.
Mon, Jul 8, 3:09 PM · Restricted Project
hfinkel accepted D64035: [MachinePipeliner] Fix Phi refers to Phi in same stage in 1st epilogue.

Ping...

Mon, Jul 8, 2:06 PM · Restricted Project
hfinkel added a comment to D63378: [ORC] WIP Speculative compilation.

Can you please upload a patch with full context?

Mon, Jul 8, 1:11 PM · Restricted Project, Restricted Project
hfinkel added inline comments to D63806: [PowerPC][Peephole] Combine extsw and sldi after instruction selection.
Mon, Jul 8, 10:01 AM · Restricted Project
hfinkel added a comment to D64128: [CodeGen] Generate llvm.ptrmask instead of inttoptr(and(ptrtoint, C)) if possible..

I would be opposed to any addition of a -f flag for this absent any evidence that it's valuable for some actual C code; this patch appears to be purely speculative. I certainly don't think we should be adding it default-on. Your argument about alignment is what I was referring to when I mentioned that this is enforcing alignment restrictions in places we traditionally and intentionally haven't.

Mon, Jul 8, 8:53 AM · Restricted Project, Restricted Project

Fri, Jul 5

hfinkel accepted D59919: [Attributor] Deduce "returned" argument attribute.

LGTM

Fri, Jul 5, 8:30 PM · Restricted Project, Restricted Project
hfinkel accepted D64224: Keep the order of the basic blocks in the cloned loop as the original loop.

LGTM

Fri, Jul 5, 7:45 PM · Restricted Project
hfinkel added inline comments to D64224: Keep the order of the basic blocks in the cloned loop as the original loop.
Fri, Jul 5, 7:30 PM · Restricted Project
hfinkel added a comment to D64128: [CodeGen] Generate llvm.ptrmask instead of inttoptr(and(ptrtoint, C)) if possible..

The pointer/integer conversion is "implementation-defined", but it's not totally unconstrained. C notes that "The mapping functions for converting a pointer to an integer or an integer to a pointer are intended to be consistent with the addressing structure of the execution environment.", and we do have to honor that. The standard allows that "the result ... might not point to an entity of the referenced type", but when in fact it's guaranteed to do so (i.e. it's not just a coincidental result of an implementation decision like the exact address of a global variable — no "guessing"), I do think we have an obligation to make it work. And on a practical level, there has to be *some* way of playing clever address tricks in the language in order to implement things like allocators and so forth. So this makes me very antsy.

I don't disagree. But I believe the question is if we have:

int *x = malloc(4);
int *y = malloc(4);
if (x & ~15 == y) {
  *(x & ~15) = 5; // Is this allowed, and if so, must the compiler assume that it might set the value of *y?
}

I certainly agree that we must allow the implementation of allocators, etc. But allocators, I think, have the opposite problem. They actually have some large underlying objects (from mmap or whatever), and we want the rest of the system to treat some subobjects of these larger objects as though they were independent objects of some given types. From the point of view of the allocator, we have x, and we have void *memory_pool, and we need to allow x & N to point into memory_pool, but because, from the allocator's perspective, we never knew that x didn't point into memory_pool (as, in fact, it likely does), that should be fine (*).

There might be more of an issue, for example, if for a given object, I happen to know that there's some interesting structure at the beginning of its page (or some other boundary).

This is what I was thinking about for allocators; this is a common implementation technique for free / realloc / malloc_size.

If I also have a pointer to this structure via some other means, then maybe this will cause a problem. This kind of thing certainly falls outside of the C/C++ abstract machine, and I'd lean toward a flag for supporting it (not on by default).

If you mean a theoretical minimal C abstract machine that does not correspond to an actual target and is therefore not bound by any of the statements in the C standard that say things like "this is expected to have its obvious translation on the target", then yes, I completely agree. If you're talking about the actual C programming language that does correspond to actual targets, then it's not clear at all that it's outside the C abstract machine, because AFAICT integer-pointer conversions are (1) well-specified on specific targets by this de facto requirement of corresponding directly to pointer representations and (2) well-behaved as long as the integer does correspond to the address of an actual object of that type.

Also, please understand that compiler writers have been telling our users for decades that (1) pointer arithmetic is subject to some restrictions on penalty of UB and (2) they can avoid those restrictions by using pointer-integer conversions and doing integer arithmetic instead.

Fri, Jul 5, 7:22 PM · Restricted Project, Restricted Project
hfinkel added a comment to D64220: [PowerPC] Remove redundant load immediate instructions.

Is this really a PowerPC-specific problem? Do we just want to have a general post-RA cleanup that checks for isMoveImmediate() and does this cleanup?

Fri, Jul 5, 6:52 PM · Restricted Project
hfinkel added inline comments to D64080: [OPENMP]Make __kmpc_push_tripcount thread safe..
Fri, Jul 5, 6:45 PM · Restricted Project, Restricted Project
hfinkel added inline comments to D64217: [OpenMP][NFCI] Cleanup the target state queue implementation.
Fri, Jul 5, 6:42 PM · Restricted Project
hfinkel added a comment to D64224: Keep the order of the basic blocks in the cloned loop as the original loop.

Do you have any kind of test case? I suppose that you can check the output in a test in test/Analysis/LoopInfo to show the ordering?

Fri, Jul 5, 6:36 PM · Restricted Project
hfinkel added a comment to D64198: [NFC][PowerPC] Add the feature control for PreRA and PostRA scheduler.

As Hal mentioned, enable-misched and enable-post-misched should be able to turn on/off the pre-ra/post-ra scheduler.
Regarding to different subtarget (cpu), why we need to turn on/off with feature at runtime? Shouldn't that be configured in code?

yeah, per function control might be more useful here.

Ah, I didn't realize the option you mentioned, and yes, they can turn on/off the scheduler target independent for all cpu.
But if we want to turn on/off the scheduler for specific cpu, we have to add the feature for it. And the feature supports to be configured at runtime.

I am ok to abandon this patch as I have got the options to turn on/off the scheduler for PowerPC. However, I didn't see the harm to add this feature for PowerPC, as, it is the right way if we want to custom the scheduler for different cpu model. Any thoughts ?

Fri, Jul 5, 6:29 PM · Restricted Project
hfinkel accepted D64080: [OPENMP]Make __kmpc_push_tripcount thread safe..

Added note.

The patch description says that you add a trip count "per thread", but the code is adding a trip count per device. Could this still cause a problem if multiple threads have target regions for the same device?

No. We already have tripcount per device: each device stores its unique tripcount value, but only one. I extended the existing solution to have the tripcount not only per device, but also per host thread.

Would there be any advantage to using __kmpc_threadprivate_register and __kmpc_threadprivate (or similar) instead of storing a map with thread ids?

Hard to tell, but definitely ee increase dependency between libomptarget and libomp. and, seems to me, threadprivate interface also uses some kind of hash table. We can use unordered_map instead.

Given that we expect this to be temporary, I'm okay with it either way. unordered_map is probably a little bit better than std::map. I expect the number of entries in the map to be small because __kmpc_global_thread_num returns OpenMP thread ids, not system TIDs, right?

Yes.

SGTM.

Fri, Jul 5, 6:24 PM · Restricted Project, Restricted Project
hfinkel added inline comments to D64217: [OpenMP][NFCI] Cleanup the target state queue implementation.
Fri, Jul 5, 6:21 PM · Restricted Project
hfinkel added a comment to D63614: [System Model] [TTI] Update cache and prefetch TTI interfaces.

I know this now says "ready to land" but is one review really sufficient?

Fri, Jul 5, 9:57 AM · Restricted Project

Thu, Jul 4

hfinkel requested changes to D64198: [NFC][PowerPC] Add the feature control for PreRA and PostRA scheduler.

We may need to disable the Pre-RA or Post-RA scheduler. Add the feature control for it.

Why? If this is just for debugging, then my thought is: We do have cl::opts that control this, and if we want this on a per-function basis, then this is not a ppc-specific problem and should introduce some target-independent attribute.

Thu, Jul 4, 8:27 AM · Restricted Project
hfinkel accepted D64198: [NFC][PowerPC] Add the feature control for PreRA and PostRA scheduler.

We may need to disable the Pre-RA or Post-RA scheduler. Add the feature control for it.

Thu, Jul 4, 8:26 AM · Restricted Project

Wed, Jul 3

hfinkel accepted D63614: [System Model] [TTI] Update cache and prefetch TTI interfaces.

I'd prefer we remove the redundant comments in TTIImpl.h (see below), but otherwise, LGTM.

Wed, Jul 3, 11:04 PM · Restricted Project
hfinkel added a comment to D64128: [CodeGen] Generate llvm.ptrmask instead of inttoptr(and(ptrtoint, C)) if possible..

The pointer/integer conversion is "implementation-defined", but it's not totally unconstrained. C notes that "The mapping functions for converting a pointer to an integer or an integer to a pointer are intended to be consistent with the addressing structure of the execution environment.", and we do have to honor that. The standard allows that "the result ... might not point to an entity of the referenced type", but when in fact it's guaranteed to do so (i.e. it's not just a coincidental result of an implementation decision like the exact address of a global variable — no "guessing"), I do think we have an obligation to make it work. And on a practical level, there has to be *some* way of playing clever address tricks in the language in order to implement things like allocators and so forth. So this makes me very antsy.

Wed, Jul 3, 7:56 PM · Restricted Project, Restricted Project
hfinkel accepted D63814: [TableGen] Allow DAG isel patterns to override default operands..

LGTM

Wed, Jul 3, 7:20 PM · Restricted Project
hfinkel accepted D63803: [PowerPC] Move TOC save to prologue when profitable.

One question below, but otherwise, LGTM.

Wed, Jul 3, 7:11 PM · Restricted Project
hfinkel added a comment to D64128: [CodeGen] Generate llvm.ptrmask instead of inttoptr(and(ptrtoint, C)) if possible..

If they're all syntactically together like this, maybe that's safe?

Having them together syntactically doesn't really help, I think; it might be guarded by some code that does the same conversion (and if you repeat the conversion, it has to produce the same result).

Wed, Jul 3, 6:37 PM · Restricted Project, Restricted Project
hfinkel added a comment to D64101: [LoopUnroll] fix cloning callbr.

Or actually, it might make more sense to change the way we generate/lower callbr, to make the label parameters implicit; instead of modeling them with blockaddress, we never write them in IR at all, and automatically generate them later based on the successor list.

Wed, Jul 3, 5:45 PM · Restricted Project
hfinkel added a comment to D64128: [CodeGen] Generate llvm.ptrmask instead of inttoptr(and(ptrtoint, C)) if possible..

I don't think this transform is valid, for the same reasons we don't do it in IR optimizations.

Wed, Jul 3, 4:47 PM · Restricted Project, Restricted Project
hfinkel added a comment to D64080: [OPENMP]Make __kmpc_push_tripcount thread safe..

Added note.

The patch description says that you add a trip count "per thread", but the code is adding a trip count per device. Could this still cause a problem if multiple threads have target regions for the same device?

No. We already have tripcount per device: each device stores its unique tripcount value, but only one. I extended the existing solution to have the tripcount not only per device, but also per host thread.

Would there be any advantage to using __kmpc_threadprivate_register and __kmpc_threadprivate (or similar) instead of storing a map with thread ids?

Hard to tell, but definitely ee increase dependency between libomptarget and libomp. and, seems to me, threadprivate interface also uses some kind of hash table. We can use unordered_map instead.

Given that we expect this to be temporary, I'm okay with it either way. unordered_map is probably a little bit better than std::map. I expect the number of entries in the map to be small because __kmpc_global_thread_num returns OpenMP thread ids, not system TIDs, right?

Yes.

Wed, Jul 3, 4:31 PM · Restricted Project, Restricted Project
hfinkel added a comment to D64080: [OPENMP]Make __kmpc_push_tripcount thread safe..

Added note.

The patch description says that you add a trip count "per thread", but the code is adding a trip count per device. Could this still cause a problem if multiple threads have target regions for the same device?

No. We already have tripcount per device: each device stores its unique tripcount value, but only one. I extended the existing solution to have the tripcount not only per device, but also per host thread.

Would there be any advantage to using __kmpc_threadprivate_register and __kmpc_threadprivate (or similar) instead of storing a map with thread ids?

Hard to tell, but definitely ee increase dependency between libomptarget and libomp. and, seems to me, threadprivate interface also uses some kind of hash table. We can use unordered_map instead.

Wed, Jul 3, 3:25 PM · Restricted Project, Restricted Project
hfinkel added a comment to D64080: [OPENMP]Make __kmpc_push_tripcount thread safe..

Added note.

The patch description says that you add a trip count "per thread", but the code is adding a trip count per device. Could this still cause a problem if multiple threads have target regions for the same device?

No. We already have tripcount per device: each device stores its unique tripcount value, but only one. I extended the existing solution to have the tripcount not only per device, but also per host thread.

Wed, Jul 3, 2:55 PM · Restricted Project, Restricted Project
hfinkel added a comment to D64080: [OPENMP]Make __kmpc_push_tripcount thread safe..

Added note.

The patch description says that you add a trip count "per thread", but the code is adding a trip count per device. Could this still cause a problem if multiple threads have target regions for the same device?

It's one map per device, each map contains one trip count per OpenMP thread.

Wed, Jul 3, 2:34 PM · Restricted Project, Restricted Project
hfinkel added a comment to D64124: [PowerPC] Hardware Loop branch instruction's condition may not be icmp.

Thanks. LGTM

Wed, Jul 3, 2:30 PM · Restricted Project
hfinkel added a comment to D64080: [OPENMP]Make __kmpc_push_tripcount thread safe..

Added note.

Wed, Jul 3, 2:27 PM · Restricted Project, Restricted Project
hfinkel added a comment to D64067: [X86][PPC] Support -mlong-double-64.
In D64067#1568888, @rnk wrote:

In this review (https://reviews.llvm.org/D6260) @rsmith mentions that this should also have an effect on name mangling.

I'm not sure that's consistent with GCC, at least not anymore:
https://gcc.godbolt.org/z/eUviCd
Looks like you can still have an overload set with double and long double, even if they both use the same representation.

Wed, Jul 3, 11:25 AM · Restricted Project, Restricted Project
hfinkel added a comment to D64090: [Codegen][X86][AArch64][ARM][PowerPC] Inc-of-add vs sub-of-not (PR42457).

LGTM

Thank you for the review.

PPC hook may also need tuning, but i'm not sure as to the incantation needed.

Wed, Jul 3, 9:39 AM · Restricted Project

Tue, Jul 2

hfinkel added a comment to D64080: [OPENMP]Make __kmpc_push_tripcount thread safe..

A long time ago we had identified the problem with the loop trip count and if I recall correctly the proposed solution was to store the trip count per OpenMP task. The fix would be implemented in libomp because libomptarget has no notion of tasks.

@AlexEichenberger Can you confirm whether or not this is the case? I may confuse this with something else...

The patch does exactly what you said: stores tripcount per task, but in the libomptarget, not libomp. When it will be implemented in libomp, we could remove this code. But we need fully functional implementation now, not in the future.

If this should really be fixed in libomp, then we should fix it there. At the very least, there should be a FIXME about removing the workaround once the libomp fix is in place. @AndreyChurbanov, any thoughts on this?

Hal, currently target tripcount has nothing to do with libomp. It is stored as part of the device in libomptarget. So, I think, to fix this problem in libomptarget is fine.
To put it into target task allocated memory is completely different problem, which requires some redesign and full support of target tasks.

I think that it makes sense to note this in the code. Can you please add a comment along the lines of (assuming that I'm understanding you correctly):

// NOTE: Once libomp gains full target-task support, this state should be moved into the target task in libomp.

Ok.

Tue, Jul 2, 9:59 PM · Restricted Project, Restricted Project
hfinkel added a comment to D64067: [X86][PPC] Support -mlong-double-64.

Can you please add a test ensuring that -malign-double and -mlong-double-64 interact properly? I think that, in the current patch, they do (as -malign-double is processed first), but I'd prefer that we cover that case explicitly.

Tue, Jul 2, 9:51 PM · Restricted Project, Restricted Project
hfinkel added a comment to D64080: [OPENMP]Make __kmpc_push_tripcount thread safe..

A long time ago we had identified the problem with the loop trip count and if I recall correctly the proposed solution was to store the trip count per OpenMP task. The fix would be implemented in libomp because libomptarget has no notion of tasks.

@AlexEichenberger Can you confirm whether or not this is the case? I may confuse this with something else...

The patch does exactly what you said: stores tripcount per task, but in the libomptarget, not libomp. When it will be implemented in libomp, we could remove this code. But we need fully functional implementation now, not in the future.

If this should really be fixed in libomp, then we should fix it there. At the very least, there should be a FIXME about removing the workaround once the libomp fix is in place. @AndreyChurbanov, any thoughts on this?

Hal, currently target tripcount has nothing to do with libomp. It is stored as part of the device in libomptarget. So, I think, to fix this problem in libomptarget is fine.
To put it into target task allocated memory is completely different problem, which requires some redesign and full support of target tasks.

Tue, Jul 2, 9:23 PM · Restricted Project, Restricted Project
hfinkel accepted D63477: [PowerPC] exclude ICmpZero Use in LSR if icmp can be replaced inside hardware loop..

LGTM

Tue, Jul 2, 4:57 PM · Restricted Project
hfinkel updated subscribers of D64080: [OPENMP]Make __kmpc_push_tripcount thread safe..

A long time ago we had identified the problem with the loop trip count and if I recall correctly the proposed solution was to store the trip count per OpenMP task. The fix would be implemented in libomp because libomptarget has no notion of tasks.

@AlexEichenberger Can you confirm whether or not this is the case? I may confuse this with something else...

The patch does exactly what you said: stores tripcount per task, but in the libomptarget, not libomp. When it will be implemented in libomp, we could remove this code. But we need fully functional implementation now, not in the future.

Tue, Jul 2, 4:53 PM · Restricted Project, Restricted Project
hfinkel added inline comments to D62766: [Attributor] Deduce "nosync" function attribute..
Tue, Jul 2, 4:40 PM · Restricted Project
hfinkel accepted D64094: CodeGen: Set hasSideEffects = 0 on BUNDLE.

If we already search the member instructions, then this seems like the right thing to do. LGTM

Tue, Jul 2, 2:15 PM

Mon, Jul 1

hfinkel updated subscribers of D64015: [WIP][CUDA] Use shared MangleContext for CUDA and CXX CG.
Mon, Jul 1, 10:02 AM · Restricted Project

Wed, Jun 26

hfinkel added a comment to D63782: [FPEnv] Add fptosi and fptoui constrained intrinsics.

Is this a general comment or referring to a change in Kevin's patch?

It's a question about the semantics of the proposed llvm.experimental.constrained.fptosi/llvm.experimental.constrained.fptoui. Specifically, what does it do when the result is larger than the largest integer representable in the destination format? LangRef for fptoui/fptosi says "If the value cannot fit in ty2, the result is a poison value."

Unless I'm misunderstanding, we should be leaving the constrained converts alone until a hardware instruction is produced.

We have to define the semantics; I mean, I guess we could say "If the value cannot fit in the destination type, the result is computed in a target-specific way", but we'd have to state it explicitly. And it's sort of awkward.

Wed, Jun 26, 3:49 PM · Restricted Project
hfinkel added a comment to D60091: [test-suite] Signal error if llvm-lit was not found.

Reverted in rL364448, thanks.
I have never touched bot configs, so i'm not the one to fix the phenomenon, so please address it.

Wed, Jun 26, 10:29 AM · Restricted Project

Tue, Jun 25

hfinkel added inline comments to D63590: [PowerPC] Sign extend the select instr operands if it is any_extend.
Tue, Jun 25, 8:34 PM · Restricted Project
hfinkel added inline comments to D63590: [PowerPC] Sign extend the select instr operands if it is any_extend.
Tue, Jun 25, 8:03 PM · Restricted Project
hfinkel added inline comments to D63803: [PowerPC] Move TOC save to prologue when profitable.
Tue, Jun 25, 7:50 PM · Restricted Project
hfinkel added inline comments to D63590: [PowerPC] Sign extend the select instr operands if it is any_extend.
Tue, Jun 25, 7:25 PM · Restricted Project
hfinkel added inline comments to D63477: [PowerPC] exclude ICmpZero Use in LSR if icmp can be replaced inside hardware loop..
Tue, Jun 25, 7:20 PM · Restricted Project
hfinkel added inline comments to D63477: [PowerPC] exclude ICmpZero Use in LSR if icmp can be replaced inside hardware loop..
Tue, Jun 25, 7:17 PM · Restricted Project
hfinkel added a comment to D61911: [GlobalOpt] Allow dead struct fields in SRA with non constant offset..

The problem here is that the IR semantics don't allow this transform in general.

Even if you've managed to dodge the exact construct from bug 38309 (I'm not sure I understand how, but it's not really relevant), the rules for GEP indexing don't work the way you want them to. "inbounds" just means the result is somewhere within the same global; it doesn't mean that array indexes don't "overflow". See http://llvm.org/docs/LangRef.html#getelementptr-instruction .

Tue, Jun 25, 7:08 PM · Restricted Project
hfinkel accepted D61935: [PowerPC][HTM] Fix disassembling buffer overflow for tabortdc and others.

LGTM

Tue, Jun 25, 7:03 PM · Restricted Project
hfinkel added inline comments to D63590: [PowerPC] Sign extend the select instr operands if it is any_extend.
Tue, Jun 25, 6:54 PM · Restricted Project
hfinkel accepted D63800: [NFC][PowerPC] Improve the for loop in Early Return.

LGTM

Tue, Jun 25, 6:41 PM · Restricted Project
hfinkel accepted D60091: [test-suite] Signal error if llvm-lit was not found.

LGTM

Tue, Jun 25, 4:26 PM · Restricted Project
hfinkel accepted D63634: [PowerPC] Mark FCOPYSIGN legal for FP vectors.

LGTM. Thanks for fixing.

Tue, Jun 25, 4:13 PM · Restricted Project
hfinkel accepted D63507: Teach TableGen Intrin Emitter to handle LLVMPointerType<llvm_any_ty>.

LGTM

Tue, Jun 25, 4:11 PM · Restricted Project
hfinkel added inline comments to D63507: Teach TableGen Intrin Emitter to handle LLVMPointerType<llvm_any_ty>.
Tue, Jun 25, 3:59 PM · Restricted Project

Jun 18 2019

hfinkel accepted D63515: [OPENMP][CUDA]Use __syncthreads when compiled by nvcc and clang >= 9.0..

LGTM. Thanks!

Jun 18 2019, 2:01 PM · Restricted Project, Restricted Project
hfinkel added a comment to D63488: [docs] Phabricator, not the lists is the main entry point for new patches.

Requiring Phabricator raises the barrier to one-off patches from casual contributors, because using Phabricator requires a registration step.
I don't think we should require it until casual users with drive-by patches can contribute easily.

Jun 18 2019, 10:13 AM · Restricted Project

Jun 15 2019

hfinkel added a comment to D63378: [ORC] WIP Speculative compilation.

I assume that you're posting this for initial feedback, but a patch description explaining the usage model would be helpful.

Jun 15 2019, 8:43 AM · Restricted Project, Restricted Project
hfinkel added a comment to D63378: [ORC] WIP Speculative compilation.

Can you please update this patch with full context?

Jun 15 2019, 8:43 AM · Restricted Project, Restricted Project
hfinkel added a comment to D63375: [ConstantFolding] Fix assertion failure on non-power-of-two vector load..

The test case does an (out of bounds) load from a global constant with type <3 x float>.

Jun 15 2019, 7:05 AM · Restricted Project

Jun 14 2019

hfinkel added inline comments to D63329: Allow static linking of libc++ on Linux, just like -static-libstdc++.
Jun 14 2019, 11:55 AM · Restricted Project

Jun 13 2019

hfinkel added a comment to D63068: [AVR] Fix incorrect stack parameter push order.

Transform all store nodes into one single node to ensure the order of all store nodes can't be changed. So that the push instruction sequence generated would be correct.

Jun 13 2019, 7:44 PM · Restricted Project
hfinkel added a comment to D62393: [OPENMP][NVPTX]Mark parallel level counter as volatile..

...

Again, I see no reason to believe that everyone here isn't acting in good faith and working to create the software of the highest possible quality. Thanks!

Sorry, but Johannes did this. "The more patches go through with this kind of "review" the more "suspicious" this looks to me." It is his words. seems to me, he does not think that me or my colleagues are acting in good faith.

Okay, I understand why you say this, but no. Frankly, Johannes was implying that the reviews were too lax (e.g., allowing combined patches with insufficient tests/documentation/etc.).

Only one patch was combined.

I'm not sure that this is true. D55379 was also. There might be others. I don't think that it's useful to conduct an audit in this regard. The important point is that there were a number of issues over time, some have been addressed, and others we're addressing right now.

Ok, you found another one committed half year ago. Do really think you can call it systematic and suspicious?

Jun 13 2019, 6:07 PM · Restricted Project
hfinkel added a comment to D62393: [OPENMP][NVPTX]Mark parallel level counter as volatile..

...

Again, I see no reason to believe that everyone here isn't acting in good faith and working to create the software of the highest possible quality. Thanks!

Sorry, but Johannes did this. "The more patches go through with this kind of "review" the more "suspicious" this looks to me." It is his words. seems to me, he does not think that me or my colleagues are acting in good faith.

Okay, I understand why you say this, but no. Frankly, Johannes was implying that the reviews were too lax (e.g., allowing combined patches with insufficient tests/documentation/etc.).

Only one patch was combined.

Jun 13 2019, 3:01 PM · Restricted Project
hfinkel accepted D63282: [MachinePipeliner] Don't check boundary node in checkValidNodeOrder.

LGTM

Jun 13 2019, 1:14 PM · Restricted Project
hfinkel added a comment to D62393: [OPENMP][NVPTX]Mark parallel level counter as volatile..

...

Again, I see no reason to believe that everyone here isn't acting in good faith and working to create the software of the highest possible quality. Thanks!

Sorry, but Johannes did this. "The more patches go through with this kind of "review" the more "suspicious" this looks to me." It is his words. seems to me, he does not think that me or my colleagues are acting in good faith.

Jun 13 2019, 12:46 PM · Restricted Project
hfinkel added a comment to D62393: [OPENMP][NVPTX]Mark parallel level counter as volatile..

This is a clarrification for some older comments.

First of all, you started this when expressed the thought that the patches are accepted not because they are correct, but because we all work in the same company. This was rude! Besides, we don't work at the same company anymore.

I do not remember me saying this, especially since I know you don't work in the same company.

Second, you need to try to understand the main idea yourself. I can't explain the whole protocol between the compiler and the runtime, it will take a lot of time. Try to understand this at first and then ask the questions about the particular details, but not the whole scheme. Or you want me ask you to explain the principles of Attributor in full? Or loop vectorizer? Or something else? Your question is not productive.

That is exactly what happens (wrt. Attributor), people looked at the code and asked about principles, requested documentation, etc. If you look at the code know, it is all the better for it. So far you just ignored my request for clarifications and justifications which is what this whole code review actually started with.

Third, the single array is used to handle up to 128 (not 256) inner parallel region. It is fine from point of view of the standard. The number of supported inner parallel levels (both, active and inactive) is implementation-defined.

I asked where we document that a single array encodes both, the number of active and inactive parallel regions, at the same time. The code is not sufficient documentation for such a low-level implementation detail. Similar, but before not on my radar, is the fact that there is an apparently undocumented implementation detail wrt the number of levels we can handle.

Hal, I agree that it was very dangerous situations, but I think, you need to prove something before trying to escalate the situation and blame somebody in doing the incorrect things. Johannes did this without any proves, he directly blamed me and others in doing improper things. Though he has no idea at all how things work in Cuda.

I did already provide plenty of arguments in https://reviews.llvm.org/D62199#1515182 and https://reviews.llvm.org/D62199#1517073.

For the record, this code review was (from my side) never about accusations or improper behavior but about:

  1. The right solution to a problem I don't think we specified(*) properly, and
  2. The general lack of documentation that lead to various questions on how to interpret the changes.

    (*) My question about which accesses are racy have been ignored, as did my inquiries about an alternative explicit synchronization to communicate the value through all threads in the warp.

    I don't think I have "no idea at all how things work in Cuda" and I don't appreciate the accusation.

Yes, but you need to have at least some basic knowledge about OpenMP itself, Cuda and the OpenMP runtime. I have no time to write the lectures trying to explain some basic things to Johannes.

Me, and probably other people, might argue I have some OpenMP/Clang experience.

Yes, you're right. The design decisions must described somewhere. But I don't think that this must prevent the work on the runtime improvement.

This seems to me like a core issue. Improvements without documentation will inevitably cause problems down the road and rushing into a solution is also not a sustainable practise.

Arguably, atomic accesses provide atomicity. You also did never respond to the question which other accesses actually race with the ones to parallelLevel or why you do not synchronize the accesses to parallelLevel across the warp explicitly.

I answered all the questions. We can use the explicit synchronizatiin, yes, but it will slow down the whole program execution. Instead, it is more effective to synchronize the access to the single variable. Volatile modifier allows to do this without explicit atomic operations. In Cuda volatile is slightly different form that in C/C++ and can be used to implement relaxed atomic operations.
In this array,the memory is written by one thread but can be read by other threads in the same warp. Without volatile modifier or atomic ops the ptxas tool reorders operations and it leads to the undefined behaviour when the runtime is inlined.

According to D50391, explicit relaxed accesses are lowered into volatile PTX accesses. Why did you see a slowdown or did you simply expect a slowdown?

If somebody tries to review the code, this review must be productive. If the review leads to the set of lectures about language details, it is not productive. It is school lectures. Should I explain Cuda language in the review? No. To review the code you need to have at least some minimal level of the expertise.

I never asked for a school lecture, nor is it appropriate that you think you have to give me one instead of actually answering my questions and commenting on my concerns. Simply questioning my expertise and therefore ignoring my concerns not helping anyone.

Hal, @hfinkel, do you really want me to continue with all this stuff?

Jun 13 2019, 10:34 AM · Restricted Project
hfinkel added a comment to D61228: [PowerPC] Set the innermost hot loop to align 32 bytes.

@nemanjai @hfinkel I have updated the patch to align to 32 bytes even if wthout PGO data.

Jun 13 2019, 9:47 AM · Restricted Project

Jun 12 2019

hfinkel added a comment to D61652: [Attr] Introduce dereferenceable_globally.

I'm specifically asking that this change not be committed - despite Hal's LGTM - before this point is addressed.

Jun 12 2019, 4:23 PM · Restricted Project
hfinkel accepted D60012: [Attributor] Introduce bit-encodings for abstract states.

LGTM.

Jun 12 2019, 3:37 PM · Restricted Project
hfinkel added inline comments to D61652: [Attr] Introduce dereferenceable_globally.
Jun 12 2019, 3:21 PM · Restricted Project
hfinkel accepted D61652: [Attr] Introduce dereferenceable_globally.

This LGTM. I suggest that we commit this along with the implementation.

Jun 12 2019, 3:21 PM · Restricted Project

Jun 11 2019

hfinkel added a comment to D62393: [OPENMP][NVPTX]Mark parallel level counter as volatile..
In D62393#1539012, @jfb wrote:

FWIW we already support -fms-volatile, so there's precedent if you wanted -fnv-volatile (however terrible that is).

Most probably, we don't need this option, clang should emit correct code for volatile vars in Cuda mode automatically.

Jun 11 2019, 3:56 PM · Restricted Project
hfinkel added a comment to D62393: [OPENMP][NVPTX]Mark parallel level counter as volatile..

I know some things about CUDA, volatile and C++. Let's see if I can understand the part of the proposed change that involves these. I don't understand the part about the test but I don't need to, I'll ignore that.

The gist of the issue is that parallelLevel table should really be atomic<> because of data-races on it (that was a bug prior to this, maybe there are more such bugs lying around), except there's a problem with the obvious fix: atomic<> is not available in CUDA (yet) so it's not an option to fix this issue. The next best thing we have instead is volatile.

Now, volatile in PTX (e.g. asm("ld.volatile...[x];")) and volatile (e.g. "volatile ...x;") in C++ source like this, are not exactly the same thing. When CUDA says that volatile is equivalent memory_order_relaxed, it's saying something clear (I think) about the PTX level code but it's still being pretty vague about CUDA C++ level code. OTOH it's entirely possible for Clang to do something with either atomic<> or volatile that isn't valid for the other -- and that's a different can of worms than, say, NVCC which does {whatever NVCC does, it's not the compiler you're using}.

However, since people do use CUDA C++ volatile this way a lot already, you can't really have a good CUDA toolchain unless it is the case (including, by accident) that it works for this purpose. In other words, it's probably reasonable to assume this in your code because every other code would be on fire otherwise, and it's not on fire, so far as we can tell.

It might be worth it to prepare your code for the eventual arrival of atomic<> on CUDA. That is, maybe create a template alias on T with some helper functions, just enough for your use. It might make this code more self-documenting and make it easy to make it 100% legit later on.

Hi Olivier, thanks for your comments.

Jun 11 2019, 2:41 PM · Restricted Project
hfinkel accepted D62164: [PowerPC] Enable MachinePipeliner for P9 with -ppc-enable-pipeliner.

LGTM

Jun 11 2019, 8:49 AM · Restricted Project
hfinkel updated subscribers of D62393: [OPENMP][NVPTX]Mark parallel level counter as volatile..

> The standard has "levels" and "active-levels". Both need to be tracked but it seems we only have a single array?
>

They all are tracked on this array.

Where is that described?

Sorry, read the code. I'm not a debugger.

First of all, this response is simply rude.
Second, you wrote the code so my question is well justified.
Third, does that mean you encode multiple values in a single int array but did not document this at all? The code is not documentation.
Finally, this will break once we have a parallel region in a recursive loop with more than 254 instantiations, correct?

First of all, you started this when expressed the thought that the patches are accepted not because they are correct, but because we all work in the same company. This was rude! Besides, we don't work at the same company anymore.

Jun 11 2019, 8:47 AM · Restricted Project

Jun 10 2019

hfinkel added inline comments to D62989: [Unroll] Do NOT unroll a loop with small runtime upperbound.
Jun 10 2019, 9:44 PM · Restricted Project