Page MenuHomePhabricator

hfinkel (Hal Finkel)
User

Projects

User does not belong to any projects.

User Details

User Since
Nov 16 2012, 6:02 PM (357 w, 2 d)

Recent Activity

Wed, Sep 18

hfinkel added a comment to D63607: [clang][driver] Add basic --driver-mode=flang support for fortran.

One thought that occurs to me when reviewing this. I think we will assume that F18/flang when it lands in the LLVM project will be an optional thing to build and will be an opt-in project at the start that is off by default. What happens when clang --driver-mode=flang is called and flang has not been built? And where would we put a test for that behaviour? If flang were not to be built, should --driver-mode=flang be unsupported/unrecognised and emit an error message?

Wed, Sep 18, 10:54 AM · Restricted Project

Tue, Sep 17

hfinkel accepted D66902: [PowerPC] Implementing overflow version for XO-Form instructions.

LGTM too.

Tue, Sep 17, 8:57 AM · Restricted Project

Mon, Sep 16

hfinkel added inline comments to D67148: [LoopVectorize][PowerPC] Estimate int and float register pressure separately in loop-vectorize.
Mon, Sep 16, 11:31 PM · Restricted Project
hfinkel added inline comments to D67148: [LoopVectorize][PowerPC] Estimate int and float register pressure separately in loop-vectorize.
Mon, Sep 16, 1:46 PM · Restricted Project

Thu, Sep 12

hfinkel added a comment to D67148: [LoopVectorize][PowerPC] Estimate int and float register pressure separately in loop-vectorize.

Thanks for exploring this direction.

Thu, Sep 12, 11:54 AM · Restricted Project

Wed, Sep 11

hfinkel added a comment to D67259: [X86] Enable -mprefer-vector-width=256 by default for Skylake-avx512 and later Intel CPUs..

Do we need to include some benchmark numbers?

What tests do you think we should run?

I don't think it's practical to benchmark this. The effects are well-documented and well-understood. Additionally, the incidence and severity of high-power AVX frequency reductions are a function of system states and workload profile, which makes them difficult to reproduce in small, self-contained examples.

It's sufficient justification for this change to note that it brings Clang into line with other compilers. See here:

A significant problem is compiler inserted AVX-512 instructions. Even if you are not using any explicit AVX-512 instructions or intrinsics, compilers may decide to use them as a result of loop vectorization, within library functions and other optimization. Even something as simple as copying a structure may cause AVX-512 instructions to appear in your program. Current compiler behavior here varies greatly, and we can expect it to change in the future. In fact, it has already changed: Intel made more aggressive use of AVX-512 instructions in earlier versions of the icc compiler, but has since removed most use unless the user asks for it with a special command line option. Based on some not very comprehensive tests of LLVM’s clang (the default compiler on macOS), GNU gcc, Intel’s compiler (icc) and MSVC (part of Microsoft Visual Studio), only clang makes aggressive use of 512-bit instructions for simple constructs today: it used such instructions while copying structures, inlining memcpy, and vectorizing loops.

Wed, Sep 11, 8:54 AM · Restricted Project

Tue, Sep 10

hfinkel added a comment to D67148: [LoopVectorize][PowerPC] Estimate int and float register pressure separately in loop-vectorize.

(*) Actually, for both cases, we can also have a separate class ID for scalar i1 types (with 8 registers).

Tue, Sep 10, 8:34 PM · Restricted Project
hfinkel added a comment to D67148: [LoopVectorize][PowerPC] Estimate int and float register pressure separately in loop-vectorize.

Thanks for looking at this (it's been a problem for a long time). Let me suggest a different interface, which I believe will improve generality and reduce code duplication in the register-pressure estimator, and let me know what you think...

// Return the number of registers in the target-provided register class.
unsigned getNumberOfRegisters(unsigned ClassID = 0) const;
 
// Return the target-provided register class for the provided type.
unsigned getRegisterClassForType(Type *Ty) const;

The idea, then, is that we just calculate register usage for each register class separately (i.e., keep a hash table), and then when computing the interleaving factor, etc. we just iterate over all of the register classes returned by the target, and pick the smallest interleaving factor calculated over all of the register classes. There's probably even a nice way to construct a default implementation of this in the backend (although that we'd save for follow-up work).

Yes. Using ClassID is more general and fine-grained. But I think there is no need to iterate all kinds of register class because target register class is flexible and many register classes target provided are only different width with same position(such as gprc and g8rc), which makes too many kinds of register classes to maintain and some are just the same thing. It's not just use smallest interleaving factor calculated over all of the register classes and we also need to care about the overlapping relationship between different register class.

In all, it would be too fine-grained and little over-design, many register classes are just same behavior as separate with scalar and vector, int and float. What's your opinion?

Tue, Sep 10, 8:32 PM · Restricted Project
hfinkel added inline comments to D63607: [clang][driver] Add basic --driver-mode=flang support for fortran.
Tue, Sep 10, 7:45 PM · Restricted Project
hfinkel added a comment to D67148: [LoopVectorize][PowerPC] Estimate int and float register pressure separately in loop-vectorize.

Thanks for looking at this (it's been a problem for a long time). Let me suggest a different interface, which I believe will improve generality and reduce code duplication in the register-pressure estimator, and let me know what you think...

Tue, Sep 10, 6:51 PM · Restricted Project
hfinkel added inline comments to D67383: Add new optimization pass of Tree-Height-Reduction.
Tue, Sep 10, 6:26 PM · Restricted Project
hfinkel added inline comments to D67305: [AliasSetTracker] Update AAInfo check..
Tue, Sep 10, 4:25 PM · Restricted Project

Mon, Sep 9

hfinkel added inline comments to D67305: [AliasSetTracker] Update AAInfo check..
Mon, Sep 9, 8:15 PM · Restricted Project
hfinkel added a comment to D67088: [PowerPC] extend PPCPreIncPrep Pass for ds/dq form.

@hfinkel Hi Hal, sorry for pushing you on this issue. Could you give some comments for this patch? As I tested on Power9, the patch gains 9.6% on cpu2017 benchmark exchanges, 2.0% on benchmark x264 for base rate, for peak rate, gains 2.1% on cactuBSSN and 1.9% on xalancbmk. Thanks.

Mon, Sep 9, 7:25 PM · Restricted Project
hfinkel added inline comments to D67305: [AliasSetTracker] Update AAInfo check..
Mon, Sep 9, 6:09 AM · Restricted Project
hfinkel accepted D62989: [Unroll] Do NOT unroll a loop with small runtime upperbound.

LGTM

Mon, Sep 9, 5:57 AM · Restricted Project

Thu, Sep 5

hfinkel accepted D66428: Change TargetLibraryInfo analysis passes to always require Function.

@chandlerc @hfinkel - can you take a look? Related to our discussion in D61634.

Thu, Sep 5, 1:14 PM · Restricted Project

Fri, Aug 30

hfinkel accepted D67016: [PowerPC][NFC] Avoid checking non-relevant .cfi instructions.

LGTM

Fri, Aug 30, 11:48 AM · Restricted Project
hfinkel added a comment to D66338: [CGP] Drop no op intrinsic calls.

Is CGP a "mandatory" pass always expected to be executed before ISel? In other words, can we remove logic from SelectionDAGBuilder when rewriting intrinsics here (simply assert that we no longer find intrinsics that should be eliminated by CGP)?

Fri, Aug 30, 8:42 AM · Restricted Project

Thu, Aug 29

hfinkel added inline comments to D66338: [CGP] Drop no op intrinsic calls.
Thu, Aug 29, 4:45 PM · Restricted Project
hfinkel accepted D66963: [PowerPC] Support extended mnemonics mffprwz etc..

mfvrwz RA,VRS is mfvsrwz RA,VRS+32

The '+32' bit is covered by the added CodeGen/PowerPC/inlineasm-extendedmne.ll test, but I'd really like to see direct encoding tests for all of the instructions. Can you please make sure that they all appear in MC/PowerPC/vsx.s and MC/Disassembler/PowerPC/vsx.txt?

Good point, and updated.

Thu, Aug 29, 2:22 PM · Restricted Project
hfinkel added a comment to D66963: [PowerPC] Support extended mnemonics mffprwz etc..

The '+32' bit is covered by the added CodeGen/PowerPC/inlineasm-extendedmne.ll test, but I'd really like to see direct encoding tests for all of the instructions. Can you please make sure that they all appear in MC/PowerPC/vsx.s and MC/Disassembler/PowerPC/vsx.txt?

Thu, Aug 29, 11:58 AM · Restricted Project

Wed, Aug 28

hfinkel added a comment to D62341: [DAGCombine][X86][AArch64][AMDGPU][MIPS][PPC] (sub x, c) -> (add x, -c) vector edition..

@hfinkel should i be worried about PPC regressions here?
Those are there regardless of the patch (instcombine already did this fold).
I'm not yet sure how to handle them, in some of these patterns the constant
is already constant-pool load, in some cases it's bitcasted, etc.

Wed, Aug 28, 3:11 PM · Restricted Project

Mon, Aug 26

hfinkel accepted D66308: [InstCombine] Fold select with ctlz to cttz.

LGTM

Mon, Aug 26, 8:52 PM · Restricted Project

Aug 23 2019

hfinkel added a comment to D53342: [SimplifyLibCalls] Mark known arguments with nonnull.

Yes, but only when size is constant non zero (isKnownNonZero)

Aug 23 2019, 6:02 AM · Restricted Project
hfinkel added a comment to D53342: [SimplifyLibCalls] Mark known arguments with nonnull.

If Clang already drops it [0], that is good, then this patch is totally fine.

Aug 23 2019, 5:49 AM · Restricted Project

Aug 22 2019

hfinkel added a comment to D66461: [CaptureTracker] Comparisons of allocation pointers do not capture.

As I described, the compare will capture only if the other pointer was captured.

Aug 22 2019, 8:49 PM · Restricted Project
hfinkel added a comment to D53342: [SimplifyLibCalls] Mark known arguments with nonnull.

do we add the attributes based on the function names.

I dont see this as a problem - reserved names.

What to do with the attributes that already exist in, e.g., some glibc header files.

Can you point me on real example/OS/glibc version where this actually happens?

I have Ubuntu 18 LTS and glibc and I see no such attributes in IR.

Aug 22 2019, 8:35 PM · Restricted Project
hfinkel added a comment to D53342: [SimplifyLibCalls] Mark known arguments with nonnull.

I would like to hear @hfinkel’s opinion too :)

Aug 22 2019, 6:20 AM · Restricted Project
hfinkel added inline comments to D66050: Improve division estimation of floating points..
Aug 22 2019, 6:01 AM · Restricted Project

Aug 21 2019

hfinkel added inline comments to D66088: AMD Znver2 (Rome) Scheduler enablement.
Aug 21 2019, 10:18 PM · Restricted Project
hfinkel added a comment to D66461: [CaptureTracker] Comparisons of allocation pointers do not capture.

If you compare the same pointer (or derived from the same), you need to capture at least one from them "explicitly" to learn the bits.

Aug 21 2019, 9:08 PM · Restricted Project
hfinkel added a comment to D66461: [CaptureTracker] Comparisons of allocation pointers do not capture.

I'm missing something here. If I have two pointers (e.g., returned from malloc), and I compare them so I know they're equal, then I've learned that the pointers are equal -- and, thus, if I know all of the bits of one pointer I now know all of the bits of the other pointer. As a result, I can still reconstruct it later using that information.

Aug 21 2019, 6:37 AM · Restricted Project
hfinkel accepted D65186: [MustExec] Add a generic "must-be-executed-context" explorer.

LGTM

Aug 21 2019, 6:31 AM · Restricted Project
hfinkel added a comment to D66029: llvm-canon.

Now the canonicalizer sorts values in PHI nodes. After a discussion, I have decided not to remove duplicates. Those duplicates could come from some other passes and in my opinion, the canonicalizer should make them stand out instead of removing them.

Aug 21 2019, 6:27 AM · Restricted Project
hfinkel accepted D65800: [LLVM][Alignment] Introduce Alignment In MachineFrameInfo.

LGTM

Aug 21 2019, 6:19 AM · Restricted Project
hfinkel accepted D66157: [BasicAA] Use dereferenceability to reason about aliasing.

LGTM

Aug 21 2019, 6:19 AM · Restricted Project

Aug 20 2019

hfinkel added a comment to D66490: [NewPM] Enable the New Pass Manager by Default in Clang.

We already know that we don't want this enabled for tsan builds due to https://bugs.llvm.org/show_bug.cgi?id=42877, but I don't even know if anyone else will hit it (it's only when building one particular library).

Aug 20 2019, 8:42 PM · Restricted Project
TijmenW awarded D9375: An llvm.noalias intrinsic a Like token.
Aug 20 2019, 6:26 AM

Aug 19 2019

hfinkel added inline comments to D66157: [BasicAA] Use dereferenceability to reason about aliasing.
Aug 19 2019, 8:13 PM · Restricted Project

Aug 17 2019

hfinkel added a comment to D66338: [CGP] Drop no op intrinsic calls.

Anyway I think this fix is good and simple - does not require individual fixes for GlobalISel, FastISel, SelectionDAG.

Aug 17 2019, 8:47 AM · Restricted Project

Aug 16 2019

hfinkel added a comment to D66338: [CGP] Drop no op intrinsic calls.

(As an alternative, this can be fixed in the phrase when llvm.assume is dropped in backend.. but not sure if we have "backend" version of RecursivelyDeleteTriviallyDeadInstructions)

Aug 16 2019, 11:59 AM · Restricted Project
hfinkel added a comment to D66338: [CGP] Drop no op intrinsic calls.

Oh, this is funny (or nasty) bug.. Never called function is now called! :)

a.c
int foo(void) attribute((const));

int main(void) {

__builtin_assume(foo() == 1);

}

b.c
#include <stdio.h>

int foo(void) {

puts("BUM");
return 0;

}

clang -O3 a.c b.c -o bum

./bum
BUM

Side note: As shown in this example, attribute((const)) on declarations seems like a bad idea. User may add attribute((const)), then in the future introduce a side effect aaaand... compiler is happy, optimizers are happy, no warnings, weird binary. Can clang-tidy catch it? @aaron.ballman @NoQ GCC folks claim this is hard to do and say attribute((const)) is a best effort thing - you simply shouldn't do a mistake.

Aug 16 2019, 9:35 AM · Restricted Project

Aug 13 2019

hfinkel updated subscribers of rG038d604f4f8c: [SimplifyLibCalls] Add noalias from known callsites.

Clang [0], and probably other software, emit memcpy where memset would be needed.

Aug 13 2019, 12:28 PM
hfinkel added a comment to D66132: [CodeGen] Add `shouldDoPartialRedundancyElimination()` to `TargetInstrInfo` (PRR42405).

TargetTransformInfo, in the title, should say TargetInstrInfo.

Aug 13 2019, 8:57 AM · Restricted Project

Aug 12 2019

hfinkel accepted D66055: [X86] Support -mlong-double-80.

LGTM. As @MaskRay says, this shouldn't affect PowerPC, so you can remove the '[PowerPC]' from the title.

Aug 12 2019, 9:36 AM · Restricted Project

Aug 9 2019

hfinkel added a comment to D65835: [OpenMP] Permit map with DSA on combined directive.

I want to be sure we're on the same page: For OpenMP 5.0, should we allow is_device_ptr with the private clauses?

Yes, since it is allowed by the standard.

Umm ... I probably missed some earlier discussions! What would be the behavior of the following code?

p = omp_target_alloc(...);
#pragma omp target private(p) is_device_ptr(p)
  p[...] = ...;   // crash or not?

It must crush, I assume. The main problem is that this construct is allowed by the standard.

Yep. We should add a warning message for it.

Upon further reflection, this is not clearly allowed by the standard. My experience is that, when reading standards, sometimes things are disallowed by contradiction (i.e., the standard does not define some behavior, and what the standard does say that's relevant is self contradictory). In this case, 2.19.3 says that list items which are privatized (and which are used) undergo replacement (with new items created as specified) while 2.12.5 says that "The is_device_ptr clause is used to indicate that a list item is a device pointer already in the device data environment and that it should be used directly." A given list item cannot simultaneously be "used directly" (2.12.5) and also undergo replacement: "Inside the construct, all references to the original list item are replaced by references to a new list item received by the task or SIMD lane" (2.19.3). Thus, it may be disallowed.

I think, this combination is still allowed. I assume that

#Pragma omp target parallel is_device_ptr(a) <dsa_clause>(a)

is the same as

#pragma omp target is_device_ptr(a)
#pragma omp parallel <dsa_clause>(a)

i.e. datasharing clauses are applied to inner sub-regions, not the target region itself.

Aug 9 2019, 4:42 PM · Restricted Project, Restricted Project
hfinkel added a comment to D65835: [OpenMP] Permit map with DSA on combined directive.

I want to be sure we're on the same page: For OpenMP 5.0, should we allow is_device_ptr with the private clauses?

Yes, since it is allowed by the standard.

Umm ... I probably missed some earlier discussions! What would be the behavior of the following code?

p = omp_target_alloc(...);
#pragma omp target private(p) is_device_ptr(p)
  p[...] = ...;   // crash or not?

It must crush, I assume. The main problem is that this construct is allowed by the standard.

Yep. We should add a warning message for it.

Aug 9 2019, 4:00 PM · Restricted Project, Restricted Project
hfinkel added a comment to D65835: [OpenMP] Permit map with DSA on combined directive.

I want to be sure we're on the same page: For OpenMP 5.0, should we allow is_device_ptr with the private clauses?

Yes, since it is allowed by the standard.

Umm ... I probably missed some earlier discussions! What would be the behavior of the following code?

p = omp_target_alloc(...);
#pragma omp target private(p) is_device_ptr(p)
  p[...] = ...;   // crash or not?

It must crush, I assume. The main problem is that this construct is allowed by the standard.

Aug 9 2019, 3:29 PM · Restricted Project, Restricted Project
hfinkel added a comment to D66029: llvm-canon.

I also recommend that you canonicalize PHI nodes. In past experiments looking for fixed points in the optimization pipeline, this came up as a significant issue. The order of the predecessors in the PHI operand lists don't carry any significance, also also sometimes a predecessor can be listed multiple times (always with the same corresponding value). It's probably best to canonicalize those so each predecessor is listed only once and the blocks appear in their natural order.

Aug 9 2019, 2:59 PM · Restricted Project
hfinkel accepted D65852: AssumptionCache: remove old affected values after RAUW.

LGTM

Aug 9 2019, 2:59 AM · Restricted Project

Aug 8 2019

hfinkel added inline comments to D65835: [OpenMP] Permit map with DSA on combined directive.
Aug 8 2019, 6:15 AM · Restricted Project, Restricted Project
hfinkel added inline comments to D65835: [OpenMP] Permit map with DSA on combined directive.
Aug 8 2019, 4:52 AM · Restricted Project, Restricted Project
hfinkel accepted D65261: [PowerPC] Fix ICE when truncating to vector with odd-size elements..

LGTM

Aug 8 2019, 4:17 AM · Restricted Project

Aug 7 2019

hfinkel added inline comments to D65835: [OpenMP] Permit map with DSA on combined directive.
Aug 7 2019, 4:19 PM · Restricted Project, Restricted Project
hfinkel accepted D65880: [Driver] Move LIBRARY_PATH before user inputs.

Local flags should certainly override LIBRARY_PATH.

Aug 7 2019, 9:46 AM · Restricted Project, Restricted Project

Aug 5 2019

hfinkel accepted D65226: [Strict FP] Allow custom operation actions.
In D65226#1615192, @kpn wrote:

Ping?

Should we move ahead with this? I believe this is still a pre-req for D63782 ...

I have no objections. I'll defer to one of the other reviewers.

I'm still not thrilled with the tricky doing nothing but returning true code like I mentioned, but I don't think that should be enough to hold things up.

Aug 5 2019, 11:18 AM · Restricted Project

Aug 4 2019

hfinkel added a comment to D65186: [MustExec] Add a generic "must-be-executed-context" explorer.

I think that this looks fine, but it should have some direct tests. Either make an analysis that prints out things and then some lit tests, or, some unit tests.

Aug 4 2019, 6:27 AM · Restricted Project

Aug 3 2019

hfinkel added a comment to D65410: [PassManager] First Pass implementation at -O1 pass pipeline.

Thanks for starting on this. Can you go ahead and replace the sroa calls with mem2reg calls for O1 and then see what that does to the performance? That strikes me as a major change, but certainly one that potentially makes sense, so I'd rather we go ahead and test it now before we make decisions about other adjustments.

Aug 3 2019, 11:36 AM · Restricted Project, Restricted Project

Aug 2 2019

hfinkel added a comment to D65377: [FunctionAttrs] Annotate intrinsics with nosync.

I almost think intrinsics should be assumed nosync by default. I'm not thrilled at the prospect of another attribute that nearly every intrinsic needs to be annotated with

Mh, I'm not opposing that idea but also not convinced. Doing it for all makes it easier to introduce errors I guess. Do we have precedence?

If we want to do this, maybe we should make a list of attributes that need to be explicitly removed for intrinsics.
I expect not only nosync to be on it but also others like willreturn, nofree, nounwind, etc.

It probably should be explicit, but I feel like I'm in a perpetual state of adding attributes to intrinsic definitions and it's never complete

I would propose the following:

  1. Let's go ahead with this
  2. We ask on the list how people feel about "opt-out" attributes for intrinsics and go from there
Aug 2 2019, 4:29 PM · Restricted Project

Aug 1 2019

hfinkel added a comment to D65267: [MachineCopyPropagation] Remove redundant copies after TailDup via machine-cp.

Is this something that could be fixed in TailDuplication?

Aug 1 2019, 10:48 AM · Restricted Project

Jul 24 2019

hfinkel added inline comments to D64386: CodeGen: Use memset in initializers for non-zeros.
Jul 24 2019, 9:04 AM · Restricted Project
hfinkel accepted D64795: [PowerPC] exclude more icmps in LSR which is converted in later hardware loop pass.

LGTM

Jul 24 2019, 8:52 AM · Restricted Project

Jul 22 2019

hfinkel accepted D65118: Analysis: Don't look through aliases when simplifying GEPs..

LGTM

Jul 22 2019, 3:09 PM · Restricted Project
hfinkel added a comment to D64217: [OpenMP][NFCI] Cleanup the target state queue implementation.

It is zero initialized, which you claim but which I failed to validate so far.

Jul 22 2019, 2:52 PM · Restricted Project

Jul 19 2019

hfinkel added a comment to D63782: [FPEnv] Add fptosi and fptoui constrained intrinsics.

I've been thinking about the getStrictFPOperationAction issue. I believe this function makes no sense at all and really should go away. Instead, the strict opcodes should use their own operation actions just like everything else. In current code, those should now be set up in a reasonable manner: they all default to Expand, unless the target overrides them (to whatever makes sense).

If the operation action is anything but Expand, I believe it should simply be respected as-is (i.e. Legal stays until ISel, Custom calls the target hook, Promote -if ever used- should be implemented in common code in a way that preserves strictness, and LibCall should emit a library call - possibly a strict version if necessary).

Now, if the operation action of a strict FP operation is Expand, common code should try to implement an expansion rule if possible in a way that preserves strictness. If that is not possible, then and only then should we think about falling back to a non-strict implementation. That fallback (and only that fallback) now should look at the operation action of the non-strict equivalent. If that is anything but Legal, then the expansion logic should replace the strict node with a non-strict node and push that non-strict node back to the legalizer to handle it in whatever way is indicated by its operation action. If the operation action is Legal, then we can leave the strict node in and mutate it to the non-strict node only shortly before ISel as is done today.

As to when an expansion is possible in a way that preserves strictness, and when to fall back to the non-strict equivalent: For a vector op, I believe it always makes sense to expand a strict vector op by scalarizing it to strict component ops. (Except, possibly, if that scalar op would itself fall back to its non-strict eqiuvalent -- then we might as well do the fallback on the vector type.) For scalar ops, for some it could be possible to expand them while respecting strictness, e.g. for some of the fp-to-sint cases above. That can be decided on a case-by-case basis. If at the end of ExpandNode we haven't found a way to expand a strict node, we can to the fallback then.

Does this make sense?

Jul 19 2019, 12:00 PM · Restricted Project
hfinkel added a comment to D53362: [Prototype] Heap-To-Stack Conversion Pass.

Hi Hal! Are you planning on working on this one? If not maybe I can take over? Now that we have nofree & nosync added and deduced in the Attributor, maybe this can be revisited.

Jul 19 2019, 12:09 AM

Jul 18 2019

hfinkel added a comment to D64067: [X86][PPC] Support -mlong-double-64.

Hi, we just inherited this commit at Cray when we did our latest upstream merge and there are a few problems with it that I'd like to point out. Sorry that I was not part of the initial discussion here, but I didn't know that this work was being done and I had already done it for x86 in our downstream compiler a while ago.

  1. This patch adds the -m options to f_Group, which is weird. They should be in m_Group since they are target-specific. I actually implemented them as part of a subgroup of m_Group so that I can take the last option from that group that appears on the command line.
  2. This patch lacks the GNU -mlong-double-80 option for x86.
  3. Instead of tracking the size in clang/Basic/LangOptions.def, I track it in clang/Basic/TargetOptions.h. This is a target-specific thing afterall.

    We're trying to resolve merge conflicts with this change and might be able to submit a patch to help reduce our differences. Would that be of interest?
Jul 18 2019, 12:58 PM · Restricted Project, Restricted Project

Jul 17 2019

hfinkel accepted D64850: Remove use of malloc in PPC mm_malloc..

LGTM

Jul 17 2019, 12:12 AM · Restricted Project, Restricted Project

Jul 16 2019

hfinkel accepted D64499: [PowerPC][HTM] Fix impossible reg-to-reg copy assert with ttest builtin.

LGTM

Jul 16 2019, 12:59 PM · Restricted Project

Jul 15 2019

hfinkel added a comment to D64662: [FPEnv] [PowerPC] Lower ppc_fp128 StrictFP Nodes to libcalls.

Sorry Hal, I must have accidentally released an incomplete draft yesterday instead of simply deleting it like I meant to.

Jul 15 2019, 11:27 AM · Restricted Project

Jul 14 2019

hfinkel accepted D64283: [PowerPC] Support -mabi=ieeelongdouble and -mabi=ibmlongdouble.

LGTM

Jul 14 2019, 10:18 PM · Restricted Project, Restricted Project
hfinkel accepted D64282: [PowerPC] Support fp128 libcalls.

LGTM

Jul 14 2019, 9:37 PM · Restricted Project

Jul 11 2019

hfinkel accepted D64192: [MachinePipeliner] Fix order for nodes with Anti dependence in same cycle.

LGTM.

Jul 11 2019, 3:12 PM · Restricted Project
hfinkel added inline comments to D64192: [MachinePipeliner] Fix order for nodes with Anti dependence in same cycle.
Jul 11 2019, 1:45 PM · Restricted Project
hfinkel added a comment to D64557: Add llvm.loop.licm.disable metadata.

Unfortunately I don't believe I have an example that is suitable for publishing. The problem is basically that the code motion is generally a good thing, but it can increase live ranges significantly pushing register pressure up such that performance is degraded, but that impact isn't apparent at the point at which the transformation is performed. Disabling the pass for certain loops is a low-impact pragmatic (but not ideal) solution.

Jul 11 2019, 7:44 AM · Restricted Project
hfinkel added a reviewer for D64557: Add llvm.loop.licm.disable metadata: Meinersbur.
Jul 11 2019, 7:37 AM · Restricted Project

Jul 10 2019

hfinkel accepted D64368: [LoopUnroll+LoopUnswitch] do not transform loops containing callbr.

it would be good to add a test with callbr there as well, to make sure we do not miss it, in case we remove the restriction again.

I agree and would be happy to do so. Would you prefer that additional test case in this commit or a subsequent commit (either way, before I re-visit https://reviews.llvm.org/D64101)?

Jul 10 2019, 5:24 PM · Restricted Project
hfinkel accepted D64468: Replace three "strip & accumulate" implementations with a single one.

LGTM

Jul 10 2019, 4:12 PM · Restricted Project

Jul 9 2019

hfinkel added a comment to D64283: [PowerPC] Support -mabi=ieeelongdouble and -mabi=ibmlongdouble.

Ah, fun with overloaded, legacy command-line options...

Jul 9 2019, 1:11 PM · Restricted Project, Restricted Project
hfinkel added inline comments to D64258: [InferFuncAttributes] extend 'dereferenceable' attribute based on loads.
Jul 9 2019, 11:58 AM · Restricted Project
hfinkel added a comment to D64128: [CodeGen] Generate llvm.ptrmask instead of inttoptr(and(ptrtoint, C)) if possible..

Do we have any policies against using clang-only builtins in the codebase?

Jul 9 2019, 11:31 AM · Restricted Project, Restricted Project
hfinkel added a comment to D64394: [MachineCSE][MachinePRE] Do not hoist common computations into hot BBs.

However, if CMBB is in a loop body, we might get performance degradation.

Jul 9 2019, 9:28 AM · Restricted Project

Jul 8 2019

hfinkel accepted D63806: [PowerPC][Peephole] Combine extsw and sldi after instruction selection.

LGTM

Jul 8 2019, 7:52 PM · Restricted Project
hfinkel added inline comments to D63806: [PowerPC][Peephole] Combine extsw and sldi after instruction selection.
Jul 8 2019, 7:04 PM · Restricted Project
hfinkel added inline comments to D64258: [InferFuncAttributes] extend 'dereferenceable' attribute based on loads.
Jul 8 2019, 3:09 PM · Restricted Project
hfinkel accepted D64035: [MachinePipeliner] Fix Phi refers to Phi in same stage in 1st epilogue.

Ping...

Jul 8 2019, 2:06 PM · Restricted Project
hfinkel added a comment to D63378: [ORC] WIP Speculative compilation.

Can you please upload a patch with full context?

Jul 8 2019, 1:11 PM · Restricted Project, Restricted Project
hfinkel added inline comments to D63806: [PowerPC][Peephole] Combine extsw and sldi after instruction selection.
Jul 8 2019, 10:01 AM · Restricted Project
hfinkel added a comment to D64128: [CodeGen] Generate llvm.ptrmask instead of inttoptr(and(ptrtoint, C)) if possible..

I would be opposed to any addition of a -f flag for this absent any evidence that it's valuable for some actual C code; this patch appears to be purely speculative. I certainly don't think we should be adding it default-on. Your argument about alignment is what I was referring to when I mentioned that this is enforcing alignment restrictions in places we traditionally and intentionally haven't.

Jul 8 2019, 8:53 AM · Restricted Project, Restricted Project

Jul 5 2019

hfinkel accepted D59919: [Attributor] Deduce "returned" argument attribute.

LGTM

Jul 5 2019, 8:30 PM · Restricted Project, Restricted Project
hfinkel accepted D64224: Keep the order of the basic blocks in the cloned loop as the original loop.

LGTM

Jul 5 2019, 7:45 PM · Restricted Project
hfinkel added inline comments to D64224: Keep the order of the basic blocks in the cloned loop as the original loop.
Jul 5 2019, 7:30 PM · Restricted Project
hfinkel added a comment to D64128: [CodeGen] Generate llvm.ptrmask instead of inttoptr(and(ptrtoint, C)) if possible..

The pointer/integer conversion is "implementation-defined", but it's not totally unconstrained. C notes that "The mapping functions for converting a pointer to an integer or an integer to a pointer are intended to be consistent with the addressing structure of the execution environment.", and we do have to honor that. The standard allows that "the result ... might not point to an entity of the referenced type", but when in fact it's guaranteed to do so (i.e. it's not just a coincidental result of an implementation decision like the exact address of a global variable — no "guessing"), I do think we have an obligation to make it work. And on a practical level, there has to be *some* way of playing clever address tricks in the language in order to implement things like allocators and so forth. So this makes me very antsy.

I don't disagree. But I believe the question is if we have:

int *x = malloc(4);
int *y = malloc(4);
if (x & ~15 == y) {
  *(x & ~15) = 5; // Is this allowed, and if so, must the compiler assume that it might set the value of *y?
}

I certainly agree that we must allow the implementation of allocators, etc. But allocators, I think, have the opposite problem. They actually have some large underlying objects (from mmap or whatever), and we want the rest of the system to treat some subobjects of these larger objects as though they were independent objects of some given types. From the point of view of the allocator, we have x, and we have void *memory_pool, and we need to allow x & N to point into memory_pool, but because, from the allocator's perspective, we never knew that x didn't point into memory_pool (as, in fact, it likely does), that should be fine (*).

There might be more of an issue, for example, if for a given object, I happen to know that there's some interesting structure at the beginning of its page (or some other boundary).

This is what I was thinking about for allocators; this is a common implementation technique for free / realloc / malloc_size.

If I also have a pointer to this structure via some other means, then maybe this will cause a problem. This kind of thing certainly falls outside of the C/C++ abstract machine, and I'd lean toward a flag for supporting it (not on by default).

If you mean a theoretical minimal C abstract machine that does not correspond to an actual target and is therefore not bound by any of the statements in the C standard that say things like "this is expected to have its obvious translation on the target", then yes, I completely agree. If you're talking about the actual C programming language that does correspond to actual targets, then it's not clear at all that it's outside the C abstract machine, because AFAICT integer-pointer conversions are (1) well-specified on specific targets by this de facto requirement of corresponding directly to pointer representations and (2) well-behaved as long as the integer does correspond to the address of an actual object of that type.

Also, please understand that compiler writers have been telling our users for decades that (1) pointer arithmetic is subject to some restrictions on penalty of UB and (2) they can avoid those restrictions by using pointer-integer conversions and doing integer arithmetic instead.

Jul 5 2019, 7:22 PM · Restricted Project, Restricted Project
hfinkel added a comment to D64220: [PowerPC] Remove redundant load immediate instructions.

Is this really a PowerPC-specific problem? Do we just want to have a general post-RA cleanup that checks for isMoveImmediate() and does this cleanup?

Jul 5 2019, 6:52 PM · Restricted Project
hfinkel added inline comments to D64080: [OPENMP]Make __kmpc_push_tripcount thread safe..
Jul 5 2019, 6:45 PM · Restricted Project, Restricted Project
hfinkel added inline comments to D64217: [OpenMP][NFCI] Cleanup the target state queue implementation.
Jul 5 2019, 6:42 PM · Restricted Project
hfinkel added a comment to D64224: Keep the order of the basic blocks in the cloned loop as the original loop.

Do you have any kind of test case? I suppose that you can check the output in a test in test/Analysis/LoopInfo to show the ordering?

Jul 5 2019, 6:36 PM · Restricted Project
hfinkel added a comment to D64198: [NFC][PowerPC] Add the feature control for PreRA and PostRA scheduler.

As Hal mentioned, enable-misched and enable-post-misched should be able to turn on/off the pre-ra/post-ra scheduler.
Regarding to different subtarget (cpu), why we need to turn on/off with feature at runtime? Shouldn't that be configured in code?

yeah, per function control might be more useful here.

Ah, I didn't realize the option you mentioned, and yes, they can turn on/off the scheduler target independent for all cpu.
But if we want to turn on/off the scheduler for specific cpu, we have to add the feature for it. And the feature supports to be configured at runtime.

I am ok to abandon this patch as I have got the options to turn on/off the scheduler for PowerPC. However, I didn't see the harm to add this feature for PowerPC, as, it is the right way if we want to custom the scheduler for different cpu model. Any thoughts ?

Jul 5 2019, 6:29 PM · Restricted Project
hfinkel accepted D64080: [OPENMP]Make __kmpc_push_tripcount thread safe..

Added note.

The patch description says that you add a trip count "per thread", but the code is adding a trip count per device. Could this still cause a problem if multiple threads have target regions for the same device?

No. We already have tripcount per device: each device stores its unique tripcount value, but only one. I extended the existing solution to have the tripcount not only per device, but also per host thread.

Would there be any advantage to using __kmpc_threadprivate_register and __kmpc_threadprivate (or similar) instead of storing a map with thread ids?

Hard to tell, but definitely ee increase dependency between libomptarget and libomp. and, seems to me, threadprivate interface also uses some kind of hash table. We can use unordered_map instead.

Given that we expect this to be temporary, I'm okay with it either way. unordered_map is probably a little bit better than std::map. I expect the number of entries in the map to be small because __kmpc_global_thread_num returns OpenMP thread ids, not system TIDs, right?

Yes.

SGTM.

Jul 5 2019, 6:24 PM · Restricted Project, Restricted Project