Page MenuHomePhabricator

hfinkel (Hal Finkel)
User

Projects

User does not belong to any projects.

User Details

User Since
Nov 16 2012, 6:02 PM (339 w, 3 d)

Recent Activity

Thu, May 16

hfinkel accepted D62044: [PowerPC] Remove CRBits Copy Of Unset/set CBit.

LGTM.

Thu, May 16, 6:56 PM · Restricted Project
hfinkel added a comment to D61228: [PowerPC] Set the innermost hot loop(from PGO) to align 32 bytes.

Can you explain why we don't always do this? You're checking for profiling data but then not using it?

I think the idea is that with PGO data, we are more certain that the hotness information is actually meaningful - since the alignment directive will only be emitted for loops that are "hot". Without PGO data, we will align a majority of loops which may be overkill. But this should be clearly stated in the comment though.

Thu, May 16, 3:00 PM · Restricted Project
hfinkel added a comment to D61228: [PowerPC] Set the innermost hot loop(from PGO) to align 32 bytes.

@hfinkel , do you have any other comments?

Thu, May 16, 9:51 AM · Restricted Project

Wed, May 15

hfinkel accepted D61325: Treat a narrowing PtrToInt like Trunc when generating asm.

LGTM

Wed, May 15, 8:49 AM · Restricted Project
hfinkel updated subscribers of D61634: [clang/llvm] Allow efficient implementation of libc's memory functions in C/C++.

I have a related patch that turns -fno-builtin* options into module flags

Do you have any opinion on representing -fno-builtin using a module flag vs. a function attribute in IR? It seems generally more flexible and easier to reason about a function attribute from my perspective. But I might be missing something about the semantics of -fno-builtin that would make that representation awkward. Or I guess it might just be more work to implement, given we have some IPO passes that use TargetLibraryInfo?

I think that a function attribute would be better. We generally use these flags only in the context of certain translation units, and when we use LTO, it would be sad if we had to take the most-conservative settings across the entire application. When we insert new function call to a standard library, we always do it in the context of some function. We'd probably need to block inlining in some cases, but that's better than a global conservative setting.

Using function level attributes instead of module flags does provide finer grained control and avoids the conservativeness when merging IR for LTO. The downsides I see, mostly just in terms of the engineering effort to get this to work, are:

  • need to prevent inlining with different attributes
Wed, May 15, 8:18 AM · Restricted Project, Restricted Project

Tue, May 14

hfinkel added a comment to D61634: [clang/llvm] Allow efficient implementation of libc's memory functions in C/C++.

I have a related patch that turns -fno-builtin* options into module flags

Do you have any opinion on representing -fno-builtin using a module flag vs. a function attribute in IR? It seems generally more flexible and easier to reason about a function attribute from my perspective. But I might be missing something about the semantics of -fno-builtin that would make that representation awkward. Or I guess it might just be more work to implement, given we have some IPO passes that use TargetLibraryInfo?

Tue, May 14, 3:17 PM · Restricted Project, Restricted Project
hfinkel added a comment to D59065: Add ptrmask intrinsic.

It might be simpler to describe the semantics in terms of GEP: e.g. ptrmask(x) is equivalent to gep x, (ptrtoint(x)&mask)-ptrtoint(x). I think that has the same semantics as the description, but it might be easier to understand for a reader familiar with GEPs.

Tue, May 14, 12:23 PM · Restricted Project
hfinkel added a comment to D59065: Add ptrmask intrinsic.

I don't see why but I might be wrong (@hfinkel). What I thought is detailed below.

Tue, May 14, 11:59 AM · Restricted Project
hfinkel accepted D54742: [CodeMetrics] Don't let extends of i1 be free..

We should watch for perf regressions on x86, and other targets.

Tue, May 14, 9:50 AM

Mon, May 13

hfinkel accepted D61754: [PowerPC] Custom lower known CR bit spills.

If this test case proves too fragile in the future, we might replace it with an MIR test.

Mon, May 13, 3:10 PM · Restricted Project
hfinkel added inline comments to D61754: [PowerPC] Custom lower known CR bit spills.
Mon, May 13, 12:58 PM · Restricted Project
hfinkel added inline comments to D61754: [PowerPC] Custom lower known CR bit spills.
Mon, May 13, 8:58 AM · Restricted Project

Fri, May 10

hfinkel added inline comments to D61754: [PowerPC] Custom lower known CR bit spills.
Fri, May 10, 1:25 PM · Restricted Project

Thu, May 9

hfinkel added inline comments to D59065: Add ptrmask intrinsic.
Thu, May 9, 3:14 PM · Restricted Project
hfinkel added inline comments to D61754: [PowerPC] Custom lower known CR bit spills.
Thu, May 9, 2:40 PM · Restricted Project
hfinkel added a comment to D32530: [SVE][IR] Scalable Vector IR Type.

What's the status of this? It seems like discussion has died down a bit. I think Graham's idea to change from <scalable 2 x float> to <vscale x 2 x float> will make the IR more readable/understandable but it's not a show-stopper for me.

Are there any other outstanding issues to address before this lands?

I think @huntergr still needs to update the insertelement/extractelement doc. For me, the last remaining thing is the definition of shufflevector behavior that his last revision did not address. Do we allow scalable vector? Do we allow anything other than zeroinitializer mask? I'm fine not allowing for the time being or restricting to zeroinitializer mask for the time being. We still need to write it down since generally supporting shufflevector of scalable vectors is outside the scope of this patch.

Thu, May 9, 11:14 AM

Wed, May 8

hfinkel added a comment to D61665: [TailCall] Disable tail call if the callee function contain __builtin_frame_address or __builtin_return_address.

I have a few concerns here:

  1. Looping over every instruction in a function is expensive, and makes any pass which checks this for every call in a function take quadratic time overall.
  2. You can't inspect the body of a function pointer, or a function in a different translation unit, so we can't make this work consistently.
  3. Even in the same translation unit, how do we "preserve" the behavior for values greater than 1?

    ------

    I'd prefer to just leave the current behavior if it isn't causing any practical problems. The user can always use -fno-optimize-sibling-calls if their codebase needs it for some reason.
Wed, May 8, 2:11 PM · Restricted Project

Tue, May 7

hfinkel added inline comments to D61399: [OpenMP][Clang] Support for target math functions.
Tue, May 7, 3:12 PM · Restricted Project
hfinkel added a comment to D61626: [RISCV] Disable tail call if the callee function contain __builtin_frame_address or __builtin_return_address.

Is this really a target-independent problem?

Tue, May 7, 9:30 AM · Restricted Project

Mon, May 6

hfinkel accepted D61546: Stop the DAG combiner from combining vector stores greater than preferred vector width....

Update comments. Have testcase work for multiple sizes and multiple preferred vector sizes.

Mon, May 6, 10:54 PM · Restricted Project
hfinkel added inline comments to D61607: Introduce an option to stripPointerCasts to force the same bit pattern.
Mon, May 6, 1:47 PM · Restricted Project

Sat, May 4

hfinkel added a comment to D61546: Stop the DAG combiner from combining vector stores greater than preferred vector width....

Seems reasonable to me. Can have have both a 128- and 256-bit test case?

Sat, May 4, 1:25 AM · Restricted Project

Fri, May 3

hfinkel added a comment to D61409: [SimplifyCFG] Added condition assumption for unreachable blocks.

Any advices about next steps for this patch?

if inserted llvm.assume is a big problem and “fixing” oneUse match check is not proper solution, then this patch is dead..

Fri, May 3, 5:25 PM · Restricted Project
hfinkel added a comment to D61399: [OpenMP][Clang] Support for target math functions.

It is up to you. I don't have strong objections if you think this will work as required. Just the tests must be fixed, especially codegen tests.

Fri, May 3, 12:03 PM · Restricted Project
hfinkel added a comment to D61399: [OpenMP][Clang] Support for target math functions.

Still, I think we need to prvide the default implementation of those non-standard functions (they can be very simple, maybe reporting error is going to be enough), which can be overriden by user.

Fri, May 3, 11:04 AM · Restricted Project
hfinkel accepted D60047: [CaptureTracking] Don't let comparisons against null escape inbounds pointers.

LGTM

Fri, May 3, 9:53 AM · Restricted Project

Thu, May 2

hfinkel added a comment to D61458: [hip] Relax CUDA call restriction within `decltype` context..

Only if they also differ in some other way. C++ does not (generally) have return-type-based overloading. The two functions described would even mangle the same way if CUDA didn't include host/device in the mangling.

Thu, May 2, 7:57 PM · Restricted Project
hfinkel added a comment to D61458: [hip] Relax CUDA call restriction within `decltype` context..

Here's one for you:

__host__ float bar();
__device__ int bar();
__host__ __device__ auto foo() -> decltype(bar()) {}

What is the return type of foo? :)

I don't believe the right answer is, "float when compiling for host, int when compiling for device."

Thu, May 2, 7:05 PM · Restricted Project
hfinkel accepted D61075: [CodeGenPrepare] limit overflow intrinsic matching to a single basic block.

LGTM

Thu, May 2, 5:24 PM · Restricted Project
hfinkel added a comment to D61399: [OpenMP][Clang] Support for target math functions.

I don't like this implementation. Seems to me, it breaks one of the OpenMP standard requirements: the program can be compiled without openmp support. I assume, that with this includes the program won't be able to be compiled without OpenMP support anymore because it may use some device-specific math functions explicitly.
Instead, I would like to see some additional, device-scpecific math header file, that must be included explicitly to support some device-specific math functions. And we need to provide default implementations for those extra math functions for all the platforms we're going to support, including default host implementations.

Can you provide an example of a conforming program that can't be compiled without OpenMP support? Regardless of the use of any device-specific functions (which isn't covered by the standard, of course, but might be needed in practice), the code still needs to be compilable by the host in order to generate the host-fallback version. This doesn't change that. Thus, any program that uses anything from this math.h, etc. needs to compile for the host, and thus, likely compiles without OpenMP support. Maybe I'm missing your point, however.

Assume we have something like this:

#pragma omp target if(cond)
a = __nv_xxxx(....);

Instead of __nv_xxx you can try to use any Cuda-specific function, which is not the part of the standard math.h/cmath files. Will it be compilable even with OpenMP?

I don't think that this changes that one way or the other. Your example won't work, AFAIK, unless you do something like:

#pragma omp target if(cond)
#ifdef __NVPTX__
a = __nv_xxxx(....);
#else
a = something_on_the_host;
#endif

and anything from these headers that doesn't also have a host version will suffer the same fate: if it won't also compile for the host (one way or another), then it won't work.

The problem with this header file is that it allows to use those Cuda-specific functions unconditionally in some cases:

#pragma omp target
a = __nv_xxxx(....);

It won't require any target-specific guards to compile this code (if we compile it only for Cuda-specific devices) and we're loosing the consistency here: in some cases target regions will require special device guards, in others, with the same function calls, it is not. And the worst thing, is that we implicitly allow to introduce this kind of incostistency into users code. That's why I would prefer to see a special kind of the include file, NVPTX specific, that must be included explicitly, so the user explictly commanded to use some target-specific math functions, if he really wants it. Plus, maybe, in this files we need force check of the platform and warn users that the functions from this header file must be used using device-specific checks. Or provide some kind of the default implementations for all the platforms, that do not support those math function natively.

Thu, May 2, 5:24 PM · Restricted Project
hfinkel added inline comments to D61075: [CodeGenPrepare] limit overflow intrinsic matching to a single basic block.
Thu, May 2, 2:11 PM · Restricted Project
hfinkel added a comment to D61399: [OpenMP][Clang] Support for target math functions.

I don't like this implementation. Seems to me, it breaks one of the OpenMP standard requirements: the program can be compiled without openmp support. I assume, that with this includes the program won't be able to be compiled without OpenMP support anymore because it may use some device-specific math functions explicitly.
Instead, I would like to see some additional, device-scpecific math header file, that must be included explicitly to support some device-specific math functions. And we need to provide default implementations for those extra math functions for all the platforms we're going to support, including default host implementations.

Can you provide an example of a conforming program that can't be compiled without OpenMP support? Regardless of the use of any device-specific functions (which isn't covered by the standard, of course, but might be needed in practice), the code still needs to be compilable by the host in order to generate the host-fallback version. This doesn't change that. Thus, any program that uses anything from this math.h, etc. needs to compile for the host, and thus, likely compiles without OpenMP support. Maybe I'm missing your point, however.

Assume we have something like this:

#pragma omp target if(cond)
a = __nv_xxxx(....);

Instead of __nv_xxx you can try to use any Cuda-specific function, which is not the part of the standard math.h/cmath files. Will it be compilable even with OpenMP?

Thu, May 2, 11:16 AM · Restricted Project
hfinkel added a comment to D61399: [OpenMP][Clang] Support for target math functions.

I don't like this implementation. Seems to me, it breaks one of the OpenMP standard requirements: the program can be compiled without openmp support. I assume, that with this includes the program won't be able to be compiled without OpenMP support anymore because it may use some device-specific math functions explicitly.
Instead, I would like to see some additional, device-scpecific math header file, that must be included explicitly to support some device-specific math functions. And we need to provide default implementations for those extra math functions for all the platforms we're going to support, including default host implementations.

Thu, May 2, 11:05 AM · Restricted Project
hfinkel accepted D61399: [OpenMP][Clang] Support for target math functions.
Thu, May 2, 10:29 AM · Restricted Project
hfinkel added inline comments to D61075: [CodeGenPrepare] limit overflow intrinsic matching to a single basic block.
Thu, May 2, 10:13 AM · Restricted Project
hfinkel added inline comments to D60047: [CaptureTracking] Don't let comparisons against null escape inbounds pointers.
Thu, May 2, 9:55 AM · Restricted Project

Wed, May 1

hfinkel added a comment to D60916: Add non-SSE wrapper for __kmp_{load,store}_mxcsr.

Does the code that uses __kmp_load_mxcsr expect the pointed-to value to be initialized?

Wed, May 1, 6:34 AM · Restricted Project
hfinkel added a comment to D32530: [SVE][IR] Scalable Vector IR Type.

Why implementation defined and not UB for the case where the index exceeds the runtime length? How do you intend to define this for SVE?

SVE uses a predicate for indexed inserts and extracts. We generate that predicate by comparing a splat of the index against a stepvector (0,1,2,3...); if the index is out of range then the predicate will be all false.

For a mov (insert), that results in an unmodified vector.

For a lastb (extract), that extracts the last lane in the vector if no predicate bits are true.

I don't know if RVV or SX-Aurora have similarly defined semantics. If the preference is to make it UB, that's fine.

I think this should be UB, or at least return poison. However, making it UB has the unfortunate side effect of making it illegal to hoist these operations out of conditionals (which currently isn't the case), so maybe poison is better.

For me the main reason for making is UB (aside from generally being conservative) is that element insert/extract can be usefully legalized by putting the vector into a stack slot and accessing the affected element with scalar loads/stores -- in fact, that's the default legalization strategy in trunk. But in that setting an out-of-bounds lane index would become a wild read/write (= clear-cut UB), unless the legalization code jumps through extra hoops to add a bounds check or clamp the lane index into range. This wouldn't affect SVE or RVV since they would implement custom lowering anyway, but for other targets (especially if we eventually generalize insert/extract to accept variable lane positions and do so for all vector types for consistency) it could become an unnecessary headache. Though if we make insertelement with out-of-range lane index return poison instead of being UB, then at minimum clamping in insertelement is necessary to avoid the wild write.

Aside: your proposed semantics for SVE would break the property extractelt(insertelt(vec, i, elt), i) = elt which instcombine (and others) most likely assumes. If we make the insertelt return poison (or even just undef, FWIW), that issue goes away.

Wed, May 1, 6:30 AM

Tue, Apr 30

hfinkel added a comment to D61116: [test-suite] MultiSource/Benchmarks/DOE-ProxyApps-C/RSBench: change *CMAKE* target name only.

While I'm sympathetic, we have a lot of benchmark names which are short and could conflict with other things.
We have xsbench too.

Sure, yes. But handling conflicts as they appears isn't really an invalid strategy, either.
This *only* changes the cmake target name, not the binary name, so this should be virtually unnoticeable,
unless one directly cares about that cmake target name that is of course.

I'm aware, but this is also a slippery slope. You have the upstream project carrying a patch to avoid an uncommon naming conflict with an out-of-tree enhancement. That's something that we've tried, in general, not to do.

Either we should solve this uniformly, or not at all.

Can you define the scope of uniformly? Every [a-zA-Z][a-zA-Z0-9]bench?

No, I mean making all of the test suite targets have some naming convention that makes conflict less likely. Maybe adding llvm-ts- to all of them, for example. Maybe you could build that into the llvm_multisource macro and thus solve the problem for everything at once.

You can always carry local patches for name conflicts with out-of-tree things.

I *could* but i'm trying to avoid that.

I realize. But the way you avoid that is to contribute your benchmark to the test suite, thus making the naming conflict something we figure out how to resolve during code review.

rL345166 / https://llvm.org/docs/Proposals/TestSuite.html#rawspeed

llvm_multisource has an option "PREFIX" that can be used to disambiguate target names. However, in my experience when using it collecting metrics does not work as reliable with this options.

Why not changing the name of rsbench downstream? Changing upstream to solve downstream conflicts seems to be the wrong direction.

The main thing is, i'd like for it to be seamlessly integrateable into testsuite, with *no* diff on either side.
That is:

  • I don't want to change my rsbench in upstream, because i *quite* often have to type it, manually.
  • I would prefer not to change my rsbench (when the time comes to actually upstream it into testsuite), because then that will be a diff from upstream.
  • I don't really see any way to carry this (MultiSource/Benchmarks/DOE-ProxyApps-C/RSBench/CMakeLists.txt) patch.

    So the question, as i have seen it, is whether test-suite 'consumers' often have to manually type the rsbench for the current DOE-ProxyApps-C/RSBench ? I have guessed no, and thus it seemed better to rename *it*.

    It is indeed quite random name clash :/
Tue, Apr 30, 12:43 PM · Restricted Project
hfinkel added a comment to D32530: [SVE][IR] Scalable Vector IR Type.

Noted the change in semantics for extractelement and insertelement in the langref.

Tue, Apr 30, 10:46 AM
hfinkel added a comment to D60619: New pass to produce more easily-read IR..

Hi,

neither of you have commented in the past couple of weeks. Should I:

  • look for other reviewers? if so, do you have suggestions?
  • abandon the feature and merely submit the corrected comment (".inc file")?

    Sorry to bother you. I do understand that reviewing isn't high on anyone's priority list.
Tue, Apr 30, 10:01 AM · Restricted Project
hfinkel accepted D60831: [DebugInfo@O2][LoopVectorize] pr39024: Vectorized code linenos step through loop even after completion.

My take-away from the discussion was this: It is desirable to map the instructions to something in the loop (e.g., not line 0), unless doing so will provide confusing information to the mapping that PGO uses to optimize the relevant branches. Am I correct in saying that this latter issue is of minimal concern in this case?

As far as I see it you are correct. There should be negligible to no impact on PGO.

Tue, Apr 30, 9:56 AM · debug-info, Restricted Project
hfinkel added a comment to D61116: [test-suite] MultiSource/Benchmarks/DOE-ProxyApps-C/RSBench: change *CMAKE* target name only.

While I'm sympathetic, we have a lot of benchmark names which are short and could conflict with other things.
We have xsbench too.

Sure, yes. But handling conflicts as they appears isn't really an invalid strategy, either.
This *only* changes the cmake target name, not the binary name, so this should be virtually unnoticeable,
unless one directly cares about that cmake target name that is of course.

Tue, Apr 30, 9:51 AM · Restricted Project
hfinkel added a comment to D60907: [OpenMP] Add math functions support in OpenMP offloading.

The last two comments in D47849 indicated exploration of a different approach, and one which still seems superior to this one. Can you please comment on why you're now pursuing this approach instead?

...

Hal, as far as I can tell, this solution is similar to yours but with a slightly different implementation. If there are particular aspects about this patch you would like to discuss/give feedback on please let me know.

The solution I suggested had the advantages of:

  1. Being able to directly reuse the code in __clang_cuda_device_functions.h. On the other hand, using this solution we need to implement a wrapper function for every math function. When __clang_cuda_device_functions.h is updated, we need to update the OpenMP wrapper as well.

I'd even go as far as to argue that __clang_cuda_device_functions.h should include the internal math.h wrapper to get all math functions. See also the next comment.

  1. Providing access to wrappers for other CUDA intrinsics in a natural way (e.g., rnorm3d) [it looks a bit nicer to provide a host version of rnorm3d than __nv_rnorm3d in user code].

@hfinkel
I don't see why you want to mix CUDA intrinsics with math.h overloads.

Tue, Apr 30, 9:43 AM · Restricted Project

Mon, Apr 29

hfinkel added a comment to D60907: [OpenMP] Add math functions support in OpenMP offloading.

The last two comments in D47849 indicated exploration of a different approach, and one which still seems superior to this one. Can you please comment on why you're now pursuing this approach instead?

...

Hal, as far as I can tell, this solution is similar to yours but with a slightly different implementation. If there are particular aspects about this patch you would like to discuss/give feedback on please let me know.

Mon, Apr 29, 8:55 PM · Restricted Project
hfinkel added a comment to D61116: [test-suite] MultiSource/Benchmarks/DOE-ProxyApps-C/RSBench: change *CMAKE* target name only.

While I'm sympathetic, we have a lot of benchmark names which are short and could conflict with other things. We have xsbench too. Either we should solve this uniformly, or not at all. You can always carry local patches for name conflicts with out-of-tree things.

Mon, Apr 29, 8:26 PM · Restricted Project
hfinkel added a comment to D60831: [DebugInfo@O2][LoopVectorize] pr39024: Vectorized code linenos step through loop even after completion.

Ping.

A summary of the discussion so far:

The middle block is quite "latch like" in that it dictates control flow between the vectorized and scalar loops, but it can contain additional
instructions. For example, it may also handle converting the vectorized results into a scalar value.

The patch maps all middle block instructions to the line of the loop latch branch.

An initial question of "How much will this affect profiling tools that rely on this debug info?" was not fully addressed because
discussions on an alternative solution I mentioned eclipsed the original question.

Mon, Apr 29, 10:29 AM · debug-info, Restricted Project
hfinkel added a comment to D61228: [PowerPC] Set the innermost hot loop(from PGO) to align 32 bytes.

For some special cases, the performance can improve more than 30% after adding the patch for ppc.

Mon, Apr 29, 5:55 AM · Restricted Project
hfinkel accepted D61227: [NFC]][PowerPC] Use -check-prefixes to simplify the check in code-align.ll.

LGTM

Mon, Apr 29, 5:52 AM · Restricted Project

Thu, Apr 25

hfinkel added a comment to D60907: [OpenMP] Add math functions support in OpenMP offloading.
Thu, Apr 25, 11:21 AM · Restricted Project

Wed, Apr 24

hfinkel added a comment to D61019: [AggressiveAntiDepBreaker] Handle predicate operands better.

Why don't we always do this? Isn't assuming that the register is always read the conservatively-correct thing to do here?

Wed, Apr 24, 9:39 AM · Restricted Project
hfinkel added a comment to D32530: [SVE][IR] Scalable Vector IR Type.

I am not sure how it could be anything but n. If you don't know how long the vector is, you can't correctly generate an index beyond n.

But you know at runtime... there has to be a way to determine, at runtime, vscale. And the index doesn't need to be a constant. I'm not sure that we need to restrict non-constant n to only values valid for vscale == 1.

Good point. 100% agree. I was only considering the constant case.

Ok, so do we have agreement that constant literal indices should be limited to 0..n-1 for now, but non-constant indices can potentially exceed n so that expressions featuring vscale can be used?

I want to say yes here, but that leaves a bit of oddness. It's true that nobody can really police on true variable values. However, some variable values can become constant through optimization.
What's the behavior when >=n constant is exposed?

Wed, Apr 24, 7:58 AM

Apr 19 2019

hfinkel accepted D59712: [APSInt][OpenMP] Fix isNegative, etc. for unsigned types.

Does this pass check-all? check-all of stage-2? test-suite?

For me, all these tests behave with the current patch. As before, the only subproject I did not build was llgo.

Apr 19 2019, 6:02 PM · Restricted Project, Restricted Project
hfinkel accepted D60790: [LPM/BPI] Preserve BPI through trivial loop pass pipeline.

Hal, ping?

Apr 19 2019, 12:06 PM · Restricted Project

Apr 18 2019

hfinkel added a comment to D59712: [APSInt][OpenMP] Fix isNegative, etc. for unsigned types.

I've never tried running the other tests you mention, for any patch. I thought people normally left those to the bots. Should this patch be handled differently?

We have a lot of people actively working off of trunk, and we try very hard to keep trunk clean. The bots are a second line of defense, not the primary checkers. In any case, this comes down to professional judgement. It is not uncommon to ask for a patch author to check self hosting and a test suite run before committing - specifically, those patches that might affect correctness, or introduce other subtle problems, and for which running the compiler over a bunch of C/C++ code might uncover a problem.

Thanks for explaining. It's my first time receiving these particular requests (probably because of what parts of LLVM I normally edit), so I wasn't sure I understood.

Apr 18 2019, 5:36 PM · Restricted Project, Restricted Project
hfinkel added a comment to D59712: [APSInt][OpenMP] Fix isNegative, etc. for unsigned types.

Wondering if it would be better to assert for asking for the sign of an unsigned APSInt. I could see a caller just wanting to get the msb for some reason and not knowing that isNegative won’t work.

Yes, i, too, would think an assert is much better solution (since i literally just tripped over this in this review.)

Does this pass check-all? check-all of stage-2? test-suite?

No. The assert breaks cases in at least ExprConstant.cpp and SemaExpr.cpp.

Err, i was talking about the current code in the patch :)

For me, check-all's success is not affected by the current patch. I built all subprojects except llgo, which gave me a build problem independently of this patch.

I've never tried running the other tests you mention, for any patch. I thought people normally left those to the bots. Should this patch be handled differently?

Apr 18 2019, 3:32 PM · Restricted Project, Restricted Project
hfinkel added a comment to D60162: [ThinLTO] Add module flags for TargetLibraryInfoImpl and use in LTO backends.

I wonder if we could add a module flag for the TLI, and then store that in the summary. Would it work for both implementations in that case?

A module flag summarizing the TLI should theoretically work (we wouldn't need to put anything in the ThinLTO summary, just needs to be in the IR for the backends). There is a lot of stuff in the TLI though, it would all need to be summarized.

Apr 18 2019, 3:09 PM · Restricted Project, Restricted Project
hfinkel accepted D60861: [MachineScheduler] Check pending instructions when an instruction is scheduled.

Hi Hal,

Yes, that's right. The architecture in question is VLIW, so there is plenty of available parallelism per cycle and esoteric constraints.

Apr 18 2019, 12:31 PM · Restricted Project
hfinkel added a comment to D60861: [MachineScheduler] Check pending instructions when an instruction is scheduled.

Pending instructions that may have been blocked from being available by the HazardRecognizer may no longer may not be blocked any more when an instruction is scheduled; pending instructions should be re-checked in this case.

Apr 18 2019, 11:39 AM · Restricted Project
hfinkel accepted D60856: [CodeGen] Add "const" to MachineInstr::mayAlias.

LGTM

Apr 18 2019, 11:25 AM · Restricted Project

Apr 17 2019

hfinkel added a comment to D32530: [SVE][IR] Scalable Vector IR Type.

We need to clarify on insertelement/extractelement. Maybe already done in some other patches, but that clarification should be part of this patch.
Is the "length of val" under the semantics "scalable * n" in <scalable n x ElemTy>, right? Or is it still n?

I am not sure how it could be anything but n. If you don't know how long the vector is, you can't correctly generate an index beyond n.

Apr 17 2019, 2:17 PM
hfinkel added a comment to D60790: [LPM/BPI] Preserve BPI through trivial loop pass pipeline.

Can you please take the rationale, especially for LoopSimplify, and add it as a comment in the source code? At some point someone might break this invariant, and without the comment, they'll be no easy way to know that.

Apr 17 2019, 1:58 PM · Restricted Project
hfinkel added a comment to D60783: [LoopPred] Implement a version of the profitability heuristic w/o BPI.

In the new pass manager, getting access to BPI reliability is a challenge. In practice, we've been running downstream with a variation of this non-BPI based heuristic.

Can you please explain why? The new pass manager was supposed to make our problems getting a hold of analysis results in different places easier, not harder.

New pass manager provides proper organization of analyses but it cant (and shouldnt!) hide innate asymmetry in relationship between Function-level analysis and Loop-level transformation.
NPM specifically prohibits reruns of outer-level analysis from within a inner-level-pass (proxy is readonly).
When working on loop-level you can only *use* function/module-level analysis, manually *preserve* it or automatically invalidate analysis results as a whole after the changes to IR.
That seems to be a proper design decision.

Apr 17 2019, 1:55 PM · Restricted Project
hfinkel updated subscribers of D60831: [DebugInfo@O2][LoopVectorize] pr39024: Vectorized code linenos step through loop even after completion.

How will this appear to profiling tools using PC sampling and using debug info to map the PC samples back to line numbers in the code?

I brought this up in some offline discussions which all concluded that the impact on profilers would be small and the trade off for better debugging is worth it. The impact on profilers seems like it would be small because the middle block is only visited once after running through the vectorized loop.

I am glad you asked because this concern gives rise to an argument for giving the middle block instructions line 0 instead, and i am interested in hearing other's opinions.

Apr 17 2019, 12:08 PM · debug-info, Restricted Project
hfinkel added a comment to D60831: [DebugInfo@O2][LoopVectorize] pr39024: Vectorized code linenos step through loop even after completion.

How will this appear to profiling tools using PC sampling and using debug info to map the PC samples back to line numbers in the code?

Apr 17 2019, 10:24 AM · debug-info, Restricted Project
hfinkel added a comment to D57504: RFC: Prototype & Roadmap for vector predication in LLVM.
    1. Updates
  • added constrained fp intrinsics (IR level only).
  • initial support for mapping llvm.experimental.constrained.* intrinsics to llvm.vp.constrained.*.

Do we really need both vp.fadd() and vp.constrained.fadd()? Can't we just use the latter with rmInvalid/ebInvalid? That should prevent vp.constrained.fadd from losing optimizations w/o good reasons.

According to the LLVM langref, "fpexcept.ignore" seems to be the right option for exceptions whereas there is no "round.permissive" option for the rounding behavior. Abusing rmInvalid/ebInvalid seems hacky.

Do we have enough upside in having both?

I see no harm in having both since we already add the infrastructure in LLVM-VP to abstract away from specific instructions and/or intrinsics. Once (if ever) exception, rounding mode become available for native instructions (or can be an optional tag-on like fast-math flags), we can deprecate all constrained intrinsics and use llvm.vp.fdiv, etc or native instructions instead.

Apr 17 2019, 7:21 AM · Restricted Project

Apr 16 2019

hfinkel added a comment to D60745: Remove RunSLPAfterLoopVectorization option....

Do we need to adjust the new pass manager too?

Nope. There's only a single instantiation of the SLP vectorizer in the new pass manager - it's in a fairly gratuitously different location and some loop passes have run before and after. I think that should be handled separately.

Apr 16 2019, 6:45 PM · Restricted Project
hfinkel added a comment to D60783: [LoopPred] Implement a version of the profitability heuristic w/o BPI.

In the new pass manager, getting access to BPI reliability is a challenge. In practice, we've been running downstream with a variation of this non-BPI based heuristic.

Apr 16 2019, 3:24 PM · Restricted Project
hfinkel added inline comments to D55290: [docs] Update llvm.loop metadata documentation..
Apr 16 2019, 3:23 PM · Restricted Project
hfinkel added a comment to D60747: Remove EnableEarlyCSEMemSSA option.

Is this still useful for debugging problems with MemSSA and GVNHoist? Is this mirrored in the new pass manager?

Apr 16 2019, 9:43 AM · Restricted Project
hfinkel added a comment to D60745: Remove RunSLPAfterLoopVectorization option....

Do we need to adjust the new pass manager too?

Apr 16 2019, 9:43 AM · Restricted Project
hfinkel accepted D60745: Remove RunSLPAfterLoopVectorization option....

LGTM

Apr 16 2019, 9:35 AM · Restricted Project

Apr 15 2019

hfinkel added a comment to D60578: [OPENMP][NVPTX]Fix dynamic scheduling in L2+ SPMD parallel regions..

So, the test is good or I should put it into some other directory?

Apr 15 2019, 12:49 PM · Restricted Project
hfinkel added a comment to D32530: [SVE][IR] Scalable Vector IR Type.

So if we wanted to keep them as intrinsics for now, I think we have one of three options:

  1. Leave discussion on more complicated shuffles until later, and only use scalable autovectorization on loops which don't need anything more than splats.

Given the current state, this is the easiest path.

Apr 15 2019, 10:02 AM
hfinkel added a comment to D60578: [OPENMP][NVPTX]Fix dynamic scheduling in L2+ SPMD parallel regions..

What check command do you run to run these nvptx tests?

The target is called check-libomptarget-nvptx and it's not run by check-openmp. The reasoning is basically what I've described in my previous answers, (maybe) some more in the initial revision D51687.

In that case these tests need to be marked UNSUPPORTED for versions of Clang that will not pass them. There's infrastructure for that, but it's not applied in the current form of this patch.

Okay. This patch review is not the right place to discuss the libomptarget support for old Clang versions. We should have a separate thread on this subject.

In any case, let's move forward with adding this test in that directory, and then we'll address the infrastructure issue as follow-up work.

So put differently, you're proposing to land this in its current form (which will break for some users, including me) and wait for "somebody" to work on the infrastructure to fix things?

Apr 15 2019, 9:41 AM · Restricted Project

Apr 14 2019

hfinkel added a comment to D60578: [OPENMP][NVPTX]Fix dynamic scheduling in L2+ SPMD parallel regions..

I think we need to be careful about adding nvptx tests to libomptarget/test: They can be executed using Clang later than 6.0.0, but that version wasn't able to offload to GPUs. Given that the changes are limited to libomptarget-nvptx (because its parallelism is kind of special), I think the new test should go to libomptarget/deviceRTLs/nvptx/test. Just my 2 cents...

I don't see anything in this test that is nvptx specific. Is there something about the semantics that make it specific to nvptx? We need to build of a suite of tests for accelerator offloading in general. We'll have other accelerator backends (e.g., for AMD GPUs), and the offloading tests should apply to them too.

First, I agree that the test is not specific to nvptx and should pass for all targets. However (at least so far) libomptarget/test is for tests that exercise the target-agnostic part of libomptarget; like starting target regions, environment variables, mapping etc.

Also, I don't understand what Clang 6 support has to do with adding tests... clearly, we'll add tests for bugs, or already have, that will then fail on older versions of Clang.

In that case these tests need to be marked UNSUPPORTED for versions of Clang that will not pass them. There's infrastructure for that, but it's not applied in the current form of this patch.

Apr 14 2019, 5:15 PM · Restricted Project

Apr 13 2019

hfinkel added a comment to D32530: [SVE][IR] Scalable Vector IR Type.

I think this is a coherent set of changes. Given the current top of trunk, this expands support from just assembly/disassembly of machine instructions to include LLVM IR, right? Such being the case, I think this patch should go in. I have some ideas on how to structure passes so SV IR supporting optimizations can be added incrementally. If anyone thinks such a proposal would help, let me know.

I think there is one more thing we still have to do. Does scalable vector type apply to all Instructions where non-scalable vector is allowed? If the answer is no, we need to identify which ones are not allowed to take scalable vector type operand/result. Some of the Instructions are not plain element-wise operation. Do we have agreed upon semantics for all those that are allowed?

Apr 13 2019, 7:04 PM
hfinkel added a comment to D60578: [OPENMP][NVPTX]Fix dynamic scheduling in L2+ SPMD parallel regions..

Our general policy is that all commits that can have tests, should have tests. We have OpenMP target tests in libomptarget/test -- and given that you've added tests there yourself, I assume that you know this ;) -- plus tests in libomptarget/deviceRTLs/nvptx/test - although it sounds like this situation can be triggered using portable code, so I'd prefer we add a test in libomptarget/test. Can you please do that?

Sure, if we have a testing infrastructure for this, I'll add the test. Just missed the tests for NVPTX, will definitely add it.

I think we need to be careful about adding nvptx tests to libomptarget/test: They can be executed using Clang later than 6.0.0, but that version wasn't able to offload to GPUs. Given that the changes are limited to libomptarget-nvptx (because its parallelism is kind of special), I think the new test should go to libomptarget/deviceRTLs/nvptx/test. Just my 2 cents...

Apr 13 2019, 5:34 AM · Restricted Project

Apr 12 2019

hfinkel added inline comments to D59648: [BasicAliasAnalysis] Fix computation for arrays > half of address space.
Apr 12 2019, 12:02 PM · Restricted Project
hfinkel added a comment to D60578: [OPENMP][NVPTX]Fix dynamic scheduling in L2+ SPMD parallel regions..

...

Why don't we have unit tests here or in the llvm-test suite?

Because this is the library. Do you have an idea how to write the unit tests for it? It can be tested only with the executable tests.

We write google unit tests for various components, maybe something like that works here as well. A test that makes sure the initial output of omp_get_level is now 1 would then be great. It is by far not trivial to determine that omp_get_level, if called with an uninitialized device RT, should return parallelLevel + 1.

I know, that someone worked on the target-based testsuite, but don't know when it is going to be ready.

There is the V&V test suite: https://crpl.cis.udel.edu/ompvvsollve/
We could also add openmp target tests into the LLVM Test Suite and run them if people define CMAKE flags.

Actually, it was the testsuite, which reveals the problems with the runtime. But only after some changes in the compiler I made to run more constructs in SPMD. Before that they all were executed in non-SPMD and the problem was masked. And I don't see a problem here since the exhaustive testing is impossible in principle.
If you have a testsuite and ready to prepare and send an RFC, solve the problems with the license, organize it, setup buildbots, provide support, then go ahead. We can do everything, but it requires a lot of time. I agree that we need target-specific testing.

Apr 12 2019, 7:16 AM · Restricted Project

Apr 11 2019

hfinkel added inline comments to D60564: Changes for LLVM PPCISelLowering function combineBVOfConsecutiveLoads.
Apr 11 2019, 6:42 AM · Restricted Project

Apr 9 2019

hfinkel added inline comments to D32530: [SVE][IR] Scalable Vector IR Type.
Apr 9 2019, 1:43 AM

Apr 8 2019

hfinkel accepted D60406: Move the builtin headers to use the new license file header..

LGTM. Thanks!

Apr 8 2019, 9:59 AM · Restricted Project, Restricted Project

Apr 7 2019

hfinkel added inline comments to D32530: [SVE][IR] Scalable Vector IR Type.
Apr 7 2019, 7:41 AM
hfinkel added inline comments to D32530: [SVE][IR] Scalable Vector IR Type.
Apr 7 2019, 7:36 AM
hfinkel added a comment to D60093: [LoopPredication] Allow predication of loop invariant computations.

Because SCEV doesn't have access to AA,

Apr 7 2019, 6:57 AM · Restricted Project

Apr 5 2019

hfinkel added inline comments to D32530: [SVE][IR] Scalable Vector IR Type.
Apr 5 2019, 11:18 AM

Apr 4 2019

hfinkel added a comment to D28791: [compiler-rt][crt] Simple crtbegin and crtend implementation.

Looks like this needs to be updated with the new license headers.

Apr 4 2019, 10:08 PM · Restricted Project, Restricted Project
hfinkel added inline comments to D32530: [SVE][IR] Scalable Vector IR Type.
Apr 4 2019, 9:14 AM
hfinkel added a comment to D59065: Add ptrmask intrinsic.

@sanjoy I haven't tried to solve this problem myself, but it seems pretty important. It sounds to me like you're laying out an argument for introducing LLVM pointer masking intrinsics that would preserve some sort of inbound property. Is it fair to say that we probably can't fully optimize tagged pointers without using intrinsics to avoid this ptrtoint optimizer trap?

I think your understanding is correct. To support full optimization opportunity, an intrinsic like llvm.ptrmask(p, mask) would work.

I agree, but unfortunately it isn't clear to me how we can generate this intrinsic from frontend code (assuming a C++ frontend) that does pointer arithmetic by casting pointers to integers and back.

Apr 4 2019, 8:50 AM · Restricted Project

Apr 3 2019

hfinkel added a comment to D60227: [Remarks] Add string deduplication using a string table.

Is this transparent to tools that use LLVM's YAML library? Other tools?

Apr 3 2019, 3:01 PM · Restricted Project
hfinkel added a comment to D60039: Fix the bug of garbage collection of siod..

I would question whether it's actually worth it to attempt to keep software working that seems to have been abandoned for a decade as part of the llvm test suite? Should this code just be removed, instead of patched?

Apr 3 2019, 1:29 PM · Restricted Project

Mar 29 2019

hfinkel added a reviewer for D59065: Add ptrmask intrinsic: sanjoy.
Mar 29 2019, 12:11 PM · Restricted Project
hfinkel added a comment to D59065: Add ptrmask intrinsic.

I am also not entirely sure how control dependencies could add new underlying objects with this patch

Mar 29 2019, 12:11 PM · Restricted Project

Mar 21 2019

hfinkel added a comment to D59657: [LangRef] Clarify codegen expectations for intrinsics with fp/integer-only overloads.

The audience of this are people who are defining new target intrinsics.

Mar 21 2019, 1:36 PM · Restricted Project
hfinkel added a comment to D59657: [LangRef] Clarify codegen expectations for intrinsics with fp/integer-only overloads.

I don't understand what this is supposed to mean to who the audience of this statement is?

Mar 21 2019, 11:48 AM · Restricted Project

Mar 20 2019

hfinkel accepted D59623: [PPC] Refactor PPCBranchSelector.cpp.

LGTM.

-eric

Mar 20 2019, 7:14 PM · Restricted Project

Mar 19 2019

hfinkel added a comment to D47849: [OpenMP][Clang][NVPTX] Enable math functions called in an OpenMP NVPTX target device region to be resolved as device-native function calls.

We need to make progress on this, and I'd like to suggest a path forward...

Mar 19 2019, 6:11 PM · Restricted Project

Mar 15 2019

hfinkel accepted D59437: Remove immarg from llvm.expect.

LGTM. Maybe we can also add the test case from the PR as one of Clang's regression tests?

Mar 15 2019, 3:40 PM

Mar 14 2019

hfinkel added a comment to D59065: Add ptrmask intrinsic.

I'm really afraid that this isn't sound. We've had a number of issues in this space, and we've always resisted attempts to make BasicAA look through inttoptr/ptrtotint. Once you convert to an integer, control dependencies can effectively add additional underlying objects. In cases where this is sound, why not fold away the whole inttoptr(and(ptrtoint)) in the first place?

Mar 14 2019, 9:32 AM · Restricted Project