hfinkel (Hal Finkel)
User

Projects

User does not belong to any projects.

User Details

User Since
Nov 16 2012, 6:02 PM (256 w, 4 d)

Recent Activity

Sun, Oct 15

hfinkel updated the diff for D32197: [TySan] A Type Sanitizer (Runtime Library).

Rebased.

Sun, Oct 15, 8:00 PM
hfinkel updated the diff for D32198: [TySan] A Type Sanitizer (LLVM).

Rebased.

Sun, Oct 15, 7:58 PM
hfinkel updated the diff for D32199: [TySan] A Type Sanitizer (Clang).

Rebased.

Sun, Oct 15, 7:57 PM

Thu, Oct 12

hfinkel added inline comments to D37076: [LICM] Allow sinking when foldable in loop.
Thu, Oct 12, 2:04 PM
hfinkel accepted D36562: [Bitfield] Make the bitfield a separate location if it has width of legal integer type and its bit offset is naturally aligned for the type.

LGTM

Thu, Oct 12, 9:35 AM

Wed, Oct 11

hfinkel added inline comments to D38662: [BasicAA] Support arbitrary pointer sizes (and fix an overflow bug).
Wed, Oct 11, 11:52 PM
hfinkel added a comment to D37660: [ScalarEvolution] Handling Conditional Instruction in SCEV chain..

Please upload a patch with full context.

Wed, Oct 11, 6:50 AM

Tue, Oct 10

hfinkel added inline comments to D38662: [BasicAA] Support arbitrary pointer sizes (and fix an overflow bug).
Tue, Oct 10, 10:14 PM
hfinkel added inline comments to D38662: [BasicAA] Support arbitrary pointer sizes (and fix an overflow bug).
Tue, Oct 10, 3:39 PM

Mon, Oct 9

hfinkel added inline comments to D38415: [SimplifyIndVar] Replace IVUsers with loop invariant if possible.
Mon, Oct 9, 10:43 AM

Sat, Oct 7

hfinkel added inline comments to D36562: [Bitfield] Make the bitfield a separate location if it has width of legal integer type and its bit offset is naturally aligned for the type.
Sat, Oct 7, 11:08 PM
hfinkel added a comment to D24933: Enable configuration files in clang.

Some suggested improvements to the English in the documentation...

Sat, Oct 7, 10:32 PM
hfinkel accepted D38501: [ValueTracking] Fix a misuse of APInt in GetPointerBaseWithConstantOffset.
In D38501#890491, @mppf wrote:

@hfinkel - I've made the change you requested. I think this one is good to go, but as I mentioned we need to commit it after another fix to BasicAA (such as D38499), or else the new test will fail with BasicAA assertions.

Sat, Oct 7, 9:04 PM

Fri, Oct 6

hfinkel created D38662: [BasicAA] Support arbitrary pointer sizes (and fix an overflow bug).
Fri, Oct 6, 7:59 PM
hfinkel added a comment to D38499: [BasicAA] Fix adjustToPointerSize in BasicAliasAnalysis.cpp for ptr > 64b.
In D38499#888784, @mppf wrote:

In my use case, I have 128-bit pointers but "adding" to one of these will only ever modify the lower 64 bits. So for my use case, offsets > 64 bits just don't matter.

That's completely different.

... we have a bunch of code which actually assumes it represents two's complement arithmetic in the width of the pointer; for example, instcombine has a transform which turns a pointer comparison into a comparison of pointer offsets, and some analysis passes use an APInt whose width is the pointer width rather than the int64_t AA uses. If pointer arithmetic wraps differently on your target, we probably need to add an attribute to the data layout, and adjust LangRef to define what it means.

I don't think that's necessary (for me at least) because it's OK for me to get undefined behavior / core dumps if one of these pointers actually overflows/underflows (assuming signed arithmetic). I.e. as long as adding 1, -1 or other small values works, that's good enough for me at the present time. In other words, you could assume that all my pointer operations are 'inbounds'. *technically* adding -1 causes overflow, if you look at it in the right way, but it happens in a way that plays well with truncation to a smaller sized pointer. My use-case involves a late optimization pass that changes these 128-bit pointers to {i64, 64-bit pointer} structures, at which point handling any necessary truncation is my problem. I'm saying that i64 arithmetic for my pointers is fine, but so is i128 arithmetic. I don't care what happens if you do something like load from NULL - 1.

It's certainly the case that we *could* try to modify LangRef to define pointers with offsets of a different size than the pointer, and I *could* use such semantics. But we'd have to adjust every optimization. I just don't think it's called for right now.

Remember, I'm looking for a bug fix?

But anyway this leaves me wondering - what in particular do we need to do, from your point of view, to move this patch forward? There have been quite a few suggestions and I can't tell which ones would be satisfactory (or which ones would be my problem to solve).

I'm working on a patch for this (replacing essentially all of the internal computations with APInt). Currently, my patch causes Analysis/BasicAA/gep-and-alias.ll to fail. I think this might be a real bug (I've made what I believe is a 64-bit version of this test case and it fails on trunk too). I'm investigating...
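
As a rough illustration of that direction (a hypothetical sketch, not the actual patch; the names DecomposedOffset and addScaledIndex are invented for this example), the idea is to carry the decomposed offset in an APInt of pointer width instead of an int64_t:

#include "llvm/ADT/APInt.h"

using llvm::APInt;

struct DecomposedOffset {
  APInt OtherOffset; // accumulated constant offset, kept at pointer width
};

// Fold one scaled GEP index into the running decomposition. Assumes Scale and
// OtherOffset were created with PointerBits bits; arithmetic then wraps at the
// pointer width rather than silently overflowing a 64-bit accumulator.
void addScaledIndex(DecomposedOffset &Decomposed, unsigned PointerBits,
                    APInt IndexOffset, APInt IndexScale, APInt &Scale) {
  IndexOffset = IndexOffset.sextOrTrunc(PointerBits);
  IndexScale = IndexScale.sextOrTrunc(PointerBits);
  Decomposed.OtherOffset += IndexOffset * Scale;
  Scale *= IndexScale;
}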

Fri, Oct 6, 1:33 PM
hfinkel added a comment to D38499: [BasicAA] Fix adjustToPointerSize in BasicAliasAnalysis.cpp for ptr > 64b.
In D38499#888784, @mppf wrote:

In my use case, I have 128-bit pointers but "adding" to one of these will only ever modify the lower 64 bits. So for my use case, offsets > 64 bits just don't matter.

That's completely different.

... we have a bunch of code which actually assumes it represents two's complement arithmetic in the width of the pointer; for example, instcombine has a transform which turns a pointer comparison into a comparison of pointer offsets, and some analysis passes use an APInt whose width is the pointer width rather than the int64_t AA uses. If pointer arithmetic wraps differently on your target, we probably need to add an attribute to the data layout, and adjust LangRef to define what it means.

I don't think that's necessary (for me at least) because it's OK for me to get undefined behavior / core dumps if one of these pointers actually overflows/underflows (assuming signed arithmetic). I.e. as long as adding 1, -1 or other small values works, that's good enough for me at the present time. In other words, you could assume that all my pointer operations are 'inbounds'. *technically* adding -1 causes overflow, if you look at it in the right way, but it happens in a way that plays well with truncation to a smaller sized pointer. My use-case involves a late optimization pass that changes these 128-bit pointers to {i64, 64-bit pointer} structures, at which point handling any necessary truncation is my problem. I'm saying that i64 arithmetic for my pointers is fine, but so is i128 arithmetic. I don't care what happens if you do something like load from NULL - 1.

It's certainly the case that we *could* try to modify LangRef to define pointers with offsets of a different size than the pointer, and I *could* use such semantics. But we'd have to adjust every optimization. I just don't think it's called for right now.

Remember, I'm looking for a bug fix?

But anyway this leaves me wondering - what in particular do we need to do, from your point of view, to move this patch forward? There have been quite a few suggestions and I can't tell which ones would be satisfactory (or which ones would be my problem to solve).

Fri, Oct 6, 1:30 PM

Thu, Oct 5

hfinkel accepted D38212: [PowerPC] Add profitablilty check for conversion to mtctr loops .

LGTM

Thu, Oct 5, 3:41 PM
hfinkel added inline comments to D36562: [Bitfield] Make the bitfield a separate location if it has width of legal integer type and its bit offset is naturally aligned for the type.
Thu, Oct 5, 3:34 PM
hfinkel added inline comments to D38501: [ValueTracking] Fix a misuse of APInt in GetPointerBaseWithConstantOffset.
Thu, Oct 5, 1:59 PM
hfinkel added a comment to D38596: Implement attribute target multiversioning.

Can you explain briefly what is required of the system in order to support this (is it just ifunc support in the dynamic linker / loader)?

Yep, this feature depends entirely on ifuncs.

In general, there are a number of places in this patch, at the Sema layer (including in the error messages), that talk about how this is an x86-only feature. As soon as someone adds support for a second architecture, this will cause unnecessary churn (and perhaps miss places resulting in stale comments). Can you create some callbacks, in TargetInfo or similar, so that we can just naturally make this work on other architectures?

Yep, I have that enforced, with the hope that it would make it easier to identify all the places where this change needs to be made. Currently, there are a few 'TargetInfo' features that need to be supported (isValidCPUName/isValidCPUFeature/compareCpusAndFeatures), the Sema check needs to be relaxed (though I could perhaps add a TargetInfo::IsFunctionMultiVersioningSupported function, if that's what you're saying?), and CodeGenFunction::FormResolverCondition/CodeGenFunction::EmitMultiVersionResolver (which could easily have FormResolverCondition moved to TargetInfo, assuming we're OK with a CodeGen file calling into TargetInfo?). Are these the changes you mean?
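
For context, a minimal sketch of what the source-level feature looks like with GCC-compatible target attribute multiversioning (illustrative only; the exact Clang behavior is what this review is defining). Each version is emitted as its own symbol, and an ifunc resolver picks one at load time, which is why the feature depends on ifunc support as noted above:

#include <cstdio>

__attribute__((target("default")))
int dot() { std::puts("generic version"); return 0; }

__attribute__((target("avx2")))
int dot() { std::puts("AVX2 version"); return 0; }   // chosen when the CPU has AVX2

int main() {
  // The call is routed through an ifunc; the dynamic linker/loader runs the
  // resolver once and binds the best version for the host CPU.
  return dot();
}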

Thu, Oct 5, 1:52 PM
hfinkel added inline comments to D38501: [ValueTracking] Fix a misuse of APInt in GetPointerBaseWithConstantOffset.
Thu, Oct 5, 1:33 PM
hfinkel added a comment to D38596: Implement attribute target multiversioning.

Can you explain briefly what is required of the system in order to support this (is it just ifunc support in the dynamic linker / loader)?

Thu, Oct 5, 1:19 PM
hfinkel added a comment to D38566: [SimplifyCFG] don't sink common insts too soon (PR34603).

This is a minimal patch to solve PR34603:
https://bugs.llvm.org/show_bug.cgi?id=34603

Can you please summarize here what's going on and how you arrived at this solution? Last I recall, in PR34603, we were discussing select formation, and how it might not be desirable to form selects early. By "aggressive flattening" do you mean select formation? What you're doing here seems like more than just disabling select formation; it also affects sinking? I also think that we might not want to sink some things early (because we can't later prove it safe to re-hoist), but it's not immediately clear how all of these things are related.

Sorry for the invented vocabulary. 'Aggressive flattening' was meant to include the select formation, but that's not actually the root of the problem. The sinking is the root of the problem, which is why I'm proposing to disable it in the early incarnation of SimplifyCFG. My intent (probably needing refinement) is to delay the transform that was added with rL279460 (SinkThenElseCodeToEnd), because that's where we regressed the examples in the bug reports. Let me illustrate using the example in the test cases (it's a slightly simplified form of the motivating example in the bug report):

Thu, Oct 5, 12:52 PM
hfinkel added a comment to D35804: [BPI] Detect branches in loops that make themselves not taken.

This code looks fine to me. My next question is: is the pattern common? The amount of code added and the possible compile-time increase are non-trivial.

I've finally gotten around to gathering some data on this. Over the entire LLVM test suite (which may or may not be a useful data set, I don't know), for each call to computeUnlikelySuccessors:

  • 56% execute the first loop (i.e. the block does a conditional branch based on a compare with a constant)
  • 19% execute the second loop (i.e. we found a suitable phi node)
  • 0.11% insert at least one element into UnlikelyBlocks

    I measured the increase to the overall build time of the LLVM test suite as 2%.
Thu, Oct 5, 10:52 AM

Wed, Oct 4

hfinkel added a comment to D38566: [SimplifyCFG] don't sink common insts too soon (PR34603).

This is a minimal patch to solve PR34603:
https://bugs.llvm.org/show_bug.cgi?id=34603

Wed, Oct 4, 6:16 PM
hfinkel added inline comments to D37076: [LICM] Allow sinking when foldable in loop.
Wed, Oct 4, 6:04 PM
hfinkel added a comment to D38554: Fixed ppc32 function relocations in non-pic mode.

Let's start with the obvious: I have no idea why you need this code at all. NetBSD builds a mix of static, PIC and PIE code using GNU ld without hitting any unsupported relocations. As I said before, the normal ABI on PowerPC is mostly PIC by default. I wonder if the root of your problem is that you want EABI and not the SYSV ABI.

Wed, Oct 4, 5:34 PM

Tue, Oct 3

hfinkel accepted D38486: [PPC] Implement the heuristic to choose between a X-Form VSX ld/st vs a X-Form FP ld/st..
Tue, Oct 3, 9:05 PM
hfinkel updated the diff for D32197: [TySan] A Type Sanitizer (Runtime Library).

Rebased (and fixed address mappings for (big-Endian) ppc64).

Tue, Oct 3, 8:45 PM
hfinkel updated the diff for D32198: [TySan] A Type Sanitizer (LLVM).

Rebased.

Tue, Oct 3, 8:43 PM
hfinkel updated the diff for D32199: [TySan] A Type Sanitizer (Clang).

Rebased.

Tue, Oct 3, 8:42 PM
hfinkel accepted D38372: [OpenMP] Fix passing of -m arguments correctly.

LGTM

Tue, Oct 3, 8:24 PM
hfinkel added inline comments to D38212: [PowerPC] Add profitablilty check for conversion to mtctr loops .
Tue, Oct 3, 8:13 PM
hfinkel added a comment to D38494: [ScalarEvolution] Handling for ICmp occuring in the evolution chain..

No, if there is no latch, it will not execute evoluateForICmp in LoopDeletion (if I understand correctly).
getBackedgeTakenInfo will correctly return SCEVUnknown if there is no latch.

Tue, Oct 3, 7:58 PM
hfinkel accepted D38336: Add an @llvm.sideeffect intrinsic.

LGTM

Tue, Oct 3, 7:45 PM
hfinkel added a comment to D38494: [ScalarEvolution] Handling for ICmp occuring in the evolution chain..

Can you explain why? (Maybe I didn't understand why we need to handle the case without a loop latch.)
I thought that without a latch it cannot be evolved from the ICmp evolution.

Without a latch, the loop has to be evolved from the previous SCEV operation, doesn't it?

Tue, Oct 3, 7:32 PM
hfinkel added inline comments to D38413: [PowerPC] Add handling for ColdCC calling convention and a pass to mark candidates with coldcc attribute.
Tue, Oct 3, 7:08 PM
hfinkel added inline comments to D38496: [XRay] [test-suite] Add litsupport module for microbenchmarks.
Tue, Oct 3, 2:55 PM
hfinkel added a comment to D38499: [BasicAA] Fix adjustToPointerSize in BasicAliasAnalysis.cpp for ptr > 64b.

If we switch to using APInt for offsets, adjustToPointerSize() just goes away, so I don't see how this is forward progress.

Tue, Oct 3, 2:19 PM
hfinkel added a comment to D38499: [BasicAA] Fix adjustToPointerSize in BasicAliasAnalysis.cpp for ptr > 64b.

Hang on, there's a more fundamental problem here this is papering over. If your pointers are larger than 64 bits, those pointers can have offsets larger than 64 bits. Since BasicAA is using 64-bit integers to represent pointer offsets, the math in DecomposeGEPExpression will overflow, so you'll get incorrect results, and eventually cause a miscompile.

Indeed; I made a similar comment in D38501. In this case, it looks like the main potential overflow comes from:

Index = GetLinearExpression(Index, IndexScale, IndexOffset, ZExtBits,
                            SExtBits, DL, 0, AC, DT, NSW, NUW);
 
// The GEP index scale ("Scale") scales C1*V+C2, yielding (C1*V+C2)*Scale.
// This gives us an aggregate computation of (C1*Scale)*V + C2*Scale.
Decomposed.OtherOffset += IndexOffset.getSExtValue() * Scale;
Scale *= IndexScale.getSExtValue();

where Scale is multiplied by IndexScale.getSExtValue(). We might just bail out here if IndexScale or IndexOffset can't be represented in 64 bits.
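
A minimal sketch of that bail-out (illustrative only, not the committed fix; fitsInInt64 is an invented helper):

#include "llvm/ADT/APInt.h"

// True if V can be represented as a signed 64-bit value, i.e. it is safe to
// feed into the existing int64_t-based offset arithmetic.
static bool fitsInInt64(const llvm::APInt &V) {
  return V.getMinSignedBits() <= 64;
}

// ...inside the decomposition, before the multiplications quoted above:
//   if (!fitsInInt64(IndexScale) || !fitsInInt64(IndexOffset))
//     break; // give up and treat the GEP conservatively rather than miscompute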

Tue, Oct 3, 1:37 PM
hfinkel added a comment to D38499: [BasicAA] Fix adjustToPointerSize in BasicAliasAnalysis.cpp for ptr > 64b.

Hang on, there's a more fundamental problem here this is papering over. If your pointers are larger than 64 bits, those pointers can have offsets larger than 64 bits. Since BasicAA is using 64-bit integers to represent pointer offsets, the math in DecomposeGEPExpression will overflow, so you'll get incorrect results, and eventually cause a miscompile.

Tue, Oct 3, 1:34 PM
hfinkel added a comment to D38501: [ValueTracking] Fix a misuse of APInt in GetPointerBaseWithConstantOffset.

It would be better if the test would check for an actual different result rather than "it doesn't crash".

Tue, Oct 3, 11:41 AM
hfinkel added inline comments to D38496: [XRay] [test-suite] Add litsupport module for microbenchmarks.
Tue, Oct 3, 10:43 AM
hfinkel added a comment to D38499: [BasicAA] Fix adjustToPointerSize in BasicAliasAnalysis.cpp for ptr > 64b.

LGTM too. Please do update the test so it checks that the result makes sense. We need more test coverage in this area.

Tue, Oct 3, 9:59 AM
hfinkel added inline comments to D38496: [XRay] [test-suite] Add litsupport module for microbenchmarks.
Tue, Oct 3, 9:55 AM
hfinkel added a comment to D31417: [OpenMP] Add support for omp simd pragmas without runtime.

Is this still being worked on?

Hi, yes it is. Sorry for the delay in posting new changes but priorities shifted a bit and I had to work on something else for a while.

Tue, Oct 3, 9:22 AM

Mon, Oct 2

hfinkel added a comment to D38482: TargetMachine: Use LLVMTargetMachine in CodeGen.
  1. If in #1 you mean actual targets -- are there any left now that the C backend is gone? What are these targets? I wonder if we could simplify things by collapsing a no-longer-necessary layer.
  • There are none in open source, and I don't know of any internal ones at my company.
  • According to the mailing list this may be good for SPIR-V (http://lists.llvm.org/pipermail/llvm-dev/2017-June/114593.html), which doesn't really need any of the CodeGen infrastructure.
  • Indeed, collapsing the two classes would slightly simplify the code, and I'd be perfectly fine with that.
  • On the other hand, it's kinda neat to have a CodeGen-internal interface with LLVMTargetMachine and an outside interface with TargetMachine.

I think we should probably just merge these.

My understanding is that the consensus from the entire community was that SPIR-V really should be leveraging the same common legalization infrastructure as the rest of the targets. Notably, other highly abstract targets like WebAssembly seem to be succeeding with that model.

If we want to introduce a target that doesn't share the core LLVM code generation layer, I'm actually still open to that, but I don't think we should maintain unnecessary abstractions waiting for that day to arrive. I think that sets up the right incentives: if we're going to go down the path of splitting out interfaces in this way, that comes with a cost and we should evaluate that cost against the benefit it provides to those actual use cases. Otherwise it seems too easy to endlessly invent arguments on one side or the other because they are all hypothetical.

But this is always a judgement call, so of course I'm interested in where others see this. It seems like Eric is largely agreeing. Quentin? Craig? Hal? Others?

Mon, Oct 2, 8:05 PM
hfinkel added a comment to D37624: add support for -fno-instrument-functions and -finstrument-functions-exclude-{file,function}-list=<arg1,arg2,...> to match gcc options..

Can you get more information on what GCC actually implemented and why? It's not clear to me that ignoring the namespaces is the most-useful way to do this. I don't want to emulate GCC bugs, but maybe there's a good reason why their implementation works this way.

This is what I found, https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00473.html

Mon, Oct 2, 7:32 PM
hfinkel added inline comments to D38417: [test-suite] Adding HACCKernels app.
Mon, Oct 2, 4:54 PM
hfinkel added a comment to D31417: [OpenMP] Add support for omp simd pragmas without runtime.

Is this still being worked on?

Mon, Oct 2, 4:18 PM

Sun, Oct 1

hfinkel added a comment to D37624: add support for -fno-instrument-functions and -finstrument-functions-exclude-{file,function}-list=<arg1,arg2,...> to match gcc options..
  • add comment to CPP test to explain usage

Thanks. Please also add some tests showing matching overloaded functions, functions with template parameters, etc.

Do we need to strip whitespace before trying to match the demangled names?

Some cursory testing with g++ shows that only the 'test5' of 'test5(float, int, int*)' is matched. 'test5(' or 'test5 (' is not matched. It seems weird that arguments are not matched.

g++ man page shows

"The function name to be matched is its user-visible name, such as "vector<int> blah(const vector<int> &)", not the internal mangled name"

but it doesn't seem to be including parameters.

Interesting. Can you tell what GCC is doing w.r.t. namespace names, class names, etc. and template parameters?

Also uncovered a bug where a sub-argument list containing a comma needs to be surrounded by single quotes, but clang seems to ignore the single quotes.
I'll try to dig around the ArgList implementation to see if it can return an argument surrounded by single quotes as a whole.

Given A::B::C<D>(T a), only 'C<D>' is meaningful to g++'s matcher. Adding anything else defeats the match ['B::C', 'C<D>('].
It seems like g++ will also not treat a single-quoted pattern as a whole, i.e. int fooboo() is matched by 'foo,boo'.
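
A small hypothetical sketch of the matching rule this thread converges on (not the patch itself): each comma-separated pattern is treated as a plain substring of the user-visible, demangled name, with no special handling of parameters or quoting:

#include <iostream>
#include <string>
#include <vector>

static bool isExcluded(const std::string &Demangled,
                       const std::vector<std::string> &Patterns) {
  for (const auto &P : Patterns)
    if (Demangled.find(P) != std::string::npos)
      return true; // e.g. "test5" matches "test5(float, int, int*)"
  return false;
}

int main() {
  const std::vector<std::string> Patterns = {"test5", "C<D>"};
  std::cout << isExcluded("test5(float, int, int*)", Patterns) << '\n'; // 1
  std::cout << isExcluded("A::B::C<D>(int)", Patterns) << '\n';         // 1
  std::cout << isExcluded("vector<int> blah(const vector<int>&)", Patterns) << '\n'; // 0
}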

Sun, Oct 1, 7:29 AM
hfinkel accepted D38421: Eliminate ftrunc if source is know to be rounded.

LGTM

Sun, Oct 1, 6:38 AM

Sat, Sep 30

hfinkel added a comment to D37624: add support for -fno-instrument-functions and -finstrument-functions-exclude-{file,function}-list=<arg1,arg2,...> to match gcc options..
  • add comment to CPP test to explain usage

Thanks. Please also add some tests showing matching overloaded functions, functions with template parameters, etc.

Do we need to strip whitespace before trying to match the demangled names?

Some cursory testing with g++ shows that only the 'test5' of 'test5(float, int, int*)' is matched. 'test5(' or 'test5 (' is not matched. It seems weird that arguments are not matched.

g++ man page shows

"The function name to be matched is its user-visible name, such as "vector<int> blah(const vector<int> &)", not the internal mangled name"

but it doesn't seem to be including parameters.

Sat, Sep 30, 11:48 PM
hfinkel added a comment to D37624: add support for -fno-instrument-functions and -finstrument-functions-exclude-{file,function}-list=<arg1,arg2,...> to match gcc options..
  • add comment to CPP test to explain usage
Sat, Sep 30, 7:12 PM
hfinkel added inline comments to D37624: add support for -fno-instrument-functions and -finstrument-functions-exclude-{file,function}-list=<arg1,arg2,...> to match gcc options..
Sat, Sep 30, 6:49 PM
hfinkel added a comment to D38417: [test-suite] Adding HACCKernels app.

Hi Brian,

Thanks for working on this!

On the execution time of about 13 seconds on a fast processor: would it be possible to adapt the input to reduce the running time to about 1 second (or less), and not lose the characteristic behaviour of this benchmark?
I'd expect that adapting the input to have an order-of-magnitude shorter running time will not make the measured execution time more noisy, and it would help in making sure the test-suite keeps running as quickly as possible.

Kristof, just FYI, reducing the runtime is easy to do; however, this will get significantly faster once we start actually vectorizing the hot loops (at least 4x if you have an architecture with <4 x float>). I looked at this a few weeks ago, and as I recall, we currently can't if-convert reductions (which prevents vectorization). We plan on working on improving this in the near future. Does this affect your opinion at all?

I mainly keep on focussing on the run-time of newly added benchmarks to the test-suite as I do think it is a problem already that the test-suite takes too long to run.

Sat, Sep 30, 1:29 PM
hfinkel added a comment to D38417: [test-suite] Adding HACCKernels app.

Hi Brian,

Thanks for working on this!

On the execution time of about 13 seconds on a fast processor: would it be possible to adapt the input to reduce the running time to about 1 second (or less), and not lose the characteristic behaviour of this benchmark?
I'd expect that adapting the input to have an order-of-magnitude shorter running time will not make the measured execution time more noisy, and it would help in making sure the test-suite keeps running as quickly as possible.

Sat, Sep 30, 8:21 AM

Fri, Sep 29

hfinkel added a comment to D37624: add support for -fno-instrument-functions and -finstrument-functions-exclude-{file,function}-list=<arg1,arg2,...> to match gcc options..

Please also add a C++ test to check the mangling-related features.

Fri, Sep 29, 6:30 PM
hfinkel added a comment to D37622: add support for -fno-instrument-functions and -finstrument-functions-exclude-{file,function}-list=<arg1,arg2,...> to match gcc options.

Unfortunately, when you submit a patch for review, you need to add the commits list for the associated project as a subscriber. As this patch has no subscribers, no one saw it. While you can add a subscriber now, that won't send out the patch summary again. As a result, it's best to abandon this review and start another one. Make sure that you add cfe-commits as a subscriber, and the patch will get reviewed. Also, feel free to add me as a reviewer.

Fri, Sep 29, 5:26 PM
hfinkel added inline comments to D33946: [InlineCost] Find identical loads in the callee.
Fri, Sep 29, 2:08 PM
hfinkel added a comment to D38421: Eliminate ftrunc if source is know to be rounded.

We could potentially update visitCEIL and visitFLOOR as well, and use the same opcode test in each, although I don't think such combinations are very likely.

Right, that is possible but unlikely to happen. The current situation arises because we have (fptosi (rint x)). I.e., we have a rounded value but now we need to get an integer from the rounded float, and the trunc is part of that fptosi expansion.
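
A small standalone illustration of that point (a sketch of the pattern, not the DAG combine itself): after rint the value is already integral, so a following trunc is a no-op:

#include <cmath>
#include <cstdio>

int main() {
  float x = 2.7f;
  float r = std::rint(x);       // already rounded to an integral value
  float t = std::trunc(r);      // redundant: trunc cannot change an integral value
  long long i = (long long)r;   // the fptosi that motivated the (fptosi (rint x)) pattern
  std::printf("%f %f %lld\n", r, t, i);
  return 0;
}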

Fri, Sep 29, 1:59 PM
hfinkel added a comment to D38419: Create instruction classes for identifying any atomicity of memory intrinsic..

Pulling this, for now. I'm going to make sure that the joint part of the hierarchy (i.e. the AnyMem* classes) will work before trying to land this part.

FWIW, I don't really like the PlainMem* names, although the AtomicMem* ones look like an improvement. Can't you just leave the plain ones alone?

I have no particular objection to leaving the plain ones as is. It's what I suggested: http://lists.llvm.org/pipermail/llvm-dev/2017-August/116716.html

A counter suggestion, that wasn't challenged, was made to change the plain ones as well: http://lists.llvm.org/pipermail/llvm-dev/2017-August/116722.html

Fair enough. I challenge :-) -- Sanjoy's comment about silent failures downstream is legitimate, but given how new the atomic versions of these are, I don't think that I'm concerned (and it will fail in the right way: by not recognizing the atomic forms, which is what any current code is probably intended to do (because it was written before the atomic forms existed)).

Playing a little devil's advocate... A benefit I can see, off the top of my head, in favour of changing to PlainMem* is cognitive. Having all of PlainMem* & AtomicMem* & AnyMem* makes the programmer have to think, briefly, about whether they're dealing with atomic or non-atomic variants. Leaving the non-atomic variants as is makes it easier to not even consider that the atomic variants exist.
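
To make the naming question concrete, an illustrative (purely hypothetical) shape of the hierarchy under discussion; these are not the classes from the patch:

// Shared base carrying the common accessors (dest, length, volatility, ...).
struct MemIntrinsicBase {};

struct MemIntrinsic : MemIntrinsicBase {};        // existing non-atomic memcpy/memmove/memset
struct AtomicMemIntrinsic : MemIntrinsicBase {};  // new element-wise atomic variants
struct AnyMemIntrinsic : MemIntrinsicBase {};     // matches either atomicity

// Renaming MemIntrinsic to PlainMemIntrinsic would force existing code to state
// which variant it wants; leaving it alone means old code keeps compiling and
// simply does not recognize the atomic forms.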

Fri, Sep 29, 1:46 PM
hfinkel added inline comments to D33946: [InlineCost] Find identical loads in the callee.
Fri, Sep 29, 1:21 PM
hfinkel added a comment to D38419: Create instruction classes for identifying any atomicity of memory intrinsic..

Pulling this, for now. I'm going to make sure that the joint part of the hierarchy (i.e. the AnyMem* classes) will work before trying to land this part.

FWIW, I don't really like the PlainMem* names, although the AtomicMem* ones look like an improvement. Can't you just leave the plain ones alone?

I have no particular objection to leaving the plain ones as is. It's what I suggested: http://lists.llvm.org/pipermail/llvm-dev/2017-August/116716.html

A counter suggestion, that wasn't challenged, was made to change the plain ones as well: http://lists.llvm.org/pipermail/llvm-dev/2017-August/116722.html

Fri, Sep 29, 1:18 PM
hfinkel added inline comments to D38421: Eliminate ftrunc if source is know to be rounded.
Fri, Sep 29, 1:10 PM
hfinkel accepted D37514: [PowerPC] support ZERO_EXTEND in tryBitPermutation.

LGTM (but please run performance tests and a self-host check before committing - I can imagine us overlooking something here).

Fri, Sep 29, 1:07 PM
hfinkel added a comment to D38419: Create instruction classes for identifying any atomicity of memory intrinsic..

Pulling this, for now. I'm going to make sure that the joint part of the hierarchy (i.e. the AnyMem* classes) will work before trying to land this part.

Fri, Sep 29, 12:57 PM
hfinkel added inline comments to D37514: [PowerPC] support ZERO_EXTEND in tryBitPermutation.
Fri, Sep 29, 8:37 AM
hfinkel added inline comments to D37514: [PowerPC] support ZERO_EXTEND in tryBitPermutation.
Fri, Sep 29, 8:26 AM

Wed, Sep 27

hfinkel added a comment to D37343: [CGP] Merge empty case blocks if no extra moves are added..

The real problem with computing DT there isn't how long it takes in common cases, it's that the total time will blow up quadratically for large functions.

Wed, Sep 27, 7:33 PM
hfinkel added a comment to D38336: Add an @llvm.sideeffect intrinsic.

Maybe we should do this in a follow-up, but I'd also like to see some general text added to the LangRef about the semantics of loops (that makes reference to this intrinsic). Maybe we can say something like:

Wed, Sep 27, 7:05 PM
hfinkel added inline comments to D38186: Add the new -Wnull-pointer-arithmetic warnings to the release notes.
Wed, Sep 27, 12:52 PM · Restricted Project
hfinkel accepted D38085: Use the basic cost if a GEP is not used as addressing mode.

LGTM

Wed, Sep 27, 12:23 PM
hfinkel added a comment to D37463: Fix miscompile in LoopSink pass.

I think there might be some further investigation needed for unordered atomics, but I will take that offline with Daniil before returning to this thread.

Returning to this point, the conclusion I would reach from reviewing our Atomics documentation is that sinking an unordered atomic into a loop must be disallowed, because doing so duplicates the load, which is disallowed.

The relevant section of the documentation is: http://llvm.org/docs/Atomics.html#unordered, specifically the Notes for Optimizers section. Here's the full text of the section under discussion:

Notes for optimizers

In terms of the optimizer, this *prohibits any transformation that transforms a single load into multiple loads*, transforms a store into multiple stores, narrows a store, or stores a value which would not be stored otherwise. Some examples of unsafe optimizations are narrowing an assignment into a bitfield, *rematerializing a load*, and turning loads and stores into a memcpy call. Reordering unordered operations is safe, though, and optimizers should take advantage of that because unordered operations are common in languages that need them.

The important bits are in *bold*. The first highlighted clause could be interpreted as a weaker constraint than it first seems. In particular, it could be read as only disallowing *splitting* a load into multiple pieces, and not disallowing *duplicating* a load. Internally, several folks read it that way. I believe that is an incorrect interpretation and that duplication must also be disallowed by the text. My reasoning is as follows:

Sinking a load into the header of a loop with a trip count > 1 increases the number of times the load is run dynamically.

Rematerialization (which is explicitly listed as a disallowed transformation) is the insertion of loads by the register allocator at use sites. LoopSink and rematerialization end up producing *exactly the same code* on the following example:

L = load atomic %p
while (foo()) {
  call_which_clobbers_all_regs();
  use L
}

Both transforms would want to produce:

while (foo()) {
  call_which_clobbers_all_regs();
  L = load atomic %p
  use L
}

If this were legal for LoopSink, then it would also be legal for rematerialization. And it's definitely not, as explicitly stated in the text. As such, it must be the case that LoopSink is disallowed for this case as well, which requires the broader interpretation of the initial sentence.

Given this, I must conclude that LoopSink is currently buggy as we do sink unordered atomic loads into loops.

Wed, Sep 27, 12:21 PM
hfinkel added inline comments to D38085: Use the basic cost if a GEP is not used as addressing mode.
Wed, Sep 27, 11:25 AM
hfinkel accepted D38236: [PowerPC] eliminate partially redundant compare instruction.

LGTM

Wed, Sep 27, 11:11 AM
hfinkel added a comment to D38113: OpenCL: Assume functions are convergent.

...

Yes, I see; this sounds more reasonable indeed. By the way, can LLVM currently remove convergent in some cases to recover the performance loss?

Wed, Sep 27, 7:55 AM

Tue, Sep 26

hfinkel added inline comments to D38236: [PowerPC] eliminate partially redundant compare instruction.
Tue, Sep 26, 3:30 PM
hfinkel accepted D37730: [PowerPC] eliminate unconditional branch to the next instruction.

Seems like we might want to have block layout clean up this sort of branch rather than analyzeBranch since we could theoretically make the same change in every backend?

I think this optimization should be machine-dependent, based on the following comment in MachineBlockPlacement.cpp. The X86 backend already implements the same optimization.

// Indeed, the target may be able to optimize the branches in a way we
// cannot because all branches may not be analyzable.
// E.g., the target may be able to remove an unconditional branch to
// a fallthrough when it occurs after predicated terminators.
Tue, Sep 26, 3:30 PM
hfinkel added a comment to D37514: [PowerPC] support ZERO_EXTEND in tryBitPermutation.

I apologize for taking so long to get back to this. The bit-permutation selector uses a cost model to decide how to lower each permutation sequence. Preemptively lowering the zext like this during the analysis phase of the algorithm seems suboptimal (or at least ad hoc).

Tue, Sep 26, 3:30 PM
hfinkel added inline comments to D38212: [PowerPC] Add profitablilty check for conversion to mtctr loops .
Tue, Sep 26, 3:29 PM
hfinkel added a comment to D36562: [Bitfield] Make the bitfield a separate location if it has width of legal integer type and its bit offset is naturally aligned for the type.

You seem to be only changing the behavior for the "separatable" fields, but I suspect you want to change the behavior for the others too. The bitfield would be decomposed into shards, separated by the naturally-sized-and-aligned fields. Each access only loads its shard. For example, in your test case you have:
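
The quoted test case is not reproduced in this activity summary; as a purely hypothetical illustration of the sharding idea (not the reviewer's example), a naturally sized and aligned bitfield would become its own location and split the remaining fields into shards:

struct S {
  unsigned a : 16;  // shard 1
  unsigned b : 16;  // shard 1 (a and b together fill bits 0..31)
  unsigned c : 32;  // legal-integer width at a naturally aligned offset: its own location
  unsigned d : 16;  // shard 2 (everything after c)
};
// Under the suggestion above, an access to S::a would load only shard 1's
// storage instead of a wider unit that also covers c and d.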

Tue, Sep 26, 3:29 PM
hfinkel added a comment to D37648: [SLPVectorizer] Fix PR21780 Expansion of 256 bit vector loads fails to fold into shuffles.

Thanks for working on this. Please post an RFC on llvm-dev about the intrinsic and how you intend to use it. You can reference this review, but we should make sure that we have design consensus before proceeding here.

Here is the thread: http://lists.llvm.org/pipermail/llvm-dev/2017-September/117381.html

Tue, Sep 26, 3:29 PM

Mon, Sep 25

hfinkel added inline comments to D38212: [PowerPC] Add profitablilty check for conversion to mtctr loops .
Mon, Sep 25, 10:09 AM
hfinkel added a comment to D37686: [DAG] Consolidating Instruction->SDNode Flags propagation in one class for better code management..

@spatel, @reviewers, can this land in trunk now?

I haven't actually looked at the builder changes since you revised them, so I defer to @hfinkel if he's already approved that part.

Mon, Sep 25, 9:45 AM

Fri, Sep 22

hfinkel accepted D38104: [TargetTransformInfo] Handle intrinsic call in getInstructionLatency().

LGTM

Fri, Sep 22, 9:27 AM
hfinkel added a comment to D38113: OpenCL: Assume functions are convergent.

The problem with adding this attribute conservatively for all functions is that it prevents some optimizations from happening. I agree with committing this as a temporary fix to guarantee the correctness of the generated code.

Fri, Sep 22, 8:49 AM
hfinkel added a comment to D38054: [PowerPC] Fix the spill issue exposed by r310809.

I think this generally looks fine.

Just as a general note on code organization and commenting, we seem to be accumulating a pretty deep call tree here of functions that only work correctly in 64-bit mode. Not in this patch, but we should do something about this (either segregating these things into a distinct part of the file and/or adding asserts and/or comments to the 64-bit-mode-only functions).

Agreed. Also a lot of the fine points of when to extend and when not could probably be split out as well. This is definitely getting pretty unwieldy.

-eric

OK. I'll commit this patch as-is with the inline comment addressed.

Then perhaps in a subsequent patch, I'll create something like class GPRComparisonSelector that uses essentially the same approach as the BitPermutationSelector. The Select() function will do the necessary checks and exit early if we're not a PPC64 target. All the functions that do the actual selection will be private. This way we should be able to avoid carelessly calling these functions under invalid conditions.
Does that sound like a good plan?
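
A hedged sketch of that organization (hypothetical names and shape, not committed code): a single public entry point performs the PPC64 check and exits early, and every helper that assumes 64-bit mode stays private behind it:

#include "llvm/CodeGen/SelectionDAGNodes.h"  // llvm::SDNode

class GPRComparisonSelector {
public:
  // Performs the PPC64 check up front and returns nullptr immediately if the
  // target is not 64-bit; only then does it dispatch to the private helpers.
  llvm::SDNode *Select(llvm::SDNode *N);

private:
  // 64-bit-only selection logic, unreachable without passing the check above.
  llvm::SDNode *selectComparison(llvm::SDNode *N);
  llvm::SDNode *extendIfNeeded(llvm::SDNode *N);
};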

Fri, Sep 22, 7:19 AM

Thu, Sep 21

hfinkel added a comment to D36258: Disable loop peeling during full unrolling pass..

FYI, this patch causes a somewhat serious regression on one of our internal benchmarks. Given we have a custom pass pipeline, it's definitely our responsibility to adapt to upstream changes, not the other way around, but it would have helped us recognize the problem if either of two things had happened along with this patch:

  1. The patch had been separated into two pieces: one adding the new control knob without changing the default, and then a small patch changing the default.
  2. The change had been announced in some way. An email to llvm-dev probably would have been sufficient.

    Given we're hardly the only ones with custom pass pipelines, I'd encourage everyone involved (author, reviewers, etc.) to think about the upgrade path in similar cases that arise in the future.
Thu, Sep 21, 10:28 PM
hfinkel added a comment to D37686: [DAG] Consolidating Instruction->SDNode Flags propagation in one class for better code management..

While fast does imply nnan, and nnan does propagate backward, it also implies arcp, and arcp does not propagate backward; arcp applies only to the instruction to which it's attached. In this case, we're allowed to use a reciprocal approximation to the division, but not to the sqrt. However, we could argue that using the rsqrt is like doing the sqrt exactly and then just approximating the division. If there are no other users of the sqrt itself, there seems to be little semantic difference.

I thought we leaned toward the more conservative (less propagating) model in IR. I'm not sure whether that would be different here in the DAG or whether rsqrt is a special case.
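
A tiny sketch of the flag relationships under discussion, using LLVM's FastMathFlags helper as it exists in current trees (illustrative only; the DAG-level propagation policy is exactly what this review is debating):

#include "llvm/IR/Operator.h"  // llvm::FastMathFlags
#include <cassert>

int main() {
  llvm::FastMathFlags FMF;
  FMF.setFast();                  // 'fast' turns on all of the individual flags...
  assert(FMF.noNaNs());           // ...including nnan
  assert(FMF.allowReciprocal());  // ...and arcp
  // nnan is a statement about the values involved and can be reasoned about
  // backward through users; arcp only licenses rewriting the instruction that
  // carries it, which is the distinction drawn above.
  return 0;
}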

Thu, Sep 21, 3:17 PM
hfinkel added inline comments to D38104: [TargetTransformInfo] Handle intrinsic call in getInstructionLatency().
Thu, Sep 21, 3:09 PM
hfinkel added a comment to D37686: [DAG] Consolidating Instruction->SDNode Flags propagation in one class for better code management..

I've added some more FMF tests at rL313893 which I think this patch will miscompile. Please rebase/update.

As I suggested before, this patch shouldn't try to enable multiple DAG combines with node-level FMF. It's not as straightforward as you might think.

Pick exactly one combine if you want to show that this patch is working as intended. The llvm.muladd intrinsic test that you have here with a target that supports 'fma' (not plain x86) seems like a good choice to me. If we have a strict op in IR, it should produce an fma instruction. If we have a fast op in IR, it should produce the simpler fmul instruction?

My understanding and code changes are based on the LLVM Language Reference's section about fast-math flags (http://llvm.org/docs/LangRef.html#fast-math-flags), which says of the nnan flag: "Allow optimizations to assume the arguments and result are not NaN".

Now, in the following case, which was added by you:

%y = call float @llvm.sqrt.f32(float %x)
%z = fdiv fast float 1.0, %y
ret float %z

We don't have a fast flag on the intrinsic, but the DAG combine for fdiv sees a fast flag, assumes the result (%z) and the arguments (the constant and %y) are not NaN, and goes ahead and generates a reciprocal sqrt. If you remove fast from the fdiv and add it to the intrinsic, then the FMF optimization at the fdiv will not kick in.

Can you please let me know what you expect here.

I expect that the sqrt result is strict. I.e., it should use sqrtss if this is x86-64. We're not allowed to use rsqrtss and lose precision on that op.

Thu, Sep 21, 12:31 PM
hfinkel added inline comments to D38104: [TargetTransformInfo] Handle intrinsic call in getInstructionLatency().
Thu, Sep 21, 10:04 AM
hfinkel added a comment to D38054: [PowerPC] Fix the spill issue exposed by r310809.

I think this generally looks fine.

Thu, Sep 21, 9:15 AM

Wed, Sep 20

hfinkel added inline comments to D38085: Use the basic cost if a GEP is not used as addressing mode.
Wed, Sep 20, 3:02 PM
hfinkel added a comment to D38097: [IVUsers] Changes to make IVUsers's results robust to instruction and uselist ordering.

Any performance changes in the test suite?

I don't have access to SPEC source, so I cannot test those. I'm unfamiliar with the performance test suite, so I don't know what all else is there. I'm going to set up a run against our internal (Java-based) performance suite to see where that stands.

Wed, Sep 20, 2:28 PM
hfinkel added a comment to D38097: [IVUsers] Changes to make IVUsers's results robust to instruction and uselist ordering.

Any performance changes in the test suite?

Wed, Sep 20, 1:50 PM
hfinkel added inline comments to D38085: Use the basic cost if a GEP is not used as addressing mode.
Wed, Sep 20, 12:12 PM

Tue, Sep 19

hfinkel added a comment to D37421: [XRay] [test-suite] Add LNT support to retref-bench benchmarks..

First: This looks ok and can be committed as is if you want.

However, IMO using the LNTBased system here is not the best choice; I'd rather try writing a litsupport plugin. LNTBased is little used and, IMO, less flexible than the cmake+lit plugin approach.

I agree, having this in litsupport seems better (and also solves the duplication/symlink problem). I imagine that the plugin could append a function to metric_collectors that reads the CSV and reports the associated timing data. @MatzeB, is there documentation on the litsupport plugin interface anywhere? (if all we have is what's in the README.md, we should write some more).

That's all there is right now. I'll write some paragraphs on how to get started in the README file and make sure the interfaces have better documentation comments.

I played around with this yesterday and this should work to collect the gtest output:

litsupport/microbenchmark.py

'''Test module to collect google benchmark results.'''
from litsupport import shellcommand
from litsupport import testplan
import csv
import lit.Test


def _mutateCommandLine(context, commandline):
    cmd = shellcommand.parse(commandline)
    cmd.arguments.append("--benchmark_format=csv")
    # We need stdout ourselves to get the benchmark csv data.
    if cmd.stdout is not None:
        raise Exception("Rerouting stdout not allowed for microbenchmarks")
    benchfile = context.tmpBase + '.bench.csv'
    cmd.stdout = benchfile
    context.microbenchfiles.append(benchfile)

    return cmd.toCommandline()


def _mutateScript(context, script):
    return testplan.mutateScript(context, script, _mutateCommandLine)


def _collectMicrobenchmarkTime(context, microbenchfiles):
    result = 0.0
    for f in microbenchfiles:
        with open(f) as inp:
            lines = csv.reader(inp)
            # First line: "name,iterations,real_time,cpu_time,time_unit..."
            for line in lines:
                if line[0] == 'name':
                    continue
                # Note that we cannot create new tests here, so for now we just
                # add up all the numbers here.
                result += float(line[3])
    return {'exec_time': lit.Test.toMetricValue(result/1000000.)}


def mutatePlan(context, plan):
    context.microbenchfiles = []
    plan.runscript = _mutateScript(context, plan.runscript)
    plan.metric_collectors.append(
        lambda context: _collectMicrobenchmarkTime(context,
                                                   context.microbenchfiles)
    )

I found two problems with this at the moment:

  • The list of litsupport plugins is currently global. Thus enabling this microbenchmarking plugin will make the non-microbenchmarking tests fail. It should be easy enough to change this to a local setting so that lit.local.cfg files can override the plugin list (just have to convince myself that this is a good feature to have :).
Tue, Sep 19, 7:48 PM
hfinkel added inline comments to D37914: [OpenMP] Don't throw cudalib not found error if only front-end is required..
Tue, Sep 19, 5:02 PM