hfinkel (Hal Finkel)
User

Projects

User does not belong to any projects.

User Details

User Since
Nov 16 2012, 6:02 PM (240 w, 2 d)

Recent Activity

Yesterday

hfinkel accepted D34089: [llvm-stress] Ensure that the C++11 random device respects its min/max values (PR32585).

Yes, go ahead. Let's fix the current code in the mean time.

Sun, Jun 25, 2:29 PM
hfinkel accepted D34597: [AST] Fix a bug in aliasesUnknownInst. Make sure we are comparing the unknown instructions in the alias set and the instruction interested in..

LGTM

Sun, Jun 25, 12:55 AM

Sat, Jun 24

hfinkel accepted D34467: [SelectionDAG] set dereferenceable flag when expanding memcpy/memmove.

LGTM

Sat, Jun 24, 6:55 AM

Fri, Jun 23

hfinkel added a comment to D34467: [SelectionDAG] set dereferenceable flag when expanding memcpy/memmove.

Please rebase after r306193 and post the patch with full context.

Fri, Jun 23, 6:46 PM
hfinkel added inline comments to D34579: Fold fneg and fabs like multiplications.
Fri, Jun 23, 4:18 PM
hfinkel added a comment to D32272: [XRay] Add Google Benchmark library + initial XRay benchmarks.

LGTM for the most part, thanks for making the changes, it's much simpler now.

Thoughts @MatzeB? I think we can work on running/reporting the microbenchmarks data later, in changes to LNT.

Can you please summarize the state of this and the eventual goal? It looks like:

  1. Eventually, we need a new LNT plugin (i.e. something like what we have in LNTBased/*) that understands how to interpret the output of the Google benchmark library.

I hope this could all be done within the test-suite, probably extending test-suite/litsupport. I hope/think that no changes to LNT will be needed, but I haven't investigated myself.

Just to be clear, LNTBased/* is in the test suite; LNT supports plugins and those can be in the test suite. That having been said, if we can do this just within litsupport, that potentially seems better. The key feature seems to be being able to report multiple benchmark timings per actual executable.

  • The lit data format currently does not allow multiple values for a metric.
  • I'm not sure I see why that would be needed here.

I don't mean multiple times for each benchmark, I mean that there are multiple benchmarks for which we should individually collect timings. In the benchmark added here, for example, we'll get individual timings out for:

BENCHMARK(BM_ReturnInstrumentedu);
BENCHMARK(BM_ReturnInstrumentedPatchedThenUnpatched);
BENCHMARK(BM_ReturnInstrumentedPatched);
...

and we should separately record them all.

Ah right.

  • Todays lit works best if you have 1 file for each test; Also test discover is done before any test is run.
  • If there is a flag to only run a specific benchmark in a googletest executable then we could work around this limitation on the cmake side:

    Just generate multiple .test files with slightly different RUN: lines (assume that option is called --only-test): ` retref-bench-unpatched.test contains RUN: retref-bench --only-test unpatched retref-bench-patched.test contains RUN: retref-bench --only-test patches ... ` It would require us to repeat the sub-benchmark names in the cmake file (or write cmake logic to scan the sourcecode?)
Fri, Jun 23, 12:38 PM
hfinkel added a comment to D32272: [XRay] Add Google Benchmark library + initial XRay benchmarks.

LGTM for the most part, thanks for making the changes, it's much simpler now.

Thoughts @MatzeB? I think we can work on running/reporting the microbenchmarks data later, in changes to LNT.

Can you please summarize the state of this and the eventual goal? It looks like:

  1. Eventually, we need a new LNT plugin (i.e. something like what we have in LNTBased/*) that understands how to interpret the output of the Google benchmark library.

I hope this could all be done within the test-suite, probably extending test-suite/litsupport. I hope/think that no changes to LNT will be needed, but I haven't investigated myself.

Just to be clear, LNTBased/* is in the test suite; LNT supports plugins and those can be in the test suite. That having been said, if we can do this just within litsupport, that potentially seems better. The key feature seems to be being able to report multiple benchmark timings per actual executable.

  • The lit data format currently does not allow multiple values for a metric.
  • I'm not sure I see why that would be needed here.
Fri, Jun 23, 11:53 AM
hfinkel added a comment to D32272: [XRay] Add Google Benchmark library + initial XRay benchmarks.

LGTM for the most part, thanks for making the changes, it's much simpler now.

Thoughts @MatzeB? I think we can work on running/reporting the microbenchmarks data later, in changes to LNT.

Can you please summarize the state of this and the eventual goal? It looks like:

  1. Eventually, we need a new LNT plugin (i.e. something like what we have in LNTBased/*) that understands how to interpret the output of the Google benchmark library.
Fri, Jun 23, 8:02 AM
hfinkel added a comment to D32272: [XRay] Add Google Benchmark library + initial XRay benchmarks.

LGTM for the most part, thanks for making the changes, it's much simpler now.

Thoughts @MatzeB? I think we can work on running/reporting the microbenchmarks data later, in changes to LNT.

Fri, Jun 23, 7:42 AM
hfinkel accepted D34531: [LoopUnroll] Pass SCEV to getUnrollingPreferences hook. NFCI..

LGTM

Fri, Jun 23, 4:17 AM

Thu, Jun 22

hfinkel added inline comments to D34544: [libunwind] Don't assume the return address register is always saved and has CFI for it.
Thu, Jun 22, 5:44 PM
hfinkel added a comment to D34157: [llvm-stress] Use C++11 mersenne_twister_engine random device instead of our own (PR32585).

my only concern is I have no idea if uniform_int_distribution guarantees the same behaviour on different targets as mersene does

It does not.
[rand.dist.general]/3:

The algorithms for producing each of the specified distributions are implementation-defined.

Wait, really? This makes using a seeded PRNG inside any of the standard distributions pretty much useless -- you can seed things all you want but you can't actually reproduce the particular results when fed through a distribution...

Thu, Jun 22, 3:13 PM

Wed, Jun 21

hfinkel added a comment to D29651: [OpenMP] Consider LIBRARY_PATH when selecting library paths for NVPTX targets in OpenMP mode..

Why is this necessary?

Wed, Jun 21, 6:31 PM
hfinkel added a comment to D29647: [OpenMP] Extend CLANG target options with device offloading kind..

In general, this patch seems to be missing tests (unless it is actually NFC, or you can't write tests yet, which, in either case, need to be explained).

Wed, Jun 21, 6:22 PM
hfinkel accepted D29654: [OpenMP] Integrate OpenMP target region cubin into host binary.

LGTM

Wed, Jun 21, 6:09 PM
hfinkel committed rL305975: Add LLVM-HPC2017 to upcoming events.
Add LLVM-HPC2017 to upcoming events
Wed, Jun 21, 5:35 PM
hfinkel added inline comments to D34467: [SelectionDAG] set dereferenceable flag when expanding memcpy/memmove.
Wed, Jun 21, 12:09 PM
hfinkel added inline comments to D34245: [PowerPC] Refine the checking for emiting TOC restore nops and Tail-Call eligibility..
Wed, Jun 21, 8:09 AM
hfinkel added inline comments to D34347: [PowerPC] fix potential verification error on __tls_get_addr.
Wed, Jun 21, 7:38 AM

Tue, Jun 20

hfinkel added inline comments to D32139: [AliasSetTracker] Don't drop AA MD so eagerly.
Tue, Jun 20, 4:07 PM
hfinkel added a comment to D34373: [LV] Optimize for size when vectorizing loops with tiny trip count.
In D34373#785678, @twoh wrote:
In D34373#785485, @twoh wrote:
In D34373#784975, @twoh wrote:

I think this is a right approach, but concerned that the experimental results I shared on D32451 show that it is generally better to not to vectorize the low trip count loops. @Ayal, I wonder if you have any results that this patch actually improves the performance. Thanks!

I know that we're currently missing opportunities for large vectorizable loops with low (static) trip counts. Smaller inner loops are also good candidates for unpredicated vectorization, but we may need to be a bit careful because of modeling inaccuracies and phase-ordering effects (e.g. if we don't vectorize a loop, then we'll end up unrolling it when the unroller runs).

Got it. My concern was for small single-level loops with low trip counts, as I observe them pretty frequently. I have no objection accepting this patch and improve the cost estimator separately.

Do you mean that you see such loops frequently with dynamically-small trip counts, or with static trip counts? I assume the small loops with (small) static trip counts will generally be unrolled.

Actually you're right. The case I observed was a small static trip count loop completely unrolled and SLP vectorized which actually harms the performance, but not with LV. I think this patch should work if it effectively targets loops that are large enough to not to be unrolled.

Tue, Jun 20, 3:33 PM
hfinkel added a comment to D32729: LV: Don't vectorize with unknown loop counts on divergent targets.

ping

I don't understand this patch. Unless there is both a vector and scalar loop, and I don't see how vectorizing the loop affects divergence between threads one way or the other. Do you really mean to prohibit vectorization when you have a dynamic trip count *and also* don't require a scalar loop? If so, you might look at D34373 which is related.

Yes, the point is to avoid the additional condition where both loops could execute

Tue, Jun 20, 2:43 PM
hfinkel added a comment to D32729: LV: Don't vectorize with unknown loop counts on divergent targets.

ping

Tue, Jun 20, 1:51 PM
hfinkel added a comment to D34373: [LV] Optimize for size when vectorizing loops with tiny trip count.
In D34373#785485, @twoh wrote:
In D34373#784975, @twoh wrote:

I think this is a right approach, but concerned that the experimental results I shared on D32451 show that it is generally better to not to vectorize the low trip count loops. @Ayal, I wonder if you have any results that this patch actually improves the performance. Thanks!

I know that we're currently missing opportunities for large vectorizable loops with low (static) trip counts. Smaller inner loops are also good candidates for unpredicated vectorization, but we may need to be a bit careful because of modeling inaccuracies and phase-ordering effects (e.g. if we don't vectorize a loop, then we'll end up unrolling it when the unroller runs).

Got it. My concern was for small single-level loops with low trip counts, as I observe them pretty frequently. I have no objection accepting this patch and improve the cost estimator separately.

Tue, Jun 20, 10:37 AM
hfinkel added a comment to D34373: [LV] Optimize for size when vectorizing loops with tiny trip count.
In D34373#784975, @twoh wrote:

I think this is a right approach, but concerned that the experimental results I shared on D32451 show that it is generally better to not to vectorize the low trip count loops. @Ayal, I wonder if you have any results that this patch actually improves the performance. Thanks!

Tue, Jun 20, 7:27 AM

Mon, Jun 19

hfinkel accepted D34318: [BasicAA] Use MayAlias instead of PartialAlias for fallback..

I'd also like to get rid of this hack, and now that the union situation is resolved (at least in the sense that the metadata is now conservatively correct), I think that we should try. Assuming that self hosting and an associated test suite run is clean, this LGTM.

Mon, Jun 19, 7:16 PM
hfinkel added inline comments to D34373: [LV] Optimize for size when vectorizing loops with tiny trip count.
Mon, Jun 19, 5:25 PM
hfinkel accepted D34357: [Clang] Handle interaction of -pg and no_instrument_function attribute..

LGTM

Mon, Jun 19, 11:40 AM
hfinkel accepted D34115: [TRE] Improve code motion in TRE, use AA to tell whether a load can be moved before a call that writes to memory..

LGTM

Mon, Jun 19, 8:17 AM

Mon, Jun 12

hfinkel added a comment to D33904: Add a __has_attribute_enhancement macro to clang.

I prefer __has_attribute_feature to enhancement. I don't see why we need a new macro for this, however. Why not just use __has_feature(overloadable_unmarked) or similar?

Mon, Jun 12, 9:18 AM
hfinkel accepted D32921: [SelectionDAG] Allow sin/cos -> sincos optimization on GNU triples w/ just -fno-math-errno.

LGTM

Mon, Jun 12, 8:45 AM

Thu, Jun 1

hfinkel accepted D32888: TableGen: Add support of Intrinsics with multiple returns.

LGTM

Thu, Jun 1, 9:03 PM

Tue, May 30

hfinkel added a comment to D32888: TableGen: Add support of Intrinsics with multiple returns.

Does matching instructions with multiple outputs now work except for this intrinsics problem, or is further work necessary to support this in general?

Tue, May 30, 7:52 PM
hfinkel requested changes to D33572: [PPC] Implement fast bit reverse in PPCDAGToDAGISel.

I'd rather not address the problem this way. Can we canonicalize the code sequence at the IR level into the @llvm.bitreverse intrinsic, and then match the intrinsic efficiently in the backend (preferably in TableGen, where complicated patterns are more succinct to write)? This will give us the additional advantage of good lowering for @llvm.bitreverse and the opportunity for IR-level optimizations to deal with the canonical representation. What the bit permutation selector is doing should be relatively easy to replicate at the IR level.

Tue, May 30, 7:28 PM

Mon, May 29

hfinkel added inline comments to D31494: [PowerPC] Pretty-print CR bits the way the binutils disassembler does.
Mon, May 29, 1:33 AM

Sat, May 27

hfinkel added inline comments to D33404: [PowerPC] Fix a performance bug for PPC::XXPERMDI..
Sat, May 27, 12:23 PM

May 26 2017

hfinkel accepted D33567: [InstCombine] Pass the DominatorTree and AssumptionCache to a few calls to isKnownPositive, isKnownNegative, and isKnownNonZero.

I was operating under the assumption that this was an accident. Most other places use the InstCombine specific wrappers around computeKnownBits so always pass the the dominatortree and assumption cache. If there's a reason for these places to be different then that should be commented.

May 26 2017, 10:48 AM

May 23 2017

hfinkel added inline comments to D33328: [CodeGen] Pessimize aliasing for union members (and may-alias) objects.
May 23 2017, 9:32 AM
hfinkel added inline comments to D33328: [CodeGen] Pessimize aliasing for union members (and may-alias) objects.
May 23 2017, 9:25 AM
hfinkel accepted D33402: [RuntimeDyld, PowerPC] Fix check for external symbols when detecting reloction overflow.

LGTM, thanks!

May 23 2017, 6:53 AM

May 22 2017

hfinkel added inline comments to D31772: [PowerPC] Add pass to expand extra memcpy calls.
May 22 2017, 3:35 PM
hfinkel added inline comments to D33396: [LV] Report multiple reasons for not vectorizing under allowExtraAnalysis.
May 22 2017, 3:09 PM
hfinkel updated the diff for D31239: [WIP] Add Caching of Known Bits in InstCombine.

Rebased patch provided by Craig Topper.

May 22 2017, 3:02 PM
hfinkel accepted D33396: [LV] Report multiple reasons for not vectorizing under allowExtraAnalysis.

LGTM

May 22 2017, 1:58 PM
hfinkel added a comment to D33402: [RuntimeDyld, PowerPC] Fix check for external symbols when detecting reloction overflow.

Also, can we have a test case for this?

May 22 2017, 7:33 AM
hfinkel accepted D33403: [RuntimeDyld, PowerPC] Fix relocation detection overflow.

LGTM

May 22 2017, 7:33 AM
hfinkel accepted D33402: [RuntimeDyld, PowerPC] Fix check for external symbols when detecting reloction overflow.

LGTM

May 22 2017, 7:30 AM

May 21 2017

hfinkel added a comment to D32737: [Constants][SVE] Represent the runtime length of a scalable vector.

I've only lightly read the spec, but it looks like the vector length can be controlled by writing to the ZCR_ELn registers (so, e.g. user code could make a syscall to change the vector length)?

As I explained to Hal on his comment, that is correct but doesn't have the effect you're expecting.

May 21 2017, 12:36 PM
hfinkel added a comment to D32782: Add pthread_self function prototype and make it speculatable..

OK, I think I found out the cause. I guess this patch was wrong, my bad.
GCC doesn't do anything special with pthread_self per-se.
What GCC does is speculating the glibc implementation of pthread_self is declared with __attribute__(const).
The semantic of the attribute is that of "The const attribute is specified as asserting that the function does not examine any data except the arguments. " [1]
If the function has no arguments, it has to return the same value every time.

Therefore, it speculates.
I think that other libc implementation are free to not declare pthread_self with that attribute. In fact, from what I can see, the FreeBSD version doesn't use that argument.
In other words, I don't think we're allowed to do anything with pthread_self() in general as POSIX specifies weaks guarantees.

So there a general implication we can implement: __attribute__(const). + zero arguments == speculatable?

I'm not sure if that's the exact recipe, but yes, it seems to be on the right path.

Also, I fail to see how it would not be safe to tread pthread_self as speculatable? Same for getpid.

The standard says that the call always succeeds and always returns the thread id. The thread ids are opaque, and I can imagine there being multiple "self" values (pthread_equal would just return true for all of them), thus making pthread_self non-const. However, I can think of no reason why an implementation would do this, don't know of any that do, the behavior would only be observable via some non-standard interface, and I'm happy to cross that bridge if we come to it.

I don't understand the bit about getpid(). In that case forking actually could change the value and you might end up in trouble if you rely on that to write temporary directories (as it's generally done).

Oh. You're right. Also, that seems to also rule out this as well. fork() could also change the value of pthread_self() I'd imagine.

It's slightly different, at least in my opinion.
getpid() returns a pid_t.

pid_t is defined to be an integer type, although POSIX doesn't put any restrictions on the size.
http://pubs.opengroup.org/onlinepubs/009696699/basedefs/sys/types.h.html

glibc (and FreeBSD libc) decide to make it an int. People know it's an integer and use as such.

On the other hand, pthread_t is considered to be an opaque type. It can be an integer/a struct/you name it.
In fact, this completely opaque implementation detail wildly varies across implementations (for LinuxThreads, it's an integer, for NPTL, a pointer).
http://man7.org/linux/man-pages/man3/pthread_self.3.html
Linux also documents that using pthread_t in anything that's not pthread calls results in an unspecified behaviour, while pid_t can be used freely.

So, it's not quite the same, but it's still debatable whether the transformation should be performed or not.

May 21 2017, 1:22 AM

May 20 2017

hfinkel added a comment to D32782: Add pthread_self function prototype and make it speculatable..

OK, I think I found out the cause. I guess this patch was wrong, my bad.
GCC doesn't do anything special with pthread_self per-se.
What GCC does is speculating the glibc implementation of pthread_self is declared with __attribute__(const).
The semantic of the attribute is that of "The const attribute is specified as asserting that the function does not examine any data except the arguments. " [1]
If the function has no arguments, it has to return the same value every time.

Therefore, it speculates.
I think that other libc implementation are free to not declare pthread_self with that attribute. In fact, from what I can see, the FreeBSD version doesn't use that argument.
In other words, I don't think we're allowed to do anything with pthread_self() in general as POSIX specifies weaks guarantees.

So there a general implication we can implement: __attribute__(const). + zero arguments == speculatable?

I'm not sure if that's the exact recipe, but yes, it seems to be on the right path.

Also, I fail to see how it would not be safe to tread pthread_self as speculatable? Same for getpid.

The standard says that the call always succeeds and always returns the thread id. The thread ids are opaque, and I can imagine there being multiple "self" values (pthread_equal would just return true for all of them), thus making pthread_self non-const. However, I can think of no reason why an implementation would do this, don't know of any that do, the behavior would only be observable via some non-standard interface, and I'm happy to cross that bridge if we come to it.

I don't understand the bit about getpid(). In that case forking actually could change the value and you might end up in trouble if you rely on that to write temporary directories (as it's generally done).

May 20 2017, 5:54 PM
hfinkel added a comment to D32782: Add pthread_self function prototype and make it speculatable..

OK, I think I found out the cause. I guess this patch was wrong, my bad.
GCC doesn't do anything special with pthread_self per-se.
What GCC does is speculating the glibc implementation of pthread_self is declared with __attribute__(const).
The semantic of the attribute is that of "The const attribute is specified as asserting that the function does not examine any data except the arguments. " [1]
If the function has no arguments, it has to return the same value every time.

Therefore, it speculates.
I think that other libc implementation are free to not declare pthread_self with that attribute. In fact, from what I can see, the FreeBSD version doesn't use that argument.
In other words, I don't think we're allowed to do anything with pthread_self() in general as POSIX specifies weaks guarantees.

May 20 2017, 5:44 PM
hfinkel added a comment to D32737: [Constants][SVE] Represent the runtime length of a scalable vector.

What are the semantics of select when the two vectors have different width? Does store do a memory allocation?

Maybe I misunderstood, but won't those selects be ill-typed?

May 20 2017, 5:15 PM

May 16 2017

hfinkel accepted D33017: [PPC] Properly update register save area offsets.

This passes a double bootstrap with full testing. So I think it's fine.

May 16 2017, 11:04 AM

May 13 2017

hfinkel added a comment to D18738: Add new !unconditionally_dereferenceable load instruction metadata.

@hfinkel Oh, my bad--I now remember that this came up long ago...

...

@sanjoy Can you confirm that a dereferenceable attribute on getelementptr would be an acceptable IR extension?

A GEP that always produces a dereferenceable value may be tricky to implement since we'll have to remember to strip said attribute whenever we hoist GEPs; and LLVM likes to hoist GEP's without thinking too much. But I believe this is going in the right direction -- we should not have soundness problems as long as the safety of some operation is guaranteed by some other preceding operation.

May 13 2017, 11:56 PM

May 12 2017

hfinkel added a comment to D33136: [ValueTracking] Don't let isAddOfNonZero look at adds with a PHI node as input.

Can we add a mode (e.g. some extra boolean parameter) to ValueTracking to enable this kind of restriction? We could then enable this mode when in BasicAA. I don't want to, in general, disable things here to work around the fact that BasicAA is doing some unsound.

May 12 2017, 9:40 AM

May 11 2017

hfinkel added a comment to D18738: Add new !unconditionally_dereferenceable load instruction metadata.

@sanjoy

I'd be okay (even happy! :) ) if you add a @llvm.safe.load.<ty> intrinsic that never has UB, and returns undef if the address passed to it is not dereferenceable. That intrinsic could then be marked speculatable. If needed, we could even implement the intrinsic by trying to read from the address passed in, and by catching the SIGSEGV or SIGBUS, if any.

First, it is not realistically possible to implement on most platforms (SEH *might* be fine but even then I'm not sure). Second, every pass that looks at loads will have to be amended in an invasive way (a quick look at LICM alone tells me this will be a nightmare as it passes bare LoadInst* everywhere...). Third, it would be crippled compared to real loads, as it won't support some attributes loads do (e.g. !invariant.load) and adding support for that will, AFAICT, require adding a new return value attribute, similar to how nonnull and dereferenceable are currently implemented there. Fourth, I don't think it will be easy to plug into the current AA architecture.

Even if all the rest was fixed, the lack of !invariant.load alone makes it completely useless for our use case so I suggest not discussing this proposal further.

Let's back away a bit. My current issue is that my frontend generates lots of deeply nested loads in inner loops. This happens because it is translating Python, and you can easily end up with something like:

for bs in as:
  for b in bs:
    c += self.core.dds0.ftw

The frontend guarantees that all these pointers are always dereferenceable. In fact every single SSA value of pointer type in the entire emitted IR is dereferenceable and nonnull. The frontend also knows that most of these loads are ultimately constant (in the case above, a simplified extract of real-world code, self.core.dds0 will never change for the entire program lifetime). The frontend is not able to hoist the loads into preheader itself because it does not perform inlining and so does not have enough visibility.

How can I tell LLVM that these loads may be safely hoisted?

May 11 2017, 1:19 PM

May 9 2017

hfinkel added inline comments to D29654: [OpenMP] Integrate OpenMP target region cubin into host binary.
May 9 2017, 7:48 PM
hfinkel added inline comments to D32888: TableGen: Add support of Intrinsics with multiple returns.
May 9 2017, 7:32 PM

May 8 2017

hfinkel added a comment to D32006: Mark invariant.group.barrier as inaccessiblememonly.

Is it possible to add "writeonly"? I am not sure if it will help in any way, but the tests seems to be working it.

May 8 2017, 8:13 AM

May 7 2017

hfinkel added inline comments to D32059: Make AssumptionCache's interface return a range of Values.
May 7 2017, 6:52 PM
hfinkel added inline comments to D32006: Mark invariant.group.barrier as inaccessiblememonly.
May 7 2017, 2:17 PM

May 4 2017

hfinkel added a comment to D31908: [AntiDepBreaker] Don't rename callee saved register restore instruction.

Could we check this in the verifier? One advantage this might have is that, for blocks ending in unreachable, we can easily realize that we don't need the restoration code.

Are we adding restoration code to such blocks now? That could be checked in the verifier, but it's not incorrect. Maybe a warning could be issued...

May 4 2017, 7:54 AM
hfinkel added a comment to D31908: [AntiDepBreaker] Don't rename callee saved register restore instruction.

Yeah I like that way of modeling things explicitely. Unfortunately that is not how most llvm targets work today.

Would it be reasonable to implement that for all targets?

May 4 2017, 7:35 AM

Apr 28 2017

hfinkel added a comment to D20116: Add speculatable function attribute.

Okay, unfortunately, this is only useful to me if we allow it on function declarations

I had somehow missed this bit ^ and I was under the impression that the main motivation for a general attribute was more completeness than anything else.

Apr 28 2017, 1:11 PM
hfinkel added a comment to D20116: Add speculatable function attribute.

Why only for intrinsics? I thought we had concluded that we'd only allow it for declarations and not on call sites (which may technically mean on call sited but only matching the declaration). I think it is important that we can apply it to regular functions.

I think a general speculatable attribute that is allowed only on functions decls is *less problematic*[0] that a context sensitive one, but I think speculatable intrinsics are clearly okay. Therefore my opinion is (which I expressed on IRC to Matt) is to first land the intrinsic variant of this, since that's what he's blocked on; and then we can go ahead with more aggressive variants on subsequent patches.

[0] https://reviews.llvm.org/D20116#709352

Apr 28 2017, 12:41 PM
hfinkel added a comment to D20116: Add speculatable function attribute.

Only allow for intrinsics

Why only for intrinsics? I thought we had concluded that we'd only allow it for declarations and not on call sites (which may technically mean on call sited but only matching the declaration). I think it is important that we can apply it to regular functions.

I think so too, but @sanjoy said to restrict it to intrinsics for now.

Apr 28 2017, 12:31 PM
hfinkel added a comment to D20116: Add speculatable function attribute.

Only allow for intrinsics

Apr 28 2017, 12:13 PM

Apr 27 2017

hfinkel committed rL301590: Add references to past LLVM in HPC workshops.
Add references to past LLVM in HPC workshops
Apr 27 2017, 2:07 PM
hfinkel added a comment to D32605: Recognize CTLZ builtin.

We could add a TTI callback like we have for popcnt. You could also argue that this is the better canonical form because it can enable other analysis/optimizations (and we should fix the expansion if it is suboptimal).

Apr 27 2017, 12:19 PM
hfinkel added a comment to D32582: [InstCombine] Add range metadata to cttz/ctlz/ctpop intrinsic calls based on known bits.

What analysis benefits from this range data? Should we teach that analysis to use known-bits information directly instead of using metadata as an analysis cache?

Apr 27 2017, 4:15 AM

Apr 24 2017

hfinkel added inline comments to D32433: Compute safety information in a much finer granularity..
Apr 24 2017, 3:39 PM
hfinkel accepted D32433: Compute safety information in a much finer granularity..

LGTM

Apr 24 2017, 9:09 AM

Apr 22 2017

hfinkel added inline comments to D32387: [PartialInlining] Add Optimization Remark Support.
Apr 22 2017, 7:48 PM

Apr 21 2017

hfinkel updated the diff for D32197: [TySan] A Type Sanitizer (Runtime Library).

Mark the types of globals in a generated ctor-called function (added a runtime test for this). memcpy/memmove should do the corresponding thing for the type shadow data.

Apr 21 2017, 5:21 PM
hfinkel updated the diff for D32198: [TySan] A Type Sanitizer (LLVM).

Mark the types of globals in a generated ctor-called function. memcpy/memmove should do the corresponding thing for the type shadow data.

Apr 21 2017, 5:18 PM
hfinkel updated the diff for D32199: [TySan] A Type Sanitizer (Clang).

Output metadata to provide the types of globals (similar to how Clang marks globals for asan).

Apr 21 2017, 5:16 PM
hfinkel updated the diff for D32198: [TySan] A Type Sanitizer (LLVM).

Rename TBAASanitizer -> TypeSanitizer

Apr 21 2017, 7:21 AM
hfinkel updated the diff for D32197: [TySan] A Type Sanitizer (Runtime Library).

Rename TBAASanitizer -> TypeSanitizer

Apr 21 2017, 7:19 AM
hfinkel updated the diff for D32199: [TySan] A Type Sanitizer (Clang).

Rename TBAASanitizer -> TypeSanitizer

Apr 21 2017, 7:18 AM
hfinkel added a comment to D32199: [TySan] A Type Sanitizer (Clang).
  1. C's "effective type" rule allows writes to set the type pretty much unconditionally, unless the storage is for a variable with a declared type

To come back to this point: We don't really implement these rules now, and it is not clear that we will. The problem here is that, if we take the specification literally, then we can't use our current TBAA at all. The problem is that if we have:

write x, !tbaa "int"
read x, !tbaa "int"
write x, !tbaa "float"

TBAA will currently tell us that the "float" write aliases with neither the preceding read nor the preceding write.

Right, C's TBAA rules do not (in general) permit a store to be reordered before a memory operation of a different type, they only allow loads to be moved before stores. (Put another way, they do not tell you that pointers point to distinct memory locations, just that a stored value cannot be observed by a load of a different type.) You get the more general "distinct memory locations" result only for objects of a declared type.

C++ is similar, except that (because object lifetimes do not currently begin magically due to a store) you /can/ reorder stores past a memory operation of a different type if you know no object's lifetime began in between. (But currently we do not record all lifetime events in IR, so we can't do that today. Also, we may be about to lose the property that you can statically determine a small number of places that might start an object lifetime.)

Also, a strict reading of C's access rules seems to rule out the premise underlying our struct-path TBAA entirely. So long as I'm accessing a value using a struct that has some member, including recursively, with that type, then it's fine. The matching of the relative offsets is a sufficient, but not necessary, condition for well-defined access. C++ has essentially the same language (and, thus, potentially the same problem).

I agree this rule is garbage, but it's not as permissive as I think you're suggesting. The rule says that you can use an lvalue of struct type to access memory of struct field type. In C this happens during struct assignment, for instance. It does *not* permit using an lvalue of struct field type to access unrelated fields of the same struct. So C appears to allow this nonsense:

char *p = malloc(8);
*(int*)p = 0;
*(int*)(p + 4) = 0;
struct S {int n; float f;} s = *(struct S*)p; // use lvalue of type `struct S` to access object of effective type `int`, to initialize a `float`

but not this nonsense:

float q = ((struct S*)p)->f; // ub, cannot use lvalue of type `float` to access object of effective type `int`

... which just means that we can't make much use of TBAA when emitting struct copies in C.

In C++, on the other hand, the rule is even more garbage, since there is no way to perform a memory access with a glvalue of class type. (The closest you get is that a defaulted union construction/assignment copies the object representation, but that's expressed in terms of copying a sequence of unsigned chars, and in any case those are member functions and so already require an object of the correct type to exist.) See wg21.link/cwg2051

Apr 21 2017, 4:03 AM

Apr 20 2017

hfinkel accepted D32327: [OpenMP] Run libomptarget regression tests using all available system threads..

LGTM

Apr 20 2017, 6:41 PM · Restricted Project
hfinkel added a comment to D32199: [TySan] A Type Sanitizer (Clang).

If you're going to try to enforce the declared type of memory, you'll also need something like the C effective type rule to handle char buffers in C++. As far as I can tell, it's not actually legal under the spec to cast an array of chars to an arbitrary type and access it that way — you have to do something to establish that there's an object of that type there first.
If you memcpy'ed into that buffer from an object of the right type, that would be sufficient to create a new formal object of that type, but I don't see any way to sensibly apply that rule to e.g. the POSIX "read" function. It seems to me that you at least need to have a rule saying that it's okay to access a formal object of type char/char[] using an arbitrarily-typed l-value.

Apr 20 2017, 11:34 AM
hfinkel accepted D32301: Don't pass FPOpFusion::Strict to the backend.

LGTM

Apr 20 2017, 10:00 AM
hfinkel added a comment to D32199: [TySan] A Type Sanitizer (Clang).

As I've currently implemented it, both reads and writes set the type of previously-unknown storage, and after that it says fixed (unless you memcpy to it, memset it, or its lifetime ends (the type gets reset on lifetime.start/end and for malloc/allocas/etc.). There's a flag to enable the "writes always set the type" rule, but that's not the default. Is this too strict?

That seems like it will have at least three flavors of false positive:

  1. C's "effective type" rule allows writes to set the type pretty much unconditionally, unless the storage is for a variable with a declared type
Apr 20 2017, 8:39 AM
hfinkel added a comment to D32199: [TySan] A Type Sanitizer (Clang).

As I've currently implemented it, both reads and writes set the type of previously-unknown storage, and after that it says fixed (unless you memcpy to it, memset it, or its lifetime ends (the type gets reset on lifetime.start/end and for malloc/allocas/etc.). There's a flag to enable the "writes always set the type" rule, but that's not the default. Is this too strict?

That seems like it will have at least three flavors of false positive:

  1. C's "effective type" rule allows writes to set the type pretty much unconditionally, unless the storage is for a variable with a declared type
  2. After a placement new in C++, you should be able to use the storage as a new type
  3. Storing directly to a member access on a union (ie, with the syntax x.a = b) in C++ permits using the storage as the new type

    If we want to follow the relevant language rules by default, that would suggest that "writes always set the type" should be enabled by default in C and disabled by default in C++. That may not be the right decision for other reasons, though. In C++, writes through union members and new-expressions should probably (re)set the type
Apr 20 2017, 5:46 AM

Apr 19 2017

hfinkel abandoned D32260: [TBAA] Vector types should (only) alias with their element types.

@rjmccall said, on this topic, in D31885:

Apr 19 2017, 8:52 PM
hfinkel added a comment to D31561: cmath: Skip Libc for integral types in isinf, etc..

Marshall, that's what I assumed originally, but I figured Hal had some non-standard-but-worth-supporting use case in mind.

Apr 19 2017, 8:49 PM
hfinkel created D32260: [TBAA] Vector types should (only) alias with their element types.
Apr 19 2017, 5:14 PM
hfinkel added a comment to D32199: [TySan] A Type Sanitizer (Clang).

! In D32199#731252, @hfinkel wrote:

How about renaming this to something more like -fsanitize=type?

I'm fine with that. Do you like TypeSanitizer or TypeAccessSantizer or TypeAliasingSanitizer best?

I think calling it a type aliasing sanitizer would somewhat conflate the details of the mechanism with the fundamentals of the check itself. For example:

variant<int, float> v;
int &n = v.get<int>;
v = 1.3f;
int m = n;

... is a lifetime bug, not an aliasing bug, but would be caught by this check just the same.

Apr 19 2017, 5:04 PM
hfinkel added a comment to D31885: Remove TBAA information from LValues representing union members.

...

(And the nonsensicality of the standard very much continues. The code that we've all agreed is obviously being miscompiled here is still a strict violation of the "improved" union rules in C++, and if we decide to treat it as ill-formed we're going to break a massive amount of code (and vector code in particular heavily uses unions), because trust me, nobody is reliably placement-newing union fields when they're non-trivial. C++'s new rules are unworkable in part because they rely on C++-specific features that cannot be easily adopted in headers which need to be language-neutral.)

I think we all agree that the vector types need to have their scalar (element) types as parents in the current TBAA representation. That's a separate change that we should just make. That, unfortunately, only fixes a subset of the union-related miscompiles.

Apr 19 2017, 4:21 PM
hfinkel added a comment to D32199: [TySan] A Type Sanitizer (Clang).

...

I would also expect that we will extend this in future to assign types to storage even in cases where there is no store (for instance, we should be able to catch float f() { int n; return *(float*)&n; } despite there being no TBAA violation in the naive IR).

Yes. My thought was that we'd make Clang generate tbaa metadata on allocas and lifetime.start intrinsics (and globals) so that we can mark the memory types upon creation. Would that catch everything?

Apr 19 2017, 3:59 PM
hfinkel added a comment to D32199: [TySan] A Type Sanitizer (Clang).

...

I would also expect that we will extend this in future to assign types to storage even in cases where there is no store (for instance, we should be able to catch float f() { int n; return *(float*)&n; } despite there being no TBAA violation in the naive IR).

Apr 19 2017, 3:54 PM
hfinkel updated the diff for D32197: [TySan] A Type Sanitizer (Runtime Library).

Updated per review comments.

Apr 19 2017, 3:43 PM
hfinkel added a comment to D32197: [TySan] A Type Sanitizer (Runtime Library).

I just did a very quick first pass. Looks good in general, but I have some nits.

Apr 19 2017, 3:39 PM
hfinkel updated the diff for D32199: [TySan] A Type Sanitizer (Clang).

Updated per review comments (use only no_sanitize("tbaa") instead of adding no_sanitize_tbaa and don't touch freebsd for now).

Apr 19 2017, 2:59 PM
hfinkel added a comment to D32199: [TySan] A Type Sanitizer (Clang).

Missing a sanitizer-ld.c test for freebsd.

Apr 19 2017, 2:55 PM
hfinkel added a comment to D31885: Remove TBAA information from LValues representing union members.

I'm somewhat concerned that this patch is quadratic in the AST.

I'd be happy to address this, but I'm not sure how. Memoizing results could be one way, but don't know if that's acceptable.

This location in codegen seems to be the last place where the original C/C++ types are available, in particular the information as to whether a type is a union or not. Maybe it would be possible to propagate some bit somewhere, but then this patch would become much less localized.

Apr 19 2017, 11:57 AM
hfinkel added a comment to D31885: Remove TBAA information from LValues representing union members.

There was a deliberate decision to make TBAA conservative about type punning in LLVM because, in practice, the strict form of TBAA you think we should follow has proven to be a failed optimization paradigm: it just leads users to globally disable TBAA, which is obviously a huge loss. This was never motivated purely by the nonsensicality of the standardization of unions.

Apr 19 2017, 11:54 AM
hfinkel added a comment to D32198: [TySan] A Type Sanitizer (LLVM).

As a note: As follow-up work, we might want to extend Clang to add TBAA metadata to allocas and lifetime.start intrinsics so that the instrumentation pass can mark the types of memory upon declaration/construction instead of only upon first access.

Apr 19 2017, 5:59 AM
hfinkel added a comment to D32199: [TySan] A Type Sanitizer (Clang).

As a note: As follow-up work, we might want to extend Clang to add TBAA metadata to allocas and lifetime.start intrinsics so that the instrumentation pass can mark the types of memory upon declaration/construction instead of only upon first access.

Apr 19 2017, 5:58 AM