hfinkel (Hal Finkel)
User

Projects

User does not belong to any projects.

User Details

User Since
Nov 16 2012, 6:02 PM (231 w, 3 d)

Recent Activity

Yesterday

hfinkel added inline comments to D32433: Compute safety information in a much finer granularity..
Mon, Apr 24, 3:39 PM
hfinkel accepted D32433: Compute safety information in a much finer granularity..

LGTM

Mon, Apr 24, 9:09 AM

Sat, Apr 22

hfinkel added inline comments to D32387: [PartialInlining] Add Optimization Remark Support.
Sat, Apr 22, 7:48 PM

Fri, Apr 21

hfinkel updated the diff for D32197: [TySan] A Type Sanitizer (Runtime Library).

Mark the types of globals in a generated ctor-called function (added a runtime test for this). memcpy/memmove should do the corresponding thing for the type shadow data.

Fri, Apr 21, 5:21 PM
hfinkel updated the diff for D32198: [TySan] A Type Sanitizer (LLVM).

Mark the types of globals in a generated ctor-called function. memcpy/memmove should do the corresponding thing for the type shadow data.

Fri, Apr 21, 5:18 PM
hfinkel updated the diff for D32199: [TySan] A Type Sanitizer (Clang).

Output metadata to provide the types of globals (similar to how Clang marks globals for asan).

Fri, Apr 21, 5:16 PM
hfinkel updated the diff for D32198: [TySan] A Type Sanitizer (LLVM).

Rename TBAASanitizer -> TypeSanitizer

Fri, Apr 21, 7:21 AM
hfinkel updated the diff for D32197: [TySan] A Type Sanitizer (Runtime Library).

Rename TBAASanitizer -> TypeSanitizer

Fri, Apr 21, 7:19 AM
hfinkel updated the diff for D32199: [TySan] A Type Sanitizer (Clang).

Rename TBAASanitizer -> TypeSanitizer

Fri, Apr 21, 7:18 AM
hfinkel added a comment to D32199: [TySan] A Type Sanitizer (Clang).
  1. C's "effective type" rule allows writes to set the type pretty much unconditionally, unless the storage is for a variable with a declared type

To come back to this point: We don't really implement these rules now, and it is not clear that we will. The problem here is that, if we take the specification literally, then we can't use our current TBAA at all. The problem is that if we have:

write x, !tbaa "int"
read x, !tbaa "int"
write x, !tbaa "float"

TBAA will currently tell us that the "float" write aliases with neither the preceding read nor the preceding write.

Right, C's TBAA rules do not (in general) permit a store to be reordered before a memory operation of a different type, they only allow loads to be moved before stores. (Put another way, they do not tell you that pointers point to distinct memory locations, just that a stored value cannot be observed by a load of a different type.) You get the more general "distinct memory locations" result only for objects of a declared type.

C++ is similar, except that (because object lifetimes do not currently begin magically due to a store) you /can/ reorder stores past a memory operation of a different type if you know no object's lifetime began in between. (But currently we do not record all lifetime events in IR, so we can't do that today. Also, we may be about to lose the property that you can statically determine a small number of places that might start an object lifetime.)

Also, a strict reading of C's access rules seems to rule out the premise underlying our struct-path TBAA entirely. So long as I'm accessing a value using a struct that has some member, including recursively, with that type, then it's fine. The matching of the relative offsets is a sufficient, but not necessary, condition for well-defined access. C++ has essentially the same language (and, thus, potentially the same problem).

I agree this rule is garbage, but it's not as permissive as I think you're suggesting. The rule says that you can use an lvalue of struct type to access memory of struct field type. In C this happens during struct assignment, for instance. It does *not* permit using an lvalue of struct field type to access unrelated fields of the same struct. So C appears to allow this nonsense:

char *p = malloc(8);
*(int*)p = 0;
*(int*)(p + 4) = 0;
struct S {int n; float f;} s = *(struct S*)p; // use lvalue of type `struct S` to access object of effective type `int`, to initialize a `float`

but not this nonsense:

float q = ((struct S*)p)->f; // ub, cannot use lvalue of type `float` to access object of effective type `int`

... which just means that we can't make much use of TBAA when emitting struct copies in C.

In C++, on the other hand, the rule is even more garbage, since there is no way to perform a memory access with a glvalue of class type. (The closest you get is that a defaulted union construction/assignment copies the object representation, but that's expressed in terms of copying a sequence of unsigned chars, and in any case those are member functions and so already require an object of the correct type to exist.) See wg21.link/cwg2051

Fri, Apr 21, 4:03 AM

Thu, Apr 20

hfinkel accepted D32327: [OpenMP] Run libomptarget regression tests using all available system threads..

LGTM

Thu, Apr 20, 6:41 PM · Restricted Project
hfinkel added a comment to D32199: [TySan] A Type Sanitizer (Clang).

If you're going to try to enforce the declared type of memory, you'll also need something like the C effective type rule to handle char buffers in C++. As far as I can tell, it's not actually legal under the spec to cast an array of chars to an arbitrary type and access it that way — you have to do something to establish that there's an object of that type there first.
If you memcpy'ed into that buffer from an object of the right type, that would be sufficient to create a new formal object of that type, but I don't see any way to sensibly apply that rule to e.g. the POSIX "read" function. It seems to me that you at least need to have a rule saying that it's okay to access a formal object of type char/char[] using an arbitrarily-typed l-value.

Thu, Apr 20, 11:34 AM
hfinkel accepted D32301: Don't pass FPOpFusion::Strict to the backend.

LGTM

Thu, Apr 20, 10:00 AM
hfinkel added a comment to D32199: [TySan] A Type Sanitizer (Clang).

As I've currently implemented it, both reads and writes set the type of previously-unknown storage, and after that it says fixed (unless you memcpy to it, memset it, or its lifetime ends (the type gets reset on lifetime.start/end and for malloc/allocas/etc.). There's a flag to enable the "writes always set the type" rule, but that's not the default. Is this too strict?

That seems like it will have at least three flavors of false positive:

  1. C's "effective type" rule allows writes to set the type pretty much unconditionally, unless the storage is for a variable with a declared type
Thu, Apr 20, 8:39 AM
hfinkel added a comment to D32199: [TySan] A Type Sanitizer (Clang).

As I've currently implemented it, both reads and writes set the type of previously-unknown storage, and after that it says fixed (unless you memcpy to it, memset it, or its lifetime ends (the type gets reset on lifetime.start/end and for malloc/allocas/etc.). There's a flag to enable the "writes always set the type" rule, but that's not the default. Is this too strict?

That seems like it will have at least three flavors of false positive:

  1. C's "effective type" rule allows writes to set the type pretty much unconditionally, unless the storage is for a variable with a declared type
  2. After a placement new in C++, you should be able to use the storage as a new type
  3. Storing directly to a member access on a union (ie, with the syntax x.a = b) in C++ permits using the storage as the new type

    If we want to follow the relevant language rules by default, that would suggest that "writes always set the type" should be enabled by default in C and disabled by default in C++. That may not be the right decision for other reasons, though. In C++, writes through union members and new-expressions should probably (re)set the type
Thu, Apr 20, 5:46 AM

Wed, Apr 19

hfinkel abandoned D32260: [TBAA] Vector types should (only) alias with their element types.

@rjmccall said, on this topic, in D31885:

Wed, Apr 19, 8:52 PM
hfinkel added a comment to D31561: cmath: Skip Libc for integral types in isinf, etc..

Marshall, that's what I assumed originally, but I figured Hal had some non-standard-but-worth-supporting use case in mind.

Wed, Apr 19, 8:49 PM
hfinkel created D32260: [TBAA] Vector types should (only) alias with their element types.
Wed, Apr 19, 5:14 PM
hfinkel added a comment to D32199: [TySan] A Type Sanitizer (Clang).

! In D32199#731252, @hfinkel wrote:

How about renaming this to something more like -fsanitize=type?

I'm fine with that. Do you like TypeSanitizer or TypeAccessSantizer or TypeAliasingSanitizer best?

I think calling it a type aliasing sanitizer would somewhat conflate the details of the mechanism with the fundamentals of the check itself. For example:

variant<int, float> v;
int &n = v.get<int>;
v = 1.3f;
int m = n;

... is a lifetime bug, not an aliasing bug, but would be caught by this check just the same.

Wed, Apr 19, 5:04 PM
hfinkel added a comment to D31885: Remove TBAA information from LValues representing union members.

...

(And the nonsensicality of the standard very much continues. The code that we've all agreed is obviously being miscompiled here is still a strict violation of the "improved" union rules in C++, and if we decide to treat it as ill-formed we're going to break a massive amount of code (and vector code in particular heavily uses unions), because trust me, nobody is reliably placement-newing union fields when they're non-trivial. C++'s new rules are unworkable in part because they rely on C++-specific features that cannot be easily adopted in headers which need to be language-neutral.)

I think we all agree that the vector types need to have their scalar (element) types as parents in the current TBAA representation. That's a separate change that we should just make. That, unfortunately, only fixes a subset of the union-related miscompiles.

Wed, Apr 19, 4:21 PM
hfinkel added a comment to D32199: [TySan] A Type Sanitizer (Clang).

...

I would also expect that we will extend this in future to assign types to storage even in cases where there is no store (for instance, we should be able to catch float f() { int n; return *(float*)&n; } despite there being no TBAA violation in the naive IR).

Yes. My thought was that we'd make Clang generate tbaa metadata on allocas and lifetime.start intrinsics (and globals) so that we can mark the memory types upon creation. Would that catch everything?

Wed, Apr 19, 3:59 PM
hfinkel added a comment to D32199: [TySan] A Type Sanitizer (Clang).

...

I would also expect that we will extend this in future to assign types to storage even in cases where there is no store (for instance, we should be able to catch float f() { int n; return *(float*)&n; } despite there being no TBAA violation in the naive IR).

Wed, Apr 19, 3:54 PM
hfinkel updated the diff for D32197: [TySan] A Type Sanitizer (Runtime Library).

Updated per review comments.

Wed, Apr 19, 3:43 PM
hfinkel added a comment to D32197: [TySan] A Type Sanitizer (Runtime Library).

I just did a very quick first pass. Looks good in general, but I have some nits.

Wed, Apr 19, 3:39 PM
hfinkel updated the diff for D32199: [TySan] A Type Sanitizer (Clang).

Updated per review comments (use only no_sanitize("tbaa") instead of adding no_sanitize_tbaa and don't touch freebsd for now).

Wed, Apr 19, 2:59 PM
hfinkel added a comment to D32199: [TySan] A Type Sanitizer (Clang).

Missing a sanitizer-ld.c test for freebsd.

Wed, Apr 19, 2:55 PM
hfinkel added a comment to D31885: Remove TBAA information from LValues representing union members.

I'm somewhat concerned that this patch is quadratic in the AST.

I'd be happy to address this, but I'm not sure how. Memoizing results could be one way, but don't know if that's acceptable.

This location in codegen seems to be the last place where the original C/C++ types are available, in particular the information as to whether a type is a union or not. Maybe it would be possible to propagate some bit somewhere, but then this patch would become much less localized.

Wed, Apr 19, 11:57 AM
hfinkel added a comment to D31885: Remove TBAA information from LValues representing union members.

There was a deliberate decision to make TBAA conservative about type punning in LLVM because, in practice, the strict form of TBAA you think we should follow has proven to be a failed optimization paradigm: it just leads users to globally disable TBAA, which is obviously a huge loss. This was never motivated purely by the nonsensicality of the standardization of unions.

Wed, Apr 19, 11:54 AM
hfinkel added a comment to D32198: [TySan] A Type Sanitizer (LLVM).

As a note: As follow-up work, we might want to extend Clang to add TBAA metadata to allocas and lifetime.start intrinsics so that the instrumentation pass can mark the types of memory upon declaration/construction instead of only upon first access.

Wed, Apr 19, 5:59 AM
hfinkel added a comment to D32199: [TySan] A Type Sanitizer (Clang).

As a note: As follow-up work, we might want to extend Clang to add TBAA metadata to allocas and lifetime.start intrinsics so that the instrumentation pass can mark the types of memory upon declaration/construction instead of only upon first access.

Wed, Apr 19, 5:58 AM

Tue, Apr 18

hfinkel added a comment to D31885: Remove TBAA information from LValues representing union members.

To be clear, I'm find with disabling union tbaa until we can fix things for real. I'm somewhat concerned that this patch is quadratic in the AST.

Tue, Apr 18, 8:11 PM
hfinkel added a reviewer for D31885: Remove TBAA information from LValues representing union members: rjmccall.
Tue, Apr 18, 8:04 PM
hfinkel added a comment to D32198: [TySan] A Type Sanitizer (LLVM).

https://reviews.llvm.org/D32197 (Runtime)
https://reviews.llvm.org/D32199 (Clang)

Tue, Apr 18, 4:17 PM
hfinkel added a comment to D32197: [TySan] A Type Sanitizer (Runtime Library).

https://reviews.llvm.org/D32198 (LLVM)
https://reviews.llvm.org/D32199 (Clang)

Tue, Apr 18, 4:16 PM
hfinkel created D32199: [TySan] A Type Sanitizer (Clang).
Tue, Apr 18, 4:15 PM
hfinkel created D32198: [TySan] A Type Sanitizer (LLVM).
Tue, Apr 18, 4:04 PM
hfinkel updated the diff for D32197: [TySan] A Type Sanitizer (Runtime Library).

Upload the version of the patch that works on ppc64le.

Tue, Apr 18, 4:00 PM
hfinkel created D32197: [TySan] A Type Sanitizer (Runtime Library).
Tue, Apr 18, 3:58 PM

Fri, Apr 14

hfinkel added a comment to D31885: Remove TBAA information from LValues representing union members.

I'm not sure this is the right way to do this; I suspect we're lumping together a bunch of different bugs:

  1. vector types need to have tbaa which makes them alias with their element types [to be clear, as vector types are an implementation extension, this is our choice; I believe most users expect this to be true, but I'm certainly open to leaving this as-is (i.e. the vector types and scalar types as independent/non-aliasing)].
  2. tbaa can't be used for write <-> write queries (only read <-> write queries) because the writes can change the effective type
  3. our 'struct' path TBAA for unions is broken (and to fix this we need to invert the tree structure, etc. as discussed on the list)

See https://bugs.llvm.org/show_bug.cgi?id=28189 for a testcase for (2) for this which doesn't involve unions.

Fri, Apr 14, 11:56 AM
hfinkel added a comment to D31885: Remove TBAA information from LValues representing union members.

I'm not sure about the solution to #2, because i thought there were very specific points in time at which the effective type could change.

Fri, Apr 14, 10:09 AM
hfinkel added a comment to D31885: Remove TBAA information from LValues representing union members.

This is not meant as a fine-grained solution to the TBAA problem, but a temporary fix for generating wrong information. Just yesterday I helped diagnose yet another problem related to this, so this issue is causing trouble out there.

Fri, Apr 14, 9:12 AM
hfinkel added inline comments to D31167: Use FPContractModeKind universally.
Fri, Apr 14, 5:54 AM
hfinkel added a comment to D31885: Remove TBAA information from LValues representing union members.

I'm not sure this is the right way to do this; I suspect we're lumping together a bunch of different bugs:

  1. vector types need to have tbaa which makes them alias with their element types [to be clear, as vector types are an implementation extension, this is our choice; I believe most users expect this to be true, but I'm certainly open to leaving this as-is (i.e. the vector types and scalar types as independent/non-aliasing)].
  2. tbaa can't be used for write <-> write queries (only read <-> write queries) because the writes can change the effective type
  3. our 'struct' path TBAA for unions is broken (and to fix this we need to invert the tree structure, etc. as discussed on the list)
Fri, Apr 14, 5:45 AM
hfinkel added a reviewer for D31885: Remove TBAA information from LValues representing union members: dberlin.
Fri, Apr 14, 5:38 AM

Thu, Apr 13

hfinkel created D32033: Make test/parallel/omp_nested.c not use so many threads.
Thu, Apr 13, 10:37 AM

Wed, Apr 12

hfinkel accepted D31998: [msan] Fix msan_test broken after r299884..

LGTM, let's try it. It would be good to commit the vector fix separately in case we need to revert the interface change we shouldn't lose the bug fix too.

Wed, Apr 12, 5:36 PM
hfinkel accepted D31531: Remove readnone from invariant.group.barrier.
Wed, Apr 12, 1:37 PM
hfinkel added inline comments to D31167: Use FPContractModeKind universally.
Wed, Apr 12, 11:06 AM
hfinkel accepted D30941: Better testing of schedule model instruction latencies/throughputs.

I implemeted all requirements from hfinkel.
Please, review again.

Wed, Apr 12, 8:42 AM

Tue, Apr 11

hfinkel added inline comments to D30941: Better testing of schedule model instruction latencies/throughputs.
Tue, Apr 11, 11:25 AM
hfinkel added inline comments to D31167: Use FPContractModeKind universally.
Tue, Apr 11, 10:12 AM
hfinkel added inline comments to D31167: Use FPContractModeKind universally.
Tue, Apr 11, 9:45 AM
hfinkel added a comment to D31413: [libc++] Use __attribute__((init_priority(101))) to ensure streams get initialized early.

I'm not certain of a good way to test it, but I have a question about the value you picked for init_priority. My understanding of the values starting from 101 is that 1-100 are reserved for implementation use. Is that understanding correct? If so, you may want to pick a value below 100 to ensure there's not an arms race with the user. I believe this may require some alteration to SemaDeclAttr.cpp to not diagnose when this is coming from a system header. Otherwise, what's to stop the user from having something marked constructor(101) that attempts to use a stream, but can't because they're not initialized yet?

Tue, Apr 11, 4:21 AM

Mon, Apr 10

hfinkel added a comment to D31398: [X86][X86 intrinsics]Folding cmp(sub(a,b),0) into cmp(a,b) optimization.

(x-y) == 0 --> x == y does not require nsz (zeros of any sign compare equal), nor does it require nnan (if x or y is NaN, both comparisons are false). It *does* require ninf (because inf-inf = NaN), and it also requires that subnormals are not flushed by the CPU. There's been some churn around flushing recently, Hal may have thoughts (+Hal).

Mon, Apr 10, 8:05 PM
hfinkel committed rL299911: [LICM] Hoist fp division from the loops and replace by a reciprocal.
[LICM] Hoist fp division from the loops and replace by a reciprocal
Mon, Apr 10, 7:35 PM
hfinkel closed D30819: Fix PR32157 - Hoist fp division from the loops and replace by a reciprocal by committing rL299911: [LICM] Hoist fp division from the loops and replace by a reciprocal.
Mon, Apr 10, 7:35 PM
hfinkel committed rL299910: [PowerPC] multiply-with-overflow might use the CTR register.
[PowerPC] multiply-with-overflow might use the CTR register
Mon, Apr 10, 7:16 PM
hfinkel closed D31790: [PowerPC] Assume 128bit multiply uses CTR by committing rL299910: [PowerPC] multiply-with-overflow might use the CTR register.
Mon, Apr 10, 7:15 PM
hfinkel committed rL299908: [bugpoint] Also remove comdat's from externalized GVs.
[bugpoint] Also remove comdat's from externalized GVs
Mon, Apr 10, 5:31 PM
hfinkel accepted D31555: [PPC64, Sanitizers] Proper stack frame for the thread spawned in internal_clone.

LGTM

Mon, Apr 10, 4:03 PM
hfinkel updated subscribers of D31561: cmath: Skip Libc for integral types in isinf, etc..

I'm happy with this now. @EricWF , @mclow.lists , thoughts?

Mon, Apr 10, 3:49 PM
hfinkel added inline comments to D31167: Use FPContractModeKind universally.
Mon, Apr 10, 1:03 PM
hfinkel added inline comments to D31167: Use FPContractModeKind universally.
Mon, Apr 10, 12:38 PM
hfinkel added inline comments to D31561: cmath: Skip Libc for integral types in isinf, etc..
Mon, Apr 10, 12:28 PM
hfinkel accepted D31790: [PowerPC] Assume 128bit multiply uses CTR.

@hfinkel: Thanks for the feedback!

I have made the changes recommended and uploaded a new revision. (I though this would trigger another mail but apparently it didn't, so I'm commenting again).

Mon, Apr 10, 11:49 AM

Sun, Apr 9

hfinkel committed rL299826: [PowerPC] Change -faltivec to -maltivec.
[PowerPC] Change -faltivec to -maltivec
Sun, Apr 9, 10:28 AM
hfinkel committed rL299823: [MemorySSA] Fix use of pointsToConstantMemory in….
[MemorySSA] Fix use of pointsToConstantMemory in…
Sun, Apr 9, 6:11 AM

Thu, Apr 6

hfinkel added a comment to D31790: [PowerPC] Assume 128bit multiply uses CTR.

A somewhat minified example IR which reproduces the assertion is attached to the linked bug report -- here I would also appreciate some guidance on in what form / where this can be integrated in the regression tests.

Thu, Apr 6, 5:51 PM
hfinkel accepted D31726: AliasAnalysis: Be less conservative about volatile than atomic..

LGTM

Thu, Apr 6, 5:24 PM
hfinkel added inline comments to D31726: AliasAnalysis: Be less conservative about volatile than atomic..
Thu, Apr 6, 9:00 AM

Thu, Mar 30

hfinkel added a comment to D31494: [PowerPC] Pretty-print CR bits the way the binutils disassembler does.

This seems like a good thing to do.

Thu, Mar 30, 11:03 AM

Tue, Mar 28

hfinkel added a comment to D29966: [SelectionDAG] Try to recompute LiveOutInfo of PHI.

Do you have a test case?

Tue, Mar 28, 10:05 PM
hfinkel added inline comments to D31442: Remove readnone from invariant.group.barrier.
Tue, Mar 28, 3:09 PM

Mar 25 2017

hfinkel accepted D30819: Fix PR32157 - Hoist fp division from the loops and replace by a reciprocal.

Formatting nit, otherwise LGTM.

Mar 25 2017, 7:00 AM
hfinkel added a comment to D31276: Add #pragma clang fp.

High-level comment ;)

#pragma clang fast_math contract_fast(on)

This seems a bit unfortunate because 'fast' appears twice? How are we planning on naming the other fast-math flags? Maybe we should just name it:

This is just pure laziness on my part, I was hoping that all these flags can be implemented with on/off but if you find it confusing that's a good indicator.

#pragma clang math constract_fast(on)

or

#pragma clang math contract(fast) // we could also accept off/on here for consistency and compatibility with the standard pragma

or maybe fp_math or floating_point_math or floating_point or fp instead of math.

I think that I prefer this last form (because it does not repeat 'fast' and also makes our extension a pure superset of the standard pragma).

What do you want to name the other flags? I'd prefer if they're grammatically consistent. Maybe we should stick closely to the command-line options, and have:

fp_contract(on/off/fast)
unsafe_optimizations(on/off)
finite_only(on/off)

What do you think?

I really like #pragma clang fp or fp_math because contraction feels different from the other fast-math flags. That said then we don't want to repeat fp in fp_contract.

We should probably have the full list to make sure it works though with all the FMFs. Here is a straw-man proposal:

UnsafeAlgebra          #pragma clang fp unsafe_optimizations(on/off)
NoNaNs                     #pragma clang fp no_nans(on/off)
NoInfs                        #pragma clang fp finite_only(on/off)
NoSignedZeros         #pragma clang fp no_signed_zeros(on/off)
AllowReciprocal        #pragma clang fp reciprocal_math
AllowContract           #pragma clang fp contract(on/off/fast)

The negative ones feel a bit strange... What do you think?

Mar 25 2017, 5:52 AM

Mar 24 2017

hfinkel added a comment to D30920: Do not pass -Os and -Oz to the Gold plugin.

Yes, we should do this. I don't understand why this is tricky. Actually, I think having these kinds of decisions explicit in the code of the transformations would be a positive development.

I think a high reason why I consider this tricky, it the part where I mentioned that a single pass can have different behavior/level depending on where it is in the pipeline.
But recently @joerg split SimplifyCFG into two wrapper passes, which may be a good way of achieving what you're describing.

Mar 24 2017, 2:50 PM
hfinkel added a comment to D30920: Do not pass -Os and -Oz to the Gold plugin.

The fundamental difference, is that Os/Oz especially are treated as optimizations directive that are independent of the pass pipeline: instructing that "loop unroll should not increase size" is independent of *where* is loop unroll inserted in the pipeline.

Mar 24 2017, 12:43 PM
hfinkel added inline comments to D31180: [LangRef] Introduce noreplace function attribute.
Mar 24 2017, 11:51 AM
hfinkel added a comment to D30819: Fix PR32157 - Hoist fp division from the loops and replace by a reciprocal.

A comment on the variable names; otherwise I think this looks good. Thanks!

Mar 24 2017, 10:16 AM
hfinkel accepted D31047: TTI: Split IsSimple in MemIntrinsicInfo.

LGTM

Mar 24 2017, 6:09 AM

Mar 23 2017

hfinkel added a comment to D31276: Add #pragma clang fp.

Thanks very much, Aaron! It would be great if you could also look at the three patches that add support for the generation of the new FMF 'contract' for -ffp-contract=fast. I've added them in the dependencies of this revision. (D31168 is not really a prerequisite but I am planning to commit the patches in this order.)

I'll take a look, but feel less qualified to give a LGTM to those.

NP, thanks for your help on this one. Hopefully @hfinkel or @mehdi_amini can take a look.

Mar 23 2017, 10:39 AM
hfinkel added a comment to D31276: Add #pragma clang fp.

High-level comment ;)

Mar 23 2017, 10:36 AM
hfinkel accepted D31285: [PPC] Add generated tests for all atomic operations.

Please strip comments like this "# =>This Inner Loop Header: Depth=1" from the check lines. We should be able to change those kinds of comments without affecting the tests. Otherwise, LGTM.

Mar 23 2017, 7:45 AM
hfinkel added inline comments to D31175: Improve TargetTransformInfo::getCFInstrCost().
Mar 23 2017, 7:41 AM
hfinkel added a comment to D31186: Changing TargetTransformInfo::getGEPCost to take GetElementPtrInst as parameter.
User stories:

An user wants to estimate the cost of an operation to make a decision whether it's worth to create it. The operation does not exist in IR.
An user has an operation in IR and wants to know the cost of the operation to estimate its contribution into execution.

I think there is also the case in a vectorizer, where there is an existing instruction but the cost query is for the same instruction *vectorized* with VF.

Jonas, an optional parameter duplicates the information passed through other parameters. It can provide all of the needed information. Also single API for all use cases might create some kind of misunderstanding.

No, not in the case in the vectorizer. Here the scalar instruction is passed, with vector types.

But that is that the vectorizer normally does. It takes the scalar instruction and emits the same instruction with vector types. You need to explain how any other case is different.

I meant that it is not quite true that it would be enough to just pass the Instruction without types, as a response to the statement that it just duplicates information. Currently LoopVectorizer passes types as arguments to TTI. Passing the instruction to TTI does not provide the Types, it only gives clues of the original instruction plus the possibility to inspect users / operands.

Mar 23 2017, 7:31 AM
hfinkel added a comment to D31186: Changing TargetTransformInfo::getGEPCost to take GetElementPtrInst as parameter.

...

Hal, you wrote:

I'll also point out that we're further walking down the path of costing instruction patterns

What do you mean "instruction patterns"?

Mar 23 2017, 6:17 AM
hfinkel added a comment to D31186: Changing TargetTransformInfo::getGEPCost to take GetElementPtrInst as parameter.
User stories:

An user wants to estimate the cost of an operation to make a decision whether it's worth to create it. The operation does not exist in IR.
An user has an operation in IR and wants to know the cost of the operation to estimate its contribution into execution.

I think there is also the case in a vectorizer, where there is an existing instruction but the cost query is for the same instruction *vectorized* with VF.

Jonas, an optional parameter duplicates the information passed through other parameters. It can provide all of the needed information. Also single API for all use cases might create some kind of misunderstanding.

No, not in the case in the vectorizer. Here the scalar instruction is passed, with vector types.

Mar 23 2017, 6:09 AM
hfinkel added inline comments to D31175: Improve TargetTransformInfo::getCFInstrCost().
Mar 23 2017, 6:06 AM

Mar 22 2017

hfinkel added a comment to D20116: Add speculatable function attribute.

It also breaks "code compression" type optimizations:

if (a == 1 || a == 2) {
  switch (a) {
  case 1:
    div(m, 1) speculatable;
    break;
  case 2:
    div(m, 2) speculatable;
    break;
  }
}

to

if (a == 1 || a == 2) {
  div(m, a) speculatable;
}

to

if (a == 1 || a == 2) {
  if (a == 0)
    div(m, 0) speculatable;

Your "code compression" optimization just introduced dead code ;)

Yea, I don't even know why I called it "code compression". :)

else
  div(m, a) speculatable;

}

I think that all of this is right, you can't apply some of these optimizations to call sites with the speculatable attribute. I agree, however, that we need to think carefully about how to define what speculatable means on an individual call site. Perhaps they're like convergent functions in this regard: you can't introduce new control dependencies (at least not in general).

I'd say as an initial step support for intrinsics that we _know_ are speculatable (like we _know_ that add is speculatable) can land without any further discussion.

Mar 22 2017, 5:57 PM
hfinkel added a comment to D20116: Add speculatable function attribute.

If we go with (2), then we've admitted that "dead code" (that is, code that is not run) can influence program behavior, which I find troubling.

That troubles (and worries) me as well.

Why? That's part of the promise we make when we tag the code as speculatable. We promise that doing the operation early, even in cases where it might otherwise have been performed, is fine. Furthermore, we're promising that any properties the call has (e.g. promising that a certain argument is nonnull) must not be control dependent. As a result, looking at it as dead code affecting live code is suboptimal; any other properties are just a kind of global assertion.

Making even the behavior of a program dependent on instructions that are never executed seems like a fundamentally new thing, and I'm not yet convinced that that's safe. It may be possible to come up with a consistent model of this new thing, but I think the model will still be tricky to work with.

For instance, certain kinds of devritualization optimizations are wrong in that model. Say we had:

void f() { }
void g() { 1 / 0; } // unused

void main() {
  fnptr p = f;
  *p() speculatable;
}

Now you're saying you can't specialize the call via function pointer like this:

void f() { }
void g() { 1 / 0; } // unused

void main() {
  fnptr p = f;
  if (p == g)
    g() speculatable;  // incorrect
  else
    *p() speculatable;
}

which seems odd.

There are also (somewhat more complex) cases like this that do not involve indirect speculatable calls:

struct Base {
  int k;
  Base(int k) : k(k) {}
};

struct S : public Base {
  S(bool c) : Base(c ? 10 : 20) {}
  virtual void f() {
    div(1, k) speculatable;  // k is either 10 or 20
  }
};

struct T : public Base {
  T() : Base(0) {}
  virtual void f() { }
};

void bug(Base* b) {
  b->f();
}

We have a problem in bug if we ever devirtualize, inline and hoist a load:

void bug(Base *b) {
  int k = b->k;
  if (b->type == S) {
    div(1, k) speculatable;
  } else (b->type == T) {
  }
}

It also breaks "code compression" type optimizations:

if (a == 1 || a == 2) {
  switch (a) {
  case 1:
    div(m, 1) speculatable;
    break;
  case 2:
    div(m, 2) speculatable;
    break;
  }
}

to

if (a == 1 || a == 2) {
  div(m, a) speculatable;
}

to

if (a == 1 || a == 2) {
  if (a == 0)
    div(m, 0) speculatable;
Mar 22 2017, 5:00 PM
hfinkel added a comment to D20116: Add speculatable function attribute.

That troubles (and worries) me as well.

Why?

Irrational fear? ;-)
It seems unusual, and I'm cautious about introducing unusual properties in the compiler in general, it makes it harder to reason about "stuff" when there aren't "simple" rules to guide the logic.
Are there existing other cases of UB induced by unreachable/dead code?

I suppose it's the same as speculating a load from a pointer marked as dereferencable that isn't really, which is already done

Mar 22 2017, 4:28 PM
hfinkel added a comment to D20116: Add speculatable function attribute.

That troubles (and worries) me as well.

Why?

Irrational fear? ;-)

Mar 22 2017, 4:17 PM
hfinkel added a comment to D20116: Add speculatable function attribute.

If we go with (2), then we've admitted that "dead code" (that is, code that is not run) can influence program behavior, which I find troubling.

That troubles (and worries) me as well.

Mar 22 2017, 3:23 PM
hfinkel added a comment to D20116: Add speculatable function attribute.
  1. It has undefined behavior, due to the incorrect speculatable attribute.

I don't see it as being anything other than this. Marking it as speculatable is saying it has no undefined behavior a condition could possibly be avoiding. If you are lying about this, you get what you get.

Mar 22 2017, 2:54 PM
hfinkel added a comment to D31239: [WIP] Add Caching of Known Bits in InstCombine.

Thanks for posting this, Hal!

I'll run some experiments with it, but in general I support this approach.

@rnk:

Long term, I agree with Danny that computing known bits on demand doesn't seem to scale. We might want to just do the analysis up front and see if that works better.

The problem with computing everything upfront is that at some point we might want to partially invalidate the results. Recomputing everything from scratch after it sounds inefficient, and if we add caching, we'll end up with something like what's proposed here. To me this looks very similar to what we have in SCEV (a map from expression to the corresponding analysis result, that we can invalidate and recompute whenever it's requested again), and I think SCEV has been working pretty well so far in this aspect.

Mar 22 2017, 2:54 PM
hfinkel added a comment to D31239: [WIP] Add Caching of Known Bits in InstCombine.
In D31239#707970, @rnk wrote:

We can't just keep passing more analysis pointers throughout our simplification utilities. We might want some wrapper of common, almost globally used analyses that optionally contains DL, TLI, DT, AC, etc.

Mar 22 2017, 2:53 PM
hfinkel added a comment to D31236: Refactor getHostCPUName to allow testing on non-native hardware..

I'm curious why you chose to take this approach rather than add some option that allows us to change the file name being read? If we do that, then we can test this with lit tests. I generally think of our practice as using mocking, and unit tests in general, for cases where lit tests aren't practical (or, to put it another way, the infrastructure necessary to enable them is more complicated than unit testing in C++). This does not seem to be the case here. It seems straightforward to make -proc-cpuinfo-file=/foo/bar/cpuinfo.txt (modulo bikeshed) work.

As I recall, there are several other places in Clang where this is also a problem (we have hard-coded file names for /etc/lsb-release, /etc/redhat-release, etc.).

I honestly hadn't thought this could fit in our lit testing framework. But maybe it could be made to do so as you outline above.
I see that tools/clang/unittests/Driver/DistroTest.cpp uses unittests with vfs::InMemoryFileSystem objects to mock the contents of /etc/lsb-release etc. I wasn't aware of the approach taken there, I'll take a closer look.

I think the main issue with a -proc-cpuinfo-file=%s command line option might be in which tool to attach it to. I guess it would have to be llc, run with -mcpu=native, and then somehow detecting the cpu it would set for code generation.

I think that might work for current functionality (returning a single CPU), but when we're trying to extend this to big.LITTLE systems, and introducing a call like getHostCPUNames(), returning all different cores in the system, I'm not sure anymore if it would be easily tested using "llc -mcpu=native", as it's unclear which core to pick for "native". FWIW, I expect getHostCPUNames() to initially mainly be used by JIT engines and they may make different choices than ahead-of-time compilers with -mcpu=native.

I'll look into this a bit further, but at the moment my feel is that testing via llc -mcpu=native may be too indirect for e.g. extending this API to big.LITTLE systems. I'm not sure if there is another tool already where we could test closer to getHostCPUName, but I don't think so.

Mar 22 2017, 8:52 AM
hfinkel added a comment to D30941: Better testing of schedule model instruction latencies/throughputs.

I did everything accordingly to Hal's requirements except one: the default value of "print-schedule" switch is false because otherewise we have "Unexpected Failures: 530" and it's X86 tests ony. The problem is very simple: update_llc_test_checks.py generates CHECKs like here

; XOP-AVX1-NEXT: vextractf128 $1, %ymm2, %xmm5

I did not realize that CHECK-NEXT always matched the whole line. That's interesting.

If you'd like to CHECK the prefix only you should use something like

CHECK: vextractf128 $1, %ymm2, %xmm5{{*}}

In any case, if this is an update_llc_test_checks.py problem, why don't you use it to update the tests?

I can use it to update the tests but it means I should update 530 tests. Is it acceptable? Should I do it? For me it is not a problem but is it OK for review?

Mar 22 2017, 8:44 AM
hfinkel added a comment to D30941: Better testing of schedule model instruction latencies/throughputs.

I did everything accordingly to Hal's requirements except one: the default value of "print-schedule" switch is false because otherewise we have "Unexpected Failures: 530" and it's X86 tests ony. The problem is very simple: update_llc_test_checks.py generates CHECKs like here

; XOP-AVX1-NEXT: vextractf128 $1, %ymm2, %xmm5

Mar 22 2017, 6:57 AM
hfinkel created D31239: [WIP] Add Caching of Known Bits in InstCombine.
Mar 22 2017, 6:33 AM