Today

timshen added inline comments to D41148: [libcxx] implement <experimental/simd> declarations based on P0214R7..
Wed, Dec 13, 12:48 PM
timshen updated the diff for D41148: [libcxx] implement <experimental/simd> declarations based on P0214R7..

Address comments:

  • Generator ctor is implementable
  • Format issues of tests
  • SFINAE on __is_non_narrowing_convertible for arithmetics only.
Wed, Dec 13, 12:47 PM
tpr added a comment to D41126: [SelectionDAG] Fixed f16-from-vector promotion problem.

Sorry, I mean the other way round: the fp16_to_fp has input i16 and result v1f32, and my theory was that this is illegal.

Wed, Dec 13, 12:47 PM
metzman updated the diff for D41193: [libFuzzer] Add dummy call of LLVMFuzzerTestOneInput to afl_driver..
  • Don't do dummy execution when executing files one-by-one.
Wed, Dec 13, 12:46 PM
tpr added a comment to D41126: [SelectionDAG] Fixed f16-from-vector promotion problem.

It can't handle it because it is a fp16_to_fp whose input is v1i16 and result is f32. My theory was that that is illegal, and I needed to fix where that was being generated, which is what I have done.

Wed, Dec 13, 12:46 PM
sunfish added a comment to D41073: Wasm: add support in libcxx.

Either way. I won't be able to get to it until next week, so feel free to land it earlier.

Wed, Dec 13, 12:42 PM
hfinkel added a comment to D41096: [X86] Initial support for prefer-vector-width function attribute.

The dependency isn't really an llvm dependency. You're absolutely right, that if it was just that it would be fixable by changing our instruction patterns. The problem here is about how the features and dependencies are defined in our instructions manuals. avx512f means "avx512 foundation", but only includes 512 bit instructions. The latter features imply the foundation is present.

A user may expect that if they use "-march=skylake-avx512" that they can use 512-bit intrinsics in x86intrin.h. But at the same time we should generate good performing code for other users who specify -march=skylake-avx512 and didn't write any vector code in their source. Based on the data we've seen so far, the best way to do that is to disable 512-bit instructions.

Understood. However, we can separate this within the implementation. Specifically, Clang does not need to tag all generated functions with the same target attribute. On the LLVM side, we can separate these, and then use that finer-grained control.

So what I'm looking for a solution that optimizes for the likely case that the user code doesn't contain 512-bit vector intrinsics and tells the legalizer to use 256-bit registers only. But if the user code does contains explicit 512-bit intrinsics, we still allow that code to compile and ideally generate what the user expected. I don't want existing 512-bit intrinsic code to suddenly stop compiling with -march=skylake-avx512.

I think that we're on the same page. In addition to intrinsics, I'm also worried about OpenMP SIMD loops/functions (or other explicit vectorization). We may want those to also give 512-bit vectors by default (presumably, such specifically-tagged loops are likely to run long enough to amortize the clock-rate effects).

getHostCPUName/getHostCPUFeatures is called by the clang driver long before any code is parsed. How would it know if the code contained any 512-bit instructions?

I recommend that we do this in Clang during CodeGen. We should add a special callback that will allow TargetInfo to examine the AST and adjust the target (or target features) on a per-function basis. Any function using AXV-512 intrinsics on 512-bit vectors, explicit vector types, or OpenMP SIMD (unless a small simdlen clause is present) will stay as-is, and everything else will be modified to turn off 512-bit vectors.

I definitely don't want to change the target-cpu string added by clang. Maybe a target feature. This problem may continue in follow on CPUs after skx and I don't want to start coding a specific list of CPUs into clang. We already have more information about CPU feature mapping in clang than I think we would really like.

The real question is whether to block inlining on mismatch here. I don't think that we should (and we'll need to enable 512-bit vectors in the caller). The problem is that people write C++ wrappers around vector intrinsics, and we need the compiler to remove the abstraction layer. Generating that code poorly will be a significant problem. This will have the unfortunate "action at a distance" effects we discussed earlier (because having some 512-bit vectors in some function, even after inlining, will suddenly enable it elsewhere in the function), but I don't see any good way to prevent that in undesirable cases without causing significant problems elsewhere.

I agree we don't want to block inlining on a mismatch. Do we have a way to allow targets to control the merging behavior? If we do this as part of the "target-feature" or "target-cpu" attribute we would need custom merging.

We don't currently, we have only areInlineCompatible in TTI. This is called like this:

return TTI.areInlineCompatible(Caller, Callee) &&
       AttributeFuncs::areInlineCompatible(*Caller, *Callee);

we also have a:

AttributeFuncs::mergeAttributesForInlining(*Caller, *Callee);

adding a corresponding TTI function and calling it in the two places where AttributeFuncs::mergeAttributesForInlining is called would be straightforward.

Alternatively, I was thinking about a separate "required-vector-width" attribute. Either clang codegen, or an early IR pass

I prefer that we do this in Clang's CodeGen. We just don't have enough information at the IR level to differentiate between things the user explicitly requested and things that have been added by some earlier stage automatically (plus, the pass method would need to rely on pass injection, or similar, and that won't work with frontends with custom pipelines anyway).

Does "pass injection" here mean having a hook to put in a target specific pass or something else?

Wed, Dec 13, 12:42 PM
ruiu accepted D41163: [LLD] [COFF] Error out if 20 bit thumb branches are out of range.

We should use error() instead of fatal() in this case because the error is not caused by corrupted input files. That said, this patch is LGTM because we use fatal() for other relocations in this file.

Wed, Dec 13, 12:42 PM
cryptoad committed rL320611: [scudo] Adding a public Scudo interface.
[scudo] Adding a public Scudo interface
Wed, Dec 13, 12:42 PM
cryptoad committed rCRT320611: [scudo] Adding a public Scudo interface.
[scudo] Adding a public Scudo interface
Wed, Dec 13, 12:42 PM
cryptoad closed D41128: [scudo] Adding a public Scudo interface.
Wed, Dec 13, 12:42 PM
sunfish added a comment to D40759: [WebAssembly] Implement @llvm.global_ctors and @llvm.global_dtors.

Omitting the start section part for now is ok with me. It should be sufficient to just remove the Priority == UINT16_MAX special cases.

Wed, Dec 13, 12:41 PM
brzycki retitled D40146: [JumpThreading] Preservation of DT and LVI across the pass from [JumpThreading] Deferred preservation of DT and LVI across the pass. to [JumpThreading] Preservation of DT and LVI across the pass.
Wed, Dec 13, 12:41 PM
tejohnson added a comment to D41113: [LLVMgold] Don't set resolutions for undefined symbols to 'Prevailing'.
In D41113#954157, @pcc wrote:

This seems fine to me with the assert.

There's even a probable current issue - if the prevailing def in the asm is marked dead, anything reachable from it might be marked dead as well (i.e. if the only use is in that def).

Module asm definitions do not refer to anything at the summary level, so it does not matter if we mark them as dead. Any symbols that are referenced from module inline asm are GC roots because of D32544.

Wed, Dec 13, 12:41 PM
sbc committed rLLD320610: Add missing reference to lldCommon in MinGW/CMakeLists.txt.
Add missing reference to lldCommon in MinGW/CMakeLists.txt
Wed, Dec 13, 12:39 PM
sbc committed rL320610: Add missing reference to lldCommon in MinGW/CMakeLists.txt.
Add missing reference to lldCommon in MinGW/CMakeLists.txt
Wed, Dec 13, 12:39 PM
Diffusion closed D41194: Add missing reference to lldCommon in MinGW/CMakeLists.txt.
Wed, Dec 13, 12:39 PM
ruiu accepted D41194: Add missing reference to lldCommon in MinGW/CMakeLists.txt.

LGTM

Wed, Dec 13, 12:38 PM
kpw added a comment to D41153: [XRay][compiler-rt] Coalesce calls to mprotect to reduce patching overhead.

Suggestion for future changes:

Wed, Dec 13, 12:38 PM
sbc100 updated the summary of D41194: Add missing reference to lldCommon in MinGW/CMakeLists.txt.
Wed, Dec 13, 12:38 PM
sbc100 created D41194: Add missing reference to lldCommon in MinGW/CMakeLists.txt.
Wed, Dec 13, 12:37 PM
efriedma added inline comments to D41149: [CodeGen] Specialize mixed-sign mul-with-overflow (fix PR34920).
Wed, Dec 13, 12:34 PM
craig.topper added inline comments to D40478: Added Instrument Control Flow Flag.
Wed, Dec 13, 12:29 PM
hfinkel added inline comments to D40196: Add hooks to MachineLICM to hoist invariant stores..
Wed, Dec 13, 12:27 PM
smeenai added a comment to D41157: [cmake] Explicitly set VS 2017 compatibility.

The default is used only when the MSVC installation cannot be found. Successful compilation with clang-cl depends on finding the MSVC toolset. So either you have a toolset installed that's recent enough to compile or you don't, in which case overriding the compatibility level won't make much difference. Right? Is this because the detection method changed and that the old method won't find VC 2017?

Wed, Dec 13, 12:24 PM
clayborg accepted D41169: ObjectFile: remove ReadSectionData/MemoryMapSectionData mutual recursion.

This was left over from before we mmap'ed the entire object file into memory. Removing it is fine as the backing DataBufferSP for the object file will be mmaped or not depending on where the file was loaded from and if the section isn't compressed, we will just hand out a shared slice of the object file data.

Wed, Dec 13, 12:22 PM
metzman added a reviewer for D41193: [libFuzzer] Add dummy call of LLVMFuzzerTestOneInput to afl_driver.: kcc.
Wed, Dec 13, 12:20 PM
metzman created D41193: [libFuzzer] Add dummy call of LLVMFuzzerTestOneInput to afl_driver..
Wed, Dec 13, 12:19 PM
halyavin added a comment to D40775: [libcxx] Add underscores to win32 locale headers..

@EricWF , could you please look at this change? It doesn't have any functional changes.

Wed, Dec 13, 12:16 PM
benhamilton updated the diff for D41074: [ClangFormat] ObjCSpaceBeforeProtocolList should be true in the google style.

Rebase

Wed, Dec 13, 12:15 PM
vsapsai added a comment to D39133: [Sema] Better fix for tags defined inside an enumeration (PR28903)..

Ping. The change can be still applied on trunk without changes, all tests are passing.

Wed, Dec 13, 12:14 PM
sdardis added inline comments to D39074: [libunwind][MIPS]: Add support for unwinding in N32 processes..
Wed, Dec 13, 12:13 PM
sbc100 accepted D40759: [WebAssembly] Implement @llvm.global_ctors and @llvm.global_dtors.

I'd like to land this soon if possible (sans the start section if possible) so I start working on the lld side.

Wed, Dec 13, 12:13 PM
craig.topper added a comment to D41096: [X86] Initial support for prefer-vector-width function attribute.

The dependency isn't really an llvm dependency. You're absolutely right, that if it was just that it would be fixable by changing our instruction patterns. The problem here is about how the features and dependencies are defined in our instructions manuals. avx512f means "avx512 foundation", but only includes 512 bit instructions. The latter features imply the foundation is present.

A user may expect that if they use "-march=skylake-avx512" that they can use 512-bit intrinsics in x86intrin.h. But at the same time we should generate good performing code for other users who specify -march=skylake-avx512 and didn't write any vector code in their source. Based on the data we've seen so far, the best way to do that is to disable 512-bit instructions.

Understood. However, we can separate this within the implementation. Specifically, Clang does not need to tag all generated functions with the same target attribute. On the LLVM side, we can separate these, and then use that finer-grained control.

So what I'm looking for a solution that optimizes for the likely case that the user code doesn't contain 512-bit vector intrinsics and tells the legalizer to use 256-bit registers only. But if the user code does contains explicit 512-bit intrinsics, we still allow that code to compile and ideally generate what the user expected. I don't want existing 512-bit intrinsic code to suddenly stop compiling with -march=skylake-avx512.

I think that we're on the same page. In addition to intrinsics, I'm also worried about OpenMP SIMD loops/functions (or other explicit vectorization). We may want those to also give 512-bit vectors by default (presumably, such specifically-tagged loops are likely to run long enough to amortize the clock-rate effects).

getHostCPUName/getHostCPUFeatures is called by the clang driver long before any code is parsed. How would it know if the code contained any 512-bit instructions?

I recommend that we do this in Clang during CodeGen. We should add a special callback that will allow TargetInfo to examine the AST and adjust the target (or target features) on a per-function basis. Any function using AXV-512 intrinsics on 512-bit vectors, explicit vector types, or OpenMP SIMD (unless a small simdlen clause is present) will stay as-is, and everything else will be modified to turn off 512-bit vectors.

I definitely don't want to change the target-cpu string added by clang. Maybe a target feature. This problem may continue in follow on CPUs after skx and I don't want to start coding a specific list of CPUs into clang. We already have more information about CPU feature mapping in clang than I think we would really like.

The real question is whether to block inlining on mismatch here. I don't think that we should (and we'll need to enable 512-bit vectors in the caller). The problem is that people write C++ wrappers around vector intrinsics, and we need the compiler to remove the abstraction layer. Generating that code poorly will be a significant problem. This will have the unfortunate "action at a distance" effects we discussed earlier (because having some 512-bit vectors in some function, even after inlining, will suddenly enable it elsewhere in the function), but I don't see any good way to prevent that in undesirable cases without causing significant problems elsewhere.

I agree we don't want to block inlining on a mismatch. Do we have a way to allow targets to control the merging behavior? If we do this as part of the "target-feature" or "target-cpu" attribute we would need custom merging.

We don't currently, we have only areInlineCompatible in TTI. This is called like this:

return TTI.areInlineCompatible(Caller, Callee) &&
       AttributeFuncs::areInlineCompatible(*Caller, *Callee);

we also have a:

AttributeFuncs::mergeAttributesForInlining(*Caller, *Callee);

adding a corresponding TTI function and calling it in the two places where AttributeFuncs::mergeAttributesForInlining is called would be straightforward.

Alternatively, I was thinking about a separate "required-vector-width" attribute. Either clang codegen, or an early IR pass

I prefer that we do this in Clang's CodeGen. We just don't have enough information at the IR level to differentiate between things the user explicitly requested and things that have been added by some earlier stage automatically (plus, the pass method would need to rely on pass injection, or similar, and that won't work with frontends with custom pipelines anyway).

Wed, Dec 13, 12:08 PM
K-ballo added inline comments to D33776: [libcxx] LWG2221: No formatted output operator for nullptr.
Wed, Dec 13, 12:06 PM
sbc100 added a comment to D41073: Wasm: add support in libcxx.

Do you want to land this dan? Or I can if you prefer.

Wed, Dec 13, 12:05 PM
juliehockett added inline comments to D40813: [clang-tidy] Adding Fuchsia checker for virtual inheritance.
Wed, Dec 13, 12:04 PM · Restricted Project
efriedma added a comment to D41184: [BuildLibCalls] Cast length argument to the correct integer type.

Instead of scattering casts all over the place, can we just fix TargetLibraryInfoImpl::isValidProtoForLibFunc to reject libcalls which use integers with the wrong width?

Wed, Dec 13, 12:03 PM
asbirlea updated the diff for D38862: Add must alias info to ModRefInfo..

Remove remaining MRI_ instances.

Wed, Dec 13, 12:03 PM
rafael created D41192: Use protected visibility in llvm.
Wed, Dec 13, 12:03 PM