Page MenuHomePhabricator

weiwang (Wei Wang)
User

Projects

User does not belong to any projects.

User Details

User Since
Jul 20 2020, 2:20 PM (139 w, 1 d)

Recent Activity

Today

weiwang added a comment to D146543: [Coroutines] Look for dbg.declare for temp spills.

In fact, we did similar things in the downstream too. But we didn't contribute it since we feel it might not be so good. Since it is a little bit dirty and it is natural that the debug information is missing. If you still want to do this, I feel CoroCloner::salvageDebugInfo() may be a better place.

Wed, Mar 22, 1:27 PM · Restricted Project, Restricted Project

Yesterday

weiwang retitled D146543: [Coroutines] Look for dbg.declare for temp spills from debug info to [Coroutines] Look for dbg.declare for temp spills.
Tue, Mar 21, 10:09 AM · Restricted Project, Restricted Project
weiwang requested review of D146543: [Coroutines] Look for dbg.declare for temp spills.
Tue, Mar 21, 10:01 AM · Restricted Project, Restricted Project

Thu, Mar 16

weiwang committed rG013f6d23e692: [Coroutines] Add remarks in CoroSplit and CoroElide passes (authored by weiwang).
[Coroutines] Add remarks in CoroSplit and CoroElide passes
Thu, Mar 16, 9:22 AM · Restricted Project, Restricted Project
weiwang closed D146175: [Coroutines] Add remarks in CoroSplit and CoroElide passes.
Thu, Mar 16, 9:22 AM · Restricted Project, Restricted Project

Wed, Mar 15

weiwang updated the diff for D146175: [Coroutines] Add remarks in CoroSplit and CoroElide passes.

update test.

Wed, Mar 15, 4:51 PM · Restricted Project, Restricted Project
weiwang retitled D146175: [Coroutines] Add remarks in CoroSplit and CoroElide passes from coro remarks to [Coroutines] Add remarks in CoroSplit and CoroElide passes.
Wed, Mar 15, 4:10 PM · Restricted Project, Restricted Project
weiwang requested review of D146175: [Coroutines] Add remarks in CoroSplit and CoroElide passes.
Wed, Mar 15, 4:06 PM · Restricted Project, Restricted Project

Tue, Feb 28

weiwang committed rGce7eb2e05544: [Coroutines] Avoid creating conditional cleanup markers in suspend block (authored by weiwang).
[Coroutines] Avoid creating conditional cleanup markers in suspend block
Tue, Feb 28, 3:32 PM · Restricted Project, Restricted Project
weiwang closed D144680: [Coroutines] Avoid creating conditional cleanup markers in suspend block.
Tue, Feb 28, 3:31 PM · Restricted Project, Restricted Project
weiwang updated the diff for D144680: [Coroutines] Avoid creating conditional cleanup markers in suspend block.

add target triple

Tue, Feb 28, 10:14 AM · Restricted Project, Restricted Project

Mon, Feb 27

weiwang updated the diff for D144680: [Coroutines] Avoid creating conditional cleanup markers in suspend block.

add test case and some comments

Mon, Feb 27, 4:16 PM · Restricted Project, Restricted Project
weiwang added a comment to D144680: [Coroutines] Avoid creating conditional cleanup markers in suspend block.

BTW, what is the conclusion for the concern about sanitizers? Would change make sanitizers to perform false positive diagnostics?

Mon, Feb 27, 11:19 AM · Restricted Project, Restricted Project

Thu, Feb 23

weiwang added a reviewer for D144680: [Coroutines] Avoid creating conditional cleanup markers in suspend block: bruno.
Thu, Feb 23, 3:18 PM · Restricted Project, Restricted Project
weiwang updated the summary of D144680: [Coroutines] Avoid creating conditional cleanup markers in suspend block.
Thu, Feb 23, 3:17 PM · Restricted Project, Restricted Project
weiwang added a reviewer for D144680: [Coroutines] Avoid creating conditional cleanup markers in suspend block: ChuanqiXu.
Thu, Feb 23, 3:14 PM · Restricted Project, Restricted Project
weiwang updated the summary of D144680: [Coroutines] Avoid creating conditional cleanup markers in suspend block.
Thu, Feb 23, 3:13 PM · Restricted Project, Restricted Project
weiwang requested review of D144680: [Coroutines] Avoid creating conditional cleanup markers in suspend block.
Thu, Feb 23, 3:00 PM · Restricted Project, Restricted Project

Sep 13 2022

weiwang committed rG5529a7524ad7: [ELF][Distributed ThinLTO] Do not generate empty index when bitcode object is… (authored by weiwang).
[ELF][Distributed ThinLTO] Do not generate empty index when bitcode object is…
Sep 13 2022, 2:36 PM · Restricted Project
weiwang closed D133740: [ELF][Distributed ThinLTO] Do not generate empty index when bitcode object is linked.
Sep 13 2022, 2:36 PM · Restricted Project, Restricted Project
weiwang updated subscribers of D133740: [ELF][Distributed ThinLTO] Do not generate empty index when bitcode object is linked.

For Bazel like build systems, it seems weird to have one object file in two modes, lazy and non-lazy. Do you know why it happens for your build system?
I think the change will improve robustness of lld's distributed ThinLTO usage so we probably need to make the change.

Sep 13 2022, 9:22 AM · Restricted Project, Restricted Project

Sep 12 2022

weiwang updated the diff for D133740: [ELF][Distributed ThinLTO] Do not generate empty index when bitcode object is linked.

address comments

Sep 12 2022, 9:54 PM · Restricted Project, Restricted Project
weiwang updated the summary of D133740: [ELF][Distributed ThinLTO] Do not generate empty index when bitcode object is linked.
Sep 12 2022, 5:42 PM · Restricted Project, Restricted Project
weiwang requested review of D133740: [ELF][Distributed ThinLTO] Do not generate empty index when bitcode object is linked.
Sep 12 2022, 5:36 PM · Restricted Project, Restricted Project

Aug 2 2022

weiwang added a comment to D130975: --thinlto-full-index to index both native and bitcode objects.

Thanks for the quick feedback!

I considered this approach. I went with D130229 as I have thought of some minor issues:

  • The requires clang -nostdlib ...
  • An archive member cannot be serialized as file->getName(). Actually lld doesn't support syntax reading just an archive member instead of the whole archive. As an example, glibc libc.so reads libc_nonshared.a which contains elf-init.oS and the patch will incorrectly print elf-init.oS. This approach assuredly needs an option doing something similar to --remapping-file= in spirit, but more involved.
  • A build system needs to recognize all input files, and remove them for the final native link. This is nearly mostly feasible with Bazel/Buck style build systems, but not with others. Even in Bazel/Buck, I don't think linkopts = ["file_added_as_an_option_instead_of_src_or_deps.o"] can be recognized.

The above is off the top of my head.

Your concerns are valid. I encountered them as well during local testing.

  • Our internal build system (buck) applies -nostdlib throughout linking, so this requirement is satisfied.
  • There are no direct support to extract specific members from archive. We are updating build system to make all regular archives into thin archive (--[start|end]-lib). With that done, archive members can be accessed directly. And you are right about elf-init.oS. During my local testing, I had to extract it manually, so some work needs to be done here. In a more general approach, formatting the index with ("archivename", "object") pair and using a separate flag to parse the index (like "--remapping-file=") is also doable.
  • The index file contains line separated input file list, and can be used directly as response file in the final link (I should've just used @thinlto.index.full in the test case provided).
Aug 2 2022, 2:18 PM · Restricted Project, Restricted Project
weiwang added a reviewer for D130975: --thinlto-full-index to index both native and bitcode objects: tejohnson.
Aug 2 2022, 10:17 AM · Restricted Project, Restricted Project
weiwang updated subscribers of D130975: --thinlto-full-index to index both native and bitcode objects.
Aug 2 2022, 3:05 AM · Restricted Project, Restricted Project
weiwang retitled D130975: --thinlto-full-index to index both native and bitcode objects from thinlto-full-index to --thinlto-full-index to index both native and bitcode objects.
Aug 2 2022, 3:03 AM · Restricted Project, Restricted Project
weiwang added a comment to D130229: [ELF] Add --thinlto-index= and --remapping-file=.

The current implicit ThinLTO takes the simplest approach by inserting lto.tmp at the end, breaking ordering.
I don't think the scheme is set in stone and changing distributed ThinLTO to behave like it is probably not the right direction.
On the other hand, it's non-trivial to fix its ordering. Therefore, I think making distributed ThinLTO and implicit ThinLTO have very similar symbol resolution behaviors is a stretch goal. If --thinlto-index= doesn't behave like implicit ThinLTO, I don't think it is a design flaw.

Aug 2 2022, 2:52 AM · Restricted Project, Restricted Project
weiwang requested review of D130975: --thinlto-full-index to index both native and bitcode objects.
Aug 2 2022, 2:45 AM · Restricted Project, Restricted Project

Jul 22 2022

weiwang added a comment to D130229: [ELF] Add --thinlto-index= and --remapping-file=.

I'm not sure how this solution works because symbol resolution before and after importing aren't guaranteed to produce same result even if input order is identical.

Say we have foo in a.o, which references bar in b.o (lazy object). Assuming a.o imports bar, before importing, b.o gets picked because of that foo->bar reference, but the reference will be gone after importing and we may not pick b.o during final linking. The ripple effect from dropping b.o can disturb other symbol's resolution, making final link and thin link diverge.

Imported functions have available_externally linkage and are not considered definitions in the IR symbol table. There is no symbol resolution difference in your example.

I actually meant importing + inlining. After importing and inlining foo->bar, the reference to bar will be gone in a.o output from thinlto codegen, so the input for final linking won't see the reference for bar, right? Does that not change final linking symbol resolution still?

This is my concern too. See discussion on https://reviews.llvm.org/D22356 (there's more discussion on llvm-commits that didn't make it into phab somehow), and the test case included there. That approach was abandoned in favor of the params file based on the extensive discussion there.

As mentioned I had this concern but I am not convinced that forcing a non-lazy state is a necessity. With more fiddling I figured that the simple approach as implemented in this patch likely suffices.

The D22356 and D22467 examples work with --thinlto-index= without forcing the non-lazy state.

% fld.lld t.o --start-lib t2.o --end-lib --start-lib t3.o --end-lib --start-lib t4.o --end-lib -t -e main -mllvm -import-instr-limit=4 --thinlto-index=a.map
% fld.lld t.o.5.precodegen.bc --start-lib t2.o.5.precodegen.bc --end-lib --start-lib t3.o.5.precodegen.bc --end-lib --start-lib t4.o --end-lib -t --remapping-file=a.map -e main
t.o.5.precodegen.bc
t3.o.5.precodegen.bc
t2.o.5.precodegen.bc

It presumably may not work with gold because GNU ld/gold has a different archive member extraction semantics (see https://lld.llvm.org/ELF/warn_backrefs.html).
lld adopts macOS ld64/link.exe's "first lazy object wins" approach and in practice has more stable behaviors.
I suspect lld just needs to drop --warn-backrefs for the final native link in certain cases, but the sacrifice is suitable to me.
(If we re-generate an order depending on -t output for the final native link, --warn-backrefs will not be useful anyway.)

Jul 22 2022, 3:11 PM · Restricted Project, Restricted Project

Jul 21 2022

weiwang added a comment to D130229: [ELF] Add --thinlto-index= and --remapping-file=.

If there is a case, we probably have to give --remapping-file additional semantics that a specified file ignores the surrounding --start-lib. This makes --remapping-file less generic, but we can probably add the third column to indicate whether --start-lib should be ignored.

Can we just have the file generated by thinlto-index-only= also include native objects?

I agree that making thinlto-index-only= to produce the entire input list, both prebuilt native and bitcode-compiled native, is a less disruptive change to still keep the current cmdline options while making it more general. I think we can probably extend the the index file output to also include --start-lib/--end-lib grouping to that the final link can just use it as is.

Actually we specifically don't want to do that (include the selected objects in --start-lib/--end-lib in the index output), we already know which files the linker chose to include from the archive groupings and we don't want to redo that logic which could theoretically come to a different conclusion.

Jul 21 2022, 1:50 PM · Restricted Project, Restricted Project
weiwang added a comment to D130229: [ELF] Add --thinlto-index= and --remapping-file=.

If there is a case, we probably have to give --remapping-file additional semantics that a specified file ignores the surrounding --start-lib. This makes --remapping-file less generic, but we can probably add the third column to indicate whether --start-lib should be ignored.

Can we just have the file generated by thinlto-index-only= also include native objects?

Jul 21 2022, 12:30 PM · Restricted Project, Restricted Project

Apr 27 2022

weiwang committed rG26a0d53b1544: [CHR] Skip region containing llvm.coro.id (authored by weiwang).
[CHR] Skip region containing llvm.coro.id
Apr 27 2022, 10:28 AM · Restricted Project, Restricted Project
weiwang closed D124418: [CHR] Skip region containing llvm.coro.id.
Apr 27 2022, 10:27 AM · Restricted Project, Restricted Project
weiwang updated the diff for D124418: [CHR] Skip region containing llvm.coro.id.

minior update to test

Apr 27 2022, 10:25 AM · Restricted Project, Restricted Project
weiwang updated the diff for D124418: [CHR] Skip region containing llvm.coro.id.

clang-format

Apr 27 2022, 9:57 AM · Restricted Project, Restricted Project
weiwang updated the diff for D124418: [CHR] Skip region containing llvm.coro.id.

comments and fix typo

Apr 27 2022, 9:55 AM · Restricted Project, Restricted Project

Apr 26 2022

weiwang added a comment to D124418: [CHR] Skip region containing llvm.coro.id.

Hi, could you elaborate more what CHR does and why we need to skip coro.id? I don't know what happened now

Apr 26 2022, 1:53 PM · Restricted Project, Restricted Project

Apr 25 2022

weiwang added a reviewer for D124418: [CHR] Skip region containing llvm.coro.id: ChuanqiXu.
Apr 25 2022, 4:49 PM · Restricted Project, Restricted Project
weiwang updated the summary of D124418: [CHR] Skip region containing llvm.coro.id.
Apr 25 2022, 2:55 PM · Restricted Project, Restricted Project
weiwang requested review of D124418: [CHR] Skip region containing llvm.coro.id.
Apr 25 2022, 2:01 PM · Restricted Project, Restricted Project

Mar 30 2022

weiwang updated the diff for D122759: [time-report] Add timers to codegen actions.

fix typo

Mar 30 2022, 2:07 PM · Restricted Project, Restricted Project
weiwang updated the summary of D122759: [time-report] Add timers to codegen actions.
Mar 30 2022, 2:04 PM · Restricted Project, Restricted Project
weiwang requested review of D122759: [time-report] Add timers to codegen actions.
Mar 30 2022, 1:09 PM · Restricted Project, Restricted Project

Jan 21 2022

weiwang committed rG55d887b83364: [time-trace] Add optimizer and codegen regions to NPM (authored by weiwang).
[time-trace] Add optimizer and codegen regions to NPM
Jan 21 2022, 8:21 PM
weiwang closed D117605: [time-trace] Add optimizer and codegen regions to NPM.
Jan 21 2022, 8:21 PM · Restricted Project
weiwang added a reviewer for D117605: [time-trace] Add optimizer and codegen regions to NPM: bruno.
Jan 21 2022, 11:17 AM · Restricted Project

Jan 18 2022

weiwang requested review of D117605: [time-trace] Add optimizer and codegen regions to NPM.
Jan 18 2022, 2:12 PM · Restricted Project

Nov 19 2021

weiwang committed rGa075d6722283: [Sema] fix nondeterminism in ASTContext::getDeducedTemplateSpecializationType (authored by weiwang).
[Sema] fix nondeterminism in ASTContext::getDeducedTemplateSpecializationType
Nov 19 2021, 1:30 PM
weiwang closed D112481: [Sema] fix nondeterminism in ASTContext::getDeducedTemplateSpecializationType.
Nov 19 2021, 1:30 PM · Restricted Project, Restricted Project

Oct 27 2021

weiwang added a comment to D112481: [Sema] fix nondeterminism in ASTContext::getDeducedTemplateSpecializationType.

This issue has been blocking our internal module re-enablement for some time now, and we really appreciate any feedback. We also wonder if only DeducedTemplateSpecializationType is affected or it could also happen to other types.

Oct 27 2021, 10:09 PM · Restricted Project, Restricted Project

Oct 25 2021

weiwang committed rGb283d55c90dd: [openmp] Emit deferred diag only when device compilation presents (authored by weiwang).
[openmp] Emit deferred diag only when device compilation presents
Oct 25 2021, 11:19 AM
weiwang closed D109175: [openmp] Emit deferred diag only when device compilation presents.
Oct 25 2021, 11:19 AM · Restricted Project

Sep 15 2021

weiwang added a comment to D109175: [openmp] Emit deferred diag only when device compilation presents.

I agree with Johannes and Alexey that deferred diags are only needed when LangOpts.OMPTargetTriples.empty(). However, I am not sure whether it is only needed in device compilation.

For other offloading languages like CUDA/HIP it is needed in both device and host compilation.

Technically, we might even want to delay in host only mode for OpenMP, but that is something we can revisit (e.g., by dynamically setting a flag based on the directives we've seen).
@yaxunl Should we for now check if there is any associated offload job?

Shall we go ahead and get this change in and think about more longer term solution later?

LGTM. This patch should be sufficient to limit deferred diags to OpenMP with offloading. Device compilation is covered by OpenMPIsDevice and host compilation is covered by !LangOpts.OMPTargetTriples.empty(). I will leave the decision to Johannes.

Sep 15 2021, 9:59 AM · Restricted Project

Sep 8 2021

weiwang added a comment to D109175: [openmp] Emit deferred diag only when device compilation presents.

I agree with Johannes and Alexey that deferred diags are only needed when LangOpts.OMPTargetTriples.empty(). However, I am not sure whether it is only needed in device compilation.

For other offloading languages like CUDA/HIP it is needed in both device and host compilation.

Technically, we might even want to delay in host only mode for OpenMP, but that is something we can revisit (e.g., by dynamically setting a flag based on the directives we've seen).
@yaxunl Should we for now check if there is any associated offload job?

Sep 8 2021, 10:06 AM · Restricted Project

Sep 3 2021

weiwang retitled D109175: [openmp] Emit deferred diag only when device compilation presents from [openmp] Add clang cc1 option -fopenmp-skip-deferred-diags to [openmp] Emit deferred diag only when device compilation presents.
Sep 3 2021, 11:18 AM · Restricted Project
weiwang updated the diff for D109175: [openmp] Emit deferred diag only when device compilation presents.

update as discussed.

Sep 3 2021, 11:17 AM · Restricted Project

Sep 2 2021

weiwang added a comment to D109175: [openmp] Emit deferred diag only when device compilation presents.

Why do we need this flag, is the absence of -fopenmp-targets not sufficient?

Just double checked, this is the full omp related options currently in use:

"-fopenmp"
"-fopenmp-version=31"
"-fopenmp-version=31"
"-fopenmp-cuda-parallel-target-regions"

We saw a huge number of DECLS_TO_CHECK_FOR_DEFERRED_DIAGS records. I don't know if this has anything to do with omp version being 31, since prior 5.0, everything is available on host.

I don't think we are selective right now. As I was saying, disable deferred parsing if fopenmp-targets is missing, no need for this option.

Sep 2 2021, 2:00 PM · Restricted Project
weiwang added a comment to D109175: [openmp] Emit deferred diag only when device compilation presents.

Why do we need this flag, is the absence of -fopenmp-targets not sufficient?

Just double checked, this is the full omp related options currently in use:

"-fopenmp"
"-fopenmp-version=31"
"-fopenmp-version=31"
"-fopenmp-cuda-parallel-target-regions"

We saw a huge number of DECLS_TO_CHECK_FOR_DEFERRED_DIAGS records. I don't know if this has anything to do with omp version being 31, since prior 5.0, everything is available on host.

Sep 2 2021, 1:24 PM · Restricted Project
weiwang added a comment to D109175: [openmp] Emit deferred diag only when device compilation presents.

Our internal codebase never uses the target directive. Once the deferred diags is bypassed, we observed 18% e2e build time improvement.

Is that with -fopenmp or without?
That seems, kinda a lot more than i would have expected,
perhaps there are some other ways to reduce the overhead other than this approach?

Sep 2 2021, 11:45 AM · Restricted Project
weiwang added a comment to D109175: [openmp] Emit deferred diag only when device compilation presents.

Our internal codebase never uses the target directive. Once the deferred diags is bypassed, we observed 18% e2e build time improvement.

Sep 2 2021, 10:57 AM · Restricted Project
weiwang updated subscribers of D109175: [openmp] Emit deferred diag only when device compilation presents.
Sep 2 2021, 10:50 AM · Restricted Project
weiwang added a reviewer for D109175: [openmp] Emit deferred diag only when device compilation presents: yaxunl.
Sep 2 2021, 10:49 AM · Restricted Project
weiwang requested review of D109175: [openmp] Emit deferred diag only when device compilation presents.
Sep 2 2021, 10:47 AM · Restricted Project

Aug 26 2021

weiwang added a comment to D70172: [CUDA][HIP][OpenMP] Emit deferred diagnostics by a post-parsing AST travese.

Hi @yaxunl! I'm working on upgrading a large codebase from LLVM-9 to LLVM-12. I noticed on average 10% compilation speed regression that seems to be caused this change. We use Clang modules and historically provide -fopenmp compiler flag by default. The problem seems to be that compiling and importing modules is now slower, with the generated modules size increased by 2X. llvm-bcanalyzer tool shows that it's dominated by DECLS_TO_CHECK_FOR_DEFERRED_DIAGS. If I understand it right, your change is only relevant when target offloading is used. I inspected all of #pragma omp directives and can confirm that we don't use it.

I see that most of this code is gated by OpenMP flag. I wonder if there is a finer grain way to enable openmp parallel code generation without target offloading? Would it make sense to extend this code to check if -fopenom-targets is set before recording DECLS_TO_CHECK_FOR_DEFERRED_DIAGS?

Note, this was measured with @weiwang's https://reviews.llvm.org/D101793.

Aug 26 2021, 10:01 AM · Restricted Project

May 21 2021

weiwang added a comment to D101793: [clang][AST] Improve AST Reader/Writer memory footprint.

Thanks for the approval!

Just want to understand the list of "decls to check for deferred diagnostics" better, where are these decls coming from? And why do they need to be checked for warnings? I see decls from libc are in the list, but I have no idea why are they selected.

For offloading languages e.g. OpenMP/CUDA/HIP, there are apparent errors in functions shared between host and device. However, unless these functions are sure to be emitted on device or host, these errors should not be emitted. These errors are so called deferred error messages. The function decls which need to be checked are recorded. After AST is finalized, the AST of these functions are iterated. If a function is found sure to be emitted, the deferred error message in it are emitted.

May 21 2021, 10:05 AM · Restricted Project

May 20 2021

weiwang added a comment to D101793: [clang][AST] Improve AST Reader/Writer memory footprint.

Thanks for the approval!

May 20 2021, 4:14 PM · Restricted Project
weiwang committed rGe6b8320c0a63: [clang][AST] Improve AST Reader/Writer memory footprint (authored by weiwang).
[clang][AST] Improve AST Reader/Writer memory footprint
May 20 2021, 3:35 PM
weiwang closed D101793: [clang][AST] Improve AST Reader/Writer memory footprint.
May 20 2021, 3:35 PM · Restricted Project
weiwang updated the diff for D101793: [clang][AST] Improve AST Reader/Writer memory footprint.

make both ASTReader::DeclsToCheckForDeferredDiags and Sema::DeclsToCheckForDeferredDiags SmallSetVector

May 20 2021, 3:00 PM · Restricted Project
weiwang added a comment to D101793: [clang][AST] Improve AST Reader/Writer memory footprint.

Tried to make Sema::DeclsToCheckForDeferredDiags llvm::SmallSetVector. The heap RSS did drop significantly (from peak 100GB to 59GB) , but not as good as the current fix (peak 26GB), which makes ASTReader::DeclsToCheckForDeferredDiags llvm::SmallSetVector.

I think the reason is that the duplicated decls are read from multiple module file sources (ASTReader::ReadAST() -> ASTReader::ReadASTBlock()), then stored into ASTReader::DeclsToCheckForDeferredDiags, then goes into Sema::DeclsToCheckForDeferredDiags in ASTReader::ReadDeclsToCheckForDeferredDiags(). Doing dedup at the early stage when the decls were just read in ASTReader is more effective at reducing RSS.

What if you use SmallSetVector for both Sema::DeclsToCheckForDeferredDiags and ASTReader::DeclsToCheckForDeferredDiags? Does it cause extra memory usage compared to using it only for ASTReader::DeclsToCheckForDeferredDiags? Thanks.

May 20 2021, 1:25 PM · Restricted Project

May 19 2021

weiwang added a comment to D101793: [clang][AST] Improve AST Reader/Writer memory footprint.

Tried to make Sema::DeclsToCheckForDeferredDiags llvm::SmallSetVector. The heap RSS did drop significantly (from peak 100GB to 59GB) , but not as good as the current fix (peak 26GB), which makes ASTReader::DeclsToCheckForDeferredDiags llvm::SmallSetVector.

May 19 2021, 11:53 AM · Restricted Project

May 17 2021

weiwang added a comment to D101793: [clang][AST] Improve AST Reader/Writer memory footprint.

I think the root cause might be duplicated decls are added to Sema::DeclsToCheckForDeferredDiags defined in

https://github.com/llvm/llvm-project/blob/main/clang/include/clang/Sema/Sema.h#L1789

When compiling source codes, a decl is added only once. However if modules are imported, duplicate decls may be added.

We need to avoid adding duplicate decls to Sema::DeclsToCheckForDeferredDiags. However we cannot simply change it to a set since the order is important, otherwise the error message for later code may show up earlier, causing confusion for users. I would suggest to change its type to SetVector, which keeps the order and also avoids duplicates.

May 17 2021, 3:51 PM · Restricted Project

May 13 2021

weiwang added a comment to D101793: [clang][AST] Improve AST Reader/Writer memory footprint.

Finally dealt with the other issues I need to take care. Let's resume the discussion.

May 13 2021, 10:59 AM · Restricted Project

May 11 2021

weiwang added a comment to D102229: [libomptarget][nfc] Add hook to easily disable building amdgcn bclib.

LGTM.

May 11 2021, 9:09 AM · Restricted Project
weiwang added a comment to D101509: An attempt to abandon omptarget out-of-tree builds..

Thanks for helping on this issue!

May 11 2021, 8:49 AM · Restricted Project

May 10 2021

weiwang added a comment to D101509: An attempt to abandon omptarget out-of-tree builds..

Yes, you can disable the device RTL build using this control.

It looks like your scripts define one of these PATH variables. And it has something to do with -ffreestanding. I was able to reproduce it with the following example:

#include <stdint.h>

uint64_t foo() {
  return UINT64_C(0x1);
}

$ CPLUS_INCLUDE_PATH=<path>/x86_64-linux-gnu/9.2.0/include clang test.cpp -c -ffreestanding

test.cpp:4:10: error: use of undeclared identifier '__UINT64_C'
  return UINT64_C(0x1);
         ^
<path>/x86_64-linux-gnu/9.2.0/include/stdint-gcc.h:254:21: note: expanded from macro 'UINT64_C'
#define UINT64_C(c) __UINT64_C(c)
                    ^
1 error generated.

I am not sure whether clang is supposed to work in this conditions or not, but the error is not specific to amdgcn device RTL build. Basically, any target that uses the in-tree clang may fail the same way (e.g. LIT tests).

May 10 2021, 9:28 PM · Restricted Project
weiwang added a comment to D101509: An attempt to abandon omptarget out-of-tree builds..

Looks like -DLIBOMPTARGET_AMDGCN_GFXLIST="" would disable Device RTL build, but still build rest of the libomptarget.

May 10 2021, 8:48 PM · Restricted Project
weiwang added a comment to D101509: An attempt to abandon omptarget out-of-tree builds..

I have a guess that clang-resource-headers are not built at the point, where we invoke clang. Can you please check this workaround in openmp/libomptarget/deviceRTLs/amdgcn/CMakeLists.txt?

add_custom_command(
  OUTPUT ${bc1_filename}
  COMMAND ${cu_cmd} ${file} -o ${bc1_filename}
  DEPENDS ${file} ${h_files} clang-resource-headers)

I added clang-resource-headers dependency after ${h_files}.

May 10 2021, 8:31 PM · Restricted Project
weiwang added a comment to D101509: An attempt to abandon omptarget out-of-tree builds..

@weiwang, I hope you do not mind if I ask you to run some experiments on your side? Otherwise, I am not sure how to proceed :)

Can you please run the command that fails, pass -E to it and check where the header files are coming from? I.e. run this:

cd /data/users/wangwei/tp2/llvm-build/platform009/build_nopic/projects/openmp/libomptarget/deviceRTLs/amdgcn && /data/users/wangwei/tp2/llvm-build/platform009/build_nopic/bin/clang-13 -xc++ -c -std=c++14 -ffreestanding -target amdgcn-amd-amdhsa -emit-llvm -Xclang -aux-triple -Xclang x86_64-unknown-linux-gnu -fopenmp -fopenmp-cuda-mode -Xclang -fopenmp-is-device -D__AMDGCN__ -Xclang -target-cpu -Xclang gfx700 -fvisibility=default -Wno-unused-value -nogpulib -O2 -I/home/wangwei/local/llvm-project/openmp/libomptarget/deviceRTLs/amdgcn/src -I/home/wangwei/local/llvm-project/openmp/libomptarget/deviceRTLs/common/include -I/home/wangwei/local/llvm-project/openmp/libomptarget/deviceRTLs /home/wangwei/local/llvm-project/openmp/libomptarget/deviceRTLs/amdgcn/src/target_impl.hip -E

Regarding your question "how this change would change anything", can you please check for "Not building AMDGCN device RTL: AOMP not found" message in your "old" builds? I suppose my change for find_package invocation might have caused different behavior in your setup. Before my change we were looking for LLVM in the following paths:

$ENV{AOMP}
$ENV{HOME}/rocm/aomp
/opt/rocm/aomp
/usr/lib/rocm/aomp
${LIBOMPTARGET_NVPTX_CUDA_COMPILER_DIR}
${LIBOMPTARGET_NVPTX_CUDA_LINKER_DIR}
${CMAKE_CXX_COMPILER_DIR}

Not we look for LLVM in all paths that cmake examines by default: https://cmake.org/cmake/help/latest/command/find_package.html#search-procedure

May 10 2021, 7:06 PM · Restricted Project
weiwang added a comment to D101509: An attempt to abandon omptarget out-of-tree builds..

We are getting build errors internally with this change. They are all related to libomptarget. Our internal build script uses gcc to build llvm.

One example:

[108/122] Generating target_impl.gfx700.bc
FAILED: projects/openmp/libomptarget/deviceRTLs/amdgcn/target_impl.gfx700.bc
cd /data/users/wangwei/tp2/llvm-build/platform009/build_nopic/projects/openmp/libomptarget/deviceRTLs/amdgcn && /data/users/wangwei/tp2/llvm-build/platform009/build_nopic/bin/clang-13 -xc++ -c -std=c++14 -ffreestanding -target amdgcn-amd-amdhsa -emit-llvm -Xclang -aux-triple -Xclang x86_64-unknown-linux-gnu -fopenmp -fopenmp-cuda-mode -Xclang -fopenmp-is-device -D__AMDGCN__ -Xclang -target-cpu -Xclang gfx700 -fvisibility=default -Wno-unused-value -nogpulib -O2 -I/home/wangwei/local/llvm-project/openmp/libomptarget/deviceRTLs/amdgcn/src -I/home/wangwei/local/llvm-project/openmp/libomptarget/deviceRTLs/common/include -I/home/wangwei/local/llvm-project/openmp/libomptarget/deviceRTLs /home/wangwei/local/llvm-project/openmp/libomptarget/deviceRTLs/amdgcn/src/target_impl.hip -o target_impl.gfx700.bc
/home/wangwei/local/llvm-project/openmp/libomptarget/deviceRTLs/amdgcn/src/target_impl.hip:185:25: error: use of undeclared identifier '__UINT64_C'
  lo = (uint32_t)(val & UINT64_C(0x00000000FFFFFFFF));
                        ^
/home/wangwei/fbsource/fbcode/third-party2/gcc/9.x/centos7-native/3bed279/lib/gcc/x86_64-redhat-linux-gnu/9.x/include/stdint-gcc.h:254:21: note: expanded from macro 'UINT64_C'
#define UINT64_C(c) __UINT64_C(c)
                    ^
/home/wangwei/local/llvm-project/openmp/libomptarget/deviceRTLs/amdgcn/src/target_impl.hip:186:26: error: use of undeclared identifier '__UINT64_C'
  hi = (uint32_t)((val & UINT64_C(0xFFFFFFFF00000000)) >> 32);
                         ^
/home/wangwei/fbsource/fbcode/third-party2/gcc/9.x/centos7-native/3bed279/lib/gcc/x86_64-redhat-linux-gnu/9.x/include/stdint-gcc.h:254:21: note: expanded from macro 'UINT64_C'
#define UINT64_C(c) __UINT64_C(c)
                    ^
2 errors generated.

Hello @weiwang, thank you for reporting this. Can you please provide some details how to reproduce it?

One strange thing is that the pre-built /data/users/wangwei/tp2/llvm-build/platform009/build_nopic/bin/clang-13 includes stdint-gcc.h, where it takes the definition for UINT64_C. In my builds clang takes the definition from its own lib/clang/13.0.0/include/stdint.h.

Thanks for the quick response. It may not be easily reproducible since the build script that triggers this sets up its own environment. This is part of the company's internal build system. During my local try, clang built clang always works, but the build script uses gcc to build clang. Maybe gcc would insert its own library headers into search path, and this could cause some confusing about the order of include paths? But again, we have always used gcc to build clang, and it never had issue until now. I am not sure how this change would change anything.

If you do not need libomptarget for your package, you may pass -DOPENMP_ENABLE_LIBOMPTARGET=OFF to cmake.

With -DOPENMP_ENABLE_LIBOMPTARGET=OFF, the error is gone. I'll check internally to see if libomptarget can be disabled. Meanwhile, it would still be great to know what went wrong.

May 10 2021, 6:34 PM · Restricted Project
weiwang added a comment to D101509: An attempt to abandon omptarget out-of-tree builds..

We are getting build errors internally with this change. They are all related to libomptarget. Our internal build script uses gcc to build llvm.

May 10 2021, 5:19 PM · Restricted Project

May 6 2021

weiwang added reviewers for D101793: [clang][AST] Improve AST Reader/Writer memory footprint: rsmith, riccibruno.
May 6 2021, 10:12 AM · Restricted Project
weiwang added a comment to D101793: [clang][AST] Improve AST Reader/Writer memory footprint.

Decls in Sema::DeclsToCheckForDeferredDiags is supposed to be unique. Therefore the fact that '1,734,387,685 out of 1,734,404,000 elements are the same' is surprising. Did this happen when you compile the source code and write AST? What language was the source code? C++, OpenMP, or CUDA? What was the decl that got duplicated? Thanks.

May 6 2021, 10:11 AM · Restricted Project

May 3 2021

weiwang added a comment to D101793: [clang][AST] Improve AST Reader/Writer memory footprint.

We've seen a huge memory footprint from AST Reader/Writer in a single CU with module enabled from internal workloads. Upon further analysis, the content of vector DeclsToCheckForDeferredDiags seems mostly redundant. In one case, 1,734,387,685 out of 1,734,404,000 elements are the same. While this may indicate something wrong with the source itself, it also suggests that compiler would be better to perform deduplication on this type of Decl ID.

May 3 2021, 4:11 PM · Restricted Project
weiwang added a reviewer for D101793: [clang][AST] Improve AST Reader/Writer memory footprint: yaxunl.
May 3 2021, 4:07 PM · Restricted Project
weiwang requested review of D101793: [clang][AST] Improve AST Reader/Writer memory footprint.
May 3 2021, 4:03 PM · Restricted Project

Apr 28 2021

weiwang added inline comments to D101374: [LV] Consider Loop Unroll Hints When Making Interleave Decisions.
Apr 28 2021, 11:02 PM · Restricted Project, Restricted Project

Feb 12 2021

weiwang committed rG80dc0661bd8b: [LTO] Perform DSOLocal propagation in combined index (authored by weiwang).
[LTO] Perform DSOLocal propagation in combined index
Feb 12 2021, 11:11 PM
weiwang closed D96398: [LTO] Perform DSOLocal propagation in combined index.
Feb 12 2021, 11:10 PM · Restricted Project
weiwang added inline comments to D96398: [LTO] Perform DSOLocal propagation in combined index.
Feb 12 2021, 10:56 PM · Restricted Project
weiwang updated the diff for D96398: [LTO] Perform DSOLocal propagation in combined index.

update comment and remove one unintended change.

Feb 12 2021, 10:52 PM · Restricted Project
weiwang updated the diff for D96398: [LTO] Perform DSOLocal propagation in combined index.

fixed test failures due to bit check

Feb 12 2021, 1:56 PM · Restricted Project
weiwang added inline comments to D96398: [LTO] Perform DSOLocal propagation in combined index.
Feb 12 2021, 10:46 AM · Restricted Project
weiwang updated the diff for D96398: [LTO] Perform DSOLocal propagation in combined index.
  1. serialize and de-serialize the flag.
  2. update related test cases.
Feb 12 2021, 10:44 AM · Restricted Project

Feb 11 2021

weiwang added inline comments to D96398: [LTO] Perform DSOLocal propagation in combined index.
Feb 11 2021, 5:29 PM · Restricted Project
weiwang updated the diff for D96398: [LTO] Perform DSOLocal propagation in combined index.

update according to comment.

Feb 11 2021, 5:22 PM · Restricted Project
weiwang retitled D96398: [LTO] Perform DSOLocal propagation in combined index from [LTO] Pre-populate IsDSOLocal result in combined index to [LTO] Perform DSOLocal propagation in combined index.
Feb 11 2021, 3:44 PM · Restricted Project
weiwang updated the diff for D96398: [LTO] Perform DSOLocal propagation in combined index.

Use the propagation approach.

Feb 11 2021, 3:41 PM · Restricted Project