Page MenuHomePhabricator

hfinkel (Hal Finkel)
User

Projects

User does not belong to any projects.

User Details

User Since
Nov 16 2012, 6:02 PM (374 w, 9 h)

Recent Activity

Mon, Jan 13

hfinkel accepted D71474: [TableGen] Introduce an if/then/else statement..

@nhaehnle, @hfinkel: anything left I need to do to this?

Mon, Jan 13, 8:04 PM · Restricted Project
hfinkel added inline comments to D72675: [Clang][Driver] Fix -ffast-math/-ffp-contract interaction.
Mon, Jan 13, 8:04 PM · Restricted Project
hfinkel accepted D71407: [TableGen] Introduce a `defvar` statement..

@nhaehnle, @hfinkel: anything left I need to do to this?

Mon, Jan 13, 8:04 PM · Restricted Project

Wed, Jan 8

hfinkel added a comment to D72425: [OptRemark] RFC: Introduce a message table for OptRemarks.

I like this general idea; couple of thoughts...

  1. Clang has a similar system, and one disadvantage is that any time you change/add any message, it seems to trigger a large rebuild. Could we have this TG system generate separate .inc files for each category of things, so we don't have the same kind of rebuild problem.

Probably. One thing I was imagining coming out of the .td file(s) is an enum somewhere that assigns values to all of the messages. I suppose we could make that multiple enums, each with a fixed starting value to avoid triggering rebuilds. In general, these should lend themselves to logical groupings. We'd probably also want a way for downstream LLVM-based products to painlessly add their own remarks and groups should help with that.

  1. Given that are arguments are named, can we use something like "Format string with optional specifier like %{NV}" instead of "Format string with optional specifier like %0"? I think that the former would be better.

Yes, that's a great idea. So, if I understand what you're suggesting, the entry for the remark I used as an example would look like this:

def remark_gvn_load_elim: OptRemark<"LoadElim", "load of type %Type eliminated", " in favor of %InfavorOfValue">;

Is that the idea?

Wed, Jan 8, 5:10 PM · Restricted Project
hfinkel added a comment to D72425: [OptRemark] RFC: Introduce a message table for OptRemarks.

I like this general idea; couple of thoughts...

Wed, Jan 8, 4:24 PM · Restricted Project

Sun, Jan 5

hfinkel accepted D70924: [Metadata] Add TBAA struct metadata to `AAMDNode`.

@hfinkel Any more comments?

Sun, Jan 5, 3:19 PM · Restricted Project

Tue, Dec 31

hfinkel accepted D72038: [PowerPC] Handle constant zero bits in BitPermutationSelector.

LGTM

Tue, Dec 31, 6:20 PM · Restricted Project

Mon, Dec 30

hfinkel added a comment to D71983: [PowerPC] Set the SideEffects of branch & call instructions from 1 to 0.

Do you expect any effect at all from doing this? These are all scheduling barriers?

Mon, Dec 30, 3:53 PM · Restricted Project
hfinkel accepted D70758: [PowerPC][NFC] Rename record instructions to use _rec suffix instead of o.

Ping ...

Mon, Dec 30, 3:53 PM · Restricted Project

Sat, Dec 28

hfinkel accepted D71954: [PowerPC] Change default for unaligned FP access for older subtargets.

Add the feature to the BG cores as well.

Sat, Dec 28, 8:53 AM · Restricted Project

Fri, Dec 27

hfinkel added inline comments to D70927: Introduce a CallGraph updater helper class.
Fri, Dec 27, 12:24 AM · Restricted Project
hfinkel added inline comments to D71916: High-Level Code-Review Documentation Update.
Fri, Dec 27, 12:15 AM · Restricted Project

Thu, Dec 26

hfinkel created D71916: High-Level Code-Review Documentation Update.
Thu, Dec 26, 2:43 PM · Restricted Project

Dec 18 2019

hfinkel accepted D70767: [Attributor] Add an Attributor CG-SCC pass.

LGTM

Dec 18 2019, 5:18 AM · Restricted Project

Dec 17 2019

hfinkel added a comment to D71241: [OpenMP][WIP] Use overload centric declare variants.

Most probably, we can use this solution without adding a new expression. DeclRefExpr class can contain 2 decls: FoundDecl and the Decl being used. You can use FoundDecl to point to the original function and used decl to point to the function being called in this context. But at first, we need to be sure that we can handle all corner cases correctly.

What new expression are you talking about?

To be clear, I believe he's talking about the new expression that I proposed we add in order to represent this kind of call. If that's not needed, and we can use the FoundDecl/Decl pair for that purpose, that should also be considered.

This solution already does point to both declarations, as shown here: https://reviews.llvm.org/D71241#1782504

Exactly. We need to check if the MemberRefExpr can do this too to handle member functions correctly.
And need to wait for opinion from Richard Smith about the design. We need to be sure that it won't break compatibility with C/C++ in some corner cases. Design bugs are very hard to solve and I want to be sure that we don't miss anything. And we provide full compatibility with both C and C++.

Dec 17 2019, 4:16 AM · Restricted Project

Dec 16 2019

hfinkel added a comment to D71241: [OpenMP][WIP] Use overload centric declare variants.

Most probably, we can use this solution without adding a new expression. DeclRefExpr class can contain 2 decls: FoundDecl and the Decl being used. You can use FoundDecl to point to the original function and used decl to point to the function being called in this context. But at first, we need to be sure that we can handle all corner cases correctly.

What new expression are you talking about?

Dec 16 2019, 10:08 PM · Restricted Project

Dec 13 2019

hfinkel accepted D67923: [TLI] Support for per-Function TLI that overrides available libfuncs.

Please take a look. This is now updated to reflect the commit of D71193, which translated the options to the new attributes. I also removed some comments that I realized didn't make sense, as we need to keep a baseline availability array on the base TLII since that is set based on the architecture. We simply override this for no-builtin* attributes on the function. I removed the code from clang that was setting up the availability array based on the options, and all tests pass.

Dec 13 2019, 6:01 PM · Restricted Project, Restricted Project
hfinkel accepted D69777: [ConstantFolding] Fold calls to FP remainder function.
Dec 13 2019, 5:43 PM · Restricted Project
hfinkel added inline comments to D69773: [APFloat] Extend converting from special strings.
Dec 13 2019, 5:34 PM · Restricted Project
hfinkel accepted D69770: [APFloat] Add recoverable string parsing errors to APFloat.

If we really want to be confident that this is robust, we should probably fuzz-test it. Regardless, this seems like a definite improvement. LGTM.

Dec 13 2019, 5:16 PM · Restricted Project, Restricted Project
hfinkel added inline comments to D71407: [TableGen] Introduce a `defvar` statement..
Dec 13 2019, 5:07 PM · Restricted Project
hfinkel added a comment to D71474: [TableGen] Introduce an if/then/else statement..

This is a hasty first draft because I'm about to go away for the holidays, and won't have access to this account until the New Year.

Dec 13 2019, 4:58 PM · Restricted Project
hfinkel added a comment to D71241: [OpenMP][WIP] Use overload centric declare variants.

https://clang.llvm.org/docs/InternalsManual.html#the-ast-library

Faithfulness¶
The AST intends to provide a representation of the program that is faithful to the original source.

That's pretty convincing.

No, you're misinterpreting the intent of the statement. Here's the entire section...

Faithfulness
The AST intends to provide a representation of the program that is faithful to the original source. We intend for it to be possible to write refactoring tools using only information stored in, or easily reconstructible from, the Clang AST. This means that the AST representation should either not desugar source-level constructs to simpler forms, or – where made necessary by language semantics or a clear engineering tradeoff – should desugar minimally and wrap the result in a construct representing the original source form.

For example, CXXForRangeStmt directly represents the syntactic form of a range-based for statement, but also holds a semantic representation of the range declaration and iterator declarations. It does not contain a fully-desugared ForStmt, however.

Some AST nodes (for example, ParenExpr) represent only syntax, and others (for example, ImplicitCastExpr) represent only semantics, but most nodes will represent a combination of syntax and associated semantics. Inheritance is typically used when representing different (but related) syntaxes for nodes with the same or similar semantics.

First, being "faithful" to the original source means both syntax and semantics. I realize that AST is a somewhat-ambiguous term - we have semantic elements in our AST - but Clang's AST is not just some kind of minimized parse tree. The AST even has semantics-only nodes (e.g., for implicit casts). As you can see, the design considerations here are not just "record the local syntactic elements", but require semantic interpretation of these elements.

Again, Clang's AST is used for various kinds of static analysis tools, and these depend on having overload resolution correctly performed prior to that analysis. This includes overload resolution that depends on context (whether that's qualifications on this or host/device in CUDA or anything else).

None of this is to say that we should not record the original spelling of the function call, we should do that *also*, and that should be done when constructing the AST in Sema in addition to performing the variant selection.

You are not corret. Check the implementation of decltype, for example https://godbolt.org/z/R76Nn. We keep the original decltype in AST, though we could easily lower it to the real type. This is the design of AST - keep it as much as possible close to the original code and modify it only if it is absolutely impossible (again, check https://clang.llvm.org/docs/InternalsManual.html#the-ast-library).

Dec 13 2019, 6:34 AM · Restricted Project
hfinkel accepted D71348: Add ExternalAAWrapperPass to createLegacyPMAAResults..

Thanks for the heads-up. I think this is perfectly fine from an AMDGPU perspective. It also seems like a simple oversight that the ExternalAAWrapperPass is not in that list, so this LGTM in general, but I'd wait a bit for somebody who is more familiar with alias analysis to weigh in.

Dec 13 2019, 5:49 AM · Restricted Project
hfinkel added a comment to D71407: [TableGen] Introduce a `defvar` statement..

I've been working around the lack of this for a while, so I thought it was worth at least trying to get it accepted.

Dec 13 2019, 5:49 AM · Restricted Project

Dec 12 2019

hfinkel added a comment to D71241: [OpenMP][WIP] Use overload centric declare variants.

...

Given we have two implementations, each at different points in the pipeline, it might be constructive to each write down why you each choose said point. I suspect the rationale is hidden by the implementation details.

Actually, early resolution will break tbe tools, not help them. It will definitely break clangd, for example. The user will try to navigate to originally called function base and instead he will be redirected to the function hst.

Can you please be more specific? Please explain why the user would consider this incorrect behavior. If the point of the tool is to allow the user to navigate to the function actually being called, then navigating to base seems incorrect if that's not the function being called. This is just like any other kind of overload resolution - the user likely wants to navigate to the function being called.

Now the user might want an OpenMP-aware tool that understands differences between host and accelerator behavior, how that affects which functions are called, etc. The user might want this for host/device overloads in CUDA too, but this is really an orthogonal concern.

You wrote the code. You called a function in the expression. Now you want to navivate to this function. Clicked on it and instead of the called base you are redirected to hst because AST has the link to hst functiin inthe expression instead of the base.

Sure, but it has that link because that hst function is actually the function being called (assuming that the clangd-using-tool is configured to interpret the code as if compiling for the host). When I click on a function call in a source file, I want to navigate to the function actually being called. I certainly understand that the function being called now depends on compilation context, and the tool in our example is only providing the resolution in one context, but at least it provides one valid answer. An OpenMP-aware tool could navigate to the base function (we do need to preserve information to make this possible). This is just like dealing with some host/device functions in CUDA (where there are separate overloads) - if you click on the function in such a tool you'll probably navigate to the host variant of the function (even if, in some other context, the device overload might be called).

Again, I see this as exactly analogous to overload resolution, or as another example, when calling a function template with specializations. When using such a tool, my experience is that users want to click on the function and navigate to the function actually being called. If, for example, I have a function template with specializations, if the specialized one is being called, I should navigate to the specialization being called (not the base function template).

You wrote wrong context matcher. You has a bug in the base function, which should be called by default everu sw here but the host, and want to fix it. Etc.

I understand, but this is a generic problem. Same with host/device overloads in CUDA. Your tool only gets one compilation context, and thus only one callee. In addition, FWIW, having a base version called everywhere except on the host seems like an uncommon use case. Normally, the base version *is* the version called on the host. The named variants are likely the ones specialized for various accelerators.

Regardless, this is exactly why we should do this in Sema. We can represent links to both Decls in the AST (as I indicated in an earlier comment), and then it will be *possible* for an OpenMP-aware tool to decide on which it wants. Right now, it's not easily possible to write a tool that can use an appropriate set of contexts to examine the AST where the actual callees are represented.

No need to worry for the right decl. See D7097. If you see a refernce for function, before doing something with it, just call member function getOpenMPDeclareVariantFunction() and you get the function that must be actually called here. The tool must do the same. That's hiw the tools work. They must be aware of special costructs/attributes.

Dec 12 2019, 4:51 PM · Restricted Project
hfinkel added a comment to D71241: [OpenMP][WIP] Use overload centric declare variants.

I don't insist on function redefinition solution. You want to replace functions - fine, but do this at the codegen, not in AST.

Dec 12 2019, 4:50 PM · Restricted Project
hfinkel added a comment to D71241: [OpenMP][WIP] Use overload centric declare variants.

...

Given we have two implementations, each at different points in the pipeline, it might be constructive to each write down why you each choose said point. I suspect the rationale is hidden by the implementation details.

Actually, early resolution will break tbe tools, not help them. It will definitely break clangd, for example. The user will try to navigate to originally called function base and instead he will be redirected to the function hst.

Can you please be more specific? Please explain why the user would consider this incorrect behavior. If the point of the tool is to allow the user to navigate to the function actually being called, then navigating to base seems incorrect if that's not the function being called. This is just like any other kind of overload resolution - the user likely wants to navigate to the function being called.

Now the user might want an OpenMP-aware tool that understands differences between host and accelerator behavior, how that affects which functions are called, etc. The user might want this for host/device overloads in CUDA too, but this is really an orthogonal concern.

You wrote the code. You called a function in the expression. Now you want to navivate to this function. Clicked on it and instead of the called base you are redirected to hst because AST has the link to hst functiin inthe expression instead of the base.

Sure, but it has that link because that hst function is actually the function being called (assuming that the clangd-using-tool is configured to interpret the code as if compiling for the host). When I click on a function call in a source file, I want to navigate to the function actually being called. I certainly understand that the function being called now depends on compilation context, and the tool in our example is only providing the resolution in one context, but at least it provides one valid answer. An OpenMP-aware tool could navigate to the base function (we do need to preserve information to make this possible). This is just like dealing with some host/device functions in CUDA (where there are separate overloads) - if you click on the function in such a tool you'll probably navigate to the host variant of the function (even if, in some other context, the device overload might be called).

Again, I see this as exactly analogous to overload resolution, or as another example, when calling a function template with specializations. When using such a tool, my experience is that users want to click on the function and navigate to the function actually being called. If, for example, I have a function template with specializations, if the specialized one is being called, I should navigate to the specialization being called (not the base function template).

You wrote wrong context matcher. You has a bug in the base function, which should be called by default everu sw here but the host, and want to fix it. Etc.

Dec 12 2019, 4:25 PM · Restricted Project
hfinkel added a comment to D71241: [OpenMP][WIP] Use overload centric declare variants.

...

While we talk a lot about what you think is bad about this solution it seems we ignore the problems in the current one. Let me summarize a few:

  • Take https://godbolt.org/z/XCjQUA where the wrong function is called in the target region (because the "hack" to inject code in the wrong definition is not applicable).

No time for it, just short answers. No definition for the variant - no definition for the base.

Dec 12 2019, 4:02 PM · Restricted Project
hfinkel added a comment to D71241: [OpenMP][WIP] Use overload centric declare variants.

Lowering in sema or in codegen seems a standard phase ordering choice. There will be pros and cons to both.

I think prior art leans towards sema. Variants are loosely equivalent to tag dispatching or constexpr if, both handled before lowering the AST to IR.

This is exactly right. This is just like any other kind of static overload resolution. It should be resolved in Sema and the CallExpr's DeclRefExpr should refer to the correct callee. This will make sure that tools, including static analysis tools, will correctly understand the semantics of the call (e.g., IPA will work correctly). Also, we prefer, in Clang, to generate errors and warning messages in Sema, not in CodeGen, and it is certainly plausible that errors and warnings could be generated during the selector-based resolution process.

That having been said, Alexey is also correct that we retain the unevaluated form of the constexpr expressions, and there is an important analogy here. I believe that one way of restating Alexey's concerns about the AST representation is that, if we resolve the variant selection as we build the AST, and then we print the AST, the printed function would be the name of the selected variant and not the name of the base function. This is certainly a legitimate concern, and there are several places in Clang where we take care to preserve the spelling used for constructs that are otherwise semantically equivalent (e.g., different spellings of keywords). I can certainly imagine a tool wanting to see the base function called, and we'll want that for the AST printer regardless. We might add this information to CallExpr or make a new subclass of CallExpr (e.g., OpenMPVariantCallExpr) that has that information (the latter seems likely better so that we don't increase the size of CallExpr for an uncommon case).

Writing the dispatch lowering on IR should make it easier to call from a Fortran front end. I'm in favour of minimising work done on the clang AST on general principles.

We need to make the best decision for Clang in Clang, regardless of how this might impact a future Fortran implementation. While the OpenMPIRBuilder will be a point of reuse between different OpenMP-enabled frontends, it need not be the only one. Moreover, Fortran will also want to do this resolution earlier for the same reason that it should be done earlier in Clang (and, for Fortran, we'll end up with inlining and other IPA at the FIR level, so it will be required to resolve the variants prior to hitting the OpenMPIRBuilder). Thus, no, doing this in CodeGen is unlikely to work for the Flang implementation.

Also, "minimizing work done in the Clang AST on general principles", seems like an oversimplification of our general Clang design philosophy. Overload resolution in Clang is certainly a significant part of the implementation, but we wouldn't consider doing it in CodeGen. The AST should faithfully represent the semantic elements in the source code. Overload resolution, template instantiation, constexpr evaluation, etc. all are highly non-trivial, and all happen during Sema (even in cases where we might, technically speaking, be able to delay that logic until CodeGen). What we don't do in Sema are lowering tasks (e.g., translating references into pointers or other things related to picking an underlying implementation strategy for particular constructs) and optimizations - where we do them at all - e.g., constant folding, some devirtualization, and so on are done in CodeGen. For the most part, of course, we defer optimizations to LLVM's optimizer.

Given we have two implementations, each at different points in the pipeline, it might be constructive to each write down why you each choose said point. I suspect the rationale is hidden by the implementation details.

Actually, early resolution will break tbe tools, not help them. It will definitely break clangd, for example. The user will try to navigate to originally called function base and instead he will be redirected to the function hst.

Can you please be more specific? Please explain why the user would consider this incorrect behavior. If the point of the tool is to allow the user to navigate to the function actually being called, then navigating to base seems incorrect if that's not the function being called. This is just like any other kind of overload resolution - the user likely wants to navigate to the function being called.

Now the user might want an OpenMP-aware tool that understands differences between host and accelerator behavior, how that affects which functions are called, etc. The user might want this for host/device overloads in CUDA too, but this is really an orthogonal concern.

You wrote the code. You called a function in the expression. Now you want to navivate to this function. Clicked on it and instead of the called base you are redirected to hst because AST has the link to hst functiin inthe expression instead of the base.

Dec 12 2019, 3:52 PM · Restricted Project
hfinkel added a comment to D71241: [OpenMP][WIP] Use overload centric declare variants.

Explain that you're replacing the function written by the user on the fly by another one. If they accept it, go ahead.

That's the observational effect of variants. Replacing is very similar to calling + inlining.

Not in the AST.

I don't see much difference between mutating the AST and mutating the SSA. What're your objections to the former, specifically? It's not a stable interface so tools hanging off it will have a process for updating when it changes.

Dec 12 2019, 3:39 PM · Restricted Project
hfinkel added a comment to D71241: [OpenMP][WIP] Use overload centric declare variants.

There can be another one issue with this solution with inline assembly. I’m not completely sure about it, will try to investigate it tomorrow. I suggest to discuss this solution with Richard Smith (or John McCall). If he/they are ok with this transformation of the AST, we can switch to this scheme.

Dec 12 2019, 3:31 PM · Restricted Project
hfinkel added a comment to D71241: [OpenMP][WIP] Use overload centric declare variants.

Lowering in sema or in codegen seems a standard phase ordering choice. There will be pros and cons to both.

I think prior art leans towards sema. Variants are loosely equivalent to tag dispatching or constexpr if, both handled before lowering the AST to IR.

This is exactly right. This is just like any other kind of static overload resolution. It should be resolved in Sema and the CallExpr's DeclRefExpr should refer to the correct callee. This will make sure that tools, including static analysis tools, will correctly understand the semantics of the call (e.g., IPA will work correctly). Also, we prefer, in Clang, to generate errors and warning messages in Sema, not in CodeGen, and it is certainly plausible that errors and warnings could be generated during the selector-based resolution process.

That having been said, Alexey is also correct that we retain the unevaluated form of the constexpr expressions, and there is an important analogy here. I believe that one way of restating Alexey's concerns about the AST representation is that, if we resolve the variant selection as we build the AST, and then we print the AST, the printed function would be the name of the selected variant and not the name of the base function. This is certainly a legitimate concern, and there are several places in Clang where we take care to preserve the spelling used for constructs that are otherwise semantically equivalent (e.g., different spellings of keywords). I can certainly imagine a tool wanting to see the base function called, and we'll want that for the AST printer regardless. We might add this information to CallExpr or make a new subclass of CallExpr (e.g., OpenMPVariantCallExpr) that has that information (the latter seems likely better so that we don't increase the size of CallExpr for an uncommon case).

Writing the dispatch lowering on IR should make it easier to call from a Fortran front end. I'm in favour of minimising work done on the clang AST on general principles.

We need to make the best decision for Clang in Clang, regardless of how this might impact a future Fortran implementation. While the OpenMPIRBuilder will be a point of reuse between different OpenMP-enabled frontends, it need not be the only one. Moreover, Fortran will also want to do this resolution earlier for the same reason that it should be done earlier in Clang (and, for Fortran, we'll end up with inlining and other IPA at the FIR level, so it will be required to resolve the variants prior to hitting the OpenMPIRBuilder). Thus, no, doing this in CodeGen is unlikely to work for the Flang implementation.

Also, "minimizing work done in the Clang AST on general principles", seems like an oversimplification of our general Clang design philosophy. Overload resolution in Clang is certainly a significant part of the implementation, but we wouldn't consider doing it in CodeGen. The AST should faithfully represent the semantic elements in the source code. Overload resolution, template instantiation, constexpr evaluation, etc. all are highly non-trivial, and all happen during Sema (even in cases where we might, technically speaking, be able to delay that logic until CodeGen). What we don't do in Sema are lowering tasks (e.g., translating references into pointers or other things related to picking an underlying implementation strategy for particular constructs) and optimizations - where we do them at all - e.g., constant folding, some devirtualization, and so on are done in CodeGen. For the most part, of course, we defer optimizations to LLVM's optimizer.

Given we have two implementations, each at different points in the pipeline, it might be constructive to each write down why you each choose said point. I suspect the rationale is hidden by the implementation details.

Actually, early resolution will break tbe tools, not help them. It will definitely break clangd, for example. The user will try to navigate to originally called function base and instead he will be redirected to the function hst.

Dec 12 2019, 3:22 PM · Restricted Project
hfinkel added a comment to D71241: [OpenMP][WIP] Use overload centric declare variants.

Here is the example that does not work with the proposed solution but works with the existing one:

static void cpu() { asm("nop"); }

#pragma omp declare variant(cpu) match(device = {kind(cpu)})
static __attribute__((used)) void wrong_asm() {
  asm ("xxx");
}

The existing solution has some problems with the delayed error messages too, but they are very easy to fix.

Dec 12 2019, 2:39 PM · Restricted Project
hfinkel added a comment to D71241: [OpenMP][WIP] Use overload centric declare variants.

https://clang.llvm.org/docs/InternalsManual.html#the-ast-library

Faithfulness¶
The AST intends to provide a representation of the program that is faithful to the original source.

That's pretty convincing.

Dec 12 2019, 2:29 PM · Restricted Project
hfinkel added a comment to D71241: [OpenMP][WIP] Use overload centric declare variants.

Lowering in sema or in codegen seems a standard phase ordering choice. There will be pros and cons to both.

I think prior art leans towards sema. Variants are loosely equivalent to tag dispatching or constexpr if, both handled before lowering the AST to IR.

Dec 12 2019, 2:10 PM · Restricted Project
hfinkel added inline comments to D70258: [OpenMP][IR-Builder] Introduce the finalization stack.
Dec 12 2019, 11:19 AM · Restricted Project, Restricted Project
hfinkel added a comment to D71348: Add ExternalAAWrapperPass to createLegacyPMAAResults..

@nhaehnle just added you so that you are aware of this change - it in theory could minorly change codegen on AMDGPU but I think you don't actually return any more precise noaliasing state so it should (tm) be fine. But thought it was worth you oking this either way!

Dec 12 2019, 11:00 AM · Restricted Project
hfinkel added a comment to D70924: [Metadata] Add TBAA struct metadata to `AAMDNode`.

Hi @hfinkel, I've investigated the current state of TBAA support in LLVM. AFAIU (see D40975, but it is still uncommited) the new format of tbaa tag replaces tbaa.struct tag. Does it mean tbaa.struct tag will be deprecated soon?

Dec 12 2019, 10:51 AM · Restricted Project

Dec 10 2019

hfinkel added a comment to D71241: [OpenMP][WIP] Use overload centric declare variants.

...

Also, check how -ast-print works with your solution. It returns different result than expected because you're transform the code too early. It is incorrect behavior.

This is debatable. AST print does not print the input but the AST, thus what is correct wrt. OpenMP declare variant is nowhere defined but by us.
Arguably, having it print the actually called function and not the base function is preferable. Thus, the new way is actually more informative.

You're completely wrong here! We shall keep the original AST. This is used in several tools and you significantly modify the user code. I consider it a real issue here.

Dec 10 2019, 4:18 PM · Restricted Project
hfinkel added a comment to D71179: [OpenMP][WIP] Initial support for `begin/end declare variant`.

You're doing absolutely the same thing as the original declare variant implementation.

I don't think so but if you do why do you oppose this approach?

And I don't think it would be correct to add them as multiversiin variants to the original function.

Why wouldn't it be correct to pick the version through the overload resolution instead of the code generation?
How this could work is already described in the TODO (CodeGenModule.cpp):

// TODO: We should introduce function aliases for `omp declare variant`
//       directives such that we can treat them through the same overload
//       resolution scheme (via multi versioning) as `omp begin declare
//       variant` functions. For an `omp declare variant(VARIANT) ...`
//       that is attached to a BASE function we would create a global alias
//       VARIANT = BASE which will participate in the multi version overload
//       resolution. If picked, here is no need to emit them explicitly.

I still haven't understood why we cannot/should not reuse the existing multi-version support and instead duplicate the logic in some custom scheme.
We have this patch that shows how we can reuse the logic in Clang. It works on a per-call basis, so it will work for all context selector (incl. construct).
If you think there is something conceptually not working, I'd like to hear about it. However, just saying "it wouldn't be correct" is not sufficient. You need to provide details about the situation, what you think would not work, and why.

I explayned already: declare variant cannot be represented as mutiversion functiin, for example.

@ABataev, can you please elaborate? It's not obvious to me that we cannot handle the existing declare variant with the same scheme (as @jdoerfert highlighted above). In general, I believe it's preferable to have one generic scheme and use it to handle all cases as opposed to continuing to use a more-limited scheme in addition to the generic scheme.

Eaine already. Current version of declare variant cannot be represented as multiversiin functions, because it is not. We have a function that is the alias to other functions with different names. They just are not multiversion functions by definition.

I understand that they have different names. I don't see why we that means that they can't be added to the overload set as multi-version candidates if we add logic which does exactly that.

Dec 10 2019, 3:57 PM · Restricted Project, Restricted Project

Dec 9 2019

hfinkel added a comment to D71179: [OpenMP][WIP] Initial support for `begin/end declare variant`.

You're doing absolutely the same thing as the original declare variant implementation.

I don't think so but if you do why do you oppose this approach?

And I don't think it would be correct to add them as multiversiin variants to the original function.

Why wouldn't it be correct to pick the version through the overload resolution instead of the code generation?
How this could work is already described in the TODO (CodeGenModule.cpp):

// TODO: We should introduce function aliases for `omp declare variant`
//       directives such that we can treat them through the same overload
//       resolution scheme (via multi versioning) as `omp begin declare
//       variant` functions. For an `omp declare variant(VARIANT) ...`
//       that is attached to a BASE function we would create a global alias
//       VARIANT = BASE which will participate in the multi version overload
//       resolution. If picked, here is no need to emit them explicitly.

I still haven't understood why we cannot/should not reuse the existing multi-version support and instead duplicate the logic in some custom scheme.
We have this patch that shows how we can reuse the logic in Clang. It works on a per-call basis, so it will work for all context selector (incl. construct).
If you think there is something conceptually not working, I'd like to hear about it. However, just saying "it wouldn't be correct" is not sufficient. You need to provide details about the situation, what you think would not work, and why.

I explayned already: declare variant cannot be represented as mutiversion functiin, for example.

@ABataev, can you please elaborate? It's not obvious to me that we cannot handle the existing declare variant with the same scheme (as @jdoerfert highlighted above). In general, I believe it's preferable to have one generic scheme and use it to handle all cases as opposed to continuing to use a more-limited scheme in addition to the generic scheme.

Eaine already. Current version of declare variant cannot be represented as multiversiin functions, because it is not. We have a function that is the alias to other functions with different names. They just are not multiversion functions by definition.

Dec 9 2019, 7:44 PM · Restricted Project, Restricted Project
hfinkel accepted D71195: [TableGen] Permit dag operators to be unset..

LGTM

Dec 9 2019, 7:08 PM · Restricted Project
hfinkel added a comment to D71179: [OpenMP][WIP] Initial support for `begin/end declare variant`.

You're doing absolutely the same thing as the original declare variant implementation.

I don't think so but if you do why do you oppose this approach?

And I don't think it would be correct to add them as multiversiin variants to the original function.

Why wouldn't it be correct to pick the version through the overload resolution instead of the code generation?
How this could work is already described in the TODO (CodeGenModule.cpp):

// TODO: We should introduce function aliases for `omp declare variant`
//       directives such that we can treat them through the same overload
//       resolution scheme (via multi versioning) as `omp begin declare
//       variant` functions. For an `omp declare variant(VARIANT) ...`
//       that is attached to a BASE function we would create a global alias
//       VARIANT = BASE which will participate in the multi version overload
//       resolution. If picked, here is no need to emit them explicitly.

I still haven't understood why we cannot/should not reuse the existing multi-version support and instead duplicate the logic in some custom scheme.
We have this patch that shows how we can reuse the logic in Clang. It works on a per-call basis, so it will work for all context selector (incl. construct).
If you think there is something conceptually not working, I'd like to hear about it. However, just saying "it wouldn't be correct" is not sufficient. You need to provide details about the situation, what you think would not work, and why.

I explayned already: declare variant cannot be represented as mutiversion functiin, for example.

Dec 9 2019, 6:40 PM · Restricted Project, Restricted Project
hfinkel added a comment to D70758: [PowerPC][NFC] Rename record instructions to use _rec suffix instead of o.

Hi all, thanks for the review and discussion.
I'd like to make a final decision about commit or abandon, I don't see any point of leaving this longer.

Currently,
@jsji (me) is still prefer _record, can accept _rec.
@jhibbits is still against _record (prefer no change or at most _rec).
@lkail agreed with _record as a relatively new comer, can accept _rec.
@nemanjai is still kindly silent.

@steven.zhang @hfinkel Any further comments or suggestions? Thanks.

Dec 9 2019, 2:44 PM · Restricted Project
hfinkel added a comment to D71179: [OpenMP][WIP] Initial support for `begin/end declare variant`.

...

We restricted it for now to function definitions so we don't need to define the mangling as you cannot expect linking. (I did this to get it in TR8 while I figured it will solve all our math.h problems already).
However, we need to avoid collisions with user code, e.g., through the use of symbols in the name that are not allowed to be used by the user (I thought "." is one of them).

Okay, but how to we distinguish functions for which there is a declaration and we need the mangling because the user has provided a definition elsewhere, from those for which there is a declaration, and we don't want mangling because we need to link to some system library?

The idea is, declarations inside begin/end declare variant are supposed to be not affected by the begin/end declare variant. That is, if you have declarations you cannot expect variant multi-versioning to happen. Having declarations inside or outside the begin/end declare variant is still fine if they all denote the same function.

Dec 9 2019, 11:29 AM · Restricted Project, Restricted Project
hfinkel added a comment to D71179: [OpenMP][WIP] Initial support for `begin/end declare variant`.

@jdoerfert , how does the ".ompvariant" work with external functions? I see the part of the spec which says, "The symbol name of a function definition that appears between a begin declare variant...", but, if we append this name to, for example, the names of functions present in the device math library, won't we have a problem with linking?

We restricted it for now to function definitions so we don't need to define the mangling as you cannot expect linking. (I did this to get it in TR8 while I figured it will solve all our math.h problems already).
However, we need to avoid collisions with user code, e.g., through the use of symbols in the name that are not allowed to be used by the user (I thought "." is one of them).

Dec 9 2019, 8:42 AM · Restricted Project, Restricted Project
hfinkel added a comment to D71179: [OpenMP][WIP] Initial support for `begin/end declare variant`.

They do this because they have several function definitions with the same name. In our case, we have several different functions with different names and for us no need to worry about overloading resolution, the compiler will do everything for us.

I think we talk past each other again. This is the implementation of omp begin/end declare variant as described in TR8. Bt definition, the new variant mechanism will result in several different function definitions with the same name. See the two tests for examples.

I just don't get it. If begin/end is just a something like #ifdef...endif, why you just can't skip everything between begin/end if the context does not match?

The patch does this (see in ParseOpenMP.cpp where I asked about the potential inf-loop). But when the definitions are not skipped, then we have to worry about having multiple decls/defs of the same name and the overload priorities.

I would recommend to drop all this extra stuff from the patch and focus on the initial patch. We'll need something similar to multiversion in case of the construct context selectors, but at first we need to solve all the problems with the simple versions of the construct rather that try to solve all the problems in the world in one patch. It is almost impossible to review.

Dec 9 2019, 8:24 AM · Restricted Project, Restricted Project
hfinkel added a comment to D71179: [OpenMP][WIP] Initial support for `begin/end declare variant`.

They do this because they have several function definitions with the same name. In our case, we have several different functions with different names and for us no need to worry about overloading resolution, the compiler will do everything for us.

I think we talk past each other again. This is the implementation of omp begin/end declare variant as described in TR8. Bt definition, the new variant mechanism will result in several different function definitions with the same name. See the two tests for examples.

I just don't get it. If begin/end is just a something like #ifdef...endif, why you just can't skip everything between begin/end if the context does not match?

Dec 9 2019, 5:35 AM · Restricted Project, Restricted Project

Dec 8 2019

hfinkel added a comment to D71179: [OpenMP][WIP] Initial support for `begin/end declare variant`.

They do this because they have several function definitions with the same name. In our case, we have several different functions with different names and for us no need to worry about overloading resolution, the compiler will do everything for us.

I think we talk past each other again. This is the implementation of omp begin/end declare variant as described in TR8. Bt definition, the new variant mechanism will result in several different function definitions with the same name. See the two tests for examples.

Dec 8 2019, 11:38 PM · Restricted Project, Restricted Project
hfinkel added inline comments to D71179: [OpenMP][WIP] Initial support for `begin/end declare variant`.
Dec 8 2019, 11:11 PM · Restricted Project, Restricted Project

Dec 6 2019

hfinkel added a comment to D71115: [TableGen] Add a permissive version of !con..

But I can implement !setop if you prefer. (And while I'm at it, perhaps !getop too.)

Dec 6 2019, 9:46 AM · Restricted Project
hfinkel added a comment to D71115: [TableGen] Add a permissive version of !con..

Did you consider introducing something to change the dag operator instead of this? I think it might be cleaner to have some kind of thing which changes the dag operator, and after that, if you wish to !con them, everything works as expected.

Dec 6 2019, 6:38 AM · Restricted Project
hfinkel added inline comments to D70924: [Metadata] Add TBAA struct metadata to `AAMDNode`.
Dec 6 2019, 6:38 AM · Restricted Project

Dec 2 2019

hfinkel added a comment to D70758: [PowerPC][NFC] Rename record instructions to use _rec suffix instead of o.

Hrmm... the current naming convention predates me, but it wasn't difficult to figure out. If you're familiar with the ISA and the fact that the record forms have mnemonics ending in a '.'. Honestly, replacing 'o' with 'r' doesn't seem all that much better to me, as I still need to know that the 'r' stands for record form and understand what that means. It's not like searching the ISA manual for '.', or 'o', or 'r', is going to buy one much of anything. I am sympathetic to none of these being immediately obvious, and I'd be happy with some fix to that situation.

Dec 2 2019, 3:30 PM · Restricted Project

Nov 19 2019

hfinkel added a comment to D69776: [APFloat] Fix FP remainder operation.

Thanks for posting patches to fix problems with the APFloat implementation. FWIW, this code is hard to review, from my perspective, because it lacks comments (both your new code and the code it is replacing). As you clearly have gone through this and understand what it's doing, could you add a bunch of comments explaining what's going on? I think that would be very helpful for everyone else.

Nov 19 2019, 7:20 AM · Restricted Project
hfinkel added a comment to D70425: [APFloat] Prevent construction of APFloat with Semantics and FP value.

Perhaps there should be an "= delete" constructor to make this a compile error?

Nov 19 2019, 7:16 AM · Restricted Project
hfinkel accepted D70422: [APFloat] Fix fusedMultiplyAdd when `this` equals to `Addend`.

LGTM

Nov 19 2019, 7:06 AM · Restricted Project
hfinkel accepted D66088: AMD Znver2 (Rome) Scheduler enablement.

I agree on having a patch to enable exegesis. Will post that in couple of days. As mentioned, this is part of the initial plan as well.

Nov 19 2019, 6:54 AM · Restricted Project

Nov 17 2019

hfinkel accepted D70352: [PowerPC] Rename DarwinDirective to CPUDirective (NFC).

LGTM too.

Nov 17 2019, 8:56 AM · Restricted Project

Nov 6 2019

hfinkel added a comment to D69900: Align load/store for llvm.loop.aligned metadata.

Do we need this at all? Can't we just emit @llvm.assume calls before the loop if we know the alignment? I've done this in the past and it's worked well (by using __builtin_assume_aligned from Clang).

Nov 6 2019, 12:57 PM · Restricted Project
hfinkel added inline comments to D69897: Add #pragma clang loop vectorize_assume_alignment(n).
Nov 6 2019, 10:55 AM · Restricted Project
hfinkel added inline comments to D69897: Add #pragma clang loop vectorize_assume_alignment(n).
Nov 6 2019, 9:13 AM · Restricted Project

Nov 5 2019

hfinkel added a comment to D69773: [APFloat] Extend converting from special strings.

Please add a test case.

Nov 5 2019, 12:38 PM · Restricted Project

Nov 4 2019

hfinkel accepted D69814: [Scheduling] Enable AA in PostRA Machine scheduler.

If this is wrong (or buggy), then it will probably break self-hosting on PPC, so then we'll know.

Nov 4 2019, 1:00 PM · Restricted Project

Oct 30 2019

hfinkel added inline comments to D69354: Make coding standards document more inclusive.
Oct 30 2019, 5:27 AM · Restricted Project

Oct 29 2019

hfinkel added inline comments to D69232: [PowerPC] Clear the sideeffect bit for those instructions that didn't have the match pattern.
Oct 29 2019, 9:48 AM · Restricted Project

Oct 28 2019

hfinkel added inline comments to D69232: [PowerPC] Clear the sideeffect bit for those instructions that didn't have the match pattern.
Oct 28 2019, 8:43 PM · Restricted Project
hfinkel accepted D68841: [PowerPC] Do not convert loop to HW loop if the body contains calls to lrint/lround.

LGTM

Oct 28 2019, 9:29 AM · Restricted Project
hfinkel accepted D69232: [PowerPC] Clear the sideeffect bit for those instructions that didn't have the match pattern.

LGTM

Oct 28 2019, 8:43 AM · Restricted Project

Oct 25 2019

hfinkel accepted D66971: TableGen: Remove assert that pattern results match input number.

LGTM

Oct 25 2019, 11:21 AM
hfinkel added a comment to D69391: Add #pragma clang loop ivdep.

This is the same as #pragma clang loop vectorize(assume_safety)?

Oct 25 2019, 10:43 AM · Restricted Project

Oct 23 2019

hfinkel added inline comments to D68841: [PowerPC] Do not convert loop to HW loop if the body contains calls to lrint/lround.
Oct 23 2019, 6:14 AM · Restricted Project

Oct 22 2019

hfinkel added inline comments to D69272: Enable '#pragma STDC FENV_ACCESS' in frontend.
Oct 22 2019, 11:43 AM · Restricted Project
hfinkel added a comment to D69272: Enable '#pragma STDC FENV_ACCESS' in frontend.

I don't see how such warning can help a user. A note about impact of the pragma on performance can be put into documentation. Issuing a warning on every use of the pragma may be annoying.

Oct 22 2019, 9:54 AM · Restricted Project
hfinkel accepted D63607: [clang][driver] Add basic --driver-mode=flang support for fortran.

Friendly ping to everybody watching. I'd like to get this in soon if possible.

Hal - do you think this is close to being accepted? Note that I consider this "the beginning" rather than "the end", since there will be more functionality to add piecewise before this is fully functional. In the meantime, since it is new functionality, it should not break anything.

Oct 22 2019, 9:39 AM · Restricted Project
hfinkel added inline comments to D69272: Enable '#pragma STDC FENV_ACCESS' in frontend.
Oct 22 2019, 9:31 AM · Restricted Project

Oct 21 2019

hfinkel added inline comments to D69272: Enable '#pragma STDC FENV_ACCESS' in frontend.
Oct 21 2019, 2:14 PM · Restricted Project

Oct 15 2019

hfinkel added a comment to D68994: [RFC] Redefine `convergent` in terms of dynamic instances.

Thanks for posting this. Please send a plain email explaining the proposal to llvm-dev - RFCs regarding new intrinsics and other IR extensions should go there to reach a wide audience.

Oct 15 2019, 10:01 AM · Restricted Project
hfinkel added a comment to D68994: [RFC] Redefine `convergent` in terms of dynamic instances.

Thanks for posting this. Please send a plain email explaining the proposal to llvm-dev - RFCs regarding new intrinsics and other IR extensions should go there to reach a wide audience.

Oct 15 2019, 10:01 AM · Restricted Project

Oct 10 2019

hfinkel added inline comments to D68824: Fix __builtin_assume_aligned with too large values..
Oct 10 2019, 12:47 PM
hfinkel added a comment to D68793: [System Model] [TTI] Add TTI interfaces for write-combining buffers.

How do you imagine that we'd use this? Do we need some kind of size to go along with this?

See the Intel optimization guide, section 3.6.9.

https://software.intel.com/sites/default/files/managed/9e/bc/64-ia-32-architectures-optimization-manual.pdf

Basically, this information can be used to inform loop transformations as well as use of non-temporal instructions. A write-combining buffer is not the same as a store buffer. A write-combining buffer is always one cache line in size, so I don't think we need size information.

Oct 10 2019, 12:28 PM · Restricted Project
hfinkel accepted D68804: [System Model] [TTI] Move default cache/prefetch implementations.

LGTM

Oct 10 2019, 11:41 AM · Restricted Project
hfinkel added inline comments to D68814: [LV] Allow assume calls in predicated blocks..
Oct 10 2019, 11:41 AM · Restricted Project
hfinkel accepted D68817: [PowerPC][docs] Update IBM official docs in Compiler Writers Info page.

LGTM

Oct 10 2019, 11:31 AM · Restricted Project
hfinkel committed rL374408: Add hfinkel to github usernames.
Add hfinkel to github usernames
Oct 10 2019, 11:12 AM
hfinkel added a comment to D68793: [System Model] [TTI] Add TTI interfaces for write-combining buffers.

How do you imagine that we'd use this? Do we need some kind of size to go along with this?

Oct 10 2019, 10:13 AM · Restricted Project

Oct 4 2019

hfinkel added a comment to D68453: TableGen: Allow 'a+b' in TableGen language.

How far down this road can we go? I would like to see infix operators in TableGen because I believe that it will make it easier to use. However, I'm concerned about just adding add as a special case. Of the other similar operators:

case ADD: Result = "!add"; break;
case MUL: Result = "!mul"; break;
case AND: Result = "!and"; break;
case OR: Result = "!or"; break;
case SHL: Result = "!shl"; break;
case SRA: Result = "!sra"; break;
case SRL: Result = "!srl"; break;
case EQ: Result = "!eq"; break;
case NE: Result = "!ne"; break;
case LE: Result = "!le"; break;
case LT: Result = "!lt"; break;
case GE: Result = "!ge"; break;
case GT: Result = "!gt"; break;

how many can we add as infix operators without ambiguities or other difficulties?

I have a half baked shunting-algorithm based infix-operator patch to work with cases you mention but that is kind of major work. I thought maybe, as !add is most commonly used, there maybe an appetite for a simple solution like here.

Oct 4 2019, 4:30 PM · Restricted Project
hfinkel added a comment to D68453: TableGen: Allow 'a+b' in TableGen language.

How far down this road can we go? I would like to see infix operators in TableGen because I believe that it will make it easier to use. However, I'm concerned about just adding add as a special case. Of the other similar operators:

Oct 4 2019, 8:56 AM · Restricted Project

Oct 1 2019

hfinkel added inline comments to D67148: [LoopVectorize][PowerPC] Estimate int and float register pressure separately in loop-vectorize.
Oct 1 2019, 12:21 PM · Restricted Project

Sep 27 2019

hfinkel added a comment to D67867: [libc] Add few docs and implementation of strcpy and strcat..

Maybe everything is fine, but given this setup, does anyone see any potential problems with compiling these functions for nvptx? I'd like to eventually see a mode where we compile an appropriate subset of these functions for GPU targets -- either in bitcode form for linking with our device-side OpenMP runtime library or as a native object -- to provide a more feature-complete offloading environment.

Sep 27 2019, 9:19 AM · Restricted Project, Restricted Project
hfinkel accepted D67148: [LoopVectorize][PowerPC] Estimate int and float register pressure separately in loop-vectorize.

We can land it firstly and update the interface to use MVT as simplest type to getRegisterClassForType parameter and call TLI->getTypeLegalizationCost before.
This can apply to where GetRegUsage works

Sep 27 2019, 8:20 AM · Restricted Project

Sep 26 2019

hfinkel added a comment to D68035: [PowerPC] Extend custom lower of vector truncate to handle wider input.

Is it possible to check the permutation mask in the test case? (I suppose that these are auto-generated tests, but the test coverage it's all that good if we can't check the mask).

Sep 26 2019, 6:53 AM · Restricted Project
hfinkel added a comment to D67148: [LoopVectorize][PowerPC] Estimate int and float register pressure separately in loop-vectorize.

I apologize for the delay; I've been contemplating what to recommend here...

Sep 26 2019, 6:42 AM · Restricted Project

Sep 25 2019

hfinkel added a comment to D64943: [Clang][OpenMP offload] Eliminate use of OpenMP linker script.

The three way split looks great, thanks.

Sep 25 2019, 3:57 PM
hfinkel added a comment to D64943: [Clang][OpenMP offload] Eliminate use of OpenMP linker script.

This LGTM. I'm happy that this is a design improvement over the current scheme. @JonChesterfield , @ABataev , any further comments?

Sep 25 2019, 6:19 AM

Sep 23 2019

hfinkel added a comment to D9375: An llvm.noalias intrinsic.
Sep 23 2019, 6:35 PM
hfinkel added a comment to D67941: Invalidate assumption cache before outlining..

Please post patches with full context (the full source file should be here). I can't see where moveCodeToFunction is called.

Sep 23 2019, 6:30 PM · Restricted Project

Sep 18 2019

hfinkel added a comment to D63607: [clang][driver] Add basic --driver-mode=flang support for fortran.

One thought that occurs to me when reviewing this. I think we will assume that F18/flang when it lands in the LLVM project will be an optional thing to build and will be an opt-in project at the start that is off by default. What happens when clang --driver-mode=flang is called and flang has not been built? And where would we put a test for that behaviour? If flang were not to be built, should --driver-mode=flang be unsupported/unrecognised and emit an error message?

Sep 18 2019, 10:54 AM · Restricted Project

Sep 17 2019

hfinkel accepted D66902: [PowerPC] Implementing overflow version for XO-Form instructions.

LGTM too.

Sep 17 2019, 8:57 AM · Restricted Project