This is an archive of the discontinued LLVM Phabricator instance.

[OpenMPOpt] ICV Tracking
ClosedPublic

Authored by sstefan1 on Jun 13 2020, 5:59 AM.

Details

Summary

This is first part if ICV tracking support in OpenMPOpt. Things that are included in this pathc:

  • ICV information definition (not complete) in OMPKinds.def.
  • OpenMP-specific information cache based on Attributor's InformationCache. This should should make it easier to share information between them.
  • possibility of running the Attributor through runAttributor().
  • AAICVTracking AbstractAttribute with basic tracking.

Since there are a lot of changes to agree up on, I left out the deduplication part for now. There are 2 possible places where that can be done: current deduplication functions, or within AAICVTracking's manifest.

This can also be treated as a RFC.

Diff Detail

Event Timeline

sstefan1 created this revision.Jun 13 2020, 5:59 AM
Herald added a project: Restricted Project. · View Herald Transcript

I left a first set of comments. I think the overall direction is good, some interesting opportunities are opening up (see below). We need tests and also to split the patch (some suggestions below).

llvm/include/llvm/Frontend/OpenMP/OMPKinds.def
367 ↗(On Diff #270574)

Remove the comment for now. and also the clause stuff.

llvm/include/llvm/Transforms/IPO/Attributor.h
822

Why do we want a pointer here? We can derive from InformationCache elsewhere and still provide a reference, right?

877

I think everything else would just duplicate code. This can be public but the comment should mention that use *after* the Attributor was started to run is restricted to the Attributor itself. Initial seeding can be done via this function.

929

Just moving this to be public with a new comment LGTM. Feel free to do this right away. If you want input on the comment, let me know.

3070

I think the AA should live in OpenMP opt. TBH, similar to a comment below, I would like us to have a "simplification AA" here at some point. More specifically for this use case a "call simplifier". Those interfaces would live with the Attributor and be used by it, via some registration mechanism. It might even be useful to have the general logic in here eventually so that only a few functions need to be implemented by the user, e.g., bool interactsWithHiddenState(CallBase &CB), Optional<Value> getNewHiddenState(CallBase &CB), bool returnsHiddenState(CallBase &CB). The generic algorithm would track the "hidden state" (here ICV value) and provide all the logic. We can do that in a follow up or now. WDYT?

llvm/lib/Transforms/IPO/OpenMPOpt.cpp
100

As long as there is no other user we keep it here.

132

I think we need to do that eventually but for a first restricted deduction, see below, we might not need to. Just remove the comments for now then.

149

Documentation gone? We need some for both functions.

166

Why this change?

347

I think an RFC commit moving everything into a "cache" helper class should be provided seperatly. That seems trivial to accept and will significantly shrink this diff.

772

I feel the instance might be preferable so we can access AAs later, e.g., via A.lookupAAFor.

774

I think in the long run we want to register the ICV tracking attribute as another "simplification" attribute (for runtime calls). Then, the Attributor can ask all simplification attributes for a better value for a call, potentially resulting in better liveness and therefore better ICV tracking ;)

For now we could whitelist only ICV tracking (I guess).

894

Style: Add braces if you have a comment that makes the body multi-line.
You have to verify there is no cal in-between the two instructions too:

set_num_threads(2);
foo(); // resets num_threads
get_num_threads();

^ negative test we should keep.

We expand our tracking efforts in a follow up, for now we can be restrictive and correct with some positive and negative tests.

901

I don't think this works:

if (...)
  set_num_threads(2);
get_num_threads();

^ negative test we should keep.

For the first version go with the same basic block only

sstefan1 marked 7 inline comments as done.Jun 13 2020, 1:21 PM

I left a first set of comments. I think the overall direction is good, some interesting opportunities are opening up (see below). We need tests and also to split the patch (some suggestions below).

Maybe D81114 should be commited first, or should I just abandon that and put the tests here?

I'll split this with at least two more patches.

llvm/include/llvm/Transforms/IPO/Attributor.h
822

So that we can later cast it to OMPInformationCache and use openmp-specific information. This seemed like least amount of change.

877

I guess then it should also be allowed to be used after Attributor run is finished? Say we want to know ICV value after the Attributor run and all the deduplication, we still won't have querying attribute. Or should we also expose lookupAAFor?

929

You mean a NFC commit without a review?

3070

I think this should work, but I'll have to put more thought into (mainly the registration mechanism). this I'm in favor of getting this in first, since this is already proven to be rather large patch with lots of changes.

llvm/lib/Transforms/IPO/OpenMPOpt.cpp
149

Forgot to copy. Will do.

166

I think this comes from the fact that the same OMPInfoCache is shared between the Attributor and OpenMPOPT. Maybe we'll be able to avoid this after splitting up the patch, but I'm not sure.

901

I agree. Not to proud of this part tbh. I'll leave this out for now.

I think this needs a rebase once some other landed.

llvm/include/llvm/Transforms/IPO/Attributor.h
822

You can cast regardless. If there is no other reason, please keep it a reference :)

877

Iff you are careful, you can use this *after* a run, e.g. if you only use information from attributes at a fixpoint. Exposing lookup makes this less error prone and would allow us to have an assert even. The other way is to create and remember all interesting locations. That is usually what we did so far but if we have non AA users lookup might be useful as well.

929

Yes. The code movement parts.

3070

Agreed. We can even split them further and submit the NFC parts so it becomes clearer to me what changes :)

sstefan1 updated this revision to Diff 272562.Jun 22 2020, 3:47 PM
  • clean up the patch
  • basic deduplication
sstefan1 retitled this revision from [WIP][OpenMPOPT] ICV Tracking Support to [OpenMPOpt] ICV Tracking.Jun 22 2020, 3:49 PM

Nit: Add documentation to functions and classes please. Describe what happens and why if applicable.

llvm/lib/Transforms/IPO/OpenMPOpt.cpp
790

I'm really happy one can add a local AA like this :)

842

Style: & in both above.

847

Nit: Is this necessary?

866

Nit: GetterRFI

868

We might want to "push" the information from setters instead. That would allow us to remove setters that are not useful, e.g. it is reset before it could have been read. We also need to look at other/implicit reads and writes later: https://godbolt.org/z/fdBqiF

I'm fine with starting here but let me know what you think about pushing instead?

873

Can we replace the pair with a struct that has proper member names and documentation.

908

Somehow broken.

916

Style: & and GetterRFI.

920

Make both conditionals early exits

llvm/test/Transforms/OpenMP/icv_tracking.ll
56–57

I thought this min was necessary but I doubt it reading the standard again. Maybe just remove the fixme

95–96

I guess this should be use(10) too, right? And tmp4 should be removed.

sstefan1 marked 4 inline comments as done.Jun 25 2020, 3:33 AM
sstefan1 added inline comments.
llvm/lib/Transforms/IPO/OpenMPOpt.cpp
868

I guess we would first need to be aware of 'other' reads and writes, since without that information we could for example remove a setter that was supposed to be read.

I think we can do that with this approach as well. If the value wasn't read when we reach new setter, we mark the last one for removal? Not really sure if one is better than other.

908

The comment is broken?

920

I guess this makes sense now, but as we improve this, more conditionals will be added and it won't make sense to exit early.

llvm/test/Transforms/OpenMP/icv_tracking.ll
95–96

Yes, but only once we add callsite information. Right now we don't know if use might have set a new value.

jdoerfert added inline comments.Jun 26 2020, 1:11 PM
llvm/lib/Transforms/IPO/OpenMPOpt.cpp
868

Both directions conceptually work. Let's wait and see. The correctness should never be compromised, I mean if you see something that you don't know you have to assume it reads and writes the ICV anyway.

908

Yes, a stray \p , right?

920

It depends. Having 3 early exists is, for me at least, nicer than one big condition.

llvm/test/Transforms/OpenMP/icv_tracking.ll
95–96

I see.

sstefan1 marked an inline comment as done.Jun 29 2020, 3:06 PM
sstefan1 added inline comments.
llvm/lib/Transforms/IPO/OpenMPOpt.cpp
920

Just to make sure we're on the same page here. I'm not saying there will be one big condition. Rather, right now there is the case when both instructions are in the same BB. In the future we also want to cover the case when they are not. Does that make sense?

sstefan1 updated this revision to Diff 274259.Jun 29 2020, 3:12 PM

addressing comments

jdoerfert accepted this revision.Jul 2 2020, 9:04 AM

Some nits to be addressed, otherwise LGTM.

Address the clang-tidy warnings where possible please.

llvm/lib/Transforms/IPO/OpenMPOpt.cpp
834

& and below. I assume you don't want to copy it all the time.

865

Note: Once we do proper optimistic initialization we need to compute the replacement value in the update. For now this is fine.

899

Nit: If you declare this as template<> struct DenseMapInfo<ICVValue> you don't need to specify it with the map later.

946

Wouldn't a const block do just fine below?

This revision is now accepted and ready to land.Jul 2 2020, 9:04 AM
This revision was automatically updated to reflect the committed changes.

I've reverted this in rG1d542f0ca83fa1411d6501a8d088450d83abd5b8.

There appears to be some kind of memory corruption/use-after-free/etc
going on here. In particular, in OpenMPOpt::deleteParallelRegions(),
in DeleteCallCB(), CI is garbage.

Reduced reproducer: ./bin/opt -O3 /tmp/test.ll:

; ModuleID = '/home/lebedevri/CREDUCE/input.c'
source_filename = "/home/lebedevri/CREDUCE/input.c"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-pc-linux-gnu"

%struct.ident_t = type { i32, i32, i32, i32, i8* }

@.str = private unnamed_addr constant [23 x i8] c";unknown;unknown;0;0;;\00", align 1
@0 = private unnamed_addr global %struct.ident_t { i32 0, i32 2, i32 0, i32 0, i8* getelementptr inbounds ([23 x i8], [23 x i8]* @.str, i32 0, i32 0) }, align 8

; Function Attrs: nounwind uwtable
define dso_local i32 @b() #0 {
  %1 = alloca i32, align 4
  %2 = call i32 @a()
  %3 = load i32, i32* %1, align 4
  ret i32 %3
}

; Function Attrs: nounwind uwtable
define internal i32 @a() #0 {
  %1 = alloca i32, align 4
  %2 = call i32 @b()
  call void (%struct.ident_t*, i32, void (i32*, i32*, ...)*, ...) @__kmpc_fork_call(%struct.ident_t* @0, i32 0, void (i32*, i32*, ...)* bitcast (void (i32*, i32*)* @.omp_outlined. to void (i32*, i32*, ...)*))
  %3 = load i32, i32* %1, align 4
  ret i32 %3
}

; Function Attrs: norecurse nounwind uwtable
define internal void @.omp_outlined.(i32* noalias %0, i32* noalias %1) #1 {
  %3 = alloca i32*, align 8
  %4 = alloca i32*, align 8
  store i32* %0, i32** %3, align 8, !tbaa !2
  store i32* %1, i32** %4, align 8, !tbaa !2
  ret void
}

; Function Attrs: nounwind
declare !callback !6 void @__kmpc_fork_call(%struct.ident_t*, i32, void (i32*, i32*, ...)*, ...) #2

attributes #0 = { nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "frame-pointer"="none" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #1 = { norecurse nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "frame-pointer"="none" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #2 = { nounwind }

!llvm.module.flags = !{!0}
!llvm.ident = !{!1}

!0 = !{i32 1, !"wchar_size", i32 4}
!1 = !{!"Debian clang version 11.0.0-++20200709100646+c92a8c0a0f6-1~exp1~20200709201313.3348"}
!2 = !{!3, !3, i64 0}
!3 = !{!"any pointer", !4, i64 0}
!4 = !{!"omnipotent char", !5, i64 0}
!5 = !{!"Simple C/C++ TBAA"}
!6 = !{!7}
!7 = !{i64 2, i64 -1, i64 -1, i1 true}
lebedev.ri reopened this revision.Jul 10 2020, 9:01 AM
This revision is now accepted and ready to land.Jul 10 2020, 9:01 AM
lebedev.ri requested changes to this revision.Jul 10 2020, 9:01 AM
This revision now requires changes to proceed.Jul 10 2020, 9:01 AM
sstefan1 updated this revision to Diff 277162.Jul 10 2020, 4:23 PM

fixing the issue with dead uses.

jdoerfert accepted this revision.Jul 10 2020, 4:31 PM

Add the reproducer as test please. With that one and the change below, LGTM.

llvm/lib/Transforms/IPO/OpenMPOpt.cpp
341

-Size
+RFIs.size()

just add the function to the EnumeratedArray please.

This revision was not accepted when it landed; it landed in state Needs Review.Jul 10 2020, 5:27 PM
This revision was automatically updated to reflect the committed changes.
jdoerfert added inline comments.Jul 13 2020, 2:29 PM
llvm/include/llvm/ADT/EnumeratedArray.h
41 ↗(On Diff #277182)

Nit: const