Page MenuHomePhabricator

[WPD] Try harder to find assumes through phis
AbandonedPublic

Authored by aeubanks on May 20 2022, 1:19 PM.

Details

Reviewers
tejohnson
pcc
Summary

The i1 from a type.test may be merged with another type.test into a phi which
gets fed to an assume. Previously we wouldn't delete the type.test in some
cases, causing an assume(false).

Try harder to find corresponding assumes by looking through phis. This still
isn't ideal because there may be other sorts of valid transformations that can
obfuscate the use-def chain between a type test and an assume, but it at least
resolves a miscompile in Chrome when turning on opaque pointers.

This may also end up deleting an assume of a phi where one side is a type test
and another side is something unrelated, meaning we lose non-WPD-related
information.

Diff Detail

Event Timeline

aeubanks created this revision.May 20 2022, 1:19 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 20 2022, 1:19 PM
aeubanks requested review of this revision.May 20 2022, 1:19 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 20 2022, 1:19 PM
tejohnson added inline comments.May 20 2022, 2:21 PM
llvm/lib/Transforms/IPO/WholeProgramDevirt.cpp
1928

The equivalent code in LowerTypeTests::lower() asserts that the users are all phi nodes in this case. Can you add the same check here?

aeubanks added inline comments.May 20 2022, 4:06 PM
llvm/lib/Transforms/IPO/WholeProgramDevirt.cpp
1928

that asserts on [1]. I'm not sure why the assert in LTT never triggers since it does seem like you could potentially do arbitrary things with the result of type tests. if we really wanted to make sure that type tests were only used by assumes, perhaps indirectly through phis, I think that would be more of a verifier check

[1] https://github.com/llvm/llvm-project/blob/8801a5d185fafcb8c03f71ceb717b84c2f08b220/llvm/test/ThinLTO/X86/cache-typeid-resolutions.ll#L32

tejohnson added inline comments.May 21 2022, 4:34 PM
llvm/lib/Transforms/IPO/WholeProgramDevirt.cpp
1928

that asserts on [1].

Ah, ok, the assert is too aggressive. But I believe the code change here is as well - for the type test in [1] with the current code LTT will presumably lower it to a false but with this patch it is a true (which I don't think the test is checking, since it is focused on something else, but I'm not sure this code should be replacing all type tests it encounters here with true).

I'm not sure why the assert in LTT never triggers

Because the only type tests that survive the first LTT call are specifically type tests that directly feed assumes [2]. So the only way for it to not feed an assume by the second LTT invocation is if inst combine or the like merged 2 assumes in which case the non-assume use must be a phi.

[2] https://github.com/llvm/llvm-project/blob/59afc4038b1096bc6fea7b322091f6e5e2dc0b38/llvm/lib/Transforms/IPO/LowerTypeTests.cpp#L2098-L2112

So perhaps this code should look through uses on phis to ensure that the only uses are on assumes. Except, this raises the question of why the assume in the original code was not removed by the loop just above here at lines 1924-1925. Aha - it is because the code that populates this list has the same limitation - looking for type tests that directly feed assumes [3]. Perhaps that is the code that should be modified to look through phis? Because we may be missing out on WPD opportunities by not even considering these.

[3] https://github.com/llvm/llvm-project/blob/59afc4038b1096bc6fea7b322091f6e5e2dc0b38/llvm/lib/Analysis/TypeMetadataUtils.cpp#L82-L85

aeubanks added inline comments.May 23 2022, 1:24 PM
llvm/lib/Transforms/IPO/WholeProgramDevirt.cpp
1928

https://reviews.llvm.org/D126228 for changing LTT.

That still seems like a workaround though. I still think that this patch properly addresses the issue discussed in these comments below

// At this point we could remove all type test assume sequences, as they
// were originally inserted for WPD. However, we can keep these in the
// code stream for later analysis (e.g. to help drive more efficient ICP
// sequences). They will eventually be removed by a second LowerTypeTests
// invocation that cleans them up. In order to do this correctly, the first
// LowerTypeTests invocation needs to know that they have "Unknown" type
// test resolution, so that they aren't treated as Unsat and lowered to
// False, which will break any uses on assumes. Below we remove any type
// test assumes that will not be treated as Unknown by LTT.

// The type test assumes will be treated by LTT as Unsat if the type id is
// not used on a global (in which case it has no entry in the TypeIdMap).

If I'm reading it correctly, it's essentially saying that we *have to* remove the type tests for LTT correctness, which is what this patch does. LTT can try and work around type tests that it thinks are for WPD use by ignoring them, but the principled solution is to have WPD remove the type tests. (then maybe we could remove the logic in LTT to try to detect WPD uses of type tests?) And the only way to remove the type tests is to RAUW true to not break the assumes.

We could fix findDevirtualizableCallsForTypeTest to better find corresponding assumes for type tests, but that's not really the problem here, the problem is that we need to make sure we delete all relevant type tests. Improving findDevirtualizableCallsForTypeTest would also have to account for not deleting the same assume multiple times for when multiple type tests map to an assume.

I'm not familiar with WPD, but perhaps a mechanism that doesn't rely on type.test + assumes would make sense, like an intrinsic that's a combined type.test + assume that's only meant to be used for WPD?

tejohnson added inline comments.May 23 2022, 2:39 PM
llvm/lib/Transforms/IPO/WholeProgramDevirt.cpp
1928

@pcc I'd really like to get your input as LTT is mostly about CFI which I am not as familiar with.

Regarding the quoted comment, note it is "type test assume" sequences, not all type tests, that need to be removed here in certain circumstances. Note also that before I made the change to try to keep these around longer for use by other analyses, this code attempted to remove all "type test assume" sequences. I.e. it removed all the analyzed assumes, and any type test that was subsequently dead. Presumably that version of WPD+LTT would have also illustrated the original problem this patch tries to address (an assume fed by a phi, in turn fed by a type test that got an Unsat and resulted in llvm.assume(false)), and for a WPD+CFI build we certainly wouldn't want to replace *all* type test uses in the code at this point with true. So I am skeptical that it is correct to replace type test non-assume (non-phi) uses with true in the more limited cases being handled currently either.

I'm not familiar with WPD, but perhaps a mechanism that doesn't rely on type.test + assumes would make sense, like an intrinsic that's a combined type.test + assume that's only meant to be used for WPD?

I would hope we could avoid inventing another intrinsic that conveyed the same information. Also, I believe WPD is important for reducing the overhead of CFI, and I'm not sure how that would be affected by having a different set of intrinsics in the code stream for both.

Does D126228 fix the original failure?

aeubanks added inline comments.May 24 2022, 8:22 AM
llvm/lib/Transforms/IPO/WholeProgramDevirt.cpp
1928

I checked D126228 and for some reason the issue I'm seeing is still there with that fix. I'd have to check the IR to see exactly why.

Thanks for the explanation, I guess a more concrete example would be where a type test is used by both an assume and also CFI magic (I was assuming that they'd always be separate but I guess there's no reason why they'd have to be separate). I'll update the patch to do what you've suggested.

aeubanks updated this revision to Diff 431692.May 24 2022, 9:16 AM

change findDevirtualizableCallsForTypeTest instead

aeubanks edited the summary of this revision. (Show Details)May 24 2022, 9:20 AM
aeubanks edited the summary of this revision. (Show Details)
aeubanks retitled this revision from [WPD] Make sure type test is eliminated to [WPD] Try harder to find assumes through phis.May 24 2022, 9:23 AM
pcc added a comment.May 25 2022, 3:32 PM

This case is supposed to be handled as a result of running the WPD pass early. See the comment near the top of PassBuilder::buildThinLTODefaultPipeline:

// These passes import type identifier resolutions for whole-program
// devirtualization and CFI. They must run early because other passes may
// disturb the specific instruction patterns that these passes look for,
// creating dependencies on resolutions that may not appear in the summary.
//
// For example, GVN may transform the pattern assume(type.test) appearing in
// two basic blocks into assume(phi(type.test, type.test)), which would
// transform a dependency on a WPD resolution into a dependency on a type
// identifier resolution for CFI.
//
// Also, WPD has access to more precise information than ICP and can
// devirtualize more effectively, so it should operate on the IR first.
//
// The WPD and LowerTypeTest passes need to run at -O0 to lower type
// metadata and intrinsics.

Is there some way that we can now end up with GVN or some other such pass running between when we create the summary and when we run WPD that's inserting a phi?

aeubanks added a comment.EditedMay 25 2022, 4:15 PM

This case is supposed to be handled as a result of running the WPD pass early. See the comment near the top of PassBuilder::buildThinLTODefaultPipeline:

// These passes import type identifier resolutions for whole-program
// devirtualization and CFI. They must run early because other passes may
// disturb the specific instruction patterns that these passes look for,
// creating dependencies on resolutions that may not appear in the summary.
//
// For example, GVN may transform the pattern assume(type.test) appearing in
// two basic blocks into assume(phi(type.test, type.test)), which would
// transform a dependency on a WPD resolution into a dependency on a type
// identifier resolution for CFI.
//
// Also, WPD has access to more precise information than ICP and can
// devirtualize more effectively, so it should operate on the IR first.
//
// The WPD and LowerTypeTest passes need to run at -O0 to lower type
// metadata and intrinsics.

Is there some way that we can now end up with GVN or some other such pass running between when we create the summary and when we run WPD that's inserting a phi?

I thought the frontend emitted the type test/assume. I'm seeing them in the IR after the pre link step. CodeGenFunction::EmitTypeMetadataCodeForVCall() in clang emits a type test/assume.

pcc added a comment.May 25 2022, 4:21 PM

This case is supposed to be handled as a result of running the WPD pass early. See the comment near the top of PassBuilder::buildThinLTODefaultPipeline:

// These passes import type identifier resolutions for whole-program
// devirtualization and CFI. They must run early because other passes may
// disturb the specific instruction patterns that these passes look for,
// creating dependencies on resolutions that may not appear in the summary.
//
// For example, GVN may transform the pattern assume(type.test) appearing in
// two basic blocks into assume(phi(type.test, type.test)), which would
// transform a dependency on a WPD resolution into a dependency on a type
// identifier resolution for CFI.
//
// Also, WPD has access to more precise information than ICP and can
// devirtualize more effectively, so it should operate on the IR first.
//
// The WPD and LowerTypeTest passes need to run at -O0 to lower type
// metadata and intrinsics.

Is there some way that we can now end up with GVN or some other such pass running between when we create the summary and when we run WPD that's inserting a phi?

I thought the frontend emitted the type test/assume. I'm seeing them in the IR after the pre link step. CodeGenFunction::EmitTypeMetadataCodeForVCall() in clang emits a type test/assume.

Right, but the phi must be coming from somewhere else after summary generation if it isn't accounted for in the summary (if the phi was inserted before summary generation, the summary should end up with a dependency on a CFI type identifier resolution, instead of a WPD resolution). Can you trace where it is coming from?

This case is supposed to be handled as a result of running the WPD pass early. See the comment near the top of PassBuilder::buildThinLTODefaultPipeline:

// These passes import type identifier resolutions for whole-program
// devirtualization and CFI. They must run early because other passes may
// disturb the specific instruction patterns that these passes look for,
// creating dependencies on resolutions that may not appear in the summary.
//
// For example, GVN may transform the pattern assume(type.test) appearing in
// two basic blocks into assume(phi(type.test, type.test)), which would
// transform a dependency on a WPD resolution into a dependency on a type
// identifier resolution for CFI.
//
// Also, WPD has access to more precise information than ICP and can
// devirtualize more effectively, so it should operate on the IR first.
//
// The WPD and LowerTypeTest passes need to run at -O0 to lower type
// metadata and intrinsics.

Is there some way that we can now end up with GVN or some other such pass running between when we create the summary and when we run WPD that's inserting a phi?

I thought the frontend emitted the type test/assume. I'm seeing them in the IR after the pre link step. CodeGenFunction::EmitTypeMetadataCodeForVCall() in clang emits a type test/assume.

Right, but the phi must be coming from somewhere else after summary generation if it isn't accounted for in the summary (if the phi was inserted before summary generation, the summary should end up with a dependency on a CFI type identifier resolution, instead of a WPD resolution). Can you trace where it is coming from?

I'm not super familiar with ThinLTO summaries, could you explain a bit more?
The assume(phi(type test, type test)) is coming from the output of the pre link. Then post link WPD looks at all type test usages and I think LTT does the same? So not sure where the summary is relevant.

pcc added a comment.May 25 2022, 5:41 PM

This case is supposed to be handled as a result of running the WPD pass early. See the comment near the top of PassBuilder::buildThinLTODefaultPipeline:

// These passes import type identifier resolutions for whole-program
// devirtualization and CFI. They must run early because other passes may
// disturb the specific instruction patterns that these passes look for,
// creating dependencies on resolutions that may not appear in the summary.
//
// For example, GVN may transform the pattern assume(type.test) appearing in
// two basic blocks into assume(phi(type.test, type.test)), which would
// transform a dependency on a WPD resolution into a dependency on a type
// identifier resolution for CFI.
//
// Also, WPD has access to more precise information than ICP and can
// devirtualize more effectively, so it should operate on the IR first.
//
// The WPD and LowerTypeTest passes need to run at -O0 to lower type
// metadata and intrinsics.

Is there some way that we can now end up with GVN or some other such pass running between when we create the summary and when we run WPD that's inserting a phi?

I thought the frontend emitted the type test/assume. I'm seeing them in the IR after the pre link step. CodeGenFunction::EmitTypeMetadataCodeForVCall() in clang emits a type test/assume.

Right, but the phi must be coming from somewhere else after summary generation if it isn't accounted for in the summary (if the phi was inserted before summary generation, the summary should end up with a dependency on a CFI type identifier resolution, instead of a WPD resolution). Can you trace where it is coming from?

I'm not super familiar with ThinLTO summaries, could you explain a bit more?
The assume(phi(type test, type test)) is coming from the output of the pre link. Then post link WPD looks at all type test usages and I think LTT does the same? So not sure where the summary is relevant.

The summary is meant to control what actions are done by the LTT and WPD passes later. For example here is the summary with an assume(phi(type.test, type.test)) which simulates what you told me that the phi comes from the output of the pre link:

> cat foo.ll 
define i1 @f() {
  br i1 undef, label %bb1, label %bb2

bb1:
  %p1 = call i1 @llvm.type.test(i8* null, metadata !"foo")
  br label %end

bb2:
  %p2 = call i1 @llvm.type.test(i8* null, metadata !"bar")
  br label %end

end:
  %p = phi i1 [ %p1, %bb1 ], [ %p2, %bb2 ]
  call void @llvm.assume(i1 %p)
  ret i1 %p
}

declare i1 @llvm.type.test(i8*, metadata) nounwind readnone
declare void @llvm.assume(i1)
> ra/bin/opt -module-summary foo.ll -o foo.bc && ra/bin/llvm-dis -o - foo.bc
; ModuleID = 'foo.bc'
source_filename = "foo.ll"

define i1 @f() {
  br i1 undef, label %bb1, label %bb2

bb1:                                              ; preds = %0
  %p1 = call i1 @llvm.type.test(i8* null, metadata !"foo")
  br label %end

bb2:                                              ; preds = %0
  %p2 = call i1 @llvm.type.test(i8* null, metadata !"bar")
  br label %end

end:                                              ; preds = %bb2, %bb1
  %p = phi i1 [ %p1, %bb1 ], [ %p2, %bb2 ]
  call void @llvm.assume(i1 %p)
  ret i1 %p
}

; Function Attrs: nocallback nofree nosync nounwind readnone speculatable willreturn
declare i1 @llvm.type.test(i8*, metadata) #0

; Function Attrs: inaccessiblememonly nocallback nofree nosync nounwind willreturn
declare void @llvm.assume(i1 noundef) #1

attributes #0 = { nocallback nofree nosync nounwind readnone speculatable willreturn }
attributes #1 = { inaccessiblememonly nocallback nofree nosync nounwind willreturn }

^0 = module: (path: "foo.bc", hash: (0, 0, 0, 0, 0))
^1 = gv: (name: "llvm.type.test") ; guid = 608142985856744218
^2 = gv: (name: "llvm.assume") ; guid = 6385187066495850096
^3 = gv: (name: "f", summaries: (function: (module: ^0, flags: (linkage: external, visibility: default, notEligibleToImport: 0, live: 0, dsoLocal: 0, canAutoHide: 0), insts: 8, typeIdInfo: (typeTests: (6699318081062747564, 16434608426314478903))))) ; guid = 14740650423002898831
^4 = blockcount: 4

This is the expected output. Note the appearance of typeTests in the summary, which means that a llvm.type.test was found that was not directly used by llvm.assume. This triggers the creation of a TTRes in the combined summary for foo and bar, as opposed to a WPDRes (you can generate the combined summary with -lowertypetests-write-summary, see e.g. llvm/test/Transforms/LowerTypeTests/export-single.ll). I'm guessing that in your case you are somehow ending up with typeTestAssumes in the summary for these type tests instead of typeTests, or that something else is preventing the TTRes from being created or causing a WPDRes to be created instead.

This case is supposed to be handled as a result of running the WPD pass early. See the comment near the top of PassBuilder::buildThinLTODefaultPipeline:

// These passes import type identifier resolutions for whole-program
// devirtualization and CFI. They must run early because other passes may
// disturb the specific instruction patterns that these passes look for,
// creating dependencies on resolutions that may not appear in the summary.
//
// For example, GVN may transform the pattern assume(type.test) appearing in
// two basic blocks into assume(phi(type.test, type.test)), which would
// transform a dependency on a WPD resolution into a dependency on a type
// identifier resolution for CFI.
//
// Also, WPD has access to more precise information than ICP and can
// devirtualize more effectively, so it should operate on the IR first.
//
// The WPD and LowerTypeTest passes need to run at -O0 to lower type
// metadata and intrinsics.

Is there some way that we can now end up with GVN or some other such pass running between when we create the summary and when we run WPD that's inserting a phi?

I thought the frontend emitted the type test/assume. I'm seeing them in the IR after the pre link step. CodeGenFunction::EmitTypeMetadataCodeForVCall() in clang emits a type test/assume.

Right, but the phi must be coming from somewhere else after summary generation if it isn't accounted for in the summary (if the phi was inserted before summary generation, the summary should end up with a dependency on a CFI type identifier resolution, instead of a WPD resolution). Can you trace where it is coming from?

I'm not super familiar with ThinLTO summaries, could you explain a bit more?
The assume(phi(type test, type test)) is coming from the output of the pre link. Then post link WPD looks at all type test usages and I think LTT does the same? So not sure where the summary is relevant.

The summary is meant to control what actions are done by the LTT and WPD passes later. For example here is the summary with an assume(phi(type.test, type.test)) which simulates what you told me that the phi comes from the output of the pre link:

> cat foo.ll 
define i1 @f() {
  br i1 undef, label %bb1, label %bb2

bb1:
  %p1 = call i1 @llvm.type.test(i8* null, metadata !"foo")
  br label %end

bb2:
  %p2 = call i1 @llvm.type.test(i8* null, metadata !"bar")
  br label %end

end:
  %p = phi i1 [ %p1, %bb1 ], [ %p2, %bb2 ]
  call void @llvm.assume(i1 %p)
  ret i1 %p
}

declare i1 @llvm.type.test(i8*, metadata) nounwind readnone
declare void @llvm.assume(i1)
> ra/bin/opt -module-summary foo.ll -o foo.bc && ra/bin/llvm-dis -o - foo.bc
; ModuleID = 'foo.bc'
source_filename = "foo.ll"

define i1 @f() {
  br i1 undef, label %bb1, label %bb2

bb1:                                              ; preds = %0
  %p1 = call i1 @llvm.type.test(i8* null, metadata !"foo")
  br label %end

bb2:                                              ; preds = %0
  %p2 = call i1 @llvm.type.test(i8* null, metadata !"bar")
  br label %end

end:                                              ; preds = %bb2, %bb1
  %p = phi i1 [ %p1, %bb1 ], [ %p2, %bb2 ]
  call void @llvm.assume(i1 %p)
  ret i1 %p
}

; Function Attrs: nocallback nofree nosync nounwind readnone speculatable willreturn
declare i1 @llvm.type.test(i8*, metadata) #0

; Function Attrs: inaccessiblememonly nocallback nofree nosync nounwind willreturn
declare void @llvm.assume(i1 noundef) #1

attributes #0 = { nocallback nofree nosync nounwind readnone speculatable willreturn }
attributes #1 = { inaccessiblememonly nocallback nofree nosync nounwind willreturn }

^0 = module: (path: "foo.bc", hash: (0, 0, 0, 0, 0))
^1 = gv: (name: "llvm.type.test") ; guid = 608142985856744218
^2 = gv: (name: "llvm.assume") ; guid = 6385187066495850096
^3 = gv: (name: "f", summaries: (function: (module: ^0, flags: (linkage: external, visibility: default, notEligibleToImport: 0, live: 0, dsoLocal: 0, canAutoHide: 0), insts: 8, typeIdInfo: (typeTests: (6699318081062747564, 16434608426314478903))))) ; guid = 14740650423002898831
^4 = blockcount: 4

This is the expected output. Note the appearance of typeTests in the summary, which means that a llvm.type.test was found that was not directly used by llvm.assume. This triggers the creation of a TTRes in the combined summary for foo and bar, as opposed to a WPDRes (you can generate the combined summary with -lowertypetests-write-summary, see e.g. llvm/test/Transforms/LowerTypeTests/export-single.ll). I'm guessing that in your case you are somehow ending up with typeTestAssumes in the summary for these type tests instead of typeTests, or that something else is preventing the TTRes from being created or causing a WPDRes to be created instead.

I don't think the WPDRes is relevant here? If you look at the code that is attempting to remove the type test assumes, in DevirtModule::scanTypeTests(), it is looking at the IR not the resolutions (in findDevirtualizableCallsForTypeTest at https://github.com/llvm/llvm-project/blob/09c2b7c35af8c4bad39f03e9f60df8bd07323028/llvm/lib/Analysis/TypeMetadataUtils.cpp#L74-L91), then attempting to remove the sequences from the IR (https://github.com/llvm/llvm-project/blob/59afc4038b1096bc6fea7b322091f6e5e2dc0b38/llvm/lib/Transforms/IPO/WholeProgramDevirt.cpp#L1924-L1929). Before my change to try to keep some of these around for later use, this latter block of code was done unconditionally, and I found I still needed to remove some that would get Unsat resolutions in LTT. Unfortunately because of the phi in this case, the removal is not successful without this patch.

I'm not sure how opaque pointers suddenly started triggering this issue, however. @aeubanks, any ideas as to what change opaque pointers provokes that enables the assume merging?

It's this bit of code: https://crsrc.org/c/third_party/dawn/src/dawn/native/d3d12/ShaderModuleD3D12.cpp;drc=d6b2501be2351fc048727d79acc71f3780d89cf2;l=813

with opaque pointers we're able to sink more common code out of the two branches

typed pointers after pre-link:

; Function Attrs: mustprogress nounwind null_pointer_is_valid optsize sspstrong uwtable
define dso_local void @"?GetD3D12ShaderBytecode@CompiledShader@d3d12@native@dawn@@QEBA?AUD3D12_SHADER_BYTECODE@@XZ"(%474* nocapture noundef readonly align 8 dereferenceable_or_null(24) %0, %738* 
noalias nocapture writeonly sret(%738) align 8 %1) local_unnamed_addr #4 align 2 !type !184 {
  %3 = getelementptr inbounds %474, %474* %0, i64 0, i32 0, i32 0
  %4 = load %476*, %476** %3, align 8
  %5 = icmp eq %476* %4, null
  br i1 %5, label %22, label %6

6:                                                ; preds = %2
  %7 = bitcast %476* %4 to i8* (%476*)***
  %8 = load i8* (%476*)**, i8* (%476*)*** %7, align 8
  %9 = bitcast i8* (%476*)** %8 to i8*
  %10 = tail call i1 @llvm.type.test(i8* %9, metadata !"?AUID3D10Blob@@")
  tail call void @llvm.assume(i1 %10)
  %11 = getelementptr inbounds i8* (%476*)*, i8* (%476*)** %8, i64 3
  %12 = load i8* (%476*)*, i8* (%476*)** %11, align 8
  %13 = tail call noundef i8* %12(%476* noundef nonnull align 8 dereferenceable_or_null(8) %4) #16
  %14 = load %476*, %476** %3, align 8
  %15 = bitcast %476* %14 to i64 (%476*)***
  %16 = load i64 (%476*)**, i64 (%476*)*** %15, align 8
  %17 = bitcast i64 (%476*)** %16 to i8*
  %18 = tail call i1 @llvm.type.test(i8* %17, metadata !"?AUID3D10Blob@@")
  tail call void @llvm.assume(i1 %18)
  %19 = getelementptr inbounds i64 (%476*)*, i64 (%476*)** %16, i64 4
  %20 = load i64 (%476*)*, i64 (%476*)** %19, align 8
  %21 = tail call noundef i64 %20(%476* noundef align 8 dereferenceable_or_null(8) %14) #16
  br label %40

22:                                               ; preds = %2
  %23 = getelementptr inbounds %474, %474* %0, i64 0, i32 1, i32 0
  %24 = load %478*, %478** %23, align 8, !nonnull !185
  %25 = bitcast %478* %24 to i8* (%478*)***
  %26 = load i8* (%478*)**, i8* (%478*)*** %25, align 8
  %27 = bitcast i8* (%478*)** %26 to i8*
  %28 = tail call i1 @llvm.type.test(i8* %27, metadata !"?AUIDxcBlob@@")
  tail call void @llvm.assume(i1 %28)
  %29 = getelementptr inbounds i8* (%478*)*, i8* (%478*)** %26, i64 3
  %30 = load i8* (%478*)*, i8* (%478*)** %29, align 8
  %31 = tail call noundef i8* %30(%478* noundef nonnull align 8 dereferenceable_or_null(8) %24) #16
  %32 = load %478*, %478** %23, align 8
  %33 = bitcast %478* %32 to i64 (%478*)***
  %34 = load i64 (%478*)**, i64 (%478*)*** %33, align 8
  %35 = bitcast i64 (%478*)** %34 to i8*
  %36 = tail call i1 @llvm.type.test(i8* %35, metadata !"?AUIDxcBlob@@")
  tail call void @llvm.assume(i1 %36)
  %37 = getelementptr inbounds i64 (%478*)*, i64 (%478*)** %34, i64 4
  %38 = load i64 (%478*)*, i64 (%478*)** %37, align 8
  %39 = tail call noundef i64 %38(%478* noundef align 8 dereferenceable_or_null(8) %32) #16
  br label %40

40:                                               ; preds = %22, %6
  %41 = phi i8* [ %31, %22 ], [ %13, %6 ]
  %42 = phi i64 [ %39, %22 ], [ %21, %6 ]
  %43 = getelementptr inbounds %738, %738* %1, i64 0, i32 0
  store i8* %41, i8** %43, align 8
  %44 = getelementptr inbounds %738, %738* %1, i64 0, i32 1
  store i64 %42, i64* %44, align 8
  ret void
}

opaque pointers after pre-link

; Function Attrs: mustprogress nounwind null_pointer_is_valid optsize sspstrong uwtable
define dso_local void @"?GetD3D12ShaderBytecode@CompiledShader@d3d12@native@dawn@@QEBA?AUD3D12_SHADER_BYTECODE@@XZ"(ptr nocapture noundef readonly align 8 dereferenceable_or_null(24) %0, ptr noalias nocapture writeonly sret(%435) align 8 %1) local_unnamed_addr #4 align 2 !type !184 {
  %3 = load ptr, ptr %0, align 8
  %4 = icmp eq ptr %3, null
  br i1 %4, label %14, label %5

5:                                                ; preds = %2
  %6 = load ptr, ptr %3, align 8
  %7 = tail call i1 @llvm.type.test(ptr %6, metadata !"?AUID3D10Blob@@")
  tail call void @llvm.assume(i1 %7)
  %8 = getelementptr inbounds ptr, ptr %6, i64 3
  %9 = load ptr, ptr %8, align 8
  %10 = tail call noundef ptr %9(ptr noundef nonnull align 8 dereferenceable_or_null(8) %3) #16
  store ptr %10, ptr %1, align 8
  %11 = load ptr, ptr %0, align 8
  %12 = load ptr, ptr %11, align 8
  %13 = tail call i1 @llvm.type.test(ptr %12, metadata !"?AUID3D10Blob@@")
  br label %25

14:                                               ; preds = %2
  %15 = getelementptr inbounds %157, ptr %0, i64 0, i32 1
  %16 = load ptr, ptr %15, align 8, !nonnull !185
  %17 = load ptr, ptr %16, align 8
  %18 = tail call i1 @llvm.type.test(ptr %17, metadata !"?AUIDxcBlob@@")
  tail call void @llvm.assume(i1 %18)
  %19 = getelementptr inbounds ptr, ptr %17, i64 3
  %20 = load ptr, ptr %19, align 8
  %21 = tail call noundef ptr %20(ptr noundef nonnull align 8 dereferenceable_or_null(8) %16) #16
  store ptr %21, ptr %1, align 8
  %22 = load ptr, ptr %15, align 8
  %23 = load ptr, ptr %22, align 8
  %24 = tail call i1 @llvm.type.test(ptr %23, metadata !"?AUIDxcBlob@@")
  br label %25

25:                                               ; preds = %14, %5
  %26 = phi i1 [ %24, %14 ], [ %13, %5 ]
  %27 = phi ptr [ %23, %14 ], [ %12, %5 ]
  %28 = phi ptr [ %22, %14 ], [ %11, %5 ]
  tail call void @llvm.assume(i1 %26)
  %29 = getelementptr inbounds ptr, ptr %27, i64 4
  %30 = load ptr, ptr %29, align 8
  %31 = tail call noundef i64 %30(ptr noundef align 8 dereferenceable_or_null(8) %28) #16
  %32 = getelementptr inbounds %435, ptr %1, i64 0, i32 1
  store i64 %31, ptr %32, align 8
  ret void
}

ping

I believe tejohnson's comment is correct, the index shouldn't be involved here based on the actual WPD code

ping
can we land this to resolve the miscompile?

C++ repro

$ cat /tmp/a.cc
struct [[clang::lto_visibility_public]] A {
        virtual void f();
};
struct [[clang::lto_visibility_public]] B {
        virtual void f();
};

void f(A*a, B*b, bool c) {
        if (c) a->f();
        else b->f();
}
# clang/lld with the following patch since I don't know how to pass -mllvm flags to lld
diff --git a/llvm/lib/Passes/StandardInstrumentations.cpp b/llvm/lib/Passes/StandardInstrumentations.cpp
index 4cd2a1ad51e3..e3d861ab60d9 100644
--- a/llvm/lib/Passes/StandardInstrumentations.cpp
+++ b/llvm/lib/Passes/StandardInstrumentations.cpp
@@ -93,7 +93,7 @@ enum class ChangePrinter {
 };
 static cl::opt<ChangePrinter> PrintChanged(
     "print-changed", cl::desc("Print changed IRs"), cl::Hidden,
-    cl::ValueOptional, cl::init(ChangePrinter::NoChangePrinter),
+    cl::ValueOptional, cl::init(ChangePrinter::PrintChangedQuiet),
     cl::values(
         clEnumValN(ChangePrinter::PrintChangedQuiet, "quiet",
                    "Run in quiet mode"),

$ ./build/rel/bin/clang++ -O2 -flto=thin -fwhole-program-vtables /tmp/a.cc -shared -o /dev/null -fuse-ld=lld
# shows that the function becomes just an `unreachable`
pcc requested changes to this revision.Wed, Jun 8, 9:14 PM

Thanks for the reproducer!

I think the root cause of the issue here is that the IR generated by Clang is wrong. llvm.type.test will only consider types that are local to the LTO unit, which means that it is incorrect to emit llvm.assume(llvm.type.test) for types A or B, because they have public LTO visibility, which means that definitions may reside outside of the LTO unit. The LowerTypeTests pass is therefore correctly replacing the llvm.type.tests with false, leading to the miscompilation seen here. The issue can also be seen if the [[clang::lto_visibility_public]] attribute is removed.

I gather from looking at the commit history that the intent was to express information about the likelihood of particular calls being made as input to the ICP pass via these intrinsics (although it appears that this work was never completed as I can't find any code in the ICP pass that looks at these intrinsics). I think it would be inappropriate to use llvm.assume(llvm.type.test) for this purpose. Instead, in the case where the type has public LTO visibility, the IR should express a strong likelihood that the llvm.type.test will evaluate to true, for example by using the llvm.expect intrinsic. Then even if an optimization like this were to occur, the IR would still boil down to llvm.expect(false), which would not cause a miscompilation.

This revision now requires changes to proceed.Wed, Jun 8, 9:14 PM

Thanks for the reproducer!

I think the root cause of the issue here is that the IR generated by Clang is wrong. llvm.type.test will only consider types that are local to the LTO unit, which means that it is incorrect to emit llvm.assume(llvm.type.test) for types A or B, because they have public LTO visibility, which means that definitions may reside outside of the LTO unit. The LowerTypeTests pass is therefore correctly replacing the llvm.type.tests with false, leading to the miscompilation seen here. The issue can also be seen if the [[clang::lto_visibility_public]] attribute is removed.

This is by design per the changes I made awhile back to delay the decision about visibility until LTO link time: https://groups.google.com/g/llvm-dev/c/6LfIiAo9g68/m/Ng1Xvw4WAwAJ
They should have public !vcall_visibility metadata by default, and since whole program visibility isn't being asserted here, there should be no devirt, etc.

I gather from looking at the commit history that the intent was to express information about the likelihood of particular calls being made as input to the ICP pass via these intrinsics (although it appears that this work was never completed as I can't find any code in the ICP pass that looks at these intrinsics).

Right, I have prototyped changes to allow this to guide insertion of ICP guard sequences to avoid an extra load (compare against possible vtables, not virtual function defs). I've unfortunately never had the bandwidth to clean that up and send for review, but I should carve out the time to get that upstreamed at least under an option. But note I don't think this is the cause of the failure - the code in WPD to remove these sequences (the assumes, and any type tests with no more uses) existed before I split LTT for the ICP purpose, so even without my changes to keep in some of them we would have left behind these cases incorrectly - currently we are properly detecting here that the assume(type.test) needs to be removed because LTT won't handle it as desired, but the phi is blocking the type test removal.

I think it would be inappropriate to use llvm.assume(llvm.type.test) for this purpose. Instead, in the case where the type has public LTO visibility, the IR should express a strong likelihood that the llvm.type.test will evaluate to true, for example by using the llvm.expect intrinsic. Then even if an optimization like this were to occur, the IR would still boil down to llvm.expect(false), which would not cause a miscompilation.

Again, this is related to the RFC linked above, and I believe when we discussed it back then you were in favor of using the same llvm.assume(llvm.type.test) sequences. Which isn't to say that it couldn't be changed, but I think the !vcall_visibility metadata handling should be enough to guide this appropriately? I think the issue here is still just that it isn't properly being cleaned up when there is a phi involved.

pcc added a comment.Thu, Jun 9, 12:48 PM

Thanks for the reproducer!

I think the root cause of the issue here is that the IR generated by Clang is wrong. llvm.type.test will only consider types that are local to the LTO unit, which means that it is incorrect to emit llvm.assume(llvm.type.test) for types A or B, because they have public LTO visibility, which means that definitions may reside outside of the LTO unit. The LowerTypeTests pass is therefore correctly replacing the llvm.type.tests with false, leading to the miscompilation seen here. The issue can also be seen if the [[clang::lto_visibility_public]] attribute is removed.

This is by design per the changes I made awhile back to delay the decision about visibility until LTO link time: https://groups.google.com/g/llvm-dev/c/6LfIiAo9g68/m/Ng1Xvw4WAwAJ
They should have public !vcall_visibility metadata by default, and since whole program visibility isn't being asserted here, there should be no devirt, etc.

The problem with this design is that it leaves the meaning of llvm.type.test undefined in the case where there are no globals that are a member of the type, because the !vcall_visibility is associated with the global. So there's no way to know whether the type argument was intended to refer to a type with hidden LTO visibility or one with public LTO visibility, if there are no member globals.

Since -lto-whole-program-visibility is a flag that affects the operation of the type tests globally, I think a better design would be to replace all llvm.type.test calls with true, and replaces llvm.type.checked.load with a regular load, if the flag is not passed.

I gather from looking at the commit history that the intent was to express information about the likelihood of particular calls being made as input to the ICP pass via these intrinsics (although it appears that this work was never completed as I can't find any code in the ICP pass that looks at these intrinsics).

Right, I have prototyped changes to allow this to guide insertion of ICP guard sequences to avoid an extra load (compare against possible vtables, not virtual function defs). I've unfortunately never had the bandwidth to clean that up and send for review, but I should carve out the time to get that upstreamed at least under an option. But note I don't think this is the cause of the failure - the code in WPD to remove these sequences (the assumes, and any type tests with no more uses) existed before I split LTT for the ICP purpose, so even without my changes to keep in some of them we would have left behind these cases incorrectly - currently we are properly detecting here that the assume(type.test) needs to be removed because LTT won't handle it as desired, but the phi is blocking the type test removal.

To me that sounds like WPD is taking broken IR and attempting to "fix" it before it reaches another pass. That doesn't make for a particularly sound design because WPD would need to be able to detect all possible cases of broken IR in the presence of arbitrary optimization passes between the producer and WPD.

I think it would be inappropriate to use llvm.assume(llvm.type.test) for this purpose. Instead, in the case where the type has public LTO visibility, the IR should express a strong likelihood that the llvm.type.test will evaluate to true, for example by using the llvm.expect intrinsic. Then even if an optimization like this were to occur, the IR would still boil down to llvm.expect(false), which would not cause a miscompilation.

Again, this is related to the RFC linked above, and I believe when we discussed it back then you were in favor of using the same llvm.assume(llvm.type.test) sequences.

I'm not sure if that was my position, but perhaps I just hadn't noticed the issue with interaction with other passes.

Which isn't to say that it couldn't be changed, but I think the !vcall_visibility metadata handling should be enough to guide this appropriately? I think the issue here is still just that it isn't properly being cleaned up when there is a phi involved.

I think that just dealing with the phi is papering over the root cause of the issue. There's nothing preventing an optimization pass from inserting something else between the type test and the assume and reintroducing the same issue.

Since -lto-whole-program-visibility is a flag that affects the operation of the type tests globally, I think a better design would be to replace all llvm.type.test calls with true, and replaces llvm.type.checked.load with a regular load, if the flag is not passed.

My original patch (https://reviews.llvm.org/D126089?id=431037) replaced llvm.type.test with true in all cases, but tejohnson said

Regarding the quoted comment, note it is "type test assume" sequences, not all type tests, that need to be removed here in certain circumstances. Note also that before I made the change to try to keep these around longer for use by other analyses, this code attempted to remove all "type test assume" sequences. I.e. it removed all the analyzed assumes, and any type test that was subsequently dead. Presumably that version of WPD+LTT would have also illustrated the original problem this patch tries to address (an assume fed by a phi, in turn fed by a type test that got an Unsat and resulted in llvm.assume(false)), and for a WPD+CFI build we certainly wouldn't want to replace *all* type test uses in the code at this point with true. So I am skeptical that it is correct to replace type test non-assume (non-phi) uses with true in the more limited cases being handled currently either.

Thanks for the reproducer!

I think the root cause of the issue here is that the IR generated by Clang is wrong. llvm.type.test will only consider types that are local to the LTO unit, which means that it is incorrect to emit llvm.assume(llvm.type.test) for types A or B, because they have public LTO visibility, which means that definitions may reside outside of the LTO unit. The LowerTypeTests pass is therefore correctly replacing the llvm.type.tests with false, leading to the miscompilation seen here. The issue can also be seen if the [[clang::lto_visibility_public]] attribute is removed.

This is by design per the changes I made awhile back to delay the decision about visibility until LTO link time: https://groups.google.com/g/llvm-dev/c/6LfIiAo9g68/m/Ng1Xvw4WAwAJ
They should have public !vcall_visibility metadata by default, and since whole program visibility isn't being asserted here, there should be no devirt, etc.

The problem with this design is that it leaves the meaning of llvm.type.test undefined in the case where there are no globals that are a member of the type, because the !vcall_visibility is associated with the global. So there's no way to know whether the type argument was intended to refer to a type with hidden LTO visibility or one with public LTO visibility, if there are no member globals.

Since -lto-whole-program-visibility is a flag that affects the operation of the type tests globally, I think a better design would be to replace all llvm.type.test calls with true, and replaces llvm.type.checked.load with a regular load, if the flag is not passed.

We only perform WPD if there is a global, however. And then we subsequently remove these sequences if there is no global for a type id [1].

Maybe I am misunderstanding the concern though, can you give an example where this will go wrong?

Also, is it the case that we can only get into the case [1], where the type id is not used on a global, with my changes to insert type test for non-hidden visibility?

[1] https://github.com/llvm/llvm-project/blob/59afc4038b1096bc6fea7b322091f6e5e2dc0b38/llvm/lib/Transforms/IPO/WholeProgramDevirt.cpp#L1942-L1945

Another case I found where I had to keep the removal of the type test assume sequences is described in [2]. I'm not sure that has anything to do with the visibility.

[2] https://github.com/llvm/llvm-project/blob/59afc4038b1096bc6fea7b322091f6e5e2dc0b38/llvm/lib/Transforms/IPO/WholeProgramDevirt.cpp#L1947-L1965

I gather from looking at the commit history that the intent was to express information about the likelihood of particular calls being made as input to the ICP pass via these intrinsics (although it appears that this work was never completed as I can't find any code in the ICP pass that looks at these intrinsics).

Right, I have prototyped changes to allow this to guide insertion of ICP guard sequences to avoid an extra load (compare against possible vtables, not virtual function defs). I've unfortunately never had the bandwidth to clean that up and send for review, but I should carve out the time to get that upstreamed at least under an option. But note I don't think this is the cause of the failure - the code in WPD to remove these sequences (the assumes, and any type tests with no more uses) existed before I split LTT for the ICP purpose, so even without my changes to keep in some of them we would have left behind these cases incorrectly - currently we are properly detecting here that the assume(type.test) needs to be removed because LTT won't handle it as desired, but the phi is blocking the type test removal.

To me that sounds like WPD is taking broken IR and attempting to "fix" it before it reaches another pass. That doesn't make for a particularly sound design because WPD would need to be able to detect all possible cases of broken IR in the presence of arbitrary optimization passes between the producer and WPD.

I think it would be inappropriate to use llvm.assume(llvm.type.test) for this purpose. Instead, in the case where the type has public LTO visibility, the IR should express a strong likelihood that the llvm.type.test will evaluate to true, for example by using the llvm.expect intrinsic. Then even if an optimization like this were to occur, the IR would still boil down to llvm.expect(false), which would not cause a miscompilation.

Again, this is related to the RFC linked above, and I believe when we discussed it back then you were in favor of using the same llvm.assume(llvm.type.test) sequences.

I'm not sure if that was my position, but perhaps I just hadn't noticed the issue with interaction with other passes.

Which isn't to say that it couldn't be changed, but I think the !vcall_visibility metadata handling should be enough to guide this appropriately? I think the issue here is still just that it isn't properly being cleaned up when there is a phi involved.

I think that just dealing with the phi is papering over the root cause of the issue. There's nothing preventing an optimization pass from inserting something else between the type test and the assume and reintroducing the same issue.

pcc added a comment.Thu, Jun 9, 2:29 PM

Since -lto-whole-program-visibility is a flag that affects the operation of the type tests globally, I think a better design would be to replace all llvm.type.test calls with true, and replaces llvm.type.checked.load with a regular load, if the flag is not passed.

My original patch (https://reviews.llvm.org/D126089?id=431037) replaced llvm.type.test with true in all cases, but tejohnson said

Regarding the quoted comment, note it is "type test assume" sequences, not all type tests, that need to be removed here in certain circumstances. Note also that before I made the change to try to keep these around longer for use by other analyses, this code attempted to remove all "type test assume" sequences. I.e. it removed all the analyzed assumes, and any type test that was subsequently dead. Presumably that version of WPD+LTT would have also illustrated the original problem this patch tries to address (an assume fed by a phi, in turn fed by a type test that got an Unsat and resulted in llvm.assume(false)), and for a WPD+CFI build we certainly wouldn't want to replace *all* type test uses in the code at this point with true. So I am skeptical that it is correct to replace type test non-assume (non-phi) uses with true in the more limited cases being handled currently either.

Oh right, I managed to confuse myself about the semantics of not passing -lto-whole-program-visibility. It doesn't have the effect of giving all classes public LTO visibility, just those which were declared with public LTO visibility. The former would have some undesirable consequences for CFI.

So I think that means that WPD needs to be using a separate intrinsic for checks on classes with public LTO visibility. That would get replaced with llvm.type.test if -lto-whole-program-visibility is passed, and true if it is not.

pcc added a comment.Thu, Jun 9, 2:40 PM

Thanks for the reproducer!

I think the root cause of the issue here is that the IR generated by Clang is wrong. llvm.type.test will only consider types that are local to the LTO unit, which means that it is incorrect to emit llvm.assume(llvm.type.test) for types A or B, because they have public LTO visibility, which means that definitions may reside outside of the LTO unit. The LowerTypeTests pass is therefore correctly replacing the llvm.type.tests with false, leading to the miscompilation seen here. The issue can also be seen if the [[clang::lto_visibility_public]] attribute is removed.

This is by design per the changes I made awhile back to delay the decision about visibility until LTO link time: https://groups.google.com/g/llvm-dev/c/6LfIiAo9g68/m/Ng1Xvw4WAwAJ
They should have public !vcall_visibility metadata by default, and since whole program visibility isn't being asserted here, there should be no devirt, etc.

The problem with this design is that it leaves the meaning of llvm.type.test undefined in the case where there are no globals that are a member of the type, because the !vcall_visibility is associated with the global. So there's no way to know whether the type argument was intended to refer to a type with hidden LTO visibility or one with public LTO visibility, if there are no member globals.

Since -lto-whole-program-visibility is a flag that affects the operation of the type tests globally, I think a better design would be to replace all llvm.type.test calls with true, and replaces llvm.type.checked.load with a regular load, if the flag is not passed.

We only perform WPD if there is a global, however. And then we subsequently remove these sequences if there is no global for a type id [1].

Maybe I am misunderstanding the concern though, can you give an example where this will go wrong?

I think Arthur's reproducer is exactly an example of where this will go wrong. Any IR which is not in the exact pattern expected by WPD (phi is just one example but who knows what passes will do in the future) will cause WPD to leave the IR in an incorrect form that will cause problems for LTT.

Also, is it the case that we can only get into the case [1], where the type id is not used on a global, with my changes to insert type test for non-hidden visibility?

[1] https://github.com/llvm/llvm-project/blob/59afc4038b1096bc6fea7b322091f6e5e2dc0b38/llvm/lib/Transforms/IPO/WholeProgramDevirt.cpp#L1942-L1945

Another case I found where I had to keep the removal of the type test assume sequences is described in [2]. I'm not sure that has anything to do with the visibility.

[2] https://github.com/llvm/llvm-project/blob/59afc4038b1096bc6fea7b322091f6e5e2dc0b38/llvm/lib/Transforms/IPO/WholeProgramDevirt.cpp#L1947-L1965

I strongly suspect that we should be able to remove the code handling both of those cases if we fix the root cause here.

tejohnson added a comment.EditedFri, Jun 10, 6:19 AM

Thanks for the reproducer!

I think the root cause of the issue here is that the IR generated by Clang is wrong. llvm.type.test will only consider types that are local to the LTO unit, which means that it is incorrect to emit llvm.assume(llvm.type.test) for types A or B, because they have public LTO visibility, which means that definitions may reside outside of the LTO unit. The LowerTypeTests pass is therefore correctly replacing the llvm.type.tests with false, leading to the miscompilation seen here. The issue can also be seen if the [[clang::lto_visibility_public]] attribute is removed.

This is by design per the changes I made awhile back to delay the decision about visibility until LTO link time: https://groups.google.com/g/llvm-dev/c/6LfIiAo9g68/m/Ng1Xvw4WAwAJ
They should have public !vcall_visibility metadata by default, and since whole program visibility isn't being asserted here, there should be no devirt, etc.

The problem with this design is that it leaves the meaning of llvm.type.test undefined in the case where there are no globals that are a member of the type, because the !vcall_visibility is associated with the global. So there's no way to know whether the type argument was intended to refer to a type with hidden LTO visibility or one with public LTO visibility, if there are no member globals.

Since -lto-whole-program-visibility is a flag that affects the operation of the type tests globally, I think a better design would be to replace all llvm.type.test calls with true, and replaces llvm.type.checked.load with a regular load, if the flag is not passed.

We only perform WPD if there is a global, however. And then we subsequently remove these sequences if there is no global for a type id [1].

Maybe I am misunderstanding the concern though, can you give an example where this will go wrong?

I think Arthur's reproducer is exactly an example of where this will go wrong. Any IR which is not in the exact pattern expected by WPD (phi is just one example but who knows what passes will do in the future) will cause WPD to leave the IR in an incorrect form that will cause problems for LTT.

Ok, I wasn't sure if you meant that or another broader concern about having no globals associated with the type. I think what you are saying is that the handling in [1] is needed as a result of the visibility change I made, i.e. the visibility change is why we need to still try to remove those assume(type.test) sequences, and thus hit the issue with this IR pattern - is that correct?

Also, is it the case that we can only get into the case [1], where the type id is not used on a global, with my changes to insert type test for non-hidden visibility?

[1] https://github.com/llvm/llvm-project/blob/59afc4038b1096bc6fea7b322091f6e5e2dc0b38/llvm/lib/Transforms/IPO/WholeProgramDevirt.cpp#L1942-L1945

Another case I found where I had to keep the removal of the assume(type.test) sequences is described in [2]. I'm not sure that has anything to do with the visibility.

[2] https://github.com/llvm/llvm-project/blob/59afc4038b1096bc6fea7b322091f6e5e2dc0b38/llvm/lib/Transforms/IPO/WholeProgramDevirt.cpp#L1947-L1965

I strongly suspect that we should be able to remove the code handling both of those cases if we fix the root cause here.

I'm not convinced that we cannot hit either of these situations, particularly [2], in cases with hidden visibility, and I'd like to confirm whether or not that's the case before changing the IR to add new intrinsics.
How about this, I propose:

  1. Approve this patch, since it makes things strictly better and unblocks the miscompile. While I suppose we could in theory end up with other types of instructions between the type test and the assume use, I'm not sure in practice what that could be beyond a phi from assume merging?
  2. I'll investigate more to see if I can prove or disprove that we have to remove assume(type.test) for one of the existing reasons ([1] or [2]) for cases with hidden visibility, and then we'll go from there.

FTR, I don't think this issue has anything to do with the splitting of LTT. Recall that we were also trying to remove *all* the assume(type.test) sequences before that change (D73242), the question is just whether the couple instances where I found we still needed to remove them ([1] and [2]) are due to the earlier visibility change (D71913) or not.

(fyi made a couple of slight edits to my last comment for clarity)

is this patch ok to submit at least as a short term workaround?

is this patch ok to submit at least as a short term workaround?

lgtm, per my last comment. But @pcc needs to confirm since he has requested changes on the patch.

But can you add some comments to the test to note which cases it is checking?

aeubanks updated this revision to Diff 436836.Tue, Jun 14, 9:59 AM

add comments to test

pcc added a comment.Tue, Jun 14, 10:00 AM

is this patch ok to submit at least as a short term workaround?

Chrome already has a short-term workaround, which is to disable opaque pointers, right? I think you can use that until the proper fix is developed.

is this patch ok to submit at least as a short term workaround?

Chrome already has a short-term workaround, which is to disable opaque pointers, right? I think you can use that until the proper fix is developed.

As noted, I'm really not convinced that we cannot hit this issue for hidden visibility patches.

However, another reason to submit this particular patch is that it is merely enabling WPD in additional cases, when there is a phi between the type test and assume, by recognizing more assumes. I would imagine we will still have cases like that even for hidden visibility classes, or with your proposed fix to use a new TBD intrinsic that is converted to an llvm.type.test in the case of --lto-whole-program-visibility, right? So in either case this seems to me to be an improvement.

pcc added a comment.Tue, Jun 14, 12:24 PM

is this patch ok to submit at least as a short term workaround?

Chrome already has a short-term workaround, which is to disable opaque pointers, right? I think you can use that until the proper fix is developed.

As noted, I'm really not convinced that we cannot hit this issue for hidden visibility patches.

However, another reason to submit this particular patch is that it is merely enabling WPD in additional cases, when there is a phi between the type test and assume, by recognizing more assumes. I would imagine we will still have cases like that even for hidden visibility classes, or with your proposed fix to use a new TBD intrinsic that is converted to an llvm.type.test in the case of --lto-whole-program-visibility, right? So in either case this seems to me to be an improvement.

For hidden LTO visibility classes the set of classes in the LTO unit is exactly the same as the set of classes which is possible at runtime, which is not the case for public LTO visibility classes. So if the type test gets detached from the assume by one of the intervening optimization passes, instead of ending up with assume(false) you will get assume(useless CFI type check), assuming that there's at least one possible class. And that's harmless from a correctness perspective.

is this patch ok to submit at least as a short term workaround?

Chrome already has a short-term workaround, which is to disable opaque pointers, right? I think you can use that until the proper fix is developed.

As upstream tests move away from testing typed pointers, I'm worried that pinning to typed pointers will cause regressions that we don't want to spend time investigating. I'm not sure how long a redesign/rewrite would take so I'd like to get Chrome off of typed pointers soonish.

As noted, I'm really not convinced that we cannot hit this issue for hidden visibility patches.

However, another reason to submit this particular patch is that it is merely enabling WPD in additional cases, when there is a phi between the type test and assume, by recognizing more assumes. I would imagine we will still have cases like that even for hidden visibility classes, or with your proposed fix to use a new TBD intrinsic that is converted to an llvm.type.test in the case of --lto-whole-program-visibility, right? So in either case this seems to me to be an improvement.

For hidden LTO visibility classes the set of classes in the LTO unit is exactly the same as the set of classes which is possible at runtime, which is not the case for public LTO visibility classes. So if the type test gets detached from the assume by one of the intervening optimization passes, instead of ending up with assume(false) you will get assume(useless CFI type check), assuming that there's at least one possible class. And that's harmless from a correctness perspective.

But ignoring the correctness concerns this patch was originally created to fix and looking at this patch from a "do more optimizations" standpoint this is a win because WPD will see more assume(typetests).

pcc added a comment.Tue, Jun 14, 2:21 PM

is this patch ok to submit at least as a short term workaround?

Chrome already has a short-term workaround, which is to disable opaque pointers, right? I think you can use that until the proper fix is developed.

As upstream tests move away from testing typed pointers, I'm worried that pinning to typed pointers will cause regressions that we don't want to spend time investigating. I'm not sure how long a redesign/rewrite would take so I'd like to get Chrome off of typed pointers soonish.

You're still going to have the correctness problem though whether typed pointers are enabled or not.

I'm more worried that this patch will land and then that will deprioritize fixing this properly so we'll be stuck with the correctness issue and LTO will become even more difficult to understand. I'm sorry but I think I have a responsibility as a reviewer to insist on this being fixed properly, even if it means folks have to adjust their priorities.

As noted, I'm really not convinced that we cannot hit this issue for hidden visibility patches.

However, another reason to submit this particular patch is that it is merely enabling WPD in additional cases, when there is a phi between the type test and assume, by recognizing more assumes. I would imagine we will still have cases like that even for hidden visibility classes, or with your proposed fix to use a new TBD intrinsic that is converted to an llvm.type.test in the case of --lto-whole-program-visibility, right? So in either case this seems to me to be an improvement.

For hidden LTO visibility classes the set of classes in the LTO unit is exactly the same as the set of classes which is possible at runtime, which is not the case for public LTO visibility classes. So if the type test gets detached from the assume by one of the intervening optimization passes, instead of ending up with assume(false) you will get assume(useless CFI type check), assuming that there's at least one possible class. And that's harmless from a correctness perspective.

But ignoring the correctness concerns this patch was originally created to fix and looking at this patch from a "do more optimizations" standpoint this is a win because WPD will see more assume(typetests).

I don't think it would though because the individual summaries would still just have typeTests instead of assumeTypeTests, so there wouldn't be a combined summary generated that the WPD pass could use.

aeubanks abandoned this revision.Thu, Jun 16, 10:18 AM