This is an archive of the discontinued LLVM Phabricator instance.

[Matrix] Also run LowerMatrixIntrinsics for -O0.
ClosedPublic

Authored by fhahn on Mar 17 2020, 2:44 PM.

Details

Summary

The lowering of the llvm.matrix intrinsics happens exclusively in the
LowerMatrixIntrinsics pass and the backends do not know about those
intrinsics. That means we also have to run LowerMatrixIntrinsics for
-O0. Otherwise Clang will crash in the backend for code with llvm.matrix
intrinsics.

Fixes PR45227.

Diff Detail

Event Timeline

fhahn created this revision.Mar 17 2020, 2:44 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 17 2020, 2:44 PM
fhahn updated this revision to Diff 252691.Mar 25 2020, 3:20 PM

Ping.

Updated to add a test.

nicolasvasilache accepted this revision.Mar 25 2020, 6:37 PM

I don't imagine this is particularly intrusive, LGTM.

This revision is now accepted and ready to land.Mar 25 2020, 6:37 PM

Normally most "must-run" passes are part of the codegen pipeline (TargetPassConfig). It's convenient if the backend can handle arbitrary bitcode files without having to run some arbitrary set of "opt" passes first.

fhahn added a comment.Mar 26 2020, 9:12 AM

Normally most "must-run" passes are part of the codegen pipeline (TargetPassConfig). It's convenient if the backend can handle arbitrary bitcode files without having to run some arbitrary set of "opt" passes first.

Thanks for pointing me to TargetPassConfig! Ideally the lowering pass would run unconditionally, but should be a no-op for functions that do not contain any matrix intrinsics. I think a good option would be to require the frontends to add an attribute ('may-contain-matrix-intrinsics') to functions that may contain matrix intrinsics. The LowerMatrixIntrinsics pass is run unconditionally, but bails out early for functions without the attribute. I've put up a patch D76857

This also should allow to add it unconditionally to TargetPassConfig and ensures that we do not run it unnecessarily in the backend again, if the intrinsics were already lowered in the middle-end: D76858

For running as part of the backend pipeline, I propose adding a simplified/minimal lowering pass, that only requires TTI, as we are planning on making the lowering pass more involved and running it in the backend should be last-resort kind of thing.

Does that approach seem reasonable?

The attribute seems like overkill as a performance optimization. You can get 99% of the way there by just checking if if the module declares any matrix intrinsics.

Otherwise seems fine.

fhahn added a comment.Mar 26 2020, 9:48 AM

The attribute seems like overkill as a performance optimization. You can get 99% of the way there by just checking if if the module declares any matrix intrinsics.

Hmm, is there an easy way to check that without checking all the declarations in a module? Or would we have to check all declarations for all functions in a module?

I assume the matrix intrinsics overloaded? In that case, you'd have to iterate over the module, yes.

I'm tempted to say we should add a new mechanism to the backend for passes that only care about calls to a specific set of intrinsics, so we don't have to reinvent this repeatedly.

I assume the matrix intrinsics overloaded? In that case, you'd have to iterate over the module, yes.

Unfortunately yes :(

I'm tempted to say we should add a new mechanism to the backend for passes that only care about calls to a specific set of intrinsics, so we don't have to reinvent this repeatedly.

We would also need it in the middle-end for the matrix lowering (and potentially others as well).

Do you have anything in particular in mind? For intrinsics that get solely emitted by the frontend, I guess it would be possible check if the module contains any such instructions and store it and have something like Module::containsMatrixIntrinsics(). Not sure if that's acceptable though.

We could make the Module automatically compute a map from intrinsics to overloads of that intrinsic in the module, I guess. Probably not too expensive to maintain.

We could make the Module automatically compute a map from intrinsics to overloads of that intrinsic in the module, I guess. Probably not too expensive to maintain.

Sounds good. I'll prepare a patch.

We could make the Module automatically compute a map from intrinsics to overloads of that intrinsic in the module, I guess. Probably not too expensive to maintain.

I've put up D76941. Is that along the lines you suggested?

fhahn closed this revision.Jul 16 2020, 9:50 AM

Relevant parts landed in cbe0e539e79e