This is an archive of the discontinued LLVM Phabricator instance.

[llvm-reduce] Add MIR support.
ClosedPublic

Authored by markus on Sep 27 2021, 2:33 AM.

Details

Summary

In response to https://groups.google.com/g/llvm-dev/c/dggTGzLIezw/m/AYA97MVcAwAJ

I had a go at extending the llvm-reduce framework to also be able to do reduction on MIR.

For convenience and more easy integration into our downstream repository (without breaking current llvm-reduce) the source code of llvm-reduce has been copied into llvm-mir-reduce and then modified instead of modified in place on a separate branch. So I think the end goal should be integration with llvm-reduce but for now during initial development this is how it is.

Actually the differences in the existing code are really minor as can be seen by recursively diffing the two directories as in

$ meld llvm-reduce/ llvm-mir-reduce/

It is mostly about wrapping Module inside a made up class MyMachineModule that contains a Module as well as MachineFunction and then provide methods and functions to import, export and clone these.

Previously LLVM has been lacking utility functions to clone MachineFunctions and this is something that I have brought up as well (https://groups.google.com/g/llvm-dev/c/kTD5t_iHhTM/m/nLMuQOGgAwAJ) because I think it could be useful in other scenarios. The one I have implemented here is not perfect but it does seem to work for this purpose and if there are other needs maybe it could be evolved and properly integrated into lib/CodeGen.

So I guess there are a few questions to answer here

  1. Should this tool be upstreamed?
  2. Should it be integrated with llvm-reduce?
  3. How should we handle/wrap the Module and MachineFunctions?
  4. Should we provide generic MahcineFunction cloning support?

Thoughts and comments please!

Diff Detail

Event Timeline

markus created this revision.Sep 27 2021, 2:33 AM
markus requested review of this revision.Sep 27 2021, 2:33 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 27 2021, 2:33 AM
markus edited the summary of this revision. (Show Details)Sep 27 2021, 3:01 AM
markus added reviewers: qcolombet, aeubanks.
jmorse added a subscriber: jmorse.Sep 28 2021, 2:02 AM
foad added a subscriber: foad.Sep 28 2021, 2:09 AM

I definitely think this should be upstreamed if possible.
It'd be nice if it could fit into the existing llvm-reduce to reduce copy/paste. e.g. llc can read IR and MIR.

I'm not familiar enough with MIR to comment on cloning MachineFunctions.

bjope added a subscriber: bjope.Sep 29 2021, 11:33 AM
markus updated this revision to Diff 377108.Oct 5 2021, 1:05 AM
markus retitled this revision from [WIP] llvm-mir-reduce tool to [WIP][llvm-reduce] Add MIR support..
markus edited the summary of this revision. (Show Details)

Did a quick & dirty integration into llvm-reduce. All lit-tests as in llvm/test/tools/llvm-reduce/ pass but I suspect that the integration needs to be much prettier.

markus updated this revision to Diff 377235.Oct 5 2021, 7:41 AM

Slightly better integration.

markus updated this revision to Diff 377774.Oct 7 2021, 2:45 AM
markus retitled this revision from [WIP][llvm-reduce] Add MIR support. to [llvm-reduce] Add MIR support..

Ok, so lets drop the [WIP] status and try to get this thing reviewed.

As far as I can tell the impact on existing llvm-reduce code should be quite limited at this point.

Obviously the MyMachineModule type name is a bad one and needs to be replaced with something better. Maybe WrappedModule, WrappedMandMF or WrappedModuleAndMachineFunction?

I am not sure who would be interested in reviewing but for anyone who is feel free to add yourself.

aeubanks added inline comments.Oct 11 2021, 10:46 AM
llvm/test/tools/llvm-reduce/mir/instr-reduce.mir
2

is it possible to do this via FileCheck like the other tests?

llvm/tools/llvm-reduce/MyMachineModule.cpp
23

moving this inside llvm/lib would be nice

llvm/tools/llvm-reduce/deltas/Delta.h
106

I'd prefer if this were still a Module param for normal passes to avoid the MyMachineModule abstraction inside the delta passes.
Is there a way to overload this with a Module and a MachineFunction version? e.g. forwarding them to an internal version of runDeltaPass() with wrappers to convert from a MyMachineModule to Module/MachineFunction

llvm/tools/llvm-reduce/deltas/ReduceInstructionsMIR.h
2

needs updating

llvm/tools/llvm-reduce/llvm-reduce.cpp
96

will this work before initialization?

98

it'd be nice if we could automatically infer this from the input (like llc)

Matt added a subscriber: Matt.Oct 11 2021, 11:42 AM
markus updated this revision to Diff 379390.Oct 13 2021, 8:07 AM

Rebased to top of main and addressed some review remarks.

markus marked 2 inline comments as done.Oct 13 2021, 8:17 AM
markus added inline comments.
llvm/test/tools/llvm-reduce/mir/instr-reduce.mir
2

Maybe but others e.g. llvm/test/tools/llvm-reduce/remove-instructions.ll do it with custom python as well and I felt that best resembles a realistic use case scenario.

llvm/tools/llvm-reduce/MyMachineModule.cpp
23

Agree but maybe we can consider that a longer term goal. This implementation seems to be OK for this purpose but not sure it would be for generic use inside llvm/lib.

llvm/tools/llvm-reduce/deltas/Delta.h
106

Agree! It is always a good idea to reduce the amount of diffs and the patch looks much cleaner now.
I ran into some problems with the function_ref and overload resolution so I downgrade to a more primitive function pointer and things worked as expected again. I think the problem is related to https://stackoverflow.com/questions/7608741/c11-template-function-that-takes-a-stdfunction-which-depends-of-template-par but I have not looked at it too closely yet.

llvm/tools/llvm-reduce/llvm-reduce.cpp
98

Yeah, but maybe we can save that for later as well.

markus updated this revision to Diff 379636.Oct 14 2021, 2:29 AM
markus edited the summary of this revision. (Show Details)

Renamed MyMachineModule to ReducerWorkItem
Got rid of raw function pointer.

lkail added a subscriber: lkail.Oct 14 2021, 2:43 AM
markus updated this revision to Diff 380894.Oct 20 2021, 4:34 AM

Added -x flag to set input language as well as autodetection from filename.

But besides that, how far are we from landing this patch?

aeubanks added inline comments.Oct 21 2021, 1:16 PM
llvm/test/tools/llvm-reduce/mir/instr-reduce.mir
5

why does this need plugins?

llvm/tools/llvm-reduce/DeltaManager.cpp
92–96

I think we can probably always print everything here, split into IR and MIR passes

llvm/tools/llvm-reduce/ReducerWorkItem.h
21 ↗(On Diff #380894)

I'm not super familiar with MIR, but do we only support one function at a time?

llvm/tools/llvm-reduce/deltas/ReduceInstructionsMIR.cpp
22

is this include needed?

llvm/tools/llvm-reduce/llvm-reduce.cpp
98

could you add a TODO?

143–150

a cl::opt with cl::values would be nicer, see debug-pass-manager in NewPMDriver.cpp

I'm not sure i fully like the coupling here.
In reality, how much code duplication is being avoided by bundling these two tools together?
Would it not be better to extract the common interface and simplify writing new back/front -ends for llvm-reduce,
instead of having everything in a single tool?

markus updated this revision to Diff 381478.Oct 22 2021, 1:36 AM
markus marked 3 inline comments as done.

Addressed review remarks.

markus added inline comments.Oct 22 2021, 1:41 AM
llvm/test/tools/llvm-reduce/mir/instr-reduce.mir
5

It does not.

llvm/tools/llvm-reduce/ReducerWorkItem.h
21 ↗(On Diff #380894)

There could be multiple functions but we do not support it here right now. There is a comment about it in parsing in ReducerWorkItem.cpp:114. When reducing MIR reproducers you have typically already limited yourself to a single functions so I do not see that there is a great need. Not initially at least, it can always be done as a follow up improvement later on.

llvm/tools/llvm-reduce/llvm-reduce.cpp
98

I thought this comment was about replacing EnableMIR with InputLanguage and inferring from the filename which I have done already.

I'm not sure i fully like the coupling here.
In reality, how much code duplication is being avoided by bundling these two tools together?

Well, it is true that the llvm-reduce tool did not contain that many lines of code in the first place, and especially not if one excludes the IR specific reduction passes.
There is the Delta Debugging Algorithm in Delta.cpp and some general stuff for handling input and output files as well as running the interestingness tests.
In general code duplication is considered a bad thing so I think we all can agree that code reuse in some sense is desirable.

Would it not be better to extract the common interface and simplify writing new back/front -ends for llvm-reduce,
instead of having everything in a single tool?

Could be but it also seems like that would be a much larger change (at least to the existing llvm-reduce). I am not particularly interested in doing that in this patch but as always anyone is free to evolve things further in a follow up commit if they consider it worth while.

So how do we proceed?

luke957 resigned from this revision.Oct 28 2021, 7:23 AM

So sorry for my bad herald script.

I'm not sure i fully like the coupling here.
In reality, how much code duplication is being avoided by bundling these two tools together?

Well, it is true that the llvm-reduce tool did not contain that many lines of code in the first place, and especially not if one excludes the IR specific reduction passes.
There is the Delta Debugging Algorithm in Delta.cpp and some general stuff for handling input and output files as well as running the interestingness tests.
In general code duplication is considered a bad thing so I think we all can agree that code reuse in some sense is desirable.

Would it not be better to extract the common interface and simplify writing new back/front -ends for llvm-reduce,
instead of having everything in a single tool?

Could be but it also seems like that would be a much larger change (at least to the existing llvm-reduce). I am not particularly interested in doing that in this patch but as always anyone is free to evolve things further in a follow up commit if they consider it worth while.

I think currently the shared mir vs ir code doesn't hinder development too much. If it ever does come to that point we should split off an llvm-mir-reduce, but that can be done in the future.

(and sorry, I've been busy, trying to find time to review)

llvm/tools/llvm-reduce/DeltaManager.cpp
41

this really shouldn't be a global, it should be passed in as a parameter

llvm/tools/llvm-reduce/deltas/ReduceInstructionsMIR.cpp
85

delete

llvm/tools/llvm-reduce/llvm-reduce.cpp
147

we shouldn't need this, I think the cl::opt will handle misspellings

I'm not sure i fully like the coupling here.
In reality, how much code duplication is being avoided by bundling these two tools together?

Well, it is true that the llvm-reduce tool did not contain that many lines of code in the first place, and especially not if one excludes the IR specific reduction passes.
There is the Delta Debugging Algorithm in Delta.cpp and some general stuff for handling input and output files as well as running the interestingness tests.
In general code duplication is considered a bad thing so I think we all can agree that code reuse in some sense is desirable.

Would it not be better to extract the common interface and simplify writing new back/front -ends for llvm-reduce,
instead of having everything in a single tool?

Could be but it also seems like that would be a much larger change (at least to the existing llvm-reduce). I am not particularly interested in doing that in this patch but as always anyone is free to evolve things further in a follow up commit if they consider it worth while.

I think currently the shared mir vs ir code doesn't hinder development too much. If it ever does come to that point we should split off an llvm-mir-reduce, but that can be done in the future.

Do these tools have anything in common other than the delta algorithm?
Will *any* pass be shared?

(and sorry, I've been busy, trying to find time to review)

markus updated this revision to Diff 383246.Oct 29 2021, 12:21 AM

Addressed review remarks.

markus marked 3 inline comments as done.EditedOct 29 2021, 12:47 AM

Do these tools have anything in common other than the delta algorithm?

I would say that the tools are almost identical except that they work on different representations (IR and MIR). The purpose of the tools are the same, the command line interface, the handling of temporary files and the execution of interestingness tests are all the same and of course the main component the delta algorithm is the same.

Will *any* pass be shared?

The passes are unlikely to ever be shared as they are specific to the representation.

One can of course go the other way around and split the tools, refactor to make the common parts reusable (and put them in some lib somewhere) but to me that seems like a much larger change and the code duplication will be higher. This is what we have right now. Either way is doable but none of the alternatives are likely to please everyone.

aeubanks accepted this revision.Nov 1 2021, 9:43 AM

Do these tools have anything in common other than the delta algorithm?

I would say that the tools are almost identical except that they work on different representations (IR and MIR). The purpose of the tools are the same, the command line interface, the handling of temporary files and the execution of interestingness tests are all the same and of course the main component the delta algorithm is the same.

Will *any* pass be shared?

The passes are unlikely to ever be shared as they are specific to the representation.

One can of course go the other way around and split the tools, refactor to make the common parts reusable (and put them in some lib somewhere) but to me that seems like a much larger change and the code duplication will be higher. This is what we have right now. Either way is doable but none of the alternatives are likely to please everyone.

I agree at least for now. Currently I think the overhead of maintaining two llvm-reduces would be much larger than the problems that arise from sharing code. This may change in the future, and then we should consider splitting them.

This revision is now accepted and ready to land.Nov 1 2021, 9:43 AM
This revision was automatically updated to reflect the committed changes.
reames added a subscriber: reames.Nov 10 2021, 9:35 AM

I'm looking at using this functionality, and ran into a question on the design. I'm not clear why you exposed an mtriple command line argument on llvm-reduce. It appears you're using this to derive a default data layout, and as an input to parsing the machine function. However, LLVM IR allows specifying both data layout and triple in IR. As a result, we should be able to have the parsing derive the appropriate target directly from the test case. Is there something I'm missing here?

I'm looking at using this functionality, and ran into a question on the design. I'm not clear why you exposed an mtriple command line argument on llvm-reduce. It appears you're using this to derive a default data layout, and as an input to parsing the machine function. However, LLVM IR allows specifying both data layout and triple in IR. As a result, we should be able to have the parsing derive the appropriate target directly from the test case. Is there something I'm missing here?

I believe it's only used in the MIR case, not the IR case. This could definitely be made clearer in the cl::opt help.

I'm looking at using this functionality, and ran into a question on the design. I'm not clear why you exposed an mtriple command line argument on llvm-reduce. It appears you're using this to derive a default data layout, and as an input to parsing the machine function. However, LLVM IR allows specifying both data layout and triple in IR. As a result, we should be able to have the parsing derive the appropriate target directly from the test case. Is there something I'm missing here?

I believe it's only used in the MIR case, not the IR case. This could definitely be made clearer in the cl::opt help.

I don't think it's necessary even for the MIR case. MIR includes the full IR module, and is as a result, is also fully self describing. (If the optional datalayout and triple fields are present.)

I'm looking at using this functionality, and ran into a question on the design. I'm not clear why you exposed an mtriple command line argument on llvm-reduce. It appears you're using this to derive a default data layout, and as an input to parsing the machine function. However, LLVM IR allows specifying both data layout and triple in IR. As a result, we should be able to have the parsing derive the appropriate target directly from the test case. Is there something I'm missing here?

I believe it's only used in the MIR case, not the IR case. This could definitely be made clearer in the cl::opt help.

I don't think it's necessary even for the MIR case. MIR includes the full IR module, and is as a result, is also fully self describing. (If the optional datalayout and triple fields are present.)

The IR part can often be eliminated from a MIR based test case. But sure, one could argue that when running llvm-reduce it could be mandatory to add an IR section in the .mir file first (but then I think one needs to also add the function prototype).

The IR part can often be eliminated from a MIR based test case. But sure, one could argue that when running llvm-reduce it could be mandatory to add an IR section in the .mir file first (but then I think one needs to also add the function prototype).

I can maybe see this as an argument for allowing override of the triple, but not as an argument for not defaulting to the triple if present. Frankly, I find even the override case unconvincing.

And if MIR format doesn't allow an empty-except-for-DL/triple module, that's a bug in the parser/printer and we should just fix that.

Just to be clear here, I'm happy to do the work to adjust this. I'm just trying to make sure I'm not missing some subtlety in the interface before investing the time.

The IR part can often be eliminated from a MIR based test case. But sure, one could argue that when running llvm-reduce it could be mandatory to add an IR section in the .mir file first (but then I think one needs to also add the function prototype).

I can maybe see this as an argument for allowing override of the triple, but not as an argument for not defaulting to the triple if present. Frankly, I find even the override case unconvincing.

Yes. If there is a target triple in the IR then overriding it might be weird. I don't remember exactly how it works in llc, but as a user I guess I'd probably would assume that the behavior would be the same between the tools when using cl option vs having the triple in the IR, or having both.

The IR part can often be eliminated from a MIR based test case. But sure, one could argue that when running llvm-reduce it could be mandatory to add an IR section in the .mir file first (but then I think one needs to also add the function prototype).

I can maybe see this as an argument for allowing override of the triple, but not as an argument for not defaulting to the triple if present. Frankly, I find even the override case unconvincing.

Yes. If there is a target triple in the IR then overriding it might be weird. I don't remember exactly how it works in llc, but as a user I guess I'd probably would assume that the behavior would be the same between the tools when using cl option vs having the triple in the IR, or having both.

The way I see it is more or less what @bjope already said, often with MIR tests the IR section is eliminated (e.g. llvm/test/tools/llvm-reduce/mir/instr-reduce.mir) and you want to be able to work with that so some kind of option for setting the triple is needed.
If the IR section specifies a triple and/or data layout then it makes sense that that is used and -mtriple is not required. If both are specified then either -mtriple should have precedence or an error message should be printed. Perhaps the latter option is preferred to avoid confusion.
Besides that there was no deep thinking behind it other than trying to have MIR support working with the least amount of code changes.