This is an archive of the discontinued LLVM Phabricator instance.

[X86] Add initial support for unfolding broadcast loads from arithmetic instructions to enable LICM hoisting of the load
ClosedPublic

Authored by craig.topper on Aug 30 2019, 1:07 PM.

Details

Summary

MachineLICM can hoist an invariant load, but if that load is folded it needs to be unfolded. On AVX512 sometimes this load is an broadcast load which we were previously unable to unfold. This patch adds initial support for that with a very basic list of supported instructions as a starting point.

Event Timeline

craig.topper created this revision.Aug 30 2019, 1:07 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 30 2019, 1:07 PM
Herald added a subscriber: hiraditya. · View Herald Transcript
craig.topper marked 3 inline comments as done.Aug 30 2019, 1:09 PM
craig.topper added inline comments.
llvm/lib/Target/X86/X86InstrFoldTables.h
28

This was done to get a free bit for the TB_FOLDED_BCAST flag

44–46

I re-encoded these to get room for the broadcast type. I can pre-commit this.

llvm/lib/Target/X86/X86InstrInfo.cpp
4808

Part of the re-encoding of the alignment field.

Does this break the X86FoldTablesEmitter in anyway?

In the future do you think we could we add something similar to support scalar ops and ops where memory size != register size?

llvm/lib/Target/X86/X86InstrFoldTables.cpp
5248

Do masked ops work? Possibly add a few to this initial test?

llvm/lib/Target/X86/X86InstrFoldTables.h
44–46

I'm happy for the bit adjustments to go in straight away (without the new broadcast bits - these need to stay in this patch).

craig.topper marked an inline comment as done.Aug 31 2019, 11:54 AM

Does this break the X86FoldTablesEmitter in anyway?

In the future do you think we could we add something similar to support scalar ops and ops where memory size != register size?

I don't think it will break the X86FoldTablesEmitter. The emitter just won't generate the new table.

Yes I think we could support scalar ops in the future.

llvm/lib/Target/X86/X86InstrFoldTables.cpp
5248

I think if the mask was dynamically all 0 then any memory fault would be masked. So I think if we unfold it, we would need to generate a masked broadcast as well to maintain the fault suppression. Which wouldn't be eligible for hoisting. Though we don't fold masked operations in the first place so unfolding without applying the mask would be fine today, but I wouldn't want it to break in the future.

Are there any AVX512 ISAs that generate b/w broadcasts folded instructions?

llvm/lib/Target/X86/X86InstrInfo.cpp
5325

Test coverage?

5334

Test coverage?

Remove byte and word broadcasts

RKSimon accepted this revision.Sep 1 2019, 9:45 AM

LGTM with 2 minors, cheers.

llvm/lib/Target/X86/X86InstrFoldTables.h
54

Unused bits 14-15

55

12 - 13 ?

This revision is now accepted and ready to land.Sep 1 2019, 9:45 AM
This revision was automatically updated to reflect the committed changes.