This is an archive of the discontinued LLVM Phabricator instance.

Differential D22042

[AArch64] Macro fusion of simple ALU ops with branches for Broadcom's Vulcan
ClosedPublic

Authored by pgode on Jul 6 2016, 6:31 AM.

Details

Reviewers

rengolin
t.p.northover

Commits

rG5d118a16762f: [AArch64] Macro fusion of simple ALU ops with branches for Broadcom's Vulcan
rL274837: [AArch64] Macro fusion of simple ALU ops with branches for Broadcom's Vulcan

Summary

Support for the macro fusion of simple ALU ops with branches for the Vulcan sub-target.

Patch by Meador Inge

Diff Detail

Event Timeline

pgode updated this revision to Diff 62853. Jul 6 2016, 6:31 AM

pgode retitled this revision from to [AArch64] Macro fusion of simple ALU ops with branches for Broadcom's Vulcan.

pgode updated this object.

pgode added reviewers: t.p.northover, rengolin.

pgode added subscribers: llvm-commits, meadori, echristo and 2 others.

Herald added a subscriber: aemerson. Jul 6 2016, 6:31 AM

testcases?

rengolin added inline comments. Jul 6 2016, 8:18 AM

lib/Target/AArch64/AArch64InstrInfo.cpp
1806	This looks like a job for table-gen?

rengolin added inline comments. Jul 6 2016, 8:19 AM

lib/Target/AArch64/AArch64InstrInfo.cpp
1806	Also, please, don't add more getProcFamily calls, use sub-target features.

Redo the patch please and add llvm-commits as a subscriber before you
publish the patch.

pgode added inline comments. Jul 7 2016, 3:36 AM

lib/Target/AArch64/AArch64InstrInfo.cpp
1806	For Cyclone, only a subset of 6 instruction fusion cases is applicable, but for Vulcan additional instructions are applicable, which may not apply to Cyclone. So, to generalize this under one 'Subtarget Feature', a getProcFamily call seems unavoidable. Please suggest. Or is a new subtarget feature, such as 'FeatureMacroOpFusionVulcanSubset', a better option? Also, I am not sure about the table-gen option and how it could be done there.

rengolin added inline comments. Jul 7 2016, 3:50 AM

lib/Target/AArch64/AArch64InstrInfo.cpp
1806	This is not Cyclone vs. Vulcan: some things are more profitable than others, and it's possible, as you said, that this is the case for other cores, including Cyclone. A new target feature is needed, but not one with "vulcan" in its name, because that would defeat the purpose. We are getting rid of things like that, e.g. "isLikeA9", because it's a trap. The table-gen option would be to add a property to those instructions and check that property here. I'll pass on the benefits of that approach vs. this one (other people can chime in), but this is orthogonal to the target feature discussion. Bottom line: we don't want more CPU checks.
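
To make the two styles in this comment concrete, here is a minimal sketch. It is a toy model and not LLVM code: `ToyCPU`, `ToySubtarget`, `ToyInstrDesc`, `IsMacroFusable`, and both functions are hypothetical names invented for illustration, and the per-instruction property only assumes what a TableGen-emitted flag might look like.

```cpp
#include <cassert>

// Toy stand-ins for illustration only; none of these types exist in LLVM.
enum class ToyCPU { Generic, Cyclone, Vulcan };

struct ToyInstrDesc {
  const char *Name;
  bool IsMacroFusable; // hypothetical per-instruction property
};

struct ToySubtarget {
  ToyCPU CPU;
  bool HasMacroOpFusion; // feature bit that any CPU definition may enable
};

// CPU-keyed style: every core that fuses has to be listed here by name.
bool fusesByCpuCheck(ToyCPU CPU) {
  return CPU == ToyCPU::Cyclone || CPU == ToyCPU::Vulcan;
}

// Feature-plus-property style: the scheduling logic never names a CPU.
bool fusesByFeature(const ToySubtarget &ST, const ToyInstrDesc &First) {
  return ST.HasMacroOpFusion && First.IsMacroFusable;
}
```

In the second form, a new core opts in by enabling the feature bit in its own definition, and the set of fusable instructions lives with the instruction descriptions rather than in a per-CPU branch.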

pgode added inline comments. Jul 8 2016, 12:27 AM

lib/Target/AArch64/AArch64InstrInfo.cpp
1806	The approach of adding a new sub-feature for macro-op fusion by categorizing the instructions (my presumption) doesn't seem like a good option. It would end up adding too many sub-features, such as FeatureMacroOpFusionArith/FeatureMacroOpFusionLogical. Please correct me. I tried the table-gen option of adding an instruction property, similar to the CheapAsAMov property. But MCID Flags already has 32 flags, so the new 'MacroOpFusable' flag becomes the 33rd. As others might add more flags in future, we should use it as a 64-bit value. I am thinking of submitting a 'new diff' on this review that just enables 'FeatureMacroOpFusion' (AArch64.td file modification) for Vulcan, letting only ADDS, SUBS, ANDS get fused (default subtarget feature behavior), and working on the table-gen part for a complete solution. Please suggest.

The approach of adding a new sub-feature for macro-op fusion by categorizing the instructions (my presumption) doesn't seem like a good option. It would end up adding too many sub-features, such as FeatureMacroOpFusionArith/FeatureMacroOpFusionLogical. Please correct me.

I tried the table-gen option of adding an instruction property, similar to the CheapAsAMov property. MCID (MCInstrDesc) Flags already has 32 flags, so the new 'MacroOpFusable' flag becomes the 33rd. Although Flags is 'uint64_t', I still see the warning 'left shift count >= width of type'.

I am thinking of submitting a 'new diff' on this review by just enabling 'FeatureMacroOpFusion' (AArch64.td file modification) for Vulcan and let only ADDS, SUBS, ANDS get fused (default Subtarget feature behavior) and work on table-gen part for complete solution. Please suggest.
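
The 'left shift count >= width of type' warning mentioned above is what compilers commonly emit when a 32-bit `int` literal is shifted by 32 or more, even if the result is stored in a `uint64_t`. The thread does not show the offending expression, so the following is only a sketch of that general pitfall under the assumption that the mask was built from a plain `1`; `ToyMacroOpFusable`, `toyFlagMask`, and `toyHasFlag` are hypothetical names, not LLVM identifiers.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flag index: the 33rd flag has index 32 when counting from 0.
constexpr unsigned ToyMacroOpFusable = 32;

// Writing `1 << 32` shifts an int; where int is 32 bits wide this is
// undefined behaviour and draws the "left shift count >= width of type"
// warning. Widening the operand first keeps the shift within 64 bits.
constexpr std::uint64_t toyFlagMask(unsigned Bit) {
  return std::uint64_t{1} << Bit;
}

// Test a single bit in a 64-bit flag word.
constexpr bool toyHasFlag(std::uint64_t Flags, unsigned Bit) {
  return (Flags & toyFlagMask(Bit)) != 0;
}
```

The point of the sketch is that the width of the storage type is not enough; the shifted operand itself has to be 64 bits wide.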

In D22042#477752, @pgode wrote:

I tried the table-gen option of adding an instruction property, similar to the CheapAsAMov property. MCID (MCInstrDesc) Flags already has 32 flags, so the new 'MacroOpFusable' flag becomes the 33rd. Although Flags is 'uint64_t', I still see the warning 'left shift count >= width of type'.

Hum, that's not good. We'll have to think about many of them, if we can turn them into properties, rather than features. There were some that could, maybe we need a larger re-factor than I was expecting.

I am thinking of submitting a 'new diff' on this review by just enabling 'FeatureMacroOpFusion' (AArch64.td file modification) for Vulcan and let only ADDS, SUBS, ANDS get fused (default Subtarget feature behavior) and work on table-gen part for complete solution. Please suggest.

I think you're right. This is the pragmatic approach and will give us time to work out a better way forward.

Thanks!
--renato

Updated diff.
Removed fusion of additional instructions for Vulcan.
Default instructions supported by 'Macrofusion subtarget feature' will be fused.
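
As a rough illustration of the behavior described in this update, the sketch below gates fusion on a single feature bit and a fixed opcode list, with no CPU check. It is a toy model: `ToyOpcode`, `ToyFeatureSet`, and `toyShouldScheduleAdjacent` are hypothetical names, and the opcode list contains only the cases visible in the truncated diff on this page, so it should not be read as the complete list in LLVM.

```cpp
#include <cassert>

// Toy opcode enum; the names mirror opcodes visible in the diff, but this is
// not LLVM's generated AArch64 opcode enum.
enum class ToyOpcode { SUBSWri, ADDSWri, ANDSWri, SUBSXri, ADDSXri, ADDWrr, Bcc };

// Hypothetical feature set: one bit, enabled per CPU definition.
struct ToyFeatureSet {
  bool HasMacroOpFusion;
};

// Returns true when First and Second should be kept adjacent for fusion.
bool toyShouldScheduleAdjacent(const ToyFeatureSet &ST, ToyOpcode First,
                               ToyOpcode Second) {
  // Fusion is considered only if the feature is on and Second is a
  // conditional branch.
  if (!ST.HasMacroOpFusion || Second != ToyOpcode::Bcc)
    return false;
  switch (First) {
  case ToyOpcode::SUBSWri:
  case ToyOpcode::ADDSWri:
  case ToyOpcode::ANDSWri:
  case ToyOpcode::SUBSXri:
  case ToyOpcode::ADDSXri:
    return true; // flag-setting ALU op with immediate, then Bcc
  default:
    return false; // e.g. the plain ADD that the first diff tried to add
  }
}
```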

Right, that looks good, thanks!

test/CodeGen/AArch64/misched-fusion.ll is already testing the flag itself, so we don't need additional tests.

cheers,
--renato

This revision is now accepted and ready to land. Jul 8 2016, 3:46 AM

Closed by commit rL274837: [AArch64] Macro fusion of simple ALU ops with branches for Broadcom's Vulcan (authored by pgode). Jul 8 2016, 4:21 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
lib/Target/AArch64/AArch64.td	1 line
lib/Target/AArch64/AArch64InstrInfo.cpp	73 lines

Diff 62853

lib/Target/AArch64/AArch64.td

  def ProcKryo : SubtargetFeature<"kryo", "ARMProcFamily", "Kryo",
  [...]
                                     FeaturePredictableSelectIsExpensive
                                     ]>;

  def ProcVulcan : SubtargetFeature<"vulcan", "ARMProcFamily", "Vulcan",
                                    "Broadcom Vulcan processors", [
                                    FeatureCRC,
                                    FeatureCrypto,
                                    FeatureFPARMv8,
+                                   FeatureMacroOpFusion,
                                    FeatureNEON,
                                    FeaturePostRAScheduler,
                                    HasV8_1aOps]>;

  def : ProcessorModel<"generic", NoSchedModel, [
                       FeatureCRC,
                       FeatureFPARMv8,
                       FeatureNEON,
  [...]

lib/Target/AArch64/AArch64InstrInfo.cpp

  [...]
  }

  bool AArch64InstrInfo::shouldScheduleAdjacent(MachineInstr &First,
                                                MachineInstr &Second) const {
    if (Subtarget.hasMacroOpFusion()) {
      // Fuse CMN, CMP, TST followed by Bcc.
      unsigned SecondOpcode = Second.getOpcode();
      if (SecondOpcode == AArch64::Bcc) {
+       if (Subtarget.getProcFamily() == AArch64Subtarget::Vulcan) {
+         // All simple ALU operations that use one micro-op
+         // can be fused on Vulcan. This is essentially the
+         // operations without extend and/or shift.
+         switch (First.getOpcode()) {
+         default:
+           return false;
+         // ADD(S)
+         case AArch64::ADDWri:
+         case AArch64::ADDWrr:
+         case AArch64::ADDXri:
+         case AArch64::ADDXrr:
+         case AArch64::ADDSWri:
+         case AArch64::ADDSWrr:
+         case AArch64::ADDSXri:
+         case AArch64::ADDSXrr:
+         // SUB(S)
+         case AArch64::SUBWri:
+         case AArch64::SUBWrr:
+         case AArch64::SUBXri:
+         case AArch64::SUBXrr:
+         case AArch64::SUBSWri:
+         case AArch64::SUBSWrr:
+         case AArch64::SUBSXri:
+         case AArch64::SUBSXrr:
+         // ADC(S)
+         case AArch64::ADCWr:
+         case AArch64::ADCXr:
+         case AArch64::ADCSWr:
+         case AArch64::ADCSXr:
+         // AND(S)
+         case AArch64::ANDWri:
+         case AArch64::ANDWrr:
+         case AArch64::ANDXri:
+         case AArch64::ANDXrr:
+         case AArch64::ANDSXri:
+         case AArch64::ANDSXrr:
+         case AArch64::ANDSWri:
+         case AArch64::ANDSWrr:
+         // BIC(S)
+         case AArch64::BICWrr:
+         case AArch64::BICXrr:
+         case AArch64::BICSXrr:
+         case AArch64::BICSWrr:
+         // EON
+         case AArch64::EONWrr:
+         case AArch64::EONXrr:
+         // EOR
+         case AArch64::EORWri:
+         case AArch64::EORWrr:
+         case AArch64::EORXri:
+         case AArch64::EORXrr:
+         // ORN
+         case AArch64::ORNWrr:
+         case AArch64::ORNXrr:
+         // ORR
+         case AArch64::ORRWri:
+         case AArch64::ORRWrr:
+         case AArch64::ORRXri:
+         case AArch64::ORRXrr:
+         // CCMN
+         case AArch64::CCMNWi:
+         case AArch64::CCMNWr:
+         case AArch64::CCMNXi:
+         case AArch64::CCMNXr:
+         // CCMP
+         case AArch64::CCMPWi:
+         case AArch64::CCMPWr:
+         case AArch64::CCMPXi:
+         case AArch64::CCMPXr:
+           return true;
+         }
+       }
        switch (First.getOpcode()) {
        default:
          return false;
        case AArch64::SUBSWri:
        case AArch64::ADDSWri:
        case AArch64::ANDSWri:
        case AArch64::SUBSXri:
        case AArch64::ADDSXri:
  [...]