-
Notifications
You must be signed in to change notification settings - Fork 15.6k
[LV] Support argmin/argmax with strict predicates. #170223
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||||||
|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -13,10 +13,12 @@ | |||||||||
|
|
||||||||||
| #include "LoopVectorizationPlanner.h" | ||||||||||
| #include "VPlan.h" | ||||||||||
| #include "VPlanAnalysis.h" | ||||||||||
| #include "VPlanCFG.h" | ||||||||||
| #include "VPlanDominatorTree.h" | ||||||||||
| #include "VPlanPatternMatch.h" | ||||||||||
| #include "VPlanTransforms.h" | ||||||||||
| #include "VPlanUtils.h" | ||||||||||
| #include "llvm/Analysis/LoopInfo.h" | ||||||||||
| #include "llvm/Analysis/LoopIterator.h" | ||||||||||
| #include "llvm/Analysis/ScalarEvolution.h" | ||||||||||
|
|
@@ -1120,6 +1122,133 @@ bool VPlanTransforms::handleMaxMinNumReductions(VPlan &Plan) { | |||||||||
| return true; | ||||||||||
| } | ||||||||||
|
|
||||||||||
| /// For argmin/argmax reductions with strict predicates, convert the existing | ||||||||||
| /// FindLastIV reduction to a new UMin reduction of a wide canonical IV. If the | ||||||||||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||||||
| /// original IV was not canonical, a new canonical wide IV is added, and the | ||||||||||
| /// final result is scaled back to the original IV. | ||||||||||
| static bool handleStrictArgMinArgMax(VPlan &Plan, | ||||||||||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
The pattern is finding the arg of the first min/max, it corresponds to the condition for updating min/max and its arg being strict. |
||||||||||
| VPReductionPHIRecipe *MinMaxPhiR, | ||||||||||
| VPReductionPHIRecipe *FindIVPhiR, | ||||||||||
| VPWidenIntOrFpInductionRecipe *WideIV, | ||||||||||
| VPInstruction *MinMaxResult) { | ||||||||||
| Type *Ty = Plan.getVectorLoopRegion()->getCanonicalIVType(); | ||||||||||
| if (Ty != VPTypeAnalysis(Plan).inferScalarType(FindIVPhiR)) | ||||||||||
|
Comment on lines
+1134
to
+1135
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So FindIV may record a non-canonical IV, but it must be of the canonical IV's type, hence calling it |
||||||||||
| return false; | ||||||||||
|
|
||||||||||
| // If the original wide IV is not canonical, create a new one. The wide IV is | ||||||||||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why create a new one instead of using the loop's CanonicalIV? |
||||||||||
| // guaranteed to not wrap for all lanes that are active in the vector loop. | ||||||||||
| if (!WideIV->isCanonical()) { | ||||||||||
| VPValue *Zero = Plan.getOrAddLiveIn(ConstantInt::get(Ty, 0)); | ||||||||||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||||||
| VPValue *One = Plan.getOrAddLiveIn(ConstantInt::get(Ty, 1)); | ||||||||||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||||||
| auto *WidenCanIV = new VPWidenIntOrFpInductionRecipe( | ||||||||||
| nullptr, Zero, One, WideIV->getVFValue(), | ||||||||||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If the recipe works with null as its underlying value, perhaps all instances should consistently be constructed w/o an underlying value, and the recipe be slightly renamed |
||||||||||
| WideIV->getInductionDescriptor(), VPIRFlags(), WideIV->getDebugLoc()); | ||||||||||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Should VPIRFlags record the claim that "The "wide IV is guaranteed to not wrap for all lanes that are active in the vector loop"? |
||||||||||
| WidenCanIV->insertBefore(WideIV); | ||||||||||
|
|
||||||||||
| // Update the select to use the wide canonical IV. | ||||||||||
| auto *SelectRecipe = cast<VPSingleDefRecipe>( | ||||||||||
| FindIVPhiR->getBackedgeValue()->getDefiningRecipe()); | ||||||||||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Worth matching SelectRecipe to make sure it means what we think it means, and assert or return false otherwise? |
||||||||||
| if (SelectRecipe->getOperand(1) == WideIV) | ||||||||||
| SelectRecipe->setOperand(1, WidenCanIV); | ||||||||||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Could replaceUsesOfWith/replaceUseWithIf be used? |
||||||||||
| else if (SelectRecipe->getOperand(2) == WideIV) | ||||||||||
| SelectRecipe->setOperand(2, WidenCanIV); | ||||||||||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. else unreachable or return false? |
||||||||||
| } | ||||||||||
|
|
||||||||||
| // Create the new UMin reduction recipe to track the minimum index. | ||||||||||
| assert(!FindIVPhiR->isInLoop() && !FindIVPhiR->isOrdered() && | ||||||||||
| "inloop and ordered reductions not supported"); | ||||||||||
| VPValue *MaxInt = | ||||||||||
| Plan.getConstantInt(APInt::getMaxValue(Ty->getIntegerBitWidth())); | ||||||||||
| ReductionStyle Style = RdxUnordered{FindIVPhiR->getVFScaleFactor()}; | ||||||||||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Does FindIV reduction support scaling? Is MinIdx reduction to support scaling? Perhaps better to assert or return false if FindIV's scale factor is larger than 1. |
||||||||||
| auto *MinIdxPhiR = new VPReductionPHIRecipe( | ||||||||||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
? |
||||||||||
| dyn_cast_or_null<PHINode>(FindIVPhiR->getUnderlyingValue()), | ||||||||||
| RecurKind::UMin, *MaxInt, *FindIVPhiR->getBackedgeValue(), Style, | ||||||||||
| FindIVPhiR->hasUsesOutsideReductionChain()); | ||||||||||
| MinIdxPhiR->insertBefore(FindIVPhiR); | ||||||||||
|
|
||||||||||
| VPInstruction *FindLastIVResult = | ||||||||||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
? |
||||||||||
| findUserOf<VPInstruction::ComputeFindIVResult>(FindIVPhiR); | ||||||||||
| MinMaxResult->moveBefore(*FindLastIVResult->getParent(), | ||||||||||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Independent: good to have a |
||||||||||
| FindLastIVResult->getIterator()); | ||||||||||
|
|
||||||||||
| // The reduction using MinMaxPhiR needs adjusting to compute the correct | ||||||||||
| // result: | ||||||||||
| // 1. We need to find the first canonical IV for which the condition based | ||||||||||
| // on the min/max recurrence is true, | ||||||||||
|
Comment on lines
+1176
to
+1177
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||||||
| // 2. Compare the partial min/max reduction result to its final value and, | ||||||||||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||||||
| // 3. Select the lanes of the partial UMin reduction of the canonical wide | ||||||||||
| // IV which correspond to the lanes matching the min/max reduction result. | ||||||||||
|
Comment on lines
+1179
to
+1180
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||||||
| // 4. Scale the final select canonical IV back to the original IV using | ||||||||||
| // VPDerivedIVRecipe. | ||||||||||
|
Comment on lines
+1181
to
+1182
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||||||
| // 5. If the minimum value matches the start value, the condition in the | ||||||||||
| // loop was never true, return the start value in that case. | ||||||||||
|
Comment on lines
+1183
to
+1184
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||||||
| // | ||||||||||
| // The original reductions need adjusting: | ||||||||||
| // For example, this transforms | ||||||||||
| // vp<%min.result> = compute-reduction-result ir<%min.val>, | ||||||||||
| // ir<%min.val.next> | ||||||||||
|
Comment on lines
+1188
to
+1189
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||||||
| // vp<%find.iv.result = compute-find-iv-result ir<%min.idx>, ir<0>, | ||||||||||
| // SENTINEL, vp<%min.idx.next> | ||||||||||
| // | ||||||||||
| // into: | ||||||||||
| // vp<%min.result> = compute-reduction-result ir<%min.val>, ir<%min.val.next> | ||||||||||
| // vp<%final.min.cmp> = icmp eq ir<%min.val.next>, vp<%min.result> | ||||||||||
| // vp<%final.min.iv> = select vp<%final.min.cmp>, ir<%min.idx.next>, ir<-1> | ||||||||||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||||||
| // vp<%13> = compute-reduction-result ir<%min.idx>, vp<%final.min.iv> | ||||||||||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Suffice the reduce a single vector - %final.min.idx? |
||||||||||
| // vp<%scaled.result.iv> = DERIVED-IV ir<20> + vp<%13> * ir<1> | ||||||||||
| // vp<%threshold.cmp> = icmp slt vp<%min.result>, ir<0> | ||||||||||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This comparison seems to check for sentinel index (being negative? smaller than zero), rather than overall min/max being equal to the initial min/max value, as documented:
Suggested change
? |
||||||||||
| // vp<%final.result> = select vp<%threshold.cmp>, vp<%scaled.result.iv>, | ||||||||||
| // ir<%original.start> | ||||||||||
|
Comment on lines
+1200
to
+1201
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
|
||||||||||
|
|
||||||||||
| VPBuilder Builder(FindLastIVResult); | ||||||||||
| VPValue *MinMaxExiting = MinMaxResult->getOperand(1); | ||||||||||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. nit: instead of using "MinMax" to mean either one, pick one and explain that everything holds also for the other. As in the example above. Can also use "Extremum", but that's even longer. |
||||||||||
| auto *FinalMinMaxCmp = | ||||||||||
| Builder.createICmp(CmpInst::ICMP_EQ, MinMaxExiting, MinMaxResult); | ||||||||||
| VPValue *LastIVExiting = FindLastIVResult->getOperand(3); | ||||||||||
| auto *FinalIVSelect = | ||||||||||
| Builder.createSelect(FinalMinMaxCmp, LastIVExiting, MaxInt); | ||||||||||
| VPSingleDefRecipe *FinalResult = Builder.createNaryOp( | ||||||||||
| VPInstruction::ComputeReductionResult, {MinIdxPhiR, FinalIVSelect}, {}, | ||||||||||
| FindLastIVResult->getDebugLoc()); | ||||||||||
|
|
||||||||||
| // If we used a new wide canonical IV convert the reduction result back to the | ||||||||||
| // original IV scale before the final select. | ||||||||||
| if (!WideIV->isCanonical()) { | ||||||||||
| auto *DerivedIVRecipe = | ||||||||||
| new VPDerivedIVRecipe(InductionDescriptor::IK_IntInduction, | ||||||||||
| nullptr, // No FPBinOp for integer induction | ||||||||||
| WideIV->getStartValue(), FinalResult, | ||||||||||
| WideIV->getStepValue(), "derived.iv.result"); | ||||||||||
| DerivedIVRecipe->insertBefore(&*Builder.getInsertPoint()); | ||||||||||
| FinalResult = DerivedIVRecipe; | ||||||||||
| } | ||||||||||
|
|
||||||||||
| auto GetPred = [&MinMaxPhiR]() { | ||||||||||
| switch (MinMaxPhiR->getRecurrenceKind()) { | ||||||||||
| case RecurKind::UMin: | ||||||||||
| return CmpInst::ICMP_ULT; | ||||||||||
| case RecurKind::SMin: | ||||||||||
| return CmpInst::ICMP_SLT; | ||||||||||
| case RecurKind::UMax: | ||||||||||
| return CmpInst::ICMP_UGT; | ||||||||||
| case RecurKind::SMax: | ||||||||||
| return CmpInst::ICMP_SGT; | ||||||||||
| default: | ||||||||||
| llvm_unreachable("must be an integer min/max recurrence kind"); | ||||||||||
| } | ||||||||||
| }; | ||||||||||
| // If the final min/max value matches the start value, the condition in the | ||||||||||
| // loop was always false, i.e. no induction value has been selected. If that's | ||||||||||
| // the case, use the original start value. | ||||||||||
| VPValue *MinMaxLT = | ||||||||||
| Builder.createICmp(GetPred(), MinMaxResult, MinMaxPhiR->getStartValue()); | ||||||||||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Compare equal/ne instead, as documented? |
||||||||||
| VPValue *Res = Builder.createSelect(MinMaxLT, FinalResult, | ||||||||||
| FindLastIVResult->getOperand(1)); | ||||||||||
| FindIVPhiR->replaceAllUsesWith(MinIdxPhiR); | ||||||||||
| FindLastIVResult->replaceAllUsesWith(Res); | ||||||||||
| return true; | ||||||||||
| } | ||||||||||
|
|
||||||||||
| bool VPlanTransforms::handleMultiUseReductions(VPlan &Plan) { | ||||||||||
| for (auto &PhiR : make_early_inc_range( | ||||||||||
| Plan.getVectorLoopRegion()->getEntryBasicBlock()->phis())) { | ||||||||||
|
|
@@ -1131,7 +1260,7 @@ bool VPlanTransforms::handleMultiUseReductions(VPlan &Plan) { | |||||||||
| // MinMaxPhiR has users outside the reduction cycle in the loop. Check if | ||||||||||
| // the only other user is a FindLastIV reduction. MinMaxPhiR must have | ||||||||||
| // exactly 3 users: 1) the min/max operation, the compare of a FindLastIV | ||||||||||
| // reduction and ComputeReductionResult. The comparisom must compare | ||||||||||
| // reduction and ComputeReductionResult. The comparison must compare | ||||||||||
| // MinMaxPhiR against the min/max operand used for the min/max reduction | ||||||||||
| // and only be used by the select of the FindLastIV reduction. | ||||||||||
| RecurKind RdxKind = MinMaxPhiR->getRecurrenceKind(); | ||||||||||
|
|
@@ -1203,33 +1332,42 @@ bool VPlanTransforms::handleMultiUseReductions(VPlan &Plan) { | |||||||||
| FindIVPhiR->getRecurrenceKind())) | ||||||||||
| return false; | ||||||||||
|
|
||||||||||
| assert(!FindIVPhiR->isInLoop() && !FindIVPhiR->isOrdered() && | ||||||||||
| "cannot handle inloop/ordered reductions yet"); | ||||||||||
|
|
||||||||||
| // TODO: Support cases where IVOp is the IV increment. | ||||||||||
| if (!match(IVOp, m_TruncOrSelf(m_VPValue(IVOp))) || | ||||||||||
| !isa<VPWidenIntOrFpInductionRecipe>(IVOp)) | ||||||||||
| return false; | ||||||||||
|
|
||||||||||
| CmpInst::Predicate RdxPredicate = [RdxKind]() { | ||||||||||
| // Check if the predicate is compatible with the reduction kind. | ||||||||||
| bool IsValidPredicate = [RdxKind, Pred]() { | ||||||||||
| switch (RdxKind) { | ||||||||||
| case RecurKind::UMin: | ||||||||||
| return CmpInst::ICMP_UGE; | ||||||||||
| return Pred == CmpInst::ICMP_UGE || Pred == CmpInst::ICMP_UGT; | ||||||||||
| case RecurKind::UMax: | ||||||||||
| return CmpInst::ICMP_ULE; | ||||||||||
| return Pred == CmpInst::ICMP_ULE || Pred == CmpInst::ICMP_ULT; | ||||||||||
| case RecurKind::SMax: | ||||||||||
| return CmpInst::ICMP_SLE; | ||||||||||
| return Pred == CmpInst::ICMP_SLE || Pred == CmpInst::ICMP_SLT; | ||||||||||
| case RecurKind::SMin: | ||||||||||
| return CmpInst::ICMP_SGE; | ||||||||||
| return Pred == CmpInst::ICMP_SGE || Pred == CmpInst::ICMP_SGT; | ||||||||||
| default: | ||||||||||
| llvm_unreachable("unhandled recurrence kind"); | ||||||||||
| } | ||||||||||
| }(); | ||||||||||
|
|
||||||||||
| // TODO: Strict predicates need to find the first IV value for which the | ||||||||||
| // predicate holds, not the last. | ||||||||||
| if (Pred != RdxPredicate) | ||||||||||
| if (!IsValidPredicate) | ||||||||||
| return false; | ||||||||||
|
|
||||||||||
| assert(!FindIVPhiR->isInLoop() && !FindIVPhiR->isOrdered() && | ||||||||||
| "cannot handle inloop/ordered reductions yet"); | ||||||||||
| // For strict predicates, use a UMin reduction to find the minimum index. | ||||||||||
| // Canonical IVs (0, 1, 2, ...) are guaranteed not to wrap in the vector | ||||||||||
| // loop, so UMin can always be used. | ||||||||||
| bool IsStrictPredicate = ICmpInst::isLT(Pred) || ICmpInst::isGT(Pred); | ||||||||||
| if (IsStrictPredicate) { | ||||||||||
| return handleStrictArgMinArgMax(Plan, MinMaxPhiR, FindIVPhiR, | ||||||||||
| cast<VPWidenIntOrFpInductionRecipe>(IVOp), | ||||||||||
| MinMaxResult); | ||||||||||
| } | ||||||||||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Would be clearer to outline the rest into (else) |
||||||||||
|
|
||||||||||
| // The reduction using MinMaxPhiR needs adjusting to compute the correct | ||||||||||
| // result: | ||||||||||
|
|
||||||||||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -163,6 +163,7 @@ bool VPRecipeBase::mayHaveSideEffects() const { | |
| return cast<VPExpressionRecipe>(this)->mayHaveSideEffects(); | ||
| case VPDerivedIVSC: | ||
| case VPFirstOrderRecurrencePHISC: | ||
| case VPReductionPHISC: | ||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can be applied independently but tested/matters only now? |
||
| case VPPredInstPHISC: | ||
| case VPVectorEndPointerSC: | ||
| return false; | ||
|
|
@@ -1189,6 +1190,7 @@ bool VPInstruction::opcodeMayReadOrWriteFromMemory() const { | |
| case VPInstruction::BuildVector: | ||
| case VPInstruction::CalculateTripCountMinusVF: | ||
| case VPInstruction::CanonicalIVIncrementForPart: | ||
| case VPInstruction::ComputeFindIVResult: | ||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can be applied independently but tested/matters only now? |
||
| case VPInstruction::ExtractLane: | ||
| case VPInstruction::ExtractLastLane: | ||
| case VPInstruction::ExtractLastPart: | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
runPass()dropped - due to verification issue?