-
Notifications
You must be signed in to change notification settings - Fork 0
Prototype TYP_HALF changes
#15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
| return "AVXVNNIINT"; | ||
| case InstructionSet_AVXVNNIINT_V512 : | ||
| return "AVXVNNIINT_V512"; | ||
| case InstructionSet_Half : |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What CPUID bit or other support is InstructionSet_Half tracking?
I'd expect us to just use InstructionSet_AVX2 for F16C and InstructionSet_AVX10v1 for FP16
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good to know. I was kind of mirroring the Vector128 etc. ISAs. Mostly hacking. I will adjust this for the real PR.
|
|
||
| var_types storeType = genParamStackType(paramVarDsc, segment); | ||
| if (!varDsc->TypeIs(TYP_STRUCT) && (genTypeSize(genActualType(varDsc)) < genTypeSize(storeType))) | ||
| if (!varDsc->TypeIs(TYP_STRUCT) && (!varDsc->TypeIs(TYP_HALF)) && (genTypeSize(genActualType(varDsc)) < genTypeSize(storeType))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure this is needed. I'd expect genActualType to just keep TYP_HALF as TYP_HALF.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I may have placed this because the storeType was getting saved as TYP_HALF when I wanted to keep the handling similar to that of TYP_STRUCT.
Now, I may have fixed the issue in another spot and this is no longer relevant. I will check.
| // todo-xarch-half: understand why I needed to comment this out | ||
| //assert(treeNode->TypeGet() == genActualType(treeNode)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm guessing the handling of TYP_HALF in genActualType is wrong here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the issue I was facing was genActualType was returning TYP_INT while treeNode->TypeGet() was returning a TYP_USHORT. It wasn't TYP_HALF directly that was the issue.
Edit: As in, the bitcast node treeNode is casting to a TYP_USHORT but looks like the assumption is that for a bitcast, both the node type and the VM type representing the node type be the same (which is not true for TYP_USHORT).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't expect us to have TYP_USHORT here at all for that case
I wonder if it has to do with you having the simd type as TYP_HALF and then the base type ended up as TYP_USHORT, when we rather want simd type as TYP_SIMD16/32/64 and the base type as TYP_HALF
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the problem is it being modeled as STORE_LCL_VAR half with a ushort field. I think that should've stayed as TYP_STRUCT instead for that scenario and had a GT_BITCAST node inserted
It might actually be that trying to preserve the ABI when we are producing TYP_HALF is more, rather than less, complicated and that it's fine to just let it naturally fall out if there isn't "extra" work required here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is, rather than the above, I'd expect this to be more an ABI classification for TYP_HALF and for us to mark that it is it passed/returned the same as TYP_STRUCT containing a single TYP_USHORT field rather than as floating-point.
It'd still be exposed as TYP_HALF, much as TYP_SIMD is still always TYP_SIMD while Unix passes it in register but Windows passes it via shadow copy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Coming back to this now after the refactoring PR --- I understand what you are saying. I will work on this behavior now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I'm going back over this now...
From what I see, a C# struct with a single ushort field is handled internally a bit differently than a struct with a single float field. The latter I think is a bit closer to GenTree behavior we would need because we would be using a float register for the half type.
When a struct has a single ushort field, a bitcast isn't needed in the lowering from what I can see and instead a PUTREG is inserted and since genActualType(ushort) == int it works fine.
I wonder if I should program the half type as genActualType(half) == float, which would mean its bitcast would produce something closer to the struct with a float field, though I think other issues may arise.
| else if (size == 2) | ||
| { | ||
| // todo-xarch-half: I think this is the wrong approach, come back to refactor | ||
| simdType = TYP_HALF; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment is correct. We only want TYP_SIMD8/12/16/32/64 to be "SIMD types".
TYP_HALF is more like TYP_FLOAT and is a potential "base type" for the SIMD type.
| assert((leadingBytes == 0x0F) || ((emitComp->compIsaSupportedDebugOnly(InstructionSet_AVX10v1) || | ||
| emitComp->compIsaSupportedDebugOnly(InstructionSet_AVX10v2) || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit, AVX10v1 implies AVX10v2, so we can just change the first line.
| assert((leadingBytes == 0x0F) || ((emitComp->compIsaSupportedDebugOnly(InstructionSet_AVX10v1) || | |
| emitComp->compIsaSupportedDebugOnly(InstructionSet_AVX10v2) || | |
| assert((leadingBytes == 0x0F) || ((emitComp->compIsaSupportedDebugOnly(InstructionSet_AVX10v1) || |
| leadingBytes = (code >> 16) & 0xFF; | ||
| assert(leadingBytes == 0x0F || | ||
| (emitComp->compIsaSupportedDebugOnly(InstructionSet_AVX10v2) && leadingBytes >= 0x00 && | ||
| ((emitComp->compIsaSupportedDebugOnly(InstructionSet_AVX10v1) || emitComp->compIsaSupportedDebugOnly(InstructionSet_AVX10v2)) && leadingBytes >= 0x00 && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar here and elsewhere. AVX10v1 implies AVX10v2
|
|
||
| #if defined(TARGET_AMD64) | ||
| case INS_movsxd: | ||
| case INS_vmovsh: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: This isn't 64-bit exclusive, it works fine in 32-bit compat mode
|
|
||
| case INS_vmovsh: | ||
| { | ||
| hasSideEffect = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd expect this to be true, much like movss since it clears the upper bits of the register (i.e. it overwrites bits 0-15, preserves bits 16-127, and clears bits 128-MAXVL)
| if (IsXMMReg(reg)) | ||
| { | ||
| return emitXMMregName(reg); | ||
| } | ||
| else if (reg > REG_RDI) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This shouldn't be 64-bit exclusive and probably should follow the pattern of the other paths
if (IsXMMReg(reg))
{
return emitXMMregName(reg);
}
assert(isGeneralRegister(reg));
#if defined(TARGET_AMD64)
| else if (code & 0xFF000000) | ||
| { | ||
| if (size == EA_2BYTE) | ||
| if (size == EA_2BYTE && ins != INS_vmovsh) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this and other paths should rather just check for INS_movbe since that's the only path that should hit it.
We would then instead add assert((size != EA_2BYTE) || IsFp16Instruction(ins)) (or something along those lines since this likely also expands to bfloat16 in the future).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can check this as I rework the PR.
| else if (code & 0xFF000000) | ||
| { | ||
| if (size == EA_2BYTE) | ||
| if (size == EA_2BYTE && (ins != INS_vmovsh && ins != INS_vaddsh)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar comment here
| HWIntrinsicFlag flags; // 4-bytes | ||
| NamedIntrinsic id; // 2-bytes | ||
| #if defined(TARGET_XARCH) | ||
| uint16_t ins[11]; // 10 * 2-bytes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: The comment needs fixing here, as does the size listing above.
We really should also fix this for all platforms (and can be mostly handled on others by just fixing their #define to always specify INS_invalid), but that can be done separately if needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. Was keeping it pay for play but I will make this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All the platforms will end up getting support here, it just hadn't bubbled up to getting done yet. So I think its fine in this case to just go ahead and setup that cost, which will ideally help motivate other platforms to adopt the changes sooner as well.
| case NI_Half_op_Explicit: | ||
| { | ||
| assert(sig->numArgs == 1); | ||
|
|
||
| if (compOpportunisticallyDependsOn(InstructionSet_AVX10v1)) | ||
| { | ||
| StackEntry se = impPopStack(); | ||
| op1 = se.val; | ||
| retNode = gtNewSimdHWIntrinsicNode(retType, op1, NI_AVX10v1_ConvertFloatToHalf, CORINFO_TYPE_FLOAT, 2); | ||
| } | ||
|
|
||
| break; | ||
| } | ||
|
|
||
| case NI_Half_op_Addition: | ||
| { | ||
| assert(sig->numArgs == 2); | ||
|
|
||
| if (compOpportunisticallyDependsOn(InstructionSet_AVX10v1)) | ||
| { | ||
| StackEntry se1 = impPopStack(); | ||
| StackEntry se2 = impPopStack(); | ||
| op1 = se1.val; | ||
| op2 = se2.val; | ||
| retNode = gtNewSimdHWIntrinsicNode(retType, op1, op2, NI_AVX10v1_HalfAdd, CORINFO_TYPE_HALF, 2); | ||
| } | ||
|
|
||
| break; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than these being here, I'd expect we have handling similar to what we have for System.Single here: https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/importercalls.cpp#L10456-L10459
That is, under case 'H': we'd check for "Half" and then call lookupPrimitiveFloatNamedIntrinsic
We'd end up having that operate similarly to say NI_System_Math_FusedMultiplyAdd where we check for the ISA support (AVX2 for F16C or AVX10v1 for FP16) and then introduce the relevant gtNewSimdCreateScalarUnsafeNode and gtNewSimdHWIntrinsicNode for the conversion or addition, prior to calling gtNewSimdToScalarNode: https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/importercalls.cpp#L4303-L4368
| if (structSizeMightRepresentSIMDType(originalSize) | ||
| #if defined(TARGET_XARCH) | ||
| || (originalSize == 2) | ||
| #endif | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that we're including more typical sizes, I imagine we want to change the check on L1187 to include CORINFO_FLG_INTRINSIC_TYPE and to mark System.Half as [Intrinsic]. We then likely want to tweak this a bit to handle the sizes a bit more efficiently, since we have 8/12/16/32/64 as hitting the SIMD path and 2 hitting the HALF path, but that may expand to include BFLOAT16 in the future or even INT128/UINT128 (16)
Doing this should make it a bit more "pay for play".
| if (simdReturnType != call->TypeGet()) | ||
| { | ||
| assert(varTypeIsSIMD(simdReturnType)); | ||
| assert(varTypeIsSIMD(simdReturnType) || simdReturnType == TYP_HALF); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to tweak the names here, maybe intrinsicReturnType or abiReturnType or something is "better". -- Don't have a preference, but others on jitcontrib might.

I've hacked a bit of changes together to see if I am working on the ABI in the right places. So far, the following code will
which will get lowered into the following asm
Mostly, I am looking for some confirmation that the changes are on the right track, after which I will go back and work on this PR without such a hacking mindset (I know that for example, I have commented out an assert, and hacked in some if statement conditions).