-
Notifications
You must be signed in to change notification settings - Fork 318
Fix | SqlVector: Explicitly perform little-endian multibyte writes #3861
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
/azp run |
|
Azure Pipelines successfully started running 2 pipeline(s). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR refactors the SqlVector<T> implementation to use explicit little-endian byte operations when serializing and deserializing vector data. The change replaces platform-specific memory marshaling/copying approaches with consistent loop-based operations using BinaryPrimitives methods to ensure correct endianness.
Key Changes:
- Replaced manual bit manipulation and platform-specific
Buffer.BlockCopy/MemoryMarshaloperations with explicit little-endian read/write methods - Introduced platform-specific handling:
BinaryPrimitives.WriteSingleLittleEndianfor .NET andBitConverterCompatible.SingleToInt32Bits+BinaryPrimitives.WriteInt32LittleEndianfor .NET Framework - Ensured symmetry between serialization (
MakeTdsBytes) and deserialization (MakeArray) operations
src/Microsoft.Data.SqlClient/src/Microsoft/Data/SqlTypes/SqlVector.cs
Outdated
Show resolved
Hide resolved
|
|
||
| private SqlVector(int length) | ||
| { | ||
| if (length < 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we also be throwing if length > ushort.Max ?
Same for the ReadOnlyMemory constructor.
(With updated public docs to match.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so, yes - although I think it's probably going to be TdsEnums.VECTOR_HEADER_SIZE + (_elementSize * Length) > 8000 to align with SQL Server.
This also means that length will always be < 8000, thus always in the acceptable range for a ushort (so MakeTdsBytes can just have a simple debug assertion rather than an exception.)
| result[1] = VecVersionNo; | ||
| result[2] = (byte)(Length & 0xFF); | ||
| result[3] = (byte)((Length >> 8) & 0xFF); | ||
| BinaryPrimitives.WriteUInt16LittleEndian(result.AsSpan(2), (ushort)Length); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need a new class invariant to ensure Length's value is compatible with ushort. See my comment on the constructor above.
src/Microsoft.Data.SqlClient/src/Microsoft/Data/SqlTypes/SqlVector.cs
Outdated
Show resolved
Hide resolved
| for (int i = 0, currPosition = TdsEnums.VECTOR_HEADER_SIZE; i < values.Length; i++, currPosition += _elementSize) | ||
| { | ||
| #if NET | ||
| BinaryPrimitives.WriteSingleLittleEndian(result.AsSpan(currPosition), (float)(object)valueSpan[i]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a performance impact here (good or bad)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I expected there to be one, but didn't expect it to be so large.
Previously, the method would have taken ~70ns to copy a max-length vector when the ReadOnlyMemory was backed by an array and ~240ns when it was backed by unmanaged memory (as a result of the extra copy.) As a result of the first set of changes, it would have taken ~1150ns.
The endianness is only really a concern on big-endian systems, so I've changed the method slightly. On a big-endian machine, it'll continue to use the replacement method (so will take about 1150ns.) On a little-endian machine, it'll continue to use the pre-PR method (albeit with a Span-based copy instead of Buffer.BlockCopy) which now takes ~60ns.
Vectors must always be less than 8000 bytes. This also means that Length will always be <= ushort.MaxValue. Add assertion to highlight this.
Length is the number of elements.
|
/azp run |
|
Azure Pipelines successfully started running 2 pipeline(s). |
Description
This deals with a minor point which came out of the original implementation PR: when we build (or read) the byte array representing a
vector({size}, float32), we now explicitly do so using BinaryPrimitives' little-endian methods.There are two slightly unusual points:
BinaryPrimitives.WriteSingleLittleEndianmethod. Instead, we fall back to the existingBitConverterCompatible.SingleToInt32Bitson this target.(float)(object)valueSpan[i]. This is another variation of the same pattern used elsewhere of(T)(object)item, and the same pattern holds: the JIT sees thatvalueSpanis actually aReadOnlySpan<float>and eliminates the redundant cast. An example of this on Sharplab is here.Issues
Fixes #3790.
Testing
Automated tests continue to pass.