Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
95 changes: 89 additions & 6 deletions src/de.rs
Original file line number Diff line number Diff line change
Expand Up @@ -44,6 +44,52 @@ where
Ok(t)
}

/// Deserializes a `&[u8]` into a type.
///
/// This function will attempt to interpret `bytes` as the BCS serialized form of `T` and
/// deserialize `T` from `bytes`.
///
/// # Safety
///
/// This is unsafe. Developers must ensure that their `Deserialize`
/// structs correctly reflect the data being deserialized.
/// If the deserialization logic of one type overlaps with the next field,
/// the overall deserialization might succeed with undefined behavior.
///
/// If you're not sure what you're doing, consider using `from_bytes` instead.
///
/// # Examples
///
/// ```
/// use bcs::from_bytes_discarding_remaining_input;
/// use serde::Deserialize;
///
/// #[derive(Deserialize)]
/// struct Ip([u8; 4]);
///
/// #[derive(Deserialize)]
/// struct SocketAddr {
/// ip: Ip,
/// }
///
/// let bytes = vec![0x7f, 0x00, 0x00, 0x01, 0x41, 0x1f];
/// let socket_addr: SocketAddr = unsafe {
/// from_bytes_discarding_remaining_input(&bytes).unwrap()
/// };
///
/// assert_eq!(socket_addr.ip.0, [127, 0, 0, 1]);
/// ```
#[allow(unsafe_code)]
pub unsafe fn from_bytes_discarding_remaining_input<'a, T>(bytes: &'a [u8]) -> Result<T>
Comment on lines +82 to +83
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't use unsafe for that. I believe what you're asking is a variant where we don't call end(). Or equivalently exposing the Deserializer struct. Historically, @bmwill and I have pushed back on this idea. Not sure if there is a stronger argument now.

Copy link
Author

@gfusee gfusee May 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was inspired by some std's methods that are marked as unsafe even they don't use any unsafe code, such as Vec::set_len

To not call end would indeed achieve what I'm asking. However, end is defined in a trait (BcsDeserializer), which implicitly tells that it will be called anyway at the... end of something? Even the comment states this: The Deserializer::end method should be called after a type has been fully deserialized. It would be unintuitive to not call it in my opinion.

Exposing the Deserializer is a solution, but it would oblige developers to write themself the code to bypass the end function. What do you think about a variant of this where:

  • we remove the end function to be a requirement of the BcsDeserializer trait, which has no side effect since it is never called in a generic context (at least I didn't find any generic usage, correct me if I'm wrong)
  • we rename the Deserializer struct to RawDeserializer, and we remove the end method
  • we create a struct called DefaultDeserializer that has the end method that ensures if there is no remaining input. It uses RawDeserializer logic behind the scene
  • we use DefaultDeserializer where Deserializer was used
  • we expose both DefaultDeserializer and RawDeserializer
  • no from_bytes_discarding_remaining_input, no new_discarding_remaining_input and no discard_remaining_input

This way the developer would not notice any change, but someone wanting to not have the checks would just have to use RawDeserializer directly

Let me know if it looks good for you 🙏

where
T: Deserialize<'a>,
{
let mut deserializer = Deserializer::new_discarding_remaining_input(bytes, crate::MAX_CONTAINER_DEPTH);
let t = T::deserialize(&mut deserializer)?;
deserializer.end()?;
Ok(t)
}

/// Same as `from_bytes` but use `limit` as max container depth instead of MAX_CONTAINER_DEPTH`
/// Note that `limit` has to be lower than MAX_CONTAINER_DEPTH
pub fn from_bytes_with_limit<'a, T>(bytes: &'a [u8], limit: usize) -> Result<T>
Expand Down Expand Up @@ -141,24 +187,54 @@ where
struct Deserializer<R> {
input: R,
max_remaining_depth: usize,
discard_remaining_input: bool
}

impl<'de, R: Read> Deserializer<TeeReader<'de, R>> {
fn from_reader(input: &'de mut R, max_remaining_depth: usize) -> Self {
fn from_reader(
input: &'de mut R,
max_remaining_depth: usize,
) -> Self {
Deserializer {
input: TeeReader::new(input),
max_remaining_depth,
discard_remaining_input: false,
}
}
}

impl<'de> Deserializer<&'de [u8]> {
/// Creates a new `Deserializer` which will be deserializing the provided
/// input.
fn new(input: &'de [u8], max_remaining_depth: usize) -> Self {
fn new(
input: &'de [u8],
max_remaining_depth: usize,
) -> Self {
Deserializer {
input,
max_remaining_depth,
discard_remaining_input: false,
}
}

/// Creates a new `Deserializer` which will be deserializing the provided
/// input, and will discard remaining input.
///
/// # Safety
///
/// This is unsafe. Developers must ensure that their `Deserialize`
/// structs correctly reflect the data being deserialized.
/// If the deserialization logic of one type overlaps with the next field,
/// the overall deserialization might succeed with undefined behavior.
#[allow(unsafe_code)]
unsafe fn new_discarding_remaining_input(
input: &'de [u8],
max_remaining_depth: usize,
) -> Self {
Deserializer {
input,
max_remaining_depth,
discard_remaining_input: true,
}
}
}
Expand Down Expand Up @@ -340,7 +416,13 @@ impl<'de, R: Read> BcsDeserializer<'de> for Deserializer<TeeReader<'de, R>> {
fn end(&mut self) -> Result<()> {
let mut byte = [0u8; 1];
match self.input.read_exact(&mut byte) {
Ok(_) => Err(Error::RemainingInput),
Ok(_) => {
if !self.discard_remaining_input {
Err(Error::RemainingInput)
} else {
Ok(())
}
},
Err(e) if e.kind() == std::io::ErrorKind::UnexpectedEof => Ok(()),
Err(e) => Err(e.into()),
}
Expand Down Expand Up @@ -388,11 +470,12 @@ impl<'de> BcsDeserializer<'de> for Deserializer<&'de [u8]> {
}

fn end(&mut self) -> Result<()> {
if self.input.is_empty() {
Ok(())
} else {
if !self.discard_remaining_input && !self.input.is_empty() {
Err(Error::RemainingInput)
} else {
Ok(())
}

}
}

Expand Down
4 changes: 2 additions & 2 deletions src/lib.rs
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
// Copyright (c) The Diem Core Contributors
// SPDX-License-Identifier: Apache-2.0

#![forbid(unsafe_code)]
#![deny(unsafe_code)]

//! # Binary Canonical Serialization (BCS)
//!
Expand Down Expand Up @@ -315,7 +315,7 @@ pub const MAX_SEQUENCE_LENGTH: usize = (1 << 31) - 1;
pub const MAX_CONTAINER_DEPTH: usize = 500;

pub use de::{
from_bytes, from_bytes_seed, from_bytes_seed_with_limit, from_bytes_with_limit, from_reader,
from_bytes, from_bytes_discarding_remaining_input, from_bytes_seed, from_bytes_seed_with_limit, from_bytes_with_limit, from_reader,
from_reader_seed, from_reader_seed_with_limit, from_reader_with_limit,
};
pub use error::{Error, Result};
Expand Down
15 changes: 11 additions & 4 deletions tests/serde.rs
Original file line number Diff line number Diff line change
Expand Up @@ -14,10 +14,7 @@ use proptest::prelude::*;
use proptest_derive::Arbitrary;
use serde::{de::DeserializeOwned, Deserialize, Serialize};

use bcs::{
from_bytes, from_bytes_with_limit, from_reader, serialized_size, to_bytes, to_bytes_with_limit,
Error, MAX_CONTAINER_DEPTH, MAX_SEQUENCE_LENGTH,
};
use bcs::{from_bytes, from_bytes_discarding_remaining_input, from_bytes_with_limit, from_reader, serialized_size, to_bytes, to_bytes_with_limit, Error, MAX_CONTAINER_DEPTH, MAX_SEQUENCE_LENGTH};

/// A helper function to attempt deserialization via reader
fn from_bytes_via_reader<T: DeserializeOwned>(bytes: &[u8]) -> Result<T, Error> {
Expand Down Expand Up @@ -470,6 +467,16 @@ fn leftover_bytes() {
);
}

#[test]
fn leftover_bytes_discarding_remaining_bytes() {
let seq = vec![5, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]; // 5 extra elements
let result = unsafe {
from_bytes_discarding_remaining_input::<Vec<u8>>(&seq)
};

assert_eq!(result, Ok(vec![1, 2, 3, 4, 5]));
}

#[test]
fn test_f32() {
assert!(to_bytes(&1.0f32).is_err());
Expand Down
Loading