
Conversation

@Lortunate
Owner

No description provided.

@gemini-code-assist

Summary of Changes

Hello @Lortunate, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request integrates Optical Character Recognition (OCR) functionality into the project by adding a new 'ocr' plugin. The plugin leverages ONNX Runtime for efficient and accurate text detection and recognition from images. It includes robust model management for downloading and caching necessary ONNX models and character dictionaries, ensuring a streamlined setup for users. The core logic encompasses image preprocessing, inference with specialized detection and recognition models, and post-processing to extract recognized text along with their bounding box locations and confidence scores.

Highlights

  • New OCR Plugin: Added a new 'ocr' plugin to the project workspace, introducing Optical Character Recognition capabilities.
  • Text Detection Implementation: Implemented text detection using an ONNX model, including image preprocessing and bounding box extraction.
  • Text Recognition Implementation: Developed text recognition functionality with an ONNX model, utilizing CTC Greedy Decode for converting model outputs to text.
  • Model Management: Introduced a 'ModelManager' to handle automatic downloading and caching of OCR models (detection, recognition, and character keys).
  • Example Usage: Provided an example demonstrating how to initialize the OCR context, download models, and perform text recognition on an image.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature              | Command             | Description
---------------------|---------------------|------------
Code Review          | /gemini review      | Performs a code review for the current pull request in its current state.
Pull Request Summary | /gemini summary     | Provides a summary of the current pull request in its current state.
Comment              | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help                 | /gemini help        | Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you'd like to give feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double-check its output and use code with caution.

@gemini-code-assist bot left a comment

Code Review

This pull request introduces OCR support as a new plugin. While the implementation is generally well-structured, a security audit identified two significant vulnerabilities: a high-severity Server-Side Request Forgery (SSRF) vulnerability due to improper validation of user-provided URLs for model downloads, and a medium-severity Time-of-Check to Time-of-Use (TOCTOU) vulnerability in the model downloading logic that could allow local attackers to overwrite arbitrary files. There are also suggestions for improving API design, performance, and robustness.

Comment on lines +32 to +38
pub async fn new<P: AsRef<Path>>(
    models_dir: Option<P>,
    model_type: Option<OcrModelType>,
    det_url: Option<&str>,
    rec_url: Option<&str>,
    keys_url: Option<&str>,
    on_progress: Option<Arc<dyn Fn(f32) + Send + Sync>>,


security-high

The OcrContext::new function accepts URLs for model and key files (det_url, rec_url, keys_url). These URLs are passed to the ModelManager to download files without proper validation against a list of allowed domains. An attacker could provide a URL to an internal service, causing the application to make an unintended network request (SSRF). This could be used to probe the internal network or access sensitive data from cloud metadata services.
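To make the mitigation concrete, here is a minimal std-only sketch of the allowlist approach. The helper name and host names are illustrative assumptions, not code from this PR: the idea is simply to reject any URL whose scheme is not https or whose host is not on a fixed allowlist before handing it to ModelManager.

```rust
use std::collections::HashSet;

// Hypothetical validation helper (names are illustrative): accept only
// plain https URLs whose host is on an explicit allowlist. Rejects
// userinfo/port tricks like https://evil@allowed-host/ or host:8080.
fn is_allowed_model_url(url: &str, allowed_hosts: &HashSet<&str>) -> bool {
    let Some(rest) = url.strip_prefix("https://") else {
        return false; // wrong scheme (http, file, gopher, ...)
    };
    // The host is everything up to the first '/', '?', or '#'.
    let host = rest.split(['/', '?', '#']).next().unwrap_or("");
    !host.contains('@') && !host.contains(':') && allowed_hosts.contains(host)
}

fn main() {
    let allowed: HashSet<&str> = ["models.example.com"].into_iter().collect();
    assert!(is_allowed_model_url("https://models.example.com/det.onnx", &allowed));
    assert!(!is_allowed_model_url("http://models.example.com/det.onnx", &allowed));
    // Cloud metadata endpoint, the classic SSRF target:
    assert!(!is_allowed_model_url("https://169.254.169.254/latest/meta-data/", &allowed));
    println!("url validation sketch ok");
}
```

A production implementation would typically use a real URL parser (for example the url crate) instead of string splitting, but the shape of the check is the same.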

Comment on lines +27 to +38
pub fn normalize_image(img: &DynamicImage, mean: &[f32; 3], std: &[f32; 3]) -> Array4<f32> {
    let (w, h) = img.dimensions();
    let img_rgb = img.to_rgb8();

    let pixels: Vec<f32> = img_rgb.pixels().flat_map(|p| normalize_pixel(p, mean, std)).collect();

    Array3::from_shape_vec((h as usize, w as usize, 3), pixels)
        .unwrap()
        .permuted_axes([2, 0, 1])
        .insert_axis(ndarray::Axis(0))
        .to_owned()
}


high

The .unwrap() on line 34 could cause a panic if the number of elements in pixels doesn't match the expected shape (h, w, 3). While it's unlikely with the current logic, it's safer for a library function to avoid panics and handle this potential error by returning a Result. You will also need to add use anyhow::Context; at the top of the file and update the call site in detector.rs to handle the Result by changing Ok(normalize_image(...)) to just normalize_image(...).

pub fn normalize_image(img: &DynamicImage, mean: &[f32; 3], std: &[f32; 3]) -> Result<Array4<f32>> {
    let (w, h) = img.dimensions();
    let img_rgb = img.to_rgb8();

    let pixels: Vec<f32> = img_rgb.pixels().flat_map(|p| normalize_pixel(p, mean, std)).collect();

    let array3d = Array3::from_shape_vec((h as usize, w as usize, 3), pixels)
        .with_context(|| "Failed to create Array3 from pixels")?;

    Ok(array3d
        .permuted_axes([2, 0, 1])
        .insert_axis(ndarray::Axis(0))
        .to_owned())
}

Comment on lines +37 to +54
let file_path = self.save_dir.join(filename);

if file_path.exists() {
    info!("Model {} already exists at {:?}", filename, file_path);
    if let Some(cb) = on_progress {
        cb(1.0);
    }
    return Ok(file_path);
}

if !self.save_dir.exists() {
    fs::create_dir_all(&self.save_dir).await?;
}

info!("Downloading model from {} to {:?}", url, file_path);
self.download_file(url, &file_path, on_progress).await?;

Ok(file_path)


security-medium

In the ensure_model function, the code checks if a file exists (line 39) and then later creates and writes to it (line 52, calling download_file which creates the file). This creates a Time-of-Check to Time-of-Use (TOCTOU) race condition. An attacker with local filesystem access could create a symbolic link to a sensitive file (e.g., /etc/passwd) between the check and the write operation. This would cause the downloaded content to overwrite the linked file.
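One common mitigation, sketched below with std only (the function and file names are illustrative, not from this PR): write the downloaded bytes to a uniquely named temp file opened with create_new, which refuses to open an existing file or a pre-planted symlink, then atomically rename it over the final path. On Unix, rename replaces the destination directory entry itself rather than writing through a symlink planted there.

```rust
use std::fs;
use std::io::Write;
use std::path::{Path, PathBuf};

// Hypothetical atomic-write helper: the ".part.<pid>" naming is an
// illustrative convention, not part of this PR.
fn write_model_atomically(final_path: &Path, bytes: &[u8]) -> std::io::Result<PathBuf> {
    let tmp_path = final_path.with_extension(format!("part.{}", std::process::id()));
    {
        let mut tmp = fs::OpenOptions::new()
            .write(true)
            .create_new(true) // fails with EEXIST on a pre-planted file or symlink
            .open(&tmp_path)?;
        tmp.write_all(bytes)?;
        tmp.sync_all()?; // flush data to disk before the rename makes it visible
    }
    // Atomic on the same filesystem; replaces the destination entry itself.
    fs::rename(&tmp_path, final_path)?;
    Ok(final_path.to_path_buf())
}

fn main() -> std::io::Result<()> {
    let dir = std::env::temp_dir().join(format!("ocr-demo-{}", std::process::id()));
    fs::create_dir_all(&dir)?;
    let dest = dir.join("det.onnx");
    write_model_atomically(&dest, b"fake model bytes")?;
    assert_eq!(fs::read(&dest)?, b"fake model bytes");
    fs::remove_dir_all(&dir)?;
    println!("atomic write ok");
    Ok(())
}
```

This also has the nice side effect that a crash mid-download never leaves a truncated model at the final path.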

dirs = "6.0.0"
futures-util = "0.3"
image = "0.25"
imageproc = "0.26.0"


medium

The imageproc dependency transitively brings in nalgebra. Your Cargo.lock file shows that multiple versions of the glam crate are being pulled in, likely due to nalgebra's versioned feature flags for glam interoperability. This increases compile times and binary size. While this is an upstream issue, it's worth being aware of. If you find you need nalgebra for other things, you might consider adding it as a direct dependency and carefully selecting features to see if you can unify the glam version.

};

info!("Initializing OCR Context...");
let mut ocr = OcrContext::new::<&Path>(None, Some(model_type), None, None, None, None).await?;


medium

The type annotation <&Path> is redundant here and can be removed for cleaner code. Also, given the other suggested changes, the recognize method no longer requires a mutable reference to OcrContext, so ocr does not need to be declared as mutable.

Suggested change
let mut ocr = OcrContext::new::<&Path>(None, Some(model_type), None, None, None, None).await?;
let ocr = OcrContext::new(None, Some(model_type), None, None, None, None).await?;

Ok(Self { detector, recognizer })
}

pub fn ocr(&mut self, image: &DynamicImage) -> Result<Vec<OcrResult>> {


medium

Since Detector::detect and Recognizer::recognize_batch can be changed to take &self, this method can also be changed to take &self, as it does not need to mutate the engine's state.

Suggested change
pub fn ocr(&mut self, image: &DynamicImage) -> Result<Vec<OcrResult>> {
pub fn ocr(&self, image: &DynamicImage) -> Result<Vec<OcrResult>> {

Ok(Self { engine })
}

pub fn recognize(&mut self, image: &DynamicImage) -> Result<Vec<OcrResult>> {


medium

Since OcrEngine::ocr can be changed to take &self, this public method can also be changed to take &self. This makes the library's API more ergonomic and correct.

Suggested change
pub fn recognize(&mut self, image: &DynamicImage) -> Result<Vec<OcrResult>> {
pub fn recognize(&self, image: &DynamicImage) -> Result<Vec<OcrResult>> {

Ok(keys)
}

pub fn recognize_batch(&mut self, images: &[DynamicImage]) -> Result<Vec<(String, f32)>> {


medium

The recognize_batch method takes &mut self, but it doesn't appear to modify the Recognizer's state. The ort::Session::run method it calls takes &self. It's better to use &self to make the API clearer and more correct.

Suggested change
pub fn recognize_batch(&mut self, images: &[DynamicImage]) -> Result<Vec<(String, f32)>> {
pub fn recognize_batch(&self, images: &[DynamicImage]) -> Result<Vec<(String, f32)>> {

let (max_idx, max_val) = row
.iter()
.enumerate()
.max_by(|(_, a), (_, b)| a.partial_cmp(b).unwrap_or(std::cmp::Ordering::Equal))


medium

Using partial_cmp with unwrap_or to find the maximum value in a float slice can be surprising if NaN values are present. A more robust and idiomatic way to compare floats for a total order is to use f32::total_cmp.

Suggested change
.max_by(|(_, a), (_, b)| a.partial_cmp(b).unwrap_or(std::cmp::Ordering::Equal))
.max_by(|(_, a), (_, b)| a.total_cmp(b))
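As a small standalone illustration of the suggested comparator (not code from this PR): total_cmp implements the IEEE 754 totalOrder predicate, so every comparison is decided even when NaN is present (NaN sorts above all real values), whereas the partial_cmp/unwrap_or(Equal) fallback makes the winner depend on element order.

```rust
// Argmax over a row of scores using a total order on f32.
fn argmax_total(row: &[f32]) -> Option<(usize, f32)> {
    row.iter()
        .copied()
        .enumerate()
        .max_by(|(_, a), (_, b)| a.total_cmp(b))
}

fn main() {
    let row = [0.12f32, 0.87, 0.51];
    let (max_idx, max_val) = argmax_total(&row).expect("non-empty row");
    assert_eq!(max_idx, 1);
    assert!((max_val - 0.87).abs() < f32::EPSILON);
    // With NaN in the data, total_cmp still yields a well-defined result
    // (NaN compares greater than every real value), while the
    // partial_cmp/Equal fallback silently depends on iteration order.
    println!("argmax: index {max_idx}, value {max_val}");
}
```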

Comment on lines +95 to +109
pub fn get_bounding_rect<T>(points: &[(T, T)]) -> (T, T, T, T)
where
    T: Copy + PartialOrd + num_traits::Bounded,
{
    points.iter().fold(
        (T::max_value(), T::max_value(), T::min_value(), T::min_value()),
        |(min_x, min_y, max_x, max_y), &(x, y)| {
            (
                if x < min_x { x } else { min_x },
                if y < min_y { y } else { min_y },
                if x > max_x { x } else { max_x },
                if y > max_y { y } else { max_y },
            )
        },
    )
}


medium

The implementation of get_bounding_rect can be simplified. Since the types used with this function (u32 and i32) implement the Ord trait, you can change the trait bound from PartialOrd to Ord. This allows using the more concise min and max methods.

pub fn get_bounding_rect<T>(points: &[(T, T)]) -> (T, T, T, T)
where
    T: Copy + Ord + num_traits::Bounded,
{
    points.iter().fold(
        (T::max_value(), T::max_value(), T::min_value(), T::min_value()),
        |(min_x, min_y, max_x, max_y), &(x, y)| {
            (min_x.min(x), min_y.min(y), max_x.max(x), max_y.max(y))
        },
    )
}
