
Conversation

@Lortunate
Owner

No description provided.

@gemini-code-assist

Summary of Changes

Hello @Lortunate, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request integrates Optical Character Recognition (OCR) functionality into the project by adding a new 'ocr' plugin. The plugin leverages ONNX Runtime for efficient and accurate text detection and recognition from images. It includes robust model management for downloading and caching necessary ONNX models and character dictionaries, ensuring a streamlined setup for users. The core logic encompasses image preprocessing, inference with specialized detection and recognition models, and post-processing to extract recognized text along with their bounding box locations and confidence scores.

Highlights

  • New OCR Plugin: Added a new 'ocr' plugin to the project workspace, introducing Optical Character Recognition capabilities.
  • Text Detection Implementation: Implemented text detection using an ONNX model, including image preprocessing and bounding box extraction.
  • Text Recognition Implementation: Developed text recognition functionality with an ONNX model, utilizing CTC Greedy Decode for converting model outputs to text.
  • Model Management: Introduced a 'ModelManager' to handle automatic downloading and caching of OCR models (detection, recognition, and character keys).
  • Example Usage: Provided an example demonstrating how to initialize the OCR context, download models, and perform text recognition on an image.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature              | Command             | Description
---------------------|---------------------|------------
Code Review          | /gemini review      | Performs a code review for the current pull request in its current state.
Pull Request Summary | /gemini summary     | Provides a summary of the current pull request in its current state.
Comment              | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help                 | /gemini help        | Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you'd like to give feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double-check its output and use code with caution.

@gemini-code-assist bot left a comment

Code Review

This pull request introduces OCR support as a new plugin. While the implementation is generally well-structured, a security audit identified two significant vulnerabilities: a high-severity Server-Side Request Forgery (SSRF) vulnerability due to improper validation of user-provided URLs for model downloads, and a medium-severity Time-of-Check to Time-of-Use (TOCTOU) vulnerability in the model downloading logic that could allow local attackers to overwrite arbitrary files. There are also suggestions for improving API design, performance, and robustness.

Comment on lines +32 to +38
pub async fn new<P: AsRef<Path>>(
    models_dir: Option<P>,
    model_type: Option<OcrModelType>,
    det_url: Option<&str>,
    rec_url: Option<&str>,
    keys_url: Option<&str>,
    on_progress: Option<Arc<dyn Fn(f32) + Send + Sync>>,


security-high

The OcrContext::new function accepts URLs for model and key files (det_url, rec_url, keys_url). These URLs are passed to the ModelManager to download files without proper validation against a list of allowed domains. An attacker could provide a URL to an internal service, causing the application to make an unintended network request (SSRF). This could be used to probe the internal network or access sensitive data from cloud metadata services.
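To make the mitigation concrete, here is a minimal std-only sketch of the allowlist approach. The helper name and host names are illustrative assumptions, not code from this PR: the idea is simply to reject any URL whose scheme is not https or whose host is not on a fixed allowlist before handing it to ModelManager.

```rust
use std::collections::HashSet;

// Hypothetical validation helper (names are illustrative): accept only
// plain https URLs whose host is on an explicit allowlist. Rejects
// userinfo/port tricks like https://evil@allowed-host/ or host:8080.
fn is_allowed_model_url(url: &str, allowed_hosts: &HashSet<&str>) -> bool {
    let Some(rest) = url.strip_prefix("https://") else {
        return false; // wrong scheme (http, file, gopher, ...)
    };
    // The host is everything up to the first '/', '?', or '#'.
    let host = rest.split(['/', '?', '#']).next().unwrap_or("");
    !host.contains('@') && !host.contains(':') && allowed_hosts.contains(host)
}

fn main() {
    let allowed: HashSet<&str> = ["models.example.com"].into_iter().collect();
    assert!(is_allowed_model_url("https://models.example.com/det.onnx", &allowed));
    assert!(!is_allowed_model_url("http://models.example.com/det.onnx", &allowed));
    // Cloud metadata endpoint, the classic SSRF target:
    assert!(!is_allowed_model_url("https://169.254.169.254/latest/meta-data/", &allowed));
    println!("url validation sketch ok");
}
```

A production implementation would typically use a real URL parser (for example the url crate) instead of string splitting, but the shape of the check is the same.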

Comment on lines +27 to +38
pub fn normalize_image(img: &DynamicImage, mean: &[f32; 3], std: &[f32; 3]) -> Array4<f32> {
    let (w, h) = img.dimensions();
    let img_rgb = img.to_rgb8();

    let pixels: Vec<f32> = img_rgb.pixels().flat_map(|p| normalize_pixel(p, mean, std)).collect();

    Array3::from_shape_vec((h as usize, w as usize, 3), pixels)
        .unwrap()
        .permuted_axes([2, 0, 1])
        .insert_axis(ndarray::Axis(0))
        .to_owned()
}


high

The .unwrap() on line 34 could cause a panic if the number of elements in pixels doesn't match the expected shape (h, w, 3). While it's unlikely with the current logic, it's safer for a library function to avoid panics and handle this potential error by returning a Result. You will also need to add use anyhow::Context; at the top of the file and update the call site in detector.rs to handle the Result by changing Ok(normalize_image(...)) to just normalize_image(...).

pub fn normalize_image(img: &DynamicImage, mean: &[f32; 3], std: &[f32; 3]) -> Result<Array4<f32>> {
    let (w, h) = img.dimensions();
    let img_rgb = img.to_rgb8();

    let pixels: Vec<f32> = img_rgb.pixels().flat_map(|p| normalize_pixel(p, mean, std)).collect();

    let array3d = Array3::from_shape_vec((h as usize, w as usize, 3), pixels)
        .with_context(|| "Failed to create Array3 from pixels")?;

    Ok(array3d
        .permuted_axes([2, 0, 1])
        .insert_axis(ndarray::Axis(0))
        .to_owned())
}

Comment on lines +37 to +54
let file_path = self.save_dir.join(filename);

if file_path.exists() {
    info!("Model {} already exists at {:?}", filename, file_path);
    if let Some(cb) = on_progress {
        cb(1.0);
    }
    return Ok(file_path);
}

if !self.save_dir.exists() {
    fs::create_dir_all(&self.save_dir).await?;
}

info!("Downloading model from {} to {:?}", url, file_path);
self.download_file(url, &file_path, on_progress).await?;

Ok(file_path)


security-medium

In the ensure_model function, the code checks if a file exists (line 39) and then later creates and writes to it (line 52, calling download_file which creates the file). This creates a Time-of-Check to Time-of-Use (TOCTOU) race condition. An attacker with local filesystem access could create a symbolic link to a sensitive file (e.g., /etc/passwd) between the check and the write operation. This would cause the downloaded content to overwrite the linked file.
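One common mitigation, sketched below with std only (the function and file names are illustrative, not from this PR): write the downloaded bytes to a uniquely named temp file opened with create_new, which refuses to open an existing file or a pre-planted symlink, then atomically rename it over the final path. On Unix, rename replaces the destination directory entry itself rather than writing through a symlink planted there.

```rust
use std::fs;
use std::io::Write;
use std::path::{Path, PathBuf};

// Hypothetical atomic-write helper: the ".part.<pid>" naming is an
// illustrative convention, not part of this PR.
fn write_model_atomically(final_path: &Path, bytes: &[u8]) -> std::io::Result<PathBuf> {
    let tmp_path = final_path.with_extension(format!("part.{}", std::process::id()));
    {
        let mut tmp = fs::OpenOptions::new()
            .write(true)
            .create_new(true) // fails with EEXIST on a pre-planted file or symlink
            .open(&tmp_path)?;
        tmp.write_all(bytes)?;
        tmp.sync_all()?; // flush data to disk before the rename makes it visible
    }
    // Atomic on the same filesystem; replaces the destination entry itself.
    fs::rename(&tmp_path, final_path)?;
    Ok(final_path.to_path_buf())
}

fn main() -> std::io::Result<()> {
    let dir = std::env::temp_dir().join(format!("ocr-demo-{}", std::process::id()));
    fs::create_dir_all(&dir)?;
    let dest = dir.join("det.onnx");
    write_model_atomically(&dest, b"fake model bytes")?;
    assert_eq!(fs::read(&dest)?, b"fake model bytes");
    fs::remove_dir_all(&dir)?;
    println!("atomic write ok");
    Ok(())
}
```

This also has the nice side effect that a crash mid-download never leaves a truncated model at the final path.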

dirs = "6.0.0"
futures-util = "0.3"
image = "0.25"
imageproc = "0.26.0"


medium

The imageproc dependency transitively brings in nalgebra. Your Cargo.lock file shows that multiple versions of the glam crate are being pulled in, likely due to nalgebra's versioned feature flags for glam interoperability. This increases compile times and binary size. While this is an upstream issue, it's worth being aware of. If you find you need nalgebra for other things, you might consider adding it as a direct dependency and carefully selecting features to see if you can unify the glam version.

};

info!("Initializing OCR Context...");
let mut ocr = OcrContext::new::<&Path>(None, Some(model_type), None, None, None, None).await?;


medium

The type annotation <&Path> is redundant here and can be removed for cleaner code. Also, given the other suggested changes, the recognize method no longer requires a mutable reference to OcrContext, so ocr does not need to be declared as mutable.

Suggested change
let mut ocr = OcrContext::new::<&Path>(None, Some(model_type), None, None, None, None).await?;
let ocr = OcrContext::new(None, Some(model_type), None, None, None, None).await?;

Ok(Self { detector, recognizer })
}

pub fn ocr(&mut self, image: &DynamicImage) -> Result<Vec<OcrResult>> {


medium

Since Detector::detect and Recognizer::recognize_batch can be changed to take &self, this method can also be changed to take &self, as it does not need to mutate the engine's state.

Suggested change
pub fn ocr(&mut self, image: &DynamicImage) -> Result<Vec<OcrResult>> {
pub fn ocr(&self, image: &DynamicImage) -> Result<Vec<OcrResult>> {

Ok(Self { engine })
}

pub fn recognize(&mut self, image: &DynamicImage) -> Result<Vec<OcrResult>> {


medium

Since OcrEngine::ocr can be changed to take &self, this public method can also be changed to take &self. This makes the library's API more ergonomic and correct.

Suggested change
pub fn recognize(&mut self, image: &DynamicImage) -> Result<Vec<OcrResult>> {
pub fn recognize(&self, image: &DynamicImage) -> Result<Vec<OcrResult>> {

Ok(keys)
}

pub fn recognize_batch(&mut self, images: &[DynamicImage]) -> Result<Vec<(String, f32)>> {


medium

The recognize_batch method takes &mut self, but it doesn't appear to modify the Recognizer's state. The ort::Session::run method it calls takes &self. It's better to use &self to make the API clearer and more correct.

Suggested change
pub fn recognize_batch(&mut self, images: &[DynamicImage]) -> Result<Vec<(String, f32)>> {
pub fn recognize_batch(&self, images: &[DynamicImage]) -> Result<Vec<(String, f32)>> {

let (max_idx, max_val) = row
.iter()
.enumerate()
.max_by(|(_, a), (_, b)| a.partial_cmp(b).unwrap_or(std::cmp::Ordering::Equal))


medium

Using partial_cmp with unwrap_or to find the maximum value in a float slice can be surprising if NaN values are present. A more robust and idiomatic way to compare floats for a total order is to use f32::total_cmp.

Suggested change
.max_by(|(_, a), (_, b)| a.partial_cmp(b).unwrap_or(std::cmp::Ordering::Equal))
.max_by(|(_, a), (_, b)| a.total_cmp(b))
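As a small standalone illustration of the suggested comparator (not code from this PR): total_cmp implements the IEEE 754 totalOrder predicate, so every comparison is decided even when NaN is present (NaN sorts above all real values), whereas the partial_cmp/unwrap_or(Equal) fallback makes the winner depend on element order.

```rust
// Argmax over a row of scores using a total order on f32.
fn argmax_total(row: &[f32]) -> Option<(usize, f32)> {
    row.iter()
        .copied()
        .enumerate()
        .max_by(|(_, a), (_, b)| a.total_cmp(b))
}

fn main() {
    let row = [0.12f32, 0.87, 0.51];
    let (max_idx, max_val) = argmax_total(&row).expect("non-empty row");
    assert_eq!(max_idx, 1);
    assert!((max_val - 0.87).abs() < f32::EPSILON);
    // With NaN in the data, total_cmp still yields a well-defined result
    // (NaN compares greater than every real value), while the
    // partial_cmp/Equal fallback silently depends on iteration order.
    println!("argmax: index {max_idx}, value {max_val}");
}
```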

Comment on lines +95 to +109
pub fn get_bounding_rect<T>(points: &[(T, T)]) -> (T, T, T, T)
where
    T: Copy + PartialOrd + num_traits::Bounded,
{
    points.iter().fold(
        (T::max_value(), T::max_value(), T::min_value(), T::min_value()),
        |(min_x, min_y, max_x, max_y), &(x, y)| {
            (
                if x < min_x { x } else { min_x },
                if y < min_y { y } else { min_y },
                if x > max_x { x } else { max_x },
                if y > max_y { y } else { max_y },
            )
        },
    )
}


medium

The implementation of get_bounding_rect can be simplified. Since the types used with this function (u32 and i32) implement the Ord trait, you can change the trait bound from PartialOrd to Ord. This allows using the more concise min and max methods.

pub fn get_bounding_rect<T>(points: &[(T, T)]) -> (T, T, T, T)
where
    T: Copy + Ord + num_traits::Bounded,
{
    points.iter().fold(
        (T::max_value(), T::max_value(), T::min_value(), T::min_value()),
        |(min_x, min_y, max_x, max_y), &(x, y)| {
            (min_x.min(x), min_y.min(y), max_x.max(x), max_y.max(y))
        },
    )
}
