-
Notifications
You must be signed in to change notification settings - Fork 0
OCR Support #4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
OCR Support #4
Conversation
Summary of ChangesHello @Lortunate, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request integrates Optical Character Recognition (OCR) functionality into the project by adding a new 'ocr' plugin. The plugin leverages ONNX Runtime for efficient and accurate text detection and recognition from images. It includes robust model management for downloading and caching necessary ONNX models and character dictionaries, ensuring a streamlined setup for users. The core logic encompasses image preprocessing, inference with specialized detection and recognition models, and post-processing to extract recognized text along with their bounding box locations and confidence scores. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces OCR support as a new plugin. While the implementation is generally well-structured, a security audit identified two significant vulnerabilities: a high-severity Server-Side Request Forgery (SSRF) vulnerability due to improper validation of user-provided URLs for model downloads, and a medium-severity Time-of-check Time-of-use (TOCTOU) vulnerability in the model downloading logic that could allow local attackers to overwrite arbitrary files. Additionally, there are suggestions for improving API design, performance, and robustness.
| pub async fn new<P: AsRef<Path>>( | ||
| models_dir: Option<P>, | ||
| model_type: Option<OcrModelType>, | ||
| det_url: Option<&str>, | ||
| rec_url: Option<&str>, | ||
| keys_url: Option<&str>, | ||
| on_progress: Option<Arc<dyn Fn(f32) + Send + Sync>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The OcrContext::new function accepts URLs for model and key files (det_url, rec_url, keys_url). These URLs are passed to the ModelManager to download files without proper validation against a list of allowed domains. An attacker could provide a URL to an internal service, causing the application to make an unintended network request (SSRF). This could be used to probe the internal network or access sensitive data from cloud metadata services.
| pub fn normalize_image(img: &DynamicImage, mean: &[f32; 3], std: &[f32; 3]) -> Array4<f32> { | ||
| let (w, h) = img.dimensions(); | ||
| let img_rgb = img.to_rgb8(); | ||
|
|
||
| let pixels: Vec<f32> = img_rgb.pixels().flat_map(|p| normalize_pixel(p, mean, std)).collect(); | ||
|
|
||
| Array3::from_shape_vec((h as usize, w as usize, 3), pixels) | ||
| .unwrap() | ||
| .permuted_axes([2, 0, 1]) | ||
| .insert_axis(ndarray::Axis(0)) | ||
| .to_owned() | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The .unwrap() on line 34 could cause a panic if the number of elements in pixels doesn't match the expected shape (h, w, 3). While it's unlikely with the current logic, it's safer for a library function to avoid panics and handle this potential error by returning a Result. You will also need to add use anyhow::Context; at the top of the file and update the call site in detector.rs to handle the Result by changing Ok(normalize_image(...)) to just normalize_image(...).
pub fn normalize_image(img: &DynamicImage, mean: &[f32; 3], std: &[f32; 3]) -> Result<Array4<f32>> {
let (w, h) = img.dimensions();
let img_rgb = img.to_rgb8();
let pixels: Vec<f32> = img_rgb.pixels().flat_map(|p| normalize_pixel(p, mean, std)).collect();
let array3d = Array3::from_shape_vec((h as usize, w as usize, 3), pixels)
.with_context(|| "Failed to create Array3 from pixels")?;
Ok(array3d
.permuted_axes([2, 0, 1])
.insert_axis(ndarray::Axis(0))
.to_owned())
}| let file_path = self.save_dir.join(filename); | ||
|
|
||
| if file_path.exists() { | ||
| info!("Model {} already exists at {:?}", filename, file_path); | ||
| if let Some(cb) = on_progress { | ||
| cb(1.0); | ||
| } | ||
| return Ok(file_path); | ||
| } | ||
|
|
||
| if !self.save_dir.exists() { | ||
| fs::create_dir_all(&self.save_dir).await?; | ||
| } | ||
|
|
||
| info!("Downloading model from {} to {:?}", url, file_path); | ||
| self.download_file(url, &file_path, on_progress).await?; | ||
|
|
||
| Ok(file_path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the ensure_model function, the code checks if a file exists (line 39) and then later creates and writes to it (line 52, calling download_file which creates the file). This creates a Time-of-Check to Time-of-Use (TOCTOU) race condition. An attacker with local filesystem access could create a symbolic link to a sensitive file (e.g., /etc/passwd) between the check and the write operation. This would cause the downloaded content to overwrite the linked file.
| dirs = "6.0.0" | ||
| futures-util = "0.3" | ||
| image = "0.25" | ||
| imageproc = "0.26.0" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The imageproc dependency transitively brings in nalgebra. Your Cargo.lock file shows that multiple versions of the glam crate are being pulled in, likely due to nalgebra's versioned feature flags for glam interoperability. This increases compile times and binary size. While this is an upstream issue, it's worth being aware of. If you find you need nalgebra for other things, you might consider adding it as a direct dependency and carefully selecting features to see if you can unify the glam version.
| }; | ||
|
|
||
| info!("Initializing OCR Context..."); | ||
| let mut ocr = OcrContext::new::<&Path>(None, Some(model_type), None, None, None, None).await?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The type annotation <&Path> is redundant here and can be removed for cleaner code. Also, since the recognize method doesn't require a mutable reference to OcrContext (with other suggested changes), ocr no longer needs to be declared as mutable.
| let mut ocr = OcrContext::new::<&Path>(None, Some(model_type), None, None, None, None).await?; | |
| let ocr = OcrContext::new(None, Some(model_type), None, None, None, None).await?; |
| Ok(Self { detector, recognizer }) | ||
| } | ||
|
|
||
| pub fn ocr(&mut self, image: &DynamicImage) -> Result<Vec<OcrResult>> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since Detector::detect and Recognizer::recognize_batch can be changed to take &self, this method can also be changed to take &self, as it does not need to mutate the engine's state.
| pub fn ocr(&mut self, image: &DynamicImage) -> Result<Vec<OcrResult>> { | |
| pub fn ocr(&self, image: &DynamicImage) -> Result<Vec<OcrResult>> { |
| Ok(Self { engine }) | ||
| } | ||
|
|
||
| pub fn recognize(&mut self, image: &DynamicImage) -> Result<Vec<OcrResult>> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since OcrEngine::ocr can be changed to take &self, this public method can also be changed to take &self. This makes the library's API more ergonomic and correct.
| pub fn recognize(&mut self, image: &DynamicImage) -> Result<Vec<OcrResult>> { | |
| pub fn recognize(&self, image: &DynamicImage) -> Result<Vec<OcrResult>> { |
| Ok(keys) | ||
| } | ||
|
|
||
| pub fn recognize_batch(&mut self, images: &[DynamicImage]) -> Result<Vec<(String, f32)>> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The recognize_batch method takes &mut self, but it doesn't appear to modify the Recognizer's state. The ort::Session::run method it calls takes &self. It's better to use &self to make the API clearer and more correct.
| pub fn recognize_batch(&mut self, images: &[DynamicImage]) -> Result<Vec<(String, f32)>> { | |
| pub fn recognize_batch(&self, images: &[DynamicImage]) -> Result<Vec<(String, f32)>> { |
| let (max_idx, max_val) = row | ||
| .iter() | ||
| .enumerate() | ||
| .max_by(|(_, a), (_, b)| a.partial_cmp(b).unwrap_or(std::cmp::Ordering::Equal)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using partial_cmp with unwrap_or to find the maximum value in a float slice can be surprising if NaN values are present. A more robust and idiomatic way to compare floats for a total order is to use f32::total_cmp.
| .max_by(|(_, a), (_, b)| a.partial_cmp(b).unwrap_or(std::cmp::Ordering::Equal)) | |
| .max_by(|(_, a), (_, b)| a.total_cmp(b)) |
| pub fn get_bounding_rect<T>(points: &[(T, T)]) -> (T, T, T, T) | ||
| where | ||
| T: Copy + PartialOrd + num_traits::Bounded, | ||
| { | ||
| points.iter().fold( | ||
| (T::max_value(), T::max_value(), T::min_value(), T::min_value()), | ||
| |(min_x, min_y, max_x, max_y), &(x, y)| { | ||
| ( | ||
| if x < min_x { x } else { min_x }, | ||
| if y < min_y { y } else { min_y }, | ||
| if x > max_x { x } else { max_x }, | ||
| if y > max_y { y } else { max_y }, | ||
| ) | ||
| }, | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The implementation of get_bounding_rect can be simplified. Since the types used with this function (u32 and i32) implement the Ord trait, you can change the trait bound from PartialOrd to Ord. This allows using the more concise min and max methods.
pub fn get_bounding_rect<T>(points: &[(T, T)]) -> (T, T, T, T)
where
T: Copy + Ord + num_traits::Bounded,
{
points.iter().fold(
(T::max_value(), T::max_value(), T::min_value(), T::min_value()),
|(min_x, min_y, max_x, max_y), &(x, y)| {
(min_x.min(x), min_y.min(y), max_x.max(x), max_y.max(y))
},
)
}
No description provided.