Text Extraction with OpenAI Vision API in a Node.js Application

A highly optimized Node.js application demonstrating cost-effective text extraction from images using OpenAI's Vision API. This project showcases how to achieve significant cost reductions through model selection and token optimization while maintaining high accuracy.

🎯 Purpose

This project demonstrates:

How to optimize token usage in OpenAI's Vision API
Best practices for performance monitoring
Efficient image processing strategies

💡 Key Findings

Token Optimization

Optimized token usage: ~110 tokens per image
Achieved through model selection (GPT-4 Turbo); gpt-4o-mini is worth evaluating as an alternative, since it may be more capable for text extraction
Optimized system prompts and messages
Implemented "low detail" mode without accuracy loss
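
The optimizations above can be sketched as a small request builder. This is a minimal sketch under stated assumptions, not the project's actual code: the helper name `buildVisionRequest`, the prompt wording, and the `max_tokens` cap are illustrative. The message shape and the `detail: "low"` setting follow OpenAI's Chat Completions format for image inputs.

```javascript
// Hypothetical helper: builds a Chat Completions payload for one PNG.
// The prompt text and the max_tokens value are illustrative assumptions.
function buildVisionRequest(base64Png, model = "gpt-4-turbo") {
  return {
    model,
    max_tokens: 20, // the extracted text is short, so cap the output
    messages: [
      // A terse system prompt keeps the input token count low.
      { role: "system", content: "Return only the text in the image." },
      {
        role: "user",
        content: [
          {
            type: "image_url",
            image_url: {
              url: `data:image/png;base64,${base64Png}`,
              detail: "low", // low-detail mode: small, fixed image token cost
            },
          },
        ],
      },
    ],
  };
}
```

The returned object could be passed to `client.chat.completions.create(...)` from the official `openai` package. Trying gpt-4o-mini would only require passing `"gpt-4o-mini"` as the second argument.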

Cost Efficiency

Cost per image: $0.0012 (0.04 TL)
96% cost reduction achieved
Highly scalable for large datasets

🚀 Features

Efficient text extraction using GPT-4 Turbo
Detailed token usage analytics
Timestamped results with comprehensive stats

📊 Performance Metrics

Token usage: ~110 per image
Cost per image: $0.0012 (0.04 TL)
Processing time: 1-2 seconds
Accuracy rate: Very High

💰 Cost Analysis (Optimized)

Scale comparison at current API prices:

1,000 images: ~$1.20 (42 TL)
10,000 images: ~$12.00 (420 TL)
100,000 images: ~$120.00 (4,200 TL)
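
The figures above are a linear scaling of the per-image cost, which a few lines of code can reproduce. This is a sketch for illustration only: the helper name `estimateCost` is hypothetical, and the exchange rate of 35 TL per USD is an assumption inferred from the table (42 TL / 1.20 USD), not a value stated by the project.

```javascript
// Per-image cost taken from this README. The exchange rate is an assumption
// inferred from the table above (42 TL / 1.20 USD = 35 TL per USD).
const COST_PER_IMAGE_USD = 0.0012;
const ASSUMED_TL_PER_USD = 35;

// Round to two decimals to hide floating-point noise in the report.
const round2 = (value) => Math.round(value * 100) / 100;

// Hypothetical helper: total cost in USD and TL for a batch of images.
function estimateCost(
  imageCount,
  costPerImageUsd = COST_PER_IMAGE_USD,
  tlPerUsd = ASSUMED_TL_PER_USD
) {
  const usd = imageCount * costPerImageUsd;
  return { usd: round2(usd), tl: round2(usd * tlPerUsd) };
}

for (const count of [1000, 10000, 100000]) {
  const { usd, tl } = estimateCost(count);
  console.log(`${count} images: ~$${usd.toFixed(2)} (${tl} TL)`);
}
```

Passing a different `costPerImageUsd` or `tlPerUsd` makes it easy to redo the estimate when API prices or exchange rates change.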

🛠️ Setup & Usage

Clone the repository:

git clone https://github.com/yourusername/openai-vision-optimizer.git
cd openai-vision-optimizer

Install dependencies:

npm install

Create a .env file in the root directory:

OPENAI_API_KEY=your_api_key_here

Place your PNG files in the /image folder
Run the application:

node captcha_solver.js

Results are automatically saved in /results with timestamps (format: results-YYYYMMDD-HHMMSS.txt)
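
A file name in the `results-YYYYMMDD-HHMMSS.txt` format can be produced along these lines. This is a minimal sketch, not the project's own code: the helper name `resultsFileName` is hypothetical, and the use of local time rather than UTC is an assumption.

```javascript
// Hypothetical helper: formats a Date as results-YYYYMMDD-HHMMSS.txt.
// Uses local time; whether the project uses local time or UTC is an assumption.
function resultsFileName(date = new Date()) {
  const pad = (n) => String(n).padStart(2, "0");
  const day = `${date.getFullYear()}${pad(date.getMonth() + 1)}${pad(date.getDate())}`;
  const time = `${pad(date.getHours())}${pad(date.getMinutes())}${pad(date.getSeconds())}`;
  return `results-${day}-${time}.txt`;
}

console.log(resultsFileName());
```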

📝 Output Format

filename.png: extracted_text
  Response Time: X.XX seconds
  Token Usage: XXX (XXX input + XX output)
  Cost: $X.XXXX (X.XX TL)
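
One way to produce an entry in this format is sketched below. The helper name `formatResult` and its parameter names are hypothetical. The `prompt_tokens` and `completion_tokens` fields match the `usage` object returned by the Chat Completions API, while the cost and exchange rate are supplied by the caller.

```javascript
// Hypothetical formatter for a single result entry.
// `usage` mirrors the Chat Completions response; costUsd and tlPerUsd
// are computed or configured elsewhere and passed in by the caller.
function formatResult({ file, text, seconds, usage, costUsd, tlPerUsd }) {
  const total = usage.prompt_tokens + usage.completion_tokens;
  return [
    `${file}: ${text}`,
    `  Response Time: ${seconds.toFixed(2)} seconds`,
    `  Token Usage: ${total} (${usage.prompt_tokens} input + ${usage.completion_tokens} output)`,
    `  Cost: $${costUsd.toFixed(4)} (${(costUsd * tlPerUsd).toFixed(2)} TL)`,
  ].join("\n");
}
```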

⚙️ Technical Optimizations

API Optimization
- Minimal system prompts
- Optimized token usage
- Rate limit handling

Cost Management
- Token usage monitoring
- Detailed reporting
- Multi-currency support (USD/TL)
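
Rate limit handling, listed above, is commonly done with retries and exponential backoff. The wrapper below is a sketch of that general technique, not the project's confirmed approach: the name `withRetry`, the default delays, and the assumption that a rate-limit error exposes a numeric `status` of 429 are all illustrative.

```javascript
// Resolves after the given number of milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical wrapper: retries `fn` when it fails with HTTP 429.
// Assumes the thrown error exposes a numeric `status` property.
async function withRetry(fn, { retries = 3, baseDelayMs = 500 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // Give up on non-rate-limit errors or once retries are exhausted.
      if (err.status !== 429 || attempt >= retries) throw err;
      // Double the wait on each attempt: 500 ms, 1 s, 2 s with the defaults.
      await sleep(baseDelayMs * 2 ** attempt);
    }
  }
}
```

A call such as `withRetry(() => extractText(image))` would wrap one API request, where `extractText` stands in for whatever function sends the image to the API.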

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

⚠️ Disclaimer

This project is not affiliated with OpenAI. All API pricing and performance metrics are subject to change. Please refer to OpenAI's official documentation for current pricing.

📸 Example Captcha Images

Below are some example captcha images that can be processed using this application:
