diff --git a/docs/01-overview.md b/docs/01-overview.md deleted file mode 100644 index 03b54d1..0000000 --- a/docs/01-overview.md +++ /dev/null @@ -1,101 +0,0 @@ -# Overview - -## What is CodeCrow? - -CodeCrow is an automated code review platform that supports multiple VCS providers including Bitbucket Cloud and GitHub. It analyzes code changes in pull requests and branches using AI-powered analysis through Model Context Protocol (MCP) servers, combined with Retrieval-Augmented Generation (RAG) for contextual understanding of your codebase. - -## Supported Platforms - -| Platform | OAuth App | Personal Token | Webhooks | PR Comments | -|----------|-----------|----------------|----------|-------------| -| Bitbucket Cloud | ✅ | ✅ | ✅ | ✅ | -| GitHub | ✅ | ✅ | ✅ | ✅ | -| Bitbucket Server | ❌ | ✅ | ✅ | ✅ | -| GitLab | 🚧 Coming soon | 🚧 | 🚧 | 🚧 | - -## Key Features - -- **Automated Code Analysis**: Analyzes pull requests and branch merges automatically via webhooks -- **Multi-Platform Support**: Connect Bitbucket Cloud, GitHub, or both simultaneously -- **AI-Powered Reviews**: Uses LLM models through OpenRouter for intelligent code analysis -- **Contextual Understanding**: RAG pipeline indexes your codebase for context-aware analysis -- **Issue Tracking**: Tracks issues across branches and pull requests, detects when issues are resolved -- **Multi-Repository Support**: Manage multiple workspaces and projects -- **Private Repository Access**: Custom MCP servers for secure VCS API access -- **Role-Based Access**: Workspace and project-level permissions - -## System Components - -### Java Ecosystem -- **codecrow-core**: Shared models, persistence layer, common services -- **codecrow-security**: Authentication, authorization, JWT handling -- **codecrow-vcs-client**: VCS platform API client (Bitbucket, GitHub) -- **pipeline-agent**: Analysis processing engine and API gateway -- **web-server**: Main backend REST API -- **bitbucket-mcp**: MCP servers for VCS integration 
(supports both Bitbucket and GitHub) - -### Python Ecosystem -- **mcp-client**: Modified MCP client that generates prompts and communicates with AI -- **rag-pipeline**: Vector database indexing and semantic search - -### Frontend -- React-based web interface with shadcn/ui components - -### Infrastructure -- PostgreSQL for relational data -- Redis for sessions and caching -- Qdrant for vector storage -- Docker Compose orchestration - -## Analysis Flow - -1. **Webhook Trigger**: VCS platform sends webhook on PR creation/update or branch merge -2. **Pipeline Agent**: Receives webhook, fetches code and metadata -3. **RAG Indexing**: First-time branch analysis triggers full repository indexing -4. **MCP Client**: Generates analysis prompts with context from RAG and previous issues -5. **AI Analysis**: OpenRouter processes prompts using configured LLM -6. **Result Processing**: Pipeline agent stores results, updates issue status -7. **Web Interface**: Users view analysis results and manage projects - -## Core Concepts - -### Workspaces -Top-level organizational unit. Each workspace contains projects and has members with specific roles. - -### Projects -Repository representation within a workspace. Projects are bound to VCS repositories and AI connections. - -### Analysis Types -- **Branch Analysis**: Incremental analysis after PR merge, checks if existing issues are resolved -- **Pull Request Analysis**: Analyzes changed files when PR is created or updated - -### Issues -Code problems detected by analysis. Tracked at branch level (BranchIssue) and PR level (CodeAnalysisIssue). - -### RAG Integration -Repository code is indexed in Qdrant vector database. During analysis, relevant code context is retrieved to improve AI understanding. 
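The retrieval idea behind the RAG integration can be sketched in a few lines of Python. This is an illustrative toy, not the actual pipeline code: a hash-based vector stands in for the real embedding model, and an in-memory list stands in for Qdrant.

```python
import hashlib
import math

def embed(text: str, dims: int = 16) -> list[float]:
    """Toy stand-in for an embedding model: hash character trigrams into a fixed-size unit vector."""
    vec = [0.0] * dims
    for i in range(len(text) - 2):
        h = int(hashlib.md5(text[i:i + 3].encode()).hexdigest(), 16)
        vec[h % dims] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def cosine(a: list[float], b: list[float]) -> float:
    # Vectors are already normalized, so the dot product is the cosine similarity.
    return sum(x * y for x, y in zip(a, b))

# "Index" a few code chunks, then retrieve the most similar one for a query.
chunks = [
    "def connect(db_url): return psycopg2.connect(db_url)",
    "SELECT * FROM users WHERE id = %s",
    "function renderSidebar(items) { return items.map(toLi); }",
]
index = [(chunk, embed(chunk)) for chunk in chunks]

query_vec = embed("database connection handling")
best = max(index, key=lambda item: cosine(query_vec, item[1]))
print(best[0])
```

The shape of the flow is the same as in the real system — embed chunks at indexing time, embed the query at analysis time, nearest-neighbour lookup returns context — with OpenRouter providing the embeddings and Qdrant the vector search.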
- -## Technology Stack - -**Backend**: -- Java 17, Spring Boot 3.2.5 -- PostgreSQL 15 -- Redis 7 -- Maven - -**Python Services**: -- FastAPI, Uvicorn -- LangChain, LlamaIndex -- Qdrant Client -- OpenAI SDK (OpenRouter compatible) - -**Frontend**: -- React, TypeScript -- Vite -- shadcn/ui, Radix UI -- TanStack Query - -**Infrastructure**: -- Docker, Docker Compose -- PostgreSQL, Redis, Qdrant - diff --git a/docs/02-getting-started.md b/docs/02-getting-started.md deleted file mode 100644 index 6a58559..0000000 --- a/docs/02-getting-started.md +++ /dev/null @@ -1,283 +0,0 @@ -# Getting Started - -## Prerequisites - -- Docker 20.10+ -- Docker Compose v2.0+ -- 4GB+ available RAM -- 10GB+ disk space - -Optional (for local development): -- Java 17+ -- Maven 3.8+ -- Node.js 18+ (with Bun or npm) -- Python 3.10+ - -## Quick Start - -### 1. Clone Repository - -```bash -git clone -cd codecrow -``` - -### 2. Configure Services - -Copy sample configuration files and update credentials: - -```bash -# Copy docker-compose configuration -cp deployment/docker-compose-sample.yml deployment/docker-compose.yml - -# Copy Java configuration -cp deployment/config/java-shared/application.properties.sample \ - deployment/config/java-shared/application.properties - -# Copy Python service configurations -cp deployment/config/mcp-client/.env.sample \ - deployment/config/mcp-client/.env - -cp deployment/config/rag-pipeline/.env.sample \ - deployment/config/rag-pipeline/.env - -cp deployment/config/web-frontend/.env.sample \ - deployment/config/web-frontend/.env -``` - -### 3. 
Update Required Credentials - -Edit the following files with your actual credentials: - -**deployment/config/java-shared/application.properties**: -```properties -# Generate new secrets (use openssl rand -base64 32) -codecrow.security.jwtSecret= -codecrow.security.encryption-key= - -# Update base URL for your deployment -codecrow.web.base.url=http://localhost:8080 -``` - -**deployment/config/rag-pipeline/.env**: -```bash -# Get API key from https://openrouter.ai/ -OPENROUTER_API_KEY=sk-or-v1-your-api-key-here -``` - -**deployment/config/web-frontend/.env**: -```bash -# Update if deploying to different domain -VITE_API_URL=http://localhost:8081/api -VITE_WEBHOOK_URL=http://localhost:8082 - -# Google OAuth (optional - for social login) -VITE_GOOGLE_CLIENT_ID=your-google-client-id -``` - -### 4. Configure Google OAuth (Optional) - -Google OAuth enables users to sign in/sign up with their Google accounts. To enable this feature: - -#### Step 1: Create Google Cloud Project -1. Go to [Google Cloud Console](https://console.cloud.google.com/) -2. Create a new project or select an existing one -3. Navigate to **APIs & Services → Credentials** - -#### Step 2: Create OAuth 2.0 Client ID -1. Click **Create Credentials → OAuth 2.0 Client ID** -2. Select **Web application** as the application type -3. Configure the following: - - **Name**: CodeCrow (or your preferred name) - - **Authorized JavaScript origins**: - - `http://localhost:8080` (for local development) - - `https://your-domain.com` (for production) - - **Authorized redirect URIs**: Same as JavaScript origins -4. 
Click **Create** and copy the **Client ID** - -#### Step 3: Configure Application -Add the Google Client ID to your configuration files: - -**Backend (deployment/config/java-shared/application.properties)**: -```properties -codecrow.oauth.google.client-id=your-google-client-id.apps.googleusercontent.com -``` - -**Frontend (deployment/config/web-frontend/.env)**: -```bash -VITE_GOOGLE_CLIENT_ID=your-google-client-id.apps.googleusercontent.com -``` - -> **Note**: Both frontend and backend must use the same Google Client ID. The Google Sign-In button will only appear if `VITE_GOOGLE_CLIENT_ID` is configured. - -### 5. Build and Start Services - -Use the automated build script: - -```bash -./tools/production-build.sh -``` - -This script will: -1. Build Java artifacts with Maven -2. Copy MCP servers JAR to Python client -3. Build and start all Docker containers -4. Wait for services to be healthy - -Or manually with Docker Compose: - -```bash -cd deployment -docker compose up -d --build -``` - -### 6. Verify Services - -Check that all services are running: - -```bash -cd deployment -docker compose ps -``` - -Expected services: -- codecrow-postgres (port 5432) -- codecrow-redis (port 6379) -- codecrow-qdrant (port 6333) -- codecrow-web-application (port 8081) -- codecrow-pipeline-agent (port 8082) -- codecrow-mcp-client (port 8000) -- codecrow-rag-pipeline (port 8001) -- codecrow-web-frontend (port 8080) - -### 7. Access Application - -Open browser: `http://localhost:8080` - -Default admin credentials are created on first startup (check logs or database). - -### 8. Configure VCS Integration - -CodeCrow supports multiple VCS providers and connection types: - -#### Option A: Bitbucket Cloud App (Recommended) - -The Bitbucket Cloud App provides 1-click integration with automatic webhook setup: - -1. Log into CodeCrow web interface -2. Navigate to **Settings → Integrations** -3. Click **Install App** on the Bitbucket Cloud card -4. 
You'll be redirected to Atlassian to authorize the app -5. Select the Bitbucket workspace(s) to install -6. After installation, you're redirected back to CodeCrow -7. **Step 1**: Select repositories to onboard -8. **Step 2**: Configure AI connection (select existing or create new) -9. Complete setup - projects are created automatically with webhooks configured - -Benefits: -- No manual webhook configuration required -- Automatic OAuth2 token refresh -- Workspace-level access to all repositories -- AI connection setup integrated into onboarding flow -- Simplified repository selection - -#### Option B: Manual OAuth Connection - -For more granular control, use manual OAuth connections: - -1. Log into CodeCrow web interface -2. Create a workspace -3. Navigate to **Settings → Code Hosting** -4. Add a new Bitbucket Cloud connection -5. Create a project using one of these methods: - - **New Project (Step-by-step wizard)**: - - Click "New Project" on the Projects page - - **Step 1**: Select VCS connection and repository - - **Step 2**: Enter project name and description - - **Step 3**: Select or create AI connection - - **Import Project (from existing connection)**: - - Click "Import Project" dropdown on the Projects page - - Select a VCS connection from the list - - Follow the step-by-step wizard (same as above) - -6. Generate project webhook token -7. Configure Bitbucket webhook manually: - - URL: `http://:8082/api/v1/bitbucket-cloud/webhook` - - Events: Pull Request (created, updated), Repository (push) - - Add authentication header with project token - -### 9. Configure AI Connection - -AI connections can be configured in multiple ways: - -**During Project Creation (Recommended):** -- When creating a new project or importing from VCS, the final step allows you to select an existing AI connection or create a new one -- Supported providers: OpenRouter, OpenAI, Anthropic - -**After Project Creation:** -1. Navigate to **Projects → \** -2. 
Click **Settings** in the top right corner
-3. Go to the **AI Connections** tab
-4. Select the desired connection and click the **Link to current project** button
-
-#### Supported VCS Providers
-
-| Provider | Status | Connection Types |
-|----------|--------|------------------|
-| Bitbucket Cloud | ✅ Available | App, OAuth Manual |
-| Bitbucket Server/DC | 🔜 Coming Soon | Personal Token |
-| GitHub | 🔜 Coming Soon | App, OAuth |
-| GitLab | 🔜 Coming Soon | App, OAuth |
-
-## Configuration Overview
-
-### Service Ports
-
-| Service | Port | Access |
-|---------|------|--------|
-| Frontend | 8080 | Public |
-| Web Server API | 8081 | Public |
-| Pipeline Agent | 8082 | Webhook only |
-| MCP Client | 8000 | Internal only |
-| RAG Pipeline | 8001 | Internal only |
-| PostgreSQL | 5432 | Internal only |
-| Redis | 6379 | Internal only |
-| Qdrant | 6333 | Internal only |
-
-### Security Considerations
-
-**Important**: The MCP Client (port 8000) must NOT be publicly accessible. It has no authentication, as it is designed for internal communication only.
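One webhook-specific measure worth illustrating: the pipeline agent authenticates incoming webhooks with a per-project token sent in a request header, and that token should be compared in constant time. A minimal sketch — the `X-Project-Token` header name here is a placeholder for illustration, not the actual header used:

```python
import hmac

def is_authorized(headers: dict, expected_token: str) -> bool:
    """Validate the project token header on an incoming webhook request.

    The header name `X-Project-Token` is assumed for this sketch; use
    whichever header the project token is actually configured under.
    """
    provided = headers.get("X-Project-Token", "")
    # compare_digest avoids the timing side channel a plain `==` would leak.
    return hmac.compare_digest(provided.encode(), expected_token.encode())

print(is_authorized({"X-Project-Token": "s3cret"}, "s3cret"))  # True
print(is_authorized({"X-Project-Token": "wrong"}, "s3cret"))   # False
```

A plain string comparison returns as soon as the first byte differs, which lets an attacker probe a token byte by byte; `hmac.compare_digest` takes time independent of where the mismatch occurs.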
- -For production: -- Use reverse proxy (nginx, Traefik) with SSL -- Restrict pipeline-agent access to Bitbucket webhook IPs only -- Keep MCP client and RAG pipeline internal -- Use strong JWT and encryption keys -- Enable firewall rules - -## Initial Setup Checklist - -- [ ] Copy all sample config files -- [ ] Generate new JWT secret -- [ ] Generate new encryption key -- [ ] Set OpenRouter API key -- [ ] Update base URLs for your domain -- [ ] Configure Google OAuth Client ID (optional, for social login) -- [ ] Configure database credentials (if not using defaults) -- [ ] Run production-build.sh or docker compose up -- [ ] Verify all services are healthy -- [ ] Access web interface -- [ ] Create first workspace -- [ ] Create first project -- [ ] Configure Bitbucket webhook -- [ ] Test with sample pull request - -## Next Steps - -- [Configuration Reference](05-configuration.md) - Detailed configuration options -- [Architecture](03-architecture.md) - Understand system design -- [API Reference](06-api-reference.md) - Integrate with REST API -- [Troubleshooting](11-troubleshooting.md) - Common issues - diff --git a/docs/03-architecture.md b/docs/03-architecture.md deleted file mode 100644 index d53c539..0000000 --- a/docs/03-architecture.md +++ /dev/null @@ -1,332 +0,0 @@ -# Architecture - -## System Architecture Diagram - -``` -┌─────────────────────────────────────────────────────────────────┐ -│ Bitbucket Cloud │ -└────────────────────────────┬────────────────────────────────────┘ - │ Webhooks - ▼ -┌─────────────────────────────────────────────────────────────────┐ -│ Pipeline Agent (8082) │ -│ ┌────────────────────────────────────────────────────────┐ │ -│ │ Webhook Controller │ │ -│ │ - Validates requests │ │ -│ │ - Acquires analysis lock │ │ -│ │ - Fetches repository data via VCS client │ │ -│ └──────────────┬──────────────────────────────────────────┘ │ -│ │ │ -│ ┌──────────────▼──────────────────────────────────────────┐ │ -│ │ Analysis Service │ │ -│ │ - 
Prepares analysis request │ │ -│ │ - Checks for previous issues │ │ -│ │ - Sends to MCP Client │ │ -│ └──────────────┬──────────────────────────────────────────┘ │ -│ │ │ -│ ┌──────────────▼──────────────────────────────────────────┐ │ -│ │ RAG Integration Service │ │ -│ │ - Triggers indexing for first branch analysis │ │ -│ │ - Updates index incrementally │ │ -│ └──────────────┬──────────────────────────────────────────┘ │ -│ │ │ -│ ┌──────────────▼──────────────────────────────────────────┐ │ -│ │ Result Processor │ │ -│ │ - Processes analysis results │ │ -│ │ - Updates issue statuses │ │ -│ │ - Stores in database │ │ -│ └──────────────────────────────────────────────────────────┘ │ -└───────┬─────────────────────────────────┬───────────────────────┘ - │ │ - │ │ RAG requests - ▼ ▼ -┌──────────────────┐ ┌──────────────────────────┐ -│ MCP Client │ │ RAG Pipeline │ -│ (8000) │◄──────────┤ (8001) │ -│ │ Context │ │ -│ ┌──────────────┐ │ │ ┌──────────────────────┐ │ -│ │ Prompt Gen │ │ │ │ Indexing Service │ │ -│ │ - Builds │ │ │ │ - Full index build │ │ -│ │ prompts │ │ │ │ - Incremental upd │ │ -│ │ - RAG query │ │ │ └──────────────────────┘ │ -│ └──────┬───────┘ │ │ │ -│ │ │ │ ┌──────────────────────┐ │ -│ ┌──────▼───────┐ │ │ │ Query Service │ │ -│ │ MCP Tools │ │ │ │ - Semantic search │ │ -│ │ - Bitbucket │ │ │ │ - Context retrieval │ │ -│ │ MCP │ │ │ └──────────────────────┘ │ -│ └──────┬───────┘ │ │ │ -│ │ │ │ ┌──────────────────────┐ │ -│ ┌──────▼───────┐ │ │ │ Qdrant Integration │ │ -│ │ LLM Client │ │ │ │ - Vector operations │ │ -│ │ - OpenRouter│ │ │ └──────────────────────┘ │ -│ └──────────────┘ │ └──────────────────────────┘ -└──────────────────┘ - │ - │ API calls - ▼ -┌──────────────────────────────┐ -│ OpenRouter / LLM Provider │ -└──────────────────────────────┘ - -┌─────────────────────────────────────────────────────────────────┐ -│ Web Server (8081) │ -│ ┌────────────────────────────────────────────────────────┐ │ -│ │ REST API Controllers │ │ -│ │ 
- Auth, Users, Workspaces, Projects │ │ -│ │ - Analysis, Issues, Pull Requests │ │ -│ │ - VCS Integration, AI Connections │ │ -│ └──────────────┬──────────────────────────────────────────┘ │ -│ │ │ -│ ┌──────────────▼──────────────────────────────────────────┐ │ -│ │ Business Logic Services │ │ -│ └──────────────┬──────────────────────────────────────────┘ │ -│ │ │ -│ ┌──────────────▼──────────────────────────────────────────┐ │ -│ │ Security Layer (JWT, Permissions) │ │ -│ └──────────────────────────────────────────────────────────┘ │ -└───────┬─────────────────────────────────────────────────────────┘ - │ - ▼ -┌──────────────────────────────┐ -│ PostgreSQL (5432) │ -│ ┌─────────────────────────┐ │ -│ │ - Users, Workspaces │ │ -│ │ - Projects, Branches │ │ -│ │ - Issues, Analyses │ │ -│ │ - Permissions, Tokens │ │ -│ └─────────────────────────┘ │ -└──────────────────────────────┘ - -┌──────────────────────────────┐ -│ Redis (6379) │ -│ - Sessions │ -│ - Cache │ -└──────────────────────────────┘ - -┌──────────────────────────────┐ -│ Qdrant (6333) │ -│ - Code embeddings │ -│ - Semantic search │ -└──────────────────────────────┘ - - ▲ - │ -┌───────┴──────────────────────┐ -│ Frontend (8080) │ -│ - React SPA │ -│ - User interface │ -└──────────────────────────────┘ -``` - -## Component Interactions - -### 1. Webhook Processing Flow - -``` -Bitbucket → Pipeline Agent → Lock Check → Fetch Code → -Build Request → MCP Client → AI Analysis → Store Results -``` - -1. Bitbucket sends webhook (PR event or push event) -2. Pipeline agent validates webhook signature and project token -3. Acquires analysis lock to prevent concurrent analysis -4. VCS client fetches repository metadata, diffs, file content -5. For first branch analysis, triggers RAG indexing -6. Builds analysis request with context -7. Sends to MCP client -8. MCP client queries RAG for relevant code context -9. Generates prompts with context -10. Calls OpenRouter LLM -11. Returns analysis results -12. 
Pipeline agent processes results, updates database -13. Releases analysis lock - -### 2. Branch Analysis Flow - -``` -PR Merged → Webhook → Pipeline Agent → Fetch Changed Files → -Query Existing Issues → RAG Context → MCP Analysis → -Update Issue Status → Store Results -``` - -**First Branch Analysis**: -- Triggers full repository indexing in RAG -- Creates RAG index status entry -- All repository files indexed for semantic search - -**Subsequent Branch Analysis**: -- Incremental RAG index update -- Only changed files re-indexed -- Existing branch issues checked for resolution -- Issues marked as resolved if fixed in merge - -### 3. Pull Request Analysis Flow - -``` -PR Created/Updated → Webhook → Pipeline Agent → Fetch Diffs → -Query Previous PR Issues → RAG Context → MCP Analysis → -Store Issues → Link to PR -``` - -- Analyzes only changed files (diff) -- Reuses previous analysis if PR was analyzed before -- Creates new CodeAnalysisIssue entries -- Links issues to CodeAnalysis and PullRequest entities - -### 4. RAG Integration Flow - -``` -Code Files → Chunking → Embedding (OpenRouter) → -Qdrant Storage → Semantic Search → Context Retrieval -``` - -**Indexing**: -- Files split into chunks (800 chars for code, 1000 for text) -- Chunks embedded using OpenRouter embedding model -- Vectors stored in Qdrant with metadata - -**Retrieval**: -- Query embedded using same model -- Similarity search in Qdrant -- Top K results returned with context - -### 5. 
Authentication Flow - -``` -User Login → JWT Token → Store in Session → -Validate on Each Request → Permission Check -``` - -- JWT tokens for API authentication -- Redis for session storage -- Role-based access control (workspace and project level) -- Project tokens for webhook authentication - -## Data Flow - -### Analysis Request Structure - -```json -{ - "project_id": "uuid", - "analysis_type": "PULL_REQUEST | BRANCH", - "repository": { - "workspace": "workspace-slug", - "repo_slug": "repo-slug", - "branch": "feature/branch-name" - }, - "changed_files": [ - { - "path": "src/main/java/Example.java", - "diff": "...", - "content": "..." - } - ], - "previous_issues": [...], - "metadata": { - "pr_number": 123, - "author": "username" - } -} -``` - -### Analysis Response Structure - -```json -{ - "issues": [ - { - "file": "src/main/java/Example.java", - "line": 42, - "severity": "HIGH", - "category": "SECURITY", - "description": "SQL injection vulnerability", - "suggestion": "Use parameterized queries" - } - ], - "summary": { - "total_issues": 5, - "by_severity": {"HIGH": 1, "MEDIUM": 3, "LOW": 1} - } -} -``` - -## Scalability Considerations - -### Horizontal Scaling -- **Web Server**: Stateless, can scale horizontally behind load balancer -- **Pipeline Agent**: Single instance recommended (uses DB locks for concurrency) -- **MCP Client**: Can scale with queue-based distribution -- **RAG Pipeline**: Can scale for read operations - -### Performance Optimization -- Analysis locks prevent concurrent analysis of same repository -- Redis caching for frequently accessed data -- Incremental RAG updates reduce indexing overhead -- Async processing for long-running analyses - -### Resource Requirements - -**Minimum**: -- 4 CPU cores -- 8GB RAM -- 50GB disk - -**Recommended**: -- 8 CPU cores -- 16GB RAM -- 200GB SSD - -**Database**: -- PostgreSQL with regular vacuuming -- Indexes on frequently queried columns - -**Vector Database**: -- Qdrant memory mapped mode for large 
codebases -- SSD for better performance - -## Security Architecture - -### Authentication Layers -1. **User Authentication**: JWT tokens, session management -2. **Webhook Authentication**: Project-specific tokens -3. **Inter-Service**: Internal network, no public exposure - -### Authorization -- Workspace-level roles (Owner, Admin, Member, Viewer) -- Project-level permissions -- Permission templates for flexible access control - -### Data Protection -- Sensitive data encrypted at rest (encryption key in config) -- JWT secrets for token signing -- HTTPS recommended for production -- VCS credentials encrypted in database - -## Technology Decisions - -### Why Java + Spring Boot? -- Robust enterprise framework -- Strong typing and compile-time safety -- Excellent ORM (Hibernate/JPA) -- Large ecosystem - -### Why Python for MCP Client? -- MCP SDK available in Python -- LangChain/LlamaIndex for RAG -- Rapid development for AI integration -- Rich AI/ML library ecosystem - -### Why Qdrant? -- Fast vector similarity search -- Easy Docker deployment -- Good Python/REST API support -- Memory-efficient - -### Why OpenRouter? -- Single API for multiple LLM providers -- No vendor lock-in -- Cost-effective -- Easy to switch models - diff --git a/docs/04-modules/frontend.md b/docs/04-modules/frontend.md deleted file mode 100644 index 16cde27..0000000 --- a/docs/04-modules/frontend.md +++ /dev/null @@ -1,526 +0,0 @@ -# Frontend - -React-based single-page application providing the user interface for CodeCrow. 
- -## Technology Stack - -- **Framework**: React 18 -- **Build Tool**: Vite -- **Language**: TypeScript -- **UI Components**: shadcn/ui with Radix UI -- **Styling**: Tailwind CSS -- **State Management**: TanStack Query (React Query) -- **Routing**: React Router -- **HTTP Client**: Axios -- **Deployment**: Static build served with `serve` - -## Project Structure - -``` -frontend/ -├── public/ # Static assets -│ ├── favicon.ico -│ ├── logo.png -│ └── robots.txt -├── src/ -│ ├── main.tsx # Application entry point -│ ├── App.tsx # Root component -│ ├── api_service/ # API client layer -│ │ ├── api.ts # Base API configuration -│ │ ├── api.interface.ts # TypeScript interfaces -│ │ ├── ai/ # AI connection APIs -│ │ ├── analysis/ # Analysis APIs -│ │ ├── auth/ # Authentication APIs -│ │ ├── codeHosting/ # VCS integration APIs -│ │ ├── project/ # Project APIs -│ │ ├── user/ # User APIs -│ │ └── workspace/ # Workspace APIs -│ ├── components/ # React components -│ │ ├── ui/ # shadcn/ui base components -│ │ ├── AppSidebar.tsx -│ │ ├── DashboardLayout.tsx -│ │ ├── ProjectStats.tsx -│ │ ├── IssuesByFileDisplay.tsx -│ │ └── ... 
-│ ├── pages/ # Page components -│ │ ├── auth/ # Login, Register -│ │ ├── dashboard/ # Main dashboard -│ │ ├── workspace/ # Workspace management -│ │ ├── project/ # Project views -│ │ └── analysis/ # Analysis results -│ ├── context/ # React contexts -│ ├── hooks/ # Custom hooks -│ ├── lib/ # Utility libraries -│ └── config/ # Configuration -├── package.json # Dependencies -├── vite.config.ts # Vite configuration -├── tsconfig.json # TypeScript config -├── tailwind.config.ts # Tailwind config -└── Dockerfile # Container build -``` - -## Key Features - -### Authentication -- JWT-based authentication -- Login/Register forms -- Google OAuth social login -- Two-Factor Authentication (2FA) support - - Google Authenticator (TOTP) - - Email verification codes - - Backup codes for recovery -- Protected routes -- Session persistence -- Auto-logout on token expiration - -> **Note**: Email-based 2FA requires SMTP configuration on the backend. See [SMTP Setup Guide](../SMTP_SETUP.md) for details. - -### Workspace Management -- Create and manage workspaces -- Invite and manage members -- Role assignment (Owner, Admin, Member, Viewer) -- Workspace switching - -### Project Management -- Create projects linked to Bitbucket repositories -- Configure AI connections -- Generate webhook tokens -- View project statistics -- Default branch selection - -### Analysis Results -- View pull request analyses -- Browse branch issues -- Filter issues by severity, category, file -- Issue detail view with code snippets -- Analysis history and trends - -### Issue Tracking -- Active issues dashboard -- Resolved issues tracking -- Group issues by file -- Severity-based filtering -- Search and filter capabilities - -### VCS Integration -- Connect Bitbucket accounts -- Browse accessible repositories -- Verify webhook configuration -- Repository metadata display - -### Statistics & Insights -- Project health metrics -- Issue trends over time -- PR/Branch analysis history -- Severity distribution charts 
-- File-level issue hotspots - -## API Integration - -### Base Configuration - -**src/config/api.ts**: -```typescript -export const API_BASE_URL = import.meta.env.VITE_API_URL || 'http://localhost:8081/api'; -export const WEBHOOK_URL = import.meta.env.VITE_WEBHOOK_URL || 'http://localhost:8082'; -``` - -### API Service Layer - -**src/api_service/api.ts**: -- Axios instance with interceptors -- Request/response transformations -- Error handling -- Token injection -- Base CRUD operations - -**Authentication Interceptor**: -```typescript -api.interceptors.request.use(config => { - const token = localStorage.getItem('token'); - if (token) { - config.headers.Authorization = `Bearer ${token}`; - } - return config; -}); -``` - -**Error Interceptor**: -```typescript -api.interceptors.response.use( - response => response, - error => { - if (error.response?.status === 401) { - // Redirect to login - window.location.href = '/login'; - } - return Promise.reject(error); - } -); -``` - -### API Modules - -Each API module provides typed functions for specific domain: - -**auth/auth.api.ts**: -```typescript -export const authApi = { - login: (credentials: LoginRequest) => api.post('/auth/login', credentials), - register: (data: RegisterRequest) => api.post('/auth/register', data), - logout: () => api.post('/auth/logout'), - getCurrentUser: () => api.get('/auth/me') -}; -``` - -**workspace/workspace.api.ts**: -```typescript -export const workspaceApi = { - list: () => api.get('/workspaces'), - create: (data: WorkspaceCreate) => api.post('/workspaces', data), - get: (id: string) => api.get(`/workspaces/${id}`), - update: (id: string, data: WorkspaceUpdate) => api.put(`/workspaces/${id}`, data), - delete: (id: string) => api.delete(`/workspaces/${id}`), - addMember: (id: string, data: MemberAdd) => api.post(`/workspaces/${id}/members`, data) -}; -``` - -## Key Components - -### DashboardLayout -Main layout wrapper with sidebar navigation. 
- -**Features**: -- Workspace switcher -- Navigation menu -- User profile menu -- Responsive design - -### ProjectStats -Displays project statistics and health metrics. - -**Metrics**: -- Total analyses -- Active issues count -- Resolved issues count -- Analysis trends chart -- Severity distribution - -### IssuesByFileDisplay -Groups and displays issues by file path. - -**Features**: -- Collapsible file groups -- Severity badges -- Line number links -- Code snippets -- Filter by severity - -### BranchPRHierarchy -Visualizes branch and PR relationships. - -**Features**: -- Branch tree view -- PR status indicators -- Analysis status -- Drill-down to details - -### IssueFilterSidebar -Filter issues by various criteria. - -**Filters**: -- Severity (High, Medium, Low) -- Category (Security, Quality, Performance, etc.) -- Status (Active, Resolved) -- File path search -- Date range - -### ProtectedRoute -Route wrapper requiring authentication. - -**Usage**: -```typescript - - - -``` - -### WorkspaceGuard -Ensures user has access to current workspace. - -**Usage**: -```typescript - - - -``` - -## State Management - -### TanStack Query - -Used for server state management. 
- -**Example**: -```typescript -const { data, isLoading, error } = useQuery({ - queryKey: ['projects', workspaceId], - queryFn: () => projectApi.list(workspaceId) -}); -``` - -**Mutations**: -```typescript -const createProject = useMutation({ - mutationFn: projectApi.create, - onSuccess: () => { - queryClient.invalidateQueries(['projects']); - } -}); -``` - -### React Context - -**AuthContext**: -- Current user state -- Login/logout functions -- Authentication status - -**ThemeContext**: -- Dark/light mode toggle -- Theme persistence - -## Routing - -**Main Routes**: -- `/` - Welcome page or redirect to dashboard -- `/login` - Login page -- `/register` - Registration page -- `/dashboard` - Main dashboard (protected) -- `/workspaces` - Workspace list (protected) -- `/workspaces/:id` - Workspace detail (protected) -- `/projects/:id` - Project detail (protected) -- `/projects/:id/analysis` - Analysis results (protected) -- `/analysis/:id` - Single analysis view (protected) - -## Styling - -### Tailwind CSS - -Utility-first CSS framework with custom configuration. - -**tailwind.config.ts**: -```typescript -export default { - darkMode: ["class"], - theme: { - extend: { - colors: { - border: "hsl(var(--border))", - background: "hsl(var(--background))", - foreground: "hsl(var(--foreground))", - // ... 
custom colors - } - } - } -} -``` - -### shadcn/ui Components - -Pre-built accessible components: -- Button, Input, Select, Checkbox -- Dialog, Dropdown, Popover -- Table, Card, Badge -- Tabs, Accordion, Collapsible -- Toast notifications - -**Installation**: -```bash -npx shadcn-ui@latest add button -npx shadcn-ui@latest add dialog -``` - -## Configuration - -**.env**: -```bash -VITE_API_URL=http://localhost:8081/api -VITE_WEBHOOK_URL=http://localhost:8082 -SERVER_PORT=8080 -``` - -**Modes**: -- `.env.development` - Development mode -- `.env.production` - Production build - -## Build & Deploy - -### Development - -```bash -npm install -npm run dev -``` - -Application runs on `http://localhost:5173` (Vite default). - -### Production Build - -```bash -npm run build -``` - -Outputs to `dist/` directory. - -### Serve Static Build - -```bash -npm install -g serve -serve -s dist -l 8080 -``` - -### Docker Build - -**Dockerfile**: -```dockerfile -FROM node:18-alpine AS build -WORKDIR /app -COPY package*.json ./ -RUN npm install -COPY . . -RUN npm run build - -FROM node:18-alpine -WORKDIR /app -RUN npm install -g serve -COPY --from=build /app/dist ./dist -CMD ["serve", "-s", "dist", "-l", "8080"] -``` - -**Build and Run**: -```bash -cd frontend -docker build -t codecrow-frontend . 
-docker run -p 8080:8080 codecrow-frontend -``` - -## Environment Variables - -Accessed via `import.meta.env`: - -- `VITE_API_URL` - Backend API base URL -- `VITE_WEBHOOK_URL` - Webhook URL for display -- `MODE` - `development` or `production` -- `DEV` - Boolean, true in dev mode -- `PROD` - Boolean, true in prod mode - -## TypeScript Interfaces - -**src/api_service/api.interface.ts**: -```typescript -export interface User { - id: string; - username: string; - email: string; - roles: string[]; -} - -export interface Workspace { - id: string; - name: string; - description?: string; - createdAt: string; - members: WorkspaceMember[]; -} - -export interface Project { - id: string; - name: string; - workspaceId: string; - repositoryUrl: string; - defaultBranch: string; - createdAt: string; -} - -export interface CodeAnalysis { - id: string; - projectId: string; - pullRequestId?: string; - branchId?: string; - status: 'PENDING' | 'IN_PROGRESS' | 'COMPLETED' | 'FAILED'; - createdAt: string; - issues: Issue[]; -} - -export interface Issue { - id: string; - file: string; - line: number; - severity: 'HIGH' | 'MEDIUM' | 'LOW'; - category: string; - description: string; - suggestion?: string; - resolved: boolean; -} -``` - -## Testing - -```bash -# Unit tests -npm run test - -# E2E tests (if configured) -npm run test:e2e -``` - -## Code Quality - -### ESLint - -```bash -npm run lint -``` - -**eslint.config.js**: -- React hooks rules -- TypeScript rules -- Accessibility rules - -### Prettier - -```bash -npm run format -``` - -## Common Issues - -### API Connection Failed -Check `VITE_API_URL` in `.env` matches backend URL. - -### CORS Errors -Backend must allow frontend origin in CORS configuration. - -### Build Fails -Clear `node_modules` and reinstall: `rm -rf node_modules && npm install` - -### Hot Reload Not Working -Restart dev server. Check Vite config. - -### Type Errors -Run `npm run typecheck` to see TypeScript errors. 
- -## Development Tips - -- Use React DevTools for component debugging -- Use TanStack Query DevTools for state inspection -- Enable source maps in development -- Use ESLint and Prettier VSCode extensions -- Leverage shadcn/ui component customization -- Keep API interfaces in sync with backend -- Use React.memo for expensive components -- Implement error boundaries -- Add loading states for all async operations -- Test responsive design on multiple devices - diff --git a/docs/04-modules/java-ecosystem.md b/docs/04-modules/java-ecosystem.md deleted file mode 100644 index 0b4bfa2..0000000 --- a/docs/04-modules/java-ecosystem.md +++ /dev/null @@ -1,453 +0,0 @@ -# Java Ecosystem - -The Java ecosystem is built as a multi-module Maven project with shared libraries and runnable services. - -## Project Structure - -``` -java-ecosystem/ -├── pom.xml # Parent POM -├── libs/ # Shared libraries -│ ├── core/ # Core models and persistence -│ ├── security/ # Security utilities -│ └── vcs-client/ # VCS API client -├── services/ # Runnable applications -│ ├── pipeline-agent/ # Analysis processing engine -│ └── web-server/ # REST API backend -└── mcp-servers/ # MCP server implementations - └── bitbucket-mcp/ # Bitbucket MCP tools -``` - -## Maven Parent Configuration - -**GroupId**: `org.rostilos.codecrow` -**Version**: `1.0` -**Java Version**: 17 -**Spring Boot**: 3.2.5 - -### Key Dependencies -- Spring Boot Starter (Web, Data JPA, Security) -- PostgreSQL Driver -- JWT (jjwt) -- Lombok -- Jackson -- Hibernate -- Redis - -## Shared Libraries - -### codecrow-core - -Core library containing domain models, repositories, and common services. 
- -**Package Structure**: -``` -org.rostilos.codecrow.core/ -├── model/ # JPA entities -│ ├── user/ # User, Role -│ ├── workspace/ # Workspace, WorkspaceMember -│ ├── project/ # Project, ProjectMember, ProjectToken -│ ├── branch/ # Branch, BranchFile, BranchIssue -│ ├── analysis/ # CodeAnalysis, AnalysisLock, RagIndexStatus -│ ├── ai/ # AIConnection -│ └── permission/ # PermissionTemplate, ProjectPermissionAssignment -├── dto/ # Data transfer objects -├── persistence/ # Spring Data repositories -├── service/ # Common business logic services -└── utils/ # Utility classes -``` - -**Key Entities**: - -- **User**: System users with authentication credentials -- **Workspace**: Top-level organizational unit -- **Project**: Repository representation, linked to VCS -- **Branch**: Git branch tracking with analysis status -- **BranchIssue**: Issues found in a branch -- **CodeAnalysis**: Pull request analysis record -- **CodeAnalysisIssue**: Issues found in PR analysis -- **PullRequest**: PR metadata and status -- **ProjectToken**: Authentication tokens for webhooks -- **AIConnection**: LLM provider configuration -- **RagIndexStatus**: RAG indexing status per branch - -**Repositories**: -All entities have corresponding Spring Data JPA repositories with custom query methods. - -**Services**: -- User management -- Workspace operations -- Project CRUD -- Branch tracking -- Issue management -- Analysis coordination - -### codecrow-security - -Security library for authentication and authorization. 
- -**Features**: -- JWT token generation and validation -- Password encryption (BCrypt) -- Role-based access control -- Permission checking utilities -- Security context helpers -- Encryption utilities (AES) - -**Key Classes**: -- `JwtTokenProvider`: Generate and validate JWT tokens -- `JwtAuthenticationFilter`: Extract and validate tokens from requests -- `SecurityConfig`: Spring Security configuration -- `PermissionService`: Check user permissions -- `EncryptionUtil`: Encrypt/decrypt sensitive data - -**Configuration Properties**: -```properties -codecrow.security.jwtSecret= -codecrow.security.jwtExpirationMs=86400000 -codecrow.security.projectJwtExpirationMs=7776000000 -codecrow.security.encryption-key= -``` - -### codecrow-vcs-client - -VCS platform API client library supporting multiple providers (currently Bitbucket Cloud). - -**Features**: -- Provider-agnostic VCS client interface -- Bitbucket REST API client -- Repository operations -- Pull request data fetching -- Diff retrieval -- File content access -- Branch information -- OAuth2 authentication support -- Automatic token refresh for APP connections -- Code Insights report posting - -**Key Classes**: - -- `VcsClient`: Base interface for VCS operations -- `BitbucketCloudClient`: Bitbucket Cloud implementation -- `VcsClientProvider`: Unified client factory with token management -- `HttpAuthorizedClientFactory`: OAuth/bearer token HTTP client factory - -**VCS Connection Types**: - -| Type | Description | Token Handling | -|------|-------------|----------------| -| `APP` | OAuth 2.0 App installation | Auto-refresh when expiring | -| `OAUTH_MANUAL` | User OAuth consumer credentials | Client credentials flow | -| `PERSONAL_TOKEN` | Personal access token | No refresh needed | - -**VcsRepoInfo Interface**: - -A common abstraction for repository information used by both connection types: - -```java -public interface VcsRepoInfo { - String getRepoWorkspace(); // Workspace slug (e.g., "my-workspace") - String 
getRepoSlug(); // Repository slug (e.g., "my-repo") - VcsConnection getVcsConnection(); -} -``` - -Implemented by: -- `ProjectVcsConnectionBinding` - Legacy OAuth consumer connections -- `VcsRepoBinding` - New APP-style connections - -**Token Refresh**: - -For APP connections, `VcsClientProvider` automatically refreshes tokens when they're about to expire (within 5 minutes): - -```java -// Token refresh happens automatically -OkHttpClient httpClient = vcsClientProvider.getHttpClient(connection); -// If token is expired/expiring, it's refreshed before returning the client -``` - -**Usage Example**: -```java -@Autowired -private VcsClientProvider vcsClientProvider; - -// Get authorized client for a connection -VcsClient client = vcsClientProvider.getClient(connectionId); - -// Or from a VcsRepoInfo -OkHttpClient httpClient = vcsClientProvider.getHttpClient(vcsRepoInfo.getVcsConnection()); -``` - -**Bitbucket Actions**: -- `CommentOnBitbucketCloudAction` - Post PR comments -- `PostReportOnBitbucketCloudAction` - Post Code Insights reports - -## Services - -### pipeline-agent - -Analysis processing engine and API gateway between VCS and analysis components. 
- -**Port**: 8082 - -**Responsibilities**: -- Receive and validate webhooks from Bitbucket -- Coordinate analysis workflow -- Fetch repository data via VCS client -- Send analysis requests to MCP client -- Trigger RAG indexing -- Process and store analysis results -- Update issue statuses - -**Package Structure**: -``` -org.rostilos.codecrow.pipelineagent/ -├── bitbucket/ -│ ├── controller/ # Webhook endpoints -│ ├── service/ # Analysis orchestration -│ └── dto/ # Bitbucket-specific DTOs -├── generic/ -│ ├── controller/ # Health checks -│ └── service/ # Common services -└── config/ # Configuration classes -``` - -**Key Components**: - -**WebhookController**: -- Endpoint: `/api/v1/bitbucket-cloud/webhook` -- Validates webhook signatures -- Authenticates using project tokens -- Routes to appropriate analysis service - -**BranchAnalysisService**: -- Processes branch merge events -- Triggers RAG indexing for first analysis -- Fetches branch issues for re-analysis -- Updates issue resolved status - -**PullRequestAnalysisService**: -- Processes PR created/updated events -- Fetches PR metadata and diffs -- Includes previous PR issues if reanalysis -- Creates/updates CodeAnalysis records - -**AnalysisLockService**: -- Prevents concurrent analysis of same repository -- Uses database-level locks -- Ensures data consistency - -**MCPClientService**: -- HTTP client for MCP client API -- Sends analysis requests -- Handles timeouts and retries - -**RAGClientService**: -- HTTP client for RAG pipeline API -- Triggers indexing and updates -- Queries for code context - -**Configuration**: -```properties -codecrow.mcp.client.url=http://host.docker.internal:8000/review -codecrow.rag.api.url=http://host.docker.internal:8001 -codecrow.rag.api.enabled=true -spring.mvc.async.request-timeout=-1 -``` - -**Endpoints**: -- `POST /api/v1/bitbucket-cloud/webhook` - Webhook receiver -- `GET /actuator/health` - Health check - -### web-server - -Main backend REST API for web interface. 
- -**Port**: 8081 - -**Responsibilities**: -- User authentication and management -- Workspace CRUD operations -- Project management -- Analysis results retrieval -- Issue browsing and filtering -- VCS integration management -- AI connection configuration -- Permission management - -**Package Structure**: -``` -org.rostilos.codecrow.webserver/ -├── controller/ -│ ├── auth/ # Authentication -│ ├── user/ # User management -│ ├── workspace/ # Workspace operations -│ ├── project/ # Project CRUD -│ ├── analysis/ # Analysis results, issues -│ ├── vcs/ # VCS integration -│ ├── ai/ # AI connections -│ └── permission/ # Permissions -├── service/ # Business logic -├── dto/ # Request/Response DTOs -├── exception/ # Exception handling -└── config/ # Security, Swagger config -``` - -**Key Endpoints**: - -**Authentication** (`/api/auth`): -- `POST /register` - User registration -- `POST /login` - User login -- `POST /logout` - User logout -- `GET /me` - Current user info - -**Workspaces** (`/api/workspaces`): -- `GET /` - List user's workspaces -- `POST /` - Create workspace -- `GET /{id}` - Get workspace details -- `PUT /{id}` - Update workspace -- `DELETE /{id}` - Delete workspace -- `POST /{id}/members` - Add member -- `DELETE /{id}/members/{userId}` - Remove member - -**Projects** (`/api/projects`): -- `GET /workspace/{workspaceId}` - List workspace projects -- `POST /workspace/{workspaceId}` - Create project -- `GET /{id}` - Get project details -- `PUT /{id}` - Update project -- `DELETE /{id}` - Delete project -- `POST /{id}/tokens` - Generate webhook token -- `GET /{id}/statistics` - Project statistics - -**Analysis** (`/api/analysis`): -- `GET /project/{projectId}` - List project analyses -- `GET /{id}` - Get analysis details -- `GET /{id}/issues` - Get analysis issues -- `GET /pull-request/{prId}` - Get PR analysis - -**Issues** (`/api/issues`): -- `GET /branch/{branchId}` - Branch issues -- `GET /project/{projectId}/active` - Active issues -- `GET /{id}` - Issue details 
- -**VCS Integration** (`/api/vcs/bitbucket`): -- `POST /connect` - Connect Bitbucket account -- `GET /repositories` - List accessible repositories -- `POST /verify-webhook` - Verify webhook configuration - -**AI Connections** (`/api/ai`): -- `GET /` - List AI connections -- `POST /` - Create AI connection -- `PUT /{id}` - Update AI connection -- `DELETE /{id}` - Delete AI connection - -**Configuration**: -```properties -server.port=8081 -spring.datasource.url=jdbc:postgresql://postgres:5432/codecrow_ai -spring.jpa.hibernate.ddl-auto=update -spring.session.store-type=redis -springdoc.swagger-ui.path=/swagger-ui-custom.html -``` - -**Security**: -- JWT-based authentication -- Role-based authorization -- Permission checks on endpoints -- CORS configuration -- Session management via Redis - -**Swagger/OpenAPI**: -Available at `/swagger-ui-custom.html` when running. - -## MCP Servers - -### bitbucket-mcp - -Java-based MCP server providing Bitbucket tools for private repository access. - -**Responsibilities**: -- Provide MCP tools for Bitbucket operations -- Access private repositories securely -- Support MCP protocol for LLM integration - -**Build Output**: -JAR file: `codecrow-mcp-servers-1.0.jar` -Copied to: `python-ecosystem/mcp-client/` - -**MCP Tools Provided**: -- `get_repository_info` - Repository metadata -- `get_file_content` - File content retrieval -- `list_directory` - Directory listing -- `get_commit_info` - Commit details -- `search_code` - Code search in repository - -**Usage**: -MCP client loads this JAR and executes tools via MCP protocol when needed during analysis. 
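-MCP is built on JSON-RPC 2.0, so a tool invocation during analysis travels as a `tools/call` request to the server loaded from the JAR. A sketch of the message shape — the tool name comes from the list above, while the argument names here are hypothetical, not this server's actual schema:
-
-```json
-{
-  "jsonrpc": "2.0",
-  "id": 1,
-  "method": "tools/call",
-  "params": {
-    "name": "get_file_content",
-    "arguments": { "workspace": "my-workspace", "repo_slug": "my-repo", "path": "src/Main.java" }
-  }
-}
-```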
- -## Building - -### Build All Modules - -```bash -cd java-ecosystem -mvn clean package -DskipTests -``` - -### Build Specific Module - -```bash -cd java-ecosystem/services/web-server -mvn clean package -``` - -### Run Locally (Development) - -**Web Server**: -```bash -cd java-ecosystem/services/web-server -mvn spring-boot:run -``` - -**Pipeline Agent**: -```bash -cd java-ecosystem/services/pipeline-agent -mvn spring-boot:run -``` - -## Testing - -```bash -# Run all tests -mvn test - -# Run specific module tests -cd services/web-server -mvn test - -# Skip tests during build -mvn clean package -DskipTests -``` - -## Common Issues - -### Build Fails - Dependency Resolution -Ensure Maven can access Maven Central. Check `~/.m2/settings.xml`. - -### Port Already in Use -Change port in `application.properties` or stop conflicting process. - -### Database Connection Failed -Verify PostgreSQL is running and credentials are correct. - -### JWT Token Issues -Ensure `jwtSecret` is set and consistent across restarts. - -## Development Tips - -- Use IDE (IntelliJ IDEA recommended) with Lombok plugin -- Enable annotation processing for Lombok -- Use Spring Boot DevTools for hot reload -- Check logs in `/app/logs/` directory when running in Docker -- Use debug ports (5005, 5006) for remote debugging - diff --git a/docs/04-modules/python-ecosystem.md b/docs/04-modules/python-ecosystem.md deleted file mode 100644 index d6646a6..0000000 --- a/docs/04-modules/python-ecosystem.md +++ /dev/null @@ -1,528 +0,0 @@ -# Python Ecosystem - -Python ecosystem consists of two FastAPI services: MCP client and RAG pipeline. 
- -## Project Structure - -``` -python-ecosystem/ -├── mcp-client/ # MCP client service -│ ├── main.py # FastAPI application -│ ├── requirements.txt # Python dependencies -│ ├── Dockerfile # Container build -│ ├── codecrow-mcp-servers-1.0.jar # Java MCP servers -│ ├── llm/ # LLM integration -│ ├── model/ # Data models -│ ├── server/ # MCP server management -│ ├── service/ # Business logic -│ └── utils/ # Utilities -└── rag-pipeline/ # RAG service - ├── main.py # FastAPI application - ├── requirements.txt # Python dependencies - ├── Dockerfile # Container build - ├── setup.py # Package setup - ├── src/ # Source code - │ ├── api/ # API routes - │ ├── core/ # Core functionality - │ ├── models/ # Data models - │ └── services/ # RAG services - ├── docs/ # Documentation - └── tests/ # Unit tests -``` - -## MCP Client - -### Overview - -Modified MCP client that receives analysis requests from pipeline-agent, generates AI prompts with context from RAG, executes MCP tools, and returns analysis results. - -**Port**: 8000 -**Framework**: FastAPI + Uvicorn -**Key Libraries**: langchain-openai, mcp-use, httpx - -### Architecture - -``` -Request → Prompt Generation → RAG Context Retrieval → -MCP Tools Execution → LLM Analysis → Response Processing → Result -``` - -### Key Components - -**main.py**: -FastAPI application entry point with routes: -- `POST /review` - Main analysis endpoint -- `GET /health` - Health check - -**llm/client.py**: -LLM client for OpenRouter integration. 
-- Configures ChatOpenAI for OpenRouter -- Handles streaming responses -- Error handling and retries - -**service/analysis_service.py**: -Core analysis orchestration: -- Receives analysis request -- Queries RAG for relevant context -- Builds prompts based on analysis type -- Executes MCP tools if needed -- Calls LLM with context -- Parses and structures response - -**service/rag_service.py**: -RAG integration client: -- Queries RAG pipeline for code context -- Handles retrieval errors -- Formats context for prompts - -**server/mcp_manager.py**: -MCP server lifecycle management: -- Loads Java MCP servers from JAR -- Manages server processes -- Provides tool execution interface - -**model/analysis_request.py**: -Pydantic models for requests and responses: -```python -class AnalysisRequest: - project_id: str - analysis_type: str # PULL_REQUEST or BRANCH - repository: RepositoryInfo - changed_files: List[ChangedFile] - previous_issues: List[Issue] - metadata: dict - -class AnalysisResponse: - issues: List[Issue] - summary: AnalysisSummary -``` - -### Configuration - -**.env**: -```bash -AI_CLIENT_PORT=8000 -RAG_ENABLED=true -RAG_API_URL=http://localhost:8001 - -# Optional -OPENROUTER_API_KEY=sk-or-v1-... # If not using system config -OPENROUTER_MODEL=anthropic/claude-3.5-sonnet -``` - -### Prompt Engineering - -**Pull Request Analysis Prompt**: -``` -You are a code review expert analyzing a pull request. - -Repository: {workspace}/{repo} -Branch: {branch} -PR Author: {author} - -Changed Files: -{file_diffs} - -Previous Issues: -{previous_issues} - -Relevant Code Context from RAG: -{rag_context} - -Analyze the code changes for: -- Code quality issues -- Security vulnerabilities -- Performance problems -- Best practice violations -- Logic errors - -Return structured JSON with issues. -``` - -**Branch Analysis Prompt**: -``` -You are analyzing a merged branch to verify if previously reported issues are resolved. 
- -Repository: {workspace}/{repo} -Branch: {branch} - -Previously Reported Issues: -{branch_issues} - -Changed Files in Merge: -{changed_files} - -For each issue, determine if it has been resolved based on the changes. -Return JSON with issue_id and resolved status. -``` - -### MCP Tools Usage - -Bitbucket MCP tools are used when additional repository context is needed: -- Fetch file content not in diff -- Get repository structure -- Access commit history -- Search code patterns - -### Response Format - -```json -{ - "issues": [ - { - "file": "src/main/java/Example.java", - "line": 42, - "severity": "HIGH", - "category": "SECURITY", - "description": "SQL injection vulnerability detected", - "suggestion": "Use parameterized queries or ORM", - "code_snippet": "String query = \"SELECT * FROM users WHERE id=\" + userId;" - } - ], - "summary": { - "total_issues": 5, - "by_severity": { - "HIGH": 1, - "MEDIUM": 3, - "LOW": 1 - }, - "by_category": { - "SECURITY": 1, - "CODE_QUALITY": 2, - "PERFORMANCE": 1, - "BEST_PRACTICES": 1 - } - } -} -``` - -### Running Locally - -```bash -cd python-ecosystem/mcp-client -pip install -r requirements.txt -cp .env.sample .env -# Edit .env with configuration -uvicorn main:app --host 0.0.0.0 --port 8000 -``` - -### Docker Build - -```bash -cd python-ecosystem/mcp-client -docker build -t codecrow-mcp-client . -docker run -p 8000:8000 --env-file .env codecrow-mcp-client -``` - -## RAG Pipeline - -### Overview - -Retrieval-Augmented Generation service for indexing codebases and providing semantic search over code. 
- -**Port**: 8001 -**Framework**: FastAPI + Uvicorn -**Key Libraries**: llama-index, qdrant-client, openai - -### Architecture - -``` -Index Request → File Processing → Chunking → Embedding → -Qdrant Storage → Search Query → Vector Similarity → Context Return -``` - -### Key Components - -**src/api/routes.py**: -FastAPI routes: -- `POST /index` - Create or update index -- `POST /query` - Search for relevant code -- `DELETE /index/{collection}` - Delete index -- `GET /health` - Health check -- `GET /status/{collection}` - Index status - -**src/services/indexing_service.py**: -Code indexing: -- Processes source code files -- Chunks code intelligently -- Generates embeddings -- Stores in Qdrant -- Supports incremental updates - -**src/services/query_service.py**: -Semantic search: -- Embeds query -- Vector similarity search -- Retrieves top-K results -- Returns with metadata - -**src/core/chunking.py**: -Code chunking strategies: -- Language-aware chunking -- Function/class boundaries -- Configurable chunk size and overlap -- Preserves context - -**src/core/embeddings.py**: -Embedding generation: -- OpenRouter-compatible OpenAI client -- Uses text-embedding models -- Batch processing -- Caching - -**src/models/**: -Pydantic models for API: -```python -class IndexRequest: - project_id: str - repository: str - branch: str - files: List[SourceFile] - incremental: bool = False - -class QueryRequest: - project_id: str - repository: str - branch: str - query: str - top_k: int = 10 - -class QueryResponse: - results: List[SearchResult] - total: int -``` - -### Configuration - -**.env**: -```bash -# Qdrant -QDRANT_URL=http://localhost:6333 -QDRANT_COLLECTION_PREFIX=codecrow - -# OpenRouter -OPENROUTER_API_KEY=sk-or-v1-your-key -OPENROUTER_MODEL=openai/text-embedding-3-small - -# Chunking -CHUNK_SIZE=800 -CHUNK_OVERLAP=200 -TEXT_CHUNK_SIZE=1000 -TEXT_CHUNK_OVERLAP=200 - -# Retrieval -RETRIEVAL_TOP_K=10 -SIMILARITY_THRESHOLD=0.7 - -# File Processing -MAX_FILE_SIZE_BYTES=1048576 
- -# Server -SERVER_HOST=0.0.0.0 -SERVER_PORT=8001 - -# Cache directories -HOME=/tmp -TIKTOKEN_CACHE_DIR=/tmp/.tiktoken_cache -TRANSFORMERS_CACHE=/tmp/.transformers_cache -HF_HOME=/tmp/.huggingface -LLAMA_INDEX_CACHE_DIR=/tmp/.llama_index -``` - -### Indexing Flow - -**Full Index**: -1. Receive all repository files -2. Filter supported file types -3. Chunk each file -4. Generate embeddings -5. Store in Qdrant collection -6. Create metadata index - -**Incremental Update**: -1. Receive changed files -2. Delete old chunks for changed files -3. Re-chunk modified files -4. Generate embeddings -5. Update Qdrant collection - -### Collection Naming - -Collections are named: `{prefix}_{project_id}_{branch_name}` - -Example: `codecrow_proj123_main` - -### Chunking Strategy - -**Code Files** (.java, .py, .js, .ts, etc.): -- 800 character chunks -- 200 character overlap -- Preserve function boundaries when possible - -**Text Files** (.md, .txt): -- 1000 character chunks -- 200 character overlap -- Preserve paragraph boundaries - -**Metadata Stored**: -- File path -- Language -- Chunk index -- Line numbers -- Repository info - -### Query Processing - -1. Embed query using same model -2. Similarity search in Qdrant -3. Filter by similarity threshold (0.7 default) -4. Return top K results (10 default) -5. 
Include file path, chunk content, score - -### API Examples - -**Index Repository**: -```bash -curl -X POST http://localhost:8001/index \ - -H "Content-Type: application/json" \ - -d '{ - "project_id": "proj123", - "repository": "my-repo", - "branch": "main", - "files": [ - { - "path": "src/main.py", - "content": "def main():\n print(\"Hello\")" - } - ], - "incremental": false - }' -``` - -**Query Code**: -```bash -curl -X POST http://localhost:8001/query \ - -H "Content-Type: application/json" \ - -d '{ - "project_id": "proj123", - "repository": "my-repo", - "branch": "main", - "query": "authentication implementation", - "top_k": 5 - }' -``` - -**Response**: -```json -{ - "results": [ - { - "file": "src/auth/auth_service.py", - "content": "class AuthService:\n def authenticate(self, username, password)...", - "score": 0.89, - "metadata": { - "language": "python", - "lines": "10-25" - } - } - ], - "total": 5 -} -``` - -### Running Locally - -```bash -cd python-ecosystem/rag-pipeline -pip install -r requirements.txt -cp .env.sample .env -# Edit .env with Qdrant URL and OpenRouter key -uvicorn main:app --host 0.0.0.0 --port 8001 -``` - -### Docker Build - -```bash -cd python-ecosystem/rag-pipeline -docker build -t codecrow-rag-pipeline . 
-docker run -p 8001:8001 --env-file .env codecrow-rag-pipeline -``` - -### Performance Considerations - -**Indexing**: -- Large repositories may take several minutes -- Use incremental updates when possible -- Monitor Qdrant memory usage -- Consider batching for very large repos - -**Querying**: -- Fast (<100ms for most queries) -- Adjust top_k to balance quality vs speed -- Similarity threshold affects precision/recall - -**Storage**: -- Qdrant uses memory-mapped files -- Disk usage: ~1-2GB per 10k code files -- Regular collection cleanup recommended - -## Dependencies - -### MCP Client Requirements - -``` -asyncio -fastapi -uvicorn -pydantic -python-dotenv -httpx -langchain-openai==0.3.32 -langchain-core==0.3.75 -mcp-use -``` - -### RAG Pipeline Requirements - -``` -python-dotenv>=1.0.0 -openai>=1.12.0 -tiktoken>=0.5.0 -llama-index-core==0.13.0 -llama-index-embeddings-openai>=0.3.0 -llama-index-vector-stores-qdrant>=0.5.0 -llama-index-llms-openai>=0.3.0 -qdrant-client>=1.7.0 -fastapi>=0.109.0 -uvicorn[standard]>=0.27.0 -pydantic>=2.6.0 -aiofiles>=23.2.0 -pytest>=8.0.0 -``` - -## Common Issues - -### MCP Client Can't Load Java JAR -Ensure `codecrow-mcp-servers-1.0.jar` is in the mcp-client directory. Run `tools/production-build.sh` to copy it. - -### RAG Pipeline Connection Failed -Verify Qdrant is running and accessible at configured URL. - -### OpenRouter API Errors -Check API key is valid and has credits. Verify model name is correct. - -### Embedding Generation Slow -Use smaller embedding model or reduce chunk size. Consider caching. - -### Out of Memory -Reduce batch size for indexing. Increase container memory limits. 
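-The sliding-window parameters described earlier (`CHUNK_SIZE=800`, `CHUNK_OVERLAP=200`) can be sketched as below; the real indexer additionally tries to respect function and paragraph boundaries, which this toy version ignores:
-
-```python
-def chunk_text(text: str, chunk_size: int = 800, overlap: int = 200) -> list[str]:
-    """Naive fixed-size chunker with overlap (illustration only)."""
-    if overlap >= chunk_size:
-        raise ValueError("overlap must be smaller than chunk_size")
-    chunks: list[str] = []
-    start = 0
-    while start < len(text):
-        chunks.append(text[start:start + chunk_size])
-        if start + chunk_size >= len(text):
-            break  # last chunk reached the end of the text
-        start += chunk_size - overlap  # step forward, keeping `overlap` chars shared
-    return chunks
-```
-
-Each adjacent pair of chunks shares 200 characters, which is what preserves context across chunk boundaries.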
- -## Development Tips - -- Use virtual environments: `python -m venv venv` -- Hot reload: `uvicorn main:app --reload` -- Check logs for detailed error messages -- Test RAG queries independently before integration -- Monitor OpenRouter usage and costs -- Use smaller models for development/testing - diff --git a/docs/05-configuration.md b/docs/05-configuration.md deleted file mode 100644 index a934171..0000000 --- a/docs/05-configuration.md +++ /dev/null @@ -1,765 +0,0 @@ -# Configuration Reference - -Complete configuration guide for all CodeCrow components. - -## Configuration Files Overview - -``` -deployment/config/ -├── java-shared/ -│ └── application.properties # Java services configuration -├── mcp-client/ -│ └── .env # MCP client configuration -├── rag-pipeline/ -│ └── .env # RAG pipeline configuration -└── web-frontend/ - └── .env # Frontend configuration -``` - -## Java Services Configuration - -**File**: `deployment/config/java-shared/application.properties` - -Used by: web-server, pipeline-agent - -### Security Settings - -```properties -# JWT Configuration -codecrow.security.jwtSecret= -codecrow.security.jwtExpirationMs=86400000 -codecrow.security.projectJwtExpirationMs=7776000000 - -# Encryption Key (AES) -codecrow.security.encryption-key= -``` - -**Generate Secrets**: -```bash -# JWT Secret (256-bit) -openssl rand -base64 32 - -# Encryption Key (256-bit) -openssl rand -base64 32 -``` - -**Token Expiration**: -- `jwtExpirationMs`: User JWT token validity (default: 24 hours) -- `projectJwtExpirationMs`: Project webhook token validity (default: 3 months) - -### Google OAuth (Social Login) - -Enable Google Sign-In for user authentication: - -```properties -# Google OAuth Client ID (same value in frontend and backend) -codecrow.oauth.google.client-id=your-client-id.apps.googleusercontent.com -``` - -**Setup Steps**: -1. Go to [Google Cloud Console](https://console.cloud.google.com/) -2. Create or select a project -3. 
Navigate to **APIs & Services → Credentials** -4. Click **Create Credentials → OAuth 2.0 Client ID** -5. Select **Web application** -6. Add **Authorized JavaScript origins**: Your frontend URL(s) -7. Add **Authorized redirect URIs**: Same as JavaScript origins -8. Copy the **Client ID** to both backend and frontend configuration - -**Frontend Configuration** (`deployment/config/web-frontend/.env`): -```bash -VITE_GOOGLE_CLIENT_ID=your-client-id.apps.googleusercontent.com -``` - -**Important Notes**: -- Both frontend and backend must use the same Client ID -- Google Sign-In button only appears if `VITE_GOOGLE_CLIENT_ID` is configured -- For production, add your domain to authorized origins -- Users signing in with Google can link to existing accounts with matching email - -### Email / SMTP Configuration - -Email is required for Two-Factor Authentication (2FA), security notifications, and backup codes. - -```properties -# Enable/disable email sending -codecrow.email.enabled=true - -# Sender email address and display name -codecrow.email.from=noreply@yourdomain.com -codecrow.email.from-name=CodeCrow -codecrow.email.app-name=CodeCrow - -# Frontend URL (for email links) -codecrow.email.frontend-url=https://codecrow.example.com - -# SMTP Server Configuration -spring.mail.host=smtp.gmail.com -spring.mail.port=587 -spring.mail.username=your-email@gmail.com -spring.mail.password=your-app-password -spring.mail.properties.mail.smtp.auth=true -spring.mail.properties.mail.smtp.starttls.enable=true -``` - -**Quick Setup (Gmail)**: -1. Enable 2-Step Verification in your Google Account -2. Go to Security → App passwords → Generate new app password -3. 
Use the 16-character password in `spring.mail.password` - -> **📖 For detailed SMTP setup with different providers (Amazon SES, SendGrid, Mailgun, Office 365), see [SMTP_SETUP.md](./SMTP_SETUP.md)** - -### Application URLs - -```properties -# Web Frontend Base URL -codecrow.web.base.url=https://codecrow.example.com - -# MCP Client URL (internal) -codecrow.mcp.client.url=http://host.docker.internal:8000/review - -# RAG Pipeline URL (internal) -codecrow.rag.api.url=http://host.docker.internal:8001 -codecrow.rag.api.enabled=true -``` - -**Notes**: -- Use `host.docker.internal` for Docker container-to-host communication -- Use service names for inter-container communication -- For production, use actual hostnames or service discovery - -### Database Configuration - -Set via environment variables in docker-compose.yml: - -```yaml -environment: - SPRING_DATASOURCE_URL: jdbc:postgresql://postgres:5432/codecrow_ai - SPRING_DATASOURCE_USERNAME: codecrow_user - SPRING_DATASOURCE_PASSWORD: codecrow_pass - SPRING_JPA_HIBERNATE_DDL_AUTO: update - SPRING_JPA_DATABASE_PLATFORM: org.hibernate.dialect.PostgreSQLDialect -``` - -**Hibernate DDL Auto Options**: -- `none`: No schema management -- `validate`: Validate schema, no changes -- `update`: Update schema (recommended for development) -- `create`: Drop and recreate schema (data loss!) -- `create-drop`: Create on start, drop on stop - -**Production Recommendation**: Use `validate` with managed migrations (Flyway/Liquibase). - -### Redis Configuration - -```yaml -environment: - SPRING_SESSION_STORE_TYPE: redis - SPRING_REDIS_HOST: redis - SPRING_REDIS_PORT: 6379 -``` - -### File Upload Limits - -```properties -spring.servlet.multipart.max-file-size=500MB -spring.servlet.multipart.max-request-size=500MB -``` - -Adjust based on expected repository archive sizes. - -### Async Request Timeout - -```properties -spring.mvc.async.request-timeout=-1 -``` - -`-1` = no timeout. Necessary for long-running analysis operations. 
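-The docker-compose snippets above configure Spring through environment variables rather than by editing `application.properties`. This works because of Spring Boot's relaxed binding: a property key maps to its environment-variable form by replacing dots and hyphens with underscores and uppercasing. A quick shell sketch of that mapping:
-
-```shell
-# Derive the environment-variable form of a Spring property key
-# (relaxed binding: '.' and '-' become '_', then uppercase).
-prop_to_env() {
-  echo "$1" | tr '.-' '__' | tr '[:lower:]' '[:upper:]'
-}
-
-prop_to_env "spring.jpa.hibernate.ddl-auto"    # SPRING_JPA_HIBERNATE_DDL_AUTO
-prop_to_env "spring.mvc.async.request-timeout" # SPRING_MVC_ASYNC_REQUEST_TIMEOUT
-```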
- -### Swagger/OpenAPI - -```properties -springdoc.swagger-ui.operationsSorter=method -springdoc.swagger-ui.path=/swagger-ui-custom.html -springdoc.api-docs.path=/api-docs -``` - -Access Swagger UI at: `http://localhost:8081/swagger-ui-custom.html` - -### Logging - -```properties -logging.level.org.springframework.security.web.access.ExceptionTranslationFilter=ERROR -logging.level.org.apache.catalina.core.ApplicationDispatcher=ERROR -logging.level.org.apache.catalina.core.StandardHostValve=ERROR -``` - -**Log Levels**: TRACE, DEBUG, INFO, WARN, ERROR, FATAL - -**Add Custom Loggers**: -```properties -logging.level.org.rostilos.codecrow=DEBUG -logging.level.org.hibernate.SQL=DEBUG -``` - -### Server Ports - -Set via environment variables: - -```yaml -environment: - SERVER_PORT: 8081 # web-server - SERVER_PORT: 8082 # pipeline-agent -``` - -## MCP Client Configuration - -**File**: `deployment/config/mcp-client/.env` - -```bash -# Server Port -AI_CLIENT_PORT=8000 - -# RAG Integration -RAG_ENABLED=true -RAG_API_URL=http://localhost:8001 - -# Optional: OpenRouter Override -# If not set, uses configuration from pipeline-agent request -OPENROUTER_API_KEY=sk-or-v1-your-api-key -OPENROUTER_MODEL=anthropic/claude-3.5-sonnet -``` - -**OpenRouter Models**: -- `anthropic/claude-3.5-sonnet` - Recommended for code analysis -- `openai/gpt-4-turbo` - OpenAI GPT-4 -- `google/gemini-pro` - Google Gemini -- See https://openrouter.ai/docs for full list - -**RAG Settings**: -- `RAG_ENABLED=true`: Enable RAG context retrieval -- `RAG_ENABLED=false`: Disable RAG (analysis without code context) - -## RAG Pipeline Configuration - -**File**: `deployment/config/rag-pipeline/.env` - -### Vector Database - -```bash -QDRANT_URL=http://localhost:6333 -QDRANT_COLLECTION_PREFIX=codecrow -``` - -Collections are named: `{prefix}_{project_id}_{branch_name}` - -### OpenRouter Configuration - -```bash -OPENROUTER_API_KEY=sk-or-v1-your-api-key-here -OPENROUTER_MODEL=openai/text-embedding-3-small -``` - 
-**Embedding Models**: -- `openai/text-embedding-3-small` - Fast, cost-effective (default) -- `openai/text-embedding-3-large` - Higher quality, more expensive -- `openai/text-embedding-ada-002` - Legacy model - -**Get API Key**: https://openrouter.ai/ - -### Chunking Configuration - -```bash -# Code Files -CHUNK_SIZE=800 -CHUNK_OVERLAP=200 - -# Text Files -TEXT_CHUNK_SIZE=1000 -TEXT_CHUNK_OVERLAP=200 -``` - -**Tuning**: -- Larger chunks: More context, fewer embeddings, less granular -- Smaller chunks: More granular, more embeddings, higher cost -- Overlap: Ensures context continuity across chunks - -### Retrieval Configuration - -```bash -RETRIEVAL_TOP_K=10 -SIMILARITY_THRESHOLD=0.7 -``` - -**Top K**: Number of most similar chunks to return -**Similarity Threshold**: Minimum similarity score (0.0-1.0) - -### File Processing - -```bash -MAX_FILE_SIZE_BYTES=1048576 -``` - -Files larger than this are skipped (default: 1MB). - -### Server Configuration - -```bash -SERVER_HOST=0.0.0.0 -SERVER_PORT=8001 -``` - -### Cache Directories - -```bash -HOME=/tmp -TIKTOKEN_CACHE_DIR=/tmp/.tiktoken_cache -TRANSFORMERS_CACHE=/tmp/.transformers_cache -HF_HOME=/tmp/.huggingface -LLAMA_INDEX_CACHE_DIR=/tmp/.llama_index -``` - -**Important for Docker**: These should be writable directories. - -### Default Exclude Patterns - -The RAG pipeline automatically excludes common non-code files: - -``` -node_modules/** -.venv/** -venv/** -__pycache__/** -*.pyc, *.pyo, *.so, *.dll, *.dylib, *.exe, *.bin -*.jar, *.war, *.class -target/** -build/** -dist/** -.git/** -.idea/** -*.min.js, *.min.css, *.bundle.js -*.lock, package-lock.json, yarn.lock, bun.lockb -``` - -Additional patterns can be configured per-project (see Project RAG Configuration below). - -## Project-Level RAG Configuration - -Each project can configure RAG indexing via the web UI or API. 
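Both the default patterns above and per-project additions follow the same glob semantics. A small matcher sketch (illustrative only, not CodeCrow's actual implementation) shows how they apply; in particular, a bare pattern such as `*.min.js` matches the file name at any depth:

```python
import re

def glob_to_regex(pattern: str):
    """Translate one exclude pattern into an anchored regex over POSIX-style paths."""
    if pattern.endswith("/"):              # bare directory prefix, e.g. "lib/"
        return re.compile(re.escape(pattern) + ".*\\Z")
    out, i = [], 0
    while i < len(pattern):
        if pattern[i:i + 2] == "**":
            out.append(".*")               # "**" crosses directory boundaries
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")            # "*" stays inside one path segment
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out) + "\\Z")

def is_excluded(path: str, patterns) -> bool:
    """True if `path` matches any pattern; patterns without "/" match the file name anywhere."""
    name = path.rsplit("/", 1)[-1]
    for p in patterns:
        target = path if "/" in p else name
        if glob_to_regex(p).match(target):
            return True
    return False
```

For example, `is_excluded("vendor/lib.min.js", ["*.min.js"])` is true, while `is_excluded("src/App.tsx", ["vendor/**"])` is false.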
- -### Configuration Options - -| Option | Type | Description | -|--------|------|-------------| -| `enabled` | boolean | Enable/disable RAG indexing for this project | -| `branch` | string | Branch to index (defaults to project's default branch) | -| `excludePatterns` | string[] | Additional paths/patterns to exclude from indexing | - -### Exclude Patterns - -Project-specific exclude patterns are merged with the default system patterns. - -**Supported Pattern Formats**: - -| Pattern | Description | Example Matches | -|---------|-------------|-----------------| -| `vendor/**` | Directory with all subdirectories | `vendor/lib/file.php`, `vendor/autoload.php` | -| `app/code/**` | Nested directory pattern | `app/code/Module/Model.php` | -| `*.min.js` | File extension pattern | `script.min.js`, `vendor/lib.min.js` | -| `**/*.generated.ts` | Any directory + file pattern | `src/types.generated.ts` | -| `lib/` | Exact directory prefix | `lib/file.js`, `lib/sub/file.js` | - -**Example Configuration** (via API): -```json -{ - "enabled": true, - "branch": "main", - "excludePatterns": [ - "vendor/**", - "lib/**", - "generated/**", - "app/design/**", - "*.min.js", - "*.map" - ] -} -``` - -**Use Cases**: -- Exclude third-party code (`vendor/**`, `node_modules/**`) -- Exclude generated files (`generated/**`, `*.generated.ts`) -- Exclude design/theme files (`app/design/**`) -- Exclude build artifacts (`dist/**`, `build/**`) -- Exclude large data files (`data/**`, `fixtures/**`) - -### Configuring via Web UI - -1. Navigate to **Project Settings** → **RAG Configuration** -2. Enable RAG indexing with the toggle -3. Set the branch to index (optional) -4. Add exclude patterns: - - Type pattern in the input field - - Press Enter or click the **+** button - - Remove patterns by clicking the **×** on each badge -5. 
Click **Save Configuration** - -### Configuring via API - -```http -PUT /api/workspace/{workspaceSlug}/project/{projectNamespace}/rag/config -Authorization: Bearer -Content-Type: application/json - -{ - "enabled": true, - "branch": "main", - "excludePatterns": ["vendor/**", "lib/**"] -} -``` - -## Frontend Configuration - -**File**: `deployment/config/web-frontend/.env` - -```bash -# Backend API URL -VITE_API_URL=http://localhost:8081/api - -# Webhook URL (for display in UI) -VITE_WEBHOOK_URL=http://localhost:8082 - -# Server Port -SERVER_PORT=8080 -``` - -**Production Example**: -```bash -VITE_API_URL=https://api.codecrow.example.com/api -VITE_WEBHOOK_URL=https://webhooks.codecrow.example.com -SERVER_PORT=8080 -``` - -## Docker Compose Configuration - -**File**: `deployment/docker-compose.yml` - -### PostgreSQL - -```yaml -postgres: - environment: - POSTGRES_DB: codecrow_ai - POSTGRES_USER: codecrow_user - POSTGRES_PASSWORD: codecrow_pass - volumes: - - postgres_data:/var/lib/postgresql/data -``` - -**Change Credentials**: Update all services that connect to PostgreSQL. - -### Redis - -```yaml -redis: - image: redis:7-alpine - volumes: - - redis_data:/data -``` - -**Password Protection** (optional): -```yaml -redis: - command: redis-server --requirepass <password> -``` - -Then update services: -```yaml -environment: - SPRING_REDIS_PASSWORD: <password> -``` - -### Qdrant - -```yaml -qdrant: - image: qdrant/qdrant:latest - volumes: - - qdrant_data:/qdrant/storage -``` - -**Persistence**: All data stored in named volume. - -### Resource Limits - -```yaml -mcp-client: - deploy: - resources: - limits: - cpus: '1.0' - memory: 1G - reservations: - cpus: '0.5' - memory: 512M -``` - -Adjust based on workload and available resources.
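Stepping back to the RAG pipeline's chunking settings (`CHUNK_SIZE=800`, `CHUNK_OVERLAP=200`): the sliding-window arithmetic they control can be sketched as follows (a character-based simplification — the real pipeline splits on tokens and code structure, but the size/overlap interaction is the same):

```python
def chunk(text: str, size: int, overlap: int):
    """Split text into windows of `size` chars; consecutive windows share `overlap` chars."""
    if size <= overlap:
        raise ValueError("size must exceed overlap")
    step = size - overlap          # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break                  # the last window reached the end of the text
    return chunks
```

With the code-file defaults (800/200) each window advances 600 characters, so a 2,000-character file yields three chunks, each sharing its first 200 characters with the previous one — the overlap that preserves context continuity across chunk boundaries.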
- -### Health Checks - -```yaml -healthcheck: - test: ["CMD", "curl", "-f", "http://localhost:8081/actuator/health"] - interval: 30s - timeout: 10s - retries: 5 - start_period: 60s -``` - -**Tuning**: -- `interval`: How often to check -- `timeout`: Max time for health check -- `retries`: Failures before marking unhealthy -- `start_period`: Grace period on startup - -### Networks - -```yaml -networks: - codecrow-network: -``` - -All services on same network can communicate by service name. - -### Volumes - -```yaml -volumes: - source_code_tmp: - postgres_data: - redis_data: - qdrant_data: - web_logs: - pipeline_agent_logs: - web_frontend_logs: - rag_logs: -``` - -**Persistent**: Data survives container restarts. - -**Backup**: Use `docker volume` commands to backup/restore. - -## VCS Provider Configuration - -CodeCrow supports multiple VCS providers with different connection types. Configure these in `application.properties`: - -### Bitbucket Cloud App - -For 1-click app installation, configure your Bitbucket OAuth consumer: - -```properties -# Bitbucket Cloud App Configuration -codecrow.bitbucket.app.client-id= -codecrow.bitbucket.app.client-secret= -``` - -**Setup Steps**: -1. Go to Bitbucket Settings → Workspace settings → OAuth consumers → Add consumer -2. Set callback URL to: `https://your-domain.com/api/{workspaceSlug}/integrations/bitbucket-cloud/app/callback` -3. Required permissions: - - Account: Read - - Repositories: Read, Write - - Pull requests: Read, Write - - Webhooks: Read and write -4. Copy the Key (client-id) and Secret (client-secret) to `application.properties` - -### GitHub OAuth App - -For 1-click GitHub integration, configure a GitHub OAuth App: - -```properties -# GitHub OAuth App Configuration -codecrow.github.app.client-id= -codecrow.github.app.client-secret= -``` - -**Setup Steps**: -1. Go to GitHub → Settings → Developer settings → OAuth Apps → New OAuth App - - Direct URL: https://github.com/settings/developers -2. 
Fill in the application details: - - **Application name**: CodeCrow (or your preferred name) - - **Homepage URL**: Your frontend URL (e.g., `https://codecrow.example.com`) - - **Authorization callback URL**: `https://your-api-domain.com/api/{workspaceSlug}/integrations/github/app/callback` - - Replace `{workspaceSlug}` with your actual workspace slug, or configure your reverse proxy to handle workspace routing -3. Click "Register application" -4. Copy the **Client ID** to `codecrow.github.app.client-id` -5. Click "Generate a new client secret" and copy it to `codecrow.github.app.client-secret` - -**OAuth Scopes Requested**: -| Scope | Description | -|-------|-------------| -| `repo` | Full control of private repositories (read/write code, issues, PRs) | -| `read:user` | Read user profile data | -| `read:org` | Read organization membership | - -**Important Notes**: -- GitHub OAuth Apps only support ONE callback URL -- For multi-workspace support, use a wildcard approach in your reverse proxy or use a single workspace slug -- Client secrets cannot be viewed again after creation - store them securely -- For GitHub Enterprise Server, contact your admin for OAuth App setup - -### Connection Types - -| Type | Description | Use Case | -|------|-------------|----------| -| `APP` | OAuth 2.0 App installation | Recommended for teams, workspace-level access | -| `OAUTH_MANUAL` | User-initiated OAuth flow | Individual user connections | -| `PERSONAL_TOKEN` | Personal access token | Bitbucket Server/DC, GitHub, automation | -| `APPLICATION` | Server-to-server OAuth | Background services | - -### VCS Provider Settings - -```properties -# Enable/disable providers -codecrow.vcs.providers.bitbucket-cloud.enabled=true -codecrow.vcs.providers.bitbucket-server.enabled=false -codecrow.vcs.providers.github.enabled=true -codecrow.vcs.providers.gitlab.enabled=false - -# Provider-specific API URLs (for self-hosted instances) 
-codecrow.vcs.bitbucket-server.api-url=https://bitbucket.your-company.com -codecrow.vcs.gitlab.api-url=https://gitlab.your-company.com -``` - -### Webhook Configuration - -```properties -# Webhook secret for signature verification (optional but recommended) -codecrow.webhooks.secret= - -# Provider-specific webhook endpoints (configured automatically by CodeCrow) -# Bitbucket Cloud: /api/webhooks/bitbucket-cloud/{projectId} -# GitHub: /api/webhooks/github/{projectId} -``` - -**Webhook Events**: -| Provider | Events | -|----------|--------| -| Bitbucket Cloud | `pullrequest:created`, `pullrequest:updated`, `pullrequest:fulfilled`, `repo:push` | -| GitHub | `pull_request` (opened, synchronize, reopened), `push` | - -**GitHub Webhook Security**: -- Webhooks are automatically created when onboarding repositories -- Each project gets a unique webhook secret stored in the database -- Webhook payloads are verified using HMAC-SHA256 signature - -## Environment-Specific Configuration - -### Development - -```properties -# Enable debug logging -logging.level.org.rostilos.codecrow=DEBUG - -# Enable SQL logging -spring.jpa.show-sql=true -spring.jpa.properties.hibernate.format_sql=true - -# Hot reload -spring.devtools.restart.enabled=true -``` - -### Production - -```properties -# Minimal logging -logging.level.org.rostilos.codecrow=INFO - -# Disable SQL logging -spring.jpa.show-sql=false - -# Validate schema only -spring.jpa.hibernate.ddl-auto=validate - -# Enable actuator with authentication -management.endpoints.web.exposure.include=health,info -management.endpoint.health.show-details=when-authorized -``` - -**Additional Production Settings**: -- Use HTTPS/SSL -- Enable firewall rules -- Restrict database access -- Use secrets management (Vault, AWS Secrets Manager) -- Enable monitoring and alerting -- Regular backups -- Use reverse proxy (nginx, Traefik) - -## Configuration Validation - -### Check Configuration - -```bash -# Java services -docker logs 
codecrow-web-application | grep "Started" -docker logs codecrow-pipeline-agent | grep "Started" - -# Python services -curl http://localhost:8000/health -curl http://localhost:8001/health - -# Frontend -curl http://localhost:8080 -``` - -### Test Database Connection - -```bash -docker exec -it codecrow-postgres psql -U codecrow_user -d codecrow_ai -c "SELECT version();" -``` - -### Test Redis Connection - -```bash -docker exec -it codecrow-redis redis-cli ping -``` - -### Test Qdrant Connection - -```bash -curl http://localhost:6333/collections -``` - -## Troubleshooting Configuration - -### Service Won't Start - -Check logs: -```bash -docker logs <container-name> -``` - -Common issues: -- Missing environment variables -- Invalid configuration values -- Database connection failed -- Port already in use - -### Configuration Not Applied - -Ensure: -- Config file is mounted correctly in docker-compose.yml -- File permissions are correct -- Container is restarted after config change -- No typos in property names - -### Secrets Exposed in Logs - -Avoid: -```properties -# Don't log passwords -spring.datasource.password=secret -``` - -Use environment variables and secure secret management in production. - diff --git a/docs/06-api-reference.md b/docs/06-api-reference.md deleted file mode 100644 index 9acecf4..0000000 --- a/docs/06-api-reference.md +++ /dev/null @@ -1,1179 +0,0 @@ -# API Reference - -Complete REST API documentation for CodeCrow services. - -## Base URLs - -- **Web Server**: `http://localhost:8081/api` -- **Pipeline Agent**: `http://localhost:8082/api/v1` -- **MCP Client**: `http://localhost:8000` (internal only) -- **RAG Pipeline**: `http://localhost:8001` (internal only) - -## Authentication - -### JWT Token Authentication - -Most endpoints require JWT authentication.
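The JWT returned by the login endpoint carries its claims as base64url-encoded JSON, so a client can inspect fields such as `exp` without knowing the server's signing secret. A minimal sketch (inspection only — this does not verify the signature):

```python
import base64
import json

def jwt_payload(token: str) -> dict:
    """Decode the middle (payload) segment of a JWT. No signature verification."""
    segment = token.split(".")[1]
    segment += "=" * (-len(segment) % 4)  # base64url strips '=' padding; restore it
    return json.loads(base64.urlsafe_b64decode(segment))
```

This is handy for, say, refreshing a session shortly before the `exp` timestamp elapses instead of waiting for a `401`.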
- -**Obtain Token**: -```http -POST /api/auth/login -Content-Type: application/json - -{ - "username": "user@example.com", - "password": "password123" -} -``` - -**Response**: -```json -{ - "token": "eyJhbGciOiJIUzI1NiIs...", - "user": { - "id": "uuid", - "username": "user@example.com", - "email": "user@example.com", - "roles": ["USER"] - } -} -``` - -**Use Token**: -```http -Authorization: Bearer eyJhbGciOiJIUzI1NiIs... -``` - -### Project Token Authentication - -Webhooks use project-specific tokens. - -**Generate Token**: -```http -POST /api/projects/{projectId}/tokens -Authorization: Bearer -``` - -**Use in Webhook**: -```http -Authorization: Bearer -``` - -## Web Server API - -### Authentication Endpoints - -#### Register User - -```http -POST /api/auth/register -Content-Type: application/json - -{ - "username": "newuser", - "email": "user@example.com", - "password": "SecurePass123!" -} -``` - -**Response**: `201 Created` -```json -{ - "id": "uuid", - "username": "newuser", - "email": "user@example.com", - "createdAt": "2024-01-15T10:00:00" -} -``` - -#### Login - -```http -POST /api/auth/login -Content-Type: application/json - -{ - "username": "user@example.com", - "password": "password123" -} -``` - -**Response**: `200 OK` -```json -{ - "token": "jwt-token", - "user": { ... } -} -``` - -#### Get Current User - -```http -GET /api/auth/me -Authorization: Bearer -``` - -**Response**: `200 OK` -```json -{ - "id": "uuid", - "username": "user@example.com", - "email": "user@example.com", - "roles": ["USER"], - "workspaces": [...] 
-} -``` - -#### Logout - -```http -POST /api/auth/logout -Authorization: Bearer -``` - -**Response**: `200 OK` - -### Workspace Endpoints - -#### List Workspaces - -```http -GET /api/workspaces -Authorization: Bearer -``` - -**Response**: `200 OK` -```json -[ - { - "id": "workspace-uuid", - "name": "My Workspace", - "description": "Company projects", - "createdAt": "2024-01-01T00:00:00", - "role": "OWNER", - "memberCount": 5 - } -] -``` - -#### Create Workspace - -```http -POST /api/workspaces -Authorization: Bearer -Content-Type: application/json - -{ - "name": "New Workspace", - "description": "Description here" -} -``` - -**Response**: `201 Created` -```json -{ - "id": "uuid", - "name": "New Workspace", - "description": "Description here", - "createdAt": "2024-01-15T10:00:00", - "ownerId": "user-uuid" -} -``` - -#### Get Workspace - -```http -GET /api/workspaces/{id} -Authorization: Bearer -``` - -**Response**: `200 OK` -```json -{ - "id": "uuid", - "name": "My Workspace", - "description": "...", - "createdAt": "2024-01-01T00:00:00", - "members": [ - { - "userId": "uuid", - "username": "user1", - "email": "user1@example.com", - "role": "OWNER", - "joinedAt": "2024-01-01T00:00:00" - } - ], - "projects": [...] 
-} -``` - -#### Update Workspace - -```http -PUT /api/workspaces/{id} -Authorization: Bearer -Content-Type: application/json - -{ - "name": "Updated Name", - "description": "Updated description" -} -``` - -**Response**: `200 OK` - -#### Delete Workspace - -```http -DELETE /api/workspaces/{id} -Authorization: Bearer -``` - -**Response**: `204 No Content` - -#### Add Member - -```http -POST /api/workspaces/{id}/members -Authorization: Bearer -Content-Type: application/json - -{ - "email": "newmember@example.com", - "role": "MEMBER" -} -``` - -**Roles**: `OWNER`, `ADMIN`, `MEMBER`, `VIEWER` - -**Response**: `201 Created` - -#### Remove Member - -```http -DELETE /api/workspaces/{id}/members/{userId} -Authorization: Bearer -``` - -**Response**: `204 No Content` - -### Project Endpoints - -#### List Projects - -```http -GET /api/projects/workspace/{workspaceId} -Authorization: Bearer -``` - -**Response**: `200 OK` -```json -[ - { - "id": "project-uuid", - "name": "My Project", - "workspaceId": "workspace-uuid", - "repositoryUrl": "https://bitbucket.org/workspace/repo", - "defaultBranch": "main", - "createdAt": "2024-01-01T00:00:00", - "analysisCount": 45, - "activeIssues": 12 - } -] -``` - -#### Create Project - -```http -POST /api/projects/workspace/{workspaceId} -Authorization: Bearer -Content-Type: application/json - -{ - "name": "New Project", - "description": "Project description", - "repositoryUrl": "https://bitbucket.org/workspace/repo", - "defaultBranch": "main", - "vcsConnectionId": "vcs-uuid", - "aiConnectionId": "ai-uuid" -} -``` - -**Response**: `201 Created` - -#### Get Project - -```http -GET /api/projects/{id} -Authorization: Bearer -``` - -**Response**: `200 OK` -```json -{ - "id": "uuid", - "name": "My Project", - "description": "...", - "workspaceId": "workspace-uuid", - "repositoryUrl": "...", - "defaultBranch": "main", - "vcsConnection": { ... }, - "aiConnection": { ... 
}, - "createdAt": "2024-01-01T00:00:00", - "branches": [...], - "statistics": { - "totalAnalyses": 45, - "activeIssues": 12, - "resolvedIssues": 89 - } -} -``` - -#### Update Project - -```http -PUT /api/projects/{id} -Authorization: Bearer -Content-Type: application/json - -{ - "name": "Updated Name", - "description": "Updated description", - "defaultBranch": "develop" -} -``` - -**Response**: `200 OK` - -#### Delete Project - -```http -DELETE /api/projects/{id} -Authorization: Bearer -``` - -**Response**: `204 No Content` - -#### Generate Webhook Token - -```http -POST /api/projects/{id}/tokens -Authorization: Bearer -Content-Type: application/json - -{ - "name": "Production Webhook", - "expiresInDays": 90 -} -``` - -**Response**: `201 Created` -```json -{ - "id": "token-uuid", - "token": "proj_xxxxxxxxxxxxxx", - "projectId": "project-uuid", - "name": "Production Webhook", - "createdAt": "2024-01-15T10:00:00", - "expiresAt": "2024-04-15T10:00:00" -} -``` - -#### Get Project Statistics - -```http -GET /api/projects/{id}/statistics -Authorization: Bearer -``` - -**Response**: `200 OK` -```json -{ - "totalAnalyses": 45, - "pullRequestAnalyses": 40, - "branchAnalyses": 5, - "activeIssues": 12, - "resolvedIssues": 89, - "issuesBySeverity": { - "HIGH": 3, - "MEDIUM": 7, - "LOW": 2 - }, - "issuesByCategory": { - "SECURITY": 2, - "CODE_QUALITY": 8, - "PERFORMANCE": 2 - }, - "analysisHistory": [ - { - "date": "2024-01-15", - "count": 3, - "issuesFound": 5 - } - ] -} -``` - -#### Get Branch Analysis Configuration - -```http -GET /api/{workspaceSlug}/project/{namespace}/branch-analysis-config -Authorization: Bearer -``` - -**Response**: `200 OK` -```json -{ - "prTargetBranches": ["main", "develop", "release/*"], - "branchPushPatterns": ["main", "develop"] -} -``` - -Returns `null` if no configuration is set (all branches analyzed). - -#### Update Branch Analysis Configuration - -Configure which branches trigger automated analysis. Supports exact names and glob patterns. 
- -```http -PUT /api/{workspaceSlug}/project/{namespace}/branch-analysis-config -Authorization: Bearer -Content-Type: application/json - -{ - "prTargetBranches": ["main", "develop", "release/*"], - "branchPushPatterns": ["main", "develop"] -} -``` - -**Pattern Syntax**: -- `main` - Exact match -- `release/*` - Matches `release/1.0`, `release/2.0` (single level) -- `feature/**` - Matches `feature/auth`, `feature/auth/oauth` (any depth) - -**Response**: `200 OK` - Returns updated ProjectDTO - -**Default Behavior**: If arrays are empty or null, all branches are analyzed. - -#### Get RAG Configuration - -```http -GET /api/workspace/{workspaceSlug}/project/{projectNamespace}/rag/config -Authorization: Bearer -``` - -**Response**: `200 OK` -```json -{ - "enabled": true, - "branch": "main", - "excludePatterns": ["vendor/**", "lib/**", "generated/**"] -} -``` - -#### Update RAG Configuration - -Configure RAG indexing settings including exclude patterns. - -```http -PUT /api/workspace/{workspaceSlug}/project/{projectNamespace}/rag/config -Authorization: Bearer -Content-Type: application/json - -{ - "enabled": true, - "branch": "main", - "excludePatterns": ["vendor/**", "lib/**", "app/design/**"] -} -``` - -**Fields**: -- `enabled` (required): Enable/disable RAG indexing -- `branch` (optional): Branch to index (null = use default branch) -- `excludePatterns` (optional): Array of glob patterns to exclude - -**Exclude Pattern Syntax**: -- `vendor/**` - Directory and all subdirectories -- `*.min.js` - File extension pattern -- `**/*.generated.ts` - Pattern at any depth -- `lib/` - Directory prefix - -**Response**: `200 OK` - Returns updated ProjectDTO - -#### Get RAG Index Status - -```http -GET /api/workspace/{workspaceSlug}/project/{projectNamespace}/rag/status -Authorization: Bearer -``` - -**Response**: `200 OK` -```json -{ - "isIndexed": true, - "indexStatus": { - "projectId": 123, - "status": "INDEXED", - "indexedBranch": "main", - "indexedCommitHash": "abc123def456", - 
"totalFilesIndexed": 1250, - "lastIndexedAt": "2024-01-15T10:30:00Z", - "errorMessage": null, - "collectionName": "codecrow_workspace__project__main" - }, - "canStartIndexing": true -} -``` - -**Status Values**: -- `NOT_INDEXED` - Never indexed -- `INDEXING` - Currently indexing -- `INDEXED` - Successfully indexed -- `UPDATING` - Incremental update in progress -- `FAILED` - Last indexing failed - -#### Trigger RAG Indexing - -Manually trigger RAG indexing. Returns Server-Sent Events stream with progress updates. - -```http -POST /api/workspace/{workspaceSlug}/project/{projectNamespace}/rag/trigger -Authorization: Bearer -Accept: text/event-stream -``` - -**Response**: SSE Stream -``` -event: message -data: {"type":"progress","stage":"init","message":"Starting RAG indexing..."} - -event: message -data: {"type":"progress","stage":"download","message":"Downloading repository..."} - -event: message -data: {"type":"progress","stage":"indexing","message":"Excluding 4 custom patterns"} - -event: message -data: {"type":"complete","filesIndexed":1250} -``` - -**Rate Limiting**: Minimum 60 seconds between trigger requests per project. 
- -### Analysis Endpoints - -#### List Project Analyses - -```http -GET /api/analysis/project/{projectId}?page=0&size=20&sort=createdAt,desc -Authorization: Bearer -``` - -**Query Parameters**: -- `page`: Page number (0-indexed) -- `size`: Items per page -- `sort`: Sort field and direction - -**Response**: `200 OK` -```json -{ - "content": [ - { - "id": "analysis-uuid", - "projectId": "project-uuid", - "pullRequestId": "pr-uuid", - "status": "COMPLETED", - "totalIssues": 5, - "highSeverity": 1, - "mediumSeverity": 3, - "lowSeverity": 1, - "createdAt": "2024-01-15T10:00:00", - "completedAt": "2024-01-15T10:02:00" - } - ], - "totalElements": 45, - "totalPages": 3, - "number": 0, - "size": 20 -} -``` - -#### Get Analysis - -```http -GET /api/analysis/{id} -Authorization: Bearer -``` - -**Response**: `200 OK` -```json -{ - "id": "uuid", - "projectId": "project-uuid", - "pullRequest": { - "id": "pr-uuid", - "number": 123, - "title": "Add authentication", - "author": "developer" - }, - "status": "COMPLETED", - "createdAt": "2024-01-15T10:00:00", - "completedAt": "2024-01-15T10:02:00", - "totalIssues": 5, - "issuesBySeverity": { ... }, - "issuesByCategory": { ... }, - "issues": [...] 
-} -``` - -#### Get Analysis Issues - -```http -GET /api/analysis/{id}/issues?severity=HIGH&category=SECURITY -Authorization: Bearer -``` - -**Query Parameters**: -- `severity`: Filter by severity (HIGH, MEDIUM, LOW) -- `category`: Filter by category -- `file`: Filter by file path -- `resolved`: Filter by resolution status - -**Response**: `200 OK` -```json -[ - { - "id": "issue-uuid", - "analysisId": "analysis-uuid", - "file": "src/main/java/Auth.java", - "line": 42, - "severity": "HIGH", - "category": "SECURITY", - "description": "SQL injection vulnerability", - "suggestion": "Use parameterized queries", - "codeSnippet": "String query = ...", - "resolved": false - } -] -``` - -### Issue Endpoints - -#### Get Branch Issues - -```http -GET /api/issues/branch/{branchId}?resolved=false -Authorization: Bearer -``` - -**Response**: `200 OK` -```json -[ - { - "id": "issue-uuid", - "branchId": "branch-uuid", - "file": "src/service/Payment.java", - "line": 78, - "severity": "MEDIUM", - "category": "CODE_QUALITY", - "description": "Method too complex", - "suggestion": "Refactor into smaller methods", - "resolved": false, - "createdAt": "2024-01-10T15:00:00" - } -] -``` - -#### Get Active Issues - -```http -GET /api/issues/project/{projectId}/active -Authorization: Bearer -``` - -**Response**: `200 OK` -```json -[ - { - "id": "issue-uuid", - "branch": "main", - "file": "src/...", - "line": 42, - "severity": "HIGH", - "category": "SECURITY", - "description": "...", - "createdAt": "2024-01-15T10:00:00" - } -] -``` - -#### Get Issue Details - -```http -GET /api/issues/{id} -Authorization: Bearer -``` - -**Response**: `200 OK` -```json -{ - "id": "uuid", - "file": "src/service/Auth.java", - "line": 42, - "severity": "HIGH", - "category": "SECURITY", - "description": "SQL injection vulnerability in login method", - "suggestion": "Use PreparedStatement with parameterized queries", - "codeSnippet": "String query = \"SELECT * FROM users WHERE id=\" + userId;", - "resolved": false, 
- "createdAt": "2024-01-15T10:00:00", - "analysis": { ... }, - "branch": { ... } -} -``` - -### VCS Integration Endpoints - -#### Connect Bitbucket - -```http -POST /api/vcs/bitbucket/connect -Authorization: Bearer -Content-Type: application/json - -{ - "workspaceId": "workspace-uuid", - "appPassword": "bitbucket-app-password", - "username": "bitbucket-username" -} -``` - -**Response**: `201 Created` - -#### List Repositories - -```http -GET /api/vcs/bitbucket/repositories?workspaceId=uuid -Authorization: Bearer -``` - -**Response**: `200 OK` -```json -[ - { - "workspace": "my-workspace", - "slug": "my-repo", - "name": "My Repository", - "url": "https://bitbucket.org/my-workspace/my-repo", - "isPrivate": true, - "mainBranch": "main" - } -] -``` - -### AI Connection Endpoints - -#### List AI Connections - -```http -GET /api/ai/connections -Authorization: Bearer -``` - -**Response**: `200 OK` -```json -[ - { - "id": "uuid", - "name": "OpenRouter", - "provider": "OPENROUTER", - "model": "anthropic/claude-3.5-sonnet", - "createdAt": "2024-01-01T00:00:00" - } -] -``` - -#### Create AI Connection - -```http -POST /api/ai/connections -Authorization: Bearer -Content-Type: application/json - -{ - "name": "OpenRouter Production", - "provider": "OPENROUTER", - "apiKey": "sk-or-v1-...", - "model": "anthropic/claude-3.5-sonnet", - "configuration": { - "temperature": 0.7, - "maxTokens": 4096 - } -} -``` - -**Response**: `201 Created` - -## Pipeline Agent API - -### Webhook Endpoint - -```http -POST /api/webhooks/{provider}/{authToken} -Content-Type: application/json - -{ - // VCS provider webhook payload (Bitbucket, GitHub, GitLab) -} -``` - -**Response**: `202 Accepted` -```json -{ - "status": "accepted", - "message": "Webhook received, processing started", - "jobId": "uuid", - "jobUrl": "https://codecrow.io/api/{workspace}/projects/{project}/jobs/{jobId}", - "logsStreamUrl": "https://codecrow.io/api/jobs/{jobId}/logs/stream", - "projectId": 123, - "eventType": 
"pullrequest:created" -} -``` - -> **Note**: Webhooks now return immediately with a job ID. Use the `logsStreamUrl` for real-time progress via SSE, or poll the `jobUrl` for status updates. - -## Jobs API - -Background jobs track long-running operations like code analysis, RAG indexing, and repository sync. Jobs provide real-time progress tracking and persistent logs. - -### Job Types - -| Type | Description | -|------|-------------| -| `PR_ANALYSIS` | Pull request code review | -| `BRANCH_ANALYSIS` | Branch push analysis | -| `BRANCH_RECONCILIATION` | Post-merge reconciliation | -| `RAG_INITIAL_INDEX` | Initial repository indexing | -| `RAG_INCREMENTAL_INDEX` | Incremental index update | -| `MANUAL_ANALYSIS` | On-demand analysis | -| `REPO_SYNC` | Repository synchronization | - -### Job Statuses - -| Status | Description | -|--------|-------------| -| `PENDING` | Job created, not yet started | -| `QUEUED` | Waiting for resources | -| `RUNNING` | Currently executing | -| `COMPLETED` | Finished successfully | -| `FAILED` | Finished with error | -| `CANCELLED` | Cancelled by user/system | -| `WAITING` | Waiting for lock release | - -### List Workspace Jobs - -```http -GET /api/{workspaceSlug}/jobs -Authorization: Bearer -``` - -**Query Parameters**: -- `page`: Page number (0-indexed) -- `size`: Items per page (default: 20) -- `status`: Filter by status (RUNNING, COMPLETED, FAILED, etc.) -- `type`: Filter by job type (PR_ANALYSIS, BRANCH_ANALYSIS, etc.) 
- -**Response**: `200 OK` -```json -{ - "jobs": [ - { - "id": "550e8400-e29b-41d4-a716-446655440000", - "projectId": 123, - "projectName": "My Project", - "projectNamespace": "my-project", - "workspaceId": 1, - "workspaceName": "My Workspace", - "jobType": "PR_ANALYSIS", - "status": "COMPLETED", - "triggerSource": "WEBHOOK", - "title": "PR #42 Analysis: feature/auth → main", - "branchName": "main", - "prNumber": 42, - "commitHash": "abc1234", - "progress": 100, - "currentStep": "complete", - "createdAt": "2024-01-15T10:00:00Z", - "startedAt": "2024-01-15T10:00:01Z", - "completedAt": "2024-01-15T10:02:30Z", - "durationMs": 149000, - "logCount": 45 - } - ], - "page": 0, - "pageSize": 20, - "totalElements": 150, - "totalPages": 8 -} -``` - -### List Project Jobs - -```http -GET /api/{workspaceSlug}/projects/{projectNamespace}/jobs -Authorization: Bearer -``` - -**Query Parameters**: Same as workspace jobs - -**Response**: Same format as workspace jobs - -### Get Active Jobs - -```http -GET /api/{workspaceSlug}/projects/{projectNamespace}/jobs/active -Authorization: Bearer -``` - -**Response**: `200 OK` -```json -[ - { - "id": "uuid", - "jobType": "PR_ANALYSIS", - "status": "RUNNING", - "progress": 65, - "currentStep": "analyzing_diff", - ... 
- } -] -``` - -### Get Job Details - -```http -GET /api/{workspaceSlug}/projects/{projectNamespace}/jobs/{jobId} -Authorization: Bearer -``` - -**Response**: `200 OK` -```json -{ - "id": "550e8400-e29b-41d4-a716-446655440000", - "projectId": 123, - "projectName": "My Project", - "jobType": "PR_ANALYSIS", - "status": "RUNNING", - "triggerSource": "WEBHOOK", - "title": "PR #42 Analysis", - "branchName": "main", - "prNumber": 42, - "commitHash": "abc1234def5678", - "progress": 75, - "currentStep": "generating_report", - "createdAt": "2024-01-15T10:00:00Z", - "startedAt": "2024-01-15T10:00:01Z", - "durationMs": 45000, - "logCount": 32 -} -``` - -### Get Job Logs - -```http -GET /api/{workspaceSlug}/projects/{projectNamespace}/jobs/{jobId}/logs -Authorization: Bearer -``` - -**Query Parameters**: -- `afterSequence`: Return logs after this sequence number (for pagination/polling) - -**Response**: `200 OK` -```json -{ - "jobId": "550e8400-e29b-41d4-a716-446655440000", - "logs": [ - { - "id": "log-uuid-1", - "sequenceNumber": 1, - "level": "INFO", - "step": "init", - "message": "Job created for PR #42", - "timestamp": "2024-01-15T10:00:00Z" - }, - { - "id": "log-uuid-2", - "sequenceNumber": 2, - "level": "INFO", - "step": "fetching_diff", - "message": "Fetching PR diff from Bitbucket", - "timestamp": "2024-01-15T10:00:01Z" - }, - { - "id": "log-uuid-3", - "sequenceNumber": 3, - "level": "WARN", - "step": "analysis", - "message": "Large file detected, truncating context", - "metadata": "{\"file\": \"large-bundle.js\", \"size\": 5242880}", - "timestamp": "2024-01-15T10:00:15Z" - } - ], - "latestSequence": 32, - "isComplete": false -} -``` - -### Stream Job Logs (SSE) - -Real-time log streaming using Server-Sent Events. 
- -```http -GET /api/{workspaceSlug}/projects/{projectNamespace}/jobs/{jobId}/logs/stream -Authorization: Bearer -Accept: text/event-stream -``` - -**Query Parameters**: -- `afterSequence`: Start streaming from this sequence (default: 0) - -**SSE Events**: - -``` -event: log -data: {"id":"uuid","sequenceNumber":1,"level":"INFO","step":"init","message":"Job started","timestamp":"2024-01-15T10:00:00Z"} - -event: log -data: {"id":"uuid","sequenceNumber":2,"level":"INFO","step":"fetching_diff","message":"Fetching diff...","timestamp":"2024-01-15T10:00:01Z"} - -event: complete -data: {"status":"COMPLETED","message":"Job completed successfully"} -``` - -**JavaScript Example**: -```javascript -const eventSource = new EventSource( - '/api/workspace/projects/my-project/jobs/job-uuid/logs/stream?afterSequence=0' -); - -eventSource.addEventListener('log', (event) => { - const log = JSON.parse(event.data); - console.log(`[${log.level}] ${log.message}`); -}); - -eventSource.addEventListener('complete', (event) => { - const { status, message } = JSON.parse(event.data); - console.log(`Job ${status}: ${message}`); - eventSource.close(); -}); - -eventSource.onerror = () => { - eventSource.close(); -}; -``` - -### Cancel Job - -```http -POST /api/{workspaceSlug}/projects/{projectNamespace}/jobs/{jobId}/cancel -Authorization: Bearer -``` - -**Response**: `200 OK` -```json -{ - "id": "uuid", - "status": "CANCELLED", - ... -} -``` - -### Public Job Endpoints - -These endpoints use the job's external UUID directly, useful for webhook responses. 
- -#### Get Job by External ID - -```http -GET /api/jobs/{jobId} -``` - -#### Stream Logs by External ID - -```http -GET /api/jobs/{jobId}/logs/stream -Accept: text/event-stream -``` - -## Response Codes - -- `200 OK`: Request successful -- `201 Created`: Resource created -- `202 Accepted`: Request accepted, processing async -- `204 No Content`: Successful deletion -- `400 Bad Request`: Invalid request data -- `401 Unauthorized`: Missing or invalid authentication -- `403 Forbidden`: Insufficient permissions -- `404 Not Found`: Resource not found -- `409 Conflict`: Resource conflict (e.g., duplicate) -- `422 Unprocessable Entity`: Validation error -- `500 Internal Server Error`: Server error - -## Error Response Format - -```json -{ - "timestamp": "2024-01-15T10:00:00", - "status": 400, - "error": "Bad Request", - "message": "Validation failed", - "path": "/api/projects", - "errors": [ - { - "field": "name", - "message": "Name is required" - } - ] -} -``` - -## Pagination - -List endpoints support pagination: - -**Query Parameters**: -- `page`: Page number (0-indexed) -- `size`: Items per page (default: 20, max: 100) -- `sort`: Sort field and direction (e.g., `createdAt,desc`) - -**Response**: -```json -{ - "content": [...], - "totalElements": 100, - "totalPages": 5, - "number": 0, - "size": 20, - "first": true, - "last": false -} -``` - -## Rate Limiting - -Currently not implemented. Consider adding for production: - -- Per-user limits -- Per-project limits -- Webhook endpoint limits - -## Swagger/OpenAPI - -Interactive API documentation available at: - -`http://localhost:8081/swagger-ui-custom.html` - -OpenAPI spec: - -`http://localhost:8081/api-docs` - diff --git a/docs/07-analysis-types.md b/docs/07-analysis-types.md deleted file mode 100644 index badf696..0000000 --- a/docs/07-analysis-types.md +++ /dev/null @@ -1,624 +0,0 @@ -# Analysis Types - -CodeCrow performs two types of automated code analysis: Branch Analysis and Pull Request Analysis.
- -## Branch Analysis - -### Overview - -Branch analysis is triggered when a pull request is merged into a branch. It performs incremental analysis of the target branch to verify if previously reported issues have been resolved. - -### Trigger Event - -**Bitbucket Webhook**: `repo:push` event - -**Conditions**: -- Push is a merge (pull request merged) -- Target branch is tracked by CodeCrow -- Project has active webhook token - -### Flow Diagram - -``` -PR Merged → Push Event → Webhook Received → -Acquire Lock → Fetch Changed Files → -Check First Analysis → RAG Indexing/Update → -Query Existing Issues → Build Request → -MCP Client Analysis → Process Results → -Update Issue Status → Release Lock -``` - -### First Branch Analysis - -When a branch is analyzed for the first time: - -1. **Full Repository Indexing**: - - All repository files are fetched - - RAG pipeline indexes entire codebase - - Vector embeddings stored in Qdrant - - RagIndexStatus record created with status `COMPLETED` - -2. **No Existing Issues**: - - No previous issues to check - - Analysis focuses on new code quality - -3. **Baseline Established**: - - All detected issues stored as BranchIssue - - Provides baseline for future analyses - -### Subsequent Branch Analysis - -For branches with existing analyses: - -1. **Incremental RAG Update**: - - Only changed files re-indexed - - Existing embeddings updated or removed - - New file embeddings added - - RagIndexStatus updated - -2. **Issue Resolution Check**: - - Fetch all active BranchIssue records for branch - - Filter issues related to changed files - - Include in analysis request as `previous_issues` - -3. **AI Re-Analysis**: - - LLM checks if issues are resolved - - Compares previous issue location with new code - - Returns resolution status for each issue - -4. 
**Issue Status Update**: - - Issues marked as resolved: `resolved = true` - - Unresolved issues remain active - - New issues added as BranchIssue - -### Request Structure - -```json -{ - "project_id": "uuid", - "analysis_type": "BRANCH", - "repository": { - "workspace": "my-workspace", - "repo_slug": "my-repo", - "branch": "main" - }, - "changed_files": [ - { - "path": "src/service/auth.java", - "diff": "@@ -10,5 +10,7 @@...", - "content": "package org.example..." - } - ], - "previous_issues": [ - { - "id": "issue-uuid", - "file": "src/service/auth.java", - "line": 42, - "severity": "HIGH", - "description": "SQL injection vulnerability" - } - ], - "metadata": { - "merge_commit": "abc123", - "pr_number": 45, - "author": "username" - } -} -``` - -### Response Processing - -**Resolution Detected**: -```json -{ - "resolved_issues": [ - { - "issue_id": "issue-uuid", - "resolved": true, - "reason": "Parameterized queries now used" - } - ], - "new_issues": [...] -} -``` - -**Update Database**: -- Set `BranchIssue.resolved = true` for resolved issues -- Set `BranchIssue.resolved_at = timestamp` -- Create new BranchIssue records for new issues - -### RAG Integration - -**First Analysis**: -```python -# Full index build -POST /index -{ - "project_id": "proj-123", - "repository": "my-repo", - "branch": "main", - "files": [...all repository files...], - "incremental": false -} -``` - -**Incremental Update**: -```python -# Update only changed files -POST /index -{ - "project_id": "proj-123", - "repository": "my-repo", - "branch": "main", - "files": [...changed files only...], - "incremental": true -} -``` - -### Locking Mechanism - -**Purpose**: Prevent concurrent branch analysis - -**Implementation**: -- Database record in `AnalysisLock` table -- Lock acquired before analysis starts -- Lock includes: repository, branch, timestamp -- Lock released after analysis completes -- Stale locks auto-expire (configurable timeout) - -**Lock Check**: -```sql -SELECT * FROM analysis_lock 
-WHERE repository = ? AND branch = ? -AND locked_at > NOW() - INTERVAL '30 minutes'; -``` - -If lock exists and not expired, analysis is skipped. - -### Performance Considerations - -**Large Repositories**: -- First analysis can take 10-30 minutes for large codebases -- Incremental updates typically < 1 minute -- RAG indexing is the bottleneck - -**Optimization**: -- Index only supported file types -- Skip binary files and build artifacts -- Use file size limits (default: 1MB per file) -- Batch embedding generation -- Configure project-specific exclude patterns - -### Exclude Patterns - -Projects can configure custom exclude patterns to skip irrelevant files during RAG indexing: - -**Default System Patterns**: -- `node_modules/**`, `.venv/**`, `__pycache__/**` -- Binary files: `*.jar`, `*.dll`, `*.exe`, etc. -- Build outputs: `target/**`, `build/**`, `dist/**` -- Lock files: `package-lock.json`, `yarn.lock`, etc. - -**Custom Project Patterns** (configured per project): -```json -{ - "excludePatterns": [ - "vendor/**", - "lib/**", - "generated/**", - "app/design/**" - ] -} -``` - -**Pattern Matching**: -- `vendor/**` matches all files under vendor/ directory -- `*.min.js` matches minified JavaScript files -- `**/*.generated.ts` matches generated TypeScript files anywhere - -See [Configuration Guide](./05-configuration.md#project-level-rag-configuration) for details. - -### Full Reindex Behavior - -When RAG indexing is triggered (manually or automatically): - -1. **New temporary collection created** - Ensures old index remains available during indexing -2. **Documents loaded and filtered** - Applies both system and project exclude patterns -3. **Chunks generated and embedded** - Creates vector embeddings -4. **On success**: Old collection deleted, new collection activated -5. **On failure**: Temporary collection cleaned up, old index preserved - -This ensures zero-downtime reindexing and no data loss on failures. 
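The swap-on-success flow above can be sketched end to end. Everything in this sketch is illustrative: `InMemoryStore` is a toy stand-in for the Qdrant client, and `full_reindex` and all method names are assumptions, not CodeCrow's actual code or the real qdrant-client API.

```python
import uuid


class InMemoryStore:
    """Toy stand-in for the Qdrant client (illustrative only)."""

    def __init__(self):
        self.collections = {}  # collection name -> list of stored points
        self.active = {}       # alias -> currently active collection name

    def create(self, name):
        self.collections[name] = []

    def upsert(self, name, point):
        self.collections[name].append(point)

    def activate(self, alias, name):
        old = self.active.get(alias)
        self.active[alias] = name      # new collection activated...
        if old is not None:
            del self.collections[old]  # ...old collection deleted

    def drop(self, name):
        self.collections.pop(name, None)


def full_reindex(store, alias, files, excluded, embed):
    """Zero-downtime reindex: build a temporary collection, swap on success,
    clean up on failure. The old index stays queryable until the swap."""
    tmp = f"{alias}-reindex-{uuid.uuid4().hex[:8]}"
    store.create(tmp)                  # 1. new temporary collection
    try:
        for path, text in files:
            if path in excluded:       # 2. exclude-pattern filtering (simplified)
                continue
            store.upsert(tmp, (path, embed(text)))  # 3. chunk + embed (simplified)
        store.activate(alias, tmp)     # 4. success: activate new, delete old
    except Exception:
        store.drop(tmp)                # 5. failure: temp cleaned up, old preserved
        raise
```

If embedding raises partway through, only the temporary collection is dropped, which is what makes the reindex zero-downtime: readers never see a half-built index.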
- -### Error Handling - -**RAG Indexing Failed**: -- RagIndexStatus marked as `FAILED` -- Branch analysis continues without RAG context -- Retry indexing on next analysis - -**MCP Client Timeout**: -- Analysis marked as `FAILED` -- Lock released -- Webhook can retry - -**Partial Success**: -- Some issues updated, others failed -- Record error details in logs -- Return partial results - -## Pull Request Analysis - -### Overview - -Pull request analysis is triggered when a PR is created or updated. It analyzes only the changed code in the PR. - -### Trigger Events - -**Bitbucket Webhooks**: -- `pullrequest:created` -- `pullrequest:updated` - -**Conditions**: -- PR is in open state -- Target branch belongs to tracked project -- Project has active webhook token - -### Flow Diagram - -``` -PR Created/Updated → Webhook Received → -Acquire Lock → Fetch PR Metadata → -Fetch Diffs → Check Previous PR Analysis → -Query RAG Context → Build Request → -MCP Client Analysis → Process Results → -Create/Update CodeAnalysis → Create Issues → -Link to PR → Release Lock -``` - -### First PR Analysis - -When a PR is analyzed for the first time: - -1. **Fetch PR Data**: - - PR metadata (author, title, description, source/target branches) - - Diffs for all changed files - - File content for context - -2. **RAG Context**: - - Query RAG for relevant code from target branch - - Provides context about related code - - Helps AI understand broader codebase - -3. **Analysis**: - - LLM analyzes diffs with RAG context - - Identifies issues in changed code - - Suggests improvements - -4. **Store Results**: - - Create `CodeAnalysis` record - - Create `CodeAnalysisIssue` for each issue found - - Link to `PullRequest` entity - -### PR Re-Analysis - -When a PR is updated (new commits pushed): - -1. **Fetch Updated Diffs**: - - Get latest changes since last analysis - - Or re-analyze entire PR (configurable) - -2. 
**Include Previous Issues**: - - Fetch issues from previous CodeAnalysis - - Include in request as `previous_issues` - - AI can check if issues were addressed - -3. **Update Analysis**: - - Update existing `CodeAnalysis` record - - Mark old issues as outdated - - Create new issues for new findings - - Reuse issue records if still present - -### Request Structure - -```json -{ - "project_id": "uuid", - "analysis_type": "PULL_REQUEST", - "repository": { - "workspace": "my-workspace", - "repo_slug": "my-repo", - "branch": "feature/new-feature", - "target_branch": "main" - }, - "changed_files": [ - { - "path": "src/controller/UserController.java", - "diff": "@@ -15,3 +15,5 @@...", - "old_content": "...", - "new_content": "..." - } - ], - "previous_issues": [ - { - "id": "issue-uuid", - "file": "src/controller/UserController.java", - "line": 23, - "description": "Missing input validation" - } - ], - "metadata": { - "pr_number": 123, - "pr_title": "Add user authentication", - "author": "dev-user", - "source_branch": "feature/new-feature", - "target_branch": "main", - "reviewers": ["reviewer1", "reviewer2"] - } -} -``` - -### Response Processing - -```json -{ - "issues": [ - { - "file": "src/controller/UserController.java", - "line": 23, - "severity": "HIGH", - "category": "SECURITY", - "description": "SQL injection vulnerability in login method", - "suggestion": "Use parameterized queries", - "code_snippet": "String query = \"SELECT * FROM users WHERE...\";" - }, - { - "file": "src/controller/UserController.java", - "line": 45, - "severity": "MEDIUM", - "category": "CODE_QUALITY", - "description": "Method too long (120 lines)", - "suggestion": "Refactor into smaller methods" - } - ], - "summary": { - "total_issues": 2, - "by_severity": {"HIGH": 1, "MEDIUM": 1}, - "by_category": {"SECURITY": 1, "CODE_QUALITY": 1} - }, - "previous_issues_status": [ - { - "issue_id": "issue-uuid", - "resolved": false, - "comment": "Issue still present" - } - ] -} -``` - -### Database 
Records - -**CodeAnalysis**: -```java -{ - id: "analysis-uuid", - projectId: "proj-uuid", - pullRequestId: "pr-uuid", - status: "COMPLETED", - createdAt: "2024-01-15T10:30:00", - completedAt: "2024-01-15T10:32:00", - totalIssues: 2, - highSeverity: 1, - mediumSeverity: 1, - lowSeverity: 0 -} -``` - -**CodeAnalysisIssue**: -```java -{ - id: "issue-uuid", - analysisId: "analysis-uuid", - file: "src/controller/UserController.java", - line: 23, - severity: "HIGH", - category: "SECURITY", - description: "SQL injection vulnerability...", - suggestion: "Use parameterized queries", - codeSnippet: "String query = ...", - resolved: false -} -``` - -### RAG Integration - -**Query for Context**: -```python -POST /query -{ - "project_id": "proj-123", - "repository": "my-repo", - "branch": "main", # Target branch - "query": "authentication login user validation", - "top_k": 10 -} -``` - -**Response**: -```json -{ - "results": [ - { - "file": "src/service/AuthService.java", - "content": "public class AuthService { ... }", - "score": 0.92 - }, - { - "file": "docs/security.md", - "content": "# Security Guidelines...", - "score": 0.85 - } - ] -} -``` - -AI receives this context to better understand codebase patterns and standards. - -### Diff Analysis - -**Added Lines**: Primary focus for new issues -**Modified Lines**: Check for improvements or regressions -**Removed Lines**: Context only, issues here are resolved - -**Diff Parsing**: -``` -@@ -10,5 +10,7 @@ class UserService { -- String query = "SELECT * FROM users WHERE id=" + userId; -+ PreparedStatement stmt = conn.prepareStatement( -+ "SELECT * FROM users WHERE id=?" 
-+ ); -``` - -AI identifies: -- Removed code: SQL injection vulnerability -- Added code: Properly uses prepared statement -- Resolution: Previous issue fixed - -### Issue Categorization - -**Severity Levels**: -- `HIGH`: Security vulnerabilities, critical bugs, data loss risks -- `MEDIUM`: Code quality issues, performance problems, maintainability -- `LOW`: Style issues, minor improvements, optimization suggestions - -**Categories**: -- `SECURITY`: Security vulnerabilities, authentication, authorization -- `CODE_QUALITY`: Code smells, complexity, duplication -- `PERFORMANCE`: Inefficiencies, resource usage, bottlenecks -- `BEST_PRACTICES`: Convention violations, anti-patterns -- `DOCUMENTATION`: Missing or incorrect documentation -- `TESTING`: Missing tests, test quality issues - -### Webhook Response - -After analysis completes, CodeCrow can optionally: - -1. **Post PR Comment** (if configured): - - Summary of findings - - Link to detailed results - - Severity breakdown - -2. **Set PR Status**: - - Pass/Fail based on thresholds - - Block merge if critical issues found - -3. 
**Notify Reviewers**: - - Email or Slack notifications - - Issue summary and link - -## Comparison: Branch vs PR Analysis - -| Aspect | Branch Analysis | Pull Request Analysis | -|--------|----------------|----------------------| -| Trigger | PR merge (push event) | PR create/update | -| Scope | Changed files in merge | Changed files in PR | -| Purpose | Verify issue resolution | Find issues in changes | -| Previous Issues | Branch issues | Previous PR analysis | -| RAG Indexing | Yes (full or incremental) | No (uses existing index) | -| Frequency | After each merge | After each PR update | -| Duration | Longer (indexing) | Faster (no indexing) | -| Result Storage | BranchIssue | CodeAnalysisIssue | - -## Analysis Configuration - -### Project Settings - -Projects can configure: - -- **Analysis Enabled**: Enable/disable automated analysis -- **Auto-Comment**: Post analysis summary to PR -- **Block on Critical**: Prevent merge if critical issues found -- **Severity Threshold**: Minimum severity to report -- **File Filters**: Exclude paths (e.g., `node_modules/`, `*.test.js`) -- **Max Analysis Time**: Timeout for analysis operation - -### Analysis Scope Configuration - -Control which branches trigger automated analysis using pattern matching. Configure in **Project Settings → Analysis Scope**. - -**PR Target Branch Patterns**: -Only analyze PRs targeting branches matching these patterns. - -**Branch Push Patterns**: -Only analyze pushes (including PR merges) to branches matching these patterns. 
- -**Pattern Syntax**: -| Pattern | Description | Example Matches | -|---------|-------------|-----------------| -| `main` | Exact match | `main` | -| `develop` | Exact match | `develop` | -| `release/*` | Single-level wildcard | `release/1.0`, `release/2.0` | -| `feature/**` | Multi-level wildcard | `feature/auth`, `feature/auth/oauth` | -| `hotfix-*` | Prefix match | `hotfix-123`, `hotfix-urgent` | - -**Examples**: -``` -# Analyze PRs targeting main and develop branches only -PR Target Branches: main, develop - -# Also analyze PRs targeting any release branch -PR Target Branches: main, develop, release/* - -# Analyze pushes to main only (for branch analysis) -Branch Push Patterns: main -``` - -**Default Behavior**: If no patterns are configured, all branches are analyzed. - -### Webhook Configuration - -**Bitbucket Webhook URL**: -``` -https://your-domain.com:8082/api/v1/bitbucket-cloud/webhook -``` - -**Events to Enable**: -- Repository: Push -- Pull Request: Created, Updated, Merged - -**Authentication Header**: -``` -Authorization: Bearer -``` - -Generate token in CodeCrow UI under Project Settings. 
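On the receiving side, validating the webhook's `Authorization` header can be as small as a constant-time comparison of the presented bearer token against the project's stored token. This is a sketch only: the `is_authorized` helper and its signature are assumptions for illustration, not CodeCrow's actual handler.

```python
import hmac
from typing import Optional


def is_authorized(auth_header: Optional[str], expected_token: str) -> bool:
    """Validate a webhook request's Authorization header.

    Expects the form "Bearer <project-token>". hmac.compare_digest performs a
    constant-time comparison, so the check does not leak how many leading
    bytes of a guessed token were correct. (Illustrative sketch; not the
    actual CodeCrow server code.)
    """
    if not auth_header or not auth_header.startswith("Bearer "):
        return False
    presented = auth_header[len("Bearer "):].strip()
    return hmac.compare_digest(presented.encode("utf-8"),
                               expected_token.encode("utf-8"))
```

Requests with a missing header, a non-Bearer scheme, or a mismatched token should be rejected with `401 Unauthorized` before any analysis work is queued.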
- -## Best Practices - -### For Branch Analysis - -- Enable on main/develop branches -- Configure analysis scope patterns to filter analysis (e.g., `main`, `develop`) -- Review resolved issues periodically -- Clean up old resolved issues -- Monitor RAG indexing performance -- Schedule manual re-indexing if needed - -### For PR Analysis - -- Analyze all PRs before merge -- Configure PR target patterns in Analysis Scope to focus on protected branches -- Use as required status check -- Review and address issues before merging -- Don't ignore security issues -- Use suggestions to improve code - -### General - -- Keep codebase patterns consistent -- Document coding standards -- Train team on issue categories -- Adjust severity thresholds as needed -- Monitor analysis costs (OpenRouter) -- Review false positives and improve prompts - diff --git a/docs/08-database-schema.md b/docs/08-database-schema.md deleted file mode 100644 index 5fc49fe..0000000 --- a/docs/08-database-schema.md +++ /dev/null @@ -1,742 +0,0 @@ -# Database Schema - -CodeCrow uses PostgreSQL as the primary relational database. 
- -## Entity Relationship Diagram - -``` -┌──────────────┐ -│ User │ -│──────────────│ -│ id (PK) │ -│ username │ -│ email │ -│ password │ -│ roles │ -│ created_at │ -└──────┬───────┘ - │ - │ 1:N - │ -┌──────▼──────────────┐ ┌─────────────────┐ -│ WorkspaceMember │ N:1 │ Workspace │ -│─────────────────────│◄────────┤─────────────────│ -│ id (PK) │ │ id (PK) │ -│ workspace_id (FK) │ │ name │ -│ user_id (FK) │ │ description │ -│ role │ │ created_at │ -│ joined_at │ └────────┬────────┘ -└─────────────────────┘ │ - │ 1:N - │ - ┌────────────▼────────────┐ - │ Project │ - │─────────────────────────│ - │ id (PK) │ - │ workspace_id (FK) │ - │ name │ - │ description │ - │ repository_url │ - │ default_branch │ - │ created_at │ - └────────┬────────────────┘ - │ - ┌────────────────────┼────────────────────┐ - │ │ │ - │ 1:N │ 1:N │ 1:N - ┌──────▼────────┐ ┌─────▼─────┐ ┌──────▼────────┐ - │ Branch │ │ProjectToken│ │ CodeAnalysis │ - │───────────────│ │────────────│ │───────────────│ - │ id (PK) │ │ id (PK) │ │ id (PK) │ - │ project_id(FK)│ │ project_id │ │ project_id(FK)│ - │ name │ │ token │ │ pr_id (FK) │ - │ created_at │ │ name │ │ status │ - └───┬───────────┘ │ expires_at │ │ created_at │ - │ └────────────┘ │ completed_at │ - │ 1:N └───┬───────────┘ - │ │ - ┌────────┼──────────┐ │ 1:N - │ │ │ │ - │ 1:N │ 1:N │ 1:1 ┌───────▼──────────────┐ -┌───▼──────┐ │ ┌───────▼────────┐ ┌───────┤ CodeAnalysisIssue │ -│BranchFile│ │ │ BranchIssue │ │ │────────────────────│ -│──────────│ │ │────────────────│ │ │ id (PK) │ -│ id (PK) │ │ │ id (PK) │ │ │ analysis_id (FK) │ -│branch(FK)│ │ │ branch_id (FK) │ │ │ file │ -│ path │ │ │ file │ │ │ line │ -│ hash │ │ │ line │ │ │ severity │ -└──────────┘ │ │ severity │ │ │ category │ - │ │ category │ │ │ description │ - │ │ description │ │ │ suggestion │ - │ │ resolved │ │ │ code_snippet │ - │ │ resolved_at │ │ └────────────────────┘ - │ │ created_at │ │ - │ └────────────────┘ │ - │ │ - │ 1:1 │ - ┌────▼───────────┐ ┌───────▼──────┐ - 
│RagIndexStatus │ │ PullRequest │ - │────────────────│ │──────────────│ - │ id (PK) │ │ id (PK) │ - │ branch_id (FK) │ │ project_id │ - │ status │ │ number │ - │ started_at │ │ title │ - │ completed_at │ │ author │ - │ error_message │ │ source_branch│ - └────────────────┘ │ target_branch│ - │ created_at │ - └──────────────┘ - -┌──────────────────┐ ┌─────────────────────────┐ -│ AIConnection │ │ ProjectVcsConnectionBinding│ -│──────────────────│ │─────────────────────────│ -│ id (PK) │ │ id (PK) │ -│ name │ │ project_id (FK) │ -│ provider │ │ vcs_type │ -│ api_key (enc) │ │ username │ -│ model │ │ app_password (enc) │ -│ configuration │ │ workspace_slug │ -└──────────────────┘ └─────────────────────────┘ - -┌───────────────────┐ -│ AnalysisLock │ -│───────────────────│ -│ id (PK) │ -│ repository │ -│ branch │ -│ locked_at │ -│ locked_by │ -└───────────────────┘ - - ┌────────────────────────┐ - │ Job │ - │────────────────────────│ - │ id (PK) │ - │ external_id (UUID) │ - │ project_id (FK) │ - │ job_type │ - │ status │ - │ trigger_source │ - │ progress │ - │ message │ - │ metadata (JSONB) │ - │ created_at │ - │ started_at │ - │ completed_at │ - │ updated_at │ - └────────┬───────────────┘ - │ - │ 1:N - │ - ┌────────▼───────────────┐ - │ JobLog │ - │────────────────────────│ - │ id (PK) │ - │ external_id (UUID) │ - │ job_id (FK) │ - │ level │ - │ message │ - │ timestamp │ - │ sequence_number │ - │ metadata (JSONB) │ - └────────────────────────┘ -``` - -## Tables - -### User - -User accounts and authentication. 
- -```sql -CREATE TABLE "user" ( - id UUID PRIMARY KEY, - username VARCHAR(255) UNIQUE NOT NULL, - email VARCHAR(255) UNIQUE NOT NULL, - password VARCHAR(255) NOT NULL, - roles VARCHAR(255)[] DEFAULT ARRAY['USER'], - created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, - updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP -); - -CREATE INDEX idx_user_username ON "user"(username); -CREATE INDEX idx_user_email ON "user"(email); -``` - -The table name is quoted because `user` is a reserved word in PostgreSQL. - -**Fields**: -- `id`: Unique user identifier (UUID) -- `username`: Unique username -- `email`: Unique email address -- `password`: BCrypt hashed password -- `roles`: Array of roles (USER, ADMIN) -- `created_at`: Account creation timestamp -- `updated_at`: Last update timestamp - -### Workspace - -Top-level organizational units. - -```sql -CREATE TABLE workspace ( - id UUID PRIMARY KEY, - name VARCHAR(255) NOT NULL, - description TEXT, - created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, - updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP -); - -CREATE INDEX idx_workspace_name ON workspace(name); -``` - -### WorkspaceMember - -Workspace membership and roles. - -```sql -CREATE TABLE workspace_member ( - id UUID PRIMARY KEY, - workspace_id UUID NOT NULL REFERENCES workspace(id) ON DELETE CASCADE, - user_id UUID NOT NULL REFERENCES "user"(id) ON DELETE CASCADE, - role VARCHAR(50) NOT NULL, - joined_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, - UNIQUE(workspace_id, user_id) -); - -CREATE INDEX idx_workspace_member_workspace ON workspace_member(workspace_id); -CREATE INDEX idx_workspace_member_user ON workspace_member(user_id); -``` - -**Roles**: OWNER, ADMIN, MEMBER, VIEWER - -### Project - -Repository projects within workspaces.
- -```sql -CREATE TABLE project ( - id UUID PRIMARY KEY, - workspace_id UUID NOT NULL REFERENCES workspace(id) ON DELETE CASCADE, - name VARCHAR(255) NOT NULL, - description TEXT, - repository_url VARCHAR(500) NOT NULL, - default_branch VARCHAR(255) DEFAULT 'main', - created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, - updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP -); - -CREATE INDEX idx_project_workspace ON project(workspace_id); -CREATE INDEX idx_project_name ON project(name); -``` - -### ProjectToken - -Webhook authentication tokens. - -```sql -CREATE TABLE project_token ( - id UUID PRIMARY KEY, - project_id UUID NOT NULL REFERENCES project(id) ON DELETE CASCADE, - token VARCHAR(500) UNIQUE NOT NULL, - name VARCHAR(255), - created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, - expires_at TIMESTAMP -); - -CREATE INDEX idx_project_token_project ON project_token(project_id); -CREATE INDEX idx_project_token_token ON project_token(token); -``` - -### Branch - -Git branches being tracked. - -```sql -CREATE TABLE branch ( - id UUID PRIMARY KEY, - project_id UUID NOT NULL REFERENCES project(id) ON DELETE CASCADE, - name VARCHAR(255) NOT NULL, - last_commit_hash VARCHAR(255), - created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, - updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, - UNIQUE(project_id, name) -); - -CREATE INDEX idx_branch_project ON branch(project_id); -CREATE INDEX idx_branch_name ON branch(name); -``` - -### BranchFile - -Files tracked in branches. - -```sql -CREATE TABLE branch_file ( - id UUID PRIMARY KEY, - branch_id UUID NOT NULL REFERENCES branch(id) ON DELETE CASCADE, - path VARCHAR(1000) NOT NULL, - file_hash VARCHAR(255), - last_modified TIMESTAMP DEFAULT CURRENT_TIMESTAMP, - UNIQUE(branch_id, path) -); - -CREATE INDEX idx_branch_file_branch ON branch_file(branch_id); -CREATE INDEX idx_branch_file_path ON branch_file(path); -``` - -### BranchIssue - -Issues found in branch analysis. 
- -```sql -CREATE TABLE branch_issue ( - id UUID PRIMARY KEY, - branch_id UUID NOT NULL REFERENCES branch(id) ON DELETE CASCADE, - file VARCHAR(1000) NOT NULL, - line INTEGER, - severity VARCHAR(50) NOT NULL, - category VARCHAR(100) NOT NULL, - description TEXT NOT NULL, - suggestion TEXT, - code_snippet TEXT, - resolved BOOLEAN DEFAULT FALSE, - resolved_at TIMESTAMP, - created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP -); - -CREATE INDEX idx_branch_issue_branch ON branch_issue(branch_id); -CREATE INDEX idx_branch_issue_resolved ON branch_issue(resolved); -CREATE INDEX idx_branch_issue_severity ON branch_issue(severity); -CREATE INDEX idx_branch_issue_file ON branch_issue(file); -``` - -### RagIndexStatus - -RAG indexing status per branch. - -```sql -CREATE TABLE rag_index_status ( - id UUID PRIMARY KEY, - branch_id UUID UNIQUE NOT NULL REFERENCES branch(id) ON DELETE CASCADE, - status VARCHAR(50) NOT NULL, - started_at TIMESTAMP, - completed_at TIMESTAMP, - error_message TEXT, - total_files INTEGER, - indexed_files INTEGER -); - -CREATE INDEX idx_rag_index_branch ON rag_index_status(branch_id); -CREATE INDEX idx_rag_index_status ON rag_index_status(status); -``` - -**Status Values**: PENDING, IN_PROGRESS, COMPLETED, FAILED - -### PullRequest - -Pull request metadata. 
- -```sql -CREATE TABLE pull_request ( - id UUID PRIMARY KEY, - project_id UUID NOT NULL REFERENCES project(id) ON DELETE CASCADE, - number INTEGER NOT NULL, - title VARCHAR(500), - description TEXT, - author VARCHAR(255), - source_branch VARCHAR(255) NOT NULL, - target_branch VARCHAR(255) NOT NULL, - status VARCHAR(50) NOT NULL, - created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, - updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, - UNIQUE(project_id, number) -); - -CREATE INDEX idx_pull_request_project ON pull_request(project_id); -CREATE INDEX idx_pull_request_number ON pull_request(project_id, number); -CREATE INDEX idx_pull_request_status ON pull_request(status); -``` - -**Status Values**: OPEN, MERGED, DECLINED - -### CodeAnalysis - -Pull request analysis records. - -```sql -CREATE TABLE code_analysis ( - id UUID PRIMARY KEY, - project_id UUID NOT NULL REFERENCES project(id) ON DELETE CASCADE, - pull_request_id UUID REFERENCES pull_request(id) ON DELETE CASCADE, - status VARCHAR(50) NOT NULL, - total_issues INTEGER DEFAULT 0, - high_severity INTEGER DEFAULT 0, - medium_severity INTEGER DEFAULT 0, - low_severity INTEGER DEFAULT 0, - created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, - completed_at TIMESTAMP -); - -CREATE INDEX idx_code_analysis_project ON code_analysis(project_id); -CREATE INDEX idx_code_analysis_pr ON code_analysis(pull_request_id); -CREATE INDEX idx_code_analysis_status ON code_analysis(status); -CREATE INDEX idx_code_analysis_created ON code_analysis(created_at DESC); -``` - -**Status Values**: PENDING, IN_PROGRESS, COMPLETED, FAILED - -### CodeAnalysisIssue - -Issues found in PR analysis. 
- -```sql -CREATE TABLE code_analysis_issue ( - id UUID PRIMARY KEY, - analysis_id UUID NOT NULL REFERENCES code_analysis(id) ON DELETE CASCADE, - file VARCHAR(1000) NOT NULL, - line INTEGER, - severity VARCHAR(50) NOT NULL, - category VARCHAR(100) NOT NULL, - description TEXT NOT NULL, - suggestion TEXT, - code_snippet TEXT, - created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP -); - -CREATE INDEX idx_code_analysis_issue_analysis ON code_analysis_issue(analysis_id); -CREATE INDEX idx_code_analysis_issue_severity ON code_analysis_issue(severity); -CREATE INDEX idx_code_analysis_issue_category ON code_analysis_issue(category); -CREATE INDEX idx_code_analysis_issue_file ON code_analysis_issue(file); -``` - -### AIConnection - -AI provider configurations. - -```sql -CREATE TABLE ai_connection ( - id UUID PRIMARY KEY, - name VARCHAR(255) NOT NULL, - provider VARCHAR(100) NOT NULL, - api_key VARCHAR(1000) NOT NULL, - model VARCHAR(255), - configuration JSONB, - created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, - updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP -); - -CREATE INDEX idx_ai_connection_provider ON ai_connection(provider); -``` - -**api_key**: Encrypted with AES - -### ProjectVcsConnectionBinding - -Legacy VCS connection per project (OAuth consumer credentials). - -```sql -CREATE TABLE project_vcs_connection_binding ( - id UUID PRIMARY KEY, - project_id UUID UNIQUE NOT NULL REFERENCES project(id) ON DELETE CASCADE, - vcs_type VARCHAR(50) NOT NULL, - username VARCHAR(255), - app_password VARCHAR(1000), - workspace_slug VARCHAR(255), - created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP -); - -CREATE INDEX idx_vcs_binding_project ON project_vcs_connection_binding(project_id); -``` - -**app_password**: Encrypted with AES - -### VcsConnection - -Workspace-level VCS connections supporting multiple connection types. 
- -```sql -CREATE TABLE vcs_connection ( - id BIGSERIAL PRIMARY KEY, - workspace_id UUID NOT NULL REFERENCES workspace(id) ON DELETE CASCADE, - connection_name VARCHAR(255) NOT NULL, - setup_status VARCHAR(50), - provider_type VARCHAR(50), - connection_type VARCHAR(50), - external_workspace_id VARCHAR(128), - external_workspace_slug VARCHAR(256), - installation_id VARCHAR(128), - access_token VARCHAR(1024), - refresh_token VARCHAR(1024), - token_expires_at TIMESTAMP, - scopes VARCHAR(512), - repo_count INTEGER DEFAULT 0, - configuration JSONB, - created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, - updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP -); - -CREATE INDEX idx_vcs_connection_workspace ON vcs_connection(workspace_id); -CREATE INDEX idx_vcs_connection_provider ON vcs_connection(workspace_id, provider_type); -CREATE INDEX idx_vcs_connection_external ON vcs_connection(provider_type, external_workspace_id); -``` - -**Connection Types**: -- `APP`: OAuth 2.0 App installation with token refresh -- `OAUTH_MANUAL`: User-initiated OAuth consumer -- `PERSONAL_TOKEN`: Personal access token - -**Token Fields**: `access_token` and `refresh_token` are encrypted with AES. - -### VcsRepoBinding - -Binds CodeCrow projects to external VCS repositories (used by APP connections). 
- -```sql -CREATE TABLE vcs_repo_binding ( - id BIGSERIAL PRIMARY KEY, - workspace_id UUID NOT NULL REFERENCES workspace(id) ON DELETE CASCADE, - project_id UUID UNIQUE NOT NULL REFERENCES project(id) ON DELETE CASCADE, - vcs_connection_id BIGINT NOT NULL REFERENCES vcs_connection(id), - provider VARCHAR(32) NOT NULL, - external_repo_id VARCHAR(128) NOT NULL, - external_repo_slug VARCHAR(256), - external_namespace VARCHAR(256), - display_name VARCHAR(256), - default_branch VARCHAR(128), - webhooks_configured BOOLEAN DEFAULT FALSE, - webhook_id VARCHAR(256), - created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, - updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, - UNIQUE(provider, external_repo_id) -); - -CREATE INDEX idx_vcs_repo_binding_project ON vcs_repo_binding(project_id); -CREATE INDEX idx_vcs_repo_binding_workspace ON vcs_repo_binding(workspace_id); -CREATE INDEX idx_vcs_repo_binding_connection ON vcs_repo_binding(vcs_connection_id); -CREATE INDEX idx_vcs_repo_binding_external ON vcs_repo_binding(provider, external_repo_id); -``` - -**Key Fields**: -- `external_namespace`: Workspace/organization slug (e.g., "my-workspace") -- `external_repo_slug`: Repository slug (e.g., "my-repo") -- `external_repo_id`: Stable repository UUID from provider - -### AnalysisLock - -Analysis locking mechanism. - -```sql -CREATE TABLE analysis_lock ( - id UUID PRIMARY KEY, - repository VARCHAR(500) NOT NULL, - branch VARCHAR(255) NOT NULL, - locked_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, - locked_by VARCHAR(255), - UNIQUE(repository, branch) -); - -CREATE INDEX idx_analysis_lock_repo_branch ON analysis_lock(repository, branch); -CREATE INDEX idx_analysis_lock_timestamp ON analysis_lock(locked_at); -``` - -### Job - -Background job tracking for analyses and indexing operations. 
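Jobs move through the status values documented below (`PENDING` through `COMPLETED`/`FAILED`/`CANCELLED`). A sketch of guarding those transitions — the allowed-transition map is an assumption for illustration; the schema itself only defines the status values:

```python
# Assumed transition rules, not taken from the CodeCrow schema:
TRANSITIONS = {
    "PENDING":   {"QUEUED", "CANCELLED"},
    "QUEUED":    {"RUNNING", "CANCELLED"},
    "RUNNING":   {"COMPLETED", "FAILED", "CANCELLED"},
    "COMPLETED": set(),   # terminal
    "FAILED":    set(),   # terminal
    "CANCELLED": set(),   # terminal
}

def advance(current: str, target: str) -> str:
    """Return the new status, rejecting illegal jumps (e.g. PENDING -> COMPLETED)."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current} -> {target}")
    return target

status = "PENDING"
for step in ("QUEUED", "RUNNING", "COMPLETED"):
    status = advance(status, step)
assert status == "COMPLETED"
```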
- -```sql -CREATE TABLE job ( - id BIGSERIAL PRIMARY KEY, - external_id UUID NOT NULL UNIQUE, - project_id UUID NOT NULL REFERENCES project(id) ON DELETE CASCADE, - job_type VARCHAR(50) NOT NULL, - status VARCHAR(50) NOT NULL, - trigger_source VARCHAR(50) NOT NULL, - progress INTEGER DEFAULT 0, - message VARCHAR(1000), - metadata JSONB, - created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, - started_at TIMESTAMP, - completed_at TIMESTAMP, - updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP -); - -CREATE INDEX idx_job_project ON job(project_id); -CREATE INDEX idx_job_external_id ON job(external_id); -CREATE INDEX idx_job_status ON job(status); -CREATE INDEX idx_job_type ON job(job_type); -CREATE INDEX idx_job_created_at ON job(created_at DESC); -CREATE INDEX idx_job_project_status ON job(project_id, status); -``` - -**Job Types**: -- `PR_ANALYSIS`: Pull request code analysis -- `BRANCH_ANALYSIS`: Full branch analysis -- `RAG_INITIAL_INDEX`: Initial RAG indexing -- `RAG_INCREMENTAL_INDEX`: Incremental RAG update -- `CODE_REVIEW`: Manual code review - -**Job Status Values**: -- `PENDING`: Job created, waiting to start -- `QUEUED`: Job queued for processing -- `RUNNING`: Job actively executing -- `COMPLETED`: Job finished successfully -- `FAILED`: Job failed with error -- `CANCELLED`: Job cancelled by user - -**Trigger Sources**: -- `WEBHOOK`: Triggered by VCS webhook -- `PIPELINE`: Triggered by Bitbucket Pipeline -- `API`: Triggered via REST API -- `MANUAL`: Triggered manually from UI -- `SCHEDULED`: Triggered by scheduler - -### JobLog - -Log entries for background jobs. 
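The per-job `sequence_number` in the schema below gives a stable cursor for ordered log retrieval and SSE pagination. A toy sketch of keyset pagination against SQLite — the table shape is simplified and the helper is illustrative, not CodeCrow's implementation:

```python
import sqlite3

# Simplified job_log table: sequence_number orders rows within a job.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE job_log (job_id INTEGER, sequence_number INTEGER, message TEXT)")
db.executemany("INSERT INTO job_log VALUES (1, ?, ?)",
               [(i, f"line {i}") for i in range(1, 8)])

def fetch_after(job_id: int, cursor: int, limit: int = 3):
    """Return up to `limit` log rows with sequence_number greater than `cursor`."""
    return db.execute(
        "SELECT sequence_number, message FROM job_log "
        "WHERE job_id = ? AND sequence_number > ? "
        "ORDER BY sequence_number LIMIT ?",
        (job_id, cursor, limit),
    ).fetchall()

page = fetch_after(job_id=1, cursor=0)
assert [seq for seq, _ in page] == [1, 2, 3]
# The last sequence number seen becomes the next cursor:
next_page = fetch_after(job_id=1, cursor=page[-1][0])
assert [seq for seq, _ in next_page] == [4, 5, 6]
```

The same cursor pattern works for resuming an interrupted SSE stream: the client sends the last sequence number it received.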
- -```sql -CREATE TABLE job_log ( - id BIGSERIAL PRIMARY KEY, - external_id UUID NOT NULL UNIQUE, - job_id BIGINT NOT NULL REFERENCES job(id) ON DELETE CASCADE, - level VARCHAR(20) NOT NULL, - message TEXT NOT NULL, - timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP, - sequence_number BIGINT NOT NULL, - metadata JSONB -); - -CREATE INDEX idx_job_log_job ON job_log(job_id); -CREATE INDEX idx_job_log_external_id ON job_log(external_id); -CREATE INDEX idx_job_log_job_sequence ON job_log(job_id, sequence_number); -CREATE INDEX idx_job_log_timestamp ON job_log(timestamp DESC); -CREATE INDEX idx_job_log_level ON job_log(job_id, level); -``` - -**Log Levels**: DEBUG, INFO, WARN, ERROR - -**Sequence Number**: Auto-incrementing per-job sequence for ordered log retrieval and SSE streaming pagination. - -## Data Encryption - -Sensitive fields are encrypted using AES-256: - -- `ai_connection.api_key` -- `project_vcs_connection_binding.app_password` - -Encryption key configured in `application.properties`: -```properties -codecrow.security.encryption-key= -``` - -## Database Migrations - -Currently using Hibernate auto-DDL: -```properties -spring.jpa.hibernate.ddl-auto=update -``` - -For production, consider: -- Flyway for versioned migrations -- Liquibase for database-agnostic migrations -- Manual migrations with version control - -## Backup Strategy - -### Full Backup - -```bash -docker exec codecrow-postgres pg_dump -U codecrow_user codecrow_ai > backup.sql -``` - -### Restore - -```bash -cat backup.sql | docker exec -i codecrow-postgres psql -U codecrow_user -d codecrow_ai -``` - -### Automated Backups - -Setup cron job for daily backups: -```bash -0 2 * * * docker exec codecrow-postgres pg_dump -U codecrow_user codecrow_ai | gzip > /backups/codecrow_$(date +\%Y\%m\%d).sql.gz -``` - -## Performance Tuning - -### Index Optimization - -Key indexes already defined. 
Monitor query performance and add as needed: - -```sql --- Example: Add composite index -CREATE INDEX idx_branch_issue_branch_resolved ON branch_issue(branch_id, resolved); - --- Example: Partial index for active issues -CREATE INDEX idx_branch_issue_active ON branch_issue(branch_id) WHERE resolved = FALSE; -``` - -### Query Optimization - -Use EXPLAIN to analyze slow queries: -```sql -EXPLAIN ANALYZE SELECT * FROM branch_issue WHERE branch_id = '...' AND resolved = FALSE; -``` - -### Connection Pooling - -Configure HikariCP (default in Spring Boot): -```properties -spring.datasource.hikari.maximum-pool-size=10 -spring.datasource.hikari.minimum-idle=5 -spring.datasource.hikari.connection-timeout=30000 -``` - -### Maintenance - -Regular maintenance tasks: -```sql --- Vacuum to reclaim storage -VACUUM ANALYZE branch_issue; - --- Reindex for better performance -REINDEX TABLE branch_issue; - --- Update statistics -ANALYZE branch_issue; -``` - -## Data Retention - -Consider implementing data retention policies: - -```sql --- Archive old resolved issues -DELETE FROM branch_issue -WHERE resolved = TRUE -AND resolved_at < NOW() - INTERVAL '1 year'; - --- Archive old analyses -DELETE FROM code_analysis -WHERE completed_at < NOW() - INTERVAL '6 months' -AND status = 'COMPLETED'; -``` - -## Monitoring - -Monitor database health: - -```sql --- Check database size -SELECT pg_size_pretty(pg_database_size('codecrow_ai')); - --- Check table sizes -SELECT schemaname, tablename, - pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) -FROM pg_tables -WHERE schemaname = 'public' -ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC; - --- Check active connections -SELECT count(*) FROM pg_stat_activity WHERE datname = 'codecrow_ai'; -``` - diff --git a/docs/09-deployment.md b/docs/09-deployment.md deleted file mode 100644 index 9662a66..0000000 --- a/docs/09-deployment.md +++ /dev/null @@ -1,675 +0,0 @@ -# Deployment Guide - -Production deployment guide for 
CodeCrow. - -## Prerequisites - -- Linux server (Ubuntu 20.04+ or similar) -- Docker 20.10+ -- Docker Compose v2.0+ -- Domain name with DNS configured -- SSL certificate (Let's Encrypt recommended) -- 8GB+ RAM -- 4+ CPU cores -- 100GB+ disk space - -## Pre-Deployment Checklist - -- [ ] Server provisioned and accessible -- [ ] Docker and Docker Compose installed -- [ ] Domain DNS configured -- [ ] SSL certificates obtained -- [ ] OpenRouter API key obtained -- [ ] Bitbucket workspace access configured -- [ ] Google OAuth Client ID configured (optional - for social login) -- [ ] Firewall rules planned -- [ ] Backup strategy defined - -## Installation Steps - -### 1. Clone Repository - -```bash -git clone /opt/codecrow -cd /opt/codecrow -``` - -### 2. Configure Environment - -```bash -# Copy sample configurations -cp deployment/docker-compose-sample.yml deployment/docker-compose.yml -cp deployment/config/java-shared/application.properties.sample \ - deployment/config/java-shared/application.properties -cp deployment/config/mcp-client/.env.sample \ - deployment/config/mcp-client/.env -cp deployment/config/rag-pipeline/.env.sample \ - deployment/config/rag-pipeline/.env -cp deployment/config/web-frontend/.env.sample \ - deployment/config/web-frontend/.env -``` - -### 3. Generate Secrets - -```bash -# Generate JWT secret (256-bit) -JWT_SECRET=$(openssl rand -base64 32) -echo "JWT Secret: $JWT_SECRET" - -# Generate encryption key (256-bit) -ENCRYPTION_KEY=$(openssl rand -base64 32) -echo "Encryption Key: $ENCRYPTION_KEY" - -# Generate strong database password -DB_PASSWORD=$(openssl rand -base64 24) -echo "Database Password: $DB_PASSWORD" -``` - -**Store these securely** - you'll need them for configuration. - -### 4. 
Update Configuration - -**deployment/config/java-shared/application.properties**: -```properties -codecrow.security.jwtSecret= -codecrow.security.encryption-key= -codecrow.web.base.url=https://codecrow.example.com -codecrow.mcp.client.url=http://mcp-client:8000/review -codecrow.rag.api.url=http://rag-pipeline:8001 - -# Google OAuth (optional - for social login) -codecrow.oauth.google.client-id=.apps.googleusercontent.com -``` - -**deployment/config/rag-pipeline/.env**: -```bash -OPENROUTER_API_KEY=sk-or-v1-your-actual-key -QDRANT_URL=http://qdrant:6333 -``` - -**deployment/config/web-frontend/.env**: -```bash -VITE_API_URL=https://codecrow.example.com/api -VITE_WEBHOOK_URL=https://codecrow.example.com/webhook - -# Google OAuth (optional - for social login) -VITE_GOOGLE_CLIENT_ID=.apps.googleusercontent.com -``` - -**deployment/docker-compose.yml**: - -Update database credentials: -```yaml -postgres: - environment: - POSTGRES_PASSWORD: - -web-server: - environment: - SPRING_DATASOURCE_PASSWORD: - -pipeline-agent: - environment: - SPRING_DATASOURCE_PASSWORD: -``` - -### 5. Build and Start Services - -```bash -cd /opt/codecrow -./tools/production-build.sh -``` - -This script: -- Builds Java artifacts -- Copies MCP servers JAR -- Starts all Docker containers -- Waits for services to be healthy - -### 6. Verify Services - -```bash -cd deployment -docker compose ps -``` - -All services should show status "Up (healthy)". - -```bash -# Check logs -docker compose logs -f web-server -docker compose logs -f pipeline-agent -``` - -### 7. 
Setup Reverse Proxy - -#### Nginx Configuration - -Create `/etc/nginx/sites-available/codecrow`: - -```nginx -# Frontend -server { - listen 80; - listen [::]:80; - server_name codecrow.example.com; - return 301 https://$server_name$request_uri; -} - -server { - listen 443 ssl http2; - listen [::]:443 ssl http2; - server_name codecrow.example.com; - - ssl_certificate /etc/letsencrypt/live/codecrow.example.com/fullchain.pem; - ssl_certificate_key /etc/letsencrypt/live/codecrow.example.com/privkey.pem; - ssl_protocols TLSv1.2 TLSv1.3; - ssl_ciphers HIGH:!aNULL:!MD5; - - # Frontend - location / { - proxy_pass http://localhost:8080; - proxy_set_header Host $host; - proxy_set_header X-Real-IP $remote_addr; - proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; - proxy_set_header X-Forwarded-Proto $scheme; - } - - # API - location /api { - proxy_pass http://localhost:8081; - proxy_set_header Host $host; - proxy_set_header X-Real-IP $remote_addr; - proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; - proxy_set_header X-Forwarded-Proto $scheme; - - # Increase timeouts for long-running requests - proxy_read_timeout 300s; - proxy_connect_timeout 75s; - } - - # Webhooks (restrict to Bitbucket IPs) - location /webhook { - # Bitbucket Cloud IP ranges - allow 104.192.136.0/21; - allow 185.166.140.0/22; - deny all; - - proxy_pass http://localhost:8082; - proxy_set_header Host $host; - proxy_set_header X-Real-IP $remote_addr; - proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; - proxy_set_header X-Forwarded-Proto $scheme; - - proxy_read_timeout 600s; - } -} -``` - -Enable site: -```bash -ln -s /etc/nginx/sites-available/codecrow /etc/nginx/sites-enabled/ -nginx -t -systemctl reload nginx -``` - -#### SSL with Let's Encrypt - -```bash -apt-get install certbot python3-certbot-nginx -certbot --nginx -d codecrow.example.com -``` - -### 8. 
Configure Firewall - -```bash -# Allow SSH -ufw allow 22/tcp - -# Allow HTTP/HTTPS -ufw allow 80/tcp -ufw allow 443/tcp - -# Block direct access to services -ufw deny 5432/tcp # PostgreSQL -ufw deny 6379/tcp # Redis -ufw deny 6333/tcp # Qdrant -ufw deny 8000/tcp # MCP Client -ufw deny 8001/tcp # RAG Pipeline -ufw deny 8080/tcp # Frontend -ufw deny 8081/tcp # Web Server -ufw deny 8082/tcp # Pipeline Agent - -# Enable firewall -ufw enable -``` - -### 9. Create First Admin User - -Connect to database: -```bash -docker exec -it codecrow-postgres psql -U codecrow_user -d codecrow_ai -``` - -Create admin user: -```sql --- Generate password hash (use bcrypt generator or Spring Boot) --- Example password: admin123 (change this!) -INSERT INTO "user" (id, username, email, password, roles, created_at) -VALUES ( - gen_random_uuid(), - 'admin', - 'admin@example.com', - '$2a$10$encrypted_password_here', - ARRAY['USER', 'ADMIN'], - NOW() -); -``` - -Or use Spring Boot's password encoder programmatically. - -### 10. Setup Monitoring - -#### Docker Health Monitoring - -Create `/opt/codecrow/scripts/health-check.sh`: -```bash -#!/bin/bash - -SERVICES="codecrow-postgres codecrow-redis codecrow-qdrant codecrow-web-application codecrow-pipeline-agent codecrow-mcp-client codecrow-rag-pipeline codecrow-web-frontend" - -for service in $SERVICES; do - if ! docker ps | grep -q $service; then - echo "ALERT: $service is down!" - # Send alert (email, Slack, etc.) - fi -done -``` - -Add to crontab: -```bash -*/5 * * * * /opt/codecrow/scripts/health-check.sh -``` - -#### Log Rotation - -Configure log rotation in `/etc/logrotate.d/codecrow`: -``` -/var/lib/docker/volumes/web_logs/_data/*.log -/var/lib/docker/volumes/pipeline_agent_logs/_data/*.log -/var/lib/docker/volumes/web_frontend_logs/_data/*.log -/var/lib/docker/volumes/rag_logs/_data/*.log -{ - daily - rotate 14 - compress - delaycompress - notifempty - missingok - copytruncate -} -``` - -### 11. 
Backup Configuration - -Create backup script `/opt/codecrow/scripts/backup.sh`: -```bash -#!/bin/bash - -BACKUP_DIR=/backups/codecrow -DATE=$(date +%Y%m%d_%H%M%S) - -mkdir -p $BACKUP_DIR - -# Database backup -docker exec codecrow-postgres pg_dump -U codecrow_user codecrow_ai | \ - gzip > $BACKUP_DIR/db_$DATE.sql.gz - -# Qdrant backup -docker exec codecrow-qdrant tar czf - /qdrant/storage | \ - cat > $BACKUP_DIR/qdrant_$DATE.tar.gz - -# Configuration backup -tar czf $BACKUP_DIR/config_$DATE.tar.gz /opt/codecrow/deployment/config - -# Cleanup old backups (keep 30 days) -find $BACKUP_DIR -name "*.gz" -mtime +30 -delete - -echo "Backup completed: $DATE" -``` - -Schedule daily backups: -```bash -0 2 * * * /opt/codecrow/scripts/backup.sh -``` - -## Production Configuration Tuning - -### Database - -**deployment/docker-compose.yml** - PostgreSQL: -```yaml -postgres: - command: - - postgres - - -c - - max_connections=200 - - -c - - shared_buffers=256MB - - -c - - effective_cache_size=1GB - - -c - - work_mem=8MB -``` - -### Java Services - -**deployment/config/java-shared/application.properties**: -```properties -# Production settings -spring.jpa.hibernate.ddl-auto=validate -spring.jpa.show-sql=false -logging.level.org.rostilos.codecrow=INFO - -# Connection pool -spring.datasource.hikari.maximum-pool-size=20 -spring.datasource.hikari.minimum-idle=10 -spring.datasource.hikari.connection-timeout=30000 -``` - -**deployment/docker-compose.yml** - JVM options: -```yaml -web-server: - environment: - JAVA_OPTS: "-Xmx2G -Xms1G -XX:+UseG1GC" - -pipeline-agent: - environment: - JAVA_OPTS: "-Xmx2G -Xms1G -XX:+UseG1GC" -``` - -### Resource Limits - -```yaml -services: - web-server: - deploy: - resources: - limits: - cpus: '2.0' - memory: 3G - reservations: - cpus: '1.0' - memory: 2G - - pipeline-agent: - deploy: - resources: - limits: - cpus: '2.0' - memory: 3G - reservations: - cpus: '1.0' - memory: 2G - - mcp-client: - deploy: - resources: - limits: - cpus: '1.0' - memory: 2G - 
reservations: - cpus: '0.5' - memory: 1G - - rag-pipeline: - deploy: - resources: - limits: - cpus: '2.0' - memory: 4G - reservations: - cpus: '1.0' - memory: 2G -``` - -## Security Hardening - -### 1. Secrets Management - -Use Docker secrets or external secret manager: - -```yaml -secrets: - db_password: - file: ./secrets/db_password.txt - jwt_secret: - file: ./secrets/jwt_secret.txt - -services: - web-server: - secrets: - - db_password - - jwt_secret -``` - -### 2. Network Isolation - -```yaml -networks: - frontend: - backend: - internal: - -services: - web-frontend: - networks: - - frontend - - web-server: - networks: - - frontend - - backend - - postgres: - networks: - - internal -``` - -### 3. Read-Only Root Filesystem - -```yaml -services: - web-server: - read_only: true - tmpfs: - - /tmp - - /app/logs -``` - -### 4. Run as Non-Root - -Update Dockerfiles: -```dockerfile -RUN groupadd -r codecrow && useradd -r -g codecrow codecrow -USER codecrow -``` - -### 5. Security Scanning - -```bash -# Scan images for vulnerabilities -docker scan codecrow-web-server -docker scan codecrow-pipeline-agent -``` - -## Scaling - -### Horizontal Scaling - -**Web Server** (stateless): -```yaml -web-server: - deploy: - replicas: 3 -``` - -Use load balancer (nginx, HAProxy) to distribute traffic. - -**Pipeline Agent** (use queue-based distribution): -- Setup message queue (RabbitMQ, Redis Queue) -- Multiple workers consume from queue -- Each worker acquires lock before processing - -### Vertical Scaling - -Increase resources in docker-compose.yml or use Kubernetes for auto-scaling. - -## Monitoring & Observability - -### Prometheus + Grafana - -Add monitoring stack: -```yaml -prometheus: - image: prom/prometheus - volumes: - - ./prometheus.yml:/etc/prometheus/prometheus.yml - ports: - - "9090:9090" - -grafana: - image: grafana/grafana - ports: - - "3000:3000" - volumes: - - grafana_data:/var/lib/grafana -``` - -Spring Boot Actuator exposes metrics at `/actuator/prometheus`. 
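The Actuator endpoint returns Prometheus' plain-text exposition format (`name{labels} value` lines plus `#` metadata). A small parsing sketch over a sample scrape — the metric names shown are typical Micrometer output and serve as an illustration, not a guaranteed list:

```python
# Sample of the text a GET /actuator/prometheus scrape might return:
sample = """\
# HELP jvm_memory_used_bytes The amount of used memory
# TYPE jvm_memory_used_bytes gauge
jvm_memory_used_bytes{area="heap",id="G1 Eden Space"} 1.2345678E7
http_server_requests_seconds_count{method="GET",uri="/api/projects"} 42.0
"""

def parse_metrics(text: str) -> dict[str, float]:
    """Map 'name{labels}' keys to float values, skipping comment lines."""
    metrics = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # HELP/TYPE metadata
        # Split on the last space: label values may themselves contain spaces.
        name_and_labels, value = line.rsplit(" ", 1)
        metrics[name_and_labels] = float(value)
    return metrics

metrics = parse_metrics(sample)
assert metrics['http_server_requests_seconds_count{method="GET",uri="/api/projects"}'] == 42.0
```

In practice Prometheus scrapes and stores these values itself; a hand-rolled parser like this is only useful for quick health scripts.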
- -### Application Logs - -Centralized logging with ELK stack or Loki + Grafana. - -### Alerts - -Setup alerts for: -- Service down -- High error rate -- Database connection failures -- Disk space low -- High memory usage -- Analysis failures - -## Troubleshooting Production Issues - -### Service Won't Start - -```bash -docker compose logs -docker inspect -``` - -### Database Connection Issues - -```bash -docker exec -it codecrow-postgres psql -U codecrow_user -d codecrow_ai -# Test queries -SELECT version(); -SELECT count(*) FROM "user"; -``` - -### High Memory Usage - -```bash -docker stats -# Adjust memory limits in docker-compose.yml -``` - -### Slow Analysis - -- Check RAG indexing performance -- Monitor OpenRouter API latency -- Verify network connectivity -- Check database query performance - -### Webhook Not Received - -- Verify firewall allows Bitbucket IPs -- Check nginx logs: `tail -f /var/log/nginx/access.log` -- Verify webhook configuration in Bitbucket -- Check project token is valid - -## Disaster Recovery - -### Full System Recovery - -1. Restore configuration files -2. Restore database from backup -3. Restore Qdrant data -4. Start services -5. 
Verify functionality - -```bash -# Restore database -gunzip < /backups/codecrow/db_20240115.sql.gz | \ - docker exec -i codecrow-postgres psql -U codecrow_user -d codecrow_ai - -# Restore Qdrant -docker exec codecrow-qdrant rm -rf /qdrant/storage/* -gunzip < /backups/codecrow/qdrant_20240115.tar.gz | \ - docker exec -i codecrow-qdrant tar xzf - -C / -``` - -## Maintenance - -### Update Application - -```bash -cd /opt/codecrow -git pull -./tools/production-build.sh -``` - -### Database Maintenance - -```bash -docker exec codecrow-postgres psql -U codecrow_user -d codecrow_ai -c "VACUUM ANALYZE;" -``` - -### Clear Old Data - -```sql --- Remove old resolved issues -DELETE FROM branch_issue WHERE resolved = TRUE AND resolved_at < NOW() - INTERVAL '90 days'; -``` - -### Update Dependencies - -```bash -# Java -cd java-ecosystem -mvn versions:display-dependency-updates - -# Python -cd python-ecosystem/mcp-client -pip list --outdated -``` - -## Cost Optimization - -### OpenRouter - -- Monitor API usage -- Use cheaper models for non-critical analysis -- Cache embeddings where possible -- Implement rate limiting - -### Infrastructure - -- Right-size server resources -- Use spot instances for non-production -- Implement data retention policies -- Compress logs and backups - diff --git a/docs/10-development.md b/docs/10-development.md deleted file mode 100644 index 1e8eda8..0000000 --- a/docs/10-development.md +++ /dev/null @@ -1,715 +0,0 @@ -# Development Guide - -Guide for developing and contributing to CodeCrow. - -## Development Environment Setup - -### Prerequisites - -- Java 17 JDK -- Maven 3.8+ -- Node.js 18+ (with npm or bun) -- Python 3.10+ -- Docker & Docker Compose -- Git -- IDE (IntelliJ IDEA recommended) - -### Initial Setup - -#### 1. Clone Repository - -```bash -git clone -cd codecrow -``` - -#### 2. Setup Java Development - -**IntelliJ IDEA**: -1. Open project from `java-ecosystem/pom.xml` -2. Install Lombok plugin -3. 
Enable annotation processing (Preferences → Build → Compiler → Annotation Processors) -4. Configure Java 17 SDK -5. Import Maven dependencies - -**Eclipse**: -1. Install Lombok -2. Import as Maven project -3. Configure Java 17 - -#### 3. Setup Python Development - -```bash -# MCP Client -cd python-ecosystem/mcp-client -python -m venv venv -source venv/bin/activate # Windows: venv\Scripts\activate -pip install -r requirements.txt - -# RAG Pipeline -cd ../rag-pipeline -python -m venv venv -source venv/bin/activate -pip install -r requirements.txt -``` - -#### 4. Setup Frontend Development - -```bash -cd frontend -npm install -# Or with bun -bun install -``` - -#### 5. Start Infrastructure Services - -```bash -cd deployment -docker compose up -d postgres redis qdrant -``` - -This starts only the infrastructure services, allowing you to run application services locally. - -### Running Services Locally - -#### Web Server - -```bash -cd java-ecosystem/services/web-server -mvn spring-boot:run -``` - -Access: `http://localhost:8081` -Swagger UI: `http://localhost:8081/swagger-ui-custom.html` - -#### Pipeline Agent - -```bash -cd java-ecosystem/services/pipeline-agent -mvn spring-boot:run -``` - -Access: `http://localhost:8082` - -#### MCP Client - -```bash -cd python-ecosystem/mcp-client -source venv/bin/activate -cp .env.sample .env -# Edit .env with configuration -uvicorn main:app --reload --host 0.0.0.0 --port 8000 -``` - -Access: `http://localhost:8000` - -#### RAG Pipeline - -```bash -cd python-ecosystem/rag-pipeline -source venv/bin/activate -cp .env.sample .env -# Edit .env with Qdrant URL and OpenRouter key -uvicorn main:app --reload --host 0.0.0.0 --port 8001 -``` - -Access: `http://localhost:8001` - -#### Frontend - -```bash -cd frontend -npm run dev -# Or -bun run dev -``` - -Access: `http://localhost:5173` - -## Development Workflow - -### Branch Strategy - -- `main`: Production-ready code -- `develop`: Integration branch -- `feature/*`: Feature development -- 
`bugfix/*`: Bug fixes -- `hotfix/*`: Production hotfixes - -### Commit Convention - -Use conventional commits: - -``` -feat: Add user authentication -fix: Resolve database connection issue -docs: Update API documentation -refactor: Simplify analysis service -test: Add unit tests for project service -chore: Update dependencies -``` - -### Pull Request Process - -1. Create feature branch from `develop` -2. Implement changes with tests -3. Update documentation -4. Create pull request -5. Code review -6. Merge after approval - -## Code Standards - -### Java - -**Code Style**: -- Google Java Style Guide -- Use Lombok for boilerplate reduction -- Follow Spring Boot best practices - -**Example**: -```java -@Service -@RequiredArgsConstructor -@Slf4j -public class ProjectService { - - private final ProjectRepository projectRepository; - - @Transactional(readOnly = true) - public Project findById(UUID id) { - return projectRepository.findById(id) - .orElseThrow(() -> new ResourceNotFoundException("Project not found")); - } - - @Transactional - public Project create(ProjectCreateRequest request) { - log.info("Creating project: {}", request.getName()); - - Project project = Project.builder() - .name(request.getName()) - .description(request.getDescription()) - .build(); - - return projectRepository.save(project); - } -} -``` - -**Testing**: -```java -@SpringBootTest -@Transactional -class ProjectServiceTest { - - @Autowired - private ProjectService projectService; - - @Test - void shouldCreateProject() { - ProjectCreateRequest request = new ProjectCreateRequest(); - request.setName("Test Project"); - - Project project = projectService.create(request); - - assertNotNull(project.getId()); - assertEquals("Test Project", project.getName()); - } -} -``` - -### Python - -**Code Style**: -- PEP 8 -- Use type hints -- Docstrings for functions/classes - -**Example**: -```python -from typing import List -from pydantic import BaseModel - -class AnalysisRequest(BaseModel): - """Request 
model for code analysis."""

    project_id: str
    files: List[str]

    class Config:
        schema_extra = {
            "example": {
                "project_id": "proj-123",
                "files": ["src/main.py"]
            }
        }

async def analyze_code(request: AnalysisRequest) -> AnalysisResponse:
    """
    Analyze code for issues.

    Args:
        request: Analysis request with project and files

    Returns:
        Analysis response with issues found

    Raises:
        ValidationError: If request is invalid
    """
    # Implementation
    pass
```

**Testing**:
```python
import pytest
from fastapi.testclient import TestClient

def test_analyze_endpoint(client: TestClient):
    """Test analysis endpoint."""
    response = client.post(
        "/analyze",
        json={"project_id": "test", "files": ["test.py"]}
    )
    assert response.status_code == 200
    assert "issues" in response.json()
```

### TypeScript/React

**Code Style**:
- ESLint configuration
- Prettier formatting
- Functional components with hooks

**Example**:
```typescript
interface ProjectCardProps {
  project: Project;
  onSelect: (id: string) => void;
}

export const ProjectCard: React.FC<ProjectCardProps> = ({
  project,
  onSelect
}) => {
  const handleClick = () => {
    onSelect(project.id);
  };

  return (
    <Card onClick={handleClick}>
      <CardHeader>
        <CardTitle>{project.name}</CardTitle>
      </CardHeader>
      <CardContent>
        <p>{project.description}</p>
      </CardContent>
    </Card>
- ); -}; -``` - -## Testing (TODO) - -### Java Unit Tests - -```bash -# Run all tests -cd java-ecosystem -mvn test - -# Run specific module -cd services/web-server -mvn test - -# Run specific test class -mvn test -Dtest=ProjectServiceTest - -# Skip tests during build -mvn clean package -DskipTests -``` - -### Python Tests - -```bash -# MCP Client tests -cd python-ecosystem/mcp-client -pytest - -# RAG Pipeline tests -cd ../rag-pipeline -pytest - -# With coverage -pytest --cov=src --cov-report=html -``` - -### Frontend Tests - -```bash -cd frontend -npm test -# Or -bun test -``` - -### Integration Tests - -```bash -# Start all services with Docker Compose -cd deployment -docker compose up -d - -# Run integration tests -cd ../ -./scripts/integration-tests.sh -``` - -## Debugging - -### Java Services - -**IntelliJ IDEA Remote Debug**: -1. Services expose debug ports (5005, 5006) -2. Create Remote JVM Debug configuration -3. Set host: localhost, port: 5005 or 5006 -4. Set breakpoints -5. Run debug configuration - -**Docker Container Debug**: -```yaml -web-server: - environment: - JAVA_OPTS: "-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=*:5005" - ports: - - "5005:5005" -``` - -### Python Services - -**VSCode Debug Configuration**: -```json -{ - "version": "0.2.0", - "configurations": [ - { - "name": "MCP Client", - "type": "python", - "request": "launch", - "module": "uvicorn", - "args": ["main:app", "--reload"], - "cwd": "${workspaceFolder}/python-ecosystem/mcp-client" - } - ] -} -``` - -**Print Debugging**: -```python -import logging -logger = logging.getLogger(__name__) -logger.debug("Variable value: %s", variable) -``` - -### Frontend - -**React DevTools**: -- Install browser extension -- Inspect component tree -- View props and state - -**Network Tab**: -- Monitor API requests -- Check request/response payloads -- Verify authentication headers - -## Database Development - -### Schema Changes - -1. Update JPA entity -2. 
Hibernate auto-generates schema changes (development) -3. For production, create manual migration - -**Example Migration** (Flyway): -```sql --- V1.1__add_analysis_duration.sql -ALTER TABLE code_analysis -ADD COLUMN duration_ms INTEGER; -``` - -### Test Data - -Create test data script: -```sql --- test-data.sql -INSERT INTO "user" (id, username, email, password, roles) -VALUES ( - gen_random_uuid(), - 'testuser', - 'test@example.com', - '$2a$10$...', - ARRAY['USER'] -); -``` - -Load test data: -```bash -cat test-data.sql | docker exec -i codecrow-postgres \ - psql -U codecrow_user -d codecrow_ai -``` - -### Database Console - -```bash -# Connect to database -docker exec -it codecrow-postgres psql -U codecrow_user -d codecrow_ai - -# Common queries -\dt -- List tables -\d table_name -- Describe table -SELECT * FROM "user"; -- Query -``` - -## API Development - -### Adding New Endpoint - -1. **Define DTO**: -```java -@Data -public class ProjectCreateRequest { - @NotNull - private String name; - private String description; -} -``` - -2. **Update Service**: -```java -@Service -public class ProjectService { - public Project create(ProjectCreateRequest request) { - // Implementation - } -} -``` - -3. **Add Controller Endpoint**: -```java -@RestController -@RequestMapping("/api/projects") -public class ProjectController { - - @PostMapping - public ResponseEntity create(@Valid @RequestBody ProjectCreateRequest request) { - Project project = projectService.create(request); - return ResponseEntity.status(HttpStatus.CREATED).body(project); - } -} -``` - -4. **Add Tests**: -```java -@Test -void shouldCreateProject() { - // Test implementation -} -``` - -5. 
**Update API Documentation**: -- Swagger annotations automatically generate docs -- Update API reference docs - -### API Versioning - -Current version: `/api/v1` - -For breaking changes, create new version: -```java -@RequestMapping("/api/v2/projects") -``` - -## Performance Optimization - -### Database Query Optimization - -**Use Projections**: -```java -public interface ProjectSummary { - String getId(); - String getName(); -} - -List findAllProjectedBy(); -``` - -**Batch Fetching**: -```java -@Entity -public class Project { - @OneToMany(fetch = FetchType.LAZY) - @BatchSize(size = 10) - private List branches; -} -``` - -**Query Optimization**: -```java -@Query("SELECT p FROM Project p LEFT JOIN FETCH p.branches WHERE p.id = :id") -Project findByIdWithBranches(@Param("id") UUID id); -``` - -### Caching - -**Spring Cache**: -```java -@Cacheable(value = "projects", key = "#id") -public Project findById(UUID id) { - return projectRepository.findById(id).orElseThrow(); -} - -@CacheEvict(value = "projects", key = "#project.id") -public Project update(Project project) { - return projectRepository.save(project); -} -``` - -### Async Processing - -```java -@Async -public CompletableFuture analyzeAsync(AnalysisRequest request) { - AnalysisResult result = performAnalysis(request); - return CompletableFuture.completedFuture(result); -} -``` - -## Common Development Tasks - -### Add New Entity - -1. Create entity class with JPA annotations -2. Create repository interface -3. Create service class -4. Create DTOs -5. Create controller endpoints -6. Add tests -7. Update documentation - -### Add New Analysis Feature - -1. Update analysis request/response models -2. Modify prompt generation in MCP client -3. Update result processing in pipeline agent -4. Add database fields if needed -5. Update frontend display -6. 
Test end-to-end - -### Update Dependencies - -**Java**: -```bash -cd java-ecosystem -mvn versions:display-dependency-updates -mvn versions:use-latest-versions -``` - -**Python**: -```bash -pip list --outdated -pip install --upgrade <package-name> -pip freeze > requirements.txt -``` - -**Frontend**: -```bash -npm outdated -npm update -# Or -bun update -``` - -## Troubleshooting Development Issues - -### Port Already in Use - -```bash -# Find process using port -lsof -i :8081 -# Kill process -kill -9 <PID> -``` - -### Maven Build Fails - -```bash -# Clean and rebuild -mvn clean install -U - -# Clear local repository -rm -rf ~/.m2/repository -``` - -### Python Import Errors - -```bash -# Reinstall dependencies -pip install -r requirements.txt --force-reinstall -``` - -### Docker Issues - -```bash -# Rebuild images -docker compose build --no-cache - -# Clean system -docker system prune -a -``` - -## Contributing Guidelines - -### Before Submitting PR - -- [ ] Code follows style guidelines -- [ ] All tests pass -- [ ] New tests added for new features -- [ ] Documentation updated -- [ ] Commit messages follow convention -- [ ] No commented-out code -- [ ] No debug statements -- [ ] Secrets not committed - -### Code Review Checklist - -- Logic correctness -- Error handling -- Security considerations -- Performance implications -- Test coverage -- Documentation completeness -- Code style adherence - -## Resources - -### Documentation -- Spring Boot: https://spring.io/projects/spring-boot -- FastAPI: https://fastapi.tiangolo.com/ -- React: https://react.dev/ -- shadcn/ui: https://ui.shadcn.com/ - -### Tools -- Postman: API testing -- DBeaver: Database management -- Docker Desktop: Container management -- GitHub Copilot: AI code assistant - -### Community -- GitHub Issues: Bug reports and feature requests -- GitHub Discussions: Q&A and ideas -- Slack/Discord: Real-time chat (if available) - diff --git a/docs/11-troubleshooting.md b/docs/11-troubleshooting.md deleted file mode 100644 index 
6f74933..0000000 --- a/docs/11-troubleshooting.md +++ /dev/null @@ -1,621 +0,0 @@ -# Troubleshooting - -Common issues and solutions for CodeCrow. - -## Installation & Setup Issues - -### Docker Compose Fails to Start - -**Symptom**: Services fail to start or keep restarting. - -**Solutions**: - -1. **Check logs**: -```bash -docker compose logs -``` - -2. **Insufficient resources**: -```bash -# Increase Docker memory limit (Docker Desktop) -# Settings → Resources → Memory (set to 8GB+) -``` - -3. **Port conflicts**: -```bash -# Check if ports are already in use -lsof -i :8080 -lsof -i :8081 -# Kill conflicting processes or change ports -``` - -4. **Permission issues**: -```bash -# Fix volume permissions -docker compose down -docker volume rm source_code_tmp -docker compose up -d -``` - -### Configuration Files Not Found - -**Symptom**: "Configuration file not found" errors. - -**Solution**: -```bash -# Ensure all config files are copied from samples -cp deployment/config/java-shared/application.properties.sample \ - deployment/config/java-shared/application.properties -cp deployment/config/mcp-client/.env.sample \ - deployment/config/mcp-client/.env -cp deployment/config/rag-pipeline/.env.sample \ - deployment/config/rag-pipeline/.env -cp deployment/config/web-frontend/.env.sample \ - deployment/config/web-frontend/.env -``` - -### Database Connection Failed - -**Symptom**: Services can't connect to PostgreSQL. - -**Checks**: - -1. **Verify PostgreSQL is running**: -```bash -docker ps | grep postgres -docker logs codecrow-postgres -``` - -2. **Check credentials match**: -```yaml -# docker-compose.yml -postgres: - environment: - POSTGRES_PASSWORD: codecrow_pass - -web-server: - environment: - SPRING_DATASOURCE_PASSWORD: codecrow_pass -``` - -3. **Check database exists**: -```bash -docker exec -it codecrow-postgres psql -U codecrow_user -l -``` - -4. 
**Create database if missing**: -```bash -docker exec -it codecrow-postgres psql -U codecrow_user -c "CREATE DATABASE codecrow_ai;" -``` - -## Service-Specific Issues - -### Web Server Won't Start - -**Check health**: -```bash -docker logs codecrow-web-application -curl http://localhost:8081/actuator/health -``` - -**Common issues**: - -1. **JWT secret not set**: -```properties -# application.properties -codecrow.security.jwtSecret= -``` - -2. **Database schema mismatch**: -```bash -# Reset database (data loss!) -docker exec -it codecrow-postgres psql -U codecrow_user -c "DROP DATABASE codecrow_ai;" -docker exec -it codecrow-postgres psql -U codecrow_user -c "CREATE DATABASE codecrow_ai;" -docker compose restart web-server -``` - -3. **Redis connection failed**: -```bash -docker ps | grep redis -docker logs codecrow-redis -``` - -### Pipeline Agent Issues - -**Analysis stuck in processing**: - -1. **Check for stale locks**: -```sql -SELECT * FROM analysis_lock WHERE locked_at < NOW() - INTERVAL '30 minutes'; --- Remove stale locks -DELETE FROM analysis_lock WHERE locked_at < NOW() - INTERVAL '30 minutes'; -``` - -2. **Check MCP client connectivity**: -```bash -docker exec codecrow-pipeline-agent curl http://mcp-client:8000/health -``` - -3. **Check RAG pipeline connectivity**: -```bash -docker exec codecrow-pipeline-agent curl http://rag-pipeline:8001/health -``` - -**Webhook not received**: - -1. **Verify firewall allows Bitbucket IPs**: -```bash -# Check current firewall rules -ufw status -# Allow Bitbucket IP ranges -ufw allow from 104.192.136.0/21 -ufw allow from 185.166.140.0/22 -``` - -2. **Check webhook configuration in Bitbucket**: -- URL correct (https://domain.com/webhook) -- Events enabled (PR created, PR updated, Repo push) -- Token in Authorization header - -3. 
**Test webhook manually**: -```bash -curl -X POST http://localhost:8082/api/v1/bitbucket-cloud/webhook \ - -H "Authorization: Bearer <webhook-token>" \ - -H "Content-Type: application/json" \ - -d @sample-webhook.json -``` - -### MCP Client Issues - -**Service not responding**: - -1. **Check logs**: -```bash -docker logs codecrow-mcp-client -``` - -2. **Verify Java MCP servers loaded**: -```bash -docker exec codecrow-mcp-client ls -la /app/codecrow-mcp-servers-1.0.jar -``` - -3. **Rebuild if JAR missing**: -```bash -./tools/production-build.sh -``` - -**OpenRouter API errors**: - -1. **Invalid API key**: -```bash -# Check .env file -docker exec codecrow-mcp-client cat /app/.env | grep OPENROUTER_API_KEY -``` - -2. **Rate limiting**: -- Check OpenRouter dashboard for quota -- Reduce analysis frequency -- Use smaller models - -3. **Model not available**: -```bash -# Check model name is correct -OPENROUTER_MODEL=anthropic/claude-3.5-sonnet -``` - -### RAG Pipeline Issues - -**Indexing fails**: - -1. **Qdrant connection failed**: -```bash -docker logs codecrow-qdrant -curl http://localhost:6333/collections -``` - -2. **OpenRouter embedding errors**: -- Check API key is valid -- Verify model supports embeddings -- Check for rate limits - -3. **Out of memory**: -```yaml -# Increase memory limit -rag-pipeline: - deploy: - resources: - limits: - memory: 4G -``` - -**Slow indexing**: - -1. **Reduce chunk size**: -```bash -# .env -CHUNK_SIZE=500 -TEXT_CHUNK_SIZE=800 -``` - -2. **Skip large files**: -```bash -MAX_FILE_SIZE_BYTES=524288 # 512KB -``` - -3. **Monitor Qdrant performance**: -```bash -docker stats codecrow-qdrant -``` - -### Frontend Issues - -**Can't connect to API**: - -1. **Check backend URL**: -```bash -# .env -VITE_API_URL=http://localhost:8081/api -``` - -2. 
**CORS errors**: -```java -// WebSecurityConfig.java -@Bean -public CorsConfigurationSource corsConfigurationSource() { - CorsConfiguration config = new CorsConfiguration(); - config.setAllowedOrigins(Arrays.asList("http://localhost:5173", "http://localhost:8080")); - config.setAllowedMethods(Arrays.asList("*")); - config.setAllowedHeaders(Arrays.asList("*")); - config.setAllowCredentials(true); - // ... -} -``` - -3. **Clear browser cache**: -- Hard refresh (Ctrl+Shift+R) -- Clear localStorage -- Use incognito mode - -**Build fails**: - -```bash -# Clear cache and rebuild -rm -rf node_modules dist -npm install -npm run build -``` - -## Analysis Issues - -### No Issues Found - -**Possible causes**: - -1. **Prompt not effective**: Adjust prompts in MCP client -2. **Model too lenient**: Try different model or adjust temperature -3. **RAG context missing**: Verify RAG indexing completed -4. **Files not analyzed**: Check changed files are included - -### False Positives - -**Solutions**: - -1. **Improve prompts**: Add more context about project standards -2. **Adjust severity thresholds**: Filter low-severity issues -3. **Add exclusion patterns**: Ignore test files, generated code -4. **Fine-tune model**: Provide examples of correct code - -### Analysis Timeout - -**Increase timeouts**: - -```properties -# application.properties -spring.mvc.async.request-timeout=600000 # 10 minutes -``` - -```python -# MCP client -httpx.Client(timeout=300.0) # 5 minutes -``` - -**Optimize analysis**: -- Reduce files analyzed -- Limit RAG context results -- Use faster model - -## Performance Issues - -### Slow Response Times - -**Database queries**: - -1. **Enable query logging**: -```properties -spring.jpa.show-sql=true -logging.level.org.hibernate.SQL=DEBUG -``` - -2. **Add indexes**: -```sql -CREATE INDEX idx_branch_issue_branch_resolved ON branch_issue(branch_id, resolved); -``` - -3. 
**Optimize queries**: -```java -// Use projections instead of full entities -interface ProjectSummary { - String getId(); - String getName(); -} -``` - -**High CPU usage**: - -```bash -# Check container stats -docker stats - -# Limit CPU usage -docker compose down -# Edit docker-compose.yml to add CPU limits -docker compose up -d -``` - -**High memory usage**: - -```bash -# Check memory usage -docker stats - -# Adjust JVM heap -environment: - JAVA_OPTS: "-Xmx1G -Xms512M" -``` - -### Database Growing Large - -**Check table sizes**: -```sql -SELECT - schemaname, - tablename, - pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size -FROM pg_tables -WHERE schemaname = 'public' -ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC; -``` - -**Clean up old data**: -```sql --- Remove old resolved issues -DELETE FROM branch_issue -WHERE resolved = TRUE -AND resolved_at < NOW() - INTERVAL '6 months'; - --- Remove old analyses -DELETE FROM code_analysis -WHERE completed_at < NOW() - INTERVAL '3 months'; - --- Vacuum to reclaim space -VACUUM FULL; -``` - -**Archive strategy**: -```sql --- Move old data to archive table -CREATE TABLE code_analysis_archive AS -SELECT * FROM code_analysis -WHERE completed_at < NOW() - INTERVAL '1 year'; - -DELETE FROM code_analysis -WHERE id IN (SELECT id FROM code_analysis_archive); -``` - -## Authentication Issues - -### Can't Login - -**Check user exists**: -```sql -SELECT * FROM "user" WHERE email = 'user@example.com'; -``` - -**Reset password**: -```bash -# Generate new hash (use BCrypt generator or Spring Boot app) -# Update database -UPDATE "user" SET password = '$2a$10$newhash' WHERE email = 'user@example.com'; -``` - -**JWT token expired**: -- Tokens expire after configured time -- Login again to get new token -- Increase expiration time in config - -**Token validation fails**: - -1. **Check JWT secret is consistent**: -```properties -# Same secret in all instances -codecrow.security.jwtSecret= -``` - -2. 
**Clear old sessions**: -```bash -docker exec codecrow-redis redis-cli FLUSHDB -``` - -### Permission Denied - -**Check workspace membership**: -```sql -SELECT * FROM workspace_member -WHERE user_id = '<user-id>' AND workspace_id = '<workspace-id>'; -``` - -**Check project permissions**: -```sql -SELECT * FROM project_permission_assignment -WHERE user_id = '<user-id>' AND project_id = '<project-id>'; -``` - -**Grant access**: -- Add user to workspace as member -- Assign project permissions - -## Data Issues - -### Missing Data - -**Database connection lost**: -- Check database is running -- Verify connection pool settings -- Check for network issues - -**Transaction rollback**: -- Check application logs for exceptions -- Verify constraints not violated -- Check foreign key relationships - -### Corrupted Data - -**Inconsistent state**: -```sql --- Find orphaned records -SELECT * FROM code_analysis_issue -WHERE analysis_id NOT IN (SELECT id FROM code_analysis); - --- Clean up -DELETE FROM code_analysis_issue -WHERE analysis_id NOT IN (SELECT id FROM code_analysis); -``` - -**Fix relationships**: -```sql --- Ensure all projects belong to existing workspaces -UPDATE project -SET workspace_id = (SELECT id FROM workspace LIMIT 1) -WHERE workspace_id NOT IN (SELECT id FROM workspace); -``` - -## Monitoring & Debugging - -### Enable Debug Logging - -**Java services**: -```properties -logging.level.org.rostilos.codecrow=DEBUG -logging.level.org.springframework.web=DEBUG -logging.level.org.hibernate.SQL=DEBUG -``` - -**Python services**: -```python -import logging -logging.basicConfig(level=logging.DEBUG) -``` - -### Check Service Health - -```bash -# All services -docker compose ps - -# Specific service health -curl http://localhost:8081/actuator/health -curl http://localhost:8000/health -curl http://localhost:8001/health - -# Service logs -docker compose logs -f web-server -docker compose logs -f pipeline-agent -``` - -### Database Inspection - -```bash -# Connect to database -docker exec -it codecrow-postgres psql 
-U codecrow_user -d codecrow_ai - -# Common queries -\dt -- List tables -\d+ table_name -- Table structure -SELECT count(*) FROM "user"; -- Count users -SELECT * FROM analysis_lock; -- Check locks -``` - -### Network Issues - -**Test connectivity**: -```bash -# From host to container -curl http://localhost:8081/actuator/health - -# Between containers -docker exec codecrow-pipeline-agent curl http://mcp-client:8000/health -docker exec codecrow-mcp-client curl http://rag-pipeline:8001/health -``` - -**DNS resolution**: -```bash -docker exec codecrow-web-application nslookup postgres -``` - -## Getting Help - -### Collect Information - -When reporting issues, include: - -1. **Version information**: -```bash -git rev-parse HEAD -docker --version -docker compose version -``` - -2. **Service logs**: -```bash -docker compose logs > all-logs.txt -``` - -3. **Configuration** (redact secrets): -```bash -cat deployment/config/java-shared/application.properties -``` - -4. **System information**: -```bash -uname -a -docker info -``` - -### Support Channels - -- GitHub Issues: Bug reports and feature requests -- GitHub Discussions: Questions and community support -- Documentation: Check all docs first -- Logs: Enable debug logging for detailed info - -### Emergency Recovery - -**Complete reset** (data loss!): -```bash -cd deployment -docker compose down -v -docker system prune -a -# Reconfigure and restart -./tools/production-build.sh -``` - -**Restore from backup**: -```bash -# Restore database -gunzip < backup.sql.gz | \ - docker exec -i codecrow-postgres psql -U codecrow_user -d codecrow_ai - -# Restart services -docker compose restart -``` - diff --git a/docs/DOCUMENTATION_SUMMARY.md b/docs/DOCUMENTATION_SUMMARY.md deleted file mode 100644 index d7e7658..0000000 --- a/docs/DOCUMENTATION_SUMMARY.md +++ /dev/null @@ -1,320 +0,0 @@ -# CodeCrow Documentation Summary - -Complete and comprehensive documentation for the CodeCrow automated code review system. 
- -## Documentation Created - -### Main Documentation (documentation/) - -| File | Description | Status | -|------|-------------|--------| -| **README.md** | Documentation index and navigation | ✅ Complete | -| **01-overview.md** | System overview, features, key concepts | ✅ Complete | -| **02-getting-started.md** | Installation, quick start, initial setup | ✅ Complete | -| **03-architecture.md** | System architecture, data flow, technology decisions | ✅ Complete | -| **05-configuration.md** | Complete configuration reference for all components | ✅ Complete | -| **06-api-reference.md** | REST API endpoints, authentication, examples | ✅ Complete | -| **07-analysis-types.md** | Branch and PR analysis workflows, RAG integration | ✅ Complete | -| **08-database-schema.md** | Database schema, entities, relationships | ✅ Complete | -| **09-deployment.md** | Production deployment, security, monitoring | ✅ Complete | -| **10-development.md** | Development setup, workflows, guidelines | ✅ Complete | -| **11-troubleshooting.md** | Common issues, debugging, solutions | ✅ Complete | - -### Module Documentation (documentation/04-modules/) - -| File | Description | Status | -|------|-------------|--------| -| **java-ecosystem.md** | Java libraries and services (core, security, vcs-client, web-server, pipeline-agent) | ✅ Complete | -| **python-ecosystem.md** | Python services (mcp-client, rag-pipeline) | ✅ Complete | -| **frontend.md** | React frontend application | ✅ Complete | - -## Content Coverage - -### 1. Overview (01-overview.md) -- ✅ What is CodeCrow -- ✅ Key features -- ✅ System components (Java, Python, Frontend, Infrastructure) -- ✅ Analysis flow -- ✅ Core concepts (Workspaces, Projects, Analysis types) -- ✅ RAG integration -- ✅ Technology stack - -### 2. 
Getting Started (02-getting-started.md) -- ✅ Prerequisites -- ✅ Quick start guide -- ✅ Configuration setup (all config files) -- ✅ Credential generation -- ✅ Building and starting services -- ✅ Verification steps -- ✅ Bitbucket integration -- ✅ Service ports reference -- ✅ Security considerations -- ✅ Initial setup checklist - -### 3. Architecture (03-architecture.md) -- ✅ System architecture diagram -- ✅ Component interactions -- ✅ Webhook processing flow -- ✅ Branch analysis flow -- ✅ Pull request analysis flow -- ✅ RAG integration flow -- ✅ Authentication flow -- ✅ Data flow (request/response structures) -- ✅ Scalability considerations -- ✅ Performance optimization -- ✅ Security architecture -- ✅ Technology decisions rationale - -### 4. Modules - -#### Java Ecosystem (04-modules/java-ecosystem.md) -- ✅ Project structure -- ✅ Maven configuration -- ✅ Shared libraries (core, security, vcs-client) -- ✅ Services (pipeline-agent, web-server) -- ✅ MCP servers (bitbucket-mcp) -- ✅ Key components and responsibilities -- ✅ Package structures -- ✅ Endpoints and APIs -- ✅ Building and testing -- ✅ Development tips - -#### Python Ecosystem (04-modules/python-ecosystem.md) -- ✅ MCP client architecture -- ✅ Prompt engineering -- ✅ MCP tools usage -- ✅ RAG pipeline architecture -- ✅ Indexing and query flow -- ✅ Configuration options -- ✅ API examples -- ✅ Dependencies -- ✅ Running locally -- ✅ Docker builds -- ✅ Common issues - -#### Frontend (04-modules/frontend.md) -- ✅ Technology stack -- ✅ Project structure -- ✅ Key features -- ✅ API integration -- ✅ Component documentation -- ✅ State management -- ✅ Routing -- ✅ Styling (Tailwind, shadcn/ui) -- ✅ Build and deployment -- ✅ Environment variables -- ✅ TypeScript interfaces - -### 5. 
Configuration (05-configuration.md) -- ✅ Configuration files overview -- ✅ Java services configuration (application.properties) -- ✅ MCP client configuration (.env) -- ✅ RAG pipeline configuration (.env) -- ✅ Frontend configuration (.env) -- ✅ Docker Compose configuration -- ✅ Security settings (JWT, encryption) -- ✅ Database configuration -- ✅ Redis configuration -- ✅ File upload limits -- ✅ Logging configuration -- ✅ Environment-specific settings -- ✅ Configuration validation -- ✅ Troubleshooting configuration - -### 6. API Reference (06-api-reference.md) -- ✅ Base URLs -- ✅ Authentication (JWT, project tokens) -- ✅ Authentication endpoints -- ✅ Workspace endpoints -- ✅ Project endpoints -- ✅ Analysis endpoints -- ✅ Issue endpoints -- ✅ VCS integration endpoints -- ✅ AI connection endpoints -- ✅ Pipeline agent webhook endpoint -- ✅ Response codes -- ✅ Error format -- ✅ Pagination -- ✅ Swagger/OpenAPI reference - -### 7. Analysis Types (07-analysis-types.md) -- ✅ Branch analysis overview -- ✅ Branch analysis trigger and flow -- ✅ First vs subsequent branch analysis -- ✅ RAG indexing (full and incremental) -- ✅ Issue resolution check -- ✅ Pull request analysis overview -- ✅ PR analysis trigger and flow -- ✅ First vs re-analysis -- ✅ Request/response structures -- ✅ Database records -- ✅ RAG integration details -- ✅ Diff analysis -- ✅ Issue categorization -- ✅ Comparison table -- ✅ Configuration options -- ✅ Best practices - -### 8. Database Schema (08-database-schema.md) -- ✅ Entity relationship diagram -- ✅ All table definitions with CREATE statements -- ✅ Indexes for performance -- ✅ Field descriptions -- ✅ Relationships (foreign keys) -- ✅ Data encryption -- ✅ Database migrations -- ✅ Backup strategy -- ✅ Performance tuning -- ✅ Data retention policies -- ✅ Monitoring queries - -### 9. 
Deployment (09-deployment.md) -- ✅ Prerequisites -- ✅ Pre-deployment checklist -- ✅ Installation steps (1-11) -- ✅ Secret generation -- ✅ Configuration update -- ✅ Reverse proxy setup (Nginx) -- ✅ SSL with Let's Encrypt -- ✅ Firewall configuration -- ✅ Admin user creation -- ✅ Monitoring setup -- ✅ Log rotation -- ✅ Backup configuration -- ✅ Production tuning (database, JVM, resources) -- ✅ Security hardening -- ✅ Scaling strategies -- ✅ Monitoring & observability -- ✅ Disaster recovery -- ✅ Maintenance procedures -- ✅ Cost optimization - -### 10. Development (10-development.md) -- ✅ Development environment setup -- ✅ Prerequisites -- ✅ IDE configuration -- ✅ Running services locally -- ✅ Development workflow -- ✅ Branch strategy -- ✅ Commit conventions -- ✅ Code standards (Java, Python, TypeScript) -- ✅ Testing (unit, integration) -- ✅ Debugging (Java, Python, Frontend) -- ✅ Database development -- ✅ API development -- ✅ Performance optimization -- ✅ Common development tasks -- ✅ Dependency updates -- ✅ Contributing guidelines - -### 11. 
Troubleshooting (11-troubleshooting.md) -- ✅ Installation & setup issues -- ✅ Service-specific issues (all services) -- ✅ Analysis issues -- ✅ Performance issues -- ✅ Authentication issues -- ✅ Data issues -- ✅ Monitoring & debugging -- ✅ Network issues -- ✅ Emergency recovery -- ✅ Getting help guide - -## Documentation Statistics - -- **Total documents**: 14 files -- **Main guides**: 11 -- **Module docs**: 3 -- **Total lines**: ~3,500+ lines -- **Total words**: ~35,000+ words -- **Code examples**: 200+ -- **Diagrams**: 5 ASCII diagrams -- **Tables**: 25+ - -## Key Features - -✅ **Comprehensive Coverage** -- Every module documented -- All configuration options explained -- Complete API reference -- Full database schema - -✅ **Practical Examples** -- Code snippets for all languages -- Configuration examples -- API request/response examples -- SQL queries - -✅ **Operational Focus** -- Installation guide -- Deployment procedures -- Troubleshooting solutions -- Monitoring strategies - -✅ **Developer-Friendly** -- Development setup -- Code standards -- Testing guidelines -- Contributing guide - -✅ **Production-Ready** -- Security hardening -- Performance tuning -- Backup strategies -- Disaster recovery - -## Usage - -### For New Users -1. Start with [01-overview.md](01-overview.md) -2. Follow [02-getting-started.md](02-getting-started.md) -3. Reference [05-configuration.md](05-configuration.md) as needed - -### For Developers -1. Read [10-development.md](10-development.md) -2. Review module docs in [04-modules/](04-modules/) -3. Check [06-api-reference.md](06-api-reference.md) for API details - -### For DevOps -1. Follow [09-deployment.md](09-deployment.md) -2. Review [05-configuration.md](05-configuration.md) -3. Setup monitoring from [09-deployment.md](09-deployment.md#monitoring--observability) - -### For Troubleshooting -1. Check [11-troubleshooting.md](11-troubleshooting.md) -2. Enable debug logging -3. 
Review service-specific sections - -## Maintenance - -This documentation should be updated when: -- New features are added -- Configuration options change -- API endpoints are added/modified -- Database schema changes -- New modules are introduced -- Deployment procedures change - -## Quality Standards - -All documentation follows: -- Clear, concise language (no AI filler) -- Technical specialist style -- Practical, actionable content -- Complete code examples -- Proper formatting (Markdown) -- Logical organization -- Cross-references between docs - -## Feedback - -Documentation improvements welcome via: -- GitHub Issues (documentation label) -- Pull requests with doc updates -- Community discussions - ---- - -**Version**: 1.0 -**Last Updated**: November 26, 2024 -**Status**: Complete - diff --git a/docs/QUICK_REFERENCE.md b/docs/QUICK_REFERENCE.md deleted file mode 100644 index dc47a73..0000000 --- a/docs/QUICK_REFERENCE.md +++ /dev/null @@ -1,320 +0,0 @@ -# CodeCrow Quick Reference - -Fast reference guide for common tasks and commands. 
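The service endpoints listed in the table below can be probed in one pass. A small helper for that (illustrative only — this script is not shipped with CodeCrow; ports assume the default Docker Compose setup):

```python
import urllib.request

# Health endpoints from the Service URLs table (default compose ports).
SERVICES = {
    "Web API": "http://localhost:8081/actuator/health",
    "MCP Client": "http://localhost:8000/health",
    "RAG Pipeline": "http://localhost:8001/health",
}

def check_services(services=SERVICES, timeout=3):
    """Return {service name: 'UP' or 'DOWN'} by probing each health endpoint."""
    status = {}
    for name, url in services.items():
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                status[name] = "UP" if resp.status == 200 else "DOWN"
        except OSError:
            # Connection refused, timeout, and HTTP errors all count as DOWN.
            status[name] = "DOWN"
    return status

if __name__ == "__main__":
    for name, state in check_services().items():
        print(f"{name}: {state}")
```

Any service reporting DOWN can then be inspected with `docker compose logs -f <service>` as shown in the commands below.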
- -## Service URLs - -| Service | URL | Access | -|---------|-----|--------| -| Frontend | http://localhost:8080 | Public | -| Web API | http://localhost:8081 | Public | -| Swagger | http://localhost:8081/swagger-ui-custom.html | Public | -| Pipeline Agent | http://localhost:8082 | Webhook only | -| MCP Client | http://localhost:8000 | Internal | -| RAG Pipeline | http://localhost:8001 | Internal | - -## Quick Commands - -### Start Services -```bash -cd /opt/codecrow -./tools/production-build.sh -``` - -### Stop Services -```bash -cd deployment -docker compose down -``` - -### View Logs -```bash -docker compose logs -f -# Examples: -docker compose logs -f web-server -docker compose logs -f pipeline-agent -``` - -### Restart Service -```bash -docker compose restart -``` - -### Database Access -```bash -docker exec -it codecrow-postgres psql -U codecrow_user -d codecrow_ai -``` - -### Check Service Health -```bash -curl http://localhost:8081/actuator/health -curl http://localhost:8000/health -curl http://localhost:8001/health -``` - -## Configuration Files - -| File | Purpose | -|------|---------| -| `deployment/config/java-shared/application.properties` | Java services config | -| `deployment/config/mcp-client/.env` | MCP client settings | -| `deployment/config/rag-pipeline/.env` | RAG pipeline settings | -| `deployment/config/web-frontend/.env` | Frontend settings | -| `deployment/docker-compose.yml` | Container orchestration | - -## Generate Secrets - -```bash -# JWT Secret -openssl rand -base64 32 - -# Encryption Key -openssl rand -base64 32 - -# Database Password -openssl rand -base64 24 -``` - -## Common Tasks - -### Create Admin User -```sql -INSERT INTO "user" (id, username, email, password, roles, created_at) -VALUES ( - gen_random_uuid(), - 'admin', - 'admin@example.com', - '$2a$10$hashed_password', - ARRAY['USER', 'ADMIN'], - NOW() -); -``` - -### Reset Database -```bash -docker exec -it codecrow-postgres psql -U codecrow_user -c "DROP DATABASE 
codecrow_ai;" -docker exec -it codecrow-postgres psql -U codecrow_user -c "CREATE DATABASE codecrow_ai;" -docker compose restart web-server pipeline-agent -``` - -### Clear Analysis Locks -```sql -DELETE FROM analysis_lock WHERE locked_at < NOW() - INTERVAL '30 minutes'; -``` - -### View Active Issues -```sql -SELECT b.name, COUNT(*) -FROM branch_issue bi -JOIN branch b ON bi.branch_id = b.id -WHERE bi.resolved = FALSE -GROUP BY b.name; -``` - -### Backup Database -```bash -docker exec codecrow-postgres pg_dump -U codecrow_user codecrow_ai | gzip > backup_$(date +%Y%m%d).sql.gz -``` - -### Restore Database -```bash -gunzip < backup_20240115.sql.gz | docker exec -i codecrow-postgres psql -U codecrow_user -d codecrow_ai -``` - -## Development - -### Run Java Service Locally -```bash -cd java-ecosystem/services/web-server -mvn spring-boot:run -``` - -### Run Python Service Locally -```bash -cd python-ecosystem/mcp-client -source venv/bin/activate -uvicorn main:app --reload --port 8000 -``` - -### Run Frontend Locally -```bash -cd frontend -npm run dev # or: bun run dev -``` - -### Build Java Artifacts -```bash -cd java-ecosystem -mvn clean package -DskipTests -``` - -### Run Tests -```bash -# Java -mvn test - -# Python -pytest - -# Frontend -npm test -``` - -## API Quick Reference - -### Login -```bash -curl -X POST http://localhost:8081/api/auth/login \ - -H "Content-Type: application/json" \ - -d '{"username":"user@example.com","password":"password"}' -``` - -### Create Workspace -```bash -curl -X POST http://localhost:8081/api/workspaces \ - -H "Authorization: Bearer <jwt-token>" \ - -H "Content-Type: application/json" \ - -d '{"name":"My Workspace","description":"..."}' -``` - -### Trigger Webhook (Test) -```bash -curl -X POST http://localhost:8082/api/v1/bitbucket-cloud/webhook \ - -H "Authorization: Bearer <webhook-token>" \ - -H "Content-Type: application/json" \ - -d @sample-webhook.json -``` - -## Troubleshooting - -### Service Won't Start -```bash -docker compose logs -docker compose 
restart -``` - -### High Memory Usage -```bash -docker stats -# Adjust limits in docker-compose.yml -``` - -### Database Connection Issues -```bash -docker exec -it codecrow-postgres psql -U codecrow_user -d codecrow_ai -c "SELECT version();" -``` - -### Clear Redis Cache -```bash -docker exec codecrow-redis redis-cli FLUSHDB -``` - -### Check Qdrant Collections -```bash -curl http://localhost:6333/collections -``` - -## Environment Variables - -### Java Services -```properties -SPRING_DATASOURCE_URL=jdbc:postgresql://postgres:5432/codecrow_ai -SPRING_DATASOURCE_USERNAME=codecrow_user -SPRING_DATASOURCE_PASSWORD=codecrow_pass -SERVER_PORT=8081 -``` - -### Python Services -```bash -OPENROUTER_API_KEY=sk-or-v1-... -RAG_ENABLED=true -QDRANT_URL=http://localhost:6333 -``` - -### Frontend -```bash -VITE_API_URL=http://localhost:8081/api -VITE_WEBHOOK_URL=http://localhost:8082 -``` - -## File Locations - -### Logs -- Web Server: `/var/lib/docker/volumes/web_logs/_data/` -- Pipeline Agent: `/var/lib/docker/volumes/pipeline_agent_logs/_data/` -- RAG Pipeline: `/var/lib/docker/volumes/rag_logs/_data/` - -### Data -- PostgreSQL: `/var/lib/docker/volumes/postgres_data/_data/` -- Qdrant: `/var/lib/docker/volumes/qdrant_data/_data/` -- Redis: `/var/lib/docker/volumes/redis_data/_data/` - -## Network - -### Bitbucket IP Ranges (for firewall) -``` -104.192.136.0/21 -185.166.140.0/22 -``` - -### Container Network -All services on `codecrow-network` can communicate by service name. - -## Performance - -### Database Maintenance -```sql -VACUUM ANALYZE; -REINDEX DATABASE codecrow_ai; -``` - -### Clean Old Data -```sql -DELETE FROM branch_issue WHERE resolved = TRUE AND resolved_at < NOW() - INTERVAL '90 days'; -DELETE FROM code_analysis WHERE completed_at < NOW() - INTERVAL '180 days'; -``` - -## Security - -### Change Database Password -1. Update in `docker-compose.yml` for all services -2. 
Restart services: -```bash -docker compose down -docker compose up -d -``` - -### Rotate JWT Secret -1. Update `application.properties` -2. Restart Java services -3. All users must re-login - -## Monitoring - -### Check All Services -```bash -docker compose ps -``` - -### Resource Usage -```bash -docker stats -``` - -### Database Size -```sql -SELECT pg_size_pretty(pg_database_size('codecrow_ai')); -``` - -### Active Connections -```sql -SELECT count(*) FROM pg_stat_activity WHERE datname = 'codecrow_ai'; -``` - -## Links - -- [Full Documentation](README.md) -- [Getting Started](02-getting-started.md) -- [Troubleshooting](11-troubleshooting.md) -- [API Reference](06-api-reference.md) - diff --git a/docs/README.md b/docs/README.md deleted file mode 100644 index e19198f..0000000 --- a/docs/README.md +++ /dev/null @@ -1,38 +0,0 @@ -# CodeCrow Documentation - -CodeCrow is an automated code review system that leverages AI and Model Context Protocol (MCP) servers to analyze code changes, track issues, and provide intelligent insights. - -## Supported VCS Platforms - -- **Bitbucket Cloud** - Full support with OAuth App integration -- **GitHub** - Full support with OAuth App integration -- **Bitbucket Server/Data Center** - Personal Access Token support -- **GitLab** - Coming soon - -## Documentation Structure - -- [1. Overview](01-overview.md) - System architecture and key concepts -- [2. Getting Started](02-getting-started.md) - Installation and initial setup -- [3. Architecture](03-architecture.md) - Detailed system architecture -- [4. Modules](04-modules/) - Individual module documentation - - [4.1 Java Ecosystem](04-modules/java-ecosystem.md) - - [4.2 Python Ecosystem](04-modules/python-ecosystem.md) - - [4.3 Frontend](04-modules/frontend.md) -- [5. Configuration](05-configuration.md) - Configuration reference -- [6. API Reference](06-api-reference.md) - REST API endpoints -- [7. Analysis Types](07-analysis-types.md) - Branch and PR analysis details -- [8. 
Database Schema](08-database-schema.md) - Data model -- [9. Deployment](09-deployment.md) - Production deployment guide -- [10. Development](10-development.md) - Development workflow -- [11. Troubleshooting](11-troubleshooting.md) - Common issues and solutions -- [12. Localization](12-localization.md) - Internationalization guide -- [SMTP Setup](SMTP_SETUP.md) - Email configuration for 2FA and notifications - -## Quick Links - -- [Quick Start Guide](02-getting-started.md#quick-start) -- [Configuration Reference](05-configuration.md) -- [API Documentation](06-api-reference.md) -- [Development Setup](10-development.md) -- [SMTP Configuration](SMTP_SETUP.md) - diff --git a/docs/architecture/mcp-scaling-strategy.md b/docs/architecture/mcp-scaling-strategy.md deleted file mode 100644 index 2750ae3..0000000 --- a/docs/architecture/mcp-scaling-strategy.md +++ /dev/null @@ -1,176 +0,0 @@ -# MCP Server Architecture for SaaS Scale - -## Executive Summary - -For a SaaS serving **dozens of teams and hundreds of developers**, the current architecture of spawning a new JVM process per request is **inefficient**. This document analyzes the options and provides recommendations. 
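To make the inefficiency concrete, here is a rough back-of-envelope cost model in Python. The per-JVM figures are mid-range illustrative assumptions taken from the startup/memory ranges cited in this document, not measurements of the actual MCP servers:

```python
# Rough cost model: spawn-per-request vs. a fixed pre-warmed pool.
# JVM_STARTUP_S and JVM_MEMORY_MB are mid-range assumptions
# (~500ms-2s startup, ~100-300MB resident), not measured values.
JVM_STARTUP_S = 1.0
JVM_MEMORY_MB = 200

def spawn_per_request(concurrent: int) -> tuple[float, float]:
    """Added startup latency (s) and peak memory (GB) with no pooling."""
    return concurrent * JVM_STARTUP_S, concurrent * JVM_MEMORY_MB / 1024

def pooled(pool_size: int) -> tuple[float, float]:
    """After warmup the pool adds no startup latency; memory is fixed."""
    return 0.0, pool_size * JVM_MEMORY_MB / 1024

print(spawn_per_request(100))  # -> (100.0, 19.53125)  ~20 GB, ~100s of cold starts
print(pooled(5))               # -> (0.0, 0.9765625)   ~1 GB, zero cold starts
```

Under these assumptions, 100 concurrent requests cost roughly 20 GB and 100 cumulative seconds of JVM startup without pooling, versus about 1 GB and no startup latency with a 5-process pool — the gap the rest of this document addresses.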
- -## Current Architecture - -``` -Pipeline Agent ─HTTP─▶ MCP Client (Python) ─spawns─▶ Java JVM (STDIO) ─dies after─▶ - │ │ - │ Per-request process spawn │ - │ ~500ms-2s JVM startup │ - │ 100-300MB memory each │ - └──────────────────────────────┘ -``` - -### Problems at Scale - -| Metric | 10 concurrent | 50 concurrent | 100 concurrent | -|--------|---------------|---------------|----------------| -| JVM startup time | 5-20s total | 25-100s total | 50-200s total | -| Memory usage | 1-3 GB | 5-15 GB | 10-30 GB | -| OS process overhead | Minimal | Noticeable | Significant | - -## Option 1: Process Pooling (Recommended) ✅ - -**Keep STDIO transport but pool MCP server processes.** - -``` -Pipeline Agent ─HTTP─▶ MCP Client ─▶ Process Pool ─▶ Pre-warmed JVM 1 (reused) - Manager Pre-warmed JVM 2 (reused) - Pre-warmed JVM 3 (reused) - ... -``` - -### Benefits -- **Zero JVM startup latency** after warmup -- **Shared memory footprint** - 5 processes serve 100+ requests -- **Compatible with existing MCP protocol** - no protocol changes -- **Works with mcp_use library** - minimal Python changes -- **Gradual rollout** - can be enabled per-environment - -### Implementation -See `utils/mcp_pool.py` - Process pool manager that: -- Pre-warms N JVM processes at startup -- Routes requests to available processes -- Recycles processes after N requests or time limit -- Handles process crashes gracefully - -### Configuration -```bash -MCP_POOL_SIZE=5 # Number of pre-warmed processes -MCP_POOL_MAX_REQUESTS=100 # Recycle after N requests -MCP_POOL_MAX_AGE=3600 # Recycle after 1 hour -``` - -### Memory Math -- Without pooling (100 concurrent): 100 × 200MB = **20 GB** -- With pooling (5 processes): 5 × 200MB = **1 GB** - ---- - -## Option 2: HTTP Transport - -**Convert MCP servers to HTTP services.** - -``` -Pipeline Agent ─HTTP─▶ MCP Client ─HTTP─▶ VCS MCP Service (long-running) - │ - └──▶ Platform MCP Service (long-running) -``` - -### Benefits -- Standard HTTP load balancing -- Native 
Docker health checks -- Easy horizontal scaling -- Can use connection pooling to VCS APIs - -### Drawbacks -- **Breaking change** to MCP protocol contract -- Requires rewriting Python MCP client -- Not compatible with mcp_use library (expects STDIO) -- More complex deployment - -### When to Choose HTTP -- If you abandon the `mcp_use` library entirely -- If you need to scale MCP servers independently -- If you want standard REST API observability - ---- - -## Option 3: SSE Transport - -**Use MCP's native SSE (Server-Sent Events) transport.** - -The MCP SDK supports SSE transport which allows long-running server connections. - -### Benefits -- Part of official MCP specification -- Maintains MCP protocol compatibility -- Supports streaming responses - -### Drawbacks -- Requires MCP SDK changes on both sides -- More complex than process pooling -- Limited ecosystem support in Python - ---- - -## Recommendation - -### Phase 1: Process Pooling (Immediate) - -1. **Enable process pooling** in Python MCP client -2. **Configure pool size** based on expected load -3. **Monitor metrics** to tune pool parameters - -This provides **80% of the benefit with 20% of the effort**. - -### Phase 2: Consider HTTP (Future) - -If you later need: -- Independent scaling of MCP servers -- Advanced load balancing -- Cross-datacenter deployment - -Then evaluate HTTP transport. 
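The Phase 1 mechanics — pre-warm N processes, lease one per request, recycle on age, request count, or crash — can be sketched as below. This is a minimal illustration, not the real `McpProcessPool` in `utils/mcp_pool.py`: the class shape, dict-based bookkeeping, and method names are all hypothetical.

```python
import asyncio
import subprocess
import time

class McpProcessPool:
    """Sketch of a pool of pre-warmed STDIO server processes.

    Hypothetical illustration of the strategy described above; the
    constructor arguments mirror MCP_POOL_SIZE, MCP_POOL_MAX_REQUESTS,
    and MCP_POOL_MAX_AGE from the configuration section.
    """

    def __init__(self, cmd, size=5, max_requests=100, max_age=3600.0):
        self.cmd = cmd
        self.size = size
        self.max_requests = max_requests
        self.max_age = max_age
        self._idle: asyncio.Queue = asyncio.Queue()

    async def start(self):
        # Pre-warm the pool so requests never pay JVM startup cost.
        for _ in range(self.size):
            await self._idle.put(self._spawn())

    def _spawn(self):
        proc = subprocess.Popen(
            self.cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE
        )
        return {"proc": proc, "born": time.monotonic(), "served": 0}

    async def acquire(self):
        # Blocks until a worker is free, capping concurrency at `size`.
        entry = await self._idle.get()
        worn_out = (
            entry["served"] >= self.max_requests
            or time.monotonic() - entry["born"] > self.max_age
            or entry["proc"].poll() is not None  # process crashed
        )
        if worn_out:
            entry["proc"].kill()
            entry = self._spawn()  # replace transparently
        return entry

    async def release(self, entry):
        entry["served"] += 1
        await self._idle.put(entry)
```

A caller such as the review service would `acquire()` a pre-warmed process, speak the unchanged MCP STDIO protocol over its `stdin`/`stdout`, and `release()` it — which is why this option requires no protocol changes.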
- ---- - -## Implementation Checklist - -### Process Pooling -- [x] Create `McpProcessPool` class -- [ ] Integrate with `ReviewService` -- [ ] Add pool metrics endpoint -- [ ] Configure Docker healthchecks -- [ ] Load test with pool enabled - -### Docker Compose Changes (if choosing HTTP later) -```yaml -services: - vcs-mcp-server: - build: ../java-ecosystem/mcp-servers/bitbucket-mcp - command: ["java", "-jar", "app.jar", "--http"] - ports: - - "8765:8765" - healthcheck: - test: ["CMD", "curl", "-f", "http://localhost:8765/health"] - deploy: - replicas: 3 # Scale horizontally -``` - ---- - -## Metrics to Watch - -| Metric | Target | Alert Threshold | -|--------|--------|-----------------| -| Pool utilization | < 80% | > 90% | -| Process recycles/hour | < 10 | > 50 | -| Request latency p99 | < 100ms | > 500ms | -| Process errors/hour | 0 | > 5 | - ---- - -## Conclusion - -**Use process pooling for now.** It's the most pragmatic solution that: -1. Requires minimal code changes -2. Provides massive performance improvement -3. Maintains compatibility with existing architecture -4. Can be deployed incrementally - -The HTTP approach is valid but represents a larger architectural shift that should only be undertaken if process pooling proves insufficient at your scale. diff --git a/frontend b/frontend index 34c6a5a..2aafc7f 160000 --- a/frontend +++ b/frontend @@ -1 +1 @@ -Subproject commit 34c6a5ad9b0091a1ce03a57563dd4167f081a003 +Subproject commit 2aafc7fac8525f893eee6dd60df132727bd7c226