diff --git a/docs/issues/1-bug-seen-submission-race-condition.md b/docs/issues/1-bug-seen-submission-race-condition.md new file mode 100644 index 0000000..bba3e49 --- /dev/null +++ b/docs/issues/1-bug-seen-submission-race-condition.md @@ -0,0 +1,123 @@ +# Seen Submission Race Condition + +**Priority**: 1 **Type**: bug **Status**: open **Created**: 2025-07-04 +**Updated**: 2025-07-04 + +## Description + +Race condition exists between filtering submissions and marking them as seen, +potentially allowing duplicate notifications if the monitoring loop runs +multiple times rapidly or if API responses change between calls. + +## Reproduction Steps + +1. Set up monitoring targets with very short poll intervals (difficult to + reproduce consistently) +2. Force multiple rapid checks using `.trigger` command +3. Monitor for duplicate notifications of the same submission +4. Check database for timing issues in `seen_submissions` table + +## Expected vs Actual Behavior + +- **Expected**: Each submission should only be notified once, regardless of + timing +- **Actual**: Rapid polling or API inconsistencies could cause duplicate + notifications + +## Technical Details + +### Code Location + +- **File**: `src/cogs/runner.py` +- **Functions**: `filter_new_submissions()` (line 386-398), + `post_submissions()`, `mark_submissions_seen()` + +### Race Condition Flow + +```python +# Time T1: First check starts +submissions = await fetch_submissions_for_location(...) # API call +new_submissions = self.db.filter_new_submissions(channel_id, submissions) # Query DB + +# Time T2: Second check starts (before first completes) +submissions_2 = await fetch_submissions_for_location(...) 
# Same API response +new_submissions_2 = self.db.filter_new_submissions(channel_id, submissions_2) # Same query result + +# Time T3: Both post notifications +await self.post_submissions(new_submissions) # Posts duplicates +await self.post_submissions(new_submissions_2) # Posts duplicates + +# Time T4: Both mark as seen +self.db.mark_submissions_seen(channel_id, submission_ids) # Second call gets IntegrityError +``` + +### Database Protection + +- `seen_submissions` table has unique constraint on + `(channel_id, submission_id)` +- `mark_submissions_seen()` handles `IntegrityError` gracefully +- **Problem**: Notifications already sent before constraint violation + +### Potential Triggers + +- Multiple rapid manual checks (`.trigger` command) +- Cloud Run scaling events causing multiple instances +- API response timing variations +- Database connection delays + +## Proposed Solutions + +### Option 1: Atomic Transaction + +```python +async def process_submissions_atomically(self, channel_id, submissions): + async with self.db.transaction(): + new_submissions = self.db.filter_new_submissions(channel_id, submissions) + if new_submissions: + self.db.mark_submissions_seen(channel_id, [s["id"] for s in new_submissions]) + await self.post_submissions(new_submissions) +``` + +### Option 2: Channel-Level Locking + +```python +class Runner: + def __init__(self): + self.channel_locks = {} + + async def run_checks_for_channel(self, channel_id, config, is_manual_check=False): + if channel_id not in self.channel_locks: + self.channel_locks[channel_id] = asyncio.Lock() + + async with self.channel_locks[channel_id]: + # Existing check logic +``` + +### Option 3: Submission ID Tracking + +```python +# Track recently processed submission IDs in memory +self.recently_processed = {} # {channel_id: set(submission_ids)} + +async def filter_new_submissions(self, channel_id, submissions): + # Filter by database AND recent memory + recent = self.recently_processed.get(channel_id, set()) + 
    db_filtered = self.db.filter_new_submissions(channel_id, submissions)  # DB filter, as in the race flow above
truly_new = [s for s in db_filtered if s["id"] not in recent] + return truly_new +``` + +## Acceptance Criteria + +- [ ] No duplicate notifications under rapid polling scenarios +- [ ] Multiple manual checks don't cause duplicates +- [ ] Database integrity maintained +- [ ] Performance impact minimal +- [ ] Cloud Run scaling doesn't trigger race conditions +- [ ] Test with concurrent `.trigger` commands + +## Notes + +- Difficult to reproduce consistently in testing +- May require load testing to verify fix +- Related to Cloud Run scaling and multiple instances +- Consider distributed locking for multi-instance deployments diff --git a/docs/issues/1-bug-startup-duplicate-notifications.md b/docs/issues/1-bug-startup-duplicate-notifications.md new file mode 100644 index 0000000..3be3b99 --- /dev/null +++ b/docs/issues/1-bug-startup-duplicate-notifications.md @@ -0,0 +1,90 @@ +# Startup Duplicate Notifications + +**Priority**: 1 **Type**: bug **Status**: open **Created**: 2025-07-04 +**Updated**: 2025-07-04 + +## Description + +When the bot starts up, it sends notifications for old submissions that have +already been shown previously. The startup check bypasses normal polling logic +and treats all submissions as "new" even though they were previously processed. + +## Reproduction Steps + +1. Start the bot with existing monitoring targets +2. Observe that one or more channels receive notifications +3. Check log timestamps - notifications are for submissions that occurred before + the current session +4. 
Compare with previous bot runs - same submissions are re-notified + +## Expected vs Actual Behavior + +- **Expected**: Startup should only initialize monitoring, not send duplicate + notifications +- **Actual**: Startup sends notifications for previously seen submissions + +## Technical Details + +### Code Location + +- **File**: `src/cogs/runner.py` +- **Function**: `_run_startup_checks()` (lines 457-508) +- **Issue**: Calls `run_checks_for_channel(is_manual_check=False)` during + startup + +### Root Cause + +1. `_run_startup_checks()` runs immediately when bot becomes ready +2. Calls `run_checks_for_channel()` with `is_manual_check=False` +3. Since `last_poll_at` is None, API fetches all recent submissions without date + filtering +4. `filter_new_submissions()` has no seen submissions in database yet (fresh + start) +5. All submissions get posted as notifications +6. Submissions are then marked as seen, preventing future duplicates + +### Database State + +- `seen_submissions` table is empty on startup +- `last_poll_at` is None for channels that haven't been polled +- API returns submissions from past 24 hours without filtering + +## Proposed Solution + +### Option 1: Change Startup to Manual Check + +```python +# In _run_startup_checks() +result = await self.run_checks_for_channel(channel_id, config, is_manual_check=True) +``` + +This would show "Last 5 submissions" format instead of posting as notifications. + +### Option 2: Mark as Seen Without Posting + +```python +# In _run_startup_checks() +submissions = await self._fetch_submissions_for_channel(channel_id, config) +if submissions: + self.db.mark_submissions_seen(channel_id, [s["id"] for s in submissions]) +# Don't call post_submissions() +``` + +### Option 3: Skip Startup Checks Entirely + +Remove `_run_startup_checks()` and let normal 1-minute polling handle first +check. 
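For comparison, the core of the Option 2 idea (mark as seen without posting) can be sketched as a self-contained snippet. `SeenStore` and `startup_sync` are illustrative stand-ins, not the project's real `Database` API:

```python
# Sketch of Option 2: on startup, record everything currently visible as
# "seen" WITHOUT posting notifications, so only submissions that arrive
# after startup are ever notified. SeenStore mimics the unique
# (channel_id, submission_id) constraint with an in-memory set.


class SeenStore:
    def __init__(self) -> None:
        self._seen: set[tuple[int, int]] = set()  # (channel_id, submission_id)

    def mark_submissions_seen(self, channel_id: int, submission_ids: list[int]) -> None:
        self._seen.update((channel_id, sid) for sid in submission_ids)

    def filter_new_submissions(self, channel_id: int, submissions: list[dict]) -> list[dict]:
        return [s for s in submissions if (channel_id, s["id"]) not in self._seen]


def startup_sync(db: SeenStore, channel_id: int, submissions: list[dict]) -> None:
    """Mark current submissions seen on startup; deliberately never posts."""
    db.mark_submissions_seen(channel_id, [s["id"] for s in submissions])


db = SeenStore()
startup_sync(db, 123, [{"id": 1}, {"id": 2}])  # startup: silence, no notifications
new = db.filter_new_submissions(123, [{"id": 2}, {"id": 3}])
print(new)  # [{'id': 3}]: only the submission that arrived after startup
```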
+ +## Acceptance Criteria + +- [ ] Bot startup does not trigger duplicate notifications +- [ ] Normal polling behavior unchanged +- [ ] Manual check commands still show recent submissions +- [ ] No regression in monitoring functionality +- [ ] Test with multiple channels and monitoring targets + +## Notes + +- This issue is particularly noticeable with active monitoring targets +- Users report confusion when old submissions appear as "new" +- May be related to timezone handling in date filtering logic diff --git a/docs/issues/2-bug-timezone-handling-inconsistency.md b/docs/issues/2-bug-timezone-handling-inconsistency.md new file mode 100644 index 0000000..34c2d39 --- /dev/null +++ b/docs/issues/2-bug-timezone-handling-inconsistency.md @@ -0,0 +1,113 @@ +# Timezone Handling Inconsistency + +**Priority**: 2 **Type**: bug **Status**: open **Created**: 2025-07-04 +**Updated**: 2025-07-04 + +## Description + +Inconsistent timezone handling across the application may cause polling schedule +issues, incorrect date filtering in API calls, and database timestamp problems. + +## Reproduction Steps + +1. Review datetime handling in different parts of the codebase +2. Check for naive vs timezone-aware datetime objects +3. Monitor for polling schedule irregularities +4. 
Verify API date filtering works correctly across time zones

## Expected vs Actual Behavior

- **Expected**: All datetime operations should be timezone-aware and consistent
- **Actual**: A mix of naive and timezone-aware datetime objects causes
  potential issues

## Technical Details

### Code Locations

- **File**: `src/database.py` - `get_channel_config()` (lines 113-131)
- **File**: `src/cogs/runner.py` - Various datetime operations
- **File**: `src/api.py` - Date filtering in API calls

### Specific Issues

#### Database Operations

```python
# In get_channel_config() - timezone conversion happens here
if config.last_poll_at and config.last_poll_at.tzinfo is None:
    config.last_poll_at = config.last_poll_at.replace(tzinfo=timezone.utc)
```

#### Polling Schedule Logic

```python
# In _should_poll_channel() - may use naive datetime
current_time = datetime.now()  # Potentially naive
time_since_last_poll = current_time - config.last_poll_at  # Naive minus aware raises TypeError
```

#### API Date Filtering

```python
# Date calculations for the min_date_of_submission parameter
yesterday = datetime.now() - timedelta(days=1)  # Potentially naive
date_str = yesterday.strftime("%Y-%m-%d")  # May not account for timezone
```

### Potential Impact

- **Polling Schedule**: Channels may poll too frequently or too infrequently
- **API Filtering**: Submissions might be missed or duplicated due to date range
  issues
- **Database Consistency**: Timestamps may not align properly
- **Multi-timezone Users**: Different behavior based on server vs user timezones

## Proposed Solution

### Centralized Timezone Handling

```python
# Utility functions for consistent datetime handling (stdlib only; pytz is not needed)
from datetime import datetime, timezone

def now_utc() -> datetime:
    """Get current UTC time, always timezone-aware"""
    return datetime.now(timezone.utc)

def ensure_utc(dt: datetime) -> datetime:
    """Ensure datetime is timezone-aware and in UTC"""
+ if dt.tzinfo is None: + # Assume naive datetime is UTC + return dt.replace(tzinfo=timezone.utc) + return dt.astimezone(timezone.utc) + +def format_api_date(dt: datetime) -> str: + """Format datetime for API calls consistently""" + return ensure_utc(dt).strftime("%Y-%m-%d") +``` + +### Update All Datetime Operations + +1. Replace `datetime.now()` with `now_utc()` +2. Ensure all database datetime fields are timezone-aware +3. Standardize API date formatting +4. Add timezone validation in tests + +## Acceptance Criteria + +- [ ] All datetime objects are timezone-aware +- [ ] Polling schedules work correctly regardless of server timezone +- [ ] API date filtering is consistent and accurate +- [ ] Database timestamps are properly handled +- [ ] No regression in existing functionality +- [ ] Add timezone-specific tests + +## Notes + +- May require database migration if existing timestamps are naive +- Consider impact on existing data +- Test with different server timezone configurations +- Document timezone assumptions for deployment diff --git a/docs/issues/2-feature-monitoring-loop-debug-logging.md b/docs/issues/2-feature-monitoring-loop-debug-logging.md new file mode 100644 index 0000000..b864c62 --- /dev/null +++ b/docs/issues/2-feature-monitoring-loop-debug-logging.md @@ -0,0 +1,121 @@ +# Enhanced Monitoring Loop Debug Logging + +**Priority**: 2 **Type**: feature **Status**: open **Created**: 2025-07-04 +**Updated**: 2025-07-04 + +## Description + +Add comprehensive debug logging to the monitoring loop to help diagnose timing +issues, polling schedule problems, and unexpected behavior. Current logging is +insufficient for troubleshooting complex monitoring issues. 
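As a rough sketch of the configurable verbosity this issue proposes: the `MONITORING_LOG_LEVEL` variable comes from the Configuration section below, while the helper function and the logger name `dispinmap.monitoring` are illustrative assumptions, not existing project code:

```python
# Read MONITORING_LOG_LEVEL from the environment and apply it to a dedicated
# monitoring logger, leaving the rest of the application's logging untouched.
# Unknown level names fall back to INFO.
import logging
import os


def configure_monitoring_logger(name: str = "dispinmap.monitoring") -> logging.Logger:
    level_name = os.environ.get("MONITORING_LOG_LEVEL", "INFO").upper()
    logger = logging.getLogger(name)
    logger.setLevel(getattr(logging, level_name, logging.INFO))
    return logger


logger = configure_monitoring_logger()
logger.debug("LOOP_STATE: iteration #1 starting")  # emitted only when level is DEBUG
```

Because `logging.getLogger` returns the same logger per name, each component (loop, API, database) could get its own named child logger with its own level.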
+ +## Use Cases + +- Debugging why monitoring loop appears to stop working +- Understanding polling schedule and timing behavior +- Tracking down duplicate notification sources +- Monitoring API call patterns and responses +- Analyzing performance and bottlenecks + +## Acceptance Criteria + +- [ ] Log monitoring loop state transitions (start, stop, iteration) +- [ ] Log channel polling decisions and timing calculations +- [ ] Log API call details (URL, response time, submission counts) +- [ ] Log database operations (queries, results, timing) +- [ ] Log seen submission filtering details +- [ ] Configurable log levels for different components +- [ ] Performance metrics logging (iteration time, API latency) +- [ ] Error context logging with full state information + +## Technical Details + +### Proposed Logging Categories + +#### Loop State Logging + +```python +logger.debug(f"LOOP_STATE: iteration #{self.iteration_count} starting") +logger.debug(f"LOOP_STATE: {len(active_channels)} active channels found") +logger.debug(f"LOOP_STATE: iteration completed in {duration:.2f}s") +``` + +#### Channel Polling Decisions + +```python +logger.debug(f"POLL_DECISION: channel {channel_id} - last_poll: {last_poll}, " + f"poll_rate: {poll_rate}min, should_poll: {should_poll}") +logger.debug(f"POLL_TIMING: channel {channel_id} - time_since_last: {time_since}min, " + f"next_poll_in: {next_poll_in}min") +``` + +#### API Call Details + +```python +logger.debug(f"API_CALL: {api_type} for channel {channel_id} - " + f"URL: {url}, min_date: {min_date}") +logger.debug(f"API_RESPONSE: {api_type} - {len(submissions)} submissions, " + f"response_time: {response_time:.2f}s") +``` + +#### Database Operations + +```python +logger.debug(f"DB_QUERY: filtering {len(submissions)} submissions for channel {channel_id}") +logger.debug(f"DB_RESULT: {len(new_submissions)} new submissions after filtering") +logger.debug(f"DB_UPDATE: marking {len(submission_ids)} submissions as seen") +``` + +#### 
Performance Metrics + +```python +logger.info(f"PERF_METRICS: iteration #{iteration} - " + f"channels: {channel_count}, api_calls: {api_call_count}, " + f"notifications: {notification_count}, duration: {total_time:.2f}s") +``` + +### Configuration + +- Environment variable `MONITORING_LOG_LEVEL` to control verbosity +- Separate log levels for different components +- Option to enable/disable performance metrics logging + +### Code Locations to Enhance + +- `src/cogs/runner.py` - Main monitoring loop +- `src/database.py` - Submission filtering and marking +- `src/api.py` - API call timing and results +- `src/notifier.py` - Notification posting + +## Implementation Plan + +### Phase 1: Core Loop Logging + +- Add state transition logging to `monitor_task_loop()` +- Log iteration timing and channel processing +- Add polling decision logging to `_should_poll_channel()` + +### Phase 2: API and Database Logging + +- Enhance API call logging with timing and results +- Add detailed database operation logging +- Log submission filtering details + +### Phase 3: Performance and Error Logging + +- Add performance metrics collection +- Enhance error logging with full context +- Add configurable log levels + +### Phase 4: Integration and Testing + +- Test with local development environment +- Verify log output is useful for debugging +- Document log analysis procedures + +## Notes + +- Balance between useful information and log volume +- Consider log rotation and storage impact +- Ensure sensitive data (tokens, etc.) not logged +- Design for both local development and production debugging diff --git a/docs/issues/CLAUDE.md b/docs/issues/CLAUDE.md new file mode 100644 index 0000000..32bab83 --- /dev/null +++ b/docs/issues/CLAUDE.md @@ -0,0 +1,124 @@ +# Issue Tracking Framework + +This directory contains systematically documented issues for the DisPinMap +project, organized for easy prioritization and tracking. 
## File Naming Convention

All issues follow the format: `<priority>-<type>-<description>.md`

### Priority Levels (0-3)

- **0**: Critical - System breaking, production down, data loss
- **1**: High - Major functionality broken, security issues, performance
  problems
- **2**: Medium - Important features not working, notable bugs, usability issues
- **3**: Low - Minor bugs, enhancements, documentation updates

### Issue Types

- **bug**: Defects in existing functionality
- **feature**: New functionality or capabilities
- **docs**: Documentation improvements or additions
- **chore**: Maintenance tasks, refactoring, technical debt

## Example File Names

- `0-bug-monitoring-loop-not-starting.md` - Critical bug
- `1-bug-duplicate-notifications-on-startup.md` - High priority bug
- `2-feature-add-location-search.md` - Medium priority feature
- `3-docs-update-api-documentation.md` - Low priority documentation

## Issue File Structure

Each issue file should contain:

```markdown
# Issue Title

**Priority**: [0-3] **Type**: [bug|feature|docs|chore] **Status**:
[open|in-progress|blocked|closed] **Assignee**: [optional] **Created**: [date]
**Updated**: [date]

## Description

Clear description of the issue, bug, or feature request.

## Reproduction Steps (for bugs)

1. Step-by-step instructions
2. To reproduce the issue
3. Expected vs actual behavior

## Acceptance Criteria (for features)

- [ ] Specific deliverable 1
- [ ] Specific deliverable 2
- [ ] Testing requirements

## Technical Details

- Code locations involved
- Dependencies or related issues
- Potential solutions or approaches

## Notes

Additional context, related discussions, or implementation notes.
```

## Workflow

1. **Create Issue**: Document problems/features as they're discovered
2. **Prioritize**: Assign appropriate priority and type
3. **Track Progress**: Update status as work progresses
4. **Close**: Move to closed status when resolved
5. **Archive**: Optionally move closed issues to subdirectories

## Priority Guidelines

### Priority 0 (Critical)

- Production system down
- Data corruption or loss
- Security vulnerabilities
- Core functionality completely broken

### Priority 1 (High)

- Major features not working
- Performance issues affecting users
- Monitoring/alerting failures
- High-impact bugs

### Priority 2 (Medium)

- Minor feature bugs
- Usability improvements
- Non-critical performance issues
- Documentation gaps

### Priority 3 (Low)

- Cosmetic issues
- Code organization
- Nice-to-have features
- Non-urgent documentation

## Integration with Development

- Reference issues in commit messages:
  `git commit -m "fix: resolve startup duplicates (refs 1-bug-startup-duplicates)"`
- Link to issues in PR descriptions
- Update issue status when work begins/completes
- Use issue analysis to guide development priorities

## AI Agent Guidelines

When working on this project:

1. **Check existing issues** before starting work
2. **Create new issues** for discovered problems
3. **Update issue status** as work progresses
4. **Reference issues** in commits and PRs
5. **Prioritize work** based on issue priorities

diff --git a/src/cogs/runner.py b/src/cogs/runner.py
index b5a70a6..347af45 100644
--- a/src/cogs/runner.py
+++ b/src/cogs/runner.py
@@ -450,8 +450,8 @@ async def before_monitor_task_loop(self):
             logger.info(
                 f"🔍 Task loop is running: {self.monitor_task_loop.is_running()}"
             )
-        except Exception as e:
-            logger.warning(f"⚠️ Could not access bot debug info during startup: {e}")
+        except Exception:
+            logger.exception("⚠️ Could not access bot debug info during startup")
             logger.info("🔄 Debug info will be available once bot is fully initialized")
 
         # Run immediate first check to avoid waiting 60 minutes on startup