Description
Operating System
Linux (Docker)
Run Mode
Docker
App Version
0.13.0
Bug Description
Features in automated testing mode are marked as "Verified" regardless of whether Playwright tests actually passed or failed. There is no visual indication in the UI when tests fail, no parsing of test results from agent output, and no backlog or history of test results to review.
Root Cause Analysis
In apps/server/src/services/auto-mode-service.ts:
Lines 1350-1354 - Status is set based solely on mode, not test results:
// Determine final status based on testing mode:
// - skipTests=false (automated testing): go directly to 'verified' (no manual verify needed)
// - skipTests=true (manual verification): go to 'waiting_approval' for manual review
const finalStatus = feature.skipTests ? 'waiting_approval' : 'verified';
await this.updateFeatureStatus(projectPath, featureId, finalStatus);
Line 1393 - Success is hardcoded:
passes: true, // Always true if execution completes
No test result parsing - Agent output is saved to agent-output.md but never analyzed for:
- Test pass/fail indicators
- Playwright exit codes
- Error messages like "browser installation was blocked"
Steps to Reproduce
- Run Automaker in Docker without Playwright browsers installed
- Create a feature and move it to "In Progress" (automated mode)
- AI agent attempts Playwright verification, which fails
- Agent reports "Playwright browser installation was blocked by permissions, but build verification confirms the implementation is working"
- Feature is moved to "Verified" status anyway
- No indication anywhere in UI that tests failed
Expected Behavior
- Test result parsing: System should parse agent output for test pass/fail status
- Failed verification handling: Features with failed tests should NOT be marked "Verified"
- Visual indication: UI should show test status (pass/fail/skipped) on feature cards
- Test history/backlog: Users should be able to:
  - See a list of features with failed/pending tests
  - Re-run failed tests
  - View test output/logs
- Graceful degradation: When tests can't run (missing browsers), status should reflect "verification skipped" not "verified"
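The expected behavior above could be sketched as a small classifier that maps raw agent output to a verification state. This is only a sketch: `classifyVerification` and the marker lists are hypothetical, only the "browser installation was blocked" message comes from this report, and plain substring matching is an assumption that would likely need tuning against real agent output.

```typescript
// Hypothetical helper, not part of the existing codebase.
type VerificationStatus = 'passed' | 'failed' | 'skipped';

// Assumed markers; only the "blocked" message is taken from this report.
const SKIPPED_MARKERS = ['browser installation was blocked'];
const FAILED_MARKERS = ['test failed', 'tests failed'];
const PASSED_MARKERS = ['passed'];

function classifyVerification(agentOutput: string): VerificationStatus {
  const text = agentOutput.toLowerCase();
  const has = (markers: string[]): boolean =>
    markers.some((marker) => text.includes(marker));

  // Tests that could not run are reported as skipped, never as passed.
  if (has(SKIPPED_MARKERS)) return 'skipped';
  if (has(FAILED_MARKERS)) return 'failed';
  // Require positive evidence of a pass; silence is not treated as success.
  return has(PASSED_MARKERS) ? 'passed' : 'skipped';
}
```

The ordering is deliberate: the "could not run" check comes first, so an optimistic summary such as "build verification confirms the implementation is working" cannot turn a blocked run into a pass.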
Actual Behavior
- Features always marked "Verified" if agent completes
- No parsing of test results
- No UI indication of test failures
- No test history or backlog
- Silent failures give false confidence
Suggested Fix
1. Parse agent output for test results
// After agent execution, check output for test indicators
const testsPassed = !agentOutput.includes('FAILED') &&
  !agentOutput.includes('test failed') &&
  !agentOutput.includes('browser installation was blocked');
2. Add verification status field to Feature type
interface Feature {
  // ...existing fields
  verificationStatus?: 'passed' | 'failed' | 'skipped' | 'pending';
  verificationOutput?: string;
}
3. Conditional status based on test results
const finalStatus = feature.skipTests
  ? 'waiting_approval'
  : (testsPassed ? 'verified' : 'verification_failed');
4. UI enhancements
- Add test status badge to feature cards
- Add "Test Results" tab/panel showing verification history
- Add "Re-run Tests" button for failed verifications
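The pieces of the suggested fix might fit together roughly as follows. This is a sketch under assumptions: `FeatureLike`, `resolveFinalStatus`, and `needsAttention` are hypothetical names, and routing skipped or pending verification to 'waiting_approval' is one possible policy, not something this report prescribes.

```typescript
type VerificationStatus = 'passed' | 'failed' | 'skipped' | 'pending';
type FinalStatus = 'verified' | 'waiting_approval' | 'verification_failed';

// Hypothetical minimal shape; the real Feature type has more fields.
interface FeatureLike {
  id: string;
  skipTests: boolean;
  verificationStatus?: VerificationStatus;
}

function resolveFinalStatus(feature: FeatureLike): FinalStatus {
  // Manual verification mode keeps its existing behavior.
  if (feature.skipTests) return 'waiting_approval';
  switch (feature.verificationStatus) {
    case 'passed':
      return 'verified';
    case 'failed':
      return 'verification_failed';
    default:
      // skipped, pending, or missing: fall back to manual review (assumed policy)
      return 'waiting_approval';
  }
}

// Features a "Test Results" panel could list for review or re-run.
function needsAttention(features: FeatureLike[]): FeatureLike[] {
  return features.filter(
    (feature) => !feature.skipTests && feature.verificationStatus !== 'passed',
  );
}
```

With this shape, 'verified' is only reachable through an explicit 'passed' result, which is the property the current implementation lacks.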
Screenshots
N/A
Relevant Logs
Agent output when tests fail but feature is marked verified:
Playwright browser installation was blocked by permissions, but build verification confirms the implementation is working
Feature status: ✓ Verified (should be:
Additional Context
This creates a false sense of confidence - users believe features are tested and working when they may have significant issues. Combined with #725 (Docker Playwright not installed), Docker users are systematically getting unverified features marked as verified.
Related Issues
- #725 - [Bug]: Docker: Playwright verification fails - browsers not installed
Checklist
- I have searched existing issues to ensure this bug hasn't been reported already
- I have provided all required information above