[Bug]: Features marked 'Verified' even when Playwright tests fail - no test result tracking #726

@JasonBroderick

Operating System

Linux (Docker)

Run Mode

Docker

App Version

0.13.0

Bug Description

Features in automated testing mode are marked as "Verified" regardless of whether Playwright tests actually passed or failed. There is no visual indication in the UI when tests fail, no parsing of test results from agent output, and no backlog or history of test results to review.

Root Cause Analysis

In apps/server/src/services/auto-mode-service.ts:

Lines 1350-1354 - Status is set based solely on mode, not test results:

// Determine final status based on testing mode:
// - skipTests=false (automated testing): go directly to 'verified' (no manual verify needed)
// - skipTests=true (manual verification): go to 'waiting_approval' for manual review
const finalStatus = feature.skipTests ? 'waiting_approval' : 'verified';
await this.updateFeatureStatus(projectPath, featureId, finalStatus);

Line 1393 - Success is hardcoded:

passes: true,  // Always true if execution completes

No test result parsing - Agent output is saved to agent-output.md but never analyzed for:

  • Test pass/fail indicators
  • Playwright exit codes
  • Error messages like "browser installation was blocked"

Steps to Reproduce

  1. Run Automaker in Docker without Playwright browsers installed
  2. Create a feature and move it to "In Progress" (automated mode)
  3. AI agent attempts Playwright verification, which fails
  4. Agent reports "Playwright browser installation was blocked by permissions, but build verification confirms the implementation is working"
  5. Feature is moved to "Verified" status anyway
  6. No indication anywhere in the UI that tests failed

Expected Behavior

  1. Test result parsing: System should parse agent output for test pass/fail status
  2. Failed verification handling: Features with failed tests should NOT be marked "Verified"
  3. Visual indication: UI should show test status (pass/fail/skipped) on feature cards
  4. Test history/backlog: Users should be able to:
    • See a list of features with failed/pending tests
    • Re-run failed tests
    • View test output/logs
  5. Graceful degradation: When tests can't run (missing browsers), status should reflect "verification skipped" not "verified"
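
The failed/pending list from item 4 could be derived with a simple filter once each feature records a verification result. This is a minimal sketch only: the `FeatureSummary` shape, the `verificationStatus` field, and the `featuresNeedingAttention` helper are hypothetical and do not exist in the current codebase.

```typescript
// Hypothetical status values, mirroring the pass/fail/skipped states requested above.
type VerificationStatus = 'passed' | 'failed' | 'skipped' | 'pending';

// Hypothetical, trimmed-down view of a feature for listing purposes.
interface FeatureSummary {
  id: string;
  title: string;
  verificationStatus?: VerificationStatus;
}

// Returns features whose tests failed or have not produced a result yet.
// Features without a recorded status are excluded here (an assumption).
function featuresNeedingAttention(features: FeatureSummary[]): FeatureSummary[] {
  return features.filter(
    (f) => f.verificationStatus === 'failed' || f.verificationStatus === 'pending',
  );
}
```

A "Re-run Tests" action could then operate on the ids returned by this list.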

Actual Behavior

  • Features always marked "Verified" if agent completes
  • No parsing of test results
  • No UI indication of test failures
  • No test history or backlog
  • Silent failures give false confidence

Suggested Fix

1. Parse agent output for test results

// After agent execution, check output for test indicators
const testsPassed = !agentOutput.includes('FAILED') && 
                   !agentOutput.includes('test failed') &&
                   !agentOutput.includes('browser installation was blocked');
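
Plain substring checks like the ones above are case-sensitive and cannot distinguish a failed run from one that never started. A slightly more robust classifier might look like the sketch below. The function name `classifyVerification`, the pattern lists, and the "N passed" / "N failed" summary format they match are assumptions for illustration, not existing code.

```typescript
type VerificationStatus = 'passed' | 'failed' | 'skipped' | 'pending';

// Assumed phrases indicating the tests never ran (taken from the agent output in this report).
const SKIPPED_PATTERNS: RegExp[] = [/browser installation was blocked/i];

// Assumed phrases indicating at least one test failed.
const FAILED_PATTERNS: RegExp[] = [
  /\b[1-9]\d*\s+failed\b/i, // a non-zero count, e.g. "2 failed"
  /\btests? failed\b/i,
  /\bFAILED\b/, // upper-case marker only
];

// Assumed summary line with a non-zero pass count, e.g. "5 passed".
const PASSED_PATTERN = /\b[1-9]\d*\s+passed\b/i;

function classifyVerification(agentOutput: string): VerificationStatus {
  // Order matters: "could not run" wins over "failed", which wins over "passed".
  if (SKIPPED_PATTERNS.some((p) => p.test(agentOutput))) return 'skipped';
  if (FAILED_PATTERNS.some((p) => p.test(agentOutput))) return 'failed';
  if (PASSED_PATTERN.test(agentOutput)) return 'passed';
  // No recognizable signal: do not assume success.
  return 'pending';
}
```

The key design choice is that output with no recognizable test signal maps to 'pending' rather than 'passed', so a silent run cannot be reported as verified.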

2. Add verification status field to Feature type

interface Feature {
  // ...existing fields
  verificationStatus?: 'passed' | 'failed' | 'skipped' | 'pending';
  verificationOutput?: string;
}

3. Conditional status based on test results

const finalStatus = feature.skipTests 
  ? 'waiting_approval' 
  : (testsPassed ? 'verified' : 'verification_failed');
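
The boolean `testsPassed` cannot express the "verification skipped" case from Expected Behavior. One possible way to cover it is sketched below. The `resolveFinalStatus` helper and the `'verification_failed'` status are hypothetical, and routing skipped or inconclusive runs to `'waiting_approval'` is an assumption, not a confirmed design.

```typescript
type VerificationStatus = 'passed' | 'failed' | 'skipped' | 'pending';

// 'verified' and 'waiting_approval' appear in the current code;
// 'verification_failed' is the new status proposed in this issue.
type FeatureStatus = 'verified' | 'waiting_approval' | 'verification_failed';

function resolveFinalStatus(
  skipTests: boolean,
  verification: VerificationStatus,
): FeatureStatus {
  // Manual verification mode keeps its existing behavior.
  if (skipTests) return 'waiting_approval';

  switch (verification) {
    case 'passed':
      return 'verified';
    case 'failed':
      return 'verification_failed';
    case 'skipped':
    case 'pending':
      // Assumption: tests that could not run need a human to look at them.
      return 'waiting_approval';
  }
}
```

With this mapping, only an explicit pass in automated mode yields 'verified'.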

4. UI enhancements

  • Add test status badge to feature cards
  • Add "Test Results" tab/panel showing verification history
  • Add "Re-run Tests" button for failed verifications
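
For the status badge, a lookup table keeps the label and styling in one place. The sketch below is framework-agnostic; the `Badge` shape, the `tone` values, and the `badgeFor` helper are hypothetical names, and the 'skipped' and 'pending' labels are placeholders.

```typescript
type VerificationStatus = 'passed' | 'failed' | 'skipped' | 'pending';

// Hypothetical badge description that a card component could render.
interface Badge {
  label: string;
  tone: 'success' | 'danger' | 'warning' | 'neutral';
}

const BADGES: Record<VerificationStatus, Badge> = {
  passed: { label: '✓ Verified', tone: 'success' },
  failed: { label: '⚠️ Verification Failed', tone: 'danger' },
  skipped: { label: 'Verification Skipped', tone: 'warning' },
  pending: { label: 'Verification Pending', tone: 'neutral' },
};

// Features with no recorded status are shown as pending (an assumption).
function badgeFor(status?: VerificationStatus): Badge {
  return BADGES[status ?? 'pending'];
}
```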

Screenshots

N/A

Relevant Logs

Agent output when tests fail but feature is marked verified:

Playwright browser installation was blocked by permissions, but build verification confirms the implementation is working

Feature status: ✓ Verified (should be: ⚠️ Verification Failed)

Additional Context

This creates a false sense of confidence - users believe features are tested and working when they may have significant issues. Combined with #725 (Docker Playwright not installed), Docker users are systematically getting unverified features marked as verified.

Checklist

  • I have searched existing issues to ensure this bug hasn't been reported already
  • I have provided all required information above
