@erawat erawat commented Nov 27, 2025

Overview

Add centralized webhook processing infrastructure that payment processors can use for async webhook handling with automatic deduplication, retry logic, and queue management. This eliminates code duplication across payment processor extensions.

Before

Each payment processor extension implemented its own webhook handling:

  • Duplicate detection logic
  • Queue management
  • Retry logic with backoff
  • Handler registration

After

Payment processors extend base classes and register handlers via DI:

  • Automatic webhook deduplication
  • Queue-based async processing with retry (5min → 15min → 45min)
  • Scheduled job processes all webhooks every 5 minutes
  • Focus on processor-specific logic only

Architecture

Components Overview

  ┌─────────────────────────────────────────────────────────┐
  │ Payment Processor Extension (e.g., Stripe)              │
  ├─────────────────────────────────────────────────────────┤
  │ 1. StripeWebhookReceiver extends WebhookReceiverService │
  │    - Receives HTTP webhook                              │
  │    - Validates signature                                │
  │    - Calls saveWebhookEvent() (deduplication)           │
  │    - Calls queueWebhook() (async processing)            │
  │                                                         │
  │ 2. StripePaymentSucceededHandler implements             │
  │    WebhookHandlerInterface                              │
  │    - Processes payment_intent.succeeded event           │
  │    - Completes contribution                             │
  │    - Returns result code                                │
  │                                                         │
  │ 3. Register handler in ServiceContainer:                │
  │    $registry->registerHandler(                          │
  │      'stripe',                                          │
  │      'payment_intent.succeeded',                        │
  │      'stripe.payment_succeeded_handler'                 │
  │    );                                                   │
  └─────────────────────────────────────────────────────────┘
                              ↓
  ┌─────────────────────────────────────────────────────────┐
  │ PaymentProcessingCore (This Extension)                  │
  ├─────────────────────────────────────────────────────────┤
  │ WebhookReceiverService → WebhookQueueService →          │
  │ WebhookQueueRunnerService → WebhookHandlerRegistry →    │
  │ Handler executes → Contribution completed                │
  └─────────────────────────────────────────────────────────┘

  ---

Database Schema

civicrm_payment_webhook


  | Field | Type | Description |
  |-------|------|-------------|
  | `id` | int | Primary key |
  | `event_id` | varchar(255) | Processor event ID (evt_... for Stripe) |
  | `processor_type` | varchar(50) | Processor type: 'stripe', 'gocardless', etc. |
  | `event_type` | varchar(100) | Event type (payment_intent.succeeded, etc.) |
  | `payment_attempt_id` | int | FK to civicrm_payment_attempt (nullable) |
  | `status` | varchar(25) | new, processing, processed, error, permanent_error |
  | `attempts` | int | Number of processing attempts (default 0) |
  | `next_retry_at` | timestamp | When to retry (for exponential backoff) |
  | `result` | varchar(50) | Processing result: applied, noop, error, etc. |
  | `error_log` | text | Error details if processing failed |
  | `processing_started_at` | timestamp | When webhook entered 'processing' state |
  | `processed_at` | timestamp | When event was successfully processed |
  | `created_date` | timestamp | When webhook was received |

  **Indexes:**
  - `UI_event_processor` (UNIQUE): `event_id` + `processor_type` (prevents duplicates)
  - `index_event_type`: `event_type` (for filtering by event type)
  - `index_status_retry`: `status` + `next_retry_at` (for retry queries)
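
The deduplication provided by the `UI_event_processor` unique index can be illustrated with a small in-memory stand-in. This is only a sketch: `WebhookStoreSketch` and its `save()` method are hypothetical names, and in the extension the uniqueness check is enforced by the database, not by PHP code like this.

```php
<?php
/**
 * In-memory stand-in for the UNIQUE (event_id, processor_type) index.
 * Hypothetical class for illustration; not part of the extension.
 */
final class WebhookStoreSketch {
  /** @var array<string, array> Rows keyed by processor type + event ID. */
  private array $rows = [];
  private int $nextId = 1;

  /**
   * Returns the new row ID, or NULL when the same event was already
   * recorded for the same processor.
   */
  public function save(string $processorType, string $eventId, string $eventType): ?int {
    // A NUL byte separator avoids accidental collisions between the two parts.
    $key = $processorType . "\0" . $eventId;
    if (isset($this->rows[$key])) {
      return NULL;
    }
    $id = $this->nextId++;
    $this->rows[$key] = [
      'id' => $id,
      'event_id' => $eventId,
      'processor_type' => $processorType,
      'event_type' => $eventType,
      'status' => 'new',
      'attempts' => 0,
    ];
    return $id;
  }
}
```

The same `event_id` from a different processor is stored as a separate row, which is why the index covers both columns.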

Retry Logic Deep Dive

How Retry Works

When a webhook fails to process, the system implements exponential backoff retry:

Attempt 1 → Fails → Wait 5 minutes → Retry (Attempt 2)
Attempt 2 → Fails → Wait 15 minutes → Retry (Attempt 3)
Attempt 3 → Fails → Permanent Error (no further retries)

Constants:

MAX_RETRY_ATTEMPTS = 3;      // Max attempts before permanent error
RETRY_BASE_DELAY = 300;      // 5 minutes in seconds

Exponential Backoff Formula:
$delaySeconds = 300 * pow(3, $attempts - 1);
// Attempt 1: 300 * 3^0 = 300 seconds (5 minutes)
// Attempt 2: 300 * 3^1 = 900 seconds (15 minutes)
// Attempt 3: 300 * 3^2 = 2700 seconds (45 minutes), not scheduled while MAX_RETRY_ATTEMPTS = 3
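
As a quick check of the arithmetic, the formula can be wrapped in a small function. This is a sketch under the constants quoted above; `retryDelaySeconds()` and `scheduledRetryDelays()` are hypothetical helper names, not functions from the extension.

```php
<?php
// Values copied from the constants listed above.
const MAX_RETRY_ATTEMPTS = 3;
const RETRY_BASE_DELAY = 300; // seconds

/**
 * Hypothetical helper: delay in seconds computed by the formula for a
 * given (1-based) number of failed attempts.
 */
function retryDelaySeconds(int $attempts): int {
  if ($attempts < 1) {
    throw new InvalidArgumentException('attempts must be >= 1');
  }
  // Integer exponentiation keeps the result an int.
  return RETRY_BASE_DELAY * (3 ** ($attempts - 1));
}

/**
 * Hypothetical helper: the delays that can actually be scheduled before
 * the attempt limit is reached.
 */
function scheduledRetryDelays(): array {
  $delays = [];
  for ($attempt = 1; $attempt < MAX_RETRY_ATTEMPTS; $attempt++) {
    $delays[] = retryDelaySeconds($attempt);
  }
  return $delays;
}
```

With a limit of three attempts, only the 300-second and 900-second delays are ever waited on, which matches the example timeline's total of about 20 minutes.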

Retry Flow


  ┌─────────────────────────────────────────────────────────┐
  │ Webhook Processing Fails                                │
  └────────────────────┬────────────────────────────────────┘
                       ↓
  ┌─────────────────────────────────────────────────────────┐
  │ PaymentWebhook Record Updated:                          │
  │ - status: 'error'                                       │
  │ - attempts: incremented                                 │
  │ - next_retry_at: NOW() + backoff delay                  │
  │ - error_log: exception message                          │
  └────────────────────┬────────────────────────────────────┘
                       ↓
           ┌──────────────────────┐
           │ attempts >= 3?       │
           └──────┬───────────────┘
                  │
         No ──────┴───── Yes
          │               │
          ↓               ↓
  ┌───────────────┐  ┌──────────────────┐
  │ Wait for      │  │ Mark as          │
  │ next_retry_at │  │ permanent_error  │
  │               │  │ (no more retries)│
  └───────┬───────┘  └──────────────────┘
          │
          ↓
  ┌───────────────────────────────────┐
  │ Scheduled Job Runs (every 5 min) │
  │ - Finds webhooks with:            │
  │   status='error' AND              │
  │   next_retry_at <= NOW() AND      │
  │   attempts < 3                    │
  │ - Re-queues for processing        │
  └───────┬───────────────────────────┘
          │
          ↓
  ┌───────────────────┐
  │ Retry Processing  │
  └───────────────────┘
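
The decision points in the flow above can be sketched as two pure functions. The class and method names are assumptions made for illustration, timestamps are plain Unix seconds, and clearing `next_retry_at` on a permanent error is a choice of this sketch, not something the PR states.

```php
<?php
/**
 * Illustrative retry policy; not the extension's actual API.
 */
final class RetryPolicySketch {
  public const MAX_RETRY_ATTEMPTS = 3;
  public const RETRY_BASE_DELAY = 300; // seconds

  /**
   * Fields to write to the webhook record after a failed attempt.
   * $previousAttempts is the attempts value before this failure.
   */
  public static function afterFailure(int $previousAttempts, int $now, string $error): array {
    $attempts = $previousAttempts + 1;
    if ($attempts >= self::MAX_RETRY_ATTEMPTS) {
      // Attempt limit reached: no further retries are scheduled.
      return [
        'status' => 'permanent_error',
        'attempts' => $attempts,
        'next_retry_at' => NULL,
        'error_log' => $error,
      ];
    }
    $delay = self::RETRY_BASE_DELAY * (3 ** ($attempts - 1));
    return [
      'status' => 'error',
      'attempts' => $attempts,
      'next_retry_at' => $now + $delay,
      'error_log' => $error,
    ];
  }

  /**
   * Selection rule used by the scheduled job in the diagram.
   */
  public static function isDueForRetry(array $webhook, int $now): bool {
    return $webhook['status'] === 'error'
      && $webhook['next_retry_at'] !== NULL
      && $webhook['next_retry_at'] <= $now
      && $webhook['attempts'] < self::MAX_RETRY_ATTEMPTS;
  }
}
```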

Example Timeline

Webhook arrives at 10:00 AM:


  | Time     | Attempt | Status          | Action                                       |
  |----------|---------|-----------------|----------------------------------------------|
  | 10:00 AM | 1       | processing      | Handler throws exception                     |
  | 10:00 AM | 1       | error           | next_retry_at = 10:05 AM (5min)              |
  | 10:05 AM | 2       | processing      | Handler throws exception again               |
  | 10:05 AM | 2       | error           | next_retry_at = 10:20 AM (15min)             |
  | 10:20 AM | 3       | processing      | Handler throws exception again               |
  | 10:20 AM | 3       | permanent_error | No more retries - manual intervention needed |

Total time before permanent error: ~20 minutes


Stuck Webhook Recovery

If a webhook gets stuck in "processing" status (e.g., server crash during processing), it's automatically reset:

  public static function resetStuckWebhooks(int $timeoutMinutes = 30): int {
    // Find webhooks stuck in processing for longer than $timeoutMinutes (default 30)
    $cutoff = date('Y-m-d H:i:s', strtotime("-{$timeoutMinutes} minutes"));

    $stuckWebhooks = PaymentWebhook::get(FALSE)
      ->addSelect('id')
      ->addWhere('status', '=', 'processing')
      ->addWhere('processing_started_at', 'IS NOT NULL')
      ->addWhere('processing_started_at', '<', $cutoff)
      ->setLimit(100)  // Cap the batch size per run
      ->execute();

    // Collect the IDs to reset (assumed step; the original snippet leaves $webhookIds undefined)
    $webhookIds = $stuckWebhooks->column('id');

    // Batch update to 'new' status (will be reprocessed)
    // ...

    return count($webhookIds);
  }

Called automatically at the start of each scheduled job run.
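
The selection criteria used by `resetStuckWebhooks()` can be expressed as a standalone predicate, which is handy for unit-testing the cutoff boundary. This is a sketch: `isStuckWebhookSketch()` is a hypothetical name, and timestamps are Unix epoch seconds here to keep the example independent of the server timezone.

```php
<?php
/**
 * Hypothetical predicate mirroring the stuck-webhook criteria:
 * status is 'processing' and processing started before the cutoff.
 */
function isStuckWebhookSketch(array $webhook, int $now, int $timeoutMinutes = 30): bool {
  $startedAt = $webhook['processing_started_at'] ?? NULL;
  if (($webhook['status'] ?? NULL) !== 'processing' || $startedAt === NULL) {
    return FALSE;
  }
  // Strictly older than the cutoff, matching the '<' comparison in the query.
  return $startedAt < $now - $timeoutMinutes * 60;
}
```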


Monitoring Failed Webhooks

Query webhooks needing attention:

```sql
-- Permanent errors (need manual intervention)
SELECT id, event_id, processor_type, event_type, attempts, error_log, created_date
FROM civicrm_payment_webhook
WHERE status = 'permanent_error'
ORDER BY created_date DESC;
```

```sql
-- Currently failing (will retry)
SELECT id, event_id, processor_type, event_type, attempts, next_retry_at, error_log
FROM civicrm_payment_webhook
WHERE status = 'error' AND attempts < 3
ORDER BY next_retry_at ASC;
```

```sql
-- Stuck in processing (will be reset)
SELECT id, event_id, processor_type, event_type, processing_started_at
FROM civicrm_payment_webhook
WHERE status = 'processing'
  AND processing_started_at IS NOT NULL
  AND processing_started_at < DATE_SUB(NOW(), INTERVAL 30 MINUTE);
```

  ---

  ## Implementation Example (For Payment Processor Developers)

  ### Step 1: Create Webhook Receiver

  Extend `WebhookReceiverService` to handle incoming webhooks:

  ```php
  class StripeWebhookReceiver extends WebhookReceiverService {

    public function getProcessorType(): string {
      return 'stripe';
    }

    public function processWebhook(): void {
      // 1. Get raw webhook data
      $payload = file_get_contents('php://input');
      $signature = $_SERVER['HTTP_STRIPE_SIGNATURE'] ?? '';

      // 2. Verify signature (processor-specific); throws if verification fails.
      // getWebhookSecret() is a hypothetical helper that returns the endpoint signing secret.
      $secret = $this->getWebhookSecret();
      $event = \Stripe\Webhook::constructEvent($payload, $signature, $secret);

      // 3. Save webhook (automatic deduplication)
      $webhookId = $this->saveWebhookEvent(
        $event->id,                    // event_id
        $event->type,                  // event_type
        $this->findPaymentAttemptId($event->data->object->id)  // payment_attempt_id (nullable)
      );

      if (!$webhookId) {
        // Duplicate: this event was already recorded
        return;
      }

      // 4. Queue for async processing (as a plain array, which the handler reads by key)
      $this->queueWebhook($webhookId, ['event_data' => $event->toArray()]);
    }
  }
  ```

### Step 2: Create Event Handler

Implement `WebhookHandlerInterface` for each event type:

```php
  class StripePaymentSucceededHandler implements WebhookHandlerInterface {

    public function handle(int $webhookId, array $params): string {
      // 1. Get webhook and payment attempt
      $webhook = PaymentWebhook::get(FALSE)
        ->addWhere('id', '=', $webhookId)
        ->execute()->first();

      // payment_attempt_id is nullable; returning 'noop' here is an assumed choice
      if (!$webhook || empty($webhook['payment_attempt_id'])) {
        return 'noop';
      }

      $attempt = PaymentAttempt::get(FALSE)
        ->addWhere('id', '=', $webhook['payment_attempt_id'])
        ->execute()->first();

      if (!$attempt) {
        return 'noop';
      }

      // 2. Process payment (processor-specific logic)
      $eventData = $params['event_data'];
      $paymentIntent = $eventData['data']['object'];

      // 3. Complete contribution
      // Stripe amounts are in the smallest currency unit; dividing by 100
      // assumes a two-decimal currency.
      $completionService = \Civi::service('paymentprocessingcore.contribution_completion');
      $completionService->complete(
        $attempt['contribution_id'],
        $paymentIntent['id'],
        $paymentIntent['amount'] / 100
      );

      return 'applied'; // Result codes: applied, noop, ignored_out_of_order
    }
  }
```

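### Step 3: Register Handler

Register handlers in your extension's `ServiceContainer`:

```php
  use Symfony\Component\DependencyInjection\ContainerBuilder;
  use Symfony\Component\DependencyInjection\Definition;

  class ServiceContainer {
    // Assumed wiring: the container builder is injected by the extension
    private ContainerBuilder $container;

    public function __construct(ContainerBuilder $container) {
      $this->container = $container;
    }

    public function register(): void {
      // Register handler service
      $this->container->setDefinition(
        'stripe.payment_succeeded_handler',
        new Definition(StripePaymentSucceededHandler::class)
      )->setAutowired(TRUE);

      // Register with webhook handler registry
      $registry = $this->container->getDefinition(
        'paymentprocessingcore.webhook_handler_registry'
      );

      $registry->addMethodCall('registerHandler', [
        'stripe',                           // processor_type
        'payment_intent.succeeded',         // event_type
        'stripe.payment_succeeded_handler'  // service_id
      ]);
    }
  }
```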
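
To make the registration call concrete, here is a minimal sketch of what a registry with a `registerHandler()` method could look like. The class name and the `getHandlerServiceId()` lookup are assumptions for illustration; the actual `WebhookHandlerRegistry` in PaymentProcessingCore may be structured differently.

```php
<?php
/**
 * Illustrative registry mapping (processor type, event type) to a
 * container service ID. Not the extension's implementation.
 */
final class HandlerRegistrySketch {
  /** @var array<string, array<string, string>> */
  private array $handlers = [];

  public function registerHandler(string $processorType, string $eventType, string $serviceId): void {
    $this->handlers[$processorType][$eventType] = $serviceId;
  }

  /**
   * Returns the service ID for the event, or NULL when no handler is
   * registered (such events could then be skipped by the queue runner).
   */
  public function getHandlerServiceId(string $processorType, string $eventType): ?string {
    return $this->handlers[$processorType][$eventType] ?? NULL;
  }
}
```

Storing service IDs instead of handler instances means a handler is only built by the container when an event of its type is actually processed.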

### Step 4: That's It!

The infrastructure handles:

  • ✅ Duplicate detection (unique constraint on event_id + processor_type)
  • ✅ Async processing (queued in civicrm_queue)
  • ✅ Retry logic (exponential backoff: 5min, 15min, 45min)
  • ✅ Batch processing (250 events per processor per run)
  • ✅ Permanent error marking (after 3 failed attempts)
  • ✅ Stuck webhook recovery (resets processing webhooks older than 30 minutes by default)

@erawat erawat force-pushed the CIVIMM-426-webhook branch 2 times, most recently from ef3df89 to 65cf38f Compare November 27, 2025 16:32
erawat commented Nov 27, 2025

@codex review

@erawat erawat requested a review from Copilot November 27, 2025 16:33
Copilot AI left a comment

Pull request overview

This PR adds centralized webhook processing infrastructure for payment processors with automatic deduplication, retry logic, and queue management. The system eliminates code duplication across payment processor extensions by providing reusable base classes and services.

Key Changes:

  • Webhook queue infrastructure with exponential backoff retry (5min → 15min → 45min)
  • Auto-discovery of payment processors via dependency injection
  • Scheduled job processing all webhooks every 5 minutes with batch limiting

Reviewed changes

Copilot reviewed 22 out of 22 changed files in this pull request and generated 4 comments.

| File | Description |
|------|-------------|
| `xml/schema/CRM/Paymentprocessingcore/PaymentWebhook.xml` | Added retry fields (attempts, next_retry_at) and permanent_error status |
| `sql/auto_install.sql` | Added database columns and index for retry logic |
| `managed/Job_WebhookQueueRunner.mgd.php` | Scheduled job definition for webhook processing |
| `api/v3/WebhookQueueRunner/Run.php` | API endpoint for processing webhook queues |
| `Civi/Paymentprocessingcore/Webhook/WebhookHandlerInterface.php` | Interface contract for webhook handlers |
| `Civi/Paymentprocessingcore/Service/WebhookReceiverService.php` | Abstract base class for webhook receivers |
| `Civi/Paymentprocessingcore/Service/WebhookQueueService.php` | Queue management service |
| `Civi/Paymentprocessingcore/Service/WebhookQueueRunnerService.php` | Queue processing with retry logic |
| `Civi/Paymentprocessingcore/Service/WebhookHandlerRegistry.php` | Handler registry for auto-discovery |
| `CRM/Paymentprocessingcore/BAO/PaymentWebhook.php` | Database operations for webhook management |
| `tests/phpunit/TESTING.md` | Testing strategy documentation |
| `tests/phpunit/**/*Test.php` | Unit tests for webhook infrastructure |
| `README.md` | Documentation of webhook system |
| `CLAUDE.md` | Architecture documentation |

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@erawat erawat force-pushed the CIVIMM-426-webhook branch from 65cf38f to 21a7f5e Compare November 27, 2025 18:40
erawat commented Nov 27, 2025

@codex review

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

…k detection

Add processing_started_at timestamp field to track when webhooks enter processing state, fixing critical bug where resetStuckWebhooks() incorrectly used created_date.

Changes:
- Add processing_started_at field to PaymentWebhook schema
- Update updateStatusAtomic() to set timestamp when status becomes 'processing'
- Refactor updateStatusAtomic() to use API4 instead of raw SQL
- Fix resetStuckWebhooks() to use processing_started_at instead of created_date
- Add 4 comprehensive tests for processing_started_at functionality
- Regenerate DAO files with new schema
@erawat erawat force-pushed the CIVIMM-426-webhook branch from 6683bea to 79b6c2f Compare November 27, 2025 20:17
@erawat erawat merged commit edd5379 into master Nov 28, 2025
3 checks passed