@sitaowang1998
Collaborator

@sitaowang1998 sitaowang1998 commented May 12, 2025

Description

Users may want to stop a job from executing, either because a task has failed with no hope of recovery or for some other reason. Spider therefore needs to support job cancellation both from inside a task, through `TaskContext::abort`, and from the user side, through `Job::cancel`. A cancelled job ends up in the `JobStatus::Cancelled` state.

Users also need to find out why a job was cancelled. This PR adds a `job_errors` table to store cancellation messages. In `TaskContext::abort`, the message is supplied by the caller as an argument; in `Job::cancel`, it is set to "Job cancelled by user.". The table also stores the function name of the task that called `TaskContext::abort`, or `user` if the job was cancelled through `Job::cancel`. Users can retrieve both pieces of information with `Job::get_error`.
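
The behaviour described above can be pictured with a small, self-contained model. This is only a sketch: `MockJob`, `abort_from_task`, and the "first cancellation wins" rule are hypothetical stand-ins invented for illustration, not Spider's actual classes or guaranteed semantics, and the (offender, message) ordering of the returned pair is an assumption.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>

// Hypothetical stand-in types; these are NOT Spider's real classes.
enum class JobStatus { Running, Succeeded, Failed, Cancelled };

class MockJob {
public:
    // Models Job::cancel: the offender is recorded as "user" with a fixed message.
    auto cancel() -> void { record_cancel("user", "Job cancelled by user."); }

    // Models TaskContext::abort: the offender is the task's function name and the
    // message is whatever the task passed in.
    auto abort_from_task(std::string func_name, std::string message) -> void {
        record_cancel(std::move(func_name), std::move(message));
    }

    [[nodiscard]] auto get_status() const -> JobStatus { return m_status; }

    // Models Job::get_error: returns (offender, message); both empty if no error exists.
    [[nodiscard]] auto get_error() const -> std::pair<std::string, std::string> {
        return m_error.value_or(std::pair<std::string, std::string>{});
    }

private:
    auto record_cancel(std::string offender, std::string message) -> void {
        if (JobStatus::Running != m_status) {
            return;  // Assumption: only a running job is cancelled; the first reason is kept.
        }
        m_status = JobStatus::Cancelled;
        m_error.emplace(std::move(offender), std::move(message));
    }

    JobStatus m_status{JobStatus::Running};
    std::optional<std::pair<std::string, std::string>> m_error;
};
```

In this model, calling `cancel()` leaves the job in `JobStatus::Cancelled` with the pair `("user", "Job cancelled by user.")`, while `abort_from_task("parse_chunk", "input is corrupt")` records the task's function name and its own message instead.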

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and that has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

  • GitHub workflows pass.
  • Unit tests pass in dev container.
  • Integration tests pass in dev container.

Summary by CodeRabbit

  • New Features
    • Added job and task cancellation methods with detailed error message retrieval.
    • Introduced thread-safe management of task execution and cancellation control.
    • Extended database schema with a job_errors table to store cancellation details.
    • Added new task cancellation and abort test tasks.
  • Bug Fixes
    • Enhanced task state handling to properly reflect cancellations and error conditions.
  • Tests
    • Added comprehensive unit and integration tests for cancellation workflows.
    • Included a new cancellation test executable with command-line interface.
  • Refactor
    • Renamed task state check methods for improved clarity and consistency.
  • Chores
    • Updated build and test configurations to support new features and tests.

@sitaowang1998 sitaowang1998 requested a review from a team as a code owner May 12, 2025 20:22
@coderabbitai
Contributor

coderabbitai bot commented May 12, 2025

## Walkthrough

This update adds job and task cancellation features, including schema changes, storage API extensions, worker cancellation checks, and related tests. It also introduces thread-safe `ExecutorHandle` management and updates task executor state methods.

## Changes

| File(s)                                                                                 | Summary                                                                                                             |
|-----------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------|
| src/spider/CMakeLists.txt, src/spider/worker/ExecutorHandle.cpp, src/spider/worker/ExecutorHandle.hpp | Adds `ExecutorHandle` class and integrates it into the build process.                                              |
| src/spider/client/Job.hpp                                                               | Implements `cancel()` and `get_error()` methods for job cancellation and error retrieval.                          |
| src/spider/client/TaskContext.cpp                                                       | Adds `abort()` method to cancel a job from within a task, with immediate process termination.                      |
| src/spider/storage/MetadataStorage.hpp, src/spider/storage/mysql/MySqlStorage.hpp       | Extends storage API with job/task cancellation and error message retrieval methods.                                 |
| src/spider/storage/mysql/MySqlStorage.cpp                                               | Implements job/task cancellation and error retrieval in MySQL storage; updates `task_fail` logic.                  |
| src/spider/storage/mysql/mysql_stmt.hpp, tools/scripts/storage/init_db.sql              | Adds new `job_errors` table and corresponding SQL schema for storing job error messages.                           |
| src/spider/worker/TaskExecutor.cpp, src/spider/worker/TaskExecutor.hpp                  | Renames task state query methods; adds `cancelled()` method.                                                      |
| src/spider/worker/worker.cpp                                                            | Implements cooperative task cancellation in the worker, integrates with heartbeat, and updates control flow.        |
| tests/CMakeLists.txt, tests/client/cancel-test.cpp, tests/integration/test_cancel.py    | Adds new client and integration tests for cancellation; updates test build configuration.                          |
| tests/storage/test-MetadataStorage.cpp                                                  | Adds unit tests for job cancellation and cancellation by task.                                                     |
| tests/worker/worker-test.cpp, tests/worker/worker-test.hpp                              | Adds test tasks for sleep and abort scenarios; updates test task declarations.                                     |
| tests/worker/test-TaskExecutor.cpp                                                      | Updates test assertions to use new task executor state method names.                                               |
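
The cooperative cancellation that the table attributes to `worker.cpp` can be sketched roughly as follows. This is a simplified, single-threaded model under stated assumptions: `MockStorage`, `MockExecutor`, `heartbeat_tick`, and `run_task` are hypothetical names, and the real worker runs its heartbeat concurrently with the task rather than interleaving the two in one loop.

```cpp
#include <atomic>
#include <cassert>
#include <optional>

// Hypothetical stand-ins; the real worker talks to MetadataStorage and TaskExecutor.
enum class TaskState { Running, Cancelled, Succeeded };

struct MockStorage {
    TaskState task_state{TaskState::Running};
};

class MockExecutor {
public:
    auto cancel() -> void { m_cancelled.store(true); }
    [[nodiscard]] auto is_cancelled() const -> bool { return m_cancelled.load(); }

private:
    std::atomic<bool> m_cancelled{false};
};

// One heartbeat tick: poll the stored task state and forward a cancellation request.
auto heartbeat_tick(MockStorage const& storage, MockExecutor& executor) -> void {
    if (TaskState::Cancelled == storage.task_state) {
        executor.cancel();
    }
}

// Interleaves one heartbeat tick with one unit of work. `cancel_at_step` is the step at
// which some outside party (for example a user cancelling the job) marks the task as
// cancelled in storage; std::nullopt means nobody cancels.
auto run_task(int total_steps, std::optional<int> cancel_at_step) -> TaskState {
    MockStorage storage;
    MockExecutor executor;
    for (int step = 0; step < total_steps; ++step) {
        if (cancel_at_step.has_value() && step == *cancel_at_step) {
            storage.task_state = TaskState::Cancelled;
        }
        heartbeat_tick(storage, executor);
        if (executor.is_cancelled()) {
            return TaskState::Cancelled;  // The task stops at the next check point.
        }
        // ... one unit of task work would run here ...
    }
    storage.task_state = TaskState::Succeeded;
    return storage.task_state;
}
```

With these definitions, `run_task(5, 2)` ends in `TaskState::Cancelled`, and `run_task(5, std::nullopt)` runs to completion and ends in `TaskState::Succeeded`.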

## Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant Driver
    participant MetadataStorage
    participant Worker
    participant ExecutorHandle
    participant TaskExecutor

    Client->>Driver: submit_job()
    Driver->>MetadataStorage: create_job()
    MetadataStorage-->>Driver: job_id
    Driver-->>Client: Job

    Client->>Job: cancel()
    Job->>MetadataStorage: cancel_job(job_id, message)
    MetadataStorage-->>Job: result

    loop Heartbeat
        Worker->>ExecutorHandle: get_task_id()
        alt Task running
            Worker->>MetadataStorage: get_task_state(task_id)
            MetadataStorage-->>Worker: state
            alt state == Cancelled
                Worker->>ExecutorHandle: get_executor()
                Worker->>TaskExecutor: cancel()
            end
        end
    end

    Worker->>TaskExecutor: run()
    alt Task cancelled
        TaskExecutor-->>Worker: is_cancelled() == true
        Worker-->>MetadataStorage: update task/job state to Cancelled
    else Task completes
        Worker-->>MetadataStorage: update task/job state to Succeeded/Failed
    end
```

Suggested reviewers

  • davidlion

<!-- walkthrough_end -->


---

<details>
<summary>📜 Recent review details</summary>

**Configuration used: CodeRabbit UI**
**Review profile: CHILL**
**Plan: Pro**


<details>
<summary>📥 Commits</summary>

Reviewing files that changed from the base of the PR and between 18ce4e94f0d1975f71629c5d137630c5d89b1fa8 and 8015388a22453dda8ca5e0417c9a83ab585c480d.

</details>

<details>
<summary>📒 Files selected for processing (1)</summary>

* `src/spider/worker/TaskExecutor.hpp` (4 hunks)

</details>

<details>
<summary>🚧 Files skipped from review as they are similar to previous changes (1)</summary>

* src/spider/worker/TaskExecutor.hpp

</details>

<details>
<summary>⏰ Context from checks skipped due to timeout of 90000ms (3)</summary>

* GitHub Check: non-storage-unit-tests (ubuntu-24.04)
* GitHub Check: lint
* GitHub Check: non-storage-unit-tests (ubuntu-22.04)

</details>

</details>

@sitaowang1998 sitaowang1998 requested a review from davidlion May 29, 2025 20:28
Member

@davidlion davidlion left a comment

In this PR and future PRs, we need to try and do a better job of explaining what and why in PR descriptions and titles. This is important as we want other engineers in YScope to reference these PRs and understand them without needing to read all the code.

Currently, both the title and description just list the names of things that were added with little to no explanation on their motivation, usage, or design.

Comment on lines +148 to +153
        return;
    }

    if (spider::core::TaskState::Canceled != task_state) {
        return;
    }
Member

This function can fail so we should return an error code.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (6)
src/spider/worker/ExecutorHandle.hpp (2)

27-28: Move implementation comment to class documentation.

The raw pointer comment should be moved to the class documentation as suggested in previous reviews, since it explains an important design decision.

 /**
  * This class acts as a handle for thread-safe access to the task executor and task id.
  * It maintains a weak reference to the task executor to prevent multiple destructor calls and
- * ensures that access remains valid only while the executor itself is valid.
+ * ensures that access remains valid only while the executor itself is valid.
+ * 
+ * Note: Uses a raw pointer instead of std::shared_ptr to avoid calling the TaskExecutor 
+ * destructor multiple times.
  */
 class ExecutorHandle {
 public:
     [[nodiscard]] auto get_task_id() -> std::optional<boost::uuids::uuid>;
     auto executor_cancel() -> void;
     auto set(boost::uuids::uuid task_id, TaskExecutor* executor) -> void;
     auto clear() -> void;

 private:
     boost::uuids::uuid m_task_id;

-    // Do not use std::shared_ptr to avoid calling destructor twice.
     TaskExecutor* m_executor = nullptr;

     std::mutex m_mutex;

25-30: Follow coding guidelines for member variable ordering.

According to the coding guidelines referenced in past reviews, consider reordering member variables to follow the established pattern.

 private:
+    std::mutex m_mutex;
+    
     boost::uuids::uuid m_task_id;
-
-    // Do not use std::shared_ptr to avoid calling destructor twice.
     TaskExecutor* m_executor = nullptr;
-
-    std::mutex m_mutex;
src/spider/client/Job.hpp (2)

87-100: Refactor duplicated storage connection pattern.

This method contains the same storage connection acquisition pattern that was flagged in previous reviews for refactoring into a helper method.

The connection acquisition logic is repeated across multiple methods and should be extracted into a helper as discussed in previous reviews and tracked in issue #158.


173-197: Simplify branching logic and address code duplication.

The method has two issues:

  1. Contains the same duplicated storage connection pattern mentioned in previous reviews
  2. The branching logic for handling connection vs no-connection cases can be simplified

Consider refactoring to eliminate code duplication:

 auto get_error() -> std::pair<std::string, std::string> {
-    if (nullptr == m_conn) {
-        std::variant<std::unique_ptr<core::StorageConnection>, core::StorageErr> conn_result
-                = m_storage_factory->provide_storage_connection();
-        if (std::holds_alternative<core::StorageErr>(conn_result)) {
-            throw ConnectionException(std::get<core::StorageErr>(conn_result).description);
-        }
-        auto conn = std::move(std::get<std::unique_ptr<core::StorageConnection>>(conn_result));
-
-        std::pair<std::string, std::string> res;
-        core::StorageErr const err
-                = m_metadata_storage->get_error_message(*conn, m_id, &res.first, &res.second);
-        if (false == err.success()) {
-            throw ConnectionException{err.description};
-        }
-        return res;
-    }
-
+    auto* connection = m_conn.get();
+    std::unique_ptr<core::StorageConnection> temp_conn;
+    
+    if (nullptr == connection) {
+        // Use helper method once available (issue #158)
+        std::variant<std::unique_ptr<core::StorageConnection>, core::StorageErr> conn_result
+                = m_storage_factory->provide_storage_connection();
+        if (std::holds_alternative<core::StorageErr>(conn_result)) {
+            throw ConnectionException(std::get<core::StorageErr>(conn_result).description);
+        }
+        temp_conn = std::move(std::get<std::unique_ptr<core::StorageConnection>>(conn_result));
+        connection = temp_conn.get();
+    }
+    
     std::pair<std::string, std::string> res;
     core::StorageErr const err
-            = m_metadata_storage->get_error_message(*m_conn, m_id, &res.first, &res.second);
+            = m_metadata_storage->get_error_message(*connection, m_id, &res.first, &res.second);
     if (false == err.success()) {
         throw ConnectionException{err.description};
     }
     return res;
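
One possible shape for the helper mentioned above is a small scoped object that either borrows an existing connection or acquires a temporary one. The sketch below is not the helper tracked in issue #158; `Connection`, `StorageErr`, `ConnectionFactory`, and `ScopedConnection` are hypothetical stand-ins for illustration, and `std::runtime_error` takes the place of `ConnectionException`.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <variant>

// Hypothetical stand-ins for the storage connection, error, and factory types.
struct Connection {
    int id;
};

struct StorageErr {
    std::string description;
};

struct ConnectionFactory {
    bool fail{false};

    [[nodiscard]] auto provide() const -> std::variant<std::unique_ptr<Connection>, StorageErr> {
        if (fail) {
            return StorageErr{"cannot connect"};
        }
        return std::make_unique<Connection>(Connection{42});
    }
};

// Borrows `existing` when it is non-null; otherwise acquires a connection from the
// factory and owns it until this object goes out of scope.
class ScopedConnection {
public:
    ScopedConnection(Connection* existing, ConnectionFactory const& factory)
            : m_conn{existing} {
        if (nullptr != m_conn) {
            return;
        }
        auto result = factory.provide();
        if (std::holds_alternative<StorageErr>(result)) {
            throw std::runtime_error(std::get<StorageErr>(result).description);
        }
        m_owned = std::move(std::get<std::unique_ptr<Connection>>(result));
        m_conn = m_owned.get();
    }

    [[nodiscard]] auto get() const -> Connection& { return *m_conn; }

private:
    Connection* m_conn;
    std::unique_ptr<Connection> m_owned;
};
```

A method such as `get_error` could then construct one `ScopedConnection` at the top and use `get()` in a single code path, which removes both the duplicated acquisition logic and the connection/no-connection branching.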
src/spider/storage/mysql/MySqlStorage.cpp (2)

1079-1120: Address the validation concerns from previous reviews

This method still has the issues identified in previous reviews:

  1. No validation of affected rows: The UPDATE statements on lines 1094 and 1102 don't check if any rows were actually updated, meaning the method could silently "succeed" even for non-existent jobs.

  2. Potential duplicate error rows: The INSERT on line 1106 could create multiple identical error messages if called repeatedly for the same job.

Apply the suggested fixes from the previous review:

- task_statement->executeUpdate();
+ int32_t updated = task_statement->executeUpdate();
+ if (updated == 0) {
+     static_cast<MySqlConnection&>(conn)->rollback();
+     return StorageErr{StorageErrType::KeyNotFoundErr,
+                       fmt::format("No running job with id {}", boost::uuids::to_string(id))};
+ }

...

- job_statement->executeUpdate();
+ updated = job_statement->executeUpdate();
+ if (updated == 0) {
+     static_cast<MySqlConnection&>(conn)->rollback();
+     return StorageErr{StorageErrType::KeyNotFoundErr,
+                       fmt::format("No job with id {}", boost::uuids::to_string(id))};
+ }

- "INSERT INTO `job_errors` (`job_id`, `func_name`, `message`) VALUES (?, ?, ?)"
+ "INSERT IGNORE INTO `job_errors` (`job_id`, `func_name`, `message`) VALUES (?, ?, ?)"

1122-1179: Apply the same validation fixes as suggested for cancel_job_by_user

This method has the same validation issues as cancel_job_by_user:

  1. Lines 1153 and 1161: The UPDATE statements don't verify affected row counts
  2. Line 1165: Uses regular INSERT which could create duplicates

Apply the same defensive checks as suggested in the previous review for consistency.

🧹 Nitpick comments (1)
src/spider/storage/mysql/MySqlStorage.cpp (1)

1181-1209: Consider handling multiple error messages

The method only retrieves the first error message if multiple exist for the same job. Consider whether this is the intended behaviour or if you should:

  1. Return the most recent error message (ORDER BY a timestamp)
  2. Concatenate multiple messages
  3. Document that only the first message is returned

If you want the most recent message, modify the query:

- "SELECT `func_name`, `message` FROM `job_errors` WHERE `job_id` = ?"
+ "SELECT `func_name`, `message` FROM `job_errors` WHERE `job_id` = ? ORDER BY `id` DESC LIMIT 1"
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2f9d378 and a344f88.

📒 Files selected for processing (8)
  • src/spider/client/Job.hpp (2 hunks)
  • src/spider/storage/MetadataStorage.hpp (2 hunks)
  • src/spider/storage/mysql/MySqlStorage.cpp (3 hunks)
  • src/spider/storage/mysql/MySqlStorage.hpp (2 hunks)
  • src/spider/worker/ExecutorHandle.cpp (1 hunks)
  • src/spider/worker/ExecutorHandle.hpp (1 hunks)
  • tests/client/cancel-test.cpp (1 hunks)
  • tests/storage/test-MetadataStorage.cpp (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
  • tests/client/cancel-test.cpp
  • src/spider/storage/mysql/MySqlStorage.hpp
  • src/spider/worker/ExecutorHandle.cpp
  • src/spider/storage/MetadataStorage.hpp
  • tests/storage/test-MetadataStorage.cpp
🧰 Additional context used
🧬 Code Graph Analysis (2)
src/spider/client/Job.hpp (2)
src/spider/client/Exception.hpp (2)
  • ConnectionException (14-15)
  • ConnectionException (14-14)
src/spider/storage/MetadataStorage.hpp (16)
  • conn (27-27)
  • conn (29-29)
  • conn (30-30)
  • conn (32-33)
  • conn (35-36)
  • conn (40-43)
  • conn (45-51)
  • conn (53-54)
  • conn (56-57)
  • conn (59-60)
  • conn (62-66)
  • conn (69-70)
  • conn (72-76)
  • conn (87-88)
  • conn (99-100)
  • conn (110-115)
src/spider/storage/mysql/MySqlStorage.cpp (1)
src/spider/storage/mysql/MySqlStorage.hpp (16)
  • conn (38-38)
  • conn (39-39)
  • conn (40-40)
  • conn (41-42)
  • conn (43-44)
  • conn (45-50)
  • conn (51-57)
  • conn (58-59)
  • conn (60-61)
  • conn (62-63)
  • conn (64-68)
  • conn (69-70)
  • conn (71-75)
  • conn (77-78)
  • conn (80-81)
  • conn (82-87)
⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: lint
  • GitHub Check: non-storage-unit-tests (ubuntu-24.04)
  • GitHub Check: non-storage-unit-tests (ubuntu-22.04)
🔇 Additional comments (4)
src/spider/worker/ExecutorHandle.hpp (1)

12-16: Good improvement on class documentation.

The class documentation now properly explains the weak reference design rationale, addressing the previous feedback about moving important design comments to the class docstring.

src/spider/client/Job.hpp (1)

162-169: Documentation clearly explains the new error retrieval functionality.

The updated documentation properly explains the return value structure and the different sources of cancellation (user vs task).

src/spider/storage/mysql/MySqlStorage.cpp (2)

1557-1588: LGTM!

This method correctly implements task state retrieval with proper error handling and follows the established patterns in the codebase.


1873-1882: Excellent improvement to handle race conditions

The modification to only update tasks in 'running' state and check affected row counts is a great improvement that:

  1. Prevents invalid state transitions: Only fails tasks that are actually running
  2. Handles cancellation race conditions: Gracefully handles cases where tasks are cancelled before failing
  3. Optimizes unnecessary operations: Skips job state updates when no task was affected
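
The guarded transition praised here can be illustrated in memory with a compare-and-swap. This is an analogy rather than the MySQL implementation: `try_fail` is a hypothetical function, and its boolean result plays the role that the affected-row count plays for an `UPDATE` restricted to running tasks.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical in-memory task state; the real state lives in the database.
enum class TaskState { Running, Failed, Cancelled };

// Moves the task to Failed only if it is still Running. Returns true when the
// transition happened, which corresponds to "one row affected" in the SQL version.
auto try_fail(std::atomic<TaskState>& state) -> bool {
    TaskState expected = TaskState::Running;
    return state.compare_exchange_strong(expected, TaskState::Failed);
}
```

If the task was cancelled first, `try_fail` returns `false` and leaves the state as `Cancelled`, so the caller knows to skip any follow-up job state update.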

@sitaowang1998 sitaowang1998 changed the title from "feat: Add job cancellation." to "feat: Add job cancellation for early stop of the job execution." on Jun 26, 2025
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
src/spider/worker/ExecutorHandle.cpp (1)

19-24: Avoid holding the mutex while executing user code – potential deadlock

The method locks m_mutex and directly calls m_executor->cancel(). If cancel() (or any callback it triggers) tries to access the ExecutorHandle again, the thread will deadlock on m_mutex.

Refactor to follow the "lock-copy-unlock-act" idiom:

auto ExecutorHandle::cancel_executor() -> void {
-    std::lock_guard const lock_guard{m_mutex};
-    if (nullptr != m_executor) {
-        m_executor->cancel();
-    }
+    TaskExecutor* exec = nullptr;
+    {
+        std::lock_guard const lock_guard{m_mutex};
+        exec = m_executor;
+    }
+    if (exec != nullptr) {
+        exec->cancel();   // outside lock
+    }
}
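
The "lock-copy-unlock-act" idiom from the suggestion above can be demonstrated in isolation. The sketch below uses a hypothetical `Handle` class that guards a `std::function` callback instead of a `TaskExecutor*`, so that the re-entrant case can be exercised without any Spider types.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical handle guarding a callback with a non-recursive mutex.
class Handle {
public:
    auto set(std::function<void()> on_cancel) -> void {
        std::lock_guard<std::mutex> const lock{m_mutex};
        m_on_cancel = std::move(on_cancel);
    }

    [[nodiscard]] auto is_set() -> bool {
        std::lock_guard<std::mutex> const lock{m_mutex};
        return static_cast<bool>(m_on_cancel);
    }

    auto cancel() -> void {
        std::function<void()> callback;
        {
            // Lock, copy, unlock: the mutex is held only while copying the callback.
            std::lock_guard<std::mutex> const lock{m_mutex};
            callback = m_on_cancel;
        }
        if (callback) {
            callback();  // Act outside the lock, so the callback may re-enter the handle.
        }
    }

private:
    std::mutex m_mutex;
    std::function<void()> m_on_cancel;
};
```

A callback that calls `is_set()` on the same handle completes normally here, whereas it would try to relock an already held `std::mutex` if `cancel()` invoked it under the lock. One caveat when the copied value is a raw pointer, as in the suggested diff: the pointee must be guaranteed to outlive the call made after the lock is released.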
🧹 Nitpick comments (1)
tests/storage/test-MetadataStorage.cpp (1)

452-459: Enhance the documentation to describe task relationships

The current documentation could be clearer about the task graph structure and relationships.

Consider this more descriptive documentation:

/**
- * Create a common job cancel test setup. Create a job with a task graph that consists of two
- * parent tasks and one child task. Set the state of parent 1 to succeed. Parent 2 state remains
- * ready and child state remains pending.
+ * Creates a test job with a task dependency graph for cancellation testing.
+ * 
+ * Task graph structure:
+ *   parent_1 (p1) ──┐
+ *                   ├──> child_task
+ *   parent_2 (p2) ──┘
+ * 
+ * Initial states after setup:
+ *   - parent_1: Succeeded (with output "1.1")
+ *   - parent_2: Ready
+ *   - child_task: Pending (waiting for parent_2 to complete)
+ * 
  * @param storage
  * @param conn
- * @return A tuple containing the job_id, parent_1_id, parent_2_id, and child_task_id.
+ * @return A tuple containing (job_id, parent_1_id, parent_2_id, child_task_id)
  */
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a344f88 and e4da187.

📒 Files selected for processing (12)
  • src/spider/storage/MetadataStorage.hpp (2 hunks)
  • src/spider/storage/mysql/MySqlStorage.cpp (3 hunks)
  • src/spider/storage/mysql/MySqlStorage.hpp (2 hunks)
  • src/spider/storage/mysql/mysql_stmt.hpp (2 hunks)
  • src/spider/worker/ExecutorHandle.cpp (1 hunks)
  • src/spider/worker/ExecutorHandle.hpp (1 hunks)
  • src/spider/worker/TaskExecutor.cpp (4 hunks)
  • src/spider/worker/TaskExecutor.hpp (4 hunks)
  • src/spider/worker/worker.cpp (7 hunks)
  • tests/storage/test-MetadataStorage.cpp (2 hunks)
  • tests/worker/worker-test.cpp (3 hunks)
  • tools/scripts/storage/init_db.sql (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (9)
  • tests/worker/worker-test.cpp
  • tools/scripts/storage/init_db.sql
  • src/spider/storage/mysql/mysql_stmt.hpp
  • src/spider/worker/TaskExecutor.hpp
  • src/spider/worker/TaskExecutor.cpp
  • src/spider/storage/mysql/MySqlStorage.hpp
  • src/spider/worker/worker.cpp
  • src/spider/storage/MetadataStorage.hpp
  • src/spider/storage/mysql/MySqlStorage.cpp
🧰 Additional context used
🧬 Code Graph Analysis (2)
src/spider/worker/ExecutorHandle.cpp (2)
src/spider/worker/TaskExecutor.cpp (2)
  • get_task_id (21-23)
  • get_task_id (21-21)
src/spider/worker/ExecutorHandle.hpp (1)
  • executor (21-21)
tests/storage/test-MetadataStorage.cpp (3)
src/spider/storage/MetadataStorage.hpp (16)
  • conn (27-27)
  • conn (29-29)
  • conn (30-30)
  • conn (32-33)
  • conn (35-36)
  • conn (40-43)
  • conn (45-51)
  • conn (53-54)
  • conn (56-57)
  • conn (59-60)
  • conn (62-66)
  • conn (69-70)
  • conn (72-76)
  • conn (87-88)
  • conn (100-101)
  • conn (112-117)
src/spider/core/Data.hpp (1)
  • gen (40-43)
src/spider/client/Job.hpp (1)
  • conn_result (87-100)
⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: non-storage-unit-tests (ubuntu-24.04)
  • GitHub Check: non-storage-unit-tests (ubuntu-22.04)
  • GitHub Check: lint
🔇 Additional comments (2)
src/spider/worker/ExecutorHandle.hpp (1)

12-40: Well-designed thread-safe singleton implementation

The class documentation clearly explains the design rationale, and the implementation properly enforces singleton semantics by deleting all constructors and assignment operators. The explanation for using raw pointers instead of std::shared_ptr is helpful.

tests/storage/test-MetadataStorage.cpp (1)

506-578: Well-structured cancellation tests

Both test cases comprehensively verify the cancellation functionality:

  • Proper state transitions for all tasks
  • Correct error message and offender recording
  • Appropriate cleanup after tests

The use of the helper function reduces duplication effectively.

@sitaowang1998 sitaowang1998 changed the title from "feat: Add job cancellation for early stop of the job execution." to "feat: Add job cancellation for early stop of the job execution (#137)." on Jun 26, 2025
@sitaowang1998 sitaowang1998 changed the title from "feat: Add job cancellation for early stop of the job execution (#137)." to "feat: Add job cancellation for early stop of the job execution." on Jun 26, 2025
@sitaowang1998 sitaowang1998 requested a review from davidlion June 26, 2025 17:49