

@TobeTek (Collaborator) commented May 4, 2025

…ADME

Summary by Sourcery

Add scraping functionality for DeepLearning.ai courses to the course scraper module

New Features:

  • Implement scraping of DeepLearning.ai courses from Algolia index
  • Create data models for storing course information
  • Generate CSV files for courses and learning pathways

Enhancements:

  • Update README to mark DeepLearning.ai courses as completed in scraping roadmap

@TobeTek requested a review from neomatrix369 as a code owner May 4, 2025 23:48

sourcery-ai bot (Contributor) commented May 4, 2025

Reviewer's Guide

Implemented a new scraper for DeepLearning.ai courses by querying their Algolia API endpoint. The script fetches course data, parses it using Pydantic models, and saves the results into two distinct CSV files.

Sequence diagram for DeepLearning.ai course scraping

sequenceDiagram
    participant Script as scrape_all_courses.py
    participant Algolia
    participant CSV Files

    Script->>Algolia: POST /queries (fetch page 0)
    activate Algolia
    Algolia-->>Script: Course data (hits) + nbPages
    deactivate Algolia

    loop Fetch all pages
        Script->>Algolia: POST /queries (fetch page N)
        activate Algolia
        Algolia-->>Script: Course data (hits)
        deactivate Algolia
    end

    Script->>Script: Parse course data (parse_algolia_data)
    Script->>CSV Files: Write Courses_and_Learning_Materials.csv
    Script->>CSV Files: Write Learning_Pathway_Index.csv
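The pagination in the sequence diagram (fetch page 0, read `nbPages`, then fetch the remaining pages) could look roughly like this minimal sketch. It assumes Algolia's public multi-query REST endpoint (`POST https://{app-id}-dsn.algolia.net/1/indexes/*/queries`); the index name, the `params` string and the helper names `build_queries_request`, `make_http_fetcher` and `fetch_all_hits` are hypothetical and are not taken from `scrape_all_courses.py`.

```python
import json
import urllib.request
from typing import Any, Callable, Dict, List

# Hypothetical: the real scraper's index name and query parameters may differ.
QUERIES_URL_TEMPLATE = "https://{host_id}-dsn.algolia.net/1/indexes/*/queries"


def build_queries_request(
    app_id: str, api_key: str, index_name: str, page: int
) -> urllib.request.Request:
    """Build one POST to Algolia's multi-query /queries endpoint for a single page."""
    body = {"requests": [{"indexName": index_name, "params": f"page={page}"}]}
    return urllib.request.Request(
        QUERIES_URL_TEMPLATE.format(host_id=app_id.lower()),
        data=json.dumps(body).encode("utf-8"),
        headers={
            "X-Algolia-Application-Id": app_id,
            "X-Algolia-API-Key": api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def make_http_fetcher(
    app_id: str, api_key: str, index_name: str
) -> Callable[[int], Dict[str, Any]]:
    """Return a function that fetches one page over HTTP and returns its result object."""

    def fetch_page(page: int) -> Dict[str, Any]:
        request = build_queries_request(app_id, api_key, index_name, page)
        with urllib.request.urlopen(request, timeout=30) as response:
            payload = json.load(response)
        # /queries answers with one entry in "results" per request sent.
        return payload["results"][0]

    return fetch_page


def fetch_all_hits(fetch_page: Callable[[int], Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Fetch page 0, read nbPages from it, then fetch the remaining pages and collect all hits."""
    first = fetch_page(0)
    hits: List[Dict[str, Any]] = list(first.get("hits", []))
    for page in range(1, first.get("nbPages", 1)):
        hits.extend(fetch_page(page).get("hits", []))
    return hits
```

Passing the page fetcher in as a function keeps the pagination loop testable without network access; a real run would call `fetch_all_hits(make_http_fetcher(app_id, api_key, index_name))`.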

Class diagram for new DeepLearning.ai data models

classDiagram
    class Course {
        +str Module_Code
        +str Source
        +Optional[str] Course_Level
        +Optional[str] Duration
        +Optional[str] Prerequisites
        +Optional[str] Prework
        +str Course_Learning_Material
        +str Course_Learning_Material_Link
        +str Type_Free_Paid
    }

    class CourseIndex {
        +str Module_Code
        +str Course_Learning_Material
        +str Source
        +str Course_Level
        +str Type_Free_Paid
        +str Module
        +Optional[float] Duration
        +Optional[str] Difficulty_Level
        +Optional[str] Keywords_Tags_Skills_Interests_Categories
        +str Links
    }
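To show how records shaped like the `Course` model end up in `Courses_and_Learning_Materials.csv`, here is a minimal sketch. It uses a stdlib `dataclass` as a stand-in for the PR's Pydantic model so it runs without extra dependencies; the column order follows the class diagram, and `write_courses_csv` is a hypothetical helper, not the script's actual code.

```python
import csv
from dataclasses import asdict, dataclass
from typing import Iterable, Optional, TextIO

# Column order taken from the Course class diagram.
COURSE_COLUMNS = [
    "Module_Code", "Source", "Course_Level", "Duration", "Prerequisites",
    "Prework", "Course_Learning_Material", "Course_Learning_Material_Link", "Type_Free_Paid",
]


@dataclass
class Course:
    """Stdlib stand-in for the Pydantic Course model; optional fields default to None."""

    Module_Code: str
    Source: str
    Course_Learning_Material: str
    Course_Learning_Material_Link: str
    Type_Free_Paid: str
    Course_Level: Optional[str] = None
    Duration: Optional[str] = None
    Prerequisites: Optional[str] = None
    Prework: Optional[str] = None


def write_courses_csv(courses: Iterable[Course], out: TextIO) -> None:
    """Write a header row plus one row per course; None values become empty cells."""
    writer = csv.DictWriter(out, fieldnames=COURSE_COLUMNS)
    writer.writeheader()
    for course in courses:
        # DictWriter orders each row by COURSE_COLUMNS, not by dataclass field order.
        writer.writerow(asdict(course))
```

Against a real file this would be called inside `with open("Courses_and_Learning_Materials.csv", "w", newline="", encoding="utf-8") as f:`, since the `csv` module expects `newline=""` on files it writes.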

File-Level Changes

Added DeepLearning.ai course scraper.
  • Created a script to fetch course data from the DeepLearning.ai Algolia API.
  • Implemented pagination to retrieve all course results.
  • Parsed the API response data.
  • Saved the parsed data into two CSV files ('Courses_and_Learning_Materials' and 'Learning_Pathway_Index').
  • Added logging and basic error handling.
  File: app/course-scraper/src/scrapers/deeplearning/scrape_all_courses.py

Defined data models for DeepLearning.ai courses.
  • Created Pydantic models Course and CourseIndex to structure the scraped data.
  File: app/course-scraper/src/scrapers/deeplearning/models.py

Updated documentation.
  • Marked 'Deeplearning.ai Courses' as completed in the README checklist.
  File: app/course-scraper/README.md
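The change list mentions "logging and basic error handling" without details. One possible shape, sketched here under assumptions, is a per-page retry wrapper with linear backoff that logs each failure and gives up with `None` so the rest of the pages can still be scraped. The function name, logger name, attempt count and backoff values are all hypothetical.

```python
import logging
import time
from typing import Any, Callable, Dict, Optional

logger = logging.getLogger("deeplearning_scraper")  # hypothetical logger name


def fetch_page_with_retries(
    fetch_page: Callable[[int], Dict[str, Any]],
    page: int,
    attempts: int = 3,
    backoff_seconds: float = 2.0,
) -> Optional[Dict[str, Any]]:
    """Call fetch_page(page), retrying failed attempts; return None once every attempt fails."""
    for attempt in range(1, attempts + 1):
        try:
            result = fetch_page(page)
        except (OSError, ValueError) as exc:
            # urllib's URLError/HTTPError and socket timeouts subclass OSError;
            # a malformed JSON body raises json.JSONDecodeError, a ValueError.
            logger.warning("Page %d, attempt %d/%d failed: %s", page, attempt, attempts, exc)
            if attempt < attempts:
                time.sleep(backoff_seconds * attempt)  # linear backoff between attempts
            continue
        logger.info("Fetched page %d (%d hits)", page, len(result.get("hits", [])))
        return result
    logger.error("Giving up on page %d after %d attempts", page, attempts)
    return None
```

Returning `None` instead of raising is a design choice: a caller can skip a page that keeps failing and still write the CSVs, at the cost of a possibly incomplete dataset that the error log should make visible.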


sourcery-ai bot (Contributor) left a comment

Hey @TobeTek - I've reviewed your changes and found some issues that need to be addressed.

Blocking issues:

  • Hardcoded Algolia API key and Application ID

Overall comments:

  • Move the hardcoded Algolia URL, API key, and application ID to a configuration file or environment variables.
  • Consider consolidating the Course and CourseIndex data models and their corresponding CSV outputs.
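As an illustration of the environment-variable option, here is a minimal sketch that loads the Algolia settings at startup and fails fast when any are missing. The variable names (`ALGOLIA_APP_ID`, `ALGOLIA_API_KEY`, `ALGOLIA_INDEX_NAME`, optional `ALGOLIA_QUERIES_URL`) and the `load_algolia_config` helper are hypothetical; the default URL assumes Algolia's standard `{app-id}-dsn.algolia.net` search host.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

REQUIRED_VARS = ("ALGOLIA_APP_ID", "ALGOLIA_API_KEY", "ALGOLIA_INDEX_NAME")  # hypothetical names


@dataclass(frozen=True)
class AlgoliaConfig:
    app_id: str
    api_key: str
    index_name: str
    queries_url: str


def load_algolia_config(env: Optional[Mapping[str, str]] = None) -> AlgoliaConfig:
    """Build the scraper's Algolia settings from environment variables instead of source constants."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required environment variables: " + ", ".join(missing))
    app_id = env["ALGOLIA_APP_ID"]
    # Optional override; otherwise derive the standard Algolia DSN host from the app ID.
    queries_url = (
        env.get("ALGOLIA_QUERIES_URL")
        or f"https://{app_id.lower()}-dsn.algolia.net/1/indexes/*/queries"
    )
    return AlgoliaConfig(
        app_id=app_id,
        api_key=env["ALGOLIA_API_KEY"],
        index_name=env["ALGOLIA_INDEX_NAME"],
        queries_url=queries_url,
    )
```

Accepting an optional mapping keeps the loader testable; in normal use `scrape_all_courses.py` would call `load_algolia_config()` once and read the values from the shell or CI secrets.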

Here's what I looked at during the review
  • 🟡 General issues: 2 issues found
  • 🔴 Security: 1 blocking issue
  • 🟢 Testing: all looks good
  • 🟢 Documentation: all looks good


@TobeTek force-pushed the deeplearning-ai-scrapper branch from f042314 to d5b82e9 May 13, 2025 19:48

Labels

None yet

Projects

None yet


2 participants