feat: Add DeepLearning AI course scraping functionality and update RE… #126
base: main
Reviewer's Guide
Implemented a new scraper for DeepLearning.ai courses that queries the site's Algolia API endpoint. The script fetches course data, parses it into Pydantic models, and saves the results into two CSV files: Courses_and_Learning_Materials.csv and Learning_Pathway_Index.csv.
Sequence diagram for DeepLearning.ai course scraping
sequenceDiagram
participant Script as scrape_all_courses.py
participant Algolia
participant CSV Files
Script->>Algolia: POST /queries (fetch page 0)
activate Algolia
Algolia-->>Script: Course data (hits) + nbPages
deactivate Algolia
loop Fetch all pages
Script->>Algolia: POST /queries (fetch page N)
activate Algolia
Algolia-->>Script: Course data (hits)
deactivate Algolia
end
Script->>Script: Parse course data (parse_algolia_data)
Script->>CSV Files: Write Courses_and_Learning_Materials.csv
Script->>CSV Files: Write Learning_Pathway_Index.csv
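The fetch loop in the diagram (request page 0, read nbPages, then request the remaining pages) can be sketched as follows. This is a minimal illustration, not the PR's code: it assumes Algolia's public multi-query endpoint (`/1/indexes/*/queries`), and the index name, `hitsPerPage` value, and the injectable `post` transport are hypothetical choices made so the paging logic can be exercised without network access.

```python
"""Sketch: page through an Algolia index via the multi-query endpoint.

Assumptions (not taken from the PR): the index name, hitsPerPage value,
and the injectable `post` transport are illustrative only.
"""
import json
import urllib.parse
import urllib.request
from typing import Any, Callable, Dict, List

# A transport takes (url, headers, body) and returns the decoded JSON reply.
Transport = Callable[[str, Dict[str, str], Dict[str, Any]], Dict[str, Any]]


def urllib_post(url: str, headers: Dict[str, str], body: Dict[str, Any]) -> Dict[str, Any]:
    """Default transport: POST a JSON body and decode the JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={**headers, "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def fetch_page(app_id: str, api_key: str, index_name: str, page: int,
               post: Transport = urllib_post) -> Dict[str, Any]:
    url = f"https://{app_id}-dsn.algolia.net/1/indexes/*/queries"
    headers = {"X-Algolia-Application-Id": app_id, "X-Algolia-API-Key": api_key}
    params = urllib.parse.urlencode({"query": "", "page": page, "hitsPerPage": 100})
    body = {"requests": [{"indexName": index_name, "params": params}]}
    # The multi-query endpoint returns one result object per request sent.
    return post(url, headers, body)["results"][0]


def fetch_all_hits(app_id: str, api_key: str, index_name: str,
                   post: Transport = urllib_post) -> List[dict]:
    first = fetch_page(app_id, api_key, index_name, 0, post)
    hits = list(first["hits"])
    # nbPages from the first response says how many more pages to request.
    for page in range(1, first.get("nbPages", 1)):
        hits.extend(fetch_page(app_id, api_key, index_name, page, post)["hits"])
    return hits
```

Passing the transport as a parameter keeps the pagination logic testable with a fake that returns canned pages, while the default still performs real HTTP requests.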
Class diagram for new DeepLearning.ai data models
classDiagram
class Course {
+str Module_Code
+str Source
+Optional[str] Course_Level
+Optional[str] Duration
+Optional[str] Prerequisites
+Optional[str] Prework
+str Course_Learning_Material
+str Course_Learning_Material_Link
+str Type_Free_Paid
}
class CourseIndex {
+str Module_Code
+str Course_Learning_Material
+str Source
+str Course_Level
+str Type_Free_Paid
+str Module
+Optional[float] Duration
+Optional[str] Difficulty_Level
+Optional[str] Keywords_Tags_Skills_Interests_Categories
+str Links
}
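The two record types in the class diagram, and the step that writes them to CSV, can be sketched as below. This swaps the PR's Pydantic models for standard-library dataclasses so the sketch runs without dependencies; field names and order follow the diagram. The Algolia hit keys read in `course_from_hit` (`title`, `url`, `level`, and so on), the `"DeepLearning.AI"` source string, and the `"Free"` default are hypothetical, since the index schema is not shown in the PR.

```python
"""Sketch: Course / CourseIndex records (from the class diagram) plus CSV output.

Uses stdlib dataclasses instead of Pydantic. Hit keys and default values
in course_from_hit are hypothetical.
"""
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import IO, Iterable, Optional


@dataclass
class Course:
    # No defaults, so declaration order (and CSV column order) matches the diagram.
    Module_Code: str
    Source: str
    Course_Level: Optional[str]
    Duration: Optional[str]
    Prerequisites: Optional[str]
    Prework: Optional[str]
    Course_Learning_Material: str
    Course_Learning_Material_Link: str
    Type_Free_Paid: str


@dataclass
class CourseIndex:
    Module_Code: str
    Course_Learning_Material: str
    Source: str
    Course_Level: str
    Type_Free_Paid: str
    Module: str
    Duration: Optional[float]
    Difficulty_Level: Optional[str]
    Keywords_Tags_Skills_Interests_Categories: Optional[str]
    Links: str


def course_from_hit(hit: dict, module_code: str) -> Course:
    # Hypothetical hit keys; adjust to the real Algolia record schema.
    return Course(
        Module_Code=module_code,
        Source="DeepLearning.AI",
        Course_Level=hit.get("level"),
        Duration=hit.get("duration"),
        Prerequisites=hit.get("prerequisites"),
        Prework=None,
        Course_Learning_Material=hit.get("title", ""),
        Course_Learning_Material_Link=hit.get("url", ""),
        Type_Free_Paid=hit.get("price_type", "Free"),
    )


def write_csv(rows: Iterable, cls: type, fp: IO[str]) -> None:
    # Header comes from the dataclass fields; csv writes None as an empty cell.
    writer = csv.DictWriter(fp, fieldnames=[f.name for f in fields(cls)])
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))
```

In a real run `fp` would be a file opened with `newline=""`; an `io.StringIO` works the same way for testing.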
Hey @TobeTek - I've reviewed your changes and found some issues that need to be addressed.
Blocking issues:
- Hardcoded Algolia API key and Application ID.
- Move the hardcoded Algolia URL, API key, and application ID to a configuration file or environment variables.
- Consider consolidating the Course and CourseIndex data models and their corresponding CSV outputs.
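One way to address the hardcoded-credentials comment is to read the Algolia settings from environment variables and derive the endpoint URL from the application ID. This is a hedged sketch, not what the PR ended up doing: the variable names `ALGOLIA_APP_ID`, `ALGOLIA_API_KEY`, and `ALGOLIA_INDEX` and the `AlgoliaConfig` type are hypothetical.

```python
"""Sketch: load Algolia credentials from the environment instead of source code.

ALGOLIA_APP_ID / ALGOLIA_API_KEY / ALGOLIA_INDEX are hypothetical names.
"""
import os
from dataclasses import dataclass
from typing import Mapping, Optional

REQUIRED_VARS = ("ALGOLIA_APP_ID", "ALGOLIA_API_KEY", "ALGOLIA_INDEX")


@dataclass(frozen=True)
class AlgoliaConfig:
    app_id: str
    api_key: str
    index_name: str

    @property
    def queries_url(self) -> str:
        # Derived from the app ID, so the URL no longer needs to be hardcoded.
        return f"https://{self.app_id}-dsn.algolia.net/1/indexes/*/queries"


def load_algolia_config(env: Optional[Mapping[str, str]] = None) -> AlgoliaConfig:
    # Default to the process environment; a dict can be passed in for tests.
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        # Fail loudly instead of silently falling back to a baked-in key.
        raise RuntimeError(f"Missing required environment variables: {', '.join(missing)}")
    return AlgoliaConfig(env["ALGOLIA_APP_ID"], env["ALGOLIA_API_KEY"], env["ALGOLIA_INDEX"])
```

Accepting an optional mapping keeps the loader testable without mutating `os.environ`; a `.env` file loader could populate the same variables if the project prefers a configuration file.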
Here's what I looked at during the review
- 🟡 General issues: 2 issues found
- 🔴 Security: 1 blocking issue
- 🟢 Testing: all looks good
- 🟢 Documentation: all looks good
app/course-scraper/src/scrapers/deeplearning/scrape_all_courses.py (two review comments, both outdated and resolved)
Branch updated from f042314 to d5b82e9
Summary by Sourcery
Add scraping functionality for DeepLearning.ai courses to the course scraper module
New Features:
Enhancements: