Webscraper

This project scrapes documentation from the IBM website and writes it to a JSON document, preserving the formatting of the HTML.
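One way to read "preserving the formatting of the HTML" is to keep each element as a nested JSON node, so that bold text, line breaks and links survive the conversion. The sketch below shows that idea. It is an assumption and not the project's actual approach. It uses the standard library's html.parser instead of BeautifulSoup so it runs without extra dependencies, and the names HtmlToDict and html_to_json are hypothetical.

```python
import json
from html.parser import HTMLParser

# Void elements have no closing tag, so they are never pushed onto the stack.
VOID_TAGS = {"area", "base", "br", "col", "hr", "img", "input", "link", "meta", "source", "wbr"}


class HtmlToDict(HTMLParser):
    """Build a nested {"tag", "attrs", "children"} tree from an HTML fragment (hypothetical helper)."""

    def __init__(self) -> None:
        super().__init__()
        self.root = {"tag": "root", "attrs": {}, "children": []}
        self.stack = [self.root]  # currently open elements, innermost last

    def handle_starttag(self, tag, attrs):
        node = {"tag": tag, "attrs": dict(attrs), "children": []}
        self.stack[-1]["children"].append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_startendtag(self, tag, attrs):
        # Self-closing syntax such as <br/>: record the node without descending into it.
        self.stack[-1]["children"].append({"tag": tag, "attrs": dict(attrs), "children": []})

    def handle_endtag(self, tag):
        # Close back to the nearest matching open tag; stray closing tags are ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i]["tag"] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        # Keep text verbatim, but drop whitespace-only runs between tags.
        if data.strip():
            self.stack[-1]["children"].append(data)


def html_to_json(html: str) -> str:
    """Convert an HTML fragment to an indented JSON string."""
    parser = HtmlToDict()
    parser.feed(html)
    parser.close()
    return json.dumps(parser.root["children"], indent=2)


if __name__ == "__main__":
    print(html_to_json('<p>See <b>reason code</b><br>and <a href="#more">details</a>.</p>'))
```

Whitespace-only text between tags is dropped, so content that depends on significant whitespace (for example inside a pre element) would need extra handling.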

The script can take over an hour to run, depending on the number of pages being scraped.

Requirements

Python 3.12 or later

Dependencies

requests-html
BeautifulSoup4

To install the dependencies, run:

pip install requests-html
pip install beautifulsoup4
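Because a full run can take a long time, it may be worth confirming the interpreter version and both packages before starting. The helper below is a hypothetical sketch (check_environment is not part of this repository) and assumes the import names requests_html and bs4 for the two packages above.

```python
import importlib.util
import sys

# pip package name -> import name (they differ for BeautifulSoup4).
REQUIRED_MODULES = {"requests-html": "requests_html", "beautifulsoup4": "bs4"}


def check_environment(min_version=(3, 12)):
    """Return a list of problems; an empty list means the environment looks ready."""
    problems = []
    if sys.version_info < min_version:
        found = sys.version.split()[0]
        problems.append(f"Python {min_version[0]}.{min_version[1]}+ required, found {found}")
    for pip_name, module_name in REQUIRED_MODULES.items():
        # find_spec locates a top-level module without importing it.
        if importlib.util.find_spec(module_name) is None:
            problems.append(f"missing package: pip install {pip_name}")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("\n".join(issues) if issues else "Environment OK")
```

Running it prints either "Environment OK" or one line per problem found.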

Running the scripts

You can run the scripts from the command line, or debug them in Visual Studio Code using the included configurations.

Visual Studio Code

Clone this repository and open it in Visual Studio Code. Open the Run and Debug view from the Activity Bar on the left and choose a configuration from the drop-down at the top. Then press the green Start Debugging button or the F5 keyboard shortcut.

Command Line

python reason_codes.py - parses the reason codes
python sql_codes.py - parses the SQL codes
python dsn_codes.py - parses the DSN codes
python working.py - a simpler script that uses the same functionality but scrapes only one reason/SQL code and writes it to a JSON document
python parse_and_print.py - a simpler script that uses the same functionality but scrapes only one reason/SQL code and pretty-prints it to the console
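The last two scripts differ mainly in where the result goes. A minimal sketch of the two output styles, assuming each scraped code is held as a plain Python dict and serialized with the standard json module; write_json, pretty_print and the example record are hypothetical and not taken from the scripts.

```python
import json
from pathlib import Path


def write_json(record: dict, path: Path) -> None:
    """Write one scraped record to a JSON file (the kind of output working.py produces)."""
    path.write_text(json.dumps(record, indent=2, ensure_ascii=False), encoding="utf-8")


def pretty_print(record: dict) -> str:
    """Print the record as indented JSON (the kind of output parse_and_print.py produces)."""
    text = json.dumps(record, indent=2, ensure_ascii=False)
    print(text)
    return text


if __name__ == "__main__":
    # Placeholder record for illustration only; real records come from the scraper.
    example = {"code": "example-code", "text": ["first paragraph", "second paragraph"]}
    pretty_print(example)
    write_json(example, Path("example_output.json"))
```

Passing ensure_ascii=False keeps any non-ASCII characters readable in the output instead of escaping them.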
