This project scrapes documentation from the IBM website and writes it to a JSON document, preserving the formatting of the HTML.
The script can take over an hour to run, depending on the number of pages you are scraping.
- Python 3.12+
- requests-html
- BeautifulSoup4
To install the dependencies, run:
pip install requests-html
pip install beautifulsoup4
You can run the scripts from the command line, or use the included configurations to debug them in Visual Studio Code.
Clone this repository and open it in Visual Studio Code. On the left-hand side, open the debug tab and choose a configuration from the drop-down at the top. Then press the green run button or use the F5 keyboard shortcut.
- python reason_codes.py: parses the reason codes
- python sql_codes.py: parses the SQL codes
- python dsn_codes.py: parses the DSN codes
- python working.py: simpler script that leverages all the functionality but only scrapes one reason/SQL code and outputs it to a JSON document
- python parse_and_print.py: simpler script that leverages all the functionality but only scrapes one reason/SQL code and pretty prints it to the console
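
To illustrate what "scrape one code and pretty print it" might look like, here is a minimal sketch of turning a fragment of HTML into a JSON record while keeping the inline markup. It is not the code from this repository: it uses only the Python standard library (html.parser and json) instead of requests-html and BeautifulSoup4 so that it runs without installing anything, and the names SectionExtractor, html_to_record, and the sample HTML are hypothetical.

```python
import html
import json
from html.parser import HTMLParser


class SectionExtractor(HTMLParser):
    """Hypothetical helper: keep the <h1> text as a title and the inner
    markup of every <p> so inline formatting survives in the JSON."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.paragraphs = []  # inner HTML of each <p>, nested tags kept
        self._in_h1 = False
        self._buf = None      # list of fragments while inside a <p>

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self._in_h1 = True
        elif tag == "p":
            self._buf = []
        elif self._buf is not None:
            # Copy nested tags as they were written in the source.
            self._buf.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> have no separate end tag.
        if self._buf is not None:
            self._buf.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "h1":
            self._in_h1 = False
        elif tag == "p" and self._buf is not None:
            self.paragraphs.append("".join(self._buf).strip())
            self._buf = None
        elif self._buf is not None:
            self._buf.append(f"</{tag}>")

    def handle_data(self, data):
        if self._in_h1:
            self.title += data
        elif self._buf is not None:
            # Text arrives with entities decoded, so escape it again.
            self._buf.append(html.escape(data, quote=False))


def html_to_record(html_text):
    """Parse an HTML string and return a JSON-serializable dict."""
    parser = SectionExtractor()
    parser.feed(html_text)
    parser.close()
    return {"title": parser.title.strip(), "paragraphs": parser.paragraphs}


if __name__ == "__main__":
    # Made-up sample input, not real IBM documentation.
    sample = "<h1>Sample code</h1><p>Illegal <b>symbol</b> found.</p>"
    print(json.dumps(html_to_record(sample), indent=2))
```

The actual scripts fetch pages over the network, which is why a full run can take a long time; the sketch skips that step and works on a string that is already in memory.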
