The Link Extractor Tool is a Python-based web scraping utility that extracts all links (relative and absolute) from a webpage and its associated JavaScript files. It is designed for developers and researchers who need to analyze links or URLs embedded in websites.
- Extracts links from:
  - Webpage HTML
  - External JavaScript files
- Supports multiple browsers (Firefox, Chrome, Edge)
- Efficient multithreading for downloading and processing script files
- Saves results in a text file named after the website domain
- Simple, interactive command-line interface
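
The multithreaded download of script files can be sketched as follows. This is a minimal illustration, not the tool's actual code: `fetch_all` and its `fetch` parameter are hypothetical names, and the downloader is passed in as a function so the sketch does not depend on any particular HTTP library.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def fetch_all(urls: Iterable[str],
              fetch: Callable[[str], str],
              max_workers: int = 8) -> Dict[str, str]:
    """Fetch every URL concurrently (hypothetical helper).

    A download that raises is recorded as an empty string so that one
    broken script does not abort the whole run.
    """
    def safe_fetch(url: str) -> str:
        try:
            return fetch(url)
        except Exception:
            return ""

    url_list = list(urls)
    # pool.map preserves input order, so bodies line up with url_list.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        bodies = list(pool.map(safe_fetch, url_list))
    return dict(zip(url_list, bodies))
```

With Requests installed, a plausible `fetch` would be `lambda u: requests.get(u, timeout=10).text`.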
- Python 3.7 or higher
- Selenium
- Requests
- A compatible WebDriver for your browser:
  - Geckodriver for Firefox
  - Chromedriver for Chrome
  - Edgedriver (msedgedriver) for Edge
- Clone the repository:
    git clone https://github.com/Amirprx3/LinkeXtractor.git
    cd LinkeXtractor
- Install dependencies:
    pip install -r requirements.txt
- Download the WebDriver for your preferred browser and add it to your system's PATH.
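
To confirm that the WebDriver is actually reachable on your PATH before running the tool, a small check like the one below may help. It is a sketch that is not part of the repository: `find_driver` and `DRIVER_EXECUTABLES` are hypothetical names, and the executable names are the usual ones shipped with each driver.

```python
import shutil
from typing import Optional

# Usual executable name of each driver (assumed; adjust if yours differs).
DRIVER_EXECUTABLES = {
    "firefox": "geckodriver",
    "chrome": "chromedriver",
    "edge": "msedgedriver",
}


def find_driver(browser: str) -> Optional[str]:
    """Return the full path of the browser's driver, or None if it is not on PATH."""
    name = DRIVER_EXECUTABLES.get(browser.lower())
    if name is None:
        raise ValueError(f"Unsupported browser: {browser!r}")
    return shutil.which(name)


if __name__ == "__main__":
    for browser in DRIVER_EXECUTABLES:
        print(browser, "->", find_driver(browser) or "not found on PATH")
```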
- Run the script:
    python LinkeXtractor.py
- Select a browser:
  - firefox
  - chrome
  - edge
- Enter one or more website URLs to process. Type 'exit' when done.
- Extracted links will be saved in a text file named after the website's domain. For example:
  - Input: https://example.com
  - Output: example.com.txt
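
The mapping from a URL to its output file name can be sketched with the standard library. This is an assumption about how the name might be derived, not the script's confirmed implementation, and `output_filename` is a hypothetical helper.

```python
from urllib.parse import urlparse


def output_filename(url: str) -> str:
    """Derive '<domain>.txt' from a URL (hypothetical helper)."""
    domain = urlparse(url).netloc
    if not domain:
        # urlparse only fills netloc when the URL carries a scheme such as https://
        raise ValueError(f"URL has no domain part: {url!r}")
    return f"{domain}.txt"


print(output_filename("https://example.com"))  # example.com.txt
```

Note that a URL without a scheme (for example `example.com/page`) has an empty `netloc`, which is why the sketch rejects it instead of guessing.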
Input:
Enter the browser to use (firefox, chrome, edge): firefox
Enter the website URL (or type 'exit' to quit): https://example.com
Enter the website URL (or type 'exit' to quit): exit
Output:
example.com.txt:
- Change Link Regex: Modify the regex variable in collect_links to customize the type of links extracted.
- Set Max Threads: Adjust max_workers in the ThreadPoolExecutor for optimal performance based on your system.
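
As an illustration of the kind of pattern you might substitute, the sketch below matches quoted absolute (`http://`, `https://`) and root-relative (`/...`) links. The pattern and the `extract_links` helper are assumptions for demonstration only; the regex actually used in collect_links may differ.

```python
import re

# Hypothetical pattern: a quoted string that is either an absolute
# http(s) URL or a path starting with "/".
LINK_REGEX = re.compile(r"""["'](https?://[^"'\s]+|/[^"'\s]*)["']""")


def extract_links(text: str) -> list:
    """Return the unique links found in text, sorted for stable output."""
    return sorted(set(LINK_REGEX.findall(text)))


sample = (
    '<a href="https://example.com/a">x</a> '
    '<script src="/static/app.js"></script> '
    "fetch('/api/v1')"
)
print(extract_links(sample))
# ['/api/v1', '/static/app.js', 'https://example.com/a']
```

Narrowing the alternation (for instance, keeping only the `https?://` branch) restricts the results to absolute URLs.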
- Websites with complex JavaScript-based navigation (e.g., SPAs) may not load all links.
- Some links might not be accessible due to restrictions like CORS policies or CAPTCHA.
This project is licensed under the MIT License. See the LICENSE file for details.
Developed by Amirprx3.