Link Extractor Tool

Overview

The Link Extractor Tool is a Python web-scraping utility that extracts links, both relative and absolute, from a webpage and from the external JavaScript files it loads. It is intended for developers and researchers who need to analyze the URLs embedded in a website.

Features

- Extracts links from:
  - Webpage HTML
  - External JavaScript files
- Supports multiple browsers (Firefox, Chrome, Edge)
- Downloads and processes script files concurrently using multiple threads
- Saves results in a text file named after the website's domain
- Simple interactive command-line interface
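As a rough sketch of how external scripts can be located, the snippet below parses an HTML string with Python's standard html.parser module and resolves each script src attribute against the page URL with urljoin. This is an assumption about the approach, not the code in LinkeXtractor.py: the tool itself loads pages through Selenium, and ScriptSrcParser and find_script_urls are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ScriptSrcParser(HTMLParser):
    """Hypothetical helper: collects the src attribute of every <script> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "script":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)


def find_script_urls(page_url, html):
    """Return absolute URLs of the external scripts referenced by a page."""
    parser = ScriptSrcParser()
    parser.feed(html)
    parser.close()
    # Resolve relative paths such as "/js/app.js" against the page URL.
    return [urljoin(page_url, src) for src in parser.sources]
```

For a page at https://example.com/page that references /js/app.js, find_script_urls returns https://example.com/js/app.js, the absolute URL a downloader would then fetch. Inline scripts without a src attribute are skipped.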

Requirements

- Python 3.7 or higher
- Selenium
- Requests
- A WebDriver compatible with your browser:
  - geckodriver for Firefox
  - ChromeDriver for Chrome
  - Microsoft Edge WebDriver (msedgedriver) for Edge

Installation

Clone the repository:

```
git clone https://github.com/Amirprx3/LinkeXtractor.git
cd LinkeXtractor
```

Install dependencies:
```
pip install -r requirements.txt
```
Download the WebDriver for your preferred browser and add it to your system's PATH.

Usage

Run the script:
```
python LinkeXtractor.py
```
Select a browser:
- firefox
- chrome
- edge
Enter one or more website URLs to process. Type exit when done.
Extracted links will be saved in a text file named after the website's domain. For example:
- Input: https://example.com
- Output: example.com.txt
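A minimal sketch of deriving the output file name from a URL, assuming the name is simply the URL's host name followed by .txt. The helper output_filename is hypothetical, and whether the tool keeps a www. prefix is not documented here. Using hostname rather than netloc drops any port number, which also avoids a colon in the file name (Windows does not allow colons in file names).

```python
from urllib.parse import urlparse


def output_filename(url):
    """Hypothetical helper: map a website URL to its results file name."""
    # .hostname is lowercased and excludes the port and any credentials.
    host = urlparse(url).hostname
    if not host:
        # e.g. "example.com" without a scheme parses as a path, not a host.
        raise ValueError(f"URL has no host: {url!r}")
    return f"{host}.txt"
```

With this sketch, output_filename("https://example.com") returns "example.com.txt", matching the example above.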

Example

Input:

```
Enter the browser to use (firefox, chrome, edge): firefox
Enter the website URL (or type 'exit' to quit): https://example.com
Enter the website URL (or type 'exit' to quit): exit
```

Output:

The links extracted from https://example.com are saved to example.com.txt.

Customization

- Change the link regex: edit the regex variable in collect_links to control which kinds of links are extracted.
- Set the maximum number of threads: adjust max_workers in the ThreadPoolExecutor to suit your system and network connection.
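The sketch below shows how these two settings typically interact. It is not the code from LinkeXtractor.py: LINK_REGEX, links_from_scripts, and the fetch parameter are hypothetical, and collect_links here is a stand-in whose pattern (quoted absolute URLs or root-relative paths) is only an example. Script bodies are downloaded concurrently by a ThreadPoolExecutor and scanned with the regex; fetch is passed in so the download method can be swapped out.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Hypothetical pattern: quoted absolute http(s) URLs or root-relative paths.
# The real regex in collect_links may differ; adjust it to the links you need.
LINK_REGEX = re.compile(r"""["'](https?://[^"'\s]+|/[^"'\s]+)["']""")


def collect_links(text, pattern=LINK_REGEX):
    """Stand-in for collect_links: return the set of matched link strings."""
    return set(pattern.findall(text))


def links_from_scripts(script_urls, fetch, max_workers=8):
    """Download scripts concurrently with fetch(url) -> str and merge their links."""
    links = set()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order and re-raises any fetch error.
        for body in pool.map(fetch, script_urls):
            links |= collect_links(body)
    return links
```

In practice fetch could wrap Requests, for example lambda u: requests.get(u, timeout=10).text. Because the threads spend most of their time waiting on the network, raising max_workers usually speeds up downloads up to the point where bandwidth or the target server becomes the bottleneck.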

Known Issues

Websites with complex JavaScript-driven navigation, such as single-page applications (SPAs), may not expose all of their links.
Some links may be unreachable because of access restrictions such as CORS policies or CAPTCHAs.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Author

Developed by Amirprx3.
