
Amirprx3/LinkeXtractor


Link Extractor Tool


Overview

The Link Extractor Tool is a Python web-scraping utility that extracts links, both relative and absolute, from a webpage and from the external JavaScript files it loads. It uses Selenium to drive a real browser. It is intended for developers and researchers who need to analyze the links and URLs embedded in websites.
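A relative link such as /about only becomes usable once it is resolved against the page it came from. The following is a minimal sketch of that step using the standard library's urllib.parse.urljoin. The helper names to_absolute and is_absolute are hypothetical and are not taken from LinkeXtractor.py.

```python
from urllib.parse import urljoin, urlparse


def to_absolute(page_url: str, link: str) -> str:
    """Resolve a (possibly relative) link against the page it was found on.

    Hypothetical helper for illustration; absolute links pass through unchanged.
    """
    return urljoin(page_url, link)


def is_absolute(link: str) -> bool:
    """True if the link carries its own scheme and host, e.g. https://host/path."""
    parsed = urlparse(link)
    return bool(parsed.scheme and parsed.netloc)


if __name__ == "__main__":
    page = "https://example.com/docs/index.html"
    for link in ["/about", "guide.html", "../img/logo.png", "https://other.org/x"]:
        print(f"{link:22} -> {to_absolute(page, link)}  absolute={is_absolute(link)}")
```

Because urljoin follows the standard URL resolution rules, it handles root-relative paths (/about), document-relative paths (guide.html) and parent references (../) uniformly, so no hand-written string manipulation is needed.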

Features

  • Extracts links from:
    • The webpage's HTML
    • External JavaScript files referenced by the page
  • Supports Firefox, Chrome, and Edge
  • Downloads and processes script files concurrently using a thread pool
  • Saves results to a text file named after the website's domain
  • Simple interactive command-line interface
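Downloading script files is I/O-bound, which is why a thread pool helps here. The threads spend most of their time waiting on the network, so Python's GIL is not the bottleneck. The sketch below is based on a few assumptions. It expects the script URLs to have been collected from the page already. It uses the standard library's urllib.request as a stand-in HTTP client, and the real tool may use a different one. The names fetch_scripts and fetch_text are hypothetical. The max_workers parameter corresponds to the setting described under Customization.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable
from urllib.request import urlopen


def fetch_text(url: str, timeout: float = 10.0) -> str:
    """Download one script file and decode it as text (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def fetch_scripts(
    urls: Iterable[str],
    fetch: Callable[[str], str] = fetch_text,
    max_workers: int = 8,
) -> Dict[str, str]:
    """Fetch many script URLs concurrently; failed downloads map to ''."""
    unique = list(dict.fromkeys(urls))  # drop duplicate URLs, keep first-seen order

    def safe_fetch(url: str) -> str:
        try:
            return fetch(url)
        except Exception:
            return ""  # assumed policy: skip unreachable scripts instead of aborting

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, so they line up with `unique`
        return dict(zip(unique, pool.map(safe_fetch, unique)))
```

Passing fetch as a parameter keeps the concurrency logic separate from the network code, so the function can be exercised with a fake fetcher. The text of each downloaded script can then be scanned for links in the same way as the page HTML.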

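Requirements

  • Python 3 with the packages listed in requirements.txt (including Selenium)
  • Firefox, Chrome, or Edge, plus the matching WebDriver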

Installation

  1. Clone the repository:

    git clone https://github.com/Amirprx3/LinkeXtractor.git
    cd LinkeXtractor
  2. Install dependencies:

    pip install -r requirements.txt
  3. Download the WebDriver for your preferred browser (geckodriver for Firefox, ChromeDriver for Chrome, or msedgedriver for Edge) and add it to your system's PATH.
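Once the driver is on PATH, Selenium can start the matching browser. With Selenium 4.6 or later, the bundled Selenium Manager can usually locate or download a suitable driver on its own, so the manual PATH step may not be strictly necessary. The following is a minimal sketch of how the browser name typed at the prompt might map onto Selenium's webdriver.Firefox, webdriver.Chrome, and webdriver.Edge classes. The names normalize_browser and start_browser are hypothetical. The real script may also configure driver options differently, for example by running the browser in headless mode.

```python
SUPPORTED_BROWSERS = ("firefox", "chrome", "edge")


def normalize_browser(name: str) -> str:
    """Validate the browser name entered at the prompt (ignores case and spaces)."""
    choice = name.strip().lower()
    if choice not in SUPPORTED_BROWSERS:
        raise ValueError(
            f"unsupported browser {name!r}; choose from {', '.join(SUPPORTED_BROWSERS)}"
        )
    return choice


def start_browser(name: str):
    """Start a Selenium WebDriver session for the chosen browser (hypothetical helper)."""
    # Imported lazily so that name validation works even without Selenium installed.
    from selenium import webdriver

    drivers = {
        "firefox": webdriver.Firefox,
        "chrome": webdriver.Chrome,
        "edge": webdriver.Edge,
    }
    return drivers[normalize_browser(name)]()
```

A typical session calls driver.get(url), reads the rendered HTML from driver.page_source, and finally calls driver.quit() in a finally block so that the browser process is always closed.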

Usage

  1. Run the script:

    python LinkeXtractor.py
  2. Select a browser:

    • firefox
    • chrome
    • edge
  3. Enter one or more website URLs to process. Type exit when done.

  4. Extracted links will be saved in a text file named after the website's domain. For example:

    • Input: https://example.com
    • Output: example.com.txt
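The prompt loop and the output naming rule described above can be sketched as follows. The read parameter stands in for input, so the loop can run without a terminal. The names prompt_urls, output_filename, and save_links are hypothetical. This sketch takes the domain from urlparse(...).hostname, which drops any port and lowercases the host. It also treats exit case-insensitively and skips blank entries. These behaviors are assumptions, and the real script may handle these cases differently.

```python
import os
from typing import Callable, Iterable, List
from urllib.parse import urlparse


def output_filename(url: str) -> str:
    """Name the results file after the URL's domain, e.g. example.com.txt."""
    # Prefix '//' when the scheme is missing so urlparse recognizes the host.
    host = urlparse(url if "//" in url else "//" + url).hostname
    if not host:
        raise ValueError(f"cannot determine a domain from {url!r}")
    return f"{host}.txt"


def prompt_urls(read: Callable[[str], str] = input) -> List[str]:
    """Collect URLs one per prompt until the user types 'exit'."""
    urls: List[str] = []
    while True:
        entry = read("Enter the website URL (or type 'exit' to quit): ").strip()
        if entry.lower() == "exit":
            return urls
        if entry:  # assumed: blank lines are ignored
            urls.append(entry)


def save_links(url: str, links: Iterable[str], directory: str = ".") -> str:
    """Write unique links, sorted, one per line; return the file path."""
    path = os.path.join(directory, output_filename(url))
    with open(path, "w", encoding="utf-8") as fh:
        fh.writelines(link + "\n" for link in sorted(set(links)))
    return path
```

Using hostname rather than the raw netloc avoids file names such as example.com:8080.txt. On Windows, the colon in such a name would be invalid.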

Example

Input:

Enter the browser to use (firefox, chrome, edge): firefox
Enter the website URL (or type 'exit' to quit): https://example.com 
Enter the website URL (or type 'exit' to quit): exit

Output:

  • example.com.txt:

Customization

  • Change Link Regex: Modify the regex variable in collect_links to control which kinds of links are extracted.
  • Set Max Threads: Adjust max_workers in the ThreadPoolExecutor to balance download speed against the load on your machine and on the target server.
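The following example shows how a change to the regex affects the results, using two hypothetical patterns. The first matches only absolute http(s) URLs. The second also captures quoted root-relative or dot-relative paths of the kind that often appear in JavaScript. Neither pattern is the one actually used in collect_links. Both are assumptions for illustration only.

```python
import re
from typing import List, Pattern

# Hypothetical pattern: absolute http(s) URLs only.
ABSOLUTE_ONLY = re.compile(r"https?://[^\s'\"<>)]+")

# Hypothetical pattern: quoted absolute URLs or paths starting with /, ./ or ../
QUOTED_LINKS = re.compile(r"""["'](https?://[^"'\s]+|\.{0,2}/[^"'\s]*)["']""")


def find_links(text: str, pattern: Pattern[str]) -> List[str]:
    """Return unique matches of `pattern` in first-seen order."""
    # With at most one capturing group, findall returns the group (or whole match).
    return list(dict.fromkeys(pattern.findall(text)))


if __name__ == "__main__":
    js = 'fetch("/api/users"); const cdn = "https://cdn.example.com/lib.js"; load(\'./chunk.js\');'
    print(find_links(js, ABSOLUTE_ONLY))
    print(find_links(js, QUOTED_LINKS))
```

A broader pattern finds more endpoints but also produces more false positives. Tune the pattern against a few real pages before relying on it. For max_workers, note that Python 3.8 and later already default ThreadPoolExecutor to min(32, os.cpu_count() + 4) threads. Network-bound downloads usually tolerate more threads than the CPU count, but very high values can trigger rate limiting on the target site.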

Known Issues

  • Single-page applications (SPAs) and other sites that build their navigation with JavaScript may not expose all of their links when the page is loaded.
  • Some pages or files may be unreachable because of access restrictions such as CORS policies or CAPTCHAs.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Author

Developed by Amirprx3.
