Web Scraper

Overview

This project is a simple web scraper built with Python. It extracts the text content of a specified web page and saves it to a local file. A graphical user interface (GUI) built with Tkinter makes it usable by non-technical users.

Features

- Extracts all text content from a web page at a URL provided by the user.
- Saves the extracted content to a text file named by the user.
- Reports invalid URLs, network issues, and file-saving errors.
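
The fetch, extract, and save steps listed above can be sketched roughly as follows. This is a minimal illustration and not the code in main.py: the function names `fetch_html`, `extract_text`, and `save_text`, the 10-second timeout, and the choice to drop `<script>` and `<style>` elements are all assumptions made for the example.

```python
import requests
from bs4 import BeautifulSoup


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page and return its HTML (the timeout value is an assumption)."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # turn 4xx/5xx responses into exceptions
    return response.text


def extract_text(html: str) -> str:
    """Return the readable text of an HTML document, one fragment per line."""
    soup = BeautifulSoup(html, "html.parser")
    # Remove elements whose contents are code or styling rather than text.
    for tag in soup(["script", "style"]):
        tag.decompose()
    return soup.get_text(separator="\n", strip=True)


def save_text(text: str, file_name: str) -> None:
    """Write the extracted text to a UTF-8 encoded file."""
    with open(file_name, "w", encoding="utf-8") as handle:
        handle.write(text)
```

Under these assumptions, `save_text(extract_text(fetch_html(url)), file_name)` would chain the three steps for a single page.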

Requirements

To run this project, you need:

- Python 3.6 or later
- The following Python libraries:
  - tkinter (part of the standard library; some Linux distributions package it separately, e.g. python3-tk on Debian and Ubuntu)
  - requests
  - beautifulsoup4

Installation

Clone or download this repository.
Install the required Python libraries by running:
```
pip install requests beautifulsoup4
```

Usage

Run the main.py script:
```
python main.py
```
In the GUI:
- Enter the URL of the web page in the "Enter URL" field.
- Enter the desired file name (e.g., output.txt) in the "File Name" field.
- Click the "SAVE" button.
The extracted text content will be saved to the specified file. Check the status message for success or error feedback.
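
For readers curious how such a form might be wired together, here is a hedged sketch of a Tkinter window with the same field and button labels. The helper names `validate_inputs` and `build_window`, the `on_save` callback, and the message wording are hypothetical and are not taken from main.py. The tkinter import sits inside `build_window` so that the validation helper can still be used on systems without Tk.

```python
from urllib.parse import urlparse


def validate_inputs(url: str, file_name: str) -> str:
    """Return an error message for unusable input, or "" if it looks fine."""
    parsed = urlparse(url.strip())
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return "Error: enter a full URL such as https://example.com"
    if not file_name.strip():
        return "Error: enter a file name such as output.txt"
    return ""


def build_window(on_save):
    """Create the form; on_save(url, file_name) must return a status string."""
    import tkinter as tk  # imported lazily, since Tk may not be installed

    root = tk.Tk()
    root.title("Web Scraper")

    tk.Label(root, text="Enter URL").grid(row=0, column=0, sticky="w")
    url_entry = tk.Entry(root, width=50)
    url_entry.grid(row=0, column=1)

    tk.Label(root, text="File Name").grid(row=1, column=0, sticky="w")
    file_entry = tk.Entry(root, width=50)
    file_entry.grid(row=1, column=1)

    status = tk.Label(root, text="")
    status.grid(row=3, column=0, columnspan=2, sticky="w")

    def handle_click() -> None:
        url, file_name = url_entry.get(), file_entry.get()
        # Show a validation error if there is one, else run the save callback.
        message = validate_inputs(url, file_name) or on_save(url, file_name)
        status.config(text=message)

    tk.Button(root, text="SAVE", command=handle_click).grid(row=2, column=1, sticky="e")
    return root
```

To try the layout with a stub in place of real scraping logic, something like `build_window(lambda url, name: "Saved to " + name).mainloop()` should open the window.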

Example

Input: URL: https://example.com, File Name: example_output.txt
Output: A text file example_output.txt containing the text-only content of the web page.

Error Handling

- Invalid URL: The application displays an error message if the URL is malformed or unreachable.
- File Saving Issues: If the file cannot be saved (e.g., due to an invalid file name or insufficient permissions), a descriptive error message is shown.
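
One way to turn these failure cases into status messages is sketched below. It assumes the page is fetched with requests and the file is written with the built-in `open`; the function name `describe_error` and the message wording are hypothetical, not the application's actual output.

```python
import requests


def describe_error(exc: Exception) -> str:
    """Translate an exception into a short status message (wording is hypothetical)."""
    rex = requests.exceptions
    if isinstance(exc, (rex.MissingSchema, rex.InvalidSchema, rex.InvalidURL)):
        return f"Invalid URL: {exc}"
    # ConnectTimeout is both a Timeout and a ConnectionError, so test Timeout first.
    if isinstance(exc, rex.Timeout):
        return "Network error: the request timed out."
    if isinstance(exc, rex.ConnectionError):
        return "Network error: could not reach the server."
    if isinstance(exc, rex.HTTPError):
        return f"HTTP error: {exc}"
    if isinstance(exc, rex.RequestException):
        return f"Request failed: {exc}"
    # RequestException derives from OSError, so file errors are checked last.
    if isinstance(exc, OSError):
        return f"Could not save file: {exc}"
    return f"Unexpected error: {exc}"
```

A caller could wrap the fetch and save steps in a single `try` block and pass whatever exception is caught to `describe_error` to produce the status text.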

Known Limitations

- The scraper does not handle JavaScript-rendered content.
- The output is raw text without formatting or structure.
- Internet access is required to fetch the web page content.
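
The JavaScript limitation can be illustrated with a small, self-contained example. The HTML below is invented for the demonstration, and the parsing approach (BeautifulSoup with the built-in `html.parser`, discarding `<script>` and `<style>` elements) is an assumption about how a scraper of this kind typically works, not a copy of main.py.

```python
from bs4 import BeautifulSoup

# A made-up page whose price line is only inserted by JavaScript in a browser.
STATIC_HTML = """
<html><body>
  <h1>Prices</h1>
  <div id="table"></div>
  <script>
    document.getElementById("table").textContent = "Widget: 9.99";
  </script>
</body></html>
"""

soup = BeautifulSoup(STATIC_HTML, "html.parser")
for tag in soup(["script", "style"]):
    tag.decompose()  # the script is never executed, only discarded

visible_text = soup.get_text(separator="\n", strip=True)
print(visible_text)  # prints: Prices
```

The heading survives, but the line a browser would show inside the `div` never appears, because nothing in this pipeline runs the script.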

Future Enhancements

- Add support for JavaScript-rendered pages using a headless browser (e.g., Selenium).
- Improve error messages with more specific diagnostics.
- Allow additional output formats, such as JSON or Markdown.

License

This project is released under the MIT License. See the LICENSE file for details.

Acknowledgments

Enjoy using the Web Scraper! Feel free to contribute or suggest improvements.
