GitHub - tejanshsachdeva/WRPC: WRPC Data Extractor is a Streamlit web app developed during an internship at Havish M Consulting. It allows users to extract and analyze data from WRPC's PDF reports for monthly Scheduled Revenue and Deviation Settlement Mechanism (DSM), generating consolidated Excel reports for corporate analysis.

WRPC Data Extractor

This project was developed during my internship at Havish M Consulting. The Streamlit web application allows users to extract meaningful data from WRPC's PDF reports for monthly Scheduled Revenue and Deviation Settlement Mechanism (DSM). Users can enter a year, select PDFs, and search for specific keywords to generate consolidated Excel reports.

This is the source website for data extraction: https://wrpc.gov.in/

Features

Data Extraction: Extracts data from WRPC's PDF reports based on user-provided year and search terms.
Excel Output: Generates Excel files containing summarized data for easy analysis.
Interactive Interface: User-friendly interface powered by Streamlit, making it easy to select options and download results.

Deployment

The application is deployed and accessible online at WRPC Data Extractor.

How to Use

Choose Data Type: Select between "Monthly Scheduled Revenue" and "Deviation Settlement Mechanism (DSM)".
Enter Year: Enter the year(s) to extract data for.
Select PDFs: Choose specific PDFs from the list provided.
Search Term: Enter keywords to search within the selected PDFs.
Generate Excel: Click on the "Download Excel" button to obtain the consolidated data in Excel format.

Function Explanations

`fetch_pdfs_for_year(year)`

Fetches PDF URLs for a given year.

Inputs: year (e.g., '2023')
Outputs: List of PDF URLs and names
Process: Retrieves and parses a data file to extract and construct the URLs and names of the weekly summary PDFs for the specified year.
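
The URL-construction step described above can be sketched as follows. This is a minimal illustration, not the code in this repository: the helper name `build_pdf_links`, the base URL join, and the layout of the sample data file are all assumptions, since the README does not show the real data file.

```python
import re
from urllib.parse import urljoin

# Assumption: PDF paths in the data file are relative to the site root.
BASE_URL = "https://wrpc.gov.in/"


def build_pdf_links(data_text: str, year: str, base_url: str = BASE_URL) -> list[tuple[str, str]]:
    """Return (name, url) pairs for every PDF path in data_text that mentions year."""
    links = []
    seen = set()
    # Pull out anything that looks like a relative path ending in .pdf.
    for path in re.findall(r"[\w./-]+\.pdf", data_text, flags=re.IGNORECASE):
        if year not in path or path in seen:
            continue  # skip other years and duplicate entries
        seen.add(path)
        name = path.rsplit("/", 1)[-1]  # file name doubles as the display name
        links.append((name, urljoin(base_url, path)))
    return links


# Hypothetical data-file contents, for illustration only.
sample = """
"file": "assets/dsm/2023/week_01_2023.pdf",
"file": "assets/dsm/2022/week_52_2022.pdf",
"file": "assets/dsm/2023/week_02_2023.pdf",
"""
links = build_pdf_links(sample, "2023")
```

In the app, the text passed in would come from the downloaded data file (for example via Requests) rather than from an inline string.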

`extract_all_table_rows_from_url(pdf_url, pdf_name, search_term)`

Extracts data from a PDF based on a search term.

Inputs: pdf_url (URL of the PDF), pdf_name (name for the PDF), search_term (keyword to search)
Outputs: Lists containing rows of data: all_rows and summary_rows
Process: Downloads the PDF, reads its content, searches for the term, and captures the matching rows in two categories: summary rows (summary_rows) and daywise rows (all_rows).
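
The search-and-split step can be sketched on already-extracted tables. The helper name `split_matching_rows` is hypothetical, and so is the rule for telling the two categories apart (a row containing a dd-mm-yyyy date counts as daywise, anything else as summary); the README does not say how the app actually decides. The input is assumed to be a list of tables, each a list of rows whose cells are strings or None, which is the shape pdfplumber's `page.extract_tables()` returns.

```python
import re

# Assumption: daywise rows carry a date written as dd-mm-yyyy.
DATE_PATTERN = re.compile(r"\d{2}-\d{2}-\d{4}")


def split_matching_rows(tables, pdf_name, search_term):
    """Return (all_rows, summary_rows) for rows that contain search_term."""
    term = search_term.lower()
    all_rows, summary_rows = [], []
    for table in tables:
        for row in table:
            # Empty cells come back as None, so normalise them to "".
            cells = [str(c).strip() if c is not None else "" for c in row]
            if not any(term in c.lower() for c in cells):
                continue  # row does not mention the search term
            tagged = [pdf_name] + cells  # remember which PDF the row came from
            if any(DATE_PATTERN.search(c) for c in cells):
                all_rows.append(tagged)      # daywise row
            else:
                summary_rows.append(tagged)  # summary row
    return all_rows, summary_rows


# Hypothetical table contents, for illustration only.
tables = [
    [["Entity", "Date", "Payable"],
     ["ACME Power", "01-01-2023", "10.5"],
     ["Other Gen", "01-01-2023", "3.0"]],
    [["ACME Power", None, "72.1"]],
]
all_rows, summary_rows = split_matching_rows(tables, "week_01.pdf", "acme")
```

Matching is case-insensitive here so that a search for "acme" still finds "ACME Power"; whether the app does the same is not stated.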

`display_results(results, summary_rows, search_term)`

Displays the extracted data on the Streamlit app.

Inputs: results (daywise data rows), summary_rows (summary data rows), search_term (keyword used for search)
Outputs: None (Displays data in Streamlit UI)
Process: Shows success message, displays dataframes for summary and daywise rows, sorts daywise data by date, and provides an info message with the total results count.
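
Sorting daywise data by date needs the date text parsed first, because a plain string sort on day-first dates puts "02-02-2023" before "03-01-2023". A minimal sketch follows; the helper name `sort_daywise_rows`, the dd-mm-yyyy format, and the position of the date column are assumptions rather than details taken from the app.

```python
from datetime import datetime


def sort_daywise_rows(rows, date_index=0, date_format="%d-%m-%Y"):
    """Return rows ordered chronologically by the date text in column date_index."""
    return sorted(
        rows,
        key=lambda row: datetime.strptime(row[date_index], date_format),
    )


# Hypothetical daywise rows, for illustration only.
rows = [
    ["15-01-2023", "ACME Power", "4.2"],
    ["02-02-2023", "ACME Power", "1.1"],
    ["03-01-2023", "ACME Power", "7.9"],
]
ordered = sort_daywise_rows(rows)
```

With a Pandas dataframe, the same effect can be had by converting the column with `pd.to_datetime` before calling `sort_values`.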

`convert_to_excel(summary_rows, daywise_rows)`

Converts extracted data to an Excel file.

Inputs: summary_rows (summary data rows), daywise_rows (daywise data rows)
Outputs: An Excel file in memory (BytesIO buffer)
Process: Creates two dataframes from the input rows, writes them to separate sheets in an Excel file, and returns the file as a buffer.
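
Writing two dataframes to separate sheets of an in-memory workbook can be sketched as below. This is an illustration under assumptions, not the repository's code: the helper name `rows_to_excel_buffer` and the sheet names are made up, and an Excel engine for Pandas (such as openpyxl or XlsxWriter) is assumed to be installed.

```python
from io import BytesIO

import pandas as pd


def rows_to_excel_buffer(summary_rows, daywise_rows):
    """Write the two row lists to separate sheets and return the file as BytesIO."""
    buffer = BytesIO()
    # Without an explicit engine, Pandas picks an installed .xlsx writer.
    with pd.ExcelWriter(buffer) as writer:
        pd.DataFrame(summary_rows).to_excel(writer, sheet_name="Summary", index=False)
        pd.DataFrame(daywise_rows).to_excel(writer, sheet_name="Daywise", index=False)
    buffer.seek(0)  # rewind so the caller reads from the start
    return buffer


# Hypothetical rows, for illustration only.
buffer = rows_to_excel_buffer(
    [["week_01.pdf", "ACME Power", "72.1"]],
    [["week_01.pdf", "ACME Power", "01-01-2023", "10.5"]],
)
```

A buffer like this can be handed to Streamlit's `st.download_button` as its `data` argument, which avoids writing a temporary file on the server.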

Screenshots

Screenshot of the interface showing data extraction options.

Screenshot of the Excel output generated.

Dependencies

Streamlit
Requests
pdfplumber
Pandas

Installation

To run this application locally:

Clone the repository.
Install dependencies using `pip install -r requirements.txt`.
Run the application with `streamlit run home.py`.

