SharePoint Site Downloader

A Python tool to download entire SharePoint sites via the Microsoft Graph API. It supports both Device Code (delegated) and Client Credentials (application) authentication flows, and can generate standalone static HTML sites from the downloaded content.

License: MIT

Features

  • Generic: Works with any SharePoint Online site URL
  • Authentication: MSAL Device Code (delegated) or Client Credentials (application) flows
  • Complete Download: Recursively downloads all document libraries, Site Pages, Site Assets, Style Library, and Master Page Gallery
  • Static Site Generation: Converts downloaded SharePoint content into standalone HTML sites
  • Resilient: Auto-retries on throttling (HTTP 429/503), resumes partially downloaded files
  • Structure Preservation: Local folder tree mirrors SharePoint hierarchy
  • Image Handling: Downloads and fixes image references for offline viewing

Prerequisites

  • Python 3.9+
  • A Microsoft Entra ID App Registration with Microsoft Graph permissions

Quick Start

  1. Clone and install

```shell
git clone <repository-url>
cd sharepoint-api-download
python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
```
  2. Create a Microsoft Entra ID app registration

  • Go to Azure Portal → Microsoft Entra ID → App registrations → New registration
  • Name: "SharePoint Downloader"; Supported account types: single tenant (or your choice)
  • Note the Application (client) ID and Directory (tenant) ID

Important: Use the Azure Portal (portal.azure.com), not the Microsoft 365 admin center

  3. Configure API permissions

For Device Code (delegated):

  • Microsoft Graph → Delegated permissions: Sites.Read.All, Files.Read.All
  • Click "Grant admin consent" (required for org-wide sites)

For Client Credentials (application):

  • Microsoft Graph → Application permissions: Sites.Read.All, Files.Read.All
  • Add a client secret (Certificates & secrets → New client secret) and note the value
  • Click "Grant admin consent"
  4. Configure environment

Copy env.example to .env and fill in your values:

```shell
cp env.example .env
```

Edit .env with your values:

```
TENANT_ID=your-tenant-id-here
CLIENT_ID=your-client-id-here
CLIENT_SECRET=your-client-secret-here
SITE_URL=https://yourtenant.sharepoint.com/sites/YourSiteName
AUTH_FLOW=application
OUTPUT_DIR=./downloads
CONCURRENCY=4
```
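The downloader presumably reads these values via a dotenv-style loader; the sketch below is a minimal stdlib-only illustration of that parsing (`parse_env` is illustrative, not the tool's actual code):

```python
def parse_env(text: str) -> dict:
    """Minimal parser for KEY=VALUE lines, ignoring blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip optional surrounding quotes from the value
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values

sample = """\
# Graph auth
TENANT_ID=your-tenant-id-here
CLIENT_ID=your-client-id-here

CONCURRENCY=4
"""
config = parse_env(sample)
```

In practice a library such as python-dotenv handles edge cases (export prefixes, multiline values) that this sketch ignores.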
  5. Run the downloader

Simple way (recommended):

```shell
./run.sh
```

Manual way:

```shell
# Application auth (no prompts)
python -m sharepoint_downloader.cli \
  --site-url "https://yourtenant.sharepoint.com/sites/YourSiteName" \
  --output ./downloads \
  --auth application \
  --tenant-id "$TENANT_ID" \
  --client-id "$CLIENT_ID" \
  --client-secret "$CLIENT_SECRET" \
  --generate-static

# Device auth (requires browser sign-in)
python -m sharepoint_downloader.cli \
  --site-url "https://yourtenant.sharepoint.com/sites/YourSiteName" \
  --output ./downloads \
  --auth device \
  --tenant-id "$TENANT_ID" \
  --client-id "$CLIENT_ID" \
  --generate-static
```

Static Site Generation

The --generate-static flag converts downloaded SharePoint content into a standalone HTML site:

  • Converts ASPX pages to clean HTML
  • Fixes image references to work offline
  • Creates an index page with links to all pages
  • Removes SharePoint-specific styling and dependencies
  • Generates a static_site/ directory with the standalone site
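The "fixes image references" step boils down to rewriting absolute site-relative URLs into local relative paths. A minimal sketch of that kind of rewrite (the generator's real logic lives in the package; `rewrite_refs` and the URL pattern here are illustrative assumptions):

```python
import re

def rewrite_refs(html: str, site_prefix: str) -> str:
    """Rewrite absolute site-relative asset URLs to local relative paths
    so a downloaded page renders offline. Illustrative only."""
    # e.g. src="/sites/YourSiteName/SiteAssets/logo.png" -> src="SiteAssets/logo.png"
    pattern = re.compile(r'(src|href)="' + re.escape(site_prefix) + r'/([^"]+)"')
    return pattern.sub(r'\1="\2"', html)

page = '<img src="/sites/YourSiteName/SiteAssets/logo.png">'
fixed = rewrite_refs(page, "/sites/YourSiteName")
# fixed == '<img src="SiteAssets/logo.png">'
```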
Run the CLI with --help to see all options:

```shell
python -m sharepoint_downloader.cli --help
```

Options:

  • --site-url: Full SharePoint site URL
  • --output: Local directory to write files
  • --library: Optional library name filter (can be repeated); default: all
  • --auth: device (default) or application
  • --tenant-id, --client-id, --client-secret: Auth config (can also come from env)
  • --concurrency: Parallel downloads (default 4)
  • --skip-existing: Skip files that already exist with same size
  • --generate-static: Generate standalone HTML site from downloaded content

How it works

  1. Site Resolution: Resolve site ID from the URL via GET /v1.0/sites/{hostname}:/sites/{path}
  2. Drive Discovery: List document libraries via GET /v1.0/sites/{site-id}/drives
  3. Content Traversal: Recursively enumerate folders/files via GET /v1.0/drives/{drive-id}/items/{item-id}/children
  4. File Download: Download files via GET /v1.0/drives/{drive-id}/items/{item-id}/content with retries and chunking
  5. Static Generation: Convert ASPX pages to HTML and fix asset references for offline viewing
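Step 1 maps the human-readable site URL onto Graph's hostname:path addressing. A sketch of that mapping (the function name is illustrative, not the tool's internals):

```python
from urllib.parse import urlparse

GRAPH = "https://graph.microsoft.com/v1.0"

def site_resolution_url(site_url: str) -> str:
    """Build the Graph endpoint that resolves a SharePoint site URL to a site ID."""
    parts = urlparse(site_url)
    # netloc: yourtenant.sharepoint.com; path: /sites/YourSiteName
    return f"{GRAPH}/sites/{parts.netloc}:{parts.path}"

url = site_resolution_url("https://yourtenant.sharepoint.com/sites/YourSiteName")
# "https://graph.microsoft.com/v1.0/sites/yourtenant.sharepoint.com:/sites/YourSiteName"
```

The `id` field of the JSON returned by a GET on that URL is the site ID used by the subsequent `/drives` calls.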

Troubleshooting

  • 401/403: Verify permissions and admin consent; ensure correct auth flow
  • 404 site not found: Check SITE_URL host and path
  • Throttling: The downloader auto-retries with backoff; you can lower --concurrency
  • Empty downloads: Ensure you have Sites.Read.All and Files.Read.All permissions with admin consent
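On throttling, Graph clients conventionally honor the Retry-After header when the server sends one and otherwise fall back to capped exponential backoff. A sketch of that policy (names are illustrative, not the downloader's actual internals):

```python
from typing import Optional

def retry_delay(attempt: int, retry_after: Optional[str] = None,
                base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based).
    Prefers the server's Retry-After value; otherwise doubles the
    base delay per attempt, capped to avoid unbounded waits."""
    if retry_after is not None:
        return float(retry_after)
    return min(cap, base * (2 ** attempt))
```

Lowering --concurrency reduces how often this path is hit in the first place.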

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.
