A Python tool to download entire SharePoint sites via Microsoft Graph API. Supports both Device Code (delegated) and Client Credentials (application) authentication flows, and can generate standalone static HTML sites from the downloaded content.
Features:

- Generic: Works with any SharePoint Online site URL
- Authentication: MSAL Device Code (delegated) or Client Credentials (application) flows
- Complete Download: Recursively downloads all document libraries, Site Pages, Site Assets, Style Library, and Master Page Gallery
- Static Site Generation: Converts downloaded SharePoint content into standalone HTML sites
- Resilient: Auto-retries on throttling (HTTP 429/503), resumes partially downloaded files
- Structure Preservation: Local folder tree mirrors SharePoint hierarchy
- Image Handling: Downloads and fixes image references for offline viewing
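The resume behavior above typically works by requesting only the bytes that are still missing via an HTTP `Range` header. A minimal stdlib sketch (the function name and arguments are illustrative, not this project's API):

```python
import os
import urllib.request

def resume_request(url: str, dest_path: str, token: str) -> urllib.request.Request:
    """Build a download request that resumes from any bytes already on disk."""
    req = urllib.request.Request(url)
    req.add_header("Authorization", f"Bearer {token}")
    done = os.path.getsize(dest_path) if os.path.exists(dest_path) else 0
    if done:
        # Ask only for the missing tail; servers reply 206 Partial Content.
        req.add_header("Range", f"bytes={done}-")
    return req
```

If the partial file holds 5 bytes, the request carries `Range: bytes=5-` and the server returns only the remainder.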
Prerequisites:

- Python 3.9+
- A Microsoft Entra ID App Registration with Microsoft Graph permissions
- Clone and install
```bash
git clone <repository-url>
cd sharepoint-api-download
python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
```

- Create Azure Entra ID App Registration
- Go to Azure Portal → Microsoft Entra ID → App registrations → New registration
- Name: "SharePoint Downloader"; Supported account types: single tenant (or your choice)
- Note the Application (client) ID and Directory (tenant) ID
Important: Use Azure Portal (portal.azure.com), not Microsoft 365 admin center
- Configure API permissions
For Device Code (delegated):
- Microsoft Graph → Delegated permissions: `Sites.Read.All`, `Files.Read.All`
- Click "Grant admin consent" (required for org-wide sites)

For Client Credentials (application):
- Microsoft Graph → Application permissions: `Sites.Read.All`, `Files.Read.All`
- Add a client secret (Certificates & secrets → New client secret) and note the value
- Click "Grant admin consent"
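Under the hood, the client-credentials flow exchanges the tenant ID, client ID, and secret for a Graph token at the Microsoft identity platform's token endpoint. The tool does this via MSAL; the raw request is sketched here with the stdlib purely for illustration:

```python
from urllib.parse import urlencode

TOKEN_ENDPOINT = "https://login.microsoftonline.com/{tenant}/oauth2/v2.0/token"

def client_credentials_request(tenant_id: str, client_id: str, client_secret: str):
    """Return the (url, form body) of a client-credentials token request
    scoped to Microsoft Graph."""
    url = TOKEN_ENDPOINT.format(tenant=tenant_id)
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        # The .default scope grants whatever application permissions were
        # consented on the app registration (Sites.Read.All, Files.Read.All).
        "scope": "https://graph.microsoft.com/.default",
    })
    return url, body
```

This is why admin consent matters: with `.default`, the token carries exactly the application permissions an admin has consented to, and nothing else.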
- Configure environment
Copy `env.example` to `.env` and fill in your values:

```bash
cp env.example .env
```

Edit `.env`:

```
TENANT_ID=your-tenant-id-here
CLIENT_ID=your-client-id-here
CLIENT_SECRET=your-client-secret-here
SITE_URL=https://yourtenant.sharepoint.com/sites/YourSiteName
AUTH_FLOW=application
OUTPUT_DIR=./downloads
CONCURRENCY=4
```

- Run the downloader

Simple way (recommended):

```bash
./run.sh
```

Manual way:
```bash
# Application auth (no prompts)
python -m sharepoint_downloader.cli \
  --site-url "https://yourtenant.sharepoint.com/sites/YourSiteName" \
  --output ./downloads \
  --auth application \
  --tenant-id "$TENANT_ID" \
  --client-id "$CLIENT_ID" \
  --client-secret "$CLIENT_SECRET" \
  --generate-static

# Device auth (requires browser sign-in)
python -m sharepoint_downloader.cli \
  --site-url "https://yourtenant.sharepoint.com/sites/YourSiteName" \
  --output ./downloads \
  --auth device \
  --tenant-id "$TENANT_ID" \
  --client-id "$CLIENT_ID" \
  --generate-static
```

The `--generate-static` flag converts downloaded SharePoint content into a standalone HTML site:
- Converts ASPX pages to clean HTML
- Fixes image references to work offline
- Creates an index page with links to all pages
- Removes SharePoint-specific styling and dependencies
- Generates a `static_site/` directory with the standalone site
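Fixing image references largely amounts to rewriting absolute SharePoint URLs in the page HTML into relative local paths that exist in the download tree. A simplified regex-based sketch (the actual generator may work differently):

```python
import re

def localize_images(html: str, site_prefix: str) -> str:
    """Rewrite absolute img src URLs under the site to relative local paths."""
    pattern = re.compile(r'src="' + re.escape(site_prefix) + r'/([^"]+)"')
    # Keep only the site-relative part, which mirrors the local folder tree.
    return pattern.sub(r'src="\1"', html)
```

For example, `src="https://contoso.sharepoint.com/sites/Team/SiteAssets/logo.png"` becomes `src="SiteAssets/logo.png"`, which resolves offline because the local folder tree mirrors the SharePoint hierarchy.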
```bash
python -m sharepoint_downloader.cli --help
```

Options:
- `--site-url`: Full SharePoint site URL
- `--output`: Local directory to write files
- `--library`: Optional library name filter (can be repeated); default: all
- `--auth`: `device` (default) or `application`
- `--tenant-id`, `--client-id`, `--client-secret`: Auth config (can also come from env)
- `--concurrency`: Parallel downloads (default 4)
- `--skip-existing`: Skip files that already exist with the same size
- `--generate-static`: Generate a standalone HTML site from downloaded content
How it works:

- Site Resolution: Resolve the site ID from the URL via `GET /v1.0/sites/{hostname}:/sites/{path}`
- Drive Discovery: List document libraries via `GET /v1.0/sites/{site-id}/drives`
- Content Traversal: Recursively enumerate folders and files via `GET /v1.0/drives/{drive-id}/items/{item-id}/children`
- File Download: Download files via `GET /v1.0/drives/{drive-id}/items/{item-id}/content` with retries and chunking
- Static Generation: Convert ASPX pages to HTML and fix asset references for offline viewing
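The first step maps the SharePoint site URL onto Graph's path-addressing syntax. A sketch of that URL construction (assumed here with the stdlib, not necessarily the project's code):

```python
from urllib.parse import urlparse

GRAPH = "https://graph.microsoft.com/v1.0"

def site_resolution_url(site_url: str) -> str:
    """Map a SharePoint site URL to the Graph endpoint that resolves its site ID."""
    parts = urlparse(site_url)
    # hostname: yourtenant.sharepoint.com
    # path:     /sites/YourSiteName (leading slash dropped for the colon syntax)
    return f"{GRAPH}/sites/{parts.hostname}:/{parts.path.lstrip('/')}"

# e.g. https://graph.microsoft.com/v1.0/sites/yourtenant.sharepoint.com:/sites/YourSiteName
```

The JSON response to that GET contains the composite site `id`, which the drive-discovery and traversal calls then use.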
Troubleshooting:

- 401/403: Verify permissions and admin consent; ensure the correct auth flow
- 404 site not found: Check the `SITE_URL` host and path
- Throttling: The downloader auto-retries with backoff; you can lower `--concurrency`
- Empty downloads: Ensure you have `Sites.Read.All` and `Files.Read.All` permissions with admin consent
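The backoff behavior mentioned above can be pictured as a small retry wrapper around each request. This is an illustrative sketch only; the exact status codes handled and delay schedule are assumptions, not the project's values:

```python
import time

RETRYABLE = {429, 503}  # throttled / service unavailable

def with_retries(fetch, max_attempts: int = 5, base_delay: float = 1.0):
    """Call fetch() until it returns a (status, body) with a non-retryable
    status, sleeping with exponential backoff after each 429/503."""
    for attempt in range(max_attempts):
        status, body = fetch()
        if status not in RETRYABLE:
            return status, body
        time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
    return status, body  # give up after max_attempts
```

A production version would also honor the `Retry-After` header that Graph sends with 429 responses instead of a fixed schedule.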
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.