A Java-based application for scraping JDK metadata from various vendors. This project replaces the original bash scripts with a robust, parallel Java implementation.
- Parallel Execution: Run multiple vendor scrapers concurrently for improved performance
- Selective Scraping: Run all scrapers or select specific vendors
- Central Reporting: Thread-safe progress reporting with real-time status updates
- Extensible Architecture: Easy to add new vendor scrapers
- Generic Base Classes: Reduces code duplication for similar vendors (e.g., Semeru versions)
- Comprehensive Logging: SLF4J/Logback integration with both console and file output
- Java 21 or higher
This project uses Gradle for dependency management and building.
# Build the project
./gradlew build
# This creates two jars:
# - java-metadata-scraper-1.0.0-SNAPSHOT.jar (regular jar)
# - java-metadata-scraper-1.0.0-SNAPSHOT-standalone.jar (fat jar with all dependencies)
# Run all scrapers
java -jar build/libs/java-metadata-scraper-1.0.0-SNAPSHOT-standalone.jar
# List available scrapers
java -jar build/libs/java-metadata-scraper-1.0.0-SNAPSHOT-standalone.jar --list
# Run specific scrapers
java -jar build/libs/java-metadata-scraper-1.0.0-SNAPSHOT-standalone.jar --scrapers microsoft,semeru-11,semeru-17
# Specify custom directories
java -jar build/libs/java-metadata-scraper-1.0.0-SNAPSHOT-standalone.jar \
--metadata-dir /path/to/metadata \
--checksum-dir /path/to/checksums
# Control parallelism
java -jar build/libs/java-metadata-scraper-1.0.0-SNAPSHOT-standalone.jar --threads 4
# Show help
java -jar build/libs/java-metadata-scraper-1.0.0-SNAPSHOT-standalone.jar --help
Usage: java-metadata-scraper [-hlV] [-c=<checksumDir>] [-m=<metadataDir>]
                             [-s=<scraperIds>[,<scraperIds>...]]...
                             [-t=<maxThreads>]
Scrapes JDK metadata from various vendors
Options:
  -m, --metadata-dir=<metadataDir>
                  Directory to store metadata files (default: docs/metadata)
  -c, --checksum-dir=<checksumDir>
                  Directory to store checksum files (default: docs/checksums)
  -s, --scrapers=<scraperIds>[,<scraperIds>...]
                  Comma-separated list of scraper IDs to run (if not specified,
                    all scrapers run)
  -l, --list      List all available scraper IDs and exit
  -t, --threads=<maxThreads>
                  Maximum number of parallel scraper threads (default: number
                    of processors)
  -h, --help      Show this help message and exit.
  -V, --version   Print version information and exit.
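The --threads option caps how many scrapers run at the same time. The following is a minimal sketch of that idea, assuming a fixed-size thread pool from java.util.concurrent. The class name ParallelRunSketch, the runAll helper, and the placeholder tasks are hypothetical illustrations and are not taken from the project's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelRunSketch {

    /** Runs all tasks on at most maxThreads threads; results keep submission order. */
    public static List<String> runAll(List<Callable<String>> tasks, int maxThreads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(maxThreads);
        try {
            List<String> results = new ArrayList<>();
            // invokeAll blocks until every task has completed
            for (Future<String> future : pool.invokeAll(tasks)) {
                results.add(future.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical stand-ins for real scrapers
        List<Callable<String>> tasks = List.of(
                () -> "microsoft: done",
                () -> "semeru-11: done",
                () -> "semeru-17: done");
        // Mirrors the documented default: number of processors
        int maxThreads = Runtime.getRuntime().availableProcessors();
        runAll(tasks, maxThreads).forEach(System.out::println);
    }
}
```

A fixed pool keeps the number of simultaneous HTTP-heavy tasks bounded, which is one plausible reason to expose the limit on the command line.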
- ProgressReporter: Central reporting thread that receives and logs progress events from all scrapers
- BaseScraper: Abstract base class for all scrapers with common functionality (downloading, hashing, metadata saving)
- GitHubReleaseScraper: Specialized base class for scrapers that fetch releases from GitHub
- AdoptiumMarketplaceScraper: Specialized base class for scrapers using Adoptium Marketplace API
- ScraperFactory: Factory class that uses ServiceLoader to dynamically discover and instantiate scrapers
- Scraper.Discovery: Service provider interface for scraper registration via Java ServiceLoader
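To illustrate the central reporting idea behind ProgressReporter, here is a minimal sketch in which worker threads hand events to a single reporting thread through a blocking queue. It is an assumption about one way to build such a component, not the project's implementation; ReporterSketch, report, stop, and the STOP marker are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class ReporterSketch implements Runnable {
    // Hypothetical "poison pill" that tells the reporting thread to exit
    private static final String STOP = "__stop__";

    private final BlockingQueue<String> events = new LinkedBlockingQueue<>();
    private final List<String> seen = new ArrayList<>();

    /** Called from any scraper thread; the queue itself is thread-safe. */
    public void report(String event) {
        events.add(event);
    }

    /** Asks the reporting thread to finish after draining earlier events. */
    public void stop() {
        events.add(STOP);
    }

    @Override
    public void run() {
        try {
            while (true) {
                String event = events.take(); // blocks until an event arrives
                if (event.equals(STOP)) {
                    return;
                }
                seen.add(event);              // only this thread touches 'seen'
                System.out.println(event);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Safe to read once the reporting thread has been joined. */
    public List<String> seen() {
        return seen;
    }

    public static void main(String[] args) throws InterruptedException {
        ReporterSketch reporter = new ReporterSketch();
        Thread thread = new Thread(reporter, "progress-reporter");
        thread.start();
        reporter.report("temurin: started");
        reporter.report("temurin: completed");
        reporter.stop();
        thread.join();
    }
}
```

Funnelling all output through one consumer thread means scrapers never write to the console or log concurrently, so messages cannot interleave mid-line.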
The project includes 35+ vendor scrapers, supporting all major JDK distributions:
- Temurin (Eclipse Adoptium): temurin, temurin-ea
- Zulu (Azul): zulu, zulu-prime
- Liberica (BellSoft): liberica, liberica-native
- Corretto (Amazon)
- SapMachine (SAP)
- Microsoft (Microsoft Build of OpenJDK)
- OpenJDK: openjdk, openjdk-leyden, openjdk-loom, openjdk-valhalla
- Dragonwell (Alibaba)
- Kona (Tencent)
- Oracle: oracle, oracle-graalvm, oracle-graalvm-ea
- Semeru (IBM): semeru, semeru-certified
- Trava (TravaOpenJDK)
- AdoptOpenJDK (Legacy)
- Bisheng (Huawei)
- Red Hat
- GraalVM: graalvm-legacy, graalvm-ce, graalvm-ce-ea, graalvm-community, graalvm-community-ea
- IBM JDK
- Java SE RI (Reference Implementation)
- JetBrains Runtime
- Mandrel (Red Hat's GraalVM)
- Gluon GraalVM
- OpenLogic
Each scraper is registered via Java's ServiceLoader mechanism in META-INF/services.
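The consuming side of that registration can be sketched roughly as follows. This is a simplified illustration under assumptions: the nested Discovery interface here is a hypothetical stand-in for the project's Scraper.Discovery, and availableScrapers is an invented helper, not the actual ScraperFactory API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

public class DiscoverySketch {

    /** Hypothetical stand-in for the project's Scraper.Discovery SPI. */
    public interface Discovery {
        String name();
        String vendor();
    }

    /** Collects a label for every provider registered for Discovery. */
    public static List<String> availableScrapers() {
        List<String> names = new ArrayList<>();
        // ServiceLoader looks for META-INF/services/<binary name of the interface>
        // on the classpath and instantiates each class listed in that file.
        for (Discovery discovery : ServiceLoader.load(Discovery.class)) {
            names.add(discovery.name() + " (" + discovery.vendor() + ")");
        }
        return names;
    }

    public static void main(String[] args) {
        // The list stays empty unless a provider file for
        // DiscoverySketch$Discovery is present on the classpath.
        System.out.println(availableScrapers());
    }
}
```

Because the interface is nested, the provider file is named after its binary name, with a `$` between the outer and inner class, which matches the Scraper$Discovery file name used by the project.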
- Create a new class extending BaseScraper, GitHubReleaseScraper, or AdoptiumMarketplaceScraper
- Implement required abstract methods
- Add an inner Discovery class implementing Scraper.Discovery
- Register the discovery class in META-INF/services/com.github.joschi.javametadata.scraper.Scraper$Discovery
Example:
import java.nio.file.Path;
import org.slf4j.Logger; // assumption: the Logger type is SLF4J's, since the project logs via SLF4J

public class NewScraper extends BaseScraper {

    public NewScraper(Path metadataDir, Path checksumDir, Logger logger) {
        super(metadataDir, checksumDir, logger);
    }

    @Override
    protected ScraperResult scrapeImpl() throws Exception {
        // Implementation here: fetch releases, save metadata, return a ScraperResult
        throw new UnsupportedOperationException("Not implemented yet");
    }

    // ServiceLoader discovery
    public static class Discovery implements Scraper.Discovery {

        // ID used with --scrapers and shown by --list
        @Override
        public String name() {
            return "scraper-name";
        }

        @Override
        public String vendor() {
            return "vendor-name";
        }

        @Override
        public Scraper create(Path metadataDir, Path checksumDir, Logger logger) {
            return new NewScraper(metadataDir, checksumDir, logger);
        }
    }
}
src/
├── main/
│   ├── java/
│   │   └── com/github/joschi/javametadata/
│   │       ├── Main.java                          # CLI application entry point
│   │       ├── model/
│   │       │   └── JdkMetadata.java               # Data model for JDK metadata
│   │       ├── reporting/
│   │       │   ├── ProgressEvent.java             # Progress event types
│   │       │   ├── ProgressReporter.java          # Central reporting thread
│   │       │   └── ProgressReporterLogger.java    # Logger adapter for scrapers
│   │       ├── scraper/
│   │       │   ├── Scraper.java                   # Scraper interface with Discovery SPI
│   │       │   ├── BaseScraper.java               # Base class for all scrapers
│   │       │   ├── GitHubReleaseScraper.java      # Base for GitHub-based scrapers
│   │       │   ├── AdoptiumMarketplaceScraper.java # Base for Adoptium Marketplace
│   │       │   ├── ScraperFactory.java            # Factory using ServiceLoader
│   │       │   ├── ScraperResult.java             # Result wrapper
│   │       │   └── vendors/
│   │       │       ├── TemurinScraper.java
│   │       │       ├── ZuluScraper.java
│   │       │       ├── LibericaScraper.java
│   │       │       ├── MicrosoftScraper.java
│   │       │       ├── SemeruScraper.java
│   │       │       ├── ... (35+ vendor scrapers)
│   │       │       └── (See full list in Vendor Scrapers section)
│   │       └── util/
│   │           ├── FileUtils.java                 # File operations
│   │           ├── HashUtils.java                 # Hash computation
│   │           └── HttpUtils.java                 # HTTP operations
│   └── resources/
│       ├── logback.xml                            # Logging configuration
│       └── META-INF/
│           └── services/
│               └── com.github.joschi.javametadata.scraper.Scraper$Discovery
└── test/
    └── java/
        └── (test classes)
- Jackson: JSON processing (2.16.1)
- Apache HttpClient 5: HTTP operations (5.3.1)
- SLF4J/Logback: Logging (SLF4J 2.0.7, Logback 1.4.14)
- Picocli: Command-line interface (4.7.5)
- JUnit 5: Testing (5.10.1)
The scrapers generate two types of output:
- Metadata files: JSON files containing JDK metadata (stored in docs/metadata/<vendor>/)
- Checksum files: MD5, SHA1, SHA256, SHA512 checksums (stored in docs/checksums/<vendor>/)
Each vendor directory contains:
- Individual .json files for each JDK release
- An all.json file combining all releases for that vendor
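For readers unfamiliar with how the four checksum types listed above are produced, the following sketch computes them with the JDK's MessageDigest. It is an illustration only and not the project's HashUtils; the class name ChecksumSketch and the checksums helper are hypothetical, and it hashes an in-memory byte array for brevity, whereas a real scraper would more likely stream the downloaded file.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;
import java.util.LinkedHashMap;
import java.util.Map;

public class ChecksumSketch {
    // JCA standard names for the MD5, SHA1, SHA256 and SHA512 digests
    private static final String[] ALGORITHMS = {"MD5", "SHA-1", "SHA-256", "SHA-512"};

    /** Returns algorithm name -> lowercase hex digest, in the order listed above. */
    public static Map<String, String> checksums(byte[] data) throws NoSuchAlgorithmException {
        Map<String, String> result = new LinkedHashMap<>();
        for (String algorithm : ALGORITHMS) {
            MessageDigest digest = MessageDigest.getInstance(algorithm);
            // HexFormat is available since Java 17
            result.put(algorithm, HexFormat.of().formatHex(digest.digest(data)));
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        byte[] data = "hello".getBytes(StandardCharsets.UTF_8);
        checksums(data).forEach((algorithm, hex) -> System.out.println(algorithm + "  " + hex));
    }
}
```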
Logs are written to:
- Console (STDOUT) - Real-time progress
- File (logs/java-metadata-scraper.log) - Detailed execution log
The logging configuration can be customized in src/main/resources/logback.xml.
- Java 21 or higher
- Gradle 8.x (included via wrapper)
Same as the original project (see LICENSE file)