check recentchanges API against last IA upload #40

@pabs3

Currently wikiteam3 only checks that the wiki has had no upload to the Internet Archive (IA) in the past year.

Some wikis are very actively edited while others are rarely edited.

Rarely edited wikis could still be captured once per year, which wastes space on IA when consecutive uploads contain no changes between them.

It would be good to avoid this by comparing the date of the last IA upload to the date of the most recent change and rejecting uploads when nothing has changed since then.
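The comparison itself could look something like the sketch below. `should_upload` is a hypothetical helper, not an existing wikiteam3 function, and it assumes both dates are already available as timezone-aware `datetime` values.

```python
from datetime import datetime, timezone
from typing import Optional


def should_upload(last_change: datetime, last_upload: Optional[datetime]) -> bool:
    """Return True when the wiki changed after the last IA upload.

    Hypothetical helper, not part of wikiteam3. Both datetimes must be
    timezone-aware so that they compare correctly.
    """
    if last_upload is None:
        # No earlier upload exists, so there is nothing to duplicate.
        return True
    return last_change > last_upload


# Illustrative values only: a change in February, an upload in January.
last_change = datetime(2025, 2, 15, 3, 0, 52, tzinfo=timezone.utc)
last_upload = datetime(2025, 1, 1, tzinfo=timezone.utc)
print(should_upload(last_change, last_upload))
```

A strict `>` means a wiki whose last change is no newer than the last upload is skipped.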

There are two ways to find the date of the most recent change:

MediaWiki instances have an API for listing recent changes. Since only the latest change is needed, set the rclimit parameter to 1. Select the JSON output format (format=json) for machine readability. The example below also passes rcnamespace=0, which restricts the results to the main namespace.

$ curl -s 'https://wiki.archiveteam.org/api.php?action=query&list=recentchanges&rclimit=1&rcnamespace=0&format=json' | jq -r '.query.recentchanges[].timestamp'
2025-02-15T03:00:52Z
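In Python, the same extraction might be sketched as follows. The function name `latest_change_from_api` is hypothetical, and the sample body is trimmed to the one field the jq filter above reads; a real response carries more fields.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def latest_change_from_api(body: str) -> Optional[datetime]:
    """Extract the newest change time from a recentchanges JSON response."""
    data = json.loads(body)
    changes = data.get("query", {}).get("recentchanges", [])
    if not changes:
        # The wiki reported no recent changes at all.
        return None
    # The API returns UTC timestamps such as 2025-02-15T03:00:52Z.
    stamp = datetime.strptime(changes[0]["timestamp"], "%Y-%m-%dT%H:%M:%SZ")
    return stamp.replace(tzinfo=timezone.utc)


# Trimmed, illustrative response body (not a full API reply).
sample = '{"query": {"recentchanges": [{"timestamp": "2025-02-15T03:00:52Z"}]}}'
print(latest_change_from_api(sample))
```

Fetching the body is left out on purpose so the parsing can be checked without network access.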

The Special:RecentChanges page lists recent changes. Since only the latest change is needed, set the limit parameter to 1. Because this page only searches back a limited number of days, set days to a very large value so that the search window covers all dates. The date of the change is available in two places. The first is the data-mw-ts attribute of table.mw-changeslist-line inside div.mw-changeslist; it is not available on older instances. The second is the text of the h4 inside div.mw-changeslist; it is translated into the local language of the instance.

$ curl -s 'https://wiki.archiveteam.org/index.php?title=Special:RecentChanges&limit=1' | pup 'div.mw-changeslist table.mw-changeslist-line attr{data-mw-ts}'
20250215055720
$ curl -s 'https://wiki.archiveteam.org/index.php?title=Special:RecentChanges&limit=1' | pup 'div.mw-changeslist h4 text{}'
15 February 2025
$ curl -s 'https://millcomputing.com/w/index.php?title=Special:RecentChanges&days=10000000000000&from=&limit=1' | pup 'div.mw-changeslist h4 text{}'
23 February 2021
$ curl -s 'https://wiki.freifunk-franken.de/mediawiki/index.php?title=Spezial:Letzte_%C3%84nderungen&limit=1&days=100000000' | pup 'div.mw-changeslist h4 text{}'
4. Februar 2025
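For the data-mw-ts variant, a rough Python equivalent using only the standard library could look like this. The class and function names are hypothetical, the sample markup is a minimal stand-in for a real page, and treating the value as UTC is an assumption. The sketch matches any table.mw-changeslist-line without checking for the enclosing div.mw-changeslist, and it does not attempt the localized h4 text.

```python
from datetime import datetime, timezone
from html.parser import HTMLParser
from typing import Optional


class ChangeTimestampParser(HTMLParser):
    """Collect the first data-mw-ts value on a table.mw-changeslist-line."""

    def __init__(self) -> None:
        super().__init__()
        self.timestamp: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if self.timestamp is not None or tag != "table":
            return
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if "mw-changeslist-line" in classes and attr_map.get("data-mw-ts"):
            self.timestamp = attr_map["data-mw-ts"]


def latest_change_from_html(page: str) -> Optional[datetime]:
    """Return the newest change time, or None if the attribute is absent."""
    parser = ChangeTimestampParser()
    parser.feed(page)
    if parser.timestamp is None:
        # Older instances do not emit data-mw-ts.
        return None
    # Compact form YYYYMMDDHHMMSS, e.g. 20250215055720 (assumed to be UTC).
    stamp = datetime.strptime(parser.timestamp, "%Y%m%d%H%M%S")
    return stamp.replace(tzinfo=timezone.utc)


# Minimal, hypothetical markup containing only the elements of interest.
sample_html = (
    '<div class="mw-changeslist">'
    '<table class="mw-changeslist-line" data-mw-ts="20250215055720"></table>'
    "</div>"
)
print(latest_change_from_html(sample_html))
```

When the function returns None, a caller would have to fall back to the API or to parsing the translated h4 date.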
