Open
Labels: enhancement (New feature or request)
Description
Follows on from discussions in #63. Currently the HostAliases setting is relatively limited: a link's domain must exactly match an alias before the crawler will follow it.
To make crawling a large number of subdomains easier, support for a wildcard (*) would be useful.
For example:

```csharp
using System;
using InfinityCrawler;

var crawler = new Crawler();
var result = await crawler.Crawl(new Uri("http://example.org/"), new CrawlSettings {
    UserAgent = "MyVeryOwnWebCrawler/1.0",
    RequestProcessorOptions = new RequestProcessorOptions
    {
        MaxNumberOfSimultaneousRequests = 5
    },
    // Proposed syntax: a wildcard alias covering every subdomain of example.org
    HostAliases = new [] { "*.example.org" }
});
```

There likely don't need to be any specific rules around wildcard handling. A host alias consisting of only a wildcard would mean crawling any domain that is linked to. This is likely where analyzers of some kind, as well as additional documentation, would be useful.
A full wildcard setup also allows crawling of more complex subdomain patterns like web.*.example.org, which may help in some specific use cases.
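
One way the matching described above could work is sketched below. This is not InfinityCrawler's implementation: `HostMatchesAlias` and `IsAllowed` are hypothetical helpers, and the sketch assumes `*` may stand for any run of characters (dots included), in line with the "no specific rules" idea. Each alias is turned into an anchored, case-insensitive regular expression.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of InfinityCrawler): returns true when `host`
// matches `alias`, where '*' stands for any run of characters, dots included.
static bool HostMatchesAlias(string host, string alias)
{
    // Regex.Escape turns '*' into "\*"; replace that with ".*" and anchor both ends
    // so the alias has to cover the whole host name.
    var pattern = "^" + Regex.Escape(alias).Replace("\\*", ".*") + "$";
    return Regex.IsMatch(host, pattern, RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);
}

var aliases = new[] { "*.example.org" };

// Hypothetical check a crawler might run before following a link.
bool IsAllowed(Uri uri) => aliases.Any(alias => HostMatchesAlias(uri.Host, alias));

Console.WriteLine(IsAllowed(new Uri("http://blog.example.org/")));              // True
Console.WriteLine(IsAllowed(new Uri("http://example.org/")));                   // False: no subdomain part
Console.WriteLine(HostMatchesAlias("web.eu.example.org", "web.*.example.org")); // True
Console.WriteLine(HostMatchesAlias("anything.test", "*"));                      // True: a bare wildcard matches any host
```

Under this reading, `*.example.org` does not cover the bare `example.org`, so the root domain would still need its own exact alias (or to be the crawl's starting host).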