forked from reverbrain/swarm
-
Notifications
You must be signed in to change notification settings - Fork 0
serpentf/swarm
Folders and files
| Name | Name | Last commit message | Last commit date | |
|---|---|---|---|---|
Repository files navigation
Swarm is high-perfomance library for web crawling. Depends on libcurl, uriparser, libxml2. libcurl built with --enable-ares is highly needed for better perfomance. libcurl must be built with SSL support for accessing https sites. network_url is helper class for URI processing. It's not thread safe. Before using set_base must be called, true is returned if argument is valid url. network_url::normalized() returnes normalized form of url. network_url::host() returnes host part of URI. network_url::relative() returnes absolute URI generated from base and relative argument, if host is passed it's set to host part of new URI. url_finder is helper class for HTML parsing. It's not thread safe. Cosntructor takes the only argument - content of HTML document. url_finder::urls() parses the document and returnes list of all found urls from <a href="..."/> tags network_manager is helper class for url fetching. It's thread safe. Constructor takes the only argument - libev++ event loop reference. Object must be constructed in the same thread as event loop. network_manager::set_limit() set limit for number of concurrent web requests. network_manager::get() put network_request to processing queue. Callback will be called as result is ready.
About
Swarm is aiming at your web
Resources
Stars
Watchers
Forks
Packages 0
No packages published