A PHP class for parsing X-Robots-Tag HTTP headers according to Google's X-Robots-Tag HTTP header specifications.
Note: HHVM support is planned once facebook/hhvm#4277 is fixed.
The library is available via Composer. Add this to your composer.json file:
{
    "require": {
        "vipnytt/robotstagparser": "~0.2"
    }
}

Then run composer update.
Get all rules that apply to you. This includes:
- All generic rules
- Rules specific to your User-Agent (if there are any)
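The selection described above can be illustrated with a minimal, standalone sketch. It is not the library's implementation: `collectDirectives()` is a hypothetical helper, and the case-insensitive exact match on the User-Agent prefix is an assumption. Values that carry their own colon or commas (such as an `unavailable_after` date) are only guarded against being mistaken for a User-Agent, not parsed properly.

```php
<?php
// Hypothetical helper, NOT part of vipnytt/robotstagparser.
// Collects generic directives plus those scoped to one User-Agent.
function collectDirectives(array $headers, string $userAgent): array
{
    // Directives that take a "name: value" form and must not be read as a User-Agent prefix.
    $valueDirectives = ['unavailable_after'];
    $directives = [];

    foreach ($headers as $header) {
        // Split "X-Robots-Tag: value" into header name and value.
        $parts = explode(':', $header, 2);
        if (count($parts) !== 2 || strcasecmp(trim($parts[0]), 'X-Robots-Tag') !== 0) {
            continue; // not an X-Robots-Tag header
        }
        $value = trim($parts[1]);

        // An optional leading "useragent:" token scopes the rule to one crawler.
        if (preg_match('/^([A-Za-z0-9_\-]+)\s*:\s*(.*)$/', $value, $m)
            && !in_array(strtolower($m[1]), $valueDirectives, true)
        ) {
            if (strcasecmp($m[1], $userAgent) !== 0) {
                continue; // rule targets a different User-Agent
            }
            $value = $m[2];
        }

        // Remaining value is a comma-separated list of directives.
        foreach (explode(',', $value) as $directive) {
            $directive = strtolower(trim($directive));
            if ($directive !== '') {
                $directives[$directive] = true; // array keys de-duplicate
            }
        }
    }

    return array_keys($directives);
}
```

With the three example headers used in this README, a crawler named `googlebot` would collect the generic rules and its own, while any other name would only see the generic ones. The library's own usage follows.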
use vipnytt\XRobotsTagParser;
$headers = [
    'X-Robots-Tag: noindex, noodp',
    'X-Robots-Tag: googlebot: noindex, noarchive',
    'X-Robots-Tag: bingbot: noindex, noarchive, noimageindex'
];
$parser = new XRobotsTagParser('myUserAgent', $headers);
$rules = $parser->getRules(); // <-- returns an array of rules

use vipnytt\XRobotsTagParser;
$parser = new XRobotsTagParser\Adapters\Url('http://example.com/', 'myUserAgent');
$rules = $parser->getRules();

use vipnytt\XRobotsTagParser;
use GuzzleHttp\Client;
$client = new Client(); // uses the Client class imported above
$response = $client->request('GET', 'http://example.com/');
$parser = new XRobotsTagParser\Adapters\GuzzleHttp($response, 'myUserAgent');
$array = $parser->getRules();

use vipnytt\XRobotsTagParser;
$string = <<<STRING
HTTP/1.1 200 OK
Date: Tue, 25 May 2010 21:42:43 GMT
X-Robots-Tag: noindex
X-Robots-Tag: nofollow
STRING;
$parser = new XRobotsTagParser\Adapters\TextString($string, 'myUserAgent');
$array = $parser->getRules();

The export() method returns an array containing all rules for any User-Agent.
use vipnytt\XRobotsTagParser;
$parser = new XRobotsTagParser('myUserAgent', $headers);
$array = $parser->export();

- all - There are no restrictions for indexing or serving.
- none - Equivalent to noindex and nofollow.
- noindex - Do not show this page in search results and do not show a "Cached" link in search results.
- nofollow - Do not follow the links on this page.
- noarchive - Do not show a "Cached" link in search results.
- nosnippet - Do not show a snippet in the search results for this page.
- noodp - Do not use metadata from the Open Directory project for titles or snippets shown for this page.
- notranslate - Do not offer translation of this page in search results.
- noimageindex - Do not index images on this page.
- unavailable_after - Do not show this page in search results after the specified date/time.
Source: https://developers.google.com/webmasters/control-crawl-index/docs/robots_meta_tag
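
As a rough illustration of how the directives above combine, the following sketch expands `none` and answers two simple questions about a set of directives. The function names are hypothetical and not part of this library, and the input is assumed to be a flat list of directive names; the actual shape of the array returned by getRules() is not specified here and may differ.

```php
<?php
// Hypothetical helpers, NOT part of vipnytt/robotstagparser.
// Assumes $directives is a flat list of directive names, e.g. ['noindex', 'noarchive'].

// Normalizes the list and expands "none" into noindex + nofollow.
function expandDirectives(array $directives): array
{
    $expanded = [];
    foreach ($directives as $directive) {
        $directive = strtolower(trim($directive));
        if ($directive === 'none') {
            $expanded['noindex'] = true;
            $expanded['nofollow'] = true;
        } elseif ($directive !== '') {
            $expanded[$directive] = true; // array keys de-duplicate
        }
    }
    return array_keys($expanded);
}

// True when nothing forbids showing the page in search results.
function mayIndex(array $directives): bool
{
    return !in_array('noindex', expandDirectives($directives), true);
}

// True when nothing forbids a "Cached" link (noindex and noarchive both forbid it).
function mayShowCachedLink(array $directives): bool
{
    $expanded = expandDirectives($directives);
    return !in_array('noindex', $expanded, true)
        && !in_array('noarchive', $expanded, true);
}
```

For example, `mayIndex(['none'])` is false because `none` implies `noindex`, while `mayShowCachedLink(['nofollow'])` is true because neither `noindex` nor `noarchive` is present.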

