Monday, 7 October 2013

PHP Web Scraper : Extract XML Sitemap

Assalamualaikum w.b.t. and greetings,

Today I just want to share some simple code that fetches a sitemap.xml and extracts every page URL (the <loc> element) together with its last-modified date (the <lastmod> element). The code does not use any extension such as SimpleXML, just a basic regex to pull out the relevant parts of the XML.

Here is the code:

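<?php

echo "Welcome to the ExtractSitemapXml PHP Project!";

$url = 'http://myskali.blogspot.com/sitemap.xml';

// file_get_contents() returns false on failure, so stop early in that case
$result = file_get_contents($url);
if ($result === false) {
    die('Could not download ' . $url);
}

// capture each <loc> (page URL) and the <lastmod> (last-modified date) that follows it
$pat = '~<loc>(.*?)</loc>\s*<lastmod>(.*?)</lastmod>~is';
preg_match_all($pat, $result, $match);

echo '<pre>';

// drop index 0 (the full pattern matches) and keep only the two capture groups
print_r(array_slice($match, 1));

echo '</pre>';

?>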
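
If you would rather get one record per page instead of two parallel arrays, here is a minimal sketch of the same regex idea. The function name extract_sitemap_entries and the sample XML are made up for illustration and are not part of the script above. It assumes each <loc> is immediately followed by its <lastmod>, and it swaps (.*?) for [^<]* so that a <url> without a <lastmod> is skipped instead of being merged with the next entry.

```php
<?php
// Hypothetical helper (an assumption, not from the original script):
// returns one array('loc' => ..., 'lastmod' => ...) per sitemap entry.
function extract_sitemap_entries($xml)
{
    // [^<]* cannot run past a tag, so a match never spans two <url> blocks.
    $pattern = '~<loc>([^<]*)</loc>\s*<lastmod>([^<]*)</lastmod>~i';

    // PREG_SET_ORDER groups the captures per match instead of per group.
    preg_match_all($pattern, $xml, $matches, PREG_SET_ORDER);

    $entries = array();
    foreach ($matches as $m) {
        $entries[] = array(
            // Sitemaps escape "&" as "&amp;", so decode XML entities in the URL.
            'loc'     => html_entity_decode(trim($m[1]), ENT_QUOTES | ENT_XML1, 'UTF-8'),
            'lastmod' => trim($m[2]),
        );
    }
    return $entries;
}

// Made-up sample data so the sketch runs without network access.
$sample = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://example.com/2013/10/first-post.html</loc>
    <lastmod>2013-10-07T08:00:00Z</lastmod>
  </url>
  <url>
    <loc>http://example.com/search?a=1&amp;b=2</loc>
    <lastmod>2013-10-06T12:30:00Z</lastmod>
  </url>
</urlset>
XML;

// Print one "date  url" line per entry.
foreach (extract_sitemap_entries($sample) as $entry) {
    echo $entry['lastmod'], '  ', $entry['loc'], PHP_EOL;
}
```

A regex like this is fine for a quick grab, but it will miss entries that wrap the URL in CDATA or put other elements between <loc> and <lastmod>. For those cases a real XML parser is probably the safer choice.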

You can test the code live here:

Simple Sitemap XML Grabber and Extractor
