dracoblue.net

Generating/Parsing a sitemap.xml with craur

Some of you might have heard of or used craur, my library for converting between json, xml, csv and xlsx.

It advocates one simple api for all of those formats, instead of simplexml_load_string or other built-in php functions, which are usually not consistent about whether they throw exceptions or emit warnings.
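
To illustrate that inconsistency, here is a minimal sketch using only built-in php functions. For malformed xml, simplexml_load_string returns false and emits warnings rather than throwing. The helper parse_xml_or_throw is a hypothetical name and not part of craur; it only shows how much boilerplate is needed to get one exception out of the built-in parser.

```php
<?php

/**
 * Hypothetical helper (not part of craur): parse xml with SimpleXML and
 * turn the collected libxml errors into a single exception.
 */
function parse_xml_or_throw($xml_string)
{
    // Collect libxml errors internally instead of emitting php warnings.
    $previous_setting = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $xml = simplexml_load_string($xml_string);
    $errors = libxml_get_errors();

    // Restore the previous error handling, so other code is not affected.
    libxml_clear_errors();
    libxml_use_internal_errors($previous_setting);

    if ($xml === false)
    {
        $messages = array();
        foreach ($errors as $error)
        {
            $messages[] = trim($error->message);
        }
        throw new RuntimeException('Invalid xml: ' . implode('; ', $messages));
    }

    return $xml;
}
```

With this wrapper, a broken document can be handled in a regular try/catch block instead of checking for false and silencing warnings by hand.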

Today we'll look at an example of how to use craur to generate and parse a sitemap.xml.

A typical sitemap.xml looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
   <url>
      <loc>http://www.example.com/</loc>
      <lastmod>2005-01-01</lastmod>
      <changefreq>monthly</changefreq>
      <priority>0.8</priority>
   </url>
   <url>
      <loc>http://www.example.com/contact</loc>
   </url>
   <url>
      <loc>http://www.example.com/robots.txt</loc>
      <priority>1.0</priority>
   </url>
</urlset>

Generating this xml with craur works like this:

<?php

$craur = new \Craur(array(
    'urlset' => array(
        '@xmlns' => 'http://www.sitemaps.org/schemas/sitemap/0.9',
        'url' => array(
            array(
                'loc' => 'http://www.example.com/',
                'lastmod' => '2005-01-01',
                'changefreq' => 'monthly',
                'priority' => '0.8'
            ),
            array(
                'loc' => 'http://www.example.com/contact'
            ),
            array(
                'loc' => 'http://www.example.com/robots.txt',
                'priority' => '1.0'
            )
        )
    )
));

echo '<' . '?xml version="1.0" encoding="UTF-8"?' . '>' . $craur->toXmlString();
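
In practice the url list usually comes from a database or a router instead of being written by hand. The following is a sketch of how the nested array could be built from a flat list of entries. The function name build_sitemap_data and the shape of the $entries input are assumptions for illustration; only the resulting array structure is taken from the example above.

```php
<?php

/**
 * Hypothetical helper: build the nested sitemap array from a flat list of
 * entries. Optional fields are only added if they are set.
 */
function build_sitemap_data(array $entries)
{
    $urls = array();

    foreach ($entries as $entry)
    {
        // loc is the only required field of a sitemap url
        $url = array('loc' => $entry['loc']);

        foreach (array('lastmod', 'changefreq', 'priority') as $optional_key)
        {
            if (isset($entry[$optional_key]))
            {
                $url[$optional_key] = (string) $entry[$optional_key];
            }
        }

        $urls[] = $url;
    }

    return array(
        'urlset' => array(
            '@xmlns' => 'http://www.sitemaps.org/schemas/sitemap/0.9',
            'url' => $urls
        )
    );
}

$data = build_sitemap_data(array(
    array('loc' => 'http://www.example.com/', 'lastmod' => '2005-01-01', 'changefreq' => 'monthly', 'priority' => '0.8'),
    array('loc' => 'http://www.example.com/contact'),
    array('loc' => 'http://www.example.com/robots.txt', 'priority' => '1.0')
));
```

The resulting $data has the same shape as the hand-written array and could be passed to the constructor as new \Craur($data).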

If you want to parse this xml string, you can do this easily with craur in the following way:

<?php

$craur = Craur::createFromXml($xml_string);

var_dump($craur->get('urlset.@xmlns'));
# 'http://www.sitemaps.org/schemas/sitemap/0.9'

foreach ($craur->get('urlset.url[]') as $url_item)
{
    var_dump($url_item->get('loc'));
}
# http://www.example.com/
# http://www.example.com/contact
# http://www.example.com/robots.txt

The brackets [] in urlset.url[] indicate that you are expecting an array here. Craur handles the case where the sitemap.xml is empty and no element is found, as well as the case where there is only one element. In all cases craur returns an array if the key is suffixed with []. If the key is not suffixed with [], craur returns only the first value.
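
To see why this matters, consider what a naive xml to array conversion produces: no key at all for zero url elements, a single associative array for one element, and a list only for two or more. The sketch below normalizes those three shapes in plain php. The function as_list is a hypothetical name and not craur's implementation; it only illustrates the behaviour described above.

```php
<?php

/**
 * Hypothetical helper (not craur's implementation): normalize "missing",
 * "single item" and "list of items" to a list.
 */
function as_list($value)
{
    // Nothing found: return an empty list, so a foreach just does nothing.
    if ($value === null || $value === array())
    {
        return array();
    }

    // Already a list: the keys are exactly 0, 1, 2, ...
    if (is_array($value) && array_keys($value) === range(0, count($value) - 1))
    {
        return $value;
    }

    // A single item (scalar or associative array): wrap it.
    return array($value);
}

$no_urls = as_list(null);
$one_url = as_list(array('loc' => 'http://www.example.com/'));
$two_urls = as_list(array(
    array('loc' => 'http://www.example.com/'),
    array('loc' => 'http://www.example.com/contact')
));
```

All three results can be iterated with the same foreach loop, which is what the [] suffix gives you without any extra code.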

That's it!

Bonus: If you want to fetch plenty of values at once, you might want to use Craur#getValues:

<?php

$craur = Craur::createFromXml($xml_string);

foreach ($craur->get('urlset.url[]') as $url_item)
{
    var_dump($url_item->getValues(
        array(
            'url' => 'loc',
            'priority' => 'priority',
            'changefreq' => 'changefreq',
            'lastmod' => 'lastmod'
        ),
        array(
            'priority' => '',
            'changefreq' => 'often',
            'lastmod' => ''
        )
    ));
}

# loc is mapped to url
# if priority/lastmod is not available, an empty string will be used
# if changefreq is not set, 'often' will be used
# array(
#   array(
#     'url' => 'http://www.example.com/',
#     'lastmod' => '2005-01-01',
#     'changefreq' => 'monthly',
#     'priority' => '0.8'
#     ),
#   array(
#     'url' => 'http://www.example.com/contact',
#     'lastmod' => '',
#     'changefreq' => 'often',
#     'priority' => ''
#     ),
#   array(
#     'url' => 'http://www.example.com/robots.txt',
#     'lastmod' => '',
#     'changefreq' => 'often',
#     'priority' => '1.0'
#     )
# )
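
The mapping with defaults can also be pictured in plain php. The following sketch works on flat arrays only. The function map_values is a hypothetical name and not how craur implements getValues, and throwing an exception for a missing value without a default is an assumption made for this example.

```php
<?php

/**
 * Hypothetical plain-php illustration of a "mapping with defaults" for flat
 * arrays. $paths_map maps target key => source key, $defaults is keyed by
 * target key.
 */
function map_values(array $item, array $paths_map, array $defaults = array())
{
    $values = array();

    foreach ($paths_map as $target_key => $source_key)
    {
        if (array_key_exists($source_key, $item))
        {
            $values[$target_key] = $item[$source_key];
        }
        elseif (array_key_exists($target_key, $defaults))
        {
            $values[$target_key] = $defaults[$target_key];
        }
        else
        {
            // Assumption: a missing value without a default is an error.
            throw new OutOfBoundsException('No value and no default for: ' . $source_key);
        }
    }

    return $values;
}

$contact = map_values(
    array('loc' => 'http://www.example.com/contact'),
    array('url' => 'loc', 'priority' => 'priority', 'changefreq' => 'changefreq', 'lastmod' => 'lastmod'),
    array('priority' => '', 'changefreq' => 'often', 'lastmod' => '')
);
```

Here $contact contains the url from loc, while priority, changefreq and lastmod fall back to their defaults, just like the second entry in the output above.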

In craur, open source, php, xml by @ 08 Dec 2013
