dracoblue.net

Generating/Parsing a sitemap.xml with craur

Some of you might have heard of or used craur, my library for converting between JSON, XML, CSV and XLSX.

It offers one simple API for all of those formats, as an alternative to simplexml_load_string and other built-in PHP functions, which are not consistent about whether they throw exceptions or emit warnings.
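
To illustrate the inconsistency: with plain PHP, simplexml_load_string returns false and emits warnings on invalid input, so you have to collect the libxml errors yourself if you want an exception. The following is only a minimal sketch of that boilerplate. The helper name parse_xml_or_throw is made up for this example and is not part of craur.

```php
<?php

// Hypothetical helper (not part of craur): parse an XML string and always
// throw an exception on failure, instead of returning false with warnings.
function parse_xml_or_throw(string $xml_string): SimpleXMLElement
{
    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $xml = simplexml_load_string($xml_string);
    $errors = libxml_get_errors();

    // Restore the previous error handling mode.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($xml === false) {
        $message = $errors ? trim($errors[0]->message) : 'unknown error';
        throw new InvalidArgumentException('Invalid XML: ' . $message);
    }

    return $xml;
}
```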

Today we'll look at an example of how to use craur to generate and parse a sitemap.xml.

A typical sitemap.xml looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
   <url>
      <loc>http://www.example.com/</loc>
      <lastmod>2005-01-01</lastmod>
      <changefreq>monthly</changefreq>
      <priority>0.8</priority>
   </url>
   <url>
      <loc>http://www.example.com/contact</loc>
   </url>
   <url>
      <loc>http://www.example.com/robots.txt</loc>
      <priority>1.0</priority>
   </url>
</urlset>

Generating this XML with craur works like this:

<?php

$craur = new \Craur(array(
    'urlset' => array(
        '@xmlns' => 'http://www.sitemaps.org/schemas/sitemap/0.9',
        'url' => array(
            array(
                'loc' => 'http://www.example.com/',
                'lastmod' => '2005-01-01',
                'changefreq' => 'monthly',
                'priority' => '0.8'
            ),
            array(
                'loc' => 'http://www.example.com/contact'
            ),
            array(
                'loc' => 'http://www.example.com/robots.txt',
                'priority' => '1.0'
            )
        )
    )
));

echo '<' . '?xml version="1.0" encoding="UTF-8"?' . '>' . $craur->toXmlString();
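
In practice the list of pages usually comes from somewhere else, such as a database or a route table. The following sketch builds the nested array from a flat list of pages. The function name build_sitemap_data and the shape of the $pages input are assumptions made for this example, and only loc is treated as required.

```php
<?php

// Hypothetical helper: turn a flat list of pages into the nested array
// structure used above. Optional keys are left out when they are not set.
function build_sitemap_data(array $pages): array
{
    $urls = array();
    foreach ($pages as $page) {
        $url = array('loc' => $page['loc']);
        foreach (array('lastmod', 'changefreq', 'priority') as $key) {
            if (isset($page[$key])) {
                $url[$key] = $page[$key];
            }
        }
        $urls[] = $url;
    }

    return array(
        'urlset' => array(
            '@xmlns' => 'http://www.sitemaps.org/schemas/sitemap/0.9',
            'url' => $urls
        )
    );
}

$data = build_sitemap_data(array(
    array('loc' => 'http://www.example.com/', 'priority' => '0.8'),
    array('loc' => 'http://www.example.com/contact')
));
```

The resulting $data can then be passed to new \Craur($data), just like the hand-written array.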

If you want to parse this XML string, you can do so easily with craur:

<?php

$craur = Craur::createFromXml($xml_string);

var_dump($craur->get('urlset.@xmlns'));
# 'http://www.sitemaps.org/schemas/sitemap/0.9'

foreach ($craur->get('urlset.url[]') as $url_item)
{
    var_dump($url_item->get('loc'));
}
# http://www.example.com/
# http://www.example.com/contact
# http://www.example.com/robots.txt

The brackets [] in urlset.url[] indicate that you expect an array here. Craur handles the cases where the sitemap.xml is empty and no element is found, or where there is only one element. In all of these cases craur returns an array if the key is suffixed with []. If the key is not suffixed with [], craur returns only the first value.
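
To make the three cases (no element, one element, many elements) concrete, here is a small plain PHP sketch of the idea. It is not craur's actual implementation. The helper to_list is hypothetical and only mimics the behaviour described above on already decoded data.

```php
<?php

// Hypothetical helper, NOT craur's implementation: normalise a decoded value
// to a list, mirroring what a key suffixed with [] is described to return.
function to_list($value): array
{
    if ($value === null || $value === array()) {
        return array();       // nothing found: empty list
    }
    if (is_array($value) && array_keys($value) === range(0, count($value) - 1)) {
        return $value;        // already a list of items
    }
    return array($value);     // a single item: wrap it in a list
}

$one_url = array('loc' => 'http://www.example.com/');
$two_urls = array(
    array('loc' => 'http://www.example.com/'),
    array('loc' => 'http://www.example.com/contact')
);

var_dump(count(to_list(null)));      // int(0)
var_dump(count(to_list($one_url)));  // int(1)
var_dump(count(to_list($two_urls))); // int(2)
```

Either way, the foreach loop over the result needs no special cases.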

That's it!

Bonus: If you want to read several values at once, you might want to use Craur#getValues:

<?php

$craur = Craur::createFromXml($xml_string);

foreach ($craur->get('urlset.url[]') as $url_item)
{
    var_dump($url_item->getValues(
        array(
            'url' => 'loc',
            'priority' => 'priority',
            'changefreq' => 'changefreq',
            'lastmod' => 'lastmod'
        ),
        array(
            'priority' => '',
            'changefreq' => 'often',
            'lastmod' => ''
        )
    ));
}

# loc is mapped to url
# if priority/lastmod is not available, an empty string will be used
# if changefreq is not set, 'often' will be used
# array(
#   array(
#     'url' => 'http://www.example.com/',
#     'lastmod' => '2005-01-01',
#     'changefreq' => 'monthly',
#     'priority' => '0.8'
#     ),
#   array(
#     'url' => 'http://www.example.com/contact',
#     'lastmod' => '',
#     'changefreq' => 'often',
#     'priority' => ''
#     ),
#   array(
#     'url' => 'http://www.example.com/robots.txt',
#     'lastmod' => '',
#     'changefreq' => 'often',
#     'priority' => '1.0'
#     )
# )
In craur, open source, php, xml by
@ 08 Dec 2013
