ROBOTS.TXT AND SITEMAP.XML IN ODOO

Understand how they work in Odoo

Robert Rübner

ROBOTS.TXT IN ODOO

In Odoo, the robots.txt is available at http://www.your-domain.de/robots.txt. By default, it contains the following:

User-agent: *
Sitemap: http://www.your-domain.de/sitemap.xml
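To see what this default content means to a crawler, you can feed it to Python's standard urllib.robotparser. The following is a small sketch (site_maps() requires Python 3.8 or later); the crawler name "ExampleBot" and the page URL are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Default robots.txt content as shown above (the domain is a placeholder).
DEFAULT_ROBOTS_TXT = """\
User-agent: *
Sitemap: http://www.your-domain.de/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(DEFAULT_ROBOTS_TXT.splitlines())

# There is no Disallow rule, so any crawler may fetch any URL.
allowed = parser.can_fetch("ExampleBot", "http://www.your-domain.de/page/site-1")
# The Sitemap directive is independent of the User-agent group.
sitemaps = parser.site_maps()

print(allowed)   # True
print(sitemaps)  # ['http://www.your-domain.de/sitemap.xml']
```

In other words, the default robots.txt does not restrict crawling at all; its only job is to point crawlers to the sitemap.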

Since the content is rendered from a template, you can customize robots.txt to your needs by extending that template like any other. The template has the ID website.robots and is part of the website module. From Odoo v9 on, it is also possible to serve a different robots.txt per website, since templates can be linked to individual websites. If you simply want to prevent crawling, there are already two modules for that, website_no_crawler and website_norobots. It does not matter which of the two you use, since both do the same thing.
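The following sketch models, in plain Python, what such a customization might render for two websites: the default content for a public site, and a variant that blocks all crawlers for another one. The function names build_robots_txt and crawler_may_fetch, the block_crawlers switch and the staging domain are hypothetical, and the "Disallow: /" line is an assumption about what a crawler-blocking customization looks like, not the verified output of website_no_crawler or website_norobots.

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(url_root: str, block_crawlers: bool = False) -> str:
    """Hypothetical stand-in for rendering the website.robots template for one
    website; url_root is that website's base URL including the trailing slash."""
    lines = ["User-agent: *"]
    if block_crawlers:
        # Assumed customization: disallow every path for every crawler.
        lines.append("Disallow: /")
    lines.append(f"Sitemap: {url_root}sitemap.xml")
    return "\n".join(lines) + "\n"


def crawler_may_fetch(robots_txt: str, url: str) -> bool:
    """Check a URL against robots.txt content using the standard library parser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("ExampleBot", url)


public_site = build_robots_txt("http://www.your-domain.de/")
staging_site = build_robots_txt("http://staging.your-domain.de/", block_crawlers=True)

print(crawler_may_fetch(public_site, "http://www.your-domain.de/page/site-1"))       # True
print(crawler_may_fetch(staging_site, "http://staging.your-domain.de/page/site-1"))  # False
```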

SITEMAP.XML IN ODOO

In Odoo, the sitemap.xml is available at http://www.your-domain.de/sitemap.xml. By default, it has the following structure:

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <url>
        <loc>http://www.your-domain.de/page/site-1</loc>
    </url>
    <url>
        <loc>http://www.your-domain.de/page/site-2</loc>
        <lastmod>2015-11-13</lastmod>
    </url>
    <url>
        ...
    </url>
</urlset>
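To illustrate this structure independently of Odoo's QWeb templates, the sketch below builds an equivalent <urlset> with Python's standard xml.etree.ElementTree. The function name build_urlset and the sample entries are illustrative; <lastmod> is only written when a date is known, mirroring the two kinds of <url> entries in the sample.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_urlset(entries):
    """Build a <urlset> element from (loc, lastmod) pairs; lastmod may be None."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        if lastmod is not None:
            # Date in W3C format, e.g. 2015-11-13
            ET.SubElement(url, "lastmod").text = lastmod
    return urlset


entries = [
    ("http://www.your-domain.de/page/site-1", None),
    ("http://www.your-domain.de/page/site-2", "2015-11-13"),
]
xml_text = ET.tostring(build_urlset(entries), encoding="unicode")
print(xml_text)
```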

The content of sitemap.xml is also rendered from templates, so you can customize the XML structure by extending them. The corresponding template IDs are website.sitemap_xml and website.sitemap_locs, and both are also part of the website module. Because these templates render dynamic data supplied by a controller, you have to override that controller if you want to feed new or modified data into the templates. As with robots.txt, Odoo v9 and later make it possible to serve a different sitemap.xml per website, since templates can be linked to individual websites.

In the background, Odoo stores the generated sitemap.xml as attachments. You can therefore also view its content in the Odoo backend under Settings -> Technical -> Database Structure -> Attachments. There is no scheduled job that generates these attachments automatically; they are created the first time http://www.your-domain.de/sitemap.xml is requested. Subsequent requests serve the stored attachments without recreating or updating them. The attachments are only regenerated when the URL is requested and the existing sitemap.xml is older than 12 hours (the default).
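The regeneration rule can be expressed as a small freshness check. This is a plain-Python model, not Odoo's controller code: needs_regeneration and CACHE_MAX_AGE are hypothetical names, and created_at stands for the creation date of the stored sitemap.xml attachment (None if it does not exist yet).

```python
from datetime import datetime, timedelta

# Default maximum age of the stored sitemap before it is rebuilt.
CACHE_MAX_AGE = timedelta(hours=12)


def needs_regeneration(created_at, now):
    """Return True if a request at `now` must rebuild the stored sitemap.

    created_at is None on the very first request (no attachment yet)."""
    if created_at is None:
        return True
    # Younger than 12 hours: serve the stored attachment unchanged.
    return now - created_at >= CACHE_MAX_AGE


now = datetime(2015, 11, 13, 20, 0)
print(needs_regeneration(None, now))                           # True: first visit
print(needs_regeneration(datetime(2015, 11, 13, 10, 0), now))  # False: 10 hours old
print(needs_regeneration(datetime(2015, 11, 13, 8, 0), now))   # True: 12 hours old
```

Because the check only runs when the URL is requested, a sitemap that nobody requests stays as old as its last request, regardless of this threshold.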

The URL entries in sitemap.xml are produced by the function enumerate_pages() in the website module. By default, it lists every template that has the page="True" attribute. If you want other URL entries, you have to override this function.

The website module also contains a template with the ID website.sitemap_index_xml, which is used when the number of sitemap entries exceeds a certain limit. By default, a sitemap file holds at most 45000 entries. If there are more, the sitemap is split into several files named sitemap-1.xml, sitemap-2.xml, sitemap-3.xml and so on, and sitemap.xml then serves an index of these files with the following structure:

<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <sitemap>
        <loc>http://www.your-domain.de/sitemap-1.xml</loc>
    </sitemap>
    <sitemap>
        <loc>http://www.your-domain.de/sitemap-2.xml</loc>
    </sitemap>
    <sitemap>
        <loc>http://www.your-domain.de/sitemap-3.xml</loc>
    </sitemap>
    ...
</sitemapindex>
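The splitting rule and the index structure can be sketched together in plain Python. This is not Odoo's implementation: split_into_sitemaps, build_sitemap_index and LOCS_PER_SITEMAP are hypothetical names, and the list of URLs is a made-up stand-in for what enumerate_pages() yields. Following the sitemaps.org protocol, each file gets its own <sitemap> element with exactly one <loc>.

```python
import xml.etree.ElementTree as ET
from itertools import islice

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
LOCS_PER_SITEMAP = 45000  # default maximum number of entries per sitemap file


def split_into_sitemaps(locs, per_file=LOCS_PER_SITEMAP):
    """Group URLs into files named sitemap-1.xml, sitemap-2.xml, ...

    Returns a dict mapping each file name to the list of URLs it holds."""
    files = {}
    it = iter(locs)
    number = 1
    while True:
        chunk = list(islice(it, per_file))
        if not chunk:
            break
        files[f"sitemap-{number}.xml"] = chunk
        number += 1
    return files


def build_sitemap_index(url_root, file_names):
    """Build a <sitemapindex> with one <sitemap><loc> element per file.

    url_root is the website base URL including the trailing slash."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for name in file_names:
        sitemap = ET.SubElement(index, "sitemap")
        ET.SubElement(sitemap, "loc").text = url_root + name
    return index


# 100000 made-up page URLs: more than 45000, so the sitemap must be split.
locs = [f"http://www.your-domain.de/page/site-{i}" for i in range(1, 100001)]
files = split_into_sitemaps(locs)
print({name: len(urls) for name, urls in files.items()})
# {'sitemap-1.xml': 45000, 'sitemap-2.xml': 45000, 'sitemap-3.xml': 10000}

index_xml = ET.tostring(build_sitemap_index("http://www.your-domain.de/", files), encoding="unicode")
print(index_xml)
```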

Even if only one sitemap file is needed, Odoo still generates a sitemap-1.xml. In that case, however, sitemap.xml does not contain the <sitemapindex> structure but the same <urlset> structure as sitemap-1.xml.
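The resulting set of stored files can be summarized with a small hypothetical helper, stored_sitemaps, that maps each file name to the root element it contains. It is a model of the behavior described here, not Odoo code; raising an error for zero pages is a choice of this sketch.

```python
def stored_sitemaps(page_count):
    """Map each stored sitemap file name to the root element it contains.

    page_count is the number of generated sitemap-N.xml files."""
    if page_count < 1:
        # Choice of this sketch: without entries there is nothing to store.
        raise ValueError("at least one sitemap page is expected")
    # Every chunk is stored as sitemap-N.xml with a <urlset> root,
    # even when there is only a single chunk.
    files = {f"sitemap-{n}.xml": "urlset" for n in range(1, page_count + 1)}
    # sitemap.xml holds the <urlset> itself for one chunk, an index otherwise.
    files["sitemap.xml"] = "urlset" if page_count == 1 else "sitemapindex"
    return files


print(stored_sitemaps(1))                 # {'sitemap-1.xml': 'urlset', 'sitemap.xml': 'urlset'}
print(stored_sitemaps(3)["sitemap.xml"])  # sitemapindex
```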

About Robert Rübner

The Python and Odoo ninja. We've heard some people call him Yoda. Got a smartphone some months ago.