How to create a sitemap in Google Sitemaps Protocol Format

Google Sitemaps is a new way to tell Googlebot which pages to crawl on your site: http://www.google.com/webmasters/sitemaps The following steps show how to generate your sitemap automatically in the Google Sitemaps Protocol / XML Sitemaps format specified here: http://www.google.com/webmasters/sitemaps/docs/en/protocol.html

  1. Create a new page (HTML + DTML formatting), for instance mypage.hu/Googlesitemap , with the following content:
      <?xml version="1.0" encoding="UTF-8"?>
      <urlset xmlns="http://www.google.com/schemas/sitemap/0.84">
      <dtml-in pages>
      <url>
      <loc><dtml-var URL1>/<dtml-var id></loc>
      </url>
      </dtml-in>
      </urlset>
    
  2. Submit the sitemap to Google with the bare parameter set to 1 (for instance mypage.hu/Googlesitemap?bare=1 ).
  3. For verification, Google will also ask you to place an empty HTML file on your server under a name that it gives you. Create an object with that name, for instance a new image object in the ZMI, so that the request is not answered by a not-found page generated by zwiki.
  4. If everything goes well, then Google will display OK and fetch your sitemap on a regular basis.
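
For comparison, here is a minimal Python sketch of the document that the template in step 1 is meant to produce: one url/loc pair per page inside a urlset. It runs outside Zope. The function name build_sitemap and the page ids are hypothetical and only illustrate the output shape; this is not how zwiki generates the page.

```python
import xml.etree.ElementTree as ET

# Namespace used by the template in step 1 (the 0.84 Google schema).
SITEMAP_NS = "http://www.google.com/schemas/sitemap/0.84"


def build_sitemap(base_url, page_ids):
    """Return a sitemap document (str) with one <url>/<loc> per page id."""
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    for page_id in page_ids:
        url = ET.SubElement(urlset, "url")
        loc = ET.SubElement(url, "loc")
        # Same shape as <dtml-var URL1>/<dtml-var id> in the template.
        loc.text = base_url.rstrip("/") + "/" + page_id
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Hypothetical page ids, for illustration only.
    print(build_sitemap("http://mypage.hu", ["FrontPage", "HelpPage"]))
```

Comparing this output with what mypage.hu/Googlesitemap?bare=1 returns is a quick way to spot stray wrapper markup.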

To Do:

I tried to specify the last-modified date for every page, but Google requires a different time format, so I couldn't simply insert the last_edit_time parameter: it didn't validate. Maybe someone with DTML skills could make it work.
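
For reference, the protocol expects lastmod in W3C Datetime format, for example 2005-09-21T20:06:39Z or just 2005-09-21. The following Python sketch shows that target format using only the standard library. It is not a DTML solution, and the function names w3c_datetime and lastmod_element are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def w3c_datetime(moment):
    """Format a timezone-aware datetime as a W3C Datetime string in UTC."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    utc = moment.astimezone(timezone.utc).replace(microsecond=0)
    return utc.strftime("%Y-%m-%dT%H:%M:%SZ")


def lastmod_element(moment):
    """Wrap the formatted timestamp in a <lastmod> element."""
    return "<lastmod>" + w3c_datetime(moment) + "</lastmod>"


if __name__ == "__main__":
    # An arbitrary edit time at UTC-7, for illustration only.
    edited = datetime(2005, 9, 21, 13, 6, 39,
                      tzinfo=timezone(timedelta(hours=-7)))
    print(lastmod_element(edited))
```

Converting to UTC and appending Z sidesteps the question of how to write the local offset.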

Notes:

I guess that if you submit your sitemap to Google, Googlebot should index your site more frequently and thoroughly, but I have no real experience with this. It won't do any harm anyway.

The idea came from the method described at urllist.txt, which I couldn't make work, so I had to go further. I decided to use ?bare=1 to pass Google only the necessary information, without the zwiki page header and footer. First I wanted to use the simple text Sitemap format with STX formatting + DTML, but the output was always nested in a <P> element, which Google did not accept. So I switched to the XML format.

You can see my sitemap here: http://webni.innen.hu/SitemapTxt or see it as it is registered at Google: http://webni.innen.hu/SitemapTxt?bare=1 If you check the source you will notice that the code differs slightly from the one described above. I needed the changes because of my subdomain redirection (it redirects to innen.hu/webni ). It may be an ugly solution, but I had to create a new DTML method named http with the content http:// and call it, to avoid the automatic wrapping of the URL in an anchor element.
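
The wrapping problem described above (output nested in a <P> element, or page header and footer around it) can be checked locally before submitting. This is a rough Python sketch, assuming you have already saved the ?bare=1 response as text or bytes. The function name sitemap_locations is hypothetical, and the check only looks at the root element and its namespace, not at full schema validity.

```python
import xml.etree.ElementTree as ET

# The two sitemap namespaces that appear on this page.
KNOWN_NAMESPACES = (
    "http://www.google.com/schemas/sitemap/0.84",
    "http://www.sitemaps.org/schemas/sitemap/0.9",
)


def sitemap_locations(document):
    """Return the <loc> values of a bare urlset, or raise ValueError."""
    if isinstance(document, str):
        document = document.encode("utf-8")
    try:
        root = ET.fromstring(document)
    except ET.ParseError as exc:
        raise ValueError("not well-formed XML: %s" % exc) from exc
    for ns in KNOWN_NAMESPACES:
        if root.tag == "{%s}urlset" % ns:
            return [loc.text.strip()
                    for loc in root.iter("{%s}loc" % ns) if loc.text]
    # Typical for a response still wrapped in page markup such as <html>.
    raise ValueError("root element is %r, not a sitemap urlset" % root.tag)
```

A response that still carries wiki page markup fails with the second error, which points at a missing ?bare=1 or at the formatting type of the page.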


comments:

Re: [HowToCreateAGoogleSitemap]? --Simon Michael, Wed, 21 Sep 2005 13:06:39 -0700
JózsefJároli wrote:

How to create a sitemap in Google Sitemaps Protocol Format

Nice!

Sitemaps Protocol --Bill Page, Wed, 19 Sep 2007 16:33:37 -0700
Here is another Google Sitemaps generator:

 <?xml version="1.0" encoding="UTF-8"?>
 <urlset xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9
         http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd"
         xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
 <dtml-in Catalog sort=lastEditTime reverse>
 <url>
 <loc><dtml-var URL1>/<dtml-var id></loc>
 <lastmod><dtml-try>
 <dtml-var "lastEditTime.toZone(zwiki_timezone)" fmt="ISO">
 <dtml-except>
 <dtml-try>
 <dtml-var lastEditTime fmt="ISO">
 <dtml-except>
 <dtml-try>
 <dtml-var "bobobase_modification_time.toZone(zwiki_timezone)" fmt="ISO">
 <dtml-except>
 <dtml-var bobobase_modification_time fmt="ISO">
 </dtml-try>
 </dtml-try>
 </dtml-try></lastmod>
 </url>
 </dtml-in>
 </urlset>

I use this one as a Zope DTML method.

Sitemaps Protocol --simon, Thu, 27 Sep 2007 12:36:57 -0700
Here's a slightly more robust version, which works in Plone zwikis and (probably) old zwikis without a catalog. I've installed this in the root folder at joyful.com and wiki.zope.org, so anywikifolder/sitemap.txt should work. I don't know whether Google will find these automatically or only when they are configured in Google Webmaster Tools. If it does find them, I don't know what difference it will make, but it might reduce Googlebot traffic.

If it turns out to be beneficial, we can figure out how to deploy it - perhaps as wikis/basic/sitemap.txt.dtml, installed by setupDtmlMethods.

 <dtml-comment>generate a wiki index in Google Sitemaps format</dtml-comment>
 <?xml version="1.0" encoding="UTF-8"?> 
 <urlset xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
 xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 
 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd" 
 xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"> 
 <dtml-let
  folder="_.this #bah _.hasattr(_.this,'isFolderish') and _.this.isFolderish() and _.this or aq_parent"
  wikipage="folder[folder.objectIds(spec='ZWiki Page')[0]]"
  wikiurl="wikipage.wikiUrl()">
 <dtml-in "wikipage.pages(sort_on='lastEditTime',sort_order='reverse')">
 <url> 
 <loc><dtml-var wikiurl>/<dtml-var id></loc> 
 <lastmod><dtml-try> 
 <dtml-var "lastEditTime.toZone(zwiki_timezone)" fmt="ISO"> 
 <dtml-except> 
 <dtml-try> 
 <dtml-var lastEditTime fmt="ISO"> 
 <dtml-except> 
 <dtml-try> 
 <dtml-var "bobobase_modification_time.toZone(zwiki_timezone)" fmt="ISO"> 
 <dtml-except> 
 <dtml-var bobobase_modification_time fmt="ISO"> 
 </dtml-try> 
 </dtml-try> 
 </dtml-try></lastmod> 
 </url> 
 </dtml-in> 
 </dtml-let>
 </urlset>

Sitemaps Protocol --simon, Thu, 27 Sep 2007 12:43:04 -0700
PS: I imagine this will not list pages which you don't have access to view, but that needs testing.

Sitemaps Protocol --simon, Thu, 27 Sep 2007 12:48:38 -0700
PPS: actually, I will probably use sitemap.xml, not .txt.

Sitemaps Protocol --JózsefJároli, Mon, 26 Nov 2007 04:39:12 -0800
With the fmt="ISO" parameter, Google gave me a parsing error upon submitting the sitemap (presumably the date/time was not in the required format). I have used fmt="HTML4" instead, and there have been no complaints from Webmaster Tools so far.
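
A likely explanation, stated here as an assumption rather than a verified fact: Zope's fmt="ISO" writes a space between date and time and no timezone, while fmt="HTML4" writes the T separator and a trailing Z, which is what W3C Datetime requires. The Python sketch below checks only the shape of a lastmod string (not whether month or hour values are in range). The name is_w3c_datetime and the sample strings are hypothetical.

```python
import re

# Shapes allowed by the W3C Datetime note: year, year-month, full date,
# or full date plus time with a mandatory timezone designator.
W3C_DATETIME = re.compile(
    r"\d{4}(-\d{2}(-\d{2}"
    r"(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:\d{2}))?)?)?"
)


def is_w3c_datetime(value):
    """Return True if value has the shape of a W3C Datetime string."""
    return W3C_DATETIME.fullmatch(value) is not None


if __name__ == "__main__":
    # Assumed sample outputs of the two DTML formats, for illustration.
    for sample in ("2007-11-26 04:39:12", "2007-11-26T12:39:12Z"):
        print(sample, is_w3c_datetime(sample))
```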