Submitted by : 127.0.0.1 at: 2004-10-15T22:28:48+00:00 (13 years ago)

We have a large number of ZWiki pages written in iso-8859-1, but since ZWiki 0.27 the encoding of these pages is proudly announced as UTF-8... :-( The only answer I have found so far is to convert each page manually... :-( Is there anybody with enough ZWiki knowledge to kindly write a method that iterates over all the pages of a ZWiki and, if it heuristically finds some incorrect UTF-8 characters, converts the page from the old encoding to UTF-8?

It would help everyone with old international pages...

Thank you in advance !

here's what I used --simon, Fri, 15 Oct 2004 23:01:05 -0700 reply

Call this Python script in the context of your wiki folder:

# convert iso-8859-1-encoded wiki pages to system default encoding

import sys

RESPONSE =  container.REQUEST.RESPONSE
#RESPONSE.setHeader('Content-Type', 'text/html')
page = context.objectValues(spec='ZWiki Page')[0]
converted = 0
for p in page.pageObjects():
    try:
        # decoding with the default (ascii) codec fails on any non-ascii byte
        t = unicode(p.text())
    except UnicodeDecodeError:
        converted += 1
        log = "converting %s to %s\n" % (p.title_or_id(),'default encoding')#sys.getdefaultencoding())
        #RESPONSE.write(log) #stops
        print log
        # .encode() with no argument uses the system default encoding
        p.edit(text=unicode(p.text(),'iso8859-1').encode(), log=log)
print 'fixed encoding of %d pages\n' % (converted)
return printed

property change --simon, Fri, 15 Oct 2004 23:01:23 -0700 reply

Status: open => closed

property change --simon, Fri, 15 Oct 2004 23:03:00 -0700 reply

Severity: serious => normal

Small mistake? --Wed, 22 Jun 2005 07:42:30 -0700 reply

This script works here only with the following line:
p.edit(text=unicode(p.text(),'iso8859-1').encode('utf-8'), log=log)

Otherwise, .encode() seems to use the default encoding, which is ASCII here, and the conversion fails on accented characters.
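
To illustrate the point above: in Python 2, unicode.encode() with no argument uses sys.getdefaultencoding(), which is normally 'ascii'. The minimal sketch below names both codecs explicitly, so it behaves the same on Python 2 and 3. The sample string and variable names are assumptions made up for this illustration, not part of the script.

```python
# Sample latin-1 bytes (an assumption for illustration): "café" in iso8859-1.
latin1_bytes = u"caf\u00e9".encode("iso8859-1")

# Step 1: decode from the old encoding into a unicode string.
text = latin1_bytes.decode("iso8859-1")

# Step 2a: encoding to ASCII (the usual Python 2 default) fails on the accent.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Step 2b: naming the target encoding explicitly works.
utf8_bytes = text.encode("utf-8")
```

Here ascii_ok ends up False, while utf8_bytes holds the re-encoded text.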

small change --Mon, 08 Aug 2005 01:12:51 -0700 reply

Hi, I needed to change
p.edit(text=unicode(p.text(),'iso8859-1').encode(), log=log)
to
p.edit(text=unicode(p.text(),'iso8859-1').encode('utf-8'), log=log)

Watch out for ReStructuredText! --kaleissin, Mon, 15 May 2006 13:09:30 -0700 reply

If you have a wiki page using ReStructuredText in the wiki folder, you must set rest-output-encoding and rest-input-encoding in zope.conf to the original encoding prior to conversion, or you'll get a UnicodeError.

Prettified version that also converts pagenames --kaleissin, Mon, 15 May 2006 13:14:54 -0700 reply

The first version doesn't convert the pagenames/ids:

FROM     = 'iso8859-1'
TO       = 'UTF-8'
REQUEST  = container.REQUEST
RESPONSE = REQUEST.RESPONSE
page = context.objectValues(spec='ZWiki Page')[0]
converted=0
for p in page.pageObjects():
    try:
        # any non-ascii characters ?
        t = unicode(p.text())
        t = unicode(p.pageName())
    except UnicodeDecodeError:
        # yes - convert it
        converted += 1
        log = "Converting %s from %s to %s\n" % (p.title_or_id(), FROM, TO)
        print log
        p.edit(
            text=unicode(p.text(), FROM).encode(TO),
            title=unicode(p.pageName(), FROM).encode(TO),
            log=log,
            REQUEST=REQUEST, # needed for Zwiki < ~0.46, probably
            )
# make sure hierarchy cache is current
page.updatecontents()
# would like to print the log here, but edit & rename do a redirect;
# not sure how to undo that.. we'll redirect to the front page
#print 'Fixed encoding of %d pages\n' % (converted)
#return printed
RESPONSE.redirect(REQUEST['URL1'])
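
Note that both scripts in this thread treat any page containing a non-ASCII byte as needing conversion, so running them a second time would re-convert pages that are already UTF-8. A slightly different check, closer to the heuristic asked for in the original report, is to test for valid UTF-8 first. The following is only a sketch under that assumption: to_utf8 is a hypothetical helper, not part of Zwiki, and it works on plain byte strings outside Zope. It can still be fooled by latin-1 text that happens to be valid UTF-8, which is rare in practice.

```python
def to_utf8(raw, source_encoding="iso8859-1"):
    """Return (utf8_bytes, changed) for a byte string of uncertain encoding.

    Hypothetical helper: bytes that already decode as UTF-8 are left alone;
    anything else is assumed to be in source_encoding and is re-encoded.
    """
    try:
        raw.decode("utf-8")
        return raw, False
    except UnicodeDecodeError:
        return raw.decode(source_encoding).encode("utf-8"), True


# Three sample inputs: pure ASCII, already UTF-8, and latin-1.
samples = [
    b"plain ascii",
    u"d\u00e9j\u00e0".encode("utf-8"),
    u"d\u00e9j\u00e0".encode("iso8859-1"),
]
results = [to_utf8(s) for s in samples]
```

Only the third sample is reported as changed. Inside a conversion script, the same test could replace the unicode(p.text()) check so that a repeated run is harmless.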

Prettified version that also converts pagenames --simon, Mon, 15 May 2006 13:42:39 -0700 reply

We think this one works reasonably well, but it may forget parents for some reason - beware, keep a backup, any clues welcome.

ok --simon, Mon, 15 May 2006 14:04:00 -0700 reply

It should be fine now.

note --simon, Wed, 18 Apr 2007 02:26:43 +0000 reply

Note that this was built into Zwiki a while back. If you visit SOMEPAGE/upgradeAll as manager, it will hopefully resolve all these problems for you. You can also try to fix just one page by visiting PROBLEMPAGE/fixPageEncoding, optionally adding ?FROM=oldencoding&TO=newencoding arguments. See Admin.py for more.
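
As a rough sketch of how the single-page URL described above is put together: the fixPageEncoding method name and the FROM and TO arguments come from the note, while the host, wiki path, page name and the fix_encoding_url helper are assumptions invented for the example. It uses the Python 3 standard library rather than the Python 2 of the scripts earlier in this thread.

```python
from urllib.parse import urlencode


def fix_encoding_url(page_url, old="iso8859-1", new="utf-8"):
    """Build the fixPageEncoding URL for one page (hypothetical helper)."""
    query = urlencode([("FROM", old), ("TO", new)])
    return "%s/fixPageEncoding?%s" % (page_url.rstrip("/"), query)


# Hypothetical local wiki and page name.
url = fix_encoding_url("http://localhost:8080/wiki/ProblemPage")
```

The resulting URL has to be visited while logged in as manager.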