Submitted by : simon at: 2007-04-18T03:45:36+00:00 (10 years ago)

After moving wiki.zope.org to a new server, I think all pages with non-ascii characters are broken (eg, http://wiki.zope.org/zope3/GermanDictionary). They work on the old, zope 2.9.4 server. I think zope 2.10 has a new tal implementation. Here is zwiki's content.pt:

<div tal:replace="structure options/body">

and zope's talinterpreter.py:

def do_insertStructure_tal(self, (expr, repldict, block)):
    structure = self.engine.evaluateStructure(expr)
    if structure is None:
        return
    if structure is self.Default:
        self.interpret(block)
        return
    if isinstance(structure, I18nMessageTypes):
        text = self.translate(structure)
    else:
        text = unicode(structure) # <- ERROR

This is quite a problem. We need that structure keyword so that we can render the html generated by markup rules, or inlined by users. And we need non-ascii characters in our content. How is that alternate self.translate path activated, is it useful to us ? What else could we do ?
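For anyone trying to reproduce the failure outside Zope, here is a rough sketch in Python 3 syntax. It is not the Zope code. Python 2's unicode(s) on a byte string decodes with the default ASCII codec, which is imitated here with bytes.decode("ascii"). The sample page body is made up.

```python
# Sketch (Python 3): imitate Python 2's unicode(byte_string), which decodes
# with the default ASCII codec. The sample content below is invented.
body = "<p>Wörterbuch</p>".encode("utf-8")  # page body stored as UTF-8 bytes

error = None
try:
    text = body.decode("ascii")  # roughly what unicode(structure) does
except UnicodeDecodeError as err:
    error = err                  # non-ASCII bytes cannot be decoded as ASCII
    text = None

# Decoding with the encoding the bytes were actually written in succeeds.
fixed = body.decode("utf-8")
```

Pure-ASCII bodies pass through the same call without trouble, which would explain why only some pages break.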

Can we (at least temporarily) move that server to 2.9? --betabug, Wed, 18 Apr 2007 07:38:03 +0000 reply

Till we can find a solution? As you know we've found another bug with 2.10 ZPTs and I don't think the Zope bug correction mechanism moves all too fast.

Another observation: the /text and /editform methods still work, so we can at least get at the content. Looking at it I notice that the German umlauts are seriously messed up, like they went through a confused iso-to-utf8 conversion a couple of times. What once was an ü has become ü. It could be more than just the rendering being messed up, I believe.
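To illustrate what a "confused iso-to-utf8 conversion" looks like, here is a sketch in Python 3 syntax of a single round of wrong decoding. Latin-1 as the wrong codec is an assumption, and the damage on the actual site may have taken a different path or more than one round.

```python
# Sketch (Python 3): one round of mis-decoding turns "ü" into two characters.
original = "ü"
utf8_bytes = original.encode("utf-8")      # the correct UTF-8 bytes: C3 BC
misread = utf8_bytes.decode("latin-1")     # read as Latin-1 by mistake
stored_again = misread.encode("utf-8")     # saved again as UTF-8: C3 83 C2 BC

# Walking the same steps backwards recovers the original character,
# as long as nothing else has touched the data in between.
repaired = stored_again.decode("utf-8").encode("latin-1").decode("utf-8")
```

Each extra round doubles up the junk again, so heavily damaged text needs the reverse step applied once per round.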

... --EmmaLaurijssens, Wed, 18 Apr 2007 08:29:16 +0000 reply

ü is C383,C283,C3B8,C2BC, which renders to Ãø¼. This in turn is C383,F8BC, Ã☐, which I believe could turn out to be C3BC, ü. But I may be wrong there.

Can we (at least temporarily) move that server to 2.9? --Simon Michael, Wed, 18 Apr 2007 15:31:56 +0000 reply

I suppose you're right, downgrading is the best next step. I'll do that.

I saved the text and the unix file command told me it was utf-8, for what it's worth.

Can we (at least temporarily) move that server to 2.9? --simon, Wed, 18 Apr 2007 17:20:59 +0000 reply

I downgraded wiki.zope.org to 2.9.6. Unfortunately it didn't help! What is going on ?

Can we (at least temporarily) move that server to 2.9? --Frank Laurijssens, Wed, 18 Apr 2007 17:49:21 +0000 reply

I think the page got 'upgraded' twice and of course there's no downgrade path.

Can we (at least temporarily) move that server to 2.9? --simon, Wed, 18 Apr 2007 18:50:57 +0000 reply

I don't think so, the page text is still identical to the old one.

2.9.6 actually did worse (the editform also broke), so I'm back on 2.10.3 now. The old server where it works is actually zope 2.9.4.

I've committed a workaround in addSkinTo which seems to get those pages rendering. There are other places which also use structure, such as the contents and recent changes, and those still can break.

Note that http://wiki.zope.org/zope3/GermanDictionary can be utf-8 decoded, but still displays junk characters in a utf-8 aware web browser; I think that's just bad data that needs fixing. The new server displays it exactly as the old one does.

The question is: what damaged those contents --betabug, Thu, 19 Apr 2007 07:20:03 +0000 reply

I've seen damaged high-ascii data in other places too (e.g. the Italian translation of the z3 dev book). I think we will have to double check any methods that attempt to convert content encoding. Maybe something went wrong only in this particular site, but we'd better make sure.

Uh-oh --EmmaLaurijssens, Fri, 27 Apr 2007 18:22:09 +0000 reply

I don't want to submit a new issue right away because it is likely to be a duplicate, but:

EmmaLaurijssens

#1262

(just two examples)

logo breakage --Simon Michael, Sat, 28 Apr 2007 02:43:43 +0000 reply

I noticed the zope 2.10 upgrade broke leo.zwiki.org's skin. This was due to the structure keyword in tal:replace="structure here/site_logo|default" in that site's old customized wikipage template. This looks like #1330 again, but I don't understand why it broke; the site_logo property contains:

<img src="logo" border="0" alt="home" height=64 width=64 style="margin-left:8px;" />

This Zope 2.10 bug/feature is making Zwiki "just break" :/

Uh-oh --simon, Sat, 28 Apr 2007 02:48:44 +0000 reply

I fixed those and a couple more pages on zwiki.org (by reencoding the content).

fixed in darcs --simon, Sun, 29 Apr 2007 15:28:39 -0700 reply

Name: '#1330 non-ascii pages give UnicodeDecodeError with zope 2.10 ?' => '#1330 non-ascii content causing UnicodeDecodeErrors with zope 2.10'
Status: open => closed

After much study and some helpful discussions on irc, I've committed this:

* #1330: a better fix for these unicode errors. Zope 2.10 expects TAL data
  to be unicode, older zopes do not. This can lead to many obscure unicode
  errors depending on your system locale, wiki content, cookies, phase of
  the moon etc. This fix aims to make all the standard templates robust
  against this. Wikis with old customized templates will still be vulnerable
  to this problem after upgrading to zope 2.10, until those templates are
  updated.

What it boils down to: there is a new talsafe() method which converts strings to unicode if zope is 2.10 or greater; and anywhere the structure keyword is used in a page template, we need to pass the data through talsafe first to guard against these errors.

This is a bit of a pain. All customized templates will need this change when you upgrade from zope 2.9. You might not see the problem right away, it depends on your page content, username cookies, and default system encoding. The only alternative I could see was to monkey-patch zope 2.10's TAL, which is unappealing.
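The talsafe() idea described above can be sketched roughly like this. This is not the committed Zwiki code, and it is written in Python 3 syntax for readability. The ZOPE_VERSION constant, the zope_version parameter, the UTF-8 default and the "replace" error handling are all assumptions made for the illustration.

```python
# Hypothetical sketch of a talsafe()-style helper (Python 3 syntax).
# ZOPE_VERSION and the UTF-8 default are assumptions, not Zwiki's real values.
ZOPE_VERSION = (2, 10)


def talsafe(value, encoding="utf-8", zope_version=ZOPE_VERSION):
    """Return a value that is safe to hand to TAL's structure keyword."""
    if zope_version < (2, 10):
        return value  # older Zopes take byte strings as they are
    if isinstance(value, bytes):
        # errors="replace" keeps the page rendering even if some bytes are bad
        return value.decode(encoding, errors="replace")
    return value  # already text, nothing to do
```

Choosing "replace" over a strict decode trades silent substitution of bad bytes for a page that still renders. A strict decode would surface damaged content sooner.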

final word for today --simon, Sun, 29 Apr 2007 15:43:31 -0700 reply

Search for unicode in zope CHANGES.txt (should have tried this sooner!) for interesting information such as:

- the ZopePageTemplate implementation now uses unicode
  internally.  Non-unicode instances are migrated on-the-fly to
  unicode. However this will work only properly for ZPT
  instances formerly encoded as utf-8 or ISO-8859-15. For other
  encodings you might set the environment variable
  ZPT_REFERRED_ENCODING to insert your preferred encoding in
  front of utf-8 and ISO-8859-15 within the encoding sniffer
  code. In addition there is a new 'output_encodings' property
  that controls the conversion from/to unicode for WebDAV/FTP
  operations.

I don't see how this "sniffer" kicks in for our case, but ok.. and:

- the ZPT implementation has now a configurable option in order
  how to deal with UnicodeDecodeErrors. A custom
  UnicodeEncodingConflictResolver can be configured through ZCML
  (see Products/PageTemplates/(configure.zcml,
  unicodeconflictresolver.py, interfaces.py)

and:

- Collector #1490: Added a new zope.conf option to control the
  character set used to encode unicode data that reaches
  ZPublisher without any specified encoding.

I think this is 'default-zpublisher-encoding' ?
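As far as I can tell from the first excerpt, the "sniffer" amounts to trying a list of encodings in order. Here is a minimal sketch of that idea in Python 3 syntax. It is not Zope's implementation, and sniff_decode and its preferred parameter are made-up names.

```python
# Sketch (Python 3) of an ordered encoding sniffer; names are hypothetical.
def sniff_decode(data, preferred=None):
    """Decode bytes with the first candidate encoding that accepts them."""
    candidates = ["utf-8", "iso-8859-15"]
    if preferred:
        candidates.insert(0, preferred)  # preferred encoding is tried first
    for encoding in candidates:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # this guess does not fit, try the next one
    raise ValueError("no candidate encoding could decode the data")


text, used = sniff_decode("Grüße".encode("utf-8"))
legacy_text, legacy_used = sniff_decode(b"Gr\xfc\xdfe")  # not valid UTF-8
```

Note that a single-byte codec such as ISO-8859-15 accepts almost any input, so a wrong guess at that stage passes silently instead of raising an error.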

reopening, as we saw on BugDay that it still needs some work --betabug, Mon, 07 May 2007 00:29:01 -0700 reply

Status: closed => open

It was decided that the content storage in Zwiki should become clean Unicode/UTF-8. Ensure that all incoming strings pass through unicode(input, 'utf-8'). That way decoding happens only once, on the way in, and if someone ever decides to set up a wiki with a different output encoding, that would still be possible with some rewriting work.
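A rough sketch of that rule, in Python 3 syntax where bytes.decode plays the role of Python 2's unicode(input, 'utf-8'). The names to_text, render and STORAGE_ENCODING are made up for the illustration and are not Zwiki's.

```python
# Sketch (Python 3): decode once on the way in, encode once on the way out.
STORAGE_ENCODING = "utf-8"  # assumed encoding of incoming byte strings


def to_text(value, encoding=STORAGE_ENCODING):
    """Convert incoming data to text exactly once, at the boundary."""
    if isinstance(value, bytes):
        return value.decode(encoding)  # strict: bad input fails here, early
    return value  # already text, so never decode a second time


def render(text, output_encoding="utf-8"):
    """Encode only when the page leaves the application."""
    return text.encode(output_encoding)


stored = to_text("Grüße".encode("utf-8"))
page_bytes = render(stored, output_encoding="iso-8859-15")
```

Because everything between the two functions is text, a wiki that wants a different output encoding only has to change the argument passed to the output step.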

See also #1339

immediate issue solved --simon, Mon, 07 May 2007 11:27:28 -0700 reply

Status: open => closed

See also #1345.

related ? --simon, Thu, 23 Aug 2007 10:32:12 -0700 reply

http://svn.zope.org/Zope/trunk/lib/python/Products/PageTemplates/Expressions.py?rev=78767&r1=71802&r2=78767

related ? --EmmaLaurijssens, Thu, 23 Aug 2007 11:00:41 -0700 reply

Could you paste an excerpt, I get a 403 error on that link...

update --simon, Thu, 01 May 2008 20:48:44 -0700 reply

#1376 seems to show that this can happen even without the structure keyword, so perhaps there are more of these lurking. Note the possible global solutions mentioned above in "final word for today".