Edit detail for #1330 non-ascii content causing UnicodeDecodeErrors with zope 2.10 revision 1 of 5

Editor: simon
Time: 2007/05/07 11:27:28 GMT-7
Note: immediate issue solved

After moving wiki.zope.org to a new server, I think all pages with
non-ascii characters are broken (eg,
http://wiki.zope.org/zope3/GermanDictionary). They work on the old, zope
2.9.4 server. I think zope 2.10 has a new tal implementation. Here is
zwiki's content.pt::

      <div tal:replace="structure options/body">

and zope's talinterpreter.py::

    def do_insertStructure_tal(self, (expr, repldict, block)):
        structure = self.engine.evaluateStructure(expr)
        if structure is None:
            return
        if structure is self.Default:
            self.interpret(block)
            return
        if isinstance(structure, I18nMessageTypes):
            text = self.translate(structure)
        else:
            text = unicode(structure) # <- ERROR

This is quite a problem. We need that structure keyword so that we can
render the html generated by markup rules, or inlined by users. And we
need non-ascii characters in our content. How is that alternate
self.translate path activated, is it useful to us ? What else could we do ?
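For what it's worth, the failing line can be reproduced outside Zope. In Python 2, ``unicode(s)`` on a byte string decodes with the default encoding, which is normally ASCII, so any utf-8 page body containing a non-ascii byte raises UnicodeDecodeError. A minimal sketch in Python 3 syntax, where an explicit ``.decode("ascii")`` stands in for that implicit step (the function name and sample body are made up for illustration, they are not Zope or Zwiki code):

```python
# Stand-in for Python 2's unicode(structure): byte strings are decoded as ASCII.
def implicit_unicode(structure):
    if isinstance(structure, bytes):
        return structure.decode("ascii")
    return str(structure)


# Hypothetical rendered page body, stored as utf-8 bytes.
body = "<p>Wörterbuch</p>".encode("utf-8")

try:
    implicit_unicode(body)
    failed = False
except UnicodeDecodeError as err:
    failed = True
    print("decode failed:", err.reason)

# Decoding explicitly, with the encoding the content was stored in, works.
text = body.decode("utf-8")
print(text)
```

Pure-ascii bodies pass through the implicit step unharmed, which would explain why only some pages break.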


From betabug Wed Apr 18 07:38:03 +0000 2007
From: betabug
Date: Wed, 18 Apr 2007 07:38:03 +0000
Subject: Can we (at least temporarily) move that server to 2.9?
Message-ID: <20070418073803+0000@zwiki.org>

Till we can find a solution? As you know we've found another bug with 2.10 ZPT's and I don't think the Zope bug correction mechanism moves all too fast.

Another observation: the /text and /editform methods still work, so we can at least get at the content. Looking at it I notice that the German umlauts are seriously messed up, like they went through a confused iso-to-utf8 conversion a couple of times. What once was an ``ü`` has become ``ü``. I believe it could be more than just the rendering being messed up.

From FrankLaurijssens Wed Apr 18 08:29:16 +0000 2007
From: FrankLaurijssens
Date: Wed, 18 Apr 2007 08:29:16 +0000
Subject: 
Message-ID: <20070418082916+0000@zwiki.org>

``ü`` is C383,C283,C3B8,C2BC, which renders to ``Ãø¼``. This in turn is C383,F8BC, ``Ã☐``, which I believe could turn out to be C3BC, ``ü``. But I may be wrong there.
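The byte sequences can be checked with a short script. This is only a sketch, and it assumes the damage came from utf-8 bytes being re-read as Latin-1 (ISO-8859-1) and then saved as utf-8 again; whether that is what actually happened on wiki.zope.org is not established here. The helper names are made up for illustration:

```python
def misdecode_once(text):
    """Encode as utf-8, then wrongly read the bytes back as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


def utf8_hex(text):
    """Hex dump of the utf-8 encoding of text."""
    return text.encode("utf-8").hex().upper()


original = "ü"
once = misdecode_once(original)
twice = misdecode_once(once)

for label, value in [("clean", original), ("one round", once), ("two rounds", twice)]:
    print(label, utf8_hex(value))
```

Under that assumption, two rounds give C383 C283 C382 C2BC, which is close to the bytes quoted above but differs in the third pair, so something else may also be involved.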

From SimonMichael Wed Apr 18 15:31:56 +0000 2007
From: Simon Michael
Date: Wed, 18 Apr 2007 15:31:56 +0000
Subject: Can we (at least temporarily) move that server to 2.9?
Message-ID: <1176910328.10243.6.camel@dynabook.joyful.com>
In-Reply-To: <20070418073803+0000@zwiki.org>

I suppose you're right, downgrading is the best next step. I'll do that.

I saved the text and the Unix ``file`` command told me it was utf-8, for
what it's worth.


From simon Wed Apr 18 17:20:59 +0000 2007
From: simon
Date: Wed, 18 Apr 2007 17:20:59 +0000
Subject: Can we (at least temporarily) move that server to 2.9?
Message-ID: <20070418172059+0000@zwiki.org>
In-Reply-To: <1176910328.10243.6.camel@dynabook.joyful.com>

I downgraded wiki.zope.org to 2.9.6. Unfortunately it didn't help! What is going on ?

From FrankLaurijssens Wed Apr 18 17:49:21 +0000 2007
From: Frank Laurijssens
Date: Wed, 18 Apr 2007 17:49:21 +0000
Subject: Can we (at least temporarily) move that server to 2.9?
Message-ID: <000101c781e2$0cc85a6c$030913ac@thuis.laurijssens.nl>

I think the page got 'upgraded' twice and of course there's no downgrade path.

From simon Wed Apr 18 18:50:57 +0000 2007
From: simon
Date: Wed, 18 Apr 2007 18:50:57 +0000
Subject: Can we (at least temporarily) move that server to 2.9?
Message-ID: <20070418185057+0000@zwiki.org>
In-Reply-To: <000101c781e2$0cc85a6c$030913ac@thuis.laurijssens.nl>

I don't think so, the page text is still identical to the old one. 

2.9.6 actually did worse (the editform also broke), so I'm back on 2.10.3 now. The old server where it works is actually zope 2.9.4. 

I've committed a workaround in addSkinTo which seems to get those pages rendering. There are other places which also use structure, such as the contents and recent changes, and those still can break.

Note that http://wiki.zope.org/zope3/GermanDictionary can be utf-8 decoded, but still displays junk characters in a utf-8 aware web browser; I think that's just bad data that needs fixing. The new server displays it exactly as the old one does.
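Text that has been mis-decoded and re-encoded is still valid utf-8, which is why it decodes cleanly and yet shows junk. A rough repair heuristic, assuming the junk came from utf-8 bytes repeatedly read as Latin-1 (an assumption, not something confirmed for this page), is to reverse that step until it no longer applies. The function name and sample word are hypothetical:

```python
def undo_misdecoding(text, max_rounds=5):
    """Reverse 'utf-8 bytes read as Latin-1' until it stops applying."""
    for _ in range(max_rounds):
        try:
            candidate = text.encode("latin-1").decode("utf-8")
        except UnicodeError:
            # Not representable as Latin-1, or not valid utf-8: stop here.
            break
        if candidate == text:
            # Pure ascii round-trips unchanged; nothing left to undo.
            break
        text = candidate
    return text


# Build a sample that has been damaged twice.
damaged = "Grüße"
for _ in range(2):
    damaged = damaged.encode("utf-8").decode("latin-1")

# The damaged text still round-trips through utf-8 without error.
assert damaged.encode("utf-8").decode("utf-8") == damaged

repaired = undo_misdecoding(damaged)
print(repaired)
```

This is a heuristic only: text that legitimately contains such character sequences would be altered too, so any real cleanup should be reviewed page by page.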

From betabug Thu Apr 19 07:20:03 +0000 2007
From: betabug
Date: Thu, 19 Apr 2007 07:20:03 +0000
Subject: The question is: what damaged those contents
Message-ID: <20070419072003+0000@zwiki.org>

I've seen damaged high-ascii data in other places too (e.g. the Italian translation of the z3 dev book). I think we will have to double check any methods that attempt to convert content encoding. Maybe something went wrong only in this particular site, but we'd better make sure.

From FrankLaurijssens Fri Apr 27 18:22:09 +0000 2007
From: FrankLaurijssens
Date: Fri, 27 Apr 2007 18:22:09 +0000
Subject: Uh-oh
Message-ID: <20070427182209+0000@zwiki.org>

I don't want to submit a new issue right away because it is likely to be a duplicate, but: 

FrankLaurijssens

#1262

(just two examples)

From SimonMichael Sat Apr 28 02:43:43 +0000 2007
From: Simon Michael
Date: Sat, 28 Apr 2007 02:43:43 +0000
Subject: logo breakage
Message-ID: <4632B4DE.4080904@joyful.com>

I noticed the zope 2.10 upgrade broke leo.zwiki.org's skin. This was due
to the structure keyword in ``tal:replace="structure here/site_logo|default"``
in that site's old customized wikipage template. This looks like #1330 again,
but I don't understand why it broke; the site_logo property contains::

 <img src="logo" border="0" alt="home" height=64 width=64 style="margin-left:8px;" />

This Zope 2.10 bug/feature is making Zwiki "just break" :/


From simon Sat Apr 28 02:48:44 +0000 2007
From: simon
Date: Sat, 28 Apr 2007 02:48:44 +0000
Subject: Uh-oh
Message-ID: <20070428024844+0000@zwiki.org>
In-Reply-To: <20070427182209+0000@zwiki.org>

I fixed those and a couple more pages on zwiki.org (by reencoding the content).

From simon Sun Apr 29 15:28:39 -0700 2007
From: simon
Date: Sun, 29 Apr 2007 15:28:39 -0700
Subject: fixed in darcs
Message-ID: <20070429152839-0700@zwiki.org>

Name: '#1330 non-ascii pages give UnicodeDecodeError with zope 2.10 ?' => '#1330 non-ascii content causing UnicodeDecodeErrors with zope 2.10' 
Status: open => closed 

After much study and some helpful discussions on irc, I've committed this::

  * #1330: a better fix for these unicode errors. Zope 2.10 expects TAL data
  to be unicode, older zopes do not. This can lead to many obscure unicode
  errors depending on your system locale, wiki content, cookies, phase of
  the moon etc. This fix aims to make all the standard templates robust
  against this. Wikis with old customized templates will still be vulnerable
  to this problem after upgrading to zope 2.10, until those templates are
  updated.

What it boils down to: there is a new talsafe() method which converts strings to unicode if zope is 2.10 or greater; and anywhere the structure keyword is used in a page template, we need to pass the data through talsafe first to guard against these errors. 

This is a bit of a pain. All customized templates will need this change when you upgrade from zope 2.9. You might not see the problem right away, it depends on your page content, username cookies, and default system encoding. The only alternative I could see was to monkey-patch zope 2.10's TAL, which is unappealing. 
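The talsafe() idea described above can be sketched roughly as follows. This is not the committed Zwiki code: the signature, the explicit ``zope_version`` tuple, the utf-8 default and the ``errors="replace"`` choice are all assumptions made for illustration, and Python 3's ``bytes``/``str`` stand in for Python 2's ``str``/``unicode``:

```python
def talsafe(value, zope_version, encoding="utf-8"):
    """Return text for TAL on zope >= 2.10, otherwise the value unchanged.

    Hypothetical sketch: zope_version is a tuple such as (2, 10, 3).
    """
    if zope_version >= (2, 10) and isinstance(value, bytes):
        # Decode explicitly so TAL never falls back to an ASCII decode.
        return value.decode(encoding, errors="replace")
    return value


body = "<p>Wörterbuch</p>".encode("utf-8")

print(repr(talsafe(body, (2, 10, 3))))  # decoded to text
print(repr(talsafe(body, (2, 9, 4))))   # left as bytes for older zopes
```

Using ``errors="replace"`` trades silent substitution of undecodable bytes for a page that always renders; a stricter variant could raise instead.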

From simon Sun Apr 29 15:43:31 -0700 2007
From: simon
Date: Sun, 29 Apr 2007 15:43:31 -0700
Subject: final word for today
Message-ID: <20070429154331-0700@zwiki.org>

Search for unicode in zope CHANGES.txt (should have tried this sooner!) for interesting information such as::

      - the ZopePageTemplate implementation now uses unicode
        internally.  Non-unicode instances are migrated on-the-fly to
        unicode. However this will work only properly for ZPT
        instances formerly encoded as utf-8 or ISO-8859-15. For other
        encodings you might set the environment variable
        ZPT_REFERRED_ENCODING to insert your preferred encoding in
        front of utf-8 and ISO-8859-15 within the encoding sniffer
        code. In addition there is a new 'output_encodings' property
        that controls the conversion from/to unicode for WebDAV/FTP
        operations.

I don't see how this "sniffer" kicks in for our case, but ok.. and::

      - the ZPT implementation has now a configurable option in order
        how to deal with UnicodeDecodeErrors. A custom
        UnicodeEncodingConflictResolver can be configured through ZCML
        (see Products/PageTemplates/(configure.zcml,
        unicodeconflictresolver.py, interfaces.py)

and::

      - Collector #1490: Added a new zope.conf option to control the
        character set used to encode unicode data that reaches
        ZPublisher without any specified encoding.

I think this is 'default-zpublisher-encoding' ?

From betabug Mon May 7 00:29:01 -0700 2007
From: betabug
Date: Mon, 07 May 2007 00:29:01 -0700
Subject: reopening, as we saw on BugDay that it still needs some work
Message-ID: <20070507002901-0700@zwiki.org>

Status: closed => open 

It was decided that the content storage in Zwiki should become clean Unicode/UTF-8. Ensure that all strings entering pass through ``unicode(input, 'utf-8')``. That way decoding happens only once, and if (ever) someone decides to set up a wiki with a different encoding on output, that could still be possible with some rewriting work.
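A minimal sketch of that decode-once, encode-once rule, written in Python 3 syntax where ``bytes.decode`` plays the role of ``unicode(input, 'utf-8')``. The function names and the sample data are hypothetical, not part of Zwiki:

```python
STORAGE_ENCODING = "utf-8"  # assumed encoding of incoming byte strings


def to_text(value, encoding=STORAGE_ENCODING):
    """Decode incoming bytes exactly once; pass text through unchanged."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def to_output(text, encoding="utf-8"):
    """Encode once, at the point where the page leaves the wiki."""
    return text.encode(encoding)


stored = to_text(b"Gr\xc3\xbc\xc3\x9fe")  # e.g. bytes from a form post
same = to_text(stored)                    # already text: no second decode
latin9 = to_output(stored, "iso-8859-15")
print(stored, latin9)
```

Because everything between the two boundaries is text, a second pass through ``to_text`` is harmless, which is the property that prevents the repeated-conversion damage discussed earlier in this issue.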

See also #1339

From simon Mon May 7 11:27:28 -0700 2007
From: simon
Date: Mon, 07 May 2007 11:27:28 -0700
Subject: immediate issue solved
Message-ID: <20070507112728-0700@zwiki.org>

Status: open => closed 

See also #1345.

Submitted by: simon at: 2007-04-18T03:45:36+00:00