Submitted by: 192.168.1.21 at: 2004-01-28T17:57:25+00:00

When you type a word that begins with an uppercase letter and contains a UTF-8 international character, it is wrongly recognized as a WikiWord. For instance:

Alguém

This happened to me using ZWiki 0.27rc1, Plone 2.0, and Zope 2.7, running ZWiki inside Plone.
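To make the mechanism concrete (a sketch; the byte ranges below are modelled on ZWiki's Latin-1 defaults and are an assumption, not necessarily the exact shipped pattern): in UTF-8, é is the two bytes 0xC3 0xA9, and 0xC3 falls in the Latin-1 uppercase range, so a byte-oriented regexp sees a capital letter in the middle of the word:

    # Why a byte-oriented WikiWord pattern misfires on UTF-8 text.
    # The U/L byte ranges are an assumption modelled on Latin-1 defaults.
    import re
    U = 'A-Z\xc0-\xdf'      # ASCII plus Latin-1 uppercase bytes
    L = 'a-z\xe0-\xff'      # ASCII plus Latin-1 lowercase bytes
    wikiname = re.compile(r'\b[%s][%s]+[%s][%s%s]*[0-9]*' % (U, L, U, U, L))
    word = 'Algu\xc3\xa9m'  # 'Alguém' encoded as UTF-8 bytes
    print repr(wikiname.search(word).group())  # -> 'Algu\xc3', a bogus WikiWord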

However, as I needed a quick solution, I decided to fix it myself!

So here is my fix to Regexps.py (line 135):

import locale
import string  # string.uppercase/lowercase are locale-dependent in Python 2

(lang, encoding) = locale.getlocale()
#locale.setlocale(locale.LC_ALL, '')
if locale.getlocale() != (None, None):
  # Re-encode the locale's upper- and lower-case letters as UTF-8
  Ul = [x.encode('utf8') for x in unicode(string.uppercase, encoding)]
  Ll = [x.encode('utf8') for x in unicode(string.lowercase, encoding)]

  # Join them into regexp alternations like 'A|B|C|...'
  U = reduce(lambda x,y: x+'|'+y, Ul)
  L = reduce(lambda x,y: x+'|'+y, Ll)

  # wikiname1: capital(s), lowercase letter(s), a capital, any mix, digits;
  # wikiname2: a capital, more capitals, a lowercase letter, any mix, digits
  wikiname1 = r'(?L)\b(%s)+(%s)+(%s)(%s)*[0-9]*' % (U,L,U,U+'|'+L)
  wikiname2 = r'(?L)\b(%s)(%s)+(%s)(%s)*[0-9]*'  % (U,U,L,U+'|'+L)

Remember to set up Zope's locale correctly as well.
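(A quick way to verify what locale the Zope process actually ends up with, from a Python prompt in the same environment; a sketch, assuming the locale comes from the environment or Zope's own setting:)

    # Check the locale the interpreter sees:
    import locale
    locale.setlocale(locale.LC_ALL, '')   # adopt the environment's locale
    print locale.getlocale()              # e.g. ('pt_BR', 'iso8859-15')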

I hope this helps!

traldar

very interesting --SimonMichael, Thu, 29 Jan 2004 00:16:59 -0800

It helps a lot, thanks. I have some questions - answer any you care to!

Is this ok (a plain '|'.join(Ul) instead), or is the reduce required for tricky i18n string handling?
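(For what it's worth, the two spellings build the same alternation string; a quick Python 2 check with a made-up sample list:)

    # reduce and '|'.join produce identical 'A|B|C' alternations:
    Ul = ['A', 'B', '\xc3\x89']   # sample UTF-8-encoded letters
    assert reduce(lambda x,y: x+'|'+y, Ul) == '|'.join(Ul)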

status --SimonMichael, Thu, 29 Jan 2004 11:16:49 -0800

This server is now running my version of your patch (see http://zwiki.org/zwikidir/Regexps.py ) for testing. I have no locale set on this server, though, so what we're seeing here is the no-locale case, which I've tried to update for UTF-8.

It's not right yet, though (and getting hard to follow!). Although it recognizes the UTF-8 Alguém as a whole word, it links it when it shouldn't.

So let's try --Samotnik, Thu, 29 Jan 2004 11:30:25 -0800

Śpiąca Królewna łka, a wróble ćwierkają (Polish: "Sleeping Beauty sobs, and the sparrows chirp")

AlguémPage
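(A quick harness for test words like these; hypothetical, assuming the patched Regexps.py is importable and exposes wikiname1/wikiname2 as pattern strings. Expected: AlguémPage links, the plain accented words do not:)

    # -*- coding: utf-8 -*-
    # Feed sample words to the candidate patterns and report which link.
    import re
    from Regexps import wikiname1, wikiname2  # assumption: patched module
    samples = ['Alguém', 'AlguémPage', 'Śpiąca', 'Królewna', 'łka']
    for w in samples:
        m = re.search(wikiname1, w) or re.search(wikiname2, w)
        print '%-16s %s' % (w, m and ('links: ' + m.group()) or 'plain')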

checking in --SimonMichael, Thu, 29 Jan 2004 22:29:56 -0800

The no-locale case seems to be working now too: it recognizes western European characters in links and doesn't link things it shouldn't. The regexps just got more complicated; I haven't noticed any blatant speed change. More testing needed. I think this needs to go into 0.27.
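(For anyone who wants numbers rather than "no blatant speed change": a crude timing loop, hypothetical, with the same Regexps.py import assumption as above:)

    # Crude speed check for a candidate pattern (sketch):
    import re, time
    from Regexps import wikiname1             # assumption: patched module
    pat = re.compile(wikiname1)
    text = 'Algu\xc3\xa9mPage plus some plain filler words here ' * 1000
    t0 = time.clock()
    for i in range(100):
        pat.findall(text)
    print 'elapsed: %.2fs' % (time.clock() - t0)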

property change --SimonMichael, Thu, 29 Jan 2004 22:37:31 -0800

Status: open => closed

Problem if locale encoding is utf-8 --Tue, 03 Feb 2004 03:35:46 -0800

I get a broken ZWiki product if the locale is set to de_DE.UTF-8 (on FreeBSD, with the utf8locale port). The traceback is:

Import Traceback

Traceback (most recent call last):
 File "/usr/local/www/z27c1/lib/python/OFS/Application.py", line 654, in import_product
   product=__import__(pname, global_dict, global_dict, silly)
 File "/usr/local/www/z27c1inst/Products/ZWiki/__init__.py", line 10, in ?
   import ZWikiPage, ZWikiWeb, Permissions, Defaults
 File "/usr/local/www/z27c1inst/Products/ZWiki/ZWikiPage.py", line 63, in ?
   from Regexps import url, bracketedexpr, doublebracketedexpr, \
 File "/usr/local/www/z27c1inst/Products/ZWiki/Regexps.py", line 71, in ?
   U='|'.join([x.encode('utf8') for x in unicode(string.uppercase,encoding)])
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 26-27: invalid data

some more info --Tue, 03 Feb 2004 03:56:51 -0800

The versions are: ZWiki 0.27, Zope 2.7.0rc1, Python 2.3.3.

the last "File" line in the traceback should be line 70 in the original source because i added a line to set the encoding to "iso8859_15" for testing but commented it out to get the traceback back.

very interesting --Sat, 28 Feb 2004 09:31:49 -0800

Hello!

Sorry for the late answer, but here it goes:

1. I guess the '|'.join version is more comprehensible, and better than the reduce version.
2. The setlocale was resetting my locale, because I use Zope's own locale setting rather than relying on the LANG environment variable.
3. I don't know how performance would be impacted, but in my tests there don't seem to be any absurd performance issues.
4. It is a good idea to UTF-8-encode the default international characters.
5. In my case, the ZWiki I installed will only be used by folks speaking Portuguese, so I only need to encode the chars in ISO-8859-15; but for a more i18n ZWiki it would be nice to encode all i18n characters to UTF-8. I don't know how much performance that would cost, however.
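(For a single-language site like that, one could skip getlocale entirely and spell the extra letters out; a sketch, and the Portuguese letter set below is an assumption:)

    # -*- coding: utf-8 -*-
    # Build the alternations from an explicit letter list instead of the
    # locale (Portuguese set shown; an assumption, not ZWiki's shipped list).
    import string
    extra_upper = u'ÁÂÃÀÇÉÊÍÓÔÕÚÜ'
    extra_lower = u'áâãàçéêíóôõúü'
    Ul = list(string.uppercase) + [c.encode('utf8') for c in extra_upper]
    Ll = list(string.lowercase) + [c.encode('utf8') for c in extra_lower]
    U = '|'.join(Ul)
    L = '|'.join(Ll)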