Old discussion of InternationalCharactersInPageNames. 200610: salvaged from google

Can I use international characters in WikiNames? I.e., can we have WikiNames like FrækkeFrølår and ÆbleGrød?

To accommodate the Scandinavian languages (Danish, Finnish, Norwegian and Swedish) and German, ZWiki needs to recognize "ÆØÅÄÖÜ" as uppercase and "æøåäöüß" as lowercase chars. Further refinements are necessary for languages like Icelandic and Spanish.

I tried including these characters at suitable places in the urlchars, wikiname1 and wikiname2 variables in ZWikiPage.py and made similar corrections to wwml.py. Indeed, ZWiki recognized "FrækkeFrølår" (which, btw, is Danish for "NaughtyFrogThighs") and "ÆbleGrød" ("AppleButter") as WikiNames. :-) However, ZWiki refused to actually create the pages, saying that the names contained illegal characters. :-(

So much for my first hacking into the ZWikiSource... --KlausSeistrup

I don't know about the latest ZWikiProduct, but WFN uses [A-Z] and [a-z] in its regexps to match wikinames. I replaced them with string.uppercase and string.lowercase and now I can use anything in my wikinames. This doesn't help for multibyte characters, of course. I also had to comment out a piece of code where zwiki tries to quote some accented characters to their HTML entity form. Finally, I also had to replace [A-Za-z] with string.*case in OFS/ObjectManager.py or it would raise an exception when I tried to create the object. --LaloMartins

Lalo is absolutely correct, and his solution applies directly to ZWiki. You have to change the wikiname1/2 regexps, and then the one in OFS/ObjectManager.py. However, \b gives us tremendous trouble -- it is LOCALE-sensitive! It wants a letter on the right, and what counts as a letter is determined by the language. Simon: I think this is a very serious handicap towards internationalization. It would be better to state this explicitly: you specify the set of preceding non-wikiname conditions separately yourself, and your search functions then take two parameters, an explicit replacement for \b and the rest of the pattern. To keep my initial hacking to a minimum, I use the locale module, and it works! I'm just not sure that changing the locale won't bring a whole shebang of new word boundaries with it, and I simply want both English and Russian to work. So, here are my changes, which now allow me to create a Great Russian Novel:

In ZWikiPage.py, we import locale, and the new regexps for Russian KOI8 look like:

 import locale
 loc = locale.setlocale(locale.LC_ALL, 'ru_RU')
 wikiname1 = r'(?L)\b[A-Z\xE0-\xFF]+[a-z\xC0-\xDF]+[A-Z\xE0-\xFF][A-Za-z\xC0-\xFF]*[0-9]*'
 wikiname2 = r'(?L)\b[A-Z\xE0-\xFF][A-Z\xE0-\xFF]+[a-z\xC0-\xDF][A-Za-z\xC0-\xFF]*[0-9]*'

In OFS/ObjectManager.py, we rewrite the bad object name regexp like this:

 bad_id=re.compile(r'[^a-zA-Z\xC0-\xFF0-9-_~,.$# ]').search #TS
-- AlexyKhrabrov
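
To see why widening the character class is enough, here is a minimal sketch in present-day Python 3 (which this discussion predates). It applies the same character class to KOI8-R encoded bytes. The names BAD_ID and ASCII_ONLY_BAD_ID and the sample id are illustrative assumptions, not Zope's actual code, and the ASCII-only pattern is only assumed to resemble the unpatched check.

```python
import re

# Same character class as the patched bad_id above, written as a bytes
# pattern so that \xC0-\xFF means raw KOI8-R bytes. Hypothetical stand-in
# for Zope's check, not the real ObjectManager code.
BAD_ID = re.compile(rb'[^a-zA-Z\xC0-\xFF0-9\-_~,.$# ]').search

# Assumed ASCII-only variant, for comparison.
ASCII_ONLY_BAD_ID = re.compile(rb'[^a-zA-Z0-9\-_~,.$# ]').search

# In KOI8-R the Cyrillic letters used here all land in 0xC0-0xFF.
page_id = 'ВеликийРоман'.encode('koi8_r')

rejected_by_ascii_only = ASCII_ONLY_BAD_ID(page_id) is not None
accepted_by_widened = BAD_ID(page_id) is None
```

Both flags come out true for this id: the ASCII-only class finds an offending byte, the widened one does not.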

Thanks Lalo, Alexy, this sounds very helpful. Do you think a zope bugfix request is warranted for ObjectManager.py? --SM

I think so. It should be locale-sensitive. --AK

From a purist's point of view, I don't think that a WikiName should contain double-byte characters. You would then have to put all the double-byte WikiNames into brackets, making it all a big mess.

Simon Michael <simon@joyful.com>, Sat, 10 Nov 2001 13:49:23 -0800 (via mail):
Hello Joe - thanks for this patch you sent some time ago. I have just forwarded it to http://zwiki.org/ZWikiInternationalisation where we are working on this stuff. Best regards --Simon

Joseph Wayne Norton writes:
> I have made a few changes to the ZWikiPage.py. I wanted to see if
> these could be incorporated into the main branch.
>
> We need support for Japanese text so I thought the simplest would be
> to use the "title" property when rendering the page and the "id" property
> when authoring the page. The id has to stick to valid url syntax but
> the title can be any text. Please ignore the last patch item ... I
> simply turned off the replacing of international characters.
>
> I also modified the editform DTML method in the ZWikiWebs.zexp bundle
> to allow modification of the title property.
>
> If you have any other suggestions on how to better handle non-English
> based text, please let me know.

20010614-joe.patch

Simon Michael <simon@joyful.com>, Sat, 10 Nov 2001 14:15:14 -0800 (via mail):
Alexy Khrabrov writes:
> First of all, I successfully Russified ZWiki and added my $.02
> at GeneralProblems about international characters.
> The scoop is, \b is a terrible thing to have in a pattern
> until all locale issues are clear. Since it matches nothing,
> the code needs to be changed so it can accommodate a "precondition"
> which actually matches something -- then this precondition regexp
> would be passed to other routines. Or, you'd need to accommodate
> a wikiname pattern with surrounding non-wiki delimiters, and use
> a numbered match in those routines for the meat.
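
One way to read the second suggestion is sketched below in present-day Python 3: the pattern carries its own delimiter as group 1 and the wikiname as group 2, so no \b is needed. All names here (UPPER, LOWER, LINK, find_wikinames) are hypothetical, and the Latin-1 letter ranges are an assumption chosen for the example, not ZWiki's code.

```python
import re

# Assumed letter classes (ASCII plus the Latin-1 letters).
UPPER = 'A-ZÀ-ÖØ-Þ'
LOWER = 'a-zß-öø-ÿ'
WIKINAME = rf'[{UPPER}]+[{LOWER}]+[{UPPER}][{UPPER}{LOWER}]*[0-9]*'

# Group 1 is the explicit precondition (start of text, or any character
# that is neither a letter nor a digit); group 2 is the wikiname itself.
LINK = re.compile(rf'(^|[^{UPPER}{LOWER}0-9])({WIKINAME})')

def find_wikinames(text):
    """Return the wikinames in text, dropping the delimiter in group 1."""
    return [m.group(2) for m in LINK.finditer(text)]

names = find_wikinames('see ÆbleGrød and FrækkeFrølår.')
```

Because the delimiter actually consumes a character, routines that rewrite links would have to put group 1 back in front of whatever replaces group 2.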

SimonMichael, 2002/02/22 07:36 GMT (via web):
Hmm, I don't think I understand Alexy. Isn't a locale-sensitive \b just what we want? Now that I am using string.uppercase/lowercase (zwikidir/Regexps.py) I wonder if 0.9.9 will work in your setup.

EdwardKreis, 2002/05/30 07:51 GMT (via web):

Using 0.9.9 and Lalo's approach (replacing [A-Za-z] with string.*case in OFS/ObjectManager.py), I arrive at a situation where ZWiki recognizes and allows international characters in a WikiName except when the international character is at the beginning of the WikiName. Perhaps that is what Alexy is talking about?

Hmm, I have added (?L) to the regexps and now everything works:

 wikiname1        = r'(?L)\b[%s]+[%s]+[%s][%s]*[0-9]*' % (UC,LC,UC,UC+LC)
 wikiname2        = r'(?L)\b[%s][%s]+[%s][%s]*[0-9]*' % (UC,UC,LC,UC+LC)
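
The leading-character symptom can be reproduced in present-day Python 3, as a rough analogy rather than the original Python 2 setup. There, re.ASCII makes \b ignore non-ASCII letters, while the default Unicode-aware \b plays the role that (?L) plays in the patch. The UC and LC values below are assumed Latin-1 ranges, not the ones from Regexps.py.

```python
import re

UC = 'A-ZÀ-ÖØ-Þ'   # assumed uppercase letters
LC = 'a-zß-öø-ÿ'   # assumed lowercase letters
wikiname1 = r'\b[%s]+[%s]+[%s][%s]*[0-9]*' % (UC, LC, UC, UC + LC)

text = 'Try ÆbleGrød today'

# ASCII-only \b: "Æ" is not a word character, so there is no boundary
# between the space and "Æ", and the name is not recognized.
ascii_leading = re.search(wikiname1, text, re.ASCII)

# An international character in the middle is fine even with ASCII \b,
# because the explicit character classes still match it.
ascii_inner = re.search(wikiname1, 'FrækkeFrølår', re.ASCII)

# Unicode-aware \b (the default for str patterns) sees "Æ" as a letter.
unicode_leading = re.search(wikiname1, text)
```

Here ascii_leading is None while the other two searches succeed, which matches the behaviour described above: only a name that starts with an international character is missed.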

Simon, 2002/05/30 17:42 GMT (via web):
Good news. Aha, I didn't know about (?L). I will add that for 0.9.10.

Luciano Ramalho:
I would prefer a solution that does not require a patch to Zope and does not generate ugly URLs: ZWiki could replace the accented chars by their ASCII equivalents when creating new pages and linking to them. So the WikiName PeLé should lead to a PeLe page. And the URL to that page would show as .../PeLe instead of .../PeL?%9D . I am willing to do the necessary refactoring, but I would like to have a little help from someone more knowledgeable about the code in the beginning. If you want to help, write to luciano@hiper.com.br
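
A minimal sketch of this kind of folding in present-day Python 3, using the standard unicodedata module. That module is a swapped-in technique for illustration and not something proposed in this discussion, and the function name ascii_fold is hypothetical.

```python
import unicodedata

def ascii_fold(name):
    """Strip accents: decompose each character, then drop combining marks."""
    decomposed = unicodedata.normalize('NFKD', name)
    return ''.join(c for c in decomposed if not unicodedata.combining(c))

folded = ascii_fold('PeLé')
```

Letters that have no decomposition, such as Æ and ø, pass through unchanged, so a real implementation would need an explicit mapping table for them.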

Simon, 2002/09/13 02:58 GMT (via web):
We should have two solutions for this these days, using square bracket links and hacking the bad_id. Lalo told me that when doing the latter, the wikinames work but the urls are not quoted and so different browsers interpret the encodings differently and end up creating different pages. The solution was to url-quote them. I've hacked bad_id on this server for experimentation now. Let's see.. AcentuaçãoInBareWikinames

AcentuaçãoInBareWikinames can't I paste these into mozilla ??

"type ãÁç or something in the text, then view it, select, copy, go to edit, and paste"

AcentuaçãoInBareWikiNames, ÁfricaAndÁsia, CatchTheÔnibus

The above don't get linked yet. So we still have a zwiki problem.

latin1 characters:

 A-ZÀÁÂÃÄÅÆÈÉÊËÌÍÎÏÒÓÔÕÖØÙÚÛÜÝÇÐÑÞ
 a-zàáâãäåæèéêëìíîïòóôõöøùúûüýÿµßçðñþ

in python code:

 A-Z\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf\xd2\xd3\xd4\xd5\xd6\xd8\xd9\xda\xdb\xdc\xdd\xc7\xd0\xd1\xde
 a-z\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf2\xf3\xf4\xf5\xf6\xf8\xf9\xfa\xfb\xfc\xfd\xff\xb5\xdf\xe7\xf0\xf1\xfe
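
Lists like these need not be typed by hand. A rough sketch in present-day Python 3, which assumes that the code points below 256 coincide with Latin-1 (they do for Python 3 strings); the variable names are illustrative only.

```python
# Walk the upper quarter of Latin-1 and sort the letters by case.
latin1_chars = [chr(i) for i in range(0xC0, 0x100)]
upper = ''.join(c for c in latin1_chars if c.isupper())
lower = ''.join(c for c in latin1_chars if c.islower())

# Escaped form, ready to paste into a byte-oriented character class.
upper_escaped = ''.join('\\x%02x' % ord(c) for c in upper)
lower_escaped = ''.join('\\x%02x' % ord(c) for c in lower)
```

The multiplication and division signs (\xd7, \xf7) are skipped automatically because they are not letters. The micro sign \xb5 from the hand-written list lies outside the range scanned here and would have to be added separately.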

Simon, 2002/09/13 04:44 GMT (via web):
Another little step forward. It now recognizes most bare wikinames containing latin-1 characters (not the africaandasia link).

This stuff is not easy. I'm not getting the results Lalo got at the top of this page using string.uppercase & string.lowercase. I think I need to setlocale(), but first need to figure out how to install locales on this FreeBSD server.

And what about freeform links.. at least they should work: [PeLé] [ÁfricaAndÁsia]

Yes, but they don't quote the ids quite as intended (using _). To be fixed.

2002/09/13
I had the same problem with Greek in Wiki names. ZWiki 0.10.0 accepts wikinames with Greek characters if locale is set up correctly. However, Zope refuses to accept them as legitimate object names. The following modification to lib/python/OFS/ObjectManager.py worked for me (you can find the relevant line near the beginning of the script):

 bad_id=re.compile(r'[^a-zA-Z0-9\xA2\xB8-\xBA\xBC\xBE-\xFE-_~,.$# ]').search #TS 

Don't know if the problem appears with accented Latin-based characters, if so I guess this might work there too. CostisDallas

Simon, 2002/09/13 17:54 GMT (via web):
That was a timely post, similar to the statements above. I see now that my problems yesterday arose from not having the locale set up on this server. It seems zwiki does work as advertised - hack bad_id, get international characters in bare wikinames - with the browser dependency when creating pages still to be addressed.
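
The browser dependency comes down to which bytes stand for the same characters. A small sketch in present-day Python 3 with the standard urllib.parse module, which is not what ZWiki used at the time; the variable names are illustrative.

```python
from urllib.parse import quote, unquote

name = 'AcentuaçãoInBareWikinames'

# The same name url-quoted from two different byte encodings.
latin1_id = quote(name, encoding='latin-1')
utf8_id = quote(name, encoding='utf-8')

# Quoting is reversible only if the decoder assumes the same encoding.
round_trip = unquote(latin1_id, encoding='latin-1')
```

The two quoted ids differ, so two browsers that disagree about the encoding would address, and create, two different pages unless the server pins one encoding and quotes the ids itself.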

Simon, 2002/09/13 18:37 GMT (via web):
Plus,

Simon, 2002/09/13 18:45 GMT (via web):
Re conflating similar international and ascii characters as Luciano suggests above - the first version of freeform links did this but I decided users would want these to be distinct for page naming purposes. Was I wrong ?

Simon, 2002/09/13 19:04 GMT (via web):
... and, bare wikinames containing international characters, working briefly on this site, have stopped working again. More later.

NB if you're experimenting with this stuff, don't forget to .../page/clearCache to re-render your page after refreshing/restarting with code changes.

Simon, 2002/09/13 19:16 GMT (via web):

Oh yes! Because I don't have a locale set on this server. They worked briefly because of my exploratory hacks, and will work again if I get a locale installed. (And differently depending on which locale I set).

Simon Michael, 2002/09/14 01:09 GMT (via mail):
I'm making progress with this. See the latest comments in Regexps.py . Basically, with the latest code,

  1. "International ids" - To enable international characters in zope (and zwiki) ids, you hack zope's bad_id.
  2. "International wiki names" - To enable them in wikinames, you set your locale (if necessary) or hard-code some values in Regexps.py.
  3. "International freeform names" - They are always enabled in freeform names.

If 1 is not in effect, 2 and 3 quote their international characters. If it is, they don't. This site currently has 1 disabled and 2 enabled.

Note that this configuration implies that wikinames and page ids may differ! Eg: ÆbleGrød. That the two are always identical is another fairly fundamental simplifying axiom that we should not abandon lightly.

As you can see (unless I've turned it off) it seems to be working pretty well here, except zope is working harder and using more memory (70+ MB instead of the usual 50). Right now this is true whether you are using the feature or not, so this needs to be improved. A good refactoring of the whole naming and linking system should help, but this may not happen for 0.11.

Comments ?

Is it possible to use "eñes" (the letter ñ) in a link?