Suggestion for making it internationalized


2009/11/04 "Programming > Suggestion for making it internationalized"

1. Bug report 1
2. Bug report 2

1. Bug report 1

Recently I started using Twitter, mainly to announce releases of the web services I make.
In most cases I write in Japanese because my readers are Japanese.

I found a service that lets you find posts that were marked as favorites.
A service called favstar is used more in the English-speaking world even though Favotter started earlier. The author of Favotter wrote a blog post in Japanese about why favstar won over Favotter despite starting later, and it has prompted a lot of discussion.

Great growth.

Anyway, thanks to that discussion, I learned of favstar and tried it.
It is a good service, I think.
Today Favotter was down because its database was destroyed by a power loss at the author's house, so considering that kind of incident, favstar can be a good choice even for Japanese users as an alternative to Favotter.
But favstar has one quite bad point compared to Favotter for non-English users.

A sentence written in Japanese can be cut short, so the full comment is not shown.
In many cases the character at the end of the sentence is also corrupted if you write a long Twitter post.

I think this happens because favstar does not allow enough bytes to store one Japanese Twitter post.
For English, 140 bytes are enough because one ASCII character takes only 1 byte.
A Japanese character, however, takes 3 bytes in UTF-8, so you have to reserve 420 bytes (= 140 chars x 3 bytes) to store one Japanese Twitter post.
Even if the service does not try to store the full sentence, it cannot cut the sentence by byte length alone; it has to cut at a point where the last character is still a complete UTF-8 character.
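
The byte arithmetic above can be checked with a small sketch. It is written in Python purely for illustration; the sample text and the variable names are assumptions, not anything favstar actually runs.

```python
# A 140-character Japanese post (sample text, assumed for illustration).
post = "あ" * 140

encoded = post.encode("utf-8")
char_count = len(post)        # number of characters
byte_count = len(encoded)     # number of bytes once encoded as UTF-8

# Cutting at an arbitrary byte offset can split a 3-byte character:
# 140 = 46 * 3 + 2, so two stray bytes are left at the end.
naive_cut = encoded[:140]

try:
    naive_cut.decode("utf-8")
    cut_is_valid = True
except UnicodeDecodeError:
    cut_is_valid = False

print(char_count, byte_count, cut_is_valid)
```

The post has 140 characters but needs 420 bytes, and the 140-byte cut no longer decodes as UTF-8.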

A UTF-8 character can be expressed as a regex in the following way.
1-byte character   ASCII                [\x00-\x7F]
2-byte character                        [\xC0-\xDF][\x80-\xBF]
3-byte character   Japanese and so on   [\xE0-\xEF][\x80-\xBF][\x80-\xBF]
4-byte character                        [\xF0-\xF7][\x80-\xBF][\x80-\xBF][\x80-\xBF]
5-byte character                        [\xF8-\xFB][\x80-\xBF][\x80-\xBF][\x80-\xBF][\x80-\xBF]
6-byte character                        [\xFC-\xFD][\x80-\xBF][\x80-\xBF][\x80-\xBF][\x80-\xBF][\x80-\xBF]
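
As a rough illustration of how these patterns split a byte string into characters, here is a minimal Python sketch. The names `UTF8_CHAR` and `split_chars` and the sample string are made up for this example; the patterns are the ones from the table, with `{n}` used as shorthand for the repeated continuation bytes.

```python
import re

# Patterns taken from the table above. The 5- and 6-byte forms come from the
# original UTF-8 design; the current standard (RFC 3629) stops at 4 bytes.
UTF8_CHAR = re.compile(
    rb"[\x00-\x7F]"
    rb"|[\xC0-\xDF][\x80-\xBF]"
    rb"|[\xE0-\xEF][\x80-\xBF]{2}"
    rb"|[\xF0-\xF7][\x80-\xBF]{3}"
    rb"|[\xF8-\xFB][\x80-\xBF]{4}"
    rb"|[\xFC-\xFD][\x80-\xBF]{5}"
)

def split_chars(data: bytes) -> list[bytes]:
    """Return the byte sequence of each character the pattern recognises."""
    return UTF8_CHAR.findall(data)

# One ASCII letter, one accented letter, one Japanese character.
sample = "aé日".encode("utf-8")
lengths = [len(c) for c in split_chars(sample)]
print(lengths)
```

The three sample characters come out as sequences of 1, 2 and 3 bytes respectively.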

So, after cutting the sentence at a certain number of bytes, you can remove the corrupted character at the end of the UTF-8 string in the following way.
my $utf8_sentence='一二三四五六七八九十一二三四五六七八九十一二三四五六七八九十一二三四五六七八九十一二三四五六七八九十一二三四五六七八九十一二三四五六七八九十一二三四五六七八九十一二三四五六七八九十一二三四五六七八九十一二三四五六七八九十一二三四五六七八九十一二三四五六七八九十一二三四五六七八九十一二三四五六七八九十一二三四五六七八九十一二三四五六七八九十一二三四五六七八九十一二三四五六七八九十一二三四五六七八九十一二三四五六七八九十';

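# One alternative per UTF-8 character length, taken from the table above
my $utf8_char_regex=join('|','[\x00-\x7F]',
    '[\xC0-\xDF][\x80-\xBF]',
    '[\xE0-\xEF][\x80-\xBF][\x80-\xBF]',
    '[\xF0-\xF7][\x80-\xBF][\x80-\xBF][\x80-\xBF]',
    '[\xF8-\xFB][\x80-\xBF][\x80-\xBF][\x80-\xBF][\x80-\xBF]',
    '[\xFC-\xFD][\x80-\xBF][\x80-\xBF][\x80-\xBF][\x80-\xBF][\x80-\xBF]');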

my $max_bytes=280;
print 'Before'."\t".$utf8_sentence."\n";
print 'Before cut'."\t".length($utf8_sentence)." bytes\n";

# Cut by raw byte count; this may split the last character
$utf8_sentence=substr($utf8_sentence, 0, $max_bytes);
print 'Cut at '.$max_bytes." bytes \t".length($utf8_sentence)." bytes\n";
print $utf8_sentence."\n";

# Keep only the leading run of complete UTF-8 characters
$utf8_sentence=~ s!^((?:$utf8_char_regex)*).*!$1!s;
print 'Final length'."\t".length($utf8_sentence)." bytes\n";
print $utf8_sentence."\n";
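
For comparison, the same cut can be sketched in Python without a regex. This is a different technique from the one in the post: it relies on the decoder's `errors="ignore"` mode to drop the incomplete trailing sequence. The function name `truncate_utf8` and the sample sentence are assumptions made for illustration. Note that this mode would also silently drop invalid bytes in the middle of the input, so it only suits input that was valid UTF-8 before the cut.

```python
def truncate_utf8(text: str, max_bytes: int) -> str:
    """Cut text to at most max_bytes of UTF-8 without a broken last character."""
    cut = text.encode("utf-8")[:max_bytes]
    # errors="ignore" drops the incomplete trailing sequence left by the cut.
    return cut.decode("utf-8", errors="ignore")

# Sample sentence in the style of the one above: 200 characters, 600 bytes.
sentence = "一二三四五六七八九十" * 20
short = truncate_utf8(sentence, 280)
print(len(short), len(short.encode("utf-8")))
```

With a 280-byte limit the result holds 93 complete characters, that is 279 bytes, because the 280th byte belongs to a character that did not fit.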

Anyway, it seems that favstar is trying to support Japanese, so I hope it will become able to store 140 Japanese characters.
2. Bug report 2

I encountered another issue.
It seems that favstar misjudges which part of a post should be turned into a link.
Assuming that a URL starts only with http://, https:// or ftp:// would probably solve this issue, because the second link in this case does not start with http://.
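
A minimal sketch of that suggestion, in Python: only text that begins with one of the three schemes is treated as a link. The pattern, the function name `find_urls` and the sample post are all hypothetical; how favstar actually detects links is not known to me.

```python
import re

# Only text that begins with one of these schemes is treated as a link.
URL_PATTERN = re.compile(r"(?:https?|ftp)://\S+")

def find_urls(text: str) -> list[str]:
    """Return the substrings that would be turned into links."""
    return URL_PATTERN.findall(text)

# Hypothetical post: the second address has no scheme, so it stays plain text.
post = "See http://example.com/a and example.org/b for details"
print(find_urls(post))
```

Only the first address is picked up here; the second is left alone because it does not start with http://, https:// or ftp://.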
