星之一角: coffee-script (4) (and regexp, ruby)

I wanted to finish reading T algebras and to work through some haskell exercises, but reluctantly (?) I ran into yet another discovery (tears)

*

In the previous post I mentioned:

Also, I found that \w in a javascript regular expression only matches
A-Za-z, 0-9 and underscore; it can't match any other unicode character...
What the heck :x
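To make that complaint concrete, here is a minimal sketch. The helper name isAsciiWord is made up for illustration; the only thing it relies on is the plain \w behaviour of javascript regexps.

```javascript
// \w in a plain JavaScript regexp is ASCII-only: [A-Za-z0-9_].
// isAsciiWord is a hypothetical helper name, not something from the post.
function isAsciiWord(s) {
  return /^\w+$/.test(s);
}

console.log(isAsciiWord("hello_world1")); // true
console.log(isAsciiWord("啊囉哈"));        // false: CJK characters are not in \w
console.log(isAsciiWord("café"));          // false: even "é" is not in \w
```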

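Then I found out that only ruby 1.8's \w can match unicode.
In 1.9, even with the u modifier on the regexp, \w won't match unicode.
After looking it up, it turns out that in 1.9 you are supposed to match
unicode in a more formal way: either with a character class or with a
character property. I think the latter is probably the more formal of the two.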

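In other words, you can use [[:word:]] or \p{L} (letters only), and in ruby 1.9
you can also use \p{word}, which I guess means the same thing as [[:word:]].
For portability, it looks better to stick with the standard character properties.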
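The difference between "letters only" and "word characters" is easy to miss, so here is a rough sketch of it. It is written in javascript to match the rest of this post, and it assumes an engine with ES2018 Unicode property escapes (the u flag), which is newer than this post and is not what the ruby discussion above used. The helper names are made up, and the [\p{L}\p{N}_] class is only an approximation of [[:word:]], not its exact definition.

```javascript
// Assumption: the engine supports ES2018 Unicode property escapes (u flag).
// isLetters and isWordLike are hypothetical helper names.

// \p{L}: any Unicode letter, and nothing else.
function isLetters(s) {
  return /^\p{L}+$/u.test(s);
}

// Rough approximation of a "word" class: letters, numbers and underscore.
function isWordLike(s) {
  return /^[\p{L}\p{N}_]+$/u.test(s);
}

console.log(isLetters("啊囉哈"));     // true
console.log(isLetters("abc123"));     // false: digits are not letters
console.log(isWordLike("abc123"));    // true
console.log(isWordLike("啊囉哈_1"));  // true
```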

reference:
\w didn't match unicode word in ruby 1.9
Regexps (Read Ruby 1.9)

* * *

Back to coffee-script and javascript: since the built-in \w is no use, an external library it is.
After a quick try, XRegExp plus its unicode plugins seems to do the job.

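However, getting XRegExp to work the same way on both the server and the
client side takes a little adjustment. This is probably because XRegExp
itself is not a nodejs module, while nodejs does have a proper module
system, unlike original javascript where everything is mashed together into one big mush......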

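Because of that, the javascript files can't be loaded one by one; all of them have to be
concatenated and the result turned into a single XRegExp module. The steps, as xregexp-all.sh: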

# download XRegExp 1.5.0
wget http://xregexp.com/xregexp-min.js

# download XRegExp unicode modules
wget http://xregexp.com/plugins/xregexp-unicode-base.js
wget http://xregexp.com/plugins/xregexp-unicode-categories.js
wget http://xregexp.com/plugins/xregexp-unicode-scripts.js
wget http://xregexp.com/plugins/xregexp-unicode-blocks.js

# concatenate all files into one. note that xregexp-min.js must
# come first; "m" sorts before "u", so the glob already puts
# it first and we don't need to worry about it
cat xregexp-*.js > xregexp-tmp.js

# convert CRLF to LF
tr -d "\r" < xregexp-tmp.js > xregexp-all.js

# make XRegExp a nodejs module
echo '
XRegExp.new = XRegExp;
var root;
root = (typeof exports !== "undefined" && exports !== null) ? exports : this;
root.XRegExp = XRegExp;' >> xregexp-all.js

In coffee-script, load it like this:

XRegExp = require('xregexp-all.js')
XRegExp.new = XRegExp.XRegExp

With that, XRegExp.new('\\p{L}+').test('啊囉哈') becomes a polyglot
that works in both coffee-script and javascript.
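Why the same call works on both sides can be sketched without the real library. In the snippet below, FakeXRegExp is a stand-in I made up so the example runs on its own, and installFooter is a hypothetical wrapper around the footer that xregexp-all.sh appends, so that the browser case and the nodejs case can both be simulated in one file.

```javascript
// Stand-in for the real XRegExp constructor (hypothetical; it only exists so
// this sketch runs without the library).
function FakeXRegExp(pattern) {
  return new RegExp(pattern);
}

// The footer from xregexp-all.sh, wrapped in a function (hypothetical name).
// exportsObj plays the role of node's `exports` (undefined in a browser),
// and globalObj plays the role of `this` at the top level of a browser script.
function installFooter(ctor, exportsObj, globalObj) {
  ctor.new = ctor;
  var root = (typeof exportsObj !== "undefined" && exportsObj !== null) ? exportsObj : globalObj;
  root.XRegExp = ctor;
  return root;
}

// Browser-like case: XRegExp lands on the global object, and XRegExp.new
// is already the constructor itself.
var browserGlobal = {};
installFooter(FakeXRegExp, undefined, browserGlobal);

// node-like case: require() hands back the exports object, so the caller
// adds `new` to it, as in the coffee-script snippet above.
var nodeExports = installFooter(FakeXRegExp, {}, {});
nodeExports.new = nodeExports.XRegExp;

console.log(browserGlobal.XRegExp.new("a+").test("aaa")); // true
console.log(nodeExports.new("a+").test("aaa"));           // true
```

Either way the caller ends up with something named XRegExp that has a new property pointing at the constructor, which is all the polyglot line needs.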


2011-05-14
