tim> Straight character n-grams are very appealing because they're the
tim> simplest and most language-neutral; I didn't have any luck with
tim> them over the weekend, but the size of my training data was
tim> trivial.
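For anyone who hasn't played with them, here's a rough sketch of what "straight" character n-gram tokenizing might look like. It assumes overlapping windows of a fixed length n with no word splitting and no normalization. n=5 is an arbitrary pick of mine, not something Tim specified, and char_ngrams is a hypothetical helper, not part of any existing tokenizer.

```python
def char_ngrams(text, n=5):
    """Hypothetical helper: return all overlapping character n-grams of text.

    Nothing language-specific happens here (no word boundaries, no
    stemming, no case folding), which is what makes the approach
    language-neutral. Text shorter than n yields no tokens at all.
    """
    # Slide a window of width n across the string one character at a time.
    return [text[i:i + n] for i in range(len(text) - n + 1)]

print(char_ngrams("free money", 5))
```

That yields "free ", "ree m", "ee mo", "e mon", " mone" and "money"; note the grams that straddle the space, which a word tokenizer would never produce.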
Anybody up for pooling corpi (corpora?)?
Skip