2012-08-31

    o Use resolved native symbols.

2010-02-17

    o Added CT n-gram counting.
    o We now use vectors instead of pairlists to avoid memory corruption
      problems.

2010-01-09

    o Use perl in word splitting in textcnt. This fixes a problem with
      combining characters when using the default split pattern.

2009-10-21

    o Provide a useBytes argument in textcnt (R >= 2.10.x).
    o Fixed blank strings in word-sequence counting.
    o Added reduced n-gram counting.

2009-09-04  0.0-3

    o Make Unicode latin ligature translation work "portably".
    o Migrated encoding utilities.

2009-08-31  0.0-2

    o We now call l10n_info from C to avoid problems on non-Linux
      platforms with langinfo.h.

2009-08-24  0.0-2

    o Migrated core utilities to CRAN package tau.
    o Fixed to SET_STRING_ELT in C code.

2009-01-15  0.1-2

    o Added counting of word-sequences.

2008-04-30  0.1-1

    Initial Release

    o Provide efficient C-level counting of n-grams, affixes, or words.
    o Provide utilities for tokenizing, stopword removal, and other
      preprocessing tasks.