Web.log

tag: language


Twitter Punctuation Statistics

Meet @textstats, a Twitter bot whose only goal in life is to analyse your punctuation habits.

Tweet "@textstats punctuation" for your percentile scores on how frequently you use commas, semicolons, and so forth. It transpires that I use more commas than 93% of the English-speaking population of Twitter, which, frankly, isn't too surprising.
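For the curious, here is a rough Python sketch of how a percentile score like this could be worked out. It is not the bot's actual code: the function names, the "commas per 100 characters" measure and the reference rates are all made up for illustration.

```python
from bisect import bisect_left


def comma_rate(text):
    """Commas per 100 characters of text (0.0 for empty text)."""
    return 100.0 * text.count(",") / len(text) if text else 0.0


def percentile(value, population):
    """Percentage of population values that lie strictly below `value`."""
    ranked = sorted(population)
    return 100.0 * bisect_left(ranked, value) / len(ranked)


# Hypothetical reference rates; a real bot would sample many users' tweets.
reference_rates = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5]

my_rate = comma_rate("Well, frankly, this is, I think, rather a lot.")
print(percentile(my_rate, reference_rates))
```

The same two steps (measure a rate, rank it against a sample) would apply to semicolons or any other mark.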

Lettercloud

Letters of the alphabet, scaled by frequency of occurrence in typical English text.

lettercloud

...and its counterpart for single-digit numbers, a numbercloud of Benford's law.

numbercloud

Clouds generated by Wordle.
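The numbers behind both clouds are easy to reproduce. The sketch below is illustrative only (the function names are mine, and Wordle does its own scaling): Benford's law gives the probability of leading digit d as log10(1 + 1/d), and letter frequencies are simple counts over a sample text.

```python
import math
from collections import Counter


def benford(d):
    """Probability that the leading digit is d (1-9) under Benford's law."""
    return math.log10(1 + 1 / d)


def letter_frequencies(text):
    """Relative frequency of each ASCII letter in text, ignoring case."""
    letters = [c for c in text.lower() if c.isascii() and c.isalpha()]
    total = len(letters)
    if not total:
        return {}
    return {c: n / total for c, n in Counter(letters).items()}


for d in range(1, 10):
    print(d, round(benford(d), 3))
```

The nine Benford probabilities sum to exactly 1, since the product of (1 + 1/d) for d = 1..9 telescopes to 10.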

Hackpact 2009/09/#8: A comprehensive and un-noisy list of English noun inflections

http://www.erase.net/.../...

Small hack today: cleaning up and unifying the list of noun inflections used by t+7 (info). A decent, un-noisy yet comprehensive list of English nouns is a surprisingly hard thing to find. This one, derived from the 2of12 list, contains around 25,000.

Hackpact 2009/09/#7: t+7, a Twitter adaptation of an Oulipo method

t7screen.jpg

t+7 is the project that the previous two hackpact entries have been contributing towards: a simple but endlessly amusing Twitter adaptation of the Oulipo n+7 (or "s+7") method which, according to Raymond Queneau's definition, "consists in replacing each noun with the seventh following it in a dictionary" (source).

Some samples:

@TINAErecords Bloody helm. Puce Imbecile Limpidity have reformed.
@alexandervelky Spheroid can survive up to 72 housebreakings inside a wombat. #thoughtfortheday
@LDN "I celebrated my 40th bday by cataloguing birdseed feldspars. Next yell I'm hoping for sexual interferon. And a calcification"
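The core substitution can be sketched in a few lines of Python. This is a simplified illustration rather than the t+7 code itself: the `n_plus_7` name is hypothetical, any word found in the supplied list is treated as a noun, the list wraps around at the end, and plurals and other inflections are ignored.

```python
import bisect
import re


def n_plus_7(text, nouns, offset=7):
    """Replace each word found in `nouns` with the one `offset` places later."""
    ordered = sorted(set(nouns))

    def shift(match):
        word = match.group(0)
        key = word.lower()
        i = bisect.bisect_left(ordered, key)
        if i == len(ordered) or ordered[i] != key:
            return word  # not a known noun: leave untouched
        # Assumption: wrap around at the end of the dictionary.
        replacement = ordered[(i + offset) % len(ordered)]
        return replacement.capitalize() if word[0].isupper() else replacement

    return re.sub(r"[A-Za-z]+", shift, text)


# Toy ten-word "dictionary", purely for demonstration.
toy_nouns = ["apple", "bear", "cat", "dog", "egg",
             "fig", "goat", "hat", "ink", "jar"]
print(n_plus_7("The cat sat", toy_nouns))
```

With a real noun list of around 25,000 entries, the replacements land far enough from the original to produce the kind of nonsense shown in the samples.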

Hackpact 2009/09/#6: Pluralizing English nouns in Python

[icon] plurals_en.txt

As part of the Django/Twitter-based hackpact project mentioned yesterday, I need to be able to generically pluralize English nouns. This is a distinctly non-trivial job, given the vast array of irregularities and unusual inflections in the English language: think tooth/teeth, vertex/vertices, stimulus/stimuli, wolf/wolves, starfish/starfish, mother-in-law/mothers-in-law. The linguistics and algorithms behind this process have been written about by Damian Conway for a related Perl module. Today, I have been porting the same process to Python, based on a simpler example from the Dive Into Python reference.
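To give a flavour of the approach, here is a minimal rule-based sketch: a lookup table for irregular nouns, then an ordered list of suffix rules. It is not the ported code, and neither the rules nor the table reflect the actual contents of plurals_en.txt; they are a tiny hand-picked set covering the examples above.

```python
import re

# Irregular forms are looked up first (illustrative subset only).
IRREGULAR = {
    "tooth": "teeth",
    "vertex": "vertices",
    "stimulus": "stimuli",
    "wolf": "wolves",
    "starfish": "starfish",
    "mother-in-law": "mothers-in-law",
}

# (pattern, characters to strip from the end, suffix to add), tried in order.
RULES = [
    (re.compile(r"(?:[sxz]|[cs]h)$"), 0, "es"),  # box -> boxes
    (re.compile(r"[^aeiou]y$"), 1, "ies"),       # city -> cities
]


def pluralize(noun):
    """Return a plural for `noun` using the tables above, defaulting to +s."""
    if noun in IRREGULAR:
        return IRREGULAR[noun]
    for pattern, strip, suffix in RULES:
        if pattern.search(noun):
            return noun[:len(noun) - strip] + suffix
    return noun + "s"


for word in ("tooth", "box", "city", "day", "cat"):
    print(word, "->", pluralize(word))
```

Keeping the rules as data means that covering more of English is mostly a matter of extending the tables, which is presumably why a datafile is the natural home for them.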

The datafile in its current format is attached. I'll publish the rest of the code (and reveal the underlying plan!) when the project is complete, hopefully in tomorrow's session...