Here are the slides from a seminar at UCT introducing speech recognition and the project to integrate CMU Sphinx into Opencast Matterhorn. Among other things, the talk looks at language modelling using Wikipedia.
The project is at an early stage, so this is more an overview of the problem space and plans than a report of specific results.