In 1948, the founder of information theory, Claude Shannon, proposed modeling language in terms of the probability of the next word in a sentence given the previous words. These types of probabilistic ...