What Can An N-Gram Tell Us?
For laughing and learning, this TED talk is wonderful.
Having digitized 15 million books, Google enabled us to access a mammoth body of knowledge. However, just reading the entries from the year 2000, without eating or sleeping, absorbing 200 words a minute, would take someone 80 years. How then to learn from an information avalanche that could easily bury you?
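As a rough check on that figure, the arithmetic can be run backwards to see how many words 80 years of nonstop reading would cover. This is only a sketch: the 200 words a minute and the 80 years come from the talk, while the 365-day year is my simplifying assumption.

```python
# Back-of-the-envelope check of the reading-time claim.
# From the talk: 200 words a minute, 80 years, no eating or sleeping.
# Assumption: every year has 365 days (leap days ignored).
WORDS_PER_MINUTE = 200
YEARS_OF_READING = 80
MINUTES_PER_YEAR = 60 * 24 * 365  # 525,600 minutes

implied_words = WORDS_PER_MINUTE * MINUTES_PER_YEAR * YEARS_OF_READING
print(f"Implied word count: {implied_words:,}")  # 8,409,600,000
```

In other words, the claim implies that the entries from the year 2000 alone run to roughly 8.4 billion words.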
A new discipline focusing on the frequency of selected words and phrases, culturomics conveys trends and, I suspect, much more. The TED researchers told us, for example, that the word “women” appears much more frequently than “men” after 1970. They also compared the date that an innovation appeared with the date that its name came into common usage. As you might expect, since 1800, the time span has narrowed considerably. Even censorship in Nazi Germany can be displayed through their database.
You can see why their website is addictive. Entering apple, for example, I observed when its usage skyrocketed. Then I compared it to PC to see how their trajectories differed. I also tried stock market, 1929 and Adam Smith.
What is an n-gram? It is a sequence of n words that the researchers’ database tracks. For example, “the United States of America” is a 5-gram and 1929 is a 1-gram.
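To make the definition concrete, here is a minimal Python sketch that counts the words in a phrase and lists the n-grams inside a longer text. The function names are my own, and splitting on whitespace is a simplifying assumption; the actual database may split text into words differently, for instance around punctuation.

```python
def ngram_order(phrase: str) -> int:
    """Return n for a phrase, counting whitespace-separated words."""
    return len(phrase.split())


def ngrams(text: str, n: int) -> list[str]:
    """List every run of n consecutive words in the text, in order."""
    words = text.split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


print(ngram_order("the United States of America"))  # 5
print(ngram_order("1929"))                          # 1
print(ngrams("the United States of America", 2))
# ['the United', 'United States', 'States of', 'of America']
```

Counting how often each of these n-grams shows up in the books published in a given year is what produces the kind of trend line described above.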
The Economic Lesson
As the U.S. economy grew, so too did the “infrastructures” that facilitated its expansion. A transportation infrastructure of roads, canals, and railroads moved people and goods. Our financial infrastructure moves money and credit. Google’s accomplishments and n-grams relate to our information infrastructure.
An economic question: What might compose a financial and information infrastructure?