Amazon: making data work

Lorcan

I spent a little while just now looking at Amazon's Statistically Improbable Phrases, or SIPs (distinctive patterns of words in a book), and at its other data-mining features.
I tried some books I know well, starting with The Rise of the Network Society by Manuel Castells. Here is the Amazon entry, and here are its SIPs:
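Amazon doesn't say here how SIPs are scored. One plausible reading of "statistically improbable" is a phrase that turns up far more often in one book than in a reference collection. The sketch below is a hypothetical illustration of that idea, not Amazon's method: the function names, the two-word phrase length, the minimum count and the add-one smoothing are all my assumptions.

```python
import re
from collections import Counter


def phrases(text, n=2):
    """Return every n-word phrase (lowercased) in text."""
    words = re.findall(r"[a-z]+", text.lower())
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def improbable_phrases(book, corpus_books, n=2, min_count=2, top=5):
    """Hypothetical SIP scoring: rank phrases by how much more often
    they occur in `book` than across `corpus_books`."""
    book_counts = Counter(phrases(book, n))
    corpus_counts = Counter()
    for other in corpus_books:
        corpus_counts.update(phrases(other, n))
    book_total = sum(book_counts.values())
    corpus_total = sum(corpus_counts.values())

    scores = {}
    for phrase, count in book_counts.items():
        if count < min_count:  # ignore one-off phrases
            continue
        book_rate = count / book_total
        # Add-one smoothing so phrases absent from the corpus don't divide by zero.
        corpus_rate = (corpus_counts[phrase] + 1) / (corpus_total + 1)
        scores[phrase] = book_rate / corpus_rate
    return sorted(scores, key=scores.get, reverse=True)[:top]


book = "the space of flows shapes the space of places. the space of flows is new."
corpus = ["the space is large. the flows are fast.", "the places of the city."]
print(improbable_phrases(book, corpus))
```

Here "the space" is frequent in the book but also appears in the reference texts, so it ranks below "space of" and "of flows". A real system would presumably also filter out stopword-heavy phrases and use a much larger collection.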

goods handling employment, ratio between services, informational paradigm, transnational production networks, core labor force, new spatial logic, information technology paradigm, networking logic, informational production, market capitalization value, business serv, personal serv, informational economy, new technological system, informational society, new technological paradigm, informational work, horizontal corporation, instant wars, new industrial space

This is interesting. For example, clicking on informational economy gives a list of books that may be of interest and that would have been difficult to find otherwise. Of course, informational is a distinctive usage of Castells's. What did not show up were space of flows and space of places, phrases that are central to some of Castells's arguments in this book. The Books on related topics list (based on the number of shared SIPs) is a good one, but it does not include the other two books in the trilogy of which this is the first volume. Now, this is a pretty dense book 😉 It gets a Fog Index of 16.3, and 26% of its words are complex (three or more syllables).
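Ranking related books by the number of shared SIPs can be sketched in a few lines. This is a minimal illustration under assumptions: the function name is hypothetical, and apart from Castells's title and phrases taken from his list above, the titles and phrase sets are made up. It does not reproduce Amazon's actual list.

```python
def related_books(target, sips_by_title, top=3):
    """Rank other titles by how many SIPs they share with `target`.

    Ties are broken alphabetically; titles sharing no SIPs are dropped.
    """
    target_sips = set(sips_by_title[target])
    shared = {
        title: len(target_sips & set(sips))
        for title, sips in sips_by_title.items()
        if title != target
    }
    ranked = sorted(shared.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(title, n) for title, n in ranked if n > 0][:top]


# Hypothetical SIP sets (only the first entry uses phrases from the post).
sips = {
    "The Rise of the Network Society": [
        "informational economy", "networking logic",
        "horizontal corporation", "new industrial space",
    ],
    "Book A": ["informational economy", "networking logic", "core labor force"],
    "Book B": ["horizontal corporation"],
    "Book C": ["switching costs"],
}
print(related_books("The Rise of the Network Society", sips))
```

A ranking like this only surfaces books whose distinctive phrases overlap. That may be one reason the other volumes of the trilogy are missing: if their SIPs differ, they score low however closely related they are.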
Information Rules is the influential book on the economics of network information by Hal Varian and Carl Shapiro. Here are its SIPs:

total switching costs, your switching costs, future switching costs, ignite positive feedback, collective switching costs, your installed base, your information product, own switching costs, formal standard setting, reducing switching costs, selling complementary products, own installed base, personalized pricing, assemble allies, openness strategy, rival evolutions, four generic strategies, rival revolutions, personalized prices, consumer switching costs, evolution versus revolution, open migration, group pricing, dual sourcing, demand side economies

Again, we see many of the central concepts introduced in the book, and clicking on individual phrases gives relevant results. It is apparently an easier read than Castells: it gets a Fog Index of 16, and 19% of its words are complex. Once more, an interesting range of related books is presented.
What about Theories of the Information Society by Frank Webster? Its textbook orientation might make it an easier read, though that is perhaps countered by the difficulty of the material it presents. Even so, I was surprised to see it get a Fog Index of 21.9, with 24% of its words complex.
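Webster scores higher than Castells despite a lower share of complex words, so sentence length has to matter too. The standard Gunning Fog formula is 0.4 × (average words per sentence + percentage of complex words). Amazon doesn't say exactly how it computes the index, so the sketch below assumes that standard form. It also uses a crude vowel-group syllable counter of my own, and the function names are hypothetical.

```python
import re


def count_syllables(word):
    """Rough syllable estimate: count vowel groups, minus a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def fog_index(text):
    """Gunning Fog: 0.4 * (words per sentence + % of words with 3+ syllables)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    complex_words = [w for w in words if count_syllables(w) >= 3]
    avg_sentence_len = len(words) / len(sentences)
    pct_complex = 100 * len(complex_words) / len(words)
    return 0.4 * (avg_sentence_len + pct_complex)


def implied_sentence_length(fog, pct_complex):
    """Invert the formula: words per sentence implied by a Fog score."""
    return fog / 0.4 - pct_complex


print(implied_sentence_length(21.9, 24))  # Webster
print(implied_sentence_length(16.3, 26))  # Castells
```

Take the quoted figures at face value and assume the standard formula. Then Webster's score implies sentences roughly twice as long as Castells's (about 31 words against about 15). That would explain the higher index despite the smaller share of complex words.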
I am sure that this data will give rise to further study. A cursory view suggests that the data they provide is a real enhancement in many cases. They are making data work hard to good effect.

