My colleague Thom Hickey pointed me at this presentation from Jeff Dean, a Distinguished Engineer in Google’s Systems Lab. It provides an overview of how Google works and looks at some of the engineering issues it faces moving forward. Two things struck me. The first was simply, again, the scale of their operation and the engineering challenges that it poses. The second was also about scale: this time, the service benefits that they can derive from it. For example, he describes clustering based on machine learning approaches. At the moment, the ‘intelligence’ that they create is buried in their systems and services – see how they bring together Google News stories, or match ads to web pages. It will be interesting to see, downstream, whether they find a market for services based on this ‘intelligence’. A few moments’ reflection turns up many such services. A very simple example: they could provide a ‘Google thesaurus’ (along the lines of Roget’s thesaurus).
Search is one of the most important applications used on the internet, but it also poses some of the most interesting challenges in computer science. Providing high-quality search requires understanding across a wide range of computer science disciplines, from lower-level systems issues like computer architecture and distributed systems to applied areas like information retrieval, machine learning, data mining, and user interface design. I’ll describe some of the challenges in these areas and discuss some of the applications that Google has developed over the past few years. I’ll also highlight some of the systems that we’ve built at Google, including GFS, a large-scale distributed file system, and MapReduce, a library for automatic parallelization and distribution of large-scale computation. Along the way, I’ll share some interesting observations derived from Google’s web data. Jeff Dean joined Google in 1999 and is currently a Distinguished Engineer in Google’s Systems Lab. [Colloquium Detail]
The presentation takes about an hour to play.