We just released a new technical report, “Mapping the Expansion of Google’s Serving Infrastructure”, available at https://www.isi.edu/~johnh/PAPERS/Calder13a.pdf
From the abstract:
Modern content-distribution networks both provide bulk content and act as “serving infrastructure” for web services in order to reduce user-perceived latency. These serving infrastructures (such as Google’s) are now critical to the online economy, making it imperative to understand their size, geographic distribution, and growth strategies. To this end, we develop techniques that enumerate servers in these infrastructures, find their geographic location, and identify the association between clients and servers. While general techniques for server enumeration and geolocation can exhibit large error, our techniques exploit the design and mechanisms of serving infrastructure to improve accuracy. We use the EDNS-client-subnet extension to DNS to measure which clients a service maps to which of its servers. We devise a novel technique that uses this mapping to geolocate servers by combining noisy information about client locations with speed-of-light constraints. We demonstrate that this technique substantially improves geolocation accuracy relative to existing approaches. We also cluster servers into physical sites by measuring RTTs and adapting the cluster thresholds dynamically. Google’s serving infrastructure has grown dramatically in the last six months, and we use our methods to chart its growth and understand its content serving strategy. We find that Google has almost doubled in size, and that most of the growth has occurred by placing servers in large and small ISPs across the world, not by expanding on Google’s backbone.
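To make the geolocation idea concrete, here is a minimal sketch. It is not the report's implementation. It takes the client locations that the EDNS-client-subnet mapping associates with one server, discards any client that falls outside the speed-of-light bound implied by an RTT measured from a vantage point to that server, and picks a central location among the survivors. Assumptions, all ours: signals travel through fiber at about 200 km per millisecond (roughly 2/3 of c); client and vantage-point coordinates are already known; the central estimate is a simple medoid, which may differ from the estimator the report actually uses. The names `geolocate`, `max_distance_km`, and `haversine_km` are hypothetical.

```python
import math
from typing import List, Optional, Tuple

# Assumption: propagation speed in fiber is about 2/3 c, i.e. ~200 km per ms.
FIBER_KM_PER_MS = 200.0
EARTH_RADIUS_KM = 6371.0

LatLon = Tuple[float, float]  # (latitude, longitude) in degrees


def haversine_km(a: LatLon, b: LatLon) -> float:
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # Clamp to guard against floating-point values slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))


def max_distance_km(rtt_ms: float) -> float:
    """Upper bound on one-way distance implied by a round-trip time."""
    return (rtt_ms / 2.0) * FIBER_KM_PER_MS


def geolocate(clients: List[LatLon],
              vantage: List[Tuple[LatLon, float]]) -> Optional[LatLon]:
    """Hypothetical client-based geolocation of a single server.

    clients: locations of clients mapped to the server.
    vantage: (vantage-point location, RTT in ms to the server) pairs.
    """
    # Keep only clients inside every vantage point's speed-of-light disc;
    # clients outside it cannot be near the server, so treat them as noise.
    feasible = [
        c for c in clients
        if all(haversine_km(c, vp) <= max_distance_km(rtt) for vp, rtt in vantage)
    ]
    if not feasible:
        return None
    # Central estimate: the feasible client with the smallest total distance
    # to the other feasible clients (a medoid), which resists outliers.
    return min(feasible, key=lambda c: sum(haversine_km(c, o) for o in feasible))
```

For example, with clients geolocated to Paris, Brussels, Lyon, and (wrongly) Tokyo, and a 10 ms RTT from a London vantage point (a bound of about 1000 km), the Tokyo client is discarded and Paris is chosen as the estimate. Using a medoid keeps the estimate on a real client location; a geographic mean or median would be an equally reasonable choice.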
Datasets from this work will be made available; in the meantime, please contact the authors if you’re interested.