The paper “Do You See Me Now? Sparsity in Passive Observations of Address Liveness” will appear in the 2017 Conference on Network Traffic Measurement and Analyais (TMA) July 21-23, 2017 in Dublin, Ireland. The datasets from the paper that we can make public will be at https://ant.isi.edu/datasets/sparsity/.From the abstract of the paper:
Accurate information about address and block usage in the Internet has many applications in planning address allocation, topology studies, and simulations. Prior studies used active probing, sometimes augmented with passive observation, to study macroscopic phenomena, such as the overall usage of the IPv4 address space. This paper instead studies the completeness of passive sources: how well they can observe microscopic phenomena such as address usage within a given network. We define sparsity as the limitation of a given monitor to see a target, and we quantify the effects of interest, temporal, and coverage sparsity. To study sparsity, we introduce inverted analysis, a novel approach that uses complete passive observations of a few end networks (three campus networks in our case) to infer what of these networks would be seen by millions of virtual monitors near their traffic’s destinations. Unsurprisingly, we find that monitors near popular content see many more targets and that visibility is strongly influenced by bipartite traffic between clients and servers. We are the first to quantify these effects and show their implications for the study of Internet liveness from passive observations. We find that visibility is heavy-tailed, with only 0.5% monitors seeing more than 10\% of our targets’ addresses, and is most affected by interest sparsity over temporal and coverage sparsity. Visibility is also strongly bipartite. Monitors of a different class than a target (e.g., a server monitor observing a client target) outperform monitors of the same class as a target in 82-99% of cases in our datasets. Finally, we find that adding active probing to passive observations greatly improves visibility of both server and client target addresses, but is not critical for visibility of target blocks. Our findings are valuable to understand limitations of existing measurement studies, and to develop methods to maximize microscopic completeness in future studies.