John Heidemann gave the talk “A First Look at Measuring the Internet during Novel Coronavirus to Evaluate Quarantine (MINCEQ)” at Digital Technologies for COVID-19 Webinar Series, hosted by Craig Knoblock and Bhaskar Krishnamachari of USC Viterbi School of Engineering on May 29, 2020. Internet Outages: Reliablity and Security” at the University of Oregon Cybersecurity Day in Eugene, Oregon on April 23, 2018. A video of the talk is on YoutTube at https://www.youtube.com/watch?v=tduZ1Y_FX0s. Slides are available at https://www.isi.edu/~johnh/PAPERS/Heidemann20a.pdf.
From the abstract:
Measuring the Internet during Novel Coronavirus to Evaluate Quarantine (RAPID-MINCEQ) is a project to measure changes in Internet use during the COVID-19 outbreak of 2020.
Today social distancing and work-from-home/study-from-home are the best tools we have to limit COVID’s spread. But implementation of these policies varies in the US and around the global, and we would like to evaluate participation in these policies. This project plans to develop two complementary methods of assessing Internet use by measuring address activity and how it changes relative to historical trends. Changes in the Internet can reflect work-from-home behavior. Although we cannot see all IP addresses (many are hidden behind firewalls or home routers), early work shows changes at USC and ISI.
This project is support by an NSF RAPID grant for COVID-19 and just began in May 2020, so this talk will discuss directions we plan to explore.
This project is joint work of Guillermo Baltra, Asma Enayet, John Heidemann, Yuri Pradkin, and Xiao Song and is supported by NSF/CISE as award NSF-2028279.
The Internet is central to our lives, but we know astoundingly little about it. Even though many businesses and individuals depend on it, how reliable is the Internet? Do policies and practices make it better in some places than others?
Since 2006, we have been studying the public face of the Internet to answer these questions. We take regular censuses, probing the entire IPv4 Internet address space. For more than two years we have been observing Internet reliability through active probing with Trinocular outage detection, revealing the effects of the Internet due to natural disasters like Hurricanes from Sandy to Harvey and Maria, configuration errors that sometimes affect millions of customers, and political events where governments have intervened in Internet operation. This talk will describe how it is possible to observe Internet outages today and what they are beginning to say about the Internet and about the physical world.
This talk builds on research over the last decade in IPv4 censuses and outage detection and includes the work of many of my collaborators.
Data from this talk is all available; see links on the last slide.
New network measurements are great–you can learn about the whole world! But new network measurements are horrible–are you sure you learn about the world, and not about bugs in your code or approach? New scientific approaches must be tested and ultimately calibrated against ground truth. Yet ground truth about the Internet can be quite difficult—often network operators themselves do not know all the details of their network. This talk will explore the role of ground truth in network measurement: getting it when you can, alternatives when it’s imperfect, and what we learn when none is available.
This talk builds on research over the last decade with many people, and the slides include some discussion from the TMA PhD school audience.
The paper “Do You See Me Now? Sparsity in Passive Observations of Address Liveness” will appear in the 2017 Conference on Network Traffic Measurement and Analyais (TMA) July 21-23, 2017 in Dublin, Ireland. The datasets from the paper that we can make public will be at https://ant.isi.edu/datasets/sparsity/.
From the abstract of the paper:
Accurate information about address and block usage in the Internet has many applications in planning address allocation, topology studies, and simulations. Prior studies used active probing, sometimes augmented with passive observation, to study macroscopic phenomena, such as the overall usage of the IPv4 address space. This paper instead studies the completeness of passive sources: how well they can observe microscopic phenomena such as address usage within a given network. We define sparsity as the limitation of a given monitor to see a target, and we quantify the effects of interest, temporal, and coverage sparsity. To study sparsity, we introduce inverted analysis, a novel approach that uses complete passive observations of a few end networks (three campus networks in our case) to infer what of these networks would be seen by millions of virtual monitors near their traffic’s destinations. Unsurprisingly, we find that monitors near popular content see many more targets and that visibility is strongly influenced by bipartite traffic between clients and servers. We are the first to quantify these effects and show their implications for the study of Internet liveness from passive observations. We find that visibility is heavy-tailed, with only 0.5% monitors seeing more than 10\% of our targets’ addresses, and is most affected by interest sparsity over temporal and coverage sparsity. Visibility is also strongly bipartite. Monitors of a different class than a target (e.g., a server monitor observing a client target) outperform monitors of the same class as a target in 82-99% of cases in our datasets. Finally, we find that adding active probing to passive observations greatly improves visibility of both server and client target addresses, but is not critical for visibility of target blocks. Our findings are valuable to understand limitations of existing measurement studies, and to develop methods to maximize microscopic completeness in future studies.
Full allocation of IPv4 addresses has prompted interest in measuring address liveness, first with active probing, and recently with the addition of passive observation. While prior work has shown dramatic increases in coverage, this paper explores what factors affect contributions of passive observers to visibility. While all passive monitors are sparse, seeing only a part of the Internet, we seek to understand how different types of sparsity impact observation quality: the interests of external hosts and the hosts within the observed network, the temporal limitations on the observation duration, and coverage challenges to observe all traffic for a given target or a given vantage point. We study sparsity with inverted analysis, a new approach where we use passive monitors at four sites to infer what monitors would see at all sites exchanging traffic with those four. We show that visibility provided by monitors is heavy-tailed—interest sparsity means popular monitors see a great deal, while 99% see very little. We find that traffic is bipartite, with visibility much stronger between client-networks and server-networks than within each group. Finally, we find that popular monitors are robust to temporal and coverage sparsity, but they greatly reduce power of monitors that start with low visibility.
This technical report is joint work of Jelena Mirkovic, Genevieve Bartlett, John Heidemann, Hao Shi, and Xiyue Deng, all of USC/ISI.
This animation was first shown at the Dec. 2014 DHS Cyber Security Division R&D Showcase and Technical Workshop as part of the talk “Towards Understanding Internet Reliability” given by John Heidemann. This work was supported by DHS, most recently through the LACREND project.
Our research studies the Internet’s public face. Since 2006 we have been taking censuses of the Internet address space (pinging all IPv4 addresses) every 3 months. Since 2012 we have studied network outages and events like Hurricane Sandy, using probes of much of the Internet every 11 minutes. Most recently we have evaluated the diurnal Internet, finding countries where most people turn off their computers at night. Finally, we have looked at network reputation, identifying how spam generation correlates with network location, and others have studies multiple measurements of “network reputation”.
A common theme across this work is one must estimate characteristics of the edge of the Internet in spite of noisy measurements and a underlying changes. One also need to compare and correlate these imperfect measurements with other factors (from GDP to telecommunications policies).
How do these applications relate to the mathematics of census taking and measurement, estimation, and correlation? Are there tools we should be using that we aren’t? Do the properties of the Internet suggest new approaches (for example where rapid full enumeration is possible)? Does correlation and estimates of network “badness” help us improve cybersecurity by treating different parts of the network differently?