Categories
Publications Technical Report

new technical report “Peek Inside the Closed World: Evaluating Autoencoder-Based Detection of DDoS to Cloud ”

We have released a new technical report “Peek Inside the Closed World: Evaluating Autoencoder-Based Detection of DDoS to Cloud” as an ArXiv technical report 1912.05590, available at https://www.isi.edu/~hangguo/papers/Guo19a.pdf

We study 4 cloud IPs (SR1VP1 to 3 and SR2VP1) that are under attack. SR1VP3 sees a large number of mostly short DDoS events (71% of its 49 events being 1 second or less). SR1VP1 and SR1VP2 see smaller numbers of longer DDoS events (median duration for their 20 and 27 events are 121 and 140 seconds). SR2VP1 sees DDoS events of broad range of durations (from 1 second to more than 14 hours).

From the abstract of our technical report:

From the abstract:

Machine-learning-based anomaly detection (ML-based AD) has been successful at detecting DDoS events in the lab. However published evaluations of ML-based AD have only had limited data and have not provided insight into why it works. To address limited evaluation against real-world data, we apply autoencoder, an existing ML-AD model, to 57 DDoS attack events captured at 5 cloud IPs from a major cloud provider. To improve our understanding for why ML-based AD works or not works, we interpret this data with feature attribution and counterfactual explanation. We show that our version of autoencoders work well overall: our models capture nearly all malicious flows to 2 of the 4 cloud IPs under attacks (at least 99.99%) but generate a few false negatives (5% and 9%) for the remaining 2 IPs. We show that our models maintain near-zero false positives on benign flows to all 5 IPs. Our interpretation of results shows that our models identify almost all malicious flows with non-whitelisted (non-WL) destination ports (99.92%) by learning the full list of benign destination ports from training data (the normality). Interpretation shows that although our models learn incomplete normality for protocols and source ports, they still identify most malicious flows with non-WL protocols and blacklisted (BL) source ports (100.0% and 97.5%) but risk false positives. Interpretation also shows that our models only detect a few malicious flows with BL packet sizes (8.5%) by incorrectly inferring these BL sizes as normal based on incomplete normality learned. We find our models still detect a quarter of flows (24.7%) with abnormal payload contents even when they do not see payload by combining anomalies from multiple flow features. Lastly, we summarize the implications of what we learn on applying autoencoder-based AD in production.problme?Machine-learning-based anomaly detection (ML-based AD) has been successful at detecting DDoS events in the lab. However published evaluations of ML-based AD have only had limited data and have not provided insight into why it works. To address limited evaluation against real-world data, we apply autoencoder, an existing ML-AD model, to 57 DDoS attack events captured at 5 cloud IPs from a major cloud provider. To improve our understanding for why ML-based AD works or not works, we interpret this data with feature attribution and counterfactual explanation. We show that our version of autoencoders work well overall: our models capture nearly all malicious flows to 2 of the 4 cloud IPs under attacks (at least 99.99%) but generate a few false negatives (5% and 9%) for the remaining 2 IPs. We show that our models maintain near-zero false positives on benign flows to all 5 IPs. Our interpretation of results shows that our models identify almost all malicious flows with non-whitelisted (non-WL) destination ports (99.92%) by learning the full list of benign destination ports from training data (the normality). Interpretation shows that although our models learn incomplete normality for protocols and source ports, they still identify most malicious flows with non-WL protocols and blacklisted (BL) source ports (100.0% and 97.5%) but risk false positives. Interpretation also shows that our models only detect a few malicious flows with BL packet sizes (8.5%) by incorrectly inferring these BL sizes as normal based on incomplete normality learned. We find our models still detect a quarter of flows (24.7%) with abnormal payload contents even when they do not see payload by combining anomalies from multiple flow features. Lastly, we summarize the implications of what we learn on applying autoencoder-based AD in production.

This technical report is joint work of Hang Guo and John Heidemann from USC/ISI and Xun Fan, Anh Cao and Geoff Outhred from Microsoft

Categories
Announcements DNS Internet

Early longitudinal results in measuring the usage of Mozilla’s DNS Canary

Mozilla announced the creation of a “use-application-dns.net” “Canary Domain” that could be configured within ISPs to disable Firefox’s default use of DNS over HTTPS. On 2019/09/21 Wes Hardaker created a RIPE Atlas measurement to study resolvers within ISPs that had been configured to return an NXDOMAIN response. This measurement is configured to have 1000 Atlas probes query for the use-application-dns.net name once a day.

The full description of methodology is on Wes’ ISI site, which should receive regular updates to the graph.

canary

Categories
Social

group lunch in honor of a departure and two arrivals

On November 14 we had a group lunch near ISI to celebrate the completion of Joao Ceron’s visit from the University of Twente as a visiting scholar, to welcome Asma Enayet to the group as a new PhD student, and to welcome Hang Guo’s son into the world. (Hang was understandably not able to make the lunch.) Happy Thanksgiving to all!

A group lunch in honor of Asma (left) and Joao (6th left).
Categories
Papers Publications

new paper “Identifying Important Internet Outages” at the Sixth National Symposium for NSF REU Research in Data Science, Systems, and Security

We will publish a new paper “Identifying Important Internet Outages” by Ryan Bogutz, Yuri Pradkin, and John Heidemann, in the Sixth National Symposium for NSF REU Research in Data Science, Systems, and Security in Los Angeles, California, USA, on December 12, 2019.

From the abstract:

[Bogutz19a, figure 1]: Our sideboard showing important outages on 2019-03-08, including this outage in Venezuela.

Today, outage detection systems can track outages across the whole IPv4 Internet—millions of networks. However, it becomes difficult to find meaningful, interesting events in this huge dataset, since three months of data can easily include 660M observations and thousands of outage events. We propose an outage reporting system that sifts through this data to find the most interesting events. We explore multiple metrics to evaluate interesting”, reflecting the size and severity of outages. We show that defining interest as the product of size by severity works well, avoiding degenerate cases like complete outages affecting a few people, and apparently large outages that affect only a small fraction of people in an area. We have integrated outage reporting into our existing public website (https://outage.ant.isi.edu) with the goal of making near-real-time outage information accessible to the general public. Such data can help answer questions like “what are the most significant outages today?”, did Florida have major problems in an ongoing hurricane?”, and
“are there power outages in Venezuela?”.

The data from this paper is available publicly and in our website. The technical report ISI-TR-735 includes some additional data.

Categories
Publications Technical Report

new technical report “Improving the Optics of Active Outage Detection (extended)”

We have released a new technical report “Improving the Optics of the Active Outage Detection (extended)”, by Guillermo Baltra and John Heidemann, as ISI-TR-733.

From the abstract:

A sample block showing changes in block usage (c), and outage detection results of Trinocular (b) and improved with the Full Block Scanning Algorithm (a).

There is a growing interest in carefully observing the reliability of the Internet’s edge. Outage information can inform our understanding of Internet reliability and planning, and it can help guide operations. Outage detection algorithms using active probing from third parties have been shown to be accurate for most of the Internet, but inaccurate for blocks that are sparsely occupied. Our contributions include a definition of outages, which we use to determine how many independent observers are required to determine global outages. We propose a new Full Block Scanning (FBS) algorithm that gathers more information for sparse blocks to reduce false outage reports. We also propose ISP Availability Sensing (IAS) to detect maintenance activity using only external information. We study a year of outage data and show that FBS has a True Positive Rate of 86%, and show that IAS detects maintenance events in a large U.S. ISP.

All data from this paper will be publicly available.

Categories
Presentations

Talks at DNS-OARC 61

Wes Hardaker gave two presentations at DNS-OARC on November 1st, 2019. The first was a presentation about the previously announced “Cache me if you can” paper, which is on youtube, and the slides are available as well. The second talk presented Haoyu Jiang’s work during the summer of 2018 on analyzing DNS B-Root traffic during the 2018 DITL data for levels of traffic sent by the Chrome web browser, levels of traffic associated with different languages, and levels of traffic sent by different label lengths. It is available on youtube with the slides here.

Categories
Papers Publications

new conference paper “Cache Me If You Can: Effects of DNS Time-to-Live” at ACM IMC 2019

We will publish a new paper “Cache Me If You Can: Effects of DNS Time-to-Live” by Giovane C. M. Moura, John Heidemann, Ricardo de O. Schmidt, and Wes Hardaker, in the ACM Internet Measurements Conference (IMC 2019) in Amsterdam, the Netherlands.

From the abstract:

Figure 10a from [Moura19b], showing the distribution of latency with small TTLs before (right in blue) and with larger TTLs after (left in red) the .uy domain reviewed our work and lengthened their domain’s cache lifetimes to reduce latency to their customers.

DNS depends on extensive caching for good performance, and every DNS zone owner must set Time-to-Live (TTL) values to control their DNS caching. Today there is relatively little guidance backed by research about how to set TTLs, and operators must balance conflicting demands of caching against agility of configuration. Exactly how TTL value choices affect operational networks is quite challenging to understand due to interactions across the distributed DNS service, where resolvers receive TTLs in different ways (answers and hints), TTLs are specified in multiple places (zones and their parent’s glue), and while DNS resolution must be security-aware. This paper provides the first careful evaluation of how these multiple, interacting factors affect the effective cache lifetimes of DNS records, and provides recommendations for how to configure DNS TTLs based on our findings. We provide recommendations in TTL choice for different situations, and for where they must be configured. We show that longer TTLs have significant promise in reducing latency, reducing it from 183ms to 28.7ms for one country-code TLD.

We have also reported on this work at the RIPE and APNIC blogs.

Categories
Students

congratulations to Ryan Bogutz for his summer undergraduate internship

Ryan Bogutz completed his summer undergraduate research internship at ISI this summer, working with John Heidemann and Yuri Pradkin on his project “Identifying Interesting Outages”.

Ryan Bogutz with his poster at the ISI summer undergraduate research poster session.

In this project, Ryan examined Internet Outage data from Trinocular, developing an outage report that summarized the most “interesting” outages each day. Yuri integrated this report into our outage website where is available as a left side panel.

We hope Ryan’s new report makes it easier to evaluate Internet outages on a given day, and we look forward to continue to work with Ryan on this topic.

Ryan visited USC/ISI in summer 2019 as part of the (ISI Research Experiences for Undergraduates. We thank Jelena Mirkovic (PI) for coordinating the second year of this great program, and NSF for support through award #1659886.

See also ISI’s post about this summer undergradate program.

Categories
Announcements

reblogging: the diurnal Internet and DNS backscatter

We are happy to share that two of our older topics have appeared more recently in other venues.

Our animations of the diurnal Internet, originally seen in our 2014 ACM IMC paper and our blog posts, was noticed by Gerald Smith who used it to start a discussion with seventh-grade classes in Mahe, India and (I think) Indiana, USA as part of his Fullbright work. It’s great to see research work that useful to middle-schoolers!

Kensuke Fukuda recently posted about our work on identifying IPv6 scanning with DNS backscatter at the APNIC blog. This work was originally published at the 2018 ACM IMC and posted in our blog. It’s great to see that work get out to a new audience.

Categories
Publications Technical Report

new technical report “Plumb: Efficient Processing of Multi-User Pipelines (Poster)”

We released a new technical report “Plumb: Efficient Processing of Multi-User Pipelines (Poster)”, by Abdul Qadeer and John Heidemann, as ISI-TR-731.  This work was originally presented at ACM Symposium on Cloud Computing (the poster abstract is available at ACM). The poster abstract with a small version of the poster is available at https://www.isi.edu/publications/trpublic/pdfs/isi-tr-731.pdf

aqadeer at SoCC 2018 Carlsbad CA

From the abstract:

As the field of big data analytics matures, workflows are increasingly complex and often include components that are shared by different users. Individual workflows often include multiple stages, and when groups build on each other’s work it is easy to lose track of computation that may be shared across different groups.

The contribution of this poster is to provide an organization-wide processing substrate Plumb that can be used to solve commonly occurring problems and to achieve a common goal. Plumb makes multi-user sharing a first-class concern by providing pipeline-graph abstraction. This abstraction is simple and based on fundamental model of input-processing-output but is powerful to capture processing and data duplication. Plumb then employs best available solutions to tackle problems of large-block processing under structural and computational skew without user intervention.

We expect to release the Plumb software this fall; please contact us if you have questions or interest in using it.