MEGA: Modern Graph Analysis for Dynamic Networks

Project Summary

The goal of the MEGA research project is to develop new models and algorithms to examine dynamic, multi-modal, large-scale social and computer networks.

MEGA is a joint research effort of Stanford University and USC's Information Sciences Institute and Computer Science Department, and is part of the ANT (Analysis of Network Traffic) research group. It is supported by the U.S. Air Force Office of Scientific Research (AFOSR) and the Defense Advanced Research Projects Agency (DARPA) under grant #FA9550-12-1-0411.

ISI's activities on MEGA ran from July 2012 to January 2014.

People

Publications

For related publications, please see the ANT publications web page.

Software

See also ANT software.

Datasets

Dataset Format

Each file is bz2-compressed plain text, with one tab-separated (key, value) pair per line. Keys and values follow several schemas. Fields within a key are generally delimited by dashes (-), and fields within a value are delimited by colons (:). The basic schema is:

{SHA1}              source-arc:begin-offset:length:url
HashFile-{SHA1}     source-arc:url
DocVector-{URL}     {SHA1}:{SHA1}:...:{SHA1}:
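
The schema above can be read with a small parser. The following is a minimal sketch in Python, not part of the MEGA software: the names `Chunk`, `HashFile`, `DocVector`, and `parse_record` are hypothetical, and the code assumes that the source-arc path contains no colon and that the URL is always the last field of a value. Because URLs contain colons and may contain dashes, each split is limited to the number of fields the schema names. No hash length is assumed, since the hashes in the example entry are 64 hexadecimal characters, longer than a 40-character SHA-1 digest.

```python
from typing import List, NamedTuple, Union


class Chunk(NamedTuple):
    """A '{SHA1}' record: one byte range of a document."""
    sha: str
    source_arc: str
    offset: int
    length: int
    url: str


class HashFile(NamedTuple):
    """A 'HashFile-{SHA1}' record: a hash for a whole file."""
    sha: str
    source_arc: str
    url: str


class DocVector(NamedTuple):
    """A 'DocVector-{URL}' record: the ordered chunk hashes of a document."""
    url: str
    hashes: List[str]


def parse_record(line: str) -> Union[Chunk, HashFile, DocVector]:
    """Parse one tab-separated (key, value) line into a typed record."""
    key, value = line.rstrip("\n").split("\t", 1)
    if key.startswith("HashFile-"):
        # Split once: everything after the first colon is the URL.
        source_arc, url = value.split(":", 1)
        return HashFile(key[len("HashFile-"):], source_arc, url)
    if key.startswith("DocVector-"):
        # The value ends with a trailing colon, so drop empty fields.
        hashes = [h for h in value.split(":") if h]
        return DocVector(key[len("DocVector-"):], hashes)
    # Otherwise the key is a bare hash and the value has four fields.
    source_arc, offset, length, url = value.split(":", 3)
    return Chunk(key, source_arc, int(offset), int(length), url)
```

Stripping the key prefix by length, rather than splitting the key on every dash, keeps URLs that contain dashes intact.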

An example entry (URL sanitized):

HashFile-b1a4d1dd9df7d3f2d66c03e5b4bdc679234dfa964189911840100631aec6f3fe       common-crawl/crawl-002/2010/09/24/20/1285393861218_20.arc.gz:http://example.com/example.html
f467127944e1709e35fb11afaf1f44cc97b33870d04dd678b4a6d8d4524f3d36                common-crawl/crawl-002/2010/09/24/20/1285393861218_20.arc.gz:0:10236:http://example.com/example.html
f576fa846498366aecebeda872f7144d095df69f92adc1ad23e7efdb4624f6a9                common-crawl/crawl-002/2010/09/24/20/1285393861218_20.arc.gz:10236:63:http://example.com/example.html
35f55e4abf51d4cc825d8f2014b9eeb2e023b1dc7a1de3a7e0a4d1b0af3be72a                common-crawl/crawl-002/2010/09/24/20/1285393861218_20.arc.gz:10299:172:http://example.com/example.html
6dba2c195989a0c5049b7b3b2527b41c22370ba6a73e384a62d196976cc7db34                common-crawl/crawl-002/2010/09/24/20/1285393861218_20.arc.gz:10471:100:http://example.com/example.html
5cdf26e417454027a864774fa1195eb92550c98ff10e9b8eb25343e22153e30c                common-crawl/crawl-002/2010/09/24/20/1285393861218_20.arc.gz:10571:123:http://example.com/example.html
d1de190ed24e8587c34c416dad5bb656df2d1a3dd6a507f89eb0198ec9780978                common-crawl/crawl-002/2010/09/24/20/1285393861218_20.arc.gz:10694:366:http://example.com/example.html
5820c825c4e16108b77f7ea91f3416e1c7718dc64811688ecb4f426af7a4dbce                common-crawl/crawl-002/2010/09/24/20/1285393861218_20.arc.gz:11060:131:http://example.com/example.html
21f3915561bfa6b11b832f22cd946ac2ca46e88ece75136aa29fed3edd2a7ef3                common-crawl/crawl-002/2010/09/24/20/1285393861218_20.arc.gz:11191:222:http://example.com/example.html
e13bccabffc2f3b9cc9f0faf50735f70de84adc93b0122df37e5890848be7619                common-crawl/crawl-002/2010/09/24/20/1285393861218_20.arc.gz:11413:725:http://example.com/example.html
0d1b59bee1c80d57cfb96e8ed0aa041534d1115eb6f90021a941ada06329dcd2                common-crawl/crawl-002/2010/09/24/20/1285393861218_20.arc.gz:12138:102:http://example.com/example.html
300508ba33e4cff10bb9f89381bc28392a710ca74898a766d9554b4928f129e1                common-crawl/crawl-002/2010/09/24/20/1285393861218_20.arc.gz:12240:94:http://example.com/example.html
d209e54ecec7e9ec39cd9682e421d8c1bcbcd14996de9dd61b6f78fe4028d603                common-crawl/crawl-002/2010/09/24/20/1285393861218_20.arc.gz:12334:558:http://example.com/example.html
DocVector-http://example.com/example.html                                       f467127944e1709e35fb11afaf1f44cc97b33870d04dd678b4a6d8d4524f3d36:f576fa846498366aecebeda872f7144d095df69f92adc1ad23e7efdb4624f6a9:35f55e4abf51d4cc825d8f2014b9eeb2e023b1dc7a1de3a7e0a4d1b0af3be72a:6dba2c195989a0c5049b7b3b2527b41c22370ba6a73e384a62d196976cc7db34:5cdf26e417454027a864774fa1195eb92550c98ff10e9b8eb25343e22153e30c:d1de190ed24e8587c34c416dad5bb656df2d1a3dd6a507f89eb0198ec9780978:5820c825c4e16108b77f7ea91f3416e1c7718dc64811688ecb4f426af7a4dbce:21f3915561bfa6b11b832f22cd946ac2ca46e88ece75136aa29fed3edd2a7ef3:e13bccabffc2f3b9cc9f0faf50735f70de84adc93b0122df37e5890848be7619:0d1b59bee1c80d57cfb96e8ed0aa041534d1115eb6f90021a941ada06329dcd2:300508ba33e4cff10bb9f89381bc28392a710ca74898a766d9554b4928f129e1:d209e54ecec7e9ec39cd9682e421d8c1bcbcd14996de9dd61b6f78fe4028d603:
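
In the example entry, each chunk begins where the previous one ends (0 + 10236 = 10236, 10236 + 63 = 10299, and so on), and the DocVector lists the chunk hashes in the same order. The sketch below shows one way to stream a compressed file and check that property. It is an illustration only: `iter_pairs` and `chunks_are_contiguous` are hypothetical helper names, UTF-8 text is an assumption about the file encoding, and whether contiguity holds across the whole dataset is not stated in the format description.

```python
import bz2


def iter_pairs(path):
    """Yield (key, value) pairs from one .bz2 dataset file."""
    # The encoding is an assumption; undecodable bytes are replaced.
    with bz2.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        for line in fh:
            line = line.rstrip("\n")
            if not line:
                continue
            key, _, value = line.partition("\t")
            yield key, value


def chunks_are_contiguous(values):
    """Check that chunk values, in file order, cover adjacent byte ranges.

    Each value has the form 'source-arc:begin-offset:length:url'.
    """
    expected = None
    for value in values:
        _arc, offset, length, _url = value.split(":", 3)
        offset, length = int(offset), int(length)
        if expected is not None and offset != expected:
            return False
        expected = offset + length
    return True
```

To apply the check to a single document, pass only the chunk values whose URL matches that document, skipping the HashFile- and DocVector- keys.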

Related Links

ANT: the Analysis of Network Traffic research group