This repository contains the code and pointers to datasets used in the paper "Precise Detection of Content Reuse in the Web" by Calvin Ardi and John Heidemann.
A public git repository is at https://github.com/cardi/content-reuse-detection