Software to handle indexing and selection of multiple network data types based on a given time range.
Pre-built versions for RPM (Fedora, CentOS, RHEL): see https://copr.fedorainfracloud.org/coprs/johnh/timefind/
Bug fixes.
Recursive directory tree support.
Modifications to build with Go-1.4.
Add timefind_lander_indexer.
Make install and a .spec to build a Fedora package.
Fixed README.
Initial public release.
Initial test release.
The latest code can also be checked out via git:
A group of folks at Los Alamos National Laboratory and at USC/ISI have developed two tools to handle indexing and selection of multiple network data types: timefind and indexer.
Most of us have processed or will be processing large amounts of timestamped data (.pcap, logs, and so on). For example, if we had .pcap spanning 2010-2015, we’d probably want to downselect on a time range, e.g., 2015-Jan-01 to 2015-Feb-01.
One common way to downselect today is to build regexes and walk the directory tree. This probably works fine with only one consistently formatted data source (good luck to the next person who has to decode and inevitably rebuild the regex).
indexer will walk through all your data and index the timestamps of the earliest and latest records.
timefind will then use the indexes and retrieve the filenames that overlap with the given time range input. For example, if I want to downselect 2015-Jan-01 to 2015-Feb-01 on DNS .pcap data:
timefind --begin="2015-01-01" --end="2015-02-01" dns
It’s that simple and consistent.
Please send email to calvin@isi.edu with questions, bugs, feature requests, patches, and any notes on your usage!
Requires Go v1.5+.
Download and extract the tarball, and run make. Binaries and corresponding README.* files will be built in bin/.
indexer reads in a configuration file describing a source and outputs an index in CSV format containing a list of filenames, timestamp of the earliest record, and timestamp of the latest record.
Using timefind in conjunction with these indexes, a user can downselect the number of files based on a time range.
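To make the index description concrete, here is a minimal Go sketch of reading such an index. It is not taken from the indexer source: the column order (filename, earliest, latest), the RFC 3339 timestamp encoding, the sample filenames, and the names IndexEntry and parseIndex are all assumptions made for illustration. Check a real index file for the actual layout.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"os"
	"strings"
	"time"
)

// IndexEntry is a hypothetical in-memory form of one index row:
// a filename plus the timestamps of its earliest and latest records.
type IndexEntry struct {
	Filename string
	Earliest time.Time
	Latest   time.Time
}

// parseIndex reads rows of the ASSUMED form "filename,earliest,latest",
// with both timestamps in RFC 3339. The real index may differ.
func parseIndex(data string) ([]IndexEntry, error) {
	r := csv.NewReader(strings.NewReader(data))
	r.FieldsPerRecord = 3 // reject rows with a different column count
	rows, err := r.ReadAll()
	if err != nil {
		return nil, err
	}
	entries := make([]IndexEntry, 0, len(rows))
	for _, row := range rows {
		earliest, err := time.Parse(time.RFC3339, row[1])
		if err != nil {
			return nil, fmt.Errorf("bad earliest timestamp %q: %v", row[1], err)
		}
		latest, err := time.Parse(time.RFC3339, row[2])
		if err != nil {
			return nil, fmt.Errorf("bad latest timestamp %q: %v", row[2], err)
		}
		entries = append(entries, IndexEntry{
			Filename: row[0],
			Earliest: earliest,
			Latest:   latest,
		})
	}
	return entries, nil
}

func main() {
	// Made-up sample rows, not real indexer output.
	sample := "dns/2015-01-01.pcap,2015-01-01T00:00:00Z,2015-01-01T23:59:59Z\n" +
		"dns/2015-01-02.pcap,2015-01-02T00:00:00Z,2015-01-02T23:59:59Z\n"
	entries, err := parseIndex(sample)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for _, e := range entries {
		fmt.Println(e.Filename, e.Earliest.Unix(), e.Latest.Unix())
	}
}
```

Storing only the earliest and latest timestamp per file keeps the index small, and it is all that a time-range query needs.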
Given a large data store, a user may only need a subset of data for processing. For example, a user may only want to process a month’s worth of data (e.g., January 2015) instead of the entire collection.
Given a time range, timefind retrieves the filenames from an index generated by indexer that overlap with the time range.
For example, to retrieve all DNS data from January 2015, we might run timefind as follows:
timefind --begin="2015-01-01" --end="2015-02-01" dns
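The overlap test behind a query like this can be sketched in a few lines of Go. This illustrates the idea only, not timefind's actual code: the names fileSpan, overlaps, selectFiles, and day are hypothetical, the sample filenames are made up, and treating both ranges as closed intervals is an assumption.

```go
package main

import (
	"fmt"
	"time"
)

// fileSpan is a hypothetical index record: a filename and the
// timestamps of the earliest and latest records it contains.
type fileSpan struct {
	Name             string
	Earliest, Latest time.Time
}

// overlaps reports whether [s.Earliest, s.Latest] intersects [begin, end].
// Both ranges are treated as closed intervals (an assumption).
func overlaps(s fileSpan, begin, end time.Time) bool {
	return !s.Earliest.After(end) && !s.Latest.Before(begin)
}

// selectFiles returns the names of all spans that overlap [begin, end].
func selectFiles(index []fileSpan, begin, end time.Time) []string {
	var names []string
	for _, s := range index {
		if overlaps(s, begin, end) {
			names = append(names, s.Name)
		}
	}
	return names
}

// day parses a YYYY-MM-DD date as midnight UTC and panics on bad input;
// it is a helper for this example only.
func day(s string) time.Time {
	t, err := time.Parse("2006-01-02", s)
	if err != nil {
		panic(err)
	}
	return t
}

func main() {
	index := []fileSpan{
		{"dns-2014-12.pcap", day("2014-12-01"), day("2014-12-31")},
		{"dns-2015-01.pcap", day("2015-01-01"), day("2015-01-31")},
		{"dns-2015-02.pcap", day("2015-02-02"), day("2015-02-28")},
	}
	// Only the January file overlaps the requested range.
	for _, name := range selectFiles(index, day("2015-01-01"), day("2015-02-01")) {
		fmt.Println(name) // prints "dns-2015-01.pcap"
	}
}
```

A file is selected when it starts no later than the end of the range and finishes no earlier than its beginning, so files that only partly fall inside the range are still returned.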
Copyright (C) 2015. Los Alamos National Security, LLC.
This software has been authored by an employee or employees of Los Alamos National Security, LLC, operator of the Los Alamos National Laboratory (LANL) under Contract No. DE-AC52-06NA25396 with the U.S. Department of Energy. The U.S. Government has rights to use, reproduce, and distribute this software. The public may copy, distribute, prepare derivative works and publicly display this software without charge, provided that this Notice and any statement of authorship are reproduced on all copies. Neither the Government nor LANS makes any warranty, express or implied, or assumes any liability or responsibility for the use of this software. If software is modified to produce derivative works, such modified software should be clearly marked, so as not to confuse it with the version available from LANL.
Additionally, this program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.