NAME

antlink - support funky symlinks to manage a tree of git (or other) repositories

SYNOPSIS

antlink SUBCOMMAND AN_ANT_SYMLINK

DESCRIPTION

Antlink's goal is to make groups of git (or other) repositories discoverable and clonable without requiring everyone to check out everything.

Antlink manages the meta-repository, which is a tree of repositories. A meta-repository has links, called antlinks, to other regular repositories. Each of these repositories may or may not be cloned locally. When cloned, the local copy can be edited. When not cloned, it takes no space and is easy to clone when desired. (If a clone is no longer needed, it can be removed, then restored later.)

Repositories stored on the same server can be grouped into a site, simplifying their discovery. There can be multiple sites, like github and gitlab, or the ANT project set and the other project set. All repositories in a site share the same access method (file or ssh), version control system (git or svn), hostname, and perhaps a common path on that host.

In practice, an antlink is just a specially formatted symlink: each sub-repository appears in the meta-repository as an antlink, a funny symlink that resolves once that repository is checked out.

Typically one interacts with a repository using standard git commands. Antlink commands are needed mainly to create, clone, remove, or rename repositories. There are also antlink status commands that, when invoked in the meta-repository, run over all currently cloned repositories.

In addition to git, leaf repositories can be stored in subversion.

WORKFLOW: REGULAR USE

(If you are starting for the first time, see "WORKFLOW: STARTING A NEW ANTLINK METAREPOSITORY" below.)

A typical workflow is to go to the meta-repository and pull any updates into every cloned repository:

cd META
antlink pull .

To work on a repository that is not yet local, clone it:

cd META
antlink clone subrepo
cd subrepo
# edit away

To look for things that are not checked in:

cd META
antlink status .

To start a new sub-repository inside the existing meta-repository:

cd META
antlink init newsubrepo

One can organize things in the default meta-repository:

cd META
mkdir PAPERS
cd PAPERS
antlink init workshop_paper_0
antlink init conference_paper_1
antlink init journal_paper_2
antlink init tenure_acceptance_3

One can make a copy of an existing repository to be managed under antlink, or link in a pointer to the other repository, as described in the next section.

For copying, make a new antlink repo and use standard git to pull the old history:

cd META
antlink init copy_of_paper
cd copy_of_paper
git remote add upstream https://location/of/otherrepo.git
git pull upstream main

(Replace main with your upstream's preferred branch if it differs.)

To abandon the upstream afterward:

git remote remove upstream

Then save the result to the antlink copy:

git push

WORKFLOW: LINKING IN OTHER REPOSITORIES

Antlink can tie together repositories on multiple sites. Each remote site is listed in _antlink.yaml; the graft command links a repository from such a site into the meta-repository and clones it for you.

cd META
mkdir EXTERNAL
antlink graft https://github.com/jekyll/jekyll.git EXTERNAL/jekyll_read_only
antlink graft git@github.com:jekyll/jekyll.git EXTERNAL/jekyll_rw


antlink graft --vc svn https://github.com/jekyll/jekyll EXTERNAL/jekyll_via_svn

One can also omit the destination to get a default:

antlink graft https://github.com/jekyll/jekyll.git

(The clone will appear as "jekyll" in the current directory.)
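
The default name appears to follow the rule git clone uses for its directory: the last component of the URL with any ".git" suffix dropped. The shell function below is a hypothetical sketch of that rule, written under that assumption; default_graft_dir is an invented name, not an antlink command, and antlink's own code may differ.

```sh
#!/bin/sh
# default_graft_dir: HYPOTHETICAL helper, not part of antlink.
# Assumes the default destination is derived the way "git clone" does:
# the last component of the URL, minus a trailing ".git".
default_graft_dir() {
    url=${1%/}          # drop a single trailing slash
    name=${url##*/}     # keep the text after the last "/"
    name=${name##*:}    # handle scp-style "host:repo.git" with no "/"
    name=${name%.git}   # drop a ".git" suffix, if any
    printf '%s\n' "$name"
}
```

For example, default_graft_dir git@github.com:jekyll/jekyll.git also prints jekyll.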

WORKFLOW: LINKING OVERLEAF

Antlink integrates with overleaf, treating it as an external git repository. First, create a project in overleaf (or have your friend invite you to their project, and join it on the website.)

Then add an entry like this to your _antlink.yaml file:

- name: OVERLEAF_GIT
  type: git
  url: "https://git.overleaf.com/"

Next, put your Overleaf user id in ~/.gitconfig:

[credential "https://git.overleaf.com"]
    username = johnh@isi.edu

Finally, run:

cd META
antlink graft https://git.overleaf.com/63cb1095c6b536300dc7f02a overleaf_example_project

This command will graft in a specific overleaf sample project. (Note that the URL has "git", not "www": use the URL from the "clone with git" recommendation under Overleaf's Sync > Git menu.) It will clone the project into the "overleaf_example_project" antlink.

WORKFLOW: STARTING A NEW ANTLINK METAREPOSITORY

To make a brand new meta-repository on your current computer:

cd $HOME
antlink initmeta /home/yourid/metarepo.git

To start on your computer from an existing meta-repository:

antlink clonemeta /home/yourid/metarepo.git

will check out "metarepo" into the current directory.

Or to pick up the metarepo from another computer:

antlink clonemeta ssh://git.example.com/home/yourid/metarepo.git

Then look in metarepo.

ON-LINE AND OFF-LINE USE

Currently all interactions with the meta-repository must be done on-line, with access to that repository. This requirement avoids independent, conflicting operations on repositories (for example, if two people were to create or rename the same sub-repository).

Operations inside individual sub-repositories can be carried out when off-line, as with normal git.

In principle we can operate fully off-line; we did it in 1990 (see "Implementation of the Ficus Replicated File System" by Guy et al., Usenix Technical Conference, 1990). However, the current implementation lacks the infrastructure needed for off-line operation (something we recognize as a bug).

SUBCOMMANDS

The following sub-commands work on the given antlink:

help

Show basic help. See also antlink --man to show the full manual page.

clone

Check out an antlink, if not checked out. (An old synonym is resolve.)

unclone

Discard a checked-out antlink (assuming all changes are committed and pushed).

init

Create a new antlink and its new backing repository on the server.

graft

Link in a new external repository.

mv

Rename an antlink, including its local repository and the copy on the server. However, renaming does not currently catch local copies on other computers; they will become disconnected. Because of this risk, mv requires the -f option.

rm

Remove an antlink and its repository, both the local copy and the copy on the server. As with renaming, remove does not currently catch local copies on other computers; they will become disconnected. Because of this risk, and because it is destructive, rm requires the -f option. (If you're nervous, run it with -v but without -f to show what it will do.)

status and push and pull

Report the status of a cloned antlink, listing contents not yet committed or pushed; or push or pull each cloned antlink.

Given a directory (or, with no argument, the current directory), these commands act on all cloned antlinks under it.

listsubcommands

List all possible subcommands. (Mainly for command line completion; humans should use antlink help.)

OPTIONS

-f or --force

Force, allowing potentially risky behavior.

-d

Enable debugging output.

-v

Enable verbose output.

--help

Show help.

--man

Show full manual, including list of subcommands.

REPOSITORY ASSUMPTIONS

We assume that, on the client, everything lives in a working directory W. The local copy of the meta-repository is in W/META, with the official copy at ssh://git.example.com/path/META.git.

We assume all repositories on a site follow a centralized model, with a central, official copy and locally checked-out copies, and that they use git by default. One can also patch in repositories that use other patterns and other version-control software.

Local copies of sub-repositories get checked out in W/SITE. There's a default set of sub-repositories stored next to the meta-repository; they are checked out into W/META_GIT/SUB1, with official, central copies in ssh://git.example.com/path/META_GIT/SUB1.git.

If META has many sub-repositories, they may live in a tree of subdirectories in the meta-repository, for example ssh://site.example.com/path/code/SUBCODE2.git, ssh://site.example.com/path/code/SUBCODE3.git, and ssh://site.example.com/path/www/SUBWWW4.git. Their working copies might be collected into W/SITE/code/SUBCODE2, W/SITE/code/SUBCODE3, and W/SITE/www/SUBWWW4.
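
The naming convention above amounts to appending ".git" to a sub-repository's path under its site URL. The function below is a minimal sketch, assuming that convention holds exactly; server_url_for is a hypothetical name used only for illustration, not part of antlink.

```sh
#!/bin/sh
# server_url_for: HYPOTHETICAL helper illustrating the naming convention.
# $1: the site's URL prefix, e.g. ssh://git.example.com/path/META_GIT
# $2: the sub-repository's path relative to the site, e.g. SUB1 or code/SUBCODE2
server_url_for() {
    site=${1%/}     # tolerate a trailing slash on the site URL
    rel=${2#/}      # tolerate a leading slash on the relative path
    rel=${rel%/}    # and a trailing one
    printf '%s/%s.git\n' "$site" "$rel"
}
```

For example, server_url_for ssh://git.example.com/path/META_GIT SUB1 prints ssh://git.example.com/path/META_GIT/SUB1.git.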

If there are multiple external groups of repositories, that list of sites accumulates in META/_antlink.yaml. The first one is always the meta directory and the second the default site.

WHY BOTHER WITH ANTLINK?

Git is great. But git's assumptions don't cover the world of uses. Specifically, git basically requires that one check out all history to do anything. This approach fundamentally prevents a single repository from scaling to cover many different projects over many years.

The git authors recognize this limitation and advise one git repository per "thing", where thing is a program (like the Linux kernel, or git source). This allows git to scale for that project, but it creates a new scaling problem: you now have many, many repositories. (My research lab has more than 300; my personal site has a dozen.)

Antlink is the minimum glue needed to paste together a bunch of git repositories and manage them as a whole.

WHY NOT SOMETHING ELSE?

Many people have proposed similar things, but none is quite right:

git-submodule: This doesn't work for us because it freezes the sub-module at a particular version. We instead want to track the latest version of the subtrees. (More detailed criticism: https://codingkilledthecat.wordpress.com/2012/04/28/why-your-company-shouldnt-use-git-submodules/, http://blogs.atlassian.com/2013/05/alternatives-to-git-submodule-git-subtree/.)
git-subtree: This is like android repo (described below). It also assumes you want all subtrees, and it ties subtrees to specific URLs (and therefore to specific access methods, direct file or ssh). We require the ability to copy only some specific subtrees, and we need to access them with different methods from different places (for example, using direct file access when on the same server as the repository).
git-annex: This is intended to track pointers to large things that are not archived by git and may be stored off-line. We instead want to track small things (many files) that are in turn tracked by other gits. (We do share its goals of future-proofing and of avoiding a local copy of all content.)
Android repo (https://source.android.com/source/using-repo.html): This tool is really close to what we need, but it assumes one always downloads all subtrees. We instead require the ability to select only some of the subtrees. (In addition, its XML configuration format seems cumbersome.)
just use svn: This worked for quite a while, but svn has problems that git fixes. (Details: search for "git vs. svn".)
gr (https://github.com/mixu/gr): I found gr a year after I started antlink. It seems to have roughly the same goals (and similar design choices, basically passing git commands through). I need to look at it more carefully. (Its last edit seems to have been in 2017.)
mu (https://fabioz.github.io/mu-repo/): I found mu a year after I started antlink. It seems to have similar goals, and I need to look at it more carefully.
myrepos (https://myrepos.branchable.com/): It seems to have similar goals, and I need to look at it more carefully.

Our goals:

These antlink "pointers" are symlinks that point just outside this directory into "parallel" repositories that are checked out only when needed. By default, you get a minimal checkout. If you need another repository that's not yet checked out, run "antlink clone" on the symlink and it will check out the backing repository.
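
Because an uncloned antlink is a symlink whose target does not exist yet, ordinary file tests can tell cloned and uncloned antlinks apart. The sketch below assumes every symlink under the starting directory is an antlink (real trees may contain other symlinks) and that a link resolves exactly when its repository is checked out; list_antlinks is a hypothetical helper, not an antlink subcommand.

```sh
#!/bin/sh
# list_antlinks: HYPOTHETICAL helper, not part of antlink.
# Assumes every symlink under $1 is an antlink, and that an antlink
# resolves (its target exists) exactly when it has been cloned.
list_antlinks() {
    find "${1:-.}" -type l | sort | while IFS= read -r link; do
        if [ -e "$link" ]; then     # -e follows the symlink to its target
            printf 'cloned    %s\n' "$link"
        else
            printf 'uncloned  %s\n' "$link"
        fi
    done
}
```

Running list_antlinks in META prints one line per symlink, marked cloned or uncloned.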

ANTLINK's _antlink.yaml

The root of a metarepository has a file _antlink.yaml. In the fullness of time the antlink graft command will edit this file. For now, users must edit it if they want to paste together different repositories.

The first two entries always describe the primary meta-repository: first the meta-repository itself, then its default site:

repos:
  - name: ANT
    type: git
    url: "meta:"
  - name: ANT_GIT
    type: git
    url: "parent:/home/ant/ANT_GIT"
    init_hook: /home/ant/githooks/configure_new_repository

But one can hook in foreign repositories, like overleaf and github, a privately hosted GitHub-like service, or even subversion:

- name: OVERLEAF_GIT
  type: git
  url: "https://git.overleaf.com/"
- name: GITHUB_GENERAL
  type: git
  url: "https://github.com/"
- name: ANT_GITEA
  type: git
  verify_ssh: false
  url: "ssh://git@git.ant.isi.edu/"
- name: ANT_SVN
  type: svn
  url: "parent:/home/ant/ANT_SVN"

In most cases the URL is the prefix you pass to "git clone", using the https or ssh access method.

A "parent:" in the URL means inherit the access method of the parent git repo. Thus one can clone the ANT metarepo with file:// on one machine with direct access to the central repo, and with ssh:// on another machine, and it will all work.
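
One way to picture the "parent:" rule: take the scheme and host from the URL the meta-repository was cloned from, and combine them with the path after "parent:". The function below is a hypothetical illustration, not antlink's code; it assumes only ssh:// remotes and local-path or file:// remotes (other forms, such as scp-style host:path, are not handled), and resolve_parent_url is an invented name.

```sh
#!/bin/sh
# resolve_parent_url: HYPOTHETICAL illustration of the "parent:" rule.
# $1: the URL the meta-repository was cloned from
# $2: a site "url:" value from _antlink.yaml
resolve_parent_url() {
    meta_url=$1
    site_url=$2
    case $site_url in
        parent:*) path=${site_url#parent:} ;;
        *) printf '%s\n' "$site_url"; return 0 ;;  # not inherited: use as-is
    esac
    case $meta_url in
        ssh://*)
            rest=${meta_url#ssh://}
            host=${rest%%/*}            # e.g. git.example.com or user@host:port
            printf 'ssh://%s%s\n' "$host" "$path" ;;
        *)
            # Assume a local path or file:// URL, meaning direct file access.
            printf 'file://%s\n' "$path" ;;
    esac
}
```

Under these assumptions, resolve_parent_url ssh://git.example.com/home/ant/META.git parent:/home/ant/ANT_GIT prints ssh://git.example.com/home/ant/ANT_GIT, while a meta-repository cloned from /home/ant/META.git yields file:///home/ant/ANT_GIT.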

Subversion support is incomplete and intended for legacy use.

When the access method is ssh, antlink assumes it can run shell commands over ssh on the hosting computer, unless you set verify_ssh: false.

SUBCOMMANDS IN DETAIL

antlink clone PATH_TO_ANTLINK

"clones" an antlink by checking it out into the parallel tree.

antlink mv -f PATH_TO_ANTLINK NEW_PATH

Renames an antlink, on both the local copy and server.

antlink rm -f PATH_TO_ANTLINK

Removes an antlink, on both the local copy and server.

antlink rename-branch -f [--local | --remote] PATH_TO_ANTLINK OLD_NAME NEW_NAME

Rename a branch, either the local copy (with --local) or both local and remote (with --remote). The old branch will be removed.

antlink rename-branch-meta -f [--local | --remote] OLD_NAME NEW_NAME

Rename a branch of the meta-repository, either the local copy (with --local) or both local and remote (with --remote). The old branch will be removed.

antlink unclone PATH_TO_ANTLINK

"unclones" an antlink by (1) making sure no changes are pending, (2) discarding the checked out copy.

(See also "rm" which, in addition to unclone, removes it on the server.)
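
The pending-changes check that unclone performs can be approximated with plain git: the working tree must be clean, and the current branch must have no commits its upstream lacks. The function below is a hypothetical sketch of such a check (safe_to_unclone is an invented name, not antlink's code); it examines only the current branch and treats a missing upstream as unsafe.

```sh
#!/bin/sh
# safe_to_unclone: HYPOTHETICAL sketch, not antlink's implementation.
# Returns 0 only if $1 has no uncommitted work and nothing unpushed
# on its current branch.
safe_to_unclone() {
    repo=$1
    # Any staged, unstaged, or untracked changes?
    if [ -n "$(git -C "$repo" status --porcelain)" ]; then
        echo "uncommitted changes in $repo" >&2
        return 1
    fi
    # Without an upstream we cannot tell what is pushed, so refuse.
    if ! ahead=$(git -C "$repo" rev-list --count '@{u}..HEAD' 2>/dev/null); then
        echo "no upstream configured for $repo" >&2
        return 1
    fi
    if [ "$ahead" -ne 0 ]; then
        echo "$ahead unpushed commit(s) in $repo" >&2
        return 1
    fi
    return 0
}
```

Refusing whenever in doubt matches the spirit of unclone, which should never silently discard work.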

antlink initmeta GIT_REPOSITORY_DIRECTORY

Create a new meta-repository. These are always on the local computer.

antlink clonemeta GIT_REPO_URL_OR_PATH [LOCAL_DIR]

Clone a meta-repository. Could be from a local or remote computer. Result is always local.

antlink graft [--vc svn|git] GIT_REPO_URL_OR_PATH [LOCAL_DIR]

Graft in an external repository, using git (the default) or, with --vc svn, subversion. Could be from a local or remote computer.

antlink init PATH_TO_ANTLINK

Initialize a new antlink with some path: create a new git repository for it on the server, check that out on the local computer, and add the antlink to the meta-repository.

If the site has an "init_hook" set (defined in _antlink.yaml in the root of the meta-repository), that script will be run on the server to set up any repository-specific things (like commit hooks to send e-mail).

antlink status PATH_TO_ANTLINK
antlink push PATH_TO_ANTLINK
antlink pull PATH_TO_ANTLINK

Show the git status of an antlink, or push or pull.

If given a path, it performs the action on all antlinks in that directory or its children.
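
The directory form of these commands can be pictured as a walk over the tree that runs git in each cloned antlink. The function below is a hypothetical sketch of that walk for status only, not antlink's implementation; it assumes antlinks are symlinks and that a cloned one contains a .git entry, so Subversion checkouts and uncloned links are skipped. status_all is an invented name.

```sh
#!/bin/sh
# status_all: HYPOTHETICAL sketch of walking every cloned antlink.
# Assumes antlinks are symlinks and that a cloned one contains .git;
# Subversion checkouts and uncloned (dangling) links are skipped.
status_all() {
    find "${1:-.}" -type l | sort | while IFS= read -r link; do
        [ -e "$link/.git" ] || continue
        printf '== %s\n' "$link"
        git -C "$link" status --short
    done
}
```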

antlink listsubcommands

Enumerate all possible subcommands. Useful for command-line completion.

RELEASE HISTORY

The most recent version of antlink is at https://ant.isi.edu/software/antlink/.

0.1 (2015-06-09) Released for internal ANT project use. Full of unportability, but functional.
1.0 (2016-01-03) Cleaned up with no ANT-specific dependencies. A "real" release.
1.1 (2016-01-04) Better documentation and a website.
1.3 (2016-06-06)

Bugfix: no more infinite loop when antlink init run outside a meta repository. (Bug reported by Calvin Ardi.)

Enhancement: antlink help and antlink man now work. (Suggestion from Calvin Ardi.)

1.4 (2016-12-06)

Enhancement: Added bash autocompletion, and antlink listsubcommands to support it.

Enhancement: Added a preliminary version of antlink mv to rename antlinks. (More work is needed, though, to handle distributed moves.) Motivated by a rename for Lan Wei.

1.5 (2016-12-06)

Bug fix: improved documentation installation to fix a Fedora packaging problem.

1.6 (2016-12-06)

Bug fix: fix numerous bugs in antlink mv.

1.7 (2017-07-19)

Enhancement: an initial test suite, so no more silly "numerous bugs", and finally got antlink mv to work.

Enhancement: antlink init now works when run outside the meta-repository.

1.8 (2017-07-21)

Enhancement: antlink --version now works.

Bug fix: Several packaging problems due to the test suite in antlink.spec are now fixed. CentOS-6 packages only build with antlink-1.6, but all current RH RPM OSes work (epel7, f24, f25, f26).

Bug fix: antlink mv now works in subdirs, not just the meta's root.

1.9 (2018-09-05)

Enhancement: antlink initmeta now accepts an already existing META_GIT directory.

Bug fix: Fixed more bugs in antlink mv when run in subdirectories.

1.10 (2019-07-10)

Enhancement: Several antlink subcommands now check for ssh working and give a reasonable error message if it's not (rather than dumping a stack trace).

1.11 (2021-04-04)

Enhancement: antlink rm now exists.

1.12 (2021-04-08)

Enhancement: antlink now honors init.defaultBranch, or uses "main" if no default branch is given.

1.13 (2021-04-19)

Enhancement: more robust handling of init.defaultBranch and git version checking, so that it builds on EPEL7 and F32 to F34.

1.14 (2021-04-21)

Enhancement: antlink show-clones and antlink rename-branch now exist.

1.15 (2021-04-21b)

Enhancement: antlink rename-branch-meta now exists. Typically, the system administrator will bulk-rename all directories on the server:

cd "$META_GIT"
OLD_NAME=master; NEW_NAME=main
# For each bare repository whose HEAD names the old branch, rename that branch.
find . -type d -name '*.git' -print | while read -r D;
do
  grep -q "refs/heads/$OLD_NAME\$" "$D/HEAD" 2>/dev/null && (
    cd "$D" && git branch -m "$OLD_NAME" "$NEW_NAME"
  );
done

And then each user will rename all checked-out copies with:

cd "$LOCAL_META_CHECKEDOUT"
antlink --force --local rename-branch-meta . master main
antlink show-clones | while read -r D;
do
  antlink --force --local rename-branch "$D" master main;
done

1.16 (2021-07-20)

Finally, partial support for antlink graft, just for overleaf.

Bug fix: antlink init failed if the server's git was pre-2.28. We now check for and handle that case.

1.17 (2022-01-18)

Improvement: better error message if the antlink path doesn't exist. (Bug report from Jelena Mirkovic.)

An editing pass over the documentation.

The YAML parsing library was changed (to YAML::PP) since that seems available in places where YAML::XS is not.

1.18 (2022-01-25)

Improvement: add the verify_ssh option to _antlink.yaml to restore support for gitea grafting.

Fix test suites on boxes with versions of git that predate --initial-branch.

Document _antlink.yaml.

1.19 (2023-02-08)

Improvement: a special case for overleaf auto-sets credential storage.

Bug fix: add missing install-time dependency on YAML::PP.

1.20 (2024-01-30)

Bug fix: the error for grafting or init'ing a new subrepo over an existing link was confusing ("expect but cannot find meta dir..."). It now clearly reports that the existing link cannot be overwritten.

Bug fix: packaging uses perl-interpreter, not just perl.

KNOWN BUGS

off-line operation on the meta-repository is not currently supported.

AUTHOR AND THANKS

Antlink is written by John Heidemann.

Antlink benefited from feedback and bug reports from many people (thanks!): Yuri Pradkin, Calvin Ardi, Wes Hardaker, Jelena Mirkovic.

COPYRIGHT

Copyright (C) 2015-2023 the University of Southern California.

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License, version 2, as published by the Free Software Foundation.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
