Migrating multiple repositories to Git

A few weeks ago I faced the challenge of migrating and merging multiple SVN and Git repositories into one single repository. The Stack Overflow discussion “Merge two separate SVN repositories into a single Git repository” contains all the information required to solve this problem. This is a concise reproduction of all the bits and pieces presented in that discussion.

Migrating Birds, image by Emilian Robert Vicol

The plan is simple:

  1. clone the involved Git repositories
  2. migrate relevant SVN repositories to Git
  3. rewrite the repositories in case of overlaps or errors
  4. create new repository and add empty commit
  5. add remotes for all repositories
  6. fetch all remotes
  7. create a list of all commits of all repositories, sort it chronologically
  8. cherry-pick each commit in the list and apply it in the new repository

Here are the commands that implement the plan above. First, clone the Git repositories and migrate the SVN repositories.

mkdir ~/delme
cd ~/delme/
git clone ~/dev/repo1
git clone ~/dev/repo2
git svn clone svn://server:/repo3/
git svn clone svn://server:/repo4/

If the repositories contain files or folders with the same names, a history rewrite is necessary. Assuming repo1 overlaps with other repositories, it is a good idea to put the contents of repo1 in a subfolder in the target repository. To accomplish this, the history of the master branch of repo1 is rewritten and all its contents are moved to the folder “subfolder”.

cd repo1
git filter-branch --tree-filter 'mkdir -p subfolder; find . -mindepth 1 -maxdepth 1 -not -name subfolder -exec mv {} subfolder/ \;' master
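
To see what the tree filter does on a small scale, here is a minimal sketch that runs it against a throwaway repository. The repository name “demo” and the files a.txt and b.txt are made up for illustration, and the sketch passes an explicit “.” to find so that it also works with non-GNU versions of find.

```shell
#!/bin/sh
# Throwaway demonstration; the names demo, a.txt and b.txt are hypothetical.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the deprecation notice in newer Git versions
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/master   # make sure the branch is called master

echo one > a.txt
git add a.txt
git commit -q -m 'add a.txt'
echo two > b.txt
git add b.txt
git commit -q -m 'add b.txt'

# Rewrite every commit on master so that its files live under subfolder/.
git filter-branch --tree-filter \
  'mkdir -p subfolder; find . -mindepth 1 -maxdepth 1 -not -name subfolder -exec mv {} subfolder/ \;' \
  master >/dev/null 2>&1

top_level=$(git ls-tree --name-only master)          # entries at the root of the latest commit
first_files=$(git ls-tree -r --name-only master~1)   # all files in the first commit
echo "$top_level"
echo "$first_files"
```

After the rewrite, the root of every commit should contain nothing but “subfolder”, including the older commits, which is what keeps the later cherry-picks from colliding with files of the other repositories.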

In this step, it is also possible to completely remove files from a repository. The following command removes the file “invalidfile” in “subfolder” from the repository completely.

git filter-branch -f --index-filter 'git rm -r --cached --ignore-unmatch subfolder/invalidfile;' master
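
A quick way to check such a removal is to ask git log for commits that still touch the path. The following sketch does this in a throwaway repository; the repository name “demo” and the file keep.txt are hypothetical.

```shell
#!/bin/sh
# Throwaway demonstration; the names demo and keep.txt are hypothetical.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the deprecation notice in newer Git versions
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/master

mkdir subfolder
echo keep > subfolder/keep.txt
echo unwanted > subfolder/invalidfile
git add subfolder
git commit -q -m 'add both files'
echo more >> subfolder/keep.txt
git commit -q -am 'edit keep.txt'

# Drop the file from the index of every commit on master.
git filter-branch -f --index-filter \
  'git rm -r --cached --ignore-unmatch subfolder/invalidfile' master >/dev/null 2>&1

# Commits on master that still touch the file; empty output means none are left.
remaining=$(git log --oneline master -- subfolder/invalidfile)
files=$(git ls-tree -r --name-only master)
echo "remaining: [$remaining]"
echo "$files"
```

Note that filter-branch keeps a backup of the old branch under refs/original/, so the file is only gone from master itself until that backup is deleted and the repository is garbage collected.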

This can be repeated for the other repositories if necessary or desired. In the next step, the target repository that will hold the combined history is created. The migrated repositories are then added as remotes and fetched.

mkdir ~/newrepo
cd ~/newrepo
git init .
git commit --allow-empty -m'Initial commit (empty)'
git branch seed
git checkout seed

git remote add repo1 ~/delme/repo1
git remote add repo2 ~/delme/repo2
git remote add repo3 ~/delme/repo3
git remote add repo4 ~/delme/repo4

git fetch repo1
git fetch repo2
git fetch repo3
git fetch repo4

Finally, a file listing all commits is created for each repository. Each entry holds the author timestamp of the commit (seconds since 1/1/1970) followed by the commit hash. The lists are then merged and sorted by timestamp. The final result, with the timestamps stripped, is stored in the file “ordered_commits”. This list is then iterated over and each entry is fed to the git cherry-pick command.

git --no-pager log --format='%at %H' repo1/master > repo1_commits
git --no-pager log --format='%at %H' repo2/master > repo2_commits
git --no-pager log --format='%at %H' repo3/master > repo3_commits
git --no-pager log --format='%at %H' repo4/master > repo4_commits

cat repo*_commits | sort -n | cut -d' ' -f2 > ordered_commits
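
The merge step is easy to try out without any repository at all. The sketch below feeds two hand-written lists through the same pipeline; the timestamps and the four-letter “hashes” are invented sample data, and it uses sort -n and the pattern repo*_commits, so that the comparison is numeric and the output file is never picked up as input on a second run.

```shell
#!/bin/sh
# Sample data only: the timestamps and the short "hashes" are made up.
set -e
work=$(mktemp -d)
cd "$work"

# git log lists the newest commit first, so each list starts with the larger timestamp.
printf '%s\n' '1300000300 cccc' '1300000100 aaaa' > repo1_commits
printf '%s\n' '1300000400 dddd' '1300000200 bbbb' > repo2_commits

# Merge both lists, sort by timestamp, keep only the hash column.
cat repo*_commits | sort -n | cut -d' ' -f2 > ordered_commits
cat ordered_commits
```

The resulting file lists the entries of both repositories interleaved, oldest first, which is the order in which they have to be applied.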

# stop at the first commit that fails to apply so it can be resolved by hand
while read -r commit; do git cherry-pick "$commit" || break; done < ordered_commits

The cherry-pick command prompts git to apply the commit to the current branch. This results in a repository containing all commits from all 4 repositories in chronological order. That’s all there is to it.
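
For anyone who wants to try the whole procedure before touching real repositories, here is a minimal end-to-end sketch with two tiny source repositories. Everything in it is made up for the demonstration: the repository names repoA, repoB and target, the helper function make_commit, and the fixed timestamps. It also assumes linear histories without merge commits, and it stops at the first cherry-pick that fails.

```shell
#!/bin/sh
# End-to-end demonstration with hypothetical repositories repoA, repoB and target.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
cd "$work"

# make_commit <repo> <file> <unix timestamp>: hypothetical helper that adds
# one file in one commit with a fixed author and committer date.
make_commit() {
  ( cd "$1" &&
    echo "$2" > "$2" &&
    git add "$2" &&
    GIT_AUTHOR_DATE="$3 +0000" GIT_COMMITTER_DATE="$3 +0000" git commit -q -m "add $2" )
}

for r in repoA repoB; do
  git init -q "$r"
  git -C "$r" symbolic-ref HEAD refs/heads/master
done
# The commits of the two repositories interleave in time.
make_commit repoA a1 1300000100
make_commit repoB b1 1300000200
make_commit repoA a2 1300000300
make_commit repoB b2 1300000400

# Target repository with an empty initial commit.
git init -q target
cd target
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m 'Initial commit (empty)'

# Add and fetch each source, and write its "<timestamp> <hash>" list.
for r in repoA repoB; do
  git remote add "$r" "$work/$r"
  git fetch -q "$r"
  git --no-pager log --format='%at %H' "$r/master" > "$work/${r}_commits"
done

sort -n "$work"/repo*_commits | cut -d' ' -f2 > "$work/ordered_commits"

# Apply the commits oldest first; stop at the first one that does not apply.
while read -r commit; do
  git cherry-pick "$commit" > /dev/null < /dev/null || break
done < "$work/ordered_commits"

subjects=$(git --no-pager log --reverse --format=%s)
echo "$subjects"
```

The log of the target repository should then show the empty initial commit followed by the commits of both sources, interleaved by author date. Cherry-picking creates new commits, so the hashes in the target differ from those in the source repositories.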


#1 Chris wrote on March 20, 2014:

I would have assumed you could do some kind of octopus merge to bring them all together. But looking it up I see a merge is only possible when there’s common ancestry, so this cherry-pick/rebase method is required.

#2 Sebastian Schaetz wrote on March 20, 2014:

Chris, the technique seems cumbersome but it really works flawlessly. I cleaned and merged 2 SVN and 2 GIT repositories, the SVN repositories had over 4 years of history. Now everything is in one place, neat and clean.
