Migrating multiple repositories to Git

A few weeks ago I faced the challenge of migrating and merging multiple SVN and Git repositories into one single repository. The Stack Overflow discussion “Merge two separate SVN repositories into a single Git repository” contains all the information required to solve this problem. This is a concise reproduction of all the bits and pieces presented in that discussion.

Migrating Birds, image by Emilian Robert Vicol

The plan is simple:

  1. clone the involved Git repositories
  2. migrate relevant SVN repositories to Git
  3. rewrite the repositories in case of overlaps or errors
  4. create new repository and add empty commit
  5. add remotes for all repositories
  6. fetch all remotes
  7. create a list of all commits of all repositories, sort it chronologically
  8. cherry-pick each commit in the list and apply it in the new repository

And here are the commands that implement the plan above. First clone and migrate Git and SVN repositories.

mkdir ~/delme
cd ~/delme/
git clone ~/dev/repo1
git clone ~/dev/repo2
git svn clone svn://server:/repo3/
git svn clone svn://server:/repo4/

If the repositories have the same file or folder names, a history rewrite is necessary. Assuming repo1 overlaps with other repositories, it is a good idea to put the contents of repo1 in a subfolder in the target repository. To accomplish this, the history of the master branch of repo1 is rewritten and all its contents are moved to the folder “subfolder”.

cd repo1
git filter-branch --tree-filter 'mkdir -p subfolder; find . -mindepth 1 -maxdepth 1 -not -name subfolder -exec mv {} subfolder \;' master

In this step, it is also possible to completely remove files from a repository. The following command removes the file “invalidfile” in “subfolder” from the repository completely.

git filter-branch -f --index-filter 'git rm -r --cached --ignore-unmatch subfolder/invalidfile;' master

This can be repeated for other repositories as well if necessary or desired. In the next step, the target repository that should contain all merges is created. Remote repositories are added and fetched.

mkdir ~/newrepo
cd ~/newrepo
git init .
git commit --allow-empty -m'Initial commit (empty)'
git branch seed
git checkout seed

git remote add repo1 ~/delme/repo1
git remote add repo2 ~/delme/repo2
git remote add repo3 ~/delme/repo3
git remote add repo4 ~/delme/repo4

git fetch repo1
git fetch repo2
git fetch repo3
git fetch repo4

Finally, files containing lists of commits are created, one for each repository. Each entry includes the author timestamp of the commit (seconds since 1/1/1970) followed by the commit hash. The lists are then merged and sorted numerically by timestamp. The final result is stored in the file “ordered_commits”. This list is then iterated over and each entry is fed to the git cherry-pick command.

git --no-pager log --format='%at %H' repo1/master > repo1_commits
git --no-pager log --format='%at %H' repo2/master > repo2_commits
git --no-pager log --format='%at %H' repo3/master > repo3_commits
git --no-pager log --format='%at %H' repo4/master > repo4_commits

cat *_commits | sort -n | cut -d' ' -f2 > ordered_commits

while read -r commit; do git cherry-pick "$commit"; done < ordered_commits

The cherry-pick command prompts git to apply the commit to the current branch. This results in a repository containing all commits from all four repositories in chronological order. That’s all there is to it.

Stop teaching Matlab

Many universities rely on Matlab for their mathematical and technical computation curriculum. This is because the syntax of the Matlab language is very intuitive and a perfect fit for numerical computation. Matlab also comes with a huge library of sophisticated math functions and excellent documentation. And universities are often equipped with campus-wide Matlab licenses. Professors as well as students can use Matlab for free.

Disorderly Conduct, image by Ken

Mathworks, the company behind Matlab, is pursuing an obvious plan with these generous campus licenses. Their strategy is aimed at selling software to mathematicians, engineers, physicists and computer scientists after they graduate. Since Matlab is often the only or most convenient tool these scientists get to know during their studies, Mathworks’ plan is very successful.

If for-profit companies decide to base their research and product development on Matlab I have no objections. The market will decide if it is the right decision.

But I find it appalling that a wide variety of today’s scientific advances are based on a proprietary software [1] product such as Matlab. Institutes that base their research on Matlab are at the mercy of a for-profit US company to sell them and renew licenses.

Scientific results based on Matlab are not free [2]. To reproduce, validate and build upon them, a Matlab software license is required. It is my opinion that, since science is largely paid for by the public, its results must also be available to the public. They must be free. It is thus a fatal mistake to train students and young scientists, the future creators of scientific knowledge, in using tools that restrict the freedom of their results.

There are many excellent free alternatives to Matlab. I would just like to point out two of them here: The long-established Matlab alternative is Python with the computing environment SciPy. The other alternative is new: Julia, a dynamic programming language designed to address the requirements of high-performance numerical and scientific computing. From a C++ developer’s perspective, Julia’s expressive type system and the excellent performance compared to compiled languages are very attractive.

There are numerous free software alternatives that allow researchers to do open and reproducible science. As of Spring 2014, the MIT linear algebra course suggests Julia as a Matlab alternative to solve homework problems. I hope teachers and professors will switch to free software and instruct the next generation of scientists how to produce free results.

  1. Proprietary software is software that does not give the user freedoms to study, modify and share the software, and threatens users with legal penalties if they do not conform to the terms of restrictive software licenses (source).
  2. free as in freedom, both negative (free of oppression or coercion) and positive liberty

Cutting off Google’s Tentacles

I just realized how easy it is to cut off one of Google’s tentacles throughout the web. This is a WordPress blog and I used the Ultimate Google Analytics plugin to keep track of the number of visitors, where they come from (referer, not geo-location), keywords and so forth.

Google Analytics is just one of the many tentacles Google spreads throughout the web. There is the Google Fonts API, there is Google Hosted Libraries and probably numerous other things that I am not aware of. Visitors to websites that include any one of these Google services will always contact one or multiple Google servers, thus identifying themselves (to some degree).

So I uninstalled the Google Analytics plugin, deleted my Google Analytics account and installed the WordPress Statistics plugin instead. It works flawlessly so far. I can recommend it.

Mere users of the web can monitor and block these tentacles as they browse the web through a Firefox add-on called Disconnect. I can recommend this plugin as well.

Let’s work together to make the web a less centralized, more private place for everyone. Let’s try to exclude large corporations from our interactions with each other. It is probably none of their business and certainly should not be their business.

8 GPU GeForce Titan Tyan System

A box arrived a few days ago at work.

8 GPU Titan System box

The box contained a little supercomputer comprised of 8 GeForce Titan GPUs in a Tyan FT77A platform.

8 GPU Nvidia Titan System

The system fits nicely in our server room.

8 GPU Nvidia Titan System Installed

And the 8 GPUs light up too!

8 GPU Nvidia Titan System Glow

scikit-image – Image processing in Python

I just discovered scikit-image – an image processing toolbox for the Python programming language. The project appears to be very active and under heavy development. I have been looking for a while for ways to replace my scripts that rely on Matlab, and this seems to do the trick. I’m especially excited about the library’s functionality to measure region properties. This will come in very handy. I have either been sleeping under a stone or this project is not very well known. Let’s change that. Go check out scikit-image!

Nvidia OpenCL Examples

It looks like Nvidia is slowly but steadily abandoning OpenCL – their OpenCL examples are not included in the 5.0 SDK anymore and the links on their OpenCL webpage are dead. It is not clear if this is an oversight or intentional. With the recent introduction of the Intel Xeon Phi, an accelerator that supports OpenCL, this could be a strategic move on Nvidia’s part.

I’ve created a GitHub repository that contains all Nvidia OpenCL examples from CUDA version 4.2.9. Please note that it currently contains only the Linux examples. Feel free to fork and add the Windows and Mac OS examples.

C99: casting to variable-length arrays

C99 understands variable-length arrays. They look something like this:

int d1, d2, d3, d4;
// runtime-assign those variables
int vla[d1][d2][d3][d4];

Now the question is, how to properly cast a pointer to this type when passing it to a function that accepts such a type in a way that makes the compiler happy? The syntax is somewhat unusual:

void func(int d1, int d2, int d3, int d4, int vla[d1][d2][d3][d4]){}
// ...
int * x = malloc(sizeof(int) * d1 * d2 * d3 * d4);
func(d1, d2, d3, d4, (int (*)[(int)(d2)][(int)(d3)][(int)(d4)])x);

Tested with gcc 4.6.1 and the -std=c99 compiler option. I could not find information about this anywhere on the web so I hope this will help others who wonder how it should be done.
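To make the cast easier to try out, here is a minimal, self-contained sketch. The function names sum_vla and demo_sum are made up for illustration and are not part of the original snippet. It fills a flat heap buffer with 0, 1, 2, ... and sums it through a four-dimensional variable-length array view.

```c
#include <assert.h>
#include <stdlib.h>

/* Sum all elements of a 4-D variable-length array. The dimension
   parameters must appear before the array parameter that uses them. */
long sum_vla(int d1, int d2, int d3, int d4, int vla[d1][d2][d3][d4])
{
    long total = 0;
    for (int i = 0; i < d1; i++)
        for (int j = 0; j < d2; j++)
            for (int k = 0; k < d3; k++)
                for (int l = 0; l < d4; l++)
                    total += vla[i][j][k][l];
    return total;
}

/* Hypothetical helper: allocate d1*d2*d3*d4 ints, fill them with
   0, 1, 2, ... and sum them through the VLA view.
   Returns -1 if the allocation fails. */
long demo_sum(int d1, int d2, int d3, int d4)
{
    assert(d1 > 0 && d2 > 0 && d3 > 0 && d4 > 0);
    size_t n = (size_t)d1 * d2 * d3 * d4;
    int *x = malloc(n * sizeof *x);
    if (x == NULL)
        return -1;
    for (size_t i = 0; i < n; i++)
        x[i] = (int)i;
    /* The array parameter is adjusted to "pointer to int[d2][d3][d4]",
       so that is the type the flat pointer is cast to. */
    long total = sum_vla(d1, d2, d3, d4, (int (*)[d2][d3][d4])x);
    free(x);
    return total;
}
```

Calling demo_sum(2, 3, 4, 5) sums the numbers 0 to 119. The cast can be avoided altogether by giving the pointer the variably modified type from the start, for example int (*v)[d2][d3][d4] = malloc(sizeof(int[d1][d2][d3][d4])); and passing v directly.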

Mozilla Thunderbird and Flowed Format

I like my e-mails to look a particular way. I send plain-text only e-mails with line wrapping at about 80 characters. This way I feel I can control to some degree what the e-mail will look like at the receiver’s end. And I want my e-mails to look like that even when I’m quoting someone.

Up until now I hated to use Thunderbird because it did not allow me to send e-mails like that. Since Gmail looks ugly as hell now I came back to Thunderbird and set out to solve this issue once and for all. I searched for “thunderbird plain text wrap” and whatnot. A smart fellow on a forum had the answer:

There are two hidden preference settings which one can access through Edit > Preferences > Advanced > General > Config Editor button:

  • mailnews.display.disable_format_flowed_support must be set to “true” to avoid rewrapping of received messages for which the sender allows flowed formatting
  • mailnews.send_plaintext_flowed must be set to “false” to avoid sending flowed format e-mails

The problem seems to be something called “flowed format” but I’m too ignorant to go into the details of what it does, what it means and why I would want it. I’m just happy the e-mails my Thunderbird sends look the way I want them to look. Finally. Thank you rsx11m!

Cryptography for the Masses

“The multiple human needs and desires that demand privacy among two or more people in the midst of social life must inevitably lead to cryptology wherever men thrive and wherever they write.” wrote David Kahn in his book “The Codebreakers”, chronicling the history of cryptography. The book was published in 1967. Almost 45 years later cryptography is seldom used to protect our privacy.

The information age spawned databases and networks capable of extracting and storing large amounts of private data. Those databases are often unknown to us, and if we know of their existence we cannot control them. They store personal information, communication and financial transactions. This gathering of private data happens against our will if we believe surveys that show that we actually do care about privacy. Skeptics and experts caution us, but the majority of web users are forced to give in to the subtle but grave disintegration of privacy, pushed forward by industry and government. They are growing their databases steadily, expanding the records they keep on all of us.

Good question, image by Garrett Coakley

We can see the consequences of these uncontrollable, central databases today. In what is believed to be one of the largest data security breaches in history, attackers stole personally identifiable information of 77 million PlayStation Network users earlier this year.

Accidental exposure of personal data is another problem. It is very difficult to control who has access to which piece of information. People get fired for how they behave online because they confuse personal with public communication. The web does not forget. And ever since the uprisings in the Arab world it should be clear to everybody that what one posts online can have severe consequences, including imprisonment and torture.

There are a variety of interesting judicial and ethical approaches to cope with these issues. And there is cryptography – a technological means of preserving privacy. Cryptography enables anonymity, the concept of ‘publishing information while one’s identity is publicly unknown’, as well as privacy, the ability to ‘seclude oneself or information about oneself and reveal oneself selectively’.

But almost nobody uses cryptography. Asked if he encrypts his e-mail, Bruce Schneier, cryptographer and highly regarded computer security specialist answers “I do not, except for special circumstances”. He further argues that for more people to encrypt their communication, services like Gmail would have to do it by default. This will of course never happen, since those services draw their revenue from reading our messages.

It has to work out of the box

But the more important point Schneier makes is this: what has to happen to spread the use of cryptology? It has to work out of the box. No additional application should be required, no plug-in, no add-on and certainly no driver installation. There exists a concept that could potentially offer a transparent solution for everyone: browser based cryptography.

Browsers have evolved from being a mere presentation and navigation tool for the world wide web to a platform for collaboration and information sharing web applications. If browsers were able to do cryptography, every web user could potentially benefit from it. JavaScript engines have evolved rapidly to a point where they are efficient enough to handle the complex algorithms that cryptography entails. It is ironic that the same web applications that threaten our privacy are the main reason such powerful engines were developed in the first place.

The idea of browser based cryptography is simple: before users upload their personal data to application hosts they encrypt the data in the browser. The host only receives encrypted blobs of data and since users don’t share their key with the host the data is secure. If they decide to share their data with someone else they can provide them with means of decrypting the blobs. Users are in control at all times.

But JavaScript cryptography has many critics and there has been some discussion whether or not it is a viable solution. But the potential is vast and the issue of retaking privacy is too important to dismiss the technology right away. The discussion should not stop here. Solutions can be found to the given objections.

JavaScript Cryptography Criticism

The trust model certainly is a problem and seems inconsistent. The general assumption is that users don’t trust application providers with their data, thus the need for encryption. Modern web applications however download their code from the very same provider and consequently also download the code required for decryption and encryption. A contradiction: users don’t entrust providers with their data but they trust the provider to deliver the application and most importantly the correct cryptography code. Critics argue that users can decide to either trust or not trust whoever hosts an application. If they trust the host there is no need for encryption. If they don’t trust the host they should not share their data in the first place. If someone suspects a host has malicious intentions JavaScript cryptology is worthless.

The situation changes if the ‘honest-but-curious’ adversary model is taken as a basis. It assumes that the company providing a web service carries out the stated instructions and is not lying to the user. It is further assumed that it might do more than it promised, such as storing the data for an unreasonably long time or even sharing data with third parties. In such a case JavaScript cryptography might be viable. Furthermore, if the company is attacked and database dumps are stolen, the stolen data is worthless to the attacker. For an attacker to gain access to the data, the web application source code has to be modified and users have to use the malicious application. To defeat such attacks, browsers would have to be able to validate web applications to make sure that they were not modified.

Dystopia 2, image by Hervé Girod

Today the browser is an environment not very well suited for cryptography. There is the threat of cross-site attacks: a web page loads content from many sources and all of those can potentially modify the cryptography code. Due to the dynamic nature of the JavaScript language, an attacker can replace correct code with a malicious version. Such an attack can only be discovered through tedious code analysis of all sources.

Critics also argue that browsers lack some crucial primitives important for cryptography, such as random number generation. Fortunately browser vendors see the need for such functions and are moving toward implementing them. Browsers further lack a secure key store, a crucial component in every crypto-system. It is also important to have the ability to securely erase secrets from memory once they are no longer needed. Since JavaScript engines employ garbage collection there is no way to control when objects are deleted and secrets are forgotten.

These issues are significant but they can all be addressed with care. Web developers, cryptographers and browser vendors will have to work together to find solutions to these shortcomings. Once this happens I expect to see secure JavaScript cryptography applications that satisfy even the skeptics. There is still a lot to be done but the potential is tremendous. So we should get to work.

Existing Implementations

Despite the criticism there are already some implementations that utilize JavaScript cryptography out there. They try to make do with the current state of browser support for de- and encryption. Some supplement browser capabilities with custom add-ons. This of course defeats the purpose of JavaScript cryptography but is a necessity at the moment since browser support is still in its infancy.

Aldo Cortesi’s is the most interesting project. It is an online list-manager that encrypts user data in the browser and sends only an encrypted blob to the host. It inspired people to think about what JavaScript cryptology can do and what it means to encrypt data in the browser. Encrypted user data is completely exposed as the application intentionally lacks authentication (except to prevent overwrites). Only the passphrase protects the data. Cortesi uses the expression ‘host-proof’ in his documentation of the project which is a dubious term, especially in the context of JavaScript cryptography. It was coined by Richard Schwartz and emphasizes that the host does not have to be trusted. This is a difficult claim. Users still have to trust the host because it grants the privilege of encrypting the data and it can revoke that privilege at any time. The ‘honest-but-curious’ adversary model again makes more sense.

More questionable applications are Lockify, Zero-Knowledge Box and Clipperz because they explicitly advertise the security of their products that is solely based on JavaScript cryptography. Critics argue that appearance of security is worse than no security at all. Their claim of increased security should indeed be taken with a grain of salt: data that otherwise would not be uploaded due to security considerations should not be stored with those hosts. Even so, the cryptology is a welcome addition to the security measures these companies take. It accomplishes the goal of preserving privacy.

Vintage Mail Boxes, image by Nathen Jantzen, all rights reserved

When asked to choose between a regular web application and a JavaScript cryptography enabled one, the latter is preferable due to privacy considerations since the user does not lose control of his data. Claiming that JavaScript cryptography enabled applications, with the current state of research and browser support, are more secure than others is debatable. We are not there yet.


There are a number of alternatives and especially the concept of storing encrypted data with a curious or even untrusted host is not new. Traditionally, host applications have been used to handle cryptographic operations. These tools must be installed and have to be properly set up by the user. Mobile platforms might be an ideal environment for these alternatives. Installing applications is hassle-free and very common on mobile devices. Due to the well defined platform, developers can keep user effort to configure these applications to a minimum.

Another promising alternative is a browser add-on called Cipherbox. Its developers recognize both that application providers are unlikely to enable cryptography and that current JavaScript solutions have shortcomings. In their architecture they make a point of separating the interaction with the web content from the cryptology functionality. This could prove to be a significant security advantage over other solutions that give web content access to cryptology functions.

A colleague and I devised the idea of a cryptography enabled HTTP proxy that is similar to the Cipherbox. The proxy is a trusted instance possibly hosted locally or connected via a VPN. All HTTP traffic is sent through the proxy. It analyzes the traffic and encrypts and decrypts relevant parts like messages or images depending on its configuration. We implemented a prototype that is capable of transparently encrypting and decrypting Facebook messages using gpg. A proxy like this could run on a user’s FreedomBox and can in theory be extended to provide crypto-functionality for various platforms including for example Gmail.

A special form of cryptography called homomorphic encryption could enable users to take advantage of both cryptography and computing as a service at the same time. If encrypted data is sent to hosts, they usually cannot process the data. If instead a homomorphic encryption scheme is in place, then for certain algebraic functions on the plaintext, equivalent functions exist that can be applied to the ciphertext. Proponents of this technology argue that it could enable widespread use of cloud computing by ensuring the confidentiality of private data.
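As a rough illustration of the homomorphic idea, the sketch below uses unpadded textbook RSA, which happens to be homomorphic with respect to multiplication: multiplying two ciphertexts yields a ciphertext of the product of the plaintexts. The key (n = 3233, e = 17, d = 2753) is a well-known toy example and far too small for real use, and the function names are made up for this sketch. Unpadded RSA is not secure in practice; the point is only to show a host computing on data it cannot read.

```c
#include <assert.h>
#include <stdint.h>

/* Modular exponentiation (base^exp mod m) by repeated squaring.
   Only safe against overflow for moduli below 2^32. */
uint64_t mod_pow(uint64_t base, uint64_t exp, uint64_t m)
{
    assert(m > 0);
    uint64_t result = 1;
    base %= m;
    while (exp > 0) {
        if (exp & 1)
            result = (result * base) % m;
        base = (base * base) % m;
        exp >>= 1;
    }
    return result;
}

/* Toy textbook-RSA key, for illustration only:
   n = 61 * 53 = 3233, public exponent e = 17, private exponent d = 2753. */
enum { RSA_N = 3233, RSA_E = 17, RSA_D = 2753 };

/* Performed by the user, who holds the keys. */
uint64_t toy_encrypt(uint64_t m) { return mod_pow(m, RSA_E, RSA_N); }
uint64_t toy_decrypt(uint64_t c) { return mod_pow(c, RSA_D, RSA_N); }

/* Performed by the host: multiplies two ciphertexts without ever
   seeing the plaintexts or the private exponent. */
uint64_t host_multiply(uint64_t c1, uint64_t c2)
{
    return (c1 * c2) % RSA_N;
}
```

If the user encrypts 7 and 6 and hands both ciphertexts to the host, decrypting the result of host_multiply gives 42, as long as the product of the plaintexts stays below n. Schemes that support both addition and multiplication on ciphertexts are considerably more involved than this.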


Controlling one’s personal data is more difficult with every new database and network based innovation. At the same time privacy is more important than ever in a world that prepares to conglomerate health records, gathers and centralizes consumer behavior data and merges individual financial records into powerful profiles. Cryptography is an effective safeguard we must implement to prevent exploitation and discrimination based on our personal information. Every user must be enabled to use cryptology to control the data he wishes to share.

Browser vendors must implement the building blocks required for cryptography including a secure key store that can be managed by the user. They should also include means of validating a running application against a checksum. Cryptographers and web developers must work together to implement correct and easy to use de- and encryption functionality for browser based applications.

More people must start thinking about this problem, more ideas are needed and should be carefully vetted by cryptographers and security experts. User interface specialists should work on making cryptography a transparent process. We need to get everyone involved and try to revert the damage that has already been done.

The Decentralized Web Movement

Over the years computers grew in numbers and a logical step in their evolution was to connect them together to allow their users to share things. Little networks grew into huge networks and some computers gained more power than the rest: they called themselves “servers”. Today millions of people are connected online at the mercy of middlemen who control the servers of the world.

This is not an introduction to a dystopian fantasy world but an excerpt from a promotion video for Opera Unite, a framework that allows users to host information from their home computer. It was a bold attempt to change the centralized architecture of the Internet. A number of smart people have been pondering this idea even before Opera’s experiment failed miserably.

Communication breakdown, image by miuenski

And the concept of a decentralized web is gaining traction: more and more people realize something has to change. The cause for this trend is obvious: the number of data security and privacy disasters that were made public has spiked in recent times. In April ’11 for example an update to the security terms of service of the widely used Dropbox tool revealed that contrary to previous claims, Dropbox Inc. has full access to user data.

An analysis of the changes to the Facebook privacy policy over time paints a gloomy picture of how the world’s largest social network changed “from a private communication space to a platform that shares user information with advertising and business partners while limiting the users’ options to control their own information”.

With more and more of our personal data moving to centralized servers or “cloud services” – a term that should be read as a euphemism – we’re no longer in control. But there is hope in sight: there are dozens of projects out there that try to stop the trend of centralization and data consolidation.

Decentralized Applications

The most popular of the lot is probably Diaspora. The project got a lot of attention in April 2010 when they managed to raise about $200,000 from almost 6500 supporters. The software looks and feels very much like Facebook or Google+. The innovation is that users are allowed and even encouraged to set up their own Diaspora node. This essentially means allowing users to set up their own Facebook server at home (or wherever they want). The Diaspora nodes are able to interact with each other to form one distributed social network. Furthermore, instead of users having to log in to one central server, they may choose one of many servers administered by different entities. In the end they can decide whom to trust with their data and there is no one entity that has access to all the data.

A social network project that is also worth mentioning follows the same principle. Its name is Buddycloud. The main difference between Buddycloud and Diaspora can be found in their implementation details: Buddycloud builds upon XMPP (Extensible Messaging and Presence Protocol), a more than 10 year old and often implemented specification for “near-real-time, extensible instant messaging, presence information, and contact list maintenance”. There are many unknowns in this area so building on such proven protocols instead of defining new standards might prove to be an advantage. But there are many more social networking projects out there. Wikipedia has a nice list.

The Unhosted project implements another concept. Instead of providing a specific decentralized service it aims to be a meta-service. And after talking to Michiel de Jong I have the impression his plan is even more crucial. He aims to create something fundamental, a protocol, an architecture, a new way of writing web applications. The idea is the following: the traditional architecture of a hosted website provides both processing and storage. An unhosted website only hosts the application, not the data. Unhosted wants to separate the application from the data. By storing the data in another location and combining both application and data only in the browser, the application provider can never access the data. An ingenious and very ambitious idea. I hope they succeed!

Decentralized Storage

A project that aims to replace Dropbox is ownCloud, an open personal cloud which runs on your personal server. It enables accessing your data from all of your devices. Sharing with other people is also possible. It supports automatic backups, versioning and encryption.

The Locker Project has similar goals. They allow self-hosting (installing their software on your own server) and offer a hosted service similar to what Dropbox provides. The service pulls in and archives all kinds of data that the user has permission to access and stores this data into the user’s personal Locker: Tweets, photos, videos, click-streams, check-ins, data from real-world sensors like heart monitors, health records and financial records like transaction histories (source).

Shimmering, image by Jason A. Samfield

A third project worth mentioning is SparkleShare. It is similar to the other projects in this category but allows pluggable backends. That means you can choose to use for example GitHub as backend for your data or of course your personal server. Awesome!

Freedom to the Networks

Projects such as netless carry the idea even further because after the data is liberated, the connection itself is a soft spot. Network connections should be liberated from corporate and government control by circumventing the big centralized data hubs and instead installing a decentralized wireless mesh network where everyone can participate and communicate.

The adventurous netless project plans to use the city transportation grid as its data backbone. Nodes of the network are attached to city vehicles – trams, buses, taxis and possibly – pedestrians. Information exchange between the nodes happens only when the carriers pass by each other in the city traffic. Digital data switches its routes just the same way you’d switch from tram number 2 to bus number 5. Very inspiring.

Another idea is to utilize networks of mobile phones to create a mesh network. The serval project is working on this. And they have a prototype for the Android platform ready.

The German Freifunk community pursues a similar goal. It is a non-commercial open initiative to support free radio networks in the German region. It is part of the international movement for free and wireless radio networks (source).

A purely software based project is Tor. It is free software and an open network that helps its users to defend against a form of network surveillance that threatens personal freedom and privacy as well as confidential business activities and relationships.

Peer to Peer Currency

One integral thing this article did not talk about yet is money. Bitcoin, a peer to peer currency, might be the missing puzzle piece. The Bitcoin system has no central authority that issues new money or tracks transactions – it is managed collectively by the network.

A major problem of digital currency has been preventing double-spending. Digital money can be copied multiple times so a mechanism is necessary to forbid spending money twice. Bitcoin refrains from having actual digital coins. The system is merely one large transaction log that tracks what money was transferred where.

Each participant has a pair of public and private keys to sign transactions and to allow others to verify transactions. The transactions are entered into a global ever running log that is signed in regular intervals. The signing of the log is designed to require extensive computation time. The entire network of participating users is required to sign the log.

This protects the entire system from false signatures and from anyone tampering with the log and modifying past transactions. An attacker would have to have more computational power at his disposal than the entire Bitcoin network to forge transactions.
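The following toy sketch illustrates why signing the log costs computation and why tampering with past entries is expensive. It is not how Bitcoin is implemented: the non-cryptographic FNV-1a hash stands in for the SHA-256 hash that Bitcoin uses, the difficulty is a simple count of leading zero bits, and all function names are made up for this example. Each entry is hashed together with the hash of its predecessor, and an entry only counts as signed once a nonce has been found that makes its hash start with enough zero bits.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* FNV-1a: a simple, NON-cryptographic hash, used here purely as a
   stand-in for a real cryptographic hash function. */
static uint64_t fnv1a(uint64_t h, const void *data, size_t len)
{
    const unsigned char *p = data;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 0x100000001b3ULL;            /* FNV prime */
    }
    return h;
}

/* Hash of one log entry: previous entry's hash, payload and nonce. */
uint64_t entry_hash(uint64_t prev, const char *payload, uint64_t nonce)
{
    uint64_t h = 0xcbf29ce484222325ULL;   /* FNV offset basis */
    h = fnv1a(h, &prev, sizeof prev);
    h = fnv1a(h, payload, strlen(payload));
    h = fnv1a(h, &nonce, sizeof nonce);
    return h;
}

/* An entry counts as signed if its hash has `bits` leading zero bits. */
int is_signed(uint64_t hash, int bits)
{
    assert(bits > 0 && bits < 64);
    return (hash >> (64 - bits)) == 0;
}

/* Brute-force search for a suitable nonce. The expected number of
   attempts doubles with every additional required zero bit. */
uint64_t sign_entry(uint64_t prev, const char *payload, int bits)
{
    uint64_t nonce = 0;
    while (!is_signed(entry_hash(prev, payload, nonce), bits))
        nonce++;
    return nonce;
}
```

Because every entry includes the hash of the one before it, changing an old payload changes its hash and thereby invalidates every later entry. A forger would have to repeat the nonce search for the modified entry and for all entries that follow, while the honest network keeps extending the log.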

Users that give their computing time to the network are rewarded with Bitcoins for their troubles. This is also how the money is generated in the first place. In addition, participants that transfer money are free to include a transaction fee in their order. This extra money is given to the particular user signing the transaction.

A considerable number of sites have emerged that accept Bitcoins in exchange for services or goods. You can buy for example socks online or even pay for your lunch at a burger restaurant in Berlin.


In closing, I find it encouraging that so many people feel that things have to change and are developing ideas and projects to make it happen. We will see many exciting things in the future and despite the overwhelming might of well-established products, I am hopeful.
