Have we misunderstood DVCS and git/hg?

Yesterday GitHub was at least partially inaccessible due to a DDoS attack.

Some interesting tweets appeared. I show just a few that came up at the top of my Twitter search results, but there were many, many more.

Perhaps the most hard-hitting of them all:

There is clearly a mismatch between our expectations of DVCS workflows and our actual experience. I started wondering: were our expectations inflated, or were we (as a community) not using git correctly?

Frankly, none of the work a developer does alone is affected: one can continue to commit, branch, merge, etc. at will. What is impacted is collaboration with other developers. So the "D" in "DVCS" has at least partially lived up to its promise, since each developer can still get a lot of work done while cut off from everyone else.

The event allows us to reflect on whether we could use git a bit differently, and perhaps to identify some opportunities for improvement. However, I think the major issue is that our expectation of the D in DVCS is inflated. Some of the reasons relate to the current state of technology; others are inherent in the workflow. Here's why:

  • A truly distributed (federated) workflow may not require GitHub at all. However, it would require all repositories to be able to communicate with each other peer to peer. That in turn would require us to make our development workstations reachable from other development workstations, either directly or via some protocol like BitTorrent. Is that really practical in an age of "don't expose your development machine", private IP addresses, proxy servers, firewalls, etc.? I suspect not. We are not yet ready for peer-to-peer communication over the internet between our development nodes. Still, to the extent that developer nodes can communicate with each other, some collaboration remains feasible even if the central GitHub/origin repo goes down.
  • Even assuming these nodes can communicate with each other, we don't ship different versions of software from different development nodes. We typically have a reference version that is ready to test, ship, etc. We need to be able to point to a branch or a tag and say "that's the current state of my software under development". Granted, in this model the reference node may not reflect the current state of development in real time: some changes committed on the development nodes haven't yet made it to the reference node (often called the origin). But the organisation as a whole treats the development nodes as work in progress and the origin as the current state of (partially) completed work. If we work in a fully federated model, where will the reference version live? If we anoint one dev node as the reference node, then that node is in effect the equivalent of GitHub, and should it become inaccessible, many collaborative activities will stop. And if we do not anoint any particular node as first amongst equals, we need to figure out, from an overall business perspective, how to make sense of a bunch of disparate nodes and compute the current state of the software as a function of all the development nodes collectively.
  • While I cannot readily imagine a good solution to the issue I just described, let us suppose we did find one. We would then face the problem of divergent views of history: different developers would see the same software developed through differently ordered commits. Plus, given n-to-n communication, the complexity of rebases and merges would shoot up drastically. Are we sure we understand the complexity we would have to deal with, even if the technology were feasible? I suspect not.
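To make the n-to-n point a little more concrete, here is a toy Python model. It is my own illustration, not anything git itself computes, and it rests on two assumptions: every pair of developer nodes may exchange commits directly, and a node that rebases independent incoming commits onto a linear history keeps them in arrival order. Under those assumptions the number of sync links grows quadratically instead of linearly, and even three independent commits can be linearised in several different orders.

```python
from itertools import permutations


def sync_links(n: int) -> tuple[int, int]:
    """Return (hub-and-spoke links, full-mesh links) for n developer nodes."""
    hub = n                   # each node talks only to one central origin
    mesh = n * (n - 1) // 2   # every pair of nodes may exchange commits directly
    return hub, mesh


def applied_histories(commits: list[str]) -> set[tuple[str, ...]]:
    """All linear orderings a node could end up with if it rebases
    independent commits in whatever order they arrive from its peers
    (a simplifying assumption; git's real history is a DAG)."""
    return set(permutations(commits))


for n in (3, 5, 10, 20):
    hub, mesh = sync_links(n)
    print(f"{n:>2} nodes: {hub:>2} links via a hub, {mesh:>3} links peer to peer")

# Three developers each make one independent commit.
print(len(applied_histories(["a1", "b1", "c1"])))
```

For 20 nodes this gives 20 links via a hub versus 190 peer to peer, and the three independent commits admit 6 possible linear histories, which is the "differently ordered commits" problem in miniature.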

An analogous situation would be sales staff who sync their notebooks with a central database in the morning, go off to work independently during the day in remote areas without any connectivity, and come back in the evening to sync their notebooks with the central database once again. This is quite similar to how we use git. Issues such as resolving conflicts would clearly remain, yet the offline model lets the sales staff do "most" of their work, except for any coordination that is needed but infeasible without connectivity.

Is there a way to use git better than we have been so far? Perhaps. Perhaps we could have multiple central databases (to continue the analogy above) in a master-slave or multi-master model. I don't think DVCS tools yet give us ready technology for a multi-master model, where different masters synchronise with each other without any manual intervention (I could be wrong, but I make that assumption). In that case a master-slave model would work better. So perhaps we could have a master/slave arrangement between multiple "origins", hosted by different git providers, with the master pushing changes out to the slaves near instantaneously. If the master became inaccessible, we could promote a slave to become the new master, and have the original master rejoin the cluster as a slave when it comes back online. This might require a bit of tooling, but it does not seem like a very difficult task.
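The master/slave idea can be sketched as a small in-memory Python model. Everything in it is an assumption for illustration: the `MirrorCluster` class, its methods and the host names are hypothetical, and no git commands are run. It only models the policy: developers push to the master, which forwards the update to every reachable slave; if the master is unreachable, the first reachable slave is promoted; and a recovered host rejoins as a slave and is resynchronised from the current master. In real tooling the forwarding step might be done with `git push --mirror` from the master to each mirror, but such tooling would also have to handle authentication, partial failures and races that this toy ignores.

```python
from dataclasses import dataclass, field


@dataclass
class MirrorCluster:
    """Hypothetical toy model of one master origin plus slave mirrors
    hosted by different providers. No git commands are executed."""
    master: str
    slaves: list[str]
    reachable: set[str] = field(default_factory=set)
    refs: dict[str, dict[str, str]] = field(default_factory=dict)  # host -> {branch: sha}

    def push(self, branch: str, sha: str) -> None:
        """Developers push to the master; it forwards the update to reachable slaves."""
        if self.master not in self.reachable:
            raise ConnectionError(f"master {self.master} is unreachable")
        for host in [self.master, *self.slaves]:
            if host in self.reachable:
                self.refs.setdefault(host, {})[branch] = sha

    def fail_over(self) -> str:
        """Promote the first reachable slave and demote the old master to a slave."""
        for candidate in self.slaves:
            if candidate in self.reachable:
                old_master = self.master
                self.slaves.remove(candidate)
                self.slaves.append(old_master)
                self.master = candidate
                return candidate  # return immediately, so mutating self.slaves is safe
        raise RuntimeError("no reachable slave to promote")

    def rejoin(self, host: str) -> None:
        """Mark a recovered host reachable and copy the current master's refs to it."""
        self.reachable.add(host)
        self.refs[host] = dict(self.refs.get(self.master, {}))


cluster = MirrorCluster(
    master="primary.example",
    slaves=["mirror-a.example", "mirror-b.example"],
    reachable={"primary.example", "mirror-a.example", "mirror-b.example"},
)
cluster.push("main", "c0ffee1")
cluster.reachable.discard("primary.example")    # the master goes down
print(cluster.fail_over())                      # mirror-a.example
cluster.push("main", "beef002")                 # work continues on the new master
cluster.rejoin("primary.example")               # old master returns as a slave
print(cluster.refs["primary.example"]["main"])  # beef002
```

One design point worth noting: developers only ever push to the current master, so there is still exactly one reference version at any moment. That keeps the "point to a branch or a tag" property while removing the single host as a point of failure.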

Perhaps I haven't fully imagined how people could do it "right", designing their workflows so that GitHub downtime does not affect them at all, and I would be keen to learn more. But at least for the moment, I think we need to temper our expectations based on the real constraints. This does not mean distributed VCS has failed; rather, it has lived up to the promises it reasonably could. The rest was just us letting our expectations run a little further.