Monday, October 5, 2009

FeatureBranch

FeatureBranch: "

With the rise of Distributed Version Control Systems (DVCS) such
as git and Mercurial, I've seen more conversations about strategies
for branching and merging and how they fit in with Continuous
Integration (CI). There's a bit of confusion here, particularly
on the practice of feature branching and how it fits in with CI.


Simple (isolated) Feature Branch


The basic idea of a feature branch is that when you start work on
a feature (or story if you prefer that term) you take a branch of
the repository to work on that feature. In a DVCS, you'll do this
in your personal repository, but the same kind of thing works in a
centralized VCS too.

I'm going to illustrate this with a series of diagrams. I have a
shared project mainline, colored blue, and two developers, colored
purple and green (since the developers' names are Reverend Green and
Professor Plum).

I'm using labeled colored boxes (e.g. P1 and P2) to represent
local commits on the branch. Arrows between branches represent
merges, and the merge boxes are colored orange to make them stand
out. In this case there are updates, say a couple of bug-fixes,
applied to the mainline (presumably by Mrs Peacock). When these
happen our developers merge them into their work. To give this a
sense of time, I'll assume we're looking at a few days' work here,
with each developer committing to their local branch roughly once a day.

In order to ensure things are working properly, they can run
builds and tests on their branch. Indeed for this article I'll
assume that each commit and merge comes with an automated build and
test on the branch it's on.

The advantage of feature branching is that each developer can
work on their own feature and be isolated from changes going on
elsewhere. They can pull in changes from the mainline at their own
pace, ensuring they don't break the flow of their
feature. Furthermore it allows the team to choose its features for
release. If Reverend Green takes too long, we can release with just
Professor Plum's changes. Or we may want to delay Professor Plum's
feature, perhaps because we are uncertain that the feature works the
way we want to release it. In this case we just tell the professor
to not merge his changes into mainline until we are ready for the
feature. This is called cherry-picking: the team decides
which features to merge in before release.

Attractive though that picture looks, there can be trouble
ahead.

Although our developers can develop their features in isolation,
at some point their work does have to be integrated. In this case
Professor Plum easily updates the mainline with his own
changes. There's no merge here because he's already incorporated the
mainline changes into his own branch (there will be a build). Things
are, however, not so simple for Reverend Green: he needs to merge all
of his changes (G1-6) with all of Professor Plum's (P1-5).

(At this point many users of DVCSs may feel I'm missing
something as this is a simple, perhaps simplistic view of feature
branching. I'll get to a more involved scheme later.)

I've made this a big merge box as it's a scary merge. It may be
just fine: the developers may have been working on completely
separate parts of the code base with no interaction, in which case
the merge will go smoothly. But they may be working on bits that do
interact, in which case here lie dragons.

The dragons can come in many forms, and tooling can help slay
some of them. The most obvious dragon is the complexity of
merging the source code and dealing with conflicts as developers
edit the same files. Modern DVCSs actually handle this rather well,
indeed somewhat magically. Git has quite the reputation for dealing
with complicated merges. So much so that the textual issues of
merging are much better than they used to be - indeed I'll go so far
as to discount textual conflicts for the purposes of this
article.

The problem I worry more about is a semantic conflict. A simple
example of this is if Professor Plum changes the name of a method
that Reverend Green's code calls. Refactoring tools allow you to
rename a method safely, but only on your code base. So if G1-6
contain new code that calls foo, Professor Plum's rename can't update
those calls, because his code base doesn't contain them. You only
find out on the big merge.

A function rename is a relatively obvious case of a semantic
conflict. In practice they can be much more subtle. Tests are the
key to discovering them, but the more code there is to merge the
more likely you'll have conflicts and the harder it is to fix
them. It's the risk of conflicts, particularly semantic conflicts,
that makes big merges scary.
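
As a rough illustration of the rename example, here is a minimal Python sketch. All names in it (`build_library`, `green_feature`, `foo`, `bar`, `merged_build_passes`) are hypothetical stand-ins, and the "library" is simulated with a plain namespace object rather than real files under version control. It shows why a merge with no textual conflict can still fail once the tests run.

```python
import types

# Build a stand-in for the shared library, exposing one function under
# the given name (hypothetical; a real project would have a module file).
def build_library(function_name):
    library = types.SimpleNamespace()
    setattr(library, function_name, lambda x: x * 2)
    return library

# Reverend Green's new code (G1-6) calls foo. It lives in a different file
# from the library, so a textual merge reports no conflict with a rename.
def green_feature(library):
    return library.foo(21)

mainline_library = build_library("foo")  # the library as Green last saw it
plum_library = build_library("bar")      # Professor Plum renamed foo to bar

# Stand-in for the automated build and test that runs after the merge.
def merged_build_passes(library):
    try:
        green_feature(library)
        return True
    except AttributeError:
        # The call site still refers to the old name: a semantic conflict.
        return False
```

Running `merged_build_passes(plum_library)` plays the role of the post-merge test run: the text merged cleanly, but the call to the old name no longer resolves.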

This fear of big merges also acts as a deterrent to
refactoring. Keeping code clean is a constant effort; to do it well
requires everyone to keep an eye out for cruft and fix it wherever
they see it. However this kind of refactoring on a feature branch is
awkward because it makes the Big Scary Merge worse. The result we
see is that teams using feature branches shy away from refactoring
which leads to uglier code bases.


Continuous Integration


It's these problems that Continuous Integration was designed to
solve. With Continuous Integration my diagram looks like this.

There's a lot more merging going on here, but merging is one of
those things that's much easier to do in small, frequent steps
than in rare, large ones. As a result if Professor Plum is changing
some code that Reverend Green relies on, the Reverend will find it
early, such as when he merges in P1-2. At that point he's only got
to modify G1-2 to work with the changes, rather than G1-6.

CI is effective at removing the problem of big merges, but it's
also a vital communication mechanism. In this scenario the potential
conflict will actually appear when Professor Plum merges G1 and
realizes that Reverend Green is actively building on Plum's
libraries. At this point Professor Plum can go and find Reverend
Green and they can discuss how their two features interact. It may
be that Professor Plum's feature requires some changes that don't
mesh well with Reverend Green's changes. By looking at both their
features they can come up with a better design that affects both
their work-streams. With the isolated feature branches our
developers don't discover this till late, probably too late to do
much about it. Communication is one of the key factors in software
development and one of CI's most important features is that it
facilitates human communication.

It's important to note that, most of the time, feature branching
like this is a different approach to CI. One of the principles of CI
is that everyone commits to the mainline every day. So unless
feature branches only last less than a day, running a feature branch
is a different animal to CI. I've heard people say they are doing CI
because they are running builds, perhaps using a CI server, on every
branch with every commit. That's continuous building, and a Good
Thing, but there's no integration, so it's not CI.


Promiscuous Integration


Earlier I said parenthetically that there are other ways of doing
feature branching. Say Professor Plum and Reverend Green take tea
together early in the cycle. While chatting they discover they are
working on features that interact. At this point they may choose to
integrate with each other directly, like this.

With this approach they only push to the mainline at the end, as
before. But they merge frequently with each other, so this avoids
the Big Scary Merge. The point here is that the primary issue with
the isolated feature branching scheme is its isolation. When you
isolate the feature branches, there is a risk of a nasty conflict
growing without you realizing it. Then the isolation is an illusion,
and will be shattered painfully sooner or later.

So is this more ad-hoc integration a form of CI or a different
animal entirely? I think it is a different animal; again, a key point
of CI is that everyone integrates to the mainline every
day. Integrating across feature branches, which I shall call
promiscuous integration (PI), doesn't involve or even need a
mainline. I think this difference is important.



I see CI as primarily giving birth to
a release candidate at each commit. The job of the CI system and
deployment process is to disprove the production-readiness of a
release candidate. This model relies on the need to have some
mainline that represents the current shared, most up to date
picture of complete.


--Dave Farley




Promiscuous Integration vs Continuous Integration


So if it's different, is PI better than CI, or more
realistically, under what circumstances is PI better than CI?

With CI, you lose the ability to use the VCS to do cherry
picking. Every developer is touching mainline, so all features grow
in the mainline. With CI, the mainline must always be healthy, so in
theory (and often in practice) you can safely release after any
commit. Having a half built feature or a feature you'd rather not
release yet won't damage the other functionality of the software,
but may require some masking if you don't want it to be visible in
the user-interface. This can be as simple as not including a menu
item in the UI to trigger the feature.
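
A minimal sketch of that kind of masking, assuming a hypothetical menu and a hypothetical feature name (`pdf_export`); neither comes from the article. The unfinished feature's code ships on the mainline, but its menu item is left out unless the feature is switched on.

```python
# Each menu entry is (label, feature it depends on); None means "always shown".
# The labels and the feature name are made up for illustration.
ALL_MENU_ITEMS = [
    ("Open", None),
    ("Save", None),
    ("Export to PDF", "pdf_export"),  # half-built feature living on mainline
]

def build_menu(enabled_features):
    """Return the labels to display, hiding items for features not enabled."""
    return [
        label
        for label, feature in ALL_MENU_ITEMS
        if feature is None or feature in enabled_features
    ]
```

A release build would call `build_menu(set())`, while a developer working on the feature would pass `{"pdf_export"}`.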

PI can provide some middle ground here. It allows Reverend Green
the choice of when to incorporate Professor Plum's changes. If
Professor Plum makes some core API changes in P2, then Reverend
Green can import P1-2 but leave the others until Professor Plum's
feature is put onto the release.

One worry with all this picking and choosing is that PI makes it
really hard to keep track of who has what in their branch. In
practice, it seems tooling pretty much solves this problem. DVCSs
keep a clear track of changes and their origins and can figure out
that when Professor Plum pulls G3 he already has G2 but doesn't have
B2. I may have made mistakes drawing the diagram by hand, but tools
do keep track of these things well.
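
The bookkeeping involved can be sketched as a walk over the commit graph. The graph below is a hypothetical one (the original diagram is not reproduced here), and the function names are made up; real DVCSs do considerably more, but the core idea is that a pull transfers the ancestors of the requested commit that the receiver doesn't already have.

```python
# Hypothetical commit graph: each commit maps to its parent commits.
PARENTS = {
    "B1": [], "B2": ["B1"],                  # mainline
    "G1": ["B1"], "G2": ["G1"], "G3": ["G2"],  # Reverend Green
    "P1": ["B1"], "P2": ["P1"],              # Professor Plum
}

def ancestors(commit):
    """Return the commit plus everything reachable through parent links."""
    seen = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen

def commits_to_transfer(have, target):
    """Commits the receiver is missing when pulling `target`."""
    return ancestors(target) - have

# Assumed state: Plum has his own work plus an earlier pull of G2.
plum_has = {"B1", "P1", "P2", "G1", "G2"}
```

With this assumed graph, `commits_to_transfer(plum_has, "G3")` contains only G3: G2 is recognized as already present, and B2 is not pulled in because it is not an ancestor of G3.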

On the whole, however, I don't think cherry-picking with the VCS
is a good idea.



Feature Branching is a poor man's
modular architecture, instead of building systems with the ability
to easy swap in and out features at runtime/deploytime they couple
themselves to the source control providing this mechanism through
manual merging.


--Dan Bodart



I much prefer designing the software in such a way that makes it
easy to enable or disable features through configuration changes. My
colleague Paul Hammant calls this Branch by Abstraction. This
requires you to put some thought into what
needs to be modularized and how to control that variation, but we've
found the result to be far less messy than relying on the VCS.
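
One possible reading of this, sketched in Python: both the existing behavior and the in-progress feature sit behind a common interface on the mainline, and a configuration value decides which one is used. The shipping example, the class names and the `weight_based_shipping` key are all assumptions made for illustration, not part of the article.

```python
class FlatShipping:
    """Existing behavior: a fixed cost in cents, regardless of weight."""
    def cost(self, weight_kg):
        return 500

class WeightBasedShipping:
    """New feature, merged to mainline but switched off by default."""
    def cost(self, weight_kg):
        return 200 + 150 * weight_kg

def make_shipping_calculator(config):
    """Choose the implementation from configuration, not from a branch."""
    if config.get("weight_based_shipping", False):
        return WeightBasedShipping()
    return FlatShipping()
```

Releasing without the new feature is then a matter of leaving the key unset, for example `make_shipping_calculator({}).cost(3)`, rather than holding back a merge.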

The main thing that makes me nervous about PI is the influence on
human communication. With CI the mainline acts as a communication
point. Even if Professor Plum and Reverend Green never talk, they
will discover the nascent conflict within a day of it
forming. With PI they have to notice they are working on interacting
code. An up-to-date mainline also makes it easy for someone to be
sure they are integrating with everyone; they don't have to poke
around to find out who is doing what, so there is less chance of some
changes being hidden until a late integration.

PI arose out of open-source work, and the less intensive tempo
of open-source may be a factor here. In a full time job, you work
several hours a day on a project. This makes it easier for features
to be worked on in priority order. With an open source project people often
put in an hour here, and the next hour a few days later. A feature
may take one developer quite a while to complete while other
developers with more time are able to get features into a releasable
state earlier. In this situation cherry picking can be more
important.

It's important to realize that the tools you use are largely
independent of the integration strategy you use. Although many
people associate DVCSs with feature branching, they can be used with
CI. All you need to do is mark one branch on one repository as the
mainline. If everyone pulls and pushes to that every day, then you
have a CI mainline. Indeed with a disciplined team, I would usually
prefer to use a DVCS on a CI project than a centralized one. With a
less disciplined team I would worry that a DVCS would nudge people
towards long lived branches, while a centralized VCS and a
reluctance to branch nudges them towards frequent mainline
commits. Paul Hammant may be right: 'I wonder though, if a team
should not be adept with trunk-based development before they move to
distributed.'

"

Sunday, March 29, 2009

Software Configuration Management

Overview

This document is intended to explore some best practices for software configuration management.


Basics

  • All development groups or sub-groups need a versioning strategy

  • The versioning strategy must include non-production environments

  • Entry to any shared environment requires a strategy for automated build/deploy, version labeling and rollback


Versioning Strategy

It is widely acknowledged that versioning artifacts (be they source code, documents or otherwise) is a good practice. This is easily accomplished through a variety of means. However, environmental complexities can make versioning unwieldy. Some examples of environmental complexities include:

  • Multiple artifact owners/contributors

  • Multiple versions supported at a given time (through multiple production or non-production environments)

  • Dependencies, integration


Among application developers, there seem to be four varieties of versioning strategies in use:

  • Concurrent Versioning Systems

  • Versioning Systems (not concurrent)

  • DIY (folders, spreadsheets, etc.)

  • Nothing


Any of these may suffice depending on the overall complexity of the problems being solved. This document intends to make a case for moving from a lighter strategy to a more robust strategy as problem complexity increases. Many organizations face all of the complexities mentioned above, but in many cases application development is not prepared to manage them.


Analysis

There are some key questions an application developer may want to ask when solving a problem having to do with artifact contributions:

  • Does this problem require multiple people to contribute to an artifact?

  • Does this problem require multiple people to contribute to an artifact at one time?

  • Does this problem require multiple people to contribute to an artifact at one time who do not reside in the same location?

  • Etc.


Even without pursuing that line of questioning further, it may become apparent that the logistics behind a multi-collaborator environment would be more easily solved with the help of a tool.


A similar line of questioning can apply to the complexity of versions:

  • Does this problem require supporting multiple versions of the solution?

  • Does this problem require supporting multiple versions of the solution at one time?

  • Does this problem require supporting multiple versions of the solution at one time in a single environment, multiple environments, production, non-production…

  • Etc.


Tenets

Given a brief analysis of common problems/questions and techniques listed above, this section is intended to submit, for discussion, a list of tenets for application development:


  1. All project groups or sub-groups need a versioning strategy

  2. The versioning strategy must include non-production environments

  3. Entry to any shared environment requires a strategy for automated build/deploy, version labeling and rollback

  4. Configuration should be separate from code

  5. Code is not compiled in production, but rather it is migrated from an environment where it has been validated (see tenet 4)
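
As a small illustration of tenet 4, here is a hedged Python sketch in which the same code runs unchanged in every environment and only the externally supplied settings differ. The setting names (`DB_HOST`, `DB_PORT`) and their defaults are hypothetical.

```python
import os

# Defaults suitable for a developer machine (hypothetical names and values).
DEFAULTS = {"DB_HOST": "localhost", "DB_PORT": "5432"}

def load_settings(environ=None):
    """Read settings from the environment, falling back to the defaults.

    `environ` can be any mapping; it defaults to the process environment,
    so nothing environment-specific is baked into the code itself.
    """
    environ = os.environ if environ is None else environ
    return {key: environ.get(key, default) for key, default in DEFAULTS.items()}
```

Because the artifact carries no environment-specific values, the build validated in one environment can be migrated as-is to the next, which is what tenet 5 asks for.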


Questions to Consider

  • Can I progress through the environment stack from bottom (dev) to top (production)?

  • Can I progress through the environment stack from top to bottom?

  • Can I support multiple versions across environments (production or otherwise) at one time?

  • Can I rebuild/restore an environment to any version, from any version?

  • Can I have backwards compatibility?

  • Can I do all of these things in a repeatable way?


For all of the questions above it may not be necessary (or even a good idea in some cases) to do the things listed in the question. However, the real issue is whether you are nimble enough to do it when the problem you are solving calls for it.
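
To make a few of these questions concrete, the sketch below models environments that record which labeled versions they have received, with promotion up the stack and rollback to the previous label. It is a toy model under stated assumptions: the class, the function names and the version labels are all hypothetical, and a real setup would drive an actual build/deploy tool.

```python
class Environment:
    """A shared environment that remembers every version deployed to it."""

    def __init__(self, name):
        self.name = name
        self.history = []  # version labels, oldest first

    @property
    def current(self):
        return self.history[-1] if self.history else None

    def deploy(self, version):
        self.history.append(version)

    def rollback(self):
        """Return to the previously deployed version label."""
        if len(self.history) < 2:
            raise RuntimeError("no earlier version to roll back to")
        self.history.pop()
        return self.current

def promote(source, target):
    """Move the version validated in `source` up to `target` unchanged."""
    if source.current is None:
        raise ValueError("nothing has been deployed to " + source.name)
    target.deploy(source.current)
```

For example, after deploying "1.0.0" and then "1.1.0" to a dev environment and promoting each to production, calling `rollback()` on production returns it to "1.0.0".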

Saturday, March 28, 2009

Where's my Google Mobile?


I got my first Blackberry (a Curve) about 10 months ago. I fell in love with it and part of the reason was the Google Mobile App. It made it easy to access all of my favorite Google products without the hassle of opening the Blackberry native browser. Recently, I upgraded to a Blackberry Storm and assumed that I would immediately be downloading my favorite search engine's mobile application.

It turns out that the Google Mobile App is not yet supported for the Blackberry Storm. This is distressing to me. I have since found versions of the JAD file online and have made attempts to install them. I now have the application installed, but it doesn't work right with the touch screen and is still unsupported. What gives?

Sunday, February 22, 2009

Free Apps for Blackberry Storm

I've been thinking for a while about posting a listing of favorite apps for my Blackberry Storm.  Enough procrastinating...here it is:

1.  Gmail - If you use Gmail, then this is the mobile mail client for you.  It had HTML e-mail before the standard Blackberry mail client did.  It allows aggregation of multiple addresses and makes use of my favorite Gmail feature:  labels.

2.  SocialScope - I had been using TwitterBerry as a Twitter client for some time and had abandoned the Facebook app some time before.  That was when I discovered SocialScope (currently in private Alpha).  I requested an invite and received an immediate response.  This app allows me to update my status on social networks as well as follow my friends.  I know, lots of apps do this...but this app has a wonderfully pleasant (and intuitive) interface, and what's more it is fast.  If trying this on the Storm, be sure to disable compatibility mode.

3.  VyMail - Visual Voicemail.  This is awesome.  It sits on top of the YouMail service and provides an interface for retrieving voicemail.  The app notifies you of voicemail, but more importantly, it provides important details about the message.  Also, the message can be retrieved with a click (no dialing voicemail anymore!!).

To be continued...

Friday, February 13, 2009

Blackberry Storm

I've had a cell phone for years (of course), but didn't use it for much more than calling. No texting, no data plan...nothing. That all changed about eight months ago when I purchased my first Blackberry. It was a Blackberry Curve 8330 and before long I had installed so much freeware that it became a bit bogged down.  It was time to move on to a new device.

My dilemma was that I wasn't sure if I wanted a Blackberry Storm or if I wanted an iPhone.  Obviously, I had heard good things about the iPhone, but after a few short months, I had become a Blackberry zealot.  In the end, it came down to a decision based on provider.  Ultimately, I decided to stick with Verizon (for reasons I won't go into) and as such, the Storm was the easier choice.

I had read some pretty bad reviews for the Storm right out of the gate, so I was skeptical.  However, I knew that a new OS had leaked (4.7.0.99) that week and I was in a gambling mood.  After trying several stores, I finally scored my device.  Within an hour, I was home installing the new (leaked) OS.  Everything worked like a charm.

I must say, I love this device.  It performs well, has a beautiful display (with plenty of real-estate) and great touch screen (including tactile feedback).  I literally have no complaints thus far.  Stay tuned for a list of my favorite Blackberry applications.

Saturday, January 31, 2009

Marketing at its finest

I'm still not tired of these commercials:



C'mon! It's a talking baby!

Rapture for the Geeks

I just finished reading Rapture for the Geeks by Richard Dooling. I picked it up @ the local Barnes and Noble while my companion was looking at cooking magazines (or something). I had a chance to read a bit before it was time to go and before too long I was drawn in. The subtitle itself was intriguing enough: "When AI Outsmarts IQ"...I decided I had to have it.

The opening few chapters were amusing enough but did spend some time on what I consider to be computer/software basics. However, I imagined that the author had to establish a vocabulary to be used throughout the book and didn't want to leave any readers behind.

The fundamental premise of the book concerns a concept to which I hadn't previously been exposed: the Singularity. According to Wikipedia, this is defined as: "a theoretical future point of unprecedented technological progress, caused in part by the ability of machines to improve themselves using artificial intelligence".

The book challenged me in all the ways that I hoped (abstraction, philosophy, technical facts). If you're looking for a quick read, pick it up.