Sunday, 3 October 2010

How to build a Java buildserver

At Zenbi, we develop software systems using the Java language and technology. In this article, I will give you an insight into the architecture of our buildserver and development infrastructure.

For those of you who have never used or even heard of buildservers, let me explain what they are and what they offer. Without a buildserver, you probably use only two tools in your development infrastructure: your IDE, such as Eclipse, IntelliJ or NetBeans, and a version control system (VCS), such as CVS, SVN or Git. Everyone checks their code into the VCS at the end of each day and when the application is done, the lead developer builds the deliverables with his/her IDE and emails them to the server admins, who install them on the servers. This can work very well in a small environment with only a few developers sitting in the same room, but when things get bigger and more complicated, the following problems arise:

  • There is no uniform way to build the final deliverables, because the local environment of the lead developer may change
  • Every developer must copy dependencies to his/her PC, leading to problems when different versions of dependencies are used
  • There is no uniform way to test the application. Unit-tests are run locally, making them dependent on the local PC environment
  • There is no way to automatically deploy the deliverables to a Development, Test or Production environment
  • You have no way of customizing the build-process of the deliverables. You could write some Ant-scripts, but they would probably only work on the PC on which the deliverables are built
  • You don't have statistics on the code, such as test coverage, bug-reports (such as PMD, Checkstyle or Findbugs) or metrics (such as JDepend)
A buildserver offers a centralized component in the infrastructure that can build and analyze the deliverables, manage the dependencies and deploy the deliverables to the servers.

Components
So, what components make up a buildserver? First of all, all software that we use at Zenbi is Open Source: we use Linux, OpenOffice, Eclipse, etc. Our buildserver, too, is built completely with Open Source components, so if you want to copy our approach, you don't have to be afraid of costs.

The first component is one that everyone has, even people without a buildserver. It's the VCS, the version control system. This is an absolutely necessary component, used to keep track of changes in source-code and to provide a central storage point for all source-code. At Zenbi, we use SVN for this. I also have experience with CVS, but that one is clearly outdated and inferior to SVN. The new kid on the block here is Git. We haven't tried it yet, and the main reason I have my doubts about it is that, as far as I understand, it tracks file content only and does not version directories as items in their own right (an empty directory, for example, cannot be committed). Directory versioning was one of the major improvements of SVN compared to CVS, so I don't understand why Git doesn't have it. Anyway, use whatever VCS you like, they're all Open Source.

The second component is used to actually build the deliverables from the source-code. As I said earlier, without a buildserver, you would do this with your IDE (for example Eclipse), but since this is a "client" application, you cannot use it to build in batch-mode (from the command-line). Therefore, you will need another tool; let's call it a Batch Builder. For this, we use Maven 2. Here too, there are many alternatives, Ant being the most famous one, but we chose Maven because it offers much more than just building your deliverables. Maven also provides a lot of plugins with which you can extend and modify the build-process, such as reporting plugins that generate useful HTML-reports about your source-code (possible bugs in your code, test coverage by unit-tests, which are also automatically run by Maven during each build, and dependency metrics). You can write your own plugins too, if you want, to customize the build-process even further. Finally, Maven offers a central repository that contains all dependencies (jar-files) that you need. You can access this repository from your IDE (such as Eclipse), so that you no longer have to copy all dependencies to your workstation. There is then only one place that contains dependencies, which prevents version-hell. Maven also copies the actual deliverables from the build-process to this repository, so that you can re-use them as dependencies in other projects. Maven does add some configuration files to each project, but it's definitely worth it.

Maven, however, is a command-line tool: you start it, it runs, and when it is finished, the process ends. There is no background process that is running continuously (I just said that you can access the Maven repository from your workstation, but this is filesystem-access, a.k.a. a "share"; it does not require a separate "server"-process). So, who starts Maven? Does the buildserver administrator log in each day to start Maven for each project? Of course not. For this, we have the next component, the Continuous Integration Tool (CIT), and we at Zenbi use Luntbuild for this. Hudson is also a very popular and good choice, but CruiseControl and Apache Continuum are outdated; don't use them. The CIT runs as a background-process with a web-interface (Luntbuild has an embedded Jetty webserver). You can start builds from that web-interface, but you can also configure the CIT to schedule automatic builds (for example each night). This way, all source-code that is checked in by the end of the day is automatically built during the night. Luntbuild keeps its own "local working copy" (LWC) of each project, which is basically a checkout from the VCS. Before each build, it updates the LWC and then runs Maven on it. Maven builds and deploys to the Maven repository, and Luntbuild tags the current version in the VCS with the current buildnumber, so that you know which build belongs to which version of the source-code. It's that easy!
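The update, build and tag cycle described above can be sketched in a few lines of Java. This is only an illustration of the idea, not how Luntbuild is actually implemented: the class name BuildCycle, the tag naming scheme "build-N" and the tags URL are assumptions made for the example. The svn and mvn commands themselves are the regular command-line tools.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of one build cycle; not Luntbuild's real implementation. */
final class BuildCycle {

    /** Returns the commands for one cycle, in the order they would be run. */
    static List<List<String>> commandsFor(String tagsUrl, int buildNumber) {
        List<List<String>> commands = new ArrayList<>();
        // 1. Bring the local working copy up to date.
        commands.add(Arrays.asList("svn", "update", "--non-interactive"));
        // 2. Build, run the unit-tests and deploy to the Maven repository.
        commands.add(Arrays.asList("mvn", "--batch-mode", "clean", "deploy"));
        // 3. Tag the working copy that was just built (assumed naming: build-N).
        commands.add(Arrays.asList("svn", "copy", ".", tagsUrl + "/build-" + buildNumber,
                "-m", "Tag build " + buildNumber));
        return commands;
    }

    /** Runs the commands inside the working copy and stops at the first failure. */
    static boolean run(File workingCopy, List<List<String>> commands)
            throws IOException, InterruptedException {
        for (List<String> command : commands) {
            Process process = new ProcessBuilder(command)
                    .directory(workingCopy)
                    .inheritIO()
                    .start();
            if (process.waitFor() != 0) {
                return false; // a failed step marks the build as broken
            }
        }
        return true;
    }
}
```

A scheduler would call run(...) for each project every night; because a failing step stops the cycle, a build that does not compile is never deployed or tagged.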

Automatic nightly builds have another advantage that is also very funny and brings joy to the department! Imagine for example, that someone checks in code that does not compile. Then, at night, the Maven-build will fail and Luntbuild will display a big fat red dot next to the corresponding project. Thanks to the VCS, it is easy to find out which developer checked in the faulty code and he/she has to buy cake for the entire department the next day! This way, everyone will make sure that their code works, before checking in.

Beware, though, of people who avoid checking in at all, because they know that their code won't compile... This is dangerous! The longer someone's code isn't integrated with the other people's code (hence the name "Continuous Integration"), the bigger the incompatibilities when it finally is. Also, workstations usually aren't backed up regularly, unlike the buildserver, so if a workstation harddisk crashes, it's goodbye to all code that wasn't checked in...

I told you that Maven generates useful HTML-reports about your source-code. These are just HTML-files that can be placed anywhere by Maven, but they will have to be published somehow. Therefore, we have also installed a Webserver on our buildserver. We use Apache Tomcat for this, but feel free to use any other.

We have also installed a Wiki on the Webserver, for general information sharing in the team. This doesn't really have anything to do with the build-duties, but the buildserver provides a centralized location for the development team and we have a Webserver running here anyway. Our choice here is JSPWiki, but there are many alternatives.

Next up is the database. Since almost every application uses a database, you need one for testing. Most companies use a database in the Development environment for this, but I'm not a very big fan of this approach. The Development environment is supposed to be identical to the Production environment (in architectural terms, not content), so that developers can test their applications before release. When you use the database in this environment for other purposes (local testing, local unit-testing or buildserver unit-testing), you "pollute" the database. The Development environment is then no longer identical to the Production environment. Therefore, we have installed a separate database on the buildserver, used for said local testing, local unit-testing and unit-tests run by the buildserver. Our choice here is MySQL, because we use that in Production too. Be sure to use the same database as in the other environments, like Test and Production!
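One way to keep unit-tests away from the shared environments is a small guard that inspects the JDBC URL before any test touches the database. The sketch below is an assumption about how such a check could look, not something we ship: the class name TestDatabaseGuard and all host names are made up for the example.

```java
import java.net.URI;
import java.util.Set;

/** Hypothetical guard that keeps unit-tests off the shared environment databases. */
final class TestDatabaseGuard {

    /** Extracts the host from a URL of the form jdbc:mysql://host:port/schema. */
    static String hostOf(String jdbcUrl) {
        if (!jdbcUrl.startsWith("jdbc:")) {
            throw new IllegalArgumentException("Not a JDBC URL: " + jdbcUrl);
        }
        // Strip the "jdbc:" prefix so the rest parses as a normal URI.
        String host = URI.create(jdbcUrl.substring("jdbc:".length())).getHost();
        if (host == null) {
            throw new IllegalArgumentException("No host in JDBC URL: " + jdbcUrl);
        }
        return host;
    }

    /** Fails fast when the tests are pointed at a Development, Test or Production database. */
    static void check(String jdbcUrl, Set<String> sharedEnvironmentHosts) {
        String host = hostOf(jdbcUrl);
        if (sharedEnvironmentHosts.contains(host)) {
            throw new IllegalStateException(
                    "Unit-tests must not run against " + host + "; use the buildserver database");
        }
    }
}
```

Calling check(...) from the set-up of the test suite, with the hosts of the shared environments as the second argument, makes a misconfigured workstation fail immediately instead of quietly polluting another database.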

And finally, the last component. It's the SystemManager, used to deploy the deliverables (we call them Systems, hence the name SystemManager) to the different environments, such as Development, Test and Production. I have to take back what I said earlier: this one is not Open Source, we built it ourselves. Deploying to an environment is usually very company-specific, so tools like this will almost always have to be developed in-house. What it does is very simple, though: it takes the most recent deliverables from the Maven repository and deploys them to the specified environment. The SystemManager is a command-line tool without a web-interface, so an admin has to log in to the buildserver in order to run it. We chose this option for security reasons, because you don't want just anyone to be able to deploy. If you do want a web-interface, well, it's your tool, so go ahead and develop it!
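To give an idea of how little such a tool needs to do, here is a minimal sketch of the "take the most recent deliverable and copy it to an environment" step. It is not our actual SystemManager: the class name DeploySketch is made up, "most recent" is assumed to mean the newest file modification time, and an environment is assumed to be just a directory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Comparator;
import java.util.Optional;
import java.util.stream.Stream;

/** Hypothetical sketch of a deploy step; not the real SystemManager. */
final class DeploySketch {

    /** Finds the most recently modified file with the given extension below artifactDir. */
    static Optional<Path> newestDeliverable(Path artifactDir, String extension) throws IOException {
        try (Stream<Path> files = Files.walk(artifactDir)) {
            return files
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(extension))
                    .max(Comparator.comparingLong((Path p) -> p.toFile().lastModified()));
        }
    }

    /** Copies the deliverable into the directory of the target environment. */
    static Path deploy(Path deliverable, Path environmentDir) throws IOException {
        Files.createDirectories(environmentDir);
        return Files.copy(deliverable, environmentDir.resolve(deliverable.getFileName()),
                StandardCopyOption.REPLACE_EXISTING);
    }
}
```

A real tool would also have to stop and start the application server, handle configuration per environment and so on, which is exactly the company-specific part.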

To make things a little clearer, below is a picture with the different components of our development infrastructure. Between parentheses is the actual name of the tool that we use, but as I said, you can choose others.


Extensions
There are some possible extensions to this setup that are sometimes used by other companies. One of these is a repository manager like Archiva or Artifactory. It acts as an intermediary between the Maven repository and the components that want to use the Maven repository, and offers functionality such as version control on dependencies and identification of the person who submitted a dependency to the repository. We don't need such a component, because we see no harm in accessing the Maven repository directly and the buildserver administrator is the only party authorized to add new dependencies to the repository. Also, the above picture is complex enough for the average developer. Remember, we are not all buildserver admins. If you keep the usage of the buildserver simple, developers are much more likely to contribute to a clean environment. Make it complex and they will see the buildserver as a difficult piece of overhead that is best left alone.

Simple goals and rules lead to intelligent behaviour,
complex goals and rules lead to stupid behaviour

- attributed to Albert Einstein -

Another thing to take care of is release management. You want to have control over the different versions that are released. This is also an issue where a repository manager can help. We, however, keep this very simple. Basically, we stick to the Maven standard of keeping the head revision of the application a SNAPSHOT revision. Every iteration, we create a branch for the release. Changes in that branch will also have to be merged into the trunk, of course. Because the branches and the trunk have different Maven versions, Maven keeps them separated in the Maven repository. That way, you don't need a repository manager to keep track of the different versions that you have.
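The reason the versions stay separated is the standard Maven repository layout: each artifact is stored under groupId/artifactId/version, with the dots in the groupId turned into directories. The small sketch below only illustrates that layout; the class name RepositoryLayout and the coordinates used in the example are made up.

```java
/** Illustrates the standard Maven repository layout; coordinates are hypothetical. */
final class RepositoryLayout {

    /** Returns the directory, relative to the repository root, for one artifact version. */
    static String directoryFor(String groupId, String artifactId, String version) {
        return groupId.replace('.', '/') + "/" + artifactId + "/" + version;
    }

    /** The trunk keeps a version ending in -SNAPSHOT; release branches do not. */
    static boolean isSnapshot(String version) {
        return version.endsWith("-SNAPSHOT");
    }

    public static void main(String[] args) {
        // Trunk and release branch end up in different directories.
        System.out.println(directoryFor("org.example", "webshop", "1.3-SNAPSHOT"));
        System.out.println(directoryFor("org.example", "webshop", "1.2"));
    }
}
```

With these example coordinates, the trunk build lands in org/example/webshop/1.3-SNAPSHOT and the release branch in org/example/webshop/1.2, so neither overwrites the other.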

Apart from that, each build gets its own version number from Luntbuild. We don't really use those version numbers, but because Luntbuild tags each build in the Subversion repository, we can always get the source-code of a certain build back. Luntbuild even supports re-building. That way, you can re-build a previous build, with the source-code as it was at that time. We have also developed a Maven plugin that writes the buildnumber to the actual application (in an xml-file), so that you can identify the buildnumber of a running application.
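Writing the buildnumber to an xml-file and reading it back in the running application takes very little code. The sketch below is not our Maven plugin: it uses the XML format of java.util.Properties for brevity, and the class name BuildInfo and the key "build.number" are assumptions made for the example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

/** Hypothetical sketch of storing a buildnumber in an xml-file. */
final class BuildInfo {

    private static final String KEY = "build.number"; // assumed key name

    /** Called at build time: writes the buildnumber to the given xml-file. */
    static void write(Path file, int buildNumber) throws IOException {
        Properties props = new Properties();
        props.setProperty(KEY, Integer.toString(buildNumber));
        try (OutputStream out = Files.newOutputStream(file)) {
            props.storeToXML(out, "Written during the build");
        }
    }

    /** Called by the running application: reads the buildnumber back. */
    static int read(Path file) throws IOException {
        Properties props = new Properties();
        try (InputStream in = Files.newInputStream(file)) {
            props.loadFromXML(in);
        }
        return Integer.parseInt(props.getProperty(KEY));
    }
}
```

The application can then show the result of read(...) on an "about" page or in its log at startup, which makes it easy to match a running System to a tag in Subversion.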

So, my advice is, if you want to extend your buildserver, keep one thing in mind: keep it simple! At my previous job, the buildserver was far more complex, with both Tomcat and WebSphere running, with a CI tool (CruiseControl) that consisted of two separate parts, with a custom-made management application, with a separate release versioning tool (Harvest), with four different version-numbers per application (Maven version, CruiseControl buildnumber, Harvest version and a so-called "application-version" that I'm not even getting into), etc., etc. I have ample experience with developers avoiding that buildserver as much as they could, because it was just too complicated. And still, some people wanted to install a repository manager on top of all that! Luckily, I was in a position to prevent that...

It takes an intelligent person to build something complex;
it takes a genius to build something simple

- attributed to Albert Einstein -

Be the genius!