Software metrics (PHP focused) part 1

By | 2009-09-12

Managing a software project with various programmers and around 10 contributions per day is a complex thing.
How can we measure the quality of every single contribution?
How can control the work of the programmers?

First stop: Lines of code

The easiest thing to measure in a project is the source lines of code it has. For example, the main project we are developing at work has ~30k lines (k=1000). That is, 24k of pure code and 7k of comments (and I’m counting only pure PHP OO code, without HTML, CSS or javascript).

Is this that simple to count? No, it’s not. The main problem is that programming is brain work, where creativity, skills to solve problems, and smartness are put in play. It’s like writing a novel: can you say that one novel is better than other just counting the pages it has? Can you say a Boeing 717 is worse than a 747 because it weights less? As the Wikipedia’s article says, only this metric can be useful when comparing 2 projects with different order of magnitude. However, the ratio comments per lines of code (in my case 1 comment every 4 lines) usually is a good index of quality.

Second stop: Average lines per day

Calculating the average lines of code per day can be tricky as well. At my job, a senior developer usually does ~50 lines per day, but a junior programmer does ~130 lines. Is that much? Well, having a look at some metrics about a similar project, phpMyAdmin, in Ohloh (a website with metrics for open source soft), and doing some maths, it’s suggestion is just 17 lines per day! On the other hand, if you google for “lines of code per day” you will get really wide values, from tops of 1k (using code generators: tricking) to normal less-than-100 values. Moreover, the deviation from this average value is actually huge: one day I can do 2 lines, other day I can do 200.

Senior does 50 lines, junior 130… is the senior one a slacker? Of course is not… usually the code from the senior one is better: less prone to errors, more adaptable, more concrete function’s names, more elegant, and does more things in less lines. The more the better? Actually the less code needed to solve a problem, the better. About this topic, I recommend reading the article : Code is your enemy!.

Tools? We use phploc for counting lines of code, and phpcpd for detecting cases of duplicated code. Both tools are developed by Sebastian Bergmann, the author of PHPUnit (the most popular testing framework for PHP).

Next, in part 2 of this post, I’ll be speaking about other metrics, like code coverage, cyclomatic complexity and some interesting ratios based on software package metrics.