
Does Your Programming Language Influence Code Quality?

Recently, there were a couple of interesting tweets about programming languages. Which one is the best? According to the tweets, the question has been settled.


Let the flame war begin!

This question is an endless source of flame wars. Ruby is better than Java because it has dynamic typing, which allows for short, concise source code. “Yeah, right,” the Java folks say, “your programs read nicely, but they crash at your customer’s site because type checking is left to the runtime.” You’ve got it all wrong, say the followers of LISP and Clojure: object-oriented programming is an expensive disaster that must end. Don’t you see it’s all about functions? My boss is inclined to agree. “When we’re programming NATURAL on the mainframe, my programmers never run into the kind of problems you Java gals and guys are fighting,” he frequently says, adding, “Are you sure objects are an improvement over procedural programming?”

C programmers shake their heads in despair. How can you waste so many CPU cycles? Don’t you ever think about memory consumption? “But we do,” assembler programmers say. “We know exactly how many cycles our instructions take,” they claim, only to hear: “So what? Our GNU C optimizer knows even better.”

Counting the languages to choose from

I could continue this list endlessly. By the mid-90s, more than a thousand programming languages had been invented, and every single one had its followers. I don’t have a clue how many programming languages exist today, but it’s a safe bet there are at least 10,000. I, too, have invented at least one language (probably more, depending on how you define a programming language). I didn’t promote it, so nobody has ever counted it. Doubtless, that’s the fate of most languages, which is why counting them is a moot point.

Be that as it may, there are roughly 50 popular programming languages to choose from. Which one should you choose for your next project? Sometimes the answer is clear. If your company runs SAP, you have to choose ABAP, number 39 on the October 2014 TIOBE index of programming languages. If you’re into statistics, you can choose between R and SAS. Bloggers and web site owners almost inevitably have to learn PHP (unless they’re content with standard software).

Factors limiting your freedom of choice

Most of the time you have more choice. Even web front-end programming, once the exclusive domain of JavaScript, can be done in a variety of languages: Dart, Ceylon, TypeScript, CoffeeScript, AtScript and, thanks to Emscripten and Clang, even native C code. Just think of the famous port of DOOM. Most web sites covering the JavaScript port of DOOM are heavily infested with trackers, so I won’t provide a link, in order to protect your privacy. Suffice it to say I’ve seen an impressive live demo by Google’s Daniel Kurka, who played the JavaScript port of DOOM at the JAX 2013 conference. The funny thing is that the demo attracted so much attention that searching Google for “Javascript doom” turns up hardly any articles about the doom of JavaScript.

So we’ve got some 50 languages to choose from. Asking the internet doesn’t tell you which language is best. Neither does asking your friends or colleagues: every single one of them is biased.

Examining the language question scientifically

Time to approach the question scientifically. Are certain programming languages better suited to a specific domain than others? Which programming language makes you more productive?

One of these questions has been tackled by Baishakhi Ray, Daryl Posnett, Vladimir Filkov and Premkumar T. Devanbu of the University of California, Davis. Their study examines whether the programming language has an impact on code quality. Obviously, that’s only one of a myriad of aspects to take into account, but it’s doubtless an important one.

Preliminary results

Cutting a long story short, the study says that

  • Your programming language does have an influence on code quality.
  • Functional languages are better than procedural languages.
  • Strong typing is better than weak typing.
  • Static typing is better than dynamic typing.
  • Managed memory (i.e. using a garbage collector) is better than unmanaged memory (i.e. calling malloc() and free() manually).
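Strong versus weak and static versus dynamic typing are two independent axes, which is easy to forget in the heat of a flame war. Here’s a small Python sketch of my own (not taken from the study) to illustrate: Python is dynamically but strongly typed. The function `add_tax` and its parameters are hypothetical, chosen only for the example.

```python
def add_tax(price, rate):
    """Return price plus tax. There are no type annotations, so nothing
    checks the argument types before this line actually runs (dynamic typing)."""
    return price + price * rate


def try_call(price, rate):
    """Call add_tax and return either its result or the runtime error."""
    try:
        return add_tax(price, rate)
    except TypeError as err:
        # Strong typing: Python refuses to treat the string "100" as a number.
        return f"TypeError: {err}"


print(try_call(100, 0.25))    # 125.0
print(try_call("100", 0.25))  # TypeError: can't multiply sequence by non-int of type 'float'
```

The error only shows up when the faulty call is executed, possibly at your customer’s site. Annotating the parameters (`price: float, rate: float`) and running a static checker such as mypy would report a call like `add_tax("100", 0.25)` before the program ever runs. A weakly typed language behaves differently again: in JavaScript, the same expression with `"100"` silently evaluates to the string `"10025"`.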

But the study also says the influence is small. Other factors are much more important. Code quality differs only by a small percentage between languages. So it’s better to look at your team’s skills when choosing a language. First of all, you should care about your team’s motivation. Lack of motivation is the number one killer of productivity. Closely related are corporate politics and bureaucracy. You shouldn’t even think about improving developer productivity as long as you spend a major part of your budget on politics or bureaucracy.

Weaknesses of the statistical approach

In any case, you have to take the result with a grain of salt. It’s an interesting study, performed meticulously, but it’s hard to tell whether the raw data is biased. Better consider this a preliminary result: more likely than not, it will turn out to be correct, but there’s little in the way of certainty.

The study took a brute-force approach, mining GitHub repositories. The researchers selected some 750 popular projects with a decent history. It’s the commit history the researchers were interested in: they evaluated the commit messages, counting commits containing the words “error”, “bug”, “fix” and half a dozen similar keywords.
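Here’s how I imagine such a keyword heuristic, as a minimal Python sketch. It’s an assumption, not the researchers’ code: the article only names “error”, “bug” and “fix”, so the remaining entries in `BUG_KEYWORDS` are placeholders, and `is_bug_fix`, `defect_ratio` and the sample commit messages are hypothetical.

```python
import re

# Placeholder keyword list: only "error", "bug" and "fix" come from the
# article; the other entries are assumptions for illustration.
BUG_KEYWORDS = ("error", "bug", "fix", "issue", "mistake",
                "incorrect", "fault", "defect", "flaw")

# Match words that *start* with a keyword, e.g. "fixes", "bugs", "Bugfix".
BUG_PATTERN = re.compile(r"\b(" + "|".join(BUG_KEYWORDS) + r")\w*", re.IGNORECASE)


def is_bug_fix(message: str) -> bool:
    """Classify a commit as a bug fix if its message contains a keyword."""
    return BUG_PATTERN.search(message) is not None


def defect_ratio(messages: list[str]) -> float:
    """Share of commits classified as bug fixes (0.0 for an empty history)."""
    if not messages:
        return 0.0
    return sum(is_bug_fix(m) for m in messages) / len(messages)


history = [
    "Fix NPE in login handler",               # counted
    "Add CSV export",                         # not counted
    "Bugfix: off-by-one in pager",            # counted
    "Return empty list when cache is cold",   # a bug fix, but not counted
]
print(defect_ratio(history))  # 0.5
```

The last message describes a fix in terms of the corrected behavior, so the heuristic misses it. Conversely, a harmless commit such as “Add test fixture” would be counted as a bug fix.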

That’s a potential weakness of this approach. It assumes that bug fixes can be recognized by these keywords, and that the more bug fixes are committed, the worse the code quality of the project. Who’s to say the first assumption is valid? If it’s, say, a French project, it’s unlikely any commit message contains such a word. The paper left me with the impression that only projects “speaking English” were considered, but I can’t be sure, which is one of the sources of doubt. Mind you, every programming language tends to form its own closed community with its own jargon. Maybe one of the communities under examination prefers a different word. Even more likely, maybe there’s a different culture of dealing with errors. Admitting an error isn’t easy. Maybe some developers prefer to call a “bug fix” an “optimization” or “improvement”? For what it’s worth, I often describe the corrected behavior instead of the bug, so there’s no need to include the words “bug” or “fixed”.

I suppose the researchers did examine the committers’ vocabulary first, but the 11-page summary doesn’t mention it, so there’s no way to be sure. On the other hand, the sheer number of projects may compensate for the effects of project-specific cultures. Oh, wait, that’s another source of doubt: granted, the researchers examined 750 projects, which sounds like a huge number (and a lot of work). But that’s only a few dozen projects per language. Is that enough to provide reliable statistical data?
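To get a feeling for the sample-size question, here’s a back-of-the-envelope Python sketch. The study’s own statistics are more elaborate; this only shows how the uncertainty of a simple average shrinks with the number of projects. The spread of 0.15 in per-project bug-fix ratios and the figure of 40 projects per language are assumptions for illustration, not numbers from the paper.

```python
import math
from statistics import NormalDist


def ci_half_width(std_dev: float, n: int, confidence: float = 0.95) -> float:
    """Half-width of a normal-approximation confidence interval for a mean."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95 %
    return z * std_dev / math.sqrt(n)


# Assumed standard deviation of per-project bug-fix ratios (hypothetical).
SPREAD = 0.15

per_language = ci_half_width(SPREAD, 40)   # "a few dozen" projects per language
all_projects = ci_half_width(SPREAD, 750)  # the whole sample

print(f"n=40:  +/- {per_language:.3f}")  # n=40:  +/- 0.046
print(f"n=750: +/- {all_projects:.3f}")  # n=750: +/- 0.011
```

With a few dozen projects per language, the margin is roughly four times as wide as for the whole sample (√(750/40) ≈ 4.3). If languages differ by only a few percentage points, an uncertainty of about ±4.6 points per language can easily swallow the difference.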

Keep in mind that this is a statistical study. Such an approach reveals correlations, but in most cases it doesn’t prove anything. It gives you a hint how things might be, but you can’t be sure as long as you don’t understand why the results are the way they are. Other studies of this kind have been conducted, too, and many of them point in the same direction: prefer strong typing over weak, static over dynamic, functional over procedural and managed memory over unmanaged memory. Incidentally, I doubt anybody questions the last point: anybody who’s ever programmed native C knows how hard it is to manage memory manually.

So how do you choose the perfect language for your next project?

First of all, ask your team. Which languages do they “speak”? Is there an established language in your company already? If so, are your team members happy with the current language, or do they want to try something new?

Take your company’s culture into account. Do your developers prefer statically or dynamically typed languages? My observation is that there are two kinds of developers: the “enterprise guys”, who love static typing because of its type safety, and the other faction, who don’t like the verbosity of static typing. Forcing a “dynamic” developer to use a statically typed language won’t help your productivity, unless your programmer is open to change. Look out for such a window of opportunity. If you see one, heed the recommendation of the study.

Mind you: I’ve already seen very productive programmers using Assembler to implement business software. Sounds like an unlikely pairing, doesn’t it? Under the right circumstances, you can run your entire company using Assembler.

Dig deeper

A Large Scale Study of Programming Languages and Code Quality in Github

The discussion of the paper on Reddit
A similar discussion on Hacker News