shawncat
9/22/2019 - 1:21 PM

Code style

My preferred code style is 2-space K&R. This is intended to provide a justification for this style.

Why K&R?

K&R style has the following properties:

  1. Provides symmetric size (in terms of screen space consumed) between the opening and closing syntax of a code block.
  2. Forces no empty/meaningless lines, thereby avoiding artificial distance between related things that should be together.
  3. Consumes the minimum vertical space while keeping the opening and closing syntax of the block on separate lines from the content.

Symmetric size

Symmetry in size is important because it provides a visual cue that two things are similar in effect/importance.

For example, in written text, the lines of a paragraph are the same distance apart. The space between paragraphs is larger, but is also the same between each pair. The spaces between subsections, sections and chapters are progressively larger still. The space between things indicates their relatedness, and the size of a heading indicates how large a topic change it introduces. All line spacings are the same, all paragraph breaks are the same size, all chapter headings are the same size, etc.

Two subsequent statements in a program are related by one happening after the other, i.e. one directly inheriting the state of the previous. Within a control structure, the first statement inside the structure and the statement before the structure are less related than two sequential statements, and so the introducing line of the control structure increases the space between them, and specifies the different nature of their relationship. The finishing line has the opposite, but equivalent role. The opener transitions fully into a block and the closer transitions fully out. Their real effect on the meaning of the code above and below each of them is of the same magnitude, in opposite directions. In K&R, the space the opener and closer of a control structure occupy on the screen is equivalent, to reflect their equivalent importance in terms of effect on the meaning of the code above and below each.

Comparison

Here is a comparison of brace styles found on Wikipedia:

Name | Example | Open/close equal height | Open/close equal indent | Open/close don't share lines with content
K&R
Allman
GNU
Whitesmiths
Horstmann
Pico
Ratliff
Lisp

Similarity to other syntax

Notice the parallel that results between a stand-alone C (or a C-style language) block,

{
  foo();
  bar();
}

an if block in C (or a C-style language) (K&R style),

if (x == y) {
  foo();
  bar();
}

a PHP block-style if block,

if (x == y):
  foo();
  bar();
endif;

a Ruby if block,

if x == y
  foo()
  bar()
end

an if block in the Fish shell,

if x == y
  foo
  bar
end

some HTML/XML,

<ul style="...">
  <li>foo</li>
  <li>bar</li>
</ul>

and an array/object/dictionary literal (JSON, JavaScript, Python, PHP, Ruby...).

var things = [
  'foo',
  'bar',
];

var things = {
  foo: 'foo',
  bar: 'bar',
};

All of these follow a simple pattern:

(introducer)
  (entries)
(finisher)

One line to open, one line to close.

The visual relationship between the opener of the block and its contents is also the same as in languages which use the off-side rule, such as CoffeeScript, YAML, Python, Haskell and Sass:

if foo():
  bar()
  baz()

boo()

The only difference is the lack of a finishing line: a dedicated closer is not needed because the block is closed by returning to the previous indent.

It is also the same as a control structure lacking braces (if permitted by your code style):

if (foo())
  bar();

if (foo()) {
  bar();
}

This means that the presence/absence of braces has a less dramatic effect on the code's layout. Compare these code samples, for example:

function typeName(obj) {
  if (obj.isFoo())
    return 'foo';
  else if (obj.isBar())
    return 'bar';
  else if (obj.isBoo()) {
    log_notice('Cannot get name of a boo');
    return null;
  } else
    throw new Error();
}

function typeName(obj)
{
  if (obj.isFoo())
    return 'foo';
  else if (obj.isBar())
    return 'bar';
  else if (obj.isBoo())
  {
    log_notice('Cannot get name of a boo');
    return null;
  }
  else
    throw new Error();
}

Meaningless lines

Code in K&R style has no enforced meaningless lines. Each line tells the reader something they wouldn't otherwise know, so they can progress from one line to the next, gathering new information at each. Blank lines are not bad per se: they are useful for grouping related items, and hitting a blank line is a signpost, like a paragraph or section break in a book, that the old topic ends and a new one starts. But whether a blank line is appropriate in any given place depends entirely on the context, and so cannot be decided by the coding style. Under K&R, every blank line has a purpose, decided by the programmer based on the context: to group related lines together.

Here is an example of the consequence of forced meaningless lines:

$foos = array();
foreach (getFoos() as $foo)
{
  $foos[] = transformFoo($foo);
}

Under Allman style, a nearly blank line (the lone opening brace) separates the initialisation of $foos and the loop header from the loop body. That gap implies they are unrelated, when in fact they are closely related. The whole snippet belongs to a single topic, "array of transformed $foos", and the Allman style has artificially inserted a topic break inside it.

Under K&R, the whole code block can be properly treated as a single visual unit:

$foos = array();
foreach (getFoos() as $foo) {
  $foos[] = transformFoo($foo);
}

Vertical space

K&R style minimises the amount of vertical space the code consumes, while ensuring that the syntax of the control structure itself does not share lines with its contents.

Ensuring the control structure doesn't share lines with its content is important. Lines are the "boxes" or categories in which related syntax goes. In the case of a code block, all the syntax related to a given statement goes on the same line. This not only helps visual comprehension, but means line-wise operations (triple-click select line, delete line, cut line, duplicate line, source code diff etc.) are meaningful. With Lisp-style bracing, for example, the closing brace sits on the same line as the last statement, which means that the "delete line" operation cannot be used to delete the last statement, and adding a statement to the end of the block creates a "-1 lines +2 lines" diff instead of only "+1 lines".
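The diff claim above can be checked with a small sketch. It uses Python's standard difflib module to count removed and added lines when bar(); is appended to a block. The snippets and the changed_lines helper are made up for illustration, and real diff tools may present the change differently.

```python
import difflib


def changed_lines(before, after):
    """Return (removed, added) line counts for a line-wise diff."""
    a = before.splitlines()
    b = after.splitlines()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    removed = added = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            removed += i2 - i1
        if tag in ("replace", "insert"):
            added += j2 - j1
    return removed, added


# K&R: the closing brace sits on its own line.
knr_before = "if (x) {\n  foo();\n}"
knr_after = "if (x) {\n  foo();\n  bar();\n}"

# Lisp style: the closing brace shares a line with the last statement.
lisp_before = "if (x) {\n  foo(); }"
lisp_after = "if (x) {\n  foo();\n  bar(); }"

print("K&R :", changed_lines(knr_before, knr_after))    # (0, 1)
print("Lisp:", changed_lines(lisp_before, lisp_after))  # (1, 2)
```

In the Lisp-style case the existing foo(); line has to be rewritten to lose its brace, so the diff touches a statement that did not logically change.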

Minimising the amount of vertical space the code uses is important for information density, i.e. the total amount of information that is readily available per unit of screen space. Using fewer lines helps the reader get a "bird's-eye" view of the code without using a smaller font and without removing the indents and purposeful blank lines that give the code its visual structure. The more code that fits on screen without compromising its visual structure, the more readily the code can be read.

Consider you have two statements:

foo();
bar();

and you want to wrap them in an if block wrapped by a for loop. The K&R result is:

for (...; ...; ...) {
  if (...) {
    foo();
    bar();
  }
}

While the Allman result is:

for (...; ...; ...)
{
  if (...)
  {
    foo();
    bar();
  }
}

Under K&R, the overhead in consumed lines for each control structure is 2. In Allman the overhead is 3, i.e. a 50% higher cost in vertical space.
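The line counts above can be reproduced with a short sketch that generates both layouts and measures them. The wrap and render helpers are hypothetical names introduced only for this illustration.

```python
def wrap(body, header, style):
    """Wrap a list of code lines in one control structure (2-space indent)."""
    indented = ["  " + line for line in body]
    if style == "knr":
        # Opening brace shares the header line: 2 lines of overhead.
        return [header + " {"] + indented + ["}"]
    if style == "allman":
        # Opening brace gets its own line: 3 lines of overhead.
        return [header, "{"] + indented + ["}"]
    raise ValueError(f"unknown style: {style}")


def render(style):
    """Build the for/if example from the text in the given brace style."""
    statements = ["foo();", "bar();"]
    inner = wrap(statements, "if (...)", style)
    return wrap(inner, "for (...; ...; ...)", style)


for style in ("knr", "allman"):
    lines = render(style)
    overhead = len(lines) - 2  # subtract the two original statements
    print(f"{style}: {len(lines)} lines, {overhead} lines of overhead")
```

With two nested control structures this gives 4 lines of overhead for K&R and 6 for Allman, matching the 2 versus 3 lines per structure described above.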

Survey

This is a survey of well known software projects/companies and their brace/indent style.

Note that some codebases use K&R style for control structures but Allman style for classes and functions. These have been grouped under "K&R".

Company/Project | Brace Style | Indent Type | Indent Size | Spaces inside ( )? | Reference
Google | K&R | spaces | 2 | no (allowed) | Google C++ style guide, Google Java style guide
V8 (JavaScript engine) (Google) | K&R | spaces | 2 | no | code
HHVM (Facebook) | K&R | spaces | 2 | no | HHVM guidelines, see also code
Proxygen (Facebook) | K&R | spaces | 2 | no | code
Phabricator (Facebook) | K&R | spaces | 2 | no | Phabricator coding standards
.NET CLR (Microsoft) | Allman | spaces | 4 | no | code
C# Guidelines (Microsoft) | Allman | spaces | 4 | no | link
TypeScript (Microsoft) | K&R | spaces | 4 | no | code
Sun Java JRE/JDK (Oracle) | K&R | spaces | 2 | no | code
Linux Kernel | K&R | tabs | 8 | no | Coding Style
IntelliJ (JetBrains) | K&R | spaces | 2 | no | code
Nginx | K&R | spaces | 4 | no | code
LLVM | K&R | spaces | 2 | no | code
JavaScriptCore (Apple) | K&R | spaces | 4 | no | code
WebCore (Apple) | K&R | spaces | 4 | no | code
SystemD (RedHat) | K&R | spaces | 8 | no | code
KDE | K&R | spaces | 4 | no | code
GNOME | K&R | spaces | 8 | no | code
RequireJS | K&R | spaces | 4 | no | code
Apache | K&R | spaces | 4 | no | code
Firefox | K&R | spaces | 2 | no | code
Chromium (Google) | K&R | spaces | 2 | no | code
LibreOffice | Allman | spaces | 4 | no | code
PHP | K&R | tabs | 8 | no | code, code 2
Vim | Allman | mixed | 4 | no | code, code 2
Git | K&R | tabs | 8 | no | CodingGuidelines, code

Why does what other projects do matter? Only for familiarity. Switching between code written in different styles can be jarring. With most code having been written in K&R, the code you write will feel more familiar to others and others' code will feel more familiar to you by sharing the style.

Why spaces?

  1. The good thing about tabs is that each person can configure their tools to render tabs using their preferred size.
  2. The bad thing about tabs is that each person must configure their tools to render tabs using their preferred size.

The requirement of #2, that all software displaying the code be configured to have the correct tab size, is at best inconvenient, at worst impossible. The problem is there is a large and diverse range of software that will be involved in displaying your code, including:

  • Debuggers (gdb, lldb, hphpd, Firefox/Chrome JavaScript debugger...)
  • Error reporting systems (Bugsnag, Rollbar, FailWhale, Whoops!...)
  • Source code browsers (GitHub, Bitbucket, Upsource...)
  • Code review tools (GitHub, Bitbucket, Upsource, Crucible...)
  • Diff tools (git diff, GitHub app, Meld, TortoiseGit, KDiff...)
  • Editors (both IDEs and simple text editors (Vim, Gedit...) used to quickly view files and make small changes)

Not to mention code samples that are put in emails, chat messages, code review/bug tracker issues/comments, blog posts and presentation slides.

Each tool will have its own default display size for a tab (usually 8) and each tool may or may not let you change it. GitHub, for example, renders tabs 8 columns wide, which can only temporarily be changed by adding ?ts=... to the URL and reloading the page. Consequently, using tabs, you are destined to find yourself reading your code with the wrong indent size (usually 8), whether you like it or not, and depending on the tool, you may not be able to do anything about it. It is not possible to impose the requirement of user-configurable tab sizes on all the software which happens to render your code.

Spaces impose no such requirement. By embedding the correct tab size directly in the code, the code is rendered correctly everywhere, even if you cut and paste it into an email.
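As a rough illustration of the difference, the sketch below uses Python's built-in str.expandtabs to simulate how a tool with a given tab size would render a line. The indent_width helper and the sample lines are made up for this example.

```python
def indent_width(line, tab_size):
    """Return the indent width, in columns, when tabs render at tab_size."""
    rendered = line.expandtabs(tab_size)
    return len(rendered) - len(rendered.lstrip(" "))


tab_line = "\t\treturn null;"      # two levels of indent, written with tabs
space_line = "    return null;"    # two levels of indent, written with 2 spaces each

for tab_size in (2, 4, 8):
    print(
        f"tab size {tab_size}: "
        f"tabs -> {indent_width(tab_line, tab_size)} columns, "
        f"spaces -> {indent_width(space_line, tab_size)} columns"
    )
```

The tab-indented line comes out at 4, 8 or 16 columns depending on the tool's setting, while the space-indented line is always 4 columns wide.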

Why 2 spaces?

So what should the tab size be? The size is a balance between:

  • Giving the indent a big enough "kick" so that the structure of the code pops out.
  • Keeping the first indented line close enough to the line that introduced it so as not to disrupt the flow of reading.

For this purpose, 2 or 4 is reasonable and common. I find 2 spaces preferable because it conserves horizontal real estate and often provides a near symmetry between the height of a line and the size of an indent.

From Steve McConnell's Code Complete Second Edition chapter on Layout and Style:

Subjects scored 20 to 30 percent higher on a test of comprehension when programs had a two-to-four-spaces indentation scheme than they did when programs had no indentation at all. The same study found that it was important to neither under-emphasize nor over emphasize a program’s logical structure. The lowest comprehension scores were achieved on programs that were not indented at all. The second lowest were achieved on programs that used six-space indentation. The study concluded that two-to-four-space indentation was optimal. Interestingly, many subjects in the experiment felt that the six-space indentation was easier to use than the smaller indentations, even though their scores were lower. That’s probably because six space indentation looks pleasing. But regardless of how pretty it looks, six-space indentation turns out to be less readable. This is an example of a collision between aesthetic appeal and readability.

Class member ordering

Class members can be classified along four different dimensions:

  • Type (constant, property, constructor, method)
  • Staticness (static, non-static)
  • Visibility (public, protected, private)
  • Abstraction (final, non-final, abstract)

All else being equal, class members should be sorted in the order of: staticness, type, visibility, abstraction. For example:

abstract class Foo {
  use Trait1;

  // static members

  const _1 = 0;

  static public $_1;
  static protected $_2;
  static private $_3;

  static public final function _1() {}
  static public function _2() {}
  static protected final function _3() {}
  static protected function _4() {}
  static private function _5() {}

  // instance members

  public $_4;
  protected $_5;
  private $_6;

  public function __construct() {}

  public final function _6() {}
  public function _7() {}
  public abstract function _8();

  protected final function _9() {}
  protected function _10() {}
  protected abstract function _11();

  private function _12() {}
}

It is permissible to use a different ordering if the circumstances favour it. For example, a very large class may be more easily navigated with members arranged by topic. (However, a class of that size should probably be split up where possible.)
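The default ordering (staticness, type, visibility, abstraction) can be sketched as a sort key. This is only an illustration: the Member record and its field names are hypothetical, and constants are treated as static to match the example class above.

```python
from dataclasses import dataclass

# Rank lists: an earlier position sorts first within each dimension.
STATICNESS = ["static", "instance"]
KIND = ["constant", "property", "constructor", "method"]
VISIBILITY = ["public", "protected", "private"]
ABSTRACTION = ["final", "non-final", "abstract"]


@dataclass(frozen=True)
class Member:
    name: str
    staticness: str
    kind: str
    visibility: str
    abstraction: str = "non-final"


def sort_key(member):
    """Order by staticness, then type, then visibility, then abstraction."""
    return (
        STATICNESS.index(member.staticness),
        KIND.index(member.kind),
        VISIBILITY.index(member.visibility),
        ABSTRACTION.index(member.abstraction),
    )


members = [
    Member("_7", "instance", "method", "public"),
    Member("__construct", "instance", "constructor", "public"),
    Member("_2", "static", "method", "public"),
    Member("_8", "instance", "method", "public", "abstract"),
    Member("$_5", "instance", "property", "protected"),
    Member("_1", "static", "method", "public", "final"),
    Member("$_3", "static", "property", "private"),
]

ordered = [m.name for m in sorted(members, key=sort_key)]
print(ordered)
```

Because the key is a tuple, Python compares the dimensions in the listed order, so visibility only matters between members that already share the same staticness and type.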