Software Engineering Values and Guidelines
The purpose of this document is to help programmers who are new to qubu to write code in the qubu style, as well as to answer frequently asked questions about programming and software design.
This document is heavily inspired by the (unfortunately retired) Our Machinery Guidebook (1). When in doubt, check that for guidance. Although we use Rust rather than C (and sometimes other languages, when we have to), many of the guidelines can be intelligently translated to our context, and some can even be applied verbatim.
Be consistent with existing code
You may find existing code in the codebase that conflicts with the guidelines. That is usually because the code was written before the guideline. Resist the urge to go in and “fix” all that code. Such fixes do not add much value by themselves and risk introducing new bugs. Instead, only fix the code that needs to be updated for other reasons (bug fixes, new features, etc.). Fixing code that you are already working on anyway is a lot safer because you already have a good understanding of the code which makes you less likely to introduce bugs. Also, since you are actively working on the code, chances are that if you introduce a bug, you will spot it quickly.
Code must be easy to change
The primary purpose of these guidelines is to help us design code that is easy to change and adapt to various situations and circumstances.
Being easy to change is the most important property of a codebase as it is the basis for everything else we want to do with the code. Code that is easy to change can adapt to new technical requirements or business goals with ease. Code that can’t change is doomed to stagnate, become technically irrelevant and then die (when it is easier to throw everything away and start from scratch than to keep modifying it).
For example, imagine we have code that performs a computation on an array of i32s. After a while, an i64 version of the same code appears in the codebase. At this point we would be tempted to introduce a generic parameter and let the compiler generate the two versions for us. Now imagine we suddenly have to add an i50 version of the same code. There is no numeric type that represents i50 directly, so we would have to read one or two i64s to produce each i50. This new version is sufficiently different from the ones that came before that some assumptions made when choosing the abstraction no longer hold, e.g. we no longer visit just one element of the array at a time.
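A minimal Rust sketch of this scenario. The names sum, read_i50 and sum_i50, and the 50-bit layout (values packed back to back into u64 words, lowest bits first) are assumptions made for illustration, not code from our codebase.

```rust
use std::ops::Add;

/// Generic version covering i32 and i64. It assumes that one array
/// element is exactly one value.
fn sum<T: Copy + Default + Add<Output = T>>(values: &[T]) -> T {
    values.iter().fold(T::default(), |acc, &v| acc + v)
}

/// Hypothetical i50 layout: values are packed back to back, 50 bits
/// each, lowest bits first, into a slice of u64 words.
fn read_i50(words: &[u64], index: usize) -> i64 {
    let bit = index * 50;
    let word = bit / 64;
    let shift = bit % 64;
    // Low part of the value comes from the first word...
    let mut raw = words[word] >> shift;
    // ...and when the value straddles a word boundary (shift + 50 > 64),
    // the remaining bits come from the next word.
    if shift > 14 {
        raw |= words[word + 1] << (64 - shift);
    }
    // Drop everything above bit 49 and sign-extend from bit 49.
    ((raw << 14) as i64) >> 14
}

/// The i50 version cannot reuse `sum`: one value may span two elements.
fn sum_i50(words: &[u64], count: usize) -> i64 {
    (0..count).map(|i| read_i50(words, i)).sum::<i64>()
}
```

The generic sum serves i32 and i64 well, but its core assumption (one element, one value) is exactly what the packed i50 case breaks.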
While the example above is contrived, situations like this happen all the time. When thinking about introducing a new abstraction, try honestly answering a few questions:
How much of the code would unify under the new abstraction? Is the abstraction even worth it?
How much has the affected code changed in the last few months? How much will it change in the near term?
Will we be modifying (or removing) the abstraction soon after introducing it?
How many potential use-cases will the abstraction prevent?
Programmers like finding patterns, but sometimes we haven’t seen the whole picture yet. Resist the urge to solidify something that is still evolving.
Less is more
Code is not an asset, it is a liability. A smaller and simpler codebase is easier to understand, improve, and extend. The more code you have, the more you have to optimize, debug, modernize, refactor, and understand. To keep the codebase small, follow these strategies:
Minimalistic Approach
Implement the smallest solution that solves the current problem.
Avoid anticipating future needs; add functionality only when necessary.
Build advanced things out of simple building blocks like arrays and hash tables.
Pruning and Refactoring
Prune unused code paths actively.
Refactor out unneeded complexity whenever possible.
Never leave commented-out code in the project; use version history instead.
Conceptual Complexity: An external library may not add code to the repository but it increases dependencies and conceptual complexity. To keep this complexity low:
Avoid introducing unnecessary abstraction layers and prefer simpler concepts.
Minimize the use of external libraries and prefer minimalistic one-file libraries.
Dependency on External Libraries: External libraries can be tempting to rely on as they provide quick implementation of features. However, they are hard to modify and increase the maintenance burden. Consider implementing small functionalities instead of adding a library dependency.
Keep it simple
Simpler solutions are easier to understand, easier to reason about, and easier to modify.
What is “simple” can be discussed. From our perspective it means:
Fewer levels of abstraction.
Easier to understand completely (performance implications, threading implications, etc).
Easier to debug.
More straightforward and easier to follow logic.
Closer to the metal.
Examples:
C is simpler than C++.
&str is simpler than String.
[i32] is simpler than Vec<i32>.
Immutable is simpler than mutable.
Non-generic code is simpler than generic code.
Single-threaded is simpler than multi-threaded.
Sometimes complexity is needed. For example, you can’t get good performance on modern hardware without multithreading and some mutability. The point is to keep things as simple as possible.
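As a small illustration of the &str and [i32] examples above, here is a sketch of two functions that borrow the simplest type that does the job. The function names are hypothetical.

```rust
/// Counts words without taking ownership or allocating: `&str` is enough,
/// and callers holding a `String` can still pass `&owned`.
fn word_count(text: &str) -> usize {
    text.split_whitespace().count()
}

/// Reads from a borrowed slice, so callers can pass an array, a `Vec`,
/// or a sub-range of either.
fn largest(values: &[i32]) -> Option<i32> {
    values.iter().copied().max()
}
```

Accepting the borrowed form keeps the function independent of how the caller stores its data.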
Use a limited set of core data structures and concepts. Use established standards when they are simple:
Simple standards: JSON, WebSockets, UTF-8.
Complicated standards: XML, CORBA, UTF-16.
Explicit is better than implicit
It is better when programmers can see what is going on than when it is hidden. Avoid doing cute tricks with generics and macros. Think about code that is easy to understand and step through in a debugger.
Design with performance in mind
The performance of a system is a part of its design. When you design a new system you should have clear performance goals in mind — how many objects the system should be able to handle, how many milliseconds it should take to update, etc. A system that doesn’t live up to its performance goals is not finished. Profile your code so you know where time is being spent.
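One lightweight way to keep a performance goal visible in code is to time the update against an explicit budget. The sketch below uses only the standard library; the 2 ms budget and all names are assumptions for illustration, and a coarse timer like this complements a real profiler rather than replacing it.

```rust
use std::time::{Duration, Instant};

/// Hypothetical budget for one update of this system.
const UPDATE_BUDGET: Duration = Duration::from_millis(2);

/// Stand-in for a real system update: integrates positions in place.
fn update(positions: &mut [f32], velocities: &[f32], dt: f32) {
    for (p, v) in positions.iter_mut().zip(velocities) {
        *p += *v * dt;
    }
}

/// Runs one update and reports how long it took.
fn timed_update(positions: &mut [f32], velocities: &[f32], dt: f32) -> Duration {
    let start = Instant::now();
    update(positions, velocities, dt);
    start.elapsed()
}

/// Returns true when a measured duration fits the stated goal.
fn within_budget(elapsed: Duration) -> bool {
    elapsed <= UPDATE_BUDGET
}
```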
Unit test where it makes sense
Unit tests provide the most value for complicated low-level code that makes the building blocks of your program, algorithm, or library. For example, you’d expect your hash table, memory allocator, and quicksort to be correct when using them to build functionality. When writing foundational building blocks like this, it is good to cover them with unit tests and property tests, when appropriate.
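A sketch of what this can look like for a small building block. The insertion sort, the xorshift generator and the test names are illustrative assumptions; the property check is hand-rolled so that the example needs no external crate, where a real project might use a property-testing library instead.

```rust
/// Building block under test: a simple in-place insertion sort.
fn insertion_sort(values: &mut [i32]) {
    for i in 1..values.len() {
        let mut j = i;
        while j > 0 && values[j - 1] > values[j] {
            values.swap(j - 1, j);
            j -= 1;
        }
    }
}

/// Tiny deterministic pseudo-random generator (xorshift).
/// The state must be non-zero.
fn xorshift(state: &mut u64) -> u64 {
    *state ^= *state << 13;
    *state ^= *state >> 7;
    *state ^= *state << 17;
    *state
}

/// Property: for many generated inputs, our sort agrees with the
/// standard library's sort.
fn sort_matches_std(seed: u64, cases: usize) -> bool {
    let mut state = seed;
    for _ in 0..cases {
        let len = (xorshift(&mut state) % 32) as usize;
        let mut input: Vec<i32> = (0..len).map(|_| xorshift(&mut state) as i32).collect();
        let mut expected = input.clone();
        expected.sort();
        insertion_sort(&mut input);
        if input != expected {
            return false;
        }
    }
    true
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn sorts_a_known_input() {
        let mut values = [3, 1, 2];
        insertion_sort(&mut values);
        assert_eq!(values, [1, 2, 3]);
    }

    #[test]
    fn agrees_with_std_on_generated_inputs() {
        assert!(sort_matches_std(0x9E37_79B9_7F4A_7C15, 200));
    }
}
```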
However, as one gets further up the abstraction stack, the code becomes less testable, and unit tests become more of an obstacle for changing the program to match new requirements. If necessary for confidence about the program’s correctness, consider using snapshot tests, input recording, and similar methods to quickly test end-to-end program functionality. Because these tests should be run regularly, if they fail, the error was likely introduced recently and should be easier to spot.
Deliver changes quickly
The path from implementing a feature to delivering it to the end user should be short.
Regardless of the benefits to end users, there are also many advantages to delivering changes fast internally. When we integrate often and quickly, designs can be validated earlier, bugs can be discovered sooner and everybody gets on the same page about the current status of the project.
To achieve fast delivery, we discourage long-lived branches. Users should be working against the main branch as much as possible. Instead of using branches, experimental and unfinished features should be protected by feature flags. Such feature flags can also be used for quick rollback if critical bugs are discovered.
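A minimal sketch of a runtime feature flag guarding an unfinished code path. FeatureFlags, new_exporter and export are hypothetical names; in practice the flags might be loaded from configuration so that they can be flipped without a new build.

```rust
/// Hypothetical runtime flags. Everything defaults to off.
#[derive(Debug, Clone, Copy, Default)]
struct FeatureFlags {
    /// Unfinished exporter: merged to main, but disabled by default.
    new_exporter: bool,
}

fn export(flags: FeatureFlags, fields: &[&str]) -> String {
    if flags.new_exporter {
        // Experimental path, only reachable when the flag is on.
        fields.join("\t")
    } else {
        // Established path; turning the flag off rolls back to this.
        fields.join(",")
    }
}
```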
Avoid coupling
Avoid complicated dependencies between systems. These make the codebase harder to understand and modify. It should be possible to modify, optimize or replace each system on its own.
Use abstract interfaces to access shared services such as logging, file systems, memory allocation, etc. That way, these systems can be replaced or mocked for unit tests.
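In Rust, such an interface is typically a trait. The sketch below shows a logging trait with a production implementation and a recording implementation for tests; all names (Logger, StdoutLogger, RecordingLogger, load_config) are assumptions for illustration.

```rust
/// Abstract logging interface. Systems depend on this trait,
/// not on a concrete logger.
trait Logger {
    fn log(&mut self, message: &str);
}

/// Production implementation: writes to standard output.
struct StdoutLogger;

impl Logger for StdoutLogger {
    fn log(&mut self, message: &str) {
        println!("{}", message);
    }
}

/// Test double: records messages so a test can inspect them.
#[derive(Default)]
struct RecordingLogger {
    messages: Vec<String>,
}

impl Logger for RecordingLogger {
    fn log(&mut self, message: &str) {
        self.messages.push(message.to_owned());
    }
}

/// Hypothetical system that only knows about the interface.
fn load_config(path: &str, logger: &mut dyn Logger) -> bool {
    if path.is_empty() {
        logger.log("config path is empty");
        return false;
    }
    logger.log(&format!("loading config from {}", path));
    true
}
```

A test can pass a RecordingLogger and assert on the recorded messages, while production code passes a StdoutLogger.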
Follow Data Oriented Design principles
Design your systems around data layouts and data flows first. Memory throughput is often the limiting factor of software performance. Make sure your data layouts and access patterns are cache friendly. Don’t abstract away the nature of the hardware.
Avoid object-oriented design principles. They encourage heap allocated individual objects with data hidden behind accessor functions. This leads to bad data access patterns and code flows that are hard to optimize and multi-thread. Instead, lay out the data in the most efficient way and then write functions to operate on that data.
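A sketch of the data-first approach: one contiguous array per field instead of one heap object per particle, and a plain function that operates on the data. The Particles type and its fields are hypothetical.

```rust
/// Data-oriented layout: one contiguous array per field. The hot loop
/// streams through `positions` and `velocities` and never pulls the
/// rarely used `names` into the cache.
struct Particles {
    positions: Vec<f32>,
    velocities: Vec<f32>,
    names: Vec<String>,
}

/// Plain function operating directly on the data.
fn integrate(particles: &mut Particles, dt: f32) {
    for (p, v) in particles.positions.iter_mut().zip(&particles.velocities) {
        *p += *v * dt;
    }
}
```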
Multi-threaded, job-based parallelism should be considered the norm for any high-throughput system.
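A minimal sketch of splitting work into independent jobs using scoped threads from the standard library. It spawns one thread per job for brevity; a real job system would more likely schedule jobs on a fixed pool of worker threads. The function name parallel_sum is an assumption.

```rust
use std::thread;

/// Splits the input into chunks ("jobs"), sums each chunk on its own
/// scoped thread, and combines the partial results.
fn parallel_sum(values: &[u64], jobs: usize) -> u64 {
    let jobs = jobs.max(1);
    // Round up so that at most `jobs` chunks are produced; `chunks`
    // panics on a zero length, hence the final `max(1)`.
    let chunk_len = ((values.len() + jobs - 1) / jobs).max(1);
    thread::scope(|scope| {
        let handles: Vec<_> = values
            .chunks(chunk_len)
            .map(|chunk| scope.spawn(move || chunk.iter().sum::<u64>()))
            .collect();
        handles
            .into_iter()
            .map(|handle| handle.join().expect("job panicked"))
            .sum::<u64>()
    })
}
```

Each job reads only its own chunk, so no locking is needed until the partial sums are combined.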
Shared responsibility
The codebase should be the shared responsibility and pride of every developer that works on it.
If you see a problem, it is your problem. Don’t wait for somebody else to fix it. Fixing issues in parts of the code you’re not familiar with is a good way of understanding more of the codebase. Of course, reach out to the people who know that part of the code, to help you understand it better, and verify that your fix is the right one.
When you deliver a feature or a new system, you are responsible for the whole delivery: design, clean code, documentation, testing, performance, knowledge sharing, etc. A feature is not done just because it works. When you go in to make a change, you should always leave the code in a better state than you found it: cleaner, simpler, faster, easier to understand.
When you are fixing a bug, don’t just fix the bug. Ask yourself what the underlying cause was that made the bug slip through compilation, unit tests, developer tests, and QA. Can you somehow write the code so that bugs like this can be detected at compile time? Can you add a unit test that detects the bug? Can you add an assert that prevents misuse of the functions? Can you improve the function names and/or the documentation to make misuse less likely?
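As an example, suppose a bug was caused by passing a time where a distance was expected. The sketch below shows one way to make that class of mistake a compile error, plus an assert that guards against misuse at runtime. The bug scenario and the types Meters and Seconds are hypothetical.

```rust
/// Distinct wrapper types: swapping the arguments of `speed` no longer
/// compiles, whereas two bare f64 parameters would accept either order.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Meters(f64);

#[derive(Debug, Clone, Copy, PartialEq)]
struct Seconds(f64);

/// Average speed in meters per second.
fn speed(distance: Meters, time: Seconds) -> f64 {
    // Runtime guard for what the type system cannot express here.
    assert!(time.0 > 0.0, "time must be positive");
    distance.0 / time.0
}
```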
When you are responding to questions, consider adding them to an FAQ in the project readme file or clarifying the code documentation, so that you won’t have to answer the same question again.
Write for readability
Code is read more often than it is written. Write your code so that it is easy to read and understand for a new programmer (or for yourself coming back to the code later).
Code should aim to be plain and straightforward. Avoid tricky constructs, trying to be clever, and showing off your programming skills.
It is not always 100% clear-cut what is an “unreadable clever hack” and what is a “common programming idiom”. For example, consider this code for setting a bit in an integer:
flags |= 1 << FLAG_USE_WEIGHTS;
For someone not used to bit twiddling this may seem like a “hack”. However, for someone used to bit twiddling, this is just the standard way of setting a bit and adding an abstraction around it (such as a set_bit() function) is just obfuscating things.
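For reference, the same family of idioms in Rust: setting, testing and clearing a bit. The bit index chosen for FLAG_USE_WEIGHTS is an assumption, since the snippet above only names the flag.

```rust
/// Hypothetical bit index for the flag.
const FLAG_USE_WEIGHTS: u32 = 3;

/// Returns the flags after setting the bit, whether the bit tested as
/// set, and the flags after clearing the bit again.
fn bit_idioms() -> (u32, bool, u32) {
    let mut flags: u32 = 0;

    // Set the bit.
    flags |= 1 << FLAG_USE_WEIGHTS;
    let after_set = flags;

    // Test the bit.
    let is_set = (flags & (1 << FLAG_USE_WEIGHTS)) != 0;

    // Clear the bit.
    flags &= !(1 << FLAG_USE_WEIGHTS);

    (after_set, is_set, flags)
}
```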
Familiarize yourself with the idioms that are commonly used at qubu by reading the code.
Consider writing inlined code
Contrary to conventional wisdom, when a function starts getting long in terms of lines of code, consider whether it is really worth splitting chunks of it into separate functions. Keeping the code together has not just prototyping value, but also clarity and readability value:
It is less confusing to step through the code in a debugger.
It reads top to bottom, without us having to jump to a different location in the file, or other files.
If a function becomes too large to navigate, consider introducing larger, paragraph-sized blocks of comments to visually demarcate various functional blocks of the function.
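A short sketch of such comment blocks in an inlined function. The function and its input format are made up for illustration; a real candidate would be considerably longer.

```rust
/// Parses `name,score` lines and returns the average score.
fn average_score(input: &str) -> Option<f64> {
    // ------------------------------------------------------------
    // Parse: turn each non-empty line into a score, skipping lines
    // that do not have the `name,score` shape.
    // ------------------------------------------------------------
    let mut scores = Vec::new();
    for line in input.lines() {
        let line = line.trim();
        if line.is_empty() {
            continue;
        }
        if let Some((_name, score)) = line.split_once(',') {
            if let Ok(value) = score.trim().parse::<f64>() {
                scores.push(value);
            }
        }
    }

    // ------------------------------------------------------------
    // Aggregate: average the parsed scores, if there are any.
    // ------------------------------------------------------------
    if scores.is_empty() {
        return None;
    }
    let total: f64 = scores.iter().sum();
    Some(total / scores.len() as f64)
}
```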
Naturally, there are conditions where a piece of code is a good candidate for splitting into a separate function, e.g. if it needs to be called multiple times from various places, or the original function is simply too complicated and long.
This guideline is inspired by Carmack’s email to id Software on inlined code (2) - go read that if you haven’t already! It is hard to come up with a one-size-fits-all solution, so as always, use your judgement.
Source code should be formatted with standard formatting tools
Even if not perfect, it is better to have the source formatted automatically by rustfmt (or similar alternatives in other programming languages) to avoid boring and tedious discussions.
Commenting and Documentation
Exposed API functions need documentation comments, other code only needs to be commented as necessary. Write your code clearly and use sensible names to reduce the need for comments.
/// comments should be used for documentation-level comments in Rust; // comments should be used for all other comments.
The documentation style in Rust should match that of its standard library.
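A small sketch of both comment kinds on a hypothetical function: a documentation comment with a one-line summary followed by details, in the manner of the standard library, and a regular comment for an implementation note.

```rust
/// Returns the arithmetic mean of `values`.
///
/// Returns `None` if `values` is empty, so callers never divide by zero.
fn mean(values: &[f64]) -> Option<f64> {
    if values.is_empty() {
        return None;
    }
    // Implementation note: a plain sum is accurate enough for the short
    // inputs assumed here.
    Some(values.iter().sum::<f64>() / values.len() as f64)
}
```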
Most documentation should live in (or be generated from) a source code repository, except design documents that span and integrate multiple projects, or documents that are co-created with people that don’t have access to source code.
Single button builds
Building a project should be easy. Ideally just one click or one command-line call, like cargo build or ./build.bat.
Keep commit history linear
A more linear commit history makes it easier to tell what is going on in the project, use tools such as git bisect, or otherwise manually binary-search for a commit where an error was introduced.
The tools for keeping history simple are git rebase, squashing of PR commits on GitHub, and selectively using git diff and git apply (when we want to discard git’s metadata and start fresh).
(1) Our Machinery Code Guidelines was unfortunately taken down from the internet, but a copy can be found on the Wayback Machine, or on our Google Drive.