Code Quality Tips Vol. 1
In the process of development it's important to stay on the cutting edge and be aware of how things are developing in our craft. Recently I've been researching one particular game dev code module. There were numerous open source solutions online, so I thought it would be good to start from there as rewriting from scratch seemed a bit excessive.
I started research and downloaded a dozen candidates. I had a list of desired features and tested them all one by one against it. To my non-surprise most of them didn't satisfy. A third of them didn't even work. A half of them were so barebones, it was clear somebody wanted to open source something to feel good about their work, but rewriting that feature set was non-issue. From all the tested solutions, only one had the feature set (almost) from the start, and I decided to dissect the thing. I didn't test any paid solutions, but I doubt situation would be much different: people make products for the reasons that they do, and most of those solutions are not clean enough to stand the test of time and not modular enough to plug seamlessly in the rest of the system even if they are made to look like they are doing everything and predicting every variation. In fact, the overly bloated modules tend to be the least extensible and „pluggable“, counter intuitively, even if the author wanted to make it seem like the extension does everything and more.
This is the problem I'd like to comment on in this article.
I started dissecting the before mentioned asset and started thinking how things are overly bloated, the methods are unclear; the mechanism of how the data passes through the system is unclear. This is in fact very common. I've had cases previously where I wanted to import someone's module because rewrite wouldn't pay time-wise and would discover the same thing. It seems like it works, but good luck if something doesn't. Previously I would rewrite the things after looking at the code anyway – I couldn't bear to have those kinds of inefficiencies in my system. And I did the same thing here.
I was able to rewrite 5250+ lines of code to 1300+ lines, coming to about a quarter of initial line count. I also avoided numerous bugs and purposefully removed 2 completely useless features (if you're curious, I would estimate those 2 features definitely fewer than 50 lines). We all know the lines of code are not the point, but they illustrate the amount of code duplication, architectural inefficiencies and confusion in numbers. I actually thought I did even better – I was sure it would come down to somewhere 5-10 times less lines, because of the clarity and separation that was present in my solution. I’ve had this before: my rewrites would come down to 2 -3 times less lines for the same feature set. This has nothing to do with coding conventions or use of space – it has everything to do with eliminating duplication.
So, you’re reading this article to get some tips to improve your work. Here are some things I’ve noticed off the top of my head:
- Literal duplication: most people now know that duplication is bad, but you’d still be surprised how much code is actually duplicated. The example of the examined module, most methods had to use the same set of nested IF checks before getting to the “meat” of the method. This could have easily been abstracted by a method that returns a bool, and just checking for that.
- Logical duplication: sometimes methods do similar things but with a couple of different arguments in places. This makes it easy to extract the method from there but most people don’t do it. Sometimes things are repeating logically across classes in a module, but the programmer never noticed. Sometimes new method does something a piece of a previous method does, but the code is already too bloated and large for a programmer to notice. I am certain I could have cut at least 1000 lines from the mentioned module just by removing duplication.
- Wrong logical levels within methods. This is a distinction I picked up from Bob Martin and it has served me well. It states that every line in the method must operate on the same logical level. This creates a very nice, readable style where a method calls other methods, and these do the required actions. Also, sometimes removing duplication will actually increase your lines of code. For example I sometimes have one-liners that are methods. The reason for this is they are on a lower logical level than the method they are called in. When reading that method, I would frequently have to lose focus of the bigger picture and focus on just what that line was doing, because it was actually writing how it was doing it and not what it was doing. Separating adds readability and is much preferred to comments. I don’t always do that, of course. But when reading that line becomes a tripping point, I will extract it to a method. However, most people’s code is such that most lines are tripping points, so they don’t notice.
- Too many comments and regions. Comment is a lie waiting to happen (I catch them sometimes) and an excess in regions mean your classes are probably not separated properly (or you got some bad style habits caught from people on YouTube). An internal method should rarely need to be XML documented. The naming, position and arguments should make it perfectly clear what the method is doing. Now, we all know that common sense is actually not very common, so most programmers will opt for form to feel like they are achieving something very pedantic, but the truth is comments (and that includes XML docs) do not make for good code. They make for more formal coding style, with equally bad code underneath. If you’re curious, the asset I talked about also wasn’t excessively commented, the line count problem was on logical levels. Instead, work on your naming skills and architectural separation skills.
- I walked though my own implementation and saw that most methods are 2-10 lines of code. Lots of classes are within one screen height, if not one then two. This is in contrast to 500+ lines methods you see in the wild. It is not a good thing.
- Poor performance. For example, I saw code that uses pre-fix increment inside FOR loop for faster operation, and then allocates every frame on the very next line, by concatenating strings that could easily have been cached.
- Not using new features, such as auto-properties. This needs a separate article.
- Not invented here anti-pattern. For example, the mentioned module tried to serialize and deserialize to string, while I used the Memento pattern and serialized it to JSON with a provided standard serializer. This manual deserialization took a lot of lines of code and made the whole thing a duplicated mess. I can only assume that at the time of writing Unity didn’t have the provided serializer like it has today, but even then some external way to deserialize was available. There are few bigger wastes of programmer time (and memory resources – side note) then parsing strings manually. Instead we should rely on tested standards, which preferably come outside the game dev world (and especially asset stores).
- Architecture not thought out. This is the biggest problem I see with other people code. It should be clear from the architecture of the examples how the thing is called, what it does and how it does it. This is not the case. If the problem is even remotely complex, you see all these logical errors and discover bugs just by reading the code. This is because an overly complex thing has been bolted together all in one go. Example: many methods were overloaded, however most of them were not used and I struggle to think of situations where these overloads will be used. I achieved the same effect with 2 methods what they did with 20+ overloads.
- Useless features that look good on paper. This ties in with the previous point. Some features look good on paper, but will not be used 99+% of the time, and even then could have been extended on the user side with less fuss. Classes that try to do everything under the sun are a problem: they complicate testing, are slower, are harder to maintain, are harder to extend in another direction.
- Chasing architectural purity. I am a big fan of OOP thinking, but disprove of chasing OOP purity. Sometimes for performance reasons we want a more data-driven style and that’s fine. There are situations where other styles are more appropriate, but those are all tools for the job. However, some people with too much formal education in programming sometimes forget that the goal isn’t to achieve idealistic purity of some kind, the goal is to achieve a result with the tools we have as engineers. There are no good reasons why chasing this kind of purity is a good idea, if you ask the proponents of such ideas “why” a couple of times in succession it will quickly boil down to personal reasons.
That should be more than enough for part 1, but in the meantime…
Consider my asset, Dialogical. Almost two and a half years since release, I have yet to be reported a bug in functionality. There were no bugfix updates, and numerous feature updates. I corrected one bug related to test scenes and Unity versions (reported by user) and that's that. And I have much improved since then. I am not going to say I write perfect code – I don't. But I will say you can't do that on random and the key to clean, reliable code is simplicity. Any kind of complication in the code is a future tripping point for somebody. Excess is bad.