Software Engineering vs The Alternative

In my decades of work as a custom software developer, I’ve concluded that engineering is a good idea.

Engineering is what happens when requirements are distilled into the simplest, most elegant, fast and beautiful machine that enables long-lasting performance. If you just manage to get something to work through repeated fiddling, it probably shouldn’t be called “engineering”. Engineering requires a plan and process, projecting into the future, knowing the steps in advance, and reliably achieving the results.

A lot of software isn’t ever engineered. It’s as if a person built a bookshelf by sticking a board against the wall and, finding that it is not stable, propping it up with something else, and just going on adding things until it holds books. Clearly, it would be better (and usually faster) to draw it on paper first, calculate the amount of wood needed, go to the store once, and successfully build it once.

The “think ahead and do it once” mentality is standard with building bookshelves, and in the construction industry in general, but it seems to be mistrusted, maybe even going out of fashion, in software development. This appears to be occurring on two levels. Firstly, programmers are being asked to work above their skill level, likely due to a shortage of skilled labor. Understandably, they are overwhelmed by the task and try to teach themselves on the fly, resulting in a need to code experimentally. Secondly, there appears to be a backlash against the very idea of planning ahead, as if writing specifications is hopelessly old-fashioned and bureaucratic. It feels as if the custom software development field has collectively given up on the idea of apprenticeship and mastery.

Imagine two roofing contractors: Chris knows exactly how to do it, so she measures and cuts correctly the first time, while Bob approaches the task with uninformed guessing, is caught off guard by every new situation, and ends up wasting materials and taking longer. The company that employs Chris, the master craftsperson, is going to make more profit and do better, faster work. The software industry tends to operate like Bob, with the differences even more pronounced. Software created by an actual engineering process yields better results in less time.

People routinely make the comment that I am fast, but then when I recommend detailed planning steps, they sometimes say they “don’t have the time” for that. The reality is that I am fast because of doing those steps.

The Engineering Phase in Practice

In terms of day-to-day work, the pure engineering phase occurs between the steps of determining the requirements and coding. After establishing what the system should accomplish (the topic of another article), but before a programmer jumps in and builds it, the programmer needs to know how he or she plans to build it. In my practice, that middle phase consists of these separate disciplines, each of which results in a milestone document, or deliverable:

Prototyping – Learning what we don’t know yet
Architecture – Identifying the system components and how they interact
Technical Design – Specifying the functionality of each component, to the point at which the coding becomes obvious.
User Interface Design – Writing out the screens, windows, menus and other visible elements and how the user gets around

Once the above are written out and have gone through the approval process, it is possible to build a system and get it 90-95% right on the first pass. In my experience, if I design out a system with 50 database columns, I will likely need to add or change only one or two of them by the end of the project. The point is not to be perfect, but to shoot for the level of accuracy in the design that minimizes the total time. If you shoot for 100%, you will spend a significant amount of time on design and analyzing every detail in advance, which takes more time overall. On the other hand, if you only shoot for 80% accuracy or less, then the design has too many holes and the coding takes much longer.

The purpose of the pre-coding work is to be convincing, mainly to yourself and the people building it, not to achieve a standard of process set by some industry group. You know the quality is there when another engineer can look at it and say, “yes, this will work as you described it.” If other engineers do not understand what you have written, then the architecture, the explanation, or both need more work.

Prototyping

Prototyping is coding experimentally to determine if something will work, or finding the best way to do something by jumping in and trying it out. It’s often very helpful, but the problem comes when prototyping becomes the entire methodology; then engineering has not taken place.

It’s fine to steal prototyped code and put it in the final product, but it is often buggy and time-wasting to morph a prototype into a final product. The reason for this is that prototyped code usually lacks proper layering, encapsulation and documentation, and once it is written like that, it is tedious to apply those constraints later. Since you started without knowledge of what shape it would take – how many classes and what their purposes are – a prototype is likely written in an overly monolithic manner.

Remember the purpose of prototyping is to prove something that you didn’t know before, not to complete the functionality. So, stop coding as soon as the original hypothesis is proven or disproven.

Architecture

Software architecture deals with black box components, interfaces and persistence. The architecture document contains a definition of each of the parts, which includes anything that is persistent, executable or a message between components.

As a rule, I include the architecture step no matter how simple it is. An Android game that doesn’t connect to anything has only one component, and that’s the whole architecture. It is worth stating that fact, even though it is simple.

For more complex projects, I start with a sketch on paper. I draw a box for each of the database, main processes, background processes, web interfaces and configuration files, and I ask the following questions:

How are these things connected?
How can I minimize the number of components and minimize the number and size of the interfaces?
Can I eliminate configuration?
Can I make the dependencies linear rather than complex or circular?

At the architecture level, you don’t need to think about what the components do in any detail; at this stage you call them a black box. Here are examples of statements in the architectural language:

“Application server – handles all user sessions”
“Secondary database – consists of de-normalized transactional data detail and summaries to support analytic reporting”
“Background services – always-running service that copies data into the secondary database, sends email, and cleans up”
“Client configuration file – contains the positions of windows and theme settings, written each time the client exits, separate for each user; optional, if missing, defaults are used”

Notice that none of the contents are defined, but there remains specificity in the reasons for the component’s existence, and who creates and accesses it.

As architects we need to think about all components, no matter how small. A little file that stores some icons, for example, is a small detail in a big system. It is, however, an architectural choice, because by allowing it to remain, you have made the decision that it doesn’t belong in a database, and then you have to determine when it can be changed, and which processes need access to read or write it. If you gloss over these kinds of details in a large system, pretty soon programmers will start creating all kinds of components on the fly, and the system becomes too complex for that reason.

Programmers also need to think about all interfaces, no matter how small. Suppose the user sessions are in a web browser that talks to an application server – that’s an HTTP interface (which is all you need to define at this stage). The application server may have multiple interfaces to databases, files and other systems. Each thing that talks to another thing needs to have its language stated; for example, the application server makes sessionless SOAP requests to a legacy system. If possible, design all interfaces one-sided, where the client always initiates; introduce more complex techniques only if you cannot do it the simple way.

Ideally, we want a linear series of as few components as possible, for example, browser to application server to database. Another example is local database to desktop app to central database. If any components are connected multiply or cyclically (two parts accessing the same resource, making a triangle in the diagram) or in duplex (initiation coming from both sides), you should have a good explanation as to why that’s necessary.
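The checks described above can be applied mechanically to a sketch of the architecture. The following is a minimal illustration, not part of the author's stated method: it assumes the architecture is written down as a list of (initiator, target) pairs, and the function name `review_architecture` and the component names are hypothetical. It flags duplex links (initiation from both sides) and any link that closes a loop in the diagram, such as the triangle formed when two parts access the same resource.

```python
def review_architecture(connections):
    """Return warnings for a list of (initiator, target) component links.

    Assumption for this sketch: each pair means "initiator opens the
    conversation with target". A clean, linear architecture yields no warnings.
    """
    warnings = []
    pairs = set(connections)

    # Duplex: both sides initiate on the same link.
    for a, b in sorted(pairs):
        if a < b and (b, a) in pairs:
            warnings.append(f"duplex: {a} <-> {b}")

    # Loops: treat links as undirected and use union-find to detect
    # any link whose two ends are already connected by another route.
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    seen_links = set()
    for a, b in sorted(pairs):
        link = frozenset((a, b))
        if link in seen_links:
            continue  # second direction of a duplex link, already reported
        seen_links.add(link)
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            warnings.append(f"cycle: link {a} -> {b} closes a loop")
        else:
            parent[root_a] = root_b
    return warnings


# Hypothetical example: a linear chain, then the same chain plus a
# background service that also talks to the database (a triangle).
linear = [("browser", "app server"), ("app server", "database")]
triangle = linear + [
    ("app server", "background service"),
    ("background service", "database"),
]
print(review_architecture(linear))
print(review_architecture(triangle))
```

A warning here is not necessarily an error; it marks a connection that, per the guideline above, deserves a good explanation in the architecture document.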

Most architectures can be expressed on one page. A complex enterprise system may take five pages. Another engineer should be able to look at an architecture in its written form and confirm that it will work by intuition.

Technical Design

I know people may think of “design” as just visual, but I am in the habit of using the word to describe the design of classes and data structures, also known as specifications (which is another word that means different things to different people).

A common misconception is that tech design is just “more detailed requirements”. While it is true that sometimes detailed requirements occur in a stage after high-level requirements, the specification is a completely different thing. The order is different and the audience is different. While requirements are for non-technical readers and state items in the order that non-technical individuals think of them, technical design is stated in the order of building the product and is written for coders. Often a requirement is met by statements in the design document that are spread out all over the document.

Here are examples of statements in the technical design language:

(Defining a database field) non_cash, decimal – The tax deductible amount of the value of an in-kind donation. A noncash item is only recorded with a value if it has resale value, and this is the tax deductible amount, not necessarily the full amount.
(In the process model) If “in-kind” was checked but non_cash is zero, clear in-kind checkbox.

Notice in those statements, the business logic is stated unambiguously and nothing is left to the programmer; yet it is concise and readable. The time it takes to think through how the data model and process model interact, when working in a design document, is much less than the time it takes to figure it out while coding.
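To show how directly such statements translate into code, here is a minimal sketch of the two design statements above. The field name non_cash and the in-kind rule come from the design text; the class name `Donation`, the attribute `in_kind` and the function `normalize_in_kind` are assumptions made for illustration only.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class Donation:
    # in_kind mirrors the "in-kind" checkbox (hypothetical attribute name).
    in_kind: bool = False
    # non_cash: tax deductible amount of an in-kind donation, stored as a decimal.
    non_cash: Decimal = Decimal("0")


def normalize_in_kind(donation):
    """Process-model rule: if "in-kind" was checked but non_cash is zero,
    clear the in-kind checkbox."""
    if donation.in_kind and donation.non_cash == 0:
        donation.in_kind = False
    return donation


# Usage: a checked box with no recorded value is cleared; one with a value is kept.
empty = normalize_in_kind(Donation(in_kind=True, non_cash=Decimal("0")))
valued = normalize_in_kind(Donation(in_kind=True, non_cash=Decimal("25.00")))
print(empty.in_kind, valued.in_kind)
```

The code contains no decisions that the design did not already make, which is the sense in which the coding becomes obvious.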

Divergent Labs’ Tried And True Method for Writing A Software Design:

The entire process is done in a word processor. Don’t distract yourself with design tools.
Design each architecture component separately – one main heading per component.
Start with the obvious parts of the data model – usually database tables and columns. If the requirements were well written, this should flow pretty cleanly from there. A statement in the requirements such as, “track recent contact dates for each client,” results in tables and columns that store that information.
Make a copy of the requirements and paste them at the end of the file. Once I am confident that the design in progress meets a particular requirement, I delete it; so gradually the document changes from requirements to design points.
Make notes about details to come back to. I use square brackets, so I can easily find the notes. For example, if the requirement is to email the client the first time they log in, then I might write in the section describing the login: “If this is the first login [make flag for this] then send email with subject line…,” and so on, defining exactly what the email contains. The note in brackets identifies that I must come back at some point and make a database column for that purpose, as well as make a note about where it gets populated. Using the square bracket approach, I can think about one thing at a time and not be distracted with connecting points. Additionally, this tool ensures that I do not lose track of those other points. In the example mentioned, I can think through all the aspects of sending the email and finish that part, without being distracted by the question of how to determine whether it is the first login.
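The square-bracket notes described above are easy to find with a word processor's search, and that is all the method requires. If the design is also kept as plain text, a few lines of code can list the open notes before the document goes for approval. This is a sketch under that assumption; the function name `open_notes` and the sample text are hypothetical, and it treats every bracketed phrase as a note.

```python
import re

# Matches text inside one pair of square brackets, without nesting.
NOTE_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def open_notes(design_text):
    """Return (line_number, note) pairs for every bracketed note in the text."""
    notes = []
    for line_number, line in enumerate(design_text.splitlines(), start=1):
        for match in NOTE_PATTERN.finditer(line):
            notes.append((line_number, match.group(1).strip()))
    return notes


sample = (
    "Login\n"
    "If this is the first login [make flag for this] then send email.\n"
    "Otherwise show the home screen.\n"
)
for line_number, note in open_notes(sample):
    print(f"line {line_number}: {note}")
```

An empty result is one sign that the design has no remaining loose ends of this kind.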

Much of the art of design for a typical business application is going back and forth between the data model and process model, adding the persistent storage to the document, and adding specifications for how that field is accessed and changed.

A well-engineered product can yield deceptively simple results, which can ironically make it appear less valuable. As a simple example: if a requirement existed, stating that customers can query their account balance online, and the engineer distilled this into a product that just passively shows the balance without any special “query balance” function, then the result is less (less code, less behavior, less testing) than someone may have originally conceived, and it’s better. Many small bits of distillation can make a huge savings in complexity. Engineering has a cost, but if a programmer can get a more elegant, more concise and more generic result, in most cases, the program will cost less in total to build. Even a talented and skilled programmer will not achieve the process of distillation unless he or she sets out to do engineering.

The technical design document can become the living programming documentation after the product is built.

User Interface (UI) Design

Depending on your customer, the only two items they might care to read are the requirements and the UI design. Customers want to confirm that the system will meet their stated need and how the system will appear, but they won’t be concerned with architecture and design beyond that. From an engineering perspective, the UI design isn’t usually a necessary document since all the decisions about UI flow will be baked into the technical design document. However, the UI design is readable by the customer (while the technical design is only readable by coders). The UI flow is a way to get buy-in on the technical design while bypassing the need to teach the customer (or boss) how to read it.

Depending on the project, I write the UI flow design in different ways. The UI flow design can be in written words, rough sketches, fully designed screens or even prototyped. Often a programmer need only go as far as the identified interests of the customer; for example, the customer may not care how the administrative setup screens are designed.

A lot of people think that software “design” starts and ends with the UI, but ideally, to get the best result, the UI flow is an offshoot of the technical design after completing the requirements and architecture. In the past, I have dealt with customers who deliver a UI flow to me, which they call a “design”, and it actually represents a convoluted expression of their requirements. For example, one customer submitted a UI flow design with a “New customer” button on their picture. In a way, they were stating through pictures that a requirement exists in which a user could create a customer. The customer was also indicating how they wanted the button to appear, and that the user should be able to create the customer from the screen in question.

Because requirements, appearance and functionality were tangled up, accepting the customer’s UI flow design would have yielded poor results. This example illustrates why the proper engineering sequence is so necessary. If I had placed the New Customer button where they had indicated, then I would have been required to disable the function when a user is not able to create a new customer. In addition, the function would have been inconsistent, because similar functions created later would not be present in that same location.

I approach a situation such as this by disentangling the requirements from the visual concept, so that they are stated in requirements language, such as “User can create a new customer while entering an order”. After requirements are approved, I do the technical design and UI design together, which allows the manner in which a new customer is created (as an example) to become more elegant than what the customer initially envisioned. One way of doing it more elegantly is to use a “+” button next to the list of customers, and use this pattern for lists of any data type. A list is only shown when you need a customer, so it is probably a better place to put that functionality. Hundreds of details being coordinated like this yields a much better result, technically and for the user.

The more general issue in UI design is that engineers are responsible for making sure form follows function, while most everyone else starts with form, or views form and function as inseparable. In my experience the best software begins with clearly stated requirements, progresses through technical design, and the technical design yields its final appearance. It is most intuitive for the user when they are interacting more closely with the actual data structures, such as when a window contains all the fields in one database record.

When the engineering process fails, it is typically because a preliminary concept of the UI ends up controlling the resulting product, and the programmer then has to figure out how to fill the holes. For example, suppose the original visual concept requires that the customer payment terms appear on the screen where the user enters a customer payment. This is a requirement because sometimes customers renegotiate terms at the time of payment. This may not be standard, but businesses often require specialized procedures such as this. If the picture is drawn up, but that concept never makes it into requirements language, then it will not transfer into the technical design. If the coder is responsible for programming the payment terms into that specific location, the programmer will not know if the terms are separate per payment, separate per invoice or just one per customer. In addition, the programmer will not know if changing the terms in that location will change them for all open invoices, or just future ones. The programmer may make a choice, and then the user may not understand how it works. If the original picture had been converted to requirements and then discarded (as it should be), then the resulting screens would be more elegant, more cleanly defined, and there wouldn’t be any last-minute hidden logic to confuse the user. A possible way to handle it may be a link to the customer record on the payment screen, and then it would be clear to the user that he or she is editing the terms for the customer as a whole. The key point is that the best UI design is often not what is originally envisioned.

Sometimes I create the technical design first, and then use the resulting flow to create the UI design. It is important to note that the two documents need to stay in sync. If there is a change in the UI design, I go back to the technical design, make the change there first, and then update the UI design accordingly. This approach keeps the elegance intact.

For other projects, especially if I am less familiar with the user base and what aspects are important to the customer or if the customer is very particular about visual layout, a UI design could come first, which acts as a test of the requirements. Once that is approved, the technical design can take it into account. It comes down to determining what efforts will take less time overall for the whole project.

Step Size

Step size is the choice between developing a tiny bit of the desired functionality at a time and making a gigantic leap to a complete product with all desired functionality. To me the right step size depends mainly on the level of mastery of the engineer; if the person can design a large product all at once, then there is no reason not to design it all at once, and you should build it in one project. But if you are attempting a bigger step than you can really do, the design will have holes in it, and you’ll end up with the non-engineered approach. Instead it is better to do enough prototyping to learn what you don’t know, and then you will be able to write out a solid design.

Overall, you know the project size is right when you look at it and you know you can do it, rather than guessing that you can probably do it. If the specs have loose ends that are meant to be figured out “later” then the step size is too big. In a properly scoped engineered project, there is no “later”; everything is done the first time.

An engineer can train herself to think big by taking advantage of the built-in fractality of engineering problems. A problem-to-solution thought process on a large scale can be divided into problem-to-solution processes on smaller scales, so engineering is a time-wise fractal pattern. In other words, you are using the same kind of thinking regardless of the level of abstraction that governs the problem at hand. The fractal advantage is learning ways of thinking about solving problems and being able to apply them to different scales without having to consider scale at all times. If the engineering process is hurt by being overwhelmed by the size of the system, there’s likely an encapsulation, or black boxing, problem at the architecture level. Once you define architectural components and, possibly, sub-components, then the fractal advantage kicks in and you can focus on solving each problem separately.

Upgrade Engineering

Most of the work in a product life cycle is upgrades and maintenance. Upgrade engineering is the same thing as engineering a new project, but with these vital differences:

Upgrades have the added complication of analysis of the existing code to determine how the new feature or fix will fit in.
Sometimes upgrades are dealing with the unknown and have to defer some decisions, especially if the existing code is poorly organized or undocumented.
It is often more difficult to maintain the original elegance of a product when adding onto it, so upgrade engineering requires more skill.

The art in upgrades is to see the product as layers. Imagine a foundation and structures built on top, such as a house with a foundation and ground floor and upper floors. If you want more rooms you have to expand the foundation first. A common mistake, when the need arises to add a feature that is not supported well by an application, is adding it at too high a level. This makes the code look like a house with a small foundation and extra rooms built on the second floor that stick out over the back yard. I look at it as a problem of the base not being flexible enough to anticipate a new requirement, so the base must expand first. When there is a bigger base on which to build bigger structures, then the whole system maintains its elegance and integrity.

Case Studies

At one point, I took on a project that, in retrospect, was too big, and the project played out like other programming horror stories that I had read – the project was behind schedule and involved many rewrites of the same code. It took me a long time to figure out what went wrong, until I finally realized that I hadn’t followed my own best practice in software engineering. Somehow, I psychologically fell into the trap of feeling overwhelmed and unsure of how to proceed. I jumped into prototyping, and tried to morph prototypes into the actual product, all the while thinking that I would take care of major issues “later”. So with that experience of going against all my own advice, I understand how hard it is to be rigorous in the process, but I also have a more strongly confirmed conviction of how important it is.

I often see other products from the outside and they reveal the engineering process that led to poor functionality. Here are some examples of spectacularly terrible bugs that are not easily fixed because they are failures of engineering rather than of coding:

Windows Explorer shows icons for each file corresponding to the file extension. A long time ago, it didn’t do this, and when the feature was added, the drop in performance was striking. It’s clear that the lowest level (the foundation of the house, the file system in this case) did not support the concept of icons as part of its language. There was a failure of upgrade engineering here, since they didn’t expand the base first to support the feature. Instead they wrote on-the-fly code that looked up the application associated with the file name extension and fetched the icon from that application, or loaded it from the file itself (for graphics files). This is extremely slow, and the user can observe it churning during this process, even 15 years later. Besides the performance problem, it does not show true information; it’s just guessing by the file name what kind of file it is. A non-technical user might think, by renaming a .txt file to .pdf and observing that Windows changes the icon, that they have actually changed the type. A better solution to all of this (assuming we accept the requirement of showing icons) would be to define file types at the file system level, and require any process that writes files to declare the type. Then the file system can quickly index the types and provide icons, and the result would not only be faster, but would also show accurate information.
Skype cannot accurately remember whether the user was alerted to a missed call or not. This bug has existed for years, and my best guess is that it is an architecture problem related to the storage location and ownership of that persistent data. When the same account is used on different devices or the network connection is changed on one device, Skype forgets the alert status for each call. Similar to the Windows example, this appears to be an upgrade engineering problem, in which the first versions of Skype did not support multiple devices, and at the time the feature was added, Skype did not expand the base appropriately. Presumably if you use multiple devices, the missed calls should be alerted on all devices, but after you acknowledge the alert on any one device, it should clear the alert on all devices. It may be sending the alert status from the servers in the down direction only (server to client), and failing to send it up, because originally there was no need to send that data up.
In healthcare.gov, the requirements were pretty well known because they were mainly written into law. This is a system that could have been built in one pass. They spent a half billion dollars on a system that has as many data points as some systems I’ve designed by myself or on small teams. It would have been ridiculous to just say, we don’t know how to engineer software, and instead, we’re going to “grow” the system organically over many iterations, in the general direction of the requirements. But there is a lot about the final appearance (and the fact that it is riddled with bugs) that suggests that hardly any engineering took place, or the process as a whole was not engineering driven. An example obvious to the user is that it shows a message saying, “Please Wait” after every single field is entered. If someone intentionally designed a system that asks for your first name, and then sends data to the server while you wait, then allows you to enter the last name, and so on, that idea should have been nixed in the review process. Clearly no one intentionally designed it that way.

The engineering approach creates value for the customer because it operates within a timeline and cost that can be estimated and adhered to. Insist on it!