It’s tempting to believe that having a flexible file format for a system’s configuration is enough to ensure forward-compatibility and flexibility as the system grows. The thought process is that with a suitable data format—for example, XML or JSON—extra information can be added without breaking existing systems. For adding new fields into the data, this is true. However, if you don’t engineer additional flexibility into the fields themselves then you may end up needing to make breaking changes anyway or, worse, end up with a much more complicated data file.
Imagine you are implementing an ecommerce server that exposes
products. The spec for the server stipulates that there will only be a
very small number of products. It also stipulates that rebooting the
server to refresh the system with new products is fine. With that in
mind, you add a products
field to the server’s configuration:
port: 8080
products:
- name: Acme Explosive
- name: Fake Backdrop
One day—and this day always comes—a change to the specification
comes: some products should be hidden on the site. To implement that
feature, you add a hidden: field to the product. If a hidden: field is
not defined for a product, it defaults to false. This is
backwards-compatible and easy to implement
given the extensible YAML format being used:
- name: Fake Backdrop
  hidden: true
Later on, it emerges that product-hiding should not be binary. It
should depend on the group that the user belongs to. To implement
that, you add a hidden_to: field. However, to preserve backwards
compatibility, it is enforced that the entries in hidden_to: take
priority over the hidden: value. Side-rules are beginning to creep in:
- name: Fake Backdrop
  hidden: false
  hidden_to:
  - roadrunners
  - unregistered-customers
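To see how quickly such a side-rule turns into code, here is a minimal Java sketch of the lookup. The Product and Visibility classes are hypothetical, and the precedence shown (a non-empty hidden_to: list decides on its own, otherwise hidden: applies) is only one assumed reading of the rule.

```java
import java.util.List;

// Hypothetical model of one product entry; fields mirror the YAML keys.
final class Product {
    final String name;
    final Boolean hidden;         // null when hidden: is absent
    final List<String> hiddenTo;  // null when hidden_to: is absent

    Product(String name, Boolean hidden, List<String> hiddenTo) {
        this.name = name;
        this.hidden = hidden;
        this.hiddenTo = hiddenTo;
    }
}

final class Visibility {
    private Visibility() {}

    // Assumed reading of the rules: a non-empty hidden_to: list decides on
    // its own; otherwise hidden: applies, defaulting to false when absent.
    static boolean isHiddenFor(Product product, String group) {
        if (product.hiddenTo != null && !product.hiddenTo.isEmpty()) {
            return product.hiddenTo.contains(group);
        }
        return Boolean.TRUE.equals(product.hidden);
    }
}
```

Note that the function has to inspect two sibling keys to answer a single question, and a product with hidden: true plus a hidden_to: list has no obviously correct answer.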
As the system gains popularity, it becomes increasingly necessary to
be able to add products to a running system. It is decided that
products should be held in a database rather than being loaded from the
configuration file. The configuration file, therefore, should support
providing database details as an alternative to products. However,
backwards compatibility should still be maintained to prevent legacy
deployments and tests from breaking:
port: 8080
use_database: true
database:
  url: //localhost/database
  user: productapp
  pw: password123
A heuristic is adopted: if use_database is true, the server loads
products from the database defined in database; otherwise, it loads
them from products.
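A rough Java sketch of that heuristic follows. The ProductSourceHeuristic class and the choice to throw when database is missing are assumptions made for illustration, not part of the spec described here.

```java
import java.util.Map;

final class ProductSourceHeuristic {
    private ProductSourceHeuristic() {}

    // Needs the whole top-level map because the answer depends on
    // three sibling keys: use_database, database, and products.
    static String chooseSource(Map<String, Object> config) {
        if (Boolean.TRUE.equals(config.get("use_database"))) {
            if (!config.containsKey("database")) {
                // Assumed behaviour: fail fast on a contradictory config.
                throw new IllegalArgumentException(
                        "use_database is true but database is missing");
            }
            return "database";
        }
        // Legacy behaviour: fall back to the inline products list.
        return "products";
    }
}
```

The product-loading code cannot be handed just the products subtree any more; it has to see the entire configuration to decide what to do.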
The sysadmin then mentions that it would be useful to allow the server to listen on multiple ports, some of them being HTTP and some of them being HTTPS. Marketing also wants to integrate email updates into the server. However, one week into developing email updates, they also mention that some systems may need to send tweets instead of emails. Your implementation strategy follows the same logic as previous development efforts:
port: 8080
other_port_configurations:
- port: 8081
  protocol: http
- port: 8082
  protocol: https
use_database: true
database:
  url: //localhost/database
  user: productapp
  pw: password123
send_tweets_instead_of_email: false
email:
  host: smtp.app.com
  port: 442
  username: productapp-email
  password: thisisgettingsillynow
This process of introducing additional data that is handled by special-case heuristics is a common pattern for extending configurations without breaking existing fields. However, it can become quite difficult to keep track of all the different rules that arise out of this gradual growth in complexity. Not only that, the resulting configuration file can be difficult to work with.
What would make this process a lot easier would be to extend the
actual fields themselves. However, products, for example, is defined
to contain an array of products. The only actions that can be carried
out on products are to add or remove products.
The solution to this is to ensure that there are extension points at major parts of the configuration hierarchy when it’s initially designed. Relegate primitive values (which can’t be extended later) deeper into the configuration hierarchy and have clear extension points.
An example of adding extension points would be:
products:
  product_data:
  - name:
      value: Acme Explosive
In this design, products, product (within product_data), and name
could all be extended later with more type switches or other
information to allow them to be loaded or handled differently. Applying
this nesting concept to the configuration at each step of the above
scenario would result in something like this:
server:
  application:
    port: 8080
    protocol: https
  admin:
    port: 8081
    protocol: http
  internal:
    port: 8082
    protocol: http
products:
  source: database
  database:
    url: //localhost/database
    user: productapp
    pw: password123
product_notifications:
  type: email
  settings:
    host: smtp.app.com
    port: 442
    username: productapp-email
    password: thisisgettingsillynow
There would still be some heuristics involved with handling missing
keys, defaults, type switches, etc. However, the hierarchy between
features is maintained and there are now no sibling heuristics. As a
result, the config-handling code can be modularized so that each key
in the configuration is read in isolation. The products key in this
improved example could be handed to
handleProductsConfiguration(ProductsConfiguration conf), a function
that no longer needs to receive the sibling use_database and database
keys.
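As a rough illustration of handling keys in isolation, the sketch below passes each top-level subtree to its own handler. It uses plain maps instead of typed configuration classes to stay self-contained; the ConfigHandlers class, its method names, and the fallback values are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class ConfigHandlers {
    private ConfigHandlers() {}

    // Sees only the products subtree; no sibling keys are needed.
    static String handleProducts(Map<String, Object> products) {
        Object source = products.getOrDefault("source", "inline");
        return "products loaded from " + source;
    }

    // Sees only the product_notifications subtree.
    static String handleNotifications(Map<String, Object> notifications) {
        Object type = notifications.getOrDefault("type", "email");
        return "notifications sent by " + type;
    }

    // Dispatches each top-level key to its own handler.
    static List<String> handleAll(Map<String, Map<String, Object>> root) {
        List<String> summary = new ArrayList<>();
        if (root.containsKey("products")) {
            summary.add(handleProducts(root.get("products")));
        }
        if (root.containsKey("product_notifications")) {
            summary.add(handleNotifications(root.get("product_notifications")));
        }
        return summary;
    }
}
```

Defaults still exist, but each one lives inside the handler for the key it belongs to rather than in a rule that spans siblings.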
A strict tree hierarchy maps cleanly onto polymorphic systems and is
surprisingly easy to implement using libraries such as Jackson
(Java):
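import com.fasterxml.jackson.annotation.JsonProperty;
import com.fasterxml.jackson.annotation.JsonSubTypes;
import com.fasterxml.jackson.annotation.JsonTypeInfo;

// Id.NAME with registered subtypes lets the config say "source: database";
// Id.CLASS would require a fully qualified class name in that key instead.
@JsonTypeInfo(
    use = JsonTypeInfo.Id.NAME,
    include = JsonTypeInfo.As.PROPERTY,
    property = "source")
@JsonSubTypes({
    @JsonSubTypes.Type(value = DatabaseProductsConfiguration.class, name = "database")})
interface ProductsConfiguration {
    <T> T accept(ProductsConfigurationVisitor<T> visitor);
}

// Assumed shape of the visitor (not shown in the original): one method per source.
interface ProductsConfigurationVisitor<T> {
    T visit(DatabaseProductsConfiguration conf);
}

class DatabaseProductsConfiguration implements ProductsConfiguration {
    @JsonProperty
    private String url;
    @JsonProperty
    private String user;
    @JsonProperty
    private String pw;

    @Override
    public <T> T accept(ProductsConfigurationVisitor<T> visitor) {
        // Return the visitor's result; the method is declared to return T.
        return visitor.visit(this);
    }
}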
The above is compile-time verified, type-safe, and involves no
conditional logic. Completely new product sources can be configured by
defining a configuration that implements ProductsConfiguration and
updating the relevant visitor. Clearly, this implementation requires
more up-front engineering than the original design. However, for
projects with a changing specification (i.e. most projects), allowing
for flexibility pays for itself.
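To make the "update the relevant visitor" step concrete, here is a small library-free sketch with two sources. The names (ProductSource, SourceVisitor, DatabaseSource, InlineSource, DescribeSource) are stand-ins invented for this example rather than the Jackson-annotated types, and the inline source is an assumed second case.

```java
import java.util.List;

// One method per known source; adding a source means adding a method here.
interface SourceVisitor<T> {
    T visitDatabase(DatabaseSource source);
    T visitInline(InlineSource source);
}

interface ProductSource {
    <T> T accept(SourceVisitor<T> visitor);
}

final class DatabaseSource implements ProductSource {
    final String url;

    DatabaseSource(String url) {
        this.url = url;
    }

    @Override
    public <T> T accept(SourceVisitor<T> visitor) {
        return visitor.visitDatabase(this);
    }
}

final class InlineSource implements ProductSource {
    final List<String> names;

    InlineSource(List<String> names) {
        this.names = names;
    }

    @Override
    public <T> T accept(SourceVisitor<T> visitor) {
        return visitor.visitInline(this);
    }
}

// A visitor that has no branching on type: dispatch picks the method.
final class DescribeSource implements SourceVisitor<String> {
    @Override
    public String visitDatabase(DatabaseSource source) {
        return "database at " + source.url;
    }

    @Override
    public String visitInline(InlineSource source) {
        return source.names.size() + " inline products";
    }
}
```

If a third source is added to SourceVisitor, every existing visitor stops compiling until it handles the new case, which is where the compile-time checking comes from.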