Abstractions are wonderful things, until they leak. At that point, I tend to wish someone hadn’t spared me the low-level details.
Recently, I was tasked with developing a system that continually logs temperature readings from 12 hotplates. The plates use an RS232 communication interface, which is very easy to negotiate.
With only those high-level details available, I declared that the logging software would be an “afternoon in n’ out job” and created an appropriate design for the timescale / effort:
- Hard-code the 12 plates’ COM/ttyS port identifiers and output paths
- Loop through each port/path pair
- Send the GET_TEMPERATURE request
- Write the response to the output file
Job done.
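For reference, the design above amounts to roughly the following. This is only a sketch: the port names and file names are made up, and `query` is a hypothetical stand-in for "send GET_TEMPERATURE and read the reply".

```python
import time

# Hypothetical port-to-file mapping; the real identifiers and paths are not given.
PLATES = {
    "/dev/ttyS0": "plate_01.csv",
    "/dev/ttyS1": "plate_02.csv",
    # ... one entry per plate, 12 in total
}


def log_once(plates, query, now=time.time):
    """Poll every plate once and append each reply to its output file.

    `query(port)` is a stand-in for "send GET_TEMPERATURE and read the reply".
    """
    for port, path in plates.items():
        reply = query(port)
        with open(path, "a", encoding="utf-8") as out:
            out.write(f"{now():.3f},{reply}\n")
```

Every assumption baked into this loop (fixed ports, instant replies, an output file that is always there) is one the rest of this post knocks over.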
Port identifiers can change.
Wait, who are you?
The god of “You don’t know enough yet”.
OK. Why would port numbers change? They’re physically wired into the RS232 box. It’s not like they will re-wire themselves overnight.
Port identifiers are a convenient abstraction the operating system, or your RS232 box’s drivers, provide to make it easier for your application to request handles to them.
Fine, Mr. Theoretical. That might be the case, but the OS won’t change port identifiers in any real circumstance.
Try plugging your USB-to-RS232 box into a different USB port.
Oh crap, some of the devices have switched around! Now my software is writing temperature readings from a different plate to the same file!
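One way out is to key each output file on something that travels with the plate rather than on the port name. A minimal sketch, assuming there is some `identify(port)` callable that returns a stable identifier for whatever is attached. That callable is hypothetical: it might ask the plate for a serial number, if its protocol has such a command, or read adapter details from the OS.

```python
def map_plates_to_ports(ports, identify, known_ids):
    """Return {plate_id: port} for every known plate that answers.

    `identify(port)` is a hypothetical callable returning a stable identifier
    for whatever is attached to `port`, or None if nothing answers.
    """
    mapping = {}
    for port in ports:
        plate_id = identify(port)
        if plate_id in known_ids:
            if plate_id in mapping:
                # Two ports claiming the same plate means the ID is not unique.
                raise ValueError(f"duplicate plate id {plate_id!r} on {port}")
            mapping[plate_id] = port
    return mapping
```

Run at startup, and again whenever a port stops responding, this rebuilds the mapping so a reshuffled port ends up writing to the right file.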
Yeah, about that, you aren’t actually writing temperature readings as regularly as you think.
Why not? The loop iterates through the 12 hotplates and uses a timer to synchronize the next iteration. The temperature reading is practically instantaneous.
Experimentalists regularly turn the hotplates off, especially overnight.
Ah yes, they do that sometimes, but I’ll just add a small timeout that will skip a measurement if a response does not come back in a timely manner. I’ll set the timeout to ~100 ms, which is way smaller than the measurement interval (1.5 sec).
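A sketch of that timeout-and-skip read. It assumes the connection behaves like a pyserial port opened with something like `serial.Serial(port, timeout=0.1)`, where `readline()` returns whatever arrived before the timeout. The `\r\n` request terminator is also an assumption, since the plates' protocol is not described here.

```python
def read_temperature(conn, request=b"GET_TEMPERATURE\r\n"):
    """Send one request and return the raw reply, or None if the read timed out.

    `conn` is any object with `write(bytes)` and `readline() -> bytes`, where
    `readline` gives up after the connection's timeout, as pyserial does.
    """
    conn.write(request)
    reply = conn.readline()
    # A timed-out read comes back without the line terminator (often empty).
    if not reply.endswith(b"\n"):
        return None
    return reply
```

Returning `None` lets the caller skip the measurement explicitly instead of writing a blank line to the output.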
The interval between measurements is now greater than the specified 1.5 seconds.
That’s mathematically impossible! Even if 11 plates were turned off, the maximum delay between reads would be ~1100 ms, which is far below the interval.
Disk writing takes a non-negligible amount of time. Adding that time to your timeout interval pushes your cycle time to over 2 seconds.
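The arithmetic is easy to check. In the sketch below, the 100 ms timeout and 1.5 s interval come from the discussion above; the one-second write time is an assumed figure for illustration only.

```python
def worst_case_cycle(plates_off, timeout_s, plates_on, write_s):
    """Upper bound on one polling cycle, in seconds.

    Every plate that is off costs a full timeout; every plate that is on
    costs one write to the output file (read time is treated as negligible).
    """
    return plates_off * timeout_s + plates_on * write_s


INTERVAL_S = 1.5

# Timeouts alone: 11 plates off, writes assumed free. Stays under the interval.
timeouts_only = worst_case_cycle(11, 0.1, 1, 0.0)

# Same case with an assumed 1 s write for the one plate that is on.
with_write = worst_case_cycle(11, 0.1, 1, 1.0)

print(timeouts_only < INTERVAL_S, with_write > INTERVAL_S)
```

The timeouts fit inside the interval on their own; it is the write time, which nobody budgeted for, that breaks it.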
Clearly, that disk is far too slow. I’ll install a new one.
You can’t. The experimentalists are pointing the output to a network drive, which is why writes can occasionally go slowly.
Fine, I’m sure they will live with a slightly longer interval if I explain the situation. At least the application isn’t skipping measurements.
Your application is missing measurements. Whenever the network goes down, your application either crashes or (at least) skips measurements.
OK, I’ll write a memory cache that holds onto any measurements that didn’t write to the output and then, when the network goes back up, I’ll flush the cache to the output folder.
Your application now has a memory leak.
OK, I’ll write it all to the local system disk—that surely won’t go offline—and then, each iteration, try to copy the data file to the network drive.
Your application now has a disk-space leak. Because you are copying the entire data file each iteration, your application now runs very slowly once the output goes beyond a reasonable size.
OK, I’ll keep track of what—specifically—didn’t flush to the network drive. I’ll also keep cache and output limits to prevent memory/drive leaks. Job done. Now that I’ve got a reliable output algorithm and a timeout for whenever the plates are off, this entire system is bombproof.
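A minimal sketch of the "track what didn't flush, with a limit" idea. It is in-memory only and uses a hypothetical `write(record)` callable that raises `OSError` when the output is unreachable; a fuller version would spill to the local disk as well.

```python
from collections import deque


class PendingWrites:
    """Hold records that have not reached the output yet, up to a fixed limit."""

    def __init__(self, max_pending):
        # deque(maxlen=...) silently drops the oldest record when full,
        # which bounds memory at the cost of losing the oldest data.
        self._pending = deque(maxlen=max_pending)

    def add(self, record):
        self._pending.append(record)

    def flush(self, write):
        """Try to write records oldest-first; stop at the first failure.

        `write(record)` should raise OSError if the output is unreachable.
        Returns the number of records written.
        """
        written = 0
        while self._pending:
            try:
                write(self._pending[0])
            except OSError:
                break  # output still down; keep the rest for next time
            self._pending.popleft()
            written += 1
        return written

    def __len__(self):
        return len(self._pending)
```

Only records that were actually written leave the queue, so a network outage in the middle of a flush neither loses nor duplicates them. The limit is a trade-off: once it is hit, the oldest measurements are the ones that go.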
Just because the plates have sent a response in time does not mean they have sent the response you wanted.
That’s ridiculous! I’ve asked for a temperature reading, they respond with a temperature reading. Why would they respond with anything else?
RS232 is only a serial communication standard that specifies how signals are sent between your computer and a device. It does not provide a TCP-like transport abstraction that automatically deals with parity checks or flow control. Whenever your hotplate runs into a parity error, your application will encounter unexpected behaviour.
I’ll put parity & handshake checks into the code then. Now this application is surely done!
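Beyond port-level parity and handshake settings, it also helps to check that a reply looks like a temperature before logging it. The sketch below assumes, hypothetically, that a good reply is an ASCII decimal number followed by a line terminator; the real reply format and the plausible temperature range are not given here.

```python
def parse_temperature(reply, low=-50.0, high=600.0):
    """Return the temperature in a reply, or None if the reply looks wrong.

    Assumes (hypothetically) that a good reply is an ASCII decimal number
    followed by a line terminator, e.g. b"23.5\r\n".
    """
    try:
        value = float(reply.decode("ascii").strip())
    except (UnicodeDecodeError, ValueError):
        return None
    # float() also accepts "nan" and "inf"; the range check rejects both.
    if not (low <= value <= high):
        return None
    return value
```

A corrupted byte then shows up as a skipped measurement rather than as a nonsense value in the data file.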
Mike wants a big green “Start Measuring” button.
Oh for fu-