Tuesday, 27 November 2012

Why programming takes so long

Some things that look simple can take a very long time.

Over the past week I have been working on something that is much harder than it looks, and I thought it might be interesting to share the experience.

The requirement was to paste data in from Excel. This seems like a simple requirement: it is something we do every day, and lots of programs support it.

At a more detailed level, I needed to meet lots of technical requirements.
  • It has to support both the format used when you paste directly out of Excel (where values are separated by tab characters) and the format used when you export a comma separated values (CSV) file.
  • CSV data contains lines of values separated by commas, with quotes around values that contain special characters. But there are lots of variations. For example, in continental Europe where a comma is used as a decimal point, CSV files can use semicolons instead of commas.
  • The program usually has to interpret the format of the incoming data automatically, but also needs options to specify what separator and quote characters are used.
  • The program needs options to ignore blank lines, or lines that are intended as comments.
  • Different technologies have different rules for new lines. Windows uses a carriage return character followed by a line feed character; Unix uses just line feed; and some Apple systems use just carriage return.
  • Pasted data and CSV files may have column names as the first row. The program has to cope sensibly if a row contains more values than there are column names.
  • The program needs to output data to programs which have different formatting needs. Some need cleaned-up CSV. JavaScript programs need the data in JavaScript Object Notation (JSON) format. Other programs need Extensible Markup Language (XML), optionally using column headings as the names of the XML elements.
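
To give a feel for the interpretation side, here is a minimal Python sketch. It assumes the standard library `csv` module does the low-level quote handling and that `csv.Sniffer` is good enough to guess the separator. The function name `parse_table`, its parameters and the `#` comment marker are hypothetical illustrations, not the component described here.

```python
import csv
import io


def parse_table(text, delimiter=None, quotechar='"', comment_prefix="#",
                skip_blank=True):
    """Parse pasted Excel (tab-separated) or CSV text into rows of strings."""
    # Normalise Windows (CR LF) and old Apple (CR) line endings to plain LF.
    text = text.replace("\r\n", "\n").replace("\r", "\n")

    if delimiter is None:
        # Guess the separator from a sample that leaves out blank and comment
        # lines, which would otherwise make the column counts look inconsistent.
        sample_lines = [
            line for line in text.split("\n")
            if line.strip()
            and not (comment_prefix and line.startswith(comment_prefix))
        ]
        sample = "\n".join(sample_lines[:50])
        try:
            # Only consider tab (Excel paste), comma and semicolon (European CSV).
            delimiter = csv.Sniffer().sniff(sample, delimiters="\t,;").delimiter
        except csv.Error:
            delimiter = ","  # could not tell, so assume ordinary CSV

    rows = []
    reader = csv.reader(io.StringIO(text), delimiter=delimiter,
                        quotechar=quotechar)
    for row in reader:
        if skip_blank and all(not cell.strip() for cell in row):
            continue  # blank line (the reader yields [] for an empty line)
        if comment_prefix and row and row[0].startswith(comment_prefix):
            continue  # comment line
        rows.append(row)
    return rows
```

Leaving quoting to the `csv` module and guessing only the separator keeps the hand-written part small. The sniffer can still guess wrong on short or unusual samples, which is why an explicit `delimiter` option is still needed.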

What looked like a simple requirement has ended up complicated, and is a few days' work even under ideal conditions. To develop the new component I used existing components as much as possible, particularly for handling the XML and JSON output. The only thing I had to program from scratch was the logic to interpret the data. I have a lot of experience of this type of code, and I am completely familiar with the development and test environment. But even under these near-perfect conditions, the component required 750 lines of code and 450 lines of test code, and took me 20 hours to develop. If it had been an area I was less familiar with, or where I had fewer existing components, or where the development and test environment was unfamiliar, it would have taken me many times longer, perhaps around 100 hours.
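
Much of the output side can be handed to existing components in the same way. As an illustration only, Python's standard `json`, `csv` and `xml.etree.ElementTree` modules play that role below. The functions `rows_to_records`, `xml_name`, `to_xml` and `to_csv`, the `column_N` naming for extra values and the `rows`/`row`/`value` element names are all hypothetical choices for this sketch.

```python
import csv
import io
import json
import re
import xml.etree.ElementTree as ET


def rows_to_records(rows):
    """Use the first row as column names and return one dict per data row."""
    if not rows:
        return []
    header, *data = rows
    records = []
    for row in data:
        # Pad short rows so every record has every heading.
        values = row + [""] * (len(header) - len(row))
        names = list(header)
        # More values than headings: invent names column_N (hypothetical rule).
        for i in range(len(names), len(values)):
            names.append(f"column_{i + 1}")
        records.append(dict(zip(names, values)))
    return records


def to_json(records):
    return json.dumps(records)


def to_csv(rows):
    """Re-emit rows as tidy CSV, quoting only where needed."""
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


def xml_name(heading):
    """Roughly turn a column heading into a usable XML element name."""
    name = re.sub(r"\W", "_", heading.strip()) or "column"
    # Element names cannot start with a digit; names starting "xml" are reserved.
    if name[0].isdigit() or name.lower().startswith("xml"):
        name = "_" + name
    return name


def to_xml(records, use_headings=True):
    root = ET.Element("rows")
    for record in records:
        row_el = ET.SubElement(root, "row")
        for key, value in record.items():
            if use_headings:
                cell = ET.SubElement(row_el, xml_name(key))
            else:
                # Generic element, keeping the heading as an attribute.
                cell = ET.SubElement(row_el, "value", name=key)
            cell.text = value
    return ET.tostring(root, encoding="unicode")
```

The only hand-written logic here is deciding what to call things. The serialisers take care of escaping and quoting.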

Now that we have this new component, we can use it to meet other, seemingly more complicated, requirements really quickly. It took me only about four hours to develop a new bulk emailer component with it (for sending out and following up on surveys). And I can add that component to new solutions in a matter of minutes.

The time taken to add a new feature depends hugely on the components available and the developer's experience of the situation. Sometimes things that look simple take ages, and things that look hard take no time at all.

© Copyright 2012 Minimal IT Ltd. See the Minimal IT website for the original newsletter and copyright information.

Tuesday, 20 November 2012

Problem density

Problem density could be a valuable new concept for measuring IT.

The term "problem density" cropped up in conversation with a colleague. I had not heard the term before, so I searched for it on the Internet. The term is used occasionally in healthcare to measure the number of problems that patients have. It is particularly used in assessing drug addiction, where the patient may have many problems associated with their addiction.

Problem density could be a good concept for us to use when measuring IT. Most of the measures that we use are measures of general "goodness", or of compliance to standards or defined processes. We tend to use percentages or maturity levels. These are OK, but they make it easy to lose focus on what is important. The scales that we use do not convey how serious a situation is. What does 50% compliant mean? How much worse is maturity level 2 than maturity level 3? Bounding the numbers in a range (0 to 100, or 1 to 5, or whatever) does not put enough emphasis on what is really important.

Measurement of the number and severity of problems would give much easier-to-understand figures. Saying that one situation has a problem density of 1 and another has a problem density of 10 is easy to understand. Although it is an arbitrary scale, you can see that one situation is ten times worse than the other. It is also more accurate. We know that some situations give us hugely more problems than others, which is difficult to capture on a 1 to 5 maturity scale.

Problem density can be applied to both business and IT situations. We could devise scales for aspects of IT, such as measures of systems, projects, services, suppliers, and so on. But we could apply it to business situations too. We could measure business issues, such as task complexity, data errors, operational problems, rework, and so on, by devising a problem scale.

Solution density is the corollary of problem density. When we implement new solutions, whether they are procedures or systems or a mix of both, we can measure the breadth, complexity and variety of problems that they are intended to solve.

Comparing solution density to problem density would be a really useful analysis. IT solutions are intended to address problems, but they also introduce problems. Taking the ratio of the problems they solve (solution density) to the problems they cause (problem density) gives us a really useful management tool for identifying which systems are worthwhile. Looking back at my time working as an integration architect, this would have been really useful. I could have used it to illustrate how a lot of integration solutions have a solution-to-problem density ratio lower than one, meaning that they cause more problems and complexity than the problems they are intended to solve.
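
To make the ratio concrete, here is a small Python sketch. The article does not define how problems are counted or weighted. The severity categories and weights below, and the function names `density` and `solution_to_problem_ratio`, are therefore hypothetical and stand in for whatever scale is eventually devised.

```python
import math

# Hypothetical severity weights: the article does not define a scale, so these
# numbers are purely illustrative.
SEVERITY_WEIGHTS = {"minor": 1, "major": 3, "critical": 10}


def density(severities):
    """Severity-weighted count of problems (an assumed definition of density)."""
    return sum(SEVERITY_WEIGHTS[s] for s in severities)


def solution_to_problem_ratio(solved, caused):
    """Solution density divided by problem density.

    A value below 1 means the solution causes more trouble than it removes.
    """
    solved_density = density(solved)
    caused_density = density(caused)
    if caused_density == 0:
        # Causes no problems: unboundedly worthwhile, unless it solves none either.
        return math.inf if solved_density > 0 else math.nan
    return solved_density / caused_density
```

For example, an integration that solves one major and one minor problem but causes two major problems and one minor one scores 4 / 7, about 0.57. It falls below one under these weights.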

I will try introducing problem density into the IT measurement work we do. It will be interesting to see whether problem density is a more useful management tool than the measures we usually use.

Tuesday, 13 November 2012

The ideas amplifier

Computers don't just automate our ideas, they make them bigger.

I have often wondered why computer systems work. I have been developing a new data load feature for our software. It involves a fairly complex mixture of data management, dependency resolution, change detection and version control. I have designed it carefully and am testing it as I go. But how do I know it will work when I have finished?

I have had this thought many times before. I have written lots of computer systems and by-and-large they work. They do what I want. The design and programming achieve the aims set out for them. But, philosophically, why is this? What is it about computers that allows us to translate our ideas into machines that meet our requirements?

In part, computer systems work because of the sort of people we are. IT attracts people with systematic, consistent minds. We work over problems in our heads and find ways through. We translate those methods into computer programs, and out come working computer systems.

It is not just the sort of people we are. I was trying to mend a garage door the other day, and only succeeded in jamming it and needing to take a hammer to it. I might have the kind of mind that gets computers to do my bidding, but the physical world is another matter. In what way are information systems different from physical systems?

The nature of information partly explains why computer systems work. Computers store, process and move information, and information is a much more flexible and forgiving medium than garage doors. We can fiddle around with the rules for storing, processing and moving information until we get a recipe that works. But there is more to it than that.

The final part of why computer systems work is their ability to repeat our recipes tirelessly. Computers can consistently and efficiently repeat the instructions we give them, and so apply our ideas to much larger problems.

This amplification allows us to develop small programs that do big things. In my example load program, I need to synchronize large, interconnected structures across two instances of the software. But although that is a big requirement, I only have to think through the rules for a single piece of data, and can rely on the computer to apply my recipe reliably to large and complex data sets.
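
As a loose illustration of this, here is a Python sketch that needs version 3.9 or later for `graphlib`. The record layout with `version` and `depends_on` fields, and the functions `sync_action` and `plan_sync`, are hypothetical. This is not the data load feature described here. The only real thinking is in `sync_action`, which handles a single item. The computer applies it to every item, in an order that respects dependencies.

```python
from graphlib import TopologicalSorter


def sync_action(source_item, target_item):
    """The rule for one piece of data (hypothetical): what should happen to it?"""
    if target_item is None:
        return "insert"
    if source_item["version"] > target_item["version"]:
        return "update"
    return "unchanged"


def plan_sync(source, target):
    """Apply the single-item rule to every item in the source instance."""
    # Map each item to the items it depends on, so those are handled first.
    graph = {key: item.get("depends_on", []) for key, item in source.items()}
    plan = []
    for key in TopologicalSorter(graph).static_order():
        if key not in source:
            continue  # a dependency that is not part of this batch
        action = sync_action(source[key], target.get(key))
        if action != "unchanged":
            plan.append((key, action))
    return plan
```

The same few lines work for three items or three million, which is the amplification this paragraph describes.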

As well as helping us deliver technically, computers allow us to amplify business ideas and exploit them in whole new ways. Take Twitter for example. There's nothing particularly clever about noting down a few thoughts for other people to read. But by using computers and the Internet, this simple idea is amplified into something which is not only much larger but different in kind. The simple idea of writing down thoughts has turned into a responsive and inclusive global news and opinion system.

When we think of how computers add value, we need to think of this amplification effect. I often write that computers are just machines that store, process and move information. But maybe I am guilty of oversimplification. Their ability to do this quickly, reliably and on a large scale gives a whole new dimension. Computers don't just automate our ideas, they make them bigger.