Minimal IT: March 2011

Tuesday, 29 March 2011

Caveat emptor

We may think that IT is often mis-sold, but the real problem is that IT is often mis-bought.

One of the great ironies of IT is the contrast between how we think IT should be used and how IT is sold (and bought).

We think that IT should be used in a rational way. We analyse and prioritise requirements. We model business processes and data and draw up enterprise architectures. We involve stakeholders. We have detailed methods for project management and systems development. IT is a rational, systematic process designed to deliver to the common good of the organisation.

The way we sell IT is totally different. Good IT sales people, like all good sales people, find the senior executives who have money. They find out what problems they have, and what makes them tick as individuals. They grow the problems, and propose solutions that appeal to their individual agendas as directors and senior managers. Like all sales, IT sales is an empathetic process that delivers to individual agendas.

In this respect, IT is the same as any other industry. However, the intricacies and impacts of IT are so great that the way we sell IT, or more specifically the way we buy IT, is particularly dangerous.

You can see this problem just internally within organisations. Many IT departments see their role as "selling" IT into the rest of the organisation, and those who are good at it focus on meeting the individual agendas of the senior executives. This can lead to very political projects which have been agreed at the highest level, but where infeasible expectations, timescales and budgets have been set.

The problem is worse when purchasing solutions externally. Despite the procedures and good intentions of the IT department, many IT solutions are sold to senior executives without an appropriate level of diligence. As a vendor, you can end up implementing solutions that make no sense for the customer either technically or commercially, because that is what has been sold.

Imagine, instead of IT, that organisations were purchasing something else of similar cost and complexity. Maybe an aeroplane. In many respects, aeroplanes are simpler: they are already designed, there are stringent regulations about how they work, and having one does not change how everyone works in your organisation. A major IT solution is more like designing a new type of aeroplane in which to fly all your employees around. Does the way we buy IT really do justice to something that is so technically demanding and that has such far-reaching consequences?

I do not blame sales people for these problems. The problems lie with how organisations purchase IT. Two things need to be in place.

First, IT specialists within the organisation must understand their role. Your job is not to sell IT into the organisation. Your job is as buyers, providing insight and expertise to the organisation on the acquisition of IT.

Second, decision makers need to take decisions more carefully. IT is more expensive, complicated, risky and specialised than nearly any other purchasing decision, and will have impacts way beyond your immediate needs. Beware of the salesman who understands you and really knows how to help - your colleagues are the ones to trust.

© Copyright 2011 Minimal IT Ltd. See the Minimal IT website for the original newsletter and copyright information.

Tuesday, 22 March 2011

AntiSamy

AntiSamy is a very effective open-source library for making web sites more secure.

Cross-site scripting (XSS) describes a broad category of web site security problems, in which malicious code is inserted into web pages. The malicious code then runs on the user's browser under the same security profile as the original website. This can allow, for example, the malicious code to steal the user's password or cause other disruption.

XSS is a serious problem, and accounts for the majority of web-based security threats. One of the most notorious XSS attacks was the Samy worm inflicted on MySpace in October 2005, which infected 1,000,000 MySpace users in less than 24 hours.

The most important part of avoiding XSS attacks is to prevent users from adding JavaScript to pages, so that they cannot get code to run in other users' browsers. For most websites, this is not a problem, because users do not need to enter any data. However, if sites allow users to enter content to be redisplayed on the website, such as comments, then there is a potential for problems.

There are various approaches to preventing users from adding JavaScript to content.

The easiest is simply to prevent users from entering anything other than plain text, and to use standard HTML escape codes to represent any special characters in the content. This works well, but it means that your users cannot add any formatting to their content.
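As a minimal sketch of the plain-text approach, the Python standard library's html.escape function replaces the special characters with HTML escape codes. The function name render_comment and the surrounding paragraph tag are my own choices for illustration, not part of any particular framework.

```python
from html import escape


def render_comment(user_text):
    """Return user-supplied plain text as HTML that is safe to display.

    html.escape replaces &, <, >, " and ' with escape codes, so any
    markup the user typed is shown as text rather than interpreted.
    """
    return "<p>" + escape(user_text) + "</p>"


if __name__ == "__main__":
    print(render_comment('<script>alert("x")</script>'))
```

The browser then displays the script tag as literal text instead of running it.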

Another method is to use your own set of limited formatting codes, for example using simplified "bulletin-board code" markup such as *bold* and /italic/. This can work, but the codes are non-standard and it is difficult to do well.

Another option is to allow users to enter a limited set of HTML markup, and then to filter the HTML that they enter to remove any JavaScript or other malicious code. However, as the technical explanation of Samy demonstrates, there are all sorts of ways to get around filtering.

This was the background to AntiSamy, part of The Open Web Application Security Project (OWASP).

AntiSamy is an open source code library which you can add to web applications to filter user-provided HTML content and remove intricate XSS attacks such as Samy. The main version is written in Java, and a version is also available for .NET.

You only need a few lines of code to add AntiSamy to a web application. It takes a string of text as input, mends invalid HTML, and removes everything other than the allowed markup. It returns the filtered HTML as a string or as XML. It returns friendly error messages to help users understand how their input has been interpreted.
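To give a feel for the allow-list idea, here is a toy sketch in Python. It is not AntiSamy and does not use its API or policy files. The names ALLOWED_TAGS, SKIPPED_TAGS, AllowlistFilter and filter_html are invented for illustration, and the tiny "policy" is an assumption. A real site should use a maintained library rather than anything this simple.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical policy: the only markup users may keep.
ALLOWED_TAGS = {"b", "i", "em", "strong", "p"}
# Elements whose text content is dropped entirely.
SKIPPED_TAGS = {"script", "style"}


class AllowlistFilter(HTMLParser):
    """Re-emit allowed tags without attributes; escape all other text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []      # pieces of the cleaned output
        self.open_tags = []  # allowed tags currently open
        self.skip_depth = 0  # greater than zero inside script/style

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self.skip_depth += 1
        elif tag in ALLOWED_TAGS:
            # Attributes (onclick, style, href...) are never copied.
            self.parts.append("<" + tag + ">")
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self.skip_depth:
            self.skip_depth -= 1
        elif tag in self.open_tags:
            # Close anything left open inside this tag first.
            while self.open_tags:
                top = self.open_tags.pop()
                self.parts.append("</" + top + ">")
                if top == tag:
                    break

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(escape(data))


def filter_html(dirty):
    """Return dirty HTML reduced to the allowed markup."""
    parser = AllowlistFilter()
    parser.feed(dirty)
    parser.close()
    # Mend the output by closing tags the user left open.
    while parser.open_tags:
        parser.parts.append("</" + parser.open_tags.pop() + ">")
    return "".join(parser.parts)


if __name__ == "__main__":
    print(filter_html('<b onclick="x()">hi</b><script>alert(1)</script>'))
```

The design choice is to rebuild the output from a list of permitted tags rather than to search the input for known attacks, which is the same general direction that a policy file takes.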

AntiSamy is configured using a policy file which describes what markup should be allowed. A selection of pre-built policy files is available.

The documentation for AntiSamy is brief, and I found it took a while to work out what other libraries it depends on. However, the code works very well, it is easy to use, and it is fast.

If you are responsible for websites that take formatted content from users, then I recommend you look at AntiSamy.

Tuesday, 15 March 2011

Removing data constraint

Breaking data down into a simpler form can overcome constraints on how data is accessed, stored and structured.

A couple of weeks ago I wrote that the technologies that support the semantic web could be very significant for mainstream IT.

The semantic web is an idea championed by web pioneer Tim Berners-Lee. The basic idea is that the web can contain data that can be read by computers, not just pages of text that can be read by humans. The vision of the semantic web is that computers can perform much more meaningful queries of the web, through better understanding of the meaning of the data.

What interests me most about the semantic web is some of the technologies being used to develop it, and how these could be applied in other ways.

One of the chief technologies is Resource Description Framework (RDF). RDF is conceptually simple. In a conventional database, data is arranged in tables, in which each thing is represented by a row, each type of data as a column, and each data value as a cell. RDF breaks this down further into triples, in which the first item of the triple identifies the thing, the second the data type, and the third the data value. Each thing is represented by a uniform resource identifier (URI), which can be a web address. Each data type is also represented by a URI. The value can be a textual value, or a URI which identifies another thing.
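The step from a table row to triples can be sketched in a few lines of Python. This is only an illustration of the idea, not an RDF library: the base address http://example.org/, the URI layout, the function name row_to_triples and the sample product row are all assumptions of mine.

```python
# Hypothetical base address for the URIs in this sketch.
BASE = "http://example.org/"


def row_to_triples(table, key_column, row):
    """Break one table row into (thing, data type, value) triples.

    The thing is identified by a URI built from the table name and the
    row's key; each column name becomes a URI for the data type.
    """
    subject = BASE + table + "/" + str(row[key_column])
    return [
        (subject, BASE + "schema/" + table + "#" + column, value)
        for column, value in row.items()
        if column != key_column
    ]


# A sample row: the supplier value is itself a URI identifying another thing.
PRODUCT_ROW = {
    "id": "42",
    "name": "Kettle",
    "supplier": "http://example.org/supplier/7",
}

if __name__ == "__main__":
    for triple in row_to_triples("product", "id", PRODUCT_ROW):
        print(triple)
```

Each cell of the row becomes one triple, so a row with a key and two other columns yields two triples about the same thing.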

This way of arranging data has very interesting characteristics.

Because each piece of data has a web address, it can be referenced from anywhere. If you want some data, you do not have to be sent it, you only need to know its address. These technologies could, for example, be used to publish reference data from enterprise systems to departmental systems. These technologies allow this to be done in a simple, consistent way, using only web access technologies. It could overcome many of the problems with other methods, such as database sharing and file transfer.

Because the data structure and access methods are consistent, this approach overcomes constraints of how data is stored. For example, it would allow data stored in a file archive to be combined seamlessly with a current view from an operational database, or with publicly-published reference data, without having to merge them into a single database.

Perhaps most interesting is that breaking data down into triples lets you build very rich data structures. It can achieve a level of data polymorphism or subtyping that is difficult to achieve in a relational database. It can deal with data and data-about-data in the same structures, creating self-describing data. You can build data structures that are more flexible and less constrained.

These approaches remove constraints on how data is accessed, how data is stored, and how data is structured. This could overcome the constraints of current databases, where data is stuck in a fixed structure within the database. Although a huge amount of engineering is required to make this approach as efficient and secure as current databases, it could become a very significant set of technologies for mainstream IT.

Tuesday, 8 March 2011

Why are websites so hard?

Tools and services for creating websites are so much more complicated than the underlying technology that it is almost impossible to explain how to set up a simple website.

Because I "work in computers", I am occasionally asked how to set up simple websites. I thought I would write a newsletter about it, but as I tried to write it, I realised that I could not. Although the web is basically simple, and I understand it reasonably well, it has become very hard to explain.

The web works like this. A browser on a PC asks another computer, a web server, for a web page. The web server sends back a web page coded as hypertext markup language (HTML). The browser interprets the HTML and displays the page to the user.
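The exchange described above can be sketched with nothing but the Python standard library. This is a toy demonstration, not how you would host a real site: the page content, the class name PageHandler and the function fetch_demo_page are invented for illustration, and the "web server" runs on the same machine as the program playing the part of the browser.

```python
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer

# A tiny web page, coded as HTML, that the server sends back.
PAGE = b"<html><body><h1>Hello</h1></body></html>"


class PageHandler(BaseHTTPRequestHandler):
    """Answer every GET request with the same HTML page."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, format, *args):
        pass  # keep the demonstration quiet


def fetch_demo_page():
    """Start a local web server, ask it for a page, and return the reply."""
    # Port 0 asks the operating system for any free port.
    server = HTTPServer(("127.0.0.1", 0), PageHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        # This part plays the role of the browser.
        connection = HTTPConnection("127.0.0.1", server.server_port, timeout=5)
        connection.request("GET", "/")
        response = connection.getresponse()
        status, body = response.status, response.read()
        connection.close()
        return status, body
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    status, body = fetch_demo_page()
    print(status, body.decode("utf-8"))
```

A real browser does the same thing and then interprets the returned HTML to draw the page.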

To set up a website, you need to know a little bit about HTML. It is useful to have a basic understanding of internet domain names and web servers. You need to know how to move files from one computer to another, for example using FTP. Some basics of web site layout and writing for the web are useful.

Each piece of setting up a website is fairly straightforward, but it is very difficult to bring them all together in a simple "how to" guide.

Part of the problem is how services for the web are bundled.

To run a website, you need a domain name, domain name server set up, and some space on a web server. However, different providers supply these bundled in different ways, and bundled with other capabilities. Some capabilities, like email accounts and email forwarding, are closely related. Other capabilities, such as vouchers for web advertising, are not related to the underlying technical task. So instead of explaining some fairly simple technical ideas, you have to explain these through the products and deals that different providers offer.

Another big problem is the software used to write web pages. Now that Microsoft FrontPage is no more, there are no obvious pieces of software for simple websites for beginners. The main commercial web authoring tools, such as Adobe Dreamweaver, are so full of features that they are difficult to explain simply. A single copy of the software can cost as much as many years of web hosting, so they are an expensive option for anyone just experimenting. To add to the confusion, hosting providers sometimes bundle online "site builder" tools, but different providers supply very different capabilities.

Some open source tools, such as KompoZer and Amaya, are simpler and free. Although they are reasonably easy to use, they can come across as rather geeky to a beginner.

We have made the web far too hard. The underlying ideas are simple, and can be explained easily enough. But the commercial offers and product bundles and software packages built on top of this make the web much harder, not easier. To use the web effectively, you have to wrench layers of confusion out of the way before you can see the underlying simple concepts.

p.s. although I could not write a simple guide myself, I did come across this guide on how to start/create your own website at thesitewizard.com, which I recommend to anyone setting up a website for the first time.

Tuesday, 1 March 2011

What's on your mind?

What do you do when there are thoughts that keep floating around in your head, and you cannot make sense of them or make them go away?

Six years ago, I was faced with exactly that situation. I had recently left a job in the IT department of a large retailer. Although they were by no means the worst offenders, I was tormented by the huge waste I saw in IT. My response was to start writing this newsletter, to help me explore and make sense of those thoughts.

I was not particularly concerned about how efficiently we supply IT, but rather demand-side inefficiencies, or the tendency to want more IT than we need.

This inappropriate demand comes in many forms. Politically-motivated projects, the must-have new enterprise systems that are just too complicated and different for the organisation to accept, are a large source of waste. The career aspirations of both technologists and managers drive unnecessary and inefficient IT solutions. Because we do not rationalise and decommission IT aggressively enough, we have much larger volumes of IT to maintain and operate than we strictly need.

In this newsletter I have explored how our perception of IT is at the root of many inefficiencies. A more down-to-earth perception of IT would help us understand IT more clearly and cut through some of the excessive demand. I have been exploring a strict definition that IT is only a tool for storing, transforming and moving information, and whether we should consider IT as a set of independent systems, rather than a complex of layered technologies and processes.

Exploring these ideas has achieved its primary objective. My torments have gone. I now feel I have got to grips with the problems I saw, and have at least some ideas on solutions. But more importantly, I have come to accept that many solutions to waste fall into the "nice idea, wrong species" category. Because of the people we are, we are bound to misunderstand something as radical as IT and misuse it for all sorts of political and career ends. The economic situation has now changed, and most organisations have reined in their worst IT excesses. And my experience of working as a consultant has taught me to sell what people want to buy, which is usually justification for more IT activity, rather than naively assume that everyone wants to save money.

Although my original torments have gone, new thoughts float around my head now. I now wonder whether there are technological solutions which would sidestep the current demand-side inefficiencies, and move us to a much more efficient and valuable exploitation of IT.

My current view is that there may well be new types of solution which are orders of magnitude more efficient. Although my thoughts are vague and speculative, I see hints of it in everything-as-a-service (removing infrastructure constraint), in the technologies that support the semantic web (removing data constraint), and even in role-playing computer games (combining individual and computer-based realities). I am not sure there is anything in it, but it is definitely something worth thinking about.
