A yardstick for artificially intelligent systems

For years, we have used ISO 25010 and similar standards of quality characteristics to map out the quality and risks of systems.
For AI systems such as chatbots and robots, these standards are certainly still relevant, but they are no longer sufficient, as Rik Marselis of Sogeti has found in his daily practice.

Almost every organization now has a chatbot, whether to deal with frequently asked questions, to attract new customers or to handle common transactions. The chatbot is one of the many intelligent machines that are currently storming into our lives. Other examples are robot vacuum cleaners, license plate recognition cameras and autonomous cars.

A large insurer decided to implement a chatbot when the number of damage claims rose after a major storm and the call center could not handle the volume of requests. Most claims are so clear-cut that they can be registered and handled automatically. This offers great benefits for both the company and its customers. But if the quality of a chatbot is poor, an organization can also suffer considerable damage.

Think of Microsoft's chatbot Tay, which started posting extreme right-wing language. It simply reproduced what it had learned from the input of some mischievous people, as self-learning algorithms do. So the chatbot itself was arguably not at fault, but Microsoft's reputation suffered.

Many characteristics determine the quality of information systems in general. The well-known ISO 25010 standard defines eight main groups, which we know as the functional and non-functional quality characteristics. Developers use these attributes not only in advance, to define the requirements that the system must meet, but also afterwards, to verify and validate whether the system actually complies.

For AI systems such as chatbots, the eight main groups of ISO 25010 remain fully relevant and necessary. However, while researching intelligent machines in our practice, we noticed that these attributes alone are not sufficient: AI systems have extra characteristics that we do not encounter in conventional information systems. That is why we introduced three new main groups of quality characteristics, each with multiple sub-attributes: intelligent behavior, morality and personality. We also added embodiment as a sub-characteristic of user-friendliness (usability).

AI quality attributes

Figure 1: ISO 25010 quality characteristics and our extensions for quality of intelligent machines.
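The article presents this extended model only as a figure. As a hypothetical illustration, not part of the authors' method, a team could record the model as a simple checklist and see which main characteristics its test plan does not address yet. The eight names below are those of the ISO/IEC 25010:2011 product quality model, where "usability" corresponds to what this article calls user-friendliness. `QualityModel`, `extended_model` and `untested` are illustrative names, and the sub-attributes of the three new groups are left empty because this article does not list them.

```python
from dataclasses import dataclass, field

# Main characteristics of the ISO/IEC 25010:2011 product quality model.
ISO_25010 = [
    "functional suitability",
    "performance efficiency",
    "compatibility",
    "usability",
    "reliability",
    "security",
    "maintainability",
    "portability",
]

# The three main groups proposed in this article for intelligent machines.
# Their sub-attributes are not named here, so they start out empty.
AI_EXTENSIONS = ["intelligent behavior", "morality", "personality"]


@dataclass
class QualityModel:
    # Hypothetical structure: main characteristic -> sub-characteristics.
    characteristics: dict[str, list[str]] = field(default_factory=dict)

    def untested(self, tested: set[str]) -> list[str]:
        """Return the main characteristics that no planned test addresses."""
        return [name for name in self.characteristics if name not in tested]


def extended_model() -> QualityModel:
    """Build the ISO 25010 model extended for intelligent machines."""
    # The comprehension creates a separate empty list for every characteristic.
    model = QualityModel({name: [] for name in ISO_25010 + AI_EXTENSIONS})
    # Embodiment is added as a sub-characteristic of usability;
    # ISO 25010's own usability sub-characteristics are omitted for brevity.
    model.characteristics["usability"].append("embodiment")
    return model


model = extended_model()
gaps = model.untested({"functional suitability", "usability"})
print(len(model.characteristics), gaps[:2])
# prints: 11 ['performance efficiency', 'compatibility']
```

A plain dictionary keeps the sketch close to the figure: the keys are the main groups, and the teams themselves decide which sub-attributes and test criteria to fill in.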

Many people worry about the intelligent behavior of AI systems. The big fear is that they will become smarter than people and take over the world. For the time being, this kind of 'superintelligence' is still impossible, and if developers approach it with care, those future superintelligent AI systems will be good to us.

At the moment, AI systems still show limited intelligent behavior: they are very good at one specific task, for example recognizing objects in photos or moving independently through traffic. The question here is how intelligent these systems really are. Does a self-driving car understand what kind of action is expected when it passes a 'Think of our children' road sign?

The fact that something is technically possible does not mean that we should always want it. AI systems can already perform tasks that were previously reserved for people, such as assessing medical data and making a diagnosis, or supporting people in need: think of a voice assistant that reminds elderly people to take their pills. However, we cannot leave such tasks to intelligent systems without first answering all sorts of difficult ethical questions. Privacy is also at stake. Intelligent devices can record everything with their cameras and microphones, and they are permanently connected to the internet. Who gets to see and hear all of it? Moreover, we need to think carefully about how we want AI systems to behave towards people. Think of killer robots that independently decide who is an enemy.

Chatbots in all shapes and sizes
Chatbots can certainly use artificial intelligence and machine learning. However, most existing chatbots are pre-programmed. The interaction with the user can seem very smart, but often there is no machine learning or artificial intelligence involved.
Many people associate a chatbot with text-based communication. However, there are other kinds. Speech-based chatbots, for example, are rapidly entering our lives: think of Amazon's Alexa, Microsoft's Cortana, Google Home and Apple's Siri.
There are also many visually oriented chatbots. Take a look at Quick, Draw! (https://quickdraw.withgoogle.com). It is amazing how well this algorithm recognizes drawings. At the end, it also gives some insight into how it reached its conclusions.
Video chatbots are a bit further in the future. An animated person (as we know from computer games) appears on screen, and it soon feels as if we are simply talking to someone on a video call. It will not be long before an artificial assistant can no longer be distinguished from a real person.

People will interact with intelligent machines, so developers should carefully consider how to make this a pleasant experience. When developing AI systems, personality deserves special attention, and the right personality depends on the situation. As an example, we compared two chatbots that have the same function but quite a different look-and-feel. Kayak's chatbot (https://www.facebook.com/messages/t/kayak) supports the search for airline tickets by asking short questions and quickly comes up with relevant options. KLM's Bluebot (https://www.facebook.com/messages/t/klm) uses much more text: it welcomes you, gives you hints and tips for answering its questions, and finally presents relevant travel options as well. The result is the same, but the feeling it creates is totally different. Neither is better or worse than the other; the target audience is simply different. Kayak's chatbot is a perfect fit for frequent travelers, while Bluebot is much more suitable for the occasional traveler. If the personality of an AI system fits its audience, people will enjoy the interaction.

An interesting aspect of personality is humor. Humor can make the conversation between a human and a digital assistant more lively, but the AI must be able to recognize whether the user is in the mood for it. Someone who contacts the insurer's chatbot to report storm damage is not in the mood for a joke.

Embodiment also matters: what does the intelligent machine look like? The appearance of a digital assistant should not frighten people, because then nobody will want to interact with it. Its look-and-feel must also suit its purpose and context.

Not perfect
For developers, it is important to think early about the quality characteristics they require from their AI system. If we build our own system, we will want to know in detail how well it learns and makes decisions, and whether those decisions are transparent. We can also buy a system off the shelf. If all goes well, the supplier has already tested the technology, but we still have to determine whether the resulting business process, including the intelligent machine, actually meets the requirements and expectations. A bank that decides to put a robot behind the counter, for example, wants to be certain that the robot gives the right advice and handles transactions properly.
Insufficient quality in AI systems entails major personal and societal risks. The existing and newly added quality attributes help to mitigate these risks effectively. We certainly do not strive for perfection, however. And why would we, with intelligent systems that may eventually develop themselves?

A pleasant and joyful experience
In this article, we introduced the quality characteristics of ISO 25010 and extended them with three new main groups (intelligent behavior, morality and personality) and with embodiment as a sub-characteristic of user-friendliness. By using these quality characteristics, development teams can work towards the right quality level for their intelligent machines, satisfy their users and make interaction between people and intelligent machines a pleasant and joyful experience.


Author: Rik Marselis. 
Rik is a testing expert at Sogeti in the Netherlands. He is a well-regarded presenter, trainer, author, consultant and coach in the world of testing. His presentations are appreciated for their liveliness, for keeping the talks serious but light, and for their practical examples with humorous comparisons. He has supported many organizations and people in improving their testing practice through useful tools and checklists, practical support and in-depth discussions.
As a fellow of Sogeti’s R&D network SogetiLabs, Rik researches the testing of intelligent machines (such as chatbots and robots) and how to use artificial intelligence and machine learning to support testing activities.
Rik has published many articles, blogs and papers on testing. He contributed as an author or project leader to the following books: TestGrip (2007), TMap NEXT BDTM (2008), TPI NEXT (2009), PointZERO (2012), Quality Supervision (2012) and Neil’s Quest for Quality (2014). He also made smaller contributions to 12 other books.
In 2018 his latest book “Testing in the digital age; AI makes the difference”, written together with Tom van de Ven and Humayun Shaukat, was published.