On quality issues when applying social sciences

Lately there’s been an increase in interest in applying the social sciences to IT topics. For quite some time the usability and UX communities have applied methods from the behavioral sciences and insights from cognitive psychology. In recent years the Agile / DevOps community has taken a major interest in cultural transformation. More recently, designers have started to study positive psychology with the aim of designing for users’ happiness and general wellbeing. This development is called positive computing, and I heard about it from my colleagues at VINT, who will be publishing reports on Digital Happiness later this year.

However, I’ve noticed that people in the IT industry are often unaware of the quality issues inherent to scientific publications and their application, especially in the social sciences. In IT we often read about something and immediately consider ways to apply that knowledge. When reading about the social sciences, however, it is important to continuously scrutinize what you’re reading. Publications are usually part of a larger scientific discourse in which it is common to publish suggested models, unproven theories and preliminary conclusions with the exact purpose of sparking a discussion and allowing others to verify or refute them. In this piece I’d like to delve into some of the more common quality issues with published scientific information and its application, and point to some examples where these have led to widespread misunderstandings.

The most straightforward issue occurs when a theory is published with limited or no factual evidence. An example of this is one of the most popular models of human behavior in the history of psychology: Maslow’s hierarchy of needs. When reading about this model it’s easy to see its appeal: it’s simple, recognizable and very plausible. After all, who wouldn’t want to have enough food, physical safety and meaningful relationships? However, it was published with no evidence as to its validity, and subsequent research has not supported the idea of a fixed hierarchy. People do not have a shared priority of needs. Rather, they develop different priorities based on, among other things, their cultural background, personal life experiences and cognitive functions. Sometimes people choose to take in dangerously small amounts of food, as in fasting, hunger strikes or anorexia. Other times people willingly engage in dangerous behavior, like extreme sports or visiting unsafe countries. And at yet other times people isolate themselves from society, as recluses and people with social phobias do.

Another issue can be methodological concerns. For brevity I’ll focus only on the representativeness of the sample. The famous five phases of grief described by Elisabeth Kübler-Ross, for example, were based on therapy sessions with a limited number of terminally ill patients. That’s a sample so small that factual conclusions cannot be drawn from it. Also, the fact that these people were in therapy could mean they are not representative of all terminally ill patients. One could also question whether terminally ill patients are representative of all people experiencing grief. Finally, one could wonder whether these phases are specific to American culture or transferable to people from other cultural backgrounds. These questions do not necessarily mean the theory is false or unusable, but they do mean it should be applied with restraint rather than taken at face value. Other methodological concerns may include using self-report as evidence, eliminating extreme results from samples, not including a control group or using unrealistic experimental settings.
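To give a feel for why a small sample supports so little, here is a minimal sketch in Python. It uses the standard normal-approximation formula for the 95% margin of error of an observed proportion. The sample sizes and the function name `margin_of_error` are purely illustrative assumptions and are not taken from any of the studies mentioned. The approximation itself is rough for very small samples, so treat the numbers as an indication only.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion p observed in n people.

    Uses the normal approximation z * sqrt(p * (1 - p) / n).
    p=0.5 is the worst case; z=1.96 corresponds to 95% confidence.
    """
    return z * math.sqrt(p * (1 - p) / n)

# Hypothetical sample sizes, chosen only to show the trend.
for n in (10, 100, 1000):
    print(f"n={n:>4}: +/- {margin_of_error(n):.1%}")
```

With only a handful of participants the uncertainty is so wide that almost any percentage is compatible with the data. Note that this only covers random sampling error. It says nothing about the representativeness questions raised above, which a larger sample does not fix by itself.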

Then there’s the issue of drawing conclusions from experimental results. One rule every first-year psychology student learns is “correlation does not imply causation”. Just because A and B occur together does not mean that A causes B. B might cause A instead, or A and B might both be caused by C, or it might simply be a coincidence that A and B occur together. In economics, for example, a free market economy is widely assumed to cause a general increase in wealth. However, a recent historical analysis of free market economies in pre-modern times by Bas van Bavel suggests that it is actually an increase in general wealth that allows free market economies to come into existence, and that these then increase wealth inequality to the point that they self-destruct.
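The case where A and B are both caused by C can be made concrete with a small simulation. The sketch below is an illustration on made-up random data, not a reconstruction of any study mentioned here, and the helper names `pearson` and `residuals` are my own. In it, A and B never influence each other, yet they correlate because both depend on C. Once the influence of C is removed, the correlation all but disappears.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def residuals(ys, xs):
    """What is left of ys after removing a straight-line fit on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [(y - my) - slope * (x - mx) for x, y in zip(xs, ys)]

rng = random.Random(42)  # fixed seed so the run is repeatable

# C is the hidden common cause; A and B each depend on C plus their own noise.
c = [rng.gauss(0, 1) for _ in range(5000)]
a = [ci + rng.gauss(0, 1) for ci in c]
b = [ci + rng.gauss(0, 1) for ci in c]

raw = pearson(a, b)                                    # A vs. B, ignoring C
controlled = pearson(residuals(a, c), residuals(b, c)) # A vs. B with C removed

print(f"correlation of A and B:            {raw:.2f}")
print(f"correlation after controlling for C: {controlled:.2f}")
```

In this setup the first figure should land near 0.5 and the second near zero. In real research the common cause is often unknown or unmeasured, which is exactly why a correlation alone cannot settle the direction of cause and effect.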

Another common issue lies in the overgeneralization of specific results. For example, studies on non-verbal communication by Albert Mehrabian have shown that when people communicate feelings inconsistently, the receiver will place a relative importance of 93% on non-verbal cues over verbal communication. This finding has been overgeneralized so widely that to this day management consultants worldwide believe roughly 95% of all communication is non-verbal. However, the original author of the study never drew this general conclusion and publicly criticizes it. Other studies on non-verbal communication in different settings have yielded a broad range of results, to the extent that science cannot provide one definite answer to the question of what percentage of communication is non-verbal.

Furthermore, there’s the issue of selective reading and reproduction. This is also part of the problem with non-verbal communication and with the five phases of grief, which not many people realize were described as not necessarily occurring in a set order. But the example I’d like to highlight here concerns brainstorming. Invented in the late 1930s, the technique gained popularity in the decades that followed, but experimental studies failed to find a positive effect. In the late 1990s a review of brainstorming research found, shockingly, that almost none of the studies used brainstorming as originally intended. Experiments were done with random people, while the technique was originally practiced by a group of co-workers with years of experience working together. Many experimental settings did not use a trained moderator for the brainstorming sessions, which is essential to prevent disruptive group processes like bullying or social loafing. But the biggest surprise for me was that brainstorming was actually intended as part of a larger process for building creativity within the team. Experiments were usually done with brainstorming as an isolated activity, which in my experience is still common in practice. The results may suggest that having random groups of people generate ideas is less effective than having individuals do so separately. They do not, however, allow any conclusion on whether brainstorming as originally intended is an effective creative tool.

Finally, I’d like to touch on the issue of invalid application of proven scientific knowledge. The social sciences, especially psychology, are divided into fields or schools. Those fields can have widely different paradigms for the subjects they study, and knowledge gained in one field may not be easily transferable to another. When applying such knowledge, care must be taken that the application matches the paradigm that was used to gain it. For example, I once read a paper suggesting the application of techniques developed within the context of behaviorism to cultural transformation programs. Such techniques have proven very effective for behavior modification or management at the individual level. However, they cannot be applied to the much more complicated question of cultural transformation. In a multi-year study of cultural transformations, Jaap Boonstra found the effectiveness of interventions at the behavioral level to be negligible. The mismatch occurs because techniques developed for use with individuals are applied to groups, whose behavior has a very different dynamic, especially when cultural factors come into play.

Another example of this fallacy is the application of models of personality such as DISC to job assessments or screenings. Even though the DISC assessment was developed for this exact purpose, the underlying model was created by studying people’s emotions and behavior. The workplace context wasn’t part of the original research. The underlying assumption that certain personality types are better suited to certain jobs is also unfounded, and job assessments can become extremely subjective when personality traits are used as criteria. A much better tool for assessing workplace behavior is the Belbin Team Role Inventory, which was developed by studying team behavior during business games. Note that, conversely, these team roles are not equivalent to personality types.

In closing I would like to stress that as a social computer scientist I’m extremely happy to see an increasing interest in the social sciences from the IT industry. I’m convinced this has already led to a better match between IT solutions and the needs of the people they affect. Decades of usability engineering, for example, have helped create user interfaces much better suited to users, especially novices. Also keep in mind that all the models, theories and studies I described here have been very influential in shaping our current understanding of human behavior. Even when proven wrong, they have at the very least sparked interest, discussion and research into important topics. What I have tried to do is point out common pitfalls that one may encounter when applying knowledge gained from scientific publications. Such publications should always be carefully scrutinized to avoid quality issues. If they are taken at face value, the long-term results may not be what was expected.

Niek Fraanje is a test manager working for Sogeti in the Netherlands.