Deep dive: qualitative data analysis

I derailed a bit in my last article, so I wrote this one as well. It goes a bit more in depth into my process of analysing qualitative studies with a sprinkle of quantitative methods. To read the original article, follow the link here.

This started from the question “Any ideas on how we balance Qualitative research with Quantitative data?”. In the previous article I wrote about how to think about quantitative and qualitative methods during the research process. I have long been passionate about the analysis part, because I think it is one of the activities we tend to take shortcuts on.

With that said, this is an example of how to do it. As always, I’m not saying this is the right way to do it, but I have found that it works well for me.

The intro and purpose, taken from the other article:
I try to structure the data so it can be worked with more flexibly over a longer period of time, which means I need to use more quantitative methods. I still go through all the data in a qualitative manner, and I still identify themes and codes, but I put them in an Excel or Google sheet so they exist like parameters in a database. That means I can use software to easily switch between the themes and codes I have detected. These themes and codes might relate to behaviours and values, the things qualitative data is so good at. At the same time, it also opens up for demographic segments such as age and income, things some stakeholders might be interested in.

📈 Let me show you an example

Image: a screenshot of a demo Google sheet created for this article. What a mess mixing languages, sorry for that.

I made the example above specifically for this article, so no personal data was harmed in the making of it. It shows what a data sheet could look like after I have identified qualitative codes and themes. I have kept the qualitative parts in it to preserve the soft, more human values that qualitative data is so good at, but I have also created quite strict codes.

I don't write the codes in the same cell separated by commas; instead I write them in separate rows. Some of them are binary, true or false. Others consist of several values. It depends, of course, on the question.

For example, “sparkonto/savings account” is a binary value: either the respondents have a savings account or they don't. The question “skäl till flytt/why they want to move”, on the other hand, is broken down into several codes. I have also added a quantitative question where they rank their personal interest in private finances, just to show an example of the kind of quantitative question I often ask.
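The one-code-per-row idea can also be sketched in a few lines of Python. This is only a minimal illustration: the respondent IDs, the English code labels and the helper name `to_long_rows` are made up, and none of the values come from the demo sheet.

```python
from collections import Counter

# Hypothetical coded answers, made up for illustration.
# Binary code: does the respondent have a savings account (sparkonto)?
savings_account = {"R1": True, "R2": False, "R3": True}

# Multi-value code: reasons for moving (skäl till flytt), several per respondent.
reasons_for_moving = {
    "R1": ["more space", "closer to work"],
    "R2": ["lower cost"],
    "R3": ["more space"],
}


def to_long_rows(codes_by_respondent):
    """Return one (respondent, code) row per code, instead of one comma-separated cell."""
    return [
        (respondent, code)
        for respondent, codes in codes_by_respondent.items()
        for code in codes
    ]


rows = to_long_rows(reasons_for_moving)
code_counts = Counter(code for _, code in rows)

# Each row carries exactly one code, so it can be filtered and counted directly.
for respondent, code in rows:
    print(respondent, code, savings_account[respondent])
print(code_counts.most_common())
```

With one code per row, counting or filtering on a code needs no string splitting, which is the main reason to avoid comma-separated cells.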

So why all this hassle? Let me show you 👇🏼

🍊 Orange DataMining

Image: a screenshot showing what data import can look like in Orange.

I do all this to be able to analyse a bit more systematically. I use the open-source software Orange DataMining. It is far from the most powerful or competent software, BUT it is free and Mac friendly, which Power BI is not. I can link to a Google sheet by URL, and the data will be updated if I make any changes to the file online. Orange provides many modules; you can see them to the left in the image. I will briefly show you how I use some of them. In the image above, you need to set the data type for every row (Orange will guess it for you, but you might want to change it), and the right type depends on how you want to use the data. Through the Data Table module, you get an overview of what the data file looks like. This helps you understand what you can do, what you need to do to make it work, or, often, why an error has occurred.

This kind of method probably works with any analysis software, but I will demo this in Orange. Orange is the one I use most frequently, even though I should probably use something a bit more stable and common. But as long as it works for you, go for it.

📊 Distribution module

Image: a screenshot of the distribution module in Orange.

I very rarely use Orange to calculate correlations or significance. Orange can do that, in other modules, but that is not what I’m interested in here. I’m interested in detecting patterns fast, or in having visual support when I try to communicate qualitative data. Some people just respond better to charts. By modelling the data the way I do in the Google sheet, I get a distribution chart of how the respondents live, split by number of children. This helps me establish whether it is more common to live in a house than in a rental flat when you have children. Splitting the different questions on demographic segments and behavioural codes is just one click away, rather than a thousand Post-it notes away on a Miro board.
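For readers who want to see what such a split amounts to, here is a rough sketch of the counting behind a distribution chart, written in plain Python. The respondents and the helper name `distribution` are made up for illustration, and this is not a description of how Orange computes the chart internally.

```python
from collections import Counter

# Hypothetical respondents, made up for illustration.
respondents = [
    {"children": 0, "housing": "rental flat"},
    {"children": 0, "housing": "rental flat"},
    {"children": 1, "housing": "house"},
    {"children": 2, "housing": "house"},
    {"children": 2, "housing": "rental flat"},
]


def distribution(rows, value, split_by):
    """Count each answer in `value`, split by the groups found in `split_by`."""
    table = {}
    for row in rows:
        table.setdefault(row[split_by], Counter())[row[value]] += 1
    return table


housing_by_children = distribution(respondents, value="housing", split_by="children")

# One line per group: number of children, then the housing counts for that group.
for children, counts in sorted(housing_by_children.items()):
    print(children, dict(counts))
```

Swapping `value` or `split_by` for another coded column gives a new split, which is the scripted equivalent of the one click described above.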

Let’s take a more qualitative example. The example below explores two more qualitative questions: “the need of financial security” and “your dream living”. Both were explored through qualitative answers and then coded down to several codes and levels. This makes it much faster and easier to explore whether certain dream livings are more common among respondents with less or more need of financial stability. Judging by the dummy sheet I made, dreaming of living in a chain house is more popular among the respondents who have little need of financial security. Once you have discovered that, you can go back to your data sheet and explore why in the text.
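Going back to the text for a given combination of codes can be sketched like this. It assumes the free-text answer is kept on the same row as its codes; the column names, the answers and the helper name `answers_matching` are all hypothetical.

```python
# Hypothetical coded rows, with the original free-text answer kept next to the codes.
coded_rows = [
    {
        "security_need": "low",
        "dream_living": "chain house",
        "answer": "I want a small garden but no big loan.",
    },
    {
        "security_need": "high",
        "dream_living": "rental flat",
        "answer": "I don't want to owe the bank anything.",
    },
    {
        "security_need": "low",
        "dream_living": "chain house",
        "answer": "Neighbours close by, but my own door.",
    },
]


def answers_matching(rows, **codes):
    """Return the free-text answers from rows where every given code matches."""
    return [
        row["answer"]
        for row in rows
        if all(row.get(column) == value for column, value in codes.items())
    ]


# Read the actual words behind the pattern spotted in the chart.
for answer in answers_matching(coded_rows, security_need="low", dream_living="chain house"):
    print(answer)
```

The chart points to the pattern, and the filtered answers are where the "why" can be read.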

📚 Text Mining

Image: the Text Mining add-on in Orange and its word cloud function.

I also quite often use the text module to create word clouds. Word clouds only work on basic questions, but on the question “hur sparar du pengar/how do you save money” it works OK. You can easily spot that “sparkonto/savings account” is the most common answer.
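A word cloud is, at its core, a picture of word counts. The sketch below shows that counting step in plain Python. The Swedish answers and the tiny stop-word list are made up for illustration, and I am not claiming this is the preprocessing Orange applies.

```python
import re
from collections import Counter

# Hypothetical answers to "hur sparar du pengar / how do you save money".
answers = [
    "Jag sätter in pengar på ett sparkonto varje månad",
    "Sparkonto och lite fonder",
    "Mest fonder, men också sparkonto",
]

# A tiny, hand-picked stop-word list (an assumption, not a complete Swedish list).
stop_words = {"jag", "in", "på", "ett", "varje", "och", "lite", "mest", "men", "också"}


def word_counts(texts, ignore):
    """Lowercase the texts, split them into words and count everything not ignored."""
    words = []
    for text in texts:
        words.extend(re.findall(r"\w+", text.lower()))
    return Counter(word for word in words if word not in ignore)


counts = word_counts(answers, stop_words)
print(counts.most_common(3))
```

The more open the question, the less a raw word count says, which is why this works best on basic questions.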

💕 Sentiment

Image: the sentiment functionality.

Occasionally, I also use the sentiment module. It is suitable when you ask questions that can be emotionally loaded, or when you ask the respondents how they feel about something. You get a fast overview of whether the answers are positively, neutrally or negatively loaded, and you can check the specific comments in the Corpus Viewer. Sentiment analysis should be used when you have a bigger sample. It is also good to know that Orange has only one sentiment model that supports Swedish. I don’t treat the result as the truth but use it as support while analysing the data.

👋🏼 Outro

Orange DataMining is open-source software, which means it sometimes lacks maintenance, and I know some people have had plenty of problems with it crashing. I haven’t had those kinds of issues, but for a very long time the Text Mining add-on didn't work, and it took a while to get that fixed. But it is free, highly flexible, and the documentation is great.

Some of these things I could, of course, do in Google Sheets or Excel, or with Miro’s AI and so on, but those are not as powerful and flexible. Orange gives much more freedom. It comes with some standard add-ons. The ones I have added myself are Text, Network, Sentiment and Geo.

The interface is quite user-friendly compared to a lot of similar tools. It is a lot of drag and drop, but that doesn’t mean you won’t get a lot of errors in the beginning: data that doesn’t load, or doesn't behave the way you want. It takes a bit of time to learn how to prepare the data that goes into Orange, which data types work with which add-ons, and just how the different modules work. But remember: read the documentation and read the error messages. That would have saved me plenty of time.
