Why Software Testers need to learn Data Science

I have been working as a QA Engineer/Tester for a year and a half now. I have tested a variety of web and mobile apps and have had a wonderful time exploring how much value testing adds to a software system.

The company I work at, Red Buffer, is among the very few companies in the country doing extensive work on Data Science and Machine Learning solutions for a variety of clients. I had always looked at Data Science as a completely unknown, scary area, partly because the only data-related course I took in my university degree was Database Systems, so I had no real knowledge of the field. Being a reader, I kept seeing numerous articles published every day about how important it is to know Data Science and Machine Learning and how the world is moving towards them. All of this was more than enough motivation for me to start learning Data Science.

Last weekend, thanks to the kindness of Sir Majd Uddin, I was invited to give a talk at the Tester’s meetup organized by the Pakistan Software Testing Board. Having worked in Software Testing for a while and having now realized the importance of learning Data Science, I decided to give a talk on why software testers shouldn’t stay behind in the race of Data Science and how the two fields potentially overlap.

 

Visualization of ‘Good Morning’ tweets globally by Jer Thorp

 

The age of data is upon us. Tech and media companies constantly use their users’ data to learn more about them, plan their businesses around them and make decisions accordingly. Data is also being used in medical science to diagnose and treat patients more effectively.

Hiding within those mounds of data is knowledge that could change the life of a patient, or change the world. ~ Atul Butte, Stanford

Facebook uses the data about you: your location, your likes and dislikes, your friends, their likes and dislikes and a lot more to show you content that matters to you and ads that you are most likely to be interested in.

Netflix is an entertainment company that lets its subscribers stream movies and TV shows online. We are already aware of how it recommends more shows and movies to watch based on our interests. But, most interestingly, Netflix also uses data to decide what to produce, and its very successful Netflix-original TV shows now compete with popular TV channels.

If we have self-driving cars in the near future, we will owe it to Data Science. Self-driving cars are autonomous cars capable of driving without human input. They are trained on data collected from human drivers, which they later use to drive without human intervention. They make use of location data, sensor data about the surroundings, road type and much more to ensure a safe drive.

Data Science is an interdisciplinary field which uses different techniques to extract information from data in different forms. People use Data Science in many different capacities and disciplines. So, given some data, you can do a lot by applying Data Science models and techniques, from gathering insights to training a machine learning algorithm.

[Figure: Data Science compared with different analytics disciplines. Credits: http://www.dezyre.com/article/data-science-compared-with-different-analytics-disciplines/175]

Data is messy. No matter what capacity you’re working in, it is very hard to always get data in a known, consistent format. Whenever you’re working with data, you’re making assumptions about what the data might look like. And we know better than to rely on assumptions. Testing is required every step of the way, and testers on the team can play their role in testing those assumptions.

Let’s say a Data Scientist wrote a piece of code to get some information out of some data or to train a certain system. Later on, more data comes in for more training. You don’t know whether the code would work for the new set of data or not. That is why you need testing.
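As a rough illustration of testing assumptions about incoming data, here is a minimal Python sketch. The record format (dicts with `title`, `genre` and `gross` fields), the `find_violations` helper and the sample batches are all hypothetical and made up for this example; they are not taken from any real project.

```python
# Hypothetical record format: each movie is a dict with these three fields.
REQUIRED_FIELDS = {"title", "genre", "gross"}


def find_violations(records):
    """Return (index, message) pairs for records that break our assumptions."""
    violations = []
    for index, record in enumerate(records):
        missing = REQUIRED_FIELDS - set(record)
        if missing:
            violations.append((index, f"missing fields: {sorted(missing)}"))
            continue
        if not isinstance(record["genre"], str) or not record["genre"].strip():
            violations.append((index, "genre must be a non-empty string"))
        if not isinstance(record["gross"], (int, float)) or record["gross"] < 0:
            violations.append((index, "gross must be a non-negative number"))
    return violations


# The batch the code was originally written against (made-up data).
original_batch = [
    {"title": "Movie A", "genre": "Action", "gross": 100.0},
    {"title": "Movie B", "genre": "Comedy", "gross": 80.5},
]

# A later batch that quietly breaks the assumptions (made-up data).
new_batch = [
    {"title": "Movie C", "genre": "", "gross": 60.0},        # empty genre
    {"title": "Movie D", "gross": 55.0},                     # genre missing
    {"title": "Movie E", "genre": "Drama", "gross": "50"},   # gross stored as text
]

print(find_violations(original_batch))
print(find_violations(new_batch))
```

Running a check like this on every new batch, before any analysis or training code sees it, turns a silent assumption into an explicit test.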

Recently, as part of my learning experiments, I took data on the highest-grossing movies of the past 10 years. I wanted to visualize the most popular genres among the top twenty highest-grossing movies. I wrote some code to extract their genres and visualize them using D3.js, and here is what I got:

[Figure: genre visualization for the top twenty movies]

Later on, I decided to add more data to my visualization, so I changed the data from the top twenty to the top hundred movies, and this is what I got:

[Figure: incorrect genre visualization for the top hundred movies]

After some testing and analysis, I realized that the code wasn’t working correctly for the new data. As it turned out, there were only 9 unique genres among the top twenty movies. When I increased the data set to the top hundred movies, I had 14 unique genres, but my visualization code only catered for the original 9 and needed fixing. Here is what I got after fixing it:

[Figure: corrected genre visualization for the top hundred movies]
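A genre mismatch like the one described above is the kind of thing a small automated check can catch before the chart is ever drawn. The sketch below is in Python rather than D3.js, and it assumes the visualization keeps an explicit set of genres it knows how to draw; `SUPPORTED_GENRES`, `unsupported_genres` and the sample movies are hypothetical names and data invented for illustration, not my actual code.

```python
# Hypothetical stand-in for the genres the chart code knows how to draw.
SUPPORTED_GENRES = {"Action", "Adventure", "Animation"}


def unsupported_genres(movies, supported=SUPPORTED_GENRES):
    """Return genres present in the data that the chart does not handle."""
    return {movie["genre"] for movie in movies} - set(supported)


# Made-up data: the smaller set only uses genres the chart supports.
top_twenty = [
    {"title": "Movie A", "genre": "Action"},
    {"title": "Movie B", "genre": "Animation"},
]

# The larger set introduces genres the chart was never written for.
top_hundred = top_twenty + [
    {"title": "Movie C", "genre": "Horror"},
    {"title": "Movie D", "genre": "Musical"},
]

print(unsupported_genres(top_twenty))
print(sorted(unsupported_genres(top_hundred)))
```

If the returned set is not empty, the data has outgrown the code, which is exactly what happened when I moved from twenty movies to a hundred.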

The more I read and experiment with Data Science, the more I realize that there is an interesting overlap between the skill set desired in Software Testing and Data Science.


A major part of a software tester’s job is to think critically, because we know that things aren’t always what they seem and we should always be doubting the system under test. We don’t just have to prove that the system works; we also have to uncover hidden issues and ensure that it isn’t doing something it isn’t supposed to. We need analytical thinking to work out the situations in which a user might use the system, how they might approach it and how it might or might not work. Similarly, a good part of a data scientist’s job is to look at raw data and analyze it to extract useful information and identify patterns in it. So, even if you’re not exactly working in Data Science, learning about it can help you be critical of the system.

Whether we realize it or not, a lot of software testing is already data-driven. Whether it is something as deep as control flow or data flow testing, or something as simple as testing a sign-up form, we are testing the software system with different sets of data, so we already have an understanding of how to use data to our benefit.
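To make the sign-up form example concrete, here is a minimal sketch of a data-driven test in Python. The `is_valid_signup` function and its rules (a plausible-looking email and a password of at least 8 characters) are assumptions made up for this illustration; a real form would have its own rules.

```python
import re

# Hypothetical validator standing in for a sign-up form's checks.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def is_valid_signup(email, password):
    """Accept a sign-up only if the email looks plausible and the password has 8+ characters."""
    return bool(EMAIL_PATTERN.match(email)) and len(password) >= 8


# Each row is one test case: (email, password, expected result).
CASES = [
    ("user@example.com", "s3cretpass", True),
    ("user@example.com", "short", False),
    ("no-at-sign.example.com", "s3cretpass", False),
    ("", "s3cretpass", False),
    ("user @example.com", "s3cretpass", False),
]


def run_cases(cases):
    """Run every case and return those whose actual result differs from the expected one."""
    failures = []
    for email, password, expected in cases:
        actual = is_valid_signup(email, password)
        if actual != expected:
            failures.append((email, password, expected, actual))
    return failures


print(run_cases(CASES))
```

The test logic stays the same while the table of cases grows, so adding a new scenario means adding a row of data rather than writing new code.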

At the end of the day, it really matters which programming language you have to work with. The languages most widely used in Data Science are Python and R, which testers also use for writing tests and debugging code.

You don’t have to agree with me on the overlap of Software Testing and Data Science, but you cannot deny the rise in popularity of Data Science. With more and more companies investing heavily in this area and a rapid shift towards it, we don’t know what testing will look like in the near future. Being in tech, we have to constantly evolve and learn new things. Data Science helps make better decisions and software testing helps make better software, so by combining knowledge of both, we can really achieve something.
