The Science of Predictions: A Simple Guide to Predictive Analytics
Do you use predictive analytics? (Hint: yes, you almost certainly do.) Predictive analytics is the force behind the recommendations in your movie and music apps, weather forecasts, and personalized shopping suggestions. It is a way to use information about the past and present to estimate what might happen in the future. It looks at patterns in data, like what people buy, how they behave, or how things change over time, and uses those patterns to make predictions.
Why does predictive analytics matter? I’ll give you three examples. Businesses and individuals can use it to make better choices. Organizations can use past patterns to anticipate problems before they happen. Advertisers and retailers can use data to create personalized shopping experiences. That last point can be controversial, but the concept is here to stay, so we might as well learn as much about it as we can.
The Science Behind the Magic
Data Collection
We start by collecting data: lots and lots of data, of various types. Predictive analytics draws on historical data and real-time data, in both structured and unstructured forms. Knowing what those terms mean gives us a good foundation.
Historical data is a record of things that happened in the past, and it can be pretty wide in scope. It might be sales records, weather data, police reports, or the number of visitors to a historical site over a period of time. It doesn’t have to be anything of major significance. Even mundane details from the past can give us insight into the future.
Real-time data comes from live data streams, like live social media activity. It might also come from sensor readings that update predictions instantly. We saw an example of that at Tuckaleechee Caverns in Townsend, Tennessee, which houses one of the most sensitive seismographs in the world. The staff show cavern visitors the current readings and explain how the data is used. Another example is the river level sensors that feed data to websites used by river barge pilots to help them navigate safely through the river locks.
There’s a difference in types of data, as well, and the differences give us even more information. Structured data fits neatly into its spot, like a spreadsheet with rows and columns, or a database. It’s easy to search because it follows rules. Unstructured data is just the opposite. It doesn’t fit a predefined format because there’s no, well, structure. In this category we see things like images, videos, emails, and social media posts. Good predictive analytics systems take advantage of both structured and unstructured data.
Analysis Techniques
Analyzing the collected data involves three primary methods. Analysts use regression analysis to model the relationship between variables and predict a numeric value. For example, sales records and weather data can help an ice cream company estimate how much ice cream people will buy as the weather changes. The one you are most likely to have used yourself is classification, which sorts items into categories that are already defined, like a spam filter deciding whether an email is “spam” or “not spam.” Finally, clustering finds patterns by grouping similar data points without being told the categories in advance. This might look like customer segmentation in marketing efforts.
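To make the regression idea a little more concrete, here is a minimal sketch in Python. The temperatures and sales figures are made up for illustration, and the names fit_line and predict_sales are my own, not from any library. A real analyst would more likely reach for a statistics package, but the underlying idea of fitting a line to past data is the same.

```python
# Hypothetical data: daily high temperature (degrees F) and ice cream cones sold.
temperatures = [60, 65, 70, 75, 80, 85, 90]
cones_sold = [120, 135, 150, 165, 180, 195, 210]


def fit_line(xs, ys):
    """Fit y = slope * x + intercept using ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(temperatures, cones_sold)


def predict_sales(temperature):
    """Estimate cones sold at a given temperature using the fitted line."""
    return slope * temperature + intercept


forecast = predict_sales(95)
print(f"Each extra degree adds about {slope:.1f} cones.")
print(f"Estimated cones sold on a 95-degree day: {forecast:.0f}")
```

Because the made-up numbers sit on a perfectly straight line, the fit is exact here. Real sales data would be noisier, and the prediction would be an estimate rather than a sure thing.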
Machine Learning
For any of this to work, a computer has to be taught to recognize patterns, and scientists do that by feeding it lots and lots and lots of data. Analysts then check how accurate the computer’s guesses are against new incoming data. As you might guess, it’s pretty bad at first, but analysts “show” it where each guess went off the rails, and it adjusts. Generally, the more good data the model gets, the better it gets at predicting things.
My favorite way to illustrate machine learning is to think of trying to teach a computer what a dog is. We would feed it thousands of examples, including photos and hand drawings, of things that are “dog” and “not dog,” specifying what the “not dog” things are, of course. But it can take a lot of trial and error for the computer to distinguish a living creature with four legs, a tail, and a head from a table with four legs, a piece of fabric draped over the edge of it, and a vase of flowers at the opposite end. There are enough similarities that early attempts would confuse the computer, but with enough subsequent comparisons, the computer will eventually get the idea.
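Here is a toy version of the dog lesson in Python, offered as a rough sketch rather than how real image recognition works. Real systems learn from the pixels themselves. In this example I assume someone has already boiled each picture down to four hand-picked features, and every value, label, and function name is hypothetical. The "learning" is the simplest kind there is: guess the label of the most similar example seen so far.

```python
# Each example: (number_of_legs, has_tail_shape, has_head_shape, has_fur), label.
# All values are invented for illustration.
training = [
    ((4, 1, 1, 1), "dog"),
    ((4, 1, 1, 1), "dog"),
    ((4, 0, 1, 1), "dog"),      # tail hidden from view
    ((4, 0, 0, 0), "not dog"),  # plain table
    ((3, 0, 0, 0), "not dog"),  # three-legged stool
    ((0, 0, 0, 0), "not dog"),  # vase
]

# New examples the computer has not seen, with the correct answers.
new_examples = [
    ((4, 1, 1, 1), "dog"),
    ((3, 1, 1, 1), "dog"),      # three-legged dog
    ((4, 0, 0, 0), "not dog"),  # another plain table
    ((4, 1, 1, 0), "not dog"),  # table with draped fabric and a vase
]


def squared_distance(a, b):
    """How different two feature tuples are (0 means identical)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict(examples, features):
    """Guess the label of the most similar known example (1-nearest-neighbor)."""
    closest = min(examples, key=lambda ex: squared_distance(ex[0], features))
    return closest[1]


def accuracy(examples, tests):
    """Fraction of test items whose predicted label matches the true label."""
    correct = sum(1 for features, label in tests
                  if predict(examples, features) == label)
    return correct / len(tests)


first_score = accuracy(training, new_examples)

# Feedback step: a different photo of a draped table is added, labeled correctly.
training_after_feedback = training + [((4, 1, 1, 0), "not dog")]
second_score = accuracy(training_after_feedback, new_examples)

print(f"Accuracy before feedback: {first_score:.2f}")
print(f"Accuracy after feedback:  {second_score:.2f}")
```

On the first pass the draped table gets mistaken for a dog, because it has the legs, the "tail," and the "head." Once a correctly labeled example of a similar table joins the training data, the guess comes out right.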
Opportunities and Limitations
It’s easy to see that better and more complete information can lead us to better decisions. Beyond that, we can make those decisions with less effort and time, and the results can be personalized to me, to you, or to a particular population segment. By bringing in only the relevant inputs, we can reasonably control the outputs.
We do need to understand, though, that predictions aren’t guarantees, and that’s one of the limitations of predictive analytics. The quality of the data will affect the precision and accuracy of the conclusions. Additionally, and some of my friends refuse to acknowledge this, bias can creep into the results through the selection of the input data. What’s magenta to me is purple to my husband, so “is it purple” will be determined by whether I or my husband provided the input. There’s a social media meme with a photo of a blueberry muffin and another photo of a chihuahua’s face, and the caption “How to confuse a computer.” Yes, that’s a reality.
The Ethical Side of Predictive Analytics
Predictive analytics can be powerful, but it also raises important ethical questions. Privacy is probably the one topmost on people’s minds. But avoiding discrimination and bias matters just as much if we’re going to be able to trust anything that predictive analytics tells us.
Privacy and Ethics – Collection and Protection
Data collectors must be transparent about what data is collected, how it is collected, and how it is used, stored, interpreted, and disposed of. They should limit collection to only what is relevant to the actual, current project, rather than trying to anticipate what might be useful in the future. Opt-in policies should be the norm, rather than burying the information in long Terms of Service documents. Data collectors should follow the maxim, “If you collect it, protect it. If you can’t protect it, don’t collect it.”
Garbage In, Garbage Out is Still in Force
Bias in data will lead to bias in outcomes, even if there’s no intention behind the bias. If the models are trained on biased data, we will see unfair outcomes reinforced, as in loan approvals that disfavor a population segment. Businesses should always make predictive models understandable. You should be able to see how the inputs led to the outcomes. A “black box” algorithm that nobody can explain is unacceptable. The fact that it’s not easy to explain or understand doesn’t relieve us of the responsibility to be able to do so. Predictive analytics should not be configured in a way that could unfairly exclude people from hiring opportunities or medical treatments.
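Here is a small sketch in Python of how that reinforcement can happen. Everything in it is hypothetical: the loan history is invented, "neighborhood" stands in for whatever population segment is affected, and naive_model is a deliberately simplistic rule of my own, not how any real lender works. The point is only that a model copying past decisions also copies past unfairness.

```python
from collections import defaultdict

# Invented past decisions: (neighborhood, income_in_thousands, approved).
# Incomes are identical across neighborhoods, but the outcomes are not.
history = [
    ("north", 50, True), ("north", 45, True),
    ("north", 40, True), ("north", 38, False),
    ("south", 50, False), ("south", 45, False),
    ("south", 40, True), ("south", 38, False),
]


def approval_rates(records):
    """Share of past applications approved in each neighborhood."""
    totals = defaultdict(int)
    approvals = defaultdict(int)
    for neighborhood, _income, approved in records:
        totals[neighborhood] += 1
        if approved:
            approvals[neighborhood] += 1
    return {name: approvals[name] / totals[name] for name in totals}


rates = approval_rates(history)


def naive_model(neighborhood):
    """Approve whenever that neighborhood was usually approved in the past."""
    return rates[neighborhood] >= 0.5


# Two applicants with the same income get different answers.
for neighborhood in ("north", "south"):
    decision = "approved" if naive_model(neighborhood) else "denied"
    print(f"Income 50k, {neighborhood}: {decision} "
          f"(past approval rate {rates[neighborhood]:.0%})")
```

Printing the past approval rate next to each decision is a tiny example of the transparency described above: you can see which input drove the outcome, and that it had nothing to do with the applicant’s income.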
You Have a Role as a Consumer
You can regularly review privacy settings on apps and websites to see what data they ask from you, and to exercise as much control over that data as possible. Remember the post on how to stay safe on public Wi-Fi? Make sure you don’t enter any sensitive information on any unsecured Wi-Fi networks. Avoid oversharing personal details on social media. Your friends already know the good stuff, and who else needs it? Finally, yes, those exhaustive agreements that nobody ever reads: the fact that you don’t read them doesn’t relieve you of your responsibility under them. At least give them a glance.
Your Turn
It’s not hard to see how useful predictive analytics can be to us as individuals, to businesses, and to society as a whole, but it’s also not hard to see how it can be abused. I feel that we can’t just throw the baby out with the bathwater and say that unless we can always use it right, nobody should use it at all, but that’s my opinion. What’s yours? What are some of the ways predictive analytics excites you, and what are some ways it scares you? Drop a comment below and let’s continue the conversation. You can also explore Google Trends and this YouTube video that explains how Excel helps with predictive analytics. When you’ve enjoyed all that (and left me a comment below 😉 ), check out my post on Data Driven Decision Making.
