Author Talks: How data takes on a life of its own—and what you can do about it

In this edition of Author Talks, McKinsey Global Publishing’s Mike Borruso chats with Aram Sinnreich, author and professor at American University, about his new book, coauthored with Jesse Gilbert, The Secret Life of Data: Navigating Hype and Uncertainty in the Age of Algorithmic Surveillance (MIT Press, April 2024). Sinnreich conducted more than 30 interviews with experts from various disciplines to explore the profound implications of data usage and storage across all aspects of life. He examines data’s impact on our culture, the power of manipulated algorithms, and the need for long-term approaches to treating data and humanity responsibly and with dignity. An edited version of the conversation follows, and you can also watch the full video at the end of this page.

Is the familiar phrase ‘you can’t argue with the data’ accurate?

No. We’d be in bad shape if we didn’t argue with data. Information scientists have a familiar phrase of their own: “Garbage in, garbage out.” A lot of data is based on garbage going in, so a lot of garbage comes out.

Data offers us many excellent tools for reorganizing our society, improving our economies, and improving our quality of life. Conversely, the sad truth is there’s no such thing as a data system that doesn’t contain biases or flaws, especially for people left out of the design and execution phases of the data systems themselves. Those people tend to be the ones already receiving the short end of the stick in society.

Who is Francis Galton, and how does his story illustrate flaws in a data system?

Francis Galton was one of the originators of statistical thinking as we know and use it today. His first cousin was Charles Darwin, and he shared Darwin’s interest in an algorithmic understanding of our species and society.

Galton was a more skilled mathematician and information scientist than Darwin, and his innovations were more influential in some ways. He was one of the first people to argue that the normal distribution curve—the bell curve—could help us understand the distribution of various human traits.

One way he tested his theory was with a special glove he created with a pin on the thumb. In the experiment, he kept his hand and a piece of paper in his pocket. As he walked down the street, he would personally evaluate the attractiveness of all the women he passed by making holes in the paper with the glove.

If he thought a woman was very attractive, he would make a hole at the top of the paper; if he thought a woman wasn’t attractive, he’d make a hole at the bottom of the paper; if he thought a woman was so-so, he’d make a hole in the middle.

Once the experiment was complete, he created a “beauty map” of the British Isles. When you look at the map, you see a distribution curve of his subjective evaluation of women’s attractiveness in major cities from Scotland to Wales, and he deduced that London had the loveliest ladies.

This experiment is an excellent capsule story about how statistics simultaneously reveal population-level truths and inject the researcher’s biases and ugliness into the process.

Of course, it’s objectifying to evaluate women [in this way]; it propagates patriarchal power structures that have made society difficult for women.

Additionally, from the data standpoint, it’s simply bad data. There was no reliability check: Galton was solely responsible for designing the instrument, recording the values, and analyzing them, and any social scientist understands that those missteps make you prone to error. He was also revealing his blind spots: many dimensions of human society don’t follow a normal distribution curve.

For example, he could have asked whether people were gainfully employed or had addiction or health problems. Those kinds of social-scientific questions directly shape our understanding of society and our decisions about regulating it, and they don’t fit Galton’s fundamental premise that you can understand everything through statistics and deviations from the mean. Society has absorbed half of his lesson and largely ignored the other half, and statistical systems like these now regulate much of society.
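
A rough illustration, not from the book: in this short Python sketch, a bell-shaped, height-like sample behaves the way Galton assumed, while a skewed, income-like sample does not, so the mean and deviations from it describe almost no one.

```python
# A minimal sketch of why "deviations from the mean" can mislead: heights
# are roughly bell-shaped, but quantities like income are heavily skewed,
# so the mean sits far from the typical case.
import numpy as np

rng = np.random.default_rng(seed=0)

# Height-like data: approximately normal (mean ~170 cm, sd ~10 cm).
heights = rng.normal(loc=170, scale=10, size=100_000)

# Income-like data: log-normal, a common stand-in for skewed social variables.
incomes = rng.lognormal(mean=10, sigma=1.0, size=100_000)

for name, sample in (("height", heights), ("income", incomes)):
    mean, median = sample.mean(), np.median(sample)
    share_below = (sample < mean).mean()
    print(f"{name:>6}: mean={mean:10,.0f}  median={median:10,.0f}  "
          f"below the mean: {share_below:.0%}")

# The bell-shaped sample puts about 50 percent of people below the mean, as
# Galton's framework expects; the skewed sample puts roughly 69 percent below
# it, so "average" no longer describes a typical member of the population.
```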

These systems tell us something important about society’s desired state, but they also obscure the fundamental inequities and biases on which our financial and medical systems are based.

What are the pros and cons of metadata?

Metadata has been one of the greatest accelerants of human achievement, beginning with the Library of Alexandria, where Zenodotus created the first alphabetization and categorization system. To do this, he placed tags on the ends of papyrus scrolls that described what the scroll contained. That way, people could know what was on the scrolls without unrolling them to look.

In the present day, every photo we take on our phones and every document we create on our laptops is instantly populated with all kinds of metadata: who made the content, how it was made, and how it should be audited, distributed, and stored.
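
To make that concrete, here is a minimal Python sketch using the Pillow imaging library; the file name photo.jpg is a hypothetical stand-in for any snapshot from a phone.

```python
# A minimal sketch: dump the EXIF metadata embedded in a photo.
# "photo.jpg" is a hypothetical file; substitute any smartphone image.
from PIL import Image
from PIL.ExifTags import TAGS

image = Image.open("photo.jpg")
exif = image.getexif()  # metadata written by the camera or phone

for tag_id, value in exif.items():
    # Translate numeric EXIF tag IDs into human-readable names where known.
    name = TAGS.get(tag_id, f"unknown tag {tag_id}")
    print(f"{name}: {value}")

# Typical output includes Make, Model, Software, and DateTime, plus a GPSInfo
# entry if location services were on: who made the image, with what, when,
# and roughly where.
```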

Every major step in human knowledge and expertise is built on past knowledge and expertise, which is aided and accelerated by copious metadata.

Metadata is excellent in this respect, but it has many problems. One issue is that metadata replicates easily because it is smaller and simpler than the data it describes. Therefore, if there’s an error in the metadata, that error will replicate faster than any fix can.

There’s also a more sinister dimension to metadata: collectively, it becomes data. It’s generative. When you pull enough metadata into one place, you create a new data object, which in turn generates new metadata. The line between metadata and data gets blurry.

It turns out that in large enough quantities, that kind of metadata can be used to deanonymize an individual, which data scientists have repeatedly demonstrated.
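
A toy sketch of that effect, using hypothetical records, in the spirit of Latanya Sweeney’s well-known finding that ZIP code, birth date, and sex alone uniquely identify most US residents:

```python
# A toy example (hypothetical records) of how innocuous metadata fields
# combine to single people out of an "anonymized" data set.
from collections import Counter

records = [  # no names, just routine metadata
    {"zip": "20016", "birth": "1984-03-07", "sex": "F"},
    {"zip": "20016", "birth": "1984-03-07", "sex": "M"},
    {"zip": "20016", "birth": "1991-11-22", "sex": "F"},
    {"zip": "20852", "birth": "1984-03-07", "sex": "F"},
    {"zip": "20852", "birth": "1975-06-30", "sex": "M"},
]

# Count how many records share each (zip, birth date, sex) combination.
combos = Counter((r["zip"], r["birth"], r["sex"]) for r in records)
unique = [combo for combo, count in combos.items() if count == 1]

print(f"{len(unique)} of {len(records)} records are unique on three fields")
# Linking this table to any list that carries the same three fields plus a
# name (a voter roll, a marketing file) re-identifies those individuals.
```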

What is ‘algovision’?

We coined the term “algovision” because we needed a word for what we were discussing. It’s the basic idea that we increasingly see ourselves through the eyes of algorithms, whether it’s your own image in a video chat interface or Apple’s tally of how much you use an app each day or week.

Data systems have become integral to how we collectively understand ourselves through the eyes of the algorithm, and this has led to various downstream cultural consequences.

For example, the demand for cosmetic surgery skyrocketed during COVID. When cosmetic surgeons were interviewed about the market, they reported that nine in ten people were getting first-time consultations because they didn’t like their appearance during video conference calls.

There’s a famous saying among young people: “Do it for the likes.” They point a selfie camera at themselves and post the result to social media to get social engagement, which translates into social capital, which translates into other forms of capital.

Part of the goal of the book is to give words to people so that they can discuss what’s both good and bad regarding the algorithm. There are upsides to algovision, and seeing yourself through the eyes of the algorithm can be empowering.

People want to feel attractive, liked, and influential. You feel good about yourself when you receive lots of likes on social media, but beyond that, people can use algovision as a form of political power.

What surprised you during the research and writing of this book?

During our research, we were constantly surprised. We interviewed dozens of experts in fields we didn’t have experience in. Some of the surprises were small but crucial. For example, we spoke to a data broker who shared that they never delete data. Instead, they collect every possible data point and hold onto it forever, even though 99.9 percent of it can’t currently be monetized. They know that the data may be worth something in the future, whether that’s a month, a year, or a decade away.

Data storage has become cheap because of Moore’s Law. There’s now a reason to store data forever and periodically delve back into it when new tools or business models emerge.

The same is true for encrypted data. We are on the cusp of a new computing revolution known as quantum computing, and quantum computers will excel at decrypting data encrypted with today’s algorithms.

Of course, there are new quantum-resistant encryption algorithms, but you can’t retroactively apply them to data already encrypted with old algorithms. This means that unaccountable third parties can intercept and store the encrypted data we transact over the internet, such as health records, private communications, political affiliations, and financial transactions, and simply wait.

It’s estimated that between 2028 and 2036, affordable quantum computers will be able to decrypt that data easily. Additionally, after 2036, the data will find its way into publicly available databases, where it will be archived and searchable. Everything we once thought was private will then be available to everyone.
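
A toy illustration of why stored ciphertext is a time bomb; the key here is absurdly small so that an ordinary laptop can play the part of the future quantum computer, but the logic is the same: whoever can factor the public modulus can rebuild the private key and read traffic recorded years earlier.

```python
# Textbook RSA with a toy key. Real keys (~2048-bit moduli) resist classical
# factoring; Shor's algorithm on a large quantum computer would factor them
# efficiently, which is what makes "harvest now, decrypt later" viable.

def factor(n: int) -> int:
    # Trial division: only feasible for toy moduli; stands in for Shor here.
    d = 2
    while n % d:
        d += 1
    return d

# What an eavesdropper sees: the public key and a stored ciphertext.
n, e = 3233, 17             # n = 61 * 53
ciphertext = pow(42, e, n)  # the sender encrypted the message "42"

# Years later: factor n, reconstruct the private exponent, decrypt.
p = factor(n)
q = n // p
d = pow(e, -1, (p - 1) * (q - 1))  # modular inverse of e (Python 3.8+)
print(pow(ciphertext, d, n))       # prints 42: the "private" message
```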

I am fascinated by that fact, and it’s hard for me to stop thinking about it, but something else surprised me even more. It wasn’t a technology or a social outcome; it was the interviewees themselves.

We interviewed about 30 people from a broad range of fields: policymakers, technologists, businesspeople, artists, scholars, and activists. Each person, in their own way, had more data experience than either of us.

My coauthor and I would delve into their expertise and discuss how the secret life of data worked within their purview. At the end of each conversation, we would ask, “With your knowledge, what’s the worst you could do as a supervillain? What havoc could you wreak on society with your specialized experience?”

Without exception, every person we interviewed responded with, “I never thought about that before.”

That was a significant finding of the book: no one, not even the experts, fully understands how data is extracted, where it’s stored, and who’s using it.

Those who build systems to collect, analyze, store, and deploy data aren’t meaningfully considering what could go wrong.

What are the dangers of simple mistakes related to data?

Numerous data errors cause serious problems, such as individuals being denied surgery or a business loan. A false positive on the no-fly list can get someone kicked off a flight. These problems predate the internet and the current data regime; they are as old as bureaucracy itself.

There are numerous stories about data mistakes. I’m a huge fan of Franz Kafka, who wrote about this kind of mistake in an early-20th-century bureaucratic context. There’s also Terry Gilliam’s 1985 film Brazil, in which a fly lands on a typewriter, changing the first letter of a last name, and havoc ensues.

These inherent flaws in bureaucratic systems were evident before we digitized them. Now we have global networks, and the negative consequences can cascade in ways that no one can predict or contain.

There’s a mismatch between our social understanding of privacy, the lines between personal, private, and public information that we’ve been raised with, and how data systems actually work. Thanks to the pervasiveness of data systems, we unintentionally reveal things about ourselves to each other and learn things that we don’t want to know.

An example we discuss in the book is people discovering genetic families they weren’t previously aware of. We tend to implicate one another, and it’s not just with DNA but with things like location data.

We can implicate someone simply through a face in the background of a photo uploaded to social media. Every face uploaded to every platform is run against facial recognition algorithms, partly for good reasons, such as scanning for missing children.

Is technology omniscient?

Yes. Our book is based on “The Carrier Wave Principle,” an academic article that my coauthor and I published in 2019. In the article, we use the metaphor of the recording angel from the Bible: the angel records everything in your life, and once your life is over, the angel shows you a ledger of everything you did.

If you didn’t give a homeless person a quarter one day, that’s a mark against you; if you helped an older person across the street, that’s a mark in your favor.

We’ve created our own version of the recording angel, but it’s not about morality; it’s a fundamentally amoral, and at times immoral, system.

What can organizations and governments do to minimize the dangers of data vulnerabilities?

Commercial organizations are constrained because, in a data-rich economy, they have a fiduciary duty to exploit data to its full capacity for profit. It’s not that they’re evil, but they are responsible for seeking profits.

Conversely, governments must do a better job of regulating how data is used and must proactively look at the various pressure points along the data chain. This includes addressing the designers and manufacturers of data software, the businesses that encourage customers to yield data, and the brokerages that trade in data. It also includes the downstream industries that use intelligence derived from brokered data.

Each step along the value chain must be regulated in a way that considers the unintended and potentially harmful consequences. Additionally, the regulations must be continually revised and improved to keep pace with the escalating power of data systems. Of course, that’s a tall order.

From an internal standpoint, private and government institutions should consider the human factor in data vulnerabilities.

Any concluding thoughts?

Don’t despair. My coauthor and I experienced depression and paranoia while writing this book, and for justifiable reasons, but we also discovered that humans and societies are incredibly resilient.

Our desire for organic connection, human dignity, and political liberty is strong and innate. Even in the most totalizing political and data systems, people find spaces of freedom and agency and new ways to express themselves.

Numerous approaches for carving out those spaces of freedom are outlined in the concluding chapters of The Secret Life of Data; some are technological interventions, some are cultural, and some are regulatory. It all boils down to the understanding that data is a tool, one that is neither neutral nor objective and that has political consequences. Therefore, I recommend starting to think of how you handle data as a crucial element of being a kind, responsible, and civic-minded person.

For example, if someone is visiting your home for dinner and you ask them, “Do you have any allergies?,” and they inform you of their peanut allergy, you may decide to make lasagna instead of pad thai.

We should do the same thing with data. Ask, “Are there reasons you wouldn’t want your conversations recorded?”

Small normative changes like that can ramify and add up to large-scale social change, which in turn will reshape business practices and policies grounded in those social norms and assumptions.

Watch the full interview

Author Talks

Visit Author Talks to see the full series.
