Truth and Prediction in the Dataclysm

The Deluge by Francis Danby. 1837-1839

Last time I looked at the state of online dating. Among the figures was mentioned was Christian Rudder, one of the founders of the dating site OkCupid and the author of a book on big data called Dataclysm: Who We Are When We Think No One’s Looking that somehow manages to be both laugh-out-loud funny and deeply disturbing at the same time.

Rudder is famous, or infamous depending on your view of the matter, for having written a piece about his site with the provocative title: We experiment on human beings!. There he wrote: 

We noticed recently that people didn’t like it when Facebook “experimented” with their news feed. Even the FTC is getting involved. But guess what, everybody: if you use the Internet, you’re the subject of hundreds of experiments at any given time, on every site. That’s how websites work.

That statement might set the blood of some boiling, but my own negative reaction to it is somewhat tempered by the fact that Rudder’s willingness to run his experiments on his sites users originates, it seems, not in any conscious effort to be more successful at manipulating them, but as a way to quantify our ignorance. Or, as he puts it in the piece linked to above:

I’m the first to admit it: we might be popular, we might create a lot of great relationships, we might blah blah blah. But OkCupid doesn’t really know what it’s doing. Neither does any other website. It’s not like people have been building these things for very long, or you can go look up a blueprint or something. Most ideas are bad. Even good ideas could be better. Experiments are how you sort all this out.

Rudder eventually turned his experiments on the data of OkCupid’s users into his book Dataclysm which displays the same kind of brutal honesty and acknowledgement of the limits of our knowledge. What he is trying to do is make sense of the deluge of data now inundating us. The only way we have found to do this is to create sophisticated algorithms that allow us to discern patterns in the flood.  The problem with using algorithms to try and organize human interactions (which have themselves now become points of data) is that their users are often reduced into the version of what being a human beings is that have been embedded by the algorithm’s programmers. Rudder, is well aware and completely upfront about these limitations and refuses to make any special claims about algorithmic wisdom compared to the normal human sort. As he puts it in Dataclysm:

That said, all websites, and indeed all data scientists objectify. Algorithms don’t work well with things that aren’t numbers, so when you want a computer to understand an idea, you have to convert as much of it as you can into digits. The challenge facing sites and apps is thus to chop and jam the continuum of the of human experience into little buckets 1, 2, 3, without anyone noticing: to divide some vast, ineffable process- for Facebook, friendship, for Reddit, community, for dating sites, love- into a pieces a server can handle. (13)

At the same time, Rudder appears to see the data collected on sites such as OkCupid as a sort of mirror, reflecting back to us in ways we have never had available before the real truth about ourselves laid bare of the social conventions and politeness that tend to obscure the way we truly feel. And what Rudder finds in this data is not a reflection of the inner beauty of humanity one might hope for, but something more like the mirror out of A Picture of Dorian Grey.

As an example take what Rudder calls” Wooderson’s Law” after the character from Dazed and Confused who said in the film “That’s what I love about these high school girl, I get older while they stay the same age”. What Rudder has found is that heterosexual male attraction to females peaks when those women are in their early 20’s and thereafter precipitously falls. On OkCupid at least, women in their 30’s and 40’s are effectively invisible when competing against women in their 20’s for male sexual attraction. Fortunately for heterosexual men, women are more realistic in their expectations and tend to report the strongest attraction to men roughly their own age, until sometime in men’s 40’s where males attractiveness also falls off a cliff… gulp.

Another finding from Rudder’s work is not just that looks rule, but just how absolutely they rule. In his aforementioned piece, Rudder lays out that the vast majority of users essentially equate personality with looks. A particularly stunning women can find herself with a 99% personality rating even if she has not one word in her profile.

These are perhaps somewhat banal and even obvious discoveries about human nature Rudder has been able to mine from OkCupid’s data, and to my mind at least, are less disturbing than the deep seated racial bias he finds there as well. Again, at least among OkCupid’s users, dating preferences are heavily skewed against black men and women. Not just whites it seems, but all other racial groups- Asians, Hispanics would apparently prefer to date someone from a race other than African- disheartening for the 21st century.

Rudder looks at other dark manifestations of our collective self than those found in OkCupid data as well. Try using Google search as one would play the game Taboo. The search suggestions that pop up in the Google search bar, after all, are compiled on the basis of Google user’s most popular searches and thus provide a kind of gauge on what 1.17 billion human beings are thinking. Try these some of which Rudder plays himself:

“why do women?”

“why do men?”

“why do white people?”

“why do black people?”

“why do Asians?”

“why do Muslims?”

The exercise gives a whole new meaning to Nietzsche’s observation that “When you stare into the abyss, the abyss stares back”.

Rudder also looks at the ability of social media to engender mobs. Take this case from Twitter in 2014. On New Years Eve of that year a young woman tweeted:

“This beautiful earth is now 2014 years old, amazing.”

Her strength obviously wasn’t science in school, but what should have just led to collective giggles, or perhaps a polite correction regarding terrestrial chronology, ballooned into a storm of tweets like this:

“Kill yourself”

And:

“Kill yourself you stupid motherfucker”. (139)

As a recent study has pointed out the emotion second most likely to go viral is rage, we can count ourselves very lucky the emotion most likely to go viral is awe.

Then there’s the question of the structure of the whole thing. Like Jaron Lanier, Rudder is struck by the degree to which the seemingly democratized architecture of the Internet appears to consistently manifest the opposite and reveal itself as following Zipf’s Law, which Rudder concisely reduces to:

rank x number = constant (160)

Both the economy and the society in the Internet age are dominated by “superstars”, companies (such as Google and FaceBook that so far outstrip their rivals in search or social media that they might be called monopolies), along with celebrities, musical artist, authors. Zipf’s Law also seems to apply to dating sites where a few profiles dominate the class of those viewed by potential partners. In the environment of a networked society where invisibility is the common fate of almost all of us and success often hinges on increasing our own visibility we are forced to turn ourselves towards “personal branding” and obsession over “Klout scores”. It’s not a new problem, but I wonder how much all this effort at garnering attention is stealing time from the effort at actual work that makes that attention worthwhile and long lasting.

Rudder is uncomfortable with all this algorithmization while at the same time accepting its inevitability. He writes of the project:

Reduction is inescapable. Algorithms are crude. Computers are machines. Data science is trying to make sense of an analog world. It’s a by-product of the basic physical nature of the micro-chip: a chip is just a sequence of tiny gates.

From that microscopic reality an absolutism propagates up through the whole enterprise, until at the highest level you have the definitions, data types and classes essential to programming languages like C and JavaScript.  (217-218)

Thing is, for all his humility at the effectiveness of big data so far, or his admittedly limited ability to draw solid conclusions from the data of OkCupid, he seems to place undue trust in the ability of large corporations and the security state to succeed at the same project. Much deeper data mining and superior analytics, he thinks, separate his efforts from those of the really big boys. Rudder writes:

Analytics has in many ways surpassed the information itself as the real lever to pry. Cookies in your web browser and guys hacking for your credit card numbers get most of the press and are certainly the most acutely annoying of the data collectors. But they’ve taken hold of a small fraction of your life and for that they’ve had to put in all kinds of work. (227)

He compares them to Mike Myer’s Dr. Evil holding the world hostage “for one million dollars”

… while the billions fly to the real masterminds, like Axicom. These corporate data marketers, with reach into bank and credit card records, retail histories, and government fillings like tax accounts, know stuff about human behavior that no academic researcher searching for patterns on some website ever could. Meanwhile the resources and expertise the national security apparatus brings to bear makes enterprise-level data mining look like Minesweeper (227)

Yet do we really know this faith in big data isn’t an illusion? What discernable effects that are clearly traceable to the juggernauts of big data ,such as Axicom, on the overall economy or even consumer behavior? For us to believe in the power of data shouldn’t someone have to show us the data that it works and not just the promise that it will transform the economy once it has achieved maximum penetration?

On that same score, what degree of faith should we put in the powers of big data when it comes to security? As far as I am aware no evidence has been produced that mass surveillance has prevented attacks- it didn’t stop the Charlie Hebo killers. Just as importantly, it seemingly hasn’t prevented our public officials from being caught flat footed and flabbergasted in the face of international events such as the revolution in Egypt or the war in Ukraine. And these later big events would seem to be precisely the kinds of predictions big data should find relatively easy- monitoring broad public sentiment as expressed through social media and across telecommunications networks and marrying that with inside knowledge of the machinations of the major political players at the storm center of events.

On this point of not yet mastering the art of being able to anticipate the future despite the mountains of data it was collecting,  Anne Neuberger, Special Assistant to the NSA Director, gave a fascinating talk over at the Long Now Foundation in August last year. During a sometimes intense q&a she had this exchange with one of the moderators, Stanford professor, Paul Saffo:

 Saffo: With big data as a friend likes to say “perhaps the data haystack that the intelligence community has created has grown too big to ever find the needle in.”

Neuberger : I think one of the reasons we talked about our desire to work with big data peers on analytics is because we certainly feel that we can glean far more value from the data that we have and potentially collect less data if we have a deeper understanding of how to better bring that together to develop more insights.

It’s a strange admission from a spokesperson from the nation’s premier cyber-intelligence agency that for their surveillance model to work they have to learn from the analytics of private sector big data companies whose models themselves are far from having proven their effectiveness.

Perhaps then, Rudder should have extended his skepticism beyond the world of dating websites. For me, I’ll only know big data in the security sphere works when our politicians, Noah like, seem unusually well prepared for a major crisis that the rest of us data poor chumps didn’t also see a mile away, and coming.

 

Advertisements

Sex and Love in the Age of Algorithms

Eros and Psyche

How’s this for a 21st century Valentine’s Day tale: a group of religious fundamentalists want to redefine human sexual and gender relationships based on a more than 2,000 year old religious text. Yet instead of doing this by aiming to seize hold of the cultural and political institutions of society, a task they find impossible, they create an algorithm which once people enter their experience is based on religiously derived assumptions users cannot see. People who enter this world have no control over their actions within it, and surrender their autonomy for the promise of finding their “soul mate”.

I’m not writing a science-fiction story- it’s a tale that’s essentially true.

One of the first places, perhaps the only place, where the desire to compress human behavior into algorithmically processable and rationalized “data”, has run into a wall was in the ever so irrational realms of sex and love. Perhaps I should have titled this piece “Cupid’s Revenge”, for the domain of sex and love has proved itself so unruly and non-computable that what is now almost unbelievable has happened- real human beings have been brought back into the process of making actual decisions that affect their lives rather than relying on silicon oracles to tell them what to do.

It’s a story not much known and therefore important to tell. The story begins with the exaggerated claims of what was one of the first and biggest online dating sites- eHarmony. Founded in 2000 by Neil Clark Warren, a clinical psychologist and former marriage counselor, eHarmony promoted itself as more than just a mere dating site claiming that it had the ability to help those using its service find their “soul mate”. As their senior research scientist, Gian C. Gonzaga, would put it:

 It is possible “to empirically derive a matchmaking algorithm that predicts the relationship of a couple before they ever meet.”

At the same time it made such claims, eHarmony was also very controlling in the way its customers were allowed to use its dating site. Members were not allowed to search for potential partners on their own, but directed to “appropriate” matches based on a 200 item questionnaire and directed by the site’s algorithm, which remained opaque to its users. This model of what dating should be was doubtless driven by Warren’s religious background, for in addition to his psychological credentials, Warren was also a Christian theologian.

By 2011 eHarmony garnered the attention of sceptical social psychologists, most notably, Eli J. Finkel, who, along with his co-authors, wrote a critical piece for the American Psychological Association in 2011 on eHarmony and related online dating sites.

What Finkle wanted to know was if claims such as that of eHarmony that it had discovered some ideal way to match individuals to long term partners actually stood up to critical scrutiny. What he and his authors concluded was that while online dating had opened up a new frontier for romantic relationships, it had not solved the problem of how to actually find the love of one’s life. Or as he later put it in a recent article:

As almost a century of research on romantic relationships has taught us, predicting whether two people are romantically compatible requires the sort of information that comes to light only after they have actually met.

Faced with critical scrutiny, eHarmony felt compelled to do something, to my knowledge, none of the programmers of the various algorithms that now mediate much of our relationship with the world have done; namely, to make the assumptions behind their algorithms explicit.

As Gonzaga explained it eHarmony’s matching algorithm was based on six key characteristics of users that included things like “level of agreeableness”  and “optimism”. Yet as another critic of eHarmony Dr. Reis told Gonzaga:

That agreeable person that you happen to be matching up with me would, in fact, get along famously with anyone in this room.

Still, the major problem critics found with eHarmony wasn’t just that it made exaggerated claims for the effectiveness of its romantic algorithms that were at best a version of skimming, it’s that it asserted nearly complete control over the way its users defined what love actually was. As is the case with many algorithms, the one used by eHarmony was a way for its designers and owners to constrain those using it to impose, rightly or wrongly, their own value assumptions about the world.

And like many classic romantic tales, this one ended with the rebellion of messy human emotion over reason and paternalistic control. Social psychologist weren’t the only ones who found eHarmony’s model constraining and weren’t the first to notice its flaws. One of the founders of an alternative dating site, Christian Rudder of OkCupid, has noted that much of what his organization has done was in light of the exaggerated claims for the efficacy of their algorithms and top-down constraints imposed by the creators of eHarmony. But it is another, much maligned dating site, Tinder, that proved to be the real rebel in this story.

Critics of Tinder, where users swipe through profile pictures to find potential dates have labeled the site a “hook-up” site that encourages shallowness. Yet Finkle concludes:

Yes, Tinder is superficial. It doesn’t let people browse profiles to find compatible partners, and it doesn’t claim to possess an algorithm that can find your soulmate. But this approach is at least honest and avoids the errors committed by more traditional approaches to online dating.

And appearance driven sites are unlikely to be the last word in online dating especially for older Romeos and Juliets who would like to go a little deeper than looks. Psychologist, Robert Epstein, working at the MIT Media Lab sees two up and coming trends that will likely further humanize the 21st century dating experience. The first is the rise of non-video game like virtual dating environments. As he describes it:

….so at some point you will be able to have, you know, something like a real date with someone, but do it virtually, which means the safety issue is taken care of and you’ll find out how you interact with someone in some semi-real setting or even a real setting; maybe you can go to some exotic place, maybe you can even go to the Champs-Elyséesin Paris or maybe you can go down to the local fast-food joint with them, but do it virtually and interact with them.

The other, just as important, but less tech-sexy change Epstine sees coming is bringing friends and family back into the dating experience:

Right now, if you sign up with the eHarmony or match.com or any of the other big services, you’re alone—you’re completely alone. It’s like being at a huge bar, but going without your guy friends or your girl friends—you’re really alone. But in the real world, the community is very helpful in trying to determine whether someone is right for you, and some of the new services allow you to go online with friends and family and have, you know, your best friend with you searching for potential partners, checking people out. So, that’s the new community approach to online dating.

As has long been the case, sex and love have been among the first set of explorers moving out into a previously unexplored realm of human possibility. Yet sex and love are also because of this the proverbial canary in the coal mine informing us of potential dangers. The experience of online dating suggest that we need to be sceptical of the exaggerated claims of the various algorithms that now mediate much of lives and be privy to their underlying assumptions. To be successful algorithms need to bring our humanity back into the loop rather than regulate it away as something messy, imperfect, irrational and unsystematic.

There is another lesson here as well, for the more something becomes disconnected from our human capacity to extend trust through person-to-person contact and through taping into the wisdom of our own collective networks of trust the more dependent we become on overseers who in exchange for protecting us from deception demand the kinds of intimate knowledge from us only friends and lovers deserve.