Truth and Prediction in the Dataclysm

The Deluge by Francis Danby. 1837-1839

Last time I looked at the state of online dating. Among the figures mentioned was Christian Rudder, one of the founders of the dating site OkCupid and the author of a book on big data called Dataclysm: Who We Are When We Think No One’s Looking, which somehow manages to be both laugh-out-loud funny and deeply disturbing at the same time.

Rudder is famous, or infamous depending on your view of the matter, for having written a piece about his site with the provocative title: We experiment on human beings! There he wrote:

We noticed recently that people didn’t like it when Facebook “experimented” with their news feed. Even the FTC is getting involved. But guess what, everybody: if you use the Internet, you’re the subject of hundreds of experiments at any given time, on every site. That’s how websites work.

That statement might set the blood of some boiling, but my own negative reaction to it is somewhat tempered by the fact that Rudder’s willingness to run experiments on his site’s users originates, it seems, not in any conscious effort to be more successful at manipulating them, but in a desire to quantify our ignorance. Or, as he puts it in the piece linked to above:

I’m the first to admit it: we might be popular, we might create a lot of great relationships, we might blah blah blah. But OkCupid doesn’t really know what it’s doing. Neither does any other website. It’s not like people have been building these things for very long, or you can go look up a blueprint or something. Most ideas are bad. Even good ideas could be better. Experiments are how you sort all this out.

Rudder eventually turned his experiments on the data of OkCupid’s users into his book Dataclysm, which displays the same kind of brutal honesty and acknowledgement of the limits of our knowledge. What he is trying to do is make sense of the deluge of data now inundating us. The only way we have found to do this is to create sophisticated algorithms that allow us to discern patterns in the flood. The problem with using algorithms to try to organize human interactions (which have themselves now become points of data) is that their users are often reduced to whatever version of being human the algorithm’s programmers have embedded in it. Rudder is well aware of these limitations, is completely upfront about them, and refuses to make any special claims about algorithmic wisdom compared to the normal human sort. As he puts it in Dataclysm:

That said, all websites, and indeed all data scientists, objectify. Algorithms don’t work well with things that aren’t numbers, so when you want a computer to understand an idea, you have to convert as much of it as you can into digits. The challenge facing sites and apps is thus to chop and jam the continuum of human experience into little buckets 1, 2, 3, without anyone noticing: to divide some vast, ineffable process- for Facebook, friendship, for Reddit, community, for dating sites, love- into pieces a server can handle. (13)

At the same time, Rudder appears to see the data collected on sites such as OkCupid as a sort of mirror, reflecting back to us, in ways we have never had available before, the real truth about ourselves, stripped of the social conventions and politeness that tend to obscure the way we truly feel. And what Rudder finds in this data is not a reflection of the inner beauty of humanity one might hope for, but something more like the portrait in The Picture of Dorian Gray.

As an example, take what Rudder calls “Wooderson’s Law”, after the character from Dazed and Confused who said in the film “That’s what I love about these high school girls, I get older while they stay the same age”. What Rudder has found is that heterosexual male attraction to females peaks when those women are in their early 20’s and thereafter falls precipitously. On OkCupid at least, women in their 30’s and 40’s are effectively invisible when competing against women in their 20’s for male sexual attraction. Fortunately for heterosexual men, women are more realistic in their expectations and tend to report the strongest attraction to men roughly their own age, until sometime in men’s 40’s, when male attractiveness also falls off a cliff… gulp.

Another finding from Rudder’s work is not just that looks rule, but just how absolutely they rule. In his aforementioned piece, Rudder lays out that the vast majority of users essentially equate personality with looks. A particularly stunning woman can find herself with a 99% personality rating even if she has not one word in her profile.

These are perhaps somewhat banal and even obvious discoveries about human nature that Rudder has been able to mine from OkCupid’s data, and to my mind at least, they are less disturbing than the deep-seated racial bias he finds there as well. Again, at least among OkCupid’s users, dating preferences are heavily skewed against black men and women. Not just whites, it seems, but all other racial groups (Asians, Hispanics) would apparently prefer to date someone from a race other than African, which is disheartening for the 21st century.

Rudder also looks at dark manifestations of our collective self beyond those found in OkCupid data. Try using Google search as one would play the game Taboo. The search suggestions that pop up in the Google search bar, after all, are compiled on the basis of Google users’ most popular searches and thus provide a kind of gauge of what 1.17 billion human beings are thinking. Try these, some of which Rudder plays himself:

“why do women?”

“why do men?”

“why do white people?”

“why do black people?”

“why do Asians?”

“why do Muslims?”

The exercise gives a whole new meaning to Nietzsche’s observation that “When you stare into the abyss, the abyss stares back”.

Rudder also looks at the ability of social media to engender mobs. Take this case from Twitter in 2014. On New Year’s Eve of that year a young woman tweeted:

“This beautiful earth is now 2014 years old, amazing.”

Her strength obviously wasn’t science in school, but what should have just led to collective giggles, or perhaps a polite correction regarding terrestrial chronology, ballooned into a storm of tweets like this:

“Kill yourself”


“Kill yourself you stupid motherfucker”. (139)

A recent study has pointed out that the emotion second most likely to go viral is rage; we can count ourselves very lucky that the emotion most likely to go viral is awe.

Then there’s the question of the structure of the whole thing. Like Jaron Lanier, Rudder is struck by the degree to which the seemingly democratized architecture of the Internet appears to consistently manifest the opposite and reveal itself as following Zipf’s Law, which Rudder concisely reduces to:

rank x number = constant (160)
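To make that one-line formula concrete, here is a minimal sketch of my own (not from Rudder’s book). It assumes an idealized Zipf distribution in which the item at rank r gets exactly constant / r views; the constant of 1000 and the 100 ranks are made-up numbers, and the function names are hypothetical.

```python
def zipf_counts(constant, n_ranks):
    """Idealized Zipf distribution: the item at rank r gets constant / r views."""
    return [constant / rank for rank in range(1, n_ranks + 1)]


def top_share(counts, k):
    """Fraction of all views captured by the k highest-ranked items."""
    return sum(counts[:k]) / sum(counts)


# Assumed numbers for illustration only: 100 profiles, constant of 1000.
counts = zipf_counts(1000.0, 100)

# rank x number stays (approximately) constant down the whole list.
products = [rank * count for rank, count in enumerate(counts, start=1)]
print(min(products), max(products))

# How much attention the top tenth of the list soaks up.
print(f"top 10 of 100 capture {top_share(counts, 10):.0%} of all views")
```

In this idealized case the top ten items out of a hundred take more than half of all views, which is the “superstar” pattern in miniature. Real attention data only approximates the law.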

Both the economy and society in the Internet age are dominated by “superstars”: companies (such as Google and Facebook, which so far outstrip their rivals in search or social media that they might be called monopolies), along with celebrities, musical artists and authors. Zipf’s Law also seems to apply to dating sites, where a few profiles dominate the class of those viewed by potential partners. In the environment of a networked society where invisibility is the common fate of almost all of us and success often hinges on increasing our own visibility, we are forced to turn ourselves towards “personal branding” and obsession over “Klout scores”. It’s not a new problem, but I wonder how much all this effort at garnering attention is stealing time from the actual work that makes that attention worthwhile and long lasting.

Rudder is uncomfortable with all this algorithmization while at the same time accepting its inevitability. He writes of the project:

Reduction is inescapable. Algorithms are crude. Computers are machines. Data science is trying to make sense of an analog world. It’s a by-product of the basic physical nature of the micro-chip: a chip is just a sequence of tiny gates.

From that microscopic reality an absolutism propagates up through the whole enterprise, until at the highest level you have the definitions, data types and classes essential to programming languages like C and JavaScript.  (217-218)

Thing is, for all his humility about the effectiveness of big data so far, and about his admittedly limited ability to draw solid conclusions from the data of OkCupid, he seems to place undue trust in the ability of large corporations and the security state to succeed at the same project. Much deeper data mining and superior analytics, he thinks, separate his efforts from those of the really big boys. Rudder writes:

Analytics has in many ways surpassed the information itself as the real lever to pry. Cookies in your web browser and guys hacking for your credit card numbers get most of the press and are certainly the most acutely annoying of the data collectors. But they’ve taken hold of a small fraction of your life and for that they’ve had to put in all kinds of work. (227)

He compares them to Mike Myers’s Dr. Evil holding the world hostage “for one million dollars”

… while the billions fly to the real masterminds, like Acxiom. These corporate data marketers, with reach into bank and credit card records, retail histories, and government filings like tax accounts, know stuff about human behavior that no academic researcher searching for patterns on some website ever could. Meanwhile the resources and expertise the national security apparatus brings to bear makes enterprise-level data mining look like Minesweeper. (227)

Yet do we really know this faith in big data isn’t an illusion? What discernible effects on the overall economy, or even on consumer behavior, are clearly traceable to the juggernauts of big data such as Acxiom? For us to believe in the power of data, shouldn’t someone have to show us the data that it works, and not just the promise that it will transform the economy once it has achieved maximum penetration?

On that same score, what degree of faith should we put in the powers of big data when it comes to security? As far as I am aware no evidence has been produced that mass surveillance has prevented attacks- it didn’t stop the Charlie Hebdo killers. Just as importantly, it seemingly hasn’t prevented our public officials from being caught flat-footed and flabbergasted in the face of international events such as the revolution in Egypt or the war in Ukraine. And these latter big events would seem to be precisely the kinds of predictions big data should find relatively easy- monitoring broad public sentiment as expressed through social media and across telecommunications networks and marrying that with inside knowledge of the machinations of the major political players at the storm center of events.

On this point of the NSA not yet having mastered the art of anticipating the future, despite the mountains of data it is collecting, Anne Neuberger, Special Assistant to the NSA Director, gave a fascinating talk at the Long Now Foundation in August last year. During a sometimes intense Q&A she had this exchange with one of the moderators, Stanford professor Paul Saffo:

Saffo: With big data, as a friend likes to say, “perhaps the data haystack that the intelligence community has created has grown too big to ever find the needle in.”

Neuberger: I think one of the reasons we talked about our desire to work with big data peers on analytics is because we certainly feel that we can glean far more value from the data that we have and potentially collect less data if we have a deeper understanding of how to better bring that together to develop more insights.

It’s a strange admission from a spokesperson from the nation’s premier cyber-intelligence agency that for their surveillance model to work they have to learn from the analytics of private sector big data companies whose models themselves are far from having proven their effectiveness.

Perhaps, then, Rudder should have extended his skepticism beyond the world of dating websites. For me, I’ll only know big data in the security sphere works when our politicians, Noah-like, seem unusually well prepared for a major crisis that the rest of us data-poor chumps didn’t also see coming a mile away.


Sex and Love in the Age of Algorithms

Eros and Psyche

How’s this for a 21st century Valentine’s Day tale: a group of religious fundamentalists wants to redefine human sexual and gender relationships based on a more than 2,000 year old religious text. Yet instead of doing this by aiming to seize hold of the cultural and political institutions of society, a task they find impossible, they create an algorithm within which, once people enter, their experience is shaped by religiously derived assumptions they cannot see. People who enter this world have no control over their actions within it, and surrender their autonomy for the promise of finding their “soul mate”.

I’m not writing a science-fiction story- it’s a tale that’s essentially true.

One of the first places, perhaps the only place, where the desire to compress human behavior into algorithmically processable and rationalized “data” has run into a wall is in the ever so irrational realms of sex and love. Perhaps I should have titled this piece “Cupid’s Revenge”, for the domain of sex and love has proved itself so unruly and non-computable that what is now almost unbelievable has happened- real human beings have been brought back into the process of making actual decisions that affect their lives rather than relying on silicon oracles to tell them what to do.

It’s a story not much known and therefore important to tell. The story begins with the exaggerated claims of what was one of the first and biggest online dating sites- eHarmony. Founded in 2000 by Neil Clark Warren, a clinical psychologist and former marriage counselor, eHarmony promoted itself as more than a mere dating site, claiming that it had the ability to help those using its service find their “soul mate”. As their senior research scientist, Gian C. Gonzaga, would put it:

 It is possible “to empirically derive a matchmaking algorithm that predicts the relationship of a couple before they ever meet.”

At the same time it made such claims, eHarmony was also very controlling in the way its customers were allowed to use its dating site. Members were not allowed to search for potential partners on their own, but were steered to “appropriate” matches based on a 200-item questionnaire and selected by the site’s algorithm, which remained opaque to its users. This model of what dating should be was doubtless driven by Warren’s religious background, for in addition to his psychological credentials, Warren was also a Christian theologian.

By 2011 eHarmony had garnered the attention of sceptical social psychologists, most notably Eli J. Finkel, who, along with his co-authors, wrote a critical piece for the American Psychological Association that year on eHarmony and related online dating sites.

What Finkel wanted to know was whether claims such as eHarmony’s, that it had discovered some ideal way to match individuals to long term partners, actually stood up to critical scrutiny. What he and his co-authors concluded was that while online dating had opened up a new frontier for romantic relationships, it had not solved the problem of how to actually find the love of one’s life. Or as he later put it in a recent article:

As almost a century of research on romantic relationships has taught us, predicting whether two people are romantically compatible requires the sort of information that comes to light only after they have actually met.

Faced with critical scrutiny, eHarmony felt compelled to do something that, to my knowledge, none of the programmers of the various algorithms that now mediate much of our relationship with the world have done; namely, to make the assumptions behind their algorithms explicit.

As Gonzaga explained it, eHarmony’s matching algorithm was based on six key characteristics of users that included things like “level of agreeableness” and “optimism”. Yet as another critic of eHarmony, Dr. Reis, told Gonzaga:

That agreeable person that you happen to be matching up with me would, in fact, get along famously with anyone in this room.

Still, the major problem critics found with eHarmony wasn’t just that it made exaggerated claims for the effectiveness of its romantic algorithms that were at best a version of skimming; it’s that it asserted nearly complete control over the way its users defined what love actually was. As is the case with many algorithms, the one used by eHarmony was a way for its designers and owners to constrain those using it and to impose on them, rightly or wrongly, their own value assumptions about the world.

And like many classic romantic tales, this one ended with the rebellion of messy human emotion over reason and paternalistic control. Social psychologists weren’t the only ones who found eHarmony’s model constraining and weren’t the first to notice its flaws. One of the founders of an alternative dating site, Christian Rudder of OkCupid, has noted that much of what his organization has done was a reaction to the exaggerated claims for the efficacy of their algorithms and the top-down constraints imposed by the creators of eHarmony. But it is another, much maligned dating site, Tinder, that proved to be the real rebel in this story.

Critics of Tinder, where users swipe through profile pictures to find potential dates, have labeled the site a “hook-up” site that encourages shallowness. Yet Finkel concludes:

Yes, Tinder is superficial. It doesn’t let people browse profiles to find compatible partners, and it doesn’t claim to possess an algorithm that can find your soulmate. But this approach is at least honest and avoids the errors committed by more traditional approaches to online dating.

And appearance-driven sites are unlikely to be the last word in online dating, especially for older Romeos and Juliets who would like to go a little deeper than looks. Psychologist Robert Epstein, working at the MIT Media Lab, sees two up and coming trends that will likely further humanize the 21st century dating experience. The first is the rise of virtual dating environments that are not like video games. As he describes it:

….so at some point you will be able to have, you know, something like a real date with someone, but do it virtually, which means the safety issue is taken care of and you’ll find out how you interact with someone in some semi-real setting or even a real setting; maybe you can go to some exotic place, maybe you can even go to the Champs-Elysées in Paris or maybe you can go down to the local fast-food joint with them, but do it virtually and interact with them.

The other, just as important, but less tech-sexy change Epstein sees coming is bringing friends and family back into the dating experience:

Right now, if you sign up with eHarmony or any of the other big services, you’re alone- you’re completely alone. It’s like being at a huge bar, but going without your guy friends or your girl friends- you’re really alone. But in the real world, the community is very helpful in trying to determine whether someone is right for you, and some of the new services allow you to go online with friends and family and have, you know, your best friend with you searching for potential partners, checking people out. So, that’s the new community approach to online dating.

As has long been the case, sex and love have been among the first explorers moving out into a previously unexplored realm of human possibility. Yet because of this, sex and love are also the proverbial canary in the coal mine, informing us of potential dangers. The experience of online dating suggests that we need to be sceptical of the exaggerated claims of the various algorithms that now mediate much of our lives and be privy to their underlying assumptions. To be successful, algorithms need to bring our humanity back into the loop rather than regulate it away as something messy, imperfect, irrational and unsystematic.

There is another lesson here as well, for the more something becomes disconnected from our human capacity to extend trust through person-to-person contact and through tapping into the wisdom of our own collective networks of trust, the more dependent we become on overseers who, in exchange for protecting us from deception, demand the kinds of intimate knowledge from us that only friends and lovers deserve.


Big Data as statistical masturbation

Infinite Book Tunnel

It’s just possible that there is a looming crisis in yet another technological sector whose proponents have leaped too far ahead, and too soon, promising all kinds of things they are unable to deliver. It’s strange how we keep ramming our head into this same damned wall, but this next crisis is perhaps more important than deflated hype at other times, say our over-optimism about the timeline for human space flight in the 1970’s, or the “AI winter” in the 1980’s, or the miracles that seemed just at our fingertips when we cracked the Human Genome while pulling riches out of the air during the dotcom boom- both of which brought us to a state of mania in the 1990’s and early 2000’s.

The thing that separates a potentially new crisis in the area of so-called “Big Data” from these earlier ones is that, literally overnight, we have reconstructed much of our economy and national security infrastructure, and are in the process of eroding our ancient right to privacy, on its yet-to-be-proven premises. Now, we are on the verge of changing not just the nature of the science upon which we all depend, but nearly every other field of human intellectual endeavor. And we’ve done and are doing this despite the fact that the most over the top promises of Big Data are about as epistemologically grounded as divining the future by looking at goat entrails.

Well, that might be a little unfair. Big Data is helpful, but the question is helpful for what? A tool, as opposed to a supposedly magical talisman, has its limits, and understanding those limits should lead not to our jettisoning the tool of large scale data based analysis, but to asking what needs to be done to make these new capacities actually useful rather than, like all forms of divination, comforting us with the idea that we can know the future and thus somehow exert control over it, when in reality both our foresight and our powers are much more limited.

Start with the issue of the digital economy. One model underlies most of the major Internet giants- Google, Facebook and to a lesser extent Apple and Amazon, along with a whole set of behemoths that few of us can name but that underlie everything we do online, especially data aggregators such as Acxiom. That model is essentially to gather up every last digital record we leave behind, many of them gained in exchange for “free” services, and to use this living archive to target advertisements at us.

It’s not only that this model has provided the infrastructure for an unprecedented violation of privacy by the security state (more on which below); it’s that there’s no real evidence that it even works.

Just anecdotally reflect on your own personal experience. If companies can very reasonably be said to know you better than your mother, your wife, or even you know yourself, why are the ads coming your way so damn obvious, and frankly even oblivious? In my own case, if I shop online for something, a hammer, a car, a pair of pants, I end up getting ads for that very same type of product weeks or even months after I have actually bought a version of the item I was searching for.

In large measure, the Internet is a giant market in which we can find products or information. Targeted ads can only really work if they are able to refract in their marketed product’s favor the information I am searching for, if they lead me to buy something I would not have purchased in the first place. Derek Thompson, in the piece linked to above, points out that this problem is called endogeneity, or more colloquially: “hell, I was going to buy it anyway.”

The problem with this economic model, though, goes even deeper than that. At least one-third of clicks on digital ads aren’t human beings at all but bots that represent a way of gaming advertising revenue like something right out of a William Gibson novel.

Okay, so we have this economic model based on what at its root is really just spyware, and despite all the billions poured into it, we have no idea if it actually affects consumer behavior. That might be merely an annoying feature of the present rather than something to fret about were it not for the fact that this surveillance architecture has apparently been captured by the security services of the state. The model is essentially just a darker version of its commercial forebear. Here the NSA, GCHQ et al hoover up as much of the Internet’s information as they can get their hands on. Ostensibly, they’re doing this so they can algorithmically sort through this data to identify threats.

In this case, we have just as many reasons to suspect that it doesn’t really work, and though they claim it does, none of these intelligence agencies will actually let us look at their supposed evidence. The reasons to suspect that mass surveillance might suffer from flaws similar to those of mass “personalized” marketing were excellently summed up in a recent article in the Financial Times by Zeynep Tufekci, who wrote:

But the assertion that big data is “what it’s all about” when it comes to predicting rare events is not supported by what we know about how these methods work, and more importantly, don’t work. Analytics on massive datasets can be powerful in analysing and identifying broad patterns, or events that occur regularly and frequently, but are singularly unsuited to finding unpredictable, erratic, and rare needles in huge haystacks. In fact, the bigger the haystack — the more massive the scale and the wider the scope of the surveillance — the less suited these methods are to finding such exceptional events, and the more they may serve to direct resources and attention away from appropriate tools and methods.
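To see why a bigger haystack makes things worse, it helps to run the arithmetic of base rates. The sketch below is my own illustration and not anything from Tufekci’s article: the population sizes, the number of real threats, and the detector’s hit and false-alarm rates are all made-up assumptions, and the function names are hypothetical.

```python
def flagged_counts(population, true_threats, sensitivity, false_positive_rate):
    """Return (expected true positives, expected false positives) for a screen."""
    true_positives = true_threats * sensitivity
    false_positives = (population - true_threats) * false_positive_rate
    return true_positives, false_positives


def precision(population, true_threats, sensitivity, false_positive_rate):
    """Fraction of flagged people who are actual threats."""
    tp, fp = flagged_counts(population, true_threats, sensitivity, false_positive_rate)
    return tp / (tp + fp)


# Hypothetical numbers: 10 real threats, and a detector that catches 99% of
# them while wrongly flagging 1% of innocent people.
for population in (1_000_000, 10_000_000, 100_000_000):
    p = precision(population, true_threats=10, sensitivity=0.99,
                  false_positive_rate=0.01)
    print(f"haystack of {population:>11,}: {p:.6%} of flags are real")
```

Under these assumed rates, fewer than one flag in a thousand points at a real threat in a haystack of a million, and the share shrinks further as the haystack grows while the number of needles stays fixed.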

I’ll get to what’s epistemologically wrong with using Big Data in the way used by the NSA that Tufekci rightly criticizes in a moment, but on a personal, not societal level, the biggest danger from getting the capabilities of Big Data wrong seems most likely to come through its potentially flawed use in medicine.

Here’s the kind of hype we’re in the midst of, as found in a recent article by Tim McDonnell in Nautilus:

We’re well on our way to a future where massive data processing will power not just medical research, but nearly every aspect of society. Viktor Mayer-Schönberger, a data scholar at the University of Oxford’s Oxford Internet Institute, says we are in the midst of a fundamental shift from a culture in which we make inferences about the world based on a small amount of information to one in which sweeping new insights are gleaned by steadily accumulating a virtually limitless amount of data on everything.

The value of collecting all the information, says Mayer-Schönberger, who published an exhaustive treatise entitled Big Data in March, is that “you don’t have to worry about biases or randomization. You don’t have to worry about having a hypothesis, a conclusion, beforehand.” If you look at everything, the landscape will become apparent and patterns will naturally emerge.

Here’s the problem with this line of reasoning, a problem that I think is the same as, and shares the same solution with, the issue of mass surveillance by the NSA and other security agencies. It begins with this idea that “the landscape will become apparent and patterns will naturally emerge.”

The flaw in this reasoning has to do with the way very large data sets work. One would think that sampling millions of people, which we’re now able to do via ubiquitous monitoring, would offer enormous gains over the days when we were confined to population samples of only a few thousand, yet this isn’t necessarily the case. The problem is that the more data you collect, and especially the more variables you track about each person, the greater your chance of finding false correlations.

Previously I had thought that surely this is a problem that statisticians had either solved or were on the verge of solving. They’re not, at least according to the computer scientist Michael Jordan, who fears that we might be on the verge of a “Big Data winter” similar to the one AI went through in the 1980’s and 90’s. Let’s say you had an extremely large database with multiple forms of metrics:

Now, if I start allowing myself to look at all of the combinations of these features—if you live in Beijing, and you ride bike to work, and you work in a certain job, and are a certain age—what’s the probability you will have a certain disease or you will like my advertisement? Now I’m getting combinations of millions of attributes, and the number of such combinations is exponential; it gets to be the size of the number of atoms in the universe.

Those are the hypotheses that I’m willing to consider. And for any particular database, I will find some combination of columns that will predict perfectly any outcome, just by chance alone. If I just look at all the people who have a heart attack and compare them to all the people that don’t have a heart attack, and I’m looking for combinations of the columns that predict heart attacks, I will find all kinds of spurious combinations of columns, because there are huge numbers of them.
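A toy simulation makes this point vivid. The sketch below is my own and deliberately simpler than what Jordan describes: it looks at single random columns rather than combinations of columns, and the 20 “patients” and 5,000 “attributes” are made-up numbers. Every column is a coin flip, so any agreement with the outcome is chance by construction.

```python
import random


def random_bits(rng, n):
    """A column of n coin flips (0 or 1), unrelated to anything else."""
    return [rng.randint(0, 1) for _ in range(n)]


def accuracy(column, outcome):
    """Fraction of rows where the column agrees with the outcome."""
    matches = sum(1 for c, o in zip(column, outcome) if c == o)
    return matches / len(outcome)


def best_accuracy(columns, outcome):
    """Accuracy of the single best 'predictor' among the given columns."""
    return max(accuracy(col, outcome) for col in columns)


rng = random.Random(0)  # fixed seed so the run is repeatable
n_people = 20
outcome = random_bits(rng, n_people)  # e.g. heart attack: yes or no
columns = [random_bits(rng, n_people) for _ in range(5000)]

# The more meaningless columns we allow ourselves to search through,
# the better the best one appears to "predict" the outcome.
for k in (10, 100, 1000, 5000):
    print(k, best_accuracy(columns[:k], outcome))
```

Because each larger search includes the smaller one, the best score can only stay the same or go up as columns are added. With only 20 rows, it would be very surprising if none of 5,000 random columns matched the outcome at least three times out of four.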

The actual mathematics of distinguishing spurious from potentially useful correlations is, in Jordan’s estimation, far from being worked out:

We are just getting this engineering science assembled. We have many ideas that come from hundreds of years of statistics and computer science. And we’re working on putting them together, making them scalable. A lot of the ideas for controlling what are called familywise errors, where I have many hypotheses and want to know my error rate, have emerged over the last 30 years. But many of them haven’t been studied computationally. It’s hard mathematics and engineering to work all this out, and it will take time.

It’s not a year or two. It will take decades to get right. We are still learning how to do big data well.
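The “familywise error” Jordan mentions can be made concrete with a textbook calculation. The sketch below assumes the tests are independent (real data rarely is) and uses the Bonferroni correction, a classic and conservative fix that I am choosing for illustration; it is not something Jordan’s interview describes. The function names are hypothetical.

```python
def familywise_error_rate(alpha, m):
    """Chance of at least one false positive across m independent tests,
    each run at significance level alpha."""
    return 1 - (1 - alpha) ** m


def bonferroni_threshold(alpha, m):
    """Per-test significance level that keeps the familywise rate at or below alpha."""
    return alpha / m


alpha = 0.05
for m in (1, 10, 100, 1000):
    uncorrected = familywise_error_rate(alpha, m)
    corrected = familywise_error_rate(bonferroni_threshold(alpha, m), m)
    print(f"{m:>5} tests: uncorrected {uncorrected:.3f}, Bonferroni {corrected:.3f}")
```

At a hundred tests the uncorrected chance of at least one false alarm is already above 99 percent. The correction restores control, but only by making each individual test far stricter, which is part of why doing this at the scale of millions of hypotheses is hard.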

Alright, now that’s a problem. As you’ll no doubt notice, the danger of false correlation that Jordan identifies as a problem for science is almost exactly the same critique Tufekci made against the mass surveillance of the NSA. That is, unless the NSA and its cohorts have actually solved the statistical/engineering problems Jordan identified and haven’t told us, all that the biggest data haystack in the world is going to yield is too many leads to follow, most of them false, many of which will drain resources from actual public protection. Perhaps equally troubling: if security services have solved these statistical/engineering problems, how much will be wasted in research funding and how many lives will be lost because medical scientists were kept from the tools that would have empowered their research?

At least part of the solution to this will be remembering why we developed statistical analysis in the first place. Herbert I. Weisberg with his recent book Willful Ignorance: The Mismeasure of Uncertainty has provided a wonderful, short primer on the subject.

Statistical evidence, according to Weisberg, was first introduced to medical research back in the 1950’s as a protection against exaggerated claims of efficacy and widespread quackery. Since then we have come to take a p-value of .05 almost as the truth itself. Weisberg’s book is really a plea to clinicians to know their patients and not rely almost exclusively on statistical analyses of “average” patients to help those in their care make life-altering decisions in terms of what medicines to take or procedures to undergo. Weisberg thinks that personalized medicine will over the long term solve these problems, and while I won’t go into my doubts about that here, I do think that, in the experience of the physician, he identifies the root of the solution to our Big Data problem.

Rather than think of Big Data as somehow providing us with a picture of reality “naturally emerging,” as Mayer-Schönberger, quoted above, suggested, we should start to view it as a way to easily and cheaply give us a metric for the potential validity of a hypothesis. And it’s not only the first step that continues to be guided by old-fashioned science rather than computer-driven numerology but the remaining steps as well: a positive signal is followed up by actual scientists and other researchers practicing such now-rusting skills as running actual experiments and building theories to explain their results. Big Data, if done right, won’t end up making science a form of information processing, but will instead be used as the primary tool for keeping scientists from going down a cul-de-sac.

The same principle applied to mass surveillance means a return to old-school human intelligence, even if it now needs to be empowered by new digital tools. Rather than Big Data being used to hoover up and analyze all potential leads, espionage and counterterrorism should become more targeted and based on efforts to understand and penetrate threat groups themselves. The move back to human intelligence and towards more targeted surveillance, rather than the mass data grab symbolized by Bluffdale, may be a reality forced on the NSA et al. by events. In part due to the Snowden revelations, terrorist and criminal networks have already abandoned the non-secure public networks which the rest of us use. Mass surveillance has lost its raison d’être.

At least in terms of science and medicine, I recently saw a version of how Big Data done right might work. In an article for Quanta and Scientific American, Veronique Greenwood discussed two recent efforts by researchers to use Big Data to find new understandings of and treatments for disease.

The physicist (not biologist) Stefan Thurner has created a network model of comorbid diseases, trying to uncover the hidden relationships between different, seemingly unrelated medical conditions. What I find interesting about this is that it gives us a new way of understanding disease, breaking free of hermetically sealed categories that may blind us to underlying mechanisms shared by medical conditions. I find this especially pressing when it comes to mental health, where the kind of symptom listing found in the DSM - the Bible for mental health care professionals - has never resulted in a causative model of how conditions such as anxiety or depression actually work, and is based on an antiquated separation between the mind and the body, not to mention the social and environmental factors that all give shape to mental health.

Even more interesting, from Greenwood’s piece, are the efforts by Joseph Loscalzo of Harvard Medical School to try and come up with a whole new model for disease, one that looks beyond genome associations to map out the molecular networks of disease, isolating the statistical correlation between a particular variant of such a map and a disease. This relationship between genes and proteins correlated with a disease is something Loscalzo calls a “disease module”.

Thurner describes the underlying methodology behind his, and by implication Loscalzo’s, efforts to Greenwood this way:

“Once you draw a network, you are drawing hypotheses on a piece of paper,” Thurner said. “You are saying, ‘Wow, look, I didn’t know these two things were related. Why could they be? Or is it just that our statistical threshold did not kick it out?’” In network analysis, you first validate your analysis by checking that it recreates connections that people have already identified in whatever system you are studying. After that, Thurner said, “the ones that did not exist before, those are new hypotheses. Then the work really starts.”
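The workflow described in that passage (draw the network, check that it recovers what is already known, treat the rest as hypotheses) can be sketched in a few lines of Python. This is a toy illustration under my own assumptions, not Thurner’s or Loscalzo’s actual method: the patient records are invented, the condition labels A through D are placeholders, and the “lift” measure and its thresholds are simply one common way to ask whether two things co-occur more often than chance would predict.

```python
from collections import Counter
from itertools import combinations


def comorbidity_edges(records, min_lift=1.4, min_count=2):
    """Return condition pairs that co-occur more often than independence predicts.

    records: list of sets of condition labels, one set per (hypothetical) patient.
    lift = observed co-occurrence rate / rate expected if the two were unrelated.
    """
    n = len(records)
    single = Counter(c for r in records for c in set(r))
    pairs = Counter(
        frozenset(p) for r in records for p in combinations(sorted(set(r)), 2)
    )
    edges = {}
    for pair, count in pairs.items():
        a, b = tuple(pair)
        lift = count * n / (single[a] * single[b])
        # The statistical threshold: weak or rare pairs are "kicked out".
        if count >= min_count and lift >= min_lift:
            edges[pair] = lift
    return edges


def split_by_prior_knowledge(edges, known):
    """Separate edges that recreate known links from edges that are new hypotheses."""
    recovered = {e for e in edges if e in known}
    new_hypotheses = set(edges) - recovered
    return recovered, new_hypotheses


if __name__ == "__main__":
    # Invented toy data; "A".."D" stand in for diagnoses.
    records = [
        {"A", "B"}, {"A", "B"}, {"A", "B"}, {"C", "D"}, {"C", "D"},
        {"A"}, {"B", "C"}, {"D"}, {"C", "D"}, {"A", "B", "D"},
    ]
    known = {frozenset({"A", "B"})}  # a link we pretend is already established

    edges = comorbidity_edges(records)
    recovered, new_hypotheses = split_by_prior_knowledge(edges, known)
    print("recovered known links:", recovered)
    print("new hypotheses to test:", new_hypotheses)
```

Nothing in the output says why two conditions travel together. The sketch only produces a short list of candidates, which is exactly where, as Thurner says, the work really starts.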

It’s in the next steps, the testing of hypotheses and the development of a stable model, that the most important work really lies. Like any intellectual fad, Big Data has its element of truth. We can now much more easily distill large and sometimes previously invisible patterns from the deluge of information in which we are now drowning. This has potentially huge benefits for science, medicine, social policy, and law enforcement.

The problem comes from thinking that we are at the point where our data-crunching algorithms can do the work for us and are about to replace human beings and their skills at investigating problems deeply and in the real world. The danger there would be thinking that knowledge could work like self-gratification, a mere thing of the mind, without all the hard work, compromises, and conflict between expectations and reality that go into a real relationship. Ironically, this was a truth perhaps discovered first not by scientists or intelligence agencies but by online dating services. To that strange story, next time….

Edward O. Wilson’s Dull Paradise

Garden of Eden

In all sincerity I have to admit that there is much I admire about the biologist Edward O. Wilson. I can only pray that I not only live into my 80s, but still possess the intellectual stamina to write what are at least thought-provoking books when I get there. I also hope I still have the balls to write a book with the title of Wilson’s latest, The Meaning of Human Existence, for publishing with an appellation like that would mean I wasn’t afraid I would disappoint my readers, and Wilson did indeed leave me wondering if the whole thing was worth the effort.

Nevertheless, I think Wilson opened up an important alternative future that is seldom discussed here: namely, what if we aimed not at a supposedly brighter, so-called post-human future but to keep things the same? Well, there would be some changes: no extremes of human poverty, along with the restoration of much of the natural environment to its pre-industrial revolution health. Still, we ourselves would aim to stay largely the same human beings who emerged some 100,000 years ago, flaws and all.

Wilson calls this admittedly conservative vision paradise, and I’ve seen his eyes light up like a child contemplating Christmas when using the word in interviews. Another point that might be of interest to this audience is who he largely blames for keeping us from entering this Shangri-la: archaic religions and their “creation stories.”

I have to admit that I find the idea of trying to preserve humanity as it is a valid alternative future. After all, “evolve or die” isn’t really the way nature works. Typically the “goal” of evolution is to find a “design” that works and then stick with it for as long as possible. Since we now dominate the entire planet and our numbers out-rival by a long way any other large animal it seems hard to assert that we need a major, and likely risky, upgrade. Here’s Wilson making the case:

While I am at it, I hereby cast a vote for existential conservatism, the preservation of biological human nature as a sacred trust. We are doing very well in terms of science and technology. Let’s agree to keep that up, and move both along even faster. But let’s also promote the humanities, that which makes us human, and not use science to mess around with the wellspring of this, the absolute and unique potential of the human future. (60)

It’s an idea that rings true to my inner Edmund Burke, and sounds simple, doesn’t it? And on reflection it would be, if human beings were bison, blue whales, or gray wolves. Indeed, I think Wilson has drawn this idea of human preservation from his lifetime of very laudable work on biodiversity. Yet had he reflected upon why efforts at preservation fail when they do, he would have realized that the problem isn’t the wildlife itself, but the human beings who don’t share the same value system and are going in the opposite direction. That is, humans, though we are certainly animals, aren’t wildlife, in the sense that we take destiny into our own hands, even if doing so is sometimes for the worse. Wilson seems to think that it’s quite a short step from asserting the “preservation of biological human nature as a sacred trust” as a goal to gaining universal assent to it. The problem is that there is no widespread agreement over what human nature even is, and even if you had such agreement, how in the world would you go about enforcing it on the minority who refuse to adhere to it? How far should we be willing to go to prevent persons from willingly crossing some line that defines what a human being is? And where exactly is that line in the first place? Wilson thinks we’re near the end of the argument when we have only just taken our seats at the debate.

The strange thing is that the very people who would likely naturally lean towards the kind of biological conservatism that Wilson hopes “we” will ultimately choose are the sorts of traditionally religious persons he thinks are at the root of most of our conflicts. Here again is Wilson:

Religious warriors are not an anomaly. It is a mistake to classify believers of a particular religious and dogmatic religion-like ideologies into two groups, moderates versus extremists. The true cause of hatred and religious violence is faith versus faith, an outward expression of the ancient instinct of tribalism. Faith is the one thing that makes otherwise good people do bad things. (154)

For Wilson, a religious group “defines itself foremost by its creation story, the supernatural narrative that explains how human beings came into existence.” (151) The trouble with this is that it’s not even superficially true. Three of the world’s religions that have been busy killing one another over the last millennium, Judaism, Christianity and Islam, all have the same creation story. Wilson knows a hell of a lot more about ants and evolution than he does about religion or even world history. And while religion is certainly the root of some of our tribalism, which I agree is the deep and perennial human problem, it’s far from the only source, and very few of our tribal conflicts have anything to do with the fight between human beings over our origins in the deep past. How about class conflict? Or racial conflict? Or nationalist conflicts in which the two sides profess not only the exact same religion but the exact same sect, such as the current fight between the two Christian Orthodox nations of Russia and Ukraine? If China and Japan someday go to war it will not be a horrifying replay of the Scopes Monkey Trial.

For a book called The Meaning of Human Existence, Wilson’s ideas have very little explanatory power when it comes to anything other than our biological origins, along with some quite questionable ideas regarding the origins of our capacity for violence. That is, the book lacks depth, and because of this I found it, well… dull.

Nowhere was I more hopeful that Wilson would have something interesting and different to say than when it came to the question of extraterrestrial life. Here we have one of the world’s greatest living biologists, a man who has spent a lifetime studying ants as an alternative route to the kinds of eusociality possessed only by humans, the naked mole rat, and a handful of insects. Here was a scientist who was clearly passionate about preserving the amazing diversity of life on our small planet.

Yet Wilson’s E.T.s are land dwellers, relatively large, biologically audiovisual, “their head is distinct, big, and located up front” (115), they have moderate teeth and jaws, they have a high social intelligence, and “a small number of free locomotory appendages, levered for maximum strength with stiff internal or external skeletons composed of hinged segments (as by human elbows and knees), and with at least one pair of which are terminated by digits with pulpy tips used for sensitive touch and grasping.” (116)

In other words they are little green men.

What I had hoped was that Wilson would have used his deep knowledge of biology to imagine alternative paths to technological civilization. Couldn’t he have imagined a hive-like species that evolves in tandem with its own technological advancement? Or maybe some larger form of insect-like animal which doesn’t just have an instinctive repertoire of things that it builds, but constantly improves upon its own designs, and explores the space of possible technologies? Or aquatic species that develop something like civilization through the use of sea-herding and ocean farming? How about species that communicate not audio-visually but through electrical impulses the way our computers do?

After all, nature on earth is pretty weird. There’s not just us, but termites that build air-conditioned skyscrapers (at least from their view), whales which have culturally specific songs, and strange little things that eat and excrete electrons. One might guess that life elsewhere will be even weirder. Perhaps my problem with The Meaning of Human Existence is that it just wasn’t weird enough to capture not just the worlds of tomorrow and elsewhere, but the one we’re living in right now.