Truth and Prediction in the Dataclysm

The Deluge by Francis Danby. 1837-1839

Last time I looked at the state of online dating. Among the figures was mentioned was Christian Rudder, one of the founders of the dating site OkCupid and the author of a book on big data called Dataclysm: Who We Are When We Think No One’s Looking that somehow manages to be both laugh-out-loud funny and deeply disturbing at the same time.

Rudder is famous, or infamous depending on your view of the matter, for having written a piece about his site with the provocative title: We experiment on human beings!. There he wrote:

We noticed recently that people didn’t like it when Facebook “experimented” with their news feed. Even the FTC is getting involved. But guess what, everybody: if you use the Internet, you’re the subject of hundreds of experiments at any given time, on every site. That’s how websites work.

That statement might set the blood of some boiling, but my own negative reaction to it is somewhat tempered by the fact that Rudder’s willingness to run his experiments on his sites users originates, it seems, not in any conscious effort to be more successful at manipulating them, but as a way to quantify our ignorance. Or, as he puts it in the piece linked to above:

I’m the first to admit it: we might be popular, we might create a lot of great relationships, we might blah blah blah. But OkCupid doesn’t really know what it’s doing. Neither does any other website. It’s not like people have been building these things for very long, or you can go look up a blueprint or something. Most ideas are bad. Even good ideas could be better. Experiments are how you sort all this out.

Rudder eventually turned his experiments on the data of OkCupid’s users into his book Dataclysm which displays the same kind of brutal honesty and acknowledgement of the limits of our knowledge. What he is trying to do is make sense of the deluge of data now inundating us. The only way we have found to do this is to create sophisticated algorithms that allow us to discern patterns in the flood. The problem with using algorithms to try and organize human interactions (which have themselves now become points of data) is that their users are often reduced into the version of what being a human beings is that have been embedded by the algorithm’s programmers. Rudder, is well aware and completely upfront about these limitations and refuses to make any special claims about algorithmic wisdom compared to the normal human sort. As he puts it in Dataclysm:

That said, all websites, and indeed all data scientists objectify. Algorithms don’t work well with things that aren’t numbers, so when you want a computer to understand an idea, you have to convert as much of it as you can into digits. The challenge facing sites and apps is thus to chop and jam the continuum of the of human experience into little buckets 1, 2, 3, without anyone noticing: to divide some vast, ineffable process- for Facebook, friendship, for Reddit, community, for dating sites, love- into a pieces a server can handle. (13)

At the same time, Rudder appears to see the data collected on sites such as OkCupid as a sort of mirror, reflecting back to us in ways we have never had available before the real truth about ourselves laid bare of the social conventions and politeness that tend to obscure the way we truly feel. And what Rudder finds in this data is not a reflection of the inner beauty of humanity one might hope for, but something more like the mirror out of A Picture of Dorian Grey.

As an example take what Rudder calls” Wooderson’s Law” after the character from Dazed and Confused who said in the film “That’s what I love about these high school girl, I get older while they stay the same age”. What Rudder has found is that heterosexual male attraction to females peaks when those women are in their early 20’s and thereafter precipitously falls. On OkCupid at least, women in their 30’s and 40’s are effectively invisible when competing against women in their 20’s for male sexual attraction. Fortunately for heterosexual men, women are more realistic in their expectations and tend to report the strongest attraction to men roughly their own age, until sometime in men’s 40’s where males attractiveness also falls off a cliff… gulp.

Another finding from Rudder’s work is not just that looks rule, but just how absolutely they rule. In his aforementioned piece, Rudder lays out that the vast majority of users essentially equate personality with looks. A particularly stunning women can find herself with a 99% personality rating even if she has not one word in her profile.

These are perhaps somewhat banal and even obvious discoveries about human nature Rudder has been able to mine from OkCupid’s data, and to my mind at least, are less disturbing than the deep seated racial bias he finds there as well. Again, at least among OkCupid’s users, dating preferences are heavily skewed against black men and women. Not just whites it seems, but all other racial groups- Asians, Hispanics would apparently prefer to date someone from a race other than African- disheartening for the 21st century.

Rudder looks at other dark manifestations of our collective self than those found in OkCupid data as well. Try using Google search as one would play the game Taboo. The search suggestions that pop up in the Google search bar, after all, are compiled on the basis of Google user’s most popular searches and thus provide a kind of gauge on what 1.17 billion human beings are thinking. Try these some of which Rudder plays himself:

“why do women?”

“why do men?”

“why do white people?”

“why do black people?”

“why do Asians?”

“why do Muslims?”

The exercise gives a whole new meaning to Nietzsche’s observation that “When you stare into the abyss, the abyss stares back”.

Rudder also looks at the ability of social media to engender mobs. Take this case from Twitter in 2014. On New Years Eve of that year a young woman tweeted:

“This beautiful earth is now 2014 years old, amazing.”

Her strength obviously wasn’t science in school, but what should have just led to collective giggles, or perhaps a polite correction regarding terrestrial chronology, ballooned into a storm of tweets like this:

“Kill yourself”

And:

“Kill yourself you stupid motherfucker”. (139)

As a recent study has pointed out the emotion second most likely to go viral is rage, we can count ourselves very lucky the emotion most likely to go viral is awe.

Then there’s the question of the structure of the whole thing. Like Jaron Lanier, Rudder is struck by the degree to which the seemingly democratized architecture of the Internet appears to consistently manifest the opposite and reveal itself as following Zipf’s Law, which Rudder concisely reduces to:

rank x number = constant (160)

Both the economy and the society in the Internet age are dominated by “superstars”, companies (such as Google and FaceBook that so far outstrip their rivals in search or social media that they might be called monopolies), along with celebrities, musical artist, authors. Zipf’s Law also seems to apply to dating sites where a few profiles dominate the class of those viewed by potential partners. In the environment of a networked society where invisibility is the common fate of almost all of us and success often hinges on increasing our own visibility we are forced to turn ourselves towards “personal branding” and obsession over “Klout scores”. It’s not a new problem, but I wonder how much all this effort at garnering attention is stealing time from the effort at actual work that makes that attention worthwhile and long lasting.

Rudder is uncomfortable with all this algorithmization while at the same time accepting its inevitability. He writes of the project:

Reduction is inescapable. Algorithms are crude. Computers are machines. Data science is trying to make sense of an analog world. It’s a by-product of the basic physical nature of the micro-chip: a chip is just a sequence of tiny gates.

From that microscopic reality an absolutism propagates up through the whole enterprise, until at the highest level you have the definitions, data types and classes essential to programming languages like C and JavaScript. (217-218)

Thing is, for all his humility at the effectiveness of big data so far, or his admittedly limited ability to draw solid conclusions from the data of OkCupid, he seems to place undue trust in the ability of large corporations and the security state to succeed at the same project. Much deeper data mining and superior analytics, he thinks, separate his efforts from those of the really big boys. Rudder writes:

Analytics has in many ways surpassed the information itself as the real lever to pry. Cookies in your web browser and guys hacking for your credit card numbers get most of the press and are certainly the most acutely annoying of the data collectors. But they’ve taken hold of a small fraction of your life and for that they’ve had to put in all kinds of work. (227)

He compares them to Mike Myer’s Dr. Evil holding the world hostage “for one million dollars”

… while the billions fly to the real masterminds, like Axicom. These corporate data marketers, with reach into bank and credit card records, retail histories, and government fillings like tax accounts, know stuff about human behavior that no academic researcher searching for patterns on some website ever could. Meanwhile the resources and expertise the national security apparatus brings to bear makes enterprise-level data mining look like Minesweeper (227)

Yet do we really know this faith in big data isn’t an illusion? What discernable effects that are clearly traceable to the juggernauts of big data ,such as Axicom, on the overall economy or even consumer behavior? For us to believe in the power of data shouldn’t someone have to show us the data that it works and not just the promise that it will transform the economy once it has achieved maximum penetration?

On that same score, what degree of faith should we put in the powers of big data when it comes to security? As far as I am aware no evidence has been produced that mass surveillance has prevented attacks- it didn’t stop the Charlie Hebo killers. Just as importantly, it seemingly hasn’t prevented our public officials from being caught flat footed and flabbergasted in the face of international events such as the revolution in Egypt or the war in Ukraine. And these later big events would seem to be precisely the kinds of predictions big data should find relatively easy- monitoring broad public sentiment as expressed through social media and across telecommunications networks and marrying that with inside knowledge of the machinations of the major political players at the storm center of events.

On this point of not yet mastering the art of being able to anticipate the future despite the mountains of data it was collecting, Anne Neuberger, Special Assistant to the NSA Director, gave a fascinating talk over at the Long Now Foundation in August last year. During a sometimes intense q&a she had this exchange with one of the moderators, Stanford professor, Paul Saffo:

Saffo: With big data as a friend likes to say “perhaps the data haystack that the intelligence community has created has grown too big to ever find the needle in.”

Neuberger : I think one of the reasons we talked about our desire to work with big data peers on analytics is because we certainly feel that we can glean far more value from the data that we have and potentially collect less data if we have a deeper understanding of how to better bring that together to develop more insights.

It’s a strange admission from a spokesperson from the nation’s premier cyber-intelligence agency that for their surveillance model to work they have to learn from the analytics of private sector big data companies whose models themselves are far from having proven their effectiveness.

Perhaps then, Rudder should have extended his skepticism beyond the world of dating websites. For me, I’ll only know big data in the security sphere works when our politicians, Noah like, seem unusually well prepared for a major crisis that the rest of us data poor chumps didn’t also see a mile away, and coming.

Big Data as statistical masturbation

Feb8

Infinite Book Tunnel

It’s just possible that there is a looming crisis in yet another technological sector whose proponents have leaped too far ahead, and too soon, promising all kinds of things they are unable to deliver. It strange how we keep ramming our head into this same damned wall, but this next crisis is perhaps more important than deflated hype at other times, say our over optimism about the timeline for human space flight in the 1970’s, or the “AI winter” in the 1980’s, or the miracles that seemed just at our fingertips when we cracked the Human Genome while pulling riches out of the air during the dotcom boom- both of which brought us to a state of mania in the 1990’s and early 2000’s.

The thing that separates a potentially new crisis in the area of so-called “Big-Data” from these earlier ones is that, literally overnight, we have reconstructed much of our economy, national security infrastructure and in the process of eroding our ancient right privacy on it’s yet to be proven premises. Now, we are on the verge of changing not just the nature of the science upon which we all depend, but nearly every other field of human intellectual endeavor. And we’ve done and are doing this despite the fact that the the most over the top promises of Big Data are about as epistemologically grounded as divining the future by looking at goat entrails.

Well, that might be a little unfair. Big Data is helpful, but the question is helpful for what? A tool, as opposed to a supposedly magical talisman has its limits, and understanding those limits should lead not to our jettisoning the tool of large scale data based analysis, but what needs to be done to make these new capacities actually useful rather than, like all forms of divination, comforting us with the idea that we can know the future and thus somehow exert control over it, when in reality both our foresight and our powers are much more limited.

Start with the issue of the digital economy. One model underlies most of the major Internet giants- Google, FaceBook and to a lesser extent Apple and Amazon, along with a whole set of behemoths who few of us can name but that underlie everything we do online, especially data aggregators such as Axicom. That model is to essentially gather up every last digital record we leave behind, many of them gained in exchange for “free” services and using this living archive to target advertisements at us.

It’s not only that this model has provided the infrastructure for an unprecedented violation of privacy by the security state (more on which below) it’s that there’s no real evidence that it even works.

Just anecdotally reflect on your own personal experience. If companies can very reasonably be said to know you better than your mother, your wife, or even you know yourself, why are the ads coming your way so damn obvious, and frankly even oblivious? In my own case, if I shop online for something, a hammer, a car, a pair of pants, I end up getting ads for that very same type of product weeks or even months after I have actually bought a version of the item I was searching for.

In large measure, the Internet is a giant market in which we can find products or information. Targeted ads can only really work if they are able refract in their marketed product’s favor the information I am searching for, if they lead me to buy something I would not have purchased in the first place. Derek Thompson, in the piece linked to above points out that this problem is called Endogeneity, or more colloquially: “hell, I was going to buy it anyway.”

The problem with this economic model, though, goes even deeper than that. At least one-third of clicks on digital ads aren’t human beings at all but bots that represent a way of gaming advertising revenue like something right out of a William Gibson novel.

Okay, so we have this economic model based on what at it’s root is really just spyware, and despite all the billions poured into it, we have no idea if it actually affects consumer behavior. That might be merely an annoying feature of the present rather than something to fret about were it not for the fact that this surveillance architecture has apparently been captured by the security services of the state. The model is essentially just a darker version of its commercial forbearer. Here the NSA, GCHQ et al hoover up as much of the Internet’s information as they can get their hands on. Ostensibly, their doing this so they can algorithmically sort through this data to identify threats.

In this case, we have just as many reasons to suspect that it doesn’t really work, and though they claim it does, none of these intelligence agencies will actually look at their supposed evidence that it does. The reasons to suspect that mass surveillance might suffer similar flaws as mass “personalized” marketing, was excellently summed up in a recent article in the Financial Times Zeynep Tufekci when she wrote:

But the assertion that big data is “what it’s all about” when it comes to predicting rare events is not supported by what we know about how these methods work, and more importantly, don’t work. Analytics on massive datasets can be powerful in analysing and identifying broad patterns, or events that occur regularly and frequently, but are singularly unsuited to finding unpredictable, erratic, and rare needles in huge haystacks. In fact, the bigger the haystack — the more massive the scale and the wider the scope of the surveillance — the less suited these methods are to finding such exceptional events, and the more they may serve to direct resources and attention away from appropriate tools and methods.

I’ll get to what’s epistemologically wrong with using Big Data in the way used by the NSA that Tufekci rightly criticizes in a moment, but on a personal, not societal level, the biggest danger from getting the capabilities of Big Data wrong seems most likely to come through its potentially flawed use in medicine.

Here’s the kind of hype we’re in the midst of as found in a recent article by Tim Mcdonnell in Nautilus:

We’re well on our way to a future where massive data processing will power not just medical research, but nearly every aspect of society. Viktor Mayer-Schönberger, a data scholar at the University of Oxford’s Oxford Internet Institute, says we are in the midst of a fundamental shift from a culture in which we make inferences about the world based on a small amount of information to one in which sweeping new insights are gleaned by steadily accumulating a virtually limitless amount of data on everything.

The value of collecting all the information, says Mayer-Schönberger, who published an exhaustive treatise entitled Big Data in March, is that “you don’t have to worry about biases or randomization. You don’t have to worry about having a hypothesis, a conclusion, beforehand.” If you look at everything, the landscape will become apparent and patterns will naturally emerge.

Here’s the problem with this line of reasoning, a problem that I think is the same, and shares the same solution to the issue of mass surveillance by the NSA and other security agencies. It begins with this idea that “the landscape will become apparent and patterns will naturally emerge.”

The flaw that this reasoning suffers has to do with the way very large data sets work. One would think that the fact that sampling millions of people, which we’re now able to do via ubiquitous monitoring, would offer enormous gains over the way we used to be confined to population samples of only a few thousand, yet this isn’t necessarily the case. The problem is the larger your sample size the greater your chance at false correlations.

Previously I had thought that surely this is a problem that statisticians had either solved or were on the verge of solving. They’re not, at least according to the computer scientist Michael Jordan, who fears that we might be on the verge of a “Big Data winter” similar to the one AI went through in the 1980’s and 90’s. Let’s say you had an extremely large database with multiple forms of metrics:

Now, if I start allowing myself to look at all of the combinations of these features—if you live in Beijing, and you ride bike to work, and you work in a certain job, and are a certain age—what’s the probability you will have a certain disease or you will like my advertisement? Now I’m getting combinations of millions of attributes, and the number of such combinations is exponential; it gets to be the size of the number of atoms in the universe.

Those are the hypotheses that I’m willing to consider. And for any particular database, I will find some combination of columns that will predict perfectly any outcome, just by chance alone. If I just look at all the people who have a heart attack and compare them to all the people that don’t have a heart attack, and I’m looking for combinations of the columns that predict heart attacks, I will find all kinds of spurious combinations of columns, because there are huge numbers of them.

The actual mathematics of sorting out spurious from potentially useful correlations from being distinguished is, in Jordan’s estimation, far from being worked out:

We are just getting this engineering science assembled. We have many ideas that come from hundreds of years of statistics and computer science. And we’re working on putting them together, making them scalable. A lot of the ideas for controlling what are called familywise errors, where I have many hypotheses and want to know my error rate, have emerged over the last 30 years. But many of them haven’t been studied computationally. It’s hard mathematics and engineering to work all this out, and it will take time.

It’s not a year or two. It will take decades to get right. We are still learning how to do big data well.

Alright, now that’s a problem. As you’ll no doubt notice the danger of false correlation that Jordan identifies as a problem for science is almost exactly the same critique Tufekci made against the mass surveillance of the NSA. That is, unless the NSA and its cohorts have actually solved the statistical/engineering problems Jordan identified and haven’t told us, all the biggest data haystack in the world is going to lead to is too many leads to follow, most of them false, and many of which will drain resources from actual public protection. Perhaps equally troubling: if security services have solved these statistical/engineering problems how much will be wasted in research funding and how many lives will be lost because medical scientists were kept from the tools that would have empowered their research?

At least part of the solution to this will be remembering why we developed statistical analysis in the first place. Herbert I. Weisberg with his recent book Willful Ignorance: The Mismeasure of Uncertainty has provided a wonderful, short primer on the subject.

Statistical evidence, according to Weisberg was first introduced to medical research back in the 1950’s as a protection against exaggerated claims to efficacy and widespread quackery. Since then we have come to take the p value .05 almost as the truth itself. Weisberg’s book is really a plea to clinicians to know their patients and not rely almost exclusively on statistical analyses of “average” patients to help those in their care make life altering decisions in terms of what medicines to take or procedures to undergo. Weisberg thinks that personalized medicine will over the long term solve these problems, and while I won’t go into my doubts about that here, I do think, in the experience of the physician, he identifies the root to the solution of our Big Data problem.

Rather than think of Big Data as somehow providing us with a picture of reality, “naturally emerging” as Mayer-Schönberger quoted above suggested we should start to view it as a way to easily and cheaply give us a metric for the potential validity of a hypothesis. And it’s not only the first step that continues to be guided by old fashioned science rather than computer driven numerology but the remaining steps as well, a positive signal followed up by actual scientist and other researchers doing such now rusting skills as actual experiments and building theories to explain their results. Big Data, if done right, won’t end up making science a form of information processing, but will instead be used as the primary tool for keeping scientist from going down a cul-de-sac.

The same principle applied to mass surveillance means a return to old school human intelligence even if it now needs to be empowered by new digital tools. Rather than Big Data being used to hoover up and analyze all potential leads, espionage and counterterrorism should become more targeted and based on efforts to understand and penetrate threat groups themselves. The move back to human intelligence and towards more targeted surveillance rather than the mass data grab symbolized by Bluffdale may be a reality forced on the NSA et al by events. In part due to the Snowden revelations terrorist and criminal networks have already abandoned the non-secure public networks which the rest of us use. Mass surveillance has lost its raison d’etre.

At least it terms of science and medicine, I recently saw a version of how Big Data done right might work. In an article for Qunta and Scientific American by Veronique Greenwood she discussed two recent efforts by researchers to use Big Data to find new understandings of and treatments for disease.

The physicist (not biologist) Stefan Thurner has created a network model of comorbid diseases trying to uncover the hidden relationships between different, seemingly unrelated medical conditions. What I find interesting about this is that it gives us a new way of understanding disease, breaking free of hermetically sealed categories that may blind us to underlying shared mechanisms by medical conditions. I find this especially pressing where it comes to mental health where the kind of symptom listing found in the DSM- the Bible for mental health care professionals- has never resulted in a causative model of how conditions such as anxiety or depression actually work and is based on an antiquated separation between the mind and the body not to mention the social and environmental factors that all give shape to mental health.

Even more interesting, from Greenwood’s piece, are the efforts by Joseph Loscalzo of Harvard Medical School to try and come up with a whole new model for disease that looks beyond genome associations for diseases to map out the molecular networks of disease isolating the statistical correlation between a particular variant of such a map and a disease. This relationship between genes and proteins correlated with a disease is something Loscalzo calls a “disease module”.

Thurner describes the underlying methodology behind his, and by implication Loscalzo’s, efforts to Greenwood this way:

Once you draw a network, you are drawing hypotheses on a piece of paper,” Thurner said. “You are saying, ‘Wow, look, I didn’t know these two things were related. Why could they be? Or is it just that our statistical threshold did not kick it out?’” In network analysis, you first validate your analysis by checking that it recreates connections that people have already identified in whatever system you are studying. After that, Thurner said, “the ones that did not exist before, those are new hypotheses. Then the work really starts.

It’s the next steps, the testing of hypotheses, the development of a stable model where the most important work really lies. Like any intellectual fad, Big Data has its element of truth. We can now much more easily distill large and sometimes previously invisible patterns from the deluge of information in which we are now drowning. This has potentially huge benefits for science, medicine, social policy, and law enforcement.

The problem comes from thinking that we are at the point where our data crunching algorithms can do the work for us and are about to replace the human beings and their skills at investigating problems deeply and in the real world. The danger there would be thinking that knowledge could work like self-gratification a mere thing of the mind without all the hard work, compromises, and conflict between expectations and reality that goes into a real relationship. Ironically, this was a truth perhaps discovered first not by scientists or intelligence agencies but by online dating services. To that strange story, next time….

Big Brother, Big Data and The Forked Path Revisited

Feb9

This week witnessed yet another examples of the distortions caused by the 9/11 Wars on the ideals that underlie the American system of government and the ballooning of the powers and reach of the national security state. In a 16 page Justice Department memo obtained by the NBC News reporter, Michael Isikoff, legal justifications were outlined for the extra-judicial killings of American citizens deemed to pose a “significant threat” to the United States. The problem here is who gets to define what such a threat is. The absence of any independent judicial process outside of the executive branch that can determine whether the rights of an American citizen can be stripped, including the condition of being considered innocent before being proved guilty amounts to an enormous increase in executive power that will likely survive the Obama Administration. This is not really news in that we already knew that extra-judicial killings (in the case of one known suspect, Anwar al-Awlaki, at least) had already taken place. What was not known was just how sweeping, permanent, and without clear legal boundaries these claims of an executive right to kill American citizens absent the proof of guilt actually were. Now we know.

This would be disturbing information if it stood alone by itself, but it does not stand alone. What we have seen since the the attacks on 9/11 is the spread of similar disturbing trends which have only been accelerated by technological changes such as the growth of Big-Data and robotics. Given the news, I thought it might be a good idea to reblog a post I had written back in September where I tried to detail these developments. What follows is a largely unedited version of that original post.

………………………………………………………………………………

The technological ecosystem in which political power operates tends to mark out the possibility space for what kinds of political arrangements, good and bad, exist within that space. Orwell’s Oceania and its sister tyrannies were imagined in what was the age of big, centralized media. Here the Party had under its control not only the older printing press, having the ability to craft and doctor, at will, anything created using print from newspapers, to government documents, to novels. It also controlled the newer mediums of radio and film, and, as Orwell imagined, would twist those technologies around backwards to serve as spying machines aimed at everyone.

The questions, to my knowledge, Orwell never asked was what was the Party to do with all that data? How was it to store, sift through, make sense of, or locate locate actual threats within it the yottabytes of information that would be gathered by recording almost every conversation, filming or viewing almost every movement, of its citizens lives? In other words, the Party would have ran into the problem of Big Data. Many of Orwellian developments since 9/11 have come in the form of the state trying to ride the wave of the Big Data tsunami unleashed with the rise of the internet, an attempt create it’s own form of electronic panopticon.

In their book Top Secret America: The Rise of the New American Security State, Dana Priest, and ,William Arkin, of the Washington Post present a frightening picture of the surveillance and covert state that has mushroomed in the United States since 9/11. A vast network of endeavors which has grown to dwarf, in terms of cummulative numbers of programs and operations, similar efforts, during the unarguably much more dangerous Cold War. (TS 12)

Theirs’ is not so much a vision of an America of dark security services controlled behind the scenes by a sinister figure like J. Edgar Hoover, as it is one of complexity gone wild. Priest and Arkin paint a picture of Top Secret America as a vast data sucking machine, vacuuming up every morsel of information with the intention of correctly “connecting the dots”, (150) in the hopes of preventing another tragedy like 9/11.

So much money was poured into intelligence gathering after 9/11, in so many different organizations, that no one, not the President, nor the Director of the CIA, nor any other official has a full grasp of what is going on. The security state, like the rest of the American government, has become reliant on private contractors who rake in stupendous profits. The same corruption that can be found elsewhere in Washington is found here. Employees of the government and the private sector spin round and round in a revolving door between the Washington connections brought by participation in political establishment followed by big-time money in the ballooning world of private security and intelligence. Priest quotes one American intelligence official who had the balls to describe the insectous relationship between government and private security firms as “a self-licking ice cream cone”. (TS 198)

The flood of money that inundated the intelligence field in after 9/11 has created what Priest and Arkin call an “alternative geography” companies doing covert work for the government that exist in huge complexes, some of which are large enough to contain their very own “cities”- shopping centers, athletic facilities, and the like. To these are added mammoth government run complexes some known and others unknown.

Our modern day Winston Smiths, who work for such public and private intelligence services, are tasked not with the mind numbing work of doctoring history, but with the equally superfluous job of repackaging the very same information that had been produced by another individual in another organization public or private each with little hope that they would know that the other was working on the same damned thing. All of this would be a mere tragic waste of public money that could be better invested in other things, but it goes beyond that by threatening the very freedoms that these efforts are meant to protect.

Perhaps the pinnacle of the government’s Orwellian version of a Google FaceBook mashup is the gargantuan supercomputer data center in Bluffdale Nevada built and run by the premier spy agency in the age of the internet- the National Security Administration or NSA. As described by James Bamford for Wired Magazine:

In the process—and for the first time since Watergate and the other scandals of the Nixon administration—the NSA has turned its surveillance apparatus on the US and its citizens. It has established listening posts throughout the nation to collect and sift through billions of email messages and phone calls, whether they originate within the country or overseas. It has created a supercomputer of almost unimaginable speed to look for patterns and unscramble codes. Finally, the agency has begun building a place to store all the trillions of words and thoughts and whispers captured in its electronic net.

It had been thought that domestic spying by the NSA, under a super-secret program with the Carl Saganesque name, Stellar Wind, had ended during the G.W. Bush administration, but if the whistleblower, William Binney, interviewed in this chilling piece by Laura Poitras of the New York Times, is to be believed, the certainly unconstitutional program remains very much in existence.

The bizarre thing about this program is just how wasteful it is. After all, don’t private companies, such as FaceBook and Google not already possess the very same kinds of data trails that would be provided by such obviously unconstitutional efforts like those at Bluffdale? Why doesn’t the US government just subpoena internet and telecommunications companies who already track almost everything we do for commercial purposes? The US government, of course, has already tried to turn the internet into a tool of intelligence gathering, most notably, with the stalled Cyber Intelligence Sharing and Intelligence Act, or CISPA , and perhaps it is building Bluffdale in anticipation that such legislation will fail, that however it is changed might not be to its liking, or because it doesn’t want to be bothered with the need to obtain warrants or with constitutional niceties such as our protection against unreasonable search and seizure.

If such behemoth surveillance instruments fulfill the role of the telescreens and hidden microphones in Orwell’s 1984, then the role the only group in the novel whose name actually reflects what it is- The Spies – children who watch their parents for unorthodox behavior and turn them in, is taken today by the American public itself. In post 9/11 America it is, local law enforcement, neighbors, and passersby who are asked to “report suspicious activity”. People who actually do report suspicious activity have their observations and photographs recorded in an ominous sounding data base that Orwell himself might have named called The Guardian. (TS 144)

As Priest writes:

Guardian stores the profiles of tens of thousands of Americans and legal residents who are not accused of any crime. Most are not even suspected of one. What they have done is appear, to a town sheriff, a traffic cop, or even a neighbor to be acting suspiciously”. (TS 145)

Such information is reported to, and initially investigated by, the personnel in another sort of data collector- the “fusion centers” which had been created in every state after 9/11.These fusion centers are often located in rural states whose employees have literally nothing to do. They tend to be staffed by persons without intelligence backgrounds, and who instead hailed from law enforcement, because those with even the bare minimum of foreign intelligence experience were sucked up by the behemoth intelligence organizations, both private and public, that have spread like mould around Washington D.C.

Into this vacuum of largely non-existent threats came “consultants” such as Montijo Walid Shoebat, who lectured fusion center staff on the fantastical plot of Muslims to establish Sharia Law in the United States. (TS 271-272). A story as wild as the concocted boogeymen of Goldstein and the Brotherhood in Orwell’s dystopia.

It isn’t only Mosques, or Islamic groups that find themselves spied upon by overeager local law enforcement and sometimes highly unprofessional private intelligence firms. Completely non-violent, political groups, such as ones in my native Pennsylvania, have become the target of “investigations”. In 2009 the private intelligence firm the Institute for Terrorism Research and Response compiled reports for state officials on a wide range of peaceful political groups that included: “The Pennsylvania Tea Party Patriots Coalition, the Libertarian Movement, anti-war protesters, animal-rights groups, and an environmentalist dressed up as Santa Claus and handing out coal-filled stockings” (TS 146). A list that is just about politically broad enough to piss everybody off.

Like the fusion centers, or as part of them, data rich crime centers such as the Memphis Real Time Crime Center are popping up all over the United States. Local police officers now suck up streams of data about the environments in which they operate and are able to pull that data together to identify suspects- now by scanning licence plates, but soon enough, as in Arizona, where the Maricopa County Sheriff’s office was creating up to 9,000 biometric, digital profiles a month (TS 131) by scanning human faces from a distance.

Sometimes crime centers used the information gathered for massive sweeps arresting over a thousand people at a clip. The result was an overloaded justice and prison system that couldn’t handle the caseload (TS 144), and no doubt, as was the case in territories occupied by the US military, an even more alienated and angry local population.

From one perspective Big Data would seem to make torture more not less likely as all information that can be gathered from suspects, whatever their station, becomes important in a way it wasn’t before, a piece in a gigantic, electronic puzzle. Yet, technological developments outside of Big Data, appear to point in the direction away from torture as a way of gathering information.

“Controlled torture”, the phrase burns in my mouth, has always been the consequence of the unbridgeable space between human minds. Torture attempts to break through the wall of privacy we possess as individuals through physical and mental coercion. Big Data, whether of the commercial or security variety, hates privacy because it gums up the capacity to gather more and more information for Big Data to become what so it desires- Even Bigger Data. The dilemma for the state, or in the case of the Inquisition, the organization, is that once the green light has been given to human sadism it is almost impossible to control it. Torture, or the knowledge of torture inflicted on loved ones, breeds more and more enemies.

Torture’s ham fisted and outwardly brutal methods today are going hopelessly out of fashion. They are the equivalent of rifling through someone’s trash or breaking into their house to obtain useful information about them. Much better to have them tell you what you need to know because they “like” you.

In that vein, Priest describes some of the new interrogation technologies being developed by the government and private security technology firms. One such technology is an “interrogation booth” that contain avatars with characteristics (such as an older Hispanic woman) that have been psychologically studied to produce more accurate answers from those questioned. There are ideas to replace the booth with a tiny projector mounted on a soldier’s or policeman’s helmet to produce the needed avatar at a moments notice. There was also a “lie detecting beam” that could tell- from a distance- whether someone was lying by measuring miniscule changes on a person’s skin. (TS 169) But if security services demand transparency from those it seeks to control they offer up no such transparency themselves. This is the case not only in the notoriously secretive nature of the security state, but also in the way the US government itself explains and seeks support for its policies in the outside world.

Orwell, was deeply interested in the abuse of language, and I think here too, the actions of the American government would give him much to chew on. Ever since the disaster of the war in Iraq, American officials have been obsessed with the idea of “soft-power”. The fallacy that resistance to American policy was a matter of “bad messaging” rather than the policy itself. Sadly, this messaging was often something far from truthful and often fell under what the government termed” Influence operations” which, according to Priest:

Influence operations, as the name suggests, are aimed at secretly influencing or manipulating the opinions of foreign audiences, either on an actual battlefield- such as during a feint in a tactical battle- or within civilian populations, such as undermining support for an existing government or terrorist group (TS 59)

Another great technological development over the past decade has been the revolution in robotics, which like Big Data is brought to us by the ever expanding information processing powers of computers, the product of Moore’s Law.

Since 9/11 multiple forms of robots have been perfected, developed, and deployed by the military, intelligence services and private contractors only the most discussed and controversial of which have been flying drones. It is with these and other tools of covert warfare, such as drones, and in his quite sweeping understanding and application of executive power that President Obama has been even more Orwellian than his predecessor.

Obama may have ended the torture of prisoners captured by American soldiers and intelligence officials, and he certainly showed courage and foresight in his assassination of Osama Bin Laden, a fact by which the world can breathe a sigh of relief. The problem is that he has allowed, indeed propelled, the expansion of the instruments of American foreign policy that are largely hidden from the purview and control of the democratic public. In addition to the surveillance issues above, he has put forward a sweeping and quite dangerous interpretation of executive power in the forms of indefinite detention without trial found in the NDAA, engaged in the extrajudicial killings of American citizens, and asserted the prerogative, questionable under both the constitution and international law, to launch attacks, both covert and overt, on countries with which the United States is not officially at war.

In the words of Conor Friedersdorf of the Atlantic writing on the unprecedented expansion of executive power under the Obama administration and comparing these very real and troubling developments to the paranoid delusions of right-wing nuts, who seem more concerned with the fantastical conspiracy theories such as the Social Security Administration buying hollow-point bullets:

… the fact that the executive branch is literally spying on American citizens, putting them on secret kill lists, and invoking the state secrets privilege to hide their actions doesn’t even merit a mention. (by the right-wing).

Perhaps surprisingly, the technologies created in the last generation seem tailor made for the new types of covert war the US is now choosing to fight. This can perhaps best be seen in the ongoing covert war against Iran which has used not only drones but brand new forms of weapons such the Stuxnet Worm.

The questions posed to us by the militarized versions of Big Data, new media, Robotics, and spyware/computer viruses are the same as those these phenomena pose in the civilian world: Big Data; does it actually provide us with a useful map of reality, or instead drown us in mostly useless information? In analog to the question of profitability in the economic sphere: does Big Data actually make us safer? New Media, how is the truth to survive in a world where seemingly any organization or person can create their own version of reality. Doesn’t the lack of transparency by corporations or the government give rise to all sorts of conspiracy theories in such an atmosphere, and isn’t it ultimately futile, and liable to backfire, for corporations and governments to try to shape all these newly enabled voices to its liking through spin and propaganda? Robotics; in analog to the question of what it portends to the world of work, what is it doing to the world of war? Is Robotics making us safer or giving us a false sense of security and control? Is it engendering an over-readiness to take risks because we have abstracted away the very human consequences of our actions- at least in terms of the risks to our own soldiers. In terms of spyware and computer viruses: how open should our systems remain given their vulnerabilities to those who would use this openness for ill ends?

At the very least, in terms of Big.Data, we should have grave doubts. The kind of FaceBook from hell the government has created didn’t seem all that capable of actually pulling information together into a coherent much less accurate picture. Much like their less technologically enabled counterparts who missed the collapse of the Eastern Bloc and fall of the Soviet Union, the new internet enabled security services missed the world shaking event of the Arab Spring.

The problem with all of these technologies, I think, is that they are methods for treating the symptoms of a diseased society, rather than the disease itself. But first let me take a detour through Orwell vision of the future of capitalist, liberal democracy seen from his vantage point in the 1940s.

Orwell, and this is especially clear in his essay The Lion and the Unicorn, believed the world was poised between two stark alternatives: the Socialist one, which he defined in terms of social justice, political liberty, equal rights, and global solidarity, and a Fascist or Bolshevist one, characterized by the increasingly brutal actions of the state in the name of caste, both domestically and internationally.

He wrote:

Because the time has come when one can predict the future in terms of an “either–or”. Either we turn this war into a revolutionary war (I do not say that our policy will be EXACTLY what I have indicated above–merely that it will be along those general lines) or we lose it, and much more besides. Quite soon it will be possible to say definitely that our feet are set upon one path or the other. But at any rate it is certain that with our present social structure we cannot win. Our real forces, physical, moral or intellectual, cannot be mobilised.

It is almost impossible for those of us in the West who have been raised to believe that capitalist liberal democracy is the end of the line in terms of political evolution to remember that within the lifetimes of people still with us (such as my grandmother who tends her garden now in the same way she did in the 1940’s) this whole system seemed to have been swept up into the dustbin of history and that the future lie elsewhere.

What the brilliance of Orwell missed, the penetrating insight of Aldous Huxley in his Brave New World caught: that a sufficiently prosperous society would lull it’s citizens to sleep, and in doing so rob them both of the desire for revolutionary change and their very freedom.

As I have argued elsewhere, Huxley’s prescience may depend on the kind of economic growth and general prosperity that was the norm after the Second World War. What worries me is that if the pessimists are proven correct, if we are in for an era of resource scarcity, and population pressures, stagnant economies, and chronic unemployment that Huxley’s dystopia will give way to a more brutal Orwellian one.

This is why we need to push back against the Orwellian features that have crept upon us since 9/11. The fact is we are almost unaware that we building the architecture for something truly dystopian and should pause to think before it is too late.

To return to the question of whether the new technologies help or hurt here: It is almost undeniable that all of the technological wonders that have emerged since 9/11 are good at treating the symptoms of social breakdown, both abroad and at home. They allow us to kill or capture persons who would harm largely innocent Americans, or catch violent or predatory criminals in our own country, state, and neighborhood. Where they fail is in getting to the actual root of the disease itself.

American would much better serve its foreign policy interest were it to better align itself with the public opinion of the outside world insofar as we were able to maintain our long term interests and continue to guarantee the safety of our allies. Much better than the kind of “information operation” supported by the US government to portray a corrupt, and now deposed, autocrat like Yemen’s Abdullah Saleh as “an anti-corruption activist”, would be actual assistance by the US and other advanced countries in…. I duknow… fighting corruption. Much better Western support for education and health in the Islamic world that the kinds of interference in the internal political development of post-revolutionary Islamic societies driven by geopolitical interest and practiced by the likes of Iran and Saudi Arabia.

This same logic applies inside the United States as well. It is time to radically roll back the Orwellian advances that have occurred since 9/11. The dangers of the war on terrorism were always that they would become like Orwell’s “continuous warfare”, and would perpetually exist in spite, rather than because of the level of threat. We are in danger of investing so much in our security architecture, bloated to a scale that dwarfs enemies, which we have blown up in our own imaginations into monstrous shadows, that we are failing to invest in the parts of our society that will actually keep us safe and prosperous over the long-term.

In Orwell’s Oceania, the poor, the “proles” were largely ignored by the surveillance state. There is a danger here that with the movement of what were once advanced technologies into the hands of local law enforcement: drones, robots, biometric scanners, super-fast data crunching computers, geo-location technologies- that domestically we will move even further in the direction of treating the symptoms of social decay, rather than dealing with the underlying conditions that propel it.

The fact of the matter is that the very equality, “the early paradise”, a product of democratic socialism and technology, Orwell thought was at our fingertips has retreated farther and farther from us. The reasons for this are multiple; To name just a few: financial concentration, automation, the end of “low hanging fruit” and their consequent high growth rates brought by industrialization,the crisis of complexity and the problem of ever more marginal returns. This retreat, if it lasts, would likely tip the balance from Huxley’s stupification by consumption to Orwell’s more brutal dystopia initiated by terrified elites attempting to keep a lid on things.

In a state of fear and panic we have blanketed the world with a sphere of surveillance, propaganda and covert violence at which Big Brother himself would be proud. This is shameful, and threatens not only to undermine our very real freedom, but to usher in a horribly dystopian world with some resemblance to the one outlined in Orwell’s dark imaginings. We must return to the other path.

The Shirky- Morozov Debate or how FaceBook beat Linux

Nov9

One thing that struck me throughout the 2012 presidential contest was the Obama campaign’s novel use of Big-Data and targeted communication to mobilize voters. Many of these trends I found somewhat disturbing, namely, the practice of micro-mobilization through fear, the application of manipulative techniques created in commercial advertising and behavioral economics to spur voter mobilization, and the invasion of privacy opened up by the transparency culture and technology of social media.

These doubts and criticisms were made despite the fact that I am generally an Obama supporter, would ultimately cast my vote for the man, and was overall delighted by the progressive victories in the election, not least the push back against voter suppression which had been attempted, and only at the last minute thwarted, in my home state of Pennsylvania.

The sheer clarity of the success of the Obama campaign’s strategy makes me think that these techniques are largely a fait accompli, and will be rapidly picked up by Republicans to the extent they can. Political commentators have already turned their eyes to the strategy’s success, completely ignoring the kinds of critical questions brought to our attention, for instance, by,Charles Duhigg, in The New York Times only a few weeks ago.

Given their effectiveness, there might be very little push-back from liberal voters regarding the way the 2012 campaign was waged, and such push-back might be seen as demands for unilateral disarmament on the part of Democrats should they come from Republicans- in which case the demand might quite rightly be seen as just another example of the GOP’s attempts at voter suppression. Or, should such push back against these techniques come from a minority of progressives in, or allied with, the Democratic party who are troubled by their implications, such complaints might be written off as geriatric whining by out of touch idealists who have no clue on how the new era of networked politics works. And this would largely be right, the campaigns of 2012, and the Obama campaign most especially, have likely brought us into a brand new political era.

A recent article in Time Magazine gives a good idea of how the new science of campaigning works: it is data driven, and builds upon techniques honed in the world’s of advertising and psychology to target both individuals and groups strategically.
Like the world’s of finance and government surveillance it is a new ecology where past, and bogus, claims by individuals to be able to “forecast the future” by ” gut-instinct” has fallen before Big Data and the cold brilliance of the quants.

That data-driven decision making played a huge role in creating a second term for the 44th President and will be one of the more closely studied elements of the 2012 cycle. It’s another sign that the role of the campaign pros in Washington who make decisions on hunches and experience is rapidly dwindling, being replaced by the work of quants and computer coders who can crack massive data sets for insight. As one official put it, the time of “guys sitting in a back room smoking cigars, saying ‘We always buy 60 Minutes’” is over. In politics, the era of big data has arrived.

One can feel for a political pundit such as Michael Gerson who attacked the political predictions of the data savvy Nate Silver in the same way one can feel sympathy for the thick-necked, testosterone heavy, Wall Street traders who were replaced by thinner-necked quants who had gotten their chops not on raucous trading floors but in courses on advanced physics. And, at the end of the day, Silver was right. Gerson’s “observation” about the nature of American politics in his ridiculous critique of Silver- given the actual reality of the 2012 campaign- is better understood as a lament than an observation:

An election is not a mathematical equation; it is a nation making a decision. People are weighing the priorities of their society and the quality of their leaders. Those views, at any given moment, can be roughly measured. But spreadsheets don’t add up to a political community. In a democracy, the convictions of the public ultimately depend on persuasion, which resists quantification.

Put another way: The most interesting and important thing about politics is not the measurement of opinion but the formation of opinion. Public opinion is the product — the outcome — of politics; it is not the substance of politics. If political punditry has any value in a democracy, it is in clarifying large policy issues and ethical debates, not in “scientific” assessments of public views.

My main objections here are that this is an aspirational statement- not one of fact, and that the role Gerson gives to pundits, to himself, is absolutely contrary to reality- unless one believes the kind of “clarity” found by paying attention to the talking heads on Fox News is actually an exercise in democratic deliberation.

Yet, there are other ways in which the type of political campaign seen in 2012 offer up interesting food for thought in that they seem to point towards an unlikely outcome in current debates over the role and effect of the new communications technology on politics.

In some sense Obama’s 2012 campaign seems to answer what I’ll call the “Clay Shirky- Evgeny Morozov Debate”. I could also call it the Shirky-Gladwell debate, but I find Morozov to be a more articulate spokesman of techo-pessimism (or techno-realism, depending upon one’s preference) than the omnipresent Malcolm Gladwell.

Clay Shirky is a well known spokesperson for the idea that the technological revolution centered around the Internet and other communications networks is politically transformative and offers up the possibility of a new form of horizontal politics.

Shirky sees the potential of governance to follow the open source model of software development found in collectively developed software such as Linux and Github that allow users to collaborate without being coordinated by anyone from above- as opposed to the top-down model followed by traditional software companies i.e. MicroSoft. Although Shirky does not discuss them in his talk- the hacktivists group of Anonymous and Wikileaks follow this same decentralized, and horizontal model. As of yet, no government has adopted anything but token elements of the open source model of governance though they have, in Shirky’s view embraced more openness- transparency.

In an article for the journal Foreign Affairs in 2011 entitled The Political Power of Social Media, an article written before either the Arab Spring or the Occupy Wall Street movements had exploded on the scene, Shirky made a reasoned case for the potential of social media to serve as a prime vector for political change. Social media, while in everyday life certainly dominated by nonsense such as “singing cats”, also brought the potential to mobilize the public- overnight- based on some grievance or concern.

Here, Shirky responded to criticisms of both Malcolm Gladwell and Evgeny Morozov that his techno-optimism downplayed both the opiate like characteristics of social media, with its tendencies to distract people from political activity, along with the tendency of social media to create a shallow form of political commitment as people confuse signing an online petition or “liking” some person or group with actually doing something.

I do not agree with all of what Morozov has to say in his side of this debate, but, that said, he is always like a bracing glass of cold water to the face- a defense against getting lost in daydreams. If you’ve never seen the man in action here is a great short documentary that has the pugnacious Belarusian surrounded by a sort of panopticon of video screens where he pokes holes in almost every techo-utopia shibboleth out there.

In his The Net Delusion Morozov had made the case that the new social media didn’t lend themselves to lasting political movements because all such movements are guided strategically and ideologically by a core group of people with real rather than superficial commitment who had sacrificed, sometimes literally everything, in the name of the movement. Social media’s very decentralization and the shallow sorts of political activities it most often engenders are inimical to a truly effective political movement, and, at the same time, the very technologies that had given rise to social media have increased exponentially the state’s capacity for surveillance and the sphere of a-political distractions surrounding the individual.

And in early 2011 much of what Morozov said seemed right, but then came the Arab Spring, and then the Occupy Wall Street Movement, the former at the very least facilitated by social media, and the latter only made possible by it. If it was a prize fight, Morozov would have been on the mat, and Shirky shaking his fist with glee. And then…

It was the old-school Muslim Brotherhood not the tech-savvy tweeters who rose to prominence in post-Mubarak Egypt, and the Occupy Wall Street Movement faded almost as fast as it had appeared. Morozov was up off the mat.

And now we have had the 2012 presidential campaign, a contest fought and won using the tools of social media and Big Data. This suggests to me an outcome of the telecommunications revolution neither Shirky nor Morozov fully anticipated.

Shirky always sides with the tendency of the new media landscape to empower the individual and flatten hierarchies. This is not what was seen in the presidential race. Voters were instead “guided” by experts who were the only ones to grasp the strategic rationale of goading this individual rather than that and “nudging” them to act in some specific way.

Morozov, by contrasts, focuses his attention on the capacity of social media to pacify and distract the public in authoritarian states, and to ultimately hold the reins on the exchange of information.

What the Obama campaign suggests is that authoritarian countries might be able to use social media to foster a regime friendly political activity- that is to sponsor and facilitate the actions of large groups in its own interests, while short circuiting similar actions growing out of civil society which authoritarians find threatening. Though, regime friendly political activity in this case is likely to be much more targeted and voluntary than the absurdities of 20th century totalitarianism that mobilized people for every reason under the sun.

The difference between authoritarian countries and democratic ones in respect to these technologies, at least so far, is this: that authoritarian countries will likely use them to exercise power whereas in democracies they are only used to win it.

If 2012 was a portent of the future, what Web 2.0 has brought us is not Shirky’s dream of “open-sourced government” which uses technology to actively engage citizens in not merely the debate over, but the crafting of policies and laws, an outcome which would have spelled the decline of the influence of political parties. Instead, what we have is carefully targeted political mobilization based on the intimate knowledge of individual political preferences and psychological touch- points centrally directed by data-rich entities with a clear set of already decided upon political goals. Its continuation would constitute the defeat of the political model based on Linux and the victory of one based on FaceBook.

Utopia or Dystopia

where past meets future

Tag Archives: Big Data

Truth and Prediction in the Dataclysm

Big Data as statistical masturbation

Big Brother, Big Data and The Forked Path Revisited