All science is becoming data science. Therefore data scientists have a lot of power in this regime [stifles a laugh] It’s a great time to be a data geek.
This is an interesting aside made by Bill Howe of the University of Washington in an early lecture on Coursera’s Introduction to Data Science MOOC. I take this to be the point that Emma Uprichard was making when she wrote about ‘methodological genocide’ last year in Discover Society:
At the risk of sounding a bit melodramatic, the big data hype is generating, for want of a better term, a methodological genocide. To my mind, it even has a flavour of being a disciplinary genocide. It is fierce and it is violent, and social scientists – and especially sociologists – need to fight back. Certainly, if we are going to meaningfully interrogate the social systems and structures that make up the social world, we will need to improve our quantitative skills. I know, I’m sorry to say it, I know this doesn’t always go down well among many social scientists, especially among those in the UK. But whilst I do think that one of the ways we will need to fight back is to increase our quantitative skills – we need to be clear about the kind of social science we move forward to [….]
Many new statistical techniques used to crunch through big data involve ‘shrinking’ the data. This not only ‘dilutes’ the importance of extreme cases – the outliers – within large datasets, but also focuses the analysis on the masses in the middle. One of the key strengths of social research and sociological research in particular is a sensibility to social divisions, minority groups, oppressed and silenced voices. In order to remain strong in these areas, we must absolutely remain attentive to the methodological techniques that go some way to erase extreme cases, pockets of extreme difference. Another big way of organising data is through data mining, machine learning and pattern recognition. At the core of those approaches, there are issues such as classification – who or what goes into which group and how are units of analysis measured as ‘similar’ or ‘different’? How should we count in a way that allows for meaningful counts over time? How we shape the social through our counting and classifying are highly political and ethical issues.
The only difference is that Howe thinks this is great. ‘Soft sciences’ are becoming ‘hard sciences’. Intellectual life is transforming and he finds himself at the centre of it. There are different ways in which the emerging field can be understood in intellectual terms. I find the emergence of data science in this context, within the university and outside of it, extremely interesting. It partly represents an interdisciplinary point of convergence driven by socio-technical innovation, the opportunities for inquiry that innovation facilitates and the technical challenges those opportunities pose:
I honestly think it’s hard to overstate the significance of this for the social sciences as a whole. Much of their future development will hinge on the dynamics underlying the second Venn diagram. I find it easy to imagine a future where computational social science becomes established as the vanguard of the social sciences, with disciplinary boundaries between them a thing of the past, buttressed by entrenched pockets of more discipline-bound enquiry which are nonetheless implicitly and explicitly supportive of the computational social science project on an epistemic and methodological level. Meanwhile qualitative sociologists, anthropologists and other holdouts become less part of the diagram and more a circle unto themselves, hopefully doing sustained research in an interesting way but with the risk that they become preoccupied by hurling critique at the shiny and well-funded convergent project over the road from them. Hopefully I’m wrong, because this seems like it would be a suboptimal state of affairs on many levels.
Another interesting thing about data science is the emergence of the ‘data scientist’ as an aspirational category. Not to worry, though, because You can be a Data Scientist too!
Again there is socio-technical innovation, the opportunities it provides (for business) and the technical challenges posed by exploiting them. The role builds upon the existing occupational role of the business analyst, adds additional skills and valorises ‘curiosity’. It is the sexiest job of the 21st century:
Goldman is a good example of a new key player in organizations: the “data scientist.” It’s a high-ranking professional with the training and curiosity to make discoveries in the world of big data. The title has been around for only a few years. (It was coined in 2008 by one of us, D.J. Patil, and Jeff Hammerbacher, then the respective leads of data and analytics efforts at LinkedIn and Facebook.) But thousands of data scientists are already working at both start-ups and well-established companies. Their sudden appearance on the business scene reflects the fact that companies are now wrestling with information that comes in varieties and volumes never encountered before. If your organization stores multiple petabytes of data, if the information most critical to your business resides in forms other than rows and columns of numbers, or if answering your biggest question would involve a “mashup” of several analytical efforts, you’ve got a big data opportunity.
The corporate demand for data science has led some to bemoan a ‘big data brain drain’ in higher education. Insofar as there’s lots of money to be made in corporate data science and relatively little within the universities, with even the massive funding provision for data science research encouraging academic entrepreneurism but having little impact upon the academic career structure, this brain drain can’t be separated from the broader trajectory of the labour market in an age of austerity. Nor can we adequately understand the emergence of data science without considering it in light of the cultural ascendancy of quants and their entrenchment within the finance industry and beyond.
I’m also intrigued by what drives these interdisciplinary trends at the level of intellectual biography. For instance, physics is well represented within data science, as well as often being invoked in the discourse surrounding it as an illustration of the scientific rigour characterising data science. There has been a sharp increase in physics PhDs in the US since 2000 (source) – to what extent are these trends being driven by newly graduated post-doctoral physicists who, either out of curiosity or necessity, go marauding into other areas of inquiry because advancement in their own field is either unlikely or undesirable?