Big data, new skills: how the accelerated academy hinders the interdisciplinary collaboration we need

My notes for this panel discussion on Tuesday.

Underlying many of the issues we’re discussing today is the fundamental problem of speed. We’ve seen rapid developments at the level of platforms, devices, practices and methods, but this rapidity has made it difficult for methodological and theoretical deliberation to keep pace. As the geographer Rob Kitchin puts it in his book The Data Revolution, such deliberation “is desperately needed in order to catch up with the pace of technical change and the roll-out of ad hoc and pragmatic approaches, and to replace proliferating forms of weak empiricism” (loc 3800-3817).

It’s this empiricism which I think is the long-term worry for many, as the empirical in digital social research of this sort is too often exhausted by whatever registers as digital data on a given platform. If this goes unchallenged, we risk a precipitous collapse in the descriptive and explanatory horizons of the social sciences. The risk is all the more acute if it’s coupled with the untenable, yet oddly persistent, idea that big data ‘speaks for itself’ and allows us to leave theory behind.

Innovations in research methods and the new data environment will undoubtedly impact teaching and research in the social sciences. But unless we can make space for methodological and theoretical deliberation to catch up to technical developments, it’s not obvious to me that this impact will be a positive one. This catching up necessitates critical and knowledgeable engagement with technical advances, as well as forms of collaborative and exploratory interdisciplinarity which are inherently challenging.

Unfortunately, the infrastructure of scholarly communication as it presently stands works to hinder this ‘catching up’. The journal system poses a number of problems by discouraging the exploration and experimentation which we need:

The quantitative increase in publication encourages ever-narrower specialisation, as new journals define themselves in relation to an already crowded intellectual marketplace.
With an estimated 28,100 journals publishing 2.5 million articles a year, ‘keeping up with the literature’ within one’s own field becomes an ever more overwhelming exercise.
This also intensifies existing access problems, particularly as newer and more specialised journals might not be widely accessible, and the political economy of journal subscriptions is currently in a state of flux.
The slow speed of publication hinders discussion and debate about new developments, albeit to a degree that varies between fields.
The increasing importance of the established, prestigious journals leading a field or discipline, something which manifests both quantitatively and qualitatively, enacts epistemic discipline: ambitious researchers compete for the limited number of slots available within these journals by adapting themselves to what they perceive as their intellectual standards.
It also creates competition between journals, as lower-ranked journals take instrumental action to improve their impact factors (and, increasingly, their altmetric scores as well).
Existing norms of scholarship also curtail some of the discussions which the ‘catching up’ I’m talking about would necessitate. Is there space and freedom to reflect on the complex considerations involved in working with digital data? Do technical details get excluded from papers about substantive topics submitted to mainstream journals? Does the journal system impose a division of theory, method and data which obscures precisely the complexity of their entanglement in digital social research that we need to investigate?

There’s a risk of overstating the case here: there are existing innovations in journals which could mitigate these problems, such as Big Data & Society’s request that authors publish primary data (or provide detailed instructions for accessing it) or Sociological Science’s streamlining of the peer review process and commitment to editorial decisions within 30 days of submission. The dynamic of specialisation I’m criticising could also encourage the establishment of new journals which meet precisely the needs I’m suggesting are currently unmet. There’s also room for hope given that some open science practices, such as the publication of pre-prints, are becoming institutionalised within the social sciences. Nonetheless, there are still commercial threats to open science, as well as a degree of conservatism built into structures of career advancement and research assessment.

These problems with the journal system are compounded by disciplinary silos. The journal system too often shapes reading in a way which reinforces a disciplinary focus, as ‘keeping up with the literature’ necessitates triaging based on past experience. There’s little external incentive to read widely outside one’s own area, let alone the time or energy to do so when existing workloads are unmanageable or unsustainable for many. The same factors militate against attending events outside one’s own area. Even when we do attend, the technical languages spoken and the skills presupposed by speakers make meaningful engagement inherently difficult, a matter of ‘keeping up’ rather than contributing.

For digital methodology and digital theory to catch up with digital methods, we need collaboration between “computationally literate social scientists and socially literate computer scientists”, but it’s hard to see at present how such relations can reliably take a mutually beneficial shape, though of course this does happen in places. We don’t need to be speaking the same language but we do need to be able to understand each other. Even socially literate computer scientists tend only to be socially literate in a narrow sense. More importantly from my point of view, there are still relatively few computationally literate social scientists (and I’m certainly not one of them) and it seems urgent to me that we understand why. In the absence of this reciprocal literacy, collaboration will likely be narrowly instrumental, bringing in a social scientist for their domain expertise or bringing in a data scientist for their computational skills.

However, I do think there are practical steps we can take to mitigate these problems. Social media is a powerful tool for crossing disciplinary boundaries, connecting with people in other fields and keeping up to date with developments through things like blogs and Twitter feeds. To give an example of how virtual spaces can be used to facilitate the methodological and theoretical ‘catching up’ I’m advocating: as a joint project between the Independent Social Research Foundation’s Digital Social Science Forum and the International Journal of Social Research Methodology, we organised a stream at this conference on ‘beyond big and small data’. The intention was to offer a conceptual and methodological framing within which all manner of related issues could be explored. What is ‘big data’, what is ‘small data’, and is this opposition a useful one?

As well as running the conference stream itself, we live-tweeted the talks and recorded them as podcasts to be released on social media. The participants have been invited to write up their papers as short articles for the LSE Impact Blog, a popular website with an extremely large and diverse international audience, for a special section on Digital Methodologies. These will then be circulated, along with the podcasts from the event, as a call for further contributions to the discussion from those who weren’t there. In this way we intend to facilitate an ongoing conversation, building from a face-to-face meeting but using social media to extend far beyond it, and hopefully leading to more face-to-face meetings at future events if the discussion builds up sufficient momentum.

But even if we can explore new ways of facilitating collaboration outside the limits I’ve described, there’s still the challenge of ‘upskilling’ the social sciences to expand computational literacy, while resisting the urge to simply reproduce the norms of computer science. There are free digital training tools which can be used to this end, such as Codecademy and some of the more technically orientated MOOC platforms. But the pressures of career advancement to which individuals are subject make it difficult to pursue this upskilling as a purely individual matter: it’s extremely time-consuming and the institutional incentives aren’t there. These digital tools could nonetheless be a starting point, supplemented by peer support networks meeting both face-to-face and digitally.

Ultimately, though, we need well-funded networks and institutional recognition of the importance of this endeavour, as well as long-term changes to the structure of graduate education. Only in this way do I think it will be possible to facilitate the meaningful participation of social scientists in shaping how the new data environment impacts research and teaching, as well as the likely transformation of the social sciences themselves.

Mark Carrigan