I really wish I could go to this:

CALL FOR PAPERS/PARTICIPATION: One-day Workshop on
*Developing a Research Agenda for Human-Centered Data Science*

in conjunction with CSCW 2016
http://cscw.acm.org/2016/

Sunday, February 28th, 2016
San Francisco, CA, USA

Workshop Website:
https://cscw2016hcds.wordpress.com/

———————————-
Important dates:
– 11th December 2015: Submission of Position Papers
– 18th January 2016: Notification of acceptance
– 25th January 2016: Camera-ready version
– 28th February 2016: Workshop at CSCW 2016
———————————-

TOPIC:
The study and analysis of large and complex data sets offer a wealth of
insights in a variety of applications. Computational approaches provide
researchers access to broad assemblages of data, but the insights extracted
may lack the rich detail that qualitative approaches have brought to the
understanding of sociotechnical phenomena. How do we preserve the richness
associated with traditional qualitative methods while utilizing the power
of large data sets? How do we uncover social nuances or consider ethics and
values in data use?

These and other questions are explored by human-centered data science, an
emerging field at the intersection of human-computer interaction (HCI),
computer-supported cooperative work (CSCW), human computation, and the
statistical and computational techniques of data science. This workshop,
the first of its kind at CSCW, seeks to bring together researchers
interested in human-centered approaches to data science to collaborate,
define a research agenda, and form a community.

PARTICIPATION:
This workshop provides a venue for attendees to discuss a variety of topics
in human-centered data science. We welcome researchers interested in
exploring how data-driven and qualitative research can be integrated to
address complex questions in a diverse range of areas, including but not
limited to social computing, urban, health, or crisis informatics,
scientific, business, policy, technical, and other fields. Researchers and
practitioners working with large data sets (“big data”) and/or qualitative
data sets looking to expand their methodological toolbox are invited to
participate and share their experiences while learning from the broader
community.

Topics and themes of interest include, but are not limited to:

– Deep ethnographic methods: How do we preserve the richness of traditional
qualitative techniques in data science?
– Scaling up qualitative data analysis: How do we deal with ever growing
qualitative datasets?
– Quantitative and behavioral methods: How are quantitative and behavioral
methods related to data mining, machine learning, and qualitative methods?
– Connecting across levels of analysis: How can we integrate the analysis
of personal data with large-scale data?
– Ethics and values of data use: What ethical questions should we raise in
using large-scale online data?
– Privacy of data use: How can we preserve anonymity and privacy within data
ecosystems that can easily expose users?
– Human-centered algorithm design: How do we design machine learning
algorithms tailored for human use and understanding?
– Understanding community data: How can we integrate knowledge gained about
communities from their aggregate social data as well as their personal
experiences?
– Health and well-being at micro and macro scales: What understandings can
be exposed or occluded by aggregate or granular perspectives on health and
well-being?

SUBMISSION:
Please submit a position paper (from 2 to 4 pages in the CSCW ACM sigCHI EA
format – see
http://www.sigchi.org/publications/chipubform/sigchi-extended-abstracts-format-2016/view)
by 11th December 2015 to hcds.cscw2016@gmail.com

The submissions will be reviewed by the organizers with support of other
researchers in a dedicated program committee and selected according to
their potential to contribute to the workshop topic and to foster
discussion.

All accepted contributions (notifications will be sent out by 18 January
2016) will be made available on the website to allow participants to
prepare for the workshop. The organizers may consider the publication of
revised versions of accepted papers as part of a special issue in a CSCW
related journal.

ORGANIZERS:
Cecilia Aragon, University of Washington.
CJ Hutto, Georgia Institute of Technology.
Yun Huang, Syracuse University.
Wanli Xing, University of Missouri.
Gina Neff, University of Washington.
Jinyoung Kim, University of Maryland, College Park.
Andy Echenique, University of California, San Diego and San Diego
Supercomputer Center.
Joseph Bayer, University of Michigan.
Brittany Fiore-Gartland, University of Washington.

CONTACT:
For more details, check the workshop website:
https://cscw2016hcds.wordpress.com/

For any further information on the workshop please contact
hcds.cscw2016@gmail.com

From Plutocrats: The Rise of the New Global Super-Rich pg 46:

Carlos Slim, who studied engineering in college and taught algebra and linear programming as an undergraduate, attributes his fortune to his facility with numbers. So does Steve Schwarzman, who told me he owed his success to his “ability to see patterns that other people don’t see” in large collections of numbers. People inside the super-elite think the rise of the data geeks is just beginning. Elliot Schrage is a member of the tech aristocracy – he was the communications director for Google when it was the hottest company in the Valley and jumped to the same role at Facebook just as it was becoming a behemoth. At a 2009 talk he gave to an internal company meeting of education and publishing executives, Schrage was asked what field we should encourage our children to study. His instant answer was statistics, because the ability to understand data would be the most powerful skill in the twenty-first century.

How does this intersect with the (purported) rise of the data scientist as the ‘sexiest job of the 21st century’?

The Politics of Data (Science)

This special issue of Discover Society will explore the political implications of ‘big data’ and the systems of expertise emerging around it, including though not limited to Data Science. In doing so it will aim to bridge the gap between the methodological discourse surrounding data science and the political discourse beginning to emerge around ‘big data’. Here are some of the questions the issue will address:

– How is ‘big data’ understood and acted upon? How should we understand its cultural power?
– How is ‘big data’ reconfiguring the social sciences? Do we risk all science becoming data science?
– How and why has the ‘data scientist’ come to be seen as the ‘sexiest job of the 21st century’?
– Is the ‘data scientist’ just a ‘Statistician who lives in Shoreditch?’ Or is this a genuinely new intellectual role?
– Can ‘big data’ address ‘big questions’? If not, is this a problem?
– What are the precursors of ‘data science’ within the academy and/or within corporations?
– What implications does corporate data science have for the relationship between corporations & consumers?
– What implications does national security data science have for the relationship between the state & citizens?
– Can the use of digital data lead to efficiency savings in public services? How does this relate to the politics of austerity?
– How could predictive privacy harms emerging from data analytics be addressed politically?
– Can the opacity of algorithmic processes be challenged? Or are we heading inexorably for a ‘black-box society’?
– How are new forms of digital data reconfiguring activity in particular social environments?

However, these are just suggestions, and ideas beyond the scope of this list are very welcome.

The deadline for contributions is June 15th. Contact mark@markcarrigan.net to discuss a potential contribution.

The articles will constitute the July issue of Discover Society. Most articles will be 1500 words; however, there are a number of special sections in the online magazine.

Front line – 1500 words
View point – 1500 words
Policy briefing – 1500-2000 words

If you would be interested in writing one of these thematic sections, please get in touch asap.

The issue will follow the usual formatting guidelines of Discover Society. Please consult the notes for contributors.

In his A necessary disenchantment: myth, agency and injustice in a digital world, Nick Couldry argues that transitions in media infrastructure are facilitating the emergence of a new myth of collectivity:

A new myth about the collectivities we form when we use platforms such as Facebook. An emerging myth of natural collectivity that is particularly seductive, because here traditional media institutions seem to drop out altogether from the picture: the story is focused entirely on what ‘we’ do naturally, when we have the chance to keep in touch with each other, as of course we want to do.

http://onlinelibrary.wiley.com/doi/10.1111/1467-954X.12158/abstract

This is coming to replace an older sense of media as the point of access to the centre of society. The reliance on media organisations to access flows of content helped constitute an understanding of centre and periphery, with the media facilitating access to the (mythical) centre of value, knowledge and meaning for the majority who experienced themselves as peripheral to it. The rapid diffusion of the internet, mobile computing and social networking engenders a new form of mediation, by ‘us’ rather than content producing media organisations, which helps shatter this previous myth of the ‘mediated centre’ and substitute it with a vision of human networks, animated by natural sociability, dispersed across national boundaries. As I understand Couldry’s argument, the power of this new myth derives in part from its displacement of the old: once our reliance on the old media organisations is seen to be shattered, our sociality is unbound, revealing a naturally co-operative inclination towards discussion, creation and sharing (see for example Clay Shirky’s theory of ‘cognitive surplus’). Obviously, the perception is erroneous and it serves vested interests: media organisations haven’t ceased to be party to communication, either in the sphere of content-production or facilitating communication, it’s only that their role has shifted with a change in the logic of their competition. This obfuscation serves the interests of platform providers in particular, as they drift towards being seen solely in terms of the provision of infrastructure rather than as corporate actors with increasingly vast lobbying operations.

Couldry’s concern is that “we must be wary when our most important moments of ‘coming together’ seem to be captured in what people happen to do on platforms whose economic value is based on generating just such an idea of natural collectivity”. Social media platforms present themselves as providing new enablements for and eliminating old constraints upon ‘natural collectivity’: their business model simultaneously relies upon monetizing the crowd which they have encouraged to gather, profiling behaviour in a manner susceptible to inference and allowing the growing data mining industry to do further work to this end. Their concern becomes less a matter of reaching as many people with adverts as possible (on occasions of mass attention driven by shared spectacle) but reaching the right people all the time. This is why ‘big’ data analytics are so tied up in the broader transformation of the media: the process itself demands innovation in order to extract the value it promises to generate. However this genuine computational challenge, as well as the economic interests which partly drive it, stand obscured behind the ‘myth of big data’ which Couldry takes aim at:

Myth works, as I’ve often argued following Maurice Bloch (1989) and Roland Barthes (1972), through ambiguity: through sometimes claiming to offer ‘truth’ and at other times to be merely playful, providing what, in the George W. Bush era, was called ‘plausible deniability’, but here at the level of claims about knowledge claims! So Mayer-Schonberger and Cukier, on the one hand, say big data bring ‘an essential enrichment in human comprehension’ (2013: 96). They go further, proposing a large project of ‘datafication’ that involves quantifying every aspect of everyday phenomena to enable big data analysts to find its hidden order: the result will be ‘a great infrastructure project’ like Diderot’s 18th-century encyclopaedia: ‘this enormous treasure chest of datafied information . . . once analysed, will shed light on social dynamics at all levels, from the individual to society at large’ (2013: 93–94, emphasis added). The world too will look different: ‘we will no longer regard our world as a string of happenings that we explain as a natural or social phenomenon, but as a universe comprised essentially of information’ (2013: 96, emphasis added). On the other hand, when the moral consequences of acting on the basis of ‘big data’ arises – for example, arresting people for offences they are predicted to commit but haven’t yet – they back off and say that big data only provide probabilities, not actualities, and worry about ‘fetishizing the output of our [data] analysis’ (2013: 151).

http://onlinelibrary.wiley.com/doi/10.1111/1467-954X.12158/abstract

It’s the final points which will be so crucial to understanding the trajectory of ‘big data’ in a social world rapidly acclimatising itself to these forms of intervention. The mythical sociability of ‘us’ stands in sharp contrast to the quantity and quality of the interventions we are potentially susceptible to in virtue of our participation in (digitised) social life: we stand exposed, fragmented and scrutinised before a diffuse and inscrutable power. Under these circumstances might we come to cling to the myth more tightly than ever for the security it provides? As Couldry points out in relation to big data, “we too are involved in its reproduction, supplying information (to government and countless other collectors, including social media platforms) about what we do, as we do it, allowing that information to supplant other possible types of information about ourselves, what we say, and how we reflect”. He goes on to call for an ethical engagement with these questions and the implications that they have for the social order:

The CEO of a big-data-based sentiment analysis company, sounds reasonable when he says that ‘if we’re right 75% to 80% of the time, we don’t care about any single story’ (quoted Andrejevic, 2013: 56). But if the big data model works by equating our only forms of social knowledge with such probabilities, then we have already started organizing things so that the single story – your story, my story – really doesn’t matter. That raises fundamental questions about individual voice, and the way voice is valued in our societies.

http://onlinelibrary.wiley.com/doi/10.1111/1467-954X.12158/abstract

He doesn’t develop the point, but it strikes me there’s a contradiction between the myth of ‘us’ and the myth of big data which could provide a focal point for resistance. In reality, the networked ‘us’ makes ‘big data’ possible. Symbolically, however, the reality of big data serves to negate the imagined promise of the ‘us’: can we reclaim an impulse towards networked sociality and co-operation in a way that resists corporate capture? Could the very force of the myth of ‘us’ be something that can be drawn upon to mobilise resistance to a world in which, as Couldry puts it, “corporate interests and the state seek to know us through big data”?

I noticed this in the foyer of Warwick’s sociology department this morning. I’d read about military recruitment of hackers in the US but hadn’t realized how widespread this co-option of hacking had become. I think it’s interesting to see the invocation of ‘hacking’ as part of the institutionalisation of data science in light of this (i.e. the view that data scientists are programmers and statisticians who can hack) – the meaning of ‘hacking’ is tied up in a broader set of issues which I don’t understand as well as I would like to.

photo

Program: The Data Incubator is an intensive six-week fellowship that prepares postdocs and PhDs in STEM + social science fields seeking industry careers as data scientists. The program is free for fellows and supported by sponsorships from dozens of employers across multiple industries. In response to the overwhelming interest in our earlier summer and fall sessions, we will be holding a winter postdoc fellowship.

Locations: There will be both an in-person (in NYC) and online section of the fellowship.

Dates: Both sections will be from 01/05/15 to 02/13/15

Who should apply: Anyone within one year of graduating from a PhD program or who has already obtained a PhD is welcome to apply. Applications from international students are welcome. There is a common application for both the online and in-person sections. Everyone else (including non-PhDs) is encouraged to sign up for a future session.

For additional information, check out our website, blog, Venture Beat article, or Harvard Business Review piece.

This is very interesting. The author argues that “Data carpentry” is “not a single process but a thousand little skills and techniques”. He takes issue with the manner in which other ways of framing this dimension of what data scientists do obscure the craft inherent in it. I think this argument has important implications for the rapid expansion of data science courses and the risk that speed and modularisation lead ‘data carpentry’ to be rendered peripheral:

The New York Times has an article titled For Big-Data Scientists, ‘Janitor Work’ Is Key Hurdle to Insights. Mostly I really like it. The fact that raw data is rarely usable for analysis without significant work is a point I try hard to make with my students. I told them “do not underestimate the difficulty of data preparation”. When they turned in their projects, many of them reported that they had underestimated the difficulty of data preparation. Recognizing this as a hard problem is great.

What I’m less thrilled about is calling this “janitor work”. For one thing, it’s not particularly respectful of custodians, whose work I really appreciate. But it also mis-characterizes what this type of work is about. I’d like to propose a different analogy that I think fits a lot better: data carpentry. (Note: data carpentry seems to already be a thing).

Why is woodworking a better analogy? The article uses a few other terms, like data wrangling (data as unruly beasts to be tamed?) and munging (what is that, anyway?), neither of which mean much to me. I also like data curation but that’s also a bit vague. Data carpentry probably has something to do with wishing I could make things like Carrie Roy, but I should start by saying what I don’t like about the “data cleaning” or “janitor work” terms. To me these imply that there is some kind of pure or clean data buried in a thin layer of non-clean data, and that one need only hose the dataset off to reveal the hard porcelain underneath the muck. In reality, the process is more like deciding how to cut into a piece of material, or how much to plane down a surface. It’s not that there’s any real distinction between good and bad, it’s more that some parts are softer or knottier than others. Judgement is critical.

http://blogs.lse.ac.uk/impactofsocialsciences/2014/09/01/data-carpentry-skilled-craft-data-science/

I’m interested in the rapidity with which the role of ‘data scientist’ is emerging, the interests expressed within it and their conjunction in the institutionalisation of ‘data science’: what implications does the hype surrounding data science have for how data science courses are designed and marketed?

All science is becoming data science. Therefore data scientists have a lot of power in this regime [stifles a laugh] It’s a great time to be a data geek.

This is an interesting aside made by Bill Howe of the University of Washington in an early lecture on Coursera’s Introduction to Data Science MOOC. I take this to be the point that Emma Uprichard was making when she wrote about ‘methodological genocide’ last year in Discover Society:

At the risk of sounding a bit melodramatic, the big data hype is generating, for want of a better term, a methodological genocide. To my mind, it even has a flavour of being a disciplinary genocide. It is fierce and it is violent, and social scientists – and especially sociologists – need to fight back. Certainly, if we are going to meaningfully interrogate the social systems and structures that make up the social world, we will need to improve our quantitative skills. I know, I’m sorry to say it, I know this doesn’t always go down well among many social scientists, especially among those in the UK. But whilst I do think that one of the ways we will need to fight back is to increase our quantitative skills – we need to be clear about the kind of social science we move forward to [….]

Many new statistical techniques used to crunch through big data involve ‘shrinking’ the data. This not only ‘dilutes’ the importance of extreme cases – the outliers – within large datasets, but also focuses the analysis on the masses in the middle. One of the key strengths of social research and sociological research in particular is a sensibility to social divisions, minority groups, oppressed and silenced voices. In order to remain strong in these areas, we must absolutely remain attentive to the methodological techniques that go some way to erase extreme cases, pockets of extreme difference. Another big way of organising data is through data mining, machine learning and pattern recognition. At the core of those approaches, there are issues such as classification – who or what goes into which group and how are units of analysis measured as ‘similar’ or ‘different’? How should we count in a way that allows for meaningful counts over time? How we shape the social through our counting and classifying are highly political and ethical issues.

http://www.discoversociety.org/2013/10/01/focus-big-data-little-questions/
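
To make the point about ‘shrinking’ a little more concrete for myself, here is a minimal sketch in Python of one very simple shrinkage rule, in which each group’s mean is pulled towards the overall mean and small groups are pulled hardest. None of this comes from Uprichard’s piece: the scores, the function name shrink_group_means and the constant k are all made up for illustration, and real shrinkage estimators are considerably more sophisticated.

```python
from statistics import mean


def shrink_group_means(groups, k=5.0):
    """Pull each group's mean toward the grand mean (hypothetical rule).

    The weight n / (n + k) is an illustrative choice, not a standard
    estimator: the smaller the group, the more it is pulled toward
    the overall average.
    """
    all_values = [v for values in groups.values() for v in values]
    grand_mean = mean(all_values)
    shrunk = {}
    for name, values in groups.items():
        n = len(values)
        weight = n / (n + k)
        shrunk[name] = grand_mean + weight * (mean(values) - grand_mean)
    return grand_mean, shrunk


# Made-up scores: a large majority group and a small, very different group.
groups = {
    "majority": [50, 52, 48, 51, 49, 50, 50, 51, 49, 50],
    "minority": [90, 94],
}

grand_mean, shrunk = shrink_group_means(groups)
for name, values in groups.items():
    print(name, "raw mean:", mean(values), "shrunk mean:", round(shrunk[name], 2))
```

Under these assumptions the small group loses far more of its distinctiveness than the large one does, which is roughly the dilution of extreme cases that Uprichard is warning about.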

The only difference is that Howe thinks this is great. ‘Soft sciences’ are becoming ‘hard sciences’. Intellectual life is transforming and he finds himself at the centre of it. There are different ways in which the emerging field can be understood in intellectual terms. I find the emergence of data science in this context, within the university and outside of it, extremely interesting. It partly represents an interdisciplinary point of convergence driven by socio-technical innovation, the opportunities for inquiry facilitated by it and the technical challenges posed by them:

[Two Venn diagrams: Data_Science_VD and ccsvenn]

I honestly think it’s hard to overstate the significance of this for the Social Sciences as a whole. Much of their future development will hinge on the dynamics underlying the second venn diagram. I find it easy to imagine a future where computational social science becomes established as the vanguard of the social sciences, with disciplinary boundaries between them a thing of the past, buttressed by entrenched pockets of more discipline-bound enquiry which nonetheless are implicitly and explicitly supportive of the computational social science project on an epistemic and methodological level. Meanwhile qualitative sociologists, anthropologists and other holdouts become less part of the diagram and more a circle unto themselves, hopefully doing sustained research in an interesting way but with the risk that they become preoccupied by hurling critique at the shiny and well-funded convergent project over the road from them. Hopefully I’m wrong because this seems like it would be a suboptimal state of affairs on many levels.

Another interesting thing about data science is the emergence of the ‘data scientist’ as an aspirational category. Not to worry, though, because You can be a Data Scientist too!

Again there is socio-technical innovation, the opportunities it provides (for business) and the technical challenges posed by exploiting them. It builds upon the existing occupational role of the business analyst, adds additional skills and valorises ‘curiosity’. It is the sexiest job of the 21st century:

Goldman is a good example of a new key player in organizations: the “data scientist.” It’s a high-ranking professional with the training and curiosity to make discoveries in the world of big data. The title has been around for only a few years. (It was coined in 2008 by one of us, D.J. Patil, and Jeff Hammerbacher, then the respective leads of data and analytics efforts at LinkedIn and Facebook.) But thousands of data scientists are already working at both start-ups and well-established companies. Their sudden appearance on the business scene reflects the fact that companies are now wrestling with information that comes in varieties and volumes never encountered before. If your organization stores multiple petabytes of data, if the information most critical to your business resides in forms other than rows and columns of numbers, or if answering your biggest question would involve a “mashup” of several analytical efforts, you’ve got a big data opportunity.

http://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century/

The corporate demand for data science has led some to bemoan a ‘big data brain drain’ in higher education. In so far as there’s lots of money to be made in corporate data science and relatively little within the universities, with even the massive funding provision for data science research encouraging academic entrepreneurism but having little impact upon the academic career structure, it can’t be separated from the broader trajectory of the labour market in an age of austerity. Nor can we adequately understand the emergence of data science if we don’t consider it in the light of the cultural ascendency of quants and the entrenchment of the quants within the finance industry and beyond.

I’m also intrigued about what drives these interdisciplinary trends at the level of intellectual biography. For instance, physics is well represented within data science, as well as often being invoked in the discourse surrounding it as an illustration of the scientific rigour characterising data science. There has been a sharp increase in Physics PhDs since 2000 in the US (source) – to what extent is this being driven by newly graduated post-doctoral physicists who, either out of curiosity or necessity, go marauding into other areas of inquiry because advancement in their own field is either unlikely or undesirable?

I just saw the LSE Impact blog posted this nice summary of the interview series I’m doing for them:

Rob Kitchin: “Big data should complement small data, not replace them.”
In this first interview, Rob Kitchin elaborates on the specific characteristics of big data, the hype and hubris surrounding its advent, and the distinction between data-driven science and empiricism.

Evelyn Ruppert: “Social consequences of Big Data are not being attended to”
For the second interview, Evelyn Ruppert discusses creating an interdisciplinary forum to analyse the major changes in our relations to data, as subjects, citizens and researchers.

Deborah Lupton: Liquid metaphors for Big Data seek to familiarise technology
Deborah Lupton talks about how sociologists are involved in making sense of and positioning big data. Also of interest to social researchers are the nature metaphors used to discuss data, such as ‘flows’ and ‘flood’.

Susan Halford: “Semantic web innovations are likely to have implications for us all”
Co-director of the Web Science Institute, Susan Halford underlines the necessity of broad interdisciplinarity as well as further technical training to engage in depth with the web as it evolves in different ways.

More to follow soon!

I just came across this great post by Helen Margetts on the LSE Impact Blog from a few months ago. It’s worth reading the post in full but what really caught my imagination were the five recommendations she makes at the end. I don’t think the methods training I received was bad but in retrospect I think it was hugely limited (and consequently limiting). This needs to be addressed institutionally because otherwise conversations surrounding ‘big data’ are likely to become absurdly lopsided over time, as successive cohorts of data scientists are trained in a way that is relatively insulated from the traditional concerns of the social sciences. I think Helen’s third point is important as a matter of technical proficiency but perhaps even more crucial as a precondition for sustained interdisciplinary communication. So while my current strategy of gradually working through Codecademy might be useful for me, it’s not exactly a scalable solution for the social sciences more broadly (though it does fit worryingly well with the privatisation of upskilling in order to ensure one’s own occupational viability in a changing labour market). These are Helen’s five recommendations:

  1. Accept that multi-disciplinary research teams are going to become the norm for social science research, extending beyond social science disciplines into the life sciences, mathematics, physics, and engineering. At Policy and Internet’s 2012 Big Data conference, the keynote speaker Duncan Watts (physicist turned sociologist) called for a ‘dating agency’ for engineers and social scientists – with the former providing the technological expertise, and the latter identifying the important research questions. We need to make sure that forums exist where social scientists and technologists meet and discuss big data research at the earliest stages, so that research projects and programmes incorporate the core competencies of both.
  2. We need to provide the normative and ethical basis for policy decisions in the big data era. That means bringing in normative political theorists and philosophers of information into our research teams. The government has committed £65 million to big data research funding, but it seems likely that any successful research proposals will have a strong ethics component embedded in the research programme, rather than an ethics add on or afterthought.
  3. Training in data science. Many leading US universities are now admitting undergraduates to data science courses, but lack social science input. Of the 20 US masters courses in big data analytics compiled by Information Week, nearly all came from computer science or informatics departments. Social science research training needs to incorporate coding and analysis skills of the kind these courses provide, but with a social science focus. If we as social scientists leave the training to computer scientists, we will find that the new cadre of data scientists tend to leave out social science concerns or questions.
  4. Bringing policy makers and academic researchers together to tackle the challenges that big data present. Last month the OII and Policy and Internet convened a workshop in Harvard on Responsible Research Agendas for Public Policy in the Big Data Era, which included various leading academic researchers in the government and big data field, and government officials from the Census Bureau, the Federal Reserve Board, the Bureau of Labor Statistics, and the Office of Management and Budget (OMB). The discussions revealed that there is a continual procession of major events on big data in Washington DC (usually with a corporate or scientific research focus) to which US federal officials are invited, but also how few were really dedicated to tackling the distinctive issues that face government agencies such as those represented around the table.
  5. Taking forward theoretical development in social science, incorporating big data insights. I recently spoke at the Oxford Analytica Global Horizons conference, at a session on Big Data. One of the few policy-makers (in proportion to corporate representatives) in the audience asked the panel “where is the theory”? As social scientists, we need to respond to that question, and fast.

http://blogs.lse.ac.uk/impactofsocialsciences/2013/11/11/5-recommendations-policy-making-big-data/