Raiding the inarticulate since 2010

The role of metaphors in framing Data Science

This is a very interesting piece. Its author argues that “data carpentry” is “not a single process but a thousand little skills and techniques”, and he takes issue with other ways of framing this dimension of data scientists' work because they obscure the craft inherent in it. I think this argument has important implications for the rapid expansion of data science courses, where the pressure for speed and modularisation risks pushing ‘data carpentry’ to the periphery:

The New York Times has an article titled For Big-Data Scientists, ‘Janitor Work’ Is Key Hurdle to Insights. Mostly I really like it. The fact that raw data is rarely usable for analysis without significant work is a point I try hard to make with my students. I told them “do not underestimate the difficulty of data preparation”. When they turned in their projects, many of them reported that they had underestimated the difficulty of data preparation. Recognizing this as a hard problem is great.

What I’m less thrilled about is calling this “janitor work”. For one thing, it’s not particularly respectful of custodians, whose work I really appreciate. But it also mis-characterizes what this type of work is about. I’d like to propose a different analogy that I think fits a lot better: data carpentry. (Note: data carpentry seems to already be a thing).

Why is woodworking a better analogy? The article uses a few other terms, like data wrangling (data as unruly beasts to be tamed?) and munging (what is that, anyway?), neither of which mean much to me. I also like data curation but that’s also a bit vague. Data carpentry probably has something to do with wishing I could make things like Carrie Roy, but I should start by saying what I don’t like about the “data cleaning” or “janitor work” terms. To me these imply that there is some kind of pure or clean data buried in a thin layer of non-clean data, and that one need only hose the dataset off to reveal the hard porcelain underneath the muck. In reality, the process is more like deciding how to cut into a piece of material, or how much to plane down a surface. It’s not that there’s any real distinction between good and bad, it’s more that some parts are softer or knottier than others. Judgement is critical.

http://blogs.lse.ac.uk/impactofsocialsciences/2014/09/01/data-carpentry-skilled-craft-data-science/

I’m interested in how rapidly the role of ‘data scientist’ is emerging, in the interests expressed through it, and in how these come together in the institutionalisation of ‘data science’. What implications does the hype surrounding data science have for how data science courses are designed and marketed?