Interesting to see that Casey Newton shares my preoccupation with how generative AI might enable us to interact with our archives through natural language. My blog has 5,000+ posts over 13 years, containing every idea I was interested in enough to write about. There is no good way to interact with this corpus using present tools, mainly because current context windows are too small and extracting content from WordPress is a pain, but it's a real possibility I can't stop thinking about.
Earlier this year, like many productivity tools, Notion added a handful of AI features. I use two of them in my links database. One extracts the names of any companies mentioned in an article, creating a kind of automatic tagging system. The other provides a two- or three-sentence summary of the article I’m saving.
Neither of these, in practice, is particularly useful. Tags might theoretically be useful for revisiting old material, but databases are not designed to be browsed. And while we publish summaries of news articles in each edition of the newsletter, we wouldn’t use AI-written summaries: among other reasons, they often miss important details and context.
At the same time, the database contains nearly three years of links to every subject I cover here, along with the complete text of thousands of articles. It is here, and not in a note-taking app, that knowledge of my beat has been accreting over the past few years. If only I could access that knowledge in some way that went beyond my memory.
It’s here that AI should be able to help. Within some reasonable period of time, I expect that I will be able to talk to my Notion database as if it’s ChatGPT. If I could, I imagine I would talk to it all the time.
Much of journalism simply involves remembering relevant events from the past. An AI-powered link database has a perfect memory; all it’s missing is a usable chat interface. If it had one, it might be a perfect research assistant.
I imagine using it to generate little briefing documents to help me when I return to a subject after some time away. Catch me up on Canada’s fight with Meta over news, I might say. Make me a timeline of events at Twitter since Elon Musk bought it. Show me coverage of deepfakes over the past three months.
Today’s chatbots can’t do any of this to a reporter’s standard. The training data often stops in 2021, for one thing. The bots continue to make stuff up, and struggle to cite their sources.
But if I could chat in natural language with a massive archive, built from hand-picked trustworthy sources? That seems powerful to me, at least in the abstract.
https://www.platformer.news/p/why-note-taking-apps-dont-make-us
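The WordPress half of the pain is at least tractable: modern WordPress sites expose a REST API that serves post content as JSON at the standard `/wp-json/wp/v2/posts` route. A minimal sketch of pulling an archive down for later processing (the `fetch_archive` function and the tag-stripping approach are my own illustrative choices, not a prescribed method):

```python
import json
import math
import re
import urllib.error
import urllib.request

def strip_html(html: str) -> str:
    """Crude tag stripper; adequate for feeding text to a language model."""
    text = re.sub(r"<[^>]+>", " ", html)
    return re.sub(r"\s+", " ", text).strip()

def to_record(post: dict) -> dict:
    """Flatten a WordPress post object into a plain-text record."""
    return {
        "id": post["id"],
        "date": post["date"],
        "title": strip_html(post["title"]["rendered"]),
        "text": strip_html(post["content"]["rendered"]),
    }

def fetch_archive(base_url: str, per_page: int = 100) -> list[dict]:
    """Page through the standard WP REST API until it runs out of posts."""
    archive, page = [], 1
    while True:
        url = (f"{base_url}/wp-json/wp/v2/posts"
               f"?per_page={per_page}&page={page}"
               f"&_fields=id,date,title,content")
        try:
            with urllib.request.urlopen(url) as resp:
                batch = json.load(resp)
        except urllib.error.HTTPError:
            break  # WordPress returns HTTP 400 past the last page
        if not batch:
            break
        archive.extend(to_record(p) for p in batch)
        page += 1
    return archive
```

With the archive in hand as plain-text records, each post can be chunked, embedded, and retrieved on demand, which is how the "chat with my archive" idea gets around the context-window limit: only the relevant posts are fed to the model per question.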
Furthermore, it could surface connections between posts effortlessly. This was the promise of Roam, which it never quite realised in practice, but which could become plausible in the future. The taxonomy of this blog has always been a problem for me because it's unwieldy to recategorise, and the structures that would be useful change over time. There are threads running through it that I'm conscious of (themes and concepts which knit together different aspects of my work), which makes me suspect there are many I'm not conscious of.
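As a toy illustration of what "surfacing connections" could mean mechanically: represent each post as a vector and rank the other posts by cosine similarity to it. The sketch below uses bag-of-words counts as a deliberately crude stand-in for a real embedding model, but the shape of the idea is the same, and it needs no taxonomy at all:

```python
import math
import re
from collections import Counter

def vectorise(text: str) -> Counter:
    """Lowercased word counts; a stand-in for a proper embedding."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def related(posts: dict[str, str], title: str, top_n: int = 3) -> list[str]:
    """Rank the other posts by similarity to the named one."""
    vecs = {t: vectorise(body) for t, body in posts.items()}
    target = vecs[title]
    others = [t for t in posts if t != title]
    return sorted(others, key=lambda t: cosine(target, vecs[t]),
                  reverse=True)[:top_n]
```

Running `related` over a whole archive would attach a "possibly connected posts" list to every entry, which is one way those unconscious threads might start to show themselves.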
