by Dr Edward Kirton-Darling, Senior Lecturer at the University of Bristol Law School, and Professor Will Smith, Department of Computer Science, University of York.
On 11 September 2024, Google released an Audio Overview feature as part of NotebookLM, a ChatGPT-type tool which analyses uploaded documents and generates summaries, FAQs or other materials. Pitched as a “Note Taking & Research Assistant Powered by AI”, it is clearly targeted at students (and teachers). Amongst other new features, it contains the ability to turn a scholarly paper into a podcast, where two cheerful American AI hosts ‘take a deep dive’ into the research. Two academics in very different fields, who have been friends since childhood, discuss the development…
Ed: that podcast thing you sent is bonkers.
Will: astonishing isn’t it?
Ed: yes, I put a paper of mine through it, and the results are intriguing.
Will: in some ways great, in other ways, horrible?
Ed: yes, exactly. Hold up, maybe we could write about it together?
Will: yeah, why not.
Ed: and, I tell you what, we could do it as a dialogue!
Will: Yes! And we can use NotebookLM to turn it into a podcast for people to listen to.
Ed: I love it.
—
Ed: ok, so we would need to start by introducing ourselves, you go first.
Will: I am at the University of York living my dream of mucking around with computers all day and making them do fun and interesting things. Specifically, I work on computer vision (understanding images and video), graphics (making images) and machine learning (getting computers to work out how to do stuff from data rather than telling them what to do).
Ed: I’m at the University of Bristol, and I’m a socio-legal scholar, interested in lots of things, including inquests and investigations into death, housing and planning, homelessness and social welfare. And you are the computer scientist, so you can introduce NotebookLM.
Will: I think it would be funnier if you did it, but ok. Let me try to condense 50 years of AI research into a few digestible sentences – a bit like NotebookLM, now I think about it. We need to start with large language models.
Ed: Like ChatGPT?
Will: Exactly. These are enormous neural networks that…
Ed: Wait, I’m already lost. What’s a neural network?
Will: The details aren’t that important. It’s enough to know that they are functions that map inputs to outputs according to lots of parameters (billions in the case of LLMs) that can be adjusted to change the input to output mapping until it does what we want it to. LLMs are initially trained to be good at “next token prediction”. If I said guess the next word: “The cat sat on the…”
Ed: You want me to say “mat” but I am not going to play along with your little game. So, er, “rug.”
Will: You didn’t take my bait, but actually your answer is plausible, right? If you looked at the entirety of written English, the most probable next word would likely be “mat”. But “rug” is also an acceptable answer, just less likely. You can imagine assigning a probability to every word in English: “mat” would be highest, “rug” would be fairly high, while “floccinaucinihilipilification” would be vanishingly small. This is exactly what an LLM learns to do: given a sequence of “tokens” (not exactly words – more like a few characters) as input, it is trained to output probabilities over all possible tokens for how likely they are to come next.
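[Aside: the “probability over every word” idea in miniature. A minimal Python sketch with invented scores; a real model produces one score per token over a vocabulary of tens of thousands.]

```python
import numpy as np

# Hypothetical model scores (logits) for a few candidate next words after
# "The cat sat on the ...". A softmax turns the scores into probabilities.
candidates = ["mat", "rug", "sofa", "floccinaucinihilipilification"]
scores = np.array([4.0, 2.5, 1.0, -6.0])

probs = np.exp(scores) / np.exp(scores).sum()    # softmax
for word, p in zip(candidates, probs):
    print(f"The cat sat on the {word}: {p:.4f}")
# "mat" comes out most likely, "rug" fairly likely, and the long word
# vanishingly unlikely, exactly the ranking described above.
```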
Ed: And, for the purposes of exposition, how does it learn to do this?
Will: Well, you take the entire content of the written internet (or something close to this), feed it passages of text and encourage it to output a high probability for the token that actually came next. This turns out to be sufficient to learn grammar, writing style and also to store a huge amount of knowledge within the model.
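[Aside: what “encourage it to output a high probability for the token that actually came next” looks like as a number. A sketch reusing the toy candidates above; real training computes this cross-entropy loss over enormous amounts of text and adjusts the parameters by gradient descent.]

```python
import numpy as np

candidates = ["mat", "rug", "sofa", "floccinaucinihilipilification"]
probs = np.array([0.78, 0.17, 0.04, 0.01])   # the model's current prediction

actual_next = "mat"                          # the word the training text used
loss = -np.log(probs[candidates.index(actual_next)])
print(f"cross-entropy loss: {loss:.3f}")     # lower means a better prediction
# Training nudges the billions of parameters to push this loss down, over
# passages drawn from (something close to) the entire written internet.
```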
Ed: Ok – I’m with you, it makes a lot more sense than that internet thing you tried to explain to us in 1995 (it’ll never catch on). But how do you get from there to a podcast?
Will: There’s one more step of training from this generic language model to something like ChatGPT or NotebookLM podcasts. You now need to “fine-tune” it (just meaning do a bit more training on more limited data) to make it create desirable text. You do this by getting humans to rate the output and score how good an answer is or how like a podcast script it is. This is called “alignment” and leads to better output.
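[Aside: one heavily simplified way to picture “fine-tuning on human ratings”. Real alignment pipelines typically train a separate reward model and use methods such as RLHF; the scripts and ratings below are made up for illustration.]

```python
# Candidate model outputs paired with hypothetical human ratings (0 to 1).
rated_outputs = [
    ("Host A: Welcome to the deep dive! Today we're unpacking a paper on...", 0.9),
    ("The abstract of the paper states the following, verbatim: ...", 0.2),
]

# Simplest possible use of the ratings: keep only the well-rated examples
# and do "a bit more training" (supervised fine-tuning) on just those.
fine_tune_set = [text for text, rating in rated_outputs if rating >= 0.5]
print(fine_tune_set)
```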
Ed: Ok, so now we have a model that spits out text that sounds like a podcast script?
Will: Exactly. Now for actually using the model. You need to put one of our papers into its “context”. When it’s deciding on the next token, it can take into account lots of preceding content (we call this attention). So now it is generating a podcast script where each token depends on the script so far but also on our paper. As a final step, NotebookLM will have a very good text-to-speech generator to turn the script into audio. So what paper did you run through it?
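[Aside: the overall recipe in code, using a small open-source model as a stand-in. This is not Google’s actual pipeline: the file name is hypothetical, GPT-2’s context is far too short for a whole paper, and a real system would end with a high-quality text-to-speech stage.]

```python
from transformers import pipeline

# A small stand-in language model; NotebookLM's model is vastly larger and
# can attend to entire papers placed in its context.
generator = pipeline("text-generation", model="gpt2")

paper_text = open("our_paper.txt").read()    # hypothetical input file
prompt = (
    "Write a two-host podcast script discussing the following paper.\n\n"
    + paper_text
    + "\n\nHost A:"
)

# Each newly generated token can attend to everything in the prompt,
# i.e. the script so far and the paper itself.
script = generator(prompt, max_new_tokens=200)[0]["generated_text"]
print(script)

# Final step (omitted): feed the script to a text-to-speech model to
# produce the two cheerful voices.
```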
Ed: My paper brings together scholarship on social welfare law/policy and research on the contemporary inquest. I look at links between the two and examine the ways in which Coroners and inquests increasingly – as I argue – engage with questions of social welfare when investigating deaths (1). And how about you?
Will: It’s to do with reconstructing 3D models of outdoor scenes where you can edit the lighting (2). And what did you think about how it dealt with your paper?
Ed: I’m being deeply unoriginal when I say I feel pretty ambivalent about AI and large language models, about their implications and who they will benefit, but I am trying to be open-minded, and I was impressed – it is slick and really sounds like a proper conversation. If you didn’t know any better, it would be a really persuasive bit of content.
Will: There is a but coming.
Ed: Yes, and it is a bit paradoxical. It seems as if the AI found emoting easier than explaining. The podcast ‘hosts’ provide a simulation of empathy and mourning in a discussion about death, persuasively imitating sadness at the ‘tragedies’ and emphasising how terrible the facts are. It is a performance of grief which is understandable given the training LLMs have received, but which I find pretty distasteful and which, importantly, distracts from some serious problems in their analysis. For example, they miss the criticism that to use the language of tragedy is to suggest that a death in these circumstances is inevitable and unavoidable. And they also make various mistakes with the law.
Will: Like what?
Ed: Well, they only had my article to go off, and so it is perhaps understandable that they don’t clearly explain the court structure, but worse than this – because it is discussed explicitly in the article – they incorrectly state that Coroners can make findings of negligence. This is a very fundamental error, and I am intrigued by how the AI got here. Could it be something around the way in which it uses synonyms? Obviously, law requires precision, and when it comes to the conclusions of an inquest, the process dances delicately between a statutory requirement not to make any findings suggestive of civil liability, and requirements to make findings which might be judgemental. It means that swapping the permitted word ‘failings’ for the unlawful ‘negligence’ is categorically wrong, something which the AI clearly does not appreciate. But tell me about yours.
Will: For my paper, it used a very clever metaphor to describe our approach, evoking the complementary roles of set and lighting designers on a stage to explain how the method works. If our method had actually worked as they described, it would have been a beautiful way of explaining it. But it doesn’t. The false confidence of the presenters as they seemingly simplify a complex concept into an easy-to-follow analogy is concerning, because the explanation is fundamentally wrong. It also made me worry about what happens when you put incorrect information into its context. If my paper had contained unscientific vaccine misinformation or climate change denial, would the podcast have cheerfully and convincingly used these “facts” to create persuasive arguments?
Ed: That is not a happy thought. I also found that, listening to yours, there was an oddly chilling part at the end where the AI podcast warns listeners to be careful about telling the difference between reality and a world created by computers. Was this in your article or was this something they made up?
Will: They made it up! (We should probably stop anthropomorphising and call it “the statistical next-token prediction machine” rather than “them”.) But then again, this shouldn’t be surprising at all. Imagine how many articles about AI it will have seen in its training data that conclude by talking about the dangers.
Ed: We should avoid that, I guess. So, what do you make of it?
Will: My first thought is that podcast presenting style and content must be incredibly formulaic, because it is so successful at replicating it! More worryingly, I think that if students use it for literature reviews, or if non-academics use it, they are going to end up with superficial understandings and, in many cases, misunderstandings. You have to wonder if there is any value at all if you can’t trust it.
Ed: I wondered if there was an accessibility argument for it, but then again, there is already sophisticated software in this context. But if you know the materials already, might it be a good way to think about a paper you know well, to provoke new perspectives on it?
Will: I use ChatGPT regularly, but it’s never brought me new perspectives. I mainly use it for more mundane queries relating to coding. If I already knew a source well, I would be surprised if it could really provide new insight beyond statistical summarisation of the content at quite a superficial level. I do think there are really interesting outreach possibilities in terms of making our research accessible to a wide audience (imagine a version where the voices were trained on the authors’ own voices!). But on the other hand, if there is now a deluge of AI-generated research podcasts that are formulaic in their tone and format and inaccurate in their content, people will likely get tired of them pretty quickly.
Ed: Yes, that is true. This version is quite fun for playing about with though, and I’m looking forward to seeing what it makes of this blog.
Will: It is definitely fun. The best use of it I have seen so far is my daughter putting stories into it which she has made up, and then falling about laughing while two adults earnestly dissect what she’s written. Maybe this says something about the value of it in earlier stages of education?
Ed: That makes sense – the interactivity aspect of it is great, and treating it at the level Harriet does – as a bit of a daft thing to play with – is much better than getting riled up about fake sincerity. Anyway, let’s go and listen to what they, sorry, it, has made of all this…
[Will and Ed listen to the podcast of this blog]
[Podcast of Will and Ed’s blog]
Ed: the lipstick on a pig thing doesn’t make any sense.
Will: It is really good in places though!
Ed: Yeah, ish. ‘It really makes you think’ is about as banal a sentence as it is possible to come up with, and there is a lot which isn’t very convincing – all the ‘wow, are we really thinking for ourselves?’ stuff.
Will: I don’t mean it is convincing, I mean that the levels of meta-ness are hilariously brilliant!
Ed: Sorry, you are quite right, I was approaching it as a serious thing, but I should remember the correct approach is to ask, what would Harriet do?
Will: Exactly, hold onto the daftness.
We hope you found our chat interesting; we’d be keen to hear your thoughts at @ekd.bsky.social and @willsmithvision.bsky.social.
References:
1. Kirton-Darling, E. (2023). Death, social reform and the scrutiny of social welfare provision: the role of the contemporary inquest. Journal of Social Welfare and Family Law, 45(4), 363–386.
2. Gardner, J. A., Kashin, E., Egger, B., & Smith, W. A. (2025). The Sky’s the Limit: Relightable Outdoor Scenes via a Sky-Pixel Constrained Illumination Prior and Outside-In Visibility. In European Conference on Computer Vision (pp. 126–143). Springer, Cham.