The thorny problem of authorship in a world of AI

August 26, 2024 | By JR

I ran into this from a couple of different places over the past few days. First, over at Thought Shrapnel, where Doug introduces the quoted piece with a pretty provocative statement:

This is an interesting article by Justine Tunney who argues that Open Source developers are having their contributions erased from history by LLMs. It’s interesting to consider this by field, as LLMs seem to have no problem explaining accurately what I’m known for (digital literacies, etc.) As Tunney points out, the world of Open Source is a gift economy. But if we’re gifting things to something ingesting everything indiscriminately and then regurgitating in a way that erases authorship, is that problematic?

Doug Belshaw

The OLDaily newsletter included a couple of follow-up points:

Looking at my own learning, I would find it impossible to credit everyone I learned from in order to create, say, this post. Sure, where I’m directly quoting someone, I can credit them. That’s a trivial problem AI could easily solve when it directly quotes someone. But if AI learns the phrase ‘points out’ from, say, 500 different examples, does it make sense to credit each of them?

Stephen Downes

It’s an interesting and challenging thing to articulate. At what point should one attribute or cite the origin of the ideas, words, audio, video, etc. in what they create? As Downes suggests, obviously, if I’m directly quoting someone, then I should disclose that. Usually, that ends up in the form of a citation. Having worked a lot with CC licenses (I’m not so familiar with the licenses the OP uses to contribute to open source), I’ve made extensive use of attribution guidelines. There I can acknowledge who created something, where it came from, and how I have changed it.

But we see patterns with people where this direct connection to the origin breaks. One ends up as a kind of modern folklore. The learning pyramid, for example, an oft-cited (and bullsh*t) model of how people learn, loosely based on Dale’s Cone of Experience for multimedia. Or the folklore about people learning 60,000 times faster visually than by reading (also a meaningless claim, but it’s out there). The other pattern, which may feel closer to home, is the one Downes suggests: if you steep in enough content for long enough, the patterns of thought you’ve steeped in stick. That’s a crude way to explain how vectors might work in an LLM.

But what caught my eye in the post was this statement: “Is this the future we want? Imagine if Isaac Newton’s name was erased, but the calculus textbooks remained.” It’s always good to ask whether we want a particular future. And it’s certainly a problem if people are being ripped off and erased from their labour. Interestingly, the example shows us (perhaps unknowingly, or maybe just left unstated) that erasing people from their work is not an LLM or AI problem. It’s a human problem. There are so many ways this current GenAI craze holds a mirror up to humanity, and we’re kind of repulsed by it.

One could say, who cares if Isaac Newton’s name was erased from calculus books, we’d still have calculus. In fact, we have erased many, many brilliant mathematicians from learning resources, and from the general mathematical lexicon, while still benefiting from their work. Madhava of Sangamagrama developed concepts of infinite series and other calculus ideas hundreds of years before Newton. Although algebra still references Al-Khwarizmi’s book, we don’t call the quadratic formula the Al-Khwarizmi formula (though we might if he had been European); compare that with “Newton’s Laws of Motion” rather than simply the Laws of Motion. Similarly, we don’t call the binomial theorem the Omar Khayyam theorem. And what we commonly refer to as the Pythagorean theorem was present elsewhere, again, hundreds of years prior. So the pattern here shouldn’t be so surprising: outside of Europe, formulas, principles, etc. were typically named for their utility, whereas when something was developed within Europe (or just popularized there), naming conventions seem to trend toward the “discoverer.”

It remains interesting to watch how we deal with what we see in that mirror. Are we happy with how we have acted in the past? What is the future we want?