Ask HN: GPT-3 and Copyright
3 points by kippinitreal | 1 comment on Hacker News.
I've had a thought stuck in my head and I need someone to talk me through it. GPT-3 has, in effect, compressed the internet (since that's essentially what its massive training corpus is) into 175 billion parameters. This is super cool and clearly leads to some neat applications, but shouldn't there be concerns that its content creation is just plagiarizing internet content? Sure, all creation is a form of imitation, but it was basically trained to plagiarize via next-word prediction! It aims for semantic similarity rather than exact reproduction, so we aren't going to see character-for-character copies, but should we still be concerned?

What threw me off are the examples of it doing math. It can figure out "2 + 2" => "4" because somewhere on the internet there's a line like "2+2=4", "two plus two equals four", or even "II + II = IV", but it's probably not actually doing math; it's just copying someone else's answer (which isn't always right). Shouldn't we expect the same whenever a prompt leads it to remember something specific from training?

If you were to have it generate a paper for you from a prompt, how worried should you be that it was plagiarizing someone else's paper written for a similar prompt? Does the fact that we can't tell whether the output is original limit the commercial uses of this sort of technology? I'm not even sure how you define the "originality" of content without going through a judge...

Internet friends...please educate me!

Note: I don't have access to the GPT-3 API, so these thoughts are based on examples and my familiarity with transformer architectures.
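A toy way to see the memorization worry, under heavy simplifying assumptions: the sketch below swaps GPT-3's transformer for a hypothetical word-bigram model (a far cruder next-word predictor) trained on two made-up sentences, then uses a word-trigram overlap score as one rough, non-legal proxy for verbatim copying. The functions `train_bigram`, `generate`, and `ngram_overlap` and the corpus are illustrative inventions; this is not how GPT-3 or any real plagiarism checker works.

```python
from collections import Counter, defaultdict


def train_bigram(corpus: list[str]) -> dict[str, Counter]:
    """Count, for each word, which words follow it in the training text."""
    model: dict[str, Counter] = defaultdict(Counter)
    for doc in corpus:
        words = doc.split()
        for prev, nxt in zip(words, words[1:]):
            model[prev][nxt] += 1
    return model


def generate(model: dict[str, Counter], prompt: str, max_words: int = 10) -> str:
    """Greedy decoding: repeatedly append the most frequent follower of the last word."""
    words = prompt.split()
    for _ in range(max_words):
        followers = model.get(words[-1])
        if not followers:
            break  # no training data after this word, so stop
        words.append(followers.most_common(1)[0][0])
    return " ".join(words)


def ngram_overlap(candidate: str, source: str, n: int = 3) -> float:
    """Share of the candidate's word n-grams that also occur verbatim in the source."""
    def grams(text: str) -> set[tuple[str, ...]]:
        w = text.split()
        return {tuple(w[i:i + n]) for i in range(len(w) - n + 1)}

    cand = grams(candidate)
    if not cand:
        return 0.0
    return len(cand & grams(source)) / len(cand)


# Hypothetical two-document "training corpus".
corpus = [
    "my original essay argues that compression is not comprehension",
    "cats sleep most of the day",
]
model = train_bigram(corpus)
output = generate(model, "my original")
print(output)  # my original essay argues that compression is not comprehension
print(ngram_overlap(output, corpus[0]))  # 1.0
```

With so little data, the prompt pulls the toy model straight back onto one training document, which is the regurgitation scenario the post worries about. A large model generalizes across many examples instead of reading from a lookup table, so verbatim copying should be much rarer, but an overlap check like this is one simple way to start testing generated text against a reference corpus.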