Monday, January 8, 2024

Plagiarism



There's something else worth talking about, which Subotic sort of points at here but doesn't come out and say: while we weren't looking, the job of identifying plagiarism has been turned over to an AI device that matches the words in a text against all the previously published words it knows about. It's been automated, which means it finds a lot more stuff than anybody ever found before: much more than your professors could find when they had to rely on Google to search it out for them, and almost infinitely more than in the millennia before Google existed. (The term goes back to the Latin epigrammatist Martial, annoyed with a fellow Roman who was in the habit of reciting his, Martial's, poems in public with the claim that he'd written them himself. Martial liked to think of his published poems as slaves he had set free, and called the impostor a plagiarius, a slave-kidnapper.)

Robots shouldn't be tagging plagiarism for the same reason they shouldn't be tagging pornography, really: unlike Justice Stewart, they don't and can't "know it when [they] see it." They don't know anything. They can be furnished with an algorithm that labels pictures as "porn" and "not-porn" by whatever criteria the algorithm encodes, and that's it, and you already know how well that works:

The algorithms that drive products like YouTube, Facebook, and Apple's iOS software share a common challenge: They can't seem to consistently distinguish between pornography and sexual and reproductive health content....

The online sexual health company O.school reported in October how the iPhone's new software, with the parental control setting enabled, blocked not just its website but numerous entertainment sites and health resources for teens and adolescents. While the filter restricted sites like Teen Vogue and Scarleteen, it didn't deny users access to websites like the neo-Nazi Daily Stormer or the anti-gay Westboro Baptist Church.

The plagiarism algorithm looks better than that, because the things it finds are kind of undeniable: your author really did use those words, and so did somebody else:

Original text from acknowledgments in Facing Up to the American Dream: Race, Class, and the Soul of the Nation by Jennifer L. Hochschild:

Sandy Jencks showed me the importance of getting the data right and of following where they lead without fear or favor . . . [Jencks] drove me much harder than I sometimes wanted to be driven.

Dr. Gay’s acknowledgments from her Harvard dissertation:

[My thesis advisor, Gary King] reminded me of the importance of getting the data right and following where they lead without fear or favor [and my family] drove me harder than I sometimes wanted to be driven.

But it doesn't know what plagiarism is, just as YouTube's anti-pornography software doesn't know what pornography is; all it knows is its algorithm, and so it ends up flagging examples like this one, where we can't even agree on whether it's plagiarism at all. (To me, it's totally trivial: who is ever going to read a dissertation acknowledgments page? How could anybody possibly care whether the wording is original?) Which is how one of Dr. Gay's chief persecutors, billionaire financier and Harvard board member Bill Ackman, could be firm about Gay's guilt and the need for her to resign, right up until it turned out that his own wife, Neri Oxman, had committed radically worse offenses in her own Ph.D. dissertation, at which point he found out that it was no big deal:

In an extensive, 5,139-word post on X made Saturday evening, Ackman — who led the crusade to get Harvard President Claudine Gay to resign over plagiarism allegations — said it is "a near certainty that authors will miss some quotation marks and fail to properly cite or provide attribution for another author on at least a modest percentage of the pages of their papers."

"Some plagiarism is due to the laziness of the author. Laziness is not a great excuse for a member of the faculty, but it does not seem like a crime to me," Ackman wrote. (Business Insider)

Of course in another part of this gigantic manifesto he also suggests that the entire faculty at MIT (where Oxman once held a tenured teaching position) should have all their publications run through the Turnitin mill. His general view thus seems to be "If I don't like them, it's plagiarism, and if I do it isn't."

What I think, if you don't mind my saying so, is that the algorithmic model for identifying plagiarism is intrinsically flawed and shouldn't be used at all, or only under certain very strict conditions, which I'll spell out.

First, we have to think more clearly about what plagiarism is, and why it's a problem. Start with the fact that, as everybody from Wikipedia on down understands, it isn't normally a crime:

Plagiarism is typically not in itself a crime, but like counterfeiting, fraud can be punished in a court[12][13] for prejudices caused by copyright infringement,[14][15] violation of moral rights,[16] or torts. In academia and in industry, it is a serious ethical offense.[17][18] Plagiarism and copyright infringement overlap to a considerable extent, but they are not equivalent concepts,[19] and many types of plagiarism do not constitute copyright infringement, which is defined by copyright law and may be adjudicated on by courts.

It boils down to two very different basic kinds of thing: the tort, typically in commercial publishing, where the author of the stolen words claims to have suffered a loss and can sue for damages; and the breach of an honor code, typically in a school, where the author of the stolen words isn't involved, and the punishment is of a different kind, like expulsion from the community. (Academic publishing is kind of an extension of school; there's no money in it, though there is a tangible reward to the extent that getting publications is good for job security, and the punishments are of the same type.)

Anybody who does academic advisement is trained to tell students that if they cheat, whether by plagiarism or by stealing exam answers or buying ghostwritten papers, they're harming themselves, because doing the work is an essential part of the learning process in the education they're seeking (and, especially at the tertiary level, paying an inordinate amount for), and if they don't do the work they're not getting the full benefit.

Thus, if you copy an unattributed Wikipedia enumeration of a set of processes into your work, it looks as if you might not have given any thought to what the processes are, and left it unattributed because citing Wikipedia is embarrassing; what you should have done is go from Wikipedia's well-footnoted text to its sources, and rephrase those, to show that you know what you're talking about. In contrast, if you copy somebody else's peculiar formula for thanking your advisor, it's weird, but it doesn't show that you've cheated yourself.

And that's what your faculty, at least in principle, is concerned with. It's the reason for the often insanely complex and irritating system of citation references: it's supposed to ensure that you can prove you did the work, not just for your sake but maybe still more for the value of the degree coming from their institution, which the faculty supposedly wants to protect. Ideally, their job is understood as stopping you from plagiarizing before you do it.

Even if that means spending more time than they'd like reading your deathless prose. So be it. With the big projects, from the senior thesis to the Ph.D. dissertation, the teacher's job should be to keep students from getting busted for plagiarism: read the thing before the final submission, and show them how to fix their mistakes and failures, at least the ones the teacher can catch.

Anyhow, as somebody who's spent considerable time in recent years hunting, out of pure malice, for plagiarism by New York Times columnist David Brooks (most recently at the Substack), I feel qualified to tell you all that the way to do it isn't the way Christopher Rufo does it (though he's definitely got all the malice you could want). His way is too lazy.

The big trick is to not do anything until you're made properly suspicious, as I was, for instance, by this Brooksery in 2019:

Trolls bid for attention by trying to make others feel bad. Studies of people who troll find that they score high on measures of psychopathy, sadism and narcissism. Online media hasn’t made them vicious; they’re just vicious. Online has given them a platform to use viciousness to full effect.
Trolls also score high on cognitive empathy. Intellectually, they understand other people’s emotions and how to make them suffer. But they score low on affective empathy. They don’t feel others’ pain, so when they hurt you, they don’t care.

Talking about how trolls "score high" and "score low" on various parameters suggests that he's consulting some formal psychological study, but he's not telling you what it is. David Brooks Plagiarism Watch! Then Google around for the unnamed source (it turned out to be an Australian study by Natalie Sest and Evita March, "Constructing the cyber-troll: Psychopathy, sadism, and empathy," which appeared in Personality and Individual Differences, June 2017). The next step is to find out who else has referred to Sest and March; I fairly quickly found a very good candidate in a magazine I know Brooks reads, The Atlantic, in Luke O'Brien's "The Making of an American Nazi" (December 2017):

In recent years, psychologists have found a powerful connection between trolling and what’s known as the “dark tetrad” of personality traits: psychopathy, sadism, narcissism, and Machiavellianism. The first two traits are significant predictors of trolling behavior, and all four traits correlate with enjoyment of trolling. Research published in June by Natalie Sest and Evita March, two Australian scholars, shows that trolls tend to be high in cognitive empathy, meaning they can understand emotional suffering in others, but low in affective empathy, meaning they don’t care about the pain they cause. They are, in short, skilled and ruthless manipulators.

Bingo! I can't say why he chose to plagiarize it. It could have been simple laziness: O'Brien, in Atlantic style, doesn't footnote the thing, though he gives you enough information to Google it, and Brooks didn't feel like going to the trouble; but he also didn't want to credit O'Brien, so he just left it there as if to suggest he'd found the study himself. Whatever the reason, and in spite of that fool Ackman, it's pernicious all the same. Plagiarism it absolutely is, and of a rather more serious variety than dumb copying out of Wikipedia, since it deprives a really good young professional reporter of the credit he's due.

But an algorithmic approach like Turnitin would never catch it, because Brooks has hidden it in the folds of his rewriting, and Rufo will never find any plagiarism worth finding. I may keep my eyes open, though, to see whether he commits some himself.
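To make the contrast concrete, here is a toy sketch of the verbatim-matching idea, run on the passages quoted above. Turnitin doesn't publish its method, so this is an assumption and not Turnitin's algorithm: a generic count of the five-word sequences ("shingles") that two texts share word for word. The function names and the five-word window are hypothetical, chosen for illustration.

```python
import re

# Excerpts quoted above (curly apostrophes straightened for simplicity).
HOCHSCHILD = (
    "Sandy Jencks showed me the importance of getting the data right and of "
    "following where they lead without fear or favor . . . [Jencks] drove me "
    "much harder than I sometimes wanted to be driven."
)
GAY = (
    "[My thesis advisor, Gary King] reminded me of the importance of getting "
    "the data right and following where they lead without fear or favor "
    "[and my family] drove me harder than I sometimes wanted to be driven."
)
OBRIEN = (
    "Research published in June by Natalie Sest and Evita March, two "
    "Australian scholars, shows that trolls tend to be high in cognitive "
    "empathy, meaning they can understand emotional suffering in others, but "
    "low in affective empathy, meaning they don't care about the pain they cause."
)
BROOKS = (
    "Trolls also score high on cognitive empathy. Intellectually, they "
    "understand other people's emotions and how to make them suffer. But they "
    "score low on affective empathy. They don't feel others' pain, so when "
    "they hurt you, they don't care."
)


def shingles(text: str, n: int = 5) -> set[tuple[str, ...]]:
    """Return the set of overlapping n-word sequences ("shingles") in text."""
    # Lowercase, straighten curly apostrophes, and keep only word tokens,
    # so punctuation, brackets, and ellipses don't affect the comparison.
    words = re.findall(r"[a-z']+", text.lower().replace("\u2019", "'"))
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def shared_shingles(a: str, b: str, n: int = 5) -> int:
    """Count the n-word sequences that appear verbatim in both texts."""
    return len(shingles(a, n) & shingles(b, n))


if __name__ == "__main__":
    print("Hochschild vs. Gay:", shared_shingles(HOCHSCHILD, GAY))
    print("O'Brien vs. Brooks:", shared_shingles(OBRIEN, BROOKS))
```

On these excerpts the two acknowledgments share 12 five-word runs, while Brooks's sentences share none with O'Brien's, even though they carry over the same study, the same cognitive-versus-affective distinction, and the same conclusion. Exact-match counting sees the trivial case and is blind to the serious one.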

 
