Academic Integrity

AI Detectors

As soon as ChatGPT was released, there was a scramble to create detection tools, similar to text-matching tools such as TurnItIn®, which can alert instructors to the possibility that a student has copied material. One of the first tools to garner attention was GPTZero, created in December 2022 in an Etobicoke coffee shop by Princeton University computer science student Edward Tian during his winter break. In the intervening year, we have seen a flood of tools that purport to distinguish human-written from AI-generated work, but no tool has consistently performed adequately, let alone well. All tools generate both false positives and false negatives, and their accuracy rates are too low for their results to be trusted (Gewirtz, 2023; Watkins, 2023a).

Marc Watkins asserts that

AI text detectors are not analogous to plagiarism detection software, and we need to stop treating them as such. AI detectors rely on LLMs to calculate probability in their detection. Unlike plagiarism detection, there is no sentence-by-sentence comparison to another text. This is because LLMs don’t reproduce text—they generate it. False positives abound, and these unreliable AI detection systems are sure to further erode our relationships with our students. (Watkins, 2023b)

Watkins has a long list of cautions about using ad hoc AI checkers, including:

  • copyright implications (students give explicit consent for instructors to use plagiarism checkers);
  • privacy and security concerns (institutions establish business relationships, and so at least theoretically, they vet and can hold accountable companies such as TurnItIn®, but these new black-box tools have not yet been scrutinized); and
  • lack of rigorous testing for accuracy (Watkins, 2023b).

Another significant drawback to current AI detection tools is that they tend to misidentify the written work of students who are not native English speakers (Liang et al., 2023).

Obviously, if a student hands in an essay with a much more sophisticated writing style than is found in the short paragraphs that they’ve written in class, or demonstrates a command of English in their final project that is not borne out by the emails they send to their professor, there is reason to investigate further. However, given the burden of proof for undertaking student discipline for academic misconduct—and the potentially traumatizing effects on a student of being falsely accused—AI checkers are not currently useful tools in academic integrity enforcement.

ChatGPT and other LLM-based tools are not creating an academic misconduct issue, but they will perhaps serve as the tipping point to motivate institutions to implement a culture of intentional academic integrity, where strategies for ensuring academic integrity are explicitly taught and where course learning outcomes include mentions of ethics and integrity (as applied in the classroom and/or in the field). Subsequent sections will examine some approaches to assessment, from small tweaks to complete overhauls.
