Dispatches: Let's get serious about plagiarism

 * By Awadewit, Elcobbola, Jbmurray, Kablammo, Moonriddengirl and Tony1

Plagiarism, as Wikipedia's article on the topic explains, "is the use or close imitation of the language and ideas of another author and representation of them as one's own original work." At best it is intellectual sloppiness; at worst it is outright theft. As Robin Levin Penslar notes in Research Ethics: Cases and Materials, "The real penalty for plagiarism is the abhorrence of the community of scholars." Plagiarism can also bring a whole community into disrepute. Wikipedia's editors should create their own articles, not adopt the work of others. This is easy to recommend, but plagiarism is not always as simple as it first seems, and it is often committed inadvertently. The best way to prevent plagiarism is to understand clearly what it is, how to avoid it, and how to address it when it appears.

Understanding plagiarism
Wikipedia is not a primary source and contains no original research; therefore, everything that appears on Wikipedia should be rooted in a reliable source. The problem with plagiarism is not that it involves the use of other people's ideas. The problem is that it misrepresents other people's words or ideas by presenting them as though they were "an editor's own original work". A citation alone does not prevent plagiarism. If contributors cite a sentence but do not use quotation marks to show that its wording duplicates the source, they may still be plagiarizing. Citations are generally understood to identify the source of information, not to license copying the original wording.

There are three major ways to plagiarize:
 * 1) Failing to acknowledge the source of quotations and borrowed ideas;
 * 2) Failing to clearly mark copied language with quotation marks;
 * 3) Failing to adapt a summary or paraphrase sufficiently, and thus following the wording of a source too closely.

Plagiarism and copyright infringement
Plagiarism is not the same as copyright infringement: material can be plagiarized from both copyrighted and public domain sources. One report about a plagiarism scandal on Wikipedia claimed that "Wikipedia editors ... declared a handful [of the allegedly plagiarized articles] to be OK because copied passages came from the public domain." If this was indeed the reaction of Wikipedia editors, they were mistaken. To illustrate, consider the famous opening line of Jane Austen's novel Pride and Prejudice (1813): "It is a truth universally acknowledged, that a single man in possession of a good fortune, must be in want of a wife." The text of this novel, like the text of the 1911 Encyclopedia Britannica, is in the public domain. However, these are Austen's words, and even though no one owns the copyright to them any longer, we need to acknowledge that the wording is hers. Wikipedia editors who inserted this sentence into an article without quotation marks would be plagiarizing Austen. Apart from the ethical need to credit her for her words, Wikipedia has a scholarly duty to tell its readers where such a sentence comes from, including the page number on which it appears in the source.

Wikipedia policies say much about copyright violation, but far less about plagiarism. The guideline on the topic was written only last year and has yet to be adopted by the community. However, Wikipedia's co-founder Jimmy Wales took a clear stand on the issue in 2005: "Let me say quite firmly that for me, the legal issues [surrounding plagiarism] are important, but far far far more important are the moral issues. We want to be able, all of us, to point at Wikipedia and say: we made it ourselves, fair and square."

What to cite: the "common knowledge" exception
Not every fact in a Wikipedia article requires attribution. When a fact is "common knowledge", that is, generally known, repeating it is not plagiarism, even if contributors learned it from a specific reference. For example, it is commonly known that Emily Dickinson published very few poems during her lifetime. Generally, information is considered common knowledge if it appears in many sources, especially general reference sources, and is easily found. It is also acceptable to reproduce non-creative lists of basic information, such as an alphabetical directory of the actors appearing in a film. Wikipedia's verifiability policy encourages editors to cite such information, but failing to do so is not plagiarism.

Common knowledge and non-creative lists of basic facts do not "belong" to a source, so they need no attribution to avoid plagiarism. Less commonly known information, opinions and creative text do need attribution. Likewise, even the creative presentation of common knowledge belongs to its original author. Contributors can safely reuse the fact but not the language. The exceptions are titles, such as those of jobs or creative works, and language utterly devoid of creativity, such as a common phrase. From a copyright standpoint, the level of creativity required to claim ownership is minimal. The United States Supreme Court has indicated that under US copyright law, which governs copyright matters on Wikipedia, "[t]he vast majority of works make the grade quite easily, as they possess some creative spark, 'no matter how crude, humble or obvious' it might be." Similarly, most text will be creative enough that replicating it without attribution is plagiarism. So text such as "Dickinson was born on December 10, 1830" can be copied without quotation marks, but editors should not lean too heavily on the presumption that text is uncreative. Nor can anyone copy an entire source this way by claiming that it is "common knowledge" or uncreative text. In such cases, the question can come down to the length of a string of exactly copied words. Experienced editors develop a sense of the point at which failing to attribute becomes dishonest.

Less commonly known facts or interpretations of facts must be cited to avoid plagiarism, and creative text must either be quoted or properly revised.

Avoiding plagiarism
To construct articles that read smoothly while still remaining faithful to their sources, it is essential to learn how to properly use other people's ideas and words. Wikipedia contributors need to know when to give credit, how to adapt source material so that it can be used in an article, and when to use quotations.

Quotation
When editors want to use verbatim excerpts of a source, there is one simple way to avoid plagiarism: use direct quotations. The source's words should be reproduced exactly as they appear in the original, enclosed in quotation marks, and followed by an inline citation. However, direct quotations should not be overused. When the sources are not free, heavy quotation risks copyright infringement. Wikipedia's non-free content guidelines offer some guidance on when to use direct quotations and remind us that "[e]xtensive quotation of copyrighted text is prohibited." Even with free sources, overusing direct quotation produces articles that are little more than collections of quotations. The result can be fragmentary: the broader context of the quoted material is unclear, and readers are left to piece the information together across frequent shifts in writing style.

Quotations should generally be used in the following situations:
 * "When language is especially vivid or expressive"
 * "When exact wording is needed for technical accuracy"
 * "When it is important to let the debaters of an issue explain their positions in their own words"
 * "When the words of an important authority lend weight to an argument"
 * "When the language of a source is the topic of your discussion"

Adapting sources: paraphrasing and summarizing
Source text is usually adapted through a combination of paraphrase and summary. The two techniques differ mainly in their level of detail. A summary is typically used for longer stretches of text. It covers only the major points of a passage, omitting examples and definitions or touching on them lightly, and it is generally expected to be considerably shorter than the original. A paraphrase, by contrast, usually stays closer to the original and may be nearly as long as, or even longer than, the source.

Adapting source text, whether by paraphrasing or summarizing, is a valuable skill, and contributors to Wikipedia need to be alert to the potential for inadvertent plagiarism. Many editors believe that by changing a few words here or there—or even by changing a great number of the words found in the original source—they have avoided plagiarism. This is not necessarily the case. Nor does the mere rearrangement of clauses, sentences, or paragraphs avoid the problem.

In this example, Wikipedia's article text is an attempt at paraphrasing the source. However, almost all of the original word choice, word order and sentence structure is retained.
Problems in paraphrasing

Source:
 * "A statement from the receiver, David Carson of Deloitte, confirmed that 480 of the 670 employees have been made redundant ... At least 100 Waterford Crystal employees are refusing to leave the visitors' gallery at the factory tonight and are staging an unofficial sit-in. The employees say they will not be leaving until they meet with Mr Carson. There were some scuffles at one point and a main door to the visitors' centre was damaged ... Local Sinn Féin Councillor Joe Kelly, who is one of those currently occupying the visitors' gallery, said the receiver had told staff he would not close the company while there were interested investors."

Wikipedia article:
 * "A statement issued by the receiver, Deloitte's David Carson, confirmed that, of the 670 employees, 480 of them would be laid off. The workers responded angrily to this unexpected decision and at least 100 of them began an unofficial sit-in in the visitors' gallery at the factory that night. They insisted they would refuse to leave until they had met with Carson. Following the revelations, there was a minor scuffle during which the main door to the visitors' centre was damaged. Local Sinn Féin Councillor Joe Kelly was amongst those who occupied the visitors' gallery."

Analysis:
 * "A statement issued by the receiver, Deloitte's David Carson, confirmed that, of the 670 employees, 480 of them would be laid off" vs. "A statement from the receiver, David Carson of Deloitte, confirmed that 480 of the 670 employees have been made redundant". – The structure of Wikipedia's statement is essentially the same as the original. Substituting a few words and phrases and slightly reordering others is not enough to constitute a paraphrase.
 * "They insisted they would refuse to leave until they had met with Carson" vs. "The employees say they will not be leaving until they meet with Mr Carson". – The structure of this sentence is the same.
 * "there was a minor scuffle during which the main door to the visitors' centre was damaged" vs. "There were some scuffles at one point and a main door to the visitors' centre was damaged". – The structure and language of the two sentences are the same.
 * "Local Sinn Féin Councillor Joe Kelly was amongst those who occupied the visitors' gallery" vs. "Local Sinn Féin Councillor Joe Kelly, who is one of those currently occupying the visitors' gallery". – This slight rewording does not change the fact that the underlying structure and language are the same. Minor changes, such as "is one of those" to "was amongst those" and "currently occupying" to "occupied", are not enough to constitute an original rewriting of the passage.

Good adaptation practice
For both plagiarism and copyright, the author of a text "owns" more than the precise, creative language he or she uses. The author also owns less tangible creative features of presentation, which may include the structure of the piece and the choice of facts. For plagiarism, but not copyright, the author also "owns" the facts or his or her interpretation of them, unless these are common knowledge as described above. Revising to avoid plagiarism therefore means completely restructuring the source's word choice and arrangement while giving due credit for the ideas and information taken from it.

In this paraphrase, the language and structure of the passage have been significantly altered, making it an original expression of the ideas. The ideas have, of course, been properly credited.

Source:
 * "In earlier times, surveillance was limited to the information that a supervisor could observe and record firsthand and to primitive counting devices. In the computer age surveillance can be instantaneous, unblinking, cheap, and, maybe most importantly, easy." – Carl Botan and Mihaela Vorvoreanu, "What Do Employees Think about Electronic Surveillance at Work?", p. 126

Paraphrase:
 * "Scholars Carl Botan and Mihaela Vorvoreanu claim that the nature of workplace surveillance has changed over time. Before the arrival of computers, managers could collect only small amounts of information about their employees based on what they saw or heard. However, because computers are now standard workplace technology, employers can monitor employees efficiently (126)."

This adaptation, from the featured article about Thomas Eakins' The Swimming Hole, displays attribution of opinion and uses a combination of paraphrase and quotation:
 * The Swimming Hole represented the full range of Eakins' techniques and academic principles. He used life study, photography, wax studies, and landscape sketches to produce a work that manifested his interest in the human form. Lloyd Goodrich (1897–1987) believed the work was "Eakins's most masterful use of the nude", with the solidly conceived figures perfectly integrated into the landscape, an image of subtle tonal construction and one of the artist's "richest pieces of painting". Another biographer, William Innes Homer (b. 1929), was more reserved and described the poses of the figures as rigidly academic. Homer found inconsistencies in paint quality and atmospheric effect, and wrote that the painting was unsuccessful in reconciling antique and naturalistic ideals. For him, "it is as though these nudes had been abruptly transplanted from the studio into nature".

Unfortunately, there is no hard and fast rule for how much revision is necessary to avoid plagiarizing. In evaluating copyright concerns, the United States courts adopt a "substantial similarity" test that compares the pattern and sequence of two works, finding such similarity where "the ordinary observer [reading two works], unless he set out to detect the disparities, would be disposed to overlook them, and regard their aesthetic appeal as the same." Even if all of the language is revised, a court may find copyright infringement under the doctrine of "comprehensive non-literal similarity" if "the pattern or sequence of the two works is similar". Likewise, plagiarism may exist if readers comparing the two works would come away with a sense that one is copied from or too heavily based on another.

Editors should always compare their final drafts with the sources they have used to make sure that they have not accidentally come too close in language and structure or failed to attribute when necessary.

Research and writing methods: tips for avoiding plagiarism
One way editors can reduce the tendency to reuse text is to avoid copying and pasting source text into their working drafts. Instead, editors should gather their notes, excerpts and other source materials and organize them by topic, either in hard copy or in an electronic filing system. Editors should then read and absorb what the sources say before writing a draft of each topic in their own words. These drafts can then be assembled according to the editor's own organizational scheme. Material can be organized in many ways, and editors should not slavishly follow a source's structure. That applies both to the overall organization and to the composition and arrangement of sentences and paragraphs within each section. This method reduces the temptation to adopt the sources' verbatim language and organization, and also makes doing so harder.

At the same time, editors may find it useful to take notes verbatim, in quotation marks, if they will not have access to the source while writing their final draft. Notes paraphrased in the editor's own words can backfire: when drafting from them, an editor may unknowingly restore some of the author's original wording. Being able to see at a glance exactly how the source was written helps avoid this.

Use multiple sources, if possible. Editors who rely on a single source may find it harder to avoid following its text too closely, because they are limited to the details selected by that source's author. Revising and reorganizing a single source enough to avoid plagiarism or copyright infringement is not impossible, but it is more difficult.

Spotting plagiarism
Editors should be careful not to add plagiarized material to Wikipedia, and can help to protect the integrity of the project by spotting plagiarism and helping to correct it. When large sections of a source are copied word-for-word into an article, it is often easy to spot and repair. The use of ideas or uncommon facts without credit, possibly the most common form of plagiarism, can be repaired by sourcing. Detecting and dealing with subtler forms of plagiarism may be more challenging, but is usually possible.

Red flags for plagiarism include:
 * Inconsistent authorial voice: Although many articles on Wikipedia are multi-authored, sudden switches in tone throughout an article may still be a sign of plagiarism. For example, if the "History" section in an article on a city sounds like a tourist brochure and the "Climate and geography" section is filled with highly technical and jargon-filled language, readers might suspect that sources have been followed too closely. A particular tip-off is the sudden introduction of sophisticated text or ideas that seem inconsistent with the authorial voice of the material surrounding it.
 * Inconsistent language: If the tone of an article or passage is too colloquial or does not feel "right", for example because jargon or idioms are used incorrectly, a source may have been misused.
 * Atypical elegance: A reader may have cause for concern if a section of an article seems to have been written "too well". Much writing on Wikipedia is not at the level of professional publications. Readers who suddenly come across professional-level writing on Wikipedia, with no spelling or grammatical errors, may therefore want to investigate further. Polish is less of a red flag in an article that has been extensively copy-edited, for example during the featured article process. Even extensive copy-editing, however, does not guarantee the absence of plagiarism.
 * Rapid maturation: Fully developed articles that appear in very few edits may signal plagiarism. While sometimes editors construct articles in sandboxes or off-wiki, sometimes a sudden maturity of text signals that material has quickly been added without careful attention to the issue of plagiarism.

If you suspect plagiarism, you may wish to start by checking the article's history. If the article has a multi-authored feel but appears to be largely single-authored, there could be reason for concern, as this may suggest a contributor has borrowed too heavily from the diction of multiple sources. It may be worth checking the contribution history of an editor across a number of articles, to see if there is a discernible authorial voice or if there is a pattern of such inconsistency. There may be a history of such issues on the editor's talk page.

Another good starting point is to review the article's sources. Particularly when plagiarism results from misunderstanding rather than intent to deceive, a contributor may clearly identify the sources he or she has plagiarized, and even link to them. If a source is in another language, for instance, the contributor may mistakenly believe that translation alone is enough revision to eliminate concerns of plagiarism. On the contrary, the obligation to credit the authors of foreign-language texts for their creative expression, information and ideas remains whether or not the work is free. If the work is non-free, direct translation is also likely to be a copyright violation. Concerned readers can also use search engines and automated plagiarism-detection tools. When searching manually, it helps to search for short passages taken from the article. However, some results found this way may come from mirrors and forks of Wikipedia itself, particularly if the article is not newly created.
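Automated checks like those mentioned above typically rest on a simple idea: finding runs of consecutive words that two texts share. The Python sketch below is a minimal illustration under stated assumptions and is not a real plagiarism-detection tool. The function names (`words`, `shared_ngrams`, `longest_copied_run`) and the default five-word window are hypothetical choices made for this example. The sketch can only flag verbatim overlap. Close paraphrase that reorders words, and unattributed use of ideas, still require a human to compare the article against its sources.

```python
import re
from difflib import SequenceMatcher


def words(text: str) -> list[str]:
    """Lowercase the text and split it into words, dropping punctuation
    other than apostrophes."""
    return re.findall(r"[\w']+", text.lower())


def shared_ngrams(source: str, article: str, n: int = 5) -> set[tuple[str, ...]]:
    """Return every run of n consecutive words found in both texts.
    The default n=5 is an arbitrary, illustrative threshold."""
    def grams(ws: list[str]) -> set[tuple[str, ...]]:
        return {tuple(ws[i:i + n]) for i in range(len(ws) - n + 1)}
    return grams(words(source)) & grams(words(article))


def longest_copied_run(source: str, article: str) -> str:
    """Return the longest sequence of consecutive words shared by both texts."""
    a, b = words(source), words(article)
    # autojunk=False stops difflib from ignoring frequent words such as "the".
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    match = matcher.find_longest_match(0, len(a), 0, len(b))
    return " ".join(a[match.a:match.a + match.size])


if __name__ == "__main__":
    # Sentences from the Waterford Crystal example in this article.
    source = ("There were some scuffles at one point and a main door "
              "to the visitors' centre was damaged")
    article = ("there was a minor scuffle during which the main door "
               "to the visitors' centre was damaged")
    print(longest_copied_run(source, article))
    print(len(shared_ngrams(source, article)))
```

Run on the Waterford Crystal sentences, the sketch reports the eight-word run "main door to the visitors' centre was damaged" and the four five-word sequences it contains. A long shared run like this is a cue to look more closely, not a verdict. As noted earlier, there is no fixed length that separates acceptable overlap from plagiarism.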

Addressing plagiarism
Maintenance templates can be added to the top of a suspect section or article to draw attention to the problem, and concerns can be raised at an appropriate WikiProject or forum. However, just as Wikipedia currently has no clear guideline or policy on plagiarism, it has no clear forum for addressing plagiarism concerns. Wikipedia:WikiProject Copyright Cleanup can assist where plagiarism coexists with copyright infringement. Even where it doesn't, the project's members may be able to help with plagiarism.

If an article seems to follow the language and structure of another work too closely, first consider whether the problem is copyright infringement or plagiarism. If the source is not free and the text may pose a legal concern for Wikipedia, follow the procedures set out in Wikipedia's copyright violations policy. If the source is free, steps should be taken to remedy the plagiarism. Wikipedia's proposed guideline on plagiarism suggests politely discussing concerns with the contributor. Contributors who persist in plagiarism after being made aware of the problem may need to be addressed through a request for comment. If the contributor proves disruptive, a report at the administrators' incidents noticeboard may be needed. The plagiarism itself also needs to be repaired as soon as possible. If the passage can be attributed, revised or turned into a usable quotation, it should be. Editors who discover the problem but cannot repair it, or are unsure how to address it, should bring it to the attention of other contributors.

As the main page says, Wikipedia is "the free encyclopedia that anyone can edit". Anyone can, and should, repair plagiarism.

Links

 * WikiProject Copyright Cleanup
 * Wikipedia:Close paraphrasing
 * Plagiarism.org
 * Quoting, paraphrasing and summarizing, Purdue University