Tuesday, September 11, 2007

arXiv - Turkish Plagiarism - Plagiarism and falsified data slip into the scientific literature: a report

http://arstechnica.com/articles/culture/plagiarism-and-falsified-data-slip-into-the-scientific-literature.ars


Plagiarism and falsified data slip into the scientific literature: a report

By John Timmer Published: August 07, 2007 - 11:23PM CT

The challenges of scientific integrity

Scientific progress is conveyed primarily through peer-reviewed publications. These publications are the primary source of information for everyone involved in scientific research, allowing them to understand the current scientific models and consensus and making them aware of new ideas and new techniques that may influence the work they do. Because of this central role, the integrity of the peer review process is essential. When misinformation makes its way into the literature, it may not only influence career advancement and funding decisions; it can actually influence which experiments get done and how they are interpreted. Bad information can also cause researchers to waste time in fruitless attempts to replicate results that never actually existed.

Despite the danger represented by research fraud, instances of manufactured data and other unethical behavior have produced a steady stream of scandal and retractions within the scientific community. This point has been driven home by the recent retraction of a paper published in the journal Science and the recognition of a few individuals engaged in dozens of acts of plagiarism in physics journals. Ars has interviewed a number of the people involved in both of these cases, and we discuss their impact on the field and the prospects for preventing similar problems in the future.

Plagiarists run amok

Recently, Ars was informed that a number of papers with a set of overlapping authors were being withdrawn from the arXiv, a repository of publications and drafts in the physical sciences. We confirmed that several papers were no longer available and that their entries now lead to text that states, "This paper has been removed by arXiv administrators because it plagiarizes... " followed by a list of the sources of the plagiarized material (an example is here).

In at least one case, the final publication had been withdrawn but an earlier draft version was still available. Comparisons of the text (PDF) with the sources it was plagiarized from reveal the blatant nature of the fraud. Section 1 of that paper begins with an extensive copying of the introduction of a 2003 paper (PDF; copying starts with the second sentence of the introduction). Section 3 of the fraudulent work begins with a similarly large excerpt from the introduction of a different publication (PDF) that also dates from 2003. Although the arXiv has acted on the plagiarism, the fraudulent publication currently remains available at the Journal of High Energy Physics.

Ars contacted an arXiv administrator, who put us in touch with faculty at the Middle East Technical University in Ankara, Turkey, home of the authors of the fraudulent publications. Dr. Ozgur Sarioglu spoke on behalf of a group of METU faculty that also included Atalay Karasu, Ayse Karasu, and Bayram Tekin. They provided a PDF of the Journal of High Energy Physics article, marked up to reveal the source of much of the text. It contains material from at least a dozen different peer-reviewed works; the original material seems limited to most of the abstract and a small number of mathematical derivations that rephrase equations published elsewhere.

According to Dr. Sarioglu, two of the authors of this paper were graduate students with a prodigious track record of publication: over 40 papers in a 22-month span. Dr. Karasu, who sat on the panel that evaluated their oral exams, became suspicious when their knowledge of physics didn't appear to be consistent with this level of output. Discussions with Dr. Tekin revealed that the students also did not appear to possess the language skills necessary for this level of output in English-language journals (METU conducts its instruction in English).

This caused these faculty members to go back and examine their publications in detail, at which point the plagiarism became clear. "All they had done was literally take big chunks of others' work using the 'copy and paste' technique," Dr. Sarioglu said, "steal from here and there to cook up an Intro which is basically the same stuff in all their manuscripts, carry out some really trivial calculations such as taking derivatives of some simple functions, and write up the results in the format of a paper." The department chair was informed and started an internal investigation; the university's Ethics Committee has since become involved.

In the meantime, the faculty and administration at METU are attempting to do some damage control. The university's president personally sent a letter to the Journal of High Energy Physics requesting that the paper be withdrawn, a request that, as noted above, has yet to be acted upon. Meanwhile, the faculty members mentioned above are working with the arXiv administrators to ensure that any plagiarized work is removed.

How will this impact the field? Professor Paul Ginsparg at Cornell, who helped establish the arXiv, suggests that the impact will be minor. Because the fraudulent work was necessarily so derivative, it did not have a high profile or influence. "There's little effect on science," Dr. Ginsparg said, "since the people who produce high quality work don't need to plagiarize, and the people who do need to plagiarize don't produce high enough quality work to affect anything." Sarioglu is less sure, as the full extent of the plagiarism remains unclear. Most of the publications had additional authors beyond the two graduate students at the center of the scandal, and the investigations are just beginning to explore the larger connections. "All the work they had published on gr-qc [general relativity and quantum cosmology] plagiarizes something. Looking into these things we also found other cases; there are about 20 people who we know are plagiarizers."

--------------------------

Misleading data in a contentious field

One of the open questions in human biology is how an otherwise symmetric egg first obtains the positional information it needs to form specialized tissues at precise locations. It's a question with medical implications, as it may influence the use of cells from fertilized embryos for purposes such as genetic diagnostics and stem cell generation. Opinions on the topic range widely, and many of those in the field promote their interpretations of the currently ambiguous data with a great deal of fervor.



Dr. R. Michael Roberts knew that he was getting into a contentious field when Dr. Kaushik Deb showed him some preliminary data that suggested that regional information may exist as early as the first cell division in mice. But with contention came a high profile; detailed results were ultimately published in a February 2006 edition of the prestigious journal Science. They quickly attracted the interest of those with a stake in the field.

Some of that interest included suspicions; within a week of publication, Dr. Roberts had received an e-mail suggesting that a number of images in the publication were duplicates. Although that claim turned out to be incorrect, Dr. Roberts said he immediately informed Science and enlisted some of his coworkers to examine the images in greater detail. "There was enough there to worry me," Roberts said. "The images were similar enough that they could possibly be the same embryo photographed at a different time."

By April, formal accusations of scientific fraud, including suggestions that the images had been manipulated, were received by Science. At this point, the University of Missouri took over the investigation and locked down the relevant data. "I thought the university's initial reaction was really prompt, and they did things by the book," Dr. Roberts said. "The committee worked as quickly and as thoroughly as one could reasonably expect." By the time Roberts met with them in June, it was clear that very few of the raw images for the paper's data were available.

As a result of this meeting, Roberts and the committee agreed to send a letter to Science informing them that the data was suspect; they published a Statement of Editorial Concern in response (Roberts was generally pleased with the way Science handled matters). Deb left the lab in July and has since dropped out of contact. "I've informed the [Deb's] prior post-doc lab," Roberts said, but, "I don't know the status of things."

How did Dr. Deb manage to create the impression that he had generated a solid data set? Roberts suggests that a number of factors were at play. Several aspects of the experiments allowed Deb to work largely alone. The mouse facility was in a separate building, and "catching a mouse embryo at the three-cell stage had him in from midnight until dawn," Dr. Roberts noted. Deb was also on his second post-doc position, a time when it was essential for him to develop the ability to work independently. The nature of the data itself lent itself to manipulation. The raw data for these experiments consisted of a number of independent grayscale images that are normally assigned colors and merged (typically in Photoshop) prior to analysis. Finally, Roberts noted that this was a new avenue of research for his lab, and he feels that he might have been better able to pick up on anything unusual had the work been on a more familiar topic. Most people involved with research in the biological sciences are likely to have seen some of these factors in play at one time or another.

When the university's inquiry ended, Dr. Roberts was able to formally withdraw the paper, but it's clearly influenced the field in the meantime. Since its publication, Roberts noted, the paper "was being cited all over, over 30 times. It's even been cited after the Statement of Editorial Concern." He also believes that at least one research group has attempted to replicate some of the results.

Fraudulent impact

Plagiarism and fraudulent data are likely to cause an overlapping set of problems for the scientific community at large. In many ways, plagiarism is less harmful, as it necessarily involves the recycling of ideas that, to a degree, have already been accepted by the scientific community. The primary danger with plagiarism appears to be that it will further the advancement of the careers of undeserving individuals. In that way, it does actively harm science, as there is generally an excess of researchers competing for grants and faculty positions; any successes based upon plagiarized productivity will come at the expense of more deserving individuals.

The manufacture of falsified data can also lead to a similar sort of unfair career advancement, but its impact can also be more insidious and far-reaching. Fraudulent data may not only give the person who produced it an unfair advantage in obtaining publications, positions, and grant money; if it is accepted as valid by the community at large, it may have a general influence on decision-making in all of these areas. At a time when science funding is generally tight, fraudulent data can also cause limited resources to be spent on attempts to recapitulate or expand upon non-existent data. Thus, although both activities have the potential to distort the scientific enterprise, the dangers of plagiarism appear to be a subset of the potential damages that can arise from fraudulent data.

Both Dr. Ginsparg of Cornell and Dr. Roberts of Missouri commented on the interplay between the profile of the work and the problem of scientific fraud. In Ginsparg's case, his outlook was relatively optimistic, suggesting that plagiarism is largely incapable of harming science. In his view, plagiarized works published in low-impact journals may be less likely to be detected, but they also have little influence on the larger scientific community. If a work draws wider attention, any questionable features of it will likely become apparent quickly.

Referring to fraudulent data, Ginsparg said, "that has the potential to be more pernicious, but still doesn't affect the big picture in the long run since science is self-correcting—again, the more important the result, the more likely that someone will try to reproduce it." Roberts had a remarkably similar thought, but he appeared to view its implications in a different light. He focused on the ability of questionable data to continue to influence scientific thought in some of the less-prominent areas of study: "I think it [research fraud] occurs a lot more frequently than some would care to admit. I think if the work had been in an insignificant backwater, it might well have never been detected."

These two cases suggest that, while profile is important, science's self-correction may even reach the backwaters. It's clear that the Science publication attracted immediate attention because it provided information relevant to an area of active research that was home to a number of competing scientific models. But the experience at METU suggests that at least some scientists are willing to police their own, apparently motivated by nothing more than a desire to keep the field clean.

---------------------------------


An ounce of prevention

Plagiarism is a relatively easy problem to detect, in part because it's an issue that affects the academic community well beyond science. Computer algorithms to detect duplications of text have already proven successful at detecting plagiarism in papers in the physical sciences. The arXiv now uses similar software to scan all submissions for signs of plagiarized text. As this report was being prepared, the publishing service Crossref announced that it would begin a pilot program to index the contents of the journals produced by a number of academic publishers in order to expose them for the verification of originality. Thus, catching plagiarism early should be getting increasingly easy for the academic world.
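
To make the idea of duplicate-text detection concrete, here is a minimal Python sketch that compares overlapping word sequences ("shingles") between two passages. It is an illustration only: the function names, the five-word window, and the scoring rule are assumptions made for this example, and nothing here describes the software actually used by the arXiv or Crossref.

```python
import re


def shingles(text, n=5):
    """Return the set of n-word sequences (shingles) in text, lowercased."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    # Texts shorter than n words yield an empty set.
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def containment(suspect, source, n=5):
    """Fraction of the suspect's shingles that also appear in the source.

    The window size n=5 is an arbitrary choice for this sketch.
    """
    suspect_set = shingles(suspect, n)
    if not suspect_set:
        return 0.0
    return len(suspect_set & shingles(source, n)) / len(suspect_set)
```

A score near 1.0 means that nearly every five-word run in the suspect passage also occurs in the source, which is the pattern a copied-and-pasted introduction would produce; a score near 0.0 means the two passages share almost no phrasing. A real system would also have to index a large body of literature and cope with light rewording, which this sketch does not attempt.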

Both Ginsparg and Roberts point out that part of the plagiarism problem may stem from different cultural norms in an increasingly international scientific community. It's a problem that Dr. Roberts thinks may be exacerbated by the fact that English is a second language for many scientists; the temptation to "borrow" well-written English phrases can be strong. He suggests that part of the solution is to expand the role of courses in scientific ethics at universities. Roberts feels strongly that these courses need to be strengthened by the inclusion of case studies and kept up-to-date by adding a discussion of issues such as image manipulation via software.

As a result of his experience, Dr. Roberts has instituted strict guidelines for the handling of digital images in his lab, but he finds that few others have considered instituting preventative practices. "Investigators have to think about the integrity of their data and making sure that raw data, as much as possible, is put in a read-only file," he said. "The initial raw image needs to stay in a lock box." Again, however, this solution does not extend to the publishing community; Roberts never had to provide any raw data as part of the peer review process, an experience that's typical for most submitted manuscripts.
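
One way a lab might approximate the "lock box" Roberts describes is to record a checksum of each raw image when it is acquired and then remove write permission from the file. The Python sketch below is a hypothetical illustration under that assumption; the function names are invented for this example, and it is not a description of the guidelines used in Roberts' lab.

```python
import hashlib
import os
import stat


def sha256_of(path):
    """Return the SHA-256 hex digest of the file at path."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large image files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def lock_raw_file(path):
    """Record a checksum for a raw file, then mark the file read-only."""
    checksum = sha256_of(path)
    os.chmod(path, stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
    return checksum


def is_unmodified(path, recorded_checksum):
    """Check whether the file still matches the checksum recorded earlier."""
    return sha256_of(path) == recorded_checksum
```

File permissions alone are only a soft barrier, since the file's owner can restore write access. The recorded checksum is what lets someone else confirm later that a raw image is unchanged, so it would need to be stored somewhere the person handling the images cannot alter.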

Until journals institute standards that are likely to catch more instances of fraud, the burden of maintaining scientific integrity will fall outside of the peer review process and remain in the hands of individual investigators. It will require them to strike a balance between encouraging independence and the exchange of data and keeping close watch on those who work for them while enforcing robust standards for data maintenance. To an extent, it seems to require a paranoia that's somewhat antithetical to the open exchange of scientific ideas. As Roberts said, "I'm going to be much more suspicious of things from now on, and I'm not sure that's entirely a good thing. This takes a huge piece out of you in terms of your optimism about science."