
Software Analyzing Texts?

Can software accurately analyze the writing style of an author to determine if he or she wrote a specific work? Maybe…
Open source app can detect text's authors
http://www.theregister.co.uk/2013/02/22/author_detection_uni_adelaide/
A group of Adelaide researchers has released an open-source tool that helps identify document authorship by comparing texts.

While their own test cases – and therefore the headlines – concentrated on identifying the authors of historical documents, it seems to The Register that any number of modern uses of such a tool might arise.

The two test cases the researchers drew on in developing their software, on Github here, were a series of US essays called The Federalist Papers, and the Letter to the Hebrews in the New Testament.

The Federalist Paper essays were written in the lead-up to the drafting of the US Constitution, by Alexander Hamilton, James Madison and John Jay. Of the 85 essays, the authorship of 12 is disputed and one has generally been attributed to Jay.

Professor Derek Abbott of the University of Adelaide explains the results: "We've shown that one of the disputed texts, Essay 62, is indeed written by James Madison with a high degree of certainty.

"But the other 12 essays cannot be allocated to any of the three authors with a similarly strong likelihood. We believe they are probably the result of a certain degree of collaboration between the authors, which would also explain why there hasn't been scholarly consensus to date."
I love research such as this. One of the problems we will face with online courses (and even traditional courses) is that students might be more tempted to submit the works of others as their own. You can buy almost anything online, including term papers and reports. What if software could flag works as questionable? That would be pretty valuable.

The challenge might be amassing sufficient amounts of verified text to establish a pattern, but we could always ask students to write a few short samples. Their online forum posts would also give us some sense of how a student writes when a grade isn't at stake.
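
To make "establishing a pattern" a little more concrete, here is a rough sketch in Python. It is not the Adelaide group's method, which I have not reproduced here. It is only a common stylometric baseline: count how often a writer uses ordinary function words and compare two texts by those rates. The word list, the function names, and the idea of using cosine similarity as the score are all my own assumptions for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical short list of function words, chosen only for illustration.
# Real stylometric studies typically use much longer lists.
FUNCTION_WORDS = [
    "the", "of", "and", "to", "a", "in", "that", "is",
    "it", "for", "but", "not", "with", "as", "on",
]


def profile(text):
    """Return the relative frequency of each function word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid dividing by zero on empty input
    return [counts[w] / total for w in FUNCTION_WORDS]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length frequency vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # no function words found in at least one text
    return dot / (norm_a * norm_b)


def style_similarity(known_text, questioned_text):
    """Score how closely two texts match in function-word usage."""
    return cosine_similarity(profile(known_text), profile(questioned_text))
```

In practice the known text would be a student's verified samples and the questioned text the submitted paper. A low score would only be a reason to take a closer look, not proof of anything, and short samples will give noisy results.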

The research and the software are both freely available.

Free software means there is a high likelihood that other scholars will test both the software and the researchers' conclusions. The more the software is tested, the more likely it is to be improved. It might be reasonable to expect private industry and public agencies to test the software as well.
In the research paper, published in full at PLOSOne, the group notes that author attribution is a question that's stretching beyond academia in the modern era.

http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0054998
"Due to an increase in the amount of data in various forms including emails, blogs, messages on the internet and SMS, the problem of author attribution has received more attention. In addition to its traditional application for shedding light on the authorship of disputed texts in the classical literature, new applications have arisen such as plagiarism detection, web searching, spam email detection, and finding the authors of disputed or anonymous documents in forensics against cyber crime," the researchers write.

They note that further research would be needed to test their methodology against modern texts – but with the software offered for free, The Register can easily imagine the software getting a workout by any number of interested parties.
Free software? I know I'm curious enough to experiment with some public domain texts. After all, maybe Chaucer didn't write all those poems!
