A Postmortem of a Short Story Generator

21 April 2016

Can humans build a story-telling machine that can trick other human beings (a la the ‘Turing Test’)? Dartmouth College’s Neukom Institute for Computational Science is hosting a competition (“DigiLit”) to see if that can be done. I submitted a program (Architect) to this competition. I think it has a good shot at winning (~40%).

The idea behind DigiLit is simple. You build a machine that can accept an arbitrary noun prompt (“wedding”, “sorrow”, “car keys”, etc.). The machine then uses this noun prompt to generate a short story of 7000 words or fewer. This short story is then presented to two panels (each with 3 judges). If you can convince a majority of one panel that your story is human-generated, then you win.

This does seem like a difficult task. You must:

  1. Have the computer accept arbitrary input
  2. Generate a story using the input
  3. Make sure the story is coherent enough to be readable

Yet, I think I completed this task. After I finished writing my program (Architect, named after the program in the Matrix trilogy), I decided to test it out. I chose a random noun prompt, and wrote a story based on it. I also had my program generate a story using that same noun prompt. I then gave those two stories to 7 people. Story A was written by me, and Story B was written by the program.

This means that 43% of the humans (3 out of 7) were fooled. That’s pretty good. And it’s good enough for me to submit it to DigiLit.

That doesn’t mean that victory is assured, though. 43% isn’t exactly a majority. But consider that I really only need to convince 2 out of 6 judges to have a good chance of ‘convincing’ a whole panel of 3 judges (assuming that both of those judges sit on the same panel). That means I only need to convince 33% of all judges.

To be 100% certain of convincing a panel though, I would need to convince 3 out of 6 judges (so that no matter how the judges are distributed, at least 2 of the judges I convinced would sit on the same panel and form its majority). That does require me to be able to convince 50% of all judges. That doesn’t seem too likely.
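
As a rough sanity check on the panel arithmetic above, here is a minimal Python sketch. It assumes the six judges are split into two panels of three uniformly at random, which is my assumption and not something the competition rules described here confirm. The function name `panel_win_probability` is just illustrative.

```python
from fractions import Fraction
from itertools import combinations


def panel_win_probability(convinced, judges=6, panel_size=3):
    """Chance that at least one panel has a majority of convinced judges.

    Assumption: the judges are split into two equal panels, and every
    possible split is equally likely.
    """
    majority = panel_size // 2 + 1
    convinced_ids = set(range(convinced))  # judges 0..convinced-1 were fooled
    wins = 0
    total = 0
    # Choosing panel A fixes panel B (everyone else).
    for panel_a in combinations(range(judges), panel_size):
        in_a = len(convinced_ids & set(panel_a))
        in_b = convinced - in_a
        total += 1
        if in_a >= majority or in_b >= majority:
            wins += 1
    return Fraction(wins, total)


for k in range(1, 4):
    print(k, "convinced judges ->", panel_win_probability(k))
```

Under that assumption, convincing 2 judges wins a panel in 2 out of 5 possible splits, and convincing 3 judges always wins a panel.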

One main problem with my study is that the format of my test and the format of the DigiLit test are slightly different. In my test, each person reads both stories individually and then makes a choice. In DigiLit, the judges sit on a panel, have access to multiple stories (some human-generated and some machine-generated) so they can detect patterns and ‘tells’, and can talk to one another and share their insights. The mob will likely find more faults in a story than a single human can. So my study likely overestimates how good my program actually is.

But enough boring stats. Here’s the source code of Architect and a brief description of how it works:

  1. Have the computer accept input:
    • Plug the input into a madlib template: {INPUT} {LOCATION_TYPE}. For example:
      • Sorrow Academy.
      • Wedding Plaza.
      • Car Keys Museum.
      The story will therefore be about a generated LOCATION, and not directly about the noun prompt. Here, I'm assuming that when the program writes about "Sorrow Academy", readers will take it to be symbolic of the actual noun in question ('sorrow').

  2. Generate a story using that input:
    • Randomly pick an introductory sentence from a corpus of introductory sentences.
    • Load the program with several pre-written passages.
    • Randomly pick a few of those passages. Insert the LOCATION into the passages (so that the passages appear to be about the input in question).
    • Between each passage, insert a "transition passage" to link the two passages together (providing an illusion of continuity between the passages). The "transition passage" was also pre-written.
    • Randomly pick a conclusion sentence from a corpus of conclusion sentences.
    This isn't a story with any planned plot or coherence. The program has no idea what it's writing. But since there is an illusion of continuity between passages, humans are able to read a 'narrative'.

  3. Have the story be thematically coherent enough to be readable.
    • Each sentence of the story must have certain repeated symbols, themes, and characters within them, so that readers are more likely to assume that there is some purpose behind the words. (This approach was originally used in the Fairy Tale Generator to create interesting stories based on random paragraphs.)
    • Since I'm too lazy to hand-craft most of these sentences myself, I decided to use a preexisting generator (Abulafia's Film Noir Monologue generator) to generate most of the paragraphs, introductory sentences, and conclusion sentences. Since this generator already had a prebuilt theme in mind (a cynical detective trying to solve a mysterious case), the resulting output does have some sort of thematic coherence. I then edited most of the paragraphs and sentences to provide even more thematic coherence.
    • The 'transition passages', however, were handwritten by me. Again, their goal is to provide thematic coherence and to "fit" with the rest of the passages.
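
The three steps above can be sketched in a few lines of Python. This is a minimal illustration and not Architect's actual source: every corpus entry and every name (`LOCATION_TYPES`, `make_location`, `generate_story`, and so on) is a hypothetical stand-in. It also picks transitions at random, which is a simplification, since the real transition passages were handwritten for specific situations.

```python
import random

# Placeholder corpora: hypothetical stand-ins, not Architect's real text.
LOCATION_TYPES = ["Academy", "Plaza", "Museum", "Foundation"]
INTROS = ["It was a dark day in the big city."]
CONCLUSIONS = ["I shoulda got out when I had the chance."]
# Every passage mentions {location}, so the prompt keeps resurfacing.
PASSAGES = [
    "A stranger walked in and asked about the {location}.",
    "Word on the street was that someone wanted the {location} for himself.",
    "The files on the {location} were thinner than my patience.",
]
TRANSITIONS = [
    "I needed another lead, so I kept digging.",
    "Desperate for clues, I made a phone call.",
]


def make_location(prompt, rng):
    """Step 1: plug the noun prompt into the {INPUT} {LOCATION_TYPE} template."""
    return f"{prompt.title()} {rng.choice(LOCATION_TYPES)}"


def generate_story(prompt, num_passages=2, rng=None):
    """Step 2: intro + random passages joined by transitions + conclusion."""
    if rng is None:
        rng = random.Random()
    location = make_location(prompt, rng)
    chosen = rng.sample(PASSAGES, num_passages)
    paragraphs = [rng.choice(INTROS)]
    for i, passage in enumerate(chosen):
        if i > 0:
            # A transition goes between passages, never before the first one.
            paragraphs.append(rng.choice(TRANSITIONS))
        paragraphs.append(passage.format(location=location))
    paragraphs.append(rng.choice(CONCLUSIONS))
    return "\n\n".join(paragraphs)


print(generate_story("car keys", rng=random.Random(0)))
```

Step 3 (thematic coherence) does not show up as code here. In this sketch it would live entirely in how the corpora are written: the same narrator, setting, and mood in every entry.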

This was the computer-generated story that I presented to my readers during the test:

It was a dark day—maybe normal for this time of year, but today the big city felt even darker and more sinister.

Miss Kitty Rider prowled through my door like a tigress slinks into a Burmese orphanage—a pinup blonde with legs as far as you could want ’em. No dame her age could afford a coat like that, my money was right where my mind was: the gutter. She was to bad news what apple pie is to America. The dame was all business—before I could even close the office door, she told me she wants Investigator Blake dead. Turns out Blake is masterminding the gang violence, and wants to seize the Envy Foundation for himself by killing off all his rivals. I laughed at her; Blake may be a devil, but he’s a devil I can work with. I wasn’t going to sell him out.

I asked Miss Kitty Rider if there was another way to end the gang violence, without having to take out Investigator Blake. Miss Kitty Rider responded me that if I wanted to bring an end to the violence, then I needed someone who is outside of the system. She recommended a fresh recruit from the police academy who was ‘on the ball’. Mr. Simpson. I foolishly took her advice.

“An inside job…?” Simpson gasped timidly. “Well … No… not an inside job,” I growled. I could barely contain myself with this new guy. “Here’s the deal,” I muttered, “Why don’t I handle this case, while you make like a magnet … and flux off?”

Before I left, Mr. Simpson offered me the phone number to the office. He claimed that the office had helped him greatly and that it can help me too. Desperate for any clues, I took Mr. Simpson’s advice. I spoke to the receptionist for an hour, and jotted down everything she said.

I had little to go on this time … but the office did manage to dig something up. G-Men, fighting crime and resisting corruption, deciding that the best way to fight crime and resist corruption is to kill everyone who might be a criminal and might be corrupt. Their current goal is to ‘purify’ the Envy Foundation before moving onto other tourist attractions. I always knew that G-Men were a little ‘special’.

I shoulda got out when I had the chance.

Now, this story does have its faults. According to one of my readers, there isn’t really any coherent plot, and no actual attempt to explain what’s going on. And I agree with that reader. The stories that my program spits out are really more like ‘prose poetry’: the plot is really just an excuse to show off different evocative scenes. But I know that some people may like reading these stories anyway. Evocative scenes are enjoyable in their own right.

In addition, another reader praised the computer-generated story as having “linear progression”, and a “[natural] level of detail and organization and flow”. This suggests that the existence of some underlying structure may also help to make this story more appealing to read.

What is undeniable is that my computer has written a story.

My entry is based on a simple trick: Humans have a tendency to see patterns in everyday life, even when no patterns exist. The very act of me placing words right next to each other implies that the words must have some relationship with each other. So the humans create this relationship within their own minds. This tendency for humans to see patterns where none exist has a name: “apophenia”.

All I have to do is to provide some context and themes to help trick the humans into assuming that there must be meaning in the words that the program generates. Once that happens, the humans will then fill in the details by interpreting the program’s words. The computer can write total nonsense…and that’s okay, because the humans will just happily figure out the meaning of that nonsense. While the humans figure out that meaning, they thereby construct the story out of that nonsense. A story, therefore, appears from a mere collection of words.

Appendix A: Is This Story Of High-Quality?

I don’t know. It appears that if the judges think that a story is human-written, they will generally assume it to be better…

This sounds silly, but according to my data, the rating of the quality of the story depends on whether the judge assumes the story was written by a human.

This is a pretty odd result and suggests to me that the ‘origin’ of a story impacts how judges view its quality. This makes some sense: humans like reading what other humans have written, and if they assume that something is human-written, they are willing to think of it as good (even if it’s not). But at the same time, I do have a small sample size, and it’s possible that a different group of judges would determine the quality of a story more “objectively”.

By the way, there have been several peer-reviewed studies about how humans’ perceptions of news articles can change based on whether they were told the news article was written by a ‘human’ or by a ‘robot’, and I do plan on blogging on these articles later. I trust the results of those peer-reviewed studies over that of my unscientific study.

Appendix B: Can This Program ‘Scale’?

Using a single noun prompt, my program can generate 120 different short stories. Most people are not going to read all 120 stories. Instead, they’ll only read a few computer-generated stories before moving on to the next content to consume. So I’m fine with how the algorithm works currently. There’s no reason to spend time writing more code than is necessary.

That being said, the algorithm could “scale” upwards. If the program is given more passages, it could easily generate more short stories (or even longer stories, possibly even novels). Finding those passages (and making sure they are all thematically coherent) may be somewhat difficult to do, and would most likely require using passages from Creative Commons or public domain works.
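
To give a feel for how quickly the number of stories grows with the passage pool, here is a small counting sketch in Python. The corpus sizes are made-up numbers, not Architect's real ones, and I am not claiming this is how the figure of 120 arises. It assumes a story is one intro, an ordered selection of distinct passages, and one conclusion, with a single fixed transition between each pair of passages.

```python
from math import perm


def count_stories(intros, passages, picked, conclusions):
    """Count distinct stories under the assumptions stated above.

    perm(passages, picked) is the number of ordered selections of
    `picked` distinct passages out of `passages`.
    """
    return intros * perm(passages, picked) * conclusions


# Hypothetical corpus sizes, for illustration only.
small = count_stories(intros=3, passages=6, picked=3, conclusions=3)
large = count_stories(intros=3, passages=12, picked=3, conclusions=3)
print(small, large)
```

With these made-up sizes, doubling the passage pool from 6 to 12 takes the count from 1,080 to 11,880 stories.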

Note that the “transition phrases” will have to be made more ‘generic’ and reusable for different passages. Currently, I have to handwrite each transition phrase to handle a specific situation (for example, I handwrote the transition phrase that justifies our unnamed detective leaving Mr. Simpson’s place and calling the Office). Handwriting each transition phrase is a very laborious and tedious process. Having “generic” transition phrases, on the other hand, would enable me to focus on copying and pasting as many evocative scenes as possible into the generator.
