Statistical Approaches Are Actually Pretty Good

A "Mea Culpa" Regarding One Of My Predictions About Text Generation

18 June 2020

For some time, I have written blog posts on dev.to about text generation, and even went so far as to collect them into an online e-book. In one of the chapters of that e-book, written in 2016, I made predictions about the future of text generation. One of those predictions was “Statistical Approaches Will Be Part Of A ‘Toolkit’ In Text Generation”.

Statistical approaches are good at producing text that looks human-ish. They are very effective at producing evocative words, phrases, and sentences. However, the generated text quickly devolves into gibberish with even short-term exposure. It could not “scale” upward to novel-length works.
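To make that concrete: the classic statistical approach here is something like a word-level Markov chain, which picks each next word using only immediate local context - which is exactly why the output sounds plausible phrase-by-phrase but drifts into nonsense over longer spans. A minimal illustrative sketch in Python (the toy corpus and function names are mine, not from any particular project):

```python
import random
from collections import defaultdict

def build_chain(text):
    """Map each word to the list of words observed to follow it."""
    chain = defaultdict(list)
    words = text.split()
    for current, nxt in zip(words, words[1:]):
        chain[current].append(nxt)
    return chain

def generate(chain, start, length=20):
    """Walk the chain, picking each next word by observed frequency
    (duplicates in the follower list act as weights)."""
    word, output = start, [start]
    for _ in range(length - 1):
        followers = chain.get(word)
        if not followers:
            break
        word = random.choice(followers)
        output.append(word)
    return " ".join(output)

corpus = "the cat sat on the mat and the dog sat on the rug"
print(generate(build_chain(corpus), "the"))
```

Each individual transition is plausible, but nothing enforces coherence beyond a two-word window - hence the gibberish at scale.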

That didn’t mean statistical approaches (e.g., machine learning) were worthless. Indeed, the fact that they can produce evocative text is, well, pretty cool. But you would need to combine them with another approach (‘structural modeling’, e.g. hardcoded text generation logic) - essentially using machine learning as part of a ‘toolkit’ rather than as the sole approach.
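Here is a toy sketch of that “toolkit” idea: hardcoded structure guarantees global coherence, while a statistical component fills in local color. The template and the stub below are invented purely for illustration - a real hybrid system would swap the stub for a Markov chain or a neural model:

```python
import random

# Hardcoded narrative skeleton ("structural modeling"): the overall
# shape of the text is fixed by the programmer.
TEMPLATE = "{hero} travelled to {place}. {flavor}"

FLAVOR_SENTENCES = [
    "The wind smelled of rain and old iron.",
    "Nobody there remembered the old roads.",
]

def pick_flavor(candidates):
    """Stand-in for the statistical component: in a real hybrid system,
    this slot would be filled by a learned generator, with the template
    guaranteeing the overall structure stays coherent."""
    return random.choice(candidates)

print(TEMPLATE.format(
    hero="Alice",
    place="the ruined city",
    flavor=pick_flavor(FLAVOR_SENTENCES),
))
```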

That was true back in 2016. It’s still true now. My problem was assuming that it would always be true…and only now do I realize my error.

It’s like predicting in the 1850s that people will never be able to develop airplanes. You would have been correct in the 1860s, and even in the 1870s. But one day, airplanes did get invented, and your prediction would be proven wrong.

When OpenAI announced on February 14th, 2019 that it had created a neural network that could generate text (the infamous GPT-2), I was flabbergasted at how coherent the output was - even if the examples were indeed hand-selected. Sure, there were still flaws in the generated text. But it was still better than what I originally expected. In a GitHub issue for my own text generation project, I wrote at the time:

Even though there are still subtle flaws in the neural network they have right now, those subtle flaws can be fixed given enough time and resources. OpenAI have demonstrated what is possible, and what is possible will wind up being inevitable.

Of course, I didn’t post a “Mea Culpa” back then. Just because something is inevitable doesn’t mean it will happen soon enough to be practical.

One month after OpenAI’s announcement of GPT-2, Rich Sutton would write a blog post entitled The Bitter Lesson.

The bitter lesson is based on the historical observations that 1) AI researchers have often tried to build knowledge into their agents, 2) this always helps in the short term, and is personally satisfying to the researcher, but 3) in the long run it plateaus and even inhibits further progress, and 4) breakthrough progress eventually arrives by an opposing approach based on scaling computation by search and learning. The eventual success is tinged with bitterness, and often incompletely digested, because it is success over a favored, human-centric approach.

Of course, Rich intended for this paragraph to advocate in favor of “scaling computation by search and learning” (the “statistical approach”)…but I could also read it in the opposite direction: if you want results now, not at some unspecified time in the future, you want to build knowledge into your agent (“structural modeling”, or a hybrid approach). Yes, your technology will be declared obsolete when the “statistical approach” prevails, but you don’t know when that will happen, and your code is going to be obsolete anyway due to advances in technology. So embrace the Bitter Lesson. If you want to write a computer program that writes text now…well, then do it now! Don’t wait for innovation to happen.

Then, on May 28th, 2020, OpenAI announced that it had built GPT-3, a massive improvement over GPT-2 that fixes several of its faults. The generated text does still turn into gibberish, eventually…but now it takes medium-term exposure before the gibberish nature shows through, which is remarkable to me. Interestingly, GPT-3 is also very good at “few-shot learning”: given a few examples, it can quickly figure out what type of text you want it to generate and then generate it. Here’s a cherry-picked example:

Human: To do a “farduddle” means to jump up and down really fast. An example of a sentence that uses the word farduddle is:

GPT-3: One day when I was playing tag with my little sister, she got really excited and she started doing these crazy farduddles.

Human: A “yalubalu” is a type of vegetable that looks like a big pumpkin. An example of a sentence that uses the word yalubalu is:

GPT-3: I was on a trip to Africa and I tried this yalubalu vegetable that was grown in a garden there. It was delicious.

Huh. So I guess people should have waited. Technology is indeed progressing faster than I expected, and in future text generation projects, I can definitely see myself tinkering with GPT-3 and its future successors rather than resorting to structural modeling. It’s way more convenient to take advantage of a text generator’s “few-shot learning” capabilities, configuring it to generate the text the way I want it to - rather than writing code.
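For what it’s worth, “configuring rather than coding” looks something like the sketch below: the entire program is just a prompt that demonstrates the task a couple of times (the demonstrations here are the ones quoted above) and then presents a new case - “glorp” is a word I made up for illustration. The complete() function is a hypothetical stand-in for whatever model API you have access to; I’m not reproducing OpenAI’s actual client here:

```python
# Few-shot prompting: the "program" is the prompt itself. Two solved
# examples of the task, then a new case for the model to continue.
FEW_SHOT_PROMPT = """\
To do a "farduddle" means to jump up and down really fast. An example of a sentence that uses the word farduddle is:
One day when I was playing tag with my little sister, she got really excited and she started doing these crazy farduddles.

A "yalubalu" is a type of vegetable that looks like a big pumpkin. An example of a sentence that uses the word yalubalu is:
I was on a trip to Africa and I tried this yalubalu vegetable that was grown in a garden there. It was delicious.

A "glorp" is a small tool used for opening stubborn jars. An example of a sentence that uses the word glorp is:
"""

def complete(prompt: str) -> str:
    """Hypothetical stand-in for a large language model call; replace
    this with a real API client to get actual completions."""
    return "<model completion goes here>"

# The model is expected to continue the pattern and produce a sentence
# using the made-up word "glorp" -- no training, no custom generation code.
print(complete(FEW_SHOT_PROMPT))
```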

OpenAI thinks the same way, and has recently announced plans to commercialize API access to its text generation models, including GPT-3. The ‘profit motive’ tends to fund further innovation along that line, which means more breakthroughs may arrive at a steady clip.

I do not believe that GPT-3, in its present state, can indeed “scale” upwards to novel-length works on its own. But I am confident that in the next 10-20 years[1], a future successor to GPT-3 will scale, all by itself. I believe that statistical learning has a promising future ahead of it.

[1] I’m a pessimist, so I tend to lean towards predicting 20 years - after all, AI Winters and tech bubbles are still a thing. But I am also a pessimist about my pessimism, and can easily see “10 years” as a doable target.

Note: This blog post was originally posted on dev.to.
