Nate Pann is a popular internet forum in South Korea. It is big enough to have a demographically diverse user base; however, one defining trait of Nate Pann is the “결/시/친” (Gyeolsichin, shorthand for “Marriage/In-laws/Parents”, where in-laws and parents refer to the woman’s) forum. Gyeolsichin has an established status as a place where women come to rant about the distress caused by various elements of patriarchal Korean society. This makes it unique among public online communities of comparable size, many of which are dominated by male voices.
This project is an attempt to explore the collective memories embodied in online communities. I’ve long wanted to do some work on Korean internet communities including this one, an interest which is also manifest in the k-www project. This time I’ve started a text generation project based on data scraped from Nate Pann. I scraped most of Nate Pann’s “Best Articles”, an aggregated list of highly-ranked new articles across all subforums, from Jan 1 2013 when the list first appeared until some date last week. Among the 169,795 articles, 18,884 belong to Gyeolsichin.
I set up Insik Kim’s kor-char-rnn-tensorflow on a cloud server with a Tesla P40 GPU. An exploratory training run of an LSTM model (hidden=700, layer=3, seq=100) on the titles (1MB) took about 10 hours, reaching a loss of about 0.5 at epoch 1000; for the main content (70MB), 100 epochs look closer to 150 hours. There must be more efficient and computationally sophisticated ways to do this, but I’ve decided to give it a go since I have some free credit to spend.
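To give a sense of what seq=100 means for a character-level model, here is a minimal sketch of how a corpus can be cut into fixed-length input/target pairs, where each target is the input shifted by one character. This is not the code from kor-char-rnn-tensorflow: the function names `build_vocab` and `make_sequences` are hypothetical, and I am assuming each Unicode character is one token, which may differ from how that repository handles Hangul.

```python
def build_vocab(text):
    """Map every distinct character to an integer id (assumed tokenization)."""
    chars = sorted(set(text))
    return {ch: i for i, ch in enumerate(chars)}


def make_sequences(text, vocab, seq_length=100):
    """Cut text into non-overlapping (inputs, targets) pairs.

    targets is inputs shifted one character to the right, so the model
    learns to predict the next character at every position.
    """
    ids = [vocab[ch] for ch in text]
    pairs = []
    # Stop early enough that every target window is full length.
    for start in range(0, len(ids) - seq_length, seq_length):
        inputs = ids[start:start + seq_length]
        targets = ids[start + 1:start + seq_length + 1]
        pairs.append((inputs, targets))
    return pairs


if __name__ == "__main__":
    sample = "abcdefghij"
    vocab = build_vocab(sample)
    for inputs, targets in make_sequences(sample, vocab, seq_length=3):
        print(inputs, targets)
```

With a 70MB corpus and seq=100, a scheme like this yields several hundred thousand windows per epoch, which is consistent with the main content taking far longer to train than the titles.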
Using the trained models (an early-epoch model for the main content), I set up a script that generates a title and a body; filters out generated titles that come directly from the training set; trims the generated text so that each piece begins and ends with a more or less complete sentence; generates an HTML file styled to look like Nate Pann, but with a different logo that signals its fakeness; exports screenshots, which is a common way these postings are shared on Twitter; and runs a bot that posts a tweet every day with said images attached.
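The two text-cleaning steps in that script can be sketched roughly as follows. This is an illustration rather than the script itself: `filter_novel_titles` and `trim_to_complete_sentences` are hypothetical names, and I am assuming that a sentence ends at `.`, `!`, `?` or `…`, which is only a rough fit for forum posts that often rely on line breaks instead of punctuation.

```python
import re

# Assumed sentence terminators; real posts may need a looser rule.
_SENTENCE_END = re.compile(r"[.!?…]+")


def filter_novel_titles(generated, training_titles):
    """Keep only generated titles that do not appear verbatim in the training set."""
    seen = {title.strip() for title in training_titles}
    return [t for t in generated if t.strip() and t.strip() not in seen]


def trim_to_complete_sentences(text):
    """Drop the leading fragment and the trailing fragment of sampled text.

    A sample usually starts and stops mid-sentence, so keep only what lies
    between the first and the last sentence terminator.
    """
    ends = [m.end() for m in _SENTENCE_END.finditer(text)]
    if len(ends) < 2:
        # Not enough terminators to trim safely; return the text as-is.
        return text.strip()
    return text[ends[0]:ends[-1]].strip()


if __name__ == "__main__":
    training = ["old title"]
    print(filter_novel_titles(["old title", "new title"], training))
    print(trim_to_complete_sentences("ragment. Hello there. How are you? I am fi"))
```

The exact-match filter only catches titles copied whole from the training data; near-duplicates would need a fuzzier comparison.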
The result is the Nate Pann RNN Twitter bot.