AutoBio: RNN Twitter Bio Generator
In this project, we generate Twitter biographies using RNNs (recurrent neural networks). Given any user ID, the code finds all of that user's friends and all of their friends' friends, then collects the bios of up to 10,000 of these users. We then train an RNN to learn the conditional probabilities between characters. Finally, we use this RNN to generate new biographies of 140 characters each.
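Concretely, the model is a character-level RNN: at each step it takes one character plus its hidden state and predicts a distribution over the next character. A minimal sketch of the idea in PyTorch, following the GRU-based structure of the tutorial this repo builds on (class name and layer sizes here are illustrative, not the repo's exact code):

import torch
import torch.nn as nn

class CharRNN(nn.Module):
    """One step: previous character in, distribution over the next character out."""
    def __init__(self, n_chars, hidden_size=100, n_layers=2):
        super().__init__()
        self.embed = nn.Embedding(n_chars, hidden_size)
        self.gru = nn.GRU(hidden_size, hidden_size, n_layers)
        self.out = nn.Linear(hidden_size, n_chars)

    def forward(self, char_idx, hidden):
        # char_idx: tensor of shape (1,) holding one character index
        x = self.embed(char_idx).unsqueeze(0)   # -> (seq=1, batch=1, hidden_size)
        x, hidden = self.gru(x, hidden)
        return self.out(x.squeeze(0)), hidden   # logits of shape (1, n_chars)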
To generate Twitter bios using your friends’ bios
Download the code from GitHub and install the requirements: pip install -r requirements.txt
Getting started with Twitter API
Set up a Twitter developer account and create a new app.
To find your Twitter ID:
1) Go to www.twitter.com, click your profile at the top right, and go to Settings and Privacy
2) Click on Your Twitter Data
3) Enter your password and hit Confirm to see your user ID
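The collection scripts authenticate with your app's credentials. How the repo actually reads them may differ, but a typical tweepy (v3-style) setup looks like this (the environment-variable names are placeholders):

import os
import tweepy

# Credentials come from the app you created above; variable names are examples only
auth = tweepy.OAuthHandler(os.environ["TWITTER_API_KEY"],
                           os.environ["TWITTER_API_SECRET"])
auth.set_access_token(os.environ["TWITTER_ACCESS_TOKEN"],
                      os.environ["TWITTER_ACCESS_SECRET"])
api = tweepy.API(auth, wait_on_rate_limit=True)
print(api.verify_credentials().screen_name)  # sanity check that auth worked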
Dataset
1) Find your Twitter ID
2) Get your friends' IDs by passing your ID to get_friends.py, which creates a file called FriendIDs.txt
> python get_friends.py 01221414
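A rough sketch of what get_friends.py does, assuming an authenticated tweepy API object as set up above; the paginated friends_ids call is tweepy v3-style and the function name is illustrative:

import tweepy

def save_friend_ids(api, user_id, out_path="FriendIDs.txt"):
    """Write one friend ID per line for the given user."""
    with open(out_path, "w") as f:
        for friend_id in tweepy.Cursor(api.friends_ids, user_id=user_id).items():
            f.write(f"{friend_id}\n")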
3) Get your friends of friends' IDs by calling get_fof.py and passing it FriendIDs.txt, which creates a file called FoFIDs.txt
> python get_fof.py FriendIDs.txt
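get_fof.py presumably repeats the friend lookup for each ID in FriendIDs.txt, deduplicating and stopping at the 10,000-user cap mentioned in the intro. A hedged sketch (again tweepy v3-style, names illustrative):

import tweepy

def save_fof_ids(api, friends_path="FriendIDs.txt", out_path="FoFIDs.txt", cap=10000):
    """Collect a deduplicated, capped set of friends-of-friends IDs."""
    with open(friends_path) as f:
        friend_ids = [int(line) for line in f if line.strip()]
    fof = set()
    for fid in friend_ids:
        try:
            for fof_id in tweepy.Cursor(api.friends_ids, user_id=fid).items():
                fof.add(fof_id)
                if len(fof) >= cap:
                    break
        except tweepy.TweepError:
            continue  # skip protected or suspended accounts
        if len(fof) >= cap:
            break
    with open(out_path, "w") as out:
        out.writelines(f"{i}\n" for i in sorted(fof))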
4) Get all user biographies by calling get_bios.py and passing it FoFIDs.txt, which creates a file called Bios.txt
> python get_bios.py FoFIDs.txt
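A sketch of what get_bios.py might do, using tweepy's batch user lookup (100 IDs per call); the exact text cleaning in the repo may differ:

def save_bios(api, ids_path="FoFIDs.txt", out_path="Bios.txt"):
    """Fetch profile descriptions in batches of 100 (the lookup limit)."""
    with open(ids_path) as f:
        user_ids = [int(line) for line in f if line.strip()]
    with open(out_path, "w") as out:
        for i in range(0, len(user_ids), 100):
            for user in api.lookup_users(user_ids=user_ids[i:i + 100]):
                bio = user.description.replace("\n", " ").strip()
                if bio:  # skip empty bios
                    out.write(bio + "\n")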
Training and Generating
5) Train
Train an RNN on all the biographies by calling train.py and passing it Bios.txt
> python train.py Bios.txt
Training for 2000 epochs...
(10 minutes later)
Saved as Bios.pt
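Under the hood, each "epoch" in the tutorial this repo builds on fits one random chunk of the corpus: the target is the input shifted by one character, and the loss is per-character cross-entropy. A sketch of one training step, assuming the CharRNN class above (the real train.py may differ):

import random
import string
import torch
import torch.nn as nn

all_chars = string.printable  # vocabulary; assumes bios are filtered to printable ASCII

def char_tensor(s):
    # Map a string to a 1-D tensor of character indices
    return torch.tensor([all_chars.index(c) for c in s], dtype=torch.long)

def train_step(model, optimizer, criterion, text, chunk_len=200):
    # Pick a random chunk; input is chunk[:-1], target is chunk[1:] (shifted by one)
    start = random.randint(0, len(text) - chunk_len - 1)
    chunk = text[start:start + chunk_len + 1]
    inp, target = char_tensor(chunk[:-1]), char_tensor(chunk[1:])
    hidden = torch.zeros(model.gru.num_layers, 1, model.gru.hidden_size)
    optimizer.zero_grad()
    loss = 0.0
    for i in range(chunk_len):
        logits, hidden = model(inp[i:i + 1], hidden)
        loss = loss + criterion(logits, target[i:i + 1])
    loss.backward()
    optimizer.step()
    return loss.item() / chunk_len

# model = CharRNN(len(all_chars))
# optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
# criterion = nn.CrossEntropyLoss()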
6) Generate
After training, the model will be saved as [filename].pt. Now run generate_bios.py with that filename to generate some new bios.
Pick two characters that you want each bio to start with. We generated examples where the first two letters are "Th" and "I ".
> python generate_bios.py Bios.pt --prime_str "Th"
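Generation first feeds the prime string through the network to warm up the hidden state, then repeatedly samples the next character from the temperature-scaled output distribution. A sketch, reusing the CharRNN and char_tensor helpers above:

def generate(model, prime_str="Th", predict_len=140, temperature=0.8):
    with torch.no_grad():
        hidden = torch.zeros(model.gru.num_layers, 1, model.gru.hidden_size)
        inp = char_tensor(prime_str)
        for i in range(len(inp) - 1):        # warm up on all but the last prime char
            _, hidden = model(inp[i:i + 1], hidden)
        last, out_str = inp[-1:], prime_str
        for _ in range(predict_len):
            logits, hidden = model(last, hidden)
            # Dividing logits by temperature: <1 sharpens, >1 flattens the distribution
            probs = torch.softmax(logits.squeeze(0) / temperature, dim=0)
            last = torch.multinomial(probs, 1)   # sample the next character index
            out_str += all_chars[last.item()]
        return out_str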
Training options
Usage: train.py [filename] [options]
Options:
--n_epochs Number of epochs to train
--print_every Log learning rate at this interval
--hidden_size Hidden size of GRU
--n_layers Number of GRU layers
--learning_rate Learning rate
--chunk_len Length of chunks to train on at a time
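For example, to train longer with a larger network (these values are arbitrary):
> python train.py Bios.txt --n_epochs 3000 --hidden_size 200 --n_layers 3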
Generation options
Usage: generate_bios.py [filename] [options]
Options:
-p, --prime_str String to prime generation with
-l, --predict_len Length of prediction
-t, --temperature Temperature (higher is more chaotic)
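For example, a tamer, full-length sample primed with "I " (values are arbitrary):
> python generate_bios.py Bios.pt -p "I " -l 140 -t 0.5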
7) (Optional) Call temp-iterator.py and pass in a value between 0 and 1 to test the model at different temperatures. Higher temperatures make the output more unpredictable, while lower temperatures make it more regular.
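temp-iterator.py itself isn't shown here; conceptually it just re-samples at several temperatures, something like this hypothetical loop using the generate function sketched above:

# Hypothetical: compare samples across a range of temperatures
for t in (0.2, 0.5, 0.8, 1.0):
    print(f"--- temperature {t} ---")
    print(generate(model, prime_str="Th", temperature=t))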
Challenges
- Language mixing: I had to manually remove the IDs of friends who follow many accounts that tweet in other languages, and even so, some non-English bios from friends of friends still leaked in.
- Sampling: getting the temperature just right for maximum comedic effect (and not getting stuck in a "the" loop).
Acknowledgments
- I made this project using a char rnn generation tutorial from here: https://github.com/spro/practical-pytorch/tree/master/char-rnn-generation