2/17/2018 - 5:00 PM

Python Reddit bot on Heroku

A crash course in setting up your Python Reddit bot on Heroku

You'll need to do the following:

  • You need to make your bot a Python app. Do this by making another directory (it can have the same name as the base one), putting all your Python code in it, and adding an empty file called __init__.py to it as well. See how I structured mine if this isn't clear. In your base directory, create two files: "requirements.txt" and "runtime.txt". The requirements.txt file should be the output of pip freeze (you can run the command "pip freeze > requirements.txt"). If you're not using virtualenv, you'll need to go through it afterwards and delete all the lines for packages your code doesn't actually use. Check out mine to see what I mean. runtime.txt just specifies which Python version Heroku should use. Mine just has the line "python-2.7.4" in it. All of this tells Heroku to recognize your bot as a Python app.

  • Make a heroku account and set up your git repo to use it as a remote (heroku has a nice guide for explaining all this)

  • Once you've got your repo set up on heroku, there are two things you'll have to change (these may not apply to you, but they did to the person I wrote this to originally):

    • A: You can't use a prop file for username/password anymore since it's untracked in your gitignore, so you'll have to set environment variables.
    • B: Heroku has its own weird FS, and you can't preserve generated files between runs (AKA pickle caching isn't an option).
  • To solve A, you'll have to set environment variables. I just set two for the login credentials ("REDDIT_USERNAME" and "REDDIT_PASSWORD"). You can set them like this from the terminal (assuming you've got your heroku toolbelt and remote all set up correctly):

# Set heroku config/env variables
$ heroku config:set REDDIT_USERNAME=<username>
$ heroku config:set REDDIT_PASSWORD=<password>

# Confirm they're set with this command
$ heroku config

And programmatically retrieve them in your code like this:

# Retrieve heroku env variables
import os

login_info = [os.environ['REDDIT_USERNAME'], os.environ['REDDIT_PASSWORD']]
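
If one of the variables isn't set, os.environ[...] fails with a bare KeyError. Here's a minimal sketch of a friendlier lookup. The helper name load_credentials is my own invention for illustration, not something from the bot's actual code:

```python
import os


def load_credentials(environ=os.environ):
    """Return (username, password) from the environment.

    Hypothetical helper: raises a readable error naming whichever
    config vars are missing instead of a bare KeyError.
    """
    names = ("REDDIT_USERNAME", "REDDIT_PASSWORD")
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing config vars: " + ", ".join(missing))
    return environ["REDDIT_USERNAME"], environ["REDDIT_PASSWORD"]
```

Passing the environment in as an argument just makes it easy to try out with a plain dict before deploying.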
  • Solving B is a little trickier. You'll have to use one of Heroku's many cache options. The most convenient one for me ended up being Memcached Cloud. You'll have to add a credit card to your account to use plugins like this or the scheduler I'll mention next, but don't worry, we'll be using the free tiers.

    • (Sidenote) They have a measurement system called "Dyno Hours", but my bot running every 10 minutes doesn't come anywhere close to making a dent in the free amount you get. You can check this under your Account: scroll down to the bottom and look next to "Current Usage". You can click "details" next to that for more info. You get 750 free dyno hours every month (which is ~$40 worth).
  • Anyway, back to caching. Memcached Cloud will give you a DB that you can use as a cache, and it supports key-value caching, which I think suits my bot best. (You can do more with it, but for now this is all I've done/know how to use.) Add the free tier (25MB) as an add-on via the add-ons section of your bot project on heroku, or from the command line with the following command:

$ heroku addons:add memcachedcloud:25
  • Might take a couple minutes to show, but you'll now have 3 new env. variables added with credentials to access your DB. You can verify this with the "heroku config" command again

  • See my code for more comprehensive examples, but here's the gist of how to use it in your code

# Add python-binary-memcached==0.21 to your requirements.txt
import os

import bmemcached

# Initialize your memcached client from the env variables that were automatically set when it was installed
mc = bmemcached.Client(os.environ.get('MEMCACHEDCLOUD_SERVERS').split(','),
    os.environ.get('MEMCACHEDCLOUD_USERNAME'),
    os.environ.get('MEMCACHEDCLOUD_PASSWORD'))

# Example key only; use whatever ID your bot tracks
input_key = 'abc123'

def is_cached(key):
    # Get a value from a key
    # Make sure the key is a string, otherwise cast it first
    obj = mc.get(str(key))

    # Check if key is in cache: a missing (or empty) value counts as not cached
    return bool(obj)

# Set a value. Again, make sure the key is a string
mc.set(str(input_key), "True")

# Delete a value. bla bla make sure key is a string
mc.delete(str(input_key))

# Seriously, the keys-as-string thing gave me so much shit. I just ended up casting every key as a string anyway, just in case. str(key)
  • If you're mixing between running locally and on heroku, you can have a global boolean variable "running_on_heroku", and do a check to see if the memcached environmental variable exists (that's what I do in my code anyway). If it exists, it's running on heroku. See line 52 here.
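
Here's a rough sketch of that check. The names running_on_heroku, LocalCache and make_cache are made up for this example (my real code is structured differently), and the local stand-in forgets everything when the process exits, so treat it as a starting point only:

```python
import os


class LocalCache(object):
    """In-memory stand-in exposing the same get/set/delete calls used above."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(str(key))

    def set(self, key, value):
        self._data[str(key)] = value
        return True

    def delete(self, key):
        return self._data.pop(str(key), None) is not None


def running_on_heroku(environ=os.environ):
    # The Memcached Cloud add-on sets this variable, so its presence
    # is used here as the "we're on heroku" signal.
    return "MEMCACHEDCLOUD_SERVERS" in environ


def make_cache(environ=os.environ):
    if running_on_heroku(environ):
        import bmemcached  # only imported when actually on heroku
        return bmemcached.Client(
            environ["MEMCACHEDCLOUD_SERVERS"].split(","),
            environ["MEMCACHEDCLOUD_USERNAME"],
            environ["MEMCACHEDCLOUD_PASSWORD"])
    return LocalCache()
```

The rest of the bot can then call make_cache() once and use get/set/delete without caring where it's running.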

  • Last bit is running it on a scheduler. iirc your bot runs continuously and sleeps for periods of time. If you do this on heroku, I think it'd be unsafe and you'd risk hitting those dyno hours with it being considered a "long running process". Instead, add maybe a flag in your arguments (mine is "--cron"), and if that argument exists then do a single run-through and terminate. On the topic of args, those would also be a good solution for specifying if the bot's running on heroku or locally (probably better than my current approach :P). My bot can run on heroku or locally, which complicated the code a bit but kept it flexible.

  • Onto the scheduler. This is another free heroku add-on (it can only run up to every 10 minutes though). Add it through heroku's add-on library or from the command line with the following:

$ heroku addons:add scheduler
  • Once it's installed, you can schedule tasks (essentially terminal commands) to run through the heroku site. Go to your app, and under the "Resources" section is where you'll see all your installed add-ons. I just scheduled the command to run my bot, along with its flags:
# --env tells it to use environmental vars for login instead of a prop file
# --cron tells it to run once and terminate rather than run/sleep/loop
$ python spursgifsbot/bot.py --env --cron
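
For reference, the two flags in that command could be handled along these lines. This is a sketch using argparse, not a copy of my bot: main, run_once and sleep_seconds are placeholder names, and run_once stands in for whatever a single pass of your bot does.

```python
import argparse
import time


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Reddit bot runner (illustrative)")
    parser.add_argument("--env", action="store_true",
                        help="read login details from environment variables "
                             "instead of a prop file")
    parser.add_argument("--cron", action="store_true",
                        help="do a single pass and exit instead of looping")
    return parser.parse_args(argv)


def main(argv=None, run_once=lambda: None, sleep_seconds=600):
    args = parse_args(argv)
    if args.cron:
        # Scheduler mode: one pass, then return so the process ends
        run_once()
        return 1
    # Local mode: the old run/sleep/loop behaviour
    while True:
        run_once()
        time.sleep(sleep_seconds)
```

With --cron the function returns after one pass, which is what lets the scheduled process terminate instead of sitting there sleeping.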

And there you have it. That should get you started at least. There are other scheduler add-ons to choose from as well (also with free options). Think of it this way: the free scheduler add-on theoretically gives you 144 runs/day of your bot. The free tier of the temporize scheduler only gives you 20. Process scheduler looks promising and is more customizable (and very specific about the amount of dyno hours it uses), but I haven't investigated it fully.
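
If you want to sanity-check the numbers, here's the back-of-the-envelope math as a sketch. The 30 seconds per run is purely an assumption for illustration (time your own bot), and it assumes usage is simply runs times duration, which may not match exactly how heroku meters things:

```python
MINUTES_PER_DAY = 24 * 60


def runs_per_day(interval_minutes):
    # One run every `interval_minutes`, around the clock
    return MINUTES_PER_DAY // interval_minutes


def dyno_hours_per_month(interval_minutes, seconds_per_run, days=30):
    # Total run time over the month, converted from seconds to hours
    return runs_per_day(interval_minutes) * seconds_per_run * days / 3600.0


print(runs_per_day(10))              # 144
print(dyno_hours_per_month(10, 30))  # 36.0, assuming 30-second runs
```

Under those assumptions a 10-minute schedule uses about 36 of the 750 free dyno hours in a 30-day month.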

Good luck!

Revision 1

Here's a quick tidbit about setting some stuff up manually. Say you cache some important things (meaning you're not setting them with env variables), like IDs or keys. You can do this programmatically, but you don't want to hardcode them into your git history, so instead you can run the python interpreter directly from heroku. Just run the following from the local base directory where you set up heroku:

$ heroku run python

Which will then hook you into a python interpreter running on Heroku

$ heroku run python
Running `python` attached to terminal... up, run.2622
Python 2.7.5 (default, May 17 2013, 06:45:09)
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>

From here you can python away like you normally would in a locally-running interpreter, but you'll have access to your Heroku DB setup, like so

>>> import bmemcached
>>> import os
>>> mc = bmemcached.Client(os.environ.get('MEMCACHEDCLOUD_SERVERS').split(','), os.environ.get('MEMCACHEDCLOUD_USERNAME'), os.environ.get('MEMCACHEDCLOUD_PASSWORD'))
>>> mc.set('Test', 'awww yeah')
True
>>> mc.get('Test')
'awww yeah'
>>> mc.delete('Test')
True
>>>

Revision 2

So on further reading about memcachedcloud, you can actually store serialized objects in there (think lists and dicts), which makes things incredibly easy. I've implemented this in my Facebook Bot if you want a real-world example.

Here's a short example:

$ heroku run python
Running `python` attached to terminal... up, run.9520
Python 2.7.5 (default, May 17 2013, 06:45:09)
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import bmemcached
>>> import os
>>> mc = bmemcached.Client(os.environ.get('MEMCACHEDCLOUD_SERVERS').split(','), os.environ.get('MEMCACHEDCLOUD_USERNAME'), os.environ.get('MEMCACHEDCLOUD_PASSWORD'))
>>> new_dict = {'example': 'yay'}
>>> new_list = [1, 2, 3, 4, 5]
>>> new_dict['list_test'] = new_list
>>> mc.set('dict', new_dict)
True
>>> mc.set('list', new_list)
True
>>> print mc.get('dict')
{'example': 'yay', 'list_test': [1, 2, 3, 4, 5]}
>>> print mc.get('list')
[1, 2, 3, 4, 5]
>>>
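
As a sketch of how this could be used in a bot, here's one way to keep a list of already-handled IDs under a single key. The function mark_seen, the "seen_ids" key and the InMemoryCache stand-in are all hypothetical names for illustration; on heroku you'd pass in the bmemcached client (mc) instead of the stand-in:

```python
class InMemoryCache(object):
    """Stand-in with the same get/set calls as the memcached client above."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value
        return True


def mark_seen(cache, item_id, key="seen_ids"):
    """Record item_id in a cached list; return False if it was already there."""
    seen = cache.get(key) or []
    item_id = str(item_id)  # same keys-as-strings habit as before
    if item_id in seen:
        return False
    seen.append(item_id)
    cache.set(key, seen)
    return True
```

Note the get-then-set isn't atomic, so this only makes sense when a single run of the bot touches the list at a time, which is the case with the scheduler setup described earlier.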