bartvanremortele
4/25/2016 - 11:13 PM

How to run Newspaper in an AWS Lambda function


How to run Newspaper (the Python 2.7 version) in an AWS Lambda function:

  • Start a new EC2 instance with the Amazon Linux AMI, so that the C extensions compiled below are compatible with the Lambda runtime
  • sudo yum install gcc gcc-c++ libjpeg-devel zlib-devel libevent-devel libxml2-devel libxslt-devel libpng-devel
  • sudo yum install python27-devel python27-pip
  • virtualenv env
  • source env/bin/activate
  • sudo /usr/bin/easy_install lxml
  • pip install newspaper
  • nano $VIRTUAL_ENV/lib/python2.7/site-packages/newspaper/settings.py
    • change the DATA_DIRECTORY variable's value to '/tmp/.newspaper_scraper' (/tmp is the only writable location in the Lambda environment)
  • zip -9 ~/bundle.zip lambda_function.py (run this from the directory containing your handler file, lambda_function.py)
  • cd $VIRTUAL_ENV/lib/python2.7/site-packages
  • zip -r9 ~/bundle.zip *
  • cd $VIRTUAL_ENV/lib64/python2.7/site-packages
  • zip -r9 ~/bundle.zip *
  • Upload the bundle.zip file to your Lambda function
    • This assumes the function's Handler setting is left at the default, lambda_function.lambda_handler
  • Terminate the EC2 instance once the bundle has been uploaded
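The settings edit and the zip steps above can also be scripted. A minimal Python sketch, assuming settings.py contains exactly one top-level `DATA_DIRECTORY = ...` assignment; `patch_data_directory` and `build_bundle` are hypothetical helper names, not part of Newspaper:

```python
import os
import re
import zipfile


def patch_data_directory(settings_path, new_value='/tmp/.newspaper_scraper'):
    """Point newspaper's DATA_DIRECTORY at /tmp (assumes one top-level assignment)."""
    with open(settings_path) as f:
        source = f.read()
    patched, count = re.subn(r"^DATA_DIRECTORY\s*=.*$",
                             "DATA_DIRECTORY = %r" % new_value,
                             source, flags=re.MULTILINE)
    if count != 1:
        raise ValueError('expected one DATA_DIRECTORY line, found %d' % count)
    with open(settings_path, 'w') as f:
        f.write(patched)


def build_bundle(bundle_path, handler_path, site_packages_dirs):
    """Write the handler at the zip root, then the contents of each site-packages dir."""
    seen = set()
    with zipfile.ZipFile(bundle_path, 'w', zipfile.ZIP_DEFLATED) as bundle:
        bundle.write(handler_path, os.path.basename(handler_path))
        seen.add(os.path.basename(handler_path))
        for site_dir in site_packages_dirs:
            if not os.path.isdir(site_dir):
                continue  # e.g. a lib64 directory that does not exist
            for root, _dirs, files in os.walk(site_dir):
                for name in files:
                    full_path = os.path.join(root, name)
                    # Archive paths are relative to site-packages, so packages land at
                    # the zip root, as with `cd site-packages && zip -r9 ~/bundle.zip *`.
                    arcname = os.path.relpath(full_path, site_dir).replace(os.sep, '/')
                    if arcname in seen:
                        continue  # lib64 may be a symlink to lib; skip duplicate entries
                    seen.add(arcname)
                    bundle.write(full_path, arcname)
```

For the layout used in the steps above, the directories to pass would be $VIRTUAL_ENV/lib/python2.7/site-packages and $VIRTUAL_ENV/lib64/python2.7/site-packages. Unlike `zip -9`, this uses zipfile's default deflate level, and unlike `zip *` it also includes top-level dotfiles.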
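
lambda_function.py:

from newspaper import Article

def lambda_handler(event, context):
    # The invocation event must carry the article URL, e.g. {"url": "http://..."}
    url = event['url']
    article = Article(url)
    article.download()  # fetch the page HTML
    article.parse()     # extract the title, authors, body text, etc.

    # Return only the extracted body text
    return {
        'content': article.text
    }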
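The handler reads the URL from event['url'] and returns a dict with a 'content' key. A minimal sketch of calling the deployed function with boto3's Lambda `invoke` API, assuming boto3 is installed, AWS credentials are configured, and the function is named `newspaper-scraper` (a hypothetical name; use your own). `build_event` and `fetch_article_text` are also hypothetical helper names:

```python
import json


def build_event(url):
    """Build the JSON payload that lambda_handler reads via event['url']."""
    return json.dumps({'url': url}).encode('utf-8')


def fetch_article_text(url, function_name='newspaper-scraper', client=None):
    """Synchronously invoke the Lambda function and return the extracted text.

    'newspaper-scraper' is a hypothetical function name; pass your own.
    A boto3 Lambda client is created only if one is not supplied.
    """
    if client is None:
        import boto3  # imported lazily so the helper can be exercised without boto3
        client = boto3.client('lambda')
    response = client.invoke(
        FunctionName=function_name,
        InvocationType='RequestResponse',  # wait for the handler's return value
        Payload=build_event(url),
    )
    result = json.loads(response['Payload'].read())
    if 'FunctionError' in response:
        # The handler raised; the payload then holds the error details.
        raise RuntimeError('Lambda error: %r' % (result,))
    return result['content']
```

For example, fetch_article_text('https://example.com/some-article') returns the article's body text, or raises RuntimeError if the handler threw an exception (boto3 signals this through the FunctionError key of the invoke response).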