Recovery from nginx "Too many open files" error on Amazon AWS Linux
On Tue Oct 27, 2015, history.state.gov began buckling under load, intermittently issuing 500 errors. Nginx's error log was sprinkled with the following errors:
    2015/10/27 21:48:36 [crit] 2475#0: accept4() failed (24: Too many open files)
    2015/10/27 21:48:36 [alert] 2475#0: *7163915 socket() failed (24: Too many open files) while connecting to upstream...
An article at http://www.cyberciti.biz/faq/linux-unix-nginx-too-many-open-files/ provided directions that mostly worked. Below are the steps we followed; the steps that diverged from the article's directions are marked with an asterisk (*).
1. * Rather than su to the nginx account to run ulimit, use ps aux | grep nginx to locate nginx's process IDs, then query each process's file handle limits using cat /proc/pid/limits (where pid is a process ID retrieved from ps). Depending on your system, sudo may be necessary for the cat command. (This check is sketched below.)
2. Added fs.file-max = 70000 to /etc/sysctl.conf.
3. Added nginx soft nofile 10000 and nginx hard nofile 30000 to /etc/security/limits.conf.
4. Ran sysctl -p.
5. Added worker_rlimit_nofile 30000; to /etc/nginx/nginx.conf. (Steps 2-5 are gathered into a single sketch below.)
6. * While the article indicated that nginx -s reload should be enough to get nginx to recognize the new settings, not all of nginx's processes received the new setting. Upon closer inspection of /proc/pid/limits (see step 1 above), the first worker process still had the original S1024/H4096 limit on file handles. Even nginx -s quit didn't shut nginx down. The solution was to kill nginx with kill pid and restart it (see the last sketch below). After restarting nginx, all of the nginx-user-owned processes had the new file limit of S10000/H30000 handles.
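A minimal sketch of the check in step 1, using the worker PID 2475 from the error log above purely as an example; substitute any PID that ps reports for an nginx process:

    # Locate the nginx master and worker process IDs
    ps aux | grep nginx

    # Inspect one process's limits; the "Max open files" row lists the
    # soft and hard limits (S1024/H4096 on our workers before the fix)
    sudo cat /proc/2475/limits | grep "Max open files"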
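The settings from steps 2-5, gathered into one place for convenience; the paths and values are exactly those listed above:

    # /etc/sysctl.conf: raise the system-wide file handle ceiling
    fs.file-max = 70000

    # /etc/security/limits.conf: per-user open file limits for the nginx account
    nginx soft nofile 10000
    nginx hard nofile 30000

    # Apply the sysctl change (run as root)
    sysctl -p

    # /etc/nginx/nginx.conf: let each worker process open up to 30000 file handles
    worker_rlimit_nofile 30000;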
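And the recovery from step 6, with <pid> standing in for the stale nginx process ID found via step 1. How you start nginx again (init script or the nginx binary directly) depends on your setup, so that step is omitted; the pgrep loop at the end is just a convenient way to repeat the step 1 check for every nginx process:

    # nginx -s reload and nginx -s quit left the old worker running with the
    # old limits, so kill it by PID and restart nginx
    kill <pid>

    # After the restart, confirm every nginx process shows the new limits
    # (as in step 1, sudo may be needed to read another user's /proc entries)
    for pid in $(pgrep nginx); do
        sudo grep "Max open files" /proc/$pid/limits
    done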