Recovery from nginx "Too many open files" error on Amazon AWS Linux
On Tue Oct 27, 2015, history.state.gov began buckling under load, intermittently issuing 500 errors. Nginx's error log was sprinkled with the following errors:
    2015/10/27 21:48:36 [crit] 2475#0: accept4() failed (24: Too many open files)
    2015/10/27 21:48:36 [alert] 2475#0: *7163915 socket() failed (24: Too many open files) while connecting to upstream...
An article at http://www.cyberciti.biz/faq/linux-unix-nginx-too-many-open-files/ provided directions that mostly worked. Below are the steps we followed; the steps that diverged from the article's directions are marked with an asterisk (*).
1. * Rather than su to the nginx account to run ulimit, use ps aux | grep nginx to locate nginx's process IDs, then query each process's file handle limits using cat /proc/pid/limits (where pid is a process ID retrieved from ps). Depending on your system, sudo may be necessary for the cat command. (This check is sketched below.)
2. Added fs.file-max = 70000 to /etc/sysctl.conf.
3. Added nginx soft nofile 10000 and nginx hard nofile 30000 to /etc/security/limits.conf.
4. Ran sysctl -p.
5. Added worker_rlimit_nofile 30000; to /etc/nginx/nginx.conf. (Steps 2-5 are gathered into a single sketch below.)
6. * While the article indicated that nginx -s reload should be enough to get nginx to recognize the new settings, not all of nginx's processes received the new setting. Upon closer inspection of /proc/pid/limits (see step 1 above), the first worker process still had the original S1024/H4096 limit on file handles. Even nginx -s quit didn't shut nginx down. The solution was to kill nginx with kill pid and restart it (see the last sketch below). After restarting nginx, all of the nginx-user-owned processes had the new file limit of S10000/H30000 handles.
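A minimal sketch of the check in step 1, using the worker PID 2475 from the error log above purely as an example; substitute any PID that ps reports for an nginx process:

    # Locate the nginx master and worker process IDs
    ps aux | grep nginx

    # Inspect one process's limits; the "Max open files" row lists the
    # soft and hard limits (S1024/H4096 on our workers before the fix)
    sudo cat /proc/2475/limits | grep "Max open files"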
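The settings from steps 2-5, gathered into one place for convenience; the paths and values are exactly those listed above:

    # /etc/sysctl.conf: raise the system-wide file handle ceiling
    fs.file-max = 70000

    # /etc/security/limits.conf: per-user open file limits for the nginx account
    nginx soft nofile 10000
    nginx hard nofile 30000

    # Apply the sysctl change (run as root)
    sysctl -p

    # /etc/nginx/nginx.conf: let each worker process open up to 30000 file handles
    worker_rlimit_nofile 30000;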
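And the recovery from step 6, with <pid> standing in for the stale nginx process ID found via step 1. How you start nginx again (init script or the nginx binary directly) depends on your setup, so that step is omitted; the pgrep loop at the end is just a convenient way to repeat the step 1 check for every nginx process:

    # nginx -s reload and nginx -s quit left the old worker running with the
    # old limits, so kill it by PID and restart nginx
    kill <pid>

    # After the restart, confirm every nginx process shows the new limits
    # (as in step 1, sudo may be needed to read another user's /proc entries)
    for pid in $(pgrep nginx); do
        sudo grep "Max open files" /proc/$pid/limits
    done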