Recovery from nginx "Too many open files" error on Amazon AWS Linux
On Tue Oct 27, 2015, history.state.gov began buckling under load, intermittently issuing 500 errors. Nginx's error log was sprinkled with the following errors:
    2015/10/27 21:48:36 [crit] 2475#0: accept4() failed (24: Too many open files)
    2015/10/27 21:48:36 [alert] 2475#0: *7163915 socket() failed (24: Too many open files) while connecting to upstream...
An article at http://www.cyberciti.biz/faq/linux-unix-nginx-too-many-open-files/ provided directions that mostly worked. Below are the steps we followed. The steps that diverged from the article's directions are marked with an *.
1. * Instead of using su to run ulimit on the nginx account, use ps aux | grep nginx to locate nginx's process IDs. Then query each process's file handle limits using cat /proc/pid/limits (where pid is the process ID retrieved from ps). (Note: sudo may be necessary for the cat command, depending on your system.) A sketch of this check appears after the list.
2. Added fs.file-max = 70000 to /etc/sysctl.conf (see the second sketch after the list).
3. Added nginx soft nofile 10000 and nginx hard nofile 30000 to /etc/security/limits.conf.
4. Ran sysctl -p to apply the new fs.file-max setting.
5. Added worker_rlimit_nofile 30000; to /etc/nginx/nginx.conf (third sketch below).
6. * Although the article indicated that nginx -s reload was enough to get nginx to recognize the new settings, not all of nginx's processes received the new setting. Upon closer inspection of /proc/pid/limits (see #1 above), the first worker process still had the original S1024/H4096 limit on file handles. Even nginx -s quit didn't shut nginx down. The solution was to kill nginx with kill pid. After restarting nginx, all of the nginx-user-owned processes had the new file limit of S10000/H30000 handles. (The final sketch below shows this restart-and-verify sequence.)
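
For step 1, here is a minimal sketch of the check, assuming nginx's workers run as the nginx user and that pgrep is available (the loop is just a convenience around the same /proc/pid/limits lookup):

    # List the nginx master and worker processes, as in step 1
    ps aux | grep '[n]ginx'

    # Inspect each process's file handle limits; the "Max open files"
    # row shows the soft and hard limits (e.g. 1024/4096 before the fix).
    for pid in $(pgrep nginx); do
        echo "=== pid $pid ==="
        sudo grep 'Max open files' /proc/$pid/limits
    done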
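
For steps 2-4, a sketch of the system-level changes with the values we used (70000, 10000, 30000); treat the numbers as a starting point for your own traffic rather than a recommendation:

    # /etc/sysctl.conf -- system-wide ceiling on open file handles
    fs.file-max = 70000

    # /etc/security/limits.conf -- soft/hard nofile limits for the nginx user
    nginx soft nofile 10000
    nginx hard nofile 30000

    # Apply the sysctl change without rebooting
    sudo sysctl -p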
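
For step 5, the matching nginx directive; worker_rlimit_nofile belongs in the main (top-level) context of nginx.conf and raises the open-file limit for the worker processes:

    # /etc/nginx/nginx.conf (main context)
    worker_rlimit_nofile 30000;

A common rule of thumb is to keep worker_connections in the events block below this value, since a proxied connection can hold two descriptors open.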
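
For step 6, a sketch of the restart-and-verify sequence that worked for us, assuming an init-script setup like Amazon Linux had at the time (use systemctl on systemd distributions) and that pgrep -f can find the master process:

    # nginx -s reload / -s quit did not fully apply the new limits for us,
    # so stop the master process outright.
    sudo kill "$(pgrep -f 'nginx: master')"

    # Start nginx again
    sudo service nginx start

    # Confirm every nginx process now reports the new soft/hard limits
    for pid in $(pgrep nginx); do
        sudo grep 'Max open files' /proc/$pid/limits
    done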