I am currently tying up various loose ends on a full stack project that we have invested a lot of our development time into over the past six months.

As a general knowledge exercise, and to make sure our server setup is optimized I have spent the past few hours fine-toothcombing the nginx docs.

In the process I learnt a number of new things, and discovered some interesting optimizations. I thought I'd post a brief 'nginx - what you need to know' kind of post. This is intended to be an overview of nginx for someone who has a generally solid knowledge of software/server architecture yet only has fifteen minutes to spare.

Architecture

nginx is events based which allows it to handle load better than for example Apache (which spawns a new process per connection).

It was built with the intention of handling high concurrency (lots of simultaneous connections) whilst performing quickly and efficiently.

It consists of a master process which:

  • reads and validates your configuration files
  • manages worker processes

The worker processes accept connections on a shared 'listen' socket and are capable of handling thousands of concurrent connections.

As a general rule you should configure one worker process per cpu core. Double that if you are serving mainly static content.

You can see these respective processes by executing ps -ax | grep nginx from the command line.

If you reload your nginx configuration, worker processes are gracefully shut down.

Configuration

nginx follows a 'c-style' configuration format.

nginx configuration allows for powerful regular expression matching and variable utilization.

server blocks define the configuration for a particular host, ip, port combination.

default_server can be specified on the listen directive to indicate to utilize that configuration block for any connection on that port (should no other block match). That is to say the below block will match a request on port 80 even if the host is not example.org.

server {
    listen       80  default_server;
    listen       8080;
    server_name  example.org;
    ...
}

server blocks allow for wildcard matching or regular expression matching. For example you could match both the www and non-www versions of a domain name.

Exact match server names are however more efficient than wildcards or regular expressions on the basis of how nginx stores host data in hash tables.

location blocks define the configuration for a specific location.

Location blocks only consider the URL. They do not consider the query string.

Longer matches take preference - that is to say location /images will be matched over location / were you to request http://server.com/images/123.jpg.

Regular expression matches are prioritized over the longest prefix. If you want to match a regular expression in a location block, prepend it with ~ e.g location ~ \.(gif|jpg|png)$

Regular expressions follow the PCRE format.

Any regular expression containing brace ({}) characters should be quoted as it will it would be otherwise unparseable (given nginx's usage of braces for block closures).

Regular expressions can use named captures. For example:

server {
    server_name   ~^(www\.)?(?<domain>.+)$;

    location / {
        root   /sites/$domain;
    }
}

Interesting features

Load balancing with nginx is really easy.
There are three loading balancing methodologies available in nginx:

  • round-robin
  • least-connected
  • ip-hash

Load balancing is highly configurable and allows for the intelligent direction of requests to different servers.

Health checks are in built such that if a particular server fails to respond nginx refrains for sending the request to that server based on configurable parameters.

HTTPS is also easy to implement with nginx. It is a case of adding listen 443 ssl to your server block and adding directives for the locations of your certificate and private key.

Given that the SSL handshake is the most expensive part of a secure offering, you can cache your ssl sessions.

ssl_session_cache   shared:SSL:10m;
ssl_session_timeout 10m;

Modules

nginx is very modular in its nature. Although you interact with it as one unit through your configuration it is in fact made up of individual units doing different things.

nginx offer a detailled module reference - this is an overview of some interesting or less commonly discussed modules available to you.

The internal directive of the core module allows for a location to only be accesible to an internal redirect.

For example, if you want 404.html only to be accessible to a 404 response you can redirect requests from the error_page whilst not making it accessible to a user typing it directly into their browser.

error_page 404 /404.html;

location /404.html {
    internal;
}

Autoindex

If you dont want to show directory listings to nosy users you can turn autoindex to off. This can be used to protect your image directory for example.

Browser detection

You can use modern_browser and ancient_browser to show different pages dependent on the browser which your client is using. See here.

IP Conditionals

You can use the geo module to set variables based on IP ranges. You could for example set a variable based on a locale IP range and use that to send users from a specific country to a specific location.

Image manipulation

nginx even offers an image filter module. This module allows you to crop, resize, and rotate images with nginx.

Connection limits

You can limit a particular IP to a particular number of concurrent connections. For example you could only allow one connection to files within your 'downloads' folder at a given time.

In addition to that, you can limit the request rate and configure the handling of request bursts greater than a configurable value.

My initial thoughts were that this would be a fantastic method of preventing people from hammering a public API for example.

More information can be found here and here.

Request controls

Further to the above, nginx allows you to limit the request types that a particular block will handle.

This would again be extremely useful for an API offering.

limit_except GET {
    allow 192.168.1.0/32;
    deny  all;
}

Conditional logging

The log module allows for conditional logging.

I thought I'd give this a mention because I can see a lot of merit in only wanting to log access requests that result in bad response codes.

The example listed shows this:

map $status $loggable {
    ~^[23]  0;
    default 1;
}

access_log /path/to/access.log combined if=$loggable;

Secure links

This module is pretty awesome - it allows you to secure your links and associate validity time periods with them utilizing the nginx server software.

Personally, I have no use for it because although I have production use cases of similar functionality, I can't help but feel that you could do this is many easier ways.

Novelty

There was one module that I found somewhat novel. I just can not see a use case for it - perhaps someone can enlighten me?

nginx offers a module to show a random file from a given directory as an index file.

I need this :P

Summary

As mentioned, the above is based on a thorough read through of the nginx documentation.

The following chapter from 'The Architecture Of Open Source Applications' was also incredibly interesting. It offers a significantly more complex look at the internals of nginx. Perhaps not suitable if you really did only have 15 minutes ;)

I highly reccomend reading the documentation yourself if you are interested in, or are running nginx on a server.

If you have any questions I would be happy to answer them.