May 1, 2017

Caching reverse proxy for Linux and OSX software updates

Filed under: Technical | James Bunton @ 12:00 am

I have lots of computers in my house, all of which receive regular software updates. I get tired of waiting for all of these to download the same data, sometimes at the same time! I decided to use nginx to cache this data on my router to save time.

The approach is to intercept plain (unencrypted) HTTP requests and cache the responses:

  • Configure my router’s DNS to resolve the sites I want to cache, like archive.ubuntu.com, to itself.
  • Run nginx on the router as a reverse proxy and configure it to serve incoming requests from the cache when possible.
  • Run sniproxy on the router to forward HTTPS requests, which can't be cached, on to the original site.
  • Optional – Configure an additional IP address on the router which is dedicated to serving the cache. This allows me to continue hosting a website from my router.

DNS configuration

I use dnsmasq, so I added the following lines to /etc/dnsmasq.conf. They make these hostnames resolve to my router’s IP address, so HTTP requests for them arrive at the router instead of the real site.

address=/archive.canonical.com/192.168.1.8
address=/archive.ubuntu.com/192.168.1.8
address=/dl.google.com/192.168.1.8
address=/download.cdn.mozilla.net/192.168.1.8
address=/mirror.archlinuxarm.org/192.168.1.8
address=/mirror.internode.on.net/192.168.1.8
address=/security.debian.org/192.168.1.8
address=/security.ubuntu.com/192.168.1.8
address=/swcdn.apple.com/192.168.1.8
address=/swdownload.apple.com/192.168.1.8
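
If the list of hostnames grows, it may be easier to generate these lines than to maintain them by hand. The following is a minimal sketch, not part of my setup: the function name dnsmasq_address_lines and the shortened host list are made up for illustration, and only the address=/host/ip format comes from the config above.

```python
# Sketch: generate dnsmasq "address=" override lines from a list of hostnames.
# ROUTER_IP matches the address used in this post; CACHED_HOSTS is a shortened,
# illustrative subset of the hostnames listed above.
ROUTER_IP = "192.168.1.8"
CACHED_HOSTS = [
    "archive.ubuntu.com",
    "security.ubuntu.com",
    "swcdn.apple.com",
]


def dnsmasq_address_lines(hosts, ip):
    """Return one 'address=/host/ip' line per unique hostname, sorted."""
    return [f"address=/{host}/{ip}" for host in sorted(set(hosts))]


if __name__ == "__main__":
    # Print the lines so they can be pasted into /etc/dnsmasq.conf.
    print("\n".join(dnsmasq_address_lines(CACHED_HOSTS, ROUTER_IP)))
```

Keep in mind that dnsmasq applies an address= entry to the named domain and to its subdomains, so each entry can redirect more than the single hostname.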

nginx configuration

These settings work well for me. The nginx proxy module documentation is quite helpful if you want to understand the various caching options and how to tweak them.

The proxy_pass http://$host line tells nginx to make outgoing requests to whatever site it receives in the Host header. This makes it an open proxy, but that’s ok because it’s only listening on a private IP address.

One interesting thing in this config is the resolver 8.8.8.8 8.8.4.4 line. Because proxy_pass contains a variable, nginx looks up the hostname at request time using the nameservers given in the resolver directive. If that pointed at my local dnsmasq, then when nginx tried to make an outgoing connection to archive.ubuntu.com it would get the router’s own address and send the request straight back to itself. Infinite loop! Telling nginx to use Google’s DNS (8.8.8.8 and 8.8.4.4) ensures that it always gets the real IP address of the site, fixing the problem. You could also put your ISP’s nameserver here if you like.
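
To make the loop concrete, here is a toy model of the two resolution paths. It is only a sketch: the dictionaries stand in for dnsmasq and a public resolver, the function names are invented, and 203.0.113.10 is a documentation-only placeholder rather than the real address of archive.ubuntu.com.

```python
# Toy model of why the proxy must not resolve names through the local dnsmasq.
ROUTER_IP = "192.168.1.8"

# Names that dnsmasq overrides to point at the router (subset of the list above).
LOCAL_OVERRIDES = {"archive.ubuntu.com": ROUTER_IP}

# Stand-in for public DNS; the address is a placeholder, not the real one.
PUBLIC_DNS = {"archive.ubuntu.com": "203.0.113.10"}


def resolve_via_dnsmasq(host):
    """Local view: overrides win, everything else falls through to public DNS."""
    return LOCAL_OVERRIDES.get(host, PUBLIC_DNS.get(host))


def resolve_via_public(host):
    """Upstream view: what a public resolver such as 8.8.8.8 would answer."""
    return PUBLIC_DNS.get(host)


def proxies_to_itself(host, resolve):
    """True if the proxy's outgoing connection would land back on the router."""
    return resolve(host) == ROUTER_IP


if __name__ == "__main__":
    print("via dnsmasq:", proxies_to_itself("archive.ubuntu.com", resolve_via_dnsmasq))
    print("via public: ", proxies_to_itself("archive.ubuntu.com", resolve_via_public))
```

The resolver line in the config that follows is what puts nginx on the second path.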

user www-data;
worker_processes 4;
pid /run/nginx.pid;

events {
    worker_connections 768;
}

http {
    include /etc/nginx/mime.types;
    default_type application/octet-stream;

    access_log /var/log/nginx/access.log;
    error_log /var/log/nginx/error.log;

    resolver 8.8.8.8 8.8.4.4;

    proxy_cache_path
        /var/cache/nginx/proxy/cache
        levels=1
        keys_zone=the-cache:10m
        inactive=30d
        max_size=10g;

    proxy_temp_path /var/cache/nginx/proxy/temp 1;

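    server {
        # Listen only on the router's private address.
        listen 192.168.1.8:80;
        location / {
            # Forward to whichever site the client named in its Host header.
            proxy_pass http://$host;
            proxy_set_header Host $host;
            proxy_cache the-cache;
            # Let only one request per object go upstream; the others wait for it.
            proxy_cache_lock on;
            proxy_cache_lock_timeout 5m;
            # Refresh expired entries with conditional requests.
            proxy_cache_revalidate on;
            proxy_cache_valid 200 1h;
            # Requests that carry a Cache-Control header skip the cache.
            proxy_cache_bypass $http_cache_control;
            proxy_no_cache $http_cache_control;
            proxy_http_version 1.1;
            # Ignore upstream headers that would otherwise override nginx's handling.
            proxy_ignore_headers
                X-Accel-Redirect
                X-Accel-Expires
                X-Accel-Limit-Rate
                X-Accel-Buffering
                X-Accel-Charset
                Expires
                Cache-Control;
        }
    }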
}
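
When checking whether a particular file made it into the cache, it helps to know where nginx would store it. The sketch below assumes the default proxy_cache_key of $scheme$proxy_host$request_uri (the config above does not set one) and the documented layout where the file name is the MD5 of the key and, with levels=1, the directory is the last hex character of that hash. The function names and the example request path are hypothetical.

```python
# Sketch: predict the on-disk cache file for a request, assuming nginx's
# default proxy_cache_key ($scheme$proxy_host$request_uri) and levels=1.
import hashlib
from pathlib import PurePosixPath

# Matches the proxy_cache_path in the config above.
CACHE_ROOT = "/var/cache/nginx/proxy/cache"


def cache_key(scheme, proxy_host, request_uri):
    """Concatenate the parts the same way the default cache key does."""
    return f"{scheme}{proxy_host}{request_uri}"


def cache_file_path(key, root=CACHE_ROOT):
    """Return root/<last hex char of md5>/<md5 hex digest> for the given key."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return PurePosixPath(root) / digest[-1] / digest


if __name__ == "__main__":
    # Hypothetical example request; substitute a URL from your own access log.
    key = cache_key("http", "archive.ubuntu.com", "/ubuntu/dists/xenial/InRelease")
    print(cache_file_path(key))
```

If the predicted file is missing after a download, the response was probably not cacheable under these settings, or the cache key differs from the assumed default.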

sniproxy configuration

If a client on your network makes an HTTPS request to one of the above sites, we can’t cache it because the traffic is encrypted. However, it’s important that the request still works correctly. The TLS SNI extension carries the requested hostname unencrypted at the start of the handshake, which allows us to proxy these HTTPS connections to their original destinations even though we cannot read the data. I use sniproxy to do this.

Note that I’m again using Google’s nameserver, for the same reasons as above.

user daemon
pidfile /var/run/sniproxy.pid

error_log {
    syslog daemon
    priority notice
}

resolver {
    nameserver 8.8.8.8
}

listen 192.168.1.8:443 {
    protocol tls
}

table {
    .* *:443
}
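
To see why this works without decrypting anything, the sketch below builds a TLS ClientHello in memory with Python's ssl module and reads the hostname back out of the raw bytes. It is an illustration of the SNI field only, not of how sniproxy is implemented; the function names are invented, and the parser assumes the whole ClientHello fits in a single TLS record.

```python
# Sketch: the SNI hostname is readable in the first bytes a TLS client sends.
import ssl


def build_client_hello(hostname):
    """Produce the raw ClientHello bytes for hostname without any network I/O."""
    ctx = ssl.create_default_context()
    incoming, outgoing = ssl.MemoryBIO(), ssl.MemoryBIO()
    tls = ctx.wrap_bio(incoming, outgoing, server_hostname=hostname)
    try:
        tls.do_handshake()
    except ssl.SSLWantReadError:
        pass  # Expected: there is no server to answer the ClientHello.
    return outgoing.read()


def extract_sni(record):
    """Return the server_name from a ClientHello record, or None if absent."""
    # Record header: type(1) version(2) length(2); 0x16 means handshake.
    if len(record) < 6 or record[0] != 0x16 or record[5] != 0x01:
        return None
    pos = 5 + 4                      # skip record header and handshake header
    pos += 2 + 32                    # client version and random
    pos += 1 + record[pos]           # session id
    pos += 2 + int.from_bytes(record[pos:pos + 2], "big")  # cipher suites
    pos += 1 + record[pos]           # compression methods
    ext_end = pos + 2 + int.from_bytes(record[pos:pos + 2], "big")
    pos += 2
    while pos + 4 <= ext_end:
        ext_type = int.from_bytes(record[pos:pos + 2], "big")
        ext_len = int.from_bytes(record[pos + 2:pos + 4], "big")
        pos += 4
        if ext_type == 0:            # server_name extension
            # Layout: list length(2), name type(1), name length(2), name.
            name_len = int.from_bytes(record[pos + 3:pos + 5], "big")
            return record[pos + 5:pos + 5 + name_len].decode("ascii")
        pos += ext_len
    return None


if __name__ == "__main__":
    print(extract_sni(build_client_hello("archive.ubuntu.com")))
```

A proxy that can read this one field knows where to forward the connection, and everything after the handshake stays opaque to it.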