winsnes.io

IT, Cloud, Automation, and Business

Reduce personal data leakage using Pi-hole, the easy way

This Christmas my partner and I were ready early with all the preparations, so we found ourselves with a bit of time on our hands to do things we always meant to do but never had the time for. I've had a Raspberry Pi 3B sitting on my shelf for a long time that I was going to set up to run a home automation setup. I never found the time to sit down and go through in detail what is required, and while I waited the online solutions got a lot better. They won't save me if I lose internet connectivity, but they're good enough for now, so I have decided to repurpose the Raspberry Pi for a different job.

Over the last year there has been a lot of focus in the media on data and the tracking of people. Facebook and Google have been at the centre of many of these stories, but data is the new gold rush, so everyone is collecting as much as possible about their users. I wanted to reduce the amount of data I share with the world, and hopefully do so in a seamless way that doesn't impact our day to day. The last thing I want to do is install something custom onto all the different devices I have; the operational burden of that is far too high. A better approach is to install something on my network that filters out the “bad” requests. There are a few enterprise-grade solutions for this, but they are generally big, bulky, difficult to manage, and expensive. Since I have a Raspberry Pi that I'm not using, an inexpensive device to both buy and run, I started looking for something I could run on it.

Enter Pi-hole, a simple and automated solution that works as a DNS forwarder and filters requests against a frequently updated list of known bad domains (about 113,000 blocked entries at the time of writing). It is not as effective as installing a browser plugin, but it will filter all of the devices on our network and prevent our smart devices from dialing home with our information. The easy way is to run it in a docker container on your Raspberry Pi, and then configure your router to advertise the Raspberry Pi as the default DNS server on your network using DHCP options.
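For reference, running Pi-hole manually with Docker looks roughly like this. This is a minimal sketch based on the pihole/pihole image's documented options; the timezone and password values are placeholders, and the rest of this post automates the same thing with Ansible.

# A minimal manual Pi-hole container: port 53 for DNS, port 80 for the admin
# dashboard, and two named volumes so configuration survives container upgrades.
# TZ and WEBPASSWORD are placeholders; set your own values.
docker run -d \
  --name pihole \
  -p 53:53/tcp -p 53:53/udp \
  -p 80:80/tcp \
  -e TZ="Australia/Melbourne" \
  -e WEBPASSWORD="change-me" \
  -v pihole_config:/etc/pihole \
  -v pihole_dnsmasq:/etc/dnsmasq.d \
  --restart=unless-stopped \
  pihole/pihole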

Set up Pi-hole on your Raspberry Pi

When I started looking at Pi-hole, there was an option of running it in a docker container, which is perfect as it makes it very simple to set up and you don't end up with libraries plastered all over your system. It also makes the upgrade path very simple: you just rotate in a new container and get a clean install, with no files from a previous version hanging around. With that in mind, I started looking into how to run docker on my Raspberry Pi.

The Docker website provides a simple shell script you can run on the host to install docker on the Pi, but I don't like having to log into the instance itself and run scripts. You often forget something, and you need to document the set of steps you went through so that when something inevitably breaks you know how to rebuild it. The better way is to spend a bit more effort up front and build it out using configuration management as code. Ansible is my favorite tool for this kind of task; it runs on any Linux distribution that can run python, which is perfect for the Pi.

The first thing I did was set up a new role so it could be published on Ansible Galaxy. You can find the role here, with detailed instructions on how to use it. If you run into any problems, please raise an issue and I'll try to fix it as soon as possible.

The role does three main things:

  • Installs Docker
  • Installs Pip and the docker python library - a dependency for configuring docker containers using Ansible
  • Configures the Pi-hole docker container and sets it up to run as a service that starts automatically

Once you have run the Ansible playbook on your Pi, you should have a functional Pi-hole deployed and configured, ready to filter your internet-bound traffic.
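As a rough sketch of what running it looks like (the role and playbook names are placeholders; the Galaxy page linked above has the real ones, and your inventory will differ):

# Install the role from Ansible Galaxy (role name is a placeholder)
ansible-galaxy install <galaxy-username>.<role-name>

# playbook.yml simply applies that role to the Pi; run it against the Pi's
# address with the default "pi" user and privilege escalation enabled
ansible-playbook -i <ip of raspberry pi>, -u pi -b playbook.yml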

Configure your router

Next up we need to configure the router. There are two approaches to this: we can either tell the router to use the newly deployed Pi-hole as its upstream DNS server, or we can tell the router to hand out the IP of the Pi-hole as the default DNS server as part of its DHCP leases.

I went down the second path so I could log which devices requested DNS, and how many of those requests were blocked. After running it for a few days, I noticed my TV is a big offender and makes a lot of DNS requests that get blocked.

There are quite a few different options you can set for DHCP, and they are all outlined in the RFC, but if you're like me and prefer plain English, here is a good reference on the different options you can use.

From the list you can see that option 6 is the one used to tell all the devices on the network which DNS server they should be using. I use a router with DD-WRT installed on it, which has a simple user interface for adding additional DHCP options. You will have to check your router's manual to see where you can enter additional DHCP options.

Add the option like below, and you should be sending out the new configuration to all your devices.

dhcp-option=6,<ip of pi-hole>

It will take a bit of time for your devices to pick up the new settings, depending on the lifetime of their current DHCP lease, but generally not more than a day.
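If you want to check that a device has picked up the change, you can query the Pi-hole directly. With the default blocking mode a blocked domain should resolve to 0.0.0.0; the domain below is just an example of one that is typically on the blocklist.

# Query the Pi-hole directly for a domain that should be blocked
dig +short doubleclick.net @<ip of pi-hole>

# Check which DNS server this machine is actually using for lookups
nslookup winsnes.io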

Dashboard goodness

Once you have that set up and running, you should be able to log into your Pi-hole by going to http://<ip of raspberry pi>/admin/ and see metrics on a dashboard like the one below. What surprised me was the number of requests that got blocked. As you can see from the screenshot of my dashboard, taken after running the Pi-hole for a few days, over 24% of DNS requests have been blocked.

Pi-hole Dashboard

Demystifying Hugo Archetypes

Archetypes in Hugo are a great way to make starting a new page, blog post, or any other type of content easier. An archetype is in essence a content template that makes it a little faster to create content by automatically generating a skeleton with pre-filled metadata and content to start you off. On my blog I use them to automatically differentiate between posts and other types of content. It took me a while to get my head around how this works, so I thought I would share how I think about this feature, and hopefully it will help someone who thinks the same way I do.

Every time you create a new content page using the hugo generator (hugo new), it will search your system for an archetype file that matches what you are generating. It will first search the archetypes folder in your site, and if none is found there, it will search for one in your configured theme. There is a gotcha here with precedence that I'll explain with an example.

First let's decompose the generator shell command.

$> hugo new post/post-name.md
    │    │   │    │
    │    │   │    └─ name of the content being generated
    │    │   └─ type of the content being generated
    │    └─ generator command to create new content
    └─ hugo command

This will create a new file at content/post/post-name.md using the archetype for post. As you can see, the type is also added to the path of the content when it gets generated, making it easy to identify which type a content page is. Hugo uses folders to infer types and taxonomy. This can be overridden in the “Front Matter”, the metadata section at the beginning of your files.

In my archetypes folder I have a file called post.md that I use as a template for all of my blog posts. It contains this:

---
title: "{{ replace .Name "-" " " | title }}"
date: {{ .Date }}
draft: true
author: "Thomas Winsnes"
description: "Insert description"
tags:
  -
---
Write blog here

The section at the top between the two --- is the “Front Matter”, and it is used to provide metadata about the post: in my case the title, the date it was written, tags, and so on. These values are used by Hugo to generate the html and can be accessed directly by the theme. As you can see, it saves time and means you don't have to look up the names of your metadata values every time you create a new post.

The gotcha I mentioned before is about how Hugo selects which archetype file to use when generating a new content file. It uses the type inferred from the path, or the one set explicitly in the “Front Matter” with type: typename.

When using the command above, hugo new post/post-name.md, it will look for the archetype in this order:

  1. archetypes/post.md
  2. archetypes/default.md
  3. themes/<themename>/archetypes/post.md
  4. themes/<themename>/archetypes/default.md

As you can see, it will look for a file that matches the type name in your root archetypes folder first, and if it can't find one, it will look for one called default.md as a fallback. Only if it can't find either will it look in your registered theme's archetypes folder. So if you're aiming to use the archetypes from a theme, make sure you don't have a default archetype in your root archetypes folder. This tripped me up the first time I worked with archetypes.
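A quick way to see which archetype won is to generate a throwaway page and look at the front matter it was given; the file name here is just an example.

# Generate a test page and inspect which archetype was used for it
hugo new post/archetype-test.md
head content/post/archetype-test.md

# Remove the test page again once you've had a look
rm content/post/archetype-test.md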

There are a lot of powerful things you can do with archetypes and front matter that I won't go into detail on here, but the hugo documentation covers a lot of it.

Netlify is easier and cheaper than Google Cloud

After setting out to build a simple and cheap static blog on the Google Cloud Platform, and working through all the different services and tools on offer, I've come to the conclusion that the Google Cloud Platform isn't a good fit for this kind of site. And that is okay.

It was a fun experiment to see what I could get going and how cheap I could get it. I was initially thinking of running a simple GKE cluster and running the site on top of that, but then came the complexity of running multiple global load balancers and certificates to support both IPv4 and IPv6, as well as redirects from www.winsnes.io to winsnes.io for both http and https. It started getting fairly complicated, and costly, as the redirection rules do add up.

The more I looked at the difficulties of getting everything to play nicely, and compared it to services like Netlify, which solves all of this for you and is free of charge for this simple use case, the harder it got to justify running this on Google.

If I were building an app requiring a dynamic backend and a data store, I would use GCP in the future, just not for this kind of simple static site.

Thanks for reading, I'll keep writing in the future, just not about this specific topic.

Root Domain Support

Now that we have the website deployed on the Google Cloud Platform as a static site, we can access it by going to http://www.winsnes.io. This is fantastic, but what if someone goes to http://winsnes.io?

This is the second post in the ongoing series about hosting a blog on the Google Cloud Platform for cheap.

When we set up a storage bucket with a static site hosted on it, we ask Google to take care of all the domain name routing that happens in the background, and Google will happily route all requests to www.winsnes.io to the static bucket and display the content of the blog. But we never told Google to do anything about winsnes.io, so that won't be routed correctly.

In the previous post we talked about how you can forward a subdomain to a different URL using a CNAME, but not a root domain. More details on why that is can be found in this blog post by Dominic Fraser. What that means is that we can't create a CNAME for winsnes.io; we have to use an A record. Some DNS services, like AWS Route 53, support something called an ANAME or an ALIAS record, which works like a CNAME but for root domains. Google sadly doesn't support this yet, so we have to find a different way to manage it.

To create an A record you need a static IP to point it to. On the Google Cloud Platform you can get one in a few different ways, such as a virtual machine, but we are going to use the Google Load Balancing service, as it gives us an easy and scalable way to redirect traffic that can be managed through Terraform.

Deploying a Load Balancer using Terraform

Deploying a load balancer on the Google Cloud Platform is a little different from other clouds. You deploy four different resources that together allow you to route traffic to where you need it to go. It is a bit odd if you're used to traditional load balancers, but it is very flexible.

First you create a Global Forwarding Rule, which is the public endpoint of the load balancer. This is then connected to a Target HTTP Proxy, which links the forwarding rule to the URL Map. The URL Map contains all the routing rules, allowing you to route different requests to different backends, e.g. all requests to winsnes.io/static/ go to a storage bucket, while all requests to winsnes.io/api go to a VM that handles the dynamic requests.

In our case we are routing all requests to the same place, so we have a single rule in place. This rule routes all requests to the backend bucket service containing our blog.

The Terraform added to the main.tf file to accomplish this:

resource "google_compute_global_forwarding_rule" "default" {
  name       = "winsnesio-global-forwarding-rule"
  target     = "${google_compute_target_http_proxy.default.self_link}"
  port_range = "80"
  depends_on = ["google_compute_target_http_proxy.default"]
}

resource "google_compute_target_http_proxy" "default" {
  name        = "winsnesio-http-proxy"
  description = "Http proxy for winsnes.io"
  url_map     = "${google_compute_url_map.default.self_link}"
}

resource "google_compute_url_map" "default" {
  name            = "winsnesio-url-map"
  description     = "Url map for winsnes.io blog"
  default_service = "${google_compute_backend_bucket.static_storage.self_link}"

  host_rule {
    hosts        = ["${var.root_domain}"]
    path_matcher = "allpaths"
  }

  path_matcher {
    name            = "allpaths"
    default_service = "${google_compute_backend_bucket.static_storage.self_link}"

    path_rule {
      paths   = ["/*"]
      service = "${google_compute_backend_bucket.static_storage.self_link}"
    }
  }
}

resource "google_compute_backend_bucket" "static_storage" {
  name = "storage-backend-bucket"
  bucket_name = "${google_storage_bucket.static-store.name}"
}

After deploying this new load balancer, we have a static IP that we can use to update the DNS entry for winsnes.io. After the DNS change has propagated, usually within a couple of hours depending on the TTL of the entry, we should be able to access the blog at http://winsnes.io!
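If you'd rather not dig through the console for the assigned IP, gcloud can print it; the rule name below matches the Terraform above.

# Print the IP address assigned to the global forwarding rule
gcloud compute forwarding-rules describe winsnesio-global-forwarding-rule \
  --global --format="value(IPAddress)"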

Set up a blog on Google Cloud Store using Hugo

This post is the first part of a long-running series about running a blog in the cloud using production-ready DevOps practices on the cheap. The code in this blog post is the same code I use to run this blog, and it is all available in a public github repository. The automated CI and CD processes will be run directly out of this public repository, and you will be able to see all the moving parts of what makes this blog work. Any credentials and secrets will of course not be included ;)

I chose to go with the Google Cloud Platform for this, as it's the platform with the least publicity around its services, and it has some services that are difficult to beat. Its Kubernetes offering is probably the best on the market. And with the release of Anthos, which allows you to manage Kubernetes on different cloud providers, they are making a move to reduce lock-in to a specific platform.

Google is not as mature as the other two big platforms, Azure and AWS, and this is apparent when you look at some of the limitations around hosting serverless and static sites, some of which we will have to work around as we build up this blog. That said, Google probably has the best developer experience of the three major platforms.

Before we begin

Before we get started, there are a few things we need to get set up first.

We will be using Google Cloud, so you will need to set up an account there. You can create a new account and get US$300 of free credit to play with. You will also need to configure the CLI to interact with the account.

We will be using HashiCorp's Terraform to configure any infrastructure. I assume you have rudimentary knowledge of it, but you should be able to follow the guide without it.
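On macOS, both prerequisites can be installed with Homebrew. The package names below are what they were called at the time of writing, so check the official install docs if they have moved.

# Google Cloud SDK (provides gcloud and gsutil) and Terraform
brew install --cask google-cloud-sdk
brew install terraform

# Authenticate the CLI against your Google account
gcloud init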

Set up the git repository

In this solution we will be deploying both infrastructure and the application, so we need to structure the repository to support this. I generally do this by placing different components in different sub-folders, as it makes things a lot easier later when we automate the build and deploy process. We have two main components so far, infrastructure and blog, so we will create a folder for each of them. Also create a README.md file for documenting the solution.

This is the structure I'm using:

.
├── blog
│   └── .keep
├── infra
│   └── .keep
└── README.md
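One way to get that skeleton in place from the root of the repository:

# Create the two component folders plus the placeholder files
mkdir -p blog infra
touch blog/.keep infra/.keep README.md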

Initialise the git repository and commit the structure:

git init
git add .
git commit -m "Initial Commit"

.keep is a convention to make sure otherwise empty folders are committed to git; we will remove these files later when we add real files to the sub-folders.

Set up blog

Install Hugo

On macOS, installing Hugo is easy using Homebrew.

In terminal:

brew install hugo

For other platforms, the official documentation has all the information needed to get Hugo set up: https://gohugo.io/getting-started/installing/

Generating the web-site scaffold

Hugo has a built-in generator that we will use to create a scaffold for us, which will drive the generation of the static website. It contains the folder structure that Hugo relies on, as well as a few defaults that we will update to work with the structure we require for our blog.

To generate the scaffold, go to the root of your git repository and type in this command:

hugo new site blog

Once that command completes you should have a structure like this in the blog folder:

.
├── archetypes
├── content
├── data
├── layouts
├── static
├── themes
└── config.toml

For more information on what the different folders are and what they are used for, please visit the official Hugo documentation. This post is about how to set up and deploy on Google; how to use Hugo could be a series in its own right.

Layout

Hugo comes with a theme engine, and you can either create your own theme or download one from their theme library. For simplicity I went with downloading a theme I liked, which I will use as a starting point and modify to fit my taste.

Installing a theme with Hugo is really easy: it's a matter of downloading the theme into a sub-folder of the themes folder and updating the blog/config.toml file with the details of the theme.

I've used the hello-friend theme by panr. It's licensed under MIT, which means we can modify it at will, as long as we retain the license file intact.

The description of the theme has a good guide on how to install it, but for the lazy, I've included it here as well.

git clone https://github.com/panr/hugo-theme-hello-friend.git themes/hello-friend

Paste the following into blog/config.toml to get the site configured with a theme and defaults for the hello-friend theme:

baseurl = "/"
languageCode = "en-us"
theme = "hello-friend"
paginate = 5

[params]
  # dir name of your blog content (default is `content/posts`)
  contentTypeName = "posts"
  # "light" or "dark"
  defaultTheme = "dark"
  # if you set this to 0, only submenu trigger will be visible
  showMenuItems = 2
  # Show reading time in minutes for posts
  showReadingTime = false

[languages]
  [languages.en]
    title = "Hello Friend"
    subtitle = "A simple theme for Hugo"
    keywords = ""
    copyright = ""
    menuMore = "Show more"
    writtenBy = "Written by"
    readMore = "Read more"
    readOtherPosts = "Read other posts"
    newerPosts = "Newer posts"
    olderPosts = "Older posts"
    minuteReadingTime = "min read"
    dateFormatSingle = "2006-01-02"
    dateFormatList = "2006-01-02"

    [languages.en.params.logo]
      logoText = "hello friend"
      logoHomeLink = "/"
    # or
    #
    # path = "/img/your-example-logo.svg"
    # alt = "Your example logo alt text"

    [languages.en.menu]
      [[languages.en.menu.main]]
        identifier = "about"
        name = "About"
        url = "/about"
      [[languages.en.menu.main]]
        identifier = "showcase"
        name = "Showcase"
        url = "/showcase"

Make any changes to this as you see fit.

Testing the blog

To test what we have done so far, open a terminal and go to the blog directory. Once there, start the hugo dev server with the following command:

hugo server -D

Once that has started, your new blog should be available at http://localhost:1313. If you keep this running, you will be able to see live changes to the blog as you modify settings and add posts.

Tweaks before we commit changes

First we will remove the .keep file in the blog folder, as we now have files there, and add .keep files to any empty directories to help preserve the directory structure for the future.

We'll also add a .gitignore file to filter out files and folders we do not want to preserve in git. These can be files with sensitive data, or generated files that are considered output of running the hugo static-site generator.

When Hugo runs, it will create two folders with static files that we will later publish to our hosting solution. We will add these folders to the .gitignore file in the blog folder to prevent us from accidentally committing them in the future.

blog/.gitignore should now contain this:

public/
resources/

Commit and continue

Commit all your changes to git and continue with setting up the Google Cloud project and configuring service accounts to deploy the solution.

git add .
git commit -m "Added hugo blog with hello-friend theme"

The state of the repository should be similar to this: GitHub repository

Configure Google Cloud Project

To complete this part you will need to have the Google command line utilities installed: gcloud and gsutil. If you haven't yet, please install them from the link given in the requirements section.

The first thing we have to do is set up a new project in GCP. Identity and access in Google's cloud is managed using projects, and by creating a new project we get a fully isolated environment that other users and services don't have access to by default.

To create a new project you need to be logged into your account

gcloud auth login

This will open a browser window and allow you to log into your account. If it's the first time you log in, it will also ask you to select a project.

Create the project

Creating the project is quite easy. In fact, only one command is required:

gcloud projects create --name <project-name>

This will create a new project called project-name and generate a unique id for that project, usually the name with a set of numbers as a suffix. Remember this id, as we will use it later.

Alternatively you can set the project id directly. This gives you more control, but it has to be globally unique, so it will likely take you a couple of goes before you get it right. I wouldn't recommend it.

gcloud projects create <project-id>

Once the project has been created, we need to set it as the active project in the command line. We do this by calling the config command.

gcloud config set project <project-id>

Configure service account

Next up is configuring a service account for deploying the infrastructure. We are using Terraform, and the Google Cloud provider is designed to authenticate with a service account. This is also considered best practice, and makes things a lot simpler when working with automated tools. Calling gcloud auth login opens a browser window, which seems a bit silly as part of an automated script, and is very difficult when running on a server we are not logged into, as we will be when running the deployment through a CI/CD process.

To create a new service account we are going to use the gcloud command like we did when setting up the project.

gcloud iam service-accounts create <account-name> --display-name "<Service account name>"

Now that the service account has been created, we need to set up a key for it to authenticate with the Google API. This key will be stored in a file on disk and referenced by the Terraform deployment scripts. Remember to add it to the .gitignore file so you don't accidentally upload it to your git repository; anyone with access to this file can change everything the service account has access to.

mkdir -p infra/.creds
gcloud iam service-accounts keys create infra/.creds/gcp-sa-key.json --iam-account <account-name>@<project-id>.iam.gserviceaccount.com

The key should be stored in the infra/.creds/gcp-sa-key.json file. This will be referenced by Terraform when we deploy the infrastructure.

The service account is created, but it does not have access to anything. This is part of the security model Google follows: accounts need to be given explicit access to the resources they require. We will give it the owner role for the project. This is probably more access than it needs, but it's a good starting point, and we can reduce it once we have a better idea of what the service account needs to do. To give a role to a service account you create a three-way binding between the service account, a project, and a role.

gcloud projects add-iam-policy-binding <project-id> --member serviceAccount:<account-name>@<project-id>.iam.gserviceaccount.com --role roles/owner

The project has been set up and you should have a service account with access to deploy resources.

Infrastructure using Terraform

I'm a big proponent of using code to deploy infrastructure, Infrastructure as Code (IaC). If you have been working with the cloud for a while, it becomes second nature, and it's quite surprising when you have to explain to someone why you do it. In short, it gives you many benefits over using a portal to deploy. Having your infrastructure as code allows you to deploy the same environment multiple times: quickly, securely, and, more importantly, always the same way, with no room for human error. In addition, it has a built-in disaster recovery component: if something happens that forces you to redeploy the solution somewhere else, it is fast and easy, and you can trust that it will deploy the same way. I'll stop there; there is enough to the benefits and challenges of IaC to warrant its own blog post. Just keep in mind that in the majority of cases, IaC is probably the right choice.

There are as many different types of IaC as there are programming languages. Most cloud providers have their own flavour, as well as SDKs that allow you to integrate with their platform using your programming language of choice. One of the key requirements I look for when choosing an approach and language is state management, which allows metadata to be maintained for all the resources deployed. This gives a really big performance benefit when working in environments with a lot of different resources, as pulling information for each and every one of them is very slow and prone to being rate limited by the cloud provider. Another key thing I look for is support for multiple cloud providers, so I can use the same language across providers, cloud and otherwise. This rules out the provider-specific languages and leaves third-party tools like Terraform and Pulumi.

I went with Terraform, as it's the one I know best.

Static file hosting and custom domains

The Google Cloud Platform allows you to host static files in its storage buckets and make them public. This lets websites keep static files like images separate from their dynamic content in a cheaper and more scalable way. We are going to take advantage of this, but take it one step further: because we do not have any dynamic content, why not host the whole website in the storage bucket? To do that, we need a way to default to a specific page if no specific path has been requested, like when you type http://www.example.com. Luckily Google Cloud allows us to specify this as part of the configuration of the storage bucket. It also allows us to configure our own domain.

Configuring your own domain is quite easy, but there are a few caveats you need to know about.

  1. You need to prove that you own the domain, and the service account we use to deploy the bucket needs to have access to it.
  2. You use a CNAME to point to the bucket. This means you can not use http://example.com, but have to use http://www.example.com. The reason for this is the way the RFC for DNS is defined. More details on that can be found in this excellent blog post by Dominic Fraser.

To prove you own the domain you intend to use with your blog, please follow the documentation on the Google docs page on this subject.

Once you have verified the domain, make sure you add your service account defined earlier as an owner of the domain. If you add it as a user or administrator, it will not work.
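If you prefer to kick the verification off from the command line, gcloud has a small helper for it; it simply opens the Webmaster Central page where the actual verification and owner management happen.

# Opens the Webmaster Central verification page for the domain
gcloud domains verify winsnes.io

# Once done, check that the domain shows up as verified
gcloud domains list-user-verified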

Terraform main file

I won't go into too much detail on how to set up and configure Terraform, as this has been covered many times before. Instead I will go into the things that are different for the Google Cloud Platform.

Authentication

There are two main ways to authenticate with Google Cloud: you can either get an OAuth token using the Google CLI utilities, or you can download a key file and use that. The first option requires user authentication, which isn't ideal for automation, so we have gone with the second option in this guide.

When we configure the Terraform provider, we pass in the path to the key file we created earlier for the service account. In the future we will upload this file to our CI tool to allow it to authenticate as our service account and apply any changes we need.

provider "google" {
  credentials = "${file(".creds/gcp-sa-key.json")}"
  project     = "${var.google_project}"
  region      = "${local.region}"
}

Resources

There are two resources we need to deploy: the bucket, and an ACL object that gives public users access to the files.

The storage bucket is configured to be located in the US, and its name is set to the domain we intend to use for this website. It's important that this is the full domain, as it is what Google uses to direct requests to the correct bucket.

We have also set up two pages that Google will serve by default: one for when no path is provided, and one for when a path is provided but doesn't lead to anything, a 404 error.

The ACL contains the default roles, with one additional entry, READER:allUsers, which makes the bucket public and allows users who have not authenticated against our Google account to read the files.

resource "google_storage_bucket" "static-store" {
  name     = "www.winsnes.io"
  location = "US"

  website {
    main_page_suffix = "index.html"
    not_found_page   = "404.html"
  }
}

data "google_project" "project" {}

resource "google_storage_bucket_acl" "static-store-default" {
  bucket = "${google_storage_bucket.static-store.name}"

  role_entity = [
    "OWNER:project-owners-${data.google_project.project.number}",
    "OWNER:user-${var.service_account}",
    "READER:allUsers",
  ]
}
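With the provider and the resources above in the infra folder, deploying is the standard Terraform workflow. A minimal run looks like this; Terraform will prompt for any variables referenced in the configuration (such as google_project and service_account) unless you define them in a tfvars file.

# Run from the infra folder where main.tf lives
cd infra
terraform init      # download the google provider
terraform plan      # review what will be created
terraform apply     # create the bucket and ACL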

DNS Configuration

Google Cloud has made it very easy to configure a custom domain for a storage bucket. All you have to do is configure a CNAME pointing to a specific Google domain (c.storage.googleapis.com.), and Google takes care of the rest.

Add the DNS entry to your DNS zone and you should be able to access your blog once you upload the content.
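How you add the record depends on where your DNS zone is hosted. If it happens to live in Google Cloud DNS, it would look something like this; the zone name is a placeholder.

# Add a CNAME for www pointing at Google's storage endpoint
gcloud dns record-sets transaction start --zone=<zone-name>
gcloud dns record-sets transaction add "c.storage.googleapis.com." \
  --zone=<zone-name> --name="www.winsnes.io." --type=CNAME --ttl=300
gcloud dns record-sets transaction execute --zone=<zone-name>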

Generate and Publish

Now that everything is set up and ready, all that is left is to generate the static content and publish it to the bucket.

To generate the content, go to the blog folder and use the generate command for hugo:

hugo

This command will create two folders in the blog directory: public and resources. Next we will upload the contents of the public folder to the storage bucket using gsutil.

gsutil -m cp -r public/* gs://<name of bucket>

In my case I use www.winsnes.io as the bucket name.

Once the command has completed, you should be able to see the content of the blog by going to the registered DNS entry.

Every time you create a new blog post, you need to step through this last process and upload the generated content. This is something I will automate using CI/CD, but that is something we will look at in the future.
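Until that automation exists, a one-liner keeps the manual step manageable. gsutil rsync only uploads what has changed, and the -d flag also deletes remote files that no longer exist locally, so use it with care.

# Regenerate the site and sync the output folder to the bucket
hugo && gsutil -m rsync -r -d public gs://www.winsnes.io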