2016/11/12

I went to GDG Berlin Golang's "Movember Gophers" meetup

I only arrived in Berlin, Germany yesterday and was still a bit jet-lagged, but I somehow managed to make it to GDG Berlin Golang's "Movember Gophers" meetup.

Incidentally, "Movember" is a combination of "Moustache" and "November", which is why the gopher on the event page is sporting a splendid moustache.

I'm Canadian, but I only really got interested in programming and development after coming to Japan, so this was actually my first time attending a meetup outside of Japan. I have no idea what counts as normal at meetups around the world, but pizza and beer seem to be standard.

Although yesterday it was giant pizza slices that you fold over to eat...

Also, in Japan the talks always come first and the social part comes afterwards, but at this event the pizza and beer were out from the very beginning, and people drank while waiting for everyone to gather. It was pretty funny watching everyone dash to the fridge for another beer between talks.

Anyway, leaving the pizza and beer aside, let's talk about the content of the talks.


The first speaker was @matryer, from the UK, who talked about doing TDD in Go (the slides are here).

He introduced a bunch of handy tips for doing TDD. For example:

  • Silk, a package that can run HTTP tests from documentation written in Markdown
  • How to check test coverage with the go test -cover command
  • Unless you have a special reason, use the standard test package as-is instead of test helpers (the opinion of the Go authors)
  • To avoid external dependencies, use mocked objects instead of the real ones (in other words, keep types flexible with interfaces; see the sketch after this list)
  • An introduction to a tool that auto-generates mock structs from interfaces
  • When you want to test a package from the outside, it's fine to create a test-only package (for tests, it's OK to have more than one package in the same directory)
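
To illustrate the interface-based mocking point, here is a minimal sketch of my own (the Sender interface and fakeSender type are my own names, not from the talk):

package notify

import "testing"

// Sender is the external dependency we want to swap out in tests.
// The real implementation would talk to the network.
type Sender interface {
  Send(to, body string) error
}

// Notify is the code under test; it only depends on the interface.
func Notify(s Sender, user string) error {
  return s.Send(user, "hello gopher")
}

// fakeSender records calls instead of hitting a real service.
type fakeSender struct {
  sent []string
}

func (f *fakeSender) Send(to, body string) error {
  f.sent = append(f.sent, to)
  return nil
}

func TestNotify(t *testing.T) {
  f := &fakeSender{}
  if err := Notify(f, "gopher@example.com"); err != nil {
    t.Fatal(err)
  }
  if len(f.sent) != 1 {
    t.Fatalf("expected 1 send, got %d", len(f.sent))
  }
}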


The second talk was Consumer Driven Contract Testing in Go by @konradreiche.

It was an introduction to Pact, which tries to solve the problems that come up in integration testing for microservices. (I'm not confident I can explain the whole concept from scratch, so the Cookpad article on the subject, in Japanese, is a good reference.)

He also explained how Pact for Go is actually used at Tape.tv. They have microservices managed by different teams, a ticket purchasing system called Box Office (Ruby) and an authentication API called Bouncer (Go), and by defining the shared API with Pact they've been able to automate part of the coordination between them, which gives them peace of mind.

That said, it apparently takes a bit of effort before you can really use Consumer Driven Contracts, both in terms of project setup and the time it takes for the team to get used to the new workflow.


Finally, @fortytw2 introduced watney, a port of Ruby's vcr. It captures HTTP requests and reuses the recorded HTTP responses on subsequent runs, which lowers the chance of tests failing due to network issues and speeds them up. It looks like a really handy package.

2016/11/02

Minna no Go Gengo: A Summary / Review in English (chapter 3)

Here's a continuation of my summary of the Japanese Go programming book: Minna no Go Gengo. This is chapter 3.

For anyone who has missed my other summaries, here are chapter 1 and chapter 2.

How to make practical applications

Author: Fujiwara Shunichiro (aka @fujiwara)

3.1 Opening

First of all what does the author mean by "practical applications"?
A practical application...

  • makes it easy to look up what kind of operations it performs
  • has good performance
  • can support different inputs and outputs
  • is easy for humans to use
  • is easy to maintain

Two GitHub repos, both created by the author, are referenced frequently throughout the chapter as real-world examples of practical applications.

3.2 Version Control

Many Go programs can be shipped as a single binary, so compared to interpreted languages, the deployment process is generally much simpler. However, since we're dealing with binaries, it's a good idea to make it easy to programmatically obtain the version number of the binary so users can check if they have the latest version. Using the flag package to capture whether the program was invoked with -v or --version flags is common, but instead of hardcoding the version into the source code, the author recommends using git tags to store the version number and then passing it to the code with the ldflags build argument. A build script (or Makefile) could, for example, do something like this:


#!/bin/sh

GIT_VER=`git describe --tags`
go build -ldflags "-X main.version=${GIT_VER}"


The Makefile for fluent-agent-hydra seems to make use of this very technique.
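
For reference, the Go side of this pattern might look something like the sketch below (the variable and flag names are just my own choices): version is left at a default in the source and overwritten at build time by -X main.version.


package main

import (
  "flag"
  "fmt"
  "os"
)

// version is overwritten at build time with:
//   go build -ldflags "-X main.version=$(git describe --tags)"
var version = "dev"

func main() {
  showVersion := flag.Bool("version", false, "print version and exit")
  flag.Parse()

  if *showVersion {
    fmt.Println(version)
    os.Exit(0)
  }

  // ... the rest of the program
}
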

3.3 Efficient Use of I/O

This section demonstrates why and how bufio should be used when dealing with I/O operations.

The first point the author makes is how useful bufio.Reader.Peek can be when you run into a situation where you want to validate data coming in from STDIN, but don't want to read everything in the buffer just yet. An example of this kind of scenario is in the application stretcher, which expects to receive a valid JSON string via STDIN. Although it's possible to read the entirety of the input into memory and then check whether it is valid JSON or not, it's more efficient to pass the input as an io.Reader directly to an encoding/json.Decoder. This is where bufio.Reader.Peek comes in. A call to Peek() can be used to check whether the first character looks like the beginning of a JSON array ("["), and if not we can simply return an error without bothering to read the rest of STDIN.
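
As a rough sketch of that pattern (this is my own example, not the actual stretcher code), peeking at the first byte before handing the reader to the decoder might look like this:


package main

import (
  "bufio"
  "encoding/json"
  "fmt"
  "os"
)

func main() {
  r := bufio.NewReader(os.Stdin)

  // Peek at the first byte without consuming it.
  head, err := r.Peek(1)
  if err != nil || head[0] != '[' {
    fmt.Fprintln(os.Stderr, "input does not look like a JSON array")
    os.Exit(1)
  }

  // The buffered reader (including the peeked byte) goes straight to the decoder.
  var items []interface{}
  if err := json.NewDecoder(r).Decode(&items); err != nil {
    fmt.Fprintln(os.Stderr, "decode error:", err)
    os.Exit(1)
  }
  fmt.Println("decoded", len(items), "items")
}
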

Another important point the author brings up is the difference between buffering in Go as opposed to interpreted languages such as Ruby, Perl and Python. Interpreted languages generally buffer text output automatically at run time when the program's output is handed to a pipe, thereby reducing the number of costly system calls. Go, on the other hand, doesn't automatically buffer anything.

For example, try inspecting the following program using strace -e trace=write ./filename | cat


package main

import (
  "fmt"
  "os"
  "strings"
)

func main() {
  for i := 0; i < 100; i++ {
    fmt.Fprintln(os.Stdout, strings.Repeat("x", 100))
  }
}


Checking the output of strace on the above program reveals that a total of 100 system calls are recorded, indicating that no buffering has taken place. The bufio package can help us improve this example.


package main

import (
  "bufio"
  "fmt"
  "os"
  "strings"
)

func main() {
  b := bufio.NewWriter(os.Stdout)
  for i := 0; i < 100; i++ {
    fmt.Fprintln(b, strings.Repeat("x", 100))
  }
  b.Flush()
}


By wrapping os.Stdout with a *bufio.Writer we can delay the system calls until Flush() is called. The default buffer size is 4096 bytes, but it can be increased as necessary. Inspecting our new and improved program in strace will show that the number of system calls has been reduced to 2 whether we pass our program to a pipe or not.

3.4 Handling random numbers

This section mostly just explains the difference between math/rand (pseudo random number generator) and crypto/rand (cryptographically secure pseudo random number generator) and shows how they can be used. I think this topic is pretty well covered in English.
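
Still, just to make the distinction concrete, here is a minimal example of my own (not from the book): math/rand is fine for things like shuffling or sampling, while tokens and secrets should come from crypto/rand.


package main

import (
  crand "crypto/rand"
  "encoding/hex"
  "fmt"
  "math/rand"
)

func main() {
  // math/rand: fast pseudo random numbers; deterministic unless you seed it.
  fmt.Println("dice roll:", rand.Intn(6)+1)

  // crypto/rand: cryptographically secure random bytes for tokens, keys, etc.
  b := make([]byte, 16)
  if _, err := crand.Read(b); err != nil {
    panic(err)
  }
  fmt.Println("token:", hex.EncodeToString(b))
}
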

3.5 Human readable numbers

The package recommended for converting file sizes and timestamps to human readable format is go-humanize. Again, the documentation for this package is in English, so I probably don't need to summarize it. If you need to convert numbers to a more readable format, just use this package rather than wasting time trying to do it yourself.
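
Usage is about as simple as it gets; here is a quick sketch of my own (double-check the go-humanize docs for the exact function set):


package main

import (
  "fmt"
  "time"

  "github.com/dustin/go-humanize"
)

func main() {
  fmt.Println(humanize.Bytes(82854982))                      // e.g. "83 MB"
  fmt.Println(humanize.Comma(123456789))                     // e.g. "123,456,789"
  fmt.Println(humanize.Time(time.Now().Add(-3 * time.Hour))) // e.g. "3 hours ago"
}
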

3.6 Executing external commands through Go

Generally speaking, executing other programs through Go incurs a penalty in terms of starting up processes in the background and sending data to external commands, so performance-wise it's often preferable to implement as much as possible in pure Go. However, there are of course instances where it is better to delegate the work to an existing program. This section mostly demonstrates how to use the os/exec package to call external programs. One thing I didn't know is that if you call sh through os/exec you can use redirects and other shell operators (>, ||, &&, etc.) as normal.


exec.Command("sh", "-c", "some_command || handle_error").Output()


3.7 Timing out

While a lot of existing packages like net/http handle timeouts for you, sometimes you need to implement one yourself. This section demonstrates how to do that using the time package and channels.


// A 10 second timer
timer := time.NewTimer(10 * time.Second)
// a channel to receive the result
done := make(chan error)

go func() {
  // call the function you want to run asynchronously in a goroutine
  done <- doSomething() // a function that returns an error
}()

// use select to wait for a response from multiple channels
select {
case <-timer.C:
  return fmt.Errorf("timeout reached")
case err := <-done:
  if err != nil {
    return err
  }
}


3.8 Working with signals

Go's default handling of signals is documented in os/signal. This section demonstrates how you might change the default behaviour in your own programs. A few examples of why you might want to do this: you have a server application and want to finish processing all in-flight requests before shutting down, or you want to make sure your program finishes writing everything currently in its buffers and properly closes open files before exiting. The example in the book is fairly close to the example in the os/signal docs for Notify(), so a read through that will give you the gist.
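
A bare-bones version of that pattern might look like the following (my own sketch, loosely based on the os/signal docs):


package main

import (
  "fmt"
  "os"
  "os/signal"
  "syscall"
)

func main() {
  sigs := make(chan os.Signal, 1)
  // Deliver SIGINT/SIGTERM to our channel instead of applying the default behaviour.
  signal.Notify(sigs, syscall.SIGINT, syscall.SIGTERM)

  fmt.Println("working... press Ctrl+C to stop")
  s := <-sigs

  // Flush buffers, close files, drain in-flight requests, etc. before exiting.
  fmt.Println("received", s, "- cleaning up and shutting down")
}
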

3.9 Stopping goroutines

It's easy to start a goroutine, but stopping one goroutine from inside another can be a bit tricky. There are two main ways to do this: using channels, or using the context package available in Go 1.7 and later.

The example code on page 80 shows how to use channels to stop a goroutine. The code demonstrates a program that implements concurrent workers which can process data from a queue.


package main

import (
  "fmt"
  "sync"
)

var wg sync.WaitGroup

func main() {
  queue := make(chan string)
  for i := 0; i < 2; i++ { //make two workers (goroutines)
    wg.Add(1)
    go fetchURL(queue)
  }

  queue <- "http://www.example.com"
  queue <- "http://www.example.net"
  queue <- "http://www.example.net/foo"
  queue <- "http://www.example.net/bar"

  close(queue) // tell the goroutines to terminate
  wg.Wait()    // wait for all goroutines to terminate
}

func fetchURL(queue chan string) {
  for {
    url, more := <-queue // more will be false when this closes
    if more {
      // process the url
      fmt.Println("fetching", url)
      // ...
    } else {
      fmt.Println("worker exit")
      wg.Done()
      return
    }
  }
}


When you call close() on a channel from the sending side, the second value received on the receiving side (the variable more) evaluates to false. The goroutine is then able to terminate itself (by returning) when there is no more data to receive.

The context package can be used to achieve much the same thing. The advantage that context brings to the table is a function called context.WithTimeout() which lets you handle timing out and cancellation in one fell swoop. Go Concurrency Patterns: Context from the official golang blog covers its usage extensively.
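
As a rough illustration (my own sketch, not the book's code), a worker that stops either when it is cancelled or when the timeout expires could look like this:


package main

import (
  "context"
  "fmt"
  "time"
)

func worker(ctx context.Context, queue <-chan string) {
  for {
    select {
    case <-ctx.Done():
      // Done() fires on timeout or on an explicit cancel.
      fmt.Println("worker exit:", ctx.Err())
      return
    case url := <-queue:
      fmt.Println("fetching", url)
    }
  }
}

func main() {
  ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
  defer cancel()

  queue := make(chan string)
  go worker(ctx, queue)

  queue <- "http://www.example.com"
  <-ctx.Done() // wait until the timeout fires
}
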

That wraps up the topics introduced in this chapter. A lot of the content was new to me, so I hope to make use of some of these techniques in the future.

2016/10/17

Using Packer and Terraform on Digital Ocean

I'm in the middle of an international move, which means that it's kind of slow to connect to the VPS I had been using, since it's hosted in Digital Ocean's Singapore region. I have a couple of Chef recipes to help me automate things, so moving my servers over to a region closer to me shouldn't be too painful, but even just installing rbenv and getting Chef up and running can be a bit of a pain. What better time to teach myself how to use Packer and Terraform?

So this is the agenda for today:

  • Use Packer to set up users and install OpenResty (nginx) on a snapshot, using Chef as the provisioner
  • Create a Terraform configuration that will use the snapshot created above to start up a Droplet

Packer

First, make a directory to contain your packer configuration files and enter the directory:


crimson@dixneuf 14:07 ~/ $ mkdir packer 
crimson@dixneuf 14:07 ~/ $ cd packer 


Next, create a JSON file to hold the configuration settings. You can name it whatever you like, but it's best to pick something memorable so you'll remember what it is later. I called mine bustermachine.json (because I like to name my VPS after Top wo Nerae 2, why not? ¯\_(ツ)_/¯). This is the basic configuration:


{
  "builders": [{
    "type": "digitalocean",
    "api_token": "YOUR-API-TOKEN",
    "region": "fra1", 
    "size": "512mb",
    "image": "centos-7-2-x64",
    "droplet_name": "bustermachine",
    "snapshot_name": "bustermachine-img-{{timestamp}}"
  }]
}


This will create a snapshot using the CentOS 7.2 image in the Frankfurt 1 region. The droplet size is set at 512MB, but the size can be scaled up when creating a new droplet from this image, so starting with the smallest size can't hurt.
The snapshot name must be unique and is what will appear in the DigitalOcean console, so it's a good idea to set it to something memorable + a timestamp. You can find additional configuration options in Packer's official documentation.

The configuration above will build an empty server with nothing running, so let's do some provisioning with chef-solo:


  "provisioners": [{
    "type": "chef-solo",
    "cookbook_paths": ["cookbooks"],
    "data_bags_path": "data_bags",
    "run_list": [ "recipe[local-accounts]", "recipe[nginx]" ]
  }]


The cookbook_paths and data_bags_path are relative to the working directory (our ~/packer folder), but you can also define an absolute path to an existing chef repository on your local machine. What kind of recipes you want to run is up to you, but I'm just going to run one that sets up a user account and one that installs OpenResty (nginx).

OK. Let's build it.


crimson@dixneuf 14:14 ~/packer  $ packer build bustermachine.json
digitalocean output will be in this color.

==> digitalocean: Creating temporary ssh key for droplet...
==> digitalocean: Creating droplet...
==> digitalocean: Waiting for droplet to become active...
==> digitalocean: Waiting for SSH to become available...
==> digitalocean: Connected to SSH!
==> digitalocean: Provisioning with chef-solo
    digitalocean: Installing Chef...
    digitalocean: % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
    digitalocean: Dload  Upload   Total   Spent    Left  Speed
    digitalocean: 0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
    sudo: sorry, you must have a tty to run sudo
    digitalocean:  25 20058   25  5000    0     0   4172      0  0:00:04  0:00:01  0:00:03  4177
    digitalocean: curl: (23) Failed writing body (1855 != 2759)
==> digitalocean: Destroying droplet...
==> digitalocean: Deleting temporary ssh key...
Build 'digitalocean' errored: Error installing Chef: Install script exited with non-zero exit status 1

==> Some builds didn't complete successfully and had errors:
--> digitalocean: Error installing Chef: Install script exited with non-zero exit status 1

==> Builds finished but no artifacts were created.


Uh oh. Since I'm getting the message sudo: sorry, you must have a tty to run sudo, it looks like CentOS' default sudo permissions are getting in the way of Chef's installation. We can work around this by defining "ssh_pty": true in the builder portion of our Packer configuration.

Now the configuration file looks like this (just the builder and provisioner sections from above, with "ssh_pty": true added):
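

{
  "builders": [{
    "type": "digitalocean",
    "api_token": "YOUR-API-TOKEN",
    "region": "fra1",
    "size": "512mb",
    "image": "centos-7-2-x64",
    "ssh_pty": true,
    "droplet_name": "bustermachine",
    "snapshot_name": "bustermachine-img-{{timestamp}}"
  }],
  "provisioners": [{
    "type": "chef-solo",
    "cookbook_paths": ["cookbooks"],
    "data_bags_path": "data_bags",
    "run_list": [ "recipe[local-accounts]", "recipe[nginx]" ]
  }]
}


Now that we've got that taken care of, let's try building again: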


crimson@dixneuf 14:20 ~/digitalocean/packer  $ packer build bustermachine.json
digitalocean output will be in this color.

==> digitalocean: Creating temporary ssh key for droplet...
==> digitalocean: Creating droplet...
==> digitalocean: Waiting for droplet to become active...
==> digitalocean: Waiting for SSH to become available...
==> digitalocean: Connected to SSH!
==> digitalocean: Provisioning with chef-solo
    digitalocean: Installing Chef...

....

    digitalocean: Running handlers:
    digitalocean: Running handlers complete
    digitalocean: Chef Client finished, 14/14 resources updated in 31 seconds
==> digitalocean: Gracefully shutting down droplet...
==> digitalocean: Creating snapshot: bustermachine-img-1476642546
==> digitalocean: Waiting for snapshot to complete...
==> digitalocean: Error waiting for snapshot to complete: 
Timeout while waiting to for droplet to become 'active'
==> digitalocean: Destroying droplet...
==> digitalocean: Deleting temporary ssh key...
Build 'digitalocean' errored: Error waiting for snapshot to complete: 
Timeout while waiting to for droplet to become 'active'

==> Some builds didn't complete successfully and had errors:
--> digitalocean: Error waiting for snapshot to complete: 
Timeout while waiting to for droplet to become 'active'

==> Builds finished but no artifacts were created.


OK, this time there was a timeout... but a look at Packer's GitHub issues shows that this problem is a known bug that will be fixed in the next version of Packer (I'm on version 0.10.2). Even though Packer timed out, the Digital Ocean console shows that our snapshot was created fine.

Since we've successfully created our first snapshot, let's move on to creating a Droplet in Terraform.

Terraform

First make a directory to hold our configuration files and from where we will execute all our terraform commands:


crimson@dixneuf 14:07 ~/ $ mkdir terraform
crimson@dixneuf 14:07 ~/ $ cd terraform


Before we continue, though, the next step requires that we know the ID of the snapshot we just created, which is different from the slug name that appears in the console. We can use the Digital Ocean API to look that up:


curl -X GET -H "Content-Type: application/json" -H "Authorization: Bearer " \
  "https://api.digitalocean.com/v2/snapshots"
{
  snapshots: [
    {
      id: 20321411,
      name: "bustermachine-img-1476642546",
      regions: ["fra1"],
      created_at: "2016-10-16T18:31:29Z",
      resource_id: 29342757,
      resource_type: "droplet",
      min_disk_size: 20,
      size_gigabytes: 1.35
    }
  ],
  links: { },
  meta: { total: 1 }
}


In my case it's id: 20321411, so we'll have to use that ID in our Terraform config file. Let's make that file now and name it config.tf:

variable "do_token" {}

# Configure the DigitalOcean Provider
provider "digitalocean" {
    token = "${var.do_token}"
}

This first part just lets Terraform know that we intend to use Digital Ocean as our provider, but we have to pass it our API token. We can do this in one of two ways: create a file called terraform.tfvars to contain our variables, or pass the variable on the command line when we call terraform plan later on (terraform plan -var 'do_token=foo'). I recommend checking out the documentation.
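
If you go the terraform.tfvars route, the file just maps variable names to values, something like this (with your real token in place of the placeholder):

do_token = "YOUR-API-TOKEN"
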
Next we need to define the resources we intend to create. Here's my config for creating a Droplet named vingtsept using the snapshot ID I obtained earlier (20321411) in the image definition:

# Create a web server
resource "digitalocean_droplet" "vingtsept" {
    image = "20321411"
    name = "vingtsept"
    region = "fra1"
    size = "1gb"
    ssh_keys = [4055393]
}

Note that, although our snapshot was created from a 512MB Droplet, we can create a larger 1GB Droplet from it (but making a smaller Droplet from a bigger sized snapshot is not possible).
I already had an ssh key registered on Digital Ocean so I set the ssh key id (also obtainable via the Digital Ocean API), but if you need to upload a new one you can also use Terraform to do it:

resource "digitalocean_ssh_key" "default" {
    name = "dixneuf"
    public_key = "${file("/Users/crimson/.ssh/id_rsa.pub")}"
}

Now that we've finished creating our config file, let's try running terraform plan to check our configuration:


crimson@dixneuf 15:28 ~/terraform  $ terraform plan
Refreshing Terraform state in-memory prior to plan...
The refreshed state will be used to calculate this plan, but
will not be persisted to local or remote state storage.

The Terraform execution plan has been generated and is shown below.
Resources are shown in alphabetical order for quick scanning. Green resources
will be created (or destroyed and then created if an existing resource
exists), yellow resources are being changed in-place, and red resources
will be destroyed. Cyan entries are data sources to be read.

Note: You didn't specify an "-out" parameter to save this plan, so when
"apply" is called, Terraform can't guarantee this is what will execute.

+ digitalocean_droplet.vingtsept
    image:                "20321411"
    ipv4_address:         ""
    ipv4_address_private: ""
    ipv6_address:         ""
    ipv6_address_private: ""
    locked:               ""
    name:                 "vingtsept"
    region:               "fra1"
    size:                 "1gb"
    ssh_keys.#:           "1"
    ssh_keys.0:           "4055393"
    status:               ""

Plan: 1 to add, 0 to change, 0 to destroy.


Looks ok, so let's create the Droplet already.


crimson@dixneuf 15:28 ~/terraform  $ terraform apply
digitalocean_droplet.vingtsept: Creating...
  image:                "" => "20321411"
  ipv4_address:         "" => ""
  ipv4_address_private: "" => ""
  ipv6_address:         "" => ""
  ipv6_address_private: "" => ""
  locked:               "" => ""
  name:                 "" => "vingtsept"
  region:               "" => "fra1"
  size:                 "" => "1gb"
  ssh_keys.#:           "" => "1"
  ssh_keys.0:           "" => "4055393"
  status:               "" => ""
digitalocean_droplet.vingtsept: Still creating... (10s elapsed)
digitalocean_droplet.vingtsept: Still creating... (20s elapsed)
digitalocean_droplet.vingtsept: Still creating... (30s elapsed)
digitalocean_droplet.vingtsept: Still creating... (40s elapsed)
digitalocean_droplet.vingtsept: Creation complete

Apply complete! Resources: 1 added, 0 changed, 0 destroyed.

The state of your infrastructure has been saved to the path
below. This state is required to modify and destroy your
infrastructure, so keep it safe. To inspect the complete state
use the `terraform show` command.

State path: terraform.tfstate


And there it is. The Droplet is up and running. Makes for incredibly painless server setup if you ask me!

2016/10/05

Minna no Go Gengo: A Summary / Review in English (chapter 2)

It took me way longer than I expected to sit down and write this, but now that I have a bit of downtime before my next flight, I'd like to continue my summary of Minna no Go Gengo.

How to build multi-platform tools for your workplace
Author: @mattn

This chapter encourages readers to build multi-platform tools (Windows, Mac, Linux, etc.) to support the different devices coworkers use, and gives some guidelines on how this can be done effectively in Go. Here is a quick breakdown of each section and the kind of info it covers:

2.1 Why build internal tools in Go

Go makes it possible to statically build a runnable module for various OSes, so there is no need to ask users to install the Go runtime on their machines. Thanks to this, there is no worry that a different runtime implementation on a different OS will behave differently. Distributing a single binary file is all that is needed to let others use a Go program. Both of these things make Go a really good choice for internal tooling.

2.2 Implicit rules to follow

Rule one is to use path/filepath, not the path package, to interact with the filesystem. These two packages can be confusing to new users of Go: while path/filepath is fairly self-explanatory, path is meant for resolving slash-separated paths in an HTTP or FTP context. Because the path package does not recognize "\" as a path separator even on Windows, a URL like http://localhost:8080/data/..\main.go sent to a web server that uses the path package to locate static files could be used to expose the raw contents of other files on the filesystem.
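
A tiny illustration of the difference (my own example, not from the book): path only understands "/" as a separator, so a backslash path element is never cleaned away, while path/filepath understands the platform separator.

package main

import (
  "fmt"
  "path"
  "path/filepath"
)

func main() {
  // path treats `..\main.go` as a single element, so the ".." survives.
  fmt.Println(path.Clean(`data/..\main.go`)) // data/..\main.go

  // path/filepath uses the OS separator; on Windows this cleans to "main.go".
  fmt.Println(filepath.Clean(`data/..\main.go`))
}
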

Rule 2 is to use defer to clean up resources. This is pretty well documented elsewhere, so I don't think I really need to elaborate.

The next recommendation is of particular concern to anyone who deals with Japanese or other languages containing multibyte characters. Anyone interacting with programs that use the ANSI API on Windows to produce output will have to use an appropriate encoding package like golang.org/x/text/encoding/japanese to convert the input from Shift_JIS to UTF-8.
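
The conversion itself might look something like this minimal sketch of mine using golang.org/x/text (reading Shift_JIS from stdin and writing UTF-8 to stdout):

package main

import (
  "io"
  "os"

  "golang.org/x/text/encoding/japanese"
  "golang.org/x/text/transform"
)

func main() {
  // Wrap the Shift_JIS source (e.g. output captured from a Windows command)
  // in a decoder so that everything downstream sees UTF-8.
  utf8Reader := transform.NewReader(os.Stdin, japanese.ShiftJIS.NewDecoder())

  if _, err := io.Copy(os.Stdout, utf8Reader); err != nil {
    panic(err)
  }
}
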

2.3 Using TUI on Windows

Linux-based text-based user interfaces (TUIs) use a lot of escape sequences, many of which don't display properly on Windows. In Go you can use a library called termbox to make building multi-platform TUI applications easier. Another recommendation is one of the author's own tools, go-colorable, which can help produce coloured text in log output, etc.

2.4 Handling OS Specific Processes

Use runtime.GOOS to determine the OS from within a program.

This section also covers build constraints, but this topic is already well covered in English in the documentation, so I won't go into detail here.
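
For what it's worth, the runtime check is a one-liner, and a per-file build constraint is just a comment at the top of the file; here's a quick sketch of my own showing both:

// +build !windows

package main

import (
  "fmt"
  "runtime"
)

func main() {
  // The build tag above excludes this file from Windows builds entirely;
  // runtime.GOOS lets you branch at run time instead.
  fmt.Println("running on", runtime.GOOS)
  if runtime.GOOS == "darwin" {
    fmt.Println("hello, Mac user")
  }
}
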

2.5 Rely on existing tools instead of trying too hard

While it is technically possible to daemonize processes in Go by using the syscall package to call fork(2), the multithreaded nature of Go makes this a bit tricky. So it is generally recommended to use external tools to handle daemonizing a Go program. On Linux, for example, check out daemonize, supervisord and upstart; on Windows, check out nssm.

On Unix a regular user can't listen on port 80 or 443, so a lot of Unix servers are configured to start as root and use setuid(2) to drop privileges. However, it's not recommended to use setuid(2) in Go because it only affects the current thread. Instead, use nginx or another server to reverse proxy requests from 80 or 443 to another port that Go can listen on.

2.6 Go likes its single binaries

Go makes deployment as easy as placing a single binary file on a server, but in the case of larger programs like web applications (for example) sometimes templates, pictures and other files are necessary. Try using go-bindata to pack static files as assets in a binary so that you don't have to sacrifice ease of deployment.

2.7 Making Windows applications

This section covers how to toggle whether or not your Go program displays a command prompt using -ldflags="-H windowsgui" with go build, and also how to link resource files (like the application's icon) using IDI_MYAPP ICON "myapp.ico".

It also lists some recommended packages for building multi-platform GUIs.

2.8 Configuration files

The first part of this section covers different file formats like INI, JSON, YAML, TOML and covers their strengths and weaknesses.

Aside from file format, file location on each platform can also be a source of confusion when configuring applications. On UNIX systems the convention was originally to place config files directly in the home directory, like $HOME/.myapp, but more recently the XDG Base Directory Specification recommends placing them under $HOME/.config/.

Similarly on Windows it's no problem if you use %USERPROFILE%\.config\, but the author mentioned he often places config files under %APPDATA%\my-app\.


Well that's the gist of it. I haven't really built software for Windows before mostly just because it seemed like too much trouble, but this chapter sure made it look like Go is making that whole process much easier for those of us who are used to developing for Linux.

For anyone who missed my (much briefer) summary of chapter 1, you can find it here.

2016/09/13

Minna no Go Gengo: A Summary / Review in English (chapter 1)

My copy of Minna no Go Gengo, the Golang book with the cover everyone loves, came in the mail today!

Just last week, I was talking to some gophers based in Germany, and when I mentioned that a coworker of mine recently published a chapter in a book on Go, one of the guys immediately asked "Is it the one with the gophers and robot on the cover?". Since the book only exists in Japanese I was surprised that he knew about it, but I guess I shouldn't be too surprised. The cover is just too good not to share, am I right?

So I thought I'd write a quick review / summary of each chapter in case there are some gophers out there who are interested in knowing what's in the book. I've just barely begun reading, but as the title, Minna no Go Gengo (Everyone's Golang), suggests, it covers a number of topics for a range of different skill levels, from how to get started for absolute beginners to more advanced topics like reflection. Each chapter is written by a different author, all of whom are well-known OSS contributors here in Japan.

The first chapter is the most beginner friendly, but also contains some stellar tips about how to write Go code in a Go-like way.

How to start writing Go code for team development

Author: Matsuki Masayuki (aka @Songmu)

This chapter starts out with the essentials: how to install Go, an introduction to some of the core command-line tools used in Go development, as well as suggestions for some useful third-party tools (like ghq, peco and glide). It's super concise and does a great job of covering the essentials without being too verbose.

For anyone who's written Go code before, though, the meat of the chapter is in the style guide, which highlights some differences between writing programs in scripting languages like Ruby and Perl vs. writing in Go. For example:

  • Avoid using regexp: use the strings package wherever possible instead. Why? They can be really slow, sometimes even slower than Perl regexp... which is pretty bad for a pre-compiled program.
  • Avoid maps. Because Go is a strongly typed language it is better to use structs. Also, maps are not thread-safe. If you need to use a map alongside concurrency, embed one in a struct alongside a sync.RWMutex (see the sketch after this list).
  • Don't overuse concurrency. While Go is great for concurrency overusing it not only makes programs harder to read, but also increases the likelihood of race conditions.
  • Use the -ldflags and -tag options to embed useful information in a binary when using go build.
  • runtime.NumGoroutine and runtime.ReadMemStats are useful monitoring metrics for web servers and other long-running programs. golang-stats-api-handler is a useful library that provides an API interface to the Go runtime package.
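
A minimal version of the map-plus-mutex pattern mentioned above (my own sketch, not code from the book) looks like this:

package main

import (
  "fmt"
  "sync"
)

// Counter wraps a map together with the mutex that protects it,
// so concurrent goroutines can't corrupt the map.
type Counter struct {
  mu     sync.RWMutex
  counts map[string]int
}

func NewCounter() *Counter {
  return &Counter{counts: make(map[string]int)}
}

func (c *Counter) Incr(key string) {
  c.mu.Lock()
  defer c.mu.Unlock()
  c.counts[key]++
}

func (c *Counter) Get(key string) int {
  c.mu.RLock()
  defer c.mu.RUnlock()
  return c.counts[key]
}

func main() {
  c := NewCounter()
  var wg sync.WaitGroup
  for i := 0; i < 10; i++ {
    wg.Add(1)
    go func() {
      defer wg.Done()
      c.Incr("hits")
    }()
  }
  wg.Wait()
  fmt.Println(c.Get("hits")) // 10
}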

This is hardly an exhaustive list; the chapter is packed with useful info for people who are transitioning to Go from other languages and gives a good introduction to getting into a Go mindset. I am looking forward to reading and writing up the remaining chapters.

2016/08/07

Using OpenResty's access_by_lua and the satisfy any directive

So recently I learned that OpenResty's access_by_lua_block and the satisfy any directive don't play nice together. To be honest, I didn't have a very compelling reason to use an access_by_lua_block to begin with. Ideally I would have used a set_by_lua_block, but subrequests via ngx.location.capture are disabled there, since set_by_lua is a blocking directive.

Still, I felt a bit conflicted about whether I should be using access_by_lua or rewrite_by_lua, since technically all I really wanted to do was set a variable (to print in the access logs) and I was neither authenticating nor rewriting. Using either one seemed like a hacky workaround.

As it turns out, rewrite_by_lua is the much safer option if you use any authentication directives. Take for example this situation:

  • I need to set a variable using an external microservice and decide to do so with ngx.location.capture
  • It's a private API with complex authentication rules (i.e. a combination of IP blocking and basic auth)

So something like this:


server {
  set $my_special_variable "0"; # fallback value if no response from microservice
  access_by_lua_block {    
    local res = ngx.location.capture("/microservice")
    if res then
      ngx.var.my_special_variable = res.body
    end
  }

  location ~ /private/api/endpoint {
    satisfy any;
    allow [some ip address];
    deny all;

    auth_basic "unauthorized";
    auth_basic_user_file /etc/nginx/.htpasswd;

    proxy_pass http://my_backend_server;
  }
}

Unfortunately the access_by_lua_block counts as a satisfied condition for the satisfy any; directive, and suddenly my API is opened up to the world: neither the correct IP nor basic auth is required to access it anymore. Everyone can access it. Huge security risk, to say the least.

So it still feels a bit hacky, since the subrequest to the microservice isn't doing any URI rewriting to speak of, but rewrite_by_lua_block is definitely the better option in this situation. Glad I caught this technicality early on.
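
Concretely, the fix amounts to swapping the directive in the config above; something like this (the same sketch as before, with rewrite_by_lua_block doing the variable assignment):


server {
  set $my_special_variable "0"; # fallback value if no response from microservice
  rewrite_by_lua_block {
    local res = ngx.location.capture("/microservice")
    if res then
      ngx.var.my_special_variable = res.body
    end
  }

  location ~ /private/api/endpoint {
    satisfy any;
    allow [some ip address];
    deny all;

    auth_basic "unauthorized";
    auth_basic_user_file /etc/nginx/.htpasswd;

    proxy_pass http://my_backend_server;
  }
}
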

2016/07/31

Testing Out Google's Natural Language API

Since today is the last day of the Google Natural Language API's free public beta, I thought I'd give it a little spin. One of the applications for the API listed on Google's promotional page was analyzing product reviews... which reminded me that late last year I made a Slack webhook that extracts Google Play and App Store reviews, and I happen to still have a stockpile of those lying around in a database of mine. What better sample data to use for this experiment?

Apple and Google provide rating data (1 to 5 stars), so I know what percentage of users gave unfavourable reviews, but it would be nice to narrow down some of the things users are complaining about. Perhaps that's something we can tackle with this API? Let's try it.

My sample data is stored in a database with the following structure:


sqlite> .schema
CREATE TABLE review (
  id INT(11) NOT NULL,
  title VARCHAR(255) NULL,
  content TEXT,
  rating TINYINT(3) NOT NULL DEFAULT 0,
  device_type TINYINT(3) NOT NULL,
  device_name VARCHAR(255) NULL,
  author_name VARCHAR(255) NULL,
  author_uri VARCHAR(255) NULL,
  created DATETIME NULL,
  updated DATETIME NOT NULL,
  acquired DATETIME NOT NULL,
  PRIMARY KEY (id, device_type)
);
CREATE INDEX updated_idx on review(updated);
CREATE INDEX rating_idx on review(rating);

At this point I don't really care about platform (although I certainly could further break down my user samples by device type if I was so inclined), so I'm just going to collect the comments from users who gave a distinctly bad rating (of 1 or 2 stars) to feed to the API with a simple query like this:


SELECT content FROM review WHERE rating < 3;

If I throw the main body of each review at Google's API and see what salient keywords it comes up with (and how often they are mentioned), perhaps it'll give me a better clue about what the users are complaining about.

So first we need to authenticate with the API.

Creating a service key file for authentication is straightforward enough, so I'll just link to the documentation here. Once you have one, all you need to do is use the gcloud command to authenticate and print an access token.


$ gcloud auth activate-service-account --key-file=kinmedai-cb03d32572c2.json
Activated service account credentials for: [user@projectname.iam.gserviceaccount.com]
$ gcloud auth print-access-token
[[output omitted]]

Now that I'm ready to access the API, I decided to create a script in Go to do the dirty work for me. So the first step is to use the sample json payload and response body data from the getting started docs to generate structs in Go. Writing structs by hand is a pain, so I used JSON to Go to generate the bulk of it and then tweaked it a bit like so:

The request structure:


type EntityRequest struct {
  EncodingType string                `json:"encodingType"`
  Document     EntityRequestDocument `json:"document"`
}

type EntityRequestDocument struct {
  TypeName string `json:"type"`
  Content  string `json:"content"`
  Language string `json:"language"`
}

And the response structure:


type EntityResponse struct {
  Entities []DetectedEntity `json:"entities"`
  Language string           `json:"language"`
}

type DetectedEntity struct {
  Name       string          `json:"name"`
  EntityType string          `json:"type"`
  Salience   float64         `json:"salience"`
  Mentions   []EntityMention `json:"mentions"`
  Metadata   struct {
    WikipediaUrl string `json:"wikipedia_url"`
  } `json:"metadata"`
}

type EntityMention struct {
  Text struct {
    Content     string `json:"content"`
    BeginOffset string `json:"beginOffset"`
  } `json:"text"`
}

Now that I know what kind of data I'll be dealing with I can start building my request.


func createEntityRequests() []*EntityRequest {
  dbh := getDBH()
  rows, err := dbh.Query(`SELECT content FROM review WHERE rating < 3`)
  if err != nil {
    log.Fatal(err)
  }

  var entities []*EntityRequest

  for rows.Next() {
    var comment string
    err = rows.Scan(&comment)
    if err != nil {
      log.Fatal(err)
    }

    // Google Play lets users submit ratings with no comments (stars only ratings) so skip those
    if len(comment) == 0 {
      continue
    }

    entityRequest := &EntityRequest{
      EncodingType: "UTF8",
      Document: EntityRequestDocument{
        TypeName: "PLAIN_TEXT",
        Content:  comment,
        Language: "JA",
      },
    }
    entities = append(entities, entityRequest)
  }

  return entities
}

Next I'll need to create a function that posts to the entities analysis API. Again, the quickstart docs summarize this process very clearly, but it's a basic HTTP POST request with a JSON payload and the access token we got from gcloud set in the Authorization header.

I'm planning on passing the token directly from standard input so I can pipe my script with gcloud, but more on that later. First the request:


func postEntity(accessToken string, entityRequest *EntityRequest) []byte {
  jsonEntity, _ := json.Marshal(entityRequest)
  req, err := http.NewRequest("POST", ENTITIES_URL, bytes.NewBuffer(jsonEntity))
  if err != nil {
    log.Fatal(err)
  }
  req.Header.Set("Content-Type", "application/json")
  req.Header.Set("Authorization", "Bearer "+accessToken)

  client := &http.Client{}
  res, err := client.Do(req)
  if err != nil {
    log.Fatal(err)
  }
  defer res.Body.Close()

  body, err := ioutil.ReadAll(res.Body)
  if err != nil {
    log.Fatal(err)
  }

  if res.StatusCode != http.StatusOK {
    log.Fatal(res.Status)
  }

  return body
}

Now you might have noticed that this is not the most efficient way to get feedback, since I am sending one request per review. At the moment I only have 220 reviews for my test data, so it's not a big deal, but if I was actually planning on using this on any regular kind of basis it could potentially be very expensive (and also slow) to do it this way. Since we don't need to associate any of the other data with the content, we could potentially amalgamate several reviews into one body of text and send the data in a batch.

However at this point I'm not even 100% sure this experiment is going to produce any kind of meaningful result, so for the time being I'm going to analyze each review individually. Better to make sure it works before spending time optimizing it, right?

Anyway, let's piece the rest of the script together. I know I'm going to have to pass the script my API access token as well as the path to the db (I'm using sqlite3), so my interface is going to look something like this:


$ gcloud auth print-access-token | go run kinmedai.go -d /path/to/sqlitedbname.db

So for my main block I'm going to grab my two parameters via stdin and flags and build the request payloads. Then I'll post each payload to the API and parse any entities from the http response body into the detectedEntities map that'll be used to count how many times a specific term was referenced:


func main() {
  flag.Parse()

  var accessToken string
  fmt.Scan(&accessToken)

  entityRequests := createEntityRequests()
  detectedEntities := make(map[string]int)

  for i := 0; i < len(entityRequests); i++ {
    var entityResponse EntityResponse
    body := postEntity(accessToken, entityRequests[i])
    json.Unmarshal(body, &entityResponse)

    for j := 0; j < len(entityResponse.Entities); j++ {
      entity := entityResponse.Entities[j]
      detectedEntities[entity.Name] = detectedEntities[entity.Name] + 1
    }
  }

  for k, v := range detectedEntities {
    fmt.Printf("%s: %d\n", k, v)
  }
}

Yikes! That's a lot of for loops! But as I said I'm still in the experimental phase so I'm not going to worry about how fast this runs just yet.

And the output looks something like this (did I mention I was parsing Japanese reviews?):


Wi-Fi: 2
GOOGLE: 1
TT: 1
DL: 1
2MWXZH4T: 1
アンインストール: 2
ぼく友: 1
ガラポン: 1
2.7.2: 1
GYH7AFSY: 1
某都: 1
Fuck this shit I: 1
掛布: 1
星飛雄馬: 1
ガチャ: 12
ガチャゲー: 1
C8YENNZG: 1
やめた: 3
ゴミゲー: 1
めちゃ運: 1
っ・ω: 1
ゴールド: 1
ガチャ.: 1
間違いない(* ̄ー ̄: 1
WB5Z2JQX: 1
甲子園: 3
平安: 1
パワプロ: 1

So it looks like the results are a bit hit-and-miss. Things like invitation codes, emoji, etc. that probably shouldn't be actual keywords are showing up as entities. However, it looks like at least 15 (if you count "ガチャ", "ガチャ.", "ガチャゲー" and "ガラポン" as the same result) out of our sample of 220 users are complaining about the gacha system in this particular app.

I can't exactly say this is a groundbreaking discovery or anything. Having read most of the reviews for the app over the last few months (since my webhook delivers new reviews to my team's slack daily) I pretty much knew that many of the complaints hinged around users not getting the drops they wanted, but I guess this helps quantify it a little better?

Either way it looks like Google's entity search is mostly based around pinpointing terms that can be found on Wikipedia... whereas for an app like the one I run it'd be more useful to do a keyword search for game specific terminology if the goal is to pinpoint features the users might be frustrated about.

But now that I've seen what this API is capable of I'm already starting to think of better applications for this technology (like analyzing followers' tweets to see what other games and manga they are talking about).

I skipped over some details (such as connecting to the db and other things that don't really pertain to using the API), but I've uploaded the script to a gist so feel free to reference it in its entirety in case I lost you at any point.