
Git Tips

Partial Checkout of a repo

If you have to work on a big repo and don’t want to clone the entire thing onto your machine, you can check out only part of it using the sparse checkout option.

This lets you choose only the folders you want to work on within the repo.

1. Create a directory with the same name as your repo

mkdir mybigrepo && cd mybigrepo

2. Fetch git info on the repo

git remote add -f origin https://github.com/mygituser/mybigrepo.git

3. Specify directories within repo that you want to get

git config core.sparseCheckout true


echo "builds" >> .git/info/sparse-checkout
echo "products/myproduct1" >> .git/info/sparse-checkout

4. Pull the directories specified

git pull origin master

This should get you the directories you want.

5. If at a later stage you decide you want additional folders, update your sparse-checkout file

echo "products/myproduct2"  >> .git/info/sparse-checkout

git read-tree -mu HEAD
git pull origin master

Squash commits and amend commit messages

I always like to commit as often as I can so I don’t lose any changes, but I don’t always want those commits to land separately. Rebase allows us to reapply commits. To squash a bunch of commits into one, we can use git rebase’s interactive option, which lets us edit the list of commits in an editor.

The interactive option opens the default editor. You can change this and set it to use your favorite editor. Mine happens to be emacs.

git config --global core.editor emacs

If I want to squash my last 4 commits into one commit, I start by entering the rebase command.

git rebase -i HEAD~4

This is what the opened file looks like

pick 6f7623f7 remove legacy files (#21)
pick 14366fc3 added recent builds
pick f4fa9525 fixed issue with refresh
pick d3ba2c02 updated version info

# Rebase 4b4ecd33..d3ba2c02 onto 4b4ecd33 (4 commands)
# Commands:
# p, pick = use commit
# r, reword = use commit, but edit the commit message
# e, edit = use commit, but stop for amending
# s, squash = use commit, but meld into previous commit
# f, fixup = like "squash", but discard this commit's log message
# x, exec = run command (the rest of the line) using shell
# d, drop = remove commit
#
# These lines can be re-ordered; they are executed from top to bottom.
# If you remove a line here THAT COMMIT WILL BE LOST.
# However, if you remove everything, the rebase will be aborted.
# Note that empty commits are commented out

To squash my commits, I change the pick commands to what I want to do. In this case, I will squash the last three commits into one.

pick 6f7623f7 remove legacy files (#21)
pick 14366fc3 added recent builds
s f4fa9525 fixed issue with refresh
s d3ba2c02 updated version info

When I save this file, it takes me to the editor again to edit the commit message. In my case it shows the message of the commit I chose to keep and squash into.

added recent builds

# Please enter the commit message for your changes. Lines starting
# with '#' will be ignored, and an empty message aborts the commit.
# ...

I usually change the commit message to a more descriptive one that covers all three commits, and save.

Once this is done, we push with the force option (--force or -f). That forces the rewritten history onto the remote.

git push -f 

The branch history will now show only the single squashed commit (14366fc3), along with whatever commits came before it.

To simply amend a commit message you can use

git commit --amend

This will similarly open an editor and allow you to change the message. You have to force push once done.

It is fine to leave multiple commits if they cover different functional changes. We can always squash them later when merging into master as well.


Writing tests in Go

Unit tests in Go are run using the go test command. To create unit tests for the functions in a Go file, add a file with the same name as the original plus the suffix “_test”.

E.g.

lrucache
|- cache.go
|- cache_test.go

To test function Foo your test name should be of the form:

func TestFoo(*testing.T)

Even if a function is not exported and starts with a lowercase letter, the test function name must continue with an uppercase letter after “Test”, or it is not run by the go test command. When I first started with Go, I wrote a small piece of code with one unexported function, wrote a test for it, ran it, and got a “no tests to run” warning. Then I noticed the Golang doc states:

func TestXxx(*testing.T) 
where Xxx does not start with a lowercase letter. 

E.g. Test for function findNumber

func TestFindNumber(t *testing.T) {
    result := findNumber([]int{5, 3, 1})
    expected := 2
    if result != expected {
        t.Error("Incorrect result. Expected:", expected, "Got:", result)
    }
}

Test tables

We can use anonymous structs to write very clear table tests that have multiple inputs and expected results without relying on any external package. E.g.

var testTable = []struct {
    isdev    bool
    expected string
}{
    {true, "/Users/"},
    {false, "/home/httpd/"},
}

for _, test := range testTable {
    config.IsDev = test.isdev
    actual := getUsersHomeDir()
    if !strings.Contains(actual, test.expected) {
        t.Errorf("getUsersHomeDir: Expected %s, Got %s", test.expected, actual)
    }
}

Test options

Here are some very useful go test options

// run tests in verbose mode
go test -v
// run tests with the race detector
go test -race

This is a neat trick I used to test multiple packages in a repo while excluding one or more folders. go list will only include folders with .go files in them. I had functional tests written in Go in a folder called test-client that I wanted to exclude.

go test `go list ./... | grep -v test-client`

Also check out the post Mocking with Golang for writing unit tests for code that relies on external dependencies like servers. Using interfaces, those dependencies can be simulated so the tests run without actually accessing the external resource.
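
Here is a rough sketch of that idea; the Client interface, Report function and mockClient below are hypothetical names for illustration, not from that post.

package weather

import (
    "fmt"
    "testing"
)

// Client is the external dependency we want to mock.
type Client interface {
    GetTemperature(city string) (float64, error)
}

// Report formats a reading using any Client implementation.
func Report(c Client, city string) (string, error) {
    t, err := c.GetTemperature(city)
    if err != nil {
        return "", err
    }
    return fmt.Sprintf("%s: %.1f", city, t), nil
}

// mockClient satisfies Client without any network access.
type mockClient struct{}

func (m mockClient) GetTemperature(city string) (float64, error) { return 21.5, nil }

func TestReport(t *testing.T) {
    got, err := Report(mockClient{}, "Oslo")
    if err != nil || got != "Oslo: 21.5" {
        t.Errorf("Report: got %q, err %v", got, err)
    }
}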



Cryptography

Cryptography is the set of protocols and algorithms for information protection and verification. Three widely used concepts in cryptography are used to achieve data verification, integrity and confidentiality: encryption, hashing and salting.

Encryption

Encryption scrambles data so it’s unreadable by unintended parties. Encryption is two-way: whatever is encrypted is meant to be decrypted and used later. To encrypt data you normally use a cipher, which is the algorithm used to perform the encryption and decryption.

Some popular encryption algorithms include:

AES

AES stands for Advanced Encryption Standard. It is a symmetric encryption algorithm: the parties share a single key that is used to both encrypt and decrypt. AES is common in SSL/TLS because symmetric encryption is fast and lets the parties communicate efficiently.
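
As an illustration, here is a hedged sketch of symmetric encryption in Go using AES in GCM mode from the standard library; the key and message below are made up.

package main

import (
    "crypto/aes"
    "crypto/cipher"
    "crypto/rand"
    "fmt"
    "io"
)

func main() {
    // A single key, shared by both parties, encrypts and decrypts.
    key := make([]byte, 32) // AES-256
    if _, err := io.ReadFull(rand.Reader, key); err != nil {
        panic(err)
    }

    block, err := aes.NewCipher(key)
    if err != nil {
        panic(err)
    }
    gcm, err := cipher.NewGCM(block)
    if err != nil {
        panic(err)
    }

    nonce := make([]byte, gcm.NonceSize())
    if _, err := io.ReadFull(rand.Reader, nonce); err != nil {
        panic(err)
    }

    ciphertext := gcm.Seal(nil, nonce, []byte("secret message"), nil)
    plaintext, err := gcm.Open(nil, nonce, ciphertext, nil)
    if err != nil {
        panic(err)
    }
    fmt.Println(string(plaintext)) // secret message
}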

RSA

RSA is a public key, asymmetric encryption algorithm. Asymmetric means there are two different keys: a user publishes a public key, and anyone can use it to send messages to that user, but only the user with the private key can read those messages.
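
Here is a minimal sketch of that flow in Go using RSA-OAEP from the crypto/rsa package; the key size and message are arbitrary choices.

package main

import (
    "crypto/rand"
    "crypto/rsa"
    "crypto/sha256"
    "fmt"
)

func main() {
    // The user generates a key pair and publishes the public key.
    priv, err := rsa.GenerateKey(rand.Reader, 2048)
    if err != nil {
        panic(err)
    }
    pub := &priv.PublicKey

    // Anyone can encrypt with the public key...
    ciphertext, err := rsa.EncryptOAEP(sha256.New(), rand.Reader, pub, []byte("hello"), nil)
    if err != nil {
        panic(err)
    }

    // ...but only the holder of the private key can decrypt.
    plaintext, err := rsa.DecryptOAEP(sha256.New(), rand.Reader, priv, ciphertext, nil)
    if err != nil {
        panic(err)
    }
    fmt.Println(string(plaintext)) // hello
}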

Blowfish

Blowfish is also a symmetric cipher. It is mainly used for securing passwords in password management tools.

Hashing

Hashing is the process of mapping data to a value of fixed length for quick verification or lookup. While encryption protects data that needs to be transferred across a network, hashing can be used to verify that the data was not altered. Each hashing algorithm produces output of a fixed length, called a hash value, message digest or checksum.

E.g. Here are a few:

Hash      Digest size (bytes)
MD4       16
MD5       16
SHA1      20
SHA224    28
SHA256    32

The two most popular hashing algorithms are:

MD5

MD5 is not secure and has proven vulnerabilities. But if the goal is simply to create a unique hash for lookup, it can still be used.

SHA

SHA stands for Secure Hash Algorithm. It is the family most widely used in SSL/TLS cipher suites. SHA1 is deprecated in favor of SHA2, whose most common variant is SHA-256.

Salting

Salting is often used in password hashing. A unique value, known as a salt, is appended to the password before it is hashed. This makes brute-force and precomputed attacks against the stored hashes much harder. Using a random salt also guarantees that two identical passwords do not produce the same hash value, making them harder to decipher.
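
Here is a minimal sketch of salting in Go, assuming SHA-256 as the hash; for real password storage a dedicated password hash like bcrypt is usually preferred.

package main

import (
    "crypto/rand"
    "crypto/sha256"
    "fmt"
)

func main() {
    // Generate a random salt for this password.
    salt := make([]byte, 16)
    if _, err := rand.Read(salt); err != nil {
        panic(err)
    }

    password := []byte("s3cret")
    digest := sha256.Sum256(append(password, salt...))

    // Store both the salt and the digest; the salt is needed to verify later.
    fmt.Printf("salt:   %x\n", salt)
    fmt.Printf("digest: %x\n", digest)
}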


Interfaces in Go

An interface type is a method set. If a type defines all the methods of an interface, it implements that interface.

A type can implement multiple interfaces. For instance, all types implement the empty interface.

An interface may use an interface type name in place of a method specification. This is called embedding an interface.

type ReadWriter interface {
    Read(b Buffer) bool
    Write(b Buffer) bool
}

type File interface {
    ReadWriter // same as adding the methods of ReadWriter
    Close()
}

The empty interface

The empty interface can be used to declare a generic data type.

E.g. Implementing a linked list in Golang.

A linked list is a linear data structure where each element is a separate object. Linked list elements, or nodes, are not stored at contiguous locations; the elements are linked using pointers.

We could declare the linked list node struct as follows

type Node struct {
    Next *Node
    Data interface{}
}

//creates a New Node
func NewNode(d interface{}) *Node {
    nn := Node{Data: d, Next: nil}
    return &nn
}

//Append : Adds a node to linked list
func (n *Node) Append(d interface{}) {
    nn := NewNode(d)

    if n == nil {
        n = nn
    } else {
        m := n
        for m.Next != nil {
            m = m.Next
        }
        m.Next = nn
    }
}

 This allows us to use the same struct to hold data of different types.

Here I use int as the data type for Node.Data

n := NewNode(0) // int Data
n.Append(3)
n.Append(9)
for m := n; m != nil; m = m.Next {
    fmt.Println(m.Data)
}

I could also use string as the data type using the same struct that I defined above.

n1 := NewNode("a") //string Data
n1.Append("b")
n1.Append("c")

Here is the full example:

https://play.golang.org/p/fo_sSndTsmc

I also have examples of using the empty interface when Unmarshalling JSON in my other post: Custom Unmarshal JSON in Go

Useful interfaces in Go stdlib

Error Interface

The error type is an interface with a single method, Error.

type error interface {
    Error() string
}

The most commonly used implementation of the error interface is the errorString type in the errors package.

// errorString is a trivial implementation of error.
type errorString struct {
    s string
}

func (e *errorString) Error() string {
    return e.s
}

Handler Interface

The Handler interface in the net/http package requires one method, ServeHTTP.

type Handler interface {
    ServeHTTP(ResponseWriter, *Request)
}

Within that same package you will find HandlerFunc implements the Handler interface. 

type HandlerFunc func(ResponseWriter, *Request)

// ServeHTTP calls f(w, r).
func (f HandlerFunc) ServeHTTP(w ResponseWriter, r *Request) {
    f(w, r)
}

The HandlerFunc type makes it possible for us to pass in any function with the right signature and make it a Handler. All we have to do is wrap it in HandlerFunc.

http.Handle("/", http.HandlerFunc(indexHandlerHelloWorld))

We can also have our own struct, with its own fields and methods, implement the Handler interface by defining a ServeHTTP method on it.
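
Here is a minimal sketch of that, using a hypothetical greetingHandler struct.

package main

import (
    "fmt"
    "net/http"
)

// greetingHandler is a hypothetical struct with its own fields.
type greetingHandler struct {
    greeting string
}

// ServeHTTP makes *greetingHandler satisfy http.Handler.
func (g *greetingHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
    fmt.Fprintf(w, "%s, world", g.greeting)
}

func main() {
    http.Handle("/greet", &greetingHandler{greeting: "Hello"})
    http.ListenAndServe(":8082", nil)
}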

Stringer Interface

The fmt package has a Stringer interface. It can be implemented by any type that declares a String method.

type Stringer interface {
    String() string
}

If a type implements the Stringer interface, a call to fmt.Println or fmt.Printf on a value of that type will use that method.

E.g. If you want  to print a struct in a formatted fashion with key names and values, the struct needs to implement the Stringer interface

type Family struct {
    Name string
    Age int
}

func (f *Family) String() string {
    return fmt.Sprintf("Name: %s\t Age: %d", f.Name, f.Age)
}

func main() {
    family := []Family{
        {"Alice", 23},
        {"David", 6},
        {"Erik", 2},
        {"Mary", 32},
    }

    for _, i := range family {
        fmt.Println(&i)
    }
}

https://play.golang.org/p/F7rNPyClwG4

The fmt package has other interfaces like Scanner, Formatter and State.

References

https://golang.org/ref/spec#Interface_types




Golang Net HTTP Package

Golang’s net/http package can be used to build a web server in minutes. It packs in a pretty wide use of Golang concepts like functions, interfaces and types to achieve this.

Here is a basic web server using Go:

package main

import (
    "fmt"
    "net/http"
)

func main() {
    http.HandleFunc("/", handlerHelloWorld)
    http.ListenAndServe(":8082", nil)
}

func handlerHelloWorld(w http.ResponseWriter, r *http.Request) {
    fmt.Fprintf(w, "Hello world")
}

If we run the above server, we can make a GET request and the server will respond with “Hello world”.

What we need to understand is that, in the background, the package uses a ServeMux to map the URL to the handler.

What is ServeMux?

A ServeMux is an HTTP request multiplexer, or router, that matches incoming requests against a set of registered patterns and calls the associated handler for the pattern that matches.

http.ListenAndServe has the following signature

func ListenAndServe(addr string, handler Handler) error

If we pass nil as the handler, as we did in our basic server example, the DefaultServeMux will be used.

The ServeMux struct has the following four methods that are key to the working of the http package:

func (mux *ServeMux) Handle(pattern string, handler Handler)
func (mux *ServeMux) HandleFunc(pattern string, handler func(ResponseWriter, *Request))
func (mux *ServeMux) Handler(r *Request) (h Handler, pattern string)
func (mux *ServeMux) ServeHTTP(w ResponseWriter, r *Request)

What is a Handler?

Notice that ServeMux has a method named Handler that takes in a pointer to an http.Request and returns an object of type Handler. That made my head spin a bit when I first saw it!

But looking under the hood, it turns out that http.Handler is simply an interface. Any type can be made a handler as long as it implements ServeHTTP with the following signature.

 ServeHTTP(ResponseWriter, *Request)

So essentially the default ServeMux is a type of Handler since it implements ServeHTTP.

HandleFunc and Handle

In our simple server code above, we did not define a Handler that implements ServeHTTP, nor did we define a ServeMux. Instead we called HandleFunc with the pattern and the function that would handle the response.

This is the source code for HandleFunc in the net/http package

func HandleFunc(pattern string, handler func(ResponseWriter, *Request)) {
    DefaultServeMux.HandleFunc(pattern, handler)
}

Internally this calls the DefaultServeMux’s HandleFunc. If you take a look at the implementation of HandleFunc within ServeMux, here is what you’ll find:

func (mux *ServeMux) HandleFunc(pattern string, handler func(ResponseWriter, *Request)) {
    if handler == nil {
        panic("http: nil handler")
    }
    mux.Handle(pattern, HandlerFunc(handler))
}

From the net/http source, we find that the HandlerFunc type is an adapter that allows ordinary functions to be used as HTTP handlers.

type HandlerFunc func(ResponseWriter, *Request)

// ServeHTTP calls f(w, r).
func (f HandlerFunc) ServeHTTP(w ResponseWriter, r *Request) {
    f(w, r)
}

The HandlerFunc type makes it possible for us to pass in any function and make it a Handler. So in our simple server example above, we could change the HandleFunc call to a call to the Handle function. All we would have to do is wrap the function in HandlerFunc.

http.Handle("/", http.HandlerFunc(indexHandlerHelloWorld))

The Handle function is used when we want to use a custom Handler in our code. 

To demonstrate the use of some of these concepts, here is a simple example of a chat server endpoint that receives messages and logs them. It uses a custom Handler that is passed to a ServeMux.

package main

import (
    "encoding/json"
    "fmt"
    "io/ioutil"
    "log"
    "net/http"
)

type MessageDigest struct {
    Text   string `json:"message"`
    ToUser string `json:"to"`
}

type ChatHandler struct{}

func (c *ChatHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
    if r.Body == nil {
        return
    }
    var msg MessageDigest
    body, err := ioutil.ReadAll(r.Body)
    if err != nil {
        http.Error(w, http.StatusText(http.StatusInternalServerError), http.StatusInternalServerError)
        return
    }
    err = json.Unmarshal(body, &msg)
    if err != nil {
        http.Error(w, http.StatusText(http.StatusInternalServerError), http.StatusInternalServerError)
        return
    }
    w.WriteHeader(http.StatusOK)
    fmt.Println("Message for ", msg.ToUser, ": ", msg.Text)
}

func main() {
    mux := http.NewServeMux()
    chatHandler := new(ChatHandler)
    mux.Handle("/ws", chatHandler)
    log.Fatal(http.ListenAndServe(":8080", mux))
}
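
As a quick way to exercise this server, here is a hedged sketch of a client that posts a JSON message to the /ws endpoint; the message values are made up.

package main

import (
    "bytes"
    "fmt"
    "net/http"
)

func main() {
    payload := []byte(`{"message": "hi there", "to": "maria"}`)
    resp, err := http.Post("http://localhost:8080/ws", "application/json", bytes.NewBuffer(payload))
    if err != nil {
        fmt.Println("request failed:", err)
        return
    }
    defer resp.Body.Close()
    fmt.Println("status:", resp.Status)
}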

Channels and Workerpools

Goroutines are part of the Golang core. They are similar to lightweight threads. We start a goroutine using the go keyword.

go matchRecordsWithDb(record)

Channels

Channels are a way to synchronize and communicate with go routines.

ch := make(chan string) 
ch <- "test" //Send data to unbuffered channel
v := <-ch //receive data from channel and assign to v

The receive here will block till data is available on the channel.
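
Here is a minimal runnable sketch of that: the send happens in a separate goroutine, and the receive in main blocks until the value arrives.

package main

import "fmt"

func main() {
    ch := make(chan string)

    go func() {
        ch <- "test" // send data to the unbuffered channel
    }()

    v := <-ch // blocks until the goroutine sends
    fmt.Println(v)
}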

Most programs use multiple goroutines and buffered channels.

doneCh := make(chan bool, 4)

Here we can launch 4 goroutines and each can send its result without blocking.
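
For example, in this sketch four goroutines each signal completion on the buffered channel without blocking, and main drains the channel to wait for all of them.

package main

import "fmt"

func main() {
    doneCh := make(chan bool, 4)

    for i := 0; i < 4; i++ {
        go func(id int) {
            fmt.Println("worker", id, "done")
            doneCh <- true // does not block: the buffer has room
        }(i)
    }

    // Drain the channel so main waits for all four goroutines.
    for i := 0; i < 4; i++ {
        <-doneCh
    }
}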

select

select is like a switch with cases that lets you wait on multiple communication operations. A select blocks until one of its cases is ready to run. A select with a default clause is a way to implement non-blocking sends and receives.

select {
case ac <- a:
    // sent a on ac
case b := <-bc:
    // received b from bc
}

WorkerPools

I encountered the classic scenario where I had to make thousands of database calls to match records in a payment file. Finding viable matches in the database for each line in the file was slow and proving too much of a hassle. I wanted to add concurrency to my calls to finish faster. However, I was restricted by the database connection: I could only send a set number of queries at a time or it would error out.

I started with the naive approach. I created a buffered channel of 100 and dispatched the matching routine for 100 records at a time. The requirement was to match against a key in a table and return results. It was some improvement: it ran about 100 queries, waited for those to finish, and started the next batch of 100.

const workers = 100
jobsCh := make(chan int, workers)
for rowIdx := 0; rowIdx < len(fileRecords); rowIdx += workers {
    var j int
    for j = 0; j < workers; j++ {
        if (rowIdx + j) >= len(fileRecords) {
            break
        }
        go matchRecordsWithDb(jobsCh, &fileRecords[rowIdx+j])
    }
    // wait for the workers in this batch to return
    for i := 0; i < j; i++ {
        fmt.Printf("%d", <-jobsCh)
    }
}

There was a major snag in this approach. If for some reason a line in the file didn’t have the main key, we had to query on another field. That field was not indexed, and querying it took a while. It is a very old legacy system, so I can’t change the indexing at this point.

In my entire file I had one such record. The batch of 100 workers that included the routine doing that one query waited almost a minute and a half on it, while the other 99 finished. That is when I started looking at design patterns with channels and came across worker pools.

A worker pool is an approach to concurrency in which a fixed number of m workers process n tasks from a work queue. Rather than waiting on all the workers (channels) at once, jobs are assigned to workers as they become idle.

The three main players are :

Collector:  Gathers all the jobs

AvailableWorkers Pool: Buffered channel of channels that is used to process the requests

Dispatcher: Pulls work requests off the Collector and sends them to available channels

All the jobs are added to a collector pool. The dispatcher picks jobs off the collector. If there is an available worker, it gives it the job; otherwise it tries to create one. If all m workers are busy with jobs, the dispatcher waits for one to finish before assigning the job.

After reading and experimenting with worker pools, I have written a workerpool package that can be used directly or as a guideline to implement your own.

https://github.com/mariadesouza/workerpool
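
For a flavor of the idea, here is a simplified sketch (not the package above): a fixed set of workers pulls jobs off a shared channel, so one slow query ties up only a single worker instead of stalling a whole batch of 100.

package main

import (
    "fmt"
    "sync"
)

func worker(id int, jobs <-chan int, wg *sync.WaitGroup) {
    defer wg.Done()
    for j := range jobs {
        // matchRecordsWithDb(&fileRecords[j]) would go here
        fmt.Println("worker", id, "processed record", j)
    }
}

func main() {
    const workers = 100
    jobs := make(chan int, workers)

    var wg sync.WaitGroup
    for w := 0; w < workers; w++ {
        wg.Add(1)
        go worker(w, jobs, &wg)
    }

    for rowIdx := 0; rowIdx < 1000; rowIdx++ { // len(fileRecords) in the real code
        jobs <- rowIdx
    }
    close(jobs)
    wg.Wait()
}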


Microservices with gRPC

A microservice is an independently runnable service that does one task effectively. The idea is that rather than having one monolithic application, we break it up into independent services that can be easily maintained.

To effectively use microservices there has to be a way for the various independent services to communicate with each other.

There are two ways of communication between microservices:

1. REST, such as JSON or XML over http
2. gRPC – Lightweight RPC protocol brought out by Google

What is gRPC?

To understand gRPC we first take a look at RPC.

RPC (Remote Procedure Call) is a form of inter-process communication (IPC) in which the caller and callee live in different address spaces. It is a request–response protocol that enables data exchange and the invocation of functionality residing in a different address space or process.

gRPC is based around the idea of defining a service, specifying the methods that can be called remotely with their parameters and return types. A client application can call methods on a server application as if it were a local object.

  • gRPC uses the new HTTP/2 spec
  • It allows for bidirectional streaming
  • It uses binary framing rather than text, which keeps the payload compact and efficient.

This is Google’s announcement for gRPC.

So what’s the “g” in gRPC? Google? Per the official FAQ page, gRPC stands for “gRPC Remote Procedure Calls”, i.e. it is a recursive acronym.

Protocol Buffers

gRPC uses protocol buffers as Interface Definition Language (IDL) for describing both the service interface and the structure of the payload messages.

Protocol buffers are a mechanism for serializing structured data. You define how you want your data to be structured once, then use special generated source code to easily write and read your structured data to and from a variety of data streams, using a variety of languages.

https://developers.google.com/protocol-buffers/docs/overview

You specify how you want the information you’re serializing to be structured by defining protocol buffer message types in .proto files. These messages are then encoded to the protocol buffer binary format.

message Person {
  required string name = 1;
  required int32 id = 2;
  optional string email = 3;
}

gRPC in Go

go get -u google.golang.org/grpc
go get -u github.com/golang/protobuf/protoc-gen-go

Protobuf allows you to define an interface to your service using a developer-friendly format.
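
To give a flavor of the Go side, here is a hedged sketch of a gRPC server. It assumes a hypothetical greet.proto compiled with protoc-gen-go into a package pb that defines HelloRequest, HelloReply and RegisterGreeterServer; those generated names are illustrative, not from this post.

package main

import (
    "context"
    "log"
    "net"

    "google.golang.org/grpc"

    pb "example.com/myapp/greet" // hypothetical package generated by protoc-gen-go
)

// greeterServer implements the service interface generated from greet.proto.
type greeterServer struct{}

func (s *greeterServer) SayHello(ctx context.Context, in *pb.HelloRequest) (*pb.HelloReply, error) {
    return &pb.HelloReply{Message: "Hello " + in.Name}, nil
}

func main() {
    lis, err := net.Listen("tcp", ":50051")
    if err != nil {
        log.Fatal(err)
    }
    s := grpc.NewServer()
    pb.RegisterGreeterServer(s, &greeterServer{})
    log.Fatal(s.Serve(lis))
}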