
Memory management under the hood

Memory for a program can be allocated on the stack or on the heap. Stack allocation happens in contiguous blocks of memory, and when variables go out of scope they are deallocated automatically. Stack allocation is fast, but stack size is limited.

The heap is not a single block of memory but a set of free regions, so it can be expanded as needed. Since heap memory can become fragmented, access is slower.

There are also fixed-size segments such as the data segment and the code segment. The data segment normally contains global variables; the code segment contains the code instructions and constant values.

Traditionally, function parameters and local variables live on the stack, and dynamic memory allocation is done on the heap.

Memory management in Go

Memory allocation in Go is a bit opaque. Go manages its own memory allocation and garbage collection, and the language specification does not clearly draw a line about what will be stored where; the runtime accommodates more space as it is needed.

The initial stack size for a goroutine is only 2 KB; additional memory is allocated as the stack grows. The runtime sets the default maximum stack size to 1 GB on 64-bit systems and 250 MB on 32-bit systems.

To its credit, Go prefers allocation on the stack. But stack memory is limited, and stack allocation has to be decided at compile time. If the size of the data can change at runtime, allocation happens on the heap.

Go uses an optimization technique called escape analysis. The main idea is:

  • If the lifetime of a variable can be determined at compile time, it will be allocated on the stack.
  • If the value of a variable is shared or can escape outside the function scope, it will be allocated on the heap.

To find out what escapes to the heap in your code, use the -gcflags option:

go build -gcflags '-m'

or

go tool compile -m test.go

Note that passing -m multiple times will give you a more verbose response.

There are some obvious escape patterns. E.g. returning a pointer to a local variable:

func foo() *string {
    a := "hello world"
    return &a
}
go tool compile -m hello.go
command-line-arguments
./hello.go:9:6: can inline foo
./hello.go:6:17: inlining call to foo
./hello.go:6:17: foo() escapes to heap
./hello.go:6:17: &a escapes to heap
./hello.go:6:17: moved to heap: a
./hello.go:11:9: &a escapes to heap
./hello.go:10:2: moved to heap: a

Here is a less obvious example where memory escapes to the heap:

func main() {
    a := "hello"
    fmt.Println(a)
}

$ go tool compile -m hello.go
hello.go:7:13: a escapes to heap
hello.go:7:13: main ... argument does not escape

$ go tool compile -m -m hello.go
hello.go:5:6: cannot inline main: function too complex: cost 89 exceeds budget 80
hello.go:7:13: a escapes to heap
hello.go:7:13: from ... argument (arg to ...) at hello.go:7:13
hello.go:7:13: from *(... argument) (indirection) at hello.go:7:13
hello.go:7:13: from ... argument (passed to call[argument content escapes]) at hello.go:7:13
hello.go:7:13: main ... argument does not escape

This shows that the variable a escapes to the heap because it is passed to a function that takes a variadic argument (fmt.Println accepts ...interface{}). Had I passed a to a function that took just a string, it would have stayed on the stack.

Per the example below, if I pass a reference to a function, it does not escape to the heap, but returning a reference causes it to escape to the heap.

func main() {
    x := 2

    a, b := hello.AddTwo(&x)

    a = a + 2
    *b = *b + 2
}

// AddTwo is defined in a separate package, hello.
func AddTwo(a *int) (int, *int) {
    b := 2
    return *a + 2, &b
}

./test.go:12:6: can inline main as: func() { x := 2; a, b = hello.AddTwo(&x); a = a + 2; *b = *b + 2 }
./test.go:14:22: inlining call to hello.AddTwo func(*int) (int, *int) { var hello.b·4 int; hello.b·4 = ; hello.b·4 = int(2); return hello.a + int(2), &hello.b·4 }
./test.go:20:6: can inline AddTwo as: func(int) (int, *int) { b := 2; return *a + 2, &b }
./test.go:14:23: main &x does not escape
./test.go:14:22: main &hello.b·4 does not escape
./test.go:22:17: &b escapes to heap
./test.go:22:17: from ~r2 (return) at ./test.go:22:2
./test.go:21:2: moved to heap: b
./test.go:20:13: AddTwo a does not escape

There is no easy way to determine all the memory allocation within your code. For optimization, here are some guidelines:

  • Re-use variables where possible.
  • Run the compile tool with -m to inspect allocations and, if time permits, rewrite the code where a variable is reported as escaping to the heap.
  • Preallocate memory for arrays when the boundary is known, rather than growing a slice dynamically (see the sketch below).
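
For example, here is a minimal sketch of the last point: when the number of elements is known up front, preallocating the backing array with make avoids repeated reallocations as the slice grows.

// Growing a slice element by element may trigger repeated reallocations.
nums := []int{}
for i := 0; i < 1000; i++ {
    nums = append(nums, i)
}

// If the size is known up front, allocate the backing array once.
prealloc := make([]int, 0, 1000)
for i := 0; i < 1000; i++ {
    prealloc = append(prealloc, i)
}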


System Design – Scalability

Scalability is defined as the capacity to change in size. In technology, it means the ability to grow and handle increased demand without affecting the overall performance of the system.

Why is scalability so hard?

A lot of systems are built iteratively. They start with an idea and a prototype, and traffic and workload grow over time. Scalability is often an afterthought. And often, as individual system resources become more efficient, adding them to the existing ecosystem exposes scaling issues in the other systems.

Then there is a cost issue, especially for smaller companies. Investing in systems or resources that will help your systems scale from the onset is a tough decision to make, especially when operating on a tight budget.

The Fallout

When traffic to your site is light or your consumer base is small, weaknesses in your system are hard to spot. As the workload increases and surpasses the systems’ ability to scale, performance drops.

Vertical vs Horizontal Scaling

In vertical scaling, we increase overall capacity by increasing the capacity of the individual systems, e.g. adding CPU or memory. Vertical scaling is also referred to as scaling up.

Horizontal scaling, or scaling out, means adding more machines to your setup. With scaling out, you spread the workload across your infrastructure. The most common pattern in horizontal scaling is the use of load balancers that round-robin requests across systems.

Scalability strategies

Load Balancers

Use a load balancer to distribute load across systems. This works well if you have stateless applications, so that any instance behind the load balancer can handle a request.

Caching

A cache provides fast lookup of the most recently used results, which helps when the same resource is accessed at high frequency. Redis and Memcached are two commonly used caching systems.
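
To illustrate the idea (this is only a minimal in-memory sketch, not a substitute for Redis or Memcached; the Cache type and its methods are hypothetical):

package cache

import (
    "sync"
    "time"
)

// item is a cached value with an expiry time.
type item struct {
    value     string
    expiresAt time.Time
}

// Cache is a tiny in-memory cache guarded by a mutex.
type Cache struct {
    mu    sync.Mutex
    items map[string]item
}

func NewCache() *Cache {
    return &Cache{items: make(map[string]item)}
}

// Set stores a value with a time-to-live.
func (c *Cache) Set(key, value string, ttl time.Duration) {
    c.mu.Lock()
    defer c.mu.Unlock()
    c.items[key] = item{value: value, expiresAt: time.Now().Add(ttl)}
}

// Get returns a value if it is present and has not expired.
func (c *Cache) Get(key string) (string, bool) {
    c.mu.Lock()
    defer c.mu.Unlock()
    it, ok := c.items[key]
    if !ok || time.Now().After(it.expiresAt) {
        return "", false
    }
    return it.value, true
}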

NoSQL Databases

NoSQL databases scale better than relational databases, primarily because of the ACID constraints on an RDBMS. When one or more of the ACID constraints are relaxed, write and read operations can scale.

E.g. atomicity in a database means either all operations in a transaction happen or none do. However, this means there is a write lock on the database until the operation commits or rolls back. In MongoDB, a NoSQL database, write operations are atomic only at the level of a single document, so even if you are writing multiple documents, other operations may interleave.

Similarly, Cassandra relaxes the rules around consistency and offers eventual consistency instead. This works very well for systems with a high volume of write operations.

Content Delivery Networks

A Content Delivery Network will serve the user content from a location as close to the user as possible thus reducing latency.

Communication between microservices

On the surface, this does not look like a system design issue that affects scale. We could decide that REST is how our services will communicate. However, that decision may not hold up when scaling out to hundreds of microservices that need to communicate with each other while keeping response latency low. This is where a binary communication protocol like gRPC can help.

Another communication tool to look into is brokered messaging. This is particularly useful for streaming large volumes of data between systems, and can be done with a system like Apache Kafka that creates data pipelines between systems.

Always look to the future when designing systems so your systems can withstand the tests of time.


Git Tips

Partial Checkout of a repo

If you have to work on a big repo and don’t want to clone the entire repo on your machine, you can check out only part of the repo using the sparse checkout option.

This allows you to choose only folders you want to work on within the repo.

1. Create a directory with the same name as your repo

mkdir mybigrepo && cd mybigrepo

2. Fetch the git info for the repo

git remote add -f origin https://github.com/mygituser/mybigrepo.git

3. Specify the directories within the repo that you want to get

git config core.sparseCheckout true

echo "builds" >> .git/info/sparse-checkout
echo "products/myproduct1" >> .git/info/sparse-checkout

4. Pull the directories specified

git pull origin master

This should get you the directories you want.

5. If at a later stage you decide you want additional folders, update your sparse-checkout file

echo "products/myproduct2"  >> .git/info/sparse-checkout

git read-tree -mu HEAD
git pull origin master

Squash commits and amend commit messages

I always like to commit as often as I can so I don’t lose any changes, but I don’t always want my commits to go in separately. Rebase allows us to reapply commits. To squash a bunch of commits into one, we can use git’s interactive rebase option, which lets us change the list of commits within an editor.

The interactive option opens the default editor. You can change this and set it to use your favorite editor. Mine happens to be emacs.

git config --global core.editor emacs

If I want to squash my last 4 commits into one commit, I start with entering the rebase command.

git rebase -i HEAD~4

This is what the opened file looks like

pick 6f7623f7 remove legacy  files (#21)
pick 14366fc3 added recent builds
pick f4fa9525 fixed issue with refresh
pick d3ba2c02 updated version info

# Rebase 4b4ecd33..d3ba2c02 onto 4b4ecd33 (4 commands)
# Commands:
# p, pick = use commit
# r, reword = use commit, but edit the commit message
# e, edit = use commit, but stop for amending
# s, squash = use commit, but meld into previous commit
# f, fixup = like "squash", but discard this commit's log message
# x, exec = run command (the rest of the line) using shell
# d, drop = remove commit
#
# These lines can be re-ordered; they are executed from top to bottom.
# If you remove a line here THAT COMMIT WILL BE LOST.
# However, if you remove everything, the rebase will be aborted.
# Note that empty commits are commented out

To squash my commits I change the first four lines to what I want to do. In this case, I will squash the last two commits into the "added recent builds" commit, so the last three commits become one.

pick 6f7623f7 remove legacy  files (#21)
pick 14366fc3 added recent builds
s f4fa9525 fixed issue with refresh
s d3ba2c02 updated version info

When I save this file, it will take me to the editor again to edit the commit message. In my case it will be the message of the commit I chose to keep and squash into.

added recent builds

# Please enter the commit message for your changes. Lines starting
# with '#' will be ignored, and an empty message aborts the commit.
# ...

I usually change the commit message to a more descriptive message that covers all three commits and save.

Once this is done we push with the force option (--force or -f). That will force the change in the history of the repo.

git push -f 

The branch history will now show only the squashed commit (14366fc3), plus whatever commits came before it.

To simply amend a commit message you can use

git commit --amend

This will similarly open an editor and allow you to change the message. You have to force push once done.

It is fine to leave multiple commits if they cover different functionality changes. We can always squash them later when merging into master as well; see the example below.
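
For example (a quick sketch, assuming a feature branch named my-feature), the commits on a branch can be collapsed into a single commit at merge time:

git checkout master
git merge --squash my-feature
git commit -m "Add my feature"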


Writing tests in Go

Unit tests in Go are run using the go test command. To create unit tests for the functions in a file, add a file with the same name as the original plus a “_test” suffix.

E.g.

lrucache
|- cache.go
|- cache_test.go

To test function Foo your test name should be of the form:

func TestFoo(*testing.T)

Even if the function under test is not exported and starts with a lowercase letter, the test function name must use an uppercase letter after the Test prefix, or it is not run by the go test command. When I first started Go, I wrote a small piece of code and a test for it whose name did not follow this convention. I ran it and got a “no tests to run” warning. Then I noticed the Golang doc states:

func TestXxx(*testing.T) 
where Xxx does not start with a lowercase letter. 

E.g. Test for function findNumber

func TestFindNumber(t *testing.T) {
    result := findNumber([]int{5, 3, 1})
    expected := 2
    if result != expected {
        t.Error("Incorrect result Expected ", expected, "Got:", result)
    }

}

Test tables

We can use anonymous structs to write very clear table tests that have multiple inputs and expected results without relying on any external package. E.g.

var testTable = []struct {
    isdev    bool
    expected string
}{
    {true, "/Users/"},
    {false, "/home/httpd/"},
}

for _, test := range testTable {
    config.IsDev = test.isdev
    actual := getUsersHomeDir()
    if !strings.Contains(actual, test.expected) {
        t.Errorf("getUsersHomeDir: Expected %s, Got %s", test.expected, actual)
    }
}

Test options

Here are some very useful go test options

// Run verbose
go test -v
// Run tests with the race detector
go test -race
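
A couple of other standard go test flags that come in handy (a quick sketch; -run takes a regular expression matched against test names):

// Run only tests whose names match a pattern
go test -run TestFindNumber
// Report statement coverage
go test -cover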

Here is a neat trick I used to test multiple packages in a repo while excluding one or more folders. go list will only include folders with .go files in them. I had functional tests written in Go in a folder called test-client that I wanted to exclude.

go test `go list ./... | grep -v test-client`

Also check out the post Mocking with Golang for writing unit tests for code that relies on external dependencies like servers. Using interfaces, these dependencies can be simulated so that tests do not actually access the external resource.



Cryptography

Cryptography is the set of protocols and algorithms for information protection and verification. There are three widely used concepts in cryptography for achieving data verification, integrity and confidentiality: encryption, hashing and salting.

Encryption

Encryption scrambles data so it is unreadable by unintended parties. Encryption is two-way: whatever is encrypted can later be decrypted and used. To encrypt data you normally use a cipher, which is an algorithm that performs the encryption and decryption.

Some popular encryption algorithms include:

AES

AES stands for Advanced Encryption Standard. It is a symmetric encryption algorithm: the same key is used to both encrypt and decrypt, so the parties must share that key. AES is commonly used with SSL/TLS since symmetric encryption is fast and allows efficient communication.
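
As an illustration, here is a minimal sketch of symmetric encryption and decryption in Go using AES in GCM mode from the standard library:

package main

import (
    "crypto/aes"
    "crypto/cipher"
    "crypto/rand"
    "fmt"
    "io"
)

func main() {
    // A 32-byte key selects AES-256. Both parties must share this key.
    key := make([]byte, 32)
    if _, err := io.ReadFull(rand.Reader, key); err != nil {
        panic(err)
    }

    block, err := aes.NewCipher(key)
    if err != nil {
        panic(err)
    }
    gcm, err := cipher.NewGCM(block)
    if err != nil {
        panic(err)
    }

    // Encrypt: prepend the nonce to the ciphertext so it can be recovered later.
    nonce := make([]byte, gcm.NonceSize())
    if _, err := io.ReadFull(rand.Reader, nonce); err != nil {
        panic(err)
    }
    ciphertext := gcm.Seal(nonce, nonce, []byte("secret message"), nil)

    // Decrypt with the same key.
    nonce, sealed := ciphertext[:gcm.NonceSize()], ciphertext[gcm.NonceSize():]
    plaintext, err := gcm.Open(nil, nonce, sealed, nil)
    if err != nil {
        panic(err)
    }
    fmt.Println(string(plaintext))
}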

RSA

RSA is a public-key, asymmetric encryption algorithm. Asymmetric means there are two different keys: a user publishes a public key that anyone can use to send messages to the user, and only the user holding the matching private key can read those messages.

Blowfish

Blowfish is also a symmetric cipher. It is mainly used for securing passwords in password management tools.

Hashing

Hashing is the process of mapping data of arbitrary size to a value of fixed length. While encryption protects data that needs to be transferred across a network, hashing can be used to verify that the data was not altered. Each hashing algorithm produces output of a fixed length, called a hash value, message digest or checksum.

E.g. here are a few:

Hash      Digest size (bytes)
MD4       16
MD5       16
SHA-1     20
SHA-224   28
SHA-256   32
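
For instance, a minimal sketch of computing a SHA-256 checksum with Go’s standard library:

package main

import (
    "crypto/sha256"
    "fmt"
)

func main() {
    data := []byte("hello world")

    // Sum256 always returns a 32-byte digest, regardless of the input size.
    sum := sha256.Sum256(data)
    fmt.Printf("%x\n", sum)
}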

The two most popular hashing algorithms are:

MD5

MD5 is not secure and has proven vulnerabilities. But if the goal is simply to create a hash for lookup, it can still be used.

SHA

SHA stands for Secure Hashing Algorithm. It is the most widely used family in SSL/TLS cipher suites. SHA-1 is deprecated in favor of SHA-2, whose most common variant is SHA-256.

Salting

Salting is often used in password hashing. A unique value, known as a salt, is added to the password before it is hashed. This makes precomputed (rainbow table) and brute-force attacks far more expensive, and using a random salt ensures that two identical passwords do not produce the same hash value, making them harder to decipher.
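
As an illustration, here is a minimal sketch of salted password hashing in Go using the golang.org/x/crypto/bcrypt package, which generates and stores the salt for you:

package main

import (
    "fmt"

    "golang.org/x/crypto/bcrypt"
)

func main() {
    password := []byte("my-secret-password")

    // GenerateFromPassword picks a random salt and embeds it in the returned hash.
    hash, err := bcrypt.GenerateFromPassword(password, bcrypt.DefaultCost)
    if err != nil {
        panic(err)
    }
    fmt.Println(string(hash))

    // CompareHashAndPassword re-hashes the password with the stored salt.
    if err := bcrypt.CompareHashAndPassword(hash, password); err == nil {
        fmt.Println("password matches")
    }
}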


Interfaces in Go

An interface type defines a method set. If a type has all the methods of an interface, it implements the interface.

A type can implement multiple interfaces. For instance, all types implement the empty interface.

An interface may use an interface type name in place of a method specification. This is called embedding an interface.

type ReadWriter interface {
    Read(b Buffer) bool
    Write(b Buffer) bool
}

type File interface {
    ReadWriter // same as adding the methods of ReadWriter
    Close()
}

The empty interface

The empty interface type can be used to declare a generic data type.

E.g. Implementing a linked list in Golang.

A linked list is a linear data structure where each element is a separate object. Linked list elements, or nodes, are not stored at contiguous locations; the elements are linked using pointers.

We could declare the linked list node struct as follows

type Node struct {
    Next *Node
    Data interface{}
}

//creates a New Node
func NewNode(d interface{}) *Node {
    nn := Node{Data: d, Next: nil}
    return &nn
}

// Append adds a node to the end of the linked list. Note that the receiver
// should be a non-nil head node (as returned by NewNode); calling Append on a
// nil *Node has no visible effect outside this method.
func (n *Node) Append(d interface{}) {
    nn := NewNode(d)

    if n == nil {
        n = nn
    } else {
        m := n
        for m.Next != nil {
            m = m.Next
        }
        m.Next = nn
    }
}

 This allows us to use the same struct to hold data of different types.

Here I use int as the data type for Node.Data

n := NewNode(0) // int Data
n.Append(3)
n.Append(9)
for m := n; m != nil; m = m.Next {
    fmt.Println(m.Data)
}

I could also use string as the data type using the same struct that I defined above.

n1 := NewNode("a") //string Data
n1.Append("b")
n1.Append("c")

Here is the full example:

https://play.golang.org/p/fo_sSndTsmc

I also have examples of using the empty interface when Unmarshalling JSON in my other post: Custom Unmarshal JSON in Go

Useful interfaces in Go stdlib

Error Interface

The error type is an interface with a single method, Error:

type error interface {
    Error() string
}

The most commonly used implementation of the error interface is the errorString type in the errors package.

// errorString is a trivial implementation of error.
type errorString struct {
    s string
}

func (e *errorString) Error() string {
    return e.s
}

Handler Interface

The Handler interface in the net/http package requires one method, ServeHTTP:

type Handler interface {
    ServeHTTP(ResponseWriter, *Request)
}

Within that same package you will find that HandlerFunc implements the Handler interface:

type HandlerFunc func(ResponseWriter, *Request)

// ServeHTTP calls f(w, r).
func (f HandlerFunc) ServeHTTP(w ResponseWriter, r *Request) {
    f(w, r)
}

The HandlerFunc type makes it possible to pass in any function with the right signature and make it a Handler. All we have to do is wrap it in HandlerFunc.

http.Handle("/", http.HandlerFunc(indexHandlerHelloWorld))

We can also have our own struct with fields and methods implement the Handler interface, by defining the ServeHTTP method on the struct.
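
For example, here is a minimal sketch (the type and field names are hypothetical) of a struct with its own field acting as a Handler:

package main

import (
    "fmt"
    "net/http"
)

type greetingHandler struct {
    greeting string
}

// ServeHTTP makes greetingHandler satisfy the http.Handler interface.
func (g greetingHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
    fmt.Fprintf(w, "%s, world", g.greeting)
}

func main() {
    http.Handle("/greet", greetingHandler{greeting: "Hello"})
    http.ListenAndServe(":8080", nil)
}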

Stringer Interface

The fmt package has a Stringer interface. It can be implemented by any type that declares a String method:

type Stringer interface {
    String() string
}

If a type implements the Stringer interface, a call to fmt.Println or fmt.Printf with a variable of that type will use that method.

E.g. if you want to print a struct in a formatted fashion with field names and values, the struct needs to implement the Stringer interface:

type Family struct {
    Name string
    Age  int
}

func (f *Family) String() string {
    return fmt.Sprintf("Name: %s\t Age: %d", f.Name, f.Age)
}

func main() {
    family := []Family{
        {"Alice", 23},
        {"David", 6},
        {"Erik", 2},
        {"Mary", 32},
    }

    for _, i := range family {
        fmt.Println(&i)
    }
}

https://play.golang.org/p/F7rNPyClwG4

The fmt package has other interfaces like Scanner, Formatter and State.

References

https://golang.org/ref/spec#Interface_types




Golang Net HTTP Package

Golang’s net/http package can be used to build a web server in minutes. It packs in a pretty wide use of Golang concepts like functions, interfaces and types to achieve this.

Here is a basic web server using Go:

package main

import (
    "fmt"
    "net/http"
)

func main() {
    http.HandleFunc("/", handlerHelloWorld)
    http.ListenAndServe(":8082", nil)
}

func handlerHelloWorld(w http.ResponseWriter, r *http.Request) {
    fmt.Fprintf(w, "Hello world")
}

If we run the above server and make a GET request, it will respond with “Hello world”.

What we need to understand is that, in the background, the package uses a ServeMux to map the URL to the handler.

What is ServeMux?

A ServeMux is an HTTP request multiplexer, or router, that matches incoming requests against a set of registered patterns and calls the associated handler for the pattern that matches.

http.ListenAndServe has the following signature

func ListenAndServe(addr string, handler Handler) error

If we pass nil as the handler, as we did in our basic server example, the DefaultServeMux will be used.

The ServeMux type has the following four methods that are key to the working of the http package:

func (mux *ServeMux) Handle(pattern string, handler Handler)
func (mux *ServeMux) HandleFunc(pattern string, handler func(ResponseWriter, *Request))
func (mux *ServeMux) Handler(r *Request) (h Handler, pattern string)
func (mux *ServeMux) ServeHTTP(w ResponseWriter, r *Request)

What is a Handler?

Notice that ServeMux has a method named Handler that takes in a reference to an http.Request and returns an object of type Handler. Made my head spin a bit when I first saw that!

But looking under the hood, it turns out that http.Handler is simply an interface. Any object can be made a handler as long as it implements the ServeHTTP method with the following signature:

 ServeHTTP(ResponseWriter, *Request)

So essentially the default ServeMux is a type of Handler since it implements ServeHTTP.

HandleFunc and Handle

In our simple server code above, we did not define a Handler that implements ServeHTTP, nor did we define a ServeMux. Instead we called HandleFunc with the pattern and the function that would handle the response.

This is the source code for HandleFunc in the net/http package

func HandleFunc(pattern string, handler func(ResponseWriter, *Request)) {
    DefaultServeMux.HandleFunc(pattern, handler)
}

Internally this calls the DefaultServeMux’s HandleFunc. If you take a look at the implementation of HandleFunc within ServeMux, here is what you’ll find:

func (mux *ServeMux) HandleFunc(pattern string, handler func(ResponseWriter, *Request)) {
    if handler == nil {
        panic("http: nil handler")
    }
    mux.Handle(pattern, HandlerFunc(handler))
}

From the net/http source, we find that the HandlerFunc type is an adapter that allows ordinary functions to be used as HTTP handlers.

type HandlerFunc func(ResponseWriter, *Request)

// ServeHTTP calls f(w, r).
func (f HandlerFunc) ServeHTTP(w ResponseWriter, r *Request) {
    f(w, r)
}

The HandlerFunc makes it possible for us to pass in any function to make it a Handler. So in our simple server example above, we could change the HandleFunc call to a call to the Handle function. All we would have to do is wrap it in  HandlerFunc.

http.Handle("/", http.HandlerFunc(indexHandlerHelloWorld))

The Handle function is used when we want to use a custom Handler in our code. 

To demonstrate the use of some of these concepts, here is a simple example of chat server that will receive messages and broadcast them. It uses a Handler that is passed to a ServeMux. 

package main

import (
    "encoding/json"
    "fmt"
    "io/ioutil"
    "log"
    "net/http"
)

type MessageDigest struct {
    Text   string `json:"message"`
    ToUser string `json:"to"`
}

type ChatHandler struct{}

func (c *ChatHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
    if r.Body == nil {
        return
    }
    var msg MessageDigest
    body, err := ioutil.ReadAll(r.Body)
    if err != nil {
        http.Error(w, http.StatusText(http.StatusInternalServerError), http.StatusInternalServerError)
        return
    }
    if err = json.Unmarshal(body, &msg); err != nil {
        http.Error(w, http.StatusText(http.StatusInternalServerError), http.StatusInternalServerError)
        return
    }
    w.WriteHeader(http.StatusOK)
    fmt.Println("Message for", msg.ToUser, ":", msg.Text)
}

func main() {
    mux := http.NewServeMux()
    chatHandler := new(ChatHandler)
    mux.Handle("/ws", chatHandler)
    log.Fatal(http.ListenAndServe(":8080", mux))
}
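
To try it out, assuming the server is running locally on port 8080, we can post a JSON message to the /ws endpoint:

curl -X POST -d '{"message":"hello","to":"alice"}' http://localhost:8080/ws

The server responds with a 200 status and prints the message on its console.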