Some Thoughts on Go's context Package

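Preface

Goroutines make concurrency in Go very convenient, but they raise another question: when we start a time-consuming asynchronous operation, how do we stop it within an agreed time limit and return a result of our own choosing? This is the often-asked question of how to terminate a goroutine (unlike OS threads, goroutines have no mechanism for being interrupted from outside). This is where today's protagonist, context, comes in.

context originated at Google and was added to the standard library in Go 1.7. According to the official documentation, a Context carries deadlines, cancellation signals, and request-scoped values across API boundaries, and its methods are safe for concurrent use. The API is fairly small, and the standard library implementation is clean and self-contained. Below I cover it from two angles: concrete usage scenarios and source code analysis.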

Usage tips

Scenario 1: passing values along the request chain

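Typically the root context is constructed at the entry point of a request:

ctx := context.Background()

If it is not yet clear which context to use, construct a placeholder with this function instead:

ctx := context.TODO()

Either way, a context must never be nil.
Values are passed like this: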

package main

import (
	"context"
	"fmt"
)

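// ctxKey is a private key type; it avoids collisions with plain string keys from other packages.
type ctxKey string

const k1 ctxKey = "k1"

func func1(ctx context.Context) {
	ctx = context.WithValue(ctx, k1, "v1")
	func2(ctx)
}

func func2(ctx context.Context) {
	// A bare type assertion panics when the key is missing, so check ok.
	if v, ok := ctx.Value(k1).(string); ok {
		fmt.Println(v)
	}
}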

func main() {
	ctx := context.Background()
	func1(ctx)
}

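In func1 we use WithValue(parent Context, key, val interface{}) Context to set k1 to v1, and in the function below it, func2, we read k1 back with ctx.Value(key interface{}) interface{}. That part is simple. One question remains: if the value were set in func2, could func1 read it? No. WithValue returns a new context and leaves its parent untouched, so values only travel from the top down. Keep this in mind.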
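The top-down rule can be checked with a minimal sketch. The key type ctxKey, the key k2, and the helper names are hypothetical and exist only for this illustration.

```go
package main

import (
	"context"
	"fmt"
)

// ctxKey is a hypothetical private key type used only in this sketch.
type ctxKey string

// child derives a new context carrying k2; the context it was given is not modified.
func child(ctx context.Context) context.Context {
	return context.WithValue(ctx, ctxKey("k2"), "v2")
}

// visibility reports what each side can see after child has added k2.
func visibility() (parentSeesK2, childSeesBoth bool) {
	parent := context.WithValue(context.Background(), ctxKey("k1"), "v1")
	derived := child(parent)

	parentSeesK2 = parent.Value(ctxKey("k2")) != nil
	childSeesBoth = derived.Value(ctxKey("k1")) != nil && derived.Value(ctxKey("k2")) != nil
	return parentSeesK2, childSeesBoth
}

func main() {
	p, c := visibility()
	fmt.Println("parent sees k2:", p)
	fmt.Println("child sees k1 and k2:", c)
}
```

If a callee needs to hand data back to its caller, a return value or a pointer stored in the context by the caller is the usual route; the context chain itself only grows downward.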

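Scenario 2: cancelling a long-running operation and releasing resources promptly

Consider this question: without the context package, how would we cancel a long-running operation? Here are two common approaches.

For network I/O, timeouts are usually enforced with SetReadDeadline, SetWriteDeadline, and SetDeadline: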


// conn is assumed to be an already established net.Conn.
timeout := 10 * time.Second
t := time.Now().Add(timeout)
conn.SetDeadline(t)

For a long-running computation, a select statement can emulate a timeout:

package main

import (
	"errors"
	"fmt"
	"time"
)

func func1() error {
	// Buffered so the worker can still send and exit after a timeout.
	respC := make(chan int, 1)
	// Processing logic
	go func() {
		time.Sleep(time.Second * 3)
		respC <- 10
		close(respC)
	}()

	// Timeout logic
	select {
	case r := <-respC:
		fmt.Printf("Resp: %d\n", r)
		return nil
	case <-time.After(time.Second * 2):
		fmt.Println("catch timeout")
		return errors.New("timeout")
	}
}

func main() {
	err := func1()
	fmt.Printf("func1 error: %v\n", err)
}

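Both approaches are common in real projects. Now let's see how to use context for manual cancellation and timeout cancellation, and what to do when several timeouts are in play.

Manual cancellation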

package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
	"time"
)

func func1(ctx context.Context, wg *sync.WaitGroup) error {
	defer wg.Done()
	// Buffered so the worker can still send and exit after a cancel.
	respC := make(chan int, 1)
	// Processing logic
	go func() {
		time.Sleep(time.Second * 5)
		respC <- 10
	}()
	// Cancellation mechanism
	select {
	case <-ctx.Done():
		fmt.Println("cancel")
		return errors.New("cancel")
	case r := <-respC:
		fmt.Println(r)
		return nil
	}
}

func main() {
	wg := new(sync.WaitGroup)
	ctx, cancel := context.WithCancel(context.Background())
	wg.Add(1)
	go func1(ctx, wg)
	time.Sleep(time.Second * 2)
	// Trigger cancellation
	cancel()
	// Wait for the goroutine to exit
	wg.Wait()
}

Timeout cancellation

package main

import (
	"context"
	"fmt"
	"time"
)

func func1(ctx context.Context) {
	hctx, hcancel := context.WithTimeout(ctx, time.Second*4)
	defer hcancel()

	resp := make(chan struct{}, 1)
	// Processing logic
	go func() {
		// Simulate slow work
		time.Sleep(time.Second * 10)
		resp <- struct{}{}
	}()

	// Timeout mechanism
	select {
	//	case <-ctx.Done():
	//		fmt.Println("ctx timeout")
	//		fmt.Println(ctx.Err())
	case <-hctx.Done():
		fmt.Println("hctx timeout")
		fmt.Println(hctx.Err())
	case v := <-resp:
		fmt.Println("func1 handle done")
		fmt.Printf("result: %v\n", v)
	}
	fmt.Println("func1 finish")
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), time.Second*2)
	defer cancel()
	func1(ctx)
}

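To handle several timeouts, uncomment the commented-out case in the timeout example above. You will see that when two contexts are involved, the one with the shorter timeout fires first. Because hctx is derived from ctx, it can never outlive it: the 2-second deadline of ctx also cancels hctx, so checking Done() on hctx alone is enough here. Either way, make sure both cancel functions get called.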
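A short sketch makes the interaction of the two timeouts visible by comparing the deadlines the two contexts report. It reuses the 2-second and 4-second values from the example; the helper name effectiveDeadlines is hypothetical.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// effectiveDeadlines builds a 2s parent and a 4s child derived from it,
// then returns the deadline each context actually reports.
func effectiveDeadlines() (parent, child time.Time) {
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()
	hctx, hcancel := context.WithTimeout(ctx, 4*time.Second)
	defer hcancel()

	parent, _ = ctx.Deadline()
	child, _ = hctx.Deadline()
	return parent, child
}

func main() {
	p, c := effectiveDeadlines()
	// The child asked for 4s but reports the parent's earlier deadline.
	fmt.Println("child deadline equals parent deadline:", c.Equal(p))
}
```

A derived context can shorten the time budget it inherits but cannot extend it, which is why a single Done() check on the innermost context covers both timeouts.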

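Source code analysis

Interfaces in context

The usage scenarios show that the context package exports several functions, including WithValue and WithTimeout. Whether you are creating the first context or passing one along, the central interface type is context.Context, and every kind of context implements it, value contexts included.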
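To make the interface concrete, here is a minimal sketch of a user-defined type that satisfies context.Context through its four methods (Deadline, Done, Err, Value). The type neverDone and the helper usableAsParent are hypothetical; this is an illustration, not how the standard library implements its own contexts.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// neverDone is a hypothetical Context that is never cancelled and holds no values.
type neverDone struct{}

func (neverDone) Deadline() (time.Time, bool)       { return time.Time{}, false }
func (neverDone) Done() <-chan struct{}             { return nil }
func (neverDone) Err() error                        { return nil }
func (neverDone) Value(key interface{}) interface{} { return nil }

// Compile-time check that neverDone satisfies context.Context.
var _ context.Context = neverDone{}

// usableAsParent derives a cancellable child from neverDone and cancels it.
func usableAsParent() bool {
	child, cancel := context.WithCancel(neverDone{})
	cancel()
	return child.Err() == context.Canceled
}

func main() {
	fmt.Println("custom context works as a parent:", usableAsParent())
}
```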

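How many kinds of context are there?

Since every context has to implement Context, how many kinds are there once the unexported structs are counted? In the version of the source analysed here, the answer is four.

Type 1: emptyCtx, the source of every context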

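emptyCtx is defined as follows:

type emptyCtx int

emptyCtx is just an int (a comment in the source notes that it is not struct{} because variables of this type must have distinct addresses), and it implements the Context interface with methods that do nothing. Recall that the context package has two functions for creating an initial context:

func Background() Context
func TODO() Context

The concrete type returned by both is emptyCtx. The context package declares two package-level variables of that type, background and todo: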

var (
	background = new(emptyCtx)
	todo       = new(emptyCtx)
)

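The two functions return these two variables respectively. So we can say with confidence that the root of a context tree is a pointer to a package-level int, and that Background() and TODO() behave identically (only their String() output differs). That is also why nil should never be used as a context. For readability, use TODO() when you have not yet decided whether or how to pass a context, and Background() everywhere else, for example when initialising the context at a request entry point.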
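The claim that both roots are inert can be checked through the public API alone. The helper names rootsAreInert and rootNames are hypothetical.

```go
package main

import (
	"context"
	"fmt"
)

// rootsAreInert reports whether both root contexts have no Done channel,
// no error, and no deadline.
func rootsAreInert() bool {
	for _, ctx := range []context.Context{context.Background(), context.TODO()} {
		_, hasDeadline := ctx.Deadline()
		if ctx.Done() != nil || ctx.Err() != nil || hasDeadline {
			return false
		}
	}
	return true
}

// rootNames returns the String() form of each root, the only visible difference.
func rootNames() (string, string) {
	return fmt.Sprint(context.Background()), fmt.Sprint(context.TODO())
}

func main() {
	fmt.Println("roots are inert:", rootsAreInert())
	bg, todo := rootNames()
	fmt.Println(bg, todo)
}
```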

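Type 2: cancelCtx, the heart of the cancel mechanism

cancelCtx implements the cancellation used internally by both manual and timeout cancellation. It is defined as follows: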

type cancelCtx struct {
	Context

	mu       sync.Mutex
	done     chan struct{}
	children map[canceler]struct{}
	err      error 
}

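Here mu is what makes the context safe for concurrent use, done is the channel used for notification, and children is the structure through which cancellation is most commonly propagated to derived contexts.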
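The effect of the children bookkeeping can be observed from outside: cancelling a context in the middle of a chain cancels its descendants but not its ancestors. A minimal sketch, with the hypothetical helper cancelMiddle:

```go
package main

import (
	"context"
	"fmt"
)

// cancelMiddle builds root -> mid -> leaf, cancels mid, and reports
// which of the other two contexts ended up cancelled.
func cancelMiddle() (leafCancelled, rootCancelled bool) {
	root, cancelRoot := context.WithCancel(context.Background())
	defer cancelRoot()
	mid, cancelMid := context.WithCancel(root)
	leaf, cancelLeaf := context.WithCancel(mid)
	defer cancelLeaf()

	cancelMid() // cancels mid and every context derived from it

	return leaf.Err() != nil, root.Err() != nil
}

func main() {
	leaf, root := cancelMiddle()
	fmt.Println("leaf cancelled:", leaf)
	fmt.Println("root cancelled:", root)
}
```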

Type 3: timerCtx, cancellation on a schedule

timerCtx embeds a cancelCtx and adds a timer that cancels it when the deadline arrives. It is defined as follows:

type timerCtx struct {
	cancelCtx
	timer *time.Timer // Under cancelCtx.mu.

	deadline time.Time
}

Here deadline is kept mainly for bookkeeping, such as Deadline() and String(); timer is what actually triggers the cancellation.
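Since a timerCtx can end either through its timer or through an explicit cancel, the error it reports tells the two apart. The sketch below assumes short, arbitrary durations; the helper names are hypothetical.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// timedOut lets the timer fire and reports whether the error is DeadlineExceeded.
func timedOut() bool {
	ctx, cancel := context.WithTimeout(context.Background(), 20*time.Millisecond)
	defer cancel()
	<-ctx.Done()
	return ctx.Err() == context.DeadlineExceeded
}

// cancelledEarly cancels long before the deadline and reports whether the error is Canceled.
func cancelledEarly() bool {
	ctx, cancel := context.WithTimeout(context.Background(), time.Hour)
	cancel() // also stops the underlying timer
	return ctx.Err() == context.Canceled
}

func main() {
	fmt.Println("timer fired:", timedOut())
	fmt.Println("cancelled by hand:", cancelledEarly())
}
```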

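Type 4: valueCtx, for passing values

valueCtx is the last of the four types. It only carries a value, but like every other context it can be passed on and derived from. It is defined as follows: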

type valueCtx struct {
	Context
	key, val interface{}
}

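Some people think context should only be used to pass values, while others consider cancellation to be its core, so valueCtx gets its own discussion below. Once you understand the implementation, you can decide what fits your own use case (values could also be passed in a global struct, a map, and so on).

Is a value context backed by a map?

The definition of valueCtx above shows that a value context is not backed by a map. Each key-value pair gets its own valueCtx, so passing several values means building several contexts. This is also why a value context cannot pass values from the bottom up.

Both key and val in valueCtx are interface values. When WithValue is called, it first uses reflection to check that the key's type is comparable (the same requirement as for map keys) and then stores the pair.

When Value is called, it first checks the key held by the current context and, if it does not match, searches the parent contexts recursively. This is why values can be read from the top down.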
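One consequence of this lookup order is that a value set closer to the caller of Value shadows an earlier value stored under the same key, while the earlier context still sees its own value. A small sketch, where the key type and the names "user", "alice", "bob", and "role" are made up for the example:

```go
package main

import (
	"context"
	"fmt"
)

// key is a hypothetical private key type used only in this sketch.
type key string

// lookups stores "user" twice along one chain and reads it back at both levels,
// plus one key that was never set.
func lookups() (nearest, original, missing string) {
	base := context.WithValue(context.Background(), key("user"), "alice")
	shadow := context.WithValue(base, key("user"), "bob")

	nearest = fmt.Sprint(shadow.Value(key("user"))) // found in shadow itself
	original = fmt.Sprint(base.Value(key("user")))  // base is unaffected
	missing = fmt.Sprint(shadow.Value(key("role"))) // walks to the root, finds nothing
	return nearest, original, missing
}

func main() {
	fmt.Println(lookups())
}
```

Because every lookup walks the chain link by link, a long chain of values costs time proportional to its depth, which is one reason to keep context values few.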

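How is a context propagated?

First, every kind of context can be propagated, and the mechanism behind that comes down to how parent and child contexts are handled when WithCancel, WithTimeout, or WithValue is called. WithValue simply wraps its parent in a valueCtx. The functions that create a cancellable context (WithCancel and WithDeadline, on which WithTimeout is built) link the child to its parent through propagateCancel. Take WithCancel as an example: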

func WithCancel(parent Context) (ctx Context, cancel CancelFunc) {
	c := newCancelCtx(parent)
	propagateCancel(parent, &c)
	return &c, func() { c.cancel(true, Canceled) }
}

newCancelCtx stores the parent context in the new cancelCtx, and propagateCancel establishes the link between the parent and child contexts.

propagateCancel is defined as follows:

func propagateCancel(parent Context, child canceler) {
	if parent.Done() == nil {
		return // parent is never canceled
	}
	if p, ok := parentCancelCtx(parent); ok { // a type the context package recognizes and handles directly
		p.mu.Lock()
		if p.err != nil {
			// parent has already been canceled
			child.cancel(false, p.err)
		} else {
			if p.children == nil {
				p.children = make(map[canceler]struct{})
			}
			p.children[child] = struct{}{}
		}
		p.mu.Unlock()
	} else { // a type the package cannot handle directly, e.g. type A struct{ context.Context } (plain embedding)
		go func() {
			select {
			case <-parent.Done():
				child.cancel(false, parent.Err())
			case <-child.Done():
			}
		}()
	}
}

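1. If parent.Done() is nil, nothing is done, because the parent context can never be cancelled. This is the case for TODO(), Background(), and a WithValue context derived directly from them.
2. parentCancelCtx inspects the type of the parent context and returns a boolean ok. When ok is true, the child is recorded in the parent's children map, which stores the parent-to-child relationship. This happens for cancelCtx and timerCtx; for a valueCtx the loop keeps walking up, because a value context cannot be cancelled itself and the nearest cancellable ancestor is the right place to register. The keys of children are of the canceler interface type, which external types do not implement, hence the else branch (see the comment in the code above). For such external types no direct link is created.
parentCancelCtx is defined as follows: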

func parentCancelCtx(parent Context) (*cancelCtx, bool) {
	for {
		switch c := parent.(type) {
		case *cancelCtx:
			return c, true
		case *timerCtx:
			return &c.cancelCtx, true
		case *valueCtx:
			parent = c.Context // keep walking up the chain
		default:
			return nil, false
		}
	}
}
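From the caller's point of view, cancellation still reaches a child whose parent is a wrapper type, whichever internal path the package takes to deliver it (the exact path depends on the Go version). The sketch below uses a hypothetical wrapper type and waits on Done() because the notification may arrive asynchronously.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// wrapper is a hypothetical user-defined context that embeds another context.
type wrapper struct{ context.Context }

// cancelThroughWrapper cancels the inner context and reports whether a child
// derived from the wrapper is notified within one second.
func cancelThroughWrapper() bool {
	inner, cancel := context.WithCancel(context.Background())
	child, cancelChild := context.WithCancel(wrapper{inner})
	defer cancelChild()

	cancel()

	select {
	case <-child.Done():
		return true
	case <-time.After(time.Second):
		return false
	}
}

func main() {
	fmt.Println("child notified through wrapper:", cancelThroughWrapper())
}
```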

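How is cancellation triggered?

The code shown for propagation already includes part of the cancellation mechanism, so that code is not repeated here, but the explanation below builds on it. Propagation works much the same for every kind of context, while the cancellation trigger differs, so I explain each type in turn. Contexts of different types can be cancelled along the same chain, but each individual context is cancelled by only one condition, which is why each type's mechanism is described separately below (when several conditions are combined, the first one to fire wins, and whichever it is, cancel is propagated down the chain). Two design decisions are key here:

The cancel function is idempotent and can be called multiple times.
The done channel in a context is used both to check whether it has been cancelled and to broadcast the cancellation.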
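Both properties can be observed in a few lines. In this sketch (the helper name firstCauseWins and the 10 ms timeout are assumptions), the timer fires first, later cancel calls change nothing, and the closed done channel can be received from repeatedly without blocking.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// firstCauseWins lets the deadline fire, then calls cancel twice, and reports
// whether the recorded error is still the one set by the first trigger.
func firstCauseWins() bool {
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Millisecond)

	<-ctx.Done() // the timer fires first
	first := ctx.Err()

	cancel() // later calls are no-ops
	cancel()

	<-ctx.Done() // a closed channel never blocks, so this returns at once
	return first == context.DeadlineExceeded && ctx.Err() == first
}

func main() {
	fmt.Println("first cause is kept:", firstCauseWins())
}
```

Idempotence is what makes the common defer cancel() pattern safe even when the context has already ended for another reason.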

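The cancelCtx type

A cancelCtx is cancelled explicitly. As cancellation travels from the top down, it iterates over its child contexts and cancels each of them in turn.
The cancel function is defined as follows: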

func (c *cancelCtx) cancel(removeFromParent bool, err error) {
	if err == nil {
		panic("context: internal error: missing cancel error")
	}
	c.mu.Lock()
	if c.err != nil {
		c.mu.Unlock()
		return // already canceled
	}
	c.err = err
	if c.done == nil {
		c.done = closedchan
	} else {
		close(c.done)
	}
	for child := range c.children {
		// NOTE: acquiring the child's lock while holding parent's lock.
		child.cancel(false, err)
	}
	c.children = nil
	c.mu.Unlock()

	if removeFromParent {
		removeChild(c.Context, c)
	}
}

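The timerCtx type

WithTimeout is implemented on top of WithDeadline, and both produce a timerCtx. From the definition of parentCancelCtx we know that a timerCtx also records parent-child relationships. What differs is that its cancel is triggered by a timer. Part of the implementation: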

if c.err == nil {
	c.timer = time.AfterFunc(dur, func() {
		c.cancel(true, DeadlineExceeded)
	})
}

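Embedded contexts

The only case that comes to mind here is plain embedding, that is, type A struct{context.Context}. From parentCancelCtx and propagateCancel we know that no direct parent-child link is created for such a context. Instead, a separate goroutine watches the done channels to decide whether cancel needs to be called further down the chain; see the else branch of propagateCancel.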

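Conclusion

Three things to keep in mind when using context:

Values can only be passed from the top down, never the other way round.
If a cancel function is returned, make sure it is called; otherwise resources such as timers will leak.
A context must never be nil. If unsure, use context.TODO() to create an empty context.

The implementation of context is not complicated, yet it is a real convenience in day-to-day work. The usage section walked through several common development scenarios so that you can match them to your own and use context fluently. The source analysis section explained how context is implemented so that you know not only how to use it but also why it behaves as it does. Thanks for reading.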

References

The official Go packages

Go Concurrency Patterns: Context

etcd client timeout handling example code