2017-05-10

nginx configuration management

1. Introduction

Before looking at the implementation, let's first see how an nginx configuration file is organized.

user  nobody;
daemon on;
worker_processes  1;
error_log  logs/debug.log  debug;

pid        run/nginx.pid;

#thread_pool default threads=64 max_queue=65536;

events {
    use epoll;
    worker_connections  1024;
    multi_accept on;            # if on, accept as many new connections as possible at once
    accept_mutex off;
}
http {
    access_log  logs/access.log  main;
    sendfile    on;
    keepalive_timeout  65;
    keepalive_requests 1000;
    server {
        server_name  www.test1.com;
        listen       80;
        access_log  logs/host.access.log  main;
        location / {
            root   html/test1;
        }
    }

    server {
        server_name  www.test2.com;
        listen       443;
        access_log  logs/host.access.log  main;

        location / {
            root   html/test2;
        }
    }
}

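It looks roughly like the above. Each line can be regarded as a command (directive) with its own arguments. What matters is that every directive has a scope in which it takes effect. For HTTP there are three kinds of scope: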

NGX_HTTP_MAIN_CONF
NGX_HTTP_SRV_CONF
NGX_HTTP_LOC_CONF

When writing an nginx module, we declare in which scopes each directive may appear. Take access_log as an example:

{ ngx_string("access_log"),
     NGX_HTTP_MAIN_CONF|NGX_HTTP_SRV_CONF|NGX_HTTP_LOC_CONF|NGX_HTTP_LIF_CONF
                       |NGX_HTTP_LMT_CONF|NGX_CONF_1MORE,
     ngx_http_log_set_log,
     NGX_HTTP_LOC_CONF_OFFSET,
     0,
     NULL }

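As you can see, this directive can be placed in the main, server or location block (NGX_HTTP_LIF_CONF and NGX_HTTP_LMT_CONF additionally allow if blocks inside a location and limit_except blocks). The pid directive, on the other hand, may only appear at the main level: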

{ ngx_string("pid"),
     NGX_MAIN_CONF|NGX_DIRECT_CONF|NGX_CONF_TAKE1,
     ngx_conf_set_str_slot,
     0,
     offsetof(ngx_core_conf_t, pid),
     NULL },
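A directive's type field is a bitmask of allowed scopes. The parser accepts a directive only inside a block whose scope bit appears in that mask; as far as we can tell, nginx does this with a test of the form cmd->type & cf->cmd_type in ngx_conf_handler. The sketch below models only that check. The SCOPE_* constants, directive_t and directive_allowed are hypothetical stand-ins, not the real nginx definitions.

```c
#include <stdint.h>

/* Hypothetical scope bits in the spirit of NGX_HTTP_MAIN_CONF etc.;
 * the real values in ngx_http_config.h are different. */
enum {
    SCOPE_MAIN = 0x01,
    SCOPE_SRV  = 0x02,
    SCOPE_LOC  = 0x04
};

typedef struct {
    const char *name;
    uint32_t    type;   /* OR of the scopes where the directive is allowed */
} directive_t;

/* Returns 1 if the directive may appear in a block of scope `current`, else 0. */
static int directive_allowed(const directive_t *d, uint32_t current)
{
    return (d->type & current) != 0;
}

/* access_log: main, server and location; listen: server only. */
static const directive_t access_log_cmd = { "access_log", SCOPE_MAIN | SCOPE_SRV | SCOPE_LOC };
static const directive_t listen_cmd     = { "listen", SCOPE_SRV };
```

If listen appeared directly in http{}, this check would fail, and nginx reports that case as a "directive is not allowed here" error.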

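Before getting into the implementation, let's lay down a few basic rules that make the rest easier to follow:

There is exactly one main instance, the global one.
There can be many server instances, all inside main.
There can be many location instances, each inside a server.
Each module can have several directives, and a directive's scope can be any of the three kinds of instance above.

Now for the core question: how is a module's configuration actually stored in each instance (the main instance, the server instances and the location instances)?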

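2. Configuration management

2.1. Module categories

Roughly speaking, nginx modules fall into two categories (there may be others, but they don't matter here):

CORE modules
HTTP modules

Core modules are the most fundamental ones, for example the events module, the errlog module, the thread pool module and the http module (ngx_http_module).
HTTP modules are used while processing HTTP; most extensions we write belong to this category.

CORE modules form the first level, and HTTP modules sit one level below the http module. This layering matters a great deal for configuration management.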

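2.2. Configuration of core modules

As is well known, nginx keeps its configuration in cycle->conf_ctx, whose type is void ****. Why so many levels of indirection? I still don't really know.
That doesn't stop us from following the logic, though. The first-level index is the core module's index. For example, if ngx_http_module's index is 20, its configuration is cycle->conf_ctx[20], which removes one level of pointers (I haven't seen where the second and third levels are removed).
The core modules are: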

ngx_core_module
ngx_events_module
ngx_openssl_module
ngx_google_perftools_module
ngx_http_module
ngx_errlog_module
ngx_mail_module
ngx_regex_module
ngx_stream_module
ngx_thread_pool_module

Variable	Module name	Configuration struct
ngx_core_module	core	ngx_core_conf_t
ngx_http_module	http	none
ngx_stream_module	stream	none
ngx_google_perftools_module	google_perftools	ngx_google_perftools_conf_t
ngx_events_module	events	none
ngx_errlog_module	errlog	none
ngx_mail_module	mail	none
ngx_regex_module	regex	ngx_regex_conf_t
ngx_openssl_module	openssl	ngx_openssl_conf_t
ngx_thread_pool_module	thread_pool	ngx_thread_pool_conf_t

A core module's context is defined as follows:

typedef struct {
    ngx_str_t             name;
    void               *(*create_conf)(ngx_cycle_t *cycle);
    char               *(*init_conf)(ngx_cycle_t *cycle, void *conf);
} ngx_core_module_t;
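To make the create_conf distinction concrete, here is a minimal model, under our own assumptions, of how the first level of conf_ctx might be filled. A module that has a create_conf gets a freshly allocated conf in its slot. The others (such as http) stay NULL until their block directive fills the slot. core_module_t, core_conf_t and init_conf_ctx are hypothetical simplifications of ngx_core_module_t and of the corresponding loop in ngx_init_cycle, and -1 plays the role of NGX_CONF_UNSET.

```c
#include <stdlib.h>

/* Hypothetical, trimmed-down core module descriptor. */
typedef struct {
    const char *name;
    void     *(*create_conf)(void);   /* NULL for http, events, ... */
} core_module_t;

/* Hypothetical stand-in for ngx_core_conf_t with two fields. */
typedef struct {
    int daemon;
    int worker_processes;
} core_conf_t;

static void *core_create_conf(void)
{
    core_conf_t *c = calloc(1, sizeof(*c));
    if (c != NULL) {
        c->daemon = -1;              /* -1 marks "not set yet" */
        c->worker_processes = -1;
    }
    return c;
}

/* Fill conf_ctx[i] for every module that has create_conf; the others stay
 * NULL and are later filled by their own block handler (e.g. for http{}). */
static int init_conf_ctx(const core_module_t *mods, size_t n, void **conf_ctx)
{
    for (size_t i = 0; i < n; i++) {
        conf_ctx[i] = NULL;
        if (mods[i].create_conf != NULL) {
            conf_ctx[i] = mods[i].create_conf();
            if (conf_ctx[i] == NULL) {
                return -1;
            }
        }
    }
    return 0;
}
```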

As you can see, some modules set create_conf (core, google_perftools, regex, openssl, thread_pool) and the others don't. Why the difference?
Let's look at the conf of the core module:

typedef struct {
    ngx_flag_t                daemon;
    ngx_flag_t                master;

    ngx_msec_t                timer_resolution;

    ngx_int_t                 worker_processes;
    ngx_int_t                 debug_points;

    ngx_int_t                 rlimit_nofile;
    off_t                     rlimit_core;

    int                       priority;

    ngx_uint_t                cpu_affinity_auto;
    ngx_uint_t                cpu_affinity_n;
    ngx_cpuset_t             *cpu_affinity;

    char                     *username;
    ngx_uid_t                 user;
    ngx_gid_t                 group;

    ngx_str_t                 working_directory;
    ngx_str_t                 lock_file;

    ngx_str_t                 pid;
    ngx_str_t                 oldpid;

    ngx_array_t               env;
    char                    **environment;
} ngx_core_conf_t;

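As you can see, it holds only global settings, such as whether to run as a daemon, the pid file, and so on. All quite straightforward.

So why don't the other modules, such as events/http/stream, have a conf of their own?
Because each of them delegates to a finer-grained set of modules that are no longer of type NGX_CORE_MODULE.
For example:

http has a corresponding ngx_http_core_module, of type NGX_HTTP_MODULE
stream has a corresponding ngx_stream_core_module, of type NGX_STREAM_MODULE
events has a corresponding ngx_event_core_module, of type NGX_EVENT_MODULE

errlog is the odd one out; I owe an explanation and will add it later.

A new module usually defines a number of directives. Some are written into the main/srv/loc conf through the set_slot helpers; others invoke a handler function.
Such a handler usually looks like this: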

static char *ngx_http_core_listen(ngx_conf_t *cf, ngx_command_t *cmd, void *conf)

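What are cf and conf? They are important for understanding how the configuration file is parsed.
First, ngx_conf_t: it describes the overall parsing state of the instance (main/srv/loc) in which the current directive appears.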

struct ngx_conf_s {
    char                 *name;        // directive name
    ngx_array_t          *args;        // directive arguments

    ngx_cycle_t          *cycle;
    ngx_pool_t           *pool;
    ngx_pool_t           *temp_pool;
    ngx_conf_file_t      *conf_file;
    ngx_log_t            *log;

    void                 *ctx;         // the most important part
    ngx_uint_t            module_type;
    ngx_uint_t            cmd_type;

    ngx_conf_handler_pt   handler;
    void                 *handler_conf;
};

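ctx is the most important part. It is the context of the instance in which the current directive appears, and its structure differs from module type to module type.
For http it is an ngx_http_conf_ctx_t. One is created at the start of every http{}, server{} and location{} block, and while the directives inside that {} are parsed, cf->ctx points to it.
For events it is a pointer to an array.
For stream it is an ngx_stream_conf_ctx_t.

And what about the conf parameter? When a directive is parsed inside some instance, conf is the configuration of that directive's module in that instance.
For example, the listen directive is defined as: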

{ ngx_string("listen"),
  NGX_HTTP_SRV_CONF|NGX_CONF_1MORE,
  ngx_http_core_listen,
  NGX_HTTP_SRV_CONF_OFFSET,
  0,
  NULL },

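It is used inside server blocks and belongs to ngx_http_core_module. When a server{} block is parsed, ngx_http_core_create_srv_conf is called to create an ngx_http_core_srv_conf_t,
so for the listen directive, conf is that ngx_http_core_srv_conf_t.

To sum up, the benefit of this design is that, while handling a directive, you can easily get this module's configuration in the current instance, as well as the configurations of the other modules in the same instance.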
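The lookup of conf can be sketched as follows, assuming a simplified copy of ngx_http_conf_ctx_t. The command's conf field (for example NGX_HTTP_SRV_CONF_OFFSET, which is an offsetof into the ctx) selects main_conf, srv_conf or loc_conf, and the module's ctx_index then selects the slot. This is roughly what ngx_conf_handler does. The names http_conf_ctx_t, *_CONF_OFFSET and resolve_conf below are our own.

```c
#include <stddef.h>

/* Same shape as nginx's ngx_http_conf_ctx_t. */
typedef struct {
    void **main_conf;
    void **srv_conf;
    void **loc_conf;
} http_conf_ctx_t;

/* Analogous to NGX_HTTP_MAIN/SRV/LOC_CONF_OFFSET: offsets into the ctx. */
#define MAIN_CONF_OFFSET offsetof(http_conf_ctx_t, main_conf)
#define SRV_CONF_OFFSET  offsetof(http_conf_ctx_t, srv_conf)
#define LOC_CONF_OFFSET  offsetof(http_conf_ctx_t, loc_conf)

/* Pick the conf array named by `offset`, then this module's slot in it. */
static void *resolve_conf(void *ctx, size_t offset, size_t ctx_index)
{
    void **confp = *(void ***) ((char *) ctx + offset);
    return confp != NULL ? confp[ctx_index] : NULL;
}
```

For listen (NGX_HTTP_SRV_CONF_OFFSET, module ngx_http_core_module), this yields the srv conf of http_core in the server{} being parsed.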

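ngx_http_module has only one directive, http, so every time an http {} block is encountered, the ngx_http_block function runs.
So what exactly does ngx_http_block do?
For each http{} block there is a corresponding ctx variable of type ngx_http_conf_ctx_t.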

static char *ngx_http_block(ngx_conf_t *cf, ngx_command_t *cmd, void *conf)

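The newly created ctx is handed back by writing it through the conf parameter. My guess was that conf is related to the void **** in ngx_cycle, and that is indeed the case: for a main-level block directive such as http, conf points to the slot cycle->conf_ctx[ngx_http_module.index], so the http ctx ends up stored in the cycle.

For each http{} block, a variable of type ngx_http_conf_ctx_t is created to hold the configurations of all HTTP modules: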

typedef struct {
    void        **main_conf;
    void        **srv_conf;
    void        **loc_conf;
} ngx_http_conf_ctx_t;

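This is abstract to the extreme, and it is exactly what we'll focus on next.

2.3 http_core: main

As mentioned before, each HTTP module's directives can take effect at different scopes. Let's start with the simplest case: directives that take effect in main. This resembles the NGX_CORE_MODULE modules described earlier: the configurations are organized as an array indexed by the HTTP module's index (its ctx_index).
When an http{} block is parsed, a ctx with the following structure is created: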

typedef struct {
    void        **main_conf;
    void        **srv_conf;
    void        **loc_conf;
} ngx_http_conf_ctx_t;

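main_conf holds the main conf of every module.
srv_conf and loc_conf: at this point a sharp reader will wonder whether they are useless, since this is clearly the main (global) ctx. They are in fact used; what for is explained later.

Each module's main conf is stored in main_conf. For example, if the access_log module (ngx_http_log_module) has ctx_index 32, its main conf is:

((ngx_http_conf_ctx_t *) cycle->conf_ctx[ngx_http_module.index])->main_conf[32];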

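2.4 http_core: server

If a module's directive is defined for the server block, where is its value stored? Server instances are not globally unique; there can be many of them. So first we need to know where server instances are kept, and from that, how a module's configuration is stored in a server instance.
As mentioned in 2.3, each HTTP module usually has a main_conf. One HTTP module is special, though: ngx_http_core_module. Its main conf is ngx_http_core_main_conf_t, and every server instance (an ngx_http_core_srv_conf_t) defined in an http{} block is stored in the servers member of ngx_http_core_main_conf_t.

Note that ngx_http_core_srv_conf_t itself also contains an ngx_http_conf_ctx_t (its ctx member). How should we understand this one?

main_conf directly reuses the parent instance's main_conf, since we are already inside a server block.
srv_conf holds this instance's server conf for every HTTP module, indexed by module, the same idea as before.
loc_conf: as before, this is the server block's ctx, so is loc_conf useless here? It is used as well.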
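Here is a minimal sketch of building a server{} ctx with the sharing described above. It assumes a trimmed copy of ngx_http_conf_ctx_t, and create_server_ctx and free_server_ctx are hypothetical helpers. The real code (in ngx_http_core_server) also calls every module's create_srv_conf and create_loc_conf to fill the slots; that step is omitted here.

```c
#include <stdlib.h>

/* Same shape as nginx's ngx_http_conf_ctx_t. */
typedef struct {
    void **main_conf;
    void **srv_conf;
    void **loc_conf;
} http_conf_ctx_t;

/* Build the ctx for a server{} block: main_conf is shared with the parent
 * http{} ctx, while srv_conf and loc_conf get fresh per-module arrays. */
static http_conf_ctx_t *create_server_ctx(const http_conf_ctx_t *http_ctx,
                                          size_t nmodules)
{
    http_conf_ctx_t *ctx = calloc(1, sizeof(*ctx));
    if (ctx == NULL) {
        return NULL;
    }
    ctx->main_conf = http_ctx->main_conf;           /* reused, not copied */
    ctx->srv_conf = calloc(nmodules, sizeof(void *));
    ctx->loc_conf = calloc(nmodules, sizeof(void *));
    if (ctx->srv_conf == NULL || ctx->loc_conf == NULL) {
        free(ctx->srv_conf);
        free(ctx->loc_conf);
        free(ctx);
        return NULL;
    }
    /* nginx would now call each module's create_srv_conf/create_loc_conf
     * and store the results in srv_conf[i] / loc_conf[i]. */
    return ctx;
}

static void free_server_ctx(http_conf_ctx_t *ctx)
{
    /* main_conf belongs to the parent ctx and is not freed here */
    free(ctx->srv_conf);
    free(ctx->loc_conf);
    free(ctx);
}
```

The loc_conf array created here is the per-server location conf that section 2.6 relies on for merging.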

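2.5 http_core: location

Following the same approach, we need to find where location instances are kept.
Since server instances are stored in ngx_http_core_main_conf_t, one would expect locations to be stored in ngx_http_core_srv_conf_t.
Disappointingly, they are stored in ngx_http_core_loc_conf_t instead, mainly because locations can be nested:

ngx_queue_t *locations;

Each element of locations is of type ngx_http_location_queue_t; see ngx_http_add_location for how elements are added.
ngx_http_location_queue_t is only a wrapper, though; the structure that carries the logic is ngx_http_core_loc_conf_t.
ngx_http_core_loc_conf_t has a loc_conf member, and yes, that is where the location configurations of all modules are stored:

/* pointer to the modules' loc_conf */
void        **loc_conf;

ngx_http_core_srv_conf_t does not store a list of every module's srv_conf, only the ctx, from which all modules' srv confs can be found indirectly. So why does ngx_http_core_loc_conf_t store loc_conf directly,
and no longer have a member of type ngx_http_conf_ctx_t? I'm still not sure about the motivation.
One point worth making, given that locations can nest: in the ctx of an ngx_http_core_srv_conf_t, the http_core module's entry loc_conf[ngx_http_core_module.ctx_index]
can be regarded as the default location of this server{}; its locations member records all the locations in this server{}.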

2.6 The merge operation

The question is what happens if an option such as access_log is configured both in main and in server. The module author has to write merge functions:

static ngx_http_module_t  ngx_http_log_module_ctx = {
    NULL,                                  /* preconfiguration */
    ngx_http_log_init,                     /* postconfiguration */

    ngx_http_log_create_main_conf,         /* create main configuration */
    NULL,                                  /* init main configuration */

    NULL,                                  /* create server configuration */
    NULL,                                  /* merge server configuration */

    ngx_http_log_create_loc_conf,          /* create location configuration */
    ngx_http_log_merge_loc_conf            /* merge location configuration */
};

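So how does nginx call back the author's functions?
Note also that both arguments the callback receives have the same type. Suppose a module author defines a directive xxx on/off that can take effect at different scopes,
with the following definition for each block: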


typedef struct {
    ngx_flag_t                           enable;
} ngx_http_xxx_loc_conf_t;

typedef struct {
    ngx_flag_t                           enable;
} ngx_http_xxx_srv_conf_t;

typedef struct {
    ngx_flag_t                           enable;
} ngx_http_xxx_main_conf_t;

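Suppose the author wants the narrowest scope to win: if the location doesn't set it, use the server's value; if the server doesn't set it, use main's.
When merging locations, both arguments are of type ngx_http_xxx_loc_conf_t. What puzzled me at first: a location's configuration is naturally an ngx_http_xxx_loc_conf_t, but where would the server get an ngx_http_xxx_loc_conf_t from?

Recall the earlier question: why does creating a server instance also create a location conf for every module, when no location has been parsed yet? That conf exists precisely for merging. Then where does its enable value come from? From writing xxx on/off in the server block, of course.
In fact, the type definitions above are wrong.
When a directive can take effect at several scopes, it must be stored in the struct of the narrowest scope, so the definitions above should become: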


typedef struct {
    ngx_flag_t                           enable;
} ngx_http_xxx_loc_conf_t;

typedef struct {
    //
} ngx_http_xxx_srv_conf_t;

typedef struct {
    ngx_flag_t                           enable;
} ngx_http_xxx_main_conf_t;

Then, when defining the command:

{
    ngx_string("xxx"),
    NGX_HTTP_MAIN_CONF|NGX_HTTP_SRV_CONF|NGX_HTTP_LOC_CONF|NGX_CONF_FLAG,
    ngx_conf_set_flag_slot,
    NGX_HTTP_LOC_CONF_OFFSET,
    offsetof(ngx_http_xxx_loc_conf_t, enable),
    NULL
},

NGX_HTTP_LOC_CONF_OFFSET tells nginx which conf of the ctx to store the value in. Remember the definition of ctx?

typedef struct {
    void        **main_conf;
    void        **srv_conf;
    void        **loc_conf;
} ngx_http_conf_ctx_t;

The offset line then tells nginx which field to store the value in.
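Together, the two fields act like a typed pointer. Below is a simplified sketch in the spirit of ngx_conf_set_flag_slot: it parses "on"/"off", writes the result to conf + offset, and rejects a second setting of the same directive. CONF_UNSET, xxx_loc_conf_t and set_flag_slot are hypothetical. The real function takes ngx_conf_t/ngx_command_t arguments and compares case-insensitively.

```c
#include <stddef.h>
#include <string.h>

#define CONF_UNSET (-1)   /* plays the role of NGX_CONF_UNSET */

typedef struct {
    long enable;          /* ngx_flag_t is an intptr_t in nginx */
} xxx_loc_conf_t;

/* Parse "on"/"off" and store it at conf + offset.
 * Returns NULL on success or an error message. */
static const char *set_flag_slot(void *conf, size_t offset, const char *value)
{
    long *fp = (long *) ((char *) conf + offset);

    if (*fp != CONF_UNSET) {
        return "is duplicate";
    }
    if (strcmp(value, "on") == 0) {
        *fp = 1;
    } else if (strcmp(value, "off") == 0) {
        *fp = 0;
    } else {
        return "invalid value, it must be \"on\" or \"off\"";
    }
    return NULL;
}
```

In the command table, the offset argument corresponds to offsetof(ngx_http_xxx_loc_conf_t, enable), and conf is the loc conf chosen through NGX_HTTP_LOC_CONF_OFFSET.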

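Finally, the key part: how does merging work?
Answer: for a given module, the srv_conf of the main instance is merged into the srv_conf of each server instance. The loc_conf of the main instance is merged into the loc_conf of each server instance, which in turn is merged into the loc_conf of each location instance.
Note that merging modifies the child's configuration.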
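The merge step itself usually boils down to nginx's ngx_conf_merge_value(conf, prev, default) pattern: keep the child's value if it was set, otherwise inherit the parent's, otherwise use the default. Below is a minimal sketch for the hypothetical xxx directive, applied from main to server to location. CONF_UNSET, conf_merge_value and xxx_merge_loc_conf are our own names.

```c
#define CONF_UNSET (-1)   /* plays the role of NGX_CONF_UNSET */

typedef struct {
    long enable;
} xxx_loc_conf_t;

/* Same idea as ngx_conf_merge_value: only an unset child value is replaced. */
#define conf_merge_value(conf, prev, def)                       \
    do {                                                       \
        if ((conf) == CONF_UNSET) {                            \
            (conf) = ((prev) == CONF_UNSET) ? (def) : (prev);  \
        }                                                      \
    } while (0)

/* Hypothetical merge_loc_conf callback: parent and child share one type. */
static void xxx_merge_loc_conf(const xxx_loc_conf_t *parent, xxx_loc_conf_t *child)
{
    conf_merge_value(child->enable, parent->enable, 0);   /* default: off */
}
```

With xxx on in main and nothing in the server or the first location, the server's loc conf inherits 1 and so does the location. A second location that explicitly says xxx off keeps 0.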

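2.7 Finding a module's configuration for a request

Finally, let's look at run time: a request must find its srv and loc configurations before it can be processed,
i.e. how the ngx_http_get_module_*_conf macros work.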

#define ngx_http_get_module_main_conf(r, module)                             \
    (r)->main_conf[module.ctx_index]
#define ngx_http_get_module_srv_conf(r, module)  (r)->srv_conf[module.ctx_index]
#define ngx_http_get_module_loc_conf(r, module)  (r)->loc_conf[module.ctx_index]

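So the question is how r's main_conf/srv_conf/loc_conf get assigned.

I had assumed finding main_conf would be the easiest, but it turned out to be somewhat involved.
In ngx_http_init_connection:

/* the default server configuration for the address:port */
hc->conf_ctx = hc->addr_conf->default_server->ctx;

How default_server->ctx is assigned is still unclear to me. But whichever server it is, main_conf is the same; only srv_conf differs.

Then ngx_http_create_request uses the ctx found above to look up main_conf.
The other two are easier to see: ngx_http_find_virtual_server finds the ngx_http_core_srv_conf_t for us,
and ngx_http_core_find_location finds the matching location, i.e. assigns r->loc_conf. With that, the configurations of all other modules for this location can be found.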

That concludes the analysis; here is a big overview diagram.

2017-05-05

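Issuing certificates with openssl

1. Introduction

To set up an HTTPS test environment without using my real certificates, I wanted to be my own CA, acting as the root and issuing certificates. The articles I found online all create something like a demoCA directory in the current directory, but the openssl ca command kept going to the default directory /etc/pki, so I gave up on that and simply used the system directory.

Let's look at the structure of this directory first: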

    [root@cq01-bce-48-29-31.cq01.baidu.com httpsec]# tree /etc/pki/CA
        /etc/pki/CA
        |-- certs
        |-- crl
        |-- newcerts
        `-- private
    4 directories, 0 files

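2. Steps

2.1. Generating the private key

Go to the CA directory and generate our private key:

    cd /etc/pki/CA
    openssl genrsa -out private/cakey.pem 2048

You may wonder where the public key went. In fact, the generated private key file contains not only the private key but also the public components (the modulus n and the public exponent e) from which the public key is derived, as the two ASN.1 structures show: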

    RSAPrivateKey ::= SEQUENCE {
        version Version,
        modulus INTEGER, -- n
        publicExponent INTEGER, -- e
        privateExponent INTEGER, -- d
        prime1 INTEGER, -- p
        prime2 INTEGER, -- q
        exponent1 INTEGER, -- d mod (p-1)
        exponent2 INTEGER, -- d mod (q-1)
        coefficient INTEGER, -- (inverse of q) mod p
        otherPrimeInfos OtherPrimeInfos OPTIONAL
    }

    RSAPublicKey ::= SEQUENCE {
        modulus INTEGER, -- n
        publicExponent INTEGER -- e
    }
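To see how the extra fields in RSAPrivateKey relate to each other, here is a toy sketch using the textbook key p = 61, q = 53. That key is far too small for real use, and the numbers are for illustration only. The code encrypts with the public pair (n, e). It then decrypts in two ways: directly with d, and via the CRT fields exponent1 (d mod (p-1)), exponent2 (d mod (q-1)) and coefficient (q^-1 mod p). toy_rsa_key, modpow and crt_decrypt are our own names.

```c
#include <stdint.h>

/* Mirrors the RSAPrivateKey fields; values fit easily in 64 bits. */
typedef struct {
    uint64_t n, e, d, p, q, dp, dq, qinv;
} toy_rsa_key;

/* Square-and-multiply modular exponentiation (mod is small here). */
static uint64_t modpow(uint64_t base, uint64_t exp, uint64_t mod)
{
    uint64_t result = 1;
    base %= mod;
    while (exp > 0) {
        if (exp & 1) {
            result = result * base % mod;
        }
        base = base * base % mod;
        exp >>= 1;
    }
    return result;
}

/* Decrypt with exponent1, exponent2 and coefficient (Garner's CRT step). */
static uint64_t crt_decrypt(const toy_rsa_key *k, uint64_t c)
{
    uint64_t m1 = modpow(c, k->dp, k->p);
    uint64_t m2 = modpow(c, k->dq, k->q);
    /* h = qinv * (m1 - m2) mod p, kept non-negative */
    uint64_t h = k->qinv * ((m1 + k->p - m2 % k->p) % k->p) % k->p;
    return m2 + h * k->q;
}

static const toy_rsa_key toy_key = {
    .n = 3233, .e = 17, .d = 2753, .p = 61, .q = 53,
    .dp = 2753 % 60, .dq = 2753 % 52, .qinv = 38
};
```

The public key is just the (n, e) pair extracted from this structure, which is why no separate public key file is generated.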

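2.2. Generating the root certificate

There are two ways to generate the root certificate: create a self-signed root certificate directly, or first create a certificate signing request (CSR) for the root and then sign it with its own key.

Method 1

    openssl req -new -days 3650 -x509 -key ./private/cakey.pem -out  cacert.pem

Method 2

    openssl req -new -key ./private/cakey.pem -out  rootca.csr
    openssl req -x509 -days 3650 -key ./private/cakey.pem -in rootca.csr -out cacert.pem

We can inspect the certificate's contents:

     openssl x509 -in cacert.pem -noout -text

2.3. Issuing certificates to others

Once you are a CA, you can issue certificates to others: the client generates a certificate request and hands it to the CA, which then produces the certificate.

Run the following on the client machine: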

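    # 1. generate a private key
    openssl genrsa -out client1_key.pem 2048
    # 2. generate a certificate signing request
    openssl  req -new -key  client1_key.pem  -out client1.csr

The last and key step is signing the certificate, which expresses the CA's endorsement of the applicant.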

    openssl ca -in client1.csr -out client1.pem

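You may be prompted about the index.txt and serial files; just create them as requested, or initialize them up front. Note that serial must contain a hexadecimal number:

    touch /etc/pki/CA/index.txt
    echo 01 > /etc/pki/CA/serial

Done!

One thing to watch out for: the Country, State and Organization (company) fields in the root CA's request must match those in the client's request, otherwise the certificate cannot be issued (this is the default policy in openssl.cnf). Try it yourself.

One more thing: exporting the public key from a certificate:

openssl x509 -inform PEM -in client1.pem -outform PEM -pubkey -noout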

2017-05-04

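Kernel implementation of socket programming

1. Some global structures

inetsw_array is a globally defined table of callbacks: different protocols get different callbacks. Here is a quick look at the ops of the TCP protocol: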


    struct proto tcp_prot = {
    .name            = "TCP",
    .owner            = THIS_MODULE,
    .close            = tcp_close,
    .connect        = tcp_v4_connect,
    .disconnect        = tcp_disconnect,
    .accept            = inet_csk_accept,
    .ioctl            = tcp_ioctl,
    .init            = tcp_v4_init_sock,
    .destroy        = tcp_v4_destroy_sock,
    .shutdown        = tcp_shutdown,
    /* ... remaining fields omitted ... */
    };

And the ops of a TCP socket:

    const struct proto_ops inet_stream_ops = {
        .family           = PF_INET,
        .owner           = THIS_MODULE,
        .release       = inet_release,
        .bind           = inet_bind,
        .connect       = inet_stream_connect,
        .socketpair       = sock_no_socketpair,
        .accept           = inet_accept,
        .getname       = inet_getname,
        .poll           = tcp_poll,
        .ioctl           = inet_ioctl,
        .listen           = inet_listen,
        .shutdown       = inet_shutdown,
        .setsockopt       = sock_common_setsockopt,
        .getsockopt       = sock_common_getsockopt,
        .sendmsg       = inet_sendmsg,
        .recvmsg       = inet_recvmsg,
        .mmap           = sock_no_mmap,
        .sendpage       = inet_sendpage,
        .splice_read       = tcp_splice_read,
    #ifdef CONFIG_COMPAT
        .compat_setsockopt = compat_sock_common_setsockopt,
        .compat_getsockopt = compat_sock_common_getsockopt,
        .compat_ioctl       = inet_compat_ioctl,
    #endif
    };

Listening actually starts in inet_csk_listen_start.

tcp_rcv_state_process
tcp_v4_conn_request

inet_csk_reqsk_queue_hash_add
    inet_csk_reqsk_queue_added
        reqsk_queue_added

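tcp_sock holds TCP state such as the seq and ack numbers. The structures are layered: tcp_sock embeds inet_connection_sock, which embeds inet_sock, which in turn embeds sock:

sock
inet_sock

The entry point of accept is
inet_csk_accept
It takes connections from this queue: struct request_sock_queue *queue = &icsk->icsk_accept_queue;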

2017-05-04

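All about reuseport

With reuseport enabled, nginx reportedly does much better in benchmarks. Why does nginx gain so much performance with reuseport, and what should you watch out for when using it?

1. Summary

Support for reuseport was added in nginx 1.9.1, and the official blog devoted a whole article to its benefits, mainly the benchmark improvements.

For details, see: https://www.nginx.com/blog/socket-sharding-nginx-release-1-9-1/

Here we describe how nginx uses the reuseport feature to gain performance.

2. How reuseport works

Before Linux 3.9, to support a multi-process model, programs such as haproxy and nginx all took the fork approach: the parent process listens on an IP+port,
then forks N child processes, which naturally inherit the parent's listening socket and can call accept on it.

But because the children are forked, the kernel still has only one socket. Multiple processes calling accept on it compete with one another, which is why nginx has a switch like accept_mutex.

With reuseport enabled, each listening address gets multiple sockets, specifically one per worker. Each worker's listening socket is thus independent, every worker takes care of its own connections, and the contention between processes is avoided.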
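Here is a minimal Linux sketch (assuming kernel 3.9 or later) of what "one socket per worker" means at the syscall level. Each socket sets SO_REUSEPORT before bind(), so several sockets can listen on the same address and port. open_reuseport_listener is a hypothetical helper bound to 127.0.0.1 for illustration, not nginx code.

```c
#define _GNU_SOURCE
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Open a TCP listener on 127.0.0.1:port with SO_REUSEPORT set before bind().
 * port 0 lets the kernel pick one; the actual port is stored in *bound_port.
 * Returns the listening fd, or -1 on error. */
static int open_reuseport_listener(uint16_t port, uint16_t *bound_port)
{
    int one = 1;
    struct sockaddr_in sa;
    socklen_t len = sizeof(sa);
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0) {
        return -1;
    }
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one)) < 0) {
        goto fail;
    }

    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_port = htons(port);
    sa.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (bind(fd, (struct sockaddr *) &sa, sizeof(sa)) < 0 || listen(fd, 511) < 0) {
        goto fail;
    }

    if (getsockname(fd, (struct sockaddr *) &sa, &len) < 0) {
        goto fail;
    }
    *bound_port = ntohs(sa.sin_port);
    return fd;

fail:
    close(fd);
    return -1;
}
```

If you call it a second time with the port returned by the first call, you get two independent listening sockets on the same port, and the kernel distributes new connections between them. Without SO_REUSEPORT on both sockets, the second bind() would fail with EADDRINUSE.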

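3. How nginx uses reuseport

Normally reuseport is used by starting several processes that all bind the same IP and port, and then the feature can be exploited fully.
nginx, however, still uses the fork model. So how does it make full use of reuseport? Let's look at the code.
ngx_clone_listening contains the following: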

for (n = 1; n < ccf->worker_processes; n++) {    

    /* create a socket for each worker process */

    ls = ngx_array_push(&cf->cycle->listening);
    if (ls == NULL) {
        return NGX_ERROR;
    }

    *ls = ols;
    ls->worker = n;
}

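(Note that the loop starts at 1, because the original listener itself serves as worker 0.)
In other words, after the configuration is parsed, the listener is cloned once per worker and all the copies go into the same array. So when does listening actually start?

ngx_init_cycle
-> ngx_open_listening_sockets
-> bind
-> listen

Because reuseport is set, the listeners in the array that share the same IP and port can all be created successfully.
For example, with 8 workers, 8 listening sockets are created.

The remaining question is how each listening socket gets tied to one worker process.

For that, look at ngx_event_process_init, which a worker process calls when it initializes the event module: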

#if (NGX_HAVE_REUSEPORT)
       if (ls[i].reuseport && ls[i].worker != ngx_worker) {
           continue;
       }
#endif

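As you can see, each worker adds to epoll only the reuseport listening socket that belongs to it (listeners without reuseport are still added by every worker).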
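The filter above can be modelled on its own. In this sketch, listener_t is a hypothetical trimmed-down ngx_listening_t with only the fields the check reads, and listeners_for_worker is our own helper. A worker keeps every non-reuseport listener (these are shared by all workers) and only its own reuseport clone.

```c
#include <stddef.h>

/* Hypothetical, trimmed-down listener record. */
typedef struct {
    int      fd;
    unsigned reuseport;   /* listener was opened with reuseport */
    unsigned worker;      /* worker number the clone belongs to */
} listener_t;

/* Collect into `out` the fds worker `me` should register with epoll.
 * Returns how many were stored; `out` must hold at least n entries. */
static size_t listeners_for_worker(const listener_t *ls, size_t n,
                                   unsigned me, int *out)
{
    size_t k = 0;
    for (size_t i = 0; i < n; i++) {
        if (ls[i].reuseport && ls[i].worker != me) {
            continue;              /* another worker's socket */
        }
        out[k++] = ls[i].fd;
    }
    return k;
}
```

With three reuseport clones of port 80 (workers 0, 1, 2) and one ordinary listener, worker 1 ends up with its own clone plus the ordinary listener.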

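4. Problems with reuseport in nginx

First, the symptom:
upgrading nginx from reuseport to non-reuseport, or from more workers to fewer, causes a large performance drop.
To explain this we need to look at how an upgrade works with reuseport, which is quite a trick.

During an upgrade, i.e. on -USR2, the old master starts the new master and passes the descriptors of all listening sockets to the new process in an environment variable (named NGINX). With reuseport, taking port 80 as an example with 4 workers, the environment variable holds 4 descriptors, in the format
fd1;fd2;fd3;fd4;
After starting, the new master reads these 4 descriptors back (note that they are still valid in the new process), then calls various syscalls to obtain information about them:

ls[i].sockaddr (via getsockname())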
ls[i].addr_text_max_len
ls[i].addr_text
ls[i].backlog
ls[i].rcvbuf (via getsockopt())
ls[i].sndbuf (via getsockopt())
ls[i].accept_filter
ls[i].deferred_accept
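Before those syscalls, the descriptor list has to be parsed out of the environment variable. As far as we can tell, nginx does this in ngx_add_inherited_sockets and accepts either ';' or ':' as a separator. The sketch below is our own simplified parser (parse_inherited_fds is a hypothetical name), not that function.

```c
#include <stdlib.h>

/* Parse an inherited-socket list such as "6;7;8;9;" into fds.
 * Returns the number of fds stored, or -1 on a malformed entry or overflow. */
static int parse_inherited_fds(const char *s, int *fds, int max)
{
    int n = 0;

    while (*s != '\0') {
        char *end;
        long v = strtol(s, &end, 10);

        if (end == s || v < 0 || (*end != ';' && *end != ':' && *end != '\0')) {
            return -1;             /* not a number, or bad separator */
        }
        if (n == max) {
            return -1;             /* more fds than the caller expected */
        }
        fds[n++] = (int) v;
        s = (*end == '\0') ? end : end + 1;
    }
    return n;
}
```

In real use the input would come from getenv("NGINX"); each parsed fd then gets the getsockname()/getsockopt() treatment listed above.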

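This information is stored in old_cycle. Then the configuration file is loaded, and it still listens on port 80, so the listening array of the new cycle gets an ngx_listening_t. ngx_http_optimize_servers indirectly calls ngx_clone_listening to clone one listener per worker. At this point listen() has not been called yet, so ls[i].fd is empty. The new listeners will not go on to call listen(), because the environment variable has already passed the old descriptors over and they can simply be reused. In fact, calling listen() again without reusing them would cause problems, because the old sockets have queues in the kernel that nobody would accept from.
So where are the new ngx_listening_t entries assigned? Near the end of ngx_init_cycle: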

for (n = 0; n < cycle->listening.nelts; n++) {

    for (i = 0; i < old_cycle->listening.nelts; i++) {
        if (ls[i].ignore) {
            continue;
        }

        if (ls[i].remain) {
            continue;
        }

        if (ls[i].type != nls[n].type) {
            continue;
        }

        if (ngx_cmp_sockaddr(nls[n].sockaddr, nls[n].socklen,
                             ls[i].sockaddr, ls[i].socklen, 1)
            == NGX_OK)
        {
            nls[n].fd = ls[i].fd;
            nls[n].previous = &ls[i];
            ls[i].remain = 1;

            if (ls[i].backlog != nls[n].backlog) {
                nls[n].listen = 1;
            }
...................

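This compares the new cycle with the old one. If the listen address is the same, the old descriptor is reused, and the reused entry gets remain set to 1 so that the next listener with the same address won't reuse it again. For example, say the old cycle opened 4 listening sockets for ip:80 because of reuseport, and the new cycle also creates 4 listening structures. The double loop above then assigns the 4 descriptors one by one to the fd of the new ngx_listening_t entries.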
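The matching loop reduces to the following sketch. lsock_t is a hypothetical stand-in for ngx_listening_t in which the address is a string instead of a sockaddr, and inherit_fds is our own name. The real loop also compares socket types and skips ignored entries.

```c
#include <stddef.h>
#include <string.h>

typedef struct {
    char     addr[32];   /* stands in for sockaddr + socklen */
    int      fd;         /* -1 until a socket is assigned */
    unsigned remain;     /* old listener already handed over */
} lsock_t;

/* For each new listener, take the first old listener with the same address
 * that has not been handed over yet, and mark it so it is used only once. */
static void inherit_fds(lsock_t *nls, size_t nn, lsock_t *ols, size_t on)
{
    for (size_t n = 0; n < nn; n++) {
        for (size_t i = 0; i < on; i++) {
            if (ols[i].remain) {
                continue;
            }
            if (strcmp(nls[n].addr, ols[i].addr) == 0) {
                nls[n].fd = ols[i].fd;
                ols[i].remain = 1;
                break;
            }
        }
    }
}
```

Any old listener that still has remain == 0 after this loop was not claimed by the new configuration.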

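The name remain is really poorly chosen; I think copied or inherited would both be better.

5. Conclusion

reuseport brings a big performance boost to nginx. During an upgrade, however, the old master exits late, and performance drops sharply until it exits. That is quite a contrast with nginx's usual on-the-fly upgrade style.
Perhaps I've misunderstood something; if you know better, feel free to mail me: qzzhou$126.com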

2017-01-20

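RSA encryption

The program below encrypts a file with an RSA public key in 100-byte PKCS#1 chunks and base64-encodes the result. It then base64-decodes and decrypts it 1000 times with the private key.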


#include<stdio.h>
#include<stdlib.h>
#include<string.h>
#include<openssl/rsa.h>
#include<openssl/pem.h>
#include<openssl/err.h>
#include<strings.h>


int base64_encode(char *in_str, int in_len, char *out_str)
{
    BIO *b64, *bio;
    BUF_MEM *bptr = NULL;
    size_t size = 0;

    if (in_str == NULL || out_str == NULL)
        return -1;

    b64 = BIO_new(BIO_f_base64());
    bio = BIO_new(BIO_s_mem());
    bio = BIO_push(b64, bio);

    BIO_write(bio, in_str, in_len);
    BIO_flush(bio);

    BIO_get_mem_ptr(bio, &bptr);
    memcpy(out_str, bptr->data, bptr->length);
    out_str[bptr->length] = '\0';
    size = bptr->length;

    BIO_free_all(bio);
    return size;
}

int base64_decode(char *in_str, int in_len, char *out_str)
{
    BIO *b64, *bio;
    BUF_MEM *bptr = NULL;
    int counts;
    int size = 0;

    if (in_str == NULL || out_str == NULL)
        return -1;

    b64 = BIO_new(BIO_f_base64());
    //BIO_set_flags(b64, BIO_FLAGS_BASE64_NO_NL);

    bio = BIO_new_mem_buf(in_str, in_len);
    bio = BIO_push(b64, bio);

    size = BIO_read(bio, out_str, in_len);
    if (size < 0)
        size = 0;
    out_str[size] = '\0';

    BIO_free_all(bio);
    return size;
}


int main(int argc, char** argv)
{

    char data[3048];
    char out[2048];
    /* sized for a 2048-bit key: up to 31 chunks of 100 bytes -> 31 * 256 bytes of ciphertext */
    char data_base64_pre[8192];
    char data_base64[12288];
    char data_unbase64[12288];

    char *pub_key_path = argv[1];
    char *priv_key_path = argv[2];
    char *data_path = argv[3];
    RSA *pub_rsa;
    RSA *priv_rsa;
    FILE *file;
    int flen,rsa_len;
    int enc_len = 0;
    int dec_len = 0;
    int data_len = 0;
    int data_unbase64_len = 0;
    int data_base64_len = 0;
    int data_base64_pre_len = 0;

    char* enc_buf;
    char* dec_buf;

    BIO* bio_data;

    BIO* bio;
    BIO* bio_enc;

    BIO* bio_64;
    BUF_MEM *bptr;

    if (argc != 4) {
        printf("\n usage:\n");
        printf("\t %s pub_key_file priv_key_file data_file\n", argv[0]);
        exit(0);
    }

    if((file=fopen(pub_key_path,"r"))==NULL){
        perror("open public key file error");
        return 1;
    }
    if((pub_rsa=PEM_read_RSA_PUBKEY(file,NULL,NULL,NULL))==NULL){
        ERR_print_errors_fp(stdout);
        return 1;
    }
    fclose(file);

    if((file=fopen(priv_key_path,"r"))==NULL){
        perror("open private key file error");
        return 1;
    }
    if((priv_rsa=PEM_read_RSAPrivateKey(file,NULL,NULL,NULL))==NULL){
        ERR_print_errors_fp(stdout);
        return 1;
    }
    fclose(file);

    bzero(data, sizeof(data));
    if((file=fopen(data_path,"r"))==NULL){
        perror("open data file error");
        return 1;
    }
    data_len = fread(data, 1, sizeof(data), file);
    printf("read data length:%d\n", data_len);
    fclose(file);

        bio = BIO_new(BIO_s_mem());
        BIO_write(bio, data, data_len);

        bio_enc = BIO_new(BIO_s_mem());

        rsa_len = RSA_size(pub_rsa);
        //printf("pub rsa size:%d\n", rsa_len);
        enc_buf = malloc(rsa_len);
        while(1) {
            char data[100];
            int data_len = BIO_read(bio, data, sizeof(data));
            // printf("read bytes:%d\n", data_len);
            if (data_len <= 0) {
                break;
            }
            bzero(enc_buf, rsa_len);
            enc_len = RSA_public_encrypt(data_len, (unsigned char *) data,
                                         (unsigned char *) enc_buf, pub_rsa,
                                         RSA_PKCS1_PADDING);
            //printf("encoded length:%d\n", enc_len);
            if (enc_len < 0) {
                ERR_print_errors_fp(stdout);
                break;
            }
            BIO_write(bio_enc, enc_buf, enc_len);
        }
        BIO_free(bio);
        BIO_flush(bio_enc);

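        // all the data has been encrypted
        BIO_get_mem_ptr(bio_enc, &bptr);
        if (bptr->length > sizeof(data_base64_pre)) {
            fprintf(stderr, "ciphertext too large\n");
            return 1;
        }
        memcpy(data_base64_pre, bptr->data, bptr->length);
        data_base64_pre_len = bptr->length;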

        BIO_free(bio_enc);

        //printf("length after encoded:%d\n", data_len);
        //now base64 encode
        data_base64_len = base64_encode(data_base64_pre, data_base64_pre_len, data_base64);
        //printf("data length after base64 encode:%d\n", data_base64_len);
        // printf("data content after base64 encode:\n%s\n", data_base64);
    int i = 0;
    for(i=0;i<1000;i++)
    {
        // now decrypt
        data_unbase64_len = base64_decode(data_base64, data_base64_len, data_unbase64);
        //printf("data length after base64 decode:%d\n", data_base64_len);
        //printf("data content after base64 decode:%s\n", data_unbase64);

        //printf("base64 decode length:%d\n", data_unbase64_len);
        bio_64 = BIO_new(BIO_s_mem());
        BIO_write(bio_64, data_unbase64, data_unbase64_len);

        dec_buf = malloc(rsa_len);
        while(1) {
            dec_len = BIO_read(bio_64, dec_buf, rsa_len);
            if(dec_len <= 0) {
                break;
            }
            bzero(out, sizeof(out));
            dec_len = RSA_private_decrypt(dec_len, (unsigned char *) dec_buf,
                                          (unsigned char *) out, priv_rsa,
                                          RSA_PKCS1_PADDING);
            //printf("%s", out);
        }
        BIO_free(bio_64);
        free(dec_buf);
    }

    free(enc_buf);
    RSA_free(pub_rsa);
    RSA_free(priv_rsa);
    return 0;
}

2016-10-09

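nginx process architecture

1. Overview

By default nginx uses a multi-process architecture, which differs a lot from haproxy. haproxy's author recommends running a single process, believing that single-process performance is enough for most cases and that multiple processes bring complicated management-plane problems, so multi-process is not favored there.
nginx, on the other hand, uses one master plus multiple worker processes; it is designed for multiple processes from the start.
Many people reading the nginx code rush to the data plane: what I/O model it uses, how it parses HTTP and sends and receives data so quickly. But when it comes to actually running nginx, management-plane problems are far more serious than data-plane ones. For example, if we use nginx in a cloud environment to provide load balancing, CDN or WAF for users, we inevitably have to consider the following:

How do we scale out? For example, how many processes per machine? Does adding machines solve performance problems?
With m machines and n nginx processes per machine, how do we manage these m*n processes, e.g. to load a new configuration, or to restart processes stuck in an infinite loop or hung?
How do we achieve truly zero downtime when upgrading or restarting?
How do we collect statistics across an m*n nginx cluster, e.g. what are the top 10 accessed domains?

The master process can be seen as the management plane:

loads/updates the configuration file
manages the creation and restart of all worker processes

Next, let's look at how nginx manages worker processes and configuration updates.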

2. Worker process management

Worker processes are forked from the master process, and nginx offers several spawn modes:

NGX_PROCESS_NORESPAWN
NGX_PROCESS_JUST_SPAWN
NGX_PROCESS_RESPAWN
NGX_PROCESS_JUST_RESPAWN
NGX_PROCESS_DETACHED

Let's go through them one by one.

2.1.NGX_PROCESS_RESPAWN

This is the most common case: workers are forked with this flag, and when a worker exits unexpectedly, the master respawns it.

static ngx_uint_t
ngx_reap_children(ngx_cycle_t *cycle)
{
    //......
    if (ngx_processes[i].exited) {
        //......
        if (ngx_processes[i].respawn
                && !ngx_processes[i].exiting
                && !ngx_terminate
                && !ngx_quit)
        {
            if (ngx_spawn_process(cycle, ngx_processes[i].proc,
            //.....
        }
    }
}

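So you can consider that whenever the master starts workers for the first time (e.g. on a fresh start, or after a binary upgrade), it uses this flag.

2.2.NGX_PROCESS_JUST_RESPAWN

"Just" means "just now": the process was just spawned. It is used when the configuration is reloaded, which goes through the following steps:
1. the master loads the new configuration file
2. it forks new worker processes
3. it sends the QUIT signal to the workers still using the old configuration

When forking in step 2, the NGX_PROCESS_JUST_RESPAWN flag is added so that step 3 can tell the old processes from the new ones.

2.3.NGX_PROCESS_JUST_SPAWN

Similar to the previous one; it is used for the cache manager, which I'm not fond of.

Note that the three types above are actually converted into two flags, respawn and just:
just: "I was just created, leave me alone and only touch the old ones"; used to tell new from old
respawn: this process is managed by the master and is restarted automatically when it dies
SPAWN has no "re" in front, meaning fork it and forget it, so for JUST_SPAWN only the just flag is meaningful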
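The mapping from spawn mode to flags, and the way the just flag protects new workers during a reload, can be sketched as follows. The enum, struct and function names are our own. In nginx, the equivalent switch is in ngx_spawn_process and the skip logic is in ngx_signal_worker_processes.

```c
/* Hypothetical mirror of the five NGX_PROCESS_* spawn modes. */
typedef enum {
    MODE_NORESPAWN,
    MODE_JUST_SPAWN,
    MODE_RESPAWN,
    MODE_JUST_RESPAWN,
    MODE_DETACHED
} spawn_mode_t;

typedef struct {
    unsigned respawn:1;      /* restart automatically when it dies */
    unsigned just_spawn:1;   /* created by this reload: leave it alone once */
    unsigned detached:1;     /* no management relationship with the master */
} proc_flags_t;

static proc_flags_t flags_for_mode(spawn_mode_t mode)
{
    proc_flags_t f = { 0, 0, 0 };
    switch (mode) {
    case MODE_NORESPAWN:                                       break;
    case MODE_JUST_SPAWN:    f.just_spawn = 1;                 break;
    case MODE_RESPAWN:       f.respawn = 1;                    break;
    case MODE_JUST_RESPAWN:  f.respawn = 1; f.just_spawn = 1;  break;
    case MODE_DETACHED:      f.detached = 1;                   break;
    }
    return f;
}

/* Decide whether a process gets the shutdown signal on reload: detached
 * processes are skipped, and just-spawned ones are skipped once (the flag
 * is cleared so the next reload treats them as old). */
static int should_signal(proc_flags_t *f)
{
    if (f->detached) {
        return 0;
    }
    if (f->just_spawn) {
        f->just_spawn = 0;
        return 0;
    }
    return 1;
}
```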

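2.4.NGX_PROCESS_DETACHED

This means the forked process has no management relationship with its parent. For example, when the nginx master is upgraded (say the old version has a bug), the new master is forked from the old master with this flag, and after the fork it has nothing to do with its parent.

2.5.NGX_PROCESS_NORESPAWN

Used by the cache loader: on first start, NGX_PROCESS_NORESPAWN is used to start a process that runs ngx_cache_manager_process_cycle. Note the difference from DETACHED above: in nginx, parent and child processes normally communicate through channels (pipes), and only DETACHED mode has none. NORESPAWN keeps the channel to the parent.

When the configuration is reloaded, however, NGX_PROCESS_JUST_SPAWN is still used to tell the new from the old.

3. How the configuration is reloaded

After you edit the configuration file, the following steps make it take effect:

Send the HUP signal to the master process

On receiving the signal, the master sets

ngx_reconfigure = 1;

On the next iteration of its loop it checks ngx_reconfigure and calls ngx_init_cycle to parse the configuration again and build a new cycle; a cycle can be thought of as the lifetime of one configuration.
ngx_init_cycle binds and unbinds listeners, i.e. merges the old listeners with the new ones, and of course merges the rest of the configuration as well.

Fork new worker processes

The worker processes can of course access the cycle object created above.

Send NGX_SHUTDOWN_SIGNAL to all old workers

When an old worker receives it, it closes its listening sockets, waits for all its connections to finish, and then exits.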
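The first two steps follow the classic async-signal-safe pattern: the handler only sets a flag, and the main loop acts on it. Below is a minimal POSIX sketch. Our own names (reconfigure, install_hup_handler, take_reconfigure) stand in for ngx_reconfigure and the check in the master loop. nginx additionally blocks signals and waits in sigsuspend(), which is left out here.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Set by the signal handler, consumed by the main loop (cf. ngx_reconfigure). */
static volatile sig_atomic_t reconfigure = 0;

static void on_hup(int signo)
{
    (void) signo;
    reconfigure = 1;   /* only set a flag: keeps the handler async-signal-safe */
}

static int install_hup_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sigemptyset(&sa.sa_mask);
    sa.sa_handler = on_hup;
    return sigaction(SIGHUP, &sa, NULL);
}

/* Called once per master-loop iteration: returns 1 (and clears the flag)
 * if a reload was requested. The caller would then re-parse the config,
 * fork new workers and signal the old ones. */
static int take_reconfigure(void)
{
    if (!reconfigure) {
        return 0;
    }
    reconfigure = 0;
    return 1;
}
```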

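4. Binary upgrades

Code inevitably has bugs; bugs have to be fixed, and a fix only takes effect after an upgrade.

Send the USR2 signal to the master, and ngx_change_binary is set to 1.
Then, in the master process loop (ngx_master_process_cycle), the master forks a process that executes the new binary (ngx_execute_proc),
and ngx_new_binary is set to the pid of the new master.
Once started, it is a brand-new master and automatically starts new worker processes. Note that the old and new masters listen on the same listening sockets, because the new master is forked and then calls execve, so it inherits them. nginx's merging of listening sockets is its killer feature.

Both sets of master and worker processes are now running. Send the WINCH signal to the old master, which tells its workers to shut down gracefully.
That leaves the old master + the new master + the new workers.
Why keep the old master around? In case the new binary is broken. If it is:

send HUP to the old master, and the old workers are started again
send TERM to the new master, and the newly started workers are killed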


2016-08-21

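On code optimization

How can we make our C code more efficient?

1. Keep data that is read and data that is written apart as much as possible

The CPU loads memory into the cache in units of cache lines, e.g. 32 or 64 bytes.
If reads and writes are interleaved within the same line, cache lines are easily invalidated over and over.

2. Are local variables good or bad, and is it fine if they are large?

Everything placed on the function's stack has to be brought into the cache.

3. Shorten code paths and check for unnecessary calls

4. Put functions that are called frequently or together next to each other, so they are loaded into the CPU cache in one go

5. Align data structures to cache lines

For a 64-byte cache line, make the start address satisfy addr % 64 == 0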
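Points 1 and 5 can be combined: give each independently written field its own cache line, so that writes from one thread do not invalidate a line another thread is using (false sharing). Below is a C11 sketch that assumes a 64-byte line, which is typical on x86-64 (check your CPU). padded_counter_t and is_cache_aligned are our own names.

```c
#include <stdalign.h>
#include <stdint.h>

#define CACHE_LINE 64   /* assumption: 64-byte cache lines */

/* One counter per thread; alignas pads each element to a full line,
 * so neighbouring counters never share a cache line. */
typedef struct {
    alignas(CACHE_LINE) uint64_t value;
} padded_counter_t;

/* True if p starts on a cache-line boundary (addr % 64 == 0). */
static int is_cache_aligned(const void *p)
{
    return ((uintptr_t) p % CACHE_LINE) == 0;
}
```

Static and stack objects honour this alignment. For heap allocations, use aligned_alloc(CACHE_LINE, size) with size a multiple of CACHE_LINE rather than plain malloc.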

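6. Don't pass too many parameters, otherwise you run out of argument registers

7. Lazy evaluation: compute only when needed

8. Precompute and reuse results

9. per-CPU variables

10. Branch prediction hints: likely/unlikely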
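likely/unlikely are usually defined on top of the GCC/Clang builtin __builtin_expect, the same way the Linux kernel defines them. Here is a small sketch; sum_bytes is a hypothetical example function whose NULL check is marked as the rare path.

```c
#include <stddef.h>

/* GCC/Clang builtin: tells the compiler which way the branch usually goes. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Sum a buffer; a NULL buffer is treated as a rare error path. */
static long sum_bytes(const unsigned char *buf, size_t len)
{
    long total = 0;

    if (unlikely(buf == NULL)) {
        return -1;
    }
    for (size_t i = 0; i < len; i++) {
        total += buf[i];
    }
    return total;
}
```

The hint only affects code layout (the hot path falls through); it does not change results, so it is worth using only on branches whose bias you have measured.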

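11. A process switch flushes the TLB (the CR3 register is reloaded)

12. How to debug a coredump

1. from the stack trace
2. compare the crash address with the output of nm
3. look for out-of-bounds copies
4. look for illegal address accesses: nonexistent addresses, writes to read-only memory, etc.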

2016-06-30

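Adding an attribute to a neutron resource

In theory the neutron client and neutron server are decoupled: adding an attribute on the server side doesn't affect the client, which simply shows it or not. Sometimes, though, after adding an attribute the client can't show it even with the -c option. This usually means something was missed when adding the attribute on the server side.

Using firewall as an example, the steps to add an attribute are:

Add a column attr_new to the database

This step is simple.

Modify the Firewall model to add the column, in firewall_db.py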

class Firewall(model_base.BASEV2, models_v2.HasId, models_v2.HasTenant):
    """Represents a Firewall resource."""
    __tablename__ = 'firewalls'
    name = sa.Column(sa.String(255))
    description = sa.Column(sa.String(1024))
    shared = sa.Column(sa.Boolean)
    admin_state_up = sa.Column(sa.Boolean)
    status = sa.Column(sa.String(16))
    firewall_policy_id = sa.Column(sa.String(36),
                                   sa.ForeignKey('firewall_policies.id'),
                                   nullable=True)
    creator = sa.Column(sa.String(255))
    attr_new = sa.Column(sa.String(255))   # new attribute

Update the show path, i.e. get_firewalls, which mainly means the _make_firewall_dict function:

def _make_firewall_dict(self, fw, fields=None):
    res = {'id': fw['id'],
           'tenant_id': fw['tenant_id'],
           'name': fw['name'],
           'description': fw['description'],
           'shared': fw['shared'],
           'admin_state_up': fw['admin_state_up'],
           'status': fw['status'],
           'firewall_policy_id': fw['firewall_policy_id'],
           'creator': fw['creator'],
           'attr_new': fw['attr_new']}
    return self._fields(res, fields)

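Don't think you're done yet: the most important part is updating the plugin's RESOURCE_ATTRIBUTE_MAP, through which each plugin/service describes its attribute list to the API. 'is_visible': True is what makes the attribute appear in API responses: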

'firewalls': {
    'id': {'allow_post': False, 'allow_put': False,
           'validate': {'type:uuid': None},
           'is_visible': True,
           'primary_key': True},
    'tenant_id': {'allow_post': True, 'allow_put': False,
                  'required_by_policy': True,
                  'is_visible': True},
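    'name': {'allow_post': True, 'allow_put': True,
             'validate': {'type:string': None},
             'is_visible': True, 'default': ''},
    'attr_new': {'allow_post': True, 'allow_put': True,
                 'validate': {'type:string': None},
                 'is_visible': True, 'default': ''},
    # ... other firewall attributes ...
},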

That's it, all done.

To make the neutron client show the new firewall attribute automatically:


class ListFirewall(neutronv20.ListCommand):
    """List firewalls that belong to a given tenant."""

    resource = 'firewall'
    list_columns = ['id', 'name', 'firewall_policy_id', 'attr_new']
    _formatters = {}
    pagination_support = True
    sorting_support = True
2016-06-27

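How agents are notified after a resource is created

After a subnet is created, the dhcp-agent and others need to be notified (the same goes for subnet_delete, subnet_create, etc.). When is this notification sent? It turns out to happen in the API Controller: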


def delete(self, request, id, **kwargs):
    """Deletes the specified entity."""
    notifier_api.notify(request.context,
                        self._publisher_id,
                        self._resource + '.delete.start',
                        notifier_api.CONF.default_notification_level,
                        {self._resource + '_id': id})
    action = self._plugin_handlers[self.DELETE]

    # Check authz
    parent_id = kwargs.get(self._parent_id_name)
    obj = self._item(request, id, parent_id=parent_id)
    try:
        policy.enforce(request.context,
                       action,
                       obj,
                       resource=id)
    except exceptions.PolicyNotAuthorized as err:
        # To avoid giving away information, pretend that it
        # doesn't exist
        raise webob.exc.HTTPForbidden(explanation=err.msg)

    obj_deleter = getattr(self._plugin, action)
    obj_deleter(request.context, id, **kwargs)
    notifier_method = self._resource + '.delete.end'
    notifier_api.notify(request.context,
                        self._publisher_id,
                        notifier_method,
                        notifier_api.CONF.default_notification_level,
                        {self._resource + '_id': id})
    result = {self._resource: self._view(request.context, obj)}
    self._nova_notifier.send_network_change(action, {}, result)
    self._send_dhcp_notification(request.context,
                                 result,
                                 notifier_method)  # <-- this call notifies the DHCP agent

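The code often contains functions such as
create_XXX_precommit
create_XXX_postcommit
I assumed they notified the agents, but after looking at the subnet versions, they only notify the mechanism drivers; the agents are notified by the notifier above, as the following log shows: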

2016-06-27 12:00:55.725 14997 ERROR neutron.plugins.ml2.managers [req-1df986e6-3df2-4125-b555-e29ab7df384c None] #######3:<stevedore.extension.Extension object at 0x4b85c90>
2016-06-27 12:00:55.736 14997 ERROR neutron.plugins.ml2.managers [req-1df986e6-3df2-4125-b555-e29ab7df384c None] #######4:
2016-06-27 12:00:55.736 14997 ERROR neutron.plugins.ml2.managers [req-1df986e6-3df2-4125-b555-e29ab7df384c None] #######5:(<bound method BareMetalMechanismDriver.delete_subnet_postcommit of <neutron.plugins.ml2.drivers.mech_baremetal.BareMetalMechanismDriver object at 0x3a42610>>,)
2016-06-27 12:00:55.737 14997 ERROR neutron.plugins.ml2.managers [req-1df986e6-3df2-4125-b555-e29ab7df384c None] #######6:
2016-06-27 12:00:55.738 14997 ERROR neutron.plugins.ml2.managers [req-1df986e6-3df2-4125-b555-e29ab7df384c None] #######3:<stevedore.extension.Extension object at 0x492fc50>
2016-06-27 12:00:55.749 14997 ERROR neutron.plugins.ml2.managers [req-1df986e6-3df2-4125-b555-e29ab7df384c None] #######4:
2016-06-27 12:00:55.750 14997 ERROR neutron.plugins.ml2.managers [req-1df986e6-3df2-4125-b555-e29ab7df384c None] #######5:(<bound method OpenvswitchMechanismDriver.delete_subnet_postcommit of <neutron.plugins.ml2.drivers.mech_openvswitch.OpenvswitchMechanismDriver object at 0x3a42d50>>,)
2016-06-27 12:00:55.750 14997 ERROR neutron.plugins.ml2.managers [req-1df986e6-3df2-4125-b555-e29ab7df384c None] #######6:
2016-06-27 12:00:55.751 14997 ERROR neutron.plugins.ml2.managers [req-1df986e6-3df2-4125-b555-e29ab7df384c None] #######3:<stevedore.extension.Extension object at 0x492fc50>
2016-06-27 12:00:55.762 14997 ERROR neutron.plugins.ml2.managers [req-1df986e6-3df2-4125-b555-e29ab7df384c None] #######4:
2016-06-27 12:00:55.763 14997 ERROR neutron.plugins.ml2.managers [req-1df986e6-3df2-4125-b555-e29ab7df384c None] #######5:(<bound method L2populationMechanismDriver.delete_subnet_postcommit of <neutron.plugins.ml2.drivers.l2pop.mech_driver.L2populationMechanismDriver object at 0x3a429d0>>,)
2016-06-27 12:00:55.763 14997 ERROR neutron.plugins.ml2.managers [req-1df986e6-3df2-4125-b555-e29ab7df384c None] #######6:


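Another interesting detail is when a network gets scheduled: on the first create_port.
When the Controller sends the port_create_end notification, it calls DhcpAgentNotifyAPI's notify.
This notify is handled specially rather than naively forwarding the message:
for a port_create_end message it schedules the network first and only then sends the message.
So the earlier network_create_end, subnet_create_end and similar messages are essentially useless.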


# schedule the network first, if needed
schedule_required = method == 'port_create_end'
if schedule_required:
    agents = self._schedule_network(admin_ctx, network, agents)

enabled_agents = self._get_enabled_agents(
    context, network, agents, method, payload)
for agent in enabled_agents:
    self._cast_message(
        context, method, payload, agent.host, agent.topic)


2016-06-24

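Showing trailing whitespace in emacs

When writing Python in emacs, it's nice to have trailing whitespace highlighted. I tried several packages, blank-mode, whitespace and so on, and eventually found that the simple highlight-chars.el does the job.
Homepage: https://www.emacswiki.org/emacs/highlight-chars.el
Add the following to .emacs:
(require 'highlight-chars)
(hc-toggle-highlight-trailing-whitespace)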