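The Beauty of Protobuf

While looking into how SQL and AI can be combined, I came across Ant Financial's SQLFlow and noticed that its source code uses .proto files. I had also run into them earlier in Python AI projects, so here is a brief introduction along with my study notes.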

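Introduction to Protobuf

What is Protobuf?

Protobuf is short for Google's Protocol Buffers, a lightweight and efficient format for serializing structured data. It is platform-neutral, language-neutral, and extensible, and it is used for communication protocols, data storage, and similar purposes.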

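Why use Protobuf?

JSON is often the default choice for storing and exchanging data, but compared with JSON, Protobuf has the following advantages:

  • Higher performance than JSON, with a stricter, schema-defined format
  • Fast encoding and decoding, and small payloads
  • A single shared schema, so there is no need to worry about parse failures caused by things like inconsistent field-name casing
  • Platform-neutral, language-neutral, and extensible
  • Ships with friendly libraries and is simple to use
  • Fast parsing, roughly 20 to 100 times faster than the equivalent XML
  • Very compact serialized data, roughly 1/3 to 1/10 the size of the equivalent XML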

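But it also comes with some drawbacks:

  • It is less convenient to work with
  • Changing a protocol field requires regenerating the code
  • The data is less human-readable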
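
To make the size and readability trade-off concrete, here is a small Python sketch that hand-encodes one string field the way the Protobuf wire format does and compares it with compact JSON. The helper names (encode_varint, encode_string_field) are my own for illustration and not part of any Protobuf library, and the layout assumes a message with a single string field numbered 1.

```python
import json


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a Protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_string_field(field_number: int, text: str) -> bytes:
    """Encode one string field: key (wire type 2), length, UTF-8 payload."""
    payload = text.encode("utf-8")
    key = (field_number << 3) | 2
    return encode_varint(key) + encode_varint(len(payload)) + payload


json_bytes = json.dumps({"name": "World"}, separators=(",", ":")).encode("utf-8")
proto_bytes = encode_string_field(1, "World")

print(len(json_bytes), json_bytes)          # 16 b'{"name":"World"}'
print(len(proto_bytes), proto_bytes.hex())  # 7 0a05576f726c64
```

The Protobuf bytes are less than half the size here, but unlike the JSON they cannot be interpreted without knowing the schema.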

How do you use Protobuf?

Installing on macOS

brew install protobuf protoc-gen-go

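Installing on Linux

Install from source:

git clone https://github.com/protocolbuffers/protobuf.git
cd protobuf

# autogen.sh only exists in older release branches (the output below is from 3.10.0);
# newer releases are built with CMake or Bazel instead
./autogen.sh && ./configure && make && sudo make install
# Refresh the shared library cache so that protoc can find libprotoc
sudo ldconfig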

Verify that the installation succeeded:

protoc --version

>>> libprotoc 3.10.0

Once it is installed, we are free to use it.
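
If you would rather run the same check from a script, for example before a build step, a minimal Python sketch could look like this. The function name protoc_version is hypothetical; it relies only on the protoc --version command shown above and on the standard library.

```python
import shutil
import subprocess
from typing import Optional


def protoc_version() -> Optional[str]:
    """Return the output of `protoc --version`, or None if protoc is not usable."""
    path = shutil.which("protoc")  # look protoc up on PATH
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    if result.returncode != 0:
        return None
    return result.stdout.strip() or None


version = protoc_version()
if version is None:
    print("protoc not found on PATH; check the installation steps above")
else:
    print("found", version)
```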

Using Protobuf in different languages

Running protoc -h shows how it is used:

Usage: protoc [OPTION] PROTO_FILES
Parse PROTO_FILES and generate output based on the options given:
-IPATH, --proto_path=PATH Specify the directory in which to search for
imports. May be specified multiple times;
directories will be searched in order. If not
given, the current working directory is used.
If not found in any of the these directories,
the --descriptor_set_in descriptors will be
checked for required proto file.
--version Show version info and exit.
-h, --help Show this text and exit.
--encode=MESSAGE_TYPE Read a text-format message of the given type
from standard input and write it in binary
to standard output. The message type must
be defined in PROTO_FILES or their imports.
--decode=MESSAGE_TYPE Read a binary message of the given type from
standard input and write it in text format
to standard output. The message type must
be defined in PROTO_FILES or their imports.
--decode_raw Read an arbitrary protocol message from
standard input and write the raw tag/value
pairs in text format to standard output. No
PROTO_FILES should be given when using this
flag.
--descriptor_set_in=FILES Specifies a delimited list of FILES
each containing a FileDescriptorSet (a
protocol buffer defined in descriptor.proto).
The FileDescriptor for each of the PROTO_FILES
provided will be loaded from these
FileDescriptorSets. If a FileDescriptor
appears multiple times, the first occurrence
will be used.
-oFILE,
--descriptor_set_out=FILE Writes a FileDescriptorSet (a protocol buffer,
defined in descriptor.proto) containing all of
the input files to FILE.
--include_imports When using --descriptor_set_out, also include
all dependencies of the input files in the
set, so that the set is self-contained.
--include_source_info When using --descriptor_set_out, do not strip
SourceCodeInfo from the FileDescriptorProto.
This results in vastly larger descriptors that
include information about the original
location of each decl in the source file as
well as surrounding comments.
--dependency_out=FILE Write a dependency output file in the format
expected by make. This writes the transitive
set of input file paths to FILE
--error_format=FORMAT Set the format in which to print errors.
FORMAT may be 'gcc' (the default) or 'msvs'
(Microsoft Visual Studio format).
--print_free_field_numbers Print the free field numbers of the messages
defined in the given proto files. Groups share
the same field number space with the parent
message. Extension ranges are counted as
occupied fields numbers.

--plugin=EXECUTABLE Specifies a plugin executable to use.
Normally, protoc searches the PATH for
plugins, but you may specify additional
executables not in the path using this flag.
Additionally, EXECUTABLE may be of the form
NAME=PATH, in which case the given plugin name
is mapped to the given executable even if
the executable's own name differs.
--cpp_out=OUT_DIR Generate C++ header and source.
--csharp_out=OUT_DIR Generate C# source file.
--java_out=OUT_DIR Generate Java source file.
--js_out=OUT_DIR Generate JavaScript source.
--objc_out=OUT_DIR Generate Objective C header and source.
--php_out=OUT_DIR Generate PHP source file.
--python_out=OUT_DIR Generate Python source file.
--ruby_out=OUT_DIR Generate Ruby source file.
@<filename> Read options and filenames from file. If a
relative file path is specified, the file
will be searched in the working directory.
The --proto_path option will not affect how
this argument file is searched. Content of
the file will be expanded in the position of
@<filename> as in the argument list. Note
that shell expansion is not applied to the
content of the file (i.e., you cannot use
quotes, wildcards, escapes, commands, etc.).
Each line corresponds to a single argument,
even if it contains spaces.
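
To get a feel for what --decode_raw prints, here is a rough Python sketch that walks a serialized message and lists its raw tag/value pairs. It is a simplified illustration, not protoc's implementation: the function names are my own, groups (wire types 3 and 4) are not handled, and length-delimited and fixed-width values are left as raw bytes instead of guessing their type.

```python
from typing import List, Tuple, Union

RawValue = Union[int, bytes]


def read_varint(data: bytes, pos: int) -> Tuple[int, int]:
    """Read a base-128 varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: last byte of the varint
            return result, pos
        shift += 7


def decode_raw(data: bytes) -> List[Tuple[int, int, RawValue]]:
    """Return (field_number, wire_type, raw_value) for each top-level field."""
    fields = []
    pos = 0
    while pos < len(data):
        key, pos = read_varint(data, pos)
        field_number, wire_type = key >> 3, key & 0x07
        value: RawValue
        if wire_type == 0:  # varint
            value, pos = read_varint(data, pos)
        elif wire_type == 1:  # fixed 64-bit
            value, pos = data[pos:pos + 8], pos + 8
        elif wire_type == 2:  # length-delimited: string, bytes, nested message
            length, pos = read_varint(data, pos)
            value, pos = data[pos:pos + length], pos + length
        elif wire_type == 5:  # fixed 32-bit
            value, pos = data[pos:pos + 4], pos + 4
        else:
            raise ValueError("unsupported wire type %d" % wire_type)
        fields.append((field_number, wire_type, value))
    return fields


print(decode_raw(bytes.fromhex("0a05576f726c64")))  # [(1, 2, b'World')]
print(decode_raw(b"\x08\x96\x01"))                  # [(1, 0, 150)]
```

Notice that the output contains only field numbers and wire types, never field names; the names live in the .proto file, which is why --decode needs PROTO_FILES and --decode_raw does not.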

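As the output above shows, protoc supports C++, C#, Java, JavaScript, Objective-C, PHP, Python, Ruby, and other languages (Go is supported through the separate protoc-gen-go plugin). Since the languages I know best are Python, Go, and C++, I will only cover Protobuf in Python and Go here; for other languages, see the official documentation. Without further ado, here is Hello World.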

Writing the proto file

syntax = "proto3";

package protos;


service Greeter {
    rpc SayHello (Request) returns (Response);
}

message Request {
    string name = 1;
}

message Response {
    string message = 1;
}

Python

pip install grpcio-tools
pip install protobuf

python -m grpc_tools.protoc --python_out=. --grpc_python_out=. -I. projectname/protos/helloworld.proto

# Server-side code
# -*- coding: utf-8 -*-
# pylint: disable=C,R,W
'''
# Created on 2019-11-21 08:10:13
# Author: javy@xu
# Email: xujavy@gmail.com
# Description: test_proto.py
'''

import grpc
import time
from concurrent import futures

from projectname.protos import helloworld_pb2, helloworld_pb2_grpc


class Greeter(helloworld_pb2_grpc.GreeterServicer):

    def SayHello(self, request, context):
        # Build the reply from the name carried in the request
        return helloworld_pb2.Response(message='Hello, %s!' % request.name)


def serve():
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
    helloworld_pb2_grpc.add_GreeterServicer_to_server(Greeter(), server)
    server.add_insecure_port('[::]:50051')
    server.start()
    # start() does not block, so keep the main thread alive until Ctrl+C
    try:
        while True:
            time.sleep(60 * 60 * 24)
    except KeyboardInterrupt:
        server.stop(0)


if __name__ == '__main__':
    serve()

# Client-side code
# -*- coding: utf-8 -*-
# pylint: disable=C,R,W
'''
# Created on 2019-11-21 18:06:15
# Author: javy@xu
# Email: xujavy@gmail.com
# Description: test_helloworld_client.py
'''

import grpc

from projectname.protos import helloworld_pb2, helloworld_pb2_grpc


def run():
    # Connect to the RPC server
    channel = grpc.insecure_channel('localhost:50051')
    # Call the RPC service
    stub = helloworld_pb2_grpc.GreeterStub(channel)
    response = stub.SayHello(helloworld_pb2.Request(name='World'))
    print("Greeter client received: " + response.message)


if __name__ == '__main__':
    run()

Go

protoc --go_out=plugins=grpc:. go/goprograms/src/protos/helloworld.proto

This generates a helloworld.pb.go file in the same directory, which can then be used as follows.

// Server-side code
package main

import (
    "context"
    "log"
    "net"

    "google.golang.org/grpc"
    pb "goprograms.org/goprograms/pkg/protos"
)

const (
    port = ":50051"
)

// server is used to implement helloworld.GreeterServer.
type server struct {
    pb.UnimplementedGreeterServer
}

// SayHello implements helloworld.GreeterServer
func (s *server) SayHello(ctx context.Context, in *pb.Request) (*pb.Response, error) {
    return &pb.Response{Message: "Hello " + in.GetName()}, nil
}

func main() {
    lis, err := net.Listen("tcp", port)
    if err != nil {
        // Exit here: carrying on with a nil listener would only fail later in Serve
        log.Fatalf("failed to listen: %v", err)
    }
    s := grpc.NewServer()
    pb.RegisterGreeterServer(s, &server{})
    if err := s.Serve(lis); err != nil {
        log.Fatalf("failed to serve: %v", err)
    }
}

// Client-side code
package main

import (
    "context"
    "log"
    "os"
    "time"

    "google.golang.org/grpc"
    pb "goprograms.org/goprograms/pkg/protos"
)

const (
    address     = "localhost:50051"
    defaultName = "world"
)

func main() {
    // Set up a connection to the server.
    conn, err := grpc.Dial(address, grpc.WithInsecure(), grpc.WithBlock())
    if err != nil {
        log.Fatalf("did not connect: %v", err)
    }
    defer conn.Close()
    c := pb.NewGreeterClient(conn)

    // Contact the server and print out its response.
    name := defaultName
    if len(os.Args) > 1 {
        name = os.Args[1]
    }
    ctx, cancel := context.WithTimeout(context.Background(), time.Second)
    defer cancel()
    r, err := c.SayHello(ctx, &pb.Request{Name: name})
    if err != nil {
        log.Fatalf("could not greet: %v", err)
    }
    log.Printf("Greeting: %s", r.GetMessage())
}

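Introduction to proto syntax

Protobuf has two language versions, proto2 and proto3. At the time of writing, to use proto3 you must declare it explicitly on the first line of the *.proto file.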

helloworld.proto:

syntax = "proto3";

package protos;


service Greeter {
    rpc SayHello (Request) returns (Response);
}

message Request {
    string name = 1;
}

message Response {
    string message = 1;
}

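If you are using proto2, you can either omit the syntax declaration on the first line or specify syntax = "proto2"; explicitly. Both work.
proto3 differs from proto2 in a number of ways. For example, proto3 removed required and, originally, optional (optional has since returned and is enabled by default from protobuf 3.15), which makes the syntax more concise. This post focuses on proto3, so I won't go into proto2 any further.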

import

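We can use an import statement to bring in types declared in other description files:

import "others.proto";

By default, you can only use definitions from .proto files that you import directly. Sometimes, however, a .proto file has to move to a new location. To avoid updating every file that imports it, you can leave a placeholder .proto file at the old location and use the public keyword to forward all imports to the new file, for example: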

// new.proto
// All new definitions go here

// old.proto
// The original proto file that clients import
import public "new.proto";
import "other.proto";

// client.proto
import "old.proto";
// Definitions from old.proto and new.proto can be used here, but not those from other.proto.

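The protocol compiler looks for imported files in the directories given by the -I / --proto_path options on the command line. If the option is not given, it looks in the current working directory.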
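
The lookup order can be sketched in a few lines of Python. This is only an illustration of the "search the directories in order, first match wins" rule described above; resolve_import is a hypothetical helper of mine, the directory names are made up, and the real compiler has extra behaviour (such as the --descriptor_set_in fallback) that is left out.

```python
import tempfile
from pathlib import Path
from typing import Iterable, Optional


def resolve_import(import_name: str, proto_paths: Iterable[str] = (".",)) -> Optional[Path]:
    """Return the first match for import_name, trying proto_paths in order.

    The default of "." mirrors protoc falling back to the current directory.
    """
    for directory in proto_paths:
        candidate = Path(directory) / import_name
        if candidate.is_file():
            return candidate
    return None


# Demo: two directories both contain others.proto; the one listed first wins.
with tempfile.TemporaryDirectory() as root:
    vendor = Path(root, "vendor")
    local = Path(root, "local")
    for directory in (vendor, local):
        directory.mkdir()
        (directory / "others.proto").write_text('syntax = "proto3";\n')

    found = resolve_import("others.proto", [str(local), str(vendor)])
    missing = resolve_import("missing.proto", [str(local), str(vendor)])
    print(found.parent.name)  # local
    print(missing)            # None
```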

package

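Each *.proto file can declare a package, which serves as the namespace for its definitions in the generated code (how exactly it maps depends on the target language).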

syntax = "proto3";

package protos;


service Greeter {
    rpc SayHello (Request) returns (Response);
}

message Request {
    string name = 1;
}

message Response {
    string message = 1;
}

service

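To use your message types with an RPC (Remote Procedure Call) system, you can define an RPC service interface in the .proto file, and the protocol compiler will generate the service interface code for the language you choose. For example, to define an RPC service with one method that takes a Request and returns a Response, put the following in the .proto file: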

service Greeter {
    rpc SayHello (Request) returns (Response);
}

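The generated interface code serves as the contract between client and server: the server must implement every interface method that is defined, and the client sends a request to the server by calling the method of the same name.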
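
With gRPC, that "same name" ends up on the wire as a path built from the package, service, and method names. The following Python sketch shows how the pieces fit together for the Greeter service; full_method_path is a hypothetical helper for illustration, not a function from the grpc package.

```python
def full_method_path(package: str, service: str, method: str) -> str:
    """Build the gRPC method path: /<package>.<Service>/<Method>."""
    # A .proto file without a package statement yields just the service name.
    qualified_service = f"{package}.{service}" if package else service
    return f"/{qualified_service}/{method}"


print(full_method_path("protos", "Greeter", "SayHello"))  # /protos.Greeter/SayHello
print(full_method_path("", "Greeter", "SayHello"))        # /Greeter/SayHello
```

This is also why the package statement matters for RPC: two services that are both called Greeter but live in different packages get different paths and do not collide.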

message

message is used to define a data structure.

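  • Naming: the usual recommendation is CamelCase with an initial capital, e.g. HelloWorld
  • Comments: // single-line comments are supported inside a message (as are /* ... */ block comments)
  • Using repeated: a field marked repeated can be thought of as an array, for example:

    message HelloWorld {
        int64 id = 1;
        string name = 2;
        repeated string skills = 3; // skills can hold multiple string values
    }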
  • Using enum types: an enum represents a fixed set of named values of the same kind, for example:

    syntax = "proto3";

    message HelloWorld {
        int64 id = 1;
        string name = 2;
        enum Skills {
            GOLANG = 0;
            PYTHON = 1;
            CPP = 2;
        }
        Skills skill = 3;
    }
  • Using message types: a message field can be declared with another message type that you have defined, for example:

    syntax = "proto3";

    message HelloWorld {
        int64 id = 1;
        string name = 2;
        Skill skills = 3; // declared with the user-defined Skill type
    }

    message Skill {
        string name = 1;
    }
  • Using map types: a message can declare map fields, for example:

    syntax = "proto3";

    message HelloWorld {
        int64 id = 1;
        string name = 2;
        map<string, Skill> skills = 3;
    }

    message Skill {
        string name = 1;
    }
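
To show what these field kinds turn into on the wire, here is a hand-rolled Python sketch that encodes the repeated, enum, and map variants of HelloWorld from the examples above. All function names are my own and the sample values are made up. It is deliberately simplified: it only handles non-negative integers, and it always writes every field, whereas a real proto3 serializer omits fields that hold their default value.

```python
from typing import Dict, List


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_varint_field(field_number: int, value: int) -> bytes:
    """int64 and enum fields use wire type 0 (varint)."""
    return encode_varint((field_number << 3) | 0) + encode_varint(value)


def encode_bytes_field(field_number: int, payload: bytes) -> bytes:
    """Strings, nested messages and map entries use wire type 2 (length-delimited)."""
    return encode_varint((field_number << 3) | 2) + encode_varint(len(payload)) + payload


def encode_string_field(field_number: int, text: str) -> bytes:
    return encode_bytes_field(field_number, text.encode("utf-8"))


def encode_with_repeated(id_: int, name: str, skills: List[str]) -> bytes:
    """`repeated string skills = 3`: field 3 is simply written once per element."""
    out = encode_varint_field(1, id_) + encode_string_field(2, name)
    for skill in skills:
        out += encode_string_field(3, skill)
    return out


def encode_with_enum(id_: int, name: str, skill: int) -> bytes:
    """`Skills skill = 3`: the enum value is written as its number."""
    return encode_varint_field(1, id_) + encode_string_field(2, name) + encode_varint_field(3, skill)


def encode_with_map(id_: int, name: str, skills: Dict[str, str]) -> bytes:
    """`map<string, Skill> skills = 3`: each item becomes a small entry message
    with the key in field 1 and the value in field 2."""
    out = encode_varint_field(1, id_) + encode_string_field(2, name)
    for key, skill_name in skills.items():
        skill_message = encode_string_field(1, skill_name)  # Skill { name = 1 }
        entry = encode_string_field(1, key) + encode_bytes_field(2, skill_message)
        out += encode_bytes_field(3, entry)
    return out


repeated_bytes = encode_with_repeated(1, "javy", ["go", "python"])
enum_bytes = encode_with_enum(1, "javy", 1)  # 1 is PYTHON in the enum above
map_bytes = encode_with_map(1, "javy", {"lang": "go"})

print(repeated_bytes.hex())
print(enum_bytes.hex())
print(map_bytes.hex())
```

The takeaway is that repeated and map are not special containers on the wire: both are just the same field number appearing several times, which is what makes the format easy to extend.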