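With the growth of RPC frameworks, microservices, cloud computing, and big data, the scale and depth of business systems have grown as well. A single business flow may span multiple modules, services, and containers, and it depends on more and more middleware. An exception at any one node can cause fluctuations or failures, which makes service-quality monitoring and fault diagnosis and localization very complex. This gave rise to a new monitoring model: call-chain tracing systems (APM).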
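Among the many excellent open-source APM products, Skywalking and Pinpoint stand out. Both work through bytecode injection, so they require no changes to application code. The two compare as follows: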
An earlier post covered deploying Pinpoint with plain Docker (docker-compose), which can serve as a reference. This section covers deploying Skywalking on Kubernetes.
1. Helm3
```shell
curl -LO https://get.helm.sh/helm-v3.2.4-linux-amd64.tar.gz
tar -zxf helm-v3.2.4-linux-amd64.tar.gz
cp linux-amd64/helm /usr/local/bin/helm3
```
2. Server side
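The Skywalking backend storage reuses the ES cluster of the EFK logging system. Note that the Skywalking indices get a prefix so they can be told apart from the log indices.
For detailed Elasticsearch cluster deployment, see the post "Kubernetes Logging System EFK".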
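```shell
cd ~/k8s/helm/charts
git clone https://github.com/apache/skywalking-kubernetes.git
cd skywalking-kubernetes/chart
# Fetch the chart dependencies
helm3 dep up skywalking

kubectl create ns skywalking

# Edit the values (see "Helm Values")
vim skywalking/values.yaml

# Install the release and list it
helm3 install skywalking skywalking -n skywalking --values ./skywalking/values.yaml
helm3 -n skywalking list
# Uninstall the release (only when removing Skywalking)
helm3 -n skywalking delete skywalking
# Apply changed values to an existing release (release name, then chart)
helm3 -n skywalking upgrade skywalking skywalking --values ./skywalking/values.yaml
```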
Helm Values
```yaml
oap:
  name: oap
  dynamicConfigEnabled: false
  image:
    repository: apache/skywalking-oap-server
    tag: 8.1.0-es7
    pullPolicy: IfNotPresent
  storageType: elasticsearch7
  ports:
    grpc: 11800
    rest: 12800
  replicas: 2
  service:
    type: ClusterIP
  javaOpts: -Xmx2g -Xms2g
  antiAffinity: "soft"
  nodeAffinity: {}
  nodeSelector: {}
  tolerations: []
  resources: {}
  env:
    SW_NAMESPACE: "skywalking"

ui:
  name: ui
  replicas: 1
  image:
    repository: apache/skywalking-ui
    tag: 8.1.0
    pullPolicy: IfNotPresent
  ingress:
    enabled: true
    annotations: {}
    path: /
    hosts:
      - skywalking.boer.xyz
    tls: []

elasticsearch:
  enabled: false
  config:
    port:
      http: 9200
    host: "elasticsearch-logging.logging.svc"
    user: "elastic"
    password: "<your-es-password>"
```
3. Client side
Build the skywalking-agent image
```shell
cd ~/k8s/apps/skywalking-agent
tar -zxf apache-skywalking-apm-es7-8.1.0.tar.gz
# agent is a directory, so copy it recursively
cp -r apache-skywalking-apm-bin-es7/agent agent
vim Dockerfile
docker build -t registry.boer.xyz/public/skywalking-agent:8.1.0 .
docker push registry.boer.xyz/public/skywalking-agent:8.1.0
```
Dockerfile
```dockerfile
FROM busybox:latest
ENV LANG=C.UTF-8
WORKDIR /usr/skywalking/agent
COPY agent/ .
```
skywalking-agent configuration
```properties
agent.service_name=${SW_AGENT_NAME:Your_ApplicationName}
agent.instance_name=${HOSTNAME}
collector.backend_service=${SW_AGENT_COLLECTOR_BACKEND_SERVICES:skywalking-oap.skywalking.svc:11800}
logging.file_name=${SW_LOGGING_FILE_NAME:skywalking-api.log}
logging.level=${SW_LOGGING_LEVEL:INFO}
logging.max_file_size=${SW_LOGGING_MAX_FILE_SIZE:31457280}
```
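Each value in this configuration has the form `${ENV_NAME:default}`: the environment variable is used when it is set, otherwise the text after the first colon applies. The sketch below only emulates that lookup in shell to make the rule concrete; it is not the agent's own code, and `resolve_placeholder` is a hypothetical helper. How the agent treats a variable that is set but empty is not covered here.

```shell
#!/bin/sh
# Emulates how a "${NAME:default}" placeholder is resolved (illustration only).
# The argument is the text between "${" and "}".
resolve_placeholder() (
  expr="$1"
  name="${expr%%:*}"               # text before the first ':'
  case "$expr" in
    *:*) default="${expr#*:}" ;;   # text after the first ':' (may contain ':' itself)
    *)   default="" ;;
  esac
  # printenv exits non-zero when the variable is not exported
  value="$(printenv "$name")" || value="$default"
  printf '%s\n' "$value"
)
```

With `SW_AGENT_NAME` exported in the Pod, `resolve_placeholder 'SW_AGENT_NAME:Your_ApplicationName'` yields the exported value. Without it, the call falls back to `Your_ApplicationName`, which is why each Deployment should set `SW_AGENT_NAME` explicitly.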
4. Usage example
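There are two common ways to use skywalking-agent:
- Build the agent package into an existing base image
- Copy the agent in through an initContainer
The initContainer approach copies skywalking-agent into the application Pod without modifying the base JVM image, so it is the recommended one: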
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: produce-deployment
  annotations:
    kubernetes.io/change-cause: <CHANGE_CAUSE>
spec:
  selector:
    matchLabels:
      app: produce
  replicas: 2
  template:
    metadata:
      labels:
        app: produce
    spec:
      initContainers:
        - image: registry.boer.xyz/public/skywalking-agent:8.1.0
          name: skywalking-agent
          imagePullPolicy: IfNotPresent
          command: ['sh']
          args: ['-c','cp -r /usr/skywalking/agent/* /skywalking/agent']
          volumeMounts:
            - mountPath: /skywalking/agent
              name: skywalking-agent
      containers:
        - name: produce
          image: <IMAGE>:<IMAGE_TAG>
          imagePullPolicy: IfNotPresent
          volumeMounts:
            - mountPath: /usr/skywalking/agent
              name: skywalking-agent
          ports:
            - containerPort: 10080
          resources:
            requests:
              memory: "512Mi"
              cpu: "200m"
            limits:
              memory: "1Gi"
              cpu: "600m"
          env:
            - name: ENVIRONMENT
              value: "pro"
            - name: SW_AGENT_NAME
              value: "springboot-produce"
            - name: JVM_OPTS
              value: "-Xms512m -Xmx512m -javaagent:/usr/skywalking/agent/skywalking-agent.jar"
          livenessProbe:
            httpGet:
              path: /actuator/health
              port: 10080
            initialDelaySeconds: 10
            periodSeconds: 10
            timeoutSeconds: 5
          readinessProbe:
            httpGet:
              path: /actuator/health
              port: 10080
            initialDelaySeconds: 10
            periodSeconds: 10
            timeoutSeconds: 5
          lifecycle:
            preStop:
              exec:
                command:
                  - "curl"
                  - "-XPOST"
                  - "http://127.0.0.1:10080/actuator/shutdown"
      imagePullSecrets:
        - name: regcred
      volumes:
        - name: skywalking-agent
          emptyDir: {}
```
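The manifest hands the agent to the JVM through the `JVM_OPTS` environment variable, which only takes effect if the application image's start command expands that variable. The original does not show the image's entrypoint, so the following is a sketch under assumptions: the jar path `/app/app.jar`, the `APP_JAR` variable, and the `java_command` helper are all hypothetical.

```shell
#!/bin/sh
# Hypothetical entrypoint for the application image; the jar path is an assumption.
APP_JAR="${APP_JAR:-/app/app.jar}"

# Print the command line that would be executed, so it can be inspected first.
java_command() {
  printf 'java %s -jar %s\n' "${JVM_OPTS:-}" "$APP_JAR"
}

# In a real entrypoint the last line would be:
#   exec java $JVM_OPTS -jar "$APP_JAR"
# JVM_OPTS is left unquoted there on purpose so it splits into separate options.
```

If the application starts but no service shows up in the Skywalking UI, checking that the running Java process actually carries the `-javaagent:` option is a reasonable first step.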
5. Index management for Skywalking's ES storage
For details on ILM (index lifecycle management), see the post "Kubernetes Logging System EFK".
```
PUT _ilm/policy/skywalking-policy
{
  "policy": {
    "phases": {
      "warm": {
        "min_age": "2d",
        "actions": {
          "forcemerge": {
            "max_num_segments": 1
          }
        }
      },
      "delete": {
        "min_age": "3d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}

PUT _template/skywalking-template
{
  "index_patterns": ["skywalking_*"],
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 0,
    "index.lifecycle.name": "skywalking-policy",
    "index.refresh_interval": "30s",
    "index.translog.durability": "async",
    "index.translog.sync_interval": "60s"
  }
}
```
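The template only affects indices created after it exists, so it is worth confirming that the `skywalking_*` indices actually picked up `skywalking-policy`. Elasticsearch exposes this through the ILM explain API (`GET <index>/_ilm/explain`). The sketch below builds that URL from the host and port used in the Helm values. The `http` scheme, the `ES_URL` and `ES_PASSWORD` variables, and the `ilm_explain_url` helper are assumptions for illustration.

```shell
#!/bin/sh
# Sketch: build the ILM explain URL for an index pattern.
# Host and port come from the Helm values; the http scheme is an assumption.
ES_URL="${ES_URL:-http://elasticsearch-logging.logging.svc:9200}"

ilm_explain_url() {
  printf '%s/%s/_ilm/explain\n' "$ES_URL" "$1"
}

# Usage (needs network access to the cluster and the password in ES_PASSWORD):
#   curl -s -u "elastic:${ES_PASSWORD}" "$(ilm_explain_url 'skywalking_*')"
```

In the response, each managed index reports the policy it is attached to and the phase it is currently in, which shows whether the two-day warm and three-day delete thresholds are being applied.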
6. The show