
Monitoring ZooKeeper Service Health with Zabbix

Time: 2023-11-17 05:33:32


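I. Use Case

Our company's current services make little use of ZooKeeper as a coordination service. However, we plan to use Codis as our Redis cluster deployment solution, and Codis depends on ZooKeeper to store its configuration. Monitoring ZooKeeper properly is therefore important.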

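II. Key Points for Monitoring ZooKeeper

System monitoring

Memory usage: ZooKeeper should run entirely in memory and must not use swap. The Java heap size must not exceed the available memory.

Swap usage: using swap degrades ZooKeeper performance. Set vm.swappiness = 0.

Network bandwidth usage: if ZooKeeper performance degrades, check network bandwidth usage and packet loss. A typical ZooKeeper workload is about 20% writes and 80% reads.

Disk usage: keep an eye on how full the ZooKeeper data directory is.

Disk I/O: ZooKeeper's disk writes are asynchronous, so it should not generate heavy I/O requests. If ZooKeeper shares disks with other I/O-intensive services, watch the disk I/O.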
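As a rough illustration of the swap check above, the sketch below computes swap usage from text in the format of Linux's /proc/meminfo. The function name swap_used_kb and the sample text are made up for this example. On a real host one would read /proc/meminfo and treat any nonzero result as a warning sign.

```python
def swap_used_kb(meminfo_text):
    """Return used swap in kB, given text in /proc/meminfo format."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value kB"
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields.get("SwapTotal", 0) - fields.get("SwapFree", 0)


# Hypothetical sample shaped like /proc/meminfo on Linux.
SAMPLE = """MemTotal:        8010408 kB
MemFree:          512000 kB
SwapTotal:       2097148 kB
SwapFree:        2097148 kB
"""

if __name__ == "__main__":
    print(swap_used_kb(SAMPLE))
```

To check the local machine, pass the contents of /proc/meminfo instead of SAMPLE.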

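ZooKeeper monitoring

zk_avg/min/max_latency: time taken to respond to a client request. Alerting is recommended when this exceeds 10 ticks.

zk_outstanding_requests: number of queued requests. This value grows when ZooKeeper is pushed beyond its processing capacity. An alert threshold of 10 is recommended.

zk_packets_received: number of request packets received from clients.

zk_packets_sent: number of packets sent to clients, mainly responses and notifications.

zk_max_file_descriptor_count: maximum number of files that may be open, controlled by ulimit.

zk_open_file_descriptor_count: number of open files. Alert when this exceeds 85% of the allowed maximum.

Mode: role the server is running in. A server that has not joined a cluster is standalone; one that has joined is either follower or leader.

zk_followers: reported only by the leader. Number of followers in the ensemble. The normal value is the number of ensemble members minus 1.

zk_pending_syncs: reported only by the leader. Number of pending syncs.

zk_znode_count: number of znodes.

zk_watch_count: number of watches.

Java Heap Size: Java heap size of the ZooKeeper Java process.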
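The suggested thresholds can be expressed as a small check over a dictionary of mntr values. This is only a sketch under assumptions: the function name check_thresholds is invented here, tick_ms=2000 assumes a tickTime of 2000 ms, ensemble_size=3 assumes a three-node ensemble, and the latency check uses zk_avg_latency although the text does not say which of the three latency values to alert on.

```python
def check_thresholds(stats, tick_ms=2000, ensemble_size=3):
    """Return a list of alert messages for a dict of parsed mntr values."""
    alerts = []
    # Latency is reported in milliseconds; alert above 10 ticks.
    if stats.get("zk_avg_latency", 0) > 10 * tick_ms:
        alerts.append("latency above 10 ticks")
    if stats.get("zk_outstanding_requests", 0) > 10:
        alerts.append("too many outstanding requests")
    max_fd = stats.get("zk_max_file_descriptor_count", 0)
    open_fd = stats.get("zk_open_file_descriptor_count", 0)
    if max_fd and open_fd > 0.85 * max_fd:
        alerts.append("open file descriptors above 85% of limit")
    # zk_followers is only reported by the leader.
    if stats.get("zk_server_state") == "leader":
        if stats.get("zk_followers") != ensemble_size - 1:
            alerts.append("unexpected follower count")
    return alerts


if __name__ == "__main__":
    sample = {
        "zk_avg_latency": 0,
        "zk_outstanding_requests": 0,
        "zk_open_file_descriptor_count": 29,
        "zk_max_file_descriptor_count": 102400,
        "zk_server_state": "leader",
        "zk_followers": 2,
    }
    print(check_thresholds(sample))
```

In a real deployment these rules would normally live in Zabbix triggers rather than in a script; the function only makes the arithmetic explicit.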

# echo ruok | nc 127.0.0.1 2181
imok

# echo mntr | nc 127.0.0.1 2181
zk_version  3.4.6-1569965, built on 02/20/09:09 GMT
zk_avg_latency  0
zk_max_latency  0
zk_min_latency  0
zk_packets_received  11
zk_packets_sent  10
zk_num_alive_connections  1
zk_outstanding_requests  0
zk_server_state  leader
zk_znode_count  17159
zk_watch_count  0
zk_ephemerals_count  1
zk_approximate_data_size  6666471
zk_open_file_descriptor_count  29
zk_max_file_descriptor_count  102400
zk_followers  2
zk_synced_followers  2
zk_pending_syncs  0

# echo srvr | nc 127.0.0.1 2181
Zookeeper version: 3.4.6-1569965, built on 02/20/09:09 GMT
Latency min/avg/max: 0/0/0
Received: 26
Sent: 25
Connections: 1
Outstanding: 0
Zxid: 0x500000000
Mode: leader
Node count: 17159
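The srvr command prints "Name: value" lines, which makes the Mode field easy to extract. The sketch below shows one possible way to do that; parse_srvr is a name chosen for this example, and the sample text is the srvr output shown above.

```python
def parse_srvr(text):
    """Parse 'srvr' output into a dict mapping field name to value string."""
    result = {}
    for line in text.splitlines():
        # Split on the first colon only, so values may contain colons.
        name, sep, value = line.partition(":")
        if sep:
            result[name.strip()] = value.strip()
    return result


SRVR_SAMPLE = (
    "Zookeeper version: 3.4.6-1569965, built on 02/20/09:09 GMT\n"
    "Latency min/avg/max: 0/0/0\n"
    "Received: 26\n"
    "Sent: 25\n"
    "Connections: 1\n"
    "Outstanding: 0\n"
    "Zxid: 0x500000000\n"
    "Mode: leader\n"
    "Node count: 17159\n"
)

if __name__ == "__main__":
    print(parse_srvr(SRVR_SAMPLE)["Mode"])
```

All values are kept as strings; callers can convert the ones they need, for example int(fields["Node count"]).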

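III. Writing the Script and Configuration File for Monitoring ZooKeeper with Zabbix

There are two ways for Zabbix to collect this monitoring data. One is to fetch each item individually through the zabbix agent, using either active or passive checks. The other is to send all of the data to Zabbix in one pass with zabbix_sender. We choose the second approach here. A script that sends everything with zabbix_sender cannot be written the way an agent-based script would be, fetching one item per invocation.

First, gather the monitored items into a dictionary. Then iterate over the dictionary and send each key:value pair with zabbix_sender, passing the key with -k and the value with -o.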
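The loop described above might look like the following sketch, which only builds the argument lists and does not execute anything. The function name build_sender_commands is invented for this example, and the default paths are the ones that appear in the detailed script later in this article; adjust them to your installation.

```python
def build_sender_commands(stats,
                          sender="/opt/app/zabbix/sbin/zabbix_sender",
                          conf="/opt/app/zabbix/conf/zabbix_agentd.conf"):
    """Build one zabbix_sender argument list per metric in the dictionary."""
    commands = []
    for name, value in sorted(stats.items()):
        key = "zookeeper.status[%s]" % name
        # -c: agent config file, -k: item key, -o: item value
        commands.append([sender, "-c", conf, "-k", key, "-o", str(value)])
    return commands


if __name__ == "__main__":
    for cmd in build_sender_commands({"zk_znode_count": 17159,
                                      "zk_server_state": "leader"}):
        print(" ".join(cmd))
```

Each list can be handed to subprocess.call() as is. Building lists rather than shell strings avoids quoting problems with values such as zk_version, which contains spaces.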

echo mntr|nc 127.0.0.1 2181

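This command can be invoked with Python's subprocess module. Alternatively, the socket module can connect to port 2181 and send the command directly. Once the mntr output has been retrieved, it still has to be converted into a dictionary.

That is, data in this form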
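For the socket approach, a minimal Python 3 sketch could look like this. The helper names read_until_closed and four_letter_word are made up for illustration. The code assumes the server closes the connection after answering a four-letter command, which is why it reads until recv() returns an empty result instead of relying on a single recv() call.

```python
import socket


def read_until_closed(sock, bufsize=2048):
    """Read from a socket until the peer closes the connection."""
    chunks = []
    while True:
        chunk = sock.recv(bufsize)
        if not chunk:
            break
        chunks.append(chunk)
    return b"".join(chunks)


def four_letter_word(cmd, host="127.0.0.1", port=2181, timeout=1.0):
    """Send a four-letter command such as 'mntr' and return the reply text."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        return read_until_closed(sock).decode("utf-8")
```

With a ZooKeeper server listening locally, four_letter_word("ruok") would be expected to return imok, matching the nc example earlier.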

zk_version  3.4.6-1569965, built on 02/20/09:09 GMT
zk_avg_latency  0
zk_max_latency  0
zk_min_latency  0
zk_packets_received  91
zk_packets_sent  90
zk_num_alive_connections  1
zk_outstanding_requests  0
zk_server_state  follower
zk_znode_count  17159
zk_watch_count  0
zk_ephemerals_count  1
zk_approximate_data_size  6666471
zk_open_file_descriptor_count  27
zk_max_file_descriptor_count  102400

has to be converted into data like this

{'zk_followers': 2,
 'zk_outstanding_requests': 0,
 'zk_approximate_data_size': 6666471,
 'zk_packets_sent': 2089,
 'zk_pending_syncs': 0,
 'zk_avg_latency': 0,
 'zk_version': '3.4.6-1569965, built on 02/20/09:09 GMT',
 'zk_watch_count': 2,
 'zk_packets_received': 2090,
 'zk_open_file_descriptor_count': 30,
 'zk_server_ruok': 'imok',
 'zk_server_state': 'leader',
 'zk_synced_followers': 2,
 'zk_max_latency': 28,
 'zk_num_alive_connections': 2,
 'zk_min_latency': 0,
 'zk_ephemerals_count': 1,
 'zk_znode_count': 17159,
 'zk_max_file_descriptor_count': 102400}
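The conversion itself is a line-by-line split on the tab character that separates each mntr key from its value. The following is a small sketch of that step; parse_mntr is a name chosen here, numeric values are converted to int, and lines that do not match the expected shape are skipped.

```python
def parse_mntr(text):
    """Turn 'mntr' output (one key<TAB>value pair per line) into a dict."""
    result = {}
    for line in text.splitlines():
        key, sep, value = line.partition("\t")
        key, value = key.strip(), value.strip()
        if not sep or not key:
            continue  # ignore broken lines
        try:
            result[key] = int(value)
        except ValueError:
            result[key] = value  # keep non-numeric values as strings
    return result


MNTR_SAMPLE = (
    "zk_version\t3.4.6-1569965, built on 02/20/09:09 GMT\n"
    "zk_avg_latency\t0\n"
    "zk_server_state\tfollower\n"
    "zk_znode_count\t17159\n"
)

if __name__ == "__main__":
    print(parse_mntr(MNTR_SAMPLE))
```

The zk_server_ruok entry in the dictionary above does not come from mntr. It is the reply to the separate ruok command and has to be added afterwards.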

In the end, the data sent with zabbix_sender needs to look like this

Here zookeeper.status[zk_version] is the name of the key.

zookeeper.status[zk_outstanding_requests]:0
zookeeper.status[zk_approximate_data_size]:6666471
zookeeper.status[zk_packets_sent]:48
zookeeper.status[zk_avg_latency]:0
zookeeper.status[zk_version]:3.4.6-1569965, built on 02/20/09:09 GMT
zookeeper.status[zk_watch_count]:0
zookeeper.status[zk_packets_received]:49
zookeeper.status[zk_open_file_descriptor_count]:27
zookeeper.status[zk_server_ruok]:imok
zookeeper.status[zk_server_state]:follower
zookeeper.status[zk_max_latency]:0
zookeeper.status[zk_num_alive_connections]:1
zookeeper.status[zk_min_latency]:0
zookeeper.status[zk_ephemerals_count]:1
zookeeper.status[zk_znode_count]:17159
zookeeper.status[zk_max_file_descriptor_count]:102400
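As an alternative to one zabbix_sender call per key, zabbix_sender can also read many values from an input file given with -i, one "hostname key value" entry per line. The sketch below formats a dictionary that way. It is not what the scripts in this article do, the function name sender_input_lines is invented, and the quoting rules follow the input-file format described in the zabbix_sender manual as far as I understand it, so verify them against your Zabbix version.

```python
def sender_input_lines(stats, hostname="-"):
    """Format metrics as '<hostname> <key> <value>' lines for zabbix_sender -i.

    A hostname of "-" is assumed to mean "use the hostname from the agent
    configuration file", as described in the zabbix_sender manual.
    """
    lines = []
    for name, value in sorted(stats.items()):
        text = str(value)
        if any(ch.isspace() or ch in '"\\' for ch in text):
            # Entries containing whitespace must be double-quoted, with
            # backslashes and double quotes escaped by a backslash.
            text = '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'
        lines.append("%s zookeeper.status[%s] %s" % (hostname, name, text))
    return lines


if __name__ == "__main__":
    sample = {"zk_avg_latency": 0,
              "zk_version": "3.4.6-1569965, built on 02/20/09:09 GMT"}
    print("\n".join(sender_input_lines(sample)))
```

The resulting lines could be written to a file and sent in a single run with zabbix_sender -c followed by the config file and -i followed by that file.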

The condensed code is as follows:

#!/usr/bin/env python3
import socket

# Send the 'mntr' four-letter command to the local ZooKeeper server
s = socket.socket()
s.connect(('localhost', 2181))
s.sendall(b'mntr')
data_mntr = s.recv(2048).decode('utf-8')
s.close()
# print(data_mntr)

result = {}
zresult = {}
for line in data_mntr.splitlines():
    # each mntr line has the form key<TAB>value
    key, value = map(str.strip, line.split('\t'))
    zkey = 'zookeeper.status' + '[' + key + ']'
    zvalue = value
    result[key] = value
    zresult[zkey] = zvalue

print(result)
print('\n\n')
print(zresult)

# python test.py
{'zk_outstanding_requests': '0', 'zk_approximate_data_size': '6666471', 'zk_max_latency': '0', 'zk_avg_latency': '0', 'zk_version': '3.4.6-1569965, built on 02/20/09:09 GMT', 'zk_watch_count': '0', 'zk_num_alive_connections': '1', 'zk_open_file_descriptor_count': '27', 'zk_server_state': 'follower', 'zk_packets_sent': '542', 'zk_packets_received': '543', 'zk_min_latency': '0', 'zk_ephemerals_count': '1', 'zk_znode_count': '17159', 'zk_max_file_descriptor_count': '102400'}



{'zookeeper.status[zk_watch_count]': '0', 'zookeeper.status[zk_avg_latency]': '0', 'zookeeper.status[zk_max_latency]': '0', 'zookeeper.status[zk_approximate_data_size]': '6666471', 'zookeeper.status[zk_server_state]': 'follower', 'zookeeper.status[zk_num_alive_connections]': '1', 'zookeeper.status[zk_min_latency]': '0', 'zookeeper.status[zk_outstanding_requests]': '0', 'zookeeper.status[zk_packets_received]': '543', 'zookeeper.status[zk_ephemerals_count]': '1', 'zookeeper.status[zk_znode_count]': '17159', 'zookeeper.status[zk_packets_sent]': '542', 'zookeeper.status[zk_open_file_descriptor_count]': '27', 'zookeeper.status[zk_max_file_descriptor_count]': '102400', 'zookeeper.status[zk_version]': '3.4.6-1569965, built on 02/20/09:09 GMT'}

The full code is as follows:

#!/usr/bin/env python3
"""Check Zookeeper Cluster

zookeeper version should be newer than 3.4.x

# echo mntr | nc 127.0.0.1 2181
zk_version  3.4.6-1569965, built on 02/20/09:09 GMT
zk_avg_latency  0
zk_max_latency  4
zk_min_latency  0
zk_packets_received  84467
zk_packets_sent  84466
zk_num_alive_connections  3
zk_outstanding_requests  0
zk_server_state  follower
zk_znode_count  17159
zk_watch_count  2
zk_ephemerals_count  1
zk_approximate_data_size  6666471
zk_open_file_descriptor_count  29
zk_max_file_descriptor_count  102400

# echo ruok | nc 127.0.0.1 2181
imok
"""

import socket
import subprocess
import sys

zabbix_sender = '/opt/app/zabbix/sbin/zabbix_sender'
zabbix_conf = '/opt/app/zabbix/conf/zabbix_agentd.conf'
send_to_zabbix = 1  # set to 0 to print the commands instead of running them


# get zookeeper server status
class ZooKeeperServer:

    def __init__(self, host='localhost', port='2181', timeout=1):
        self._address = (host, int(port))
        self._timeout = timeout
        self._result = {}

    def _create_socket(self):
        return socket.socket()

    def _send_cmd(self, cmd):
        """Send a 4letter word command to the server"""
        s = self._create_socket()
        s.settimeout(self._timeout)
        try:
            s.connect(self._address)
            s.sendall(cmd.encode('ascii'))
            # read until the server closes the connection,
            # because a single recv() may return only part of the reply
            chunks = []
            while True:
                chunk = s.recv(2048)
                if not chunk:
                    break
                chunks.append(chunk)
        finally:
            s.close()
        return b''.join(chunks).decode('utf-8')

    def get_stats(self):
        """Get ZooKeeper server stats as a map"""
        data_mntr = self._send_cmd('mntr')
        data_ruok = self._send_cmd('ruok')
        result = {}
        if data_mntr:
            result.update(self._parse(data_mntr))
        if data_ruok:
            result.update(self._parse_ruok(data_ruok))
        # these three metrics are only exposed by a server in the leader role;
        # on other servers we just set them to 0
        for key in ('zk_followers', 'zk_synced_followers', 'zk_pending_syncs'):
            result.setdefault(key, 0)
        self._result = result
        return self._result

    def _parse(self, data):
        """Parse the output from the 'mntr' 4letter word command"""
        result = {}
        for line in data.splitlines():
            try:
                key, value = self._parse_line(line)
                result[key] = value
            except ValueError:
                pass  # ignore broken lines
        return result

    def _parse_ruok(self, data):
        """Parse the output from the 'ruok' 4letter word command"""
        result = {}
        ruok = data.strip()
        if ruok:
            result['zk_server_ruok'] = ruok
        return result

    def _parse_line(self, line):
        try:
            key, value = map(str.strip, line.split('\t'))
        except ValueError:
            raise ValueError('Found invalid line: %s' % line)
        if not key:
            raise ValueError('The key is mandatory and should not be empty')
        try:
            value = int(value)
        except (TypeError, ValueError):
            pass
        return key, value

    def get_pid(self):
        """Return the pid of the running ZooKeeper java process, or '' if none"""
        pidarg = "ps -ef | grep java | grep zookeeper | grep -v grep | awk '{print $2}'"
        proc = subprocess.Popen(pidarg, shell=True, stdout=subprocess.PIPE,
                                universal_newlines=True)
        output, _ = proc.communicate()
        lines = output.splitlines()
        return lines[0] if lines else ''

    def send_to_zabbix(self, metric):
        key = "zookeeper.status[" + metric + "]"
        if send_to_zabbix > 0:
            # print(key + ":" + str(self._result[metric]))
            try:
                subprocess.call([zabbix_sender, "-c", zabbix_conf, "-k", key,
                                 "-o", str(self._result[metric])],
                                stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL, shell=False)
            except OSError as detail:
                print("Something went wrong while executing zabbix_sender:", detail)
        else:
            print("Simulation: the following command would be executed:\n",
                  zabbix_sender, "-c", zabbix_conf, "-k", key,
                  "-o", self._result[metric], "\n")


def usage():
    """Display program usage"""
    print("\nUsage:", sys.argv[0], "alive|all")
    print("Modes:\n\talive: Return pid of running zookeeper\n\tall: Send zookeeper stats as well")
    sys.exit(1)


accepted_modes = ['alive', 'all']

if len(sys.argv) == 2 and sys.argv[1] in accepted_modes:
    mode = sys.argv[1]
else:
    usage()

zk = ZooKeeperServer()
# print(zk.get_stats())
pid = zk.get_pid()

if pid != "" and mode == 'all':
    zk.get_stats()
    # print(zk._result)
    for key in zk._result:
        zk.send_to_zabbix(key)
    print(pid)
elif pid != "" and mode == "alive":
    print(pid)
else:
    print(0)

The Zabbix configuration file check_zookeeper.conf:

UserParameter=zookeeper.status[*],/usr/bin/python /opt/app/zabbix/sbin/check_zookeeper.py $1

Restart the zabbix agent service.

IV. Creating the Zabbix Template for ZooKeeper and Setting Alert Thresholds

See the attachment for the template.

