LinuxEye - Linux System Tutorials

MooseFS Disaster Recovery Drill: A Field Record

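Date: 2015-07-14 09:41  Source: nolinux.blog.51cto.com  Editor: nolinux

Last night I went to the data center to expand the disks on our database servers and, while I was there, ran a disaster drill against MooseFS, the storage system we currently run in production. Today I am writing up the details of that drill to share with everyone. If you are running MooseFS yourself, this may serve as a useful reference.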

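       MooseFS is a distributed file system. I won't describe it in detail here; see the three posts I wrote earlier:

    Distributed File System MooseFS ---- Introduction

    Distributed File System MooseFS ---- Deployment

    Distributed File System MooseFS ---- Management and Tuning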

 

First, a brief look at the architecture of our current storage cluster:

[Figure: storage architecture diagram (wKioL1WM_iHz51SRAAKc8nmYm5o149.jpg)]

 

Server configuration:

mfsmaster primary and mfsmaster standby

Brand: Dell PowerEdge R720

CPU: Intel(R) Xeon(R) CPU E5-2630 v2 @ 2.60GHz

Memory: 8 GB × 8

Disks: 300 GB × 6

RAID level: 10

mfschunkserver

Brand: Dell PowerEdge R720

CPU: Intel(R) Xeon(R) CPU E5-2630 v2 @ 2.60GHz

Memory: 8 GB × 4

Disks: 2 TB × 6

RAID level: 10


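Architecture overview:

       The MooseFS deployment is built around two mfsmaster servers, one primary and one standby, made highly available with heartbeat + DRBD. Behind them sit three storage nodes (chunkservers) that provide the actual data storage.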

 

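Disaster drill plan:

       The drill starts from several factors that can trigger a failure. Each one is simulated by hand so that a real failure occurs. Before and after each failure I record the state of the relevant services, their logs, and how the service behaves from the client's point of view, then analyze those records to judge whether the high-availability setup holds up.

Purpose of each recording point:

       Service state before and after the failure: to judge whether the switchover happened correctly

       Service logs at the time of the failure: to analyze the decisions and actions taken by the HA software

       Client-side service behavior before and after the failure: to judge how much the failure affects clients

Failures to simulate:

       1. heartbeat service crash

       2. Loss of the service-facing network on the primary mfsmaster

       3. Loss of the DRBD replication network on the primary mfsmaster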

 

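Note:

      Before the tests I start a continuous-write script on an mfs client. Once per second it appends a line of text to a file in the mounted mfs directory, which lets me measure how long mfs takes to recover.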

[root@web-phy13-rj ~]# for i in {1..20000};do echo `date` $i >> /mfsdata/testxxxx;sleep 1;done
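
The one-liner above can also be wrapped in a small function so that the target file, the number of writes, and the interval are explicit. This is only a sketch: the name `write_probe` and its parameters are illustrative and were not part of the original drill.

```shell
# write_probe FILE [COUNT] [INTERVAL]
# Append "<date> <sequence number>" to FILE, COUNT times, sleeping INTERVAL
# seconds after each write. Hypothetical helper; the defaults mirror the
# one-liner used in the drill (20000 writes, 1 second apart).
write_probe() {
    file=$1
    count=${2:-20000}
    interval=${3:-1}
    i=1
    while [ "$i" -le "$count" ]; do
        echo "$(date) $i" >> "$file"
        sleep "$interval"
        i=$((i + 1))
    done
}

# Example: write_probe /mfsdata/testxxxx
```

Because each line carries both a timestamp and a sequence number, a stalled write shows up later as a jump in the timestamps between two consecutive sequence numbers.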

Failure 1: heartbeat service crash

Simulating the failure:

[root@kvm-phy11-rj ~]# /etc/init.d/heartbeat stop
Stopping High-Availability services: Done.

State of the primary and standby mfsmaster servers before the failure:

mfsmaster primary:

[root@mfs-master01-rj ~]# ps -ef|awk '$0!~"awk"&&$0~"mfs"{print}'
mfs      48931     1  0 20:03 ?        00:00:00 /usr/local/mfs/sbin/mfsmaster start
[root@mfs-master01-rj ~]# ip a|grep  em4
5: em4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    inet 10.1.1.27/24 brd 10.1.1.255 scope global em4
    inet 10.1.1.26/24 brd 10.1.1.255 scope global secondary em4
[root@mfs-master01-rj ~]# cat /proc/drbd 
version: 8.4.3 (api:1/proto:86-101)
GIT-hash: 89a294209144b68adb3ee85a73221f964d3ee515 build by root@mfs-master01-rj.btr, 2015-06-25 17:20:34
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:10824 nr:3912 dw:12716 dr:16539 al:11 bm:14 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:0

mfsmaster standby:

[root@mfs-master02-rj ~]# ps -ef|awk '$0!~"awk"&&$0~"mfs"{print}'
[root@mfs-master02-rj ~]# ip a|grep  em4
5: em4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    inet 10.1.1.28/24 brd 10.1.1.255 scope global em4
[root@mfs-master02-rj ~]# cat /proc/drbd
version: 8.4.3 (api:1/proto:86-101)
GIT-hash: 89a294209144b68adb3ee85a73221f964d3ee515 build by root@mfs-master02-rj.btr, 2015-06-25 17:20:33
 0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----
    ns:3912 nr:10896 dw:106643992 dr:16616 al:10 bm:6414 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:0
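
When comparing many such snapshots, it can help to reduce `/proc/drbd` to the three fields that matter for a failover check: connection state (`cs`), roles (`ro`), and disk states (`ds`). The following is a minimal sketch; `drbd_summary` is a hypothetical helper name, and it assumes the DRBD 8.4 `/proc/drbd` layout shown above.

```shell
# drbd_summary: read /proc/drbd-style text on stdin and print, for each
# resource line, its connection state, local/peer roles and disk states.
# Hypothetical helper; assumes the "cs:... ro:... ds:..." layout shown above.
drbd_summary() {
    awk '/ cs:/ {
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^cs:/) cs = substr($i, 4)
            if ($i ~ /^ro:/) ro = substr($i, 4)
            if ($i ~ /^ds:/) ds = substr($i, 4)
        }
        printf "conn=%s role=%s disk=%s\n", cs, ro, ds
    }'
}

# Example: drbd_summary < /proc/drbd
```

Run on the standby before the failure, this would print `conn=Connected role=Secondary/Primary disk=UpToDate/UpToDate`.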

Log output on the primary and standby mfsmaster servers at the time of the failure:

mfsmaster primary:

Jun 25 20:05:29 mfs-master01-rj.btr heartbeat: [48425]: info: Heartbeat shutdown in progress. (48425)
Jun 25 20:05:29 mfs-master01-rj.btr heartbeat: [48980]: info: Giving up all HA resources.
ResourceManager(default)[48993]:        2015/06/25_20:05:29 info: Releasing resource group: mfs-master01-rj.btr IPaddr::10.1.1.26/24/em4 drbddisk::drbd Filesystem::/dev/drbd0::/usr/local/mfs::ext4 mfsmaster
ResourceManager(default)[48993]:        2015/06/25_20:05:29 info: Running /etc/ha.d/resource.d/mfsmaster  stop
ResourceManager(default)[48993]:        2015/06/25_20:05:31 info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd0 /usr/local/mfs ext4 stop
Filesystem(Filesystem_/dev/drbd0)[49056]:       2015/06/25_20:05:31 INFO: Running stop for /dev/drbd0 on /usr/local/mfs
Filesystem(Filesystem_/dev/drbd0)[49056]:       2015/06/25_20:05:31 INFO: Trying to unmount /usr/local/mfs
Filesystem(Filesystem_/dev/drbd0)[49056]:       2015/06/25_20:05:31 INFO: unmounted /usr/local/mfs successfully
/usr/lib/ocf/resource.d//heartbeat/Filesystem(Filesystem_/dev/drbd0)[49048]:    2015/06/25_20:05:31 INFO:  Success
ResourceManager(default)[48993]:        2015/06/25_20:05:31 info: Running /etc/ha.d/resource.d/drbddisk drbd stop
ResourceManager(default)[48993]:        2015/06/25_20:05:31 info: Running /etc/ha.d/resource.d/IPaddr 10.1.1.26/24/em4 stop
IPaddr(IPaddr_10.1.1.26)[401]:  2015/06/25_20:05:31 INFO: IP status = ok, IP_CIP=
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_10.1.1.26)[373]:       2015/06/25_20:05:31 INFO:  Success
Jun 25 20:05:31 mfs-master01-rj.btr heartbeat: [48980]: info: All HA resources relinquished.
Jun 25 20:05:32 mfs-master01-rj.btr heartbeat: [48425]: WARN: 1 lost packet(s) for [mfs-master02-rj.btr] [2673:2675]
Jun 25 20:05:32 mfs-master01-rj.btr heartbeat: [48425]: info: No pkts missing from mfs-master02-rj.btr!
Jun 25 20:05:32 mfs-master01-rj.btr heartbeat: [48425]: info: killing /usr/lib64/heartbeat/ipfail process group 48439 with signal 15
Jun 25 20:05:33 mfs-master01-rj.btr heartbeat: [48425]: info: killing HBFIFO process 48428 with signal 15
Jun 25 20:05:33 mfs-master01-rj.btr heartbeat: [48425]: info: killing HBWRITE process 48429 with signal 15
Jun 25 20:05:33 mfs-master01-rj.btr heartbeat: [48425]: info: killing HBREAD process 48430 with signal 15
Jun 25 20:05:33 mfs-master01-rj.btr heartbeat: [48425]: info: killing HBWRITE process 48431 with signal 15
Jun 25 20:05:33 mfs-master01-rj.btr heartbeat: [48425]: info: killing HBREAD process 48432 with signal 15
Jun 25 20:05:33 mfs-master01-rj.btr heartbeat: [48425]: info: killing HBWRITE process 48433 with signal 15
Jun 25 20:05:33 mfs-master01-rj.btr heartbeat: [48425]: info: killing HBREAD process 48434 with signal 15
Jun 25 20:05:33 mfs-master01-rj.btr heartbeat: [48425]: info: killing HBWRITE process 48435 with signal 15
Jun 25 20:05:33 mfs-master01-rj.btr heartbeat: [48425]: info: killing HBREAD process 48436 with signal 15
Jun 25 20:05:33 mfs-master01-rj.btr heartbeat: [48425]: info: Core process 48430 exited. 9 remaining
Jun 25 20:05:33 mfs-master01-rj.btr heartbeat: [48425]: info: Core process 48433 exited. 8 remaining
Jun 25 20:05:33 mfs-master01-rj.btr heartbeat: [48425]: info: Core process 48431 exited. 7 remaining
Jun 25 20:05:33 mfs-master01-rj.btr heartbeat: [48425]: info: Core process 48428 exited. 6 remaining
Jun 25 20:05:33 mfs-master01-rj.btr heartbeat: [48425]: info: Core process 48432 exited. 5 remaining
Jun 25 20:05:33 mfs-master01-rj.btr heartbeat: [48425]: info: Core process 48434 exited. 4 remaining
Jun 25 20:05:33 mfs-master01-rj.btr heartbeat: [48425]: info: Core process 48435 exited. 3 remaining
Jun 25 20:05:33 mfs-master01-rj.btr heartbeat: [48425]: info: Core process 48436 exited. 2 remaining
Jun 25 20:05:33 mfs-master01-rj.btr heartbeat: [48425]: info: Core process 48429 exited. 1 remaining
Jun 25 20:05:33 mfs-master01-rj.btr heartbeat: [48425]: info: mfs-master01-rj.btr Heartbeat shutdown complete.

mfsmaster standby:

Jun 25 20:05:31 mfs-master02-rj.btr heartbeat: [44613]: info: Received shutdown notice from 'mfs-master01-rj.btr'.
Jun 25 20:05:31 mfs-master02-rj.btr heartbeat: [44613]: info: Resources being acquired from mfs-master01-rj.btr.
Jun 25 20:05:31 mfs-master02-rj.btr heartbeat: [47668]: info: acquire local HA resources (standby).
Jun 25 20:05:31 mfs-master02-rj.btr heartbeat: [47669]: info: No local resources [/usr/share/heartbeat/ResourceManager listkeys mfs-master02-rj.btr] to acquire.
Jun 25 20:05:31 mfs-master02-rj.btr heartbeat: [47668]: info: local HA resource acquisition completed (standby).
Jun 25 20:05:31 mfs-master02-rj.btr heartbeat: [44613]: info: Standby resource acquisition done [foreign].
harc(default)[47694]:   2015/06/25_20:05:31 info: Running /etc/ha.d//rc.d/status status
mach_down(default)[47711]:      2015/06/25_20:05:31 info: Taking over resource group IPaddr::10.1.1.26/24/em4
ResourceManager(default)[47738]:        2015/06/25_20:05:31 info: Acquiring resource group: mfs-master01-rj.btr IPaddr::10.1.1.26/24/em4 drbddisk::drbd Filesystem::/dev/drbd0::/usr/local/mfs::ext4 mfsmaster
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_10.1.1.26)[47766]:     2015/06/25_20:05:31 INFO:  Resource is stopped
ResourceManager(default)[47738]:        2015/06/25_20:05:31 info: Running /etc/ha.d/resource.d/IPaddr 10.1.1.26/24/em4 start
IPaddr(IPaddr_10.1.1.26)[47889]:        2015/06/25_20:05:31 INFO: Adding inet address 10.1.1.26/24 with broadcast address 10.1.1.255 to device em4
IPaddr(IPaddr_10.1.1.26)[47889]:        2015/06/25_20:05:31 INFO: Bringing device em4 up
IPaddr(IPaddr_10.1.1.26)[47889]:        2015/06/25_20:05:31 INFO: /usr/libexec/heartbeat/send_arp -i 200 -r 5 -p /var/run/resource-agents/send_arp-10.1.1.26 em4 10.1.1.26 auto not_used not_used
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_10.1.1.26)[47863]:     2015/06/25_20:05:31 INFO:  Success
ResourceManager(default)[47738]:        2015/06/25_20:05:31 info: Running /etc/ha.d/resource.d/drbddisk drbd start
/usr/lib/ocf/resource.d//heartbeat/Filesystem(Filesystem_/dev/drbd0)[48017]:    2015/06/25_20:05:32 INFO:  Resource is stopped
ResourceManager(default)[47738]:        2015/06/25_20:05:32 info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd0 /usr/local/mfs ext4 start
Filesystem(Filesystem_/dev/drbd0)[48103]:       2015/06/25_20:05:32 INFO: Running start for /dev/drbd0 on /usr/local/mfs
/usr/lib/ocf/resource.d//heartbeat/Filesystem(Filesystem_/dev/drbd0)[48095]:    2015/06/25_20:05:32 INFO:  Success
ResourceManager(default)[47738]:        2015/06/25_20:05:32 info: Running /etc/ha.d/resource.d/mfsmaster  start
mach_down(default)[47711]:      2015/06/25_20:05:32 info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
mach_down(default)[47711]:      2015/06/25_20:05:32 info: mach_down takeover complete for node mfs-master01-rj.btr.
Jun 25 20:05:32 mfs-master02-rj.btr heartbeat: [44613]: info: mach_down takeover complete.
Jun 25 20:05:42 mfs-master02-rj.btr heartbeat: [44613]: WARN: node mfs-master01-rj.btr: is dead
Jun 25 20:05:42 mfs-master02-rj.btr heartbeat: [44613]: info: Dead node mfs-master01-rj.btr gave up resources.
Jun 25 20:05:42 mfs-master02-rj.btr ipfail: [44630]: info: Status update: Node mfs-master01-rj.btr now has status dead
Jun 25 20:05:42 mfs-master02-rj.btr heartbeat: [44613]: info: Link mfs-master01-rj.btr:em2 dead.
Jun 25 20:05:44 mfs-master02-rj.btr ipfail: [44630]: info: NS: We are still alive!
Jun 25 20:05:44 mfs-master02-rj.btr ipfail: [44630]: info: Link Status update: Link mfs-master01-rj.btr/em2 now has status dead
Jun 25 20:05:45 mfs-master02-rj.btr ipfail: [44630]: info: Asking other side for ping node count.
Jun 25 20:05:45 mfs-master02-rj.btr ipfail: [44630]: info: Checking remote count of ping nodes.
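
To put a number on how long the takeover took on the server side, the timestamps of two log messages can be subtracted. The sketch below is one way to do that with awk; `elapsed_between` is a hypothetical helper, the log file path in the example is an assumption (the article does not say where heartbeat logs are written), and only syslog-style lines with an `HH:MM:SS` third field are considered.

```shell
# elapsed_between FILE START_REGEX END_REGEX
# Print the number of seconds between the first syslog-style line matching
# START_REGEX and the first later line matching END_REGEX. Lines whose third
# field is not an HH:MM:SS timestamp are skipped; a midnight rollover is not
# handled. Hypothetical helper for illustration.
elapsed_between() {
    awk -v s="$2" -v e="$3" '
    $3 ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/ {
        split($3, t, ":")
        now = t[1] * 3600 + t[2] * 60 + t[3]
        if (!started && $0 ~ s) { start = now; started = 1; next }
        if (started && $0 ~ e) { print now - start; exit }
    }' "$1"
}

# Example (log path is an assumption):
# elapsed_between /var/log/ha-log 'Received shutdown notice' 'mach_down takeover complete'
```

Applied to the standby log above, the two messages are stamped 20:05:31 and 20:05:32, so the result would be 1 second. The client can still see a longer stall than this, since it also has to reconnect to the master.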

State of the primary and standby mfsmaster servers after the failure:

mfsmaster primary:

[root@mfs-master01-rj ~]# ps -ef|awk '$0!~"awk"&&$0~"mfs"{print}'
[root@mfs-master01-rj ~]# ip a|grep  em4                         
5: em4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    inet 10.1.1.27/24 brd 10.1.1.255 scope global em4
[root@mfs-master01-rj ~]# cat /proc/drbd                         
version: 8.4.3 (api:1/proto:86-101)
GIT-hash: 89a294209144b68adb3ee85a73221f964d3ee515 build by root@mfs-master01-rj.btr, 2015-06-25 17:20:34
 0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----
    ns:11876 nr:4536 dw:14392 dr:16551 al:12 bm:14 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:0

mfsmaster standby:

[root@mfs-master02-rj ~]# ps -ef|awk '$0!~"awk"&&$0~"mfs"{print}'
mfs      48197     1  0 20:05 ?        00:00:00 /usr/local/mfs/sbin/mfsmaster start
[root@mfs-master02-rj ~]# ip a|grep  em4                         
5: em4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    inet 10.1.1.28/24 brd 10.1.1.255 scope global em4
    inet 10.1.1.26/24 brd 10.1.1.255 scope global secondary em4
[root@mfs-master02-rj ~]# cat /proc/drbd                         
version: 8.4.3 (api:1/proto:86-101)
GIT-hash: 89a294209144b68adb3ee85a73221f964d3ee515 build by root@mfs-master02-rj.btr, 2015-06-25 17:20:33
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:4604 nr:11864 dw:106645652 dr:18533 al:12 bm:6414 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:0

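Service interruption as seen by the client (writes stalled for about 13 seconds, between the two lines marked ########):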

Thu Jun 25 18:52:31 CST 2015 86
Thu Jun 25 18:52:32 CST 2015 87
Thu Jun 25 18:52:33 CST 2015 88
Thu Jun 25 18:52:34 CST 2015 89           ########
Thu Jun 25 18:52:47 CST 2015 90           ########
Thu Jun 25 18:52:48 CST 2015 91
Thu Jun 25 18:52:49 CST 2015 92
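
Rather than scanning the probe file by eye, the stall can be located with a short awk script. This is a sketch under the assumption that every line has the `date` format shown above (field 4 is `HH:MM:SS`, field 7 is the sequence number); `find_gaps` is a hypothetical helper name.

```shell
# find_gaps FILE [MAX]
# Report consecutive probe lines whose timestamps are more than MAX seconds
# apart (default 2). Assumes field 4 is HH:MM:SS and field 7 is the sequence
# number; a midnight rollover is not handled. Hypothetical helper.
find_gaps() {
    awk -v max="${2:-2}" '
    {
        split($4, t, ":")
        now = t[1] * 3600 + t[2] * 60 + t[3]
        if (NR > 1 && now - prev > max)
            printf "gap of %d s between seq %s and seq %s\n", now - prev, seq, $7
        prev = now
        seq = $7
    }' "$1"
}

# Example: find_gaps /mfsdata/testxxxx
```

For the excerpt above it would report `gap of 13 s between seq 89 and seq 90`.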

Recovering from failure 1: after the heartbeat service on the mfsmaster is restored

Restoring the heartbeat service:

[root@mfs-master01-rj ~]# /etc/init.d/heartbeat start
Starting High-Availability services: INFO:  Resource is stopped
Done.

Log output on the primary and standby mfsmaster servers during recovery:

mfsmaster primary:

Jun 25 20:09:29 mfs-master01-rj.btr heartbeat: [653]: info: Pacemaker support: false
Jun 25 20:09:29 mfs-master01-rj.btr heartbeat: [653]: WARN: Logging daemon is disabled --enabling logging daemon is recommended
Jun 25 20:09:29 mfs-master01-rj.btr heartbeat: [653]: info: **************************
Jun 25 20:09:29 mfs-master01-rj.btr heartbeat: [653]: info: Configuration validated. Starting heartbeat 3.0.4
Jun 25 20:09:29 mfs-master01-rj.btr heartbeat: [654]: info: heartbeat: version 3.0.4
Jun 25 20:09:29 mfs-master01-rj.btr heartbeat: [654]: info: Heartbeat generation: 1435221812
Jun 25 20:09:29 mfs-master01-rj.btr heartbeat: [654]: info: glib: UDP multicast heartbeat started for group 225.0.0.192 port 694 interface em2 (ttl=1 loop=0)
Jun 25 20:09:29 mfs-master01-rj.btr heartbeat: [654]: info: glib: ping heartbeat started.
Jun 25 20:09:29 mfs-master01-rj.btr heartbeat: [654]: info: glib: ping heartbeat started.
Jun 25 20:09:29 mfs-master01-rj.btr heartbeat: [654]: info: glib: ping heartbeat started.
Jun 25 20:09:29 mfs-master01-rj.btr heartbeat: [654]: info: G_main_add_TriggerHandler: Added signal manual handler
Jun 25 20:09:29 mfs-master01-rj.btr heartbeat: [654]: info: G_main_add_TriggerHandler: Added signal manual handler
Jun 25 20:09:29 mfs-master01-rj.btr heartbeat: [654]: info: G_main_add_SignalHandler: Added signal handler for signal 17
Jun 25 20:09:29 mfs-master01-rj.btr heartbeat: [654]: info: Local status now set to: 'up'
Jun 25 20:09:29 mfs-master01-rj.btr heartbeat: [654]: info: Status update for node 10.1.1.27: status ping
Jun 25 20:09:29 mfs-master01-rj.btr heartbeat: [654]: info: Link 10.1.1.27:10.1.1.27 up.
Jun 25 20:09:29 mfs-master01-rj.btr heartbeat: [654]: info: Status update for node 10.1.1.28: status ping
Jun 25 20:09:29 mfs-master01-rj.btr heartbeat: [654]: info: Link 10.1.1.28:10.1.1.28 up.
Jun 25 20:09:29 mfs-master01-rj.btr heartbeat: [654]: info: Link mfs-master02-rj.btr:em2 up.
Jun 25 20:09:29 mfs-master01-rj.btr heartbeat: [654]: info: Status update for node mfs-master02-rj.btr: status active
harc(default)[667]:     2015/06/25_20:09:29 info: Running /etc/ha.d//rc.d/status status
Jun 25 20:09:30 mfs-master01-rj.btr heartbeat: [654]: info: Link 10.1.1.1:10.1.1.1 up.
Jun 25 20:09:30 mfs-master01-rj.btr heartbeat: [654]: info: Status update for node 10.1.1.1: status ping
Jun 25 20:09:30 mfs-master01-rj.btr heartbeat: [654]: info: Comm_now_up(): updating status to active
Jun 25 20:09:30 mfs-master01-rj.btr heartbeat: [654]: info: Local status now set to: 'active'
Jun 25 20:09:30 mfs-master01-rj.btr heartbeat: [654]: info: Starting child client "/usr/lib64/heartbeat/ipfail" (497,496)
Jun 25 20:09:30 mfs-master01-rj.btr heartbeat: [685]: info: Starting "/usr/lib64/heartbeat/ipfail" as uid 497  gid 496 (pid 685)
Jun 25 20:09:30 mfs-master01-rj.btr heartbeat: [654]: info: remote resource transition completed.
Jun 25 20:09:30 mfs-master01-rj.btr heartbeat: [654]: info: remote resource transition completed.
Jun 25 20:09:30 mfs-master01-rj.btr heartbeat: [654]: info: Local Resource acquisition completed. (none)
Jun 25 20:09:31 mfs-master01-rj.btr heartbeat: [654]: info: mfs-master02-rj.btr wants to go standby [foreign]
Jun 25 20:09:33 mfs-master01-rj.btr heartbeat: [654]: info: standby: acquire [foreign] resources from mfs-master02-rj.btr
Jun 25 20:09:33 mfs-master01-rj.btr heartbeat: [688]: info: acquire local HA resources (standby).
ResourceManager(default)[701]:  2015/06/25_20:09:34 info: Acquiring resource group: mfs-master01-rj.btr IPaddr::10.1.1.26/24/em4 drbddisk::drbd Filesystem::/dev/drbd0::/usr/local/mfs::ext4 mfsmaster
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_10.1.1.26)[729]:       2015/06/25_20:09:34 INFO:  Resource is stopped
ResourceManager(default)[701]:  2015/06/25_20:09:34 info: Running /etc/ha.d/resource.d/IPaddr 10.1.1.26/24/em4 start
IPaddr(IPaddr_10.1.1.26)[855]:  2015/06/25_20:09:34 INFO: Adding inet address 10.1.1.26/24 with broadcast address 10.1.1.255 to device em4
IPaddr(IPaddr_10.1.1.26)[855]:  2015/06/25_20:09:34 INFO: Bringing device em4 up
IPaddr(IPaddr_10.1.1.26)[855]:  2015/06/25_20:09:34 INFO: /usr/libexec/heartbeat/send_arp -i 200 -r 5 -p /var/run/resource-agents/send_arp-10.1.1.26 em4 10.1.1.26 auto not_used not_used
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_10.1.1.26)[828]:       2015/06/25_20:09:34 INFO:  Success
ResourceManager(default)[701]:  2015/06/25_20:09:34 info: Running /etc/ha.d/resource.d/drbddisk drbd start
/usr/lib/ocf/resource.d//heartbeat/Filesystem(Filesystem_/dev/drbd0)[985]:      2015/06/25_20:09:34 INFO:  Resource is stopped
ResourceManager(default)[701]:  2015/06/25_20:09:34 info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd0 /usr/local/mfs ext4 start
Filesystem(Filesystem_/dev/drbd0)[1071]:        2015/06/25_20:09:34 INFO: Running start for /dev/drbd0 on /usr/local/mfs
/usr/lib/ocf/resource.d//heartbeat/Filesystem(Filesystem_/dev/drbd0)[1063]:     2015/06/25_20:09:34 INFO:  Success
ResourceManager(default)[701]:  2015/06/25_20:09:34 info: Running /etc/ha.d/resource.d/mfsmaster  start
Jun 25 20:09:34 mfs-master01-rj.btr heartbeat: [688]: info: local HA resource acquisition completed (standby).
Jun 25 20:09:34 mfs-master01-rj.btr heartbeat: [654]: info: Standby resource acquisition done [foreign].
Jun 25 20:09:34 mfs-master01-rj.btr heartbeat: [654]: info: Initial resource acquisition complete (auto_failback)
Jun 25 20:09:35 mfs-master01-rj.btr heartbeat: [654]: info: remote resource transition completed.
Jun 25 20:09:42 mfs-master01-rj.btr ipfail: [685]: info: Ping node count is balanced.
Jun 25 20:09:42 mfs-master01-rj.btr ipfail: [685]: info: Giving up foreign resources (auto_failback).
Jun 25 20:09:42 mfs-master01-rj.btr ipfail: [685]: info: Delayed giveup in 4 seconds.
Jun 25 20:09:46 mfs-master01-rj.btr ipfail: [685]: info: giveup() called (timeout worked)
Jun 25 20:09:47 mfs-master01-rj.btr heartbeat: [654]: info: mfs-master01-rj.btr wants to go standby [foreign]
Jun 25 20:09:47 mfs-master01-rj.btr heartbeat: [654]: info: standby: mfs-master02-rj.btr can take our foreign resources
Jun 25 20:09:47 mfs-master01-rj.btr heartbeat: [1166]: info: give up foreign HA resources (standby).
Jun 25 20:09:48 mfs-master01-rj.btr heartbeat: [1166]: info: foreign HA resource release completed (standby).
Jun 25 20:09:48 mfs-master01-rj.btr heartbeat: [654]: info: Local standby process completed [foreign].
Jun 25 20:09:48 mfs-master01-rj.btr heartbeat: [654]: WARN: 1 lost packet(s) for [mfs-master02-rj.btr] [2816:2818]
Jun 25 20:09:48 mfs-master01-rj.btr heartbeat: [654]: info: remote resource transition completed.
Jun 25 20:09:48 mfs-master01-rj.btr heartbeat: [654]: info: No pkts missing from mfs-master02-rj.btr!
Jun 25 20:09:48 mfs-master01-rj.btr heartbeat: [654]: info: Other node completed standby takeover of foreign resources.

mfsmaster standby:

Jun 25 20:09:29 mfs-master02-rj.btr heartbeat: [44613]: info: Heartbeat restart on node mfs-master01-rj.btr
Jun 25 20:09:29 mfs-master02-rj.btr heartbeat: [44613]: info: Link mfs-master01-rj.btr:em2 up.
Jun 25 20:09:29 mfs-master02-rj.btr heartbeat: [44613]: info: Status update for node mfs-master01-rj.btr: status init
Jun 25 20:09:29 mfs-master02-rj.btr ipfail: [44630]: info: Link Status update: Link mfs-master01-rj.btr/em2 now has status up
Jun 25 20:09:29 mfs-master02-rj.btr ipfail: [44630]: info: Status update: Node mfs-master01-rj.btr now has status init
Jun 25 20:09:29 mfs-master02-rj.btr heartbeat: [44613]: info: Status update for node mfs-master01-rj.btr: status up
Jun 25 20:09:29 mfs-master02-rj.btr ipfail: [44630]: info: Status update: Node mfs-master01-rj.btr now has status up
harc(default)[48262]:   2015/06/25_20:09:29 info: Running /etc/ha.d//rc.d/status status
harc(default)[48279]:   2015/06/25_20:09:29 info: Running /etc/ha.d//rc.d/status status
Jun 25 20:09:30 mfs-master02-rj.btr heartbeat: [44613]: info: Status update for node mfs-master01-rj.btr: status active
Jun 25 20:09:30 mfs-master02-rj.btr ipfail: [44630]: info: Status update: Node mfs-master01-rj.btr now has status active
harc(default)[48296]:   2015/06/25_20:09:30 info: Running /etc/ha.d//rc.d/status status
Jun 25 20:09:30 mfs-master02-rj.btr heartbeat: [44613]: info: remote resource transition completed.
Jun 25 20:09:30 mfs-master02-rj.btr heartbeat: [44613]: info: mfs-master02-rj.btr wants to go standby [foreign]
Jun 25 20:09:31 mfs-master02-rj.btr heartbeat: [44613]: info: standby: mfs-master01-rj.btr can take our foreign resources
Jun 25 20:09:31 mfs-master02-rj.btr heartbeat: [48313]: info: give up foreign HA resources (standby).
ResourceManager(default)[48326]:        2015/06/25_20:09:31 info: Releasing resource group: mfs-master01-rj.btr IPaddr::10.1.1.26/24/em4 drbddisk::drbd Filesystem::/dev/drbd0::/usr/local/mfs::ext4 mfsmaster
ResourceManager(default)[48326]:        2015/06/25_20:09:31 info: Running /etc/ha.d/resource.d/mfsmaster  stop
Jun 25 20:09:32 mfs-master02-rj.btr ipfail: [44630]: info: Asking other side for ping node count.
ResourceManager(default)[48326]:        2015/06/25_20:09:33 info: Running /etc/ha.d/resource.d/Filesystem /dev/drbd0 /usr/local/mfs ext4 stop
Filesystem(Filesystem_/dev/drbd0)[48389]:       2015/06/25_20:09:33 INFO: Running stop for /dev/drbd0 on /usr/local/mfs
Filesystem(Filesystem_/dev/drbd0)[48389]:       2015/06/25_20:09:33 INFO: Trying to unmount /usr/local/mfs
Filesystem(Filesystem_/dev/drbd0)[48389]:       2015/06/25_20:09:33 INFO: unmounted /usr/local/mfs successfully
/usr/lib/ocf/resource.d//heartbeat/Filesystem(Filesystem_/dev/drbd0)[48381]:    2015/06/25_20:09:33 INFO:  Success
ResourceManager(default)[48326]:        2015/06/25_20:09:33 info: Running /etc/ha.d/resource.d/drbddisk drbd stop
ResourceManager(default)[48326]:        2015/06/25_20:09:33 info: Running /etc/ha.d/resource.d/IPaddr 10.1.1.26/24/em4 stop
IPaddr(IPaddr_10.1.1.26)[48545]:        2015/06/25_20:09:33 INFO: IP status = ok, IP_CIP=
/usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_10.1.1.26)[48519]:     2015/06/25_20:09:33 INFO:  Success
Jun 25 20:09:33 mfs-master02-rj.btr heartbeat: [48313]: info: foreign HA resource release completed (standby).
Jun 25 20:09:33 mfs-master02-rj.btr heartbeat: [44613]: info: Local standby process completed [foreign].
Jun 25 20:09:34 mfs-master02-rj.btr heartbeat: [44613]: WARN: 1 lost packet(s) for [mfs-master01-rj.btr] [13:15]
Jun 25 20:09:34 mfs-master02-rj.btr heartbeat: [44613]: info: remote resource transition completed.
Jun 25 20:09:34 mfs-master02-rj.btr heartbeat: [44613]: info: No pkts missing from mfs-master01-rj.btr!
Jun 25 20:09:34 mfs-master02-rj.btr heartbeat: [44613]: info: Other node completed standby takeover of foreign resources.
Jun 25 20:09:42 mfs-master02-rj.btr ipfail: [44630]: info: No giveup timer to abort.
Jun 25 20:09:47 mfs-master02-rj.btr heartbeat: [44613]: info: mfs-master01-rj.btr wants to go standby [foreign]
Jun 25 20:09:47 mfs-master02-rj.btr heartbeat: [44613]: info: standby: acquire [foreign] resources from mfs-master01-rj.btr
Jun 25 20:09:47 mfs-master02-rj.btr heartbeat: [48610]: info: acquire local HA resources (standby).
Jun 25 20:09:47 mfs-master02-rj.btr heartbeat: [48610]: info: local HA resource acquisition completed (standby).
Jun 25 20:09:47 mfs-master02-rj.btr heartbeat: [44613]: info: Standby resource acquisition done [foreign].
Jun 25 20:09:48 mfs-master02-rj.btr heartbeat: [44613]: info: remote resource transition completed.

State of the primary and standby mfsmaster servers after recovery:

mfsmaster primary:

[root@mfs-master01-rj ~]# ps -ef|awk '$0!~"awk"&&$0~"mfs"{print}'
mfs       1165     1  0 20:09 ?        00:00:00 /usr/local/mfs/sbin/mfsmaster start
[root@mfs-master01-rj ~]# ip a|grep  em4                         
5: em4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    inet 10.1.1.27/24 brd 10.1.1.255 scope global em4
    inet 10.1.1.26/24 brd 10.1.1.255 scope global secondary em4
[root@mfs-master01-rj ~]# cat /proc/drbd                         
version: 8.4.3 (api:1/proto:86-101)
GIT-hash: 89a294209144b68adb3ee85a73221f964d3ee515 build by root@mfs-master01-rj.btr, 2015-06-25 17:20:34
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:12288 nr:5600 dw:15868 dr:18468 al:13 bm:14 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:0

mfsmaster standby:

[root@mfs-master02-rj ~]# ps -ef|awk '$0!~"awk"&&$0~"mfs"{print}'
[root@mfs-master02-rj ~]# ip a|grep  em4                         
5: em4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    inet 10.1.1.28/24 brd 10.1.1.255 scope global em4
[root@mfs-master02-rj ~]# cat /proc/drbd                         
version: 8.4.3 (api:1/proto:86-101)
GIT-hash: 89a294209144b68adb3ee85a73221f964d3ee515 build by root@mfs-master02-rj.btr, 2015-06-25 17:20:33
 0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----
    ns:5600 nr:12324 dw:106647108 dr:18541 al:12 bm:6414 lo:0 pe:0 ua:0 ap:0 ep:1 wo:d oos:0

mfs client output during recovery (again a stall of about 13 seconds between the two marked lines):

Thu Jun 25 18:56:33 CST 2015 314
Thu Jun 25 18:56:34 CST 2015 315
Thu Jun 25 18:56:35 CST 2015 316
Thu Jun 25 18:56:36 CST 2015 317        #######
Thu Jun 25 18:56:49 CST 2015 318        #######
Thu Jun 25 18:56:50 CST 2015 319
Thu Jun 25 18:56:51 CST 2015 320
Thu Jun 25 18:56:52 CST 2015 321

When reposting, please keep the permanent link: https://linuxeye.com/Linux/2709.html
