Hello,

We are currently figuring out how to add GlusterFS to our system to make our systems highly available using scripts. We are using Gluster 3.7.11.

Problem:
We are trying to migrate from a non-clustered system to a 3-node GlusterFS replicated cluster using scripts. We have tried various things to make this work, but it sometimes leaves us in an undesirable state where calling "gluster volume heal <volname> full" returns the error message, "Launching heal operation to perform full self heal on volume <volname> has been unsuccessful on bricks that are down. Please check if all brick processes are running." Yet all the brick processes are running according to "gluster volume status <volname>".

Things we have tried (in order of preference):
1. Create the volume with 3 filesystems holding the same data
2. Create the volume with 2 empty filesystems and one with the data
3. Create the volume with only the filesystem holding the data, then use the "add-brick" command to add the other two empty filesystems
4. Create the volume with one empty filesystem, mount it, and copy the data over; then use the "add-brick" command to add the other two empty filesystems
5. Create the volume with 3 empty filesystems, mount it, and then copy the data over

Other things to note:
A few minutes after the volume is created and started successfully, our application server starts up against it, so reads and writes may happen quite soon after the volume has started. There is, however, only about 50 MB of data.
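Since the heal command complains about bricks being down even though "gluster volume status" shows them running, one option is to verify programmatically that every brick reports online before issuing the heal, rather than trusting timing. A minimal sketch; the helper names are ours, and the column layout (Online as the second-to-last field of each "Brick" line) is assumed from typical 3.7.x "gluster volume status" output, so adjust the awk pattern if yours differs:

```shell
#!/bin/sh
# count_online_bricks: reads "gluster volume status <volname>" output on
# stdin and prints how many "Brick" lines show "Y" in the Online column.
count_online_bricks() {
    awk '/^Brick/ { if ($(NF-1) == "Y") n++ } END { print n+0 }'
}

# wait_for_bricks: poll until <expected> bricks of <volname> report
# online; only then is it reasonable to launch a full heal.
wait_for_bricks() {
    vol=$1; expected=$2
    until [ "$(gluster volume status "$vol" | count_online_bricks)" -eq "$expected" ]; do
        sleep 2
    done
}
```

Usage would be along the lines of `wait_for_bricks <volname> 3 && gluster volume heal <volname> full`.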
Steps to reproduce (all in a script):

# This is run on the primary node with the IP address <server-ip-1>, which has the data
systemctl restart glusterd
gluster peer probe <server-ip-2>
gluster peer probe <server-ip-3>
# Wait for "gluster peer status" to report "Peer in Cluster" for all peers
gluster volume create <volname> replica 3 transport tcp ${BRICKS[0]} ${BRICKS[1]} ${BRICKS[2]} force
gluster volume set <volname> nfs.disable true
gluster volume start <volname>
mkdir -p $MOUNT_POINT
mount -t glusterfs <server-ip-1>:/volname $MOUNT_POINT
find $MOUNT_POINT | xargs stat

Note that, when we added sleeps around the gluster commands, there was a higher probability of success, but not 100%.

# Once the volume is started, all the clients/servers mount the gluster filesystem by polling "mountpoint -q $MOUNT_POINT":
mkdir -p $MOUNT_POINT
mount -t glusterfs <server-ip-1>:/volname $MOUNT_POINT

Logs:

*etc-glusterfs-glusterd.vol.log* on *server-ip-1*:

[2016-06-21 14:10:38.285234] I [MSGID: 106533] [glusterd-volume-ops.c:857:__glusterd_handle_cli_heal_volume] 0-management: Received heal vol req for volume volname
[2016-06-21 14:10:38.296801] E [MSGID: 106153] [glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: Commit failed on <server-ip-2>. Please check log file for details.

*usr-local-volname-data-mirrored-data.log* on *server-ip-1*:

[2016-06-21 14:14:39.233366] E [MSGID: 114058] [client-handshake.c:1524:client_query_portmap_cbk] 0-volname-client-0: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
*I think this is caused by the self-heal daemon.*

*cmd_history.log* on *server-ip-1*:

[2016-06-21 14:10:38.298800] : volume heal volname full : FAILED : Commit failed on <server-ip-2>. Please check log file for details.
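The fact that sleeps around the gluster commands raise the success rate suggests a race between "peer probe" and "volume create"; the fixed sleeps could be replaced with explicit polling of the peer state. A sketch, with helper names of our choosing; the "State: Peer in Cluster (Connected)" line is the state string typically printed by "gluster peer status" on 3.7.x:

```shell
#!/bin/sh
# count_connected_peers: reads "gluster peer status" output on stdin and
# prints the number of peers in the "Peer in Cluster (Connected)" state.
count_connected_peers() {
    grep -c 'State: Peer in Cluster (Connected)'
}

# wait_for_peers: block until <expected> peers are connected, instead of
# sleeping a fixed amount between "peer probe" and "volume create".
wait_for_peers() {
    expected=$1
    until [ "$(gluster peer status | count_connected_peers)" -eq "$expected" ]; do
        sleep 2
    done
}
```

In the script above this would sit between the two "gluster peer probe" commands and "gluster volume create", e.g. `wait_for_peers 2`.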
On Tue, 21.06.2016 at 19:22, Danny Lee wrote:
> 4. Create Volume with one empty filesystem, mounting it, and then copying
> the data over to that one. And then finally, using "add-brick" command to
> add the other two empty filesystems

- should be working
- read each file on /mnt/gvol, to trigger replication [2]

> 5. Create Volume with 3 empty filesystems, mounting it, and then copying
> the data over

- my favorite

> find $MOUNT_POINT | xargs stat

I have written a script for 2 nodes [1], but it should be at least 3 nodes. I hope it helps you.

Regards,
Heiko

[1] http://www2.fh-lausitz.de/launic/comp/net/glusterfs/130620.glusterfs.create_brick_vol.howto.txt - old, limited to 2 nodes

_______________________________________________
Gluster-users mailing list
Gluster-users at gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users
Notice that it actually tells you to look in the logs on server-ip-2, but you did not include any logs from that server.

On June 21, 2016 10:22:14 AM PDT, Danny Lee <dannyl at vt.edu> wrote:
>[2016-06-21 14:10:38.296801] E [MSGID: 106153]
>[glusterd-syncop.c:113:gd_collate_errors] 0-glusterd: Commit failed on
><server-ip-2>. Please check log file for details.
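Since the commit failure points at server-ip-2, the glusterd log from that node is what is needed. A hypothetical collection snippet; the ssh access, the host arguments, and the default /var/log/glusterfs log path are assumptions, not something stated in the thread:

```shell
#!/bin/sh
# extract_errors: reads a glusterd log on stdin and keeps only the
# error-level entries, which look like "[<timestamp>] E [MSGID: ...] ...".
extract_errors() {
    grep '\] E \['
}

# For each hostname given as an argument, pull the tail of its glusterd
# log and print only the errors around the failed heal commit.
for host in "$@"; do
    echo "=== $host ==="
    ssh "$host" 'tail -n 200 /var/log/glusterfs/etc-glusterfs-glusterd.vol.log' | extract_errors
done
```

Running it as e.g. `./collect_errors.sh <server-ip-2> <server-ip-3>` would surface the server-ip-2 side of the "Commit failed" message.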