On 26/03/2016 11:58 PM, Pranith Kumar Karampuri wrote:
>> Is that the same issue I posted earlier re "gluster volume heal
>> info" appearing to block I/O?
>>
> I don't think it is heal info that is blocking I/O. I think it is the
> client triggering the heal and blocking the fop until the heal
> completes that results in this pattern. Disabling data-heal should get
> you out of this problem.

I tried it earlier and it didn't seem to help.

Does anything need to be restarted after cluster.data-self-heal is set
off?

--
Lindsay Mathieson
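For reference, the option under discussion is set per volume from the
gluster CLI. A minimal sketch, assuming the volume name datastore2 that
appears later in the thread:

    # Disable client-side data self-heal for the volume. This only
    # affects new fops; heals already in flight keep running (see
    # Pranith's note further down).
    gluster volume set datastore2 cluster.data-self-heal off

    # Verify it appears under "Options Reconfigured":
    gluster volume info datastore2

Volume-set options are pushed to mounted clients through the volfile,
so no daemon or mount restart should normally be needed for the option
itself to take effect.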
On 27/03/2016 12:33 AM, Lindsay Mathieson wrote:
> On 26/03/2016 11:58 PM, Pranith Kumar Karampuri wrote:
>>> Is that the same issue I posted earlier re "gluster volume heal
>>> info" appearing to block I/O?
>>>
>> I don't think it is heal info that is blocking I/O. I think it is
>> the client triggering the heal and blocking the fop until the heal
>> completes that results in this pattern. Disabling data-heal should
>> get you out of this problem.
>
> I tried it earlier and it didn't seem to help.
>
> Does anything need to be restarted after cluster.data-self-heal is
> set off?

Tried again this morning; it 100% replicates the behaviour I noted in:

> After testing the heal process by killing glusterfsd on a node, I
> noticed the following.
>
> - I/O continued at normal speed while glusterfsd was down.
>
> - After restarting glusterfsd, I/O still continued as normal.
>
> - Performing a "gluster volume heal datastore2 info" would show some
>   info, then hang.
>
> - I/O on the cluster would cease, e.g. in a VM where I was running a
>   command-line build of a large project, the build just stopped. The
>   VM itself was mostly responsive, but anything that involved
>   accessing the disk hung.
>
> - If I killed the "gluster volume heal datastore2 info" command, I/O
>   in the VMs resumed at a normal pace.
>
> - If I then reissued the "gluster volume heal datastore2 info"
>   command, I/O would continue for a short while (seconds to minutes)
>   before hanging again.
>
> - Killing the heal info command would resume I/O again.

iowait and CPU are under 4% on all three nodes.

Even after I shut down all VMs on datastore2, "gluster volume heal
datastore2 info" hung indefinitely with no output. I had to stop/start
datastore2 before the info would work; it then returned very quickly
with:

Brick vnb.proxmox.softlog:/tank/vmdata/datastore2
Number of entries: 0

Brick vng.proxmox.softlog:/tank/vmdata/datastore2
/.shard - Possibly undergoing heal
Number of entries: 1

Brick vna.proxmox.softlog:/tank/vmdata/datastore2
/.shard - Possibly undergoing heal
Number of entries: 1

Unfortunately it has stayed that way for 10 minutes now.

I'd like to recheck this behaviour under 3.7.7 - can I just revert to
that (Debian packages) without recreating the datastore?

thanks,

--
Lindsay Mathieson
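A rough sketch of the reproduction sequence described above, with a
timeout guard so a hung heal-info does not have to be killed by hand.
The brick PID placeholder is illustrative; the volume name comes from
the listing above:

    # On one node, find the brick's glusterfsd PID and kill it:
    gluster volume status datastore2    # lists brick PIDs
    kill <brick-pid>                    # <brick-pid> taken from the output

    # Generate I/O in a VM, then restart the dead brick:
    gluster volume start datastore2 force

    # Query heal info with a guard, since it can hang indefinitely here:
    timeout 30 gluster volume heal datastore2 info || echo "heal info hung"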
Pranith Kumar Karampuri
2016-Mar-28 06:31 UTC
[Gluster-users] Very poor heal behaviour in 3.7.9
On 03/26/2016 08:03 PM, Lindsay Mathieson wrote:
> On 26/03/2016 11:58 PM, Pranith Kumar Karampuri wrote:
>>> Is that the same issue I posted earlier re "gluster volume heal
>>> info" appearing to block I/O?
>>>
>> I don't think it is heal info that is blocking I/O. I think it is
>> the client triggering the heal and blocking the fop until the heal
>> completes that results in this pattern. Disabling data-heal should
>> get you out of this problem.
>
> I tried it earlier and it didn't seem to help.
>
> Does anything need to be restarted after cluster.data-self-heal is
> set off?

Self-heals that were already triggered won't be cancelled when the
option is turned off, so the mount will keep blocking I/O until the
ongoing self-heals complete :-(.

Pranith
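For watching whether those in-flight heals have drained, the heal
statistics and self-heal-daemon status commands give a summary, with
the caveat that they also go through glusterd and may be subject to
the same hang described earlier in the thread; the volume name is
assumed from above:

    # Count of entries still pending heal, per brick:
    gluster volume heal datastore2 statistics heal-count

    # Status of the self-heal daemon on each node:
    gluster volume status datastore2 shd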