On 26/03/2016 11:58 PM, Pranith Kumar Karampuri wrote:
>> Is that the same issue I posted earlier re "gluster volume heal
>> info" appearing to block I/O?
>>
> I don't think it is heal info that is blocking I/O. I think it is the
> client triggering the heal and blocking the fop until the heal
> completes that results in this pattern. Disabling data-heal should get
> you out of this problem.

I tried it earlier and it didn't seem to help.

Does anything need to be restarted after cluster.data-self-heal is set
off?

--
Lindsay Mathieson
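For reference, the option under discussion is set per volume from the
gluster CLI. A minimal sketch, assuming the volume name datastore2 that
appears later in the thread:

    # Disable client-side data self-heal for the volume. This only
    # affects new fops; heals already in flight keep running (see
    # Pranith's note further down).
    gluster volume set datastore2 cluster.data-self-heal off

    # Verify it appears under "Options Reconfigured":
    gluster volume info datastore2

Volume-set options are pushed to mounted clients through the volfile,
so no daemon or mount restart should normally be needed for the option
itself to take effect.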
On 27/03/2016 12:33 AM, Lindsay Mathieson wrote:
> On 26/03/2016 11:58 PM, Pranith Kumar Karampuri wrote:
>>> Is that the same issue I posted earlier re "gluster volume heal
>>> info" appearing to block I/O?
>>>
>> I don't think it is heal info that is blocking I/O. I think it is
>> the client triggering the heal and blocking the fop until the heal
>> completes that results in this pattern. Disabling data-heal should
>> get you out of this problem.
>
> I tried it earlier and it didn't seem to help.
>
> Does anything need to be restarted after cluster.data-self-heal is
> set off?

Tried again this morning; it 100% replicates the behaviour I noted in:

> After testing the heal process by killing glusterfsd on a node, I
> noticed the following.
>
> - I/O continued at normal speed while glusterfsd was down.
>
> - After restarting glusterfsd, I/O still continued as normal.
>
> - Performing a "gluster volume heal datastore2 info" would show some
>   info, then hang.
>
> - I/O on the cluster would cease, e.g. in a VM where I was running a
>   command-line build of a large project, the build just stopped. The
>   VM itself was mostly responsive, but anything that involved
>   accessing the disk hung.
>
> - If I killed the "gluster volume heal datastore2 info" command, I/O
>   in the VMs resumed at a normal pace.
>
> - If I then reissued the "gluster volume heal datastore2 info"
>   command, I/O would continue for a short while (seconds to minutes)
>   before hanging again.
>
> - Killing the heal info command would resume I/O again.

iowait and CPU are under 4% on all three nodes.

Even after I shut down all VMs on datastore2, "gluster volume heal
datastore2 info" hung indefinitely with no output. I had to stop/start
datastore2 before the info would work; it then returned very quickly
with:

Brick vnb.proxmox.softlog:/tank/vmdata/datastore2
Number of entries: 0

Brick vng.proxmox.softlog:/tank/vmdata/datastore2
/.shard - Possibly undergoing heal
Number of entries: 1

Brick vna.proxmox.softlog:/tank/vmdata/datastore2
/.shard - Possibly undergoing heal
Number of entries: 1

Unfortunately it has stayed that way for 10 minutes now.

I'd like to recheck this behaviour under 3.7.7 - can I just revert to
that (Debian packages) without recreating the datastore?

thanks,

--
Lindsay Mathieson
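A rough sketch of the reproduction sequence described above, with a
timeout guard so a hung heal-info does not have to be killed by hand.
The brick PID placeholder is illustrative; the volume name comes from
the listing above:

    # On one node, find the brick's glusterfsd PID and kill it:
    gluster volume status datastore2    # lists brick PIDs
    kill <brick-pid>                    # <brick-pid> taken from the output

    # Generate I/O in a VM, then restart the dead brick:
    gluster volume start datastore2 force

    # Query heal info with a guard, since it can hang indefinitely here:
    timeout 30 gluster volume heal datastore2 info || echo "heal info hung"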
Pranith Kumar Karampuri
2016-Mar-28 06:31 UTC
[Gluster-users] Very poor heal behaviour in 3.7.9
On 03/26/2016 08:03 PM, Lindsay Mathieson wrote:
> On 26/03/2016 11:58 PM, Pranith Kumar Karampuri wrote:
>>> Is that the same issue I posted earlier re "gluster volume heal
>>> info" appearing to block I/O?
>>>
>> I don't think it is heal info that is blocking I/O. I think it is
>> the client triggering the heal and blocking the fop until the heal
>> completes that results in this pattern. Disabling data-heal should
>> get you out of this problem.
>
> I tried it earlier and it didn't seem to help.
>
> Does anything need to be restarted after cluster.data-self-heal is
> set off?

Self-heals that were already triggered won't be cancelled when the
option is turned off, so the mount will keep blocking I/O until the
ongoing self-heals complete :-(.

Pranith
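For watching whether those in-flight heals have drained, the heal
statistics and self-heal-daemon status commands give a summary, with
the caveat that they also go through glusterd and may be subject to
the same hang described earlier in the thread; the volume name is
assumed from above:

    # Count of entries still pending heal, per brick:
    gluster volume heal datastore2 statistics heal-count

    # Status of the self-heal daemon on each node:
    gluster volume status datastore2 shd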