Florian Philippon
2016-Jun-20 08:43 UTC
[Gluster-users] Blocked inodes locking on some files
Hello guys,

I would like to get some advice on some problems we have on our 3-host gluster setup.

Here is the setup used:

1. GlusterFS 3.8.0-1 (we did an upgrade from 3.7.11 last week)
2. Type: Disperse
3. Number of Bricks: 1 x (2 + 1) = 3
4. Transport-type: tcp
5. Options Reconfigured: transport.address-family: inet

Please note that we also have the ACL option enabled on the volume mount.

Use case:

A user submits jobs/tasks to a Spark cluster which has the glusterfs volume mounted on each host. 13 tasks completed successfully in ~30 min each (they convert some logs to a JSON format and write the output to the gluster fs), but one had been blocked for more than 12 hours by the time we checked what was going wrong.

We found some log entries related to an inode lock in the brick log on one host:

    [2016-06-19 03:15:08.563397] E [inodelk.c:304:__inode_unlock_lock] 0-exp-locks: Matching lock not found for unlock 0-9223372036854775807, by 10613ebc6c6a0000 on 0x6cee5c0f4730
    [2016-06-19 03:15:08.563684] E [MSGID: 115053] [server-rpc-fops.c:273:server_inodelk_cbk] 0-exp-server: 5375861: INODELK /spark/user/20160328/_temporary/0/_temporary (015bde3a-09d6-41a2-8e9f-7e7c5295d596) ==> (Invalid argument) [Invalid argument]

Errors in the data log:

    [2016-06-19 03:13:29.198676] I [MSGID: 109036] [dht-common.c:8824:dht_log_new_layout_for_dir_selfheal] 0-exp-dht: Setting layout of /spark/user/20160328/_temporary/0/_temporary/attempt_201606190511_0004_m_000004_26 with [Subvol_name: exp-disperse-0, Err: -1 , Start: 0 , Stop: 4294967295 , Hash: 1 ],
    [2016-06-19 03:14:59.349357] I [MSGID: 109066] [dht-rename.c:1562:dht_rename] 0-exp-dht: renaming /spark/user/20160328/_temporary/0/_temporary/attempt_201606190511_0004_m_000001_23 (hash=exp-disperse-0/cache=exp-disperse-0) => /spark/user/20160328/_temporary/0/task_201606190511_0004_m_000001 (hash=exp-disperse-0/cache=<nul>)

And these entries are also spamming the data log whenever an action is done on the fs:

    [2016-06-19 13:58:22.817308] I [dict.c:462:dict_get] (-->/usr/lib64/glusterfs/3.8.0/xlator/debug/io-stats.so(+0x13628) [0x6f0655cd1628] -->/usr/lib64/glusterfs/3.8.0/xlator/system/posix-acl.so(+0x9ccb) [0x6f0655ab5ccb] -->/lib64/libglusterfs.so.0(dict_get+0xec) [0x6f066528df7c] ) 0-dict: !this || key=system.posix_acl_access [Invalid argument]
    [2016-06-19 13:58:22.817364] I [dict.c:462:dict_get] (-->/usr/lib64/glusterfs/3.8.0/xlator/debug/io-stats.so(+0x13628) [0x6f0655cd1628] -->/usr/lib64/glusterfs/3.8.0/xlator/system/posix-acl.so(+0x9d21) [0x6f0655ab5d21] -->/lib64/libglusterfs.so.0(dict_get+0xec) [0x6f066528df7c] ) 0-dict: !this || key=system.posix_acl_default [Invalid argument]

We did a statedump and got confirmation that some processes were in a blocked state.

We cleared the lock on the blocked inode and the Spark job finally finished (with errors).

What could be the root cause of these locks?

Thanks for your help!

Florian
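P.S. For anyone hitting the same problem: the statedump and the lock clearing were done with the standard gluster CLI, roughly along these lines (the volume name "exp" and the blocked path are taken from the logs above; the statedump output goes to the default directory, /var/run/gluster, unless configured otherwise, and the 0,0-0 range is the whole-file range used in the gluster documentation examples):

    # Dump the internal state of every brick process of the volume;
    # the dump files land in /var/run/gluster/ by default.
    gluster volume statedump exp

    # Release all inode locks held on the blocked directory.
    # Syntax: gluster volume clear-locks <vol> <path> kind {blocked|granted|all} inode [range]
    gluster volume clear-locks exp /spark/user/20160328/_temporary/0/_temporary kind all inode 0,0-0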
Florian Philippon
2016-Oct-27 09:29 UTC
[Gluster-users] Blocked inodes locking on some files
Hello community,

I'm still facing this issue with GlusterFS 3.8.5-1, and I would like to know whether it could be related to a potential bug in the disperse mode, or to the way Spark writes job results to the gluster mountpoint.

I found the following locks using a statedump:

    [xlator.features.locks.exp-locks.inode]
    path=/tests/output/_temporary/0/_temporary
    mandatory=0
    inodelk-count=3
    lock-dump.domain.domain=dht.layout.heal
    lock-dump.domain.domain=exp-disperse-0
    inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid 10978, owner=60af7c2a0c7f0000, client=0x7f4fd4009900, connection-id=server1-5422-2016/10/26-15:03:03:550732-exp-client-0-0-0, blocked at 2016-10-26 15:53:13, granted at 2016-10-26 15:53:14
    inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid 19137, owner=18590414617f0000, client=0x7f4fdc0e1950, connection-id=server2-1720-2016/10/26-15:05:59:328008-exp-client-0-0-0, blocked at 2016-10-26 15:53:14
    inodelk.inodelk[2](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid 9798, owner=303b34a4e8720000, client=0x7f4fdc0deac0, connection-id=server1-3955-2016/10/26-15:04:00:874305-exp-client-0-0-0, blocked at 2016-10-26 15:53:18

Many thanks for your advice and help on this topic.
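P.S. For reference, the locks above were found by taking a fresh statedump and grepping the brick dumps for blocked inode locks, roughly like this (the dump file glob below assumes the default statedump directory; the exact file names may differ on your setup):

    # Take a statedump of all brick processes of the volume.
    gluster volume statedump exp

    # List blocked inode locks together with the path they are held on.
    grep -B 6 'BLOCKED' /var/run/gluster/*.dump.*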
--
Florian Philippon