On 03/16/2016 10:57 PM, Oleksandr Natalenko wrote:
> OK, I've repeated the test with the following hierarchy:
>
> * 10 top-level folders with 10 second-level folders each;
> * 10 000 files in each second-level folder.
>
> So, this composes 10×10×10000 = 1M files and 100 folders.
>
> Initial brick used space: 33M
> Initial inodes count: 24
>
> After the test:
>
> * each brick in the replica took 18G, and the arbiter brick took 836M;
> * inodes count: 1066036
>
> So:
>
> (836 - 33) MB / (1066036 - 24) == 790 bytes per inode.
>
> So, yes, the value is slightly bigger than in the previous test, I guess
> due to the large number of files per folder, but it is still far from 4k.
> Given that a good engineer should keep a 30% reserve, the ratio is about
> 1k per stored inode.
>
> Correct me if I'm missing something (regarding average workload, not
> corner cases).

Looks okay to me Oleksandr. You might want to make a github gist of your
tests+results as a reference for others.
Regards,
Ravi

> Test script is here: [1]
>
> Regards,
> Oleksandr.
>
> [1] http://termbin.com/qlvz
>
> On Tuesday, 8 March 2016 19:13:05 EET Ravishankar N wrote:
>> On 03/05/2016 03:45 PM, Oleksandr Natalenko wrote:
>>> In order to estimate GlusterFS arbiter brick size, I've deployed a test
>>> setup with a replica 3 arbiter 1 volume within one node. Each brick is
>>> located on a separate HDD (XFS with inode size == 512). Using GlusterFS
>>> v3.7.6 + memleak patches. Volume options are kept at defaults.
>>>
>>> Here is the script that creates files and folders in the mounted
>>> volume: [1]
>>>
>>> The script creates 1M files of random size (between 1 and 32768 bytes)
>>> and some amount of folders. After running it I've got 1036637 folders.
>>> So, in total it is 2036637 files and folders.
>>>
>>> The initial used space on each brick is 42M.
>>> After running the script I've got:
>>>
>>> replica bricks 1 and 2: 19867168 kbytes == 19G
>>> arbiter brick: 1872308 kbytes == 1.8G
>>>
>>> The amount of inodes on each brick is 3139091. So here goes the
>>> estimation.
>>>
>>> Dividing arbiter used space by files+folders we get:
>>>
>>> (1872308 - 42000) / 2036637 == 899 bytes per file or folder
>>>
>>> Dividing arbiter used space by inodes we get:
>>>
>>> (1872308 - 42000) / 3139091 == 583 bytes per inode
>>>
>>> Not sure which calculation is correct.
>> I think the first one is right, because you still haven't used up all
>> the inodes (2036637 used vs. the max. permissible 3139091). But again,
>> this is an approximation, because not all files would take 899 bytes.
>> For example, if there are a thousand files present in a directory, then
>> du <dirname> would be more than du <file>, because the directory will
>> take some disk space to store the dentries.
>>
>>> I guess we should consider the one that accounts for inodes, because
>>> of the .glusterfs/ folder data.
>>>
>>> Nevertheless, in contrast, documentation [2] says it should be 4096
>>> bytes per file. Am I wrong with my calculations?
>> The 4KB is a conservative estimate, considering the fact that though
>> the arbiter brick does not store data, it still keeps a copy of both
>> user and gluster xattrs. For example, if the application sets a lot of
>> xattrs, they can consume a data block if they cannot be accommodated in
>> the inode itself. Also, there is the .glusterfs folder, like you said,
>> which would take up some space.
>> Here is what I tried on an XFS brick:
>>
>> [root@ravi4 brick]# touch file
>>
>> [root@ravi4 brick]# ls -l file
>> -rw-r--r-- 1 root root 0 Mar 8 12:54 file
>>
>> [root@ravi4 brick]# du file
>> 0       file
>>
>> [root@ravi4 brick]# for i in {1..100}
>> > do
>> > setfattr -n user.value$i -v value$i file
>> > done
>>
>> [root@ravi4 brick]# ls -l file
>> -rw-r--r-- 1 root root 0 Mar 8 12:54 file
>>
>> [root@ravi4 brick]# du -h file
>> 4.0K    file
>>
>> Hope this helps,
>> Ravi
>>
>>> Pranith?
>>>
>>> [1] http://termbin.com/ka9x
>>> [2]
>>> http://gluster.readthedocs.org/en/latest/Administrator%20Guide/arbiter-volumes-and-quorum/
>>> _______________________________________________
>>> Gluster-devel mailing list
>>> Gluster-devel@gluster.org
>>> http://www.gluster.org/mailman/listinfo/gluster-devel
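A rough back-of-envelope sketch of why Ravi's 100 small xattrs push the
file's attributes out of the 512-byte inode and into a 4 KiB block. The
fixed-overhead figures below are assumptions for illustration, not exact
XFS on-disk accounting:

```python
# Back-of-envelope estimate (assumed overhead figures, not exact XFS
# on-disk numbers) for the xattr experiment quoted above.

INODE_SIZE = 512        # brick formatted with inode size == 512
CORE_OVERHEAD = 176     # assumed fixed inode-core bytes; the remainder
                        # is the "literal area" shared by extents/xattrs
PER_ENTRY_OVERHEAD = 7  # assumed per-xattr header bytes in shortform

def xattr_bytes(n):
    # setfattr -n user.value$i -v value$i; XFS records the namespace as
    # a flag, so only "valueN" counts toward the stored name length
    total = 0
    for i in range(1, n + 1):
        name = value = f"value{i}"
        total += len(name) + len(value) + PER_ENTRY_OVERHEAD
    return total

needed = xattr_bytes(100)
available = INODE_SIZE - CORE_OVERHEAD
print(f"~{needed} bytes of xattrs vs ~{available} bytes in-inode")
print("spills to a separate block:", needed > available)
```

Under these assumptions the 100 xattrs need roughly 2 KB, several times
the in-inode space, which is consistent with du jumping from 0 to 4.0K.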
Ravi, I will definitely arrange the results into some short handy document
and post it here.

Also, @JoeJulian on IRC suggested that I perform this test on XFS bricks
with inode sizes of 256b and 1k:

==
22:38 <@JoeJulian> post-factum: Just wondering what 256 byte inodes might
                   look like for that. And, by the same token, 1k inodes.
22:39 < post-factum> JoeJulian: should I try 1k inodes instead?
22:41 <@JoeJulian> post-factum: Doesn't hurt to try. My expectation is
                   that disk usage will go up despite inode usage going
                   down.
22:41 < post-factum> JoeJulian: ok, will check that
22:41 <@JoeJulian> post-factum: and with 256, I'm curious if inode usage
                   will stay close to the same while disk usage goes down.
==

Here are the results for 1k:

(1171336 - 33000) / (1066036 - 23) == 1068 bytes per inode.

Disk usage is indeed higher (1.2G), but inode usage is the same. Will test
with 256b inodes now.

17.03.2016 06:28, Ravishankar N wrote:
> Looks okay to me Oleksandr. You might want to make a github gist of
> your tests+results as a reference for others.
And for 256b inodes:

(597904 - 33000) / (1066036 - 23) == 530 bytes per inode.

So I still consider 1k to be a good estimate for an average workload.

Regards,
Oleksandr.

On Thursday, 17 March 2016 09:58:14 EET Ravishankar N wrote:
> Looks okay to me Oleksandr. You might want to make a github gist of your
> tests+results as a reference for others.
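Pulling the three runs together, the thread's own arithmetic can be
reproduced in a few lines. Note the units are mixed across the messages
(binary megabytes for the 512b run, decimal kilobytes for the others);
the figures below come out exactly as quoted:

```python
# Arbiter-brick bytes consumed per inode for each XFS inode size tested
# in this thread. Deltas are (after - before) disk usage, in the units
# each message reported them in.
runs = {
    "256b": (597904 - 33000) * 1000,    # kbytes (decimal)
    "512b": (836 - 33) * 1024 * 1024,   # megabytes (binary)
    "1k":   (1171336 - 33000) * 1000,   # kbytes (decimal)
}
inodes = 1066036 - 24  # inodes created by the 1M-file test

for isize, delta in runs.items():
    print(isize, "inodes:", round(delta / inodes), "bytes per inode")
```

This prints the 530, 790 and 1068 bytes-per-inode figures from the thread,
all comfortably under the documented 4096-byte conservative estimate.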
Ravi, here is the summary: [1]

Regards,
Oleksandr.

[1] https://gist.github.com/e8265ca07f7b19f30bb3

On Thursday, 17 March 2016 09:58:14 EET Ravishankar N wrote:
> On 03/16/2016 10:57 PM, Oleksandr Natalenko wrote:
> > OK, I've repeated the test with the following hierarchy:
> > [snip]
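The termbin pastes with the test scripts have since expired. A minimal,
hypothetical reconstruction of the second test's layout script is sketched
below (all names and defaults are assumptions; the defaults are scaled
down for a quick run, while the actual test used TOP=10, SUB=10,
FILES=10000 against the mounted volume):

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the layout script described in the thread:
# TOP x SUB second-level folders with FILES random-sized files each.
TOP=${TOP:-2}
SUB=${SUB:-2}
FILES=${FILES:-3}
ROOT=${ROOT:-testdata}

for t in $(seq 1 "$TOP"); do
  for s in $(seq 1 "$SUB"); do
    d="$ROOT/top$t/sub$s"
    mkdir -p "$d"
    for f in $(seq 1 "$FILES"); do
      # random file size between 1 and 32768 bytes, as in the first test
      head -c "$(( RANDOM % 32768 + 1 ))" /dev/urandom > "$d/file$f"
    done
  done
done

echo "created $(( TOP * SUB * FILES )) files in $(( TOP * SUB )) folders"
```

Running it with the original parameters (TOP=10 SUB=10 FILES=10000
ROOT=/mnt/glustervol) would reproduce the 1M-file, 100-folder hierarchy
from the test.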