On 03/16/2016 10:57 PM, Oleksandr Natalenko wrote:
> OK, I've repeated the test with the following hierarchy:
>
> * 10 top-level folders with 10 second-level folders each;
> * 10 000 files in each second-level folder.
>
> So, this composes 10×10×10000 = 1M files and 100 folders.
>
> Initial brick used space: 33M
> Initial inodes count: 24
>
> After the test:
>
> * each brick in the replica took 18G, and the arbiter brick took 836M;
> * inodes count: 1066036
>
> So:
>
> (836 - 33) MB / (1066036 - 24) == 790 bytes per inode.
>
> So, yes, the value is slightly bigger than in the previous test, I guess
> due to the large number of files per folder, but it is still far from 4k.
> Given that a good engineer should keep a 30% reserve, the ratio is about
> 1k per stored inode.
>
> Correct me if I'm missing something (regarding average workload, not
> corner cases).

Looks okay to me Oleksandr. You might want to make a github gist of your
tests+results as a reference for others.
Regards,
Ravi

> Test script is here: [1]
>
> Regards,
> Oleksandr.
>
> [1] http://termbin.com/qlvz
>
> On Tuesday, 8 March 2016 19:13:05 EET Ravishankar N wrote:
>> On 03/05/2016 03:45 PM, Oleksandr Natalenko wrote:
>>> In order to estimate GlusterFS arbiter brick size, I've deployed a test
>>> setup with a replica 3 arbiter 1 volume within one node. Each brick is
>>> located on a separate HDD (XFS with inode size == 512). Using GlusterFS
>>> v3.7.6 + memleak patches. Volume options are kept at defaults.
>>>
>>> Here is the script that creates files and folders in the mounted
>>> volume: [1]
>>>
>>> The script creates 1M files of random size (between 1 and 32768 bytes)
>>> and some amount of folders. After running it I've got 1036637 folders.
>>> So, in total it is 2036637 files and folders.
>>>
>>> The initial used space on each brick is 42M.
>>> After running the script I've got:
>>>
>>> replica bricks 1 and 2: 19867168 kbytes == 19G
>>> arbiter brick: 1872308 kbytes == 1.8G
>>>
>>> The amount of inodes on each brick is 3139091. So here goes the
>>> estimation.
>>>
>>> Dividing arbiter used space by files+folders we get:
>>>
>>> (1872308 - 42000) / 2036637 == 899 bytes per file or folder
>>>
>>> Dividing arbiter used space by inodes we get:
>>>
>>> (1872308 - 42000) / 3139091 == 583 bytes per inode
>>>
>>> Not sure which calculation is correct.
>> I think the first one is right, because you still haven't used up all
>> the inodes (2036637 used vs. the max. permissible 3139091). But again,
>> this is an approximation, because not all files would take 899 bytes.
>> For example, if there are a thousand files present in a directory, then
>> du <dirname> would be more than du <file>, because the directory will
>> take some disk space to store the dentries.
>>
>>> I guess we should consider the one that accounts for inodes, because
>>> of the .glusterfs/ folder data.
>>>
>>> Nevertheless, in contrast, documentation [2] says it should be 4096
>>> bytes per file. Am I wrong with my calculations?
>> The 4KB is a conservative estimate, considering the fact that though
>> the arbiter brick does not store data, it still keeps a copy of both
>> user and gluster xattrs. For example, if the application sets a lot of
>> xattrs, they can consume a data block if they cannot be accommodated in
>> the inode itself. Also, there is the .glusterfs folder, like you said,
>> which would take up some space.
>> Here is what I tried on an XFS brick:
>>
>> [root@ravi4 brick]# touch file
>>
>> [root@ravi4 brick]# ls -l file
>> -rw-r--r-- 1 root root 0 Mar 8 12:54 file
>>
>> [root@ravi4 brick]# du file
>> 0       file
>>
>> [root@ravi4 brick]# for i in {1..100}
>> > do
>> > setfattr -n user.value$i -v value$i file
>> > done
>>
>> [root@ravi4 brick]# ls -l file
>> -rw-r--r-- 1 root root 0 Mar 8 12:54 file
>>
>> [root@ravi4 brick]# du -h file
>> 4.0K    file
>>
>> Hope this helps,
>> Ravi
>>
>>> Pranith?
>>>
>>> [1] http://termbin.com/ka9x
>>> [2]
>>> http://gluster.readthedocs.org/en/latest/Administrator%20Guide/arbiter-volumes-and-quorum/
>>> _______________________________________________
>>> Gluster-devel mailing list
>>> Gluster-devel@gluster.org
>>> http://www.gluster.org/mailman/listinfo/gluster-devel
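A rough back-of-envelope sketch of why Ravi's 100 small xattrs push the
file's attributes out of the 512-byte inode and into a 4 KiB block. The
fixed-overhead figures below are assumptions for illustration, not exact
XFS on-disk accounting:

```python
# Back-of-envelope estimate (assumed overhead figures, not exact XFS
# on-disk numbers) for the xattr experiment quoted above.

INODE_SIZE = 512        # brick formatted with inode size == 512
CORE_OVERHEAD = 176     # assumed fixed inode-core bytes; the remainder
                        # is the "literal area" shared by extents/xattrs
PER_ENTRY_OVERHEAD = 7  # assumed per-xattr header bytes in shortform

def xattr_bytes(n):
    # setfattr -n user.value$i -v value$i; XFS records the namespace as
    # a flag, so only "valueN" counts toward the stored name length
    total = 0
    for i in range(1, n + 1):
        name = value = f"value{i}"
        total += len(name) + len(value) + PER_ENTRY_OVERHEAD
    return total

needed = xattr_bytes(100)
available = INODE_SIZE - CORE_OVERHEAD
print(f"~{needed} bytes of xattrs vs ~{available} bytes in-inode")
print("spills to a separate block:", needed > available)
```

Under these assumptions the 100 xattrs need roughly 2 KB, several times
the in-inode space, which is consistent with du jumping from 0 to 4.0K.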
Ravi, I will definitely arrange the results into some short handy document
and post it here.

Also, @JoeJulian on IRC suggested that I perform this test on XFS bricks
with inode sizes of 256b and 1k:

==
22:38 <@JoeJulian> post-factum: Just wondering what 256 byte inodes might
                   look like for that. And, by the same token, 1k inodes.
22:39 < post-factum> JoeJulian: should I try 1k inodes instead?
22:41 <@JoeJulian> post-factum: Doesn't hurt to try. My expectation is
                   that disk usage will go up despite inode usage going
                   down.
22:41 < post-factum> JoeJulian: ok, will check that
22:41 <@JoeJulian> post-factum: and with 256, I'm curious if inode usage
                   will stay close to the same while disk usage goes down.
==

Here are the results for 1k:

(1171336 - 33000) / (1066036 - 23) == 1068 bytes per inode.

Disk usage is indeed higher (1.2G), but inode usage is the same. Will test
with 256b inodes now.

17.03.2016 06:28, Ravishankar N wrote:
> Looks okay to me Oleksandr. You might want to make a github gist of
> your tests+results as a reference for others.
And for 256b inodes:

(597904 - 33000) / (1066036 - 23) == 530 bytes per inode.

So I still consider 1k to be a good estimate for an average workload.

Regards,
Oleksandr.

On Thursday, 17 March 2016 09:58:14 EET Ravishankar N wrote:
> Looks okay to me Oleksandr. You might want to make a github gist of your
> tests+results as a reference for others.
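Pulling the three runs together, the thread's own arithmetic can be
reproduced in a few lines. Note the units are mixed across the messages
(binary megabytes for the 512b run, decimal kilobytes for the others);
the figures below come out exactly as quoted:

```python
# Arbiter-brick bytes consumed per inode for each XFS inode size tested
# in this thread. Deltas are (after - before) disk usage, in the units
# each message reported them in.
runs = {
    "256b": (597904 - 33000) * 1000,    # kbytes (decimal)
    "512b": (836 - 33) * 1024 * 1024,   # megabytes (binary)
    "1k":   (1171336 - 33000) * 1000,   # kbytes (decimal)
}
inodes = 1066036 - 24  # inodes created by the 1M-file test

for isize, delta in runs.items():
    print(isize, "inodes:", round(delta / inodes), "bytes per inode")
```

This prints the 530, 790 and 1068 bytes-per-inode figures from the thread,
all comfortably under the documented 4096-byte conservative estimate.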
Ravi, here is the summary: [1]

Regards,
Oleksandr.

[1] https://gist.github.com/e8265ca07f7b19f30bb3

On Thursday, 17 March 2016 09:58:14 EET Ravishankar N wrote:
> On 03/16/2016 10:57 PM, Oleksandr Natalenko wrote:
> > OK, I've repeated the test with the following hierarchy:
> > [snip]
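The termbin pastes with the test scripts have since expired. A minimal,
hypothetical reconstruction of the second test's layout script is sketched
below (all names and defaults are assumptions; the defaults are scaled
down for a quick run, while the actual test used TOP=10, SUB=10,
FILES=10000 against the mounted volume):

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the layout script described in the thread:
# TOP x SUB second-level folders with FILES random-sized files each.
TOP=${TOP:-2}
SUB=${SUB:-2}
FILES=${FILES:-3}
ROOT=${ROOT:-testdata}

for t in $(seq 1 "$TOP"); do
  for s in $(seq 1 "$SUB"); do
    d="$ROOT/top$t/sub$s"
    mkdir -p "$d"
    for f in $(seq 1 "$FILES"); do
      # random file size between 1 and 32768 bytes, as in the first test
      head -c "$(( RANDOM % 32768 + 1 ))" /dev/urandom > "$d/file$f"
    done
  done
done

echo "created $(( TOP * SUB * FILES )) files in $(( TOP * SUB )) folders"
```

Running it with the original parameters (TOP=10 SUB=10 FILES=10000
ROOT=/mnt/glustervol) would reproduce the 1M-file, 100-folder hierarchy
from the test.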