On 02/13/2016 01:02 AM, Mike Stump wrote:
> On Feb 12, 2016, at 8:34 AM, Ravishankar N <ravishankar at redhat.com> wrote:
>> Consistency, availability, tolerance to network partitions. You get to pick any two.
> I wanted the first two. I did not get them. By default, we get split brain. This means no consistency.

Consistency means the client always gets back the same data it wrote to the volume. For replication, if, say, a write succeeds on only one brick, then further reads will be served from the healthy brick and not accidentally from the stale one. It also means that if one client updates the file, other clients see the same update when they access it.

> To cure that, we choose quorums. But when the first of a replica 2 pair goes away, you then lose write access. Without write, we lose availability. So, if you think it is possible, let me know how to reconfigure my array and I will tell you if it worked. If you could update the docs to explain how you get the first two, that would be nice. If you could update the docs to state that the array goes into a partial read-only state if a replica pair goes away, that would be nice.

Like Bishoy said in another thread, quorum does not really make sense in 2-replica setups because there is no notion of majority. If you use a 3-way replica with client-quorum enabled, then you have more availability than a 2-way replica. If preventing split-brains is your major concern while not wanting to use 3x replication, you can try arbiter volumes. (https://gluster.readthedocs.org/en/release-3.7.0/Features/afr-arbiter-volumes/)

> I'm fine with running in a degraded state when a server goes away. When it comes back, I want it to suck down all the new changes from the authoritative replica pair known to the quorum and then, once it has all the data, it can be marked as not-degraded and resume normal operation.
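For reference, an arbiter volume of the kind linked above is created by adding `arbiter 1` to a `replica 3` volume. A sketch with hypothetical host and brick names (it needs a running Gluster trusted pool, so it is illustrative only):

```
# Hypothetical hosts and brick paths. The third brick is the arbiter:
# it holds only file names and metadata, so it breaks ties between the
# two data bricks without storing a third full copy of the data.
gluster volume create testvol replica 3 arbiter 1 \
    server1:/bricks/data1 \
    server2:/bricks/data2 \
    server3:/bricks/arbiter
gluster volume start testvol
```

The arbiter brick can be much smaller than the data bricks, which is what makes this cheaper than full 3x replication.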
> I want each node to notice a down server, and when it is part of a 51% partition, I want the remaining replica members of that server to become a degraded replica N-1 set. When the server comes back up, I want it to repair back into a replica N state.

AFR does all this, but in a distributed synchronous replication system, no matter what the replication factor is, at some point *preventing* split-brains means failing further writes if the current write would make the only true copy no longer true. This fencing will be done until the other copies are in sync (i.e. healed). That *will* mean a loss of availability (for writes) for the duration of the heal.

About the docs, could you list the links for client and server quorum where you found the details to be inadequate? I can't seem to find anything myself on readthedocs. :( I'm anyway planning to do a detailed write-up on arbiter volumes, split-brains, and client and server quorums which can serve as a ready reckoner.

HTH,
Ravi
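A toy model (not Gluster source code) of the write-fencing decision described above: with client quorum enabled, a write proceeds only while a strict majority of the bricks in the file's replica set are reachable, which is why replica 2 with one brick down loses write availability:

```shell
# Toy sketch of the client-quorum check, under the "strict majority"
# assumption; Gluster's real implementation has more options and tie-breaks.
quorum_met() {
  replica_count=$1   # bricks in the replica set
  live_count=$2      # bricks currently reachable
  [ $(( 2 * live_count )) -gt "$replica_count" ]
}

quorum_met 3 2 && echo "replica 3, one brick down: writes still allowed"
quorum_met 2 1 || echo "replica 2, one brick down: no majority, writes fenced"
```

This is the availability trade-off in the paragraph above: fencing writes on the surviving copy is the price of guaranteeing it stays the single authoritative copy until heal completes.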
On Feb 13, 2016, at 6:21 PM, Ravishankar N <ravishankar at redhat.com> wrote:
> Like Bishoy said in another thread, quorum does not really make sense in 2-replica because there is no notion of majority.

Ah, could you update the doc to note that a quorum isn't formed from all the members of the cluster, but rather from the replicas and arbiters involved in the specific operation? In a 10x3 cluster, it would be formed from just 3 nodes, not 30. I expected that in a 10-server (5x2) setup, 6 means majority. Any island of 5 or fewer would not be a majority. Any island of 6 or more would form a majority. The doc isn't clear on the point that in a 5x2 setup, the software won't take a majority of all the cluster members. I had assumed that all the cluster members would help form a quorum for the entire cluster.

Additionally, can you say that replica 2 doesn't help build an HA cluster? One must have replica 3, or an arbiter volume. This lets a person who specifically wants HA avoid the mistake of replica 2. Thanks.
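The arithmetic behind the 10x3 example above, as a sketch (assumed numbers, not Gluster source): client quorum is counted over the bricks of one replica set, not over every peer in the pool:

```shell
# Toy arithmetic for a hypothetical 10x3 distributed-replicate volume.
peers=30            # total bricks/peers in the pool
replica_count=3     # bricks holding any one file

# Client quorum for an operation on a file is a majority of its replica set:
majority=$(( replica_count / 2 + 1 ))
pool_majority=$(( peers / 2 + 1 ))

echo "client quorum: $majority of $replica_count bricks, not $pool_majority of $peers peers"
```

So for any given file only 2 of its 3 bricks need to be reachable, regardless of how many other replica sets are down.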
On Feb 13, 2016, at 6:21 PM, Ravishankar N <ravishankar at redhat.com> wrote:
> About the docs, could you list the links for client and server quorum where you found the details to be inadequate? I can't seem to find anything myself on readthedocs. :(
> I'm anyway planning to do a detailed write up for arbiter volumes, split-brains, client and server quorums which can serve as a ready reckoner.

So, http://www.gluster.org/community/documentation/index.php/GlusterFS_Concepts can explain that replica 2 is unlike RAID-1. It can state that to achieve RAID-1, one must use replica 3.

http://gluster.readthedocs.org/en/latest/Administrator%20Guide/Setting%20Up%20Volumes/ likewise talks about high availability and replication, but it could explain that replica 3 is required to achieve RAID-1.

In

http://gluster.readthedocs.org/en/release-3.7.0/Features/server-quorum/

it states:

> Therefore, it is highly advisable to ensure that the cluster size is three or greater.

So, I would have an easier time understanding this if "cluster size" were linked to a definition. Would my reading be correct if I read this as replica 3 or greater?

> It need only be present to preserve the notion of a quorum majority less than the entire cluster membership, allowing the cluster to survive the loss of a single node without losing quorum.

Again, here it talks about cluster membership. I think I incorrectly viewed this as relating to the total list of peers rather than the replica count.

1 - https://en.wikipedia.org/wiki/Standard_RAID_levels
2 - https://btrfs.wiki.kernel.org/index.php/Using_Btrfs_with_Multiple_Devices
On 02/18/2016 11:33 AM, Mike Stump wrote:
> On Feb 13, 2016, at 6:21 PM, Ravishankar N <ravishankar at redhat.com> wrote:
>> About the docs, could you list the links for client and server quorum where you found the details to be inadequate? I can't seem to find anything myself on readthedocs. :(
>> I'm anyway planning to do a detailed write up for arbiter volumes, split-brains, client and server quorums which can serve as a ready reckoner.

Hi,

I've made a write-up here: http://gluster.readthedocs.org/en/latest/Administrator%20Guide/arbiter-volumes-and-quorum/

Hope this is more explanatory.

Thanks,
Ravi