Hey, this is one of the last design issues behind de-MLS: a recovery mechanism against synchronicity issues such as state partitioning.
Problem: de-MLS is a stateful protocol, meaning all members operate under a single global state. The protocol protects this state under the assumption that message delivery works for the majority of members, but practical network issues can violate this assumption. When that happens, the group state can fork into multiple partitions, and we need a mechanism that allows the group to re-sync.
We can divide the problem into two events:
Weak sync issue: a single partition. A few members are out of sync while the majority (>n/2) share a single state. This can be solved by a sync mechanism in which out-of-sync members request the missing commit messages from the synced majority, which replies with them, until the states align again.
Hard sync issue: there is no state that the majority agrees on. This requires a hard reset, in which the group is recreated from scratch. One idea would be to define checkpoints and roll back to the latest checkpoint when a hard sync issue occurs, but MLS does not allow rollback due to its security features.
So members must be able to determine whether they are synced or not. For this, two capabilities are required:
Deterministic state identification:
Each member computes a deterministic fingerprint of its local state. We can simply use the tree hash for that.
Exchanging fingerprints among the members:
Members gossip their fingerprints so they can compare them and detect whether a partition exists.
After this process, each member knows whether there is a weak or hard sync issue in the group and, if so, its own situation (whether it is part of a minority partition).
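The detection step can be sketched as follows. This is only a minimal illustration, assuming fingerprints are opaque byte strings (for example the tree hash) and that members are identified by a hypothetical integer id; the names `SyncStatus`, `classify` and `in_minority` are made up for this example and are not part of de-MLS. Members that did not report a fingerprint simply do not count towards the majority.

```python
from collections import Counter
from enum import Enum
from typing import Dict, Optional, Tuple


class SyncStatus(Enum):
    SYNCED = "synced"  # every member reported the same fingerprint
    WEAK = "weak"      # a strict majority (> n/2) shares one fingerprint
    HARD = "hard"      # no fingerprint is held by a strict majority


def classify(fingerprints: Dict[int, bytes],
             group_size: int) -> Tuple[SyncStatus, Optional[bytes]]:
    """Classify the group from the gossiped fingerprints.

    fingerprints maps a member id (assumption: e.g. the leaf index) to the
    fingerprint that member reported; group_size is n.
    Returns the status and the majority fingerprint (None if there is none).
    """
    counts = Counter(fingerprints.values())
    if not counts:
        return SyncStatus.HARD, None  # nothing observed, so no majority
    top, votes = counts.most_common(1)[0]
    if votes * 2 <= group_size:       # majority means strictly more than n/2
        return SyncStatus.HARD, None
    if votes == group_size:
        return SyncStatus.SYNCED, top
    return SyncStatus.WEAK, top


def in_minority(own: bytes, majority: Optional[bytes]) -> bool:
    """True when a majority state exists and the local state differs from it."""
    return majority is not None and own != majority
```

With three members reporting `A, A, B` this yields a weak sync issue with `A` as the majority state; with four members reporting `A, A, B, B` no fingerprint exceeds n/2, so it is a hard sync issue.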
Finally, the member acts on the collected info:
If there is a hard sync issue, a member initiates a hard reset request proposal (yes, another consensus here; we can safely assume there is still a large enough partition among the members to conduct consensus).
If there is a weak sync issue, the member initiates the state recovery procedure: it exchanges data either to retrieve the missing commit messages and reach the latest state agreed by the votes, or to provide the latest commits to others.
Some discussion points:
For a hard reset, the reset operator probably requires all members’ keyPackages, which raises the question of whether every keyPackage should be stored and synced among the members. Looks possible, but not efficient. A Logos storage can be used here.
For the commit message exchange phase of the weak sync case, we can use SDS instead of implementing a custom exchange mechanism. cc: @jazzz and @haelius.
Here is more detail on how we can use the SDS protocol against weak sync issues. First, SDS determines whether any commit is missing; if so, SDS-R repairs it by letting the user request the missing commits from the network.
message_ids are added to the bloom filter upon successful reception, enabling other participants to probabilistically infer acknowledgement.
Finally, SDS identifies the missing dependency.
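To make the two ideas above concrete, here is a toy sketch, not the SDS implementation. It assumes message ids are byte strings and that each message lists the ids it depends on (that dependency list is an assumption made for this example); `BloomFilter` and `missing_dependencies` are hypothetical names, and the filter parameters are arbitrary.

```python
import hashlib
from typing import Iterable, Iterator, List, Set


class BloomFilter:
    """Tiny Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, size_bits: int = 1024, num_hashes: int = 3) -> None:
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item: bytes) -> Iterator[int]:
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, message_id: bytes) -> None:
        """Record a message id after it was successfully received."""
        for pos in self._positions(message_id):
            self.bits |= 1 << pos

    def might_contain(self, message_id: bytes) -> bool:
        """False means definitely not received; True means probably received."""
        return all((self.bits >> pos) & 1 for pos in self._positions(message_id))


def missing_dependencies(dependencies: Iterable[bytes],
                         received_ids: Set[bytes]) -> List[bytes]:
    """Return the dependency ids that have not been received locally."""
    return [dep for dep in dependencies if dep not in received_ids]
```

A peer that sees `might_contain(commit_id)` return true in someone else's filter can treat the commit as probably acknowledged, while the ids returned by `missing_dependencies` are the ones a repair request would ask for.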
For SDS-R part:
Repair request buffer: n commits // the parameter n must be tuned to balance efficiency and availability.
Since MLS epochs are ordered (in other words, we have a different commit for each epoch), we may skip conflict resolution, since we expect to see the same commit for the same epoch. If not, users can fetch the different commits and try to apply them; since commit messages are encrypted, only the genuine one will decrypt and be applied.
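The "try each candidate and keep the one that decrypts" idea could look roughly like this. It is a sketch under assumptions: `try_apply` is a hypothetical callback standing in for the MLS library's commit processing (returning False when decryption or validation fails), and candidates are keyed by the epoch in which the commit was sent.

```python
from typing import Callable, Dict, List


def catch_up(current_epoch: int,
             candidates: Dict[int, List[bytes]],
             try_apply: Callable[[int, bytes], bool]) -> int:
    """Apply retrieved commits in epoch order and return the epoch reached.

    For each epoch, candidates are tried one by one; the first one that
    try_apply accepts advances the state by one epoch. Processing stops at
    the first epoch with no candidate, or with no candidate that applies.
    """
    epoch = current_epoch
    while epoch in candidates:
        # any() stops at the first candidate that applies successfully.
        if not any(try_apply(epoch, commit) for commit in candidates[epoch]):
            break
        epoch += 1
    return epoch
```

If a member at epoch 5 has received candidates for epochs 5, 6 and 8, it advances to epoch 7 and still has to request the commit for epoch 7 before the one for epoch 8 can be used.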
Ce-MLS makes an assumption (at least operationally) that there is a service which allows users to publish keypackages. These keypackages can then be accessed asynchronously, by group creators. Without it the bootstrapping problem needs to be solved in another manner.
Looks possible, but not efficient. A Logos storage can be used here.
This problem is analogous to X3DH Keybundles. In order to maintain asynchronous session initiation, the bundles need to be available to all clients. Status’ approach to this problem was to rebroadcast bundles every ~8-24 hours, so that the bundles are available to all members of the network. This has many considerable downsides.
My argument would be that storing these keypackages is much more efficient than the alternatives - though may require more upfront engineering.
I really like the idea of using the treehash as a fingerprint. Curious what the transporting of this data would look like?
For discussion I imagine the payload would look similar to:
message SyncStateInfo {
  uint32 leaf_index = 1;  // identify member
  bytes fingerprint = 2;  // current state
  uint64 timestamp = 3;   // or seqnum to determine most recent update
}
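On the receiving side, the timestamp (or seqnum) field would let a member keep only the most recent report per leaf. A rough sketch, with a plain Python dataclass mirroring the payload above; the `record` helper and its "drop anything not strictly newer" rule are assumptions for discussion:

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class SyncStateInfo:
    leaf_index: int     # identifies the member
    fingerprint: bytes  # current state, e.g. the tree hash
    timestamp: int      # or a sequence number


def record(latest: Dict[int, SyncStateInfo], info: SyncStateInfo) -> bool:
    """Store info if it is newer than what is known for that leaf.

    Returns True when the table was updated, False for stale or duplicate
    reports.
    """
    known = latest.get(info.leaf_index)
    if known is not None and known.timestamp >= info.timestamp:
        return False
    latest[info.leaf_index] = info
    return True
```

The resulting table maps each leaf index to its latest fingerprint, which is the input a member needs to compare states and spot a partition.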
The data isn’t critical, but it could have a negative impact on the privacy model if sent in the clear. It would be nice to find an encryption channel that is less sensitive to desync.
In de-MLS the users share the keyPackages in plaintext, so we can have users store some past keyPackages and exchange them when needed for validating the voting proposals and/or re-creating the group. Therefore, there is no dedicated service for that right now.
Yes some crypto would be ideal here, let me focus on this.