diff options
Diffstat (limited to 'docs/design/branch-based-merge.md')
| -rw-r--r-- | docs/design/branch-based-merge.md | 386 |
1 files changed, 386 insertions, 0 deletions
diff --git a/docs/design/branch-based-merge.md b/docs/design/branch-based-merge.md new file mode 100644 index 0000000..331af5f --- /dev/null +++ b/docs/design/branch-based-merge.md @@ -0,0 +1,386 @@ +# Branch-Based Merge with Conflict Detection + +**Last Updated:** 2025-11-13 + +## Overview + +This document describes the approach for handling concurrent operations in the CRDT, using vector clocks to detect causal branches and implementing smart merge logic that automatically resolves safe conflicts while flagging ambiguous ones for user resolution. + +## The Core Challenge + +When two users make concurrent subdivisions to a music notation grid, we need to: +1. Apply both operations (CRDT convergence requirement) +2. Detect when operations conflict (overlapping edits, duration violations) +3. Present conflicts to users in an understandable way +4. Preserve both users' intent without silent data loss + +## Detecting Concurrent Operations + +Vector clocks allow us to determine if two operations are concurrent (neither causally depends on the other): + +```rust +impl VectorClock { + pub fn is_concurrent_with(&self, other: &VectorClock) -> bool { + // Two clocks are concurrent if neither is ≤ the other + // This means each has seen events the other hasn't + match self.partial_cmp(other) { + None => true, // Incomparable = concurrent + Some(Ordering::Equal) => false, + _ => false, + } + } +} +``` + +**Example:** +``` +Alice's clock: {alice: 5, bob: 2} +Bob's clock: {alice: 3, bob: 6} + +Alice has seen 5 of her ops, 2 of Bob's +Bob has seen 3 of Alice's ops, 6 of his own + +Neither clock is ≤ the other → Concurrent! +``` + +## Grouping Operations into Causal Branches + +Operations form a directed acyclic graph (DAG) based on causality: + +```rust +pub struct Branch { + pub tip_clock: VectorClock, + pub ops: Vec<Op>, +} + +impl Doc { + /// Detect causal branches in the operation history + pub fn detect_branches(&self) -> Vec<Branch> { + let mut branches: Vec<Branch> = vec![]; + + for op in &self.ops { + // Find branches this op extends (causally after) + let extends: Vec<usize> = branches.iter() + .enumerate() + .filter(|(_, b)| op.clock > b.tip_clock) + .map(|(i, _)| i) + .collect(); + + match extends.len() { + 0 => { + // New concurrent branch + branches.push(Branch { + tip_clock: op.clock.clone(), + ops: vec![op.clone()], + }); + } + 1 => { + // Extends single branch (linear history) + let branch = &mut branches[extends[0]]; + branch.ops.push(op.clone()); + branch.tip_clock = op.clock.clone(); + } + _ => { + // Merges multiple branches + // This op can see all previous concurrent branches + // Could create a new "merge node" in the DAG + } + } + } + + branches + } +} +``` + +## Reconstructing Actor Views + +Each actor has a causal view of the state at their last operation: + +```rust +impl Doc { + /// Realize the state as a specific actor saw it + pub fn realize_actor_view(&self, actor_id: &Uuid) -> RealizedDoc { + let last_op = self.ops.iter() + .filter(|op| op.clock.get(actor_id) > 0) + .last(); + + if let Some(last_op) = last_op { + let mut realized = RealizedDoc::default(); + + // Apply only operations in this actor's causal past + for op in &self.ops { + if op.clock <= last_op.clock { + op.apply(&mut realized); + } + } + + realized + } else { + RealizedDoc::default() + } + } +} +``` + +**Why this works:** Vector clocks define a partial order. If `op.clock <= actor_last_clock`, then the actor had seen (or could have seen) that operation when they made their edit. + +## Smart Conflict Detection + +Not all concurrent operations conflict! We detect specific problematic patterns: + +### 1. Overlapping Subdivisions + +```rust +fn detect_overlapping_subdivisions(&self) -> Vec<ConflictDetail> { + let mut conflicts = vec![]; + + for (i, op_a) in self.ops.iter().enumerate() { + if let OpPayload::Subdivide { deleted_cell_ids: cells_a, .. } = &op_a.payload { + for op_b in &self.ops[i+1..] { + if let OpPayload::Subdivide { deleted_cell_ids: cells_b, .. } = &op_b.payload { + // Must be concurrent + if !op_a.clock.is_concurrent_with(&op_b.clock) { + continue; + } + + // Check for overlapping target cells + let overlap: Vec<_> = cells_a.iter() + .filter(|c| cells_b.contains(c)) + .cloned() + .collect(); + + if !overlap.is_empty() { + conflicts.push(ConflictDetail { + kind: ConflictKind::OverlappingSubdivision, + ops: vec![op_a.id, op_b.id], + affected_cells: overlap, + }); + } + } + } + } + } + + conflicts +} +``` + +**Example:** +``` +Initial: [A, B, C, D] + +Alice subdivides [A, B] → concurrent → Both apply +Bob subdivides [B, C] → concurrent → Duration mismatch! + +Cell B is deleted by both operations +Result has extra duration from both new cell sets +``` + +### 2. Lost Edits (Edit-Delete Conflict) + +```rust +fn detect_lost_edits(&self) -> Vec<ConflictDetail> { + let mut conflicts = vec![]; + + for (i, op_a) in self.ops.iter().enumerate() { + if let OpPayload::SetCellValue { cell_id, .. } = op_a.payload { + for op_b in &self.ops[i+1..] { + if let OpPayload::Subdivide { deleted_cell_ids, .. } = &op_b.payload { + if op_a.clock.is_concurrent_with(&op_b.clock) + && deleted_cell_ids.contains(&cell_id) { + conflicts.push(ConflictDetail { + kind: ConflictKind::CellValueLostBySubdivision, + ops: vec![op_a.id, op_b.id], + affected_cells: vec![cell_id], + }); + } + } + } + } + } + + conflicts +} +``` + +**Example:** +``` +Alice sets cell B value to "C#" → concurrent +Bob subdivides [B, C] → concurrent + +Alice's edit is "lost" because B is deleted +User should be warned about this +``` + +### 3. Duration Mismatches + +```rust +fn detect_duration_mismatches(&self, realized: &RealizedDoc) -> Vec<ConflictDetail> { + let mut conflicts = vec![]; + + for row in &realized.grid.rows { + let active_cells: Vec<_> = row.cells.iter() + .filter(|c| !c.deleted) + .collect(); + + let actual: f64 = active_cells.iter() + .map(|c| c.duration) + .sum(); + + let expected = row.expected_duration; + + if (actual - expected).abs() > 0.001 { + // Find which operations affected this row + let affecting_ops = self.find_ops_affecting_row(row.id); + + conflicts.push(ConflictDetail { + kind: ConflictKind::DurationMismatch { + expected, + actual, + }, + ops: affecting_ops, + affected_cells: active_cells.iter().map(|c| c.id).collect(), + }); + } + } + + conflicts +} +``` + +## The Smart Merge Algorithm + +```rust +pub enum MergeResult { + Clean(RealizedDoc), + Conflict { + branches: Vec<(Uuid, RealizedDoc)>, // Actor views + conflicts: Vec<ConflictDetail>, + }, +} + +impl Doc { + pub fn smart_merge(&self) -> MergeResult { + // Apply all operations (CRDT convergence) + let mut merged = RealizedDoc::default(); + for op in &self.ops { + op.apply(&mut merged); + } + + // Detect conflicts + let mut conflicts = vec![]; + conflicts.extend(self.detect_overlapping_subdivisions()); + conflicts.extend(self.detect_lost_edits()); + conflicts.extend(self.detect_duration_mismatches(&merged)); + + if conflicts.is_empty() { + // Clean merge - all concurrent ops are compatible + MergeResult::Clean(merged) + } else { + // Reconstruct each actor's view for the merge UI + let actors = self.get_divergent_actors(); + let actor_views = actors.into_iter() + .map(|actor_id| (actor_id, self.realize_actor_view(&actor_id))) + .collect(); + + MergeResult::Conflict { + branches: actor_views, + conflicts, + } + } + } +} +``` + +## User Experience + +### Auto-Merge (No Conflict) + +``` +Alice subdivides [A, B] → Triplets +Bob subdivides [C, D] → Quintuplets + +No overlap, no conflict +Result: [Triplet1, Triplet2, Triplet3, Quintuplet1, ..., Quintuplet5] +Duration preserved ✓ +``` + +### Conflict Detection UI + +``` +┌──────────────────────────────────────────────────────────┐ +│ Merge Conflict: Overlapping Subdivisions │ +├──────────────────────────────────────────────────────────┤ +│ │ +│ Alice and Bob edited the same cells concurrently │ +│ │ +│ Alice's version (Row 1): │ +│ ┌───┬───┬───┬───┐ │ +│ │T1 │T2 │T3 │ D │ Duration: 1.0 ✓ │ +│ └───┴───┴───┴───┘ │ +│ Triplets from subdividing [A, B, C] │ +│ │ +│ Bob's version (Row 1): │ +│ ┌───┬───┬───┬───┬───┬───┐ │ +│ │ A │Q1 │Q2 │Q3 │Q4 │Q5 │ Duration: 1.0 ✓ │ +│ └───┴───┴───┴───┴───┴───┘ │ +│ Quintuplets from subdividing [B, C, D] │ +│ │ +│ Merged result (both applied): │ +│ ┌───┬───┬───┬───┬───┬───┬───┬───┐ │ +│ │T1 │T2 │T3 │Q1 │Q2 │Q3 │Q4 │Q5 │ Duration: 1.5 ✗ │ +│ └───┴───┴───┴───┴───┴───┴───┴───┘ │ +│ │ +│ [ Keep Alice's version ] │ +│ [ Keep Bob's version ] │ +│ [ Keep both as separate grids ] │ +│ [ Manually resolve ] │ +└──────────────────────────────────────────────────────────┘ +``` + +## Resolution Strategies + +### 1. Keep One Version +Create a new operation that: +- Undoes the unwanted changes (marks those cells as deleted) +- Re-applies one actor's version as a new operation +- Preserves causality (new operation depends on both branches) + +### 2. Keep Both as Separate Grids +- Fork the grid into two separate grids +- Each preserves one actor's version +- Allows manual copying between grids later + +### 3. Manual Resolution +- UI allows editing the merged (invalid) state +- Create operations to fix duration mismatches +- New operations causally depend on all previous ops + +## Key Design Properties + +1. **No Silent Data Loss** - Both operations always apply; conflicts are explicit +2. **Automatic Safe Merges** - Non-overlapping concurrent edits merge cleanly +3. **Deterministic Convergence** - All replicas converge to same state (including conflicts) +4. **User Control** - Ambiguous cases are presented to users, not arbitrarily resolved +5. **Causal Consistency** - Vector clocks ensure we can reconstruct any actor's view + +## Implementation Notes + +- Store all operations (even if they create conflicts) +- Conflicts are part of the realized state, not errors +- Resolution creates new operations (not undoing history) +- UI must handle temporary invalid states gracefully +- Consider conflict "severity" (overlapping subdivisions vs. minor duration drift) + +## Future Enhancements + +- **Operational Transformation** - Automatically adjust positions in some cases +- **Three-way Merge** - Show common ancestor state in conflict UI +- **Conflict Clustering** - Group related conflicts (e.g., same row) +- **Suggested Resolutions** - AI/heuristic-based merge suggestions + +## References + +- See `crdt-design.md` for overall CRDT architecture +- See `fractional-indexing.md` for position semantics |
