Text Diff Algorithm - LCS and Myers Diff Explained

Published 2025-05-12 · ToolNest

Diff tools are essential for comparing code changes and document revisions. This guide explains the algorithms behind them.

What is a Diff Algorithm?

A diff algorithm finds the minimal set of changes (insertions and deletions) that transform one text into another. The result is displayed as added lines (green) and removed lines (red).

LCS Algorithm (Line-Level Diff)

The Longest Common Subsequence (LCS) algorithm is the simplest approach for line-by-line comparison.

How It Works

  1. Split both texts into arrays of lines
  2. Build a DP (dynamic programming) table comparing all line pairs
  3. Backtrack through the table to find the LCS
  4. Lines not in the LCS are marked as added or removed

JavaScript Implementation

function diffLines(a, b) {
  const linesA = a.split('\n');
  const linesB = b.split('\n');
  const m = linesA.length, n = linesB.length;
  
  // Build DP table
  const dp = Array.from({length: m+1}, () => new Array(n+1).fill(0));
  for (let i = m-1; i >= 0; i--) {
    for (let j = n-1; j >= 0; j--) {
      dp[i][j] = linesA[i] === linesB[j]
        ? dp[i+1][j+1] + 1
        : Math.max(dp[i+1][j], dp[i][j+1]);
    }
  }
  
  // Backtrack to find diff
  const result = [];
  let i = 0, j = 0;
  while (i < m && j < n) {
    if (linesA[i] === linesB[j]) {
      result.push({type: 'same', text: linesA[i]}); i++; j++;
    } else if (dp[i+1][j] >= dp[i][j+1]) {
      result.push({type: 'removed', text: linesA[i]}); i++;
    } else {
      result.push({type: 'added', text: linesB[j]}); j++;
    }
  }
  while (i < m) result.push({type: 'removed', text: linesA[i++]});
  while (j < n) result.push({type: 'added', text: linesB[j++]});
  return result;
}

Time and Space Complexity

  • Time: O(m × n) where m and n are line counts
  • Space: O(m × n) for the DP table

For large files (10,000+ lines), this becomes slow and memory-heavy.

Myers Algorithm (Git's Choice)

Git uses the Myers diff algorithm, which is more efficient:

Key Ideas

  1. Models the problem as finding the shortest path in an edit graph
  2. Uses "snake" traversals along diagonal edges
  3. Linear space variant exists for large files

Advantages

  • Time: O(ND) where D is the number of differences (fast when changes are small)
  • Space: O(N) with the linear variant
  • Produces minimal, readable diffs

Character-Level vs Line-Level Diff

Level Best For Example
Line-level Code, config files Git diff
Character-level Prose, documents Word's track changes
Word-level Balanced Some merge tools

ToolNest's Text Diff uses line-level LCS — best for code and configuration comparison.

Interpreting Diff Results

  same line (unchanged)
- removed line (in original, not in modified)
+ added line (in modified, not in original)
  • Green (+): Lines added in the new version
  • Red (-): Lines removed from the old version
  • Dimmed: Unchanged context lines

Practical Use Cases

  1. Code review: Compare before/after changes
  2. Config management: See what changed in config files
  3. Document revision: Track edits between versions
  4. Data comparison: Compare exported data snapshots
  5. Debugging: Compare expected vs actual output

Limitations

  1. Rearrangements: LCS-based diffs show moved blocks as delete+add, not as moves
  2. Large files: O(m×n) memory can be prohibitive
  3. Semantic understanding: Diff doesn't understand code semantics (e.g., variable renames)

Use ToolNest Text Diff for quick line-by-line comparison, right in your browser.

← Back to Articles