Text Diff Algorithm - LCS and Myers Diff Explained
Diff tools are essential for comparing code changes and document revisions. This guide explains the algorithms behind them.
What is a Diff Algorithm?
A diff algorithm finds the minimal set of changes (insertions and deletions) that transform one text into another. The result is displayed as added lines (green) and removed lines (red).
LCS Algorithm (Line-Level Diff)
The Longest Common Subsequence (LCS) algorithm is the simplest approach for line-by-line comparison.
How It Works
- Split both texts into arrays of lines
- Build a DP (dynamic programming) table comparing all line pairs
- Backtrack through the table to find the LCS
- Lines not in the LCS are marked as added or removed
JavaScript Implementation
function diffLines(a, b) {
const linesA = a.split('\n');
const linesB = b.split('\n');
const m = linesA.length, n = linesB.length;
// Build DP table
const dp = Array.from({length: m+1}, () => new Array(n+1).fill(0));
for (let i = m-1; i >= 0; i--) {
for (let j = n-1; j >= 0; j--) {
dp[i][j] = linesA[i] === linesB[j]
? dp[i+1][j+1] + 1
: Math.max(dp[i+1][j], dp[i][j+1]);
}
}
// Backtrack to find diff
const result = [];
let i = 0, j = 0;
while (i < m && j < n) {
if (linesA[i] === linesB[j]) {
result.push({type: 'same', text: linesA[i]}); i++; j++;
} else if (dp[i+1][j] >= dp[i][j+1]) {
result.push({type: 'removed', text: linesA[i]}); i++;
} else {
result.push({type: 'added', text: linesB[j]}); j++;
}
}
while (i < m) result.push({type: 'removed', text: linesA[i++]});
while (j < n) result.push({type: 'added', text: linesB[j++]});
return result;
}
Time and Space Complexity
- Time: O(m × n) where m and n are line counts
- Space: O(m × n) for the DP table
For large files (10,000+ lines), this becomes slow and memory-heavy.
Myers Algorithm (Git's Choice)
Git uses the Myers diff algorithm, which is more efficient:
Key Ideas
- Models the problem as finding the shortest path in an edit graph
- Uses "snake" traversals along diagonal edges
- Linear space variant exists for large files
Advantages
- Time: O(ND) where D is the number of differences (fast when changes are small)
- Space: O(N) with the linear variant
- Produces minimal, readable diffs
Character-Level vs Line-Level Diff
| Level | Best For | Example |
|---|---|---|
| Line-level | Code, config files | Git diff |
| Character-level | Prose, documents | Word's track changes |
| Word-level | Balanced | Some merge tools |
ToolNest's Text Diff uses line-level LCS — best for code and configuration comparison.
Interpreting Diff Results
same line (unchanged)
- removed line (in original, not in modified)
+ added line (in modified, not in original)
- Green (+): Lines added in the new version
- Red (-): Lines removed from the old version
- Dimmed: Unchanged context lines
Practical Use Cases
- Code review: Compare before/after changes
- Config management: See what changed in config files
- Document revision: Track edits between versions
- Data comparison: Compare exported data snapshots
- Debugging: Compare expected vs actual output
Limitations
- Rearrangements: LCS-based diffs show moved blocks as delete+add, not as moves
- Large files: O(m×n) memory can be prohibitive
- Semantic understanding: Diff doesn't understand code semantics (e.g., variable renames)
Use ToolNest Text Diff for quick line-by-line comparison, right in your browser.