# Longest Common Subsequence

Understanding what a subsequence is in a given text or string.

• A subsequence is a sequence of characters or letters obtained from a given sequence by excluding a number of elements.
• The relative order of elements in a subsequence remains the same as that of the original sequence.

Consider the following example
Given sequence : [ A, G, O, R, T ]
We could have subsequences : ( [ A ], [ A, G ], [ A, G, R, T ], [ G, R, T ], …). In all these subsequences, the relative order of the letters remain similar to that of the original sequence [ A, G, O, R, T ].

Finding the longest common subsequence from two given sequences.
Consider the sequences of letters : Seq A : [ A, G, O, R, T ], Seq B : [ B, G, P, O, A, T ]

Sequence A Sequence B Common subsequence Length
[ A, G, O, R, T ] [ B, G, P, O, A, T ] [ A ] 1
[ A, G, O, R, T ] [ B, G, P, O, A, T ] [ G ] 1
[ A, G, O, R, T ] [ B, G, P, O, A, T ] [ O ] 1
[ A, G, O, R, T ] [ B, G, P, O, A, T ] [ A, T ] 2
[ A, G, O, R, T ] [ B, G, P, O, A, T ] [ G, O ] 2
[ A, G, O, R, T ] [ B, G, P, O, A, T ] [ G, T ] 2
[ A, G, O, R, T ] [ B, G, P, O, A, T ] [ G, O, T ] 3

Since the length of common subsequence [ G, O, T ] is 3, it is the longest common subsequence.

Recursive Approach

LCS ( seq_A , seq_B )
If the length of either sequence is 0, then
The length of LCS would be 0
If the last characters of the sequences seq_A & seq_B match, then
Example : [ A, G, O, R, T ] and [ B, G, P, O, A, T ] ( for every matching letter the length of the common subsequence increases by 1 )
The length of LCS would be 1 + LCS ( [ A, G, O, R ], [ B, G, P, O, A ] )
Length = 1 + LCS ( seq_A - 1, seq_B - 1 )
If the last characters of the sequences do not match, then
Example : [ A, G, O, R ] and [ B, G, P, O, A ]
The length of LCS would be max [ LCS ( [ A, G, O, R ], [ B, G, P, O ] ), LCS ( [ A, G, O ], [ B, G, P, O, A ] ) ]
Length = max [ LCS ( seq_A, seq_B - 1 ), LCS ( seq_A - 1, seq_B ) ]

Example, consider sequences [ A, G, O, R, T ] and [ B, G, P, O, A, T ] The recursive algorithm runs in exponential time as identical subproblems are solved as we go down the recursion tree. The complexity of recursive algorithm reaches 2n in worst case when none of the characters in the two subsequences match.

Dynamic Programming Approach

Consider two subsequences A [ 0 .. a - 1 ] of length a and B [ 0 .. b - 1 ] of length b.
Note : The index of the first character of the two subsequences is 0.

We try to construct a table with rows = [ length (Sequence B) + 1 ] and columns = [ length (Sequence A) + 1 ].
Example : Seq A = [ A, G, O, R, T ], Seq B = [ B, G, P, O, A, T ].
We fill the table in bottom-up manner, i.e we try to fill in the table for smaller sequences in Seq A and Seq B.

Base Case: Empty sequence
All elements of R0 equal 0 indicating empty sequence A.
All elements of C0 equal 0 indication empty sequence B.

Based on the recurrence relation described above

Case: Characters in the sequences match.
If the character in sequence A at position a-1 matches the character in sequence B at position b-1
then increase the length of LCS
i.e LCS length at location [ a ] [ b ] = 1 + LCS length at location[ a - 1 ] [ b - 1 ]
We see from the table that ( [ a - 1 ] [ b - 1 ] is the diagonal element )

Example : Consider subsequence [ A G ] in Seq A and subsequence [ B G ] in Seq B.
As the last character G in [ B G ] & [ A G ] match, we try to fill in table [ 2 ] [ 2 ] i.e we are finding the length of LCS in
subsequence [ B G ] ( of length 2 ) and subsequence [ A G ] ( of length 2 ).
Thus length of LCS in ( [ B G ] , [ A G ] ) = 1 + length of LCS in ( [ B ] ( of length 1 ), [ A ] ( of length 1 ).
i.e when the last character match we have LCS [ 2 ] [ 2 ] = 1 + LCS [ 1 ] [ 1 ]

Case: Characters in the sequences do not match.
If the character in sequence A at position a - 1 does not match the character in sequence B at position b - 1
then the length of LCS remains the same (i.e the earlier found length of LCS).
i.e LCS at location [ a ] [ b ] = max ( LCS at location [ a - 1 ] [ b ], LCS at location [ a ] [ b - 1 ] )
location [ a - 1 ] [ b ] is the element above and location [ a ] [ b - 1 ] is the element on the left

Example : Consider subsequence [ A G O ] in Seq A and subsequence [ B G ] in Seq B.
As the last character G in [ B G ] does not match with the last character O in [ A G O ] , we try to fill in table [ 2 ] [ 3 ] i.e we are finding the length of LCS in
subsequence [ B G ] ( of length 2 ) and subsequence [ A G O ] ( of length 3 ).
Thus length of LCS in ( [ B G ] , [ A G O ] ) = max length of LCS in ( [ B G ] ( length 2 ), [ A G ] ( length 2 ) or [ B ] ( length 1 ), [ A G O ] (of length 3) )
i.e when the last character does not match we have LCS [ 2 ] [ 3 ] = max ( LCS [ 2 ] [ 2 ], LCS [ 1 ] [ 3 ] ) = max ( 1, 0 ) = 1 Time complexity : O ( N * M ), where N and M are the lengths of the two sequences.

Longest common subsequence implementation

``````from typing import List # For annotations

# A slower recursive function to get the length of the longest common subsequence
def LCSLength (seq1 : str, seq2 : str) -> int :

sz1 = len(seq1)
sz2 = len(seq2)

if (sz1 == 0 or sz2 == 0) :
return 0
elif (seq1[sz1 - 1] == seq2[sz2 - 1]) : # Last characters in the two sequences match
return 1 + LCSLength ( seq1[:-1], seq2[:-1] )
else : # Last characters in the two sequences do not match
return max ( LCSLength (seq1, seq2[:-1]), LCSLength (seq1[:-1], seq2) )

# A faster recursive function to get the length of the longest common subsequence
def LCSLength_Faster (seq1 : str, seq2 : str, table : List[List[int]]) -> int :

sz1 = len(seq1)
sz2 = len(seq2)

if (sz1 == 0 or sz2 == 0) :
return 0

# A subproblem with sz1 and sz2 has been solved already. Return the result
if (table[sz1][sz2] != -1) :
return table[sz1][sz2]

result = 0

if (seq1[sz1 - 1] == seq2[sz2 - 1]) : # Last characters in the two sequences match
result = 1 + LCSLength_Faster (seq1[:-1], seq2[:-1], table)
else : # Last characters in the two sequences do not match
result = max ( LCSLength_Faster (seq1, seq2[:-1], table),
LCSLength_Faster (seq1[:-1], seq2, table) )

table[sz1][sz2] = result
return result

# The below function uses dynamic programming to fetch the length of the LCS
def LCSLength_DP (seq1 : str, seq2 : str) :

sz1 = len(seq1)
sz2 = len(seq2)
length_lcs = 0

# Create a DP table of size [sz1 + 1][sz2 + 1]
LCSTable =  * ( sz1 + 1 )
for i in range ( sz1 + 1 ) :
LCSTable[i] =  * ( sz2 + 1 )

# Finding the length of LCS
for a in range (sz1 + 1) :
for b in range (sz2 + 1) :
if (a == 0 or b == 0) :
LCSTable[a][b] = 0
elif ( seq1[a-1] == seq2[b-1] ) :
LCSTable[a][b] = 1 + LCSTable[a-1][b-1]
else :
LCSTable[a][b] = max ( LCSTable[a-1][b], LCSTable[a][b-1] )

length_lcs = LCSTable[sz1][sz2]

# Constructing the LCS
i = sz1
j = sz2
lcs = ""
while ( i > 0 and j > 0 ) :
if ( seq1[i-1] == seq2[j-1] ) :
lcs += seq1[i-1]  # Move diagonally towards top left
i -= 1
j -= 1
elif ( LCSTable[i-1][j] > LCSTable[i][j-1] ) :
i -= 1 # Move upwards
else :
j -= 1 # Move towards left

# Reverse lcs to find the actual sequence
lcs = lcs[::-1]

return length_lcs, lcs

list_seq_a = [ "ATPLBCCXWKQ", "AGORT" ]
list_seq_b = [ "FTCMXACWZYKQ", "BGPOAT" ]

for i in range(len(list_seq_a)) :

# Initialize result table with '-1'
rows = len(list_seq_a[i])
cols = len(list_seq_b[i])

table = [-1] * ( rows + 1 )
for r in range ( rows + 1 ) :
table[r] = [-1] * ( cols + 1 )

print ("\nSequence 1 : " + list_seq_a[i] + " Sequence 2 : " + list_seq_b[i])
print ("[Slower Recursion] LCS Length : " + str(LCSLength (list_seq_a[i], list_seq_b[i])))
print ("[Faster Recursion] LCS Length : " + str(LCSLength_Faster (list_seq_a[i], list_seq_b[i], table)))
length, lcs = LCSLength_DP (list_seq_a[i], list_seq_b[i])
print("[DP] LCS Length : " + str(length) + " Sequence : " + lcs)

``````

Output

``````Sequence 1 : ATPLBCCXWKQ Sequence 2 : FTCMXACWZYKQ
[Slower Recursion] LCS Length : 6
[Faster Recursion] LCS Length : 6
[DP] LCS Length : 6 Sequence : TCXWKQ

Sequence 1 : AGORT Sequence 2 : BGPOAT
[Slower Recursion] LCS Length : 3
[Faster Recursion] LCS Length : 3
[DP] LCS Length : 3 Sequence : GOT
``````
``````#include<iostream>
#include<vector>
#include<string>
#include<algorithm>

using namespace std;

/* A slower recursive function to get the length of the longest common subsequence */
int LCSLength (string seq1, string seq2) {

int sz1 = seq1.size();
int sz2 = seq2.size();

if (sz1 == 0 or sz2 == 0) {
return 0;
} else if (seq1.at(sz1 - 1) == seq2.at(sz2 - 1)) { // Last characters in the two sequences match
return 1 + LCSLength ( seq1.substr(0, sz1-1), seq2.substr(0, sz2-1) );
} else { // Last characters in the two sequences do not match
return max ( LCSLength (seq1.substr(0, sz1), seq2.substr(0, sz2-1)),
LCSLength (seq1.substr(0, sz1-1), seq2.substr(0, sz2)));
}
}

/* A faster recursive function to get the length of the longest common subsequence */
int LCSLength_Faster (string seq1, string seq2, vector<vector<int>> & table) {

int sz1 = seq1.size();
int sz2 = seq2.size();

if (sz1 == 0 or sz2 == 0) {
return 0;
}

// A subproblem with sz1 and sz2 has been solved already. Return the result
if (table[sz1][sz2] != -1) {
return table[sz1][sz2];
}

int result;

if (seq1.at(sz1 - 1) == seq2.at(sz2 - 1)) { // Last characters in the two sequences match
result = 1 + LCSLength_Faster (seq1.substr(0, sz1-1), seq2.substr(0, sz2-1), table);
} else { // Last characters in the two sequences do not match
result = max ( LCSLength_Faster (seq1.substr(0, sz1), seq2.substr(0, sz2-1), table),
LCSLength_Faster (seq1.substr(0, sz1-1), seq2.substr(0, sz2), table));
}

table[sz1][sz2] = result;
return result;
}

/* The below function uses dynamic programming to fetch the length of the LCS */
int LCSLength_DP (string seq1, string seq2, string& lcs) {

int sz1 = seq1.size();
int sz2 = seq2.size();
int length_lcs = 0;

int LCSTable[sz1+1][sz2+1];

// Finding the length of LCS
for (int a = 0; a <= sz1; a++) {
for (int b = 0; b <= sz2; b++) {
if (a == 0 or b == 0) {
LCSTable[a][b] = 0;
} else if (seq1.at(a-1) == seq2.at(b-1)) {
LCSTable[a][b] = 1 + LCSTable[a-1][b-1];
} else {
LCSTable[a][b] = max (LCSTable[a-1][b], LCSTable[a][b-1]);
}
}
}

length_lcs = LCSTable[sz1][sz2];

// Finding the LCS
int i = sz1, j = sz2;
while (i > 0 and j > 0) {
if (seq1[i-1] == seq2[j-1]) {
lcs += seq1[i-1];  // Move diagonally towards top left
i--, j--;
} else if (LCSTable[i-1][j] > LCSTable[i][j-1]) {
i--; // Move upwards
} else {
j--; // Move towards left
}
}

reverse (lcs.begin(), lcs.end());
return length_lcs;
}

int main() {

vector<string> vector_a = { "ATPLBCCXWKQ", "AGORT" };
vector<string> vector_b = { "FTCMXACWZYKQ", "BGPOAT" };

for (int i = 0; i < 2; i++) {

string lcs   = "";
// Initialize result table with '-1'
vector<vector<int>> table (vector_a[i].size()+1, vector<int>(vector_b[i].size()+1, -1));

cout << "Sequence 1 : " << vector_a[i] << " Sequence 2 : " << vector_b[i] << endl;
cout << "[Slower Recursion] LCS Length : " << LCSLength (vector_a[i], vector_b[i]) << endl;
cout << "[Faster Recursion] LCS Length : " << LCSLength_Faster (vector_a[i], vector_b[i], table) << endl;
cout << "[DP] LCS Length : " << LCSLength_DP (vector_a[i], vector_b[i], lcs) << endl;
cout << "Longest Common Subsequence (LCS) : " << lcs << endl << endl;
}

return 0;
}
``````

Output

``````Sequence 1 : ATPLBCCXWKQ Sequence 2 : FTCMXACWZYKQ
[Slower Recursion] LCS Length : 6
[Faster Recursion] LCS Length : 6
[DP] LCS Length : 6
Longest Common Subsequence (LCS) : TCXWKQ

Sequence 1 : AGORT Sequence 2 : BGPOAT
[Slower Recursion] LCS Length : 3
[Faster Recursion] LCS Length : 3
[DP] LCS Length : 3
Longest Common Subsequence (LCS) : GOT
``````
``````import java.util.Arrays;
import java.util.List;
import java.lang.StringBuilder;

class LCS {

/* A slower recursive implementation to fetch the length of longest common subsequence */
int GetLength_LCS (String seq1, String seq2) {

int sz1 = seq1.length();
int sz2 = seq2.length();

if (sz1 == 0 || sz2 == 0) {
return 0;
} else if (seq1.charAt(sz1 - 1) == seq2.charAt(sz2 - 1)) { // Last characters in the two sequences match
return 1 + GetLength_LCS ( seq1.substring(0, sz1-1), seq2.substring(0, sz2-1) );
} else { // Last characters in the two sequences do not match
return Math.max (GetLength_LCS (seq1.substring(0, sz1), seq2.substring(0, sz2-1)),
GetLength_LCS (seq1.substring(0, sz1-1), seq2.substring(0, sz2)));
}
}

/* A faster recursive implementation to fetch the length of longest common subsequence */
int GetLength_FasterLCS (String seq1, String seq2, int table[][]) {

int sz1 = seq1.length();
int sz2 = seq2.length();

if (sz1 == 0 || sz2 == 0) {
return 0;
}

// A sub-problem with sz1 and sz2 has been solved already. Return the result
if (table[sz1][sz2] != -1) {
return table[sz1][sz2];
}

int result;

if (seq1.charAt(sz1 - 1) == seq2.charAt(sz2 - 1)) { // Last characters in the two sequences match
result = 1 + GetLength_FasterLCS (seq1.substring(0, sz1-1), seq2.substring(0, sz2-1), table);
} else { // Last characters in the two sequences do not match
result = Math.max(GetLength_FasterLCS (seq1.substring(0, sz1), seq2.substring(0, sz2-1), table),
GetLength_FasterLCS (seq1.substring(0, sz1-1), seq2.substring(0, sz2), table));
}

table[sz1][sz2] = result;
return result;
}

/* A dynamic programming implementation to fetch the length of longest common subsequence (LCS) */
int GetLength_LCSDP (String seq1, String seq2, StringBuilder lcs) {

int sz1 = seq1.length();
int sz2 = seq2.length();
int length_lcs = 0;

int LCSTable[][] = new int[sz1+1][sz2+1];

// Finding the length of LCS
for (int a = 0; a <= sz1; a++) {
for (int b = 0; b <= sz2; b++) {
if (a == 0 || b == 0) {
LCSTable[a][b] = 0;
} else if (seq1.charAt(a-1) == seq2.charAt(b-1)) {
LCSTable[a][b] = 1 + LCSTable[a-1][b-1];
} else {
LCSTable[a][b] = Math.max(LCSTable[a-1][b], LCSTable[a][b-1]);
}
}
}

length_lcs = LCSTable[sz1][sz2];

// Finding the LCS
int i = sz1, j = sz2;
while (i > 0 && j > 0) {
if (seq1.charAt(i-1) == seq2.charAt(j-1)) {
lcs.append(seq1.charAt(i-1));  // Move diagonally towards top left
i--;
j--;
} else if (LCSTable[i-1][j] > LCSTable[i][j-1]) {
i--; // Move upwards
} else {
j--; // Move towards left
}
}

lcs.reverse();
return length_lcs;
}

public static void main ( String[] args ) {

List<String> list_a = List.of("ATPLBCCXWKQR", "AGORTRE");
List<String> list_b = List.of("FTCMXACWZYKQR", "BGPOATRT");

LCS obj = new LCS();

for (int i = 0; i < 2; i++) {

StringBuilder lcs =  new StringBuilder();
int table[][] = new int[list_a.get(i).length()+1][list_b.get(i).length()+1];

// Initialize 2 dimensional array with '-1'
for (int[] arr : table) {
Arrays.fill(arr, -1);
}

System.out.println("Sequence 1 : " + list_a.get(i) + ", Sequence 2 : " + list_b.get(i));
System.out.println("[Slower Recursion] LCS length : " + obj.GetLength_LCS (list_a.get(i), list_b.get(i)));
System.out.println("[Faster Recursion] LCS length : " + obj.GetLength_FasterLCS (list_a.get(i), list_b.get(i), table));
System.out.println("[DP] LCS Length : " + obj.GetLength_LCSDP (list_a.get(i), list_b.get(i), lcs));
System.out.println("Longest Common Subsequence (LCS) : " + lcs + "\n");
}
}
}
``````

Output

``````Sequence 1 : ATPLBCCXWKQR, Sequence 2 : FTCMXACWZYKQR
[Slower Recursion] LCS length : 7
[Faster Recursion] LCS length : 7
[DP] LCS Length : 7
Longest Common Subsequence (LCS) : TCXWKQR

Sequence 1 : AGORTRE, Sequence 2 : BGPOATRT
[Slower Recursion] LCS length : 4
[Faster Recursion] LCS length : 4
[DP] LCS Length : 4
Longest Common Subsequence (LCS) : GOTR
``````

Copyright (c) 2019-2022, Algotree.org.