Longest Common Subsequence

Finding the length of longest common subsequence (LCS)

Subsequence and common subsequence

A subsequence is a sequence obtained from another by the exclusion of a number of elements. Also, the relative order of elements in a subsequence remains the same as that of the original sequence.

Example of subsequences in a given sequence
Sequence: [ A, G, O, R, T, ]
Subsequence 1: [ A, G, R, T, ]
Subsequence 2: [ G, R, T, ]
Subsequence 3: [ A, O, R, ]

If there are ‘n’ elements in a sequence, then the number of subsequences that can be obtained are 2^n.
nC0 + nC1 + nC2 + .. nCn = 2^n

Example of longest common subsequence of given subsequences
Given two sequences, the idea is to find the longest common subsequence to the two.
Sequence A: [ A, G, O, R, T, ]
Sequence B: [ B, G, P, O, A, T, ]

Common Sequence 1: [ A, T, ]
Common Sequence 2: [ G, O, T, ]

Thus the longest common subsequence (LCS) is [ G, O, T, ] and its length is 3


Recursive Approach


LCSRecursive ( seq1(0 … sz1), seq2(0 … sz2) )
If length of either sequence is 0, then
      length of LCS is 0
If the last characters of the sequences match, then
      length of LCS is 1 + LCSRecursive ( seq1 ( 0 … sz1-1 ), seq2 ( 0 .. sz2-1 ) )
If the last characters of the sequences do not match, then
      length of LCS is max [ LCSRecursive ( seq1 ( 0 … sz1 ), seq2 ( 0 … sz2-1) ), LCSRecursive ( seq1 ( 0 … sz1-1 ), seq2 ( 0 … sz2 )) ]

Example, consider sequences [ A, G, O, T, R, ] and [ B, G, P, O, A, T, ]
LCS Recursive

The recursive algorithm runs in exponential time. Since identical subproblems can be seen as we go down the recursion tree, the performance of the recursive algorithm can be improved by storing the result of the solved subproblems. This memoized version of the algorithm runs in O(sz1.sz2) time.


Dynamic Programming Approach


A table with rows = length (Sequence 2) + 1 and columns = length (Sequence 1) + 1 is created.

Note: Sequences begin with index 0

Case: Empty Sequence
      All elements of R0 equal 0 indicating empty Sequence 1.
      All elements of C0 equal 0 indication empty Sequence 2.

Case: Characters in the sequences match.
      If the character in Sequence 1 at position a-1 matches the character in Sequence 2 at position b-1
      then increase the length of LCS
      i.e LCS at location [a][b] = 1 + length of LCS at location[a-1][b-1] ([a-1][b-1] is the diagonal element)

Case: Characters in the sequences do not match.
      If the character in Sequence 1 at position a-1 does not match the character in Sequence 2 at position b-1
      then length of LCS remains the same (i.e the earlier found length of LCS).
      i.e LCS at location [a][b] = max ( LCS at location[a-1][b], LCS at location[a][b-1] )
      location[a-1][b] is the element above and location[a][b-1] is the element on the left Longest_Common_Subsequence_DP

Time complexity of finding the longest common subsequence using dynamic programming : O(N*M), where N and M are the lengths of the two sequences.


C++14 : Longest Common Subsequence implementation using recursion and dynamic programming

#include<iostream>
#include<vector>
#include<string>
#include<algorithm>

using namespace std;

/* A slower recursive implementation to fetch the length of longest common subsequence */
int GetLength_LCSRecursive (string seq1, string seq2) {

    int sz1 = seq1.size();
    int sz2 = seq2.size();

    if (sz1 == 0 or sz2 == 0) { 
       return 0;
    } else if (seq1.at(sz1 - 1) == seq2.at(sz2 - 1)) { // Last characters in the two sequences match
       return 1 + GetLength_LCSRecursive ( seq1.substr(0, sz1-1), seq2.substr(0, sz2-1) );
    } else { // Last characters in the two sequences do not match
       return max ( GetLength_LCSRecursive (seq1.substr(0, sz1), seq2.substr(0, sz2-1)), 
                    GetLength_LCSRecursive (seq1.substr(0, sz1-1), seq2.substr(0, sz2)));
    }   
}

/* A faster recursive implementation to fetch the length of longest common subsequence */
int GetLength_EfficientLCSRecursive (string seq1, string seq2, vector<vector<int>> & result_table) {

    int sz1 = seq1.size();
    int sz2 = seq2.size();

    if (sz1 == 0 or sz2 == 0) { 
       return 0;
    }   

    // A subproblem with sz1 and sz2 has been solved already. Return the result
    if (result_table[sz1][sz2] != -1) {
       return result_table[sz1][sz2]; 
    }   

    int result;

    if (seq1.at(sz1 - 1) == seq2.at(sz2 - 1)) { // Last characters in the two sequences match
       result = 1 + GetLength_LCSRecursive(seq1.substr(0, sz1-1), seq2.substr(0, sz2-1));
    } else { // Last characters in the two sequences do not match
       result = max( GetLength_LCSRecursive (seq1.substr(0, sz1), seq2.substr(0, sz2-1)),
                     GetLength_LCSRecursive (seq1.substr(0, sz1-1), seq2.substr(0, sz2)));
    }   

    result_table[sz1][sz2] = result;
    return result;
}

/* A dynamic programming implementation to fetch the length of longest common subsequence (LCS) */
int GetLength_LCSDP (string seq1, string seq2, string& lcs) {

    int sz1 = seq1.size();
    int sz2 = seq2.size();
    int length_lcs = 0;

    int LCSTable[sz1+1][sz2+1];

    // Finding the length of LCS
    for (int a = 0; a <= sz1; a++) {
        for (int b = 0; b <= sz2; b++) {
            if (a == 0 or b == 0) {
                LCSTable[a][b] = 0;
            } else if (seq1.at(a-1) == seq2.at(b-1)) {
                LCSTable[a][b] = 1 + LCSTable[a-1][b-1];
            } else {
                LCSTable[a][b] = max (LCSTable[a-1][b], LCSTable[a][b-1]);
            }
        }
    }   

    length_lcs = LCSTable[sz1][sz2];
 
    // Finding the LCS
    int i = sz1, j = sz2;
    while (i > 0 and j > 0) { 
       if (seq1[i-1] == seq2[j-1]) {
          lcs += seq1[i-1];  // Move diagonally towards top left 
          i--, j--;
       } else if (LCSTable[i-1][j] > LCSTable[i][j-1]) {
          i--; // Move upwards
       } else {
          j--; // Move towards left
       }
    }   

    reverse (lcs.begin(), lcs.end());
    return length_lcs;
}

int main(){

    string seq1 = "ATPLBCCXWKQ";
    string seq2 = "FTCMXACWZYKQ";
    string lcs   = ""; 
    
    // Initialize result table with '-1'
    vector<vector<int>> result_table (seq1.size()+1, vector<int>(seq2.size()+1, -1));

    cout << "Length of longest common subsequence using slower recursion : " << GetLength_LCSRecursive (seq1, seq2) << endl;
    cout << "Length of longest common subsequence using faster recursion : " << GetLength_EfficientLCSRecursive (seq1, seq2, result_table) << endl;
    cout << "Length of longest common subsequence using dynamic programming : " << GetLength_LCSDP (seq1, seq2, lcs) << endl;
    cout << "Longest Common Subsequence : " << lcs << endl;

    seq1 = "AGORT";
    seq2 = "BGPOAT";
    lcs   = ""; 
    vector<vector<int>> result_table1 (seq1.size()+1, vector<int>(seq2.size()+1, -1));

    cout << "Length of longest common subsequence using slower recursion : " << GetLength_LCSRecursive (seq1, seq2) << endl;
    cout << "Length of longest common subsequence using faster recursion : " << GetLength_EfficientLCSRecursive (seq1, seq2, result_table1) << endl;
    cout << "Length of longest common subsequence using dynamic programming : " << GetLength_LCSDP (seq1, seq2, lcs) << endl;
    cout << "Longest Common Subsequence : " << lcs << endl;

    return 0;
}

Output showing the length of the longest common subsequence found using recursion and dynamic programming

Length of longest common subsequence using slower recursion : 6
Length of longest common subsequence using faster recursion : 6
Length of longest common subsequence using dynamic programming : 6
Longest Common Subsequence : TCXWKQ
Length of longest common subsequence using slower recursion : 3
Length of longest common subsequence using faster recursion : 3
Length of longest common subsequence using dynamic programming : 3
Longest Common Subsequence : GOT
Copyright (c) 2019, Algotree.org.
All rights reserved.