Breaking a string into a given set of words

Given: A string and a dictionary containing words.
Objective: To find out if the string can be partitioned into a sequence of one or more dictionary words.


Dynamic programming approach

Consider the string “abcd” and a dictionary containing [ “a”, “ab”, “bc”, “cd” ].

Base case: An empty string can always be partitioned using the words in the dictionary.
We use an array of length string_size + 1 to check if the substrings of length [ 1 . . . string_size ] can be partitioned using the dictionary words.

We set table [ i ] = True if substring ( 0, i ) can be partitioned.
Thus for string of length 0 i.e for empty string, we set table [ 0 ] = True

Example: Consider the substring [ abc ]

String Substring
length
Substring Partitions
abcd 3 abc "" | abc
a | bc
ab | c

Based the paritions we use dynamic programming to check

  1. Check if the string [ abc ] can be found in the dictionary.
  2. Check if it was possible for earlier partition [ a ] and [ bc ] exist in the dictionary.
  3. Check if it was possible for earlier partition [ ab ] and [ c ] can be found in the dictionary.

If either of the above 3 conditions holds true, then the substring “abc” can be partitioned using the words stored in the dictionary.

Word_Break_DP




















Dynamic Programming : Program to partition a given string into a sequence of dictionary words

#include<iostream>
#include<string>
#include<set>
#include<vector>
#include<cstring>

using namespace std;

bool Check (string str, set<string>& wordDict) {

    int sz = str.size();
    bool partition_possible[sz+1];
    memset(partition_possible, false, sizeof(bool)*(sz+1));
    
    partition_possible[0] = true; // empty string could always be obtained.

    for (int len=1; len<=sz; len++) {
        // Check if the string [ 0 : len ] is found in the dictionary.
        // Example is the current string is "abcd", check if abcd is found in the dictionary.
        cout << "Looking for string : " << str.substr(0, len) << endl;
        if (wordDict.find(str.substr(0, len)) != wordDict.end()) {
            partition_possible[len] = true;
            cout << "--- String found. " << endl;
        } else {
            // See if the possible partitions can be found in the dictionary.
            // Check if it was possible for [a] and partition [bcd] can be found.
            // Check if it was possible for [ab] and partition [cd] can be found.
            // Check if it was possible for [abc] and partition [d] can be found.
            for (int cut=1; cut<len; cut++) {
                // Check if the left part is in the dictionary.
                if (partition_possible[cut]) { 
                    string part = str.substr(cut, len-cut);
                    cout << "Looking for right part : " << str.substr(cut, len-cut) << endl;
                    // Check if the right part is in the dictionary.
                    if (wordDict.find(part) != wordDict.end()) {
                        partition_possible[len] = true;
                        cout << "-- Right part found. " << endl;
                        break;
                    }
                }
            }
        }
    }
    return partition_possible[sz];
}

bool wordBreak (string str, vector<string>& wordDict) {
    set<string> setDict(wordDict.begin(), wordDict.end());
    return Check(str, setDict);
}

int main() {

    string str_a("abcd");
    vector<string> vec_a = {"a", "abc", "b", "cd"};

    cout << "===========================" << endl;
    if (wordBreak(str_a, vec_a)) {
        cout << "Matched" << endl;
    } else {
        cout << "No match found" << endl;
    }
    cout << "===========================" << endl;

    string str_b("goldcold");
    vector<string> vec_b = {"go", "old", "co"};

    if (wordBreak(str_b, vec_b)) {
        cout << "Matched" << endl;
    } else {
        cout << "No match found" << endl;
    }
    cout << "===========================" << endl;

    return 0;
}

Output

===========================
Looking for string : a
--- String found. 
Looking for string : ab
Looking for right part : b
-- Right part found. 
Looking for string : abc
--- String found. 
Looking for string : abcd
Looking for right part : bcd
Looking for right part : cd
-- Right part found. 
Matched
===========================
Looking for string : g
Looking for string : go
--- String found. 
Looking for string : gol
Looking for right part : l
Looking for string : gold
Looking for right part : ld
Looking for string : goldc
Looking for right part : ldc
Looking for string : goldco
Looking for right part : ldco
Looking for string : goldcol
Looking for right part : ldcol
Looking for string : goldcold
Looking for right part : ldcold
No match found
===========================



Copyright (c) 2019-2024, Algotree.org.
All rights reserved.