[LeetCode][python3]0030. Substring with Concatenation of All Words

6 min readJul 30, 2020

[LeetCode][python3]0030. Substring with Concatenation of All Words

Start the journey
N2I -2020.07.30

My Solution

class Solution:
    def findSubstring(self, s: str, words: List[str]) -> List[int]:
        
        if not words:
            return []
        elif words[0]=="":
            return list(range(len(s)+1))
        elif len(s)<len(words[0]):
            return []
        self.res=[]
        self.wordlen=len(words[0])
        self.words=words
        
        def dp(s,start,wordleft,end,key):
            if not wordleft:
                self.res.append(key)
                return
            if start>end:
                return
            target=s[start:start+self.wordlen]
            if target in wordleft:
                tmp=list(wordleft)
                tmp.remove(target)
                dp(s,start+self.wordlen,tmp,end,key)
        last_start=len(s)-self.wordlen*len(words)
        for i in range(last_start+1):
            dp(s,i,words,len(s)-self.wordlen+1,i)
        return self.res

Explanation:

An easy way to solve this problem is dynamic programming. For each character in the string, it could be a part of any words or it could be none. So our solution scans from the front of the string. We use the advantage from the description “All words in the list have the same length”. We check the sub-string if it is or is not on our word list. We will keep scanning the next and the next sub-string until there are no more words in the word list or if the sub-string doesn’t match the word list. Then we will start from the next character.

2. A Better solution

class Solution:
    def findSubstring(self, s: str, words: List[str]) -> List[int]:
        if not s or not words or not words[0]:
            return []
        n = len(s)
        wn, wm = len(words), len(words[0])
        t = wn * wm
        count = collections.Counter(words)
        
        res = []
        
        for i in range(min(wm, n - t + 1)):
            l = r = i
            cur = collections.defaultdict(int)
            while r + wm <= n:
                w = s[r:r+wm]
                r += wm
                if w not in count:
                    l = r
                    cur.clear()
                else:
                    cur[w] += 1
                    while cur[w] > count[w]:
                        cur[s[l:l+wm]] -= 1
                        l += wm
                    if r - l == t:
                        res.append(l)
        
        return res

Explanation (Chinese):

從本質上跟第一個解法差不多，但是他改用一個window scan+buffer的方式進行。以範例 s = “wordgoodgoodgoodbestword”, words = [“word”,”good”,”best”,”word”] 的testcase來說，大局上來看，該window分別會從s[0],s[1],s[2],s[3] 為起始開始掃描並以四個四個的character推進。也就是說s[0]開始的就會負責掃s[0:4],s[4:8],s[8:12],s[12:16],s[16:20],s[20:24] ,s[1]開始的就會負責掃描的就會是s[1:5],s[5:9]… 。如此則可確保所有可能性都能被掃到。

在這個範例裡首先他會去搜尋開頭的word跟good並把這兩個字的數量加到cur裡。當他吃到第二個good(index=8)的時候，由於words 裡的good數量只有一個，所以他會從第一個吃到的東西開始吐出來，並查看吐完後的good數量會不會等於words裡的good數量。如果吃到的東西並不在words裡頭則會把所有cur裡的東西吐出來重新計算。如果吃到的字串總長度r-l==t剛好是全部字串的總長度，則會把起始l加進答案裡。

特別注意到的是，這個方法不能解決當words裡面都是空字串的情況，因此最好是要在開頭再加上len(words[0])==0 的例外處理。

[LeetCode][python3]0030. Substring with Concatenation of All Words

Written by Ruka

No responses yet