The Karp-Rabin algorithm is a string matching algorithm that uses a rolling hash function to determine if one string is a substring of another.
To compare the string s and t we can compare s
with each substring in t. This would yield a time
complexity of O(len(s) * len(t)).
t: [ | | | | | | | | | | | | | | | | ]
s: [ | | | ]
s: [ | | | ]
s: [ | | | ]
s: [ | | | ]
s: [ | | | ]
s: [ | | | ]
s: [ | | | ]
s: [ | | | ]
s: [ | | | ]
s: [ | | | ]
s: [ | | | ]
s: [ | | | ]
s: [ | | | ]
s: [ | | | ]
To do this a little more efficiently, we can
convert the string s to a numeric value using
a hash function. Using the same hash function
we can append the next character and remove the
previous character.
E.g.
To compare bce with abcdabce a brute force
version might look like the following:
t: [a|b|c|d|a|b|c|e]
s: [b|c|e]
s: [b|c|e]
s: [b|c|e]
s: [b|c|e]
s: [b|c|e]
s: [b|c|e]
Each comparison would require comparing each character in the substring (3 characters) with the corresponding character in the sliding window. This means as the size of the substring grows, so does the number of comparisons.
If we convert the substring to an integer value, then add a new character as we slide left and remove the previous character, then we can compare a single value with another single value. We incur the cost of computing the hash code for each sliding window but by using integers and addition/subtraction we make this a constant time operation.
E.g.
If we use the ascii character set then we can convert each character to the equivalent ascii code.
s: [b|c|e]
sc: [97, 98, 99].sum == 298
t: [a|b|c|d|a|b|c|e]
s: [b|c|e]
* (298 == [97, 98, 99])
* (298 == 294)
* (298 == 294)
s: [b|c|e]
* (298 == (294 - 'a' + 'd')
* (298 == (294 - 97 + 100)
* (298 == 297)
s: [b|c|e]
* (298 == (297 - 'b' + 'a'))
* (298 == (297 - 97 + 96))
* (298 == 296)
s: [b|c|e]
* (298 == (296 - 'c' + 'b'))
* (298 == (296 - 99 + 98))
* (298 == (295))
s: [b|c|e]
* (298 == (295 - 'd' + 'c'))
* (298 == (295 - 100 + 99))
* (298 == (294))
s: [b|c|e]
* (298 == (294 - 'a' + 'e'))
* (298 == (294 - 97 + 101))
* (298 == 298)
#!/usr/bin/env ruby
def verify?(s, t, i)
s.bytes == t.bytes[i..(i+s.bytes.size-1)]
end
def match?(s, t)
sc, tc = s.bytes.sum, t.bytes[0...s.bytes.size].sum
return true if sc == tc && verify?(s, t, 0)
for i in (1...(t.bytes.size))
tc -= t.bytes[i-1]
tc += (t.bytes[i+(s.bytes.size-1)] || 0)
return true if sc == tc && verify?(s, t, i)
end
false
end
[
["bce", "abcdabce", true],
["bce", "abcdabcd", false],
["dba", "ccaccaae", false],
].each do |(s, t, expected)|
result = match?(s, t)
puts [s, t, result, expected == result ? "PASS" : "FAIL"].inspect
end