Channel: Kodeco | High quality programming tutorials: iOS, Android, Swift, Kotlin, Unity, and more

Swift Algorithm Club: Boyer Moore String Search Algorithm


Swift Algorithm Club: Efficient Swift 4 String Searching

The Swift Algorithm Club is an open source project on implementing data structures and algorithms in Swift.

Every month, Vincent Ngo, Ross O’Brien and I feature a cool data structure or algorithm from the club in a tutorial on this site. If you want to learn more about algorithms and data structures, follow along with us!

In this tutorial, you’ll analyze 2 algorithms: brute force string search and Boyer Moore string search.

Note: This is written for Xcode 9 Beta 2/Swift 4. You can download Xcode betas here.

Getting Started

It’s fairly simple to illustrate just how important string searching algorithms are in the world. Press CMD + F and try to search for the letter c. You get the results almost instantly. Now imagine if that took 10 seconds to compute… you might as well retire!

Brute Force It!

The brute force method is relatively straightforward. To understand the brute force string search method, consider the string "HELLO WORLD":

HELLO WORLD String

For the purposes of this tutorial, there are a few things to keep in mind.

  1. Your search algorithm should be case sensitive.
  2. The algorithm should return the index of the first match.
  3. Partial matches should work. For example:
    let text = "HELLO WORLD"
    text.index(of: "ELLO") // returns 1
    text.index(of: "LD") // returns 9
    

Here’s how it works. Assume you are looking for the pattern "LO". You’ll begin by iterating through the source string. As soon as you reach a character that matches the first character of your lookup string, you’ll try to match the rest of the characters. If that attempt fails, or the character doesn’t match to begin with, you’ll move on through the rest of the string.

Implementation

You’ll write this method as an extension of String. Using Xcode 9 beta 2 or later, create a new Swift playground. Delete the boilerplate code so you have a blank playground page. You’ll start by creating a stub of the implementation inside a String extension. Write the following at the top of your playground:

extension String {
  func index(of pattern: String) -> Index? {
    // more to come
    return nil
  }
}

The purpose of this function is simple: given a string (hereafter referred to as the source string), you check whether another string (hereafter referred to as the pattern) is within it. If a match can be made, it’ll return the index of the first character of the match. If this method can’t find a match, it’ll return nil.

As of Swift 4, String exposes the indices property, which contains all the indexes that are used to subscript the string. You’ll use this to iterate through your source string. Update the function to the following:

func index(of pattern: String) -> Index? {
  // 1
  for i in indices {

    // 2
    var j = i
    var found = true
    for p in pattern.indices {
      guard j != endIndex && self[j] == pattern[p] else { found = false; break }
      j = index(after: j)
    }
    if found {
      return i
    }
  }
  return nil
}

This does exactly what you wanted:

  1. You loop over the indices of the source string
  2. You attempt to match the pattern string with the source string.

As soon as you find a match, you’ll return the index. It’s time to test it out. Write the following at the bottom of your playground:

let text = "Hello World"
text.index(of: "lo") // returns 3
text.index(of: "ld") // returns 9

The brute force approach works, but it’s relatively inefficient. In the next section, you’ll look at how you can make use of a clever technique to optimize your algorithm.
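
To make "relatively inefficient" concrete, here is a minimal sketch that counts how many character comparisons the brute force approach makes. The helper name bruteForceSearch(for:), the Int offset it returns, and the comparison counter are assumptions added for illustration; they are not part of the tutorial's code.

```swift
extension String {
  /// Hypothetical helper (not part of the tutorial): runs the brute force
  /// search and also reports how many character comparisons it made.
  func bruteForceSearch(for pattern: String) -> (offset: Int?, comparisons: Int) {
    var comparisons = 0
    for (offset, i) in indices.enumerated() {
      var j = i
      var found = true
      for p in pattern.indices {
        // Running off the end of the source counts as a failed attempt.
        guard j != endIndex else { found = false; break }
        comparisons += 1
        guard self[j] == pattern[p] else { found = false; break }
        j = index(after: j)
      }
      if found { return (offset, comparisons) }
    }
    return (nil, comparisons)
  }
}

// Seven "A" characters followed by a "B": an awkward input for brute force.
let awkwardSource = String(repeating: "A", count: 7) + "B"
let result = awkwardSource.bruteForceSearch(for: "AAB")
print(result.offset ?? -1, result.comparisons) // prints "5 18"
```

On this input the search makes 18 comparisons to scan 8 characters, because every failed attempt re-reads characters it has already looked at.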

Boyer Moore String Search

As it turns out, you don’t need to look at every character from the source string. You can often skip ahead multiple characters. The skip-ahead algorithm is called Boyer Moore, and it has been around since the late 1970s. It is widely regarded as a standard benchmark for practical string search algorithms.

This technique builds upon the brute force method, with 2 key differences:

  1. Pattern matching is done backwards.
  2. A skip table is used to perform aggressive skips during traversal.

Here’s what it looks like:

The Boyer Moore technique makes use of a skip table. The idea is fairly straightforward. You create a table based on the word you’d like to match. The table is responsible for holding the number of steps you may skip for a given letter of the word. Here’s a skip table for the word "HELLO":

Skip Table

You’ll use the skip table to decide how many traversals you should skip forward. You’ll consult the skip table before each traversal in the source string. To illustrate the usage, take a look at this specific example:

In this situation, you’re comparing the "H" character in the source string. Since this doesn’t match the last character in the pattern, you’ll want to move down the source string. Before that, you would consult the skip table to see if there’s an opportunity to do some skips. In this case, "H" is in the skip table and you’re able to perform a 4 index skip.
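
One way to read a skip table entry is as the distance from a character's last occurrence in the pattern to the end of the pattern. The sketch below computes that distance directly for a single character, without building a table. The function name skipDistance(for:in:) is hypothetical, and the code assumes Swift 4.2 or later for lastIndex(of:).

```swift
/// Hypothetical helper, not part of the tutorial's code: computes the skip
/// for one character directly from the pattern instead of from a table.
func skipDistance(for character: Character, in pattern: String) -> Int {
  // A character that never appears in the pattern allows a jump of the
  // full pattern length.
  guard let last = pattern.lastIndex(of: character) else { return pattern.count }
  // Otherwise, skip the distance from its last occurrence to the pattern's end.
  return pattern.distance(from: last, to: pattern.endIndex) - 1
}

print(skipDistance(for: "H", in: "HELLO")) // prints 4
print(skipDistance(for: "L", in: "HELLO")) // prints 1
print(skipDistance(for: "X", in: "HELLO")) // prints 5
```

The first line reproduces the 4 index skip for "H" described above. The last line shows the fallback for a character that isn't in the pattern at all.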

Back in your Swift playground, delete the implementation of index(of:), except for return nil:

func index(of pattern: String) -> Index? {
  return nil
}

The Skipping Table

You’ll start by dealing with the skip table. Write the following inside the String extension:

fileprivate var skipTable: [Character: Int] {
  var skipTable: [Character: Int] = [:]
  for (i, c) in enumerated() {
    skipTable[c] = count - i - 1
  }
  return skipTable
}

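This enumerates over a string and returns a dictionary with its characters as keys and, as values, the number of positions to skip for each one. When a character repeats, as "L" does in the "HELLO" table from earlier, the later assignment overwrites the earlier one, so the table keeps the smaller skip. Verify that it works. At the bottom of your playground, write the following: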

let helloText = "HELLO"
helloText.skipTable.forEach { print($0) }

You should see the following in the console, though the order of the lines may differ because a dictionary is unordered:

(key: "H", value: 4)
(key: "L", value: 1)
(key: "O", value: 0)
(key: "E", value: 3)

This matches the table diagram from earlier.

Matching

Another component of the Boyer Moore algorithm is backwards string matching. You’ll devise a method to handle that. This method has 3 goals:

  1. Backwards match 2 strings character by character.
  2. If at any point the match fails, the method will return nil.
  3. If a match completes successfully, return the String.Index of the source string that matches the first letter of the pattern.

Write the following beneath skipTable:

// 1
fileprivate func match(from currentIndex: Index, with pattern: String) -> Index? {
  // more to come

  // 2
  return match(from: index(before: currentIndex), with: "\(pattern.dropLast())")
}

This is the recursive method you’ll use to do matching against the source and pattern strings:

  1. currentIndex keeps track of the current character of the source string you want to match against.
  2. On each recursive call, you decrement the index, and shorten the pattern string by dropping its last character.

The behaviour of this method looks like this:

Now, it’s time to deal with the comparison logic. Update the match method to the following:

fileprivate func match(from currentIndex: Index, with pattern: String) -> Index? {
  // 1
  if currentIndex < startIndex { return nil }
  if currentIndex >= endIndex { return nil }

  // 2
  if self[currentIndex] != pattern.last { return nil }

  // 3
  if pattern.count == 1 && self[currentIndex] == pattern.last { return currentIndex }
  return match(from: index(before: currentIndex), with: "\(pattern.dropLast())")
}

  1. You’ll need to do some bounds checking. If currentIndex ever goes out of bounds, you’ll return nil.
  2. If the characters don’t match, then there’s no point in continuing further.
  3. If the final remaining character in pattern matches, then you’ll return the current index, indicating that a match starts at this location.

For explanation purposes, I separated the logic into multiple statements. You could rewrite the checks in a more concise way using guard, keeping the final recursive return unchanged:

guard currentIndex >= startIndex && currentIndex < endIndex && pattern.last == self[currentIndex]
  else { return nil }
if pattern.count == 1 && self[currentIndex] == pattern.first { return currentIndex }
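
To see the backwards matching in isolation, here is a minimal standalone sketch. The name backwardsMatch(from:with:) is hypothetical, and the extra startIndex guard before the recursive call is an assumption added here so that index(before:) is never called on startIndex; the rest follows the steps described above.

```swift
extension String {
  /// Hypothetical standalone variant of the tutorial's match method.
  func backwardsMatch(from currentIndex: Index, with pattern: String) -> Index? {
    // Bounds check, plus a comparison against the pattern's last character.
    guard currentIndex >= startIndex, currentIndex < endIndex,
          let lastCharacter = pattern.last,
          self[currentIndex] == lastCharacter else { return nil }
    // One character left and it matched: the match starts here.
    if pattern.count == 1 { return currentIndex }
    // Assumption beyond the tutorial: stop before stepping past startIndex.
    guard currentIndex > startIndex else { return nil }
    return backwardsMatch(from: index(before: currentIndex),
                          with: String(pattern.dropLast()))
  }
}

let source = "Hello World!"
let end = source.index(source.startIndex, offsetBy: 10) // the "d" in "World"
if let start = source.backwardsMatch(from: end, with: "World") {
  print(source.distance(from: source.startIndex, to: start)) // prints 6
}
print(source.backwardsMatch(from: end, with: "Word") == nil) // prints true
```

Starting at the "d", each recursive call steps one character to the left and drops one character from the pattern, until only "W" is left and its index is returned.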

With the skip table and matching function ready, it's time to tackle the final piece of the puzzle!

index(of:)

Update the index method to the following:

func index(of pattern: String) -> Index? {
  // 1
  let patternLength = pattern.count
  guard patternLength > 0, patternLength <= count else { return nil }

  // 2
  let skipTable = pattern.skipTable
  let lastChar = pattern.last!

  // 3
  var i = index(startIndex, offsetBy: patternLength - 1)

  // more to come...
  return nil
}

You've set up the playing field:

  1. First, check to see if the length of the pattern string is within the bounds of the source string.
  2. Keep track of the skip table for the pattern string, and its last character.
  3. You'll initialize a String.Index to keep track of traversals. Since you're planning on matching the strings backwards, you can get a small head start by offsetting this index by the length of the pattern minus one, so it lines up with the pattern's last character.

Next, you'll define the logic for the matching and traversals. Add the following just before the return statement:

// 1
while i < endIndex {
  let c = self[i]

  // 2
  if c == lastChar {
    if let k = match(from: i, with: pattern) { return k }
    i = index(after: i)
  } else {
    // 3
    i = index(i, offsetBy: skipTable[c] ?? patternLength, limitedBy: endIndex) ?? endIndex
  }
}

Here's the play by play:

  1. You'll continue to traverse the source string until you reach the endIndex.
  2. If the current character of the source string matches the last character of the pattern string, you'll attempt to run the match function. If this returns a non-nil value, it means you've found a match, so you'll return the index where the match begins. Otherwise, you'll move to the next index.
  3. If you can't make a match, you'll consult the skip table to see how many indexes you can skip. If this skip goes beyond the length of the source string, you'll just head straight to the end.

Time to give it a whirl. Add the following at the bottom of the playground:

let sourceString = "Hello World!"
let pattern = "World"
sourceString.index(of: pattern)

You should get an index at offset 6, the position of the "W" in "World". Woohoo, it's working!
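
If you'd like to check results as plain Int offsets, the sketch below folds the skip table, the backwards match and the traversal into one function. It is a consolidated variant under stated assumptions, not the tutorial's implementation: the name boyerMooreOffset(of:) is hypothetical, and it copies both strings into arrays so positions can be plain integers instead of String.Index values.

```swift
extension String {
  /// Hypothetical consolidated variant of the tutorial's search that returns
  /// an Int offset rather than a String.Index.
  func boyerMooreOffset(of pattern: String) -> Int? {
    let patternLength = pattern.count
    guard patternLength > 0, patternLength <= count else { return nil }

    // Skip table: distance from each character's last occurrence to the
    // end of the pattern.
    var skipTable: [Character: Int] = [:]
    for (offset, character) in pattern.enumerated() {
      skipTable[character] = patternLength - offset - 1
    }

    // Assumption: copying into arrays is acceptable in exchange for
    // integer subscripting.
    let source = Array(self)
    let target = Array(pattern)
    var i = patternLength - 1 // source position under the pattern's last character

    while i < source.count {
      if source[i] == target[patternLength - 1] {
        // Compare backwards from the end of the pattern.
        var matched = 1
        while matched < patternLength
          && source[i - matched] == target[patternLength - 1 - matched] {
          matched += 1
        }
        if matched == patternLength { return i - patternLength + 1 }
        i += 1
      } else {
        // Never zero here: only the pattern's last character maps to 0.
        i += skipTable[source[i]] ?? patternLength
      }
    }
    return nil
  }
}

print("Hello World!".boyerMooreOffset(of: "World") ?? -1) // prints 6
print("HELLO WORLD".boyerMooreOffset(of: "LD") ?? -1)     // prints 9
```

For "World", the loop inspects only source positions 4, 7 and 10 before the backwards comparison succeeds, which is the skipping behavior the table was built for.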

Where to go From Here?

I hope you enjoyed this tutorial on efficient string searching!

Here is a playground with the above code. You can also find the original implementation and further discussion on the repo for Brute Force String Search and Boyer Moore String Search.

This was just one of the many algorithms in the Swift Algorithm Club repository. If you're interested in more, check out the repo.

It's in your best interest to know about algorithms and data structures - they're solutions to many real-world problems, and are frequently asked as interview questions. Plus it's fun!

So stay tuned for many more tutorials from the Swift Algorithm club in the future. In the meantime, if you have any questions on implementing string search algorithms in Swift, please join the forum discussion below.

Note: The Swift Algorithm Club is always looking for more contributors. If you've got an interesting data structure, algorithm, or even an interview question to share, don't hesitate to contribute! To learn more about the contribution process, check out our Join the Swift Algorithm Club article.

The post Swift Algorithm Club: Boyer Moore String Search Algorithm appeared first on Ray Wenderlich.

