In these days of big data, data is stored in a multitude of formats, which poses a challenge to anyone trying to consolidate and make sense of it. If you’re lucky, the data will be in an organized, structured format such as JSON, XML or CSV. If you’re not so lucky, the data is more freeform and unstructured and you may have to struggle with endless if/else cases or regular expressions.
You can also use automated parsers such as NSScanner to analyze string data in any form, from natural written languages to computer programming languages. In this NSScanner tutorial, you’ll learn about the parser included in Cocoa and how to use its powerful methods to extract information and manipulate strings in really neat ways. You’ll use what you learn to build an OS X application that works like Apple Mail’s interface, as shown below:
Although you’ll be building an OS X application in this tutorial, NSScanner is available on both OS X and iOS. By the end of this tutorial, you’ll be ready to parse text on either platform. Let’s get started!
Getting Started
Download the starter project, extract the contents of the ZIP file and open NSScannerTutorial.xcodeproj in Xcode.
You’ll find three folders named MasterViewController, Custom Cell and comp.sys.mac.hardware. In the View Controller folder you’ll find a simple xib file with a TableView on the left that contains a custom cell with a bunch of labels, and a TextView on the right-hand side.
MasterViewController.m contains a pre-made structure that sets up the delegate/data source for a Table View. The Custom Cell folder contains PostCellView.h and PostCellView.m, which form a subclass of NSTableCellView. The cell has all the properties that you need to set each individual data item.
As for the data to parse: the comp.sys.mac.hardware folder contains 49 data files for you to parse with your app; take a minute and browse through the data to see how it’s structured.
Table views in OS X (NSTableView) are quite similar to their iOS counterpart, UITableView. Build and run the project to see it in action; you’ll see the following appear:
The basic framework is there: on the left hand side, the table view currently has placeholder labels with the prefix [Field]Value. These labels will be replaced with parsed data.
Understanding the Structure of the Data
Before going straight into parsing, it’s important to understand what you’re trying to parse.
Below is a sample file of the 49 files you have to parse; you’ll be parsing the items outlined in red below:
The set of parsed items includes the From, Subject, Date, Organization, Lines, and Message fields. Out of the six fields, you’ll do something extra special with the “From” and “Message” fields, as follows:
“From” Field
For the “From” field, you’ll split the email and the name. This is trickier than it looks, as the name may come before the email, or vice versa. The “From” field might not even have a name or email, or it might have one but not the other.
“Message” segment
For the message segment, you’ll see if a message contains anything cost related. You’ll search the message for prices such as $1000 or $1.00, as well as particular keywords in the message.
The keywords you’ll search for are: apple, macs, software, keyboard, printer, video, monitor, laser, scanner, disks, cost, price, floppy, card and phone.
Other Fields
For the other fields, you’ll simply separate the field from its value.
The values of the fields are delimited by colons. Also note that the data’s field text segment is separated from the message text segment by a new line.
First off, you’ll need two classes to parse and hold the data to be displayed.
Creating the Object to Hold the Data
Navigate to File\New\File… (or simply press Command+N). Select Mac OS > Cocoa and then Objective-C class and click Next. Set the class name to MacHardwarePost and the subclass to NSObject. Click Next and then Create.
Open MacHardwarePost.h and add the following properties and method prototype between @interface and @end:

    // The fields' values are placed in these properties once extracted.
    @property (nonatomic, strong) NSString *fileName;
    @property (nonatomic, strong) NSString *fromPerson;
    @property (nonatomic, strong) NSString *email;
    @property (nonatomic, strong) NSString *subject;
    @property (nonatomic, strong) NSString *date;
    @property (nonatomic, strong) NSString *organization;
    @property (nonatomic, strong) NSString *message;
    @property int lines;

    // Does this post have any money-related information? E.g. $25, $50, $2000 etc.
    @property (nonatomic, strong) NSString *costSearch;

    // Contains a set of distinct keywords.
    @property (nonatomic, strong) NSMutableSet *setOfKeywords;

    - (NSString *)printKeywords;
printKeywords returns an instance of NSString that places all keywords in one single string separated by commas. Think of this like Java’s toString method.
Open MacHardwarePost.m and add the following code between @implementation and @end:

    - (id)init
    {
        if (self = [super init])
        {
            _setOfKeywords = [[NSMutableSet alloc] init]; //1
        }
        return self;
    }
init creates the NSMutableSet that backs the setOfKeywords property. In line 1 above, _setOfKeywords, which is an instance of NSMutableSet, tracks all keywords found. You’re using NSMutableSet over NSMutableArray because a set ignores duplicates, and it’s not necessary to store the same keyword twice in this context.
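To make the set-versus-array point concrete, here is a minimal sketch. CountDistinctKeywords is a hypothetical helper that is not part of the tutorial project; it assumes ARC and only shows that adding the same word twice leaves a single entry in the set.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper (not in the starter project): counts the distinct
// words in a list by collecting them in an NSMutableSet.
static NSUInteger CountDistinctKeywords(NSArray *words)
{
    NSMutableSet *distinct = [[NSMutableSet alloc] init];
    for (NSString *word in words)
    {
        // Adding a word that is already a member leaves the set unchanged.
        [distinct addObject:word];
    }
    return [distinct count];
}
```

An NSMutableArray fed the same words would keep every duplicate, so you would have to check containsObject: yourself before each insertion.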
Still working in the same file, add the following code segment right after init:

    - (NSString *)printKeywords
    {
        NSMutableString *result = [[NSMutableString alloc] init]; //1
        NSUInteger i = 0; //2
        NSUInteger numberOfKeywords = [self.setOfKeywords count]; //3

        if (numberOfKeywords == 0) return @"No keywords found."; //4

        for (NSString *keyword in self.setOfKeywords) //5
        {
            //6
            [result appendFormat:(i != numberOfKeywords - 1) ? @" %@," : @" %@", keyword];
            i++; //7
        }
        return result;
    }
Here’s what’s going on in the code above:
1. Initialize an instance of NSMutableString named result, which is used to append keywords together.
2. Initialize the counter to 0.
3. Obtain the size of the set.
4. Check to see if the set is empty. If so, simply return a message.
5. Loop over all keywords in self.setOfKeywords.
6. Check if the counter i is equal to the last index. If it is not, append a comma after the keyword; otherwise, don’t add a comma after the last word.
7. Increment the counter to keep track of where you are in the set.
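As a point of comparison, the same comma-separated result can be sketched with Foundation's componentsJoinedByString:. JoinKeywords is a hypothetical free function, not the tutorial's printKeywords; it also sorts the keywords first, which the loop above does not do, because an NSSet has no defined order.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical alternative to the manual loop: let NSArray insert the separators.
static NSString *JoinKeywords(NSSet *keywords)
{
    if ([keywords count] == 0) return @"No keywords found.";

    // Sets are unordered, so sort to get a stable, readable result.
    NSArray *sorted = [[keywords allObjects] sortedArrayUsingSelector:@selector(compare:)];
    return [sorted componentsJoinedByString:@", "];
}
```

The manual counter is still worth reading through once, since it shows exactly where the trailing comma is suppressed.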
You have finished implementing the MacHardwarePost object which will store the data you extract from the files. Now, on to creating the parser!
Creating the Data Parser
Navigate to File\New\File… (or simply press Command+N). Select Mac OS > Cocoa and then Objective-C class and click Next. Set the class name to MacHardwareDataParser and the subclass to NSObject. Click Next and then Create.
Open MacHardwareDataParser.h and add the following import before the @interface tag:

    #import "MacHardwarePost.h"
Next, add the following method prototypes between @interface and @end:

    - (void)constructSelectorDictionary;
    - (MacHardwarePost *)parseRawDataWithData:(NSData *)rawData;
    - (id)initWithKeywords:(NSArray *)listOfKeywords fileName:(NSString *)fileName;
Now open MacHardwareDataParser.m and add the following code just before @implementation:

    @interface MacHardwareDataParser ()

    // Object that contains the fully extracted information.
    @property (nonatomic, strong) MacHardwarePost *macHardwarePost; //1

    // Stores the names of the selector methods that may be called by the parser.
    @property (nonatomic, strong) NSDictionary *selectorDict; //2

    // Contains the list of keywords to search for.
    @property (nonatomic, strong) NSArray *listOfKeywords; //3

    // Keeps track of the current file we are extracting information from.
    @property (nonatomic, strong) NSString *fileName; //4

    @end
The properties between @interface and @end aren’t exposed to the caller of this class; they’re meant for private and/or internal methods and properties for the use of MacHardwareDataParser alone.
1. The property macHardwarePost is where all the extracted fields’ information will be stored. This property will be returned to the client using our parser once the parsing is complete.
2. selectorDict is an NSDictionary whose keys are the fields you’re parsing and whose values are the names of the selector methods that handle them. It’s really important to have different functions for different tasks and not do everything in one method. Each selector method will be explained later on; check out this StackOverflow post for more information on selectors.
3. listOfKeywords stores the list of keywords you will use to search the message portion for matching keywords.
4. fileName stores the name of the data file you are currently parsing. It’s generally a good idea to store the file name, mainly for debugging purposes. If there is some error with the data you have just parsed, you can easily pinpoint and examine the file to see what the issue is.
Initializing your Parser
Open MacHardwareDataParser.m and add the following code between @implementation and @end:

    #pragma mark - Initialization Phase

    - (id)initWithKeywords:(NSArray *)listOfKeywords fileName:(NSString *)fileName
    {
        if (self = [super init])
        {
            [self constructSelectorDictionary];
            self.listOfKeywords = listOfKeywords;
            self.fileName = fileName;
        }
        return self;
    }

    // Build the scanner selectors: field name -> selector name.
    - (void)constructSelectorDictionary
    {
        self.selectorDict = @{
            @"From" : @"extractFromWithString:",
            @"Subject" : @"extractSubjectWithString:",
            @"Date" : @"extractDateWithString:",
            @"Organization" : @"extractOrganizationWithString:",
            @"Lines" : @"extractNumberOfLinesWithString:",
            @"Message" : @"extractMessageWithString:"
        };
    }
initWithKeywords:fileName: is an object initializer; when you create a MacHardwareDataParser object, you pass in a listOfKeywords to be searched when parsing the message. You also pass in the name of the file you are extracting data from, to keep track of what you are parsing.
Invoking constructSelectorDictionary creates an instance of NSDictionary initialized with six key/value pairs. Whenever the parser sees one of these keys while parsing, it looks up the matching selector name and calls the corresponding method. For example, if you find the field “Subject”, the corresponding method extractSubjectWithString: will be called to extract the “Subject” field’s information.
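The lookup-then-call idea can be sketched in isolation as follows. FieldDispatcher, handleField:value: and lastSubject are hypothetical names invented for this illustration; only NSSelectorFromString, respondsToSelector: and performSelector:withObject: are real Foundation calls. The sketch assumes ARC, which is why the pragma is needed.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical demo class; it is not part of the tutorial project.
@interface FieldDispatcher : NSObject
@property (nonatomic, copy) NSString *lastSubject;
- (BOOL)handleField:(NSString *)field value:(NSString *)value;
@end

@implementation FieldDispatcher

- (void)extractSubjectWithString:(NSString *)value
{
    self.lastSubject = value;
}

- (BOOL)handleField:(NSString *)field value:(NSString *)value
{
    // Field name -> selector name, in the same spirit as selectorDict.
    NSDictionary *selectorNames = @{ @"Subject" : @"extractSubjectWithString:" };

    NSString *selectorName = selectorNames[field];
    if (selectorName == nil) return NO; // unknown field: nothing to call

    SEL selector = NSSelectorFromString(selectorName);
    if (![self respondsToSelector:selector]) return NO; // guards against typos in the table

#pragma clang diagnostic push
#pragma clang diagnostic ignored "-Warc-performSelector-leaks"
    [self performSelector:selector withObject:value];
#pragma clang diagnostic pop
    return YES;
}

@end
```

The respondsToSelector: check is an extra safety net added for this sketch; it turns a misspelled selector name into a quiet NO instead of an unrecognized-selector exception.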
Still working in the same file, add the following code after constructSelectorDictionary and before @end:

    #pragma mark - Build Object Phase

    // Construct a MacHardwarePost and return it.
    - (MacHardwarePost *)parseRawDataWithData:(NSData *)rawData // 1
    {
        if (rawData == nil) return nil; // 2

        // Extracted information from the raw data is placed in MacHardwarePost fields.
        self.macHardwarePost = [[MacHardwarePost alloc] init]; // 3

        // Set the fileName within the MacHardwarePost object
        // to keep track of which file we extracted information from.
        self.macHardwarePost.fileName = self.fileName; // 4

        // Contains every field and the message.
        NSString *rawString = [[NSString alloc] initWithData:rawData encoding:NSUTF8StringEncoding]; // 5

        // Split the sections, so we deal with the fields first and then the message.
        NSArray *stringSections = [rawString componentsSeparatedByString:@"\n\n"]; // 6

        if (stringSections == nil) // 7
        {
            return nil;
        }

        // Don't consider data that doesn't have a message, so the count must be > 1.
        if ([stringSections count] > 1) // 8
        {
            // Only the fields are needed here (located at index 0).
            NSString *rawFieldString = stringSections[0]; // 9

            // Place the extracted fields into macHardwarePost properties.
            [self extractFieldsWithString:rawFieldString]; // 10

            // Place contiguous message blocks back together in one string.
            NSString *message = [self combineContiguousMessagesWithArray:stringSections
                                                               withRange:NSMakeRange(1, [stringSections count])]; // 11

            // Set the macHardwarePost message field.
            [self extractMessageWithString:message]; // 12

            // Analyze the message for dollar amounts and record every amount found
            // in one string, e.g. "$25 $60 $1250". The string is empty if no
            // amount of money was mentioned.
            [self extractCostRelatedInformationWithMessage:message]; // 13

            // Loop through the message string and look for the keywords.
            [self extractKeywordsWithMessage:message]; // 14
        }

        return self.macHardwarePost; // 15
    }
Taking each numbered comment in turn, you’ll find the following:
1. parseRawDataWithData: takes an instance of NSData as a parameter that contains your data. Once it has parsed all the fields and the message body, the method returns a MacHardwarePost object in line 15.
2. Check to see if the data is nil before you begin parsing.
3. Create a new MacHardwarePost object and initialize it as empty. You’ll set all the properties’ values once you start extracting information.
4. Set the filename you’re working on for reference.
5. Convert the NSData object into a raw string format.
6. Separate the fields text segment from the message’s text segment. The array could have a size larger than 2, since messages may also contain blank lines: componentsSeparatedByString: will split the message into several segments wherever it finds two consecutive newlines. Check the example given below for an illustration of this.
7. Safety check to see if the array was actually created.
8. Check to see if the array has more than one element. This lets you know there are two or more components, covering both the fields and the message sections.
9. Store all the field text segments in rawFieldString.
10. Pass rawFieldString into extractFieldsWithString: to extract all the relevant fields and set properties appropriately in the MacHardwarePost object.
11. Since you split the message into multiple segments, you must combine the segments back together to parse cost-related information and keywords.
12. Pass the combined message into extractMessageWithString: to be set in the MacHardwarePost object.
13. extractCostRelatedInformationWithMessage: finds and extracts cost-related information.
14. extractKeywordsWithMessage: finds the keywords in the message.
Below is an example of how componentsSeparatedByString splits up the text segments:
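A small sketch of that split, using a made-up two-paragraph post rather than one of the 49 data files. SplitIntoSections is a hypothetical wrapper around the real componentsSeparatedByString: call.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical wrapper: split a raw post wherever a blank line occurs.
static NSArray *SplitIntoSections(NSString *rawPost)
{
    return [rawPost componentsSeparatedByString:@"\n\n"];
}

// Made-up sample post, used only for illustration.
static NSString *SamplePost(void)
{
    return @"From: someone@example.com (Some One)\n"
           @"Lines: 3\n"
           @"\n"
           @"First paragraph of the message.\n"
           @"\n"
           @"Second paragraph of the message.";
}
```

SplitIntoSections(SamplePost()) yields three elements: the header block at index 0, followed by one element per message paragraph. That is why the message has to be stitched back together before it is searched.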
parseRawDataWithData is the first line of attack, to break up the incoming data into manageable chunks. This gives a clear outline of how the data is structured, and how it can be parsed step by step.
Next you'll see how the individual fields and messages are parsed — this is where the fun begins! :]
Parsing the Individual Fields
Consider, if you will, the following sample field text segment:
Here is where NSScanner comes in. You know that each field and its value is separated by the delimiter ":". The image below gives a visual representation of how each section is split up:
An NSScanner object interprets and converts the characters of an NSString object into number and string values. You assign the scanner’s string on creating it, and the scanner progresses through the characters of that string from beginning to end as you request items.
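To see a scanner advance through a string, here is a minimal sketch that splits a single header line at its first colon. SplitHeaderLine is a hypothetical helper and the sample line is made up; the sketch keeps the scanner's default skip set, which is a different choice from the one the tutorial code makes.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: splits "Field: value" at the first colon.
// Returns NO when the line has no "Field:" prefix.
static BOOL SplitHeaderLine(NSString *line, NSString **field, NSString **value)
{
    NSScanner *scanner = [NSScanner scannerWithString:line];
    NSString *name = nil;
    NSString *rest = nil;

    // Request 1: everything before the first colon.
    if (![scanner scanUpToString:@":" intoString:&name]) return NO;

    // Request 2: step over the colon itself.
    if (![scanner scanString:@":" intoString:NULL]) return NO;

    // Request 3: the remainder of the line. The default charactersToBeSkipped
    // (whitespace and newlines) drops the space that follows the colon.
    [scanner scanUpToCharactersFromSet:[NSCharacterSet newlineCharacterSet]
                            intoString:&rest];

    if (field) *field = name;
    if (value) *value = rest ?: @"";
    return YES;
}
```

Each scan call picks up where the previous one stopped, so a value that itself contains a colon, such as a subject starting with "Re:", stays in one piece.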
Open MacHardwareDataParser.m and add the following code just after parseRawDataWithData and before @end:

    /*
     * extractFieldsWithString: extracts the necessary fields for a data set
     * and places them in the Mac hardware post object.
     */
    - (void)extractFieldsWithString:(NSString *)rawString
    {
        NSScanner *scanner = [NSScanner scannerWithString:rawString]; // 1

        // Delimiters
        NSCharacterSet *newLine = [NSCharacterSet newlineCharacterSet]; // 2

        NSString *currentLine = nil; // 3
        NSString *field = nil; // 4
        NSString *fieldInformation = nil; // 5

        [scanner setCharactersToBeSkipped:[NSCharacterSet characterSetWithCharactersInString:@":"]]; // 6

        while (![scanner isAtEnd]) // 7
        {
            // Obtain the field.
            if ([scanner scanUpToString:@":" intoString:&currentLine]) // 8
            {
                // Replacing the default skip set with ":" means newlines are no
                // longer skipped, so every field after the first starts with "\n".
                field = [currentLine stringByTrimmingCharactersInSet:newLine]; // 9
            }

            // Obtain the value.
            if ([scanner scanUpToCharactersFromSet:newLine intoString:&currentLine]) // 10
            {
                fieldInformation = currentLine; // 11
            }

            BOOL containsField = (self.selectorDict[field] != nil); // 12

            // Only parse the fields that are defined in selectorDict.
            if (containsField)
            {
    #pragma clang diagnostic push
    #pragma clang diagnostic ignored "-Warc-performSelector-leaks"
                [self performSelector:NSSelectorFromString(self.selectorDict[field])
                           withObject:fieldInformation]; // 13
    #pragma clang diagnostic pop
            }
        }
    }
Here is a comment-by-comment tour of the above code:
1. scannerWithString: initializes the scanner with a given string and returns an NSScanner object.
2. Create a newline ("\n") NSCharacterSet object. This is used when you read each field/value pair one at a time.
3. currentLine stores the current field/value pair string.
4. Initialize field, to be used to retrieve selector methods from selectorDict.
5. Initialize fieldInformation, to be used to obtain the field's information, which will be passed to the selector method to be analyzed and extracted.
6. setCharactersToBeSkipped:, provided by NSScanner, defines the set of characters to be ignored when scanning for a value representation. Recall that a field and its value are separated by a colon (":"); the colon is ignored when extracting the value. The returned string will not include the colon.
7. Loop while you haven't exhausted all significant characters in the string.
8. Scan up to the colon, which grabs the field segment.
9. After obtaining the field segment, invoke stringByTrimmingCharactersInSet: to remove the newline left at the start of the string. Later on you'll need to retrieve the selector using the field as a key to the dictionary selectorDict.
10. Scan up to the newline character to grab the field's information.
11. Store the data in fieldInformation.
12. Check to see whether the field exists in selectorDict.
13. If the field is in selectorDict, execute the method by invoking performSelector:withObject:. This line is inside pragma tags simply to avoid warnings, since the selectors are unknown at compile time.
Creating the Selector Methods
Recall that your selector dictionary is constructed as follows:

    @"From" : @"extractFromWithString:",
    @"Subject" : @"extractSubjectWithString:",
    @"Date" : @"extractDateWithString:",
    @"Organization" : @"extractOrganizationWithString:",
    @"Lines" : @"extractNumberOfLinesWithString:",
    @"Message" : @"extractMessageWithString:"
Now that you have the field and the field’s information, you also have the corresponding method executing automatically to perform the data extraction. You now need to implement the six methods that will be called to extract each field’s value.
Open MacHardwareDataParser.m and add the following code after extractFieldsWithString and before @end:

    // Extract the subject field's value and update the post object.
    - (void)extractSubjectWithString:(NSString *)rawString
    {
        self.macHardwarePost.subject = rawString;
    }

    // Place the date string into the date property.
    - (void)extractDateWithString:(NSString *)rawString
    {
        self.macHardwarePost.date = rawString;
    }

    // Place the organization field value into the organization property.
    - (void)extractOrganizationWithString:(NSString *)rawString
    {
        self.macHardwarePost.organization = rawString;
    }

    // Store the entire message.
    - (void)extractMessageWithString:(NSString *)rawString
    {
        self.macHardwarePost.message = rawString;
    }
The methods above simply place the field information you extracted into the MacHardwarePost object.
Still working in the same file, add the following code immediately after extractMessageWithString:

    // Teaches you how to extract a number.
    - (void)extractNumberOfLinesWithString:(NSString *)rawString
    {
        // Start at 0 so the property has a defined value if no number is found.
        int numberOfLines = 0;
        NSScanner *scanner = [[NSScanner alloc] initWithString:rawString];

        // Scan the string for an int value.
        [scanner scanInt:&numberOfLines];

        self.macHardwarePost.lines = numberOfLines;
    }
For extractNumberOfLinesWithString:, an NSScanner is initialized with the string that contains the number of lines. It then invokes scanInt:, which scans for an int value from a decimal representation and returns the value found by reference.
NSScanner has various other methods you can explore at your leisure:
- scanDecimal:
- scanFloat:
- scanHexDouble:
- scanHexFloat:
- scanHexInt:
- scanHexLongLong:
- scanInteger:
- scanInt:
- scanLongLong:
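Two of these can be sketched in a few lines. ParseLineCount and ParseHex are hypothetical helpers written for this illustration; scanInteger: and scanHexInt: are the real NSScanner methods, and both report success through their BOOL return value rather than through the scanned number.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: reads a leading decimal integer, e.g. from " 42 lines".
static BOOL ParseLineCount(NSString *text, NSInteger *count)
{
    NSScanner *scanner = [NSScanner scannerWithString:text];
    NSInteger parsed = 0;

    // scanInteger: returns NO when the text does not start with a number.
    if (![scanner scanInteger:&parsed]) return NO;

    if (count) *count = parsed;
    return YES;
}

// Hypothetical helper: reads a hexadecimal value, with or without a 0x prefix.
static BOOL ParseHex(NSString *text, unsigned *value)
{
    NSScanner *scanner = [NSScanner scannerWithString:text];
    return [scanner scanHexInt:value];
}
```

Checking the return value is the reliable way to tell "no number here" apart from a genuine zero.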
Okay folks, brace yourselves: you're getting deep into the guts of NSScanner and regular expressions. The first bit to parse is the "From" field.
Here you can combine your regular expression skills from the NSRegularExpression Tutorial on this site with your mad NSScanner skills. Regular expressions are a great way to establish string-splitting patterns.
Still working in the same file, add the following code after extractNumberOfLinesWithString: and before @end:

    - (void)extractFromWithString:(NSString *)rawString
    {
        // Regular expressions are an advantage here.
        // http://www.raywenderlich.com/30288/
        // Based on the cases stated, we need to establish some form of pattern
        // in order to split the strings up.
        NSString *someRegexp = @".*[\\s]*\\({1}(.*)"; //1
        // ROGOSCHP@MAX.CC.Uregina.CA (Are we having Fun yet ???)
        // oelt0002@student.tc.umn.edu (Bret Oeltjen)
        // (iisi owner)
        // mbuntan@staff.tc.umn.edu ()
        // barry.davis@hal9k.ann-arbor.mi.us (Barry Davis)

        NSString *someRegexp2 = @".*[\\s]*<{1}(.*)"; //2
        // "Jonathan L. Hutchison" <jh6r+@andrew.cmu.edu>
        // <BR4416A@auvm.american.edu>
        // Thomas Kephart <kephart@snowhite.eeap.cwru.edu>
        // Alexander Samuel McDiarmid <am2o+@andrew.cmu.edu>

        // Special case:
        // Mel_Shear@maccomw.uucp
        // vng@iscs.nus.sg

        NSPredicate *fromPatternTest1 = [NSPredicate predicateWithFormat:@"SELF MATCHES %@", someRegexp]; //3
        NSPredicate *fromPatternTest2 = [NSPredicate predicateWithFormat:@"SELF MATCHES %@", someRegexp2];

        // Run through the patterns.
        // Format: Email (Name)
        if ([fromPatternTest1 evaluateWithObject:rawString]) //4
        {
            [self extractFromParenthesesWithString:rawString];
        }
        // Format: Name <Email> || <Email>
        else if ([fromPatternTest2 evaluateWithObject:rawString]) //5
        {
            [self extractFromArrowWithString:rawString];
        }
        // Format: Email
        else
        {
            [self extractFromEmailWithString:rawString]; //6
        }
    }
After examining the 49 data sets, you end up with three cases to consider:
- The first case: email ( name )
- The second case: name < email >
- The third case: Email with no Name.
Here's a step-by-step explanation of the above code:
1. The first regular expression finds a pattern matching the first case. It checks for zero or more occurrences of any character, followed by zero or more occurrences of a space, followed by one open parenthesis "(" and finally zero or more occurrences of any character.
2. The second regular expression finds a pattern matching the second case. It checks for zero or more occurrences of any character, followed by zero or more occurrences of a space, followed by one occurrence of an open angle bracket "<" and finally zero or more occurrences of any character.
3. Create an NSPredicate object that defines logical conditions used to constrain a search. The MATCHES operator uses the regular expression package. You can read more about NSPredicate in the official Apple documentation.
4. First you check if the field’s information is of the pattern Email (Name). If true, then pass it into extractFromParenthesesWithString:, which extracts the Email and the Name.
5. If the first pattern doesn't match, check for Name <Email>, or just <Email> without the Name. If you find a match, pass it into extractFromArrowWithString:, which extracts the Email and/or Name.
6. Finally, if neither of the first two patterns matched, this is the case where you only have an email. In this case, pass the string into extractFromEmailWithString:.
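The three-way decision can be sketched on its own as shown here. ClassifyFromField and the FromFormat enum are hypothetical names, and the two patterns are simplified stand-ins for the tutorial's expressions; the sketch relies on the fact that MATCHES must match the entire string, which is why each pattern is wrapped in ".*".

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical enum naming the three "From" layouts.
typedef NS_ENUM(NSInteger, FromFormat) {
    FromFormatEmailOnly,      // email
    FromFormatParentheses,    // email (name)
    FromFormatAngleBrackets   // name <email>
};

// Hypothetical helper: decides which layout a "From" value uses.
static FromFormat ClassifyFromField(NSString *value)
{
    // Simplified patterns: "contains an opening parenthesis / angle bracket".
    NSPredicate *hasParenthesis =
        [NSPredicate predicateWithFormat:@"SELF MATCHES %@", @".*\\(.*"];
    NSPredicate *hasAngleBracket =
        [NSPredicate predicateWithFormat:@"SELF MATCHES %@", @".*<.*"];

    if ([hasParenthesis evaluateWithObject:value]) return FromFormatParentheses;
    if ([hasAngleBracket evaluateWithObject:value]) return FromFormatAngleBrackets;
    return FromFormatEmailOnly;
}
```

As in the tutorial method, the parenthesis test runs first, so a value containing both "(" and "<" is treated as the first case.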
Still working in the same file, add the following code after extractFromWithString and before @end:

    #pragma mark - extractFromWithString helper methods

    // Extract the email when the pattern is Format: email (no name specified).
    - (void)extractFromEmailWithString:(NSString *)rawString
    {
        self.macHardwarePost.email = rawString;
        self.macHardwarePost.fromPerson = @"unknown";
    }
extractFromEmailWithString handles the special case where you don't match on pattern 1 or pattern 2; this is the case that only has the email but no name. In this case you just set the MacHardwarePost object's email and set the name of the person to “unknown”.
Add the following code after extractFromEmailWithString and before @end:

    // Extract the name of the person and the email when the pattern is Format: Name <Email>
    - (void)extractFromArrowWithString:(NSString *)rawString
    {
        NSScanner *scanner = [NSScanner scannerWithString:rawString]; //1
        NSString *resultString = nil; //2

        [scanner setCharactersToBeSkipped:[NSCharacterSet characterSetWithCharactersInString:@"<>"]]; //3

        while (![scanner isAtEnd]) //4
        {
            [scanner scanUpToString:@"<" intoString:&resultString]; //5
            self.macHardwarePost.fromPerson = resultString; //6

            [scanner scanUpToString:@">" intoString:&resultString]; //7
            self.macHardwarePost.email = resultString; //8
        }
    }
Here is a step-by-step explanation of the code above:
1. Create an instance of NSScanner that scans the given string with the pattern Name <Email>.
2. Initialize resultString; the extracted name and email will be placed in this string.
3. Set "<" and ">" to be ignored when scanning for a value representation.
4. Loop through the scanner until you reach the end.
5. Scan up to the first "<". This cuts off everything following, leaving only the Name, since you ignored "<" and ">" in line 3.
6. Set the fromPerson field in MacHardwarePost.
7. Scan up to ">", which will give you the email. This cuts out everything before "<" and after ">".
8. Set the email field of MacHardwarePost.
You're not done yet! Add the following code after extractFromArrowWithString: and before @end:

    // Extract the name of the person and the email when the pattern is Format: Email (Name)
    - (void)extractFromParenthesesWithString:(NSString *)rawString
    {
        NSScanner *scanner = [NSScanner scannerWithString:rawString];
        NSString *resultString = nil;

        [scanner setCharactersToBeSkipped:[NSCharacterSet characterSetWithCharactersInString:@"()"]];

        while (![scanner isAtEnd])
        {
            [scanner scanUpToString:@"(" intoString:&resultString];
            self.macHardwarePost.email = resultString;

            [scanner scanUpToString:@")" intoString:&resultString];
            self.macHardwarePost.fromPerson = resultString;
        }
    }
This is essentially the same as extractFromArrowWithString, except this method deals with parentheses.
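For comparison, here is a sketch of a different way to pull apart the Name <Email> layout: instead of adding the brackets to the skip set, it consumes them explicitly with scanString:intoString:. SplitAngleBracketFrom is a hypothetical helper, not the tutorial's method, and falling back to "unknown" for a missing name is an assumption borrowed from extractFromEmailWithString.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: splits "Name <email>" or "<email>" into its parts.
static void SplitAngleBracketFrom(NSString *value, NSString **name, NSString **email)
{
    NSScanner *scanner = [NSScanner scannerWithString:value];
    NSString *scannedName = nil;
    NSString *scannedEmail = nil;

    // Scans nothing (and leaves scannedName nil) when the value starts with "<".
    [scanner scanUpToString:@"<" intoString:&scannedName];

    // Step over the opening bracket, then read up to the closing one.
    [scanner scanString:@"<" intoString:NULL];
    [scanner scanUpToString:@">" intoString:&scannedEmail];

    // The name may carry a trailing space before the bracket; trim it.
    NSCharacterSet *spaces = [NSCharacterSet whitespaceCharacterSet];
    NSString *trimmedName = [scannedName stringByTrimmingCharactersInSet:spaces];

    if (name)  *name  = ([trimmedName length] > 0) ? trimmedName : @"unknown";
    if (email) *email = scannedEmail ?: @"";
}
```

Consuming the delimiters explicitly makes it easy to see which characters belong to the name and which to the email, at the cost of one extra call per delimiter.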
Add the following code after extractFromParenthesesWithString and before @end:

    #pragma mark - Utilities

    - (NSString *)combineContiguousMessagesWithArray:(NSArray *)array withRange:(NSRange)range
    {
        NSMutableString *resultString = [[NSMutableString alloc] init]; //1

        // Note: range.length is treated here as the exclusive end index
        // (the array count), not as a number of elements.
        for (int i = (int)range.location; i < range.length; i++) //2
        {
            [resultString appendString:array[i]]; //3
        }

        return [NSString stringWithString:resultString]; //4
    }
Think back to the diagram showing how to split the text:
You had to split the text segment with field-related information from the message segment with portions of the messages — now you need to recombine the message portion into one instead of multiple segments.
Here is a step-by-step explanation of the above code:
1. You first create a new NSMutableString so you can edit the string whenever you combine a portion of text.
2. Given the range of the message portion, start from index 1 (index 0 is the field portion) and loop to array length - 1. You'll loop through each index containing the message portion.
3. Get the current index’s message portion and append it to the end of resultString.
4. Finally, return the combined text.
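The same recombination can be sketched with subarrayWithRange: and componentsJoinedByString:. JoinMessageSections is a hypothetical helper; unlike the method above, it passes NSMakeRange a location and an element count, and it re-inserts the blank line between paragraphs, which is a deliberate difference so that words on either side of a paragraph break do not run together.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: glue every section after the header block back together.
static NSString *JoinMessageSections(NSArray *sections)
{
    // Index 0 is the header block; without a second element there is no message.
    if ([sections count] < 2) return @"";

    // NSMakeRange takes (location, length): start at 1, take the remaining count - 1.
    NSRange messageRange = NSMakeRange(1, [sections count] - 1);
    NSArray *messageSections = [sections subarrayWithRange:messageRange];

    // Restore the blank line that componentsSeparatedByString: removed.
    return [messageSections componentsJoinedByString:@"\n\n"];
}
```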
Now that you have the message portion in one place, you can start parsing the message for some useful information.
Parsing the Message
Your keyword search strategy is to look at every word and check it against your list of keywords. If it matches, add it to the MacHardwarePost keyword set, which stores all keywords found in this message.
Add the following code to the end of MacHardwareDataParser.m, just before @end:

    // Extract keywords from the message.
    - (void)extractKeywordsWithMessage:(NSString *)rawString
    {
        NSScanner *scanner = [NSScanner scannerWithString:rawString]; //1
        // Include newlines so a word at the end of a line is not glued to the next line.
        NSCharacterSet *whitespace = [NSCharacterSet whitespaceAndNewlineCharacterSet]; //2
        NSString *keyword = @""; //3

        while (![scanner isAtEnd]) //4
        {
            [scanner scanUpToCharactersFromSet:whitespace intoString:&keyword]; //5
            NSString *lowercaseKeyword = [keyword lowercaseString]; //6

            if ([self.listOfKeywords containsObject:lowercaseKeyword]) //7
            {
                [self.macHardwarePost.setOfKeywords addObject:lowercaseKeyword]; //8
            }
        }
    }
The above code is fairly straightforward:
1. scannerWithString: initializes the scanner with a given string, in this case your message, and returns an instance of NSScanner.
2. Next you create an NSCharacterSet of whitespace characters; this lets you scan one word at a time, stopping at the whitespace that follows each word.
3. Initialize the keyword string; this is used to store the found keyword.
4. Loop until you're at the end of the text.
5. Scan up to a whitespace and store the result in the keyword string.
6. Convert the keyword into lowercase, as you want to ignore capital letters.
7. Check if the keyword exists in listOfKeywords.
8. If the keyword exists, add the keyword to MacHardwarePost's keyword set.
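The word-by-word strategy can be tried out in isolation with a sketch like this one. FindKeywords is a hypothetical free function, and stripping surrounding punctuation is an extra step that the tutorial method does not perform; it is included here on the assumption that "Printer," should still count as "printer".

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the distinct keywords that occur in a message.
static NSSet *FindKeywords(NSString *message, NSArray *keywords)
{
    NSMutableSet *found = [[NSMutableSet alloc] init];
    NSScanner *scanner = [NSScanner scannerWithString:message];
    NSCharacterSet *separators = [NSCharacterSet whitespaceAndNewlineCharacterSet];
    NSCharacterSet *punctuation = [NSCharacterSet punctuationCharacterSet];
    NSString *word = nil;

    // Each call skips leading whitespace, reads one word, and returns NO
    // once nothing is left, which ends the loop.
    while ([scanner scanUpToCharactersFromSet:separators intoString:&word])
    {
        // Lowercase the word and drop punctuation such as a trailing comma.
        NSString *normalized =
            [[word lowercaseString] stringByTrimmingCharactersInSet:punctuation];

        if ([keywords containsObject:normalized])
        {
            [found addObject:normalized];
        }
    }
    return found;
}
```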
Extracting Cost-Related Information
To search for cost-related information, use NSScanner to look at each word separated by whitespace. This is similar to the keywords strategy, but now you're searching for an occurrence of a dollar sign ($).
Add the following method to the end of MacHardwareDataParser.m, just before @end:

    // Extract the amounts mentioned if the message contains a "$" symbol.
    - (void)extractCostRelatedInformationWithMessage:(NSString *)rawString
    {
        NSScanner *scanner = [NSScanner scannerWithString:rawString]; //1
        NSMutableString *costResultString = [[NSMutableString alloc] init]; //2
        NSCharacterSet *whitespace = [NSCharacterSet whitespaceCharacterSet]; //3
        NSCharacterSet *dollarCharacter = [NSCharacterSet characterSetWithCharactersInString:@"$"]; //4

        NSString *dollarFound = nil;
        float dollarCost = 0;

        while (![scanner isAtEnd]) //5
        {
            // Have the scanner find the first $ token. scanUpTo... returns NO
            // when the scanner is already sitting on a "$".
            if (![scanner scanUpToCharactersFromSet:dollarCharacter intoString:nil]) //6
            {
                [scanner scanUpToCharactersFromSet:whitespace intoString:&dollarFound]; //7

                NSScanner *integerScanner = [NSScanner scannerWithString:dollarFound]; //8
                [integerScanner scanString:@"$" intoString:nil]; //9

                // scanFloat: returns NO and leaves dollarCost unchanged on failure.
                BOOL foundAmount = [integerScanner scanFloat:&dollarCost]; //10

                if (foundAmount && dollarCost > 0) //11
                {
                    [costResultString appendFormat:@"$%.2f ", dollarCost];
                }
            }
        }

        self.macHardwarePost.costSearch = costResultString; //12
    }
Here's what's going on in the code above:
1. Again, scannerWithString: initializes the scanner with a given string, in this case your message, and returns an instance of NSScanner.
2. Create an NSMutableString so you can append all the cost-related information into a single string.
3. Create a whitespace NSCharacterSet so you can jump to the next word after analyzing the previous one.
4. Create a dollarCharacter NSCharacterSet so you can scan up to a string that starts with a $ symbol.
5. Continue to loop until you reach the end of the message.
6. scanUpToCharactersFromSet:intoString: scans the string until it finds a $ symbol.
7. Once you find a $ symbol, scan up to the next whitespace to give you the cost-related portion.
8. Create a separate NSScanner to scan the cost-related string.
9. The NSScanner scans past the $ symbol, leaving you with only the amount.
10. NSScanner scans the cost-related string for a float value; if it finds one, it stores it in dollarCost.
11. scanFloat: returns NO and leaves dollarCost untouched when no number follows the $ symbol, so check that you actually found a valid, non-zero amount. If so, append it to costResultString.
12. Once the scanner reaches the end, set the MacHardwarePost costSearch field to the cost-related information extracted from the message.
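A compact sketch of the same search, written as a free function so it can be exercised on its own. FindDollarAmounts is a hypothetical helper; it uses a single scanner and clears the skip set, which is an assumption made here so that a "$" followed by a space is not read as an amount. Thousands separators such as "$1,000" are not handled.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: collects every positive amount that directly follows a "$".
static NSArray *FindDollarAmounts(NSString *message)
{
    NSMutableArray *amounts = [[NSMutableArray alloc] init];
    NSScanner *scanner = [NSScanner scannerWithString:message];
    scanner.charactersToBeSkipped = nil; // do not skip whitespace after "$"

    while (![scanner isAtEnd])
    {
        // Move to the next "$" (scans nothing if already there).
        [scanner scanUpToString:@"$" intoString:NULL];

        // Consume the "$"; if there is none, the message has no more amounts.
        if (![scanner scanString:@"$" intoString:NULL]) break;

        // Trust the BOOL result, not the variable, to detect a failed scan.
        float amount = 0;
        if ([scanner scanFloat:&amount] && amount > 0)
        {
            [amounts addObject:@(amount)];
        }
    }
    return amounts;
}
```

Every pass through the loop either consumes a "$" or breaks out, so the scan always terminates.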
There you have it — your parser is finally complete. Time to put your parser to good use and start extracting information from the 49 data files.
Connecting Your Parser with the 49 Data Files
The last things to do are run all 49 files through your parser to create the MacHardwarePost objects, pass these objects into your masterViewController and set up your delegate and data source for the table view to display the results.
Open AppDelegate.h and replace the code between @interface and @end with the following code:
    @property (assign) IBOutlet NSWindow *window;
    // Stores a reference to the data set's file path, e.g. /Users/userName/Documents/comp.sys.mac.hardware
    @property (nonatomic, strong) NSString *dataSetFilePath;
    // Stores an array of all data file names, e.g. 50419, 50420, 50421, ...
    @property (nonatomic, strong) NSArray *listOfDataFileNames;
    @property (nonatomic, strong) NSMutableArray *listOfPost;
    @property (nonatomic, strong) IBOutlet MasterViewController *masterViewController;
Here's an explanation of what these properties will be used for:
- dataSetFilePath stores the path to the 49 data files so you can easily obtain each individual file to be parsed.
- listOfDataFileNames stores all 49 data file names in an array; each file name will be appended to dataSetFilePath to get an individual file.
- listOfPost stores all MacHardwarePost objects once you're done parsing.
- masterViewController contains the TableView and TextView for your app.
Open AppDelegate.m and add the following code just before @implementation:
    #import "MacHardwareDataParser.h"
    #import "MacHardwarePost.h"
You'll need to include these imports to reference those classes in the next bit.
Still in the same file, replace applicationDidFinishLaunching: with the following code:
    - (void)applicationDidFinishLaunching:(NSNotification *)aNotification
    {
        NSError *error = nil;
        // Obtain the file path to the resource folder.
        NSString *folderPath = [[NSBundle mainBundle] resourcePath]; //1
        // Get all the file names from the resource folder.
        self.listOfDataFileNames = [[NSFileManager defaultManager] contentsOfDirectoryAtPath:folderPath error:&error]; //2
        // Keywords passed into the scanner to check whether a message contains one or more of them.
        NSArray *keywords = @[ @"apple", @"macs", @"software", @"keyboard", @"printers", @"printer", @"video", @"monitor", @"laser", @"scanner", @"disks", @"cost", @"price", @"floppy", @"card", @"phone" ]; //3
        self.listOfPost = [[NSMutableArray alloc] init]; //4
        // Loop through the list of data files, scan and parse each one,
        // and convert it to a MacHardwarePost object.
        for (NSString *fileName in self.listOfDataFileNames) //5
        {
            // Ignore system files; the file names you're interested in are numbers.
            if ([fileName intValue] == 0) continue; //6
            NSString *path = [folderPath stringByAppendingString:[NSString stringWithFormat:@"/%@", fileName]]; //7
            NSData *data = [NSData dataWithContentsOfFile:path]; //8
            MacHardwareDataParser *parser = [[MacHardwareDataParser alloc] initWithKeywords:keywords fileName:fileName]; //9
            MacHardwarePost *post = [parser parseRawDataWithData:data]; //10
            if (post != nil)
            {
                [self.listOfPost addObject:post]; //11
            }
        }
        // Create the masterViewController.
        self.masterViewController = [[MasterViewController alloc] initWithNibName:@"MasterViewController" bundle:nil]; //12
        self.masterViewController.listOfMacHardwarePost = self.listOfPost; //13
        // Add the view controller's view to the window's content view.
        [self.window.contentView addSubview:self.masterViewController.view];
        self.masterViewController.view.frame = ((NSView *)self.window.contentView).bounds;
    }
Taking each numbered comment in turn, you'll find the following:
- First obtain the resource folder's path, which is where all 49 files are located.
- Get all the file names within the resource folder by calling contentsOfDirectoryAtPath:error:, which returns an array of names that could belong to either files or directories.
- Initialize an instance of NSArray named keywords and set it up with all the keywords to search for in each message.
- Initialize an instance of NSMutableArray called listOfPost to store all the MacHardwarePost objects.
- Loop through the list of files within the resource directory.
- Check each fileName to see if it's an integer, as all 49 of the data file names are integers. If it isn't, skip it and move on to the next file.
- Append the file name to the resource path to obtain the full path to the file.
- Load the individual data file as an NSData object using the data file path.
- Create an instance of MacHardwareDataParser and pass in the keywords to search for and the fileName of the data file to parse.
- Pass the data into your parser's parseRawDataWithData:, which extracts all the important data. Once complete, the method returns a MacHardwarePost object ready to use.
- Add the object to the list of MacHardwarePost objects if the parsing was successful.
- Once you're done parsing all 49 data files, create the masterViewController.
- Pass the list of MacHardwarePost objects into your masterViewController, which will use it later as the table view's data source.
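Steps 2, 6 and 7 above can be tried in isolation. The following is a minimal sketch with two hypothetical helpers, IsDataFileName and DataFilePathsInFolder, neither of which is part of the tutorial project. It uses stringByAppendingPathComponent: to build each path, which is an alternative to the string formatting used above. Note that intValue only looks at the leading digits, so a name such as 50419.txt would also pass the check.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: the same test the tutorial uses to skip system files.
// intValue returns 0 when the name does not start with a number.
static BOOL IsDataFileName(NSString *fileName)
{
    return [fileName intValue] != 0;
}

// Hypothetical helper: full paths of the numerically named entries in `folderPath`.
// Returns nil (and fills `error`, if provided) when the folder cannot be read.
static NSArray *DataFilePathsInFolder(NSString *folderPath, NSError **error)
{
    NSArray *names = [[NSFileManager defaultManager] contentsOfDirectoryAtPath:folderPath
                                                                         error:error];
    if (names == nil) {
        return nil;
    }

    NSMutableArray *paths = [NSMutableArray array];
    for (NSString *name in names) {
        if (!IsDataFileName(name)) {
            continue; // e.g. .DS_Store or other non-numeric names
        }
        // Inserts the "/" separator for you.
        [paths addObject:[folderPath stringByAppendingPathComponent:name]];
    }
    return paths;
}
```

Checking the nil return here also gives you a place to inspect the NSError, which the launch method above collects but never reads.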
At this point, you've parsed all 49 data files and passed the MacHardwarePost objects to your masterViewController. It's finally time to display the results of all your hard effort! :]
Setting up the Table View Delegate and DataSource
Open up MasterViewController.m and add the following import before @implementation:
    #import "MacHardwarePost.h"
Find numberOfRowsInTableView: and replace the implementation with the following:
    - (NSInteger)numberOfRowsInTableView:(NSTableView *)tableView
    {
        return [self.listOfMacHardwarePost count];
    }
numberOfRowsInTableView: is part of the table view's data source protocol; it returns the number of rows the table view should display. NSTableView has no sections, so the row count is simply the number of posts you parsed.
Next, find tableView:viewForTableColumn:row:. Replace the comment that says //TODO: Set up cell view with the code below:
    PostCellView *cellView = [tableView makeViewWithIdentifier:tableColumn.identifier owner:self]; //1
    if ([tableColumn.identifier isEqualToString:@"PostColumn"]) //2
    {
        MacHardwarePost *post = [self.listOfMacHardwarePost objectAtIndex:row]; //3
        NSString *unknown = @"Unknown"; //4
        NSString *costRelated = @"NO"; //5
        cellView.from.stringValue = (post.fromPerson == nil) ? unknown : post.fromPerson; //6
        cellView.subject.stringValue = (post.subject == nil) ? unknown : post.subject;
        cellView.email.stringValue = (post.email == nil) ? unknown : post.email;
        cellView.costRelated.stringValue = (post.costSearch.length == 0) ? costRelated : post.costSearch;
        cellView.organization.stringValue = (post.organization == nil) ? unknown : post.organization;
        cellView.date.stringValue = (post.date == nil) ? unknown : post.date;
        cellView.lines.stringValue = [NSString stringWithFormat:@"%d", post.lines];
        cellView.keywords.stringValue = [post printKeywords];
    }
tableView:viewForTableColumn:row: is part of the NSTableViewDelegate protocol; this is where you set up each individual cell. The starter project includes a custom cell named PostCellView, which contains the labels from, subject, email, costRelated, organization, date, lines and keywords for you to set.
Here's a detailed look at the code above:
- Ask the table view for a PostCellView, which it creates or reuses based on the column's identifier.
- Check that the tableColumn is indeed PostColumn.
- Get the MacHardwarePost object that you parsed for the current row.
- Set an NSString variable unknown to use in case a property of MacHardwarePost turns out to be nil.
- Set an NSString variable costRelated and initialize it to "NO"; the cell shows it when the post has no cost information.
- Use the ternary operator to check whether the MacHardwarePost field is nil. If so, set the label to "Unknown"; otherwise set the label to the value you received from MacHardwarePost.
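The fallback logic in step 6 repeats for nearly every label, so it can be factored out. Here is a minimal sketch using a hypothetical helper, DisplayStringOrFallback, which is not part of the tutorial project. It relies on the fact that sending length to nil returns 0 in Objective-C, which is also why the costSearch line above can test length without a separate nil check. Unlike the nil comparisons above, this version treats an empty string the same as a missing one.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns `value` when it has content, otherwise `fallback`.
// Messaging nil returns 0, so one length test covers both nil and @"".
static NSString *DisplayStringOrFallback(NSString *value, NSString *fallback)
{
    return (value.length > 0) ? value : fallback;
}
```

With a helper like this, a label assignment could read cellView.subject.stringValue = DisplayStringOrFallback(post.subject, unknown);.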
Lastly, Set Up the TableView and TextView Connection
In MasterViewController.m, replace the starter implementation of tableViewSelectionDidChange: with the following:
    - (void)tableViewSelectionDidChange:(NSNotification *)aNotification
    {
        NSInteger selectedRow = [self.tableView selectedRow]; //1
        if (selectedRow >= 0 && [self.listOfMacHardwarePost count] > selectedRow) //2
        {
            MacHardwarePost *post = [self.listOfMacHardwarePost objectAtIndex:selectedRow]; //3
            self.messageTextView.string = post.message; //4
        }
    }
tableViewSelectionDidChange: informs the delegate that the table view's selection has changed. This method executes whenever the user selects a different cell.
Here are the details of the above code:
- Get the currently selected row.
- Check that selectedRow is in bounds.
- Get the MacHardwarePost corresponding to the selected row.
- Get the message of the object and set it on the text view.
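The bounds check in step 2 matters because selectedRow returns -1 when nothing is selected, and count is unsigned. Comparing the two directly would convert -1 to a very large unsigned number, so the negative case has to be ruled out first. A minimal sketch of the same guard as a reusable function follows; the name ObjectAtRowOrNil is hypothetical and not part of the tutorial project.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the item at `row`, or nil when `row` is
// negative (no selection) or past the end of the array.
static id ObjectAtRowOrNil(NSArray *items, NSInteger row)
{
    // Reject negative rows before casting, since a negative NSInteger
    // becomes a huge value when converted to NSUInteger.
    if (row < 0 || (NSUInteger)row >= [items count]) {
        return nil;
    }
    return [items objectAtIndex:(NSUInteger)row];
}
```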
Build and run your project; you'll see all the parsed fields in the table view. Select a cell on the left and you'll see the corresponding message on the right.
These data files grow up so fast! :] They were just raw data when you found them, and after you groomed them a little with your parser, they look all grown up now. Aww.
Where to Go from Here?
Here’s the source code for the finished project: NSScannerTutorialFinal.zip
There is so much more you can do with the data you have parsed. You could write a formatter that converts all MacHardwarePost objects into JSON, XML, CSV or any other format you can think of! With your new-found flexibility to represent data in different forms, you can share your data across different platforms.
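As a starting point for the JSON idea, here is a minimal sketch built on Foundation's NSJSONSerialization. The function name JSONStringFromPostFields is hypothetical, and so is the assumption that you first copy a post's fields into an NSDictionary of strings and numbers; the tutorial's MacHardwarePost class has no such method. It assumes ARC.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: serializes a dictionary of post fields to a JSON string.
// Returns nil when the dictionary contains values JSON cannot represent
// or when serialization fails.
static NSString *JSONStringFromPostFields(NSDictionary *fields, NSError **error)
{
    // Only NSString, NSNumber, NSArray, NSDictionary and NSNull are allowed.
    if (![NSJSONSerialization isValidJSONObject:fields]) {
        return nil;
    }

    NSData *data = [NSJSONSerialization dataWithJSONObject:fields
                                                   options:NSJSONWritingPrettyPrinted
                                                     error:error];
    if (data == nil) {
        return nil;
    }
    // NSJSONSerialization writes UTF-8 encoded data.
    return [[NSString alloc] initWithData:data encoding:NSUTF8StringEncoding];
}
```

You might build the dictionary with keys such as @"subject" or @"email" (names chosen here only for illustration) and write the resulting string to disk, one file per post.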
Using NSScanner is a great way to quickly manipulate and search for different strings. I hope this new skill gives you the power to parse all that meaningful data in your own apps!
If you're really interested in the study of computer languages and how they are implemented, take a class in comparative languages. Your course will likely cover formal languages and BNF grammars - all important concepts in the design and implementation of parsers.
For more information on NSScanner and other parsing theory, check out the following resources:
- XML Tutorial for iOS - How to choose the best XML parser for your iPhone-Project.
- Working with JSON in iOS5
- Apple: NSScanner Reference Guide
- Writing a parser using NSScanner (a CSV parsing example) by Matt Gallagher
- Short Presentation on NSScanner by Lorex Antiono
If you have any questions or comments, please join the discussion below and share them!
NSScanner Tutorial: Parsing Data in Mac OS X is a post from: Ray Wenderlich