A few days ago I read a STL Lambda Lounge post about Counting DNA Nucleotides. After solving the Nucleotides problem in Objective-C, I decided to solve a slightly different problem. I decided to count and print the number of times each character (assume ASCII text) appears in Moby Dick. I used the same data structure to solve both problems: an array of unsigned int
values, where the array index represents the ASCII character and the value is the count.
Reading Moby Dick using Grand Central Dispatch
// This function blocks until all characters are counted.
void CBRCountCharactersInFileAtPath(NSString *path, unsigned int *characters)
{
// The dispatch_io_read is an asynchronous call. We use a semaphore to provide synthetic synchronization.
dispatch_semaphore_t semaphore = dispatch_semaphore_create(0);
dispatch_queue_t readQueue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
dispatch_io_t channel = dispatch_io_create_with_path(DISPATCH_IO_STREAM, [path UTF8String], 0, O_RDONLY, readQueue, NULL);
// SIZE_MAX means keep going until we reach EOF.
dispatch_io_read(channel, 0, SIZE_MAX, readQueue, ^(bool done, dispatch_data_t data, int error) {
if (done) {
// All bytes have been read. Let's notify that we are done.
dispatch_semaphore_signal(semaphore);
} else if (data != NULL) {
// As we read, we count.
NSString *string = CBRStringFromDispatchData(data);
CBRCountCharactersInString(string, characters);
}
});
// Block the calling queue/ thread until the IO read completes (i.e. the channel is "done")
dispatch_semaphore_wait(semaphore, DISPATCH_TIME_FOREVER);
// What we open, we must close.
dispatch_io_close(channel, 0);
}
The above code sets up and executes streaming file IO using Grand Central Dispatch. Now let's look at the two functions named CBRStringFromDispatchData
and CBRCountCharactersInString
.
Here is the code used to create a NSString
from a dispatch_data_t
.
NSString *CBRStringFromDispatchData(dispatch_data_t data)
{
size_t dataSize = dispatch_data_get_size(data);
NSMutableString *string = [[NSMutableString alloc] initWithCapacity:dataSize];
dispatch_data_apply(data, ^bool(dispatch_data_t region, size_t offset, const void *buffer, size_t size) {
[string appendFormat:@"%.*s", (unsigned int)size, buffer];
return true;
});
return string;
}
Here is the code that tracks the character counts.
void CBRCountCharactersInString(NSString *string, unsigned int *characters)
{
for (unsigned int index = 0; index < [string length]; index++) {
characters[[string characterAtIndex:index]]++;
}
}
Sample Program
#import "CBRCounter.h"
int main(int argc, const char * argv[])
{
@autoreleasepool {
unsigned int characters[256] = {0};
CBRCountCharactersInFileAtPath(@"/path/to/mobydick.txt", characters);
CBRPrintCharacterCount(characters);
}
return 0;
}
Printing the Character Counts
void CBRPrintCharacterCount(unsigned int *characters)
{
// This example only prints the "printable" characters.
for (unsigned int character = ' '; character <= '~'; character++) {
NSLog(@"%c: %d", (char)character, characters[character]);
}
}
Example Output
A: 2725
B: 1466
C: 1184
D: 802
E: 1361
F: 864
G: 751
H: 1497
I: 3641
J: 261
K: 185
L: 959
M: 782
N: 1241
O: 1050
P: 1158
Q: 323
R: 895
S: 2277
T: 2568
U: 284
V: 181
W: 1326
X: 25
Y: 360
Z: 38
Verify the Counts
I verified my program's results using TextEdit.
- Open "Moby Dick" in TextEdit
- Type Command+F
- Deselect "Ignore Case"
- Select "Contains"
- Enter Character
- View TextEdit's Result