davedelong / chcsvparser

A proper CSV parser for Objective-C

License: Other

Ruby 0.98% Objective-C 98.90% Swift 0.12%

chcsvparser's Introduction

CHCSVParser

CHCSVParser is an Objective-C parser for CSV files.

Supported Platforms

Mac OS X 10.7+
iOS 6+

Usage

In order to use CHCSVParser, you'll need to include the following two files in your project:

CHCSVParser.h
CHCSVParser.m

CHCSVParser requires ARC.

Parsing

A CHCSVParser works very similarly to an NSXMLParser, in that it synchronously parses the data and invokes delegate callback methods to let you know that it has found a field, or has finished reading a line, or has encountered a syntax error.

A CHCSVParser can be created in one of three ways:

With a URL to a file
With the contents of an NSString
With an NSInputStream

CHCSVParser can be configured to parse other "character-separated" file formats, such as "TSV" (tab-separated). You can specify the delimiter of the parser during initialization. The delimiter can only be one character in length, and cannot be any newline character or a double quote ("). Additionally, depending on which options you set on the parser, you may not be able to use #, \, or = as the delimiter either.
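As a rough illustration of the delimiter rules above, here is a minimal sketch in plain C (which also compiles inside an Objective-C file). It is not CHCSVParser's own validation code: `csv_options` and `delimiter_is_allowed` are hypothetical names, and pairing `#` with comment recognition and `=` with leading-equal-sign recognition is an assumption drawn from the option descriptions in this README.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flags mirroring the parser options described in this README. */
typedef struct {
    bool recognizes_backslashes_as_escapes;
    bool recognizes_comments;
    bool recognizes_leading_equal_sign;
} csv_options;

/* Returns true if `delimiter` satisfies the constraints listed above. */
bool delimiter_is_allowed(char delimiter, const csv_options *options) {
    assert(options != NULL);
    if (delimiter == '\n' || delimiter == '\r') return false; /* no newlines */
    if (delimiter == '"') return false;                        /* quote is reserved */
    /* The remaining characters are only reserved when the matching option is on
       (assumed pairing; see the lead-in). */
    if (delimiter == '\\' && options->recognizes_backslashes_as_escapes) return false;
    if (delimiter == '#' && options->recognizes_comments) return false;
    if (delimiter == '=' && options->recognizes_leading_equal_sign) return false;
    return true;
}
```

With every option disabled, `#` would be accepted as a delimiter; with comment recognition enabled it would be rejected.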

By default, CHCSVParser will not sanitize the output of the fields; in other words, individual fields will be returned exactly as they are found in the CSV file. However, if you wish the fields to be cleaned (surrounding double quotes stripped, characters unescaped, etc), you can specify this by setting the sanitizesFields property to YES.
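To make "cleaned" concrete, the following plain C sketch shows one plausible reading of sanitizing: strip the surrounding double quotes and collapse each doubled quote inside a quoted field. `sanitize_field` is a hypothetical helper written for this illustration, not part of CHCSVParser, and the library's actual sanitizing may handle more cases (such as backslash escapes).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Writes a cleaned copy of `raw` into `out` (capacity `out_size`, NUL included).
   Surrounding double quotes are stripped, and each doubled quote ("") inside a
   quoted field collapses to a single quote. Unquoted fields are copied as-is.
   Output is truncated if `out` is too small. Returns the cleaned length. */
size_t sanitize_field(const char *raw, char *out, size_t out_size) {
    assert(raw != NULL && out != NULL && out_size > 0);
    size_t len = strlen(raw);
    int quoted = (len >= 2 && raw[0] == '"' && raw[len - 1] == '"');
    size_t start = quoted ? 1 : 0;
    size_t end = quoted ? len - 1 : len;
    size_t n = 0;
    for (size_t i = start; i < end && n + 1 < out_size; i++) {
        if (quoted && raw[i] == '"' && i + 1 < end && raw[i + 1] == '"') {
            i++; /* skip the first quote of an escaped pair */
        }
        out[n++] = raw[i];
    }
    out[n] = '\0';
    return n;
}
```

Under these assumptions, the raw field `"say ""hi"""` would be cleaned to `say "hi"`, while an unquoted field passes through unchanged.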

CHCSVParser has other properties to alter the parsing behavior:

recognizesBackslashesAsEscapes allows you to parse delimited files where special characters (the delimiter, newlines, etc) are escaped using a backslash. When this option is enabled, you may not use a backslash as a delimiter. This option is disabled by default.
recognizesComments will skip parsing fields that begin with an octothorpe (#). These fields are reported to the parser delegate as comments, and comments are terminated by an unescaped newline character. This option is disabled by default.
recognizesLeadingEqualSign allows quoted fields to begin with an =. Some programs use a leading equal sign to indicate that the contents of the field should be interpreted explicitly, and things like insignificant digits should not be removed. This option is disabled by default.

Writing

A CHCSVWriter has several methods for constructing CSV files:

-writeField: accepts an object and writes its -description (after being properly escaped) out to the CSV file. It will also write a field separator (,) if necessary. You may pass an empty string (@"") or nil to write an empty field.

-finishLine is used to terminate the current CSV line. If you do not invoke -finishLine, then all of your CSV fields will be on a single line.

-writeLineOfFields: accepts an array of objects, sends each one to -writeField:, and then invokes -finishLine.

-writeComment: accepts a string and writes it out to the file as a CSV-style comment.
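The escaping that -writeField: performs is not spelled out here, so the following plain C sketch shows the conventional CSV approach as an assumption: wrap the field in double quotes when it contains the delimiter, a double quote, or a line break, and double any embedded quotes. `escape_field` is a hypothetical helper, and the real writer may treat additional characters as needing quotes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Escapes `field` into `out` for the given delimiter. Returns false, leaving
   `out` untouched, if `out_size` is too small for the escaped text plus NUL. */
bool escape_field(const char *field, char delimiter, char *out, size_t out_size) {
    assert(field != NULL && out != NULL);
    size_t len = strlen(field);
    size_t quote_count = 0;
    bool needs_quotes = false;
    for (size_t i = 0; i < len; i++) {
        char c = field[i];
        if (c == '"') {
            quote_count++;
            needs_quotes = true;
        } else if (c == delimiter || c == '\n' || c == '\r') {
            needs_quotes = true;
        }
    }
    size_t needed = len + quote_count + (needs_quotes ? 2 : 0);
    if (needed + 1 > out_size) return false;

    size_t n = 0;
    if (needs_quotes) out[n++] = '"';
    for (size_t i = 0; i < len; i++) {
        if (field[i] == '"') out[n++] = '"'; /* double each embedded quote */
        out[n++] = field[i];
    }
    if (needs_quotes) out[n++] = '"';
    out[n] = '\0';
    return true;
}
```

Note that a comma needs no quoting when the delimiter is a tab, which is why the delimiter is a parameter rather than a constant.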

If you wish to write CSV directly into an NSString, you should create an NSOutputStream for writing to memory and use that as the output stream of the CHCSVWriter. For an example of how to do this, see the -[NSArray(CHCSVAdditions) CSVString] method.

Like CHCSVParser, CHCSVWriter can be customized with a delimiter other than , during initialization.

Convenience Methods

There are a couple of category methods on NSArray and NSString to simplify the common reading and writing of delimited files.

In addition, the convenience APIs allow for additional parsing options beyond what is provided by CHCSVParser. When you specify the CHCSVParserOptionUsesFirstLineAsKeys option, parsing will return an array of CHCSVOrderedDictionary instances, instead of an array of arrays of strings.

A CHCSVOrderedDictionary is an NSDictionary subclass that maintains a specific order to its key-value pairs, and allows you to look up keys and values by index.

Data Encoding

CHCSVParser relies on knowing the encoding of the content. It should work with pretty much any kind of file encoding, if you can provide what that encoding is. If you do not know the encoding of the file, then CHCSVParser can make a naïve guess. CHCSVParser will try to guess the encoding of the file from among these options:

MacOS Roman (NSMacOSRomanStringEncoding; the default/fallback encoding)
UTF-8 (NSUTF8StringEncoding)
UTF-16BE (NSUTF16BigEndianStringEncoding)
UTF-16LE (NSUTF16LittleEndianStringEncoding)
UTF-32BE (NSUTF32BigEndianStringEncoding)
UTF-32LE (NSUTF32LittleEndianStringEncoding)
ISO 2022-KR (kCFStringEncodingISO_2022_KR)
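One common way to guess among the Unicode encodings in this list is to look for a byte-order mark (BOM) at the start of the data. The plain C sketch below shows that technique only; it is not the encoding-discovery code credited later in this README, and `text_encoding` and `sniff_bom` are hypothetical names. When no BOM is found, the caller would need a fallback (this README names MacOS Roman as the default).

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
    ENCODING_UNKNOWN,
    ENCODING_UTF8,
    ENCODING_UTF16_BE,
    ENCODING_UTF16_LE,
    ENCODING_UTF32_BE,
    ENCODING_UTF32_LE
} text_encoding;

/* Inspects the first bytes of a buffer for a byte-order mark. */
text_encoding sniff_bom(const unsigned char *bytes, size_t length) {
    assert(bytes != NULL || length == 0);
    /* Check UTF-32 first: the UTF-32LE BOM begins with the UTF-16LE BOM. */
    if (length >= 4 && bytes[0] == 0x00 && bytes[1] == 0x00 &&
        bytes[2] == 0xFE && bytes[3] == 0xFF) return ENCODING_UTF32_BE;
    if (length >= 4 && bytes[0] == 0xFF && bytes[1] == 0xFE &&
        bytes[2] == 0x00 && bytes[3] == 0x00) return ENCODING_UTF32_LE;
    if (length >= 3 && bytes[0] == 0xEF && bytes[1] == 0xBB &&
        bytes[2] == 0xBF) return ENCODING_UTF8;
    if (length >= 2 && bytes[0] == 0xFE && bytes[1] == 0xFF) return ENCODING_UTF16_BE;
    if (length >= 2 && bytes[0] == 0xFF && bytes[1] == 0xFE) return ENCODING_UTF16_LE;
    return ENCODING_UNKNOWN; /* no BOM: the caller must fall back or be told */
}
```

A file saved as UTF-8 without a BOM yields `ENCODING_UNKNOWN` here, which is why supplying the encoding explicitly is the more reliable route when it is known.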

Performance

CHCSVParser is conscious of low-memory environments, such as the iPhone or iPad. It can safely parse very large CSV files, because it only loads portions of the file into memory at a single time.

Credits & Contributors

CHCSVParser was written by Dave DeLong and has accepted patches from several other contributors.

CHCSVParser uses code to discover file encoding that was provided by Rainer Brockerhoff.

License

CHCSVParser is licensed under the MIT license, which is reproduced in its entirety here:

Copyright (c) 2014 Dave DeLong

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

chcsvparser's Issues

when the file ends with new line, parser will create a dummy 1 row with 1 empty record

From the code, parser tried to ignore new line characters, with _parseNewline. But when the last character is newline, then _parseNewline will return true, and the next record logic will be triggered, which I presume is not the desired behavior.
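For reference, the expected behavior can be sketched in plain C: a line break ends the current record rather than starting a new one, so a trailing line break adds nothing. This is an illustration only, not the parser's code; `count_records` is a hypothetical helper that assumes NUL-terminated input and double-quote quoting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Counts records in NUL-terminated CSV text. Line breaks inside double-quoted
   fields are part of the field and do not end a record. */
size_t count_records(const char *csv) {
    assert(csv != NULL);
    size_t records = 0;
    bool in_quotes = false;
    bool record_has_content = false;
    for (size_t i = 0; csv[i] != '\0'; i++) {
        char c = csv[i];
        if (c == '"') {
            in_quotes = !in_quotes; /* a doubled quote toggles twice */
            record_has_content = true;
        } else if (!in_quotes && (c == '\n' || c == '\r')) {
            if (c == '\r' && csv[i + 1] == '\n') i++; /* CRLF is one break */
            records++; /* the break terminates the current record */
            record_has_content = false;
        } else {
            record_has_content = true;
        }
    }
    if (record_has_content) records++; /* last record without a final break */
    return records;
}
```

With this rule, "a,b\n" and "a,b" both count as one record, and no dummy row appears after the final newline.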

Possible problem with sniffencoding?

Given the following csv file:
https://mega.co.nz/#!BklVVCSR!JSN_8SIjPfz4eyHdK87H-1U2wyNmC-JkvsDthE-peII

Which has 415 lines and no unicode characters until line 150, the following code "fails":
[NSArray arrayWithContentsOfCSVFile:@"file path"];

"fails" in quotes as the method succeeded, but the array only contains 149 entries. At the point it gets to the unicode encoding it silently fails. This gives a false impression that everything was successful (and takes a long time to figure out what is going on :)

Problem reading little-endian Unicode

I have an NSString that was converted from little-endian Unicode NSData. The first character of the string is the Unicode byte-order marker (BOM), which in little endian is 0xFF 0xFE.

The first time _loadMoreIfNecessary calls initWithBytes:length:encoding:, the BOM is in the buffer and the buffer is read correctly. However, when the second buffer is converted there is no BOM, and the data is treated as big-endian. This means that the second and all subsequent buffers of data are corrupted.

In one sense, the bug is that _loadMoreIfNecessary is converting each buffer of text independently, rather than maintaining conversion context from one buffer to the next. In general, text encodings require context to handle multi-byte characters, byte order markers and such. A more robust version of this function would use the lower-level Text Encoding Converter, which maintains context from one buffer to the next.
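To show what "maintaining conversion context" could look like, here is a small plain C sketch of a UTF-16 decoder that remembers the byte order and any leftover byte between buffers. It is not the Text Encoding Converter and not the parser's code; `utf16_decoder` and `utf16_decode_chunk` are hypothetical names, and defaulting to big-endian when there is no BOM is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    bool order_known;   /* set once the first two bytes have been examined */
    bool little_endian; /* byte order remembered across buffers */
    bool has_pending;   /* true if the previous buffer ended mid code unit */
    uint8_t pending;    /* leftover byte from the previous buffer */
} utf16_decoder;

/* Decodes one buffer into UTF-16 code units and returns how many were written.
   `out_capacity` must be at least (length + 1) / 2. */
size_t utf16_decode_chunk(utf16_decoder *d, const uint8_t *bytes, size_t length,
                          uint16_t *out, size_t out_capacity) {
    assert(d != NULL && bytes != NULL && out != NULL);
    assert(out_capacity >= (length + 1) / 2);
    size_t produced = 0;
    for (size_t i = 0; i < length; i++) {
        if (!d->has_pending) {
            d->pending = bytes[i]; /* first half of a code unit */
            d->has_pending = true;
            continue;
        }
        uint8_t first = d->pending, second = bytes[i];
        d->has_pending = false;
        if (!d->order_known) {
            d->order_known = true;
            if (first == 0xFF && second == 0xFE) { d->little_endian = true; continue; }
            if (first == 0xFE && second == 0xFF) { d->little_endian = false; continue; }
            d->little_endian = false; /* no BOM: assume big-endian */
        }
        out[produced++] = d->little_endian
            ? (uint16_t)(first | (second << 8))
            : (uint16_t)((first << 8) | second);
    }
    return produced;
}
```

Because the decoder struct outlives each call, a second buffer without a BOM is still read as little-endian, and a code unit split across two buffers is reassembled. Surrogate pairs would still need to be combined by the caller.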

But an easier fix might be to change initWithCSVString: to use a fixed encoding like NSUTF16BigEndianStringEncoding rather than calling [csv fastestEncoding], which evaluates to NSUnicodeStringEncoding, which is ambiguous. I believe that using an unambiguous encoding would prevent the error, even if it's not as general a solution as using Text Encoding Converter.

Fails to parse record with unescaped parenthesis

given the record

16681;6;Orehovyj boulevard, ul. Musy Dzhalilja (odd side);20;out;55.6141571054;37.7460757208;800;34;34;0;0;0;0;0;1

library fails to parse the 3rd field with the following error:

Unexpected delimiter. Expected ';' (0x3B), but got '(' (0x28)

Is there any way to parse this data without altering it (e.g adding quotes)?

Ability to use the First Line as Keys

I think it would be quite useful to be able to use the first line of the CSV file as keys for the following lines. A CSV file with 5 lines would then result in an NSArray of 4 NSDictionary objects instead of 5 NSArrays.
I'd be happy to open a pull request if you think it's worthwhile.

-initWithContentsOfCSVFile: assumes UTF-8

Any reasons for not using the encoding sniffing method when -initWithContentsOfCSVFile: is used by just passing NSStringEncoding encoding = 0? I think it might be better.

trailing backslash leads to crash

If you add a backslash as last character on the last line with content (escaping the \n) finishCurrentField crashes because NSRange.length is unsigned.

if ([currentField length] < nextSearchRange.location) break;

in the while loop would help.
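The underlying hazard is generic to unsigned arithmetic, so here is a tiny plain C sketch of it (size_t standing in for NSUInteger). `remaining_after` is a hypothetical helper for illustration, not code from the parser.

```c
#include <assert.h>
#include <stddef.h>

/* Returns how many characters remain from `location` to the end of a field of
   `length` characters, or 0 when `location` is already at or past the end.
   Computing `length - location` unconditionally would wrap around to a huge
   value in that case, because the operands are unsigned. */
size_t remaining_after(size_t length, size_t location) {
    size_t remaining = (location >= length) ? 0 : length - location;
    assert(remaining <= length); /* would not hold if the subtraction wrapped */
    return remaining;
}
```

Checking the bounds before subtracting, as the guard above does, keeps the search range from ever getting a wrapped-around length.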

Sorry for not providing a patch: I'm new to git.

Parser can't parse UTF-8 csv file

I just tested the new source and I found it can't parse UTF8 code.

Code as below is ok, I got what I need.
NSString *csvFile = [NSString stringWithContentsOfFile:pathToBundle(CSV_FINEAME1) encoding:NSUTF8StringEncoding error:nil];

But when I use code as below, I can't get what I need.
NSString *csvLines = [[NSArray arrayWithContentsOfCSVFile:pathToBundle(CSV_FINEAME1)] CSVString];

Parser says
-[CHCSVParser _sniffEncoding] CHCSVParser.m:176 unable to determine stream encoding; assuming MacOSRoman

unable to parse csv file with initWithContentsOfCSVFile

Could you please look at this question? I am unable to parse the CSV file; nothing is returned in the arrays.

http://stackoverflow.com/questions/28780139/unable-to-parse-csv-file-with-chsv-parser

Writer re-creates the file instead of appending

Maybe I just didn't find it, but I want the writer to append to my existing CSV file, not create a new one each time I do: CHCSVWriter *csvWriter = [[CHCSVWriter alloc] initForWritingToCSVFile:..];

Is it possible to do this?
Parsing the whole file before writing one more line would be extremely expensive for my purpose.
Thanks for the support :)

public class variable for measuring progress

We want to hook CHCSVParser up to a progress bar. There are lots of ways to do this. I think a good way to do it is to track internally how many bytes the parser has read (better: processed) and then expose this as a variable. The delegate can compare this value against the total filesize and choose whether to send a message to the UI thread.


@property (assign) long long totalBytesRead;

- (void)_advance {
    [self _loadMoreIfNecessary];
    _nextIndex++;
    _totalBytesRead++;
}

The above is one suggestion. I think the right solution will need input from the project maintainer.

Stop parsing at certain line

Hi,
Does anyone know how to stop parsing process once it reaches a certain line #? This would be very helpful.
Thanks
Nick

Parsing ends if field includes nullchar

I can not parse the csv file with the following content:

id_1,id_2,id_3
delay,�,0
delay,�,0

Note that, there is a null char between the 2 commas for the empty field (put the cursor on the first comma and advance to right to see it).

When I try to parse this csv file, CHCSVParser just jumps to parserDidEndDocument function just after first null char between commas.

Make certain methods optional, not required

Make these methods optional and not required:

- (void) parser:(CHCSVParser *)parser didStartDocument:(NSString *)csvFile;

- (void) parser:(CHCSVParser *)parser didStartLine:(NSUInteger)lineNumber;

- (void) parser:(CHCSVParser *)parser didEndLine:(NSUInteger)lineNumber;

- (void) parser:(CHCSVParser *)parser didEndDocument:(NSString *)csvFile;

- (void) parser:(CHCSVParser *)parser didFailWithError:(NSError *)error;

This is because a developer does not always need to use them all. The only one that should be required, in my opinion, is:

- (void) parser:(CHCSVParser *)parser didReadField:(NSString *)field;

Add a new constructor for NSData

It would be really handy if you could add the following constructor:

CHCSVParser.h

- (id)initWithData:(NSData *)data encoding:(NSStringEncoding)encoding error:(NSError **)anError;

CHCSVParser.m

- (id)initWithData:(NSData *)data encoding:(NSStringEncoding)encoding error:(NSError **)anError {
    // Decode the data with the given encoding, then reuse the string-based initializer.
    return [self initWithCSVString:[[NSString alloc] initWithData:data encoding:encoding]
                          encoding:encoding
                             error:anError];
}

Problem with initForWritingToCSVFile

Hi Dave,

I was assuming when I do something like this:

CHCSVWriter *writer = [[CHCSVWriter alloc] initForWritingToCSVFile:@"export.csv"];

for (Tbl_ExpenseRecords *noteInfo in xpenseData) {
    // NSLog(@"%@",noteInfo.cardname01);
    [writer writeField:[NSString stringWithFormat:@"%@",noteInfo.expensedate]];
    [writer writeField:[NSString stringWithFormat:@"%@",noteInfo.title]];
    [writer writeField:[NSString stringWithFormat:@"%@",noteInfo.value]];
    [writer writeField:[NSString stringWithFormat:@"%@",noteInfo.currency]];
    [writer writeField:[NSString stringWithFormat:@"%@",noteInfo.cardname01]];
    [writer writeField:[NSString stringWithFormat:@"%@",noteInfo.name_]];
    [writer writeField:[NSString stringWithFormat:@"%@",noteInfo.address_]];
    [writer writeField:[NSString stringWithFormat:@"%@",noteInfo.postalCode_]];
    [writer writeField:[NSString stringWithFormat:@"%@",noteInfo.city_]];
    [writer writeField:[NSString stringWithFormat:@"%@",noteInfo.country_]];
    [writer finishLine];
}
[writer closeStream];

that it would create a file in the app's Documents directory, but there is no file. Am I doing something wrong? I need some advice because I'm not very familiar with the stream method.

Thanks in advance
Ingemar

Tests in UnitTests.m are failing because required parser options haven't been set

The unit tests in UnitTests.m don't succeed. The arrayWithContentsOfCSVFile: method is called, but that convenience method (now) has options defaulting to NO. Yet, looking at the contents of expectedFields vis-à-vis the contents of Test.csv implies certain options are required to be set via arrayWithContentsOfCSVFile:options: :

The test content has backslashes used as escape characters, so CHCSVParserOptionsRecognizesBackslashesAsEscapes must be set.
The test content contains some comment lines (initial octothorpe "#"), and the test expects to not receive those lines from the parser, so CHCSVParserOptionsRecognizesComments must be set.
It isn't clear what value for CHCSVParserOptionsSanitizesFields would satisfy the tests. I thought it also must be set, but I can't get tests passing either way. How CHCSVParserOptionsSanitizesFields works may itself be another issue... I'm investigating.

Crash when manually defining file encoding

I get an EXC_BAD_ACCESS crash when I use this:

NSInputStream *inputStream = [[NSInputStream alloc]
                                initWithFileAtPath:csvFilePath];
CHCSVParser *parser = [[CHCSVParser alloc]
                         initWithInputStream:inputStream
                         usedEncoding:NSUTF8StringEncoding
                         delimiter:','];

The crash occurs at line 119 of CHCSVParser.m, which is if (encoding == NULL || *encoding == 0) {

I am running all of this off the main thread, but I suspect that ought to be fine. Everything works fine off the main thread when I use initWithContentsOfCSVFile instead of the above.

My CHCSVParser.h/m are identical to GitHub's current master :head with the exception that I have included the code from #33, which simply provides a toggle for whitespace stripping and should have no effect on this.

I will upload the CSV file I am using and post a link as soon as possible. However, I don't think this error should be at all related to the CSV file, should it?

Last new line

An empty last array is created if the CSV string has a last new line character. Should not happen according to RFC 4180 section 2.2.

Could you bump up the podspec version?

I want to use features which were committed recently (especially 45aceae).
Now I specify the commit in Podfile, but it might be better that we use released versions.

Apostrophe in CSV file

Maybe I'm missing something, but I'm parsing a CSV file (from Excel). Everything seems to work well - including parsing fields which contains commas and are embedded in "...": - until the parser comes across the first semicolon (the field is not inside "..."). The parser stops as if it was looking for a 2nd semicolon, which doesn't exist in the file. How do I solve for this? (Excel doesn't see the need to embed cells with semicolons inside "..." when exporting.)

One column CSV file not getting parsed correctly

I tried to parse the following CSV file:

line1
line2

I believe this is a valid CSV file with one column and two rows. However, CHCSVParser only reads an empty field in the first line and finishes.

Fields without balanced double quotes cause parsing to fail

I'm trying to parse a very large tsv file that contains some unbalanced double quotes. In my case, it looks like having a list of characters that the parser skipped would solve most of the problems I'm having. Would something like a new _parseFieldWhitespace (_parseFieldCharactersToIgnore ?) be the best way to attack this?

Here's a modification to the test.tsv file that shows a field with an unbalanced double quote.
1,a 1,b 1,c "1,d
2,a 2,b 2,c 2,d
3,a 3,b 3,c 3,d
4,a 4,b 4,c 4,d
5,a 5,b 5,c 5,d
6,a 6,b 6,c 6,d
7,a 7,b 7,c 7,d
8,a 8,b 8,c 8,d
9,a 9,b 9,c 9,d
10,a 10,b 10,c 10,d

Here's a quick hack at a solution (probably should be called whenever _parseFieldWhitespace is called) -

- (void)_parseFieldCharactersToIgnore {
    // bdd - test function to remove quotes from a field
    NSCharacterSet *ignoreCharacterSet = [NSCharacterSet characterSetWithCharactersInString:@"\""];
    while ([self _peekCharacter] != '\0' &&
           [ignoreCharacterSet characterIsMember:[self _peekCharacter]] &&
           [self _peekCharacter] != _delimiter) {
        [self _advance];
    }
}

Need help with parsing

First of all, I am not sure whether this is an issue, but I need a little help here, as I couldn't find a valid answer to my problem. We have implemented CHCSVParser in our project to parse large CSV files. We have a CSV file and need to find the details of a search term, but parsing the file alone takes a long time (> 20 sec). We used the following code:
CHCSVParser *p = [[CHCSVParser alloc] initWithContentsOfCSVURL:[NSURL fileURLWithPath:filePath]];
p.delegate = self;
[p parse];
Is this the right way to use it? Also, which method can I use to look up the details of a term? For example, I enter a value from one of the columns of the CSV file and need to fetch the entire row. Please reply as soon as possible, as we are close to our deadline.

Swift delegate prototypes (a comment, not an issue)

This class is working fine for me in a Swift project. The read delegate functions:

func parserDidBeginDocument(parser: CHCSVParser) {
}
func parser(parser: CHCSVParser, didFailWithError error: NSError) {
}
func parserDidEndDocument(parser: CHCSVParser) {
}
func parser(parser: CHCSVParser, didBeginLine recordNumber: UInt) {
}
func parser(parser: CHCSVParser, didEndLine recordNumber: UInt) {
}
func parser(parser: CHCSVParser, didReadField field: NSString?, atIndex fieldIndex: UInt) {
}
func parser(parser: CHCSVParser, didReadComment field: NSString?) {
}

When the final char of the file is ',' the trailing @"" field is missed

My temporary fix is to add the following code to the end of - (void)runParseLoop:

if ([currentField length] > 0 && state == CHCSVParserStateInsideField) {
    [self finishCurrentField];
}
if (state == CHCSVParserStateInsideLine) {
    // temporary fix: when the final char of file is ',' that will miss a @"" field
    if ([currentField length] > 0) {
        [self finishCurrentField];
    }
    [self finishCurrentLine];
}

Reading Same File Twice

Hi,

I'm using the CSV as a lookup table for about 10,000 rows of data. I map at most, two columns to a dictionary, using the first column as a key and then another arbitrary column as the value. Since I have multiple columns, it would be most useful to run the parsing more than once. However, I can't seem to get the second run to work. It never sends any delegate methods for lines/records. I believe this is because the NSInputStream is at the end of the stream. So, I make sure I have no strong pointers to the CHCSVParser object (so it deallocates in ARC), and then create a new CHCSVParser using the same file path. To my surprise, this still doesn't work. Any thoughts on parsing the same file multiple times?

Thanks!

Caylan

Source code warnings

I turned on strict source code check and got 2 warnings:

CHCSVParser.m:151:16: Declaration shadows a local variable

CHCSVParser.m:228:20: Comparison of integers of different signs: 'NSInteger' (aka 'int') and 'NSUInteger' (aka 'unsigned int')

Using custom delimiters from extended ASCII table

Hi,
I'm able to use custom comma character but in order to use custom quote character I had to hard code it in .m file. It would be great if it could be passed in as a parameter.

Other issue is I would like to use quote character that is chr(254) in ASCII table and I'm not able to hard code that in. It seems datatype char does not support characters that high on ASCII table. Is there anything I can do?

To make things more interesting it seems that chr 254 looks different when I'm looking at ascii or unicode text file.

Just to sum things up, I'm using comma character 20 and quote character 254. This is fairly standard in field I'm in and there is no way to get files in any other format.

Any help would be great.

Thanks
Nick

Update code to use ARC

As some others have already done, it would be nice if this code utilized ARC.

Fails on comma within quotes

Fails basic test with error "CHCSVParser parse] [Line 268] @parsing error: Error Code=1 "Unexpected delimiter. Expected ',' (0x2C), but got '"' (0x22)" UserInfo=0x7b7b900 {NSLocalizedDescription=Unexpected delimiter."

CSV :
F1, F2, F3
a, "b, B", c
A, B, C
1, 2, 3
I, II, III

Parsed Data Replaces Double Quotes with Two Sets of Double Quotes

I have been using this library successfully for a while, but I have noticed that in the parsed array that is returned after reading a CSV, any double quotes have been returned as two double quotes.

For example, in a field there might be a value:

This is an example of "quoted text"

I expected that the parsed data I would get would be exactly the same, but instead I get the following:

This is an example of ""quoted text""

I found where this is done (on lines 645 to 648), but I wanted to ask why this is done? I am happy to submit a pull request to remove it if this isn't expected behaviour, but I just wanted to check first.

How to "create" the needed file for csv export?

As I am quite new to iOS development in general and also not familiar with this library I would like to know how to create the file or the path that is needed for exporting to a .csv file with this library?

Also I am not sure how a potential user of my application can than access the exported .csv file since there is no file system on iOS?

Setting options for parser requires creating a custom delegate

It's very cumbersome to use the parser when you want to modify the options (in this case, set sanitizesFields). Instead of just calling [@"a,b,c" CSVComponents] you have to create an entire delegate class from scratch and pass it to the parser. You should provide a default delegate, like CHCSVAccumulatorDelegate.

Another sugary option for the string category would be to allow options to be passed in:

@implementation NSString (CHCSVAdditions)

- (NSArray *)CSVComponentsWithOptions:(NSDictionary *)options {
    _CHCSVAggregator *aggregator = [[_CHCSVAggregator alloc] init];
    CHCSVParser *parser = [[CHCSVParser alloc] initWithCSVString:self];
    [parser setDelegate:aggregator];
    // Apply each option (e.g. @"sanitizesFields") to the parser via key-value coding.
    for (NSString *key in options)
        [parser setValue:options[key] forKey:key];
    [parser parse];
    CHCSV_RELEASE(parser);

    NSArray *final = CHCSV_AUTORELEASE(CHCSV_RETAIN([aggregator lines]));
    CHCSV_RELEASE(aggregator);

    return final;
}

@end

Putting the core data into csv file, please let me know replaced method for this method "initForWritingToString"

Hi Davedelong,

I am trying to put the core data into csv file and then attach it via an email to send it.

I saw that you have replaced the "initForWritingToString" method. Could you look at the code and correct it, or help me work out how to do this?

Thanks

//fetching the data from the core data

NSManagedObjectContext *moc = [self managedObjectContext];
NSEntityDescription *entityDescription = [NSEntityDescription
                                          entityForName:@"Input_Details" inManagedObjectContext:moc];
NSFetchRequest *request = [[NSFetchRequest alloc] init];

NSLog(@"I am a fetcher request, predicate is %@",request.predicate);
request.predicate = [NSPredicate predicateWithFormat:@"rs_Input_project.name = %@", self.projectObject.name];

[request setEntity:entityDescription];

request.resultType = NSDictionaryResultType;

NSSortDescriptor *sortDescriptor = [[NSSortDescriptor alloc] initWithKey:@"sewer_No" ascending:YES];
[request setSortDescriptors:@[sortDescriptor]];

NSError *error;

NSArray *fetchedObjects = [moc executeFetchRequest:request error:&error];

//creating a csv CHCSVWriter
CHCSVWriter *writer = [[CHCSVWriter alloc] initForWritingToString];
for(Input_Details* obj in fetchedObjects)
{

    [writer writeLineOfFields:obj.sewer_No, obj.mh_Up, obj.mh_down, nil];
}

NSLog(@"getdatafor csv:%@",writer);

NSArray *paths = NSSearchPathForDirectoriesInDomains(NSDocumentDirectory,  NSUserDomainMask, YES);
NSString *documentsDirectoryPath = [paths objectAtIndex:0];
NSString *filePath = [documentsDirectoryPath  stringByAppendingPathComponent:@"Sewer_Output.csv"];
//        filePath = [filePath stringByAddingPercentEscapesUsingEncoding:NSUTF8StringEncoding];
NSData* settingsData;
settingsData = [writer dataUsingEncoding: NSASCIIStringEncoding];

//NSError *error1;

[settingsData writeToFile:filePath atomically:YES];
//            NSLog(@"writeok");
NSData *inputData = [NSData dataWithContentsOfFile:filePath options:NSDataReadingMapped   error:&error];

NSLog(@"Length:%d Error:%@",[inputData length],[error localizedDescription]);

MFMailComposeViewController *mail = [[MFMailComposeViewController alloc] init];
mail.mailComposeDelegate = self;

// Attach an image to the email
NSString *path1 = [[NSBundle mainBundle] pathForResource:@"history" ofType:@"csv"];
NSData *myData = [NSData dataWithContentsOfFile:path1];

// Fill out the email body text
NSString *emailBody = @"history";
[mail setMessageBody:emailBody isHTML:NO];
[mail addAttachmentData:myData  mimeType:@"text/cvs" fileName:@"Sewer_Output"];

[self presentModalViewController:mail animated:YES];

Add convenience methods for parser usage with custom delimiter

For now, only comma-separated files can be parsed using the convenience method arrayWithContentsOfCSVFile:
For example, we could have a method

arrayWithContentsOfFile:delimiter:options:

Unable to auto-detect encoding

I get an "unable to determine stream encoding; assuming MacOSRoman" when trying to parse a CSV file using

CHCSVParser *parser = [[CHCSVParser alloc]
                         initWithContentsOfCSVFile:csvFilePath];

As expected, all the international characters are scrambled in my app after this.

My CSV File. It was created using CHCSVWriter, then exported to my Mac, and then imported again into the app through iTunes file sharing. The file was never saved in that process as far as I know.

It works correctly if I force UTF8 encoding using

NSStringEncoding encoding = NSUTF8StringEncoding;
CHCSVParser *parser = [[CHCSVParser alloc]
                         initWithInputStream:inputStream
                         usedEncoding:&encoding
                         delimiter:','];

Parse fails when field has leading equal ('=') symbol

It looks like the parser fails when it encounters a line such as the following:
This,is,a,="simple",line

The fourth field (="simple") is causing the problem even though the CSV standard permits such field convention. Please refer to section "CSV Files and Leading Zeros on Numeric Fields" at http://edoceo.com/utilitas/csv-file-format

Can anyone please suggest a solution?

On a side note, the above line is successfully parsed when using the source code from different branch. (I am not sure which of the other two branches)

Thanks for any suggestion.

Invalid asserts in initializer

- (id)initWithInputStream:(NSInputStream *)stream usedEncoding:(NSStringEncoding *)encoding delimiter:(unichar)delimiter

the following asserts are invalid, as no value has been assigned to _delimiter yet:

NSAssert([[NSCharacterSet newlineCharacterSet] characterIsMember:_delimiter] == NO, @"The field delimiter may not be a newline");
NSAssert(_delimiter != DOUBLE_QUOTE, @"The field delimiter may not be a double quote");
NSAssert(_delimiter != OCTOTHORPE, @"The field delimiter may not be an octothorpe");

Or am I missing something?

Problems parsing CSV with æ, ø and å characters

[csvString CSVComponents]; fails when values contain special characters, like æ,ø and å

Thanks

Fields with double quotes that don't start at the beginning of the field cause parsing failures

Working on similar issues to Issue #34...

If a field has a double quote in it, but that quote doesn't occur at the beginning of the field CHCSVParser fails to parse it correctly.

Here's a modification of test.tsv that will cause that issue:
1,a 1,b 1,c 1,"d"
2,a 2,b 2,c 2,d
3,a 3,b 3,c 3,d
4,a 4,b 4,c 4,d
5,a 5,b 5,c 5,d
6,a 6,b 6,c 6,d
7,a 7,b 7,c 7,d
8,a 8,b 8,c 8,d
9,a 9,b 9,c 9,d
10,a 10,b 10,c 10,d

The error generated by Xcode at runtime is:
2013-04-25 13:03:48.576 CHCSVParser[68773:303] ERROR: Error Domain=com.davedelong.csv Code=1 "Unexpected delimiter. Expected ' ' (0x9), but got 'd' (0x64)" UserInfo=0x100301810 {NSLocalizedDescription=Unexpected delimiter. Expected ' ' (0x9), but got 'd' (0x64)}

NSString category: "CSVComponents" is broken if the string contains a Unicode character

    NSString *test = @"TRẦN,species_code,Scientific name,Author name,Common name,Family,Description,Habitat,\"Leaf size min (cm, 0 decimal digit)\",\"Leaf size max (cm, 0 decimal digit)\",Distribution,Current National Conservation Status,Growth requirements,Horticultural features,Uses,Associated fauna,Reference,species_id";
    NSArray *testArr = [test CSVComponents];

    NSLog(@"TEST: %@",testArr);

The resulting array is broken. The cause is the Unicode string "TRẦN" that I added. You can try other Unicode strings; they produce the same result.

Option to specify delimiter in convenience method

It would be fantastic if we could have a method like the following:

[NSArray arrayWithContentsOfCSVFile:fileURL.path delimiter:@";"]

For now, I had to implement a custom class as the parser delegate because _CHCSVAggregator is a private class. I did it like this:

SCSVAggregator* semicolonSeparatedAggregator = [[SCSVAggregator alloc] init];
NSInputStream *stream = [NSInputStream inputStreamWithFileAtPath:fileURL.path];
NSStringEncoding encoding = 0;
CHCSVParser* semicolonSeparatedParser = [[CHCSVParser alloc] initWithInputStream:stream usedEncoding:&encoding delimiter:';'];
semicolonSeparatedParser.delegate = semicolonSeparatedAggregator;
[semicolonSeparatedParser parse];

Infinite loop when a comment octothorpe starts the last line of the file and that line isn't newline-terminated

The parser enters an infinite loop when the last line of the file is a comment line (initial character is an octothorpe, "#"), and that line isn't terminated with a newline.

The while(1) loop in the _parseComment method runs forever because no condition tested in the loop body ever results in a break. The local unichar variable next always equals zero.
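A loop of this kind terminates if reaching the end of the input is treated the same way as reaching a newline. The sketch below is not the library's _parseComment method; CHEndOfComment is a hypothetical stand-in that works on an NSString, and for brevity it ignores backslash-escaped newlines.

```objc
#import <Foundation/Foundation.h>

// Hypothetical stand-in for the comment loop: returns the index just past the
// comment that starts at `start`, without consuming the terminating newline.
static NSUInteger CHEndOfComment(NSString *text, NSUInteger start) {
    NSCharacterSet *newlines = [NSCharacterSet newlineCharacterSet];
    NSUInteger index = start;
    // The loop condition doubles as the end-of-input check, so a final
    // comment line without a trailing newline still ends the loop.
    while (index < text.length) {
        unichar next = [text characterAtIndex:index];
        if ([newlines characterIsMember:next]) {
            break;
        }
        index++;
    }
    return index;
}
```

In a stream-based parser the equivalent check would be an explicit break when the next character is reported as zero, which is the value the report says the loop keeps seeing.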

Add field index to prototype

Change prototype in CHCSVParserDelegate from:

- (void) parser: (CHCSVParser *)parser didReadField: (NSString *)field {
    // do something
}

To:

- (void) parser: (CHCSVParser *)parser didReadField: (NSString *)field forFieldIndex: (NSUInteger)Index {
    // do something
}

This way a developer can tell what column is being read.

Found a bug in -(void)finishCurrentField in CHCSVParser.m

[currentField trimCharactersInSet_csv:[NSCharacterSet newlineCharacterSet]];
if ([currentField hasPrefix:STRING_QUOTE] && [currentField hasSuffix:STRING_QUOTE]) {
    [currentField trimString_csv:STRING_QUOTE];
}
else if ([currentField hasPrefix:delimiter]) { // ",word" will be "word" if it's not 'else if'
    [currentField replaceCharactersInRange:NSMakeRange(0, [delimiter length]) withString:@""];
}

Not possible to read fields with significant space without also returning quotes in parsed result

If the option CHCSVParserOptionsSanitizesFields isn't set, then given a quoted field and/or a field containing escape characters, the parser returns the raw field; i.e. it hasn't removed surrounding quotes and escaped characters are still escaped.

So typical usage would require CHCSVParserOptionsSanitizesFields to be set in order to get back the actual field data not containing any CSV "markup." Most of the time you don't want the markup, just the data.

However, if one sets the option to YES, then a limitation is introduced: leading and trailing space cannot be significant in the field data because the same option also implies removal of leading and trailing white space.

Meaning if you had the following input file:

"  A  ","B","C"
# the above line has two spaces before and after the letter A

... the parser would never return a record matching this array:

@[ @"  A  ", @"B", @"C" ] // two spaces before and after the letter A

There is a similar test case in UnitTests.m expecting "significant" whitespace, but the expected value is not a raw, still-quoted field. It won't pass with the current parser.

I suggest that the CHCSVParserOptionsSanitizesFields option is really implying two distinct behaviours that should be separated as follows:

Whether or not to return the raw field data (still adorned with quotes and escapes). Call this option, say, CHCSVParserOptionReturnRawFields; and
Whether to trim leading and trailing whitespace from the resulting field data, or treat it as significant. Call this option, say, CHCSVParserOptionTrimWhitespaceAndNewlines. Moreover, this option could only be set if the previous option is not set.

Then, there would be three possible ways to combine the options above, instead of the two permitted by the single existing option. This third possibility would enable the case above to work as expected: return non-raw field data where leading and trailing space remain significant.
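The three combinations can be illustrated with a small stand-alone sketch. None of this is CHCSVParser API: the SketchFieldOption flags mirror the two option names proposed above, SketchProcessField is a hypothetical helper, and the unescaping is limited to surrounding quotes and doubled quotes.

```objc
#import <Foundation/Foundation.h>

// Hypothetical flags modelled on the two proposed options; not library API.
typedef NS_OPTIONS(NSUInteger, SketchFieldOptions) {
    SketchFieldOptionReturnRawFields           = 1 << 0,
    SketchFieldOptionTrimWhitespaceAndNewlines = 1 << 1,
};

static NSString *SketchProcessField(NSString *raw, SketchFieldOptions options) {
    if (options & SketchFieldOptionReturnRawFields) {
        return raw; // raw output: quotes and escapes stay, the trim flag is ignored
    }
    NSString *field = raw;
    // Strip one pair of surrounding quotes and unescape doubled quotes.
    if (field.length >= 2 && [field hasPrefix:@"\""] && [field hasSuffix:@"\""]) {
        field = [field substringWithRange:NSMakeRange(1, field.length - 2)];
        field = [field stringByReplacingOccurrencesOfString:@"\"\"" withString:@"\""];
    }
    if (options & SketchFieldOptionTrimWhitespaceAndNewlines) {
        field = [field stringByTrimmingCharactersInSet:
                 [NSCharacterSet whitespaceAndNewlineCharacterSet]];
    }
    return field;
}
```

With no flags set, the quoted field from the example keeps its two leading and two trailing spaces; with only the trim flag it becomes a bare A; with the raw flag it comes back unchanged, quotes included.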

_loadMoreIfNecessary error

_loadMoreIfNecessary chunked reading fails sometimes on decoding UTF-8. Here is a test file: https://dl.dropboxusercontent.com/u/60611528/vid3.csv

It produces a run of Chinese characters starting at line 8, column 37.
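One possible cause, which is an assumption and not something confirmed here, is that a chunk ends in the middle of a multi-byte UTF-8 sequence. A reader that decodes chunk by chunk can guard against that by decoding only the prefix that ends on a complete sequence and carrying the remaining bytes over to the next chunk. CHCompleteUTF8PrefixLength below is a hypothetical helper, not part of the library.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the length of the longest prefix of `bytes`
// that does not end in the middle of a UTF-8 sequence. It never backs up by
// more than 3 bytes. Invalid input is returned at full length so that the
// decoder, not this function, reports the error.
static NSUInteger CHCompleteUTF8PrefixLength(const uint8_t *bytes, NSUInteger length) {
    NSUInteger i = length;
    NSUInteger stepsBack = 0;
    // Walk back over at most 3 trailing continuation bytes (10xxxxxx).
    while (i > 0 && stepsBack < 3 && (bytes[i - 1] & 0xC0) == 0x80) {
        i--;
        stepsBack++;
    }
    if (i == 0) {
        return length; // empty input, or nothing but continuation bytes
    }
    uint8_t lead = bytes[i - 1];
    NSUInteger expected;
    if ((lead & 0x80) == 0x00)      { expected = 1; } // ASCII
    else if ((lead & 0xE0) == 0xC0) { expected = 2; }
    else if ((lead & 0xF0) == 0xE0) { expected = 3; }
    else if ((lead & 0xF8) == 0xF0) { expected = 4; }
    else { return length; }                           // not a valid lead byte
    NSUInteger available = length - (i - 1);
    return (available < expected) ? (i - 1) : length;
}
```

This only addresses sequences split across chunk boundaries. Bytes that are invalid UTF-8 in their own right would still make the decoding call return nil.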

Building in Xcode 6 beta 4 shows a couple of warnings

CHCSVParser.m:862:1: Designated initializer missing a 'super' call to a designated initializer of the super class
CHCSVParser.m:863:18: Designated initializer should only invoke a designated initializer on 'super'

Parsing fails when "bad data" is not in the first input stream buffer

I'm trying to parse external TSV data, which I cannot fix before parsing. The weird problem is that the data contains "bad" values, e.g.

    "Englanti|Lontoo|kartat|matkaoppaat|n�aht�avyydet"

If such a string is early in the file, everything is OK. If it's further in, CHCSVParser's _loadMoreIfNecessary fails to get past the "�". My guess is that it's interpreted as two separate halves of a real character, and the first half is 0x00?

_stringBuffer may contain 15,000 bytes, but [NSString initWithBytes] just returns nil until the 15k+ length has been decremented to zero, one at a time. The document ends at row 20 of the input file, although the file contains 1,200 rows.

Forcing the encoding to UTF-8 helped a little, but not with this character.

NSInputStream *stream = [NSInputStream inputStreamWithURL:[NSURL URLWithString:urlPath]];
NSStringEncoding encoding = NSUTF8StringEncoding;
CHCSVParser *p = [[CHCSVParser alloc] initWithInputStream:stream usedEncoding:&encoding delimiter:'\t'];

Any ideas why the first buffer[CHUNK_SIZE] from _sniffEncoding would be different from the rest from _loadMoreIfNecessary? I can't see much difference. The stream encoding is always NSUTF8StringEncoding. Is there any way to fix the input stream data before trying to parse it?

Bad data in first buffer

    "Englanti|Lontoo|kartat|matkaoppaat|n\Ufffdaht\Ufffdavyydet"