Parsing a .CSV file in C#.NET

Whilst working on my latest project at work, I did some research into the different ways I could parse a .CSV file. This sort of operation is quite common, and so there were a wide range of methods people had come up with to do it. Each one had it’s Advantages and drawbacks. I needed a solution that could handle a file with thousands of records in as little time as possible, since there were other, more time consuming operations that also needed to be done. Below are the 2 examples that I worked with.

1. The Simple Parser.

Originally posted: http://www.switchonthecode.com/tutorials/building-a-simple-csv-parser-in-csharp
public List parseCSV(string path)
{
    List parsedData = new List();

    try
    {
        using (StreamReader readFile = new StreamReader(path))
        {
            string line;
            string[] row;

            while ((line = readFile.ReadLine()) != null)
            {
                row = line.Split(',');
                parsedData.Add(row);
            }
        }
    }
    catch (Exception e)
    {
        MessageBox.Show(e.Message);
    }

    return parsedData;
}

Explanation:

With not knowing much about parsing a .csv file, I tried out this solution. It initially worked with the .csv file I was given to parse, since the content itself didn’t actually contain any comma characters. That’s exactly where this solution falls down. When the line is read in as a string, the Split() method is called on it. Split() takes a parameter which is the character you want to break each field into. Each field is then passed to an Array called ‘row’, with each array object being the parsed field. Then, because we’re likely to be reading more than a single line, the row is passed into a List object called ‘parsedData’, which is returned at the end of the function. Pretty simple stuff.

Why not to use it:

As discussed, each field in a typical .csv is delimited using the Comma character. But what happens when your reading in some text that uses full English grammar? Say for example you have a product description that reads: “The all new i7 CPU, is the latest cutting edge innovation from Intel.”. When Split(‘,’) is called, it will be added to the array as shown below:

row.GetValue(0).ToString() = “The all new i7 CPU”

row.GetValue(1).ToString() = “is the latest cutting edge innovation from Intel”

Hopefully you see why this is an issue. The text has been split into 2 separate values in the array as it’s been treated as a new field.

2. The better solution.

Originally posted here: http://stackoverflow.com/questions/3507498/reading-csv-file by user: David Pokluda.
TextFieldParser parser = new TextFieldParser(@"c:\temp\test.csv");
parser.TextFieldType = FieldType.Delimited;
parser.SetDelimiters(",");
while (!parser.EndOfData) 
{
    //Processing row
    string[] fields = parser.ReadFields();
    foreach (string field in fields) 
    {
        //TODO: Process field
    }
}
parser.Close();

My Implimentation:

public List parseCSV(string path)
{
    List<string[]> parsedData = new List<string[]>();
    string[] fields;

    string line = parser.ReadLine();

    try
    {
        TextFieldParser parser = new TextFieldParser(@"c:\temp\test.csv");
        parser.TextFieldType = FieldType.Delimited;
        parser.SetDelimiters(",");

        while (!parser.EndOfData) 
        {
            fields = parser.ReadFields();
            parsedData.Add(fields);

            //Did more stuff here with each field.
        }

    parser.Close();

    }
    catch (Exception e)
    {
        MessageBox.Show(e.Message);
    }

    return parsedData;
}

 

Explanation:

Whilst this is technically VB.NET code, it works perfectly fine within C#.

You’ll need to add a reference to Microsoft.VisualBasic, by right-clicking on ‘References’ and selecting ‘Add new Reference’. You’ll also need to include required namespace at the top of the code to parse the CSV. This is done as follows:

using Microsoft.VisualBasic.FileIO;

This will then allow you to use the TextFieldParser class within VB.NET.

This method will read each field from the .csv, based upon the given delimiter that is passed to parser.SetDelimiters(), as opposed to reading each line and then splitting. It will not split any content text up like the previous example, so long as the entire field is surrounded by double quote marks (needs verification, however seems to work perfectly fine for me).

Any questions or comments? Feel free to post below!

  • asd

    *were like a million times.

    • Revised. Thank you for pointing this out.

      • Vivek Patel

        Hi Dan, I have used the same parser but I am having problem parsing CSV that were created on MAC and PC.
        They both use different new line character for new line. Also how would you parse a line that has values inside quotes and has new line.

        For example
        First Name, Last Name, Address
        A, B, “Main St
        city
        State”

        That should read like

        Header :- First Name, Last Name, Address
        Line 1:- A, B, “Main St
        City
        State”

  • Sara M

    Hi, I have a question, how would I be able to read the .CSV file line by line and parse out specific keywords. Then take those keywords and put them in an array or list for example?

    • Hi Sara!

      Apologies for the late response. You could do something like this:

      Assuming you pass a string array (Or List) of keywords into the method…

      //List of matched keywords
      List matchedWords = new List();

      while (!parser.EndOfData)
      {
      //Read the current line
      string line = parser.ReadLine();

      //For a single String keyword
      if (line.Contains(“myString”))
      {
      //Found a match – Add to list.
      matchedWords.Add(“myString”))
      }

      //For multiple keywords stored in an Array or List
      foreach (string keyword in Keywords)
      {
      if (line.Contains(keyword))
      {
      //Match Found
      matchedWords.Add(keyword);
      break; //Stop the foreach loop since we found a match
      }
      }
      }

      I havent had chance to check this so there may be some minor errors since I just scribbled it in notepad.. however I hope you get the general idea.

  • Tim Bostwick

    Hi Dan: Thank you very much for this; I’m starting to work with text files and your approach is helpful. I have a couple of questions:

    (1) I’m confused by the fifth line:
    string line = parser.ReadLine();

    This seems redundant vis-a-vis the parser.ReadFields() statement inside the while loop. And the ‘parser’ it invokes is scoped inside the try-block and thus can’t be seen at the method level.

    (2) I think the method signature needs a type to go with the List object returned. Shouldn’t this read:
    public List parseCSV(string path) ?

    If you want to email be directly, I’m at bostwick340@gmail.com. Thanks again

    –Tim

    • Hi Tim,

      Firstly, thank you for your comment on my post!

      I’ll address the questions you raised:

      1) This line essentially reads the first line of the file and does nothing with it (other than pass it to the var “line”). This is to skip over the column headers. It seems this is the only way to advance the cursor to the next line of the file.

      2) You’re absolutely right with the method signature, that’s a fail on my part… thanks for pointing that out!

      Hope you have a great weekend!
      -Dan

  • This works great. It even parses fields with commas correctly! When you are allowing files to be uploaded to a web server and need to limit file types to prevent malicious scripts this is it. Thanks so much.

    • You’re welcome! Glad it was able to help 🙂

  • Daniel José da Silva

    I got a problem when I have to parse a quoted field. Have Microsoft a native solution for this?

  • Jerzy Stachera

    Dan, Is there anything that makes the TextFieldParser faster?
    It’s pretty slow.
    When I have to read the CSV files of size 100MB or more the time can be count in minutes compare to for exaple regex split count in seconds.
    The problem is that right know I didn’t find good regular expression which will handle all the CSV format cases.

    • Hi Jerzy,

      Honeslty I’m not sure. A 100mb CSV files is a HUGE amount of text – and so it’s understandable why parsing a file of this size is so slow.

      If you happen to come across a faster method that achieves the same thing as the TextFieldParser class, please let me know – I’d be very interested in taking a look.

      Sorry I couldn’t answer your query.

      -Dan.

  • drool

    can tell me how can i convert this csv to json format