Parsing a .CSV file in C#.NET

Whilst working on my latest project at work, I did some research into the different ways I could parse a .CSV file. This sort of operation is quite common, and so there were a wide range of methods people had come up with to do it. Each one had it’s Advantages and drawbacks. I needed a solution that could handle a file with thousands of records in as little time as possible, since there were other, more time consuming operations that also needed to be done. Below are the 2 examples that I worked with.

1. The Simple Parser.

Originally posted: http://www.switchonthecode.com/tutorials/building-a-simple-csv-parser-in-csharp
public List parseCSV(string path)
{
    List parsedData = new List();

    try
    {
        using (StreamReader readFile = new StreamReader(path))
        {
            string line;
            string[] row;

            while ((line = readFile.ReadLine()) != null)
            {
                row = line.Split(',');
                parsedData.Add(row);
            }
        }
    }
    catch (Exception e)
    {
        MessageBox.Show(e.Message);
    }

    return parsedData;
}

Explanation:

With not knowing much about parsing a .csv file, I tried out this solution. It initially worked with the .csv file I was given to parse, since the content itself didn’t actually contain any comma characters. That’s exactly where this solution falls down. When the line is read in as a string, the Split() method is called on it. Split() takes a parameter which is the character you want to break each field into. Each field is then passed to an Array called ‘row’, with each array object being the parsed field. Then, because we’re likely to be reading more than a single line, the row is passed into a List object called ‘parsedData’, which is returned at the end of the function. Pretty simple stuff.

Why not to use it:

As discussed, each field in a typical .csv is delimited using the Comma character. But what happens when your reading in some text that uses full English grammar? Say for example you have a product description that reads: “The all new i7 CPU, is the latest cutting edge innovation from Intel.”. When Split(‘,’) is called, it will be added to the array as shown below:

row.GetValue(0).ToString() = “The all new i7 CPU”

row.GetValue(1).ToString() = “is the latest cutting edge innovation from Intel”

Hopefully you see why this is an issue. The text has been split into 2 separate values in the array as it’s been treated as a new field.

2. The better solution.

Originally posted here: http://stackoverflow.com/questions/3507498/reading-csv-file by user: David Pokluda.
TextFieldParser parser = new TextFieldParser(@"c:\temp\test.csv");
parser.TextFieldType = FieldType.Delimited;
parser.SetDelimiters(",");
while (!parser.EndOfData) 
{
    //Processing row
    string[] fields = parser.ReadFields();
    foreach (string field in fields) 
    {
        //TODO: Process field
    }
}
parser.Close();

My Implimentation:

public List parseCSV(string path)
{
    List<string[]> parsedData = new List<string[]>();
    string[] fields;

    string line = parser.ReadLine();

    try
    {
        TextFieldParser parser = new TextFieldParser(@"c:\temp\test.csv");
        parser.TextFieldType = FieldType.Delimited;
        parser.SetDelimiters(",");

        while (!parser.EndOfData) 
        {
            fields = parser.ReadFields();
            parsedData.Add(fields);

            //Did more stuff here with each field.
        }

    parser.Close();

    }
    catch (Exception e)
    {
        MessageBox.Show(e.Message);
    }

    return parsedData;
}

 

Explanation:

Whilst this is technically VB.NET code, it works perfectly fine within C#.

You’ll need to add a reference to Microsoft.VisualBasic, by right-clicking on ‘References’ and selecting ‘Add new Reference’. You’ll also need to include required namespace at the top of the code to parse the CSV. This is done as follows:

using Microsoft.VisualBasic.FileIO;

This will then allow you to use the TextFieldParser class within VB.NET.

This method will read each field from the .csv, based upon the given delimiter that is passed to parser.SetDelimiters(), as opposed to reading each line and then splitting. It will not split any content text up like the previous example, so long as the entire field is surrounded by double quote marks (needs verification, however seems to work perfectly fine for me).

Any questions or comments? Feel free to post below!

From ‘Lazy’ Student, to Web Developer – Placement Week 1.

Been a little while since I’ve last blogged, but the past few weeks have been rather busy.

As part of my Degree course at Salford University, I opted to do an Industrial Placement for a year. With how competitive the Jobs market is at the moment, any relevant experience to my field of Web Development would defiantly put me at an advantage over other graduates when the time comes. I submitted for a position at Inspire Tech UK, a small company based in Salford, working on their website / back-end systems using ASP.NET (C#). The company trades from their own website as well as Ebay, Amazon and Play under the name eoutlet.

Continue reading