Data Analysis with C#: Leveraging .NET for High-Performance Tasks

by Brad Jolicoeur
08/17/2024

If you are already skilled in C# and the .NET ecosystem, you may be wondering whether you need to learn Python to do data analysis or AI/ML. The short answer is no: you can do in C# what you would do in Python for these tasks, and the result will likely perform better and take you less time to build.

Background

As someone who's built .NET solutions since 1.0, I'm a bit biased. That said, based on my research, Python has historically been easier to learn for non-programmers and is just over 10 years older than .NET. That low barrier to entry for non-programmers has been Python's advantage.

In recent years, Microsoft has made a dramatic shift towards embracing open source. As part of this shift, they introduced .NET Core and then open-sourced all of .NET. More recently, .NET has introduced features like top-level statements and minimal APIs to take the ceremony and learning curve out of C# and make it much more accessible to newcomers and non-programmers.

Given this information, if you are already a skilled C# programmer, you do not need to learn Python to be equally effective as someone using Python to complete data analysis tasks. You can quickly build the same functionality in C# that will likely perform drastically faster.

To validate this assertion, let's walk through a simple, somewhat mundane, yet routine data analysis task. Let's say you are given a CSV file with data and you need to filter or transform it.

In Python, you would use the pandas data analysis library and its DataFrames to perform this task. In the .NET world, an equivalent is the Microsoft.Data.Analysis library.
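
As a rough illustration of what a DataFrame looks like in C#, here is a minimal sketch that builds one in memory, much as you might with pandas. It assumes the Microsoft.Data.Analysis 0.21.1 package used in the script later in this article. The variable names are invented for the example, and the values are the first three rows of the sample data shown further down.

```csharp
#r "nuget: Microsoft.Data.Analysis, 0.21.1"

using System;
using Microsoft.Data.Analysis;

// Each column is a named, typed sequence of values
var ids = new PrimitiveDataFrameColumn<int>("Id", new[] { 1, 2, 3 });
var currentPrices = new PrimitiveDataFrameColumn<int>("CurrentPrice", new[] { 350235, 175939, 592141 });

// A DataFrame is a set of equal-length columns (names here are illustrative)
var sketchFrame = new DataFrame(ids, currentPrices);

// Print the table, then its shape
Console.WriteLine(sketchFrame);
Console.WriteLine($"Rows: {sketchFrame.Rows.Count}, Columns: {sketchFrame.Columns.Count}");
```

In practice you will rarely build columns by hand like this; loading from a file, as in the example below, is the more common starting point.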

Why use DataFrames in C#?

  • I have a world of knowledge in C#
  • I know C# and I have a data analysis task that I need to complete quickly
  • My team primarily has C# knowledge and is not strong with Python
  • I want to use ML.NET

Simple Example

In our example, we are given a CSV file with housing prices. We need to filter this list for current prices less than 250,000 and output the result to another CSV file for a downstream task.

We are going to leverage dotnet-script and create a file called dataframe.csx. If you'd like to learn more about dotnet-script, check out my article on scripting with C#.

Note: you could just as easily do this in a console application using top-level statements, or in VS Code with the Polyglot Notebooks extension.

In your dataframe.csx file add the following code.

#r "nuget: Microsoft.Data.Analysis, 0.21.1"

using System.IO;
using System.Linq;
using Microsoft.Data.Analysis;

// Define data path
var dataPath = Path.GetFullPath(@"home-sale-prices.csv");

// Load the data into the data frame
var dataFrame = DataFrame.LoadCsv(dataPath);

// output a description of the data loaded
Console.WriteLine(dataFrame.Description());

// Filter for current prices under 250,000
PrimitiveDataFrameColumn<bool> boolFilter = dataFrame["CurrentPrice"].ElementwiseLessThan(250000);
DataFrame filteredDataFrame = dataFrame.Filter(boolFilter);

Console.WriteLine(filteredDataFrame.Description());

// Save the filtered output to a csv file
DataFrame.SaveCsv(filteredDataFrame, "result.csv", ',');

To create the example data, make a file named home-sale-prices.csv and paste the following data into it.

Id,Size,HistoricalPrice,CurrentPrice
1,4174,302283,350235
2,4507,296769,175939
3,1860,137065,592141
4,2294,323165,586157
5,2130,199299,302906
6,2095,111534,168047
7,4772,140397,438249
8,4092,357750,225766
9,2638,453531,558923
10,3169,363160,565192
11,1466,155591,194262
12,2238,320884,537261
13,1330,123247,435920
14,2482,124300,311152
15,3135,182798,278376
16,4444,109268,287848
17,4171,448951,242787
18,3919,374329,277948
19,4735,294776,205016
20,1130,317851,552690

Then execute your script with dotnet script dataframe.csx in a console window.

You will see a summary of the original DataFrame and the filtered DataFrame in the console window, and a result.csv file will be created with the filtered results.

C:>dotnet script dataframe.csx
Description     Id              Size            HistoricalPrice CurrentPrice
Length (excluding null values)20              20              20              20
Max             20              4772            453531          592141
Min             1               1130            109268          168047
Mean            10.5            3039.05         256847.4        364340.75

Description     Id              Size            HistoricalPrice CurrentPrice
Length (excluding null values)6               6               6               6
Max             19              4735            448951          242787
Min             2               1466            111534          168047
Mean            10.5            3511            277561.84       201969.5

The result.csv file contains the six filtered rows:

Id,Size,HistoricalPrice,CurrentPrice
2,4507,296769,175939
6,2095,111534,168047
8,4092,357750,225766
11,1466,155591,194262
17,4171,448951,242787
19,4735,294776,205016

If you are curious how fast this is with a larger set of data, you can find a sample with 10k rows in my GitHub repo here.
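
If you would rather not download anything, one way to get a rough feel for the timing is a sketch like the following. It generates 10,000 synthetic rows in the same shape as the example file (a hypothetical stand-in, not the sample from the repo), then times the load and filter with Stopwatch. The file name, random seed, and value ranges are assumptions chosen for illustration, and the elapsed time will vary by machine.

```csharp
#r "nuget: Microsoft.Data.Analysis, 0.21.1"

using System;
using System.Diagnostics;
using System.IO;
using System.Linq;
using Microsoft.Data.Analysis;

// Assumption: synthetic data with a fixed seed, standing in for a real 10k-row file
var random = new Random(42);
var lines = new[] { "Id,Size,HistoricalPrice,CurrentPrice" }
    .Concat(Enumerable.Range(1, 10_000).Select(id =>
        $"{id},{random.Next(1000, 5000)},{random.Next(100_000, 500_000)},{random.Next(150_000, 600_000)}"));

// Hypothetical file name, written to the temp folder
var largePath = Path.Combine(Path.GetTempPath(), "home-sale-prices-10k.csv");
File.WriteAllLines(largePath, lines);

// Time the same load-and-filter steps used in the example script
var stopwatch = Stopwatch.StartNew();
var largeFrame = DataFrame.LoadCsv(largePath);
var under250k = largeFrame.Filter(largeFrame["CurrentPrice"].ElementwiseLessThan(250000));
stopwatch.Stop();

Console.WriteLine($"Loaded {largeFrame.Rows.Count} rows, kept {under250k.Rows.Count}, in {stopwatch.ElapsedMilliseconds} ms");
```

A single Stopwatch run is only a ballpark figure; the first run also includes JIT warm-up, so repeat the measurement a few times before drawing conclusions.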

While the filtering script is a very simple example, it shows how you can do this task in essentially four lines of code once you take out the Console.WriteLine calls that I put in for demonstration. Arguably, you could do the same task with Excel, but since we used dotnet-script with DataFrames, it is now easily repeatable.
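
The same approach extends from filtering to transforming. As a minimal sketch, assuming the same package version, the following adds a computed PriceChange column (a name invented for this example) by subtracting HistoricalPrice from CurrentPrice. The first three rows of the sample data are inlined into a temp file so the script runs on its own; the file names are also hypothetical.

```csharp
#r "nuget: Microsoft.Data.Analysis, 0.21.1"

using System;
using System.IO;
using Microsoft.Data.Analysis;

// First three rows of the sample data, inlined so the sketch is self-contained
var samplePath = Path.Combine(Path.GetTempPath(), "home-sale-prices-sample.csv");
File.WriteAllLines(samplePath, new[]
{
    "Id,Size,HistoricalPrice,CurrentPrice",
    "1,4174,302283,350235",
    "2,4507,296769,175939",
    "3,1860,137065,592141"
});

var homes = DataFrame.LoadCsv(samplePath);

// Element-wise subtraction returns a new column; rename it before adding it,
// because column names in a DataFrame must be unique
DataFrameColumn priceChange = homes["CurrentPrice"] - homes["HistoricalPrice"];
priceChange.SetName("PriceChange"); // hypothetical column name
homes.Columns.Add(priceChange);

Console.WriteLine(homes);

// Save the transformed data alongside the sample
var outputPath = Path.Combine(Path.GetTempPath(), "home-sale-prices-with-change.csv");
DataFrame.SaveCsv(homes, outputPath, ',');
```

To run this against the full example, point DataFrame.LoadCsv at home-sale-prices.csv instead of the inlined sample.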

Conclusion

If you already have a C# skill set and you are not looking to make a full-time career in data science, DataFrames in C# are a valuable tool to have in your toolbelt.

If you do want a full-time career in data science, then you should learn Python, if only because it won the popularity contest long ago and you are unlikely to get past the resume screen for a data science job without Python experience listed.

Note: I'm not saying that you don't need to learn languages other than C#. I firmly believe all software engineers should learn multiple languages. Learning multiple languages helps you master your craft and increases your job opportunities. Python is a good choice if you are looking to learn a new language.
