Exploring Squirrel: A Fluent Data Analysis Library for .NET

by Brad Jolicoeur
11/27/2025

In my previous article, I explored how .NET developers can leverage Microsoft.Data.Analysis (DataFrames) to perform data analysis tasks without needing to switch to Python. It was a great exercise in showing that C# is more than capable of handling data workloads.

After publishing that piece, I had an interesting interaction. [Sudipta Mukherjee](https://www.linkedin.com/in/sudipto80/), the author of the Squirrel library, reached out and suggested I take a look at his project. Intrigued by the promise of a more "Pandas-like" experience for .NET, I decided to dive in and explore what Squirrel has to offer.

After spending some time building a Proof of Concept (POC), I've found Squirrel to be a surprisingly powerful and developer-friendly tool. It offers a level of convenience and readability that, for certain tasks, makes it a compelling alternative to the official Microsoft DataFrame library.

What is Squirrel?

Squirrel (available on NuGet as TableAPI) is a data processing and analytics framework designed specifically for .NET. Its core philosophy is to provide a fluent, readable API that allows you to express data transformations as a series of business rules.

While Microsoft.Data.Analysis focuses on providing a high-performance, low-level memory structure for data (similar to NumPy arrays or Apache Arrow), Squirrel focuses on the developer experience of writing analysis scripts. It aims to be an all-in-one toolkit that handles everything from data loading and cleaning to statistical analysis and visualization.

Squirrel vs. Microsoft.Data.Analysis

The biggest difference you'll notice immediately is the API style.

Microsoft.Data.Analysis uses a more traditional, object-oriented approach. You work with DataFrame objects and perform operations that often feel like database operations or LINQ queries, but with a bit more verbosity when handling types.

Squirrel, on the other hand, uses a method-chaining style that reads almost like English.

| Feature | Squirrel (TableAPI) | Microsoft.Data.Analysis |
| --- | --- | --- |
| Primary Use Case | ETL, Data Cleaning, Reporting, Business Logic | Machine Learning Prep, High-Performance Compute |
| API Style | Fluent, chainable (e.g., .RemoveOutliers().SortBy()) | Object-oriented, imperative (e.g., df.Filter(), df.Sort()) |
| Readability | High; reads like a specification | Moderate; requires more boilerplate |
| Data Cleaning | Excellent (Built-in RemoveOutliers, Anonymize, Normalize) | Basic (Manual filtering/imputation required) |
| Type System | Hybrid (Dynamic + Strong RecordTable<T>) | Strictly Columnar (PrimitiveDataFrameColumn<T>) |
| Visualization | Built-in connectors (Google Charts, etc.) | Requires external libraries (e.g., XPlot.Plotly) |

The "Pandas for .NET" Experience

For many developers, Python's Pandas library is the gold standard for data manipulation because of its expressiveness. Squirrel brings that same "dataframe" capability to C#, but with a distinct "home court" advantage: LINQ Integration.

Instead of learning a new query syntax (like Pandas' boolean masking), you can use the C# skills you already have.

// Squirrel + LINQ = ❤️
data.SplitOn("bedroom_count")
    .Select(group => new
    {
        Bedrooms = group.Key,
        AveragePrice = Math.Round(group.Value["price"].Average(), 2)
    })
    .ToTableFromAnonList()
    .PrettyDump();

This snippet demonstrates one of Squirrel's best features: Hybrid Typing. You can start with loose, dynamic data (reading a CSV where you don't know the schema yet), project it into a strongly-typed anonymous object using LINQ, and then convert it back into a Squirrel Table for further processing.
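
To make the loose-to-typed idea concrete without leaning on any Squirrel-specific calls, here is a minimal plain-LINQ sketch. The row shape (string-keyed dictionaries) and the sample values are assumptions for illustration only; this is not how Squirrel stores data internally.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Loose rows: every value is a string, as it would be straight out of a CSV.
// (Hypothetical sample data; the column names follow the article's dataset.)
var rows = new List<Dictionary<string, string>>
{
    new() { ["bedroom_count"] = "2", ["price"] = "100000.50" },
    new() { ["bedroom_count"] = "2", ["price"] = "120000.25" },
    new() { ["bedroom_count"] = "3", ["price"] = "200000.00" },
};

// Project the loose rows into an anonymous type. From here on the compiler
// knows that Bedrooms is an int and Price is a decimal.
var typed = rows
    .Select(r => new
    {
        Bedrooms = int.Parse(r["bedroom_count"], CultureInfo.InvariantCulture),
        Price = decimal.Parse(r["price"], CultureInfo.InvariantCulture)
    })
    .ToList();

// Typed properties can now be used in ordinary LINQ without string lookups.
var largeHomes = typed.Where(x => x.Bedrooms >= 3).ToList();
var averagePrice = Math.Round(typed.Average(x => x.Price), 2);

Console.WriteLine($"Rows: {typed.Count}, homes with 3+ bedrooms: {largeHomes.Count}");
```

The point of the sketch is the boundary: string lookups happen once, in the projection, and everything after it is checked by the compiler.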

A Practical Example: Housing Price Analysis

To test Squirrel, I revisited the housing price dataset from my previous article. My goal was to load the data, clean it, and perform some aggregations.

Here is how that workflow looks in Squirrel:

using Squirrel;
using Squirrel.Cleansing;

// 1. Load the data
var data = DataAcquisition.LoadCsv("house.csv");

// 2. Clean & Transform
data.RemoveOutliers("price")
    .AddColumn("price_per_sqm", "[price]/[net_sqm]"); // String-based formula!

// 3. Analyze
data.SplitOn("bedroom_count")
    .Select(group => new
    {
        Bedrooms = group.Key,
        AveragePrice = Math.Round(group.Value["price"].Average(), 2)
    })
    .ToTableFromAnonList()
    .SortBy("AveragePrice", how: SortDirection.Descending)
    .Top(10)
    .PrettyDump(header: "Average Price by Bedroom (Top 10)");

Key Highlights

  1. SplitOn vs GroupBy: Squirrel uses SplitOn to divide the table into chunks. The projection inside .Select() allows you to easily map the key (bedroom count) and the value (the sub-table for that group) into a new shape.
  2. ToTableFromAnonList: This is a fantastic utility. You can project your data into an anonymous type in C#, and Squirrel instantly converts it back into a Table object for further processing.
  3. PrettyDump: For anyone who uses Polyglot Notebooks or dotnet-script, this method is a lifesaver. It outputs a beautifully formatted ASCII table to the console, making debugging and exploration incredibly fast.
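
The split-then-project pattern from the first highlight can be sketched in plain LINQ. This is only an analogy: it assumes SplitOn yields key/sub-table pairs, as the `group.Key` and `group.Value` usage above suggests, and the tuple rows and sample values are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample rows: (bedroom count, price).
var rows = new List<(string Bedrooms, decimal Price)>
{
    ("2", 100000m), ("2", 120000m), ("3", 200000m), ("3", 220000m), ("4", 300000m),
};

// "Split": one entry per distinct key, each holding the rows for that key.
Dictionary<string, List<(string Bedrooms, decimal Price)>> split = rows
    .GroupBy(r => r.Bedrooms)
    .ToDictionary(g => g.Key, g => g.ToList());

// "Project": map each key/value pair into a new shape, then sort and keep the top 10.
var summary = split
    .Select(group => new
    {
        Bedrooms = group.Key,
        AveragePrice = Math.Round(group.Value.Average(r => r.Price), 2)
    })
    .OrderByDescending(x => x.AveragePrice)
    .Take(10)
    .ToList();

foreach (var row in summary)
{
    Console.WriteLine($"{row.Bedrooms} bedrooms: {row.AveragePrice}");
}
```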

Beyond Basic Analysis

One area where Squirrel really surprised me was its built-in utilities for tasks that usually require external tools.

SQL Generation

In my POC, I wanted to take my analyzed data and generate a SQL script to insert it into a database. Usually, this involves writing a loop and string interpolation. Squirrel has this built-in:

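// Assumes analysisResult holds the Table produced by the aggregation above,
// and BedRoomSize is a class whose properties match that table's columns.
// Convert our analysis result to a strongly-typed helper
var recordTable = RecordTable<BedRoomSize>.FromTable(analysisResult);

// Generate SQL statements automatically (build the SQL table once and reuse it)
var sqlTable = recordTable.ToSqlTable();
var createScript = sqlTable.CreateTableScript;
var insertScript = sqlTable.RowInsertCommands;

// File is in System.IO
File.WriteAllText("analysis.sql", createScript + "\n" + string.Join("\n", insertScript));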

This feature alone makes it an excellent tool for ETL (Extract, Transform, Load) scripts where you need to scrub a CSV file and load it into SQL Server quickly.
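
For comparison, the hand-written loop-and-interpolation approach mentioned earlier might look like the sketch below. The table name `BedroomSummary`, the column types, and the sample rows are all hypothetical, and the generated text is not meant to match what Squirrel emits.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical analysis result: (bedroom label, average price).
var rows = new List<(string Bedrooms, decimal AveragePrice)>
{
    ("2", 110000.00m),
    ("3", 210000.50m),
};

// Escape single quotes so string values stay valid SQL literals.
static string Quote(string value) => "'" + value.Replace("'", "''") + "'";

const string createScript =
    "CREATE TABLE BedroomSummary (Bedrooms NVARCHAR(50), AveragePrice DECIMAL(18, 2));";

// One INSERT per row; the invariant culture keeps '.' as the decimal separator.
var insertScript = rows
    .Select(r =>
        $"INSERT INTO BedroomSummary (Bedrooms, AveragePrice) VALUES ({Quote(r.Bedrooms)}, {r.AveragePrice.ToString(CultureInfo.InvariantCulture)});")
    .ToList();

var script = createScript + "\n" + string.Join("\n", insertScript);
Console.WriteLine(script);
```

Even in this small form there are details to get right (quoting, culture-specific number formatting), which is the tedium a built-in generator saves you. Scripts like this suit trusted, one-off loads; application code should prefer parameterized commands.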

Data Cleaning Suite

Squirrel includes a suite of cleaning methods that are often tedious to implement manually. This is where it really shines as a "business logic" tool:

  • RemoveOutliers(column): Automatically detects and removes statistical outliers using algorithms such as the interquartile range (IQR).
  • Anonymize(column): Great for GDPR compliance; it can mask PII data (like emails) with a single call.
  • RemoveNonMatches(column, regex): Filters data based on patterns (great for email or phone number validation).
  • NormalizeColumn(column, strategy): Can fix casing issues or trim whitespace automatically.
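
To give a sense of what RemoveOutliers and RemoveNonMatches save you from writing, here is a plain C# sketch of both ideas. It assumes the common 1.5 × IQR fences with linearly interpolated quartiles; Squirrel's actual algorithm and defaults may differ. The helper names, the email pattern, and the sample data are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Linearly interpolated percentile over an already sorted list (p in [0, 1]).
static double Percentile(IReadOnlyList<double> sorted, double p)
{
    double position = p * (sorted.Count - 1);
    int lower = (int)Math.Floor(position);
    int upper = (int)Math.Ceiling(position);
    return sorted[lower] + (sorted[upper] - sorted[lower]) * (position - lower);
}

// Keep only values inside [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR]; the result is sorted.
static List<double> RemoveOutliersIqr(IEnumerable<double> values)
{
    var sorted = values.OrderBy(v => v).ToList();
    if (sorted.Count == 0) return sorted;

    double q1 = Percentile(sorted, 0.25);
    double q3 = Percentile(sorted, 0.75);
    double iqr = q3 - q1;
    return sorted.Where(v => v >= q1 - 1.5 * iqr && v <= q3 + 1.5 * iqr).ToList();
}

// Keep only the strings that match the given pattern.
static List<string> KeepMatches(IEnumerable<string> values, string pattern)
{
    var regex = new Regex(pattern);
    return values.Where(v => regex.IsMatch(v)).ToList();
}

var prices = new List<double> { 100, 110, 120, 130, 140, 150, 1000 };
var cleaned = RemoveOutliersIqr(prices);

var emails = new List<string> { "a@example.com", "not-an-email", "b@example.org" };
var validEmails = KeepMatches(emails, @"^[^@\s]+@[^@\s]+\.[^@\s]+$");

Console.WriteLine($"Kept {cleaned.Count} of {prices.Count} prices and {validEmails.Count} of {emails.Count} emails.");
```

With the sample data, the quartiles are 115 and 145, so the fences sit at 70 and 190 and only the 1000 entry is dropped.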

Call to Action: Let's Build the Ecosystem

The future of .NET depends on it being seen not just as a web or enterprise backend platform, but as a performant workhorse for AI and Machine Learning. For that to happen, we need more than just raw performance—we need mature, accessible tooling that makes data work a joy.

Squirrel is a fantastic example of this kind of tooling, but like all open-source projects, it thrives on community. Having a vibrant ecosystem of libraries that rival Python's is critical to illustrating the power of .NET to the broader data science world.

I encourage you to not just use Squirrel, but to participate. Star the repo, open issues, contribute documentation, or build your own POCs. Let's show that the .NET community is ready to lead in the data space.

Conclusion

If you are a .NET engineer looking to do some quick data analysis, or if you need to write a script to clean up a messy CSV file before importing it, I highly recommend checking out Squirrel.

While Microsoft.Data.Analysis is still the go-to for heavy-duty integration with ML.NET pipelines or processing millions of rows where memory layout is critical, Squirrel offers a developer experience that is simply more fun and productive for ad-hoc analysis, ETL scripts, and reporting. It bridges the gap between the raw power of .NET and the ease of use that Python developers have enjoyed for years.

Check out my exploration repo for more examples, including HTML report generation and interactive notebooks.
