Shikhil Saxena

Oct 10, 2025 • 1 min read

HOW I ACCIDENTALLY CREATED THE FASTEST CSV PARSER EVER MADE

🧠 Key Highlights

  • Origin Story: It started as a personal challenge to apply branchless programming to a real-world problem—CSV parsing, notorious for its unpredictable branching due to commas, quotes, and newlines.

  • Naive vs Optimized: The initial scalar parser was clean but slow, plagued by branch mispredictions and cache misses. Sanix then pivoted to AVX-512 intrinsics, which operate on 64 bytes (512 bits) at a time.

  • SIMD Magic:

    • Uses AVX-512 to compare characters in bulk.

    • Generates bitmasks to identify delimiters without branching.

    • Applies bit manipulation (_tzcnt_u64, _mm_popcnt_u64) to extract delimiter positions efficiently.

  • Memory Mastery:

    • Employs memory-mapped files (mmap) for zero-copy access.

    • Advises the kernel to use huge pages for fewer TLB misses.

    • Aggressively prefetches data to keep caches fed.

  • Node.js Integration: Wrapped the C parser using N-API, enabling blazing-fast CSV parsing in JavaScript with minimal overhead.

  • Benchmark Results:

    • CLI: Beats tools like csvkit, rust-csv, and miller by wide margins.

    • Node.js: Outperforms papaparse and csv-parse by 2–4x in throughput.

  • Real-World Impact: cisv, the resulting parser, can process 1 TB of CSV data in ~10 minutes (roughly 1.7 GB/s sustained), compared to hours with traditional parsers.

  • Caveats:

    • AVX-512 can throttle CPU frequency on some processors and consumes more power.

    • Portability is limited to x86-64 CPUs with AVX-512 support; ARM would require a NEON rewrite.

  • Philosophy: Optimization isn’t evil—it’s essential when you're in the “critical 3%” where performance truly matters.
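
The SIMD Magic and Memory Mastery highlights above can be sketched together in a few dozen lines of C. This is a minimal, portable illustration and not the cisv source: the 64-byte AVX-512 compare is swapped for a scalar loop that builds the same kind of 64-bit mask, so it compiles on machines without AVX-512. The names delim_mask and find_delims_mmap, the 4096-byte prefetch distance, and the output-array interface are all assumptions made for the example. It relies on the GCC/Clang builtins __builtin_ctzll and __builtin_prefetch and on POSIX mmap/madvise.

```c
#define _DEFAULT_SOURCE /* exposes madvise() under strict -std modes on glibc */
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Bit i of the result is set when block[i] == delim (len <= 64).
 * Scalar stand-in for a single AVX-512 byte compare. */
static uint64_t delim_mask(const char *block, size_t len, char delim) {
    assert(len <= 64);
    uint64_t mask = 0;
    for (size_t i = 0; i < len; i++)
        mask |= (uint64_t)(block[i] == delim) << i; /* 0 or 1, no if */
    return mask;
}

/* Hypothetical helper, not cisv's API: maps `path`, stores up to `cap`
 * delimiter offsets in `out`, and returns the total found (-1 on error). */
static long find_delims_mmap(const char *path, char delim,
                             size_t *out, size_t cap) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;
    struct stat st;
    if (fstat(fd, &st) != 0) { close(fd); return -1; }
    size_t len = (size_t)st.st_size;
    if (len == 0) { close(fd); return 0; } /* mmap rejects length 0 */

    const char *data = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping outlives the descriptor */
    if (data == MAP_FAILED) return -1;

    /* Hints only; results are ignored because the kernel may decline. */
    madvise((void *)data, len, MADV_SEQUENTIAL);
#ifdef MADV_HUGEPAGE
    madvise((void *)data, len, MADV_HUGEPAGE); /* Linux-specific */
#endif

    long total = 0;
    for (size_t off = 0; off < len; off += 64) {
        size_t n = len - off < 64 ? len - off : 64;
        if (off + 4096 < len) /* prefetch distance is an arbitrary guess */
            __builtin_prefetch(data + off + 4096, 0, 0);

        uint64_t mask = delim_mask(data + off, n, delim);
        while (mask != 0) { /* one iteration per delimiter, not per byte */
            size_t pos = off + (size_t)__builtin_ctzll(mask);
            if ((size_t)total < cap) out[total] = pos;
            total++;
            mask &= mask - 1; /* clear the lowest set bit */
        }
    }
    munmap((void *)data, len);
    return total;
}
```

Calling find_delims_mmap("data.csv", ',', positions, capacity) would fill positions with the byte offsets of each comma. The sketch ignores quoting entirely; a real CSV parser would presumably build a second mask for quote characters and combine the two before extracting positions.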
