I mean, yeah, that’s the point of compression. I don’t quite get what you mean by that comment.
I really don’t think that’s a lot either. Nowadays we routinely process terabytes of data.
Oh, I know, believe me. I have some painful first-hand experience with such code.
I think portability and easy parsing are the only advantages of CSV. It’s definitely good enough (maybe even the best) for small datasets, but if you have a lot of data you need a compressed binary format, something like Parquet.
Is 600 MB a lot for pandas? Of course, CSV isn’t really optimal but I would’ve sworn pandas happily works with gigabytes of data.
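It does, especially if you stream the file instead of loading it whole. A minimal sketch of chunked reading with `pandas` (the tiny in-memory CSV and its `value` column are made up for illustration):

```python
import io

import pandas as pd

# Small stand-in for a multi-gigabyte CSV file on disk.
csv_data = io.StringIO("value\n" + "\n".join(str(i) for i in range(10)))

# chunksize streams the file in pieces instead of loading it whole,
# which keeps memory bounded even for files larger than RAM.
total = 0
for chunk in pd.read_csv(csv_data, chunksize=4):
    total += chunk["value"].sum()

print(total)  # sum of 0..9 → 45
```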
I have to mention dataclasses here, especially with frozen=True.
Seriously, use dataclasses whenever possible, they’re great.
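A quick sketch of why frozen=True is nice: instances become immutable and hashable for free (the `Point` class here is just an example, not from the article):

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class Point:
    x: float
    y: float


p = Point(1.0, 2.0)

# frozen=True generates __hash__, so instances work as dict keys / set members.
assert p == Point(1.0, 2.0)
assert p in {Point(1.0, 2.0)}

# Attribute assignment is blocked at runtime.
try:
    p.x = 3.0
except FrozenInstanceError:
    print("immutable!")
```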
EAFP - “Easier to ask for forgiveness than for permission” - for those who are (like me) unfamiliar with this… acronym?
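For context, the contrast is with LBYL (“look before you leap”). A toy sketch of both styles (the `config` dict and default port are invented):

```python
config = {"host": "localhost"}

# LBYL: check the precondition first.
if "port" in config:
    port = config["port"]
else:
    port = 8080

# EAFP: just try it and handle the failure.
try:
    port = config["port"]
except KeyError:
    port = 8080

print(port)  # → 8080, since "port" isn't in config
```

EAFP tends to be preferred in Python because the check-then-use pattern can race (the state may change between the check and the access), and exceptions are cheap when they don’t fire.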
Right, true!