Post-Mortem: The massive lemmy.world -> lemmy.dbzer0.com federation delays.

db0@lemmy.dbzer0.com · 8 months ago

kbotc@lemmy.world · 8 months ago

Tossing everything on the same server is not great: I don’t want to pay for fast storage for my image store, but I do want fast storage for my DB. My web server should have extra CPU and network but is otherwise ephemeral. This is the same setup people have been running for years and is microservices 101.

The correct thing to do here is build in tracing and profiling hooks. With OpenTracing, for example, something like Jaeger can consume the traces and show problems, and it would have lit this up like a Christmas tree. Pyroscope can show changes over time in where CPU goes, and logs get shuffled off into Graylog or some other centralized service for correlation.
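To make the tracing idea concrete, here is a rough stdlib-only sketch of what span timing buys you. It is not the OpenTracing or Jaeger API, and the stage names (`verify_signature`, `db_insert`, `send_response`) and sleep times are made up for illustration; they are not Lemmy's actual code paths or measurements.

```python
import time
from contextlib import contextmanager


class Tracer:
    """Toy tracer: records (span name, duration in seconds) pairs."""

    def __init__(self):
        self.spans = []

    @contextmanager
    def span(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the span even if the wrapped code raises.
            self.spans.append((name, time.perf_counter() - start))

    def slowest(self):
        """Return the (name, duration) pair with the longest duration."""
        return max(self.spans, key=lambda s: s[1])


def handle_activity(tracer):
    """Hypothetical request handler; sleeps stand in for real work."""
    with tracer.span("verify_signature"):
        time.sleep(0.001)
    with tracer.span("db_insert"):
        time.sleep(0.05)  # the artificially slow step
    with tracer.span("send_response"):
        time.sleep(0.001)
```

Run `handle_activity(Tracer())` and sort the recorded spans by duration, and the slow stage sits at the top. A real tracing backend does the same thing across services and requests, with a UI on top.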

nutomic@lemmy.ml · 8 months ago

Images can be stored in S3, so that’s not an issue. Lemmy also has some tracing logs as well as Prometheus stats; not sure if db0 tried looking into those.
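For anyone who has not used Prometheus stats before: the metrics endpoint returns plain text, one sample per line, and you can compare two scrapes to see what is growing. Below is a minimal sketch of that comparison. It assumes simple `name{labels} value` lines without trailing timestamps, and any metric names you feed it in testing are your own; where the endpoint lives depends on your configuration.

```python
def parse_metrics(text):
    """Parse Prometheus text exposition lines into {series: value}.

    Assumes each sample line is 'name{labels} value' with no timestamp.
    Comment lines (# HELP, # TYPE) and blank lines are skipped.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The value is the last space-separated field on the line.
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples


def growth(before, after):
    """Return the change in value for every series present in both scrapes."""
    return {k: after[k] - before[k] for k in after if k in before}
```

Scrape twice a few minutes apart, pass both texts through `parse_metrics`, and `growth` shows which series moved. In practice you would let a Prometheus server do the scraping and graph it, but this is the underlying idea.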

db0@lemmy.dbzer0.com · 8 months ago

I don’t think I’ve seen mention of these anywhere, or how to use them.