A couple of days ago, someone posted on /0 (the meta community for the Divisions by zero) that the incoming federation from lemmy.world (the largest Lemmy instance by an order of magnitude) was malfunctioning. Alarmed, I started digging in, since a federation problem with lemmy.world would massively affect the content my community can see. As always [...]
Tossing everything on the same server is not great: I don’t want to pay for fast storage for my image store, but I do want fast storage for my DB. My web server should have extra CPU and network but is otherwise ephemeral. This is the same setup people have been running for years and is microservices 101.
The correct thing to do here is build in tracing and profiling hooks. OpenTracing is one example: something like Jaeger can consume the traces and show problems, and it would have lit this up like a Christmas tree. Pyroscope can show changes over time in where CPU goes, and logs get shuffled off into Graylog or some other centralized service for correlation.
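To make "tracing hooks" a bit more concrete, here is a rough sketch of the idea in plain Python, not how Lemmy (which is written in Rust) actually does it. Everything in it is hypothetical: the `span` helper, the in-memory `SPANS` list standing in for a collector like Jaeger, and the step names and sleep times standing in for real work. The point is only that once each step is wrapped in a timed span, the slow one stands out immediately.

```python
import time
from contextlib import contextmanager

# Hypothetical in-memory sink. A real setup would export spans to a
# collector such as Jaeger instead of appending to a list.
SPANS = []


@contextmanager
def span(name, sink=SPANS):
    """Record how long the wrapped block takes, even if it raises."""
    start = time.perf_counter()
    try:
        yield
    finally:
        sink.append((name, time.perf_counter() - start))


def slowest(sink=SPANS):
    """Return the (name, seconds) pair with the largest duration, or None."""
    return max(sink, key=lambda s: s[1], default=None)


def handle_incoming_activity():
    # Hypothetical stand-ins for the steps of processing one federated
    # activity; the sleeps simulate work of different cost.
    with span("verify_signature"):
        time.sleep(0.001)
    with span("db_insert"):
        time.sleep(0.02)


if __name__ == "__main__":
    handle_incoming_activity()
    name, seconds = slowest()
    print(f"slowest step: {name} ({seconds * 1000:.1f} ms)")
```

A real tracing library adds the parts this skips, such as parent/child span relationships, trace IDs propagated across services, and an exporter, but the shape of the instrumentation is the same.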
Images can be stored in S3, so that’s not an issue. Lemmy also has some tracing logs as well as Prometheus stats; not sure if db0 tried looking into those.
I don’t think I’ve seen any mention of these anywhere, or of how to use them.