UPDATE (4/27): Two issues affected federation between Ani.Social and Lemmy.World.
The first was an outgoing activity that got stuck in the queue. This was resolved by manually marking that activity as successful.
The second was a subsea fiber cut between Europe and Asia, which caused incoming activities from Lemmy.World to lag behind.
The original plan was to move Ani.Social closer to Lemmy.World this weekend, but with the activity queue dropping rapidly over the past two days, the move is unlikely to go ahead. We will continue to monitor the situation, however.
Again, thanks to @wjs018 for informing me about this and for keeping others up to date in this post’s comments. Thank you MrKaplan from Lemmy.World for helping us resolve the issue!
There are ongoing issues with incoming and outgoing activities between Ani.Social and Lemmy.World. Posts, comments, and votes are not in sync between the two instances.
The exact cause is to be determined. The instance may experience downtime and interruptions until further notice.
In the meantime, we suggest using an account on an instance that is federated with both Ani.Social and Lemmy.World.
Thanks to @wjs018 for informing me about this issue. Thanks also to MrKaplan from the Lemmy.World team for continuously helping us resolve this issue.
Alright, I have been poking around the Grafana dashboard and noticed that about 20k activities/hour (~5.6 per second) seems to be the limit that ani.social can process coming in from lemmy.world. Whenever the activity peaks on world go over that (generally EU afternoon/NA morning), we start to lag a bit. Then, after the peak has subsided, we catch up.
All this really seems like it is putting a pretty hard limit on how big the fediverse could actually grow without federation becoming completely impossible. I was reading up on efforts that reddthat has undertaken to improve federation from world (since they are in AUS). Their EU-based proxy seems to have worked well, but even with batching like this, federation is always going to be a lot of bandwidth and message passing between servers that just might not scale past a certain point. Anyway, I am off topic.
In any case, the lag seems like it will be coming and going with a bit of regularity, kind of like fediverse tides.
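To make the "tides" idea concrete, here is a rough sketch of how a backlog would build up and drain when incoming activity crosses a fixed processing ceiling. The 20k/hour ceiling is the figure from the dashboard mentioned above; the hourly incoming counts are made-up numbers purely for illustration, not real data from either instance, and `backlog_over_time` is a hypothetical helper name.

```python
def backlog_over_time(incoming_per_hour, capacity_per_hour):
    """Return the backlog (unprocessed activities) at the end of each hour."""
    backlog = 0
    history = []
    for incoming in incoming_per_hour:
        # Anything above capacity is carried over; spare capacity drains the backlog.
        backlog = max(0, backlog + incoming - capacity_per_hour)
        history.append(backlog)
    return history


# Hypothetical hourly activity counts around a daily peak (illustrative only).
incoming = [15_000, 18_000, 24_000, 26_000, 22_000, 17_000, 14_000, 12_000]
capacity = 20_000  # approximate ceiling observed on the dashboard

history = backlog_over_time(incoming, capacity)
print(history)
```

With these made-up numbers the backlog rises while incoming activity is above 20k/hour and falls back to zero a few hours after the peak, which matches the lag-then-catch-up pattern described above.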
The latency limit is caused by the activity queue that was introduced in v0.19.
Servers can only talk as fast as round-trip time allows, because Lemmy instances now keep track that each event actually does get federated, and in the right order.
That last point means each event only gets sent once acknowledgement of the last one is received, creating a hard limit for how many events can be communicated, depending on ping. A mere two per second with a latency of 500ms.
This serial process will obviously need to be parallelized. But that’s difficult.
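The arithmetic behind that limit can be sketched in a few lines. This is only a back-of-the-envelope model, not how Lemmy actually implements its queue: it assumes throughput is bounded purely by round-trip time, and the `in_flight` parameter is a hypothetical knob showing what sending several events before waiting for acknowledgements would buy.

```python
def max_events_per_second(round_trip_seconds, in_flight=1):
    """Upper bound on events/second when at most `in_flight` events
    may be awaiting acknowledgement at any one time."""
    if round_trip_seconds <= 0:
        raise ValueError("round-trip time must be positive")
    if in_flight < 1:
        raise ValueError("in_flight must be at least 1")
    return in_flight / round_trip_seconds


# Strictly serial sending with a 500 ms round trip, as in the comment above.
serial_rate = max_events_per_second(0.5)
serial_per_hour = serial_rate * 3600

# Hypothetical: four events in flight at once over the same link.
parallel_rate = max_events_per_second(0.5, in_flight=4)

# Going the other way: if the ~20k/hour ceiling were purely latency-bound
# (an assumption), this is the effective time per event it would imply.
implied_seconds_per_event = 3600 / 20_000

print(serial_rate, serial_per_hour, parallel_rate, implied_seconds_per_event)
```

Under this simple model the ceiling scales linearly with the number of events allowed in flight, which is why parallelizing helps. The hard part is doing so while still guaranteeing delivery and ordering.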