Edit: it seems like my explanation turned out to be too confusing. In simple terms, my topology would look something like this:

I would have a reverse proxy hosted in front of multiple instances of git servers (let’s take 5 for now). When a client performs an action, like pulling a repo/pushing to a repo, it would go through the reverse proxy and to one of the 5 instances. The changes would then be synced from that instance to the rest, achieving a highly available architecture.

Basically, I want a highly available git server. Is this possible?


I have been reading GitHub’s blog on Spokes, their distributed system for Git. It’s a great idea except I can’t find where I can pull and self-host it from.

Any ideas on how I can run a distributed cluster of Git servers? I’d like to run it in 3+ VMs + a VPS in the cloud so if something dies I still have a git server running somewhere to pull from.

Thanks

  • solrize@lemmy.world
    link
    fedilink
    English
    arrow-up
    1
    ·
    edit-2
    2 days ago

    I see, fair enough. Replication is never instantaneous, so do you have definite bounds on how much latency you’ll accept? Do you really want independent git servers online? Most HA systems have a primary and a failover, so users only see one server. If you want to use Ceph, in practice all servers would be in the same DC. Is that ok?

    I think I’d look in one of the many git books out there to see what they say about replication schemes. This sounds like something that must have been done before.

    • marauding_gibberish142@lemmy.dbzer0.comOP
      link
      fedilink
      English
      arrow-up
      1
      ·
      2 days ago

      Well it’s a tougher question to answer when it’s an active-active config rather than a master slave config because the former would need minimum latency possible as requests are bounced all over the place. For the latter, I’ll probably set up to pull every 5 minutes, so 5 minutes of latency (assuming someone doesn’t try to push right when the master node is going down).

      I don’t think the likes of Github work on a master-slave configuration. They’re probably on the active-active side of things for performance. I’m surprised I couldn’t find anything on this from Codeberg though, you’d think they have already solved this problem and might have published something. Maybe I missed it.

      I didn’t find anything in the official git book either, which one do you recommend?