I’m all good with the PostgreSQL choice … but I still don’t get the single-writer issue. Is the DataAggregator code doing the ordering and other processing? Or is it in fact some stored procedure doing it on the DB side, post-processing the ingested data?
There are native solutions in PostgreSQL for multi-writer setups, collision handling, and duplicate detection.
I’m not a DBA or a DB expert at all, so I wonder why you decided to avoid it and thus forfeit the ability to have the DB fed from multiple aggregators.
DataAggregator itself does not perform any ordering, but it does much more than simply reading raw transaction data and storing it in the database. If its only responsibility were to store raw transaction data and expose it via an API, it would not be required at all, as a node could handle that directly.
The data present in a transaction is not always sufficient to populate the gateway’s custom read model used to serve Gateway API requests. For this reason, transactions must be processed sequentially.
When DataAggregator processes transactions sequentially, you have a guarantee that once a transaction with state version X is processed, requests for that transaction can be served immediately, because all required data is already available. If transaction processing were not sequential, additional and complex logic would be required to determine the latest state version that can be safely served (i.e., the highest state version from a consecutive sequence starting at state version 1).
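To make the trade-off concrete, here is a minimal sketch (hypothetical function name, not actual DataAggregator code) of the extra bookkeeping non-sequential processing would need: finding the head of the consecutive prefix of processed state versions. With sequential processing, this collapses to a single monotonically increasing counter.

```python
def highest_servable_version(processed: set[int]) -> int:
    """Return the largest V such that every state version 1..V has been
    processed, i.e. the head of the consecutive prefix starting at 1.
    Requests up to V can be served safely; anything above may depend on
    data from a not-yet-processed transaction."""
    v = 0
    while v + 1 in processed:
        v += 1
    return v

# Out-of-order ingestion: versions 4 and 6 arrived before 5.
print(highest_servable_version({1, 2, 3, 4, 6}))  # -> 4
```

Note that in a real multi-writer setup this check would have to be consistent across writers and cheap to evaluate on every request, which is part of the complexity being avoided.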
While it is probably possible to relax this requirement and rework the read models and processing logic, doing so would likely require a substantial or even complete rework of DataAggregator.
Regarding stored procedures: there is no business logic executed in the database. The database is used only to store and return data. This is one of the reasons why sequential processing is required: it avoids the need for complex and time-consuming queries, and avoids moving business logic into the database layer.
Examples from the Gateway (illustrative, not exhaustive)
To determine how much of a resource was initially minted, the transaction in which the resource was created must be tracked. Over time, the resource may be burned or minted in subsequent transactions that emit burn or mint events. Such events contain only the amount that was burned or minted. To serve requests about the current total supply of a resource, or the total amount ever burned, the system must rely on data from previous transactions.
To store accurate totals, it is necessary to know how much was initially minted, as well as how much was minted or burned in earlier transactions. This also applies when returning aggregated totals per entity (for example, key–value store entries while iterating).
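As an illustration of why delta-only mint/burn events force this dependence on earlier transactions, here is a toy read model (hypothetical names; the real Gateway schema differs) that can only stay correct if the resource-creation transaction was processed before any of its mint/burn events:

```python
from dataclasses import dataclass
from decimal import Decimal

@dataclass
class Supply:
    total_minted: Decimal
    total_burned: Decimal

    @property
    def total_supply(self) -> Decimal:
        return self.total_minted - self.total_burned

# Read-model state keyed by resource address; an entry is seeded when
# the creation transaction (carrying the initial mint) is processed.
supplies: dict[str, Supply] = {}

def process_event(resource: str, kind: str, amount: Decimal) -> None:
    # The event carries only the delta. Relies on the creation
    # transaction having been processed already, which is exactly
    # what sequential processing guarantees.
    s = supplies[resource]
    if kind == "mint":
        s.total_minted += amount
    elif kind == "burn":
        s.total_burned += amount

supplies["res_abc"] = Supply(Decimal(1000), Decimal(0))  # creation tx
process_event("res_abc", "burn", Decimal(100))
process_event("res_abc", "mint", Decimal(50))
print(supplies["res_abc"].total_supply)  # -> 950
```

If the burn event were processed before the creation transaction, there would be nothing to subtract the amount from, which is the failure mode the sequential guarantee rules out.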
You mention that anything new or improved bears the challenge of backward compatibility … but aren’t all the clients API clients? Does the Gateway provide any direct access, or a simple one-to-one mapping between DB tables and API calls?
It makes complete sense that we can’t break API compatibility - at most, introduce versioning and, if clients don’t specify a version, assume they’re on v1.
But it makes absolutely no sense that you can’t change what’s behind it!
If that were true, why even bother putting an API front-end in front of clients? I find it hard to believe, so for now I’m assuming the simplified phrasing can be expanded with some details.
We should be able to completely change the DB layer whilst maintaining full API compatibility.
Of course, it is always possible to rework the database layer while maintaining full API compatibility, but replacing what PostgreSQL already provides may be challenging, at least in my opinion. That said, feel free to evaluate the options.
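The separation being described can be sketched as a storage boundary that the API layer depends on (hypothetical interface, not the Gateway's actual abstraction): as long as a replacement backend implements the same methods, the API response shapes never change.

```python
from typing import Protocol

class TransactionStore(Protocol):
    """Storage boundary the API layer depends on; any backend
    exposing this method can sit behind it."""
    def latest_state_version(self) -> int: ...

def handle_status(store: TransactionStore) -> dict:
    # The API response shape is fixed regardless of the backing store.
    return {"ledger_state": {"state_version": store.latest_state_version()}}

class InMemoryStore:
    """Stand-in backend; a PostgreSQL-backed class with the same
    methods could replace it without touching handle_status."""
    def latest_state_version(self) -> int:
        return 42

print(handle_status(InMemoryStore()))  # -> {'ledger_state': {'state_version': 42}}
```

The practical difficulty raised above is not this wiring, but replicating what PostgreSQL already provides behind that boundary (transactions, indexing, query planning).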
Regarding compatibility and Gateway users: the Gateway was designed and maintained with multiple extensibility points in mind. Some users have forked the codebase and added features on top of what we originally developed. While backward compatibility was never formally guaranteed, we intentionally tried to avoid breaking community tools and services.