Foundation RFP: Babylon Gateway

The Network Gateway is a publicly exposed entry point to the Babylon Radix network. It provides a higher-level API that aggregates and abstracts data from the underlying ledger, making it easier for clients to interact with the network.

The Gateway supports simplified transaction submission, tracks transaction status, and enables full historical access to ledger data, allowing users to query the state of any on-ledger object at any point in time.

The Network Gateway is primarily designed for use by wallets and blockchain explorers, as well as for lightweight queries from front-end decentralized applications (dApps). For exchange or asset integrations, back-end dApp services, or simpler use cases, it is generally recommended to use the Core API exposed by a node directly. A Gateway is typically required only when access to historical ledger snapshots or a more robust, scalable setup is needed.

The system consists of three main parts:

  • The Database Migration sets up the PostgreSQL database and applies schema migrations if necessary.

  • The Data Aggregator reads from the Core API of one or more full nodes, ingests from their Core API transaction stream endpoint, and commits transactions to a PostgreSQL database. It also handles the resubmission of submitted transactions, where relevant.

  • The Gateway API provides the public API for Wallets and Explorers. It handles read queries using the database, and proxies transaction preview and submission requests to the Core API of one or more full nodes.

This Request for Proposal (RFP) seeks submissions from community members interested in replacing the current Radix Foundation-operated Mainnet Gateway with a robust, high-performance, and resilient solution. The selected operator(s) will provide a default gateway service, ensuring optimal performance for the Radix ecosystem’s applications, wallets, and users.

You can find the full RFP here: https://drive.google.com/file/d/18m-Y5A2Te_1CJlVS8rww0HDHtbJvwg4O/view?usp=sharing

Further details on the RFPs and process can be found here: The Next Phase of Decentralization: RFPs for Gateway and Relay Services | The Radix Blog | Radix DLT

Please submit your proposal in this thread for consideration.

1 Like

The real issue is that this architecture is simply wrong for the task, Adam; that’s why it’s costly and demanding.

We’re forcing a centralized component into a decentralized network. This needs, imho, to be replaced by a wide set of infrastructure elements and proper high-grade load-balancing of them.

If done right, the load-balancing part is probably the only part that will be somewhat costly, as it’s costly and hard to make it multi-tenant - not just CF-based like the current one.

Can you please elaborate on why this architecture was chosen?

I’ll start with the “small” detail that is the biggest detriment to a large-scale deployment (extract from the shared docs) - yes, this is the one that most annoys me.

“For the Gateway to operate correctly, it is required that only a single instance of the DataAggregator service is running and writing to the database at any given time. Running multiple DataAggregator instances concurrently would result in write conflicts, data collisions, and application-level errors.”

You’re using PostgreSQL, FFS!

It properly supports read replicas for mass read operations and, more importantly, multi-version concurrency control (MVCC) for multi-writer scenarios, including conflict and collision resolution!

Why did you decide to build an ingest process that’s this limiting, and just plain daft given you selected a proper DB backend? It’s puzzling and guarantees we have a bottleneck!
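To make the multi-writer point concrete, here’s a minimal sketch of idempotent transaction ingest using an upsert. The table and column names are invented for illustration, not the Gateway’s actual schema, and the stdlib sqlite3 driver stands in for PostgreSQL (both accept the same `INSERT … ON CONFLICT DO NOTHING` syntax):

```python
import sqlite3

# Hypothetical transactions table: the payload hash is the natural key.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE ledger_transactions (
        payload_hash  TEXT PRIMARY KEY,
        state_version INTEGER NOT NULL,
        raw_payload   BLOB NOT NULL
    )
""")

def ingest(tx_hash: str, state_version: int, payload: bytes) -> None:
    # ON CONFLICT DO NOTHING makes the insert idempotent: two aggregators
    # racing on the same transaction cannot produce a duplicate row.
    conn.execute(
        "INSERT INTO ledger_transactions VALUES (?, ?, ?) "
        "ON CONFLICT (payload_hash) DO NOTHING",
        (tx_hash, state_version, payload),
    )
    conn.commit()

# Two writers ingest the same transaction; only one row lands.
ingest("txn_abc", 1, b"...")
ingest("txn_abc", 1, b"...")
```

This handles duplicate rows, not the ordering problem of building a derived view model; it is the baseline mechanism a multi-writer design would start from.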

Because instead of having multiple DataAggregators feeding in from all possible nodes in the network (damn, in Babs there are 100 of them at most) … we’re stuck with just one, which, on top of that, is not feeding from multiple nodes either! It connects to only one node! This is simply compounded bottlenecking by design! ** see Edit 2

At worst, if it were a performance issue, you could add a buffer stage prior to ingest, using Kafka or some MQ solution like RabbitMQ to handle it. These would also be highly scalable horizontally, btw.
Makes no sense. The FND spent too much money to build a crippled solution, imho.

Moreover, since this is mostly a high-volume, append-only data scenario … there are PostgreSQL extensions that provide out-of-the-box optimizations for exactly that, such as Citus and TimescaleDB, which can even be used in conjunction!

These are well-established technologies with a proven history and wide support.
For a company that wants to build a sharded network to then build a single-instance ingest method for its TX database, ignoring known sharded/partitioned solutions for large-scale, high-volume ingestion of time-series data, is just the very definition of irony!

So, in essence, imho, this should be fully reviewed and rebuilt with proper alignment between goals and tech; as it stands, we’ll be spending money to support a terrible design and bad architectural choices from the past.

Maybe you’ll get some responses to this RFP, but the Community should rather immediately invest in replacing a bad solution instead of supporting it.

Edit: corrected a couple of typos

Edit 2: with some help from proper devs, I was able to check that the DataAggregator can indeed consume from more than one node, using basic round-robin or a simple distribution scheme; it does seem, however, that no one has ever used it with more than one node actively.
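The round-robin distribution mentioned in Edit 2 amounts to something like the following sketch (the class and node addresses are hypothetical, just to show the scheme):

```python
from itertools import cycle

# Hypothetical round-robin selector over the Core API endpoints the
# aggregator is configured with (node addresses are illustrative).
class NodePool:
    def __init__(self, nodes: list[str]):
        self._ring = cycle(nodes)

    def next_node(self) -> str:
        # Each call hands back the next endpoint, wrapping around forever.
        return next(self._ring)

pool = NodePool(["node-a:3333", "node-b:3333", "node-c:3333"])
picks = [pool.next_node() for _ in range(4)]  # wraps back to node-a
```

A production selector would also skip unhealthy nodes, but the rotation itself is this simple.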

7 Likes

that’s great stuff and makes some sense to me, but from a business continuity perspective, would it be preferable during the handover just to maintain the status quo to get the handover done, or would it be better to tackle the issues you mentioned at the same time? that’s what i’m trying to get a handle on … the dao doesn’t want to bite off more than it can chew … but if it makes sense to do everything at once and we have the collective resources and expertise to do something sensible (akin to the fox rewriting hyperscale in rust) then let’s do that … but we only get one chance at this and we don’t want to drop the ball.

4 Likes

The trade-off is not clear to me, honestly.

It mostly depends, imho, on what commitment we get tied into with this non-FND-run service.

I highly doubt we have the time to change anything, I agree, so a continuity solution is warranted, I don’t dispute that.
But it’s quite different whether we’re replacing the solution with something proper or not.

The winning proposal should, imv, be tied only to how long we need it to run, not a fixed-term contract that implies we must keep paying for it even if we stop using it - pardon my french, but that would be some more of that teat-sucking from the past!

I mean, anyone responding to that RFP will thoroughly analyze the risks involved and the time required to operate it into profit, or at least break even … and that will require it to be either too expensive upfront with forfeited payments … or a long-term contract that may stretch way too far into the future :sweat_smile:

If we make it clear that any winning proposal must be written in such a way that it ceases to operate once the Community has a new solution to work with, giving it, ofc, preference in taking up a new contract for that new solution, then all good.

Otherwise, I think we’ll just be deceiving ourselves in this … and all that money, remember, means less money handed over to the community.

2 Likes

@Adam_XRD a couple more details I didn’t find in the doc/requirements (maybe they slipped past me):

the community needs to have full auditable access, via the DAO and/or RAC, to make sure the systems and resources backing the rendered service are indeed in place and according to specification.

The rendered service should be able to be mapped to, and serve, more than one internet domain, if possible, so we don’t fully depend on the radixdlt.com one - this should be easily accommodated, btw, not a change that demands much from the proponent’s side.
It might become an issue with the wallet if we want automatic failover or load-balancing, but that can be handled.

1 Like

tbf, the real issue is the chosen database for the node software. It doesn’t allow anything else to consume it besides the node software.
So all requests go through the node software to build the PostgreSQL database.
Running multiple instances of the DataAggregator isn’t really needed to increase the speed. The database probably needs to be built chronologically so all FKs/relations etc. can be properly mapped.

Ideally, the same party that takes over running the infra might also take on the mission of fixing the clunky architecture into a more reasonable one, to make the service cheaper and more decentralized, if that’s possible. So all the timings are tied up.

This seems like something that should be managed by an IT company with networking and cybersecurity skills; one or more traditional IT companies could make their own offers to manage this gateway.

1 Like

Sharing research here on “distributed SQL databases”.

We all agree/believe that we are getting to Xian - we need a solution (i.e. database, aggregator and all that) that can keep up with it. We should think ahead for sure.

gilesmorris-me and projectshift make excellent points about avoiding vendor lock-in. Who would want such a thing?

  • NO infrastructure handover should occur until the DAO and its roadmap are established. Committing to a partner the community might not have chosen is a massive decision, potentially burdening us for years.


  • Rushing this “out of the box” while it’s not needed yet, and while other critical transition topics are still in flux, is inappropriate at best, and suspicious at worst:

DAO selection, validators bandage > DAO ROADMAP for Xian / Validators / Connector (currently a single DDoS attack on its IP harms the network) / Gateway / Wallet / engine toolkit / WebRTC / RCR / etc, as a whole package transfer, right?

Centralized bottlenecks like the gateway and connector are among the important topics ONCE the community and DAO are ready to make the decision themselves. Maybe it needs to be a service proposed by validators?

For some cost savings, which seems to be the post’s initial goal, I propose, Adam, that you first check whether your wife is still on the Foundation payroll, and if so, verify the relevance of that role.

Thank you for your time

And I highly doubt it will be a relational DB; that makes little sense performance-wise.

There was a small convo between me and Foxy on hyperscale-rs about it; it seems we’re on the same page there, so I’ll sum up what’s a more likely scenario (pardon the jargon) :sweat_smile:

The relevant part starts here - Telegram: View @hyperscale_rs

projectShift, [03-Feb-26 10:36 PM]
note: I do not believe, at this point, that the future of this service can be properly executed using a relational DB engine like PostgreSQL or any other of that nature
but I lack the fundamentals on why that was the choice initially
I hope it’s not just because “we want to have T-SQL to fetch from it” :roll_eyes:
but in any case, no shortage of choices nowadays for this sort of high-capacity DBs, I venture
and some even give you old-fashioned T-SQL-compatible access anyway, should you choose to use them

flightofthefox (proven.network), [03-Feb-26 10:40 PM]
Yeah indeed.. the idea of indexing everything (richly) into one relational DB eventually goes the way of the dodo

I think probably what makes more sense is an intermediary layer designed for bottomless storage that just indexes near-raw data from each shard. And then dApps drink from that firehose to ETL use-case specific formats into whatever their chosen back-end is (whether that’s postgres, or anything else)

We have several of those, mostly due to RDXW choices, but as foxy pointed out, others have them as well, when it comes to wallets at least - it’s the nature of the beast, I guess.

Doesn’t mean we shouldn’t do better, ofc.

I do agree that FND, should it get viable proposals for this or the other ongoing RFP, should defer to the community for acceptance!

Even without any lock-in or other limitations, we’re talking about commitments that won’t be community choices but will become community obligations upon the handover.

I’ll re-paste/rephrase my comment on main TG, for clarity:

we just need two things:

a) clear awareness of the issue, both the community at large and any would-be proponents to this RFP

b) FND’s willingness to work a contract that provisions the best damage-control and future-readiness for it

Slightly confused by this post.

The details clearly state that proposals will be voted on by the community and are being publicly submitted.

Likewise, we’re not rushing this. We’re kicking off this RFP process so there is time to get these proposals in, for the community to debate them, and the best option going forward to be selected.

On your point on vendor lock-in, this is something I recommend that the community consider in proposals. Proposals may offer different contract terms, and there may be benefits to having shorter term engagements to give flexibility, although that also means it will need to be renegotiated sooner as well.

There are realistically only three outcomes here:

  1. The foundation continues running these services, which delays the handover to the community.
  2. The community agrees on a proposal for new provider(s) that best fits the goals (e.g. more options rather than a single provider, a factored-in agreement to develop the system further, flexible terms, etc.)
  3. We do nothing, and at some point in the future the foundation will turn off the gateway service. If no replacement or alternative is in place, users will have to source their own solution to interact with the network.

From the current foundation perspective, delaying proposals and actions on this would just increase the timeline to transition to the community, or risk functionality of the wallet and user interaction with the network. That would be irresponsible.

I’d like to clarify a few points, as I think there may be some misunderstandings about the current design and its constraints.

First, the system is decentralized by design. Nothing prevents anyone from running their own gateway services if they choose to do so.

Regarding the DataAggregator, we intentionally run a single instance. This is because it must interpret data in the context of previous transactions - as Jonas already mentioned, it needs to be built chronologically. We are aggregating data and producing a custom view model, which requires a consistent, ordered processing flow.
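The chronological constraint can be illustrated with a toy example (entirely my own; the Gateway’s actual view models are richer): a derived “current owner” view where each row depends on the state produced by the previous transaction, which is why a consistent, ordered, single-writer flow is required.

```python
# Toy view model: "current owner" of an NFT. Each transaction is a
# transfer; the derived view at state version N depends on N-1, so the
# aggregator must process the stream strictly in order.
def build_view(transfers: list[dict]) -> dict[str, str]:
    owner: dict[str, str] = {}
    last_version = 0
    for tx in transfers:
        # Enforce chronological ingest: refuse gaps or reordering.
        if tx["state_version"] != last_version + 1:
            raise ValueError("out-of-order or gapped transaction stream")
        owner[tx["nft"]] = tx["to"]
        last_version = tx["state_version"]
    return owner

stream = [
    {"state_version": 1, "nft": "nft#1", "to": "alice"},
    {"state_version": 2, "nft": "nft#1", "to": "bob"},
]
view = build_view(stream)  # {"nft#1": "bob"}
```

Two concurrent writers applying such transfers without coordination could interleave versions and derive a wrong “current” view, which is the write-conflict problem the quoted docs describe.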

It’s also worth noting that the gateway does not rely on a single node. We read from multiple nodes to provide redundancy and to mitigate temporary performance issues or outages on individual nodes.

From a performance perspective, we are not currently facing TPS limitations in the gateway, and this has never been a bottleneck. The gateway has consistently been able to keep up with the network TPS.

We chose PostgreSQL due to strong requirements around atomic updates - all data related to a single transaction must be committed together to avoid inconsistencies across different areas. PostgreSQL also enabled full historical browsing capability, which was an explicit and important requirement, and PostgreSQL’s lateral join feature allowed us to implement that easily.

Adopting Xian would require building a completely new gateway with a fundamentally different architecture. The current setup is not something that can be reused, as it was designed specifically with the Babylon node in mind, just as we previously had a different gateway for Olympia due to its distinct requirements.

While it’s always possible to design something new or improved, the key challenge here is backward compatibility with existing clients and applications that depend on the current system.

Finally, the setup described in the RFP is meant purely as an example of what we are running today, to give a general idea of the scope and requirements. It is not intended to prescribe a fixed architecture or mandate that future implementations must follow the same setup.

4 Likes

What would be the expected cost structure for the Foundation to run these services, i.e. $X for Y months, etc.?

Fundamentally, the open and transparent RFP process implies that the community needs to know that information before we blindly select any potential external party to take over these core services.

For example, if some entity proposes a bid for twice (or half) what the Foundation would consume for these services to continue then that cost, plus their additional soft benefits like their ability to upgrade, optimize and better decentralize (through a complete re-configuring of the stack) need to be weighed against the proposed cost structure.

3 Likes

Re: $X for Y months…etc

Worth mentioning: While knowing the exact costs we currently pay for these services would be ideal, it is worth pointing out that disclosing them is not entirely feasible, since it could skew any potential bidder into simply pricing just below that level rather than at their true cost structure. That’s something we want to avoid as well.

2 Likes

Thank you for the clarification.

In that case, I do have some asks and comments, if you or @Adam_XRD could possibly address them:

Can you share the exact setup you’re using? Because if the current setup, as per your writing, never faced any performance issues, then any would-be proponent needs those details; they can’t go build a solution based on examples, imho.
Exact specs need to be known - you don’t build a solution for a 99% uptime service based on a “general idea of the scope and requirements”. They need to be complete and precise, or else it’s all just up in the air and we’ll only find out when we’re trying to run on it.
For example: are you using a single-server DB or a clustered instance? Are you providing multiple read-only replicas to feed the API, so there are “no performance issues”? What’s the current sizing of each component, actual specs? Are you feeding from validator nodes or just plain nodes? What’s CF’s role in the current architecture? What specific CF services are being used, so one can understand the actual technical requirements and it doesn’t have to be CF?

Decentralized is not distributed. My argument is that this needs to be distributed, so that any and all existing gateways can be transparently used by the wallets - and that is not the case today: they require explicit knowledge from the user, plus the ability to trust and change which gateway they’re using. It’s all pretty much locked in to what RDXW built and operates.
Even if the community builds a zillion gateways … they’re still not used by the wallet.

I’m all good with the PostgreSQL choice … but I still don’t get the single-writer issue. Is the DataAggregator code doing the ordering and the other work? Or is it in fact some stored procedure doing it on the DB side, post-processing the ingested data?
There are native solutions in PostgreSQL for multi-writer scenarios, collision handling and duplicate detection.
I’m not a DBA or a DB expert at all, so I wonder why you decided to avoid them and thus forfeit the ability to have the DB fed by multiple aggregators.

You mention that anything new or improved bears the challenge of backward compatibility … but aren’t the clients all API clients? Does the gateway provide any direct access, or a simple direct mapping between DB tables and API calls?
It makes all the sense in the world that we can’t break API compatibility - at most, introduce versioning and, if clients omit it, assume they’re on v1.
But it makes absolutely no sense that you can’t change what’s behind it!
If that were true, why even bother using an API front-end for clients? I find it hard to believe, so for now I’m assuming the simplified phrasing can be expanded with some details.
We should be able to completely change the DB layer whilst maintaining full API compatibility.
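The claim that the storage layer can change behind a stable API is easy to sketch (everything here is hypothetical: the route names, backends, and response shapes are made up for illustration, not the Gateway API’s real contract):

```python
from typing import Protocol

# The public contract is the route plus the response shape; the storage
# layer behind it is an interchangeable backend.
class Backend(Protocol):
    def entity_state(self, address: str) -> dict: ...

class PostgresBackend:
    def entity_state(self, address: str) -> dict:
        return {"address": address, "engine": "postgres"}

class NewFancyBackend:  # hypothetical replacement storage layer
    def entity_state(self, address: str) -> dict:
        return {"address": address, "engine": "distributed-kv"}

def make_router(backend: Backend):
    routes = {
        # v1 keeps its original response shape forever...
        "/v1/state/entity": lambda a: backend.entity_state(a),
        # ...while v2 can evolve independently of the backend.
        "/v2/state/entity": lambda a: {**backend.entity_state(a), "v": 2},
    }
    return lambda path, addr: routes[path](addr)

old = make_router(PostgresBackend())
new = make_router(NewFancyBackend())
```

Swapping `PostgresBackend` for `NewFancyBackend` changes nothing about the v1 route a client calls, which is the whole point being argued above.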

It’s clear we’re all on the same page that this solution isn’t Xi’an compatible; at least there’s no argument there, and I’m glad you made it public - we’ll fs need to start working on that as opposed to worrying about Babs :slight_smile:

1 Like

We still need a good-enough ballpark for it, like what happens in public procurement, for example.
The highest predicted cost is X, the lowest cost is Y.
Anything outside such an interval then warrants very specific and thorough checking and evaluation - the concept of an abnormally low price, for example, is of particular interest.

@Adam_XRD can you be so helpful as to define and include such an interval in the RFP, pls?

1 Like

Yep, as you say, if we disclose current costs, it is likely to skew potential bidders - especially as the current foundation offering is the only default provider and is operated purely as a “public good” with generous rate limits, etc. I’m hopeful we will see proposals come in that look at balancing this with commercial opportunities, potentially multiple providers, etc., so the community can consider these options.

What I can say is that another reason we’re presenting these RFPs early is so we can assist in reviewing them, as they are critical services. When putting proposals up for community consultation, we will also add any opinion or observation from our side, including if their pricing is significantly above, or concerningly below, what we’re currently operating at.

3 Likes