We need to act faster on Stokenet than the general expectation might suggest, because standing it up technically takes a significant amount of resources and time.
Stokenet, at a minimum, requires the following:
4 nodes, running as validators that hold the majority of stake
A functioning Gateway: at least 1 DataAggregator, 1 database, and 1 front-end to serve the API
A functioning dev console
A functioning explorer
A functioning faucet (or a working process so people can get tokens to work with in a reasonable amount of time)
Additionally, we should have as many additional community-run nodes and validators as possible.
So, bottom line, who’s up to run any of these or all of these?
And if you are, how much do you need to cover, cost-wise?
We need clear, specific cost maps and conditions.
We also need a clear will to do it, so we can ask the FND for technical support (like making a DB snapshot available so syncing doesn't take a week) and, via RAC, some funding to sustain it.
Pls reply if you’re up for this, give us details, don’t be shy.
We need to build a team to run this.
I've just got to the point of being able to host a mainnet node, and today worked through the docs to get it syncing. If it's more beneficial, though, I'm happy to host a Stokenet node instead. It was on my to-do list anyway back when Radix was actually something, but now I'd be more than happy to do anything that benefits the wider community moving forward. I have limited funds and hardware resources, so I'll hang fire, as this seems like it could be more beneficial to Radix as a whole right now.
If your mainnet node will not be supporting a specific validator, then yes, your time and resources could possibly be of more use supporting Stokenet.
But only if you feel you won't be able to gather enough stake to be in the validator set.
Some existing validator owners might be looking for node-runners that can take the mantle and run the nodes for them, so you should explore that, I believe.
Duly noted on your willingness, though; you will be called up for this if the team is lacking and you don't get into the mainnet set.
Hi.
I was one of the people who sent in a Google Doc answer, offering to take a look at how I can assist. Stokenet was, and still is, one of the parts I can help with. But there has been absolute silence from anyone. I know others are also awaiting feedback on their applications.
I am currently testing my HW setup, but I still need a week or so before I can give a firm commitment. Shambu provided a fresh snapshot this morning; it is currently 300 GB in size. I am pretty sure he can host that snapshot for a couple of weeks until we are more ready.
I am aiming to run the 2 validator nodes and the 2 full nodes on my infrastructure. I will have a basic backup service and provide a bucket where people can download a fresh node snapshot.
The infra cost recovery for these 4 servers is around 450-500 USD per month.
Yes, minimum 4 nodes: 2 validators and 2 full nodes. And as many community validators as possible.
The Gateway is quite technically complex, and the Postgres database is a critical component that should perhaps be served as a fully managed service. The database alone can run 400-800 USD per month.
The dApps seem to be the same across mainnet and Stokenet (I think) and are all hosted on a monorepo Kubernetes cluster. I have not seen anyone interested in taking over that cluster. The best option would be for someone to take over the full cluster from an operations PoV and remove those dApps we agree should be removed. Another question about these apps is who is going to maintain them from a functional aspect.
I have been looking to run a gateway API for Astrolescent and DefiPlaza, and I assume a Stokenet gateway would have similar availability and performance needs. I believe it's possible to run the whole Gateway on a modern bare-metal server, which would bring the cost down to around 150-200 USD/month (hosting only). That would mean a single instance with no fallback option, but the question remains how critical Stokenet is versus the operating cost involved.
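Putting those two monthly figures side by side, here is a quick back-of-envelope annualization. The monthly ranges are the ones quoted in this thread; the code itself is just illustrative arithmetic:

```python
# Rough yearly cost comparison using the monthly figures quoted in this thread.
# The ranges come from the posts above; everything else is illustrative.
managed_db_monthly = (400, 800)   # managed Aurora-style Postgres, USD/month
bare_metal_monthly = (150, 200)   # single bare-metal Gateway box, USD/month

def yearly(monthly_range):
    """Annualize a (low, high) monthly cost range."""
    lo, hi = monthly_range
    return lo * 12, hi * 12

print(yearly(managed_db_monthly))  # (4800, 9600)
print(yearly(bare_metal_monthly))  # (1800, 2400)
```

Even at the top of the bare-metal range, the hosting bill is roughly a quarter of the managed-database option, which is exactly the trade-off being weighed here.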
Would you be OK having admin access and helping to handle it even if the underlying servers aren't yours?
Do you have practical experience suited to administering that part of the dApps?
We do not need to put much importance on maintaining it; this will mostly be a life-support action to keep it running as-is, without the FND's involvement, until the DAO has a proper way to manage and develop it.
Here are the details on the specs of the current Stokenet that is run by the Foundation.
Stokenet Nodes
There are 4 validators and 2 full nodes spread across 4 AWS regions. The network can be run with a minimum of three validators — running with only two is not advisable. Although there may be community-run nodes on goodwill, they cannot be relied upon completely.
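As a rough sanity check on the "minimum of three validators" figure, here is a sketch assuming the standard BFT liveness threshold (strictly more than 2/3 of total stake must be online) and equal stake per validator. This is an illustration, not the exact Radix consensus rule:

```python
import math

def min_live_validators(n: int) -> int:
    """Minimum validators that must stay live for liveness, assuming a
    standard BFT threshold (> 2/3 of stake) and equal stake per validator."""
    return math.floor(2 * n / 3) + 1

# With 4 equal-stake validators, 3 must stay live, so one can go down.
print(min_live_validators(4))  # 3
```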
Compute
| Region         | Node       | Type      | vCPU | RAM   |
|----------------|------------|-----------|------|-------|
| us-east-1      | Validator  | r7i.large | 2    | 16 GB |
| eu-west-1      | Validator  | r7i.large | 2    | 16 GB |
| eu-west-1      | Fullnode 0 | r8g.large | 2    | 16 GB |
| eu-west-1      | Fullnode 1 | r7i.large | 2    | 16 GB |
| ap-south-1     | Validator  | r8g.large | 2    | 16 GB |
| ap-southeast-2 | Validator  | r8g.large | 2    | 16 GB |
Storage
Each node currently uses around 460 GB, and the setup uses logical volumes to add disks when required.
| Resource                | Size   | Count | Total    |
|-------------------------|--------|-------|----------|
| Data volume (primary)   | 300 GB | 6     | 1,800 GB |
| Data volume (secondary) | 100 GB | 6     | 600 GB   |
| Root volume             | 60 GB  | 6     | 360 GB   |
| Total                   |        |       | 2,760 GB |
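As a sanity check on the table above, the per-node footprint and the grand total can be recomputed from the volume sizes:

```python
# Recompute the Stokenet storage totals from the per-node volume sizes above.
volumes_gb = {
    "data_primary": 300,    # primary data volume per node
    "data_secondary": 100,  # secondary data volume per node
    "root": 60,             # root volume per node
}
node_count = 6  # 4 validators + 2 full nodes

per_node_gb = sum(volumes_gb.values())
total_gb = per_node_gb * node_count

print(per_node_gb)  # 460, matching the ~460 GB per node figure
print(total_gb)     # 2760, matching the 2,760 GB grand total
```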
Stokenet Gateway
The Gateway is run on a Kubernetes cluster. Under current load it is running on 3 pods.
Gateway API
| Spec               | Value         |
|--------------------|---------------|
| CPU Request        | 475m per pod  |
| Memory Request     | 750Mi per pod |
| Memory Limit       | 750Mi per pod |
| Min Replicas (HPA) | 2             |
| Max Replicas (HPA) | 100           |
| HPA Target CPU     | 70%           |
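For context on how those HPA settings interact: Kubernetes computes desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), clamped to the min/max. A small sketch using the values from the table (the load figures are made up):

```python
import math

def hpa_desired_replicas(current_replicas: int, current_cpu_pct: float,
                         target_cpu_pct: float = 70.0,
                         min_replicas: int = 2, max_replicas: int = 100) -> int:
    """Standard Kubernetes HPA scaling rule:
    desired = ceil(current * currentMetric / targetMetric), clamped."""
    desired = math.ceil(current_replicas * current_cpu_pct / target_cpu_pct)
    return max(min_replicas, min(max_replicas, desired))

# 3 pods averaging 95% CPU against the 70% target scale out to 5 pods.
print(hpa_desired_replicas(3, 95.0))  # 5
# Light load scales back in, but never below the configured minimum of 2.
print(hpa_desired_replicas(3, 20.0))  # 2
```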
Data Aggregator
| Spec           | Value        |
|----------------|--------------|
| CPU Request    | 400m per pod |
| Memory Request | 2Gi per pod  |
| Memory Limit   | 3Gi per pod  |
| Replicas       | 1            |
Database (AWS Aurora PostgreSQL)
| Spec                     | Value                              |
|--------------------------|------------------------------------|
| Engine                   | Aurora PostgreSQL 15.12            |
| Instance Type            | db.r6g.large                       |
| vCPU per Instance        | 2                                  |
| RAM per Instance         | 16 GB                              |
| Number of Instances      | 2 (writer + reader)                |
| Read Replica Autoscaling | 1-10 replicas at 70% CPU           |
| Backup Retention         | 15 days                            |
| Availability Zones       | eu-west-2a, eu-west-2b, eu-west-2c |
Database volume size on Aurora is 1.1 TB. Aurora uses shared storage across multiple RDS instances; in any other setup, the replicas use their own storage.
Monitoring & Log Aggregation
In the Foundation’s case, monitoring and log aggregation are shared infrastructure running within the same Kubernetes cluster.
Metrics: We run Mimir (the open-source backend that powers Grafana Cloud) for metrics collection and long-term storage.
Running these self-hosted within the cluster keeps costs manageable as part of shared infrastructure. However, if one opts for a managed provider like Grafana Cloud, costs can rise significantly depending on metrics cardinality, log volume, and retention requirements. These services are generally free for small request load. How much you invest in monitoring ultimately depends on how resilient the infrastructure needs to be and the level of observability you require.
The Stokenet Gateway is run by the Foundation in a similar setup to mainnet, but the community can choose to run it differently and validate the setup. Running the Gateway on bare metal is technically possible, however the downside is managing traffic, rate limiting, autoscaling on increase of load for both pods, RDS, and disk/storage management — which can get quite cumbersome.
The storage allocation is relatively large considering that Stokenet’s ledger was wiped at the start of Babylon. Resetting the ledger and starting fresh is an option that would reduce storage requirements, however this could disrupt community Dapp developers who would need to redeploy their components and recreate ledger entities on the new ledger. So something to consider down the line.
My profile is not in hosting; I am more of an architect and developer. If the DAO ends up providing infra, I believe there are others who can do that without spending as much effort as I would need to.
I was thinking of providing my setup more like a service, where the risk and reward depend on how well I provide it.
Let me read through the new details that Shambu provided. That was a really good and detailed list.
Shambu.
Why didn’t you guys also include the Stokenet Gateway into your P1 offer? Imo it should be a perfect fit to host it alongside the mainnet infra. Maybe you can include a “downscaled” Stokenet Gateway based on business hours support into your P1 offer and show how these synergies play out?
Besides the added cost (since it requires at least two nodes and an unauthenticated gateway) and the operational overhead of managing Stokenet, having it run by groups other than the Foundation DevOps team would give others the opportunity to gain hands-on experience with the setup involving high load.
Stokenet also receives a considerable amount of load, which I believe mostly comes from dApps, as relatively few users are likely using the Stokenet wallet. In my personal opinion, this makes it a good learning opportunity for groups other than us to operate it and test configurations different from ours, for example running on bare-metal servers.
It could also serve as a platform to experiment with re-architecting the gateway, not just in terms of how it is operated, but also how it is developed, given there are concerns with the current architecture.
Stokenet is listed as P3 because it is a test network and, strictly speaking, doesn't need to exist. I understand that its absence would be inconvenient for dApp developers, but choosing not to run Stokenet is still a viable option for the community if you look at it from a different angle.
Hence we stuck with just P1, mainly the Gateway, assuming other P1s are run by a different group or individual.
As usual, it's not considered a good fit to run it more simply, on bare metal or a bare-metal hypervisor + VMs.
Can you give us your take on what the specs would be for the Gateway part if we were to reduce it to a single DB (no read replicas; we can't go spending on performance right now), drop monitoring (if it breaks, those using it must report it and we'll see about fixing it then; purely reactive), and run it in a box (or two), with no K8s and no managed DB?
The nodes/validators part is easy enough, as it's the same as mainnet with just a different network ID; we have plenty of experience with it.
We do need a recent DB snapshot, though, for a speedy sync. Can you provide that for us as before?
It sure makes sense. And that learning aspect is also why I am interested in the Stokenet parts.
Does the Gateway need to be located in the same datacenter as the full nodes? The reason I'm asking is to predict the egress cost on the Gateway if the GTW cluster and the nodes are located in different places.
My value-add is that I can serve the 3 validator nodes and 2 full nodes, together with a monitoring and backup server, all on two large virtualized servers. But I am not competent to run the k8s cluster and operate the database.
The DataAggregator needs direct access to the nodes' API port and a good-bandwidth, low-latency connection, so although co-location is technically not required, it's the best viable option for performance and security reasons, if I am not mistaken on anything.
I'm also an infrastructure provider, so we're in the same boat regarding DBs and K8s and the rest.
Btw, why do you all keep referring to Validators and Full Nodes?
It's confusing; there's only one version of the node software. They are all the same thing (what you're calling full nodes); there isn't a "light" node or any other option.
A Validator is just a node configured to support the validator function: a simple entry in the config file that needs to match the ledger config done by the validator badge owner (node public key).
@shambu.xrd why are you running two extra nodes then, if they’re not validating?
Btw Daffy, I mostly work with OVH, what provider are you usually using?
Or are you by any chance a provider yourself?
My base costs for running decent infra for what's described seem to be lower than what you describe, but I'm also not giving two shits about monitoring on Stokenet, and I'm assuming no redundancy and no high performance for the Gateway.
I guess the "full nodes" are just there as the data integrity layer, which can be used for other activities without risking interference with the validation process. And it's easier to run backups and archiving on these nodes, as you can stop them without any consequence.
I have used different providers, but my main interest is developing apps, not running infrastructure. I use some local providers and also AWS. And now I have come across 3 large Dell servers that I aim to set up at a colocation datacenter. When I observed absolutely no interest in hosting anything, I thought I could volunteer in an area I can manage without interfering with my main interest. I know OVH is cheap, but I also know that a final solution ends up more costly than it seems at first glance.
And to be blunt about it: I am not willing to run this infrastructure out of my own pocket; then I would rather spend my time developing apps. And if I am going to run it, I do need a small contingency on top to handle those extras. 100 USD per node, incl. 500 GB storage per node, is not that bad, I think. And payment in XRD is not easily converted into fiat either.
Maybe this is easier in your country idk.
But if the Gateway needs to be in the same perimeter as the nodes, I am not an option anyway. I can only assist with the nodes, unfortunately.