I’m all good with the PostgreSQL choice … but I still don’t get the single-writer issue. Is the DataAggregator code doing the ordering and other processing? Or is it in fact some stored procedure doing it on the DB side, post-processing the ingested data?
There are native solutions in PostgreSQL for multi-writer setups, collision handling, and duplicate detection.
I’m not a DBA or a DB expert at all, so I wonder why you decided to avoid it and thus forfeit the ability to have the DB fed from multiple aggregators.
DataAggregator itself does not perform any ordering, but it does much more than simply reading raw transaction data and storing it in the database. If its only responsibility were to store raw transaction data and expose it via an API, it would not be required at all, as a node could handle that directly.
The data present in a transaction is not always sufficient to populate the gateway’s custom read model used to serve Gateway API requests. For this reason, transactions must be processed sequentially.
When DataAggregator processes transactions sequentially, you have a guarantee that once a transaction with state version X is processed, requests for that transaction can be served immediately, because all required data is already available. If transaction processing were not sequential, additional and complex logic would be required to determine the latest state version that can be safely served (i.e., the highest state version from a consecutive sequence starting at state version 1).
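To make the trade-off concrete, here is a minimal sketch (hypothetical function name, not actual DataAggregator code) of the extra bookkeeping non-sequential processing would need: finding the head of the consecutive prefix of processed state versions. With sequential processing, this collapses to a single monotonically increasing counter.

```python
def highest_servable_version(processed: set[int]) -> int:
    """Return the largest V such that every state version 1..V has been
    processed, i.e. the head of the consecutive prefix starting at 1.
    Requests up to V can be served safely; anything above may depend on
    data from a not-yet-processed transaction."""
    v = 0
    while v + 1 in processed:
        v += 1
    return v

# Out-of-order ingestion: versions 4 and 6 arrived before 5.
print(highest_servable_version({1, 2, 3, 4, 6}))  # -> 4
```

Note that in a real multi-writer setup this check would have to be consistent across writers and cheap to evaluate on every request, which is part of the complexity being avoided.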
While it is probably possible to relax this requirement and rework the read models and processing logic, doing so would likely require a substantial or even complete rework of DataAggregator.
Regarding stored procedures: there is no business logic executed in the database. The database is used only to store and return data. This is one of the reasons why sequential processing is required: it avoids the need for complex and time-consuming queries, and avoids moving business logic into the database layer.
Examples from the Gateway (illustrative, not exhaustive)
To determine how much of a resource was initially minted, the transaction in which the resource was created must be tracked. Over time, the resource may be burned or minted in subsequent transactions that emit burn or mint events. Such events contain only the amount that was burned or minted. To serve requests about the current total supply of a resource, or the total amount ever burned, the system must rely on data from previous transactions.
To store accurate totals, it is necessary to know how much was initially minted, as well as how much was minted or burned in earlier transactions. This also applies when returning aggregated totals per entity (for example, key–value store entries while iterating).
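As an illustration of why delta-only mint/burn events force this dependence on earlier transactions, here is a toy read model (hypothetical names; the real Gateway schema differs) that can only stay correct if the resource-creation transaction was processed before any of its mint/burn events:

```python
from dataclasses import dataclass
from decimal import Decimal

@dataclass
class Supply:
    total_minted: Decimal
    total_burned: Decimal

    @property
    def total_supply(self) -> Decimal:
        return self.total_minted - self.total_burned

# Read-model state keyed by resource address; an entry is seeded when
# the creation transaction (carrying the initial mint) is processed.
supplies: dict[str, Supply] = {}

def process_event(resource: str, kind: str, amount: Decimal) -> None:
    # The event carries only the delta. Relies on the creation
    # transaction having been processed already, which is exactly
    # what sequential processing guarantees.
    s = supplies[resource]
    if kind == "mint":
        s.total_minted += amount
    elif kind == "burn":
        s.total_burned += amount

supplies["res_abc"] = Supply(Decimal(1000), Decimal(0))  # creation tx
process_event("res_abc", "burn", Decimal(100))
process_event("res_abc", "mint", Decimal(50))
print(supplies["res_abc"].total_supply)  # -> 950
```

If the burn event were processed before the creation transaction, there would be nothing to subtract the amount from, which is the failure mode the sequential guarantee rules out.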
You mention that anything new or improved bears the challenge of backward compatibility … but aren’t all the clients API clients? Does the Gateway provide any direct access, or a simple one-to-one mapping between DB tables and API calls?
It makes complete sense that we can’t break API compatibility - at most, introduce versioning and, if clients don’t specify a version, assume they’re on v1.
But it makes absolutely no sense that you can’t change what’s behind it!
If that were true, why even bother putting an API front-end in front of clients? I find it hard to believe, so for now I’m assuming the simplified phrasing can be expanded with some details.
We should be able to completely change the DB layer whilst maintaining full API compatibility.
Of course, it is always possible to rework the database layer while maintaining full API compatibility, but replacing what PostgreSQL already provides may be challenging, at least in my opinion. That said, feel free to evaluate the options.
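The separation being described can be sketched as a storage boundary that the API layer depends on (hypothetical interface, not the Gateway's actual abstraction): as long as a replacement backend implements the same methods, the API response shapes never change.

```python
from typing import Protocol

class TransactionStore(Protocol):
    """Storage boundary the API layer depends on; any backend
    exposing this method can sit behind it."""
    def latest_state_version(self) -> int: ...

def handle_status(store: TransactionStore) -> dict:
    # The API response shape is fixed regardless of the backing store.
    return {"ledger_state": {"state_version": store.latest_state_version()}}

class InMemoryStore:
    """Stand-in backend; a PostgreSQL-backed class with the same
    methods could replace it without touching handle_status."""
    def latest_state_version(self) -> int:
        return 42

print(handle_status(InMemoryStore()))  # -> {'ledger_state': {'state_version': 42}}
```

The practical difficulty raised above is not this wiring, but replicating what PostgreSQL already provides behind that boundary (transactions, indexing, query planning).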
Regarding compatibility and Gateway users: the Gateway was designed and maintained with multiple extensibility points in mind. Some users have forked the codebase and added features on top of what we originally developed. While backward compatibility was never formally guaranteed, we intentionally tried to avoid breaking community tools and services.