Constructing a resilient crypto trading database: Data from faulty WebSockets and slow-responding REST APIs
date
May 26, 2023
tags
distributed-system
web-scraping
system-design
event-driven-architecture
summary
Building a historical trading database is hard when the crypto API endpoints that feed it, especially WebSockets, are unreliable. That unreliability makes it difficult to retrieve accurate high-frequency data such as pricing information. Our mission is to calculate a precise rolling 5-minute VWAP by ticker, correcting historical data points whenever trade errors are detected.
Faulty WebSocket and slow REST APIs
While building a historical trading database from external providers' data, I ran into a common problem: trading records from crypto API endpoints, WebSocket endpoints in particular, arrive unreliably.
This makes it hard to pull high-frequency data such as pricing information, where real-time calculations depend on accurate input. Our mission this time is to take in trade data and calculate a rolling 5-minute Volume Weighted Average Price (VWAP) by ticker, optimizing for accuracy. We do not just calculate the latest 5-minute rolling VWAP; we also correct historical time points once a data inconsistency is detected.
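For reference, the VWAP over a window is the sum of price × volume divided by the sum of volume. Below is a minimal Python sketch of a rolling 5-minute VWAP for a single ticker. The `RollingVWAP` class and the `(timestamp, price, volume)` layout are assumptions made for illustration, since the provider's trade structure is not reproduced here, and the sketch assumes trades arrive in timestamp order.

```python
from collections import deque

WINDOW_SECONDS = 5 * 60  # rolling 5-minute window


class RollingVWAP:
    """Rolling VWAP for a single ticker (hypothetical helper)."""

    def __init__(self, window_seconds=WINDOW_SECONDS):
        self.window_seconds = window_seconds
        self.trades = deque()   # (timestamp, price, volume), oldest first
        self.pv_sum = 0.0       # running sum of price * volume
        self.volume_sum = 0.0   # running sum of volume

    def add(self, timestamp, price, volume):
        """Add a trade; timestamps are assumed to be non-decreasing."""
        self.trades.append((timestamp, price, volume))
        self.pv_sum += price * volume
        self.volume_sum += volume
        self._evict(timestamp)

    def _evict(self, now):
        # Drop trades that are at least one full window older than `now`.
        while self.trades and self.trades[0][0] <= now - self.window_seconds:
            _, price, volume = self.trades.popleft()
            self.pv_sum -= price * volume
            self.volume_sum -= volume

    def value(self):
        """Return the current VWAP, or None if the window holds no volume."""
        if self.volume_sum <= 0:
            return None
        return self.pv_sum / self.volume_sum
```

Running sums keep each update cheap, but they only cover the latest window. Correcting a historical time point would most likely mean recomputing the affected windows from the stored trades.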
We have two sources of crypto trading data from the same data provider:
WebSocket
• Endpoint: ws://x.x.x.x:80/stream
• Structure of each trade order
REST API
• Endpoint: https://x.x.x.x:80/api
• Maximum number of trade orders in each query response is 100
Problem with unreliable connections
| WebSocket | REST API |
| --- | --- |
| Returns many duplicated trade orders | No duplication; can be used as the source of truth |
| Disconnects intermittently | Frequently takes too long to respond (timeout errors) |
| Returned data may have an invalid structure | |
![Historical trading data: Overview of System Design with required APIs and components](https://www.notion.so/image/https%3A%2F%2Fs3-us-west-2.amazonaws.com%2Fsecure.notion-static.com%2F30255a37-86b4-4d4a-9420-75403f1c7b68%2FOverall.drawio.png?table=block&id=977cf666-8b4b-437f-8175-d93c2069f5c5&cache=v2)
Analysis and proposed approaches
Deal with unreliable WebSocket connection
Set up a WebSocket handler that continuously listens for updates on the connection. Because the WebSocket is unreliable, the handler must deal with connection drops, timeouts, and errors gracefully, and it needs reconnection logic so that it reconnects whenever the connection is lost.
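A rough Python sketch of such a handler is shown below. It is not tied to any particular WebSocket library: `connect` is a hypothetical callable that opens the stream and returns an iterable of raw messages, and the field names in `REQUIRED_FIELDS` are assumptions. The loop skips malformed or duplicated messages and reconnects with exponential backoff when the stream fails or ends.

```python
import json
import time

# Assumed field names; the provider's real trade structure may differ.
REQUIRED_FIELDS = ("id", "ticker", "price", "volume", "timestamp")


def parse_trade(raw):
    """Return the trade as a dict, or None if the payload is malformed."""
    try:
        trade = json.loads(raw)
    except (TypeError, ValueError):
        return None
    if not isinstance(trade, dict):
        return None
    if any(field not in trade for field in REQUIRED_FIELDS):
        return None
    if not isinstance(trade["id"], (str, int)):
        return None
    return trade


def listen(connect, on_trade, max_reconnects=None, base_delay=1.0,
           max_delay=30.0, sleep=time.sleep):
    """Consume messages, reconnecting with exponential backoff on failure."""
    seen_ids = set()   # trade IDs already handled, used to drop duplicates
    failures = 0       # consecutive failed connections
    reconnects = 0
    while True:
        try:
            for raw in connect():
                failures = 0  # the connection works again, so reset the backoff
                trade = parse_trade(raw)
                if trade is None or trade["id"] in seen_ids:
                    continue  # skip invalid or duplicated messages
                seen_ids.add(trade["id"])
                on_trade(trade)
            # The stream ended without an error: treat it as a disconnect too.
        except OSError:
            # ConnectionError and TimeoutError are subclasses of OSError.
            pass
        if max_reconnects is not None and reconnects >= max_reconnects:
            return
        reconnects += 1
        failures += 1
        sleep(min(max_delay, base_delay * 2 ** (failures - 1)))
```

In a long-running process the `seen_ids` set would need to be bounded, for example by keeping only the IDs of the most recent time window.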
Deal with slow-response REST API
Create retryable HTTP Client – the first solution
• Retry mechanism: Retry failed requests so that data can still be fetched even when the API responds slowly or times out.
• Timeouts: Set appropriate timeouts on the REST API client so that an unresponsive or slow call fails fast instead of waiting indefinitely.
• Backoff Strategy: Increase the interval between consecutive failed attempts. This avoids overwhelming the REST API and gives it time to recover.
• Logging and Monitoring: Track the performance and health of the REST API client, and log important events and errors for debugging.
The retryable-http library (written in Go) offers all of these features. On its own, however, it is still not enough for our needs: once its in-process retries are exhausted, the request would simply be dropped.
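To make the four ingredients above concrete, here is a small standard-library Python sketch of a retrying GET with a timeout, exponential backoff, and logging. It illustrates the idea only and is not the retryable-http library's API; `get_with_retry`, its parameters, and the default values are assumptions.

```python
import logging
import time
import urllib.request

logger = logging.getLogger("rest_client")


def http_get(url, timeout):
    """Fetch a URL with a hard timeout so a slow server cannot block forever."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def get_with_retry(url, max_attempts=4, timeout=5.0, base_delay=0.5,
                   max_delay=8.0, fetch=http_get, sleep=time.sleep):
    """Try up to max_attempts times, backing off exponentially between tries."""
    for attempt in range(1, max_attempts + 1):
        try:
            body = fetch(url, timeout)
            logger.info("GET %s succeeded on attempt %d", url, attempt)
            return body
        except OSError as exc:
            # urllib's URLError and socket timeouts are subclasses of OSError.
            # A fuller client would not retry client errors such as HTTP 4xx.
            logger.warning("GET %s failed on attempt %d: %s", url, attempt, exc)
            if attempt == max_attempts:
                raise  # retries exhausted; the caller decides what happens next
            sleep(min(max_delay, base_delay * 2 ** (attempt - 1)))
```

With the defaults, the waits between attempts are 0.5 s, 1 s, and 2 s. When the last attempt fails, the exception propagates to the caller.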
Retry fetching data with Message Queue – the second solution
Once the retryable HTTP client has exhausted its retries, the request is not discarded outright. Instead, the system falls back to a second retry layer built on Redis and RabbitMQ. The request is placed into a queue, where a dedicated message consumer picks it up. If the request fails again, it is returned to the queue for another round. This loop continues until the request either succeeds or reaches the maximum number of retries configured in the system.
Publish delayed message with RabbitMQ: Before the second retry solution is applied, the request is published as a delayed message instead of being placed in the queue immediately. This pause leaves a suitable interval between the exhaustion of retries in the first solution and the next attempt.
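The requeue-until-limit behaviour can be sketched with an in-memory stand-in for the delayed queue. This is only a model of the flow described above, not RabbitMQ or Redis code: `DelayedRetryQueue`, its parameters, and the `dead` list for exhausted requests are all hypothetical.

```python
import heapq
import itertools


class DelayedRetryQueue:
    """In-memory stand-in for a delayed-message queue (not RabbitMQ itself)."""

    def __init__(self, max_retries=5, delay_seconds=30.0):
        self.max_retries = max_retries
        self.delay_seconds = delay_seconds
        self._heap = []                     # (ready_at, sequence, request, attempt)
        self._sequence = itertools.count()  # tie-breaker that keeps FIFO order
        self.dead = []                      # requests that exhausted every retry

    def publish(self, request, now, attempt=1):
        """Schedule a request so it becomes visible only after the delay."""
        ready_at = now + self.delay_seconds
        heapq.heappush(self._heap, (ready_at, next(self._sequence), request, attempt))

    def consume_ready(self, now, handler):
        """Run every request whose delay has elapsed and requeue the failures."""
        while self._heap and self._heap[0][0] <= now:
            _, _, request, attempt = heapq.heappop(self._heap)
            try:
                handler(request)
            except Exception:
                if attempt >= self.max_retries:
                    self.dead.append(request)  # give up at the configured maximum
                else:
                    self.publish(request, now, attempt + 1)
```

Here `handler` would be the REST query itself. Carrying the attempt count with each message is what lets the consumer decide between requeueing and giving up.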
![Two solutions of retryable API query requests](https://www.notion.so/image/https%3A%2F%2Fs3-us-west-2.amazonaws.com%2Fsecure.notion-static.com%2F03d1bd2c-fce2-436d-b1de-0e6fe18830ea%2FRetryable_HTTP.drawio.png?table=block&id=6f26cb81-7792-4de1-80f6-adef66dc5b3a&cache=v2)
Data Inconsistency
The REST API's trading data serves as the authoritative source. Trades received from it are validated and compared with the records in the local database for the corresponding time window. If any inconsistency is found among trades sharing the same Trade ID, the "integrity validation" module flags the associated ticker until all conflicts are resolved. Notifications about inconsistent data are published to a queue and then consumed by a dedicated message consumer.
There are several potential scenarios that can lead to data inconsistency:
• Trade Details Mismatch: The details of a trade received from the REST API do not match the corresponding record in the local database.
• Difference in the Number of Trades: The number of trades returned by the REST API differs from the number recorded in the local database for the same time window. This can happen when a trade is missing from the local database, or when the local database holds redundant trades caused by trade order reversals.
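Both scenarios can be detected by indexing the two sides by Trade ID. The Python sketch below is one possible shape for that check, with assumptions stated up front: trades are plain dicts, the `id` key is a made-up field name, IDs are unique within each list, and `flagged_tickers` is a stand-in for however the real system records flags.

```python
flagged_tickers = set()  # tickers with unresolved conflicts (hypothetical store)


def find_conflicts(rest_trades, local_trades):
    """Compare two lists of trade dicts for one ticker and one time window.

    The REST API list is treated as the source of truth.
    """
    rest_by_id = {trade["id"]: trade for trade in rest_trades}
    local_by_id = {trade["id"]: trade for trade in local_trades}
    shared_ids = rest_by_id.keys() & local_by_id.keys()
    return {
        # Same Trade ID on both sides, but the details differ.
        "mismatched": sorted(i for i in shared_ids if rest_by_id[i] != local_by_id[i]),
        # Present in the REST data but absent from the local database.
        "missing_locally": sorted(rest_by_id.keys() - local_by_id.keys()),
        # Present locally but unknown to the REST API (for example, reversed trades).
        "redundant_locally": sorted(local_by_id.keys() - rest_by_id.keys()),
    }


def validate_window(ticker, rest_trades, local_trades):
    """Flag the ticker while conflicts exist and clear the flag once none remain."""
    conflicts = find_conflicts(rest_trades, local_trades)
    if any(conflicts.values()):
        flagged_tickers.add(ticker)
    else:
        flagged_tickers.discard(ticker)
    return conflicts
```

The returned dictionary is the kind of payload that could be published to the notification queue for the conflict-resolving consumer.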
![Flagging the token ticker once a data inconsistency is detected, then resolving conflicts](https://www.notion.so/image/https%3A%2F%2Fs3-us-west-2.amazonaws.com%2Fsecure.notion-static.com%2Fe11c3326-c106-48a8-a8e0-d4f8168e2063%2FTradingDataInconsistency.drawio.png?table=block&id=8117f56e-a2c2-4f79-9038-ef9cc56bb25c&cache=v2)