Navigating the Data Maze

Data Selection

DeFi liquidity constraints pose challenges for autonomous on-chain execution. aarnâ addresses this by restricting its token universe to assets that meet a minimum liquidity threshold on decentralized exchanges (DEXs), so the model is trained on economically significant, actively traded assets. The same filter removes long-tail assets from both the training and inference datasets, which improves the relevance and accuracy of predictions.
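The liquidity filter described above can be sketched as follows. This is a minimal illustration, not aarnâ's implementation: the threshold value, the `TokenLiquidity` record, and the function names are all hypothetical, since the source does not state the actual threshold or data schema.

```python
from dataclasses import dataclass

# Hypothetical threshold: the source does not state the actual value.
MIN_DEX_LIQUIDITY_USD = 1_000_000.0


@dataclass(frozen=True)
class TokenLiquidity:
    symbol: str
    dex_liquidity_usd: float  # assumed: pooled DEX liquidity, in USD


def select_universe(tokens, min_liquidity_usd=MIN_DEX_LIQUIDITY_USD):
    """Return the sorted symbols whose DEX liquidity meets the threshold."""
    return sorted(
        t.symbol for t in tokens if t.dex_liquidity_usd >= min_liquidity_usd
    )


def filter_rows(rows, universe):
    """Keep only dataset rows whose token belongs to the selected universe."""
    allowed = set(universe)
    return [row for row in rows if row["symbol"] in allowed]
```

Applying `filter_rows` with the same universe to both the training set and the inference set keeps the two consistent, which is the property the paragraph above relies on.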

Data Handling Pipeline

All collected data, including OHLCV, Twitter, blog, and user transaction data, is stored daily as raw data in Amazon S3 on AWS. AWS Glue jobs automate the extraction, transformation, and loading (ETL) steps that convert this raw data into a structured format suitable for analysis. AWS Lambda functions handle event-driven processing, so data keeps flowing and stays up to date. The transformed data is then queried and analyzed with Amazon Athena.
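One way the event-driven step could look is a Lambda function that reacts to a new raw object in S3 and starts a Glue job for it. The sketch below is an assumption about how the pieces might be wired together, not a description of aarnâ's deployment: the job name and the `--source_bucket` / `--source_key` argument names are hypothetical. The `glue` parameter exists only so the handler can be exercised locally with a stand-in client.

```python
import urllib.parse

# Hypothetical Glue job name: not taken from the source.
GLUE_JOB_NAME = "raw-to-structured-etl"


def objects_from_event(event):
    """Extract (bucket, key) pairs from an S3 event notification."""
    pairs = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        # Object keys arrive URL-encoded in S3 notifications.
        key = urllib.parse.unquote_plus(s3["object"]["key"])
        pairs.append((s3["bucket"]["name"], key))
    return pairs


def handler(event, context, glue=None):
    """Lambda entry point: start one Glue job run per newly written object."""
    if glue is None:
        import boto3  # bundled with the AWS Lambda Python runtime

        glue = boto3.client("glue")
    run_ids = []
    for bucket, key in objects_from_event(event):
        response = glue.start_job_run(
            JobName=GLUE_JOB_NAME,
            # Hypothetical job arguments; Glue argument names start with "--".
            Arguments={"--source_bucket": bucket, "--source_key": key},
        )
        run_ids.append(response["JobRunId"])
    return {"started": run_ids}
```

Starting one job run per object keeps the sketch simple. A production pipeline writing many small files per day would more likely batch them or trigger the job on a schedule.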

Feature Engineering

The feature set for alpha 30/7 is extensive, with over 93 handcrafted features drawn from multiple data sources, grouped as shown in the table below. It includes 17 sentiment-based features from Twitter and blog content, 18 transactional features on whale users (filtered from a universe of 22k users to capture market impact), and various price-related metrics that capture market dynamics. For sentiment analysis, aarnâ uses the LLAMA and CryptoBERT models to capture trends from social media and blog data.

| Data source | Group | Description of Features Created |
| --- | --- | --- |
| OHLCV | Price and Market Dynamics Features | Includes day-specific price data, market volatility, moving averages, and trend and momentum indicators to analyze market behavior. |
| USER TRANSACTION | Transaction and Volume | Features capturing whale users' activities and transaction values, to reflect trading dynamics and volume flow. |
| SOCIAL MEDIA | Sentiment and Social Media | Data from Twitter and news articles to analyze sentiment trends, consistency, and dynamics through various sentiment scores and changes. |
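To make the price-related group more concrete, the sketch below computes a few features of the kind listed for OHLCV data (moving average, volatility, momentum) from a series of daily closing prices. The specific formulas, the feature names, and the 7-day window are illustrative assumptions. The source does not specify how the alpha 30/7 features are actually defined.

```python
import math
import statistics


def price_features(closes, window=7):
    """Compute simple features for the most recent day from daily closes.

    The window length and feature definitions are illustrative assumptions.
    """
    if window < 2:
        raise ValueError("window must be at least 2")
    if len(closes) < window + 1:
        raise ValueError("need at least window + 1 closing prices")
    # Daily log returns: r_t = ln(close_t / close_{t-1}).
    returns = [math.log(b / a) for a, b in zip(closes, closes[1:])]
    recent = closes[-window:]
    sma = sum(recent) / window  # simple moving average of the close
    return {
        "daily_return": returns[-1],
        "sma": sma,
        # Trend: how far the latest close sits above or below its average.
        "close_to_sma": closes[-1] / sma - 1.0,
        # Volatility: sample standard deviation of recent log returns.
        "volatility": statistics.stdev(returns[-window:]),
        # Momentum: simple return over the last `window` days.
        "momentum": closes[-1] / closes[-1 - window] - 1.0,
    }
```

Calling `price_features` once per token per day on the Athena query results would yield one feature row per asset, which could then be joined with the transaction and sentiment groups.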
