Sentrix: Real-Time Social-Media Sentiment Feed for Trading Platforms

Sentrix is a real-time event-driven system that transforms unstructured social media content into structured sentiment signals for financial analysis.


Problem Statement

Social media platforms have emerged as a major source of market-related information, reflecting investor sentiment in near real time. However, such data is inherently unstructured, noisy, and inconsistent, limiting its direct applicability for financial analysis.

Traditional financial datasets provide structured and reliable information, but fail to capture rapid shifts in public sentiment. Conversely, social media data is highly responsive yet lacks consistency and interpretability, creating a fundamental challenge in extracting meaningful and actionable signals.

Key challenges include:

  • High noise and unstructured data
  • Low signal reliability
  • Difficulty extracting consistent, actionable insights

Solution Overview

To address these limitations, Sentrix adopts an event-driven microservices architecture that transforms raw social media data through a sequence of structured processing stages. The system integrates filtering, domain-specific sentiment modelling, and hierarchical aggregation to produce interpretable sentiment signals.

In practice, the design rests on three mechanisms:

  • Multi-stage filtering pipeline for noise reduction and relevance extraction
  • Domain-specific sentiment modelling using finance-oriented NLP models
  • Multi-level aggregation across event, hourly, and ticker-level representations
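
The multi-level aggregation described above can be illustrated with a minimal Python sketch. It is not the project's implementation: the `EventScore` fields, the score range, the sample values, and the use of a plain mean at each level are all assumptions made for illustration. The source does not say how the levels are combined or weighted.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timezone
from statistics import fmean


@dataclass(frozen=True)
class EventScore:
    """Event-level signal: one scored post (field names are hypothetical)."""
    ticker: str
    timestamp: datetime
    score: float  # assumed range: -1.0 (bearish) to +1.0 (bullish)


def aggregate_hourly(events):
    """Hourly level: average event scores per (ticker, hour) bucket."""
    buckets = defaultdict(list)
    for e in events:
        hour = e.timestamp.replace(minute=0, second=0, microsecond=0)
        buckets[(e.ticker, hour)].append(e.score)
    return {key: fmean(scores) for key, scores in buckets.items()}


def aggregate_ticker(hourly):
    """Ticker level: average the hourly values of each ticker."""
    per_ticker = defaultdict(list)
    for (ticker, _hour), value in hourly.items():
        per_ticker[ticker].append(value)
    return {ticker: fmean(values) for ticker, values in per_ticker.items()}


# Made-up sample events, for illustration only.
events = [
    EventScore("AAPL", datetime(2026, 3, 2, 9, 5, tzinfo=timezone.utc), 0.5),
    EventScore("AAPL", datetime(2026, 3, 2, 9, 40, tzinfo=timezone.utc), -0.5),
    EventScore("AAPL", datetime(2026, 3, 2, 10, 15, tzinfo=timezone.utc), 1.0),
    EventScore("TSLA", datetime(2026, 3, 2, 9, 30, tzinfo=timezone.utc), -0.25),
]
hourly = aggregate_hourly(events)
ticker_level = aggregate_ticker(hourly)
```

Averaging hourly buckets before averaging per ticker keeps a single busy hour from dominating the ticker-level value. A weighted scheme could replace `fmean` at either level without changing the overall structure.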

System Architecture

The system is implemented as a set of modular microservices connected through Apache Kafka, enabling scalable and continuous data processing. Each component is responsible for a specific stage of the pipeline, ensuring clear separation of concerns.

Core pipeline components include:

  • Ingestor Service: Collects stock-related posts and publishes event streams
  • Filtering Service A: Performs deterministic rule-based filtering and noise removal
  • Filtering Service B: Applies semantic AI-based filtering and data relevance checks
  • Sentiment Service: Generates structured sentiment signals from filtered data using a 3-level aggregation pipeline
  • Frontend: Visualises sentiment outputs for monitoring and analysis

Figure 1: System architecture of Sentrix pipeline
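
As a rough illustration of the deterministic stage (Filtering Service A), the Python sketch below applies a few rule checks to a stream of post texts. The specific rules (minimum word count, cashtag requirement, spam phrases, exact-duplicate removal), their thresholds, and the sample posts are hypothetical. The source states only that this stage is rule-based, and in the real pipeline posts would arrive via Kafka rather than from an in-memory list.

```python
import re

# All rules and thresholds below are hypothetical examples.
CASHTAG = re.compile(r"\$[A-Z]{1,5}\b")        # e.g. "$AAPL"
SPAM_PHRASES = ("join my telegram", "guaranteed returns")
MIN_WORDS = 4


def passes_rules(text: str) -> bool:
    """Return True when a post survives every deterministic rule."""
    if len(text.split()) < MIN_WORDS:
        return False                            # too short to carry a signal
    if not CASHTAG.search(text):
        return False                            # no ticker reference
    lowered = text.lower()
    if any(phrase in lowered for phrase in SPAM_PHRASES):
        return False                            # known spam wording
    return True


def filter_stream(posts):
    """Yield posts that pass the rules, skipping exact duplicates."""
    seen = set()
    for text in posts:
        key = " ".join(text.lower().split())    # normalise case and spacing
        if key in seen or not passes_rules(text):
            continue
        seen.add(key)
        yield text


# Made-up posts, for illustration only.
posts = [
    "$AAPL earnings beat expectations this quarter",
    "to the moon",
    "$TSLA guaranteed returns, join my telegram now",
    "$AAPL earnings beat expectations this quarter",
    "Great weather in Hong Kong today",
]
kept = list(filter_stream(posts))
```

Because every check is deterministic, the same input always produces the same decision, which makes this stage cheap to run and easy to test before posts reach the semantic filter.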

Evaluation

Key performance indicators:

  • Directional accuracy: 62–64% (above baseline)
  • Ingestion latency: ~180 s
  • Noise removal rate: 18–25%
  • Filtering accuracy: 81.9%
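
The source does not define how these indicators are computed. The Python sketch below shows one common reading, stated as an assumption: directional accuracy is the share of observations where the sign of the sentiment signal matches the sign of the subsequent price return, and noise removal rate is the share of ingested posts discarded by filtering. The function names and sample values are hypothetical and are not the project's data or results.

```python
def directional_accuracy(signals, returns):
    """Share of observations where signal and return have the same sign.

    Observations with a zero signal or zero return are skipped (an assumption).
    """
    pairs = [(s, r) for s, r in zip(signals, returns) if s != 0 and r != 0]
    if not pairs:
        raise ValueError("no non-zero signal/return pairs to evaluate")
    hits = sum(1 for s, r in pairs if (s > 0) == (r > 0))
    return hits / len(pairs)


def noise_removal_rate(n_ingested, n_kept):
    """Share of ingested posts that the filtering stages discarded."""
    if n_ingested <= 0:
        raise ValueError("n_ingested must be positive")
    return (n_ingested - n_kept) / n_ingested


# Made-up numbers, for illustration only.
signals = [0.4, -0.2, 0.1, -0.6, 0.3]
returns = [0.01, 0.02, 0.005, -0.03, -0.01]
accuracy = directional_accuracy(signals, returns)   # 3 of 5 signs agree
removal = noise_removal_rate(1000, 800)             # 200 of 1000 discarded
```

Under this reading, a value near 50% would be what a coin flip achieves on a balanced sample, so any reported figure is best interpreted against the stated baseline.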

Demo


Timeline

Milestones and tasks by period:

September 2025
  – Conduct literature review and finalize requirements analysis
  – Finalize project scope, objectives, and success metrics
  – Submission of Detailed Project Plan (Oct 1 deliverable)

October 2025
  – Implementation of API connections and data scraping mechanisms for Reddit and Twitter (in parallel)
  – Initial design of the preliminary filtering framework
  – Preparation of Project Web Page (preliminary version)

November 2025
  – Development of filtering, correction, and weightage mechanisms for Reddit/Twitter data
  – Initial design of the credibility-weighted scoring pipeline
  – Interim backend testing of ingestion and filtering modules
  – Research and evaluation of AI/NLP models (FinBERT, FinGPT, LoRA, and COTS models)

December 2025
  – Experimentation with fine-tuning and prompt engineering for sentiment analysis and keyword extraction
  – Drafting of interim findings in preparation for Phase 2 interim report

January 2026
  – Integration of the best-performing sentiment model into the pipeline
  – Preparation for First Presentation (Jan 19)
  – Submission of Phase 2 Deliverables (Jan 25): preliminary implementation and Interim Report
  – Commencement of backend microservices implementation (Spring Boot)

February 2026
  – Integration of CI/CD pipelines for automated testing and deployment
  – Implementation of the authentication system for secure access
  – Integration of Kafka messaging across microservices
  – Configuration of MongoDB Atlas for the persistence layer

March 2026
  – Execution of full pipeline testing and bug resolution
  – Development of the frontend with authentication and live data streaming
  – Integration of an additional sentiment scorer and further improvement of keyword extraction performance
  – Implementation of the administrative/developer dashboard with metrics visualization
  – Definition and testing of success measurement metrics (filtering accuracy, sentiment directional correlation, latency)

April 2026
  – Finalization and polishing of the Project Web Page (final version)
  – Finalization of cloud deployment environment
  – Submission of Phase 3 Deliverables (Apr 19): tested implementation, final report, and final web page
  – Final Presentation (Apr 20)
  – Preparation of poster and 1-minute demonstration video for Exhibition (Apr 23)

Our Team

Dr. Chim, Tat Wing

Supervisor

BEng, M.Phil, PhD HK

Wasif Latif Hussain and Jungmin Jang

BEng Computer Science and BASc Financial Technology