Li Yifei

Data Analyst & Infrastructure Engineer

Alternative Data / Data Engineering / Quantitative Finance

I focus on alternative data and data infrastructure for investment research. I build web-scraping and ETL pipelines, data lakes, and dashboards that turn analysts' web sources, vendor feeds, reports, and emails into trackable datasets. My working style is AI-native: I treat LLM tools as my first collaborator for prototyping and data processing.

Education

City University of Hong Kong

Master of Business and Data Analysis (Statistics)

2024 - 2025

Beijing University of Posts and Telecommunications

Bachelor of Computational Mathematics

2020 - 2024

Experience

CloudAlpha Capital Management

Data Analyst Intern (Alternative Data & Infrastructure)

Jun 2025 - Dec 2025
  • Built hardware & raw materials alt-data coverage: DRAM/NAND prices, wafer prices, rare metals, LME/SMM time series
  • Developed LLM-powered extraction agents for PDF reports, HTML pages, news, and emails
  • Designed internal tools: Automeeting, SmartSpider, MailSync, and Data Hub dashboard

Gaorong Ventures

Data Analysis Intern

Sep 2024 - Dec 2024
  • Built GPT-driven NLP pipeline for analyzing 10K+ user reviews with sentiment scoring
  • Authored investment reports on AI tools (Cursor, Vercel) with technical evaluations
  • Built high-concurrency Python crawler system integrated with LLMs

Projects

Crypto Quantitative Trading Framework

Signal generation and backtesting framework built on the ccxt Python library. CTA strategy based on technical indicators and a Hidden Markov Model, with a Sharpe ratio of 0.68.

Python ccxt HMM Backtesting

Distributed Data Warehouse

Star schema modeling for data warehouse with ODS and DWD layers. Resolved Oracle data collection issues using Avro format, Hive, and HDFS.

Hadoop Hive Sqoop HDFS

LLM Data Extraction Pipeline

Controllable LLM agent that processes PDF reports, HTML pages, news, and emails. Extracts ticker, product, and price/volume fields into semi-structured tables.

Python LLM NLP MongoDB

IRS-Assisted Channel Optimization

Optimized Intelligent Reflecting Surface (IRS) parameters using the ADMM algorithm and penalty functions for non-convex optimization.

MATLAB Optimization ADMM

Skills

Programming

Python SQL Rust R MATLAB

Infrastructure

MongoDB PostgreSQL Hadoop Spark AWS

AI & LLM

Claude GPT Cursor Gemini

Languages

English Mandarin Cantonese Japanese