Joel P. Barmettler

AI Architect & Researcher

2018-2019 · Data Engineer

Data engineering for cryptocurrency analytics at CoinPaper

I architected and deployed ETL pipelines using Python and Spotify Luigi on AWS, processing gigabytes of daily cryptocurrency data for fundamental analysis. CoinPaper was an open-source platform providing transparency in cryptocurrency evaluation through automated data aggregation from GitHub, Reddit, Telegram, and Google Trends.

Pipeline architecture and data sources

I designed a serverless data collection system using Luigi for job orchestration, DynamoDB for storage, and API Gateway with VTL templates for the API layer. The pipeline ran jobs at multiple frequencies: high-frequency jobs (every few minutes) fetched global market data via the Coinpaprika API, hourly jobs updated OHLCV candles, and daily jobs performed GitHub analysis with PyGithub (counting lines of code and tracking commits), Reddit sentiment analysis with PRAW and NLTK's VADER, Telegram metrics collection via Telethon, and Google Trends queries through pytrends.
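
To illustrate the orchestration pattern, here is a minimal sketch of a Luigi task in the style of the high-frequency market-data job. The class name, run_ts parameter, and local-file target are stand-ins for illustration (the production pipeline persisted results to DynamoDB); the Coinpaprika URL is its public global-market endpoint.

```python
import luigi
import requests

class FetchGlobalMarketData(luigi.Task):
    """Hypothetical high-frequency job: snapshot the global crypto market."""
    run_ts = luigi.Parameter()  # timestamp string so each tick is a distinct task

    def output(self):
        # Local file target for illustration only; production wrote to DynamoDB.
        return luigi.LocalTarget(f"global_{self.run_ts}.json")

    def run(self):
        resp = requests.get("https://api.coinpaprika.com/v1/global", timeout=30)
        resp.raise_for_status()
        with self.output().open("w") as f:
            f.write(resp.text)

if __name__ == "__main__":
    # Run one tick with the in-process scheduler; a cron/CloudWatch trigger
    # would supply a fresh run_ts each interval.
    luigi.build([FetchGlobalMarketData(run_ts="2019-01-01T12-00")],
                local_scheduler=True)
```

Because Luigi considers a task complete once its output target exists, parameterizing on the run timestamp is what makes each scheduled tick re-execute the fetch.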

Analysis and scoring algorithms

I implemented custom analytical engines, including linear regression for community growth trends, whitepaper auto-summarization using pdfminer and NLTK, and Git analytics that cloned repositories to measure actual code activity rather than relying on star counts. The system generated a weighted "Coinpaper Score" combining developer activity, community sentiment, and manual reviews to grade project legitimacy and help users identify scams.
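
As a sketch of how such a composite grade can be combined, assuming hypothetical weights and metric names (the actual Coinpaper weighting scheme is not documented here):

```python
# Hypothetical weights and metric names, for illustration only.
WEIGHTS = {
    "developer_activity": 0.40,   # commits, lines of code, release cadence
    "community_sentiment": 0.35,  # Reddit/Telegram signals
    "manual_review": 0.25,        # human editorial grade
}

def coinpaper_score(metrics: dict) -> float:
    """Combine normalized sub-scores (each in [0, 1]) into one weighted grade."""
    assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9  # weights must sum to 1
    return sum(weight * metrics[name] for name, weight in WEIGHTS.items())

print(coinpaper_score({"developer_activity": 0.8,
                       "community_sentiment": 0.6,
                       "manual_review": 0.9}))  # -> 0.755
```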

The infrastructure processed multiple data streams at scale, transforming raw social and technical metrics into actionable fundamental analysis for cryptocurrency investors.

What was CoinPaper's core purpose?

CoinPaper was an open-source cryptocurrency analytics platform focusing on fundamental analysis rather than just price tracking. It aggregated data from GitHub, Reddit, Telegram, and Google Trends to provide a comprehensive health check for cryptocurrency projects and help users identify legitimate projects.

What technologies powered the data pipeline?

The pipeline used Python 3 with Spotify Luigi for orchestration, AWS DynamoDB for processed data storage, AWS S3 for assets, and AWS API Gateway with VTL templates. Data sources were accessed through PyGithub for GitHub, PRAW for Reddit, Telethon for Telegram, and pytrends for Google Trends.
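
For context, writing a processed metric into DynamoDB from Python typically looks like the boto3 sketch below; the table name and key schema are assumptions for illustration.

```python
import boto3

# Table name and attribute names are illustrative assumptions.
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("coin_metrics")

table.put_item(Item={
    "coin_id": "btc-bitcoin",          # partition key (assumed)
    "timestamp": "2019-01-01T12:00Z",  # sort key (assumed)
    "reddit_subscribers": 1_000_000,
    "github_commits_30d": 412,
})
```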

What data sources were analyzed?

The system analyzed GitHub commits and code metrics, Reddit sentiment using NLTK's VADER, Telegram member counts, Google Trends search volume, whitepaper content via pdfminer, and OHLCV price data from the Coinpaprika API.
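
A minimal sketch of the Reddit sentiment step, combining PRAW with NLTK's VADER. The credentials and subreddit choice are placeholders, and the vader_lexicon corpus must be downloaded once via nltk.download("vader_lexicon").

```python
import praw
from nltk.sentiment.vader import SentimentIntensityAnalyzer

# Placeholder credentials; real values come from a Reddit API app registration.
reddit = praw.Reddit(client_id="...", client_secret="...",
                     user_agent="coinpaper-sketch/0.1")
sia = SentimentIntensityAnalyzer()

# Score the 50 hottest post titles; VADER's compound score lies in [-1, 1].
scores = [sia.polarity_scores(post.title)["compound"]
          for post in reddit.subreddit("Bitcoin").hot(limit=50)]
print(f"Average r/Bitcoin sentiment: {sum(scores) / len(scores):+.3f}")
```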

What analytical techniques were used?

The system employed linear regression for trend analysis, VADER sentiment analysis for Reddit posts, natural language processing with NLTK for whitepaper summarization, and custom algorithms to calculate lines of code and commit activity from cloned Git repositories.
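
For the trend-analysis step, a least-squares line fit over a daily metric yields the growth rate directly. The sketch below uses NumPy with made-up Telegram member counts; the data and variable names are illustrative.

```python
import numpy as np

# Hypothetical daily Telegram member counts over two weeks.
days = np.arange(14)
members = np.array([1000, 1020, 1015, 1060, 1090, 1100, 1150,
                    1170, 1165, 1210, 1250, 1260, 1300, 1330])

# Degree-1 least-squares fit: the slope is the average daily growth.
slope, intercept = np.polyfit(days, members, deg=1)
print(f"Community grows by ~{slope:.1f} members/day")
```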

