I. Introduction
An internet-based news aggregator that scrapes trending stories from popular news sources and recommends articles based on each user's preferences, using machine learning.
GitHub: https://github.com/caomingkai/News_Recommendation_System
Clone the repository and run it with the provided shell scripts:
- First, run
./launcher.sh
This script will:
- run Redis
- run MongoDB locally
- start the recommendation service (Python)
- start the backend service (Python)
- start the web server (Node.js + ReactJS)
- Second, run
./news_pipeline_launcher.sh
This script will:
- run Redis
- run MongoDB locally
- install the Python requirements
- start news_topic_modeling_service (Python, machine learning)
- start the news collecting service (data pipeline + web scraping)
II. Tech stack:
- Front end: React, Express, Node.js, OAuth
- Built a responsive single-page web application for users to browse news (React, Node.js, RPC, SOA, JWT)
- Back end: Python RPC, MongoDB, Redis, RabbitMQ
- Service-oriented: multiple backend services communicate via JSON-RPC
- Implemented a data pipeline which monitors, scrapes and deduplicates news
- News topic classifying system: TensorFlow, DNN, NLP
- Designed and built an offline training pipeline for news topic modeling
- Deployed an online classifying service for news topic modeling using the trained model
- News recommendation system: TF-IDF, NLP, RabbitMQ
- Implemented a click event log processor which collects users' click logs and updates a news preference model for each user
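The click log processor and the time decay algorithm named in this write-up can be sketched together. The snippet below is a minimal illustration, not the repository's actual code: it assumes each user has one preference weight per news topic, starting uniform, and that every click shifts weight toward the clicked topic while older clicks fade. The function names, the topic list, and the decay rate `ALPHA = 0.1` are assumptions made for this example.

```python
# Hypothetical decay rate: how strongly one click shifts the model (assumption).
ALPHA = 0.1


def initial_preferences(topics):
    """Start every topic with the same weight so the weights sum to 1."""
    return {topic: 1.0 / len(topics) for topic in topics}


def update_preferences(prefs, clicked_topic, alpha=ALPHA):
    """Apply one click: decay every weight, then boost the clicked topic.

    clicked topic: p = (1 - alpha) * p + alpha
    other topics:  p = (1 - alpha) * p
    """
    if clicked_topic not in prefs:
        raise KeyError(f"unknown topic: {clicked_topic}")
    return {
        topic: (1 - alpha) * weight + (alpha if topic == clicked_topic else 0.0)
        for topic, weight in prefs.items()
    }


if __name__ == "__main__":
    # Example topic names are placeholders, not the project's real classes.
    prefs = initial_preferences(["World", "Sports", "Technology", "Business"])
    prefs = update_preferences(prefs, "Sports")
    print(sorted(prefs.items(), key=lambda item: item[1], reverse=True))
```

Because every weight is scaled by `(1 - alpha)` and `alpha` is added back to the clicked topic, the weights keep summing to 1, so they can be used directly to rank topics for a user.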
Chart 1: Login page with authentication
Chart 2: News feed page
III. System structure:
- Front end tier: React & Node.js
- Back end tier: provides an RPC API for communication among the different tiers
- News recommendation system: time decay algorithm
- News topic classifying system: TensorFlow with a 2-layer CNN model for classification
- Data pipeline: collects news from news sources
- News monitor: gets news from News API
- News scraper: web scraper
- News deduper: deduplicates news using TF-IDF
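The TF-IDF deduplication step can be sketched as follows. This is a minimal, standard-library-only illustration under stated assumptions, not the project's implementation: it builds TF-IDF vectors for a new article and the already stored articles, then flags the new article as a duplicate when its cosine similarity to any stored article reaches a threshold. The function names, the tokenizer, the smoothed IDF formula, and the `0.9` threshold are all assumptions chosen for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(documents):
    """Return one L2-normalized TF-IDF vector (term -> weight) per document."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Smoothed IDF (an assumption): log((1 + N) / (1 + df)) + 1
        vec = {
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        }
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()} if norm else {})
    return vectors


def cosine(a, b):
    """Cosine similarity of two already normalized sparse vectors."""
    if len(a) > len(b):
        a, b = b, a
    return sum(weight * b.get(term, 0.0) for term, weight in a.items())


def is_duplicate(new_text, existing_texts, threshold=0.9):
    """True if new_text is at least `threshold` similar to any existing text."""
    if not existing_texts:
        return False
    vectors = tfidf_vectors([new_text] + list(existing_texts))
    return any(cosine(vectors[0], other) >= threshold for other in vectors[1:])
```

In a pipeline like the one described, a check of this kind would typically compare a scraped article only against recent articles (for example, those published the same day) to keep the comparison set small.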
Chart 3: System with Machine Learning module
Chart 4: System with Recommendation module
Chart 5: Service dependency
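The RPC communication between tiers mentioned above can be illustrated with a small JSON-RPC 2.0 dispatcher. This is a hedged sketch using only the standard library, not the transport or library the project actually uses: the `JsonRpcDispatcher` class and the `get_news_summaries_for_user` method with its placeholder return value are hypothetical names introduced for this example. The error codes follow the JSON-RPC 2.0 specification.

```python
import inspect
import json


class JsonRpcDispatcher:
    """Maps JSON-RPC 2.0 request strings to registered Python functions."""

    def __init__(self):
        self._methods = {}

    def register(self, func):
        """Expose a function under its own name; usable as a decorator."""
        self._methods[func.__name__] = func
        return func

    @staticmethod
    def _error(request_id, code, message):
        return json.dumps({"jsonrpc": "2.0",
                           "error": {"code": code, "message": message},
                           "id": request_id})

    def handle(self, raw_request):
        """Parse one request, call the method, and return the JSON response."""
        try:
            request = json.loads(raw_request)
        except json.JSONDecodeError:
            return self._error(None, -32700, "Parse error")
        if not isinstance(request, dict) or not isinstance(request.get("method"), str):
            return self._error(None, -32600, "Invalid Request")
        request_id = request.get("id")
        method = self._methods.get(request["method"])
        if method is None:
            return self._error(request_id, -32601, "Method not found")
        params = request.get("params", [])
        if not isinstance(params, (list, dict)):
            return self._error(request_id, -32602, "Invalid params")
        try:
            # Validate the arguments against the signature before calling.
            signature = inspect.signature(method)
            bound = (signature.bind(**params) if isinstance(params, dict)
                     else signature.bind(*params))
        except TypeError:
            return self._error(request_id, -32602, "Invalid params")
        result = method(*bound.args, **bound.kwargs)
        return json.dumps({"jsonrpc": "2.0", "result": result, "id": request_id})


dispatcher = JsonRpcDispatcher()


@dispatcher.register
def get_news_summaries_for_user(user_id, page_num):
    """Hypothetical backend method; returns placeholder data only."""
    return {"user_id": user_id, "page_num": page_num, "news": []}
```

A web server tier would send a request such as `{"jsonrpc": "2.0", "method": "get_news_summaries_for_user", "params": ["alice", 1], "id": 7}` and receive a response carrying the same `id`.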