A lightweight tool that classifies Twitter/X posts into noise, neutral, and relevant categories using a combination of rule-based logic and a Logistic Regression model.
This project automates the classification of trending Twitter/X posts to filter out irrelevant content. It combines a simple rule-based system with a Logistic Regression model to improve filtering accuracy. The application is built with Flask and integrates Google Colab notebooks for model training and experimentation.
A post is treated as noise when it shows any of the following traits:
- Sales offers, coupons, vouchers, or "check this link" prompts used for advertising or promotion to encourage transactions.
- Phrases such as "click here!", "don't miss it!", or "share if you agree!" without any context.
- Trending hashtags attached to unrelated tweet content, solely to generate engagement.
- Short URLs (bit.ly, tinyurl, or similar) with no explanation of the linked content.
- Multiple tagged accounts (@user1 @user2 …) solely for promotional purposes.
- Keywords repeated without addressing the topic.
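Some of the noise criteria above lend themselves to simple pattern checks. The following is a minimal sketch, not the project's actual rule set: the patterns, the rule names, and the `MAX_MENTIONS` cutoff are all assumptions made for illustration. It also flags every short URL, whereas the criterion above only targets short URLs that come without an explanation.

```python
import re

# Hypothetical patterns and threshold, chosen only for illustration.
SHORT_URL = re.compile(r"(?:https?://)?(?:bit\.ly|tinyurl\.com)/\S+", re.IGNORECASE)
CALL_TO_ACTION = re.compile(r"click here|don't miss it|share if you agree", re.IGNORECASE)
PROMO = re.compile(r"\b(?:coupon|voucher|discount|promo)\b", re.IGNORECASE)
MENTION = re.compile(r"@\w+")
MAX_MENTIONS = 3  # assumed cutoff for "tagging multiple accounts"


def noise_reasons(text: str) -> list[str]:
    """Return the names of the noise rules that a post triggers."""
    reasons = []
    if PROMO.search(text):
        reasons.append("promotional_keyword")
    if CALL_TO_ACTION.search(text):
        reasons.append("call_to_action")
    if SHORT_URL.search(text):
        reasons.append("short_url")
    if len(MENTION.findall(text)) > MAX_MENTIONS:
        reasons.append("mass_mentions")
    return reasons
```

Returning the list of triggered rules, rather than a single boolean, makes it easier to see why a post was flagged when tuning the rules.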
A post is treated as neutral when it shows the following traits:
- Reports data or observations in descriptive sentences, for example, "#Grok has been ranked #1 since 8:00 PM."
- Contains no opinion or emotion: the content is purely statistical, factual, or informative, with no praise, criticism, or complaints.
- Resembles a news story or official report in structure, usually in formal or semi-formal language.
- Contains no sales pitches, commercial links, or calls to action.
A post is treated as relevant when it shows any of the following traits:
- Expresses likes, dislikes, humor, or personal experiences related to the topic.
- Includes reasons, criticism (positive or negative), or recommendations related to the topic.
- Contains questions for discussion, for example, "Has anyone attended this event #KonserMalamMinggu? What were your impressions?"
- Contains personal experiences or field observations told as a story.
- Discusses the meaning or background of trending topics in context.
- Uses hashtags and mentions that are relevant to the content, not in bulk or unrelated to it.
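One way a rule-based check and a trained model could be combined is sketched below. This is an illustration under stated assumptions, not the project's implementation: the rule-first ordering, the `looks_like_noise` rule, the `classify` function, and the `dummy_model` placeholder (standing in for the trained Logistic Regression model) are all hypothetical.

```python
from typing import Callable, Dict

LABELS = ("noise", "neutral", "relevant")


def looks_like_noise(text: str) -> bool:
    """Stand-in rule check: flags a known URL shortener (hypothetical rule)."""
    lowered = text.lower()
    return "bit.ly/" in lowered or "tinyurl.com/" in lowered


def classify(text: str, predict_proba: Callable[[str], Dict[str, float]]) -> Dict[str, object]:
    """Apply the rule first; otherwise defer to the model's probabilities."""
    if looks_like_noise(text):
        return {"label": "noise", "source": "rule", "confidence": 1.0}
    probabilities = predict_proba(text)
    # Pick the label with the highest predicted probability.
    label = max(LABELS, key=lambda name: probabilities.get(name, 0.0))
    return {"label": label, "source": "model", "confidence": probabilities.get(label, 0.0)}


def dummy_model(text: str) -> Dict[str, float]:
    """Placeholder for the trained model; returns fixed probabilities."""
    return {"noise": 0.1, "neutral": 0.2, "relevant": 0.7}
```

Passing the model in as a callable keeps the rule logic testable without loading a trained model.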
| Category | Stack |
|---|---|
| Back-end | Flask |
| Notebook | Google Colab (Jupyter) |
| Storage | Google Drive |
| Front-end | Bootstrap (basic styling) |
- Rule-based classification for Twitter/X posts.
- Simple web interface displaying classification results and prediction percentages.
- Integration with Google Colab for model retraining and adjustments.
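For context on the prediction percentages: a multinomial Logistic Regression model typically turns one raw score per class into probabilities with a softmax. The sketch below shows that conversion only. The scores are made up, the `to_percentages` name is hypothetical, and the source does not state whether the project's model uses this exact form.

```python
import math

LABELS = ["noise", "neutral", "relevant"]


def to_percentages(scores):
    """Convert one raw score per class into percentages with a softmax."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return {label: round(100 * e / total, 1) for label, e in zip(LABELS, exps)}


# Made-up scores in the order noise, neutral, relevant.
percentages = to_percentages([2.0, 0.0, 0.0])
```

Because each value is rounded to one decimal place, the displayed percentages may not sum to exactly 100.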
Access the live demo at: https://twitter-classifier.dodevca.com
- Clone this repository.
- Create and activate a virtual environment:
    python3 -m venv venv
    source venv/bin/activate   # For Linux/macOS
    # OR
    venv\Scripts\activate      # For Windows
- Install dependencies:
pip install -r requirements.txt
- Run the Flask server:
python app.py
- Access the app at http://localhost:5000.
The notebook at /notebook/twitter_trending_classifier.ipynb can be run independently for model retraining or modification.
- Add real-time tweet classification using Twitter Streaming API.
- Expand slang dictionary coverage for better pre-processing.
- Develop a more comprehensive visualization dashboard.
- Implement a lightweight model serving architecture.
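As an illustration of the slang dictionary item above, dictionary-based normalization can be sketched as follows. The entries are hypothetical examples of Indonesian abbreviations (the language is assumed from the hashtag example earlier in this document), and `normalize_slang` is an illustrative name, not a function from the repository.

```python
import re

# Hypothetical entries; the project's actual dictionary is not shown here.
SLANG = {"gk": "tidak", "yg": "yang", "bgt": "banget"}

# A token is a word, optionally prefixed by "#" or "@" so that
# hashtags and mentions are matched whole and left untouched.
TOKEN = re.compile(r"[#@]?\w+")


def normalize_slang(text, slang=SLANG):
    """Replace known slang tokens with their standard form."""
    def replace(match):
        token = match.group(0)
        return slang.get(token.lower(), token)
    return TOKEN.sub(replace, text)
```

Expanding coverage then amounts to adding entries to the dictionary, without changing the replacement logic.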
Interested in collaborating or discussing this project? Reach me via my LinkedIn profile or visit dodevca.com.
Initiated by Dodevca, open for collaboration and continuous refinement.