Transparent, Reproducible Analysis
Haberler.cloud evaluates news articles with a multi-stage NLP pipeline built around 11 specialized analyzers. Our methodology is designed to be transparent and objective, and we continue to improve it.
Analysis Pipeline Overview
When an article enters our system, it passes through five stages:
Content Extraction
We extract the article text, title, author, and publication date. To comply with fair use, we store only a 500-character snippet of the text.
Text Preprocessing
The text is tokenized, normalized, and prepared for analysis. We detect the language (Turkish or English) and load appropriate NLP models.
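A minimal sketch of what normalization and tokenization could look like, using only the Python standard library. The function names and the regex tokenizer are assumptions for illustration, not the pipeline's actual code, and the language code is assumed to have been detected already. The Turkish branch handles the dotted and dotless I, which a plain `str.lower()` gets wrong.

```python
import re
import unicodedata


def lowercase(text: str, lang: str) -> str:
    """Lowercase text, handling the Turkish dotted/dotless I when lang == 'tr'."""
    if lang == "tr":
        # In Turkish, 'I' lowercases to 'ı' and 'İ' lowercases to 'i'.
        # Plain str.lower() maps 'I' to 'i' and 'İ' to 'i' + U+0307.
        text = text.replace("I", "ı").replace("İ", "i")
    return text.lower()


def preprocess(text: str, lang: str) -> list[str]:
    """Hypothetical helper: normalize, lowercase, and split into word tokens."""
    # NFC makes composed and decomposed forms of the same character identical.
    text = unicodedata.normalize("NFC", text)
    text = lowercase(text, lang)
    # \w+ matches runs of Unicode letters, digits, and underscores.
    return re.findall(r"\w+", text)


tokens_tr = preprocess("IŞIK İstanbul", "tr")
tokens_en = preprocess("Breaking News!", "en")
```

A production tokenizer (for example, one shipped with an NLP model) would handle punctuation, numbers, and clitics more carefully; the regex here only shows where that step sits.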
Multi-Analyzer Processing
The content passes through 11 specialized analyzers, each examining different aspects of the article. These run in parallel for efficiency.
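One way to run independent analyzers side by side is a thread pool. The sketch below is an assumption about the general shape, not the actual implementation: the two toy analyzers and the `run_analyzers` helper are hypothetical stand-ins for the real modules.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Assumed interface: an analyzer takes the article text and returns a dict.
Analyzer = Callable[[str], dict]


def word_count(text: str) -> dict:
    """Toy analyzer: number of whitespace-separated words."""
    return {"words": len(text.split())}


def exclamation_density(text: str) -> dict:
    """Toy analyzer: exclamation marks per character."""
    return {"exclamations_per_char": text.count("!") / max(len(text), 1)}


ANALYZERS: dict[str, Analyzer] = {
    "word_count": word_count,
    "exclamation_density": exclamation_density,
}


def run_analyzers(text: str, analyzers: dict[str, Analyzer]) -> dict[str, dict]:
    """Submit every analyzer to a thread pool and collect results by name."""
    with ThreadPoolExecutor(max_workers=max(len(analyzers), 1)) as pool:
        futures = {name: pool.submit(fn, text) for name, fn in analyzers.items()}
        # .result() blocks until that analyzer finishes and re-raises its errors.
        return {name: future.result() for name, future in futures.items()}


results = run_analyzers("Hello world!", ANALYZERS)
```

Threads suit analyzers that spend their time waiting on I/O or in native code that releases the interpreter lock. For CPU-bound pure-Python analyzers, `ProcessPoolExecutor` from the same module may be the better fit.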
Score Aggregation
Individual analyzer outputs are combined using weighted aggregation to produce final credibility and quality scores.
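Weighted aggregation can be illustrated as a weighted mean. The analyzer names, scores, and weights below are made up for the example; the actual weights used by the pipeline are not stated here.

```python
def aggregate(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted mean of analyzer scores; every score needs a matching weight."""
    total_weight = sum(weights[name] for name in scores)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    weighted_sum = sum(scores[name] * weights[name] for name in scores)
    return weighted_sum / total_weight


# Hypothetical per-analyzer scores on a 0..1 scale.
example_scores = {"source_credibility": 0.8, "clickbait": 0.4}
# Hypothetical weights: source credibility counts twice as much as clickbait.
example_weights = {"source_credibility": 2.0, "clickbait": 1.0}

final_score = aggregate(example_scores, example_weights)  # (0.8*2 + 0.4*1) / 3
```

Dividing by the total weight keeps the result on the same scale as the inputs, so weights can be changed without rescaling the final score.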
Version Tracking
We hash the content and compare the hash against those of previously stored versions to detect stealth edits or deletions over time.
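Hash-based change detection can be sketched with the standard library's `hashlib`. The choice of SHA-256 and the whitespace normalization step are assumptions for illustration; the hash function actually used is not stated here.

```python
import hashlib


def content_hash(text: str) -> str:
    """Return a SHA-256 hex digest of the text after collapsing whitespace."""
    # Collapsing whitespace keeps layout-only changes from counting as edits.
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def has_changed(previous_hash: str, current_text: str) -> bool:
    """True if the current text no longer matches the stored hash."""
    return content_hash(current_text) != previous_hash


stored = content_hash("The minister resigned.")
same = has_changed(stored, "The  minister\nresigned.")      # layout change only
edited = has_changed(stored, "The minister did not resign.")  # wording change
```

A hash only says that something changed. Showing what changed would require keeping the earlier text or a diff, which has to be weighed against storing only a short snippet.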
The 11 Analyzers
Each analyzer is a specialized module that examines specific aspects of the article:
- Uses TextBlob and VADER lexicons to determine the emotional tone of the article.
- Calculates Flesch Reading Ease score and grade level.
- Detects political bias using keyword patterns and linguistic markers.
- Detects citations, named sources, and other credibility indicators.
- Identifies propaganda techniques such as loaded language, name-calling, and fear-mongering.
- Assesses misinformation risk by analyzing claim patterns and source attribution.
- Identifies logical fallacies such as ad hominem attacks, strawman arguments, and false dichotomies.
- Evaluates whether the article provides educational value by checking for context and complexity explanation.
- Analyzes headlines and content for clickbait patterns including curiosity gaps and exaggeration.
- Uses spaCy NER to extract persons, organizations, and locations mentioned.
- Extracts keywords and identifies main topics using TF-IDF algorithms.
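As one concrete example from the list above, the Flesch Reading Ease score is 206.835 − 1.015 × (words / sentences) − 84.6 × (syllables / words). The sketch below applies that standard formula with a rough vowel-group syllable counter. The counter is an English-only heuristic chosen for brevity, the formula's constants were calibrated for English, and none of this is presented as the analyzer's actual code.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic: one syllable per run of vowels, at least one per word."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(len(groups), 1)


def flesch_reading_ease(text: str) -> float:
    """Standard Flesch Reading Ease formula; higher means easier to read."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


easy = flesch_reading_ease("The cat sat. The dog ran.")
hard = flesch_reading_ease(
    "Institutional accountability necessitates comprehensive regulatory oversight."
)
```

Short sentences made of one-syllable words score high, and long words push the score down. Turkish text would need a language-specific syllable counter and recalibrated constants.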
Technologies Used
Our analysis pipeline is built with industry-standard NLP and machine learning technologies:
- Python 3.11+
- BERT Transformers
- spaCy 3.7
- TextBlob
- VADER Sentiment
- scikit-learn
- FastAPI
- PostgreSQL
Limitations and Caveats
Important: Our analysis has limitations that users should understand.
- Not Fact-Checking: We analyze writing patterns and indicators, not factual accuracy.
- Algorithmic Bias: Our models may have inherent biases from training data.
- Language Limitations: Currently optimized for Turkish and English.
- Context Blindness: Algorithms may miss context, satire, or nuance.
Feedback Welcome: If you notice analysis errors or have suggestions, please contact us.