LLM-based API¶
LLM-based Classification¶
llm_classify_news¶
Classify news articles using Large Language Models (Claude, OpenAI) with flexible custom categories and web content fetching.
Arguments:
df: pandas dataframe. No default.col: column with the story text. Default istextprovider: LLM provider ("claude"or"openai"). Default is"claude"categories: optional dictionary of custom categories. Default uses standard categories.api_key: optional API key (uses environment variable if not provided)fetch_content: whether to fetch full content from URLs. Default isFalseurl_col: column with URLs if fetch_content=True. Default is"url"
Functionality:
Classifies articles using state-of-the-art language models
Provides confidence scores and reasoning for each classification
Supports custom category definitions with descriptions and examples
Can fetch and analyze full web content from URLs
Handles rate limiting and error recovery automatically
Output:
Appends columns with classification results:
llm_category_{provider}: predicted category namellm_confidence_{provider}: confidence score (0-1)llm_reasoning_{provider}: brief explanation of classification
Setup:
Install with LLM dependencies:
pip install notnews[llm] # or uv add notnews --extra llm
Set API keys:
export ANTHROPIC_API_KEY="your_claude_api_key" export OPENAI_API_KEY="your_openai_api_key"
Examples:
Basic Classification:
>>> import pandas as pd >>> from notnews import llm_classify_news >>> >>> df = pd.DataFrame({ ... 'text': [ ... 'The Federal Reserve raised interest rates by 0.25%', ... 'Taylor Swift spotted at Kansas City Chiefs game' ... ] ... }) >>> >>> result = llm_classify_news(df, provider='claude') >>> print(result[['text', 'llm_category_claude', 'llm_confidence_claude']]) text llm_category_claude llm_confidence_claude 0 The Federal Reserve raised interest rates by 0.25% hard_news 0.95 1 Taylor Swift spotted at Kansas City Chiefs game soft_news 0.92
Custom Categories:
>>> custom_categories = { ... 'finance': { ... 'description': 'Financial news, markets, economic policy', ... 'examples': ['Stock market updates', 'Federal Reserve decisions'] ... }, ... 'entertainment': { ... 'description': 'Celebrity news, movies, music, entertainment', ... 'examples': ['Celebrity sightings', 'Movie releases'] ... } ... } >>> >>> result = llm_classify_news(df, provider='claude', categories=custom_categories) >>> print(result[['text', 'llm_category_claude']]) text llm_category_claude 0 The Federal Reserve raised interest rates by 0.25% finance 1 Taylor Swift spotted at Kansas City Chiefs game entertainment
Web Content Fetching:
>>> df_with_urls = pd.DataFrame({ ... 'text': ['Breaking economic news', 'Celebrity update'], ... 'url': ['https://example.com/economy', 'https://example.com/celeb'] ... }) >>> >>> result = llm_classify_news(df_with_urls, ... provider='claude', ... fetch_content=True, ... url_col='url') >>> print(result[['url', 'llm_category_claude', 'llm_fetched_content']])
Default Categories:
hard_news: Politics, economics, international affairs, policysoft_news: Entertainment, lifestyle, sports, celebrity coverageopinion: Editorial content, opinion pieces, analysisbusiness: Business news, corporate announcements, financetechnology: Tech industry, product launches, innovationscience: Scientific discoveries, research, health studies