Quick Start

>>> import pandas as pd
>>> from notnews import *

>>> # Get help
>>> help(soft_news_url_cat_us)

Help on method soft_news_url_cat in module notnews.soft_news_url_cat:

soft_news_url_cat(df, col='url') method of builtins.type instance
    Soft News Categorize by URL pattern.

    Using the URL pattern to categorize the soft/hard news of the input
    DataFrame.

    Args:
        df (:obj:`DataFrame`): Pandas DataFrame containing the URL
            column.
        col (str or int): Column's name or location of the URL in
            DataFrame (default: url).

    Returns:
        DataFrame: Pandas DataFrame with additional columns:
            - `soft_lab` set to 1 if URL match with soft news URL pattern.
            - `hard_lab` set to 1 if URL match with hard news URL pattern.

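The help text says only that each URL is matched against soft news and hard news patterns. Here is a minimal sketch of that idea, assuming plain regular expressions. The pattern lists and the `url_pattern_cat` helper are hypothetical and are not notnews's actual patterns or implementation. The sketch does reproduce the documented output shape: 1 on a match, NaN otherwise, with `col` given as a name or a position.

```python
import re

import numpy as np
import pandas as pd

# HYPOTHETICAL pattern lists, invented for this sketch. notnews ships its
# own pattern lists, which are not reproduced here.
SOFT_PATTERNS = [r'/(entertainment|sports|travel|lifestyle)/']
HARD_PATTERNS = [r'/(politics|world|business)/']


def url_pattern_cat(df, col='url'):
    """Add soft_lab/hard_lab columns: 1.0 where the URL matches, NaN otherwise.

    `col` may be a column name (str) or a column position (int), mirroring
    the docstring above. Hypothetical helper, not part of notnews.
    """
    out = df.copy()
    urls = (out.iloc[:, col] if isinstance(col, int) else out[col]).astype(str)
    soft_re = re.compile('|'.join(SOFT_PATTERNS))
    hard_re = re.compile('|'.join(HARD_PATTERNS))
    # search() looks for a match anywhere in the URL, not only at the start.
    out['soft_lab'] = np.where(urls.map(lambda u: soft_re.search(u) is not None), 1.0, np.nan)
    out['hard_lab'] = np.where(urls.map(lambda u: hard_re.search(u) is not None), 1.0, np.nan)
    return out


demo = pd.DataFrame({'url': [
    'http://example.com/politics/2017/05/03/story',
    'http://example.com/sports/game-recap',
    'http://example.com/weather/today',
]})
print(url_pattern_cat(demo))
```

A URL that matches neither list gets NaN in both columns. The real sample output below shows the same behavior.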
>>> # Load data
>>> df = pd.read_csv('./tests/sample_us.csv')
>>> df
            src                                                url                                               text
0             nyt  http://www.nytimes.com/2017/02/11/us/politics/...  Mr. Kushner on something of a crash course in ...
1  huffingtonpost  http://grvrdr.huffingtonpost.com/302/redirect?...  Authorities are still searching for a man susp...
2             nyt  http://www.nytimes.com/2016/09/19/us/politics/...  Photo  WASHINGTON — In releasing a far more so...
3          google  http://www.foxnews.com/world/2016/07/17/turkey...  The Turkish government on Sunday ratcheted up ...
4             nyt  http://www.nytimes.com/interactive/2016/08/29/...  NYTimes.com no longer supports Internet Explor...
5           yahoo  https://www.yahoo.com/news/pittsburgh-symphony...  PITTSBURGH AP — Pittsburgh Symphony Orchestra ...
6         foxnews  http://www.foxnews.com/politics/2016/08/13/cli...  Hillary Clintons campaign is questioning a rep...
7         foxnews  http://www.foxnews.com/us/2017/04/15/april-gir...  April the giraffe has given birth at a New Yor...
8         foxnews  http://www.foxnews.com/politics/2017/05/03/hil...  Want FOX News Halftime Report in your inbox ev...
9             nyt  http://www.nytimes.com/2016/09/06/obituaries/p...  Shes an extremely liberated woman Ms. DeCrow s...
>>>
>>> # Get the Soft News URL category
>>> df_soft_news_url_cat_us = soft_news_url_cat_us(df, col='url')
>>> df_soft_news_url_cat_us
            src                                                url                                               text  soft_lab  hard_lab
0             nyt  http://www.nytimes.com/2017/02/11/us/politics/...  Mr. Kushner on something of a crash course in ...       NaN       1.0
1  huffingtonpost  http://grvrdr.huffingtonpost.com/302/redirect?...  Authorities are still searching for a man susp...       NaN       NaN
2             nyt  http://www.nytimes.com/2016/09/19/us/politics/...  Photo  WASHINGTON — In releasing a far more so...       NaN       1.0
3          google  http://www.foxnews.com/world/2016/07/17/turkey...  The Turkish government on Sunday ratcheted up ...       NaN       1.0
4             nyt  http://www.nytimes.com/interactive/2016/08/29/...  NYTimes.com no longer supports Internet Explor...       NaN       1.0
5           yahoo  https://www.yahoo.com/news/pittsburgh-symphony...  PITTSBURGH AP — Pittsburgh Symphony Orchestra ...       1.0       NaN
6         foxnews  http://www.foxnews.com/politics/2016/08/13/cli...  Hillary Clintons campaign is questioning a rep...       NaN       1.0
7         foxnews  http://www.foxnews.com/us/2017/04/15/april-gir...  April the giraffe has given birth at a New Yor...       NaN       NaN
8         foxnews  http://www.foxnews.com/politics/2017/05/03/hil...  Want FOX News Halftime Report in your inbox ev...       NaN       1.0
9             nyt  http://www.nytimes.com/2016/09/06/obituaries/p...  Shes an extremely liberated woman Ms. DeCrow s...       NaN       NaN
>>>
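In the output above, each indicator is 1.0 when its pattern matched and NaN otherwise. Some URLs match neither list (rows 1, 7 and 9 here). If you want a single label per row, you can collapse the two columns. The sketch below copies the `soft_lab`/`hard_lab` values from the sample output. The `collapse_labels` helper and its label names ('soft', 'hard', 'ambiguous', 'unknown') are our own convention, not part of notnews.

```python
import numpy as np
import pandas as pd

# soft_lab / hard_lab values copied from the sample output above
# (1.0 = pattern matched, NaN = no match).
labels = pd.DataFrame({
    'soft_lab': [np.nan, np.nan, np.nan, np.nan, np.nan, 1.0, np.nan, np.nan, np.nan, np.nan],
    'hard_lab': [1.0, np.nan, 1.0, 1.0, 1.0, np.nan, 1.0, np.nan, 1.0, np.nan],
})


def collapse_labels(df: pd.DataFrame) -> pd.Series:
    """Collapse the two indicator columns into one label per row.

    'ambiguous' covers a URL matching both pattern sets and 'unknown' a URL
    matching neither. Hypothetical helper; the label names are our own.
    """
    soft = df['soft_lab'].eq(1)  # NaN compares as False
    hard = df['hard_lab'].eq(1)
    return pd.Series(
        np.select([soft & hard, soft, hard], ['ambiguous', 'soft', 'hard'], default='unknown'),
        index=df.index,
        name='url_label',
    )


labels['url_label'] = collapse_labels(labels)
print(labels['url_label'].value_counts())
```

Keeping 'unknown' as an explicit label makes it easy to route unmatched URLs to a text-based classifier such as the LLM option below.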

LLM-based Classification

>>> from notnews import llm_classify_news
>>>
>>> # Classify news text with an LLM (Claude or OpenAI)
>>> # Requires: pip install notnews[llm] and, for provider='claude',
>>> # the ANTHROPIC_API_KEY environment variable
>>>
>>> df_sample = pd.DataFrame({
...     'text': [
...         'Federal Reserve raises interest rates by 0.25% citing inflation',
...         'Taylor Swift breaks attendance records at sold-out concert'
...     ]
... })
>>>
>>> result = llm_classify_news(df_sample, provider='claude')
>>> print(result[['text', 'llm_category_claude', 'llm_confidence_claude']])
                                               text llm_category_claude  llm_confidence_claude
0  Federal Reserve raises interest rates by 0.25...           hard_news                   0.95
1  Taylor Swift breaks attendance records at sol...           soft_news                   0.92
>>>
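Each LLM label comes with a confidence score, so a common next step is to keep confident labels and set the rest aside for manual review. The sketch below assumes the output columns follow the `llm_<field>_<provider>` naming seen in the example above. That naming is confirmed here only for `provider='claude'`. The `split_by_confidence` helper and the threshold values are hypothetical choices for illustration. The data is copied from the example output.

```python
import pandas as pd


def split_by_confidence(result, provider='claude', threshold=0.9):
    """Split LLM output into confident rows and rows needing review.

    Hypothetical helper. Assumes the confidence column is named
    llm_confidence_<provider>, as in the example above.
    """
    conf_col = f'llm_confidence_{provider}'
    mask = result[conf_col] >= threshold
    return result[mask], result[~mask]


# Values copied from the example output above.
result = pd.DataFrame({
    'text': ['Federal Reserve raises interest rates by 0.25% citing inflation',
             'Taylor Swift breaks attendance records at sold-out concert'],
    'llm_category_claude': ['hard_news', 'soft_news'],
    'llm_confidence_claude': [0.95, 0.92],
})

confident, review = split_by_confidence(result, threshold=0.93)
print(confident[['llm_category_claude', 'llm_confidence_claude']])
print(len(review), 'row(s) flagged for review')
```

The threshold is a trade-off between coverage and precision. It is worth tuning on a hand-labeled sample rather than fixing it in advance.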