I’m trying to create a reliable way to track open-source AI news, including new model releases, key GitHub projects, and major community updates, but I’m overwhelmed by the number of sources and constant changes. What tools, feeds, or workflows are you using to stay up-to-date on open-source AI developments without spending hours every day?
Short version. Treat this like building a mini data pipeline, not a “news reading habit”.
Here is a practical setup that stays manageable.
- Decide what you track
Make 3 buckets.
Models
- New open source LLMs, VLMs, diffusion models.
- Benchmarks, evals, key blog posts with numbers.
Code
- Libraries like vLLM, llama.cpp, transformers, diffusers, flash-attn.
- Serving stacks like TGI, Ollama, Open WebUI.
- Inference optimizers like TensorRT, MLC.
Community
- Standards, licenses, drama with real impact.
- Large projects changing direction.
Write this down. If it is not in your buckets, ignore it.
- Pick a small source set
RSS still works and does not spam you.
Feeds worth adding
- Hugging Face blog RSS
- Hugging Face “Models” filter: license = open, sorted by downloads (they have RSS per tag).
- Papers with Code “Latest” + “Trending” RSS.
- arXiv: cs.CL, cs.LG, cs.AI with keyword filters “llm”, “diffusion”, “multimodal”, “open source”.
- GitHub Trending for Python, C++, Jupyter, TypeScript with topic filters “machine-learning”, “deep-learning”, “llm”, “nlp”.
Use one reader. Example stack
- FreshRSS or Miniflux on a cheap VPS.
- Or a local reader like Fluent Reader or RSS Guard.
Set update to 1 hour or 3 hours, not real time.
- Track GitHub like a robot, not with your eyeballs
Use GitHub features, not manual checking.
Step-by-step.
a) Star and “Watch” key repos
- transformers, diffusers, accelerate
- vLLM, llama.cpp, Ollama, text-generation-inference
- orgs that release open models, e.g. mistralai, deepseek-ai, Qwen, NousResearch, HuggingFaceH4
Set “Watch” to “Releases only” for most.
Use “All activity” only for a few core repos.
b) Use GitHub RSS
GitHub exposes Atom feeds for releases and commits. Examples
- https://github.com/OWNER/REPO/releases.atom
- https://github.com/OWNER/REPO/commits/BRANCH.atom
Feed those into your RSS reader.
c) Use GitHub Advanced Search saved links
Example URL for new open source LLM repos in the last week.
https://github.com/search?q=llm+license%3Aapache-2.0+created%3A%3E2026-02-01&type=Repositories&s=stars&o=desc
Tweak date and license, then bookmark. Check once per week.
- Automate filtering instead of reading everything
Write a small script in Python or JS.
Inputs
- Your RSS feeds
- GitHub search APIs
- Hugging Face Hub API
Logic idea
- Filter for keywords in title or summary: “release”, “v1.0”, “Llama”, “Mixtral”, “Qwen”, “DeepSeek”, “flash attention”, “quantization”.
- Keep only items from trusted domains.
- Drop “opinion” posts unless you really want them.
You can push the filtered items to
- A daily HTML page.
- A Telegram channel (bot).
- A Discord channel via webhook.
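A minimal sketch of that filter script, assuming feed entries have already been parsed into dicts (the shape feedparser produces); the keyword and domain lists are illustrative, not a recommendation:

```python
from urllib.parse import urlparse

# Illustrative lists; tune these to your own buckets.
KEYWORDS = {"release", "v1.0", "llama", "mixtral", "qwen", "deepseek",
            "flash attention", "quantization"}
TRUSTED_DOMAINS = {"huggingface.co", "github.com", "mistral.ai"}

def keep(entry: dict) -> bool:
    """Keep an entry if it matches a keyword AND comes from a trusted domain.

    `entry` is assumed to look like a feedparser entry:
    {"title": ..., "summary": ..., "link": ...}
    """
    text = (entry.get("title", "") + " " + entry.get("summary", "")).lower()
    if not any(kw in text for kw in KEYWORDS):
        return False
    domain = urlparse(entry.get("link", "")).netloc
    return domain in TRUSTED_DOMAINS

items = [
    {"title": "Qwen3 release", "summary": "weights", "link": "https://huggingface.co/Qwen"},
    {"title": "My thoughts on AGI", "summary": "", "link": "https://example.com/post"},
]
print([i["title"] for i in items if keep(i)])  # → ['Qwen3 release']
```

The same `keep` function works unchanged whether the entries come from RSS, the GitHub API, or the Hugging Face Hub, as long as you normalize them into that dict shape first.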
- Use Hugging Face Hub data
Hugging Face gives you good signals if you query it.
Example workflow
- Once per day, list top models sorted by downloads in last 7 days.
- Filter to license in [apache-2.0, mit, bsd-3-clause, mpl-2.0].
- Ignore “gguf” forks and obvious quant clones using simple string rules.
This gives you a ranked list of what the community actually downloads, not what gets hype on X.
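A sketch of that daily query against the Hub API, with the filtering kept as a pure function. Caveats: the Hub's `downloads` counter covers roughly the last month, not 7 days, the endpoint shape may change, and the quant-clone string rules here are purely illustrative:

```python
import json
from urllib.request import urlopen

OPEN_LICENSES = {"apache-2.0", "mit", "bsd-3-clause", "mpl-2.0"}

def is_open_non_quant(model: dict) -> bool:
    """Keep permissively licensed models, drop obvious quant clones.

    Assumes the Hub API shape: licenses appear as `license:<id>` tags,
    and quant clones usually say gguf/awq/gptq in the model id.
    """
    tags = model.get("tags", [])
    licensed = any(t.startswith("license:") and t.split(":", 1)[1] in OPEN_LICENSES
                   for t in tags)
    name = model.get("modelId", "").lower()
    quant = any(s in name for s in ("gguf", "awq", "gptq", "4bit"))
    return licensed and not quant

if __name__ == "__main__":
    # Top models by downloads; check the Hub API docs before relying on this.
    url = "https://huggingface.co/api/models?sort=downloads&direction=-1&limit=50"
    with urlopen(url) as resp:
        models = json.load(resp)
    for m in filter(is_open_non_quant, models):
        print(m["modelId"], m.get("downloads"))
```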
- Set a strict schedule
Without rules, you will drown.
Example rule set
- Daily, 10–15 minutes
  - Check your filtered feed or bot output.
  - Open at most 5 tabs. Close everything else.
- Weekly, 30–45 minutes
  - Review GitHub search bookmarks.
  - Look at Hugging Face weekly “top movers” via API.
  - Save 3–10 items to a “digest” note.
- Monthly
  - Prune feeds that did not give value that month.
  - Add at most 2 new sources.
- Build your own “AI news tracker” page
Simple static page works.
Options
- Use a GitHub repo with a cron workflow
  - Fetch RSS + APIs
  - Run your filter script
  - Generate index.html with the latest 50 items
  - Push to GitHub Pages
- Or use a tiny Flask or FastAPI service behind Nginx
  - Cache feeds in SQLite
  - Serve a simple UI with search / tags.
Keep the UI boring. Title, source, date, tags, link. No scrolling animations, no votes.
- Define your trust list
To avoid junk, curate a small list of sources you trust.
Examples
- Hugging Face blog
- OpenAI, Google DeepMind, Meta AI, Mistral, Anthropic, xAI, Stability
- Labs like EleutherAI, LAION, AllenAI, FAIR blog
- A few newsletters like “Latent Space”, “Import AI”, “BAIR blog” if they provide data, not vibes.
Rank them in your script.
- Level 1: always show.
- Level 2: show only if text contains “release”, “weights”, “code”, “benchmark”.
- Level 3: ignore.
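The three trust levels can be encoded as a small lookup; the domains and their levels below are illustrative stand-ins for your own list:

```python
# Trust levels per source domain; maintain your own mapping.
TRUST = {
    "huggingface.co": 1,    # Level 1: always show
    "mistral.ai": 2,        # Level 2: show only with release-ish words
    "example-blog.com": 3,  # Level 3: ignore
}
RELEASE_WORDS = ("release", "weights", "code", "benchmark")

def show(item: dict) -> bool:
    """Apply the trust-level rules; unknown sources default to ignore."""
    level = TRUST.get(item["source"], 3)
    if level == 1:
        return True
    if level == 2:
        text = item.get("title", "").lower()
        return any(w in text for w in RELEASE_WORDS)
    return False
```

Defaulting unknown sources to Level 3 is the important design choice: new junk domains stay invisible until you explicitly promote them.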
- Fight FOMO on purpose
Set explicit “no” rules.
Example no-list
- No tweets.
- No TikTok, YouTube Shorts “explainer” junk.
- No “10 prompts to do X” content.
- No benchmarks without full numbers table or code link.
These rules keep your tracker focused on open source progress, not content farming.
- If you want community input
If you want other people to use this tracker:
- Publish the code, config, and a short README.
- Expose a JSON feed of your filtered items.
- Log your filter rules and change history in the repo.
This way the thing stays transparent and others can fork it for their own focus.
Rough minimal stack example
- Backend: Python, feedparser, requests, sqlite3.
- Scheduler: cron or GitHub Actions.
- Frontend: one HTML file with vanilla JS and a table.
- Deploy: GitHub Pages or a cheap VPS nginx site.
You end up with:
- A small source list.
- Automated filters.
- A single place you check once per day.
If you share what stack you like using and your comfort with code, it is easy to sketch a concrete script layout for you.
If you treat this as a product instead of just a pipeline, it gets a lot easier to reason about. @nachtdromer nailed the plumbing side; I’d focus on the “what makes this actually useful to a human” side.
Here’s a different angle:
1. Define the deliverable first, not the sources
Decide what you want your tracker to spit out:
- “Daily top 10 things worth my time”
- Each item tagged like: [Model], [Library], [Infra], [Community]
- One-line human summary + link
If you can’t describe your output format in 5 lines, you’ll end up hoarding feeds and never shipping.
Design a mock “issue #1” of your ideal digest in a markdown file. Literally write fake entries. That becomes your spec. Then you work backward to make the system produce something that looks like that.
2. Add a second layer: impact scoring
Instead of pure keyword filtering like in @nachtdromer’s setup, add a tiny scoring system so your tracker has a sense of “importance,” not just “contains the word LLM.”
Example heuristic score (0 to 10):
- Base score:
- +4 if from a core org (HF, Meta, Mistral, Qwen, DeepSeek, Eleuther, LAION, etc.)
- +2 if from a known high-signal person or project (you maintain this list in a YAML file)
- Activity-based:
- +1 to +3 if GitHub repo stars in last 3 days > threshold
- +1 if paper also appears on Papers with Code “trending”
- Content-based:
- +2 if title/body includes “weights”, “code released”, “license:”
- −2 if title includes “opinion”, “thoughts on”, “future of”
Then your daily script:
- Fetch everything
- Score each item
- Only keep items with score ≥ 5
- Hard cap: top N by score per day
This kills a lot of FOMO noise without you having to manually babysit.
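The heuristic above as a sketch. The field names (`org`, `stars_3d`, `on_pwc_trending`, `text`) are this example's assumptions, and returning the reasons alongside the score is what later lets the tracker explain itself:

```python
# Illustrative core-org list; keep yours in a config file.
CORE_ORGS = {"huggingface", "meta", "mistral", "qwen", "deepseek", "eleuther", "laion"}

def score(item: dict) -> tuple:
    """Heuristic 0-10 score plus the reasons it got that score."""
    s, reasons = 0, []
    if item.get("org", "").lower() in CORE_ORGS:
        s += 4; reasons.append("core org")
    stars = item.get("stars_3d", 0)
    if stars > 0:  # +1 to +3 depending on star velocity (thresholds illustrative)
        s += min(3, 1 + stars // 500); reasons.append(f"+{stars} stars/3d")
    if item.get("on_pwc_trending"):
        s += 1; reasons.append("PwC trending")
    text = item.get("text", "").lower()
    if any(w in text for w in ("weights", "code released", "license:")):
        s += 2; reasons.append("weights/code")
    if any(w in text for w in ("opinion", "thoughts on", "future of")):
        s -= 2; reasons.append("opinion penalty")
    return max(0, min(10, s)), reasons

def daily_top(items, threshold=5, cap=10):
    """Score everything, keep score >= threshold, hard-cap the daily list."""
    scored = [(score(i)[0], i) for i in items]
    kept = [(s, i) for s, i in scored if s >= threshold]
    return [i for s, i in sorted(kept, key=lambda p: -p[0])[:cap]]
```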
3. Use cross-signals instead of more sources
You don’t need more feeds, you need agreement across them.
For example:
- If a model appears on Hugging Face + has a GitHub repo + shows on Papers with Code trending, auto-mark it “high priority”
- If a new GitHub project spikes stars and matches “llm | inference | quantization” and also gets mentioned on a blog feed, promote it
So instead of “read from 50 sources,” you’re doing “only show me things that at least 2 sources implicitly agree are interesting.” That tends to track real impact better than aggregating every random blog.
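The agreement check itself is tiny, assuming entity resolution (matching the HF model to its GitHub repo and PwC entry under one slug) happens upstream:

```python
from collections import defaultdict

def promote_by_agreement(sightings, min_sources=2):
    """Return the entity slugs seen by at least `min_sources` distinct feeds.

    `sightings` is a list of (entity_slug, source) pairs; how you resolve
    a paper, a repo, and a model card to one slug is up to your pipeline.
    """
    seen = defaultdict(set)
    for slug, source in sightings:
        seen[slug].add(source)
    return {slug for slug, srcs in seen.items() if len(srcs) >= min_sources}

high_priority = promote_by_agreement([
    ("qwen2.5-coder", "huggingface"),
    ("qwen2.5-coder", "github"),
    ("random-blog-post", "blog"),
])
print(high_priority)  # → {'qwen2.5-coder'}
```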
4. Make humans part of the loop, lightly
Disagreeing slightly with the pure automation approach: a 100% filter-based tracker will drift or miss weird-but-important stuff (e.g. a tiny repo that later blows up).
Two cheap feedback mechanisms:
- A simple “keep / meh / junk” triage button for yourself
  - Store that in a small DB table: item_id, label
  - Use it offline later to adjust your scoring rules
- A tiny “submit interesting link” form
- Community posts flow into a separate queue
- They get a lower default score until confirmed by your rules or by you
You are not training an ML model here, just iterating your heuristics with actual usage data.
5. Make the tracker explain itself
One underrated idea: surface why something appeared.
Show along with each item:
- “Reason: HF + GitHub + >500 stars in last 3 days”
- Or “Reason: From trusted org + contains ‘weights’ + ‘Apache-2.0’”
This transparency does two things:
- Makes it easier to spot bad rules
- Makes the tool actually nicer to use, because you can quickly decide if you care
If you ever open your tracker and think “why is this garbage here,” your system isn’t explainable enough.
6. Don’t build a “feed,” build a diff viewer
Most AI news is “version N+1 of thing you already know about.”
So model your UI around changes, not items:
- “New models this week”
- “Major version bumps”
- “Repos crossing X stars for the first time”
- “License changes or repo archivals”
That way your page reads like:
- llama.cpp: v0.0.X to v0.0.Y, changelog mentions “Apple”, “Metal”
- New repo: deepseek-v2 [Model][Apache-2.0], now above 5k stars
- HF model: Qwen2.5-Coder-32B, new, >3k downloads in 24h
Diffs compress the firehose down to the stuff that actually changed in the ecosystem.
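A sketch of the diff layer: compare yesterday's snapshot to today's and emit change events. The snapshot shape and the star milestones are this example's assumptions:

```python
def diff_events(prev: dict, curr: dict, star_marks=(1000, 5000, 10000)):
    """Emit change events between two daily snapshots keyed by project slug.

    Each snapshot value is assumed to look like {"version": str, "stars": int}.
    """
    events = []
    for slug, now in curr.items():
        before = prev.get(slug)
        if before is None:
            events.append((slug, "new project"))
            continue
        if now.get("version") != before.get("version"):
            events.append((slug, f"version {before['version']} -> {now['version']}"))
        for mark in star_marks:
            # Fires exactly once: only when the milestone is crossed today.
            if before.get("stars", 0) < mark <= now.get("stars", 0):
                events.append((slug, f"crossed {mark} stars"))
    return events

prev = {"llama.cpp": {"version": "b3990", "stars": 900}}
curr = {"llama.cpp": {"version": "b4000", "stars": 1200},
        "new-repo": {"version": "0.1", "stars": 10}}
for slug, what in diff_events(prev, curr):
    print(slug, "-", what)
```

Because events only fire on transitions, the page stays empty on quiet days instead of re-listing everything you already saw.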
7. Keep “context” separate from “events”
If you mix explainers, thinkpieces, and actual releases in the same feed, you’ll drown.
So:
- Tracker = events only
- Releases, code drops, new benchmarks, license clarifications
- Context = long-form
  - A separate Obsidian vault / Notion / markdown folder where you save:
    - 1 or 2 good explainers a week
    - Notes on “what this model is actually good at” after you try it
Your tool should show only what happened. Your notes are why it matters to you.
8. Start ugly, but commit to a 4-week refactor
Very specific suggestion:
- Week 1
  - Hardcode 10 sources
  - Hardcode a dumb scoring algorithm
  - Dump JSON + a primitive HTML page
- Week 2
  - Add tags: [Model] [Code] [Infra] [Community]
  - Add explanation text per item
- Week 3
  - Add a tiny SQLite DB and an “already_seen” table; stop resurfacing old stuff
- Week 4
  - Review what you actually clicked
  - Adjust scores and sources based on that, not vibes
By the end of 4 weeks you’ll have something opinionated instead of a generic aggregator, which is where it becomes reliable and personally useful.
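The week-3 “already_seen” idea is a few lines with stdlib sqlite3 (in-memory here for the sketch; point it at a file in the real tracker):

```python
import sqlite3

db = sqlite3.connect(":memory:")  # use a file path in production
db.execute("CREATE TABLE IF NOT EXISTS already_seen (item_id TEXT PRIMARY KEY)")

def is_new(item_id: str) -> bool:
    """Return True the first time an item id is seen, False afterwards.

    The PRIMARY KEY constraint does the dedup work for us.
    """
    try:
        db.execute("INSERT INTO already_seen VALUES (?)", (item_id,))
        db.commit()
        return True
    except sqlite3.IntegrityError:
        return False

print(is_new("hf:qwen2.5-coder"))  # True
print(is_new("hf:qwen2.5-coder"))  # False
```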
If you share which language / stack you’re most comfy with, you can absolutely sketch this in like 2–3 files and keep it open-source without needing some monster microservice zookeeper k8s circus.
You’re already drowning in sources, so instead of more plumbing I’d lean into structure and opinion. Think of this as building an “editor in code” rather than a crawler.
1. Start from topics, not feeds
Where I slightly disagree with the scoring-first approach: if you do not have stable topic buckets, your scores will drift into mush.
Define 5–7 stable tracks like:
- Foundation models & checkpoints
- Inference & optimization (quant, vLLM, serving, GPU tricks)
- Tooling & agents
- Data & evaluation
- Governance & licenses
Each item must belong to exactly one track. If it fits more than one, you split it into two entries. This forces your tracker to be opinionated about what kind of news it thinks it is.
Implementation trick:
- Maintain a simple topics.yml with:
  - Name, short description
  - A few positive keywords and a few exclusion words
Use this as a routing layer on top of whatever ingestion pipeline you build (RSS, GitHub, Hugging Face, etc.).
Now your “daily top 10” becomes “top 2 in each track” which is psychologically easier to process than a single global list.
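The routing layer can be sketched like this; the dicts stand in for topics.yml, the field names are this example's choice, and first-match ordering is what enforces the “exactly one track” rule:

```python
from typing import Optional

# Stand-in for topics.yml; order matters, first match wins.
TOPICS = [
    {"name": "Inference & optimization",
     "include": ["vllm", "quant", "serving", "kernel"],
     "exclude": ["dataset"]},
    {"name": "Data & evaluation",
     "include": ["benchmark", "eval", "dataset"],
     "exclude": []},
]

def route(title: str) -> Optional[str]:
    """Assign an item to the first matching track; None means 'drop'."""
    text = title.lower()
    for topic in TOPICS:
        if any(w in text for w in topic["exclude"]):
            continue
        if any(w in text for w in topic["include"]):
            return topic["name"]
    return None

print(route("New quantization kernel for vLLM"))  # Inference & optimization
print(route("Thoughts on AGI"))                   # None
```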
2. Treat projects as entities with history
Both you and @nachtdromer are flirting with this idea, but I’d hard-commit: your core unit should be a “project profile,” not a “link.”
For each project/model:
- id (stable slug)
- Canonical URLs: GitHub, HF, paper, website
- Tags: model, serving, quant, rl, etc.
- Signals over time: stars, downloads, new releases, papers citing
Then your tracker becomes a change log over these entities:
- “New project added”
- “Release v0.9.0 → v1.0.0”
- “Stars crossed 10k”
- “Weights released for existing paper”
That naturally gives you the “diff viewer” effect without special UI magic. It also helps reduce duplicates when the same thing appears as a paper, a repo and a blog post.
You can store this in SQLite with three tables: projects, events, sources.
Every ingest step just appends events, and a daily job summarizes “what changed per project.”
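A sketch of the three-table layout and the daily per-project summary, using stdlib sqlite3 (column names are this example's choice, not a canonical schema):

```python
import sqlite3

db = sqlite3.connect(":memory:")  # use a file path in the real tracker
db.executescript("""
CREATE TABLE projects (id TEXT PRIMARY KEY, name TEXT);
CREATE TABLE sources  (id INTEGER PRIMARY KEY, project_id TEXT, kind TEXT, url TEXT);
CREATE TABLE events   (id INTEGER PRIMARY KEY, project_id TEXT, day TEXT, what TEXT);
""")

def add_event(project_id: str, day: str, what: str) -> None:
    """Ingest steps only ever append; summaries are derived later."""
    db.execute("INSERT INTO events (project_id, day, what) VALUES (?, ?, ?)",
               (project_id, day, what))

db.execute("INSERT INTO projects VALUES ('llama.cpp', 'llama.cpp')")
add_event("llama.cpp", "2025-01-10", "release b4000")
add_event("llama.cpp", "2025-01-10", "crossed 70k stars")
db.commit()

# Daily job: "what changed per project" in one query.
rows = db.execute("""
    SELECT project_id, group_concat(what, '; ')
    FROM events WHERE day = ? GROUP BY project_id
""", ("2025-01-10",)).fetchall()
print(rows)  # one summarized row per project for that day
```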
3. Build opinions into the code
Instead of endless rules, write down explicit editorial stances and encode them.
Examples:
- “I care more about permissively licensed models than anything else.”
- “I do not surface closed English-only models unless they break a benchmark by a large margin.”
- “Agent frameworks are noise unless they exceed X adoption or show novel capabilities.”
Turn those into explicit filters:
- License whitelist/blacklist that heavily affects score
- Special treatment for “Apache-2.0 / MIT / BSD” releases
- Explicit caps per topic: e.g., at most 1 agent framework story per day
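Those stances can be sketched as a small editorial pass; the field names, bonus values, and caps here are illustrative:

```python
# Editorial stances as data: license bonuses and per-topic story caps.
LICENSE_BONUS = {"apache-2.0": 3, "mit": 3, "bsd-3-clause": 2}
TOPIC_CAPS = {"agents": 1}  # at most one agent-framework story per day

def editorial_pass(items):
    """Boost permissive licenses, then enforce per-topic daily caps.

    Items are dicts with score, license, topic (field names assumed).
    """
    for it in items:
        it["score"] = it.get("score", 0) + LICENSE_BONUS.get(it.get("license"), 0)
    items.sort(key=lambda it: -it["score"])
    shown, used = [], {}
    for it in items:
        topic = it.get("topic", "misc")
        cap = TOPIC_CAPS.get(topic)
        if cap is not None and used.get(topic, 0) >= cap:
            continue  # this topic already hit its daily quota
        used[topic] = used.get(topic, 0) + 1
        shown.append(it)
    return shown
```

Because the stances live in two small dicts, changing your editorial line is a config edit, not a code rewrite.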
This is where you diverge from a generic aggregator and become useful to yourself (and maybe others). The output should reflect your priorities, not “what’s objectively important.”
4. Put manual curation in the right place
I agree you should not hand-curate the whole feed. But I also think button-based “keep / meh / junk” can still be too time consuming.
Alternative:
- Fully automatic daily digest
- Once a week, a 10–15 minute manual pass over:
- “Promote to ‘must read’”
- “Never show this project again”
Both are project-level, not item-level. This way one click fixes a whole future class of noise.
Your UI only needs two lists per week:
- New projects that entered the system
- Projects that generated events this week
You skim, tag, and move on. Over time your heuristic engine stops seeing junk not because the rules got smarter, but because the world of “things you care about” got cleaner.
5. Think about format like a newsletter editor
Before you build anything, design 3 example issues in Markdown:
- A “normal” day
- A crazy release day (multiple big models)
- A quiet weekend
Force yourself to fit each into:
- 1 screen of “headline” items
- 1 screen of “nice to skim”
- 1 tiny “long tail / niche” section
You will notice quickly if you’re over-indexing on one area (e.g. yet another loader for the same models). Use that to reshape your topic buckets and editorial rules.
6. About the empty product title: pros & cons
You mentioned integrating a specific product, but no real name or feature set came through in your post, so I can only speak at a pattern level. Here is how a dedicated “AI news tracker” product typically stacks up compared to rolling your own:
Pros of using such a product:
- Faster setup than building an entity + event system from scratch
- Likely has a nicer UI than a homemade HTML + JSON dump
- Might already integrate common sources like GitHub, HF, Papers with Code
- Often supports saved searches, tags and basic scoring out of the box
- Easier to share with others if it has user accounts and feeds
Cons of using such a product:
- Harder to encode your deeply opinionated filters
- Limited control over data model (projects vs links vs events)
- You depend on someone else’s update latency and source choices
- Export / backup can be weaker than your own SQLite + Git repo
- If it is not open source itself, transparency and extensibility will be limited
The sweet spot I see: use a product like that as the front-end digest viewer and your open source tracker as the back-end curator. Your code produces a clean, opinionated RSS/JSON feed; the product just renders it nicely and maybe adds search or tagging. That way you get custom logic without building your own UI stack.
7. Positioning vs @nachtdromer
The approach from @nachtdromer nails the pipeline and filtering layer; it is essentially “better plumbing and signal heuristics.”
What I’m suggesting here is to go one abstraction level higher:
- Treat AI projects as entities with life cycles
- Encode personal editorial stances directly in code
- Constrain everything into a small, opinionated layout
If you do that, even a crude scoring model and a small list of sources will give you a far more “human-usable” tracker than a super sophisticated ingestion system that does not know what you personally care about.