Recent Activity

ronantakizawa 
posted an update 2 months ago
Introducing the github-codereview dataset: a compilation of 200k+ human-written code reviews from top OSS projects (React, TensorFlow, VS Code...).

I fine-tuned a Qwen2.5-Coder-32B-Instruct model on this dataset and saw significantly better code fixes and review comments (a 4x improvement in BLEU-4, ROUGE-L, and SBERT scores compared to the base model).
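
For readers unfamiliar with the metrics named above, here is a minimal sketch of how a ROUGE-L score can be computed between a human review comment and a generated one. It is an illustration only, not the evaluation code used for the post: whitespace tokenization, lowercasing, and the balanced F1 variant are all assumptions, and the function names are hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference, candidate):
    """ROUGE-L F1 between two strings (assumed: lowercase, whitespace tokens)."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of candidate tokens in the LCS
    recall = lcs / len(ref)      # share of reference tokens in the LCS
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    human = "use a context manager to close the file"
    generated = "use a context manager here"
    print(round(rouge_l_f1(human, generated), 3))
```

A higher score means the generated comment shares a longer in-order run of words with the human-written one; it says nothing about whether the suggestion is correct.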

#codereview #code #datasets

ronantakizawa/github-codereview
ronantakizawa 
posted an update 3 months ago
Introducing the WebUI dataset: a compilation of screenshot-to-code pairs of modern websites detailing the styling, framework used, and box bounds for all viewports (desktop, mobile, tablet).

A Qwen2.5-VL-7B model fine-tuned on this dataset showed signs of improved performance on web design LLM benchmarks!

#web #ui #datasets

ronantakizawa/webui
ronantakizawa 
posted an update 3 months ago
Introducing the github-top-code dataset: a curated dataset of 1.3M+ source code files from GitHub's top-ranked developers.

I collected the best source code files from GitHub's highest-trending developers of all time and compiled a dataset to train LLMs to write well-structured, production-grade code.

#dataset #codedataset #pretraining

ronantakizawa/github-top-code
ronantakizawa 
posted an update 3 months ago
Introducing the LeetCode Assembly Dataset: a dataset of 400+ LeetCode problem solutions in assembly across x86-64, ARM64, MIPS64, and RISC-V using GCC & Clang at -O0/-O1/-O2/-O3 optimizations.

This dataset is perfect for teaching LLMs complex compiler behavior!
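
To make the scale of that compiler matrix concrete, the sketch below enumerates the combinations named in the post. It assumes every solution is built under every architecture, compiler, and optimization level, which the post does not state outright; the dictionary keys and the `build_matrix` helper are hypothetical and say nothing about the dataset's actual schema.

```python
from itertools import product

# Values taken from the post; how the dataset labels them is an assumption.
ARCHITECTURES = ["x86-64", "ARM64", "MIPS64", "RISC-V"]
COMPILERS = ["gcc", "clang"]
OPT_LEVELS = ["-O0", "-O1", "-O2", "-O3"]


def build_matrix():
    """Return one configuration per (architecture, compiler, opt level)."""
    return [
        {"arch": arch, "compiler": compiler, "opt": opt}
        for arch, compiler, opt in product(ARCHITECTURES, COMPILERS, OPT_LEVELS)
    ]


if __name__ == "__main__":
    matrix = build_matrix()
    print(len(matrix), "configurations per problem, if all are built")
    print(matrix[0])
```

Under that assumption each problem would have up to 32 assembly variants, which is what lets a model compare the same source across optimization levels and instruction sets.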

#dataset #leetcode #assembly

ronantakizawa/leetcode-assembly
ronantakizawa 
posted an update 3 months ago
Hit 10,000+ downloads across my models and datasets on Hugging Face!

Follow @ronantakizawa for more!

#building #datasets #huggingface
jorgemunozl 
posted an update 4 months ago
Test

I know that it was buggy, OMG
ronantakizawa 
posted an update 4 months ago
Moltbook, a Reddit-like platform exclusively for AI agents, is going viral right now as agents are acting unhinged!

I compiled a dataset of all posts and subreddits on Moltbook so far, so anyone can easily analyze the activity there.

ronantakizawa/moltbook

#moltbook #clawd #aiagent
ronantakizawa 
posted an update 5 months ago
Introducing the HuggingFace Top Trending Papers dataset: a dataset that compiles the most trending papers on HuggingFace Daily Papers in 2025.

This dataset captures which AI/ML research papers gained the most community attention this year!

#huggingface #papers #dataset

ronantakizawa/huggingface-top-papers
ronantakizawa 
posted an update 5 months ago
Introducing the github-top-developers dataset: a comprehensive dataset of the top 8000 developers on GitHub (2020-2025). This dataset captures how the repositories of GitHub's trending developers, and the projects they work on, have evolved over time.

#github #developers

ronantakizawa/github-top-developers
ronantakizawa 
posted an update 5 months ago
Introducing the trending-stocks-yahoo-finance dataset: a compilation of the most trending stocks on Yahoo Finance from July 2024 to October 2025.

This dataset captures each trending stock's max price, max market cap, best rank on Yahoo Finance, PE ratio, and trading volume.

#stocks #investing #trading

ronantakizawa/trending-stocks-yahoo-finance
ronantakizawa 
posted an update 5 months ago
Introducing the github-top-projects dataset: a comprehensive dataset of 423,098 GitHub trending repository entries spanning 12+ years (August 2013 - November 2025).

This dataset captures the evolution of GitHub's trending repositories over time. It offers insight into software development trends across programming languages and domains, popular open-source projects and their trending patterns, and shifts in community interest and developer focus over those 12 years.

ronantakizawa/github-top-projects

#github #softwareengineering
ronantakizawa 
posted an update 5 months ago
Introducing the twitter-trending-hashtags dataset: a compilation of 12,000+ unique trending hashtags on Twitter / X from 2020 to 2025. This dataset captures viral and cultural moments on Twitter / X and is perfect for researchers studying viral content patterns on social media.

ronantakizawa/twitter-trending-hashtags

#twitter #trends #socialmedia
ronantakizawa 
posted an update 6 months ago
Introducing the tiktok-trending-hashtags dataset: a compilation of 1,830 unique trending hashtags on TikTok from 2022 to 2025. This dataset captures one-time and seasonal viral moments on TikTok and is perfect for researchers, marketers, and content creators studying viral content patterns on social media.

ronantakizawa/tiktok-trending-hashtags
#tiktok #trends #social-media
ronantakizawa 
posted an update 6 months ago
Reached 2500+ total downloads across my models and datasets! 🎉

Follow me for more @ronantakizawa
ronantakizawa 
posted an update 6 months ago
Introducing the india-trending-words dataset: a compilation of 900 trending Google searches from 2006 to 2024, based on https://trends.withgoogle.com. This dataset captures search trends in 80 categories and is perfect for analyzing cultural shifts and predicting future trends in India.

#india #indiadataset #googlesearches

ronantakizawa/india-trending-words
ronantakizawa 
posted an update 6 months ago
Introducing the japanese-trending-words dataset: a dataset consisting of 593 words from Japan’s annual trending word rankings (流行語大賞) from 2006 to 2025. This dataset provides the top 30 words from each year and their meanings in Japanese and English. This resource is great for NLP tasks that involve understanding recent Japanese culture and history.

ronantakizawa/japanese-trending-words

#japanese #japanesedataset #trending