Legal data creation | 1 reply | 28 views | May 16, 2026
[Dataset] CLI-1M: 975K NL→shell pairs — 13 languages, 6 shells, Apache-2.0 | 0 replies | 18 views | May 14, 2026
Synthetic Australian medical record PDF library (50-doc free sample) - feedback wanted on dataset | 0 replies | 32 views | May 7, 2026
PiC/phrase_retrieval dataset (PR-pass & PR-page) is broken — does anyone have a local copy? | 0 replies | 18 views | May 5, 2026
Anyone else fighting the “valid json, broken pipeline” problem in planner-executor stacks? | 4 replies | 49 views | May 3, 2026
TikTok-10M Dataset | 7 replies | 847 views | April 29, 2026
Dino Data Workflow Routing Preview: training models to route, structure, and prepare actions instead of only replying | 2 replies | 21 views | April 30, 2026
Built a lane-based dataset bundle explorer for LLM training — would love feedback from the HF community | 0 replies | 18 views | April 29, 2026
When Your “Labels” Aren’t Really Labels: Dealing with Entity-Based NLP Datasets | 1 reply | 28 views | April 26, 2026
Made a Python failure dataset for DPO/RLHF — how do you source negative examples? | 0 replies | 37 views | April 26, 2026
load_dataset() creates a duplicate in cache | 1 reply | 54 views | April 25, 2026
Spanish Historical Web Corpus — unique categories (religion, folklore, conspiracies, BOE) | 0 replies | 15 views | April 21, 2026
Dataset viewer broke after repo rename | 5 replies | 63 views | April 20, 2026
Hugging Face Dataset Download Stuck in Kaggle | 8 replies | 216 views | April 14, 2026
Add new official benchmark on the Hub | 3 replies | 68 views | April 13, 2026
Total AI beginner with a 25-year photography archive—is this useful for training? | 0 replies | 16 views | April 10, 2026
QSBench: Synthetic quantum circuit datasets for QML benchmarking | 0 replies | 38 views | April 6, 2026
I would like to get an opinion from knowledgeable people (since I don't understand anything about it myself) | 26 replies | 221 views | April 4, 2026
Request to delete DOI-locked dataset: th1nhng0/vietnamese-legal-documents | 2 replies | 33 views | April 1, 2026
Indic-faker: Generate realistic Indian synthetic data for NLP/ML — 8 languages, native scripts, batch DataFrame export | 3 replies | 61 views | March 30, 2026
What are some AI/ML concepts or problems you found difficult while learning? | 1 reply | 28 views | March 24, 2026
The download count of a dataset hasn't been updated | 2 replies | 36 views | March 19, 2026
Need help fine-tuning an OCR model at production grade | 1 reply | 119 views | March 12, 2026
Would a curated dataset of ~4000 social media design layouts be useful for training or fine-tuning design models? | 1 reply | 32 views | March 10, 2026
Hugging Face dataset card not working correctly | 1 reply | 65 views | March 9, 2026
Fastdedup: Rust-based dataset deduplication — benchmarks on FineWeb sample-10BT | 2 replies | 76 views | March 4, 2026
New Datasets: Human Vocality Primitives Series | 0 replies | 45 views | March 4, 2026
Any way to streaming-preprocess a dataset to disk? | 7 replies | 192 views | March 4, 2026
Inquiry About Dataset for AI-Driven Cloud Load Balancing and Autoscaling of Instances | 2 replies | 64 views | March 4, 2026
Looking for Data | 2 replies | 66 views | March 4, 2026