CAD$20+

LLM Training Dataset 100k Elementary Animal Comparisons QA

The dataset is synthetic, generated directly from Java. Examples can be viewed and interacted with on Hugging Face: https://huggingface.co/datasets/CJJones/LLM_Training_Animal_Comparisons

Pricing is low on this one because some entries show a dip in quality. The recommended use is as prompt augmentation, for more variety and corrections.
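One possible reading of "prompt augmentation" is to prepend a few sampled question-answer pairs to a new question as few-shot context. The sketch below illustrates that idea only; the function name `build_augmented_prompt`, the `question` and `answer` field names, and the sample rows are all hypothetical and should be adjusted to match the actual columns shown in the Hugging Face example.

```python
import random


def build_augmented_prompt(question, examples, k=3, seed=0,
                           question_key="question", answer_key="answer"):
    """Prepend up to k randomly sampled QA pairs to a question as few-shot context.

    The key names are assumptions; change them to the dataset's real column names.
    """
    rng = random.Random(seed)  # fixed seed keeps the sampling reproducible
    shots = rng.sample(examples, min(k, len(examples)))
    lines = []
    for ex in shots:
        lines.append(f"Q: {ex[question_key]}")
        lines.append(f"A: {ex[answer_key]}")
        lines.append("")  # blank line between examples
    lines.append(f"Q: {question}")
    lines.append("A:")
    return "\n".join(lines)


# Hypothetical rows standing in for real dataset entries.
rows = [
    {"question": "Which is bigger, an elephant or a mouse?", "answer": "An elephant."},
    {"question": "Which has more legs, a spider or a dog?", "answer": "A spider."},
    {"question": "Which can fly, a sparrow or a turtle?", "answer": "A sparrow."},
]

prompt = build_augmented_prompt("Which is faster, a cheetah or a snail?", rows, k=2)
print(prompt)
```

Because some entries are lower quality, it may be worth reviewing or filtering the sampled pairs before they are used as context.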

Note: Once several copies of this data have been sold, the full dataset will be released into the public domain. My projects are intended for community interest; models and data are released as time and effort are balanced against income. By buying the full dataset available now, you directly support open data development.

For a full listing of related products, see https://huggingface.co/CJJones

The dataset is provided as shown in the linked Hugging Face example.

Copyright © C.J. Jones, 2025

By purchasing this dataset, you are granted a non-exclusive, worldwide, perpetual license to use, modify, and distribute the dataset and its derivatives for commercial and non-commercial purposes, subject to the terms below:

✅ You May:

  • Use the dataset in commercial applications, products, models, and research.
  • Modify, adapt, and build upon the dataset for any use.
  • Redistribute derived works, models, or outputs generated using the dataset.

❌ You May Not:

  • Resell, redistribute, or repackage the raw dataset itself in its original form.
  • Claim exclusive ownership or authorship of the dataset.
  • Use the dataset for unlawful, harmful, or deceptive purposes.

💬 Attribution (Optional):

Attribution is appreciated but not required.

Note: This dataset is being sold to fund the development of a 500M-1B, fully transparent GPT community model. All data used in the model will eventually be made open to the public with its release.

⚠️ Disclaimer:

The dataset is provided "as is", without warranties or guarantees of any kind. The creator assumes no liability for any direct or indirect damages resulting from the use of the dataset.

Size: 23.2 MB