Red Pajama 2: The Public Dataset With a Whopping 30 Trillion Tokens

Together, the developer, claims it is the largest public dataset built specifically for language model pre-training.

Recent News articles, page 6 of 134

RedPajama's Giant 30T Token Dataset Shows that Data is the Next Frontier in LLMs

GPT-4 – Dr Alan D. Thompson – Life Architect

[2311.17035] Scalable Extraction of Training Data from (Production) Language Models

RLHF: Reinforcement Learning from Human Feedback

RedPajama-Data-v2: An open dataset with 30 trillion tokens for training large language models

Shamane Siri, PhD on LinkedIn: RedPajama-Data-v2: an Open Dataset with 30 Trillion Tokens for Training…