Cosmos-T-80M

Chain-of-Thought Chat Demo

79.7M params · 12 attention layers · Qwen2.5 tokenizer · Apache-2.0

Pretrained from scratch on wop/XXXXXL-chain-of-thought · Model card: wop/Cosmos-T-80M

⚠️ Research / demo model. It was trained on only 840 conversations, so it is heavily overfit and will hallucinate confidently outside its training distribution. Treat it as a stylish parrot, not a fact source.
Adjustable settings (with ranges):

- System prompt
- Temperature: 0 to 2
- Top-K: 1 to 200
- Context window: 64 to 1028 (max 1028)
- Max new tokens: 16 to 1028 (max 1028)

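**Tips:** Keep `temp = 0.1` and `top_k = 1` for the most coherent output. With `top_k = 1` only the single most likely token is ever a candidate, so decoding is effectively greedy and the temperature has no influence. For more creative (but messier) replies, raise `temp` to 0.8 or higher and set `top_k` above 1 so there is more than one candidate to sample from. Clear the chat if responses start looping.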
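
To make the interaction between `temp` and `top_k` concrete, here is a minimal sketch of top-k sampling with temperature in plain Python. It is an illustration under stated assumptions, not the demo's actual decoding code: the function name `sample_top_k` and the toy logits are hypothetical, and the real model may apply these steps in a different order or with extra filters.

```python
import math
import random


def sample_top_k(logits, temperature=0.1, top_k=1, rng=None):
    """Pick a token index from raw logits (hypothetical helper, not the demo's code)."""
    rng = rng or random.Random()
    # Clamp top_k to a valid range, then keep the k highest-scoring indices.
    k = max(1, min(top_k, len(logits)))
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # With one candidate (or a non-positive temperature) the choice is greedy.
    if k == 1 or temperature <= 0:
        return top[0]
    # Temperature-scaled softmax over the survivors; subtracting the max
    # keeps exp() numerically stable.
    scaled = [logits[i] / temperature for i in top]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(top, weights=weights, k=1)[0]


toy_logits = [1.0, 3.0, 2.0]  # made-up scores for a 3-token vocabulary
greedy = sample_top_k(toy_logits, temperature=0.1, top_k=1)
creative = sample_top_k(toy_logits, temperature=0.8, top_k=3, rng=random.Random(0))
```

In this sketch `greedy` is always the highest-scoring index, whatever the temperature, while `creative` can land on any of the three indices, with lower-scoring tokens becoming more likely as the temperature rises.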