Could you compress text files by mapping a word to how commonly it is used and translating it with an application?

Corroded · 10 months ago

Could you compress text files by mapping a word to how commonly it is used and translating it with an application?

youngalfred@lemm.ee · 10 months ago

That’s pretty much what a tokenizer does for large language models like ChatGPT. You can see how it works here: https://platform.openai.com/tokenizer

Type in the word ‘Antidisestablishmentarianism’ and you can see it becomes 5 tokens instead of 28 characters.
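
To make the idea in the question concrete, here is a rough Python sketch of the "map each word to how common it is" approach. It is not how the OpenAI tokenizer works (that one splits text into subword pieces); it just ranks whole words and whitespace runs by frequency so the most common ones get the smallest numbers. The function names `build_codebook`, `encode`, and `decode` are made up for illustration.

```python
import re
from collections import Counter

# Split text into runs of word characters and runs of everything else,
# so joining the pieces back together reproduces the input exactly.
TOKEN_PATTERN = re.compile(r"\w+|\W+")


def build_codebook(text):
    """Return the distinct pieces of text, most frequent first."""
    pieces = TOKEN_PATTERN.findall(text)
    return [piece for piece, _count in Counter(pieces).most_common()]


def encode(text, codebook):
    """Replace each piece with its rank in the codebook (0 = most common)."""
    index = {piece: rank for rank, piece in enumerate(codebook)}
    return [index[piece] for piece in TOKEN_PATTERN.findall(text)]


def decode(ids, codebook):
    """Look each rank up in the codebook and join the pieces back together."""
    return "".join(codebook[i] for i in ids)


sample = "the cat and the dog and the bird"
codebook = build_codebook(sample)
ids = encode(sample, codebook)
restored = decode(ids, codebook)
```

Whether this actually saves space depends on how the numbers are stored and on the fact that the codebook has to be shipped along with the data (or agreed on in advance). A real compressor would give the small, common IDs shorter bit patterns, for example with Huffman coding.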