No Stupid Questions @lemmy.world Corroded @leminal.space 1 yr. ago

Could you compress text files by mapping a word to how commonly it is used and translating it with an application?

It's a bit of a weird shower thought, but I was wondering whether it would be possible to take data from a social media site like Reddit, rank the most commonly used words starting at 1, and use a separate application to translate text to and from those numbers.

So if the word "because" were number 100, it would be stored with three characters instead of seven.
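To make the idea concrete, here is a rough Python sketch of the word-to-rank mapping. The function names (`build_rank_table`, `encode`, `decode`) and the tiny sample corpus are made up for illustration, and the sketch assumes lowercase words with punctuation thrown away, so it is lossy as written.

```python
import re
from collections import Counter

WORD = re.compile(r"[a-z']+")  # assumption: words are runs of letters/apostrophes


def build_rank_table(corpus: str) -> dict[str, int]:
    """Map each word to its frequency rank (1 = most common)."""
    counts = Counter(WORD.findall(corpus.lower()))
    # most_common() sorts by count, descending; ties keep first-seen order.
    return {word: rank
            for rank, (word, _) in enumerate(counts.most_common(), start=1)}


def encode(text: str, table: dict[str, int]) -> list[int]:
    """Replace every word with its rank. Raises KeyError for unknown words."""
    return [table[word] for word in WORD.findall(text.lower())]


def decode(codes: list[int], table: dict[str, int]) -> str:
    """Turn ranks back into words, joined by single spaces."""
    reverse = {rank: word for word, rank in table.items()}
    return " ".join(reverse[code] for code in codes)


if __name__ == "__main__":
    table = build_rank_table("the cat and the dog and the bird")
    codes = encode("the dog and the cat", table)
    print(codes)
    print(decode(codes, table))
```

A real version would also need some way to handle words missing from the table, plus capitalization and punctuation, which this sketch ignores.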

There could also be additions for suffixes so "gardening" could be 5000+1 or a word like "hoped" could be 2000-2 because the "e" is already present.

Would this result in any kind of space savings if you were using larger amounts of text like a book series?
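One way to check would be to just measure it. The sketch below is only a rough experiment under my own assumptions: ranks are written out as decimal text separated by spaces, words are split on whitespace, and `zlib` from the Python standard library is used purely as a familiar baseline to compare against. The helper names and the sample text are hypothetical.

```python
import zlib
from collections import Counter


def rank_encoded_sizes(text: str) -> tuple[int, int]:
    """Return (bytes of rank-encoded text, bytes of the word list itself)."""
    words = text.split()
    counts = Counter(words)
    rank = {word: i
            for i, (word, _) in enumerate(counts.most_common(), start=1)}
    encoded = " ".join(str(rank[word]) for word in words)
    # The decoder needs the word list too, unless it ships with the app.
    dictionary = "\n".join(rank)
    return len(encoded.encode("utf-8")), len(dictionary.encode("utf-8"))


def zlib_size(text: str) -> int:
    """Size in bytes after compressing with zlib at its highest level."""
    return len(zlib.compress(text.encode("utf-8"), 9))


if __name__ == "__main__":
    sample = "the cat sat on the mat " * 200  # stand-in for a real book
    encoded_bytes, dictionary_bytes = rank_encoded_sizes(sample)
    print("original:    ", len(sample.encode("utf-8")))
    print("rank-encoded:", encoded_bytes, "+ dictionary", dictionary_bytes)
    print("zlib:        ", zlib_size(sample))
```

Swapping `sample` for the text of an actual book would give a more honest answer than this very repetitive toy string.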


24 comments

That’s literally how compression works
- I figured that's roughly how JPEGs work but I wasn't sure how exactly text compression would work. Do you have any recommendations for videos on data compression?
  
  JPEG compression is much more complex than what you described. I recommend this video as a primer: https://youtu.be/0me3guauqOU
