Word Count: The 'Hello World' of Text Analysis, Part 1
Published: October 17, 2025 · 3 min read
Hello Word
Spoiler:
This is NOT the technical post you’re looking for…😓
This article covers only the background story and the result.
I will leave the technical part until tomorrow.
This post sparked so many interesting topics I want to explore:
- language comparison: finding the best language for scripting
- keyword analysis (the area I haven’t stepped into at all)
- how to prompt an LLM properly to get the simplest implementation
- all the weirdness of bash scripting from a JS/C# developer’s perspective
- why and how to use plain text to store your precious data
Hopefully, some of these will be covered in the future!
Story Time
I type a lot every day: daily journals (in a single folder), various types of notes (in nested folders), and code.
These texts are stored as plain-text files scattered throughout a folder named data.
(P.S. I treat this folder as the main data source for my future personal knowledge management (PKM) system.)
Folders may contain subfolders, which grow into a deeply nested structure.
I run wc -w to track how many words I type every day; however, this only works for the daily journals, because wc -w counts only the files I pass to it and doesn’t search through nested folders on its own.
(For anyone unfamiliar: wc -w is a terminal command that counts the words in the given files.)
I need to find an efficient way to count the words in all these nasty scattered files.
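Since wc -w only counts the files it is handed, the core problem is walking the nested folders first. As a rough illustration (not the script this series builds, which is saved for the technical part), here is a minimal Python sketch that recursively counts whitespace-separated words under one root folder. The function name count_words_recursive and the .txt/.md suffix filter are hypothetical assumptions, and the word splitting only approximates wc -w.

```python
from pathlib import Path


def count_words_recursive(root: str, suffixes=(".txt", ".md")) -> tuple[int, int]:
    """Return (total_words, file_count) for matching files anywhere under root.

    Hypothetical helper: a "word" is a whitespace-separated token, which
    approximates what `wc -w` reports for plain ASCII text.
    """
    total_words = 0
    file_count = 0
    # rglob("*") descends into every nested subfolder, unlike `wc -w *`
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(encoding="utf-8", errors="replace")
            # split() with no arguments splits on runs of any whitespace
            total_words += len(text.split())
            file_count += 1
    return total_words, file_count
```

Calling count_words_recursive("data") would return a (words, files) pair covering every nested note, not just the journal folder.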
The Motivation
“Counting words” is the entry-level application in the text-analysis world (according to some LLM).
I’ve been fascinated with the concept of mining useful information from my knowledge base for a long, long time.
This type of task is also known as text analysis (again, according to some LLM).
That’s why I got hooked on this idea and tried to implement it in the most minimal and maintainable way.
These are the prompts that generated better outcomes:
- implement it following the “Unix Tools Philosophy”
- start with a basic command such as wc -w
- then, follow it with your requirements
- and ask AI to complete your command
The Outcome
- column separator: semicolon ';'
- column 1: date in format of yyMMdd
- column 2: word count
- column 3: affected file count
251017;687;20
251016;1634;21
251015;771;14
251014;352;2
251013;623;4
251012;670;13
251011;548;10
251010;421;7
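The post doesn’t explain how words get attributed to a specific day, so the following is purely an assumption: one crude approach is to group files by their last-modified date and report each day in the same yyMMdd;words;files format shown above. In this Python sketch (the name daily_rows and the suffix filter are hypothetical), every word in a file counts toward the day it was last modified, so editing an old note would inflate that day’s total; the real script may work quite differently.

```python
from collections import defaultdict
from datetime import date
from pathlib import Path


def daily_rows(root: str, suffixes=(".txt", ".md")) -> list[str]:
    """Emit 'yyMMdd;words;files' rows, newest day first.

    Assumption (not from the post): a file belongs to the local date of its
    last modification, and all of its words are credited to that date.
    """
    words_by_day: dict[date, int] = defaultdict(int)
    files_by_day: dict[date, int] = defaultdict(int)
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            day = date.fromtimestamp(path.stat().st_mtime)  # local-time date
            text = path.read_text(encoding="utf-8", errors="replace")
            words_by_day[day] += len(text.split())
            files_by_day[day] += 1
    # Sort newest first to match the sample output above
    return [
        f"{day.strftime('%y%m%d')};{words_by_day[day]};{files_by_day[day]}"
        for day in sorted(words_by_day, reverse=True)
    ]
```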
It seems I type around 700 words per day on average (5,706 words over these eight days, roughly 713 per day), excluding blog posts 😓
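A quick way to check the “around 700” figure is to average the second column of the sample rows. This small Python sketch (average_words is a hypothetical helper name) parses the semicolon-separated lines exactly as listed above.

```python
def average_words(rows: list[str]) -> float:
    """Average column 2 (word count) of 'yyMMdd;words;files' rows, skipping blank lines."""
    counts = [int(row.split(";")[1]) for row in rows if row.strip()]
    return sum(counts) / len(counts)


sample = """251017;687;20
251016;1634;21
251015;771;14
251014;352;2
251013;623;4
251012;670;13
251011;548;10
251010;421;7""".splitlines()

print(average_words(sample))  # 713.25
```

The 1,634-word day (251016) pulls the mean up; the other seven days all fall between 352 and 771 words.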
The script only counts words in the main data folder, and I haven’t figured out a proper way to deal with the text stored in the coding folders.
Because this blog lives inside a coding project, the script does not count the words in my posts.
And I refuse to do any workarounds at this point 🤣🤣 (P.S. It would be 1,000+ words per day on average if blog posts were counted.)
END OF BEDTIME STORY
Phew! That’s a lot of text in this article.
One of my goals is to keep the reading time of each blog post under three minutes!
I hope you’ve already learned something useful by now.
See you in the upcoming technical part! 🤓