Word Count: The 'Hello World' of Text Analysis, Part 2
- published: October 18, 2025 estimate: 4 min read view-cnt: 1 views
Prerequisite
This article is for readers who’ve finished part 1 and are interested in the technical part. Readers should already know the following concepts:
- bash, terminal
- Unix Philosophy
- commonly used POSIX tools
I will intentionally skip explaining syntax, and focus on the design/thought process. We are living in a world where answers are no longer valuable. One can always find an answer on the internet. Asking the right questions or having the right mindset is more valuable nowadays.
First Thing First: The Sauce
wcount=0;
fcount=0;
today=$(date +%Y-%m-%d);
while read -r file; do
cur=$(wc -w < "$file");
prev=$(getfattr -n user.prev_wc --only-values "$file" 2>/dev/null || echo 0);
delta=$((cur - prev));
wcount=$((wcount + delta));
setfattr -n user.prev_wc -v "$cur" "$file";
echo "$file: cur=$cur prev=$prev delta=$delta";
((++fcount))
done < <(find ~/data -type f -newermt "$today");
dd="${today:2}"
dd="${dd//-/}"
result="$dd;$wcount;$fcount"
echo $result
sed -i "1i$result" ~/chronicle/wc-global
Brief:
- use find to get all the files with a modified date greater than or equal to today
- run word count on each file
- get previous word count from a custom attribute
- this is the case when you edited an older file today
- you only need to calculate the delta (the words you added)
- I use custom attribute to store the value
- set the current word count value back to the custom attribute
- output the result
- done
How I Came Up With This Solution
Prompt History
The very first one
I want to scan all the text files recursively in the current folder.
Here are the requirements:
* scan all files that are modified on a given date
* pipe all the text to wc to get the word count
* sum up all the word counts
* written in bash script
Brief comments on all the follow-up questions I asked:
- LLM gave long scripts that were too hard to comprehend
- spent some time working on the syntax and trying to compose it myself piece by piece
- started questioning the algorithm. Realized the role of created date, modified date and delta
- went back to the basics, double-checked the limitation of using a single date value in this algorithm
- accepted the limitation and decided to use the simplest solution
- figured out the last piece of the puzzle: use custom attribute to store the last state
- made my way to the result and also learned the prompt tips mentioned in part 1
Retrospective
I started my planning phase using Claude Web client, and then switched to Claude CLI once I was comfortable working with the generated scripts.
Besides the overkill scripts, there was also some quirky bash syntax that I was not aware of. I tend to understand all the quirks before running the scripts or adding new features.
LLM kept generating complicated scripts that made me start to wonder what I was trying to solve.
LLM did help me understand the limitations and essential parts of the algorithm,
though it used overly complex language.
Finally I came up with the idea to store the “state” in the metadata of a file which was the last piece of the puzzle. Everything went smoothly afterward.
AI Pros & Cons
Overkill Solution
The script was written formally with separate sections for arguments and error checks.
This is practical for large teams, but overkill for a personal project.
LLM gave a complete solution including:
- introduce a database to store all the state
- use git to help track the modification as well as the state
- use file system snapshot to track
These are all good options if I have the following requirements:
- users require querying arbitrary history data multiple times, and the result needs to be idempotent
- which is not the case. I run the script on a daily basis, and keep all the records in one place
- i.e. NO need to worry about querying or idempotence
- users require tracking detailed modifications such as word deletion or file removal
- in my case, I count additions exclusively
Things got clearer once I decided to keep the project scope minimal.
LLM also helped with implementing best practices in a text-based data store system. We could explore the idea in another article.
Failed To Deliver Simple Solution
This one is debatable. I thought giving accurate prompts would help the LLM deliver simpler solutions.
But maybe simplicity is just a matter of taste.
Can’t tell if it was me being grumpy or LLM failed to express minimalism 😓
Repeating Themselves For The Same Mistake
This one is hilarious and interesting at the same time.
I saw the LLM make the same mistake and then immediately correct itself—twice in the same response!
(This happened with two different LLMs, once each.)
I cannot tell if there’s another thread to monitor the response and correct themselves dynamically during a response.
If not, it would be ridiculous if LLM already knew the answer, and still deliberately threw out a wrong answer first 🤣🤣
End Of The Article
Enough words for me today 😮💨
It turns out skipping all technical/syntax explanations can still be really lengthy!
Hope you learned a thing or two here. See you in the next one 🤓
No comments yet
Be the first to comment!