JT's Weblog

Word Count: The 'Hello World' of Text Analysis, Part 2

Published: October 18, 2025. Estimated reading time: 4 min.

Prerequisite

This article is for readers who’ve finished Part 1 and are interested in the technical side. Readers should already know the following concepts:

I will intentionally skip explaining syntax and focus on the design and thought process. We live in a world where answers themselves are no longer valuable, because one can always find an answer on the internet. Asking the right questions and having the right mindset is what matters nowadays.

First Things First: The Sauce

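# Requires GNU find, GNU sed, and getfattr/setfattr (attr tools).
wcount=0   # net words added today across all files
fcount=0   # number of files modified today
today=$(date +%Y-%m-%d)
# Process substitution keeps the loop in the current shell, so the counters survive it.
while read -r file; do
  cur=$(wc -w < "$file")
  # Word count stored by the previous run; 0 if the attribute is not set yet.
  prev=$(getfattr -n user.prev_wc --only-values "$file" 2>/dev/null || echo 0)
  delta=$((cur - prev))
  wcount=$((wcount + delta))
  # Remember the current count for the next run.
  setfattr -n user.prev_wc -v "$cur" "$file"
  echo "$file: cur=$cur prev=$prev delta=$delta"
  ((++fcount))
done < <(find ~/data -type f -newermt "$today")   # files modified since 00:00 today
dd="${today:2}"    # drop the century: 2025-10-18 -> 25-10-18
dd="${dd//-/}"     # drop the dashes: 25-10-18 -> 251018
result="$dd;$wcount;$fcount"
echo "$result"
# Prepend the record; GNU sed's 1i needs wc-global to already contain at least one line.
sed -i "1i$result" ~/chronicle/wc-global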

Brief:

How I Came Up With This Solution

Prompt History

The very first one

I want to scan all the text files recursively in the current folder.
Here are the requirements:
* scan all files that are modified on a given date
* pipe all the text to wc to get the word count
* sum up all the word counts
* written in bash script
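
A minimal sketch of what that prompt literally asks for might look like the following. It is not the script the LLM returned: the function name words_on_day is hypothetical, and it assumes GNU find and GNU date.

```shell
#!/usr/bin/env bash
# Sum the word count of every file under DIR whose modification time
# falls on DAY (format YYYY-MM-DD). Assumes GNU find and GNU date.
words_on_day() {
  local dir=$1 day=$2 next
  # The day after DAY, used as the upper bound of the time window.
  next=$(date -d "$day +1 day" +%Y-%m-%d) || return 1
  # Keep files modified after DAY 00:00 but not after NEXT 00:00,
  # concatenate them, and let a single wc count all the words.
  find "$dir" -type f -newermt "$day" ! -newermt "$next" -exec cat {} + | wc -w
}
```

Calling `words_on_day . 2025-10-18` would scan the current folder recursively. Note that this counts every word in each matching file, not only the words added that day.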

Brief comments on all the follow-up questions I asked:

Retrospective

I started my planning phase in the Claude web client, and then switched to the Claude CLI once I was comfortable working with the generated scripts.

Besides the overkill scripts, there was also some quirky bash syntax that I was not aware of. I prefer to understand every quirk before running a script or adding new features.
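
The post does not say which quirks these were. As an illustration only, here are two that appear in the final script, isolated so they can be tried on their own. The function names compact_date and count_lines are hypothetical.

```shell
#!/usr/bin/env bash
# Turn YYYY-MM-DD into YYMMDD using only parameter expansion.
compact_date() {
  local d=$1
  d=${d:2}      # substring from offset 2: "2025-10-18" -> "25-10-18"
  d=${d//-/}    # replace every "-" with nothing: "25-10-18" -> "251018"
  printf '%s\n' "$d"
}

# Count the arguments, one per line, with a loop fed by process substitution.
# Unlike "cmd | while ...", the loop runs in the current shell, so the
# counter still holds its value after "done".
count_lines() {
  local n=0 line
  while read -r line; do
    ((++n))     # pre-increment never evaluates to 0 here, so its exit status stays 0
  done < <(printf '%s\n' "$@")
  printf '%s\n' "$n"
}
```

Both are bash features rather than POSIX sh, so a script that uses them has to run under bash.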

The LLM kept generating complicated scripts, which made me start to wonder what I was actually trying to solve.

The LLM did help me understand the limitations and essential parts of the algorithm, though it used overly complex language.

Finally, I came up with the idea of storing the “state” (each file’s previous word count) in the file’s own metadata, which was the last piece of the puzzle. Everything went smoothly afterward.
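
The state idea can be isolated in a few lines. This is a sketch under assumptions: word_delta is a hypothetical helper name, the attribute name user.prev_wc is taken from the script above, and the filesystem must support user extended attributes (otherwise every call behaves like a first run).

```shell
#!/usr/bin/env bash
# Print how many words FILE gained (or lost) since the last call, and
# store the new count on the file itself as an extended attribute.
word_delta() {
  local file=$1 cur prev
  cur=$(wc -w < "$file") || return 1
  # Previous count from the file's metadata; fall back to 0 the first time
  # or when the attribute cannot be read.
  prev=$(getfattr -n user.prev_wc --only-values "$file" 2>/dev/null || echo 0)
  # Save the state for the next call; warn instead of failing if that is not possible.
  setfattr -n user.prev_wc -v "$cur" "$file" 2>/dev/null ||
    echo "warning: could not store state on $file" >&2
  echo $((cur - prev))
}
```

Because the count travels with the file, there is no separate database to keep in sync. The trade-off is that tools which copy files without preserving extended attributes will reset the state.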

AI Pros & Cons

Overkill Solution

The script was written formally with separate sections for arguments and error checks.
This is practical for large teams, but overkill for a personal project.

The LLM gave a complete solution including:

These would all be good options if I had the following requirements:

Things got clearer once I decided to keep the project scope minimal.

The LLM also helped with implementing best practices for a text-based data store. We could explore that idea in another article.

Failed To Deliver Simple Solution

This one is debatable. I thought giving accurate prompts would help the LLM deliver simpler solutions.

But maybe simplicity is just a matter of taste.

I can’t tell whether it was me being grumpy or the LLM failing to express minimalism 😓

Repeating Themselves For The Same Mistake

This one is hilarious and interesting at the same time.

I saw the LLM make the same mistake and then immediately correct itself, twice in the same response!
(This happened with two different LLMs, once each.)

I cannot tell whether there is another thread monitoring the response and correcting it on the fly.

If not, it would be ridiculous for the LLM to already know the answer and still deliberately throw out a wrong one first 🤣🤣

End Of The Article

Enough words for me today 😮‍💨

It turns out that an article can still be really lengthy even after skipping all the technical and syntax explanations!

Hope you learned a thing or two here. See you in the next one 🤓


