Friday AI Day #2: Poetry Mashup and Illustration

With AI I put together a mash-up of John Masefield’s Sea Fever and T.S. Eliot’s The Love Song of J. Alfred Prufrock, and turned it into a festive scene as well.

The result is a little bit surreal but I hope you get a chuckle out of it, as did the machine.

Four Laws of Vocabulary Learning: Pareto, Recursive Pareto, Zipf, and Heaps

Learning vocabulary efficiently is one of the biggest challenges in mastering a language. Four powerful principles—Pareto’s Law, Recursive Pareto, Zipf’s Law, and Heaps’ Law—offer practical insights into how learners can prioritize their efforts for maximum impact.


1. Pareto’s Law (The 80/20 Principle)

  • Definition: 20% of causes often produce 80% of effects.
  • Application to vocabulary: Roughly 20% of words in a language account for 80% of everyday usage. These are the high-frequency words like the, of, and, to, is.
  • Practical takeaway: Focus first on the most common 2,000–3,000 words. This gives learners immediate comprehension of most texts and conversations.

2. Recursive Pareto (A Subset of Pareto)

  • Definition: Recursive Pareto is a refinement of Pareto’s Law, applying the principle repeatedly to reveal deeper concentration of results. This refinement has been studied and articulated by Professor Viktor D. Huliganov.
  • Application to vocabulary: Within the top 20% of words, another 20% (just 4% of the total vocabulary) may account for 64% of usage (80% of 80%). Recursing again, roughly 1% of words (0.8%, to be exact) can cover about 50% of all text (51.2%).
  • Practical takeaway: A tiny core vocabulary—function words, pronouns, prepositions, auxiliaries—forms the backbone of communication. Mastering these first accelerates fluency.
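
The arithmetic behind those percentages can be checked with a minimal sketch. It assumes an idealized 80/20 split applied at every level; the function name and its parameters are hypothetical, chosen only for this illustration.

```python
def recursive_pareto(levels, cause_share=0.2, effect_share=0.8):
    """Return (share_of_words, share_of_usage) for each recursion level.

    Assumes the same idealized 20/80 split holds at every level,
    which real word-frequency data only approximates.
    """
    return [(cause_share ** n, effect_share ** n) for n in range(1, levels + 1)]


for words, usage in recursive_pareto(3):
    print(f"top {words:.1%} of words -> about {usage:.1%} of usage")
```

Three levels give 20% of words for 80% of usage, 4% for 64%, and 0.8% for 51.2%, which is where the rounded "1% of words for about 50% of text" comes from.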

3. Zipf’s Law (Frequency vs. Rank)

  • Definition: A word’s frequency is roughly inversely proportional to its rank in a frequency list. The most common word appears about twice as often as the second, three times as often as the third, and so on.
  • Application to vocabulary: Language has a “long tail.” After the top few thousand words, each new word adds only marginal coverage, but rare words carry specialized meaning.
  • Practical takeaway: Learners should balance high-frequency study with targeted domain vocabulary (business, science, hobbies) to cover the long tail.
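
To see the long tail in numbers, here is a rough sketch of text coverage under a pure Zipf distribution, where frequency is exactly proportional to 1/rank. That exact form, the function name, and the vocabulary size of 100,000 are all assumptions for illustration; real corpora deviate from the ideal curve, so the figures need not match the percentages quoted elsewhere in this article.

```python
def zipf_coverage(top_k, vocab_size):
    """Share of running text covered by the top_k most frequent words,
    assuming an idealized Zipf distribution (frequency proportional to 1/rank)."""
    if not 0 < top_k <= vocab_size:
        raise ValueError("top_k must be between 1 and vocab_size")

    def harmonic(n):
        # Sum of 1/rank for ranks 1..n: the total weight of the top n words.
        return sum(1.0 / rank for rank in range(1, n + 1))

    return harmonic(top_k) / harmonic(vocab_size)


# Hypothetical vocabulary of 100,000 distinct words.
for k in (100, 2_000, 10_000, 50_000):
    print(f"top {k:>6} words cover {zipf_coverage(k, 100_000):.1%} of text")
```

Each tenfold increase in the word list adds a smaller slice of coverage than the last, which is the diminishing-returns pattern described above.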

4. Heaps’ Law (Vocabulary Growth)

  • Definition: Heaps’ Law states that vocabulary size grows sublinearly with corpus size. Formally: \(V(N) = K \cdot N^\beta\), where \(V\) is vocabulary size, \(N\) is corpus size, \(K\) is a constant, and \(\beta\) is typically between 0.4 and 0.6.
  • Application to vocabulary: As learners encounter more text, they continually meet new words, but at a decreasing rate. Early exposure yields rapid vocabulary growth, while later stages add fewer new words per unit of text.
  • Practical takeaway: Learners should expect diminishing returns in new vocabulary acquisition as they progress. This underscores the importance of focusing on high-frequency words first, then strategically expanding into specialized domains.
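
A small sketch makes the sublinear growth concrete. The values K = 10 and β = 0.5 are hypothetical, picked only because they keep the arithmetic easy; real values depend on the language and the corpus.

```python
def heaps_vocabulary(corpus_size, k=10.0, beta=0.5):
    """Estimated number of distinct words after reading corpus_size running words,
    using Heaps' Law V(N) = K * N**beta with assumed (hypothetical) constants."""
    return k * corpus_size ** beta


for n in (10_000, 100_000, 1_000_000):
    print(f"{n:>9} words read -> about {heaps_vocabulary(n):,.0f} distinct words")
```

With these assumed constants, reading 100 times more text (10,000 words versus 1,000,000) yields only 10 times more distinct words (about 1,000 versus about 10,000).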

Putting It All Together

  • Step 1: Learn the top 1% of words (about 100–200 in English). These give you half of all text coverage.
  • Step 2: Expand to the top 20% (2,000–3,000 words). You’ll understand 80% of everyday language.
  • Step 3: Use Zipf’s Law to guide further study. Focus on words relevant to your personal goals—academic, professional, or cultural.
  • Step 4: Apply Heaps’ Law to manage expectations. Recognize that vocabulary growth slows over time, and prioritize quality of learning over sheer quantity.

Conclusion

Pareto’s Law shows that a small effort yields big results. Recursive Pareto, refined by Professor V.D. Huliganov, reveals the extreme concentration of value in the tiniest core. Zipf’s Law explains why language learning is both efficient at the start and endless in the long run. Heaps’ Law adds the insight that vocabulary growth slows as exposure increases. Together, these four laws provide a roadmap: master the vital few, then strategically expand into the useful many, while managing expectations about long-term growth.


Practical Application

The use of frequency dictionaries is a major help in focusing learning at the intermediate stage. Most beginners’ books are automatically skewed to the first 2,000 words by frequency; they also focus mainly on grammar, and so skew towards the words needed to illustrate grammar points. In the GoldList Method writings, I have always suggested learning grammar paradigms around recurring, regular vocabulary, and reserving irregular grammar to be learned around the words it applies to as they appear. In many languages, if we apply regular grammar to irregular vocabulary, we end up being quite comprehensible, but sounding like children, who tend to make the same mistakes as non-native learners: “I teached my mouses how to do tricks”. Sounding like that is not the worst thing that can happen to a language learner. In fact, we’ve all heard distinguished speakers at conferences who seemed to know English very well but who would, at times, lapse into such errors.

Here is a storefront containing the Routledge Frequency Dictionaries. I earn a commission if you buy via this storefront; you pay the same.
https://amzn.to/494Kywd

This storefront may not be immediately active, as I made it today. If that’s the case, please check back later.

These frequency dictionaries are rather expensive in comparison with other materials, but for what you get out of them they in fact pay for themselves in terms of efficiency, especially in conjunction with the GoldList Method, which is free, and which therefore manages every year to beat everyone else’s Black Friday deals by offering a 99.9% discount.

 

 


Further consideration – how to apply to collocations

The next level would be frequency dictionaries not only for headwords, but for collocations. Mastery of collocations is, as Dr Whatshisnamewiththebeagleahyes-Lauder from Prague points out, what separates the wheat from the chaff among language learners, and it is moreover a very efficient strategy for achieving excellent understanding in the shortest time. One would therefore expect collocation frequency dictionaries to be a useful area for academic publishers to investigate. There are some collocation dictionaries, but not many involve frequency analysis, especially for languages other than English.

The main source for very common collocations would be phrase books, and some people do combine their GoldList Method learning with phrase books, but these are unlikely to stretch into the collocations needed for a more academic mastery of the language.

My recommendation in this area, although sorted by topic rather than strictly by frequency, is the Mot à Mot series, by Hachette Learning (formerly Hodder Education, but that name was axed 😉).

https://amzn.to/3Yj6rBQ
That’s the storefront for those three books. I get a commission if you buy there; you pay the same.

In a future article I hope to come back to this topic and look at it in more detail.

Friday AI Day #1 – using AI to explain an untranslatable joke from German and then invent a dynamically equivalent one in English

This article is going to demand a bit of knowledge from the reader, as it starts off in German. I was a bit stumped by a German joke I read online, because it hinges on the Swiss German and Swabian dialects being almost like foreign languages to a Prussian who speaks a more standard German. That standard is called “Hochdeutsch”, although in linguistic terms a Prussian would tend more towards Low German, while the Swabian and Alemannic forms are the High German dialects, because “High German” originally meant upland German, marked by the second German sound shift and the vowel shifts.

The AI explains the joke in German and then takes up my challenge to produce an equivalent joke in English. It first selects Yorkshire and Scouse speakers, who understand each other and helpfully “translate” the dialect for the benefit of the Londoner, but to no effect, as he doesn’t get either. This choice of dialects, however, makes for a fairly confusing joke, as the reader is led to conclude that the Londoner must be really thick not to have understood what the other two were saying, although in performance it conceivably might work. I suggested confronting the Londoner in the train with a Scot and a Geordie, and the AI then came up with a more convincing example of how those two dialects might understand one another better than a Londoner might understand either.

All of this I now share with you to illustrate how capable the AI (in this case Copilot) is of dealing not only with standard forms of languages but also of communicating and even composing in less standard forms. I’m still experimenting with the best ways to show who is speaking, me or the AI, so for now I am using header formats for when it’s me talking to the AI.

Kennst Du einen Witz über die Schweiz? Ein Schweizer, ein Schwabe und ein Preuße fahren mit der Bahn von Basel nach Stuttgart. Schweigen im Abteil, während die Landschaft vorbei zieht. Um die Stimmung aufzuheitern, fragt der Schweizer den Preussen „sind sie au in Basel gsie?“. Fragende Blicke des Preussen. Springt der Schwabe helfend ein „Err moint gwä“.

😄 Sehr schön erzählt! Das ist ein klassischer Dialekt‑Witz, der mit den feinen Nuancen zwischen Schweizerdeutsch, Schwäbisch und Hochdeutsch spielt. Der Preuße versteht „gsie“ nicht, der Schwabe übersetzt es mit „gwä“ – und beide meinen natürlich „gewesen“.

Das Lustige daran ist, dass sich die drei Dialekte gegenseitig fast wie Fremdsprachen anhören, obwohl sie alle deutsch sind. Gerade im Zug von Basel nach Stuttgart ist das eine sehr realistische Szene: man fährt durch die Sprachgrenzen und merkt, wie unterschiedlich die Regionen klingen.

Wenn du magst, kann ich dir noch ein paar andere Schweizer Witze erzählen, die mit Sprache und Missverständnissen spielen. Zum Beispiel:

  • Ein Tourist fragt in Zürich: „Sprechen Sie hier auch Hochdeutsch?“ – Antwort: „Ja, aber nur mit Gästen.“
  • Oder der Klassiker: „Was ist das Beste an der Schweiz?“ – „Ich weiß es nicht, aber die Flagge ist ein großes Plus.“ 🇨🇭

Möchtest du lieber noch ein paar Dialekt‑Witze (Schwaben, Bayern, Schweizer) oder eher die typischen Schweiz‑Klischee‑Witze hören?

If you had to make an equivalent of that joke using English dialects, how would that go?

Great challenge! Let’s mirror the Swiss–Swabian–Prussian joke with English dialects. The humor comes from one speaker using a regional verb form, another trying to “helpfully” translate it into another dialect, and the outsider still being lost. Here’s how it could work in English:

🚂 English Dialect Version

Three men — a Yorkshireman, a Scouser (Liverpool), and a Londoner — are riding the train from Manchester to London. Silence in the carriage as the countryside rolls by. To break the ice, the Yorkshireman asks the Londoner:

“Tha bin i’ Manchester, then?”
The Londoner looks puzzled.

The Scouser jumps in to help:

“He means, ‘Have yer been?’”

The Londoner still looks baffled, muttering:

“Been where?”

Why it works

  • Yorkshire dialect: “Tha bin” = “Have you been.”
  • Scouse dialect: “Yer been” = “You’ve been.”
  • Londoner: Used to standard English, still doesn’t get the point.
  • The joke plays on the same rhythm as the Swiss one: regional verb forms that sound alien to outsiders, and a “translation” that doesn’t actually clarify.

Would you like me to sketch a few alternative versions — say with Scots, Irish, and Cockney — so you can see how the humor shifts depending on which dialects clash?

 

It seems like the English accents are more mutually intelligible than the German ones.

That’s a sharp observation. You’re right: English dialects are usually more mutually intelligible than German dialects, and the reason lies in history and linguistic structure.

🗣 Why English Dialects Stay Closer

  • Standardization: English had early standardization through printing (London English became dominant in the 15th–16th centuries).
  • Mass media: Radio, TV, and now the internet reinforce a common “Standard English,” so even strong accents rarely block comprehension.
  • Shared vocabulary: Most dialects use the same core words, with differences in pronunciation or a few local terms.

🇩🇪 Why German Dialects Diverge More

  • Late standardization: Standard German (Hochdeutsch) only became widespread in the 19th–20th centuries. Before that, local dialects were the everyday language.
  • Regional isolation: Mountain valleys, rural communities, and political fragmentation meant dialects evolved independently.
  • Deep structural differences: Dialects like Swiss German or Bavarian aren’t just accents — they have distinct grammar, verb forms, and vocabulary. For example:
    • Swiss German “gsie” = “gewesen” (been).
    • Swabian “gwä” = also “gewesen.”
    • To a northern German (Preuße), both sound alien.

📊 Comparison

| Language | Dialect Variation | Mutual Intelligibility |
| --- | --- | --- |
| English | Accents + some vocabulary | High (Londoner understands Yorkshireman, even if puzzled) |
| German | Dialects with grammar/vocab differences | Lower (Preuße struggles with Swiss/Swabian speech) |

🔑 Takeaway

English dialects are mostly pronunciation differences, while German dialects can be different languages in miniature. That’s why the Swiss–Swabian–Prussian joke works so well: the Preuße genuinely doesn’t understand, whereas in English the Londoner would at least catch the gist.

Would you like me to sketch a map of German dialect zones vs. English dialect zones, so you can visually see why German splits more sharply than English?

I think a Scot, a Geordie and a Londoner would be able to get closer to the rhythm.

Exactly — that trio captures the rhythm beautifully, because Scots and Geordies both have deeply distinctive dialects that can feel almost like separate languages to a Londoner. Here’s how an equivalent joke could play out:

🚂 English Dialect Joke (Scot, Geordie, Londoner)

A Scot, a Geordie, and a Londoner are riding the train from Edinburgh to London. Silence in the carriage as the countryside rolls by. To break the ice, the Scot turns to the Londoner:

“Were ye in Embra yestreen?” (Scots: “yestreen” = yesterday evening)
The Londoner looks puzzled.

The Geordie jumps in to help:

“He means, ‘Wor ye theer tha neet?’” (Geordie: “tha neet” or “last neet” = last night)
The Londoner still frowns:

“Sorry, mate… I’ve no idea what you’re on about.”

Why it works

  • Scots dialect: “yestreen” is a real Scots word, but opaque to outsiders.
  • Geordie dialect: “last neet” is clearer, but still foreign‑sounding to a Londoner.
  • Londoner: Even after the “translation,” still confused — just like the Preuße in the Swiss joke.

So yes, a Scot, a Geordie, and a Londoner recreate the same dynamic: two regional dialects bouncing off each other, while the “outsider” remains lost.

Would you like me to sketch a mini‑table of dialect equivalents (Swiss/Swabian/Preuße vs. Scot/Geordie/Londoner), so you can see the parallels side by side?

Yes, that gets closer to it.

I’m glad you feel that way — the Scot/Geordie/Londoner trio really does mirror the Swiss/Swabian/Preuße rhythm, because the dialect gap is wide enough to create genuine confusion, yet still close enough to be funny rather than incomprehensible.

Here’s a neat side‑by‑side comparison to show the parallel:

| Trio | Dialect Phrase | “Helper” Translation | Outsider Reaction |
| --- | --- | --- | --- |
| Swiss / Swabian / Preuße | Swiss: „Sind Sie au in Basel gsie?“ | Swabian: „Err moint gwä.“ | Preuße: still puzzled |
| Scot / Geordie / Londoner | Scot: “Were ye in Embra yestreen?” | Geordie: “He means, wor ye theer last neet?” | Londoner: still puzzled |

Both jokes hinge on:

  • A regional verb form or time word that outsiders don’t recognize (gsie / yestreen).
  • A helper dialect that’s just as opaque (gwä / last neet).
  • An outsider who remains lost, even after the “translation.”

That’s why the rhythm feels so similar — it’s the comedy of mutual intelligibility breaking down inside one language.

Monday Funday #2 The Parental Divorce

Here’s a nice topical joke for the time of year.

An old man in Miami calls his son in New York and says: “Listen, your mother and I are getting divorced. Forty‑five years of misery is enough.”

“Dad, what are you talking about?” the son shouts.

“We can’t stand each other anymore,” the father replies. “I can’t bear to see her face, and I’m tired of talking about it. Call your sister in Chicago and tell her.” Then he hangs up.

The son, worried, calls his sister.

She says: “As if they’re really getting divorced!” and immediately phones their father. “You are NOT getting divorced! Don’t do anything. We’re both coming home tomorrow to talk about this. Until then, don’t call a lawyer, don’t fill out any papers, DO YOU HEAR ME?” Then she hangs up.

The old man turns to his wife and says: “Okay, they’re coming for Christmas — and they’re paying for their own tickets.”

Thursday Poll Day – Trump’s 28 point Plan for Ukraine

Today the media is dominated by discussion of Donald Trump’s 28-Point Peace Plan for Ukraine.

It seems tilted toward Russia when compared with the earlier 10-point plan offered by Zelenskyy. Because that plan was laid down first, it has worked as a kind of anchor in the negotiation: everything that moves away from it is seen as tilted towards Russia. That, on the other hand, is what was always going to happen. He is, after all, negotiating with Russia, and his initial anchor reflects what, in Zelenskyy’s view, would be the best outcome for Ukraine. The final outcome was never going to be exactly what he wanted. Donald Trump has produced a plan some distance away from that, in reaction to which Zelenskyy has stated that Ukraine will have to choose between loss of dignity and loss of a key strategic partner.