Wordle

Half hangman, half Mastermind, Wordle is a fun new obsession for half of the internet. Fun as it is to try to remember 5-letter words, what if I got the computer to solve it?

The Problem

You get 6 guesses to find the word of the day. Each guess is assessed, and you find out which letters you guessed correctly (i.e. which appear in the word) and whether any of those are in the right position. It looks like this:

[Image: an example Wordle board with four guesses, ARISE, ROUTE, RULES and REBUS. A, I, O, T and L are grey; R, S and E in ARISE, U and E in ROUTE, and U and E in RULES are yellow; the R of ROUTE, the R and S of RULES, and every letter of REBUS are green.]
Grey means the letter isn’t in the word
Yellow means the letter is in the word, but not in the right place
Green means the letter is in the right place

In this example, the first guess is pure chance.

There is one word each day.

The Solution

My first solution will run through 4 steps:
1. Read in a defined wordlist
2. Read in any previous guesses with the results
3. Strip out words from the wordlist based on the guesses
4. Generate a new guess

https://github.com/alexchatwin/wordleSolver

1. Getting a defined wordlist

I don’t think there’s any way to do this without starting with a list of possible words. Without one, we’re just guessing symbols, and part of the nature of the puzzle is the player’s knowledge of English words.

I’ve chosen to use https://www-cs-faculty.stanford.edu/~knuth/sgb-words.txt, but I gather the actual seed list for the Wordle game is available on GitHub.
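As a minimal sketch of this step, assuming the file has one word per line (as the Knuth list appears to), loading could look like the following. The names `parse_wordlist` and `load_wordlist` are hypothetical, not taken from the repo, and filtering to lowercase 5-letter alphabetic entries is my own safety net.

```python
from pathlib import Path
from typing import Iterable, List


def parse_wordlist(lines: Iterable[str]) -> List[str]:
    """Keep lowercase 5-letter alphabetic words, dropping duplicates."""
    seen = {}
    for line in lines:
        word = line.strip().lower()
        if len(word) == 5 and word.isalpha():
            seen[word] = None  # a dict keeps first-seen order
    return list(seen)


def load_wordlist(path: str) -> List[str]:
    """Read a one-word-per-line text file into a list of candidate words."""
    return parse_wordlist(Path(path).read_text(encoding="utf-8").splitlines())
```

Splitting parsing from file reading keeps the filter easy to test without a file on disk.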

2. Reading in previous guesses

Each guess generates information about the true word.

Grey – the letter is not in the word at all
Yellow – the letter is in the word, but not where the guess has put it
Green – the letter is in the word and is in the right place

There are some subtleties to the yellow and green results. In both cases the letter could appear twice in the word, so it’s easy as a novice player to wrongly assume that green means job done for that letter. Equally, yellow is as much about narrowing down which letters are valid as it is about ruling the letter out of that square.
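To make those subtleties concrete, here is a hedged sketch of how a guess could be coloured against the answer. It assumes the usual two-pass rule (greens first, then yellows handed out only from answer letters not already matched), so a repeated letter is coloured no more times than it appears in the answer. The name `score` and the "G"/"Y"/"-" encoding are my own choices for illustration.

```python
from collections import Counter


def score(guess: str, answer: str) -> str:
    """Return Wordle-style feedback: 'G' green, 'Y' yellow, '-' grey."""
    result = ["-"] * len(guess)
    # answer letters not matched by a green, still available for yellows
    unmatched = Counter()
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            result[i] = "G"
        else:
            unmatched[a] += 1
    # second pass: left to right, each unmatched answer letter buys one yellow
    for i, g in enumerate(guess):
        if result[i] == "-" and unmatched[g] > 0:
            result[i] = "Y"
            unmatched[g] -= 1
    return "".join(result)
```

Scoring the four guesses from the example board against REBUS reproduces its colours: `score("arise", "rebus")` gives `"-Y-YY"`, `"route"` gives `"G-Y-Y"`, `"rules"` gives `"GY-YG"` and `"rebus"` gives `"GGGGG"`.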

3. Strip out words based on the information from the guesses

I’m going to loop through all the remaining words on the wordlist and prune them in one of 3 ways:

  1. Remove any word containing a grey letter
  2. For yellow letters, remove words with that character in that position, and also words that don’t have that character in at least one other position
  3. For green letters, remove any words without that letter in that position

By gradually sifting through the words, I end up with a smaller list of possible words which fulfil the ‘rules’ as they evolve.
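The three rules translate fairly directly into a per-word check. This is a sketch rather than the repo’s code: `matches_rules` and `prune` are hypothetical names, and feedback is assumed to be a string of "G" (green), "Y" (yellow) and "-" (grey), one character per guessed letter.

```python
from typing import List


def matches_rules(word: str, guess: str, feedback: str) -> bool:
    """Check one word against the three pruning rules for a single guess."""
    for i, (letter, mark) in enumerate(zip(guess, feedback)):
        if mark == "-" and letter in word:
            # rule 1: a grey letter rules out any word containing it
            return False
        if mark == "Y" and (word[i] == letter or letter not in word):
            # rule 2: not in this square, but somewhere else in the word
            return False
        if mark == "G" and word[i] != letter:
            # rule 3: a green letter must be in exactly this square
            return False
    return True


def prune(words: List[str], guess: str, feedback: str) -> List[str]:
    """Keep only the words that survive all three rules."""
    return [w for w in words if matches_rules(w, guess, feedback)]
```

For example, pruning rebus, arose, route and rules with the guess ARISE scored `-Y-YY` leaves rebus and rules. Applied literally like this, rule 1 treats every grey letter as absent from the word entirely.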

4. Generate a new guess

I want to use some tactics with each guess: it’s not helpful to have the algorithm change only a single letter at a time, or I’ll run out of guesses. The game also requires each guess to be a real word.

‘Splittiness’

I’m going to use an approach similar to a decision tree for v1. I’ll start by working out how many words each letter appears in, and then what fraction of the total words that represents:

Letter  % of words    Letter  % of words
S       46%           Y       15%
E       46%           M       14%
A       38%           H       14%
R       31%           B       12%
O       29%           G       11%
I       27%           K       10%
T       25%           F       9%
L       25%           W       9%
N       21%           V       5%
D       19%           X       2%
U       19%           Z       2%
C       16%           J       2%
P       16%           Q       1%

Here S & E are the ‘winners’ – they’re each present in almost half of the words in the list. This is useful because guessing a first word with them will allow me to split the wordlist most evenly and remove/include a large number of possible words, reducing my search space considerably.
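A minimal sketch of the splittiness count, assuming the wordlist is already a list of lowercase 5-letter words. Each letter is counted once per word (via a set), so the result is the share of words containing the letter, as in the table. `letter_fractions` and `ranked_letters` are hypothetical names, and breaking ties alphabetically is my own choice to keep the order deterministic.

```python
from collections import Counter
from typing import Dict, List


def letter_fractions(words: List[str]) -> Dict[str, float]:
    """Share of words that contain each letter at least once."""
    counts = Counter()
    for word in words:
        counts.update(set(word))  # a repeated letter counts once per word
    return {letter: n / len(words) for letter, n in counts.items()}


def ranked_letters(words: List[str]) -> List[str]:
    """Letters from most to least splitty, ties broken alphabetically."""
    fractions = letter_fractions(words)
    return sorted(fractions, key=lambda letter: (-fractions[letter], letter))
```

Taking `ranked_letters(words)[:5]` gives the five splittiest letters; on the full list the table suggests these are S, E, A, R and O (S and E are level at 46% after rounding, so their exact order depends on the raw counts).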

For later versions, I want to look at the overlap between letters. For example, if most words containing S also contain E, the first guess might be better off skipping E and trying the next letter down.
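One way to measure that overlap, sketched here as an assumption rather than something the solver does yet, is the share of words containing one letter that also contain another. If `overlap(words, "s", "e")` comes out close to 1, E adds little to a guess that already has S. `overlap` is a hypothetical helper name.

```python
from typing import List


def overlap(words: List[str], first: str, second: str) -> float:
    """Share of the words containing `first` that also contain `second`."""
    with_first = [w for w in words if first in w]
    if not with_first:
        return 0.0  # no evidence either way
    return sum(second in w for w in with_first) / len(with_first)
```

The measure is asymmetric: `overlap(words, "s", "e")` and `overlap(words, "e", "s")` can differ, which matters when deciding which of the two letters to drop.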

The best starting word, the one which gives the best split between words to keep and words to remove, contains S, E, A, R and O.

Anagramming

SEARO is, unfortunately, not a word, so the next step is to try the different permutations of the letters until I get something which is (more precisely, a word on my wordlist, so it’s still a viable guess).

The first guess is easy enough: a simple iteration through the permutations of the letters comes up with AROSE, which uses all five letters in a viable word. But I need a strategy to make sure I don’t get stuck further down the line. Even one letter down, EAROI doesn’t yield a valid 5-letter English word.
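A sketch of the permutation search, assuming the candidate words are held in a list. With five distinct letters there are only 5! = 120 orderings, so checking each one against a set is cheap. `find_anagram` is a hypothetical name.

```python
from itertools import permutations
from typing import Iterable, Optional


def find_anagram(letters: str, wordlist: Iterable[str]) -> Optional[str]:
    """Return the first ordering of `letters` that is on the wordlist."""
    valid = set(wordlist)
    for perm in permutations(letters):
        candidate = "".join(perm)
        if candidate in valid:
            return candidate
    return None  # no ordering of these letters is a listed word
```

With a list containing AROSE, `find_anagram("searo", words)` finds it, while a letter set such as EAROI with no listed anagram returns `None`.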

My solution is to add another level to my anagram solver. My preference is to use the 5 most splitty letters, but I’d take letters 1, 2, 3, 4 and 6 if they didn’t work, and 1, 2, 3, 4 and 7 after that. This gives me a way to sweep the full range of available letters, hopefully stepping ever nearer the right answer.
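The fallback order (letters 1 to 5, then 1, 2, 3, 4 and 6, then 1, 2, 3, 4 and 7) happens to be the order `itertools.combinations` yields when fed the letters in ranked order, so a sketch can lean on it. Continuing past the 1-2-3-4-n sets in the same lexicographic order is my assumption, as are the hypothetical names `index_by_letters` and `next_guess`. Indexing words by their sorted letters turns each anagram check into a dictionary lookup.

```python
from itertools import combinations
from typing import Dict, Iterable, List, Optional


def index_by_letters(words: Iterable[str]) -> Dict[str, List[str]]:
    """Group words under their sorted letters, e.g. 'arose' -> 'aeors'."""
    index: Dict[str, List[str]] = {}
    for word in words:
        index.setdefault("".join(sorted(word)), []).append(word)
    return index


def next_guess(ranked_letters: str, words: Iterable[str]) -> Optional[str]:
    """Return a word made from the best-ranked 5-letter set that has one.

    combinations() keeps input order, so with letters ranked 1..n it tries
    1-2-3-4-5, 1-2-3-4-6, 1-2-3-4-7, ..., then 1-2-3-5-6 and so on.
    """
    index = index_by_letters(words)
    for combo in combinations(ranked_letters, 5):
        matches = index.get("".join(sorted(combo)))
        if matches:
            return matches[0]
    return None  # no 5-letter set from these letters forms a listed word
```

For example, with ranked letters `"searoi"` and a list holding ARISE but not AROSE, the first set fails and the 1-2-3-4-6 set (S, E, A, R, I) returns ARISE.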

Rinse and repeat

Putting AROSE into the puzzle gives me:

And then I repeat the process, putting in the results and asking for the next best word to try, until the board looks like this:

This is Wordle 215, from 20/1/22, and with only 4 guesses needed, 1 better than my personal attempt!
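The rinse-and-repeat loop can be sketched as a small harness for trying a strategy against a known answer. This is an assumption-laden sketch rather than the repo’s code: `play` is a hypothetical name, and the guess-picking, colouring and pruning steps are passed in as functions so different versions of each can be swapped in.

```python
from typing import Callable, List, Optional, Tuple


def play(
    answer: str,
    words: List[str],
    choose: Callable[[List[str]], Optional[str]],
    score: Callable[[str, str], str],
    prune: Callable[[List[str], str, str], List[str]],
    max_guesses: int = 6,
) -> List[Tuple[str, str]]:
    """Guess, score, prune, repeat; return the (guess, feedback) history."""
    candidates = list(words)
    history: List[Tuple[str, str]] = []
    for _ in range(max_guesses):
        guess = choose(candidates)
        if guess is None:  # nothing viable left to try
            break
        feedback = score(guess, answer)
        history.append((guess, feedback))
        if guess == answer:
            break
        candidates = prune(candidates, guess, feedback)
    return history
```

The length of the returned history is the number of guesses used, which makes it easy to compare strategies over many past answers.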

There’s a tangible sense of the algorithm building on what it learns each round, but also a tedious inevitability that it will reach a valid answer at the end.

I suppose part of the challenge of Wordle is the need for a human to remember all the possible words and to make efficient choices. This bit of coding has helped me understand the puzzle at a lower level.

Update – 21/1/22

Wordle 216 6/6

⬜?⬜⬜⬜
⬜?⬜?⬜
???⬜⬜
⬜??⬜?
⬜????
?????

Today’s word was solved, but I learned 2 things:
1. My algorithm will happily guess words with a repeated letter even when it has no evidence that the letter appears twice in the word. I think this is sub-optimal
2. If you guess the same letter twice and it’s in the word once, only one copy is coloured: a green and a grey (not yellow) if one of them is in the right place, or a yellow and a grey if neither is
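Point 2 has a knock-on effect on the pruning in step 3: if one copy of a letter can come back grey while another copy is green, then rule 1 (remove any word containing a grey letter) would throw out the real answer. Below is a sketch of a per-word check that instead treats grey as capping the letter count, assuming the game never colours more copies of a letter than the answer contains. `consistent_with_repeats` is a hypothetical name, not code from the repo, and the "G"/"Y"/"-" encoding is my own.

```python
from collections import Counter


def consistent_with_repeats(word: str, guess: str, feedback: str) -> bool:
    """Could `word` be the answer, given this guess and its colours?

    feedback: one character per letter, 'G' green, 'Y' yellow, '-' grey.
    """
    # copies of each letter the guess proved are in the word
    confirmed = Counter(ch for ch, m in zip(guess, feedback) if m in "GY")
    for i, (letter, mark) in enumerate(zip(guess, feedback)):
        if mark == "G" and word[i] != letter:
            return False
        if mark in "Y-" and word[i] == letter:
            # a yellow or grey copy can't be sitting in its own square
            return False
        if mark == "-" and word.count(letter) != confirmed[letter]:
            # grey caps the count: zero if no coloured copy, else exactly that many
            return False
    # every green/yellow copy must be accounted for in the word
    return all(word.count(ch) >= n for ch, n in confirmed.items())
```

For example, GEESE against REBUS is coloured `-G-Y-`: the first E is green and the other two are grey, and this check keeps REBUS where the literal rule 1 would discard it.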

Updates to follow!
