patrickdevlin21.github.io

Here's a thing I did!

Last Resor(d)le

This is a script written by Pat Devlin to help solve the popular wordle puzzle. Simply enter each word that you’ve guessed so far into the text box. Select the revealed color of each letter by clicking on the corresponding square. See below for more.

New word:


Click above to solve

Output

Number of solutions:

Suggested Guesses:

Random guess:

BestEntropy guess:

BestMax guess:

BestAverage guess:

List of possible answers:

*To avoid spoilers, you can select that the output of the function be hidden. If so, individual pieces of output can then be revealed or hidden by clicking on their corresponding text.

About

What can this site do?

Maybe you want to know...

Performance of algorithms

This site has three simple algorithms to solve the wordle in hard mode. Running each of these algorithms for all possible wordle answers gives us the following distribution for the number of guesses needed.

Here, the algorithms have almost identical distributions with the largest difference being that BestEntropy does a better job at guessing the word in 3 guesses (especially relative to BestAve). The best of these algorithms is BestEntropy (requiring 3.60 guesses on average) followed by BestMax (averaging 3.62 guesses), and the worst is BestAve (averaging 3.67 guesses). Each of these simple algorithms guesses the word within 6 tries with the following exceptions:

BestEntropy BestMax BestAve
pound 4 4 7
water 4 7 7
winch 5 7 5
shave 5 7 7
wound 5 7 8
wight 7 5 7
foyer 7 7 7
graze 7 7 7
match 7 7 7
swore 7 7 7
tatty 7 7 7
waste 7 7 7
willy 7 7 7
goner 8 8 8
watch 8 8 8
Interestingly, there is a considerable amount of overlap among the worst words for these algorithms, with "goner" and "watch" being especially difficult to guess. If desired, these guessing strategies could easily be improved to completely avoid using more than six guesses (e.g., simply running them not in hard mode accomplishes this), but I find the above data interesting since it describes how these very natural pure algorithms perform in hard mode.

How does this all work?

(Description below is about to get more technical for those who care)

Big picture

This site (statically hosted by github) was built simply by writing its index file in html, css, and javascript. The key functionality is to make an API call that invokes an AWS lambda instance that I set up. This API call passes the board (with colors) as input to a python script that I wrote, and it receives a handful of outputs which are used to populate the page.

The repo for the stand-alone python code is available here (for the lambda function, naturally I had to add a script that handles the API request, but this was done in exactly the way you're imagining). To handle scaling [and out of fear of paying AWS some extra pennies in hosting fees], I also slightly optimized the python code for the lambda function by speeding up all calls by roughly a multiplicative factor of 10. Memory allocation per call is minimal, so no additional effort was made to optimize that.

Algorithms

The back-end is written in python, which was honestly just chosen for the sake of variety. Another choice for a project like this could have been C or C++, especially since it's relatively easy to imagine writing many simple procedures, which we'll want to call many times (for instance, if we wanted to explore the exponential-size min-max decision tree for this problem). Although a low-level compiled language would ultimately end up being substantially faster, I figured I'd do this one in python because I had just finished another project in C++.

The Random algorithm just outputs a random word consistent with the guesses thus far. Each of the other three algorithms does the following (as described using the language of set theory):

For us, the score of a potential guess will be a function of how that guess partitions the set of consistent guesses, and in particular, it will actually just be a function of the word class sizes (i.e., it will be a function of the multiset of block sizes of the partition). To play with this, you can call splitIntoWordClasses([potentialWord], setOfConsistentWords) from the repository.

For example! Suppose we intially have no information, and we are considering guessing "jazzy". Then the following is true:

Then the score of "jazzy" is a function of the numbers (1111, 2, 10, 1, ...). This sequence of 'class sizes' tells us that although jazzy is sometimes a convenient guess [e.g., if the j is green and the a is yellow, we know the answer must be JUNTA], it is often times lousy (e.g., half the time, all we find out is that it doesn't share any of those letters).

Although we lose information by only considering this sequence of numbers (1111, 2, 10, 1, ...)---e.g., if we land among the 10 guesses where J is green, how easy it is to distinguish those?---heuristically, this sequence of numbers is likely pretty telling of how good a guess "jazzy" is, and we just need to decide how to linearly order such sequences so that we can pick a 'best' choice. There are a bunch of good options, but three especially natural ones are the following:

Analysis of algorithms

Each algorithm finds the next guess in time O(mn), where m is the number of possible words consistent with our information, and n is the set of words we're allowed to say. In hard mode, we have m=n, so this runs in order O(m^2) time. If we are given a dictionary and a set of previous guesses, we can find the set of words consistent with these guesses by just one pass through the dictionary*, so this procedure is done in linear time. As for space, each algorithm only needs O(1) memory.

*There's an interesting question here about how this would scale for a dictionary of size N and a set of G guesses. Unfortunately, for us this point is purely moot since English uses an alphabet of constant size, and the words must be five letters long. Thus, the information in the guesses very quickly becomes redundant (e.g., we can only get at most 5 letters colored not-black, and a black letter conveys the same information regardless of where it appears in the word). In the general setting, I would conjecture that we could check N words against G different guesses in time O(N) + o(N*G) [thereby beating the trivial O(N*G) algorithm], but it would probably depend on what a "word" is...