Random Spice

I’m a bit bored of the herbs and spices in my cupboard. I wonder if I can make a better one?

Wouldn’t it be exciting?

The Problem

I want to create a new herb or spice. It needs a name and a description.

I want to ingest (haha) all the herbs and spices in my cupboard (the ‘legacy’ group) and then use that to generate new ones.

The Approach

I’ll start with a simple N-gram text generator. This works by ingesting a corpus of existing valid input, and ‘learning’ how to make valid outputs by using combinations it knows, for example:

Let’s train on the valid words [“THE”, “THIS”], and assume we only look at 1-grams (1 letter at a time). T is the first letter, T is always followed by H, H can be followed by E or I, and I is always followed by S. This gives the program the rules it can use to make ‘new’ words. An iterative loop can be used, for example:

This is a pretty straightforward example

This recreates the word ‘THE’ by randomly choosing next letters which it has seen following the last letter. The possibilities get more complex with a greater selection of words, as we’ll see.
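A minimal sketch of that loop, assuming the two training words above and a fixed random seed so the run is repeatable (the names `followers` and `make_word` are just for illustration):

```python
import random

words = ["THE", "THIS"]

# Map each letter to every letter seen directly after it in the training words
followers = {}
for word in words:
    for current, nxt in zip(word, word[1:]):
        followers.setdefault(current, []).append(nxt)

def make_word(start, rng):
    """Keep appending a random follower until the last letter has none."""
    word = start
    while word[-1] in followers:
        word += rng.choice(followers[word[-1]])
    return word

new_word = make_word("T", random.Random(0))
print(followers)
print(new_word)
```

With only two training words the loop can only ever rebuild THE or THIS, which is why a bigger list of inputs matters.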

The steps for the problem are

  1. Generate some new names based on the legacy group
  2. Generate a description
  3. Look at the optimal parameters

0. Make an N-gram function

(I assume this already exists, but I’m not going to learn by pip installing things)

I’ll take N-gram to refer to the cutting up of parts of my text input into groups of N-size.

Here are some N-gram breakdowns of the initial phrase

Here I’ve done it by character, with increasing values of N. If I take the simplest 1-gram, I can make a list of which letters follow others:

H only has one option (E), but L could be one of 3 options (L,O,D). _ is a space
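To make the breakdown concrete before building the flexible version, here is a rough sketch that cuts HELLO WORLD into character N-grams and lists the 1-gram followers. The helper name `char_ngrams` is made up for this example, and the space is swapped for _ to match the description above.

```python
phrase = "HELLO WORLD".replace(" ", "_")

def char_ngrams(text, n):
    """Every run of n consecutive characters in text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

for n in (1, 2, 3):
    print(n, char_ngrams(phrase, n))

# For the 1-gram case, record which letters follow each letter
followers = {}
for current, nxt in zip(phrase, phrase[1:]):
    followers.setdefault(current, []).append(nxt)
print(followers)
```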

I’ll get that working in Python, but I want the flexibility of selecting a value of N, and also the option to have separate words treated as distinct pieces of information (in the HELLO WORLD example, O wouldn’t be followed by _ and _ wouldn’t be followed by W)

inputList = ["HELLO", "WORLD"]
#Set value of N
N=2

#Get all the N grams (as strings, for N>1)
nGrams = [["".join(input[n:n+N]) for n in range(len(input)-N+1)] for input in inputList]
#The range stops at len(input)-N+1 because range() excludes its upper bound, so the last n-gram starts at index len(input)-N

#Within each list of n-grams, return it and the next character into a list
links = [[[subarr[i], subarr[i+1][N-1]] for i in range(len(subarr)-1)] for subarr in nGrams]

#Flatten the list of lists
flatLinks = [subLink for link in links for subLink in link]

print(flatLinks)

This is for N=2. You’ll see the initial 2-gram is followed by the single character which would come next. If this was going to be much longer or more complex, I’d find a more efficient structure.
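One candidate for that more efficient structure is a dictionary keyed by N-gram, so that finding the possible next letters is a single lookup rather than a scan of the whole list. This is only a sketch of the idea, not what the rest of this post uses, and `build_links` is a made-up name.

```python
from collections import defaultdict

def build_links(words, n):
    """Map each n-gram to the list of characters seen straight after it."""
    links = defaultdict(list)
    for word in words:
        # Stop n characters early so there is always a next character to record
        for i in range(len(word) - n):
            links[word[i:i + n]].append(word[i + n])
    return links

links = build_links(["HELLO", "WORLD"], 2)
print(dict(links))
```

Repeated followers stay in the list on purpose, so a random pick is still weighted by how often each letter was seen.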

1. Generate a new name

In order to make this I’ll need a starting list of names. I’ve borrowed this from Wikipedia.

That’s 200% more basil than I was aware of

The same code will now generate a much longer list of the different N-gram associations

You can still make out the individual legacy names

The process is now to create my new name using a loop. To start, I’ll randomly pick one of my 2-grams to use as a seed, then:

  1. Look-up every instance of the last 2-gram in the name
  2. Randomly choose one of the associated next letters
  3. Add this to the name

import random
output=""
lastbit = random.choice(flatLinks)[0]

output=output+lastbit

for i in range(10):
    print("lastbit")
    print(lastbit)
    matchList=[n for n in flatLinks if n[0]==lastbit]
    print("matchList")
    print(matchList)
    lastbit=random.choice(matchList)
    print("lastbit[1]")
    print(lastbit[1])
    output=output+""+lastbit[1]
    print("output")
    print(output)
    lastbit="".join(output[-N:])
print(output)

I assume this is how my cats think, just replace letters with ‘EAT’, ‘HUNT’, ‘SLEEP’

And here’s an example. The code has started randomly with the letters RE. That gives options of S, G, L, E, A and S. It randomly chooses G, adds it to the output name, and then recalculates the last 2 letters of the name. The loop repeats for 10 letters, and gives us our new name:

I’m pronouncing it Re-galable

Regallaball! A brand new spice (or herb?) and I didn’t even have to leave the house!

But what’s it like?

2. Generate a description

My existing n-gram generator will almost work for this, but I need to make a tweak. I’m going to be using whole words, rather than individual letters to build up the description, so I need to add a split() in to the initial n-gram list comprehension:

nGrams = [["".join(input[n:n+N]) for n in range(len(input)-N+1)] for input in inputList]
#becomes
nGrams = [[" ".join(input.split()[n:n+N]) for n in range(len(input.split())-N+1)] for input in inputPhraseList]
#I've also changed the name of the input variable

My input now needs to be some relevant descriptions of the legacy group. The Wikipedia descriptions are a bit formal (and specific) and I don’t think they’ll lend themselves to generating meaningful descriptions, so after a bit of googling I found a website with some which I’ll borrow: blog.mybalancemeals.com

They aren’t perfect, especially when they’re self referential, but I’ll use this for starters. Just a small tweak to the output generation code (to put spaces between the words) and we’re ready to generate!

Ah. I’ve got a problem. This could have come up in the name generator, but using combinations of letters made it less likely. The problem is that the code is asking for a combination of lookup words which doesn’t exist. The debug trace shows us why:

No problems yet

My new creation is called ‘Safronka Cub’, and as the description builds, it starts out well. Safronka Cubs are commonly used herbs..

IndexError: list index out of range

But once we hit the end of a sentence there is no viable continuation of the loop, as the list of matches is empty. Let’s code around that:

for i in range(60):
    print("lastbit")
    print(lastbit)
    matchList=[n for n in flatLinks if n[0]==lastbit]
    if (len(matchList)>0): #protect against lookup errors
        #print("matchList")
        #print(matchList)
        lastbit=random.choice(matchList)
        #print("lastbit[1]")
        #print(lastbit[1])
        output=output+" "+lastbit[1]
        #print("output")
        #print(output)
        lastbit=" ".join(output.split()[-N:])
print(output)

Side effect of this project is a lot of snacking

Which, critically, is not in my input text (well, mostly)

Probably a bit too close to be genuinely novel

3. Look at the parameters

The key parameter in my N-grams is the N. Longer and it will produce more realistic things, but it will do that by copying more and more of the original. Shorter and the output will be novel, but garbage. With a bit of refactoring, I can generate a series of different names, varying the N parameter.

N=1  N=2  N=3  N=4  N=5
IARYMORREDHERY SEEDOARBEANCUMINHERVIL
AZETAF PAJIALABASMARDAMPEPPERIFERILSHOLTZIA CILIAYNGIUM FOETIDA
TOLILERRDIPEEDOATIDUMINERICELYTHRUMFERIA GALANGAL
LAVAMOLEMEKPEREEDOARONERICELERYNGIURDAMOMMUSTARD
IGELITSEEDOANDER SEEDEPPERCORNPEPPERCORN
NDILATITZEML ANISELEMPFAMPHORSER GALSTARDIMNOPHILA AROMA
NGERANTIGEGRY LIA ANS OERILLAPEPPERCORNOLY BASIL
PABAREBRISSTSERUEROLIVIAN PEPPUZU (ZESTEMON MYRTLE
IARASAN AGALIANDERUVIETNNELJIMBUIGELLA
SASSETUMALEIANGIND DILITIDUMLEAFARADISH
Blue names are those which my brain sees as possible genuine names; red are those which are copies, or almost copies, of the genuine ones

Length aside, the trend seems pretty clear: as you increase N, the number of valid responses increases, but above 4 it’s dominated by copies of the original names.

The number of inputs is quite small, and the nature of the inputs means that beyond 3, there is almost no randomness when the name is built.
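One rough way to put a number on that loss of randomness is to count how many N-grams have only a single possible next letter. The sketch below uses a small stand-in sample of names rather than the full list, and `single_choice_share` is a made-up helper.

```python
from collections import defaultdict

def single_choice_share(names, n):
    """Fraction of n-grams that are only ever followed by one distinct letter."""
    followers = defaultdict(set)
    for name in names:
        for i in range(len(name) - n):
            followers[name[i:i + n]].add(name[i + n])
    if not followers:
        return 0.0
    forced = sum(1 for nxt in followers.values() if len(nxt) == 1)
    return forced / len(followers)

# Stand-in sample, not the full input list
sample = ["CUMIN", "CELERY", "CARDAMOM", "MUSTARD", "PEPPERCORN"]
shares = {n: single_choice_share(sample, n) for n in range(1, 6)}
for n, share in shares.items():
    print(n, round(share, 2))
```

When the share reaches 1, every step of the loop is forced and the generator can only replay its inputs.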

Here’s the same thing for the descriptions (red text is a straight copy of the original).

N=1 (gibberish)

  • as strong pine flavour you can be consumed fresh, dried, and Jamaican jerk chicken.
  • number of cooking, pairing often compliments the delicate leaves mostly complement fish tacos to the flavourful power duo in curry powder mostly consists of dill here, including bistromd’s mini pies using simple ingredients to preserve the strong and avocado!

N=2

  • salads, baked potatoes, creamy potato salad, deviled eggs, and can be added into butters, vinegars, and sauces for added flavor depth.
  • a flowering plant with the seeds added to the whole garlic head in these roasted garlic velouté sauce to the taste of soup, others who enjoy it describe it as a mild tasting parsley with a number of cuisines. the nutty flavor is widely used in soups, stews, and roasted dishes. in addition to rosemary in the pantry and offers an accelerated

N=3

  • coming from a hot chili pepper, cayenne pepper offers spice to a number of dips, complement various casseroles, along with these other 25 ways to use parsley.
  • the strong pine flavor of rosemary pairs well with various flavors, including this chocolate mint smoothie and fizzy blueberry mint drink. along with its value in numerous recipes, the herb also provides an extensive number of health benefits.

N=4

  • are commonly used in soups, stews, and for pickling hence “dill pickles.” find eight flavorful recipes to use a bunch of dill here, including grilled carrots with lemon and dill, zucchini with yogurt-dill sauce, and golden quinoa salad with lemon, dill, and avocado!
  • the end of cooking, pairing well with mexican dishes, fish, or soups and salads.

N=5 (straight copy)

  • a spice described as strong and pungent, with light notes of lemon and mint. for the freshest flavour, purchase whole cardamom pods over ground to preserve the natural essential oils. find over 30 cardamom recipes here.
  • note to a number of soups, stews, and roasted dishes. in addition to rosemary in the skewer recipe provided above, thyme further compliments the meat.

Again, beyond N=3 the algorithm is locked to the original source text. Even at N<=3 it’s typically returning odd chunks of text.

4. Refining the descriptions

Although I’m happy with the code, the descriptions I’ve scraped aren’t working well. I’ve found another source, which is a bit more focussed on the flavour. I’ve also tweaked the definitions to remove any mention of the original name. Finally I’ve tweaked each to start with ‘This..’ so I can be sure that my descriptions start at the beginning of the sentence.

The code will also flag any iterations which yield results that are identical to, or substrings of, the inputs. Here’s what I get (corrected here for British spelling).
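The copy check could look something like this. It is a sketch rather than the code in the repo, the source descriptions are invented for the example, and `normalise` and `is_copy` are made-up names.

```python
def normalise(text):
    """Lower-case and collapse whitespace so trivial differences don't hide a copy."""
    return " ".join(text.lower().split())

def is_copy(candidate, sources):
    """True if the candidate is identical to, or a substring of, any source text."""
    cand = normalise(candidate)
    return any(cand in normalise(source) for source in sources)

# Invented stand-ins for the scraped descriptions
sources = [
    "This adds sweet smokiness to dishes.",
    "This has a mild woodsy flavour.",
]
print(is_copy("this adds sweet smokiness", sources))
print(is_copy("This adds a mild woodsy flavour.", sources))
```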

5. The result

Uzaziliata
This can be sprinkled onto or into sauces, pastas, and other dishes to add a nutty, cheesy, savory flavor.

Avender Seed
This herb smells like maple syrup while cooking, it has a mild woodsy flavor. can be used to add a nutty, cheesy, savory flavor.

Tonka Berroot
This is Sometimes used more for its yellow colour than its flavour, it has a mild woodsy flavour. Can be used in both sweet baked goods and to add depth to savoury dishes. (it’s also almost a google-whack!)

Pepperuviande
This Adds sweet smokiness to dishes, as well as grilled meats.

Any resemblance to spices living or dead..

Much better, right?

Here’s the code https://github.com/alexchatwin/spiceGenerator

Charting my Smart Heating system

Background

When the baby was born, I championed the installation of a new ‘Smart’ heating system. Our house is very much one that Jack built, and the temperature can vary widely between rooms based on things like wall/window quality, wind and sunlight. It’s very useful to be able to control the radiators in each room based on what the room needs, rather than centrally applying a schedule, and trying to tweak the radiators.

I opted for Tado, based on a few factors. I’m not 100% sure it was the best option, but it’s really improved the balance of heating across the house. One of the reasons I chose Tado was the ability to query and make changes via the API.

The problem

“Won’t the baby get cold?”

It’s a throwback to an old technology: those of us born before digital stuff became so cheap are very used to anachronisms like turning the heating up higher to ‘make it warm up faster’. I’m keen to find a way to use the API to demonstrate to my wife that the baby’s room is

  1. The right temperature
  2. Below the heating capability of the system (i.e. if we had a blizzard, the radiator could call on extra heat to cope)
  3. There’s no need to change the settings

The steps

  1. Get data from the Tado API
  2. Format it into something useful
  3. Decide how to visualise it

1. Get data from the Tado API

I’ve done some messing with Tado before, and I made another post about automating the getting of a bearer token for access, based on some environment variables with my credentials. Given that, and the limited knowledge of the API’s contents, gleaned from much googling, I created the following methods:

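    import requests
    import pandas as pd
    from pandas import json_normalize
    import tadoAuth as auth #assumed import name for the tadoAuth.py token helper

    def getTadoHome(): #From our auth credentials, return the homeId 
        #Note the space after 'Bearer', the header is rejected without it
        response = requests.get('https://my.tado.com/api/v1/me', headers={'Authorization': 'Bearer '+auth.returnTadoToken()})
        return response.json()['homeId']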

    def getTadoResponse(zoneId, date): #Return a response object for the given zone, on the given day
        endPoint='https://my.tado.com/api/v2/homes/'+str(getTadoHome())+'/zones/'+str(zoneId)+'/dayReport?date='+str(date)
        response = requests.get(endPoint, headers={'Authorization': 'Bearer '+auth.returnTadoToken()})
        return response

    def getTadoRoomTempReport(response):
        response=json_normalize(response.json()['measuredData']['insideTemperature']['dataPoints'])
        roomTemp_df=pd.DataFrame(response)
        roomTemp_df['timestamp']=pd.to_datetime(response.timestamp, format='%Y-%m-%d %H:%M:%S.%f')
        return roomTemp_df

#shortened

Tado lumps up a bunch of the data you need to review a large chunk of time into a single ‘dayReport’ – a monstrous object which is really unpleasant to parse. I’ve solved the problem bluntly by pulling a single dayReport and then passing it to other functions which cut it up.

I’ve also taken the time to cut out and rename columns to make them a bit easier to use later. A frustration is how verbose this syntax is in Python. A simple rename goes from

select roomheating."from" as from_ts

to (what feels like an unnecessary):

roomHeating_df.rename(columns={'from': 'from_ts'}, inplace=True)

It’s made more frustrating by having to separate out my renames from my dropping of columns – which can’t be a simple ‘keep’ like in SQL or SAS
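One pattern that might get closer to a SQL-style ‘keep and rename’ is to drive both steps from a single dictionary. This is a sketch on invented data, and the column names other than from and from_ts are hypothetical.

```python
import pandas as pd

# Invented stand-in for the heating data
roomHeating_df = pd.DataFrame({
    "from": ["2021-01-01T00:00:00Z", "2021-01-01T06:00:00Z"],
    "to": ["2021-01-01T06:00:00Z", "2021-01-01T12:00:00Z"],
    "value": ["LOW", "NONE"],
    "unwanted": [1, 2],
})

# Keys are the columns to keep, values are their new names
keep = {"from": "from_ts", "value": "heating_power"}
trimmed = roomHeating_df[list(keep)].rename(columns=keep)
print(trimmed)
```

Selecting with a list keeps only those columns, so the drop and the rename happen in one line.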

2. Format it into something useful

My lack of familiarity with Python, and Pandas specifically, might be showing here. I’ve got some datasets which are roughly indexed on time, and I want to join them. I could reflexively reach for SQL to solve this, but that won’t teach me anything, so the first step is to think about the data I’m going to merge

Again, this kind of merge in SQL would be easy, but after some googling I’ve got an equivalent Pandas syntax:

df_merge=pd.merge_asof(roomTemp_df,roomHeating_df, left_on="timestamp", right_on="from_ts", direction="backward")

Which produces what I’m after, well sort of

Again, I don’t like how verbose this is making a simple join – I’m going to have to do this a few times for each join I want, and it *feels* clunky. I also dislike how it’s keeping all my columns, especially the horrible datetimes
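Here is a small worked example of what merge_asof does, using invented readings and hypothetical column names (celsius, power). Each temperature reading picks up the most recent heating setting at or before its timestamp, and the last line trims the result back to the columns that matter.

```python
import pandas as pd

# Invented readings, not real Tado data
roomTemp_df = pd.DataFrame({
    "timestamp": pd.to_datetime(["2021-01-01 00:05", "2021-01-01 00:20", "2021-01-01 01:10"]),
    "celsius": [19.8, 19.6, 20.1],
})
roomHeating_df = pd.DataFrame({
    "from_ts": pd.to_datetime(["2021-01-01 00:00", "2021-01-01 01:00"]),
    "power": ["LOW", "NONE"],
})

# merge_asof needs both frames sorted on their join keys
df_merge = pd.merge_asof(
    roomTemp_df.sort_values("timestamp"),
    roomHeating_df.sort_values("from_ts"),
    left_on="timestamp", right_on="from_ts", direction="backward",
)
df_merge = df_merge[["timestamp", "celsius", "power"]]
print(df_merge)
```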

3. Deciding how to visualise it

3a. A rant

Packages are a necessary evil of an open source language which boasts the flexibility of Python, but it comes at a cost. There are a colossal number of ways to do anything, and because they’re all optimised for slightly different end goals, they all work a bit differently.

I’m trying to draw a graph. I want to do this as easily as possible but I’ve no idea where to start, or at least where *best* to start.

3b. Getting on with it

Ok, back to the task. I’ve gone with Matplotlib, largely thanks to google. Here’s a basic plot showing the room temp, external temp, and a measure of the amount of heating power being supplied to the room

This is pretty good – I’ve got the data for a single day, and by converting the heating power value to an integer (from None, Low, Med, High), I can make it clear what’s happening.

First thing to notice is how little heating power is needed to maintain a fairly even room temperature. There’s definitely more capacity in the system. If the system was running at full power all day it would give an average power of 3. With a bit of work to split up the day into 4 time zones:

I can see the amount of power being used to keep the room warm – midnight to 6 am is the worst spot, where we’re using 15% of maximum heating we could give.
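The percentage calculation can be sketched like this, on invented hourly readings. The level names in `levels` are an assumption about how the heating power is labelled, and splitting the day into four 6-hour blocks is my reading of the 4 time zones.

```python
import pandas as pd

# Assumed labels for the heating power levels, mapped to integers
levels = {"NONE": 0, "LOW": 1, "MEDIUM": 2, "HIGH": 3}

# Invented hourly readings for one day
power = (
    ["LOW", "LOW", "NONE", "LOW", "NONE", "NONE"]        # 00:00 to 06:00
    + ["NONE", "LOW", "NONE", "NONE", "NONE", "NONE"]    # 06:00 to 12:00
    + ["NONE"] * 6                                       # 12:00 to 18:00
    + ["NONE", "NONE", "LOW", "NONE", "MEDIUM", "NONE"]  # 18:00 to 00:00
)
timestamps = pd.to_datetime([f"2021-01-01 {hour:02d}:00" for hour in range(24)])

df = pd.DataFrame({"timestamp": timestamps, "power": power})
df["power_int"] = df["power"].map(levels)
df["period"] = df["timestamp"].dt.hour // 6  # 0 = midnight to 6am, and so on

# Average power in each period as a percentage of running at full power (3)
pct_of_max = df.groupby("period")["power_int"].mean() / levels["HIGH"] * 100
print(pct_of_max.round(1))
```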

The top line is a bit misleading as it’s not clear that the thermostat is set for different temperatures throughout the day. With a bit of clever colour-fill, I can show visually when things are working as they should

I’ve added a few more days on there too

So there we go – the baby is warm enough, and the heating has plenty of capacity to handle a cold spell!

Optimising oil pumps in Factorio [pt1]

The PC game Factorio presents an interesting optimisation problem which I’ve been using to improve my Python-thinking: how to optimally lay down structures in the game, with a series of overlapping constraints.

Background

Factorio is a resource management and logistics sandbox game on the PC [https://factorio.com/] which asks the player to place structures which obtain resources, and then process them into increasingly complex products.

One of those products is oil, which is obtained from randomly occurring pools within small areas on the world map:

The brown puddles are oil on the map

The player uses 5 structures to collect and distribute the oil:

A pumpjack, with the oil connect point at the bottom left

Pumpjacks are placed on the oil, and have 4 possible orientations (the cardinal directions). Oil must be piped from these structures and pipes connect to a single location (which changes for each of the 4 rotations).

A beacon

Beacons are late-game structures which provide benefits to other structures within a certain range. Without dwelling on the detail, it is desirable to have as many of these covering each pumpjack as possible. They occupy 3×3 squares, but the beneficial effect stretches a further 3 in each direction, so 9×9 in total (see below for some examples).

A medium power pole, there are other types with different areas of effect

Power poles are needed to provide electricity to the pumpjacks and beacons. They need to be close enough to each other to be connected to the grid, and they broadcast power in an area around them. There are 4 different kinds, but the concept is the same for each.

Here are some pipes, on the right is a regular overground, which connects to some ‘pipe to ground’ pipes, these create an underground connection.

Pipes (x2) either run above ground, or underground. Oil must be piped away from the oilfield to be useful.

The problem

Given an arbitrary layout of oil on the map, what is the most efficient layout of structures so that:

  1. All oil is pumped
  2. All buildings are powered
  3. All pumpjacks are covered by the largest possible number of beacons

I know of at least one tool which already proposes this, but I’m interested in going through the exercise myself, partly to give me a real world problem to practice my Python, and also to see if I can do better!

Part 1: Where to put the Beacons

I want to start backwards, with the purest version of the problem. Given a set of pumps, how do we decide where to put beacons?

Here is a lone pump, it takes up 3×3 cells
Here are 3 possible beacon placements. 1 and 2 are valid, as long as any of the 9×9 square of influence around the beacon touches the pump, it works. The 3rd placement isn’t valid.

For simplicity, let’s refer to an object’s placement by the top-left square it occupies, so the example above has a pump at location (7,7), and the beacon is at (4,4). In the code, I’ll use arrays, so these locations would be [6,6] and [3,3], starting from the 0 base.

We can then draw out a heatmap of the ‘valid’ placements for the beacon around this single pump

The dark blue cells are the full range of options for placing a beacon which affects the pump. Grey cells are ones where it’s not possible to fit a 3×3 beacon. A beacon would fit in the white cells, but they’re too far from the pump, so have no value.

A beacon placed in any of the blue squares would be a valid placement. Because there is only a single pump, there is no difference in the value of each square.
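The heatmap can be sketched in a few lines. This assumes a 15×15 map (a made-up size), the 3×3 footprints and 9×9 area of effect described above, and placements given by their top-left cell; the function names are invented for the example.

```python
GRID = 15   # hypothetical map size, in cells
SIZE = 3    # pumps and beacons both occupy 3x3 cells
REACH = 3   # a beacon's effect extends 3 cells beyond its own footprint

def touches(a, b, margin=0):
    """True if the SIZE x SIZE square at a, grown by margin cells, overlaps the one at b."""
    return all(
        a[i] - margin <= b[i] + SIZE - 1 and b[i] <= a[i] + SIZE - 1 + margin
        for i in range(2)
    )

def beacon_values(pumps):
    """How many pumps a beacon at each top-left cell would reach; None where it can't go."""
    values = [[None] * GRID for _ in range(GRID)]
    for row in range(GRID - SIZE + 1):
        for col in range(GRID - SIZE + 1):
            if any(touches((row, col), pump) for pump in pumps):
                continue  # the beacon would sit on top of a pump
            values[row][col] = sum(touches((row, col), pump, REACH) for pump in pumps)
    return values

values = beacon_values([(6, 6)])
for row in values:
    print("".join("." if v is None else str(v) for v in row))
```

Here None plays the part of the grey cells, 0 the white cells and 1 the dark blue cells. Passing more than one pump gives values above 1 wherever a beacon would reach several of them.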

The lighter blue squares show valid placements where the beacon would affect both pumps

Adding a second pump gives a new dimension to the problem: the lighter blue squares are beacon placements which would affect both the pumps. Our first priority should be to make sure we cover as many of these cells as possible, and we must avoid ‘blocking’ these more valuable cells by placing a beacon in lower value adjacent cells

Putting a beacon in the orange cell, which has value 1, would cover 3 other cells of value 2

There’s also a more subtle ‘blocking’ which can occur if a beacon is placed in a high value cell which stops another being available

If we only consider the orange highlighted column, there are 3 unique placements of beacon – Options A and B allow 4 beacons to be placed on valid cells. Option C only allows us to fit 3. By starting in cell 4, we push the 4th beacon into cell 13 which doesn’t have a value

In this example, starting the beacons in cell 4 would reduce the overall coverage by 1, despite each cell having the same ability to cover both pumps
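The column example can be checked with a tiny sketch. I am assuming the valued cells in the column are numbered 2 to 12, which is inferred from the description rather than read off the diagram, and that beacons are stacked directly below each other.

```python
VALID = set(range(2, 13))  # assumption: cells 2 to 12 of the column carry a value
SIZE = 3                   # each beacon takes up 3 cells of the column

def beacons_from(start):
    """Stack beacons down the column from start, stopping at the first cell with no value."""
    placed = []
    cell = start
    while cell in VALID:
        placed.append(cell)
        cell += SIZE
    return placed

for start in (2, 3, 4):
    print(start, beacons_from(start))
```

Starting at 2 or 3 fits four beacons, while starting at 4 fits only three, because the fourth would land on cell 13.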

Here both examples allow us to place 4 beacons, but option A allows for two of the beacons to have value 2, whereas option B only allows for 1. Option A is a better global solution

In this example, we can place the same number of beacons.. but.. they may not be as valuable.

I’ve added numbers to each cell, as the colour wasn’t coping well with this stress situation! Here the central square in the 5×5 grid is worth 12 – a no brainer.. or is it?

In this final example, the central square has value 12, that seems like an obvious place to put a beacon, but it would prevent any other squares being used:

Here 4 beacons have a total value of 22 – 10 higher than selecting the central square alone

A more optimal solution places 4 lower value beacons, which sum to a greater amount

There is a hidden cost of choosing a cell, in that it prevents its neighbours being used. This could make it less valuable to our overall solution. We need to find a way to discount cells which lead to this blocking.
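For a grid this small, the claim can be checked by brute force. This is not the approach I go on to use, just an exhaustive search over every non-overlapping set of beacons. The grid values are hypothetical, apart from the centre being worth 12 and the best set of four summing to 22.

```python
# Hypothetical 5x5 grid of beacon values, indexed by the beacon's top-left cell
values = [
    [5, 3, 3, 3, 6],
    [3, 3, 3, 3, 3],
    [3, 3, 12, 3, 3],
    [3, 3, 3, 3, 3],
    [6, 3, 3, 3, 5],
]
SIZE = 3

def clash(a, b):
    """Two 3x3 beacons overlap when their top-left cells are under SIZE apart on both axes."""
    return abs(a[0] - b[0]) < SIZE and abs(a[1] - b[1]) < SIZE

def best_layout(cells):
    """Try both taking and skipping each cell; returns (total value, chosen cells)."""
    if not cells:
        return 0, []
    first, rest = cells[0], cells[1:]
    skip = best_layout(rest)
    total, chosen = best_layout([c for c in rest if not clash(first, c)])
    take = (total + values[first[0]][first[1]], [first] + chosen)
    return max(take, skip, key=lambda option: option[0])

cells = [(r, c) for r in range(5) for c in range(5)]
total, chosen = best_layout(cells)
print(total, chosen)
```

The search settles on the four corners and leaves the centre cell empty. It only works because the grid is tiny; the number of combinations grows far too quickly for a real oilfield.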

Assuming we choose to populate the blue square, we cover the orange squares. The black squares are not accessible either. The green squares are the nearest neighbours, which can still be populated.

In order to correctly assess the opportunity cost of any square choice, we need to look at the best combination it enables, minus the best combination it blocks. First we loop through all the ‘cost’ combinations and take the one which gives the highest value

Here are a set of the possible combinations which need to be assessed to make sure we’re doing the right thing by choosing the centre blue square, a simple nested loop will handle the possibilities

Then we loop through the smaller number of green cell combinations which are the possibilities not blocked by the choice of blue square, and take the best combination which is not blocked – the ‘benefit’ of choosing that square

Each purple outline is still possible with the blue square chosen

I’ve then subtracted the highest cost from the highest benefit, to calculate benefit - cost = Opportunity Cost for each square.

Each cell now has 2 important values:

Top left is the ‘Value’ of the cell – how many pumps the beacon would be able to influence

Bottom right is the ‘Opportunity Cost’ – the benefit of the cell, minus the cost. Where this is negative, the cell is worth less than the cells it blocks; where it’s positive, the cell provides more benefit than it removes

(A cell can have a bigger positive than its own value because of the other squares it enables. I’m not 100% sure that logic works, but I think I can account for that in the implementation)

Now we can loop through the grid, finding the highest Opportunity Benefit square, placing a beacon on it, then recalculating and repeating.
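The shape of that loop is roughly as follows. This is a simplified sketch rather than my actual code: the scoring is passed in as a function, and the demo uses the plain cell value on a hypothetical 5×5 grid (centre worth 12) to show why a discounted score is needed.

```python
def greedy_place(values, score, size=3):
    """Place a beacon on the best-scoring free cell, block its neighbours, and repeat."""
    free = {(r, c) for r, row in enumerate(values) for c, v in enumerate(row) if v}
    placed = []
    while free:
        # Scores are recalculated every pass, against the cells that are still free
        best = max(sorted(free), key=lambda cell: score(cell, free, values))
        placed.append(best)
        # Drop the chosen cell and every cell whose beacon would overlap it
        free = {c for c in free
                if abs(c[0] - best[0]) >= size or abs(c[1] - best[1]) >= size}
    return placed

def plain_value(cell, free, values):
    """Naive score: just the cell's own value, ignoring what it blocks."""
    return values[cell[0]][cell[1]]

# Hypothetical values, indexed by the beacon's top-left cell
values = [
    [5, 3, 3, 3, 6],
    [3, 3, 3, 3, 3],
    [3, 3, 12, 3, 3],
    [3, 3, 3, 3, 3],
    [6, 3, 3, 3, 5],
]
placed = greedy_place(values, plain_value)
print(placed)
```

Scored on plain value, the loop grabs the centre and then has nowhere left to go. The idea is that swapping the Opportunity Benefit in as the score should steer it away from cells like that.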

And finally, after all that, here’s what it does:

This seems to produce a sensible result, albeit with some odd choices, e.g. where two adjacent cells are equally good, it’s all down to the loop to see which gets chosen

I’m left a bit unsure. The final arrangement has some glitches, like 4,5,9 being offset from 11, 14 – this is due to the way I’m looping through the grid when checking for the best Opp-Benefit. But there’s a general sense that, while my logic feels sound, I’m not able to prove the best-ness of the solution.

Anyway. This is the end of part 1. My next options are either going back and fixing up the code (which I’ll do before I share anything), or moving on to the other constraints.

Some Utility Scripts

I’ve written a couple of bits of code I’ve reused

pickleHelper.py

This simplifies using pickle objects in code

import pickle

def writeToPickle(_pickleName, _object):
        with open(_pickleName, 'wb') as file:
            pickle.dump(_object, file)


def checkPickle(_pickleName):
        try:
            with open(_pickleName,"rb") as file:
                return pickle.load(file)
        except IOError:
            return False


def updatePickleInPlace(_pickleName, _object):
        existing = checkPickle(_pickleName)
        if existing is not False: #checkPickle returns False when the file can't be opened
            #merge the new keys into what's already stored
            for key, val in _object.items():
                existing[key]=val
            writeToPickle(_pickleName, existing)
        else:
            writeToPickle(_pickleName, _object)


tadoAuth.py

I’ve replaced a chunk of my home heating system with Tado. This is a simple helper to return and store the bearer token for API requests to view and change the settings. It assumes you have TADOUSER/TADOPASS in an environment file.

import requests, json, datetime
from pickleHelper import checkPickle, updatePickleInPlace
from requests.utils import quote
import os 
from dotenv import load_dotenv

load_dotenv()

accessToken = ""
refreshToken = ""
expiresTime = datetime.datetime.now()
pickleFile = "auth.pickle"

def getTadoToken():
    username=os.getenv('TADOUSER')
    password=quote(os.getenv('TADOPASS'), safe='')
    url="https://auth.tado.com/oauth/token?client_id=tado-web-app&grant_type=password&scope=home.user&username="+str(username)+"&password="+password+"&client_secret=wZaRN7rpjn3FoNyF5IFuxg9uMzYJcvOoQ8QWiIqS3hfk6gLhVlG57j5YNoZL2Rtc"
    #Don't worry, this isn't my secret!
    response = requests.post(url)
    accessToken=response.json()["access_token"]
    refreshToken=response.json()["refresh_token"]
    expiresTime=datetime.datetime.now() + datetime.timedelta(seconds=response.json()["expires_in"])
    updatePickleInPlace(pickleFile, {"accessToken":accessToken, "refreshToken":refreshToken, "expiresTime":expiresTime})
    return accessToken

def returnTadoToken():
    depickle = checkPickle(pickleFile)
    if (depickle==False):
        return  getTadoToken()
    if (datetime.datetime.now() > depickle["expiresTime"]):
        return  getTadoToken()
    else:
        return depickle["accessToken"]

picStitcher – creating a photo collage

Every month since our son was born, my wife and I have made a 4×4 collage of the best photos of him which goes on the fridge.

We’ve then taken a photo of the photos, but the quality is obviously not as good as if we’d joined the originals together. Rather than rely on photo editing software, or an online tool, I thought I’d see how easy it is to make a tool for this in Python.

The problem

16 photos per month, saved in jpg format, need to be stitched together in a single jpg

A schematic showing 16 individual images, then those images in a grid, finally those files joined as 1

The photos are all portrait, and I want to preserve the original order from the fridge

The solution

I think the steps are:

  1. Rename the files into some kind of sequence (I wonder if I could enhance this with some image recognition eventually?)
  2. Create a new blank image to hold them
  3. Loop through the files and add them to the blank image at the right offset
  4. Save the new file

1. Rename the files

This turned out to be the most tedious bit of the job. I went through a folder of all the fridge photo jpgs from the year and renamed them as

MMM-Y-X.jpg

a list of properly formatted file names

where x and y are the grid co-ordinates, from 1 to 4. With that done, next stop VSCode

2. Create a blank image

I’ve used the Pillow library a few times, it’s pretty accommodating, and well enough used that Stack Overflow produces plenty of help when it breaks (or you do it wrong).

from PIL import Image, ImageDraw, ImageOps

The first thing to do is make a new image, which takes 2 required arguments. The first is the colour mode for the image, then the width and height in a tuple.

Image.new(mode, size, color=0)

Rather than worrying about how many pixels each photo was, I thought it was easier to open the first photo for each grid and read in the width and height values.

#All the files are named MON-Y-X.jpg
month = "NOV"
gridX = 4
gridY = 4

#build the image name
imageName = month+"-1-1.jpg"
#open the file
firstImage = Image.open(imageName)
#check the orientation is correct
firstImage = ImageOps.exif_transpose(firstImage)
#get the size of the image
imageWidth, imageHeight = firstImage.size

Then the new image is just the number of tiled images multiplied by the size of each one

#make a new image which is large enough to fit the sub-images in
newImage = Image.new('RGB', (imageWidth*gridX, imageHeight*gridY))

3. Stitch them all together

I’m still second guessing myself in Python when it comes to loops. I’ve read a lot of very stuffy views on the unsuitability of looping (“It’s just not Pythonic” stuff) but I think this is one of the correct uses (small volumes, speed not a factor, readability important)

#loop through, opening each image and pasting it into the new image
for x in range(gridX):
    for y in range(gridY):
        getImageName= month+"-"+str(y+1)+"-"+str(x+1)+".jpg"
        getImage = Image.open(getImageName)
        getImage=ImageOps.exif_transpose(getImage)
        #force a resize, just in case
        #Image.ANTIALIAS was removed in Pillow 10; Image.LANCZOS is the same filter
        getImage = getImage.resize((imageWidth,imageHeight), Image.LANCZOS)
        
        #paste into the new image
        newImage.paste(getImage,(imageWidth*x,imageHeight*y))
        #quick debug print
        print(getImageName+ " done")
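Before opening any files, it can help to do a dry run of the names and offsets the loop will use. A minimal sketch, where `tile_plan` is a made-up helper and the 3000×4000 photo size is only an example:

```python
def tile_plan(month, grid_x, grid_y, width, height):
    """List each expected file name with the pixel offset it will be pasted at."""
    plan = []
    for x in range(grid_x):
        for y in range(grid_y):
            name = f"{month}-{y + 1}-{x + 1}.jpg"  # files are named MON-Y-X.jpg
            plan.append((name, (width * x, height * y)))
    return plan

plan = tile_plan("NOV", 4, 4, 3000, 4000)
for name, offset in plan:
    print(name, offset)
```

The offset is the top-left corner of each tile, which is what paste() expects.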

3b. Don’t make assumptions

I’ve put a couple of bits of code in there which weren’t in the original plan, but address issues I got to when I ran the code

firstImage = ImageOps.exif_transpose(firstImage)

This solved a couple of the pictures having the wrong orientation. I think the issue is that somewhere in the history of images and cameras there was a bifurcation of reason which led to us having some image files orientated correctly, and others arbitrarily, but with a note to tell the software how to rotate it. Seems crazy to me.

getImage = getImage.resize((imageWidth,imageHeight), Image.LANCZOS)

This just ensures the image I’m going to tile isn’t an unusual size

4. Save the file

#Image.ANTIALIAS was removed in Pillow 10; Image.LANCZOS is the same filter
newImage = newImage.resize((6000,8000), Image.LANCZOS)

#Save the new image with the month name
newImage.save(month+".jpg")

I didn’t initially worry about the final image size, but Google Photos had a freak-out when I tried to upload an enormous 25 MB jpg, so sorting the size before saving helped
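The target size above is hard-coded. If the photos ever stopped being a 3:4 shape, a small helper could work out a size that keeps the proportions. A sketch, where `fit_within` and the 8000 pixel cap are made up for illustration:

```python
def fit_within(width, height, max_side):
    """Scale a size down so its longest side is at most max_side, keeping the shape."""
    scale = min(1.0, max_side / max(width, height))
    return round(width * scale), round(height * scale)

print(fit_within(12000, 16000, 8000))
print(fit_within(3000, 4000, 8000))
```

The result could be passed straight to resize() in place of the fixed tuple. Images already under the cap are left at their original size.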

The result

a (blurry) sample of the output

I’m not sure if it beats the time spent vs time saved heuristic, but I’m pleased with the result!