I am a researcher and physician, currently working with Marie von Lilienfeld-Toal at the Institute for Diversity Medicine at Ruhr University Bochum, formerly at the Biophysical Imaging Lab of Christian Eggeling in Jena.
You can get in contact with me by sending an email to alva.seltmann (at) rub (dot) de. You can also find me on Github.
I am a scientist and physician, currently working with Marie von Lilienfeld-Toal at the Institute for Diversity Medicine at Ruhr University Bochum, and before that in the Biophysical Imaging group of Christian Eggeling in Jena. I can be reached by email at alva.seltmann (at) rub (dot) de. You can also find me on Github.
I am partial to the translation by Derek Lin.
Examples:
| Human Era year | Common Era year | Event |
|---|---|---|
| 1001 HE | 9000 BCE | Jericho |
| 7301 HE | 2700 BCE | First pyramid |
| 11460 HE | 1460 CE | Machu Picchu built |
| 11945 HE | 1945 CE | United Nations founded |
Wonderful illustration of the case for the Human Era (courtesy of [kurzgesagt.org](https://kurzgesagt.org)):
Structured arguments and thoughts, often presented in lists.
Bold, italic, verbatim, strikethrough
This is a test. And Hugo Book Shortcodes in an org source file:
Here are the details.
This should also be possible in plain org-mode: π(x) vs
This works, but requires MathJax:
\begin{equation} \label{eq:1} C = W \log_2 (1+\mathrm{SNR}) \end{equation}
x = \begin{cases}
a &\text{if } b \\
c &\text{if } d
\end{cases}
Test section text.
Here is other critical stuff.
- promise and hope, with freedom comes prosperity
- movement of French colonies → want to become French citizens
- movement of British colonies → Pan-Africanism
- people: activist Alexander Crummell, writer W. E. B. Du Bois (core idea: Black* people should remain in the countries where they now live), journalist Marcus Garvey (core idea: Black* people should return to Africa)
- core ideas
- all Blacks* in the world constitute a single race and a single culture, and should be proud of their skin colour
- all of Africa should be independent and united → “Africa for Africans”
- BUT: corrupt elites - why?
- no well-developed private sector; plantations belonged to foreigners, banks belonged to foreign capital → a political career was the only path to wealth
- civil wars, revolts, massacres, hunger
- opponents used all means (tribal and ethnic conflicts, military might, corruption, murder)
- cold war era → problems + interests of weaker, dependent countries were ignored, subordinate to superpower interests
- coup d’état in Nigeria - military seizes power after only 5 years of independence
- and people celebrate! Since their own elite is seen as too corrupt (“black imperialists”, “political wolves, who plunder the country”)
Another continent to describe.
I personally like the translation by Günther Debon.
Test post text.
Test post text.
This year I try to record my attempt at solving the Advent of Code 2023 riddles. This is Day 1 - see https://adventofcode.com/2023/day/1
The newly-improved calibration document consists of lines of text; each line originally contained a specific calibration value that the Elves now need to recover. On each line, the calibration value can be found by combining the first digit and the last digit (in that order) to form a single two-digit number.
For example:
1abc2
pqr3stu8vwx
a1b2c3d4e5f
treb7uchet
In this example, the calibration values of these four lines are 12, 38, 15, and 77. Adding these together produces 142.
Consider your entire calibration document. What is the sum of all of the calibration values?
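Before loading the real input, the rule can be checked on the four example lines with a minimal self-contained sketch (plain Python, no pandas yet):

```python
# Sketch of the calibration rule on the example lines from the puzzle
lines = ["1abc2", "pqr3stu8vwx", "a1b2c3d4e5f", "treb7uchet"]

def calibration(line):
    digits = [c for c in line if c.isdigit()]
    # first and last digit form a single two-digit number
    return int(digits[0] + digits[-1])

values = [calibration(l) for l in lines]
print(values, sum(values))  # [12, 38, 15, 77] 142
```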
My input file: 2023-12-01-1-aoc.txt
Let's start Jupyter in our shell and get coding!
conda activate tf
jupyter lab --no-browser --port=8888
First, load the test document:
import pandas as pd
import re
txt = pd.read_table('data/2023-12-01-1-aoc.txt', names=['code'])
txt
| code | |
|---|---|
| 0 | jjfvnnlfivejj1 |
| 1 | 6fourfour |
| 2 | ninevbmltwo69 |
| 3 | pcg91vqrfpxxzzzoneightzt |
| 4 | jpprthxgjfive3one1qckhrptpqdc |
| … | … |
| 995 | 583sevenhjxlqzjgbzxhkcl5 |
| 996 | 81s |
| 997 | 2four3threesxxvlfqfive4 |
| 998 | nine6eightsevenzx9twoxc |
| 999 | hmbfjdfnp989mfivefiverpzrjs |
1000 rows × 1 columns
Second, extract the digits. I had to wrap my head around regex matching in
Python first, because I initially tried pandas Series.str.extract (which only
extracts the first match), then Series.str.extractall (which extracts all
matches but puts them into a MultiIndex, making things more difficult in this
case). So I settled on re.findall, which returns a list. To concatenate the
elements of the list, we use the join function.
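The difference between the three approaches can be seen on a toy Series (illustrative data, not the puzzle input):

```python
import re

import pandas as pd

# toy data to compare the three extraction approaches
s = pd.Series(['a12b34', 'x5y'])

# str.extract returns only the first match per row
first = s.str.extract(r'(\d+)')
# str.extractall stacks all matches into a (row, match) MultiIndex
all_matches = s.str.extractall(r'(\d+)')
# re.findall returns a plain list per row, which is easy to join
joined = s.apply(lambda x: ''.join(re.findall(r'\d+', x)))
print(joined.tolist())  # ['1234', '5']
```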
txt['digits'] = txt.loc[:, 'code'].apply(
lambda x: ''.join(re.findall(r'(\d+)', x)))
txt
| code | digits | |
|---|---|---|
| 0 | jjfvnnlfivejj1 | 1 |
| 1 | 6fourfour | 6 |
| 2 | ninevbmltwo69 | 69 |
| 3 | pcg91vqrfpxxzzzoneightzt | 91 |
| 4 | jpprthxgjfive3one1qckhrptpqdc | 31 |
| … | … | … |
| 995 | 583sevenhjxlqzjgbzxhkcl5 | 5835 |
| 996 | 81s | 81 |
| 997 | 2four3threesxxvlfqfive4 | 234 |
| 998 | nine6eightsevenzx9twoxc | 69 |
| 999 | hmbfjdfnp989mfivefiverpzrjs | 989 |
1000 rows × 2 columns
Next, combine the first and the last digit and convert the result from string to integer
txt['calibration'] = txt.loc[:, 'digits'].apply(
lambda x: int(x[0] + x[-1]))
txt
| code | digits | calibration | |
|---|---|---|---|
| 0 | jjfvnnlfivejj1 | 1 | 11 |
| 1 | 6fourfour | 6 | 66 |
| 2 | ninevbmltwo69 | 69 | 69 |
| 3 | pcg91vqrfpxxzzzoneightzt | 91 | 91 |
| 4 | jpprthxgjfive3one1qckhrptpqdc | 31 | 31 |
| … | … | … | … |
| 995 | 583sevenhjxlqzjgbzxhkcl5 | 5835 | 55 |
| 996 | 81s | 81 | 81 |
| 997 | 2four3threesxxvlfqfive4 | 234 | 24 |
| 998 | nine6eightsevenzx9twoxc | 69 | 69 |
| 999 | hmbfjdfnp989mfivefiverpzrjs | 989 | 99 |
1000 rows × 3 columns
Lastly, get the sum of our calibration numbers
txt.loc[:, 'calibration'].sum()
Your calculation isn’t quite right. It looks like some of the digits are actually spelled out with letters: one, two, three, four, five, six, seven, eight, and nine also count as valid “digits”.
Equipped with this new information, you now need to find the real first and last digit on each line. For example:
two1nine
eightwothree
abcone2threexyz
xtwone3four
4nineeightseven2
zoneight234
7pqrstsixteen
In this example, the calibration values are 29, 83, 13, 24, 42, 14, and 76. Adding these together produces 281.
What is the sum of all of the calibration values?
Okay, let’s see if we can update the pattern matching. To deal with potentially
overlapping values like oneight, which contains one as well as eight, I
used the regex positive lookahead ?= as described here. Because this enables
capturing overlapping values, I used \d (one digit) instead of \d+ (one or
more digits); otherwise digits might be counted twice. Afterwards, just replace
the spelled-out digits with their numerical value.
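A quick check of the lookahead trick on two of the example lines shows why it is needed: a plain alternation would consume the overlap.

```python
import re

pattern = r'(?=(\d|one|two|three|four|five|six|seven|eight|nine))'
# without the lookahead, 'eightwothree' would only yield 'eight' and 'three',
# because matching 'eight' consumes the 't' that 'two' needs
print(re.findall(pattern, 'eightwothree'))  # ['eight', 'two', 'three']
print(re.findall(pattern, 'oneight'))       # ['one', 'eight']
```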
# for i, r in enumerate(txt.loc[:, 'code']):
# matches = re.findall(
# r'(?=(\d|one|two|three|four|five|six|seven|eight|nine))', r)
# result = ''.join([match for match in matches])
# result = result.replace('one', '1').replace('two', '2').replace(
# 'three', '3').replace('four', '4').replace('five', '5').replace(
# 'six', '6').replace('seven', '7').replace('eight', '8').replace(
# 'nine', '9')
# txt.loc[i, 'digits2'] = result
# txt
# a very nice alternative suggested by Tomalak:
digits = r'\d one two three four five six seven eight nine'.split()
txt['digits2'] = txt.loc[:, 'code'].apply(lambda v: ''.join(
str(digits.index(m)) if m in digits else m
for m in re.findall(rf'(?=({"|".join(digits)}))', v)
))
txt
| code | digits | calibration | digits2 | |
|---|---|---|---|---|
| 0 | jjfvnnlfivejj1 | 1 | 11 | 51 |
| 1 | 6fourfour | 6 | 66 | 644 |
| 2 | ninevbmltwo69 | 69 | 69 | 9269 |
| 3 | pcg91vqrfpxxzzzoneightzt | 91 | 91 | 9118 |
| 4 | jpprthxgjfive3one1qckhrptpqdc | 31 | 31 | 5311 |
| … | … | … | … | … |
| 995 | 583sevenhjxlqzjgbzxhkcl5 | 5835 | 55 | 58375 |
| 996 | 81s | 81 | 81 | 81 |
| 997 | 2four3threesxxvlfqfive4 | 234 | 24 | 243354 |
| 998 | nine6eightsevenzx9twoxc | 69 | 69 | 968792 |
| 999 | hmbfjdfnp989mfivefiverpzrjs | 989 | 99 | 98955 |
1000 rows × 4 columns
Now, construct the calibration value as before…
txt['calibration2'] = txt.loc[:, 'digits2'].apply(lambda x: int(x[0] + x[-1]))
txt
| code | digits | calibration | digits2 | calibration2 | |
|---|---|---|---|---|---|
| 0 | jjfvnnlfivejj1 | 1 | 11 | 51 | 51 |
| 1 | 6fourfour | 6 | 66 | 644 | 64 |
| 2 | ninevbmltwo69 | 69 | 69 | 9269 | 99 |
| 3 | pcg91vqrfpxxzzzoneightzt | 91 | 91 | 9118 | 98 |
| 4 | jpprthxgjfive3one1qckhrptpqdc | 31 | 31 | 5311 | 51 |
| … | … | … | … | … | … |
| 995 | 583sevenhjxlqzjgbzxhkcl5 | 5835 | 55 | 58375 | 55 |
| 996 | 81s | 81 | 81 | 81 | 81 |
| 997 | 2four3threesxxvlfqfive4 | 234 | 24 | 243354 | 24 |
| 998 | nine6eightsevenzx9twoxc | 69 | 69 | 968792 | 92 |
| 999 | hmbfjdfnp989mfivefiverpzrjs | 989 | 99 | 98955 | 95 |
1000 rows × 5 columns
… and get the correct sum!
txt.loc[:, 'calibration2'].sum()
This year I try to record my attempt at solving the Advent of Code 2023 riddles. This is Day 1 - see https://adventofcode.com/2023/day/1
This year I try to record my attempt at solving the Advent of Code 2023 riddles. This is Day 2 - see https://adventofcode.com/2023/day/2
As you walk, the Elf shows you a small bag and some cubes which are either red, green, or blue. Each time you play this game, he will hide a secret number of cubes of each color in the bag, and your goal is to figure out information about the number of cubes.
To get information, once a bag has been loaded with cubes, the Elf will reach into the bag, grab a handful of random cubes, show them to you, and then put them back in the bag. He’ll do this a few times per game.
You play several games and record the information from each game (your puzzle input). Each game is listed with its ID number (like the 11 in Game 11: ...) followed by a semicolon-separated list of subsets of cubes that were revealed from the bag (like 3 red, 5 green, 4 blue).
For example, the record of a few games might look like this:
Game 1: 3 blue, 4 red; 1 red, 2 green, 6 blue; 2 green
Game 2: 1 blue, 2 green; 3 green, 4 blue, 1 red; 1 green, 1 blue
Game 3: 8 green, 6 blue, 20 red; 5 blue, 4 red, 13 green; 5 green, 1 red
Game 4: 1 green, 3 red, 6 blue; 3 green, 6 red; 3 green, 15 blue, 14 red
Game 5: 6 red, 1 blue, 3 green; 2 blue, 1 red, 2 green
In game 1, three sets of cubes are revealed from the bag (and then put back again). The first set is 3 blue cubes and 4 red cubes; the second set is 1 red cube, 2 green cubes, and 6 blue cubes; the third set is only 2 green cubes.
The Elf would first like to know which games would have been possible if the bag contained only 12 red cubes, 13 green cubes, and 14 blue cubes?
In the example above, games 1, 2, and 5 would have been possible if the bag had been loaded with that configuration. However, game 3 would have been impossible because at one point the Elf showed you 20 red cubes at once; similarly, game 4 would also have been impossible because the Elf showed you 15 blue cubes at once. If you add up the IDs of the games that would have been possible, you get 8.
Determine which games would have been possible if the bag had been loaded with only 12 red cubes, 13 green cubes, and 14 blue cubes. What is the sum of the IDs of those games?
My input file: 2023-12-02-1-aoc.txt
Okay, let’s load our python kernel in emacs-jupyter and get coding! First of
all, let’s load the input and split the riddle code by colon : to extract the
game id and the rest of the code by semicolon ; to get the number of sets
played in each game.
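The splitting logic can be tried on a single example line first (plain Python; str.removeprefix requires Python 3.9+):

```python
# Parse one record line with plain string splits (example line from the puzzle)
line = 'Game 1: 3 blue, 4 red; 1 red, 2 green, 6 blue; 2 green'
head, body = line.split(':')
game_id = int(head.removeprefix('Game '))
sets = [s.strip() for s in body.split(';')]
print(game_id, sets)  # 1 ['3 blue, 4 red', '1 red, 2 green, 6 blue', '2 green']
```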
import pandas as pd
import re
txt = pd.read_table('data/2023-12-02-1-aoc.txt', names=['code'])
txt['id'] = txt.loc[:, 'code'].str.split(':').apply(
lambda x: int(x[0].strip('Game ')))
txt['code'] = txt.loc[:, 'code'].str.split(':').apply(lambda x: x[1])
# txt['code'] = txt.loc[:, 'code'].str.split(';')
# txt['nsets'] = txt.loc[:, 'code'].apply(lambda x: len(x))
txt
| code | id | |
|---|---|---|
| 0 | 1 green, 1 blue, 1 red; 1 green, 8 red, 7 blu… | 1 |
| 1 | 9 red, 7 green, 3 blue; 15 green, 2 blue, 5 r… | 2 |
| 2 | 3 red, 1 blue, 4 green; 6 red, 3 green, 2 blu… | 3 |
| 3 | 2 blue, 2 green, 19 red; 3 blue, 11 red, 16 g… | 4 |
| 4 | 8 green, 1 red, 12 blue; 10 green, 6 red, 13 … | 5 |
| … | … | … |
| 95 | 2 red, 2 green, 1 blue; 1 red, 4 green; 1 green | 96 |
| 96 | 4 red, 5 green; 5 blue, 3 red; 8 blue, 2 gree… | 97 |
| 97 | 1 blue; 2 green, 1 red; 5 red, 2 green; 4 red… | 98 |
| 98 | 6 blue, 5 red, 2 green; 9 red, 1 blue; 2 gree… | 99 |
| 99 | 1 blue, 13 green, 14 red; 11 green, 11 blue, … | 100 |
100 rows × 2 columns
Now, let’s extract the three colours into different columns with regex. We use
the lookahead assertion ?= to find the respective colours and only extract the
digits \d+ coming before them. Then we just keep the maximum drawn number of
cubes per colour, since this is the only information that matters at the moment.
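The lookahead can be checked on one record from the example (here matching ' green' literally; the version below uses . for the separator, which behaves the same on this input):

```python
import re

line = '3 blue, 4 red; 1 red, 2 green, 6 blue; 2 green'
# \d+ matches the count, (?= green) asserts the colour follows without consuming it
greens = [int(n) for n in re.findall(r'\d+(?= green)', line)]
print(greens, max(greens))  # [2, 2] 2
```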
txt['green'] = txt.loc[:, 'code'].apply(
lambda code: re.findall(r'\d+(?=.green)', code)).apply(
lambda list: max([int(i) for i in list]))
txt['red'] = txt.loc[:, 'code'].apply(
lambda code: re.findall(r'\d+(?=.red)', code)).apply(
lambda list: max([int(i) for i in list]))
txt['blue'] = txt.loc[:, 'code'].apply(
lambda code: re.findall(r'\d+(?=.blue)', code)).apply(
lambda list: max([int(i) for i in list]))
txt
| code | id | green | red | blue | |
|---|---|---|---|---|---|
| 0 | 1 green, 1 blue, 1 red; 1 green, 8 red, 7 blu… | 1 | 2 | 10 | 10 |
| 1 | 9 red, 7 green, 3 blue; 15 green, 2 blue, 5 r… | 2 | 15 | 10 | 3 |
| 2 | 3 red, 1 blue, 4 green; 6 red, 3 green, 2 blu… | 3 | 4 | 6 | 16 |
| 3 | 2 blue, 2 green, 19 red; 3 blue, 11 red, 16 g… | 4 | 16 | 20 | 18 |
| 4 | 8 green, 1 red, 12 blue; 10 green, 6 red, 13 … | 5 | 10 | 6 | 14 |
| … | … | … | … | … | … |
| 95 | 2 red, 2 green, 1 blue; 1 red, 4 green; 1 green | 96 | 4 | 2 | 1 |
| 96 | 4 red, 5 green; 5 blue, 3 red; 8 blue, 2 gree… | 97 | 5 | 4 | 8 |
| 97 | 1 blue; 2 green, 1 red; 5 red, 2 green; 4 red… | 98 | 2 | 5 | 2 |
| 98 | 6 blue, 5 red, 2 green; 9 red, 1 blue; 2 gree… | 99 | 2 | 9 | 11 |
| 99 | 1 blue, 13 green, 14 red; 11 green, 11 blue, … | 100 | 13 | 15 | 11 |
100 rows × 5 columns
Lastly, we just filter the DataFrame to keep only the games where every drawn count was less than or equal to the number of cubes in the bag (12 red, 13 green, 14 blue), and sum the result!
txt['id'][(txt['green'] <= 13) & (txt['red'] <= 12) & (txt['blue'] <= 14)].sum()
As you continue your walk, the Elf poses a second question: in each game you played, what is the fewest number of cubes of each color that could have been in the bag to make the game possible?
Again consider the example games from earlier:
Game 1: 3 blue, 4 red; 1 red, 2 green, 6 blue; 2 green
Game 2: 1 blue, 2 green; 3 green, 4 blue, 1 red; 1 green, 1 blue
Game 3: 8 green, 6 blue, 20 red; 5 blue, 4 red, 13 green; 5 green, 1 red
Game 4: 1 green, 3 red, 6 blue; 3 green, 6 red; 3 green, 15 blue, 14 red
Game 5: 6 red, 1 blue, 3 green; 2 blue, 1 red, 2 green
- In game 1, the game could have been played with as few as 4 red, 2 green, and 6 blue cubes. If any color had even one fewer cube, the game would have been impossible.
- Game 2 could have been played with a minimum of 1 red, 3 green, and 4 blue cubes.
- Game 3 must have been played with at least 20 red, 13 green, and 6 blue cubes.
- Game 4 required at least 14 red, 3 green, and 15 blue cubes.
- Game 5 needed no fewer than 6 red, 3 green, and 2 blue cubes in the bag.
The power of a set of cubes is equal to the numbers of red, green, and blue cubes multiplied together. The power of the minimum set of cubes in game 1 is 48. In games 2-5 it was 12, 1560, 630, and 36, respectively. Adding up these five powers produces the sum 2286.
For each game, find the minimum set of cubes that must have been present. What is the sum of the power of these sets?
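The power arithmetic from the example can be checked directly (minimum sets taken from the bullet list above, as red, green, blue per game):

```python
# Minimum sets from the worked example (red, green, blue per game)
minima = [(4, 2, 6), (1, 3, 4), (20, 13, 6), (14, 3, 15), (6, 3, 2)]
powers = [r * g * b for r, g, b in minima]
print(powers, sum(powers))  # [48, 12, 1560, 630, 36] 2286
```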
Luckily, this task is made trivial by the approach we have taken before. We just
have to multiply the green, red and blue columns:
txt['power'] = txt.loc[:, 'green'] * txt.loc[:, 'blue'] * txt.loc[:, 'red']
txt
| code | id | green | red | blue | power | |
|---|---|---|---|---|---|---|
| 0 | 1 green, 1 blue, 1 red; 1 green, 8 red, 7 blu… | 1 | 2 | 10 | 10 | 200 |
| 1 | 9 red, 7 green, 3 blue; 15 green, 2 blue, 5 r… | 2 | 15 | 10 | 3 | 450 |
| 2 | 3 red, 1 blue, 4 green; 6 red, 3 green, 2 blu… | 3 | 4 | 6 | 16 | 384 |
| 3 | 2 blue, 2 green, 19 red; 3 blue, 11 red, 16 g… | 4 | 16 | 20 | 18 | 5760 |
| 4 | 8 green, 1 red, 12 blue; 10 green, 6 red, 13 … | 5 | 10 | 6 | 14 | 840 |
| … | … | … | … | … | … | … |
| 95 | 2 red, 2 green, 1 blue; 1 red, 4 green; 1 green | 96 | 4 | 2 | 1 | 8 |
| 96 | 4 red, 5 green; 5 blue, 3 red; 8 blue, 2 gree… | 97 | 5 | 4 | 8 | 160 |
| 97 | 1 blue; 2 green, 1 red; 5 red, 2 green; 4 red… | 98 | 2 | 5 | 2 | 20 |
| 98 | 6 blue, 5 red, 2 green; 9 red, 1 blue; 2 gree… | 99 | 2 | 9 | 11 | 198 |
| 99 | 1 blue, 13 green, 14 red; 11 green, 11 blue, … | 100 | 13 | 15 | 11 | 2145 |
100 rows × 6 columns
And for this one, the sum is:
txt['power'].sum()
This year I try to record my attempt at solving the Advent of Code 2023 riddles. This is Day 2 - see https://adventofcode.com/2023/day/2
For the solutions, see the English version of this blog post.
This year I try to record my attempt at solving the Advent of Code 2023 riddles. This is Day 3 - see https://adventofcode.com/2023/day/3
The engine schematic (your puzzle input) consists of a visual representation of the engine. There are lots of numbers and symbols you don’t really understand, but apparently any number adjacent to a symbol, even diagonally, is a “part number” and should be included in your sum. (Periods (.) do not count as a symbol.)
Here is an example engine schematic:
467..114..
...*......
..35..633.
......#...
617*......
.....+.58.
..592.....
......755.
...$.*....
.664.598..
In this schematic, two numbers are not part numbers because they are not adjacent to a symbol: 114 (top right) and 58 (middle right). Every other number is adjacent to a symbol and so is a part number; their sum is 4361.
Of course, the actual engine schematic is much larger. What is the sum of all of the part numbers in the engine schematic?
My input file: 2023-12-03-1-aoc.txt
Okay, let’s first get the input as a numpy character array
import numpy as np
import pandas as pd
import sys
import matplotlib.pyplot as plt
from scipy import ndimage as ndi
np.set_printoptions(threshold=sys.maxsize)
txt = pd.read_table('data/2023-12-03-1-aoc.txt', names=['code'])
arr = np.chararray((txt.size, txt.size), unicode=True)
txt['code'] = txt.loc[:, 'code'].apply(lambda x: [i for i in x])
for i, code in enumerate(txt['code']):
arr[i, :] = code
print((arr[:15, :15]))
Now extract symbols, digits and the empty space. We use the numpy character
methods for that. This way, we create a binary mask for all digits and a
binary mask for all empty space (.). The symbols are then every character
which is neither.
digits = np.char.isdigit(arr)
empty = np.char.endswith(arr, '.')
symbols = ~(digits | empty)
# just for visualization
plt.figure(figsize=(16, 5))
plt.subplot(131, title='symbols').matshow(symbols)
plt.subplot(132, title='digits').matshow(digits)
plt.subplot(133, title='empty').matshow(empty)
plt.show()
Now we use the image-processing technique of dilation on the symbols mask, so
that we get a new mask covering the surroundings of all symbols. We use this
afterwards to check whether the digits are near a symbol.
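On a toy mask, dilation with a 3×3 structuring element works like this (illustrative array, not the puzzle input):

```python
import numpy as np
from scipy import ndimage as ndi

mask = np.zeros((5, 5), dtype=bool)
mask[2, 2] = True  # a single "symbol"
struct = np.ones((3, 3), dtype=bool)
dilated = ndi.binary_dilation(mask, structure=struct)
# the dilated mask now covers the symbol and its 8 neighbours
print(dilated.sum())  # 9
```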
struct = ((1, 1, 1), (1, 1, 1), (1, 1, 1))
dilate = ndi.binary_dilation(symbols, structure=struct)
plt.figure(figsize=(10, 6))
plt.subplot(121, title='symbols').matshow(symbols[:15, :15])
plt.subplot(122, title='symbols dilated').matshow(dilate[:15, :15])
plt.show()
Creating these masks as before can be understood as a binary segmentation, as
each element in our mask is either True or False. To extract the individual
numbers, we’ll convert the binary digits segmentation into an instance
segmentation, where each connected segment has its own index.
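What ndi.label does can be seen on a tiny toy mask (illustrative array): each connected run of True values gets its own integer index.

```python
import numpy as np
from scipy import ndimage as ndi

digits = np.array([[1, 1, 0, 1],
                   [0, 0, 0, 1]], dtype=bool)
labels, n = ndi.label(digits)
print(n)  # 2 connected components
```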
markers, num_features = ndi.label(digits)
plt.figure(figsize=(10, 6))
plt.subplot(121, title='binary segmentation').matshow(
digits[:15, :15])
plt.subplot(122, title='instance segmentation').matshow(
markers[:15, :15], cmap='gnuplot')
plt.show()
Now, for each instance, we check if the dilated binary mask overlaps with the instance and if yes, we extract the number.
numbers = [int(''.join(arr[markers == i]))
for i in range(1, num_features+1)
if np.any((markers == i) & dilate)]
Then, we just sum up:
sum(numbers)
The missing part wasn’t the only issue - one of the gears in the engine is wrong. A gear is any * symbol that is adjacent to exactly two part numbers. Its gear ratio is the result of multiplying those two numbers together.
This time, you need to find the gear ratio of every gear and add them all up so that the engineer can figure out which gear needs to be replaced.
Consider the same engine schematic again:
467..114..
...*......
..35..633.
......#...
617*......
.....+.58.
..592.....
......755.
...$.*....
.664.598..
In this schematic, there are two gears. The first is in the top left; it has part numbers 467 and 35, so its gear ratio is 16345. The second gear is in the lower right; its gear ratio is 451490. (The * adjacent to 617 is not a gear because it is only adjacent to one part number.) Adding up all of the gear ratios produces 467835.
What is the sum of all of the gear ratios in your engine schematic?
We’ll use the same method as before, but this time only extract * and create
the instance segmentation before the dilation. Why? Because if we dilated
first, we could merge two independent gears into one instance.
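The order matters: on a toy row with two gears one cell apart (illustrative array), dilating first merges them into a single instance, while labelling first keeps them separate.

```python
import numpy as np
from scipy import ndimage as ndi

gear = np.array([[0, 1, 0, 1, 0]], dtype=bool)  # two gears one cell apart
struct = np.ones((1, 3), dtype=bool)
# label first: the two gears stay separate instances
_, n_label_first = ndi.label(gear)
# dilate first: their neighbourhoods touch and merge into one instance
_, n_dilate_first = ndi.label(ndi.binary_dilation(gear, structure=struct))
print(n_label_first, n_dilate_first)  # 2 1
```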
gear = np.char.endswith(arr, '*')
gear_markers, gear_num = ndi.label(gear)
plt.figure(figsize=(10, 6))
plt.subplot(131, title='all symbols').matshow(
    symbols[:15, :15])
plt.subplot(132, title='gears').matshow(
gear[:15, :15])
plt.subplot(133, title='gears instances').matshow(
gear_markers[:15, :15], cmap='gnuplot')
plt.show()
Now, we step through each gear instance, create a mask for that gear, dilate it, and then step through all digits and check if they are in the gear mask. If we get two digits in the end, we multiply them to get the gear ratio and save the ratios to a list.
gear_ratios = []
for i in range(1, gear_num+1):
gear_binary = gear_markers == i
gear_dil = ndi.binary_dilation(gear_binary, structure=struct)
part_numbers = [int(''.join(arr[markers == j]))
for j in range(1, num_features+1)
if np.any((markers == j) & gear_dil)]
if len(part_numbers) == 2:
gear_ratios.append(part_numbers[0] * part_numbers[1])
Now, we just sum the output again:
sum(gear_ratios)
This year I try to record my attempt at solving the Advent of Code 2023 riddles. This is Day 3 - see https://adventofcode.com/2023/day/3
For the solutions, see the English version of this blog post.
This year I try to record my attempt at solving the Advent of Code 2023 riddles. This is Day 4 - see https://adventofcode.com/2023/day/4
The Elf leads you over to the pile of colorful cards. There, you discover dozens of scratchcards, all with their opaque covering already scratched off. Picking one up, it looks like each card has two lists of numbers separated by a vertical bar (|): a list of winning numbers and then a list of numbers you have. You organize the information into a table (your puzzle input).
As far as the Elf has been able to figure out, you have to figure out which of the numbers you have appear in the list of winning numbers. The first match makes the card worth one point and each match after the first doubles the point value of that card.
For example:
Card 1: 41 48 83 86 17 | 83 86 6 31 17 9 48 53
Card 2: 13 32 20 16 61 | 61 30 68 82 17 32 24 19
Card 3: 1 21 53 59 44 | 69 82 63 72 16 21 14 1
Card 4: 41 92 73 84 69 | 59 84 76 51 58 5 54 83
Card 5: 87 83 26 28 32 | 88 30 70 12 93 22 82 36
Card 6: 31 18 13 56 72 | 74 77 10 23 35 67 36 11
In the above example, card 1 has five winning numbers (41, 48, 83, 86, and 17) and eight numbers you have (83, 86, 6, 31, 17, 9, 48, and 53). Of the numbers you have, four of them (48, 83, 17, and 86) are winning numbers! That means card 1 is worth 8 points (1 for the first match, then doubled three times for each of the three matches after the first).
- Card 2 has two winning numbers (32 and 61), so it is worth 2 points.
- Card 3 has two winning numbers (1 and 21), so it is worth 2 points.
- Card 4 has one winning number (84), so it is worth 1 point.
- Card 5 has no winning numbers, so it is worth no points.
- Card 6 has no winning numbers, so it is worth no points.
So, in this example, the Elf’s pile of scratchcards is worth 13 points.
Take a seat in the large pile of colorful cards. How many points are they worth in total?
My input file: 2023-12-04-1-aoc.txt
Loading this data is very similar to Day 2 - so let’s load the data as we did
there. Our goal is to get win and yours columns holding the respective
numbers we want to compare. We find the numbers with one or more digits
using the regex \d+. And we want them to be in sets (not lists), since sets
can be compared logically in Python.
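Set intersection does exactly the comparison we need; here on Card 1 from the example:

```python
# Card 1 from the worked example
win = {41, 48, 83, 86, 17}
yours = {83, 86, 6, 31, 17, 9, 48, 53}
matches = win & yours  # set intersection
print(sorted(matches), len(matches))  # [17, 48, 83, 86] 4
```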
import pandas as pd
import re
txt = pd.read_table('data/2023-12-04-1-aoc.txt', names=['win'])
txt['id'] = txt.loc[:, 'win'].str.split(':').apply(
lambda x: int(x[0].strip('Card ')))
txt['win'] = (txt.loc[:, 'win']
.str.split(':').apply(lambda x: x[1]))
txt['yours'] = (txt.loc[:, 'win']
.str.split('|')
.apply(lambda x: x[1])
# get a list of only the numbers / digits
.apply(lambda x: re.findall(r'\d+', x))
# convert the list of strings to a set of integers
.apply(lambda x: set([int(i) for i in x])))
txt['win'] = (txt.loc[:, 'win']
.str.split('|')
.apply(lambda x: x[0])
.apply(lambda x: re.findall(r'\d+', x))
.apply(lambda x: set([int(i) for i in x])))
txt
| win | id | yours | |
|---|---|---|---|
| 0 | {32, 36, 7, 9, 10, 12, 82, 85, 95, 31} | 1 | {2, 7, 9, 10, 12, 14, 21, 22, 23, 24, 31, 32, … |
| 1 | {35, 76, 16, 82, 19, 22, 88, 59, 60, 95} | 2 | {7, 8, 12, 16, 19, 22, 26, 28, 35, 38, 44, 51,… |
| 2 | {1, 70, 11, 78, 48, 19, 52, 88, 28, 94} | 3 | {3, 4, 8, 17, 18, 19, 24, 31, 34, 45, 52, 54, … |
| 3 | {65, 2, 72, 28, 14, 16, 55, 91, 92, 62} | 4 | {3, 4, 6, 7, 8, 9, 15, 30, 33, 35, 47, 49, 51,… |
| 4 | {38, 41, 75, 77, 50, 24, 94, 60, 61, 30} | 5 | {1, 2, 4, 5, 6, 7, 9, 10, 14, 17, 21, 29, 47, … |
| … | … | … | … |
| 213 | {97, 98, 39, 41, 43, 12, 13, 19, 93, 95} | 214 | {5, 10, 17, 20, 28, 29, 33, 34, 36, 50, 51, 52… |
| 214 | {97, 35, 69, 40, 74, 45, 20, 21, 62, 31} | 215 | {1, 8, 15, 17, 18, 25, 30, 33, 42, 44, 47, 52,… |
| 215 | {33, 70, 71, 12, 78, 17, 51, 86, 60, 94} | 216 | {7, 8, 9, 10, 22, 29, 37, 39, 41, 43, 46, 47, … |
| 216 | {98, 67, 68, 38, 70, 39, 72, 77, 45, 21} | 217 | {8, 21, 22, 25, 26, 31, 37, 41, 42, 48, 54, 57… |
| 217 | {34, 9, 44, 78, 79, 16, 17, 19, 55, 92} | 218 | {1, 4, 20, 21, 27, 38, 39, 40, 41, 45, 46, 52,… |
218 rows × 3 columns
Now, we take the intersection of win and yours; these are our matching
numbers. Then, the number of wins is converted to points - for any number of
wins greater than 1, the points are given by 2**(n_wins-1).
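The scoring rule can be sketched as a small helper, checked against the n_wins values from the worked example:

```python
def points(n_wins):
    # 1 point for the first match, doubled for every further match
    return 2 ** (n_wins - 1) if n_wins > 0 else 0

example = [4, 2, 2, 1, 0, 0]  # matches per card in the worked example
print([points(n) for n in example])  # [8, 2, 2, 1, 0, 0], summing to 13
```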
txt['n_wins'] = txt.apply(
lambda row: len(row.loc['win'] & row.loc['yours']), axis=1)
txt['points'] = txt.loc[:, 'n_wins'].apply(
lambda x: 2**(x-1) if x > 1 else x)
txt.loc[:, ['win', 'n_wins', 'points']]
| win | n_wins | points | |
|---|---|---|---|
| 0 | {32, 36, 7, 9, 10, 12, 82, 85, 95, 31} | 10 | 512 |
| 1 | {35, 76, 16, 82, 19, 22, 88, 59, 60, 95} | 10 | 512 |
| 2 | {1, 70, 11, 78, 48, 19, 52, 88, 28, 94} | 5 | 16 |
| 3 | {65, 2, 72, 28, 14, 16, 55, 91, 92, 62} | 0 | 0 |
| 4 | {38, 41, 75, 77, 50, 24, 94, 60, 61, 30} | 0 | 0 |
| … | … | … | … |
| 213 | {97, 98, 39, 41, 43, 12, 13, 19, 93, 95} | 0 | 0 |
| 214 | {97, 35, 69, 40, 74, 45, 20, 21, 62, 31} | 0 | 0 |
| 215 | {33, 70, 71, 12, 78, 17, 51, 86, 60, 94} | 2 | 2 |
| 216 | {98, 67, 68, 38, 70, 39, 72, 77, 45, 21} | 1 | 1 |
| 217 | {34, 9, 44, 78, 79, 16, 17, 19, 55, 92} | 0 | 0 |
218 rows × 3 columns
Then, we just sum up:
sum(txt.loc[:, 'points'])
There’s no such thing as “points”. Instead, scratchcards only cause you to win more scratchcards equal to the number of winning numbers you have.
Specifically, you win copies of the scratchcards below the winning card equal to the number of matches. So, if card 10 were to have 5 matching numbers, you would win one copy each of cards 11, 12, 13, 14, and 15.
Copies of scratchcards are scored like normal scratchcards and have the same card number as the card they copied. So, if you win a copy of card 10 and it has 5 matching numbers, it would then win a copy of the same cards that the original card 10 won: cards 11, 12, 13, 14, and 15. This process repeats until none of the copies cause you to win any more cards. (Cards will never make you copy a card past the end of the table.)
This time, the above example goes differently:
Card 1: 41 48 83 86 17 | 83 86 6 31 17 9 48 53
Card 2: 13 32 20 16 61 | 61 30 68 82 17 32 24 19
Card 3: 1 21 53 59 44 | 69 82 63 72 16 21 14 1
Card 4: 41 92 73 84 69 | 59 84 76 51 58 5 54 83
Card 5: 87 83 26 28 32 | 88 30 70 12 93 22 82 36
Card 6: 31 18 13 56 72 | 74 77 10 23 35 67 36 11
- Card 1 has four matching numbers, so you win one copy each of the next four cards: cards 2, 3, 4, and 5.
- Your original card 2 has two matching numbers, so you win one copy each of cards 3 and 4.
- Your copy of card 2 also wins one copy each of cards 3 and 4.
- Your four instances of card 3 (one original and three copies) have two matching numbers, so you win four copies each of cards 4 and 5.
- Your eight instances of card 4 (one original and seven copies) have one matching number, so you win eight copies of card 5.
- Your fourteen instances of card 5 (one original and thirteen copies) have no matching numbers and win no more cards.
Once all of the originals and copies have been processed, you end up with 1 instance of card 1, 2 instances of card 2, 4 instances of card 3, 8 instances of card 4, 14 instances of card 5, and 1 instance of card 6. In total, this example pile of scratchcards causes you to ultimately have 30 scratchcards!
Process all of the original and copied scratchcards until no more scratchcards are won. Including the original set of scratchcards, how many total scratchcards do you end up with?
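Before applying it to the DataFrame, the copying process can be checked on the example with a plain list (n_wins values taken from the bullets above):

```python
# matches per card in the worked example
n_wins = [4, 2, 2, 1, 0, 0]
cards = [1] * len(n_wins)  # one original of each card
for i, wins in enumerate(n_wins):
    # every instance of card i wins one copy of each of the next `wins` cards
    for j in range(i + 1, i + 1 + wins):
        cards[j] += cards[i]
print(cards, sum(cards))  # [1, 2, 4, 8, 14, 1] 30
```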
We will use the n_wins column we created before and go from there. We step
through each game. Each current game gets +1 card (the original). Then,
depending on its n_wins, we step through the following games; each of these
gets the card count of the current game added.
txt['cards'] = 0
for i, nwin in enumerate(txt.loc[:, 'n_wins']):
txt.loc[i, 'cards'] += 1
for j in range(1, nwin+1):
txt.loc[i+j, 'cards'] += txt.loc[i, 'cards']
txt.loc[:, ['n_wins', 'cards']]
| n_wins | cards | |
|---|---|---|
| 0 | 10 | 1 |
| 1 | 10 | 2 |
| 2 | 5 | 4 |
| 3 | 0 | 8 |
| 4 | 0 | 8 |
| … | … | … |
| 213 | 0 | 9608 |
| 214 | 0 | 8927 |
| 215 | 2 | 8927 |
| 216 | 1 | 12636 |
| 217 | 0 | 21564 |
218 rows × 2 columns
Now, we just sum the output again:
sum(txt['cards'])
This year I try to record my attempt at solving the Advent of Code 2023 riddles. This is Day 4 - see https://adventofcode.com/2023/day/4
For the solutions, see the English version of this blog post.
This year I try to record my attempt at solving the Advent of Code 2023 riddles. This is Day 5 - see https://adventofcode.com/2023/day/5
Update [2023-12-31 Sun]:
- substitute get_generator() (own implementation) with range() (Python built-in)
- improve the grid search so that it goes through all location ranges, still starting with the lowest range
The almanac (your puzzle input) lists all of the seeds that need to be planted. It also lists what type of soil to use with each kind of seed, what type of fertilizer to use with each kind of soil, what type of water to use with each kind of fertilizer, and so on. Every type of seed, soil, fertilizer and so on is identified with a number, but numbers are reused by each category - that is, soil 123 and fertilizer 123 aren’t necessarily related to each other.
For example:
seeds: 79 14 55 13 seed-to-soil map: 50 98 2 52 50 48 soil-to-fertilizer map: 0 15 37 37 52 2 39 0 15 fertilizer-to-water map: 49 53 8 0 11 42 42 0 7 57 7 4 water-to-light map: 88 18 7 18 25 70 light-to-temperature map: 45 77 23 81 45 19 68 64 13 temperature-to-humidity map: 0 69 1 1 0 69 humidity-to-location map: 60 56 37 56 93 4
The almanac starts by listing which seeds need to be planted: seeds 79, 14, 55, and 13.
The rest of the almanac contains a list of maps which describe how to convert numbers from a source category into numbers in a destination category. That is, the section that starts with seed-to-soil map: describes how to convert a seed number (the source) to a soil number (the destination). This lets the gardener and his team know which soil to use with which seeds, which water to use with which fertilizer, and so on.
Rather than list every source number and its corresponding destination number one by one, the maps describe entire ranges of numbers that can be converted. Each line within a map contains three numbers: the destination range start, the source range start, and the range length.
Consider again the example seed-to-soil map:
50 98 2
52 50 48
The first line has a destination range start of 50, a source range start of 98, and a range length of 2. This line means that the source range starts at 98 and contains two values: 98 and 99. The destination range is the same length, but it starts at 50, so its two values are 50 and 51. With this information, you know that seed number 98 corresponds to soil number 50 and that seed number 99 corresponds to soil number 51.
The second line means that the source range starts at 50 and contains 48 values: 50, 51, …, 96, 97. This corresponds to a destination range starting at 52 and also containing 48 values: 52, 53, …, 98, 99. So, seed number 53 corresponds to soil number 55.
Any source numbers that aren’t mapped correspond to the same destination number. So, seed number 10 corresponds to soil number 10.
So, the entire list of seed numbers and their corresponding soil numbers looks like this:
seed  soil
0     0
1     1
...   ...
48    48
49    49
50    52
51    53
...   ...
96    98
97    99
98    50
99    51
With this map, you can look up the soil number required for each initial seed number:
- Seed number 79 corresponds to soil number 81.
- Seed number 14 corresponds to soil number 14.
- Seed number 55 corresponds to soil number 57.
- Seed number 13 corresponds to soil number 13.
The gardener and his team want to get started as soon as possible, so they’d like to know the closest location that needs a seed. Using these maps, find the lowest location number that corresponds to any of the initial seeds. To do this, you’ll need to convert each seed number through other categories until you can find its corresponding location number. In this example, the corresponding types are:
- Seed 79, soil 81, fertilizer 81, water 81, light 74, temperature 78, humidity 78, location 82.
- Seed 14, soil 14, fertilizer 53, water 49, light 42, temperature 42, humidity 43, location 43.
- Seed 55, soil 57, fertilizer 57, water 53, light 46, temperature 82, humidity 82, location 86.
- Seed 13, soil 13, fertilizer 52, water 41, light 34, temperature 34, humidity 35, location 35.
So, the lowest location number in this example is 35.
What is the lowest location number that corresponds to any of the initial seed numbers?
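Before tackling the real input, the range lookup can be sketched on the example almanac above, with hard-coded (dest, src, rng) triples and plain functions instead of DataFrames:

```python
# The seven example maps, each a list of (dest_start, src_start, range) triples.
example_maps = [
    [(50, 98, 2), (52, 50, 48)],                         # seed-to-soil
    [(0, 15, 37), (37, 52, 2), (39, 0, 15)],             # soil-to-fertilizer
    [(49, 53, 8), (0, 11, 42), (42, 0, 7), (57, 7, 4)],  # fertilizer-to-water
    [(88, 18, 7), (18, 25, 70)],                         # water-to-light
    [(45, 77, 23), (81, 45, 19), (68, 64, 13)],          # light-to-temperature
    [(0, 69, 1), (1, 0, 69)],                            # temperature-to-humidity
    [(60, 56, 37), (56, 93, 4)],                         # humidity-to-location
]

def convert(value, mapping):
    # Each (dest, src, rng) triple maps [src, src+rng) onto [dest, dest+rng).
    for dest, src, rng in mapping:
        if src <= value < src + rng:
            return dest + (value - src)
    return value  # unmapped values stay the same

def locate(seed):
    for mapping in example_maps:
        seed = convert(seed, mapping)
    return seed

print([locate(s) for s in (79, 14, 55, 13)])   # [82, 43, 86, 35]
print(min(locate(s) for s in (79, 14, 55, 13)))  # 35
```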
My input file: 2023-12-05-1-aoc.txt
Wow, this task is a mouthful…
Let’s start slowly and load the data. Our input text document contains several
maps, which are clearly separated and have a title (seed-to-soil map, etc.). So
we can tell pandas where each map starts and give each map a dataframe. I got
the values for the skiprows and nrows arguments by looking at the input file
and… counting :)
import pandas as pd
import sys
seeds = pd.read_table('data/2023-12-05-1-aoc.txt', nrows=1, sep=' ',
                      header=None, index_col=0)
seeds = seeds.values.flatten()
map_opt = {'filepath_or_buffer': 'data/2023-12-05-1-aoc.txt',
           'header': None, 'sep': ' ', 'dtype': 'Int64',
           'names': ['dest_start', 'src_start', 'range']}
seed_soil = pd.read_table(skiprows=3, nrows=23, **map_opt)
soil_fert = pd.read_table(skiprows=28, nrows=9, **map_opt)
fert_water = pd.read_table(skiprows=39, nrows=20, **map_opt)
water_light = pd.read_table(skiprows=61, nrows=40, **map_opt)
light_temp = pd.read_table(skiprows=103, nrows=36, **map_opt)
temp_humi = pd.read_table(skiprows=141, nrows=35, **map_opt)
humi_loc = pd.read_table(skiprows=178, nrows=26, **map_opt)
maps = (seed_soil, soil_fert, fert_water, water_light, light_temp,
        temp_humi, humi_loc)
print('seeds are just a numpy array:')
display(seeds)
print('The "humidity-to-location" map as an example:')
humi_loc
| | dest_start | src_start | range |
|---|---|---|---|
| 0 | 3490144003 | 1623866227 | 218040905 |
| 1 | 1709610578 | 1620839197 | 3027030 |
| 2 | 105449249 | 586389428 | 113279526 |
| 3 | 1899604338 | 2167886292 | 348178199 |
| 4 | 1712637608 | 2678624215 | 186966730 |
| 5 | 218728775 | 0 | 245776251 |
| 6 | 2472992580 | 923734334 | 143670388 |
| 7 | 2616662968 | 3670169885 | 15297294 |
| 8 | 2247782537 | 1395629154 | 225210043 |
| 9 | 0 | 480940179 | 105449249 |
| 10 | 4113852846 | 3959729057 | 159909096 |
| 11 | 3322784653 | 1067404722 | 167359350 |
| 12 | 923734334 | 4119638153 | 175329143 |
| 13 | 1534964496 | 2516064491 | 162559724 |
| 14 | 2631960262 | 3496028440 | 140849733 |
| 15 | 2862695906 | 1234764072 | 97988355 |
| 16 | 1697524220 | 3636878173 | 12086358 |
| 17 | 2985646048 | 1332752427 | 62876727 |
| 18 | 1099063477 | 2970241510 | 435901019 |
| 19 | 4009202281 | 2865590945 | 104650565 |
| 20 | 2960684261 | 2142924505 | 24961787 |
| 21 | 2772809995 | 3406142529 | 89885911 |
| 22 | 4273761942 | 3648964531 | 21205354 |
| 23 | 3708184908 | 1841907132 | 301017373 |
| 24 | 464505026 | 245776251 | 235163928 |
| 25 | 3048522775 | 3685467179 | 274261878 |
My first attempt was to actually construct ranges like in the example above,
mapping out all possible sources and destinations. Python quickly informed me
that even constructing one pandas.Series with int64 values mapping seeds to
soil would cost 64 GB of memory - not the best solution.
So we take a different approach. For convenience, let’s add a src_end and a
dest_end column to our maps:
for df in maps:
    df['src_end'] = df.loc[:, 'src_start'] + df.loc[:, 'range']
    df['dest_end'] = df.loc[:, 'dest_start'] + df.loc[:, 'range']
print('Again the "humidity-to-location" map as an example:')
humi_loc
| | dest_start | src_start | range | src_end | dest_end |
|---|---|---|---|---|---|
| 0 | 3490144003 | 1623866227 | 218040905 | 1841907132 | 3708184908 |
| 1 | 1709610578 | 1620839197 | 3027030 | 1623866227 | 1712637608 |
| 2 | 105449249 | 586389428 | 113279526 | 699668954 | 218728775 |
| 3 | 1899604338 | 2167886292 | 348178199 | 2516064491 | 2247782537 |
| 4 | 1712637608 | 2678624215 | 186966730 | 2865590945 | 1899604338 |
| 5 | 218728775 | 0 | 245776251 | 245776251 | 464505026 |
| 6 | 2472992580 | 923734334 | 143670388 | 1067404722 | 2616662968 |
| 7 | 2616662968 | 3670169885 | 15297294 | 3685467179 | 2631960262 |
| 8 | 2247782537 | 1395629154 | 225210043 | 1620839197 | 2472992580 |
| 9 | 0 | 480940179 | 105449249 | 586389428 | 105449249 |
| 10 | 4113852846 | 3959729057 | 159909096 | 4119638153 | 4273761942 |
| 11 | 3322784653 | 1067404722 | 167359350 | 1234764072 | 3490144003 |
| 12 | 923734334 | 4119638153 | 175329143 | 4294967296 | 1099063477 |
| 13 | 1534964496 | 2516064491 | 162559724 | 2678624215 | 1697524220 |
| 14 | 2631960262 | 3496028440 | 140849733 | 3636878173 | 2772809995 |
| 15 | 2862695906 | 1234764072 | 97988355 | 1332752427 | 2960684261 |
| 16 | 1697524220 | 3636878173 | 12086358 | 3648964531 | 1709610578 |
| 17 | 2985646048 | 1332752427 | 62876727 | 1395629154 | 3048522775 |
| 18 | 1099063477 | 2970241510 | 435901019 | 3406142529 | 1534964496 |
| 19 | 4009202281 | 2865590945 | 104650565 | 2970241510 | 4113852846 |
| 20 | 2960684261 | 2142924505 | 24961787 | 2167886292 | 2985646048 |
| 21 | 2772809995 | 3406142529 | 89885911 | 3496028440 | 2862695906 |
| 22 | 4273761942 | 3648964531 | 21205354 | 3670169885 | 4294967296 |
| 23 | 3708184908 | 1841907132 | 301017373 | 2142924505 | 4009202281 |
| 24 | 464505026 | 245776251 | 235163928 | 480940179 | 699668954 |
| 25 | 3048522775 | 3685467179 | 274261878 | 3959729057 | 3322784653 |
Now we actually compute the mapping. For each seed, we go through all maps, and in each map we go through each row. We find the row that contains the current value and extract the destination, which becomes the source for the next map, until we reach the last map, which gives us the location.
- Approach 1: df.itertuples() is a convenient way to step through a pandas.DataFrame in this example. It is faster than df.iterrows() and returns the row as a NamedTuple - nice!
- Approach 2: I actually wondered if it would be faster to get all maps in one pd.DataFrame and then iterate through the mappings. To test this, let’s construct a new DataFrame maps_df which contains all maps. Since the maps have different lengths, it is important to cast the datatype to Int64, which is short for pd.Int64Dtype() and keeps values as integers, even if NA values are in the same column.
- Approach 3: A third alternative I tested (not shown here) was to check if a value is in a Python range with the in operator, as in if 3 in range(5): .... This was way too slow.
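For illustration, here is itertuples on a toy frame (made up for this sketch, not the real maps): each row arrives as a namedtuple, so columns are plain attributes.

```python
import pandas as pd

# itertuples yields one namedtuple per row; the index is the .Index attribute.
toy = pd.DataFrame({'src_start': [0, 10], 'src_end': [5, 20]})
rows = [(row.Index, row.src_start, row.src_end) for row in toy.itertuples()]
print(rows)  # [(0, 0, 5), (1, 10, 20)]
```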
# mapping version 1
def get_location(seed):
    current = seed
    for df in maps:
        current_map = [row
                       for row in df.itertuples()
                       if ((current >= row.src_start)
                           and (current < row.src_end))]
        if len(current_map) == 0:
            pass
        elif len(current_map) == 1:
            current = (current_map[0].dest_start
                       + (current - current_map[0].src_start))
        else:
            raise ValueError('This should not happen!')
    return current
# mapping version 2 - around 10 times slower
# you need to rename the maps_df columns so that they have a unique id
# e.g. 'src_start_1', 'src_start_2' etc
# maps_df = pd.concat(maps, axis='columns')
# def get_dest(i, src):
#     return (maps_df[(src > maps_df.loc[:, f'src_start_{i}']) &
#                     (src < maps_df.loc[:, f'src_end_{i}'])]
#             .loc[:, f'dest_start_{i}']
#             .iloc[0])
# def get_location2(seed):
#     dest = seed
#     i = 1
#     while i < 7:
#         dest = get_dest(i, dest)
#         i += 1
#     return dest
%timeit get_location(seeds[0])
Now let’s get a list of locations:
locations = [get_location(s) for s in seeds]
locations
Lastly, just get the minimum of all location values.
min(locations)
Everyone will starve if you only plant such a small number of seeds. Re-reading the almanac, it looks like the seeds: line actually describes ranges of seed numbers.
The values on the initial seeds: line come in pairs. Within each pair, the first value is the start of the range and the second value is the length of the range. So, in the first line of the example above:
seeds: 79 14 55 13
This line describes two ranges of seed numbers to be planted in the garden. The first range starts with seed number 79 and contains 14 values: 79, 80, …, 91, 92. The second range starts with seed number 55 and contains 13 values: 55, 56, …, 66, 67.
Now, rather than considering four seed numbers, you need to consider a total of 27 seed numbers.
In the above example, the lowest location number can be obtained from seed number 82, which corresponds to soil 84, fertilizer 84, water 84, light 77, temperature 45, humidity 46, and location 46. So, the lowest location number is 46.
Consider all of the initial seed numbers listed in the ranges on the first line of the almanac. What is the lowest location number that corresponds to any of the initial seed numbers?
Let’s first construct a dataframe containing the range of seeds:
seeds_df = pd.DataFrame({'start': seeds[::2],
                         'range': seeds[1::2],
                         'end': seeds[::2] + seeds[1::2]})
print(f'There are {sum(seeds_df.loc[:, "range"]):_} seeds in total')
seeds_df
| | start | range | end |
|---|---|---|---|
| 0 | 3169137700 | 271717609 | 3440855309 |
| 1 | 3522125441 | 23376095 | 3545501536 |
| 2 | 1233948799 | 811833837 | 2045782636 |
| 3 | 280549587 | 703867355 | 984416942 |
| 4 | 166086528 | 44766996 | 210853524 |
| 5 | 2326968141 | 69162222 | 2396130363 |
| 6 | 2698492851 | 14603069 | 2713095920 |
| 7 | 2755327667 | 348999531 | 3104327198 |
| 8 | 2600461189 | 92332846 | 2692794035 |
| 9 | 1054656969 | 169099767 | 1223756736 |
Now - I really needed some time to finally realize that going through all seed values is infeasible. So how do we deal with this problem?
In the end we need the lowest location number - so our approach is to take
the humi_loc map, start with the lowest location number, go up, and get the
corresponding seed values. The location of the first seed value that lies inside
seeds_df is our solution.
So first we rework the get_location function into a get_seed function (we
reverse the maps tuple with maps[::-1] and switch src and dest).
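The inversion itself is just the mapping arithmetic run backwards; on one line of the example seed-to-soil map:

```python
# One line of the example seed-to-soil map: [50, 98) -> maps [98, 100),
# the triple (52, 50, 48) maps [50, 98) onto [52, 100).
# Forward:  dest + (value - src)   for src  <= value < src  + rng
# Backward: src  + (value - dest)  for dest <= value < dest + rng
dest_start, src_start, rng = 52, 50, 48

def forward(v):
    return dest_start + (v - src_start)

def backward(v):
    return src_start + (v - dest_start)

print(forward(53))            # seed 53 -> soil 55, as in the puzzle text
print(backward(forward(53)))  # round-trips back to 53
```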
def get_seed(location):
    current = location
    for df in maps[::-1]:
        current_map = [row
                       for row in df.itertuples()
                       if ((current >= row.dest_start)
                           and (current < row.dest_end))]
        if len(current_map) == 0:
            pass
        elif len(current_map) == 1:
            current = (current_map[0].src_start
                       + (current - current_map[0].dest_start))
        else:
            raise ValueError('This should not happen!')
    return current
Since the Python 3 range function does not store its contents in memory
(similar to a generator), it is well suited to stepping through these large ranges.
Lastly, we deal with the sheer number of possible values by performing a grid
search. First, we check every millionth location. After the first match, we
stop this search and check the last million locations before the match with a
finer grid, and so on. The last grid is just 1, so we find our lowest location.
Update: We order our location ranges from smallest to largest with sort_values and
go through them - the search stops after the first match, since that will be
our lowest location.
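The coarse-to-fine idea in isolation, on a toy predicate (function name, grids and predicate are made up for illustration; it assumes each coarser pass finds a hit within one grid step of the true minimum):

```python
def lowest_hit(start, end, grids, hit):
    """Find the smallest x in [start, end) with hit(x) by refining the
    search window around the first match of each grid pass."""
    found = None
    for g in grids:
        for x in range(start, end, g):
            if hit(x):
                found = x
                start, end = max(start, x - g), x + 1  # refine around x
                break
    return found

# Toy predicate: everything from 8_765_432 upwards "hits".
print(lowest_hit(0, 10_000_000, [1_000_000, 1000, 1],
                 lambda x: x >= 8_765_432))  # 8765432
```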
def grid_search(start: int, end: int, grid: list[int]):
    success = False
    for g in grid:
        print(f'Start search with grid={g}')
        for l in range(start, end, g):
            current = get_seed(l)
            if any((current >= seeds_df.loc[:, 'start'])
                   & (current < seeds_df.loc[:, 'end'])):
                print(f'location {l} is the lowest which contains '
                      f'one of the given seeds ({current})')
                start = l - g
                end = l + 1  # keep l itself inside the refined window
                success = True
                break
    return success
for row in humi_loc.sort_values('dest_start').itertuples():
    success = grid_search(start=row.dest_start, end=row.dest_end,
                          grid=[1_000_000, 100_000, 1000, 1])
    if success:
        print('Finished search')
        break
This year I am attempting the Advent of Code 2023 challenges. This is Day 5 - see https://adventofcode.com/2023/day/5
For the solutions, see the English version of this blog post.
This year I try to record my attempt at solving the Advent of Code 2023 riddles. This is Day 6 - see https://adventofcode.com/2023/day/6
You manage to sign up as a competitor in the boat races just in time. The organizer explains that it’s not really a traditional race - instead, you will get a fixed amount of time during which your boat has to travel as far as it can, and you win if your boat goes the farthest.
As part of signing up, you get a sheet of paper (your puzzle input) that lists the time allowed for each race and also the best distance ever recorded in that race. To guarantee you win the grand prize, you need to make sure you go farther in each race than the current record holder.
The organizer brings you over to the area where the boat races are held. The boats are much smaller than you expected - they’re actually toy boats, each with a big button on top. Holding down the button charges the boat, and releasing the button allows the boat to move. Boats move faster if their button was held longer, but time spent holding the button counts against the total race time. You can only hold the button at the start of the race, and boats don’t move until the button is released.
For example:
Time:      7  15   30
Distance:  9  40  200
This document describes three races:
- The first race lasts 7 milliseconds. The record distance in this race is 9 millimeters.
- The second race lasts 15 milliseconds. The record distance in this race is 40 millimeters.
- The third race lasts 30 milliseconds. The record distance in this race is 200 millimeters.
Your toy boat has a starting speed of zero millimeters per millisecond. For each whole millisecond you spend at the beginning of the race holding down the button, the boat’s speed increases by one millimeter per millisecond.
So, because the first race lasts 7 milliseconds, you only have a few options:
- Don’t hold the button at all (that is, hold it for =0= milliseconds) at the start of the race. The boat won’t move; it will have traveled =0= millimeters by the end of the race.
- Hold the button for =1= millisecond at the start of the race. Then, the boat will travel at a speed of 1 millimeter per millisecond for 6 milliseconds, reaching a total distance traveled of =6= millimeters.
- Hold the button for =2= milliseconds, giving the boat a speed of 2 millimeters per millisecond. It will then get 5 milliseconds to move, reaching a total distance of =10= millimeters.
- Hold the button for =3= milliseconds. After its remaining 4 milliseconds of travel time, the boat will have gone =12= millimeters.
- Hold the button for =4= milliseconds. After its remaining 3 milliseconds of travel time, the boat will have gone =12= millimeters.
- Hold the button for =5= milliseconds, causing the boat to travel a total of =10= millimeters.
- Hold the button for =6= milliseconds, causing the boat to travel a total of =6= millimeters.
- Hold the button for =7= milliseconds. That’s the entire duration of the race. You never let go of the button. The boat can’t move until you let go of the button. Please make sure you let go of the button so the boat gets to move. =0= millimeters.
Since the current record for this race is 9 millimeters, there are actually =4= different ways you could win: you could hold the button for 2, 3, 4, or 5 milliseconds at the start of the race.
In the second race, you could hold the button for at least 4 milliseconds and at most 11 milliseconds and beat the record, a total of =8= different ways to win.
In the third race, you could hold the button for at least 11 milliseconds and no more than 19 milliseconds and still beat the record, a total of =9= ways you could win.
To see how much margin of error you have, determine the number of ways you can beat the record in each race; in this example, if you multiply these values together, you get =288= (4 * 8 * 9).
Determine the number of ways you could beat the record in each race. What do you get if you multiply these numbers together?
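The example races can be checked by brute force before touching the real input: holding for t milliseconds gives speed t and travel distance t · (time − t).

```python
# Count winning hold times for each example race, then multiply.
times, records = [7, 15, 30], [9, 40, 200]
ways = [sum(1 for t in range(1, time)
            if t * (time - t) > record)
        for time, record in zip(times, records)]
print(ways)  # [4, 8, 9]
prod = ways[0] * ways[1] * ways[2]
print(prod)  # 288
```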
My input file: 2023-12-06-1-aoc.txt
The solution for this problem is shorter than the task description!
As always - let’s load the data. Note: we use the delim_whitespace argument
instead of sep=' ' to separate values by whitespace of any length.
import pandas as pd
import numpy as np
race = pd.read_table('data/2023-12-06-1-aoc.txt', delim_whitespace=True,
                     header=None, index_col=0)
race.index = ['time', 'distance']
race
| | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| time | 51 | 92 | 68 | 90 |
| distance | 222 | 2031 | 1126 | 1225 |
Then we use a Python list comprehension to quickly calculate all distances over
the range of the time value (keeping only the winning distances). Then we get the
respective lengths and compute the product.
def get_distance(time, distance):
    return [t * (time - t) for t in range(1, time + 1)
            if (t * (time - t)) > distance]

race.loc['nwins', :] = [len(get_distance(int(r.time), int(r.distance)))
                        for c, r in race.items()]
nwins_prod = np.prod(race.loc['nwins', :].values, dtype=int)
display(race)
print(f'The product of our nwins is {nwins_prod}')
| | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| time | 51.0 | 92.0 | 68.0 | 90.0 |
| distance | 222.0 | 2031.0 | 1126.0 | 1225.0 |
| nwins | 42.0 | 19.0 | 11.0 | 57.0 |
The product of our nwins is 500346
As the race is about to start, you realize the piece of paper with race times and record distances you got earlier actually just has very bad kerning. There’s really only one race - ignore the spaces between the numbers on each line.
So, the example from before:
Time:      7  15   30
Distance:  9  40  200
…now instead means this:
Time:      71530
Distance:  940200
Now, you have to figure out how many ways there are to win this single race. In this example, the race lasts for =71530= milliseconds and the record distance you need to beat is =940200= millimeters. You could hold the button anywhere from 14 to 71516 milliseconds and beat the record, a total of =71503= ways!
How many ways can you beat the record in this one much longer race?
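The single long example race can be verified the same brute-force way:

```python
# Part 2 example: one long race, still brute force.
time, record = 71530, 940200
ways = sum(1 for t in range(1, time) if t * (time - t) > record)
print(ways)  # 71503
```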
For this riddle we can take the same approach as for Part 1. The cell took 18s to evaluate on a consumer laptop from 2015 - no fancy workarounds needed :)
racetime = int(''.join(str(r) for r in race.loc['time', :4].astype(int)))
racedist = int(''.join(str(r) for r in race.loc['distance', :4].astype(int)))
racewins = len(get_distance(racetime, racedist))
race[5] = [racetime, racedist, racewins]
race
| | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| time | 51.0 | 92.0 | 68.0 | 90.0 | 51926890 |
| distance | 222.0 | 2031.0 | 1126.0 | 1225.0 | 222203111261225 |
| nwins | 42.0 | 19.0 | 11.0 | 57.0 | 42515755 |
This year I am attempting the Advent of Code 2023 challenges. This is Day 6 - see https://adventofcode.com/2023/day/6
For the solutions, see the English version of this blog post.
This year I try to record my attempt at solving the Advent of Code 2023 riddles. This is Day 7 - see https://adventofcode.com/2023/day/7
In Camel Cards, you get a list of hands, and your goal is to order them based on the strength of each hand. A hand consists of five cards labeled one of A, K, Q, J, T, 9, 8, 7, 6, 5, 4, 3, or 2. The relative strength of each card follows this order, where A is the highest and 2 is the lowest.
Every hand is exactly one type. From strongest to weakest, they are:
- Five of a kind, where all five cards have the same label: AAAAA
- Four of a kind, where four cards have the same label and one card has a different label: AA8AA
- Full house, where three cards have the same label, and the remaining two cards share a different label: 23332
- Three of a kind, where three cards have the same label, and the remaining two cards are each different from any other card in the hand: TTT98
- Two pair, where two cards share one label, two other cards share a second label, and the remaining card has a third label: 23432
- One pair, where two cards share one label, and the other three cards have a different label from the pair and each other: A23A4
- High card, where all cards’ labels are distinct: 23456
Hands are primarily ordered based on type; for example, every full house is stronger than any three of a kind.
If two hands have the same type, a second ordering rule takes effect. Start by comparing the first card in each hand. If these cards are different, the hand with the stronger first card is considered stronger. If the first card in each hand has the same label, however, then move on to considering the second card in each hand. If they differ, the hand with the higher second card wins; otherwise, continue with the third card in each hand, then the fourth, then the fifth.
So, 33332 and 2AAAA are both four of a kind hands, but 33332 is stronger because its first card is stronger. Similarly, 77888 and 77788 are both a full house, but 77888 is stronger because its third card is stronger (and both hands have the same first and second card).
To play Camel Cards, you are given a list of hands and their corresponding bid (your puzzle input). For example:
32T3K 765
T55J5 684
KK677 28
KTJJT 220
QQQJA 483
This example shows five hands; each hand is followed by its bid amount. Each hand wins an amount equal to its bid multiplied by its rank, where the weakest hand gets rank 1, the second-weakest hand gets rank 2, and so on up to the strongest hand. Because there are five hands in this example, the strongest hand will have rank 5 and its bid will be multiplied by 5.
So, the first step is to put the hands in order of strength:
- 32T3K is the only one pair and the other hands are all a stronger type, so it gets rank 1.
- KK677 and KTJJT are both two pair. Their first cards both have the same label, but the second card of KK677 is stronger (K vs T), so KTJJT gets rank 2 and KK677 gets rank 3.
- T55J5 and QQQJA are both three of a kind. QQQJA has a stronger first card, so it gets rank 5 and T55J5 gets rank 4.
Now, you can determine the total winnings of this set of hands by adding up the result of multiplying each hand’s bid with its rank (765 * 1 + 220 * 2 + 28 * 3 + 684 * 4 + 483 * 5). So the total winnings in this example are =6440=.
Find the rank of every hand in your set. What are the total winnings?
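Hand typing and ranking can be sketched on the five example hands with a compact Counter-based type function (this is a standalone sketch, not the implementation used below, but it uses the same type numbering: 1 = strongest, 7 = weakest):

```python
from collections import Counter

# Hand types keyed by the sorted face counts.
TYPE = {(5,): 1, (4, 1): 2, (3, 2): 3, (3, 1, 1): 4,
        (2, 2, 1): 5, (2, 1, 1, 1): 6, (1, 1, 1, 1, 1): 7}

def hand_type(cards):
    return TYPE[tuple(sorted(Counter(cards).values(), reverse=True))]

order = {c: i for i, c in enumerate('23456789TJQKA')}
hands = [('32T3K', 765), ('T55J5', 684), ('KK677', 28),
         ('KTJJT', 220), ('QQQJA', 483)]
# Weakest hand first: larger type number first, then card-by-card strength.
ranked = sorted(hands, key=lambda h: (-hand_type(h[0]),
                                      [order[c] for c in h[0]]))
total = sum(rank * bid for rank, (_, bid) in enumerate(ranked, start=1))
print(total)  # 6440
```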
My input file: 2023-12-07-1-aoc.txt
As always - let’s read in the data.
import pandas as pd
camel = pd.read_table('data/2023-12-07-1-aoc.txt', delim_whitespace=True,
                      header=None, names=['cards', 'bid'],
                      dtype={'cards': 'string', 'bid': 'Int64'})
camel
| | cards | bid |
|---|---|---|
| 0 | A833A | 309 |
| 1 | Q33J3 | 205 |
| 2 | 55KK5 | 590 |
| 3 | K4457 | 924 |
| 4 | Q3QT3 | 11 |
| … | … | … |
| 995 | KQJ53 | 367 |
| 996 | 57866 | 537 |
| 997 | Q94A9 | 210 |
| 998 | J448A | 903 |
| 999 | 6J66J | 114 |
1000 rows × 2 columns
For sorting, we need to compute a type column. The 7 types can be deduced
from how many different cards there are (via set()) and how often each card
occurs.
Also, for the next step, we prepare one column per card in the cards string.
def get_type(cards):
    cardset = set(cards)
    cardlen = len(cardset)
    cardlist = list(cards)
    cardnum = [cardlist.count(i) for i in cardset]
    if cardlen == 1:
        ctype = 1
    elif cardlen in (4, 5):
        ctype = cardlen + 2
    elif cardlen == 2:
        ctype = 2 if set(cardnum) == {1, 4} else 3
    else:
        ctype = 4 if 3 in cardnum else 5
    return ctype

camel['type'] = camel.loc[:, 'cards'].apply(lambda x: get_type(x))
for i in range(5):
    camel[f'c{i}'] = camel.loc[:, 'cards'].apply(lambda x: list(x)[i])
camel
| | cards | bid | type | c0 | c1 | c2 | c3 | c4 |
|---|---|---|---|---|---|---|---|---|
| 0 | A833A | 309 | 5 | A | 8 | 3 | 3 | A |
| 1 | Q33J3 | 205 | 4 | Q | 3 | 3 | J | 3 |
| 2 | 55KK5 | 590 | 3 | 5 | 5 | K | K | 5 |
| 3 | K4457 | 924 | 6 | K | 4 | 4 | 5 | 7 |
| 4 | Q3QT3 | 11 | 5 | Q | 3 | Q | T | 3 |
| … | … | … | … | … | … | … | … | … |
| 995 | KQJ53 | 367 | 7 | K | Q | J | 5 | 3 |
| 996 | 57866 | 537 | 6 | 5 | 7 | 8 | 6 | 6 |
| 997 | Q94A9 | 210 | 6 | Q | 9 | 4 | A | 9 |
| 998 | J448A | 903 | 6 | J | 4 | 4 | 8 | A |
| 999 | 6J66J | 114 | 3 | 6 | J | 6 | 6 | J |
1000 rows × 8 columns
For sorting, pandas provides the sort_values method, which supports
sorting by multiple columns (here, we first sort by type, then c0,
c1, …). For easier sorting by numerical values, we convert the values with
the card_dict dictionary, which implements the given sorting order.
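A toy illustration of the column-wise key= mapping (made-up values): sort_values applies the key function to each sort column before comparing, so the string faces are compared by their mapped positions.

```python
import pandas as pd

toy = pd.DataFrame({'c0': ['K', '2', 'A'], 'c1': ['3', 'T', '2']})
order = {c: i for i, c in enumerate('23456789TJQKA')}
toy_sorted = toy.sort_values(by=['c0', 'c1'], key=lambda col: col.map(order),
                             ignore_index=True)
print(toy_sorted['c0'].tolist())  # ['2', 'K', 'A']
```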
card_dict = {c: i for i, c in enumerate([7, 6, 5, 4, 3, 2, 1, '2', '3',
                                         '4', '5', '6', '7', '8', '9',
                                         'T', 'J', 'Q', 'K', 'A'])}
camel = camel.sort_values(by=['type', 'c0', 'c1', 'c2', 'c3', 'c4'],
                          ignore_index=True,
                          key=lambda x: x.map(card_dict))
camel.index += 1
camel
| | cards | bid | type | c0 | c1 | c2 | c3 | c4 |
|---|---|---|---|---|---|---|---|---|
| 1 | 237AQ | 157 | 7 | 2 | 3 | 7 | A | Q |
| 2 | 23K47 | 759 | 7 | 2 | 3 | K | 4 | 7 |
| 3 | 249K8 | 612 | 7 | 2 | 4 | 9 | K | 8 |
| 4 | 264A7 | 341 | 7 | 2 | 6 | 4 | A | 7 |
| 5 | 26578 | 10 | 7 | 2 | 6 | 5 | 7 | 8 |
| … | … | … | … | … | … | … | … | … |
| 996 | AA5AA | 565 | 2 | A | A | 5 | A | A |
| 997 | AAATA | 704 | 2 | A | A | A | T | A |
| 998 | AAAA3 | 145 | 2 | A | A | A | A | 3 |
| 999 | AAAAJ | 594 | 2 | A | A | A | A | J |
| 1000 | JJJJJ | 219 | 1 | J | J | J | J | J |
1000 rows × 8 columns
Lastly, we multiply the bid with the index and sum the result.
camel.loc[:, 'bid'].mul(camel.index).sum()
To make things a little more interesting, the Elf introduces one additional rule. Now, J cards are jokers - wildcards that can act like whatever card would make the hand the strongest type possible.
To balance this, =J= cards are now the weakest individual cards, weaker even than 2. The other cards stay in the same order: A, K, Q, T, 9, 8, 7, 6, 5, 4, 3, 2, J.
J cards can pretend to be whatever card is best for the purpose of determining hand type; for example, QJJQ2 is now considered four of a kind. However, for the purpose of breaking ties between two hands of the same type, J is always treated as J, not the card it’s pretending to be: JKKK2 is weaker than QQQQ2 because J is weaker than Q.
Now, the above example goes very differently:
32T3K 765
T55J5 684
KK677 28
KTJJT 220
QQQJA 483
- 32T3K is still the only one pair; it doesn’t contain any jokers, so its strength doesn’t increase.
- KK677 is now the only two pair, making it the second-weakest hand.
- T55J5, KTJJT, and QQQJA are now all four of a kind! T55J5 gets rank 3, QQQJA gets rank 4, and KTJJT gets rank 5.
With the new joker rule, the total winnings in this example are =5905=.
Using the new joker rule, find the rank of every hand in your set. What are the new total winnings?
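With jokers, the example can again be checked with a small Counter-based sketch (standalone, not the implementation below). It relies on the observation that letting the jokers join the most frequent remaining face always yields the strongest possible type:

```python
from collections import Counter

TYPE = {(5,): 1, (4, 1): 2, (3, 2): 3, (3, 1, 1): 4,
        (2, 2, 1): 5, (2, 1, 1, 1): 6, (1, 1, 1, 1, 1): 7}

def joker_type(cards):
    counts = Counter(cards)
    jokers = counts.pop('J', 0)
    faces = sorted(counts.values(), reverse=True) or [0]  # handles 'JJJJJ'
    faces[0] += jokers  # jokers join the most frequent face
    return TYPE[tuple(faces)]

order = {c: i for i, c in enumerate('J23456789TQKA')}  # J is weakest now
hands = [('32T3K', 765), ('T55J5', 684), ('KK677', 28),
         ('KTJJT', 220), ('QQQJA', 483)]
ranked = sorted(hands, key=lambda h: (-joker_type(h[0]),
                                      [order[c] for c in h[0]]))
total = sum(rank * bid for rank, (_, bid) in enumerate(ranked, start=1))
print(total)  # 5905
```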
What changes? We have to update our get_type() function and our card_dict
dictionary; otherwise the method stays the same!
Let’s first write get_type_joker(): if no joker is in the cards, we can just
defer to get_type(). If we have 1, 2 or 5 different card faces, the answer is
trivial. We only have to use the number of jokers and the maximum count of the
other faces if we have 3 or 4 different faces.
Note: if a J is present, we can never end up with high card (type = 7), because
we can always form one pair. Likewise, we can never end up with two pair (type = 5),
because we can always construct three of a kind or a full house instead.
def get_type_joker(cards):
    cardset = set(cards)
    cardlen = len(cardset)
    cardlist = list(cards)
    cardnum = [cardlist.count(i) for i in cardset]
    if 'J' not in cardset:
        ctype = get_type(cards)
    else:
        idxj = list(cardset).index('J')
        numj = cardnum[idxj]
        cardnum.pop(idxj)
        if cardlen in [1, 2]:
            ctype = 1
        elif cardlen == 5:
            ctype = 6
        else:
            if (numj in [1, 2]) and (cardlen == 4):
                ctype = 4
            else:
                ctype = 6 - max(cardnum) - numj
    return ctype

camel['type'] = camel.loc[:, 'cards'].apply(lambda x: get_type_joker(x))
camel
| | cards | bid | type | c0 | c1 | c2 | c3 | c4 |
|---|---|---|---|---|---|---|---|---|
| 1 | 237AQ | 157 | 7 | 2 | 3 | 7 | A | Q |
| 2 | 23K47 | 759 | 7 | 2 | 3 | K | 4 | 7 |
| 3 | 249K8 | 612 | 7 | 2 | 4 | 9 | K | 8 |
| 4 | 264A7 | 341 | 7 | 2 | 6 | 4 | A | 7 |
| 5 | 26578 | 10 | 7 | 2 | 6 | 5 | 7 | 8 |
| … | … | … | … | … | … | … | … | … |
| 996 | AA5AA | 565 | 2 | A | A | 5 | A | A |
| 997 | AAATA | 704 | 2 | A | A | A | T | A |
| 998 | AAAA3 | 145 | 2 | A | A | A | A | 3 |
| 999 | AAAAJ | 594 | 1 | A | A | A | A | J |
| 1000 | JJJJJ | 219 | 1 | J | J | J | J | J |
1000 rows × 8 columns
Now we just move J to where it belongs in our new card_dict_joker; the rest
of the solution is the same.
card_dict_joker = {c: i for i, c in enumerate(
    [7, 6, 5, 4, 3, 2, 1, 'J', '2', '3', '4', '5', '6', '7', '8', '9',
     'T', 'Q', 'K', 'A'])}
camel = camel.sort_values(by=['type', 'c0', 'c1', 'c2', 'c3', 'c4'],
                          ignore_index=True,
                          key=lambda x: x.map(card_dict_joker))
camel.index += 1
display(camel)
print('Our new solution: ', camel.loc[:, 'bid'].mul(camel.index).sum())
| | cards | bid | type | c0 | c1 | c2 | c3 | c4 |
|---|---|---|---|---|---|---|---|---|
| 1 | 237AQ | 157 | 7 | 2 | 3 | 7 | A | Q |
| 2 | 23K47 | 759 | 7 | 2 | 3 | K | 4 | 7 |
| 3 | 249K8 | 612 | 7 | 2 | 4 | 9 | K | 8 |
| 4 | 264A7 | 341 | 7 | 2 | 6 | 4 | A | 7 |
| 5 | 26578 | 10 | 7 | 2 | 6 | 5 | 7 | 8 |
| … | … | … | … | … | … | … | … | … |
| 996 | TJTTJ | 609 | 1 | T | J | T | T | J |
| 997 | QQJQJ | 131 | 1 | Q | Q | J | Q | J |
| 998 | QQQJQ | 831 | 1 | Q | Q | Q | J | Q |
| 999 | KKKJK | 183 | 1 | K | K | K | J | K |
| 1000 | AAAAJ | 594 | 1 | A | A | A | A | J |
1000 rows × 8 columns
Our new solution: 246894760
This year I am attempting the Advent of Code 2023 challenges. This is Day 7 - see https://adventofcode.com/2023/day/7
For the solutions, see the English version of this blog post.
I try to solve this year’s Advent of Code 2024 riddles in R. This is Day 1 - see https://adventofcode.com/2024/day/1
Throughout the Chief’s office, the historically significant locations are listed not by name but by a unique number called the location ID. To make sure they don’t miss anything, The Historians split into two groups, each searching the office and trying to create their own complete list of location IDs.
There’s just one problem: by holding the two lists up side by side (your puzzle input), it quickly becomes clear that the lists aren’t very similar. Maybe you can help The Historians reconcile their lists?
For example:
3   4
4   3
2   5
1   3
3   9
3   3
Maybe the lists are only off by a small amount! To find out, pair up the numbers and measure how far apart they are. Pair up the smallest number in the left list with the smallest number in the right list, then the second-smallest left number with the second-smallest right number, and so on.
Within each pair, figure out how far apart the two numbers are; you’ll need to add up all of those distances. For example, if you pair up a =3= from the left list with a =7= from the right list, the distance apart is =4=; if you pair up a =9= with a =3=, the distance apart is =6=.
In the example list above, the pairs and distances would be as follows:
- The smallest number in the left list is =1=, and the smallest number in the right list is =3=. The distance between them is =2=.
- The second-smallest number in the left list is =2=, and the second-smallest number in the right list is another =3=. The distance between them is =1=.
- The third-smallest number in both lists is =3=, so the distance between them is =0=.
- The next numbers to pair up are =3= and =4=, a distance of =1=.
- The fifth-smallest numbers in each list are =3= and =5=, a distance of =2=.
- Finally, the largest number in the left list is =4=, while the largest number in the right list is =9=; these are a distance =5= apart.
To find the total distance between the left list and the right list, add up the distances between all of the pairs you found. In the example above, this is =2 + 1 + 0 + 1 + 2 + 5=, a total distance of =11=!
Your actual left and right lists contain many location IDs. What is the total distance between your lists?
My input file: 2024-12-01-1-aoc.txt
First, we read the data - the fread command from the data.table package is
more versatile than read.delim from base R and reads the data directly as a
data.table, which has some benefits.
data <- data.table::fread("2024-12-01-1-aoc.txt", header = FALSE)
head(data)
| V1 | V2 |
|---|---|
| <int> | <int> |
| 41226 | 69190 |
| 89318 | 10100 |
| 59419 | 23880 |
| 63157 | 20193 |
| 81510 | 22869 |
| 83942 | 63304 |
Now, the idea is to get the ordered versions of V1 and V2.
v1 <- data[order(V1)]$V1
v2 <- data[order(V2)]$V2
head(v1)
Coming from Python, this syntax looks a bit odd. The documentation of
data.table::setorder helps:
setorder (and setorderv) reorders the rows of a data.table based on the columns (and column order) provided. It reorders the table by reference and is therefore very memory efficient. Note that queries like x[order(.)] are optimised internally to use data.table's fast order.
So, data[order(V1)] is essentially equivalent to
data.table::setorder(data, V1) - with the difference that setorder reorders
the table in place. Then, we extract the column as a vector using the $
operator.
head(data.table::setorder(data, V1)$V1)
The actual computation is just the sum of the absolute difference:
sum(abs(v1 - v2))
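Coming from Python myself, here is the same pairing logic on the example lists in plain Python for comparison (a sketch, not part of the R solution):

```python
# The example lists from the puzzle; sort both, pair them up,
# and sum the absolute distance of each pair.
left = [3, 4, 2, 1, 3, 3]
right = [4, 3, 5, 3, 9, 3]
total = sum(abs(a - b) for a, b in zip(sorted(left), sorted(right)))
print(total)  # 11, as in the puzzle description
```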
This time, you’ll need to figure out exactly how often each number from the left list appears in the right list. Calculate a total similarity score by adding up each number in the left list after multiplying it by the number of times that number appears in the right list.
Here are the same example lists again:
3   4
4   3
2   5
1   3
3   9
3   3
For these example lists, here is the process of finding the similarity score:
- The first number in the left list is =3=. It appears in the right list three times, so the similarity score increases by =3 * 3 = 9=.
- The second number in the left list is =4=. It appears in the right list once, so the similarity score increases by =4 * 1 = 4=.
- The third number in the left list is =2=. It does not appear in the right list, so the similarity score does not increase (=2 * 0 = 0=).
- The fourth number, =1=, also does not appear in the right list.
- The fifth number, =3=, appears in the right list three times; the similarity score increases by =9=.
- The last number, =3=, appears in the right list three times; the similarity score again increases by =9=.
So, for these example lists, the similarity score at the end of this process is =31= (=9 + 4 + 0 + 0 + 9 + 9=).
Once again consider your left and right lists. What is their similarity score?
My idea here is simple - we first count the occurrences in V2 so that we can
check V1 against them. Applying the table command works like
pandas.value_counts() and achieves this. We convert the output to a
data.frame and assign the V2 values as row names. Note: table converts the
counted values to strings, e.g. "10019" instead of 10019.
tab <- table(v2)
v2series <- data.frame(c(tab), row.names = names(tab))
head(v2series)
v2series["10019", ]
| c.tab. | |
|---|---|
| <int> | |
| 10019 | 1 |
| 10100 | 1 |
| 10206 | 1 |
| 10428 | 1 |
| 10645 | 1 |
| 10972 | 1 |
1
Now, we loop through V1 and look up the frequency in V2 (converting to string
first, via as.character). If nothing is found in V2, an NA value is returned.
For everything that is not NA, we compute the similarity score, append it to a
list, and sum at the end. Note: a list in R cannot be summed directly (sum
expects an atomic vector) - so we have to unlist first.
sim <- list()
for (num in v1) {
myfreq <- v2series[as.character(num), ]
if (!is.na(myfreq)) {
score <- myfreq * num
sim <- append(sim, score)
}
}
sum(unlist(sim))
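For comparison, the same counting idea in Python, where collections.Counter plays the role of R's table (a sketch on the example lists; Counter returns 0 for missing keys, so no NA handling is needed):

```python
from collections import Counter

left = [3, 4, 2, 1, 3, 3]
right = [4, 3, 5, 3, 9, 3]

# Count occurrences in the right list, then weight each left value
# by how often it appears on the right.
counts = Counter(right)
score = sum(n * counts[n] for n in left)
print(score)  # 31, as in the puzzle description
```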
This year I am attempting the Advent of Code 2024 challenges in the statistics-focused programming language R. This is Day 1 - see https://adventofcode.com/2024/day/1
For the solutions, see the English version of this blog post.
I try to solve this year’s Advent of Code 2024 riddles in R. This is Day 2 - see https://adventofcode.com/2024/day/2
The unusual data (your puzzle input) consists of many reports, one report per line. Each report is a list of numbers called levels that are separated by spaces. For example:
7 6 4 2 1
1 2 7 8 9
9 7 6 2 1
1 3 2 4 5
8 6 4 4 1
1 3 6 7 9
This example data contains six reports each containing five levels.
The engineers are trying to figure out which reports are safe. The Red-Nosed reactor safety systems can only tolerate levels that are either gradually increasing or gradually decreasing. So, a report only counts as safe if both of the following are true:
- The levels are either all increasing or all decreasing.
- Any two adjacent levels differ by at least one and at most three.
In the example above, the reports can be found safe or unsafe by checking those rules:
- =7 6 4 2 1=: Safe because the levels are all decreasing by 1 or 2.
- =1 2 7 8 9=: Unsafe because =2 7= is an increase of 5.
- =9 7 6 2 1=: Unsafe because =6 2= is a decrease of 4.
- =1 3 2 4 5=: Unsafe because =1 3= is increasing but =3 2= is decreasing.
- =8 6 4 4 1=: Unsafe because =4 4= is neither an increase or a decrease.
- =1 3 6 7 9=: Safe because the levels are all increasing by 1, 2, or 3.
So, in this example, =2= reports are safe.
Analyze the unusual data from the engineers. How many reports are safe?
My input file: 2024-12-02-1-aoc.txt
First, let’s load the data and take a look:
data <- data.table::fread("2024-12-02-1-aoc.txt", header = FALSE, fill = TRUE)
head(data)
| V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 |
|---|---|---|---|---|---|---|---|
| <int> | <int> | <int> | <int> | <int> | <int> | <int> | <int> |
| 74 | 76 | 78 | 79 | 76 | NA | NA | NA |
| 38 | 40 | 43 | 44 | 44 | NA | NA | NA |
| 1 | 2 | 4 | 6 | 8 | 9 | 13 | NA |
| 65 | 68 | 70 | 72 | 75 | 76 | 81 | NA |
| 89 | 91 | 92 | 95 | 93 | 94 | NA | NA |
| 15 | 17 | 16 | 18 | 19 | 17 | NA | NA |
My core idea here was to compute the differences between adjacent elements in a
row - luckily, R provides a built-in diff() function that computes exactly what
I need. Then, we check if all differences are either 1, 2, 3 or -1, -2, -3.
Note that each row has a different number of elements, so we have to omit NA
values via na.omit before applying the check.
isalltrue <- function(rep) {
return(all(diff(rep) %in% c(1, 2, 3)) |
all(diff(rep) %in% c(-1, -2, -3)))
}
isreport <- function(row) {
return(isalltrue(na.omit(row)))
}
valid.reports <- apply(data, 1, isreport)
table(valid.reports)
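As a cross-check, the same safety rule in Python on the six example reports (a sketch; `is_safe` is my own name for the check):

```python
def is_safe(levels):
    # A report is safe if all adjacent differences are in {1, 2, 3}
    # (gradually increasing) or all in {-1, -2, -3} (gradually decreasing).
    diffs = [b - a for a, b in zip(levels, levels[1:])]
    return (all(d in (1, 2, 3) for d in diffs)
            or all(d in (-1, -2, -3) for d in diffs))

reports = [[7, 6, 4, 2, 1], [1, 2, 7, 8, 9], [9, 7, 6, 2, 1],
           [1, 3, 2, 4, 5], [8, 6, 4, 4, 1], [1, 3, 6, 7, 9]]
print(sum(is_safe(r) for r in reports))  # 2 safe reports, as in the example
```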
The engineers are surprised by the low number of safe reports until they realize they forgot to tell you about the Problem Dampener.
The Problem Dampener is a reactor-mounted module that lets the reactor safety systems tolerate a single bad level in what would otherwise be a safe report. It’s like the bad level never happened!
Now, the same rules apply as before, except if removing a single level from an unsafe report would make it safe, the report instead counts as safe.
More of the above example’s reports are now safe:
- =7 6 4 2 1=: Safe without removing any level.
- =1 2 7 8 9=: Unsafe regardless of which level is removed.
- =9 7 6 2 1=: Unsafe regardless of which level is removed.
- =1 3 2 4 5=: Safe by removing the second level, =3=.
- =8 6 4 4 1=: Safe by removing the third level, =4=.
- =1 3 6 7 9=: Safe without removing any level.
Thanks to the Problem Dampener, =4= reports are actually safe!
Update your analysis by handling situations where the Problem Dampener can remove a single level from unsafe reports. How many reports are now safe?
So, this task is a good example of how easy it is to get lost in an overly complicated approach. My first approach was basically to somehow check whether the general trend was ascending or descending and, based on that, to remove opposite differences. This was too complicated, and in the end I found four fewer safe reports than were needed… (Also note the beautiful print debugging…)
My actual working approach was, in the end, brute force: I just threw out each element of a row in turn and checked whether the report became safe - much simpler and more effective!
isreport2 <- function(row) {
rep <- na.omit(row)
out <- isalltrue(rep)
if (out == FALSE) {
for (i in seq_along(rep)) {
      # do not write rep <- rep[-i] here: that would drop elements
      # cumulatively across iterations; rep.i holds the report with
      # only element i removed
      rep.i <- rep[-i]
out <- isalltrue(rep.i)
if (out == TRUE) {
break
}
}
}
return(out)
}
sum(apply(data, 1, isreport2))
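The brute-force dampener is just as short in Python (a sketch on the example reports; `is_safe` repeats the part-one check so the block is self-contained):

```python
def is_safe(levels):
    diffs = [b - a for a, b in zip(levels, levels[1:])]
    return (all(d in (1, 2, 3) for d in diffs)
            or all(d in (-1, -2, -3) for d in diffs))

def is_safe_dampened(levels):
    # Safe as-is, or safe after dropping any single level (brute force).
    return is_safe(levels) or any(
        is_safe(levels[:i] + levels[i + 1:]) for i in range(len(levels)))

reports = [[7, 6, 4, 2, 1], [1, 2, 7, 8, 9], [9, 7, 6, 2, 1],
           [1, 3, 2, 4, 5], [8, 6, 4, 4, 1], [1, 3, 6, 7, 9]]
print(sum(is_safe_dampened(r) for r in reports))  # 4 safe reports
```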
This year I am attempting the Advent of Code 2024 challenges in the statistics-focused programming language R. This is Day 2 - see https://adventofcode.com/2024/day/2
For the solutions, see the English version of this blog post.
I try to solve this year’s Advent of Code 2024 riddles in R. This is Day 3 - see https://adventofcode.com/2024/day/3
The computer appears to be trying to run a program, but its memory (your puzzle input) is corrupted. All of the instructions have been jumbled up!
It seems like the goal of the program is just to multiply some numbers. It does that with instructions like =mul(X,Y)=, where =X= and =Y= are each 1-3 digit numbers. For instance, =mul(44,46)= multiplies =44= by =46= to get a result of =2024=. Similarly, =mul(123,4)= would multiply =123= by =4=.
However, because the program’s memory has been corrupted, there are also many invalid characters that should be ignored, even if they look like part of a =mul= instruction. Sequences like =mul(4*=, =mul(6,9!=, =?(12,34)=, or =mul ( 2 , 4 )= do nothing.
For example, consider the following section of corrupted memory:
=xmul(2,4)%&mul[3,7]!@^do_not_mul(5,5)+mul(32,64]then(mul(11,8)mul(8,5))=
Only the four highlighted sections are real =mul= instructions. Adding up the result of each instruction produces =161= (=2*4 + 5*5 + 11*8 + 8*5=).
Scan the corrupted memory for uncorrupted mul instructions. What do you get if you add up all of the results of the multiplications?
My input file: 2024-12-03-1-aoc.txt
First, let’s load the data and take a look:
library(tidyverse)
connection <- file("2024-12-03-1-aoc.txt", open = "r")
lines <- readLines(connection)
print(lines[1])
mullist <- stringr::str_extract_all(lines, "mul\\(\\d*,\\d*\\)")
mullist <- unlist(mullist, recursive = FALSE)
head(mullist)
df <- mullist |>
data.frame() |>
magrittr::set_colnames(c("muls")) |>
tidyr::separate_wider_regex(
cols = "muls",
patterns = c(
"mul\\(",
mul1 = "\\d*",
",",
mul2 = "\\d*"
),
too_few = "align_start"
) |>
mutate(
mul1 = as.integer(mul1),
mul2 = as.integer(mul2),
result = mul1 * mul2
)
head(df)
| mul1 | mul2 | result |
|---|---|---|
| <int> | <int> | <int> |
| 948 | 148 | 140304 |
| 590 | 32 | 18880 |
| 611 | 372 | 227292 |
| 835 | 665 | 555275 |
| 724 | 851 | 616124 |
| 188 | 482 | 90616 |
sum(df$result)
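For comparison, the whole part-one pipeline collapses to a few lines of Python with re.findall, shown here on the example memory from the puzzle (a sketch; the two capturing groups replace the separate_wider_regex step):

```python
import re

memory = "xmul(2,4)%&mul[3,7]!@^do_not_mul(5,5)+mul(32,64]then(mul(11,8)mul(8,5))"

# findall with two capturing groups returns the (X, Y) digit pairs directly;
# the corrupted surroundings simply never match the pattern.
pairs = re.findall(r"mul\((\d{1,3}),(\d{1,3})\)", memory)
total = sum(int(a) * int(b) for a, b in pairs)
print(total)  # 161, as in the puzzle description
```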
There are two new instructions you’ll need to handle:
- The =do()= instruction enables future =mul= instructions.
- The =don't()= instruction disables future =mul= instructions.
Only the most recent =do()= or =don't()= instruction applies. At the beginning of the program, =mul= instructions are enabled.
For example:
=xmul(2,4)&mul[3,7]!^don't()_mul(5,5)+mul(32,64](mul(11,8)undo()?mul(8,5))=
This corrupted memory is similar to the example from before, but this time the =mul(5,5)= and =mul(11,8)= instructions are disabled because there is a =don't()= instruction before them. The other =mul= instructions function normally, including the one at the end that gets re-*enabled* by a =do()= instruction.
This time, the sum of the results is =48= (=2*4 + 8*5=).
Handle the new instructions; what do you get if you add up all of the results of just the enabled multiplications?
My idea: split each line into the enabled segments - the prefix before the first don't(), the do()…don't() stretches in the middle, and a trailing do()… segment - and extract the mul instructions only from those. For the regexes, this was helpful: https://stackoverflow.com/questions/36056219/regex-match-first-occurrence-before-keyword
lines[[4]]
# Exploratory: per line, collect the stretches where mul is enabled.
for(i in 1:6) {
  # dolistf_i: prefix before the first don't() (mul starts out enabled)
  varname <- paste("dolistf", i, sep="")
  assign(varname, stringr::str_extract(
    lines[[i]], "^(?:.*?)don't\\(\\)"))
  # dolistl_i: meant to be the last do()...don't() stretch; the stray ^
  # anchor keeps this pattern from ever matching mid-line, so it yields NA
  varname <- paste("dolistl", i, sep="")
  assign(varname, stringi::stri_extract_last_regex(
    lines[[i]],
    "(do\\(\\)).*?(^don't\\(\\))"))
}
## dolist7 <- stringr::str_extract_all(lines, "do\\(\\)(?:.*?)don't\\(\\)")
# dolist7: prefix before the first don't() of each line
dolist7 <- stringr::str_extract_all(lines,
                                    "^(?:.*?)don't\\(\\)")
# dolist8: do()...don't() stretches not directly preceded by don't()
dolist8 <- stringr::str_extract_all(lines,
                                    "(?<!(don't\\(\\)))(do\\(\\)(?:.*?)don't\\(\\))")
# dolist9: trailing do()... segment with no later don't()
dolist9 <- stringr::str_extract_all(lines,
                                    "do\\(\\)(?:.(?!(don't\\(\\))))+$")
dolist7 <- unlist(dolist7, recursive = FALSE)
dolist <- c(dolistf1, dolistf2, dolistf3, dolistf4, dolistf5, dolistf6,
            dolistl1, dolistl2, dolistl3, dolistl4, dolistl5, dolistl6,
            dolist7)
dolist8
# Extract the mul instructions from the enabled segments
mullist <- stringr::str_extract_all(dolist7, "mul\\(\\d*,\\d*\\)")
mullist <- unlist(mullist, recursive = FALSE)
length(mullist)
df <- mullist |>
data.frame() |>
magrittr::set_colnames(c("muls")) |>
tidyr::separate_wider_regex(
cols = "muls",
patterns = c(
"mul\\(",
mul1 = "\\d*",
",",
mul2 = "\\d*"
),
too_few = "align_start"
) |>
mutate(
mul1 = as.integer(mul1),
mul2 = as.integer(mul2),
result = mul1 * mul2
)
head(df)
| mul1 | mul2 | result |
|---|---|---|
| <int> | <int> | <int> |
| 948 | 148 | 140304 |
| 590 | 32 | 18880 |
| 611 | 372 | 227292 |
| 835 | 665 | 555275 |
| 416 | 366 | 152256 |
| 459 | 47 | 21573 |
sum(df$result)
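Instead of slicing the memory into enabled segments with several regexes, one can also scan token by token and keep an enabled flag - arguably the simpler mental model. A Python sketch on the example memory:

```python
import re

memory = "xmul(2,4)&mul[3,7]!^don't()_mul(5,5)+mul(32,64](mul(11,8)undo()?mul(8,5))"

enabled, total = True, 0
# One pass over all three token types, in input order; mul is enabled at start.
for m in re.finditer(r"mul\((\d{1,3}),(\d{1,3})\)|do\(\)|don't\(\)", memory):
    token = m.group(0)
    if token == "do()":
        enabled = True
    elif token == "don't()":
        enabled = False
    elif enabled:
        total += int(m.group(1)) * int(m.group(2))
print(total)  # 48, as in the puzzle description
```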
This year I am attempting the Advent of Code 2024 challenges in the statistics-focused programming language R. This is Day 3 - see https://adventofcode.com/2024/day/3
For the solutions, see the English version of this blog post.
- Charlotte Perkins Gilman: The Yellow Wallpaper
- A classic of feminist literature with a soft horror touch - a quick read (only 24 pages) and definitely worth it.
- Herta Müller: Der König verneigt sich und tötet (essay collection)
- Jede Sprache hat andere Augen
- Der König verneigt sich und tötet
- …
A basic thought in Samuel Shem’s Mount Misery: It is not self that heals, but connection.
Corona: connection (social closeness) kills.
See English version.
See German version.
In a recent episode of the ZEIT podcast “Alles gesagt”, Thea Dorn, Jochen Wegner, and Christoph Amend discuss the book Trost. Briefe an Max. They approach a core problem of modern humans: we aim either to minimize risks before an unwanted event, or to demand compensation and accountability afterwards. We have unlearned how to deal with the imponderables of life. Religiosity, with its consoling effect, is said to be the decisive way out.
Points I want to address:
- Thea Dorn flatly quotes Nietzsche as “God is dead, man is God, etc.” → go into Nietzsche’s “God is dead” as a call to humans in the “enlightened” world that basic things like human rights, or indeed consolation, need a new justification. Faith as a mental trick. “God is dead” as a call, charged with hope, to find a more sustainable, deeper foundation.
- She returns to religiosity. Socialized in East Germany, I cannot return. As a child I was not introduced to faith; it remains an inaccessible topic for me, one I cannot accept as honest and true consolation. I know immovably that life is a short, random moment, that there is nothing after death, that in the end life has no higher meaning. This is my starting point. Is this why I feel no hopelessness in these considerations? I do not have to dwell there; I can take a step further: in the end, what counts is what I do in my present life.
- That life exists is chemical chance; that human life exists is evolutionary chance; that I was born in a rich country is historical chance; that my parents could provide for me and raise me well is familial chance. Do we need an ethics of chance? If so many parts of our existence are chance, is that enough to provide consolation?
- When disaster strikes, it is a stroke of fate - but is it not also a statistical quantity? Medical school shows me: people fall ill all the time, while I often get off lightly. I was not involved in a fatal traffic accident at 18. I did not die of acute leukemia at 23. If I saw these statistical threats as real threats all the time, I would surely go mad, despair, become hopeless.
- Yet I do not. Why not? Surely in part because of a certain stoic attitude. But that is not enough; it also takes an active part - risk minimization ultimately provides this reassurance: “I have done everything I could”. Christian Drosten had a similar motto: we must be concerned and take measures - but not panic.
Another topic: Thea Dorn says natural scientists weigh things against each other: we calculate what causes how much suffering, then take what causes less. That, she says, is utilitarianism. She, however, says: we must weigh saving lives against “freedom”.
- with Friederike Schmitz: weighing in the legal sense IS utilitarianism!
General impression: a thoughtful book; mostly he tries to avoid single stories and gives an impression of different scenes in or between different African countries from the end of the 1950s until the 1990s. From time to time he tries to explain certain views or mentalities in African countries, which might come off as condescending from today's view and be of more interest to non-African readers than to African readers.
In this blog post I will share the more impressionistic quotes I liked, while some historical takeaways will be collected in Notes.
- from very first second: exhibit enormous primal joy and geniality
- extend hand in large, vigorous gesture
- loud laughter, many questions
- a state wherein the civil servant received remuneration beyond all measure
and reason (a low-level white bureaucrat suddenly gets a villa, servants, …)
- after independence this system gives fast rise to new elites
- French: la politique du ventre (the politics of the stomach)
- 10,000 kingdoms, federations, and stateless but independent tribal associations crammed into ~40 colonies! (without asking)
- ports - only leeches on the body of Africa, points of export for slaves, gold and ivory
- p52: Why Indians built the railway
- White worker from Europe → was the master, could not do physical labour
- African worker → “did not exist”; the concept of wages was missing, and the British later had a system of forced labour
- p82: Islands around Africa
- Were bases for sailors, merchants, and robbers (especially Europeans)
- For unstable African boats hard to reach → spot for concentration camps for slave trade
- p83: The philosophy that inspired the construction of Kolyma and Auschwitz, one of obsessive content and hatred, vileness and brutality, was formulated and set down centuries earlier by the captains of the Martha and the Progress, the Marie Ann and the Rainbow
- lions attacking humans → are old outcasts
- where are the elephant cemeteries → at the bottom of lakes
- p70: Poland vs Tanzania
- children asking Ryszard in Poland “And did you see many cannibals?”
- Mothers in Tanganyika to their children: “You had better be good, or else the mzungu (Swahili: the white man, the European) will eat you!”
- p110: So often I had felt irritated with people who arrived here, lived in “little Europe” or “little America” (e.g. in luxury hotels), and departed, bragging later that they had been to Africa, a place that in reality they had never seen.
- Lalibela: 11 great churches carved in stone and misery