I wanted to brush up on my Python plotting skills in a fun way. Since I’ve been playing Pokemon Go recently, I decided to make a plot using data from the game. After a few iterations and even some OpenCV work, I decided that the following plot was the best solution.
Even though the above plot is probably the most useful, my favorite plot is the one where each bar is colored based on each Pokemon’s dominant colors. Analyzing each sprite and working with OpenCV was fun. I plan to delve further into k-means clustering and similar algorithms since they were really interesting.
Below is the Jupyter notebook I used to develop everything. If you want to play around with it locally, you can download it from my GitHub.
Plotting Pokemon Max Combat Power
This notebook contains nine different iterations of plots that attempt to visualize each Pokemon’s max Combat Power (CP) in Pokemon GO. CP represents a Pokemon’s ability to perform well in battle; a higher CP generally means a Pokemon will be a better fighter. CP is calculated from a Pokemon’s base stats: Attack, Defense, and Stamina (sometimes called HP). As a Pokemon levels up, its base stats stay the same but its CP multiplier increases, and as a result its Combat Power also increases.
Table of Contents
- Equations – Calculating CP
- Mining Data
- Attempt 1 – Scatter Plot (Level vs CP)
- Attempt 2 – Scatter Plot (Pokemon vs CP)
- Attempt 3 – Layered Bar Chart
- Attempt 4 – Four Bar Chart Subplots (sliced by generation, sorted by number)
- Attempt 5 – Horizontal Bar Chart (sorted by number)
- Attempt 6 – Horizontal Bar Chart (sorted by CP)
- Attempt 7 – Horizontal Bar Chart (Colorized)
- Attempt 8 – Four Bar Chart Subplots (sorted by CP)
- Attempt 9 – Four Horizontal Bar Chart Subplots (sorted by CP)
This project covered a huge variety of programming and Python-related topics that I had never been exposed to. I learned a lot about the following (in no particular order):
Equations – Calculating CP
The equations used to determine CP are based on each Pokemon’s stats from the original Pokemon games. Although the equations have changed over time, the following have been used since October 2018 (discovered by redditor u/Pikatrainer):
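CP Formula
\begin{equation*} CP = Floor\left((BaseAttack + IV_{Atk}) \times \sqrt{BaseDefense + IV_{Def}} \times \sqrt{BaseStamina + IV_{Stam}} \times \frac{CPMultiplier^2}{10}\right) \end{equation*}
Each IV term is the Pokemon’s individual value for that stat (0 to 15; a perfect Pokemon has 15 in all three), and CPMultiplier depends only on the Pokemon’s level.
Base Attack Formula
Higher and Lower are the higher and lower of the Attack and Special Attack values from the original Pokemon games.
\begin{equation*} BaseAttack = Round(ScaledAttack \times SpeedMod) \end{equation*}
\begin{equation*} ScaledAttack = Round\left(2 \times \left(\frac{7}{8}Higher + \frac{1}{8}Lower\right)\right) \end{equation*}
Base Defense Formula
Here Higher and Lower are the higher and lower of the Defense and Special Defense values.
\begin{equation*} BaseDefense = Round(ScaledDefense \times SpeedMod) \end{equation*}
\begin{equation*} ScaledDefense = Round\left(2 \times \left(\frac{5}{8}Higher + \frac{3}{8}Lower\right)\right) \end{equation*}
Base Stamina Formula
HP is the Pokemon’s HP stat from the original games.
\begin{equation*} BaseStamina = Floor(HP \times 1.75 + 50) \end{equation*}
Speed Mod
Speed is the Pokemon’s Speed stat from the original games.
\begin{equation*} SpeedMod = 1 + \frac{Speed - 75}{500} \end{equation*}
Mining Data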
The Pokemon GO app caches a lot of data on a player’s phone to improve performance. On Android, this cached game data is called GAME_MASTER and is stored in internal/emulated storage: Android/data/com.nianticlabs.pokemongo/files/remote_config_cache
The GAME_MASTER is in a Google Protocol Buffer format and must be parsed. There are projects like pogo-game-master-decoder on GitHub that can parse the GAME_MASTER.
Better yet, the GAME_MASTER files can be found in .json format in the pokemongo-game-master GitHub project.
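The parsing code that follows expects the JSON to hold an itemTemplates list whose entries carry a templateId, with Pokemon entries named like V0001_POKEMON_BULBASAUR. The minimal sketch below runs the same kind of regex filter over a tiny hand-written stand-in for that structure. The sample is hypothetical and trimmed to the fields the notebook reads (the Bulbasaur stats match the parsed output shown later; the cpMultiplier entry is a placeholder), and pokemon_templates is an illustrative name. It captures the number with a regex group rather than the separate re.findall call used in the notebook.

```python
import json
import re

# Hypothetical, heavily trimmed stand-in for GAME_MASTER.json.
# Only the fields the notebook reads are included; cpMultiplier is a placeholder.
SAMPLE = """
{
  "itemTemplates": [
    {"templateId": "V0001_POKEMON_BULBASAUR",
     "pokemonSettings": {"pokemonId": "BULBASAUR",
                         "stats": {"baseStamina": 128, "baseAttack": 118, "baseDefense": 111}}},
    {"templateId": "PLAYER_LEVEL_SETTINGS",
     "playerLevel": {"cpMultiplier": [0.094]}}
  ]
}
"""

# Matches templates such as V0001_POKEMON_BULBASAUR and captures the 4-digit number
POKEMON_ID = re.compile(r"^V(\d{4})_POKEMON_")


def pokemon_templates(data):
    """Yield (dex number, name, base stats) for every Pokemon template."""
    for template in data["itemTemplates"]:
        match = POKEMON_ID.match(template["templateId"])
        if match:  # non-Pokemon templates like PLAYER_LEVEL_SETTINGS are skipped
            settings = template["pokemonSettings"]
            yield int(match.group(1)), settings["pokemonId"], settings["stats"]


data = json.loads(SAMPLE)
for number, name, stats in pokemon_templates(data):
    print(number, name, stats["baseAttack"])  # prints: 1 BULBASAUR 118
```

The real file contains many more template types and fields, so the notebook's loop below also reads forms, encounter rates, and the level settings.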
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import json
import re
import urllib.request
import math
# If GAME_MASTER exists, open it, otherwise download most recent GAME_MASTER from Github and save locally.
try:
    with open('GAME_MASTER.json') as json_file:
        data = json.load(json_file)
except FileNotFoundError:
    # Get the latest Pokemon GO GAME_MASTER in json format from GITHUB
    url = 'https://raw.githubusercontent.com/pokemongo-dev-contrib/pokemongo-game-master/master/versions/latest/GAME_MASTER.json'
    response = urllib.request.urlopen(url)
    raw_data = response.read()
    data = json.loads(raw_data)
    with open('GAME_MASTER.json', 'wb') as file:
        file.write(raw_data)
pattern_pokemon = re.compile(r'^V\d{4}_POKEMON_.*$')  # Regex to match V####_POKEMON_ (raw string avoids invalid-escape warnings)
pokemon_stats = []
cp_multipliers = []
for template in data['itemTemplates']:
    if pattern_pokemon.match(template['templateId']):
        dictRow = {}
        # Pokemon Template ID
        dictRow['TemplateId'] = template['templateId']
        # Pokemon Number
        pokemon_number = re.findall(r"\d{4}", template['templateId'])
        pokemon_number = int(pokemon_number[0])
        dictRow['Number'] = pokemon_number
        # Pokemon Name
        name = template['pokemonSettings']['pokemonId']
        dictRow['Name'] = name
        # Pokemon Form
        form = template['pokemonSettings'].get('form', '')
        form = form.replace(name+'_', '').replace('_', ' ')  # remove name from form
        dictRow['Form'] = form
        # Stats
        dictRow['baseStamina'] = template['pokemonSettings']['stats']['baseStamina']
        dictRow['baseAttack'] = template['pokemonSettings']['stats']['baseAttack']
        dictRow['baseDefense'] = template['pokemonSettings']['stats']['baseDefense']
        # Flee and Capture rate
        dictRow['baseFleeRate'] = template['pokemonSettings']['encounter'].get('baseFleeRate', 0.0)
        dictRow['baseCaptureRate'] = template['pokemonSettings']['encounter'].get('baseCaptureRate', 0.0)
        #if form != 'NORMAL': # ignore NORMAL forms because they are duplicates
        pokemon_stats.append(dictRow)
    # CP Multiplier
    if template['templateId'] == 'PLAYER_LEVEL_SETTINGS':
        cp_multipliers = template['playerLevel']['cpMultiplier']
df = pd.DataFrame(pokemon_stats)
print(df.head())
print("Done processing.")
   Form        Name  Number                TemplateId  baseAttack  baseCaptureRate  baseDefense  baseFleeRate  baseStamina
0         BULBASAUR       1   V0001_POKEMON_BULBASAUR         118             0.20          111          0.10          128
1           IVYSAUR       2     V0002_POKEMON_IVYSAUR         151             0.10          143          0.07          155
2          VENUSAUR       3    V0003_POKEMON_VENUSAUR         198             0.05          189          0.05          190
3        CHARMANDER       4  V0004_POKEMON_CHARMANDER         116             0.20           93          0.10          118
4        CHARMELEON       5  V0005_POKEMON_CHARMELEON         158             0.10          126          0.07          151
Done processing.
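As a cross-check of the base-stat formulas from the Equations section, the parsed values for Bulbasaur (attack 118, defense 111, stamina 128) can be recomputed from its main-series stats (HP 45, Attack 49, Defense 49, Special Attack 65, Special Defense 65, Speed 45). This is a minimal sketch: the function names are illustrative, and it assumes the formulas’ "Round" means rounding half up, which the formulas themselves do not specify (Python’s built-in round() rounds half to even).

```python
import math


def round_half_up(x):
    # Assumption: "Round" in the formulas rounds .5 upward
    return math.floor(x + 0.5)


def speed_mod(speed):
    return 1 + (speed - 75) / 500


def base_attack(attack, sp_attack, speed):
    higher, lower = max(attack, sp_attack), min(attack, sp_attack)
    scaled = round_half_up(2 * (7 / 8 * higher + 1 / 8 * lower))
    return round_half_up(scaled * speed_mod(speed))


def base_defense(defense, sp_defense, speed):
    higher, lower = max(defense, sp_defense), min(defense, sp_defense)
    scaled = round_half_up(2 * (5 / 8 * higher + 3 / 8 * lower))
    return round_half_up(scaled * speed_mod(speed))


def base_stamina(hp):
    return math.floor(hp * 1.75 + 50)


# Bulbasaur's main-series stats: HP 45, Atk 49, Def 49, SpA 65, SpD 65, Spe 45
print(base_attack(49, 65, 45), base_defense(49, 65, 45), base_stamina(45))
# prints: 118 111 128
```

The same functions reproduce Ivysaur’s 151 / 143 / 155 from its main-series stats (HP 60, Atk 62, Def 63, SpA 80, SpD 80, Spe 60).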
Processing Data – Applying Equations
Now that we have our data parsed, we want to do some calculations on it. Below we apply the Combat Power equation mentioned above. We also determine which Generation each Pokemon belongs to; this will be helpful later.
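For a single Pokemon, the max-CP calculation reduces to a few lines of standard-library Python. This sketch hard-codes 0.7903, the commonly cited level-40 CP multiplier, instead of reading it from cp_multipliers as the notebook does; max_cp is an illustrative name, and the default iv=15 mirrors the notebook’s perfect-IV assumption.

```python
import math


def max_cp(base_attack, base_defense, base_stamina, cp_multiplier, iv=15):
    """CP with the same IV added to every base stat (iv=15 is a perfect Pokemon)."""
    attack = base_attack + iv
    defense = base_defense + iv
    stamina = base_stamina + iv
    return math.floor(attack * math.sqrt(defense) * math.sqrt(stamina) * cp_multiplier ** 2 / 10)


# Bulbasaur's base stats from the parsed table at the assumed level-40 multiplier
print(max_cp(118, 111, 128, 0.7903))  # prints: 1115
```

The notebook computes the same expression for every row at once with NumPy, which is why it builds sqrtDef, sqrtStam, and the squared multipliers as whole columns.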
def determine_generation(row):
    """Return the generation (1-8) for a row's National Pokedex number."""
    if row['Number'] <= 151:
        return 1
    if 151 < row['Number'] <= 251:
        return 2
    if 251 < row['Number'] <= 386:
        return 3
    if 386 < row['Number'] <= 493:
        return 4
    if 493 < row['Number'] <= 649:
        return 5
    if 649 < row['Number'] <= 721:
        return 6
    if 721 < row['Number'] <= 809:
        return 7
    if row['Number'] > 809:
        return 8
# Calculate Max CP for each Pokemon
sqrtDef = np.sqrt(df['baseDefense'] + 15) # +15 assumes perfect IV
sqrtStam = np.sqrt(df['baseStamina'] + 15)
# cp_multipliers[0] is level 1, so index 19 is level 20, 29 is level 30, and 39 is level 40
cpMult20 = (cp_multipliers[19]**2) / 10
cpMult30 = (cp_multipliers[29]**2) / 10
cpMult40 = (cp_multipliers[39]**2) / 10
df['cp_level_20'] = np.floor((df['baseAttack'] + 15) * sqrtDef * sqrtStam * cpMult20)
df['cp_level_30'] = np.floor((df['baseAttack'] + 15) * sqrtDef * sqrtStam * cpMult30)
df['cp_level_40'] = np.floor((df['baseAttack'] + 15) * sqrtDef * sqrtStam * cpMult40)
# Cast CP columns to integers
df['cp_level_20'] = pd.to_numeric(df['cp_level_20'], downcast='integer')
df['cp_level_30'] = pd.to_numeric(df['cp_level_30'], downcast='integer')
df['cp_level_40'] = pd.to_numeric(df['cp_level_40'], downcast='integer')
# Determine generation for each Pokemon
df['Generation'] = df.apply(determine_generation, axis=1)
print(df.head())
print('Done processing data')
   Form        Name  Number                TemplateId  baseAttack  baseCaptureRate  baseDefense  baseFleeRate  baseStamina  cp_level_20  cp_level_30  cp_level_40  Generation
0         BULBASAUR       1   V0001_POKEMON_BULBASAUR         118             0.20          111          0.10          128          637          955         1115           1
1           IVYSAUR       2     V0002_POKEMON_IVYSAUR         151             0.10          143          0.07          155          970         1456         1699           1
2          VENUSAUR       3    V0003_POKEMON_VENUSAUR         198             0.05          189          0.05          190         1554         2332         2720           1
3        CHARMANDER       4  V0004_POKEMON_CHARMANDER         116             0.20           93          0.10          118          560          840          980           1
4        CHARMELEON       5  V0005_POKEMON_CHARMELEON         158             0.10          126          0.07          151          944         1417         1653           1
Done processing data
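The chain of comparisons in determine_generation can also be written as a sorted list of each generation’s last National Pokedex number plus a binary search with the standard-library bisect module. This is an alternative sketch, not what the notebook uses; GENERATION_ENDS and generation_of are illustrative names, and the cut-offs are the same ones used in determine_generation.

```python
from bisect import bisect_left

# Last National Pokedex number of generations 1-7 (same cut-offs as determine_generation);
# anything above 809 falls through to generation 8.
GENERATION_ENDS = [151, 251, 386, 493, 649, 721, 809]


def generation_of(number):
    """Return the generation (1-8) for a National Pokedex number."""
    # bisect_left gives the index of the first cut-off >= number,
    # so 151 lands at index 0 (generation 1) and 152 at index 1 (generation 2).
    return bisect_left(GENERATION_ENDS, number) + 1


print([generation_of(n) for n in (1, 151, 152, 386, 809, 810)])
# prints: [1, 1, 2, 3, 7, 8]
```

In the notebook, df['Number'].map(generation_of) would fill the same Generation column without a row-wise apply, and adding a generation means appending one number to the list.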
Attempt 1 – Scatter Plot (Level vs CP)
This graph is bad because we cannot compare individual Pokemon in any way. The only insight we gain is that Pokemon tend to get stronger as they increase in level. In order to compare the CP of each Pokemon, we need Pokemon on the x-axis (not level).
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
# Magic command to show plots in Jupyter
%matplotlib inline
# Plot each pokemon's CP with respect to Level
for row in df.itertuples():
    x = [20, 30, 40]
    y = [row.cp_level_20, row.cp_level_30, row.cp_level_40]
    color = np.random.rand(3)  # one random [r, g, b] color per Pokemon
    # Wrap the RGB triple in a list so matplotlib applies it to all three points
    # instead of mapping the three values through a colormap
    plt.scatter(x, y, c=[color], alpha=0.7)
plt.xlabel('Pokemon Level')
plt.ylabel('Combat Power (CP)')
plt.savefig('Pokemon_Max_CP_Attempt_1.png', format='png', dpi=1000)  # Export Figure as high resolution
Attempt 2 – Scatter Plot (Pokemon vs CP)
This attempt is better because we can now compare each Pokemon’s CP. Using a different color for each level shows how much each Pokemon’s CP increases as it levels up. Unfortunately, the x-axis shows only an index, so we cannot tell which data points belong to which Pokemon. The points are also so tightly clustered that it is difficult to tell which set of points belongs together.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
%matplotlib inline
df = df.sort_values('Number')
plt.scatter(df.index.values, df['cp_level_20'], label='Lvl 20 Max CP', color='green',s=1)
plt.scatter(df.index.values, df['cp_level_30'], label='Lvl 30 Max CP', color='orange', s=1)
plt.scatter(df.index.values, df['cp_level_40'], label='Lvl 40 Max CP', color='red', s=1)
plt.legend()
plt.xlabel('Pokemon Id')
plt.ylabel('Combat Power (CP)')
plt.title('Pokemon Max CP at Level 20, 30, 40')
plt.savefig('Pokemon_Max_CP_Attempt_2.png', format='png', dpi=1000) # Export Figure as high resolution
Attempt 3 – Layered Bar Chart
This attempt is much better. The CP values are stacked as bars on top of each other instead of as points like in Attempt 2, which makes it easier to keep track of which data corresponds to each Pokemon. While we can now see every Pokemon’s name at the bottom, there are so many that the x-axis labels become too small to read without high-resolution images and zooming.
Plot Data
Layer bars on top of each other by plotting them sequentially, tallest first. This works because CP always increases with level, so the level 40 bar is always the tallest and the level 30 and level 20 bars drawn over it remain visible. We could have also created three bars per Pokemon, but this approach is more compact.
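The layering relies on every row satisfying cp_level_20 <= cp_level_30 <= cp_level_40. A quick check of that assumption is sketched below over plain tuples copied from the first rows of the processed table; layering_is_safe is an illustrative name. On the full DataFrame, the equivalent pandas test would be ((df['cp_level_20'] <= df['cp_level_30']) & (df['cp_level_30'] <= df['cp_level_40'])).all().

```python
# (name, cp_level_20, cp_level_30, cp_level_40) for the first rows of the processed table
rows = [
    ("BULBASAUR", 637, 955, 1115),
    ("IVYSAUR", 970, 1456, 1699),
    ("VENUSAUR", 1554, 2332, 2720),
    ("CHARMANDER", 560, 840, 980),
    ("CHARMELEON", 944, 1417, 1653),
]


def layering_is_safe(rows):
    """True if each bar drawn later (lower level) is no taller than the bar beneath it."""
    return all(cp20 <= cp30 <= cp40 for _, cp20, cp30, cp40 in rows)


print(layering_is_safe(rows))  # prints: True
```

If the check ever failed, a later (shorter-level) bar would hide the one beneath it, and grouped bars would be the safer choice.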
%matplotlib inline
#%matplotlib notebook
# Layer bar charts
plt.bar(df.index.values, df['cp_level_40'], label='Lvl 40 Max CP', color='red', width=0.5)
plt.bar(df.index.values, df['cp_level_30'], label='Lvl 30 Max CP', color='orange', width=0.5)
plt.bar(df.index.values, df['cp_level_20'], label='Lvl 20 Max CP', color='green', width=0.5)
# Need to remove duplicates in NumId column by incrementing NumId when a special pokemon exists
plt.xticks(df.index.values, df['Name'], fontsize=1, rotation=90)
ax=plt.gca() # GetCurrentAxis of plot
ax.xaxis.set_tick_params(labelsize=1, width=0.1, pad=0.5)
plt.legend()
plt.grid(axis='y')
plt.xlabel('Pokemon Id')
plt.ylabel('Combat Power (CP)')
plt.title('Pokemon Max CP')
plt.savefig('Pokemon_Max_CP_Attempt_3.png', format='png', dpi=1000) # Export Figure as high resolution
plt.show()
Attempt 4 – Four Bar Chart Subplots (sliced by generation, sorted by number)
Make one subplot for each of the first four generations of Pokemon so that the x-axis labels are large enough to read.
%matplotlib inline
#%matplotlib notebook
# plt.subplots() below creates a new figure, so a preceding plt.clf() is unnecessary
# (it would only create an extra, empty figure)
# Create figure to add subplots to
fig, ax = plt.subplots(nrows=4, sharey=True)
fig.set_size_inches(11, 8.5)
for i in range(0, 4):  # Iterate through the first four Pokemon generations
    gen = df.loc[df['Generation'] == (i+1)]
    # Layer bars on top of each other as before
    ax[i].bar(gen.index.values, gen['cp_level_40'], label='Lvl 40 Max CP', color='red')
    ax[i].bar(gen.index.values, gen['cp_level_30'], label='Lvl 30 Max CP', color='orange')
    ax[i].bar(gen.index.values, gen['cp_level_20'], label='Lvl 20 Max CP', color='green')
    # Define x-axis
    ax[i].set_xticks(gen.index.values)
    ax[i].set_xticklabels(gen['Name'] + ' ' + gen['Form'], rotation=90, fontsize=5)
    ax[i].tick_params(axis='x', which='minor', bottom=False)
    # Define y-axis
    ax[i].set_yticks([1000, 2000, 3000, 4000])
    ax[i].grid(axis='y')
    ax[i].set_yticks([500, 1500, 2500, 3500, 4500], minor=True)
    ax[i].grid(axis='y', which='minor', linestyle='--')
# Set Legend above first axes
#ax[0].legend(loc='upper center', bbox_to_anchor=(0., 1.1, 1., .102),fancybox=True, shadow=True, ncol=4)
handles, labels = ax[0].get_legend_handles_labels()
fig.legend(handles, labels, bbox_to_anchor=(0., 0.5, 1.2, .102), loc='center right', fancybox=True, shadow=True)
# Set X and Y labels on figure
#fig.suptitle('Pokemon Max CP by Level', y = 1.01, fontsize=18)
fig.suptitle('Pokemon Max CP by Level', y=1.015, fontsize=20)
fig.text(-.01, 0.5, 'Combat Power (CP)', ha='center', va='center', rotation='vertical')
fig.text(0.5, 0.04, 'Pokemon Id', ha='center', va='center')
fig.tight_layout() # Stop x-axis tick labels from being cropped
# Export Figure as high resolution
fig.savefig('Pokemon_Max_CP_Attempt_4.png', format='png', dpi=350, bbox_inches='tight') # tight required to prevent cropping
Attempt 5 – Horizontal Bar Chart (sorted by number)
In order to make the Pokemon names more readable, we can plot them on a horizontal bar chart.
%matplotlib inline
fig, ax = plt.subplots()
fig.set_size_inches(2, 20)
# Layer bar charts
ax.barh(df.index.values, df['cp_level_40'], label='Lvl 40 Max CP', color='red')
ax.barh(df.index.values, df['cp_level_30'], label='Lvl 30 Max CP', color='orange')
ax.barh(df.index.values, df['cp_level_20'], label='Lvl 20 Max CP', color='green')
plt.yticks(df.index.values, df['Name'] + ' ' + df['Form'], fontsize=1)
ax.yaxis.set_tick_params(labelsize=1, width=0.1, pad=0.5)
#ax.invert_yaxis()
# Define x-axis
ax.set_xticks([1000,2000,3000,4000])
ax.tick_params(axis='x', labelsize = 5)
ax.grid(axis='x')
ax.set_xticks([500,1500,2500,3500,4500], minor=True)
ax.grid(axis='x', which='minor', linestyle='--')
plt.legend(prop={'size': 5})
plt.xlabel('Combat Power (CP)')
plt.ylabel('Pokemon Id')
plt.title('Pokemon Max CP')
plt.savefig('Pokemon_Max_CP_Attempt_5.png', format='png', dpi=1000) # Export Figure as high resolution
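Attempt 6 – Horizontal Bar Chart (sorted by CP)
Sort the data by Max CP to find the best Pokemon. Sorting in ascending order puts the highest-CP Pokemon at the top of the chart, because barh draws the first row (index 0) at the bottom of the y-axis.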
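Sorting the whole table is what the chart needs, but if the goal is only to list the strongest few Pokemon, a full sort is not required. The sketch below uses the standard-library heapq.nlargest over the level 40 values from the first rows of the processed table; on the DataFrame itself, df.nlargest(3, 'cp_level_40') does the same job.

```python
import heapq

# (name, cp_level_40) for the first rows of the processed table
max_cp_40 = [
    ("BULBASAUR", 1115),
    ("IVYSAUR", 1699),
    ("VENUSAUR", 2720),
    ("CHARMANDER", 980),
    ("CHARMELEON", 1653),
]

# Keep the three highest entries, largest first, without sorting the whole list
top3 = heapq.nlargest(3, max_cp_40, key=lambda row: row[1])
print(top3)
# prints: [('VENUSAUR', 2720), ('IVYSAUR', 1699), ('CHARMELEON', 1653)]
```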
%matplotlib inline
# Sort Data
df = df.sort_values('cp_level_40')
df = df.reset_index(drop=True) # Set index to this newly sorted order
# Plot Data
fig, ax = plt.subplots()
fig.set_size_inches(2, 20)
# Layer bar charts
ax.barh(df.index.values, df['cp_level_40'], label='Lvl 40 Max CP', color='red')
ax.barh(df.index.values, df['cp_level_30'], label='Lvl 30 Max CP', color='orange')
ax.barh(df.index.values, df['cp_level_20'], label='Lvl 20 Max CP', color='green')
plt.yticks(df.index.values, df['Name'] + ' ' + df['Form'], fontsize=1)
ax.yaxis.set_tick_params(labelsize=1, width=0.1, pad=0.2)
#ax.invert_yaxis()
# Define x-axis
ax.set_xticks([1000,2000,3000,4000])
ax.tick_params(axis='x', labelsize = 5)
ax.grid(axis='x', alpha=0.7)
ax.set_xticks([500,1500,2500,3500,4500], minor=True)
ax.grid(axis='x', which='minor', linestyle='--', alpha=0.5)
ax.legend(prop={'size': 5})
plt.xlabel('Combat Power (CP)')
plt.ylabel('Pokemon Id')
plt.title('Pokemon Max CP')
plt.savefig('Pokemon_Max_CP_Attempt_6.png', format='png', dpi=1000) # Export Figure as high resolution