I wanted to brush up on my Python plotting skills in a fun way. Since I’ve been playing Pokemon GO recently, I decided to make a plot using data from the game. After a few iterations and even some OpenCV work, I decided that the following plot was the best solution.

Even though the above plot is probably the most useful, my favorite plot is the one where each bar is colored based on each Pokemon’s dominant colors. Analyzing each sprite and working with OpenCV was fun. I plan to delve further into k-means clustering and similar algorithms since they were really interesting.

Below is the Jupyter notebook I used to develop everything. If you want to play around with it locally, you can download it from my GitHub.


Plotting Pokemon Max Combat Power¶

This notebook contains nine different iterations of plots in an attempt to visualize a Pokemon’s max Combat Power (CP) in Pokemon GO. CP represents a Pokemon’s ability to perform well in battle; a higher CP generally means a Pokemon will be a better fighter. CP is calculated from a Pokemon’s base stats: Attack, Defense, and Stamina (sometimes called HP). As a Pokemon levels up, its CP multiplier increases, and as a result Combat Power also increases.

Table of Contents¶

Equations – Calculating CP
Mining Data
Attempt 1 – Scatter Plot (Level vs CP)
Attempt 2 – Scatter Plot (Pokemon vs CP)
Attempt 3 – Layered Bar Chart
Attempt 4 – Four Bar Chart Subplots (sliced by generation, sorted by number)
Attempt 5 – Horizontal Bar Chart (sorted by number)
Attempt 6 – Horizontal Bar Chart (sorted by CP)
Attempt 7 – Horizontal Bar Chart (Colorized)
Attempt 8 – Four Bar Chart Subplots (sorted by CP)
Attempt 9 – Four Horizontal Bar Chart Subplots (sorted by CP)

This project covered a huge variety of programming and Python-related topics that I had never been exposed to. I learned a lot of new information about the following (in no particular order):

Equations – Calculating CP ¶

The equations used to determine CP are based on the stats of the Pokemon from the original games. Although the equations have changed over time, the following have been used since October 2018 (discovered by redditor u/Pikatrainer):

CP Formula¶

\begin{equation*} CP = BaseAttack \times \sqrt{BaseDef} \times \sqrt{BaseStam} \times \frac{CPMultiplier^2}{10} \end{equation*}
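As a quick sanity check of the formula, here is a minimal sketch that evaluates it for a single Pokemon. The helper name `max_cp` and the multiplier values 0.5974 and 0.7903 (levels 20 and 40) are my assumptions for illustration; the notebook itself reads the real multipliers from the GAME_MASTER. The +15 added to each stat assumes perfect IVs, and the result is floored, matching the calculation used later in the notebook.

```python
import math


def max_cp(base_attack, base_defense, base_stamina, cp_multiplier, iv=15):
    """Evaluate the CP formula for one Pokemon (hypothetical helper)."""
    attack = base_attack + iv
    defense = base_defense + iv
    stamina = base_stamina + iv
    cp = attack * math.sqrt(defense) * math.sqrt(stamina) * cp_multiplier ** 2 / 10
    return math.floor(cp)


# Bulbasaur's base stats: Attack 118, Defense 111, Stamina 128
print(max_cp(118, 111, 128, 0.5974))  # level 20 multiplier (assumed value)
print(max_cp(118, 111, 128, 0.7903))  # level 40 multiplier (assumed value)
```

With these inputs the sketch should reproduce the 637 and 1115 that the notebook reports for Bulbasaur at levels 20 and 40.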

Base Attack Formula¶

Higher and Lower are the Attack and Special Attack values from the original Pokemon games, whichever was higher or lower.

\begin{equation*} BaseAttack = Round(ScaledAttack \times SpeedMod) \end{equation*}

\begin{equation*} ScaledAttack = Round(2 \times (\frac{7}{8}Higher + \frac{1}{8}Lower)) \end{equation*}

Base Defense Formula¶

\begin{equation*} BaseDefense = Round(ScaledDefense \times SpeedMod) \end{equation*}

\begin{equation*} ScaledDefense = Round(2 \times (\frac{5}{8}Higher + \frac{3}{8}Lower)) \end{equation*}

Base Stamina Formula¶

\begin{equation*} BaseStam = Floor(HP \times 1.75 + 50) \end{equation*}

Speed Mod¶

\begin{equation*} SpeedMod = 1 + \frac{Speed - 75}{500} \end{equation*}
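To see how the stat formulas fit together, here is a small sketch that converts main-series stats into Pokemon GO base stats. The function names are mine. I am assuming that Higher and Lower for defense refer to Defense and Special Defense (by analogy with attack), and that Python's built-in `round`, which rounds exact halves to the nearest even number, is an acceptable stand-in for Round. The input is Bulbasaur's main-series stat line (HP 45, Attack 49, Defense 49, Sp. Atk 65, Sp. Def 65, Speed 45).

```python
import math


def speed_mod(speed):
    """SpeedMod = 1 + (Speed - 75) / 500"""
    return 1 + (speed - 75) / 500


def base_attack(attack, sp_attack, speed):
    higher, lower = max(attack, sp_attack), min(attack, sp_attack)
    scaled = round(2 * (7 / 8 * higher + 1 / 8 * lower))
    return round(scaled * speed_mod(speed))


def base_defense(defense, sp_defense, speed):
    # Assumption: Higher/Lower are taken from Defense and Special Defense
    higher, lower = max(defense, sp_defense), min(defense, sp_defense)
    scaled = round(2 * (5 / 8 * higher + 3 / 8 * lower))
    return round(scaled * speed_mod(speed))


def base_stamina(hp):
    return math.floor(hp * 1.75 + 50)


# Bulbasaur: HP 45, Attack 49, Defense 49, Sp. Atk 65, Sp. Def 65, Speed 45
print(base_attack(49, 65, 45), base_defense(49, 65, 45), base_stamina(45))
```

For Bulbasaur this gives 118, 111 and 128, the same baseAttack, baseDefense and baseStamina values that show up in the parsed GAME_MASTER data.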

Mining Data ¶

The Pokemon GO app caches a lot of data on a player’s phone to improve performance. On Android, the cache is called GAME_MASTER and is stored in internal/emulated storage: Android/data/com.nianticlabs.pokemongo/files/remote_config_cache

The GAME_MASTER is in a Google Protocol Buffer format and must be parsed. There are projects like pogo-game-master-decoder on GitHub that can parse the GAME_MASTER.

Better yet, the GAME_MASTER files can be found in .json format in the pokemongo-game-master GitHub project.
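Each entry in the JSON file carries a templateId, and the Pokemon entries look like V0001_POKEMON_BULBASAUR. Here is a minimal sketch of picking those out with a regular expression. The capture groups and the helper name `parse_template_id` are my additions, and the non-Pokemon ID `V0013_MOVE_WRAP` in the list is only an illustrative guess at what other entries look like.

```python
import re

# Four digits after "V", then "_POKEMON_", then the rest of the ID
pattern_pokemon = re.compile(r'^V(\d{4})_POKEMON_(.*)$')


def parse_template_id(template_id):
    """Return (number, name) for Pokemon template IDs, otherwise None."""
    match = pattern_pokemon.match(template_id)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


# Illustrative IDs; only the first is a Pokemon entry
template_ids = ['V0001_POKEMON_BULBASAUR', 'V0013_MOVE_WRAP', 'PLAYER_LEVEL_SETTINGS']

for template_id in template_ids:
    print(template_id, '->', parse_template_id(template_id))
```

Using a raw string (the `r` prefix) keeps `\d` from being treated as a string escape by Python.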

In [1]:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import json
import re
import urllib.request
import math

# If GAME_MASTER  exists, open it, otherwise download most recent GAME_MASTER from Github and save locally.
try:
    with open('GAME_MASTER.json') as json_file:
        data = json.load(json_file)
except FileNotFoundError:  
    # Get the latest Pokemon GO GAME_MASTER in json format from GITHUB
    url = 'https://raw.githubusercontent.com/pokemongo-dev-contrib/pokemongo-game-master/master/versions/latest/GAME_MASTER.json'
    response = urllib.request.urlopen(url)
    raw_data = response.read()
    data = json.loads(raw_data)
    with open('GAME_MASTER.json', 'wb') as file:
        file.write(raw_data)
        

pattern_pokemon = re.compile(r'^V\d{4}_POKEMON_.*$') # Regex to match V####_POKEMON_ (raw string so \d is not treated as a string escape)

pokemon_stats = []
cp_multipliers = []

for template in data['itemTemplates']:
    if pattern_pokemon.match(template['templateId']):
        
        dictRow = {}
        
        # Pokemon Template ID
        dictRow['TemplateId'] = template['templateId']
        
        # Pokemon Number
        pokemon_number = re.findall(r"\d{4}", template['templateId'])
        pokemon_number = int(pokemon_number[0])
        dictRow['Number'] = pokemon_number
        
        # Pokemon Name
        name = template['pokemonSettings']['pokemonId']
        dictRow['Name'] = name
        
        # Pokemon Form
        form = template['pokemonSettings'].get('form', '')
        form = form.replace(name+'_', '').replace('_',' ') # remove name from form
        dictRow['Form'] = form      
        
        # Stats
        dictRow['baseStamina'] = template['pokemonSettings']['stats']['baseStamina']
        dictRow['baseAttack'] = template['pokemonSettings']['stats']['baseAttack']
        dictRow['baseDefense'] = template['pokemonSettings']['stats']['baseDefense']      
        
        # Flee and Capture rate        
        dictRow['baseFleeRate'] = template['pokemonSettings']['encounter'].get('baseFleeRate', 0.0)
        dictRow['baseCaptureRate'] = template['pokemonSettings']['encounter'].get('baseCaptureRate', 0.0)
        
        
        #if form != 'NORMAL': # ignore NORMAL forms because they are duplicates
        pokemon_stats.append(dictRow)
        
    # CP Multiplier
    if template['templateId'] == 'PLAYER_LEVEL_SETTINGS':
        cp_multipliers = template['playerLevel']['cpMultiplier']
        
        
df = pd.DataFrame(pokemon_stats)
print(df.head())
print("Done processing.")

  Form        Name  Number                TemplateId  baseAttack  \
0        BULBASAUR       1   V0001_POKEMON_BULBASAUR         118   
1          IVYSAUR       2     V0002_POKEMON_IVYSAUR         151   
2         VENUSAUR       3    V0003_POKEMON_VENUSAUR         198   
3       CHARMANDER       4  V0004_POKEMON_CHARMANDER         116   
4       CHARMELEON       5  V0005_POKEMON_CHARMELEON         158   

   baseCaptureRate  baseDefense  baseFleeRate  baseStamina  
0             0.20          111          0.10          128  
1             0.10          143          0.07          155  
2             0.05          189          0.05          190  
3             0.20           93          0.10          118  
4             0.10          126          0.07          151  
Done processing.

Processing Data – Applying Equations ¶

Now that we have our data parsed, we want to do some calculations on it. Below we apply the Combat Power equation mentioned above. We also determine which Generation each Pokemon belongs to; this will be helpful later.

In [2]:

def determine_generation (row):
    if row['Number'] <= 151:
        return 1
    if 151 < row['Number'] <= 251:
        return 2
    if 251 < row['Number'] <= 386:
        return 3
    if 386 < row['Number'] <= 493:
        return 4
    if 493 < row['Number'] <=649:
        return 5
    if 649 < row['Number'] <= 721:
        return 6
    if 721< row['Number'] <= 809:
        return 7
    if row['Number'] > 809:
        return 8

    
# Calculate Max CP for each Pokemon
sqrtDef = np.sqrt(df['baseDefense'] + 15) # +15 assumes perfect IV
sqrtStam = np.sqrt(df['baseStamina'] + 15)
cpMult20 = (cp_multipliers[19]**2) / 10
cpMult30 = (cp_multipliers[29]**2) / 10
cpMult40 = (cp_multipliers[39]**2) / 10

df['cp_level_20'] = np.floor((df['baseAttack'] + 15) * sqrtDef * sqrtStam * cpMult20)
df['cp_level_30'] = np.floor((df['baseAttack'] + 15)  * sqrtDef * sqrtStam * cpMult30)
df['cp_level_40'] = np.floor((df['baseAttack'] + 15)  * sqrtDef * sqrtStam * cpMult40)

# Cast CP columns to integers
df['cp_level_20'] = pd.to_numeric(df['cp_level_20'], downcast='integer')
df['cp_level_30'] = pd.to_numeric(df['cp_level_30'], downcast='integer')
df['cp_level_40'] = pd.to_numeric(df['cp_level_40'], downcast='integer')

# Determine generation for each Pokemon
df['Generation'] = df.apply(determine_generation, axis=1)

print(df.head())
print('Done processing data')

  Form        Name  Number                TemplateId  baseAttack  \
0        BULBASAUR       1   V0001_POKEMON_BULBASAUR         118   
1          IVYSAUR       2     V0002_POKEMON_IVYSAUR         151   
2         VENUSAUR       3    V0003_POKEMON_VENUSAUR         198   
3       CHARMANDER       4  V0004_POKEMON_CHARMANDER         116   
4       CHARMELEON       5  V0005_POKEMON_CHARMELEON         158   

   baseCaptureRate  baseDefense  baseFleeRate  baseStamina  cp_level_20  \
0             0.20          111          0.10          128          637   
1             0.10          143          0.07          155          970   
2             0.05          189          0.05          190         1554   
3             0.20           93          0.10          118          560   
4             0.10          126          0.07          151          944   

   cp_level_30  cp_level_40  Generation  
0          955         1115           1  
1         1456         1699           1  
2         2332         2720           1  
3          840          980           1  
4         1417         1653           1  
Done processing data
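The chain of range checks in determine_generation can also be written as a lookup against the last Pokedex number of each generation. This is only a sketch of an alternative, not what the notebook uses; the names `GENERATION_UPPER_BOUNDS` and `generation_of` are mine, and the boundaries are copied from the function above.

```python
from bisect import bisect_left

# Last Pokedex number of generations 1 through 7 (same limits as above)
GENERATION_UPPER_BOUNDS = [151, 251, 386, 493, 649, 721, 809]


def generation_of(number):
    """Map a Pokedex number to its generation (anything above 809 is 8)."""
    return bisect_left(GENERATION_UPPER_BOUNDS, number) + 1


for number in (1, 151, 152, 386, 809, 810):
    print(number, generation_of(number))
```

On the DataFrame this could be applied with something like `df['Number'].map(generation_of)`, which avoids passing whole rows through `apply`.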

Attempt 1 – Scatter Plot (Level vs CP) ¶

This graph is bad because we cannot compare individual Pokemon in any way. The only insight we gain is that Pokemon tend to get stronger as they increase in level. In order to compare the CP of each Pokemon, we need Pokemon on the x-axis (not level).

In [3]:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import random

# Magic command to show plots in Jupyter
%matplotlib inline 

# Plot each pokemon's CP with respect to Level
for row in df.itertuples():

    x = [20,30,40]
    y = [row.cp_level_20, row.cp_level_30, row.cp_level_40]
    
    rng = np.random.RandomState(random.randint(1,101)) # make random state different on each loop
    colors = rng.rand(3) # Generate list of 3 values [r,g,b]
    plt.scatter(x,y, c=colors.reshape(1, -1), alpha=0.7) # single-row 2D array = one RGB color for all three points
    
plt.xlabel('Pokemon Level')
plt.ylabel('Combat Power (CP)')
plt.savefig('Pokemon_Max_CP_Attempt_1.png', format='png', dpi=1000) # Export Figure as high resolution

Attempt 2 – Scatter Plot (Pokemon vs CP) ¶

This attempt is better as we can now compare each Pokemon’s CP. Using a different color for each level allows us to compare the CP increase for each Pokemon as it levels up. Unfortunately, we cannot tell which data points belong to which Pokemon. Also, the points are so clustered that it is difficult to tell which set of points corresponds to each Pokemon.

In [4]:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import string as s
%matplotlib inline


df = df.sort_values('Number')

plt.scatter(df.index.values, df['cp_level_20'], label='Lvl 20 Max CP', color='green',s=1)
plt.scatter(df.index.values, df['cp_level_30'], label='Lvl 30 Max CP', color='orange', s=1)
plt.scatter(df.index.values, df['cp_level_40'], label='Lvl 40 Max CP', color='red', s=1)

plt.legend()
plt.xlabel('Pokemon Id')
plt.ylabel('Combat Power (CP)')
plt.title('Pokemon Max CP at Level 20, 30, 40')
plt.savefig('Pokemon_Max_CP_Attempt_2.png', format='png', dpi=1000) # Export Figure as high resolution

Attempt 3 – Layered Bar Chart ¶

This attempt is much better. The CP values are drawn as bars layered on top of each other instead of as points like in Attempt 2. This makes it easier to keep track of which data corresponds to each Pokemon. While we can now view every Pokemon’s name at the bottom, there are so many that the x-axis labels become too small to read without high resolution images and zooming.

Plot Data¶

Layer bars on top of each other by plotting sequentially. This works because we know that CP always increases with level, so each shorter bar is drawn over the taller one behind it. We could have also created three bars per Pokemon, but this approach is more compact.

In [5]:

%matplotlib inline
#%matplotlib notebook

# Layer bar charts
plt.bar(df.index.values, df['cp_level_40'], label='Lvl 40 Max CP', color='red', width=0.5)
plt.bar(df.index.values, df['cp_level_30'], label='Lvl 30 Max CP', color='orange', width=0.5)
plt.bar(df.index.values, df['cp_level_20'], label='Lvl 20 Max CP', color='green', width=0.5)

# Need to remove duplicates in NumId column by incrementing NumId when a special pokemon exists
plt.xticks(df.index.values, df['Name'], fontsize=1, rotation=90)
ax=plt.gca() # GetCurrentAxis of plot
ax.xaxis.set_tick_params(labelsize=1, width=0.1, pad=0.5) 

plt.legend()
plt.grid(axis='y')
plt.xlabel('Pokemon Id')
plt.ylabel('Combat Power (CP)')
plt.title('Pokemon Max CP')

plt.savefig('Pokemon_Max_CP_Attempt_3.png', format='png', dpi=1000) # Export Figure as high resolution
plt.show()

Attempt 4 – Four Bar Chart Subplots (sliced by generation, sorted by number) ¶

Make one subplot for each generation of Pokemon so that we can read the x-axis labels.

In [6]:

%matplotlib inline
#%matplotlib notebook

plt.clf() # clear plot
#plt.cla() # clear axis

# Create figure to add subplots to
fig, ax = plt.subplots(nrows=4, sharey=True)
fig.set_size_inches(11, 8.5)

for i in range(0,4): # Iterate through each Pokemon generation
    
    gen = df.loc[df['Generation'] == (i+1)]
    
    # Layer bars on top of each other as before
    ax[i].bar(gen.index.values, gen['cp_level_40'], label='Lvl 40 Max CP', color='red')
    ax[i].bar(gen.index.values, gen['cp_level_30'], label='Lvl 30 Max CP', color='orange')
    ax[i].bar(gen.index.values, gen['cp_level_20'], label='Lvl 20 Max CP', color='green')

    # Define x-axis
    ax[i].set_xticks(gen.index.values)
    ax[i].set_xticklabels(gen['Name'] + ' ' + gen['Form'],rotation=90, fontsize=5)   
    ax[i].tick_params(axis='x', which='minor', bottom=False)
    
    # Define y-axis
    ax[i].set_yticks([1000,2000,3000,4000])
    ax[i].grid(axis='y')
     
    ax[i].set_yticks([500,1500,2500,3500,4500], minor=True)
    ax[i].grid(axis='y', which='minor', linestyle='--')


# Set Legend above first axes
#ax[0].legend(loc='upper center', bbox_to_anchor=(0., 1.1, 1., .102),fancybox=True, shadow=True, ncol=4)
handles, labels = ax[0].get_legend_handles_labels()
fig.legend(handles, labels, bbox_to_anchor=(0., 0.5, 1.2, .102), loc='center right', fancybox=True, shadow=True)

# Set X and Y labels on figure
#fig.suptitle('Pokemon Max CP by Level', y = 1.01, fontsize=18)
fig.suptitle('Pokemon Max CP by Level', y=1.015, fontsize=20)
fig.text(-.01, 0.5, 'Combat Power (CP)', ha='center', va='center', rotation='vertical')
fig.text(0.5, 0.04, 'Pokemon Id', ha='center', va='center')

fig.tight_layout() # Stop x-axis tick labels from being cropped

# Export Figure as high resolution
fig.savefig('Pokemon_Max_CP_Attempt_4.png', format='png', dpi=350, bbox_inches='tight') # tight required to prevent cropping

<Figure size 432x288 with 0 Axes>

Attempt 5 – Horizontal Bar Chart (sorted by number) ¶

In order to make the Pokemon names more readable, we can plot them on a horizontal bar chart.

In [7]:

%matplotlib inline

plt.clf() # clear plot
fig, ax = plt.subplots()
fig.set_size_inches(2, 20)

# Layer bar charts
ax.barh(df.index.values, df['cp_level_40'], label='Lvl 40 Max CP', color='red')
ax.barh(df.index.values, df['cp_level_30'], label='Lvl 30 Max CP', color='orange')
ax.barh(df.index.values, df['cp_level_20'], label='Lvl 20 Max CP', color='green')

plt.yticks(df.index.values, df['Name'] + ' ' + df['Form'], fontsize=1)
ax.yaxis.set_tick_params(labelsize=1, width=0.1, pad=0.5) 
#ax.invert_yaxis()

# Define x-axis
ax.set_xticks([1000,2000,3000,4000])
ax.tick_params(axis='x', labelsize = 5)
ax.grid(axis='x')
     
ax.set_xticks([500,1500,2500,3500,4500], minor=True)
ax.grid(axis='x', which='minor', linestyle='--')

plt.legend(prop={'size': 5})
plt.xlabel('Combat Power (CP)')
plt.ylabel('Pokemon Id')
plt.title('Pokemon Max CP')
plt.savefig('Pokemon_Max_CP_Attempt_5.png', format='png', dpi=1000) # Export Figure as high resolution

<Figure size 432x288 with 0 Axes>

Attempt 6 – Horizontal Bar Chart (sorted by CP) ¶

Sort the data by Max CP to find the best Pokemon.

In [8]:

%matplotlib inline

# Sort Data
df = df.sort_values('cp_level_40')
df = df.reset_index(drop=True) # Set index to this newly sorted order

# Plot Data
plt.clf() # clear plot
fig, ax = plt.subplots()
fig.set_size_inches(2, 20)

# Layer bar charts
ax.barh(df.index.values, df['cp_level_40'], label='Lvl 40 Max CP', color='red')
ax.barh(df.index.values, df['cp_level_30'], label='Lvl 30 Max CP', color='orange')
ax.barh(df.index.values, df['cp_level_20'], label='Lvl 20 Max CP', color='green')

plt.yticks(df.index.values, df['Name'] + ' ' + df['Form'], fontsize=1)
ax.yaxis.set_tick_params(labelsize=1, width=0.1, pad=0.2) 
#ax.invert_yaxis()

# Define x-axis
ax.set_xticks([1000,2000,3000,4000])
ax.tick_params(axis='x', labelsize = 5)
ax.grid(axis='x', alpha=0.7)
     
ax.set_xticks([500,1500,2500,3500,4500], minor=True)
ax.grid(axis='x', which='minor', linestyle='--', alpha=0.5)

ax.legend(prop={'size': 5})
plt.xlabel('Combat Power (CP)')
plt.ylabel('Pokemon Id')
plt.title('Pokemon Max CP')
plt.savefig('Pokemon_Max_CP_Attempt_6.png', format='png', dpi=1000) # Export Figure as high resolution

<Figure size 432x288 with 0 Axes>

Attempt 7 – Horizontal Bar Chart (Colorized) ¶

It may be beneficial to change the color of each bar so that they can be distinguished from neighboring bars. I also want there to be a correlation between the bar color and the actual Pokemon. Since each Pokemon has a fairly unique color scheme, we can try to extract their colors to use in our next bar chart.

What color is Bulbasaur? – Getting to know OpenCV color spaces ¶

Below we read in an image file (a PNG sprite of Bulbasaur) using OpenCV and take a look at how OpenCV stores each pixel in a NumPy array. We first view the image in OpenCV’s default BGR (Blue, Green, Red) channel order and then convert it to the RGB (Red, Green, Blue) order that matplotlib expects. Finally, we create 3D plots of the image in both the RGB and HSV (Hue, Saturation, Value) color models.

This tutorial about OpenCV color spaces was helpful.
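The BGR to RGB conversion is nothing more than reversing the channel order of every pixel. Here is a tiny pure-Python sketch of that idea, using nested lists in place of the NumPy array. The helper name `bgr_to_rgb` is mine; on a real NumPy image the equivalent slicing would be `image[..., ::-1]`, and the notebook uses `cv2.cvtColor` instead.

```python
def bgr_to_rgb(image):
    """Reverse the channel order of every pixel in a rows x cols x 3 nested list."""
    return [[pixel[::-1] for pixel in row] for row in image]


# A 1x2 "image" in BGR order: a teal pixel and a pure red pixel
bgr_image = [[[148, 148, 57], [0, 0, 255]]]

print(bgr_to_rgb(bgr_image))  # [[[57, 148, 148], [255, 0, 0]]]
```

The first pixel is the same teal value that the notebook prints for row 45, column 50 of the Bulbasaur sprite, before and after conversion.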

In [9]:

#%matplotlib notebook
%matplotlib inline

import cv2
import matplotlib.pyplot as plt
import numpy as np
from mpl_toolkits.mplot3d import Axes3D
from matplotlib import cm
from matplotlib import colors


bulba = cv2.imread('sprites/1.png') # reads image into numpy matrix in BGR color format
plt.imshow(bulba)
plt.show()
print('Bulbasaur looks sick. He is displayed in the BGR colorspace so Red and Blue are swapped. His red eyes are now blue.')

# Access a pixel's BGR values
print("Pixel at Row 45 x Column 50: ", 
      "[Blue: ", bulba[45,50,0],
      " Green: ", bulba[45,50,1],
     " Red: ", bulba[45,50,2], "]",
      sep='') 

# Convert from BGR to RGB color format
bulba = cv2.cvtColor(bulba, cv2.COLOR_BGR2RGB)
plt.imshow(bulba)
plt.show()
print('Bulbasaur looks healthy now. He is now in RGB colorspace, so his eyes are now properly red.')


# Access a pixel's RGB values
print("Pixel at Row 45 x Column 50: ", 
      "[Red: ", bulba[45,50,0],
      " Green: ", bulba[45,50,1],
     " Blue: ", bulba[45,50,2], "]",
      sep='') 

r, g, b = cv2.split(bulba)
fig = plt.figure()
axis = fig.add_subplot(1, 1, 1, projection="3d")

# Normalize color ranges from 0-255 to 0-1
pixel_colors = bulba.reshape((np.shape(bulba)[0]*np.shape(bulba)[1], 3))
norm = colors.Normalize(vmin=0,vmax=255)
pixel_colors = norm(pixel_colors).tolist()

axis.scatter(r.flatten(), g.flatten(), b.flatten(), facecolors=pixel_colors, marker=".")
axis.set_xlabel("Red")
axis.set_ylabel("Green")
axis.set_zlabel("Blue")
plt.show()

# 3D Plot HSV (Hue, Saturation, Value)
hsv_bulba = cv2.cvtColor(bulba, cv2.COLOR_RGB2HSV)
h, s, v = cv2.split(hsv_bulba) # split the HSV image (not the RGB one) into its channels
fig2 = plt.figure()
axis2 = fig2.add_subplot(1, 1, 1, projection="3d")
pixel_hsv = hsv_bulba.reshape((np.shape(hsv_bulba)[0]*np.shape(hsv_bulba)[1], 3))
norm.autoscale(pixel_hsv)
pixel_hsv = norm(pixel_hsv).tolist()
axis2.scatter(h.flatten(), s.flatten(), v.flatten(), facecolors=pixel_hsv, marker=".")
axis2.set_xlabel("Hue")
axis2.set_ylabel("Saturation")
axis2.set_zlabel("Value")

plt.show()

Bulbasaur looks sick. He is displayed in the BGR colorspace so Red and Blue are swapped. His red eyes are now blue.
Pixel at Row 45 x Column 50: [Blue: 148 Green: 148 Red: 57]

Bulbasaur looks healthy now. He is now in RGB colorspace, so his eyes are now properly red.
Pixel at Row 45 x Column 50: [Red: 57 Green: 148 Blue: 148]

Finding dominant colors in pokemon sprites (k-means and frequency count) ¶

Use OpenCV to find dominant color in sprite.

Tutorial used for k-means color clustering: https://www.pyimagesearch.com/2014/05/26/opencv-python-k-means-color-clustering/

Sprites for each Pokemon are downloaded from https://veekun.com/dex/downloads.

Official sprites were missing from the set above for Meltan and Melmetal, but fan sprites were available on DeviantArt:

Meltan (808): https://www.deviantart.com/ezerart/art/Pokemon-Meltan-Sprites-764955591
Melmetal (809): https://www.deviantart.com/elazulmax/art/Melmetal-Shiny-769688753
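The frequency-count approach boils down to counting how often each exact RGB value occurs and keeping the most common ones. Here is a stdlib-only sketch of that idea; the notebook does the same job with `np.unique` on the image array. The helper name `dominant_colors` is mine, and the pixel list is made-up toy data whose counts are invented (only the color values are borrowed from the Bulbasaur results).

```python
from collections import Counter


def dominant_colors(pixels, n=3, remove_black=True):
    """Return the n most frequent RGB colors as hex strings."""
    if remove_black:
        # Drop pixels where every channel is zero (pure black)
        pixels = [p for p in pixels if any(p)]
    counts = Counter(map(tuple, pixels))
    return ['#{:02x}{:02x}{:02x}'.format(*rgb) for rgb, _ in counts.most_common(n)]


# Toy data: 5 black pixels, then three colors with decreasing frequency
pixels = [(0, 0, 0)] * 5 + [(57, 148, 148)] * 4 + [(98, 213, 180)] * 2 + [(16, 16, 16)]

print(dominant_colors(pixels))  # ['#399494', '#62d5b4', '#101010']
```

Unlike k-means, this only counts exact matches, which works for sprites because they use a small fixed palette.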

In [10]:

#%matplotlib notebook
%matplotlib inline

# import the necessary packages
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
import argparse
import cv2
import numpy as np
from matplotlib import colors


def centroid_histogram(clt):
    # grab the number of different clusters and create a histogram
    # based on the number of pixels assigned to each cluster
    numLabels = np.arange(0, len(np.unique(clt.labels_)) + 1)                          
    (hist, _) = np.histogram(clt.labels_, bins = numLabels)
     
    # normalize the histogram, such that it sums to one
    hist = hist.astype("float")
    hist /= hist.sum()
 
    # return the histogram; entries stay in cluster-index order so that
    # each percentage lines up with its centroid when zipped in plot_colors
    return hist


def plot_colors(hist, centroids):
    # initialize the bar chart representing the relative frequency
    # of each of the colors
    bar = np.zeros((50, 300, 3), dtype = "uint8")
    startX = 0
    
    # loop over the percentage of each cluster and the color of
    # each cluster
    for (percent, color) in zip(hist, centroids):
        # plot the relative percentage of each cluster
        endX = startX + (percent * 300)
        cv2.rectangle(bar, (int(startX), 0), (int(endX), 50),
        color.astype("uint8").tolist(), -1)
        startX = endX
    
    # return the bar chart
    return bar

def crop_image(img,tol=0):
    # img is image data
    # tol  is tolerance
    mask = img>tol
    return img[np.ix_(mask.any(1),mask.any(0))]


def kmeans_colors(image, clusters, plot=True):
    '''
    Function to get the dominant colors of an image based on Kmeans Algorithm
    
    Parameters:
        image: OpenCV (cv2) Image object in RGB colorspace.
        clusters: Number of clusters (how many colors to return)
        plot: Toggle on or off to plot resulting colors
        
    Returns: KMeans object with attributes:
        cluster_centers_: A list of RGB colors rounded to nearest int
        labels_: A list of pixels and their corresponding color index in clusters list.      
    ''' 

    # reshape the image to be a list of pixels
    image = image.reshape((image.shape[0] * image.shape[1], 3))

    # crop_image on the flattened pixel list keeps only rows with a nonzero channel (effectively removes black pixels)
    image = crop_image(image)

    # cluster the pixel intensities
    clt = KMeans(n_clusters = clusters)
    clt.fit(image)
    roundClusters = np.around(clt.cluster_centers_).astype(int) # Round and convert RGB values to int    
    clt.cluster_centers_ = roundClusters
    
    
    if plot is True:
        # Build a histogram of clusters and then create a figure
        # representing the number of pixels labeled to each color.
        hist = centroid_histogram(clt)
        bar = plot_colors(hist, clt.cluster_centers_)
    
        # show our kmeans bar
        plt.figure()
        plt.title('Kmeans dominant colors')
        plt.axis("off")
        plt.imshow(bar)
      
    return clt


def count_colors(image, remove_black=False, plot=True): 
    # Convert image matrix to rgb array
    rgbArray = image.reshape((image.shape[0] * image.shape[1], 3))
    
    if remove_black:
        # select all rows that are not entirely zero
        rgbArray=rgbArray[np.any(rgbArray != 0, axis=1)] # np.any keeps pixels with at least one nonzero channel
    
    rgbArray, counts = np.unique(rgbArray, return_counts=True, axis=0)
    #print('colors: \n', colors)
    #print('count: \n', counts)
    
    # sort colors by their frequency, highest to lowest
    count_sort_indexes = np.argsort(-counts) # - means to sort highest to lowest

    rgbArray = rgbArray[count_sort_indexes]
    counts = counts[count_sort_indexes]

    # must normalize colors between 0 and 1 so that we can use set_color
    norm = colors.Normalize(vmin=0,vmax=255)
    rgbNorm = norm(rgbArray).tolist()
    
    if plot is True:
        num_bins = len(rgbArray)
        index = list(range(0, num_bins))   

        plt.figure()
        barchart = plt.bar(index, counts)
        ax0 = plt.gca()
        ax0.xaxis.set_visible(False)
        ax0.set_ylabel('Pixel count')

        for j in range(num_bins): # iterate through bars and change each color
            barchart[j].set_color(rgbNorm[j])

        plt.show()
    
    return [rgbArray,rgbNorm, counts]


df = df.sort_values('Number')
df = df.reset_index(drop=True) # Set index to this newly sorted order
df['dominant_color_1'] = ''
df['dominant_color_2'] = ''
df['dominant_color_3'] = ''

for i in range(0,len(df)): # Process all Pokemon
    #print('Pokemon Name: ', str(df.iloc[i,df.columns.get_loc('Pokemon')]))
    #print('Pokemon Id: ', str(df.iloc[i,df.columns.get_loc('NumId')]))
    
    # Load the image and convert it from BGR to RGB so that
    # we can display it properly with matplotlib.
    filepath = 'sprites/' + str(df.iloc[i,df.columns.get_loc('Number')]) + '.png' # filepath to image   
    image = cv2.imread(filepath, cv2.IMREAD_COLOR)
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  
    
    # Show our image
    #plt.figure()
    #plt.axis("off")
    #plt.imshow(image)  
    #plt.show()  
    
    # Get dominant colors with k-means algorithm and plot.
    kmeans_colors(image, 3, plot=False)

    # Get dominant colors by counting frequency of each color
    rgbArray, rgbNormalized, counts = count_colors(image, remove_black=True, plot=False)
    
    # save top 3 dominant colors as hex values
    df.at[i, 'dominant_color_1'] = colors.to_hex(rgbNormalized[0])
    df.at[i, 'dominant_color_2'] = colors.to_hex(rgbNormalized[1])
    df.at[i, 'dominant_color_3'] = colors.to_hex(rgbNormalized[2])

print(df.head())    
print('Done Processing')

  Form        Name  Number                TemplateId  baseAttack  \
0        BULBASAUR       1   V0001_POKEMON_BULBASAUR         118   
1          IVYSAUR       2     V0002_POKEMON_IVYSAUR         151   
2         VENUSAUR       3    V0003_POKEMON_VENUSAUR         198   
3       CHARMANDER       4  V0004_POKEMON_CHARMANDER         116   
4       CHARMELEON       5  V0005_POKEMON_CHARMELEON         158   

   baseCaptureRate  baseDefense  baseFleeRate  baseStamina  cp_level_20  \
0             0.20          111          0.10          128          637   
1             0.10          143          0.07          155          970   
2             0.05          189          0.05          190         1554   
3             0.20           93          0.10          118          560   
4             0.10          126          0.07          151          944   

   cp_level_30  cp_level_40  Generation dominant_color_1 dominant_color_2  \
0          955         1115           1          #399494          #62d5b4   
1         1456         1699           1          #209483          #5acdbd   
2         2332         2720           1          #20b49c          #105241   
3          840          980           1          #ff9441          #de5239   
4         1417         1653           1          #cd3939          #ff524a   

  dominant_color_3  
0          #101010  
1          #317b52  
2          #ff7b73  
3          #101010  
4          #101010  
Done Processing

Plotting Attempt 7 – Horizontal Bar Chart (Colorized) ¶

We can now see which Pokemon are the best; however, it is difficult to discern which bar corresponds to which Pokemon. Attempt 7 will use each Pokemon’s dominant colors as the colors of its bars. A good example of this can be seen on http://pokepalettes.com/ (source code).

In [11]:

import matplotlib.pyplot as plt

%matplotlib inline

plt.clf() # clear plot
fig, ax = plt.subplots()
fig.set_size_inches(2, 20)

# Sort by highest cp first
df = df.sort_values('cp_level_40')
df = df.reset_index(drop=True) # Set index to this newly sorted order

# Layer bar charts
norm = colors.Normalize(vmin = 0, vmax=255) # normalize colors from 0-255 to 0.0 to 1.0
plt40 = ax.barh(df.index.values, df['cp_level_40'], label='Lvl 40 Max CP', color=df['dominant_color_3'])
plt30 = ax.barh(df.index.values, df['cp_level_30'], label='Lvl 30 Max CP', color=df['dominant_color_2'])
plt20 = ax.barh(df.index.values, df['cp_level_20'], label='Lvl 20 Max CP', color=df['dominant_color_1'])


plt.yticks(df.index.values, df['Name'] + ' ' + df['Form'], fontsize=1)
ax.yaxis.set_tick_params(labelsize=1, width=0.1, pad=0.2) 

# Define x-axis
ax.set_xticks([1000,2000,3000,4000])
ax.tick_params(axis='x', labelsize = 5)
ax.grid(axis='x', alpha=0.7)
     
ax.set_xticks([500,1500,2500,3500,4500], minor=True)
ax.grid(axis='x', which='minor', linestyle='--', alpha=0.5)

# ax.legend(prop={'size': 5})
plt.xlabel('Combat Power (CP)')
plt.ylabel('Pokemon Id')
plt.title('Pokemon Max CP')
plt.savefig('Pokemon_Max_CP_Attempt_7.png', format='png', dpi=1000) # Export Figure as high resolution

<Figure size 432x288 with 0 Axes>

Attempt 8 – Four Bar Chart Subplots (sorted by CP) ¶

The colors from Attempt 7 are pretty, but a little overwhelming for the eyes. Also, Attempt 7 does not really fit in a reasonable area. In Attempt 8, we sort Pokemon by CP and make four subplots so that we can view them all in a reasonable amount of space.
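Splitting the sorted rows into four subplots comes down to computing start and stop positions for each slice. Here is a sketch of that boundary arithmetic on its own, using the same ceil-based rule as the plotting code; the helper name `slice_bounds` is mine.

```python
import math


def slice_bounds(n_rows, n_slices=4):
    """Return (start, stop) row positions that split n_rows into n_slices parts."""
    size = n_rows / n_slices
    return [(math.ceil(size * i), math.ceil(size + size * i)) for i in range(n_slices)]


print(slice_bounds(10))  # [(0, 3), (3, 5), (5, 8), (8, 10)]
```

When the row count is not divisible by four, the slices differ in length by at most one row, and each slice starts exactly where the previous one stopped, so no Pokemon is dropped or plotted twice.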

In [12]:

import math

%matplotlib inline
#%matplotlib notebook


# Get ranges of CP depending on iteration of loop
# (between() includes both endpoints by default)
def get_range(i): 
    return {
    0: df[df['cp_level_40'].between(0,999)],
    1: df[df['cp_level_40'].between(1000,1999)], 
    2: df[df['cp_level_40'].between(2000,2999)],     
    3: df[df['cp_level_40'].between(3000,9999)]
    }[i]


plt.clf() # clear plot
#plt.cla() # clear axis

# Create figure to add subplots to
fig, ax = plt.subplots(nrows=4, sharey=True)
fig.set_size_inches(11, 8.5)

# Sort by highest cp first
df = df.sort_values(['cp_level_40', 'Name'])
df = df.reset_index(drop=True) # Set index to this newly sorted order

for i in range(0,4): # make 4 subplots, increasing in CP on each
    
    # Divide data into 4 equal slices
    start = int(math.ceil(0 + (df.shape[0]/4) * i))
    stop = int(math.ceil((df.shape[0]/4) + (df.shape[0]/4) * i))
    
    gen = df[start:stop]
    
    # Layer bars on top of each other as before
    ax[i].bar(gen.index.values, gen['cp_level_40'], label='Lvl 40 Max CP', color='red')
    ax[i].bar(gen.index.values, gen['cp_level_30'], label='Lvl 30 Max CP', color='orange')
    ax[i].bar(gen.index.values, gen['cp_level_20'], label='Lvl 20 Max CP', color='green')

    # Define x-axis
    ax[i].set_xticks(gen.index.values)
    ax[i].set_xticklabels(gen['Name'] + ' ' + gen['Form'],rotation=90, fontsize=5)   
    ax[i].tick_params(axis='x', which='minor', bottom=False)
    
    # Define y-axis
    ax[i].set_yticks([1000,2000,3000,4000])
    ax[i].grid(axis='y')
     
    ax[i].set_yticks([500,1500,2500,3500,4500], minor=True)
    ax[i].grid(axis='y', which='minor', linestyle='--')


# Set Legend above first axes
#ax[0].legend(loc='upper center', bbox_to_anchor=(0., 1.1, 1., .102),fancybox=True, shadow=True, ncol=4)
handles, labels = ax[0].get_legend_handles_labels()
fig.legend(handles, labels, bbox_to_anchor=(0., 0.5, 1.2, .102), loc='center right', fancybox=True, shadow=True)

# Set X and Y labels on figure
#fig.suptitle('Pokemon Max CP by Level', y = 1.01, fontsize=18)
fig.suptitle('Pokemon Max CP by Level', y=1.015, fontsize=20)
fig.text(-.01, 0.5, 'Combat Power (CP)', ha='center', va='center', rotation='vertical')
fig.text(0.5, 0.04, 'Pokemon Id', ha='center', va='center')

fig.tight_layout() # Stop x-axis tick labels from being cropped

# Export Figure as high resolution
fig.savefig('Pokemon_Max_CP_Attempt_8.png', format='png', dpi=350, bbox_inches='tight') # tight required to prevent cropping

print("Done plotting")

Done plotting

<Figure size 432x288 with 0 Axes>

Attempt 9 – Four Horizontal Bar Chart Subplots (sorted by CP) ¶

Attempt 8 was okay, but it is difficult to read the Pokemon names when they are on the x-axis. Attempt 9 sorts Pokemon by CP and makes four horizontal bar chart subplots so that we can view them all in a reasonable amount of space. I think this final representation is the best.

In [13]:

import math

%matplotlib inline
#%matplotlib notebook


plt.clf() # clear plot
#plt.cla() # clear axis

# Create figure to add subplots to
fig, ax = plt.subplots(ncols=4, sharey=False)
fig.set_size_inches(11, 8.5)

# Sort by highest cp first
df = df.sort_values(['cp_level_40', 'Name'])
df = df.reset_index(drop=True) # Set index to this newly sorted order

for i in range(0,4): # make 4 subplots, increasing in CP on each
    
    # Divide data into 4 equal slices
    start = int(math.ceil(0 + (df.shape[0]/4) * i))
    stop = int(math.ceil((df.shape[0]/4) + (df.shape[0]/4) * i))   
    gen = df[start:stop]
    
    # Layer bars on top of each other as before
    ax[i].barh(gen.index.values, gen['cp_level_40'], label='Lvl 40 Max CP', color='red')
    ax[i].barh(gen.index.values, gen['cp_level_30'], label='Lvl 30 Max CP', color='orange')
    ax[i].barh(gen.index.values, gen['cp_level_20'], label='Lvl 20 Max CP', color='green')

    # Define y-axis
    ax[i].set_yticks(gen.index.values)
    ax[i].set_yticklabels(gen['Name'] + ' ' + gen['Form'], fontsize=3)   
    ax[i].tick_params(axis='y', which='minor', left=False)
    
    # Define x-axis
    ax[i].set_xticks([1000,2000,3000,4000])
    ax[i].grid(axis='x')
     
    ax[i].set_xticks([500,1500,2500,3500,4500], minor=True)
    ax[i].grid(axis='x', which='minor', linestyle='--')


# Set Legend above first axes
#ax[0].legend(loc='upper center', bbox_to_anchor=(0., 1.1, 1., .102),fancybox=True, shadow=True, ncol=4)
handles, labels = ax[0].get_legend_handles_labels()
fig.legend(handles, labels, bbox_to_anchor=(0., 0.5, 1.155, .102), loc='center right', fancybox=True, shadow=True)

# Set X and Y labels on figure
#fig.suptitle('Pokemon Max CP by Level', y = 1.01, fontsize=18)
fig.suptitle('Pokemon Max CP by Level', y=1.015, fontsize=20)
fig.text(-.01, 0.5, 'Pokemon Id', ha='center', va='center', rotation='vertical')
fig.text(0.5, -0.01, 'Combat Power (CP)', ha='center', va='center')

fig.tight_layout() # Stop x-axis tick labels from being cropped

# Export Figure as high resolution
fig.savefig('Pokemon_Max_CP_Attempt_9'+'.png', format='png', dpi=350, bbox_inches='tight') # tight required to prevent cropping

print("Done plotting")

Done plotting

<Figure size 432x288 with 0 Axes>
