Human Diffusion

You can find the repository here

A Q-Learning Simulation of the Human Migration from Africa

We start by importing the proper modules (equivalent to libraries in R).

These are

  • NumPy
  • MatPlotLib
  • Pandas
  • PIL (Pillow), an image handler
  • tqdm (pronounced ta-qa-dum), from the Arabic taqadum (تقدّم) meaning progress, a simple progress bar that lets us estimate the time each task will take
#pip install -r requirements.txt
from earth import Earth
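
For reference, the requirements.txt used above presumably lists these same packages; a minimal sketch of such a file (the exact contents and any version pins are an assumption) would be

numpy
matplotlib
pandas
Pillow
tqdm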

Generating the Map

We initialise the picture that we want to use and convert it into pixel values, so that we end up with a pure black-and-white image of the Earth.

stage = Earth()

The black_and_white method loops over each individual pixel and converts it to black or white. The threshold was found by running the loop many times and picking a value that looked good enough.

stage.black_and_white('earth.jpg', 'newPixels.csv', 'pure-bw-earth.jpg')
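
For context, a thresholding step like this could be written with PIL and NumPy roughly as sketched below; the actual implementation inside Earth.black_and_white may differ, and the threshold value of 100 is only an assumed example.

from PIL import Image
import numpy as np

# Load the source image and flatten it to grayscale
img = Image.open('earth.jpg').convert('L')
pixels = np.array(img)

# Pixels above the threshold become white (255), the rest black (0);
# the threshold itself is tuned by eye, as described above
threshold = 100  # assumed value
bw = np.where(pixels > threshold, 255, 0).astype(np.uint8)

# Save the pixel values and the resulting pure black-and-white image
np.savetxt('newPixels.csv', bw, fmt='%d', delimiter=',')
Image.fromarray(bw).save('pure-bw-earth.jpg')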

We then generate the new picture and save it before we convert it into an array.

stage.generate_image('pure-bw-earth.jpg')

We are now ready to create the map we will need.

stage.plot('map.jpg')

Now that we have our map ready, we can move on to the implementation of the algorithm.

Application of the Q-Learning Algorithm

We import the necessary libraries

import numpy as np
import matplotlib.pyplot as plt
from tqdm import tqdm

np.random.seed(1)

and define the actions that the agent is able to perform

actions = ['west', 'east', 'north', 'south']
# encoded as 0, 1, 2, 3

Then we can generate the Q-table, which stores the estimated value of each action at every pixel; the rewards themselves are defined later in a separate reward map.

q_values = np.random.uniform(-1, 1, size=(stage.height,stage.width, len(actions)))
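
Every pixel of the map now carries one value per action, initialised to small random numbers. A quick way to check the shape and read a single entry:

print(q_values.shape)        # (stage.height, stage.width, 4)
print(q_values[400, 230])    # the four action values for the pixel used as the starting area below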

Next, we define the functions that we will use: one to generate our starting position, one for the agent to pick an action either randomly or by consulting the Q-table, and one to work out the result of the action taken.

def starting_area(height_range, width_range):
    # Pick a random starting pixel inside the given height and width ranges
    height = np.random.randint(height_range[0], height_range[1])
    width = np.random.randint(width_range[0], width_range[1])
    return height, width


def next_action(current_height, current_width, epsilon):
    # With probability epsilon exploit the Q-table, otherwise pick a random move
    if np.random.random() < epsilon:
        move = np.argmax(q_values[current_height, current_width])
    else:
        move = np.random.randint(4)
    return move


def next_location(height, width, action):
    # Move one pixel in the chosen direction while staying inside the map bounds
    new_width = width
    new_height = height

    if actions[action] == 'west' and width > 0:
        new_width = width - 1

    if actions[action] == 'east' and width < stage.width - 1:
        new_width = width + 1

    if actions[action] == 'north' and height > 0:
        new_height = height - 1

    if actions[action] == 'south' and height < stage.height - 1:
        new_height = height + 1

    return new_height, new_width
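
Put together, a single simulated step using these helpers looks roughly like this (the window around row 400, column 230 matches the starting area used in the training loop below):

height, width = starting_area([395, 405], [225, 235])
action = next_action(height, width, epsilon=0.7)
height, width = next_location(height, width, action)
print(actions[action], height, width)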
            

Now we are ready to run the algorithm for the number of episodes we need.
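
At every step the agent updates the Q-value of the state-action pair it has just left with the standard Q-learning rule, which is exactly what the temporal_difference lines inside the loop compute:

Q(s, a) ← Q(s, a) + α * (r + γ * max over a' of Q(s', a') - Q(s, a))

Here α is the learning_rate (1 in this run), γ is the discount_factor (0.3), r is the reward found at the new pixel, and the max runs over the four possible moves from the new position.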

reward_map = np.zeros(shape=(stage.height, stage.width))
reward_map[np.where(stage.map > 0)] = -10   # penalty on the pixels flagged in the map
reward_map[:10, :] = -15                    # heavier penalty along the map borders
reward_map[610:, :] = -15
reward_map[:, 720:] = -15
reward_map[:, :10] = -15

#Arabian bridge
    # Gulf of Aden
reward_map[350:388, 250:282] = 0
    # Hormuz
reward_map[300:340, 290:315] = 0

# Indonesian bridge
    # Sumatra
reward_map[417:433, 485:495] = 0
    # Java
reward_map[450:455, 495:505] = 0
    # Brunei
reward_map[430:465, 525:530] = 0
    # New Guinea
reward_map[460:465, 525:645] = 0
    # Australia
reward_map[460:505, 525:605] = 0



# Bering Strait
reward_map[30:60, 580:610] = 50
# Australia
reward_map[510:540, 580:610] = 50

real_map = np.ones(shape=(stage.height,stage.width))*10
real_map[np.where(stage.map > 0)] = -10

timeline = np.arange(0, 5000)
episodes = np.arange(0, 200000)

reward_per_episode = np.zeros(len(episodes))
lifetime = np.zeros(len(episodes))
ims = []
for episode in tqdm(episodes):

    epsilon = 0.7           # probability of exploiting the Q-table rather than exploring
    discount_factor = 0.3   # gamma in the Q-learning update
    learning_rate = 1       # alpha in the Q-learning update

    rewards = np.zeros(len(timeline))
    
    if episode >= 195000:
        # From episode 195000 onwards we destabilise the system to get more natural motion
        # India
        reward_map[390, 388] = 20
        # New Guinea Papua
        reward_map[455, 650] = 20
        # Brunei
        reward_map[425, 540] = 20
        # Australia
        reward_map[510:540, 580:610] = 50


    old_height, old_width = 400, 230
    height, width = starting_area([old_height-5, old_height+5], [old_width-5, old_width+5])

    for year in timeline:
        try:

            action = next_action(height, width, epsilon)
            old_height, old_width = height, width
            height, width = next_location(height, width, action)

            reward = reward_map[height, width]
            rewards[year] = reward

            old_q_value = q_values[old_height, old_width, action]
            temporal_difference = reward + (discount_factor*np.max(q_values[height, width])) - old_q_value

            new_q_value = old_q_value + (learning_rate * temporal_difference)
            q_values[old_height, old_width, action] = new_q_value                

            if reward_map[old_height, old_width] > 0:
                # Consume the reward so each cell only pays out once
                reward_map[old_height, old_width] = 0

            # Mark the visited pixel on the map used for the final plot
            real_map[old_height, old_width] = 5

        except IndexError as e:
            break

        if year == timeline[-1]:
            lifetime[episode] = year

        if reward_map[old_height, old_width] <= -10 and reward_map[height, width] <= -10:
            lifetime[episode] = year
            break            

    reward_per_episode[episode] = np.mean(rewards)
    if (reward_map[510:540, 580:610] == 0).all():
        # Australia: restore the reward once the whole target region has been consumed
        reward_map[510:540, 580:610] = 50

    if (reward_map[30:60, 580:610] == 0).all():
        # Bering Strait: restore the reward once the whole target region has been consumed
        reward_map[30:60, 580:610] = 50


        

plt.figure(figsize = (10,10))
plt.ylabel('Latitude')
plt.xlabel('Longitude')
plt.xticks([])
plt.yticks([])
plt.imshow(real_map, cmap = 'ocean_r')
plt.show()
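
If you also want to keep the resulting figure on disk, matplotlib's savefig can be called just before plt.show(); the filename and options below are only an example.

plt.savefig('human-diffusion-map.png', dpi=300, bbox_inches='tight')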