Concurrency in Python

Recently, I was required to process bunch of huge CSV and perform the output. I wrote a simple python program but it was dreadfully slow.

I thought why not make it process through multiple threads. Python is notorious for not having a good support for concurrency. Some of it is because of its Global Interpreter lock

Recent versions of python > 3.x , do have multiprocessing and mulithreading modules built in.

Honestly, I find the whole world of multiprocessing / multithreading very confusing in python world.

Below is a sample program, which reads the CSV line by line and submits it to a pool of 50 workers, running concurrently.

from multiprocessing import Pool
from random import randint
from time import sleep
import csv
import requests
import json

def orders_v4(order_number):
    response = requests.request("GET", url, headers=headers, params=querystring, verify=False)
    return response.json()

newcsvFile=open('gom_acr_status.csv', 'w')
writer = csv.writer(newcsvFile)

def process_line(row):
    ol_key = row['ORDER_LINE_KEY']
    orders_json = orders_v4(order_number)
    oms_order_key = orders_json['oms_order_key']

    order_lines = orders_json["order_lines"]
    for order_line in order_lines:
        if ol_key==order_line['order_line_key']:
            ftype = order_line['fulfillment_spec']['fulfillment_type']
            status_desc = order_line['statuses'][0]['status_description']
            listrow = [ol_key, order_number, ftype, status_desc]            

def get_next_line():
    with open("gom_acr.csv", 'r') as csvfile:
        reader = csv.DictReader(csvfile)
        for row in reader:
            yield row

f = get_next_line()
t = Pool(processes=50)

for i in f:
    results = t.map_async(process_line, (i,))