Optimization In Data Science Using Multiprocessing and Multithreading

[ad_1]

W3Schools


In the actual world, the scale of datasets is big which comes as a problem for each data science programmer. Working on it takes lots of time, so there’s a want for a method that may improve the algorithm’s velocity. Most of us are conversant in the time period parallelization that enables for the distribution of labor throughout all accessible CPU cores. Python presents two built-in libraries for optimization of this course of, multiprocessing and multithreading

Multi-Processing: Multiprocessing refers back to the capability of a system to assist a couple of processor on the similar time. It works in parallel and doesn’t share reminiscence sources.

Threading: Threads are parts of a course of, which may run sequentially. Memory is shared between the CPU core.



In this text, we’ll focus on how a lot time it takes to unravel an issue utilizing a conventional method. Further, we’ll analysis parallelization strategies like multiprocessing and multithreading that may cut back the coaching time of huge dataset for a data science drawback. While coping with bigger implementations of machine learning, the time complexity is a serious concern. Through this text, we’ll learn to deal with the priority of time complexity. 

Practical Implementation

Normal Approach

We have to verify the time taken by the python program after we go by regular method.

import time
def even(n):
        if (n % 2 == 0) : 
          print('The Number '+str(n)+" is "+"Even Number")
    
        else:
          print('The Number '+str(n)+" is "+"Odd Number")
        
starttime = time.time()
for i in vary(1,10):
    time.sleep(2)
    (i, even(i))
print()    
print('Time taken = {} seconds'.format(time.time() - starttime))

Using Multiprocessing

import time
import multiprocessing
def even(n):
  if (n % 2 == 0) : 
    print('The Number '+str(n)+" is "+"Even Number")
  else:
    print('The Number '+str(n)+" is "+"Odd Number")  
def multiprocessing_func(x):
    time.sleep(2)
    (x, even(x))
if __name__ == '__main__':
    starttime = time.time()
    processes = []
    for i in vary(1,10):
        p = multiprocessing.Process(goal=multiprocessing_func, args=(i,))
        processes.append(p)
        p.begin()
    for course of in processes:
        course of.be a part of()
    print()    
    print('Time taken = {} seconds'.format(time.time() - starttime))

From the above end result, we will see, the time taken for the computation has been diminished drastically from 18.01 sec to 2.07 sec utilizing the Process class.

See Also


Using Pool Class

import time
import multiprocessing
def even(n):
  if (n % 2 == 0) : 
    print('The Number '+str(n)+" is "+"Even Number")
  else:
    print('The Number '+str(n)+" is "+"Odd Number")
def multiprocessing_func(x):
    time.sleep(2)
    (x, even(x))
if __name__ == '__main__':
    starttime = time.time()
    pool = multiprocessing.Pool()
    pool.map(multiprocessing_func, vary(1,10))
    pool.shut()
    print()
    print('Time taken = {} seconds'.format(time.time() - starttime))

The time taken for the computation has been diminished from 18.01 sec to 10.04  sec utilizing the Pool class. This technique is slower than the above one, so it’s higher to go for a course of class approach for optimization of algorithm’s velocity.

Threading

import threading 
def even(num): 
    if (num % 2 == 0) : 
      print('The Number '+str(num)+" is "+"Even Number")
    else:
      print('The Number '+str(num)+" is "+"Odd Number")
def print_square(num): 
    """ 
    operate to print sq. of given num 
    """
    print("Square: {}".format(num * num)) 
if __name__ == "__main__": 
    starttime = time.time()
    # creating thread 
    t1 = threading.Thread(goal=print_square, args=(10,))
    # beginning thread 1 
    t1.begin() 
    for i in vary(1,10):
      t2 = threading.Thread(goal=even, args=(i,)) 
      # beginning thread 2 
      t2.begin() 
    # wait till thread 1 is totally executed 
    t1.be a part of() 
    # wait till thread 2 is totally executed 
    t2.be a part of() 
    print('Time taken = {} seconds'.format(time.time() - starttime))

Even with the elevated variety of duties threading computation velocity may be very quick.The time taken for computation is diminished from 18.01 to 0.013.

Conclusion

So, we will conclude that multiprocessing and threading have nice computational velocity. As the development of accelerating parallelism will proceed to rise in future, these strategies will develop into extra and extra vital in offering options to a data science drawback in a a lot lesser time.The full code of the above implementation is offered on the AIM’s GitHub repositories. Please go to this hyperlink to search out the code.


If you liked this story, do be a part of our Telegram Community.


Also, you’ll be able to write for us and be one of many 500+ consultants who’ve contributed tales at AIM. Share your nominations right here.

Ankit Das

Ankit Das

A knowledge analyst with experience in statistical evaluation, knowledge visualization able to serve the business utilizing varied analytical platforms. I sit up for having in-depth data of machine learning and data science. Outside work, yow will discover me as a fun-loving particular person with hobbies resembling sports activities and music.



[ad_2]

Source hyperlink

Write a comment