Programmer’s Python Async – Process-Based Parallelism


The asyncio module may be the hottest topic in town, but it isn’t the only asynchronous feature in Python worth knowing about. Find out about process-based parallelism in this extract from my new book Programmer’s Python: Async. It has the advantage of no GIL problems.


Programmer’s Python:
Async
Threads, processes, asyncio & more

Is now available as a print book: Amazon

Contents

1) A Lightning Tour of Python

Python’s Origins, Basic Python, Data Structures, Control Structures – Loops, Space Matters, Conditionals and Indenting, Pattern Matching, Everything Is An Object – References, Functions, Objects and Classes, Inheritance, Main and Modules, IDEs for Python, Pythonic – The Meta Philosophy, Where Next, Summary.

2) Asynchronous Explained

A Single Thread, Processes, I/O-Bound and CPU-Bound, Threads, Locking, Deadlock, Processes with Multiple Threads, Single-Threaded Async, Events, Events or Threads, Callback Hell, More Than One CPU – Concurrency, Summary.

3) Process-Based Parallelism

        Extract 1 – Process-Based Parallelism

The Process Class, Daemon, Waiting for Processes, Waiting for the First to Complete, Computing Pi, Fork v Spawn, Forkserver, Controlling Start Method, Summary.

4) Threads

The Thread Class, Threads and the GIL, Threading Utilities, Daemon Threads, Waiting for a Thread, Local Variables, Thread Local Storage, Computing Pi with Multiple Threads, I/O-Bound Threads, Sleep(0), Timer Object, Summary.

5) Locks and Deadlock

Race Conditions, Hardware Problem or Heisenbug, Locks, Locks and Processes, Deadlock, Context Managed Locks, Recursive Lock, Semaphore, Atomic Operations, Atomic CPython, Lock-Free Code, Computing Pi Using Locks, Summary.

6) Synchronization

Join, First To Finish, Events, Barrier, Condition Object, The Universal Condition Object, Summary.

7) Sharing Data

The Queue, Pipes, Queues for Threads, Shared Memory, Shared ctypes, Raw Shared Memory, Shared Memory, Manager, Computing Pi, Summary.

8) The Process Pool

Waiting for Pool Processes, Computing Pi using AsyncResult, Map_async, Starmap_async, Immediate Results – imap, MapReduce, Sharing and Locking, Summary.

9) Process Managers

The SyncManager, How Proxies Work, Locking, Computing Pi with a Manager, Custom Managers, A Custom Data Type, The BaseProxy, A Property Proxy, Remote Managers, A Remote Procedure Call, Final Thoughts, Summary.

10) Subprocesses

Running a program, Input/Output, Popen, Interaction, Non-Blocking Read Pipe, Using subprocess, Summary.

11) Futures

Futures, Executors, I/O-Bound Example, Waiting On Futures, Future Done Callbacks, Dealing With Exceptions, Locking and Sharing Data, Locking and Process Parameters, Using initializer to Create Shared Globals, Using a Process Manager to Share Resources, Sharing Futures and Deadlock, Computing Pi with Futures, Process Pool or Concurrent Futures, Summary.

12) Basic Asyncio

        Extract 1 – Basic Asyncio

Callbacks, Futures and Await, Coroutines, Await, Awaiting Sleep, Tasks, Execution Order, Tasks and Futures, Waiting On Coroutines, Sequential and Concurrent, Canceling Tasks, Dealing With Exceptions, Shared Variables and Locks, Context Variables, Queues, Summary.

13) Using asyncio

Streams, Downloading a Web Page, Server, A Web Server, SSL Server, Using Streams, Converting Blocking To Non-blocking, Running in Threads, Why Not Just Use Threads, CPU-Bound Tasks, Asyncio-Based Modules, Working With Other Event Loops – Tkinter, Subprocesses, Summary.

14) The Low-Level API

        Extract 1 – Streams & Web Clients

The Event Loop, Using the Loop, Executing Tasks in Processes, Computing Pi With asyncio, Network Functions, Transports and Protocols, A UDP Server, A UDP Client, Broadcast UDP, Sockets, Event Loop Implementation, What Makes a Good Async Operation, Summary.

Appendix I Python in Visual Studio Code

 

Most books on asynchronous programming start by looking at threads, as these are generally regarded as the building blocks of execution. However, for Python there are advantages in starting with processes, which come with a default single thread of execution. The reasons are both general and specific. Processes are isolated from one another and this makes locking less of an issue.

Python also restricts the way threads are run by way of the Global Interpreter Lock, or GIL, which allows only one thread to run the Python interpreter at any given time. This means that, without a lot of effort, multiple threads cannot make use of multiple cores and so do not provide a way of speeding things up. There is much more on threads and the GIL in the next chapter, but for the moment all you need to know is that processes can speed up your program by using multiple cores. Using Python threads you cannot get true parallelism, but using processes you can.

What this means is that if you have an I/O-bound task then using threads will speed things up, as the processor can get on with another thread while the I/O-bound thread is stalled. If you have CPU-bound tasks then only process-based parallelism will help.

The Process Class

The key idea in working with processes is that your initial Python program starts out in its own process and you can use the Process class from the multiprocessing module to create sub-processes. A child process, in Python terminology, is created and controlled by its parent process. Let’s look at the simplest possible example:

import multiprocessing

# The function that the child process runs
def myProcess():
    print("Hello Process World")

# Only the parent runs this setup code
if __name__ == '__main__':
    p1 = multiprocessing.Process(target=myProcess)
    p1.start()

In this case all we do is create a Process object with its target set to the myProcess function object. Calling the start method creates the new process and starts the code running by calling myProcess. You simply see Hello Process World displayed.

There seems to be little that is new here, but a lot is going on that isn’t obvious. When you call the start method a whole new process is created, complete with a new copy of the Python interpreter and the Python program. Exactly how the Python program is provided to the new process depends on the system you are running it on and this is a subtle point that is discussed later.
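One way to see that the child really is a separate process with its own copy of the program is to let it change a global variable. The following is only a minimal sketch and not part of the original example: the names counter, bump and run_child are made up for illustration, and join, which waits for the child to finish, is covered later.

```python
import multiprocessing

counter = 0  # module-level global; each process works on its own copy


def bump():
    # Runs in the child: changes only the child's copy of counter
    global counter
    counter += 1
    print("child sees counter =", counter)


def run_child():
    # Hypothetical helper: start the child, wait for it to finish,
    # then report the value that the parent still holds
    p = multiprocessing.Process(target=bump)
    p.start()
    p.join()
    return counter


if __name__ == '__main__':
    print("parent sees counter =", run_child())
```

The parent's value is unchanged because the increment happens in a different process with its own memory.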


For the moment you need to follow three simple rules:

  1. Always use if __name__ == '__main__': to ensure that the setup code doesn’t get run in the child process.

  2. Do not define or modify any global resources in the setup code as these will not always be available in the child process.

  3. Prefer to use parameters to pass initial data to the child process rather than global constants.

The reasons for these rules are explained in detail later.
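As a rough illustration of the second rule, the sketch below defines a global in the setup code and then asks a child whether it can see it. It assumes the "spawn" start method, selected with multiprocessing.get_context, which is discussed later. The names child, run_child and setting are hypothetical, and the exit code is used simply as a convenient way of getting a yes or no answer back to the parent.

```python
import multiprocessing
import sys


def child():
    # Exit code 0 if the global exists in this process, 1 if it does not
    sys.exit(0 if "setting" in globals() else 1)


def run_child(method):
    # Hypothetical helper: run child() using the given start method
    # and return its exit code
    ctx = multiprocessing.get_context(method)
    p = ctx.Process(target=child)
    p.start()
    p.join()
    return p.exitcode


if __name__ == '__main__':
    setting = 42  # defined only in the setup code
    print("spawn exit code:", run_child("spawn"))
```

With spawn the child loads the program afresh and skips the guarded setup code, so setting is never defined there. With a different start method the global might well be visible, which is why it is safer not to rely on it.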

In general, to create and run a new process you have to create a Process object using:

class multiprocessing.Process(group=None, target=None,
            name=None, args=(), kwargs={}, *, daemon=None)

You can ignore the group parameter as it is included only to make the call the same as the one that creates threads in the next chapter. The target is the callable you want the new process to run and name is the name given to the new process. If you don’t supply a name then a unique name is constructed for you. The most important parameters are args and kwargs, which specify the positional and keyword arguments to pass to the callable.

For example:

p1=multiprocessing.Process(target=myProcess,args=(42,43),
                           kwargs={"myParam1":44})

will call the target as:

myProcess(42,43,myParam1=44)
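A complete version of this might look like the following sketch. To make the effect visible in the parent, the child sends back what it received using a multiprocessing.Queue, which is only explained in a later chapter. The results parameter and the run_with_args helper are assumptions added for illustration.

```python
import multiprocessing


def myProcess(a, b, myParam1=0, results=None):
    # Runs in the child with the arguments supplied via args and kwargs
    print(a, b, myParam1)
    if results is not None:
        results.put((a, b, myParam1))


def run_with_args():
    # Hypothetical helper: start the child and return what it received
    results = multiprocessing.Queue()
    p1 = multiprocessing.Process(
        target=myProcess,
        args=(42, 43),
        kwargs={"myParam1": 44, "results": results},
    )
    p1.start()
    received = results.get(timeout=30)  # read before joining
    p1.join()
    return received


if __name__ == '__main__':
    print(run_with_args())
```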

The Process object has a number of useful methods and attributes in addition to the start method. The simplest of these are name and pid.

The name attribute can be used to find the assigned name of the child process. It has no larger meaning in the sense that the operating system knows nothing about it. The pid attribute, on the other hand, is the process identity number which is assigned by the operating system and it is the pid that you use to deal with child processes via operating system commands such as kill.
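The two attributes can be inspected with something like the sketch below. The names work and describe and the process name "worker-A" are made up for illustration. Notice that pid is None until the process has actually been started.

```python
import multiprocessing
import os


def work():
    # The child has nothing to do in this sketch
    pass


def describe():
    # Hypothetical helper: collect the name and pid of a child process
    p = multiprocessing.Process(target=work, name="worker-A")
    pid_before = p.pid  # None, as the process does not exist yet
    p.start()
    info = (p.name, pid_before, p.pid)
    p.join()
    return info


if __name__ == '__main__':
    name, pid_before, pid_after = describe()
    print("name:", name)
    print("pid before start:", pid_before)
    print("child pid:", pid_after, "parent pid:", os.getpid())
```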

Both the name and pid attributes help identify a process, but for security purposes you need to use the authkey attribute. This is set to a random byte string when the multiprocessing module is initialized and it is intended to act as a secure identifier for child processes. Each Process object is given the authkey value of the parent process and this can be used to prove that the process is indeed a child of the parent. There is more about this later.
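You can check that a Process object takes on its parent's key without even starting it. This is only a sketch, and the names work and compare_keys are hypothetical:

```python
import multiprocessing


def work():
    pass


def compare_keys():
    # Hypothetical helper: compare the parent's key with the key
    # given to a newly created Process object
    parent_key = multiprocessing.current_process().authkey
    p = multiprocessing.Process(target=work)
    return isinstance(parent_key, bytes), p.authkey == parent_key


if __name__ == '__main__':
    is_bytes, same_key = compare_keys()
    print("key is a byte string:", is_bytes)
    print("new process has the parent's key:", same_key)
```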

The multiprocessing module also contains some functions that can be used to find out about, and control, the environment in which the child processes are running.

  • multiprocessing.cpu_count() gives the number of CPUs, i.e. cores, in the system. This is the theoretical maximum number of processes that can run in parallel. In practice the number available to your program may be lower, for example if the operating system restricts the process to a subset of the cores.

  • multiprocessing.set_executable(path) sets the location of the Python interpreter to use for child processes. By default this is sys.executable.
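The two functions can be tried out with a sketch like the one below. The cpu_report helper is hypothetical, and the use of os.sched_getaffinity to count the CPUs the process is actually allowed to use is an addition that only works on systems that provide it, Linux for example. Passing sys.executable to set_executable simply restates the default.

```python
import multiprocessing
import os
import sys


def cpu_report():
    # Hypothetical helper: CPUs in the system and CPUs usable by this process
    total = multiprocessing.cpu_count()
    if hasattr(os, "sched_getaffinity"):
        usable = len(os.sched_getaffinity(0))
    else:
        usable = total  # no affinity information on this platform
    return total, usable


if __name__ == '__main__':
    # Restates the default interpreter for child processes
    multiprocessing.set_executable(sys.executable)
    total, usable = cpu_report()
    print("CPUs in system:", total)
    print("CPUs usable by this process:", usable)
```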

A Python process isn’t a basic native process. It has a Python interpreter loaded along with the Python code defined in the parent process and it is ready to run the function that has been passed as the target.

 
