Understanding Python Generators: Voting Time!

Hello Pythonistas🙋‍♀️, welcome back. In the previous post, we explored iterators in Python, which are great for saving💰 memory.

However, they do have two ✌ limitations:

  • a long 🚂 syntax and
  • complex logic 🥴 that can be difficult to understand at first glance.

To overcome these limitations, we will explore the concept of generators in Python. Generators provide a more Pythonic 🤜🤛 approach to working with iterators, making them easier to use and understand.

In this article, we will delve in-depth into generators, how they work, and how they can help you write more efficient 💨 and readable 🤓 code.

So, let’s dive in and explore the world 🌎 of generators in Python…

Solution to the previous post’s challenge

Here’s the solution🧪 to the previous post’s challenge:

# This is an iterator class that takes a list and returns only even numbers in that list
class Even:
    def __init__(self, lst):
        # This is the input list
        self.lst = lst
        # This is to keep count of the current element in the list
        self.count = 0
    def __iter__(self):
        # we are returning self as it is an iterator itself.
        return self
    
    def __next__(self):
        # This is to iterate over the complete list
        while self.count < len(self.lst):
            # To get value of the current element
            val = self.lst[self.count]
            # Increasing count by 1 to move on to next element
            self.count+=1
            # To know if the current element is even
            if(val%2==0):
                # returning the value if it is even
                return val
        # to say that there are no more elements
        raise StopIteration()
    
input_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
# giving the list to iterator class
# converting the resulting sequence to list
output_list = list(Even(input_list))
# printing the list
print(output_list)

Read the comments in the code to understand it clearly.👍

If you still have any doubts ask them in the comment section below.👇

What are generators?

Generators are special ✨ functions in Python that return iterators.

They are designed to generate values on the fly, rather than loading the whole sequence in memory at once.☝️

Let me give you an analogy to help you understand generators better.👍

Imagine a classroom election with 50 students. If all of them were to vote🗳️ at once, it would take quite a long time for everyone to cast their vote🗳️, and it may not even keep the voter’s identity a secret.🤫

But if the students👩‍🎓👨‍🎓 were to vote one by one according to their serial📝 number, it would be more efficient and civilized, right?

Well, that’s precisely what generators do! They load the elements of a sequence as and when required, rather than all at once. It’s a much more convenient way of generating values on the fly🥏🐤, don’t you think?

Let’s say we want to separate the students👩‍🎓👨‍🎓 into two✌️ lines such that the ones with even roll call get in one line and the ones with odd get in another:

def even():
    for i in range(1,51):
        if(i%2==0):
            yield i
even_gen_iter = even()
print(even_gen_iter)

Output:

<generator object even at 0x0000014A502413F0>

As you can see, a generator is a special✨ function that returns an iterator object.

You can use it just like any other iterator.

Here are two ways you can use the even_gen_iter iterator:

even_gen_iter = even()
print(next(even_gen_iter))
print(next(even_gen_iter))
print(next(even_gen_iter))
print(next(even_gen_iter))
print(next(even_gen_iter))

OR:

even_gen_iter = even()
for j in even_gen_iter:
    print(j)

This way, the students with even roll call end up in one line; a matching odd() generator would put the rest in the other.

You may be curious 🧐 about what that funky yield keyword does in the even() function we just wrote. After all, a function usually just returns a value.

Let’s see what this yield thing is…
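For comparison, here is a minimal sketch of the Even iterator class from the start of this post rewritten as a generator function. The name even_numbers is just my choice for illustration. The behaviour is meant to match the class, with Python taking care of __iter__, __next__ and StopIteration for us.

```python
def even_numbers(lst):
    """Yield only the even numbers from lst, one at a time."""
    for val in lst:
        if val % 2 == 0:
            yield val

input_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
# converting the resulting sequence to a list, as in the class version
output_list = list(even_numbers(input_list))
print(output_list)  # [2, 4, 6, 8, 10]
```

Five lines of logic replace the whole class, which is exactly the "long syntax" problem generators are meant to solve.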

yield vs return

When a function encounters a “return” statement, it stops🛑 executing and returns a value (if there is one) to the caller.

On the other hand, when a function encounters a “yield” statement, it pauses⏸️ the function and returns a value while saving📥 its state, so it can pick up from the same spot later.
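Here is a small sketch that makes the pausing visible. The function name ballot_steps and the messages are made up for illustration; the point is only to show when each part of the body runs.

```python
def ballot_steps():
    print("Ballot box opened")
    yield 1
    print("Resumed after roll call 1")
    yield 2
    print("Resumed after roll call 2")

steps = ballot_steps()   # nothing runs yet, not even the first print
first = next(steps)      # runs until the first yield, then pauses
second = next(steps)     # continues right after the first yield
print(first, second)     # 1 2
```

Calling the function only creates the generator. The body runs in pieces, one piece per next() call.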

Let’s understand it with our analogy.

Say you are the teacher👩‍🏫 of this class and you are in charge of conducting this election.

You have two ✌️ choices:

  1. Let students👩‍🎓👨‍🎓 stand in a row and cast their votes🗳️ all at once. For this, you would need 50 ballot boxes, which might not be possible. This is what happens with a return statement:
def vote():
    students =[]
    for i in range(1,51):
        students.append(i)
    return students
print(vote())

Output:

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50]     

This loads all the numbers from 1 to 50 into memory at once.

  2. Let students👩‍🎓👨‍🎓 stand in a row and cast their votes one by one. For this, you would need 1 ballot🗳️ box, which sounds quite feasible. This is what happens with a yield statement:
def vote():
    for i in range(1,51):
        yield i
votes =  vote()
print(next(votes))
print(next(votes))
print(next(votes))

Output:

1
2
3

This produces the numbers from 1 to 50 one at a time, so only the current one needs to be in memory.

In the first case, a list📝 of numbers from 1 to 50 will be returned. In the second case, an iterator will be returned which will start from 1 and will go to 50.🐤

Thus, the yield statement turns a function into a generator: it allows the function to pause⏸️ execution and save📥 its current state.

On the other hand, return just returns the value and control back to the caller.
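To put rough numbers on the difference, here is a hedged sketch using sys.getsizeof. The names vote_list and vote_gen and the class size of 100,000 are my own assumptions, exaggerated to make the gap obvious. Note that getsizeof only measures the container object itself, so treat the figures as an indication rather than an exact memory bill.

```python
import sys

def vote_list(n):
    # builds and returns the whole list at once
    students = []
    for i in range(1, n + 1):
        students.append(i)
    return students

def vote_gen(n):
    # hands out one roll call at a time
    for i in range(1, n + 1):
        yield i

n = 100_000  # assumed class size, far bigger than our 50 students
list_size = sys.getsizeof(vote_list(n))
gen_size = sys.getsizeof(vote_gen(n))
print(f"list: {list_size} bytes, generator: {gen_size} bytes")
```

The exact byte counts depend on your Python version, but the generator stays small no matter how large n gets.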

Feature | yield | return
Function Type | Generator function | Normal function
Execution Flow | Pauses function execution and saves state. The function can be resumed from where it left off. | Ends function execution and returns a value to the caller.
Output | Generates a sequence of values | Computes and returns a single value
Usage | Used when generating large sequences of values in a memory-efficient way. | Used to return a value to the caller for further processing or output.
Can be repeated | Yes, “yield” can run many times during one call to produce a sequence of values. | No, only one “return” runs per call, and the function ends there.
Caller response | The caller receives a generator object that can be iterated over to get each value generated by the function. | The caller receives the returned value from the function, which can be assigned to a variable or used directly in further processing.
Control Flow | The function remains under the control of the caller, which can request additional values from the generator function. | The function gives up control to the caller and does not resume execution after the “return” statement.

Generator expressions

Generators are a powerful tool in Python for creating iterators, but generator expressions take⏩ it to the next level.

A generator expression looks just like a list comprehension with the square brackets [] replaced💱 by parentheses (), but instead of returning a list📝, it returns an iterator.🐤⏸️📥

This is particularly useful when you don’t want to create an entirely new🆕 function just to generate an iterator.

For example, take our 50 students 👩‍🎓👨‍🎓 🗳️ (voters) again. Say you want to create an iterator over their roll calls. You can use a generator expression like this:

stu = (i for i in range(1,51))

This generator expression will generate values from 1 to 50, as specified by the range() function.

The values will be generated one at a time as needed, allowing for memory-efficient iteration through a large range of numbers.

In addition, built-in functions like sum(), min(), and max() that accept any iterable can also take a generator expression.

For instance, if you want to get the sum of squares of the first 10 numbers, you can use this expression:

sq = (x ** 2 for x in range(1, 11))
print(sum(sq))
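As a small aside, when a generator expression is the only argument to a function, the extra pair of parentheses can be dropped. A quick sketch (the variable names are mine):

```python
# The generator expression is passed straight to each built-in
total = sum(x ** 2 for x in range(1, 11))
largest = max(x ** 2 for x in range(1, 11))
smallest = min(x ** 2 for x in range(1, 11))
print(total, largest, smallest)  # 385 100 1
```

No intermediate list of squares is ever built here; each built-in pulls the values one at a time.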

Generator expression vs list comprehension

As I already mentioned, though they look the same ↔️, generator expressions and list comprehensions are two✌️ different things.

List comprehensions return a list📝, and generator expressions return an iterator.🐤⏸️📥

List comprehensions are enclosed in square brackets [] and generator expressions are enclosed in parentheses ().

Generator expressions are usually the better choice when you are looping over large amounts of data 🤝, as they generate values on the fly.🐤

So, which one should you ditch💔: list comprehensions or generator expressions?

Neither. Both have their own importance. Their use will depend on your requirements.

Use a list comprehension when you want to create a new list based on an existing one, or when you need to iterate over the same data multiple times or access items by index.

And if you find yourself working with large data sets that you only need to pass over once, consider using generator expressions instead of list comprehensions. Keep in mind that a generator is exhausted after a single pass.
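One practical difference is worth seeing in code: a list can be walked as many times as you like, while a generator expression is used up after one pass. A minimal sketch, with variable names of my own choosing:

```python
rolls_list = [i for i in range(1, 6)]   # list comprehension
rolls_gen = (i for i in range(1, 6))    # generator expression

first_pass_list = sum(rolls_list)    # 15
second_pass_list = sum(rolls_list)   # 15 again, the list is still there

first_pass_gen = sum(rolls_gen)      # 15
second_pass_gen = sum(rolls_gen)     # 0, the generator is already exhausted

print(first_pass_list, second_pass_list, first_pass_gen, second_pass_gen)
```

So if you need the same values more than once, either keep them in a list or create a fresh generator for each pass.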

Some Advanced Generator Methods

In addition to the basic “yield” statement, there are three 3️⃣ advanced methods that can be used to control a running generator:

  1. .send(),
  2. .throw(), and
  3. .close()

These methods are not ❌ commonly used when creating simple generators, but they can be very useful when dealing🤝 with more complex scenarios.

Let’s take a closer look at each of these methods and how they can be used to create more sophisticated generators.

.send()

The .send() method is used to send a value into the generator.

This is similar to a student 👩‍🎓 giving their voting slip 📃 to the teacher after they have voted, to ensure every student votes just once.

The value sent is used as the result of the current yield expression.

Here’s how the code will look:

def election():
    roll_call = 0
    while True:
        vote = yield roll_call
        roll_call += 1
        print(vote)
        print(f"Roll call {vote} voted.")
# Create generator object
e = election()
next(e) #starting the generator
# Iterate through 1-50 and "vote" on each one
for i in range(1, 51):
    e.send(i)

The function contains a loop that runs indefinitely♾️ (while True:), and on each iteration, it uses the yield keyword to return the current value of roll_call.

When the generator first starts running, roll_call is initialized to 0. On each subsequent iteration, roll_call is incremented by 1. (++)

The yield keyword also allows the function to receive📲 values from the caller of the generator.

Specifically, the value of vote is assigned to the result of the yield expression.

This means that whenever the generator is called using the send() method, the value passed as an argument to send() will be assigned to vote.

The two✌️ print() statements in the function simply print out the value of vote and a message indicating that the corresponding roll call has voted.✅

The two lines e = election() and next(e) create an instance of the generator using the election() function, and then start the generator.

This causes the yield statement in the generator function to execute, and returns the initial value of roll_call, which is 0.

At this point, the generator is “paused” ⏸️ and waiting for a value to be sent to it.

This code iterates through the numbers 1 to 50 (inclusive) and sends📡 each number to the generator using the send() method.

When the generator receives 📲 each value, it assigns it to the vote variable, increments(++) the value of roll_call, and prints out a message 💬 indicating that the corresponding roll call has voted.

Note that every call to send() does two things: it “resumes”⏯️ the generator at the yield expression where it was paused, and it makes the sent value the result of that yield, which is then assigned to vote.

The generator then runs until it reaches yield again, and the value yielded there (the updated roll_call) becomes the return value of send().
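The election() example above ignores what send() gives back, so here is a hedged sketch that uses it. The tally() generator and the candidate names passed in are my own illustration, not part of the original election code: each send() passes in a vote and returns a snapshot of the running count.

```python
def tally():
    counts = {}
    # first yield: reached by next(), then waits for the first vote
    candidate = yield counts
    while True:
        counts[candidate] = counts.get(candidate, 0) + 1
        # hand back a copy of the counts and wait for the next vote
        candidate = yield dict(counts)

t = tally()
next(t)                              # prime the generator
after_first = t.send("Lily")         # {'Lily': 1}
after_second = t.send("Sunflower")   # {'Lily': 1, 'Sunflower': 1}
after_third = t.send("Lily")         # {'Lily': 2, 'Sunflower': 1}
print(after_third)
```

The next(t) call is needed first because a freshly created generator has not reached any yield yet, so there is nothing for a sent value to be assigned to.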

.throw()

The .throw() method is used to raise ✋ an exception inside the generator.

This is like having a student👨‍🎓 in your class who always has a doubt🤷‍♀️ and will stop to ask you about it.

Say you know that the student👨‍🎓 with roll call 30 is one such student.

Here’s how you will modify the code above for this student:

def election():
    roll_call = 0
    while True:
        try:
            vote = yield roll_call
            roll_call += 1
            print(vote)
            print(f"Roll call {vote} voted.")
        except Exception as e:
            print("Doubt:", e)
# Create generator object
e = election()
next(e) #starting the generator
# Iterate through 1-50 and "vote" on each one
for i in range(1, 51):
    if i == 30:
        e.throw(NameError("How does the ballot work behind the scenes?"))
    e.send(i)

In the election() function, the try-except block catches any exceptions that might occur when the generator is receiving values through the yield expression.

In this case, if an exception is raised, the except block catches the exception and prints out a message indicating that there is some doubt or confusion, along with the error🚩 message.

In the for loop➰ that follows, the code iterates through the numbers 1 to 50 (inclusive) and sends📡 each number to the generator using the send() method.

However, we have added an if statement that checks whether the current number is equal to 30 (because of that student).

If so, it uses the throw() method to raise a NameError exception with the message “How does the ballot🗳️ work behind the scenes?”.

This will cause the try-except block in the generator function to catch the exception and print out the associated message.

Note: I have used this exception so that we would not need to define a new🆕 exception of our own. You can use any other exception or can also define your own.
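If you would rather define your own exception, a minimal sketch could look like this. The Doubt class is hypothetical, something I made up for the example; the rest follows the election() code above, trimmed down to the essentials.

```python
class Doubt(Exception):
    """Hypothetical exception representing a student's question."""

def election():
    roll_call = 0
    while True:
        try:
            vote = yield roll_call
            roll_call += 1
        except Doubt as d:
            # the question is handled and the loop goes back to yield
            print("Doubt:", d)

e = election()
next(e)       # start the generator, it yields 0
e.send(1)     # roll call 1 votes, the generator yields 1
resumed_at = e.throw(Doubt("How does the ballot work behind the scenes?"))
print(resumed_at)  # 1
```

Because the generator catches the exception and reaches yield again, throw() returns the next yielded value just like send() does, and the election carries on afterwards.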

.close()

The .close() method is used to stop 🛑 the generator.

This is like the teacher👩‍🏫 (you) announcing the end of the voting process.

Here’s what you will add to the code:

def election():
    roll_call = 0
    while True:
        try:
            vote = yield roll_call
            roll_call += 1
            print(vote)
            print(f"Roll call {vote} voted.")
        except GeneratorExit:
            print("Elections are over now!")
            return
        except Exception as e:
            print("Doubt:", e)
# Create generator object
e = election()
next(e) #starting the generator
# Iterate through 1-50 and "vote" on each one
for i in range(1, 51):
    if i == 30:
        e.throw(NameError("How does the ballot work behind the scenes?"))
    e.send(i)
e.close()

The GeneratorExit exception is raised✋ when the close() method is called on the generator object. This exception is used to signal the generator to shut down🛑 and release any resources it’s holding.

An additional except block has been added in the election() function to handle🤝 the GeneratorExit exception.

By handling the GeneratorExit exception in the election() function, the code is able to print a message💬 indicating that the elections are over👍 and return from the function when the generator is closed.

This ensures that the generator is properly shut down and any resources it’s holding are released.
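When the only goal is cleanup, a try/finally block is a common alternative to catching GeneratorExit yourself. Here is a hedged sketch of that variation (a simplified election() of my own, not the code above), which also shows what happens if you ask a closed generator for more values.

```python
def election():
    roll_call = 0
    try:
        while True:
            yield roll_call
            roll_call += 1
    finally:
        # runs when close() raises GeneratorExit at the paused yield
        print("Elections are over now!")

e = election()
next(e)      # start the generator
e.close()    # prints the message and shuts the generator down

try:
    next(e)
    still_running = True
except StopIteration:
    still_running = False

print(still_running)  # False
```

Once closed, the generator behaves like any exhausted iterator: every further next() raises StopIteration.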

Pipelining with Generators

Pipelining refers to a technique in which data is transferred from one generator to another in a sequence or chain.

The output of the first👆 generator works as an input to the next✌️ generator, the output of the next generator works as an input to the generator next to it, and so on.

For example, say you have all the votes from students in a CSV file in this format:

Roll call | Voted to

Now, to declare the result, you want this data to go through these stages:

  • Extract the whole data in Python.
  • Just keep the “Voted to” column.
  • Categorize the data in the “Voted to” column.
  • Get the sum of votes in the various categories.

To extract the whole data:

import csv
def get_result():
    with open("C:\\Users\\maitr\\Desktop\\pyhon-hub\\GUI\\election_results.csv") as f:
        result_data = csv.reader(f)
    
        for row in result_data:
            if row != ['Roll Call', ' Voted to']:
                yield row

To keep the “Voted to” column only:

def filter_results(res_row):
    for row in res_row:
        ch = row[1]
        yield ch

To categorize the votes:

def sunflower_results(filter_row):
    for row in filter_row:
        if "Sunflower" in row:
            yield row
def lily_results(filter_row):
    for row in filter_row:
        if "Lily" in row:
            yield row

To get the sum of votes in various categories:

lily = len(list(lily_results(filter_results(get_result()))))
sunflower = len(list(sunflower_results(filter_results(get_result()))))
print(f"Sunflower results: {sunflower}")
print(f"Lily results: {lily}")

Full code:

import csv
def get_result():
    with open("C:\\Users\\maitr\\Desktop\\pyhon-hub\\GUI\\election_results.csv") as f:
        result_data = csv.reader(f)
    
        for row in result_data:
            if row != ['Roll Call', ' Voted to']:
                yield row
def filter_results(res_row):
    for row in res_row:
        ch = row[1]
        yield ch
def sunflower_results(filter_row):
    for row in filter_row:
        if "Sunflower" in row:
            yield row
def lily_results(filter_row):
    for row in filter_row:
        if "Lily" in row:
            yield row
lily = len(list(lily_results(filter_results(get_result()))))
sunflower = len(list(sunflower_results(filter_results(get_result()))))
print(f"Sunflower results: {sunflower}")
print(f"Lily results: {lily}")
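The code above reads the CSV file twice, once per flower. As a hedged variation, here is a sketch that counts every category in a single pass with collections.Counter. The sample data is made up and held in memory with io.StringIO so the sketch runs anywhere, and the function names get_rows and voted_to are my own.

```python
import csv
import io
from collections import Counter

# Made-up sample standing in for election_results.csv
SAMPLE = "Roll Call,Voted to\n1,Lily\n2,Sunflower\n3,Lily\n"

def get_rows(f):
    reader = csv.reader(f)
    next(reader, None)   # skip the header row
    yield from reader    # pass every remaining row along

def voted_to(rows):
    for row in rows:
        yield row[1].strip()   # keep only the "Voted to" column

# Each stage pulls from the previous one; the data is read only once
counts = Counter(voted_to(get_rows(io.StringIO(SAMPLE))))
print(f"Sunflower results: {counts['Sunflower']}")
print(f"Lily results: {counts['Lily']}")
```

With a real file you would pass an open file object to get_rows() instead of the StringIO object. The yield from line is simply shorthand for looping over reader and yielding each row.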

Real-world use cases of generators

Processing large files📂

If you need to work with a massive file that won’t fit into memory💾, generators can be a lifesaver.

By reading and processing the file line-by-line📄 or chunk-by-chunk, you can save memory and improve efficiency.

Libraries like pandas 🐼 and Biopython🧬 offer iterator-based interfaces for exactly this, such as reading a large CSV file in chunks or parsing genomic records one at a time.
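A minimal sketch of the line-by-line idea, using only the standard library. The function name read_votes and the tiny throwaway file are assumptions made so the example is runnable; a real file would of course be much larger.

```python
import os
import tempfile

def read_votes(path):
    """Yield one line at a time; the file is never loaded in full."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            yield line.rstrip("\n")

# Build a small throwaway file so the sketch runs anywhere
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("Lily\nSunflower\nLily\n")
    path = tmp.name

lily_votes = sum(1 for line in read_votes(path) if line == "Lily")
os.remove(path)  # clean up the throwaway file
print(lily_votes)  # 2
```

Only one line is held in memory at any moment, so the same code works for a file with three lines or three million.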

Generating large datasets

Sometimes you need to create a ton of data for testing or analysis📊, but you don’t have the disk space or memory to store it all.

That’s where generators come in!

They can create data on-the-fly 🐤, saving you time⏱️ and space.💾

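PyTorch 🔦, for example, serves batches of training data one at a time through its DataLoader, which you loop over like any other iterable. (NumPy’s random Generator, despite its name, is a random-number object rather than a Python generator.)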
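Here is a hedged sketch of on-the-fly test data using only the standard library. The random_votes() generator, the candidate names, and the seed are all made up for illustration. The generator is endless, so itertools.islice is used to take just the first few items.

```python
import itertools
import random

def random_votes(candidates, seed=None):
    """Endlessly yield (roll_call, candidate) pairs."""
    rng = random.Random(seed)
    roll_call = 1
    while True:
        yield roll_call, rng.choice(candidates)
        roll_call += 1

# Take only the first five votes from the endless stream
sample = list(itertools.islice(random_votes(["Lily", "Sunflower"], seed=1), 5))
for roll_call, candidate in sample:
    print(roll_call, candidate)
```

Nothing is stored beyond what you explicitly ask for, so you could just as easily stream a million test votes through a pipeline without keeping them around.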

Streaming data

If you’re working with data from a web API🌐 or database👨🏻‍💻, you might not want to fetch it all at once.

That’s where streaming comes in – and generators can help with that too.

By fetching data in chunks and processing it on-the-fly 🐤, you can save bandwidth and processing time⏱️.

Libraries like tweepy and pyspark follow a similar iterator-style approach, for paging through Twitter data and for processing large datasets in parallel.
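To show the chunked fetching idea without depending on any real service, here is a sketch with a stand-in fetch_page() function. Everything here is hypothetical (the function names, the page size, the data); in real code fetch_page() would be a web API or database call.

```python
fetched_pages = []  # records which pages were actually requested

def fetch_page(page, page_size=3):
    """Stand-in for a web API or database call (hypothetical)."""
    fetched_pages.append(page)
    data = list(range(1, 11))  # pretend the server holds records 1 to 10
    start = page * page_size
    return data[start:start + page_size]

def stream_records():
    page = 0
    while True:
        batch = fetch_page(page)
        if not batch:       # an empty page means there is no more data
            return
        yield from batch
        page += 1

first_four = []
for record in stream_records():
    first_four.append(record)
    if len(first_four) == 4:
        break               # stop early; later pages are never fetched

print(first_four)      # [1, 2, 3, 4]
print(fetched_pages)   # [0, 1]
```

Because the generator only fetches a new page when the previous one runs out, stopping early means the remaining pages are never requested.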

Parallel processing

Finally, if you need to process data in parallel across multiple processes or threads🧵, generators can help you split up the data into chunks and distribute them efficiently.

The concurrent.futures module, for instance, hands results back through iterators (Executor.map() and as_completed()), and mpi4py offers a similar futures-style interface.
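A minimal sketch of that idea with a thread pool. The helper names chunks and count_votes and the sample votes are my own; the generator splits the data into chunks, and Executor.map() returns an iterator over the results in the original order.

```python
from concurrent.futures import ThreadPoolExecutor

def chunks(votes, size):
    """Yield successive slices of the votes list."""
    for start in range(0, len(votes), size):
        yield votes[start:start + size]

def count_votes(chunk):
    """Count the Lily votes in one chunk."""
    return sum(1 for vote in chunk if vote == "Lily")

votes = ["Lily", "Sunflower", "Lily", "Lily", "Sunflower", "Lily"]

with ThreadPoolExecutor(max_workers=2) as pool:
    # map() collects the chunks up front and yields results in order
    partial_counts = list(pool.map(count_votes, chunks(votes, 2)))

total = sum(partial_counts)
print(partial_counts, total)  # [1, 2, 1] 4
```

For work this small the threads bring no real speed-up; the sketch is only meant to show how a generator can feed a pool of workers.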

Conclusion

That brings us to the end of this post.

We have covered a range of concepts and features related to generators in Python, drawing parallels with the voting process in a class.

Throughout this post, we delved into the fundamentals of generators, highlighting the distinctions between yield and return statements.

Additionally, we explored generator expressions and saw how they differ from list comprehensions.

Next, we saw some advanced generator methods and generator pipelines.

Finally, to solidify our understanding, we examined real-world applications of generators, discovering their invaluable utility.

Remember, this is not all there is to it; you can dive deeper into this concept. However, it’s better to do that by building projects than by just mugging up theory.

Official tutorial on Python generators.

Challenge 🧗‍♀️

Your challenge is to create a mini file🕵️‍♂️🔍 search tool in Python. You need to create a program that takes two inputs from the user:

  1. 📁Directory: The directory path where you want to search for files.
  2. 📝Extension: The file extension you want to search for.

Your task is to implement a generator function that provides an iterator to iterate over the files in the specified directory whose extensions match the provided input.

To accomplish this, you can utilize:

  • The os module’s listdir() method to retrieve the list of files in the directory.
  • Then, use the endswith() method to filter and yield the files that have the matching extension.

Example usage:

# Implement your generator function here
# ------------
directory = input("Enter the directory path: ")
extension = input("Enter the file extension: ")
# Iterate over the matching files
files = file_search(directory, extension)
for file in files:
    print(file)

Remember to handle any potential errors⚠️ or invalid inputs gracefully to ensure the smooth execution of the program.

Your goal is to create an efficient and user-friendly file search tool that accurately identifies and presents the files with the specified extension in the given directory.

Happy solving…

Stay happy 😄, keep coding, and do suggest any improvements if you have them.

Take care and have a great 😊 time. I’ll see you soon in the next post… Bye bye👋