Have you ever seen this warning ⚠:
What if I asked you to build this functionality 🤔? You might use a lot of loops ➰ and conditional statements, depending on the complexity required.
Regular expressions in Python can help 🤝🏻 you do it in a single line.
Instead of using 2-3 loops to find email addresses in a text, you can simply use this:
email_pattern = r'[A-Za-z0-9.]+@[A-Za-z0-9]+\.com'
Don’t take my word for it; see for yourself by reading 👁️‍🗨️ ahead.
Regular expressions and regex are the same thing. To save time and oxygen, people say “regex” for short.
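To get a feel for the one-liner idea, here is a minimal sketch. The sample text, the addresses in it, and the simplified pattern are all assumptions made up for illustration; real email addresses are far more varied than this pattern allows.

```python
import re

# Assumed sample text; the addresses are invented for this example.
sample_text = "Contact alice@example.com or bob.smith@mail.com, not carol@example.org."

# Assumed simplified pattern: letters, digits, or dots, then '@',
# then a domain name, then a literal '.com'.
simple_email_pattern = r'[A-Za-z0-9.]+@[A-Za-z0-9]+\.com'

# findall() returns every non-overlapping match as a list of strings.
found_emails = re.findall(simple_email_pattern, sample_text)
print(found_emails)
```

The address ending in .org is skipped because this pattern only accepts .com.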
Previous post’s challenge’s solution
Here’s the solution 🧪 to the challenge provided in the last post:
import json


class TaskManager:
    def __init__(self):
        """
        Initializes a TaskManager object.
        It checks if the 'tasks.json' file exists. If not, it creates the file.
        """
        try:
            # Mode "x" creates the file and raises FileExistsError if it already exists.
            with open("tasks.json", "x"):
                pass
        except FileExistsError:
            # The file is already there, so there is nothing to create.
            pass

    def add_task(self, task: str, deadline: str, status="pending") -> None:
        """
        Adds a new task to the task manager.
        Args:
            task (str): The task description.
            deadline (str): The deadline for the task.
            status (str, optional): The status of the task. Defaults to "pending".
        Returns:
            None
        """
        try:
            with open("tasks.json") as file:
                exist_dict = json.load(file)
        except Exception:
            # An empty or unreadable file means there are no saved tasks yet.
            exist_dict = {}
        # Add (or overwrite) the task, then write everything back to the file.
        exist_dict[task] = {"deadline": deadline, "status": status}
        with open("tasks.json", "w") as file:
            json.dump(exist_dict, file, indent=2)

    def delete_task(self, task: str) -> None:
        """
        Deletes a task from the task manager.
        Args:
            task (str): The task to be deleted.
        Returns:
            None
        """
        try:
            with open("tasks.json") as file:
                exist_dict = json.load(file)
            del exist_dict[task]
            with open("tasks.json", "w") as file:
                json.dump(exist_dict, file, indent=2)
        except Exception:
            print("No such task yet")

    def view_tasks(self) -> None:
        """
        Displays all the tasks in the task manager.
        Returns:
            None
        """
        try:
            with open("tasks.json") as file:
                exist_dict = json.load(file)
            print(exist_dict)
        except Exception:
            print("No tasks yet")
Please read the comments in the code to understand it clearly. 👍 If you still have any doubts, ask them in the comment section below. 👇
What is Regex?
A regular expression, or regex, is a mysterious 🕵️‍♀️-looking string that describes a search 🔍 pattern. It is used to search for that pattern in a string.
They are available in almost all programming languages to save 🤝 programmers 👨‍💻 from getting lost in loops and if(s).
Here’s what they might look like:
rexp = r'^[a-z]{9}[A-Z]+\.$'
Let’s try to understand these before jumping 🦘 into the re module of Python.
Once you know the patterns well, their results are easy 👌👍 to deal with.
position
The ^ 🥕 and $ 💵 signs describe a position.
The ^ (caret) 🥕 sign describes the start of a string, and the $ (dollar) 💵 sign describes the end of the string.
Look at this list of words:
words = ["simply", "dummy", "easily", "scrambled", "only5"]
The pattern:
pattern = r's'
The r prefix marks it as a raw string, so backslashes in the pattern are not treated as Python escape sequences. You can also pass the pattern to the re.compile() function to get a reusable pattern object.
This pattern will match simply, easily, and scrambled. But if you add a ^ 🥕 before ‘s’ (‘^s’), it will only match simply and scrambled. The ^ (caret) 🐇 sign marks the start of the string.
A string is matched only if it starts with the given pattern.
On the other hand, the $ 💵 sign matches a string only if it ends 🔚 with a particular pattern. Example:
pattern = r'y$'
This will match simply, dummy, and easily.
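The position examples above can be checked with a short sketch. The helper name words_matching is hypothetical; it simply keeps the words for which re.search() finds the pattern.

```python
import re

words = ["simply", "dummy", "easily", "scrambled", "only5"]

def words_matching(pattern, candidates):
    """Hypothetical helper: return the words in which the pattern is found."""
    return [word for word in candidates if re.search(pattern, word)]

contains_s = words_matching(r's', words)      # an 's' anywhere in the word
starts_with_s = words_matching(r'^s', words)  # an 's' at the very start
ends_with_y = words_matching(r'y$', words)    # a 'y' at the very end

print(contains_s)
print(starts_with_s)
print(ends_with_y)
```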
set of characters
The [a-z] and [A-Z] describe a set of characters. These sets can be used to match a specific range of characters.
For example, if you have a pattern like:
pattern = r'[aeiou]' #It is a custom character set
it would match any lowercase vowel (a, e, i, o, u).
[a-z] will match if the string has any lowercase letters. [A-Z] will match if the string has any uppercase letters.
\d matches if the string has any numeric digit in it. In our example, it will match only5. But if we write \d\d, it won’t match any of them, as no string has two consecutive digits.
The \w will match any letter, digit, or an _. The w stands for “word” character. It will match all the strings in our example.
The \s will match whitespace. In our case, no string has whitespace.
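Here is a small sketch that runs these character sets against the same word list. The helper name words_matching is hypothetical and only exists for this example.

```python
import re

words = ["simply", "dummy", "easily", "scrambled", "only5"]

def words_matching(pattern, candidates):
    """Hypothetical helper: return the words in which the pattern is found."""
    return [word for word in candidates if re.search(pattern, word)]

with_vowel = words_matching(r'[aeiou]', words)  # custom character set
with_digit = words_matching(r'\d', words)       # any single digit
with_two_digits = words_matching(r'\d\d', words)  # two digits in a row
with_word_char = words_matching(r'\w', words)   # letter, digit, or underscore
with_space = words_matching(r'\s', words)       # any whitespace

print(with_digit)       # only the word containing a digit
print(with_two_digits)  # empty, no word has two consecutive digits
print(with_space)       # empty, no word has whitespace
```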
What if you want to match a string that has 50 word characters in a row?
Would you write \w 50 times?
No. You will use quantifiers.
Quantifiers
The +, *, and {9} are quantifiers. These are ways to specify the number of copies of an expression.
- ‘+’ matches one or more occurrences of the preceding pattern. For example, [a-z]+ would match one or more lowercase letters.
- ‘*’ matches zero or more occurrences of the preceding pattern.
- ‘?’ matches zero or one occurrence of the prior pattern.
- ‘{n}’ matches exactly n occurrences of the prior pattern.
- ‘{n,}’ matches at least n occurrences of the prior pattern.
- ‘{n,m}’ matches between n and m occurrences of the preceding pattern.
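The quantifiers in the list above can be tried out with a rough sketch like this one. The sample patterns and strings are made up for illustration, and re.fullmatch() is used here (an assumption of this example) so that the whole string has to fit the pattern.

```python
import re

# Each entry: (pattern, string, whether the whole string should match).
examples = [
    (r'ab+', 'abbb', True),       # '+' needs one or more 'b'
    (r'ab+', 'a', False),
    (r'ab*', 'a', True),          # '*' allows zero 'b'
    (r'colou?r', 'color', True),  # '?' makes the 'u' optional
    (r'colou?r', 'colouur', False),
    (r'\d{3}', '123', True),      # exactly three digits
    (r'\d{3}', '12', False),
    (r'\d{2,}', '12345', True),   # at least two digits
    (r'\d{2,}', '1', False),
    (r'\d{2,4}', '123', True),    # between two and four digits
    (r'\d{2,4}', '12345', False),
]

for pattern, string, expected in examples:
    matched = re.fullmatch(pattern, string) is not None
    print(f"{pattern!r} on {string!r}: {matched}")
```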
Putting it all together, let’s take a look at the previously mentioned pattern: rexp = r'^[a-z]{9}[A-Z]+\.$'
This pattern will match a string that:
- starts with exactly nine lowercase letters,
- followed by one or more uppercase letters, and
- ends with a period '.'.
The ^ represents the start of the string, [a-z]{9} matches exactly 9 lowercase letters, [A-Z]+ matches one or more uppercase letters, and \.$ matches a period at the end of the string.
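A quick way to convince yourself is to try the pattern on a few strings. The test strings below are invented for this sketch.

```python
import re

rexp = r'^[a-z]{9}[A-Z]+\.$'

# Assumed sample strings, chosen so that each one breaks a different rule.
candidates = {
    "scrambledABC.": "nine lowercase, uppercase, period",
    "scrambledABC": "missing the final period",
    "simplyABC.": "only six lowercase letters",
    "scrambled.": "no uppercase letters",
}

# Map each string to True/False depending on whether the pattern is found.
results = {s: re.search(rexp, s) is not None for s in candidates}

for string, description in candidates.items():
    print(f"{string!r} ({description}): {results[string]}")
```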
There is a lot more to regular expressions. You can play with them on the regexr website.
Regex module functions
Making the password strength checker regex pattern
Let’s make a regex string for checking password strength 💪 before we dive into the functions of the re module.
Passwords have at least 8 characters:
pattern = r".{8,}"
They have at least 1 capital letter:
pattern = r"(?=.*[A-Z]).{8,}"
- (?=) is a positive lookahead that checks if a pattern exists ahead in the string.
- .* allows traversing the entire string.
- [A-Z] checks for at least one uppercase character.
Together, this pattern requires that:
- The string must have at least one uppercase letter.
- The string must be a minimum of 8 characters long.
They also have at least 1 lower case letter:
pattern = r"(?=.*[a-z])(?=.*[A-Z]).{8,}"
They also have at least 1 digit:
pattern = r"(?=.*[a-z])(?=.*[A-Z])(?=.*\d).{8,}"
They also have at least 1 special character (@$!#%^&*):
pattern = r"(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!#%^&*]).{8,}"
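To see how each added lookahead tightens the rule, here is a rough sketch that runs a few made-up passwords through every stage of the pattern. The step labels, the helper name steps_passed, and the sample passwords are assumptions for illustration.

```python
import re

# The pattern at each stage of the build-up, with a hypothetical label.
steps = [
    ("length", r".{8,}"),
    ("uppercase", r"(?=.*[A-Z]).{8,}"),
    ("lowercase", r"(?=.*[a-z])(?=.*[A-Z]).{8,}"),
    ("digit", r"(?=.*[a-z])(?=.*[A-Z])(?=.*\d).{8,}"),
    ("special", r"(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!#%^&*]).{8,}"),
]

def steps_passed(password):
    """Hypothetical helper: labels of the stages whose pattern is found."""
    return [label for label, pattern in steps if re.search(pattern, password)]

for sample in ["Pw0!", "password", "Password", "Passw0rd", "Passw0rd!"]:
    print(sample, steps_passed(sample))
```

Only the last sample satisfies the final pattern; the others drop out at the stage where a requirement is first missing.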
search()
Okay, so let’s first talk about the search() function of the re module.
This function takes in two ✌️ arguments: pattern and string.
It returns a match object if a match is found 👍, and returns None if not 👎.
For example:
import re

text = "With no fancy words, python is a language used to communicate with computers. It is considered an easier language compared to other languages"
pattern = r"is"
print(re.search(pattern, text))
It will return us a match object like this:
<re.Match object; span=(28, 30), match='is'>
It only returns the first occurrence of the match.
match()
The match() function returns a match object only if the pattern is found at the start of the string.
In this case, it would return None.
If you want all the matches, you can use the findall() 🔍 function, which returns a list of the matched strings.
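The three functions can be compared side by side on the same text. This is a minimal sketch; the extra pattern r"With" is an assumption added only to show a case where match() succeeds.

```python
import re

text = "With no fancy words, python is a language used to communicate with computers. It is considered an easier language compared to other languages"
pattern = r"is"

first_hit = re.search(pattern, text)    # first occurrence anywhere in the text
at_start = re.match(pattern, text)      # None, the text does not start with "is"
all_hits = re.findall(pattern, text)    # list of every matched string

# Assumed extra pattern: the text does start with "With", so match() succeeds.
starts_with_with = re.match(r"With", text)

print(first_hit)
print(at_start)
print(all_hits)
print(starts_with_with)
```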
finditer()
It is a very useful function, as it returns an iterator of the match objects found.
print(re.finditer(pattern,text))
Output:
<callable_iterator object at 0x00000229A8CDBE20>
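Printing the iterator itself is not very informative, so in practice you would loop over it. A small sketch, reusing the same text and pattern:

```python
import re

text = "With no fancy words, python is a language used to communicate with computers. It is considered an easier language compared to other languages"
pattern = r"is"

# Each item produced by finditer() is a match object.
spans = []
for match in re.finditer(pattern, text):
    spans.append(match.span())
    print(match.group(), match.span())
```

Unlike findall(), this gives you the position of every match, not just the matched text.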
Match object
It’s great that we know these functions return match objects. But how on earth 🌎 are we going to use them?
There are 4 ✌️ major methods that help you get data from a match object:
- group()
- start()
- end()
- span()
Let’s see what the group() function returns:
result = re.search(pattern, text)
print(result.group())
Output:
is
It returns the matched string.
Moving on to the start() function:
result = re.search(pattern, text)
print(result.start())
Output:
28
It returns the starting index number of the matched string.
Time to see what the end() function returns:
result = re.search(pattern, text)
print(result.end())
Output:
30
It returns the index just after the last character of the matched string (here the match sits at indexes 28 and 29, so end() gives 30).
Last but not least, the span() function:
result = re.search(pattern, text)
print(result.span())
Output:
(28, 30)
It returns a tuple of the start index and the end index of the matched string.
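One handy consequence is that start() and end() plug straight into string slicing. A minimal sketch with the same text and pattern:

```python
import re

text = "With no fancy words, python is a language used to communicate with computers. It is considered an easier language compared to other languages"
pattern = r"is"

result = re.search(pattern, text)

# Slicing the text with start() and end() gives back the matched string,
# because a slice stops just before its end index.
sliced = text[result.start():result.end()]
print(sliced == result.group())

# span() is simply (start(), end()) packed into a tuple.
print(result.span() == (result.start(), result.end()))
```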
Now let’s make a function that checks whether a password is strong enough or not using the expression we made above:
import re


def check_password_strength(password: str) -> bool:  # takes a string, returns a boolean
    """This function checks password strength.
    Args:
        password (str)
    Returns:
        True if:
        - At least 8 characters long
        - Contains at least one lowercase letter
        - Contains at least one uppercase letter
        - Contains at least one digit
        - Contains at least one special character (e.g., !@#$%^&*)
        Otherwise returns False.
    """
    # Regular expression pattern for password strength
    pattern = r"(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!#%^&*]).{8,}"
    # re.search() returns a match object or None; bool() turns that into True or False.
    return bool(re.search(pattern, password))


# Example usage
password = "Stro1ng@Psswrd"
strength = check_password_strength(password)
print(strength)
Everything is explained clearly inside the code.
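If you also want to tell the user what is wrong with a weak password, one possible variation is to check each rule separately. This is only a sketch: the REQUIREMENTS table and the missing_requirements helper are hypothetical names, not part of the function above.

```python
import re

# Hypothetical labels, each paired with a small pattern for one rule.
REQUIREMENTS = {
    "at least 8 characters": r".{8,}",
    "a lowercase letter": r"[a-z]",
    "an uppercase letter": r"[A-Z]",
    "a digit": r"\d",
    "a special character": r"[@$!#%^&*]",
}

def missing_requirements(password: str) -> list:
    """Hypothetical helper: list the rules the password does not satisfy."""
    return [
        label
        for label, pattern in REQUIREMENTS.items()
        if not re.search(pattern, password)
    ]

print(missing_requirements("Stro1ng@Psswrd"))
print(missing_requirements("password"))
```

An empty list means the password passes every rule.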
Conclusion
We started off by understanding what regular expressions are. We explored various patterns.
Then we made a regex for checking the strength 💪 of a password. Then we explored the functions of the re module in Python that allow you to use regular expressions.
Next, we saw how to deal with the match objects returned by these functions.
Finally, we wrote a function to check password strength using the expression we made.
Challenge 🧗‍♀️
Your challenge for today is to make a simple URL Extractor.
You might need to refer to this cheat sheet.
The URL Extractor will extract URLs from a given text(s) input using regular expressions (regex).
It will take a text input as an argument and will search for URLs present within the text. It will then extract and display the identified URLs.
Features:
- User Input: The script prompts the user to enter a text input that may contain URLs.
- URL Extraction: The script applies regex patterns to identify URLs within the input text.
- Display Extracted URLs: Once the URLs are extracted, the script displays the identified URLs to the user.
- Multiple URLs: The script is designed to handle multiple URLs within the text input and extract them all.
- Error Handling: The script includes appropriate error handling to handle cases where no URLs are found or invalid input is provided.
Happy solving…
Stay happy 😄, keep coding, and do suggest any improvements if you have them.
Take care and have a great 😊 time. I’ll see you soon in the next post… Bye Bye 👋