Compare strings in Python

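In Python, a string is an immutable object, and a variable is just a label attached to an object in memory. Because strings cannot change, the interpreter is free to reuse one object for identical values. In CPython, identical string literals are usually interned, so two variables assigned the same literal often refer to the same string object. This is an implementation detail rather than a language guarantee. It can be observed by checking the id() value of each variable; the actual numbers differ from run to run.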

Example: id()

str1="Hello"
str2="Hello"
str3="HELLO"
print (id(str1), id(str2), id(str3))

Output

1215823728944 1215823728944 1215823729648

The comparison operator == checks equality of value, not identity. It returns True if both string operands contain the same sequence of characters, whether or not they are the same object in memory, and False otherwise.

Example: String Comparison using ==

print(str1 == str2)
print(str1 == str3)

Output

True
False

Python also has the != operator (read as "is not equal to"), which returns True if the contents of the two string operands differ, and False if they are the same.

Example: String Comparison using !=

print(str1 != str2)
print(str1 != str3)

Output

False
True

Python also has the identity operator is. Unlike ==, it does not look at the contents of the strings. It evaluates to True only if both operands refer to the same object, that is, if their id() values are the same.

Example: String Comparison using 'is'

print(str1 is str2)
print(str1 is str3)

Output

True
False

There is also the is not operator, which is the exact opposite: it evaluates to True if the two operands are different objects.

Example: String Comparison using 'is not' Operator

print (str1 is not str2)
print (str1 is not str3)

Output

False
True

On the face of it, == and is operators seem to behave similarly. However, consider the following example.

Example: String Comparison

var1="Tutorials"
var2="Teacher"
var3="TutorialsTeacher"

print(var1+var2 == var3)
print(var1+var2 is var3)

Output

True
False

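The concatenation var1 + var2 has the same value as var3, so the comparison using == returns True. However, the concatenation creates a new string object at run time, which is a different object from var3, so the comparison using is returns False. Use == (or !=) to compare the contents of strings, and reserve is for identity checks.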
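To make the distinction concrete, the following sketch repeats the comparison and then uses sys.intern() from the standard library, which returns a single canonical copy of a string. The variable names joined, same_value, same_object and interned_same are only illustrative, and the result of the is comparison reflects typical CPython behavior rather than a language guarantee.

```python
import sys

var1 = "Tutorials"
var2 = "Teacher"
var3 = "TutorialsTeacher"

joined = var1 + var2            # built at run time, so it is a new object

same_value = joined == var3     # compares the characters: True
same_object = joined is var3    # compares identity: typically False in CPython

# sys.intern() returns the canonical copy of a string, so two equal
# strings map to one and the same object after interning
interned_same = sys.intern(joined) is sys.intern(var3)

print(same_value, same_object, interned_same)
```

Interning is useful to know about, but ordinary code should simply compare strings with ==.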

The comparison operators ==, !=, <, >, <= and >= compare strings in lexicographic order. The Unicode values of the characters in each string are compared one by one, starting from the left. The result of the > and < operators is decided at the first index where the characters are not the same. If one string is a prefix of the other, the shorter string is considered smaller. For example, "bat" > "ball" returns True, which means that the first string appears after the second in alphabetical order.

Example: > Operator

print("bat">"ball")
print("car">"cat")

Output

True
False

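This is because, at the first position where the strings differ (index 2 in both comparisons), the Unicode value of t is greater than that of l in the first comparison, while the Unicode value of r is less than that of t in the second.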

Example: String Comparison

print(ord('t'), ord('l'))  #in first comparison
print(ord('r'), ord('t')) #in second comparison

Output

116 108
114 116
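The character-by-character rule can be sketched as a small function. This is only an illustration of the ordering described above, not how Python implements it internally; the name compare_strings and its -1/0/1 return convention are assumptions made for this example.

```python
def compare_strings(a, b):
    """Return -1, 0 or 1, mimicking how < and > order two strings."""
    for ch_a, ch_b in zip(a, b):
        if ch_a != ch_b:
            # the first differing position decides the result
            return -1 if ord(ch_a) < ord(ch_b) else 1
    # no difference found: the strings are equal, or one is a prefix
    # of the other, in which case the shorter string sorts first
    return (len(a) > len(b)) - (len(a) < len(b))

print(compare_strings("bat", "ball"))   # 1
print(compare_strings("car", "cat"))    # -1
print(compare_strings("app", "apple"))  # -1
```

In practice the built-in operators should be used directly; the function only spells out the steps they follow.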

String comparison is case-sensitive, because the Unicode values of lowercase letters are greater than those of uppercase letters. For example, "hello" > "HELLO" returns True. To compare strings without taking case into consideration, convert both of them to the same case first.

Example: String Comparison

str1="Hello"
str3="HELLO"
# convert both operands to upper case, then test for equality
print (str1.upper() == str3.upper())

Output

True
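For text that may contain non-English letters, str.casefold() is generally a safer choice than upper() or lower(), because it applies more aggressive case folding. A minimal sketch follows; the helper name equal_ignoring_case is hypothetical.

```python
def equal_ignoring_case(a, b):
    """Compare two strings for equality, ignoring case."""
    # casefold() is like lower(), but also handles special cases
    # such as the German sharp s, which folds to "ss"
    return a.casefold() == b.casefold()

print(equal_ignoring_case("Hello", "HELLO"))      # True
print(equal_ignoring_case("straße", "STRASSE"))   # True
print("straße".lower() == "STRASSE".lower())      # False
```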

Finally, we take a brief look at the match() and search() functions defined in the re module. Python's re module implements regular expression syntax for finding occurrences of a pattern in a string. The match() function checks whether the given pattern is found at the beginning of a string, whereas the search() function looks for it anywhere in the string. Both return a match object if the pattern is found and None otherwise. Another function, findall(), returns a list of all non-overlapping occurrences of the pattern.

Example: re.match(), re.search() and re.findall()

import re
string="Simple is better than complex"
pattern="Simple"
if re.match(pattern, string):
    print ("found match")
else:
    print("match not found")

pattern="dummy"
if re.match(pattern, string):
    print ("found match")
else:
    print("match not found")

pattern="ple"
obj=re.search(pattern, string)
print ("found pattern at ", obj.start())

obj=re.findall(pattern, string)
print (obj)

Output

found match
match not found
found pattern at  3
['ple', 'ple']
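Note that search() returns None when the pattern does not occur, so calling start() on the result without a check raises an AttributeError. One way to guard against this is sketched below; the helper name find_first and the -1 return value are assumptions chosen for this example.

```python
import re

def find_first(pattern, text):
    """Return the index of the first occurrence of pattern, or -1."""
    found = re.search(pattern, text)
    # search() returns None if there is no match anywhere in text
    return found.start() if found else -1

string = "Simple is better than complex"
print(find_first("ple", string))     # 3
print(find_first("dummy", string))   # -1
```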

To find the position of each appearance of the pattern, use the finditer() function.

Example: finditer()

obj=re.finditer(pattern, string)
for app in obj:
    print ("found pattern at index : ", app.start())

Output

found pattern at index :  3
found pattern at index :  25

The re module is much more powerful than this. It can search for complex string patterns such as alphanumeric strings, ignore case while searching, handle escape characters, and so on. A detailed discussion of these features is beyond the scope of this article.
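As a small taste of two of these features, the sketch below uses the re.IGNORECASE flag for a case-insensitive match and re.escape() to search for text that contains regular expression metacharacters. The sample text and the variable names are made up for illustration.

```python
import re

text = "Simple is better than complex. (Really.)"

# re.IGNORECASE lets the lowercase pattern match the capital "S"
case_insensitive = re.match("simple", text, re.IGNORECASE)
print(bool(case_insensitive))   # True

# Parentheses and dots have a special meaning in a pattern;
# re.escape() makes them match literally
literal = re.escape("(Really.)")
position = re.search(literal, text).start()
print(position == text.index("(Really.)"))   # True
```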

Malhar Lathkar

Author

Malhar Lathkar is an independent software professional with 30+ years of experience in various technologies such as Python, Java, Android, PHP, and Databases. He is an author of Python Data Persistence: With SQL and NOSQL Databases.