import re
some_match = re.search('[a-z][a-z]', '1A_ux_FHX_yu')
some_match
<re.Match object; span=(3, 5), match='ux'>
Official documentation: https://docs.python.org/3/library/re.html
A regular expression is a sequence of rules (a pattern) describing which parts of a string should match. Some notable examples:
[a-z][a-z]
matches two consecutive lower case letters
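We can retrieve the position of the match with some_match.span() (or some_match.start() and some_match.end())
We can retrieve the string that was searched with some_match.string
We can retrieve the group (the substring of the original string) that was matched with some_match.group()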
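A minimal sketch of these Match accessors on the example above. The second input string, 'AB_12', is made up to show that re.search returns None when nothing matches.

```python
import re

some_match = re.search('[a-z][a-z]', '1A_ux_FHX_yu')

position = some_match.span()                        # (3, 5): start and end index
start, end = some_match.start(), some_match.end()   # 3 and 5
searched = some_match.string     # '1A_ux_FHX_yu': the string passed to re.search
matched = some_match.group()     # 'ux': the matched substring, same as group(0)

# re.search returns None when there is no match, so test before using it
no_match = re.search('[a-z][a-z]', 'AB_12')
if no_match is None:
    print('no two consecutive lower case letters found')
```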
[A-Z]: matches one upper case letter
[a-z]: matches one lower case letter
[a-] and [-a]: match either the letter a or the character -
[A-Z][A-Z]: matches two consecutive upper case letters
\d or [0-9]: matches a digit (for str patterns, \d also matches non-ASCII Unicode digits)
.: matches any single character except a newline (unless re.DOTALL is set)
\w: matches a single word character (a letter, a digit or the underscore)
\s: matches a single whitespace character (space, tab, newline, ...)
\b: matches the empty string, but only at the beginning or end of a word
[]: indicates a set of possible characters
Beginning/end of a string or line:
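^: matches the beginning of the string (with re.MULTILINE, also the beginning of each line)
$: matches the end of the string or just before a final newline (with re.MULTILINE, also the end of each line)
Repetitions: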
[A-Z][A-Z]\d: matches two consecutive upper case letters followed by a digit
+: matches the preceding pattern one or more times
*: matches the preceding pattern zero or more times
?: matches the preceding pattern zero or one time
\d+: matches one or more consecutive digits
\d{2,3}: matches two or three consecutive digits
Logical or:
(ab|123): matches either ab or 123
([a-z]+|\d*): matches either one or more lower case letters, or zero or more digits (so it can also match the empty string)
<re.Match object; span=(29, 32), match='Ab3'>
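A short sketch exercising the patterns listed above with re.findall, which returns every non-overlapping match (or, when the pattern has one capturing group, the text of that group). All sample strings are made up for illustration.

```python
import re

# Character classes
text = 'a-b Cat 42_x'
upper = re.findall(r'[A-Z]', text)        # ['C']
a_or_dash = re.findall(r'[a-]', text)     # ['a', '-', 'a']
digits = re.findall(r'\d', text)          # ['4', '2']
words = re.findall(r'\w+', text)          # ['a', 'b', 'Cat', '42_x']
spaces = re.findall(r'\s', text)          # [' ', ' ']
dot = re.findall(r'C.t', text)            # ['Cat']: '.' is any one character
word_starts = re.findall(r'\b\w', text)   # ['a', 'b', 'C', '4']

# Anchors: by default ^ and $ refer to the whole string;
# with re.MULTILINE they also match at every line boundary
lines = 'first line\nsecond line\nthird'
first_word = re.findall(r'^\w+', lines)                 # ['first']
line_starts = re.findall(r'^\w+', lines, re.MULTILINE)  # ['first', 'second', 'third']
line_ends = re.findall(r'\w+$', lines, re.MULTILINE)    # ['line', 'line', 'third']

# Repetitions
flight = 'AB1 leaves at 7 from C42, seats 105 to 2003'
two_caps_digit = re.findall(r'[A-Z][A-Z]\d', flight)  # ['AB1']
numbers = re.findall(r'\d+', flight)                  # ['1', '7', '42', '105', '2003']
two_or_three = re.findall(r'\d{2,3}', flight)         # ['42', '105', '200']
spellings = re.findall(r'colou?r', 'color colour')    # ['color', 'colour']
a_then_bs = re.findall(r'ab*', 'a ab abbb')           # ['a', 'ab', 'abbb']

# Logical or
ab_or_123 = re.findall(r'(ab|123)', 'xab 123 a12')        # ['ab', '123']
letters_or_digits = re.findall(r'([a-z]+|\d+)', 'ab12C')  # ['ab', '12']
# \d* can match nothing, so this variant also yields empty strings
mixed = re.findall(r'([a-z]+|\d*)', 'ab12C')
```

Note how \d{2,3} turns '2003' into '200': the quantifier stops after three digits and the leftover '3' is too short for another match.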
sentence = 'I paid $10'
sentence2 = 'I paid $1061.50'
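# $ means "end of string", so without escaping the dollar sign nothing is found
re.search(r'$\d+', sentence) is None
True
# escaped as \$, it matches a literal dollar sign
re.search(r'\$\d+', sentence)
<re.Match object; span=(7, 10), match='$10'>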
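sentence2 contains a decimal amount. A possible way to extend the pattern to it (our own sketch, not necessarily the original author's) adds an optional, non-capturing cents part; re.escape is the standard-library helper that backslash-escapes metacharacters in literal text.

```python
import re

sentence = 'I paid $10'
sentence2 = 'I paid $1061.50'

# \$\d+ alone stops at the decimal point
whole_dollars = re.search(r'\$\d+', sentence2).group()   # '$1061'

# (?:\.\d{2})? optionally accepts a dot followed by two digits
price = r'\$\d+(?:\.\d{2})?'
amount = re.search(price, sentence).group()     # '$10'
amount2 = re.search(price, sentence2).group()   # '$1061.50'

# re.escape escapes regex metacharacters in a literal string
escaped = re.escape('$10')   # the pattern \$10
```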
sentence = '16:332:509 Convex Optimization for Engineering Applications (3)'
expression = r"(\d{2,3}:?){3}\s"
# findall returns the text captured by the group (its last repetition), not the whole match
re.findall(expression, sentence)
['509']
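To get the whole course code instead of '509', the group can be made non-capturing with (?:...), or re.search can be used and group(0) read directly. A small sketch of both:

```python
import re

sentence = '16:332:509 Convex Optimization for Engineering Applications (3)'

# (?:...) groups without capturing, so findall returns the whole match
whole = re.findall(r"(?:\d{2,3}:?){3}\s", sentence)   # ['16:332:509 ']

# With the capturing version, search() still exposes both
m = re.search(r"(\d{2,3}:?){3}\s", sentence)
full = m.group(0)   # '16:332:509 ': the entire match, trailing space included
last = m.group(1)   # '509': a repeated group keeps only its last repetition
```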
sentence = '16:332:509 Convex Optimization for Engineering Applications (3)'
expression = r"(?P<school>\d{2}):(?P<level>\d{3}):(?P<course>\d{3})\s(?P<title>(\w\s?)+)\s\((?P<credits>\d+)\)"
m = re.search(expression, sentence)
m.groupdict()
{'school': '16',
'level': '332',
'course': '509',
'title': 'Convex Optimization for Engineering Applications',
'credits': '3'}
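Besides groupdict(), each named group can be read on its own. A brief sketch on the same match; note that captured values are always strings.

```python
import re

sentence = '16:332:509 Convex Optimization for Engineering Applications (3)'
expression = r"(?P<school>\d{2}):(?P<level>\d{3}):(?P<course>\d{3})\s(?P<title>(\w\s?)+)\s\((?P<credits>\d+)\)"
m = re.search(expression, sentence)

course = m.group('course')      # '509'
title = m['title']              # same as m.group('title')
school = m.group(1)             # '16': named groups are also numbered from 1
start, end = m.span('course')   # (7, 10): position of one named group

# Convert explicitly when a number is needed
credits = int(m.group('credits'))   # 3
```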
compiled = re.compile(r"(?P<school>\d{2}):(?P<level>\d{3}):(?P<course>\d{3})\s(?P<title>(\w\s?)+)\s\((?P<credits>\d+)\)",
re.DEBUG)
SUBPATTERN 1 0 0
  MAX_REPEAT 2 2
    IN
      CATEGORY CATEGORY_DIGIT
LITERAL 58
SUBPATTERN 2 0 0
  MAX_REPEAT 3 3
    IN
      CATEGORY CATEGORY_DIGIT
LITERAL 58
SUBPATTERN 3 0 0
  MAX_REPEAT 3 3
    IN
      CATEGORY CATEGORY_DIGIT
IN
  CATEGORY CATEGORY_SPACE
SUBPATTERN 4 0 0
  MAX_REPEAT 1 MAXREPEAT
    SUBPATTERN 5 0 0
      IN
        CATEGORY CATEGORY_WORD
      MAX_REPEAT 0 1
        IN
          CATEGORY CATEGORY_SPACE
IN
  CATEGORY CATEGORY_SPACE
LITERAL 40
SUBPATTERN 6 0 0
  MAX_REPEAT 1 MAXREPEAT
    IN
      CATEGORY CATEGORY_DIGIT
LITERAL 41
0. INFO 4 0b0 16 MAXREPEAT (to 5)
5: MARK 0
7. REPEAT_ONE 9 2 2 (to 17)
11. IN 4 (to 16)
13. CATEGORY UNI_DIGIT
15. FAILURE
16: SUCCESS
17: MARK 1
19. LITERAL 0x3a (':')
21. MARK 2
23. REPEAT_ONE 9 3 3 (to 33)
27. IN 4 (to 32)
29. CATEGORY UNI_DIGIT
31. FAILURE
32: SUCCESS
33: MARK 3
35. LITERAL 0x3a (':')
37. MARK 4
39. REPEAT_ONE 9 3 3 (to 49)
43. IN 4 (to 48)
45. CATEGORY UNI_DIGIT
47. FAILURE
48: SUCCESS
49: MARK 5
51. IN 4 (to 56)
53. CATEGORY UNI_SPACE
55. FAILURE
56: MARK 6
58. REPEAT 22 1 MAXREPEAT (to 81)
62. MARK 8
64. IN 4 (to 69)
66. CATEGORY UNI_WORD
68. FAILURE
69: REPEAT_ONE 9 0 1 (to 79)
73. IN 4 (to 78)
75. CATEGORY UNI_SPACE
77. FAILURE
78: SUCCESS
79: MARK 9
81: MAX_UNTIL
82. MARK 7
84. IN 4 (to 89)
86. CATEGORY UNI_SPACE
88. FAILURE
89: LITERAL 0x28 ('(')
91. MARK 10
93. REPEAT_ONE 9 1 MAXREPEAT (to 103)
97. IN 4 (to 102)
99. CATEGORY UNI_DIGIT
101. FAILURE
102: SUCCESS
103: MARK 11
105. LITERAL 0x29 (')')
107. SUCCESS
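Long patterns like this one are easier to read with the re.VERBOSE flag, under which whitespace in the pattern is ignored and # starts a comment. A sketch of the same pattern laid out that way and compiled once for reuse; the second course string is a made-up example.

```python
import re

course_re = re.compile(r"""
    (?P<school>\d{2}) :      # two digits
    (?P<level>\d{3})  :      # three digits
    (?P<course>\d{3}) \s     # three digits, then one whitespace character
    (?P<title>(\w\s?)+) \s   # word characters, each optionally followed by one whitespace
    \( (?P<credits>\d+) \)   # number of credits in parentheses
    """, re.VERBOSE)

# Compile once, then reuse the pattern on many strings
courses = [
    '16:332:509 Convex Optimization for Engineering Applications (3)',
    '99:999:001 Sample Course Title (1)',   # hypothetical course
]
parsed = [course_re.search(line).groupdict() for line in courses]

for info in parsed:
    print(info['course'], info['title'], info['credits'])
```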