Programming language: Python 3.9 replaces the parser


Python 3.9 has been released on schedule. With this release, the programming language gets a new parser that is intended to allow more flexible extensions. In terms of syntax, the union operators for dictionaries and the extended type hinting for collections are worth mentioning. There are also two new helper methods for strings.

Version 3.9, which has been in beta since May 2020, also marks a change in the timetable: instead of every 18 months as before, new releases are now due every year. Python 3.8 was released in October 2019, and in November Brett Cannon, a member of Python’s Steering Council, announced the new release cycle. Unlike TypeScript with its decimal counting, for example, the successor will not be version 4.0: an incremental update, Python 3.10, is planned for 2021.

CPython has used an LL(1) parser since the programming language's beginnings 30 years ago. Specifically, this means that it reads the input from left to right, produces a leftmost derivation, and can look ahead by one token. This approach performs well, but it has some limitations.

PEP 617 (Python Enhancement Proposal) therefore introduces a new parser that works on the principle of the Parsing Expression Grammar (PEG). In the summer of 2019, Python creator Guido van Rossum published a blog series on how the PEG parser works and the motivation for the change.

One of the main points of criticism is that looking ahead by only one token limits the syntax options. In addition, Python already uses constructs that are not compatible with an LL(1) grammar and that the parser can only process with tricks. Dealing with left-recursive syntax can also be quite awkward: in the worst case it leads to endless loops during parsing and thus to stack overflow errors.
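The left-recursion problem can be illustrated with a toy example. The following is only a minimal sketch and has nothing to do with CPython's actual parser code: the function names and the tiny subtraction grammar are invented for illustration. A naive recursive-descent function for the left-recursive rule expr := expr '-' NUMBER | NUMBER calls itself before consuming any token and never terminates on its own, while the equivalent rule expr := NUMBER ('-' NUMBER)* can be parsed with a simple loop.

```python
def parse_expr_naive(tokens, pos=0):
    # Hypothetical rule: expr := expr '-' NUMBER | NUMBER
    # The first alternative begins with expr itself, so the function
    # recurses before it has consumed a single token.
    left, pos = parse_expr_naive(tokens, pos)
    if pos < len(tokens) and tokens[pos] == '-':
        return left - int(tokens[pos + 1]), pos + 2
    return left, pos


def parse_expr_loop(tokens):
    # Same language without left recursion: expr := NUMBER ('-' NUMBER)*
    result = int(tokens[0])
    pos = 1
    while pos < len(tokens) and tokens[pos] == '-':
        result -= int(tokens[pos + 1])
        pos += 2
    return result


tokens = ['10', '-', '3', '-', '2']

try:
    parse_expr_naive(tokens)
    naive_failed = False
except RecursionError:
    # Python aborts the endless recursion once the stack limit is reached
    naive_failed = True

print(naive_failed)             # True
print(parse_expr_loop(tokens))  # 5, i.e. (10 - 3) - 2
```

The rewritten rule works, but it no longer mirrors the left-associative structure of the expression directly, which is the kind of workaround the criticism refers to.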

The new PEG parser is expected to perform roughly as well as the previous LL(1) parser, but it is much more flexible. In the current release it has no direct impact on the language, but Python 3.10 is being developed with this increased flexibility in mind. The LL(1) parser will no longer be included in that upcoming release. Python 3.9 uses the new parser by default, but the old one can be activated via the command line parameter -X oldparser or the environment variable PYTHONOLDPARSER=1.

In terms of syntax, the merge operator | for dict is worth mentioning, which finds its way into the language via PEP 584. It merges two dictionaries. The result contains all elements from both sources, and if a key occurs more than once, the value from the last dictionary in the chain is used:

d1 = {'a': 1, 'b': 2, 'c': 3}
d2 = {'c': 4, 'd': 5}

d3 = d1 | d2

# yields the following result, in which the
# entry c comes from the second dictionary
{'a': 1, 'b': 2, 'c': 4, 'd': 5}

d3 = d2 | d1

# by contrast, takes the value of c from the first dictionary, d1;
# the keys keep the order of their first appearance
{'c': 3, 'd': 5, 'a': 1, 'b': 2}

In addition to the regular merge operator, there is also the update operator |=, which stores the result of the union in the first dictionary.
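As a brief sketch of the update operator, reusing the dictionaries from the example above (Python 3.9 or later is assumed):

```python
d1 = {'a': 1, 'b': 2, 'c': 3}
d2 = {'c': 4, 'd': 5}

# In-place update: d1 itself is modified, d2 stays untouched
d1 |= d2
print(d1)  # {'a': 1, 'b': 2, 'c': 4, 'd': 5}

# Unlike |, the |= operator also accepts an iterable of key/value pairs
d1 |= [('e', 6)]
print(d1)  # {'a': 1, 'b': 2, 'c': 4, 'd': 5, 'e': 6}
```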

PEP 585 extends type hinting for collections. Explicit type specifications can now use the built-in generic types such as list or dict, for example in strList: list[str] or intList = list[int](). Developers no longer have to import the corresponding variants such as List or Dict from the typing module.
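A minimal sketch of what this looks like in practice, assuming Python 3.9 or later. The function count_words and the variable names are made up for illustration and do not come from the PEP:

```python
# No "from typing import List, Dict" needed for these annotations
def count_words(words: list[str]) -> dict[str, int]:
    counts: dict[str, int] = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1
    return counts


word_counts = count_words(['spam', 'eggs', 'spam'])
print(word_counts)  # {'spam': 2, 'eggs': 1}

# A parameterized built-in type can also be called; the result is a plain list
int_list = list[int]()
print(int_list)  # []
```

The annotations are not enforced at runtime; they serve static type checkers and documentation.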

Also noteworthy are the methods introduced in PEP 616 to trim strings at the beginning or end. The methods do not remove a fixed number of characters, but rather a given prefix or suffix, if it is present. As an example, the PEP uses a CPython module that currently relies on the following code to remove the enclosing quotation marks:

def strip_quotes(text):
    if text.startswith('"'):
        text = text[1:]
    if text.endswith('"'):
        text = text[:-1]
    return text

With the new helper methods, a one-liner without if clauses is sufficient for the cleanup: