Thursday, February 16, 2012

Python HTMLParser and super()

So I have a class that inherits from HTMLParser, and I want to call the superclass init (the __init__ of HTMLParser). I would think I should do:

from HTMLParser import HTMLParser

class MyParser(HTMLParser):
    def __init__(self):
        super(MyParser, self).__init__()

But this causes a problem:

myparser = MyParser()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in __init__
TypeError: must be type, not classobj

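What's with that? The super(class, instance).__init__ idiom is supposedly the proper way of calling a parent class constructor, and it is, provided the class is a "new-style" Python class (one which inherits from object, either directly or through a class which itself inherits from object). In Python 2, super() only accepts new-style classes; hand it an old-style class and you get the TypeError above.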
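A quick way to see which kind of class you are dealing with is to check whether the class object itself is an instance of type. This is only a minimal sketch: is_new_style, Old and New are names made up for illustration, not anything from the standard library. On Python 2, Old is an old-style class (a classobj) and the check returns False for it; on Python 3 every class is new-style, so the check returns True for both.

```python
def is_new_style(cls):
    # Hypothetical helper: new-style classes are instances of type,
    # while Python 2 old-style classes are instances of classobj.
    return isinstance(cls, type)


class Old:
    # No explicit base: old-style on Python 2, new-style on Python 3.
    pass


class New(object):
    # Inherits from object: new-style on both Python 2 and Python 3.
    pass


print(is_new_style(Old))
print(is_new_style(New))
```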

And therein is the problem: HTMLParser inherits from markupbase.ParserBase, and markupbase.ParserBase is defined as:

class ParserBase:
    """Parser base class which provides some common support methods used
    by the SGML/HTML and XHTML parsers."""

That is, as an *old* style class. One definitely wonders why, in Python 2.7+, the classes that form part of the standard library aren't all new-style classes, *especially* when the class is meant to be inherited from (like HTMLParser). Anywho, to fix:

class MyParser(HTMLParser):
    def __init__(self):
        # Old style way of doing super()
        HTMLParser.__init__(self)
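If you would rather keep the super() idiom, another commonly used workaround is to list object as an extra base class, which makes the subclass itself new-style. The sketch below is an assumption on my part rather than anything the HTMLParser docs recommend; the tags attribute and the handle_starttag override are only there to show that the parent constructor really ran. The try/except import is just so the same snippet also loads on Python 3, where the module is called html.parser.

```python
try:
    from HTMLParser import HTMLParser  # Python 2 module name
except ImportError:
    from html.parser import HTMLParser  # Python 3 module name


class MyParser(HTMLParser, object):
    # Adding object as a base makes MyParser new-style on Python 2,
    # so super() accepts it.
    def __init__(self):
        super(MyParser, self).__init__()
        self.tags = []  # illustrative attribute, not part of HTMLParser

    def handle_starttag(self, tag, attrs):
        # Record each opening tag name as the parser encounters it.
        self.tags.append(tag)


parser = MyParser()
parser.feed("<p>hello <b>world</b></p>")
print(parser.tags)
```

Calling HTMLParser.__init__(self) directly is still the simpler fix; the object mix-in mostly earns its keep when the rest of your class hierarchy already relies on super().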
