Result parser issue when dealing with results containing a semicolon #3

@hecklerponics


There's an issue with parsing returned data with URLs that include a semicolon:

From /python_semrush/semrush.py

     84             result = {}
     85             for i, datum in enumerate(line.split(';')):
---> 86                 result[columns[i]] = datum.strip('"\n\r\t')
     87             results.append(result)

As an example, this URL came back from a call to the organic_phrase function:
http://www.hilton.com/en/hotels/content/SPTSHHF/media/pdf/Tangerine_Bar_2.pdf;jsessionid=DTE5TAZBV525MCSGBI12VCQ

This results in a "list index out of range" error, because the semicolon inside the URL produces more fields than there are columns. To get around this (in case others hit the same problem), I modified my script to pass export_escape=1 in the arguments, which forces double quotes around the values; I then updated the parser to split on '";"' instead of ';'.
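To illustrate the failure and the effect of the workaround, here is a minimal reproduction. The column names and the keyword value are made up for the example and are not real API output; only the URL is taken from the report above.

```python
# Hypothetical header and keyword; real SEMrush output may differ.
columns = "Keyword;Url".split(";")
url = ("http://www.hilton.com/en/hotels/content/SPTSHHF/media/pdf/"
       "Tangerine_Bar_2.pdf;jsessionid=DTE5TAZBV525MCSGBI12VCQ")

# Unquoted row: the semicolon inside the URL yields one field too many,
# so columns[i] runs past the end of the column list.
unquoted_line = "tangerine bar;" + url
naive_fields = unquoted_line.split(";")
print(len(columns), len(naive_fields))

# Quoted row (every field wrapped in double quotes): splitting on '";"'
# keeps the URL in one piece.
quoted_line = '"tangerine bar";"' + url + '"'
quoted_fields = [field.strip('"') for field in quoted_line.split('";"')]
print(quoted_fields)
```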

The new code looks like this:
(lines 75-89 of /python_semrush/semrush.py)

    @staticmethod
    def parse_response(data):
        results = []
        data = data.decode('unicode_escape')
        lines = data.split('\r\n')
        lines = list(filter(bool, lines))
        columns = lines[0].split(';')

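        for line in lines[1:]:
            result = {}
            # Split on '";"' so semicolons inside quoted values are preserved;
            # this relies on export_escape=1 so that fields are double-quoted.
            for i, datum in enumerate(line.split('";"')):
                result[columns[i]] = datum.strip('"\n\r\t')
            results.append(result)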

        return results

I'm sure there is a better way to do this, but in the meantime, this is a workaround that works!
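One possible alternative, offered only as a sketch: let Python's standard csv module handle the quoting instead of splitting by hand. The function name parse_response_csv is hypothetical and not part of python_semrush. The sketch assumes the response is bytes, that fields are separated by ';' and wrapped in double quotes (export_escape=1), and it decodes as UTF-8 rather than with 'unicode_escape' as the original does. The sample data is made up.

```python
import csv
import io


def parse_response_csv(data):
    """Hypothetical alternative parser; not part of python_semrush.

    Assumes `data` is bytes with ';'-separated fields, where values that
    contain a semicolon are wrapped in double quotes.
    """
    text = data.decode("utf-8")  # assumption: the original uses 'unicode_escape'
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=";", quotechar='"')
    rows = [row for row in reader if row]  # skip blank lines
    if not rows:
        return []
    columns = rows[0]  # first non-empty row is treated as the header
    return [dict(zip(columns, row)) for row in rows[1:]]


# Made-up sample response; the header names are illustrative only.
sample = (
    b"Keyword;Url\r\n"
    b'"tangerine bar";"http://www.hilton.com/en/hotels/content/SPTSHHF/'
    b'media/pdf/Tangerine_Bar_2.pdf;jsessionid=DTE5TAZBV525MCSGBI12VCQ"\r\n'
)
parsed = parse_response_csv(sample)
print(parsed[0]["Url"])
```

Because csv.reader tracks quote state, a semicolon inside a quoted value stays within its field, and the header row parses the same way whether or not it is quoted.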
