@schmamps

In parser.py:

    # Jagadeesh, J., Pingali, P., & Varma, V. (2005). Sentence Extraction Based Single Document Summarization. International Institute of Information Technology, Hyderabad, India, 5.
    def getSentencePositionScore(self, i, sentenceCount):
        normalized = i / (sentenceCount * 1.0)

        if normalized > 0 and normalized <= 0.1:
            return 0.17
        # ... snip ....
        elif normalized > 0.8 and normalized <= 0.9:
            return 0.04
        elif normalized > 0.9 and normalized <= 1.0:
            return 0.15
        else:
            return 0

When Summarizer scores sentences (in Summarizer.computeScore()), it iterates through the list and passes each sentence's index as the first argument. Notice what happens when you unroll that loop, though:

    # where: ['sentence 1', 'sentence 2', ..., 'sentence 10']
    sentences = ['sentence ' + str(x + 1) for x in range(10)]

    # first sentence: normalized = 0, falls to the else branch, returns 0
    sentencePosition = parser.getSentencePositionScore(0, len(sentences))
    # ...
    # last sentence: normalized = 0.9, returns 0.04
    sentencePosition = parser.getSentencePositionScore(9, len(sentences))

Instinct tells me that a (next-to-)zero score for the introductory and concluding sentences is wrong. Adding to my suspicion, the cases in the referenced paper collectively cover the interval 0 < normalized <= 1.0, so a valid position should never reach the default branch at all.

To me this looks like an off-by-one bug. Nine of the ten sentences above are caught one bucket too early, and the first falls through to the default. That shouldn't even happen with valid arguments!

While I'm opinionated about this being a bug, respectable people can differ about where to fix it. One could make a case for Summarizer.computeScore(), but Parser.getSentencePositionScore() is already parameterized by index (zero-based, presumably), so fixing it there has the smallest impact: normalize the one-based position (i + 1) instead of i.
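
For what it's worth, here is a standalone sketch of that fix (name and style adapted, not the project's actual method; the intermediate buckets from the paper are elided just as in the snippet above):

    # Hypothetical standalone version of the position scorer with the
    # proposed fix: normalize the one-based position (i + 1) so that
    # index 0 lands in (0, 0.1] and the last index lands exactly at 1.0.
    def get_sentence_position_score(i, sentence_count):
        normalized = (i + 1) / float(sentence_count)

        if 0 < normalized <= 0.1:
            return 0.17
        # ... intermediate buckets from the paper, unchanged ...
        elif 0.8 < normalized <= 0.9:
            return 0.04
        elif 0.9 < normalized <= 1.0:
            return 0.15
        else:
            return 0

With ten sentences, the first now scores 0.17 and the last 0.15, and the default branch is only reachable with an out-of-range index.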
