Update politeness v2 #367
Conversation
xehu
left a comment
See the error in the logic flow in the inline comments! I think we might need to make a small adjustment.
counted_sentences.add(sent.start)
break
return yesno_count, wh_count
# sentences = [str(sent) for sent in doc.sents if '?' in str(sent)]
This seems like the old logic commented out, which we can remove once we establish the new logic works!
| " thank ", | ||
| " thanks ", | ||
| " thank you ", | ||
| #" thank you ", |
Is it correct that we're commenting this out because otherwise we'd double-count "thank you very much"?
Yes. Both "thank" and "thank you" will be counted. We discussed this issue with Burint and he made the changes.
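One common way to avoid the double count is to match the longer phrases first and consume each match, so that " thank you " is never also counted as " thank ". A minimal sketch of that idea (illustrative only, not the repo's actual counting code):

```python
def count_phrases(text, phrases):
    """Count non-overlapping phrase matches, trying longer phrases first."""
    text = f" {text.lower()} "
    count = 0
    # Sort so " thank you " is tried before " thank ".
    for phrase in sorted(phrases, key=len, reverse=True):
        while phrase in text:
            count += 1
            # Consume the match so shorter phrases can't re-count it.
            text = text.replace(phrase, " ", 1)
    return count
```

With this, "well thank you very much" counts once for " thank you " and is not re-counted for " thank ".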
'is', 'are', 'was', 'were', 'am'}
# Pronouns that often follow auxiliaries in Yes/No questions
pronoun_followers = {'i', 'you', 'we', 'he', 'she', 'they', 'it'}
# filler_words = {'ok', 'so', 'well', 'like', 'you know', 'i mean', 'actually', 'basically', 'right', 'just', 'uh', 'um', 'oh', 'hmm', 'like'}
Are we no longer ignoring filler words like 'ok so'?
Yes. This was my own solution to this test case: "so Which part should we do first?". Burint updated his code later, so I adapted that instead.
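For reference, the kind of fix described here can be sketched as: strip leading filler words, then classify the sentence by its first meaningful token. This is a hedged illustration only (the word lists are examples, and the adopted code may differ):

```python
# Illustrative word lists, not the repo's exact sets.
FILLERS = {"ok", "so", "well", "like", "actually", "basically",
           "right", "just", "uh", "um", "oh", "hmm"}
WH_WORDS = {"what", "who", "where", "when", "why", "how", "which"}

def leading_token(sentence):
    """First token of the sentence after discarding leading filler words."""
    for word in sentence.lower().rstrip("?!. ").split():
        if word not in FILLERS:
            return word
    return None

def is_wh_question(sentence):
    # "so Which part should we do first?" classifies on "which", not "so".
    return leading_token(sentence) in WH_WORDS
```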
sent_tokens = list(sent)
if not sent_tokens:
    continue
# Method 1: Find question sentences by checking for '?' at end
So there's a bug with this logic: if the sentence doesn't end with a question mark, the fallback method is very error-prone. For example:
Question(nlp("can you tell me what is your name?")) yields YesNo = 1, WH = 0 (correct)
Question(nlp("can you tell me what is your name")) yields YesNo = 0, WH = 1 (incorrect)
Why can't we use logic that looks for the question words but then applies the search_tags logic, which helps get around some of these problems?
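A sketch of that suggestion (illustrative only, not the repo's implementation): classify a sentence-initial auxiliary + pronoun as Yes/No first, so an embedded WH clause doesn't flip the label, and otherwise look for a WH-tagged word followed by an auxiliary instead of relying on a trailing '?'. Input here is a list of (word, Penn Treebank tag) pairs such as a spaCy tagger would produce:

```python
WH_TAGS = {"WDT", "WP", "WP$", "WRB"}  # Penn Treebank WH tags
AUXILIARIES = {"do", "does", "did", "can", "could", "will", "would",
               "shall", "should", "may", "might", "must",
               "is", "are", "was", "were", "am"}
PRONOUNS = {"i", "you", "we", "he", "she", "they", "it"}

def classify_question(tokens):
    """Return 'YesNo', 'WH', or None for a list of (word, tag) pairs."""
    words = [w.lower() for w, _ in tokens]
    # A sentence-initial auxiliary + pronoun is a Yes/No question even when a
    # WH word appears later in an embedded clause, and even without a '?'.
    if len(words) >= 2 and words[0] in AUXILIARIES and words[1] in PRONOUNS:
        return "YesNo"
    # Otherwise a WH-tagged word followed by an auxiliary marks a WH question.
    for i, (_, tag) in enumerate(tokens[:-1]):
        if tag in WH_TAGS and words[i + 1] in AUXILIARIES:
            return "WH"
    return None
```

Under this ordering, "can you tell me what is your name" stays Yes/No with or without the question mark.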
…n the Question function
…little bit of help from claude)
I wrote a much more robust set of tests as I tried to iterate and figure out edge cases in the question function. Basically, I kept coming up with test cases, fixing them, and eventually reworked a good chunk of the function in order to catch all of the errors. I think this is probably the best version of the question detector we have so far, and perhaps we can (1) update the test cases with these expected values and (2) make a PR or comment to the other repository with a note on the suggested fixes. Ultimately, I think it's better to have a more accurate version of the question extraction than to follow the source material if it is error-prone.
xehu
left a comment
Our version of the politeness questions is more robust than ever, and we also have updated tests to confirm that we get the edge cases right. I say we merge this in!
wh_words = {'what', 'who', 'where', 'when', 'why', 'how', 'which'}
wh_followers = {
    'what': {'are', 'is', 'do', 'does', 'can', 'should', 'might'},
    'who': {'am', 'is', 'are', 'was', 'can', 'should'},
@sundy1994 I found a bug when testing this. It turns out that if you do not add 'am' to wh_followers, things like "who am I" or "what am I supposed to do" are not detected as WH questions. But there are still some bugs with this... I'll follow up on Slack.
UPDATE - I realized that it actually doesn't make much sense to separate the wh_followers from the other auxiliaries, so I refactored this so that we use a single, consistent set of auxiliaries.
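Roughly, that refactor can be sketched like this (a hedged illustration, not the exact code): one shared auxiliary set, including 'am', used for every WH word instead of a per-word followers dict:

```python
# Illustrative sets; the repo's actual lists may differ.
WH_WORDS = {"what", "who", "where", "when", "why", "how", "which"}
AUXILIARIES = {"do", "does", "did", "can", "could", "should", "would",
               "might", "is", "are", "was", "were", "am"}

def is_wh_question(words):
    """True if any WH word is immediately followed by an auxiliary."""
    words = [w.lower() for w in words]
    return any(w in WH_WORDS and words[i + 1] in AUXILIARIES
               for i, w in enumerate(words[:-1]))
```

Because 'am' is in the shared set, "who am I" and "what am I supposed to do" are both detected.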
1,B,I am not sure. Where is the rest of our team?,WH_Questions_receptiveness_yeomans,1
1,B,I am not sure. Where is the rest of our team?,First_Person_Single_receptiveness_yeomans,1
1,B,"Well please help me figure this out, I really want to do well on this please okay",factuality_politeness_convokit,1
1B,A,is where different from why?,YesNo_Questions_receptiveness_yeomans,1
@sundy1994 I added in this more robust set of question tests to test all the edge cases that the code (hopefully) now catches!
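For what it's worth, rows like these can drive a parameterized test directly: each row names the message, the feature to compute, and the expected count. A sketch of just the loader (the column names here are assumptions for illustration; the feature functions themselves belong to the package):

```python
import csv
import io

# Two rows taken from the test file above, with assumed column headers.
SAMPLE = """conversation_num,speaker,message,feature,expected
1,B,I am not sure. Where is the rest of our team?,WH_Questions_receptiveness_yeomans,1
1B,A,is where different from why?,YesNo_Questions_receptiveness_yeomans,1
"""

def load_cases(text):
    """Parse CSV test cases into (message, feature, expected) tuples."""
    return [(row["message"], row["feature"], int(row["expected"]))
            for row in csv.DictReader(io.StringIO(text))]
```

Each tuple can then be fed to something like pytest.mark.parametrize so every edge case is a separate test.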
fix #363