
thread13 commented Jun 4, 2016

… since this seems to break the source code parser. The preferred fix would certainly be to make the parser accept non-ASCII characters (Python itself does, for one thing), but as a quick fix:

--- __init__.py.orig    2016-06-04 20:24:26.507343246 +1000
+++ __init__.py 2016-06-04 20:39:29.206238307 +1000
@@ -23,6 +23,19 @@
 import getopt, sys, os, string, re
 import keyword, parser, symbol, token

+_re_ascii_filter = '[^%s]' % (re.escape(string.printable), )
+
+def ascii_dammit( sourcecode, _re_expr = re.compile( _re_ascii_filter ) ):
+    """
+        just ignore all non-ascii characters 
+        since any identifiers should be ASCII anyway ;
+        nb: this will work for utf-8 as well
+        
+    """
+
+    result = _re_expr.sub( '', sourcecode )
+    return result
+

 class Mark(object):
     """ Marks, as defined by Cscope, that are implemented.
@@ -234,6 +247,7 @@
     # Add path info to any syntax errors in the source files
     if filecontents:
         try:
+            filecontents = ascii_dammit( filecontents )
             indexbuff_len = parseSource(filecontents, indexbuff, indexbuff_len, dump)
         except (SyntaxError, AssertionError) as e:
             e.filename = fullpath
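
To illustrate what the filter in the patch does to a source string, here is a minimal standalone sketch. It reuses the same character class as `ascii_dammit`; the name `strip_non_ascii` and the sample string are made up for illustration and are not part of pycscope.

```python
import re
import string

# Same character class as in the patch: anything outside string.printable.
_NON_PRINTABLE_ASCII = re.compile('[^%s]' % re.escape(string.printable))


def strip_non_ascii(source):
    """Drop every character that is not printable ASCII."""
    return _NON_PRINTABLE_ASCII.sub('', source)


# Hypothetical input: non-ASCII characters inside a string literal and a comment.
sample = 'name = "caf\u00e9"  # na\u00efve\n'
cleaned = strip_non_ascii(sample)
print(repr(cleaned))
```

Newlines are part of `string.printable`, so the line count of the file is preserved; column positions on a line that contained non-ASCII characters shift left, and string literals silently lose those characters.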


thread13 commented Jun 4, 2016

pycscope also does not like embedded '\0' bytes (by the way, it should probably add the filename to the printed exception):

Traceback (most recent call last):
  File "/usr/local/bin/pycscope", line 9, in <module>
    load_entry_point('pycscope==1.2.1', 'console_scripts', 'pycscope')()
  File "build/bdist.linux-x86_64/egg/pycscope/__init__.py", line 128, in main
  File "build/bdist.linux-x86_64/egg/pycscope/__init__.py", line 171, in work
  File "build/bdist.linux-x86_64/egg/pycscope/__init__.py", line 237, in parseFile
  File "build/bdist.linux-x86_64/egg/pycscope/__init__.py", line 938, in parseSource
TypeError: suite() argument 1 must be string without null bytes, not str
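
One possible workaround, sketched under assumptions: `'\0'` is not in `string.printable`, so the character class from the patch earlier in this thread would also drop NUL bytes if it runs before the parser sees the source. The snippet below uses the built-in `compile()` as a stand-in for pycscope's parser, and the sample contents are hypothetical.

```python
import re
import string

# Character class from the patch earlier in this thread.
non_printable = re.compile('[^%s]' % re.escape(string.printable))

# Hypothetical file contents with an embedded NUL byte.
contents = 'x = 1\0\ny = 2\n'

# NUL is not a printable character, so the filter removes it.
has_nul_in_printable = '\0' in string.printable
cleaned = non_printable.sub('', contents)

# compile() stands in for pycscope's parser here; it accepts the cleaned source.
code = compile(cleaned, '<example>', 'exec')
```

Whether silently dropping the byte is acceptable depends on the file; a source file containing NUL bytes may well be corrupt or binary, in which case skipping it with a warning could be the safer choice.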


thread13 commented Jun 4, 2016

Printing the name of the file that triggers the crash (commit 50e42f9 in the fork):

$ diff -u __init__.py.new __init__.py 
--- __init__.py.orig    2016-06-04 22:31:10.027610098 +1000
+++ __init__.py 2016-06-04 22:53:03.805282697 +1000
@@ -247,11 +247,16 @@
     # Add path info to any syntax errors in the source files
     if filecontents:
         try:
             indexbuff_len = parseSource(filecontents, indexbuff, indexbuff_len, dump)
         except (SyntaxError, AssertionError) as e:
             e.filename = fullpath
             raise e
+        except Exception as e:
+            # debug a fatal exception: 
+            e.filename = fullpath
+            print("pycscope.py: %s in %s" % (e, repr(fullpath)))
+            raise e

     return indexbuff_len
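
The pattern in this hunk (attach the path to any exception, report it, re-raise) can be sketched on its own as follows. `parse_with_path` and `failing_parser` are hypothetical names used only for illustration; `parse` stands in for pycscope's `parseSource`.

```python
import sys


def parse_with_path(fullpath, filecontents, parse):
    """Run parse(filecontents); on failure, attach and report the path.

    `parse` is any callable standing in for pycscope's parseSource.
    """
    try:
        return parse(filecontents)
    except Exception as e:
        # Record which file was being parsed, then let the error propagate.
        e.filename = fullpath
        print("pycscope.py: %s in %r" % (e, fullpath), file=sys.stderr)
        raise  # a bare raise keeps the original traceback


def failing_parser(source):
    # Hypothetical parser that rejects NUL bytes, as in the traceback above.
    if "\0" in source:
        raise TypeError("argument must be string without null bytes")
    return len(source)
```

Using a bare `raise` instead of `raise e` is a small design choice: it re-raises the active exception without adding the handler's own frame to the traceback.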

portante added the bug label Oct 18, 2019