Bug 7558 - pyobfuscate doesn't work with Python 3
Status: CLOSED FIXED
Alias: None
Product: ThinLinc
Classification: Unclassified
Component: Build system
Version: trunk
Hardware: PC Unknown
Importance: P2 Normal
Target Milestone: 4.13.0
Assignee: Pierre Ossman
URL:
Keywords: prosaic, samuel_tester, wilsj_tester
Depends on:
Blocks: 4586
Reported: 2020-09-21 13:27 CEST by Pierre Ossman
Modified: 2023-09-14 11:05 CEST

See Also:
Acceptance Criteria:


Description Pierre Ossman cendio 2020-09-21 13:27:57 CEST
Our Python obfuscator is written for Python 2 and needs an update so it can work with our new Python 3 code.
Comment 1 Pierre Ossman cendio 2020-09-24 10:30:33 CEST
So unfortunately things are a bit more problematic than just porting the syntax of pyobfuscate:

 a) The module "compiler" is gone

 b) The module "parser" is not gone yet, but marked as deprecated

 c) The grammar has changed quite a bit

Fortunately we seem to have a way forward:

We can replace "compiler" with "ast", which makes things a lot cleaner.

Upstream is also pointing at "ast" as a replacement for "parser", which would clean that up as well. However "ast" doesn't have line numbers for all nodes until 3.9, so we can't use it yet. We'll have to stick with "parser" for now and look at a switch to "ast" once "parser" actually gets removed.

As for c), it will just be a matter of rolling up our sleeves and adapting the code to the new grammar.
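
A minimal sketch of what the "ast" route could look like, not the code that went into pyobfuscate. The helper name name_positions is hypothetical. It only covers ast.Name nodes; identifiers that "ast" stores as plain strings (function names in "def", attribute names, keyword arguments) are not handled here.

```python
import ast


def name_positions(source):
    """Return (identifier, line, column) for every ast.Name node in source."""
    tree = ast.parse(source)
    found = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            # lineno is 1-based, col_offset is 0-based
            found.append((node.id, node.lineno, node.col_offset))
    # ast.walk makes no ordering promise, so sort by position
    return sorted(found, key=lambda item: (item[1], item[2]))


if __name__ == "__main__":
    for entry in name_positions("total = 1\nprint(total)\n"):
        print(entry)
```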
Comment 2 Pierre Ossman cendio 2020-09-24 10:31:42 CEST
Upstream threw another wrench into things by breaking Symbol.is_local() for us:

https://bugs.python.org/issue41840

An ugly workaround seems possible, but hopefully they'll fix this quickly and we can go back to normal.
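
For context, a small sketch of how the standard "symtable" module reports scope for the symbols inside a function. The helper describe_function_symbols and the sample source are made up for illustration. Only function-scope symbols are inspected here; what is_local() returns in other scopes may differ between Python versions, which is what the upstream issue is about.

```python
import symtable

# Hypothetical sample input
SOURCE = """
def scale(values):
    factor = 2
    return [len(values), factor]
"""


def describe_function_symbols(source, func_name):
    """Map each symbol used in func_name to (is_local, is_global)."""
    top = symtable.symtable(source, "<example>", "exec")
    for child in top.get_children():
        if child.get_name() == func_name:
            return {
                sym.get_name(): (sym.is_local(), sym.is_global())
                for sym in child.get_symbols()
            }
    raise LookupError(func_name)


if __name__ == "__main__":
    for name, flags in sorted(describe_function_symbols(SOURCE, "scale").items()):
        print(name, flags)
```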
Comment 3 Pierre Ossman cendio 2020-09-24 12:41:04 CEST
Conversion to Python 3 done and sent upstream:

https://github.com/astrand/pyobfuscate/pull/24
Comment 6 Pierre Ossman cendio 2020-09-24 16:03:11 CEST
Should be done now. It passes all the included tests, and I couldn't see any meaningful difference when running the new and old version on code that is both Python 2 and Python 3 compatible.

Since we need both, the new version has gotten a suffix (i.e. "pyobfuscate3"). We could consider removing that once all Python 2 code is gone.
Comment 9 Samuel Mannehed cendio 2020-11-06 12:10:07 CET
Looks good.

I've looked through the code and verified the functionality by obfuscating hiveconf and running the unit tests (the only modification I had to make was to specify the public function names in "__all__ = []").

I ran both python2 and python3 unittests on the output from `cbrun x86_64 pyobfuscate3 hiveconf.py`. I also compared the output from pyobfuscate with pyobfuscate3. No problems found.
Comment 10 Niko Lehto cendio 2021-03-08 12:46:15 CET
We get a traceback if we try to obfuscate a file containing non-ascii. In my case this was an "Å" located in a comment in the handler_unbindports.py code. The traceback we get is:

>Traceback (most recent call last):
>  File "/usr/bin/pyobfuscate3", line 1154, in <module>
>    main()
>  File "/usr/bin/pyobfuscate3", line 1117, in main
>    source = open(conf.file, 'r').read()
>  File "/usr/lib/python3.6/encodings/ascii.py", line 26, in decode
>    return codecs.ascii_decode(input, self.errors)[0]
>UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 67: ordinal not in range(128)
>make[3]: *** [install-agent] Error 254
>make[3]: Leaving directory `/home/nikle/dev/ctc/buildarea/BUILD/thinlinc-vsm'
>make[2]: *** [install-obfusc] Error 2
>make[2]: Leaving directory `/home/nikle/dev/ctc/buildarea/BUILD/thinlinc-vsm'
>error: Bad exit status from /var/tmp/rpm-tmp.0k2OhO (%install)
Our Python 3 obfuscator should respect the encoding declaration if one is given. In the absence of an encoding declaration we should instead use the Python 3 default, which is UTF-8. (https://docs.python.org/3/reference/lexical_analysis.html)
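
One possible way to get that behaviour, sketched with the standard library's tokenize.detect_encoding(), which returns the declared encoding or "utf-8" when there is no declaration. The helper names decode_source and read_source are hypothetical, and this is not necessarily how pyobfuscate3 ended up doing it.

```python
import io
import tokenize


def decode_source(raw):
    """Decode Python source bytes using the declared encoding, else UTF-8."""
    # detect_encoding() looks at a BOM and at a coding declaration in the
    # first two lines, and falls back to "utf-8" if it finds neither.
    encoding, _ = tokenize.detect_encoding(io.BytesIO(raw).readline)
    return raw.decode(encoding)


def read_source(path):
    """Read a source file in binary mode so the locale doesn't interfere."""
    with open(path, "rb") as handle:
        return decode_source(handle.read())
```

Reading in binary mode sidesteps the locale-dependent default that open(conf.file, 'r') picked up in the traceback above.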
Comment 11 Niko Lehto cendio 2021-03-09 08:29:22 CET
(In reply to Niko Lehto from comment #10)
I forgot to include the command that caused this traceback:
> Calling /usr/bin/pyobfuscate3 modules/thinlinc/vsm/handler_unbindports.py >/home/nikle/dev/ctc/buildarea/BUILDROOT/thinlinc-vsm-4.12.1post-6770.i386/opt/thinlinc/modules/thinlinc/vsm/handler_unbindports.py
Comment 13 Niko Lehto cendio 2021-03-11 14:45:41 CET
We get the following traceback if we try to use the pyobfuscate script 'run_tests' in python 3.9:

>Traceback (most recent call last):
>  File "/home/nikle/dev/pyobfuscate3-ossman/test/../pyobfuscate", line 1179, in <module>
>    main()
>  File "/home/nikle/dev/pyobfuscate3-ossman/test/../pyobfuscate", line 1154, in main
>    cw = CSTWalker(source, pae.pubapi)
>  File "/home/nikle/dev/pyobfuscate3-ossman/test/../pyobfuscate", line 140, in __init__
>    self.walk(elements, [self.symtab])
>  File "/home/nikle/dev/pyobfuscate3-ossman/test/../pyobfuscate", line 208, in walk
>    self.walk(node, symtabs)
>  File "/home/nikle/dev/pyobfuscate3-ossman/test/../pyobfuscate", line 208, in walk
>    self.walk(node, symtabs)
>  File "/home/nikle/dev/pyobfuscate3-ossman/test/../pyobfuscate", line 208, in walk
>    self.walk(node, symtabs)
>  [Previous line repeated 1 more time]
>  File "/home/nikle/dev/pyobfuscate3-ossman/test/../pyobfuscate", line 197, in walk
>    self.handle_classdef(elements, symtabs)
>  File "/home/nikle/dev/pyobfuscate3-ossman/test/../pyobfuscate", line 638, in handle_classdef
>    self.walk(node, symtabs + [classtab])
>  File "/home/nikle/dev/pyobfuscate3-ossman/test/../pyobfuscate", line 208, in walk
>    self.walk(node, symtabs)
>  File "/home/nikle/dev/pyobfuscate3-ossman/test/../pyobfuscate", line 208, in walk
>    self.walk(node, symtabs)
>  File "/home/nikle/dev/pyobfuscate3-ossman/test/../pyobfuscate", line 208, in walk
>    self.walk(node, symtabs)
>  [Previous line repeated 2 more times]
>  File "/home/nikle/dev/pyobfuscate3-ossman/test/../pyobfuscate", line 203, in walk
>    self.handle_decorator(elements, symtabs)
>  File "/home/nikle/dev/pyobfuscate3-ossman/test/../pyobfuscate", line 709, in handle_decorator
>    assert name[0] == token.NAME
>AssertionError
Comment 14 Niko Lehto cendio 2021-03-12 10:30:35 CET
When obfuscating an iso-8859-15 (Latin9) encoded file and piping this to python 3.6, we get the following error:
> SyntaxError: encoding problem: ISO-8859-15
This piping of output is what our 'run_tests' script does at the moment.
The problem does not occur when either using Python 3.9 or saving the pyobfuscate output into a file first.

Note that this is a different error from the one you get if the encoding is unknown, which gives a traceback followed by:
> SyntaxError: unknown encoding: ISO-bad-15
Comment 15 Pierre Ossman cendio 2021-03-12 12:26:50 CET
(In reply to Niko Lehto from comment #13)
> >  File "/home/nikle/dev/pyobfuscate3-ossman/test/../pyobfuscate", line 709, in handle_decorator
> >    assert name[0] == token.NAME
> >AssertionError

Python has changed the grammar in 3.9 so we need some tweaks.
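
A sketch of a more tolerant way to pick up names from decorators. It assumes the change in question is the relaxed decorator grammar in Python 3.9 (PEP 614), which fits the assertion in handle_decorator but isn't spelled out here. It uses "ast" rather than "parser", so it illustrates the idea rather than the actual fix; decorator_names is a hypothetical helper. The sample input below sticks to decorator syntax that older versions also accept.

```python
import ast


def decorator_names(source):
    """Collect the names referenced by each decorator, whatever its shape."""
    result = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            for decorator in node.decorator_list:
                # Don't assume the decorator starts with a plain name;
                # walk the whole expression instead.
                names = [n.id for n in ast.walk(decorator)
                         if isinstance(n, ast.Name)]
                result.append(sorted(names))
    return result


if __name__ == "__main__":
    sample = "@plain\n@pkg.wrap(limit)\ndef f():\n    pass\n"
    print(decorator_names(sample))
```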
Comment 17 Pierre Ossman cendio 2021-03-15 12:23:35 CET
Both issues are now fixed and a new package has been deployed.
Comment 18 William Sjöblom cendio 2021-03-15 16:26:27 CET
A file with its encoding set to `latin-1' and containing non-ASCII letters (outside of comments) runs fine in Python 3.9.2 but crashes `pyobfuscate' with the following stack trace:
> Traceback (most recent call last):
>   File "/home/wilsj/workbench/pyobfuscate/./pyobfuscate", line 1168, in <module>
>     main()
>   File "/home/wilsj/workbench/pyobfuscate/./pyobfuscate", line 1143, in main
>     cw = CSTWalker(source, pae.pubapi)
>   File "/home/wilsj/workbench/pyobfuscate/./pyobfuscate", line 143, in __init__
>     cst = parser.suite(source.decode(encoding))
>   File "<string>", line 2
>     ä = 3
>      ^
> SyntaxError: invalid character '¤' (U+00A4)

Input file:
> # -*- coding: latin-1; -*-
> ä = 3
> print(ä)
Comment 19 Pierre Ossman cendio 2021-03-15 17:00:09 CET
(In reply to William Sjöblom from comment #18)
> >   File "<string>", line 2
> >     ä = 3
> >      ^
> > SyntaxError: invalid character '¤' (U+00A4)
> 

AFAICT this is a bug in Python's parser module. It requires the input to be "str", not "bytes" as compile() accepts. But it still needs to feed the lower layers a byte stream, so it seems to always convert things to UTF-8. If the file is tagged as something other than UTF-8, the lower layers then get upset and complain.

Everything works just fine if the file is actually UTF-8, so in most cases this is not an issue.

No point in reporting this upstream as they have dropped the entire parser module for 3.10 (which is another issue, but we'll deal with that when we get there).
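
A small illustration of the difference, using only compile() since "parser" is going away. The mojibake explanation for the '¤' in the error is an assumption based on the message, not something verified against the parser source: "ä" encoded as UTF-8 and then read as Latin-1 becomes "Ã¤", and "¤" is not a valid identifier character. RAW, MISREAD and run_bytes are names made up for this example.

```python
# Latin-1 encoded source, same content as the input file in comment #18
# minus the print() call.
RAW = b"# -*- coding: latin-1 -*-\n\xe4 = 3\n"

# What the text looks like if its UTF-8 bytes are interpreted as Latin-1.
MISREAD = "\u00e4".encode("utf-8").decode("latin-1")


def run_bytes(raw):
    """Compile source bytes directly so the coding declaration is honoured."""
    namespace = {}
    exec(compile(raw, "<bytes>", "exec"), namespace)
    return namespace


if __name__ == "__main__":
    print(repr(MISREAD))
    print(run_bytes(RAW)["\u00e4"])
```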
Comment 20 William Sjöblom cendio 2021-03-16 13:18:38 CET
I could reproduce the issue, and handling of UTF-8 characters outside the ASCII range works as expected.

Tested by running `python3.9 pyobfuscate' on `cpython/Lib/test/*.py' (https://github.com/python/cpython/, 3.9 branch). These ran successfully with regard to encoding handling, apart from one character in `Lib/test/test_unicode_identifiers.py' with a trailing `VARIATION SELECTOR-17', which resulted in the following stack trace:

Traceback (most recent call last):
  File "/home/wilsj/workbench/pyobfuscate/./pyobfuscate", line 1154, in <module>
    main()
  File "/home/wilsj/workbench/pyobfuscate/./pyobfuscate", line 1141, in main
    ce = ColumnExtractor(source, cw.names)
  File "/home/wilsj/workbench/pyobfuscate/./pyobfuscate", line 809, in __init__
    self.parse(f)
  File "/home/wilsj/workbench/pyobfuscate/./pyobfuscate", line 840, in parse
    raise RuntimeError("Overlooked symbol '%s' on line %d column %d" % (t_string, srow, scol))
RuntimeError: Overlooked symbol 'x' on line 11 column 12

This is deemed out of scope as of now. Marking as closed.