Hi,
Related to issue #56 and issue #37.
On Windows 10, javasphinx-apidoc fails when run on Python 3.6.4 but works when run on Python 2.7. Both use javasphinx==0.9.15.
It looks like the script, or possibly the Python stdlib, is expecting the files it reads to be encoded in cp1252, but the files are actually UTF-8. Decoding will fail on any byte that has no cp1252 mapping.
For example, when reading the character 🐍 (U+1F40D, encoded in UTF-8 as b'\xF0\x9F\x90\x8D'), the script throws an exception because it treats the four bytes as four separate characters, and byte 0x90 is not a valid cp1252 character.
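The failure can be reproduced without javasphinx at all. The snippet below is a minimal sketch using only the byte sequence quoted above; `try_decode` is a helper name made up for illustration.

```python
# The four UTF-8 bytes of U+1F40D, as quoted in the report.
snake_bytes = b"\xF0\x9F\x90\x8D"


def try_decode(data, encoding):
    """Return the decoded text, or the UnicodeDecodeError if decoding fails."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return exc


utf8_result = try_decode(snake_bytes, "utf-8")     # one character
cp1252_result = try_decode(snake_bytes, "cp1252")  # fails on byte 0x90

print(repr(utf8_result))
print(cp1252_result)
```

Decoding as UTF-8 yields a single character, while decoding as cp1252 stops at the third byte (0x90), which has no mapping in that code page.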
The stack trace shown is:
  File "C:\dev\env\python\Python36\Scripts\javasphinx-apidoc-script.py", line 11, in <module>
    load_entry_point('javasphinx==0.9.15', 'console_scripts', 'javasphinx-apidoc')()
  File "c:\dev\env\python\python36\lib\site-packages\javasphinx\apidoc.py", line 347, in main
    opts.member_headers, opts.parser_lib)
  File "c:\dev\env\python\python36\lib\site-packages\javasphinx\apidoc.py", line 228, in generate_documents
    this_file_documents = generate_from_source_file(doc_compiler, source_file, cache_dir)
  File "c:\dev\env\python\python36\lib\site-packages\javasphinx\apidoc.py", line 191, in generate_from_source_file
    source = f.read()
  File "c:\dev\env\python\python36\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x90 in position 24: character maps to <undefined>
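The last frame in that trace is encodings\cp1252.py, which suggests the file was opened in text mode without an explicit encoding. In Python 3, open() then falls back to the locale's preferred encoding. A quick way to check what that is on a given machine (a sketch, not part of javasphinx):

```python
import locale

# Encoding that Python 3's open() uses when no encoding= argument is passed.
default_encoding = locale.getpreferredencoding(False)
print(default_encoding)
```

On a Western European Windows install I would expect this to print cp1252, which would match the trace, but that is an assumption about the machine's locale settings.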
Whilst it works in py2, I feel like this is purely by accident, due to Python 2's very "liberal" string decoding policies and the fact that it's a UTF-8 file. If my file were encoded in something less common, e.g. EUC-JP or Shift_JIS, then the tool would fail. The official javadoc tool has an encoding option.
It would be good if javasphinx-apidoc could take an --encoding parameter and ensure that all files are read and decoded using that encoding.
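A rough sketch of what such an option could look like follows. This is not javasphinx's actual code: `build_parser`, `read_source`, the `apidoc-sketch` program name and the UTF-8 default are all assumptions for illustration, and the real tool's option handling may differ.

```python
import argparse
import io
import os
import tempfile


def build_parser():
    # Hypothetical parser; the real tool's option handling may differ.
    parser = argparse.ArgumentParser(prog="apidoc-sketch")
    parser.add_argument("--encoding", default="utf-8",
                        help="encoding used to decode source files")
    return parser


def read_source(path, encoding):
    # io.open accepts encoding= on both Python 2.7 and Python 3, so the
    # result no longer depends on the platform's default code page.
    with io.open(path, "r", encoding=encoding) as f:
        return f.read()


# Demo: write the UTF-8 bytes from the report to a temp file and read them back.
opts = build_parser().parse_args(["--encoding", "utf-8"])
fd, demo_path = tempfile.mkstemp(suffix=".java")
with os.fdopen(fd, "wb") as f:
    f.write(b"/** \xF0\x9F\x90\x8D */\n")
try:
    source = read_source(demo_path, opts.encoding)
finally:
    os.remove(demo_path)
```

Passing the encoding explicitly at every read means the same input decodes the same way on Windows, Linux and macOS, and on both Python versions.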
Full Example
This was done using PowerShell ISE to "ensure" that the Unicode characters were printed correctly, but the same failure happens in cmd.exe, Git Bash, etc.
PS C:\dev\work\Mobile-SDK-Android\docs> Get-Content .\java\utf8.java -Encoding UTF8
package java;
/**
 * 🐍 U+1F40D -> \xF0\x9F\x90\x8D
 * 👐 U+1F450 -> \xF0\x9F\x91\x90
 */
public class EncodingProblems {
    public static void main(String[] args) {
        System.out.println("Hello!");
    }
}
PS C:\dev\work\Mobile-SDK-Android\docs> C:\dev\env\python\Python36\Scripts\javasphinx-apidoc.exe --output-dir=tmp/ java/
C:\dev\env\python\Python36\Scripts\javasphinx-apidoc.exe : Traceback (most recent call last):
At line:1 char:1
+ C:\dev\env\python\Python36\Scripts\javasphinx-apidoc.exe --output-dir ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : NotSpecified: (Traceback (most recent call last)::String) [], RemoteException
+ FullyQualifiedErrorId : NativeCommandError
  File "C:\dev\env\python\Python36\Scripts\javasphinx-apidoc-script.py", line 11, in <module>
    load_entry_point('javasphinx==0.9.15', 'console_scripts', 'javasphinx-apidoc')()
  File "c:\dev\env\python\python36\lib\site-packages\javasphinx\apidoc.py", line 347, in main
    opts.member_headers, opts.parser_lib)
  File "c:\dev\env\python\python36\lib\site-packages\javasphinx\apidoc.py", line 228, in generate_documents
    this_file_documents = generate_from_source_file(doc_compiler, source_file, cache_dir)
  File "c:\dev\env\python\python36\lib\site-packages\javasphinx\apidoc.py", line 191, in generate_from_source_file
    source = f.read()
  File "c:\dev\env\python\python36\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x90 in position 24: character maps to <undefined>
PS C:\dev\work\Mobile-SDK-Android\docs> C:\dev\env\python\Python27\Scripts\javasphinx-apidoc.exe --output-dir=tmp/ java/