@sudharsh

StringBufferInputStream is deprecated and does not convert chars into bytes correctly (see http://docs.oracle.com/javase/7/docs/api/java/io/StringBufferInputStream.html). As a result, parsing from strings returned wrong results (in many cases, empty content). In my case, using FileInputStream wasn't an option. Anything that passed through a String at some point would corrupt the raw data.

Therefore, I have replaced StringBufferInputStream with ByteArrayInputStream in the jcc args.
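For readers unfamiliar with the char-to-byte problem, here is a minimal Java sketch of the difference between the two stream classes. It is an illustration only, not code from this pull request: the class name `StreamComparison` and its helper methods are hypothetical, and encoding the string as UTF-8 is an assumption made for the example. It relies on the documented behavior that StringBufferInputStream uses only the low eight bits of each character.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.StringBufferInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; not part of the project.
public class StreamComparison {

    // Drain a stream into a byte array, one byte at a time.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1) {
            out.write(b);
        }
        return out.toByteArray();
    }

    // Deprecated path: each char is truncated to its low eight bits.
    @SuppressWarnings("deprecation")
    static byte[] viaStringBufferInputStream(String s) throws IOException {
        return readAll(new StringBufferInputStream(s));
    }

    // Replacement path: the string is encoded explicitly (UTF-8 assumed here)
    // and the resulting bytes are served unchanged.
    static byte[] viaByteArrayInputStream(String s) throws IOException {
        return readAll(new ByteArrayInputStream(s.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws IOException {
        String text = "\u20AC"; // euro sign, U+20AC

        // Only the low byte 0xAC survives.
        System.out.println(Arrays.toString(viaStringBufferInputStream(text))); // [-84]

        // Full UTF-8 encoding: 0xE2 0x82 0xAC.
        System.out.println(Arrays.toString(viaByteArrayInputStream(text)));    // [-30, -126, -84]
    }
}
```

With ByteArrayInputStream the caller decides how the bytes are produced, so raw data that is already a byte array can be passed through without any character conversion at all.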

As you can see, I have reorganized the directory structure and bumped Tika to 1.1. I have also added a new module called parser that exposes from_file and from_buffer functions for the lazy ones out there.

I have tested the changes on Mac and Linux.

@Bengt

Bengt commented Sep 16, 2012

This works for me and fixes issue #1.
