Here is the definition of a class called Note, with most of its functions omitted.
public class Note
{
private String text;
String fileName = "";
NoteManager noteManager = null;
List<String> hyperlinks = new ArrayList<String>();
public static final int BUFFER_SIZE = 512;
public Note(NoteManager noteManager) {
this.noteManager = noteManager;
this.text = "";
}
public Note(NoteManager noteManager, String content) {
this(noteManager);
if (content == null)
setText("");
else
setText(content);
}
public Note(NoteManager noteManager, CharSequence content) {
this(noteManager, content.toString());
}
....some functions....
public static Note newFromFile(NoteManager noteManager, Context context,
String filename) throws IOException
{
FileInputStream inputFileStream = context.openFileInput(filename);
StringBuilder stringBuilder = new StringBuilder();
byte[] buffer = new byte[BUFFER_SIZE];
int len;
while ((len = inputFileStream.read(buffer)) > 0)
{
String line = new String(buffer, 0, len);
stringBuilder.append(line);
buffer = new byte[Note.BUFFER_SIZE];
}
Note n = new Note(noteManager, stringBuilder.toString().trim());
n.fileName = filename;
inputFileStream.close();
return n;
}
.... some functions attributed to this class
}
These notes are managed by a class called NoteManager.java, which I have abbreviated below:
public class NoteManager
{
Context context=null;
ArrayList<Note> notes = new ArrayList<Note>();
..... some functions...
public void addNote(Note note)
{
if (note == null || note.noteManager != this || notes.contains(note)) return;
note.noteManager = this;
notes.add(note);
try
{
note.saveToFile(context);
} catch (IOException e)
{
e.printStackTrace();
}
}
....some functions....
public void loadNotes()
{
String[] files = context.fileList();
notes.clear();
for (String fname:files)
{
try
{
notes.add(Note.newFromFile(this, context, fname));
} catch (IOException e)
{
e.printStackTrace();
}
}
}
}
public void addNote(Note note)
{
if (note == null || notes.contains(note)) return;
note.noteManager = this;
notes.add(note);
try
{
note.saveToFile(context);
} catch (IOException e)
{
e.printStackTrace();
}
}
I am trying to work out why this notepad app creates random new notes when the app is fully shut down and then reopened, but I just cannot see what the problem is. I have cut out all the functions which didn't seem to relate to the problem, so the logical error must be here somewhere.
How does one go about finding what I am guessing to be some kind of circular reference or lack of checks?
Android typically uses UTF-8, which has multi-byte characters. Creating a new String from an arbitrary byte sub-array can break characters at the beginning and end of the chunk as soon as the text deviates from ASCII.
public static Note newFromFile(NoteManager noteManager, Context context,
String filename) throws IOException
{
Path path = Paths.get(filename);
byte[] bytes = Files.readAllBytes(path);
String content = new String(bytes, "UTF-8");
Note n = new Note(noteManager, content.trim());
n.fileName = filename;
noteManager.add(n); // One registration?
return n;
}
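To illustrate the multi-byte issue, here is a minimal sketch that is independent of Android. The class name Utf8ReadSketch and both method names are made up for this example, and the buffer size of 2 is chosen only to force a split in the middle of a character. It compares decoding each chunk on its own (as the original loop does) with collecting all bytes first and decoding once:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the original app.
class Utf8ReadSketch {

    // Decodes every chunk separately, like the loop in the question.
    // A character whose bytes straddle two chunks gets corrupted.
    static String decodePerChunk(InputStream in, int bufferSize) throws IOException {
        StringBuilder sb = new StringBuilder();
        byte[] buffer = new byte[bufferSize];
        int len;
        while ((len = in.read(buffer)) > 0) {
            sb.append(new String(buffer, 0, len, StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    // Collects all bytes first and decodes them in one go.
    static String decodeOnce(InputStream in, int bufferSize) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[bufferSize];
        int len;
        while ((len = in.read(buffer)) != -1) {
            out.write(buffer, 0, len);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // 'a' is one byte in UTF-8, U+00E9 (e with acute accent) is two bytes.
        byte[] data = "a\u00e9".getBytes(StandardCharsets.UTF_8);
        String once = decodeOnce(new ByteArrayInputStream(data), 2);
        String chunked = decodePerChunk(new ByteArrayInputStream(data), 2);
        System.out.println(once.equals("a\u00e9"));    // true
        System.out.println(chunked.equals("a\u00e9")); // false
    }
}
```

In the app itself the stream would presumably still come from context.openFileInput(filename); only the decoding step changes.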
The problem of having multiple instances of a note might need the registration inside newFromFile, or maybe an extra check:
public void addNote(Note note)
{
if (note == null || note.noteManager != this || notes.contains(note)) {
return;
}
note.noteManager = this;
notes.add(note);
And finally a Note must be well defined.
public class Note implements Comparable<Note> {
private NoteManager noteManager;
private final String content; // Immutable.
public Note(NoteManager noteManager, String content) {
this.noteManager = noteManager;
this.content = content;
}
... compare on the immutable content
... hashCode on content
Not being able to change the content, and comparing on the string content, means that notes cannot be duplicated and cannot change while in the set, which would mix up the set ordering.
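As a rough sketch of what "well defined" could look like: the class below is a stripped-down stand-in named ImmutableNote (a hypothetical name, with the NoteManager reference left out to keep it self-contained). Without an equals() override, ArrayList.contains() falls back to object identity, so a note reloaded from file is never considered equal to one already in the list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical, simplified stand-in for the Note class discussed above.
final class ImmutableNote implements Comparable<ImmutableNote> {
    private final String content; // never changes after construction

    ImmutableNote(String content) {
        this.content = (content == null) ? "" : content;
    }

    String getContent() {
        return content;
    }

    // Two notes are equal when their content is equal.
    @Override
    public boolean equals(Object other) {
        return other instanceof ImmutableNote
                && content.equals(((ImmutableNote) other).content);
    }

    // Must be consistent with equals().
    @Override
    public int hashCode() {
        return content.hashCode();
    }

    // Ordering is consistent with equals(), so sorted sets behave as expected.
    @Override
    public int compareTo(ImmutableNote other) {
        return content.compareTo(other.content);
    }

    public static void main(String[] args) {
        List<ImmutableNote> list = new ArrayList<>();
        list.add(new ImmutableNote("buy milk"));
        // contains() now compares content rather than identity.
        System.out.println(list.contains(new ImmutableNote("buy milk"))); // true

        Set<ImmutableNote> set = new TreeSet<>();
        set.add(new ImmutableNote("buy milk"));
        set.add(new ImmutableNote("buy milk")); // rejected as a duplicate
        set.add(new ImmutableNote("call Bob"));
        System.out.println(set.size()); // 2
    }
}
```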
I'm writing my own library in Java, where you can save variables very simply. But I have a problem changing the values of the variables. The ArrayList empties itself as soon as the txt file is empty.
My Code:
public class SaveGameWriter {
private File file;
private boolean closed = false;
public void write(SaveGameFile savegamefile, String variableName, String variableValue, SaveGameReader reader) throws FileNotFoundException
{
if(!reader.read(savegamefile).contains(variableName))
{
file = savegamefile.getFile();
OutputStream stream = new FileOutputStream(file, true);
try {
String text = variableName+"="+variableValue;
stream.write(text.getBytes());
String lineSeparator = System.getProperty("line.separator");
stream.write(lineSeparator.getBytes());
}catch(IOException e)
{}
do {
try {
stream.close();
closed = true;
} catch (Exception e) {
closed = false;
}
} while (!closed);
}
}
public void setValueOf(SaveGameFile savegamefile, String variableName, String Value, SaveGameReader reader) throws IOException
{
ArrayList<String> list = reader.read(savegamefile);
if(list.contains(variableName))
{
list.set(list.indexOf(variableName), Value);
savegamefile.clear();
for(int i = 0; i<list.size()-1;i+=2)
{
write(savegamefile,list.get(i),list.get(i+1),reader);
}
}
}
}
Here my SaveGameReader class:
public class SaveGameReader {
private File file;
private ArrayList<String> result = new ArrayList<>();
public String getValueOf(SaveGameFile savegamefile, String variableName)
{
ArrayList<String> list = read(savegamefile);
if(list.contains(variableName))
{
return list.get(list.indexOf(variableName)+1);
}else
return null;
}
public ArrayList<String> read(SaveGameFile savegamefile) {
result.clear();
file = savegamefile.getFile();
BufferedReader in = null;
try {
in = new BufferedReader(new FileReader(file));
String read = null;
while ((read = in.readLine()) != null) {
String[] splited = read.split("=");
for (String part : splited) {
result.add(part);
}
}
} catch (IOException e) {
} finally {
boolean closed = false;
while(!closed)
{
try {
in.close();
closed=true;
} catch (Exception e) {
closed=false;
}
}
}
result.remove("");
return result;
}
}
And my SaveGameFile class:
public class SaveGameFile {
private File file;
public void create(String destination, String filename) throws IOException {
file = new File(destination+"/"+filename+".savegame");
if(!file.exists())
{
file.createNewFile();
}
}
public File getFile() {
return file;
}
public void clear() throws IOException
{
PrintWriter pw = new PrintWriter(file.getPath());
pw.close();
}
}
So, when I call the setValueOf() method, the ArrayList is empty and the txt file contains just the first variable and its value. Here's my data structure:
Name=Testperson
Age=40
Phone=1234
Money=1000
What's the problem with my code?
In your SaveGameReader.read() method you call result.clear(), which empties the ArrayList before the file is even opened. Because result is a field, read() returns the same list instance every time, so the list you hold in setValueOf() is that same object. After savegamefile.clear(), the first write() calls reader.read() again, which clears the list and re-reads the now-empty file, so you end up with an empty list.
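One way around this, sketched below under the assumption that a plain name=value text file is all that is needed, is to build a fresh collection on every read so that no caller shares state with the reader. The class SaveGameStoreSketch and its method signatures are invented for this example and are not the asker's API; it uses a map instead of a flat list, which also avoids confusing names with values.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical replacement for the reader/writer pair in the question.
class SaveGameStoreSketch {

    // Returns a NEW map on every call; nothing is cached in a field.
    static Map<String, String> read(Path file) throws IOException {
        Map<String, String> values = new LinkedHashMap<>();
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            int eq = line.indexOf('=');
            if (eq > 0) { // skip blank or malformed lines
                values.put(line.substring(0, eq), line.substring(eq + 1));
            }
        }
        return values;
    }

    // Rewrites the whole file from the given map (truncates existing content).
    static void write(Path file, Map<String, String> values) throws IOException {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, String> entry : values.entrySet()) {
            lines.add(entry.getKey() + "=" + entry.getValue());
        }
        Files.write(file, lines, StandardCharsets.UTF_8);
    }

    // Changes one existing variable and saves everything back.
    static void setValueOf(Path file, String name, String value) throws IOException {
        Map<String, String> values = read(file);
        if (values.containsKey(name)) {
            values.put(name, value);
            write(file, values);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("demo", ".savegame");
        Map<String, String> initial = new LinkedHashMap<>();
        initial.put("Name", "Testperson");
        initial.put("Age", "40");
        initial.put("Phone", "1234");
        initial.put("Money", "1000");
        write(file, initial);

        setValueOf(file, "Age", "41");
        System.out.println(read(file)); // {Name=Testperson, Age=41, Phone=1234, Money=1000}
        Files.delete(file);
    }
}
```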
I use LanguageTool for some spellchecking and spell correction functionality in my application.
The LanguageTool documentation describes how to exclude words from spell checking (by calling the addIgnoreTokens(...) method of the spell checking rule you're using).
How do you add some words (e.g., from a specific dictionary) to spell checking? That is, can LanguageTool fix words with misspellings and suggest words from my specific dictionary?
Unfortunately, I don't think the API supports this. Without the API, you can add words to spelling.txt to get them accepted and used as suggestions. With the API, you might need to extend MorfologikSpellerRule and change this place of the code. (Disclosure: I'm the maintainer of LanguageTool)
I have a similar requirement: loading some custom words into the dictionary as "suggested words", not just "ignored words". In the end I extended MorfologikSpellerRule to do this:
Create a class MorfologikSpellerRuleEx that extends MorfologikSpellerRule, override the match() method, and write my own initSpellerEx() for creating the spellers.
Then, for the language tool, create this custom speller rule to replace the existing one.
Code:
Language lang = new AmericanEnglish();
JLanguageTool langTool = new JLanguageTool(lang);
langTool.disableRule("MORFOLOGIK_RULE_EN_US");
try {
MorfologikSpellerRuleEx spellingRule = new MorfologikSpellerRuleEx(JLanguageTool.getMessageBundle(), lang);
spellingRule.setSpellingFilePath(spellingFilePath);
//spellingFilePath is the file that has my own words + words from /hunspell/spelling_en-US.txt
langTool.addRule(spellingRule);
} catch (IOException e) {
e.printStackTrace();
}
The code of my custom MorfologikSpellerRuleEx:
public class MorfologikSpellerRuleEx extends MorfologikSpellerRule {
private String spellingFilePath = null;
private boolean ignoreTaggedWords = false;
public MorfologikSpellerRuleEx(ResourceBundle messages, Language language) throws IOException {
super(messages, language);
}
@Override
public String getFileName() {
return "/en/hunspell/en_US.dict";
}
@Override
public String getId() {
return "MORFOLOGIK_SPELLING_RULE_EX";
}
@Override
public void setIgnoreTaggedWords() {
ignoreTaggedWords = true;
}
public String getSpellingFilePath() {
return spellingFilePath;
}
public void setSpellingFilePath(String spellingFilePath) {
this.spellingFilePath = spellingFilePath;
}
private void initSpellerEx(String binaryDict) throws IOException {
String plainTextDict = null;
if (JLanguageTool.getDataBroker().resourceExists(getSpellingFileName())) {
plainTextDict = getSpellingFileName();
}
if (plainTextDict != null) {
BufferedReader br = null;
if (this.spellingFilePath != null) {
try {
br = new BufferedReader(new FileReader(this.spellingFilePath));
}
catch (Exception e) {
br = null;
}
}
if (br != null) {
speller1 = new MorfologikMultiSpeller(binaryDict, br, plainTextDict, 1);
speller2 = new MorfologikMultiSpeller(binaryDict, br, plainTextDict, 2);
speller3 = new MorfologikMultiSpeller(binaryDict, br, plainTextDict, 3);
br.close();
}
else {
speller1 = new MorfologikMultiSpeller(binaryDict, plainTextDict, 1);
speller2 = new MorfologikMultiSpeller(binaryDict, plainTextDict, 2);
speller3 = new MorfologikMultiSpeller(binaryDict, plainTextDict, 3);
}
setConvertsCase(speller1.convertsCase());
} else {
throw new RuntimeException("Could not find ignore spell file in path: " + getSpellingFileName());
}
}
private boolean canBeIgnored(AnalyzedTokenReadings[] tokens, int idx, AnalyzedTokenReadings token)
throws IOException {
return token.isSentenceStart() || token.isImmunized() || token.isIgnoredBySpeller() || isUrl(token.getToken())
|| isEMail(token.getToken()) || (ignoreTaggedWords && token.isTagged()) || ignoreToken(tokens, idx);
}
@Override
public RuleMatch[] match(AnalyzedSentence sentence) throws IOException {
List<RuleMatch> ruleMatches = new ArrayList<>();
AnalyzedTokenReadings[] tokens = getSentenceWithImmunization(sentence).getTokensWithoutWhitespace();
// lazy init
if (speller1 == null) {
String binaryDict = null;
if (JLanguageTool.getDataBroker().resourceExists(getFileName())) {
binaryDict = getFileName();
}
if (binaryDict != null) {
initSpellerEx(binaryDict); //here's the change
} else {
// should not happen, as we only configure this rule (or rather its subclasses)
// when we have the resources:
return toRuleMatchArray(ruleMatches);
}
}
int idx = -1;
for (AnalyzedTokenReadings token : tokens) {
idx++;
if (canBeIgnored(tokens, idx, token)) {
continue;
}
// if we use token.getToken() we'll get ignored characters inside and speller
// will choke
String word = token.getAnalyzedToken(0).getToken();
if (tokenizingPattern() == null) {
ruleMatches.addAll(getRuleMatches(word, token.getStartPos(), sentence));
} else {
int index = 0;
Matcher m = tokenizingPattern().matcher(word);
while (m.find()) {
String match = word.subSequence(index, m.start()).toString();
ruleMatches.addAll(getRuleMatches(match, token.getStartPos() + index, sentence));
index = m.end();
}
if (index == 0) { // tokenizing char not found
ruleMatches.addAll(getRuleMatches(word, token.getStartPos(), sentence));
} else {
ruleMatches.addAll(getRuleMatches(word.subSequence(index, word.length()).toString(),
token.getStartPos() + index, sentence));
}
}
}
return toRuleMatchArray(ruleMatches);
}
}
I have an app in which I have to read a .txt file so that I can store some values and keep them. This is working pretty well, except for the fact that I want to make those values non-readable or "non-understandable" for external users.
My idea was to convert the file content into Hex or Binary and, in the reading process, change it back to Char. The thing is that I don't have access to methods such as String.Format due to my compiler.
Here's how I'm currently reading and keeping the values:
byte[] buffer = new byte[1024];
int len = myFile.read(buffer);
String data = null;
int i=0;
data = new String(buffer,0,len);
Class to open and manipulate the file:
public class File {
private boolean debug = false;
private FileConnection fc = null;
private OutputStream os = null;
private InputStream is = null;
private String fileName = "example.txt";
private String pathName = "logs/";
final String rootName = "file:///a:/";
public File(String fileName, String pathName) {
super();
this.fileName = fileName;
this.pathName = pathName;
if (!pathName.endsWith("/")) {
this.pathName += "/"; // add a slash
}
}
public boolean isDebug() {
return debug;
}
public void setDebug(boolean debug) {
this.debug = debug;
}
public void write(String text) throws IOException {
write(text.getBytes());
}
public void write(byte[] bytes) throws IOException {
if (debug)
System.out.println(new String(bytes));
os.write(bytes);
}
private FileConnection getFileConnection() throws IOException {
// check if subfolder exists
fc = (FileConnection) Connector.open(rootName + pathName);
if (!fc.exists() || !fc.isDirectory()) {
fc.mkdir();
if (debug)
System.out.println("Dir created");
}
// open file
fc = (FileConnection) Connector.open(rootName + pathName + fileName);
if (!fc.exists())
fc.create();
return fc;
}
/**
* release resources
*/
public void close() {
if (is != null)
try {
is.close();
} catch (IOException e) {
e.printStackTrace();
}
is = null;
if (os != null)
try {
os.close();
} catch (IOException e) {
e.printStackTrace();
}
os = null;
if (fc != null)
try {
fc.close();
} catch (IOException e) {
e.printStackTrace();
}
fc = null;
}
public void open(boolean writeAppend) throws IOException {
fc = getFileConnection();
if (!writeAppend)
fc.truncate(0);
is = fc.openInputStream();
os = fc.openOutputStream(fc.fileSize());
}
public int read(byte[] buffer) throws IOException {
return is.read(buffer);
}
public void delete() throws IOException {
close();
fc = (FileConnection) Connector.open(rootName + pathName + fileName);
if (fc.exists())
fc.delete();
}
}
I would like to know a simple way to read this content. Binary or Hex, both would work for me.
So, with some understanding of the question, I believe you're really looking for a form of obfuscation? As mentioned in the comments, the easiest way to do this is likely a form of cipher.
Consider this example implementation of a shift cipher:
Common
int shift = 11;
Writing
// Get the data to be written to file.
String data = ...
// cipher the data.
char[] chars = data.toCharArray();
for (int i = 0; i < chars.length; ++i) {
chars[i] = (char)(chars[i] + shift);
}
String cipher = new String(chars);
// Write the data to the cipher file.
...
Reading
// Read the cipher file.
String data = ...
// Decipher the data.
char[] chars = data.toCharArray();
for (int i = 0; i < chars.length; ++i) {
chars[i] = (char)(chars[i] - shift);
}
String decipher = new String(chars);
// Use data as required.
...
Here's an example implementation on Ideone. The output:
Data : I can read this IP 192.168.0.1
Cipher : T+nly+}plo+st~+T[+<D=9<AC9;9<
Decipher: I can read this IP 192.168.0.1
I tried to keep this as low level as possible in order to satisfy the Java 3 requirement.
Note that this is NOT secure by any means. Shift ciphers (like most ciphers in a bubble) are trivial to break by malicious entities. Please do not use this if security is an actual concern.
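Since the question also asks about Hex, here is a minimal hex encode/decode sketch that avoids String.format and sticks to arrays and basic Character calls. The class name HexCodecSketch and its method names are made up for this example, and it has not been checked against the restricted runtime mentioned in the question. Like the shift cipher, it only obscures the content and offers no security.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not taken from the question's code.
class HexCodecSketch {
    private static final char[] DIGITS = "0123456789abcdef".toCharArray();

    // Turns every byte into two lowercase hex digits.
    static String toHex(byte[] bytes) {
        char[] out = new char[bytes.length * 2];
        for (int i = 0; i < bytes.length; i++) {
            int v = bytes[i] & 0xFF;         // treat the byte as unsigned
            out[2 * i] = DIGITS[v >>> 4];    // high nibble
            out[2 * i + 1] = DIGITS[v & 0x0F]; // low nibble
        }
        return new String(out);
    }

    // Reverses toHex(); rejects odd lengths and non-hex characters.
    static byte[] fromHex(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("hex string has odd length");
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = Character.digit(hex.charAt(2 * i), 16);
            int lo = Character.digit(hex.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("not a hex digit at byte " + i);
            }
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }

    public static void main(String[] args) {
        String hex = toHex("IP".getBytes(StandardCharsets.US_ASCII));
        System.out.println(hex); // 4950
        System.out.println(new String(fromHex(hex), StandardCharsets.US_ASCII)); // IP
    }
}
```

The hex string would be what gets written to the file, and fromHex() would be applied to the text read back before using it.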
Your solution is too complex. With Java 8, you can try:
String fileName = "configFile.txt";
try (Stream<String> stream = Files.lines(Paths.get(fileName))) {
//TO-DO .Ex
stream.forEach(System.out::println);
} catch (IOException e) {
e.printStackTrace();
}
I am trying to parse PDF files using Apache Tika. After upgrading PDFBox to version 1.6.0, I started getting this error for a few PDF files.
Any suggestions?
java.io.IOException: expected='endstream' actual='' org.apache.pdfbox.io.PushBackInputStream@3a72d4e5
at org.apache.pdfbox.pdfparser.BaseParser.parseCOSStream(BaseParser.java:439)
at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:552)
at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:184)
at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1088)
at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1053)
at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:74)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:135)
at org.apache.tika.Tika.parseToString(Tika.java:357)
at edu.uci.ics.crawler4j.crawler.BinaryParser.parse(BinaryParser.java:37)
at edu.uci.ics.crawler4j.crawler.WebCrawler.handleBinary(WebCrawler.java:223)
at edu.uci.ics.crawler4j.crawler.WebCrawler.processPage(WebCrawler.java:461)
at edu.uci.ics.crawler4j.crawler.WebCrawler.run(WebCrawler.java:129)
at java.lang.Thread.run(Thread.java:662)
WARN [Crawler 2] Did not found XRef object at specified startxref position 0
And this is my code.
if (page.isBinary()) {
handleBinary(page, curURL);
}
-------------------------------------------------------------------------------
public int handleBinary(Page page, WebURL curURL) {
try {
binaryParser.parse(page.getBinaryData());
page.setText(binaryParser.getText());
handleMetaData(page, binaryParser.getMetaData());
//System.out.println(" pdf url " +page.getWebURL().getURL());
//System.out.println("Text" +page.getText());
} catch (Exception e) {
// TODO: handle exception
}
return PROCESS_OK;
}
public class BinaryParser {
private String text;
private Map<String, String> metaData;
private Tika tika;
public BinaryParser() {
tika = new Tika();
}
public void parse(byte[] data) {
InputStream is = null;
try {
is = new ByteArrayInputStream(data);
text = null;
Metadata md = new Metadata();
metaData = new HashMap<String, String>();
text = tika.parseToString(is, md).trim();
processMetaData(md);
} catch (Exception e) {
e.printStackTrace();
} finally {
IOUtils.closeQuietly(is);
}
}
public String getText() {
return text;
}
public void setText(String text) {
this.text = text;
}
private void processMetaData(Metadata md){
if ((getMetaData() == null) || (!getMetaData().isEmpty())) {
setMetaData(new HashMap<String, String>());
}
for (String name : md.names()){
getMetaData().put(name.toLowerCase(), md.get(name));
}
}
public Map<String, String> getMetaData() {
return metaData;
}
public void setMetaData(Map<String, String> metaData) {
this.metaData = metaData;
}
}
public class Page {
private WebURL url;
private String html;
// Data for textual content
private String text;
private String title;
private String keywords;
private String authors;
private String description;
private String contentType;
private String contentEncoding;
// binary data (e.g, image content)
// It's null for html pages
private byte[] binaryData;
private List<WebURL> urls;
private ByteBuffer bBuf;
private final static String defaultEncoding = Configurations
.getStringProperty("crawler.default_encoding", "UTF-8");
public boolean load(final InputStream in, final int totalsize,
final boolean isBinary) {
if (totalsize > 0) {
this.bBuf = ByteBuffer.allocate(totalsize + 1024);
} else {
this.bBuf = ByteBuffer.allocate(PageFetcher.MAX_DOWNLOAD_SIZE);
}
final byte[] b = new byte[1024];
int len;
double finished = 0;
try {
while ((len = in.read(b)) != -1) {
if (finished + b.length > this.bBuf.capacity()) {
break;
}
this.bBuf.put(b, 0, len);
finished += len;
}
} catch (final BufferOverflowException boe) {
System.out.println("Page size exceeds maximum allowed.");
return false;
} catch (final Exception e) {
System.err.println(e.getMessage());
return false;
}
this.bBuf.flip();
if (isBinary) {
binaryData = new byte[bBuf.limit()];
bBuf.get(binaryData);
} else {
this.html = "";
this.html += Charset.forName(defaultEncoding).decode(this.bBuf);
this.bBuf.clear();
if (this.html.length() == 0) {
return false;
}
}
return true;
}
public boolean isBinary() {
return binaryData != null;
}
public byte[] getBinaryData() {
return binaryData;
}
Are you sure that you don't accidentally truncate the PDF document when you load it into the binary buffer in the Page class?
There are multiple potential problems in your Page.load() method. To start with, the finished + b.length > this.bBuf.capacity() should be finished + len > this.bBuf.capacity() since the read() method could have returned fewer than b.length bytes. Also, are you sure that the totalsize argument you give is accurate? Finally, it could be that the given document is larger than the MAX_DOWNLOAD_SIZE limit.
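As a sketch of a load loop that cannot silently truncate: the helper below reads the whole stream into a growable buffer and checks the size limit against the bytes actually read (len), not the chunk capacity. The class name BoundedReadSketch, the method readAll and the choice to return null on overflow are all assumptions made for this example, not crawler4j code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper illustrating the corrected size check.
class BoundedReadSketch {

    // Returns the complete stream content, or null if it exceeds maxBytes.
    static byte[] readAll(InputStream in, int maxBytes) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[1024];
        int len;
        while ((len = in.read(chunk)) != -1) {
            // Compare with len: read() may return fewer bytes than chunk.length.
            if (out.size() + len > maxBytes) {
                return null; // too large: report it instead of keeping a truncated copy
            }
            out.write(chunk, 0, len);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] document = new byte[3000];
        byte[] ok = readAll(new ByteArrayInputStream(document), 4096);
        byte[] tooBig = readAll(new ByteArrayInputStream(document), 2048);
        System.out.println(ok.length);      // 3000
        System.out.println(tooBig == null); // true
    }
}
```

Failing loudly on oversized input makes it easier to tell a truncated download apart from a genuinely malformed PDF.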
I am trying to parse PDF files using Apache Tika, passing binary files in through a ByteArrayInputStream. Some PDF files parse fine, but others now fail. Earlier I was able to parse the same PDF files with Tika, but since switching to ByteArrayInputStream I get an error, so I think there is some problem with the byte array. This is the error I am getting:
org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.pdf.PDFParser@652489c0
And this is my code...
if (page.isBinary()) {
handleBinary(page, curURL);
}
public int handleBinary(Page page, WebURL curURL) {
try {
binaryParser.parse(page.getBinaryData());
page.setText(binaryParser.getText());
handleMetaData(page, binaryParser.getMetaData());
//System.out.println(" pdf url " +page.getWebURL().getURL());
//System.out.println("Text" +page.getText());
} catch (Exception e) {
// TODO: handle exception
}
return PROCESS_OK;
}
public class BinaryParser {
private String text;
private Map<String, String> metaData;
private Tika tika;
public BinaryParser() {
tika = new Tika();
}
public void parse(byte[] data) {
InputStream is = null;
try {
is = new ByteArrayInputStream(data);
text = null;
Metadata md = new Metadata();
metaData = new HashMap<String, String>();
text = tika.parseToString(is, md).trim();
processMetaData(md);
} catch (Exception e) {
e.printStackTrace();
} finally {
IOUtils.closeQuietly(is);
}
}
public String getText() {
return text;
}
public void setText(String text) {
this.text = text;
}
private void processMetaData(Metadata md){
if ((getMetaData() == null) || (!getMetaData().isEmpty())) {
setMetaData(new HashMap<String, String>());
}
for (String name : md.names()){
getMetaData().put(name.toLowerCase(), md.get(name));
}
}
public Map<String, String> getMetaData() {
return metaData;
}
public void setMetaData(Map<String, String> metaData) {
this.metaData = metaData;
}
}
public class Page {
private WebURL url;
private String html;
// Data for textual content
private String text;
private String title;
private String keywords;
private String authors;
private String description;
private String contentType;
private String contentEncoding;
private byte[] binaryData;
private List<WebURL> urls;
private ByteBuffer bBuf;
private final static String defaultEncoding = Configurations
.getStringProperty("crawler.default_encoding", "UTF-8");
public boolean load(final InputStream in, final int totalsize,
final boolean isBinary) {
if (totalsize > 0) {
this.bBuf = ByteBuffer.allocate(totalsize + 1024);
} else {
this.bBuf = ByteBuffer.allocate(PageFetcher.MAX_DOWNLOAD_SIZE);
}
final byte[] b = new byte[1024];
int len;
double finished = 0;
try {
while ((len = in.read(b)) != -1) {
if (finished + b.length > this.bBuf.capacity()) {
break;
}
this.bBuf.put(b, 0, len);
finished += len;
}
} catch (final BufferOverflowException boe) {
System.out.println("Page size exceeds maximum allowed.");
return false;
} catch (final Exception e) {
System.err.println(e.getMessage());
return false;
}
this.bBuf.flip();
if (isBinary) {
binaryData = new byte[bBuf.limit()];
bBuf.get(binaryData);
} else {
this.html = "";
this.html += Charset.forName(defaultEncoding).decode(this.bBuf);
this.bBuf.clear();
if (this.html.length() == 0) {
return false;
}
}
return true;
}
public boolean isBinary() {
return binaryData != null;
}
public byte[] getBinaryData() {
return binaryData;
}
Any suggestions as to what I am doing wrong?
UPDATED:-
After upgrading to pdfbox 1.6.0 version, I started getting this error for some pdf...
Parsing Error, Skipping Object
java.io.IOException: expected='endstream' actual='' org.apache.pdfbox.io.PushBackInputStream@70dbdc4b
at org.apache.pdfbox.pdfparser.BaseParser.parseCOSStream(BaseParser.java:439)
at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:552)
at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:184)
at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1088)
at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1053)
And for some pdf this error...
Did not found XRef object at specified startxref position 0
Invalid dictionary, found: '' but expected: '/'
WARN [Crawler 2] Did not found XRef object at specified startxref position 0
This is a known bug in PDFBox version 1.4.0. Just update to PDFBox 1.5.0 or later.
Check the release notes:
[PDFBOX-578] NPE NullPointerException in PDPageNode.getCount
And this JIRA ticket.