I'm trying to read CSV files from GTFS.zip with the help of uniVocity-parsers and ran into an issue that I can't figure out. For some reason, the first column of some CSV files isn't parsed correctly. For example, take the "stops.txt" file, which looks like this:
stop_id,stop_name,stop_lat,stop_lon,location_type,parent_station
"de:3811:30215:0:6","Freiburg Stübeweg","48.0248455941735","7.85563688037231","","Parent30215"
"de:8311:30054:0:1","Freiburg Schutternstraße","48.0236251356332","7.72434519425597","","Parent30054"
"de:8311:30054:0:2","Freiburg Schutternstraße","48.0235446600679","7.72438739944883","","Parent30054"
The "stop_id" field isn't parsed correctly: it always has the value null.
This is the method I'm using to read the file:
public <T> List<T> readCSV(String path, String file, BeanListProcessor<T> processor) {
List<T> content = null;
try {
// Get zip file
ZipFile zip = new ZipFile(path);
// Get CSV file
ZipEntry entry = zip.getEntry(file);
InputStream in = zip.getInputStream(entry);
CsvParserSettings parserSettings = new CsvParserSettings();
parserSettings.setProcessor(processor);
parserSettings.setHeaderExtractionEnabled(true);
CsvParser parser = new CsvParser(parserSettings);
parser.parse(new InputStreamReader(in));
content = processor.getBeans();
zip.close();
return content;
} catch (Exception e) {
e.printStackTrace();
}
return content;
}
And this is how my Stop Class looks like:
public class Stop {
@Parsed
private String stop_id;
@Parsed
private String stop_name;
@Parsed
private String stop_lat;
@Parsed
private String stop_lon;
@Parsed
private String location_type;
@Parsed
private String parent_station;
public Stop() {
}
public Stop(String stop_id, String stop_name, String stop_lat, String stop_lon, String location_type,
String parent_station) {
this.stop_id = stop_id;
this.stop_name = stop_name;
this.stop_lat = stop_lat;
this.stop_lon = stop_lon;
this.location_type = location_type;
this.parent_station = parent_station;
}
// --------------------- Getter --------------------------------
public String getStop_id() {
return stop_id;
}
public String getStop_name() {
return stop_name;
}
public String getStop_lat() {
return stop_lat;
}
public String getStop_lon() {
return stop_lon;
}
public String getLocation_type() {
return location_type;
}
public String getParent_station() {
return parent_station;
}
// --------------------- Setter --------------------------------
public void setStop_id(String stop_id) {
this.stop_id = stop_id;
}
public void setStop_name(String stop_name) {
this.stop_name = stop_name;
}
public void setStop_lat(String stop_lat) {
this.stop_lat = stop_lat;
}
public void setStop_lon(String stop_lon) {
this.stop_lon = stop_lon;
}
public void setLocation_type(String location_type) {
this.location_type = location_type;
}
public void setParent_station(String parent_station) {
this.parent_station = parent_station;
}
@Override
public String toString() {
return "Stop [stop_id=" + stop_id + ", stop_name=" + stop_name + ", stop_lat=" + stop_lat + ", stop_lon="
+ stop_lon + ", location_type=" + location_type + ", parent_station=" + parent_station + "]";
}
}
When I call the method, I get this output, which is not correct:
PartialReading pr = new PartialReading();
List<Stop> stops = pr.readCSV("VAGFR.zip", "stops.txt", new BeanListProcessor<Stop>(Stop.class));
for (int i = 0; i < 4; i++) {
System.out.println(stops.get(i).toString());
}
Output:
Stop [stop_id=null, stop_name=Freiburg Stübeweg, stop_lat=48.0248455941735, stop_lon=7.85563688037231, location_type=null, parent_station=Parent30215]
Stop [stop_id=null, stop_name=Freiburg Schutternstraße, stop_lat=48.0236251356332, stop_lon=7.72434519425597, location_type=null, parent_station=Parent30054]
Stop [stop_id=null, stop_name=Freiburg Schutternstraße, stop_lat=48.0235446600679, stop_lon=7.72438739944883, location_type=null, parent_station=Parent30054]
Stop [stop_id=null, stop_name=Freiburg Waltershofen Ochsen, stop_lat=48.0220902613143, stop_lon=7.7205756507492, location_type=null, parent_station=Parent30055]
Does anyone know why this happens and how I can fix it? This also happens in the "routes.txt" and "trips.txt" files that I tested.
This is the GTFS file: http://stadtplan.freiburg.de/sld/VAGFR.zip
If you print the headers, you will notice that the first column doesn't look right. That's because you are parsing a file encoded as UTF-8 with a BOM (byte order mark).
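To see why the @Parsed stop_id field stays null, here is a small JDK-only sketch (BomHeaderDemo and its methods are made-up names for illustration) that decodes a header line starting with a UTF-8 BOM. Java's UTF-8 decoder does not strip the BOM, so the first header comes back as U+FEFF followed by "stop_id", which no longer matches the field name:

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class BomHeaderDemo {
    /** Prepends the three UTF-8 BOM bytes (EF BB BF) to the UTF-8 encoding of text. */
    public static byte[] withUtf8Bom(String text) {
        byte[] body = text.getBytes(StandardCharsets.UTF_8);
        byte[] out = new byte[body.length + 3];
        out[0] = (byte) 0xEF;
        out[1] = (byte) 0xBB;
        out[2] = (byte) 0xBF;
        System.arraycopy(body, 0, out, 3, body.length);
        return out;
    }

    /** Decodes the bytes as UTF-8 and returns the first comma-separated header name. */
    public static String firstHeader(byte[] fileBytes) throws IOException {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new ByteArrayInputStream(fileBytes), StandardCharsets.UTF_8))) {
            return reader.readLine().split(",")[0];
        }
    }

    public static void main(String[] args) throws IOException {
        String first = firstHeader(withUtf8Bom("stop_id,stop_name\n"));
        System.out.println(first.length());          // 8: the BOM character plus "stop_id"
        System.out.println(first.equals("stop_id")); // false
        System.out.println(first.charAt(0) == 0xFEFF); // true
    }
}
```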
Basically, the file starts with three extra bytes (0xEF 0xBB 0xBF) that indicate the encoding. Before version 2.5, the parser didn't handle that internally, and you had to skip these bytes yourself to get the correct output:
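//... your code here
ZipEntry entry = zip.getEntry(file);
// BufferedInputStream supports mark/reset, so the bytes can be put back if there is no BOM
InputStream in = new BufferedInputStream(zip.getInputStream(entry));
in.mark(3);
// & instead of && so that all three bytes (0xEF 0xBB 0xBF) are always read
if (in.read() == 239 & in.read() == 187 & in.read() == 191) {
System.out.println("UTF-8 with BOM, bytes discarded");
} else {
in.reset(); // no BOM: rewind to the start so no data is lost
}
CsvParserSettings parserSettings = new CsvParserSettings();
//...rest of your code here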
The above hack will work on any version before 2.5, but you could also use Commons IO, which provides a BOMInputStream for convenient and cleaner handling of this sort of thing; it's just VERY slow.
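If you'd rather not depend on Commons IO, here is a minimal JDK-only sketch of the same idea using a PushbackInputStream. It only consumes the three bytes when they really are a BOM and otherwise pushes them back. BomStripper and skipUtf8Bom are hypothetical names, not part of uniVocity or Commons IO:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

public final class BomStripper {
    private static final int[] UTF8_BOM = {0xEF, 0xBB, 0xBF};

    private BomStripper() {
    }

    /** Returns a stream that starts after a leading UTF-8 BOM, or at the first byte if there is none. */
    public static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, UTF8_BOM.length);
        byte[] head = new byte[UTF8_BOM.length];
        int n = 0;
        // a single read() may return fewer bytes than asked for, so loop until 3 bytes or EOF
        while (n < head.length) {
            int r = pushback.read(head, n, head.length - n);
            if (r == -1) {
                break;
            }
            n += r;
        }
        boolean hasBom = n == UTF8_BOM.length
                && (head[0] & 0xFF) == UTF8_BOM[0]
                && (head[1] & 0xFF) == UTF8_BOM[1]
                && (head[2] & 0xFF) == UTF8_BOM[2];
        if (!hasBom && n > 0) {
            // not a BOM: push the bytes back so the caller still sees them
            pushback.unread(head, 0, n);
        }
        return pushback;
    }
}
```

In readCSV, that would mean InputStream in = BomStripper.skipUtf8Bom(zip.getInputStream(entry)); followed by parser.parse(new InputStreamReader(in, StandardCharsets.UTF_8));.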
Updating to a recent version should take care of it automatically.
Hope it helps.
Related
I'm struggling to get my simple Tomcat app to work.
I have a CSV file placed in src/main/resources/cities.csv,
and in my test main method everything works. However, when I call the same method from a servlet, I get:
java.nio.file.NoSuchFileException
FileOperations:
public class FileOparations {
static private final Path path = Paths.get("src/main/resources/cities.csv");
public static List<City> getCitiesList() {
return getCitiesStream().collect(Collectors.toList());
}
public static Stream<City> getCitiesStream() {
try {
return Files.readAllLines(path)
.stream()
.skip(1)
.map((String line) -> {
return line.split("\",\"");
})
.map((String[] line) -> {
String[] newData = new String[line.length];
for (int i = 0; i < line.length; i++) {
newData[i] = line[i].replaceAll("\"", "");
}
return newData;
})
.map((String[] data) -> {
String name = data[0];
String nameAscii = data[1];
String gps = data[2] + ":" + data[3];
String country = data[4];
String adminName = data[7];
String capitol = data[8];
long population;
try {
population = Long.parseLong(data[9]);
} catch (NumberFormatException e) {
population = Integer.MIN_VALUE;
}
int id = Integer.parseInt(data[10]);
return new City(name, nameAscii, id, country, gps, capitol, population, adminName);
});
} catch (IOException e) {
e.printStackTrace();
}
return Stream.empty();
}
}
My servlet code looks like this:
protected void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
List<City> cities = FileOparations.getCitiesList();
request.setAttribute("cities", cities);
request.getRequestDispatcher("result.jsp").forward(request, response);
}
I'm surprised, because I'm not passing any URL through the servlet; I just want to call a static Java method. getCitiesList builds the stream, maps it, and returns a ready-to-use list.
Try using getServletContext():
String relativeWebPath = "/resources/cities.csv";
InputStream input = getServletContext().getResourceAsStream(relativeWebPath);
The path is relative to the root of the directory Tomcat expands your WAR file into. So before building, the file should be under src/main/webapp/ (for the path above, src/main/webapp/resources/cities.csv).
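getResourceAsStream gives you an InputStream rather than a Path, so Files.readAllLines(path) no longer fits. Below is a rough sketch of the question's parsing done over an InputStream instead; it assumes the file is UTF-8 and keeps the question's "a","b","c" splitting. CsvResourceReader and readQuotedRows are illustrative names, and mapping each String[] to a City would stay as in the question:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

public class CsvResourceReader {
    /**
     * Reads "a","b","c" style lines from a stream (for example one returned by
     * getServletContext().getResourceAsStream(...)), skipping the header row.
     */
    public static List<String[]> readQuotedRows(InputStream input) throws IOException {
        if (input == null) {
            // getResourceAsStream returns null when nothing exists at that path in the deployed app
            throw new IOException("Resource not found");
        }
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(input, StandardCharsets.UTF_8))) {
            return reader.lines()
                    .skip(1) // header row
                    .map(line -> line.split("\",\""))
                    .map(fields -> {
                        // strip the remaining quotes at the start and end of the line
                        String[] cleaned = new String[fields.length];
                        for (int i = 0; i < fields.length; i++) {
                            cleaned[i] = fields[i].replace("\"", "");
                        }
                        return cleaned;
                    })
                    .collect(Collectors.toList());
        }
    }
}
```

In the servlet, that could be called as CsvResourceReader.readQuotedRows(getServletContext().getResourceAsStream("/resources/cities.csv")).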
I'm working on a task that requires me to read a .csv file using the Stream API, go over each line, and construct an object from each line. The class is called Planet and looks like this:
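public class Planet {
private final String name;
private final long inhabitants;
private final boolean stargateAvailable;
private final boolean dhdAvailable;
private final List<String> teamsVisited;
public Planet(String name, long inhabitants, boolean stargateAvailable, boolean dhdAvailable, List<String> teamsVisited) {
this.name = name;
this.inhabitants = inhabitants;
this.stargateAvailable = stargateAvailable;
this.dhdAvailable = dhdAvailable;
this.teamsVisited = teamsVisited;
}
public String getName() {
return name;
}
public long getInhabitants() {
return inhabitants;
}
public boolean isStargateAvailable() {
return stargateAvailable;
}
public boolean isDhdAvailable() {
return dhdAvailable;
}
public List<String> getTeamsVisited() {
return teamsVisited;
}
@Override
public String toString() {
return name;
}
}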
So, using the Stream API to go over each line of the .csv file, I need to create objects of class Planet.
I haven't made any progress at all because I'm really not sure how to use the Stream API:
public class Space {
public List<Planet> csvDataToPlanets(String filePath) {
return null;
}
}
Try the snippet below.
// mapToItem is your own Function<String, Planet> that converts one CSV line into a Planet
List<Planet> inputList;
try (BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(new File(inputFilePath))))) {
// skip the header of the csv
inputList = br.lines().skip(1).map(mapToItem).collect(Collectors.toList());
}
public static void main(String[] args) {
String fileName = "<your file path>";
try (Stream<String> stream = Files.lines(Paths.get(fileName))) {
stream.forEach(line -> {
String[] fields = line.split(",");
// call the Planet constructor with the values in fields (e.g. via the Reflection API)
});
} catch (IOException e) {
e.printStackTrace();
}
}
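Putting the two answers together, a complete csvDataToPlanets could look roughly like this. The CSV layout is an assumption, since the question doesn't show the file: a header row, then name,inhabitants,stargateAvailable,dhdAvailable,teams, with the teams separated by ';'. Planet is nested here only to keep the sketch self-contained, and toPlanet is a hypothetical helper playing the role of mapToItem:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class Space {
    // Stand-in for the question's Planet, nested only to keep this sketch self-contained
    public static class Planet {
        private final String name;
        private final long inhabitants;
        private final boolean stargateAvailable;
        private final boolean dhdAvailable;
        private final List<String> teamsVisited;

        public Planet(String name, long inhabitants, boolean stargateAvailable,
                      boolean dhdAvailable, List<String> teamsVisited) {
            this.name = name;
            this.inhabitants = inhabitants;
            this.stargateAvailable = stargateAvailable;
            this.dhdAvailable = dhdAvailable;
            this.teamsVisited = teamsVisited;
        }

        public String getName() { return name; }
        public long getInhabitants() { return inhabitants; }
        public boolean isStargateAvailable() { return stargateAvailable; }
        public boolean isDhdAvailable() { return dhdAvailable; }
        public List<String> getTeamsVisited() { return teamsVisited; }
    }

    // ASSUMED layout: name,inhabitants,stargateAvailable,dhdAvailable,teams ("SG-1;SG-3" or empty)
    static Planet toPlanet(String line) {
        String[] f = line.split(",", -1); // -1 keeps an empty trailing teams column
        List<String> teams = f[4].trim().isEmpty()
                ? Collections.<String>emptyList()
                : Arrays.asList(f[4].trim().split(";"));
        return new Planet(f[0].trim(), Long.parseLong(f[1].trim()),
                Boolean.parseBoolean(f[2].trim()), Boolean.parseBoolean(f[3].trim()), teams);
    }

    public List<Planet> csvDataToPlanets(String filePath) {
        // try-with-resources closes the file handle held by Files.lines
        try (Stream<String> lines = Files.lines(Paths.get(filePath))) {
            return lines.skip(1)                            // header row
                    .filter(line -> !line.trim().isEmpty()) // ignore blank lines
                    .map(Space::toPlanet)
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The design choice here is to keep all parsing in one small function and let the stream do skip/filter/map/collect, so the only part you should need to adapt to your real file is toPlanet.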
I'm having major trouble piecing this together. I have basic read and write functionality. What I need is for the input from file 'Books.txt' to be checked so that:
ISBN is valid
CopyNumber, Year and Statistics should be numeric
Title, Author and Publisher must contain values
BorrowDate must be a valid date
ReturnDate if available must be a valid date
LibraryCardNumber if available must be numeric.
If a book is not borrowed the two last fields are nonexistent.
2 sample rows from 'Books.txt':
9780140455168#2#The Twelve Caesars#Suetonius#Penguin Classics#2007#3#101009#101030#5478
9780141188607#1#Claudius the God#Robert Graves#Penguin Classics#2006#2#080123
Error lines should be written to 'ErrorLines.txt' with an error message, e.g. "Wrong ISBN". Error-free books should be written to 'NewBooks.txt', sorted by author name.
Here's what I've got so far. I'm not looking for a complete solution, because I obviously have a looong way to go, but if someone would be so kind as to give me some pointers, I'd be extremely grateful! And yes, it's homework :D
Do I need to make a try loop to validate the input...?
The Library class:
import java.util.ArrayList;
import java.util.*;
import java.io.*;
import java.io.IOException;
public class Library {
public void readFromFile (String filename) throws IOException {
String inLine;
File inFile;
inFile = new File(filename);
BufferedReader fIn = new BufferedReader(new FileReader(inFile));
inLine = fIn.readLine();
while (inLine != null) {
// add the current line before reading the next one, so the first line is kept and null is never added
aBookList.add(inLine + "\n");
inLine = fIn.readLine();
}
fIn.close();
}
public void writeToFile (String fileName) throws IOException {
BufferedWriter bw = null;
try {
bw = new BufferedWriter(new FileWriter(fileName));
bw.write("???"); //Dont know what to put here...
bw.newLine();
} catch (IOException e) {
System.out.println("Error writing file.");
} finally {
bw.close();
}
}
public static boolean isISBN13Valid(String isbn) {
int check = 0;
for (int i = 0; i < 12; i += 2) {
check += Integer.valueOf(isbn.substring(i, i + 1));
}
for (int i = 1; i < 12; i += 2) {
check += Integer.valueOf(isbn.substring(i, i + 1)) * 3;
}
check += Integer.valueOf(isbn.substring(12));
return check % 10 == 0;
}
}
And here's the Book class:
import java.util.*;
import java.io.*;
public class Book {
private static ArrayList<String> aBookList = new ArrayList<String>();
private String Isbn;
private int CopyNumber;
private String Title;
private String Author;
private String Publisher;
private int Year;
private int Statistics;
private String BorrowDate;
private String ReturnDate;
private int LibraryCardNumber;
public void bookInfo (String nIsbn, int nCopyNumber, String nTitle, String nAuthor, String nPublisher, int nYear,
int nStatistics, String nBorrowDate, String nReturnDate, int nLibraryCardNumber) {
Isbn = nIsbn;
CopyNumber = nCopyNumber;
Title = nTitle;
Author = nAuthor;
Publisher = nPublisher;
Year = nYear;
Statistics = nStatistics;
BorrowDate = nBorrowDate;
ReturnDate = nReturnDate;
LibraryCardNumber = nLibraryCardNumber;
}
public void bookInfo (String Row) {
StringTokenizer sT = new StringTokenizer(Row);
Isbn = sT.nextToken("#");
CopyNumber = Integer.parseInt(sT.nextToken("#") );
Title = sT.nextToken("#");
Author = sT.nextToken("#");
Publisher = sT.nextToken("#");
Year = Integer.parseInt(sT.nextToken("#") );
Statistics = Integer.parseInt(sT.nextToken("#") );
BorrowDate = sT.nextToken("#");
ReturnDate = sT.nextToken("#");
LibraryCardNumber = Integer.parseInt(sT.nextToken("#") );
}
public void setIsbn(String nIsbn) {
Isbn = nIsbn;
}
public void setCopynumber(int nCopyNumber) {
CopyNumber = nCopyNumber;
}
public void setTitle(String nTitle) {
Title = nTitle;
}
public void setAuthor(String nAuthor) {
Author = nAuthor;
}
public void setPublisher(String nPublisher) {
Publisher = nPublisher;
}
public void setYear(int nYear) {
Year = nYear;
}
public void setStatistics(int nStatistics) {
Statistics = nStatistics;
}
public void setBorrowDate(String nBorrowDate) {
BorrowDate = nBorrowDate;
}
public void setReturnDate(String nReturnDate) {
ReturnDate = nReturnDate;
}
public void setLibraryCardNumber(int nLibraryCardNumber) {
LibraryCardNumber = nLibraryCardNumber;
}
public String getAll () {
String s = " ";
return (Isbn + s + CopyNumber + s + Title + s + Author + s + Publisher + s +
Year + s + Statistics + s + BorrowDate + s + ReturnDate + s +
LibraryCardNumber);
}
public void showAll () {
String t = "\t";
System.out.println(Isbn + t + CopyNumber + t + Title + t + Author + t +
Publisher + t + Year + t + Statistics + t +
BorrowDate + t + ReturnDate + t + LibraryCardNumber);
}
}
And finally there's the Main class with main method:
public class Main<aBookList> implements Comparable<aBookList> {
public static void main(String [] args) throws Exception {
new Library().readFromFile("Books.txt");
new Library().writeToFile("NewBooks.txt");
new Library().writeToFile("ErrorLines.txt");
}
@Override
public int compareTo(aBookList o) {
return 0;
}
}
As it is homework, I will point you in the right direction rather than give you code.
1) You have a lot of mess here, e.g. I'm not sure why you have compareTo in your Main class. Instead of creating a getAll method in bookInfo (which is named against Java naming conventions), just override the toString method.
2) Why do you have a list of strings? Read a line and convert it into a Book; if the book is valid, add it to your list, otherwise report an error.
3) Move your isISBN13Valid method to Book.
4) To write to the file, loop through your list and save each element into the file with bw.write(book.toString()).
5) Create a second method, createErrorFile. Add each error you run into to an error list, and when you call that method, save each element into the given file. It is not a perfect solution; it would be better to write each error to the file as soon as it occurs.
6) Create one instance of Library in your main method and call all your methods on it, and avoid using static fields in your project (sometimes you must, but if you don't need them, just avoid them).
7) I think method names along the lines of import/export sound nicer than readFromFile/writeToFile (import itself is a reserved word in Java, so something like importBooks/exportBooks).
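For pointers 2) and 3), the per-line checks could be sketched roughly like this. It assumes the dates are in yyMMdd form (so 101009 is 2010-10-09), which the question doesn't actually say, and that a line has 8 to 10 '#'-separated fields. BookLineValidator and its methods are illustrative names; validate returns null for a good line and an error message otherwise, so the caller can route the line to NewBooks.txt or ErrorLines.txt:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class BookLineValidator {
    // ASSUMPTION: dates look like 101009 = yyMMdd (2010-10-09); change the pattern if not
    private static final DateTimeFormatter DATE =
            DateTimeFormatter.ofPattern("uuMMdd").withResolverStyle(ResolverStyle.STRICT);

    /** Returns null if the line is valid, otherwise an error message for ErrorLines.txt. */
    public static String validate(String line) {
        // limit -1 keeps trailing empty fields, so a line ending in "#" is not silently shortened
        String[] f = line.split("#", -1);
        if (f.length < 8 || f.length > 10) return "Wrong number of fields";
        if (!isIsbn13Valid(f[0])) return "Wrong ISBN";
        if (!isNumeric(f[1]) || !isNumeric(f[5]) || !isNumeric(f[6])) {
            return "CopyNumber, Year and Statistics must be numeric";
        }
        if (f[2].trim().isEmpty() || f[3].trim().isEmpty() || f[4].trim().isEmpty()) {
            return "Title, Author and Publisher must contain values";
        }
        if (!isDate(f[7])) return "Wrong BorrowDate";
        if (f.length > 8 && !isDate(f[8])) return "Wrong ReturnDate";
        if (f.length > 9 && !isNumeric(f[9])) return "Wrong LibraryCardNumber";
        return null;
    }

    public static boolean isNumeric(String s) {
        return !s.isEmpty() && s.chars().allMatch(c -> c >= '0' && c <= '9');
    }

    /** ISBN-13 check: weights 1,3,1,3,... over all 13 digits; the sum must be divisible by 10. */
    public static boolean isIsbn13Valid(String isbn) {
        if (isbn.length() != 13 || !isNumeric(isbn)) return false;
        int sum = 0;
        for (int i = 0; i < 13; i++) {
            int digit = isbn.charAt(i) - '0';
            sum += (i % 2 == 0) ? digit : 3 * digit;
        }
        return sum % 10 == 0;
    }

    public static boolean isDate(String s) {
        try {
            LocalDate.parse(s, DATE);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    /** Returns a copy of the valid lines sorted by the Author field (index 3) for NewBooks.txt. */
    public static List<String> sortByAuthor(List<String> validLines) {
        List<String> sorted = new ArrayList<>(validLines);
        sorted.sort(Comparator.comparing((String l) -> l.split("#", -1)[3]));
        return sorted;
    }
}
```

Both sample rows from the question pass these checks, and ResolverStyle.STRICT makes a value like 081323 (month 13) fail instead of being silently adjusted.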
I am trying to parse PDF files with Apache Tika, passing the binary data through a ByteArrayInputStream. Some PDF files fail with an error, while others parse fine. Earlier I was able to parse the same PDF files with Tika, but since switching to ByteArrayInputStream I get errors, so I think there is some problem with the byte array. This is the error I am getting:
org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.pdf.PDFParser@652489c0
And this is my code...
if (page.isBinary()) {
handleBinary(page, curURL);
}
public int handleBinary(Page page, WebURL curURL) {
try {
binaryParser.parse(page.getBinaryData());
page.setText(binaryParser.getText());
handleMetaData(page, binaryParser.getMetaData());
//System.out.println(" pdf url " +page.getWebURL().getURL());
//System.out.println("Text" +page.getText());
} catch (Exception e) {
// TODO: handle exception
}
return PROCESS_OK;
}
public class BinaryParser {
private String text;
private Map<String, String> metaData;
private Tika tika;
public BinaryParser() {
tika = new Tika();
}
public void parse(byte[] data) {
InputStream is = null;
try {
is = new ByteArrayInputStream(data);
text = null;
Metadata md = new Metadata();
metaData = new HashMap<String, String>();
text = tika.parseToString(is, md).trim();
processMetaData(md);
} catch (Exception e) {
e.printStackTrace();
} finally {
IOUtils.closeQuietly(is);
}
}
public String getText() {
return text;
}
public void setText(String text) {
this.text = text;
}
private void processMetaData(Metadata md){
if ((getMetaData() == null) || (!getMetaData().isEmpty())) {
setMetaData(new HashMap<String, String>());
}
for (String name : md.names()){
getMetaData().put(name.toLowerCase(), md.get(name));
}
}
public Map<String, String> getMetaData() {
return metaData;
}
public void setMetaData(Map<String, String> metaData) {
this.metaData = metaData;
}
}
public class Page {
private WebURL url;
private String html;
// Data for textual content
private String text;
private String title;
private String keywords;
private String authors;
private String description;
private String contentType;
private String contentEncoding;
private byte[] binaryData;
private List<WebURL> urls;
private ByteBuffer bBuf;
private final static String defaultEncoding = Configurations
.getStringProperty("crawler.default_encoding", "UTF-8");
public boolean load(final InputStream in, final int totalsize,
final boolean isBinary) {
if (totalsize > 0) {
this.bBuf = ByteBuffer.allocate(totalsize + 1024);
} else {
this.bBuf = ByteBuffer.allocate(PageFetcher.MAX_DOWNLOAD_SIZE);
}
final byte[] b = new byte[1024];
int len;
double finished = 0;
try {
while ((len = in.read(b)) != -1) {
if (finished + b.length > this.bBuf.capacity()) {
break;
}
this.bBuf.put(b, 0, len);
finished += len;
}
} catch (final BufferOverflowException boe) {
System.out.println("Page size exceeds maximum allowed.");
return false;
} catch (final Exception e) {
System.err.println(e.getMessage());
return false;
}
this.bBuf.flip();
if (isBinary) {
binaryData = new byte[bBuf.limit()];
bBuf.get(binaryData);
} else {
this.html = "";
this.html += Charset.forName(defaultEncoding).decode(this.bBuf);
this.bBuf.clear();
if (this.html.length() == 0) {
return false;
}
}
return true;
}
public boolean isBinary() {
return binaryData != null;
}
public byte[] getBinaryData() {
return binaryData;
}
Any suggestions as to what I am doing wrong?
UPDATE:
After upgrading to PDFBox 1.6.0, I started getting this error for some PDFs:
Parsing Error, Skipping Object
java.io.IOException: expected='endstream' actual='' org.apache.pdfbox.io.PushBackInputStream@70dbdc4b
at org.apache.pdfbox.pdfparser.BaseParser.parseCOSStream(BaseParser.java:439)
at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:552)
at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:184)
at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1088)
at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1053)
And for some PDFs, this error:
Did not found XRef object at specified startxref position 0
Invalid dictionary, found: '' but expected: '/'
WARN [Crawler 2] Did not found XRef object at specified startxref position 0
This is a known bug in PDFBox 1.4.0. Just update to PDFBox 1.5.0 or later.
Check these release notes:
[PDFBOX-578] NPE NullPointerException in PDPageNode.getCount
And this JIRA ticket.
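Independently of the PDFBox version, it may be worth ruling out the asker's suspicion about the byte array. In Page.load above, the loop breaks as soon as finished + b.length would exceed the buffer's capacity, so when totalsize is not positive, a file larger than roughly PageFetcher.MAX_DOWNLOAD_SIZE is silently cut off, and a truncated PDF is one plausible source of missing-endstream or missing-xref errors. A hedged sketch (PdfBytes and its methods are made-up names): read the stream without a fixed cap, and apply a rough completeness check. The %%EOF test is only a heuristic, not a full PDF validation:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public final class PdfBytes {
    private PdfBytes() {
    }

    /** Reads the whole stream with no fixed size cap (Java 9+ also offers InputStream.readAllBytes()). */
    public static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int len;
        while ((len = in.read(buf)) != -1) {
            out.write(buf, 0, len); // write only the bytes actually read
        }
        return out.toByteArray();
    }

    /**
     * Heuristic: a PDF starts with "%PDF-" and has an "%%EOF" marker near its end.
     * If the marker is missing, the byte array was probably cut off.
     */
    public static boolean looksComplete(byte[] data) {
        // ISO-8859-1 maps every byte to exactly one char, so binary content is safe to scan
        String head = new String(data, 0, Math.min(5, data.length), StandardCharsets.ISO_8859_1);
        int tailStart = Math.max(0, data.length - 1024);
        String tail = new String(data, tailStart, data.length - tailStart, StandardCharsets.ISO_8859_1);
        return head.equals("%PDF-") && tail.contains("%%EOF");
    }
}
```

Logging PdfBytes.looksComplete(page.getBinaryData()) for the failing URLs would show whether the data reaching Tika is already incomplete.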
I wrote a simple Java application and I have a problem; please help me.
I have a file (JUST EXAMPLE):
1.TXT
-------
SET MRED:NAME=MRED:0,MREDID=60;
SET BCT:NAME=BCT:0,NEPE=DCS,T2=5,DK0=KOR;
CREATE LCD:NAME=LCD:0;
-------
and this is my source code
import java.io.IOException;
import java.io.*;
import java.util.StringTokenizer;
class test1 {
private final int FLUSH_LIMIT = 1024 * 1024;
private StringBuilder outputBuffer = new StringBuilder(
FLUSH_LIMIT + 1024);
public static void main(String[] args) throws IOException {
test1 p=new test1();
String fileName = "i:\\1\\1.txt";
File file = new File(fileName);
BufferedReader br = new BufferedReader(new FileReader(file));
String line;
while ((line = br.readLine()) != null) {
StringTokenizer st = new StringTokenizer(line, ";|,");
while (st.hasMoreTokens()) {
String token = st.nextToken();
p.processToken(token);
}
}
p.flushOutputBuffer();
}
private void processToken(String token) {
if (token.startsWith("MREDID=")) {
String value = getTokenValue(token,"=");
outputBuffer.append("MREDID:").append(value).append("\n");
} else if (token.startsWith("DK0=")) {
String value = getTokenValue(token,"=");
outputBuffer.append("DK0=:").append(value).append("\n");
} else if (token.startsWith("NEPE=")) {
String value = getTokenValue(token,"=");
outputBuffer.append("NEPE:").append(value).append("\n");
}
if (outputBuffer.length() > FLUSH_LIMIT) {
flushOutputBuffer();
}
}
private String getTokenValue(String token,String find) {
int start = token.indexOf(find) + 1;
int end = token.length();
String value = token.substring(start, end);
return value;
}
private void flushOutputBuffer() {
System.out.print(outputBuffer);
outputBuffer = new StringBuilder(FLUSH_LIMIT + 1024);
}
}
I want this output:
MREDID:60
DK0=:KOR
NEPE:DCS
But the application shows me this:
MREDID:60
NEPE:DCS
DK0=:KOR
Please tell me how I can handle this. DK0 must come before NEPE, and this is just a sample; my real file has 14000 lines.
Thanks ...
Instead of outputting each value as soon as you read it, put it in a HashMap. Once you've read the entire file, output the values in the order you want by getting them from the HashMap.
Use a Hashtable (or a HashMap) to store the values, and print them in the desired order after parsing all tokens.
//initialize the table
Hashtable<String, String> ht = new Hashtable<>();
//instead of outputBuffer.append, put the values into the table like
ht.put("NEPE", value);
ht.put("DK0", value); //etc
//print the values after the while loop
System.out.println("MREDID:" + ht.get("MREDID"));
System.out.println("DK0:" + ht.get("DK0"));
System.out.println("NEPE:" + ht.get("NEPE"));
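Building on the map idea, and assuming it is enough to reorder the keys within each statement (line), which matches the sample, the sketch below collects the values of one line into a map and emits them in a fixed order. Nothing is held beyond a single line, so a 14000-line file is no problem. OrderedTokenPrinter and processLine are illustrative names; the labels are copied from the desired output:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class OrderedTokenPrinter {
    // Output order (insertion order of the LinkedHashMap) and the label printed for each key
    private static final Map<String, String> ORDER = new LinkedHashMap<>();
    static {
        ORDER.put("MREDID", "MREDID:");
        ORDER.put("DK0", "DK0=:");
        ORDER.put("NEPE", "NEPE:");
    }

    /** Collects the wanted KEY=value tokens of one line and returns them in ORDER. */
    public static List<String> processLine(String line) {
        Map<String, String> found = new HashMap<>();
        // same delimiters as the question's StringTokenizer: ';', '|' and ','
        for (String token : line.split("[;|,]")) {
            int eq = token.indexOf('=');
            if (eq < 0) {
                continue;
            }
            String key = token.substring(0, eq).trim();
            if (ORDER.containsKey(key)) {
                found.put(key, token.substring(eq + 1));
            }
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> entry : ORDER.entrySet()) {
            String value = found.get(entry.getKey());
            if (value != null) {
                out.add(entry.getValue() + value);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String[] sample = {
            "SET MRED:NAME=MRED:0,MREDID=60;",
            "SET BCT:NAME=BCT:0,NEPE=DCS,T2=5,DK0=KOR;",
            "CREATE LCD:NAME=LCD:0;"
        };
        for (String line : sample) {
            processLine(line).forEach(System.out::println);
        }
    }
}
```

For the sample file this prints MREDID:60, DK0=:KOR and NEPE:DCS, one per line. If your real records span several lines in a way that needs reordering across lines, the map would have to live per record instead of per line.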
Create a class, something like
class Data {
private int mredid;
private String nepe;
private String dk0;
public void setMredid(int mredid) {
this.mredid = mredid;
}
public void setNepe(String nepe) {
this.nepe = nepe;
}
public void setDk0(String dk0) {
this.dk0 = dk0;
}
@Override
public String toString() {
String ret = "MREDID:" + mredid + "\n";
ret = ret + "DK0=:" + dk0 + "\n";
ret = ret + "NEPE:" + nepe + "\n";
return ret;
}
}
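Then change processToken to something like this:
// one Data instance collects the values of a record, instead of a new one per token
private Data data = new Data();
private void processToken(String token) {
if (token.startsWith("MREDID=")) {
String value = getTokenValue(token,"=");
data.setMredid(Integer.parseInt(value));
} else if (token.startsWith("DK0=")) {
String value = getTokenValue(token,"=");
data.setDk0(value);
} else if (token.startsWith("NEPE=")) {
String value = getTokenValue(token,"=");
data.setNepe(value);
}
}
// hypothetical helper: call it once all tokens of a record have been processed
// (for the sample file, after the while loop in main) to output the values in order
private void flushData() {
outputBuffer.append(data.toString());
data = new Data();
if (outputBuffer.length() > FLUSH_LIMIT) {
flushOutputBuffer();
}
}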