XML not parsing completely - java

I'm trying to fetch XML from Goodreads' API, but the XML I've extracted is incomplete.
String resultJsonStr = null;
ArrayList<String> results = new ArrayList<String>();
String xml = null;
try {
    String str = "https://www.goodreads.com/book/isbn?isbn=9780755380022&key=UpH4L0IYjAXcezlfg0yT2Q";
    URL url = new URL(str);
    StringBuilder builder = new StringBuilder();
    // Read characters through a Reader (UTF-8 assumed) instead of casting raw bytes
    // to char, which garbles multi-byte characters; the stream is closed afterwards.
    try (Reader reader = new InputStreamReader(url.openStream(), StandardCharsets.UTF_8)) {
        int ch;
        while ((ch = reader.read()) != -1) {
            builder.append((char) ch);
        }
    }
    xml = builder.toString();
} catch (MalformedURLException e) {
    e.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
}
I'm using the XML returned by the URL above. When I inspect String xml, the content stops inside the reviews_widget element. I believe it's being read as a style, and the rest of the XML is disregarded.
I'm doing this in preparation for converting the XML to JSON.
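One way to check whether the downloaded string is really incomplete is to read it as text in chunks and then parse it. The following is a minimal sketch under assumptions: the response is UTF-8, and the sample document is hypothetical (only the reviews_widget element name comes from the question; the other elements and the CDATA content are made up, not real Goodreads output). With a real response you would pass url.openStream() to readAll.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class XmlFetchSketch {
    // Reads the whole stream as UTF-8 text in chunks, instead of casting single bytes to char.
    public static String readAll(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    // Parses the text and returns the text content of the first element with the given tag name.
    public static String elementText(String xml, String tag) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return doc.getElementsByTagName(tag).item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample with a CDATA section holding markup, followed by another element.
        String sample = "<GoodreadsResponse><book><title>Caf\u00e9</title>"
                + "<reviews_widget><![CDATA[<style>#w{color:red}</style><div>widget</div>]]></reviews_widget>"
                + "<link>after-widget</link></book></GoodreadsResponse>";
        String xml = readAll(new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8)));
        System.out.println(xml.equals(sample));
        System.out.println(elementText(xml, "reviews_widget"));
        System.out.println(elementText(xml, "link"));
    }
}
```

If the element after reviews_widget can be read back this way, the data itself is complete, and markup inside the CDATA section is kept as plain text rather than being treated as a style.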

Related

How to split parsed String data without special characters?

I parsed this data (a section listing cat breeds from the Korean Wikipedia article on cats) from Wikipedia and I'm trying to get only the names from it. But each item in the result comes with \n* in front of it.
"": "=== 고양이의 종류 ===\n [[시암고양이]]\n* [[페르시안 네브스카야]]\n* [[페르시안]]\n*
[[노르웨이지언 포레스트]]\n* [[터키시 앙고라]]\n* [[아메리칸 숏헤어]]\n* [[브리티시 숏헤어]]\n*
[[러시안블루]]\n* [[뱅갈]]\n* [[메인쿤]]\n* [[랙돌]]\n* [[히말라얀]]\n* [[재패니즈
밥테일]]\n* [[오리엔탈 숏헤어]]\n* [[피터볼드]]\n* [[스코티시 폴드]]\n* 스코티시 스트레이트\n*
[[하일랜드 폴드]]\n* [[시베리안 포레스트]]\n* [[터키시 반]]\n* [[코리안 쇼트헤어]]\n*
[[올블랙]]\n* [[사바나캣]]\n* [[쿠나]]\n* [[아비시니안]]\n* 먼치킨"
This is my code.
String result = "";
List<String> list = new ArrayList<>();
try {
    URL url = new URL("https://ko.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&rvsection=20&titles=%EA%B3%A0%EC%96%91%EC%9D%B4&format=json");
    URLConnection con = url.openConnection();
    InputStream is = con.getInputStream();
    InputStreamReader isr = new InputStreamReader(is);
    BufferedReader reader = new BufferedReader(isr);
    while (true) {
        String data = reader.readLine();
        if (data == null) break;
        result += data;
    }
    JSONObject obj = new JSONObject(result);
    JSONObject query = (JSONObject) obj.get("query");
    JSONObject pages = (JSONObject) query.get("pages");
    JSONObject pageid = (JSONObject) pages.get("93349");
    JSONArray revisions = (JSONArray) pageid.get("revisions");
    String catcat = String.valueOf(revisions);
    String star = "\n*";
    catcat = catcat.replaceAll("\\[\\[", "").replaceAll("\\]\\]", ",").replaceAll("\\r|\\n", "").replaceAll(star, "");
    String[] catcategory = catcat.split(",");
    for (int i = 0; i < catcategory.length; i++) {
        list.add(catcategory[i]);
    }
} catch (MalformedURLException e) {
    e.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
} catch (JSONException e) {
    e.printStackTrace();
}
The result looks like this:
\n시암고양이
\n페르시안
and I want to remove the \n*.
Try this piece of code. It removes the \n* prefix, and then you can add _result_word to your list.
for (int i = 0; i < catcategory.length; i++) {
    // "\\\\n" in Java source is the regex \\n, which matches a literal backslash followed by 'n';
    // replace("*", "") then drops the asterisk (plain text, not a regex).
    String _result_word = catcategory[i].replaceFirst("\\\\n", "").replace("*", "");
    System.out.println(_result_word);
    list.add(_result_word);
}
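An alternative to stripping characters after the fact is to work on the wikitext string itself, where line breaks are still real newlines, and keep only the bullet lines. This is a minimal sketch, assuming the wikitext has already been taken out of the revision object as a plain String (with org.json, for example via revisions.getJSONObject(0).getString(...) using the revision's content key; in the legacy API output that key is typically "*"). The method name bulletItems is hypothetical. Any line that does not start with * (such as the === heading ===) is skipped.

```java
import java.util.ArrayList;
import java.util.List;

public class WikiListItems {
    // Extracts the text of each "* ..." bullet line from raw wikitext, dropping [[ and ]] link markup.
    // Assumes the string contains real newlines, not the escaped "\n" produced by String.valueOf(jsonArray).
    public static List<String> bulletItems(String wikitext) {
        List<String> items = new ArrayList<>();
        for (String rawLine : wikitext.split("\n")) {
            String line = rawLine.trim();
            if (!line.startsWith("*")) {
                continue; // skip headings and other non-bullet lines
            }
            String item = line.substring(1).replace("[[", "").replace("]]", "").trim();
            if (!item.isEmpty()) {
                items.add(item);
            }
        }
        return items;
    }

    public static void main(String[] args) {
        String wikitext = "=== Types ===\n* [[Siamese]]\n* [[Persian]]\n* Munchkin";
        System.out.println(bulletItems(wikitext)); // prints [Siamese, Persian, Munchkin]
    }
}
```

Working line by line also keeps unlinked entries (plain names without [[ ]]), which a split on "]]" would mishandle.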
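Everything is correct except one line, where you need to escape both the asterisk and the backslash. String.valueOf(revisions) re-serializes the JSON, so each line break appears in catcat as the two characters \ and n. The Java string "\n*" is a real newline followed by *, which as a regex means "zero or more newlines", so it removes nothing here.
String star = "\\\\n\\*";
catcat = catcat.replaceAll(star, "");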
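The difference between the two patterns can be seen in isolation. In this small sketch the serialized text is a made-up stand-in for what String.valueOf(revisions) produces, where backslash and n are two ordinary characters; the method names are hypothetical.

```java
public class StarRegexDemo {
    // The Java literal "\n*" is a real newline followed by '*': the regex "zero or more newlines".
    public static String withOriginalPattern(String s) {
        return s.replaceAll("\n*", "");
    }

    // The Java literal "\\\\n\\*" is the regex \\n\* : a backslash, the letter n, a literal asterisk.
    public static String withEscapedPattern(String s) {
        return s.replaceAll("\\\\n\\*", "");
    }

    public static void main(String[] args) {
        // Backslash + 'n' here are two characters, as in serialized JSON text.
        String serialized = "\\n* [[Siamese]],\\n* [[Persian]],";
        System.out.println(withOriginalPattern(serialized)); // unchanged
        System.out.println(withEscapedPattern(serialized));  // " [[Siamese]], [[Persian]],"
    }
}
```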

How to search and remove some values from xml java

I have an XML file, abc.xml:
<soapenv:Envelope>
<soapenv:Header/>
<soapenv:Body>
<mes:SomeRq>
<RqID>?</RqID>
<MsgRqHdr>
....
</mes:SomeRq>
</soapenv:Body>
</soapenv:Envelope>
Is there a way I can search for mes: in this XML and replace it with ins:?
Thanks in advance.
public static void findreplcae(String strFilePath) throws IOException {
    String currentString = "mes:";
    String changedString = "ins:";
    BufferedReader reader = new BufferedReader(new FileReader(strFilePath));
    StringBuffer currentLine = new StringBuffer();
    String currentLineIn;
    while ((currentLineIn = reader.readLine()) != null) {
        System.out.println(currentLineIn);
        boolean bool = false;
        String trimmedLine = currentLineIn.trim();
        System.out.println(trimmedLine);
        if (trimmedLine.contains(currentString)) {
            trimmedLine.replace(currentString, changedString);
            bool = true;
            if (bool != true) {
                currentLine = currentLine.append(currentLineIn + System.getProperty("line.separator"));
            }
        }
        reader.close();
        BufferedWriter writer = new BufferedWriter(new FileWriter(strFilePath));
        writer.write(currentLine.toString());
        writer.close();
    }
}
It's not a good idea to process XML as a plain text file. Use DocumentBuilder.parse (with a namespace-aware DocumentBuilderFactory) to parse the file, call getDocumentElement(), walk the elements, and check each one with getPrefix(). If the prefix matches, replace it with setPrefix(). Note that you have to register the new prefix if that hasn't been done already.
Check this page for a tutorial.
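A minimal sketch of that DOM approach, under assumptions: the real file must declare its namespaces (the excerpt above omits the declarations), so the sample below uses the SOAP 1.1 envelope namespace for soapenv and a hypothetical URI, urn:example:messages, for mes. The prefix is renamed while the namespace URI stays the same; the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class PrefixRename {
    // Renames prefix `from` to `to` on every element; namespace URIs are left unchanged.
    public static String renamePrefix(String xml, String from, String to) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // without this, getPrefix() returns null
        Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        Element root = doc.getDocumentElement();
        NodeList all = doc.getElementsByTagNameNS("*", "*"); // every element, in document order
        for (int i = 0; i < all.getLength(); i++) {
            Element e = (Element) all.item(i);
            if (from.equals(e.getPrefix())) {
                // Register the new prefix on the root for the same namespace URI, then switch to it.
                root.setAttributeNS("http://www.w3.org/2000/xmlns/", "xmlns:" + to, e.getNamespaceURI());
                e.setPrefix(to);
            }
        }

        StringWriter out = new StringWriter();
        TransformerFactory.newInstance().newTransformer()
                .transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<soapenv:Envelope xmlns:soapenv=\"http://schemas.xmlsoap.org/soap/envelope/\""
                + " xmlns:mes=\"urn:example:messages\">"
                + "<soapenv:Header/><soapenv:Body><mes:SomeRq><RqID>?</RqID></mes:SomeRq>"
                + "</soapenv:Body></soapenv:Envelope>";
        System.out.println(renamePrefix(xml, "mes", "ins"));
    }
}
```

If ins: is supposed to point to a different namespace URI, that is a rename of the namespace rather than of the prefix, and the elements would have to be recreated with Document.renameNode instead.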
Some issues:
trimmedLine.replace(currentString, changedString); returns the result as a new String (Java Strings are immutable), so you have to store it somewhere.
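What is this supposed to do? bool has just been set to true, so the if block can never run and nothing is ever appended to currentLine:
bool = true;
if (bool != true) {
    currentLine = currentLine.append(currentLineIn + System.getProperty("line.separator"));
}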
Don't close the reader inside the reading loop; in your code, reader.close() and the code that writes the file are both inside the while loop.
If you want to overwrite the original file, this should do it (although I'm not sure you really want to trim the lines):
String currentString = "mes:";
String changedString = "ins:";
try {
    BufferedReader reader = new BufferedReader(new FileReader(strFilePath));
    StringBuffer newContents = new StringBuffer();
    String currentLineIn = null;
    while ((currentLineIn = reader.readLine()) != null) {
        String trimmedLine = currentLineIn.trim();
        if (trimmedLine.contains(currentString)) {
            newContents.append(trimmedLine.replace(currentString, changedString));
        }
        else {
            newContents.append(trimmedLine);
        }
        newContents.append(System.getProperty("line.separator"));
    }
    reader.close();
    BufferedWriter writer = new BufferedWriter(new FileWriter(strFilePath));
    writer.write(newContents.toString());
    writer.close();
} catch (IOException e) {
    // TODO handle it
}
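On Java 7 or later, the same text replacement can be written more compactly with java.nio.file, without trimming the lines. This is a sketch assuming the file is UTF-8; the class and method names are hypothetical. It also illustrates why the DOM approach is safer: a blind text replacement changes every "mes:" in the file, including inside ordinary text such as "times:".

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class PrefixTextReplace {
    // Replaces every occurrence of `from` with `to` in the file; plain text, not XML-aware.
    public static void replaceInFile(Path file, String from, String to) throws IOException {
        String content = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        Files.write(file, content.replace(from, to).getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("abc", ".xml");
        // Note: "times:" also contains "mes:" and is changed too.
        String xml = "<mes:SomeRq>\n  <Note>times: 3</Note>\n</mes:SomeRq>\n";
        Files.write(tmp, xml.getBytes(StandardCharsets.UTF_8));
        replaceInFile(tmp, "mes:", "ins:");
        System.out.print(new String(Files.readAllBytes(tmp), StandardCharsets.UTF_8));
        Files.delete(tmp);
    }
}
```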

File cannot be read correctly

I'm trying to read a JSON file from DBpedia and parse it, but the code I've written can't read the whole JSON file correctly, so a parsing error occurs. Here is my code for reading and parsing:
URL url = new URL("http://dbpedia.org/data3/assembly.json");
BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()));
String inputLine = "asdf";
while ((in.readLine()) != null)
{
    if (inputLine == "asdf")
        inputLine = in.readLine();
    else
        inputLine += in.readLine();
    //System.out.println(inputLine);
}
System.out.println(inputLine);
Object obj = parser.parse(inputLine);
JSONObject jsonObject = (JSONObject) obj;
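Your loop calls in.readLine() twice per iteration (once in the while condition, whose result is discarded, and once in the body), so every other line is lost and the text you pass to the parser is not valid JSON. Instead, you can create a helper method to read the whole file from the URL: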
private static String readUrl(String urlString) throws Exception {
    BufferedReader reader = null;
    try {
        URL url = new URL(urlString);
        reader = new BufferedReader(new InputStreamReader(url.openStream()));
        StringBuffer buffer = new StringBuffer();
        int read;
        char[] chars = new char[1024];
        while ((read = reader.read(chars)) != -1) {
            buffer.append(chars, 0, read);
        }
        return buffer.toString();
    } finally {
        if (reader != null)
            reader.close();
    }
}
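Then you can call the method like this. Since readUrl is declared to throw Exception, the caller has to catch (or declare) Exception, not only JSONException:
try {
    JSONObject json = new JSONObject(readUrl("http://dbpedia.org/data3/assembly.json"));
    ...
} catch (Exception e) {
    e.printStackTrace();
}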
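Whether you use StringBuffer or StringBuilder is up to you; StringBuilder avoids synchronization and is the usual choice when the buffer isn't shared between threads.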
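To see concretely what the question's loop does, here is a small sketch that runs it against an in-memory reader instead of the URL (the sample JSON text is made up). It uses equals instead of == for the "asdf" check; for this string literal the outcome is the same. The method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class ReadLineLoopDemo {
    // Mirrors the question's loop: readLine() is called in the condition AND in the body.
    public static String buggyRead(BufferedReader in) throws IOException {
        String inputLine = "asdf";
        while ((in.readLine()) != null) {
            if (inputLine.equals("asdf"))
                inputLine = in.readLine();
            else
                inputLine += in.readLine();
        }
        return inputLine;
    }

    // Reads each line exactly once and keeps the line breaks.
    public static String correctRead(BufferedReader in) throws IOException {
        StringBuilder sb = new StringBuilder();
        String line;
        while ((line = in.readLine()) != null) {
            sb.append(line).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String text = "{\n\"a\": 1,\n\"b\": 2\n}";
        System.out.println(buggyRead(new BufferedReader(new StringReader(text))));   // "a": 1,}
        System.out.print(correctRead(new BufferedReader(new StringReader(text))));
    }
}
```

With an odd number of lines the buggy loop even appends the string "null", because the last readLine() in the body returns null.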

How to extract lines starting with a given string in Java

I have a text file with more than 20,000 lines and I need to extract specific lines from it. The ISDN line repeats many times, each time with a different value. However, the output of my program is a completely blank file.
My text file contains the following data:
RecordType=0(MOC)
sequenceNumber=456456456
callingIMSI=73454353911
callingIMEI=85346344
callingNumber
AddInd=H45345'1
NumPlan=H34634'2
ISDN=94634564366 // Need to extract this "ISDN" line only
public String readTextFile(String fileName) {
    String returnValue = "";
    FileReader file = null;
    String line = "";
    String line2 = "";
    try {
        file = new FileReader(fileName);
        BufferedReader reader = new BufferedReader(file);
        while ((line = reader.readLine()) != null) {
            // extract logic starts here
            if (line.startsWith("ISDN") == true) {
                System.out.println("hello");
                returnValue += line + "\n";
            }
        }
    } catch (FileNotFoundException e) {
        throw new RuntimeException("File not found");
    } finally {
        if (file != null) {
            try {
                file.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
    return returnValue;
}
We will assume that you use Java 7, since this is 2014.
Here is a method which will return a List<String> where each element is an ISDN:
private static final Pattern ISDN = Pattern.compile("ISDN=(.*)");

// ...

public List<String> getISDNsFromFile(final String fileName)
    throws IOException
{
    final Path path = Paths.get(fileName);
    final List<String> ret = new ArrayList<>();

    Matcher m;
    String line;

    try (
        final BufferedReader reader
            = Files.newBufferedReader(path, StandardCharsets.UTF_8);
    ) {
        while ((line = reader.readLine()) != null) {
            m = ISDN.matcher(line);
            if (m.matches())
                ret.add(m.group(1));
        }
    }

    return ret;
}
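A hypothetical variant of the method above can take any BufferedReader, so it can be tried without a file, and it also tolerates leading whitespace before ISDN. The indentation handling is an assumption: the sample data in the question shows none, but indented lines would make both startsWith("ISDN") and the original pattern fail. The value 12345 below is made up.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IsdnExtract {
    // Same idea as the answer's pattern; the leading \s* also accepts indented lines.
    private static final Pattern ISDN = Pattern.compile("\\s*ISDN=(.*)");

    // Collects the value of every ISDN= line from the reader.
    public static List<String> isdns(BufferedReader reader) throws IOException {
        List<String> ret = new ArrayList<>();
        String line;
        while ((line = reader.readLine()) != null) {
            Matcher m = ISDN.matcher(line);
            if (m.matches()) {
                ret.add(m.group(1).trim());
            }
        }
        return ret;
    }

    public static void main(String[] args) throws IOException {
        String records = "RecordType=0(MOC)\nNumPlan=H34634'2\nISDN=94634564366\n"
                + "RecordType=0(MOC)\n   ISDN=12345\n";
        try (BufferedReader reader = new BufferedReader(new StringReader(records))) {
            System.out.println(isdns(reader)); // prints [94634564366, 12345]
        }
    }
}
```

For a real file, pass Files.newBufferedReader(path, StandardCharsets.UTF_8) inside try-with-resources, as the answer does.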

Find a specific string after a certain pattern in a txt file

I'm new to Java, so please offer some sample code if possible.
The situation: I have HTML stored in a text file. I need to read the file and find the string that follows the pattern data-name, for every occurrence in the entire file. I did some research online and already used an HTML parser to get the HTML and store it in a text file. I know I might need to use a regular expression, so please help me. Thank you!
Below is my code for getting the HTML. The result is concatenated into one line.
public static void main(String[] args) {
    try {
        URL url = new URL("https://twitter.com/search?q=%23JENOSMROOKIESOPENFOLBACK&src=tren");
        // read text returned by server
        BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()));
        String line;
        PrintWriter out = new PrintWriter(new FileWriter("C:/Users/Desktop/htmlsourcecode.txt"));
        while ((line = in.readLine()) != null) {
            System.out.println(line);
            out.print(line);
        }
        out.close();
        in.close();
    } catch (IOException e) {
        e.printStackTrace();
    }
}
How about something like this? It reads the page line by line and prints the value of every data-name attribute it finds:
// External resource(s).
BufferedReader in = null;
PrintWriter out = null;
try {
    URL url = new URL(
            "https://twitter.com/search?q=%23JENOSMROOKIESOPENFOLBACK&src=tren");
    // read text returned by server
    in = new BufferedReader(new InputStreamReader(
            url.openStream()));
    String line;
    // out = new PrintWriter(new FileWriter(
    //         "htmlsourcecode.txt"));
    final String DATA_NAME = "data-name=\"";
    while ((line = in.readLine()) != null) {
        int pos1 = line.indexOf(DATA_NAME); // opening position.
        // A line can contain several data-name attributes, so keep searching.
        while (pos1 > -1) { // did we match?
            // Skip past data-name=" to the first character of the value.
            pos1 += DATA_NAME.length();
            // Find the closing quote, starting at pos1 so empty values work too.
            int pos2 = line.indexOf("\"", pos1);
            if (pos2 == -1) {
                break; // value is not closed on this line
            }
            String dataName = line.substring(pos1, pos2);
            System.out.println(dataName);
            // out.println(dataName);
            pos1 = line.indexOf(DATA_NAME, pos2 + 1);
        }
    }
} catch (Exception e) {
    e.printStackTrace();
} finally {
    // Close external resource(s).
    if (in != null) {
        try {
            in.close();
        } catch (IOException e) {
        }
    }
    if (out != null) {
        out.close();
    }
}
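Since the question mentions regular expressions, here is a regex-based sketch that works on the whole text at once, for example the contents of the saved htmlsourcecode.txt read with Files.readAllBytes. It assumes attribute values are double-quoted and contain no escaped quotes; it is not a real HTML parser, and the sample HTML is made up. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DataNameRegex {
    // data-name="..." : captures everything up to the next double quote.
    private static final Pattern DATA_NAME = Pattern.compile("data-name=\"([^\"]*)\"");

    // Returns every data-name value in the text, in order, including several on one line.
    public static List<String> dataNames(String html) {
        List<String> names = new ArrayList<>();
        Matcher m = DATA_NAME.matcher(html);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    public static void main(String[] args) {
        String html = "<div data-name=\"alice\"></div><div data-name=\"bob\" data-id=\"2\"></div>\n"
                + "<span data-name=\"\"></span>";
        System.out.println(dataNames(html)); // prints [alice, bob, ]
    }
}
```

Using Matcher.find() in a loop is what makes it pick up every occurrence, rather than only the first one per line.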
