I'm trying to parse a large (about 10 GB) database dump in JSON format from MusicBrainz.org, using Java and Gson.
The JSON file has the following structure: there are no '[' and ']' to indicate an array of objects, and no ',' between the objects. I don't know why, but this JSON file is just like that.
{
"id": "d0ab06e1-751a-414b-a976-da72670391b1",
"name": "Arcing Wires",
"sort-name": "Arcing Wires"
}
{
"id": "6f0c2c16-dd7e-4268-a484-bc7b2ac78108",
"name": "Another",
"sort-name": "Another"
}
{
"id": "e062b6cd-5506-47b0-afdb-72f4279ec38c",
"name": "Agent S",
"sort-name": "Agent S"
}
And this is the code that I'm using:
try(JsonReader jsonReader = new JsonReader(
new InputStreamReader(
new FileInputStream(jsonFilePath), StandardCharsets.UTF_8))) {
Gson gson = new GsonBuilder().create();
jsonReader.beginArray();
while (jsonReader.hasNext()) {
Artist mapped = gson.fromJson(jsonReader, Artist.class);
//TODO do something with the object
}
jsonReader.endArray();
}
catch (UnsupportedEncodingException e) {
e.printStackTrace();
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
And the class that I mapped is this:
import com.google.gson.annotations.SerializedName;
public class Artist {
@SerializedName("id")
public String id;
@SerializedName("name")
public String name;
@SerializedName("sort-name")
public String sortName;
}
The error I'm getting:
Exception in thread "main" java.lang.IllegalStateException: Expected BEGIN_ARRAY but was BEGIN_OBJECT at line 1 column 2 path $
at com.google.gson.stream.JsonReader.beginArray(JsonReader.java:350)
at DBLoader.parse(DBLoader.java:39)
at DBLoader.main(DBLoader.java:23)
I believe Gson expects a different structure from what I declared, but I don't understand how I should define this kind of JSON, with no commas and no brackets.
Any clues?
Thanks!
By default, a JSON document has exactly one top-level value (and yes, each of your objects on its own would be a valid JSON document). There is, however, JSON streaming, an umbrella term for techniques that concatenate multiple JSON values into a single stream on the assumption that the consumer knows how to parse it. Gson supports a so-called lenient mode for JsonReader, enabled with setLenient(true), which turns off the "one top-level value only" restriction (and relaxes a few other rules that are irrelevant to this question).
With lenient mode on, you can read JSON values one by one. It turns out that this is enough to read line-delimited JSON and concatenated JSON, because their values are separated only by zero or more whitespace characters, which Gson ignores. The more exotic record-separator-delimited JSON and length-prefixed JSON are therefore not supported.
Your code does not work because it assumes that the stream contains a single JSON array, which it obviously does not: the file is a stream of values that does not conform to the JSON array syntax.
Simple, generic JSON stream support might look like this. It uses the Stream API because it is richer than Iterator, but it is only meant to show the idea; you can easily adapt it to iterators, callbacks, observable streams, or whatever you like:
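import com.google.gson.stream.JsonReader;
import com.google.gson.stream.JsonToken;
import java.io.IOException;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;
import javax.annotation.WillNotClose;
import lombok.experimental.UtilityClass;
@UtilityClass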
public final class JsonStreamSupport {
public static <T> Stream<T> parse(@WillNotClose final JsonReader jsonReader, final Function<? super JsonReader, ? extends T> readElement) {
final boolean isLenient = jsonReader.isLenient();
jsonReader.setLenient(true);
final Spliterator<T> spliterator = new Spliterators.AbstractSpliterator<T>(Long.MAX_VALUE, Spliterator.ORDERED) {
@Override
public boolean tryAdvance(final Consumer<? super T> action) {
try {
final JsonToken token = jsonReader.peek();
if ( token == JsonToken.END_DOCUMENT ) {
return false;
}
// TODO: read more elements in batch
final T element = readElement.apply(jsonReader);
action.accept(element);
return true;
} catch ( final IOException ex ) {
throw new RuntimeException(ex);
}
}
};
return StreamSupport.stream(spliterator, false)
.onClose(() -> jsonReader.setLenient(isLenient));
}
}
And then:
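// closing the stream runs onClose, which restores the reader's original leniency
try (Stream<Artist> artists = JsonStreamSupport.<Artist>parse(jsonReader, jr -> gson.fromJson(jr, Artist.class))) {
artists.forEach(System.out::println);
}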
Output (assuming Artist has Lombok-generated toString()):
Artist(id=d0ab06e1-751a-414b-a976-da72670391b1, name=Arcing Wires, sortName=Arcing Wires)
Artist(id=6f0c2c16-dd7e-4268-a484-bc7b2ac78108, name=Another, sortName=Another)
Artist(id=e062b6cd-5506-47b0-afdb-72f4279ec38c, name=Agent S, sortName=Agent S)
How many bytes does this approach, JSON streaming, actually save, given that the service you're consuming uses it? I don't know.
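If you would rather not rely on Gson's lenient mode, the same concatenated stream can also be split without a JSON library, by tracking brace/bracket depth and string literals. This is a different, swapped-in technique, shown only as a minimal sketch: ConcatenatedJsonSplitter, split and splitAll are hypothetical names, the code only finds value boundaries (it does not validate the JSON), and it supports objects and arrays at the top level but not bare scalars.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical helper, not part of Gson: splits concatenated JSON objects/arrays
 * into one String per top-level value. It tracks nesting depth and string literals
 * only; it does not check that the JSON is otherwise valid.
 */
final class ConcatenatedJsonSplitter {

    private ConcatenatedJsonSplitter() {
    }

    static void split(Reader in, Consumer<String> onValue) {
        StringBuilder current = new StringBuilder();
        int depth = 0;            // nesting level of {...} / [...]
        boolean inString = false; // inside a "..." literal
        boolean escaped = false;  // previous char was a backslash inside a string
        try {
            int c;
            while ((c = in.read()) != -1) {
                char ch = (char) c;
                if (depth == 0) {
                    if (ch == ' ' || ch == '\t' || ch == '\n' || ch == '\r') {
                        continue; // JSON whitespace between top-level values
                    }
                    if (ch != '{' && ch != '[') {
                        throw new IllegalStateException("Expected '{' or '[' at top level but found '" + ch + "'");
                    }
                }
                current.append(ch);
                if (inString) {
                    if (escaped) {
                        escaped = false;
                    } else if (ch == '\\') {
                        escaped = true;
                    } else if (ch == '"') {
                        inString = false;
                    }
                } else if (ch == '"') {
                    inString = true;
                } else if (ch == '{' || ch == '[') {
                    depth++;
                } else if ((ch == '}' || ch == ']') && --depth == 0) {
                    onValue.accept(current.toString()); // one complete top-level value
                    current.setLength(0);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (depth != 0) {
            throw new IllegalStateException("Input ended inside a JSON value");
        }
    }

    /** Convenience overload for small inputs, e.g. tests. */
    static List<String> splitAll(String json) {
        List<String> values = new ArrayList<>();
        split(new StringReader(json), values::add);
        return values;
    }
}
```

Each extracted string can then be handed to Gson inside the consumer, for example json -> process(gson.fromJson(json, Artist.class)), where process stands for your own code. Only one value is held in memory at a time, and wrapping the file reader in a BufferedReader keeps the per-character read() calls cheap.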
It looks like the JSON Lines (jsonl) format, where every line is a valid JSON object.
You can read the file line by line and convert each line to an object. I think that will work.
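A minimal sketch of that line-by-line approach, assuming the dump really is JSON Lines, with one complete object per line (the pretty-printed sample in the question spans several lines per object, so it would not work on the sample exactly as shown). JsonLinesReader and forEachRecord are hypothetical names; the actual JSON parsing is delegated to whatever function you pass in, for example line -> gson.fromJson(line, Artist.class).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.function.Consumer;
import java.util.function.Function;

/** Hypothetical helper: reads JSON Lines (one JSON value per line) record by record. */
final class JsonLinesReader {

    private JsonLinesReader() {
    }

    /**
     * Maps every non-blank line with parseLine and passes the result to sink.
     * Only one line is held in memory at a time. Closes source when done.
     * Returns the number of records processed.
     */
    static <T> long forEachRecord(Reader source,
                                  Function<? super String, ? extends T> parseLine,
                                  Consumer<? super T> sink) {
        long count = 0;
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue; // tolerate blank lines, e.g. a trailing newline
                }
                sink.accept(parseLine.apply(line));
                count++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }
}
```

For the dump itself, the source would be something like new InputStreamReader(new FileInputStream(jsonFilePath), StandardCharsets.UTF_8); memory use is then bounded by the longest line rather than by the 10 GB file size.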
Related
I'm trying to get the key:value pairs from a simple JSON string so that I can add them to an in-memory table afterwards. I'm facing an issue because my input is a string, and my loop does not seem to be able to read the key-value pairs.
I've read many topics about it, and I'm still stuck. Here is my input:
{"nom":"BRUN","prenom":"Albert","date_naiss":"10-10-1960","adr_email":"abrun@gmail.com","titre":"Mr","sexe":"F"}
And here is my method. It finds only one object, and the result is the same in my loop:
public static ArrayHandler jsonSimpleObjectToTab(String data) throws ParseException {
if( data instanceof String) {
final var jsonParser = new JSONParser();
final var object = jsonParser.parse(data);
final var array = new JSONArray();
array.put(object);
final var handler = new ArrayHandler("BW_funct_Struct");
for( KeyValuePair element : array) {
handler.addCell(element);
Log.warn(handler);
}
return handler;
} else {
throw new IllegalArgumentException("jsonSimpleObjectToTab: do not support complex object" + data + "to Tab");
}
}
I also tried typing my array as a List, an Object, etc., without the KeyValuePair object. I would appreciate some help.
Thanks again, dear StackOverflowers ;)
You can try this :
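const json = '{"nom":"BRUN","prenom":"Albert","date_naiss":"10-10-1960","adr_email":"abrun@gmail.com","titre":"Mr","sexe":"F"}';
const map = new Map();
const obj = JSON.parse(json, (key, value) => {
// the reviver is also called once for the root object with key "", so skip it
if (key !== "") map.set(key, value);
return value; // returning undefined would delete the property from obj
});
and every pair will be stored in map.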
Simply split the whole line at the commas, and then split each resulting part at the colon. This gives you the individual names and values, as long as no value itself contains a comma or a colon.
Try the following,
supposing
String input = "\"nom\":\"BRUN\",\"prenom\":\"Albert\"";
then
String[] nameValuePairs = input.split(",");
for(String pair : nameValuePairs)
{
String[] nameValue = pair.split(":");
String name = nameValue[0]; // use it as you need it ...
String value = nameValue[1]; // use it as you need it ...
}
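A slightly more complete sketch of the same splitting idea, for a flat object whose values are plain strings, like the one in the question. FlatJsonSplitter and toMap are hypothetical names. It strips the outer braces and the surrounding quotes, and uses split(":", 2) so that a colon inside a value (such as a time) is kept. It still breaks if a value contains a comma, an escaped quote, or a nested object or array, so a real JSON parser, as in the other answers, remains the safer choice.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: naive parser for a flat JSON object with string values only. */
final class FlatJsonSplitter {

    private FlatJsonSplitter() {
    }

    static Map<String, String> toMap(String json) {
        String body = json.trim();
        // drop the surrounding { and }
        if (body.startsWith("{") && body.endsWith("}")) {
            body = body.substring(1, body.length() - 1);
        }
        Map<String, String> result = new LinkedHashMap<>(); // keeps the input order
        if (body.trim().isEmpty()) {
            return result;
        }
        for (String pair : body.split(",")) {
            // limit 2: only the first ':' separates the name from the value
            String[] nameValue = pair.split(":", 2);
            if (nameValue.length != 2) {
                throw new IllegalArgumentException("Not a name:value pair: " + pair);
            }
            result.put(unquote(nameValue[0]), unquote(nameValue[1]));
        }
        return result;
    }

    // removes surrounding whitespace and one pair of double quotes, if present
    private static String unquote(String s) {
        String t = s.trim();
        if (t.length() >= 2 && t.startsWith("\"") && t.endsWith("\"")) {
            return t.substring(1, t.length() - 1);
        }
        return t;
    }
}
```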
You can use Jackson's TypeReference to convert the JSON into a Map<String, String>, which gives you the key-value pairs.
String json = "{\"nom\":\"BRUN\",\"prenom\":\"Albert\",\"date_naiss\":\"10-10-1960\",\"adr_email\":\"abrun@gmail.com\",\"titre\":\"Mr\",\"sexe\":\"F\"}";
ObjectMapper objectMapper = new ObjectMapper();
TypeReference<Map<String,String>> typeReference = new TypeReference<Map<String, String>>() {
};
Map<String,String> map = objectMapper.readValue(json, typeReference);
I just answered a very similar question. The gist of it is that you need to parse your JSON string into some object; in your case, you can parse it into a Map. Here is the link to the question with my answer, but here is a short version: you can use any JSON library, but the recommended ones are Jackson JSON (also known as FasterXML) and Gson (by Google). Here is their user guide site. To parse your JSON text into a class instance, you can use the ObjectMapper class, which is part of the Jackson JSON library. For example:
public <T> T readValue(String content,
TypeReference valueTypeRef)
throws IOException,
JsonParseException,
JsonMappingException
See the Javadoc. I could also suggest a very simple JsonUtils class, which is a thin wrapper over the ObjectMapper class. Your code could be as simple as this:
Map<String, Object> map;
try {
map = JsonUtils.readObjectFromJsonString(input , Map.class);
} catch(IOException ioe) {
....
}
Here is the Javadoc for the JsonUtils class. This class is part of the MgntUtils open-source library, written and maintained by me. You can get it as a Maven artifact or from GitHub.
I am attempting to parse the output of Apache Tika Server's rmeta web service endpoint: https://cwiki.apache.org/confluence/display/TIKA/TikaServer#TikaServer-RecursiveMetadataandContent
Its payloads look like the following:
[
{"Application-Name":"Microsoft Office Word",
"Application-Version":"15.0000",
"X-Parsed-By":["org.apache.tika.parser.DefaultParser","org.apache.tika.parser.microsoft.ooxml.OOXMLParser"],
"X-TIKA:content":"this content string can be many MB large"
...
},
{"Content-Encoding":"ISO-8859-1",
"Content-Length":"8",
"Content-Type":"text/plain; charset=ISO-8859-1",
"X-TIKA:content":"again, this content string can be many MB large",
...
}
...
]
As indicated, the X-TIKA:content strings can be oppressively large: large enough to OOM my JVM if I load an entire string into memory.
So if I were to use JsonParser.getText() like this:
private void parseRmetaResponse(CloseableHttpResponse response) throws IOException {
ObjectMapper objectMapper = new ObjectMapper();
JsonFactory jsonFactory = objectMapper.getFactory();
JsonParser jsonParser = jsonFactory.createParser(response.getEntity().getContent());
JsonToken arrayStartToken = jsonParser.nextToken();
if (arrayStartToken != JsonToken.START_ARRAY) {
throw new IllegalStateException("The first element of the Json structure was expected to be a start array token, but it was: " + arrayStartToken);
}
JsonToken nextToken = jsonParser.nextToken();
while (nextToken != JsonToken.END_ARRAY) {
parseNextField(jsonParser);
nextToken = jsonParser.nextToken(); // advance, otherwise the loop never ends
}
}
private String getTextContents(JsonParser jsonParser, OutputStream os, Metadata metadata) throws IOException {
String nextAttr = jsonParser.nextFieldName();
if ("X-TIKA:content".equals(nextAttr)) {
return jsonParser.getText();
}
// ...
}
That would be prone to OOM crashes, because I cannot load the whole string into memory without eating up all of the JVM heap.
Instead, I have a parameter, maxChars, and I want to stop reading characters from X-TIKA:content once that many have been read.
How can I say "get me the text, but only read up to maxChars characters, and discard any additional characters"?
I can use Gson, FasterXML Jackson, or any other library that helps me do what I need here.
Instead of calling String getText(), you can call int getText(Writer writer).
Give it a custom Writer that works similar to StringWriter, but discards any characters beyond a given threshold.
Then you would use it like this:
if ("X-TIKA:content".equals(nextAttr)) {
try (LimitedStringWriter writer = new LimitedStringWriter(maxChars)) {
jsonParser.getText(writer);
return writer.toString();
}
}
Writing the LimitedStringWriter class is left to you.
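One possible sketch of such a class, built only on java.io.Writer. The name LimitedStringWriter comes from the answer, but this implementation, including the isTruncated() helper, is an assumption rather than an existing library class. java.io.Writer routes its other write(...) and append(...) overloads through write(char[], int, int), so overriding that one method is enough to enforce the limit.

```java
import java.io.Writer;

/**
 * Hypothetical implementation: keeps at most maxChars characters and silently
 * discards everything written after that. Not thread-safe.
 */
final class LimitedStringWriter extends Writer {

    private final StringBuilder buffer = new StringBuilder();
    private final int maxChars;
    private boolean truncated;

    LimitedStringWriter(int maxChars) {
        if (maxChars < 0) {
            throw new IllegalArgumentException("maxChars must not be negative");
        }
        this.maxChars = maxChars;
    }

    // All other Writer.write(...)/append(...) overloads end up here.
    @Override
    public void write(char[] cbuf, int off, int len) {
        int room = maxChars - buffer.length();
        if (len > room) {
            truncated = true; // remember that something was dropped
        }
        if (room > 0) {
            buffer.append(cbuf, off, Math.min(len, room));
        }
    }

    @Override
    public void flush() {
        // nothing to flush: the data lives in memory
    }

    @Override
    public void close() {
        // nothing to release
    }

    /** True if at least one character was discarded. */
    boolean isTruncated() {
        return truncated;
    }

    @Override
    public String toString() {
        return buffer.toString();
    }
}
```

This bounds the size of the String you keep. Depending on the parser implementation, Jackson may still read the complete token into its own internal buffer before passing it to the writer, so it is worth checking heap usage with a realistically large document.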
Added by questioner (Nicholas DiPiazza):
Here is an example of an implementation you could use as an example: https://github.com/ow2-proactive/scheduling/blob/master/common/common-api/src/main/java/org/ow2/proactive/utils/BoundedStringWriter.java
I am trying to parse an XML file using StAX, but the error I get is:
javax.xml.stream.XMLStreamException: ParseError at [row,col]:[414,47]
Message: The reference to entity "R" must end with the ';' delimiter.
It gets stuck on line 414, which contains P&R inside the XML file. The code I have to parse it is:
public List<Vild> getVildData(File file){
XMLInputFactory factory = XMLInputFactory.newFactory();
try {
ByteArrayInputStream byteArrayInputStream = new ByteArrayInputStream(Files.readAllBytes(file.toPath()));
XMLStreamReader reader = factory.createXMLStreamReader(byteArrayInputStream, "iso8859-1");
List<Vild> vild = saveVild(reader);
reader.close();
return vild;
} catch (IOException e) {
e.printStackTrace();
} catch (XMLStreamException e) {
e.printStackTrace();
}
return Collections.emptyList();
}
private List<Vild> saveVild(XMLStreamReader streamReader) {
List<Vild> vildList = new ArrayList<>();
try{
Vild vild = new Vild();
while (streamReader.hasNext()) {
streamReader.next();
//Creating list with data
}
}catch(XMLStreamException | IllegalStateException ex) {
ex.printStackTrace();
}
return vildList;
}
I read online that a bare & is invalid in XML, but I don't know how to replace it before the parser throws this error inside the saveVild method. Does anyone know how to do this efficiently?
Change the question: you're not trying to parse an XML file, you're trying to parse a non-XML file. For that, you need a non-XML parser, and to write such a parser you need to start with a specification of the language you are trying to parse, and you'll need to agree the specification of this language with the other partners to the data interchange.
How much work you could all save by conforming to standards!
Treat broken XML arriving in your shop the way you would treat any other broken goods coming from a supplier: return it to sender marked "unfit for purpose".
The problem here, as you mention, is that the parser finds the & and then expects a terminating ;.
This is fixed by escaping the character, so that the parser finds &amp; instead.
Take a look here for further reference
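A minimal sketch of that escaping step, applied to the text before StAX ever sees it. BareAmpersandFixer, escapeBareAmpersands and textContent are hypothetical names. The regular expression replaces an & only when it does not already start a well-formed reference (&name;, &#digits; or &#xhex;), so existing &amp; and numeric references are left alone. It is a heuristic: it would also rewrite an & inside CDATA sections or comments, where a bare & is legal, and it cannot repair other kinds of malformed input.

```java
import java.io.StringReader;
import java.util.regex.Pattern;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical helper: escapes stray '&' characters so that StAX accepts the document. */
final class BareAmpersandFixer {

    // An '&' that is NOT followed by "name;", "#digits;" or "#xhex;"
    private static final Pattern BARE_AMPERSAND =
            Pattern.compile("&(?!(?:[A-Za-z_][A-Za-z0-9._-]*|#[0-9]+|#x[0-9A-Fa-f]+);)");

    private BareAmpersandFixer() {
    }

    static String escapeBareAmpersands(String xml) {
        return BARE_AMPERSAND.matcher(xml).replaceAll("&amp;");
    }

    /** Demo: parses the repaired document with StAX and returns all of its text content. */
    static String textContent(String xml) {
        XMLInputFactory factory = XMLInputFactory.newFactory();
        factory.setProperty(XMLInputFactory.IS_COALESCING, true); // merge adjacent character data
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(escapeBareAmpersands(xml)));
            try {
                StringBuilder text = new StringBuilder();
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.CHARACTERS) {
                        text.append(reader.getText());
                    }
                }
                return text.toString();
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Input is still not well-formed XML", e);
        }
    }
}
```

In the question's getVildData, this would mean decoding the bytes yourself, for example new String(Files.readAllBytes(file.toPath()), StandardCharsets.ISO_8859_1), escaping the result, and passing a StringReader over it to factory.createXMLStreamReader instead of the ByteArrayInputStream.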
I have the following code to convert an object to Json:
public static Function<Object, Object> WRITE_JSON = (Object val) -> {
try {
return new ObjectMapper().writeValueAsString(val);
} catch (IOException e) {
// log exception
return "";
}
};
This works fine in most cases, but, for example, I have an Avro class named AvroData and a class that holds it:
class SomeData {
private AvroData avroData;
// more fields, getter/setter boilerplate, etc...
}
When I try to serialise the object to JSON, it fails when it tries to serialise the Avro field.
In reality, I have a bit more data, like Sets and Maps that contain Avro record values, but I think the point stands.
How do you serialise an Avro object to JSON, specifically when it is part of a non-Avro object?
To convert your Object val to JSON with Jackson:
ObjectWriter ow = new ObjectMapper().writer().withDefaultPrettyPrinter();
String json = ow.writeValueAsString(val);
Hi, I'm trying to get all the 'id' values from my JSON into my 'results' array.
I don't really understand how libgdx's Json class works, but I know how JSON itself works.
Here is the json : http://pastebin.com/qu71EnMx
Here is my code :
Array<Integer> results = new Array<Integer>();
Json jsonObject = new Json(OutputType.json);
JsonReader jsonReader = new JsonReader();
JsonValue jv = null;
JsonValue jv_array = null;
//
try {
String str = jsonObject.toJson(jsonString);
jv = jsonReader.parse(str);
} catch (SerializationException e) {
//show error
}
//
try {
jv_array = jv.get("table");
} catch (SerializationException e) {
//show error
}
//
for (int i = 0; i < jv_array.size; i++) {
//
try {
jv_array.get(i).get("name").asString();
results.add(new sic_PlayerInfos(
jv_array.get(i).get("id").asInt()
));
} catch (SerializationException e) {
//show error
}
}
Here is the error I get: a NullPointerException on jv_array.size
Doing it this way results in very hacky, unmaintainable code. Your JSON file looks very simple, but your code becomes terrible if you parse the whole JSON file by hand. Just imagine what it will look like once you have more than an id, which is probably going to happen.
The much cleaner way is object-oriented: create an object structure that resembles the structure of your JSON file. In your case, this might look like the following:
public class Data {
public Array<TableEntry> table;
}
public class TableEntry {
public int id;
}
Now you can easily deserialize the JSON with libgdx without any custom serializers, because libgdx uses reflection to handle most standard cases.
Json json = new Json();
json.setTypeName(null);
json.setUsePrototypes(false);
json.setIgnoreUnknownFields(true);
json.setOutputType(OutputType.json);
// I'm using your file as a String here, but you can supply the file as well
Data data = json.fromJson(Data.class, "{\"table\": [{\"id\": 1},{\"id\": 2},{\"id\": 3},{\"id\": 4}]}");
Now you've got a plain old java object (POJO) which contains all the information you need and you can process it however you want.
Array<Integer> results = new Array<Integer>();
for (TableEntry entry : data.table) {
results.add(entry.id);
}
Done. Very clean code and easily extendable.