Let's say I have a JSON document that looks like this:
{"body":"abcdef","field":"fgh"}
Now suppose the value of the 'body' element is huge (~100 MB or more). I would like to stream out the value of the body element instead of storing it in a String.
How can I do this? Is there any Java library I could use for this?
This is the line of code that fails with an OutOfMemoryError when a large JSON value comes in:
String inputStreamString = (String) JsonPath.read(textValue.toString(), "$.body");
'textValue' here is a hadoop.io.Text object.
I'm assuming that the OutOfMemory error occurs because we try to do method calls like toString() (which creates a new object), and JsonPath.read(), all of which are done in-memory. I need to know if there is an approach I could take while handling large-sized textValue objects.
Please let me know if you need additional info.
JsonSurfer is good for processing very large JSON data with selective extraction.
Example of surfing through JSON data and collecting matched values in listeners:
BufferedReader reader = new BufferedReader(new FileReader(jsonFile));
JsonSurfer surfer = new JsonSurfer(GsonParser.INSTANCE, GsonProvider.INSTANCE);
SurfingConfiguration config = surfer.configBuilder().bind("$.store.book[*]", new JsonPathListener() {
    @Override
    public void onValue(Object value, ParsingContext context) throws Exception {
        // Called once for every element matched by the JsonPath expression
        JsonObject book = (JsonObject) value;
    }
}).build();
surfer.surf(reader, config);
Jackson offers a streaming API for generating and processing JSON data.
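To make the idea of "streaming out" a single value concrete, here is a dependency-free sketch. It is a hand-rolled scanner, not the JsonSurfer or Jackson API, and the class and method names are made up for illustration. It assumes well-formed JSON whose root is an object, and it copies the string value of one top-level field to a Writer character by character, so the large value is never held in a String. For real workloads a proper streaming parser is the safer choice.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

final class StringFieldStreamer {

    /**
     * Copies the string value of the top-level field 'field' to 'out'.
     * Returns false if no such string field exists.
     * Assumption: well-formed JSON whose root is an object.
     */
    static boolean streamField(Reader in, String field, Writer out) throws IOException {
        int depth = 0;              // nesting level of { and [
        boolean afterColon = false; // true while the next token is a value
        boolean targetNext = false; // true when the last top-level key was 'field'
        int c;
        while ((c = in.read()) != -1) {
            switch (c) {
                case '{': case '[': depth++; afterColon = false; break;
                case '}': case ']': depth--; break;
                case ':': afterColon = true; break;
                case ',': afterColon = false; targetNext = false; break;
                case '"':
                    if (depth == 1 && !afterColon) {
                        // Top-level key: keys are short, so buffering them is fine.
                        StringWriter key = new StringWriter();
                        copyString(in, key);
                        targetNext = key.toString().equals(field);
                    } else if (depth == 1 && targetNext) {
                        copyString(in, out); // the large value goes straight to 'out'
                        return true;
                    } else {
                        copyString(in, null); // some other string: skip it unbuffered
                    }
                    afterColon = false;
                    break;
                default: break; // whitespace, digits, literals
            }
        }
        return false;
    }

    /** Consumes a JSON string up to its closing quote, decoding escapes; 'out' may be null. */
    private static void copyString(Reader in, Writer out) throws IOException {
        int c;
        while ((c = in.read()) != -1 && c != '"') {
            if (c == '\\') {
                c = in.read();
                switch (c) {
                    case 'n': c = '\n'; break;
                    case 't': c = '\t'; break;
                    case 'r': c = '\r'; break;
                    case 'b': c = '\b'; break;
                    case 'f': c = '\f'; break;
                    case 'u':
                        char[] hex = new char[4];
                        for (int i = 0; i < 4; i++) hex[i] = (char) in.read();
                        c = Integer.parseInt(new String(hex), 16);
                        break;
                    default: break; // \" \\ \/ stand for the character itself
                }
            }
            if (out != null) out.write(c);
        }
    }

    public static void main(String[] args) throws IOException {
        Reader in = new BufferedReader(new StringReader("{\"body\":\"abcdef\",\"field\":\"fgh\"}"));
        Writer out = new StringWriter(); // in practice a file or socket writer
        streamField(in, "body", out);
        System.out.println(out); // prints abcdef
    }
}
```

In practice the Reader would wrap the input source directly (a file or network stream) rather than a String, and the Writer would point at the destination of the large value.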
I have a request object that contains a huge amount of data. There is a filter in my code where I need to take out just one element. At the moment I am deserializing the whole object, which seems overkill just to get one element.
This is part of a Zuul filter:
import com.netflix.zuul.context.RequestContext;
RequestContext ctx = RequestContext.getCurrentContext();
HttpServletRequest request = ctx.getRequest();
ObjectMapper mapper = new ObjectMapper();
ServletInputStream stream = null;
try {
stream = request.getInputStream();
GetPageRequest page = mapper.readValue(stream,GetPageRequest.class);
log.info("URL IN BODY "+page.getUrl());
It seems overkill to deserialize an entire object to get one element, but I can't think of a more efficient, optimized way.
At its simplest, the request payload is just a string, so you could read the input as a string and then pull out what you want using a regular expression, indexOf, or whatever suits best.
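A minimal sketch of the regular-expression variant, assuming the payload contains a "url" string field as suggested by page.getUrl() in the question. The class and method names are illustrative. Note the limits: the pattern matches the first "url" key anywhere in the text (including nested objects), and the value is returned with its JSON escapes still in place.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class UrlFieldExtractor {

    // Matches "url" : "<value>", where the value may contain backslash escapes.
    private static final Pattern URL_FIELD =
            Pattern.compile("\"url\"\\s*:\\s*\"((?:[^\"\\\\]|\\\\.)*)\"");

    /** Returns the raw (still escaped) value of the first "url" field, if any. */
    static Optional<String> extractUrl(String json) {
        Matcher m = URL_FIELD.matcher(json);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String body = "{\"id\": 7, \"url\": \"http://example.com/page\", \"data\": \"...\"}";
        System.out.println(extractUrl(body).orElse("<none>")); // prints http://example.com/page
    }
}
```

This avoids building any object graph, but it still needs the whole body as a String, so it helps with CPU and allocation rather than with very large payloads.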
Thanks, everyone. I created this method: it reads the input stream into a string, then creates a JSONObject, which tokenizes the string.
private String getURLFromRequest(ServletInputStream stream) throws IOException, JSONException {
String requestStr = IOUtils.toString(stream, "UTF-8");
JSONObject jsonObj = new JSONObject(requestStr);
return (String) jsonObj.get("url");
}
I have currently started a kind of diary project to teach myself how to code, which I write in Java. The project has a graphical interface which I realized with JavaFX.
I want to write data into a JSON file, which I enter into two text fields and a slider. Such a JSON entry should look like this:
{
"2019-01-13": {
"textfield1": "test1",
"textfield2": "test2",
"Slider": 2
}
}
I have already created a class in which the values can be passed and retrieved by the JSONWriter.
The class looks like this:
public class Entry {
private String date, textfield1, textfield2;
private Integer slider;
public String getDate() {
return date;
}
public void setDate(String date) {
this.date = date;
}
public String getTextfield1() {
return textfield1;
}
public void setTextfield1(String textfield1) {
this.textfield1 = textfield1;
}
public String getTextfield2() {
return textfield2;
}
public void setTextfield2(String textfield2) {
this.textfield2 = textfield2;
}
public Integer getSlider() {
return slider;
}
public void setSlider(Integer slider) {
this.slider = slider;
}
}
The code of the JSONWriter looks like this:
void json() throws IOException {
Gson gson = new GsonBuilder().setPrettyPrinting().create();
JsonWriter writer = new JsonWriter(new FileWriter("test.json",true));
JsonParser parser = new JsonParser();
Object obj = parser.parse(new FileReader("test.json"));
JsonObject jsonObject = (JsonObject) obj;
System.out.println(jsonObject);
writer.beginObject();
writer.name(entry.getDate());
writer.beginObject();
writer.name("textfield1").value(entry.getTextfield1());
writer.name("textfield2").value(entry.getTextfield2());
writer.name("Slider").value(entry.getSlider());
writer.endObject();
writer.endObject();
writer.close();
}
The date is obtained from the date picker. Later I want to filter the data from the JSON file by date and transfer the contained values (textfield1, textfield2, slider) into the corresponding fields.
If possible, I would also like to try to overwrite the objects of a date. This means, if an entry of the date already exists and I want to change something in the entries, it should be replaced in the JSON file, so I can retrieve it later.
If you can recommend a better storage format for this kind of application, I am open to it. But it should also be compatible with databases, since I would like to work with databases later on.
So far I have no idea how to do this because I am still at the beginning of programming. I've been looking for posts that could cover the topic, but I haven't really found anything I understand.
You could start without JsonParser and JsonWriter and use Gson's fromJson(..) and toJson(..), because your current JSON format maps easily to a map of Entry POJOs.
Creating a more complex implementation with JsonParser and JsonWriter might be more efficient for large amounts of data, but by that point you should already have studied how to persist to a database anyway.
POJOs are easy to manipulate, and they can later be persisted to a database easily, for example with a technology like JPA and only a few annotations.
See below simple example:
@Test
public void test() throws IOException {
Gson gson = new GsonBuilder().setPrettyPrinting().create();
// Your current Json seems to be a map with date string as a key
// Create a corresponding type for gson to deserialize to
// correct generic types
Type type = new TypeToken<Map<String, Entry>>() {}.getType();
// Check this file name for your environment
String fileName = "src/test/java/org/example/diary/test.json";
Reader reader = new FileReader(new File(fileName));
// Read the whole diary to memory as java objects
Map<String, Entry> diary = gson.fromJson(reader, type);
// Modify one field
diary.get("2019-01-13").setTextfield1("modified field");
// Add a new date entry
Entry e = new Entry();
e.setDate("2019-01-14");
e.setScale(3);
e.setTextfield1("Dear Diary");
e.setTextfield2("I met a ...");
diary.put(e.getDate(), e);
// Store the new diary contents. Note that this does not overwrite the
// original file but appends ".out.json" to the file name to preserve the original
FileWriter fw = new FileWriter(new File(fileName + ".out.json"));
gson.toJson(diary, fw);
fw.close();
}
This should produce a test.json.out.json like:
{
"2019-01-13": {
"textfield1": "modified field",
"textfield2": "test2",
"Slider": 2
},
"2019-01-14": {
"date": "2019-01-14",
"textfield1": "Dear Diary",
"textfield2": "I met a ...",
"Slider": 3
}
}
Note that I also made a small assumption here:
// Just in case you meant to map "Slider" in Json as "scale"
@SerializedName("Slider")
private Integer scale;
I will give you some general tips; it is up to you to go deeper.
First of all, I recommend an architecture that is common in web applications and even desktop apps, where the front-end layer is separated from the back-end server:
Front-end (use JavaFX if you want). Tutorial: http://www.mastertheboss.com/jboss-frameworks/resteasy/rest-services-using-javafx-tutorial
Back-end (Java 1.8, Spring Boot, MySQL database). Example: there are tons of examples and tutorials using this stack; I recommend the Mkyong or Baeldung blogs.
The front-end communicates with the server via HTTP requests to the back-end REST API, using JSON or XML as the message format. In real life the two are physically separated, but here you can just create two different Java projects running on different ports.
For the back-end, just follow a tutorial to get a REST API server up and running. Set up the MVC pattern: controller layer, service layer, repository layer, model layer, DTO layer, etc. For your specific model I recommend the following:
selected_date: Date
inputs: Map of strings
size: Integer
In the front-end project with JavaFX, just reuse the code you have already written and add some CSS if you want. Use the components' actions to call the back-end REST API to create, retrieve, update and delete your data from the date picker, or whatever operation you want to do.
You will be converting Java objects to JSON strings all the time, so I recommend the Gson or Jackson library, which do this directly without the need to build the JsonObject manually. If you still want to write the JSON to a file, convert the Java object to a string (the object written in JSON format) using one of those libraries, and then write the string to the file. But I strongly believe it will be more practical to implement a database.
Hope it helps
I am trying to use the Jackson streaming API to deserialize huge objects from XML. The idea is to combine the streaming API and ObjectMapper to parse XML (or JSON) in small chunks. However, I see some inconsistent behavior with the XML parser.
With this code snippet:
try {
String xml1 = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><foo></foo>";
String xml2 = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><foo><bar></bar></foo>";
XmlFactory xmlFactory = new XmlFactory();
JsonParser jp = xmlFactory.createParser(new ByteArrayInputStream(xml1.getBytes()));
JsonToken token = jp.nextToken();
while (token != null) {
System.out.println("xml1 token=" + token);
token = jp.nextToken();
}
jp = xmlFactory.createParser(new ByteArrayInputStream(xml2.getBytes()));
token = jp.nextToken();
while (token != null) {
System.out.println("xml2 token=" + token);
token = jp.nextToken();
}
} catch (IOException e) {
e.printStackTrace();
}
I am getting:
xml1 token=START_OBJECT
xml1 token=END_OBJECT
xml2 token=START_OBJECT
xml2 token=FIELD_NAME
xml2 token=VALUE_NULL
xml2 token=END_OBJECT
Why is the FIELD_NAME token missing for xml1? Why is there just one START_OBJECT token for the second xml? Is there any setting that would allow me to see FIELD_NAME of outer tag?
Problem is quite simple: XML module is different from most other Jackson dataformat modules in that direct access via Streaming API is not supported.
This is mentioned on project README (along with mention that "tree model" is similarly not supported).
Not supported does not necessarily mean "can not be used at all", just that its behavior is different from handling for JSON so callers really need to know what they are doing above and beyond API used for JSON content (and Smile, CBOR, YAML -- even CSV content is represented in a way that is compatible with JSON access).
While you can try to use XmlFactory and the streaming parser/generator, its behavior is controlled by XmlMapper based on metadata from Java classes, to make things work correctly via the databinding API (that is, XmlMapper).
With that, the reason for observed tokens is that such translation is necessary to map to expected Java object structure:
public class Foo {
public Bar bar;
}
which would map to JSON like:
{
"bar" : null
}
as well as XML of
<foo>
<bar></bar>
</foo>
Another way to put this is that the XML and JSON data models are fundamentally different, and they cannot be trivially translated. Since Jackson's token model is based on JSON, some work is needed to translate XML elements and attributes into the structure that equivalent JSON would have.
Above is not to say that what you try to do is impossible. There are 2 ways you might be able to make things work:
1. Knowing the translation that XmlParser does, call nextToken() and expect the translated tokens.
2. Instead of using XmlParser directly, construct an XMLStreamReader (the low-level StAX streaming parser), read "raw" tokens, construct a separate XmlParser (via XmlFactory) at the expected location, and use that for reading.
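The "raw" token reading in the second option can be sketched with the StAX API that ships with the JDK. This only shows the first half of that option (walking elements with XMLStreamReader); handing the reader over to Jackson at a chosen element is not shown. The class and method names are illustrative.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class RawXmlEvents {

    /** Lists the start and end element events of the document in reading order. */
    static List<String> events(String xml) throws XMLStreamException {
        List<String> out = new ArrayList<>();
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    out.add("START " + reader.getLocalName());
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    out.add("END " + reader.getLocalName());
                }
            }
        } finally {
            reader.close();
        }
        return out;
    }

    public static void main(String[] args) throws XMLStreamException {
        // Unlike the translated Jackson tokens, the outer element name is visible here.
        System.out.println(events("<foo></foo>"));
        System.out.println(events("<foo><bar></bar></foo>"));
    }
}
```

At this level the root element appears as an ordinary start event, so the code can decide for itself where a chunk begins before passing control to a databinding layer.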
I hope this helps.
A kid with a hammer...
I don't know much about Jackson; in fact, I just started using it, thinking of using JSON or YAML instead of XML. But for XML, we have been using XStream with success.
//Consumer side
FileInputStream fis = new FileInputStream(filename);
XStream xs = new XStream();
Object obj = xs.fromXML(fis);
fis.close();
Also, if the case is that you are also originating the serialization and it is from Java, you could use Java serialization altogether for a lower footprint and faster operation.
//producer side
FileOutputStream fos = new FileOutputStream(filename);
ObjectOutputStream oos = new ObjectOutputStream(new BufferedOutputStream(fos));
oos.writeObject(yourVeryComplexObjectStructure); //I am writing a list of ten 1MB objects
oos.flush();
oos.close();
fos.close();
//Consumer side
final FileInputStream fin = new FileInputStream(filename);
final ObjectInputStream ois = new ObjectInputStream(new BufferedInputStream(fin));
@SuppressWarnings("unchecked")
final YourVeryComplexObjectStructureType object = (YourVeryComplexObjectStructureType) ois.readObject();
ois.close();
fin.close();
I have been trying to create a JSON string from a large number of documents using the code below, but I either get an out-of-range error or have to wait up to 5 minutes before the string is created. Any idea how I could optimise the code?
public String getJson() throws NotesException {
...
View view1 = ...;
ViewNavigator nav =view1.createViewNav();
ViewEntry ve = nav.getFirst();
JSONObject jsonMain = new JSONObject();
JSONArray items = new JSONArray();
Document docRoot = null;
while (ve != null) {
docRoot= ve.getDocument();
items.add(getJsonDocAndChildren(docRoot));
ViewEntry veTemp = nav.getNextSibling(ve);
ve.recycle();
ve = veTemp;
}
jsonMain.put("identifier", "name");
jsonMain.put("label", "name");
jsonMain.put("items", items);
return jsonMain.toJSONString();
}
private JSONObject getJsonDocAndChildren(Document doc) throws NotesException {
String name = doc.getItemValueString("Name");
JSONObject jsonDoc = new JSONObject();
jsonDoc.put("name", name);
jsonDoc.put("field", doc.getItemValueString("field"));
DocumentCollection responses = doc.getResponses();
JSONArray children = new JSONArray();
getDocEntry(name, children); // adds every document whose field has the same value as name to children
if (responses.getCount() > 0) {
Document docResponse = responses.getFirstDocument();
while (docResponse != null) {
children.add(getJsonDocAndChildren(docResponse));
Document docTemp = responses.getNextDocument(docResponse);
docResponse.recycle();
docResponse = docTemp;
}
}
jsonDoc.put("children", children);
return jsonDoc;
}
There are a few things here, ranging from general efficiency to optimizations based on how you want to use the code.
The big one that would likely speed up your processing would be to do view operations only, without cracking open the documents. Since it looks like you want to get responses indiscriminately, you could add the response documents to the original view, with the "Show responses in hierarchy" option turned on. Then, if you have columns for Name and field in the view (and no "Show responses only" columns), a nav.getNext() walk down the view will get them in turn. By storing the entry.getIndentLevel() value of the previous entry and comparing it at the start of the loop, you could "step" up and down the JSON tree: when the indent level increases by one, create a new array and add it to the existing object; when it decreases, step up one. It may be a little conceptually awkward at first, having to track previous states in a flat loop, but it'd be much more efficient.
Another option, also having the benefit of not having to crack open each individual document, would be to have a view of the response documents categorized by @Text($REF) and then make your recursive method look more like:
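The indent-level walk can be sketched independently of the Domino API. In the sketch below, FlatEntry is a hypothetical stand-in for a view entry that carries only its indent level and one column value, and the sketch assumes top-level entries have indent level 0 and that the list is in view order (each parent directly followed by its responses). A stack of ancestors replaces the "previous indent level" bookkeeping.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

final class IndentTreeBuilder {

    /** Hypothetical stand-in for a view entry: indent level plus one column value. */
    static final class FlatEntry {
        final int indentLevel;
        final String name;
        FlatEntry(int indentLevel, String name) {
            this.indentLevel = indentLevel;
            this.name = name;
        }
    }

    static final class Node {
        final String name;
        final List<Node> children = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    /** Builds the tree in a single flat pass over entries given in view order. */
    static List<Node> build(List<FlatEntry> entries) {
        List<Node> roots = new ArrayList<>();
        Deque<Node> path = new ArrayDeque<>(); // ancestors of the current entry, deepest on top
        for (FlatEntry e : entries) {
            // Step up until the number of open ancestors equals this entry's indent level.
            while (path.size() > e.indentLevel) {
                path.pop();
            }
            Node node = new Node(e.name);
            if (path.isEmpty()) {
                roots.add(node);
            } else {
                path.peek().children.add(node);
            }
            path.push(node);
        }
        return roots;
    }

    public static void main(String[] args) {
        List<Node> roots = build(Arrays.asList(
                new FlatEntry(0, "A"), new FlatEntry(1, "A1"), new FlatEntry(2, "A1a"),
                new FlatEntry(1, "A2"), new FlatEntry(0, "B")));
        System.out.println(roots.size());                 // prints 2
        System.out.println(roots.get(0).children.size()); // prints 2
    }
}
```

With a real ViewNavigator, the loop body would read the column values and indent level from each entry instead of from FlatEntry, and the Node objects could be replaced by JSON objects or direct writer calls.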
public static void walkTree(final View treeView, final String documentId) {
ViewNavigator nav = treeView.createViewNavFromCategory(documentId);
nav.setBufferMaxEntries(400);
for (ViewEntry entry : nav) {
// Do code here
walkTree(treeView, entry.getUniversalID());
}
}
(That example is using the OpenNTF Domino API, but, if you're not using that, you could down-convert the for loop to the legacy style)
As a minor improvement any time you traverse through ViewNavigators, you can set view.setAutoUpdate(false) and then nav.setBufferMaxEntries(400) to improve the internal caching.
And finally, depending on your needs - say, if you're outputting the JSON directly to an HTTP response's output stream - you could use JsonWriter instead of JsonObject to stream the content out instead of building a huge object in memory. I wrote about it with some simple code here: https://frostillic.us/blog/posts/EF0B875453B3CFC285257D570072F78F
You should first determine where the time is spent in your code. Maybe it is in doc.getResponses() or responses.getNextDocument() which you did not show here.
The obvious optimization which could be done within your code snippet is the following:
Basically you have some data structure called Document and build up a corresponding in memory JSON structure consisting of JSONObjects and JSONArrays. This JSON structure is then serialized to a String and returned.
Instead of building the JSON structure you could directly use a JsonWriter (don't know what JSON library you are using but there must be something like a JsonWriter). This avoids the memory allocations for the temporary JSON structure.
In getJson() you start:
StringWriter stringOut = new StringWriter();
JsonWriter out = new JsonWriter(stringOut);
and end
return stringOut.toString();
Now, everywhere you are creating JSONObjects or JSONArrays, you invoke the corresponding writer methods instead, e.g.:
private void getJsonDocAndChildren(Document doc, JsonWriter out) throws NotesException {
out.name("name");
out.value(doc.getItemValueString("Name"));
out.name("field");
out.value(doc.getItemValueString("field"));
DocumentCollection responses = doc.getResponses();
if (responses.getCount() > 0) {
Document docResponse = responses.getFirstDocument();
out.startArray();
...
Hope you get the idea.
I'm working on an open-source, cross-platform pomodoro timer with statistics support.
For tasks, I have a tree data structure like this:
class Task {
String name;
int minutesWorkedOn;
int uniqueID;
Task parent;
...
ArrayList<Task> childTasks; //Note, not binary, but can have n-children.
}
(which is actually a bit bigger in practice)
I want to store this data structure in a file between sessions.
I was considering JSON or XML, recursing for childTasks, or writing all tasks out, one task per line, and piecing things back together by task IDs. But JSON/XML is not a hard requirement; I'm just thinking out loud.
Some Stack Overflow answers mention serialization, but I'd prefer to be able to see the stored data structure, as is the case with JSON or XML. Those two formats would also make it easier to build reporting tools.
Considering I'm new to Java and haven't worked with file I/O before, can someone give me a tip or advice on which route to take here?
[edit]
The solution below works well. There is an issue with loops, though. I edited the code above: a task has a backwards link to its parent. This causes Gson to crash. I might ignore this field and fix it up again after the data has been loaded, or maybe read some more of the tutorial.
The simplest way is to use Gson to write/read the object to a file.
Write:
//Get the json serialization of the task object
GsonBuilder builder = new GsonBuilder();
//builder.setPrettyPrinting().serializeNulls(); //optional
Gson gson = builder.create();
String json = gson.toJson(task);
try {
//write json string to a file named "/tmp/task.json"
FileWriter writer = new FileWriter("/tmp/task.json");
writer.write(json);
writer.close();
} catch (IOException e) {e.printStackTrace();}
Read:
Gson gson = new Gson();
try {
BufferedReader br = new BufferedReader(new FileReader("/tmp/task.json"));
//convert the json string from file back to object
Task task = gson.fromJson(br, Task.class);
} catch (IOException e) {
e.printStackTrace();
}
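Regarding the parent back-link mentioned in the question's edit: Gson's default configuration skips transient fields, so marking the parent reference transient keeps it out of the JSON and breaks the cycle. The links can then be rebuilt after loading. A minimal sketch, where TaskNode is a cut-down, hypothetical version of the Task class and restoreParents is an illustrative helper, not part of Gson:

```java
import java.util.ArrayList;
import java.util.List;

final class TaskNode {
    String name;
    // transient: skipped by Gson's default configuration, so no parent/child cycle is written
    transient TaskNode parent;
    List<TaskNode> childTasks = new ArrayList<>();

    TaskNode(String name) {
        this.name = name;
    }

    /** Re-links every descendant to its parent; call once on the root after deserialization. */
    static void restoreParents(TaskNode node) {
        for (TaskNode child : node.childTasks) {
            child.parent = node;
            restoreParents(child);
        }
    }

    public static void main(String[] args) {
        // Simulates a freshly loaded tree in which the parent fields are still null.
        TaskNode root = new TaskNode("project");
        TaskNode child = new TaskNode("write docs");
        root.childTasks.add(child);
        restoreParents(root);
        System.out.println(child.parent.name); // prints project
    }
}
```

After gson.fromJson(...) returns the root task, a single restoreParents(root) call would bring the in-memory structure back to its original shape.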