I have a CSV file:
id,name,description,price,date,name,address
1,SuperCsv,Write csv file,1234.56,28/03/2016,amar,jp nagar
I want to read it and store it in a JSON file.
I have created two beans: Course (id, name, description, price, date) and Person (name, address).
When reading with the bean reader, I am not able to set the person's address on the same Person object as the name.
The (beautified) output is
Course [id=1,
name=SuperCsv,
description=Write csv file,
price=1234.56,
date=Mon Mar 28 00:00:00 IST 2016,
person=[
Person [name=amar, address=null],
Person [name=null, address=jpnagar]
]
]
I want the address to be set on the same Person as the name.
My code:
public static void readCsv(String csvFileName) throws IOException {
ICsvBeanReader beanReader = null;
try {
beanReader = new CsvBeanReader(new FileReader(csvFileName), CsvPreference.STANDARD_PREFERENCE);
// the header elements are used to map the values to the bean (names must match)
final String[] header = beanReader.getHeader(true);
final CellProcessor[] processors = getProcessors();
final String[] fieldMapping = new String[header.length];
for (int i = 0; i < header.length; i++) {
if (i < 5) {
// normal mappings
fieldMapping[i] = header[i];
} else {
// attribute mappings
fieldMapping[i] = "addAttribute";
}}
ObjectMapper mapper=new ObjectMapper();
Course course;
List<Course> courseList=new ArrayList<Course>();
while ((course = beanReader.read(Course.class, fieldMapping, processors)) != null) {
// process course
System.out.println(course);
courseList.add(course);
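}
} finally {
// always release the underlying file handle
if (beanReader != null) {
beanReader.close();
}
}
}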
private static CellProcessor[] getProcessors(){
final CellProcessor parsePerson = new CellProcessorAdaptor() {
public Object execute(Object value, CsvContext context) {
return new Person((String) value,null);
}
};
final CellProcessor parsePersonAddress = new CellProcessorAdaptor() {
public Object execute(Object value, CsvContext context) {
return new Person(null,(String) value);
}
};
return new CellProcessor[] {
new ParseInt(),
new NotNull(),
new Optional(),
new ParseDouble(),
new ParseDate("dd/MM/yyyy"),
new Optional(parsePerson),
new Optional(parsePersonAddress)
};
}
SuperCSV is the first parser I have seen that lets you create an object within an object.
For what you want, you could also try Apache Commons CSV or openCSV (CsvToBean) to do the mapping, but for that the outer class needs to expose the setters of the inner class (setName, setAddress) so that CsvToBean can pick them up. That may or may not work.
What I normally recommend is a plain POJO that has all the fields in the CSV, i.e. a data transfer object. Let the parser create that, then use a utility/builder class to convert the plain POJO into the nested POJO you want.
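The data-transfer-object approach could be sketched roughly as follows. This is only an illustration under stated assumptions: CourseRow, CourseBean, PersonBean and CourseAssembler are hypothetical names, and parseRow is a naive stand-in for whatever CSV parser fills the flat object (it does not handle quoted commas). Because the CSV header contains "name" twice, the flat object uses distinct field names for the two columns.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

// Hypothetical flat DTO: one field per CSV column, in column order.
class CourseRow {
    int id;
    String courseName;
    String description;
    double price;
    LocalDate date;
    String personName;
    String personAddress;
}

// Hypothetical nested target beans.
class PersonBean {
    final String name;
    final String address;

    PersonBean(String name, String address) {
        this.name = name;
        this.address = address;
    }
}

class CourseBean {
    final int id;
    final String name;
    final String description;
    final double price;
    final LocalDate date;
    final List<PersonBean> persons = new ArrayList<>();

    CourseBean(int id, String name, String description, double price, LocalDate date) {
        this.id = id;
        this.name = name;
        this.description = description;
        this.price = price;
        this.date = date;
    }
}

class CourseAssembler {
    private static final DateTimeFormatter DATE_FORMAT = DateTimeFormatter.ofPattern("dd/MM/yyyy");

    // Stand-in for the CSV parser: a naive split that ignores quoting rules.
    static CourseRow parseRow(String line) {
        String[] cells = line.split(",", -1);
        CourseRow row = new CourseRow();
        row.id = Integer.parseInt(cells[0]);
        row.courseName = cells[1];
        row.description = cells[2];
        row.price = Double.parseDouble(cells[3]);
        row.date = LocalDate.parse(cells[4], DATE_FORMAT);
        row.personName = cells[5];
        row.personAddress = cells[6];
        return row;
    }

    // Builds the nested bean; name and address end up on the same person.
    static CourseBean toCourse(CourseRow row) {
        CourseBean course = new CourseBean(row.id, row.courseName, row.description, row.price, row.date);
        course.persons.add(new PersonBean(row.personName, row.personAddress));
        return course;
    }

    public static void main(String[] args) {
        CourseRow row = parseRow("1,SuperCsv,Write csv file,1234.56,28/03/2016,amar,jp nagar");
        CourseBean course = toCourse(row);
        PersonBean person = course.persons.get(0);
        System.out.println(person.name + " / " + person.address); // prints "amar / jp nagar"
    }
}
```

The point of the split is that the parser only ever sees a flat object, so the two person columns no longer need to be mapped through the same setter.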
I am trying to read the events from a large JSON file one by one using the Jackson JsonParser. I would like to store each event temporarily in an object, such as a JsonObject or any other object, which I can later use for further processing.
I was previously reading the JSON events one by one and storing them in my own custom context (see my old post on the Jackson JsonParser context), which works fine. However, rather than a context, I would like to store them one by one in a JsonObject or some other object.
Following is my sample JSON file:
{
"@context":"https://context.org/context.jsonld",
"isA":"SchoolManagement",
"format":"application/ld+json",
"schemaVersion":"2.0",
"creationDate":"2021-04-21T10:10:09+00:00",
"body":{
"members":[
{
"isA":"student",
"name":"ABCS",
"class":10,
"coaching":[
"XSJSJ",
"IIIRIRI"
],
"dob":"1995-04-21T10:10:09+00:00"
},
{
"isA":"teacher",
"name":"ABCS",
"department":"computer science",
"school":{
"name":"ABCD School"
},
"dob":"1995-04-21T10:10:09+00:00"
},
{
"isA":"boardMember",
"name":"ABCS",
"board":"schoolboard",
"dob":"1995-04-21T10:10:09+00:00"
}
]
}
}
At any one time I would like to store only one member, such as a student or a teacher, in my JsonObject.
What's the best way to store each event in an object that I can later use for some processing, and then clear that object and reuse it for the next event?
Following is the code I have so far:
public class Main {
private JSONObject eventInfo;
private final String[] eventTypes = new String[] { "student", "teacher", "boardMember" };
public static void main(String[] args) throws JsonParseException, JsonMappingException, IOException, JAXBException, URISyntaxException {
// Get the JSON Factory and parser Object
JsonFactory jsonFactory = new JsonFactory();
JsonParser jsonParser = jsonFactory.createParser(new File(Main.class.getClassLoader().getResource("inputJson.json").toURI()));
JsonToken current = jsonParser.nextToken();
// Check the first element is Object
if (current != JsonToken.START_OBJECT) {
throw new IllegalStateException("Expected content to be an object");
}
// Loop until the start of the EPCIS EventList array
while (jsonParser.nextToken() != JsonToken.START_ARRAY) {
System.out.println(jsonParser.getCurrentToken() + " --- " + jsonParser.getCurrentName());
}
// Goto the next token
jsonParser.nextToken();
// Call the method to loop until the end of the events file
eventTraverser(jsonParser);
}
// Method which will traverse through the eventList and read event one-by-one
private static void eventTraverser(JsonParser jsonParser) throws IOException {
// Loop until the end of the EPCIS events file
while (jsonParser.nextToken() != JsonToken.END_OBJECT) {
// Is it possible to store the complete object directly in a JSON object, or do I need to go through every token again to see whether it is an array and handle it accordingly, as mentioned in my previous post?
}
}
}
After trying a few things I was able to get it working. I am posting the whole code since it may be useful to someone in the future; I know how frustrating it is to look for a proper working code sample:
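public class Main
{
// Collects the XML generated for each event (filled in xmlCreator)
private final List<String> eventsList = new ArrayList<> ();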
public void xmlConverter (InputStream jsonStream) throws IOException,JAXBException, XMLStreamException
{
// jsonStream is the input JSON, normally obtained by reading the JSON file
// Get the JSON Factory and parser Object
final JsonFactory jsonFactory = new JsonFactory ();
final JsonParser jsonParser = jsonFactory.createParser (jsonStream);
final ObjectMapper objectMapper = new ObjectMapper ();
// Keep the values of duplicate keys, if the JSON contains any
final SimpleModule module = new SimpleModule ();
module.addDeserializer (JsonNode.class, new JsonNodeDupeFieldHandlingDeserializer ());
objectMapper.registerModule (module);
jsonParser.setCodec (objectMapper);
// Check the first element is Object if not then invalid JSON throw error
if (jsonParser.nextToken () != JsonToken.START_OBJECT)
{
throw new IllegalStateException ("Expected content to be an object");
}
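while (!"members".equals (jsonParser.getText ()))
{
// Skip the elements until the "members" key is reached;
// any processing of the skipped elements could be done here.
// Advance the parser, otherwise this loop would never terminate.
if (jsonParser.nextToken () == null)
{
throw new IllegalStateException ("Reached end of input without finding \"members\"");
}
}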
// Goto the next token
jsonParser.nextToken ();
while (jsonParser.nextToken () != JsonToken.END_ARRAY)
{
final JsonNode jsonNode = jsonParser.readValueAsTree ();
//Check if the JsonNode is valid if not then exit the process
if (jsonNode == null || jsonNode.isNull ())
{
System.out.println ("End Of File");
break;
}
// Get the eventType
final String eventType = jsonNode.get ("isA").asText ();
// Based on eventType call different type of class
switch (eventType)
{
case "student":
final Student studentInfo =
objectMapper.treeToValue (jsonNode, Student.class);
//I was calling the JAXB Method as I was doing the JSON to XML Conversion
xmlCreator (studentInfo, Student.class);
break;
case "teacher":
final Teacher teacherInfo =
objectMapper.treeToValue (jsonNode, Teacher.class);
xmlCreator (teacherInfo, Teacher.class);
break;
}
}
}
//Method to create the XML using the JAXB
private void xmlCreator (Object eventInfo,
Class eventType) throws JAXBException
{
final StringWriter sw = new StringWriter ();
// Create JAXB Context object
JAXBContext context = JAXBContext.newInstance (eventType);
// Create Marshaller object from JAXBContext
Marshaller marshaller = context.createMarshaller ();
// Print formatted XML
marshaller.setProperty (Marshaller.JAXB_FORMATTED_OUTPUT, Boolean.TRUE);
// Do not add the <xml> version tag
marshaller.setProperty (Marshaller.JAXB_FRAGMENT, Boolean.TRUE);
// XmlSupportExtension is an interface that every class such as Student Teacher implements
// xmlSupport is a method in XmlSupportExtension which has been implemented in all classes
// Create the XML based on type of incoming event type and store in SW
marshaller.marshal (((XmlSupportExtension) eventInfo).xmlSupport (),
sw);
// Add each event within the List
eventsList.add (sw.toString ());
// Clear the StringWriter for the next event
sw.getBuffer ().setLength (0);
}
}
This is the class that overrides the Jackson class.
It can be used if your JSON has duplicate keys. See the following post for the complete explanation if you need it. If you don't need this, skip this part and remove the module-related code from the class above:
Jackson @JsonAnySetter ignores values of duplicate key when used with Jackson ObjectMapper treeToValue method
@JsonDeserialize(using = JsonNodeDupeFieldHandlingDeserializer.class)
public class JsonNodeDupeFieldHandlingDeserializer extends JsonNodeDeserializer {
@Override
protected void _handleDuplicateField(JsonParser p, DeserializationContext ctxt, JsonNodeFactory nodeFactory, String fieldName,
ObjectNode objectNode, JsonNode oldValue, JsonNode newValue) {
ArrayNode asArrayValue = null;
if (oldValue.isArray()) {
asArrayValue = (ArrayNode) oldValue;
} else {
asArrayValue = nodeFactory.arrayNode();
asArrayValue.add(oldValue);
}
asArrayValue.add(newValue);
objectNode.set(fieldName, asArrayValue);
}
}
I have an Excel workbook in a standard format that contains multiple sheets. I need to read the sheets in Java, convert them to objects, and store them in the database. I am currently using the Simple Factory pattern for this: I read the sheet name and invoke the corresponding converter object based on it. Is this the only way to achieve this, or is there a better way? Below is the sample code.
FileInputStream fis = new FileInputStream(new File("test.xls"));
HSSFWorkbook hssfWorkbook = new HSSFWorkbook(fis);
int numberOfSheets = hssfWorkbook.getNumberOfSheets();
for (int i=0; i<numberOfSheets; i++) {
HSSFSheet sh = hssfWorkbook.getSheetAt(i);
String sname = sh.getSheetName();
switch (sname) {
case "Test": // go to the Test converter
break;
case "Test1": // go to the Test1 converter
break;
}
}
I'd go with the strategy design pattern.
You could have an ISheetConverter interface and multiple implementations, each defining its own conversion logic.
It'd look something like:
@FunctionalInterface
interface ISheetConverter {
POJOForSheet convert(SheetStream sheetFromIO);
}
Also, your existing factory class can return the appropriate ISheetConverter implementation based on sh.getSheetName(), and that implementation then does the conversion.
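A minimal Java sketch of that idea, under some assumptions: the names SheetConverter, SheetConverterRegistry and SheetConverterDemo are hypothetical, and a plain List<String> of rows stands in for the POI sheet so that the example runs without any library. The converters here only return a description of what they received; real ones would build objects and store them.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical strategy interface: one implementation per sheet layout.
@FunctionalInterface
interface SheetConverter {
    String convert(List<String> rows);
}

// Plays the role of the factory: looks up the converter by sheet name.
class SheetConverterRegistry {
    private final Map<String, SheetConverter> converters = new HashMap<>();

    void register(String sheetName, SheetConverter converter) {
        converters.put(sheetName, converter);
    }

    SheetConverter forSheet(String sheetName) {
        SheetConverter converter = converters.get(sheetName);
        if (converter == null) {
            throw new IllegalArgumentException("No converter registered for sheet: " + sheetName);
        }
        return converter;
    }
}

class SheetConverterDemo {
    // Registers one converter per known sheet name (names taken from the question).
    static SheetConverterRegistry defaultRegistry() {
        SheetConverterRegistry registry = new SheetConverterRegistry();
        registry.register("Test", rows -> "Test sheet with " + rows.size() + " rows");
        registry.register("Test1", rows -> "Test1 sheet with " + rows.size() + " rows");
        return registry;
    }

    public static void main(String[] args) {
        SheetConverterRegistry registry = defaultRegistry();
        List<String> rows = Arrays.asList("row 1", "row 2");
        // Replaces the switch statement: the lookup selects the strategy.
        System.out.println(registry.forSheet("Test").convert(rows));
    }
}
```

Adding support for a new sheet then means registering one more converter, without touching the loop that walks the workbook.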
The Strategy pattern is a good fit here.
The complete code is below (in C#, but it should explain the implementation style):
class Program
{
private static void Main(string[] args)
{
IStrategyManager manager = new StrategyManager(); // this code is added
FileInputStream fis = new FileInputStream(new File("test.xls"));
HSSFWorkbook hssfWorkbook = new HSSFWorkbook(fis);
int numberOfSheets = hssfWorkbook.getNumberOfSheets();
for (int i = 0; i < numberOfSheets; i++)
{
HSSFSheet sh = hssfWorkbook.getSheetAt(i);
String sname = sh.getSheetName();
manager.Manage(sname); // this code is added
}
}
}
The StrategyManager stores all the converters:
interface IStrategyManager
{
void Manage(string sheetName);
}
class StrategyManager : IStrategyManager
{
private Dictionary<string, IConverter> dictionary = new Dictionary<string, IConverter>();
public StrategyManager()
{
// store converter name & its implementation class name which are one and same
dictionary.Add("Test", new Test());
dictionary.Add("Test2", new Test2());
}
public void Manage(string sheetName)
{
dictionary[sheetName].Convert(sheetName); // magic happens here
}
}
The individual converter implementations follow:
interface IConverter
{
void Convert(string sheetName);
}
class Test : IConverter
{
public void Convert(string sheetName)
{
/* conversion logic for test converter goes here*/
}
}
class Test2 : IConverter
{
public void Convert(string sheetName)
{
/* conversion logic for test2 converter goes here*/
}
}
@user3310115, hope it helps. If anything is unclear, please let me know. Thanks.
I create an RDD in the first part of the application and then convert it to a list using rdd.collect().
But for some reason the list size is 0 in the second part of the application, even though the RDD from which I create the list is not empty. Even rdd.toArray() gives an empty list.
Below is my program.
public class Query5kPids implements Serializable{
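List<String> ListFromS3 = new ArrayList<String>();
// configuration used by run() and run1() to create the Spark context
private final SparkConf conf;
public Query5kPids(SparkConf conf) {
this.conf = conf;
}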
public static void main(String[] args) throws JSONException, IOException, InterruptedException, URISyntaxException {
SparkConf conf = new SparkConf();
conf.setAppName("Spark-Cassandra Integration");
conf.set("spark.cassandra.connection.host", "12.16.193.19");
conf.setMaster("yarn-cluster");
SparkConf conf1 = new SparkConf().setAppName("SparkAutomation").setMaster("yarn-cluster");
Query5kPids app1 = new Query5kPids(conf1);
app1.run1(file);
Query5kPids app = new Query5kPids(conf);
System.out.println("Both RDD has been generated");
app.run();
}
private void run() throws JSONException, IOException, InterruptedException {
JavaSparkContext sc = new JavaSparkContext(conf);
query(sc);
sc.stop();
}
private void run1(File file) throws JSONException, IOException, InterruptedException {
JavaSparkContext sc = new JavaSparkContext(conf);
getData(sc,file);
sc.stop();
}
private void getData(JavaSparkContext sc, File file) {
JavaRDD<String> Data = sc.textFile(file.toString());
System.out.println("RDD Count is " + Data.count());
// here it prints some count value
ListFromS3 = Data.collect();
// ListFromS3 = Data.toArray();
}
private void query(JavaSparkContext sc) {
System.out.println("RDD Count is " + ListFromS3.size());
// Prints 0
// So cant convert the list to RDD
JavaRDD<String> rddFromGz = sc.parallelize(ListFromS3);
}
}
Note: in the actual program, the RDD and the List hold UserSetGet objects:
List<UserSetGet> ListFromS3 = new ArrayList<UserSetGet>();
JavaRDD<UserSetGet> Data = new ....
where UserSetGet is a POJO with setter and getter methods, and it is Serializable.
app1.run1 puts the RDD contents into app1.ListFromS3. Then you look at app.ListFromS3, which is empty. app1.ListFromS3 and app.ListFromS3 are fields on two different objects. Setting one does not set the other.
I think you meant ListFromS3 to be static, meaning it belongs to the Query5kPids class, not to a particular instance. Like this:
static List<String> ListFromS3 = new ArrayList<String>();
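The difference can be reproduced without Spark. The sketch below uses hypothetical names (Holder, FieldScopeDemo) and a plain list in place of the collected RDD; it only illustrates how instance and static fields behave.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for Query5kPids: one instance field, one static field.
class Holder {
    List<String> instanceList = new ArrayList<>();    // one copy per object
    static List<String> sharedList = new ArrayList<>(); // one copy per class

    // Plays the role of getData(): stores the "collected" data in both fields.
    void load(List<String> data) {
        instanceList = data;
        sharedList = data;
    }
}

class FieldScopeDemo {
    public static void main(String[] args) {
        Holder first = new Holder();   // like app1
        Holder second = new Holder();  // like app

        first.load(Arrays.asList("a", "b", "c"));

        System.out.println(first.instanceList.size());  // prints 3
        System.out.println(second.instanceList.size()); // prints 0, a different object
        System.out.println(Holder.sharedList.size());   // prints 3, shared by all instances
    }
}
```

Other options that avoid static state are to call both methods on the same instance, or to return the list from the first step and pass it to the second as a parameter.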
I am creating a Spark job in Java. Here is my code.
I am trying to filter records from a CSV file. Header contains fields OID, COUNTRY_NAME, ......
Instead of just filtering based on s.contains("CANADA"), I would like to be more specific, like I want to filter based on COUNTRY_NAME.equals("CANADA").
Any thoughts on how I can do this?
public static void main(String[] args) {
String gaimFile = "hdfs://xx.yy.zz.com/sandbox/data/acc/mydata";
SparkConf conf = new SparkConf().setAppName("Filter App");
JavaSparkContext sc = new JavaSparkContext(conf);
try{
JavaRDD<String> gaimData = sc.textFile(gaimFile);
JavaRDD<String> canadaOnly = gaimData.filter(new Function<String, Boolean>() {
private static final long serialVersionUID = -4438640257249553509L;
public Boolean call(String s) {
// My file is a CSV with header OID, COUNTRY_NAME, .....
// Here, instead of just calling s.contains,
// I would like to be more specific and check
// whether COUNTRY_NAME.equals("CANADA")
return s.contains("CANADA");
}
});
}
catch(Exception e){
System.out.println("ERROR: G9 MatchUp Failed");
}
finally{
sc.close();
}
}
You will have to map your values into a custom class first, and then filter on the parsed field rather than on the raw line:
rdd.map(line => ConvertToCountry(line))
.filter(country => country.countryName == "CANADA")
class Country{
...ctor that takes an array and fills properties...
...properties for each field from the csv (countryName among them)...
}
def ConvertToCountry(line: String): Country = {
return new Country(line.split(','))
}
The above is a combination of Scala and pseudocode, but you should get the point.
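In Java, the per-line check could look roughly like the sketch below. It is a plain-Java illustration that runs without Spark; CountryFilter and isCountry are hypothetical names, COUNTRY_NAME is assumed to be the second column (index 1) because the header starts with OID, COUNTRY_NAME, and the CITY column in the sample data is made up. The naive split does not handle quoted commas.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class CountryFilter {
    // Assumption: COUNTRY_NAME is the second column of the CSV.
    static final int COUNTRY_NAME_INDEX = 1;

    // True only if the COUNTRY_NAME column equals the given country exactly.
    static boolean isCountry(String line, String country) {
        String[] fields = line.split(",", -1); // -1 keeps trailing empty fields
        return fields.length > COUNTRY_NAME_INDEX
                && fields[COUNTRY_NAME_INDEX].trim().equals(country);
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "OID,COUNTRY_NAME,CITY",   // header, dropped because the column is not "CANADA"
                "1,CANADA,Toronto",
                "2,USA,CANADA Street",     // contains "CANADA" but in the wrong column
                "3,CANADA,Ottawa");

        List<String> canadaOnly = lines.stream()
                .filter(line -> isCountry(line, "CANADA"))
                .collect(Collectors.toList());

        System.out.println(canadaOnly); // prints [1,CANADA,Toronto, 3,CANADA,Ottawa]
    }
}
```

Inside the Spark job from the question, the body of call(String s) could then return CountryFilter.isCountry(s, "CANADA") instead of s.contains("CANADA").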
I am struggling to create a SynonymFilter programmatically. How are you supposed to tell the filter where the synonym list is?
I am using Hibernate Search, but I don't want to use the @AnalyzerDef annotation.
Is passing a synonym map the only option?
private class AllAnalyzer extends Analyzer {
private SynonymFilterFactory synonymFilterFactory = new SynonymFilterFactory();
public AllAnalyzer() {
ClassLoader classLoader = getClass().getClassLoader();
String filePath = classLoader.getResource("synonyms.txt").getFile();
HashMap<String, String> stringStringHashMap = new HashMap<String, String>();
stringStringHashMap.put("synonyms", filePath);
stringStringHashMap.put("format", "solr");
stringStringHashMap.put("ignoreCase", "false");
stringStringHashMap.put("expand", "true");
stringStringHashMap.put("luceneMatchVersion", Version.LUCENE_36.name());
synonymFilterFactory.init(stringStringHashMap);
}
@Override
public TokenStream tokenStream(String fieldName, Reader reader) {
TokenStream result = null;
result = new StandardTokenizer(Version.LUCENE_36, reader);
result = new StandardFilter(Version.LUCENE_36, result);
result = synonymFilterFactory.create(result);
return result;
}
}
I am unable to get it to work. When I debug, it says that the map is null and I get an NPE. What is wrong?
Yes, you need to pass a SynonymMap to the SynonymFilter.
Sounds like you want to populate it from a file, so you'll likely want to use SolrSynonymParser to generate it. Along the lines of:
SolrSynonymParser parser = new SolrSynonymParser(true, false, analyzer);
Reader synonymFileReader = new FileReader(new File(path));
parser.add(synonymFileReader);
SynonymMap map = parser.build(); // SolrSynonymParser extends SynonymMap.Builder