I've got a method which returns a Map built from an XML file. I've split that map into separate Lists of keys and values.
However, I'm noticing newline characters in the values list. How can I strip out the newlines and either replace them with a space or leave the value blank?
Code:
@Test
public void testGetXMLModelData() throws Exception {
File f = new File("xmlDir/example.xml");
Model m = getXMLModelData(f);
logger.debug("Models Keys: "+m.getInputs());
logger.debug("Models Values: "+m.getValues());
}
public Model getXMLModelData(File f) throws Exception {
Model model = new Model();
Map<String,String> map = p(f);
List<String> listKeys = new ArrayList<String>(map.keySet());
List<String> listValues = new ArrayList<String>(map.values());
model.setInputs(listKeys);
model.setValues(listValues);
return model;
}
public Map<String, String> p(File file) throws Exception {
Map<String, String> map = new HashMap<String,String>();
XMLStreamReader xr = XMLInputFactory.newInstance().createXMLStreamReader(new FileInputStream(file));
while(xr.hasNext()) {
int e = xr.next();
if (e == XMLStreamReader.START_ELEMENT) {
String name = xr.getLocalName();
xr.next();
String value = null;
try {
value = xr.getText();
} catch (IllegalStateException exep) {
exep.printStackTrace();
}
map.put(name, value);
}
}
return map;
}
Output:
2015-08-19 20:13:52,327 : Models Keys: [IRS1095A, MonthlyPlanPremiumAmtPP, WagesSalariesAndTipsAmt, MonthlyAdvancedPTCAmtPP, MonthCdPP, ReturnData, IndividualReturnFilingStatusCd, PrimaryResidentStatesInfoGrpPP, MonthlyPTCInformationGrpPP, IRS1040, ResidentStateInfoPP, SelfSelectPINGrp, MonthlyPremiumSLCSPAmtPP, Filer, ResidentStateAbbreviationCdPP, PrimaryBirthDt, Return, ReturnHeader, TotalExemptionsCnt, AdjustedGrossIncomeAmt, PrimarySSN]
2015-08-19 20:13:52,328 : Models Values: [
, 136, 22000, 125, SEPTEMBER,
, 1,
,
,
,
,
, 250,
, CA, 1970-01-01,
,
, 1, 22000, 555-11-2222]
Any help or assistance would be much appreciated. Thanks in advance
Edit:
XML file
<Return xmlns="http://www.irs.gov/efile">
<ReturnData>
<IRS1095A uuid="a77f40a2-af31-4404-a27d-4c1eaad730c2">
<MonthlyPTCInformationGrpPP uuid="69dc9dd5-5415-4ee4-a199-19b2dbb701be">
<MonthlyPlanPremiumAmtPP>136</MonthlyPlanPremiumAmtPP>
<MonthlyAdvancedPTCAmtPP>125</MonthlyAdvancedPTCAmtPP>
<MonthCdPP>SEPTEMBER</MonthCdPP>
<MonthlyPremiumSLCSPAmtPP>250</MonthlyPremiumSLCSPAmtPP>
</MonthlyPTCInformationGrpPP>
</IRS1095A>
<IRS1040>
<IndividualReturnFilingStatusCd>1</IndividualReturnFilingStatusCd>
<WagesSalariesAndTipsAmt>22000</WagesSalariesAndTipsAmt>
<TotalExemptionsCnt>1</TotalExemptionsCnt>
<AdjustedGrossIncomeAmt>22000</AdjustedGrossIncomeAmt>
</IRS1040>
</ReturnData>
<ReturnHeader>
<SelfSelectPINGrp>
<PrimaryBirthDt>1970-01-01</PrimaryBirthDt>
</SelfSelectPINGrp>
<Filer>
<PrimarySSN>555-11-2222</PrimarySSN>
<PrimaryResidentStatesInfoGrpPP>
<ResidentStateInfoPP uuid="a77f40a2-af31-4404-a27d-4c1eaad730c2">
<ResidentStateAbbreviationCdPP>CA</ResidentStateAbbreviationCdPP>
</ResidentStateInfoPP>
</PrimaryResidentStatesInfoGrpPP>
</Filer>
</ReturnHeader>
</Return>
Set value = xr.getText().trim(). That will trim whitespace, including newlines, from the beginning and end of the values.
To then avoid adding empty values, wrap the map.put(name, value) in an if (value != null && !value.isEmpty())
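As a minimal sketch of that advice (the class and method names here are made up for illustration, not part of the original code), the cleanup could be pulled into a small helper. It also collapses newlines inside a value to a single space, which is what the question asks for:

```java
import java.util.LinkedHashMap;
import java.util.Map;

class ValueCleanup {
    // Collapses every run of whitespace (including newlines) to one space,
    // then trims the ends. A null input becomes the empty string.
    static String normalize(String raw) {
        return raw == null ? "" : raw.replaceAll("\\s+", " ").trim();
    }

    // Stores the pair only when the cleaned value is non-empty.
    static void putIfMeaningful(Map<String, String> map, String name, String raw) {
        String value = normalize(raw);
        if (!value.isEmpty()) {
            map.put(name, value);
        }
    }

    public static void main(String[] args) {
        Map<String, String> map = new LinkedHashMap<>();
        putIfMeaningful(map, "ReturnData", "\n    ");   // layout only, skipped
        putIfMeaningful(map, "MonthCdPP", " SEPTEMBER\n");
        System.out.println(map); // prints {MonthCdPP=SEPTEMBER}
    }
}
```

In the question's p(File) method, the call map.put(name, value) would be replaced by something like putIfMeaningful(map, name, value).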
Your code is extracting the element name and the text immediately following the start element, ignoring any text following an end element.
So, it collects:
Return = <newline><space><space>
ReturnData = <newline><space><space><space><space>
IRS1095A = <newline><space><space><space><space><space><space>
MonthlyPTCInformationGrpPP = <newline><space><space><space><space><space><space><space><space>
MonthlyPlanPremiumAmtPP = 136
...
And then you add those to a HashMap, which iterates its key/value pairs in hash order rather than insertion order, making it difficult to see what happened.
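If seeing the pairs in document order helps with debugging, a LinkedHashMap keeps insertion order. This is only a small sketch with a few of the pairs listed above typed in by hand; the class name is made up:

```java
import java.util.LinkedHashMap;
import java.util.Map;

class InsertionOrderDemo {
    // Adds a few of the collected pairs in the order the parser sees them.
    static Map<String, String> collect() {
        Map<String, String> map = new LinkedHashMap<>();
        map.put("Return", "\n  ");
        map.put("ReturnData", "\n    ");
        map.put("MonthlyPlanPremiumAmtPP", "136");
        return map;
    }

    public static void main(String[] args) {
        // Iteration follows insertion order, unlike HashMap.
        System.out.println(collect().keySet());
        // prints [Return, ReturnData, MonthlyPlanPremiumAmtPP]
    }
}
```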
Updated
I'm not going to write the code for you, but if you want "value elements" then you need to:
1. Remember start element when seen
2. Collect any text, concatenating with other text already collected, e.g. when you see <text><cdata><text>
3. When seeing a start element and a start element is remembered, verify text is empty or all whitespace, then discard text
4. When seeing an end element:
   1. if start element is remembered, add elementName/text to result, then forget start element and discard text. Note: Don't use map if same element name can occur more than once.
   2. if start element is not remembered (was forgotten), verify text is empty or all whitespace, then discard text
This will collect just the leaf elements, ignoring any "layout".
Code exactly as written above
Well, I did add missing resource cleanup.
Map<String, String> map = new HashMap<>();
try (FileInputStream in = new FileInputStream(file)) {
XMLStreamReader xr = XMLInputFactory.newInstance().createXMLStreamReader(in);
try {
String elementName = null;
StringBuilder textBuf = new StringBuilder();
while (xr.hasNext()) {
switch (xr.next()) {
case XMLStreamConstants.START_ELEMENT:
// 3. When seeing a start element and a start element is remembered
if (elementName != null) {
// verify text is empty or all whitespace
if (! textBuf.toString().trim().isEmpty())
throw new IllegalArgumentException("Found text mixed with elements");
// then discard text
textBuf.setLength(0);
}
// 1. Remember start element when seen
elementName = xr.getLocalName();
break;
case XMLStreamConstants.CHARACTERS:
case XMLStreamConstants.CDATA:
case XMLStreamConstants.SPACE:
// 2. Collect any text
textBuf.append(xr.getText());
break;
case XMLStreamConstants.END_ELEMENT: // 4. When seeing an end element
if (elementName != null) { // 1. if start element is remembered
// add elementName/text to result
map.put(elementName, textBuf.toString());
// then forget start element
elementName = null;
// and discard text
textBuf.setLength(0);
} else { // 2. if start element is not remembered (was forgotten)
// verify text is empty or all whitespace
if (! textBuf.toString().trim().isEmpty())
throw new IllegalArgumentException("Found text mixed with elements");
// then discard text
textBuf.setLength(0);
}
break;
default:
// ignore
}
}
} finally {
xr.close();
}
}
return map;
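The note about not using a map when an element name can repeat could be handled by collecting into a list of pairs instead. The following is a simplified sketch of the same leaf-element idea, not the code above: it reads from a String, silently discards layout text instead of verifying it, and wraps XMLStreamException in an unchecked exception. The class and method names are assumptions for illustration.

```java
import java.io.StringReader;
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class LeafElementList {
    // Returns one (name, trimmed text) pair per leaf element, in document order.
    static List<Map.Entry<String, String>> leaves(String xml) {
        List<Map.Entry<String, String>> result = new ArrayList<>();
        try {
            XMLStreamReader xr =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            try {
                String elementName = null;               // remembered start element
                StringBuilder text = new StringBuilder();
                while (xr.hasNext()) {
                    switch (xr.next()) {
                        case XMLStreamConstants.START_ELEMENT:
                            elementName = xr.getLocalName();
                            text.setLength(0);           // drop layout text seen so far
                            break;
                        case XMLStreamConstants.CHARACTERS:
                        case XMLStreamConstants.CDATA:
                        case XMLStreamConstants.SPACE:
                            text.append(xr.getText());
                            break;
                        case XMLStreamConstants.END_ELEMENT:
                            if (elementName != null) {   // end of a leaf element
                                result.add(new SimpleEntry<>(elementName, text.toString().trim()));
                                elementName = null;
                            }
                            text.setLength(0);
                            break;
                        default:
                            break;                       // ignore comments, PIs, etc.
                    }
                }
            } finally {
                xr.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML", e);
        }
        return result;
    }
}
```

Because the result is a list, two elements with the same name produce two entries rather than one overwriting the other.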
I am new to Java and practicing parsing csv file into the object. I've tried but cannot figure it out.
The file looks like this:
[0], [1], [2], [3] , [4] , [5] , [6] , [7] , [8] , [9]
class, gender, age, bodyType, profession, pregnant, isYou ,species, isPet, role
scenario:green, , , , , , , ,
person, female, 24, average , , FALSE , , , , passenger
animal, male , 4, , , FALSE , , dog , TRUE , pedestrian
scenario:red
person, male , 16, athletic, boxer , FALSE , TRUE , , , passenger
person, female, 25, athletic, doctor , TRUE , FALSE , , , pedestrian
I need to parse it so that each scenario can hold any number of passengers and pedestrians. Finally, the scenarios are added to an ArrayList for analysis.
What I think I should do is:
Loop through each line and stop when reaching the next scenario line (e.g. scenario:red), adding the passengers and the pedestrians to the Character ArrayLists. (I've done the adding, but don't know how to stop.)
Create a scenario using the constructor Scenario(ArrayList<Character> passengers, ArrayList<Character> pedestrians, boolean redOrGreen);
Add the created scenario to the ArrayList scenarios.
What I've done so far puts everything together instead of separating it by scenario. Any help or hint is highly appreciated.
Thanks to this community for the help; here is what I've got so far.
public void loadCsv() throws IOException {
String csvFile = "config.csv";
String line = "";
String csvSplit = "\\s*,\\s*";
Scenario scenario = new Scenario();
Person person = new Person();
Animal animal = new Animal();
ArrayList<Scenario> scenarios = new ArrayList<Scenario>();
ArrayList<String> csvContents = new ArrayList<String>();
ArrayList<Character> passengers = new ArrayList<Character>();
ArrayList<Character> pedestrians = new ArrayList<Character>();
try (BufferedReader csvReader = new BufferedReader(new FileReader(csvFile));) {
String headerLine = csvReader.readLine(); //get rid of the header
//add each line to the arrayList
while ((line = csvReader.readLine()) != null) {
csvContents.add(line);
}
for(String csvLine : csvContents) {
String[] data = csvLine.split(csvSplit); // split by comma and remove redundant spaces
if (data.length == NO_OF_FIELD) { //check and avoid indexOutOfBoundException
String clazz = data[0].toLowerCase();// cannot use word "class" as a variable
if (clazz.startsWith("scenario") && data.length == 1) {
scenario = new Scenario();
scenario.setLegalCrossing(clazz.endsWith("green"));
continue;
}
else if ("person".equals(clazz) && data.length ==10) {
person = loadCsvPerson(data);
addCharacter(person, data);
}
else if ("animal".equals(clazz) && data.length ==10) {
animal = loadCsvAnimal(data);
addCharacter(animal, data);
}
}
}
}
//passenger and pedestrians are in position
System.out.println("passengers: " + passengers);
System.out.println("pedestrians: " + pedestrians);
if (null != scenario) {
scenario.setPassengers(passengers);
scenario.setPedestrians(pedestrians);
}
} catch (FileNotFoundException e) {
e.printStackTrace();
}
}
If it is possible to change the CSV file format, I would add a scenario type column (and a scenario id or name if required), so that you can treat the CSV file like a database result set from joined tables (scenario + passenger + pedestrian) returned as plain rows.
With this approach you can delegate parsing to any CSV library and do your own logic (grouping by scenario id/name/type) separately. With the surrogate rows you have now (scenario:green...), you have to write a custom parser.
For example, you can use univocity to parse the file into your model (even using annotations) and then group and handle the rows iteratively.
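Independent of any particular CSV library, the grouping step could look roughly like the sketch below. It assumes a changed file layout in which the first column of every row holds the scenario name; that layout, the shortened sample rows, and the class name are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ScenarioColumnGrouping {
    // Groups data rows by their first column, which is assumed to hold the
    // scenario name. Groups keep the order in which scenarios first appear.
    static Map<String, List<String[]>> groupByScenario(List<String> lines) {
        Map<String, List<String[]>> groups = new LinkedHashMap<>();
        for (String line : lines) {
            // limit -1 keeps trailing empty fields instead of dropping them
            String[] data = line.trim().split("\\s*,\\s*", -1);
            groups.computeIfAbsent(data[0], k -> new ArrayList<>()).add(data);
        }
        return groups;
    }
}
```

Each map entry then holds every row of one scenario, so passengers and pedestrians can be sorted out per group without tracking where one scenario ends and the next begins.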
Or, if you need to work with the existing file format, do something like this:
if (clazz.startsWith("scenario") && data.length == 1) {
// collect existing scenario before starting processing new one
if (scenario != null) {
scenario.setPassengers(passengers);
scenario.setPedestrians(pedestrians);
passengers = new ArrayList<>();
pedestrians = new ArrayList<>();
scenarios.add(scenario);
}
// now start new group (scenario)
scenario = new Scenario();
scenario.setLegalCrossing(clazz.endsWith("green"));
continue;
}
The following things need to be addressed in your code:
Avoid reusing the name of a class from the standard library, especially one in java.lang, which is imported automatically. Java already has a class named Character, so you should pick a different name for your custom class.
Use continue to skip the line, scenario:red
for(String csvLine : csvContents) {
if(csvLine.equals("scenario:red")){
continue;
}
String[] data = csvLine.split(csvSplit); // split by comma and remove redundant spaces
if (data.length == NO_OF_FIELD) {
//..
}
//..
}
If you have already defined final int NO_OF_FIELD = 10, use that constant instead of the literal 10 in the following code:
if (data.length == NO_OF_FIELD) { //check and avoid indexOutOfBoundException
String clazz = data[0].toLowerCase();// cannot use word "class" as a variable
//...
else if ("person".equals(clazz) && data.length ==10) {
However, you also need to understand that && data.length ==10 is unnecessary here as you have already checked data.length == NO_OF_FIELD in the enclosing if condition.
I couldn't understand the rest of your points. If you clarify them, I'll be able to help you further.
I need to add the previous scenario in the second round.
Since the last set of data won't be captured, I need to set another new scenario to add it in. Thanks for the art sir.
Character character = null;
try (BufferedReader csvReader = new BufferedReader(new FileReader(csvFile));) {
String headerLine = csvReader.readLine(); //get rid of the header
//add each line to the arrayList
while ((line = csvReader.readLine()) != null) {
csvContents.add(line);
}
final int NO_OF_FIELDS = 10;
Scenario scenario = null; // the scenario currently being filled
for(String csvLine : csvContents) {
String[] data = csvLine.split(csvSplit); // split by comma and remove redundant spaces
String clazz = data[0].toLowerCase();// cannot use word "class" as a variable
if (clazz.startsWith("scenario") && data.length == 1) {
// adding scenario after one set of data
// i.e second round adding the first round data
if (scenario != null) {
scenario.setPassengers(passengers);
scenario.setPedestrians(pedestrians);
scenarios.add(scenario);
}
// start the next scenario with fresh lists
scenario = new Scenario();
passengers = new ArrayList<Character>();
pedestrians = new ArrayList<Character>();
if (clazz.endsWith("green")) {
scenario.setLegalCrossing(true);
System.out.println("green light");
}
else if (clazz.endsWith("red")){
scenario.setLegalCrossing(false);
System.out.println("red light");
}
continue;
}
//...
// after the loop: the last scenario has no following scenario line, so add it here
if (scenario != null) {
scenario.setPassengers(passengers);
scenario.setPedestrians(pedestrians);
scenarios.add(scenario);
}
Audit audit = new Audit();
audit.setScenario(scenarios);
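For comparison, here is a self-contained sketch of the same grouping that avoids the separate "add the last scenario" step by adding each scenario to the list as soon as its marker line is seen. Group is a hypothetical stand-in for the Scenario class, rows are kept as raw String arrays instead of Person/Animal objects, and the role is assumed to be in column 9 as in the sample file.

```java
import java.util.ArrayList;
import java.util.List;

class ScenarioGrouping {
    // Hypothetical stand-in for the Scenario class from the question.
    static class Group {
        final boolean legalCrossing;
        final List<String[]> passengers = new ArrayList<>();
        final List<String[]> pedestrians = new ArrayList<>();

        Group(boolean legalCrossing) {
            this.legalCrossing = legalCrossing;
        }
    }

    // Expects the lines after the header; every "scenario:..." line starts a new group.
    static List<Group> parse(List<String> lines) {
        List<Group> groups = new ArrayList<>();
        Group current = null;
        for (String line : lines) {
            String[] data = line.trim().split("\\s*,\\s*", -1); // keep empty fields
            String clazz = data[0].toLowerCase();
            if (clazz.startsWith("scenario:")) {
                current = new Group(clazz.endsWith("green"));
                groups.add(current);            // added right away, so no final flush
            } else if (current != null && data.length == 10) {
                String role = data[9].toLowerCase();
                if (role.equals("passenger")) {
                    current.passengers.add(data);
                } else if (role.equals("pedestrian")) {
                    current.pedestrians.add(data);
                }
            }
        }
        return groups;
    }
}
```

Since the list holds a reference to the current group, rows read after the marker line still end up in the right scenario, including the last one in the file.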
Below is my code to detect abbreviations and their long forms. The code loops over a line in a document, loops over each word of that line and identifies an acronym candidate. It then again loops over each line of the document to find an appropriate long form for the abbreviation. My issue is if an acronym occurs multiple times in a document my output contains multiple instances of it. I just want to print an acronym only once with all its possible long forms. Here's my code:
public static void main(String[] args) throws FileNotFoundException
{
BufferedReader in = new BufferedReader(new FileReader("D:\\Workspace\\resource\\SampleSentences.txt"));
String str=null;
ArrayList<String> lines = new ArrayList<String>();
String matchingLongForm;
List <String> matchingLongForms = new ArrayList<String>() ;
List <String> shortForm = new ArrayList<String>() ;
Map<String, List<String>> abbreviationPairs = new HashMap<String, List<String>>();
try
{
while((str = in.readLine()) != null){
lines.add(str);
}
}
catch (IOException e)
{
// TODO Auto-generated catch block
e.printStackTrace();
}
String[] linesArray = lines.toArray(new String[lines.size()]);
// document wide search for abbreviation long form and identifying several appropriate matches
for (String line : linesArray){
for (String word : (Tokenizer.getTokenizer().tokenize(line))){
if (isValidShortForm(word)){
for (int i = 0; i < linesArray.length; i++){
matchingLongForm = extractBestLongForm(word, linesArray[i]);
//shortForm.add(word);
if (matchingLongForm != null && !(matchingLongForms.contains(matchingLongForm))){
matchingLongForms.add(matchingLongForm);
//System.out.println(matchingLongForm);
abbreviationPairs.put(word, matchingLongForms);
//matchingLongForms.clear();
}
}
if (abbreviationPairs != null){
//for(abbreviationPairs.)
System.out.println("Abbreviation Pair:" + "\t" + abbreviationPairs);
abbreviationPairs.clear();
matchingLongForms.clear();
//System.out.println("Abbreviation Pair:" + "\t" + abbreviationPairsNew);
}
else
continue;
}
}
}
}
Here's the current output:
Abbreviation Pair: {GLBA=[Gramm Leach Bliley act]}
Abbreviation Pair: {NCUA=[National credit union administration]}
Abbreviation Pair: {FFIEC=[Federal Financial Institutions Examination Council]}
Abbreviation Pair: {CFR=[comments for the Report]}
Abbreviation Pair: {CFR=[comments for the Report]}
Abbreviation Pair: {CFR=[comments for the Report]}
Abbreviation Pair: {CFR=[comments for the Report]}
Abbreviation Pair: {OFAC=[Office of Foreign Assets Control]}
Try to use java.util.Set to store your matching short forms and long forms. From the javadoc of the class:
... If this set already contains the element, the call leaves the set unchanged and returns false. In combination with the restriction on constructors, this ensures that sets never contain duplicate elements...
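A tiny illustration of that behaviour, using LinkedHashSet so the first-seen order is kept (the class name is made up for this sketch):

```java
import java.util.LinkedHashSet;
import java.util.Set;

class SetDedupDemo {
    // Returns the distinct values in first-seen order.
    static Set<String> dedupe(String... values) {
        Set<String> seen = new LinkedHashSet<>();
        for (String v : values) {
            seen.add(v); // returns false and leaves the set unchanged if v is already present
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(dedupe("CFR", "CFR", "OFAC")); // prints [CFR, OFAC]
    }
}
```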
You want a key value pair for abbreviation and text. So you should use Map.
A map cannot contain duplicate keys; each key can map to at most one value.
The problem is the position of the output, not the map.
You print inside the loop, so the map is shown multiple times.
Move this code outside the loop:
if (abbreviationPairs != null){
//for(abbreviationPairs.)
System.out.println("Abbreviation Pair:" + "\t" + abbreviationPairs);
abbreviationPairs.clear();
matchingLongForms.clear();
//System.out.println("Abbreviation Pair:" + "\t" + abbreviationPairsNew);
}
Here's the solution
Thanks to code_angel and Holger
Move the printing code outside the loop and create a new list for every matchingLongForm.
for (String line : linesArray){
for (String word : (Tokenizer.getTokenizer().tokenize(line))){
if (isValidShortForm(word)){
for (int i = 0; i < linesArray.length; i++){
matchingLongForm = extractBestLongForm(word, linesArray[i]);
List <String> matchingLongForms = new ArrayList<String>() ;
if (matchingLongForm != null && !(matchingLongForms.contains(matchingLongForm))&& !(abbreviationPairs.containsKey(word))){
matchingLongForms.add(matchingLongForm);
//System.out.println(matchingLongForm);
abbreviationPairs.put(word, matchingLongForms);
//matchingLongForms.clear();
}
}
}
}
}
if (abbreviationPairs != null){
System.out.println("Abbreviation Pair:" + "\t" + abbreviationPairs);
//abbreviationPairs.clear();
//matchingLongForms.clear();
}
}
The new output:
Abbreviation Pair: {NCUA=[National credit union administration], FFIEC=[Federal Financial Institutions Examination Council], OFAC=[Office of Foreign Assets Control], MSSP=[Managed Security Service Providers], IS=[Information Systems], SLA=[Service level agreements], CFR=[comments for the Report], MIS=[Management Information Systems], IDS=[Intrusion detection systems], TSP=[Technology Service Providers], RFI=[risk that FIs], EIC=[Examples of in the cloud], TIER=[The institution should ensure], BCP=[Business continuity planning], GLBA=[Gramm Leach Bliley act], III=[It is important], FI=[Financial Institutions], RFP=[Request for proposal]}
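Note that the loop above keeps only the first long form found for each abbreviation, because of the containsKey check and the fresh list on every iteration. If all distinct long forms are wanted, one possible variation is a Map from abbreviation to a Set of long forms. In this sketch the tokenizer, isValidShortForm and extractBestLongForm are not reproduced; the input is assumed to be pairs they would have produced, and the class name is invented.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class AbbreviationIndex {
    // Each candidate is {shortForm, longForm}; a null longForm means
    // no long form was found on that line.
    static Map<String, Set<String>> index(List<String[]> candidates) {
        Map<String, Set<String>> pairs = new LinkedHashMap<>();
        for (String[] c : candidates) {
            if (c[1] != null) {
                // one set per abbreviation; the set ignores repeated long forms
                pairs.computeIfAbsent(c[0], k -> new LinkedHashSet<>()).add(c[1]);
            }
        }
        return pairs;
    }
}
```

Printing the returned map once, after all lines have been processed, gives each abbreviation a single entry with every distinct long form collected for it.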
I'm currently using Jersey REST to create a webpage that has a list of birds and taxonomy number, with a link to a page specifically about the bird in question. While my links work between the two pages, and my Bird Name and Taxonomy Number appear, I can't get the order or family name to appear. Following is the code in question.
@Path("/birdslist")
public class BirdsList extends Birds {
@GET
@Path("/all")
@Produces("text/html")
public String all() {
Iterator iterator = birdnames.keySet().iterator();
String page = "<html><title>All Birds</title><body>";
page += "<p>This is the list of all birds. <br> Click the taxonomy number of the bird you wish to view in detail.</p>";
while(iterator.hasNext()){
Object key = iterator.next();
String value = birdnames.get(key);
HashSet fam = family.get(key);
HashSet ord = order.get(key);
}
for (String key : birdnames.keySet()) {
page += String.format("<p>Name:%s <br> Taxonomy Number:<a href=%s>%s</a></p>",birdnames.get(key),key,
key);
getBird(key);
}
page += "</body></html>";
return page;
}
@GET
@Path("{key}")
@Produces("text/html")
public String getBird(@PathParam("key") String key) {
String page = "<html><title>Bird #: {key}</title><body>";
page += String.format("<p>This page contains info on the %s</p>",birdnames.get(key));
page += String.format("<p>Name:%s <br> Taxonomy Number:%s <br> Family:%s <br> Order:%s</p>",birdnames.get(key),key,family.get(key),order.get(key));
page += "<p>Please click <a href=all>here</a> to return to the list of all birds.</p>";
page += "</body></html>";
return page;
}
}
The family and order are each stored as a HashSet inside a HashMap, while the bird name is stored in a plain HashMap. The data was read from a CSV file and converted into these HashMaps. That code follows.
public class Birds {
HashMap<String,String> birdnames;
HashMap<String,HashSet<String>> family;
HashMap<String,HashSet<String>> order;
/**
Constructor reads the CSV of all birds
*/
public Birds() {
// long path to eBirds assuming Maven "mvn exec:java" is many levels up
String fileName = "src/main/java/com/example/rest/eBirds.csv";
boolean firstLine = true;
this.birdnames = new HashMap<String,String>();
this.family = new HashMap<String,HashSet<String>>();
this.order = new HashMap<String,HashSet<String>>();
try {
BufferedReader R = new BufferedReader(new FileReader(fileName));
String line;
while (true) {
line = R.readLine();
if (line == null) break;
if (firstLine) { // ignore the first line, it's not a bird
firstLine = false;
continue;
}
String[] fields = line.split(",");
if (!fields[1].equalsIgnoreCase("species")) continue; // ignore all but species records
birdnames.put(fields[0],fields[4]); // add this bird to name table
// extract the order name from fields[6]
String ordername = fields[6];
if (!order.containsKey(ordername)) { // if needed, create first-time order set
order.put(ordername,new HashSet<String>());
}
order.get(ordername).add(fields[0]); // new order member by number for lookup
// extract the family name from fields[7] -- removing quotes first if needed
String famname = fields[7].replace("\"","");
if (!family.containsKey(famname)) { // if needed, create first-time family set
family.put(famname,new HashSet<String>());
}
family.get(famname).add(fields[0]); // new family member by number for lookup
}
}
catch (IOException e) { System.out.println("Stack trace: " + e); }
}
...
}
I've never used HashSets before, that was part of the given info to us. Our assignment was to create a list page and pages specific to each bird and link between the two. I just can't get these last two values to appear correctly. Can anyone help?
Here you use the same key for all values, birdnames, family and order:
while(iterator.hasNext()){
Object key = iterator.next();
String value = birdnames.get(key);
HashSet fam = family.get(key);
HashSet ord = order.get(key);
}
But you initialize them with different keys:
// extract the order name from fields[6]
String ordername = fields[6];
if (!order.containsKey(ordername))
{ // if needed, create first-time order set
order.put(ordername, new HashSet<>());
}
order.get(ordername).add(fields[0]); // new order member by number for lookup
Here the key would be fields[6] and not the birdnames key.
If you want to keep using the same key, you could do the following for the orders:
if (!order.containsKey(fields[0]))
{
order.put(fields[0], new HashSet<>());
}
order.get(fields[0]).add(fields[6]);
Then you can use:
HashSet ord = order.get(key);
And you will receive all the orders for that bird name.
If you don't want to change that and still use the same key you could do something like the following, but that is highly discouraged as it destroys the purpose of using a map in the first place:
Set<String> ord = new HashSet<>();
for (String tmp : order.keySet())
{
if (order.get(tmp).contains(key))
ord.add(tmp);
}
Here ord would contain all the orders for the "key".
As you can see, you need to do much more redundant work, if you don't switch value and "key".
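Put together, the "switch value and key" idea could be sketched as below. The class and method names are invented for illustration; the field positions (0 for the taxonomy number, 6 for the order, 7 for the family) are the ones used in the question's constructor.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class BirdIndex {
    // Both maps are keyed by taxonomy number, the same key birdnames uses.
    final Map<String, Set<String>> ordersByBird = new HashMap<>();
    final Map<String, Set<String>> familiesByBird = new HashMap<>();

    // Registers one already-split CSV record.
    void add(String[] fields) {
        ordersByBird.computeIfAbsent(fields[0], k -> new HashSet<>()).add(fields[6]);
        familiesByBird.computeIfAbsent(fields[0], k -> new HashSet<>())
                .add(fields[7].replace("\"", ""));
    }

    Set<String> ordersOf(String taxonomyNumber) {
        return ordersByBird.getOrDefault(taxonomyNumber, Collections.<String>emptySet());
    }

    Set<String> familiesOf(String taxonomyNumber) {
        return familiesByBird.getOrDefault(taxonomyNumber, Collections.<String>emptySet());
    }
}
```

With maps like these, getBird(key) can look up the order and family with the same key it already uses for birdnames.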
I am using Temboo to get all the events for a calendar, and I am trying to build a hashtable of days and their events. However, the for loop throws a NullPointerException even though the program is able to access the ith element: I have even printed it, and i is less than the size of the array. Here is the code snippet. The error is in the second line of the for loop and occurs when i = 23, while items.size() is 41.
GetAllEvents getAllEventsChoreo = new GetAllEvents(session);
// Get an InputSet object for the choreo
GetAllEventsInputSet getAllEventsInputs = getAllEventsChoreo.newInputSet();
// Set inputs
getAllEventsInputs.set_AccessToken(accessToken);
getAllEventsInputs.set_ClientID(clientID);
getAllEventsInputs.set_ClientSecret(clientSecret);
getAllEventsInputs.set_CalendarID(callIDs[0]);
// Execute Choreo
GetAllEventsResultSet getAllEventsResults = getAllEventsChoreo.execute(getAllEventsInputs);
results = getAllEventsResults.get_Response();
System.out.println(results);
root = jp.parse(results);
rootobj = root.getAsJsonObject();
JsonArray items = rootobj.get("items").getAsJsonArray();
System.out.println("About to enter the for loop\nItems:\n"+items.toString());
System.out.println("****************************\nEnter the for loop");
System.out.println("Items size: "+items.size());
System.out.println(items.get(23).toString());
for(int i = 0;i < items.size();i++)
{
System.out.println("i: "+i);
String startTime = items.get(i).getAsJsonObject().get("start").getAsJsonObject().get("dateTime").getAsString();
System.out.println("startTime: "+startTime);
String dayKey = startTime.split("T")[0];
if(dayKey.equals(beginDate)==false | dayKey.equals(endDate)==false)
{
System.out.println(startTime + " not the one interested so skipping");
continue;
}
System.out.println("passed the first if in for loop");
String endTime = items.get(i).getAsJsonObject().get("end").getAsJsonObject().get("dateTime").getAsString();
String name = items.get(i).getAsJsonObject().get("summary").getAsJsonPrimitive().getAsString();
calendarEvent eventTemp = new calendarEvent(name,startTime,endTime);
if(table.containsKey(dayKey))
table.get(dayKey).add(eventTemp);
else
{
ArrayList<calendarEvent> schedule = new ArrayList<calendarEvent>();
schedule.add(eventTemp);
table.put(dayKey,schedule);
}
}
Set<String> key = table.keySet();
Iterator<String> it = key.iterator();
while(it.hasNext())
{
String keyValue = it.next();
System.out.println("Events on "+keyValue);
ArrayList<calendarEvent> temp = table.get(keyValue);
for(int j =0;j<temp.size();j++)
{
System.out.println(temp.get(j));
}
}
After breaking down the failing line, I found that the exception occurs when I try to get dateTime as a string: the last call in the chain throws it.
Just because the ith element of an array exists, it does not mean that the element is not null.
Referencing a property or method of such an element will yield a NullPointerException.
If i went beyond the bounds of the array, you would get an IndexOutOfBoundsException (ArrayIndexOutOfBoundsException for a plain Java array) instead.
Check indexed array elements for null before using them.
Sorry to be brief and not reference your code or other sources. I am on my phone. The likely source of your problem is pretty clear, though.
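To make the null check concrete: the question's code appears to use Gson, where JsonObject.get(name) returns null when the member is absent, so a chained call on the result throws. One plausible cause, not confirmed by the post, is an event whose start object has no dateTime member (all-day calendar events typically carry date instead). The sketch below shows the checking pattern with plain nested Maps standing in for the JSON objects so that it runs without any library; the class and method names are hypothetical.

```java
import java.util.Map;

class NullSafeLookup {
    // Returns event.start.dateTime as a String, or null when either level is missing.
    static String startDateTime(Map<String, Object> event) {
        Object start = event.get("start");
        if (!(start instanceof Map)) {
            return null;                         // no "start" object at all
        }
        Object dateTime = ((Map<?, ?>) start).get("dateTime");
        return dateTime == null ? null : dateTime.toString();
    }
}
```

In the loop from the question, a null result would be the signal to skip the event (or fall back to another field) instead of calling getAsString() on it.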
I am using Jericho HTML Parser to parse some malformed html. In particular I am trying to get all text nodes, process the text and then replace it.
I want to skip specific elements from processing. For example, I want to skip all <a> elements, and any element that has the attribute class="noProcess". So, if a div has class="noProcess", I want to skip this div and all of its children. However, I do want these skipped elements to appear unchanged in the output after processing.
Jericho provides an Iterator for all nodes but I am not sure how to skip complete elements from the Iterator. Here is my code:
private String doProcessHtml(String html) {
Source source = new Source(html);
OutputDocument outputDocument = new OutputDocument(source);
for (Segment segment : source) {
if (segment instanceof Tag) {
Tag tag = (Tag) segment;
System.out.println("FOUND TAG: " + tag.getName());
// DO SOMETHING HERE TO SKIP ENTIRE ELEMENT IF IS <A> OR CLASS="noProcess"
} else if (segment instanceof CharacterReference) {
CharacterReference characterReference = (CharacterReference) segment;
System.out.println("FOUND CHARACTERREFERENCE: " + characterReference.getCharacterReferenceString());
} else {
System.out.println("FOUND PLAIN TEXT: " + segment.toString());
outputDocument.replace(segment, doProcessText(segment.toString()));
}
}
return outputDocument.toString();
}
It doesn't look like using the ignoreWhenParsing() method works for me as the parser just treats the "ignored" element as text.
I was thinking that if I could convert the Iterator loop to a for (int i = 0;...) loop I could probably be able to skip the element and all its children by modifying i to point to the EndTag and then continue the loop.... but not sure.
I think you might want to consider a redesign of the way your segments are built. Is there a way to parse the html in such a way that each segment is a parent element that contains a nested list of child elements? That way you could do something like:
for (Segment segment : source) {
if (segment instanceof Tag) {
Tag tag = (Tag) segment;
System.out.println("FOUND TAG: " + tag.getName());
// DO SOMETHING HERE TO SKIP ENTIRE ELEMENT IF IS <A> OR CLASS="noProcess"
continue;
} else if (segment instanceof CharacterReference) {
CharacterReference characterReference = (CharacterReference) segment;
System.out.println("FOUND CHARACTERREFERENCE: " + characterReference.getCharacterReferenceString());
for(Segment child : segment.childNodes()) {
//Use recursion to process child elements
//You will want to put your for loop in a separate method so it can be called recursively.
}
} else {
System.out.println("FOUND PLAIN TEXT: " + segment.toString());
outputDocument.replace(segment, doProcessText(segment.toString()));
}
}
Without more code to inspect, it's hard to determine whether restructuring the segment element is even possible or worth the effort.
Managed to have a working solution by using the getEnd() method of the Element object of the Tag. The idea is to skip elements if their end position is less than a position you set. So you find the end position of the element you want to exclude and you do not process anything else before that position:
final ArrayList<String> excludeTags = new ArrayList<String>(Arrays.asList(new String[] {"head", "script", "a"}));
final ArrayList<String> excludeClasses = new ArrayList<String>(Arrays.asList(new String[] {"noProcess"}));
Source.LegacyIteratorCompatabilityMode = true;
Source source = new Source(htmlToProcess);
OutputDocument outputDocument = new OutputDocument(source);
int skipToPos = 0;
for (Segment segment : source) {
if (segment.getBegin() >= skipToPos) {
if (segment instanceof Tag) {
Tag tag = (Tag) segment;
Element element = tag.getElement();
// check excludeTags
if (excludeTags.contains(tag.getName().toLowerCase())) {
skipToPos = element.getEnd();
}
// check excludeClasses
String classes = element.getAttributeValue("class");
if (classes != null) {
for (String theClass : classes.split(" ")) {
if (excludeClasses.contains(theClass.toLowerCase())) {
skipToPos = element.getEnd();
}
}
}
} else if (segment instanceof CharacterReference) { // for future use. Source.LegacyIteratorCompatabilityMode = true;
CharacterReference characterReference = (CharacterReference) segment;
} else {
outputDocument.replace(segment, doProcessText(segment.toString()));
}
}
}
return outputDocument.toString();
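Stripped of the Jericho types, the position-based skipping above boils down to the following sketch. Span is a hypothetical stand-in for a parser segment, not a Jericho class: it carries a begin position, an end position (for an excluded start tag, the end of its whole element) and a flag saying whether it opens an excluded element. Tags that are not excluded are left out to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;

class SkipByPosition {
    // Hypothetical stand-in for a parser segment.
    static class Span {
        final int begin;
        final int end;          // for an excluded span: end of the whole element
        final boolean excluded; // true if this span opens an element to skip
        final String text;

        Span(int begin, int end, boolean excluded, String text) {
            this.begin = begin;
            this.end = end;
            this.excluded = excluded;
            this.text = text;
        }
    }

    // Returns the text of every span that lies outside all excluded elements.
    static List<String> processable(List<Span> spans) {
        List<String> out = new ArrayList<>();
        int skipToPos = 0;
        for (Span s : spans) {
            if (s.begin < skipToPos) {
                continue;               // still inside an excluded element
            }
            if (s.excluded) {
                skipToPos = s.end;      // ignore everything up to the element's end
            } else {
                out.add(s.text);
            }
        }
        return out;
    }
}
```

The design relies on segments being visited in source order, so a single integer is enough to remember where the skipped region ends, even when the excluded element has nested children.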
This should work.
String skipTag = null;
for (Segment segment : source) {
if (skipTag != null) { // is skipping ON?
if (segment instanceof EndTag && // if EndTag found for the
skipTag.equals(((EndTag) segment).getName())) { // tag we're skipping
skipTag = null; // set skipping OFF
}
continue; // continue skipping (or skip the EndTag)
} else if (segment instanceof Tag) { // is tag?
Tag tag = (Tag) segment;
System.out.println("FOUND TAG: " + tag.getName());
if (HTMLElementName.A.equals(tag.getName())) { // if <a> ?
skipTag = tag.getName(); // set
continue; // skipping ON
} else if (tag instanceof StartTag) {
if ("noProcess".equals( // if <tag class="noProcess" ..> ?
((StartTag) tag).getAttributeValue("class"))) {
skipTag = tag.getName(); // set
continue; // skipping ON
}
}
} // ...
}