I am trying to read a very large Excel file with Apache POI. I have managed to read the file, but I am not able to store the values in an object. I need to do this because I have to change some values and then add them to a table.
Below is the code:
protected void processSheet(StylesTable styles,
ReadOnlySharedStringsTable strings, InputStream sheetInputStream)
throws IOException, SAXException {
InputSource sheetSource = new InputSource(sheetInputStream);
SAXParserFactory saxFactory = SAXParserFactory.newInstance();
try {
SAXParser saxParser = saxFactory.newSAXParser();
XMLReader sheetParser = saxParser.getXMLReader();
System.out.println("sheet is not read");
ContentHandler handler = new XSSFSheetXMLHandler(styles, strings, new SheetContentsHandler() {
@Override
public void startRow(int rowNum) {
System.out.println("rwnum"+rowNum);
}
@Override
public void endRow() {
System.out.println("endrow");
}
@Override
public void cell(String cellReference, String formattedValue) {
}
@Override
public void headerFooter(String text, boolean isHeader, String tagName) {
}
},
false // formulasNotResults = false: report cached results instead of formulas
);
sheetParser.setContentHandler(handler);
sheetParser.parse(sheetSource);
} catch (ParserConfigurationException e) {
throw new RuntimeException("SAX parser appears to be broken - " + e.getMessage());
}
}
I'm using SAX to parse XML that contains an array of elements. My problem is that I can't write generic code for this kind of XML: I couldn't find a generic way to identify repeated elements as an array, iterate over them, and store them in a List.
<bookstore>
<book category="children">
<title>Harry Potter</title>
<author>J K. Rowling</author>
<year>2005</year>
<price>29.99</price>
</book>
<book category="web">
<title>Learning XML</title>
<author>Erik T. Ray</author>
<year>2003</year>
<price>39.95</price>
</book>
</bookstore>
Any solution will help. Thanks in advance.
I use the code below, which I got from https://github.com/niteshapte/generic-xml-parser:
public class GenericXMLParserSAX extends DefaultHandler {
private ListMultimap<String, String> listMultimap = ArrayListMultimap.create();
String tempCharacter;
private String[] startElements;
private String[] endElements;
public void setStartElements(String[] startElements) {
this.startElements = startElements;
}
public String[] getStartElements() {
return startElements;
}
public void setEndElements(String[] endElements) {
this.endElements = endElements;
}
public String[] getEndElements() {
return endElements;
}
public void parseDocument(String xml, String[] startElements, String[] endElements) {
setStartElements(startElements);
setEndElements(endElements);
SAXParserFactory spf = SAXParserFactory.newInstance();
try {
SAXParser sp = spf.newSAXParser();
InputStream inputStream = new ByteArrayInputStream(xml.getBytes());
sp.parse(inputStream, this);
} catch(SAXException se) {
se.printStackTrace();
} catch(ParserConfigurationException pce) {
pce.printStackTrace();
} catch (IOException ie) {
ie.printStackTrace();
}
}
public void parseDocument(String xml, String[] endElements) {
setEndElements(endElements);
SAXParserFactory spf = SAXParserFactory.newInstance();
try {
SAXParser sp = spf.newSAXParser();
InputStream inputStream = new ByteArrayInputStream(xml.getBytes());
sp.parse(inputStream, this);
} catch(SAXException se) {
se.printStackTrace();
} catch(ParserConfigurationException pce) {
pce.printStackTrace();
} catch (IOException ie) {
ie.printStackTrace();
}
}
@Override
public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
String[] startElements = getStartElements();
if(startElements!= null){
for(int i = 0; i < startElements.length; i++) {
if(qName.startsWith(startElements[i])) {
listMultimap.put(startElements[i], qName);
}
}
}
}
@Override
public void characters(char[] ch, int start, int length) throws SAXException {
tempCharacter = new String(ch, start, length);
}
@Override
public void endElement(String uri, String localName, String qName) throws SAXException {
String[] endElements = getEndElements();
for(int i = 0; i < endElements.length; i++) {
if (qName.equalsIgnoreCase(endElements[i])) {
listMultimap.put(endElements[i], tempCharacter);
}
}
}
public ListMultimap<String, String> multiSetResult() {
return listMultimap;
}
}
You can create a custom Handler that extends DefaultHandler and use it to parse your XML and build the List<Book> for you.
The Handler maintains a List<Book> and:
each time it encounters the book start tag, it creates a new Book
each time it encounters the book end tag, it adds this Book to the List.
At the end it holds the complete list of Books, which you can access via its getBooks() method.
Assuming this Book class:
class Book {
private String category;
private String title;
private String author;
private String year;
private String price;
// GETTERS/SETTERS
}
You can create a custom Handler like this:
class MyHandler extends DefaultHandler {
private boolean title = false;
private boolean author = false;
private boolean year = false;
private boolean price = false;
// Holds the list of Books
private List<Book> books = new ArrayList<>();
// Holds the Book we are currently parsing
private Book book;
public void startElement(String uri, String localName,String qName, Attributes attributes) {
switch (qName) {
// Create a new Book when finding the start book tag
case "book":
book = new Book();
book.setCategory(attributes.getValue("category"));
break;
// Each case needs a break; otherwise it falls through and sets every flag below it
case "title": title = true; break;
case "author": author = true; break;
case "year": year = true; break;
case "price": price = true; break;
}
}
public void endElement(String uri, String localName, String qName) {
// Add the current Book to the list when finding the end book tag
if("book".equals(qName)) {
books.add(book);
}
}
public void characters(char[] ch, int start, int length) {
String value = new String(ch, start, length);
if (title) {
book.setTitle(value);
title = false;
} else if (author) {
book.setAuthor(value);
author = false;
} else if (year) {
book.setYear(value);
year = false;
} else if (price) {
book.setPrice(value);
price = false;
}
}
public List<Book> getBooks() {
return books;
}
}
Then you parse using this custom Handler and retrieve the list of Books:
SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser saxParser = factory.newSAXParser();
MyHandler myHandler = new MyHandler();
saxParser.parse("/path/to/file.xml", myHandler);
List<Book> books = myHandler.getBooks();
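One caveat with flag-based handlers like the one above: a SAX parser is allowed to deliver the text of a single element in several characters() calls, so buffering the text and reading it in endElement is safer. A minimal sketch of that variation, assuming a hypothetical BookListHandler class and, to stay self-contained, storing each book as a Map instead of the Book class:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects each <book> as a field-name -> text map.
public class BookListHandler extends DefaultHandler {
    private final List<Map<String, String>> books = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();
    private Map<String, String> current; // non-null only while inside <book>

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        text.setLength(0); // start a fresh buffer for every element
        if ("book".equals(qName)) {
            current = new LinkedHashMap<>();
            current.put("category", attributes.getValue("category"));
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // may be called several times per element
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("book".equals(qName)) {
            books.add(current);
            current = null;
        } else if (current != null) {
            // Any child of <book> becomes a field, so no per-tag flags are needed
            current.put(qName, text.toString().trim());
        }
        text.setLength(0);
    }

    public List<Map<String, String>> getBooks() {
        return books;
    }

    public static List<Map<String, String>> parse(String xml) throws Exception {
        BookListHandler handler = new BookListHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.getBooks();
    }
}
```

Because the handler keys on whatever child element names it sees, the same class would also cope with books that carry extra or missing fields.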
I have a large Excel file with several worksheets.
I want to process just one sheet in the file: read values from two columns and update two other columns.
With this code I am able to read data from the sheet, but I can't figure out how to save the output back.
public class ExcelFunctions {
private class ExcelData implements SheetContentsHandler {
private Record rec ;
public void startRow(int rowNum) {
rec = new Record();
output.put("R"+rowNum, rec);
}
public void endRow(int rowNum) {
}
public void cell(String cellReference, String formattedValue,
XSSFComment comment) {
int thisCol = (new CellReference(cellReference)).getCol();
if(thisCol==7){
try {
rec.setK1(formattedValue);
} catch (Exception e) {
}
}
if(thisCol==8){
try {
rec.setK2(formattedValue);
} catch (Exception e) {
}
}
if(thisCol == 27){
String key = rec.full_key();
System.out.println(key);
///////Process Matched Key...get Data
//////Set value to column 27
}
if(thisCol == 28){
String key = rec.full_key();
System.out.println(key);
///////Process Matched Key...get Data
//////Set value to column 28
}
}
public void headerFooter(String text, boolean isHeader, String tagName) {
}
}
///////////////////////////////////////
private final OPCPackage xlsxPackage;
private final Map<String, Record> output;
public ExcelFunctions(OPCPackage pkg, Map<String, Record> output) {
this.xlsxPackage = pkg;
this.output = output;
}
public void processSheet(
StylesTable styles,
ReadOnlySharedStringsTable strings,
SheetContentsHandler sheetHandler,
InputStream sheetInputStream)
throws IOException, ParserConfigurationException, SAXException {
DataFormatter formatter = new DataFormatter();
InputSource sheetSource = new InputSource(sheetInputStream);
try {
XMLReader sheetParser = SAXHelper.newXMLReader();
ContentHandler handler = new XSSFSheetXMLHandler(
styles, null, strings, sheetHandler, formatter, false);
sheetParser.setContentHandler(handler);
sheetParser.parse(sheetSource);
} catch(ParserConfigurationException e) {
throw new RuntimeException("SAX parser appears to be broken - " + e.getMessage());
}
}
public void process()
throws IOException, OpenXML4JException, ParserConfigurationException, SAXException {
ReadOnlySharedStringsTable strings = new ReadOnlySharedStringsTable(this.xlsxPackage);
XSSFReader xssfReader = new XSSFReader(this.xlsxPackage);
StylesTable styles = xssfReader.getStylesTable();
XSSFReader.SheetIterator iter = (XSSFReader.SheetIterator) xssfReader.getSheetsData();
boolean found = false;
while (iter.hasNext() && !found) {
InputStream stream = iter.next();
String sheetName = iter.getSheetName();
if(sheetName.equals("All Notes") ){
processSheet(styles, strings, new ExcelData(), stream);
found = true;
}
stream.close();
}
}
@SuppressWarnings("unused")
public static void main(String[] args) throws Exception {
File xlsxFile = new File("C:\\Users\\admin\\Downloads\\Unique Name Macro\\big.xlsm");
if (!xlsxFile.exists()) {
System.err.println("Not found or not a file: " + xlsxFile.getPath());
return;
}
// The package open is instantaneous, as it should be.
OPCPackage p = OPCPackage.open(xlsxFile.getPath(), PackageAccess.READ_WRITE);
Map<String, Record> output = new HashMap<String, Record>();
ExcelFunctions xlFunctions = new ExcelFunctions(p, output);
xlFunctions.process();
p.close();
if (output != null){
for(Record rec : output.values()){
System.out.println(rec.full_key());
}
}
}
}
The file is very large, and I only want to use the Event API.
I have successfully tested saving with the code below.
But it loads the whole file into memory (causing the application to crash), while I only need to edit one sheet.
public static void saveToExcel(String ofn, Map<String, Record> data) {
FileInputStream infile;
try {
infile = new FileInputStream(new File("C:\\Users\\admin\\Downloads\\Unique Name Macro\\big.xlsm"));
XSSFWorkbook workbook = new XSSFWorkbook (infile);
XSSFSheet sheet = workbook.getSheet("All Notes");
for(Record rec : data.values()){
// Assumes Record exposes its 1-based row number through getRownum()
Row dataRow = sheet.getRow(rec.getRownum() - 1);
setCellValue(dataRow, 26, "SomeValue");
setCellValue(dataRow, 27, "SomeValue");
}
FileOutputStream out = new FileOutputStream(new File(ofn));
workbook.write(out);
infile.close();
out.close();
workbook.close();
}
catch (IOException e) { // also covers FileNotFoundException
// TODO Auto-generated catch block
e.printStackTrace();
}
}
private static void setCellValue(Row row,int col, String value){
Cell c0 = row.getCell(col);
if (c0 == null){
c0 = row.createCell(col);
}
c0.setCellValue(value);
}
I don't think POI provides anything out of the box that allows you to do that.
You might therefore be better off unzipping the XLSX/XLSM file (it is actually a bunch of XML files inside a zip) and reading the XML files as text files or with a normal XML parser, so that you can easily write out the changed files and zip them up into an XLSX/XLSM file again.
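The unzip-and-rewrite idea can be sketched with java.util.zip alone. This is only an illustration under some assumptions: ZipEntryRewriter and its edit callback are hypothetical names, the entry name of the sheet (something like xl/worksheets/sheet1.xml) has to be looked up in the workbook's own XML first, and string cells may point into the shared strings part rather than holding their text inline. Every entry is streamed through unchanged except the target one:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.function.UnaryOperator;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper: copies a zip archive and rewrites exactly one entry.
public class ZipEntryRewriter {

    // Closes both streams when done.
    public static void rewriteEntry(InputStream in, OutputStream out,
                                    String targetEntry, UnaryOperator<String> edit)
            throws IOException {
        try (ZipInputStream zin = new ZipInputStream(in);
             ZipOutputStream zout = new ZipOutputStream(out)) {
            byte[] buffer = new byte[8192];
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                zout.putNextEntry(new ZipEntry(entry.getName()));
                if (entry.getName().equals(targetEntry)) {
                    // Only the target entry is held in memory, as one string
                    ByteArrayOutputStream content = new ByteArrayOutputStream();
                    copy(zin, content, buffer);
                    String xml = new String(content.toByteArray(), StandardCharsets.UTF_8);
                    zout.write(edit.apply(xml).getBytes(StandardCharsets.UTF_8));
                } else {
                    copy(zin, zout, buffer); // all other entries pass through untouched
                }
                zout.closeEntry();
            }
        }
    }

    private static void copy(InputStream from, OutputStream to, byte[] buffer)
            throws IOException {
        int n;
        while ((n = from.read(buffer)) != -1) {
            to.write(buffer, 0, n);
        }
    }
}
```

If the single sheet is itself too large to hold as a string, the branch that builds xml could be swapped for a streaming XML reader and writer working directly on zin and zout.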
I want to ignore external entities and external stylesheets (e.g. <?xml-stylesheet type="text/xsl" href="......."?>).
I know I have to set an XMLReader property to ignore external entities, but I don't know how to ignore stylesheets...
import org.apache.xerces.parsers.SAXParser;
import org.xml.sax.XMLReader;
//...
final XMLReader parser = new SAXParser();
// Ignore entities
parser.setProperty("http://xml.org/sax/features/external-general-entities", false);
// Is this correct?
parser.setProperty("http://xml.org/sax/features/external-general-entities", false);
Are there more properties to set to avoid external entities and stylesheets?
How can I tell whether a document contains external entities or stylesheets?
This works for me:
public class SaxParser extends DefaultHandler
implements ContentHandler, DTDHandler, EntityResolver{
public transient static final String STYLE_SHEET_TAG = "xml-stylesheet";
public transient static final String EXTERNAL_ENTITY = "ExternalEntity";
public static void main(String[] args) {
new SaxParser().execute();
}
public void execute() {
String pathFileXml = "test/XML.xml";
final XMLReader parser = new SAXParser();
parser.setContentHandler(this);
parser.setDTDHandler(this);
parser.setEntityResolver(this);
try {
parser.parse(pathFileXml);
} catch (IOException e) {
e.printStackTrace();
} catch (SAXException e) {
if (SaxParser.STYLE_SHEET_TAG.equals(e.getMessage())
|| SaxParser.EXTERNAL_ENTITY.equals(e.getMessage())) {
System.out.println("CATCH ERRORE");
}
e.printStackTrace();
}
System.out.println("OK");
}
@Override
public void processingInstruction(String target, String data)
throws SAXException {
System.out.println("Processing Instruction");
System.out.println("PI=> target: " + target + ", data: " + data);
if (STYLE_SHEET_TAG.equalsIgnoreCase(target.trim())) {
throw new SAXException(STYLE_SHEET_TAG);
}
return;
}
@Override
public InputSource resolveEntity(String publicId, String systemId)
throws IOException, SAXException {
System.out.println("publicId: " + publicId + ", systemId: " + systemId);
throw new SAXException(SaxParser.EXTERNAL_ENTITY);
}
}
The external stylesheet declaration is a standard processing instruction.
You can ignore processing instructions by not implementing the handler method:
void processingInstruction(java.lang.String target, java.lang.String data) {}
in your SAX handler.
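A minimal sketch of both points with the JDK's built-in parser, assuming a hypothetical QuietSaxParse class: the handler leaves processingInstruction() at its empty DefaultHandler default, so the xml-stylesheet instruction is reported and dropped (a SAX parser never fetches the stylesheet by itself), while the two standard SAX features and an entity resolver that returns an empty source keep external entities from being loaded.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: returns the text content of a document
// while ignoring stylesheet instructions and external entities.
public class QuietSaxParse {

    public static String textOf(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        // Standard SAX2 feature names; they are features, so setFeature is used
        factory.setFeature("http://xml.org/sax/features/external-general-entities", false);
        factory.setFeature("http://xml.org/sax/features/external-parameter-entities", false);

        final StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }

            // processingInstruction(...) is deliberately not overridden:
            // the inherited implementation does nothing.

            @Override
            public InputSource resolveEntity(String publicId, String systemId) {
                // Second line of defence: resolve anything external to empty content
                return new InputSource(new StringReader(""));
            }
        };
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return text.toString();
    }
}
```

To detect rather than ignore these constructs, the same two methods are the places to hook in: override processingInstruction() and check the target, or record the systemId passed to resolveEntity().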
I'm new to Java, and we were given an assignment about XML parsing. We have done DOM and are now on SAX, which is why I'm using a SAX parser to parse an RSS feed. It already works on files, but when I try to parse an online RSS feed, it returns an Error 403. I haven't tried parsing the same site with DOM because my laptop is so slow that it takes me 5 minutes just to open a file.
Thanks for the help.
public class NewsHandler extends DefaultHandler {
private String url = "http://tomasinoweb.org/feed/rss";
private boolean inDescription = false;
private String[] descs = new String[11];
int i = 0;
public void processFeed() {
try {
SAXParserFactory factory =
SAXParserFactory.newInstance();
SAXParser parser = factory.newSAXParser();
XMLReader reader = parser.getXMLReader();
reader.setContentHandler(this);
InputStream inputStream = new URL(url).openStream();
reader.parse(new InputSource(inputStream));
} catch (Exception e) { e.printStackTrace(); }
}
public void startElement(String uri, String localName, String qName,
Attributes attributes) throws SAXException {
if(qName.equals("description")) inDescription = true;
}
public void characters(char ch[], int start, int length) {
String chars = new String(ch).substring(start, start + length);
if(inDescription) descs[i] = chars;
}
public void endElement(String uri, String localName, String qName) throws SAXException {
if(qName.equals("description")) {
inDescription = false;
i++;
}
}
public String getDesc(int index) { return descs[index]; }
public static void main(String[] args) {
NewsHandler nh = new NewsHandler();
nh.processFeed();
for(int i=0; i<10; i++) {
System.out.println(nh.getDesc(i));
}
}
}
Solution:
Instead of using String url = "url", I used URL url = new URL("url") and URLConnection con = url.openConnection(), and then called con.addRequestProperty("User-Agent", <user-agent string>) before opening the stream.
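A minimal sketch of that fix, assuming a hypothetical FeedFetcher class and a made-up User-Agent value (any string the server accepts should do). The header has to be set before the stream is opened, because openConnection() itself does not contact the server yet:

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: fetches a feed with an explicit User-Agent header.
public class FeedFetcher {
    // Assumed value for illustration only
    public static final String USER_AGENT = "Mozilla/5.0 (compatible; FeedFetcher/1.0)";

    public static URLConnection open(String address) throws IOException {
        URLConnection con = new URL(address).openConnection(); // no network I/O yet
        con.setRequestProperty("User-Agent", USER_AGENT);
        return con;
    }

    public static void parse(String address, DefaultHandler handler) throws Exception {
        // getInputStream() is the call that actually sends the request
        try (InputStream in = open(address).getInputStream()) {
            SAXParserFactory.newInstance().newSAXParser().parse(new InputSource(in), handler);
        }
    }
}
```

With the NewsHandler from the question, the call inside processFeed() would then become something like FeedFetcher.parse(url, this).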
<inputs>
<MAT_NO>123</MAT_NO>
<MAT_NO>323</MAT_NO>
<MAT_NO>4223</MAT_NO>
<FOO_BAR>122</FOO_BAR>
<FOO_BAR>125</FOO_BAR>
</inputs>
I have to parse the XML above. After parsing, I want the values in a Map<String, List<String>> whose keys are the names of the child nodes (MAT_NO, FOO_BAR)
and whose values are the text values of those child nodes (123, 323, etc.).
The following is my attempt. Is there a better way of doing this?
public class UserInputsXmlParser extends DefaultHandler {
private final SaveSubscriptionValues subscriptionValues = null;
private String nodeValue = "";
private final String inputKey = "";
private final List<String> valuesList = null;
private Map<String, List<String>> userInputs;
public Map<String, List<String>> parse(final String strXML) {
try {
final SAXParserFactory parserFactory = SAXParserFactory
.newInstance();
final SAXParser saxParser = parserFactory.newSAXParser();
saxParser.parse(new InputSource(new StringReader(strXML)), this);
return userInputs;
} catch (final SAXException e) {
e.printStackTrace();
throw new MyException("", e);
} catch (final IOException e) {
e.printStackTrace();
throw new MyException("", e);
} catch (final ParserConfigurationException e) {
e.printStackTrace();
throw new MyException("", e);
} catch (final Exception e) {
e.printStackTrace();
throw new MyException("", e);
}
}
@Override
public void startElement(final String uri, final String localName,
final String qName, final Attributes attributes)
throws SAXException {
nodeValue = "";
if ("inputs".equalsIgnoreCase(qName)) {
userInputs = MyUtil.getNewHashMap();
return;
}
}
@Override
public void characters(final char[] ch, final int start, final int length)
throws SAXException {
if (!MyUtil.isEmpty(nodeValue)) {
nodeValue += new String(ch, start, length);
} else {
nodeValue = new String(ch, start, length);
}
}
@Override
public void endElement(final String uri, final String localName,
final String qName) throws SAXException {
if (!"inputs".equalsIgnoreCase(qName)) {
storeUserInputs(qName, nodeValue);
}
}
/**
* @param qName
* @param nodeValue2
*/
private void storeUserInputs(final String qName, final String nodeValue2) {
if (nodeValue2 == null || nodeValue2.trim().equals("")) { return; }
final String trimmedValue = nodeValue2.trim();
final List<String> values = userInputs.get(qName);
if (values != null) {
values.add(trimmedValue);
} else {
final List<String> valueList = new ArrayList<String>();
valueList.add(trimmedValue);
userInputs.put(qName, valueList);
}
}
public static void main(final String[] args) {
final String sample = "<inputs>" + "<MAT_NO>154400-0000</MAT_NO>"
+ "<MAT_NO> </MAT_NO>" + "<MAT_NO>154400-0002</MAT_NO>"
+ "<PAT_NO>123</PAT_NO><PAT_NO>1111</PAT_NO></inputs>";
System.out.println(new UserInputsXmlParser().parse(sample));
}
}
UPDATE: The children of the <inputs> node are dynamic. I will only know the root node in advance.
Do you have to provide a solution as part of a SAX event handler? If not, you could use one of the many XML libraries around, such as dom4j. That makes the solution a lot simpler:
public static void main(String[] args) throws Exception
{
String sample = "<inputs>" + "<MAT_NO>154400-0000</MAT_NO>"
+ "<MAT_NO> </MAT_NO>" + "<MAT_NO>154400-0002</MAT_NO>"
+ "<PAT_NO>123</PAT_NO><PAT_NO>1111</PAT_NO></inputs>";
System.out.println(parse(sample));
}
static Map<String,List<String>> parse(String xml) throws Exception
{
Map<String,List<String>> map = new HashMap<String,List<String>>();
SAXReader reader = new SAXReader();
Document doc = reader.read(new StringReader(xml));
for (Iterator i = doc.getRootElement().elements().iterator(); i.hasNext();)
{
Element element = (Element)i.next();
//Maybe handle elements with only whitespace text content
List<String> list = map.get(element.getName());
if (list == null)
{
list = new ArrayList<String>();
map.put(element.getName(), list);
}
list.add(element.getText());
}
return map;
}
I would check out XStream (http://x-stream.github.io/tutorial.html).
XStream is a simple library to serialize objects to XML and back again.
For something this basic, look into XPath.
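A minimal sketch of the XPath route using only the JDK's javax.xml.xpath, assuming a hypothetical XPathInputs class. The expression /inputs/* selects every child of the root regardless of its name, which fits the requirement that only the root node is known in advance:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class: groups the text of <inputs> children by element name.
public class XPathInputs {

    public static Map<String, List<String>> parse(String xml) throws Exception {
        NodeList children = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate("/inputs/*", new InputSource(new StringReader(xml)),
                        XPathConstants.NODESET);

        Map<String, List<String>> result = new LinkedHashMap<>();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            String value = child.getTextContent().trim();
            if (value.isEmpty()) {
                continue; // skip whitespace-only elements, as the SAX version does
            }
            result.computeIfAbsent(child.getNodeName(), k -> new ArrayList<>()).add(value);
        }
        return result;
    }
}
```

Unlike SAX, this builds the whole document in memory first, which is fine for small inputs like the sample but not for very large files.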