Writing an Element object to a file using Java

I have data in an Element object. I'm trying to write its values to a file, but I'm having trouble:
< Some process to acquire values into the variable "fieldData" >
// Prepare file output
FileWriter fstream = new FileWriter("C:/output.txt");
BufferedWriter out = new BufferedWriter(fstream);
Element field = fieldData.getElement(i);
out.write(field); // DOESN'T WORK: The method write(int) in the type BufferedWriter is not applicable for the arguments (Element)
out.write(field.getValueAsString()); // DOESN'T WORK: Cannot convert SEQUENCE to String
Any suggestions on how I should handle this case? In addition, what is the best way for me to see (i.e. print to screen) the available static variables and methods associated with an object? Thanks.
More code snippets to help debug:
private static final Name SECURITY_DATA = new Name("securityData");
private static final Name FIELD_DATA = new Name("fieldData");
Element securityDataArray = msg.getElement(SECURITY_DATA); // msg is a Bloomberg desktop API object
Element securityData = securityDataArray.getValueAsElement(0);
Element fieldData = securityData.getElement(FIELD_DATA);
Element field = fieldData.getElement(0);
out.write(field); // DOESN'T WORK: The method write(int) in the type BufferedWriter is not applicable for the arguments (Element)
out.write(field.getValueAsString()); // DOESN'T WORK: Cannot convert SEQUENCE to String

It turns out that this Bloomberg proprietary data structure is long-winded, to say the least:
private static final Name SECURITY_DATA = new Name("securityData");
private static final Name FIELD_DATA = new Name("fieldData");
Element securityDataArray = msg.getElement(SECURITY_DATA); // msg is a Bloomberg desktop API object
Element securityData = securityDataArray.getValueAsElement(0);
Element fieldData = securityData.getElement(FIELD_DATA);
Element field = fieldData.getElement(0);
/* the above code was known at the time of the question */
/* below is what I was shown by a Bloomberg representative */
Element bulkElement = field.getValueAsElement(0);
Element elem = bulkElement.getElement(0);
out.write(elem.name() + "\t" + elem.getValueAsString() + "\n");
Whew... I don't think they try to make it easy! I'm also curious whether there was a way I could have figured this out by having Java print out the right methods to use to trace down the data structure.
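One way is a small recursive dumper that walks the Element tree and prints whatever it finds. The sketch below is only an illustration: it assumes the standard blpapi Element accessors (isArray(), isComplexType(), numValues(), numElements(), name(), getValueAsString()), so verify the names against your blpapi version before relying on it.
static void dump(Element e, String indent, BufferedWriter out) throws IOException {
    if (e.isArray()) {
        // an array: each value is either a scalar or a complex sub-element
        for (int i = 0; i < e.numValues(); i++) {
            if (e.isComplexType()) {
                dump(e.getValueAsElement(i), indent + "  ", out);
            } else {
                out.write(indent + e.name() + "[" + i + "] = " + e.getValueAsString(i) + "\n");
            }
        }
    } else if (e.isComplexType()) {
        // a SEQUENCE or CHOICE: recurse into its named sub-elements
        out.write(indent + e.name() + ":\n");
        for (int i = 0; i < e.numElements(); i++) {
            dump(e.getElement(i), indent + "  ", out);
        }
    } else {
        // a plain scalar, the only case getValueAsString() handles directly
        out.write(indent + e.name() + " = " + e.getValueAsString() + "\n");
    }
}
Calling dump(fieldData, "", out) would have printed the whole nested structure, including the SEQUENCE levels that made the direct getValueAsString() call fail.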

Element element = msg.getElement("securityData");
for (int i = 0; i < element.numValues(); i++)
{
    Element security = element.getValueAsElement(i); // e.g. DJI INDEX
    Element fields = security.getElement("fieldData"); // e.g. INDX_MEMBERS
    for (int j = 0; j < fields.numElements(); j++)
    {
        Element field = fields.getElement(j); // a list of members
        for (int k = 0; k < field.numValues(); k++)
        {
            System.out.println(field.getValueAsElement(k)); // print member name
        }
    }
}

It sounds like you are trying to print the value of an input field element?
If so, then try:
out.write(field.getAttribute("value"));

For your second question, check out:
http://download.oracle.com/javase/1.4.2/docs/api/java/lang/Class.html
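For instance, a minimal reflection sketch (the Inspector class name here is just for illustration) that prints an object's declared fields and methods to the screen:
import java.lang.reflect.Field;
import java.lang.reflect.Method;

public class Inspector {
    // Print the declared fields (static and instance) and methods of an object's class.
    public static void inspect(Object obj) {
        Class<?> cls = obj.getClass();
        System.out.println("Class: " + cls.getName());
        for (Field f : cls.getDeclaredFields()) {
            System.out.println("  field:  " + f);
        }
        for (Method m : cls.getDeclaredMethods()) {
            System.out.println("  method: " + m);
        }
    }
}
Calling Inspector.inspect(field) on one of the Element objects above would have listed getValueAsElement, numValues, and the rest of the accessors.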

Related

Stop Bullet number to be updated automatically when merging word docs using docx4j

I am trying to merge 2 docx files, each of which has its own bullet numbering. After merging the docs, the bullet numbers are automatically updated.
E.g.:
Doc A has 1 2 3
Doc B has 1 2 3
After merging, the bullet numbering is updated to 1 2 3 4 5 6.
How do I stop this?
I am using the following code:
if(counter==1)
{
FirstFileByteStream = org.apache.commons.codec.binary.Base64.decodeBase64(strFileData.getBytes());
FirstFileIS = new java.io.ByteArrayInputStream(FirstFileByteStream);
FirstWordFile = org.docx4j.openpackaging.packages.WordprocessingMLPackage.load(FirstFileIS);
main = FirstWordFile.getMainDocumentPart();
//Add page break for Table of Content
main.addObject(objBr);
if (htmlCode != null) {
main.addAltChunk(org.docx4j.openpackaging.parts.WordprocessingML.AltChunkType.Html,htmlCode.toString().getBytes());
}
//Table of contents - End
}
else
{
FileByteStream = org.apache.commons.codec.binary.Base64.decodeBase64(strFileData.getBytes());
FileIS = new java.io.ByteArrayInputStream(FileByteStream);
byte[] bytes = IOUtils.toByteArray(FileIS);
AlternativeFormatInputPart afiPart = new AlternativeFormatInputPart(new PartName("/part" + (chunkCount++) + ".docx"));
afiPart.setContentType(new ContentType(CONTENT_TYPE));
afiPart.setBinaryData(bytes);
Relationship altChunkRel = main.addTargetPart(afiPart);
CTAltChunk chunk = Context.getWmlObjectFactory().createCTAltChunk();
chunk.setId(altChunkRel.getId());
main.addObject(objBr);
htmlCode = new StringBuilder();
htmlCode.append("<html>");
htmlCode.append("<h2><br/><br/><br/><br/><br/><br/><br/><br/><br/><br/><br/><p style=\"font-family:'Arial Black'; color: #f35b1c\">"+ReqName+"</p></h2>");
htmlCode.append("</html>");
if (htmlCode != null) {
main.addAltChunk(org.docx4j.openpackaging.parts.WordprocessingML.AltChunkType.Html,htmlCode.toString().getBytes());
}
//Add Page Break before new content
main.addObject(objBr);
//Add new content
main.addObject(chunk);
}
Looking at your code, you are adding HTML altChunks to your document.
For these to display in Word, the HTML is converted to normal docx content.
An altChunk is usually converted by Word when you open the docx.
(Alternatively, docx4j-ImportXHTML can do it for an altChunk of type XHTML.)
The upshot is that what happens with the bullets (when Word converts your HTML) is largely outside your control. You could experiment with CSS, but I think Word will mostly ignore it.
An alternative may be to use XHTML altChunks, and have docx4j-ImportXHTML convert them, via main.convertAltChunks().
If the same problem occurs when you try that, well, at least we can address it.
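A minimal sketch of that alternative, assuming docx4j with the docx4j-ImportXHTML module on the classpath (verify AltChunkType.Xhtml and convertAltChunks() against your docx4j version):
// Add the chunk as XHTML rather than HTML, then convert it with docx4j itself,
// so the list numbering is produced by docx4j instead of by Word on open.
main.addAltChunk(org.docx4j.openpackaging.parts.WordprocessingML.AltChunkType.Xhtml,
        htmlCode.toString().getBytes());
WordprocessingMLPackage converted = main.convertAltChunks();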
I was able to fix my issue using the following code, which I found at http://webapp.docx4java.org/OnlineDemo/forms/upload_MergeDocx.xhtml. You can also generate your own custom code; they have a nice demo that generates code according to your requirements. :)
public final static String DIR_IN = System.getProperty("user.dir")+ "/";
public final static String DIR_OUT = System.getProperty("user.dir")+ "/";
public static void main(String[] args) throws Exception
{
String[] files = {"part1docx_20200717t173750539gmt.docx", "part1docx_20200717t173750539gmt (1).docx", "part1docx_20200717t173750539gmt.docx"};
List<BlockRange> blockRanges = new ArrayList<>();
for (int i=0 ; i< files.length; i++) {
BlockRange block = new BlockRange(WordprocessingMLPackage.load(new File(DIR_IN + files[i])));
blockRanges.add( block );
block.setStyleHandler(StyleHandler.RENAME_RETAIN);
block.setNumberingHandler(NumberingHandler.ADD_NEW_LIST);
block.setRestartPageNumbering(false);
block.setHeaderBehaviour(HfBehaviour.DEFAULT);
block.setFooterBehaviour(HfBehaviour.DEFAULT);
block.setSectionBreakBefore(SectionBreakBefore.NEXT_PAGE);
}
// Perform the actual merge
DocumentBuilder documentBuilder = new DocumentBuilder();
WordprocessingMLPackage output = documentBuilder.buildOpenDocument(blockRanges);
// Save the result
SaveToZipFile saver = new SaveToZipFile(output);
saver.save(DIR_OUT+"OUT_MergeWholeDocumentsUsingBlockRange.docx");
}

Implementing language locales into array to be used in a loop

I'm trying to read every file in a directory, clean it up with java.util.Locale, then write it to a new directory. The reading and writing methods work; Locale.SPANISH might be the issue, as I have read in other posts.
I iterated through the available languages in java.util.Locale; Spanish was in there.
First, the array issue: the following extract of code is the long way of entering the Locale.(LANGUAGE) names into the array. This seems to work fine. However, I can't understand why the 'short' way doesn't seem to work.
String[] languageLocale = new String[fileArray.length];
languageLocale[0] = "Locale.ENGLISH";
languageLocale[1] = "Locale.FRENCH";
languageLocale[2] = "Locale.GERMAN";
languageLocale[3] = "Locale.ITALIAN";
languageLocale[4] = "Locale.SPANISH";
The short way:
String[] languageLocale = new String[("Locale.ENGLISH" , "Locale.FRENCH" , "Locale.GERMAN" , "Locale.ITALIAN" , "Locale.SPANISH")];
I need to put the Locale.(LANGUAGE) names into strings so they can be used in the following:
File file = new File("\\LanguageGuessing5.0\\Learning\\");
File[] fileArray = file.listFiles();
ArrayList<String> words = new ArrayList<String>();
for (int i = 0; i < fileArray.length; i++) {
if (fileArray[i].isFile()) {
if (fileArray[i].isHidden()) {
continue;
} else {
String content = readUTF8File("\\LanguageGuessing5.0\\Learning\\"+fileArray[i].getName());
words = extractWords(content, languageLocale[i]);
outputWordsToUTF8File("\\LanguageGuessing5.0\\Model\\"+ fileArray[i].getName() + "out.txt", words);
}
} else if (fileArray[i].isDirectory()) {
System.out.println("Directory " + fileArray[i].getName());
}
}
The following method call:
words = extractWords(content, languageLocale[i]);
also presents the following error:
The method extractWords(String, Locale) in the type CleaningText(the class name) is not applicable for the arguments (String, String)
My understanding was that while the array element is not a Locale, the string holds the correct text to make it valid. I'm clearly incorrect; I'm hoping someone could explain how this works.
The input types of the methods are below for context:
public static String readUTF8File(String filePath)
public static ArrayList extractWords(String inputText, Locale currentLocale)
public static void outputWordsToUTF8File(String filePath, ArrayList wordList)
Many thanks in advance
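For what it's worth: a String such as "Locale.ENGLISH" is just text naming the constant, not the constant itself, so it can never satisfy a Locale parameter. A minimal sketch of the short form, an array initializer holding the Locale constants directly:
import java.util.Locale;

// The 'short way' is an array initializer. The elements are the Locale
// constants themselves, not strings naming them, so a call like
// extractWords(content, locales[i]) type-checks against extractWords(String, Locale).
Locale[] locales = { Locale.ENGLISH, Locale.FRENCH, Locale.GERMAN,
                     Locale.ITALIAN, Locale.SPANISH };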

ArrayList<String> in PDF from a new row

I want to send a survey as a PDF from Java. I have tried different methods, with StringBuffer and without, but I always see the text in the PDF in one row.
public void writePdf(OutputStream outputStream) throws Exception {
Paragraph paragraph = new Paragraph();
Document document = new Document();
PdfWriter.getInstance(document, outputStream);
document.open();
document.addTitle("Survey PDF");
ArrayList nameArrays = new ArrayList();
StringBuffer sb = new StringBuffer();
int i = -1;
for (String properties : textService.getAnswer()) {
nameArrays.add(properties);
i++;
}
for (int a= 0; a<=i; a++){
System.out.println("nameArrays.get(a) -"+nameArrays.get(a));
sb.append(nameArrays.get(a));
}
paragraph.add(sb.toString());
document.add(paragraph);
document.close();
}
textService.getAnswer() is an ArrayList<String>.
Could you please advise how to separate the text so that each new sentence starts on a new row?
You forgot the newline character \n, and your code seems a bit overcomplicated.
Try this:
StringBuffer sb = new StringBuffer();
for (String property : textService.getAnswer()) {
sb.append(property);
sb.append('\n');
}
What about:
nameArrays.add(properties+"\n");
You might be able to fix that by simply appending "\n" to the strings that you are collecting in your list; but I think that very much depends on the PDF library you are using.
You see, "newlines" or "paragraphs" are to a certain degree about formatting. It seems like a conceptual problem to add that "formatting" information to the data that you are processing.
Meaning: you might want to check if your library allows you to provide plain strings, and then have the library do the formatting for you!
In other words: instead of giving strings with newlines, you should check if you can keep using strings without newlines, and whether there is a way to have the PDF library add line breaks where appropriate.
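With iText, which the question's code is already using, that amounts to adding each string as its own Paragraph and letting the library handle the line breaks. For example:
for (String answer : textService.getAnswer()) {
    document.add(new Paragraph(answer)); // each Paragraph starts on a new line
}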
Side note on code quality: you are using raw types:
ArrayList nameArrays = new ArrayList();
should better be
ArrayList<String> names = new ArrayList<>();
[ I also changed the name - there is no point in putting the type of a collection into the variable name! ]
This method saves the values in an ArrayList to a PDF document. In the mFilePath variable, the "/" is where you can give a folder name, for example "/example/".
For the mFileName variable you can use any name; I use the date and time the document was created. Don't use a static name, otherwise your values will be overwritten in the same PDF.
private void savePDF()
{
com.itextpdf.text.Document mDoc = new com.itextpdf.text.Document();
String mFileName = new SimpleDateFormat("yyyy-MM-dd-HH-mm-ss", Locale.getDefault()).format(System.currentTimeMillis());
String mFilePath = Environment.getExternalStorageDirectory() + "/" + mFileName + ".pdf";
try
{
PdfWriter.getInstance(mDoc, new FileOutputStream(mFilePath));
mDoc.open();
for (int d = 0; d < answers.size(); d++)
{
String mtext = answers.get(d);
mDoc.add(new Paragraph(mtext));
}
mDoc.close();
}
catch (Exception e)
{
e.printStackTrace(); // don't swallow the failure silently
}
}

How to keep Lucene index without deleted documents

This is my first question on Stack Overflow, so wish me luck.
I am doing a classification process over a Lucene index with Java, and I need to update a document field named category. I have been using Lucene 4.2 with the IndexWriter updateDocument() function for that purpose, and it is working very well, except for the deletion part. Even if I use the forceMergeDeletes() function after the update, the index shows me some already-deleted documents. For example, if I run the classification over an index with 1000 documents, the final number of documents in the index remains the same and works as expected; but when I increase the index to 10000 documents, the index shows some already-deleted documents, though not all. So, how can I actually erase those deleted documents from the index?
Here is some snippets of my code:
public static void main(String[] args) throws IOException, ParseException {
///////////////////////Preparing config data////////////////////////////
File indexDir = new File("/indexDir");
Directory fsDir = FSDirectory.open(indexDir);
IndexWriterConfig iwConf = new IndexWriterConfig(Version.LUCENE_42, new WhitespaceSpanishAnalyzer());
iwConf.setOpenMode(IndexWriterConfig.OpenMode.APPEND);
IndexWriter indexWriter = new IndexWriter(fsDir, iwConf);
IndexReader reader = DirectoryReader.open(fsDir);
IndexSearcher indexSearcher = new IndexSearcher(reader);
KNearestNeighborClassifier classifier = new KNearestNeighborClassifier(100);
AtomicReader ar = new SlowCompositeReaderWrapper((CompositeReader) reader);
classifier.train(ar, "text", "category", new WhitespaceSpanishAnalyzer());
System.out.println("***Before***");
showIndexedDocuments(reader);
System.out.println("***Before***");
int maxdoc = reader.maxDoc();
int j = 0;
for (int i = 0; i < maxdoc; i++) {
Document doc = reader.document(i);
String clusterClasif = doc.get("category");
String text = doc.get("text");
String docid = doc.get("doc_id");
ClassificationResult<BytesRef> result = classifier.assignClass(text);
String classified = result.getAssignedClass().utf8ToString();
if (!classified.isEmpty() && clusterClasif.compareTo(classified) != 0) {
Term term = new Term("doc_id", docid);
doc.removeField("category");
doc.add(new StringField("category",
classified, Field.Store.YES));
indexWriter.updateDocument(term,doc);
j++;
}
}
indexWriter.forceMergeDeletes(true);
indexWriter.close();
System.out.println("Classified documents count: " + j);
System.out.println();
reader.close();
reader = DirectoryReader.open(fsDir);
System.out.println("Deleted docs: " + reader.numDeletedDocs());
System.out.println("***After***");
showIndexedDocuments(reader);
}
private static void showIndexedDocuments(IndexReader reader) throws IOException {
int maxdoc = reader.maxDoc();
for (int i = 0; i < maxdoc; i++) {
Document doc = reader.document(i);
String idDoc = doc.get("doc_id");
String text = doc.get("text");
String category = doc.get("category");
System.out.println("Id Doc: " + idDoc);
System.out.println("Category: " + category);
System.out.println("Text: " + text);
System.out.println();
}
System.out.println("Total: " + maxdoc);
}
I have spent many hours looking for a solution to this. Some say that the deleted documents in the index are not important and that eventually they will be erased as we keep adding documents to the index, but I need to control that process in such a way that I can iterate over the index documents at any time and the documents I retrieve are actually the live ones. Lucene versions previous to 4.0 had a function in the IndexReader class named isDeleted(docId) that tells whether a document has been marked as deleted; that could be just half of the solution to my problem, but I have not found a way to do it with version 4.2 of Lucene. If you know how to do that, I would really appreciate it if you shared it.
You can check whether a document is deleted via the MultiFields class, like:
Bits liveDocs = MultiFields.getLiveDocs(reader);
if (!liveDocs.get(docID)) ...
So, working this into your code, perhaps something like:
int maxdoc = reader.maxDoc();
Bits liveDocs = MultiFields.getLiveDocs(reader); // null when the index has no deletions
for (int i = 0; i < maxdoc; i++) {
    if (liveDocs != null && !liveDocs.get(i)) continue; // skip deleted docs
    Document doc = reader.document(i);
    String idDoc = doc.get("doc_id");
    ....
}
By the way, it sounds like you have previously been working with 3.X and are now on 4.X. The Lucene Migration Guide is very helpful for understanding these sorts of changes between versions, and how to resolve them.

Flat file parsing with Java

I want to parse the WMI output below into a HashMap as key-value pairs using Java. Please give me suggestions.
My WMI output contains 2 rows with multiple columns; the first row is the header and the second row contains the data. I want either a regex or any other approach to pair each header with its corresponding data as a key and value for the HashMap.
I have no idea how to proceed...
Caption Description IdentifyingNumber Name
Computer System Product Computer System Product HP xw4600 Workstation
The parsed output should look like:
Key = Value
Caption = Computer System Product
Description = Computer System Product
IdentifyingNumber =
Name = HP xw4600 Workstation
If your file format is always the same, you can easily use a parser:
BufferedReader in = new BufferedReader(new InputStreamReader(new FileInputStream(filename)));
String[] array = new String[2];
// skip the leading lines before the header row (drop this loop if your output starts at the header)
for (int i = 0; i < 2; i++)
{
    in.readLine();
}
for (int i = 0; i < array.length; i++)
{
    array[i] = in.readLine();
    if (array[i] == null)
    {
        array[i] = ""; //$NON-NLS-1$
    }
}
in.close();
String[] headers = array[0].split(Pattern.quote("\t")); //$NON-NLS-1$
String[] values = array[1].split(Pattern.quote("\t")); //$NON-NLS-1$
And then run through both arrays, filling the HashMap.
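A minimal sketch of that last step, using the headers and values arrays from the snippet above:
Map<String, String> wmi = new HashMap<>();
for (int i = 0; i < headers.length; i++) {
    // guard against a short data row, e.g. the empty IdentifyingNumber column
    wmi.put(headers[i], i < values.length ? values[i] : "");
}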
I formatted the WMI output as a list, and now it's easy to format the output.
