Classify tweets in Java using Weka

I have some tweets on which I want to do sentiment analysis. I fetched the tweets with Twitter4J, then decided to use the Weka libraries for methods like k-means, Naive Bayes, SVM, etc.
First, I moved the tweets into a text file by hand and wrote their classes myself. This is my training data. In my code I read this file and tried to train and test my model, but I got the error:
Exception in thread "main" weka.core.UnsupportedAttributeTypeException: Cannot handle string attributes!
To fix it I used the StringToWordVector filter, but it didn't work either. Here is my code:
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import weka.classifiers.Classifier;
import weka.classifiers.Evaluation;
import weka.classifiers.bayes.NaiveBayes;
import weka.classifiers.lazy.IBk;
import weka.classifiers.meta.FilteredClassifier;
import weka.core.Attribute;
import weka.core.FastVector;
import weka.core.Instance;
import weka.core.Instances;
import weka.filters.unsupervised.attribute.StringToWordVector;
public class Driver {

    public static BufferedReader readDataFile(String filename) {
        BufferedReader inputReader = null;
        try {
            inputReader = new BufferedReader(new FileReader(filename));
        } catch (FileNotFoundException ex) {
            System.err.println("File not found: " + filename);
        }
        return inputReader;
    }

    public static void main(String[] args) throws Exception {
        BufferedReader datafile = readDataFile("file.txt");
        Instances data = new Instances(datafile);
        data.setClassIndex(data.numAttributes() - 1);

        FilteredClassifier fc = new FilteredClassifier();

        Classifier cModel = (Classifier) new IBk();
        cModel.buildClassifier(data);   // the exception is thrown here, on the raw string data

        StringToWordVector swv = new StringToWordVector();
        fc.setFilter(swv);
        fc.setClassifier(cModel);

        // Test the model
        Evaluation eTest = new Evaluation(data);
        eTest.evaluateModel(cModel, data);

        // Print the result à la Weka explorer:
        String strSummary = eTest.toSummaryString();
        System.out.println(strSummary);

        // Get the confusion matrix
        double[][] cmMatrix = eTest.confusionMatrix();
        for (int row_i = 0; row_i < cmMatrix.length; row_i++) {
            for (int col_i = 0; col_i < cmMatrix.length; col_i++) {
                System.out.print(cmMatrix[row_i][col_i]);
                System.out.print("|");
            }
            System.out.println();
        }
    }
}
Here is my file.txt:
@relation twitter
@attribute tweetMsg string
@attribute class {positive,negative,neutral}
@data
"bugün hava çok güzel",positive
"hiç iyi hissetmiyorum",negative
"hayat çok normal",neutral
"Diriliş Ertuğrul izlerken her türlü kumpasın döndüğünü görmek ama günün birinde Osmanlı Beyliği' nin kurulacağını bilmenin huzuru ?",positive
"Diriliş Ertuğrul dizisi ile tarihe merakim arttı ??",positive
"Kanka moralim bozuk diyorum boşver kanka gel diriliş ertuğrul izleyelim diyor yemin ederim kanka gibi kanka .",positive
"Diriliş Ertuğrul beni son zamanlarda futbol dışında TVde tutan tek yapım kurgusu, görseli süper",positive
"#kösemsultan Osmanlının gerçek yüzünü çıkardıkları için mi hoşunuza gitmiyor Diriliş Ertuğrul saçmalığın alası hadi onuda şikayet edin!!!",negative
"Benim için LeylaileMecnun neyse abim için Diriliş Ertuğrul da o.",neutral
"#MutlulukNeDiyeSorsalar diriliş Ertuğrul izlemek derim",positive
"beyler muhteşem yüz yıl kösemi izliyorum da diriliş ertuğrul bu diziye 10 takar. saray saray değil kadınlar hamamı sanki.",positive
"Diriliş Ertuğrul diziside ne boktan bir senaryo arkadaş. Herif 4 bölümde bir hain ilan edilip sonra obaya geri geliyor sonra yine hain :):)",negative
"Diriliş Ertuğrul izlemekten babama beyim dedim amk",neutral
"Diriliş ertuğrul haric bütün Türk dizileri saçmalik broo",positive
However, these tweets are in Turkish. So, do you think I am going about this the right way, or should I do something more complicated, like stemming the words first?
Any help with my questions will be appreciated.

Read the error message:
Cannot handle string attributes!
It obviously refers to this line:
@attribute tweetMsg string
The IBk classifier does not support string attributes.
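The usual fix is to train and evaluate the FilteredClassifier itself, so that StringToWordVector converts the string attribute into word-count attributes before IBk ever sees them. A minimal sketch, assuming the same file.txt and class-index setup as above:

FilteredClassifier fc = new FilteredClassifier();
fc.setFilter(new StringToWordVector());
fc.setClassifier(new IBk());
fc.buildClassifier(data);           // the filter is applied internally before IBk trains

// Evaluate the filtered classifier, not the inner IBk
Evaluation eTest = new Evaluation(data);
eTest.evaluateModel(fc, data);
System.out.println(eTest.toSummaryString());

Note that evaluating on the training data, as the original code does, gives optimistic numbers; Evaluation.crossValidateModel is a fairer check.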

Related

Set up URI or catalog resolver with Saxon/XQuery

I am developing a simple command-line application in Java to mine data from a large XML data set (15,000+ XML files). I have chosen to use the Saxon S9API as the XQuery processor for this. Everything works fine so long as there is open access to the Internet, where the parser used by Saxon can resolve the xsi:noNamespaceSchemaLocation URI (or any other, I assume).
I have scoured Stack Overflow, as well as Google in general, for answers on how to provide a catalog to the XQuery processor, but I have not found a good explanation of how to do so.
This is the simple code I have at this point, which, as I stated, works fine when there is open access to the Internet:
package ipd.part.info.mining.app;
import java.io.File;
import java.util.List;
import java.util.Scanner;
import java.util.logging.Level;
import java.util.logging.Logger;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import net.sf.saxon.Configuration;
import net.sf.saxon.TransformerFactoryImpl;
import net.sf.saxon.s9api.DOMDestination;
import net.sf.saxon.s9api.Processor;
import net.sf.saxon.s9api.QName;
import net.sf.saxon.s9api.SaxonApiException;
import net.sf.saxon.s9api.XQueryCompiler;
import net.sf.saxon.s9api.XQueryEvaluator;
import net.sf.saxon.s9api.XQueryExecutable;
import net.sf.saxon.s9api.XdmAtomicValue;
import net.sf.saxon.lib.*;
import static org.apache.xerces.jaxp.JAXPConstants.JAXP_SCHEMA_LANGUAGE;
import static org.apache.xerces.jaxp.JAXPConstants.W3C_XML_SCHEMA;
import org.apache.xerces.util.XMLCatalogResolver;
import org.apache.xml.resolver.tools.CatalogResolver;
import org.w3c.dom.Document;
import org.xml.sax.ErrorHandler;
/**
 * @author tfurst
 */
public class IPDPartInfoMiningApp {

    /**
     * @param args the command line arguments
     */
    private static Scanner scanner = new Scanner(System.in);
    private static String ietmPath;
    private static String outputPath;
    private static CatalogResolver resolver;
    private static org.apache.xerces.util.XMLCatalogResolver xres;
    private static ErrorHandler eHandler;
    private static DocumentBuilderFactory DBF;
    private static DocumentBuilder DB;

    public static void main(String[] args) {
        initDb();
        try {
            System.out.println("Enter path to complete IETM Export:");
            ietmPath = scanner.nextLine();
            System.out.println("Enter path to save report:");
            outputPath = scanner.nextLine();

            Processor proc = new Processor(true);
            XQueryCompiler comp = proc.newXQueryCompiler();
            //File xq = fixXquery(new File(XQ));
            //XQueryExecutable exp = comp.compile(xq);
            XQueryExecutable exp = comp.compile("declare variable $path external;\n" +
                    "\n" +
                    "let $coll := collection(concat($path,'?select=*.xml'))//itemSequenceNumber \n" +
                    "\n" +
                    "return\n" +
                    "<parts>\n" +
                    "{\n" +
                    "  for $mod in $coll\n" +
                    "  let $pn := normalize-space($mod/partNumber)\n" +
                    "  let $nomen := $mod/partIdentSegment[1]/descrForPart\n" +
                    "  let $smr := $mod/locationRcmdSegment/locationRcmd/sourceMaintRecoverability\n" +
                    "  order by $pn\n" +
                    "  return <part pn=\"{$pn}\" nomen=\"{$nomen}\" smr=\"{$smr}\"/>\n" +
                    "}\n" +
                    "</parts>");
            //Serializer out = proc.newSerializer(System.out);
            Document dom = DB.newDocument();
            XQueryEvaluator ev = exp.load();
            ev.setExternalVariable(new QName("path"), new XdmAtomicValue(
                    new File(ietmPath).toPath().toUri().toString().substring(0,
                            new File(ietmPath).toPath().toUri().toString().lastIndexOf("/"))));
            ev.run(new DOMDestination(dom));

            TransformerFactoryImpl tfact = new net.sf.saxon.TransformerFactoryImpl();
            Transformer trans = tfact.newTransformer();
            DOMSource src = new DOMSource(dom);
            StreamResult res = new StreamResult(new File(outputPath + File.separator + "output.xml"));
            trans.transform(src, res);
        } catch (SaxonApiException | TransformerException ex) {
            Logger.getLogger(IPDPartInfoMiningApp.class.getName()).log(Level.SEVERE, null, ex);
        }
    }

    private static XMLCatalogResolver createXMLCatalogResolver(CatalogResolver resolver) {
        int i = 0;
        List files = resolver.getCatalog().getCatalogManager().getCatalogFiles();
        String[] catalogs = new String[files.size()];
        XMLCatalogResolver xcr = new XMLCatalogResolver();
        for (Object file : files) {
            catalogs[i++] = new File(file.toString()).getAbsolutePath();
        }
        xcr.setCatalogList(catalogs);
        return xcr;
    }

    private static void initDb() {
        try {
            resolver = new CatalogResolver();
            eHandler = new DocumentErrorHandler();
            xres = createXMLCatalogResolver(resolver);
            DBF = DocumentBuilderFactory.newInstance();
            DBF.setAttribute(JAXP_SCHEMA_LANGUAGE, W3C_XML_SCHEMA);
            DBF.setNamespaceAware(true);
            DB = DBF.newDocumentBuilder();
            DB.setEntityResolver(xres);
            DB.setErrorHandler(eHandler);
        } catch (ParserConfigurationException ex) {
            ex.printStackTrace();
        }
    }
}
I am receiving this error when I disconnect my machine from the network:
C:\Users\tfurst\Desktop\XQuery Test\testXml\test\tool>java -jar IPD_Part_Info_Mining_App.jar
Enter path to complete IETM Export:
C:\Users\tfurst\Desktop\Wire Repl Testing
Enter path to save report:
C:\Users\tfurst\Desktop\Wire Repl Testing\report
Error on line 6 column 2
collection(): failed to parse XML file
file:/C:/Users/tfurst/Desktop/Wire%20Repl%20Testing/DMC-HH60W-A-52-21-0001-04AAA-520A-B.xml: I/O error reported by XML parser processing file:/C:/Users/tfurst/Desktop/Wire%20Repl%20Testing/DMC-HH60W-A-52-21-0001-04AAA-520A-B.xml: Read timed out
Aug 20, 2019 2:55:23 PM ipd.part.info.mining.app.IPDPartInfoMiningApp main
SEVERE: null
net.sf.saxon.s9api.SaxonApiException: collection(): failed to parse XML file file:/C:/Users/tfurst/Desktop/Wire%20Repl%20Testing/DMC-HH60W-A-52-21-0001-04AAA-520A-B.xml: I/O error reported by XML parser processing file:/C:/Users/tfurst/Desktop/Wire%20Repl%20Testing/DMC-HH60W-A-52-21-0001-04AAA-520A-B.xml: Read timed out
at net.sf.saxon.s9api.XQueryEvaluator.run(XQueryEvaluator.java:372)
at ipd.part.info.mining.app.IPDPartInfoMiningApp.main(IPDPartInfoMiningApp.java:80)
Caused by: net.sf.saxon.trans.XPathException: collection(): failed to parse XML file file:/C:/Users/tfurst/Desktop/Wire%20Repl%20Testing/DMC-HH60W-A-52-21-0001-04AAA-520A-B.xml: I/O error reported by XML parser processing file:/C:/Users/tfurst/Desktop/Wire%20Repl%20Testing/DMC-HH60W-A-52-21-0001-04AAA-520A-B.xml: Read timed out
at net.sf.saxon.resource.XmlResource.getItem(XmlResource.java:113)
at net.sf.saxon.functions.CollectionFn$2.mapItem(CollectionFn.java:246)
at net.sf.saxon.expr.ItemMappingIterator.next(ItemMappingIterator.java:113)
at net.sf.saxon.expr.ItemMappingIterator.next(ItemMappingIterator.java:108)
at net.sf.saxon.expr.ItemMappingIterator.next(ItemMappingIterator.java:108)
at net.sf.saxon.om.FocusTrackingIterator.next(FocusTrackingIterator.java:85)
at net.sf.saxon.expr.ContextMappingIterator.next(ContextMappingIterator.java:59)
at net.sf.saxon.expr.sort.DocumentOrderIterator.<init>(DocumentOrderIterator.java:47)
at net.sf.saxon.expr.sort.DocumentSorter.iterate(DocumentSorter.java:230)
at net.sf.saxon.expr.flwor.ForClausePush.processTuple(ForClausePush.java:34)
at net.sf.saxon.expr.flwor.FLWORExpression.process(FLWORExpression.java:841)
at net.sf.saxon.expr.instruct.ElementCreator.processLeavingTail(ElementCreator.java:337)
at net.sf.saxon.expr.instruct.ElementCreator.processLeavingTail(ElementCreator.java:284)
at net.sf.saxon.expr.instruct.Instruction.process(Instruction.java:151)
at net.sf.saxon.query.XQueryExpression.run(XQueryExpression.java:411)
at net.sf.saxon.s9api.XQueryEvaluator.run(XQueryEvaluator.java:370)
... 1 more
C:\Users\tfurst\Desktop\XQuery Test\testXml\test\tool>pause
Press any key to continue . . .
I am sure this is probably a relatively simple fix, most likely something I have overlooked. I know how to handle this when working with XSL transformations, by supplying a catalog and the location of the schemas. Thanks in advance for any help; much appreciated.
To use an XML catalog file, something like the following in your code should work:
Processor proc = new Processor(false); //false for Saxon-HE
XQueryCompiler compiler = proc.newXQueryCompiler();
XmlCatalogResolver.setCatalog("path/catalog.xml", proc.getUnderlyingConfiguration(), false);
...
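Here catalog.xml would be a standard OASIS XML catalog that maps the remote schema locations to local copies. The URIs and paths below are placeholders for illustration only:

<?xml version="1.0" encoding="UTF-8"?>
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
    <!-- redirect a remote schema reference to a local copy -->
    <system systemId="http://www.example.com/schemas/ietm.xsd" uri="schemas/ietm.xsd"/>
    <uri name="http://www.example.com/schemas/ietm.xsd" uri="schemas/ietm.xsd"/>
</catalog>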

How to create an attribute in Weka

I'm working on a data mining project using WEKA in Java, and the instructions say that I have to create an Attribute object for each attribute in the dataset and add them to a FastVector. I tried to look at the API, but I don't think I'm doing it right. Can someone show me the right way to do it? I'm using the iris.arff file.
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import weka.core.Attribute;
import weka.core.FastVector;
import weka.core.Instances;
import weka.core.converters.ArffSaver;
public class StartWeka {

    public static void main(String[] args) throws Exception {
        Instances dataset = new Instances(new BufferedReader(
                new FileReader("C:/Users/Student/workspace/Data Mining/src/iris.arff.txt")));
        Instances train = new Instances(dataset);
        train.setClassIndex(train.numAttributes() - 1);
        System.out.println(dataset.toSummaryString());

        Attribute a1 = new Attribute("sepallength", 0);
        Attribute a2 = new Attribute("sepalwidth", 1);
        Attribute a3 = new Attribute("petalwidth", 2);

        FastVector attrs = new FastVector();
        attrs.addElement(a1);
    }
}
FastVector is deprecated. You can use an ArrayList instead.
If you use an ARFF file, however, you don't have to do any of that. You can just do the following:
ArffLoader loader = new ArffLoader();
loader.setFile(new File("iris.arff"));
Instances structure = loader.getStructure();
structure.setClassIndex(structure.numAttributes() - 1);
From here, you can create a classifier based on your instances (structure).
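If you do need to build the attributes by hand (for example, to create a dataset from scratch rather than from an ARFF file), a minimal sketch with the non-deprecated ArrayList API might look like this, assuming imports of java.util.ArrayList, java.util.Arrays, weka.core.Attribute and weka.core.Instances:

// Numeric attributes take just a name; nominal ones take a name plus the allowed values.
ArrayList<Attribute> attrs = new ArrayList<>();
attrs.add(new Attribute("sepallength"));
attrs.add(new Attribute("sepalwidth"));
attrs.add(new Attribute("petallength"));
attrs.add(new Attribute("petalwidth"));
attrs.add(new Attribute("class",
        new ArrayList<>(Arrays.asList("Iris-setosa", "Iris-versicolor", "Iris-virginica"))));

// Build an empty dataset from those attributes and mark the class attribute.
Instances dataset = new Instances("iris", attrs, 0);
dataset.setClassIndex(dataset.numAttributes() - 1);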

Jena Fuseki API add new data to an existing dataset [java]

I was trying to upload an RDF/OWL file to my SPARQL endpoint (provided by Fuseki). Right now I'm able to upload a single file, but if I try to repeat the action, the new dataset overrides the old one. I'm looking for a way to merge the content of the data already in the dataset with the new data from the RDF file just uploaded. Can anyone help me? Thanks.
Here is the code to upload to and query the endpoint (I'm not the author):
// Written in 2015 by Thilo Planz
// To the extent possible under law, I have dedicated all copyright and related and neighboring rights
// to this software to the public domain worldwide. This software is distributed without any warranty.
// http://creativecommons.org/publicdomain/zero/1.0/
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.ByteArrayOutputStream;
import org.apache.jena.query.DatasetAccessor;
import org.apache.jena.query.DatasetAccessorFactory;
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.QuerySolution;
import org.apache.jena.query.ResultSet;
import org.apache.jena.query.ResultSetFormatter;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.RDFNode;
class FusekiExample {

    public static void uploadRDF(File rdf, String serviceURI) throws IOException {
        // parse the file
        Model m = ModelFactory.createDefaultModel();
        try (FileInputStream in = new FileInputStream(rdf)) {
            m.read(in, null, "RDF/XML");
        }
        // upload the resulting model
        DatasetAccessor accessor = DatasetAccessorFactory.createHTTP(serviceURI);
        accessor.putModel(m);
    }

    public static void execSelectAndPrint(String serviceURI, String query) {
        QueryExecution q = QueryExecutionFactory.sparqlService(serviceURI, query);
        ResultSet results = q.execSelect();
        // write to a ByteArrayOutputStream
        ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
        // convert to JSON format
        ResultSetFormatter.outputAsJSON(outputStream, results);
        // turn the JSON into a string
        String json = new String(outputStream.toByteArray());
        // print the JSON string
        System.out.println(json);
    }

    public static void execSelectAndProcess(String serviceURI, String query) {
        QueryExecution q = QueryExecutionFactory.sparqlService(serviceURI, query);
        ResultSet results = q.execSelect();
        while (results.hasNext()) {
            QuerySolution soln = results.nextSolution();
            // assumes that you have an "?x" in your query
            RDFNode x = soln.get("x");
            System.out.println(x);
        }
    }

    public static void main(String argv[]) throws IOException {
        // uploadRDF(new File("test.rdf"), );
        uploadRDF(new File("test.rdf"), "http://localhost:3030/MyEndpoint/data");
    }
}
Use accessor.add(m) instead of putModel(m). As you can see in the Javadoc, putModel replaces the existing data.
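For example, the uploadRDF method above only needs its last line changed:

public static void uploadRDF(File rdf, String serviceURI) throws IOException {
    Model m = ModelFactory.createDefaultModel();
    try (FileInputStream in = new FileInputStream(rdf)) {
        m.read(in, null, "RDF/XML");
    }
    DatasetAccessor accessor = DatasetAccessorFactory.createHTTP(serviceURI);
    accessor.add(m);   // merges the new triples into the existing default graph
}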

unable to train location.bin using opennlp with java

I am trying to train an en-ner-location.bin file using OpenNLP in Java. The thing is, I got the training text file in the following format:
<START:location> Fontana <END>
<START:location> Palo Verde <END>
<START:location> Picacho <END>
and I trained the file using the following code:
import java.io.BufferedOutputStream;
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.FileReader;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.util.Collections;
import opennlp.tools.namefind.NameFinderME;
import opennlp.tools.namefind.NameSample;
import opennlp.tools.namefind.NameSampleDataStream;
import opennlp.tools.namefind.TokenNameFinderModel;
import opennlp.tools.tokenize.Tokenizer;
import opennlp.tools.tokenize.TokenizerME;
import opennlp.tools.tokenize.TokenizerModel;
import opennlp.tools.util.ObjectStream;
import opennlp.tools.util.PlainTextByLineStream;
import opennlp.tools.util.Span;
public class TrainNames {

    @SuppressWarnings("deprecation")
    public void TrainNames() throws IOException {
        File fileTrainer = new File("citytrain.txt");
        File output = new File("en-ner-location.bin");
        ObjectStream<String> lineStream = new PlainTextByLineStream(new FileInputStream(fileTrainer), "UTF-8");
        ObjectStream<NameSample> sampleStream = new NameSampleDataStream(lineStream);
        System.out.println("lineStream = " + lineStream);
        TokenNameFinderModel model = NameFinderME.train("en", "location", sampleStream,
                Collections.<String, Object>emptyMap(), 1, 0);
        BufferedOutputStream modelOut = null;
        try {
            modelOut = new BufferedOutputStream(new FileOutputStream(output));
            model.serialize(modelOut);
        } finally {
            if (modelOut != null)
                modelOut.close();
        }
    }
}
I got no errors or warnings, but when I try to get a city name from a string like cnt = "John is planning to specialize in Electrical Engineering in UC Fontana and pursue a career with IBM.";, it returns the whole string.
Could anybody tell me why?
Welcome to SO! It looks like you need more context around each location annotation. I believe that right now OpenNLP thinks you are training it to find words (any word at all), because each line of your training data contains only one word. You need to annotate locations within whole sentences, and you will need at least a few hundred samples to start seeing good results.
See this answer as well:
How I train an Named Entity Recognizer identifier in OpenNLP?
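For illustration only, training lines with full-sentence context (the sentences here are invented) would look something like:

John is planning to specialize in Electrical Engineering in UC <START:location> Fontana <END> .
We drove from <START:location> Palo Verde <END> to <START:location> Picacho <END> last summer .

Each line holds one tokenized sentence (note the space before the final period), so most tokens are ordinary words and only the annotated spans are locations.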

Merging MS Word documents with Java

I'm looking for Java libraries that read and write MS Word documents.
What I have to do is:
read a template file, .dot or .doc, and fill it with some data read from a DB
take data from another Word document and merge it with the file described above, preserving paragraph formats
users may make updates to the file.
I've searched and found Apache POI and OpenOffice UNO.
The first can easily read a template and replace any placeholders with my own data from the DB. I didn't find anything about merging two or more documents.
OpenOffice UNO looks more stable but also more complex. Furthermore, I'm not sure that it is able to merge documents.
Are we looking in the right direction?
Another solution I've thought of is to convert the doc file to docx; that way I found more libraries that can help with merging documents.
But how can I do that?
Thanks!
You could take a look at Docmosis since it provides the four features you have mentioned (data population, template/document merging, DOC format and java interface). It has a couple of flavours (download, online service), but you could sign up for a free trial of the cloud service to see if Docmosis can do what you want (then you don't have to install anything) or read the online documentation.
It uses OpenOffice under the hood (you can see from the developer guide installation instructions) which does pretty decent conversions between documents. The UNO API has some complications - I would suggest either Docmosis or JODReports to isolate your project from UNO directly.
Hope that helps.
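A docx4j-based alternative is shown below: it loads several .docx files, appends the body content of each one to the first document, and re-attaches the image parts referenced by a:blip elements so the copied references do not dangle.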
import java.io.File;
import java.util.List;
import javax.xml.bind.JAXBException;
import org.docx4j.dml.CTBlip;
import org.docx4j.openpackaging.exceptions.Docx4JException;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
import org.docx4j.openpackaging.parts.Part;
import org.docx4j.openpackaging.parts.PartName;
import org.docx4j.openpackaging.parts.WordprocessingML.ImageBmpPart;
import org.docx4j.openpackaging.parts.WordprocessingML.ImageEpsPart;
import org.docx4j.openpackaging.parts.WordprocessingML.ImageGifPart;
import org.docx4j.openpackaging.parts.WordprocessingML.ImageJpegPart;
import org.docx4j.openpackaging.parts.WordprocessingML.ImagePngPart;
import org.docx4j.openpackaging.parts.WordprocessingML.ImageTiffPart;
import org.docx4j.openpackaging.parts.relationships.RelationshipsPart;
import org.docx4j.openpackaging.parts.relationships.RelationshipsPart.AddPartBehaviour;
import org.docx4j.relationships.Relationship;
public class MultipleDocMerge {

    public static void main(String[] args) throws Docx4JException, JAXBException {
        File first = new File("D:\\Mreg.docx");
        File second = new File("D:\\Mreg1.docx");
        File third = new File("D:\\Mreg4&19.docx");
        File fourth = new File("D:\\test12.docx");
        WordprocessingMLPackage f = WordprocessingMLPackage.load(first);
        WordprocessingMLPackage s = WordprocessingMLPackage.load(second);
        WordprocessingMLPackage a = WordprocessingMLPackage.load(third);
        WordprocessingMLPackage e = WordprocessingMLPackage.load(fourth);

        List body = s.getMainDocumentPart().getJAXBNodesViaXPath("//w:body", false);
        for (Object b : body) {
            List filhos = ((org.docx4j.wml.Body) b).getContent();
            for (Object k : filhos)
                f.getMainDocumentPart().addObject(k);
        }
        List body1 = a.getMainDocumentPart().getJAXBNodesViaXPath("//w:body", false);
        for (Object b : body1) {
            List filhos = ((org.docx4j.wml.Body) b).getContent();
            for (Object k : filhos)
                f.getMainDocumentPart().addObject(k);
        }
        List body2 = e.getMainDocumentPart().getJAXBNodesViaXPath("//w:body", false);
        for (Object b : body2) {
            List filhos = ((org.docx4j.wml.Body) b).getContent();
            for (Object k : filhos)
                f.getMainDocumentPart().addObject(k);
        }

        List<Object> blips = e.getMainDocumentPart().getJAXBNodesViaXPath("//a:blip", false);
        for (Object el : blips) {
            try {
                CTBlip blip = (CTBlip) el;
                RelationshipsPart parts = e.getMainDocumentPart().getRelationshipsPart();
                Relationship rel = parts.getRelationshipByID(blip.getEmbed());
                Part part = parts.getPart(rel);
                if (part instanceof ImagePngPart)
                    System.out.println(((ImagePngPart) part).getBytes());
                if (part instanceof ImageJpegPart)
                    System.out.println(((ImageJpegPart) part).getBytes());
                if (part instanceof ImageBmpPart)
                    System.out.println(((ImageBmpPart) part).getBytes());
                if (part instanceof ImageGifPart)
                    System.out.println(((ImageGifPart) part).getBytes());
                if (part instanceof ImageEpsPart)
                    System.out.println(((ImageEpsPart) part).getBytes());
                if (part instanceof ImageTiffPart)
                    System.out.println(((ImageTiffPart) part).getBytes());
                Relationship newrel = f.getMainDocumentPart()
                        .addTargetPart(part, AddPartBehaviour.RENAME_IF_NAME_EXISTS);
                blip.setEmbed(newrel.getId());
                f.getMainDocumentPart().addTargetPart(
                        e.getParts().getParts().get(new PartName("/word/" + rel.getTarget())));
            } catch (Exception ex) {
                ex.printStackTrace();
            }
        }
        File saved = new File("D:\\saved1.docx");
        f.save(saved);
    }
}
I've developed the following class (using Apache POI):
import java.io.InputStream;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTBody;
public class WordMerge {

    private final OutputStream result;
    private final List<InputStream> inputs;
    private XWPFDocument first;

    public WordMerge(OutputStream result) {
        this.result = result;
        inputs = new ArrayList<>();
    }

    public void add(InputStream stream) throws Exception {
        inputs.add(stream);
        OPCPackage srcPackage = OPCPackage.open(stream);
        XWPFDocument src1Document = new XWPFDocument(srcPackage);
        if (inputs.size() == 1) {
            first = src1Document;
        } else {
            CTBody srcBody = src1Document.getDocument().getBody();
            first.getDocument().addNewBody().set(srcBody);
        }
    }

    public void doMerge() throws Exception {
        first.write(result);
    }

    public void close() throws Exception {
        result.flush();
        result.close();
        for (InputStream input : inputs) {
            input.close();
        }
    }
}
And its use:
public static void main(String[] args) throws Exception {
    FileOutputStream faos = new FileOutputStream("/home/victor/result.docx");
    WordMerge wm = new WordMerge(faos);
    wm.add(new FileInputStream("/home/victor/001.docx"));
    wm.add(new FileInputStream("/home/victor/002.docx"));
    wm.doMerge();
    wm.close();
}
The Apache POI code does not work for images: addNewBody().set(...) copies only the body XML, so the r:embed relationship references inside it point at image parts that were never copied into the target package.
