Add content that prevents page breaks inside - java

Using docx4j I'm adding multiple dynamically filled subTemplates to my main template.
I don't want to have page breaks inside those subTemplates (unless even a whole page is too small for one).
Therefore: If a subTemplate would break inside, I want to move the whole subTemplate to the next page.
How do I do this?
My code so far:
//...
WordprocessingMLPackage mainTemplate = getWp(); // ignore this method
List<WordprocessingMLPackage> projectTemplates = new ArrayList<>();
List<Project> projects = getProjects(); // ignore this method
for (Project project : projects) {
    WordprocessingMLPackage template = getWpProject(); // ignore this method
    // fill template with content from project
    //...
    projectTemplates.add(template);
}
// Here's the part that will have to be changed, I think:
// since the projectTemplate only consists of tables, I just added all its tables to the main template
for (WordprocessingMLPackage temp : projectTemplates) {
    List<Object> tables = doc.getAllElementFromObject(temp.getMainDocumentPart(), Tbl.class);
    for (Object table : tables) {
        mainTemplate.getMainDocumentPart().addObject(table);
    }
}
If you can think of a way to change the .docx template with Word to achieve my goal feel free to suggest it.
And if you have suggestions for code improvement in general just write a comment.

I made this "workaround" that works nicely for me:
I count the rows of all tables and also estimate how often the text inside the rows wraps (with an approximate threshold).
Then I add up the rows of each project, and as soon as there are too many rows for one page I insert a page break before the current project and start counting again.
final int maxRowCountPerPage = 44;
final int maxLettersPerLineInDescr = 55;
int totalRowCount = 0;
WordprocessingMLPackage mainTemplate = getWp();
// Iterate over projects
for (Project project : getProjects()) {
    WordprocessingMLPackage template = this.getWpProject();
    String projectDescription = project.getDescr();
    // Fill template...
    // Count the lines
    int rowsInProjectDescr = (int) Math.floor((double) projectDescription.length() / maxLettersPerLineInDescr);
    int projectRowCount = 0;
    List<Object> tables = doc.getAllElementFromObject(template.getMainDocumentPart(), Tbl.class);
    for (Object table : tables) {
        List<Object> rows = doc.getAllElementFromObject(table, Tr.class);
        int tableRowCount = rows.size();
        projectRowCount += tableRowCount;
    }
    //System.out.println("projectRowCount before desc:" + projectRowCount);
    projectRowCount += rowsInProjectDescr;
    //System.out.println("projectRowCount after desc:" + projectRowCount);
    totalRowCount += projectRowCount;
    //System.out.println("totalRowCount: " + totalRowCount);
    // Break page if too many lines for page
    if (totalRowCount > maxRowCountPerPage) {
        addPageBreak(mainTemplate); // the break has to go into the document the tables are added to
        totalRowCount = projectRowCount;
    }
    // Add project template to main template
    for (Object table : tables) {
        mainTemplate.getMainDocumentPart().addObject(table);
    }
}
If you notice a way to make the code nicer, let me know in a comment!
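The counting logic of this workaround can be separated from docx4j so it is easy to test. The following is only a sketch under assumptions: `PageBreakPlanner`, `breaksBefore` and `extraWrappedLines` are hypothetical names, and the row counts per project are assumed to have been computed already. Unlike the code above, it never puts a break before a block that starts on an empty page, so an oversized first project does not produce a blank first page.

```java
import java.util.ArrayList;
import java.util.List;

final class PageBreakPlanner {

    /**
     * Returns the indexes of the blocks (projects) that need a page break
     * before them. rowCounts.get(i) is the estimated row count of block i.
     */
    static List<Integer> breaksBefore(List<Integer> rowCounts, int maxRowsPerPage) {
        List<Integer> breaks = new ArrayList<>();
        int rowsOnPage = 0;
        for (int i = 0; i < rowCounts.size(); i++) {
            int rows = rowCounts.get(i);
            // Break only if the page already holds something; a block that is
            // larger than a whole page has to split anyway.
            if (rowsOnPage > 0 && rowsOnPage + rows > maxRowsPerPage) {
                breaks.add(i);
                rowsOnPage = 0;
            }
            rowsOnPage += rows;
        }
        return breaks;
    }

    /** Estimates how many additional lines a text of this length wraps into. */
    static int extraWrappedLines(int textLength, int maxLettersPerLine) {
        // Integer division floors for non-negative values, like Math.floor above.
        return textLength / maxLettersPerLine;
    }
}
```

With a limit of 44 rows, the row counts 20, 20, 20, 50, 10 would get breaks before the third, fourth and fifth block. The estimate stays a heuristic: it ignores font sizes, row heights and cell margins, so the limits need tuning per template.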

Related

Set items in table java web scraping

Okay, so my problem is this: I use web scraping to take some data from a web page (IMDb in this case). The data is movie titles; I already tried printing them to the console and that works fine. My problem is that I cannot get those titles into my table columns. I have included all the relevant code, but I can't find where I made a mistake and why the titles won't show in the table columns. In the end, the user needs to pick one title, and that title needs to be stored in a text field. Does anyone have an idea, please?
I have a table:
TableColumn izborAuta = new TableColumn("Izbor auta");
TableColumn lokacijaPreuzimanja = new TableColumn("Lokacija preuzimanja");
TableColumn lokacijaVracanja = new TableColumn("Lokacija Vracanja");
TableColumn cena = new TableColumn("Cena");
I have this code to set up the columns:
izborAuta.setCellValueFactory(new PropertyValueFactory<Vozila, String>("izborAuta"));
lokacijaPreuzimanja.setCellValueFactory(new PropertyValueFactory<Vozila, String>("lokacijaPreuzimanja"));
lokacijaVracanja.setCellValueFactory(new PropertyValueFactory<Vozila, String>("lokacijaVracanja"));
cena.setCellValueFactory(new PropertyValueFactory<Vozila, String>("cena"));
tableView.setItems(Baza.baza.prikazBaze());
tableView.getColumns().addAll(izborAuta, lokacijaPreuzimanja, lokacijaVracanja, cena);
I have this code so that when I pick an item from the table, it is copied into the text fields:
tableView.setOnMouseClicked((e) -> {
    Vozila v = (Vozila) tableView.getSelectionModel().getSelectedItem();
    txIzborAuta.setText(v.getIzborAuta());
    txLokacijaPreuzimanja.setText(v.getLokacijaPreuzimanja());
    txLokacijaVracanja.setText(v.getLokacijaVracanja());
    txCena.setText(v.getCena());
});
And at the end I use web scraping to save the items in the table:
Document doc = Jsoup.connect("https://www.imdb.com/chart/top?ref_=nv_mv_250").get();
Elements elems = doc.select("table.chart.full-width");
for (Element e : elems) {
    String izborAuta = e.select(".titleColumn").text();
    String lokacijaPreuzimanja = e.select(".titleColumn").text();
    String lokacijaVracanja = e.select(".titleColumn").text();
    String cena = e.select(".titleColumn").text();
    Vozila v = new Vozila();
    v.setIzborAuta(izborAuta);
    v.setLokacijaPreuzimanja(lokacijaPreuzimanja);
    v.setLokacijaVracanja(lokacijaVracanja);
    v.setCena(cena + " " + "RSD");
    Baza.insertVozila(v);
}

Cannot get text from table using docx4j

I'm simply trying to output the data found in tables, but so far I have only managed to print memory locations and other object info. I'm using a TableFinder to locate all of the tables in a Word doc and then traversing them, but I'm stuck on how to print the data contained in these tables. Below is a snippet of the code that reads my Test.docx. To be clear, I'm not sure whether I should be accessing a table row (Tr), as this snippet does, or the parent Tbl object to print the text contained in the table. In this case, I just want it to print "I", "Just", "Want"... etc.
WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.load(new File("C:\\Users\\1120248\\Test\\Test.docx"));
MainDocumentPart documentPart = wordMLPackage.getMainDocumentPart();
TableFinder finder = new TableFinder();
new TraversalUtil(documentPart.getContent(), finder);
System.out.println("Found " + finder.tblList.size() + " tables");
for (Object o : finder.tblList) {
    Object o2 = XmlUtils.unwrap(o);
    if (o2 instanceof org.docx4j.wml.Tbl) {
        Tbl tbl = (Tbl) o2;
        Tr t = (Tr) tbl.getContent().get(0);
        System.out.println(t.getContent());
        System.out.println(t.toString());
        System.out.println(XmlUtils.unwrap(t.getContent().get(0)));
    }
}
This is the output produced by this setup:
[javax.xml.bind.JAXBElement@a146b11, javax.xml.bind.JAXBElement@f438904, javax.xml.bind.JAXBElement@4ed5a1b0, javax.xml.bind.JAXBElement@18d003cd, javax.xml.bind.JAXBElement@3135bf25, javax.xml.bind.JAXBElement@22ad1bae]
org.docx4j.wml.Tr@4116f66a
org.docx4j.wml.Tc@59c04bee
This works for me:
TableFinder finder = new TableFinder();
finder.walkJAXBElements(documentPart.getContent());
For anyone else stuck on this question, and for visibility: the comment by @JasonPlutext is the answer. The hierarchy is Tr - Tc - P - R - Text, that is, table row, then table cell, then paragraph, then run, and the run holds the text.
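To make the Tr - Tc - P - R - Text nesting concrete, here is a minimal sketch that reads the same structure from raw WordprocessingML using only the JDK's DOM parser. It is not docx4j code: `TableTextSketch`, `cellTexts` and the `SAMPLE` fragment are hypothetical, and it assumes tables are not nested inside cells. In docx4j the same walk goes through `getContent()` at each level, with `XmlUtils.unwrap` applied to JAXBElement-wrapped children as in the question's code.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

final class TableTextSketch {

    static final String W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hand-written sample: one row, two cells; the second cell splits its text over two runs.
    static final String SAMPLE =
            "<w:tbl xmlns:w=\"" + W + "\"><w:tr>"
            + "<w:tc><w:p><w:r><w:t>I</w:t></w:r></w:p></w:tc>"
            + "<w:tc><w:p><w:r><w:t>Ju</w:t></w:r><w:r><w:t>st</w:t></w:r></w:p></w:tc>"
            + "</w:tr></w:tbl>";

    /** Returns the text of every cell in document order, one string per cell. */
    static List<String> cellTexts(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            List<String> result = new ArrayList<>();
            NodeList rows = doc.getElementsByTagNameNS(W, "tr");             // Tr
            for (int r = 0; r < rows.getLength(); r++) {
                NodeList cells = ((Element) rows.item(r)).getElementsByTagNameNS(W, "tc"); // Tc
                for (int c = 0; c < cells.getLength(); c++) {
                    // w:t elements sit below w:p / w:r; collect them per cell.
                    NodeList texts = ((Element) cells.item(c)).getElementsByTagNameNS(W, "t");
                    StringBuilder sb = new StringBuilder();
                    for (int t = 0; t < texts.getLength(); t++) {
                        sb.append(texts.item(t).getTextContent());
                    }
                    result.add(sb.toString());
                }
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse the XML fragment", e);
        }
    }
}
```

Calling `TableTextSketch.cellTexts(TableTextSketch.SAMPLE)` collects one string per cell. The text of a cell can be split across several runs, which is why the runs are concatenated instead of printed individually.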

How to fetch data of multiple HTML tables through Web Scraping in Java

I am trying to scrape the data of a website, and to some extent I have succeeded. The problem is that the web page I am scraping contains multiple HTML tables. When I run my program, it only writes the data of the first table to the CSV file and does not retrieve the other tables. My Java code is as follows.
public static void parsingHTML() throws Exception {
    //tbodyElements = doc.getElementsByTag("tbody");
    for (int i = 1; i <= 1; i++) {
        Elements table = doc.getElementsByTag("table");
        if (table.isEmpty()) {
            throw new Exception("Table is not found");
        }
        elements = table.get(0).getElementsByTag("tr");
        for (Element trElement : elements) {
            trElement2 = trElement.getElementsByTag("tr");
            tdElements = trElement.getElementsByTag("td");
            File fold = new File("C:\\convertedCSV9.csv");
            fold.delete();
            File fnew = new File("C:\\convertedCSV9.csv");
            FileWriter sb = new FileWriter(fnew, true);
            //StringBuilder sb = new StringBuilder(" ");
            //String y = "<tr>";
            for (Iterator<Element> it = tdElements.iterator(); it.hasNext();) {
                //Element tdElement1 = it.next();
                //final String content2 = tdElement1.text();
                if (it.hasNext()) {
                    sb.append("\r\n");
                }
                for (Iterator<Element> it2 = trElement2.iterator(); it.hasNext();) {
                    Element tdElement2 = it.next();
                    final String content = tdElement2.text();
                    //stringjoiner.add(content);
                    //sb.append(formatData(content));
                    if (it2.hasNext()) {
                        sb.append(formatData(content));
                        sb.append(" , ");
                    }
                    if (!it.hasNext()) {
                        String content1 = content.replaceAll(",$", " ");
                        sb.append(formatData(content1));
                        //it2.next();
                    }
                }
                System.out.println(sb.toString());
                sb.flush();
                sb.close();
            }
            System.out.println(sampleList.add(tdElements));
        }
    }
}
My analysis is that the loop only checks the tr and td elements. After the first table there is a style sheet in the HTML page, and maybe the loop breaks because of it. I think that's the reason it is not proceeding to the next table.
P.S.: here's the link to the page I am trying to scrape:
http://www.mufap.com.pk/nav_returns_performance.php?tab=01
What you do just at the beginning of your code will not work:
// loops just once; why?
for (int i = 1; i <= 1; i++) {
    Elements table = doc.getElementsByTag("table");
    if (table.isEmpty()) {
        throw new Exception("Table is not found");
    }
    elements = table.get(0).getElementsByTag("tr");
Here you loop just once, read all table elements, and then process all tr elements of the first table you find. Even if you looped more than once, you would always process the first table, because of table.get(0).
You will have to iterate over all table elements, e.g.
for (Element table : doc.getElementsByTag("table")) {
    for (Element trElement : table.getElementsByTag("tr")) {
        // process "td"s and so on
    }
}
Edit: Since you're having trouble with the code above, here's a more thorough example. Note that I'm using Jsoup to read and parse the HTML (you didn't specify what you are using).
Document doc = Jsoup
        .connect("http://www.mufap.com.pk/nav_returns_performance.php?tab=01")
        .get();
for (Element table : doc.getElementsByTag("table")) {
    for (Element trElement : table.getElementsByTag("tr")) {
        // skip header "tr"s and process only data "tr"s
        if (trElement.hasClass("tab-data1")) {
            StringJoiner tdj = new StringJoiner(",");
            for (Element tdElement : trElement.getElementsByTag("td")) {
                tdj.add(tdElement.text());
            }
            System.out.println(tdj);
        }
    }
}
This concatenates and prints the cells of every data row (the tr elements with the class tab-data1). You will still have to modify it to write to your CSV file, though.
Note: in my tests this processes 21 tables, 243 trs and 2634 tds.
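For the remaining CSV step, one option is to collect the cell texts of each row into a list and turn the lists into CSV text in one go, instead of reopening the file for every row as the question's code does. This is only a sketch: `CsvSketch` and its methods are hypothetical names, and the quoting follows the usual RFC 4180 convention (fields containing a comma, quote or line break are wrapped in quotes, and quotes are doubled), which matters here because numeric cells may contain commas.

```java
import java.util.List;
import java.util.stream.Collectors;

final class CsvSketch {

    /** Quotes a field if it contains a comma, a double quote, or a line break. */
    static String escape(String field) {
        if (field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r")) {
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }

    /** Joins the cells of one table row into a single CSV line (no line ending). */
    static String toCsvLine(List<String> cells) {
        return cells.stream().map(CsvSketch::escape).collect(Collectors.joining(","));
    }

    /** Builds the whole CSV text, one line per row, each terminated by CRLF. */
    static String toCsv(List<List<String>> rows) {
        StringBuilder sb = new StringBuilder();
        for (List<String> row : rows) {
            sb.append(toCsvLine(row)).append("\r\n");
        }
        return sb.toString();
    }
}
```

Inside the Jsoup loop, each data row would contribute one `List<String>` built from `tdElement.text()`. The resulting string can then be written once, for example with `Files.write(path, csv.getBytes(StandardCharsets.UTF_8))`, where `path` is whatever output location you choose.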

Getting issue while trying to select rows in sequence via Selenium Webdriver

I have a table on a website. The table allows selecting multiple rows by holding the Shift key and pressing the Down arrow key.
I am trying to do the same with Selenium WebDriver, but it does not extend the selection row by row; it selects a row, then deselects it and moves on to the next one.
My code:
List<WebElement> TRcount = driver.findElements(By.tagName("tr"));
int x;
for (x = 0; x < TRcount.size(); x++) {
    Actions rows = new Actions(Base.getdriver());
    rows.keyDown(TRcount.get(x), Keys.SHIFT).keyUp(TRcount.get(x + 1), Keys.SHIFT).build();
    rows.build().perform();
    TRcount.get(x).click();
}
You are calling keyDown and keyUp back to back, so Shift is released again before the click. Try:
Actions rows = new Actions(Base.getdriver());
rows.keyDown(Keys.SHIFT).perform();
for (int x = 0; x < TRcount.size(); x++) {
    TRcount.get(x).click();
}
rows.keyUp(Keys.SHIFT).perform();
By the way, perform() calls build() internally, so there is no need to call both.
I believe this should be:
List<WebElement> TRcount = driver.findElements(By.tagName("tr"));
Actions rows = new Actions(Base.getdriver());
// Queue everything on one Actions chain; build() returns an Action, so its result cannot be assigned back to rows
rows.keyDown(Keys.SHIFT);
for (int x = 0; x < TRcount.size(); x++) {
    rows.sendKeys(TRcount.get(x), Keys.DOWN);
}
rows.keyUp(Keys.SHIFT);
rows.perform();
If you have a public URL where this can be reproduced, we could try it out more easily.

How to create a table in a Word doc using docx4j at a specific bookmark without overwriting the Word doc

I need to create a table at the location of a particular bookmark, i.e. I need to find the bookmark and insert the table there. How can I do this using docx4j?
Thanks in advance
Sorry Jason, I am new to Stack Overflow, so I couldn't describe my problem clearly. Here is my situation and problem.
I changed the code as you suggested and adapted it to my needs. The code is here:
// loop through the bookmarks
for (CTBookmark bm : rt.getStarts()) {
    // do we have data for this one?
    String bmname = bm.getName();
    // find the right bookmark (in this case I have only one bookmark, so just check that it is not null)
    if (bmname != null) {
        String value = "some text for testing run";
        //if (value==null) continue;
        List<Object> theList = null;
        // create bm list
        theList = ((ContentAccessor) (bm.getParent())).getContent();
        // I set the range to 1 (I assume this start range says where to start creating the table)
        int rangeStart = 1;
        WordprocessingMLPackage wordPackage = WordprocessingMLPackage.createPackage();
        // create the table
        Tbl table = factory.createTbl();
        // add borders to the table
        addBorders(table);
        for (int rows = 0; rows < 1; rows++) {
            // create a row
            Tr row = factory.createTr();
            for (int colm = 0; colm < 1; colm++) {
                // create a cell
                Tc cell = factory.createTc();
                // add the content to the cell
                cell.getContent().add(wordPackage.getMainDocumentPart()
                        .createParagraphOfText("cell" + colm));
                // add the cell to the row
                row.getContent().add(cell);
            }
            // add the row to the table
            table.getContent().add(row);
            // now add a run (to test whether a run works or not)
            org.docx4j.wml.R run = factory.createR();
            org.docx4j.wml.Text t = factory.createText();
            run.getContent().add(t);
            t.setValue(value);
            // add table to list
            theList.add(rangeStart, table);
            // add run to list
            //theList.add(rangeStart, run);
        }
    }
}
I don't need to delete the text in the bookmark, so I removed that part.
I don't know what the problem is: the program compiles, but I cannot open the Word doc; it says "unknown error". When I test by writing the string "value" instead, it is written correctly at that bookmark and the document opens, but not in the case of the table. Please help me.
Thanks in advance
You can adapt the sample code BookmarksReplaceWithText.java
In your case:
line 89: the parent won't be p, it'll be body or tc. You could remove the test.
line 128: instead of adding a run, you want to insert a table
You can use TblFactory to create your table, or the docx4j webapp to generate code from a sample docx.
For some reason, replacing a bookmark with a table didn't work out for me, so I relied on replacing placeholder text with a table instead. For my use case I created the tables from HTML using the XHTML importer:
MainDocumentPart documentPart = wordMLPackage.getMainDocumentPart();
String xhtml = <your table HTML>;
XHTMLImporterImpl XHTMLImporter = new XHTMLImporterImpl(wordMLPackage);
int ct = 0;
List<Integer> tableIndexes = new ArrayList<>();
List<Object> documentContents = documentPart.getContent();
for (Object o : documentContents) {
    if (o.toString().contains("PlaceholderForTable1")) {
        tableIndexes.add(ct);
    }
    ct++;
}
for (Integer i : tableIndexes) {
    documentPart.getContent().remove(i.intValue());
    documentPart.getContent().addAll(i.intValue(), XHTMLImporter.convert(xhtml, null));
}
In my input Word doc, I placed the text 'PlaceholderForTable1' where I want the table to be inserted.
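One thing to watch with this approach: the indexes are collected first and then applied in ascending order. If the placeholder occurs more than once and the converted XHTML yields more than one object, the first replacement shifts all later indexes. Below is a sketch of the same replace step on a plain list, walking backwards so the shift cannot matter. `PlaceholderSketch` and `replaceAll` are hypothetical names, and strings stand in for the docx4j content objects.

```java
import java.util.List;

final class PlaceholderSketch {

    /**
     * Replaces every element whose toString() contains the placeholder
     * with all elements of the replacement list, in place.
     */
    static void replaceAll(List<Object> content, String placeholder, List<Object> replacement) {
        // Walk backwards: inserting several elements at index i only moves
        // elements after i, which have already been visited.
        for (int i = content.size() - 1; i >= 0; i--) {
            if (content.get(i).toString().contains(placeholder)) {
                content.remove(i);
                content.addAll(i, replacement);
            }
        }
    }
}
```

Applied to the answer's code, `content` would be `documentPart.getContent()` and `replacement` the list returned by `XHTMLImporter.convert(xhtml, null)`. Whether one converted list can safely be inserted at several positions of the same document is not covered by this sketch.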
