What is the best way to print an invoice in Java?

In my desktop application (a POS system), I used the iText API to generate and print invoices, but my thermal invoice printer doesn't support .pdf files; it only supports text files and .docx files. When I print a simple text file, the printer prints the whole invoice as one long vertical line of single words and doesn't auto-cut the page. The .docx file works well and prints exactly as I designed it, but my program first opens the document in MS Word and only then prints it. My code is:
try (FileOutputStream output = new FileOutputStream(FILE)) {
    XWPFDocument doc = new XWPFDocument();

    // Page setup: a narrow portrait page whose height grows with the number of rows
    CTBody body = doc.getDocument().getBody();
    if (!body.isSetSectPr()) {
        body.addNewSectPr();
    }
    CTSectPr section = body.getSectPr();
    if (!section.isSetPgSz()) {
        section.addNewPgSz();
    }
    CTPageSz pageSize = section.getPgSz();
    pageSize.setOrient(STPageOrientation.PORTRAIT);
    int value = 4000 + (gui.model.getRowCount() * 1000);
    pageSize.setW(BigInteger.valueOf(4050));
    pageSize.setH(BigInteger.valueOf(value));
    CTPageMar pageMar = section.addNewPgMar();
    pageMar.setLeft(BigInteger.valueOf(400L));
    pageMar.setTop(BigInteger.valueOf(0L));
    pageMar.setRight(BigInteger.valueOf(0L));
    pageMar.setBottom(BigInteger.valueOf(0L));

    // Shop header in bold
    XWPFParagraph para = doc.createParagraph();
    para.setAlignment(ParagraphAlignment.LEFT);
    para.setWordWrap(true);
    XWPFRun run = para.createRun();
    run.setBold(true);
    run.setFontSize(10);
    run.setText(" " + address.shopName);
    run.addBreak();
    run.setText(" " + address.phoneNo);
    run.addBreak();
    run.setText(" " + address.description);
    run.addBreak();

    // Invoice details and item table
    para = doc.createParagraph();
    para.setAlignment(ParagraphAlignment.LEFT);
    para.setWordWrap(true);
    run = para.createRun();
    run.setFontSize(10);
    run.setText("Invoice No." + invoiceno);
    run.addBreak();
    run.setText("Type: " + table);
    run.addBreak();
    run.setText("Customer Name: " + name + " " + tempObj);
    run.addBreak();
    run.setText("--------------------------------------------------------");
    run.addBreak();
    run.setText("Product Qty Price Total");
    run.addBreak();
    run.setText("--------------------------------------------------------");
    run.addBreak();
    for (int i = 0; i < gui.table.getRowCount(); i++) {
        String product = gui.table.getValueAt(i, 1).toString();
        String quanstr = gui.table.getValueAt(i, 2) + "";
        String unitPricestr = gui.table.getValueAt(i, 3) + "";
        String totalstr = gui.table.getValueAt(i, 4) + "";
        run.setText(product);
        run.addBreak();
        run.setText(" " + quanstr + " " + unitPricestr + " " + totalstr);
        run.addBreak();
    }

    // Totals; "Sub total" is the amount after the discount
    double subTotal = tableTotalCounter();
    double discounted = subTotal - (subTotal * dis / 100);
    run.setText("--------------------------------------------------------");
    run.addBreak();
    run.setText("Discount: " + dis + "%");
    run.addBreak();
    run.setText("Sub total: " + discounted);
    run.addBreak();
    run.setText("Cash: " + cash);
    run.addBreak();
    run.setText("Balance: " + (cash - discounted));
    run.addBreak();
    doc.write(output);
} catch (IOException e) {
    // Also covers FileNotFoundException; try-with-resources closes the stream in every case
    System.out.println("Exception");
    e.printStackTrace();
}
if (confirmation("Print invoice?", "Confirmation") == 0) {
    // Desktop.print() prints through the application associated with .docx files (here MS Word)
    Desktop desktop = Desktop.getDesktop();
    try {
        desktop.print(new File(FILE));
    } catch (IOException e) {
        e.printStackTrace();
    }
}
Please tell me how to print without the file being opened. Also, is there any other way to print an invoice?
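For reference, the page-size and margin values in the code above are in twips (twentieths of a point, 1440 per inch), so with n = gui.model.getRowCount() the generated page works out to roughly the following. The ~71 mm width is close to the roughly 72 mm printable width of many 80 mm thermal printers, but that is an assumption to check against the printer's specification.

```latex
\begin{aligned}
w &= \frac{4050}{1440}\,\text{in} \times 25.4\,\tfrac{\text{mm}}{\text{in}} \approx 71.4\,\text{mm} \\
h &= \frac{4000 + 1000\,n}{1440}\,\text{in} \times 25.4\,\tfrac{\text{mm}}{\text{in}} \approx 70.6\,\text{mm} + 17.6\,\text{mm} \cdot n \\
m_{\text{left}} &= \frac{400}{1440}\,\text{in} \times 25.4\,\tfrac{\text{mm}}{\text{in}} \approx 7.1\,\text{mm}
\end{aligned}
```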

Format your invoice as a string and pass it to the code below. Before running this code, print a test page (on Windows or Linux) to make sure your printer is configured correctly.
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import javax.print.Doc;
import javax.print.DocFlavor;
import javax.print.DocPrintJob;
import javax.print.PrintException;
import javax.print.PrintService;
import javax.print.PrintServiceLookup;
import javax.print.SimpleDoc;
import javax.print.event.PrintJobAdapter;
import javax.print.event.PrintJobEvent;

public class GenerateInvoice {

    public static void printInvoice(String invoice) {
        // Look up the receipt printer by name ("Black Cobra" is my printer; use yours)
        PrintService mPrinter = null;
        for (PrintService printService : PrintServiceLookup.lookupPrintServices(null, null)) {
            if (printService.getName().equals("Black Cobra")) {
                mPrinter = printService;
                break;
            }
        }
        // Fall back to the default printer if the named one is not installed
        if (mPrinter == null) {
            mPrinter = PrintServiceLookup.lookupDefaultPrintService();
        }
        if (mPrinter == null) {
            System.err.println("No printer available");
            return;
        }
        System.out.println(mPrinter);

        // Append a form feed so the printer ejects the page after the text
        String testData = invoice + "\f";
        try (InputStream is = new ByteArrayInputStream(testData.getBytes())) {
            DocFlavor flavor = DocFlavor.INPUT_STREAM.AUTOSENSE;
            DocPrintJob job = mPrinter.createPrintJob();
            Doc doc = new SimpleDoc(is, flavor, null);
            PrintJobWatcher pjDone = new PrintJobWatcher(job);
            job.print(doc, null);
            pjDone.waitForDone();
        } catch (PrintException | IOException e) {
            e.printStackTrace();
        }
    }

    /** Blocks until the print job reports that it has finished, failed or been cancelled. */
    static class PrintJobWatcher {
        boolean done = false;

        PrintJobWatcher(DocPrintJob job) {
            // Add a listener to the print job
            job.addPrintJobListener(new PrintJobAdapter() {
                @Override
                public void printJobCanceled(PrintJobEvent pje) {
                    allDone();
                }

                @Override
                public void printJobCompleted(PrintJobEvent pje) {
                    allDone();
                }

                @Override
                public void printJobFailed(PrintJobEvent pje) {
                    allDone();
                }

                @Override
                public void printJobNoMoreEvents(PrintJobEvent pje) {
                    allDone();
                }

                void allDone() {
                    synchronized (PrintJobWatcher.this) {
                        done = true;
                        PrintJobWatcher.this.notifyAll();
                    }
                }
            });
        }

        public synchronized void waitForDone() {
            try {
                while (!done) {
                    wait();
                }
            } catch (InterruptedException e) {
                // Restore the interrupt flag instead of silently dropping it
                Thread.currentThread().interrupt();
            }
        }
    }
}
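The answer leaves the formatting step to the caller. A minimal sketch of building the invoice string for a fixed-width receipt printer follows; ReceiptFormatter, Item and the helper methods are hypothetical names, and the 32-column width and CR LF line endings are assumptions to adjust for your printer. Keeping every line within the printer's width avoids the driver wrapping long lines unpredictably.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class ReceiptFormatter {

    // Assumption: 32 characters per line (common on 58 mm rolls); 80 mm printers often fit 42 or 48
    static final int WIDTH = 32;

    // Assumption: CR LF line endings, which many receipt printers expect
    static final String NL = "\r\n";

    /** One invoice row: product name, quantity and unit price. */
    static final class Item {
        final String name;
        final int qty;
        final double unitPrice;

        Item(String name, int qty, double unitPrice) {
            this.name = name;
            this.qty = qty;
            this.unitPrice = unitPrice;
        }
    }

    static String money(double amount) {
        return String.format(Locale.ROOT, "%.2f", amount);
    }

    /** A full-width line made of the given character. */
    static String rule(char c) {
        char[] line = new char[WIDTH];
        Arrays.fill(line, c);
        return new String(line);
    }

    /** Left-aligned label and right-aligned amount, padded with spaces up to WIDTH. */
    static String labelled(String label, double amount) {
        String value = money(amount);
        StringBuilder sb = new StringBuilder(label);
        while (sb.length() + value.length() < WIDTH) {
            sb.append(' ');
        }
        return sb.append(value).toString();
    }

    /** Builds the whole receipt; "Sub total" is the discounted amount, as in the question. */
    static String format(String shopName, String invoiceNo, List<Item> items,
                         double discountPercent, double cash) {
        StringBuilder out = new StringBuilder();
        out.append(shopName).append(NL);
        out.append("Invoice No. ").append(invoiceNo).append(NL);
        out.append(rule('-')).append(NL);
        double subTotal = 0;
        for (Item item : items) {
            double lineTotal = item.qty * item.unitPrice;
            subTotal += lineTotal;
            out.append(item.name).append(NL);
            out.append(labelled("  " + item.qty + " x " + money(item.unitPrice), lineTotal)).append(NL);
        }
        double net = subTotal - subTotal * discountPercent / 100;
        out.append(rule('-')).append(NL);
        out.append(labelled("Discount %", discountPercent)).append(NL);
        out.append(labelled("Sub total", net)).append(NL);
        out.append(labelled("Cash", cash)).append(NL);
        out.append(labelled("Balance", cash - net)).append(NL);
        return out.toString();
    }

    public static void main(String[] args) {
        List<Item> items = new ArrayList<>();
        items.add(new Item("Coffee", 2, 1.50));
        items.add(new Item("Sandwich", 1, 4.25));
        String receipt = format("My Shop", "1001", items, 10, 10);
        System.out.print(receipt);
        // GenerateInvoice.printInvoice(receipt);  // send it to the printer as in the answer above
    }
}
```

Product names longer than WIDTH are not wrapped in this sketch; truncate or split them first if your product list contains long names.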

Related

Flatten signatures in a PDF with PDFBox (Java)

I want to flatten a PDF form that contains signatures, but with this code I can still delete the signature in the final PDF. What I want is that, once the final PDF is generated, nothing at all can be deleted from it.
private static void flattenPDF(String src, String dst) throws IOException {
    // try-with-resources closes the document even if flattening fails
    try (PDDocument doc = PDDocument.load(new File(src))) {
        PDDocumentCatalog catalog = doc.getDocumentCatalog();
        PDAcroForm acroForm = catalog.getAcroForm();
        if (acroForm == null) {
            // No form in the document: create an empty one for the same document
            acroForm = new PDAcroForm(doc);
        }
        PDResources resources = new PDResources();
        List<PDField> fields = new ArrayList<>(acroForm.getFields());
        processFields(fields, resources);
        acroForm.setDefaultResources(resources);
        acroForm.flatten();
        doc.save(dst);
    }
}

private static void processFields(List<PDField> fields, PDResources resources) {
    fields.stream().forEach(f -> {
        f.setReadOnly(true);
        COSDictionary cosObject = f.getCOSObject();
        // Prefer the default value (DV) and fall back to the current value (V)
        String value = cosObject.getString(COSName.DV) == null
                ? cosObject.getString(COSName.V) : cosObject.getString(COSName.DV);
        System.out.println("Setting " + f.getFullyQualifiedName() + ": " + value);
        try {
            f.setValue(value);
        } catch (IOException e) {
            if (e.getMessage().matches("Could not find font: /.*")) {
                // Register Helvetica under the missing font name and retry
                String fontName = e.getMessage().replaceAll("^[^/]*/", "");
                System.out.println("Adding fallback font for: " + fontName);
                resources.put(COSName.getPDFName(fontName), PDType1Font.HELVETICA);
                try {
                    f.setValue(value);
                } catch (IOException e1) {
                    e1.printStackTrace();
                }
            } else {
                e.printStackTrace();
            }
        }
        if (f instanceof PDNonTerminalField) {
            processFields(((PDNonTerminalField) f).getChildren(), resources);
        }
    });
}

Printing plain text files to PDF printer using javax.print results in an empty file

I need to create PDF files from plain text files. I assumed the simplest method would be to read these files and print them to a PDF printer.
My problem is that if I print to a PDF printer, the result is an empty PDF file. If I print to the Microsoft XPS Document Writer, the file is created in plain-text format, not in OXPS format.
I would be satisfied with a two- or three-step solution (e.g. converting to XPS first and then to PDF using Ghostscript, or something similar).
I have tried a couple of PDF printers, such as CutePDF, Microsoft PDF writer and Bullzip PDF. The result is the same for each one.
The environment is Java 1.7/1.8 on Windows 10.
private void print() {
    try {
        DocFlavor flavor = DocFlavor.SERVICE_FORMATTED.PRINTABLE;
        PrintRequestAttributeSet patts = new HashPrintRequestAttributeSet();
        PrintService[] ps = PrintServiceLookup.lookupPrintServices(flavor, patts);
        if (ps.length == 0) {
            throw new IllegalStateException("No Printer found");
        }
        System.out.println("Available printers: " + Arrays.asList(ps));
        PrintService myService = null;
        for (PrintService printService : ps) {
            if (printService.getName().equals("Microsoft XPS Document Writer")) {
                myService = printService;
                break;
            }
        }
        if (myService == null) {
            throw new IllegalStateException("Printer not found");
        }
        myService.getSupportedDocFlavors();
        DocPrintJob job = myService.createPrintJob();
        FileInputStream fis1 = new FileInputStream("o:\\k\\t1.txt");
        Doc pdfDoc = new SimpleDoc(fis1, DocFlavor.INPUT_STREAM.AUTOSENSE, null);
        HashPrintRequestAttributeSet pr = new HashPrintRequestAttributeSet();
        pr.add(OrientationRequested.PORTRAIT);
        pr.add(new Copies(1));
        pr.add(MediaSizeName.ISO_A4);
        PrintJobWatcher pjw = new PrintJobWatcher(job);
        job.print(pdfDoc, pr);
        pjw.waitForDone();
        fis1.close();
    } catch (PrintException ex) {
        Logger.getLogger(Docparser.class.getName()).log(Level.SEVERE, null, ex);
    } catch (Exception ex) {
        Logger.getLogger(Docparser.class.getName()).log(Level.SEVERE, null, ex);
    }
}

class PrintJobWatcher {
    boolean done = false;

    PrintJobWatcher(DocPrintJob job) {
        job.addPrintJobListener(new PrintJobAdapter() {
            public void printJobCanceled(PrintJobEvent pje) {
                allDone();
            }
            public void printJobCompleted(PrintJobEvent pje) {
                allDone();
            }
            public void printJobFailed(PrintJobEvent pje) {
                allDone();
            }
            public void printJobNoMoreEvents(PrintJobEvent pje) {
                allDone();
            }
            void allDone() {
                synchronized (PrintJobWatcher.this) {
                    done = true;
                    System.out.println("Printing done ...");
                    PrintJobWatcher.this.notify();
                }
            }
        });
    }

    public synchronized void waitForDone() {
        try {
            while (!done) {
                wait();
            }
        } catch (InterruptedException e) {
        }
    }
}
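The code above looks up services that support DocFlavor.SERVICE_FORMATTED.PRINTABLE but then sends the file as INPUT_STREAM.AUTOSENSE. On Windows that stream appears to be handed to the spooler as raw bytes, which a PDF or XPS driver cannot turn into pages; that would be consistent with both the empty PDF and the plain-text "XPS" file. A minimal sketch of the alternative is to draw the text yourself with a java.awt.print.Printable, so the driver receives rendered pages. TextPrintable and linesPerPage are hypothetical names; the file path comes from the question and the cp852 encoding from the later answer in this thread.

```java
import java.awt.Font;
import java.awt.FontMetrics;
import java.awt.Graphics;
import java.awt.Graphics2D;
import java.awt.print.PageFormat;
import java.awt.print.Printable;
import java.awt.print.PrinterException;
import java.awt.print.PrinterJob;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

/** Renders plain-text lines as pages, so the printer driver receives graphics instead of raw text. */
public class TextPrintable implements Printable {

    private final List<String> lines;
    private final Font font = new Font(Font.MONOSPACED, Font.PLAIN, 10);

    public TextPrintable(List<String> lines) {
        this.lines = lines;
    }

    /** How many lines of the given height fit into the printable height (both in points). */
    static int linesPerPage(double imageableHeight, int lineHeight) {
        return Math.max(1, (int) (imageableHeight / lineHeight));
    }

    @Override
    public int print(Graphics g, PageFormat pf, int pageIndex) throws PrinterException {
        g.setFont(font);
        FontMetrics fm = g.getFontMetrics();
        int perPage = linesPerPage(pf.getImageableHeight(), fm.getHeight());
        int first = pageIndex * perPage;
        if (first >= lines.size()) {
            return NO_SUCH_PAGE; // tells the print job there are no more pages
        }
        Graphics2D g2 = (Graphics2D) g;
        g2.translate(pf.getImageableX(), pf.getImageableY());
        int y = fm.getAscent();
        for (int i = first; i < Math.min(first + perPage, lines.size()); i++) {
            g2.drawString(lines.get(i), 0, y);
            y += fm.getHeight();
        }
        return PAGE_EXISTS;
    }

    public static void main(String[] args) throws Exception {
        List<String> lines = Files.readAllLines(Paths.get("o:\\k\\t1.txt"), Charset.forName("cp852"));
        PrinterJob job = PrinterJob.getPrinterJob();
        job.setPrintable(new TextPrintable(lines));
        // Without setPrintService(...) the job goes to the default printer
        job.print();
    }
}
```

To target a specific PDF printer, pass one of the services returned by PrintServiceLookup to job.setPrintService() before job.print(). For landscape output, job.setPrintable(printable, pageFormat) accepts a PageFormat whose orientation is set to PageFormat.LANDSCAPE. Note that print() may be called more than once for the same page index, which this implementation tolerates because it keeps no state between calls.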

If you can install LibreOffice, it is possible to use the Java UNO API to do this.
There is a similar example that loads and saves a file: "Java Convert Word to PDF with UNO". It could be used to convert your text file to PDF.
Alternatively, you could take the text file and send it directly to the printer using the same API.
The following JARs give access to the UNO API. Make sure they, together with the LibreOffice program directory, are on your class path:
[Libre Office Dir]/URE/java/juh.jar
[Libre Office Dir]/URE/java/jurt.jar
[Libre Office Dir]/URE/java/ridl.jar
[Libre Office Dir]/program/classes/unoil.jar
[Libre Office Dir]/program (the directory itself, where Bootstrap.bootstrap() looks for the soffice executable)
The following code then loads sourceFile and prints it to the printer named "Local Printer 1".
import java.io.File;
import java.util.ArrayList;
import java.util.List;

import com.sun.star.beans.PropertyValue;
import com.sun.star.frame.XComponentLoader;
import com.sun.star.uno.UnoRuntime;
import com.sun.star.view.XPrintable;

public class DirectPrintTest {

    public static void main(String args[]) {
        // set to the correct name of your printer
        String printer = "Local Printer 1"; // "Microsoft Print to PDF";
        File sourceFile = new File("c:/projects/WelcomeTemplate.doc");
        if (!sourceFile.canRead()) {
            throw new RuntimeException("Can't read:" + sourceFile.getPath());
        }
        com.sun.star.uno.XComponentContext xContext = null;
        try {
            // start (or connect to) an office process and get its component context
            xContext = com.sun.star.comp.helper.Bootstrap.bootstrap();
            System.out.println("Connected to a running office ...");
            // get the remote office service manager
            com.sun.star.lang.XMultiComponentFactory xMCF = xContext.getServiceManager();
            Object oDesktop = xMCF.createInstanceWithContext("com.sun.star.frame.Desktop", xContext);
            XComponentLoader xCompLoader = (XComponentLoader) UnoRuntime
                    .queryInterface(XComponentLoader.class, oDesktop);
            StringBuffer sUrl = new StringBuffer("file:///");
            sUrl.append(sourceFile.getCanonicalPath().replace('\\', '/'));

            // "Hidden" loads the document without showing a window
            List<PropertyValue> loadPropsList = new ArrayList<PropertyValue>();
            PropertyValue pv = new PropertyValue();
            pv.Name = "Hidden";
            pv.Value = Boolean.TRUE;
            loadPropsList.add(pv);
            PropertyValue[] loadProps = new PropertyValue[loadPropsList.size()];
            loadPropsList.toArray(loadProps);

            // Load the document (invisibly, because of the "Hidden" property)
            com.sun.star.lang.XComponent xComp = xCompLoader
                    .loadComponentFromURL(sUrl.toString(), "_blank", 0, loadProps);

            // Query for the XPrintable interface on the loaded document
            XPrintable xPrintable = (XPrintable) UnoRuntime.queryInterface(XPrintable.class, xComp);

            // Select the printer by its name (or IP address)
            PropertyValue[] printerProps = new PropertyValue[1];
            printerProps[0] = new PropertyValue();
            printerProps[0].Name = "Name";
            printerProps[0].Value = printer;
            xPrintable.setPrinter(printerProps);

            // "Wait" makes print() run synchronously
            PropertyValue[] printProps = new PropertyValue[1];
            printProps[0] = new PropertyValue();
            printProps[0].Name = "Wait";
            printProps[0].Value = Boolean.TRUE;
            System.out.println("sending print");
            xPrintable.print(printProps);

            System.out.println("closing doc");
            ((com.sun.star.util.XCloseable) UnoRuntime.queryInterface(
                    com.sun.star.util.XCloseable.class, xPrintable)).close(true);
            System.out.println("closed");
            System.exit(0);
        } catch (Exception e) {
            e.printStackTrace(System.err);
            System.exit(1);
        }
    }
}
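The example builds the document URL by concatenating "file:///" with the canonical path, which leaves characters such as spaces unencoded. A small alternative, assuming loadComponentFromURL accepts standard percent-encoded file URLs, is to let java.nio build the URL; OfficeUrl and toOfficeUrl are hypothetical names.

```java
import java.io.File;
import java.nio.file.Path;

public class OfficeUrl {

    /** Returns a file:/// URL for the file, with characters such as spaces percent-encoded. */
    static String toOfficeUrl(File file) {
        Path path = file.toPath().toAbsolutePath().normalize();
        return path.toUri().toString();
    }

    public static void main(String[] args) {
        // Could replace the StringBuffer-based sUrl in DirectPrintTest
        System.out.println(toOfficeUrl(new File("c:/projects/WelcomeTemplate.doc")));
    }
}
```

Path.toUri() produces the three-slash form on both Windows and Unix, which is why it is used here instead of File.toURI(), whose result starts with "file:/".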

Thank you all. After two days of struggling with various types of printers (I also tried the CUPS PDF printer, but could not get it to print in landscape mode), I ended up using Apache PDFBox.
It's only a proof-of-concept solution, but it works and fits my needs. I hope it will be useful for somebody.
(The cleanTextContent() method, not shown, removes some ESC control characters from each line before it is printed.)
public void txt2pdf() {
    float POINTS_PER_INCH = 72;
    float POINTS_PER_MM = 1 / (10 * 2.54f) * POINTS_PER_INCH;
    SimpleDateFormat dateFormat = new SimpleDateFormat("yyyy.MM.dd HH:mm:ss");
    PDDocument doc = null;
    try {
        doc = new PDDocument();
        // A4 landscape: 297 mm x 210 mm
        PDPage page = new PDPage(new PDRectangle(297 * POINTS_PER_MM, 210 * POINTS_PER_MM));
        doc.addPage(page);
        PDPageContentStream content = new PDPageContentStream(doc, page);
        //PDFont pdfFont = PDType1Font.HELVETICA;
        // Lucida Console (monospaced). PDType0Font embeds a Unicode-capable subset, so characters
        // outside WinAnsiEncoding (such as Hungarian ő and ű) can be shown if the font has glyphs for them
        PDFont pdfFont = PDType0Font.load(doc, new File("c:\\Windows\\Fonts\\lucon.ttf"));
        float fontSize = 10;
        float leading = 1.1f * fontSize;
        PDRectangle mediabox = page.getMediaBox();
        float margin = 20;
        float startX = mediabox.getLowerLeftX() + margin;
        float startY = mediabox.getUpperRightY() - margin;
        content.setFont(pdfFont, fontSize);
        content.beginText();
        content.setLeading(leading);
        content.newLineAtOffset(startX, startY);
        // The source file is encoded in code page 852
        BufferedReader fis1 = new BufferedReader(new InputStreamReader(new FileInputStream("o:\\k\\t1.txt"), "cp852"));
        String inString;
        //content.setRenderingMode(RenderingMode.FILL_STROKE);
        float currentY = startY + 60;
        float hitOsszesenOffset = 0;
        int pageNumber = 1;
        while ((inString = fis1.readLine()) != null) {
            currentY -= leading;
            if (currentY <= margin) {
                // Page is full: write the footer, draw a rule and start a new page
                content.newLineAtOffset(0, (mediabox.getLowerLeftX() - 35));
                content.showText("Date Generated: " + dateFormat.format(new Date()));
                content.newLineAtOffset((mediabox.getUpperRightX() / 2), (mediabox.getLowerLeftX()));
                content.showText(String.valueOf(pageNumber++) + " lap"); // "lap" is Hungarian for "page"
                content.endText();
                float yCordinate = currentY + 30;
                float sX = mediabox.getLowerLeftX() + 35;
                float endX = mediabox.getUpperRightX() - 35;
                content.moveTo(sX, yCordinate);
                content.lineTo(endX, yCordinate);
                content.stroke();
                content.close();
                PDPage new_Page = new PDPage(new PDRectangle(297 * POINTS_PER_MM, 210 * POINTS_PER_MM));
                doc.addPage(new_Page);
                content = new PDPageContentStream(doc, new_Page);
                content.beginText();
                content.setFont(pdfFont, fontSize);
                content.newLineAtOffset(startX, startY);
                currentY = startY;
            }
            // readLine() has already decoded the text from cp852; re-encoding it would corrupt accented characters
            String ss = cleanTextContent(inString);
            if (!ss.isEmpty()) {
                if (ss.contains("JAN") || ss.contains("SUMMARY")) {
                    // fill + stroke makes these heading lines look bold
                    content.setRenderingMode(RenderingMode.FILL_STROKE);
                }
                content.newLineAtOffset(0, -leading);
                content.showText(ss);
            }
            content.setRenderingMode(RenderingMode.FILL);
        }
        content.newLineAtOffset((mediabox.getUpperRightX() / 2), (mediabox.getLowerLeftY()));
        content.showText(String.valueOf(pageNumber++));
        content.endText();
        fis1.close();
        content.close();
        doc.save("o:\\k\\t1.pdf");
    } catch (IOException ex) {
        Logger.getLogger(Document_Creation.class.getName()).log(Level.SEVERE, null, ex);
    } finally {
        if (doc != null) {
            try {
                doc.close();
            } catch (IOException ex) {
                Logger.getLogger(Document_Creation.class.getName()).log(Level.SEVERE, null, ex);
            }
        }
    }
}
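The POINTS_PER_MM factor and the A4 landscape page size used in txt2pdf() work out as follows (PDF user-space units are 1/72 inch). With a 20 pt margin at top and bottom and 11 pt leading (1.1 × 10 pt), at most about 50 lines fit between the margins.

```latex
\begin{aligned}
\text{POINTS\_PER\_MM} &= \frac{72\ \text{pt/in}}{25.4\ \text{mm/in}} \approx 2.8346\ \text{pt/mm} \\
297\ \text{mm} \times 2.8346\ \text{pt/mm} &\approx 841.89\ \text{pt} \\
210\ \text{mm} \times 2.8346\ \text{pt/mm} &\approx 595.28\ \text{pt} \\
\left\lfloor \frac{595.28 - 2 \times 20}{11} \right\rfloor &= 50\ \text{lines}
\end{aligned}
```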

File rendering for PDF file print

I tried the code below to print a PDF file:
public static void main(String args[]) {
    FileInputStream psStream = null;
    try {
        psStream = new FileInputStream("E://ssc exam.pdf");
    } catch (FileNotFoundException ffne) {
        ffne.printStackTrace();
    }
    if (psStream == null) {
        return;
    }
    DocFlavor psInFormat = DocFlavor.INPUT_STREAM.AUTOSENSE;
    Doc myDoc = new SimpleDoc(psStream, psInFormat, null);
    PrintRequestAttributeSet aset = new HashPrintRequestAttributeSet();
    PrintService[] services = PrintServiceLookup.lookupPrintServices(psInFormat, aset);
    // this step is necessary because I have several printers configured
    PrintService myPrinter = null;
    for (int i = 0; i < services.length; i++) {
        String svcName = services[i].toString();
        System.out.println("service found: " + svcName);
        if (svcName.contains("sfg")) {
            myPrinter = services[i];
            System.out.println("my printer found: " + svcName);
            break;
        }
    }
    if (myPrinter != null) {
        DocPrintJob job = myPrinter.createPrintJob();
        try {
            job.print(myDoc, aset);
        } catch (Exception pe) {
            pe.printStackTrace();
        }
    } else {
        System.out.println("no printer services found");
    }
}
But I got the error "PDF file not printed. 128 MB of memory is required to enable direct PDF printing", so I decided to render the PDF for printing instead. Can anyone explain in detail how to use the PDF rendering approach?
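The error text appears to come from the printer itself: with INPUT_STREAM.AUTOSENSE the PDF bytes are sent unchanged, so the printer has to interpret the PDF in its own firmware. As a first diagnostic, javax.print can report which installed printers claim to accept PDF directly. This is a minimal sketch; PdfCapablePrinters and pdfCapablePrinterNames are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import javax.print.DocFlavor;
import javax.print.PrintService;
import javax.print.PrintServiceLookup;

public class PdfCapablePrinters {

    /** Names of the installed printers whose driver reports that it accepts PDF data directly. */
    static List<String> pdfCapablePrinterNames() {
        List<String> names = new ArrayList<>();
        // null flavor and attributes: return every installed print service
        for (PrintService service : PrintServiceLookup.lookupPrintServices(null, null)) {
            if (service.isDocFlavorSupported(DocFlavor.INPUT_STREAM.PDF)) {
                names.add(service.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        List<String> names = pdfCapablePrinterNames();
        if (names.isEmpty()) {
            System.out.println("No printer accepts raw PDF; render the pages in Java instead");
        } else {
            System.out.println("Printers that accept PDF directly: " + names);
        }
    }
}
```

For the rendering route itself, Apache PDFBox 2.x provides org.apache.pdfbox.printing.PDFPageable, which can be passed to PrinterJob.setPageable() so that Java draws each page and the printer receives ordinary print data rather than a PDF. That part is not shown here because it needs the PDFBox dependency.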

Trying to set an Arabic sentence in Word using Apache POI?

I'm trying to generate a Word document using the Apache POI API, and I want to put an Arabic sentence into it, but the words don't stay in order: instead of "شهادة بالملك" I get "بالملك شهادة".
public class word {
    public static void main(String[] args) {
        XWPFDocument docx = new XWPFDocument();
        try {
            XWPFParagraph tmpParagraph = docx.createParagraph();
            XWPFRun tmpRun = tmpParagraph.createRun();
            tmpRun.setText("شهادة بالملك");
            tmpRun.setFontSize(18);
            tmpRun.setFontFamily("Calibri (Corps)");
            tmpRun.setBold(true);
            tmpRun.setColor("003894");
            tmpParagraph.setAlignment(ParagraphAlignment.LEFT);
            tmpRun.setUnderline(UnderlinePatterns.SINGLE);
            tmpParagraph.setSpacingAfter(300);
            FileOutputStream fos = new FileOutputStream("Word2.docx");
            docx.write(fos);
            fos.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

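This is the answer: mark the paragraph as right-to-left by setting the bidi flag in its paragraph properties, so that Word lays the words out from right to left: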
public class word {

    public enum TextOrientation {
        LTR,
        RTL
    }

    public static void main(String[] args) {
        XWPFDocument docx = new XWPFDocument();
        try {
            XWPFParagraph tmpParagraph = docx.createParagraph();
            XWPFRun tmpRun = tmpParagraph.createRun();
            tmpRun.setText("شهادة بالملك");
            tmpRun.setFontSize(18);
            tmpRun.setFontFamily("Calibri (Corps)");
            tmpRun.setBold(true);
            tmpRun.setColor("003894");
            tmpParagraph.setAlignment(ParagraphAlignment.CENTER);
            tmpRun.setUnderline(UnderlinePatterns.SINGLE);
            tmpParagraph.setSpacingAfter(300);
            setOrientation(tmpParagraph, TextOrientation.RTL);
            FileOutputStream fos = new FileOutputStream("Word2.docx");
            docx.write(fos);
            fos.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    // Sets (or clears) the paragraph's bidi flag, creating the properties element if needed
    private static void setOrientation(XWPFParagraph par, TextOrientation orientation) {
        if (par.getCTP().getPPr() == null) {
            par.getCTP().addNewPPr();
        }
        if (par.getCTP().getPPr().getBidi() == null) {
            par.getCTP().getPPr().addNewBidi();
        }
        par.getCTP().getPPr().getBidi().setVal(orientation == TextOrientation.RTL ? STOnOff.ON : STOnOff.OFF);
    }
}
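If the document mixes Arabic and Latin paragraphs, the orientation can be chosen per paragraph instead of hard-coding RTL. A minimal sketch using the JDK's java.text.Bidi, which derives the base direction from the first strong character; DirectionCheck and isRightToLeft are hypothetical names, and the Arabic sample is written with Unicode escapes so the source compiles regardless of the compiler's default encoding.

```java
import java.text.Bidi;

public class DirectionCheck {

    /** True when the paragraph's base direction, taken from its first strong character, is RTL. */
    static boolean isRightToLeft(String text) {
        Bidi bidi = new Bidi(text, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
        return !bidi.baseIsLeftToRight();
    }

    public static void main(String[] args) {
        // "\u0634\u0647\u0627\u062f\u0629 \u0628\u0627\u0644\u0645\u0644\u0643" is the question's sentence
        System.out.println(isRightToLeft("\u0634\u0647\u0627\u062f\u0629 \u0628\u0627\u0644\u0645\u0644\u0643")); // true
        System.out.println(isRightToLeft("Invoice 42")); // false
    }
}
```

In the answer's code this could be used as setOrientation(tmpParagraph, DirectionCheck.isRightToLeft(text) ? TextOrientation.RTL : TextOrientation.LTR).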

Do not crawl a certain page in a particular link (exclude certain URLs from crawling)

Below is the code in my MyCrawler.java. It crawls all the links that match the prefixes I check with href.startsWith, but suppose I do not want to crawl one particular page, http://inv.somehost.com/people/index.html. How can I do this in my code?
public MyCrawler() {
}

public boolean shouldVisit(WebURL url) {
    String href = url.getURL().toLowerCase();
    if (href.startsWith("http://www.somehost.com/") || href.startsWith("http://inv.somehost.com/") || href.startsWith("http://jo.somehost.com/")) {
        // And if I do not want to crawl this page http://inv.somehost.com/data/index.html, how can it be done?
        return true;
    }
    return false;
}

public void visit(Page page) {
    int docid = page.getWebURL().getDocid();
    String url = page.getWebURL().getURL();
    String text = page.getText();
    List<WebURL> links = page.getURLs();
    int parentDocid = page.getWebURL().getParentDocid();
    try {
        URL url1 = new URL(url);
        System.out.println("URL:- " + url1);
        URLConnection connection = url1.openConnection();
        Map responseMap = connection.getHeaderFields();
        Iterator iterator = responseMap.entrySet().iterator();
        while (iterator.hasNext()) {
            String key = iterator.next().toString();
            if (key.contains("text/html") || key.contains("text/xhtml")) {
                System.out.println(key);
                // Content-Type=[text/html; charset=ISO-8859-1]
                // Note: matcher() never returns null, so this condition is always true
                if (filters.matcher(key) != null) {
                    System.out.println(url1);
                    try {
                        final File parentDir = new File("crawl_html");
                        parentDir.mkdir();
                        final String hash = MD5Util.md5Hex(url1.toString());
                        final String fileName = hash + ".txt";
                        final File file = new File(parentDir, fileName);
                        boolean success = file.createNewFile(); // Creates file crawl_html/abc.txt
                        System.out.println("hash:-" + hash);
                        System.out.println(file);
                        // Open the file in append mode
                        FileOutputStream fos = new FileOutputStream(file, true);
                        PrintWriter out = new PrintWriter(fos);
                        // Also could be written as follows on one line
                        // PrintWriter out = new PrintWriter(new FileWriter(args[0]));
                        // Write text to file
                        Tika t = new Tika();
                        String content = t.parseToString(new URL(url1.toString()));
                        out.println("===============================================================");
                        out.println(url1);
                        out.println(key);
                        //out.println(success);
                        out.println(content);
                        out.println("===============================================================");
                        out.close();
                        fos.flush();
                        fos.close();
                    } catch (FileNotFoundException e) {
                        e.printStackTrace();
                    } catch (IOException e) {
                        e.printStackTrace();
                    } catch (TikaException e) {
                        e.printStackTrace();
                    }
                    // http://google.com
                }
            }
        }
    } catch (MalformedURLException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
    System.out.println("=============");
}
And this is my Controller.java code, from which MyCrawler is called:
public class Controller {
    public static void main(String[] args) throws Exception {
        CrawlController controller = new CrawlController("/data/crawl/root");
        controller.addSeed("http://www.somehost.com/");
        controller.addSeed("http://inv.somehost.com/");
        controller.addSeed("http://jo.somehost.com/");
        // start() blocks until the crawl has finished, so configure the controller before calling it
        controller.setPolitenessDelay(200);
        controller.setMaximumCrawlDepth(2);
        controller.start(MyCrawler.class, 20);
    }
}
Any suggestions will be appreciated.

How about adding a property that lists the URLs you want to exclude?
Add to your exclusions list all the pages that you don't want crawled.
Here is an example:
public class MyCrawler extends WebCrawler {

    List<Pattern> exclusionsPatterns;

    public MyCrawler() {
        exclusionsPatterns = new ArrayList<Pattern>();
        // Add all your exclusions here as regular expressions
        exclusionsPatterns.add(Pattern.compile("http://investor\\.somehost\\.com.*"));
    }

    /*
     * You should implement this function to specify
     * whether the given URL should be visited or not.
     */
    public boolean shouldVisit(WebURL url) {
        String href = url.getURL().toLowerCase();
        // Iterate over the patterns to find out whether the URL is excluded
        for (Pattern exclusionPattern : exclusionsPatterns) {
            Matcher matcher = exclusionPattern.matcher(href);
            if (matcher.matches()) {
                return false;
            }
        }
        // Otherwise keep the original rule: only follow links on the seed hosts
        return href.startsWith("http://www.somehost.com/")
                || href.startsWith("http://inv.somehost.com/")
                || href.startsWith("http://jo.somehost.com/");
    }
}
In this example, all URLs that start with http://investor.somehost.com are excluded from the crawl.
So these won't be crawled:
http://investor.somehost.com/index.html
http://investor.somehost.com/something/else
I recommend reading up on regular expressions.
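To exclude only the single page from the question rather than a whole host, the same rule can be tried out without crawler4j. This is a minimal sketch; UrlFilter is a hypothetical class that mirrors the shouldVisit() logic above, and Pattern.quote() turns the page URL into a literal pattern so that dots and slashes need no escaping.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Stand-alone version of the shouldVisit() rule: allowed prefixes minus excluded patterns. */
public class UrlFilter {

    private final List<String> allowedPrefixes;
    private final List<Pattern> exclusions;

    UrlFilter(List<String> allowedPrefixes, List<Pattern> exclusions) {
        this.allowedPrefixes = allowedPrefixes;
        this.exclusions = exclusions;
    }

    boolean shouldVisit(String url) {
        String href = url.toLowerCase();
        for (Pattern p : exclusions) {
            if (p.matcher(href).matches()) {
                return false; // an exclusion always wins
            }
        }
        for (String prefix : allowedPrefixes) {
            if (href.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        UrlFilter filter = new UrlFilter(
                Arrays.asList("http://www.somehost.com/", "http://inv.somehost.com/", "http://jo.somehost.com/"),
                // exclude exactly this one page, matched literally
                Arrays.asList(Pattern.compile(Pattern.quote("http://inv.somehost.com/people/index.html"))));
        System.out.println(filter.shouldVisit("http://inv.somehost.com/people/index.html")); // false
        System.out.println(filter.shouldVisit("http://inv.somehost.com/about.html"));        // true
        System.out.println(filter.shouldVisit("http://other.example.com/"));                 // false
    }
}
```

Because URLs are lowercased before matching, the exclusion patterns should be written in lowercase as well.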
