Using Java and PDFBox to search a PDF for USD amounts

This could save me about 10 minutes at work; I am not getting paid for it. This is Java, and it's been a while since I touched Java. I'm using PDFBox to search a PDF for just the numbers written in USD currency form. This is roughly what the document looks like.
Activity Report
Business Date: 10/9/2019 Property Code: me.ra777 Shift: 9 User: me.ra777
Reserve
Account Person Name Start End Days Status Money TypeOfCode Type Location Source GTD Date User
077071543 Smith's, John Middle 9/25/19 9/26/19 1 O 55.50 BAR SNQQ 211 WI MC 9/25/19 me.ra777
877075375 45Lisa, Jo.nes Mid 9/25/19 9/26/19 1 I 99.00 SEG SNKE 138 WI VI 9/25/19 me.ra777
677256813 Jo^hn Wi.ck Ed 9/26/19 9/27/19 1 O 129.00 TRQ SNQQ 132 WI VI 9/26/19 me.ra777
477007406 Guys, Are 9/26/19 9/27/19 1 O 129.00 BAR SNQQ 133 WI VI 9/26/19 me.ra777
977495887 Last, First 9/27/19 9/28/19 1 O 165.00 BAR SNKE 438 WI VI 9/27/19 me.ra777
677472246 Po.or, Rich 9/27/19 9/28/19 1 O 165.00 BAR SNKE 138 WI MC 9/27/19 me.ra777
677457228 Dude, Isn't Here 9/27/19 9/28/19 1 I 180.00 BAR SNQQ 433 WI MC 9/27/19 me.ra777
Date/Time of Printing: 10/10/2019 1:42 PM Software Version: ssrs7x67 Page 1 of 1
If I use a method like this:
public static void oneLine(Scanner sc) {
    while (sc.hasNextLine()) {
        String line = sc.nextLine();
        if (line.contains(" WI ")) {
            displayArea.append("\n" + line + "\n");
            break;
        }
    }
    sc.close();
}
I would only get this for my output.
077071543 Smith's, John Middle 9/25/19 9/26/19 1 O 55.50 BAR SNQQ 211 WI MC 9/25/19 me.ra777
My desired output would be just
55.50
Maybe even all the USD amounts like this
55.50
99.00
129.00
129.00
165.00
165.00
180.00
Okay, a little more detail about this document. I only need the data in these lines:
077071543 Smith's, John Middle 9/25/19 9/26/19 1 O 55.50 BAR SNQQ 211 WI MC 9/25/19 me.ra777
877075375 45Lisa, Jo.nes Mid 9/25/19 9/26/19 1 I 99.00 SEG SNKE 138 WI VI 9/25/19 me.ra777
677256813 Jo^hn Wi.ck Ed 9/26/19 9/27/19 1 O 129.00 TRQ SNQQ 132 WI VI 9/26/19 me.ra777
477007406 Guys, Are 9/26/19 9/27/19 1 O 129.00 BAR SNQQ 133 WI VI 9/26/19 me.ra777
977495887 Last, First 9/27/19 9/28/19 1 O 165.00 BAR SNKE 438 WI VI 9/27/19 me.ra777
677472246 Po.or, Rich 9/27/19 9/28/19 1 O 165.00 BAR SNKE 138 WI MC 9/27/19 me.ra777
677457228 Dude, Isn't Here 9/27/19 9/28/19 1 I 180.00 BAR SNQQ 433 WI MC 9/27/19 me.ra777
Everything in those lines can change EXCEPT the Source column, where it says "WI", and the User column, where it says "me.ra777". People can mess up names, as in "45Lisa, Jo.nes" and "Jo^hn Wi.ck".
Ultimately I still have more work to do after this: I need to add up all the USD amounts and then divide the sum by 100, which in this example I believe gives 9.225 (922.50 / 100) if I did my math right.
I'm really hoping I can just change part of this code, like here:
if(line.contains(" WI ")){
Then I could at least get an output of only the lines I need, and work out the rest on my own from there.

Solved it. In short, I had two major methods: find() and getUSD(final String).
find() used:
1. A for loop
2. A while (scanner.hasNextLine()) loop
3. An if that checks the line contains "WI" and does not contain "Software Version"
4. A variable, rate = getUSD(line)
5. Some math on the rate (and some complaining that Java can't tell this is a number without parsing it)
6. print("\n" + rate);
getUSD(final String) used:
1. An if/else around Matcher m = Pattern.compile("-?\\d+(\\.\\d+)").matcher(strings);
2. while (m.find())
3. return m.group()
4. There's actually some parsing and other "convert this variable type to that type" work too
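The steps above can be sketched roughly as follows. This is a minimal sketch, not the poster's actual code: the class and method names are made up for illustration, the lines are assumed to be already extracted from the PDF as plain text, and the pattern assumes the rate is the first token on a line written as digits, a dot, and exactly two digits. BigDecimal is used instead of double so the sum stays exact.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UsdAmounts {
    // Assumption: the rate is the first "digits.dd" token on a line.
    private static final Pattern USD = Pattern.compile("\\b\\d+\\.\\d{2}\\b");

    // Keeps only lines with the " WI " source marker, skips the footer,
    // and pulls the first USD-looking token out of each remaining line.
    static List<BigDecimal> extractAmounts(List<String> lines) {
        List<BigDecimal> amounts = new ArrayList<>();
        for (String line : lines) {
            if (!line.contains(" WI ") || line.contains("Software Version")) {
                continue;
            }
            Matcher m = USD.matcher(line);
            if (m.find()) {
                amounts.add(new BigDecimal(m.group()));
            }
        }
        return amounts;
    }

    // Adds the amounts exactly (no floating-point rounding).
    static BigDecimal total(List<BigDecimal> amounts) {
        BigDecimal sum = BigDecimal.ZERO;
        for (BigDecimal amount : amounts) {
            sum = sum.add(amount);
        }
        return sum;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "Activity Report",
            "077071543 Smith's, John Middle 9/25/19 9/26/19 1 O 55.50 BAR SNQQ 211 WI MC 9/25/19 me.ra777",
            "877075375 45Lisa, Jo.nes Mid 9/25/19 9/26/19 1 I 99.00 SEG SNKE 138 WI VI 9/25/19 me.ra777",
            "Date/Time of Printing: 10/10/2019 1:42 PM Software Version: ssrs7x67 Page 1 of 1");
        List<BigDecimal> amounts = extractAmounts(lines);
        for (BigDecimal amount : amounts) {
            System.out.println(amount.toPlainString());
        }
        BigDecimal sum = total(amounts);
        System.out.println("Sum: " + sum.toPlainString());
        // Dividing by 100 is a decimal-point shift, so no rounding mode is needed.
        System.out.println("Sum / 100: " + sum.movePointLeft(2).toPlainString());
    }
}
```

Amounts printed with a thousands separator (for example 1,250.00) would need a wider pattern; the sample report does not show any, so that case is left out here.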

Related

Read csv file with Java OpenCSV

I have the following .csv file:
Company ABC
"Jan 1, 2020 - Sep 30, 2020"
Product Country Avg. monthly clients Avg. month charge Parts change Impact In stock Clients in list City
Nissan Maxima USA 6600 0% -18% Low 18
BMW X7 M50i USA 18100 22% 0% Low 28
Volvo XC90 USA 880 0% -12% Low 10
Opel Insignia USA 320 -34% -34% Low 23
Renult Triber USA 140 -18% -36% Low 8
Toyota Yaris USA 880 0% -28% Low 30
Ford Mondeo USA 70 -20% -71% Low 1
For the delimiter I have whitespace (Tab). I tried to use this code to read the file with OpenCSV:
@Getter
@Setter
public class CsvLine {
    @CsvBindByPosition(position = 1)
    private String model;
    @CsvBindByPosition(position = 2)
    private String country;
}
String fileName = "C:\\in_progress\\zzz.csv";
List<CsvLine> beans = new CsvToBeanBuilder<CsvLine>(new FileReader(fileName))
        .withType(CsvLine.class)
        .withSeparator(' ')
        .withSkipLines(1)
        .build()
        .parse();
for (CsvLine item : beans) {
    System.out.println(item.getModel());
}
But I get this output:
X C 9 0
null
I n s i g n i a U S A 3 2 0 - 3 4 % - 3 4 % L o w 2 3
null
T r i b e r
null
Y a r i s U S A 8 8 0 0 % - 2 8 % L o w 3 0
null
M o n d e o U S A 7 0 - 2 0 % - 7 1 % L o w 1
null
null
Do you know how I can read the file properly with Java, preferably with OpenCSV?
Test file https://www.dropbox.com/s/7jo4i3bs6h8at25/zzz.csv?dl=0
If your CSV file really uses the Tab character as the field delimiter, it should be sufficient to change to:
List<CsvLine> beans = new CsvToBeanBuilder<CsvLine>(new FileReader(fileName))
        .withType(CsvLine.class)
        .withSeparator('\t')
        .withSkipLines(2)
        .build()
        .parse();
I changed the withSeparator argument and increased the number of lines to skip to 2.
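To see what the separator and skip count do without OpenCSV in the picture, here is a dependency-free sketch that splits on tabs by hand. The class name, the method, and the sample lines are hypothetical; the sample mirrors the layout shown in the question (two title lines, one header row, then data), which is why this demo skips three lines. How many lines to skip in the real file is an assumption to check against the file itself.

```java
import java.util.ArrayList;
import java.util.List;

class TabSeparatedSketch {
    // Splits every line after the first skipLines lines on the tab character.
    // The limit of -1 keeps trailing empty columns instead of dropping them.
    static List<String[]> parse(List<String> lines, int skipLines) {
        List<String[]> rows = new ArrayList<>();
        for (int i = skipLines; i < lines.size(); i++) {
            rows.add(lines.get(i).split("\t", -1));
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "Company ABC",
            "\"Jan 1, 2020 - Sep 30, 2020\"",
            "Product\tCountry\tAvg. monthly clients",
            "Nissan Maxima\tUSA\t6600",
            "BMW X7 M50i\tUSA\t18100");
        for (String[] row : parse(lines, 3)) {
            // Column 0 is the product, column 1 the country.
            System.out.println(row[0] + " / " + row[1]);
        }
    }
}
```

Because the split is on '\t' only, product names that contain ordinary spaces, such as "Nissan Maxima", stay in one column.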

Display the total amount of each product below the table

I'm using itextpdf to display the result of this SQL query:
SELECT prod_id, prod_name, amt, sum(amt) over(partition by prod_name) as total_amt
FROM items
ORDER BY prod_name;
I was able to separate the table according to PROD_NAME and calculate the total amount of each product by getting sum(amt) over(partition by prod_name) as total_amt from the SQL query. Now I've been trying to display the total amount of each product below its rows, like this:
APPLE
PROD_ID | AMT
11111 12.75
22222 13.75
33333 14.75
Total: 41.25
ORANGE
PROD_ID | AMT
44444 15.75
55555 16.75
Total: 32.5
However, this is the output of my code. The Total amount is displayed after each row.
APPLE
PROD_ID | AMT
11111 12.75
Total: 41.25
22222 13.75
Total: 41.25
33333 14.75
Total: 41.25
ORANGE
PROD_ID | AMT
44444 15.75
Total: 32.5
55555 16.75
Total: 32.5
Here is the snippet of my code:
List<String> prod_Names = new ArrayList<>();
while (rs.next()) {
    String prodName = rs.getString(2);
    if (!prod_Names.contains(prodName)) {
        prod_Names.add(prodName);
        // Displays the Product Name on top of the table
        PdfPCell name = new PdfPCell(new Phrase(prodName, bold));
        name.setColspan(2);
        prod_Table.addCell(name);
        // Displays the Row Header
        prod_Table.addCell(new Phrase("PROD_ID", header_Bold));
        prod_Table.addCell(new Phrase("AMT", header_Bold));
    }
    int prodId_Values = rs.getInt(1);
    int amt_Values = rs.getInt(3); // amount
    int totalAmt = rs.getInt(4); // total amount of each product
    // Displays the Values
    prod_Table.addCell(new Phrase(Integer.toString(prodId_Values), normalFont));
    prod_Table.addCell(new Phrase(Integer.toString(amt_Values), normalFont));
    // Display Total
    prod_Table.addCell(new Phrase("TOTAL:", normalFont));
    prod_Table.addCell(new Phrase(Integer.toString(totalAmt), normalFont));
}
I tried putting the Display Total lines inside an if condition just like how I did with the Product Name. I also added another ArrayList called prod_Names2.
if (!prod_Names2.contains(prodName)) {
    prod_Names2.add(prodName);
    // Display Total
    prod_Table.addCell(new Phrase("TOTAL:", normalFont));
    prod_Table.addCell(new Phrase(Integer.toString(totalAmt), normalFont));
}
The total amount is now only displayed once, but it's displayed after the first row of each product, like this. This is the best that I could do:
APPLE
PROD_ID | AMT
11111 12.75
Total: 41.25
22222 13.75
33333 14.75
ORANGE
PROD_ID | AMT
44444 15.75
Total: 32.5
55555 16.75
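In Standard SQL you can do what you want using grouping sets:
SELECT prod_id, prod_name, SUM(amt) AS amt
FROM items
GROUP BY GROUPING SETS ( (prod_id, prod_name), (prod_name) )
ORDER BY prod_name, prod_id NULLS LAST;
The (prod_name) set adds one row per product with prod_id NULL and amt holding that product's total, and NULLS LAST sorts it after the product's detail rows. Not all databases support exactly this syntax, but most support some variation of it. Then your Java code can just read the results from the query.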
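If changing the query is not an option, the total can also be placed on the Java side by emitting it when the product name changes, plus once more after the last row. The sketch below is an alternative to the SQL approach, not part of the answer above; it assumes the rows arrive sorted by prod_name (as in the question's ORDER BY) and uses a hypothetical Row holder and plain strings in place of the ResultSet and the iText addCell calls.

```java
import java.util.ArrayList;
import java.util.List;

class GroupTotalsSketch {
    // Hypothetical holder for one result row: prod_id, prod_name, amt, total_amt.
    static final class Row {
        final int prodId;
        final String prodName;
        final double amt;
        final double totalAmt;

        Row(int prodId, String prodName, double amt, double totalAmt) {
            this.prodId = prodId;
            this.prodName = prodName;
            this.amt = amt;
            this.totalAmt = totalAmt;
        }
    }

    // Emits the total of the previous group when a new product starts,
    // and once more after the loop for the final group.
    static List<String> layout(List<Row> rows) {
        List<String> out = new ArrayList<>();
        Row previous = null;
        for (Row row : rows) {
            boolean newGroup = previous == null || !previous.prodName.equals(row.prodName);
            if (newGroup) {
                if (previous != null) {
                    out.add("Total: " + previous.totalAmt);
                }
                out.add(row.prodName);
                out.add("PROD_ID | AMT");
            }
            out.add(row.prodId + " " + row.amt);
            previous = row;
        }
        if (previous != null) {
            out.add("Total: " + previous.totalAmt);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
            new Row(11111, "APPLE", 12.75, 41.25),
            new Row(22222, "APPLE", 13.75, 41.25),
            new Row(33333, "APPLE", 14.75, 41.25),
            new Row(44444, "ORANGE", 15.75, 32.5),
            new Row(55555, "ORANGE", 16.75, 32.5));
        layout(rows).forEach(System.out::println);
    }
}
```

In the real code, each out.add(...) would become the matching prod_Table.addCell(...) call, and the amounts would be read with a decimal getter rather than getInt, since values like 12.75 do not fit an int.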

I want to scrape text from an image file and store it in Excel

BOWLING O M R W ECON 0s 45 6 WD NB Losing Dhoni as a batter always
difficult for us - Raina
TABoult 4 0 3 0 925 M 2 3 1 0 The Chennai Super Kings batsman
struck form after lean season and
JETED 6 0 = 4 O 0 0 lauded Dhoni's support at the crease
CHMorris 4 0 4 ns o9 8 1 1 against Delhi Capitals
AR Patel 3 o 3 1 1033 6 3 2 o o “Watch the ball, hit the ball' - Dhoni's
formula for the final over
S o0 e sEoe 10 o o The CSK captain has hit 554 runs in
e PR el 227 balls inthe 20th over of an IPL
match. Thats 13% of all the runs he's
made i this tournament
. Delhi Capitals Innings (target: 180 runs from 20 overs) Talking Points - Is Dhoni babering #EEIEER -
This is my String; I want it in Excel.
Based on the sparse description of what you want to do, I would suggest:
1. Read the text from the image.
2. Replace all spaces with a semicolon:
String csvContent = imgData.replaceAll(" ",";");
3. Save the text to a CSV file.
4. Open the CSV file with Excel.
The following example assumes that you have managed to retrieve the data, which is then post-processed into CSV format. The contents are written to a file that you can double-click to see that the data is split into columns as you requested.
String[] data = new String[] {
    "BOWLING O M R W ECON 0s 45 6", // notice that your OCR software does not properly recognise the string here
    "TABoult 4 0 3 0 925 M 2 3",
    "JETED 6 0 = 4 O 0 0"
};
String path = System.getProperty("user.home") + System.getProperty("file.separator") + "data.csv";
// try-with-resources closes the writer even if a write fails
try (BufferedWriter writer = new BufferedWriter(new FileWriter(path))) {
    for (String record : data) {
        writer.write(record.replaceAll(" ", ";"));
        writer.write("\n");
    }
}
As I noted in the comment above, your OCR does not work correctly. I would suggest you take a look at the jsoup HTML parser to get the information and continue from there. Otherwise you will not be satisfied with the result.
driver.get("https://www.espncricinfo.com/series/8048/scorecard/1178425/chennai-super-kings-vs-delhi-capitals-50th-match-indian-premier-league-2019");
WebElement element = driver.findElement(By.xpath("//article[@class='sub-module scorecard'][1]"));
JavascriptExecutor js = (JavascriptExecutor) driver;
js.executeScript("arguments[0].scrollIntoView(true);", element);
File screen = ((TakesScreenshot)driver).getScreenshotAs(OutputType.FILE);
File file = new File("C:\\Users\\user\\Desktop\\screenshot1\\screenshotOfElement2.png");
FileHandler.copy(screen, file);
ITesseract instance = new Tesseract();
instance.setDatapath("C:\\selenium_work\\ScrapingText.PDF\\tessdata");
String result = instance.doOCR(file);
//System.out.println(result);
String[] lines = result.split("\\n");
This is what I am trying.
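As a sketch of the next step, the OCR text could be turned into semicolon-separated lines along the lines suggested in the answer. The class and method names here are hypothetical, and the approach only helps when the OCR output is clean enough that whitespace really separates the columns, which the sample text above suggests is not yet the case. Unlike a plain replaceAll, this version collapses runs of whitespace, skips blank lines, and quotes any field that itself contains a semicolon or a double quote.

```java
import java.util.ArrayList;
import java.util.List;

class OcrToCsvSketch {
    // Wraps a field in double quotes when it contains the separator or a quote,
    // doubling any embedded quotes, as spreadsheet CSV import expects.
    static String quote(String field) {
        if (field.contains(";") || field.contains("\"")) {
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }

    // Splits the OCR text into lines, each line into whitespace-separated
    // fields, and joins the fields with semicolons.
    static List<String> toCsvLines(String ocrText) {
        List<String> csv = new ArrayList<>();
        for (String line : ocrText.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            List<String> fields = new ArrayList<>();
            for (String field : trimmed.split("\\s+")) {
                fields.add(quote(field));
            }
            csv.add(String.join(";", fields));
        }
        return csv;
    }

    public static void main(String[] args) {
        String result = "BOWLING O M R W ECON\n\nTABoult  4 0 3 0 925\n";
        toCsvLines(result).forEach(System.out::println);
    }
}
```

Player names that contain spaces, such as "AR Patel", would still be split into two columns, so a real solution needs either cleaner input or knowledge of the column layout.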

I want to extract strings from a line

The contents below are in a text file. I want to extract the data (name, age, working experience, position). How can I do this? I tried Java's StringTokenizer and the split function, but could not extract the data.
Name                Age  Working Experience  Position
John                23   10                  Team Leader
Christian Elverdam  27   7                   Director
Niels Bye Nielsen   59   16                  Composer
Rajkumar Hirani     40   23                  Director
Vidhu Vinod Chopra  58   21                  Screenplay
Expected output:
John              |23|10|Team Leader|
Christian Elverdam|27|7 |Director   |
Niels Bye Nielsen |59|16|Composer   |
Rajkumar Hirani   |40|23|Director   |
Vidhu Vinod Chopra|58|21|Screenplay |
Don't use StringTokenizer:
StringTokenizer is a legacy class that is retained for compatibility reasons although its use is discouraged in new code. It is recommended that anyone seeking this functionality use the split method of String or the java.util.regex package instead.
You can use split() if you split on 2 or more spaces: split(" {2,}")
Demo
String input = "Name                Age  Working Experience  Position\n" +
               "John                23   10                  Team Leader\n" +
               "Christian Elverdam  27   7                   Director\n" +
               "Niels Bye Nielsen   59   16                  Composer\n" +
               "Rajkumar Hirani     40   23                  Director\n" +
               "Vidhu Vinod Chopra  58   21                  Screenplay\n";
List<String[]> rows = new ArrayList<>();
try (BufferedReader in = new BufferedReader(new StringReader(input))) {
    in.readLine(); // skip header line
    for (String line; (line = in.readLine()) != null; ) {
        rows.add(line.split(" {2,}")); // columns are separated by two or more spaces
    }
}
for (String[] row : rows)
    System.out.println(Arrays.toString(row));
Output
[John, 23, 10, Team Leader]
[Christian Elverdam, 27, 7, Director]
[Niels Bye Nielsen, 59, 16, Composer]
[Rajkumar Hirani, 40, 23, Director]
[Vidhu Vinod Chopra, 58, 21, Screenplay]
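To get from those arrays to the pipe-delimited layout shown in the question, each column can be padded to the width of its longest value. This is a sketch under the assumption that the rows were produced by the split above; the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class ColumnPadSketch {
    // Pads every cell with spaces to the widest value in its column
    // and terminates each cell with a pipe character.
    static List<String> format(List<String[]> rows) {
        int cols = 0;
        for (String[] row : rows) {
            cols = Math.max(cols, row.length);
        }
        int[] width = new int[cols];
        for (String[] row : rows) {
            for (int i = 0; i < row.length; i++) {
                width[i] = Math.max(width[i], row[i].length());
            }
        }
        List<String> out = new ArrayList<>();
        for (String[] row : rows) {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < cols; i++) {
                String cell = i < row.length ? row[i] : "";
                sb.append(cell);
                for (int k = cell.length(); k < width[i]; k++) {
                    sb.append(' ');
                }
                sb.append('|');
            }
            out.add(sb.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"John", "23", "10", "Team Leader"});
        rows.add(new String[] {"Christian Elverdam", "27", "7", "Director"});
        rows.add(new String[] {"Niels Bye Nielsen", "59", "16", "Composer"});
        format(rows).forEach(System.out::println);
    }
}
```

Padding by hand instead of building a "%-Ns" format string avoids a formatter error when a column happens to be empty and its width is zero.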

I cannot get this code to line up underneath the headers

I have to get my output to line up beneath a heading. No matter what I do, I cannot get it to line up. The item name is also very long, and the words end up wrapping to the next line when I open my outfile. Here is my current output:
8 items are currently available for purchase in Joan's Hardware Store.
----------Joan's Hardware Store-----------
itemID itemName pOrdered pInStore pSold manufPrice sellPrice
1111 Dish Washer 20 20 0 250.50 550.50
2222 Micro Wave 75 75 0 150.00 400.00
3333 Cooking Range 50 50 0 450.00 850.00
4444 Circular Saw 150 150 0 45.00 125.00
5555 Cordless Screwdriver Kit 10 10 0 250.00 299.00
6666 Keurig Programmable Single-Serve 2 2 0 150.00 179.00
7777 Moen Chrome Kitchen Faucet 1 1 0 90.00 104.00
8888 Electric Pressure Washer 0 0 0 150.00 189.00
Total number of items in store: 308
Total inventory: $: 48400.0
Here is my code:
public void endOfDay(PrintWriter outFile)
{
    outFile.println(nItems + " items are currently available for purchase in Joan's Hardware Store.");
    outFile.println("----------Joan's Hardware Store-----------");
    outFile.printf("itemID, itemName, pOrdered, pInStore, pSold, manufPrice, sellPrice");
    for (int index = 0; index < nItems; index++)
    {
        outFile.printf("%n %-5s %-32s %d %d %d %.2f %.2f%n", items[index].itemID, items[index].itemName, items[index].numPord, items[index].numCurrInSt, items[index].numPSold, items[index].manuprice, items[index].sellingprice);
    }
    outFile.println("Total number of items in store: " + getTotalOfStock());
    outFile.println("Total inventory: $: " + getTotalDollarValueInStore());
} // end endOfDay
Thanks for any help! I have tried many things for hours!!
Basically, you need to format your header the same way you format your lines, for example...
System.out.println("----------Joan's Hardware Store-----------");
System.out.printf("%-6s %-32s %-8s %-8s %-5s %-10s %-8s%n", "itemID", "itemName", "pOrdered", "pInStore", "pSold", "manufPrice", "sellPrice");
System.out.printf("%-6s %-32s %-8d %-8d %-5d %-10.2f %-8.2f%n", "1111", "Dish Washer", 20, 20, 0, 250.50, 550.50);
Results in...
----------Joan's Hardware Store-----------
itemID itemName                         pOrdered pInStore pSold manufPrice sellPrice
1111   Dish Washer                      20       20       0     250.50     550.50
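One way to keep the header and the rows from drifting apart is to define both format strings side by side with the same widths. The sketch below is only an illustration: the class, constants, and method names are hypothetical, and the last column is widened to 9 here so that the 9-character "sellPrice" heading fits its own column.

```java
import java.util.Locale;

class AlignedReportSketch {
    // Identical widths in both formats; only the conversion letters differ.
    static final String HEADER_FMT = "%-6s %-32s %-8s %-8s %-5s %-10s %-9s%n";
    static final String ROW_FMT = "%-6s %-32s %-8d %-8d %-5d %-10.2f %-9.2f%n";

    static String header() {
        return String.format(HEADER_FMT, "itemID", "itemName", "pOrdered",
                "pInStore", "pSold", "manufPrice", "sellPrice");
    }

    // Locale.US keeps the decimal point a dot regardless of the system locale.
    static String row(String id, String name, int ordered, int inStore,
                      int sold, double manufPrice, double sellPrice) {
        return String.format(Locale.US, ROW_FMT, id, name, ordered, inStore,
                sold, manufPrice, sellPrice);
    }

    public static void main(String[] args) {
        System.out.print(header());
        System.out.print(row("1111", "Dish Washer", 20, 20, 0, 250.50, 550.50));
        System.out.print(row("6666", "Keurig Programmable Single-Serve", 2, 2, 0, 150.00, 179.00));
    }
}
```

In the endOfDay method from the question, the same two format strings could be passed to outFile.printf, once for the header and once per item inside the loop. The longest item name in the sample is 32 characters, so it exactly fills the %-32s column; a longer name would push the rest of its row to the right.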
