Writing data to text file in table format - java

So far I have this:
File dir = new File("C:\\Users\\User\\Desktop\\dir\\dir1\\dir2");
dir.mkdirs();
File file = new File(dir, "filename.txt");
FileWriter archivo = new FileWriter(file);
archivo.write(String.format("%20s %20s", "column 1", "column 2 \r\n"));
archivo.write(String.format("%20s %20s", "data 1", "data 2"));
archivo.flush();
archivo.close();
However, the file output looks like this:
Which I do not like at all.
How can I make a better table format for the output of a text file?
Would appreciate any assistance.
Thanks in advance!
EDIT: Fixed!
Also, instead of looking like
            column 1             column 2
              data 1               data 2
how can I make it look like this:
column 1             column 2
data 1               data 2
I would prefer it that way.

The \r\n is being evaluated as part of the second argument, so the formatter is basically calculating the required padding as something like 20 - "column 2".length() - " \r\n".length(). Since the second line doesn't have this, its text is padded differently and the columns look misaligned.
Try adding the \r\n as part of the base format instead, for example...
String.format("%20s %20s \r\n", "column 1", "column 2")
This generates something like...
column 1 column 2
data 1 data 2
In my tests...
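The effect can be checked with a small sketch. The class name PaddingDemo and the helper visibleEnd are made up for illustration; the only library call is String.format. It reports the column at which the visible text of each row ends:

```java
final class PaddingDemo {
    // Returns the index just past the visible text in a formatted row.
    static int visibleEnd(String row, String text) {
        return row.indexOf(text) + text.length();
    }

    public static void main(String[] args) {
        // Terminator passed inside the argument: it counts toward the 20-character width.
        String header = String.format("%20s %20s", "column 1", "column 2 \r\n");
        // No terminator in the argument: the text is padded to the full width.
        String data = String.format("%20s %20s", "data 1", "data 2");

        System.out.println(visibleEnd(header, "column 2")); // 38
        System.out.println(visibleEnd(data, "data 2"));     // 41

        // Terminator moved into the format string: the text ends at the same column again.
        String fixed = String.format("%20s %20s%n", "column 1", "column 2");
        System.out.println(visibleEnd(fixed, "column 2"));  // 41
    }
}
```

The three-character difference is exactly the length of " \r\n", which the formatter counted as part of the second column.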

I think you are trying to get data in tabular format. I've developed a Java library that can build much more complex tables with more customization. You can get the source code here. Following are some of the basic table views that my library can create. Hope this is useful enough!
COLUMN WISE GRID(DEFAULT)
+--------------+--------------+--------------+--------------+-------------+
|NAME          |GENDER        |MARRIED       |           AGE|    SALARY($)|
+--------------+--------------+--------------+--------------+-------------+
|Eddy          |Male          |No            |            23|      1200.27|
|Libby         |Male          |No            |            17|       800.50|
|Rea           |Female        |No            |            30|     10000.00|
|Deandre       |Female        |No            |            19|     18000.50|
|Alice         |Male          |Yes           |            29|       580.40|
|Alyse         |Female        |No            |            26|      7000.89|
|Venessa       |Female        |No            |            22|    100700.50|
+--------------+--------------+--------------+--------------+-------------+
FULL GRID
+------------------------+-------------+------+-------------+-------------+
|NAME                    |GENDER       |MARRIE|          AGE|    SALARY($)|
+------------------------+-------------+------+-------------+-------------+
|Eddy                    |Male         |No    |           23|      1200.27|
+------------------------+-------------+------+-------------+-------------+
|Libby                   |Male         |No    |           17|       800.50|
+------------------------+-------------+------+-------------+-------------+
|Rea                     |Female       |No    |           30|     10000.00|
+------------------------+-------------+------+-------------+-------------+
|Deandre                 |Female       |No    |           19|     18000.50|
+------------------------+-------------+------+-------------+-------------+
|Alice                   |Male         |Yes   |           29|       580.40|
+------------------------+-------------+------+-------------+-------------+
|Alyse                   |Female       |No    |           26|      7000.89|
+------------------------+-------------+------+-------------+-------------+
|Venessa                 |Female       |No    |           22|    100700.50|
+------------------------+-------------+------+-------------+-------------+
NO GRID
NAME GENDER MARRIE AGE SALARY($)
Alice Male Yes 29 580.40
Alyse Female No 26 7000.89
Eddy Male No 23 1200.27
Rea Female No 30 10000.00
Deandre Female No 19 18000.50
Venessa Female No 22 100700.50
Libby Male No 17 800.50
Eddy Male No 23 1200.27
Libby Male No 17 800.50
Rea Female No 30 10000.00
Deandre Female No 19 18000.50
Alice Male Yes 29 580.40
Alyse Female No 26 7000.89
Venessa Female No 22 100700.50

You're currently including " \r\n" within your right-aligned second argument. I suspect you don't want the space at all, and you don't want the \r\n to be part of the count of 20 characters.
To left-align instead of right-aligning, use the - flag, i.e. %-20s instead of %20s. See the Formatter documentation for more information.
Additionally, you can make the code work in a more cross-platform way using %n to represent the current platform's line terminator (unless you specifically want a Windows file).
I'd recommend the use of Files.newBufferedWriter as well, as that allows you to specify the character encoding (and will use UTF-8 otherwise, which is better than using the platform default)... and use a try-with-resources statement to close the writer even in the face of an exception:
try (Writer writer = Files.newBufferedWriter(file.toPath())) {
writer.write(String.format("%-20s %-20s%n", "column 1", "column 2"));
writer.write(String.format("%-20s %-20s%n", "data 1", "data 2"));
}
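Pulling these suggestions together, a minimal sketch might look like the following. The class TableWriter and its methods formatRow and writeTable are hypothetical names, and the 20-character column width is simply carried over from the question; only String.format and Files.newBufferedWriter come from the JDK.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

final class TableWriter {
    // Left-aligns each cell in a 20-character column, separates the cells with
    // one space, and ends the row with the platform line terminator.
    static String formatRow(String... cells) {
        StringBuilder row = new StringBuilder();
        for (int i = 0; i < cells.length; i++) {
            if (i > 0) {
                row.append(' ');
            }
            row.append(String.format("%-20s", cells[i]));
        }
        return row.append(System.lineSeparator()).toString();
    }

    // Writes all rows as UTF-8; try-with-resources closes the writer even on failure.
    static void writeTable(Path target, List<String[]> rows) throws IOException {
        try (Writer writer = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            for (String[] row : rows) {
                writer.write(formatRow(row));
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path target = Files.createTempFile("table", ".txt");
        writeTable(target, Arrays.asList(
                new String[] {"column 1", "column 2"},
                new String[] {"data 1", "data 2"}));
        System.out.println("Wrote " + target);
    }
}
```

Keeping the width in one helper means a later change (say, a wider first column) only has to be made in one place.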

try {
PrintWriter outputStream = new PrintWriter("myObjects.txt");
outputStream.println(String.format("%-20s %-20s %-20s", "Name", "Age", "Gender"));
outputStream.println(String.format("%-20s %-20s %-20s", "John", "30", "Male"));
outputStream.close();
} catch (IOException e) {
e.printStackTrace();
}

It also works with the printf in case you want to output variables instead of hard coding
try {
PrintWriter resultData = new PrintWriter("Result.txt");
resultData.println("THE RESULTS OF THE OPERATIONS\n");
for (int i = 0; i < 15; i++){
resultData.printf("%-20d%-20d%n", finalScores[i], midSemScores[i]);
}
resultData.close();
} catch (IOException e){
System.out.println("An error occurred");
e.printStackTrace();
}

Related

Spark Dataset - How to create a new column by modifying an existing column value

I have a Dataset like below
Dataset<Row> dataset = ...
dataset.show()
| NAME | DOB |
+------+----------+
| John | 19801012 |
| Mark | 19760502 |
| Mick | 19911208 |
I want to convert it to below (formatted DOB)
| NAME | DOB |
+------+------------+
| John | 1980-10-12 |
| Mark | 1976-05-02 |
| Mick | 1991-12-08 |
How can I do this? Basically, I am trying to figure out how to manipulate existing column string values in a generic way.
I tried using dataset.withColumn but couldn't quite figure out how to achieve this.
Appreciate any help.
With "substring" and "concat" functions:
df.withColumn("DOB_FORMATED",
concat(substring($"DOB", 0, 4), lit("-"), substring($"DOB", 5, 2), lit("-"), substring($"DOB", 7, 2)))
Load the data into a dataframe(deltaData) and just use the following line
deltaData.withColumn("DOB", date_format(to_date($"DOB", "yyyyMMdd"), "yyyy-MM-dd")).show()
Assuming DOB is a String you could write a UDF
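def formatDate(s: String): String = {
// date formatting code; for example, yyyyMMdd to yyyy-MM-dd:
java.time.LocalDate.parse(s, java.time.format.DateTimeFormatter.BASIC_ISO_DATE).toString
}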
val formatDateUdf = udf(formatDate(_: String))
ds.select($"NAME", formatDateUdf($"DOB").as("DOB"))
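Since the question is about Java, here is a minimal plain-Java sketch of the date formatting step on its own, assuming DOB is always an eight-digit yyyyMMdd string. The class and method names are made up for illustration, and the sketch does not touch the Spark API; it only shows the conversion that a UDF body could perform.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

final class DobFormat {
    // Parses a compact yyyyMMdd string and renders it as yyyy-MM-dd.
    // Throws java.time.format.DateTimeParseException if the input is not a valid date.
    static String formatDob(String dob) {
        LocalDate date = LocalDate.parse(dob, DateTimeFormatter.BASIC_ISO_DATE);
        return date.format(DateTimeFormatter.ISO_LOCAL_DATE);
    }

    public static void main(String[] args) {
        System.out.println(formatDob("19801012")); // 1980-10-12
    }
}
```

Parsing into a LocalDate rather than slicing substrings also rejects impossible dates such as 19801340.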

With Apache Spark, flatten the first 2 rows of each group with Java

Given the following input table:
+----+------------+----------+
| id | shop | purchases|
+----+------------+----------+
| 1 | 01 | 20 |
| 1 | 02 | 31 |
| 2 | 03 | 5 |
| 1 | 03 | 3 |
+----+------------+----------+
I would like to group by id and, based on the purchases, obtain the top 2 shops as follows:
+----+-------+------+
| id | top_1 | top_2|
+----+-------+------+
| 1 | 02 | 01 |
| 2 | 03 | |
+----+-------+------+
I'm using Apache Spark 2.0.1 and the first table is the result of other queries and joins which are on a Dataset. I could maybe do this with the traditional java iterating over the Dataset, but I hope there is another way using the Dataset functionalities.
My first attempt was the following:
//dataset is already ordered by id, purchases desc
...
Dataset<Row> ds = dataset.repartition(new Column("id"));
ds.foreachPartition(new ForeachPartitionFunction<Row>() {
@Override
public void call(Iterator<Row> itrtr) throws Exception {
int counter = 0;
while (itrtr.hasNext()) {
Row row = itrtr.next();
if(counter < 2)
//save it into another Dataset
counter ++;
}
}
});
But then I was lost on how to save it into another Dataset. My goal is, in the end, to save the result into a MySQL table.
Using window functions and pivot you can define a window:
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, first, row_number}
val w = Window.partitionBy(col("id")).orderBy(col("purchases").desc)
add row_number and filter top two rows:
val dataset = Seq(
(1, "01", 20), (1, "02", 31), (2, "03", 5), (1, "03", 3)
).toDF("id", "shop", "purchases")
val topTwo = dataset.withColumn("top", row_number.over(w)).where(col("top") <= 2)
and pivot:
topTwo.groupBy(col("id")).pivot("top", Seq(1, 2)).agg(first("shop"))
with result being:
+---+---+----+
| id| 1| 2|
+---+---+----+
| 1| 02| 01|
| 2| 03|null|
+---+---+----+
I'll leave converting syntax to Java as an exercise for the poster (excluding import static for functions the rest should be close to identical).
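To make the logic of the three steps (partition by id, order by purchases descending, keep the first two) concrete, here is a plain-Java sketch over an in-memory list. It deliberately does not use the Spark API, so it is not the Java translation of the code above; the class TopTwoShops and its nested Purchase type are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class TopTwoShops {
    // One input row: id, shop, purchases.
    static final class Purchase {
        final int id;
        final String shop;
        final int purchases;

        Purchase(int id, String shop, int purchases) {
            this.id = id;
            this.shop = shop;
            this.purchases = purchases;
        }
    }

    // Returns, for each id, at most two shops ordered by purchases descending.
    static Map<Integer, List<String>> topTwo(List<Purchase> rows) {
        // Group the rows by id (the "partition by" step).
        Map<Integer, List<Purchase>> byId = new TreeMap<>();
        for (Purchase p : rows) {
            byId.computeIfAbsent(p.id, k -> new ArrayList<>()).add(p);
        }
        Map<Integer, List<String>> result = new TreeMap<>();
        for (Map.Entry<Integer, List<Purchase>> entry : byId.entrySet()) {
            List<Purchase> group = entry.getValue();
            // Sort each group by purchases, highest first (the "order by" step).
            group.sort(Comparator.comparingInt((Purchase p) -> p.purchases).reversed());
            // Keep the first two shops (the "row number <= 2" filter).
            List<String> shops = new ArrayList<>();
            for (Purchase p : group.subList(0, Math.min(2, group.size()))) {
                shops.add(p.shop);
            }
            result.put(entry.getKey(), shops);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Purchase> rows = new ArrayList<>();
        rows.add(new Purchase(1, "01", 20));
        rows.add(new Purchase(1, "02", 31));
        rows.add(new Purchase(2, "03", 5));
        rows.add(new Purchase(1, "03", 3));
        System.out.println(topTwo(rows)); // {1=[02, 01], 2=[03]}
    }
}
```

This only works when the data fits in memory on one machine; for a real Dataset the window function approach keeps the work distributed.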

RegEx for normalising UK telephone number

I am trying to normalise UK telephone numbers to international format.
The following strings should resolve to: +447834012345
07834012345
+447834012345
+4407834012345
+44 (0) 7834 012345
+44 0 7834 012345
004407834012345
0044 (0) 7834012345
00 44 0 7834012345
So far, I have got this:
"+44" + mobile.replaceAll("[^0-9]0*(44)?0*", "")
This doesn't quite cut it, as I am having problems with leading 0's etc; see table below. I'd like to try and refrain from using the global flag if possible.
Mobile | Normalised |
--------------------+--------------------+------
07834012345 | +4407834012345 | FAIL
+447834012345 | +447834012345 | PASS
+4407834012345 | +447834012345 | PASS
+44 (0) 7834 012345 | +44783412345 | FAIL
+44 0 7834 012345 | +44783412345 | FAIL
004407834012345 | +44004407834012345 | FAIL
0044 (0) 7834012345 | +4400447834012345 | FAIL
00 44 0 7834012345 | +44007834012345 | FAIL
+4407834004445 | +447834004445 | PASS
Thanks
If you still want the regex I was able to get it working like this:
System.out.println("+44" + mobile.replaceAll("[^0-9]", "")
.replaceAll("^0{0,2}(44){0,2}0{0,1}(\\d{10})", "$2"));
EDIT: Changed the code to reflect failed tests. Removed non-numeric characters before running the regex.
EDIT: Update code based on comments.
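Wrapped in a method, the same two-step replacement can be checked against the inputs from the question. This is only a sketch: the class name UkPhoneNormaliser and the method normalise are made up, and it assumes the national number is always 10 digits once the leading 0 is dropped.

```java
final class UkPhoneNormaliser {
    // Strips every non-digit, then drops an optional 00 prefix, the 44 country
    // code, and a single trunk 0, keeping the 10-digit national number.
    static String normalise(String mobile) {
        String digits = mobile.replaceAll("[^0-9]", "");
        return "+44" + digits.replaceAll("^0{0,2}(44){0,2}0{0,1}(\\d{10})", "$2");
    }

    public static void main(String[] args) {
        String[] samples = {
            "07834012345", "+447834012345", "+4407834012345",
            "+44 (0) 7834 012345", "+44 0 7834 012345", "004407834012345",
            "0044 (0) 7834012345", "00 44 0 7834012345"
        };
        for (String sample : samples) {
            // Each sample prints +447834012345 on the right-hand side.
            System.out.println(sample + " -> " + normalise(sample));
        }
    }
}
```

Because the pattern is anchored with ^, it can match at most once per string, so the replacement behaves like a single, non-global substitution.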
Like my answer here, I would also suggest looking at the Google libphonenumber library. I know it is not regex but it does exactly what you want.
An example of how to do it in Java (it is available in other languages) would be the following from the documentation:
Let's say you have a string representing a phone number from
Switzerland. This is how you parse/normalize it into a PhoneNumber
object:
String swissNumberStr = "044 668 18 00";
PhoneNumberUtil phoneUtil = PhoneNumberUtil.getInstance();
try {
PhoneNumber swissNumberProto = phoneUtil.parse(swissNumberStr, "CH");
} catch (NumberParseException e) {
System.err.println("NumberParseException was thrown: " + e.toString());
}
At this point, swissNumberProto contains:
{
"country_code": 41,
"national_number": 446681800
}
PhoneNumber is a class that is auto-generated from the
phonenumber.proto with necessary modifications for efficiency. For
details on the meaning of each field, refer to
https://github.com/googlei18n/libphonenumber/blob/master/resources/phonenumber.proto
Now let us validate whether the number is valid:
boolean isValid = phoneUtil.isValidNumber(swissNumberProto); // returns true
There are a few formats supported by the formatting method, as
illustrated below:
// Produces "+41 44 668 18 00"
System.out.println(phoneUtil.format(swissNumberProto, PhoneNumberFormat.INTERNATIONAL));
// Produces "044 668 18 00"
System.out.println(phoneUtil.format(swissNumberProto, PhoneNumberFormat.NATIONAL));
// Produces "+41446681800"
System.out.println(phoneUtil.format(swissNumberProto, PhoneNumberFormat.E164));

Talend - generating n multiple rows from 1 row

Background: I'm using Talend to do something (I guess) that is pretty common: generating multiple rows from one. For example:
ID | Name | DateFrom | DateTo
01 | Marco| 01/01/2014 | 04/01/2014
...could be split into:
new_ID | ID | Name | DateFrom | DateTo
01 | 01 | Marco | 01/01/2014 | 02/01/2014
02 | 01 | Marco | 02/01/2014 | 03/01/2014
03 | 01 | Marco | 03/01/2014 | 04/01/2014
The number of resulting rows is dynamic, depending on the date period in the original row.
Question: how can I do this? Maybe using tSplitRow? I am going to check those periods with tJavaRow. Any suggestions?
Expanding on the answer given by Balazs Gunics
The first part is to calculate the number of rows one row will become, which is easy enough with a date diff function on the to and from dates.
Part 2 is to pass that value to a tFlowToIterate, and pick it up with a tJavaFlex that will use it in its start code to control a for loop:
tJavaFlex start:
int currentId = (Integer)globalMap.get("out1.id");
String currentName = (String)globalMap.get("out1.name");
Long iterations = (Long)globalMap.get("out1.iterations");
Date dateFrom = (java.util.Date)globalMap.get("out1.dateFrom");
for(int i=0; i<((Long)globalMap.get("out1.iterations")); i++) {
Main
row2.id = currentId;
row2.name = currentName;
row2.dateFrom = TalendDate.addDate(dateFrom, i, "dd");
row2.dateTo = TalendDate.addDate(dateFrom, i+1, "dd");
End
}
and sample output:
1|Marco|01-01-2014|02-01-2014
1|Marco|02-01-2014|03-01-2014
1|Marco|03-01-2014|04-01-2014
2|Polo|01-01-2014|02-01-2014
2|Polo|02-01-2014|03-01-2014
2|Polo|03-01-2014|04-01-2014
2|Polo|04-01-2014|05-01-2014
2|Polo|05-01-2014|06-01-2014
2|Polo|06-01-2014|07-01-2014
2|Polo|07-01-2014|08-01-2014
2|Polo|08-01-2014|09-01-2014
2|Polo|09-01-2014|10-01-2014
2|Polo|10-01-2014|11-01-2014
2|Polo|11-01-2014|12-01-2014
2|Polo|12-01-2014|13-01-2014
2|Polo|13-01-2014|14-01-2014
2|Polo|14-01-2014|15-01-2014
2|Polo|15-01-2014|16-01-2014
2|Polo|16-01-2014|17-01-2014
2|Polo|17-01-2014|18-01-2014
2|Polo|18-01-2014|19-01-2014
2|Polo|19-01-2014|20-01-2014
2|Polo|20-01-2014|21-01-2014
2|Polo|21-01-2014|22-01-2014
2|Polo|22-01-2014|23-01-2014
2|Polo|23-01-2014|24-01-2014
2|Polo|24-01-2014|25-01-2014
2|Polo|25-01-2014|26-01-2014
2|Polo|26-01-2014|27-01-2014
2|Polo|27-01-2014|28-01-2014
2|Polo|28-01-2014|29-01-2014
2|Polo|29-01-2014|30-01-2014
2|Polo|30-01-2014|31-01-2014
2|Polo|31-01-2014|01-02-2014
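Outside Talend, the same loop can be sketched in plain Java with java.time. The class DateRangeSplitter and the method splitIntoDays are hypothetical names, and the sketch stands in for the TalendDate and globalMap calls rather than reproducing them: it computes the number of iterations from the date difference, then emits one from|to pair per day.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.List;

final class DateRangeSplitter {
    // Splits the period from..to into consecutive one-day periods,
    // each rendered as "from|to" in ISO format (yyyy-MM-dd).
    static List<String> splitIntoDays(LocalDate from, LocalDate to) {
        // Number of output rows: the date difference in days.
        long iterations = ChronoUnit.DAYS.between(from, to);
        List<String> rows = new ArrayList<>();
        for (long i = 0; i < iterations; i++) {
            rows.add(from.plusDays(i) + "|" + from.plusDays(i + 1));
        }
        return rows;
    }

    public static void main(String[] args) {
        // Prints three rows covering 2014-01-01 through 2014-01-04.
        for (String row : splitIntoDays(LocalDate.of(2014, 1, 1), LocalDate.of(2014, 1, 4))) {
            System.out.println(row);
        }
    }
}
```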
You can use tJavaFlex to do this.
If you have a small number of columns, then the tFlowToIterate -> tJavaFlex option could be fine.
In the begin part you can start to iterate, and in the main part you assign values to the output schema. If your output is named row6, then:
row6.id = (String)globalMap.get("id");
and so on.
I came here as I wanted to add all context parameters into an Excel data sheet. So the solution below works when you are taking 0 input lines, but can be adapted to generate several lines for each line in input.
The design is actually straightforward:
tJava –trigger-on-OK→ tFileInputDelimited → tDoSomethingOnRowSet
↓ ↑
[write into a CSV] [read the CSV]
And here is the kind of code structure usable in the tJava.
try {
StringBuffer wad = new StringBuffer();
wad.append("Key;Nub"); // Header
context.stringPropertyNames().forEach(
key -> wad.
append(System.getProperty("line.separator")).
append(key + ";" + context.getProperty(key) )
);
// Here context.metadata contains the path to the CSV file
FileWriter output = new FileWriter(context.metadata);
output.write(wad.toString());
output.close();
} catch (IOException mess) {
System.out.println("An error occurred.");
mess.printStackTrace();
}
Of course if you have a set of rows as input, you can adapt the process to use a tJavaRow instead of a tJava.
You might prefer to use an Excel file as an on-disk buffer, but dealing with this file format takes more work, at least the first time, when you don't have the Java libraries already configured in Talend. Apache POI might help if you nonetheless choose to go this way.

read pdf from itext

I made a table in a PDF using iText in a Java web application.
PDF Generated is:
Gender | Column 1 | Column 2 | Column 3
Male | 1845 | 645 | 254
Female | 214 | 457 | 142
To read the PDF I used the following code:
ArrayList<PdfPRow> allrows = firstable.getRows();
for (PdfPRow currentrow:allrows) {
PdfPCell[] allcells = currentrow.getCells();
System.out.println("CurrentRow ->"+currentrow.getCells());
for(PdfPCell currentcell : allcells){
ArrayList<Element> element = (ArrayList<Element>) currentcell.getCompositeElements();
System.out.println("Element->"+element.toString());
}
}
How can I read the text from the PDF columns and pass it to int variables?
Why don't you generate the columns of the PDF as fields, so that reading will be much easier?
