How can I generate a million rows of records in CSV format using Java, with some unique data?
Check out this tutorial.
The code can be quite simple:
MockNeat m = MockNeat.threadLocal();
final Path path = Paths.get("./test.csv");
m.fmt("#{id},#{first},#{last},#{email},#{salary}")
.param("id", m.longSeq())
.param("first", m.names().first())
.param("last", m.names().last())
.param("email", m.emails())
.param("salary", m.money().locale(GERMANY).range(2000, 5000))
.list(1000)
.consume(list -> {
try { Files.write(path, list, CREATE, WRITE); }
catch (IOException e) { e.printStackTrace(); }
});
And the possible result is:
0,Ailene,Greener,auldsoutache@gmx.com,4.995,59 €
1,Yung,Skovira,sereglady@mail.com,2.850,23 €
2,Shanelle,Hevia,topslawton@mac.com,2.980,19 €
3,Venice,Lepe,sagelyshroud@mail.com,4.611,83 €
4,Mi,Repko,nonedings@email.com,3.811,38 €
5,Leonie,Slomski,plumpcreola@aol.com,4.584,28 €
6,Elisabeth,Blasl,swartjeni@mail.com,2.839,69 €
7,Ernestine,Syphard,prestoshod@aol.com,3.471,93 €
8,Honey,Winfrey,pseudpatria@email.com,4.276,56 €
9,Dian,Holecek,primbra@att.net,3.643,66 €
10,Mitchell,Lawer,lessjoellen@yahoo.com,3.260,92 €
11,Kayla,Labbee,hobnailmastella@mail.com,2.504,99 €
12,Jann,Grafenstein,douremile@verizon.net,4.535,70 €
13,Shaunna,Uknown,taughtclifton@gmx.com,3.028,81 €
...
This can give you an idea of how to build a generator.
The random data can be generated using Random class and adapting it to the data you need to generate.
public interface ICsvRandomGenerator {
    /* Adds the field definition to an array list that describes the csv */
    void addFieldDefinition(FieldDefinition fieldDefinition);
    /* Runs a loop for the number of records needed and for each one it
       goes through the FieldDefinition ArrayList, generates the random
       data based on the field definition, and adds it to the current
       record. After the last field, it starts a new record. */
    void generateFile(String fileName);
}
public class FieldDefinition {
    String fieldName;
    String fieldType; // Alphabetic, Number, Date, etc.
    int length;
    // getters and setters
}
public abstract class CsvRandomGenerator implements ICsvRandomGenerator {
    ArrayList<FieldDefinition> fields = new ArrayList<>();
    // @Override the interface methods here to implement them.
    protected abstract String generateRandomAlpha();
    protected abstract String generateRandomDate();
    protected abstract String generateRandomNumber();
    ...
}
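To make the outline above more concrete, here is a minimal sketch of a Random-based generator. The class name SimpleCsvGenerator, the fixed "id,name,amount" layout and the 2000 to 5000 amount range are assumptions made for illustration; they are not part of the interface sketched above.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.Random;

// Hypothetical, simplified generator: the name and column layout are illustrative only.
class SimpleCsvGenerator {
    private final Random random;

    SimpleCsvGenerator(long seed) {
        this.random = new Random(seed);
    }

    // Builds a random lower-case string of the given length.
    String randomAlpha(int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append((char) ('a' + random.nextInt(26)));
        }
        return sb.toString();
    }

    // Writes `rows` records as "id,name,amount". The running id is what
    // keeps every row unique; the other two columns are random.
    void generate(Writer out, int rows) {
        try {
            BufferedWriter writer = new BufferedWriter(out);
            for (int id = 0; id < rows; id++) {
                int amount = 2000 + random.nextInt(3001); // 2000..5000 inclusive
                writer.write(id + "," + randomAlpha(8) + "," + amount);
                writer.newLine();
            }
            writer.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        StringWriter sink = new StringWriter();
        new SimpleCsvGenerator(42L).generate(sink, 5);
        System.out.print(sink);
    }
}
```

For a million rows you would pass a file-backed writer, for example Files.newBufferedWriter(Paths.get("test.csv")), and 1_000_000 as the row count. Because rows are streamed one at a time, memory use should stay flat.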
I am new to Java Spring boot and I am trying to read data from a CSV file and then pass this data to my UI via REST.
I am using OpenCSV to parse my file. Here is my sample line in my file
firstname, lastname, card details.
card details consists of card number and expiry date. So my example text looks like:
joe, keller, 123456 22-2-1999.
The output from my rest end point should be:
{
"firstname": "joe",
"lastname" : "keller",
"card number" : "123456",
"expiry date" : "22-2-1999"
}
I currently read the file using CsvToBeanBuilder, and I am stuck on mapping the final column to two different items.
Because the last column contains two fields, card number and expiry date, you cannot directly use CsvToBeanBuilder to read the CSV file and convert it to your POJO. An alternative way is shown below:
First, I assume that you already have a POJO that looks like this:
class CreditCardInfo {
private String firstname;
private String lastname;
private String cardNumber;
private String expiryDate;
//general getters and setters
//toString()
}
Then you can use CSVReader.readNext() to read each line into a string array. For the third column of the CSV file, call String.trim() to drop the leading space and then split on the space (" ") to get two fields: "123456" and "22-2-1999". You can then store these in cardNumber and expiryDate of the CreditCardInfo POJO, respectively.
Code snippet
try (
Reader reader = Files.newBufferedReader(Paths.get("So58792579.csv"));
CSVReader csvReader = new CSVReader(reader);
) {
String[] nextRecord;
while ((nextRecord = csvReader.readNext()) != null) {
CreditCardInfo creditCardInfo = new CreditCardInfo();
creditCardInfo.setFirstname(nextRecord[0]);
creditCardInfo.setLastname(nextRecord[1].trim()); //trim() for ignoring leading space
creditCardInfo.setCardNumber(nextRecord[2].trim().split(" ")[0]);
creditCardInfo.setExpiryDate(nextRecord[2].trim().split(" ")[1]);
System.out.println(creditCardInfo.toString());
}
} catch (IOException e) {
e.printStackTrace();
}
Console output
CreditCardInfo [firstname=joe, lastname=keller, cardNumber=123456, expiryDate=22-2-1999]
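The splitting step on its own can be sketched without OpenCSV. This is a minimal illustration, assuming the input has exactly three comma-separated columns and no quoted commas; the class name CardLineParser is hypothetical. Unlike the snippet above, it rejects lines where the last column does not hold exactly two values instead of failing with an ArrayIndexOutOfBoundsException.

```java
import java.util.Arrays;

// Hypothetical helper that isolates the "one column, two fields" split.
class CardLineParser {
    // Splits "first, last, cardNumber expiryDate" into four trimmed fields.
    // Assumes plain commas as separators (no quoting).
    static String[] parse(String line) {
        String[] columns = line.split(",");
        if (columns.length != 3) {
            throw new IllegalArgumentException("Expected 3 columns: " + line);
        }
        // The last column holds two values separated by whitespace.
        String[] card = columns[2].trim().split("\\s+");
        if (card.length != 2) {
            throw new IllegalArgumentException("Expected 'number date': " + columns[2]);
        }
        return new String[] { columns[0].trim(), columns[1].trim(), card[0], card[1] };
    }

    public static void main(String[] args) {
        // prints [joe, keller, 123456, 22-2-1999]
        System.out.println(Arrays.toString(parse("joe, keller, 123456 22-2-1999")));
    }
}
```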
Is it possible to parse a delimited file and find column datatypes? e.g
Delimited file:
Email,FirstName,DOB,Age,CreateDate
test@test1.com,Test User1,20/01/2001,24,23/02/2015 14:06:45
test@test2.com,Test User2,14/02/2001,24,23/02/2015 14:06:45
test@test3.com,Test User3,15/01/2001,24,23/02/2015 14:06:45
test@test4.com,Test User4,23/05/2001,24,23/02/2015 14:06:45
Output:
Email datatype: email
FirstName datatype: Text
DOB datatype: date
Age datatype: int
CreateDate datatype: Timestamp
The purpose of this is to read a delimited file and construct a table creation query on the fly and insert data into that table.
I tried using apache validator, I believe we need to parse the complete file in order to determine each column data type.
EDIT: The code that I've tried:
CSVReader csvReader = new CSVReader(new FileReader(fileName),',');
String[] row = null;
int[] colLength=(int[]) null;
int colCount = 0;
String[] colDataType = null;
String[] colHeaders = null;
String[] header = csvReader.readNext();
if (header != null) {
colCount = header.length;
}
colLength = new int[colCount];
colDataType = new String[colCount];
colHeaders = new String[colCount];
for (int i=0;i<colCount;i++){
colHeaders[i]=header[i];
}
int templength=0;
String tempType = null;
IntegerValidator intValidator = new IntegerValidator();
DateValidator dateValidator = new DateValidator();
TimeValidator timeValidator = new TimeValidator();
while((row = csvReader.readNext()) != null) {
for(int i=0;i<colCount;i++) {
templength = row[i].length();
colLength[i] = templength > colLength[i] ? templength : colLength[i];
if(colHeaders[i].equalsIgnoreCase("email")){
logger.info("Col "+i+" is Email");
} else if(intValidator.isValid(row[i])){
tempType="Integer";
logger.info("Col "+i+" is Integer");
} else if(timeValidator.isValid(row[i])){
tempType="Time";
logger.info("Col "+i+" is Time");
} else if(dateValidator.isValid(row[i])){
tempType="Date";
logger.info("Col "+i+" is Date");
} else {
tempType="Text";
logger.info("Col "+i+" is Text");
}
logger.info(row[i].length()+"");
}
}
I am not sure if this is the best way of doing this. Any pointers in the right direction would be helpful.
If you wish to write this yourself rather than use a third party library then probably the easiest mechanism is to define a regular expression for each data type and then check if all fields satisfy it. Here's some sample code to get you started (using Java 8).
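import java.util.Arrays;
import java.util.Optional;
import java.util.function.Predicate;
import java.util.regex.Pattern;

public enum DataType {
    DATETIME("\\d\\d/\\d\\d/\\d{4} \\d\\d:\\d\\d:\\d\\d"),
    DATE("\\d\\d/\\d\\d/\\d{4}"),
    EMAIL("\\w+@\\w+"),
    TEXT(".*");
    private final Predicate<String> tester;
    DataType(String regexp) {
        // asPredicate() uses find(), so a pattern matches if it occurs anywhere in the value
        tester = Pattern.compile(regexp).asPredicate();
    }
    public static Optional<DataType> getTypeOfField(String[] fieldValues) {
        return Arrays.stream(values())
            .filter(dt -> Arrays.stream(fieldValues).allMatch(dt.tester))
            .findFirst();
    }
}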
Note that this relies on the order of the enum values (e.g. testing for datetime before date).
Yes, it is possible, and you do have to parse the entire file first. Have a set of rules for each data type. Iterate over every row in the column. Start off with every column having every data type as a candidate, and eliminate a data type if a row in that column violates one of its rules. After iterating over the column, check which data types are left. E.g. let's say we have two data types, integer and text. The rule for integer: it must contain only the digits 0-9 and may begin with '-'. Text can be anything.
Our column:
345
-1ab
123
The integer data type would be removed by the second row, so the column would be text. If row two were just -1, you would be left with both integer and text, so the column would be integer: text is never removed because our rule says text can be anything. You don't have to check for text at all; if you are left with no other data type, the answer is text. Hope this answers your question.
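The elimination approach described above can be sketched roughly as follows. This is only an illustration with the two example types, integer and text; the names ColumnTypeGuesser, ColumnType and guess are made up for the sketch, and real data would need more types and stricter rules.

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.List;
import java.util.function.Predicate;
import java.util.regex.Pattern;

// Hypothetical sketch of "start with all types, strike out the ones a value violates".
class ColumnTypeGuesser {
    // Candidate types, declared from most to least specific. TEXT accepts anything.
    enum ColumnType {
        INTEGER("-?[0-9]+"),
        TEXT(".*");

        private final Predicate<String> rule;

        ColumnType(String regex) {
            Pattern pattern = Pattern.compile(regex, Pattern.DOTALL);
            // matches() requires the whole value to satisfy the rule
            this.rule = value -> pattern.matcher(value).matches();
        }
    }

    static ColumnType guess(List<String> columnValues) {
        // Every type starts out as a candidate for the column.
        EnumSet<ColumnType> candidates = EnumSet.allOf(ColumnType.class);
        for (String value : columnValues) {
            // Remove each type whose rule this value violates.
            candidates.removeIf(type -> !type.rule.test(value));
        }
        // EnumSet iterates in declaration order, so the most specific
        // surviving type comes first. TEXT always survives.
        return candidates.iterator().next();
    }

    public static void main(String[] args) {
        System.out.println(guess(Arrays.asList("345", "-1ab", "123"))); // prints TEXT
        System.out.println(guess(Arrays.asList("345", "-1", "123")));   // prints INTEGER
    }
}
```

Note that in this sketch an empty column comes back as INTEGER, because nothing eliminated it; whether that is acceptable depends on your data.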
I needed somewhat similar logic for my project. I searched a lot but did not find the right solution. I need to pass a String to a method that returns the data type of that value. Finally I found the post from @sprinter; it is close to my logic, but I need to pass a single string instead of a string array.
I modified the code for my needs and posted it below.
import java.util.Arrays;
import java.util.Optional;
import java.util.regex.Pattern;

public enum DataType {
    // matches() must match the whole string, so every pattern describes the full value
    DATE("\\d\\d/\\d\\d/\\d{4}"),
    EMAIL(".*@gmail.*"),
    NUMBER("[0-9]+"),
    STRING("^[A-Za-z0-9? ,_-]+$");
    private final String regEx;
    public String getRegEx() {
        return regEx;
    }
    DataType(String regEx) {
        this.regEx = regEx;
    }
    public static Optional<DataType> getTypeOfField(String str) {
        return Arrays.stream(DataType.values())
            .filter(dt -> Pattern.compile(dt.getRegEx()).matcher(str).matches())
            .findFirst();
    }
}
For example:
Optional<DataType> dataType = DataType.getTypeOfField("Bharathiraja");
System.out.println(dataType);
System.out.println(dataType .get());
Output:
Optional[STRING]
STRING
Please note that the regular expression patterns vary with your requirements, so modify them as needed; don't use them as they are.
Happy Coding !
I want a CSV format for Order objects. My Order Object will have order details, order line details and item details. Please find the java object below:
Order {
OrderNo, OrderName, Price,
OrderLine {
OrderLineNo, OrderLinePrice,
Item{
ItemNo, ItemName, Item Description
}
}
}
Can anyone please guide me to create csv format for this.
Have a POJO class for your Object for which you want to create CSV file and use java.io.FileWriter to write/append values in csv file. This Java-code-geek Link will help you with this.
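As a rough sketch of one possible layout: CSV is flat, so a common choice is to write one row per order line and repeat the order columns on each row. The classes below are hypothetical stand-ins for the Order structure in the question (all fields are kept as strings for brevity), and the assumption that an order can have several order lines is mine.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical classes mirroring the Order / OrderLine / Item structure from the question.
class OrderCsvSketch {
    static class Item {
        final String itemNo, itemName, itemDescription;
        Item(String itemNo, String itemName, String itemDescription) {
            this.itemNo = itemNo;
            this.itemName = itemName;
            this.itemDescription = itemDescription;
        }
    }

    static class OrderLine {
        final String orderLineNo, orderLinePrice;
        final Item item;
        OrderLine(String orderLineNo, String orderLinePrice, Item item) {
            this.orderLineNo = orderLineNo;
            this.orderLinePrice = orderLinePrice;
            this.item = item;
        }
    }

    static class Order {
        final String orderNo, orderName, price;
        final List<OrderLine> lines = new ArrayList<>();
        Order(String orderNo, String orderName, String price) {
            this.orderNo = orderNo;
            this.orderName = orderName;
            this.price = price;
        }
    }

    static final String HEADER =
        "OrderNo,OrderName,Price,OrderLineNo,OrderLinePrice,ItemNo,ItemName,ItemDescription";

    // Wraps a value in double quotes if it contains a comma, quote or line break.
    static String quote(String value) {
        if (value.contains(",") || value.contains("\"")
                || value.contains("\n") || value.contains("\r")) {
            return "\"" + value.replace("\"", "\"\"") + "\"";
        }
        return value;
    }

    // One CSV row per order line; the order columns are repeated on every row.
    static List<String> toCsvRows(Order order) {
        List<String> rows = new ArrayList<>();
        for (OrderLine line : order.lines) {
            rows.add(String.join(",",
                quote(order.orderNo), quote(order.orderName), quote(order.price),
                quote(line.orderLineNo), quote(line.orderLinePrice),
                quote(line.item.itemNo), quote(line.item.itemName),
                quote(line.item.itemDescription)));
        }
        return rows;
    }

    public static void main(String[] args) {
        Order order = new Order("1", "Office supplies", "15.50");
        order.lines.add(new OrderLine("1", "10.00", new Item("A1", "Stapler", "Red, metal")));
        order.lines.add(new OrderLine("2", "5.50", new Item("B2", "Paper", "A4 ream")));
        System.out.println(HEADER);
        toCsvRows(order).forEach(System.out::println);
    }
}
```

Each returned row can then be written with a FileWriter, one row per line. An order without any order lines produces no rows in this sketch, so you may want to handle that case separately.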
If you are feeling adventurous, I'm building support for nested elements in CSV in uniVocity-parsers.
The 2.0.0-SNAPSHOT version supports parsing nested beans with annotations. We are planning to release the final version in a couple of weeks. Writing support has not been implemented yet, so that part you'll have to do manually (should be fairly easy with the current API).
Parsing this sort of structure is more complex, but the parser seems to be working fine for most cases. Have a look at that test case:
Input CSV:
1,Foo
Account,23234,HSBC,123433-000,HSBCAUS
Account,11234,HSBC,222343-130,HSBCCAD
2,BAR
Account,1234,CITI,213343-130,CITICAD
Note that the first column of each row identifies which bean will be read. As "Client" in the CSV matches the class name, you don't need to annotate
Pojos
enum ClientType {
PERSONAL(2),
BUSINESS(1);
int typeCode;
ClientType(int typeCode) {
this.typeCode = typeCode;
}
}
public static class Client {
@EnumOptions(customElement = "typeCode", selectors = { EnumSelector.CUSTOM_FIELD })
@Parsed(index = 0)
private ClientType type;
@Parsed(index = 1)
private String name;
@Nested(identityValue = "Account", identityIndex = 0, instanceOf = ArrayList.class, componentType = ClientAccount.class)
private List<ClientAccount> accounts;
}
public static class ClientAccount {
@Parsed(index = 1)
private BigDecimal balance;
@Parsed(index = 2)
private String bank;
@Parsed(index = 3)
private String number;
@Parsed(index = 4)
private String swift;
}
Code to parse the input
public void parseCsvToBeanWithList() {
final BeanListProcessor<Client> clientProcessor = new BeanListProcessor<Client>(Client.class);
CsvParserSettings settings = new CsvParserSettings();
settings.getFormat().setLineSeparator("\n");
settings.setRowProcessor(clientProcessor);
CsvParser parser = new CsvParser(settings);
parser.parse(new StringReader(CSV_INPUT));
List<Client> rows = clientProcessor.getBeans();
}
If you find any issue using the parser, please update this issue.
I am using POI's Event API to process a large volume of records without any memory footprint issues. Here is the reference for it.
When I process an XLSX sheet, I get the date value in a different format from the one specified in the Excel sheet. The date format for the column in the Excel sheet is 'dd-mm-yyyy', whereas I am getting the value in 'mm/dd/yy' format.
Can someone tell me how to get the actual format given in the Excel sheet? The code snippet is given below for reference.
ContentHandler handler = new XSSFSheetXMLHandler(styles, strings,
    new SheetContentsHandler() {
        public void startRow(int rowNum) {
        }
        public void endRow() {
        }
        public void cell(String cellReference, String formattedValue) {
            // formattedValue arrives already formatted, e.g. as mm/dd/yy for dates
            System.out.println(formattedValue);
        }
    },
    false); // formulasNotResults (value assumed; the original snippet was cut off here)
The formattedValue passed to the cell method for the date column is in 'mm/dd/yy' format, so I am not able to do the validations properly in my PL/SQL program.
Two points to keep in mind:
The original Excel cell may have a format that doesn't work for you
or may be formatted as general text.
You may want to control exactly how dates, times or numeric values
are formatted.
Another way to control the formatting of date, and other numeric values is to provide your own custom DataFormatter extending org.apache.poi.ss.usermodel.DataFormatter.
You simply override the formatRawCellContents() method (or other methods depending on your needs):
Sample code constructing the parser / handler:
public void processSheet(Styles styles, SharedStrings strings,
SheetContentsHandler sheetHandler, InputStream sheetInputStream)
throws IOException, SAXException {
DataFormatter formatter = new CustomDataFormatter();
InputSource sheetSource = new InputSource(sheetInputStream);
try {
XMLReader sheetParser = SAXHelper.newXMLReader();
ContentHandler handler = new XSSFSheetXMLHandler(styles, null, strings, sheetHandler,
formatter, false);
sheetParser.setContentHandler(handler);
sheetParser.parse(sheetSource);
} catch (ParserConfigurationException e) {
throw new RuntimeException("SAX parser appears to be broken - " + e.getMessage());
}
}
private class CustomDataFormatter extends DataFormatter {
@Override
public String formatRawCellContents(double value, int formatIndex, String formatString,
boolean use1904Windowing) {
// Is it a date?
if (DateUtil.isADateFormat(formatIndex, formatString)) {
if (DateUtil.isValidExcelDate(value)) {
Date d = DateUtil.getJavaDate(value, use1904Windowing);
try {
return new SimpleDateFormat("yyyyMMdd").format(d);
} catch (Exception e) {
logger.log(Level.SEVERE, "Bad date value in Excel: " + d, e);
}
}
}
return new DecimalFormat("##0.#####").format(value);
}
}
I had the very same problem. After a few days googling and research, I came up with a solution. Unfortunately, it isn't nice, but it works:
Make a copy of org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler class in your project.
Find the interface SheetContentsHandler in the class.
Add a new method definition: String overriddenFormat(String cellRef, int formatIndex, String formatString);
Find this method in the class: public void endElement(String uri, String localName, String name) throws SAXException.
It has a long switch over the cell types.
In the case NUMBER there is an if statement like this: if (this.formatString != null) {...
Before that, paste this code:
String overriddenFormat = output.overriddenFormat(cellRef, formatIndex, formatString);
if (overriddenFormat != null) {
this.formatIndex = -1;
this.formatString = overriddenFormat;
}
Follow this article/answer: https://stackoverflow.com/a/11345859 but use your new class and interface.
Now you can use unique date formats if it is needed.
My use case was:
In a given sheet I have date values in G, H, and I columns, so my implementation of SheetContentsHandler.overriddenFormat is:
@Override
public String overriddenFormat(String cellRef, int formatIndex, String formatString) {
if (cellRef.matches("(G|H|I)\\d+")) { //matches all cells in G, H, and I columns
return "yyyy-mm-dd;@"; //this is the hungarian date format in excel
}
return null;
}
As you can see, in the endElement method I have overridden the formatIndex and formatString. The possible values of the formatIndex are described in org.apache.poi.ss.usermodel.DateUtil.isInternalDateFormat(int format). If the given value doesn't fit on these (and -1 does not fit), the formatString will be used through formatting the timestamp values. (The timestamp values are counted from about 1900.01.01 and have day-resolution.)
Excel stores some dates with regional settings. For example in the number format dialog in Excel you will see a warning like this:
Displays date and time serial numbers as date values, according to the type and locale (location) that you specify. Date formats that begin with an asterisk (*) respond to changes in regional date and time settings that are specified in Control Panel. Formats without an asterisk are not affected by Control Panel settings.
The Excel file that you are reading may be using one of those *dates. In which case POI probably uses a US default value.
You will probably need to add some workaround code to map the date format strings to the format that you want.
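One possible shape for such a workaround, if you cannot change how POI formats the cell, is to re-parse the formatted string and emit it in the format you want. This is only a sketch: it assumes the incoming value really is US-style month/day/two-digit-year, that two-digit years mean 2000 to 2099 (the java.time default for "yy"), and that dd-MM-yyyy is the target. The class name ExcelDateRemapper is hypothetical.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

// Hypothetical helper: converts "mm/dd/yy" strings to "dd-MM-yyyy".
class ExcelDateRemapper {
    // "M" and "d" accept one or two digits; "yy" is read as 2000-2099.
    private static final DateTimeFormatter US_SHORT = DateTimeFormatter.ofPattern("M/d/yy");
    private static final DateTimeFormatter TARGET = DateTimeFormatter.ofPattern("dd-MM-yyyy");

    // Returns the re-formatted date, or the input unchanged if it is not a date.
    static String remap(String formattedValue) {
        try {
            return LocalDate.parse(formattedValue, US_SHORT).format(TARGET);
        } catch (DateTimeParseException e) {
            return formattedValue;
        }
    }

    public static void main(String[] args) {
        System.out.println(remap("02/23/15"));   // prints 23-02-2015
        System.out.println(remap("Test User1")); // prints Test User1
    }
}
```

Re-parsing a two-digit year loses information for dates outside the assumed century, so overriding the formatter as shown in the earlier answers is the safer route when it is available.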
See also the following for a discussion of regional date settings in Excel.
In C# there is a method to write a string to the Console.
it is Console.WriteLine("Hello {0} My name is {1}", "World", "John");
this would return
Hello World My name is John
How can I recreate such a method structure in Java, so that I can pass an unlimited number of parameters at the end of my method and have them placed at the right indexes?
Any help would be greatly appreciated
// EDIT
Maybe I have not explained this well enough. I do not need the method to produce console output. I just want to know how I can recreate a structure in which I can pass as many parameters as I want and have them placed in the right positions. For example:
movie.setPlot("This movie is {0} and gets a rating of {1}", "FUN", "6 Thumbs up");
which would set the plot variable for a movie to
This movie is FUN and gets a rating of 6 Thumbs up
// EDIT 2
End Result:
private static final String PREFIX = "AwesomeApp";
public static void e(String TAG, String msg){
android.util.Log.e(PREFIX + " >> " +TAG, msg);
}
public static void e(String TAG, String msg, Object...args){
e(TAG, String.format(msg, args));
}
You can use var-args to handle an indeterminate number of parameters:
void setPlot(String text, String... args) {
System.out.printf(text, args);
}
You can use the Formatter class introduced in Java 5, like this:
Formatter f = new Formatter();
f.format("Hello %s my name is %s", "World", "John");
System.out.println(f.toString());
Edit: (in response to the edit of the question) You can use a formatter in the implementation of your own custom method, like this:
private String plot;
void setPlot(String formatStr, Object... data) {
Formatter f = new Formatter();
f.format(formatStr, data);
plot = f.toString();
}
You can now call your setPlot function like this:
movie.setPlot("This movie is %s and gets a rating of %s", "FUN", "6 Thumbs up");
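If you want to keep the C#-style indexed placeholders ({0}, {1}) instead of switching to %s, java.text.MessageFormat from the standard library accepts that syntax. This is a different technique from the Formatter approach above, shown here as a minimal sketch; the Movie class and its getPlot method are assumed for illustration.

```java
import java.text.MessageFormat;

// Hypothetical Movie class using indexed {0}, {1} placeholders.
class Movie {
    private String plot;

    // Accepts any number of arguments and places them by index.
    void setPlot(String pattern, Object... args) {
        this.plot = MessageFormat.format(pattern, args);
    }

    String getPlot() {
        return plot;
    }

    public static void main(String[] args) {
        Movie movie = new Movie();
        movie.setPlot("This movie is {0} and gets a rating of {1}", "FUN", "6 Thumbs up");
        // prints: This movie is FUN and gets a rating of 6 Thumbs up
        System.out.println(movie.getPlot());
    }
}
```

Two things to keep in mind with MessageFormat: a single quote in the pattern is an escape character (write '' for a literal apostrophe), and numeric arguments are formatted according to the default locale.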