I have a CSV which, when I checked it in Notepad++, contains a CRLF inside one of the fields. Because of it, the data for that column spills into the next column, and the same happens with the following column values.
I want to replace it using ReplaceText in Apache NiFi. Any leads?
Example below:
Name,id,product,product_id,email,phone_no,fax_no
John,1,2,3,CRLF
x#p.com, +212 -909-9008, +212 -909-9009 -- this part ends up on the next line.
You can use the ReplaceText processor in line-by-line mode and
replace \n (or \r\n) with '' or any other value.
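A minimal sketch of the ReplaceText properties involved (the property names are those of the standard ReplaceText processor; the exact search pattern is an assumption and depends on how the stray break appears in your file):

Replacement Strategy : Regex Replace
Evaluation Mode      : Line-by-Line
Search Value         : \r?\n
Replacement Value    : (empty string, or any placeholder you prefer)

If the stray break is not caught in Line-by-Line mode, you can try the same Search Value with Evaluation Mode set to Entire text, which runs the regex over the whole flow file content.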
Related
I am trying to use Amazon S3 Select to read records from a CSV file, and if a field contains a line break (\n), the record is not parsed as a single record, even though the line break inside the field is properly escaped with double quotes as per the standard CSV format.
For example, the below CSV file
Id,Name,Age,FamilyName,Place
p1,Albert Einstein,25,"Einstein
Cambridge",Cambridge
p2,Thomas Edison,30,"Edison
Cardiff",Cardiff
is being parsed as
Line 1 : Id,Name,Age,FamilyName,Place
Line 2 : p1,Albert Einstein,25,"Einstein
Line 3 : Cambridge",Cambridge
Line 4 : p2,Thomas Edison,30,"Edison
Line 5 : Cardiff",Cardiff
Ideally it should have been parsed as given below:
Line 1:
Id,Name,Age,FamilyName,Place
Line 2:
p1,Albert Einstein,25,"Einstein
Cambridge",Cambridge
Line 3:
p2,Thomas Edison,30,"Edison
Cardiff",Cardiff
I'm setting AllowQuotedRecordDelimiter to TRUE in the SelectObjectContentRequest as given in their documentation. It's still not working.
Does anyone know whether Amazon S3 Select supports line breaks inside fields as described above? Or are there any other parameters I need to change or set to make this work?
This is being parsed / printed correctly. The confusion is that the literal newline is being printed in the output. You can test this if you run the following expression on the given CSV:
SELECT COUNT(*) from s3Object s
Output: 2
Note that if you select only the column that contains the embedded newline (the fourth one here), you get just that value:
SELECT s._4 FROM s3Object s
You get only the part of each record that is enclosed in that field:
"Einstein
Cambridge"
"Edison
Cardiff"
What's happening is that the character in the field is the same as the default CSVOutput.RecordDelimiter value (\n), which is causing a clash. If you want to separate each record in a different way, you could add the following to the CSVOutput part of the OutputSerialization:
"RecordDelimiter": "\r\n"
or use some other one- or two-character sequence in place of \r\n.
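For reference, here is a minimal sketch of how those two settings fit together using the AWS SDK for Java v1 (com.amazonaws); the bucket name, key, and use of FileHeaderInfo.IGNORE are assumptions for illustration:

import java.io.InputStream;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.CSVInput;
import com.amazonaws.services.s3.model.CSVOutput;
import com.amazonaws.services.s3.model.ExpressionType;
import com.amazonaws.services.s3.model.FileHeaderInfo;
import com.amazonaws.services.s3.model.InputSerialization;
import com.amazonaws.services.s3.model.OutputSerialization;
import com.amazonaws.services.s3.model.SelectObjectContentRequest;
import com.amazonaws.services.s3.model.SelectObjectContentResult;

public class SelectWithCrlfOutput {
    public static void main(String[] args) throws Exception {
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();

        SelectObjectContentRequest request = new SelectObjectContentRequest();
        request.setBucketName("my-bucket");   // hypothetical bucket
        request.setKey("people.csv");         // hypothetical key
        request.setExpression("SELECT * FROM s3Object s");
        request.setExpressionType(ExpressionType.SQL);

        // Input side: allow quoted fields to contain the record delimiter.
        CSVInput csvIn = new CSVInput();
        csvIn.setFileHeaderInfo(FileHeaderInfo.IGNORE);  // skip the header row
        csvIn.setAllowQuotedRecordDelimiter(true);
        InputSerialization in = new InputSerialization();
        in.setCsv(csvIn);
        request.setInputSerialization(in);

        // Output side: separate returned records with \r\n so an embedded \n
        // inside a field no longer looks like a record boundary.
        CSVOutput csvOut = new CSVOutput();
        csvOut.setRecordDelimiter("\r\n");
        OutputSerialization out = new OutputSerialization();
        out.setCsv(csvOut);
        request.setOutputSerialization(out);

        SelectObjectContentResult result = s3.selectObjectContent(request);
        try (InputStream records = result.getPayload().getRecordsInputStream()) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = records.read(buf)) != -1) {
                System.out.write(buf, 0, n);  // records now arrive separated by \r\n
            }
        }
    }
}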
I am trying to load (using sqlldr) a CSV file from a Linux system into an Oracle database, where one column contains data with carriage returns and line feeds.
Control File looks as below:
OPTIONS (DIRECT = TRUE, SKIP = 1, ERRORS=0)
unrecoverable load data
CHARACTERSET UTF8
infile 'abc.csv' "str '\r\n'"
into table USER1."ABC"
Append
fields terminated by "," optionally enclosed by '"'
TRAILING NULLCOLS
("COLUMN1" CONSTANT 100,
"COLUMN2",
"COLUMN3" CONSTANT 'XYZ',
"COLUMN4")
CSV File looks as below:
COLUMN2, COLUMN4
"abc1","abc2
welcome"
"ok","abc4"
I have tried the following in the control file, but each load completed successfully with zero rows inserted into the table:
1. "str '\r\n'"
2. "str '#EOR#'"
3. "str x'0D'"
4. "str '\n'"
"str '\n'":This generates .bad file. Content of .bad file is as below:
"abc1","abc2
Is there anything I am missing? Kindly help. Thanks in advance.
Have The Data Adhere to the Stream Record Format You Have Identified
You are using the Stream Record Format and you are indicating each record ends with \r\n.
Based on the *.bad file, your data file records end with \n and not \r\n (standard Unix line ending behavior).
Can you change your stream record format's end-of-record marker to |\n and add a | at the end of every record in your data?
You would change this line:
infile 'abc.csv' "str '\r\n'"
to
infile 'abc.csv' "str '|\n'"
The data would change to this:
"abc1","abc2
welcome"|
"ok","abc4"|
As per my JMeter test plan, I am saving the following information into a CSV file using a Beanshell PostProcessor:
username = vars.get("username");
password = vars.get("password");
f = new FileOutputStream("/path/user_details.csv", true);  // true = append
p = new PrintStream(f);
this.interpreter.setOut(p);  // redirect print() output to the file
print(username + "," + password);
f.close();
How can I save those values into a single column, separated by a comma (username,password)?
Put double quotes around the entire string, so that the comma will be part of a single data item's value, rather than a value separator.
In practice, the character you use for the column separator and the characters you use as a delimiter are configurable in a CSV library (which should really almost always be used instead of trying to get the syntax details right on your own).
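For the Beanshell snippet in the question, a minimal sketch of the quoting looks like this (if username or password can themselves contain double quotes, those would also need to be doubled, per the usual CSV convention):

print("\"" + username + "," + password + "\"");  // written as one quoted field: "user,pass"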
สวัสดี Mr.Java Sp'e c'i'a'l'' '
I tried to process the string using the code below, but I couldn't make it work; it simply shows the wrong value.
String s = "สวัสดี Mr.Java Sp'e c'i'a'l'' '";
s = s.replaceAll("'", "&apos;");
//s = s.replaceAll("'", "''");
StringEscapeUtils.escapeHtml(s);
I am trying to get this string from a JSP, save it in a SQL Server DB, then show it again using JSP and update it.
But sometimes the JSP shows the converted &apos; as-is instead of the special characters.
Put simply: I have typed this string (สวัสดี Mr.Java Sp'e c'i'a'l'' ') here on StackOverflow; they save it in their DB, display it back, and allow me to update it. That is what I want.
OK. So let's look at what your code does:
// line 1
String s = "สวัสดี Mr.Java Sp'e c'i'a'l'' '";
We have a String with various international characters in it ... and some "'" characters.
// line 2
s = s.replaceAll("'", "&apos;");
Assuming that those are really "'" characters, we will replace all instances of "'" with an XML / HTML character entity, giving us:
"สวัสดี Mr.Java Sp&apos;e c&apos;i&apos;a&apos;l&apos;&apos; &apos;"
And so ...
// line 3
s = StringEscapeUtils.escapeHtml(s);
This replaces any active HTML / XML characters with character references. This includes the ampersand characters "&" that you previously inserted. The result is this:
"&#xxxx;&#xxxx;&#xxxx;&#xxxx; Mr.Java Sp'e
c'i'a'l'' '"
(The &#xxxx; numeric character references encode those Thai (?) characters.)
When you embed that in an HTML document and display it, you will see "สวัสดี Mr.Java Sp&apos;e c&apos;i&apos;a&apos;l&apos;&apos; &apos;".
See what has happened? You have HTML escaped your HTML escaped apostrophes!!
So what do you really need to do?
There is no need to replace apostrophes with &apos;. Apostrophes are legal in HTML text.
There should be no need to add HTML escapes so that you can store text in a database:
Any modern database will allow you to store Unicode strings without any special encoding.
If you are trying to prevent the database's SQL parser getting confused by quotes in the text you are storing, you are doing it the wrong way. The right way to do this is to use a PreparedStatement, add parameter placeholders to the query, and use the PreparedStatement.setXxx methods to provide the parameter values. The execute (or whatever) will take care of any SQL escaping that needs to be done.
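A minimal sketch of that approach, assuming a plain JDBC connection to SQL Server; the JDBC URL, credentials, table, and column names here are made up for illustration (the column should be NVARCHAR so the Thai characters survive):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class SaveSpecialChars {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection settings; replace with your real ones.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:sqlserver://localhost;databaseName=testdb", "user", "password")) {
            String text = "สวัสดี Mr.Java Sp'e c'i'a'l'' '";
            // The placeholder lets the driver handle quoting; no manual
            // escaping of apostrophes (and no HTML escaping) is needed.
            String sql = "INSERT INTO messages (body) VALUES (?)";  // hypothetical table/column
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                ps.setString(1, text);
                ps.executeUpdate();
            }
        }
    }
}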
I'm using the SQL command LOAD DATA to insert data from a CSV file into a MySQL database. The problem is that at the end of the file there are a few lines like ",,,,,,,,,,,,,,,,,," (the CSV file is a conversion of an Excel file). So when MySQL gets to those lines it sends me: #1366 - Incorrect integer value: '' for column 'Bug_ID' at row 661.
'Bug_ID' is an int and I have 32 columns.
How can I tell it to ignore those lines, considering that the number of filled lines is variable?
Thanks for your help.
MySQL supports a 'LINES STARTING BY "xxxx"' clause for reading delimited text files. If you can, require your specific .CSV file to have a 'prefix' on each data line and no prefix on non-data lines. This gives you the added benefit of being able to put comments into a .CSV if desired.
MySQL Doc: Load Data InFile
You can:
step 1 - (optionally) export data:
SELECT *
INTO OUTFILE "myFile.csv"
COLUMNS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
ESCAPED BY '\\'
LINES STARTING BY 'DATA:'
TERMINATED BY '\n'
FROM some_table
step 2 - import data
LOAD DATA INFILE "myFile.csv"
INTO TABLE some_table
COLUMNS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
ESCAPED BY '\\'
LINES STARTING BY 'DATA:'
Effectively you can modify the .csv file to look like this:
# Comment for humans
// Comment for humans
Comments for us humans.
DATA:1,3,4,5,6,'asdf','abcd'
DATA:4,5,6,7,8,'qwerty','zxcv'
DATA:9,8,7,6,5,'yuio','hjlk'
# Comments for humans
// Comments for humans
Comments for humans
DATA:13,15,64,78,54,'bla bla','foo bar'
Only the lines with the 'DATA:' prefix will be read/interpreted by the LOAD DATA statement.
I used this technique to create a 'config' file for a SQL script that needed external control information. But there was a human element that needed to be able to easily manipulate the .csv file and understand its contents.
-- J Jorgenson --
I fixed it:
I just added a condition on the line in my CSV parser:
while ((line = is.readLine()) != null) {
if (!line.equals(",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,"))
{
Iterator e = csv.parse(line).iterator();
......
}
}
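If the number of trailing commas varies from file to file, a slightly more general version of the same check (a sketch using only standard java.lang.String methods) is to skip any line that contains nothing but commas and whitespace:

while ((line = is.readLine()) != null) {
    // Skip lines that are empty apart from commas/whitespace,
    // regardless of how many columns the Excel export produced.
    if (!line.replace(",", "").trim().isEmpty()) {
        Iterator e = csv.parse(line).iterator();
        // ... process the row as before
    }
}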