List<String> list = jsc.wholeTextFiles(hdfsPath).keys().collect();
for (String string : list) {
    System.out.println(string);
}
Here I am getting all the zip file names. From here I am unable to proceed: how do I extract each file and store it into an HDFS path under a folder with the same zip name?
You can use the code below. The only catch is that we need to collect with zipFilesRdd.collect().forEach before writing the contents into HDFS; map and flatMap raise a "task not serializable" error at this point.
public void readWriteZipContents(String zipLoc, String hdfsBasePath) {
    JavaSparkContext jsc = new JavaSparkContext(new SparkContext(new SparkConf()));
    JavaPairRDD<String, PortableDataStream> zipFilesRdd = jsc.binaryFiles(zipLoc);
    zipFilesRdd.collect().forEach(file -> {
        ZipInputStream zipStream = new ZipInputStream(file._2.open());
        ZipEntry zipEntry = null;
        Scanner sc = new Scanner(zipStream);
        try {
            while ((zipEntry = zipStream.getNextEntry()) != null) {
                String entryName = zipEntry.getName();
                if (!zipEntry.isDirectory()) {
                    // create the path in hdfs and write the entry's contents
                    Configuration configuration = new Configuration();
                    configuration.set("fs.hdfs.impl", org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());
                    configuration.set("fs.file.impl", org.apache.hadoop.fs.LocalFileSystem.class.getName());
                    FileSystem fs = FileSystem.get(URI.create("hdfs://localhost:8020"), configuration);
                    FSDataOutputStream hdfsfile = fs.create(new Path(hdfsBasePath + "/" + entryName));
                    while (sc.hasNextLine()) {
                        // re-append the line terminator that Scanner strips off
                        hdfsfile.writeBytes(sc.nextLine() + "\n");
                    }
                    // flush before closing, not the other way around
                    hdfsfile.flush();
                    hdfsfile.close();
                }
                zipStream.closeEntry();
            }
        } catch (IllegalArgumentException | IOException e) {
            e.printStackTrace();
        }
        sc.close();
    });
}
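A hypothetical invocation, with placeholder paths (adjust the HDFS URL to your cluster):
readWriteZipContents("hdfs://localhost:8020/data/zips", "hdfs://localhost:8020/data/unzipped");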
With gzip files, wholeTextFiles should gunzip everything automatically.
With zip files however, the only way I know is to use binaryFiles and to unzip the data by hand.
sc
  .binaryFiles(hdfsDir)
  .mapValues(x => {
    val result = scala.collection.mutable.ArrayBuffer.empty[String]
    val zis = new ZipInputStream(x.open())
    var entry: ZipEntry = null
    while ({entry = zis.getNextEntry(); entry} != null) {
      val scanner = new Scanner(zis)
      while (scanner.hasNextLine()) { result += scanner.nextLine() }
    }
    zis.close()
    result
  })
This gives you a (pair) RDD[String, ArrayBuffer[String]] where the key is the name of the file on HDFS and the value is the unzipped content of the zip file (one line per element of the ArrayBuffer). If a given zip file contains more than one file, everything is aggregated. You may adapt the code to fit your exact needs. For instance, flatMapValues instead of mapValues would flatten everything (giving an RDD[String, String]) to take advantage of Spark's parallelism.
Note also that in the while condition, {entry = zis.getNextEntry(); entry} could be replaced by (entry = zis.getNextEntry()) in Java. In Scala, however, the result of an assignment is Unit, so that would yield an infinite loop.
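For reference, here is a sketch of the flatMapValues variant in Java, reusing the jsc context from the first answer. Treat it as a Spark 2.x sketch: in Spark 2.x flatMapValues takes a function returning an Iterable, while later versions expect an iterator-returning function.
JavaPairRDD<String, String> lines = jsc.binaryFiles(hdfsDir)
        .flatMapValues(pds -> {
            // one output element per line, across all entries of the zip
            List<String> result = new ArrayList<>();
            try (ZipInputStream zis = new ZipInputStream(pds.open())) {
                ZipEntry entry;
                while ((entry = zis.getNextEntry()) != null) {
                    Scanner scanner = new Scanner(zis);
                    while (scanner.hasNextLine()) {
                        result.add(scanner.nextLine());
                    }
                }
            }
            return result; // Iterable<String>
        });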
I came up with this solution written in Scala.
Tested with Spark 2 (version 2.3.0.cloudera2) and Scala (version 2.11.8).
def extractHdfsZipFile(source_zip: String, target_folder: String,
                       sparksession: SparkSession): Boolean = {
  val hdfs_config = sparksession.sparkContext.hadoopConfiguration
  val buffer = new Array[Byte](1024)
  /*
   .collect -> run on driver only, not able to serialize hdfs Configuration
  */
  sparksession.sparkContext.binaryFiles(source_zip).collect.
    foreach { zip_file: (String, PortableDataStream) =>
      // iterate over zip_files
      val zip_stream: ZipInputStream = new ZipInputStream(zip_file._2.open)
      var zip_entry: ZipEntry = null
      try {
        // iterate over all ZipEntry from ZipInputStream
        while ({zip_entry = zip_stream.getNextEntry; zip_entry != null}) {
          // skip directories
          if (!zip_entry.isDirectory()) {
            println(s"Extract File: ${zip_entry.getName()}, with Size: ${zip_entry.getSize()}")
            // create new hdfs file
            val fs: FileSystem = FileSystem.get(hdfs_config)
            val hdfs_file: FSDataOutputStream = fs.create(new Path(target_folder + "/" + zip_entry.getName()))
            var len: Int = 0
            // copy until the end of the current entry
            while ({len = zip_stream.read(buffer); len > 0}) {
              hdfs_file.write(buffer, 0, len)
            }
            // flush before closing, not after
            hdfs_file.flush()
            hdfs_file.close()
          }
          zip_stream.closeEntry()
        }
        zip_stream.close()
      } catch {
        case zip: ZipException => {
          zip.printStackTrace()
          println("Please verify that you do not use compresstype9.")
          // for DEBUG throw exception
          //false
          throw zip
        }
        case e: Exception => {
          e.printStackTrace()
          // for DEBUG throw exception
          //false
          throw e
        }
      }
    }
  true
}
How could I write code to delete exactly the duplicates that I found previously with this code? Please be specific when answering, as I am new to Java and have only very basic knowledge of it.
private static MessageDigest messageDigest;
static {
    try {
        messageDigest = MessageDigest.getInstance("SHA-512");
    } catch (NoSuchAlgorithmException e) {
        throw new RuntimeException("cannot initialize SHA-512 hash function", e);
    }
}

public static void findDuplicatedFiles(Map<String, List<String>> lists, File directory) {
    for (File child : directory.listFiles()) {
        if (child.isDirectory()) {
            findDuplicatedFiles(lists, child);
        } else {
            try {
                FileInputStream fileInput = new FileInputStream(child);
                byte fileData[] = new byte[(int) child.length()];
                fileInput.read(fileData);
                fileInput.close();
                String uniqueFileHash = new BigInteger(1, messageDigest.digest(fileData)).toString(16);
                List<String> list = lists.get(uniqueFileHash);
                if (list == null) {
                    list = new LinkedList<String>();
                    lists.put(uniqueFileHash, list);
                }
                list.add(child.getAbsolutePath());
            } catch (IOException e) {
                throw new RuntimeException("cannot read file " + child.getAbsolutePath(), e);
            }
        }
    }
}
Map<String, List<String>> lists = new HashMap<String, List<String>>();
FindDuplicates.findDuplicatedFiles(lists, dir);
for (List<String> list : lists.values()) {
if (list.size() > 1) {
System.out.println("\n");
for (String file : list) {
System.out.println(file);
}
}
}
System.out.println("\n");
Do not read the entire contents of the file into memory. The whole point of an InputStream is that you can read small, manageable chunks of data, so you don’t have to use a great deal of memory.
Imagine if you were trying to check a file that’s one gigabyte in size. By creating a byte array to hold the entire content, you have forced your program to use a gigabyte of RAM. (If the file were two gigabytes or larger, you wouldn’t be able to allocate the byte array at all, since an array may not have more than 2³¹-1 elements.)
The easiest way to compute the hash of a file’s contents is to copy the file to a DigestOutputStream, which is an OutputStream that makes use of an existing MessageDigest:
messageDigest.reset();
try (DigestOutputStream stream = new DigestOutputStream(
OutputStream.nullOutputStream(), messageDigest)) {
Files.copy(child.toPath(), stream);
}
String uniqueFileHash = new BigInteger(1, messageDigest.digest()).toString(16);
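Putting it together, a minimal sketch of a streaming replacement for the hashing step (the helper name hashFile is my own; it assumes the static messageDigest field from the question and Java 11+ for OutputStream.nullOutputStream()):
private static String hashFile(File child) throws IOException {
    // stream the file through the digest without loading it into memory
    messageDigest.reset();
    try (DigestOutputStream stream = new DigestOutputStream(
            OutputStream.nullOutputStream(), messageDigest)) {
        Files.copy(child.toPath(), stream);
    }
    return new BigInteger(1, messageDigest.digest()).toString(16);
}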
Scanning directories is easier with the NIO Path / Files classes because they avoid the awkward recursion of the File class, and they are much quicker for deep directory trees.
Here is an example scanner which returns a Stream of duplicates - that is where each item in the stream is a List<Path> - a group of TWO or more identical files.
// Scan a directory and returns Stream of List<Path> where each list has 2 or more duplicates
static Stream<List<Path>> findDuplicates(Path dir) throws IOException {
Map<Long, List<Path>> candidates = new HashMap<>();
BiPredicate<Path, BasicFileAttributes> biPredicate = (p,a)->a.isRegularFile()
&& candidates.computeIfAbsent(Long.valueOf(a.size())
, k -> new ArrayList<>()).add(p);
try(var stream = Files.find(dir, Integer.MAX_VALUE, biPredicate)) {
stream.count();
}
Predicate<? super List<Path>> twoOrMore = paths -> paths.size() > 1;
return candidates.values().stream()
.filter(twoOrMore)
.flatMap(Duplicate::duplicateChecker)
.filter(twoOrMore);
}
The above code starts by collating candidates of same file size, then uses a flatMap operation to compare all those candidates to get the exact matches where the files are identical in each List<Path>:
// Checks possible list of duplicates, and returns stream of definite duplicates
private static Stream<List<Path>> duplicateChecker(List<Path> sameLenPaths) {
List<List<Path>> groups = new ArrayList<>();
try {
for (Path p : sameLenPaths) {
List<Path> match = null;
for (List<Path> g : groups) {
Path prev = g.get(0);
if(Files.mismatch(prev, p) < 0) {
match = g;
break;
}
}
if (match == null)
groups.add(match = new ArrayList<>());
match.add(p);
}
} catch(IOException io) {
throw new UncheckedIOException(io);
}
return groups.stream();
}
Finally an example launcher:
public static void main(String[] args) throws IOException {
Path dir = Path.of(args[0]);
Stream<List<Path>> duplicates = findDuplicates(dir);
long count = duplicates.peek(System.out::println).count();
System.out.println("Found "+count+" groups of duplicate files in: "+dir);
}
You will need to process the lists of duplicate files using Files.delete - I've not added Files.delete at the end so that you can check the results before deciding to delete them. Note the skip(1): one file from each group must be kept.
// findDuplicates(dir).flatMap(group -> group.stream().skip(1)).forEach(dup -> {
//     try {
//         Files.delete(dup);
//     } catch(IOException io) {
//         throw new UncheckedIOException(io);
//     }
// });
I want to have an INI file with a section containing a bunch of option values with the same option key. In other words, I want to represent an array in the INI file. My problem is that only the last value is read into the array or list, depending on which getAll method I use:
The .ini file:
[FTP]
; Access FTP server?
active = false
file.pattern = VA_.*.(csv|dat)$
#file.pattern = VA_.*(\\.(?i)(csv|dat))$
delete.after.download = false
[SFTP]
; Access FTP server?
active = true
file.pattern = VA_.*.(csv|dat)$
#file.pattern = VA_.*(\\.(?i)(csv|dat))$
delete.after.download = false
[SMB]
; Access SMB target?
active = false
[SCP]
; Access SCP target?
active = false
[FTP_Accounts]
ftpAccount = /aaa/xxx
ftpAccount = /bbb/xxx
ftpAccount = /ccc/xxx
ftpAccount = /ddd/xxx
ftpAccount = /eee/xxx
ftpAccount = /fff/xxx
The following Java code doesn't get me all the option values for the option key ftpAccount:
public SftpFileHandler() {
Wini ini = null;
try {
Config.getGlobal().setEscape(false);
Config.getGlobal().setMultiSection(true);
Config.getGlobal().setMultiOption(true);
ini = new Wini(new File("MyIniFile.ini"));
} catch (InvalidFileFormatException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
final String ftpFileNamePattern =
ini.get("FTP", "file.pattern", String.class);
pattern = Pattern.compile(ftpFileNamePattern);
List<Ini.Section> list = ini.getAll("FTP_Accounts");
final Ini.Section ftpAccountsSection = ini.get("FTP_Accounts");
for (final String optionKey: ftpAccountsSection.keySet()) {
System.out.println(optionKey);
}
ftpAccounts = ftpAccountsSection.getAll("ftpAccount", String[].class);
final List<String> ftpAccountsList = ftpAccountsSection.getAll("ftpAccount");
final Ini.Section sftpAccountsSection = ini.get("SFTP_Accounts");
sftpAccounts = sftpAccountsSection.getAll("sftpAccount", String[].class);
connect();
}
I thought I could get all the option values with the getAll calls into an array.
Thanks to https://stackoverflow.com/users/7345335/philippe-cerou in the question Java ini4j - reading multiple options from .ini file.
He pointed out that the ini file should not be loaded while instantiating the Wini object.
First create a Config instance and set its MultiOption property to true.
Then initialize a Wini instance without the ini file as a parameter; instead, load the ini file afterwards using the load() method.
Wini ini = null;
Config conf = new Config();
try {
conf.setMultiOption(true);
ini = new Wini();
ini.setConfig(conf);
ini.load(new File("MyIniFile.ini"));
} catch (InvalidFileFormatException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
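Once loaded this way, getAll returns every value of a repeated key. A brief sketch against the [FTP_Accounts] section from the question:
Ini.Section ftpAccountsSection = ini.get("FTP_Accounts");
// one list entry per repeated ftpAccount key, instead of only the last value
List<String> ftpAccounts = ftpAccountsSection.getAll("ftpAccount");
for (String account : ftpAccounts) {
    System.out.println(account); // /aaa/xxx, /bbb/xxx, ..., /fff/xxx
}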
This is my code. Inside it, myDir.mkdirs() shows me the warning "Result of File.mkdirs() is ignored".
I tried to fix this warning but failed.
private void saveGIF() {
    Toast.makeText(getApplicationContext(), "Gif Save", Toast.LENGTH_LONG).show();
    String filepath123 = BuildConfig.VERSION_NAME;
    try {
        File myDir = new File(Environment.getExternalStorageDirectory().toString() + "/" + "NewyearGIF");
        // This statement shows me that warning
        myDir.mkdirs();
        File file = new File(myDir, "NewyearGif_" + System.currentTimeMillis() + ".gif");
        filepath123 = file.getPath();
        InputStream is = getResources().openRawResource(this.ivDrawable);
        BufferedInputStream bis = new BufferedInputStream(is);
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        // copy the raw resource byte by byte into the buffer
        while (true) {
            int current = bis.read();
            if (current == -1) {
                break;
            }
            baos.write(current);
        }
        FileOutputStream fos = new FileOutputStream(file);
        fos.write(baos.toByteArray());
        fos.flush();
        fos.close();
        is.close();
    } catch (Exception e) {
        e.printStackTrace();
    }
    Intent mediaScanIntent = new Intent("android.intent.action.MEDIA_SCANNER_SCAN_FILE");
    mediaScanIntent.setData(Uri.fromFile(new File(filepath123)));
    sendBroadcast(mediaScanIntent);
}
The method mkdirs has a boolean return value, which you didn't use.
boolean wasSuccessful = myDir.mkdirs();
The create operation returns a value, which indicates if the creation of the directory was successful. For example, the result value wasSuccessful can be used to display an error when it is false.
if (!wasSuccessful) {
System.out.println("was not successful.");
}
From the Java docs about the boolean return value:
true if and only if the directory was created, along with all
necessary parent directories; false otherwise
File CDir = new File(Environment.getExternalStorageDirectory(), IMPORT_DIRECTORY);
if (!CDir.exists()) {
boolean mkdir = CDir.mkdir();
if (!mkdir) {
Log.e(TAG, "Directory creation failed.");
}
}
mkdir returns a boolean value. We need to capture the return value from mkdir. Replace your code with this and the warning "Result of File.mkdirs() is ignored" will be gone.
The idea behind the return value of mkdir is that every I/O operation can fail, and your program should react to that situation.
You can do:
if(myDirectory.exists() || myDirectory.mkdirs()) {
// Directory was created, can do anything you want
}
or you can just remove the warning using:
@SuppressWarnings("ResultOfMethodCallIgnored")
The mkdirs method returns false if the directory already existed, so you should additionally check for existence, as in the first method.
File myDirectory = new File(Environment.getExternalStorageDirectory(),"NewyearGIF");
if(!myDirectory.exists()) {
myDirectory.mkdirs();
}else{
// Directory already exist
}
This is an old question but still, the simplest way I've found is:
File imageThumbsDirectory = getBaseContext().getExternalFilesDir("ThumbTemp");
if (imageThumbsDirectory != null) {
    if (!imageThumbsDirectory.exists()) {
        if (imageThumbsDirectory.mkdir()) {
            // directory is created
        }
    }
}
Just put this code:
File myDirectory = new File(Environment.getExternalStorageDirectory(),"NewyearGIF");
if(!myDirectory.exists()) {
myDirectory.mkdirs();
}else{
// Directory already exist
}
If the application is running on a version above Lollipop, then you also need to add a runtime permission for storage.
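A minimal sketch of that runtime permission request, assuming the AndroidX/support compat classes (the request code 1 is arbitrary):
// Ask for WRITE_EXTERNAL_STORAGE at runtime before writing the GIF.
if (ContextCompat.checkSelfPermission(this, Manifest.permission.WRITE_EXTERNAL_STORAGE)
        != PackageManager.PERMISSION_GRANTED) {
    ActivityCompat.requestPermissions(this,
            new String[]{Manifest.permission.WRITE_EXTERNAL_STORAGE}, 1);
}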
I am developing a system using Struts 1.x, JSP/Servlets, and Pentaho (report generation).
After generating a report, the user can open and save that file as Excel, and that is working fine.
But a problem occurs when opening a file: a file is created in our JBoss temp folder and is never deleted - that is the issue.
We delete the file at the code level, and we figured out that after restarting the server it is deleted the first time, and that when debugging it is deleted every time.
public Object process() throws RenderException, IOException {
ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
// properties for XSSF
File xmlFile = null;
File templateFile = null;
Writer xmlWriter = null;
boolean isXLSX = false;
timeStamp = Calendar.getInstance().getTimeInMillis();
try {
OutputType outputType = rendererAttrList.getOutputType();
Frame[] frameList = xmlDef.getShownFrames(renderDocs);
if (frameList != null) {
// ==============================================TPID#65448 code
// add========================================
boolean isPentahoExcel = false;
Frame t_CurrFrame = frameList[0];
if (t_CurrFrame != null) {
FrameType frameType = t_CurrFrame.getFrameType();
if (frameType == FrameType.FRAME_TYPE_EXTERNAL) {
isPentahoExcel = true;
}
}
// ==============================================TPID#65448 code end========================================
isXLSX = isXLSXOutput(frameList);
// native excel support and if the output is not supported by
// .xls, change the output to .xlsx
if (((outputType == OutputType.NATIVE_EXCEL2007) || ((outputType == OutputType.NATIVE_EXCEL97) && isXLSX))&&!isPentahoExcel)
{
workbook = new XSSFWorkbook();
rendererAttrList.setOutputType(OutputType.NATIVE_EXCEL2007);
xmlFile = File.createTempFile(getXmlDef().getName()
+ timeStamp, ".xml");
logger.info("XML File location :"
+ xmlFile.getAbsolutePath());
xmlWriter = new OutputStreamWriter(new FileOutputStream(
xmlFile), "UTF-8");
spreadSheetWriter = new SpreadsheetWriter(xmlWriter);
spreadSheetWriter.beginSheet();
} else {
workbook = new HSSFWorkbook();
}
dataFormat = workbook.createDataFormat();
sheet = workbook.createSheet("Report");
logger.debug("Start rendering the excel output in "
+ rendererAttrList.getOutputType() + " format ");
renderOutput();
logger.debug("Stop rendering the excel output ");
if (workbook instanceof HSSFWorkbook) {
// ==============================================TPID#65448 code add========================================
if (isPentahoExcel) {
renderExternalXLSX(t_CurrFrame,byteArrayOutputStream);
} else {
autoSizeColumn();
// write the excel to output
workbook.write(byteArrayOutputStream);
}
// ==============================================TPID#65448 code end========================================
} else {
// 1. generate data in XML format
spreadSheetWriter.endSheet();
// close the xml stream before we substitute in xlsx file
try {
if (xmlWriter != null)
xmlWriter.close();
xmlWriter = null;
} catch (Exception ex) {
logger.error("Error while closing xmlWriter for file "
+ xmlFile.getName());
}
// Step 2. create template from the excel workbook
String sheetRef = ((XSSFSheet) sheet).getPackagePart()
.getPartName().getName();
templateFile = createTemplate();
ByteArrayOutputStream xlsxOutput = new ByteArrayOutputStream();
// Step 3. Substitute the template entry with the generated
// data
substitute(templateFile, xmlFile, sheetRef.substring(1),
xlsxOutput);
// if the data is too large don't try to auto size the
// columns
// may result into out of memory exception
if (!isXLSX) {
// autosize the columns
InputStream inp = new ByteArrayInputStream(
xlsxOutput.toByteArray());
workbook = WorkbookFactory.create(inp);
sheet = workbook.getSheetAt(0);
autoSizeColumn();
xlsxOutput.close();
byteArrayOutputStream = new ByteArrayOutputStream();
workbook.write(byteArrayOutputStream);
inp.close();
} else {
byteArrayOutputStream = xlsxOutput;
}
}
}
} catch (Exception de) {
logger.error(de.getMessage(), de);
throw new RenderException(de.getMessage());
} finally {
try {
if (xmlWriter != null)
xmlWriter.close();
if (xmlFile != null)
xmlFile.delete();
if (templateFile != null){
templateFile.delete();
}
} catch (Exception ex) {
logger.error("Error while closing xmlWriter for file "
+ xmlFile.getName());
}
}
return byteArrayOutputStream;
}
Summary of my question: when JBoss is restarted (the first run) and when debugging, the temporarily created file (in the temp directory) is deleted successfully.
But when it runs normally, it does not delete the file,
even though the particular line of code below is reached every time - the problem is why it has no effect.
templateFile.delete();
Thank you very much...
What happens when the delete method is called? You can log the output of the 'delete' method to see the result. It should be true or false.
Does it throw an exception? The delete method could throw a SecurityException indicating that you are denied access to delete the file.
In general, the first thing to do is to try to understand why the file is not being deleted, given the toolbox provided to you by this particular function.
Another approach could be to call the deleteIfExists method instead. See here:
https://docs.oracle.com/javase/8/docs/api/java/nio/file/Files.html#deleteIfExists-java.nio.file.Path-
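For diagnosis, a small sketch using java.nio.file may help: unlike File.delete(), which only returns false on failure, Files.delete() throws an exception describing why the deletion failed. The names follow the question's code:
// Replace templateFile.delete() with this to surface the failure reason.
try {
    Files.delete(templateFile.toPath());
} catch (NoSuchFileException e) {
    logger.error("Template file does not exist: " + templateFile, e);
} catch (IOException e) {
    // e.g. the file is still held open by an unclosed stream
    logger.error("Could not delete template file: " + templateFile, e);
}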
I had a problem with reading /proc/%d/stat files using my Java method copyFiles() (source code below).
I found a workaround using the similar readProc() method.
Now I am wondering what the problem was. Output files were created, but each file had 0 bytes (in /proc all files report 0 bytes because it is not a standard filesystem). FileUtils is from the Apache Commons IO library.
I've tried to do the same using java.nio - again, an IOException is thrown for each file saying the attributes are wrong.
I removed some parts of the code dealing with parsing, exceptions, etc.
Why does this work with FileInputStream, but not with FileUtils.copyFile()?
public void copyFiles() {
final File dir = new File("/proc");
final String[] filedirArray = dir.list();
long counter = 0;
for(String filedir : filedirArray) {
final File checkFile = new File(dir, filedir);
if (checkFile.isDirectory()) {
try {
Integer.parseInt(filedir);
File srcFile = new File(checkFile, "stat");
File dstFile = new File("/home/waldekm/files/stat" + "." + Long.toString(counter++));
try {
FileUtils.copyFile(srcFile, dstFile);
} catch (IOException e1) {}
} catch (NumberFormatException e) {
// not a number, do nothing
}
}
}
}
public static void readProc(final String src, final String dst) {
FileInputStream in = null;
FileOutputStream out = null;
File srcFile = new File(src);
File dstFile = new File(dst);
try {
in = new FileInputStream(srcFile);
out = new FileOutputStream(dstFile);
int c;
while((c = in.read()) != -1) {
out.write(c);
}
} catch (IOException e1) {
} finally {
try {
if (in != null) {
in.close();
}
} catch (IOException e1) {}
try {
if (out != null) {
out.close();
}
} catch (IOException e1) {}
    }
}
The reason is most likely that the operating system is reporting the file size as zero.
On my machine, man 2 stat says this:
"For most files under the /proc directory, stat() does not return the file size in the st_size field; instead the field is returned with the value 0."
(The stat system call will be what the JVM uses to find out what a file's size is.)
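You can observe this from Java directly: File.length() is backed by stat(), so it reports zero for most /proc entries even though their content is readable (Linux only, a quick check):
File statFile = new File("/proc/self/stat");
System.out.println(statFile.exists()); // true - the file is there
System.out.println(statFile.length()); // typically prints 0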
Here is a code snippet that would read specific fields from a proc file, using methods that are available (but not documented directly) in the Process class of Android. Modify the FORMAT buffer and the output buffer size to read more/different values from the proc file:
int PROC_SPACE_TERM = (int)' ';
int PROC_OUT_LONG = 0x2000;
public static final int[] PROCESS_STATS_FORMAT = new int[] {
PROC_SPACE_TERM,
PROC_SPACE_TERM,
PROC_SPACE_TERM,
PROC_SPACE_TERM,
PROC_SPACE_TERM,
PROC_SPACE_TERM,
PROC_SPACE_TERM,
PROC_SPACE_TERM,
PROC_SPACE_TERM,
PROC_SPACE_TERM,
PROC_SPACE_TERM,
PROC_SPACE_TERM,
PROC_SPACE_TERM,
PROC_SPACE_TERM|PROC_OUT_LONG, // 13: utime
PROC_SPACE_TERM|PROC_OUT_LONG // 14: stime
};
long buf[] = new long[2];
try {
int pid = 1000; // Assume 1000 is a valid pid for a process.
Method mReadProcFile =
Process.class.getMethod("readProcFile", String.class,
int[].class, String[].class,
long[].class, float[].class);
mReadProcFile.invoke(null, "/proc/" + pid + "/stat",
PROCESS_STATS_FORMAT, null, buf, null);
return buf;
} catch(NoSuchMethodException e) {
Log.e(TAG, "Error! Could not get access to JNI method - readProcFile");
} catch (InvocationTargetException e) {
Log.e(TAG, "Error! Could not invoke JNI method - readProcFile");
} catch (IllegalAccessException e) {
Log.e(TAG, "Error! Illegal access while invoking JNI method - readProcFile");
}
return null;
I see you are creating a FileInputStream to read a /proc file. Instead I suggest you create a FileReader object. FileInputStream gets tripped up by the lack of file length for /proc files but FileReader does not.
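A minimal sketch of that suggestion, reading /proc/self/stat line by line without relying on the reported file length:
try (BufferedReader reader = new BufferedReader(new FileReader("/proc/self/stat"))) {
    String line;
    while ((line = reader.readLine()) != null) {
        System.out.println(line);
    }
} catch (IOException e) {
    e.printStackTrace();
}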