I am looking for an efficient way to identify new XML files in Ceph storage using the S3 API (in Java, with the aws-java-sdk).
There is a date in each filename, in the format "yyyyDDmm".
I have tested several methods. The first was to add tags to processed files, but it was too slow because it had to go through every file in the bucket.
Another solution I identified is to move processed files under a "processed" prefix, but in production I will only have read-only access to these files.
Do you have any ideas?
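A minimal sketch of one read-only-friendly approach, assuming the aws-java-sdk v1 and that a checkpoint timestamp can be persisted outside the bucket; the bucket name is a placeholder and the endpoint configuration Ceph's RGW needs is omitted:

import java.util.ArrayList;
import java.util.Date;
import java.util.List;

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.ListObjectsV2Request;
import com.amazonaws.services.s3.model.ListObjectsV2Result;
import com.amazonaws.services.s3.model.S3ObjectSummary;

public class NewFileScanner {
    public List<String> findNewXmlKeys(Date lastCheckpoint) {
        AmazonS3 s3 = AmazonS3ClientBuilder.standard().build();
        ListObjectsV2Request req = new ListObjectsV2Request().withBucketName("my-bucket");
        List<String> newKeys = new ArrayList<>();
        ListObjectsV2Result result;
        do {
            result = s3.listObjectsV2(req);
            for (S3ObjectSummary obj : result.getObjectSummaries()) {
                // keep only XML objects written after the last successful run
                if (obj.getKey().endsWith(".xml") && obj.getLastModified().after(lastCheckpoint)) {
                    newKeys.add(obj.getKey());
                }
            }
            req.setContinuationToken(result.getNextContinuationToken());
        } while (result.isTruncated());
        return newKeys;
    }
}

If the date in the filename is at the start of the key, calling withPrefix(...) with the current date would narrow the listing further and avoid scanning the whole bucket.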
In my project, which deals with SAS, we have risk binary files with the extension .rskcdesc.
I have been looking but have not been able to find any Python or Java library that can read them.
I need to automate data checks via a backend process, hence I need a way to decode these files.
Any suggestions?
Rohit,
I don't have SAS Risk, but SAS has been using zip for some of their file formats (like EG projects). Try renaming the file extension to .zip and opening it; it may be composed of XML files, similar to EG.
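A quick way to probe this from Java, as a hedged sketch (the file name below is just an example):

import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipException;
import java.util.zip.ZipFile;

public class RskProbe {
    public static void main(String[] args) {
        try (ZipFile zip = new ZipFile("model.rskcdesc")) {
            // if this succeeds, the file is a zip container; list what is inside
            zip.stream().map(ZipEntry::getName).forEach(System.out::println);
        } catch (ZipException e) {
            System.out.println("Not a zip container: " + e.getMessage());
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}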
All data must be stored in one single persistent file named secure_store.dat.
The following command should add new files to the Secure Store realm:
put [path_on_OS] [file_name]
How can I do this?
How can I add a file that is on my PC to secure_store.dat? Thank you.
If you don't mind secure_store.dat being a zipped file, then you can use Java's standard handling for zipped files...
Edit:
When you add multiple files together into one single file, you must store them in a way that preserves their boundaries; if you fail to do that, the files will become a garbled mess.
The java.util.zip package provides all the features you seem to need: it creates a zipped archive file with a separate entry for each file you add, and it also provides functionality to add, extract, and remove files from the archive.
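A minimal sketch of the put command on top of the zip FileSystemProvider, so new files can be added to an existing secure_store.dat (the entry and path names are examples):

import java.net.URI;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.Map;

public class SecureStore {
    // put [path_on_OS] [file_name]: copies a file from the OS into the store
    public static void put(Path pathOnOs, String fileName) throws Exception {
        URI uri = URI.create("jar:" + Paths.get("secure_store.dat").toUri());
        // "create=true" makes the archive on first use; each file becomes its own entry
        try (FileSystem store = FileSystems.newFileSystem(uri, Map.of("create", "true"))) {
            Files.copy(pathOnOs, store.getPath(fileName), StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws Exception {
        put(Paths.get("/home/user/report.pdf"), "report.pdf");
    }
}

Because each file is a separate zip entry, the boundaries between files are preserved automatically.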
I am in a situation where I need to run a command-line tool on a file that has been uploaded to the Alfresco repository. The reason is that I need to perform OCR on that particular file.
I know I can use the transformations Alfresco provides by default, but transformations do not support conversion between the same mimetype, and my requirement is to perform OCR on a PDF file (which contains images) and generate a PDF file again (containing the extracted data).
My approach is to create a policy that fires when a node is uploaded to the Alfresco repository.
From that policy I will access the uploaded node using Java. Here is the problem: I don't know under which location of the alf_data directory the file is stored, and I need the physical location of the file.
By the way, I am using a Linux system.
Can anyone help with this?
You need to use the ContentService, specifically getReader(NodeRef, QName), then getContent(File) to copy the content to a temporary file.
Your code would then be something like:
// copy the node's content to a temporary file, run OCR on it, then clean up
File tmp = File.createTempFile("for-ocr", ".tmp");
try {
    ContentReader reader = contentService.getReader(nodeRef, ContentModel.PROP_CONTENT);
    reader.getContent(tmp);
    // run the OCR program against tmp here
} finally {
    tmp.delete();
}
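If the OCR output should then replace the node's content, a hedged follow-up using the same ContentService (the output file name is an assumption):

// write the OCR result back onto the node as its new content
File ocrResult = new File("/tmp/ocr-output.pdf"); // produced by the external OCR tool
ContentWriter writer = contentService.getWriter(nodeRef, ContentModel.PROP_CONTENT, true);
writer.setMimetype("application/pdf");
writer.putContent(ocrResult);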
I have a Swing application that uses many data files, and these data files will change from time to time. How can I load these data files onto a client's machine? Is there a way to create a folder-like structure and run a batch file or similar? Any help is appreciated.
There are several ways to do this:
1. Assuming you want to ship your application with the data files, you can embed them as a zip/jar inside your application jar file. Extract the embedded zip to a temporary local file and use ZipFileSystemProvider to unpack its content to some place on disk. There is an example of extracting content from a zip/jar file embedded in a .jar file downloaded by JWS.
2. Same as 1, but skip the zip step and instead provide a list of all the resources you want to extract.
3. Create the files programmatically using either java.nio.file (Java 7+) or java.io.File; a sketch of this follows below.
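For option 3, a minimal sketch using java.nio.file, assuming the data files ship as classpath resources and a hypothetical target folder under the user's home directory:

import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class DataFileInstaller {
    public static void main(String[] args) throws IOException {
        Path dataDir = Paths.get(System.getProperty("user.home"), ".myapp", "data");
        Files.createDirectories(dataDir); // builds the whole folder structure if missing

        // copy a bundled resource next to the application's other data files
        try (InputStream in = DataFileInstaller.class.getResourceAsStream("/data/config.xml")) {
            if (in != null) {
                Files.copy(in, dataDir.resolve("config.xml"), StandardCopyOption.REPLACE_EXISTING);
            }
        }
    }
}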
According to the current requirement, users will upload large files, which they may want to download later. I cannot store the uploaded files in the DB, because the files are large and performance would be impacted.
Does anyone know of a Java plugin that provides efficient file management on the web server and maintains a link to each file, so that the file can be downloaded when the link is requested? The code must also ensure that users can only download the files they uploaded themselves; they must not be able to download other files just by modifying the download link. I am using Spring 3 as the framework.
Please suggest how to solve this problem.
If you have write access to the file system, why not just save them there?
You then generate a unique ID and save the ID/file mapping in the DB; to download, the client supplies the ID to a servlet that streams the file back.
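A minimal sketch of that idea with Spring's MultipartFile (the storage root and the DAO call are assumptions):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.UUID;

import org.springframework.web.multipart.MultipartFile;

public class FileStorageService {
    private static final Path ROOT = Paths.get("/var/app-uploads");

    public String store(MultipartFile upload, long ownerId) throws IOException {
        String id = UUID.randomUUID().toString(); // the only value ever exposed in a link
        Files.copy(upload.getInputStream(), ROOT.resolve(id));
        // fileDao.insert(id, ownerId, upload.getOriginalFilename()); // hypothetical DAO
        return id;
    }
}

The download servlet then looks the ID up in the DB, checks that the requesting user is the owner, and streams the file back; an unknown or foreign ID simply returns a 404.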
Store the file content on a part of the filesystem outside the web application, so it cannot be reached by changing a link.
Then store the path for each file in the DB, and return the file only if the user has permission to read it.
Pay attention: do not store all the files in the same folder, or the number of files per directory could grow too large. Find a way to spread them across multiple folder levels.
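A hedged sketch of such a layout, deriving two folder levels from the generated ID (the depth and width are arbitrary choices here):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ShardedStore {
    private static final Path ROOT = Paths.get("/var/app-uploads");

    // e.g. id "3f2a9c..." is stored under /var/app-uploads/3f/2a/3f2a9c...
    public static Path locate(String id) throws IOException {
        Path dir = ROOT.resolve(id.substring(0, 2)).resolve(id.substring(2, 4));
        Files.createDirectories(dir);
        return dir.resolve(id);
    }
}

With two hex-like levels of 256 values each, even millions of files stay spread across directories of manageable size.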