How to index file names on a disk with Java [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 3 years ago.
I am writing a service that stores millions of files (20-30 MB each) on disk, and I need a search function to find a file by name (there is no need to search file content) or to view files in an explorer (for example, navigating a folder structure in a browser). I want to make it fast, reliable and simple in Java. Say, I plan to run two services both of which can be used to upload a file or search files by name pattern. What is the best technology or approach to implement this? Should I store each file on disk, store its path and name in a database, search against the database, and then fetch the matches by path? Any other good ideas? I thought about Elasticsearch, but it looks like a heavy solution.

This question is too broad and not really in the Stack Overflow format (concrete programming questions, mostly with code snippets, that address a specific technical difficulty with a given set of technologies).
There are many ways to fulfill your requirements. Yet, based solely on the information in your question, it's impossible to recommend something, because we don't really know your requirements. I'll explain:
I plan to run two services both of which can be used to upload a file or search files by name pattern.
Does this mean that the file system has to be distributed?
If so, consider cloud storage such as AWS S3.
If you can't run in the cloud, here you can find a comprehensive list of distributed filesystems.
Elasticsearch can of course also work here, but it is a full-fledged search engine, so it looks like overkill to me in this case.
You might want to work directly with Lucene (ES is built on top of Lucene), so that you won't need to run an additional process that might also fail. Lucene stores its index directly on the filesystem, which again is fine if that meets your requirements.
Now, you also mention a database. That is again a possible direction, especially if you already have one in your project. In general, relational database management systems have some support for searching, but there are more advanced options: PostgreSQL, for example, has the GIN index (an inverted index), which applies the same concepts as full-text search and goes well beyond standard SQL's LIKE operator.
Yet another idea: go with a local disk. If you're on Linux, there is an indexing utility called "locate" that you can delegate the index creation to.
So the choice is yours.
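For the local-disk option, here is a minimal sketch of a hand-rolled name index that uses only the JDK. The class and method names (FileNameIndex, build, search) are hypothetical, and the whole index is assumed to fit in memory; with millions of files a persisted index (a database table or Lucene) would likely be a better fit than scanning a list.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: keeps the paths of all regular files under a root in memory.
class FileNameIndex {
    private final List<Path> files;

    private FileNameIndex(List<Path> files) {
        this.files = files;
    }

    // Walks the directory tree once and records every regular file.
    static FileNameIndex build(Path root) {
        try (Stream<Path> walk = Files.walk(root)) {
            List<Path> found = walk.filter(Files::isRegularFile)
                    .sorted()
                    .collect(Collectors.toList());
            return new FileNameIndex(found);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Matches a glob such as "report-*.pdf" against the file name only, not the full path.
    List<Path> search(String glob) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        return files.stream()
                .filter(p -> matcher.matches(p.getFileName()))
                .collect(Collectors.toList());
    }
}
```

A call such as FileNameIndex.build(root).search("report-*.pdf") returns the matching paths, which can then be used to open the files or to render a folder view.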

Related

Safe ways to save data in Java application [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed last month.
Account is new but I've been reading on this website for years and I now need some advice.
Still pretty new at Java and I am developing a desktop application that deals with personal information relating to the user but also their social circles.
What would be the best option to keep this information safe? I am not considering an external DB (like MySQL, which I have been using so far for my beta), as it would mean potential users have to install and configure additional software. I am also not considering serialization, as I don't really like the idea of an external txt file (please correct me if I am wrong on either of these two points).
Is there a way to directly store info in the application? Could someone please give me some perspective on the matter and recommend leads?
(And sorry if a previous thread covers my concerns, I did not find it!)
Thanks a lot in advance!
Good night, evening, day or morning!

Modern OS design and security principles strongly dictate that an app should not have write access to itself. There are really hacky ways (complex, hard to write, poorly supported by most libraries, and fragile in that they easily break on some systems) to write into your own jar, but it's a bad design in any case, even before you take that complexity and fragility into account.
You can't reasonably encrypt anything (because the password will have to be inside the app) unless the user has to enter the decryption password as they start your app / open your 'storage file' if it's a multi-document kind of deal, in which case, by all means, do that.
H2 is an all-Java database engine: the user doesn't need to install anything separately, and it produces a single file with the data. It's that, or hand-roll your own serialization to a separate file in the user's home directory, ~/Documents, or the like. You'll have to hardcode the right location for each major OS if you don't just want to write to .myapp.bin in the user's home directory, which you can easily find via System.getProperty("user.home").
So, yes, you are wrong on both points:
DB/SQL is fine: there is no need to force the user to install separate software. SQLite can do it (this does involve DLLs and the like, but the SQLite JDBC driver takes care of that), but I'd recommend h2database.com for this purpose.
It will be a file. It won't be a txt file. This is good.
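As a rough sketch of choosing a per-user location for that file: the class name AppStorage and the per-OS directory conventions below are assumptions for illustration, not something the answer prescribes. Only System.getProperty("user.home") comes from the answer itself.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

// Hypothetical helper that decides where the app keeps its data file.
class AppStorage {
    // Picks a per-user data directory from an OS name and a home directory.
    // The Windows and macOS sub-paths are common conventions, assumed here.
    static Path dataDir(String osName, String userHome, String appName) {
        String os = osName.toLowerCase(Locale.ROOT);
        if (os.contains("mac")) {
            return Paths.get(userHome, "Library", "Application Support", appName);
        }
        if (os.contains("win")) {
            return Paths.get(userHome, "AppData", "Roaming", appName);
        }
        // Fallback for Linux and others: a hidden directory in the home dir.
        return Paths.get(userHome, "." + appName);
    }

    // Same decision, using the properties of the running JVM.
    static Path defaultDataDir(String appName) {
        return dataDir(System.getProperty("os.name"),
                       System.getProperty("user.home"),
                       appName);
    }
}
```

The database file (or a hand-rolled serialized file) would then live under AppStorage.defaultDataDir("myapp"), where "myapp" is a placeholder name.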

Best practices when learning a new API [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 2 years ago.
When learning/working with a new API, does anyone have any tips for effectively learning it?
I currently make a bunch of get requests to understand what I can and cannot retrieve based on the API's responses. From that point I try to map out what is within the API and see what I can build out.
If you could share what you do, that would be great.

The first thing I do is read the API documentation and look for examples in it. As you get used to reading this kind of documentation, you'll find it easier to identify which parts of the functionality you need to learn first.
I also use search engines to look for more working examples, and after that I work on creating a minimal use case of the API (for example, writing a file with Apache's commons-io API). For this, it is a good idea to create a project with multiple JUnit tests covering minimal use cases of the API (in the commons-io example: create a file, delete a file, move a file, copy a file, ...).
I must say this is not a science: each API is a new world and may require a slightly different approach (with REST APIs you'll need a tool like curl or Postman to understand how to communicate with them, others have prerequisites such as a working installation of some system, and so on).
As with everything in coding, you'll need to do it yourself and struggle through the issues you find (which can take several hours of your free time).
There is no "magic" behind learning something, and coding is in some ways like playing a musical instrument: it requires practice.
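To illustrate the "one minimal use case per operation" idea without extra dependencies, the sketch below swaps in the JDK's java.nio.file API for commons-io and plain methods for JUnit tests. The class name FileApiExploration and the file names are made up for the example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Each method exercises exactly one operation and reports what was observed.
class FileApiExploration {
    static String writeThenRead(Path dir) throws IOException {
        Path file = dir.resolve("hello.txt");
        Files.write(file, "hello".getBytes(StandardCharsets.UTF_8));
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    static boolean copyKeepsSource(Path dir) throws IOException {
        Path src = Files.createFile(dir.resolve("copy-src.txt"));
        Path dst = Files.copy(src, dir.resolve("copy-dst.txt"));
        return Files.exists(src) && Files.exists(dst);
    }

    static boolean moveRemovesSource(Path dir) throws IOException {
        Path src = Files.createFile(dir.resolve("move-src.txt"));
        Path dst = Files.move(src, dir.resolve("move-dst.txt"));
        return Files.notExists(src) && Files.exists(dst);
    }

    static boolean deleteRemovesFile(Path dir) throws IOException {
        Path file = Files.createFile(dir.resolve("delete-me.txt"));
        Files.delete(file);
        return Files.notExists(file);
    }

    public static void main(String[] args) throws IOException {
        // A throwaway directory keeps the experiments away from real data.
        Path dir = Files.createTempDirectory("api-exploration");
        System.out.println("write/read: " + writeThenRead(dir));
        System.out.println("copy keeps source: " + copyKeepsSource(dir));
        System.out.println("move removes source: " + moveRemovesSource(dir));
        System.out.println("delete removes file: " + deleteRemovesFile(dir));
    }
}
```

In a real project each method would become its own JUnit test, so that a surprise in one operation does not hide the others.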

I don't know whether you are a beginner or already have development experience, so I'll start from scratch.
APIs are code that lets you work with content in certain formats.
Which API call you use depends on the operation you want to perform:
GET => to fetch something.
POST => to save something.
PUT => to update something.
DELETE => to delete something.
People also use PATCH, which is similar to an update.
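The verbs above can be sketched with the JDK's java.net.http client (Java 11 or later). The requests are only built, not sent, and the api.example.com URLs and the class name VerbExamples are placeholders rather than a real service.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.net.http.HttpRequest.BodyPublishers;

class VerbExamples {
    // Placeholder endpoints, assumed for illustration only.
    static final URI ITEMS = URI.create("https://api.example.com/items");
    static final URI ITEM = URI.create("https://api.example.com/items/42");

    // GET: fetch an existing item.
    static HttpRequest fetch() {
        return HttpRequest.newBuilder(ITEM).GET().build();
    }

    // POST: save a new item.
    static HttpRequest create(String json) {
        return HttpRequest.newBuilder(ITEMS)
                .header("Content-Type", "application/json")
                .POST(BodyPublishers.ofString(json))
                .build();
    }

    // PUT: update an item by sending its full new state.
    static HttpRequest update(String json) {
        return HttpRequest.newBuilder(ITEM)
                .header("Content-Type", "application/json")
                .PUT(BodyPublishers.ofString(json))
                .build();
    }

    // DELETE: remove an item.
    static HttpRequest remove() {
        return HttpRequest.newBuilder(ITEM).DELETE().build();
    }

    // PATCH has no dedicated builder method, so the generic method(...) is used.
    static HttpRequest patch(String json) {
        return HttpRequest.newBuilder(ITEM)
                .header("Content-Type", "application/json")
                .method("PATCH", BodyPublishers.ofString(json))
                .build();
    }
}
```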
You can play around with all of these by constructing objects and databases.
You will need REST services; Spring's RESTful web services are a very good option.
Going deeper, be careful to give things meaningful names, since you never know whether tomorrow you'll become famous and need to make your APIs shareable ;)
Some common concerns are:
Meaningful names.
Versioning, so that it is clear what the old APIs did and what data your APIs return now.
Documentation: Swagger is a tool that lets you describe your APIs, for example what each API does and what type of data it returns.
Security: APIs are more or less endpoints, meaning the link that connects the front end and the back end, so you need to keep them secure with authentication.
The above four points are considered good practices for writing APIs ;)

Can you recommend a way to present many multiple pages of data on the web sortable [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 3 years ago.
I have a report generated from a database containing about 100,000 entries; each entry has about 10 columns. The data is stored on Amazon S3 and is generated monthly. I'm looking for pointers on a way to present this many pages of data on the web. I want it to be sortable, and because no single sort order will suit all users, ideally it should be searchable as well.
Is it possible to do this purely client-side, or is that unfeasible and I need to go back to the server? I don't have the database available, but if need be, the website is backed by a Java servlet application running on Tomcat. A self-contained library for doing this would be very useful.

To paraphrase the discussion above.
Providing search/paging in JavaScript is not sensible, because it would still require the user to download all the data in one go, and representing that amount of data in HTML is not going to work well.
So I either have to provide a server backend with a mechanism for searching and paging, or provide the data in a spreadsheet format so that users can rely on their spreadsheet tool, which is well suited to dealing with large volumes of data.
I'm going to try the spreadsheet idea.
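For comparison, the server-backend alternative could look roughly like the sketch below: filter, sort, and slice on the server, and send only one page to the browser. ReportPager and its parameters are hypothetical, and the rows are assumed to be already loaded into memory as string arrays (for example, parsed from the monthly report file).

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

class ReportPager {
    // Returns one page of rows: keep rows where any cell contains the query
    // (case-insensitive), sort by one column, then cut out the requested page.
    // pageNumber is zero-based; columns are compared as plain strings.
    static List<String[]> page(List<String[]> rows, String query, int sortColumn,
                               boolean ascending, int pageNumber, int pageSize) {
        String needle = query.toLowerCase(Locale.ROOT);
        Comparator<String[]> byColumn =
                Comparator.comparing((String[] row) -> row[sortColumn]);
        if (!ascending) {
            byColumn = byColumn.reversed();
        }
        return rows.stream()
                .filter(row -> Arrays.stream(row)
                        .anyMatch(cell -> cell.toLowerCase(Locale.ROOT).contains(needle)))
                .sorted(byColumn)
                .skip((long) pageNumber * pageSize)
                .limit(pageSize)
                .collect(Collectors.toList());
    }
}
```

A servlet would read query, sort column, and page number from request parameters and render only the returned rows, so the browser never has to hold all 100,000 entries.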

How to have access to a Bible Document in Java/Android [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 9 years ago.
This may be too broad of a question, but how does one access a Bible document in an Android application? Is it a text file that has an index to find specific verses or is it something much more complicated? I hope that is enough to answer from.

The first step would be to actually find a structured Bible dataset somewhere.
You can search to see if there's an XML version of your favourite translation available, and maybe download that.
Once you've got it (as XML, JSON, or whatever format), you can write some Java code to parse the file and load it into an appropriate data structure that lets you do what you want with it efficiently (e.g. look it up by verse).
You could even put it into a database (e.g. MySQL or MongoDB), which would allow you to search it efficiently.
But really, how you want to structure the data depends on how you're going to use it and what formats it's already available in (as it could be a pain to clean up the XML).
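As a small sketch of the "parse and load into a data structure" step: the input format below (one verse per line, written as "Book Chapter:Verse|Text") is an assumption made up for the example, as are the class name VerseIndex and the placeholder verse text. A real dataset in XML or JSON would need its own parser in front of the same map.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

class VerseIndex {
    // Reference (e.g. "Genesis 1:1") -> verse text.
    private final Map<String, String> verses = new HashMap<>();

    // Assumed input format, one verse per line: "Book Chapter:Verse|Text".
    static VerseIndex parse(List<String> lines) {
        VerseIndex index = new VerseIndex();
        for (String line : lines) {
            int sep = line.indexOf('|');
            if (sep < 0) {
                continue; // skip lines that are not in the assumed format
            }
            String reference = line.substring(0, sep).trim();
            String text = line.substring(sep + 1).trim();
            index.verses.put(reference, text);
        }
        return index;
    }

    // Constant-time lookup by exact reference; empty if the verse is unknown.
    Optional<String> lookup(String reference) {
        return Optional.ofNullable(verses.get(reference.trim()));
    }

    int size() {
        return verses.size();
    }
}
```

In an Android app the lines could come from a file bundled with the application; the lookup itself stays the same.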
You might find the following resources useful:
Web Service APIs to directly get verses: http://www.4-14.org.uk/xml-bible-web-service-api
These would mean avoiding a lot of the headaches of dealing with file formats, indexing, and all kinds of other stuff.
Web service APIs generally work by your program submitting a query to a website (eg. including the biblical reference), and you get back some structured data (eg. xml/json) containing the verse(s) you requested.
Download a structured offline copy: http://www.bibletechnologies.net./osistext/
This would mean you have to find, download, parse, and index your own data structure for dealing with the text, but it would be much faster (if done right) than using a web service to do it.
The link I posted here has only some example books from the bible, but if you look you'll find more around the web.

It completely depends on the format of the file.
Any book or text document has multiple ways it can be stored and distributed. It could simply be a .pdf file, or it could be stored as XML or .epub.
It is beyond broad, because there are so many ways to do it, it's impossible to guess without more information.
This link has some information about the e-book formats:
http://en.wikipedia.org/wiki/Comparison_of_e-book_formats
And that's just one small subsection of ways text can be stored.

What to choose for article management website [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Closed 12 years ago.
I have some questions regarding article management system.
I am thinking of making a website where people become members and write articles, which they can publish, rank, etc.
I have been googling for the past two weeks to find out which technology is best,
and how to store the articles so that search engines (like Google, Yahoo, etc.) can find them.
If the articles are stored as HTML somewhere on my server, then Google's spider programs will be able to get them for search results,
but if I store the content of my articles in MySQL (the database I want to use), how would search engines rank my website's articles?
I am really confused, please guide me.
I need to know if there is any open-source PHP article management script that I can update or change to suit my needs and
that has not been hacked, or a Java content management script or something else that can save me the time of developing this whole thing.
I would really appreciate it.

Generally if you store the content in a database, you have scripts which serve up that content, and thus search engine spiders index the served versions of the article.
There are many content management systems out there, and which one you choose is really a subjective choice. Whether or not something "has been hacked" is a poor indicator of whether it can currently be compromised or might be in the future; the developers of CMS software tend to patch known holes, and it's impossible to predict future holes based on past ones. So really, your best bet is to find something with solid support and active development, and to patch frequently as security updates are released.

As others have said, storing article data in a database is no problem. The articles will get rendered into HTML by some script, and displayed on your site, where search engines will find them. There are a bunch of techniques to improve how well your articles will show up in search.
In this day and age, I wouldn't recommend rolling your own system. There are a great number of off-the-shelf software packages that can handle your requirements. Wordpress is a very popular blogging system, written in PHP (with MySQL), that will probably meet all of your requirements. It supports multiple authors (and various roles for authors such as author/editor/administrator), commenting/discussion, and has a huge array of plugins that provide additional or altered functionality. It's well documented (both user and developer documentation), actively maintained, and pretty good overall.
If Wordpress doesn't float your boat, I'd look around at some of the other PHP-driven blogging tools. There are a ton of them, and it's very likely that one will fit your needs, and you can avoid reinventing the wheel for the 900th time.

I am sorry, I still don't understand. Here is an example: let's say user1 submits article1 and the content gets stored in the database. On the home page there is a link "How to train your pet". A user clicks this link and it goes to a servlet, which pulls the article content and information from the database, generates the output and displays it as ... what? HTML? Will it save the output as an HTML file on the server, so that the next time another user clicks "How to train your pet" on the home page, they are directed to this generated HTML file?
Or is it the other case, where the servlet generates the output in the browser, where the user reads, votes, ranks, etc.? In that case there is no HTML file, so how would search engines rank this article when the file doesn't exist? It's so confusing.

If the articles are stored as html somewhere on my server then Google Spider programs will be able to get them for search results but if i store the content of my article in MySQL (the database which i want to use), how would search engines rank my website articles.
It doesn't matter how they are stored, only that they are addressable via http URIs. Browsers don't access data in databases, they make requests to web servers (which might run programs to fetch data from databases or might fetch data from files on the file system).
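To make that concrete, here is a minimal sketch in which the HTML is generated on every request and never saved as a file. It uses the JDK's built-in com.sun.net.httpserver rather than a servlet container, and an in-memory map stands in for the MySQL table; the class name ArticleSite, the /articles/ URL scheme, and the sample article text are all assumptions for illustration.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.Map;

class ArticleSite {
    // Stand-in for the database table: URL slug -> article body.
    static final Map<String, String> ARTICLES =
            Map.of("how-to-train-your-pet", "Start with short sessions.");

    // Builds the HTML for one article, or returns null if the slug is unknown.
    // Real code should HTML-escape the stored content before embedding it.
    static String render(String slug) {
        String body = ARTICLES.get(slug);
        if (body == null) {
            return null;
        }
        return "<html><head><title>" + slug + "</title></head>"
                + "<body><p>" + body + "</p></body></html>";
    }

    // Serves /articles/<slug>; browsers and crawlers both just see HTML over HTTP.
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/articles/", exchange -> {
            String path = exchange.getRequestURI().getPath();
            String html = render(path.substring("/articles/".length()));
            byte[] bytes = (html != null ? html : "Not found")
                    .getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "text/html; charset=utf-8");
            exchange.sendResponseHeaders(html != null ? 200 : 404, bytes.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(bytes);
            }
        });
        server.start();
        return server;
    }
}
```

A crawler that requests /articles/how-to-train-your-pet receives the same generated page as a reader's browser, which is why no stored HTML file is needed for indexing.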
I need to know if there is any PHP article management script which is open source which i can update or change to suit my needs and has not been hacked. Or Java Content management script or something which can save me the time to develop this whole thing.
Hundreds, in both Java and PHP.
