I had a requirement to support image uploads in a Java web project running on clustered Tomcats.
The issue I'm facing is whether to store the images on the file system or in the database.
With the file system there is an integrity issue (images can become inconsistent across the clustered nodes), but performance is better.
I worked out a way of syncing files between the two nodes through a hybrid approach.
Instantaneous sync between nodes was done by making a REST POST of each file: converting it to a Base64 string, wrapping it in XML, and posting it to the other nodes in the cluster from a separate thread.
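A minimal sketch of what that per-node push might have looked like; the peer URL, the /sync endpoint, and the XML element names are assumptions for illustration, not the original setup:

    // Hedged sketch: Base64-encode an uploaded file and POST it as XML to a
    // peer node from a background thread. URL and XML layout are hypothetical.
    import java.io.IOException;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.Base64;

    public class PeerSync {
        public static void pushToPeer(Path file, String peerUrl) {
            new Thread(() -> {
                try {
                    String b64 = Base64.getEncoder().encodeToString(Files.readAllBytes(file));
                    String xml = "<image><name>" + file.getFileName() + "</name>"
                               + "<data>" + b64 + "</data></image>";
                    HttpURLConnection con = (HttpURLConnection) new URL(peerUrl).openConnection();
                    con.setRequestMethod("POST");
                    con.setRequestProperty("Content-Type", "application/xml");
                    con.setDoOutput(true);
                    con.getOutputStream().write(xml.getBytes(StandardCharsets.UTF_8));
                    con.getResponseCode(); // fire and forget; real code should check this
                } catch (IOException e) {
                    e.printStackTrace(); // a peer being down is exactly the failure mode described below
                }
            }).start();
        }
    }

Usage would be something like pushToPeer(uploadedFile, "http://node2:8080/sync") for each peer in the cluster.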
As a consequence of this design, many failure scenarios arose (one of them being that one node is down while an upload happens on another node). I then came up with the solution of keeping the DB as the master copy for syncing on server start-up, and using the file system as secondary storage the rest of the time.
Still, the solution seemed too complex, so we resorted to a shared mounted drive, as suggested by Andrey Chaschev and venergiac. It worked like a charm. I guess the only plus point of this R&D was that I got pretty familiar with what REST can and can't do, and with JGroups for messaging between cluster-aware Tomcats.
A distributed file system might be what you need. In one of my previous projects I used GlusterFS and the experience was quite smooth: the DFS appears as a folder shared between hosts, which any host may write to or read from. It promises to be fault tolerant. Googling for a more modern solution turns up XtreemFS.
From what I've read about DFS vs. local FS vs. NFS, a DFS is much faster and easier to configure than NFS, and only a bit slower than direct file copying over the network.
I save the files in the database (Oracle); however, you could also use a shared directory between the nodes of the cluster (NFS or CIFS).
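A minimal sketch of the shared-directory variant, assuming a hypothetical NFS/CIFS mount point of /mnt/shared/uploads:

    // Hedged sketch: write an upload to a directory backed by an NFS/CIFS
    // mount that every cluster node sees. The mount point is an assumption.
    import java.io.IOException;
    import java.io.InputStream;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.nio.file.StandardCopyOption;

    public class SharedDirStore {
        private static final Path SHARED = Paths.get("/mnt/shared/uploads");

        public static Path save(String fileName, InputStream in) throws IOException {
            Files.createDirectories(SHARED);
            // strip any directory components from the client-supplied name
            Path target = SHARED.resolve(Paths.get(fileName).getFileName());
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
            return target;
        }
    }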
I'm writing a service which stores millions of files (20-30 MB each) on disk, and I need a search function that finds a file by name (there is no need to search file content), plus a way to browse the files in an explorer (for example, navigating a folder structure in the browser). I want to make it fast, reliable, and simple, in Java. Say I plan to run two services, both of which can be used to upload a file or search files by name pattern. What is the best technology/approach to implement this? Store each file on disk, keep its path and name in a database, search against the database, and fetch results by path? Any other good ideas? I thought about Elasticsearch, but it looks like a heavyweight solution.
This question is too broad and not really in the usual SO format (concrete programming questions, mostly with code snippets, that address a specific technical difficulty with a given set of technologies).
There are many ways to fulfill your requirements, yet based solely on the information presented in your question it's impossible to recommend something, because we don't really know your requirements. I'll explain:
I plan to run two services both of which can be used to upload a file or search files by name pattern.
Does this mean that the file system has to be distributed?
If so, consider cloud solutions in the style of AWS S3.
If you can't run in the cloud, here you can find a comprehensive list of distributed filesystems.
Elasticsearch can also work as a search engine, of course, but it is more of a full-fledged search engine, so it looks like overkill to me in this case.
You might want to work directly with Lucene so that you won't need to run an additional process that might also fail (ES is built on top of Lucene). Lucene will store its index directly on the file system, again if that meets the requirements.
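A hedged sketch of that Lucene direction (Lucene 5-9 era API); the index directory, field names, and search pattern are all assumptions:

    // Hedged sketch: index (name, path) pairs with Lucene and search by
    // name pattern. Fields, paths, and the pattern are hypothetical.
    import org.apache.lucene.analysis.core.KeywordAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.StringField;
    import org.apache.lucene.index.*;
    import org.apache.lucene.search.*;
    import org.apache.lucene.store.FSDirectory;
    import java.nio.file.Paths;

    public class FileNameIndex {
        public static void main(String[] args) throws Exception {
            FSDirectory dir = FSDirectory.open(Paths.get("/data/name-index"));

            // Index one document per stored file.
            try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new KeywordAnalyzer()))) {
                Document doc = new Document();
                doc.add(new StringField("name", "report-2021.pdf", Field.Store.YES));
                doc.add(new StringField("path", "/files/ab/cd/report-2021.pdf", Field.Store.YES));
                writer.addDocument(doc);
            }

            // Search by name pattern; wildcard queries with a leading '*' can be slow.
            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                IndexSearcher searcher = new IndexSearcher(reader);
                TopDocs hits = searcher.search(new WildcardQuery(new Term("name", "report*")), 10);
                for (ScoreDoc sd : hits.scoreDocs) {
                    System.out.println(searcher.doc(sd.doc).get("path"));
                }
            }
        }
    }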
Now, you also mention a database: again a possible direction, especially if you already have one in your project. Relational database servers generally have some search support, but there are more advanced options: PostgreSQL, for example, has GIN (inverted) indexes, the same concept used for full-text search, which go well beyond the standard SQL LIKE operator.
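A hedged JDBC sketch of the PostgreSQL direction, assuming a hypothetical files(name, path) table and the pg_trgm extension (which lets a GIN trigram index accelerate LIKE '%...%' name searches):

    // Hedged sketch: trigram GIN index plus a LIKE query over file names.
    // Table, columns, credentials, and the pattern are all assumptions.
    import java.sql.*;

    public class DbNameSearch {
        public static void main(String[] args) throws SQLException {
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://localhost/files_db", "app", "secret")) {
                try (Statement st = conn.createStatement()) {
                    st.execute("CREATE EXTENSION IF NOT EXISTS pg_trgm");
                    st.execute("CREATE INDEX IF NOT EXISTS files_name_trgm "
                             + "ON files USING GIN (name gin_trgm_ops)");
                }
                try (PreparedStatement ps = conn.prepareStatement(
                        "SELECT path FROM files WHERE name LIKE ?")) {
                    ps.setString(1, "%report%");
                    try (ResultSet rs = ps.executeQuery()) {
                        while (rs.next()) {
                            System.out.println(rs.getString("path"));
                        }
                    }
                }
            }
        }
    }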
Yet another idea: go with a local disk. If you're on Linux, there is an indexing utility called locate that you can delegate index creation to.
So the choice is yours.
I have an API in C#/.NET for uploading files whose sizes vary between 10 B and 100 KB. The system receives around 5 such calls per second. Now I want to pass each file to a Java process (because it is a producer for Kafka, and we want that to be JVM-based, while the C# API is legacy). Both will almost always reside on the same machine. What is the best way of doing that?
I've read about jni4net and IKVM for interacting with Java from C#. Would they be better, or should I make it socket-based (a web API in Java accepting the files), or should I read from the local file system where the C# app has uploaded the files, or is there another option that I'm missing?
In an environment with high concurrency, reading from the local file system might not be a good idea.
You could use memory-mapped files, which Java supports via FileChannel.
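A minimal Java-side sketch; the shared file path and the simple length-prefixed layout are assumptions that both processes would have to agree on:

    // Hedged sketch: map a file shared with the .NET process and read a
    // length-prefixed payload from it. Path and layout are hypothetical.
    import java.io.RandomAccessFile;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;

    public class MappedReader {
        public static byte[] readPayload(String path) throws Exception {
            try (RandomAccessFile raf = new RandomAccessFile(path, "rw");
                 FileChannel channel = raf.getChannel()) {
                MappedByteBuffer buf =
                        channel.map(FileChannel.MapMode.READ_WRITE, 0, channel.size());
                int len = buf.getInt();   // 4-byte length written by the producer
                byte[] payload = new byte[len];
                buf.get(payload);         // the file bytes themselves
                return payload;
            }
        }
    }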
Depending on the operating system, you could also use named pipes for IPC. Here is an article showing how to use pipes between .NET and Java:
http://v01ver-howto.blogspot.com/2010/04/howto-use-named-pipes-to-communicate.html
All options considered, I would go with sockets. They are portable, easy to implement, and will most likely meet your performance requirements.
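A minimal sketch of the receiving (Java) side; the port and the length-prefixed framing are assumptions the C# sender would have to match (note that DataInputStream reads big-endian):

    // Hedged sketch: a Java server accepting a length-prefixed file over
    // a local socket. Port 9000 and the framing are assumptions.
    import java.io.DataInputStream;
    import java.io.IOException;
    import java.net.ServerSocket;
    import java.net.Socket;

    public class FileReceiver {
        public static void main(String[] args) throws IOException {
            try (ServerSocket server = new ServerSocket(9000)) {
                while (true) {
                    try (Socket client = server.accept();
                         DataInputStream in = new DataInputStream(client.getInputStream())) {
                        int len = in.readInt();   // 4-byte big-endian length prefix
                        byte[] file = new byte[len];
                        in.readFully(file);
                        // hand "file" to the Kafka producer here
                        System.out.println("Received " + len + " bytes");
                    }
                }
            }
        }
    }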
You could also use a message queue: either put the file itself on the queue as a binary (serialized) message, or put just the file's location on the queue if you are storing the files in the file system.
A solution based on sockets or message queues also allows a more distributed architecture, i.e. not loading a single machine with too much work.
If you want to actually use the C# API from Java, a bridge is probably the way to go. It'll likely be more efficient than a Web API. Some bridges will also allow you to run the C# and the Java in the same process, which is more efficient still.
In addition to the bridges you mention, you might want to consider JNBridgePro. You can find more information at our website.
Disclosure -- I am affiliated with JNBridge.
Say we are writing an application with Play Framework or Spring, for instance, and we want a standard way to manage and store binary file uploads through an API (object storage), as opposed to managing a file system ourselves; similar to using Amazon's cloud storage, but without locking into an external provider.
Imagine we would also like to start by running this service on the same server as the application we are developing, until growth requires us to move the file storage to its own server (or cluster) and the application itself into its own cluster. If we start off with a cluster-ready service, then surely we can scale up quickly.
Do we continue to manage the file system ourselves while running at a small scale, or do we adopt something else?
Is this where we need to look at running a local Hadoop HDFS instance, for example? That way we wouldn't need to rewrite the file upload and handling in our application, and could scale the file management system out into a cluster when the need arises. It would be great if this functionality were provided as a service with a common API, much like running H2 in-memory for integration testing.
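To illustrate the "common API" idea, here is a purely hypothetical storage abstraction (not an existing library) that the application could code against now, backed by the local file system, and re-implement over S3, Swift, or HDFS later:

    // Hypothetical abstraction: a local-filesystem implementation serves
    // the prototype; a cloud-backed one can be swapped in when scaling.
    import java.io.IOException;
    import java.io.InputStream;

    public interface BlobStore {
        void put(String key, InputStream data) throws IOException;
        InputStream get(String key) throws IOException;
        void delete(String key) throws IOException;
        boolean exists(String key) throws IOException;
    }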
Or are Hadoop HDFS, OpenStack Swift, and Ceph overkill while we're still building a simple prototype application?
I'm sure I'm missing something, but I'm struggling to find the solution. Managing a file system by hand feels dirty and wouldn't allow me to push the application to any PaaS provider without rewriting it. Again, I think there should be a local option rather than always integrating with Amazon and the like.
Any thoughts?
I have a server (Linux, Apache Tomcat, and MySQL) which hosts several almost identical websites. At least, the Java libraries are identical.
Right now, every website has its own .jar file containing these Java classes.
I'd like to know whether this is good practice, or whether I should keep these classes in one place where each of the websites can access them. Would this improve performance in any way? Would it result in less memory usage for the JVM? Are there any downsides?
I haven't been able to find any information related to this situation.
Upsides: a small amount of disk space and RAM is saved. Remember that the only heap space taken belongs to the java.lang.Class instances representing the types you actually load from that JAR file.
Downsides: all applications in the JVM are locked into the version of the library that is shared. If you really want all deployed webapps to be identical, then this is no downside at all. Deployments can get tricky, because you have to maintain a non-standard deployment process (the webapp is no longer self-contained) that may differ from container to container, or between versions of the same container (e.g. Tomcat changed its mind between versions 4 and 5, 5 and 5.5, and 5 and 6 about how to configure "common" and "shared" libraries).
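For illustration, on Tomcat 6 and later the shared location is configured in conf/catalina.properties; the directory name below is an assumption:

    # conf/catalina.properties (Tomcat 6+), hedged sketch: jars placed in
    # ${catalina.base}/shared/lib become visible to every deployed webapp.
    # The "shared/lib" directory name is an assumption, not a Tomcat default.
    shared.loader=${catalina.base}/shared/lib,${catalina.base}/shared/lib/*.jar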
If the web applications are identical, you should ask yourself whether you should even be deploying more than one. Instead, you could sniff the URL and load a per-client configuration, rather than deploying the application separately for each client.
I'm using JSP/Java with Spring MVC as the framework. I'm going to support uploading PDF files to my site, which will be deployed to a free web hosting service.
I want to know the best way to support uploading the PDF files:
Save them into the database?
Save them on the web server (if this is possible)?
Save them to whatever you recommend (please share what you have in mind)?
Also, please give me a link or a tutorial on how I might do this (if you suggest something other than options 1 and 2).
Thank you in advance.
Saving files into the database is not recommended, and I'm not sure anyone would want to do that anyway.
Uploading files onto the web server itself is also not a good choice, because it will add unnecessary load.
The best option is to use dedicated space (in the cloud or elsewhere) and to secure it.
You can refer to the Spring file upload documentation for reference.
If you are working on localhost during development, I would suggest simply using a drive other than the one your server is running on. You can configure the path in web.xml or wherever you prefer.
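A minimal Spring MVC sketch of such an upload; the /upload mapping, the request parameter name, and the storage directory are assumptions, and a MultipartResolver (e.g. CommonsMultipartResolver) must be configured in the application context:

    // Hedged sketch: receive a multipart PDF and stream it to a directory
    // outside the webapp. Mapping, parameter, and path are hypothetical.
    import java.io.File;
    import java.io.IOException;
    import org.springframework.stereotype.Controller;
    import org.springframework.web.bind.annotation.*;
    import org.springframework.web.multipart.MultipartFile;

    @Controller
    public class PdfUploadController {

        @RequestMapping(value = "/upload", method = RequestMethod.POST)
        @ResponseBody
        public String upload(@RequestParam("file") MultipartFile file) throws IOException {
            File dir = new File("/data/pdf-uploads");   // hypothetical storage path
            dir.mkdirs();
            // strip directory components from the client-supplied name
            File target = new File(dir, new File(file.getOriginalFilename()).getName());
            file.transferTo(target);                    // stream to disk, not into memory
            return "Saved " + target.getAbsolutePath();
        }
    }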
Little is achieved by putting large blobs of binary data (such as PDFs) into a database. The only situation where this is justified is when the data has to be handled transactionally.
If there is nothing to be achieved by putting the PDFs into the database, then don't do it. You should be able to upload and download large documents faster if they are stored as files in the file system.
How do you do it? Well there are a variety of ways. But WebDAV offers a simple "off-the-shelf" solution ... if that is what you are after.
The recommended way is always to use a file storage service like Amazon S3.
This application on GitHub shows you exactly what you need using Amazon S3, but uploading an image instead of a PDF. I'm sure you will find the example really useful.
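For orientation, a hedged sketch of the S3 upload itself using the AWS SDK for Java (v1); the bucket name, key, and use of default credentials/region are assumptions:

    // Hedged sketch: upload a local PDF to S3. Bucket, key, and the use of
    // the default credential/region chain are assumptions.
    import java.io.File;
    import com.amazonaws.services.s3.AmazonS3;
    import com.amazonaws.services.s3.AmazonS3ClientBuilder;

    public class S3PdfUpload {
        public static void main(String[] args) {
            AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
            s3.putObject("my-pdf-bucket", "docs/example.pdf", new File("example.pdf"));
        }
    }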