If I know the URL of an MP3 file, what is the easiest/fastest way to get its length, bitrate, size, etc?
How can I download just the ID3 tag part of the MP3 to get these details?
You will need to look at the ID3 tags in the mp3 file.
Unless you keep track of the metadata you want somewhere else.
To get the track length specifically, you need the ID3 metadata tag, in particular its 'TLEN' frame, which stores the length in milliseconds. (The 'TRCK' frame holds the track number, not the duration.) The frame is optional, so it may be missing.
To download only the ID3 tag, first download the ID3 header at the start of the file.
This website contains very specific information about the ID3 tag format. Check the version number of the ID3 tag and, based on that, find out how the length of the tag is stored. Then download the WHOLE tag, because the frames are not in any specific order.
Then you should be able to use a third-party library to find the TLEN frame and its data.
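As a rough sketch of the "read the header, then work out the tag length" step: an ID3v2 header is 10 bytes, and bytes 6 to 9 hold the tag size as a synchsafe integer (7 usable bits per byte). The class and method names here are made up for illustration, and the code assumes you have already fetched the first 10 bytes of the file.

```java
final class Id3v2Header {
    static final int HEADER_LENGTH = 10;

    /**
     * Returns the total ID3v2 tag length in bytes (header and optional
     * footer included), or -1 if the bytes do not start with a valid header.
     */
    static int totalTagLength(byte[] header) {
        if (header.length < HEADER_LENGTH
                || header[0] != 'I' || header[1] != 'D' || header[2] != '3') {
            return -1;
        }
        // Size is a synchsafe integer: four bytes, top bit of each always 0.
        int size = 0;
        for (int i = 6; i < 10; i++) {
            if ((header[i] & 0x80) != 0) {
                return -1; // not a valid synchsafe byte
            }
            size = (size << 7) | (header[i] & 0x7F);
        }
        // ID3v2.4 may append a 10-byte footer, signalled by flag bit 0x10.
        int majorVersion = header[3] & 0xFF;
        boolean hasFooter = majorVersion >= 4 && (header[5] & 0x10) != 0;
        return HEADER_LENGTH + size + (hasFooter ? HEADER_LENGTH : 0);
    }
}
```

Once you know the total length, you can request exactly that many bytes and hand them to whatever ID3 library you use.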
Here is how you get file size, but how do you get bitrate and track length?
import java.io.*;
import java.net.*;

public class FileSizeFromURL {
    public static void main(String[] args) {
        if (args.length != 1) {
            System.out.println("Usage: FileSizeFromURL <url>");
            return;
        }
        try {
            URL url = new URL(args[0]);
            URLConnection conn = url.openConnection();
            // Content-Length as reported by the server; -1 if it is not sent.
            // The long variant avoids overflow for files larger than 2 GB.
            long size = conn.getContentLengthLong();
            if (size < 0)
                System.out.println("Could not determine file size.");
            else
                System.out.println(args[0] + "\nSize: " + size);
            conn.getInputStream().close();
        }
        catch (Exception e) {
            e.printStackTrace();
        }
    }
}
what is the easiest/fastest way to get its length, bitrate, size, etc?
File size you can get with an HTTP HEAD request. If by ‘length’ you mean playing time in seconds, you cannot get this without fetching the entire file. You can guess, by fetching the first few MP3 frames, looking at their bitrate, and assuming that the rest of the file has the same bitrate, but given the popularity of Variable Bit-Rate encoding the likelihood this will be close to accurate is quite low.
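To make the "guess from the bitrate" idea concrete, here is a minimal sketch. It assumes a constant bitrate, which is exactly the assumption that breaks for VBR files, and the class, method, and parameter names are only illustrative.

```java
final class Mp3DurationEstimate {
    /**
     * Estimates playing time in seconds from the file size and the bitrate
     * seen in the first MP3 frames. Only meaningful for constant-bitrate files.
     *
     * @param fileSizeBytes size from the Content-Length header
     * @param tagBytes      bytes taken up by ID3 tags (0 if unknown)
     * @param bitrateKbps   bitrate of the first frames, in kbit/s (1000 bit/s)
     */
    static double estimateSeconds(long fileSizeBytes, long tagBytes, int bitrateKbps) {
        if (bitrateKbps <= 0) {
            throw new IllegalArgumentException("bitrate must be positive");
        }
        long audioBytes = Math.max(0, fileSizeBytes - tagBytes);
        return audioBytes * 8.0 / (bitrateKbps * 1000.0);
    }
}
```

For example, a 4,000,000-byte file with no tags at 128 kbit/s comes out at 250 seconds.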
ID3 tags can in theory contain information that might allow you to guess the length better, in the ASPI and ETCO tags. But in practice these are very rarely present.
How can I download just the ID3 tag part of the MP3 to get these details?
For ID3v2 tags, grab the start of the file. (It's possible for ID3v2 frames to be elsewhere, but in practice they're always there.) You can't tell how long the tag is going to be in advance. For text-only tags you're likely to find the information you want in the first 512-1024 bytes. Unfortunately more and more MP3s have embedded ‘album art’ pictures, which can be much longer; try to pick an ID3 library that will gracefully ignore truncated ID3 information.
ID3v1 tags are located at the end of the file. They have a fixed length of 128 bytes and begin with the marker "TAG", but they only hold basic text fields such as title and artist. And of course you don't know in advance whether the file has ID3v1 tags, ID3v2 tags, both or neither. Generally these days ID3v2 is a better bet though.
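For the ID3v1 case, the tag is a fixed 128-byte block at the very end of the file: the marker "TAG", then a 30-byte title, 30-byte artist, 30-byte album, 4-byte year, 30-byte comment and 1-byte genre. A small sketch of checking for it and pulling out the title, assuming you already have the last 128 bytes (the class name is made up):

```java
import java.nio.charset.StandardCharsets;

final class Id3v1Tag {
    static final int LENGTH = 128;

    /** Returns the title field, or null if the block is not an ID3v1 tag. */
    static String title(byte[] tail) {
        if (tail.length != LENGTH
                || tail[0] != 'T' || tail[1] != 'A' || tail[2] != 'G') {
            return null;
        }
        return field(tail, 3, 30);
    }

    // Fields are fixed width and padded with zero bytes or spaces.
    private static String field(byte[] tag, int offset, int length) {
        int end = offset;
        while (end < offset + length && tag[end] != 0) {
            end++;
        }
        return new String(tag, offset, end - offset, StandardCharsets.ISO_8859_1).trim();
    }
}
```

Note that ID3v1 has no field for duration or bitrate, so it only helps with the descriptive metadata.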
To read part of a file through HTTP you need the Range header. This too is not supported everywhere.
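A hedged sketch of what a Range request could look like with plain HttpURLConnection. The class and method names are invented for the example; the server has to answer with 206 Partial Content for this to work, and the code treats anything else as "Range not supported".

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

final class RangeFetch {
    /** Range value for the first count bytes, e.g. the start of an ID3v2 tag. */
    static String firstBytes(int count) {
        if (count <= 0) throw new IllegalArgumentException("count must be positive");
        return "bytes=0-" + (count - 1);
    }

    /** Range value for the last count bytes, e.g. 128 for an ID3v1 tag. */
    static String lastBytes(int count) {
        if (count <= 0) throw new IllegalArgumentException("count must be positive");
        return "bytes=-" + count;
    }

    /** Fetches only the requested byte range; fails if the server ignores Range. */
    static byte[] fetch(String url, String rangeValue) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestProperty("Range", rangeValue);
        try {
            if (conn.getResponseCode() != HttpURLConnection.HTTP_PARTIAL) {
                throw new IOException("No partial content, status " + conn.getResponseCode());
            }
            try (InputStream in = conn.getInputStream()) {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buffer = new byte[4096];
                int n;
                while ((n = in.read(buffer)) != -1) {
                    out.write(buffer, 0, n);
                }
                return out.toByteArray();
            }
        } finally {
            conn.disconnect();
        }
    }
}
```

Usage would be something like RangeFetch.fetch(mp3Url, RangeFetch.firstBytes(1024)), where mp3Url is whatever URL you are inspecting.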
In summary, there are enough problems with this that the best option may well be giving up and just fetching the whole file.
I am answering this question a year+ after it was asked but I do have some info to add for those happening by at a later time. I answered a similar question about getting the image dimensions from an image on a remote server without downloading the whole image. See that discussion at the following URL:
Get image dimensions with Curl
You could definitely use the same technique to pull info out of the ID3 tags without downloading the whole MP3 from a remote server.
I hope that helps for future passers-by.
You cannot get this information from the URL alone.
You'll have to load the first few K of content and use an mp3 library to decode the header and get the values.
That data is encoded in the id3 tag, so you need to download at least that much of the file, which is at the beginning so you're in luck.
Alternatively, you can look at the Content-Length value in the HTTP response headers to learn the file size, if the server sends it, which it may not.
My idea is to divide a big response text into small parts and load them concurrently.
The following code lets me open a stream from a URL, but I want to load its whole content from multiple threads to improve performance and then merge the parts into a single result. However, the method returns a ReadableByteChannel, which cannot specify a start position, so I have to transfer it linearly:
URL url = new URL("link");
InputStream fromStream = url.openStream();
ReadableByteChannel fromChannel = Channels.newChannel(fromStream);
Is there any way to specify the position, as with SeekableByteChannel (it seems this interface only works with files)? Thank you :D
If you can manipulate the request before it's a stream then yes, you would use the Range http header to specify the chunk of data you wanted...
See Downloading a portion of a File using HTTP Requests
If not then you will manually have to read past the data you don't need.
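If you do end up reading past the data manually, keep in mind that InputStream.skip may skip fewer bytes than asked. A minimal sketch of a helper that keeps going until the requested offset is reached (the class and method names are made up):

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

final class StreamSkipper {
    /** Skips exactly count bytes, or throws EOFException if the stream ends first. */
    static void skipFully(InputStream in, long count) throws IOException {
        long remaining = count;
        while (remaining > 0) {
            long skipped = in.skip(remaining);
            if (skipped > 0) {
                remaining -= skipped;
                continue;
            }
            // skip() made no progress: read one byte to tell "slow" from "end of stream".
            if (in.read() == -1) {
                throw new EOFException("Stream ended with " + remaining + " bytes left to skip");
            }
            remaining--;
        }
    }
}
```

This still transfers the skipped bytes over the network, so it only helps with positioning, not with download speed.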
See Given a Java InputStream, how can I determine the current offset in the stream?
I have a Google App Engine app that converts XML to CSV files. It works fine for small XML inputs, but refuses to finalize the files for larger XML inputs. The XML is read from, and the resulting CSV files are written to, many times before finalization, over a long-running (multi-day duration) task. My problem is different from this one: FileServiceFactory getBlobKey throws IllegalArgumentException, since my code works fine both in production and development with small input files. So it's not that I'm neglecting to write to the file before closing/finalizing. However, it fails when I attempt to read from a larger XML file. The input XML file is ~150MB, and each of the 5 resulting CSV files is much smaller (perhaps 10MB each). I persisted the file URLs for the new CSV files, and even tried to close them with some static code, but I just reproduce the same error, which is
java.lang.IllegalArgumentException: creation_handle: String properties must be 500 characters or less. Instead, use com.google.appengine.api.datastore.Text, which can store strings of any length.
at com.google.appengine.api.datastore.DataTypeUtils.checkSupportedSingleValue(DataTypeUtils.java:242)
at com.google.appengine.api.datastore.DataTypeUtils.checkSupportedValue(DataTypeUtils.java:207)
at com.google.appengine.api.datastore.DataTypeUtils.checkSupportedValue(DataTypeUtils.java:173)
at com.google.appengine.api.datastore.Query$FilterPredicate.<init>(Query.java:900)
at com.google.appengine.api.datastore.Query$FilterOperator.of(Query.java:75)
at com.google.appengine.api.datastore.Query.addFilter(Query.java:351)
at com.google.appengine.api.files.FileServiceImpl.getBlobKey(FileServiceImpl.java:329)
But I know that it's not a String/Text data type issue, since I am already using similar length file service urls for the previous successful attempts with smaller files. It also wasn't an issue for the other stackoverflow post I linked above. I also tried putting one last meaningless write before finalizing, just in case it would help as it did for the other post, but it made no difference. So there's really no way for me to debug this... Here is my file closing code that is not working. It's pretty similar to the Google how-to example at http://developers.google.com/appengine/docs/java/blobstore/overview#Writing_Files_to_the_Blobstore .
log.info("closing out file 1");
try {
//locked set to true
FileWriteChannel fwc1 = fileService.openWriteChannel(csvFile1, true);
fwc1.closeFinally();
} catch (IOException ioe) {ioe.printStackTrace();}
// You can't get the blob key until the file is finalized
BlobKey blobKeyCSV1 = fileService.getBlobKey(csvFile1);
log.info("csv blob storage key is:" + blobKeyCSV1.getKeyString());
csvUrls[i-1] = blobKeyCSV1.getKeyString();
break;
At this point, I just want to finalize my new blob files for which I have the URLs, but cannot. How can I get around this issue, and also, what may be the cause? Again, my code works for small files (~60 kB), but the input file of ~150MB fails. Thank you for any advice on what is causing this or how to get around it! Also, how long will my unfinalized files stick around before being deleted?
This issue was a bug in the Java MapReduce and Files API, which was recently fixed by Google. Read announcement here: groups.google.com/forum/#!topic/google-appengine/NmjYYLuSizo
I have a simple TCP ServerSocket that answers GET requests with a byte array. The requests come from opening a page on this server that contains an img src link to a GIF image, and they look like this:
GET /myHome.htm HTTP/1.1
GET /house.gif HTTP/1.1
Now the byte array is done like this:
byte[] fileByte = Files.readAllBytes(filePath);
To print the website which contains this image I do this:
out.writeBytes(new String(fileByte));
out:
DataOutputStream out= new DataOutputStream(socketClient.getOutputStream());
Now to make the image display I think I have to use something other than
out.writeBytes()
but I do not know for sure. Does anybody know how to make the image display? Right now the image just doesn't show at all.
First, make sure your GIF file is not corrupted. (That has happened to me before, too.)
If the file is fine, try this code for sending the GIF file:
byte[] fileByte = Files.readAllBytes(filePath);
writer.writeBytes("HTTP/1.1 200 OK\r\n");
writer.writeBytes("Content-Type: image/gif\r\n");
writer.writeBytes("Content-Length: "+fileByte.length+"\r\n");
writer.writeBytes("\r\n");
writer.write(fileByte, 0, fileByte.length);
And then try to navigate to "house.gif" directly instead of "myHome.htm". Let me know in the comments what this does.
Previous answer attempts:
I think I may have misunderstood your question. Let me try with a different answer:
You are not sure how to figure out on the server when to return the HTML file myHome.htm and when to return house.gif?
I think for this you need to simply parse out the requested URL. Just check whether it contains "house.gif" or not. Then, depending on this, you either return the HTML file as you described above, or you send the .gif file, making sure that you use
writer.write(fileByte, 0, fileByte.length);
to send the binary data and that you set a reply header of
Content-Type: image/gif
In both cases (for the HTML file and the GIF file), though, you should prepend the data you are sending with correct HTTP response headers. Don't take the page-title the wrong way, but this site might help: http://net.tutsplus.com/tutorials/other/http-headers-for-dummies/
And just to make sure: Your server will be receiving TWO independent requests. The first one will ask for the HTML file, the second one will ask for the GIF file. So you send either one or the other. So, there's no "special way" to send the GIF instead of the HTML file. You use the same clientSocket. But it's a different connection.
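A small sketch of the "parse out the requested URL" step, assuming you have already read the first line of the request from the socket. The class and method names are only for illustration, and the mapping covers just the two file types from the question.

```java
import java.util.Locale;

final class RequestRouter {
    /** Extracts the path from a request line such as "GET /house.gif HTTP/1.1". */
    static String requestedPath(String requestLine) {
        String[] parts = requestLine.trim().split("\\s+");
        if (parts.length < 2 || !parts[0].equals("GET")) {
            return null; // not a GET request line
        }
        return parts[1];
    }

    /** Picks the Content-Type header value from the file extension. */
    static String contentTypeFor(String path) {
        String lower = path.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".gif")) {
            return "image/gif";
        }
        if (lower.endsWith(".htm") || lower.endsWith(".html")) {
            return "text/html";
        }
        return "application/octet-stream"; // generic fallback for unknown types
    }
}
```

With the path and content type in hand, you write the status line, the Content-Type and Content-Length headers, an empty line, and then the raw bytes of the matching file.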
Previous answer(s):
I think you might be missing the mime-type of your returned data. Try adding the following HTTP header to your reply:
Content-Type: image/gif
Actually... Are you sending a correct HTTP reply at all (including headers, specifically Content-Length)? If not, shoot me a comment and I'll post the code that you need for this.
If, for some reason, you cannot set the content-type header to let the browser know that you are sending it an image, you might be able to load the binary data on the client with an XMLHttpRequest into a JavaScript function rather than specifying it as the source Url of an img tag. Then you can use JavaScript to encode the binary data into a dataURI (http://en.wikipedia.org/wiki/Data_URI_scheme) with the correct mime type and set that as the source of the image.
Actually, I just noticed something in your code:
new String(fileByte)
might interpret fileByte as encoded text rather than binary. Then, when you write this to the writer, it might corrupt the data, as probably not all bytes in the image are valid in that encoding. Try replacing the line with this:
writer.write(fileByte, 0, fileByte.length);
Maybe this is all you need to do to fix it???
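To see why going through a String is risky, here is a tiny check that decodes bytes as text and encodes them again. It assumes UTF-8, which is only one possible default charset, and the class name is made up for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class BinaryRoundTrip {
    /** Returns true if the bytes come back unchanged after a trip through String. */
    static boolean survivesStringRoundTrip(byte[] data) {
        String asText = new String(data, StandardCharsets.UTF_8);
        byte[] back = asText.getBytes(StandardCharsets.UTF_8);
        return Arrays.equals(data, back);
    }
}
```

Plain ASCII such as the "GIF89a" signature survives, but bytes like 0x80 or 0xFF, which are common in image data, are replaced during decoding and do not come back as they went in.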
I have a crawler that downloads pages and tries to parse the HTML. One of the issues I've been facing is how to properly determine what mimetype an HTML file is.
Right now I'm using
is = new ByteArrayInputStream( htmlResult.getBytes( "UTF-8" ) );
mimeType = URLConnection.guessContentTypeFromStream(is);
but it misses sites like this: http://www.artdaily.org/index.asp?int_sec%3D11%26int_new%3D39415 because of the extra space between the doc tag and HTML tag in the source.
Does anyone know a good way to determine if a string is HTML or not? Searching for <html> or some other tag wouldn't necessarily work because of text being embedded in binary files I may come across.
thanks
Do you have control over the HTTP connection that your crawler uses? Then how about checking the HTTP response header "Content-Type"? That's one way to determine the content type. I just did a quick test of artdaily.org to see if the Content-Type header was sent, and there is one, with the value text/html.
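A minimal sketch of checking that header value, assuming you can get it from the connection (for example via URLConnection.getContentType()). The class and method names are invented, and treating application/xhtml+xml as HTML is my own assumption.

```java
import java.util.Locale;

final class ContentTypes {
    /** Returns true if a Content-Type header value denotes an HTML document. */
    static boolean isHtml(String contentTypeHeader) {
        if (contentTypeHeader == null) {
            return false; // server sent no Content-Type
        }
        // Drop parameters such as "; charset=ISO-8859-1" and normalize case.
        String mediaType = contentTypeHeader.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        return mediaType.equals("text/html") || mediaType.equals("application/xhtml+xml");
    }
}
```

Servers can send a wrong or missing header, so you may still want content sniffing as a fallback.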
I'm developing a Web application that will let users upload images.
My concern is the file's size, especially if the files are in invalid formats.
I'm wondering if there's a way in Java (or a third-party library) to check for the allowed file formats (jpg, gif and png) before reading the entire file.
If you wish to support only a few types of images, you can start (up)loading the image and at some point use the first few bytes to check whether you wish to continue the upload.
Quite a lot of image formats can be recognized by the first few bytes, the magic number. If the number matches you don't know whether the file is valid, of course, but you can compare the extension against the magic number to reject files where the two do not correspond at all.
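For the three formats in the question the magic numbers are short and well known: JPEG starts with FF D8 FF, PNG with the 8 bytes 89 50 4E 47 0D 0A 1A 0A, and GIF with the ASCII text "GIF87a" or "GIF89a". A sketch of a check on the first bytes of an upload (the class and method names are made up):

```java
final class ImageSniffer {
    /** Returns "jpg", "png" or "gif" based on the leading bytes, or null if unknown. */
    static String detect(byte[] head) {
        if (startsWith(head, 0xFF, 0xD8, 0xFF)) {
            return "jpg";
        }
        if (startsWith(head, 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A)) {
            return "png";
        }
        if (startsWith(head, 'G', 'I', 'F', '8', '7', 'a')
                || startsWith(head, 'G', 'I', 'F', '8', '9', 'a')) {
            return "gif";
        }
        return null;
    }

    // Compares unsigned byte values against the expected prefix.
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

Reading the first 8 bytes of the upload stream is enough for all three checks, so you can reject the file before the rest arrives.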
Have a look at this page to check out some Java which checks mime-types. Do read the docs or source to check whether any given method requires the entire file, or can operate on the first few bytes. I've not used those libraries :)
Also check out this page which also lists some java libraries, and some papers on which detection is based.
Don't forget to put in some feedback if you managed to find something you like!
You don't need 3rd party libraries. The code you have to write is simple.
At the point you are handling your uploads, filter the files by their extension. This isn't perfect, but will account for most of the cases.
However, this would mean files are already uploaded to the server. You can use a bit of javascript on the client-side to perform the same operation - check whether the value of the file-upload component contains an allowed file type - .jpg, .png, etc.
function extensionsOkay(fval) {
    var extension = [".png", ".gif", ".jpg", ".jpeg", ".bmp"];
    // No other customization needed.
    var thisext = fval.substr(fval.lastIndexOf('.')).toLowerCase();
    for (var i = 0; i < extension.length; i++) {
        if (thisext == extension[i]) {
            $('#support-documents').hide();
            return true;
        }
    }
    // show client side error message
    $('#span.failed').show();
    return false;
}