This is regarding the Google Speech-to-Text API.
I want to develop a Spring Boot Java web app:
The app is launched on localhost
I open a browser at http://localhost:8080
The app displays a simple UI: a main window that shows live, real-time captions for
any English audio coming from the laptop speaker. This could be a Zoom video call in which participants are speaking; I hear them and also see the live captions in my local web app
Live captions remain on the screen in a window with a scrollbar
Live captions are saved to a text file, with new captions continually appended
It is critical for the captions to be as accurate as possible and to appear quickly, while the person is speaking.
Can this be achieved? If not possible with Google API, what is the alternative API?
If I am understanding you correctly, IMHO I would separate it into two parts:
transcribe the speech to text, as in the Google API sample below,
and then render the captions as a stream overlay.
/**
 * Performs streaming speech recognition on raw PCM audio data.
 *
 * @param fileName the path to a PCM audio file to transcribe.
 */
public static void streamingRecognizeFile(String fileName) throws Exception {
Path path = Paths.get(fileName);
byte[] data = Files.readAllBytes(path);
// Instantiates a client with GOOGLE_APPLICATION_CREDENTIALS
try (SpeechClient speech = SpeechClient.create()) {
// Configure request with local raw PCM audio
RecognitionConfig recConfig =
RecognitionConfig.newBuilder()
.setEncoding(AudioEncoding.LINEAR16)
.setLanguageCode("en-US")
.setSampleRateHertz(16000)
.setModel("default")
.build();
StreamingRecognitionConfig config =
StreamingRecognitionConfig.newBuilder().setConfig(recConfig).build();
class ResponseApiStreamingObserver<T> implements ApiStreamObserver<T> {
private final SettableFuture<List<T>> future = SettableFuture.create();
private final List<T> messages = new java.util.ArrayList<T>();
@Override
public void onNext(T message) {
messages.add(message);
}
@Override
public void onError(Throwable t) {
future.setException(t);
}
@Override
public void onCompleted() {
future.set(messages);
}
// Returns the SettableFuture object to get received messages / exceptions.
public SettableFuture<List<T>> future() {
return future;
}
}
ResponseApiStreamingObserver<StreamingRecognizeResponse> responseObserver =
new ResponseApiStreamingObserver<>();
BidiStreamingCallable<StreamingRecognizeRequest, StreamingRecognizeResponse> callable =
speech.streamingRecognizeCallable();
ApiStreamObserver<StreamingRecognizeRequest> requestObserver =
callable.bidiStreamingCall(responseObserver);
// The first request must **only** contain the audio configuration:
requestObserver.onNext(
StreamingRecognizeRequest.newBuilder().setStreamingConfig(config).build());
// Subsequent requests must **only** contain the audio data.
requestObserver.onNext(
StreamingRecognizeRequest.newBuilder()
.setAudioContent(ByteString.copyFrom(data))
.build());
// Mark transmission as completed after sending the data.
requestObserver.onCompleted();
List<StreamingRecognizeResponse> responses = responseObserver.future().get();
for (StreamingRecognizeResponse response : responses) {
// For streaming recognize, the results list has one is_final result (if available) followed
// by a number of in-progress results (if interim_results is true) for subsequent utterances.
// Just print the first result here.
StreamingRecognitionResult result = response.getResultsList().get(0);
// There can be several alternative transcripts for a given chunk of speech. Just use the
// first (most likely) one here.
SpeechRecognitionAlternative alternative = result.getAlternativesList().get(0);
System.out.printf("Transcript : %s\n", alternative.getTranscript());
}
}
}
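The sample above transcribes a finished file; for live captions you would instead capture audio continuously and push chunks into the same requestObserver. Below is a minimal sketch using javax.sound.sampled. Two assumptions to flag: capturing what comes out of the speakers (rather than a microphone) usually requires a loopback device such as "Stereo Mix" or a virtual audio cable, and the capture format must match the RecognitionConfig above (LINEAR16, 16 kHz, mono).

import com.google.protobuf.ByteString;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.DataLine;
import javax.sound.sampled.TargetDataLine;

/** Captures 16 kHz mono PCM from the default capture line and streams it. */
public static void streamLiveAudio(
        ApiStreamObserver<StreamingRecognizeRequest> requestObserver) throws Exception {
    AudioFormat format = new AudioFormat(16000.0f, 16, 1, true, false);
    DataLine.Info info = new DataLine.Info(TargetDataLine.class, format);
    TargetDataLine line = (TargetDataLine) AudioSystem.getLine(info);
    line.open(format);
    line.start();
    byte[] buffer = new byte[3200]; // roughly 100 ms of audio per request
    while (!Thread.currentThread().isInterrupted()) {
        int n = line.read(buffer, 0, buffer.length);
        if (n > 0) {
            requestObserver.onNext(StreamingRecognizeRequest.newBuilder()
                    .setAudioContent(ByteString.copyFrom(buffer, 0, n))
                    .build());
        }
    }
    line.close();
    requestObserver.onCompleted();
}

Note also that Google limits the duration of a single streaming session, so a long captioning run needs to reopen the stream periodically, and it is the interim results (setInterimResults(true) on the StreamingRecognitionConfig builder) that make captions appear while the person is still speaking.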
For a mobile voice overlay:
https://github.com/algolia/voice-overlay-android
For a web HTML5 overlay:
<video id="video" controls preload="metadata">
<source src="video/sintel-short.mp4" type="video/mp4">
<source src="video/sintel-short.webm" type="video/webm">
<track label="English" kind="subtitles" srclang="en" src="captions/vtt/sintel-en.vtt" default>
<track label="Deutsch" kind="subtitles" srclang="de" src="captions/vtt/sintel-de.vtt">
<track label="Español" kind="subtitles" srclang="es" src="captions/vtt/sintel-es.vtt">
</video>
// Per the sample linked above, you can feed/append the captions
var subtitlesMenu;
if (video.textTracks) {
var df = document.createDocumentFragment();
subtitlesMenu = df.appendChild(document.createElement('ul'));
subtitlesMenu.className = 'subtitles-menu';
subtitlesMenu.appendChild(createMenuItem('subtitles-off', '', 'Off'));
for (var i = 0; i < video.textTracks.length; i++) {
subtitlesMenu.appendChild(createMenuItem('subtitles-' + video.textTracks[i].language, video.textTracks[i].language, video.textTracks[i].label));
}
videoContainer.appendChild(subtitlesMenu);
}
One of the fastest and most efficient ways to convert speech to text is the Java Speech API (documentation at https://www.oracle.com/java/technologies/speech-api-frequently-asked-questions.html).
In the course of text conversion, you will need to break the text down into pieces, and because of this the meaning may change slightly, since some expressions mean something different than a single word; but this will help reduce the time of the final translation. Then send the received segments (words, phrases) via an API for translation.
You can pick several options you like (for example, https://rapidapi.com/blog/best-translation-api/) and check which one works fastest. In my experience, "Microsoft Translator Text" and "Google Translate" are among the fastest. I also think you will not be able to get instant translation, but if you test several API options and experiment with whether to process whole sentences, phrases, or individual words at a time, you can reduce the translation time to a minimum.
I'm trying to write a Java program which allows one user to act as a server and stream their desktop (video & audio), then other users act as clients and watch the live stream of their desktop (similar to Twitch, Webex, Skype screenshare, etc). I am using VLCJ for this, although I have no commitment to using it so if there is a better solution I'm all ears. Here is the code, which is copied from the link I provide below:
package test.java;
import uk.co.caprica.vlcj.discovery.NativeDiscovery;
import uk.co.caprica.vlcj.player.MediaPlayerFactory;
import uk.co.caprica.vlcj.player.headless.HeadlessMediaPlayer;
import test.java.VlcjTest;
/**
* An example of how to stream a media file over HTTP.
* <p>
* The client specifies an MRL of <code>http://127.0.0.1:5555</code>
*/
public class StreamHttp extends VlcjTest {
//when running this it requires an MRL (Media Resource Locator)
//fancy term for saying the file you want to stream. This could be a url to another
//location that streams media or a filepath to a media file you want to stream
//on the system you are running this code on.
public static void main(String[] args) throws Exception {
new NativeDiscovery().discover();
if(args.length != 1) {
System.out.println("Specify a single MRL to stream");
System.exit(1);
}
//the media you are wanting to stream
String media = args[0];
//this is the IP address and port you are wanting to stream at
//this means clients will connect to http://127.0.0.1:5555
//to watch the stream
String options = formatHttpStream("127.0.0.1", 5555);
System.out.println("Streaming '" + media + "' to '" + options + "'");
//this creates the actual media player that will make calls into the native
//vlc libraries to actually play the media you supplied. It does it in
//a headless fashion, as you are going to stream it over http to be watched
//instead of playing it locally to be watched.
MediaPlayerFactory mediaPlayerFactory = new MediaPlayerFactory(args);
HeadlessMediaPlayer mediaPlayer = mediaPlayerFactory.newHeadlessMediaPlayer();
//this simply starts the player playing the media you gave it
mediaPlayer.playMedia(media, options);
// Don't exit
//basically you don't want the thread to end and kill the player,
//so it just hangs around and waits for it to end.
Thread.currentThread().join();
}
private static String formatHttpStream(String serverAddress, int serverPort) {
StringBuilder sb = new StringBuilder(60);
sb.append(":sout=#duplicate{dst=std{access=http,mux=ts,");
sb.append("dst=");
sb.append(serverAddress);
sb.append(':');
sb.append(serverPort);
sb.append("}}");
return sb.toString();
}
}
I pass "screen://" as a parameter to this program. When I run the code, I get this error message:
[000000000038b250] access_output_http access out: Consider passing --http-host=IP on the command line instead.
[000000001ccaa220] core mux error: cannot add this stream
[000000001cc72100] core decoder error: cannot create packetizer output (RV32)
I tried searching for a solution but all I could find was this:
Video Streaming in vlcj
and although this user had the same error, I couldn't solve my problem from this link, although I did use the StreamHttp code sample from it. I am a relatively inexperienced programmer so if I missed an obvious solution then I apologize. I am using Java 1.8, Windows 7 64 bit.
You need something like this:
String media = "screen://";
String[] options = {
":sout=#transcode{vcodec=FLV1,vb=4096,scale=0.500000}:http{mux=ffmpeg{mux=flv},dst=:5000/}"
};
The key things shown here are a "sout" string to transcode the video, then another appended "sout" string to stream (in this case via http).
In this example string, for http streaming only the port (5000, arbitrarily chosen) is specified. No host is specified, so it means localhost. You could have something like "dst=127.0.0.1:8080/" or whatever you need.
You will have to choose/experiment with the specific transcoding/streaming options that you want. There is no one size fits all for those options.
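Wired into the earlier StreamHttp example, that looks roughly like the sketch below. The :screen-fps and :screen-caching options are additions of mine that are commonly tuned for screen capture; treat the exact values as starting points rather than a definitive configuration.

String media = "screen://";
String[] options = {
    // transcode to FLV, then serve over HTTP on port 5000
    ":sout=#transcode{vcodec=FLV1,vb=4096,scale=0.5}:http{mux=ffmpeg{mux=flv},dst=:5000/}",
    ":screen-fps=15",      // capture frame rate
    ":screen-caching=300"  // capture caching, in milliseconds
};
mediaPlayer.playMedia(media, options);

vlcj's playMedia accepts the MRL plus any number of option strings, so the whole options array can be passed straight through.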
Footnote:
You can actually use VLC itself to generate this string for you.
Start VLC, then choose the media you want to play.
Instead of pressing "Play", use the widget to select "Stream" instead. This opens the Streaming wizard where you can pick all of your options.
At the end of the wizard, before you start playing, it shows you the string you need.
As of Android 5.0.0 you can long tap on a WiFi connection and write that connection to a tag ("Write to NFC tag"). You can find the source for that operation here: WriteWifiConfigToNfcDialog.java. The relevant line that takes a WiFi connection and creates an NDEF payload appears to be here:
String wpsNfcConfigurationToken = mWifiManager.getWpsNfcConfigurationToken(mAccessPoint.networkId);
mWifiManager is an instance of WifiManager; however, getWpsNfcConfigurationToken is not part of the public API. By tracking down this method, we can find its commit here: Add calls for NFC WSC token creation, which is unfortunately no help. This is where my investigation ran out. Edit:
I've found out the following call stack:
WifiServiceImpl.java calls mWifiStateMachine.syncGetWpsNfcConfigurationToken(netId);
WifiStateMachine.java calls mWifiNative.getNfcWpsConfigurationToken(netId);
WifiNative.java finally has the method
public String getNfcWpsConfigurationToken(int netId) {
return doStringCommand("WPS_NFC_CONFIG_TOKEN WPS " + netId);
}
which then calls
String result = doStringCommandNative(mInterfacePrefix + command);
where doStringCommandNative makes a system call (can't find the code for this anywhere).
Which is now where the investigation ends.
Hoping someone can step in and show me a method that creates an NdefRecord that is of the type application/vnd.wfa.wsc given an SSID, Password, Encryption/Auth type.
I've of course inspected the bytes of an actual application/vnd.wfa.wsc record created by Android but manually recreating this process with the bytes seems potentially very unreliable and is incredibly tedious.
The answer lies in the Wi-Fi Alliance "Wi-Fi Simple Configuration Technical Specification v2.0.5" (available for download here). Android makes use of this standard format for configuring WiFi networks; I wrongly assumed it was proprietary.
Firstly, I created an NFC helper class (aptly named NFCHelper.java) which has all the byte constants needed to construct the record. Then, I created a hacky method for creating one of the two records required. The spec is actually fairly unhelpful here; what I did was examine a number of payloads of tags that had been successfully configured via the Android OS. Finally, you need a mechanism to prepend a "Handover Select Record (NFC WKT Hs)" (see page 90 of the WiFi spec). I believe this record "tells" Android to register the network in the following token.
How to create the handover record:
ndefRecords = new NdefRecord[2];
byte[] version = new byte[] { (0x1 << 4) | (0x2)};
ndefRecords[0] = new NdefRecord(NdefRecord.TNF_WELL_KNOWN, NdefRecord.RTD_HANDOVER_REQUEST, new byte[0], version);
// and then obviously add the record you create with the method below.
Method for creating the configuration token:
private NdefRecord createWifiRecord(String[] data) {
String ssid = data[0];
String password = data[1];
String auth = data[2];
String crypt = data[3];
byte[] authByte = getAuthBytes(auth);
byte[] cryptByte = getCryptBytes(crypt);
byte[] ssidByte = ssid.getBytes();
byte[] passwordByte = password.getBytes();
byte[] ssidLength = {(byte)((int)Math.floor(ssid.length()/256)), (byte)(ssid.length()%256)};
byte[] passwordLength = {(byte)((int)Math.floor(password.length()/256)), (byte)(password.length()%256)};
byte[] cred = {0x00, 0x36};
byte[] idx = {0x00, 0x01, 0x01};
byte[] mac = {0x00, 0x06};
byte[] keypad = {0x00, 0x0B};
byte[] payload = concat(NFCHelper.CREDENTIAL, cred,
NFCHelper.NETWORK_IDX, idx,
NFCHelper.NETWORK_NAME, ssidLength, ssidByte,
NFCHelper.AUTH_TYPE, NFCHelper.AUTH_WPA_PERSONAL, authByte,
NFCHelper.CRYPT_TYPE, NFCHelper.CRYPT_WEP, NFCHelper.CRYPT_AES_TKIP,
NFCHelper.NETWORK_KEY, passwordLength, passwordByte);
// NFCHelper.MAC_ADDRESS, mac);
return NdefRecord.createMime(NFC_TOKEN_MIME_TYPE, payload);
}
License and gist here. You can find an implementation of the concat method anywhere on the net, or just write your own.
Note: this is a fairly hacky implementation (as you may notice). I am including the AES and AES/TKIP bytes because I found in testing that this worked for a variety of networks using different encryption/auth methods under Android 5.*
Please feel free to change the function prototype, the String array just worked nicely with what I was doing.
Using the two records created in the first snippet above, you should then pass that into an NdefMessage and write it to your tag.
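For completeness, a minimal sketch of that last step, assuming tag is the android.nfc.Tag delivered by an NFC discovery intent and the tag is already NDEF-formatted:

import android.nfc.NdefMessage;
import android.nfc.Tag;
import android.nfc.tech.Ndef;

NdefMessage message = new NdefMessage(ndefRecords);
Ndef ndef = Ndef.get(tag); // null if the tag does not support NDEF
if (ndef != null) {
    ndef.connect();
    try {
        ndef.writeNdefMessage(message); // throws if the tag is read-only or too small
    } finally {
        ndef.close();
    }
}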
One day soon I'm going to do a write-up and a far better/more robust solution with graphics and stuff too, so I'll update this answer then.
The call of doStringCommand("WPS_NFC_CONFIG_TOKEN WPS " + netId) in the end is handled by the wpa_supplicant module. This feature is described here. I think the actual implementation of this can be found in wps_supplicant.c.
What you are actually trying to do isn't Android-specific. It's defined in the "WiFi Simple Configuration Technical Specification", which you can download by filling in this form. The relevant part should be 10.1.2 Configuration Token.
NfcUtils.java has a working implementation for this! There are a few FIXMEs and TODOs, but in total it works and should give you a pretty good idea of what you need to do.
In case you want to parse such NdefRecords yourself and do something with the SSID and key, NfcWifiProtectedSetup.java shows how to do that.
I have MP3 audio files that contain voicemails that are left by a computer.
The message content is always in same format and left by the same computer voice with only a slight variation in content:
"You sold 4 cars today" (where the 4 can be anything from 0 to 9).
I have been trying to set up Sphinx, but the out-of-the-box models did not work very well.
I then tried to build my own acoustic model and haven't had much better success yet (30% unrecognized is my best).
I am wondering if voice recognition might be overkill for this task since I have exactly ONE voice, an expected audio pattern and a very limited dictionary that would need to be recognized.
I have access to each of the ten sounds (spoken numbers) that I would need to search for in the message.
Is there a non-VR approach to finding sounds in an audio file? (I can convert MP3 to another format if necessary.)
Update: My solution to this task follows
After working with Nikolay directly, I learned that the answer to my original question is irrelevant since the desired results may be achieved (with 100% accuracy) using Sphinx4 and a JSGF grammar.
1: Since the speech in my audio files is very limited, I created a JSGF grammar (salesreport.gram) to describe it. All of the information I needed to create the following grammar was available on the JSpeech Grammar Format page.
#JSGF V1.0;
grammar salesreport;
public <salesreport> = (<intro> | <sales> | <closing>)+;
<intro> = this is your automated automobile sales report;
<sales> = you sold <digit> cars today;
<closing> = thank you for using this system;
<digit> = zero | one | two | three | four | five | six | seven | eight | nine;
NOTE: Sphinx does not support JSGF tags in the grammar. If necessary, a regular expression may be used to extract specific information (the number of sales in my case).
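For example, a plain regular expression over the recognizer's hypothesis can pull out the spoken digit. A small sketch; the pattern simply mirrors the <sales> rule above:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

Pattern salesPattern = Pattern.compile("you sold (\\w+) cars today");
Matcher m = salesPattern.matcher(hypothesis); // hypothesis from result.getHypothesis()
if (m.find()) {
    String digitWord = m.group(1); // e.g. "four"
}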
2: It is very important that your audio files are properly formatted. The default sample rate for Sphinx is 16 kHz (i.e., 16,000 samples are collected every second). I converted my MP3 audio files to WAV format using FFmpeg.
ffmpeg -i input.mp3 -acodec pcm_s16le -ac 1 -ar 16000 output.wav
Unfortunately, FFmpeg makes this solution OS-dependent. I am still looking for a way to convert the files using Java and will update this post if/when I find it.
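One pure-Java route, assuming an MP3 decoder service provider (e.g. JLayer/mp3spi) and a sample-rate converter SPI (e.g. Tritonus) are on the classpath, is to go through javax.sound.sampled. This is an untested sketch; whether each conversion step succeeds depends on the SPIs installed:

import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import java.io.File;

public static void mp3ToWav16k(File mp3, File wav) throws Exception {
    AudioInputStream mp3In = AudioSystem.getAudioInputStream(mp3);
    AudioFormat src = mp3In.getFormat();
    // First decode the MP3 frames to signed 16-bit PCM at the source rate...
    AudioFormat pcm = new AudioFormat(AudioFormat.Encoding.PCM_SIGNED,
            src.getSampleRate(), 16, src.getChannels(),
            src.getChannels() * 2, src.getSampleRate(), false);
    AudioInputStream pcmIn = AudioSystem.getAudioInputStream(pcm, mp3In);
    // ...then convert to 16 kHz mono for Sphinx.
    AudioFormat target = new AudioFormat(16000f, 16, 1, true, false);
    AudioInputStream targetIn = AudioSystem.getAudioInputStream(target, pcmIn);
    AudioSystem.write(targetIn, AudioFileFormat.Type.WAVE, wav);
}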
Although it was not required to complete this task, I found Audacity helpful for working with audio files. It includes many utilities for working with the audio files (checking sample rate and bandwidth, file format conversion, etc).
3: Since telephone audio is sampled at 8 kHz, which caps the audio bandwidth (the range of frequencies included in the audio) at 4 kHz, I used the Sphinx en-us-8khz acoustic model.
4: I generated my dictionary, salesreport.dic, using lmtool
5: Using the files mentioned in the previous steps and the following code (modified version of Nikolay's example), my speech is recognized with 100% accuracy every time.
public String parseAudio(File voiceFile) throws FileNotFoundException, IOException
{
StringBuilder resultSB = new StringBuilder();
Configuration configuration = new Configuration();
configuration.setAcousticModelPath("file:acoustic_models/en-us-8khz");
configuration.setDictionaryPath("file:salesreport.dic");
configuration.setGrammarPath("file:salesreportResources/");
configuration.setGrammarName("salesreport");
configuration.setUseGrammar(true);
StreamSpeechRecognizer recognizer = new StreamSpeechRecognizer(configuration);
try (InputStream stream = new FileInputStream(voiceFile))
{
recognizer.startRecognition(stream);
SpeechResult result;
while ((result = recognizer.getResult()) != null)
{
System.out.format("Hypothesis: %s\n", result.getHypothesis());
resultSB.append(result.getHypothesis()
+ " ");
}
recognizer.stopRecognition();
}
return resultSB.toString().trim();
}
The accuracy on such a task should be 100%. Here is a code sample to use with the grammar:
public class TranscriberDemoGrammar {
public static void main(String[] args) throws Exception {
System.out.println("Loading models...");
Configuration configuration = new Configuration();
configuration.setAcousticModelPath("file:en-us-8khz");
configuration.setDictionaryPath("cmu07a.dic");
configuration.setGrammarPath("file:./");
configuration.setGrammarName("digits");
configuration.setUseGrammar(true);
StreamSpeechRecognizer recognizer =
new StreamSpeechRecognizer(configuration);
InputStream stream = new FileInputStream(new File("file.wav"));
recognizer.startRecognition(stream);
SpeechResult result;
while ((result = recognizer.getResult()) != null) {
System.out.format("Hypothesis: %s\n",
result.getHypothesis());
}
recognizer.stopRecognition();
}
}
You also need to make sure that both the sample rate and the audio bandwidth match the decoder configuration:
http://cmusphinx.sourceforge.net/wiki/faq#qwhat_is_sample_rate_and_how_does_it_affect_accuracy
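A quick way to check a file's format from Java before decoding (a small sketch):

import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import java.io.File;

AudioFormat fmt = AudioSystem.getAudioFileFormat(new File("file.wav")).getFormat();
// For the en-us-8khz model this should report 8000.0 Hz and 1 channel.
System.out.println(fmt.getSampleRate() + " Hz, " + fmt.getChannels() + " channel(s)");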
First of all, Sphinx only works with WAV files. For a very limited vocabulary, Sphinx should give good results when using a JSGF grammar file (but not as good in dictation mode). The main issue I found is that it does not provide a confidence score (that feature is currently bugged). You might want to check three other alternatives:
SpeechRecognizer from the Windows platform. It provides easy-to-use recognition with confidence scores and supports grammars. It is C#, but you could build a native wrapper or a custom server.
The Google Speech API is an online speech recognition engine, free for up to 50 requests per day. There are several APIs for this, but I like JARVIS. Be careful though, since there is no official support or documentation for it, and Google might (and already has in the past) shut this engine down whenever they want. Of course, you will also have privacy concerns (is it okay to send this audio data to a third party?).
I recently came across ISpeech and got good results with it. It provides its own Java wrapper API, free for mobile apps. Same privacy concerns as with the Google API.
I myself chose to go with the first option and built a speech recognition service in a custom HTTP server. I found it to be the most effective way to tackle speech recognition from Java until the Sphinx scoring issue is fixed.
I am looking for a way to analyze user agent strings to determine whether they were generated by mobile devices. This needs to be Java-based and usable in large batch log file analysis on Hadoop for generating statistics (i.e., a web service wouldn't be appropriate).
I have seen WURFL, but given that I just need a binary mobile/not mobile response, the license fee seems prohibitive.
So far I have been using UADetector, which is almost exactly what I need. However, I have encountered some limitations with it. In my testing, I have found many user agent strings that provide enough information to determine that the user agent is from a mobile device, but are reported as UNKNOWN by UADetector.
For example, poorly-standardized Android apps can send the UA string "Android". This is enough to know that it came from a mobile device, but UADetector reports this UserAgentType as UNKNOWN rather than MOBILE_BROWSER.
Apache Mobile Filter's Lite Device Detection does the right thing, but I need something I can use from Java.
Can anyone recommend a better solution?
I'm the founder and maintainer of the MobileESP project, a free open source cross-platform library for detecting mobile devices. It's still very much alive! :-)
www.mobileesp.org
MobileESP only gives binary "is mobile" responses. You can detect by platform, like iOS, Android, or Windows Phone, or by device category, like "iPhone Tier" smartphones vs. tablets. Be sure to take a quick look at the API page.
As you may know, useragent strings vary widely. If the browser shipped on the device, the manufacturer may customize it. For example, HTC often customizes the native Android browser's useragent string.
Google provides recommendations on how the OEM should customize the useragent. If the device should be considered a phone, then Google recommends including the word "mobile" in the string. But if the device should be considered a tablet, then the string should not contain "mobile." Adherence to this recommendation varies widely, of course.
Third party browsers like Opera or Maxthon can put whatever they want in the useragent string -- and do! Certain "new" browsers which shall remain nameless have been doing a very poor job of putting the correct information in their useragent strings for each platform (e.g., Android vs. iOS versions). There's not much you can do unless you get a lot of traffic from these browsers and wish to invest in tracking their exact useragent values per platform and software rev.
Anyway, MobileESP was created with the vision of doing the detection on a page-by-page basis when the page is served. I purposefully wrote the code to be very easy to read and customize, too.
To do the batch processing, you might do something like this:
1.) In the constructor, comment out the initDeviceScan() method. You won't need this for bulk processing.
2.) Pass the UserAgent and an empty string into the constructor (UAgentInfo()).
3.) Then run whatever detect methods you're interested in. Be thoughtful about the order in which you do them to save time, based on a scan of your users.
For example, if most of your users are on iPhone and that's one of the detection criteria you're interested in, then run that check first. In this example, you certainly wouldn't run the BlackBerry method first!
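Put together, a bulk run over a log extract might look like this sketch (UAgentInfo and detectMobileQuick() come from the MobileESP source; the input file name is made up):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class BulkUaCheck {
    public static void main(String[] args) throws IOException {
        try (BufferedReader in = new BufferedReader(new FileReader("useragents.txt"))) {
            String ua;
            while ((ua = in.readLine()) != null) {
                // Empty second argument: no Accept header is available in bulk logs.
                UAgentInfo agent = new UAgentInfo(ua, "");
                System.out.println(agent.detectMobileQuick() + "\t" + ua);
            }
        }
    }
}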
My contact info is in the source code and on the web site. Send me a note if you have any questions or run into any bugs. Definitely look around the MobileESP.org web site for some tips.
Best wishes on your project, Aniket!
Anthony
Another thread suggests using the following library:
https://github.com/ahand/mobileesp/blob/master/Java/UAgentInfo.java
which seems OK.
How to read the Apache Mobile Filter value in JSP (for Tomcat)?
First, in the httpd.conf file where you configure mod_jk, you must add this:
JkEnvVar AMF_IS_MOBILE undefined
The Java code is:
request.getAttribute("AMF_IS_MOBILE")
from: http://wiki.apachemobilefilter.org
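A small usage sketch in a JSP or servlet, where request and response are the usual servlet objects (the "true" value is an assumption; check what your Apache Mobile Filter version actually forwards):

// Assumes mod_jk forwards AMF_IS_MOBILE as configured above.
String amfIsMobile = (String) request.getAttribute("AMF_IS_MOBILE");
if ("true".equalsIgnoreCase(amfIsMobile)) {
    response.sendRedirect("/mobile/"); // hypothetical mobile landing page
}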
51Degrees has a free open source Java API that allows you to run offline processing. You can access it from the GitHub repository here: https://github.com/51Degrees/Java-Device-Detection.
As part of the API there is an offline processing example (code also shown below); it takes a CSV file of User-Agents and writes the required properties to an output file. The following example uses just 3 of the properties within the data set; for a full list you can look at the dictionary here: https://51degrees.com/resources/property-dictionary
// output file in current working directory
public String outputFilePath = "batch-processing-example-results.csv";
// pattern detection matching provider
private final Provider provider;
/**
* Initialises the device detection Provider with the included Lite data
* file. For more data see:
* <a href="https://51degrees.com/compare-data-options">compare data options
* </a>
*
* @throws IOException if there was a problem reading from the data file.
*/
public OfflineProcessingExample() throws IOException {
provider = new Provider(StreamFactory.create(
Shared.getLitePatternV32(), false));
}
/**
* Reads a CSV file containing User-Agents and adds the IsMobile,
* PlatformName and PlatformVersion information for the first 20 lines.
* For a full list of properties and the files they are available in please
* see: <a href="https://51degrees.com/resources/property-dictionary">
* Property Dictionary</a>
*
* @param inputFileName the CSV file to read from.
* @param outputFilename where to save the file with extra entries.
* @throws IOException if there was a problem reading from the data file.
*/
public void processCsv(String inputFileName, String outputFilename)
throws IOException {
BufferedReader bufferedReader =
new BufferedReader(new FileReader(inputFileName));
try {
FileWriter fileWriter = new FileWriter(outputFilename);
try {
// it's more efficient over the long haul to create a match
// once and reuse it in multiple matches
Match match = provider.createMatch();
// there are 20k lines in the supplied file, we'll just do a couple
// of them!
for (int i = 0; i < 20; i++) {
// read next line
String userAgentString = bufferedReader.readLine();
// ask the provider to match the UA using match we created
provider.match(userAgentString, match);
// get some property values from the match
Values isMobile = match.getValues("IsMobile");
Values platformName = match.getValues("PlatformName");
Values platformVersion = match.getValues("PlatformVersion");
// write result to file
fileWriter.append("\"")
.append(userAgentString)
.append("\", ")
.append(getValueForDisplay(isMobile))
.append(", ")
.append(getValueForDisplay(platformName))
.append(", ")
.append(getValueForDisplay(platformVersion))
.append('\n')
.flush();
}
} finally {
fileWriter.close();
}
} finally {
bufferedReader.close();
}
}
/**
* Match values may be null. A helper method to get something displayable
* @param values a Values to render
* @return a non-null String
*/
protected String getValueForDisplay(Values values) {
return values == null ? "N/A": values.toString();
}
/**
* Closes the {@link fiftyone.mobile.detection.Dataset} by releasing data
* file readers and freeing the data file from locks. This method should
* only be used when the {@code Dataset} is no longer required, i.e. when
* device detection functionality is no longer required, or the data file
* needs to be freed.
*
* @throws IOException if there was a problem accessing the data file.
*/
@Override
public void close() throws IOException {
provider.dataSet.close();
}
/**
* Instantiates this class and starts
* {@link #processCsv(java.lang.String, java.lang.String)} with default
* parameters.
*
* @param args command line arguments.
* @throws IOException if there was a problem accessing the data file.
*/
public static void main(String[] args) throws IOException {
System.out.println("Starting Offline Processing Example");
OfflineProcessingExample offlineProcessingExample =
new OfflineProcessingExample();
try {
offlineProcessingExample.processCsv(Shared.getGoodUserAgentsFile(),
offlineProcessingExample.outputFilePath);
System.out.println("Output written to " +
offlineProcessingExample.outputFilePath);
} finally {
offlineProcessingExample.close();
}
}
Hope this helps.
Disclosure: I work at 51Degrees.
To detect iPhone, Android, and other mobile devices in Java, the User-Agent request header can be used. If you are using Spring you can customize the code below as per your need.
@Override
public ModelAndView redirectToAppstore(HttpServletRequest request) {
String userAgent = request.getHeader("user-agent").toLowerCase();
String iphoneStoreUrl = "IPHONE_STORE_URL";
String androidStoreUrl = "ANDROID_STORE_URL";
if (userAgent.contains("iphone"))
return new ModelAndView("redirect:" + iphoneStoreUrl);
else if (userAgent.contains("android"))
return new ModelAndView("redirect:" + androidStoreUrl);
return new ModelAndView("redirect:/");
}
I'm trying to download www.pandora.com/profile/stations/olin_d_kirkland HTML with Java to match what I get when I select 'view page source' from the context menu of the webpage in Chrome.
Now, I know how to download webpage HTML source code with Java. I have done it with downloads.nl and tested it on other sites. However, Pandora is being a mystery. My ultimate goal is to parse the 'Stations' from a Pandora account.
Specifically, I would like to grab the Station names from a site such as www.pandora.com/profile/stations/olin_d_kirkland
I have attempted using the Selenium library and the built-in URL getter in Java, but I only get ~4700 lines of code when I should be getting 5300. Not to mention that there is no personalized data in the code, which is what I'm looking for.
I figured the problem was that I wasn't grabbing the JavaScript or letting the JavaScript execute first, but even though I waited for it to load in my code, I always got the same result.
If at all possible, I should have a method called 'grabPageSource()' that returns a String. It should return the source code when called upon.
public class PandoraStationFinder {
public static void main(String[] args) throws IOException, InterruptedException {
String s = grabPageSource();
String[] lines = s.split("\r?\n");
String t;
ArrayList<Station> stations = new ArrayList<>();
for (int i = 0; i < lines.length; i++) {
t = lines[i].trim();
Pattern p = Pattern.compile("[\\w\\s]+");
Matcher m = p.matcher(t);
if (m.matches()) {
Station someStation = new Station(t);
stations.add(someStation);
// System.out.println("I found a match on line " + i + ".");
// System.out.println(t);
}
}
}
public static String grabPageSource() throws IOException {
String fullTxt = "";
// Get HTML from www.pandora.com/profile/stations/olin_d_kirkland
return fullTxt;
}
}
It is irrelevant how it's done, but I'd like, in the final product, to grab a comprehensive list of ALL songs that have been liked by a user on Pandora.
The Pandora pages are heavily constructed using ajax, so many scrapers struggle. In the case you've shown above, looking at the list of stations, the page actually puts through a secondary request to:
http://www.pandora.com/content/stations?startIndex=0&webname=olin_d_kirkland
If you run your request, but point it to that URL rather than the main site, I think you will have a lot more luck with your scraping.
Similarly, to access the "likes", you want this URL:
http://www.pandora.com/content/tracklikes?likeStartIndex=0&thumbStartIndex=0&webname=olin_d_kirkland
This will pull back the liked tracks in groups of 5, but you can page through the results by increasing the 'thumbStartIndex' parameter.
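A rough Java sketch of paging through that endpoint follows. The stop condition is an assumption on my part; in practice you would stop once a page comes back without any track markup.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

String base = "http://www.pandora.com/content/tracklikes"
        + "?likeStartIndex=0&thumbStartIndex=%d&webname=olin_d_kirkland";
for (int start = 0; ; start += 5) { // results come back five at a time
    URL url = new URL(String.format(base, start));
    StringBuilder page = new StringBuilder();
    try (BufferedReader in = new BufferedReader(
            new InputStreamReader(url.openStream(), "UTF-8"))) {
        String line;
        while ((line = in.readLine()) != null) {
            page.append(line).append('\n');
        }
    }
    if (page.toString().trim().isEmpty()) break; // assumed stop condition
    // ... parse the fragment for track names here ...
}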
Not an answer exactly, but hopefully this will get you moving in the correct direction:
Whenever I get into this sort of thing, I always fall back on an HTTP monitoring tool. I use Firefox, and I really like the Live HTTP Headers extension. Check out the headers that are going back and forth, then tailor your HTTP requests accordingly. As an absolute lowest-level test, grab the header from a successful request, then send it to port 80 using telnet and see what comes back.