How to create a local variable with ASM? - java

I'm trying to patch a class with ASM. I need to add some logic in a function. This logic needs a new local variable. Here is what I've done:
class CreateHashTableMethodAdapter extends MethodAdapter {
@Override
public void visitMethodInsn(int opcode, String owner,String name, String desc){
System.out.println(opcode + "/" + owner + "/" + name + "/" + desc);
if(opcode == Opcodes.INVOKESPECIAL &&
"javax/naming/InitialContext".equals(owner) &&
"<init>".equals(name) &&
"()V".equals(desc)){
System.out.println("In mod");
// 83: new #436; //class javax/naming/InitialContext
// 86: dup
mv.visitMethodInsn(Opcodes.INVOKESPECIAL, "javax/naming/InitialContext", "<init>", "()V");
mv.visitVarInsn(Opcodes.ASTORE, 1);
Label start_patch = new Label();
Label end_patch = new Label();
mv.visitLabel(start_patch);
mv.visitTypeInsn(Opcodes.NEW,"java/util/Hashtable");
mv.visitInsn(Opcodes.DUP);
mv.visitMethodInsn(Opcodes.INVOKESPECIAL, "java/util/Hashtable", "<init>", "()V");
mv.visitVarInsn(Opcodes.ASTORE,9);
// ........ sNip ..........
mv.visitLabel(end_patch);
mv.visitLocalVariable("env","Ljava/util/Hashtable;",null,start_patch,end_patch,9);
// 127: astore_1
}
else {
mv.visitMethodInsn(opcode, owner, name, desc);
}
}
}
When I run this method adapter against CheckClassAdapter it states:
org.objectweb.asm.tree.analysis.AnalyzerException: Error at instruction 51: Trying to access an inexistant local variable 9
.... sNiP ....
00050 R R . . . : R R : INVOKESPECIAL java/util/Hashtable.<init> ()V
00051 R R . . . : R : ASTORE 9
I think I am misusing visitLocalVariable, but I cannot work out where I'm supposed to call it.
When I run javap on the generated bytecode (without the check), I get the following local variable table:
LocalVariableTable:
Start Length Slot Name Signature
91 40 9 env Ljava/util/Hashtable;
0 343 0 this Lpmu/jms/ServerJMS;
132 146 1 initialContext Ljavax/naming/InitialContext;
153 125 2 topicConnectionFactory Ljavax/jms/TopicConnectionFactory;
223 55 3 topic Ljavax/jms/Topic;
249 29 4 topicSubscriber Ljavax/jms/TopicSubscriber;
279 55 1 ex Ljava/lang/Exception;
281 53 2 codeMessage I
289 45 3 params Lpmu/data/Parameters;
325 9 4 messageError Ljava/lang/String;
As you can see, my variable is there, but it is listed first.
Any ideas?

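One convenient way to create new local variables is to extend LocalVariablesSorter instead of MethodAdapter. Then you can allocate local variables as needed using newLocal() without interfering with existing variables. Note that visitLocalVariable only records debug information in the LocalVariableTable; it does not reserve a slot in the method's frame, which is why the checker rejects slot 9. See section 3.3.3 of the guide "ASM 4.0: A Java bytecode engineering library" on the ASM homepage for details.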

Related

JavaObject from Netlogo has no length using py4j?

I am running nl4py (a Python module for NetLogo) in a Jupyter notebook. I am trying to import a list from NetLogo into Python, but it arrives as a Java object. When I try to convert the JavaObject to a Python object using py4j, I get the error: JavaObject has no len(). Is there a better way to convert a JavaObject in Python? Thanks.
python 3.8, ipython 7.10.0, nl4py 0.5.0, jdk 15.0.2, Netlogo 6.0, MacOS Catalina 10.15.7
#start of code for nl4py
import nl4py
nl4py.startServer("/Applications/NetLogo 6.0/")
n = nl4py.NetLogoApp()
n.openModel('/Users/tracykuper/Desktop/Netlogo models/Mucin project/1_21_20/PA_metabolite_model_1_21.nlogo')
n.command("setup")
#run abm model for n number of times
#change patch variable under a specific turtle
for i in range(1):
    n.command("repeat 10 [go]")
#A = np.array([1,2,3,4],[3,2,-1,-6])) #turtle number, metabolite diff.
#run simulation of metabolic network to get biomass and metabolite values
#change patch variable under a specific turtle
names = ["1", "2", "3"] #turtle names
patch_values = ["-0.5", "50", "-0.5"] #metabolite values
for i in range(len(names)):
    x = ('ask turtle {} [ask patch-here [set succinate succinate + {}]]'.format(names[i],patch_values[i]))
    n.command(x)
    #set new bacteria mass values
    values = ["5", "30", "5"] #biomass values
    y = ('ask turtle {} [set m m + {}]'.format(names[i],values[i]))
    n.command(y)
    n.command("ask turtle {} [set color red]".format(names[i]))
import py4j
mass = n.report("mass-list")
print(mass)
self = n.report("self-list")
type(mass)
s = py4j.protocol.get_return_value(mass, object)
[[0.69], [0.8], [0.73], [0.71], [0.5], [0.51], [0.54], [0.82], [0.72], [0.88]]
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-31-0b04d0127b47> in <module>
11 #map(mass + mass,mass)
12
---> 13 s = py4j.protocol.get_return_value(mass, object)
~/opt/anaconda3/envs/netlogo4/lib/python3.6/site-packages/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
319 (e.g., *hello* in `object1.hello()`). Optional.
320 """
--> 321 if is_error(answer)[0]:
322 if len(answer) > 1:
323 type = answer[1]
~/opt/anaconda3/envs/netlogo4/lib/python3.6/site-packages/py4j/protocol.py in is_error(answer)
372
373 def is_error(answer):
--> 374 if len(answer) == 0 or answer[0] != SUCCESS:
375 return (True, None)
376 else:
TypeError: object of type 'JavaObject' has no len()

How to encode optional fields in spark dataset with java?

I would like to avoid null values for the fields of a class used in a dataset. I tried using Scala's Option and Java's Optional, but both failed:
@AllArgsConstructor // lombok
@NoArgsConstructor // mutable type is required in java :(
@Data // see https://stackoverflow.com/q/59609933/1206998
public static class TestClass {
String id;
Option<Integer> optionalInt;
}
@Test
public void testDatasetWithOptionField(){
Dataset<TestClass> ds = spark.createDataset(Arrays.asList(
new TestClass("item 1", Option.apply(1)),
new TestClass("item .", Option.empty())
), Encoders.bean(TestClass.class));
ds.collectAsList().forEach(x -> System.out.println("Found " + x));
}
It fails at runtime with the message: File 'generated.java', Line 77, Column 47: Cannot instantiate abstract "scala.Option"
Question: Is there a way to encode optional fields without null in a dataset, using Java?
Subsidiary question: I haven't used datasets much in Scala either. Can you confirm that it is actually possible in Scala to encode a case class containing Option fields?
Note: This is used in an intermediate dataset, i.e. something that is neither read nor written (except by Spark's internal serialization).
This is fairly simple to do in Scala.
Scala Implementation
import org.apache.spark.sql.{Encoders, SparkSession}
object Test {
def main(args: Array[String]): Unit = {
val spark = SparkSession.builder
.appName("Stack-scala")
.master("local[2]")
.getOrCreate()
val ds = spark.createDataset(Seq(
TestClass("Item 1", Some(1)),
TestClass("Item 2", None)
))( Encoders.product[TestClass])
ds.collectAsList().forEach(println)
spark.stop()
}
case class TestClass(
id: String,
optionalInt: Option[Int] )
}
Java
There are various Option classes available in Java. However, none of them work out-of-the-box.
java.util.Optional : Not serializable
scala.Option -> Serializable but abstract, so when CodeGenerator generates the following code, it fails!
/* 081 */ // initializejavabean(newInstance(class scala.Option))
/* 082 */ final scala.Option value_9 = false ?
/* 083 */ null : new scala.Option(); // ---> Such initialization is not possible for abstract classes
/* 084 */ scala.Option javaBean_1 = value_9;
org.apache.spark.api.java.Optional -> Spark's implementation of Optional which is serializable but has private constructors. So, it fails with error : No applicable constructor/method found for zero actual parameters. Since this is a final class, it's not possible to extend this.
/* 081 */ // initializejavabean(newInstance(class org.apache.spark.api.java.Optional))
/* 082 */ final org.apache.spark.api.java.Optional value_9 = false ?
/* 083 */ null : new org.apache.spark.api.java.Optional();
/* 084 */ org.apache.spark.api.java.Optional javaBean_1 = value_9;
/* 085 */ if (!false) {
One option is to use normal Java Optionals in the data class and then use Kryo as serializer.
Encoder en = Encoders.kryo(TestClass.class);
Dataset<TestClass> ds = spark.createDataset(Arrays.asList(
new TestClass("item 1", Optional.of(1)),
new TestClass("item .", Optional.empty())
), en);
ds.collectAsList().forEach(x -> System.out.println("Found " + x));
Output:
Found TestClass(id=item 1, optionalInt=Optional[1])
Found TestClass(id=item ., optionalInt=Optional.empty)
There is a downside when using Kryo: this encoder encodes in a binary format:
ds.printSchema();
ds.show(false);
prints
root
|-- value: binary (nullable = true)
+-------------------------------------------------------------------------------------------------------+
|value |
+-------------------------------------------------------------------------------------------------------+
|[01 00 4A 61 76 61 53 74 61 72 74 65 72 24 54 65 73 74 43 6C 61 73 F3 01 01 69 74 65 6D 20 B1 01 02 02]|
|[01 00 4A 61 76 61 53 74 61 72 74 65 72 24 54 65 73 74 43 6C 61 73 F3 01 01 69 74 65 6D 20 AE 01 00] |
+-------------------------------------------------------------------------------------------------------+
This answer describes a UDF-based solution for getting the normal output columns from a dataset encoded with Kryo.
This may be a bit off-topic, but a starting point for a long-term solution is the code of JavaTypeInference. The methods serializerFor and deserializerFor are used by ExpressionEncoder.javaBean to create the serializer and deserializer parts of the encoder for Java beans.
In this pattern matching block
typeToken.getRawType match {
case c if c == classOf[String] => createSerializerForString(inputObject)
case c if c == classOf[java.time.Instant] => createSerializerForJavaInstant(inputObject)
case c if c == classOf[java.sql.Timestamp] => createSerializerForSqlTimestamp(inputObject)
case c if c == classOf[java.time.LocalDate] => createSerializerForJavaLocalDate(inputObject)
case c if c == classOf[java.sql.Date] => createSerializerForSqlDate(inputObject)
[...]
handling for java.util.Optional is missing. It could probably be added here as well as in the corresponding deserializer method. This would allow Java beans to have properties of type Optional.

Heap Corruption In C When Using DLL With JNA

I am using native C API callbacks from DLL files. The first time the callback is called, everything works fine, but on the second call I get a heap corruption error and the JVM crashes.
In the native code, the memory allocated in the first call is released and then reused in the second call, and the JVM crashes during memory allocation in that second call. If the second call uses a new memory pointer instead of the one used in the previous call, I do not get the heap corruption error.
Because this callback is called many times, I cannot keep allocating new space on every call. In the logs below, the error is reported as INVALID_POINTER_READ.
I do not understand what causes this or how it can be fixed. When the same DLL is used with JNA it's working fine.
Java/JNA Code:
Setting Hook:
final PropertyCallBack callback = new PropertyCallBack();
final int setHookStatus = callback.setHook();
private static CALLBACK callback;
public int setHook() {
if (callback != null) {
return 0;
}
synchronized (this) {
if (callback == null) {
callback = new CALLBACK();
return callback.setHook();
}
}
return 0;
}
Callback Method Called From Native:
@Override
public int PropertyHook(final DESTINATION dest, final BACSTAC_READ_INFO.ByReference info) {
final PROPERTY_CONTENTS.ByReference content = new PROPERTY_CONTENTS.ByReference();
final BUFFER.ByReference buffer = new BUFFER.ByReference();
// Memory assign
final int bufferSize = 1048;
buffer.pBuffer = new Memory(bufferSize);
buffer.nBufferSize = bufferSize;
content.tag = "INVALID";
content.buffer = buffer;
content.nElements = 0;
Pointer dev = NativeLibrary.INSTANCE.Call_1();
Pointer obj = null;
if (dev != null) {
obj = NativeLibrary.INSTANCE.call_2(dev, info.objectID);
}
final int readDbStatus = NativeLibrary.INSTANCE.call_3(obj, info.prop, info.index, content, null);
final int responseStatus = NativeLibrary.INSTANCE.call_4(dest, info, content);
return 0;
}
When I analyzed the heap dump with WinDbg, I got the following details:
This dump file has an exception of interest stored in it.
The stored exception information can be accessed via .ecxr.
(6201c.5ef10): Access violation - code c0000005 (first/second chance not available)
For analysis of this file, run !analyze -v
ntdll!NtWaitForMultipleObjects+0x14:
00007ffa`46deb4f4 c3 ret
0:026> !analyze -v
*******************************************************************************
* *
* Exception Analysis *
* *
*******************************************************************************
*** WARNING: Unable to verify checksum for srv.dll
DEBUG_FLR_EXCEPTION_CODE(c0000374) and the ".exr -1" ExceptionCode(c0000005) don't match
KEY_VALUES_STRING: 1
Key : AV.Fault
Value: Read
Key : Timeline.Process.Start.DeltaSec
Value: 46
PROCESSES_ANALYSIS: 1
SERVICE_ANALYSIS: 1
STACKHASH_ANALYSIS: 1
TIMELINE_ANALYSIS: 1
Timeline: !analyze.Start
Name: <blank>
Time: 2019-12-02T11:13:41.439Z
Diff: 3429439 mSec
Timeline: Dump.Current
Name: <blank>
Time: 2019-12-02T10:16:32.0Z
Diff: 0 mSec
Timeline: Process.Start
Name: <blank>
Time: 2019-12-02T10:15:46.0Z
Diff: 46000 mSec
DUMP_CLASS: 2
DUMP_QUALIFIER: 400
CONTEXT: (.ecxr)
rax=0000000000030000 rbx=000000002b200000 rcx=0000000000000303
rdx=0000000000000003 rsi=01fda8c00000ed00 rdi=000000002b223ef0
rip=00007ffa46d6cb7a rsp=000000002b8ff500 rbp=0000000000000008
r8=0000000000000028 r9=0000000000000030 r10=00000000014da2d0
r11=00000000014e2ef0 r12=0000000000000001 r13=0000000000000003
r14=000000002b223ee0 r15=000000000600c1ba
iopl=0 nv up ei pl zr na po nc
cs=0033 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00010246
ntdll!RtlpAllocateHeap+0xdaa:
00007ffa`46d6cb7a 498b07 mov rax,qword ptr [r15] ds:00000000`0600c1ba=????????????????
Resetting default scope
FAULTING_IP:
ntdll!RtlpAllocateHeap+daa
00007ffa`46d6cb7a 498b07 mov rax,qword ptr [r15]
EXCEPTION_RECORD: (.exr -1)
ExceptionAddress: 00007ffa46d6cb7a (ntdll!RtlpAllocateHeap+0x0000000000000daa)
ExceptionCode: c0000005 (Access violation)
ExceptionFlags: 00000000
NumberParameters: 2
Parameter[0]: 0000000000000000
Parameter[1]: 000000000600c1ba
Attempt to read from address 000000000600c1ba
DEFAULT_BUCKET_ID: HEAP_CORRUPTION
PROCESS_NAME: javaw.exe
FOLLOWUP_IP:
ntdll!RtlpAllocateHeap+daa
00007ffa`46d6cb7a 498b07 mov rax,qword ptr [r15]
READ_ADDRESS: 000000000600c1ba
ERROR_CODE: (NTSTATUS) 0xc0000005 - The instruction at 0x%p referenced memory at 0x%p. The memory could not be %s.
EXCEPTION_CODE: (NTSTATUS) 0xc0000374 - A heap has been corrupted.
EXCEPTION_CODE_STR: c0000005
EXCEPTION_PARAMETER1: 0000000000000000
EXCEPTION_PARAMETER2: 000000000600c1ba
WATSON_BKT_PROCSTAMP: 5d1dea24
WATSON_BKT_PROCVER: 8.0.2210.11
PROCESS_VER_PRODUCT: Java(TM) Platform SE 8
WATSON_BKT_MODULE: ntdll.dll
WATSON_BKT_MODSTAMP: 7f828745
WATSON_BKT_MODOFFSET: 1cb7a
WATSON_BKT_MODVER: 10.0.17134.799
MODULE_VER_PRODUCT: Microsoft® Windows® Operating System
BUILD_VERSION_STRING: 17134.1.amd64fre.rs4_release.180410-1804
MODLIST_WITH_TSCHKSUM_HASH: f06ad8a6a7f7267c783c08e3a62df4696020d52f
MODLIST_SHA1_HASH: cdafa8057ac19b1a3608c439ebbfa992407212d6
NTGLOBALFLAG: 0
PROCESS_BAM_CURRENT_THROTTLED: 0
PROCESS_BAM_PREVIOUS_THROTTLED: 0
APPLICATION_VERIFIER_FLAGS: 0
DUMP_FLAGS: 94
DUMP_TYPE: 1
ANALYSIS_SESSION_HOST: MD2E86EC
ANALYSIS_SESSION_TIME: 12-02-2019 16:43:41.0439
ANALYSIS_VERSION: 10.0.18362.1 x86fre
THREAD_ATTRIBUTES:
ADDITIONAL_DEBUG_TEXT: Enable Pageheap/AutoVerifer ; Followup set based on attribute [Is_ChosenCrashFollowupThread] from Frame:[0] on thread:[PSEUDO_THREAD]
FAULTING_THREAD: 0005ef10
THREAD_SHA1_HASH_MOD_FUNC: 5d531e271dfb1ef7af4984c7ee0dd671c07337f5
THREAD_SHA1_HASH_MOD_FUNC_OFFSET: d858fa5fb04738fbbbbb9e4df89e26d53dc74794
OS_LOCALE: ENU
BUGCHECK_STR: APPLICATION_FAULT_INVALID_POINTER_READ_HEAP_CORRUPTION
PRIMARY_PROBLEM_CLASS: APPLICATION_FAULT
PROBLEM_CLASSES:
ID: [0n262]
Type: [HEAP_CORRUPTION]
Class: Primary
Scope: DEFAULT_BUCKET_ID (Failure Bucket ID prefix)
BUCKET_ID
Name: Add
Data: Omit
PID: [0x6201c]
TID: [0x5ef10]
Frame: [0] : ntdll!RtlpAllocateHeap
ID: [0n262]
Type: [HEAP_CORRUPTION]
Class: Primary
Scope: BUCKET_ID
Name: Add
Data: Omit
PID: [0x6201c]
TID: [0x5ef10]
Frame: [0] : ntdll!RtlpAllocateHeap
ID: [0n313]
Type: [#ACCESS_VIOLATION]
Class: Addendum
Scope: BUCKET_ID
Name: Omit
Data: Omit
PID: [Unspecified]
TID: [0x5ef10]
Frame: [0] : ntdll!RtlpAllocateHeap
ID: [0n285]
Type: [INVALID_POINTER_READ]
Class: Primary
Scope: BUCKET_ID
Name: Add
Data: Omit
PID: [Unspecified]
TID: [0x5ef10]
Frame: [0] : ntdll!RtlpAllocateHeap
LAST_CONTROL_TRANSFER: from 00007ffa46d69725 to 00007ffa46d6cb7a
STACK_TEXT:
00000000`00000000 00000000`00000000 heap_corruption!javaw.exe+0x0
THREAD_SHA1_HASH_MOD: ca4e26064d24ef7512d2e94de5a93c38dbe82fe9
SYMBOL_STACK_INDEX: 0
SYMBOL_NAME: heap_corruption!javaw.exe
FOLLOWUP_NAME: MachineOwner
MODULE_NAME: heap_corruption
IMAGE_NAME: heap_corruption
DEBUG_FLR_IMAGE_TIMESTAMP: 0
STACK_COMMAND: ** Pseudo Context ** ManagedPseudo ** Value: a3807e8 ** ; kb
FAILURE_BUCKET_ID: HEAP_CORRUPTION_c0000005_heap_corruption!javaw.exe
BUCKET_ID: APPLICATION_FAULT_INVALID_POINTER_READ_HEAP_CORRUPTION_heap_corruption!javaw.exe
FAILURE_EXCEPTION_CODE: c0000005
FAILURE_IMAGE_NAME: heap_corruption
BUCKET_ID_IMAGE_STR: heap_corruption
FAILURE_MODULE_NAME: heap_corruption
BUCKET_ID_MODULE_STR: heap_corruption
FAILURE_FUNCTION_NAME: javaw.exe
BUCKET_ID_FUNCTION_STR: javaw.exe
BUCKET_ID_OFFSET: 0
BUCKET_ID_MODTIMEDATESTAMP: 0
BUCKET_ID_MODCHECKSUM: 0
BUCKET_ID_MODVER_STR: 0.0.0.0
BUCKET_ID_PREFIX_STR: APPLICATION_FAULT_INVALID_POINTER_READ_
FAILURE_PROBLEM_CLASS: APPLICATION_FAULT
FAILURE_SYMBOL_NAME: heap_corruption!javaw.exe
WATSON_STAGEONE_URL: http://watson.microsoft.com/StageOne/javaw.exe/8.0.2210.11/5d1dea24/ntdll.dll/10.0.17134.799/7f828745/c0000005/0001cb7a.htm?Retriage=1
TARGET_TIME: 2019-12-02T10:16:32.000Z
OSBUILD: 17134
OSSERVICEPACK: 753
SERVICEPACK_NUMBER: 0
OS_REVISION: 0
SUITE_MASK: 256
PRODUCT_TYPE: 1
OSPLATFORM_TYPE: x64
OSNAME: Windows 10
OSEDITION: Windows 10 WinNt SingleUserTS
USER_LCID: 0
OSBUILD_TIMESTAMP: unknown_date
BUILDDATESTAMP_STR: 180410-1804
BUILDLAB_STR: rs4_release
BUILDOSVER_STR: 10.0.17134.1.amd64fre.rs4_release.180410-1804
ANALYSIS_SESSION_ELAPSED_TIME: 307a
ANALYSIS_SOURCE: UM
FAILURE_ID_HASH_STRING: um:heap_corruption_c0000005_heap_corruption!javaw.exe
FAILURE_ID_HASH: {ddc2b378-b1e1-2aec-adc8-f11b7a5773a9}
Any help in fixing or debugging this will be highly appreciated.
I was able to prevent the heap corruption above by calling the NativeLibrary methods used in PropertyHook from another thread. For some reason, when those methods are called from a different thread the heap is not corrupted, and consequently the JVM no longer crashes.
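The workaround of moving the native calls off the callback thread could be sketched as below. This is only an illustration under assumptions: the class name NativeCallDispatcher, the single worker thread, and the choice to block until the result is ready are all hypothetical and do not come from the original code. The lambda passed to call() stands in for the NativeLibrary.INSTANCE calls made inside PropertyHook.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical helper: runs work on one dedicated thread instead of the
// thread on which the native callback arrived.
class NativeCallDispatcher {
    private final ExecutorService worker = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "native-call-worker");
        t.setDaemon(true); // do not keep the JVM alive just for this worker
        return t;
    });

    // Submits the work to the worker thread and waits for its result.
    <T> T call(Callable<T> nativeWork) {
        try {
            return worker.submit(nativeWork).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for native call", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Native call failed", e.getCause());
        }
    }

    void shutdown() {
        worker.shutdown();
    }

    public static void main(String[] args) {
        NativeCallDispatcher dispatcher = new NativeCallDispatcher();
        // Placeholder for the real native calls (assumption, not the original API).
        String ranOn = dispatcher.call(() -> Thread.currentThread().getName());
        System.out.println("work ran on: " + ranOn);
        dispatcher.shutdown();
    }
}
```

Inside PropertyHook, each native call would be wrapped in dispatcher.call(...). This sketch does not explain why the corruption disappears; it only shows one way to route the calls through a different thread.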

Is it possible to create a list in java using data from multiple text files

I have multiple text files that contain information about the popularity of different programming languages in different countries, based on Google searches. I have one text file for each year from 2004 to 2015. I also have a text file that breaks this down by week (called iot.txt), but this file does not include the country.
Example data from 2004.txt:
Region java c++ c# python JavaScript
Argentina 13 14 10 0 17
Australia 22 20 22 64 26
Austria 23 21 19 31 21
Belgium 20 14 17 34 25
Bolivia 25 0 0 0 0
etc
example from iot.txt:
Week java c++ c# python JavaScript
2004-01-04 - 2004-01-10 88 23 12 8 34
2004-01-11 - 2004-01-17 88 25 12 8 36
2004-01-18 - 2004-01-24 91 24 12 8 36
2004-01-25 - 2004-01-31 88 26 11 7 36
2004-02-01 - 2004-02-07 93 26 12 7 37
My problem is that I am trying to write code that will output the number of countries that have exhibited 0 interest in python.
This is my current code for reading the text files. I'm not sure of the best way to find the number of regions that have 0 interest in python across all the years 2004-2015. At first I thought the best way would be to create a list from all the text files except iot.txt and then search it for any entries that have 0 interest in python, but I have no idea how to do that.
Can anyone suggest a way to do this?
import java.io.BufferedReader;
import java.io.FileReader;
import java.util.*;
public class Starter{
public static void main(String[] args) throws Exception {
BufferedReader fh =
new BufferedReader(new FileReader("iot.txt"));
//First line contains the language names
String s = fh.readLine();
List<String> langs =
new ArrayList<>(Arrays.asList(s.split("\t")));
langs.remove(0); //Throw away the first word - "week"
Map<String,HashMap<String,Integer>> iot = new TreeMap<>();
while ((s=fh.readLine())!=null)
{
String [] wrds = s.split("\t");
HashMap<String,Integer> interest = new HashMap<>();
for(int i=0;i<langs.size();i++)
interest.put(langs.get(i), Integer.parseInt(wrds[i+1]));
iot.put(wrds[0], interest);
}
fh.close();
HashMap<Integer,HashMap<String,HashMap<String,Integer>>>
regionsByYear = new HashMap<>();
for (int i=2004;i<2016;i++)
{
BufferedReader fh1 =
new BufferedReader(new FileReader(i+".txt"));
String s1 = fh1.readLine(); //Throw away the first line
HashMap<String,HashMap<String,Integer>> year = new HashMap<>();
while ((s1=fh1.readLine())!=null)
{
String [] wrds = s1.split("\t");
HashMap<String,Integer>langMap = new HashMap<>();
for(int j=1;j<wrds.length;j++){
langMap.put(langs.get(j-1), Integer.parseInt(wrds[j]));
}
year.put(wrds[0],langMap);
}
regionsByYear.put(i,year);
fh1.close();
}
}
}
Create a Map<String, Integer> backed by a HashMap. Each time you find a new country while scanning the incoming data, add it to the map with the value 0 (country -> 0). Each time you find python usage for a country, increment its value.
At the end, loop through the entrySet of the map and, for each entry where e.getValue() is zero, output e.getKey().
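A minimal sketch of that approach, assuming tab-separated files with a header row as in the question. The class and method names are made up for illustration. The first year in main uses rows from the question's 2004.txt sample; the second year's numbers are invented so the totals can be shown changing. A TreeMap is used here instead of a HashMap only to get sorted output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class PythonInterest {
    // Adds each region's python score from one year's lines (header first) to the totals.
    static void addYear(List<String> lines, Map<String, Integer> totals) {
        int pythonCol = Arrays.asList(lines.get(0).split("\t")).indexOf("python");
        if (pythonCol < 0) {
            throw new IllegalArgumentException("No python column in header");
        }
        for (String line : lines.subList(1, lines.size())) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] words = line.split("\t");
            // A new region starts at its first value; later years are summed in.
            totals.merge(words[0], Integer.parseInt(words[pythonCol]), Integer::sum);
        }
    }

    // Returns the regions whose python total stayed at zero.
    static List<String> regionsWithZeroInterest(Map<String, Integer> totals) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> e : totals.entrySet()) {
            if (e.getValue() == 0) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String header = "Region\tjava\tc++\tc#\tpython\tJavaScript";
        List<String> year2004 = Arrays.asList(header,
                "Argentina\t13\t14\t10\t0\t17",
                "Australia\t22\t20\t22\t64\t26",
                "Bolivia\t25\t0\t0\t0\t0");
        // Invented data for a second year (not from the question).
        List<String> nextYear = Arrays.asList(header,
                "Argentina\t12\t11\t9\t5\t15",
                "Australia\t20\t19\t21\t70\t25",
                "Bolivia\t20\t0\t0\t0\t0");

        Map<String, Integer> totals = new TreeMap<>();
        addYear(year2004, totals);
        addYear(nextYear, totals);

        List<String> zero = regionsWithZeroInterest(totals);
        System.out.println(zero);        // prints [Bolivia]
        System.out.println(zero.size()); // prints 1
    }
}
```

In the real program, each year's lines could come from Files.readAllLines(Paths.get(year + ".txt")) inside the existing 2004 to 2015 loop.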

Read Excel file into R with XLConnect package from URL

There are lots of good examples out there on how to read Microsoft Excel files into R with the XLConnect package, but I can't find any examples of how to read in an Excel file directly from a URL. The reproducible example below returns a "FileNotFoundException (Java)". But, I know the file exists because I can pull it up directly by pasting the URL into a browser.
fname <- "https://www.misoenergy.org/Library/Repository/Market%20Reports/20140610_sr_nd_is.xls"
sheet <- c("Sheet1")
data <- readWorksheetFromFile(fname, sheet, header=TRUE, startRow=11, startCol=2, endCol=13)
Although the URL is prefixed with "https:", it is a public file that does not require a username or password.
I have tried downloading the file first using download.file(fname, destfile="test.xls") and got a message saying it was downloaded, but when I try to open it in Excel to check whether the download succeeded, I get an Excel popup that says "..found unreadable content in 'test.xls'".
Below are the specifics of my system:
Computer: 64-bit Dell running
Operating System: Windows 7 Professional
R version: R-3.1.0
Any assistance would be greatly appreciated.
You can use RCurl to download the file:
library(RCurl)
library(XLConnect)
appURL <- "https://www.misoenergy.org/Library/Repository/Market%20Reports/20140610_sr_nd_is.xls"
f = CFILE("exfile.xls", mode="wb")
curlPerform(url = appURL, writedata = f@ref, ssl.verifypeer = FALSE)
close(f)
out <- readWorksheetFromFile(file = "exfile.xls", sheet = "Sheet1", header = TRUE
, startRow = 11, startCol = 2, endCol = 15, endRow = 35)
> head(out)
Col1 EEI Col3 IESO MHEB Col6 PJM SOCO SWPP TVA WAUE Col12 Other Total
1 Hour 1 272 NA 768 1671 NA 148 200 -52 198 280 NA 700 4185
2 Hour 2 272 NA 769 1743 NA 598 200 -29 190 267 NA 706 4716
3 Hour 3 272 NA 769 1752 NA 598 200 -28 194 267 NA 710 4734
4 Hour 4 272 NA 769 1740 NA 598 200 -26 189 266 NA 714 4722
5 Hour 5 272 NA 769 1753 NA 554 200 -27 189 270 NA 713 4693
6 Hour 6 602 NA 769 1682 NA 218 200 -32 223 286 NA 714 4662
Two things:
Try using a different package--I know the gdata package's read.xls function has support for URLs
Try loading in a publicly-available xls file to make sure it's not an issue with the particular website.
For instance, you can try:
library("gdata")
site <- "http://www.econ.yale.edu/~shiller/data/chapt26.xls"
data <- read.xls(site, header=FALSE, skip=8)
head(data)
XLConnect does not support importing directly from URLs. You have to use e.g. download.file first to download the file to your local machine:
require(XLConnect)
tmp = tempfile(fileext = ".xls")
download.file(url = "http://www.econ.yale.edu/~shiller/data/chapt26.xls", destfile = tmp)
readWorksheetFromFile(file = tmp, sheet = "Data", header = FALSE, startRow = 9, endRow = 151)
or with your originally proposed URL:
require(XLConnect)
tmp = tempfile(fileext = ".xls")
download.file(url = "https://www.misoenergy.org/Library/Repository/Market%20Reports/20140610_sr_nd_is.xls", destfile = tmp, method = "curl")
readWorksheetFromFile(file = tmp, sheet = "Sheet1", header = TRUE, startRow = 11, startCol = 2, endCol = 13)
library(relenium)
library(XML)
library(RCurl)
firefox=firefoxClass$new()
url="https://www.misoenergy.org/Library/Repository/Market%20Reports/20140610_sr_nd_is.xls"
url=sprintf(url)
firefox$get(url)
This will open a Firefox instance within R and ask you to download the file, which you could then open in the next line of code. I don't know of any R utilities that will open an excel spreadsheet from HTTPS.
You could then set a delay while you're saving the file and then read the sheet from your downloads folder:
Sys.sleep(10)
sheet <- c("Sheet1")
data <- readWorksheetFromFile(path, sheet, header=TRUE, startRow=11, startCol=2, endCol=13)
