I need to create a HashMap of directory -> file in Scala while listing all files in a directory. How can I achieve this in Scala?
val directoryToFile = awsClient.listFiles(uploadPath).collect {
case path if !path.endsWith("/") => {
path match {
// do some regex matching to get directory & file names
case regex(dir, date) => {
// NEED TO CREATE A HASH MAP OF dir -> date. How???
}
case _ => None
}
}
}
The method listFiles(path: String) returns a Seq[String] of the absolute paths of all files under the path passed as an argument.
Try to write more idiomatic Scala. Something like this:
val directoryToFile = (for {
path <- awsClient.listFiles(uploadPath)
if !path.endsWith("/")
regex(dir, date) <- regex.findFirstIn(path)
} yield dir -> date).sortBy(_._2).toMap
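For a concrete run, here is the same for-comprehension with a purely illustrative regex and listing (both are stand-ins, not the actual ones from the question):

val regex = """.*/([^/]+)/(\d{4}-\d{2}-\d{2})\.csv$""".r  // hypothetical pattern

val sample = Seq(
  "uploads/dirA/",                   // skipped: ends with "/"
  "uploads/dirA/2021-01-01.csv",
  "uploads/dirB/2021-02-03.csv"
)

val directoryToFile = (for {
  path <- sample
  if !path.endsWith("/")
  regex(dir, date) <- regex.findFirstIn(path)
} yield dir -> date).toMap
// Map(dirA -> 2021-01-01, dirB -> 2021-02-03)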
You can filter and then foldLeft:
val l = List("""/opt/file1.txt""", """/opt/file2.txt""")
val finalMap = l
.filter(!_.endsWith("/"))
// assuming the regex captures dir and date as Strings, as in the question
.foldLeft(Map.empty[String, String])((map, s) =>
s match {
case regex(dir, date) => map + (dir -> date)
case _ => map
}
)
You can try something like this:
val regex = """(\d)-(\d)""".r
val paths = List("1-2", "3-4", "555")
for {
// Hint to Scala to produce specific type
_ <- Map("" -> "")
// Not sure why your !path.endsWith("/") is not part of regex
path @ regex(a, b) <- paths
if path.startsWith("1")
} yield (a, b)
//> scala.collection.immutable.Map[String,String] = Map(1 -> 2)
Slightly more complicated if you need max:
val regex = """(\d)-(\d)""".r
val paths = List("1-2", "3-4", "555", "1-3")
for {
(_, ps) <-
( for {
path @ regex(a, b) <- paths
if path.startsWith("1")
} yield (a, b)
).groupBy(_._1)
} yield ps.maxBy(_._2)
//> scala.collection.immutable.Map[String,String] = Map(1 -> 3)
I have an object (an instance of a Sentence data class) and I am trying to access its "english" field:
val englishSentence = dbField::class.declaredMemberProperties.filter { it.name == "english" }[0]
But when I do
model.addAttribute("sentence", englishSentence)
I get val com.cyrillihotin.grammartrainer.entity.Sentence.english: kotlin.String (the property reference itself),
while I expect the value, bla.
You can use the call function on a KProperty to get its value from the object.
import kotlin.reflect.full.declaredMemberProperties

val dbField = Sentence(1, "bla-eng", "bla-rus")
val value = dbField::class.declaredMemberProperties.find { it.name == "english" }!!.call(dbField)
println(value)
Output: bla-eng
Remember that the static type of value here is Any?. You need to cast it manually to the desired type.
If you want to list all the properties with their values, you can do this:
dbField::class.declaredMemberProperties.forEach {
println("${it.name} -> ${it.call(dbField)}")
}
Output:
english -> bla-eng
id -> 1
russian -> bla-rus
Do you mean this?
data class Sentence(val id:Int, val english:String, val russian:String)
val dbField = Sentence(1, "blaEng", "blaRus")
val englishProp = dbField::class.declaredMemberProperties.first { it.name == "english" } as KProperty1<Sentence, String>
println(englishProp.get(dbField))
It prints blaEng
I have a ConfigEntry case class defined as
case class ConfigEntry(
key: String,
value: String
)
and a list:
val list: List[ConfigEntry] = List(
ConfigEntry("general.first", "general first value"),
ConfigEntry("general.second", "general second value"),
ConfigEntry("custom.first", "custom first value"),
ConfigEntry("custom.second", "custom second value")
)
Given a list of ConfigEntry, I want a map from each key prefix to a map of the entries under that prefix.
As an example, if I have
def getConfig: Map[String, Map[String, String]] = {
def getKey(key: String, index: Int): String = key.split("\\.")(index)
list.map { config =>
getKey(config.key, 0) -> Map(getKey(config.key, 1) -> config.value)
}.toMap
}
I get this result:
res0: Map[String,Map[String,String]] =
Map(
"general" ->
Map("second" -> "general second value"),
"custom" ->
Map("second" -> "custom second value")
)
and it should be
res0: Map[String,Map[String,String]] =
Map(
"general" ->
Map(
"first" -> "general first value",
"second" -> "general second value"
),
"custom" ->
Map(
"first" -> "custom first value",
"second" -> "custom second value"
)
)
The first record for each prefix is missing. It's probably lost in the .toMap call, since toMap keeps only the last value for each duplicate key.
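A quick illustrative check shows the same behaviour:

List("a" -> 1, "a" -> 2).toMap
// Map(a -> 2)  -- only the last pair per key survives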
How can I do this?
Thank you for any help given
You can do something like this:
final case class ConfigEntry(
key: String,
value: String
)
type Config = Map[String, Map[String, String]]
def getConfig(data: List[ConfigEntry]): Config =
data
.view
.map(e => e.key.split('.').toList -> e.value)
.collect {
case (k1 :: k2 :: Nil, v) => k1 -> (k2 -> v)
}.groupMap(_._1)(_._2)
.view
.mapValues(_.toMap)
.toMap
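For example, applied to the list from the question, this should produce the expected nested map:

getConfig(list)
// Map(general -> Map(first -> general first value, second -> general second value),
//     custom  -> Map(first -> custom first value, second -> custom second value))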
Or something like this:
def getConfig(data: List[ConfigEntry]): Config = {
@annotation.tailrec
def loop(remaining: List[ConfigEntry], acc: Config): Config =
remaining match {
case ConfigEntry(key, value) :: xs =>
val newAcc = key.split('.').toList match {
case k1 :: k2 :: Nil =>
acc.updatedWith(k1) {
case Some(map) =>
val newMap = map.updatedWith(k2) {
case Some(v) =>
println(s"Overwriting previous value ${v} for the key: ${key}")
// Just overwrite the previous value.
Some(value)
case None =>
Some(value)
}
Some(newMap)
case None =>
Some(Map(k2 -> value))
}
case _ =>
println(s"Bad key: ${key}")
// Just skip this key.
acc
}
loop(remaining = xs, newAcc)
case Nil =>
acc
}
loop(remaining = data, acc = Map.empty)
}
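As an illustration, with a duplicated key this version just logs and keeps the later value:

getConfig(List(ConfigEntry("general.first", "v1"), ConfigEntry("general.first", "v2")))
// prints: Overwriting previous value v1 for the key: general.first
// result: Map(general -> Map(first -> v2))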
I leave the handling of errors like duplicated keys or bad keys to the reader.
BTW, since this is a config, have you considered using a Config library?
Your map will only produce a one-to-one result. To do what you want, you need to accumulate into an existing map.
Working with your existing code, if you're especially tied to how you parse your primary and secondary keys via getKey, you can apply foldLeft to your list instead, with an empty map as the initial value.
list.foldLeft(Map.empty[String, Map[String, String]]) { (configs, configEntry) =>
val primaryKey = getKey(configEntry.key, 0)
val secondaryKey = getKey(configEntry.key, 1)
configs.get(primaryKey) match {
case None =>
configs.updated(primaryKey, Map(secondaryKey -> configEntry.value))
case Some(configMap) =>
configs.updated(primaryKey, configMap.updated(secondaryKey, configEntry.value))
}
}
Simply:
list.map { ce =>
val Array(l, r) = ce.key.split("\\.")
l -> (r -> ce.value)
} // List[(String, (String, String))]
.groupBy { case (k, _) => k } // Map[String, List[(String, (String, String))]]
.view.mapValues(_.map { case (_, v) => v }.toMap) // MapView[String, Map[String, String]]
.toMap // Map[String, Map[String, String]]
I am new to accumulators in Spark. I have created an accumulator which gathers the sum and count of all columns in a DataFrame into a Map.
It is not functioning as expected, so I have a few doubts.
When I run this class (pasted below) in local mode, I can see the accumulator getting updated, but the final value is still empty. For debugging purposes, I added a print statement in add().
Q1) Why is the final accumulable not being updated when the accumulator is being added to?
For reference, I studied CollectionAccumulator, where they have made use of a synchronized list from Java Collections.
Q2) Does it need to be a synchronized/concurrent collection for an accumulator to update?
Q3) Which collection would be best suited for this purpose?
I have attached my execution flow along with a Spark UI snapshot for analysis.
Thanks.
EXECUTION:
INPUT DATAFRAME -
+-------+-------+
|Column1|Column2|
+-------+-------+
|1 |2 |
|3 |4 |
+-------+-------+
OUTPUT -
Add - Map(Column1 -> Map(sum -> 1, count -> 1), Column2 -> Map(sum -> 2, count -> 1))
Add - Map(Column1 -> Map(sum -> 4, count -> 2), Column2 -> Map(sum -> 6, count -> 2))
TestRowAccumulator(id: 1, name: Some(Test Accumulator for Sum&Count), value: Map())
SPARK UI SNAPSHOT -
CLASS :
class TestRowAccumulator extends AccumulatorV2[Row,Map[String,Map[String,Int]]]{
private var colMetrics: Map[String, Map[String, Int]] = Map[String , Map[String , Int]]()
override def isZero: Boolean = this.colMetrics.isEmpty
override def copy(): AccumulatorV2[Row, Map[String,Map[String,Int]]] = {
val racc = new TestRowAccumulator
racc.colMetrics = colMetrics
racc
}
override def reset(): Unit = {
colMetrics = Map[String,Map[String,Int]]()
}
override def add(v: Row): Unit = {
v.schema.foreach(field => {
val name: String = field.name
val value: Int = v.getAs[Int](name)
if(!colMetrics.contains(name))
{
colMetrics = colMetrics ++ Map(name -> Map("sum" -> value , "count" -> 1 ))
}else
{
val metric = colMetrics(name)
val sum = metric("sum") + value
val count = metric("count") + 1
colMetrics = colMetrics ++ Map(name -> Map("sum" -> sum , "count" -> count))
}
})
}
override def merge(other: AccumulatorV2[Row, Map[String,Map[String,Int]]]): Unit = {
other match {
case t:TestRowAccumulator => {
colMetrics.map(col => {
val map2: Map[String, Int] = t.colMetrics.getOrElse(col._1 , Map())
val map1: Map[String, Int] = col._2
map1 ++ map2.map{ case (k,v) => k -> (v + map1.getOrElse(k,0)) }
} )
}
case _ => throw new UnsupportedOperationException(s"Cannot merge ${this.getClass.getName} with ${other.getClass.getName}")
}
}
override def value: Map[String, Map[String, Int]] = {
colMetrics
}
}
After a bit of debugging, I found that the merge function is being called.
It had erroneous code (the combined map was computed but never assigned back to colMetrics), so the accumulable value stayed Map().
EXECUTION FLOW OF ACCUMULATOR (LOCAL MODE):
ADD
ADD
MERGE
Once I corrected the merge function, the accumulator worked as expected.
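For reference, a corrected merge could look roughly like this (a sketch; the essential fix is assigning the combined result back to colMetrics):

override def merge(other: AccumulatorV2[Row, Map[String, Map[String, Int]]]): Unit = other match {
  case t: TestRowAccumulator =>
    colMetrics = (colMetrics.keySet ++ t.colMetrics.keySet).map { col =>
      val m1 = colMetrics.getOrElse(col, Map.empty[String, Int])
      val m2 = t.colMetrics.getOrElse(col, Map.empty[String, Int])
      // add up the "sum" and "count" entries coming from both accumulators
      col -> (m1 ++ m2.map { case (k, v) => k -> (v + m1.getOrElse(k, 0)) })
    }.toMap
  case _ =>
    throw new UnsupportedOperationException(
      s"Cannot merge ${this.getClass.getName} with ${other.getClass.getName}")
}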
I am very new to Scala and am trying to learn by rewriting equivalent Java code in Scala to get a better understanding.
How do I convert Java 8 streams with map and filter to Scala?
I have the following Java 8 code which I am trying to convert to Scala:
public Set<String> getValidUsages(String itemId, long sNo, Date timeOfAccess) {
Set<String> itemSet = Sets.newHashSet();
TestWindows testWindows = items.get(itemId).getTestWindows();
final boolean isTV = existsEligibleTestWindow(testWindows.getTV(), timeOfAccess);
if (isTV) {
itemSet.add(TV);
} else {
final boolean isCableUseable = existsEligibleTestWindow(testWindows.getCableUse(), timeOfAccess);
final boolean isWifi = existsEligibleTestWindow(testWindows.getWifi(), timeOfAccess);
if (isCableUseable || isWifi) {
itemSet.add(MOVIE);
}
}
if (testWindows.getUsageIds() != null) {
itemSet.addAll(testWindows.getUsageIds()
.entrySet()
.stream()
.filter(entry -> existsEligibleTestWindow(entry.getValue(), timeOfAccess))
.map(Map.Entry::getKey)
.collect(Collectors.toSet()));
}
return itemSet;
}
private boolean existsEligibleTestWindow(List<TestWindow> windows, Date timeOfAccess) {
if (windows != null) {
return windows.stream()
.filter(w -> withinDateRange(timeOfAccess, w))
.findAny()
.isPresent();
}
return false;
}
private boolean withinDateRange(Date toCheck, TestWindow window) {
return toCheck.after(window.getStartTime()) && toCheck.before(window.getEndTime());
}
I tried :
def withinDateRange(toCheck: Date, window: TestWindow): Boolean = {
toCheck.after( window.getStartTime ) && toCheck.before( window.getEndTime )
}
def getValidUsages(itemId: String, sNo: Long, timeOfAccess: Date): Set[String] = {
var itemSet = Sets.newHashSet()
val testWindows = items.value(itemId).getTestWindows
val isTV = existsEligibleTestWindow(testWindows.get(0).getTV, timeOfAccess)
if (isTV) {
itemSet += TV
} else {
val isCableUseable = existsEligibleTestWindow(testWindows.get(0).getCableUse, timeOfAccess)
val isWifi = existsEligibleTestWindow(testWindows.get(0).getWifi, timeOfAccess)
if (isCableUseable || isWifi) {
itemSet += MOVIE
}
}
if (testWindows.get(0).getUsageIds != null) {
itemSet.addAll(testWindows.get(0).getUsageIds.entrySet().stream()
.filter((x) => existsEligibleTestWindow(x._2, timeOfAccess)).map(x => Map.Entry._1 )
.collect(Collectors.toSet()))
}
itemSet
}
def existsEligibleConsumptionWindow(windows: List[ConsumptionWindow], timeOfAccess: Date): Boolean = {
if (windows != null) {
return windows.exists((x) => withinDateRange(timeOfAccess, x))
}
false
}
But I am getting an error on the filter and stream calls. Can someone point me in the right direction? Any references? The error I get on getValidUsages is:
compile error "cannot resolve reference project with such signature"
This is somewhat difficult to answer since I am unfamiliar with some of the types you use. But assuming there are types like the following:
trait Window {
def getStartTime: LocalDate
def getEndTime: LocalDate
}
trait TestWindows extends Window {
def getTV: List[Window]
def getCableUse: List[Window]
def getWifi: List[Window]
def getUsageIds: Map[String, List[Window]]
}
then you could just do this:
def withinDateRange(toCheck: LocalDate)(window: Window): Boolean =
window.getStartTime.isBefore(toCheck) && window.getEndTime.isAfter(toCheck)
// Nothing should ever be null in Scala. If it's possible you don't have any ConsumptionWindows you should either
// model it as an empty list or an Option[List[ConsumptionWindow]]
def existsEligibleTestWindow(windows: List[Window],
timeOfAccess: LocalDate): Boolean =
windows.exists(withinDateRange(timeOfAccess))
def getValidUsages(testWindows: TestWindows, timeOfAccess: LocalDate): Set[String] = {
val isTV = existsEligibleTestWindow(testWindows.getTV, timeOfAccess)
val isCableUse = existsEligibleTestWindow(testWindows.getCableUse, timeOfAccess)
val isWifi = existsEligibleTestWindow(testWindows.getWifi, timeOfAccess)
val tvOrMovie: Option[String] = if (isTV) Some("TV")
else if (isCableUse || isWifi) Some("MOVIE")
else None
val byUsageId = testWindows.getUsageIds.collect { case (key, windows) if existsEligibleTestWindow(windows, timeOfAccess) => key }.toSet
tvOrMovie.toSet ++ byUsageId
}
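A quick, purely illustrative way to exercise these helpers:

import java.time.LocalDate

// Hypothetical window covering all of 2021:
val window2021 = new Window {
  def getStartTime: LocalDate = LocalDate.of(2021, 1, 1)
  def getEndTime: LocalDate = LocalDate.of(2021, 12, 31)
}

existsEligibleTestWindow(List(window2021), LocalDate.of(2021, 6, 1))  // true
existsEligibleTestWindow(List(window2021), LocalDate.of(2022, 6, 1))  // false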
In your original code there was presumably some items value, but in the above I just assume you do TestWindows testWindows = items.get(itemId).getTestWindows() outside the getValidUsages function.
My example doesn't use java structures at all and just uses the scala core collections. The other main difference is that I use immutable data structures which is, I think, a little easier to follow and generally safer.
Some items of note:
1) The Option.toSet operation results in an empty set when called on a None.
2) There is an example of function currying in the withinDateRange method (see the sketch after this list).
3) I obviously have no idea what your original types do and had to guess at the relevant parts.
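Regarding note 2, a minimal sketch of the currying: partially applying withinDateRange yields a plain Window => Boolean function, which is why it can be passed straight to exists (accessedAt is a hypothetical value).

val accessedAt = LocalDate.of(2021, 6, 1)
val isOpenAt: Window => Boolean = withinDateRange(accessedAt)
// ...which is exactly what windows.exists(withinDateRange(timeOfAccess)) relies on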
The problem seems to be that you are using Java types in Scala while depending on Scala's map and filter operations. This has its own troubles, but if you convert the lists/collections to Scala collections first (warning: Scala types are immutable by default), then you should be able to use the map/filter operations without having to call Java's stream() method.
def getValidUsages(itemId: String, sNo: Long, timeOfAccess: Date): Set[String] = {
  val itemSet = scala.collection.mutable.Set.empty[String]
  val testWindows: TestWindows = items(itemId).getTestWindows()
  val isTV: Boolean = existsEligibleTestWindow(testWindows.getTV(), timeOfAccess)
  isTV match {
    case true => itemSet.add(TV)
    case false =>
      val isCableUseable: Boolean = existsEligibleTestWindow(testWindows.getCableUse(), timeOfAccess)
      val isWifi: Boolean = existsEligibleTestWindow(testWindows.getWifi(), timeOfAccess)
      if (isCableUseable || isWifi) {
        itemSet.add(MOVIE)
      }
  }
  if (testWindows.getUsageIds() != null) {
    // with Scala collections there is no need for stream()/Collectors
    itemSet ++= testWindows.getUsageIds()
      .filter { case (_, windows) => existsEligibleTestWindow(windows, timeOfAccess) }
      .keySet
  }
  itemSet.toSet
}

def existsEligibleTestWindow(windows: List[TestWindow], timeOfAccess: Date): Boolean = {
  windows match {
    case null => false
    case _ => windows.exists(w => withinDateRange(timeOfAccess, w))
  }
}
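If getUsageIds and the window lists really are java.util collections on the Java side, the conversion mentioned above can be done once at the boundary with the standard converters (Scala 2.13 import shown; on 2.12 it is scala.collection.JavaConverters._). The exact element types here are assumptions:

import scala.jdk.CollectionConverters._

// Convert once, then use plain Scala map/filter on the result:
val usageIds: Map[String, List[TestWindow]] =
  testWindows.getUsageIds().asScala.toMap
    .map { case (key, windows) => key -> windows.asScala.toList }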
Good luck :)
Is it possible to get ClassTag information from a Java Class instance obtained via reflection?
Here's the situation. I have a Scala case class that looks like this:
case class Relation[M : ClassTag](id: UUID,
model: Option[M] = None)
And it is used like this (although with many more classes related to each other):
case class Organization(name: String)
case class Person(firstName: String,
lastName: String,
organization: Relation[Organization])
What I'm trying to do is programmatically build up a tree of these relations using something that looks like this:
private def generateFieldMap(clazz: Class[_]): Map[String, Class[_]] = {
clazz.getDeclaredFields.foldLeft(Map.empty[String, Class[_]])((map, field) => {
map + (field.getName -> field.getType)
})
}
private def getRelationModelClass[M : ClassTag](relationClass: Class[_ <: Relation[M]]): Class[_] = {
classTag[M].runtimeClass
}
def treeOf[M: ClassTag](relations: List[String]): Map[String, Any] = {
val normalizedRelations = ModelHelper.normalize(relations)
val initialFieldMap = Map("" -> generateFieldMap(classTag[M].runtimeClass))
val relationFieldMap = relations.foldLeft(initialFieldMap)((map, relation) => {
val parts = relation.split('.')
val parentRelation = parts.dropRight(1).mkString(".")
val relationClass = map(parentRelation)(parts.last)
val relationModelClass = relationClass match {
case clazz: Class[_ <: Relation[_]] => getRelationModelClass(clazz)
case _ => throw ProcessStreetException("cannot follow non-relation: " + relation)
}
val fieldMap = generateFieldMap(relationModelClass)
map + (relation -> fieldMap)
})
relationFieldMap
}
val relations = List("organization")
val tree = treeOf[Person](relations)
This won't compile. I get this error:
[error] Foo.scala:148: not found: type _$12
[error] case clazz: Class[_ <: Relation[_]] => getRelationModelClass(clazz)
[error] ^
[error] one error found
[error] (compile:compile) Compilation failed
Basically, what I'd like to do is be able to access the ClassTag information when all I have is a Java Class. Is this possible?
Yes, it is absolutely possible and very easy:
val clazz = classOf[String]
val ct = ClassTag(clazz) // just use ClassTag.apply() method
In your example you'd want to call the getRelationModelClass method like this:
getRelationModelClass(clazz)(ClassTag(clazz))
This is possible because the [T: ClassTag] syntax implicitly creates a second parameter list like (implicit ct: ClassTag[T]). Usually it is filled in by the compiler, but nothing prevents you from supplying it explicitly.
You also don't really need to pass both the class AND the class tag for this clazz to the method at the same time. You're not even using the explicit class object in its body. Just pass the class tag; it will be enough.
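A minimal sketch of that desugaring (the method names are illustrative):

import scala.reflect.ClassTag

// These two definitions are equivalent:
def runtimeClassOf[T: ClassTag]: Class[_] = implicitly[ClassTag[T]].runtimeClass
def runtimeClassOfExplicit[T](implicit ct: ClassTag[T]): Class[_] = ct.runtimeClass

// So the normally-implicit argument can be supplied by hand, built from a plain Class:
val clazz: Class[_] = classOf[String]
runtimeClassOfExplicit(ClassTag(clazz))  // class java.lang.String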
I ended up accomplishing my goal using TypeTags and the Scala reflection API. Here are the changes necessary.
First, change the Relation class to use a TypeTag.
case class Relation[M : TypeTag](id: UUID,
model: Option[M] = None)
Then change the rest of the code to use the Scala reflection API:
private def generateFieldMap(tpe: Type): Map[String, Type] =
tpe.members.filter(_.asTerm.isVal).foldLeft(Map.empty[String, Type])((map, field) => {
map + (field.name.toString.trim -> field.typeSignature)
})
private def getRelationModelType(tpe: Type): Type =
tpe match { case TypeRef(_, _, args) => args.head }
def treeOf[M: TypeTag](relations: List[String]): Map[String, Any] = {
val normalizedRelations = ModelHelper.normalize(relations)
val initialFieldMap = Map("" -> generateFieldMap(typeTag[M].tpe))
val relationFieldMap = relations.foldLeft(initialFieldMap)((map, relation) => {
val parts = relation.split('.')
val parentRelation = parts.dropRight(1).mkString(".")
val relationType = map(parentRelation)(parts.last)
val relationModelType = getRelationModelType(relationType)
val fieldMap = generateFieldMap(relationModelType)
map + (relation -> fieldMap)
})
relationFieldMap
}
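As a quick, illustrative check, getRelationModelType unwraps the type argument of a Relation:

import scala.reflect.runtime.universe._

val relationType = typeOf[Relation[Organization]]
getRelationModelType(relationType)  // yields the Organization type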