Data mining projects are viewed under the umbrella of the Cross-Industry Standard Process for Data Mining (CRISP-DM) and have six major phases: business understanding, data understanding, data preparation, modeling, evaluation, and deployment.
The current project implementation is based on the SAS platform and can be divided into four logical parts:
- Data acquisition
- KRI processing
- Data visualization, reporting, dashboarding
- Case Management
Technology Stack:
Hadoop
Spark 1.4 - 2.0
Scala
React.js
Scala.JS
JavaScript
WebSocket
Tomcat
Oracle
Control-M
TeamCity
Processes/Methodologies:
Agile
CI
Test Automation
Requirements
Responsibilities:
Practical experience in the Scala programming language and Spark.
Practical experience in developing Domain-Specific Languages, ASTs, Symbolic Expressions, and applying the corresponding concepts to enterprise applications.
Machine learning algorithms on large distributed data sets.
Knowledge of Functional Programming techniques and corresponding libraries (Scalaz/Cats, Functional Java).
3 years of overall working experience.
Skills Required:
Upper-Intermediate English level.
Must-have skills in the following languages/technologies:
• Scala, Scalaz/Cats;
• Apache Spark, SparkR, MLlib/Spark ML, Spark API: RDD, DataFrame, Dataset;
• Apache Zeppelin;
• Parboiled;
• Clojure, Gorilla REPL, Sparkling and Flambo, Incanter;
• Hadoop, Parquet files;
• Oracle WebLogic Server, Apache Tomcat;
Experience in the following is a big plus:
• design and development of custom interpreters/compilers;
• practical experience in Scala Macros and Quasiquotes;
• generic programming and libraries (Shapeless);
• Notebooks: Spark-notebook, Scala Notebook, Gorilla REPL, Jupyter Notebook;
• Amazon AWS;
• IPython, Python;
• Akka;
• KDB;