When: Tuesday, 18.06.2021, 18:00
Topic: Applied DataOps: Automated deployments of Analytics Applications.

https://www.meetup.com/dataops-poland/events/277394154/

One of the core principles of the DataOps Manifesto is that 'Analytics is code'. Indeed, analytics teams build analytics applications (or data products) and use a variety of individual tools to access, integrate, model, and visualize data. Fundamentally, each of these tools generates code and configuration that describe the actions taken upon data to deliver insight.
When releasing analytics applications, teams often apply DevOps principles such as continuous integration and continuous deployment to accelerate delivery of business value through automation. Because analytics applications rely on many tools, automating deployments is not trivial: every tool has its own characteristics regarding version control, access control and communication interfaces, e.g. version-controlled deployment of database changes or incremental deployment of ETL/ELT code. In organizations with many teams, every team tries to automate the deployment of its own applications, which consumes part of the team budget that could otherwise be spent on building new business features or improving existing ones. On top of that, every team to some extent reinvents the wheel, for example by designing and implementing its own smart solution for releasing changes to the data model. Every team will eventually have all steps automated, but each will probably adopt different naming conventions, security models, etc. Collectively, these differences create unnecessary complexity for operations teams.
In this presentation we will show an alternative approach and introduce a framework called DataOps Assembly, which standardizes and automates deployments of analytics applications. Similarly to MLflow Projects, which defines a format for packaging data science code, DataOps Assembly introduces a set of conventions for packaging the code of analytics applications: the so-called installation package format. Teams are responsible for packaging their solution artifacts (database scripts, ETL code, configuration) and testing them (the continuous integration stage), whereas continuous deployment is handled by DataOps Assembly, which effectively works as a black box with scripted know-how on how to deploy the respective parts of the application. This single code base is reused by all teams.
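
To make the idea of a package convention more concrete, here is a minimal Python sketch. It is not the DataOps Assembly format or API, which this announcement does not describe. The folder names (database, etl, config), the deployment order, and the names plan_deployment and deploy are all hypothetical and chosen only for illustration. The sketch shows how a shared deployer could validate a team's package against a fixed layout and then hand each artifact to a type-specific deployment routine.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Dict, List

# Hypothetical convention: one top-level folder per artifact type,
# deployed in this fixed order. Not the actual DataOps Assembly layout.
EXPECTED_FOLDERS = ("database", "etl", "config")


@dataclass
class DeploymentStep:
    artifact_type: str  # one of EXPECTED_FOLDERS
    path: Path          # file to hand to the matching deployer


def plan_deployment(package_root: Path) -> List[DeploymentStep]:
    """Validate the package layout and list its artifacts in deployment order."""
    missing = [name for name in EXPECTED_FOLDERS
               if not (package_root / name).is_dir()]
    if missing:
        raise ValueError(f"package is missing folders: {missing}")

    steps: List[DeploymentStep] = []
    for name in EXPECTED_FOLDERS:
        # Sort by file name so that e.g. numbered scripts run in a stable order.
        for path in sorted((package_root / name).iterdir()):
            if path.is_file():
                steps.append(DeploymentStep(name, path))
    return steps


def deploy(package_root: Path,
           deployers: Dict[str, Callable[[Path], None]]) -> None:
    """Run the shared deployment logic: one deployer callable per artifact type."""
    for step in plan_deployment(package_root):
        deployers[step.artifact_type](step.path)
```

In this sketch the teams only produce the package. The deployers mapping stands in for the shared, centrally maintained deployment know-how, so naming and ordering rules live in one place instead of being reimplemented by each team.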

The presentation will cover the design principles behind DataOps Assembly, which address both standardized configuration of secure application environments and automated deployments of analytics applications. The presented concepts are quite generic and can be applied both on-premises and in public clouds. The presentation will be accompanied by a demo showing how such a framework handles automated deployments of applications that leverage Azure services (Azure Data Factory, Azure Data Lake, Azure Databricks) and Snowflake.

Speaker:

Łukasz Olejniczak is an Analytics Offering Architect at DXC Technology, where he helps organizations leverage the power of data by crafting reusable, well-architected, secure and scalable big data platforms and solutions that accelerate innovation through automation and the implementation of DevOps, DataOps and MLOps practices.
His background includes a unique mix of software engineering, product design, business intelligence, data integration, data warehousing, data engineering and machine learning, with over 10 years of experience working for MicroStrategy, Roche, Autorun, mBank, DXC Technology and their clients.
Łukasz holds a PhD in applied physics from Vrije Universiteit in Brussels, Belgium and SUPELEC in Metz, France.