MLOps in Industry 4.0

A Roadmap for Machine Learning on a Digital Twins Platform

Arthur Flor
Dec 30, 2022
Source: How Oil and Gas Industry Operators Benefit from Digital Twin Technology

In the last two years, I have worked on a Research and Development (R&D) project. The objective was to build a Minimum Viable Product (MVP) of a Digital Twins platform for the energy industry, using machine learning models and data streams.

In the beginning, it wasn’t easy to understand all the concepts at once. So after a long time spent learning, I focused on developing a tool that handles the setup and self-management of the machine learning model lifecycle (training, evaluation, deployment). It is a small part of the whole project; I would describe it as an interface between the platform and the models developed by the team.

Some Concepts

I read several definitions of Digital Twins two years ago and felt completely lost, asking myself things like: how do I build one? Where do I start? What is necessary? Which platforms or tools can help me create one?

Still far from the best definitions, today I can try to explain it in my own words. Industry 4.0 is the transformation of products into services through a technological environment within companies. Roughly speaking, it is composed of massive data ingestion, cloud processing, and machine learning models. A Digital Twin is the way to use that technological environment: since the Internet of Things (IoT) allows data to be mapped through sensors, why not map all the key information about a piece of equipment, a sector, an industry, or even a city? After that, all the data is used by machine learning models to cross-reference information and simulate reality with predictions. Finally, MLOps (Machine Learning Operations) refers to the steps required to train machine learning models, select the best trained model, and make it available for use.

That said, I would like to share my understanding of some important pillars for starting a digital twin project. It’s worth mentioning that this was private work, so I will not share the project’s code or any sensitive information.

Real Life

I considered Real Life a pillar because it’s a crucial factor in the development of a project. Here you will find the day-to-day challenges and conveniences. Not to mention that each scenario has its own specificities.

As expected, some challenges arose from my project requirements. Among them, I can highlight that the production environment should have a simplified deployment pipeline, different from the development server. The goal was to create an isolated, lightweight production environment containing only the models trained on the development server.

Another point was the modularization of all components through parameters. This made it possible to reuse and recombine various functions in different model pipelines. Furthermore, the tool had to manage sets of pipelines per client, all in an automated and self-managing way.

Thus, I was responsible for building this part of the project. Figure 1 highlights the mentioned component within the project overview.

Figure 1: Project overview

Continuous Integration/Continuous Delivery (CI/CD)

CI/CD is responsible for automating the lifecycle process of an application in a target environment. This process runs new builds and makes them available on servers (development or production, for example).

This process can be configured through repositories and Git pipelines, which trigger actions under certain conditions (such as commits or tags). In other words, the objective is to create an automatic mechanism for code repositories to be built, configured, and made available on the server with each triggering action.

Container Service

After Integration and Delivery, you need to know how you will make your application available on the server. For our context, we used plain Docker and Docker Compose to build and configure containers with isolated environments for each module.

A container is a unit of software that packages up code and all its dependencies: a lightweight, standalone executable that includes everything needed to run an application. The objective is to create isolated, scalable environments for each service that will be made available. That way, with each code update, only the corresponding container is updated, instead of the entire project.

At this point, the code is linked end-to-end between the local machine, the Git repository, and the server. Module updates are pushed to the Git repositories, which triggers the pipeline actions. These actions take the new code to the target server, where it is packaged and made available in a container.
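
Since the project’s pipeline configuration is private, here is only a minimal sketch of what the triggered deploy step could look like on the target server. The repository path and service name are hypothetical, and the real trigger lives in the Git pipeline itself.

```python
# Hypothetical deploy hook called by the CI/CD pipeline on the target server.
import subprocess

def deploy(service: str, repo_dir: str = "/opt/dtwin/mlops-tool"):
    """Pull the latest code and rebuild only the affected container."""
    subprocess.run(["git", "-C", repo_dir, "pull"], check=True)
    # Docker Compose rebuilds and restarts just this service, not the whole project.
    subprocess.run(["docker", "compose", "up", "-d", "--build", service],
                   cwd=repo_dir, check=True)

if __name__ == "__main__":
    deploy("ml-orchestrator")
```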

MLFlow and Airflow

Here is the most important part, built on two complementary frameworks. Basically, MLFlow is responsible for training, selecting, and serving the best machine learning models, while Airflow is responsible for scheduling and triggering pipelines in an ordered, dependency-aware way, which in our context covers the lifecycle of the machine learning models. It’s important to remember that at this point, the data has already been ingested, processed, and stored for use.
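
To make that division of labor concrete, here is a minimal sketch (not the project’s actual pipeline) of an Airflow DAG that runs a training task and then registers the best MLFlow run. The DAG id, experiment, model, and metric names are all hypothetical placeholders.

```python
# Hypothetical DAG: Airflow schedules the steps, MLFlow tracks and registers the models.
from datetime import datetime

import mlflow
import mlflow.sklearn
from airflow import DAG
from airflow.operators.python import PythonOperator
from sklearn.linear_model import LinearRegression

def train():
    with mlflow.start_run():
        model = LinearRegression().fit([[0.0], [1.0]], [0.0, 1.0])  # placeholder training
        mlflow.log_metric("rmse", 0.42)                             # placeholder metric
        mlflow.sklearn.log_model(model, "model")

def promote_best():
    best = mlflow.search_runs(order_by=["metrics.rmse ASC"]).iloc[0]  # lowest error wins
    mlflow.register_model(f"runs:/{best.run_id}/model", "client_a_model")

with DAG("ml_lifecycle", start_date=datetime(2022, 1, 1),
         schedule_interval="@daily", catchup=False):
    train_task = PythonOperator(task_id="train", python_callable=train)
    promote_task = PythonOperator(task_id="promote_best", python_callable=promote_best)
    train_task >> promote_task
```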

The details and motivations of the code are very extensive, so I separated the main challenges and goals to be achieved concerning this module (MLFlow + Airflow).

First: standardize the environment variables between the two frameworks. This was necessary because MLFlow creates and uses its folder (mlruns) at a relative path (inside the Python script folder), while Airflow creates and uses its folder (airflow) at an absolute path (inside the user’s home folder). So a few environment variables were configured to centralize the artifacts generated by the two frameworks. This made it possible to match the same environment between localhost and container and, in addition, to isolate environments between clients (did you already forget about that requirement?).
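
As an illustration (the paths and the client variable are hypothetical, not the project’s real layout), the idea boils down to pointing both frameworks at the same per-client root through their standard environment variables:

```python
# Hypothetical setup: one artifact root per client, shared by MLFlow and Airflow.
import os
from pathlib import Path

client = os.getenv("CLIENT_NAME", "client_a")   # assumed to be injected per container
artifact_root = Path("/opt/dtwin") / client

# MLFlow: move the tracking store away from its relative ./mlruns default.
os.environ["MLFLOW_TRACKING_URI"] = f"file://{artifact_root / 'mlruns'}"

# Airflow: move its home away from its absolute ~/airflow default.
os.environ["AIRFLOW_HOME"] = str(artifact_root / "airflow")
```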

Second: add an abstraction layer to simplify commands. Many pipelines need an extensive command line, or are partially run through code snippets with a specific setup. In any case, it is important to provide an execution abstraction layer. So, if previously we unified the target folder for artifacts, here we unify the command entry points.
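
A minimal sketch of such a layer, assuming an MLproject file at the repository root and hypothetical parameter names, could hide the long command line behind a single entry point:

```python
# Hypothetical wrapper that hides the long `mlflow run` command line.
import argparse

import mlflow
import mlflow.projects

def run_pipeline(client: str, entry_point: str, params: dict):
    mlflow.set_experiment(f"{client}-{entry_point}")   # one experiment per client and stage
    return mlflow.projects.run(uri=".", entry_point=entry_point, parameters=params)

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Unified pipeline entry point")
    parser.add_argument("--client", required=True)
    parser.add_argument("--entry-point", default="train")
    parser.add_argument("--epochs", type=int, default=10)
    args = parser.parse_args()
    run_pipeline(args.client, args.entry_point, {"epochs": args.epochs})
```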

Third: parameterize everything. Good flexibility and integration between the scripts are necessary. The chain of parameters enables different setups, with dynamic imports of modules and reuse of parts of the code. Some examples of what to parameterize: client, data source, models, hyperparameters, and pre- and post-processing functions.
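
For instance, a minimal sketch (the configuration keys, module paths, and file names are hypothetical) of how dynamic imports keep the pipeline assembly driven purely by parameters:

```python
# Hypothetical parameterized setup: everything the pipeline needs comes from one config.
from importlib import import_module

config = {
    "client": "client_a",
    "data_source": "data/client_a/sensor_readings.parquet",
    "model": "sklearn.ensemble.RandomForestRegressor",
    "hyperparameters": {"n_estimators": 200, "max_depth": 8},
    "preprocessing": ["pipelines.steps.fill_gaps", "pipelines.steps.normalize"],
}

def load_callable(dotted_path: str):
    """Dynamically import 'package.module.attribute' and return the attribute."""
    module_path, name = dotted_path.rsplit(".", 1)
    return getattr(import_module(module_path), name)

model = load_callable(config["model"])(**config["hyperparameters"])
preprocessors = [load_callable(path) for path in config["preprocessing"]]
```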

Fourth: architect and organize all the code. The goal here is to keep the code readable, so that another developer can easily join the team and maintain it. I recommend creating and maintaining development standards within the team, as well as good code documentation with small tutorials in Jupyter Notebooks.

Overall, the code was modularized and resolved our issues. Figure 2 shows the main code components.

Figure 2: Main components overview

Dashboards

The icing on the cake is the dashboards that MLFlow and Airflow provide. Through them, it’s possible to monitor the status of models and pipelines in real time. Figure 3 shows a sample of the MLFlow (top) and Airflow (bottom) dashboards.

Figure 3: MLFlow (top) and Airflow (bottom) dashboards

Bonus: I also took the opportunity to integrate custom data analysis models into the training and deployment process and to provide interactive web pages. Using Pandas Profiling and Streamlit, it was possible to generate web pages for analyzing the data being used to train the machine learning models. Figure 4 shows Pandas Profiling (left), which provides simpler analysis, and Streamlit (right), which provides customized analysis focused on time series.

Figure 4: Pandas Profiling (left) and Streamlit app (right) dashboards
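
As a rough sketch of that bonus (the data path, column selection, and titles are hypothetical), a static profile report and a small Streamlit page can be generated from the same training data:

```python
# Hypothetical data-analysis pages built from the training data.
import pandas as pd
import streamlit as st
from pandas_profiling import ProfileReport

df = pd.read_parquet("data/client_a/sensor_readings.parquet")

# Pandas Profiling: static HTML report with general statistics about the dataset.
ProfileReport(df, title="Training data profile").to_file("profile.html")

# Streamlit: interactive page focused on the time series used for training.
st.title("Training data: time series view")
sensor = st.selectbox("Sensor", df.columns)   # pick one signal to inspect
st.line_chart(df[sensor])                     # plot it over time
```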

Conclusion

The two-year project set out to develop a Digital Twins platform MVP for the energy industry. Within this huge project, I was responsible for developing a machine learning model orchestration tool, which handles the self-management of models in containers. I also had the objective of simplifying the development of new pipelines through modularization and code reuse, which enables scaling to N clients and N machine learning models. It was an amazing and powerful tool that certainly made me proud.

In this post, I described some of the knowledge acquired along the journey, in an attempt to organize and structure the ideas and, who knows, help someone. It was really a challenge to learn and apply everything, but it certainly helped me evolve professionally.

Finally, after many challenges and obstacles (stress and demotivation too), the feeling that remains is happiness and mission accomplished. Today, I can say for the last time: there are 0 weeks left until the end of the project.
