As a DevOps or MLOps engineer, it is critical to understand the tools and processes used at every stage of the pipeline, including development, testing, deployment, scaling and monitoring. This understanding gives you the credibility to have in-depth conversations with the teams you support, suggest ways to make their workflows more efficient, and become a major influence on the end-to-end process. While it would be impossible to become an expert in every tool in use, the more you know, the better.
Developing an enterprise-ready application that is based on machine learning requires multiple types of developers, and each type has its own kinds of tools. The developers we’re going to look at are the big data database developer, the front-end developer, the back-end developer and the data scientist. Generally, at least some of these roles are filled by different people, though sometimes a full-stack developer will fill several or even all of them.
In general, the database needed for big data applications is one that will handle unstructured data from multiple sources. This is the classic use case for NoSQL databases. They generally allow a more flexible record structure, which doesn’t require every record to have the same set of fields. One thing to keep in mind is that, unlike relational databases, which share SQL, NoSQL databases have no common standard for querying or modeling data. This means that the design, planned implementation and usage will depend heavily on the actual tool used. Examples of NoSQL databases that are common for big data projects are Elasticsearch and MongoDB.
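To make the idea of a flexible record structure concrete, here is a minimal sketch in plain Python. It does not use any real database API: the records, field names and the `field_coverage` and `find` helpers are all hypothetical, and each NoSQL product would express the same idea through its own query interface.

```python
# Hypothetical documents from two sources. Note that the field sets differ,
# which a document-style store generally allows.
records = [
    {"source": "web_log", "user_id": 17, "url": "/pricing", "latency_ms": 120},
    {"source": "sensor", "device": "cam-03", "reading": 0.82},
    {"source": "web_log", "user_id": 42, "url": "/docs"},  # no latency_ms field
]


def field_coverage(docs):
    """Count how many documents carry each field."""
    counts = {}
    for doc in docs:
        for key in doc:
            counts[key] = counts.get(key, 0) + 1
    return counts


def find(docs, **criteria):
    """Return documents matching all criteria; a missing field never matches."""
    missing = object()
    return [
        doc for doc in docs
        if all(doc.get(key, missing) == value for key, value in criteria.items())
    ]


coverage = field_coverage(records)
web_logs = find(records, source="web_log")
print(coverage)
print(web_logs)
```

Because no schema guarantees that a field exists, any code that reads these records has to handle missing fields itself, which is part of why the design depends so much on the tool chosen.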
Front-end developers need to be able to visualize their code and often use WYSIWYG tools (what you see is what you get, pronounced “wizzy wig”) for development. These tools generally focus on event-driven behavior, enabling the developer to program an initial display and additional functionality for user-driven events, such as button clicks, mouse-overs and keyboard input. This functionality should focus strictly on elements related to the user interface, while any business logic should be handled by the back end. Even when a single full-stack developer is programming both the front end and the back end, there should be a clear separation of what goes in the front end, the back end and the database. This keeps the code maintainable, readable and properly structured. Well-known WYSIWYG editors are Adobe Dreamweaver and Brackets.io.
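The separation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `order_total`, `OrderForm` and the `button_click` event name are hypothetical, and a real front end would use its framework's own event system rather than this toy dispatcher.

```python
# --- Back end: business logic with no knowledge of buttons or widgets ---
def order_total(unit_price, quantity, discount_rate=0.0):
    """Compute the total price of an order, rounded to cents."""
    if quantity < 0:
        raise ValueError("quantity must be non-negative")
    return round(unit_price * quantity * (1 - discount_rate), 2)


# --- Front end: event wiring and display only ---
class OrderForm:
    """Toy stand-in for a UI form with an event dispatcher."""

    def __init__(self):
        self._handlers = {}
        self.display = ""

    def on(self, event, handler):
        """Register a handler for a named event."""
        self._handlers[event] = handler

    def fire(self, event, **payload):
        """Simulate the user triggering an event."""
        self._handlers[event](**payload)


form = OrderForm()


def handle_click(unit_price, quantity):
    # The handler only delegates to the back end and formats the result.
    total = order_total(unit_price, quantity)
    form.display = f"Total: {total:.2f}"


form.on("button_click", handle_click)
form.fire("button_click", unit_price=19.99, quantity=3)
print(form.display)
```

Keeping the pricing rule in `order_total` means it can be tested and changed without touching any user interface code, and the same rule can serve a different front end later.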
Back-end developers generally focus on the business logic of an application. They often like to use a fully featured IDE (integrated development environment) that eases development on a large codebase. The IDE generally includes code completion, so the developer doesn’t have to remember all the objects, methods and libraries that they are using. It also enables the developer to jump quickly between references to the same code in different areas of the codebase. Debugging functionality is built in, including the ability to add breakpoints and to inspect and modify variable and object values during execution. IDEs often integrate with the rest of the developer’s toolset, such as Git for source code and version control, SCA, code coverage and so on. Examples of IDEs are Microsoft Visual Studio Code and JetBrains PyCharm.
Data scientists are focused on analyzing and visualizing data. They often use a special kind of IDE, called a notebook, that enables them to continuously test their models and theories. Notebooks can combine rich text, code, and dynamic and static visualizations in a single document. This makes it easy to share work that everyone can understand and comment on. When developing in Python with data science libraries such as NumPy, pandas and matplotlib, it is possible to develop short exploratory data analysis (EDA) models whose results are instantly visible within the notebook. The rich text functionality gives data scientists the ability to write out their hypotheses and ideas, along with any initial results. Once the model is proven, the code is often transferred into a module or library that can be called from back-end applications.
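As a rough idea of what a short exploratory step in a notebook cell might look like, here is a sketch that uses only the Python standard library so that it runs anywhere. The sample data and the `summarize` and `text_histogram` helpers are hypothetical; in practice a data scientist would more likely load the data into pandas and call `DataFrame.describe()`, then plot it.

```python
import statistics

# Hypothetical sample: request latencies in milliseconds.
latencies_ms = [120, 95, 130, 110, 480, 105, 115]


def summarize(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
    }


def text_histogram(values, bin_width):
    """Render a crude text histogram, one row per non-empty bin."""
    bins = {}
    for value in values:
        start = (value // bin_width) * bin_width
        bins[start] = bins.get(start, 0) + 1
    return [
        f"{start}-{start + bin_width - 1}: {'#' * count}"
        for start, count in sorted(bins.items())
    ]


summary = summarize(latencies_ms)
print(summary)
print("\n".join(text_histogram(latencies_ms, 100)))
```

Here the mean is pulled well above the median by a single slow request, which is the kind of quick observation a notebook makes visible. Once such helpers prove useful, they are natural candidates to move out of the notebook and into a module that back-end code can import.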
Addressing the complexity of MLOps
In summary, a complete enterprise ML application involves multiple roles, each of which requires its own tools and methodologies. By understanding the tools that are appropriate for each role, you will be able to work with the people in those roles and facilitate their work. Very often, people who aren’t intimately familiar with a specific role will assume that it can use the same tooling as other, similar roles. While it may be possible to deploy a single tool that works for everyone, it is unlikely to be the tool that gives each role the greatest benefit.
Sim Zacks is a DevOps Architect at GM, working on connecting the pieces to deliver the software for the autonomous vehicle. His 25+ years of experience span multiple technology fields, including DevOps/CI-CD, software development, system & network administration, database development & design, quality engineering and leading large projects. Sim uses his knowledge & experience to seamlessly connect and translate all aspects of the technology stack with business requirements. He is a mentor & strategist, working with people to overcome everyday challenges and further their careers. He writes a blog on personal strategy, which can be found on his LinkedIn profile. He also speaks at local and international conferences. Sim earned an MBA from the University of Phoenix and a BSc in Computer Science from Lawrence Technological University.