This post is the first in a series highlighting the similarities and differences between AI software development and non-AI software development. In this article, we will focus on the software architecture of a complete AI solution.

Developing Artificial Intelligence (AI) software components using techniques such as Deep Learning (DL) or Machine Learning (ML) implies some changes in the way you produce a software solution. In “traditional” software development (referred to below as non-AI software), software engineers write source code in a programming language (Python, Java, C++, etc.) to implement an algorithm. AI software development, on the other hand, does not involve that much coding. Data and research scientists select meaningful data, then pick, tune, and train a model built on an existing 3rd party neural network framework (TensorFlow, Theano, Caffe) to generate prediction software. While these differences are a big change in the process of creating algorithmic components, we will see in this post that they do not have a strong impact on how the architecture of a complete software solution is created.

To illustrate our points, we will consider a video processing solution, accessible through an online platform, to add subtitles to videos automatically (using Natural Language Processing – NLP – methods). The solution expects a video as input and produces a video with inserted subtitles as output.

The Back-End Layer – The AI Core Component

To develop an ML or DL component that provides a prediction solution, you do not write algorithmic code in a programming language. Instead, you write code (often scripts) that invokes 3rd party neural network libraries. These libraries are used to train a model on finely selected data with tuned parameters. Note that this code will not run in production.

The development process consists of multiple iterations over 3 stages:

  1. Data selection and refinement
  2. Model tuning and training on data to generate the prediction software
  3. Evaluation of the prediction component produced by the training phase

This iterative process is well accepted, and there are plenty of posts and papers that explain it. (See a quick overview here).
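To make these three stages concrete, here is a minimal sketch of a single iteration. It is not taken from any real project: it assumes TensorFlow's Keras API, an arbitrary toy architecture, and a hypothetical load_labeled_audio_segments() helper standing in for the data selection stage.

```python
import tensorflow as tf

def build_model(num_classes: int) -> tf.keras.Model:
    # Stage 2: pick and tune an existing architecture instead of coding the algorithm.
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(16000,)),   # e.g. one second of 16 kHz audio
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Stage 1: data selection and refinement (hypothetical helper).
x_train, y_train, x_val, y_val = load_labeled_audio_segments()

# Stage 2: training generates the prediction component.
model = build_model(num_classes=42)
model.fit(x_train, y_train, epochs=10, validation_data=(x_val, y_val))

# Stage 3: evaluation of the trained component drives the next iteration.
loss, accuracy = model.evaluate(x_val, y_val)
model.save("subtitle_predictor.keras")   # the artifact that becomes the AI component
```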

The prediction component generated by the training stage acts as a black box, meaning that you do not know how it behaves internally to produce its results. You cannot natively debug it or link specific results to specific data or a specific code path. This prediction software will be your AI component. It is certainly at the heart of your solution's value, and where you will invest most of your engineering effort in the first versions.
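To illustrate the black-box aspect, here is a minimal sketch of invoking such a component, assuming the Keras model saved in the previous snippet:

```python
import tensorflow as tf

# Load the artifact produced by training; only its inputs and outputs are visible.
predictor = tf.keras.models.load_model("subtitle_predictor.keras")

def predict(features):
    # There is no algorithmic source code to step through here:
    # the behaviour lives entirely in the learned weights.
    return predictor.predict(features)
```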

To develop an algorithm performing a similar prediction task without AI, you would also invest most of your engineering effort there, but the development would differ greatly. You would follow the classical software development process:

  1. Select the 3rd party libraries used by your component [optional]
  2. Write algorithmic code and compile it [compilation is optional]
  3. Test and evaluate the generated component

Here, your prediction component would be a combination of black-box (3rd party libraries and a compiler) and white-box (your code) sub-components. The executed code can be seen as a black box too, but you can debug it and link results to a specific path in the source code.

An interesting analogy is to compare the training phase of AI development to the compilation phase of coded software development. In coded software development, compilation takes source files and compiler settings as input and generates an executable binary. In ML/DL, training takes data files and training settings as input and generates a prediction algorithm.

The Middle-End Layer – Pre- and Post-Processing Data

Once you have your AI prediction component, this is just the beginning of the story! Your back-end prediction algorithm expects formatted data and produces raw output data. To make your solution usable by people other than expert data scientists and software engineers (and this is what everyone wants: to spread the product to the largest pool of users!), you will need to add user-friendly layers that accept various user input data and present AI results in a pleasing format.

First, you will have to preprocess your input data. For many ML/DL algorithms, this is an important step to ensure the AI component performs well. Next, you will need to add a component that supports your product requirements and your users' workflows:

  • Format input data to feed the AI component (file format mapping, input data size reduction, etc.)
  • Parse user settings that enable different behaviors

Finally, once the prediction has been performed and has produced an output, you will have to post-process this output to provide a user-friendly version of the results. This can be anything from computing some quantity of interest, to formatting text, to converting an output file into a file format the user supports.
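Putting the pieces together, here is a sketch of what the middle-end layer could look like. The helpers (convert_to_internal_format, check_settings, render_output) are hypothetical placeholders for real pre- and post-processing code, and predictor is the black-box AI component from the back-end layer.

```python
def run_prediction(user_file: str, settings: dict) -> str:
    # Pre-processing: map the user's file and settings to what the model expects.
    internal_data = convert_to_internal_format(user_file)   # hypothetical helper
    options = check_settings(settings)                      # hypothetical helper

    # AI core: invoke the black-box prediction component from the back-end layer.
    raw_output = predictor.predict(internal_data)

    # Post-processing: turn the raw output into a user-friendly result.
    return render_output(raw_output, output_format=options.get("format", "srt"))
```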

All of these pre- and post-processing parts require software development, and this will certainly be “standard” software development: scripting or compiled code. The more mature your solution, the more users you will have, and the more important these parts will become.

In our video processing example, the solution would certainly need to support various input video formats (AVI, MPEG, etc.) to be converted to an internal format on which the prediction applies. You would also run some checks on video quality (sound, resolution, etc.) to ensure it matches your input requirements. Then, you would extract the audio from the video and preprocess the audio data before invoking your AI component. Once the ML/DL prediction is finished, your application would need to insert the generated subtitles in the output video, at the right place and with the right timing. Finally, you would need to convert the generated video back to the format the user provided.
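As an example, the audio-extraction step could look like the following sketch, assuming the ffmpeg command-line tool is installed on the host:

```python
import subprocess

def extract_audio(video_path: str, audio_path: str = "audio.wav") -> str:
    # Extract a mono 16 kHz WAV track ready to feed the AI component.
    subprocess.run(
        ["ffmpeg", "-y",
         "-i", video_path,   # any input container ffmpeg supports (AVI, MPEG, ...)
         "-vn",              # drop the video stream
         "-ac", "1",         # mix down to mono
         "-ar", "16000",     # resample to 16 kHz, a common speech-model input rate
         audio_path],
        check=True,
    )
    return audio_path
```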

The UI Layer – Access to AI Solution for End Users

Finally, you want to provide your solution to your end users via a User Interface (UI). The UI layer will depend on your business model:

  • A command line application
  • An API if the consumer of your solution is a 3rd party software
  • A desktop application if your solution has to be installed on the user's computer
  • A web application if your solution should be available online

This UI will enable interaction with your AI component to:

  1. Perform a prediction (with all the required pre- and post-processing steps)
  2. Access the results of your AI analysis, either by generating a document report of the results or through a graphical user interface that displays the computed results.

It is extremely important that your UI and API remain independent of your computing layers (the core and pre/post-processing components), for several reasons: different developers are generally in charge of each component, the technologies used for each component differ, and independence eases building, testing, debugging, and deployment.

In our online video processing example, the solution is web-hosted, and users access it either through your own web portal or through partners' applications. You would therefore have to provide a web API (usually a REST API) as well as a web UI.
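Here is a minimal sketch of such a web API, using Flask as an example framework; run_prediction() is the hypothetical middle-end entry point sketched earlier:

```python
from flask import Flask, request, send_file

app = Flask(__name__)

@app.route("/subtitles", methods=["POST"])
def add_subtitles():
    # The API layer only orchestrates; the computing layers stay independent behind it.
    uploaded = request.files["video"]
    uploaded.save("uploaded_video")
    result_path = run_prediction("uploaded_video", settings=request.form.to_dict())
    return send_file(result_path, as_attachment=True)

if __name__ == "__main__":
    app.run()
```

Keeping this layer thin is what allows the same computing components to be reused behind both your own web portal and a partner-facing API.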

Conclusion

Most AI products will be created using this 3-layer architecture: a UI layer, a middle-end layer for pre- and post-processing, and the AI core component in the back-end.

This architecture is quite similar to the one used for non-AI software. The way you develop and test these different components is identical for the UI and middle-end layers. As with coded software development, good practice requires you to carefully design each component and make them as independent as possible. This enables faster builds of your product, better debugging, and more efficient testing. Note that the further your product evolves, the more effort you will spend on the components other than the AI core to support the diversity of your users' and customers' workflows.

The main difference between AI and non-AI solution development lies in the ML/DL core component, where the development paradigm changes and different tools and processes are required. Testing and efficiency evaluation are also different; this will be the topic of the next post in this series.
