What sets this research apart is that it is built on artificial intelligence (machine learning): automated quantitative analysis takes precedence over expert judgment, which limits subjective assessments and makes the results more reliable. Experts were involved as little as possible, for example to remove the most general trends (such as Software and Hardware) and to complete trend descriptions with synonyms (such as Software Defined Network for SDN). Digital technologies make it possible to expand the sample under study significantly, which improves the reliability of the results, and to shorten the time needed to process the initial data and to present results and recommendations for management decisions.

The research is based on the analysis of primary sources, mainly textual ones. Text fields and source metadata were collected through APIs and download robots. Structured data are extracted from these arrays by machine linguistic analysis and by frequency analysis of references to a particular area of technological development and its scope of application.

Fig.: Advantages of Using Digital Technologies in the Analysis of Digitalization Trends

Research stages:

Stage 1. Normalization

All analyzed word combinations are reduced to a predetermined normal form.
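
The source does not say which normal form is used, so the following is only a minimal sketch of what such a step might look like. The function name `normalize_phrase` and its rules (case folding, treating hyphens, underscores and slashes as spaces, dropping punctuation) are assumptions made for illustration, not the study's actual procedure.

```python
import re
import unicodedata


def normalize_phrase(phrase: str) -> str:
    """Bring a word combination to one canonical spelling (hypothetical rules)."""
    # Unify Unicode variants and ignore letter case.
    text = unicodedata.normalize("NFKC", phrase).casefold()
    # Treat hyphens, underscores and slashes as word separators.
    text = re.sub(r"[-_/]+", " ", text)
    # Drop any remaining punctuation.
    text = re.sub(r"[^\w\s]", "", text)
    # Collapse runs of whitespace into single spaces.
    return " ".join(text.split())


variants = [
    "Software-Defined  Network",
    "software defined network",
    "SOFTWARE_DEFINED network.",
]
# All three spellings collapse to the same normalized string.
print({normalize_phrase(v) for v in variants})
```

A real pipeline would probably also lemmatize words (for example, mapping plurals to singulars), which this sketch deliberately leaves out.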

Stage 2. Trends Identification

This stage identifies areas of technological development:

  • an initial list of more than 2,000 primary trends was obtained by searching for keywords in scientific publications;
  • the list of primary trends was automatically expanded through linguistic analysis of other sources (patents, financial information), giving a list of approximately 3,000 trends;
  • the semantically closest trends were merged automatically using machine learning methods (probabilistic models); for example, the OpenFlow and NFV trends were merged into the SDN trend. The result was a list of 200 trends;
  • to eliminate inaccuracies introduced in the previous step, the resulting trends were verified by experts in information and communication technologies, and the revised list was reduced to 150 trends;
  • the list of trends was supplemented with synonymous expressions on the basis of semantic proximity metrics (e.g. Software Defined Network for the SDN trend).
Fig.: Trends Identification Procedure
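
The merging step above can be illustrated with a small sketch. The study used probabilistic models whose details are not given, so this example swaps in a simpler, clearly different technique: cosine similarity between word-count context vectors, followed by grouping of all trends whose similarity exceeds a threshold. The function names, the threshold value and the toy context counts are all invented for illustration.

```python
from collections import Counter
from math import sqrt


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def merge_trends(contexts: dict, threshold: float) -> list:
    """Group trends whose context vectors are similar (union-find over pairs)."""
    parent = {name: name for name in contexts}

    def find(x: str) -> str:
        # Follow parent links to the group representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    names = list(contexts)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cosine(contexts[a], contexts[b]) >= threshold:
                parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for name in names:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())


# Toy data (made up): counts of words seen near each trend name.
contexts = {
    "sdn": Counter({"network": 5, "controller": 4, "switch": 3}),
    "openflow": Counter({"network": 4, "controller": 3, "switch": 4}),
    "nfv": Counter({"network": 5, "controller": 2, "virtual": 2, "switch": 2}),
    "blockchain": Counter({"ledger": 5, "consensus": 4, "network": 1}),
}
for group in merge_trends(contexts, threshold=0.8):
    print(sorted(group))
```

With these toy counts, the three networking trends end up in one group and blockchain stays on its own. The expert verification described in the list would then correct any groups that such an automatic step merged wrongly.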

Stage 3. Comparison

At this stage, the list of trends is compared with the trends specific to each document (scientific publications, patents, financial information, media publications).
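
As a rough illustration of this comparison, the sketch below counts how often each trend, or one of its synonyms, is mentioned in a document. The trend dictionary, the function name `trends_in_document` and the sample sentence are hypothetical; the actual matching procedure is not described in the source.

```python
import re
from collections import Counter

# Hypothetical trend list: each trend with its synonymous expressions,
# already written in normalized (lowercase, no punctuation) form.
TRENDS = {
    "SDN": ["sdn", "software defined network"],
    "Blockchain": ["blockchain", "distributed ledger"],
}


def trends_in_document(text: str, trends: dict) -> Counter:
    """Count mentions of every trend (including its synonyms) in one document."""
    # Normalize the document the same way as the trend names:
    # lowercase, punctuation replaced by spaces, whitespace collapsed.
    normalized = " ".join(re.sub(r"[^\w\s]", " ", text.casefold()).split())
    counts = Counter()
    for trend, aliases in trends.items():
        for alias in aliases:
            pattern = rf"\b{re.escape(alias)}\b"  # whole-word match only
            counts[trend] += len(re.findall(pattern, normalized))
    return +counts  # unary plus drops trends with zero mentions


doc = "The operator deployed a Software-Defined Network; the SDN controller manages all switches."
print(trends_in_document(doc, TRENDS))
```

Running the same function over every publication, patent, financial record and media item, and summing the counters per source type, would give the kind of frequency data the analysis relies on. Note that a phrase and its abbreviation in the same sentence are counted as two mentions here, which is a design choice a real pipeline might handle differently.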