Assessing the Impact of Big Data on Large Organizations’ Strategies

Big data and big data analytics’ (BDA) importance for business management is growing at an unmatched pace. Such datasets


The Reshaping Role of Big Data in Modern Organization
Big data is the name given to unstructured datasets whose dimensions and complexity cannot be explored using traditional statistical data analysis software (Erevelles et al., 2016; Manyika et al., 2011). Studies on business management and management information systems have highlighted their importance because of the extremely valuable information that these datasets may hold (McAfee et al., 2012). Scholars have assessed how the informative content of big data allows organizations to predict, with a certain degree of reliability, future sales trends, the consumption of raw materials, and workforce performance (Chen et al., 2012). Yet, because of this complexity, the implementation of big data analytics (BDA) has been considered among the biggest challenges an organization could face.
Despite this growing interest, however, scant attention has been paid to the holistic investigation of the strategic use of big data and BDA in modern organizations (Davenport et al., 2015; Erevelles et al., 2016; Rialti et al., 2019a; Rialti et al., 2019b). In detail, while several papers have observed how BDA capabilities (personal, managerial and infrastructural capabilities) could affect organizational performance, the literature still lacks a comprehensive view of the impact that more skilled personnel could have on the overall organization. Starting from this gap, the purpose of this research is to explore how the presence of personnel skilled in BDA within modern large organizations could reshape the organization and its functioning, and thereby affect performance. In particular, the objective of the research is to develop a framework on the impact of big data and BDA-skilled personnel, moving from qualitative data. Large organizations have been chosen as the context of this research as they are the organizational typology most affected by big data and the one that can obtain the greatest benefits from these datasets. Large organizations are, in fact, the organizations that can usually most easily implement BDA processes such as (1) data acquisition, (2) data cleansing, (3) data integration, (4) data modelling, and (5) data interpretation (Cillo et al., 2019; Labrinidis & Jagadish, 2012). Furthermore, the implementation of big data analytical processes requires a considerable investment; therefore, only large organizations can afford to take advantage of these resources frequently without having significant problems covering the investment. Finally, large organizations are the organizational typology that can obtain the greatest benefits from the implementation of processes for big data analysis (Motamarri et al., 2017).
On the one hand, they are usually the only type of organization that can collect quantities of information sufficiently large to be classified as big data. On the other hand, the greatest benefits are usually linked to efficiency gains deriving from the reduction of production wastage, and these kinds of benefits are usually easier to see in organizations that produce many types of product and operate in different markets (De Mauro et al., 2016, 2017; Camussone & Biffi, 2018).
After this brief introduction, the present research is structured as follows. The next section reviews the existing literature on big data in management science. In particular, the focus is on defining the expression 'big data', reviewing the existing studies in terms of their strategic potential, and exploring the professional figures involved. The third section describes the methodology that was used. Finally, the fourth and fifth sections present the discussion of results, conclusions and suggestions for future research.

The Increasing Attention to Big Data in Management Science
The advent of big data generated considerable interest among scholars dealing with economic and managerial disciplines. Many authors focused on identifying the potential of big data availability for business management (Chen et al., 2012; McAfee et al., 2012; Provost & Fawcett, 2013). Nevertheless, the research conducted so far is uneven and too closely anchored to the study of the impact of this phenomenon on specific areas. Hence the need emerges for unitary frameworks capable of holistically representing its strategic potential.
Starting from this premise, the following subsections first identify the characteristics of big data and then explore the professional figures that organizations need in order to derive benefits from them.

A Taxonomy of the Expression 'Big Data'
As previously suggested, big data can be defined as any set of unstructured data that is too large and complex to be analysed using traditional information technology tools. Therefore, any dataset that cannot be analysed by means of rigid computer programs such as Excel, SPSS or Stata can potentially be identified as "big data". To analyse big data, an advanced knowledge of non-relational databases (e.g. NoSQL) and computer programming languages (e.g. Python, Java, C++) is necessary.
Volume, velocity and variety were the first three characteristics identified by scholars (McAfee et al., 2012). These are also known as the 3Vs. In addition to these historical features, four other modern features have recently been identified: veracity, variability, visualization and value. The four additional characteristics, together with the three historical ones, have been called the 7Vs that distinguish big data (Mishra, 2015; Zaïane, 2015).
In relation to the historical characteristics, McAfee et al. (2012) state that the difference from traditional datasets is the volume. In fact, only datasets exceeding one petabyte (10^6 gigabytes) in size can be classified as big data (McAfee et al., 2012). The choice of this unit as a reference is consistent with technological evolution. For example, Facebook accumulates more than 100 petabytes of data every day from the actions of online users (Kaplan & Haenlein, 2010). However, a dataset cannot be classified as big data according to its volume alone: huge structured datasets should be considered only as large datasets, that is, datasets of unusual dimension (Erevelles et al., 2016). From this, the importance of the other historical characteristics of big data can be understood. Big data also differ from traditional large datasets because their size can increase rapidly (velocity) and constantly (IBM, 2012). Finally, for a dataset to be classified as big data, it must be characterized by variety; it must therefore be unstructured (that is, not organized when it originated) and made up of not necessarily homogeneous sub-components (Davenport et al., 2015). Tweets, posts and reactions on Facebook, feedback on Amazon or eBay, videos uploaded on YouTube, photos shared on Instagram, text messages and voice messages have been identified over the last decade as components of most big data (George et al., 2014; McAfee et al., 2012).
Regarding the modern characteristics, IBM's report (2012) titled What is Big Data? identified veracity as the fourth characteristic: it has been stated that only authentic data originating from reliable sources can be considered big data (Mishra, 2015). Therefore, data cannot be considered wholly reliable if it is not possible to identify the process that generated them (Mishra et al., 2017). Regarding the characteristic of variability, on the one hand, it has emerged how the sub-components of big data can derive from numerous inhomogeneous sources (McAfee et al., 2012). On the other hand, since the generation process is constant due to the speed (velocity) that distinguishes big data, their content can change quickly, following changes in consumer preferences, for example (Demchenko et al., 2014). In summary, the fifth feature can be explained by describing big data as dynamic datasets or as a continuous flow of information (Davenport et al., 2015; Brondoni & Zaninotto, 2018).
Big data must also be characterized by potential visualization through computerized analysis technologies (Zaïane, 2015). The value of the information contained is the last modern feature (Erevelles et al., 2016; Rialti et al., 2016).
In light of the stated characteristics, therefore, it is possible to classify as big data any dynamic and unstructured dataset that cannot be analysed through the use of traditional data analysis software programs and that is characterized by high information potential due to its origin from reliable sources.
Over the last decade, the literature dealing with managerial science has explored the importance of this phenomenon. A first stream of research focused on the methods by which big data can be treated and analysed. The topics of greatest interest have concerned the storage of big data, the advanced calculation methodologies to process them and, finally, the development of information technology architectures composed of multiple connected hardware components. The solutions proposed for their analysis have been the use of techniques based on machine learning or systems based on cloud computing. It was observed how architectures centred on data lakes (which are huge cloud-based databases often shared between several organizations) can be a viable and technologically suitable solution. Meanwhile, a second stream of research investigated the informative potential of big data and how organizations must update themselves in order to take advantage of it (Demchenko et al., 2014). In particular, the authors working in this field have highlighted how organizations must develop the technical skills needed to analyse big data, or try to increase their organizational BDA capabilities. In addition, the literature has explored how organizations must also prepare ad hoc information analysis processes in order to analyse big data (Labrinidis & Jagadish, 2012).

Professional Figures in the Big Data Era
In order to take full advantage of the information potential of big data, the need has recently emerged for organizations to endow themselves, in addition to technologies and processes, with new professionals capable of analysing such data (Saccardi, 2003; Mishra, 2015). Davenport and Patil (2012, p. 70) have identified the data scientist as "the sexiest job of the 21st century". Given the various needs present in the organization, the data scientist is only one of the many professional figures required. In this sense, the relevant literature has shown that the presence of data analysts (Davenport et al., 2015), data architects (Gardiner et al., 2017), data engineers (Provost & Fawcett, 2013) and data managers (Miller, 2014) is also usually necessary for organizations that try to use big data to obtain competitive advantages.
The data analyst is a technician who deals with the structuring and analysis of data; his/her skills are related to the programming capability needed to analyse and organize unstructured computer data (De Mauro et al., 2016). The data scientist, by contrast, is responsible for extracting the most significant content from the data and identifying the most relevant patterns in them. Since the data scientist acts as a link between the analysts and the upper hierarchical levels, he/she must also be able to render the data readable and must be equipped with management, marketing and finance skills (De Castillo et al., 2015). The data architect, instead, is a figure who is concerned with organizing and maintaining the hardware structures for the collection, storage and analysis of big data (Gardiner et al., 2017). Similarly, the data engineer is a type of electronic-computer engineer who takes care of the software infrastructure (Provost & Fawcett, 2013). Finally, the data manager is designated to interpret the data and derive strategic decisions from them. Therefore, data managers are the people who actually make strategic data-driven decisions, that is, decisions based on both big data analysis results and managerial insight (Ciappei & Cinque, 2015), in order to turn big data information into competitive advantages (Provost & Fawcett, 2013).

The Fragmented Literature on Big Data in Organizations and Competitive Strategies
As mentioned, during the last decade the literature that deals with business management has begun to explore critically the impact of big data on organizations and on competitive strategies. Specifically, scholars have focused on identifying the impact of big data on traditional business intelligence processes (Chen et al., 2012), on marketing strategies (Erevelles et al., 2016), on operations management (LaValle et al., 2011) and on investment management.
Regarding their impact on business intelligence, Chen et al. (2012) have identified big data as the trigger for the transition from Business Intelligence 2.0, a business intelligence paradigm based on the use of semi-structured datasets and typical Web 2.0 technologies (Doan et al., 2011), to Business Intelligence 3.0, which is instead based on the real-time analysis of unstructured data coming from users' activities and machinery sensors (Chen et al., 2012). With this new technological paradigm, the possibility emerged to base the analyses directly on the single customer, product and process; to have data concerning the various locations of the individuals in whom the organization is interested; and, finally, to carry out context-relevant analyses on every aspect concerning business performance (Chen et al., 2012). Moreover, thanks to the use of cloud-based analysis and computation systems aimed at analysing big data, analyses can be performed in real time (Demirkan & Delen, 2013). To be able to take advantage of all the benefits associated with Business Intelligence 3.0, organizations have had to adapt their internal and external information systems to a 3.0 perspective (Goes et al., 2014).
As a result of the evolution of business intelligence systems, other potentialities of these resources have recently been identified. In fact, thanks to big data and their analysis systems, it is possible to carry out surveys directly on individuals, and marketers can now carry out sales campaigns aimed at the individual customer (Erevelles et al., 2016). Moreover, thanks to the speed with which the data reach the organization and the calculation speed of 3.0 information systems, customized marketing campaigns can be performed almost in real time (Provost & Fawcett, 2013). Therefore, big data allow managers to offer their customers combinations of products and services more pertinent to their requests, thus improving the organization-customer relationship (Mayer-Schönberger & Cukier, 2013; Barile & Polese, 2018). Thanks to the information potential of big data, moreover, marketing campaign planning managers can today try to extrapolate forecasts about the future behaviours of consumers by observing a high number of previous behaviours (Erevelles et al., 2016; Zirpoli & Cabigiosu, 2018). In terms of the impact on the management of organizations' operations, on the other hand, the possibility of making a triple check on organizations' operations has emerged, thanks to machinery sensors and tracking technologies. Specifically, 3.0 information systems capable of analysing big data have made it possible to increase the efficiency of procurement and distribution processes (Waller & Fawcett, 2013). In addition, thanks to the sensors placed on the organization's machinery, which in turn are connected to the internet (Rieple & Pisano, 2015; Caputo et al., 2016), managers are able to monitor each stage of the production process constantly through consoles that are typical of Business Intelligence 3.0 systems, and act accordingly (Waller & Fawcett, 2013).
Finally, it has been observed that thanks to big data (in particular, those deriving from customers' credit cards or concerning the credit history of subjects that have relations with the organization), it is possible for the organization to plan more favourable extension policies with customers and suppliers, thus improving the management of working capital (Kitchin, 2017).
Despite the high number of studies concerning the definition of big data, their impact on competitive business strategies and the professional figures required, there is an evident absence of frameworks able to represent holistically the challenges related to the availability of big data, and the same can be said for big data analytics capabilities in large organizations (Erevelles et al., 2016; Trento et al., 2018). This research seeks to contribute to the literature on the relevance of big data in business management, with particular attention to large organizations, which are the type of organization most affected by the advent of these new elements and the ones that can benefit most from these data.
Therefore, the research question underlying this work is the following:
RQ: Is it possible to create a framework capable of portraying the professional figures necessary in the era of big data, the disciplines of business management impacted by big data and the strategic potential of this type of data?

Methodology
In order to answer the aforementioned research question, a coding analysis was performed on a dataset composed of job offers, a procedure that is consistent with previous studies (De Mauro et al., 2016;Gardiner et al., 2017). Indeed, job offers have extraordinary information potential. In particular, since they contain the description of the offered position and the goals that the candidate will have to achieve, if well explored they may be able to reveal precious information about organizational strategies. Starting from these considerations, this research focuses on identifying the latent strategic function contained in the description of each offered position. In fact, following this procedure, it was possible to obtain the strategic functions of each figure. Then, it was possible to create a conceptual map indirectly inferring the strategic impact on the organization from each single strategic function identified in more than one job offer related to the same type of professional figure required.

Data Collection Procedure and Creation of the Dataset
The final dataset used in this research was built through an iterative process divided into several phases. Firstly, 14,931 job vacancies for positions in Europe containing the expression 'big data' in the job description were extracted using web scraping software. As a preliminary step, we scraped the LinkedIn site (De Mauro et al., 2016) in order to create the initial unstructured dataset to be analysed and transformed into a usable database from which to conceptualize the themes. The queries submitted through the freely downloadable Portia web scraper were structured as follows: Big_Data ∪ EU_Country. In order to collect data from all 28 member countries of the European Union, 28 queries were carried out, each time replacing EU_Country with the country name. Job offers were collected from an online platform that aggregates those already existing on multiple sites or on social media. Only those offers that had been posted for less than a month were selected, thus reducing the dataset to 10,430 units. In fact, it was necessary to consider only recent offers in order to exclude expired offers, offers incorrectly posted by organizations or any duplicates. The offers included in the dataset correspond to those posted for less than one month as of 23 March 2017, the day the data collection ended. Furthermore, in order to have a homogeneous dataset in terms of language, non-English offers (3,544) and further duplicates (490) were eliminated. Although only job offers in English were selected, the job offers in the dataset were well distributed across all countries. In fact, the selection process described above resulted in a dataset consisting of 467 job offers from all 28 countries. Specifically, 71 are for positions in the United Kingdom, 70 in Germany, 45 in France and 28 in Italy. The rest are distributed across the other 24 countries.
Approximately 65 per cent of the job positions in our dataset were offered by multinational organizations. About 45 per cent came from organizations that operate directly in the information technology sector. Finally, since the goal was to identify the impact on large organizations, only job offers posted by large organizations with more than 500 employees were included.
The final dataset consisted of 592 pages of text (Microsoft Word, single line spacing, Times New Roman pt. 10) and 185,945 words.
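The selection steps described in this subsection can be summarized in a short filtering sketch. It is only an illustration under stated assumptions, not the procedure actually run with Portia: the `JobOffer` record and its field names are hypothetical, "less than a month" is approximated as 30 days, and duplicates are detected by comparing normalized description text.

```python
from dataclasses import dataclass
from datetime import date, timedelta


# Hypothetical record layout; the field names are assumptions, not the authors' schema.
@dataclass(frozen=True)
class JobOffer:
    offer_id: str
    country: str
    language: str      # e.g. "en"
    posted_on: date
    employees: int     # size of the posting organization
    description: str


CUTOFF = date(2017, 3, 23)      # day the data collection ended
MAX_AGE = timedelta(days=30)    # "less than a month", approximated as 30 days
MIN_EMPLOYEES = 500             # large-organization inclusion parameter


def filter_offers(offers):
    """Apply the selection criteria in order and drop duplicate descriptions."""
    seen = set()
    kept = []
    for offer in offers:
        # Keep only offers mentioning 'big data' in the description.
        if "big data" not in offer.description.lower():
            continue
        # Keep only offers posted less than a month before the cutoff date.
        if not (timedelta(0) <= CUTOFF - offer.posted_on < MAX_AGE):
            continue
        # Keep only English-language offers.
        if offer.language != "en":
            continue
        # Keep only organizations with more than 500 employees.
        if offer.employees <= MIN_EMPLOYEES:
            continue
        # Treat offers with the same normalized text as duplicates.
        key = " ".join(offer.description.lower().split())
        if key in seen:
            continue
        seen.add(key)
        kept.append(offer)
    return kept
```

Calling `filter_offers` on a list of scraped records returns the subset that satisfies every criterion, in the original order.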

Preliminary Analysis
The collected data were analysed using NVivo 11 Plus software. NVivo 11 Plus is qualitative data analysis (QDA) software deemed particularly suitable for the identification of common patterns and common words within small and large volumes of qualitative data. In our case, we selected NVivo 11 Plus as it allows researchers to organize non-numerical unstructured data, to obtain information about linkages between words, and to develop models deriving from the interpretation of data. In order to have more information for the organization of the subsequent coding phase, an analysis of the 100 most common words in the dataset was performed (see Table 1 and Figure 1). The most frequent words that emerged from this preliminary analysis were 'data', 'experience', 'business', 'team', 'big', 'work', 'skills', 'analytics', 'solutions' and 'development'. These terms were the starting points of the coding analysis, together with the professional figures required in the big data era and the impacted strategic areas already identified by the literature.
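A word-frequency count of this kind can be approximated outside NVivo with a few lines of Python. The sketch below is a simplified stand-in, not NVivo's algorithm: the function name `most_common_words` and the small stop-word list are assumptions made for illustration.

```python
import re
from collections import Counter

# Illustrative stop-word list; an assumption, not the list NVivo applies.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "in", "to",
              "for", "with", "is", "are", "on", "we"}


def most_common_words(texts, n=100):
    """Return the n most frequent non-stop-words across all texts."""
    counts = Counter()
    for text in texts:
        # Lower-case the text and keep alphabetic tokens only.
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts.most_common(n)
```

Applied to the job descriptions with `n=100`, the function yields a ranked list of (word, count) pairs comparable in spirit to the preliminary analysis.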

Coding Analysis and Framework Design
The dataset was analysed according to the key words. The first step was to identify the key words related to the main professions regarding big data, thus identifying the first level of the framework. Then, based on the main tasks, their strategic impacts were recognized. The coding analysis of the dataset was carried out through the word tree function of NVivo 11 Plus, which creates word trees through the analysis of the co-occurrences of words in the text (an example is shown in Figure 2). In the specific case of this research, each word tree has a theme as its starting point, from which the branches representing the characteristics of the theme start. The use of co-occurrences as the initial nodes of a word tree is a particularly useful method in this research, as it reveals the linkages existing between two words throughout the text considered. Then, moving from several existing links, it is possible to create more complex trees and models based on the existing associations. The five main professions represent the themes selected for the first level of the framework: data analyst, data scientist, data engineer, data architect and data manager. The branches that originate from here show the main tasks that these professions perform. Regarding the second level, those aspects of organizations' management most affected by the availability of big data, according to the prevailing literature, have been selected as themes: strategic marketing, human resource management, operations management, corporate finance and information management. The branches that originate from there contain information about the strategic impact of these data on large organizations (Wamba et al., 2015; Büchi et al., 2018).
Figure 2. Source: Our processing obtained through NVivo 11 Plus
Therefore, it is possible to describe the whole process, from the identification of the keywords to the conceptualization phase.
We followed the four successive phases that characterize the so-called documentary coding method (Wrightson, 1976):
1) The text encoding, that is, the identification of the main variables (nodes of the map and word tree starting points), in turn categorized in this specific case as main themes, characteristics and managerial implications. It is worth noting that in this study we already had a fixed starting point represented by the professional figures (which are common to both literature and job offers).
2) The realization of the 'dictionary', that is, the list of all the concepts present and their verbalizations, in order to unify those with the same meaning (merging). In doing this, the authors manually examined all the posts and identified similar concepts expressed differently in the various posts. This operation was very useful in order to avoid having duplicate second-level nodes or replicas of concepts on several levels or in multiple positions of the framework.
3) The elaboration of the so-called 'relationship card', that is, a table that indicates all the causal relationships and connotations between the identified themes, characteristics and managerial implications. In order to do this, inspiration was first taken from the word trees (which represent relationship cards themselves); then the authors re-interpreted all the relationships of each word tree, identified the branches that could serve as a link between multiple separate word trees, and developed an initial draft of the framework.
4) The conceptualization of the framework. After writing down each single word tree, and after their qualitative analysis, a preliminary map was elaborated through the systematization of the main positions emerging in a single cognitive framework.
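Two of the steps above, the merging of verbalizations through a 'dictionary' and the extraction of word tree branches from co-occurrences, can be illustrated with a minimal sketch. It is a rough approximation under stated assumptions and does not reproduce NVivo's word tree function: the `DICTIONARY` entries, the function names and the fixed token window are all hypothetical choices.

```python
import re
from collections import Counter

# Hypothetical 'dictionary': verbalizations mapped to one canonical concept.
DICTIONARY = {
    "data scientists": "data scientist",
    "client": "customer",
    "clients": "customer",
    "customers": "customer",
}


def normalize(text):
    """Lower-case the text and merge verbalizations that share a meaning."""
    text = text.lower()
    # Replace longer phrases first so multi-word entries win over their parts.
    for phrase in sorted(DICTIONARY, key=len, reverse=True):
        text = re.sub(rf"\b{re.escape(phrase)}\b", DICTIONARY[phrase], text)
    return text


def branches(texts, theme, window=3):
    """Count words found within `window` tokens after each occurrence of `theme`."""
    theme_tokens = theme.split()
    k = len(theme_tokens)
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z]+", normalize(text))
        for i in range(len(tokens) - k + 1):
            # A match of the theme opens a branch made of the following tokens.
            if tokens[i:i + k] == theme_tokens:
                counts.update(tokens[i + k:i + k + window])
    return counts
```

The counts returned for a theme such as 'data scientist' suggest candidate branches; deciding which of them become nodes of the framework remains an interpretive step for the researchers.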

Discussion of Results and Managerial Implications
The results obtained from our analysis show the effects of big data on organizations and the challenges and opportunities connected to big data and BDA. The two main contributions of this research to literature are the identification of professionals able to seize the emerging opportunities and the way in which organizations can benefit from big data.
In respect of the first contribution, it has emerged how managers characterized by skills in big data are fundamental. These managers are extremely important in the marketing sector, as well as in the corporate finance, human resource management, information management and operations management sectors. These managers should indeed orchestrate all the other players that will have to deal with these data. Another professional figure of extreme importance is the data analyst, who plays a crucial role in the analysis of data. We observed that data analysts must help managers to develop marketing and operations strategies. The data scientist, on the other hand, is fundamental to the use of big data in financial strategies. Finally, data architects and engineers have roles that are mainly related to the use of big data for operations management and for the modernization and design of information systems.
Regarding the opportunities connected to the availability of big data, numerous implications concerning strategic marketing have emerged. In fact, thanks to big data, it is possible both to analyse the markets in an aggregated manner and to track the behaviour of the individual consumer. Thus, it emerges that this phenomenon can play a crucial role in terms of consumer intelligence. Moreover, customer relationship management (CRM) systems, customer communication and online experience can be improved by the newly available consumer data. Thus, big data can be used for more effective marketing campaigns (Faraoni et al., 2019). Specifically, they can help marketers in consumer segmentation and in measuring advertising effectiveness.
With regard to the impact of big data on corporate finance, it emerged that, in addition to being generically useful for risk management and the prevention of online fraud, they play a fundamental role in customer credit scoring (Santoro et al., 2018). In human resources management, big data analysis allows the identification of online talents and online head hunting. As for the impact on information systems, big data are revolutionizing the classic concept (Wang & Cotton, 2018). It is in fact necessary that organizations adapt their information systems and, to do so, the process of collecting and storing information is central, since in the current economic system the value of information, also with a view to reselling it to third parties, is a source of potential income (Rialti et al., 2018). Finally, big data impact both operations management and supply chain management planning. As in any research related to a specific geographical area, the results are to be considered context-dependent. In fact, the findings derive from how European companies' employees deal with the analysis of big data. This notwithstanding, from the analysis it emerges that big data are fundamental for marketing intelligence, revenue and financial intelligence, operations intelligence and information intelligence, which together compose big data-driven business intelligence. Therefore, thanks to big data and the 3.0 information systems that analyse them, managers can now make sales forecasts and monitor production constantly and precisely. Given the possibility of predicting the behaviour of the individual customer, moreover, these forecasts are characterized by extreme reliability (Erevelles et al., 2016; Rialti et al., 2017; Fornari et al., 2018). In terms of suggestions for practitioners, big data-driven decision making is therefore emerging as an innovative approach allowing managers to take big data-based decisions in combination with their intuition.
From this, in respect of the potential managerial implications, it is possible to state that managers should continue to encourage investment in BDA. Such investment, in fact, can provide organizations with innovative technological tools capable of generating significant competitive advantages. In any case, it emerges that these investments must also be linked to investments aimed at changing the organization's culture in order to accept big data inside the decision-making processes. This is necessary because otherwise some internal professional figures (in particular, those not data skilled) could resist the implementation of BDA.

Conclusions, Limitations and Suggestions for Future Research
Based on our analysis, it was found that big data are fundamental, especially for business intelligence improvement and, subsequently, strategic marketing campaigns. Our results are thereby consistent with the existing literature; they help to shed light on why big data and BDA matter. It is clear, in fact, that managerial figures are needed to handle the strategy-making aspect (Provost & Fawcett, 2013; Brondoni, 2015) using data from the business intelligence systems. Furthermore, we have seen that organizations mainly seek to use big data to predict customer behaviour.
The results obtained are not fully generalizable, because the job offers explored are located in Europe in a particular period of time, and only job offers from large organizations have been collected. Moreover, in terms of the limitations connected to the method, the coding process performed was supervised in nature: in fact, the data were only partially analysed by automatic software based on well-identified principles. Given such methodological limitations, possible future research should use larger datasets with unsupervised analysis methodologies based on machine learning, such as latent Dirichlet allocation (Blei et al., 2003).
As for the implications for future research, we suggest that future researchers who want to address issues related to big data explore how managers can encourage organizational BDA capabilities. Furthermore, the potential benefits that can be obtained from the entry of big data into the organization are still to be explored in detail. For example, it may be necessary to explore how they change the micromechanisms of the internal operation of large organizations. It would be useful to analyse whether big data can help make large organizations more agile and dynamic, or if they represent a burden. Finally, it is suggested that scholars who want to address the topic from a more academic perspective also review the main theories used in the big data literature.