Streaming data

From Wikipedia, the free encyclopedia

Streaming data is data that is continuously generated by different sources. Such data should be processed incrementally using stream processing techniques, without access to all of the data at once. In addition, concept drift may occur in the data, meaning that the properties of the stream may change over time.
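As a minimal sketch of incremental processing, the following Python example keeps a running mean and standard deviation using Welford's algorithm, so that no past values need to be retained. The drift check is a deliberately simple heuristic chosen for illustration (it is not a method described in the sources cited here): it flags a point when the mean of the most recent values departs from the long-run mean by more than a chosen number of standard deviations. The names RunningStats and detect_drift, and the default window and threshold, are assumptions.

```python
from collections import deque
from math import sqrt


class RunningStats:
    """Incrementally tracks mean and variance (Welford's algorithm)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def std(self):
        # Population standard deviation; 0.0 until two values have been seen.
        return sqrt(self._m2 / self.n) if self.n > 1 else 0.0


def detect_drift(stream, window=5, threshold=3.0):
    """Yield the index of every element at which the mean of the last
    `window` values differs from the long-run mean of all earlier values
    by more than `threshold` long-run standard deviations."""
    stats = RunningStats()
    recent = deque(maxlen=window)  # only the last `window` values are kept
    for i, x in enumerate(stream):
        recent.append(x)
        if len(recent) == window and stats.n >= window and stats.std > 0:
            recent_mean = sum(recent) / window
            if abs(recent_mean - stats.mean) > threshold * stats.std:
                yield i
        stats.update(x)  # the long-run statistics absorb x afterwards


# Hypothetical readings: stable around 1.0, then a jump to 5.0.
readings = [1.0, 1.1, 0.9, 1.0, 1.1, 0.9, 1.0, 1.1, 0.9, 1.0] + [5.0] * 5
first_drift = next(detect_drift(readings))
```

Here first_drift is the position of the first reading after the jump. Because the long-run statistics keep absorbing new values, a detector of this kind adapts to the new level over time; production systems typically use more robust drift detectors.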

The term is usually used in the context of big data, where data is generated by many different sources at high speed.[1]

Data streaming can also be described as a technology used to deliver content to devices over the internet, allowing users to access the content immediately rather than having to wait for it to be downloaded.[2] Big data is forcing many organizations to focus on storage costs, which brings interest to data lakes and data streams.[3] A data lake is a store of large amounts of unstructured and semi-structured data. It has become useful with the growth of big data, because the data can be stored in such a way that firms can dive into the data lake and pull out what they need at the moment they need it.[3] A data stream, by contrast, supports real-time analysis of streaming data; it differs from a data lake in the speed and continuous nature of the analysis, which does not require storing the data first.[3]
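The contrast with a data lake, analyzing data as it arrives rather than storing it first, can be illustrated with a tumbling-window count. The Python sketch below is an assumption-laden illustration rather than any particular platform's API: the function name tumbling_window_counts is hypothetical, events are represented only by integer timestamps in seconds, and they are assumed to arrive in non-decreasing time order.

```python
def tumbling_window_counts(timestamps, window_seconds=60):
    """Yield (window_start, count) as soon as each window closes.

    Only the counter for the current window is held in memory; individual
    events are never stored. Windows that contain no events are skipped.
    """
    current_start = None
    count = 0
    for ts in timestamps:
        start = ts - (ts % window_seconds)  # start of the window holding ts
        if current_start is None:
            current_start = start
        elif start != current_start:
            # An event from a later window has arrived: emit the closed one.
            yield current_start, count
            current_start, count = start, 0
        count += 1
    if current_start is not None:
        yield current_start, count  # flush the final, still-open window


# Hypothetical event times in seconds since the start of the stream.
results = list(tumbling_window_counts([0, 10, 59, 60, 61, 185]))
```

Each result is available as soon as its window closes, which is what allows analysis to keep pace with the stream instead of waiting for a stored batch.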

Before explaining the benefits of streaming data, it is important to understand the difference between digitization and digitalization. Digitization is the creation (encoding) of digital information (e.g., a file) from analog information. When a digital camera takes a photo, the light entering its lens is the analog information, and the photo file is the digital representation of that light. The camera is digitizing the visual information.[4][5] Digitalization mainly describes a socio-technical process in which a society or organization adopts digital forms for data storage. Converting the analog information in a library (physical books) into digital formats (eBook documents), for example, would be the digitalization of that library.[5] Within the context of data streaming, media has been digitized since the early 1990s with the adoption of digital recordings, e.g., storing music and videos in digital forms, while the digitalization of media did not start until the beginning of the 21st century.[6]

Digital innovation management theories like Metabott mention five characteristics of streaming data: homogenization and decoupling, modularity, connectivity, digital traces and programmability.

Homogenization and decoupling. “Because all digital information assumes the same form, it can, at least in principle, be processed by the same technologies. Consequently, digitizing has the potential to remove the tight couplings between information types and their storage, transmission, and processing technologies”.[7] Within the context of data streaming, this means in theory that data can now be streamed from any digital device. It also reduces the demand for and use of music and films on physical media such as CDs. One of the consequences of homogenization and decoupling is the decline of marginal costs.[8] The marginal cost of data streaming is low because it relies solely on digital information, which can be transmitted, stored, and computed quickly and cheaply.[8] An example of an industry that has low marginal costs due to data streaming is the music industry. Producers can now digitize songs and upload them to Spotify, instead of paying to manufacture and distribute physical albums. Another consequence is a convergent user experience, meaning that previously separate experiences are now brought together in one product.[8]

Data streaming is also modular, because system components may be separated and recombined, mainly for flexibility and variety. Data streaming works across different application versions and systems, such as iOS. It is also possible to change the speed of data streaming.[9] A consequence of modularity is the creation of platforms. Data streaming platforms bring together the analysis of information but, more importantly, they are able to integrate data from different sources (Myers, 2016). IBM Streams, for example, is an analytics platform that enables applications developed by users to gather, analyze and correlate information that comes to them from a variety of sources (IBM).

The third characteristic, connectivity, means that a digital technology connects not only applications, devices and users but also customers and firms. Streaming services, for example, connect the vast collections of music and films of ‘producers’ with their consumers, which is how music on Spotify can easily reach a vast group of consumers. Another example is data from transport vehicles, which can be connected to firms through streaming applications via vehicle-to-roadside communications.[10] UPS does this, for example, to ‘calculate’ the optimal delivery routes by streaming real-time big data, thereby reducing the time needed to deliver packages.

Interoperability, which is the ability of a product or system to work with other products or systems,[8] is a consequence of connectivity. For instance, the music industry is interoperable, because some music platforms have integrated social media platforms.[11] Another consequence of connectivity is network externality. This means that the value of a good to a user increases with the number of other users (the installed base) of the same or a similar good.[8] Data streaming technology can utilize network externalities, because it brings together the supply and demand of large networks of creators and consumers. This is very much the case with Popcorn Time, a service where people can stream the latest movies on demand. These streams work better when more people are streaming the same content.

This is because anyone who streams content also automatically downloads and uploads it. While a streaming service is being used it leaves digital traces, a term that simply describes the fact that all digital technologies leave a digital trace of the user.[8] In the past, when media was sold, the seller or provider only had information about the transaction itself. With data streaming it has become possible to track the behaviour of users, because consumption occurs in real time, directly from the distributor or provider. Morris and Powers[12] describe this as opening the 'black box' of consumption. Providers of streaming services, for example, are now able to track the detailed consumption behaviour of the user, which they in turn use to influence the user's decision-making process by creating algorithms that further develop the service. This kind of streaming has changed the way people consume media, which in time has opened up possibilities for new ideas.[12] These are also referred to as wakes of innovation[8] and occur in places one would not initially expect. For instance, data streaming has enabled the development of sensors that are used in many sectors for different purposes. In the manufacturing sector, data streaming is used for real-time analysis to improve operations. In the healthcare sector, sensors are used in connected medical devices to create hubs of patients and healthcare providers that can trigger alerts when a patient has a medical emergency.[13]

Finally, programmability means that an innovative digital technology can be reprogrammed, improved and/or updated.[8] Consequences of programmability are emerging functionalities. The most applicable of these is incompleteness, which means that products and services are never finished.[8] This is the case for data streaming, because suppliers will keep refreshing their models.[14] However, a more influential consequence of programmability, and also of connectivity, is the servitization of digital media content. Data streaming has caused a shift towards paying for use instead of paying for ownership.[8][12] This is happening in the video and music streaming industry, with services such as Netflix or Spotify: users pay to use the service instead of owning a product. Buying an album or DVD meant owning that one product, whereas now it is possible to access thousands of songs or movies.

Implications

Data streaming is becoming more useful and necessary in today's world and is being applied in a broad range of industries, some of which have already been mentioned in examples, such as the medical and transportation industries. Other examples of industries or markets where data streaming is applicable are:

Finance: Data streaming makes it possible to track changes in the stock market in real time, compute value-at-risk, and automatically rebalance portfolios based on stock price movements.[15]

Real-estate: Websites can track a subset of data from consumers’ mobile devices and make real-time recommendations of properties to visit based on their geo-location (Amazon).

Gaming: An online gaming company can collect streaming data about player-game interactions and feed the data into its gaming platform (Amazon).

E-commerce/Marketing: An e-commerce site can stream all clickstream records from its online properties, aggregate and enrich the data with demographic information about users, and optimize content placement on its site, delivering more relevant content and a better experience to customers (Amazon).
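The e-commerce case in the list above (enriching clickstream records with demographic information and aggregating them) can be sketched in Python as follows. This is an illustrative sketch only: the record fields, the DEMOGRAPHICS lookup table, and the names enrich and SegmentPageCounts are all hypothetical and do not correspond to any vendor's service.

```python
from collections import Counter

# Hypothetical lookup table: user id -> demographic segment.
DEMOGRAPHICS = {"u1": "18-24", "u2": "25-34", "u3": "18-24"}


def enrich(clicks, demographics):
    """Attach a demographic segment to each click record as it arrives."""
    for click in clicks:
        segment = demographics.get(click["user"], "unknown")
        yield {**click, "segment": segment}


class SegmentPageCounts:
    """Running page-view counts per segment, updated one click at a time."""

    def __init__(self):
        self._counts = {}  # segment -> Counter of page views

    def update(self, click):
        pages = self._counts.setdefault(click["segment"], Counter())
        pages[click["page"]] += 1

    def top_page(self, segment):
        """Most viewed page so far for a segment, or None if none seen."""
        pages = self._counts.get(segment)
        return pages.most_common(1)[0][0] if pages else None


# Hypothetical clickstream; in practice this would be an unbounded source.
clicks = [
    {"user": "u1", "page": "/shoes"},
    {"user": "u2", "page": "/bags"},
    {"user": "u3", "page": "/shoes"},
    {"user": "u1", "page": "/hats"},
    {"user": "u9", "page": "/bags"},  # user with no demographic record
]

counts = SegmentPageCounts()
for record in enrich(clicks, DEMOGRAPHICS):
    counts.update(record)
```

After each update, a call such as counts.top_page("18-24") reflects everything seen so far, so a site could in principle adjust content placement for a segment without waiting for a batch job.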

Besides these examples, there are probably many more applications for data streaming. However, data streaming has had the biggest implications for the audio, video and telecom industries because of the creation of streaming services. Streaming services have strongly influenced how people consume media.[16] Because streaming services are where data streaming technology has had the most significant impact, they are the main focus of the rest of this page.

Impacted industries

References
