
Kunlun Data: Seven Misunderstandings of Industrial Big Data

There are seven misunderstandings, grouped into three categories: the planning level, the technical route, and the implementation level, together with the problems each may cause.

Unspoken: data analysis without business logic is a waste

Case: I have explored big data analysis of air compressors with my partners. Among big data applications involving industrial equipment, there are many cases of PHM (prognostics and health management).

But before jumping to intelligent operation and maintenance, it is best to discuss the intended business logic first. Is our goal to run a third-party operation and maintenance business, where big data improves maintenance efficiency, or to use air compressor data to support a new business model, such as supply chain finance, business process optimization, or energy efficiency optimization?

First, let's analyze. Setting the actual conditions aside and assuming the technology is 100% successful, what can be achieved once this technology is realized?

How much does this compressor cost to operate and maintain in a year? If I am the equipment manufacturer, for whom am I doing intelligent operation and maintenance, and to achieve what? Where does my income or my cost come from?

If the income from intelligent operation and maintenance, including the profit margin, is very low, and neither the industry chain nor the industry as a whole is pushing in this direction, it is advisable to change the business logic. Don't rush to imitate cases that seem mature in other fields. First ask yourself hard questions about whether the essence of your business is sound.
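The questions above can be turned into a back-of-the-envelope check before any modelling starts. The following is a minimal sketch: the function name, the revenue-sharing arrangement, and every figure are assumptions made up for illustration, not numbers from any real compressor business.

```python
def annual_margin(om_cost_per_unit, units, saving_rate,
                  revenue_share, service_cost_per_unit):
    """Yearly margin of a hypothetical intelligent O&M service.

    Assumes the technology works perfectly and the provider is paid
    a share of the maintenance cost it saves the equipment owners.
    """
    savings = om_cost_per_unit * units * saving_rate   # saved by the owners
    revenue = savings * revenue_share                  # provider's share
    cost = service_cost_per_unit * units               # sensors, platform, staff
    return revenue - cost


# Illustrative numbers only: 100 compressors, 10,000 per unit per year in O&M,
# 25% saved, half of the saving paid to the provider, 1,500 per unit to serve.
margin = annual_margin(10_000, 100, 0.25, 0.5, 1_500)
print(margin)
```

With these invented figures the margin is negative even though the technology is assumed to succeed, which is the kind of result that should prompt a change of business logic rather than a better algorithm.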

Take wild geese as soup: a utopia that ignores constraints is unlikely to succeed

Case: this kind of problem usually appears in scheduling optimization and operations research optimization. When it comes to production scheduling, everyone naturally wants global optimization. That is everyone's dream, since only global optimization seems to leave real room for improvement. In reality, however, each problem must be analyzed on its own terms, and we cannot escape the constraints of the physical world.

For example, one of the most important problems at a port's container terminal is yard optimization. The storage yard determines capacity, and terminals such as Hong Kong's are relatively crowded, so there is considerable room for optimization and the potential benefit is large. We need to work out how to support loading and unloading quickly once a ship arrives.

But how large should the business scope of yard optimization be? Customers want to optimize the storage yard end to end: when a container arrives, its optimal location should be determined. What this overlooks is that optimizing the yard has several prerequisites.

First, there must be a reasonably accurate forecast of container arrival volumes.

Second, the maintenance cycle data for all equipment, the scheduling data, the ship schedule data, and other related data are needed.

Third, truck congestion in the yard must be avoided. If all containers for the same ship are stacked together, local congestion may occur during loading.

In reality, it is difficult to obtain such complete data, and there are many constraints along the way. First, it is difficult to forecast traffic accurately. Second, ship arrivals are supposed to follow a fixed schedule, but weather and factors such as the current epidemic are not fully controllable. If the optimization rests on a large number of assumptions, its benefit may be heavily discounted.
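The third constraint, not stacking all containers of one ship together, can be sketched as a simple greedy placement rule. This is an illustrative toy, not a terminal operating system's algorithm: the function name, the block names, and the per-ship cap are all assumptions, and it ignores arrival forecasts and equipment schedules entirely.

```python
from collections import defaultdict


def assign_blocks(containers, block_capacity, max_per_ship_per_block):
    """Greedily place containers into yard blocks.

    containers: list of (container_id, ship) in arrival order.
    block_capacity: {block name: number of slots}.
    max_per_ship_per_block: hypothetical cap that spreads one ship's
        containers over several blocks to limit truck congestion.
    Returns {container_id: block name, or None if nothing fits}.
    """
    load = {block: 0 for block in block_capacity}
    per_ship = defaultdict(int)  # (block, ship) -> containers placed
    placement = {}
    for container_id, ship in containers:
        candidates = [
            block for block in block_capacity
            if load[block] < block_capacity[block]
            and per_ship[(block, ship)] < max_per_ship_per_block
        ]
        if not candidates:
            placement[container_id] = None  # a constraint cannot be met
            continue
        # Least-loaded block first; ties broken by block name.
        best = min(candidates, key=lambda block: (load[block], block))
        placement[container_id] = best
        load[best] += 1
        per_ship[(best, ship)] += 1
    return placement


arrivals = [("c1", "A"), ("c2", "A"), ("c3", "A"), ("c4", "B")]
print(assign_blocks(arrivals, {"B1": 2, "B2": 2}, max_per_ship_per_block=2))
print(assign_blocks(arrivals, {"B1": 2, "B2": 2}, max_per_ship_per_block=1))
```

Tightening the cap from 2 to 1 leaves the third container of ship A without a slot, which shows how each added real-world constraint shrinks the room that a "global" optimum had on paper.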

This applies not only to the terminal but also to scheduling optimization in the factory. Although we pursue global optimization, we still need to consider the actual conditions: which data are not available, and how much cost would actually be saved after obtaining them. These questions deserve serious consideration. Of course, on the technical side we will still do our best.

Castles in the air: data analysis that does not match the organizational structure is difficult to implement

We have done some projects in the past that were technically feasible and even reasonably accurate. For example, in equipment fault diagnosis and fault prediction for some major components, even though samples are relatively scarce, combining mechanical knowledge with data mining can sometimes produce good results.

But when the results were put into practice, everyone was frustrated. The trouble is that a detected fault can imply, under the existing assessment system, that the operation and maintenance team has not done its job and that past regular maintenance was poor. In that situation, it is usually hard to expect the front-line team on site to give honest or timely feedback.

You can imagine that many topics, including quality improvement, run into similar problems. Many prediction projects outside industry, even in commerce, run into them too when implemented. The topic typically falls under the responsibility of one department, which has been handling it every day based on its own experience. Data analysis can now do better than before, but unless the project is placed under that department's jurisdiction and the organizational structure is adjusted accordingly, it is usually difficult for the department to really adopt it.

Avoiding the substance and chasing the hollow: following fashion, talking about formulas, and forgetting the practices that were feasible in the first place

Take box office prediction for cinemas as an example. The prediction made before release determines how many screenings the film gets, in which time slots, and what scheduling strategy to adopt.

At that time, Google published a paper saying that a film's box office could be predicted accurately from Google search volume. It is a highly cited paper, and many people in China were very excited about it. But when the method was applied in China, the accuracy turned out to be less than ideal. In fact, we were skeptical: what fundamentally determines a film's box office?

For example, how well does the film's genre match the region? Is it a horror film or something else? A cinema chain contains different kinds of cinemas, some in residential areas and some in the CBD. Audiences in Chengdu like ancient-tomb films, Guangzhou likes Cantonese films, Shanghai likes petty-bourgeois films, and anti-Japanese war films and martial arts films generally do better in Harbin. Are these regional preferences reflected in search volume? Beyond the theme, there is also how active the actors are, the activity on social media, and which awards the director has won recently. Later we added many more factors: geographic information, past sales trends across different theatres, the growth of each actor's influence on particular social media platforms, and which combinations of directors and actors work well together.
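One way to picture the added factors is as a feature row per film and cinema, in which search volume is only one column. The sketch below is an assumption about how such a row might be assembled; the field names, the `past_genre_share` table, and all values are hypothetical and do not describe the model actually built.

```python
def build_features(film, cinema, past_genre_share):
    """Combine the national search signal with local fundamentals
    for one (film, cinema) pair. All field names are illustrative."""
    return {
        # Same for every cinema: the signal the search-volume approach uses.
        "search_volume": film["search_volume"],
        # Share of past ticket sales in this city that went to this genre.
        "genre_share_in_city": past_genre_share.get(
            (cinema["city"], film["genre"]), 0.0),
        # Cinema type: CBD versus residential area.
        "is_cbd": 1.0 if cinema["location_type"] == "cbd" else 0.0,
        "director_recent_awards": float(film["director_recent_awards"]),
    }


# Made-up inputs for illustration only.
film = {"search_volume": 0.8, "genre": "cantonese", "director_recent_awards": 1}
past_genre_share = {("Guangzhou", "cantonese"): 0.30, ("Harbin", "cantonese"): 0.02}
guangzhou = build_features(
    film, {"city": "Guangzhou", "location_type": "cbd"}, past_genre_share)
harbin = build_features(
    film, {"city": "Harbin", "location_type": "residential"}, past_genre_share)
print(guangzhou)
print(harbin)
```

Both rows carry the same search volume, so a model that sees only that column must predict the same outcome for both cinemas; the regional columns are what let it tell them apart.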

Of course, we would like to predict box office or spare parts demand in the simplest possible way, but we still need to ask the more essential questions and consider the fundamentals. Sometimes the prediction itself is not the hard part; the hard part is accounting for external, man-made, and uncontrollable influences, including macroeconomic changes.

In fact, what data analysis should tackle first are scenarios that recur in practice, where human experience performs poorly and people especially hope that data can help.

This requires us to anticipate certain situations during data processing, even those the current model and data cannot support, and at least to know the scope within which the technique applies, rather than building something that works particularly well in one specific situation and then extrapolating from it.

Any model is a simplification of the physical world and cannot be separated from it. With digital twins, too, it depends on whether the model is used in the R&D stage or in the operation and maintenance stage. After all, no model can be 100% identical to the physical world.

In practice, what are the fundamentals? For example, to predict demand for bulk materials, we need to sort out their supply and demand and identify the driving factors. There is no need to quantify everything at first; start by listing the relevant factors. Likewise, when optimizing equipment operation or monitoring faults, we should not start with all kinds of complex formulas. We should first understand how the basic quantities influence and drive one another.

Evasion: doing scientific work with an unscientific attitude, in the name of science

In industrial data analysis, we need to know the boundary within which a model can be used. No model can solve every problem or apply to every situation, unless it is pseudoscience.
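One simple way to make a model's usage boundary explicit is to record the input ranges it was validated on and refuse to predict outside them. The wrapper below is a minimal sketch of that idea; the class names, the toy temperature relation, and the 20 to 90 percent load range are assumptions for illustration only.

```python
class OutOfScopeError(ValueError):
    """Raised when an input lies outside the range the model was validated on."""


class BoundedModel:
    """Wraps a prediction function with an explicit applicability range."""

    def __init__(self, predict_fn, valid_ranges):
        self.predict_fn = predict_fn
        self.valid_ranges = valid_ranges  # {feature name: (low, high)}

    def predict(self, features):
        outside = sorted(
            name for name, (low, high) in self.valid_ranges.items()
            if not low <= features[name] <= high
        )
        if outside:
            raise OutOfScopeError(f"outside validated range: {outside}")
        return self.predict_fn(features)


def toy_temperature_model(features):
    # Hypothetical linear relation between load and discharge temperature.
    return 40.0 + 0.5 * features["load_pct"]


model = BoundedModel(toy_temperature_model, {"load_pct": (20.0, 90.0)})
print(model.predict({"load_pct": 60.0}))

try:
    model.predict({"load_pct": 120.0})
except OutOfScopeError as err:
    print("refused:", err)
```

Raising an error is only one possible design; returning the prediction together with an out-of-scope flag would serve the same purpose where the caller must always receive a value.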

Which of the six stages of data analysis really takes the most time? It is the first one, understanding the business problem, which is also the most critical. Of course, CRISP-DM assumes by default a clean division of labor, in which data analysts only do data mining and data analysis. Reality is not so ideal. The problem that others have framed for you may not be correct. To some extent, we need to redefine the problem, not just understand it.

Of course, this also requires understanding across disciplines. Background knowledge of an unfamiliar field is very important. If you do not know the principles of chemical engineering or the basic dynamics of electrical systems and simply dig in blindly, you are likely to "discover" things that are common sense in the field, which is a waste of resources.

The second most time-consuming stage is data preparation. Under normal conditions data mining is easy, but most of the effort goes into handling situations that look abnormal yet occur often in practice. A rigorous data analyst should spot signals in the data that business experts did not anticipate early on: things they thought could never appear in the data, or things they were so used to that they never noticed or mentioned them. Such signals can greatly affect the accuracy of the analysis model, and handling them matters a great deal when the model is to run automatically.

Data is only a representation. As data analysts, our attitude is to trust data but not to worship it, because the data collection method itself may be biased. Take the "survivorship bias" mentioned earlier: only the planes that were not shot down flew back, so a great deal of information was lost, because the planes hit in their weak spots never returned.
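The effect can be reproduced with a few lines of counting. The records below are invented solely to illustrate the mechanism; they are not historical data, and the section names and `returned` flag are assumptions of this sketch.

```python
from collections import Counter


def hit_share(planes):
    """Fraction of all recorded hits that fall on each section."""
    counts = Counter(section for plane in planes for section in plane["hits"])
    total = sum(counts.values())
    return {section: count / total for section, count in counts.items()}


# Invented records: in this toy fleet, engine hits usually bring a plane down.
fleet = [
    {"hits": ["engine"], "returned": False},
    {"hits": ["engine"], "returned": False},
    {"hits": ["engine", "wing"], "returned": False},
    {"hits": ["wing"], "returned": True},
    {"hits": ["fuselage"], "returned": True},
    {"hits": ["wing", "fuselage"], "returned": True},
    {"hits": ["engine"], "returned": True},
]

all_share = hit_share(fleet)
observed_share = hit_share([plane for plane in fleet if plane["returned"]])
print("engine share, whole fleet:   ", all_share["engine"])
print("engine share, returners only:", observed_share["engine"])
```

An analyst who sees only the returning planes would conclude that engines are rarely hit, when in this toy fleet they are the most frequently hit section. The collection mechanism, not the analysis, produced the wrong picture.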

The collection method and accuracy of the data, including how samples were selected, may mislead us. The analysis may appear to work well while the data itself does not reflect physical reality. The installation position of a sensor and its measurement principle can also affect the data. In such cases the data must be mined with extra caution. As in electromechanical and other engineering disciplines, this calls for repeated deliberation, which is a tangled and painful process.

To some extent, the whole process of data analysis is the same as the traditional engineering method. Everything starts from certain assumptions, which are tested against reality, or at least a relatively objective reality. After verification comes repeated observation, and only then can a regularity be said to reflect the physical reality.

What is a scientific attitude? We should keep asking questions: every claim should be open to falsification or verification, and nothing is absolutely right or wrong.

Giving up the fundamentals for the trivial: complicating a simple problem

Data analysts sometimes inadvertently complicate a problem. Some process mechanisms are very simple and the fundamentals are already clear. There is no need to drag such a problem into deep learning or some other sophisticated method.

Simple problems should be dealt with simply. Don't spend too much time on unimportant places. Many data analysts, myself included, tend to be overly meticulous.
