Data Mining and Data Warehousing
1. Business Data Analysis:
Popular commercial applications of data mining include direct mail targeting, credit scoring, churn prediction, stock trading, fraud detection, and customer segmentation. Data mining is closely allied to data warehousing, in which large (gigabyte-scale) corporate databases are constructed for decision support applications. Rather than relational databases queried with SQL, these warehouses are often multi-dimensional structures used for so-called on-line analytical processing (OLAP). Data mining goes a step beyond the directed questioning and reporting of OLAP in that the relevant results cannot be specified in advance.
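The difference between a directed OLAP question and an undirected search can be illustrated with a small sketch. It is only an illustration: the customer records, the field names (region, plan, contract, churned) and the functions rollup and mine_segments are all hypothetical, and a real warehouse or mining tool would work very differently at scale.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical customer records, invented purely for illustration.
records = [
    {"region": "north", "plan": "basic", "contract": "monthly", "churned": True},
    {"region": "south", "plan": "basic", "contract": "monthly", "churned": True},
    {"region": "north", "plan": "basic", "contract": "monthly", "churned": True},
    {"region": "south", "plan": "basic", "contract": "annual", "churned": False},
    {"region": "north", "plan": "premium", "contract": "monthly", "churned": False},
    {"region": "south", "plan": "premium", "contract": "annual", "churned": False},
    {"region": "north", "plan": "premium", "contract": "annual", "churned": False},
    {"region": "south", "plan": "premium", "contract": "monthly", "churned": False},
]

def rollup(records, dimension):
    """Directed, OLAP-style question: churn rate along one dimension
    that the analyst has chosen in advance."""
    totals = defaultdict(lambda: [0, 0])  # value -> [churned count, total count]
    for rec in records:
        cell = totals[rec[dimension]]
        cell[0] += rec["churned"]
        cell[1] += 1
    return {value: churned / total for value, (churned, total) in totals.items()}

def mine_segments(records, dimensions, min_support=2):
    """Undirected search: score every pair of attribute values and rank
    the segments by churn rate, without naming a segment in advance."""
    totals = defaultdict(lambda: [0, 0])
    for rec in records:
        for d1, d2 in combinations(dimensions, 2):
            key = ((d1, rec[d1]), (d2, rec[d2]))
            totals[key][0] += rec["churned"]
            totals[key][1] += 1
    scored = [(c / t, t, key) for key, (c, t) in totals.items() if t >= min_support]
    # Highest churn rate first; ties broken by larger support.
    return sorted(scored, key=lambda s: (-s[0], -s[1]))

by_region = rollup(records, "region")
ranked = mine_segments(records, ["region", "plan", "contract"])
print("Churn rate by region:", by_region)
print("Highest-churn segment found:", ranked[0])
```

In the first call the analyst has already decided that region is the interesting dimension. In the second, the search itself proposes which combination of attributes deserves attention, which is the sense in which the result is not specified in advance.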
2. Scientific Data Analysis:
Rules generated by data mining are empirical - they are not physical laws. In most research in the sciences, one compares recorded data with a theory that is founded on an analytic expression of physical laws. The success or otherwise of the comparison is a test of the hypothesis of how nature works expressed as a mathematical formula. This might be something fundamental like an inverse square law. Alternatively, fitting a mathematical model to the data might determine physical parameters (such as a refractive index).
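As a minimal sketch of this theory-driven approach, the example below tests an inverse square law against measurements by fitting a power law I = k * r**n with ordinary least squares in log-log space. The measurements are synthetic and the function name fit_power_law is an assumption made for illustration; if the hypothesis holds, the fitted exponent n should come out close to -2, and k is the physical parameter determined by the fit.

```python
import math

def fit_power_law(distances, intensities):
    """Fit I = k * r**n by ordinary least squares on (log r, log I)."""
    xs = [math.log(r) for r in distances]
    ys = [math.log(i) for i in intensities]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    n = sxy / sxx                        # slope: the fitted exponent
    k = math.exp(mean_y - n * mean_x)    # intercept: the fitted constant
    return k, n

# Synthetic measurements (made up): a true inverse square law with k = 100,
# perturbed by a few percent of multiplicative noise.
distances = [1.0, 2.0, 3.0, 4.0, 5.0]
noise = [1.02, 0.97, 1.01, 0.99, 1.03]
intensities = [100.0 / r ** 2 * e for r, e in zip(distances, noise)]

k, n = fit_power_law(distances, intensities)
print(f"fitted constant k = {k:.1f}, fitted exponent n = {n:.2f}")
```

Here the form of the model comes from the theory, and the data are used only to check it and to estimate its parameters.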
On the other hand, where there are no general theories, data mining techniques are valuable, especially where one has large quantities of data containing noisy patterns. This approach aims to obtain a theoretical generalisation automatically from the data by means of induction, deriving empirical models and learning from examples. The resultant theory, while perhaps not fundamental, can yield a good understanding of the physical process and can have great practical utility.
3. Scientific Applications:
In a growing number of domains, the empirical or black box approach of data mining is good science. Three typical examples are:
* Sequence analysis in bioinformatics:
Genetic data such as the nucleotide sequences in genomic DNA are digital. However, experimental data are inherently noisy, making the search for patterns and the matching of sub-sequences difficult. Machine learning algorithms such as artificial neural nets and hidden Markov models are a very attractive way to tackle this computationally demanding problem.
* Classification of astronomical objects:
The thousands of photographic plates that comprise a large survey of the night sky contain around a billion faint objects. Having measured the attributes of each object, the problem is to classify each object as a particular type of star or galaxy. Given the number of features to consider, as well as the huge number of objects, decision-tree learning algorithms have been found accurate and reliable for this task.
* Medical decision support:
Patient records collected for diagnosis and prognosis include symptoms, bodily measurements and laboratory test results. Machine learning methods have been applied to a variety of medical domains to improve decision-making. Examples are the induction of rules for early diagnosis of rheumatic diseases and neural nets to recognise the clustered micro-calcifications in digitised mammograms that can be an early sign of cancer.
The common technique is the use of data instances or cases to generate an empirical algorithm that makes sense to the scientist and that can be put to practical use for recognition or prediction.
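The idea of generating a readable empirical rule from cases can be sketched with a very small decision-tree learner, in the spirit of the star/galaxy example above. This is an illustrative sketch only, not the algorithm used in any actual sky survey: the two features (ellipticity and area in pixels), the eight labelled objects and all function names are assumptions, and the tree is grown by a simple Gini-impurity criterion.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return the (feature index, threshold) that most reduces the weighted
    Gini impurity, or None if no split improves on the unsplit node."""
    best, best_score = None, gini(labels)
    for f in range(len(rows[0])):
        values = sorted({row[f] for row in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold between adjacent values
            left = [lab for row, lab in zip(rows, labels) if row[f] <= t]
            right = [lab for row, lab in zip(rows, labels) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build_tree(rows, labels, depth=0, max_depth=3):
    """Grow a tree recursively; a leaf is simply the majority class label."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]
    f, t = split
    left = [(r, l) for r, l in zip(rows, labels) if r[f] <= t]
    right = [(r, l) for r, l in zip(rows, labels) if r[f] > t]
    return (f, t,
            build_tree([r for r, _ in left], [l for _, l in left], depth + 1, max_depth),
            build_tree([r for r, _ in right], [l for _, l in right], depth + 1, max_depth))

def classify(tree, row):
    """Follow the learned thresholds down to a leaf label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

# Hypothetical training cases: (ellipticity, area in pixels) per object.
rows = [(0.05, 12), (0.08, 15), (0.03, 10), (0.10, 14),
        (0.35, 40), (0.50, 55), (0.20, 33), (0.42, 48)]
labels = ["star"] * 4 + ["galaxy"] * 4

tree = build_tree(rows, labels)
print("learned tree:", tree)
print("new object (0.04, 11) ->", classify(tree, (0.04, 11)))
print("new object (0.45, 50) ->", classify(tree, (0.45, 50)))
```

The learned tree is just a nested set of feature thresholds, so a scientist can read it directly as a rule and judge whether it makes physical sense before using it for recognition or prediction.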
Conclusion:
(*) Data mining is still an area of active research, and its problems are not yet fully solved. Nonetheless, it offers an important approach to extracting value from the data warehouse for use in decision support.
(*) Data mining is emerging as one of the key features of many homeland security initiatives. Often used for detecting fraud, assessing risk, and supporting product retailing, data mining involves the use of data analysis tools to discover previously unknown, valid patterns and relationships in large data sets.
(*) Data mining offers great promise in helping organizations uncover hidden patterns in their data, and is therefore likely to play a major role in the organization and evaluation of data in the future.
Posted On: Friday, 12 October, 2012 - 13:18