Strategic Overview
proMISS is a unique software tool that uses advanced statistical methods to impute missing values in
databases. It finds particular application in companies and organisations that have to analyse or process
incomplete data.
proMISS is of benefit to any company or organisation that owns, maintains or processes
(large) databases with missing values, and which requires the missing values to be replaced by
'well-based' imputed values.
The Problem
An ever-increasing number of companies and organisations invest large sums of money in collecting
information on their current and potential customers. This information, which comes from many sources,
forms the basis of many companies' marketing programmes, and so it is desirable for the databases to be
as complete as possible.
Unfortunately, some customers (who supply the raw data) do not provide complete information, and so
the value of their records is reduced. Indeed, on average, about 30% of databases are incomplete. In
other words, 30% of the investment made in creating the database has been wasted. The main effect of
having incomplete databases is to limit the amount of new information that can be gained because records
with at least one missing value cannot be analysed statistically.
In a fully populated database all the records are available for statistical analysis and modelling. The limit on the effective size of a partially populated database (caused by the presence of the missing values) may result in a lot of valuable information being ignored when the data are analysed and modelled. In the worst case, a small but very important subset in the database may not be analysed at all, leading to a loss of revenue and, possibly, customers.
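The effect on the effective size of a database can be illustrated with a short sketch. The records and field names below are invented for illustration and are not taken from proMISS or any real database. The sketch assumes a complete-case analysis, i.e. any record with at least one missing value is dropped before analysis.

```python
# Hypothetical customer records; None marks a missing value.
records = [
    {"age": 34, "income": 28000, "region": "North"},
    {"age": None, "income": 41000, "region": "South"},
    {"age": 52, "income": None, "region": "North"},
    {"age": 45, "income": 36000, "region": None},
    {"age": 29, "income": 31000, "region": "East"},
]


def is_complete(record):
    """True if every field in the record has a value."""
    return all(value is not None for value in record.values())


# Records that survive a complete-case analysis.
complete = [r for r in records if is_complete(r)]

total_cells = sum(len(r) for r in records)
missing_cells = sum(v is None for r in records for v in r.values())

# Share of individual values that are missing.
cell_missing_rate = missing_cells / total_cells
# Share of whole records lost because of those missing values.
record_loss_rate = 1 - len(complete) / len(records)

print(f"missing cells: {cell_missing_rate:.0%}")
print(f"records lost:  {record_loss_rate:.0%}")
```

In this invented example only three of the fifteen values are missing, yet three of the five records become unusable, because each missing value removes its whole record from the analysis.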
The Solution
proMISS offers a novel and practical solution to the problem of missing values in databases by using new and advanced algorithms to impute the missing values. The result of using proMISS is a fully populated database. This means that the whole database, rather than just those records with a valid entry in every cell, can be analysed and modelled. In this way proMISS adds value and new information to databases, thereby increasing the 'stock of knowledge' held in them.
The Market
The types of company that would benefit from proMISS include:
- data mining software suppliers;
- data warehousing companies;
- statistical analysis software suppliers;
- customer relationship management (CRM) companies;
- database analysis companies;
- marketing (analysis) consultancies;
- direct marketing agencies and consultancies;
- lifestyle analysis companies;
- financial institutions offering loans, mortgages, insurance and other financial services;
- credit reference agencies;
- market research companies; and
- mail-order catalogue companies.
Features and Benefits
proMISS has many features and benefits. They include:
- It uses only the information already in the database (no external data are required).
- The size of the database, i.e. the number of fields and the number of records, has no effect on the search and imputation models.
- The order of the records does not affect the imputed values.
- The missing values in each record are imputed independently of the missing values in all the other records.
- All the missing values in the database are imputed in one run.
- The concepts of geometrical and statistical similarity are used to find the records with similar characteristics, and it is from these 'similar' records and the known information in each record with missing values that the missing values are imputed.
- There is no limit on the percentage of missing values in the database (although the lower this percentage, the more likely it is that the imputed values will be 'better').
- A validation module enables users to compare the true values with the imputed values in each field. For text variables the percentage correctly imputed is shown, for ordinal variables a cross-classification matrix is shown, and for numeric variables the mean percentage error and mean absolute percentage error are shown.
- The input database can be in any database format.
- A fully populated database is available at the end of the imputation. This can be analysed like any other fully populated database.
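The general idea of imputing from 'similar' records, and of validating imputed numeric values against known true values, can be sketched as follows. This is not the proMISS algorithm, which is not described here. It is a generic nearest-neighbour stand-in under stated assumptions: similarity is plain Euclidean distance over the fields two records share, a missing value is replaced by the mean of the k most similar records, and the percentage error is taken as (imputed - true) / true. All function names, field names and data are hypothetical.

```python
import math


def distance(a, b, fields):
    """Root-mean-square distance over the fields observed in both records."""
    shared = [f for f in fields if a[f] is not None and b[f] is not None]
    if not shared:
        return math.inf  # nothing in common, so treat as maximally dissimilar
    return math.sqrt(sum((a[f] - b[f]) ** 2 for f in shared) / len(shared))


def impute(records, fields, k=2):
    """Return copies of the records with missing numeric values filled in.

    Donors are always drawn from the ORIGINAL records, so each record is
    imputed independently of the others and record order does not matter.
    """
    filled = []
    for rec in records:
        new = dict(rec)
        for target in fields:
            if rec[target] is not None:
                continue
            others = [f for f in fields if f != target]
            # Candidate donors: other records where the target is observed.
            donors = [r for r in records
                      if r is not rec and r[target] is not None]
            # Ties in distance are broken by value, not by position.
            donors.sort(key=lambda r: (distance(rec, r, others), r[target]))
            nearest = donors[:k]
            if nearest:
                new[target] = sum(r[target] for r in nearest) / len(nearest)
        filled.append(new)
    return filled


def validation_errors(true_values, imputed_values):
    """Mean percentage error and mean absolute percentage error (in %)."""
    pct = [100 * (imp - true) / true
           for true, imp in zip(true_values, imputed_values)]
    mpe = sum(pct) / len(pct)
    mape = sum(abs(p) for p in pct) / len(pct)
    return mpe, mape


# Invented data: age in years, income in thousands (comparable scales).
records = [
    {"age": 30, "income": 30},
    {"age": 32, "income": 32},
    {"age": 50, "income": 60},
    {"age": 52, "income": 64},
    {"age": 31, "income": None},   # income withheld; assumed true value 32
    {"age": None, "income": 62},   # age withheld; assumed true value 50
]

filled = impute(records, ["age", "income"], k=2)
imputed = [filled[4]["income"], filled[5]["age"]]
mpe, mape = validation_errors([32, 50], imputed)
print(imputed, mpe, mape)
```

A real application would need to put fields on comparable scales before computing distances, and would need separate handling for text and ordinal fields, which this sketch leaves out.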
Technical Overview