Monday, October 12, 2009

Data extraction

Data extraction is the act of retrieving unstructured or poorly structured data from disparate sources and organizing it into a usable form. Importing the data is sometimes followed by data transformation and the addition of metadata, to repair the data or fill in gaps as necessary.
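As a rough illustration of the extract, transform and add-metadata steps, here is a minimal Python sketch for a mainframe-style fixed-width report. The column layout, the field names, the `#` header convention and the `_source`/`_extracted_at` metadata keys are all hypothetical choices made for this example, not a standard format.

```python
from datetime import datetime, timezone

# Hypothetical fixed-width layout for one report line:
# (start, end) character offsets for each field.
LAYOUT = {
    "employee_id": (0, 10),
    "name": (10, 30),
    "salary": (30, 38),
}


def extract(report_text):
    """Extract: slice each data line of the report into raw string fields."""
    records = []
    for line in report_text.splitlines():
        # Skip blank lines and header lines (assumed here to start with '#').
        if not line.strip() or line.startswith("#"):
            continue
        records.append({field: line[start:end] for field, (start, end) in LAYOUT.items()})
    return records


def transform(raw):
    """Transform: trim fixed-width padding, normalise case, convert types.
    A missing salary is left as None so the gap stays visible downstream."""
    salary = raw["salary"].strip()
    return {
        "employee_id": raw["employee_id"].strip(),
        "name": raw["name"].strip().title(),
        "salary": int(salary) if salary else None,
    }


def add_metadata(record, source):
    """Attach metadata recording where and when the record was extracted."""
    return {
        **record,
        "_source": source,
        "_extracted_at": datetime.now(timezone.utc).isoformat(),
    }


if __name__ == "__main__":
    report = "\n".join([
        "# EMPLOYEE SALARY REPORT",
        "E000000001" + "JANE DOE".ljust(20) + "00052000",
        "E000000002" + "JOHN SMITH",  # salary column missing
    ])
    for row in (add_metadata(transform(r), "payroll.rpt") for r in extract(report)):
        print(row["employee_id"], row["name"], row["salary"])
    # E000000001 Jane Doe 52000
    # E000000002 John Smith None
```

Keeping the three steps as separate functions mirrors the description above: transformation and metadata are optional stages layered on top of the raw extraction, and either can be swapped out without touching the parser.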

Examples of unstructured data sources include web pages, scanned text, mainframe reports, PDF documents, emails and spool files.

Adding structure to data can be done in a number of ways. A table-based method specifies the regions of a document to be extracted; for example, a form-reading algorithm can pull a worker's attributes and experience out of emailed form letters. Analytical algorithms can also link disparate data sources together using an association rule approach.
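The table-based, form-reading idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the "table" is a hypothetical mapping from output field names to regular expressions for the labels a form letter is assumed to contain ("Name:", "Position applied for:", "Years of experience:"). Real form letters would need their own patterns.

```python
import re

# Hypothetical extraction table: output field -> pattern for its labelled line.
# [ \t]* keeps each match on the same line as its label.
TEMPLATE = {
    "name": r"\bName:[ \t]*(.+)",
    "position": r"\bPosition applied for:[ \t]*(.+)",
    "experience_years": r"\bYears of experience:[ \t]*(\d+)",
}

# Optional type conversions applied to matched values (also hypothetical).
CONVERTERS = {"experience_years": int}


def read_form(text, template=TEMPLATE):
    """Fill a record by matching each template entry against the text.
    Fields whose label is absent from the letter are left as None."""
    record = {}
    for field, pattern in template.items():
        match = re.search(pattern, text, flags=re.IGNORECASE)
        value = match.group(1).strip() if match else None
        if value is not None and field in CONVERTERS:
            value = CONVERTERS[field](value)
        record[field] = value
    return record


if __name__ == "__main__":
    email = """Dear hiring manager,

Name: Maria Lopez
Position applied for: Data Analyst
Years of experience: 7

Regards"""
    print(read_form(email))
    # {'name': 'Maria Lopez', 'position': 'Data Analyst', 'experience_years': 7}
```

Because the extraction rules live in a table rather than in code, supporting a new form layout means adding or editing entries in `TEMPLATE` instead of rewriting the reader.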

Screen Scraper Software

