Thursday, August 27, 2009

programming to perform data scrape

Web Scraping Basics
Many websites store their data in a database and render it in a consistent layout using HTML markup and CSS. The most efficient way to use that data is to get direct access to the database. However, this may not be permitted or may not be affordable. The alternative is to scrape the website using a scripting language such as PHP or a compiled language such as C#. A scraper works by acting as a browser and downloading the HTML of the page that contains the data you want. It then uses regular expressions (also called "regex") to parse the HTML and collect the data. The data is then stored in a database that you have access to for later use.
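The download, parse, and store steps can be sketched in a few lines. This is only a minimal illustration in Python, not a production scraper: the markup in SAMPLE_HTML, the pattern in ITEM_PATTERN, the "items" table, and the function names are all hypothetical and would need to be adapted to the real page. It assumes the site permits scraping and that the layout is consistent enough for a regular expression to match.

```python
import re
import sqlite3
import urllib.request

# Hypothetical page layout: each item has a name span followed by a price span.
ITEM_PATTERN = re.compile(
    r'<span class="name">(?P<name>[^<]+)</span>\s*'
    r'<span class="price">\$(?P<price>\d+(?:\.\d{2})?)</span>'
)

# Stand-in for a downloaded page, so the parsing step can be tried offline.
SAMPLE_HTML = """
<ul>
  <li class="product"><span class="name">Widget</span> <span class="price">$9.99</span></li>
  <li class="product"><span class="name">Gadget</span> <span class="price">$12.50</span></li>
</ul>
"""


def fetch_html(url, timeout=10):
    """Act like a browser: request the page and return its HTML as text."""
    request = urllib.request.Request(
        url, headers={"User-Agent": "Mozilla/5.0 (example-scraper)"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def extract_items(html):
    """Use the regular expression to pull (name, price) pairs out of the HTML."""
    return [
        (match.group("name").strip(), float(match.group("price")))
        for match in ITEM_PATTERN.finditer(html)
    ]


def store_items(items, db_path=":memory:"):
    """Save the scraped rows in a SQLite database and return the connection."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS items (name TEXT, price REAL)")
    conn.executemany("INSERT INTO items (name, price) VALUES (?, ?)", items)
    conn.commit()
    return conn


if __name__ == "__main__":
    # For a live page, replace SAMPLE_HTML with fetch_html(<page URL>).
    conn = store_items(extract_items(SAMPLE_HTML))
    for row in conn.execute("SELECT name, price FROM items"):
        print(row)
```

The pattern is tied to one exact layout, so even a small change to the page's markup will make it stop matching. That fragility is a large part of why scrapers take so long to write and maintain.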
These programs can be difficult and time-consuming to write. Many businesses have sprung up to offer an easier method, but only one allows non-programmers to easily harvest web data and use it for their own needs. That company is Mozenda.
