Logo-API - Automatic load of logos

About the project

Logo-API - Automatic load of logos.

This is a must-have tool for any website that lists references to other websites.
Used properly, it makes lists of links look better and easier to take in.
For most people, spotting a logo is much easier than reading a company name.

I use this service on my 'Technologies I use' page to display icons for all the links in one go.

Web Scraping

Extracting information from HTML+CSS

It was a challenge of the kind I like :-)
The easiest part was extracting favicons and apple-touch-icons, obeying all the rules a browser would obey.
The fun started later on, with logos!
I can't explain the whole algorithm here. In short, it involves parsing HTML and CSS, combining the two, and applying some sophisticated probabilistic algorithms.
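
The favicon and apple-touch-icon step mentioned above can be sketched roughly as follows. This is a minimal Python sketch using only the standard library, not the service's actual PHP code. The names IconLinkParser and find_icons are hypothetical, and only the basic browser rules are covered: icon <link> elements, <base href> resolution, and the /favicon.ico fallback.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# rel tokens that mark a <link> as an icon (assumed minimal set for this sketch)
ICON_RELS = {"icon", "apple-touch-icon", "apple-touch-icon-precomposed"}


class IconLinkParser(HTMLParser):
    """Hypothetical helper: collects icon <link> elements and the first <base href>."""

    def __init__(self):
        super().__init__()
        self.base_href = None
        self.links = []  # (rel, raw href, sizes) in document order

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        href = attrs.get("href")
        if not href:
            return
        if tag == "base":
            # Only the first <base> with an href counts.
            if self.base_href is None:
                self.base_href = href
            return
        if tag != "link":
            return
        # rel is a space-separated, case-insensitive token list,
        # so "shortcut icon" still contains the "icon" token.
        rels = set((attrs.get("rel") or "").lower().split())
        matched = sorted(rels & ICON_RELS)
        if matched:
            self.links.append((matched[0], href, attrs.get("sizes")))


def find_icons(html, page_url):
    """Return icon candidates as dicts with absolute URLs."""
    parser = IconLinkParser()
    parser.feed(html)
    parser.close()

    # Relative hrefs resolve against <base href> if present, else the page URL.
    base = urljoin(page_url, parser.base_href) if parser.base_href else page_url
    icons = [
        {"rel": rel, "url": urljoin(base, href), "sizes": sizes}
        for rel, href, sizes in parser.links
    ]

    # With no declared favicon, browsers try /favicon.ico at the site root.
    if not any(icon["rel"] == "icon" for icon in icons):
        icons.append(
            {"rel": "icon", "url": urljoin(page_url, "/favicon.ico"), "sizes": None}
        )
    return icons
```

Calling find_icons(html, page_url) on a fetched page gives a list of candidates to download. Picking the best one, and finding real logos in the page body and stylesheets, is the harder part that this sketch does not attempt.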

Parsing is just half of the story. You still need to manipulate the parsed document.
I used a web scraping tool that is very fast at parsing large pieces of HTML and allows traversing the parsed document the same way jQuery does, only on the server side, in PHP.
It's called hQuery.php.