Monday, March 02, 2009

The user-agent string

A user agent is the client application used with a particular network protocol. The phrase most commonly refers to clients that access the World Wide Web, but other systems use it too: in SIP, for example, the user agent is the user's phone. Web user agents range from web browsers and e-mail clients to search engine crawlers ("spiders"), as well as mobile phones, screen readers and Braille browsers used by people with disabilities. When a user visits a web site, the user agent generally sends a text string identifying itself to the server. This string forms part of the HTTP request, in a header line that begins with User-Agent: (the header name is not case-sensitive), and typically includes information such as the application name, version, host operating system, and language. Bots, such as web crawlers, often also include a URL and/or e-mail address so that the webmaster can contact the operator of the bot.
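To make the header concrete, here is a minimal sketch of pulling the user-agent string out of a raw HTTP request while ignoring the case of the header name. The function name get_user_agent, the bot name ExampleBot and the example.com URLs are all made up for illustration; a real server would normally rely on its HTTP library rather than parse requests by hand.

```python
from typing import Optional


def get_user_agent(raw_request: str) -> Optional[str]:
    """Return the User-Agent value from a raw HTTP request, or None if absent."""
    lines = raw_request.split("\r\n")
    # lines[0] is the request line ("GET / HTTP/1.1"); headers follow it.
    for line in lines[1:]:
        if line == "":
            break  # a blank line marks the end of the headers
        name, sep, value = line.partition(":")
        # Header names are case-insensitive, so compare in lower case.
        if sep and name.strip().lower() == "user-agent":
            return value.strip()
    return None


# Hypothetical request from a crawler that includes a contact URL.
request = (
    "GET /index.html HTTP/1.1\r\n"
    "Host: www.example.com\r\n"
    "user-agent: ExampleBot/1.0 (+http://www.example.com/bot.html)\r\n"
    "\r\n"
)

print(get_user_agent(request))
```

Note that the header is written in lower case in the sample request and is still found, which is the "case does not matter" point in practice.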

The user-agent string is one of the criteria by which crawlers can be excluded from certain pages or parts of a website using the Robots Exclusion Standard (robots.txt). Webmasters who feel that parts of their site should not be included in the data a particular crawler gathers, or that a particular crawler is using too much bandwidth, can use it to ask that crawler not to visit those pages. It is a request only: the crawler has to choose to honor it.
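As a rough illustration of how a well-behaved crawler might check these rules, the sketch below feeds a small robots.txt to Python's standard urllib.robotparser module. The rules, the crawler names ExampleBot and OtherBot, and the paths are invented for this example.

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt: ExampleBot is asked to stay out of /private/,
# while every other crawler may fetch anything.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /private/

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(user_agent: str, path: str) -> bool:
    """Return True if the rules above let this user agent fetch the path."""
    return parser.can_fetch(user_agent, path)


for agent, path in [
    ("ExampleBot/1.0", "/private/report.html"),
    ("ExampleBot/1.0", "/public/index.html"),
    ("OtherBot/2.0", "/private/report.html"),
]:
    print(agent, path, allowed(agent, path))
```

The parser matches on the product name before the slash, so "ExampleBot/1.0" falls under the ExampleBot rules; a crawler not named in the file falls back to the * entry.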

The term user agent sniffing refers to the practice of websites serving different content depending on the user agent that requests it. On the Web, this means a different version of a site is shown when the page is viewed with a specific browser (e.g. Microsoft Internet Explorer). An infamous example is the Outlook Web Access feature of Microsoft Exchange Server 2003: viewed with IE, it offers much more functionality than the same page in any other browser. User agent sniffing is mostly considered poor practice for Web 2.0 sites, since it encourages browser-specific design. Webmasters are instead advised to write HTML markup that is as standardized as possible, so that pages render correctly in as many browsers as possible.
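For readers who have not seen it, sniffing usually boils down to a string test like the one sketched here. This is an illustration of the practice, not a recommendation; the function choose_template, the template file names and the sample user-agent strings are assumptions made up for the example.

```python
def choose_template(user_agent: str) -> str:
    """Naive sniffing: pick a page variant from a substring of the UA string."""
    # Internet Explorer versions of this era put an "MSIE" token in the string.
    if "MSIE" in user_agent:
        return "full.html"   # richer version, served to IE only
    return "basic.html"      # everyone else gets the reduced version


# Illustrative strings in the general shape browsers send.
ie_string = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)"
firefox_string = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.6) Gecko/2009011913 Firefox/3.0.6"

print(choose_template(ie_string))
print(choose_template(firefox_string))
```

The weakness is visible in the code itself: any capable browser whose string lacks the expected token gets the reduced page, and any client that adds the token gets the full one, regardless of what it can actually render.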
