Text Documents and Special File Formats
The majority of files found through Internet searching are HTML files. These look similar to the page you are currently reading. Special attention is needed when searching for information that may be found in other text-based file formats, such as PDF or Microsoft Word. Examples of these files include company annual reports and forms, such as IRS tax forms (e.g., Form 1040). While Yahoo! and Google search the text within PDF and some Microsoft Office documents, files created with other programs may not be located. Not all crawlers search through all documents posted on the Web. If you are searching for something that may be published in a file format other than HTML, it is worthwhile to look at your chosen search engine's advanced search screen. This screen normally provides a quick summary of the engine's search capabilities.
PDF files are increasingly prevalent on the Web and are used by many government, corporate and educational sites to provide resources that were originally created in other file formats, such as word processing, spreadsheet or desktop publishing formats. PDF, which stands for Portable Document Format, was developed by Adobe Systems and provides an international open standard for document distribution. Viewing or printing PDF files requires a PDF reader, such as the free Adobe Acrobat Reader.
Multimedia on the Web consists of images (photographs or graphics, single display or limited animation), audio files, videos, and combinations of these. Many types of files are available, including image files (jpg, gif, etc.); audio files, such as MP3s and streams of historic speeches or live radio broadcasts; and multimedia files that incorporate both audio and video, such as a music video or a live TV news stream. Unfortunately, unless someone has also included on the Web page a written text of the speech or a transcript of the show you are searching for, it will not be found using a normal search strategy.
The Web is a rich source for graphics and photographs, but be aware that most images on the Web have cryptic filenames, such as libimg.gif or comp.jpg, that may not correspond to the subject of the image. A standard keyword search is therefore not likely to be successful. General searches usually do not produce audio or video files either, since these files often lack corresponding text descriptions to connect with your search terms.
You can search for these special file formats by using one of the general-purpose search engines that provide multimedia searching, or you can use a metasite devoted to multimedia files or a particular type of file format.
The following chart provides a list of common media file types found on the Web, along with some of their extensions. Some file formats are not supported by some operating systems or Web browsers, or may require a browser plug-in.
|.avi|Audio Video Interleave
|.jpg or .jpeg|Joint Photographic Experts Group
|.gif (pronounced jiff or giff)|Graphics Interchange Format
|.mid or .midi|Musical Instrument Digital Interface
|.mov|QuickTime Video Clip
|.mp3|MPEG Audio Layer 3
|.pdf|Portable Document Format
|.png|Portable Network Graphics
||Graphic, Video, Audio
|.wav|Waveform Audio
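The chart above can also be read as a simple lookup from file extension to file type. The following is a minimal sketch, not a complete catalog: the extensions are the ones commonly associated with each format, and the category labels (image, audio, video, document) and the function name `describe_file` are assumptions introduced for illustration.

```python
from pathlib import PurePosixPath

# Extension -> (format name, category).
# The category labels are this sketch's own assumption, not part of the chart.
MEDIA_TYPES = {
    ".avi": ("Audio Video Interleave", "video"),
    ".jpg": ("Joint Photographic Experts Group", "image"),
    ".jpeg": ("Joint Photographic Experts Group", "image"),
    ".gif": ("Graphics Interchange Format", "image"),
    ".mid": ("Musical Instrument Digital Interface", "audio"),
    ".mov": ("QuickTime Video Clip", "video"),
    ".mp3": ("MPEG Audio Layer 3", "audio"),
    ".pdf": ("Portable Document Format", "document"),
    ".png": ("Portable Network Graphics", "image"),
    ".wav": ("Waveform Audio", "audio"),
}


def describe_file(filename):
    """Return (format name, category) for a filename, or None if unknown."""
    extension = PurePosixPath(filename).suffix.lower()
    return MEDIA_TYPES.get(extension)


# The cryptic filenames mentioned earlier still reveal their file type,
# even though they say nothing about the subject of the image.
for name in ("libimg.gif", "comp.jpg", "index.html"):
    print(name, "->", describe_file(name))
```

Note that the extension identifies only the format of a file. It does not describe what the image, sound, or video actually contains, which is why keyword searches on filenames tend to fail.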
To view most files on the Web, you will need to download the following software: Adobe Acrobat Reader, Adobe Flash Player, Shockwave, RealPlayer, and QuickTime. St. Petersburg College offers a website that helps students determine if their computers have the necessary programs to view the majority of Web content: https://www.spcollege.edu/Central/utilities/systemCheck/
The following general search engines and resource directories allow you to search for various types of multimedia formats:
General Search Engines with Multimedia Search Capabilities
- Advanced search features in AltaVista include limiting results to various textual file formats, including PDF, Microsoft Office, and other file types in their database. AltaVista lets you search for images, MP3/audio or video simply by selecting the appropriate tab from the initial search screen. Image search allows you to search by type of image, including photos, graphics, and buttons/banners, as well as by color or black and white and by image size. Audio files are searchable by type: MP3, WAV, Windows Media and Real Audio. The video search looks for such formats as AVI, MPEG, QuickTime, Windows Media, and Real Video. You can specify the length or duration of audio and video files, limiting results to files shorter or longer than one minute.
- Advanced search features in Google include limiting results to various textual file formats, including PDF, Microsoft Office, PostScript, and other file types in their database. Google also offers the ability to "View as HTML," allowing users to view the contents of these file formats even if the appropriate application is not installed. Google's multimedia search engines are Google Images and Google Video. Google Images supports searching by size, file type and color.
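The file format limits described above can often be typed directly into the search box as well. The sketch below assumes Google's `filetype:` operator and the standard `https://www.google.com/search?q=` address; the function name `build_filetype_query` is hypothetical, and other search engines may use a different syntax or none at all.

```python
from urllib.parse import urlencode


def build_filetype_query(terms, filetype):
    """Build a search URL that limits results to one file format.

    Assumes the engine understands a `filetype:` operator, as Google does.
    """
    query = f"{terms} filetype:{filetype}"
    return "https://www.google.com/search?" + urlencode({"q": query})


# Example: look for the IRS Form 1040 as a PDF file.
print(build_filetype_query("Form 1040", "pdf"))
# https://www.google.com/search?q=Form+1040+filetype%3Apdf
```

The same limit is available without typing any operator by choosing a file format on the engine's advanced search screen.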
Multimedia General Search Engines (Webcrawler-based)
Human Evaluated Multimedia Resources
- Fagan Finder Image Search allows image searching in image databases and search engines.
- Image Finder, from UC Berkeley Library, is a metasite that allows searching of more than a dozen image collections.
Multimedia Databases with Resources Contributed by Internet Users
Resource directories created by Internet users are growing rapidly on the Web research scene. Like the webcrawler-based search engines, these resources are available on the Web, but few quality guidelines exist for judging the usefulness of the information contained in these databases. Wikipedia is an example of a text-based resource built from user-contributed information. In the multimedia area, blinkx describes itself as the world's largest video search engine. YouTube is the best known, but iPod users have created various directories, too (for example, Podcast Directory).
YouTube is a useful tool for presentations in the academic classroom because it is limited to streaming video. However, be aware of copyright restrictions (see below), even in the classroom. A more detailed description of YouTube is provided in the box below.
|YouTube is a metasite that searches videos contributed to the YouTube database, including individual creations and snippets from professional productions. The primary focus of YouTube is online video streaming rather than downloading. There are services that bypass YouTube's streaming-only restriction and enable downloading, but beware of copyrights on some videos. Such copyrights can make downloading and, particularly, redistributing the material illegal. In addition to searching other Internet video content, Google Video searches the YouTube database.
Beware of Images!
You may notice when searching AltaVista Images that a new setting option appears in the top right corner. It says: Family Filter: on. Click on this setting to choose whether to filter all of your search results, filter none of them, or filter only multimedia results. Many search engines now include this option to use their automatic filters.
Filtering is not a perfect process. The goal is to keep out materials that may be considered obscene; however, filters will also often exclude reputable sites because of terms used within the page. This is especially true when researching medical conditions such as breast cancer. Generally, an expert searcher can avoid objectionable materials by evaluating the search results screen before clicking on the listed links. However, when searching for images, your results screen will usually display thumbnails of the retrieved images. Therefore, search engines will generally default to a higher level of filtering for multimedia searching. If you are sensitive to graphic content, think carefully about your search terms and notice whether or not your search engine is using a filter before executing an image search.
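The reason a filter can exclude reputable sites is easier to see with a small example. The sketch below is a deliberately naive keyword filter, assuming a hypothetical one-word blocklist; real search engine filters are more sophisticated, and neither the blocklist nor the function name `is_blocked` comes from any actual engine.

```python
# Hypothetical blocklist for illustration only.
BLOCKED_TERMS = {"breast"}


def is_blocked(page_text, blocked_terms=BLOCKED_TERMS):
    """Return True if any word on the page appears in the blocklist."""
    words = {word.strip(".,;:!?()\"'").lower() for word in page_text.split()}
    return not words.isdisjoint(blocked_terms)


medical_page = "Early detection improves breast cancer survival rates."
library_page = "Library hours and directions."

print(is_blocked(medical_page))  # True: a reputable page is filtered out
print(is_blocked(library_page))  # False
```

Because the filter looks only at individual words and not at their context, the medical page is blocked even though nothing on it is objectionable.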
A Word on Copyright
Documents found on the Web are protected by copyright law. This means that text, images and/or media files should not be reproduced without permission from the owner. Some government sites, such as the American Memory Project from the Library of Congress, allow limited use. Always check for a copyright, permissions or rights page before reproducing any information from the Web. Generally, use of information from the Web is considered fair use (which means it is permitted) when it is used for an academic project, live class presentation, or paper, as long as proper credit is given to the author or creator. Credit is normally given through a Bibliography or Works Cited page. There will be more discussion on copyright in Lesson Seven.