Section VI: Collections and Audiences

Respondents who use Greenstone to develop digital library collections were asked to describe (a) how access is provided to the collections, (b) the contents of the collections, and (c) the primary target audiences for the collections.

Access to Collections

Sixty-seven respondents indicated how access is provided to the collections they build using Greenstone. Of these, 76.1% (51 respondents) provide access via the Internet; 61.2% (41 respondents) through local networks; 26.9% (18 respondents) on DVD/CD-ROM; and 4.5% (3 respondents) through other means. Because respondents could report more than one access method, these percentages sum to more than 100%. The other means included intranets (2 respondents) and one or more local client stations (1 respondent). One respondent also indicated that a CD had been planned but was awaiting funding, and another that access provision had not yet commenced.

Respondents were also asked where their collections are hosted. Of the 69 participants who answered this question, 69.6% (48 respondents) host their collections on one or more local servers within their organization; 10.1% (7 respondents) on an external server; 14.5% (10 respondents) on both local and external servers; and 5.8% (4 respondents) selected “other”. Other hosting configurations included local client machines (2 respondents).

Number of Collections

In a free-text question, respondents were asked how many production and test collections they (or their organizations) had developed. Forty-eight respondents gave quantifiable answers for production collections, and thirty-nine for test collections. While the number of collections can be interpreted as a reflection of the content an organization or individual makes available via digital libraries, it may also be seen in part as a function of administrative and software implementation decisions.

A wide range in the number of both test and production collections was reported. The number of production collections reported by respondents ranged from 0 to “greater than 50”. The number of test collections reported ranged from 0 to “greater than 65”. The median number of production collections was 4.0, as was the median number of test collections. The mode for each type of collection was 1.
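The summary statistics above can be illustrated with a minimal sketch. The report does not say how free-text answers were coded, so the approach here is an assumption: each answer is reduced to the first integer it contains, an open-ended answer such as "greater than 50" is treated as its lower bound, and answers with no number are excluded as non-quantifiable. The answers in the list are hypothetical and are not the survey's responses; `lower_bound` is an illustrative helper name.

```python
import re
from statistics import median, multimode

# HYPOTHETICAL free-text answers, invented for illustration only.
# They are NOT the responses collected by the survey.
answers = ["1", "3", "greater than 50", "0", "1", "about 12", "several", "6"]


def lower_bound(answer):
    """Return the first integer found in a free-text answer, or None if there is none."""
    match = re.search(r"\d+", answer)
    return int(match.group()) if match else None


# Keep only quantifiable answers (those containing a number).
counts = [n for n in map(lower_bound, answers) if n is not None]

print("quantifiable answers:", len(counts))
print("median:", median(counts))
print("mode(s):", multimode(counts))
```

Under this coding, the median and mode are unaffected by the exact size of an open-ended answer as long as its true value lies above the median, which is one reason these statistics suit data with a top category such as "greater than 50" better than the mean does.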

Collection Content

The survey included several questions that asked respondents to describe the content of their collections. The descriptions requested included a broad subject classification; format type (i.e., video, audio, text, photographic images, other images, other); content language(s); file types; and broad collection types (i.e., catalog/index to other resources, special collections, institutional repository, electronic theses and dissertations, journal/other publications, archival collections, museum collections, other).

Collection Subject Classifications
Sixty-seven respondents provided subject-based classifications of their collections. Nearly a third (21 respondents) indicated subjects other than those provided via the survey, either in addition to or instead of the provided subjects. These respondent-supplied subjects were coded to correspond with the provided subjects in order to achieve a broad subject classification across respondent answers. While Local Interest (30 responses), Social Sciences (30 responses), and Humanities (29 responses) were represented more than other subjects, all subjects were represented. Frequencies and percentages of responses are shown in Table 1 below. Note that respondents were instructed to select as many subject categories as necessary to describe their collections.

Table 1. Subject Classifications of Collection Contents (N=67).

Subject Classification            Frequency    Percent
Arts                              13           19.4%
Computer Science / Engineering    11           16.4%
Health Sciences                   16           23.9%
Humanities                        29           43.3%
Natural Sciences                  15           22.4%
Social Sciences                   30           44.8%
Local Interest                    30           44.8%
General Interest                  19           28.4%
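As a check on how the percentages in Table 1 relate to the frequencies, the following minimal sketch recomputes each percentage as the share of the 67 respondents who selected a subject. The frequencies and N are copied from the table; the function and variable names are illustrative, and rounding to one decimal place is an assumption based on how the table is printed.

```python
# Frequencies copied from Table 1; N is the number of respondents who answered.
N = 67
frequencies = {
    "Arts": 13,
    "Computer Science / Engineering": 11,
    "Health Sciences": 16,
    "Humanities": 29,
    "Natural Sciences": 15,
    "Social Sciences": 30,
    "Local Interest": 30,
    "General Interest": 19,
}


def percent_of_respondents(count, n):
    """Share of respondents selecting a category, rounded to one decimal place."""
    return round(100 * count / n, 1)


percents = {
    subject: percent_of_respondents(count, N)
    for subject, count in frequencies.items()
}

# Respondents could select several subjects, so the total number of
# selections can exceed the number of respondents.
total_selections = sum(frequencies.values())

for subject, pct in percents.items():
    print(f"{subject}: {pct}%")
print(f"Total selections: {total_selections} from {N} respondents")
```

Because each percentage uses the number of respondents (not the number of selections) as its denominator, the column is not expected to sum to 100%.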

Language(s) of Collection Content
Sixty-two respondents indicated whether their collection content is in one language or several. Of these respondents, 59.7% (37 respondents) indicated that the collections they build are in one language, and 40.3% (25 respondents) that they build collections in two or more languages. Sixty-one of the sixty-two respondents who answered this question indicated which language(s) their collection content is in. Of these, most (83.6%) have English language content. Spanish (19.7%) and French (13.1%) were the next most frequently indicated languages. It is important to note that this survey was only available in English. In total, 33 unique languages were mentioned; they are listed below in alphabetical order by their English names.

Amharic       Hawaiian      Portuguese
Arabic        Hindi         Romanian
Armenian      Indonesian    Russian
Catalan       Italian       Slovenian
Chinese       Japanese      Spanish
Czech         Kannada       Tamil
Dari          Korean        Telugu
Dutch         Malayalam     Tetum
English       Maori         Thai
French        Pashto        Vietnamese
German        Polish        Xhosa

Collection Content Format Types
Text was the most prevalent format type indicated by respondents (92.6%; N=68). About two-thirds (69.1%) indicated that their collections contain photographic image files, and roughly a third indicated each of three other format types: “images other than photos” (35.3%), audio (35.3%), and video (30.9%).

Collection Content File Types
The majority of respondents indicated inclusion of the following file types in their collections (N=68):

  • PDF files: 85.3%;
  • Image files (JPEG, GIF, etc.): 75.0%; and
  • HTML files: 51.5%.

In addition to these very common file types, respondents indicated that a great number of other file types are also included in their collections. A full list of file types selected by respondents is indicated in Table 2 below. Other file types that were not listed but indicated in a text response option were: mail files, OGG video files (Theora/Vorbis), files, MPEG video files, Flash (video and audio) files, Dublin Core metadata files, and METS/ALTO XML metadata.

Table 2. File Types Included in Collections (N=68). The frequency and percent of respondents who indicated each file type as included in one or more of their collections are listed in decreasing order.

File Type                                          Frequency    Percent
PDF files                                          58           85.3%
Image files (JPEG, GIF, etc.)                      51           75.0%
HTML files                                         35           51.5%
MS Office files (Excel, PowerPoint, Word, etc.)    28           41.2%
MP3 audio files                                    26           38.2%
Plain text (.txt)                                  23           33.8%
XML files                                          19           27.9%
ISIS database files                                13           19.1%
MARC metadata                                      13           19.1%
RealMedia files                                    11           16.2%
Open Archive data                                  9            13.2%
Compressed files (tar, jar, zip, gzip or bz)       7            10.3%
BibTeX files                                       6            8.8%
METS files (Greenstone format)                     6            8.8%
DSpace archive format                              3            4.4%
LaTeX files                                        3            4.4%
Source code (C/C++, Perl, Shell)                   3            4.4%

Collection Types
Respondents were asked to describe what type(s) of collection(s) they had developed or were currently developing using Greenstone. Respondents were instructed to select from a set of broad descriptive terms and/or provide their own terms to describe the type(s) of collection(s). The table below indicates the frequency with which respondents selected the provided terms to describe their collections. A list of respondent-provided terms follows the table.

Table 3. Terms used to describe types of collections (N=66). Reported frequencies and percents are the number of respondents who indicated a given term to describe collection type.

Collection Type: Terms                 Frequency    Percent
Special collections                    41           62.1%
Archival collections                   35           53.0%
Journal/other publications             30           45.5%
Institutional repository               26           39.4%
Electronic theses and dissertations    20           30.3%
Catalog / Index to other resources     18           27.3%
Museum collections                     7            10.6%
Other                                  10           15.2%

“Other” terms: Digital Assets Management (for commercial organizations); Student project reports; Teaching and learning documents; News; Articles and Reports; Links to Knowledge objects; Conferences; Question papers; Training modules; eBooks; Bibliographic collections; Specialized web content; Multilingual; Public domain texts; and Original source legal documents.

Collection Audiences

When asked to identify the broadest audience to whom access to digital library materials is provided, the majority of respondents (62.1%; N=66) indicated the “general public”. Others provide access to affiliates of their organizations (18.2%), organizational staff only (9.1%), or themselves and/or other private parties (4.5%). Of those who indicated another level (4 respondents, 6.1%), one provided access to a regional international group of libraries and librarians, one to faculty and students, and two had not yet defined or implemented access policies. One respondent indicated that access is provided for one specific government department only.

Respondents were also asked to select from a set of broad terms to describe characteristics of the primary target communities for their collections or the collections they support. The terms related to educational contexts, age groups (i.e., children/teens, adults, elderly), geographical setting (i.e., urban, rural, suburban), and whether the collections are intended for multilingual audiences. Sixty-eight respondents indicated one or more terms to describe the primary target communities.

Audiences: Collections for Academics and Researchers
Of those who indicated terms to describe the primary target communities for their collections, 60.3% indicated “Academic: Researchers”, 57.4% “Academic: Students”, and 45.6% “Academic: Educators” (N=68). Additionally, two respondents wrote in responses indicating researchers and scientists/engineers as the primary target user communities. Overall, 50 of the 68 respondents (73.5%) described the target audience as related to academia by indicating that academic researchers, students, and/or educators were considered to be one of the primary target audiences.

Audiences: Life Stage Groups
Just over half of the respondents (51.5%) described the primary target audience as “Adults”. The more specific term “Elderly” was selected by 10.3% of respondents, and “Children/Teens” was selected by 7.4% of respondents. In total, 37 of the 68 respondents indicated at least one age group/life stage of the primary target audience(s).

Audiences: Geographical Setting
Fewer respondents (20 of 68) indicated a geographical setting for their primary target audiences. Frequencies and percentages of those who did are as follows (N=68):

  • Urban: 19.1% (13 respondents)
  • Rural: 16.2% (11 respondents)
  • Suburban: 13.2% (9 respondents)

Respondents were also asked to indicate the country or countries in which the primary target audience(s) is/are located. The majority of respondents who answered this question identified one country (70.7%; N=58); and 29.3% indicated multiple countries.

Audiences: Other Demographic Characteristics
The “general public” was indicated to be a primary target audience by about two-fifths of respondents (42.6%; N=68). Primary target audience(s) were described as “multilingual” by just over a third of respondents (35.3%; N=68).

