Exploring Topics of Interest of Mendeley Users

This paper presents a fine-grained overview of the usage behavior and topics of interest of different types of users in Mendeley. The analysis is based on 1.2 million Web of Science indexed publications published in 2012. The disciplinary differences in the reading (saving) patterns of different types of Mendeley users are identified and depicted using VOSviewer overlay visualizations. The findings show that, compared to other fields, publications from Mathematics & Computer Science have the lowest coverage in Mendeley. Publications from the Social Sciences & Humanities receive on average the highest number of readers in Mendeley. The highest uptake of Mendeley is by students, but this differs across fields. Professors, students, and librarians are mainly active in the Social Sciences & Humanities, a field of science with a relatively low citation density in Web of Science. In contrast, researchers and other professionals are mainly active in fields with a relatively high citation density, such as the Biomedical & Health Sciences and the Life & Earth Sciences. In addition, researchers and professionals seem to be relatively more interested in practical, methodological, and technically oriented topics, while professors and students are attracted by more educationally and theoretically oriented topics. These different usage patterns among user types possibly reflect the way in which scholarly publications are used for scientific, educational, or other professional purposes. This information could inform relevant stakeholders, such as researchers, librarians, publishers, funders, and policy makers, about the scientific, educational, or professional value of publications.


Introduction and Background
The social reference manager Mendeley is a prevalent source of altmetric data. It is known that the coverage, density, and distribution of Mendeley readership vary substantially across disciplines (Costas, Zahedi & Wouters 2015a). Depending on the field, Mendeley covers 45% to 90% of the publications in the Scopus database (Thelwall & Sud 2016), 60% to 90% of the publications in the Web of Science (WoS) database (Zahedi, Costas & Wouters 2017), and more than 80% of the publications published by PLOS (Priem, Piwowar & Hemminger 2012). Fields from the Social Sciences & Humanities (such as Sociology, Communication, Business, Psychology, Anthropology, Educational Research, and Linguistics) have a relatively high coverage and a relatively high number of readers in Mendeley (Costas, Zahedi & Wouters 2015b; Hammarfelt 2014). In contrast, fields from Mathematics & Computer Science (such as Analysis, Algebra and Number Theory, and Geometry and Topology) show a relatively low coverage and a relatively low number of readers (Thelwall 2017; Zahedi, Costas & Wouters 2014). Moreover, readership counts and citation counts per publication have similarly skewed distributions across different fields of science (Costas et al. 2016). Hence, as for citations, normalization approaches for correcting field differences in Mendeley readership have been suggested (Costas, Perianes-Rodríguez & Ruiz-Castillo 2017; Haunschild & Bornmann 2016).
In some previous studies, the readership activity of Mendeley users has been analyzed based on the self-declared academic disciplines of users. For example, co-readership based on the publications in the libraries of users and the self-declared academic disciplines of users have been used to measure and depict the similarity of subject areas within the field of Educational Technology (Kraker et al. 2015). The analysis of the network of co-readers in Mendeley also showed that students and postdocs in Mendeley have more common topical interests than other types of users (Haunschild, Bornmann & Leydesdorff 2015). In other studies, existing field classification systems have been used to compare readership between different types of users across different fields of science. The readership activity of Mendeley users has, for instance, been analyzed using the 5 main disciplines and 22 sub-disciplines from the NSF classification system (Haustein & Larivière 2014; Mohammadi et al. 2015), the 250 subject categories available in the WoS database (Zahedi & Van Eck 2015), and the 310 subject areas available in the Scopus database (Thelwall 2017). The results of these studies show substantial differences in readership practices between (sub)fields and user types. Moreover, the extent to which the number of readers correlates with the number of citations varies across (sub)fields and between user types.
Most of the previous studies are based on restricted Mendeley data (only the top three user types per publication) and focus on broad fields of science. It is not yet known how readership per user type varies across detailed micro-level fields and how these user types differ in their topics of interest. This is the first large-scale and systematic analysis of readership activity across detailed micro-level fields in which complete data on the readership activities of Mendeley users are taken into account. In addition to the overall readership activities of Mendeley users, this study also considers the relative activity of different types of Mendeley users. In this way, we are able to uncover the topics on which different types of users focus relatively strongly. Moreover, we introduce a new view on readership statistics by looking at the number of readers of publications normalized by the number of citations received by those publications. Combining these different usage statistics and patterns among user types provides insight into the way in which scholarly publications are used for scientific, educational, training, and other practical purposes. Readership statistics could thus help relevant stakeholders (researchers, librarians, publishers, funders, policy makers, etc.) gain more insight into the full impact of scholarly publications. To determine whether information from Mendeley is helpful in this respect, we address the following main research questions in this paper:

This paper is organized as follows. We first describe our dataset and analysis methods. Results are then reported. The paper concludes with a summary, a discussion of some key observations, and suggestions for additional work.

Data and Methodology
This study is based on a dataset of 1,196,226 publications collected from the WoS database. The dataset includes all publications of the document types 'article' and 'review' published in 2012 with a Digital Object Identifier (DOI).[2] The DOIs of the collected publications were used to retrieve readership data from Mendeley through the Mendeley REST API in July 2016.[3] These readership data also include information on the 'academic status' of users, as indicated by the users in their Mendeley profile.[4] To minimize the effect of national and disciplinary differences between the designations of academic and professional appointments and positions, users were grouped into five broad user types based on their 'academic status':
• Students: students (Bachelor), students (Master), students (postgraduate), doctoral students, and PhD students.
• Librarians: librarians or other library professionals.
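The grouping step above, together with the per-publication readership counts per user type, can be sketched in Python as follows. This is a minimal sketch that assumes each publication's readership record arrives as a mapping from academic-status label to reader count. The status labels are hypothetical placeholders, not Mendeley's actual strings. Only the Students and Librarians groupings follow the list above. The entries for professors, researchers, and other professionals are illustrative assumptions.

```python
from collections import Counter

# Hypothetical academic-status labels (not Mendeley's actual strings).
# The Students and Librarians groups follow the list in the text; the
# Professors, Researchers, and Other professionals entries are placeholders.
STATUS_TO_USER_TYPE = {
    "Student (Bachelor)": "Students",
    "Student (Master)": "Students",
    "Student (Postgraduate)": "Students",
    "Doctoral Student": "Students",
    "PhD Student": "Students",
    "Librarian": "Librarians",
    "Other Library Professional": "Librarians",
    "Professor": "Professors",
    "Researcher": "Researchers",
    "Other Professional": "Other professionals",
}


def readers_by_user_type(status_counts: dict[str, int]) -> Counter:
    """Collapse one publication's per-status reader counts into user types.

    Statuses that are not in the mapping are skipped; a real pipeline would
    need an explicit rule for them.
    """
    by_type: Counter = Counter()
    for status, count in status_counts.items():
        user_type = STATUS_TO_USER_TYPE.get(status)
        if user_type is not None:
            by_type[user_type] += count
    return by_type


if __name__ == "__main__":
    record = {"PhD Student": 4, "Student (Master)": 2, "Librarian": 1, "Unlisted": 3}
    print(readers_by_user_type(record))
```

Because a `Counter` returns 0 for absent keys, user types with no readers of a publication need no special handling when counts are later summed per field.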
For each publication, readership counts for all users and readership counts for individual user types were calculated. Citations were counted until the end of week 26 (July) of 2016 using the in-house version of the WoS database of the Centre for Science and Technology Studies (CWTS) of Leiden University. The publications in the dataset have been assigned to 4,113 micro-level fields and to five main fields of science. The 4,113 micro-level fields have been constructed algorithmically based on 282.4 million citation relations between 17.8 million publications from the period 2000-2015 indexed in the WoS database (Waltman & Van Eck 2012). The definitions used in the 2016 version of the CWTS Leiden Ranking have been used to aggregate the 4,113 micro-level fields into five main fields of science. Table 1 provides the number of publications in our dataset assigned to each of the five main fields of science.

Figure 1 shows a visualization that provides an overview of the 4,113 micro-level fields and the five main fields of science used in this study. Each circle represents a micro-level field. The size of a circle indicates the number of publications in our dataset in a micro-level field: the larger the circle, the larger the number of publications. The distance between two circles approximately indicates the relatedness of the two micro-level fields, where relatedness is determined by citation relations between the fields. In general, the smaller the distance between two circles, the more strongly the micro-level fields are related to each other. The color of a circle indicates the main field to which a micro-level field belongs. The color coding and positioning of the main fields are as follows: Mathematics & Computer Science (purple) is located in the top-right, the Physical Sciences & Engineering (blue) in the bottom-right, the Life & Earth Sciences (yellow) in the center, the Biomedical & Health Sciences (green) in the bottom-left, and the Social Sciences & Humanities (red) in the top-left of the visualization.

[2] About 13% of the 2012 WoS indexed articles and reviews do not have a DOI and are therefore excluded from the analysis.
[3] It is suggested in the literature (Zahedi, Haustein & Bowman 2014) that the best strategy to retrieve readership data using the Mendeley API is to perform searches based on a combination of DOIs and article titles. In this paper, however, we have chosen to search and match publications based on DOI only in order to keep our data collection accurate and transparent. Matching based on DOI only may lead to missed matches but avoids wrong matches.
[4] The 'academic status' is self-declared by Mendeley users. Users may therefore forget to update, or simply not update, their 'academic status' in their Mendeley profile when it has changed, e.g., due to a job change or promotion. This should be kept in mind when interpreting the results of this study.
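The DOI-only matching strategy chosen in footnote 3 trades missed matches for the absence of wrong matches. A minimal sketch of such exact matching follows. It assumes that DOIs may carry a resolver prefix such as `https://doi.org/` or `doi:`, and that matched Mendeley records are available as (DOI, record) pairs. Both are assumptions for illustration and do not describe the actual retrieval code. DOIs are case-insensitive, so lowercasing them before comparison is safe.

```python
import re

# Prefixes that may precede a DOI; which variants actually occur in the
# WoS and Mendeley records is an assumption made for this sketch.
_DOI_PREFIX = re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:)", re.IGNORECASE)


def normalize_doi(doi: str) -> str:
    """Trim whitespace, strip a resolver prefix, and lowercase the DOI."""
    return _DOI_PREFIX.sub("", doi.strip()).lower()


def match_by_doi(wos_dois, mendeley_records):
    """Map each WoS DOI to a Mendeley record using exact DOI equality only.

    mendeley_records: iterable of (doi, record) pairs. No title-based
    fallback is attempted, so a publication whose DOI is missing or differs
    in Mendeley maps to None (a missed match) instead of a wrong record.
    """
    index = {normalize_doi(doi): record for doi, record in mendeley_records}
    return {doi: index.get(normalize_doi(doi)) for doi in wos_dois}


if __name__ == "__main__":
    matches = match_by_doi(
        ["10.1000/ABC", "10.1000/xyz"],
        [("https://dx.doi.org/10.1000/abc", {"readers": 3})],
    )
    print(matches)
```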
The 4,113 micro-level fields and the five main fields of science enabled us to analyze readership activity in Mendeley at different levels of granularity. In order to analyze readership activity from different perspectives and to allow for analyzing differences between research fields and user types, the following statistics have been calculated for each of the micro-level fields and main fields of science:

Visualizations providing overviews of the statistics described above at the level of the 4,113 micro-level fields were constructed using the VOSviewer software tool (Van Eck & Waltman 2010). Specifically, so-called overlay visualizations were constructed using version 1.6.6 of VOSviewer. These visualizations can be used to show additional information on top of a base map (e.g., Leydesdorff & Rafols 2012; Van Eck et al.). In this case, the visualization of the 4,113 micro-level fields presented in Figure 1 was used as the base map. The constructed overlay visualizations enabled us to analyze readership activity in Mendeley in a fine-grained way and to identify possible differences between fields and user types.
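An overlay visualization keeps the base-map positions and changes only what circle size and color encode. One way to prepare such an overlay is sketched below as a tab-delimited map file with one row per micro-level field. In this file, `weight` holds the number of publications (circle size) and `score` holds the statistic being shown, such as coverage (circle color). The column names reflect our understanding of VOSviewer's map-file format and should be checked against the VOSviewer manual for the version in use. The function name and input dict keys are hypothetical.

```python
import csv
import io


def write_overlay_map(fields, out):
    """Write a tab-delimited map file for an overlay visualization.

    Each item in `fields` is a dict with keys: id, label, x, y (position on
    the base map), n_pubs (number of publications, shown as circle size),
    and stat (the statistic that determines the circle color).
    The header names are an assumption about VOSviewer's map-file format.
    """
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["id", "label", "x", "y", "weight", "score"])
    for f in fields:
        writer.writerow([f["id"], f["label"], f["x"], f["y"], f["n_pubs"], f["stat"]])


if __name__ == "__main__":
    buf = io.StringIO()
    write_overlay_map(
        [{"id": 1, "label": "field 1", "x": 0.5, "y": -0.2, "n_pubs": 120, "stat": 87.9}],
        buf,
    )
    print(buf.getvalue())
```

Writing one such file per statistic and user type, all with the same `id`, `x`, and `y` values, keeps every overlay aligned with the same base map.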

Results
A number of different analyses were performed in order to answer the research questions stated in the introduction of this paper. This section presents the results of these analyses. First, results on the coverage of publications by field and user type are presented in order to provide a complete overview of the coverage of Mendeley. Then, results on the readership activity in Mendeley by field and user type are presented. Finally, results on the topics of interest of Mendeley users are presented. As already indicated in the Data and Methodology section, overlay visualizations of the 4,113 micro-level fields played an important role in our analyses. In this section, static figures of the overlay visualizations are presented. The overlay visualizations can also be explored interactively using the VOSviewer software tool; the interactive version is available online at https://goo.gl/CJVRzL. The interactive visualizations make it possible to zoom in on a specific area and to explore in more detail the micro-level fields located in that area. They also offer additional information that is not visible in the static figures: hovering the mouse over a micro-level field displays more detailed information on that field.

Coverage of publications saved by Mendeley users across different fields
In this subsection, we analyze the coverage of our dataset in Mendeley. Table 2 presents the total number of publications in our dataset and the coverage of these publications in Mendeley. By coverage in Mendeley, we mean the percentage of publications with at least one reader in Mendeley. Table 2 also presents the breakdown of the coverage by field and by user type. In Figure 2, the coverage of the 4,113 micro-level fields is visualized using VOSviewer overlay visualizations. The coverage is shown for all Mendeley users (Figure 2a) and for individual user types (Figures 2b, 2c, 2d, 2e, and 2f). As explained above, each circle represents a micro-level field, and the size of a circle indicates the number of publications in our dataset in that micro-level field: the larger the circle, the larger the number of publications. The color of a circle indicates the percentage of publications in a micro-level field that is covered in Mendeley, ranging from blue (low coverage) through green (medium coverage) to red (high coverage). The positions of the main fields are as explained in the previous section.
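The coverage definition above can be expressed as a short Python function. This is a minimal sketch that assumes one reader count per publication in a field, either for all users or for a single user type. A key design point, implied by the definition, is that publications not found in Mendeley must stay in the denominator with a count of 0. Dropping them would inflate coverage. The function name is hypothetical.

```python
def coverage(reader_counts) -> float:
    """Percentage of publications with at least one reader in Mendeley.

    reader_counts: one reader count per publication in a field (all users
    or a single user type); publications not found in Mendeley count as 0.
    """
    counts = list(reader_counts)
    if not counts:
        raise ValueError("field contains no publications")
    covered = sum(1 for c in counts if c > 0)
    return 100.0 * covered / len(counts)


if __name__ == "__main__":
    # Four publications, two of which have at least one reader.
    print(coverage([0, 3, 1, 0]))
```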
As can be seen in Table 2 and Figure 2a, publications from Mathematics & Computer Science have the lowest coverage in Mendeley compared to the other main fields. Considering user types, it is clear that students are very active in Mendeley. Table 2 shows that the coverage is the highest for students (87.9%), followed by researchers (70.3%) and professors (63.6%). The lowest coverage is for other professionals (33.2%) and librarians (10.0%). The order of user types regarding coverage is identical in the Biomedical & Health Sciences, the Life & Earth Sciences, and the Physical Sciences & Engineering. In Mathematics & Computer Science and the Social Sciences & Humanities, slightly more publications are saved by Mendeley users classified as professors than by users classified as researchers. If we look at coverage at the more detailed level of micro-level fields, we see that the visualization for students (Figure 2d) most closely resembles the general pattern based on all users (Figure 2a). The visualizations for professors (Figure 2b) and researchers (Figure 2c) are fairly comparable; both show a lower coverage for the peripheral micro-level fields and for the micro-level fields from Mathematics & Computer Science. The visualizations for librarians (Figure 2e) and other professionals (Figure 2f) differ most from the general pattern (Figure 2a). Most micro-level fields show a relatively low coverage for these user types. In the case of librarians, we see a somewhat higher coverage for micro-level fields at the intersection of the Biomedical & Health Sciences and the Social Sciences & Humanities. In the case of other professionals, the micro-level fields with the highest coverage are from the Biomedical & Health Sciences and the Life & Earth Sciences.

Reader and citation counts of publications
In this subsection, we analyze the reader counts and the citation counts of the publications in our dataset and make comparisons across fields and between Mendeley user types. Table 3 presents the total and average number of readers per publication in Mendeley by main field and by Mendeley user type. Similarly, Table 4 presents the total and average number of citations per publication in WoS by main field and by Mendeley user type. Table 5 presents the normalized readership activity by main field and by Mendeley user type; here, the number of readers in Mendeley is normalized by the number of citations in WoS. In Figure 3, the same statistics are presented at the level of the 4,113 micro-level fields using VOSviewer overlay visualizations. Again, each circle represents a micro-level field, and the size of a circle indicates the number of publications in our dataset in the corresponding micro-level field. The color of a circle indicates the average number of readers per publication (Figure 3a), the average number of citations per publication (Figure 3b), or the normalized number of readers (Figure 3c) in a micro-level field.

Based on the results in Tables 3 and 4, differences between the number of readers and the number of citations across fields can be detected. It can be seen in Table 3 that publications from the Social Sciences & Humanities receive on average the highest number of readers in Mendeley. If we look at the more detailed level of micro-level fields, we see that the visualization based on reader counts (Figure 3a) differs substantially from the visualization based on citation counts (Figure 3b). The differences visible in these visualizations are in line with the results at the level of main fields discussed previously. The highest average number of readers can be observed for micro-level fields from the Social Sciences & Humanities and the Life & Earth Sciences, while the highest average number of citations can be observed for micro-level fields from the Biomedical & Health Sciences and the Physical Sciences & Engineering.
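The three statistics shown in Figure 3 can be computed per field as sketched below. This is one reading of the text, not the authors' code. It assumes that "normalized by the number of citations" means the average number of readers per publication divided by the average number of citations per publication. Because both averages use the same publications, this ratio equals total readers divided by total citations. The same function applies to all users or to the readers of a single user type. The function and key names are hypothetical.

```python
from statistics import mean


def field_statistics(readers, citations):
    """Average readers, average citations, and normalized readership for a field.

    readers, citations: equal-length sequences with one value per publication.
    The normalization (average readers / average citations) is an assumed
    reading of the text; it is undefined (NaN) when the field has no citations.
    """
    if not readers or len(readers) != len(citations):
        raise ValueError("need equal-length, non-empty sequences")
    avg_readers = mean(readers)
    avg_citations = mean(citations)
    normalized = avg_readers / avg_citations if avg_citations > 0 else float("nan")
    return {
        "avg_readers": avg_readers,
        "avg_citations": avg_citations,
        "normalized_readers": normalized,
    }


if __name__ == "__main__":
    # Three publications: 30 readers and 6 citations in total.
    print(field_statistics([10, 20, 0], [2, 4, 0]))
```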
Table 5 and Figure 3c show that, particularly when the citation density of each main field and each micro-level field is taken into account, publications from the Social Sciences & Humanities receive an above-average number of readers. This suggests that fields with a relatively low citation density in WoS receive a relatively high number of readers in Mendeley. This is particularly observable for publications saved by librarians, professors, and students and could reflect the usefulness of these publications in practical, training, or educational contexts.

Relative activity of Mendeley user types
In this subsection, the relative activity of Mendeley users is presented. Figure 4 provides visualizations of the relative activity of the different Mendeley user types at the level of the micro-level fields. As explained in the Data and Methodology section, the relative activity of a Mendeley user type in a field is calculated as the average number of readers per publication based on the activity of the Mendeley user type in the field divided by the average number of readers per publication based on the activity of all Mendeley users in the field. This approach provides us with a detailed overview of the micro-level fields in which different user types are relatively most and least active.
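Written out with notation introduced here purely for illustration, the definition above simplifies. Let $N_f$ be the number of publications in field $f$, $R_{u,f}$ the number of readers of user type $u$ summed over those publications, and $R_f$ the corresponding number of readers from all users. Because both averages are taken over the same $N_f$ publications, $N_f$ cancels:

```latex
\mathrm{RA}_{u,f}
  = \frac{R_{u,f} / N_f}{R_f / N_f}
  = \frac{R_{u,f}}{R_f},
  \qquad 0 \le \mathrm{RA}_{u,f} \le 1 .
```

Under this reading, the relative activity of a user type in a field is simply that type's share of all readers in the field. It therefore lies between 0 and 1 and can be compared across fields for a given user type.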

Topics of interest of Mendeley user types
In this subsection, we present an overview of specific topics of interest of different Mendeley user types. For each user type, we identified the micro-level fields in which users of that type are relatively most active, based on the relative activity of the user type. Micro-level fields with a small absolute number of readers have been filtered out. To give an impression of the topics of the micro-level fields selected in this way, a summary of the top 5 micro-level fields per Mendeley user type is provided in Tables A1 to A5 in the Appendix. For each micro-level field, the tables list the number of readers based on the activity of the corresponding user type, five characteristic terms, the three journals with the largest number of publications, and the most frequently saved publication. By analyzing the micro-level fields listed in Tables A1 to A5, differences between the topics of interest of different Mendeley user types can be observed. Based on the results and in answer to the second main research question, it is interesting to see that professors have a relatively strong focus on topics related to teaching and education, such as higher education, medical education, and second language acquisition (Table A1). Researchers seem to be interested in a broad range of topics, ranging from climate research, pharmaceutical research, and biotechnology to astronomy and astrophysics (Table A2). Students seem to lean towards topics such as business, management, and leadership (Table A3). Librarians show the most relative interest in topics that seem directly related to their work, namely bibliometrics and scientometrics, library science, and research utilization (Table A4). Other professionals seem to be mostly focused on biological, medical, and clinically oriented topics (Table A5).
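The selection procedure described above (filter out small fields, then rank by relative activity and keep the top 5) can be sketched as follows. The sketch makes several assumptions. Per-field reader totals are available for the user type and for all users. Relative activity is computed as the user type's share of a field's readers, following the definition in the previous subsection. The filter is applied to the user type's own reader count. The text does not state the cut-off, so `min_readers=100` is a hypothetical placeholder, and the function name is also hypothetical.

```python
def top_fields(type_readers, all_readers, min_readers=100, k=5):
    """Return the k fields in which a user type is relatively most active.

    type_readers: {field_id: readers of this user type in the field}
    all_readers:  {field_id: readers of all users in the field}
    Fields where the user type has fewer than `min_readers` readers are
    dropped first (hypothetical threshold); the rest are ranked by
    relative activity, i.e. the user type's share of the field's readers.
    """
    scored = []
    for field, own in type_readers.items():
        total = all_readers.get(field, 0)
        if total == 0 or own < min_readers:
            continue
        scored.append((own / total, field))
    # Sort on the share only, so field ids never need to be comparable;
    # ties keep their input order because Python's sort is stable.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [field for _, field in scored[:k]]


if __name__ == "__main__":
    own = {"A": 150, "B": 50, "C": 300, "D": 120}
    total = {"A": 300, "B": 60, "C": 1000, "D": 150}
    # B is filtered out (50 < 100) despite its high share of 0.83.
    print(top_fields(own, total))
```

The example illustrates why the filter matters: without it, a field with very few readers can top the ranking on the basis of a handful of saves.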
Below, we further elaborate on how the different topical interests of the various Mendeley user types could indicate different types of usage of scholarly publications.

Discussion and Conclusions
Mendeley is known as a promising source for altmetrics. Readership data from Mendeley can be used to reveal differences in reading (saving) behavior of different types of users. In this study, we have explored the usage of 1.2 million WoS indexed publications by different user types in Mendeley. The aim was to see if there are any differences in readership activity and topics of interest. VOSviewer overlay visualizations have been used to identify and depict these differences.
The findings of this study show considerable disciplinary differences in readership activity and in topics of interest among different user types in Mendeley. Publications from the Social Sciences & Humanities receive on average the highest number of readers, which may indicate that Mendeley is relatively more popular in this field than in other fields. Interestingly, this is in sharp contrast to citations, which are typically less concentrated in the Social Sciences & Humanities and most concentrated in the Biomedical & Health Sciences, the Life & Earth Sciences, and the Physical Sciences & Engineering. The purposes for which different user types use Mendeley could help to explain these disciplinary differences. A recent study (Thelwall 2017), for instance, found that professional health-related areas receive high readership in Mendeley due to the use of the tool in training, while mathematics and high energy physics receive low readership due to the use of other tools such as LaTeX. Another study showed that F1000 publications with the tag 'good for teaching' (publications that provide a good overview of a particular topic) receive most attention from Mendeley users classified as lecturers, whereas publications with the tag 'new findings' receive most attention from users classified as researchers. This indicates that Mendeley is used for different purposes. The results of a survey among Mendeley users show that most respondents use Mendeley to cite literature in their publications, to keep track of publications relevant to their jobs, and to teach. Another recent survey shows that browsing papers and groups and connecting with other users are among the motivations of Mendeley users. Most of these users reported that they have read and cited, or intended to cite, most of the items in their Mendeley library (Chen et al. 2018).
Moreover, research-based features (managing documents and citations) in Mendeley are more popular among members of online groups than social-based features (making friends and connections) (Jeng, He & Jiang 2015). The results of these previous studies are consistent with those of a recent survey (Tenopir et al. 2015), which indicates that the main reasons for US faculty members to read scientific publications in general are research and writing, teaching, current awareness, and education.
In terms of topics of interest, the results of this study indeed indicate that different user types pay relatively more attention to publications related to their role and to the purposes for which they use Mendeley. These purposes range from conducting (literature) research and writing articles to training, (self-)education, or the use of a device or method. We have, for example, found that publications related to teaching and education show high readership among professors. This may be expected, since professors use Mendeley, among other things, to organize literature for teaching and publishing. This suggests that readership by professors reflects both educational and research use. Furthermore, the results seem to suggest that students focus on more general and fundamental topics. Although the fact that a publication is frequently saved by students does not provide conclusive evidence of its educational impact (Thelwall 2016), students seem to frequently save fundamental and basic methodological publications that they read for educational purposes. This could reflect the use of Mendeley by students as a source of course material or of literature for their master's or doctoral thesis. The strong interest of researchers in publications on applied sciences could reflect the scientific impact of these publications in an applied context. Researchers seem to focus on the research front and to use Mendeley mostly in a pre-citation context. Other professionals, who may include, for example, medical doctors, nutritionists, and lawyers, seem to be mostly interested in publications about diagnosis, treatment, tools, and devices. In other words, other professionals are more likely to show interest in publications that have practical relevance to their work. It is also interesting to see that publications related to library and information science or clinical guidelines show high readership among librarians.
In conclusion, the various patterns of usage of scientific publications observed among the different user types in Mendeley could be an indication of the importance of these publications in research, training, (self-)education, or any professional, practical, or applied context. Exploring how the topical interests of users differ across fields provides useful information on who uses scientific outputs, from which fields, and for what purposes. The possibility offered by Mendeley to track the use of scholarly publications by different types of users is an advantage that citation databases lack. Readership statistics based on different user types provide a broad overview of the usage of scholarly publications by a wide range of audiences, including non-publishing users. This information complements citation data, especially in fields with a low citation density or fields in which citations accumulate slowly. Detailed information on the usage of scholarly publications could help relevant stakeholders, such as researchers, librarians, publishers, funders, and policy makers, to gain more insight into the full impact of publications.
In addition to the number of users that have saved a publication, more information on specific activities of users in Mendeley could help to build a more accurate and comprehensive picture of the actual usage and impact of scholarly publications. For instance, it would be interesting to have more information on user actions such as assigned tags, notes that are added, parts of the full text that are highlighted, and time spent on a saved publication. Mendeley does not currently disclose this type of information. More research is needed to find out whether and how this type of information can be useful in practical applications.