Differences between Altmetric Data Sources – A Case Study

This paper examines the data accuracy and the altmetric counts reported by Mendeley, Altmetric.com and PlumX at two points in time, June 2017 and April 2018, for a dataset of 2,728 articles and reviews published in JASIST between 2010 and mid-2017. The findings show growth in the number of citations and Mendeley readers over time. In addition, the results show that the two altmetric aggregators, Altmetric.com and PlumX, report different altmetric counts.


Introduction
Since its inception in 2010 (Priem et al., 2010), altmetrics have been actively promoted as a new set of indicators for evaluating and capturing the broader impact of scholarly output. Over the years, several studies questioning the meaning of altmetric indicators and what they actually measure have been published (e.g., Rasmussen & Andersen, 2013; Haustein, Bowman, & Costas, 2016). One of the major challenges in the use and interpretation of altmetric indicators is questionable data quality (Haustein, 2016) and a high dependency on commercial providers (aggregators) of altmetric data (Costas, Zahedi, & Wouters, 2015).
Readership counts, most often measured by the number of document-saving events on the reference manager Mendeley, have the greatest coverage among all altmetric indicators (Zahedi, Costas, & Wouters, 2013). Not only is this one of the most prevalent altmetric indicators currently captured (Haustein, Bowman, & Costas, 2016), but it is also the one that correlates most highly with citation counts (around .5) (Mohammadi & Thelwall, 2014; Haustein et al., 2014).
Altmetric aggregators, including the two most often used ones, Altmetric.com and PlumX, report Mendeley readership. Considering data accuracy issues and the importance of Mendeley readership as an indicator, it is important to examine the coverage of readership counts across the data source itself (Mendeley) and the two major aggregators (Altmetric.com and PlumX). Examples of previous reliability studies are Zahedi, Fenner, and Costas (2014) and Ortega (2018).
Given that indicators are only as good as the data they are based on, it is not surprising that similar questions were asked in previous studies comparing citation counts across bibliometric databases (WoS, Scopus, and Google Scholar) (Bar-Ilan, 2008; Halevi, Moed, & Bar-Ilan, 2017; Trapp, 2016; Boeker, Vach, & Motschall, 2013; Harzing & Alakangas, 2016; Bramer, Giustini, & Kramer, 2016). These studies have shown considerable differences between the numbers reported by the databases, primarily due to differences in coverage, types of documents covered, and errors.
Mendeley is an online reference manager, reporting the number of users who downloaded an item to their Mendeley libraries ('readers'). Altmetric.com and PlumX are aggregators that report multiple altmetrics, including the number of tweets, blogs, Wikipedia and news mentions. They also report Mendeley reader counts. Altmetric.com and PlumX might report different altmetric scores for a number of reasons. First, they might cover different news and blog sources. In addition, they might use different ways to identify mentions (e.g., by DOI, PMID, arXiv id, or title-author-source-publication year). Finally, they might have different schedules for updating from the primary data sources (Ortega, 2018).
Despite its comprehensiveness, the Mendeley database has some inherent problems that affect the data it generates. First, the database is user-driven: each user identifies a publication of interest and adds it to their personal library. This makes the data more prone to errors than third-party, curated databases. Second, errors in the metadata fields often prevent Mendeley from correctly aggregating reader counts, and quite often there is more than one record for a given item. Finally, when the Mendeley API is used to retrieve reader counts by DOI, only a single record is retrieved (not necessarily the record with the highest number of readers), so the numbers reported by the aggregators might be underestimates. Mendeley reorganizes and cleans its database from time to time (Gunn, 2016), which might result in a decrease in the number of readers reported. The Mendeley API allows searching by several fields, such as author, title, publication source, and publication year, but since the metadata is entered by users who do not follow a predefined format, multiple records are often created for the same item. In addition, because of possible metadata errors, a search query might miss records for a given item. When the API is queried by article title, Mendeley often returns multiple records for the same article. Finally, special characters in text fields do not render well in Mendeley.
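The duplicate-record problem described above can be sketched as follows. This is only an illustration, not Mendeley's actual data model: the record shape and the crude title normalization are assumptions, but they show why a DOI lookup that returns a single record can understate the total readership spread across duplicates.

```python
import re

def normalize_title(title):
    """Crude normalization: lowercase, drop punctuation, collapse whitespace,
    so near-duplicate Mendeley records collapse onto one key."""
    t = re.sub(r"[^a-z0-9 ]", "", title.lower())
    return re.sub(r"\s+", " ", t).strip()

def aggregate_reader_counts(records):
    """Sum reader counts over all records that appear to describe the same
    article. A DOI lookup returns only one of these records, so its count
    can be lower than the aggregated total computed here."""
    totals = {}
    for rec in records:
        key = normalize_title(rec["title"])
        totals[key] = totals.get(key, 0) + rec["reader_count"]
    return totals
```

For example, two user-entered records titled "Altmetrics: A Manifesto" and "altmetrics - a manifesto" would be merged under one key, and their reader counts summed.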
The aim of this study was, therefore, to examine the data accuracy and the altmetric counts reported by Mendeley, Altmetric.com and PlumX at two points in time, June 2017 and April 2018, and to compare the reported altmetrics at each data collection point. The research questions are: 1. Do all platforms report the same counts when data are downloaded on the same day? 2. How do the counts change over time?

Data Collection
To answer the research questions, we used JASIST (Journal of the American Society for Information Science and Technology between 2001 and 2013, and Journal of the Association for Information Science and Technology from 2014 onwards) articles and reviews published between 2010 and mid-2017 (issues 1 to 7). The initial data collection took place on June 29, 2017, and the second round on March 29, 2018. The dataset comprises 2,666 articles and 62 reviews, altogether 2,728 items (referred to as 'articles' or 'documents' from this point onward). Results from the first data collection point were presented at the Altmetrics17 Workshop (Bar-Ilan & …). Data from Mendeley and Altmetric.com were collected with the Webometric Analyst tool developed by Mike Thelwall (http://lexiurl.wlv.ac.uk/). Mendeley was searched both by DOI and by title query. The title-query results underwent data cleansing and aggregation of reader counts from multiple records of the same article. Data cleansing was necessary because title searches in Mendeley often return multiple records, some of which are not the article searched for; in addition, there are cases where multiple records of the same item need to be aggregated. Data from PlumX were downloaded from the PlumX dashboard, which has a readily available download function. Both Altmetric.com and PlumX use DOIs as the primary key for data collection.
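The two-snapshot comparison underlying the analysis can be sketched as below. The DOI-keyed dictionaries are an assumed intermediate format (the actual data came from Webometric Analyst exports and the PlumX dashboard), but the merge logic, where an article absent from the earlier snapshot counts as zero readers, mirrors how newly covered articles appear as pure growth.

```python
def compare_snapshots(counts_2017, counts_2018):
    """For each DOI present in either snapshot, report (old, new, delta).
    A DOI missing from one snapshot is treated as 0 readers there, so an
    article newly covered in 2018 shows up as pure growth."""
    dois = set(counts_2017) | set(counts_2018)
    out = {}
    for doi in dois:
        old = counts_2017.get(doi, 0)
        new = counts_2018.get(doi, 0)
        out[doi] = (old, new, new - old)
    return out
```

A negative delta in this output corresponds to the rare cases, noted later, where reader counts decreased between the two collection dates.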

Mendeley Data Coverage and Reader Counts
Following a title search, and after aggregating and cleansing the data retrieved from the Mendeley API with Mike Thelwall's Webometric Analyst, we found that Mendeley had reader counts for 2,628 publications (96.3%) in the dataset in 2017, and for 2,717 articles (99.6%) in 2018. In 2018 we also ran Mendeley reader count searches by DOI (which is what the aggregators do) and retrieved reader counts for 2,690 documents (98.3% coverage). See Table 1.
As Table 1 shows, Altmetric.com reported Mendeley reader counts for 1,124 articles (40.8%), while PlumX reported reader counts for 1,721 articles (63.1%) in 2017. This difference could be due to the fact that Altmetric.com records Mendeley readership counts only if there is at least one additional altmetric indicator for the document; if Mendeley readership is the only indicator, Altmetric.com does not report it. Mendeley was acquired by Elsevier in 2013 (Elsevier, 2013), and PlumX was acquired in February 2017 (Michalek, 2017). The integration of PlumX with Elsevier content might have taken some time, and therefore had no real influence on PlumX coverage in June 2017. Now that PlumX is well integrated, and both Mendeley and PlumX metrics are displayed on Scopus (previously the Altmetric.com donut was displayed on Scopus), an increase in the Mendeley reader counts reported by PlumX is noted.
As can also be seen from Table 1, there is an increase in the total number of readers overall. Some of the growth can probably be attributed to increased coverage, but as can be deduced from the average, median, and maximum counts, there is also an increase in the overall number of readers over time.
While the counts reported in Table 1 indicate growth, we also wanted to test whether this growth is statistically significant. Figure 1 shows the percentages of articles covered by the different databases in 2017 and 2018. Each data point has an error bar showing the 68% confidence range based on Poisson statistics. Only the change in PlumX coverage is statistically significant (p < 0.0001, whereas p is 0.04 and 0.11 for Mendeley and Altmetric.com, respectively). This further confirms the growing importance of PlumX as an altmetric aggregator. The intersection (items with reader counts reported by all three sources) also increased, from 804 to 1,021. One should also note the increased intersection between Mendeley and PlumX. Figure 3 displays the distribution of the per-article differences between the 2018 and 2017 Mendeley reader counts, as retrieved from Mendeley. Even in the primary data, there are cases where the number of readers decreased (but only for 1.4% of the articles). This is possibly due to periodic rebuilds of Mendeley that include data aggregation and cleansing (Gunn, 2014).
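A minimal sketch of the kind of test behind such Poisson-based error bars is given below. The paper does not spell out its exact procedure, so this is only an illustration under the assumption that each coverage count is an independent Poisson variable; under the null hypothesis of equal means, the difference of the two counts has variance approximately equal to their sum, and a normal approximation gives a two-sided p-value.

```python
import math

def poisson_count_change_pvalue(n1, n2):
    """Two-sided p-value for whether two Poisson counts differ, via the
    normal approximation: under H0 both counts share the same mean, so
    n2 - n1 has variance ~ n1 + n2. erfc(|z|/sqrt(2)) = 2*(1 - Phi(|z|))."""
    if n1 + n2 == 0:
        return 1.0
    z = (n2 - n1) / math.sqrt(n1 + n2)
    return math.erfc(abs(z) / math.sqrt(2))
```

With counts of this study's magnitude (thousands of articles), even modest relative changes in coverage yield very small p-values, while small absolute changes remain non-significant.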

Coverage overlap between Mendeley, Altmetric.com and PlumX
Figures 4 and 5 display the differences in Mendeley readership counts for articles that had at least one Mendeley reader in at least one of the two data sources compared, Altmetric.com and PlumX. As can be observed, in most cases the differences are small, despite some outliers. In some cases the difference is negative, i.e., the aggregator reported a higher number of readers than the source (Mendeley). For Altmetric.com, 80% of the articles show no difference in counts.
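The kind of figure reported above (the share of articles with no difference between source and aggregator) can be computed as in this sketch. The DOI-keyed count dictionaries are assumed data shapes, not the study's actual files.

```python
def difference_distribution(source_counts, aggregator_counts):
    """Per-article difference (source minus aggregator) for articles with
    at least one reader in either place, plus the share with no difference.
    Negative differences mean the aggregator reported more readers than
    the source (Mendeley) itself."""
    diffs = []
    for doi, src in source_counts.items():
        agg = aggregator_counts.get(doi, 0)
        if src > 0 or agg > 0:
            diffs.append(src - agg)
    zero_share = diffs.count(0) / len(diffs) if diffs else 0.0
    return diffs, zero_share
```

Articles with zero readers in both places are excluded, matching the figure's restriction to articles with at least one Mendeley reader in at least one data source.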

Twitter, Blogs, Wikipedia and Mainstream News - Altmetric vs. PlumX
Our data show reasonable Twitter activity for articles published from 2012 onwards, although Twitter was launched six years earlier, in 2006. This time gap could be due to the fact that it took several years for researchers to adopt Twitter as a scholarly communication tool. Therefore, for the Twitter mentions analysis we considered only a subset of 1,091 articles published between 2012 and mid-2017.
In the same manner in which we examined Mendeley readership overlaps, we also examined overlaps in Twitter coverage. Figures 6a and 6b display the overlap in 2017 and 2018, respectively. Unlike for Mendeley, the overlap between Altmetric.com and PlumX is large, and it grows over time. As can be seen in Figure 7, the average number of tweets per article remained more or less stable for Altmetric.com, while the average number of tweets per article decreased for PlumX. It is interesting to note that for almost all publication years, the average number of tweets reported by PlumX in 2018 was lower than in 2017. This can be explained by PlumX's increased Twitter coverage in 2018: the newly covered documents tend to have fewer tweets, which lowers the average.
Table 2 in the appendix shows that both aggregators provided rather high Twitter coverage of the dataset, between 66% and 78%. This is contrary to previous studies, which reported no more than 25% Twitter coverage of their datasets. For example, Zahedi, Costas and Wouters (2015) studied the Altmetric.com coverage of a large dataset (more than half a million articles published from 2011 onwards) and found 13% coverage for Twitter, while Thelwall,

Conclusions
This paper demonstrates that, overall, there is a visible improvement in the coverage overlap between Altmetric.com, Mendeley and PlumX. We compared the same articles at two points in time, 2017 and 2018, and saw that within a relatively short amount of time these three databases reduced the number of coverage discrepancies. There are still evident gaps in coverage, but these seem to decrease over time and can be attributed to the varying methodologies used by the three databases. For example, Mendeley is user-driven, which makes it prone to errors: Mendeley users save records in different ways, so there can be several instances of the same article, or errors in the metadata itself that prevent an accurate account of readership. Altmetric.com counts Mendeley readership only if another altmetric indicator can be found for the article; therefore, in cases where only Mendeley readership exists, Altmetric.com might not track the interaction until an additional altmetric indicator is found. This can also account for some of the gaps we observed in the percentage of articles covered. Although the gap seems to be shrinking over time, it is recommended that altmetric indicators, and especially Mendeley readership counts, be analysed across more than one platform, including Mendeley itself. Because of its inherent metadata challenges, Mendeley data alone will, in some cases, not be accurate.
In the same manner, this article also showed that there are differences in the altmetric indicator counts provided by Altmetric.com, Mendeley and PlumX. Again, this is a direct result of the manner in which each platform tracks and reports altmetric data, as well as the sources it uses to do so. As with readership, we recommend that analyses of altmetric indicators be performed on more than one platform and the results compared. First, one should check article coverage and ensure that the articles being analysed are indeed the same ones. Second, one should aggregate records that show erroneous or partial metadata but are obviously the same article. Lastly, one should collect the same altmetric indicators from more than one platform and note whether there are significant differences between them. Unlike citations, for example, altmetric indicators are dynamic and more difficult to control through standardization. Therefore, despite the considerable improvement in the overall overlap and coverage of articles in these databases, one should compare results across platforms.

Limitations
This study was based on a relatively small sample in a specific field; therefore, the results might be difficult to generalize. Further comparison studies are needed across disciplines and years, similar to those that compare WoS, Scopus and Google Scholar (e.g., Halevi et al., 2017).