The Relationship Between Institutional Factors, Citation and Altmetric Counts of Publications from Finnish Universities

The goals of this research were two-fold. First, this research set out to investigate institutional characteristics that may or may not influence the online attention received by scientific articles from an institution, in other words, the number of altmetric events surrounding those articles. The results suggest that international connections are important in the accumulation of altmetric events, possibly due to the creation of weak ties between researchers and their institutions. Second, we examined whether the institutional research profile, i.e. the fields in which the institution published, matched the distribution of altmetric events across the same fields of science. Our analysis shows that the universities' research profiles are not always reflected in the online events. Overall, the results for both goals reflect a complex system in which the online attention received can be attributed to many different factors.


Introduction
The idea for altmetrics originates from researchers' need to find new ways to locate relevant and interesting scientific articles (i.e., filtering) among the ever-increasing number of scientific publications (Priem et al. 2010). Yet altmetric events have primarily been studied from a research evaluation perspective, although some qualitative approaches to analysing online mentions of research products have recently been introduced. Earlier altmetrics research has focused on investigating how different altmetric counts are connected to citation counts (e.g., Thelwall, Haustein, Larivière & Sugimoto, 2013; Haustein, Larivière, Thelwall, Amyot, & Peters, 2014; Haustein, Peters, Sugimoto, Thelwall & Larivière, 2014; Bornmann, 2015), in some cases finding evidence of a connection between the two metrics. Other studies have analysed the potential influence of various document characteristics (e.g., discipline, title length, number of references and level of collaboration) on the altmetric events that research outputs subsequently attract (e.g., Haustein, Costas & Larivière, 2015; Didegah, Bowman & Holmberg, 2016), or how citation and altmetric counts differ across disciplines (e.g., Costas, Zahedi & Wouters 2015). The goal of many earlier studies has been to explain the meaning of altmetrics and to understand what the online attention received by research could reveal about that research at the article level. This article continues this line of research and investigates effects at the institutional level by studying altmetric events of publications from Finnish universities.
The first goal of this research is to identify whether institutional properties such as size of staff, amount of external funding, and number of international research visits are connected to the level of online visibility that research publications receive on different online platforms. This line of investigation could reveal new information about the mechanisms behind the creation of altmetrics and their possible connection to institutional properties of the organizations producing scientific outputs. We analyze the events from Wikipedia, Twitter, Facebook, mainstream news, blogs, and CiteULike aggregated by Altmetric.com for research articles published by 10 universities in Finland from 2012 to 2014, in combination with Mendeley readership counts retrieved from the Mendeley API. The second goal of this research is to investigate how the research profiles of the institutions (as measured by the distribution of Web of Science (WoS) indexed publications across disciplines) correspond to the distribution of online attention (i.e. altmetric events) the same publications have received on different platforms. In other words, the second goal can reveal how well altmetrics reflect the research profiles of universities. This article proceeds as follows: Section 2 summarises the literature on the subject; Section 3 describes the data collection and methodology of the study; Section 4 presents the results (Section 4.1 addresses the first research question and Section 4.2 the second); and Section 5 discusses the results and concludes with our findings.

Background
Altmetrics (short for alternative metrics) have emerged as a potential complementary data source for metrics connected to research performance. Indicators derived from scientific publications and citations are frequently used to measure scientific impact, but they do not take the complexity of scientific activities into account. Citations, for example, only reflect how often other researchers have used a specific scientific article and thus only reflect the scientific impact of research, whereas research can have, and often is expected to have, a much wider impact on society. As altmetrics are aggregated from online platforms open to the general public (as well as to researchers), they have the potential to reflect both new forms of scholarly communication and the attention received from a wider audience outside academia. However, there are still many unanswered questions about the applicability and reliability of altmetrics, and altmetrics are not without challenges. Earlier research has shown that only a fraction of scientific outputs receive online attention that generates altmetrics (e.g., Costas, Zahedi & Wouters, 2015). Altmetrics can be manipulated, intentionally or unintentionally, by automated accounts, so-called bots, on various platforms. Data quality issues and the dependency on the availability of both APIs for data collection and DOIs for identification pose major challenges for altmetrics research (Haustein, 2016). Furthermore, the heterogeneity of altmetrics makes it important to view altmetric events identified on different platforms separately (Haustein, 2016). For instance, earlier research into the reasons for engaging with research outputs online has shown that motivations vary between platforms and even within them (Holmberg & Vainio, 2018).
Many studies have approached altmetrics by studying correlations between traditional bibliometric measures and different altmetrics. Thelwall, Haustein, Larivière and Sugimoto (2013) identify relationships between article citation counts and blog mentions, Facebook wall posts, forum posts, mainstream media mentions, research highlights and tweets. However, they find that the coverage and overall number of mentions were very low for all studied platforms, with the possible exception of Twitter. More recently, some studies have also found a connection between different altmetrics and later citation counts (e.g., Wang et al. 2017; Finch, O'Hanlon, & Dudley, 2018; Costas, Zahedi & Wouters, 2015), while others have not found any connection (e.g., Hassan et al. 2017; Delli et al. 2017; O'Connor et al. 2017; Ruano et al. 2018). These differing findings may result from several factors, including differences in methodology and data samples, or possibly changes in the usage of specific platforms over time. While more research is clearly needed to understand how altmetrics are generated and which aspects of scholarly communication the online attention received by research can reflect, some insights into the usefulness of specific data sources are emerging. Mendeley readership, for instance, has been suggested to be an important source of altmetric data due to its scholarly user base and its similarity to citations. Thelwall (2017), for example, suggests that Mendeley reader counts could be used (with caution) as early evidence of citation impact, as they tend to correlate strongly with citation counts across almost all scientific fields. Twitter, however, may be less suitable as a data source for altmetrics. Robinson-Garcia et al. (2017) find that only a small portion of the tweets mentioning scientific articles included some commentary about the article or other evidence of engagement with it. The majority of tweets were "almost entirely mechanical and devoid of original thought", and some were generated automatically by bots.
Most earlier altmetrics research has focused on the possibilities of using altmetrics as article-level metrics, while research on the applicability of institutional or country-level altmetrics is almost non-existent. Alhoori et al. (2014) studied country-level altmetrics, found a weak connection between aggregated country-level altmetrics and more traditional impact measures such as the number of publications and citations, and suggested that altmetrics could support research evaluation at that level. In more traditional scientometrics research, aggregating measurable events to higher levels is more common. The much criticized (see e.g., Larivière & Sugimoto, 2018) Journal Impact Factor (JIF), for instance, aggregates the citations received within a specific time frame by the publications of a specific journal, relative to the number of those publications. One of the criticisms of the JIF is that it can be heavily influenced by a few articles that receive an exceptional number of citations; Seglen (1997) writes that "the most cited half of the articles are cited, on average, 10 times as often as the least cited half". More recently it has been found that up to 75% of articles have fewer citations than the JIF of their journal would predict. It appears that the complexity of scientific activities is lost when bibliometric data are aggregated to higher levels. This research investigates whether this also holds for altmetrics, and whether aggregating altmetrics to an institutional level can reveal new aspects of altmetrics and of the external factors potentially influencing their creation. The goals of this research can be summarized in the following two research questions: 1. How do specific institutional properties influence the level of attention that research outputs from a specific institution receive? 2. How well are institutional research profiles reflected in altmetric events?
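To illustrate the aggregation problem described above, the sketch below computes a simplified two-year impact factor (citations received in year Y by the citable items a journal published in years Y-1 and Y-2, divided by the number of those items) for a hypothetical journal, together with the share of its items cited less often than that value. The citation counts and function names are invented for illustration only.

```python
from statistics import median


def two_year_impact_factor(citations_per_item):
    """Simplified two-year JIF for year Y: citations received in Y by the
    citable items published in Y-1 and Y-2, divided by the number of items.
    `citations_per_item` holds one citation count per citable item."""
    if not citations_per_item:
        raise ValueError("need at least one citable item")
    return sum(citations_per_item) / len(citations_per_item)


def share_below(citations_per_item, threshold):
    """Fraction of items cited strictly fewer times than `threshold`."""
    return sum(c < threshold for c in citations_per_item) / len(citations_per_item)


# Hypothetical journal: eight modestly cited items and two highly cited ones.
citations = [0, 1, 1, 2, 2, 3, 3, 4, 40, 64]

jif = two_year_impact_factor(citations)
print(f"JIF = {jif:.1f}, median = {median(citations)}")
print(f"share of items cited less than the JIF: {share_below(citations, jif):.0%}")
```

With these invented numbers the JIF is 12.0 while the median item has 2.5 citations, so 80% of the items fall below the journal-level average: two outliers dominate the aggregate, which is the same risk this study examines for institution-level altmetrics.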

Data and methodology
The data on Finnish research publications were retrieved from the national Juuli research publications database. Juuli is maintained by the National Library of Finland in collaboration with the Finnish Ministry of Education and Culture and CSC-IT Center for Science. The data for the database are collected annually from Finnish research organisations. For this article, a total of 114,496 publications from 2012 through 2014 were collected from 14 Finnish universities. CrossRef was queried through its API to add any missing digital object identifiers (DOIs) to the data, after which a DOI had been identified for 38,819 publications. These DOIs were used to search the altmetric data provided by Altmetric.com, which showed that a total of 12,438 Finnish research publications from 2012-2014 had at least one recorded altmetric event. Some publications were co-authored by researchers from more than one Finnish university; these publications were counted once for each participating university. After these steps, the final dataset contained a total of 13,031 Finnish research publications from 2012 through 2014 (with co-authored publications counted once per participating university) with at least one altmetric event captured by Altmetric.com. A summary of the number of articles included in the study from different universities can be seen in Table 1.
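The counting steps described above can be sketched as follows. The record layout (a DOI plus the set of participating universities), the sample records, and the set of DOIs standing in for the Altmetric.com lookup are hypothetical, and the sketch assumes each publication appears only once in the input. Publications without a DOI are dropped, and a publication co-authored by several universities is credited once to each of them (full counting), which is why the per-university total (13,031 in the study) can exceed the number of unique publications (12,438).

```python
from collections import Counter
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Publication:
    doi: Optional[str]            # None when no DOI was found, even via CrossRef
    universities: FrozenSet[str]  # participating Finnish universities


def count_with_altmetrics(pubs, dois_with_events):
    """Return (unique publications with >= 1 event, per-university counts).

    DOIs are compared in lower case because DOI names are case-insensitive;
    `dois_with_events` is expected to contain lower-cased DOIs.
    """
    matched = [p for p in pubs
               if p.doi is not None and p.doi.lower() in dois_with_events]
    per_university = Counter()
    for p in matched:
        # Full counting: every participating university is credited once.
        per_university.update(p.universities)
    return len(matched), per_university


# Hypothetical records and altmetric lookup.
pubs = [
    Publication("10.1000/a1", frozenset({"Univ A"})),
    Publication("10.1000/A2", frozenset({"Univ A", "Univ B"})),
    Publication(None, frozenset({"Univ B"})),
    Publication("10.1000/a4", frozenset({"Univ C"})),
]
events = {"10.1000/a1", "10.1000/a2"}

unique, per_uni = count_with_altmetrics(pubs, events)
print(unique, sum(per_uni.values()), dict(per_uni))
```

Here two unique publications have events, but the co-authored one is credited to both Univ A and Univ B, giving three university-level records.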
The analysis combined bibliometric and altmetric data about the research publications with descriptive data about the universities. The altmetric data comprise mentions of the research publications in blog posts, news articles, Facebook posts, tweets and Wikipedia articles, as well as CiteULike and Mendeley readership counts.
The descriptive data on the universities consisted of the number of publications and citations in the same time period (retrieved from Web of Science), the logarithmised amount of external funding the universities received, the proportion of foreign researchers, the number of research visits, and the amount of research personnel. The latter data were retrieved from the Finnish Vipunen database, a national education statistics database maintained by the Ministry of Education and Culture and the Finnish National Agency for Education. To answer the first research question, these factors were examined in relation to the altmetric events and citations. The approach of this article is similar to that of Alhoori et al. (2014), Didegah and Thelwall (2013), Thelwall, Haustein, Larivière and Sugimoto (2013), and Torres-Salinas, Robinson-Garcia and Jiménez-Contreras (2016), but it examines different factors, using university-level data and linear regression models instead of country-level data and correlations. It should be noted that the data might exhibit some of the problems listed by Haustein (2016) as challenges of altmetrics. Moreover, the explanatory factors are not independent of each other: for example, the number of level 4 staff (professors) or the amount of external funding might well drive the number of other research staff. Table 2 lists some descriptive statistics for the universities in the sample. To answer the second research question, the universities' research profiles as measured by the normalized attention received from different altmetric events across the main categories of the Organisation for Economic Co-operation and Development (OECD) were compared with the universities' research profiles based on their research outputs, as measured by the Web of Science classification of publication fields.
Due to a low number of publications in some areas, OECD categories were merged, which resulted in four main categories used for this study: i. Agricultural Sciences, Engineering and Technology ii. Medical and Health Sciences iii. Natural Sciences iv. Social Sciences and Humanities.
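A research or altmetric profile of this kind can be sketched as the percentage share of a university's publications (or events) falling into each merged category. The sketch assumes each item has already been assigned one of the six OECD main fields; the mapping to the four merged categories follows the list above, while the input counts and the `profile` helper are hypothetical.

```python
from collections import Counter

# The four merged categories used in the study, built from the six OECD main fields.
MERGED = {
    "Agricultural Sciences": "Agricultural Sciences, Engineering and Technology",
    "Engineering and Technology": "Agricultural Sciences, Engineering and Technology",
    "Medical and Health Sciences": "Medical and Health Sciences",
    "Natural Sciences": "Natural Sciences",
    "Social Sciences": "Social Sciences and Humanities",
    "Humanities": "Social Sciences and Humanities",
}
CATEGORIES = sorted(set(MERGED.values()))


def profile(field_counts):
    """Convert counts per OECD main field into % shares per merged category."""
    merged = Counter()
    for field, n in field_counts.items():
        merged[MERGED[field]] += n
    total = sum(merged.values())
    if total == 0:
        raise ValueError("no items to profile")
    return {c: 100 * merged[c] / total for c in CATEGORIES}


# Hypothetical counts for one university: WoS publications and tweets.
wos_pubs = {"Natural Sciences": 120, "Medical and Health Sciences": 60,
            "Engineering and Technology": 15, "Agricultural Sciences": 5,
            "Social Sciences": 30, "Humanities": 20}
tweets = {"Natural Sciences": 200, "Medical and Health Sciences": 500,
          "Engineering and Technology": 50, "Social Sciences": 150,
          "Humanities": 100}

print(profile(wos_pubs))
print(profile(tweets))
```

Normalizing to shares makes profiles comparable across platforms whose absolute volumes differ by orders of magnitude; in this invented example, Medical and Health Sciences holds 24% of publications but 50% of tweets.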

University-level factors' influence on altmetric events
The university-level factors were chosen to represent different aspects of the universities' activities: their size, their internationality (to some degree), their publishing activity, and their success in securing external research funding. The following factors were tested: i) Level 4 staff (total working hours of professors per year); ii) Other research staff (total working hours per year); iii) International research visits from Finland (number of visits lasting at least two weeks); iv) International research visits to Finland (number of visits lasting at least two weeks); v) Peer-reviewed published journal articles; vi) External research funding (in millions of euros). First, the effect of each potential factor was examined separately in order to avoid multicollinearity issues. It should be noted that the coefficients for the total share of foreign researchers should not be directly compared to the other estimates, which are based on variables measured in absolute values, because they describe the effect of a change of 1% rather than a change in absolute values.
The first test is defined as:
$$\mathrm{altmetric\ measure}_{i,t} = \alpha + \beta\,\mathrm{factor}_{i,t} + \varepsilon_{i,t}$$
where $\mathrm{altmetric\ measure}_{i,t}$ is the activity in altmetric events, $\mathrm{factor}_{i,t}$ is the tested variable from the list above for university $i$ in year $t$, $\alpha$ is a constant, $\beta$ is the effect of the factor, and $\varepsilon_{i,t}$ is the error term, i.e. the deviation of the observed values from the fitted linear relationship. Table 3 presents the ordinary least squares (OLS) estimates (standard errors in parentheses) for the chosen factors on different altmetric measures. The levels of significance were omitted from the table, as all the factors were found to be strongly statistically significant as individual explanatory variables at the 1% level. The estimates describe how a change of a single unit in a factor affects the altmetric measures. For example, an increase of one full-time professor at a university increases Web of Science citations by 1.513, Wikipedia citations by 0.107 and tweets by 12.145.
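The single-factor regressions described above are ordinary least squares fits of the linear model altmetric measure = α + β·factor + ε. A minimal pure-Python sketch of the closed-form estimates and the conventional (homoskedastic) standard error of the slope follows; the function name and the university-year observations are made up for illustration and do not reproduce Table 3.

```python
import math


def simple_ols(x, y):
    """Fit y = alpha + beta * x + error by ordinary least squares.

    Returns (alpha, beta, se_beta), where se_beta is the conventional
    homoskedastic standard error of the slope.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three paired observations")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("the factor must vary across observations")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    rss = sum((yi - alpha - beta * xi) ** 2 for xi, yi in zip(x, y))
    se_beta = math.sqrt(rss / (n - 2) / sxx)
    return alpha, beta, se_beta


# Hypothetical university-year observations (not the study's data).
professors = [40, 55, 70, 90, 120, 150]   # level 4 staff, full-time equivalents
tweets = [520, 700, 860, 1110, 1480, 1800]

alpha, beta, se_beta = simple_ols(professors, tweets)
print(f"tweets = {alpha:.1f} + {beta:.2f} * professors (SE of slope {se_beta:.2f})")
```

The slope is read as in the text: the expected change in the altmetric measure for a one-unit increase in the factor, with each factor fitted in its own regression.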
Some conclusions can be drawn from the estimates in Table 3: i) Level 4 staff members are, on average, notably more efficient than other research staff in publishing research that is shared through the studied platforms. Whether this is because these staff members publish more or because they are more active in sharing their research through the studied channels remains an open question. ii) Foreign academic visitors to Finland have an influence on how much attention Finnish research receives online. They are, on average, somewhat more active in publishing research that is shared through the studied channels. Similarly, academic visits from Finland also have a positive influence, suggesting that visiting a foreign university may increase the altmetric visibility of research. iii) Outside funding is a substantial factor in published research, although it can be argued that investing the same amount in permanent research staff could yield higher returns in the long run: one million euros of funding for a single year has roughly three to four times the effect of a single level 4 staff member, and about twenty times the effect of a single member of other research staff.
Many of the studied factors are strongly connected to each other. For example, the number of level 4 staff members has a strong effect on the total number of staff. When explanatory variables are determined by one another in this way, they may be correlated with the error term of a regression, a problem known in statistical analysis as endogeneity. This problem has to be addressed before the effects of the chosen factors on the altmetric measures can be studied further with multiple regression models, which can be attempted with instrumental variable methods. Based on the OLS estimates in Table 3, both level 4 staff members and outside funding appear to be strong, unrelated driving factors for all the altmetric measures. Other research staff and the number of users of the studied altmetric source are used as instruments to eliminate some of the effects of differing numbers of users across altmetric channels and of variation in the total amount of research staff across universities. The results in Table 3 give some evidence that the relationships between all the factor variables and the altmetric measures are approximately linear; thus, the following two-stage least squares (2SLS) model is estimated:
$$\mathrm{altmetric\ measure}_{i,t} = \beta_0 + \beta_1\,\widehat{\mathrm{level\ 4\ staff}}_{i,t} + \beta_2\,\widehat{\mathrm{outside\ funding}}_{i,t} + \gamma_{i,t}$$
where altmetric measure is the activity in altmetric events, level 4 staff and outside funding are the variables presented earlier for each university on an annual basis, the hats denote fitted values from first-stage regressions of these two variables on the instruments $\mathrm{research\ staff}_{i,t}$ (the variable presented earlier) and $\mathrm{altmetric\ measure\ users}_{i,t}$ (the number of users/sharers in the dataset), and $\gamma_{i,t}$ is the error term. Table 4 presents the 2SLS estimates (standard errors in parentheses) for level 4 staff and outside funding when instrumented with the users of the altmetric channel in question and non-professor research staff. The number of professors is statistically significant at the 1% level for Web of Science citations, CiteULike readers, and Wikipedia citations, at the 5% level for Facebook posts and Twitter posts, and at the 10% level for blog posts and news mentions. Outside funding has a weakly statistically significant, slightly negative effect on CiteULike readers, Wikipedia citations, and Facebook posts, and a positive effect on Twitter posts.
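The two-stage procedure described above can be sketched in plain Python: each endogenous regressor (here level 4 staff and outside funding) is first regressed on the instruments (other research staff and platform users), and the altmetric measure is then regressed on the fitted values. The helper names and the noise-free data are hypothetical, and the sketch assumes no further exogenous controls (these would have to enter both stages). Standard errors taken naively from the second-stage fit are not valid 2SLS standard errors, so only point estimates are returned; dedicated econometrics software is the safer choice for inference.

```python
def solve(a, b):
    """Solve the linear system a @ x = b (Gaussian elimination, partial pivoting)."""
    n = len(a)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: regressors may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(columns, y):
    """OLS with an intercept; returns (coefficients, fitted values)."""
    X = [[1.0] + [float(col[i]) for col in columns] for i in range(len(y))]
    k = len(X[0])
    xtx = [[sum(row[p] * row[q] for row in X) for q in range(k)] for p in range(k)]
    xty = [sum(row[p] * yi for row, yi in zip(X, y)) for p in range(k)]
    coef = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, row)) for row in X]
    return coef, fitted


def two_stage_least_squares(y, endogenous, instruments):
    """2SLS point estimates [intercept, beta_1, ...]; no standard errors."""
    # Stage 1: replace each endogenous regressor by its fitted values
    # from a regression on the instruments.
    fitted = [ols(instruments, x)[1] for x in endogenous]
    # Stage 2: regress the outcome on the fitted regressors.
    return ols(fitted, y)[0]


# Hypothetical, noise-free data; z1 ~ other research staff, z2 ~ platform users.
z1 = [10, 12, 15, 18, 22, 25, 30]
z2 = [5, 9, 4, 11, 7, 13, 8]
level4 = [1 + 2 * a for a in z1]
funding = [3 + a - b for a, b in zip(z1, z2)]
y = [1 + 0.5 * l + 2 * f for l, f in zip(level4, funding)]

print(two_stage_least_squares(y, [level4, funding], [z1, z2]))
```

Because the invented data contain no noise, the estimates recover the generating coefficients (intercept 1, 0.5 for level 4 staff, 2 for funding) up to floating-point error; with real data the first-stage fit is imperfect and the instruments must be both relevant and plausibly exogenous.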
The coefficients and adjusted R² of the two-stage least squares regressions should not be compared directly with each other, as the set of instruments changes in each estimation. These measures can, however, provide some evidence of the fit of each regression. In particular, the number of level 4 staff explains part of the variation in Web of Science citation counts, CiteULike readers, and Wikipedia citations. For blogs, news posts, Facebook posts, and tweets, the model explains a smaller portion of the variation in altmetric activity when controlled for user/sharer counts and other research staff.

University-level research profiles and altmetric profiles
The research profiles of the universities (as measured by the distribution of published research articles across research areas) were also examined to determine whether the distribution of altmetric events across research areas reflects the distribution of research outputs across the same areas. In addition, the results show how the altmetric events from different sources are divided across the OECD categories for each university. While some universities do particularly well in Medical and Health Sciences on all altmetric sources (even universities without a medical school, such as Aalto University), other universities do especially well in Social Sciences on Facebook or in Engineering and Technology in news sources. The results paint a picture of universities receiving online attention that may differ from their primary research profiles. Future research could use qualitative analysis to examine the reasons for these differences.
The distributions of the events across the OECD categories were compared for all universities. The average distributions of events across different platforms (including WoS publications and citations) are presented in Figure 1. The results reflect the overall popularity of Medical and Health Sciences articles on platforms such as Twitter and Facebook, while articles in Natural Sciences receive much less attention on Twitter and Facebook than the publishing activity in the field would suggest. The results also show that articles in Social Sciences and Humanities receive low attention across all platforms. Different types of research thus receive more attention on different platforms. The platforms whose distributions are closest to that of citations may have a closer connection to, or a more important role in, scholarly communication; however, further research is needed to confirm this.
The Spearman rank correlations between the distributions (Table 6) indicate that, on average, the distributions of altmetric events across research areas for the University of Helsinki and the University of Eastern Finland correspond well with the distributions of their research outputs (0.641 and 0.742, respectively), while the altmetric events for Tampere University of Technology, the University of Jyväskylä, and Aalto University show little or no correspondence with their research outputs (-0.042, 0.001, and 0.121, respectively). The implications of these results are further discussed in the next section.
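A per-platform comparison of this kind can be sketched as follows: compute Spearman's rank correlation between a university's WoS profile and each platform's profile over the four merged categories, then average across platforms. Averaging with equal weight per platform is an assumption about how the per-university values were aggregated, and the profile shares are hypothetical. With only four categories and no ties, an individual coefficient equals 1 − Σd²/10 (d being rank differences) and can therefore only move in steps of 0.2, so single-platform values are coarse.

```python
import math
from statistics import mean


def average_ranks(values):
    """1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the (average) ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        raise ValueError("rho is undefined when all values in a profile are tied")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical shares (%) over the four merged categories, in the order:
# Agri./Eng. & Tech., Medical & Health, Natural Sciences, Social Sci. & Hum.
wos_profile = [8.0, 24.0, 48.0, 20.0]
platform_profiles = {
    "Twitter": [5.0, 50.0, 20.0, 25.0],
    "Mendeley": [10.0, 22.0, 50.0, 18.0],
    "News": [30.0, 40.0, 20.0, 10.0],
}

per_platform = {name: spearman(wos_profile, p) for name, p in platform_profiles.items()}
print(per_platform, mean(list(per_platform.values())))
```

In this invented example the Mendeley profile ranks the categories exactly like the WoS profile (rho = 1), Twitter partly (0.4) and news not at all (0.0), so the university-level average hides very different platform-level agreement.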

Discussion
The goals of this research were two-fold. First, this research set out to investigate institutional characteristics that may or may not be connected to, or influence, the online attention received by the research outputs of an institution, in other words, the number of altmetric events surrounding the scientific articles from that institution. Second, we examined whether the institutional research profile, i.e. the fields in which the institution published, matched the distribution of altmetric events across the same fields of science. This research is, however, not without limitations. One clear limitation of this research is that it focused only on scientific articles, and specifically on scientific articles that had received at least one altmetric event. As altmetrics data are now also being collected for scientific books (Williams, 2017), although these data still pose some challenges (Torres-Salinas, Gorraiz, & Robinson-Garcia, 2018), future research could include scientific books in similar studies. Furthermore, the coverage of Web of Science may also have influenced the results of this research, in particular those of the second part (Mongeon & Paul-Hus, 2016). A second limitation of this study is that DOIs were used to match research articles with altmetric events, thus excluding the many research articles that have no DOI assigned to them and whose altmetric events therefore could not be identified. As the use of DOIs becomes more comprehensive, and as other methods to identify altmetrics connected to research articles are developed, future research can be expected to work with more comprehensive datasets. Furthermore, it is unclear what impact possible self-promotional activities by the authors of the investigated publications may have had on the altmetrics. Investigating possible intentional manipulation of altmetrics was, however, beyond the scope of this research.
Figure 1: Average distribution of events across different platforms by merged OECD categories.
Table 5: University research profiles based on the distribution of events and publications (%) across four major research areas (OECD) by universities. Columns: WoS profile, Wikipedia, Twitter, Facebook, News, Blog, CiteULike, Mendeley, Citations (WoS).
The results of this study suggest that international connections are important in the accumulation of altmetric events. International visits both from and to Finland have a clear statistical influence on altmetric events, especially on the number of tweets (and to some degree on the number of news articles). One possible explanation might be that foreign academic visitors to Finland, and visitors from Finland abroad, extend the awareness of published research to a wider international network of people. Granovetter (1973), for example, discusses the benefits of weak network ties for the diffusion of influence and information, mobility opportunity, and community organization. From this point of view, visiting scholars might create a weak tie between Finnish research (and researchers) and a wider international audience. Similarly, the effect of foreign university staff on altmetric events might be attributed to the diffusion of information through their established networks. The effect might therefore be due to an extension of the overall network reach of the published research rather than to individual researchers being more effective. A possible direction for future research would be to apply more advanced econometric methods to the data; for example, maximum likelihood estimation or random-effects models might be used to diminish the multicollinearity present in the data. It could also be interesting to study whether the factors used in this study affect altmetric measures in different ways across different institutions.
As for the investigated institutional research profiles and their similarity or dissimilarity to the distribution of altmetric events across research areas, the results reflect 1) the popularity of articles in Medical and Health Sciences on some platforms, and 2) that the research profile of an institution is not necessarily reflected in the online attention its published work receives. Other studies have also found that articles from the medical sciences receive more attention than articles from other fields (e.g., Cho, 2017), which could possibly be explained by the general public's interest in medical matters, as many people are directly affected by some medical findings. Another explanation could be that some of the articles gaining significant attention have done so because of their curious or humorous titles. The second finding might be explained by a few particularly popular articles that receive significant online attention, thus skewing the attention toward their research area. Future research using more qualitative methods, such as content analysis, could test this hypothesis. It is, nevertheless, clear that altmetric events do not necessarily reflect institutional research profiles.
Overall, the results reflect a complex system in which the online attention received can be attributed to many different factors. This suggests that aggregating altmetrics to an institutional level may inherit problems similar to those of aggregating citations into Journal Impact Factors, namely that a few popular articles can skew the end result, and that such aggregation should therefore be avoided.