To What Extent Does the Open Access Status of Articles Predict Their Social Media Visibility? A Case Study of Life Sciences and Biomedicine

This study aimed to determine whether, and to what extent, the OA status and OA type of articles can predict their social media visibility, when controlling for a considerable number of important factors. Those factors, whose positive association with altmetric counts has been confirmed by previous research, were journal impact, individual collaboration, research funding, number of MESH topics, topic, international collaboration, lay summary, being a mega journal, F1000 Score, and gender of first and last authors. The data for this study comprised 83,444 articles and reviews in the research area of Life Sciences and Biomedicine from 2012–2016, retrieved from Medline in November 2018. The results showed that the percentage of OA articles mentioned on altmetric platforms was significantly higher than that of the non-OA articles. Furthermore, Open Access was significantly associated with a higher probability of a paper being mentioned on the studied social media platforms. Compared to non-OA articles, the OA articles had a higher average number of tweets, Facebook posts, news posts, and blog posts. A unit increase in the OA status (that is, moving from non-OA to OA) increased the average number of tweets, Facebook posts, news posts, and blog posts by 92.7%, 25.7%, 83.9% and 48.4%, respectively. Regarding the OA types (studied as Gold vs non-Gold), our findings showed that the Gold OA articles had a higher average number of tweets and a higher probability of being mentioned in tweets and blogs.

1. To what extent does the OA status (OA, non-OA) of articles predict their social media visibility, when controlling for a number of important factors (journal impact, individual collaboration, research funding, number of MESH topics, international collaboration, lay summary, F1000 Score, mega journal, MESH topic, and the gender of first and last authors)?
2. To what extent does the OA type (Gold vs. non-Gold) of articles affect their social media visibility, when controlling for a number of important factors (journal impact, individual collaboration, research funding, number of MESH topics, international collaboration, lay summary, F1000 Score, mega journal, MESH topic, and the gender of first and last authors)?

Methodology
Data collection
The data for this study comprised 83,444 articles and reviews in the research area of Life Sciences & Biomedicine from 2012-2016, retrieved from the Web of Science Medline in November 2018 using this query: SU = Life Sciences & Biomedicine. Using articles' PMID, a search was conducted in Altmetric.com (October 2017 version) in order to obtain the following altmetric indicators: tweet counts, Facebook posts, news posts, blog posts, F1000 post counts, F1000 score, policy mentions, and Wikipedia mentions.

Dependent, independent and control variables
The numbers of tweets, Facebook posts, news posts, and blog posts were considered as dependent variables in a regression model (explained below) to measure the extent to which the OA status of an article may predict its social media visibility. The OA status of the articles and their OA types were obtained from Unpaywall.org in November 2019. In the same model, the OA status of articles and the OA types (Gold vs. non-Gold OA) were considered as the independent variables, and several other variables as control variables or covariates. The association between OA factors and social media counts may vary when controlled factors are added to or removed from the regression model (Fraser et al. 2019). Hence, a number of control variables that may interfere with the association were identified and entered into the regression model simultaneously. While previous research has introduced several factors Dehdarirad in association with citation counts, not all potential factors have been examined in association with altmetric counts. Given the high correlation found between citation and altmetric counts, the same citation factors could also be important for social media visibility. Hence, the variables taken into account in this study are the important factors that significantly influenced both citation and social media counts of research articles in previous studies. Journal impact (gauged by SNIP and mega journals in this study) and research collaboration were the most important factors associated with both citation and altmetric counts in different subject fields (Didegah & Thelwall 2013; Didegah, Bowman & Holmberg 2018). Research funding was found to be an important factor for citation counts in Life Sciences and Medicine (Didegah 2014). Both citation and altmetric counts were also found to vary across subject fields (Didegah & Thelwall 2013; Didegah, Bowman & Holmberg 2018).
Because this study was conducted in the area of Life Sciences and Biomedicine, MESH categories were an appropriate subject classification to consider. A few factors, such as lay summaries, author gender and F1000 score, are rather new in the field. Lay summaries are a potential factor initially proposed by Didegah, Alperin & Haustein (2018), but their association with altmetric counts is still an open question. Author gender was also found to be an important factor on some altmetric platforms (Sotudeh, Dehdarirad, & Freer 2018). Table 1 lists all these variables and their descriptions; how a few of them were measured is further explained below. We also used publication year as an offset variable in the regression model.1
To determine the gender of the authors (first and last), we used Gender API (https://gender-api.com/). This service offers a standard first-name search with the possibility to handle double names. The response contains a gender assignment (male, female, or unknown), plus confidence parameters, samples and accuracy (Santamaría and Mihaljević, 2018). For gender-neutral or unknown names, initials, and cases where the accuracy was lower than 80%, the names were checked manually using internet searches and authors' websites. The gender of 35 authors remained unidentified; in our regression model they were treated as missing values.2
Regarding mega journals as a covariate in the regression model, we used the journal list provided by Spezi et al. (2017) to determine whether a journal was a mega journal or not.
1 By otherwise we mean the other 13 MESH categories.
2 https://elifesciences.org/articles/25411?utm_source=content_alert&utm_medium=email&utm_content=fulltext&utm_campaign=elife-alerts.
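The manual-check rule for author gender described above can be illustrated with a minimal Python sketch. The dictionary keys `gender` and `accuracy`, and the helper names, are assumptions made for this illustration; they are not taken from the study or guaranteed to match the Gender API response format. Only the 80% cut-off comes from the text.

```python
# Illustrative triage of gender-inference results.
# Field names ("gender", "accuracy") are hypothetical, not the study's code.
ACCURACY_THRESHOLD = 80  # percent; the cut-off reported in the text


def is_initials_only(first_name: str) -> bool:
    """True when the name consists only of single letters, e.g. 'J.' or 'J. K.'."""
    tokens = first_name.replace(".", " ").split()
    return all(len(token) == 1 for token in tokens)


def needs_manual_check(first_name: str, result: dict) -> bool:
    """Flag a name for manual verification (web search, author website)."""
    gender = result.get("gender", "unknown")
    accuracy = result.get("accuracy", 0)
    return (
        is_initials_only(first_name)
        or gender not in ("male", "female")
        or accuracy < ACCURACY_THRESHOLD
    )
```

Names that still cannot be resolved after the manual step would then be coded as missing values, as was done for the 35 unidentified authors.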
Medline assigns articles to 14 broad MESH categories. In this article, only seven categories (Anatomy; Organisms; Diseases; Chemicals and Drugs; Analytical, Diagnostic and Therapeutic Techniques and Equipment; Psychiatry and Psychology; and Health Care), as the most relevant medical topics, were considered for evaluation. For each of the seven MESH categories, we created a dummy variable as a control variable in the regression models.
F1000 score as an altmetric indicator is included in the model as a control variable. The rationale behind this is that articles scored in F1000 are recommended as highly important works in the fields of life sciences, health and physical sciences and beyond (Faculty Opinions 2020). Thus, this factor may affect the online visibility of articles in the field, regardless of their open access status.

Hurdle model
Descriptive statistics were used to depict the state of OA articles vs. non-OA articles shared on the seven altmetrics platforms. Two-sample proportion tests were also performed to compare the proportion of OA articles shared on the different altmetrics platforms with that of non-OA articles.
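As an illustration of the two-sample proportion test mentioned above, the following Python sketch implements the standard pooled z-test using only the standard library. The function name and the counts in the usage note are hypothetical and are not the study's data.

```python
import math


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int):
    """Pooled two-sample z-test for H0: p1 == p2.

    x1, x2 are the numbers of articles mentioned on a platform;
    n1, n2 are the group sizes (for example OA and non-OA articles).
    Returns the z statistic and the two-sided p-value.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / standard_error
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

For example, with made-up counts of 570 mentioned articles out of 1,000 in one group and 360 out of 1,000 in the other, `two_proportion_z_test(570, 1000, 360, 1000)` returns a large positive z statistic and a p-value far below 0.0001.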
To answer the second research question, given that the dependent variables (altmetric counts) of this study were count data, count regression models were used. As altmetric counts are over-dispersed and include an excessive number of zeros, a count model is required that can deal with both issues.
First, a standard negative binomial model, a zero-inflated negative binomial model and a hurdle negative binomial model were applied. A standard negative binomial model is frequently used to model overdispersed data. Zero-inflated models are used for overdispersed datasets with excess zeros and assume that there are two types of zeros in the data: zeros which arise from a negative binomial count distribution and zeros which arise from a "perfect-zero" distribution (Hilbe, 2011). Hurdle models first measure the likelihood of an observation being positive or zero, and then determine the parameters of the count distribution for positive observations. Thus, a hurdle model comprises two parts: the count model, which is a negative binomial model, and the logit model. The count model predicts changes in non-zero social media counts, whilst the logit model reports the changes in the likelihood of zero versus positive social media mentions for a unit change in the open access factors and each of the covariates.
We finally concluded that a negative binomial-logit hurdle model was the best fit for the data as it creates a scenario in which the positive counts follow a Poisson or NB distribution after passing a hurdle to gain positive counts (Didegah, Bowman, & Holmberg 2018).
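The two-part structure of a negative binomial-logit hurdle model can be made concrete with a short Python sketch of its probability mass function. This is a generic textbook formulation, not the estimation code used in the study; the parameter names (`pi_positive` for the probability of crossing the hurdle, `mu` and `alpha` for the mean and dispersion of the untruncated negative binomial) are assumptions chosen for illustration.

```python
import math


def nb_pmf(k: int, mu: float, alpha: float) -> float:
    """Negative binomial pmf with mean mu and variance mu + alpha * mu**2."""
    r = 1.0 / alpha          # size parameter
    p = r / (r + mu)         # success probability
    log_pmf = (
        math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
        + r * math.log(p) + k * math.log(1.0 - p)
    )
    return math.exp(log_pmf)


def hurdle_nb_pmf(k: int, pi_positive: float, mu: float, alpha: float) -> float:
    """Hurdle pmf: a binary part for zero vs positive, then a zero-truncated NB."""
    if k == 0:
        return 1.0 - pi_positive
    truncation = 1.0 - nb_pmf(0, mu, alpha)  # NB mass on positive counts
    return pi_positive * nb_pmf(k, mu, alpha) / truncation


def hurdle_mean(pi_positive: float, mu: float, alpha: float) -> float:
    """Expected count implied by the two parts together."""
    return pi_positive * mu / (1.0 - nb_pmf(0, mu, alpha))
```

In a fitted model, `pi_positive` would come from the logit part and `mu` from the count part, each as a function of the OA factor and the covariates. All zeros are generated by the binary part alone, which is what distinguishes the hurdle model from the zero-inflated alternative.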
Therefore, in order to measure the extent to which the OA type (Gold OA vs Green, Bronze, and Hybrid) of an article may affect its social media visibility, we ran a hurdle model to assess the association between the OA type of articles and the number of times they were mentioned on each altmetrics platform. The OA type was considered as a binary variable (Gold vs. non-Gold) because of the small number of publications that were Green or Hybrid. Moreover, the number of Bronze articles mentioned on Facebook, news and blogs was too small to be statistically reliable.
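The two binary OA factors can be sketched in Python as follows. The label strings are assumed to follow Unpaywall-style categories (gold, green, bronze, hybrid, closed), and the sketch assumes that closed articles are left out of the Gold vs. non-Gold comparison, as the wording above suggests; both points are assumptions for illustration rather than the study's actual coding script.

```python
from typing import Optional

# Assumed OA labels; anything else is treated as non-OA (closed).
OA_TYPES = {"gold", "green", "bronze", "hybrid"}


def oa_status_indicator(oa_label: str) -> int:
    """OA status factor: 1 for any OA type, 0 for non-OA."""
    return 1 if oa_label.lower() in OA_TYPES else 0


def gold_indicator(oa_label: str) -> Optional[int]:
    """OA type factor: 1 for Gold, 0 for Green, Bronze or Hybrid.

    Returns None for non-OA articles, which are assumed to fall
    outside the Gold vs. non-Gold comparison.
    """
    label = oa_label.lower()
    if label not in OA_TYPES:
        return None
    return 1 if label == "gold" else 0
```

Under this coding, a unit change in the OA type factor corresponds to moving from the pooled Green, Bronze and Hybrid group to Gold OA.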
Only Twitter, Facebook, news and blogs were considered for the second and third research questions, as there were a significant number of articles visible on these platforms (see Figure 1). To obtain the most reliable results, a number of important factors that possibly had an impact on the social media visibility of an article were entered into the model as control variables (see Table 1).
Multicollinearity occurs when two independent variables or covariates are highly correlated. It is troublesome in regression models because it affects the relationship between the predictors and the outcome variable (Didegah 2014). A popular diagnostic for multicollinearity is the Variance Inflation Factor (VIF), which is based on the proportion of variance that a predictor variable shares with the other predictors in the model. Several rules of thumb exist, with cut-offs of 4, 10, 20 or more; a VIF above the chosen cut-off is considered to indicate high multicollinearity.
The VIF was tested for both the OA status and OA type factors in relation to the covariates, and the results are reported in Table 2. As can be seen, the VIFs of both groups of variables are rather low and do not exceed any of these cut-offs. Multicollinearity is therefore unlikely to affect the results of the regression models.
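For readers unfamiliar with the diagnostic, the sketch below computes VIFs in plain Python from the inverse of the predictors' correlation matrix, whose diagonal entries equal 1 / (1 - R²) for each predictor regressed on the others. It is a didactic implementation under the assumption of non-constant, not perfectly collinear predictors; the function names are hypothetical and it is not the software used to produce Table 2.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (small matrices only)."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def variance_inflation_factors(columns):
    """VIF of each predictor: diagonal of the inverse correlation matrix."""
    corr = [[pearson(a, b) for b in columns] for a in columns]
    inverse = invert(corr)
    return [inverse[j][j] for j in range(len(columns))]
```

With two predictors whose correlation is 0.8, both VIFs equal 1 / (1 - 0.64), roughly 2.78, which is below even the strictest cut-off of 4 mentioned above.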
The results of the two-sample proportion tests also showed that the percentage of OA articles mentioned on altmetrics platforms was significantly higher than that of the non-OA articles [P < 0.0001]. Figure 2 shows the percentage of each type of OA. As can be seen from the figure, Gold OA had the highest percentage of articles, followed by Bronze and Green OA in second and third place, respectively.
The detailed results of hurdle models for each platform are as follows.

Twitter
Regarding the count model, both the OA vs. non-OA (hereafter OA status) and Gold vs. non-Gold (hereafter OA type) factors were significantly associated with an increase in the estimated number of received tweets. While a unit change in the OA status increased the estimated number of received tweets by 92.7%, the OA type contributed a 28.4% increase in the estimated tweet counts. Among the types of OA, Gold OA was found to be a more important factor than the other types: a unit change in this factor, which means moving from the other types of OA taken together (Green, Bronze, and Hybrid) to Gold OA, increased the average number of tweets to an article by approximately 28.4%. The logit model also confirmed that the OA status and OA type of articles were significantly associated with an increased probability of a paper being mentioned on Twitter.
As for the controlled factors in both the OA status and OA type models, the count model showed that journal impact (measured by SNIP), mega journal, individual collaboration, international collaboration, lay summary, F1000 score, and being indexed under the 'Psychiatry and Psychology' MESH category significantly contributed to a higher number of received tweets, on average. Neither the gender of the last author nor the number of MESH categories was found to be an important factor in the OA status and OA type count models (Table 3). Furthermore, as demonstrated by the count and logit models for OA status, funding had a weak negative association with the average number of tweets as well as with the probability of being mentioned on Twitter.
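Percentage changes of this kind are conventionally obtained from count-model coefficients on the log scale as (exp(b) - 1) × 100. The Python sketch below shows that conversion in both directions. It assumes the reported percentages were derived in this conventional way, which the text does not state explicitly, and the function names are hypothetical.

```python
import math


def percent_change(coefficient: float) -> float:
    """Percent change in the expected count for a one-unit increase in a predictor."""
    return (math.exp(coefficient) - 1.0) * 100.0


def coefficient_from_percent(percent: float) -> float:
    """Log-scale coefficient implied by a reported percent change."""
    return math.log(1.0 + percent / 100.0)
```

Under this assumption, the 92.7% increase for OA status corresponds to a multiplicative factor of 1.927 on the expected number of tweets, that is, a log-scale coefficient of about 0.656, and the 28.4% increase for OA type corresponds to a factor of 1.284.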

Facebook
As can be seen from Table 4, open access articles had a higher average number of Facebook posts, when controlling for a number of important factors. A unit change in the OA status (changing from non-OA to OA) increased the average number of Facebook posts by approximately 25.7%. This was the weakest association when compared with the three other social media platforms. According to the logit models, whilst the open access status was significantly associated with a higher probability of being mentioned on Facebook, the open access type was not.
Regarding the controlled factors in count models, international collaboration, the number of MESH topics, F1000 score, and Journal impact (SNIP) were found to be associated with a higher number of received Facebook posts.
Looking at the count models for both OA status and OA type, funding and the 'Anatomy' MESH category were negatively associated with the estimated number of received Facebook posts. Finally, the genders of the first and last authors were significant factors for a higher number of received Facebook posts in the OA status model (Table 4). However, they were not significant factors for the estimated number of received Facebook posts in the OA type models.

News
As can be seen from both the count and logit models for the OA status (see Table 5), the OA status of articles was a significant factor for both a higher probability of a paper being mentioned in news outlets and a higher average number of received news mentions. A unit increase in the OA factor (changing from non-OA to OA status) increased the estimated number of news mentions to articles by 83.9%. This association was stronger than the association between the received news mentions and international collaboration, journal impact and the other significant factors in the model. However, the OA type was not a significant contributing factor for the estimated number of news mentions or for the probability of being mentioned in news outlets.
For both OA status and OA type models, international collaboration, journal impact (SNIP), number of MESH topics and F1000 score significantly contributed to higher odds of visibility for the articles on News platforms.
Regarding the MESH categories, our findings from both the logit and count models (OA vs non-OA) showed that articles indexed under specific MESH topics ('Chemicals and Drugs' and 'Analytical, Diagnostic and Therapeutic Techniques') were at a disadvantage compared to other MESH topics, both in the odds of being mentioned in news outlets and in the estimated number of received news mentions. Furthermore, the results of the logit models for both comparisons (OA vs non-OA; Gold vs non-Gold) showed that articles indexed under 'Psychiatry and Psychology' were significantly more likely to be mentioned in news outlets. The gender of first and last authors, lay summary and funding were not significant factors for the estimated number of news mentions received in either the OA status or the OA type model. However, the gender of the first author was significantly associated with a higher probability of being mentioned in news outlets. Interestingly, funding was significantly associated with lower odds of being mentioned in news outlets in the OA status model.

Blogs
The results of the count regression models (see Table 6) showed that OA articles had, on average, a significantly higher number of blog mentions. However, the OA type of articles was not a significant factor for this platform. A unit change in the OA status increased the estimated number of blog mentions for articles by 48.4%. The logit models showed that both the OA status and OA type were significantly associated with a higher probability of a paper being mentioned in blogs. As for the controlled factors, international collaboration, journal impact (SNIP), mega journals, number of MESH topics, lay summary and F1000 score significantly contributed to a higher probability of articles being mentioned in blogs, both when examined together with the OA status factor and with the OA type factor. Interestingly, the association of lay summary with blog mentions was highly significant for both the OA status and OA type regression models (both count and logit models). In the OA status model, a unit change in the lay summary (that is, changing from not having a lay abstract in the article to having a lay abstract) may increase the number of blog mentions for articles by 37.8%, which was higher than the contribution of international collaboration (14.9%) and journal impact (29.6%).
Dehdarirad and Didegah: To What Extent Does the Open Access Status of Articles Predict Their Social Media Visibility? A Case Study of Life Sciences and Biomedicine Art. 5, page 11 of 14
Regarding MESH topics, the results from both the logit and count models (OA vs non-OA; Gold vs non-Gold) showed that specific MESH topics ('Anatomy', 'Diseases' and 'Chemicals and Drugs') were at a disadvantage in terms of both the odds of being mentioned in blogs and the estimated number of received blog mentions. The exception was the 'Psychiatry and Psychology' MESH category. According to the count and logit model results for OA status, the articles indexed under this MESH category were 65.4% more likely to be mentioned by blogs and received 60.3% more blog mentions than the rest of the articles. Furthermore, according to the logit model for OA type, articles categorized under the same MESH group were also 78.1% more likely to be mentioned by blogs. The gender of last authors and funding were not significant factors for the odds of being mentioned in blogs or the estimated number of received blog mentions. However, the gender of the first author was weakly associated with lower odds of being mentioned in blogs for both the OA status and OA type models.

Discussion and conclusion
This study aimed to determine whether, and to what extent, the OA status (whether an article is open or closed) and the OA type (whether an article is Gold or non-Gold) of an article in Life Sciences and Biomedicine can predict its social media visibility, when controlling for a number of important factors. These factors were individual collaboration, research funding, number of MESH topics, topic, gender of first and last authors, being a mega journal, international collaboration, lay summary, and F1000 Score.
The findings of our study revealed that the percentage of Life Sciences and Biomedicine OA articles (around 57%) mentioned on social media platforms was significantly higher than that of non-OA articles (around 36%).
Regarding the OA status, our findings showed that being open was significantly associated with a higher probability of a paper being mentioned on the social media platforms studied. Furthermore, open access articles had a higher average number of mentions on the studied platforms. This may provide insight for the European Commission as to whether they should include altmetric indicators as potential metrics for monitoring open science advancement. The strongest association between OA status and the estimated number of received mentions was for Twitter (an estimated 92.7% increase in the average number of tweets), whilst the weakest association was for Facebook (an estimated 25.7% increase in the average number of Facebook posts). The former finding is in accordance with a large-scale study by Fraser et al. (2019), who found that articles in Medical and Health Sciences received more tweets overall. This shows the importance of making an article open, regardless of type, as this makes it easier for Twitter users to access the full text of articles.
Regarding OA types (studied as Gold vs non-Gold), our findings showed that although Gold OA was the most common OA type in our studied sample, Gold OA was only associated with a higher average number of tweets received and a higher probability of a paper being mentioned in tweets and blogs. While Plan S (and with it Gold OA publishing) is meant to broaden access and subsequently increase visibility, our findings seem to suggest that social media visibility has not yet been fully reached via Gold OA in the area of Life Sciences and Biomedicine. Our findings about Twitter may be due to the fact that Twitter is a real-time microblog network. Consequently, an article might be tweeted within a few hours after publication (Yu et al. 2017). Furthermore, Gold OA publications might be immediately available through preprint repositories, before the official release. This is because policies regarding deposit location, license, and embargo requirements of Gold OA might be less restrictive in comparison to other types of OA. Embargoes on Green OA are an example of an access barrier (Laakso 2014). Our findings contradict those of Holmberg et al. (2020), who found that Gold OA publications had no Twitter advantage for Finnish publications within Medicine and Health Sciences. The difference may result from the different samples studied, especially as Holmberg et al. (2020) investigated a sample of Finnish articles indexed in a national database.
With regard to the covariates for the OA status and OA type models, the results showed that, overall, some covariates, such as international collaboration, journal impact and F1000 score, were significantly associated with a higher probability of being mentioned on the studied platforms, as well as with the estimated number of received tweets, Facebook posts, news posts and blog posts. Additionally, the 'Psychiatry and Psychology' MESH category was significantly associated with higher odds of visibility for both OA and Gold OA articles on all social media platforms studied. This finding is in line with Holmberg et al.'s (2020) study, which found that articles in the field of psychology had a clear OA advantage on Twitter. Our findings regarding the news visibility of this MESH topic are in line with Kousha and Thelwall's (2019) study, which found that psychology and psychiatry were among the most frequently cited subject categories in UK newspapers. One reason for our finding, as suggested by Kousha and Thelwall (2019), might be that news outlets prefer to report research findings, such as those on public or mental health issues, that have more general interest or benefit to the public. Furthermore, the less restricted access to OA articles in comparison with non-OA articles may make it easier and faster for them to be shared and disseminated.
Our findings also showed that some factors had a higher association, an OA (dis)advantage, or were significant only on certain platforms. Some main examples of these findings are as follows. Lay summary had a high association with the number of received blog mentions (37.8%). In 2013, it was stipulated that all projects funded by the National Institute for Health Research (NIHR) required lay summaries. The aim was to ensure that scientific results are disseminated to the general public in an accessible and understandable way (Kirkpatrick et al. 2017). This is interesting, as with an increase in open access to biomedical research, some scientists and research organizations might be more likely to blog about their research. As another example, our findings showed that while the gender (female) of the first author was weakly associated with higher odds of visibility for both OA and Gold OA articles in news outlets, it had a weak negative association with the odds of visibility in blogs. Our findings regarding blogs are in line with Paul-Hus et al. (2015), who found that, overall, male first-authored papers had a slightly higher mean number of blog mentions in the different disciplines studied. Our findings regarding news are in contrast with Sotudeh, Dehdarirad, and Freer (2018), who found no difference between female and male first-authored papers in the field of neurosurgery in terms of visibility in news.
Regarding research funding as another example, our findings for the OA status models showed that funding had a weak negative association with i) the odds of a paper being mentioned in news outlets and on Twitter, and ii) the average number of tweets and Facebook posts. The latter finding is in contrast with earlier work, which found that funding had a positive association with the average number of tweets and Facebook posts received. However, our findings regarding Twitter are in line with Álvarez-Bornstein and Costas (2018), who found that biology was amongst the subject categories with higher funding rates and a lower proportion of Twitter mentions.
Collectively, this study provides initial exploratory findings regarding the association between OA (status, type) and social media visibility, when controlling for several important factors. The extent to which the OA factor is associated with social media counts may vary when factors are added to or removed from the models. However, the current model attempted to control for several important factors. By doing so, we were able to increase the probability of obtaining a more precise and reliable association between the OA factors and the average number of received social media mentions. It is important to consider that our analysis was limited to a sample of data in the area of Life Sciences and Biomedicine. Thus, the results obtained in this article are not comprehensive, and readers should exercise caution in generalizing the results beyond the case studied.