What Is Societal Impact and Where Do Altmetrics Fit into the Equation?

The expectation that scientific research should provide answers to societal issues and support institutional decision-making is increasing, but there are still no systematic methods for identifying and measuring the wider societal impacts of research. This article discusses various views on the meaning of impact, the different types of impact or influence that research can have on society, and the potential of altmetrics to capture and measure this societal impact.

behavioral changes nor as evidence of impact. On the other hand, researchers cannot assume that simply publishing their research will contribute to behavioral changes and achieve impact. To put their findings to actual use (rather than just focusing on publishing more research articles), researchers should also promote their work, maximize their dissemination efforts, and engage with their readers (Green 2019).

Societal impact
The terms 'societal impact' and 'social impact' are often used synonymously and interchangeably in the literature, though earlier literature suggests a difference between the two. While societal impact refers to the impact of science on various levels and areas of society, social impact often refers to a more personal level of influence, affecting people directly or indirectly (Vanclay et al. 2015). In this text, societal impact will be used as an umbrella term covering all types and forms of impact that research can have at different levels and in different areas of society. In other words, the societal impact of research designates something beyond scientific impact (Penfield et al. 2014: 21; Wolf et al. 2014: 291), something that is beneficial for sections of society outside of science (Bornmann 2014). Societal impact assessment could examine the social, cultural, environmental, or economic benefits of research (Bornmann 2014), its environmental and technological benefits (Bond & Pope 2012), or any of the 11 types of impact (on science, technology, economy, culture, society, policy, organizations, health, environment, symbolism, and training) listed by Godin and Doré (2005), depending on the areas covered by the research. It has also been argued that the interaction between research and the public will lead to and determine the outcome of the research (i.e., what kind of influence the research has had and what kind of changes in society it has ignited). With that, the societal impact of science can be seen as bidirectional, suggesting that when both parties (science and society) participate in the 'making of impact', the 'amount and density' of impact is likely to grow (Siika-aho 2015: 261). Similarly, the notion of co-creation (a term frequently used in EC Framework Programmes) stems from the idea that the impacts of science are constructed through the interaction between science and society.
Societal impacts of science can be seen in a multitude of places, and different indicators or objects can serve as evidence of such impacts. As evidence of economic impact, patents can indicate how and where knowledge and business flow from research. They can be the result of academic-corporate collaboration and may show a commercial application of research. Although bibliometric methods have been criticized for not being able to identify and measure all the different aspects of research impact on technology and the economy, patent citations are still considered applicable tools for measuring the knowledge flow and interactions between science and technology (Hung 2012). By examining the intensity of research article citations in patents, one can reveal scientific excellence in both the technological and economic domains (Van Looy et al. 2003). This also helps researchers to better understand the innovation process (Verbeek, Debackere & Luwel 2003). Economics, technological advancement, and science are mutually reciprocal, and these reciprocities need to be acknowledged and identified when assessing the economic or technological impact of research. In medicine, on the other hand, Hamers and Visser (2012) have defined the societal impact of research as the influence of research on 'clinical practice and healthcare policy and […] on patients' well-being and quality of life', and there is 'mounting quantitative proof of the benefits of medical research to health, society, and the economy' (WHO 2013). Godin and Doré (2005) wrote that the cultural impact of science refers to 'public understanding of science' (i.e., an individual's understanding and knowledge of science). However, science can influence and inspire popular culture as well, although the connection or path between scientific discoveries and cultural outputs may be even more difficult to identify than other types of impact, and the valuation of the cultural outputs that have sprung from science may be debated.
With regard to the intersection of science and the arts, there are several examples of artists during the Renaissance era and later who combined scientific studies with art, including Leonardo da Vinci and Johannes Vermeer. Each of these artists used science to inform their works, with da Vinci's Sketch of Uterus and Fetus and Vermeer's The Astronomer being just two examples of science informing art (Eskridge 2014). There are also several examples of the impact of science on music, with composers such as Mozart and Bartók integrating mathematical principles in their compositions; Bartók, for instance, used the golden mean in several of his works. In other examples, artists were inspired by advances in science during the moon race of the late 1960s to create rock-n-roll music, such as glam rocker David Bowie's album Space Oddity and Pink Floyd's song 'Astronomy Domine' (Ball 2015, March 20). Such connections between scientific discoveries and societal impacts can, however, be difficult to trace, and it could perhaps be questioned whether all effects of science should be identified and/or quantified. As these examples demonstrate, the complexity of the societal impact of science requires a new way of thinking about impact and new methods for evaluating the various types of impact research may have had.

Evaluating Impact of Science
Researchers can produce many different types of research outputs, from openly available datasets and code to news articles, blog entries, and keynote lectures. While these types of outputs are often difficult to identify and their impact potential is difficult to measure, research evaluations have mainly focused on peer-reviewed and published research articles. Peer review is the foundation of all scientific evaluation. It forms the mechanism of scientific quality control, as it is through peer review that decisions are made about which scientific articles are 'good enough' to be published and, with that, which articles are included in the common pool of scientific knowledge. Peer review cannot be replaced by quantitative performance measures (Butler 2007), but various quantitative measures can be cost-effective and efficient in other types of research evaluations, such as evaluations of the performance of research groups or universities. Of these quantitative measures, citation counts are considered the most important research impact indicator (Furnham 1990). Citations are also widely acknowledged as indicators of scientific merit, with highly cited authors recognized as having made a more significant contribution to science (Merton 1968). These assumptions are supported by earlier research, which has found that high-quality articles are indeed usually cited more often (Lawani 1986; Patterson & Harris 2009) and that highly cited articles are also associated with other quality measures, such as winning awards (Cole 1973). However, the lack of a generally accepted citation theory has been discussed by many scholars (Cronin 1984; Leydesdorff 1998; Zuckerman 1978), and identifying the motivations and rationale behind the act of citing is highly complex. The motivations to cite vary greatly between researchers and between cited works (MacRoberts & MacRoberts 1989).
Not all citations are, for instance, positive acknowledgments of quality or value, as citations can also criticize earlier work (Murugesan & Moravcsik 1978). Moreover, citations are not immediate metrics: due to publication and indexing policies, it can take a long time for a research paper to receive its first citation after publication, if it gets cited at all.
According to Vanclay, societal impact assessment involves analysing, monitoring, and managing the intended and unintended, positive and negative, social consequences of development [concerning] changes in societies' way of life, culture, community, political system, environment, health and wellbeing, personal and property rights, fears, or aspirations (as cited in Bradbury-Jones & Taylor, 2014: 46-47).
With that, one of the core questions in impact evaluation is, according to Streatfield and Markless (2009), how one can tell whether they are truly making a difference to their users. This leads to efforts to identify changes in the users' (of science) behavior (doing things differently), competence (doing things better), and levels of knowledge and attitudes (e.g., confidence). In scientific impact assessment, a citation could, in this sense, be considered an indication of an increased level of knowledge, as the researcher citing earlier work is acknowledging that he or she is using that earlier research. But as the societal impact of research is difficult to identify and measure, and as it may often be difficult to pinpoint what type of impact research has had and on whom, or which specific research has led to a specific impact, there is a danger of focusing disproportionately on quantifiable aspects, such as the commercialization of science or other financial benefits (Russell Group Papers 2012). Or, as Godin and Doré (2005) have noted, other dimensions seem to be missing from the picture, as most research identifying societal impact references only economic impact. Hicks and Wouters (2015), in the Leiden Manifesto, warn of impact-factor obsession, noting that easy-to-access metrics may lure one into measuring only what is available. This may lead to a real danger of neglecting the funding of basic research (Shapiro & Taylor 2013), as the outcomes of applied research may be easier to predict and assess.

Systems to assess societal impact
Impact, and impact assessments, can be divided into potential impact (ex ante impact) and realized impact (ex post impact) (i.e., what the impact might be and what the impact has been, respectively). While ex ante impact assessment focuses on the possibilities to transform important societal questions and problems into research questions and the abilities to answer them, and is thus predictive in nature, ex post impact assessment uses existing evidence and recorded performance to identify how well the research has been able to answer those questions and translate the findings into practical solutions, policy decisions, and, ultimately, changes in the behavior of those affected by the research directly or indirectly. Both types of impact are used in research assessment and in decisions about research funding. A research group applying for funding for a research project would be assessed ex ante (i.e., the assessment would try to forecast the potential of the research group to accomplish the set research goals). Universities, on the other hand, are assessed ex post (i.e., their past performance is assessed in order to make future funding decisions). The National Science Foundation (NSF) in the US reviews research funding proposals in the light of two aspects: intellectual merit and broader impacts. The intellectual merit component refers to the potential of the proposed research to advance scientific knowledge, while the broader impacts component 'encompasses the potential to benefit society and contribute to the achievement of specific, desired societal outcomes' (NSF 2013). The reviewing process therefore combines both ex ante and ex post assessments or, in other words, evaluates past performance and forecasts potential impact.
While peer review ensures the quality of research, it becomes a prediction, or a guess about future potential, when used for decisions about research funding, thus placing reviewers in the role of 'unwilling futurologists' (Rip 2000). Judging the societal impact of research proposals makes reviewing even more complicated, as it takes reviewers beyond their disciplinary expertise, but 'unless scientists embrace their own ability to judge impact, their role in the decision-making process will increasingly be transferred to others' (Holbrook & Frodeman 2011: 245). Holbrook and Frodeman (2011: 245) also argued that scientists ought to play a central role in determining what research gets funded. But this will only continue to be possible if scientists also embrace the fact that their research can be judged on its potential societal impacts as well as its intrinsic intellectual merit.
Earlier projects, such as ERiC (Van der Meulen 2010), SIAMPI (SIAMPI 2012), ASIRPA (Joly et al. 2015), and UNICO (Holi, Wickramasinghe & van Leeuwen 2008), have identified and assessed a significant number of quantitative and qualitative indicators that can be used to measure research impact, or some aspects thereof, on different areas of society. For instance, ERiC and ASIRPA divide indicators into categories such as dissemination of knowledge (e.g., publications, advisory activities, number of PhDs, conference presentations), interest of stakeholders (e.g., funding, collaboration, staff exchanges, consortium partnerships), and impact and use of results (e.g., public debates and media appearances, patents, spin-offs). UNICO takes a different approach and focuses on knowledge transfer, listing potential indicators connected to the networks in which researchers operate, professional development, collaborative research, contract research, spin-outs, teaching, and many other measures. The multitude of possible measures listed in approaches such as these highlights the complexity of the possible influences that research can have on society. While approaches such as those presented by SIAMPI and ASIRPA focus on processes and interactions, other approaches try to introduce more multi-faceted assessment systems that take into account both quantifiable indicators and narratives in the form of case studies. The Payback Framework approach by Buxton and Hanney (1996) allows narratives to be put forward, thus giving researchers a chance to explain and demonstrate the impact their research has had on society. The Payback Framework was one of the first research assessment tools to take both scientific and societal impact into account in the evaluation, specifically in the case of the health sciences.
Donovan and Hanney (2011) explained how the Payback Framework consists of a model incorporating the complete research process, from the inception of a research idea to the dissemination of the results and, eventually, to the final outcomes of wider societal benefits. According to Donovan and Hanney (2011: 181), 'its multi-dimensional categorization of benefits from research starts with more traditional academic benefits of knowledge production and research capacity-building, and then extends to wider benefits to society.' The Payback Framework (Buxton & Hanney 1996) uses an outcome-based approach, including a multitude of different methods for data collection and analysis: documentary and literature reviews, interviews, and bibliometric analyses (Hanney et al. 2004; Samuel & Derrick 2015). Furthermore, the inclusion of narratives makes it possible to highlight types of impact that would not be identified using more traditional impact indicators. Perhaps the most current, and certainly the largest and most followed, example of such approaches in research assessment in recent years has been the Research Excellence Framework (REF) 2014 in the UK, in which almost 7,000 case studies in the form of narratives were assessed by more than 1,000 assessment panel members. An outcome-based evaluation is a method of program evaluation that can be used to determine how to implement projects, to ascertain whether the desired outcomes were achieved, and to determine the overall societal impact (Westat 2010). The approach highlights the importance of preset goals and the use of data sources and methods suitable for assessing how well those goals were met. The models used in this approach typically consist of inputs (e.g., money, time), activities (e.g., projects, services), outputs (e.g., the products of the activities), and short-, medium-, and long-term outcomes (which can overlap to some degree).
Short-term outcomes often demonstrate changes in awareness, skills, or knowledge; changes in behavior, knowledge, or attitudes are considered intermediate outcomes; and changes in attitudes, values, conditions, and life status are long-term goals (McNamara 2015). In addition to outcomes, outputs are also measured in outcome-based evaluations. In Walter et al. (2007), outputs are defined as the immediate, tangible results of a research project, including workshops, meetings, reports, and other publications. Impacts are intermediate effects, such as changes in knowledge, attitudes, and behavior, while outcomes are long-term effects that meet the set goals of the research project. Outcomes represent changes in policy that have steered the behavior of the wider public and thus had wide impact. When interviewing REF2014 evaluators about impact, Samuel and Derrick (2015) found that a majority of the 62 interviewed evaluators in fact viewed impact as an outcome, emphasizing that counting or assessing research outputs does not convey much about their impact or about the outcome of the research. Focusing on research outputs alone would therefore reveal little about the resulting outcomes or about the impact the research has had on society.

Potential of and Challenges with Altmetrics
Altmetrics capture the mentions of research outputs in social media and elsewhere online and could potentially reveal something about the influence or impact research has made (Priem et al. 2010). Shema, Bar-Ilan, and Thelwall (2014) defined these new metrics as 'web-based metrics for the impact of scholarly material, with an emphasis on social media outlets as sources of data' (e.g., Twitter, Facebook, blogs, LinkedIn, YouTube, Reddit, Wikipedia, mainstream media). Altmetrics data are the aggregated views, mentions, downloads, shares, discussions, and recommendations of research outputs across the scholarly web (Fenner 2014), as well as citations and mentions in non-academic communications, such as public policy documents, online syllabi, patent applications, and clinical guidelines (Bradbury-Jones & Taylor 2014). With that, altmetrics capture a wide variety of interactions from an equally wide variety of online data sources. Furthermore, as altmetrics events occur quickly after research articles are published, altmetrics could provide a faster means to measure how the public reacts to research (Barnes 2015) and possibly 'provide evidence of the reach, uptake, and diffusion of research' (Dinsmore, Allen & Dolby 2014).
Much of the earlier research on altmetrics has focused on studying correlations between different altmetrics and citations, finding some evidence of a connection between the two (e.g., Haustein, Costas & Larivière 2015; Mohammadi et al. 2015; Thelwall et al. 2013). Based on these results, altmetrics could perhaps roughly be divided into two groups: data sources that reflect certain aspects of new forms of scholarly communication (those most similar to citations) and those that reflect other aspects of how scientific information is shared, received, discussed, and used, possibly by a predominantly non-academic audience, and that could therefore complement more traditional metrics of research impact (those least resembling citations). Altmetrics events identified on different platforms could thus provide evidence of different types of impact. For instance, sites such as Mendeley, which have shown high similarity between reader counts and later citation counts (and are predominantly used by researchers), might serve as evidence of possible future citation counts (Thelwall 2018) and thus of future scientific impact. As we also know who the primary audiences of online syllabi are and why research articles and books are listed in syllabi, we can assume with fairly strong confidence that syllabi could be analyzed for the educational impact of research (Kousha & Thelwall 2016). In a similar way, we could analyze the mentions of research articles in clinical guidelines as evidence of impact on well-being and health. On the other hand, more general social media sites used by a wider audience (including, but not limited to, researchers) could reflect the wider societal impact of research.
For altmetrics to be a reliable source of evidence of the societal impact of research, the data must contain information about 1) how one's knowledge has increased or how one's behavior has changed because of research (derived from the two functions of science according to Russell (1952/2016)) and 2) whose knowledge or behavior has changed. If it is unclear who has been influenced by research, it is unclear what areas of society (if any) have been influenced. In addition, for altmetrics to be attributed to specific research, there must be a clear path between research outputs and the outcomes of the research. In other words, there has to be evidence of changes in behavior or increases in knowledge that can be traced back to a specific research object.
The review by Sugimoto et al. (2017) highlights the heterogeneity of the online platforms from which altmetrics are generated, extending to both the underlying actions and the intentions and motivations behind those actions. For instance, the act of citing a research article on Wikipedia is most likely motivated by different intentions than mentioning or sharing a research article on Twitter or Facebook. In fact, even different actions within a single platform may be motivated by different objectives: a research article triggers a tweet about it, while a retweet is triggered by the tweet itself. The heterogeneity of altmetrics is perhaps their greatest promise, as different altmetrics could potentially reflect different forms and levels of engagement with research outputs (Haustein, Bowman & Costas 2016). A simple tweet could reflect awareness, while a blog entry could reflect deeper engagement. Tweets, however, are restricted in length, which also restricts the amount of evidence they can carry about the possible impact of specific research. In fact, one study found that the majority of tweets mentioning scientific articles are 'devoid of original thought', being merely mechanical acts of forwarding information; in only about 10% of the sampled tweets could evidence of original thought and commentary about the research be found. Simple mentions of research outputs, no matter who wrote them, do not necessarily disclose anything about the kind of impact the research has had or whether the person who has seen the research output has changed his or her behavior in any way. A simple tweet mentioning a research output is not evidence that the person tweeting has even read it, and a retweet of such a tweet may provide even less evidence.
Still, Twitter is one of the biggest altmetrics data sources (as measured by the number of identified altmetrics events) (Thelwall et al. 2013), with millions of research outputs disseminated in tweets and retweets. It has been found, however, that many of the tweets mentioning scientific articles may be sent by researchers themselves (Birkholz, Seeber & Holmberg 2015; Tsou et al. 2015; Vainio & Holmberg 2017). In fact, Sugimoto et al. (2017) argue that 'social media has rather opened a new channel for informal discussions among researchers, rather than a bridge between the research community and society at large.' If this is the case, then altmetrics events may not express societal impact but rather reflect new forms of scholarly communication. Moreover, many social media users prefer to remain anonymous, using nicknames or pseudonyms and refraining from revealing any personal information in their profiles. Although groups of users can be identified to some degree, determining whom or what the research has influenced remains difficult at best. In addition, a great deal of the content on social media in general, and on Twitter in particular, is generated by automated accounts, or so-called bots (Gilani et al. 2017; Wojcik et al. 2018), which may be difficult to identify. How prevalent bots are in generating and disseminating scientific content is unknown, but they certainly have some effect on overall counts. On Wikipedia, for instance, it has been found that approximately 15% of articles on average have been edited by bots (Steiner 2014), and on certain language versions this number may be much higher. For instance, the bot called Lsjbot (https://sv.wikipedia.org/wiki/Anv%C3%A4ndare:Lsjbot) created and edited over 17 million articles in the Swedish, Cebuano, and Waray language versions of Wikipedia.
Bots backed by artificial intelligence are also writing hundreds of news articles for mainstream media (Tatalovic 2018), and the amount of online content created by bots is rapidly increasing due to technological advances. Bots mentioning research articles would most likely not exhibit any measurable evidence of how the mentioned research has changed society, yet these acts would be counted as evidence of attention or impact if only the quantifiable events were assessed. In addition, several earlier studies have pointed out limitations associated with data collection and quality (e.g., Bornmann 2014; Wouters & Costas 2012; Zahedi, Fenner & Costas 2014). Most of the actions generating altmetrics on different platforms are identified by mentions of unique object identifiers (such as DOIs) attached to research outputs (mainly research articles). The mention of a unique identifier shows a direct path between a specific altmetrics event and the research article, but unique identifiers such as DOIs are not always included in online conversations about scientific research (Haustein 2016), nor do all research articles have DOIs. Furthermore, using unique identifiers to identify online discussions about research misses a great deal of the surrounding conversation that does not include the identifiers; thus, the complete conversation surrounding a research article is not captured. With the data collection issues, the potential presence of bots, and the lack of original thought in online messages mentioning research articles, it may be challenging to find demonstrable evidence of behavioral changes or of the impact research may have had on society. But altmetrics may also be used to show the networks where research is being communicated and to point to the actors engaged in these conversations.
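The identifier-based matching described above can be sketched in a few lines. The regular expression below follows the pattern Crossref recommends for matching modern DOIs; the sample text is invented. Note how the match captures only the DOI itself, illustrating why the surrounding conversation is lost.

```python
# Minimal sketch of DOI-based mention matching: find DOI strings in
# free text so that each altmetrics event can be linked to a research
# output. The pattern is Crossref's recommended regex for modern DOIs.
import re

DOI_PATTERN = re.compile(r'10\.\d{4,9}/[-._;()/:A-Za-z0-9]+')

# Invented example of a social media post mentioning an article.
text = ("Interesting findings on reader behaviour: "
        "https://doi.org/10.1000/example.2020.123 #openscience")

dois = DOI_PATTERN.findall(text)
print(dois)  # the extracted identifier(s); the commentary around them is not captured
```

A mention written without the identifier (e.g., 'that new altmetrics paper in JASIST') would produce no match at all, which is exactly the coverage gap discussed above.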
Altmetrics may be best suited to mapping the networks in which research is disseminated and discussed and to tracking where and how researchers engage with the public (Haustein in press; Holmberg et al. 2014; Robinson-Garcia, van Leeuwen & Rafols 2017), and, through that, to hinting at the societal influences of research.

Conclusions
The goal of assessing the societal impact of research is to identify and measure how a specific research document has been used and what kind of influence it has had, not just within academia but also beyond. Altmetrics are currently being investigated for that purpose: whether and how they could be used to assess the societal impact of research. This leads to the following question: would an evaluation system that took societal impact into account favor researchers who are better at communicating their research or who have the means to employ professionals to design and execute a communication strategy? There is a real danger that some researchers would begin to manipulate the attention their work receives online if altmetrics indicators were integrated into the scientific reward system. This could take many forms, including writing document titles that appeal to larger audiences, making them more likely to be shared, or creating automated bots that disseminate information about research articles. Before any altmetrics can be used for research assessment, instruments to detect this kind of intentional manipulation of altmetrics acts need to be in place. Furthermore, it needs to be specified what would count as intentional manipulation and what would be counted as normal scientific communication.
The impact of research can be considered the outcome of research, or how increased knowledge from research has led to a change in some area of society or possibly in science itself. As discussed earlier in this text, a requirement of impact (as defined by funders) is that it is demonstrable and that the influences of research can be identified. This entails that the path from specific research to its outcomes or influences be identifiable and demonstrable. With an increasing demand for evidence of the wider societal impact of research, many researchers are investigating altmetrics as a potential data source for evidence of societal impact and creating new indicators that utilize these new online data sources. The potential societal impacts of research are, however, often less tangible than the scientific impact of research, which can be traced through citations. While DOIs, when present, can be used to identify mentions of specific research articles, for many types of altmetrics it can be difficult to determine who the users or groups of users generating the altmetrics by interacting with research outputs are, and it may be equally difficult to determine their motivations for doing so. This makes it difficult to judge whom the research may have influenced or who has at least become aware of it. Furthermore, the actions generating altmetrics may often be 'devoid of original thought', as users merely forward and share information about scientific articles without discussing them, making it difficult to judge what kind of influence, if any, the research has had. Some researchers have even suggested that altmetrics should not be considered impact metrics but rather indicators of attention (Crotty 2014; Sugimoto 2015). Another challenge we must not forget is the dynamic nature of the web and, with that, of altmetrics.
Any assessment using altmetrics would be an analysis of a situation at a specific moment in time, a snapshot of online data, while more dynamic approaches would be required to capture the dynamic nature of the data and of the actions generating altmetrics on different online platforms. Given these challenges, two important questions remain, the answers to which determine the applicability and reliability of altmetrics for research assessment purposes: 1) who are the users generating altmetrics acts, and 2) where is the evidence of impact? Ideally, an altmetrics event would include information and evidence about both, but this rarely seems to be the case. Perhaps due to its bibliometric roots, altmetrics research often seems to focus on quantifying online events connected to research outputs and on creating new indicators from the collected data. But because of the many uncertainties with the data and its dynamic nature, we may need new approaches and new research questions that are better suited to taking full advantage of the rich data that altmetrics provide. It may not be possible to aggregate meaningful indicators from the available online data; instead, the data may be able to answer new types of questions about new forms of scholarly communication and the societal impact of research, questions that we have not yet come to ask. Aggregated impact factors may not be a fruitful way to utilize this rich data; our best bet may be to use it to examine the social networks in which impact is created.
On the other hand, demands from funders that researchers plan for 'pathways to impact' or demonstrate a plan for communicating research findings to audiences beyond academia force researchers to think beyond specific research outputs and to consider their potential research outcomes as narratives. Researchers are thus increasingly expected to engage with the public and to communicate their research to audiences beyond academia. That, even if not measurable, certainly leads to increased societal impact.