Chuang Tang, Shaobo (Kevin) Li, Suming Hu, Fue Zeng, Qianzhou Du, Gender disparities in the impact of generative artificial intelligence: Evidence from academia, PNAS Nexus, Volume 4, Issue 2, February 2025, pgae591, https://doi.org/10.1093/pnasnexus/pgae591
Abstract
The emergence of generative artificial intelligence (AI) tools such as ChatGPT has substantially increased individuals’ productivity. In this study, we adopt a difference-in-differences approach to analyze a large dataset of research preprints to systematically examine whether the advent of generative AI has distinct effects on the productivity of male and female academic researchers. We find that after the emergence of ChatGPT, the increase in the productivity of male researchers is 6.4% higher than that of female researchers, implying a widening of the productivity gap between them. We then conduct a survey about researchers’ use of ChatGPT and find that male researchers use generative AI more frequently and experience higher efficiency improvement from its use than female researchers. Our findings show the unintended consequences of generative AI and point to the need for institutions to consider its differential effects on productivity to ensure fairness when evaluating faculty members.
The rise of generative artificial intelligence (AI) has brought significant enhancements in productivity across various fields. However, within academia, its impact appears to be uneven across genders. In our study, we conduct a difference-in-differences analysis of a large dataset of research preprints to examine the influence of generative AI tools on research productivity. The analysis reveals that since the release of ChatGPT, the increase in the productivity of male researchers is higher by 6.4% than that of female researchers. The study not only highlights the widening of the gender gap in productivity induced by generative AI, but also underscores the necessity for academic institutions to consider these disparities in their evaluation and support systems to foster an equitable research environment.
Introduction
Generative artificial intelligence (AI) has brought great changes to society. It has dramatically influenced various fields, including education, entertainment, and academic research (1–10). In the academic community, researchers use it to reduce their research burden by automating the processes of data collection, literature reviews, research design, and data analysis. Consequently, generative AI liberates researchers from time-consuming and repetitive tasks, allowing them to focus on the more innovative aspects of their work. It is reported that two scientists managed to produce a research paper in less than an hour with the help of ChatGPT (11). An experimental study shows that generative AI can substantially raise productivity in professional writing (7). Furthermore, a study argues that economists can reap significant productivity gains by leveraging generative AI to automate microtasks (12).
Given the immense potential of generative AI in academic writing, an increasing number of researchers are incorporating it into their regular workflow. In a study examining the responses of 672 readers of Nature to an online questionnaire, around 80% of the respondents report using ChatGPT or a similar AI tool at least once (13).
However, due to its novelty and the uncertainty of the quality of its output in the context of conducting research, there still exist doubts among researchers regarding the use of generative AI (14). Because different groups of people may have different rates of adoption of generative AI, its benefits are likely to be distributed unevenly across various groups. Concerns have been raised about how the adoption of generative AI in the workplace might worsen existing workplace inequalities (15). Capraro et al. (16) describe the important impacts of generative AI on the workplace from various aspects, especially the potential uneven benefits for different groups of people. People's intention to adopt ChatGPT is significantly influenced by factors such as job satisfaction, organizational culture, social influence, and demographics (17).
Anecdotal evidence suggests that a gender gap exists in terms of attitudes toward AI (15), with men exhibiting more positive attitudes toward AI than women. Two surveys (17, 18) conducted in March and July of 2023, respectively, report that men are more likely than women to have used ChatGPT by then. Another survey (19) conducted in the workplace finds a gender gap in ChatGPT use: 65% of male professionals have tried ChatGPT compared with 47% of female professionals. Archival data (20) on the demographics of ChatGPT users also exhibit the same trend. This gender difference in attitudes toward generative AI may lead to different adoption rates among male and female researchers. That is, male researchers may be more likely to utilize generative AI tools in their research process. Furthermore, the gender difference may lead male and female researchers to differ in how they use generative AI to assist academic writing and in how efficiently they use it. These differences imply that the emergence of ChatGPT may have generated unequal benefits in terms of research efficiency for male and female researchers and thus may have enlarged the gender gap in research productivity.
However, there is a dearth of empirical evidence on whether and how ChatGPT influences the research productivity of male and female researchers differently. Therefore, in this study, our research question is the following: Has the advent of ChatGPT resulted in an unequal impact on the research productivity of male and female researchers?
To answer this research question, we conduct two studies. In study 1, we collect data on preprints uploaded to the Social Science Research Network (SSRN) from 2022 May to 2023 June. The SSRN is a repository for preprints devoted to the rapid dissemination of scholarly research, and it was ranked the largest open-access repository in the world by Ranking Web of Repositories in 2012 (21). We employ a difference-in-differences (DiD) approach to estimate the gender difference in research productivity. The results show that after the emergence of generative AI, male authors experience a 6.4% greater increase in research productivity than female authors, revealing an increase in the productivity gap between them. We also conduct a difference-in-differences-in-differences (DDD) analysis that includes ChatGPT's penetration metrics for each country as the third dimension. We find that for countries in which ChatGPT has a high penetration, the gender disparity in research productivity is more pronounced. Additional analyses reveal considerable heterogeneity in this effect across countries and disciplines. In particular, the effect is more pronounced in some countries, such as the United States, Spain, and Australia. Furthermore, we find that the use of generative AI does not influence research quality.
In study 2, we conduct a survey to collect behavioral and attitudinal data on the usage of generative AI tools among researchers. The survey shows that male researchers, compared with female researchers, spend more time using generative AI and use it more frequently. They also report greater improvements in efficiency and a stronger intention to recommend its use to colleagues.
Taken together, these results suggest that whereas both female and male researchers have opportunities to use generative AI to assist their research, male researchers are more productive in terms of producing papers, and this intensified disparity is primarily associated with divergent behavior regarding the use of generative AI between male and female researchers.
Study 1: analysis of SSRN preprint data
Data collection and preliminary evidence
We first collect data from SSRN on preprints submitted between 2022 May and 2023 June. We extract information on paper titles and the authors’ names, affiliations, and addresses, which we use to identify the authors’ countries, their academic positions, and the rankings of their institutions. We use their first names to identify their gender. The summary statistics of the variables obtained are presented in Table 1. The average probability of an author uploading a preprint in a month is 0.065.
Variable | Obs. | Mean | SD | Min. | Max. |
---|---|---|---|---|---|
Upload | 684,124 | 0.065 | 0.246 | 0 | 1 |
Post | 684,124 | 0.500 | 0.500 | 0 | 1 |
Male | 669,284 | 0.670 | 0.470 | 0 | 1 |
Average Number of Pages | 684,124 | 2.425 | 11.649 | 1 | 1,444 |
Average Number of Downloads | 684,124 | 7.687 | 490.865 | 0 | 118,025 |
Average Number of Views of Abstract | 684,124 | 31.167 | 2,172.606 | 0 | 645,976 |
Number of Previous Preprints | 684,124 | 7.196 | 14.961 | 1 | 599 |
Number of SSRN Citations | 684,124 | 40.88 | 293.862 | 0 | 19,270 |
Number of CrossRef Citations | 684,124 | 32.323 | 284.372 | 0 | 19,118 |
Figure 1 plots the time trend of the number of preprints submitted on SSRN by male and female researchers from 2022 May to 2023 June. The vertical line represents 2022 November 30, the date on which ChatGPT was officially released. The figure shows that, on average, male researchers submit more preprints than female researchers and that the trends in their research productivity are parallel before the launch of ChatGPT. After the release of ChatGPT, however, the productivity of male researchers significantly increases relative to that of female researchers, indicating an increased gap in their productivity.

It is worth noting that there is a time lag between the release of ChatGPT and the increase in researchers’ research productivity. It takes time for people to familiarize themselves with the functionality of ChatGPT and incorporate it into their daily workflow. Furthermore, some people may hesitate to be among the first to adopt this new technology. However, when they understand its benefits and observe an increasing number of people using it to increase work efficiency, they are more likely to begin to use it themselves.
Main results
We use a DiD approach to analyze the differential impacts of generative AI on male and female researchers’ productivity.ᵃ Table 2 reports the estimated effect of the emergence of ChatGPT on male and female researchers’ productivity. The analysis presented in column (1) includes author-related controls, and column (2) shows the results for the model using author fixed effects. As shown in the two columns, the increase in the probability of male authors uploading a preprint to SSRN each month is greater by 0.004 (equivalent to 6.4%) than that of female authors in the period after ChatGPT became publicly accessible. In addition, a back-of-the-envelope calculation shows that the productivity gap between male and female researchers widens by 57.1% (from 0.007 to 0.011). It is possible that the emergence of ChatGPT is related to a sharp increase in the number of papers about ChatGPT, which might confound the main results. Therefore, in column (3) of Table 2, we exclude papers related to ChatGPT; in column (4), we exclude all papers in the computer science discipline. We rerun the DiD analysis and find that the main findings hold, indicating that the increase in gender inequality is not simply the result of an increase in the number of papers about ChatGPT.
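The back-of-the-envelope figures in this paragraph can be retraced with a few lines of arithmetic. The sketch below uses only the rounded values quoted in the text; the baseline of 0.063 is an assumption (the constant from column (2) of Table 2), since the text does not state which baseline underlies the 6.4% figure. With rounded inputs the relative effect comes out at roughly 6.3%, close to the reported 6.4%, which was presumably computed from unrounded estimates.

```python
# Worked arithmetic for the DiD estimate, using rounded values quoted in the text.
gap_pre = 0.007    # male-female gap in monthly upload probability before ChatGPT
gap_post = 0.011   # the same gap after ChatGPT

# The DiD estimate is the change in the gap between the two periods.
did = gap_post - gap_pre

# Relative widening of the gap, measured against the pre-period gap.
widening = did / gap_pre

# ASSUMPTION: baseline upload probability taken from the constant in
# column (2) of Table 2; the text does not name the baseline it uses.
baseline = 0.063
relative_effect = did / baseline

print(f"DiD estimate:    {did:.3f}")
print(f"Gap widening:    {widening:.1%}")
print(f"Relative effect: {relative_effect:.1%}")
```

The same subtraction is what the Male * Post coefficient captures in the regression, where month and author fixed effects absorb common time shocks and time-invariant author characteristics.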
Variable | (1) | (2) | (3) | (4) |
---|---|---|---|---|
Sample | All sample | All sample | Excluding ChatGPT | Excluding Computer Science |
Dependent variable | Upload | Upload | Upload | Upload |
Male * Post | 0.004*** (0.001) | 0.004*** (0.001) | 0.004*** (0.001) | 0.004*** (0.001) |
Constant | 0.019*** (0.001) | 0.063*** (0.000) | 0.063*** (0.000) | 0.062*** (0.000) |
Month fixed effects | √ | √ | √ | √ |
Author fixed effects | | √ | √ | √ |
Author-related controls | √ | | | |
Observations | 669,284 | 669,284 | 669,284 | 669,284 |
R² | 0.641 | 0.084 | 0.083 | 0.086 |
Notes: Dependent variable is Upload. Column (1) reports the DiD results for the full sample with author-related controls. Column (2) reports the DiD results for the full sample with author fixed effects. Column (3) excludes preprints related to ChatGPT. Column (4) excludes preprints in computer science. Standard errors are reported in parentheses; *p<.10, **p<.05, ***p<.01.
Robustness checks
In this section, we conduct a series of tests to verify the robustness of the results.
First, because the dependent variable in the main analysis is an indicator for whether an author uploads any preprints in a month, we rerun the analysis using two alternative dependent variables: Monthly Productivity and Monthly Effective Productivity. Monthly Productivity is measured as the number of preprints uploaded by each author in each month. To calculate Monthly Effective Productivity, for a preprint that has n authors, each author is given an effective productivity of 1/n; this value is then aggregated at the monthly level for each author to better capture the actual contributions of the author (22). As shown in columns (1) and (2) of Table 3, the main findings hold in both analyses.
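The two alternative measures can be illustrated with a minimal sketch. The input layout (a list of month and author-list pairs) and the function name are hypothetical and chosen only for illustration; the 1/n weighting follows the definition given above.

```python
from collections import defaultdict

def monthly_productivity(preprints):
    """Compute per-author monthly counts and 1/n-weighted effective productivity.

    `preprints` is an iterable of (month, [author, ...]) pairs. This input
    layout is a hypothetical one used only for this sketch.
    """
    counts = defaultdict(int)       # Monthly Productivity
    effective = defaultdict(float)  # Monthly Effective Productivity
    for month, authors in preprints:
        share = 1.0 / len(authors)  # each of n authors receives 1/n
        for author in authors:
            counts[(author, month)] += 1
            effective[(author, month)] += share
    return dict(counts), dict(effective)

# Illustrative data: three preprints with 2, 1, and 4 authors.
sample = [
    ("2023-01", ["A", "B"]),
    ("2023-01", ["A"]),
    ("2023-02", ["A", "B", "C", "D"]),
]
counts, effective = monthly_productivity(sample)
print(counts[("A", "2023-01")], effective[("A", "2023-01")])
```

The binary Upload variable used in the main analysis corresponds to whether an author's count in a given month is greater than zero.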
Variable | (1) | (2) | (3) | (4) | (5) | (6) | (7) |
---|---|---|---|---|---|---|---|
Specification | Alternative Dependent Variable | Alternative Dependent Variable | Logit Regression | Poisson Regression | Negative Binomial Regression | Alternative Definition of Productivity | Alternative Definition of Productivity |
Dependent variable | Productivity | Effective Productivity | Upload | Productivity | Productivity | log(Num Downloads) | log(Num Views) |
Male * Post | 0.006*** | 0.003*** | 0.068*** | 0.081*** | 0.068*** | 0.010** | 0.015** |
(0.002) | (0.001) | (0.022) | (0.024) | (0.021) | (0.004) | (0.006) | |
Constant | 0.073*** | 0.031*** | −1.911*** | −0.776*** | 0.209*** | 0.319*** | |
(0.001) | (0.000) | (0.009) | (0.047) | (0.001) | (0.002) | ||
Month fixed effects | √ | √ | √ | √ | √ | √ | √ |
Author fixed effects | √ | √ | √ | √ | √ | √ | √ |
Observations | 669,284 | 669,284 | 430,878 | 430,892 | 430,892 | 669,284 | 669,284 |
R2 | 0.124 | 0.139 | 0.106 | 0.098 |
Note: Columns (1) and (2) report the DiD results when the dependent variables are Monthly Productivity and Monthly Effective Productivity. Columns (3), (4), and (5) report the DiD results when the estimation models are logit, Poisson, and negative binomial regressions. Columns (6) and (7) report the DiD results when the dependent variables are log(NumDownloads) and log(NumViews). Standard errors are reported in parentheses; *p<.10, **p<.05, ***p<.01.
Second, we use three alternative model specifications: a logit model, a Poisson regression, and a negative binomial regression. The logit model takes the dummy variable Upload as the dependent variable, whereas the Poisson and negative binomial regressions take the count variable Productivity. The estimation results, reported in columns (3), (4), and (5) of Table 3, suggest that the main findings are robust to these alternative model specifications.
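Coefficients from these nonlinear models are on a log scale, so exponentiating them gives a multiplicative reading. The sketch below applies that standard transformation to the rounded Male * Post coefficients in columns (3) to (5) of Table 3; the interpretation as odds and rate ratios is the textbook one and is not stated in the paper itself.

```python
import math

# Rounded Male * Post coefficients from columns (3)-(5) of Table 3.
coefficients = {
    "logit": 0.068,              # log-odds scale
    "poisson": 0.081,            # log expected-count scale
    "negative_binomial": 0.068,  # log expected-count scale
}

# Exponentiating turns a log-scale coefficient into a multiplicative ratio.
ratios = {model: math.exp(beta) for model, beta in coefficients.items()}

for model, ratio in ratios.items():
    print(f"{model}: ratio = {ratio:.3f} ({ratio - 1:.1%} higher for male authors post-launch)")
```

Interaction terms in nonlinear models should be read with some care, so these ratios are best treated as a rough guide to magnitude rather than as exact marginal effects.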
Third, in the main analysis, research productivity is defined as the volume of preprints written by an author. It can alternatively be defined as a combination of the volume and quality of the preprints. To capture this alternative definition, we use (i) the total number of downloads of the preprints and (ii) the total number of views of the abstracts of the preprints written by an author in each month as the dependent variables. If an author does not upload a preprint in a given month, the dependent variable values for that author in that month are recorded as zero. These two alternative measures reflect both the quantity and quality of preprints. As shown in columns (6) and (7) of Table 3, the results are consistent with the main findings.
Fourth, as mentioned in the main analysis, we use the database Genderize to classify authors’ genders. To confirm the robustness of the findings, we use higher confidence levels for gender classification. In particular, we choose 95%, 90%, and 85% as thresholds to filter authors based on the confidence level of the classification of their genders. Only those authors whose genders are classified with confidence levels higher than the chosen thresholds are used in this analysis. The results, reported in Supplementary material, are consistent with the main findings.
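The threshold filter described here can be sketched as follows. The record layout (a dict with `name`, `gender`, and `probability` keys) and the sample values are hypothetical and serve only to illustrate the filtering step; the strict inequality reflects the wording "higher than the chosen thresholds".

```python
def filter_by_confidence(authors, threshold):
    """Keep authors whose gender classification confidence exceeds `threshold`.

    `authors` is a list of dicts with a 'probability' key. This record layout
    is a hypothetical one used only for this sketch.
    """
    return [a for a in authors if a["probability"] > threshold]

# Illustrative records; names and probabilities are made up.
authors = [
    {"name": "author_1", "gender": "male", "probability": 0.99},
    {"name": "author_2", "gender": "female", "probability": 0.92},
    {"name": "author_3", "gender": "male", "probability": 0.87},
    {"name": "author_4", "gender": "female", "probability": 0.60},
]

# The three thresholds named in the text.
subsample_sizes = {t: len(filter_by_confidence(authors, t)) for t in (0.95, 0.90, 0.85)}
print(subsample_sizes)
```

Each filtered subsample would then be passed through the same DiD specification as the main analysis, so the trade-off is between classification accuracy and sample size.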
Fifth, the validity of our DiD estimation strategy relies critically on the pretreatment parallel trend assumption (i.e. that there is no significant change in the difference between the productivity of male and female researchers before the treatment). To test this assumption, we use the relative time model with lag periods (23). The results of the relative time model are presented in Table 4 and Fig. 2. None of the coefficients of the prelaunch dummies are significant, suggesting that there are no significant differences in the prelaunch period between male and female researchers. These results support the parallel trend assumption underlying our DiD estimation.

Variable | (1) |
---|---|
Dependent variable | Upload |
Male × Prelaunch (month = −6) | −0.004 (0.003) |
Male × Prelaunch (month = −5) | −0.004 (0.003) |
Male × Prelaunch (month = −4) | −0.003 (0.003) |
Male × Prelaunch (month = −3) | −0.004 (0.003) |
Male × Prelaunch (month = −2) | 0.000 (0.003) |
Male × Prelaunch (month = −1) | −0.002 (0.003) |
Constant | 0.065*** (0.001) |
Month fixed effects | √ |
Author fixed effects | √ |
Observations | 334,642 |
R² | 0.158 |
Notes: Dependent variable is Upload. Standard errors are reported in parentheses; *p<.10, **p<.05, ***p<.01.
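As a rough check on the prelaunch coefficients, the sketch below computes two-sided p-values from the rounded estimates and standard errors in Table 4 under a standard normal approximation. This is an illustrative calculation on rounded values, not a reproduction of the paper's inference, which is based on the unrounded estimates.

```python
import math

# Rounded Male × Prelaunch coefficients and standard errors from Table 4,
# keyed by month relative to the ChatGPT launch.
prelaunch = {
    -6: (-0.004, 0.003),
    -5: (-0.004, 0.003),
    -4: (-0.003, 0.003),
    -3: (-0.004, 0.003),
    -2: (0.000, 0.003),
    -1: (-0.002, 0.003),
}

def two_sided_p(coef, se):
    """Two-sided p-value for coef/se under a standard normal approximation."""
    z = coef / se
    return math.erfc(abs(z) / math.sqrt(2))

p_values = {month: two_sided_p(c, s) for month, (c, s) in prelaunch.items()}
for month, p in sorted(p_values.items()):
    print(f"month {month}: p = {p:.3f}")

# True when no prelaunch coefficient is significant at the 10% level.
all_insignificant = all(p > 0.10 for p in p_values.values())
print(all_insignificant)
```

A pattern of small, insignificant prelaunch coefficients is what one would expect if male and female productivity trends moved in parallel before the launch.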
Additional analyses
In this section, we conduct several additional analyses to shed further light on the increase in gender disparity following the emergence of ChatGPT observed in the main findings.
First, we conduct a DDD analysis of ChatGPT's penetration metrics across different countries. We collect data on ChatGPT's penetration into more than 200 countries and regions by August 2023.ᵇ If the observed gender disparity is associated with the emergence of ChatGPT, gender differences in the intensity of ChatGPT use might alter the effect size obtained in the results. That is, we expect that a higher rate of penetration into a country would be associated with a wider gender gap in research productivity in that country. To measure ChatGPT's penetration into each country, we employ four metrics that are commonly used to measure user engagement, i.e. traffic, number of unique visitors, average number of pages visited, and bounce rate. Higher traffic, a larger number of unique visitors, a larger number of page visits, or a lower bounce rate of visitors from a country on ChatGPT indicates that the penetration of ChatGPT in that country is higher, and vice versa. We run DDD analyses using each of these metrics. As shown in Table 5, we find that for countries in which ChatGPT has a higher penetration, the gender disparity in research productivity is more pronounced. The widespread use of ChatGPT in these countries aggravates the problem of the divergent adoption rates of generative AI among male and female researchers, causing greater inequality in their research productivity compared with countries in which it is scarcely adopted.
Variable | (1) | (2) | (3) | (4) |
---|---|---|---|---|
Dependent variable | Upload | Upload | Upload | Upload |
Male * Post | 0.000 (0.001) | 0.000 (0.001) | −0.017*** (0.007) | 0.024** (0.010) |
Post * Traffic | −0.009*** (0.001) | | | |
Male * Post * Traffic | 0.002** (0.001) | | | |
Post * Unique Visitors | | −0.032*** (0.003) | | |
Male * Post * Unique Visitors | | 0.007** (0.003) | | |
Post * Pages | | | −0.031*** (0.002) | |
Male * Post * Pages | | | 0.008*** (0.003) | |
Post * Bounce Rate | | | | 0.002*** (0.000) |
Male * Post * Bounce Rate | | | | −0.000** (0.000) |
Constant | 0.061*** (0.000) | 0.061*** (0.000) | 0.095*** (0.003) | 0.021*** (0.004) |
Month fixed effects | √ | √ | √ | √ |
Author fixed effects | √ | √ | √ | √ |
Observations | 4,697,630 | 4,697,630 | 4,697,630 | 4,584,692 |
R² | 0.056 | 0.056 | 0.056 | 0.055 |
Notes: Dependent variable is Upload. Standard errors are reported in parentheses; *p<.10, **p<.05, ***p<.01.
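In a DDD specification, the post-launch change in the gender gap at a given penetration level is the Male * Post coefficient plus the triple-interaction coefficient times the penetration metric. The sketch below evaluates that expression with the rounded Traffic coefficients reported in Table 5. The traffic values are hypothetical, because the scale of the metric (for example, whether it is logged or standardized) is not given here.

```python
def gender_gap_change(b_male_post, b_triple, penetration):
    """Post-launch change in the male-female gap at a given penetration level.

    Implied by the DDD specification: b_male_post + b_triple * penetration.
    """
    return b_male_post + b_triple * penetration

# Rounded Male * Post and Male * Post * Traffic coefficients from Table 5.
B_MALE_POST = 0.000
B_TRIPLE_TRAFFIC = 0.002

# HYPOTHETICAL traffic values; the metric's units are not stated in the text.
for traffic in (0.0, 1.0, 2.0, 5.0):
    change = gender_gap_change(B_MALE_POST, B_TRIPLE_TRAFFIC, traffic)
    print(f"traffic = {traffic:.1f}: change in gap = {change:.3f}")
```

Because the triple-interaction coefficient is positive, the implied change in the gap grows with traffic, which is the pattern the paragraph above describes.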
Second, we examine how the estimated gender inequality varies across countries by replicating the analysis for the subsamples of researchers in each country. In Supplementary material, we show the impact of the emergence of ChatGPT on the productivity gap by country. The figure in the Supplementary material plots the estimates of the interaction term with 90% CI; a positive value represents an increase in male researchers’ research productivity relative to that of female researchers. Most of the countries have positive coefficients.
Third, we examine heterogeneous effects across disciplines. We categorize authors by whether their disciplines are in the social sciences or natural sciences. The results are displayed in the Supplementary material. The interaction term in the social sciences subsample is statistically significant, but that in the natural sciences subsample is not. However, a Z-test shows that the interaction terms in the two subsamples are not significantly different from each other. Therefore, although there appears to be a significant trend in the social sciences, caution is warranted in interpreting these results as they do not provide strong evidence of a difference between the two fields. In Supplementary material, we depict the coefficients for each discipline, showing that most disciplines have positive coefficients and the coefficients in the marketing, anthropology, computer science, and innovation disciplines are statistically significant.
Fourth, the COVID-19 outbreak might have caused gender inequality in research productivity due to female researchers dealing with more childcare responsibilities and housework when working from home (22). To verify whether the gender disparity documented in this study is a consequence of COVID-19, we collect data on COVID-19 infections and deaths in the United States. As shown in Fig. 3, there is no substantial change in the number of new COVID-19 cases around the date of the launch of ChatGPT. Lockdown policies and mask orders have also been lifted in all states of the United States since 2022 March 26 (24, 25). Furthermore, we control for the severity of the pandemic in the DiD analysis by including flexible polynomials of the monthly number of new cases and new deaths as control variables. In addition, we test whether the results remain consistent when we assume that scholars require a certain time to adopt ChatGPT to help them write papers and upload them to SSRN. To do so, we redefine Post as being equal to one if the time is later than 2023 May. After controlling for the severity of COVID-19 and allowing for a potential time lag in adopting ChatGPT, the estimation results (presented in Supplementary material) are consistent with the main findings. These tests further strengthen the evidence regarding the role of ChatGPT in the increased gender disparity and allow us to distinguish its impact from that of COVID-19.

Fifth, in the main analysis, the dependent variable captures whether an author uploads a preprint to SSRN in a given month as a measure of research productivity. It is possible that ChatGPT increases the volume of research produced by male researchers at the expense of quality. If this is the case, we would expect the preprints submitted by female researchers to be of higher quality than those submitted by male researchers following the emergence of ChatGPT. The dataset includes two quality indicators used by SSRN to rank preprints, i.e. the number of times an abstract has been viewed and the number of times a preprint has been downloaded. In Supplementary material, we report the effects of ChatGPT on the number of abstract views and the number of downloads of each submitted preprint. The coefficients are not significant, suggesting that in the period following the emergence of ChatGPT, the quality of the research of male researchers, as measured by the number of abstract views and downloads per submitted preprint, does not decrease relative to that of female researchers.
Study 2: survey
In study 2, we conduct a survey among researchers to gather behavioral and attitudinal data related to their use of generative AI tools. This study involves measuring variables such as duration of AI use, AI engagement, perceived efficiency improvement caused by the use of AI, and tendency to recommend AI to others, which are directly related to the use of tools based on large language models (LLMs). It is therefore essential to recruit participants who already use these tools in their academic work. The survey is administered through Qualtrics’ sample service, targeting researchers in the United States who actively use LLMs. The survey link is distributed to a prescreened pool of LLM users, and is kept open until our target of 400 respondents is reached (M_age = 37.12; 53.8% women). The participants are informed of the survey's purpose and assured of their anonymity and the confidentiality of their information. The voluntary nature of their participation is emphasized and informed consent is obtained before proceeding to the survey.
We measure the time per week the participants spend using LLM-based tools for research tasks (AI duration) on a 7-point Likert scale ranging from 1 (“Do not use at all”) to 7 (“More than 20 hours”). Their engagement with AI tools (i.e. how frequently they use LLMs; AI engagement) is also measured on a 7-point Likert scale ranging from 1 (“Very infrequently”) to 7 (“Very frequently”). Their perceived improvement in efficiency (Efficiency improvement) is measured using a three-item scale (“The large language models have helped me complete my tasks more quickly,” “The large language models have made me more productive in my work,” and “The large language models have improved the quality of my work”; 1 = strongly disagree; 7 = strongly agree; Cronbach's α = 0.94). Additionally, we assess their tendency to recommend LLMs to others (Recommendation tendency) by asking them to rate their agreement with the statement “I would recommend large language models to others,” using the same 7-point Likert scale. Finally, demographic information, including gender and age, is collected.
We find that the male participants demonstrate higher engagement with AI tools than the female participants. Specifically, the male participants (M = 2.78, SD = 1.28) dedicate more time to using generative AI than the female participants (M = 2.47, SD = 0.93; F(1, 398) = 7.78, P = 0.006; d = 0.28) and use generative AI more frequently (male participants: M = 3.43, SD = 1.86; female participants: M = 3.01, SD = 1.63; F(1, 398) = 5.86, P = 0.016; d = 0.24).
More importantly, we discover significant gender disparities in perceived generative AI-induced efficiency improvements (F(1, 398) = 5.88, P = 0.016; d = 0.24). The male researchers (M = 4.80, SD = 1.54) report greater perceived efficiency improvement from using generative AI than their female counterparts (M = 4.40, SD = 1.74). Additionally, the male researchers (M = 5.12, SD = 1.43) are more inclined to recommend the use of generative AI to their peers than the female researchers (M = 4.76, SD = 1.69; F(1, 398) = 5.24, P = 0.023; d = 0.23).
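The effect sizes above can be checked against the reported means and standard deviations. The following is a minimal sketch, not the authors' code: it assumes group sizes of 215 women and 185 men (inferred from 400 respondents with 53.8% women, not stated in the text) and uses the pooled-variance formulas for Cohen's d and the two-group F statistic.

```python
import math


def pooled_sd(sd1, n1, sd2, n2):
    """Pooled standard deviation of two independent groups."""
    return math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))


def cohens_d(m1, sd1, n1, m2, sd2, n2):
    """Standardized mean difference using the pooled standard deviation."""
    return (m1 - m2) / pooled_sd(sd1, n1, sd2, n2)


def f_statistic(m1, sd1, n1, m2, sd2, n2):
    """One-way ANOVA F for two groups (equals the squared pooled-variance t)."""
    pooled_var = pooled_sd(sd1, n1, sd2, n2) ** 2
    return (m1 - m2) ** 2 / (pooled_var * (1 / n1 + 1 / n2))


# ASSUMPTION: group sizes inferred from 400 respondents, 53.8% women.
N_WOMEN, N_MEN = 215, 185

# Reported summary statistics (male first, female second).
d_duration = cohens_d(2.78, 1.28, N_MEN, 2.47, 0.93, N_WOMEN)
f_duration = f_statistic(2.78, 1.28, N_MEN, 2.47, 0.93, N_WOMEN)
d_efficiency = cohens_d(4.80, 1.54, N_MEN, 4.40, 1.74, N_WOMEN)

print(f"AI duration: d = {d_duration:.2f}, F(1, {N_MEN + N_WOMEN - 2}) = {f_duration:.2f}")
print(f"Efficiency improvement: d = {d_efficiency:.2f}")
```

Under the assumed group sizes, the computed effect sizes round to the reported d values, and the F statistic for AI duration comes out close to the reported one; small differences are expected because the inputs are rounded to two decimals.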
We run four mediation models to test the mediating roles of AI duration and AI engagement in the relationships between Gender and Perceived efficiency improvement and between Gender and Recommendation tendency. Each model includes Gender (coded as a binary variable; 0 = Female, 1 = Male) as the independent variable, AI duration (or AI engagement) as the mediator, and Perceived efficiency improvement (or Recommendation tendency) as the dependent variable.
We first examine the relationship between Gender and the Perceived efficiency improvement caused by AI via AI duration. We find that Gender increases AI duration (b = 0.31, SE = 0.11, t(398) = 2.79, P = 0.006) and Efficiency improvement (b = 0.40, SE = 0.17, t(398) = 2.43, P = 0.016). Moreover, a higher AI duration is predictive of a higher positive value of Perceived efficiency improvement (b = 0.79, SE = 0.06, t(398) = 12.50, P < 0.001). However, when we consider both Gender and AI duration as predictors of Perceived efficiency improvement, the direct effect of Gender (b = 0.16, SE = 0.14, t(397) = 1.12, P = 0.263) on Perceived efficiency improvement is not significant. A bootstrap analysis (5,000 samples, PROCESS model 4) (26) shows that the indirect effect via AI duration is significant (b = 0.24, SE = 0.09, 95% CI = [0.08, 0.41], excluding 0), suggesting that Gender affects Perceived efficiency improvement through AI duration.
Second, we examine the effect of Gender on Perceived efficiency improvement via AI engagement. The results follow the same pattern. The indirect effect via AI engagement is significant (b = 0.27, SE = 0.11, 95% CI = [0.05, 0.50], excluding 0).
Third, we examine the relationship between Gender and Recommendation tendency via AI duration. The results follow the same pattern. The indirect effect via AI duration is significant (b = 0.19, SE = 0.07, 95% CI = [0.05, 0.33], excluding 0).
Last, we examine the relationship between Gender and Recommendation tendency via AI engagement. The results follow the same pattern. The indirect effect via AI engagement is significant (b = 0.21, SE = 0.09, 95% CI = [0.04, 0.39], excluding 0).
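The logic of these simple mediation models can be illustrated with a short sketch. It is not the PROCESS macro and does not use the survey data: the data are synthetic, the path values used to generate them are arbitrary assumptions, and the percentile bootstrap with 1,000 resamples is a choice made here to keep the example fast. The function names are hypothetical.

```python
import random


def mediation_paths(x, m, y):
    """OLS paths for a simple mediation model X -> M -> Y (with intercepts)."""
    n = len(x)
    mx, mm, my = sum(x) / n, sum(m) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    smm = sum((b - mm) ** 2 for b in m)
    sxm = sum((a - mx) * (b - mm) for a, b in zip(x, m))
    sxy = sum((a - mx) * (c - my) for a, c in zip(x, y))
    smy = sum((b - mm) * (c - my) for b, c in zip(m, y))
    det = sxx * smm - sxm ** 2
    a_path = sxm / sxx                         # effect of X on M
    total = sxy / sxx                          # effect of X on Y, mediator omitted
    b_path = (smy * sxx - sxy * sxm) / det     # effect of M on Y, controlling for X
    direct = (sxy * smm - smy * sxm) / det     # effect of X on Y, controlling for M
    return {"a": a_path, "b": b_path, "total": total,
            "direct": direct, "indirect": a_path * b_path}


def bootstrap_indirect_ci(x, m, y, n_boot=1000, seed=0):
    """Percentile bootstrap 95% interval for the indirect effect a*b."""
    rng = random.Random(seed)
    n = len(x)
    draws = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        sample = mediation_paths([x[i] for i in idx],
                                 [m[i] for i in idx],
                                 [y[i] for i in idx])
        draws.append(sample["indirect"])
    draws.sort()
    return draws[int(0.025 * n_boot)], draws[int(0.975 * n_boot) - 1]


# SYNTHETIC data: the generating coefficients below are illustrative assumptions.
rng = random.Random(42)
gender = [1 if rng.random() < 0.46 else 0 for _ in range(400)]
duration = [2.5 + 0.3 * g + rng.gauss(0, 1) for g in gender]
efficiency = [2.5 + 0.15 * g + 0.8 * d + rng.gauss(0, 1)
              for g, d in zip(gender, duration)]

paths = mediation_paths(gender, duration, efficiency)
ci_low, ci_high = bootstrap_indirect_ci(gender, duration, efficiency)
print(paths)
print("95% bootstrap CI for the indirect effect:", (ci_low, ci_high))
```

In any such linear model, the total effect equals the direct effect plus the indirect effect. The reported estimates for AI duration are consistent with this decomposition (0.40 ≈ 0.16 + 0.24).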
In sum, the results from the survey demonstrate that compared with the female participants, the male participants report higher perceived improvement in efficiency and exhibit greater inclination to recommend the use of generative AI. The relationship between gender and the outcomes of generative AI use is mediated by the duration and frequency of AI use, with the male participants spending more time using these tools and using them more frequently, leading to greater efficiency improvement and stronger recommendation intention among them. One limitation of study 2 is that we restrict the analysis to LLM users, which may introduce selection bias. A possible criticism is that males may use LLMs more efficiently but less frequently than females, potentially leading to an overall decrease in observed productivity gains among males compared with females. To address this concern, we conduct an additional survey among researchers in the United States, including participants who are not necessarily active LLM users. The results replicate the findings on usage frequency, perceived efficiency improvement, and recommendation tendency: males score higher than females on all these indicators. We include the results of this additional study in the Supplementary material.
Discussion
Using data from SSRN, we compare the preprint uploads of male researchers with those of female researchers before and after the emergence of generative AI tools. We uncover an unintended consequence of generative AI for gender equality in academia by showing that the increase in the research productivity of male researchers since the emergence of ChatGPT is 6.4% higher than that of female researchers. A follow-up survey further indicates that male researchers use generative AI more frequently and experience greater efficiency improvements from its use than their female counterparts.
This study makes significant contributions to the literature. First, it contributes to the broader literature on how generative AI might exacerbate or mitigate gender differences. Gender bias has emerged as a critical issue in the study of LLMs. A recent study observes that four LLMs released in 2023 are three to six times more likely to select occupations that are stereotypically associated with a person's gender during a linguistic deduction task (27). Additionally, these models not only amplify existing biases but also rationalize their biases inaccurately, potentially obscuring the true reasoning behind their decisions. Another study emphasizes how ChatGPT perpetuates stereotypes (28). For instance, when asked to narrate a story about children's future careers, ChatGPT tends to associate girls with artistic and emotional professions but links boys to scientific and technological fields. Similarly, another study finds that although ChatGPT's letters of reference do not exhibit explicit gender bias, there is a subtle preference for male-biased language (29). Additionally, researchers have identified gender bias in the association of communal descriptive words with roles in GPT-4 (30). Further evidence of implicit bias is found in GPT-4, which is shown to associate science with boys 250% more frequently than with girls (31).
The existing scholarly work that is most related to this study is that of Carvajal et al. (32), who use surveys to show that female students are 25% less likely to report a high use of ChatGPT than male students. However, there are notable distinctions between our study and theirs. First, we not only document the gender disparity in AI adoption, but also examine its consequences for research productivity. Second, in addition to conducting a survey, we collect observational data recording researchers’ actual behavior. By doing so, we avoid the self-report bias that survey data are prone to and provide more objective and generalizable empirical evidence. Third, to achieve the objectives of our research, we undertake a more exhaustive analysis. This includes a DDD analysis of ChatGPT penetration metrics, an analysis of the heterogeneity of effects across countries and disciplines, and a series of robustness tests to bolster the credibility of our findings.
Second, this study sheds light on gender inequality in academia. Studies document gender inequality in academia in terms of tenure evaluation (33), coauthoring choices (34), and number of citations (35) and demonstrate that female researchers face disadvantages compared with male researchers. The disadvantages documented include receiving less laboratory space, office spaces that are less optimal, lower equipment funding, and lower amounts of research support and funding and contending with inequitable service burdens and more parenting responsibilities (22,36–40). Prior literature has documented that gender differences in academic productivity arise from structural, institutional, cultural, and individual factors. Women often face unequal access to resources, including funding and research support, and are underrepresented in leadership positions, which limit their opportunities for impactful research (41). Additionally, caregiving responsibilities disproportionately fall on women, reducing their available time for research and leading to career interruptions (22, 42). Implicit biases in peer review, hiring, and promotion processes further exacerbate disparities, along with collaborative exclusion from high-profile projects (43). Women tend to have smaller professional networks and are less likely to be lead authors on publications, affecting their research visibility and impact (44). They also prioritize different research areas, often focusing on less mainstream topics that receive fewer citations and recognition (45). Societal expectations and cultural norms play a role as well, discouraging women from pursuing research-intensive careers or fields like STEM, leading to lower representation (46). Our study expands on prior research by introducing an additional factor which exacerbates the gender gap in academic productivity, i.e. the emergence of generative AI. 
As all researchers participate in an open competition for promotions and positions, short-term changes in productivity affect long-term career outcomes (47). Thus, institutions should consider this inequality when evaluating faculty members.
Third, the findings from our survey provide additional evidence for gender differences in attitudes and behavior toward technology. The technology acceptance model (TAM), introduced by Fred Davis more than 30 years ago, is the dominant model adopted in research on factors affecting users’ acceptance of new technologies (48). Studies adopting an extended TAM to investigate gender differences in email perception and usage illuminate how gender impacts technology acceptance (49). Similarly, studies using the unified theory of acceptance and use of technology to examine consumers’ acceptance of information technology (IT) find a significant impact of gender on consumers’ IT adoption (50). A study reviewing literature from 2000 to 2017 on gender differences in IT finds that men are more likely to adopt new information technologies than women (51). Some studies find that women tend to experience higher levels of anxiety regarding IT usage than men, and this heightened anxiety can reduce self-efficacy and increase the perception that using IT requires more effort (52). Recent studies show gender differences in agricultural technology adoption and agricultural productivity, with men being more likely to adopt improved technology than women (53, 54). Moreover, factors such as job satisfaction, organizational culture, and social influence strongly affect people's intention to adopt ChatGPT (55).
This study also offers significant implications for institutions and policymakers. Our findings suggest that generative AI exacerbates gender-based productivity disparities and this difference stems from varied usage patterns across genders. Therefore, more effort needs to be invested to mitigate the gender inequality induced by AI. For instance, creators of AI systems can implement regular audits to ensure that AI systems comply with ethical guidelines and do not contribute to gender disparities. They should ensure that the data used to train AI models are representative of all genders. Governments and institutions should implement policies that require AI systems to be evaluated for gender bias before deployment. They should also increase funding for research that examines how AI can be used to promote gender parity. Campaigns to raise public awareness about the potential for AI to exacerbate gender inequality should be conducted. These campaigns should advocate for responsible AI use and encourage public discourse on the topic. Specifically, in academia, given that all scholars compete for promotions and positions, an abrupt shift in productivity such as that documented in this study can influence career trajectories over time (33). Consequently, it is essential for institutions to factor such disparities into their faculty evaluations. Moreover, there is a pressing need to substantially increase training opportunities for all researchers, ensuring that they are well versed in techniques for harnessing the capabilities of emerging generative AI tools.
Although generative AI has the potential to democratize access to research tools, our findings highlight a complex issue wherein gender disparities in technology adoption can lead to unintended consequences. For instance, efforts to design AI systems that are sensitive to gender parity may inadvertently introduce new forms of gender discrimination (56). This issue underscores the need for a more nuanced approach that goes beyond technical fixes and considers both the social and cultural factors that influence AI use. Promoting gender parity in academia requires not only the careful design and deployment of AI tools but also proactive efforts by institutions to ensure that these technologies foster gender equity rather than widen existing gaps.
The study has some limitations. First, there is limited information in the SSRN dataset about whether researchers actually use ChatGPT in their academic work. To address this limitation, we conduct a survey to collect data on ChatGPT usage behavior. Future research could collect more fine-grained data on researchers’ ChatGPT usage to pinpoint the exact mechanism underlying the observed empirical findings. Second, this study mainly uses data from one database, i.e. SSRN. Future research could collect data from multiple sources.
Materials and methods
Study 1
Data
We collect data from SSRN, a repository of preprints for the rapid dissemination of scholarly research in the social sciences. We gather data on discipline preprints submitted between 2022 May and 2023 June. We extract the titles of the papers and the authors’ names, affiliations, and addresses. We use the authors’ addresses to identify their countries of residence.
To identify the authors’ genders, we use Genderize, a database that predicts gender based on first names and provides confidence levels for its predictions. About 78% of the authors’ genders are identified with confidence levels >80%. Our sample contains data on 21,733 female researchers and 55,099 male researchers.
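The thresholding step can be sketched as follows. This is an illustration under stated assumptions rather than the authors' pipeline: it assumes each prediction is a record with `gender` and `probability` fields, as in the JSON returned by the Genderize service, and it works on hypothetical records instead of calling the service. The function name `assign_gender` and the example names are made up.

```python
def assign_gender(prediction, min_probability=0.80):
    """Return 'male' or 'female' only when the prediction clears the threshold.

    Returns None when the gender is missing or the confidence is too low.
    """
    gender = prediction.get("gender")
    probability = prediction.get("probability") or 0.0
    if gender in ("male", "female") and probability > min_probability:
        return gender
    return None


# HYPOTHETICAL prediction records, for illustration only.
predictions = [
    {"name": "name_a", "gender": "female", "probability": 0.98},
    {"name": "name_b", "gender": "male", "probability": 0.62},
    {"name": "name_c", "gender": None, "probability": 0.0},
    {"name": "name_d", "gender": "male", "probability": 0.91},
]

labels = [assign_gender(p) for p in predictions]
share_identified = sum(label is not None for label in labels) / len(labels)
print(labels, share_identified)
```

Raising or lowering `min_probability` is one way to check how sensitive downstream estimates are to the classification threshold.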
The summary statistics of the variables are presented in Table 1. Upload denotes whether an author uploads any preprint to SSRN in a given month. Upload is equal to 1 if the author uploads a preprint in a given month and 0 otherwise. Post denotes whether a given month occurs after the emergence of ChatGPT. Post is equal to 1 if the month occurs after the emergence of ChatGPT and 0 otherwise. In addition, we present variables for attributes of preprints, such as Average Number of Pages, Average Number of Downloads, and Average Number of Views of Abstract, and variables for author characteristics, including Number of Papers Previously Uploaded to SSRN, Number of SSRN Citations, and Number of Crossref Citations.
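As an illustration of how Upload and Post are defined, the sketch below builds an author-month panel from a list of upload events. It is a simplified assumption-laden example: the sample window (2022 May to 2023 June) comes from the Data subsection, December 2022 is assumed to be the first post-ChatGPT month (ChatGPT was released on 2022 November 30), and the author identifiers and upload events are hypothetical.

```python
from itertools import product


def month_range(start, end):
    """Inclusive list of (year, month) tuples from start to end."""
    year, month = start
    months = []
    while (year, month) <= end:
        months.append((year, month))
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return months


def build_panel(uploads, authors, start=(2022, 5), end=(2023, 6),
                first_post_month=(2022, 12)):
    """One row per author and month with binary Upload and Post indicators.

    ASSUMPTION: first_post_month marks the first month treated as post-ChatGPT.
    """
    uploaded = set(uploads)  # set of (author, (year, month)) pairs
    panel = []
    for author, month in product(authors, month_range(start, end)):
        panel.append({
            "author": author,
            "month": month,
            "upload": int((author, month) in uploaded),
            "post": int(month >= first_post_month),
        })
    return panel


# HYPOTHETICAL upload events: (author id, (year, month)).
uploads = [("a1", (2022, 6)), ("a1", (2023, 1)), ("a2", (2022, 12))]
panel = build_panel(uploads, authors=["a1", "a2"])
print(len(panel), sum(row["upload"] for row in panel))
```

Several uploads by the same author in one month collapse to a single Upload = 1, which matches the binary definition of the variable.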
DiD analysis
We use a DiD approach to analyze the differential impacts of generative AI on male and female researchers’ productivity. The main specification is as follows:
$$\mathit{Upload}_{it} = \beta_0 + \beta_1 \left(\mathit{Male}_i \times \mathit{Post}_t\right) + \alpha_i + \gamma_t + \varepsilon_{it},$$
where $\mathit{Upload}_{it}$ is an indicator of whether author $i$ uploads papers to the SSRN website in month $t$. The majority of authors upload at most one paper in a month (only 4.6% of the author–month observations have more than one paper in a month). Therefore, in the main analysis, we use a dummy variable, Upload, to denote whether an author uploads any preprint to SSRN in a given month. $\mathit{Male}_i$ is an indicator of whether the author is male. $\mathit{Post}_t$ is an indicator of whether the month occurs after the release of ChatGPT. $\beta_1$ is our coefficient of interest. We also include author fixed effects, $\alpha_i$, and month fixed effects, $\gamma_t$, in the model to control for authors’ time-invariant characteristics and time trends; $\varepsilon_{it}$ is the error term. For the inferences, we use cluster-robust variance–covariance estimators clustered at the author level, which adjust for heteroskedasticity and serial correlation, in all of our analyses.
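A two-way fixed-effects DiD of this kind can be sketched in a few lines for a balanced panel. This is a hedged illustration and not the authors' estimation code: it assumes a balanced panel, absorbs the author and month fixed effects by double demeaning, computes an author-clustered standard error without small-sample corrections, and runs on simulated data whose effect size (0.03) and upload probabilities are arbitrary assumptions.

```python
import math
import random


def double_demean(matrix):
    """Remove row (author) and column (month) means from a balanced panel."""
    n, t_len = len(matrix), len(matrix[0])
    row_means = [sum(row) / t_len for row in matrix]
    col_means = [sum(matrix[i][t] for i in range(n)) / n for t in range(t_len)]
    grand = sum(row_means) / n
    return [[matrix[i][t] - row_means[i] - col_means[t] + grand
             for t in range(t_len)] for i in range(n)]


def did_estimate(upload, male, post):
    """Coefficient on Male x Post with author and month fixed effects.

    upload[i][t] is the outcome for author i in month t. Returns the estimate
    and an author-clustered standard error (no small-sample correction).
    """
    x = [[male[i] * post[t] for t in range(len(post))] for i in range(len(male))]
    x_dm, y_dm = double_demean(x), double_demean(upload)
    denom = sum(v * v for row in x_dm for v in row)
    beta = sum(a * b for rx, ry in zip(x_dm, y_dm) for a, b in zip(rx, ry)) / denom
    meat = 0.0
    for rx, ry in zip(x_dm, y_dm):
        score = sum(a * (b - beta * a) for a, b in zip(rx, ry))  # per-author score
        meat += score ** 2
    return beta, math.sqrt(meat) / denom


def simulate_panel(n_authors=3000, n_months=14, first_post=7, effect=0.03, seed=1):
    """SIMULATED binary uploads; all parameter values are illustrative assumptions."""
    rng = random.Random(seed)
    male = [int(rng.random() < 0.7) for _ in range(n_authors)]
    post = [int(t >= first_post) for t in range(n_months)]
    upload = []
    for i in range(n_authors):
        base = rng.uniform(0.02, 0.10)  # author-specific baseline upload rate
        upload.append([int(rng.random() < base + 0.002 * t + effect * male[i] * post[t])
                       for t in range(n_months)])
    return upload, male, post


beta_hat, se_hat = did_estimate(*simulate_panel())
print(f"Male x Post estimate: {beta_hat:.4f} (clustered SE {se_hat:.4f})")
```

With unbalanced panels or additional covariates, double demeaning no longer absorbs the fixed effects exactly, and a dedicated fixed-effects regression routine would be the more appropriate tool.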
Test for parallel trends
The validity of our identification strategy, which leverages the DiD model, relies critically on the pretreatment parallel trend assumption (i.e. that there is no significant change in the difference in the productivity of male and female researchers before the treatment). To test this assumption, we use the relative time model with lag periods (23). In this model, we add a series of time dummies that indicate the relative chronological distance between the observation time and treatment time. Intuitively, this model enables us to validate the parallel trend assumption used in the DiD analysis (57). The specification of our relative time model is as follows:
$$\mathit{Upload}_{it} = \beta_0 + \beta_1 \left(\mathit{Male}_i \times \mathit{Post}_t\right) + \boldsymbol{\rho}' \left(\mathit{Male}_i \times \mathbf{RelTime}_t\right) + \alpha_i + \gamma_t + \varepsilon_{it}.$$
Specifically, $\mathbf{RelTime}_t$ is the vector of relative time dummies of lag periods, including indicator variables for months that occur before the launch of ChatGPT (one month is omitted as the baseline). The author fixed effects $\alpha_i$, the month fixed effects $\gamma_t$, and the remaining terms are the same as in the DiD specification.
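The intuition behind the relative time model can be sketched for a balanced panel. The example below is a simplified variant and an assumption on our part: it treats every month other than the baseline as having its own Male × month dummy, in which case each coefficient equals the male-female gap in that month minus the gap in the baseline month. The data are a small noise-free toy panel, and the function names are hypothetical.

```python
def gender_gap_by_month(upload, male):
    """Male minus female mean outcome in each month of a balanced panel."""
    n_male = sum(male)
    n_female = len(male) - n_male
    gaps = []
    for t in range(len(upload[0])):
        mean_male = sum(row[t] for row, g in zip(upload, male) if g == 1) / n_male
        mean_female = sum(row[t] for row, g in zip(upload, male) if g == 0) / n_female
        gaps.append(mean_male - mean_female)
    return gaps


def relative_time_coefficients(upload, male, baseline):
    """Gap in each month relative to the gap in the omitted baseline month."""
    gaps = gender_gap_by_month(upload, male)
    return {t: gap - gaps[baseline] for t, gap in enumerate(gaps) if t != baseline}


# TOY panel: 4 authors, 6 months, treatment starts in month 4, baseline is month 3.
male = [1, 1, 0, 0]
post = [0, 0, 0, 0, 1, 1]
upload = [[0.05 + 0.01 * i + 0.002 * t + 0.05 * male[i] * post[t]
           for t in range(6)] for i in range(4)]

coefs = relative_time_coefficients(upload, male, baseline=3)
for month, value in sorted(coefs.items()):
    print(month, round(value, 4))
```

Pre-treatment coefficients close to zero are what the parallel trend assumption predicts; in the toy panel they are zero by construction, and the post-treatment months pick up the built-in effect.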
Study 2
Data
In study 2, we conduct a survey among researchers to gather behavioral and attitudinal data related to their use of generative AI tools. This study involves the measuring of variables such as duration of AI use, AI engagement, perceived efficiency improvement caused by the use of AI, and tendency to recommend AI to others, which are directly related to the use of tools based on LLMs. It is therefore essential to recruit participants who already use these tools in their academic work. The survey is administered through Qualtrics’ sample service, targeting researchers in the United States who actively use LLMs. The survey link is distributed to a prescreened pool of LLM users, and is kept open until our target of 400 respondents is reached (M_age = 37.12; 53.8% women). The participants are informed of the survey's purpose and assured of their anonymity and the confidentiality of their information. The voluntary nature of their participation is emphasized and informed consent is obtained before proceeding to the survey.
Inclusion and ethics
The scope of this research is global, and it may not fully address region-specific contexts or issues. Additionally, while we recognize the value of local insights, local researchers were not extensively involved throughout the research process. All roles and responsibilities were agreed among collaborators ahead of the research. The survey conducted in the study has been approved by the Institutional Review Board of Southern University of Science and Technology. This research does not result in any stigmatization, incrimination, discrimination, or other personal risk to the participants. The research does not involve any health, safety, security, or other risk to the researchers.
Notes
The econometric model is described in the “Materials and methods” section.
The main findings are robust to different thresholds of confidence levels for gender classification, as shown in the “Robustness Checks” section.
Supplementary Material
Supplementary material is available at PNAS Nexus online.
Funding
The authors gratefully acknowledge the financial support provided by the National Natural Science Foundation of China (grant nos. 72302008, 72472068, and 72442017), the Key Project of Philosophy and Social Science Research of the Chinese Ministry of Education (Project no. 22JZD012), and the Guangdong Basic and Applied Basic Research Foundation (grant no. 2023A1515010823).
Author Contributions
C.T. proposed the research idea and completed the model design. C.T., S.L., and S.H. collected and analyzed the data. The five authors organized the paper structure and conducted the literature review. The five authors wrote the manuscript. All authors have approved the final version for submission.
Data Availability
The data in study 1 were collected from SSRN. The data in study 2 were collected through Qualtrics. The data are available at https://osf.io/ucw6x/?view_only=fe96b6aecf22414bbb8a725df0b57181.
References
Author notes
Competing Interest: The authors declare no competing interest.