Parsing the ARRIVE Trial: Should First-Time Parents Be Routinely Induced at 39 Weeks?


Last February, I wrote a guest post for Science & Sensibility, "Preventative Induction of Labor: Does Mother Nature Know Best? - Henci Goer Examines the ARRIVE Study," analyzing the abstract of the ARRIVE trial, which reported that low-risk 1st-time mothers would undergo fewer cesareans and their babies would be slightly better off if they underwent routine induction at 39 weeks compared with expectant management. An abstract is only the bare bones of a study, so no final conclusions could be reached until publication of the full study. That has now happened. Let's revisit my analysis of the abstract and see what more the full trial gives us.

What did we know from the abstract and what questions did it raise?

Here's what I wrote about the trial's design and results back in February:

The ARRIVE trial is a large randomized controlled trial (6106 women), meaning participants were allocated by chance to one form of treatment or the other, conducted at multiple institutions (41 hospitals). Investigators randomly allocated “low-risk” 1st-time mothers in week 38 either to induction between 39 wk 0 days and 39 wk 4 days or to “forgo elective delivery” before 40 wk 5 days. The management plan was adhered to in 94% of the induction group and 95% of the expectant-management group. Outcomes did not differ by race/ethnicity, maternal age > 34 yr, BMI > 30, or Bishop score < 5 at time of allocation.

Women in the induction group delivered significantly (meaning the difference is unlikely to be due to chance) earlier (39 wk 3 days; interquartile range 39 wk 1 day – 39 wk 6 days) than the expectant-management group (40 wk 0 days; interquartile range 39 wk 3 days – 40 wk 7 days), although, as you can see, there was considerable overlap. Women in the induction group were significantly less likely to deliver by cesarean (19% vs. 22%) and to experience preeclampsia/gestational hypertension (9% vs. 14%).

Babies in the induction group were significantly less likely to require respiratory support (3% vs. 4%). The text also states that babies were less likely to experience the primary perinatal outcome (4% vs. 5%), a composite of adverse events, but omits that this difference didn’t achieve statistical significance, a fact that must be gleaned from the accompanying table. The composite consisted of perinatal death, respiratory support, Apgar ≤ 3 at 5 min, hypoxic ischemic encephalopathy, seizures, infection, meconium aspiration syndrome, birth trauma, intracranial or subgaleal hemorrhage, and hypotension requiring pressor support. All but two of the adverse outcomes making up the composite occurred at rates of 6 per 1000 or less. The two exceeding that rate were meconium aspiration syndrome at 6 vs. 9 per 1000 and need for respiratory support at 3.0 vs. 4.2 per 100.

And here are the questions the abstract raised for me:

Did clinicians refrain from electively delivering women allocated to expectant management before 40 wk 5 days? The interquartile range is the middle 50% of the group. Among expectantly-managed women, the middle 50% were delivered by 40 wk 7 days, which means 75% of the group overall had their babies by that day. A study on median pregnancy duration (50% delivered before and 50% after) in uncomplicated pregnancy in 1st-time mothers reaching term reported a median length of 41 wk 1 day (Mittendorf 1990). In the ARRIVE trial, 75%, not 50%, of expectantly-managed women had delivered by a day earlier than that, which raises the question: what percentage of them were induced? Observational studies consistently find that inducing labor in healthy 1st-time mothers roughly doubles their odds of cesarean, amounting to an absolute difference of 3 to 31 more women per 100, even after adjusting for factors such as birth weight and gestational age and despite treatment to ripen the cervix (Baud 2013; Boulvain 2001; Cammu 2002; Davey 2016; Dublin 2000; Ehrenthal 2010; Glantz 2005; Jacquemyn 2012; Le Ray 2007; Luthy 2004; Macer 1992; Maslow 2000; Seyb 1999; Vahratian 2005; van Gemund 2003; Vardo 2011; Vrouenraets 2005; Yeast 1999). If, as seems likely, sizeable percentages of the expectant-management group were induced, this would diminish differences between them and the induction group.
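To make the interquartile-range reasoning concrete, here is a minimal Python sketch. The gestational ages are invented for illustration and are not trial data. The only point is that the upper bound of the interquartile range is the 75th percentile, so three-quarters of a group has delivered by that day.

```python
from statistics import quantiles

# Hypothetical gestational ages at delivery, in days (NOT ARRIVE data).
# For reference: 280 days = 40 wk 0 d, 287 days = 41 wk 0 d.
ages_days = [274, 276, 277, 279, 280, 280, 282, 284, 286, 287, 289, 291]

# Quartile cut points: q1 and q3 bound the interquartile range (middle 50%).
q1, median, q3 = quantiles(ages_days, n=4, method="inclusive")

# Share of the group delivered on or before the upper bound of the IQR.
share_by_q3 = sum(age <= q3 for age in ages_days) / len(ages_days)

print(f"IQR: {q1} to {q3} days, median {median} days")
print(f"Delivered by upper IQR bound: {share_by_q3:.0%}")
```

In this made-up sample, 9 of the 12 women (75%) have delivered by the upper bound of the interquartile range, which is the same reading applied to the trial's reported range above.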

Why was the elective induction threshold set at 40 wk 5 days rather than 41 wk 0 days? Even if 41 weeks has become the new 42, why weren’t expectantly-managed women given the full 41 weeks? To repeat, if inducing labor increases the risk of cesarean, then expectantly-managed women were handicapped by not having an extra two days to start labor on their own.

Was the population actually low risk? The abstract tells us that 14% of expectantly-managed women had preeclampsia or gestational hypertension as did 9% of women in the induction group. But these were all “low-risk” women at the time of group allocation in week 38. It seems extremely unlikely that 9 women per 100 developed new hypertension in the following week and an additional 5 per hundred did so in the ensuing days after that, and, in fact, a study of 31,000 U.S. 1st-time mothers found that only 5% of women developed hypertension with ongoing pregnancy beginning at week 39 (Bailit 2015). If women in the ARRIVE trial could have medical indications for induction at trial entry, then was the ARRIVE trial truly studying elective induction?

Perhaps there is another explanation for 1 in 7 women in the expectant-management group being diagnosed with hypertension: “You can always find a reason to do what you want to do.” Clark et al. (2009) write that among women having planned delivery for hypertension, in only 3 of 27 facilities was the mean admission systolic pressure > 145 mm Hg, and in only 1 was the mean diastolic pressure > 90 mm Hg. If expectantly-managed women were being labeled hypertensive to justify inducing them, it calls into question the trustworthiness of the data.

Could management of the expectant group have contributed to their excess risk of cesarean? It’s a safe bet that expectantly-managed women underwent antenatal fetal surveillance testing. Fetal surveillance testing does a poor job of discriminating fetuses at risk (Grivell 2015; Lalor 2009). Almost all positive tests will be false positives, but they will lead to inducing labor, and, what’s more, to inducing labor under the worst circumstances: with a nervous practitioner ready to call for a cesarean at the slightest deviation from normal.

What more does the full study tell us and does it resolve any of the questions?

The paper's accompanying supplementary appendix provides detailed information on eligibility criteria, definitions of terms used in the paper, and guidelines for patient management, some of which is germane to the issues and concerns raised by the abstract.

To begin with, by "low risk," investigators didn't just mean "nulliparous, term, singleton, vertex" (NTSV) but free of any conditions that would contraindicate labor or warrant induction before 40 weeks 5 days. I wrote in my earlier post that we have a birth center study and a home birth study reporting lower induction rates (4% and 12%) and lower cesarean rates (9% and 14%) in 1st-time mothers (Jolles 2017; van der Hulst 2004) than found in the induction arm of the trial. Had the investigators defined low-risk merely as NTSV, some of that difference might have been explained by women in the trial having medical complications. Now we know they didn't. They were women who would have been eligible for out-of-hospital birth, which confirms that physiologic care would result in much lower induction rates and lower cesarean rates than routine induction at 39 weeks.

The low-risk definition also answers the question of whether the trial included medically at-risk women, specifically women with hypertension, at the time of randomization. It didn't.

According to the supplement, a diagnosis of "gestational hypertension" could be based on nothing more than undefined blood pressure elevation in the absence of protein in the urine or any other symptoms. This vague definition supports my speculation in the February post that the extraordinarily high percentage of new hypertension cases suggests that labeling women "hypertensive" might have been a way to justify labor induction.

Compliance in the expectant-management group was defined as "spontaneous labor or medically-indicated induction or cesarean on or before 42 weeks 2 days." That's not quite accurate, though, because the paper states that elective induction could be undertaken after 40 weeks 5 days, and indeed, non-compliance in the expectant management group is defined as including "elective induction before 40 weeks 5 days due to patient or provider preference," which necessarily means that elective induction after 40 weeks 5 days is considered "compliant." According to the paper, only 135 women in the expectant-management group had elective labor induction before 40 weeks 5 days, but this seems suspiciously low because, as I argued in February, far more women than we would reasonably expect had delivered by 40 weeks 7 days. Therefore, the question of whether clinicians electively induced substantial numbers of women allocated to expectant management remains open.

Turning to management, the supplementary appendix tells us that women undergoing induction with an unfavorable cervix "were expected" to undergo cervical ripening first, with the method left up to the care provider. Not mandated, but "suggested," was to allow 12 hours or more in latent-phase labor after completion of cervical ripening (if needed), rupture of membranes, and use of oxytocin. This differs from a California Maternal Quality Care Collaborative commentary on the abstract back in February. In that commentary, Elliot Main, MD wrote that all hospitals used a common definition of failed labor, i.e., that cesarean delivery should not be done in latent phase prior to at least 15 hours after rupture of membranes and beginning oxytocin administration. In addition, active phase was defined as achieving six cm dilation, after which ACOG/SMFM guidelines were to be followed in diagnosing labor arrest and descent disorders. Nothing is said about diagnosis of labor arrest or descent disorder in the supplementary appendix's management guidelines; they recommend a minimum of 12 hours in latent labor, not 15; and they merely suggest, not dictate, management. Presumably, Dr. Main got his information from an authoritative source, probably the conference at which the paper was presented. Which, then, is correct and why the discrepancies between how management was described earlier and now? Taken together with the vague definition of hypertension, it raises the possibility that definitions and guidelines were revised after the fact to match what actually went on during the trial. Be that as it may, the point made in the CMQCC commentary holds good: failure to allow enough time is likely to result in much higher cesarean rates than found in the trial.

What's new in the study?

As you would expect, outcomes didn't change, but there are some new tidbits of intriguing information:

  • One cesarean may be avoided for every 28 planned inductions at 39 weeks in healthy, 1st-time mothers. A commentary on the trial by the American College of Nurse-Midwives, "ACNM Responds to Release of ARRIVE Trial Study Results," notes that there are more effective, non-interventive, evidence-based approaches for reducing cesareans. For example, it states that one cesarean will be prevented for every 14 women having continuous labor support. (Note: According to the source for the ACNM's statistic [Bohren 2017], a significant reduction was seen only when the labor companion was neither a hospital staff member nor a member of the woman's social network, in other words, a doula. In that instance, rates were 14% vs. 21% with usual care.)
  • Only 27% of the 22,533 women eligible to participate agreed to be randomized. This tells us that most women weren't interested in routine early induction, and the ones who were differed in potentially important ways from the ones who weren't. This is borne out by the data on protocol violation: 5 of these healthy women underwent elective cesarean surgery. That's not a big number, but it is a pointer to attitude. Unfortunately, the preference for allowing nature to take its course is likely to change, now that obstetricians can confidently tell pregnant women that induction at 39 weeks won't harm their babies, reduces their odds of cesarean, and averts the possibility of stillbirth.
  • Six percent of the population was cared for by midwives, but there were no significant differences in cesarean rate or rate of composite adverse perinatal outcome according to admitting provider type. It would be interesting to have the actual rates because failure to find a difference may be because the population admitted by midwives was too small to reliably detect a significant difference. Even if rates truly are similar, the finding that midwifery care made no difference could be explained in any number of ways: not all midwives practice physiologic care, care may have been constrained by their employer physicians' or their hospital's policies, and as the previous bullet makes clear, women in the trial differed from the typical midwifery client in their values and preferences, which could have affected outcomes.
  • Women in the population overall who had an unfavorable cervix (Bishop score < 5) at randomization were more likely to have a cesarean. The paper only gives the raw numbers, but that readily allows calculation of the rates: 24% with a Bishop score less than 5 vs. 14%—10 cesareans per 100 women fewer—with a score of 5 or more. The study authors note that while more women with an unfavorable cervix at randomization had cesareans, cervical status didn't affect the magnitude of the difference in cesarean rates between the induction and expectant-management groups, although you would expect that this would disadvantage the induction group. They explain this non-intuitive finding by saying:

Yet, because women with an unfavorable score at baseline also had a higher chance of cesarean delivery than women with a favorable score when they followed the expectant-management strategy, labor induction in women with an unfavorable score still resulted in fewer cesarean deliveries than expectant management (p. 522).

That's an eyebrow raiser but hold that thought. I'll have more to say on its implications in the next section.
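The "one cesarean avoided for every N women" figures in the first bullet above are numbers needed to treat: the reciprocal of the absolute difference in rates. Here is a minimal sketch of that arithmetic, using only the rounded percentages quoted in this post. The function name is illustrative, and rounding to the nearest whole number is an assumption about how the quoted figures were produced.

```python
def number_needed_to_treat(rate_with, rate_without):
    """Return 1 / absolute risk reduction; rates are proportions from 0 to 1."""
    absolute_reduction = rate_without - rate_with
    if absolute_reduction <= 0:
        raise ValueError("the intervention shows no reduction in risk")
    return 1 / absolute_reduction

# Cesarean rates as rounded in this post: induction vs. expectant management.
nnt_induction = number_needed_to_treat(0.19, 0.22)

# Cesarean rates with a doula vs. usual care (Bohren 2017, as quoted above).
nnt_doula = number_needed_to_treat(0.14, 0.21)

print(f"Planned induction: about {round(nnt_induction)} per cesarean avoided")
print(f"Doula support: about {round(nnt_doula)} per cesarean avoided")
```

With the rounded 19% vs. 22%, the induction figure comes out near 33 rather than the 28 quoted above, presumably because the published figure was computed from unrounded rates. The doula figure of about 14 matches the rounded 14% vs. 21%.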

What's still missing from the picture?

The burning question remains: "How many women in the expectant-management group were induced?" Fueling the fire is the low probability that so few women would remain undelivered by 41 weeks if they had been allowed to begin labor spontaneously. Also feeding the flames is the finding that having an unfavorable cervix at trial entry didn't tilt the cesarean rate in favor of the expectant-management group. Surely, the cervix would have ripened before women began labor spontaneously, which should have given them the advantage. The only logical explanation is that a large percentage of women in the expectant-management group underwent unnecessary induction with an unfavorable cervix. While trial investigators may tell us that only 135 women in the expectant-management group were "electively" induced, the lack of a definition for what was considered medical indication for induction and what wasn't calls that into question. And, if, in fact, a large percentage of the women assigned to expectant management were induced, this would shift not just cesarean rates in favor of induction but rates of adverse outcomes as well, since, except for anal sphincter injury, complications are more likely to occur with surgical delivery.

What remains the same?

The best that can be said for the results of the ARRIVE trial is that routine 39-week induction confers a minor reduction in cesarean rates (3 fewer per 100) and no advantage to babies other than 1 fewer baby per 100 needing respiratory support. These are far from compelling reasons for routine induction at 39 weeks, but even these advantages are doubtful. As I ended the February post:

Randomized controlled trials are based on the premise that results depend on the interaction between factors intrinsic to the participants and the treatment, in this case, that women change over a few additional days of gestation in ways that heighten the odds of cesarean and adverse newborn outcomes and that labor induction reduces those odds. But the trial wasn’t measuring anything to do with the women or the treatment. The trial was measuring care-provider propensity to perform a cesarean--or induce labor, since "expectant management" doesn't preclude that. What is more, women in both groups were managed in ways that obstruct their ability to achieve spontaneous, uncomplicated, vaginal birth. This means the trial is nothing more than a frying pan vs. fire comparison with the not surprising finding that in the hands of medical-model practitioners, the frying pan comes out slightly ahead of the fire. As Sarah Wickham (2014) put it:

"We might consider that [the research] teaches us that awaiting spontaneous labor while in the care of an obstetrician may increase the risk of being advised to have a caesarean section, which may or may not have been genuinely warranted."

The takeaway

In the end, it matters little what the physiologic birth community thinks. Care providers and the media will be telling women that 39-week induction is better than awaiting labor. Our task becomes helping women make the best decisions for themselves and their babies in light of this. This infographic can help women determine whether 39-week induction is in line with their values and preferences while providing stealth education as to why it might not be a good idea.

If parents do opt for elective induction at 39 weeks—or any other time, for that matter—then we can help them ascertain whether their care provider adheres to guidelines that will maximize their chances of vaginal birth to a healthy baby. Here are some questions they can ask:

  • When there is no medical reason to induce labor, do you wait for cervical ripening before inducing? Perhaps the biggest myth of labor induction is that cervical ripening solves the problem of excess cesareans with induction. As we can see in the ARRIVE trial, it doesn't, and, as noted in the discussion following my question on how many women with planned expectant management were induced, there is abundant research confirming this.
  • Provided the baby and I are doing well, how long do you allow for me to get into active labor once oxytocin (Pitocin, "Pit," Syntocinon) is started? The ARRIVE trial recommended a minimum of 12 hours to achieve active labor after initiation of IV oxytocin, but longer is better. ACOG/SMFM's care consensus document, "Safe prevention of the primary cesarean delivery," states that if mother and baby are tolerating labor, oxytocin should be administered for "at least 12-18 hours" before diagnosing induction failure.
  • At what point do you consider that I'm in active labor? The same document states that active labor should be defined as achieving six cm dilation.
  • Once I'm in active labor, what criteria do you use to decide that I'm not progressing well enough to birth the baby vaginally? The consensus document states that slow, but progressive, labor is not an indication for cesarean and that a diagnosis of active phase arrest should be reserved for failure to progress after at least 4 hours with adequate uterine activity or at least 6 hours with inadequate activity despite oxytocin administration.
  • Once I'm in active labor, do you discontinue the oxytocin drip and see if I continue to progress without it? A systematic review of this policy reported a decreased likelihood of cesarean (9% vs. 15%) (Saccone 2017).
  • How long will you give me to push out the baby? The ACOG/SMFM consensus document states that 1st-time mothers should push for at least three hours and longer if they have an epidural or with fetal malposition as long as progress is being documented.
  • Are you comfortable with my declining to have membranes ruptured? It is common practice to rupture membranes when beginning oxytocin administration, but women may wish to decline membrane rupture until established in progressive labor. So long as membranes are intact, the induction can be stopped and tried another day if it doesn't take, but once membranes are ruptured, there is no turning back; the baby will be delivered, by one route or the other. Furthermore, rupturing membranes isn't benign. In addition to committing to delivery, it opens a pathway for ascending infection, permits cord compression, and creates a potential for cord prolapse.


Bailit, J. L., Grobman, W., Zhao, Y., Wapner, R. J., Reddy, U. M., Varner, M. W., . . . Eunice Kennedy Shriver National Institute of Child Health and Human Development Maternal-Fetal Medicine Units Network. (2015). Nonmedically indicated induction vs expectant treatment in term nulliparous women. American Journal of Obstetrics and Gynecology, 212(1), 103.e1-7. doi:10.1016/j.ajog.2014.06.054

Baud, D., Rouiller, S., Hohlfeld, P., Tolsa, J. F., & Vial, Y. (2013). Adverse obstetrical and neonatal outcomes in elective and medically indicated inductions of labor at term. J Matern Fetal Neonatal Med, 26(16), 1595-1601. doi:10.3109/14767058.2013.795533

Bohren, M. A., Hofmeyr, G. J., Sakala, C., Fukuzawa, R. K., & Cuthbert, A. (2017). Continuous support for women during childbirth. Cochrane Database Syst Rev, 7, CD003766. doi:10.1002/14651858.CD003766.pub6

Boulvain, M., Marcoux, S., Bureau, M., Fortier, M., & Fraser, W. (2001). Risks of induction of labour in uncomplicated term pregnancies. Paediatric and Perinatal Epidemiology, 15(2), 131-138.

Cammu, H., Martens, G., Ruyssinck, G., & Amy, J. J. (2002). Outcome after elective labor induction in nulliparous women: a matched cohort study. American Journal of Obstetrics and Gynecology, 186(2), 240-244.

Clark, S. L., Simpson, K. R., Knox, G. E., & Garite, T. J. (2009). Oxytocin: new perspectives on an old drug. American Journal of Obstetrics and Gynecology, 200(1), 35.e1-6. doi:10.1016/j.ajog.2008.06.010

Davey, M. A., & King, J. (2016). Caesarean section following induction of labour in uncomplicated first births- a population-based cross-sectional analysis of 42,950 births. BMC Pregnancy Childbirth, 16, 92. doi:10.1186/s12884-016-0869-0

Dublin, S., Lydon-Rochelle, M., Kaplan, R. C., Watts, D. H., & Critchlow, C. W. (2000). Maternal and neonatal outcomes after induction of labor without an identified indication. American Journal of Obstetrics and Gynecology, 183(4), 986-994.

Ehrenthal, D. B., Jiang, X., & Strobino, D. M. (2010). Labor induction and the risk of a cesarean delivery among nulliparous women at term. Obstetrics and Gynecology, 116(1), 35-42. doi:10.1097/AOG.0b013e3181e10c5c

Glantz, J. C. (2005). Elective induction vs. spontaneous labor associations and outcomes. Journal of Reproductive Medicine, 50(4), 235-240.

Grivell, R. M., Alfirevic, Z., Gyte, G. M., & Devane, D. (2015). Antenatal cardiotocography for fetal assessment. Cochrane Database Syst Rev(9), CD007863. doi:10.1002/14651858.CD007863.pub4

Jacquemyn, Y., Michiels, I., & Martens, G. (2012). Elective induction of labour increases caesarean section rate in low risk multiparous women. Journal of Obstetrics and Gynaecology, 32(3), 257-259. doi:10.3109/01443615.2011.645091

Jolles, D. R., Langford, R., Stapleton, S., Cesario, S., Koci, A., & Alliman, J. (2017). Outcomes of childbearing Medicaid beneficiaries engaged in care at Strong Start birth center sites between 2012 and 2014. Birth, 44(4), 298-305. doi:10.1111/birt.12302

Lalor, J. G., Fawole, B., Alfirevic, Z., & Devane, D. (2009). Biophysical profile for fetal assessment in high risk pregnancies. Cochrane Database Syst Rev(1), CD000038. doi:10.1002/14651858.CD000038.pub2

Le Ray, C., Carayol, M., Breart, G., & Goffinet, F. (2007). Elective induction of labor: failure to follow guidelines and risk of cesarean delivery. Acta Obstetricia et Gynecologica Scandinavica, 86(6), 657-665. doi:10.1080/00016340701245427

Luthy, D. A., Malmgren, J. A., & Zingheim, R. W. (2004). Cesarean delivery after elective induction in nulliparous women: the physician effect. American Journal of Obstetrics and Gynecology, 191(5), 1511-1515.

Macer, J. A., Macer, C. L., & Chan, L. S. (1992). Elective induction versus spontaneous labor: a retrospective study of complications and outcome. American Journal of Obstetrics and Gynecology, 166(6 Pt 1), 1690-1696; discussion 1696-1697.

Maslow, A. S., & Sweeny, A. L. (2000). Elective induction of labor as a risk factor for cesarean delivery among low-risk women at term. Obstetrics and Gynecology, 95(6 Pt 1), 917-922.

Mittendorf, R., Williams, M. A., Berkey, C. S., & Cotter, P. F. (1990). The length of uncomplicated human gestation. Obstetrics and Gynecology, 75(6), 929-932.

Saccone, G., Ciardulli, A., Baxter, J. K., Quinones, J. N., Diven, L. C., Pinar, B., . . . Berghella, V. (2017). Discontinuing Oxytocin Infusion in the Active Phase of Labor: A Systematic Review and Meta-analysis. Obstetrics and Gynecology, 130(5), 1090-1096. doi:10.1097/AOG.0000000000002325

Seyb, S. T., Berka, R. J., Socol, M. L., & Dooley, S. L. (1999). Risk of cesarean delivery with elective induction of labor at term in nulliparous women. Obstetrics and Gynecology, 94(4), 600-607.

Vahratian, A., Zhang, J., Troendle, J. F., Sciscione, A. C., & Hoffman, M. K. (2005). Labor progression and risk of cesarean delivery in electively induced nulliparas. Obstetrics and Gynecology, 105(4), 698-704.

van der Hulst, L. A., van Teijlingen, E. R., Bonsel, G. J., Eskes, M., & Bleker, O. P. (2004). Does a pregnant woman's intended place of birth influence her attitudes toward and occurrence of obstetric interventions? Birth, 31(1), 28-33.

van Gemund, N., Hardeman, A., Scherjon, S. A., & Kanhai, H. H. (2003). Intervention rates after elective induction of labor compared to labor with a spontaneous onset. A matched cohort study. Gynecologic and Obstetric Investigation, 56(3), 133-138.

Vardo, J. H., Thornburg, L. L., & Glantz, J. C. (2011). Maternal and neonatal morbidity among nulliparous women undergoing elective induction of labor. Journal of Reproductive Medicine, 56(1-2), 25-30.

Vrouenraets, F. P., Roumen, F. J., Dehing, C. J., van den Akker, E. S., Aarts, M. J., & Scheve, E. J. (2005). Bishop score and risk of cesarean delivery after induction of labor in nulliparous women. Obstetrics and Gynecology, 105(4), 690-697.

Wickham, S. (2014). Does induction really reduce the likelihood of caesarean section? Practicing Midwife, 17(8), 39-40.

Yeast, J. D., Jones, A., & Poskin, M. (1999). Induction of labor and the relationship to cesarean delivery: A review of 7001 consecutive inductions. American Journal of Obstetrics and Gynecology, 180(3 Pt 1), 628-633.

About Henci Goer

Henci Goer, award-winning medical writer and internationally known speaker, is an acknowledged expert on evidence-based maternity care. Her first book, Obstetric Myths Versus Research Realities, was a valued resource for childbirth professionals. Its successor, Optimal Care in Childbirth: The Case for a Physiologic Approach, won the American College of Nurse-Midwives “Best Book of the Year” award. Goer has also written The Thinking Woman's Guide to a Better Birth, which gives pregnant women access to the research evidence, as well as consumer education pamphlets and articles for trade, consumer, and academic periodicals, and she posts regularly on Lamaze International’s Science & Sensibility. Goer is founder and director of Childbirth U, a website offering narrated slide lectures to help pregnant women make informed decisions and obtain optimal care for themselves and their babies.

