Wetlands are the world's largest natural source of methane, a powerful greenhouse gas. The strong sensitivity of methane emissions to environmental factors such as soil temperature and moisture has led to concerns about potential positive feedbacks to climate change. This risk is particularly relevant at high latitudes, which have experienced pronounced warming and where thawing permafrost could liberate large amounts of labile carbon over the next 100 years. However, global models disagree as to the magnitude and spatial distribution of emissions, owing to uncertainties in wetland area and in emissions per unit area, and to a scarcity of in situ observations. Recent intensive field campaigns across the West Siberian Lowland (WSL) make this an ideal region over which to assess the performance of large-scale process-based wetland models in a high-latitude environment. Here we present the results of a follow-up to the Wetland and Wetland CH₄ Intercomparison of Models Project (WETCHIMP), focused on the West Siberian Lowland (WETCHIMP-WSL). We assessed 21 models and 5 inversions over this domain in terms of total CH₄ emissions, simulated wetland areas, and CH₄ fluxes per unit wetland area, and compared these results to an intensive in situ CH₄ flux data set, several wetland maps, and two satellite surface water products.
We found that (a) despite the large scatter of individual estimates, 12-year mean estimates of annual total emissions over the WSL from forward models (5.34 ± 0.54 Tg CH₄ yr⁻¹), inversions (6.06 ± 1.22 Tg CH₄ yr⁻¹), and in situ observations (3.91 ± 1.29 Tg CH₄ yr⁻¹) largely agreed; (b) forward models using surface water products alone to estimate wetland areas suffered from severe biases in CH₄ emissions; (c) the interannual time series of models that lacked either soil thermal physics appropriate to the high latitudes or realistic emissions from unsaturated peatlands tended to be dominated by a single environmental driver (inundation or air temperature), unlike those of inversions and more sophisticated forward models; (d) differences in biogeochemical schemes across models had a relatively smaller influence on performance; and (e) multiyear or multidecade observational records are crucial for evaluating models' responses to long-term climate change.