This project focuses on building an online tool that facilitates data extraction from oil and gas regulatory documents using AI/ML techniques. State agencies maintain historic regulatory records for oil and gas wells under their jurisdiction. These records contain valuable information about the design, construction, and operation of wells over their lifetime. In many states, these data are locked away in disparate forms. Our tool will help state agencies, researchers, and other stakeholders digitize information from historic records into organized databases. These databases of well construction information will help prioritize orphaned wells for plugging and abandonment.


Our objective is to have a working prototype of the web tool by the end of the first year of the project (~7/1/2024).

  1. Short-term impact
    • Facilitates digitization of regulatory records for batches of oil and gas wells. This will be helpful when considering smaller subsets of orphaned wells for projects.
  2. Long-term impact
    • Enables development of statewide well construction databases containing digitized regulatory records. Newly gathered data will inform state orphaned well plugging prioritization schemes and may help determine characteristics of undocumented wells.


  • Began conversations with state agencies that manage oil and gas records in Pennsylvania, Colorado, and Oklahoma to get feedback on the approach.
  • Identified target document types.
  • Prototyped data extraction using a combination of optical character recognition and large language models.
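
The extraction step in the last bullet can be pictured with a minimal Python sketch. It is not the project's implementation: the field names, the prompt wording, the `extract_fields` helper, and the `stub_llm` stand-in are all hypothetical, and the sample values are made up. The sketch assumes the optical character recognition step has already turned a scanned record into plain text, and that the language model is any callable that takes a prompt string and returns a string.

```python
import json
from typing import Callable, Dict, List

# Hypothetical target fields; the source does not list the actual ones.
FIELDS: List[str] = ["api_number", "total_depth_ft", "casing_diameter_in"]


def build_prompt(ocr_text: str, fields: List[str]) -> str:
    """Ask the model for a JSON object containing the requested keys."""
    return (
        "Extract the following fields from the well record below. "
        "Reply with a JSON object only; use null for missing values.\n"
        f"Fields: {', '.join(fields)}\n"
        f"Record:\n{ocr_text}"
    )


def extract_fields(
    ocr_text: str,
    llm: Callable[[str], str],
    fields: List[str] = FIELDS,
) -> Dict[str, object]:
    """Send OCR text to the model and keep only the requested fields."""
    reply = llm(build_prompt(ocr_text, fields))
    try:
        parsed = json.loads(reply)
    except json.JSONDecodeError:
        parsed = {}  # unparseable reply: treat every field as missing
    if not isinstance(parsed, dict):
        parsed = {}
    # Fields the model did not return are recorded as None.
    return {name: parsed.get(name) for name in fields}


def stub_llm(prompt: str) -> str:
    """Stand-in for a real model call; returns made-up values."""
    return json.dumps({"api_number": "37-000-00000", "total_depth_ft": 3200})


record = extract_fields("WELL RECORD ... TOTAL DEPTH 3200 FT ...", stub_llm)
print(record)
```

Passing the model in as a plain callable keeps the sketch independent of any particular OCR engine or model provider, and recording missing fields as None makes gaps in a historic record explicit rather than silently dropped.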

Research Products


CATALOG Data Extractor


Dan O’Malley 
Los Alamos National Laboratory

Greg Lackey
National Energy Technology Laboratory