Background: Physicians invest hours creating patient notes, which are rich in information but difficult for computers to analyze because of their unstructured format. GPT-4 has reshaped our ability to process text, yet it is unknown how well the model handles medical notes. This project compared GPT-4's ability to annotate medical notes against that of experienced physicians across three languages at multiple institutions and countries.

Methods: This study included eight sites from four countries: the United States, Colombia, Singapore, and Italy. Each site contributed seven de-identified notes (admission, progress, or consult) from hospitalized patients. GPT-4 assessed each note by answering 14 questions covering demographic information, clinical judgments, data quality, and patients' eligibility for hypothetical study enrollment. For validation, two physicians from each site independently evaluated GPT-4's responses.

Findings: Overall, 56 medical notes written in English, Italian, and Spanish were analyzed, yielding 784 GPT-4 responses. Both physicians agreed with GPT-4's response 79% of the time (622/784, 95% CI 76-82%); only one of the two physicians agreed 10% of the time (82/784, 95% CI 8-13%); neither physician agreed 10% of the time (80/784, 95% CI 8-13%). Both physicians agreed with GPT-4 more often for notes written in Spanish and Italian than in English, with agreement rates of 88% (86/98, 95% CI 79-93%), 84% (82/98, 95% CI 75-90%), and 77% (454/588, 95% CI 74-80%), respectively. Hallucinations were rare, occurring in 1% of responses (10/784, 95% CI 0-2%). GPT-4 correctly selected patients for hypothetical study enrollment based on three criteria 90% of the time (95% CI 81-98%).

Interpretation: The findings indicate that GPT-4's annotations demonstrated a high rate of agreement with physicians across all languages.
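The agreement rates above follow directly from the raw counts. As an illustration (not the authors' analysis code, and assuming a simple normal-approximation interval, since the abstract does not state which confidence-interval method was used):

```python
from math import sqrt

def rate_with_ci(successes: int, total: int, z: float = 1.96):
    """Proportion with a normal-approximation (Wald) 95% CI, clipped to [0, 1]."""
    p = successes / total
    half = z * sqrt(p * (1 - p) / total)
    return p, max(p - half, 0.0), min(p + half, 1.0)

# Counts reported in the Findings section
for label, k, n in [
    ("Both physicians agreed", 622, 784),
    ("Only one physician agreed", 82, 784),
    ("Neither physician agreed", 80, 784),
]:
    p, lo, hi = rate_with_ci(k, n)
    print(f"{label}: {p:.0%} ({k}/{n}, 95% CI {lo:.0%}-{hi:.0%})")
```

The exact bounds may differ by a percentage point from the published intervals if the authors used a Wilson or exact (Clopper-Pearson) interval rather than the Wald approximation assumed here.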
We also demonstrate GPT-4's potential to assist in patient selection for studies.

Funding: None.

Declaration of Interest: All authors report no conflicts of interest. DH receives support from grant #UM1TR004404. DRM receives support from the National Center for Advancing Translational Sciences of the National Institutes of Health under award #UL1TR002366. EP and RB receive support from the Horizon 2020 Project Periscope under grant #101016233. GSO receives support from grants #U24CA271037 and #P30ES017885. VLM receives support through his institution from Siemens Healthineers, the Melvyn Rubenfire Professorship in Preventive Cardiology, grants #R01AG059729, #R01HL136685, and #U01DK123013 from the National Institutes of Health, and the American Heart Association Strategically Focused Research Network #20SFRN35120123. ZX receives support from the National Institute of Neurological Disorders and Stroke under grants #R01NS098023 and #R01NS124882.

Ethical Approval: This retrospective study included sites in the United States, Colombia, Singapore, and Italy: Boston Children's Hospital (BCH), University of Michigan (UMICH), University of Pittsburgh Medical Center (UPMC), University of Wisconsin (WISC), University of Kansas Medical Center (KUMC), Universidad de Antioquia (UDEA), National University of Singapore (NUS), and Istituti Clinici Scientifici Maugeri (ICSM). The Harvard Department of Biomedical Informatics served as the coordinating center. Each site obtained institutional review board approval and had data use agreements in place.