Ropecount


    For the first time, AI has successfully generated functional proteins from scratch

    PHOTO CREDIT: IAN C. HAYDON

    Scientists have created an artificial intelligence (AI) system capable of generating artificial enzymes from scratch. In laboratory tests, some of the enzymes worked as well as those found in nature, even though their artificially generated amino acid sequences differed significantly from any known natural protein. The research results were published in Nature Biotechnology on January 26.

    The experiment shows that natural language processing, although developed for reading and writing language text, can learn at least some of the fundamental principles of biology. Salesforce Research developed the AI program, called ProGen, which uses next-token prediction to assemble amino acid sequences into artificial proteins.
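
    To make "next-token prediction" concrete, here is a minimal sketch in which each amino acid is sampled based only on the one before it. It is a toy bigram model, not ProGen, which is a far larger neural network. The training sequences, the boundary markers and the function names are all made up for illustration.

```python
import random
from collections import Counter, defaultdict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
START, END = "^", "$"  # hypothetical boundary markers for this toy model


def train_bigram(sequences):
    """Count how often each token follows each other token."""
    counts = defaultdict(Counter)
    for seq in sequences:
        tokens = [START, *seq, END]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def generate(counts, rng, max_len=50):
    """Sample one token at a time, each conditioned on the previous token."""
    out, prev = [], START
    while len(out) < max_len:
        options = counts[prev]
        nxt = rng.choices(list(options), weights=list(options.values()))[0]
        if nxt == END:
            break
        out.append(nxt)
        prev = nxt
    return "".join(out)


# Made-up training sequences, for illustration only (not real lysozymes).
toy_training_set = ["MKALIV", "MKVLAV", "MRALIG"]
model = train_bigram(toy_training_set)
sample = generate(model, random.Random(0))
print(sample)
```

    A real protein language model conditions each prediction on the whole preceding sequence rather than on a single residue, but the generation loop has the same shape: predict, sample, append, repeat.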

    The new technique, which could be more powerful than directed evolution, the Nobel Prize-winning protein design technique, will speed up the development of new proteins and revitalize the 50-year-old field of protein engineering, scientists say. These new proteins could be used for almost anything, from treating disease to degrading plastic.

    "The artificial designs perform better than designs inspired by the evolutionary process," said James Fraser, one of the authors of the study and a professor of bioengineering and therapeutic sciences at the University of California, San Francisco School of Pharmacy. The language model is learning aspects of evolution, he said, but it is different from the normal evolutionary process. "We are now able to tune the generation of these properties for specific effects, such as enzymes that are very resistant to heat, prefer acidic environments or do not interact with other proteins."

    To create the model, scientists simply fed the machine-learning model the amino acid sequences of 280 million different proteins and let it digest the information for a few weeks. They then fine-tuned the model using 56,000 sequences from five lysozyme families, along with contextual information about those proteins.

    The model quickly generated 1 million sequences. The research team selected 100 sequences to test based on how similar they were to natural protein sequences, and how natural the AI protein's underlying amino acid "syntax" and "semantics" were.

    From this first batch of 100 proteins, which were screened in vitro by Tierra Biosciences, the team made five artificial proteins to test in cells and compared their activity with that of an enzyme found in egg white, hen egg white lysozyme (HEWL). Similar lysozymes are also found in human tears, saliva and milk, where they protect against bacteria and fungi.

    Two of the artificial enzymes were able to break down bacterial cell walls with activity comparable to that of HEWL. Yet their sequences were only about 18 percent identical to each other, and the two sequences were about 90 percent and 70 percent identical, respectively, to any known protein.
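
    Percent identity figures like those above are computed from an alignment of two sequences. The following is a minimal sketch that assumes the two sequences have already been aligned to equal length, with "-" marking gaps. The function name and the example strings are hypothetical, and conventions for the denominator vary between tools; this version divides by the number of aligned columns.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns where both sequences carry the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Ignore columns where both sequences have a gap.
    columns = [
        (a, b)
        for a, b in zip(aligned_a, aligned_b)
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("no aligned columns to compare")
    # A gap paired with a residue counts as a mismatch.
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


# Made-up aligned fragments: 5 of the 7 columns match.
example = percent_identity("MKAL-IV", "MRALGIV")
print(round(example, 1))  # 71.4
```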

    A single mutation in a natural protein can make it stop working. But in another round of screening, the team found that AI-generated enzymes showed activity even when as little as 31.4 percent of their sequence resembled any known natural protein.

    The AI was even able to learn how the enzymes should be shaped simply by studying raw sequence data. X-ray crystallography showed that the atomic structures of the artificial proteins look just as they should, even though their sequences are unlike anything seen before.

    In 2020, Salesforce Research developed ProGen based on a type of natural language processing that researchers originally developed to generate English-language text. They knew from previous work that AI systems can teach themselves grammar and the meaning of words, as well as other basic rules that keep writing well organized.

    "When you train sequence-based models with a lot of data, they're really powerful at learning structure and rules, understanding which words can co-occur, and compositionality," said Nikhil Naik, director of artificial intelligence research at Salesforce Research and one of the study's corresponding authors.

    When it comes to proteins, the design options are nearly limitless. Lysozymes are small as proteins go, with up to about 300 amino acids. But with 20 possible amino acids at each position, there are 20^300 possible combinations. Given such a vast number of possibilities, it is remarkable how easily the model can generate working enzymes.
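
    To give a sense of how large that design space is, the short calculation below counts the sequences of exactly 300 residues over a 20-letter alphabet. It is a rough illustration only: it ignores shorter sequences, and the variable names are chosen for this example.

```python
import math

ALPHABET_SIZE = 20  # standard amino acids
LENGTH = 300        # approximate upper length quoted above

# Python integers have arbitrary precision, so this value is exact.
total = ALPHABET_SIZE ** LENGTH
digits = len(str(total))
exponent = LENGTH * math.log10(ALPHABET_SIZE)

print(digits)                        # 391
print(f"about 10^{exponent:.1f}")    # about 10^390.3
```

    The result is a 391-digit number, so exhaustively testing every candidate sequence is out of the question.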

    Ali Madani, first author of the study and founder of Profluent Bio, said: "The ability to generate functional proteins from scratch shows that we are entering a new era of protein design. This is a versatile new tool for protein engineers, and we are looking forward to seeing its therapeutic applications."

    Related paper information:

    https://doi.org/10.1038/s41587-022-01618-2

    (Originally titled "AI successfully generated raw protein from zero for the first time")
