Lecture 002 — Expanding SMILES to more Diverse Functional Groups: Introduction to Alkenes, Alkynes, and Carbonyls.

Sulstice
Oct 29, 2022

This lecture continues from Lecture 001 and expands our SMILES writing to more advanced string representations. Previously, we wrote propane like so:

CCC

But let’s say we would like to turn this into propene.

In SMILES we denote a double bond with the equal sign, =, which is pretty intuitive to read. Adding one to a string is easy: we place an equal sign between the two letters where we want the double bond.

C=CC

We can just as easily expand this to an alkyne such as propyne.

The triple bond is represented by the hash symbol, #, and the string is written in the same way as for the double bond, by placing the symbol between the letters:

C#CC
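As a quick check on the notation so far, here is a minimal Python sketch that counts the bond symbols in a string. The function name count_bond_symbols is made up for this illustration, and it assumes simple strings like the ones in this lecture, where every = or # stands for exactly one bond.

```python
def count_bond_symbols(smiles: str) -> dict:
    """Count explicit double (=) and triple (#) bond symbols in a SMILES string."""
    return {"double": smiles.count("="), "triple": smiles.count("#")}


# The three molecules written so far in this lecture.
for name, smiles in [("propane", "CCC"), ("propene", "C=CC"), ("propyne", "C#CC")]:
    print(name, smiles, count_bond_symbols(smiles))
```

Single bonds never show up in the count because they are implied between neighboring letters.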

Using characters and symbols to represent our constructs is how we create and develop a language. Let’s go back to our previous lecture, where we had propanol:

CCCO

Let’s convert this alcohol into an aldehyde where the carbon is double bonded to the oxygen and then into a carboxylic acid.

Here we go. Remember that the hydrogens are implied:

CCCO --> CCC=O

Let’s continue the expansion into a carboxylic acid:

CCC(=O)O

We wrap the double-bonded oxygen in parentheses to mark it as a branch. We can also write it the other way around, with the alcohol group as the branch:

CCC(O)=O
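The two rewrites above can be sketched as plain string edits in Python. This is only an illustration under narrow assumptions: the helper names are hypothetical, and the input must be a simple SMILES string that ends in a carbon followed by the hydroxyl oxygen, such as CCCO, with no rings or brackets.

```python
def alcohol_to_aldehyde(smiles: str) -> str:
    """Rewrite a terminal alcohol '...CO' as the aldehyde '...C=O'."""
    if not smiles.endswith("CO"):
        raise ValueError("expected a SMILES string ending in 'CO'")
    # Drop the trailing O and put it back with a double bond.
    return smiles[:-1] + "=O"


def alcohol_to_acid(smiles: str) -> str:
    """Rewrite a terminal alcohol '...CO' as the carboxylic acid '...C(=O)O'."""
    if not smiles.endswith("CO"):
        raise ValueError("expected a SMILES string ending in 'CO'")
    # The double-bonded oxygen becomes a branch; the hydroxyl O stays at the end.
    return smiles[:-1] + "(=O)O"


print(alcohol_to_aldehyde("CCCO"))
print(alcohol_to_acid("CCCO"))
```

Working on the raw string keeps the example short, but it does not understand chemistry. Anything beyond this simple pattern would need a real parser.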

From here you can see that functional group transformations are easy to write in SMILES, which is part of what makes the notation so powerful. You might also have noticed that the two different SMILES strings for propionic acid give you the same structure.

So multiple SMILES strings can represent the same structure. Choosing one unique SMILES string for each structure is called canonicalization, and the resulting string is a canonical SMILES. This idea will be expanded in a future lecture. For now, non-canonical SMILES are fine for our purposes.
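To make the "two strings, one structure" point concrete, here is a rough Python sketch. It parses only the tiny subset of SMILES used in this lecture (C, O, =, #, and parentheses) and compares each atom together with its bonded neighbors. All names are invented for this example. Matching neighborhoods are a necessary sign that two strings describe the same molecule, not a proof, and this is not how real canonicalization algorithms work.

```python
from collections import Counter

BOND_ORDERS = {"=": 2, "#": 3}


def parse(smiles):
    """Parse a tiny SMILES subset into a list of atoms and a list of bonds."""
    atoms = []    # element symbol for each atom index
    bonds = []    # (atom_a, atom_b, bond_order)
    stack = []    # atoms to return to when a branch closes
    prev = None   # atom that the next atom will bond to
    order = 1     # bond order waiting to be used (single by default)
    for ch in smiles:
        if ch in BOND_ORDERS:
            order = BOND_ORDERS[ch]
        elif ch == "(":
            stack.append(prev)
        elif ch == ")":
            prev = stack.pop()
        elif ch in "CO":
            atoms.append(ch)
            idx = len(atoms) - 1
            if prev is not None:
                bonds.append((prev, idx, order))
            prev = idx
            order = 1
        else:
            raise ValueError(f"unsupported character: {ch!r}")
    return atoms, bonds


def environments(smiles):
    """Count each atom together with its sorted (neighbor, bond order) pairs."""
    atoms, bonds = parse(smiles)
    neighbors = [[] for _ in atoms]
    for a, b, order in bonds:
        neighbors[a].append((atoms[b], order))
        neighbors[b].append((atoms[a], order))
    return Counter(
        (atoms[i], tuple(sorted(neighbors[i]))) for i in range(len(atoms))
    )


def probably_same(a, b):
    """True when both strings have identical atom environments."""
    return environments(a) == environments(b)


print(probably_same("CCC(=O)O", "CCC(O)=O"))  # the two propionic acid strings
print(probably_same("CCCO", "CCC=O"))         # propanol vs. the aldehyde
```

The first comparison agrees and the second does not, because the oxygen in propanol is single bonded while the oxygen in the aldehyde is double bonded.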

For your homework, do the assigned practice problem. It is a little tricky, but you should be able to solve it with everything I have taught in the past two lectures.

Practice Problems

  1. Write a program that detects the pure hydrocarbon molecules in the list below and converts each one into a carboxylic acid.
molecules = [
'CCCC=C',
'CCCC',
'CC#CCC#C',
'CCCCCCCCCC',
'CCC([H])([H])([H])'
]
