Every Thursday morning, we get together and go over a portion of a thesis, a short paper, a conference submission, an abstract or suchlike that is in the process of being written by a member of the group. This morning was my turn: 4 pages, 1 hour. In the end, we only got through 2 1/2 pages. It's not that there were huge changes: it's more that we cover both grammar and meaning. And it was fantastically useful (see Matt, I can get the word "fantastically" into a work document! Or was it supposed to be fantastical?).
This writing group has been helpful to so many of us in our group, and my thanks go to Jen for starting it up. I highly recommend such a practice anyway. Not only does it encourage people to write more to ensure that each week is filled up, it also means that what comes out of our group is (hopefully) more consistent and more polished. Consistent because we have all started adopting group-wide conventions for dashes ("group-wide conventions"), commas (Oxford comma or no?), mixed casing (Web Services or web services?) and more; polished because we get an end result which is more cohesive, reads better, and is generally more succinct.
So, what would my tips be after getting a good set of comments this morning from the group?
- Try very hard to give the rest of your group 18-24 hours' notice. Often the paper doesn't get sent around until 5pm the day before (as happened with my paper this week), so don't be too hard on the author if that's the case. The fact remains, however: earlier is better. You'll get more complete feedback from your peers.
- Always bring along a copy of Strunk and White. They are masters of concise writing and of answering tricky grammar questions.
- You might also like Truss' Eats, Shoots & Leaves. Humorous, and gives you the right idea about punctuation.
- Send an editable version of the file if you are also sending a PDF. If you are a LaTeX person, ALWAYS send the .tex source around as well as the PDF. Not everyone can comment directly on a PDF, and making it possible for them to easily edit the .tex file (and track changes) is really helpful.
- A writing group is always useful, at any stage of the writing process. It doesn't matter how early on or late into the paper-writing procedure you are: other people always catch things that you haven't seen, no matter how often you've read it!
- No more than 4 single-spaced pages, max! And, as I've said, even that is often too long. A shorter piece takes less time for the group members to read, and it means you can get a greater depth of responses for the section you've sent out.
Summary: It's great. Go for it!
And the result? U can haz gud riting too. But maybe, you'd prefer a cheezburger.
...Or, on defining a protein.
For some time now, I have been interested in the meanings of things. I am working with ontologies and terminologies in my area of research, and producing rigorous definitions that can be written logically requires deep thinking on the meanings of things. I'm writing this up after a discussion I had with a number of colleagues[1], and it was such an interesting example of the thought process that goes into creating definitions suitable for ontologies using Description Logics (which many of mine do) that I thought I would reproduce a distillation of that conversation here. The reason I was delving into what a protein is (or, rather, how I wish a protein to be modelled in an ontology) isn't so important - it's all about the journey, in this case.
Step 1: I have the terms of "polypeptide chain", "protein", and "protein complex". What exactly are they?
You don't always need to start with this question: indeed, you may just start with a biological domain (e.g. translation), and build up what you want. However, in this case I was constrained, and needed to answer this particular question. It quickly became clear that, as with many common words, we had a feel for what each term meant, but didn't really know where one term ended and the other began. We looked at the GO definition of protein complex (which we liked), multiple online definitions of protein, and many definitions of polypeptide, including the one from SO (aka the Sequence Types and Feature Ontology). SO had polypeptide as a synonym for protein, while others said proteins could have more than one polypeptide chain. And if the latter is true, then where do you draw the line between a protein and a protein complex? So, we end up at Step 2.
Step 2: What are the differentiating features that make up the concept of "polypeptide chain", "protein", and "protein complex"?
This is actually a more telling question, as it will lead you to a list of things that make each concept unique. In Description Logics, you have two main ideas behind defining a concept: what is necessary about a concept, and what is both necessary and sufficient. Take a simple example: we can make a statement about the normal[2] state of the concept dog saying that a dog has 4 legs. This statement is necessary because if we declare something to be a dog, then we can infer that it has 4 legs. However, it is not sufficient to describe the concept of a dog unambiguously: if something has 4 legs, we cannot say with certainty that it must be a dog. When you make a definition of any concept, keeping the ideas of "necessary" and "necessary and sufficient" in your head is very handy. You'll generally end up with a list of statements, some of which together make up a necessary and sufficient list, and some of which are simply necessary. So, by listing differentiating features of these three terms, we started thinking about their definitions in a logical way. Here were some thoughts we had, as an example of the process we went through:
- Can the concept of a protein include extra features such as metals, or are these just cofactors and not what makes a protein, a protein?
- What is different about a protein and a polypeptide chain? Is a protein just a specialization of a polypeptide chain (a parent-child relationship), or not (e.g. a sibling or even further apart)?
- If a protein can have multiple polypeptide chains, then what differentiates the concept of a protein from that of a protein complex?
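To make the necessary versus necessary-and-sufficient distinction from the dog example a little more concrete, here is a minimal Python sketch. It is not a Description Logics reasoner, and every name in it (Concept, infer_facts, infer_membership, the Quadruped concept) is made up purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Concept:
    """A hypothetical, much-simplified stand-in for an ontology concept."""
    name: str
    # Necessary facts: true of anything declared to be this concept.
    necessary: dict = field(default_factory=dict)
    # Optional test that is both necessary AND sufficient for membership.
    defining_test: Optional[Callable[[dict], bool]] = None


def infer_facts(concept: Concept) -> dict:
    """Declared membership lets us infer the necessary facts."""
    return dict(concept.necessary)


def infer_membership(individual: dict, concept: Concept) -> bool:
    """Only a necessary-and-sufficient test lets us infer membership."""
    if concept.defining_test is None:
        return False  # necessary facts alone can never classify an individual
    return concept.defining_test(individual)


# Dog has only a necessary statement: a (normal) dog has 4 legs.
dog = Concept("Dog", necessary={"legs": 4})

# Quadruped is an invented concept whose definition IS necessary and sufficient.
quadruped = Concept(
    "Quadruped",
    necessary={"legs": 4},
    defining_test=lambda ind: ind.get("legs") == 4,
)

four_legged_thing = {"legs": 4}
print(infer_facts(dog))                                # what we learn from "it is a dog"
print(infer_membership(four_legged_thing, dog))        # 4 legs does not make it a dog
print(infer_membership(four_legged_thing, quadruped))  # but it does make it a quadruped
```

The asymmetry is the whole point: the necessary statement only flows from the concept to the individual, while the necessary-and-sufficient test also flows from the individual back to the concept.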
Step 3: The Answers
We weren't out to create an entire ontology here (I'll leave that for another post some other time), just think of some sensible starting definitions for these concepts that would be unambiguous and useful in another context. However, we did try to think of relationships between concepts as a fundamental part of their definitions: what are relationships but another type of logic statement that falls into either the necessary (N) or necessary and sufficient (N&S) categories?
PLEASE NOTE: By creating these first-attempt definitions, we are not trying to define these concepts for the entire biological world. The point in my mind is not that we get a definition that 100% of people agree on. The actual point is to get an unambiguous definition that, if written in an appropriate language, would be intelligible by both programs and people. If a group of you share a common understanding of a concept (a bit like a group hallucination? :) ) then you can all talk about it sensibly, and then magic things with inference and integration of data can happen!
Polypeptide Chain (PC): It's all about a lack of tertiary structure, and a multiplicity of 1. We had a number of starting points for this definition, as this concept was already in the ontology under discussion. In the end, we were happy to keep that definition, which boiled down to the following set of logic statements. (I'm paraphrasing here to keep the post as generic as possible.) Necessary: A string of amino acids linked by peptide bonds. However, this is not N&S, as many things would fit this sentence yet have other parts that prevent them from being just a PC. If we wanted to make this an N&S statement, we could change it to be something like: has exactly one string of amino acids linked by peptide bonds, and has no other parts[3]. Because we state that a PC can have only this one component, we can infer that any object meeting this set of criteria must be of type "polypeptide chain". That is what the N&S statements give us.
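For readers who like to see the notation, the two paraphrased statements above could be written in Description Logic syntax roughly as follows. This is a sketch, not the axioms from the ontology we were discussing: the names PolypeptideChain, AminoAcidString and hasPart are assumptions, and hasPart is taken to mean direct parthood only.

```latex
% Necessary only: every polypeptide chain has some amino-acid string as a part.
% (Assumed names: PolypeptideChain, AminoAcidString, hasPart.)
\[
\mathit{PolypeptideChain} \sqsubseteq
  \exists\, \mathit{hasPart}.\mathit{AminoAcidString}
\]

% Necessary and sufficient: exactly one such part, and no parts of any other kind.
\[
\mathit{PolypeptideChain} \equiv
  ({=}\,1\ \mathit{hasPart}.\mathit{AminoAcidString})
  \sqcap
  (\forall\, \mathit{hasPart}.\mathit{AminoAcidString})
\]
```

The universal restriction in the second axiom is what captures "has no other parts": without it, an object with one amino-acid string plus, say, a bound metal would still satisfy the cardinality condition.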
Protein: It's all about the presence of tertiary structure; a protein can be composed of either one chain or multiple "permanent", covalently-linked chains. To differentiate it from a PC, we had the common-sense statement that a PC does not have any appreciable tertiary structure, while a protein does (e.g. disulphide bonds). Such a trait is N and not N&S: there are many other things in biology which have tertiary structure and which are not proteins. If that were the only defining feature, we could have placed protein as a child of PC, as protein would have been a more specific type of PC. This is not the case, however: things which are commonly called proteins often have multiple PCs, and we wanted to include this usage in our definition of protein. So, we've differentiated it from a PC based on both its structure and its multiplicity.
Next, how can we define a protein in a way that separates it from protein complex, which also has multiple PCs? Could it be that a multiple-PC protein only ever has PCs that are from the same transcript? No, they can be encoded by multiple transcripts and still be called a protein in conventional use. We quickly realized that finding a clear definition would be hard. In the end, the two main distinguishing features between protein and protein complex seemed to be that proteins, even if containing multiple PCs, were in a more permanent state of attachment than complexes, and that proteins always had covalently-linked subunits. An example of this is the insulin receptor, commonly classed as a protein, and whose alpha and beta subunits are connected with disulphide bonds.
Protein Complex (PCX): It's all about transience of the association between proteins (and perhaps a little about non-covalently linked subunits). More than one protein joined together, generally non-covalently, in a more transient way than with a multi-chain protein. This is N, as other objects may have non-covalently-joined proteins and yet not be protein complexes. An example of a PCX would be a transcription factor. This is our more problematic definition, and at the time we couldn't think of a good N&S statement.
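Pulling the three first-attempt definitions together, the distinguishing features can be sketched as a small Python decision function. This is only an illustration of the thought process: the Assembly class, its field names and the classify function are all invented for this sketch, and it treats the protein complex rule as if it were a full definition even though we only had an N statement for it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assembly:
    """Hypothetical feature summary of a molecule or molecular assembly."""
    subunit_count: int            # number of chains (or proteins) involved
    has_tertiary_structure: bool
    covalently_linked: bool       # subunits joined covalently, e.g. disulphide bonds
    transient: bool               # is the association short-lived?


def classify(a: Assembly) -> str:
    """Apply the three first-attempt definitions in turn (assumed ordering)."""
    # PC: a multiplicity of 1 and a lack of tertiary structure.
    if a.subunit_count == 1 and not a.has_tertiary_structure:
        return "polypeptide chain"
    # Protein: tertiary structure, and either one chain or several chains
    # that are permanently, covalently linked.
    if a.has_tertiary_structure and (
        a.subunit_count == 1 or (a.covalently_linked and not a.transient)
    ):
        return "protein"
    # PCX: more than one subunit in a transient association.
    if a.subunit_count > 1 and a.transient:
        return "protein complex"
    return "unclassified"


unfolded_chain = Assembly(1, False, False, False)
disulphide_linked = Assembly(2, True, True, False)   # in the spirit of the insulin receptor
loose_association = Assembly(2, True, False, True)

for example in (unfolded_chain, disulphide_linked, loose_association):
    print(classify(example))
```

Anything that falls through to "unclassified" (for example, a permanent but non-covalent association) is exactly the kind of case where the definitions still need work.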
Final Thoughts
Since the original discussion, more colleagues have joined in, noting that there are some things we call protein complexes which have covalently-bonded subunits. For example, ubiquitination (a covalent modification) is seen as something suitable for a protein complex. This leaves us with just the transience argument. You can see how this could be a problem, and why this sort of work can be difficult and frustrating.
There will always be multiple, contradictory meanings for words like gene and protein, which are in such common use. The thing to do isn't to try to change that - just to try to provide a rigorous way of capturing each useful orthogonal definition, and having differently-named concepts for each one.
I'm not sure we scientists think enough about rigorous definitions for such important words (and we should), and I'm not sure any life-sciences ontology or terminology has gotten it completely right (or completely "complete") yet. Most terminologies/ontologies are currently very good at labelling (creating lots of term lists and hierarchies), but not yet very good at classification (they don't provide precise, rigorous definitions). As one of my colleagues said, most of what's available now are "natural histories" of a term such as protein rather than a true definition of the biology behind it. I'm sure we'll get there, though!
So, paraphrasing the words of a well-known Spaniard, next time you use the words gene or protein, think also of the people you're using them with, and have a ponder on whether or not you mean what you think they mean.
1 "Colleagues", if you wish to be named, just let me know. I didn't want to take liberties!
2 Yes, we can discuss all day about what "normal" is. Just take it in the spirit it's written, for this small example. However, if you wish to have a discussion, just let me know...! :)
3 "parts" is vague, most definitely. However, this is a general discussion of the process, and saying anything more about what "parts" are at this stage would necessitate the creation of an entire ontology, not just these three concepts that we were interested in! If we carry it to the extreme, we'd have to create new concepts in our ontology for virtually all of the nouns that we use in our definitions. This is the proper way to build an ontology, but this post is about a short exercise in thinking about good definitions, NOT about building an entire ontology.